On the Effectiveness of Visible Watermarks
Tali Dekel, Michael Rubinstein, Ce Liu, William T. Freeman
Google Research
{tdekel,mrub,celiu,wfreeman}@google.com
[Figure 1 panels: (a) input watermarked image collection; (b) computed watermark (W) and alpha matte (α), with zoom-in; (c) recovered images (our result)]
Figure 1. We show that visible watermarks as employed by photographers and stock content marketplaces can be removed automatically.
While removing a watermark from a single image automatically is extremely challenging, watermarks are typically added in a consistent
manner to many images (a). We show that this consistency can be exploited to automatically infer the watermark pattern (b) and to obtain
the original, watermark-free content with high accuracy (c). We then investigate and report how robust such an attack is to different types
of inconsistencies that may be introduced in the watermarking process to improve its security, such as randomly changing the watermark’s
position and blend factor, or applying subtle geometric deformation to the watermark when embedding it in each image.
Abstract
Visible watermarking is a widely-used technique for
marking and protecting copyrights of many millions of im-
ages on the web, yet it suffers from an inherent security
flaw: watermarks are typically added in a consistent manner to many images. We show that this consistency makes it possible to automatically estimate the watermark and recover the original images with high accuracy. Specifically, we present a
generalized multi-image matting algorithm that takes a wa-
termarked image collection as input and automatically es-
timates the “foreground” (watermark), its alpha matte, and
the “background” (original) images. Since such an attack
relies on the consistency of watermarks across image col-
lections, we explore and evaluate how it is affected by var-
ious types of inconsistencies in the watermark embedding
that could potentially be used to make watermarking more
secure. We demonstrate the algorithm on stock imagery
available on the web, and provide extensive quantitative
analysis on synthetic watermarked data. A key takeaway
message of this paper is that visible watermarks should be
designed to not only be robust against removal from a single
image, but to be more resistant to mass-scale removal from
image collections as well.
1. Introduction
Visible watermarks are used extensively by photogra-
phers and stock content services to mark and protect dig-
ital photos and videos shared on the web. Such watermarks
typically involve overlaying a semi-transparent image con-
taining a name or a logo on the source image (Figure 1(a)).
Visible watermarks often contain complex structures
such as thin lines and shadows in order to make them harder
to remove. Indeed, removing a watermark from a single image without user supervision or a priori information is an
extremely difficult task. However, the fact that watermarks
are added in a consistent manner to many images has thus
far been overlooked. For example, stock content market-
places typically add similar versions of their logos to pre-
views of many millions of images on the web. We show
that the availability of such watermarked image collections
makes it possible to invert the watermarking process and
nearly perfectly recover the images that were intended to be
protected (Fig. 1(c)). This can be achieved automatically by
only observing the watermarked content.
We show how the problem of watermark removal from
an image collection can be formulated as a generalized
multi-image matting problem, where the goal is to esti-
mate the “foreground” (watermark) image and alpha matte,
along with the “background” (original) images, using many
observed examples. Unlike natural image matting methods that rely on user scribbles to constrain the problem, our method leverages the redundancy in the data. In
particular, we first extract consistent image structures across
the collection to obtain an initial estimate of the matted wa-
termark and detect the watermark region in all the images.
We then solve an optimization problem that separates the matted watermark into its image and alpha matte components (Fig. 1(b)) while reconstructing a subset of the background images. In our experiments we found that a few hundred images marked by the same watermark already suffice for high-quality estimation of the watermark and alpha matte. Once the watermark pattern is recovered, it can be efficiently removed at mass scale from any image marked by it. Importantly, we do not synthesize or inpaint the watermarked regions; rather, we actually invert the watermarking process to recover the original, watermark-free images.

[Figure 2 panels: input watermarked images; (I) joint matted watermark estimation and detection (Sec. 3.1); (II) matte and blend-factor initialization (Sec. 3.2); (III) multi-image matting and reconstruction (Sec. 3.2). Panel labels include estimated gradients (R, G, B), image/watermark decomposition, watermark update and matte update.]
Figure 2. Automatic watermark extraction pipeline. (I) The algorithm first jointly estimates the matted watermark (the product of the alpha matte and watermark image) and localizes it in all the images by detecting consistent image gradients across the collection. This initial estimate is correct up to a spatially-varying shift. (II) The aligned detections are used to estimate an initial alpha matte, and the estimated matted watermark is refined. (III) These are then used as initializations for our multi-image matting optimization.
As such an attack relies on the watermark’s consistency
across the image collection, a natural question is whether
one could prevent it by breaking this consistency. There-
fore, we study how robust the attack is to various types
of inconsistencies—or variations—that could potentially be
introduced while embedding the watermark in each image.
We show, for example, that randomly changing the posi-
tion of the watermark across the collection does not prevent
such an attack from detecting and removing the watermark,
nor do random changes in the watermark’s opacity or color.
Interestingly, we found that applying small spatial deforma-
tion to the watermarks during embedding can significantly
degrade the quality of the watermark-removed images, with
only imperceptible changes to the watermark itself.
We demonstrate results on watermarked image collec-
tions obtained from top stock photography web sites, as
well as extensive quantitative analysis on synthetic water-
marked datasets. A key contribution of our paper is surfacing
vulnerabilities in current visible watermarking
schemes, which put many millions of copyrighted images at
risk. Specifically, we argue that visible watermarks should
be designed to not only be robust against removal from sin-
gle images, but to be resistant against removal from image
collections as well. We believe our work can inspire devel-
opment of advanced watermarking techniques for the digital
photography and stock image industries.
2. Related Work
A vast literature exists on digital watermarking (see e.g.,
[16, 17] for surveys). We focus on visible watermarks su-
perimposed on images and limit the scope of our review to
work in that area.
Braudaway et al. [3] were among the first to introduce
visible watermarks in digital images. They used an adap-
tive, nonlinear pixel-domain technique to add a watermark
to an input image as a means to identify its ownership, while
at the same time not obscuring the image details behind it
and making the watermark difficult to remove. This scheme
has been extended in various ways. Meng and Cheng [13]
extended this model to the DCT domain and applied it to compressed video streams. Kankanhalli and Ramakrishnan [8] used statistics of block DCT coefficients to determine the watermark embedding coefficients for each block.
They later extended it to account for the texture sensitivity
in the human visual system to better preserve the perceptual
quality of the images [14]. Hu and Kwong [6] implemented
adaptive visible watermarking in the wavelet domain to han-
dle visual discontinuities that may be introduced by DCT-
based methods. While some of these methods may improve
the visual quality of the watermarks and make them harder
to remove, in practice, most modern stock content market-
places use a standard additive watermarking model, which
is the model we focus on in Sec. 3 and generalize in Sec. 4.
As visible watermarking plays an important role in pro-
tecting image copyrights, researchers have looked into ways
to attack it. Pei and Zeng [15] proposed to use Independent
Component Analysis (ICA) to separate the source image
from the watermark. Huang and Wu [7] used classic image
inpainting methods [2] to fill in the image regions covered
by the watermark. These techniques operate on a single im-
age, require a user to manually mark the watermark area
and cannot handle large watermarked regions (Fig. 4(b)).
More related to our case are methods for watermark re-
moval in videos [21, 5, 19]. However, such methods rely on
temporal coherency of videos, i.e., they assume that image content occluded by the watermark in one frame appears unoccluded in other frames [21, 19]. This assumption does not
apply to the stock photo collections we deal with. In ad-
dition, all these methods inpaint the logo/watermark area,
whereas our goal is to explicitly recover the original image
by utilizing the semi-visible image content under the water-
mark.
Watermark removal is also related to classical image
matting, where the goal is to decompose a single image into
background and foreground layers [9, 18]. As the matting
problem is inherently ill-posed and the majority of pixels
are either definitely background or foreground, most existing methods rely on a user to provide hard constraints.
Moreover, in our setting the opacity of the watermark is typ-
ically low in all pixels, i.e., all pixels are either background
or mixed. This makes the decomposition more challeng-
ing. Finally, natural image matting is typically used for im-
age editing applications such as object cut and paste, which
require an accurate alpha matte but can tolerate errors in
the background layer. In our case, the quality of the recon-
structed background is key. We compare our results with
image matting in Sec. 5.
3. An Attack on Watermarked Collections
A watermarked image, J, is typically obtained by superimposing a watermark, W, onto a natural image, I. That is,
$$J(p) = \alpha(p)\,W(p) + \bigl(1 - \alpha(p)\bigr)\,I(p), \qquad (1)$$
where p = (x, y) is the pixel location, and α(p) is a spatially varying opacity, or alpha matte. The most commonly used watermarks are translucent to keep the underlying image content partially visible [3]. That is, α(p) < 1 for all pixels, or α = c·α_n, where c < 1 is a constant blending factor and α_n ∈ [0, 1] is a normalized alpha matte. Similar to natural image matting, for α_n the majority of pixels are either only background (α_n(p) = 0) or only foreground (α_n(p) = 1).
Following Eq. 1, and given W and α, one could trivially invert the watermarking process via the per-pixel operation
$$I(p) = \frac{J(p) - \alpha(p)\,W(p)}{1 - \alpha(p)}. \qquad (2)$$
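A minimal NumPy sketch of the additive model (Eq. 1) and its per-pixel inversion (Eq. 2), assuming float images in [0, 1] of shape (H, W, 3) and a matte of shape (H, W). The function names are illustrative, not from the paper. With the exact W and α, the inversion recovers I up to floating-point rounding.

```python
import numpy as np

def embed_watermark(I, W, alpha):
    """Eq. 1: J = alpha * W + (1 - alpha) * I, applied per pixel and channel."""
    a = alpha[..., None]          # broadcast the (H, W) matte over color channels
    return a * W + (1.0 - a) * I

def remove_watermark(J, W, alpha):
    """Eq. 2: I = (J - alpha * W) / (1 - alpha); requires alpha < 1 everywhere."""
    a = alpha[..., None]
    return (J - a * W) / (1.0 - a)

rng = np.random.default_rng(0)
I = rng.random((32, 32, 3))       # stand-in original image in [0, 1]
W = np.ones((32, 32, 3))          # white watermark
alpha = np.zeros((32, 32))
alpha[8:24, 8:24] = 0.4           # translucent square: alpha < 1

J = embed_watermark(I, W, alpha)
I_rec = remove_watermark(J, W, alpha)
print(np.abs(I_rec - I).max())    # rounding-level error only
```

The hard part of the attack is therefore not Eq. 2 itself, but obtaining W and α accurately enough to apply it.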
However, when no prior information is available, the prob-
lem of recovering I given J alone is extremely challeng-
ing and inherently under-determined—there are three un-
knowns per pixel (W, α, I), and a single constraint (Eq. 1).
However, as discussed in Sec. 1, watermarks are typically added in a consistent way to many images. Formally, for a collection of images {I_k} marked by the same W and α, we have (omitting the pixel index p for brevity)
$$J_k = \alpha W + (1 - \alpha)\,I_k, \qquad k = 1, \dots, K. \qquad (3)$$
Our goal is to recover W, α and {I_k}_{k=1}^{K} given {J_k}_{k=1}^{K}.
This multi-image matting problem is still under-
determined as there are 3K equations and 3(K + 1) + 1
unknowns per pixel, for K color images. However, the
coherency of W and α over the image collection, together with natural image priors, allows solving it to high accuracy, fully automatically.
Our watermark removal algorithm consists of several
steps, illustrated in Fig. 2. We next describe each of them in
detail. We first consider the case of a consistent watermark-
ing scheme, i.e., the images are marked by the same water-
mark and alpha matte, in the same position. We then gener-
alize this model in Sec. 4, allowing for positional variations,
as well as subtle geometric and color variations across the
collection.
Figure 3. Initial watermark estimation and detection. (a) The
user provides a rough bounding box around the watermark in a
single image (for current stock collections on the web this is not
needed; see text). (b) The magnitude of gradients of (a). (c) The
magnitude of median gradients across the collection after 2 itera-
tions of watermark detection and estimation (see Sec. 3.1).
3.1. Initial Watermark Estimation & Detection
The first task is to determine which image structures in
the collection belong to the common watermark, and to
detect them in all the images. This is a chicken-and-egg problem, since estimating the watermark requires knowing which regions in the images are watermarked, and vice
versa. We solve this by jointly estimating the matted water-
mark and detecting it in all the images. Specifically, we it-
erate between the following estimation and detection steps.
I. Estimating the Matted Watermark. Given a current estimate of the watermarked regions in the images, we determine which image structures in the collection belong to the common watermark by observing consistent image gradients across the collection. Specifically, we compute the median of the watermarked image gradients, independently in the x and y directions, at every pixel location p:
$$\nabla \widehat{W}_m(p) = \operatorname{median}_k\bigl(\nabla J_k(p)\bigr). \qquad (4)$$
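A small sketch of Eq. 4, assuming K grayscale images already aligned in a (K, H, W) array and central differences from `np.gradient` (the paper does not state its derivative filter). The synthetic check uses i.i.d. noise images under a white square watermark with α = 0.3. Inside and outside the square, the median gradient is near zero. On the square's left border it comes out near ∇α·(W − E[I]) = 0.15·(1 − 0.5) = 0.075 rather than the true ∇W_m = 0.15, which illustrates the shift analyzed in the next paragraph.

```python
import numpy as np

def median_gradients(J_stack):
    """Eq. 4: per-pixel median of the x- and y-gradients of K aligned
    watermarked images, J_stack of shape (K, H, W)."""
    gy, gx = np.gradient(J_stack, axis=(1, 2))   # d/dy (rows), d/dx (columns)
    return np.median(gx, axis=0), np.median(gy, axis=0)

# Synthetic check: K noise "images" under a white square watermark (W = 1)
rng = np.random.default_rng(1)
K, H, Wd = 301, 32, 32
alpha = np.zeros((H, Wd))
alpha[8:24, 8:24] = 0.3
I_stack = rng.random((K, H, Wd))                 # E[I_k] = 0.5
J_stack = alpha * 1.0 + (1 - alpha) * I_stack    # Eq. 3 with W = 1

gx_med, gy_med = median_gradients(J_stack)
edge = gx_med[8:24, 7].mean()                    # column just left of the square
flat = np.abs(gx_med[12:20, 12:20]).mean()       # interior of the square
print(edge, flat)  # edge ≈ 0.075 (shifted), flat ≈ 0
```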
As the number of images K increases, Eq. 4 converges to the gradients of the true matted watermark, W_m = αW, up to a shift (see Fig. 3). To demonstrate why that is the case, we treat I_k and J_k as random variables and compute the expectation E[∇J_k]. Using Eq. 3, we have
$$\begin{aligned} E[\nabla J_k] &= E[\nabla W_m] + E[\nabla I_k] - E[\nabla(\alpha I_k)] \\ &= \nabla W_m + E[\nabla I_k] - \nabla\alpha\,E[I_k] - \alpha\,E[\nabla I_k] \\ &= \nabla W_m - \nabla\alpha\,E[I_k], \end{aligned} \qquad (5)$$
where the second equality follows from the product rule. The third equality relies on the known sparsity of natural image gradients: the chance of having strong gradients at the same pixel location in multiple images is small, hence E[∇I_k] ≈ 0. It follows that E[∇J_k] approximates the gradients of the matted watermark, W_m, except at pixels where ∇α ≠ 0. At those pixels the gradients are shifted by −∇α·E[I_k]. For now, we continue with this shifted initialization; we show how this shift can be corrected later on (Sec. 3.2).
We crop ∇Ŵ_m to remove boundary regions by computing the magnitude of ∇Ŵ_m(p) and taking the bounding box of its edge map (using Canny with a 0.4 threshold). The initial matted watermark Ŵ_m ≈ W_m (correct up to a shift) is obtained using Poisson reconstruction (Fig. 3(c)).

[Figure 4 panels: (a) input; (b) inpainting; (c) matting decomposition; (d) direct subtraction; (e) ours]
Figure 4. Watermark removal comparison with baselines. The watermarked region in (a) is inpainted by Photoshop (b) using the estimated matte, α, as a mask. (c) Result by alpha matting decomposition [9]. (d) Result when directly subtracting the estimated watermark (Eq. 2). (e) The result of the attack described in Sec. 3.
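The paper does not specify its Poisson solver. As an assumption, the sketch below integrates a gradient field with an FFT-based solver under periodic boundary conditions, a common textbook choice that is not necessarily the authors'. It solves ∇²f = div(g) in the Fourier domain. The gradients must use the same discretization as the divergence (here, forward differences paired with backward differences), and the result is defined only up to an additive constant, which is fixed to zero mean.

```python
import numpy as np

def forward_gradients(f):
    """Periodic forward differences: gx = f(x+1) - f(x), gy = f(y+1) - f(y)."""
    gx = np.roll(f, -1, axis=1) - f
    gy = np.roll(f, -1, axis=0) - f
    return gx, gy

def poisson_reconstruct(gx, gy):
    """Recover f (up to its mean) from forward-difference gradients by solving
    lap(f) = div(g) with periodic boundaries in the Fourier domain."""
    H, W = gx.shape
    # Backward-difference divergence: the adjoint of the forward differences
    div = (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))
    ky = np.fft.fftfreq(H)[:, None]
    kx = np.fft.fftfreq(W)[None, :]
    # Eigenvalues of the periodic 5-point Laplacian
    denom = (2 * np.cos(2 * np.pi * kx) - 2) + (2 * np.cos(2 * np.pi * ky) - 2)
    denom[0, 0] = 1.0             # avoid 0/0 at the DC term
    F = np.fft.fft2(div) / denom
    F[0, 0] = 0.0                 # the mean is not recoverable from gradients
    return np.real(np.fft.ifft2(F))

rng = np.random.default_rng(0)
f = rng.random((24, 20))                   # stand-in for a matted watermark
gx, gy = forward_gradients(f)
rec = poisson_reconstruct(gx, gy)
print(np.abs(rec - (f - f.mean())).max())  # rounding-level error
```

For median gradients (which are not exactly integrable), the same solver returns the least-squares fit to the given gradient field.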
II. Watermark Detection. Given the current estimate ∇Ŵ_m, we detect the watermark in each of the images using Chamfer distance, which is commonly used for template matching in object detection and recognition [1]. Specifically, for a given watermarked image, we obtain a verbose edge map (using the Canny edge detector [4]) and compute its Euclidean distance transform, which is then convolved with ∇Ŵ_m (flipped horizontally and vertically) to get the Chamfer distance from each pixel to the closest edge. Lastly, the watermark position is taken to be the pixel with minimum distance in the map. We found this detection method to be very robust, providing high detection rates for diverse watermarks and different opacity levels, as demonstrated in Sec. 5 and the supplementary material.
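A simplified sketch of Chamfer matching, assuming binary edge maps have already been computed (for example by a Canny detector, not shown). The Euclidean distance transform is computed by brute force, which is only practical for small images; in practice one would use an efficient implementation such as SciPy's `distance_transform_edt`. Correlating the distance map with the template's edge mask is equivalent to the convolution with the flipped template described above. The function names are hypothetical.

```python
import numpy as np

def distance_transform(edges):
    """Brute-force Euclidean distance from every pixel to the nearest edge pixel."""
    H, W = edges.shape
    ey, ex = np.nonzero(edges)
    yy, xx = np.mgrid[0:H, 0:W]
    d2 = (yy[..., None] - ey) ** 2 + (xx[..., None] - ex) ** 2
    return np.sqrt(d2.min(axis=-1))

def chamfer_detect(image_edges, template_edges):
    """Top-left offset minimizing the mean distance from the template's edge
    pixels to the nearest image edge (valid placements only)."""
    dt = distance_transform(image_edges)
    th, tw = template_edges.shape
    H, W = dt.shape
    ty, tx = np.nonzero(template_edges)
    scores = np.empty((H - th + 1, W - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            # correlation with the template == convolution with the flipped template
            scores[y, x] = dt[y + ty, x + tx].mean()
    y, x = np.unravel_index(np.argmin(scores), scores.shape)
    return int(y), int(x), scores

# Template: an 8x8 square outline; image: random clutter plus the pasted template
tmpl = np.zeros((8, 8), dtype=bool)
tmpl[0, :] = tmpl[-1, :] = tmpl[:, 0] = tmpl[:, -1] = True
rng = np.random.default_rng(2)
img = rng.random((40, 40)) < 0.03
img[12:20, 20:28] |= tmpl
y, x, scores = chamfer_detect(img, tmpl)
print(y, x)  # 12 20
```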
To initialize the joint estimation, if the relative position of the watermark in the images is fixed (as is the case with every stock image collection we observed on the web), we get an initial estimate of the watermark gradients, ∇Ŵ_m, by registering the images relative to their centers and running step I. If the watermark position is not fixed, we only require the user to mark a rough bounding box around the watermark in one of the images (Fig. 3(a)). We then use the gradients in the given bounding box as an initial estimate of the watermark gradients. We found that iterating between steps I and II two to three times is enough to obtain accurate detections and a reliable matted watermark estimate.
3.2. Multi-Image Matting and Reconstruction
Given the aligned detections in all input images, our goal
is then to solve the multi-image matting problem (Eq. 3),
i.e., decompose the observed intensities in each image into
their watermark, alpha-matte, and original image compo-
nents. The challenge is how to resolve the inherent ambigu-
ity in this problem completely automatically and reliably.
Our initial estimate of the matted watermark, Ŵ_m, provides us with valuable information about the structures that do not belong to the original images. However, Eq. 3 makes clear that this is insufficient to constrain the problem, since the matted watermark needs to be decomposed into its image component, W, and its alpha matte component, α. Accurate estimation of each component is vital, because very small errors (as little as 2 intensity levels in the watermark) may already show up as visible artifacts in the reconstruction (see Fig. 4(d)).
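To see why small component errors matter, consider applying the direct inversion (Eq. 2) with a watermark error δ or a matte error ε. The following is a short derivation added for illustration; it is not taken from the paper:

$$\hat I = \frac{J - \alpha\,(W+\delta)}{1-\alpha} = I - \frac{\alpha}{1-\alpha}\,\delta, \qquad \hat I = \frac{J - (\alpha+\varepsilon)\,W}{1-\alpha-\varepsilon} = I + \frac{\varepsilon\,(I - W)}{1-\alpha-\varepsilon}.$$

The first error is a scaled copy of δ and therefore carries the watermark's structure. The second depends on the content through I − W. For example, ε = 0.01 with α = 0.3 and |I − W| = 200 intensity levels already gives an error of about 0.01·200/0.69 ≈ 2.9 levels, concentrated along the watermark's shape.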
To address these challenges, we formulate the watermark inversion problem as an optimization problem that jointly solves for W, α and a collection of K watermark-free images {I_k}. Formally, we define the following objective:
$$\arg\min_{W,\alpha,\{I_k\}} \sum_k \Bigl( E_{data}(W,\alpha,I_k) + \lambda_I E_{reg}(\nabla I_k) \Bigr) + \lambda_w E_{reg}(\nabla W) + \lambda_\alpha E_{reg}(\nabla\alpha) + \beta E_f\bigl(\nabla(\alpha W)\bigr). \qquad (6)$$
The term E_data(W, α, I_k) penalizes deviations of the k-th image from the formation model (Eq. 1) at every pixel p, and is given by
$$E_{data}(I_k, W, \alpha) = \sum_p \Psi\bigl(|\alpha W + (1-\alpha)\,I_k - J_k|^2\bigr), \qquad (7)$$
where Ψ(s²) = √(s² + ε²), with ε = 0.001, is a robust function that approximates the L1 distance (p is omitted for brevity).
The terms E_reg(∇I) and E_reg(∇W) are regularization terms that encourage the reconstructed images and the watermark to be piecewise smooth. They penalize gradients more strongly where the gradients of the alpha matte are strong. We define E_reg(∇I) as
$$E_{reg}(\nabla I) = \sum_p \Psi\bigl(|\alpha_x|\,I_x^2 + |\alpha_y|\,I_y^2\bigr), \qquad (8)$$
where I_x and I_y are the horizontal and vertical derivatives of the image, respectively, and Ψ is as defined above. The term E_reg(∇W) is defined similarly. The regularization term on alpha is given by E_reg(∇α) = Σ_p Ψ(α_x² + α_y²).
Even with the use of multiple images and smoothness priors, the decomposition problem is still ambiguous. For example, for a fixed alpha matte, there may be an infinite number of piecewise smooth watermark/image decompositions that satisfy the formation model (Eq. 3). The last term in the objective, the fidelity term, reduces this ambiguity by encouraging the matted watermark W_m = αW to have gradients similar to those of the initial estimate Ŵ_m. It is given by
$$E_f(\nabla W_m) = \sum_p \Psi\bigl(\|\nabla W_m - \nabla \widehat{W}_m\|^2\bigr). \qquad (9)$$
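The energy terms of Eqs. 7 to 9 can be written compactly for single-channel float images. The sketch below assumes forward differences with a zero difference at the last row and column, since the paper does not state its discretization. The helper names are hypothetical, and the minimization (IRLS) is not shown.

```python
import numpy as np

EPS = 1e-3  # epsilon of the robust penalty, as in Eq. 7

def psi(s2):
    """Robust penalty Psi(s^2) = sqrt(s^2 + eps^2), approximately |s| (L1)."""
    return np.sqrt(s2 + EPS ** 2)

def dx(f):
    return np.diff(f, axis=1, append=f[:, -1:])   # forward difference in x

def dy(f):
    return np.diff(f, axis=0, append=f[-1:, :])   # forward difference in y

def e_data(I, W, alpha, J):
    """Eq. 7: deviation of one image from the formation model (Eq. 1)."""
    return psi((alpha * W + (1 - alpha) * I - J) ** 2).sum()

def e_reg(f, alpha):
    """Eq. 8: smoothness of f (an image or the watermark), weighted by |grad alpha|."""
    return psi(np.abs(dx(alpha)) * dx(f) ** 2 + np.abs(dy(alpha)) * dy(f) ** 2).sum()

def e_reg_alpha(alpha):
    """Matte smoothness: sum_p Psi(alpha_x^2 + alpha_y^2)."""
    return psi(dx(alpha) ** 2 + dy(alpha) ** 2).sum()

def e_fidelity(alpha, W, gx_init, gy_init):
    """Eq. 9: gradients of alpha * W should match the initial estimate."""
    Wm = alpha * W
    return psi((dx(Wm) - gx_init) ** 2 + (dy(Wm) - gy_init) ** 2).sum()

rng = np.random.default_rng(0)
I, W = rng.random((10, 12)), rng.random((10, 12))
alpha = np.full((10, 12), 0.3)
J = alpha * W + (1 - alpha) * I
print(e_data(I, W, alpha, J))   # ≈ 120 * EPS = 0.12: every pixel fits the model
```

Because Ψ(0) = ε, a perfect fit contributes ε per pixel rather than zero. This constant offset does not affect the minimizer.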
The impact of the fidelity term is demonstrated in Fig. 5(a), where we compare our result with (middle) and without (bottom) this term. In addition, we compare it to the matting decomposition approach taken in [9], using the ground-truth alpha matte as input (Fig. 4(c)).
[Figure 5 panels: (a) no variations; (b) opacity variation (20/255 max), with and without blend-factor estimation; (c) spatial perturbation of 0.5 px (max), with and without flow; (d) spatial perturbation of 1 px (max), with and without flow]
Figure 5. Robustness to watermark variations. This figure is best viewed on a monitor. The top row shows the input. (a) Our result for the traditional (consistent) watermarking model; the bottom image shows the result for β = 0, i.e., when the fidelity term (Eq. 9) is not used. (b) Our result when random opacity changes are introduced in each image, with (middle) and without (bottom) estimating a blend factor per image (Sec. 4). Note that there is no significant visible difference between (a) middle and (b) middle. (c) Our results for a 0.5-pixel per-image perturbation, with (middle) and without (bottom) flow estimation. (d) Similar to (c), for a 1-pixel perturbation. Notice how visual artifacts gradually increase with the perturbation magnitude, making it difficult for the attack to produce an artifact-free reconstruction.
Optimization. The resulting optimization problem (Eq. 6) is non-linear, and the number of unknowns may be very large when dealing with a large collection: O(KN) unknowns, where N and K are the number of pixels per image and the number of images, respectively. To deal with these challenges, we introduce auxiliary variables {W_k}, where W_k is the watermark of the k-th image. Each per-image watermark W_k is required to be close to W. Formally, we rewrite the objective as follows:
$$\arg\min_{W,\alpha,\{W_k\},\{I_k\}} \sum_k \Bigl( E_{data}(I_k, W_k, \alpha) + \lambda_I E_{reg}(\nabla I_k) + \lambda_w E_{reg}(\nabla W_k) + \lambda_\alpha E_{reg}(\nabla\alpha) + \beta E_f\bigl(\nabla(\alpha W_k)\bigr) + \gamma E_{aux}(W, W_k) \Bigr), \qquad (10)$$
where E_aux(W, W_k) = Σ_p |W − W_k|. Using these auxiliary variables, we solve smaller and simpler optimization problems (using alternating minimization). The resulting iterative algorithm consists of the following steps.

I. Image–Watermark Decomposition. At this step, we minimize the objective w.r.t. W_k and I_k, while keeping α and W fixed. Thus, the optimization in Eq. 10 reduces to:
$$\arg\min_{W_k, I_k} E_{data}(I_k, W_k) + \lambda_I E_{reg}(\nabla I_k) + \lambda_w E_{reg}(\nabla W_k) + \beta E_f\bigl(\nabla(\alpha W_k)\bigr) + \gamma E_{aux}(W, W_k). \qquad (11)$$
We solve this minimization problem using Iteratively Reweighted Least Squares (IRLS); the resulting linear system is derived in the Supplementary Materials (SM).

II. Watermark Update. In this step, we estimate a global watermark W that is consistent with all the estimated per-image watermarks {W_k} by minimizing the term E_aux. Since E_aux is an L1 penalty, this step reduces to taking the per-pixel median of {W_k}. That is, W = median_k W_k.

III. Matte Update. Here, we solve for α while keeping the rest of the unknowns fixed. In this case, we minimize the following objective over α:
$$\arg\min_{\alpha} \sum_k E_{data}(\alpha, I_k, W) + \lambda_\alpha E_{reg}(\nabla\alpha) + \beta E_f\bigl(\nabla(\alpha W)\bigr). \qquad (12)$$
Here too, the solution is obtained using IRLS (the final linear system is derived in the SM).

These steps are iterated several times until convergence.
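The IRLS linear systems for Eqs. 11 and 12 are derived in the supplementary material and are not reproduced here. As an illustration of the technique only, the sketch below applies IRLS to a 1-D toy problem with the same ingredients: a robust data term Σ Ψ((x − y)²) and a quadratic smoothness term λ‖Dx‖². Each iteration freezes the weights w = 1/√(r² + ε²) derived from Ψ and solves a weighted least-squares system. This majorize-minimize scheme never increases the objective. The helper names are hypothetical.

```python
import numpy as np

EPS = 1e-3  # epsilon of the robust penalty Psi, as in Eq. 7

def objective(x, y, lam):
    """sum_i Psi((x_i - y_i)^2) + lam * sum_i (x_{i+1} - x_i)^2."""
    return np.sqrt((x - y) ** 2 + EPS ** 2).sum() + lam * np.sum(np.diff(x) ** 2)

def smoothness_matrix(n, lam):
    D = np.diff(np.eye(n), axis=0)      # (n-1, n) forward-difference operator
    return 2 * lam * D.T @ D            # Hessian of lam * ||D x||^2

def irls(y, lam=1.0, iters=200):
    n = len(y)
    L = smoothness_matrix(n, lam)
    x = np.linalg.solve(np.eye(n) + L, y)            # least-squares start
    for _ in range(iters):
        w = 1.0 / np.sqrt((x - y) ** 2 + EPS ** 2)    # weights from Psi
        x = np.linalg.solve(np.diag(w) + L, w * y)    # weighted least squares
    return x

y = np.zeros(21)
y[10] = 100.0                                         # one gross outlier
x_ls = np.linalg.solve(np.eye(21) + smoothness_matrix(21, 1.0), y)
x_rob = irls(y)
print(x_ls[10], x_rob[10])  # the robust estimate is far less affected by the outlier
```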
Matte and Blend Factor Initialization. The matte in our formulation is given by α = c·α_n, where c is a constant blending factor and α_n is a normalized matte. The initialization for α_n is obtained by first running single-image matting [9]. To avoid the user input required by [9], our method automatically computes foreground/background masks (using an adaptive threshold on Ŵ_m) and uses them as "scribbles". The initial matte is taken to be the median over all single-image mattes.
We infer the blend factor from "black" patches (average intensity below 0.01) in the image collection, because in those patches the formation model reduces to J_k = c·α_n·W (no image component). Since W is unknown at this point, we use the initial matted watermark (Sec. 3.1) to estimate c. From Sec. 3.1, we have Ŵ_m = c·α_n·W − c·α_n·E[I_k]. Combining these two observations, we get J_k = Ŵ_m + c·α_n·E[I_k]. We estimate DC = E[I_k] as the median intensity across the image collection at each patch location, and plug in our initial estimate of α_n. We then solve for c using least squares over all black patches. In all the datasets we have tested, thousands of dark patches were detected, which is more than enough for a robust estimate of c. Finally, the estimated matte is used to correct for the shift in Ŵ_m. This is done by adding the term c·α_n·DC to Ŵ_m, which gives the final matted watermark estimate used in the next step (Fig. 2(II)).
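The least-squares step for c has a closed form. From J_k = Ŵ_m + c·α_n·DC, c is the slope of b = J_k − Ŵ_m against a = α_n·DC, so c = Σ a·b / Σ a². The sketch below assumes the black-patch pixels have already been gathered into flat arrays. The variable names are illustrative, and the synthetic data simply follows the model above with a true c of 0.3 plus a little noise.

```python
import numpy as np

def estimate_blend_factor(J_black, Wm_init, alpha_n, dc):
    """Least-squares c for J = Wm_init + c * alpha_n * DC over black-patch
    pixels. All inputs are 1-D arrays of matching length."""
    a = alpha_n * dc
    b = J_black - Wm_init
    return float(a @ b / (a @ a))

rng = np.random.default_rng(0)
n = 5000
alpha_n = rng.random(n)                          # normalized matte values
W = rng.random(n)                                # watermark values (unknown in practice)
dc = 0.3 + 0.4 * rng.random(n)                   # median intensity per location
c_true = 0.3
J_black = c_true * alpha_n * W + 0.002 * rng.standard_normal(n)  # no image term
Wm_init = c_true * alpha_n * W - c_true * alpha_n * dc           # shifted estimate
c_hat = estimate_blend_factor(J_black, Wm_init, alpha_n, dc)
print(round(c_hat, 3))  # ≈ 0.3
```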
3.3. Removing the Watermark in a New Image
Once we have the solution for W and α, we can use them
to remove the watermark from any new image marked by it,
without the need to run the entire pipeline again. As dis-
cussed in Sec. 3.2, very subtle errors in the watermark or
alpha matte are likely to show up as noticeable visual ar-
tifacts. Therefore, we avoid direct reconstruction and in-
stead perform the image-watermark decomposition step of
our multi-image matting algorithm (Eq. 11).
Dataset      # Images   Long Edge (px)   Watermark Res. (w × h)   Pipeline (min)   Removal (sec)
AdobeStock   422        1000             623 × 134                33               18
123RF        1376       650              650 × 433                115              60
CanStock     3000       450              450 × 300                28               12
fotolia      285        500              199 × 66                 5                1.5
CVPR17       1000       640              403 × 67                 34               5
Copyrights   1000       640              339 × 307                47               40
Table 1. Datasets and running time. Running time is given for: (i) Pipeline: full framework, including initial watermark estimation and detection (using all images in the dataset), and multi-image matting (using a subset of 50 images). (ii) Removal: reconstruction of a single image.
4. A Generalized Watermarking Model
The consistent watermarking model (Eq. 3) is the one used in practice in every stock image collection we encountered on the web (see Sec. 5.1). However, a natural question is whether one can avoid the attack by breaking the consistency of the watermark across the collection. To gain insight, we explore the impact of variations in the watermark and alpha matte from image to image.
We assume that watermarks cannot be arbitrarily altered, since various design principles and artistic choices are taken into account in generating them and placing them in images. Thus, we focus on subtle variations that roughly preserve the original appearance of the watermark. Specifically, we generalize our model to allow for two types of variations: (i) subtle opacity changes and (ii) subtle spatial perturbations (deformations). The generalized formation model for the k-th image is given by
$$J_k = c_k\,\alpha(\omega_k)\,W(\omega_k) + \bigl(1 - c_k\,\alpha(\omega_k)\bigr)\,I_k, \qquad (13)$$
where ω_k = (u_k(p), v_k(p)) represents a dense warp field (applied to both W and α), and c_k is the per-image blending factor (which controls the opacity).
Plugging Eq. 13 into our objective function (Eq. 6) and adding a regularization term on the warp field leads to
$$\arg\min_{\omega} \sum_p \Psi\bigl(|J - c\,W_m(\omega) - (1 - c\,\alpha(\omega))\,I|^2\bigr) + \lambda_\omega \sum_p \Psi\bigl(|\nabla\omega|^2\bigr),$$
where W_m = αW (the image index k is omitted for brevity). Here too, we use alternating minimization. That is, our multi-image matting algorithm has two additional steps per image: blend factor estimation (solving for c_k) and α-aware flow estimation (solving for ω_k).
Blend factor: The initial estimate from Sec. 3.2 yields an average blend factor. We then solve for a small per-image deviation from that estimate. See the supplementary material for the full derivation.
α-aware flow: In this step we solve for ω based on the current estimates of W, α, c and I. This results in an algorithm that is similar to conventional optical flow and can be solved using IRLS. To do so, we need to linearize the data term. Given an existing estimate ω, our goal is to estimate the optimal increment dω = (du, dv). The Taylor expansion results in
$$W_m(\omega + d\omega) \approx W'_m + W'_{mx}\,du + W'_{my}\,dv, \qquad \alpha(\omega + d\omega) \approx \alpha' + \alpha'_x\,du + \alpha'_y\,dv,$$
where W'_m = W_m(ω), α' = α(ω), and α'_x, α'_y, W'_mx, W'_my are the partial derivatives of α' and W'_m, respectively. The linearized data term is given by (omitting the constant blend factor c)
$$\Psi\Bigl(\bigl|W'_m + (1-\alpha')\,I - J + (W'_{mx} - \alpha'_x I)\,du + (W'_{my} - \alpha'_y I)\,dv\bigr|^2\Bigr).$$
This can be related to the classical optical flow equation I_t + I_x du + I_y dv by denoting I_t = W'_m + (1 − α')I − J, I_x = W'_mx − α'_x I, and I_y = W'_my − α'_y I. We modified the optical flow code [11] to implement this α-aware flow.
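The sketch below illustrates the α-aware linearization on a simplified problem. It makes several assumptions: a single global translation ω = (u, v) per image instead of a dense flow field, no flow regularizer, c = 1, and a quadratic (Gauss-Newton) data term in place of Ψ. It is not the modified optical-flow code of [11]. Each iteration warps W_m and α bilinearly, forms I_t, I_x and I_y as defined above, and solves the 2×2 normal equations for (du, dv). The function names are hypothetical.

```python
import numpy as np

def warp(f, u, v):
    """Sample f at (x + u, y + v) with bilinear interpolation, clamped at borders."""
    h, w = f.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    x = np.clip(xs + u, 0, w - 1)
    y = np.clip(ys + v, 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * f[y0, x0] + ax * f[y0, x1]
    bottom = (1 - ax) * f[y1, x0] + ax * f[y1, x1]
    return (1 - ay) * top + ay * bottom

def alpha_aware_translation(J, I, Wm, alpha, iters=30):
    """Gauss-Newton estimate of a global (u, v) for J = Wm(omega) + (1 - alpha(omega)) I."""
    u = v = 0.0
    for _ in range(iters):
        Wm_w, a_w = warp(Wm, u, v), warp(alpha, u, v)   # W'_m and alpha'
        Wm_y, Wm_x = np.gradient(Wm_w)                 # W'_my, W'_mx
        a_y, a_x = np.gradient(a_w)                    # alpha'_y, alpha'_x
        It = Wm_w + (1 - a_w) * I - J                  # I_t as defined above
        Ix = Wm_x - a_x * I                            # I_x
        Iy = Wm_y - a_y * I                            # I_y
        A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                      [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
        b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
        du, dv = np.linalg.solve(A, b)                 # normal equations for (du, dv)
        u, v = u + du, v + dv
    return u, v

# Synthetic check: smooth watermark and matte shifted by (0.6, -0.4) px
H, Wd = 48, 48
yy, xx = np.mgrid[0:H, 0:Wd].astype(float)

def blob(cy, cx, s):
    return np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * s ** 2))

Wm = 0.5 * blob(24, 22, 5)
alpha = 0.4 * blob(23, 25, 6)
I = 0.5 + 0.3 * np.sin(xx / 7.0) * np.cos(yy / 9.0)
J = warp(Wm, 0.6, -0.4) + (1 - warp(alpha, 0.6, -0.4)) * I
u, v = alpha_aware_translation(J, I, Wm, alpha)
print(u, v)  # close to 0.6 and -0.4
```

A dense-flow version would replace the 2×2 system with one increment per pixel, coupled through the λ_ω smoothness term.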
5. Results
We tested our algorithm extensively on watermarked image collections from well-known stock photography web sites,
as well as watermarked datasets we generated. For each of
the datasets, the number of images, long edge, and the wa-
termark resolution are reported in Table 1.
We set the parameters of the multi-image matting algorithm empirically: λ_I ∈ [0, 1], λ_α = 0.01, λ_w ∈ [0.001, 0.1], and β ∈ [0.001, 0.01]. In all the experiments, we used a subset of 50 images from the entire collection and 4 iterations, and reconstructed all the watermarked images.
We used all available images in each dataset to get the ini-
tial estimation of the watermark. We show the effect of the
number of images on the performance in the supplemen-
tary material. Running times are reported in Table 1 using a
non-optimized MATLAB implementation.
5.1. Results on Stock Imagery
We crawled publicly available preview images from pop-
ular stock content web sites using 15 predefined queries
such as “fashion”, “food”, “sports” and “nature”. Sample
results are shown in Fig. 6 (more are available in the SM).
[Figure 6 rows: AdobeStock (422 images), c=0.41; 123RF (1340 images), c=0.2; CanStock (3000 images), c=0.17; fotolia (285 images), c=0.45]
Figure 6. Results on stock imagery. This figure is best viewed on a monitor. For each dataset (row), we show (from left to right) the input watermarked image (cropped for better viewing), our automatic, watermark-free image reconstruction, and zoomed-in regions. The number of images and the estimated blend factor are reported above each row. The corresponding estimated watermarks are shown in Fig. 7.
As can be seen in the input images in Fig. 6, the watermarks contain various structures and shapes, some more complex than others. These include both thick (e.g., CanStock) and thin letters and lines (e.g., 123RF), smooth borders (e.g., fotolia), and shadows (e.g., AdobeStock). In some of the images, the watermarked region is highly textured,
while in others it is smooth. The opacity of the watermarks
is low and varies across the different datasets (the estimated
blending factors are shown above each row). In all these
cases, our algorithm accurately estimated the watermarks
(Fig. 7) and original images (Fig. 6).
5.2. Synthetic Datasets & Quantitative Evaluation
We generated a number of watermarked collections,
using two watermark images: CVPR17, and Copyrights.
For the source, watermark-free images, we used 1000 im-
ages chosen randomly from the Microsoft COCO ‘val2014’
dataset [10]. With the ground truth in hand, we quantita-
tively evaluated different aspects of the method using the
following metrics:
Detection: We use the Euclidean distance between the center of the detected bounding box (Sec. 3.1) and the ground truth; a distance larger than 2 px is considered a missed detection.
Reconstruction: We use two well-known metrics, Peak Signal-to-Noise Ratio (PSNR) and the Structural Dissimilarity Image Index (DSSIM), both of which have been shown to capture perceptible image degradations. Formally, the DSSIM between each reconstructed image Ĩ_k and the ground truth I_k is defined as DSSIM(Ĩ_k, I_k) = ½(1 − SSIM(Ĩ_k, I_k)), where SSIM(x, y) is the Structural Similarity index [20]. The total error was measured by taking the 95th percentile of the DSSIM index map for each image and then averaging over the entire image collection.
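A sketch of the two reconstruction metrics and the collection-level aggregation described above, for grayscale float images in [0, 1]. As a simplifying assumption, local statistics use a uniform 7×7 window over valid positions, whereas the standard SSIM of [20] uses a Gaussian window, so values will differ slightly from reference implementations. The helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def psnr(x, y, peak=1.0):
    mse = np.mean((x - y) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def ssim_map(x, y, win=7, peak=1.0):
    """SSIM map with a uniform (box) window; standard SSIM uses a Gaussian window."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2

    def local_mean(f):
        return sliding_window_view(f, (win, win)).mean(axis=(-2, -1))

    mx, my = local_mean(x), local_mean(y)
    vx = local_mean(x * x) - mx ** 2
    vy = local_mean(y * y) - my ** 2
    cxy = local_mean(x * y) - mx * my
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def collection_dssim(recs, gts, q=95):
    """Mean over images of the q-th percentile of the DSSIM map, (1 - SSIM) / 2."""
    return float(np.mean([np.percentile((1 - ssim_map(r, g)) / 2, q)
                          for r, g in zip(recs, gts)]))

rng = np.random.default_rng(0)
gt = rng.random((32, 32))
noisy = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0, 1)
print(psnr(noisy, gt), collection_dssim([noisy], [gt]))
```

Using a high percentile instead of the mean DSSIM makes the score sensitive to localized watermark residue, which would otherwise be diluted by the unaffected parts of the image.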
5.2.1 Consistent Watermark Collection
In CVPR17/Copyright-fixed, the images were consistently watermarked using Eq. 3. Sampled input images and results are shown in Figs. 1, 2, 4 and 5. As can be seen (e.g., Fig. 1(b)), our algorithm accurately estimates the fine structures and subtle gradients in the watermark and alpha matte. The computed errors are shown in Tables 2 and 3.
                 Copyrights-fixed      CVPR17-fixed
Method           PSNR     DSSIM        PSNR     DSSIM
Ours             36.2     0.038        32.73    0.07
Matting [9]      15.66    0.46         21.37    0.36
Direct Sub.      30.89    0.080        30.65    0.085
Table 2. Reconstruction quality and comparison. PSNR and DSSIM (mean over the dataset) for our reconstruction; for the matting decomposition [9] supplied with the ground-truth α; and for direct subtraction (Eq. 2) with the initial matted watermark.

Figure 7. Estimated watermarks (αW) for the datasets in Fig. 6.

Figure 8. Example limitation. Inaccuracies in estimating subtle watermark structures, e.g. shadows, may show up as visible artifacts, especially in smooth regions.

As a baseline, we evaluated the image reconstructions obtained by standard image matting. Note that most existing matting methods neither output nor evaluate the quality of the underlying background image. To facilitate the foreground-background decomposition, we supplied the matting algorithm [9] with the ground-truth alpha matte. While this method obtained reasonable reconstructions in some local regions, it generally fails to resolve the ambiguity between the watermark and the background image, especially when the watermark is colored. This can be seen in Fig. 4(c) and in the errors in Table 2.
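The ambiguity can be made explicit with the single-image blending model J = αW + (1 − α)I, which is the same form assumed for the direct-subtraction baseline. Even when α is known exactly, any candidate watermark value W′(p) yields a background value I′(p) that reproduces the observed pixel:

```latex
\begin{aligned}
I'(p) &= \frac{J(p) - \alpha(p)\,W'(p)}{1 - \alpha(p)}, \\
\alpha(p)\,W'(p) + \bigl(1 - \alpha(p)\bigr)\,I'(p)
  &= \alpha(p)\,W'(p) + J(p) - \alpha(p)\,W'(p) = J(p).
\end{aligned}
```

Each pixel therefore provides one equation per color channel but has two unknowns, W(p) and I(p). A single-image matting method must rely on priors to choose among these equally consistent decompositions.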
As a second baseline, we considered the image reconstructions obtained by direct per-pixel subtraction using our initial matted watermark. This approach does not produce accurate reconstructions, as small errors in the estimated watermark or alpha matte show up as visual artifacts (Fig. 4(d)).
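Why small estimation errors become visible can be checked numerically. The sketch below assumes that direct subtraction (Eq. 2) inverts the per-pixel blend as Î = (J − αW)/(1 − α). Under this assumption, an error δ in the matte gives Î − I = δ(I − W)/(1 − α − δ), an artifact shaped like the watermark and scaled by its contrast with the background. Similarly, an error ε in W gives Î − I = −αε/(1 − α). All arrays and values are hypothetical.

```python
import numpy as np


def direct_subtraction(J, W, alpha, eps=1e-6):
    """Per-pixel inversion of J = alpha*W + (1 - alpha)*I (assumed form of Eq. 2)."""
    return (J - alpha * W) / np.maximum(1.0 - alpha, eps)


rng = np.random.default_rng(2)
I = rng.random((8, 8))             # original image
W = np.ones((8, 8))                # hypothetical white watermark
alpha = np.full((8, 8), 0.4)       # hypothetical alpha matte
J = alpha * W + (1 - alpha) * I    # watermarked image

exact = direct_subtraction(J, W, alpha)      # recovers I up to rounding
delta = 0.02                                 # small error in the estimated matte
approx = direct_subtraction(J, W, alpha + delta)
err = approx - I
predicted = delta * (I - W) / (1 - alpha - delta)  # closed-form error derived above
print(np.abs(exact - I).max(), np.abs(err).max())
```

With α = 0.4 and δ = 0.02, the error can reach δ/(1 − α − δ) ≈ 0.034 (about 9 of 255 gray levels) where a dark background meets a white watermark. This is large enough to leave a faint outline of the logo.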
5.2.2 Robustness to Watermark Variations
We evaluated the robustness of our generalized framework to per-image watermark variations (see Sec. 4). To do so, we generated a number of datasets using the same logos as before. First, we uniformly sampled a different watermark position for each image. We introduced subtle opacity variations by uniformly sampling a blend factor $c_k$ for each image within a few intensity levels around the global blend factor $c$, i.e., $c_k \in [c - x/255,\ c + x/255]$ (we used $x = 10, 20$). We generated small spatial perturbations by smoothing two i.i.d. random noise images (for the x and y components of the perturbation) with a Gaussian filter, and limited the maximum perturbation in each direction to a fixed value (we used maxima of 0.5 and 1 pixels). We then used these as displacement fields to warp the original watermark and alpha matte (using bilinear interpolation). Finally, we also generated datasets with a combination of the above variations.
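The sketch below illustrates the per-image variations under several stated assumptions:

- The Gaussian width (`sigma`) is a hypothetical value.
- "Limiting" the perturbation is interpreted as rescaling each smoothed field so that its largest absolute value equals the chosen maximum.
- Sample coordinates are clamped at the border.
- The text does not say how $c_k$ modulates the matte, so the sketch only samples it.

All helper names, sizes, and the 480x640 image size are hypothetical.

```python
import numpy as np


def sample_opacity(c, x, rng):
    # Per-image blend factor c_k ~ Uniform[c - x/255, c + x/255].
    return rng.uniform(c - x / 255.0, c + x / 255.0)


def gaussian_kernel1d(sigma):
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()


def smooth(noise, sigma):
    # Separable Gaussian blur (rows, then columns). The image must be larger
    # than the kernel so np.convolve's 'same' mode keeps the input size.
    k = gaussian_kernel1d(sigma)
    rows = np.apply_along_axis(np.convolve, 1, noise, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")


def random_displacement(shape, max_disp, sigma, rng):
    # Smooth two i.i.d. noise images (x and y components), then rescale each
    # so its largest absolute displacement equals max_disp (in pixels).
    dx = smooth(rng.standard_normal(shape), sigma)
    dy = smooth(rng.standard_normal(shape), sigma)
    return dx * (max_disp / np.abs(dx).max()), dy * (max_disp / np.abs(dy).max())


def warp_bilinear(img, dx, dy):
    # Sample img at (row + dy, col + dx) with bilinear interpolation,
    # clamping sample coordinates to the image border.
    h, w = img.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    y = np.clip(rows + dy, 0, h - 1)
    x = np.clip(cols + dx, 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom


rng = np.random.default_rng(3)
logo = rng.random((40, 80))        # hypothetical watermark W
matte = np.full((40, 80), 0.35)    # hypothetical alpha matte
c_k = sample_opacity(0.35, 10, rng)  # per-image blend factor (not applied here)
dx, dy = random_displacement(logo.shape, max_disp=1.0, sigma=4.0, rng=rng)
logo_k = warp_bilinear(logo, dx, dy)    # the same field warps W ...
matte_k = warp_bilinear(matte, dx, dy)  # ... and the alpha matte
row = rng.integers(0, 480 - 40 + 1)     # random position in a 480x640 image
col = rng.integers(0, 640 - 80 + 1)
```

Warping W and α with one shared field keeps the watermark and its matte aligned within each image while breaking pixel-exact alignment across the collection.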
We report the results in Table 3. Detection is highly robust to all the variations we tested. The CVPR17 logo is mostly untextured and does not contain strong gradients, which makes it more challenging to detect; even so, its detection rate remains above 92% in all settings. We further observe that opacity changes do not affect the results much, and that geometric perturbations have the most significant impact on reconstruction quality. Note that the perturbations do not prevent the algorithm from extracting a reliable estimate of the watermark, since geometric noise can still be integrated out over many images. Therefore, our generalized framework manages to improve the reconstruction quality to some degree (Fig. 5(c-d), top); however, it is unable to align the perturbed watermark accurately enough, and visual artifacts are still noticeable (Fig. 5(c-d), bottom).
6. Conclusion
We revealed a loophole in the way visible watermarks are used, which allows them to be removed automatically and the original images to be recovered with high accuracy. The attack exploits the consistency of the watermark across many images, and is not limited by the watermark's complexity or its position in the images. We further studied and evaluated whether adding small random variations in geometry or opacity to the watermark can help prevent such an attack. We found that the attack is most affected by geometric variations, which can provide an effective improvement in watermark security compared to current, traditional watermarking schemes.
Fig. 8 shows an example limitation of the attack. In particular, inaccuracies in estimating subtle watermark structures occasionally show up as visible artifacts when the underlying image is smooth. We conjecture that this fact could be leveraged, in addition to the variations, for content-aware watermark placement [12] that further improves robustness to removal.
Dataset      Variation                   Detection rate (%)   PSNR (dB)   DSSIM
CVPR17       No var.                     98.6                 32.73       0.073
             Translation                 93.5                 32.80       0.087
             Opacity (10/255 max)        98.6                 32.25       0.062
             Opacity (20/255 max)        98.6                 31.75       0.066
             Spatial pert. (0.5px max)   98.0                 33.4        0.076
             Spatial pert. (1px max)     97.8                 32.09       0.096
             Trans.+Opacity10+pert1      92.4                 30.81       0.097
Copyrights   No var.                     100                  36.2        0.038
             Translation                 100                  36.35       0.039
             Opacity (10/255 max)        100                  34.79       0.037
             Opacity (20/255 max)        100                  33.17       0.063
             Spatial pert. (0.5px max)   100                  33.20       0.059
             Spatial pert. (1px max)     100                  31.23       0.085
             Trans.+Opacity10+pert1      100                  31.33       0.11
Table 3. Robustness to watermark variations. Detection rate (over all images), PSNR and DSSIM (mean over 50 images) for watermarked datasets we generated with several types of random, per-image watermark variations: translation, opacity, geometric perturbation, and their combination. See the text for an explanation of the variations and their magnitudes.
References
[1] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers,
and J. Davis. SCAPE: shape completion and animation of
people. In ACM Transactions on Graphics (TOG), volume 24,
pages 408–416. ACM, 2005.
[2] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image
inpainting. In Proceedings of the 27th annual conference on
Computer graphics and interactive techniques, pages 417–
424. ACM Press/Addison-Wesley Publishing Co., 2000.
[3] G. W. Braudaway, K. A. Magerlein, and F. C. Mintzer. Pro-
tecting publicly available images with a visible image wa-
termark. In Electronic Imaging: Science & Technology,
pages 126–133. International Society for Optics and Photon-
ics, 1996.
[4] J. Canny. A computational approach to edge detection. Pat-
tern Analysis and Machine Intelligence, IEEE Transactions
on, (6):679–698, 1986.
[5] M. Dashti, R. Safabakhsh, M. Pourfard, and M. Abdollahi-
fard. Video logo removal using iterative subsequent match-
ing. In AISP, 2015.
[6] Y. Hu and S. Kwong. Wavelet domain adaptive visible wa-
termarking. Electronics Letters, 37(20):1219–1220, 2001.
[7] C.-H. Huang and J.-L. Wu. Attacking visible watermarking
schemes. Multimedia, IEEE Transactions on, 6(1):16–30,
2004.
[8] M. S. Kankanhalli and K. Ramakrishnan. Adaptive visible
watermarking of images. In Multimedia Computing and Sys-
tems, 1999. IEEE International Conference on, volume 1,
pages 568–573. IEEE, 1999.
[9] A. Levin, D. Lischinski, and Y. Weiss. A closed-form solu-
tion to natural image matting. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, 30(2):228–242, 2008.
[10] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014, pages 740–755. Springer, 2014.
[11] C. Liu. Beyond pixels: exploring new representations and
applications for motion analysis. PhD thesis, MIT, 2009.
[12] A. Lumini and D. Maio. Adaptive positioning of a visi-
ble watermark in a digital image. In Multimedia and Expo,
2004. ICME’04. 2004 IEEE International Conference on,
volume 2, pages 967–970. IEEE, 2004.
[13] J. Meng and S.-F. Chang. Embedding visible video wa-
termarks in the compressed domain. In Image Processing,
1998. ICIP 98. Proceedings. 1998 International Conference
on, volume 1, pages 474–477. IEEE, 1998.
[14] S. P. Mohanty, K. R. Ramakrishnan, and M. S. Kankanhalli.
A DCT domain visible watermarking technique for images. In
Multimedia and Expo, 2000. ICME 2000. 2000 IEEE Inter-
national Conference on, volume 2, pages 1029–1032. IEEE,
2000.
[15] S.-C. Pei and Y.-C. Zeng. A novel image recovery algorithm
for visible watermarked images. Information Forensics and
Security, IEEE Transactions on, 1(4):543–550, 2006.
[16] C. I. Podilchuk and E. J. Delp. Digital watermarking: algo-
rithms and applications. Signal Processing Magazine, IEEE,
18(4):33–46, 2001.
[17] V. M. Potdar, S. Han, and E. Chang. A survey of digital image watermarking techniques. In Industrial Informatics, 2005. INDIN’05. 2005 3rd IEEE International Conference on, pages 709–716. IEEE, 2005.
[18] J. Wang and M. F. Cohen. Image and video matting: a survey. Now Publishers Inc, 2008.
[19] J. Wang, Q. Liu, L. Duan, H. Lu, and C. Xu. Automatic TV logo detection, tracking and removal in broadcast video. In ICMM, 2007.
[20] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. Image Processing, IEEE Transactions on, 13(4):600–612, 2004.
[21] W.-Q. Yan, J. Wang, and M. S. Kankanhalli. Automatic video logo detection and removal. Multimedia Systems, 2005.