On the Effectiveness of Visible Watermarks


Tali Dekel, Michael Rubinstein, Ce Liu, William T. Freeman
Google Research
{tdekel,mrub,celiu,wfreeman}@google.com

Figure 1. (a) Input watermarked image collection. (b) Computed watermark ($W$) and alpha matte ($\alpha$), with zoom-in. (c) Recovered images (our result). We show that visible watermarks as employed by photographers and stock content marketplaces can be removed automatically. While removing a watermark from a single image automatically is extremely challenging, watermarks are typically added in a consistent manner to many images (a). We show that this consistency can be exploited to automatically infer the watermark pattern (b) and to obtain the original, watermark-free content with high accuracy (c). We then investigate and report how robust such an attack is to different types of inconsistencies that may be introduced in the watermarking process to improve its security, such as randomly changing the watermark's position and blend factor, or applying subtle geometric deformation to the watermark when embedding it in each image.

Abstract

Visible watermarking is a widely used technique for marking and protecting copyrights of many millions of images on the web, yet it suffers from an inherent security flaw: watermarks are typically added in a consistent manner to many images. We show that this consistency allows the watermark to be estimated automatically and the original images to be recovered with high accuracy. Specifically, we present a generalized multi-image matting algorithm that takes a watermarked image collection as input and automatically estimates the "foreground" (watermark), its alpha matte, and the "background" (original) images. Since such an attack relies on the consistency of watermarks across image collections, we explore and evaluate how it is affected by various types of inconsistencies in the watermark embedding that could potentially be used to make watermarking more secure.
We demonstrate the algorithm on stock imagery available on the web, and provide extensive quantitative analysis on synthetic watermarked data. A key takeaway of this paper is that visible watermarks should be designed not only to be robust against removal from a single image, but also to resist mass-scale removal from image collections.

1. Introduction

Visible watermarks are used extensively by photographers and stock content services to mark and protect digital photos and videos shared on the web. Such watermarks typically involve overlaying a semi-transparent image containing a name or a logo on the source image (Figure 1(a)). Visible watermarks often contain complex structures such as thin lines and shadows in order to make them harder to remove. Indeed, removing a watermark from a single image without user supervision or a priori information is an extremely difficult task. However, the fact that watermarks are added in a consistent manner to many images has thus far been overlooked. For example, stock content marketplaces typically add similar versions of their logos to previews of many millions of images on the web. We show that the availability of such watermarked image collections makes it possible to invert the watermarking process and nearly perfectly recover the images that were intended to be protected (Fig. 1(c)). This can be achieved automatically by only observing the watermarked content.

We show how the problem of watermark removal from an image collection can be formulated as a generalized multi-image matting problem, where the goal is to estimate the "foreground" (watermark) image and alpha matte, along with the "background" (original) images, using many observed examples. Unlike natural image matting methods, which rely on user scribbles to constrain the problem, our method leverages the redundancy in the data.
In particular, we first extract consistent image structures across the collection to obtain an initial estimate of the matted watermark and detect the watermark region in all the images. We then solve an optimization problem that separates the matted watermark into its image and alpha matte components (Fig. 1(b)) while reconstructing a subset of the background images. In our experiments we found that a few
hundred images marked by the same watermark already suffice for high-quality estimation of the watermark and alpha matte. Once the watermark pattern is recovered, it can be efficiently removed at mass scale from any image marked by it. Importantly, we do not synthesize or inpaint the watermarked regions; rather, we actually invert the watermarking process to recover the original, watermark-free images.

Figure 2. Automatic watermark extraction pipeline. Panels: input watermarked images; (I) joint matted watermark estimation and detection (Sec. 3.1); (II) matte and blend-factor initialization (Sec. 3.2); (III) multi-image matting and reconstruction (Sec. 3.2), consisting of image/watermark decomposition, watermark update, and matte update. (I) The algorithm first jointly estimates the matted watermark (the product of the alpha matte and watermark image) and localizes it in all the images by detecting consistent image gradients across the collection. This initial estimate is correct up to a spatially varying shift. (II) The aligned detections are used to estimate an initial alpha matte, and the estimated matted watermark is refined. (III) These are then used as initializations for our multi-image matting optimization.

As such an attack relies on the watermark's consistency across the image collection, a natural question is whether one could prevent it by breaking this consistency. Therefore, we study how robust the attack is to various types of inconsistencies, or variations, that could potentially be introduced while embedding the watermark in each image. We show, for example, that randomly changing the position of the watermark across the collection does not prevent such an attack from detecting and removing the watermark, nor do random changes in the watermark's opacity or color.
Interestingly, we found that applying small spatial deformations to the watermarks during embedding can significantly degrade the quality of the watermark-removed images, with only imperceptible changes to the watermark itself.

We demonstrate results on watermarked image collections obtained from top stock photography web sites, as well as extensive quantitative analysis on synthetic watermarked datasets. A key contribution of our paper is in surfacing vulnerabilities in current visible watermarking schemes, which put many millions of copyrighted images at risk. Specifically, we argue that visible watermarks should be designed not only to be robust against removal from single images, but also to resist removal from image collections. We believe our work can inspire development of advanced watermarking techniques for the digital photography and stock image industries.

2. Related Work

A vast literature exists on digital watermarking (see e.g., [16, 17] for surveys). We focus on visible watermarks superimposed on images and limit the scope of our review to work in that area.

Braudaway et al. [3] were among the first to introduce visible watermarks in digital images. They used an adaptive, nonlinear pixel-domain technique to add a watermark to an input image as a means to identify its ownership, while at the same time not obscuring the image details behind it and making the watermark difficult to remove. This scheme has been extended in various ways. Meng and Cheng [13] extended this model to the DCT domain and applied it to compressed video streams. Kankanhalli and Ramakrishnan [8] used statistics of block DCT coefficients to determine the watermark embedding coefficients for each block. They later extended it to account for the texture sensitivity in the human visual system to better preserve the perceptual quality of the images [14].
Hu and Kwong [6] implemented adaptive visible watermarking in the wavelet domain to handle visual discontinuities that may be introduced by DCT-based methods. While some of these methods may improve the visual quality of the watermarks and make them harder to remove, in practice, most modern stock content marketplaces use a standard additive watermarking model, which is the model we focus on in Sec. 3 and generalize in Sec. 4.

As visible watermarking plays an important role in protecting image copyrights, researchers have looked into ways to attack it. Pei and Zeng [15] proposed to use Independent Component Analysis (ICA) to separate the source image from the watermark. Huang and Wu [7] used classic image inpainting methods [2] to fill in the image regions covered by the watermark. These techniques operate on a single image, require a user to manually mark the watermark area, and cannot handle large watermarked regions (Fig. 4(b)).

More related to our case are methods for watermark removal in videos [21, 5, 19]. However, such methods rely on temporal coherency of videos, i.e., they assume that image content occluded by the watermark in one frame appears unoccluded in other frames [21, 19]. This assumption does not apply to the stock photo collections we deal with. In addition, all these methods inpaint the logo/watermark area, whereas our goal is to explicitly recover the original image by utilizing the semi-visible image content under the watermark.

Watermark removal is also related to classical image matting, where the goal is to decompose a single image into background and foreground layers [9, 18]. As the matting problem is inherently ill-posed and the majority of pixels
are either definitely background or foreground, most existing methods rely on a user to provide hard constraints. Moreover, in our setting the opacity of the watermark is typically low in all pixels, i.e., all pixels are either background or mixed. This makes the decomposition more challenging. Finally, natural image matting is typically used for image editing applications such as object cut and paste, which require an accurate alpha matte but can tolerate errors in the background layer. In our case, the quality of the reconstructed background is key. We compare our results with image matting in Sec. 5.

3. An Attack on Watermarked Collections

A watermarked image, $J$, is typically obtained by superimposing a watermark, $W$, on a natural image, $I$. That is,

\[ J(p) = \alpha(p) W(p) + (1 - \alpha(p)) I(p), \tag{1} \]

where $p = (x, y)$ is the pixel location, and $\alpha(p)$ is a spatially varying opacity, or alpha matte. The most commonly used watermarks are translucent to keep the underlying image content partially visible [3]. That is, $\alpha(p) < 1$ for all pixels, or $\alpha = c \cdot \alpha_n$, where $c < 1$ is a constant blending factor, and $\alpha_n \in [0, 1]$ is a normalized alpha matte. Similar to natural image matting, for $\alpha_n$, the majority of pixels are either only background ($\alpha_n(p) = 0$) or only foreground ($\alpha_n(p) = 1$). Following Eq. 1, and given $W$ and $\alpha$, one could trivially invert the watermarking process via the per-pixel operation

\[ I(p) = \frac{J(p) - \alpha(p) W(p)}{1 - \alpha(p)}. \tag{2} \]

However, when no prior information is available, the problem of recovering $I$ given $J$ alone is extremely challenging and inherently under-determined: there are three unknowns per pixel ($W$, $\alpha$, $I$) and a single constraint (Eq. 1).

However, as discussed in Sec. 1, watermarks are typically added in a consistent way to many images.
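The per-pixel formation model and its inverse (Eqs. 1 and 2) can be illustrated with a minimal NumPy sketch. The function names, the tiny random arrays, and the blend factor of 0.4 are assumptions made for illustration only; they are not taken from the authors' implementation. The sketch only confirms that Eq. 2 undoes Eq. 1 when $W$ and $\alpha$ are known exactly.

```python
import numpy as np

def embed_watermark(image, watermark, alpha):
    """Eq. 1: J = alpha * W + (1 - alpha) * I, applied per pixel."""
    return alpha * watermark + (1.0 - alpha) * image

def invert_watermark(marked, watermark, alpha):
    """Eq. 2: I = (J - alpha * W) / (1 - alpha); needs alpha < 1 everywhere."""
    return (marked - alpha * watermark) / (1.0 - alpha)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))        # stand-in for a natural image I in [0, 1]
watermark = rng.random((4, 4, 3))    # stand-in for the watermark W
alpha = 0.4 * rng.random((4, 4, 1))  # alpha = c * alpha_n with assumed c = 0.4

marked = embed_watermark(image, watermark, alpha)
recovered = invert_watermark(marked, watermark, alpha)
max_error = float(np.abs(recovered - image).max())
print(f"max reconstruction error: {max_error:.2e}")
```

With exact $W$ and $\alpha$ the error is at floating-point level; the difficulty addressed in the rest of this section is that neither quantity is known in advance.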
Formally, for a collection of images, $\{I_k\}$, marked by the same $W$ and $\alpha$, we have (omitting the pixel index $p$ for brevity)

\[ J_k = \alpha W + (1 - \alpha) I_k, \quad k = 1, \dots, K. \tag{3} \]

Our goal is to recover $W$, $\alpha$ and $\{I_k\}_{k=1}^{K}$ given $\{J_k\}_{k=1}^{K}$. This multi-image matting problem is still under-determined, as there are $3K$ equations and $3(K + 1) + 1$ unknowns per pixel for $K$ color images. However, the coherency of $W$ and $\alpha$ over the image collection, together with natural image priors, allows solving it to high accuracy, fully automatically.

Our watermark removal algorithm consists of several steps, illustrated in Fig. 2. We next describe each of them in detail. We first consider the case of a consistent watermarking scheme, i.e., the images are marked by the same watermark and alpha matte, in the same position. We then generalize this model in Sec. 4, allowing for positional variations, as well as subtle geometric and color variations across the collection.

Figure 3. Initial watermark estimation and detection. (a) The user provides a rough bounding box around the watermark in a single image (for current stock collections on the web this is not needed; see text). (b) The magnitude of gradients of (a). (c) The magnitude of median gradients across the collection after 2 iterations of watermark detection and estimation (see Sec. 3.1).

3.1. Initial Watermark Estimation & Detection

The first task is to determine which image structures in the collection belong to the common watermark, and to detect them in all the images. This is a chicken-and-egg problem, since estimating the watermark requires knowing which regions in the images are watermarked, and vice versa. We solve this by jointly estimating the matted watermark and detecting it in all the images. Specifically, we iterate between the following estimation and detection steps.

I. Estimating the Matted Watermark. Given a current estimate of the watermarked regions in the images, we determine which image structures in the collection belong to the common watermark by observing consistent image gradients across the collection. Specifically, we compute the median of the watermarked image gradients, independently in the $x$ and $y$ directions, at every pixel location $p$:

\[ \nabla \widehat{W}_m(p) = \operatorname{median}_k \big( \nabla J_k(p) \big). \tag{4} \]

As the number of images $K$ increases, Eq. 4 converges to the gradients of the true matted watermark, $W_m = \alpha W$, up to a shift (see Fig. 3). To demonstrate why that is the case, we treat $I_k$ and $J_k$ as random variables, and compute the expectation $E[\nabla J_k]$. Using Eq. 3 we have

\[
\begin{aligned}
E[\nabla J_k] &= E[\nabla W_m] + E[\nabla I_k] - E[\nabla(\alpha I_k)] \\
&= \nabla W_m + E[\nabla I_k] - \nabla\alpha \, E[I_k] - \alpha E[\nabla I_k] \\
&= \nabla W_m - \nabla\alpha \, E[I_k],
\end{aligned}
\tag{5}
\]

where the second equality follows from the product rule for derivatives. The third equality is based on the known property of natural image gradients to be sparse, i.e., the chance of having strong gradients at the same pixel location in multiple images is small. Hence, $E[\nabla I_k] \approx 0$. It follows that $E[\nabla J_k]$ approximates the gradients of the matted watermark, $W_m$, except for pixels for which $\nabla\alpha \neq 0$. At those pixels the gradients are shifted by $\nabla\alpha \cdot E[I_k]$. For now, we continue with this shifted initialization. We will show how this shift can be corrected later on (Sec. 3.2).

We crop $\nabla \widehat{W}_m$ to remove boundary regions by computing the magnitude of $\nabla \widehat{W}_m(p)$ and taking the bounding box
of its edge map (using Canny with a threshold of 0.4). The initial matted watermark $\widehat{W}_m \approx W_m$ (correct up to a shift) is obtained using Poisson reconstruction (Fig. 3(c)).

Figure 4. Watermark removal comparison with baselines. Panels: (a) input; (b) inpainting; (c) matting decomposition; (d) direct subtraction; (e) ours. The watermarked region in (a) is inpainted by Photoshop (b) using the estimated matte, $\alpha$, as a mask. (c) Result by alpha matting decomposition [9]. (d) Result when directly subtracting the estimated watermark (Eq. 2). (e) The result of the attack described in Sec. 3.

II. Watermark Detection. Given the current estimate $\nabla \widehat{W}_m$, we detect the watermark in each of the images using the Chamfer distance commonly used for template matching in object detection and recognition [1]. Specifically, for a given watermarked image, we obtain a verbose edge map (using the Canny edge detector [4]), and compute its Euclidean distance transform, which is then convolved with $\nabla \widehat{W}_m$ (flipped horizontally and vertically) to get the Chamfer distance from each pixel to the closest edge. Lastly, the watermark position is taken to be the pixel with minimum distance in the map. We found this detection method to be very robust, providing high detection rates for diverse watermarks and different opacity levels, as demonstrated in Sec. 5 and the supplementary material.

To initialize the joint estimation, if the relative position of the watermark in the images is fixed (as is the case with any stock image collection we observed on the web), we get an initial estimation of the watermark gradients, $\nabla \widehat{W}_m$, by registering the images relative to their centers and running step I. If the watermark position is not fixed, we only require the user to mark a rough bounding box around the watermark in one of the images (Fig. 3(a)). We then use the gradients in the given bounding box as an initial estimation for the watermark gradients. We found that iterating between steps I and II
2-3 times is enough to obtain accurate detections and a reliable matted watermark estimation.

3.2. Multi-Image Matting and Reconstruction

Given the aligned detections in all input images, our goal is then to solve the multi-image matting problem (Eq. 3), i.e., decompose the observed intensities in each image into their watermark, alpha matte, and original image components. The challenge is how to resolve the inherent ambiguity in this problem completely automatically and reliably.

Our initial estimation of the matted watermark $\widehat{W}_m$ provides us with valuable information about the structures that do not belong to the original images. However, from Eq. 3 it is clear that it is insufficient to constrain the problem, as the matted watermark needs to be decomposed into its image component $W$ and alpha matte component $\alpha$. Accurate estimation of each component is vital, as very small errors (as little as 2 intensity levels in the watermark) may already show up as visible artifacts in the reconstruction (see Fig. 4(d)).

To address these challenges, we formulate the watermark inversion problem as an optimization problem that jointly solves for $W$, $\alpha$ and a collection of $K$ watermark-free images $\{I_k\}$. Formally, we define the following objective:

\[
\arg\min_{W, \alpha, \{I_k\}} \sum_k \Big( E_{data}(W, \alpha, I_k) + \lambda_I E_{reg}(\nabla I_k) \Big) + \lambda_w E_{reg}(\nabla W) + \lambda_\alpha E_{reg}(\nabla\alpha) + \beta E_f(\nabla(\alpha W)). \tag{6}
\]

The term $E_{data}(W, \alpha, I_k)$ penalizes deviations of the $k$-th image from the formation model (Eq. 1), at every pixel $p$, and is given by

\[ E_{data}(I_k, W, \alpha) = \sum_p \Psi\big( |\alpha W + (1 - \alpha) I_k - J_k|^2 \big), \tag{7} \]

where $\Psi(s^2) = \sqrt{s^2 + \epsilon^2}$, $\epsilon = 0.001$, is a robust function that approximates the $L_1$ distance ($p$ is omitted for brevity).

The terms $E_{reg}(\nabla I)$ and $E_{reg}(\nabla W)$ are regularization terms that encourage the reconstructed images and the watermark to be piecewise smooth where the gradients of the alpha matte are strong.
We define $E_{reg}(\nabla I)$ as

\[ E_{reg}(\nabla I) = \sum_p \Psi\big( |\alpha_x| I_x^2 + |\alpha_y| I_y^2 \big), \tag{8} \]

where $I_x$, $I_y$ are the horizontal and vertical derivatives of the image, respectively, and $\Psi$ is as defined above. The term $E_{reg}(\nabla W)$ is defined similarly. The regularization term on alpha is given by $E_{reg}(\nabla\alpha) = \sum_p \Psi(\alpha_x^2 + \alpha_y^2)$.

Even with the use of multiple images and smoothness priors, the decomposition problem is still ambiguous. For example, for a fixed alpha matte, there may be an infinite number of piecewise smooth watermark/image decompositions that satisfy the formation model (Eq. 3). The last term in the objective, the fidelity term, reduces this ambiguity by encouraging the matted watermark $W_m = \alpha W$ to have similar gradients to the initial estimate $\widehat{W}_m$, and is given by

\[ E_f(\nabla W_m) = \sum_p \Psi\big( \| \nabla W_m - \nabla \widehat{W}_m \|^2 \big). \tag{9} \]

The impact of the fidelity term is demonstrated in Fig. 5(a), where we compare our result with (top) and without (bottom) this term. In addition, we compare it to the matting decomposition approach taken in [9], using the ground truth alpha matte as input (Fig. 4(c)).
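The data, regularization, and fidelity terms (Eqs. 7 to 9) can be evaluated with a short NumPy sketch. This is only an illustration under stated assumptions: single-channel images, forward-difference gradients, and the helper names `psi`, `grad_xy`, `e_data`, `e_reg_image`, and `e_fidelity` are all choices made here, not details given in the text, and the sketch evaluates the energies rather than minimizing them.

```python
import numpy as np

EPS = 0.001  # epsilon in Psi(s^2) = sqrt(s^2 + eps^2)

def psi(s_squared):
    """Robust penalty: a smooth approximation of |s|, taking s^2 as input."""
    return np.sqrt(s_squared + EPS ** 2)

def grad_xy(img):
    """Forward differences (an assumed discretization), zero at the far border."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy

def e_data(I, W, alpha, J):
    """Eq. 7: deviation of one image from the formation model."""
    return psi((alpha * W + (1 - alpha) * I - J) ** 2).sum()

def e_reg_image(I, alpha):
    """Eq. 8: image gradients weighted by the matte gradient magnitudes."""
    ax, ay = grad_xy(alpha)
    ix, iy = grad_xy(I)
    return psi(np.abs(ax) * ix ** 2 + np.abs(ay) * iy ** 2).sum()

def e_fidelity(W, alpha, grad_wm_hat):
    """Eq. 9: gradients of alpha * W versus the initial matted-watermark gradients."""
    gx, gy = grad_xy(alpha * W)
    hx, hy = grad_wm_hat
    return psi((gx - hx) ** 2 + (gy - hy) ** 2).sum()

rng = np.random.default_rng(0)
I = rng.random((6, 6))
W = rng.random((6, 6))
alpha = 0.4 * rng.random((6, 6))
J = alpha * W + (1 - alpha) * I  # a consistent observation (Eq. 3)

data_at_truth = e_data(I, W, alpha, J)         # residual is zero everywhere
data_perturbed = e_data(I + 0.1, W, alpha, J)  # a wrong background costs more
fidelity_at_truth = e_fidelity(W, alpha, grad_xy(alpha * W))
smoothness = e_reg_image(I, alpha)
```

At the true decomposition each pixel contributes only $\Psi(0) = \epsilon$ to the data and fidelity terms, so both evaluate to the number of pixels times $\epsilon$, which is the floor of these energies.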

Figure 5. Robustness to watermark variations. Columns: (a) no variations; (b) opacity variation (20/255 max); (c) spatial perturbation, 0.5 px max; (d) spatial perturbation, 1 px max. This figure is best viewed on a monitor. Top row shows the input. (a) Our result for the traditional (consistent) watermarking model; bottom image shows the result for $\beta = 0$, i.e., when the fidelity term (Eq. 9) is not used. (b) Our result when random opacity changes are introduced in each image, with (middle) and without (bottom) estimating a blend factor per image (Sec. 4). Notice no significant visible difference comparing (a) middle and (b) middle. (c) Our results for 0.5 pixel per-image perturbation, with (middle) and without (bottom) flow estimation. (d) Similar to (c), for 1 pixel perturbation. Notice how visual artifacts gradually increase with perturbation magnitude, making it difficult for the attack to produce artifact-free reconstruction.

Optimization. The resulting optimization problem (Eq. 6) is non-linear and the number of unknowns may be very large when dealing with a large collection ($O(KN)$ unknowns, where $N$ and $K$ are the number of pixels per image and the number of images, respectively). To deal with these challenges, we introduce auxiliary variables $\{W_k\}$, where $W_k$ is the watermark of the $k$-th image. Each per-image watermark $W_k$ is required to be close to $W$. Formally, we rewrite the objective as follows:

\[
\arg\min_{W, \{W_k\}, \alpha, \{I_k\}} \sum_k \Big( E_{data}(I_k, W_k, \alpha) + \lambda_I E_{reg}(\nabla I_k) + \lambda_w E_{reg}(\nabla W_k) + \lambda_\alpha E_{reg}(\nabla\alpha) + \beta E_f(\nabla(\alpha W_k)) \Big) + \gamma \sum_k E_{aux}(W, W_k), \tag{10}
\]

where $E_{aux}(W, W_k) = \sum_p |W - W_k|$. Using these auxiliary variables, we solve smaller and simpler optimization problems (using alternating minimization). The resulting iterative algorithm consists of the following steps.

I. Image-Watermark Decomposition. At this step, we minimize the objective w.r.t. $W_k$ and $I_k$, while keeping $\alpha$ and $W$ fixed. Thus, the optimization in Eq. 10 reduces to:

\[
\arg\min_{W_k, I_k} E_{data}(I_k, W_k) + \lambda_I E_{reg}(\nabla I_k) + \lambda_w E_{reg}(\nabla W_k) + \beta E_f(\nabla(\alpha W_k)) + \gamma E_{aux}(W, W_k). \tag{11}
\]

We solve this minimization problem using Iteratively Reweighted Least Squares (IRLS), where the resulting linear system is derived in the Supplementary Material (SM).

II. Watermark Update. In this step, we opt to estimate a global watermark $W$ that is consistent with all the estimated per-image watermarks $\{W_k\}$ by minimizing the term $E_{aux}$. This step reduces to taking the median of $\{W_k\}$. That is, $W = \operatorname{median}_k W_k$.

III. Matte Update. Here, we solve for $\alpha$, while keeping the rest of the unknowns fixed. In this case, we minimize the following objective over $\alpha$:

\[
\arg\min_{\alpha} \sum_k E_{data}(\alpha, I_k, W) + \lambda_\alpha E_{reg}(\nabla\alpha) + \beta E_f(\nabla(\alpha W)). \tag{12}
\]

Here too, the solution is obtained using IRLS (the final linear system is derived in the SM). These steps are iterated several times until convergence.

Matte and Blend Factor Initialization. The matte in our formulation is given by $\alpha = c \cdot \alpha_n$, where $c$ is a constant blending factor, and $\alpha_n$ is a normalized matte. The initialization for $\alpha_n$ is obtained by first running single-image matting [9]. To avoid the required user input of [9], our method automatically computes foreground/background masks (using an adaptive threshold on $\widehat{W}_m$) and uses them as "scribbles". The initial matte is taken to be the median over all single-image mattes.

We infer the blend factor from "black" patches (average intensity below 0.01) in the image collection because in those patches the formation model reduces to $J_k = c \cdot \alpha_n W$ (no image component). Since $W$ is unknown at this point, we use the initial matted watermark (Sec. 3.1) to estimate $c$. From Sec. 3.1, we have $\widehat{W}_m = c \cdot \alpha_n W - c \cdot \alpha_n E[I_k]$. Combining these two observations, we get $J_k = \widehat{W}_m + c \cdot \alpha_n \cdot E[I_k]$. We estimate $DC = E[I_k]$ as the median
intensity across the image collection at each $k$-th patch location, and plug in our initial estimation of $\alpha_n$. We then solve for $c$ using least squares over all black patches. In all the datasets we have tested, thousands of dark patches were detected, which is more than enough for a robust estimation of $c$. Finally, the estimated matte is used to correct for the shift in $\widehat{W}_m$. This is done by adding the term $c \cdot \alpha_n \cdot DC$ to $\widehat{W}_m$ to get the final matted watermark estimation we will use in the next section (Fig. 2(II)).

3.3. Removing the Watermark in a New Image

Once we have the solution for $W$ and $\alpha$, we can use them to remove the watermark from any new image marked by it, without the need to run the entire pipeline again. As discussed in Sec. 3.2, very subtle errors in the watermark or alpha matte are likely to show up as noticeable visual artifacts. Therefore, we avoid direct reconstruction and instead perform the image-watermark decomposition step of our multi-image matting algorithm (Eq. 11).

Table 1. Datasets and running time. Running time is given for: (i) Pipeline: full framework, including initial watermark estimation and detection (using all images in the dataset), and multi-image matting (using a subset of 50 images). (ii) Removal: reconstruction of a single image.

Dataset      # Images   Long Edge   Watermark Res. (w×h)   Pipeline (min)   Removal (sec)
AdobeStock   422        1000        623 × 134              33               18
123RF        1376       650         650 × 433              115              60
CanStock     3000       450         450 × 300              28               12
fotolia      285        500         199 × 66               5                1.5
CVPR17       1000       640         403 × 67               34               5
Copyrights   1000       640         339 × 307              47               40

4. A Generalized Watermarking Model

The consistent watermarking model (Eq. 3) is the one used in practice in every stock image collection we encountered on the Web (see Sec. 5.1). However, a natural question is whether one can avoid the attack by breaking the consistency of the watermark across the collection. To gain insights, we explore the impact of variations in the watermark and alpha matte from image to image.

We assume that watermarks cannot be arbitrarily altered, since various design principles and artistic choices are taken into account in generating and placing them in images. Thus, we focus on subtle variations that roughly preserve the original appearance of the watermark. Specifically, we generalize our model to allow for two types of variations: (i) subtle opacity changes, and (ii) subtle spatial perturbations (deformations). The generalized formation model for the $k$-th image is given by

\[ J_k = c_k \alpha(\omega_k) W(\omega_k) + (1 - c_k \alpha(\omega_k)) I_k, \tag{13} \]

where $\omega_k = (u_k(p), v_k(p))$ represents a dense warp field (applied to both $W$ and $\alpha$), and $c_k$ is the per-image blending factor (which controls the opacity). Plugging Eq. 13 into our objective function (Eq. 6), and adding a regularization term on the warping field, leads to

\[ \arg\min_{\omega} \Psi\big( |J - c W_m(\omega) - (1 - c\alpha(\omega)) I| \big) + \lambda_\omega \Psi\big( |\nabla\omega| \big), \]

where $W_m = \alpha W$ (the image index $k$ is omitted for brevity). Here too, we use alternating minimization, i.e., our multi-image matting algorithm has two additional steps: blend factor estimation (solving for $c_k$), and $\alpha$-aware flow estimation (solving for $\omega_k$), per image.

Blend factor: The initial estimation from Sec. 3.2 yields an average blend factor. We then solve for a small deviation per image from that estimation. See the supplementary material for the full derivation.

$\alpha$-aware flow: In this step we solve for $\omega$ based on the current estimate of $W$, $\alpha$, $c$ and $I$. This results in an algorithm which is similar to conventional optical flow and can be solved using IRLS. To do so, we need to linearize the data term. With an existing estimate $\omega$, our goal is to estimate the optimal increment $d\omega = (du, dv)$. The Taylor expansion results in

\[
\begin{aligned}
W_m(\omega + d\omega) &\approx W'_m + W'_{mx} \, du + W'_{my} \, dv, \\
\alpha(\omega + d\omega) &\approx \alpha' + \alpha'_x \, du + \alpha'_y \, dv,
\end{aligned}
\]

where $W'_m = W_m(\omega)$, $\alpha' = \alpha(\omega)$, and $\alpha'_x$, $\alpha'_y$, $W'_{mx}$, $W'_{my}$ are the partial derivatives of $\alpha'$ and $W'_m$, respectively. The linearized data term is given by (omitting the constant blend factor $c$)

\[ \Psi\big( |W'_m + (1 - \alpha') I - J + (W'_{mx} - \alpha'_x I) \, du + (W'_{my} - \alpha'_y I) \, dv| \big). \]

This can be related to the classical optical flow equation $I_t + I_x \, du + I_y \, dv$ by denoting $I_t = W'_m + (1 - \alpha') I - J$, $I_x = W'_{mx} - \alpha'_x I$, and $I_y = W'_{my} - \alpha'_y I$. We modified the optical flow code [11] to implement this $\alpha$-aware flow.

5. Results

We tested our algorithm extensively on watermarked image collections from well-known stock photography web sites, as well as watermarked datasets we generated. For each of the datasets, the number of images, long edge, and the watermark resolution are reported in Table 1.

We set the parameters of the multi-image matting algorithm empirically: $\lambda_I \in [0, 1]$, $\lambda_\alpha = 0.01$, $\lambda_W \in [0.001, 0.1]$, $\beta \in [0.001, 0.01]$. In all the experiments, we used a subset of 50 images from the entire collection and 4 iterations, and reconstructed all the watermarked images. We used all available images in each dataset to get the initial estimation of the watermark. We show the effect of the number of images on the performance in the supplementary material. Running times are reported in Table 1 using a non-optimized MATLAB implementation.

5.1. Results on Stock Imagery

We crawled publicly available preview images from popular stock content web sites using 15 predefined queries such as "fashion", "food", "sports" and "nature". Sample results are shown in Fig. 6 (more are available in the SM).

Figure 6. Results on stock imagery. Rows: AdobeStock (422 images), c=0.41; 123RF (1340 images), c=0.2; CanStock (3000 images), c=0.17; fotolia (285 images), c=0.45. This figure is best viewed on a monitor. For each dataset (row), we show (from left to right) the input watermarked image (cropped for better viewing), our automatic, watermark-free image reconstruction, and zoomed-in regions. The number of images and estimated blend factor are reported above each row. The corresponding estimated watermarks are shown in Fig. 7.

As can be seen in the input images in Fig. 6, the watermarks contain various structures and shapes, some more complex than others, including both thick (e.g., CanStock) and thin letters and lines (e.g., 123RF), smooth borders (e.g., fotolia), and shadows (e.g., AdobeStock). In some of the images, the watermarked region is highly textured, while in others it is smooth. The opacity of the watermarks is low and varies across the different datasets (the estimated blending factors are shown above each row). In all these cases, our algorithm accurately estimated the watermarks (Fig. 7) and original images (Fig. 6).

5.2. Synthetic Datasets & Quantitative Evaluation

We generated a number of watermarked collections, using two watermark images: CVPR17 and Copyrights. For the source, watermark-free images, we used 1000 images chosen randomly from the Microsoft COCO 'val2014' dataset [10]. With the ground truth in hand, we quantitatively evaluated different aspects of the method using the following metrics:

Detection: We use the Euclidean distance between the center of the detected bounding box (Sec. 3.1) and the ground truth; a distance larger than 2 px is considered a missed detection.

Reconstruction: We use two well-known metrics: Peak Signal-to-Noise Ratio (PSNR) and the Structural Dissimilarity Image Index (DSSIM), both of which were shown to capture perceptible image degradations.
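The detection criterion and the PSNR metric named above can be sketched in a few lines of Python. The function names and the assumption that intensities lie in $[0, 1]$ (so the PSNR peak value is 1.0) are illustrative choices made here; the text does not state the intensity range used in the evaluation.

```python
import math
import numpy as np

def is_missed_detection(detected_center, true_center, tol_px=2.0):
    """A detection counts as a miss when the centers are more than 2 px apart."""
    return math.dist(detected_center, true_center) > tol_px

def psnr(reconstruction, ground_truth, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes intensities in [0, 1]."""
    mse = float(np.mean((reconstruction - ground_truth) ** 2))
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)

# A detection 1 px off in both x and y is about 1.41 px away: not a miss.
near = is_missed_detection((10.0, 10.0), (11.0, 11.0))
# A detection 3 px off horizontally exceeds the 2 px tolerance: a miss.
far = is_missed_detection((10.0, 10.0), (13.0, 10.0))

# A uniform error of 0.01 on a [0, 1] image gives MSE = 1e-4.
ground_truth = np.zeros((8, 8))
reconstruction = np.full((8, 8), 0.01)
score = psnr(reconstruction, ground_truth)
```

For scale, an intensity error of 0.01 at every pixel corresponds to roughly 40 dB under these assumptions.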
Formally, the DSSIM between each of the reconstructed images $\tilde{I}_k$ and the ground truth $I_k$ is defined as

\[ \mathrm{DSSIM}(\tilde{I}_k, I_k) = \tfrac{1}{2} \big( 1 - \mathrm{SSIM}(\tilde{I}_k, I_k) \big), \]

where $\mathrm{SSIM}(x, y)$ is the Structural Similarity [20]. The total error was measured as the 95th percentile of the DSSIM index map for each image, taking the mean over the entire image collection.

5.2.1 Consistent Watermark Collection

In CVPR17/Copyrights-fixed, the images were consistently watermarked using Eq. 3. Sampled input images and results are shown in Figs. 1, 2, 4, and 5. As can be seen (e.g., Fig. 1(b)), our algorithm accurately estimates the fine structures and subtle gradients in the watermark and alpha matte. The computed errors are shown in Table 2 and Table 3.

Table 2. Reconstruction quality and comparison. PSNR and DSSIM (mean over the dataset) for our reconstruction; matting decomposition [9] supplied with the ground truth $\alpha$; direct subtraction (Eq. 2) with the initial matted watermark.

              Copyrights-fixed      CVPR2017-fixed
              PSNR     DSSIM        PSNR     DSSIM
Ours          36.2     0.038        32.73    0.07
Matting [9]   15.66    0.46         21.37    0.36
Direct Sub    30.89    0.080        30.65    0.085

As a baseline, we evaluated the image reconstructions obtained by standard image matting. It is important to note
that most existing matting methods do not output (nor evaluate) the quality of the underlying background image. To facilitate the task of foreground-background decomposition, we supplied the matting algorithm [9] with the ground truth alpha matte. While this method was able to obtain a reasonable reconstruction in some local regions, it generally fails to resolve the ambiguity between the watermark and the background image, especially when the watermark is colored. This can be seen in Fig. 4(c) and in the errors in Table 2.

As a second baseline, we considered the image reconstructions obtained using a direct per-pixel subtraction with our initial matted watermark. This approach does not generate accurate reconstructions, as small errors in the estimated watermark or alpha matte show up as visual artifacts (Fig. 4(d)).

Figure 7. Estimated watermarks (αW) for the datasets in Fig. 6.

Figure 8. Example limitation. Inaccuracies in estimating subtle watermark structures, e.g., shadows, may show up as visible artifacts, especially in smooth regions.

5.2.2 Robustness to Watermark Variations

We evaluated the robustness of our generalized framework to per-image watermark variations (see Sec. 4). To do so, we generated a number of datasets using the same logos as before. We first uniformly sampled a different position for the watermark in each image. We introduced subtle opacity variations by uniformly sampling a blend factor $c_k$ for each image within a few intensity levels around the global blend factor $c$, i.e., $c_k \in [c - x/255,\; c + x/255]$ (we used x = 10, 20). We generated small spatial perturbations by smoothing two i.i.d. random noise images (for the x and y components of the perturbation) with a Gaussian filter. We limited the maximum perturbation in each direction to a defined value (we used maxima of 0.5 and 1 pixels). We then used those as displacement fields to warp the original watermark and alpha matte (using bilinear interpolation).
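The opacity and geometric variations described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the Gaussian noise distribution, the filter width `sigma`, the choice to rescale (rather than clip) the field to the maximum displacement, and the edge clamping in the warp are all assumptions that the text does not specify.

```python
import numpy as np

def sample_blend_factor(c, x, rng):
    """Draw a per-image blend factor c_k uniformly from [c - x/255, c + x/255]."""
    return rng.uniform(c - x / 255.0, c + x / 255.0)

def gaussian_kernel(sigma):
    """1-D normalized Gaussian kernel (truncation at 3*sigma is an assumption)."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth_noise_field(shape, sigma, max_disp, rng):
    """One displacement component: Gaussian-smoothed i.i.d. noise.

    The field is rescaled so its largest magnitude equals max_disp
    (assumed; the text only says the maximum is limited).
    Assumes each image side is at least as long as the kernel.
    """
    noise = rng.standard_normal(shape)
    k = gaussian_kernel(sigma)
    # Separable filtering: along rows, then along columns.
    noise = np.apply_along_axis(np.convolve, 1, noise, k, mode="same")
    noise = np.apply_along_axis(np.convolve, 0, noise, k, mode="same")
    return noise * (max_disp / np.abs(noise).max())

def warp_bilinear(img, dx, dy):
    """Backward-warp img by the displacement field (dx, dy), bilinear sampling.

    Sample positions are clamped to the image border (assumed).
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + dx, 0, w - 1)
    y = np.clip(ys + dy, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    if img.ndim == 3:  # broadcast the weights over color channels
        fx, fy = fx[..., None], fy[..., None]
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bottom = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = rng.random((64, 64))            # stand-in for an alpha matte
    c_k = sample_blend_factor(0.4, 10, rng)
    dx = smooth_noise_field(alpha.shape, 4.0, 0.5, rng)
    dy = smooth_noise_field(alpha.shape, 4.0, 0.5, rng)
    alpha_k = c_k * warp_bilinear(alpha, dx, dy)
```

The same displacement field would be applied to both the watermark and its alpha matte so that the two stay registered with each other.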
Finally, we also generated datasets with a combination of the above variations.

We report the results in Table 3. As can be seen, the detection is very robust to the different variations. As the CVPR17 logo is mostly untextured and does not contain strong gradients, it is more challenging to detect, yet its detection rate is still high. We further observe that opacity changes do not affect the results much, and that geometric perturbations have the most significant impact on the quality of the reconstructions. Note that the perturbations do not prevent the algorithm from extracting a reliable estimate of the watermark (as geometric noise can still be integrated out over many images). Therefore, our generalized framework manages to improve the reconstruction quality to some degree (see Fig. 5(c-d), top); however, it is unable to align the perturbed watermark accurately enough, and visual artifacts are still noticeable (see Fig. 5(c-d), bottom).

6. Conclusion

We revealed a loophole in the way visible watermarks are used, which allows them to be removed automatically and the original images to be recovered with high accuracy. The attack exploits the coherency of the watermark across many images, and is not limited by the watermark’s complexity or its position in the images. We further studied and evaluated whether adding small random variations in geometry/opacity to the watermark can help prevent such an attack. We found that the attack is most affected by geometric variations, which can provide an effective improvement in watermark security compared to current, traditional watermarking schemes.

Fig. 8 shows an example limitation of the attack. In particular, inaccuracies in estimating subtle watermark structures occasionally show up as visible artifacts when the underlying image is smooth.
We conjecture that it may be possible to leverage this fact, in addition to the variations, for content-aware watermark placement [12], to further improve robustness to removal.

Dataset      Variation                   Detection Rate   PSNR     DSSIM
CVPR17       No Var.                     98.6             32.73    0.073
             Translation                 93.5             32.80    0.087
             Opacity (10/255 max)        98.6             32.25    0.062
             Opacity (20/255 max)        98.6             31.75    0.066
             Spatial pert (0.5px max)    98.0             33.4     0.076
             Spatial pert (1px max)      97.8             32.09    0.096
             Trans.+Opacity10+pert1      92.4             30.81    0.097
Copyright    No Var.                     100              36.2     0.038
             Translation                 100              36.35    0.039
             Opacity (10/255 max)        100              34.79    0.037
             Opacity (20/255 max)        100              33.17    0.063
             Spatial pert (0.5px max)    100              33.20    0.059
             Spatial pert (1px max)      100              31.23    0.085
             Trans.+Opacity10+pert1      100              31.33    0.11

Table 3. Robustness to watermark variations. Detection rate (over all images), PSNR and DSSIM (mean over 50 images) for watermarked datasets we generated with several types of random, per-image watermark variations: translation, opacity, geometric perturbation, and their combination. See explanation of variations and magnitudes in the text.
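The PSNR and DSSIM figures reported in Tables 2 and 3 follow the definitions in Sec. 5.2. As a rough illustration of how such numbers could be computed, the sketch below evaluates PSNR and the 95th-percentile DSSIM for grayscale float images in [0, 1]. The SSIM window (11×11 Gaussian, σ = 1.5) and the constants follow the common defaults of [20]; the text does not state which settings were used, so these, the grayscale restriction, and all function names are assumptions.

```python
import numpy as np

def psnr(rec, gt, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((rec - gt) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def _blur(img, k):
    """Separable filtering with a 1-D kernel, keeping only fully covered windows."""
    out = np.apply_along_axis(np.convolve, 1, img, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, out, k, mode="valid")

def dssim_map(rec, gt, peak=1.0, sigma=1.5, radius=5):
    """Per-pixel DSSIM = (1 - SSIM) / 2 for 2-D images (window settings assumed).

    Both images must be at least (2 * radius + 1) pixels on each side.
    """
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mu_x, mu_y = _blur(rec, k), _blur(gt, k)
    var_x = _blur(rec * rec, k) - mu_x * mu_x
    var_y = _blur(gt * gt, k) - mu_y * mu_y
    cov = _blur(rec * gt, k) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    )
    return 0.5 * (1.0 - ssim)

def collection_dssim(recs, gts):
    """95th percentile of each image's DSSIM map, averaged over the collection."""
    scores = [np.percentile(dssim_map(r, g), 95) for r, g in zip(recs, gts)]
    return float(np.mean(scores))
```

Taking a high percentile of the index map rather than its mean makes the score sensitive to localized artifacts, which is where watermark residue tends to appear.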

References

[1] D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, and J. Davis. Scape: shape completion and animation of people. In ACM Transactions on Graphics (TOG), volume 24, pages 408–416. ACM, 2005.
[2] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image inpainting. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 417–424. ACM Press/Addison-Wesley Publishing Co., 2000.
[3] G. W. Braudaway, K. A. Magerlein, and F. C. Mintzer. Protecting publicly available images with a visible image watermark. In Electronic Imaging: Science & Technology, pages 126–133. International Society for Optics and Photonics, 1996.
[4] J. Canny. A computational approach to edge detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (6):679–698, 1986.
[5] M. Dashti, R. Safabakhsh, M. Pourfard, and M. Abdollahifard. Video logo removal using iterative subsequent matching. In AISP, 2015.
[6] Y. Hu and S. Kwong. Wavelet domain adaptive visible watermarking. Electronics Letters, 37(20):1219–1220, 2001.
[7] C.-H. Huang and J.-L. Wu. Attacking visible watermarking schemes. Multimedia, IEEE Transactions on, 6(1):16–30, 2004.
[8] M. S. Kankanhalli and K. Ramakrishnan. Adaptive visible watermarking of images. In Multimedia Computing and Systems, 1999. IEEE International Conference on, volume 1, pages 568–573. IEEE, 1999.
[9] A. Levin, D. Lischinski, and Y. Weiss. A closed-form solution to natural image matting. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(2):228–242, 2008.
[10] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014, pages 740–755. Springer, 2014.
[11] C. Liu. Beyond pixels: exploring new representations and applications for motion analysis. PhD thesis, MIT, 2009.
[12] A. Lumini and D. Maio. Adaptive positioning of a visible watermark in a digital image. In Multimedia and Expo, 2004. ICME’04. 2004 IEEE International Conference on, volume 2, pages 967–970. IEEE, 2004.
[13] J. Meng and S.-F. Chang. Embedding visible video watermarks in the compressed domain. In Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on, volume 1, pages 474–477. IEEE, 1998.
[14] S. P. Mohanty, K. R. Ramakrishnan, and M. S. Kankanhalli. A dct domain visible watermarking technique for images. In Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on, volume 2, pages 1029–1032. IEEE, 2000.
[15] S.-C. Pei and Y.-C. Zeng. A novel image recovery algorithm for visible watermarked images. Information Forensics and Security, IEEE Transactions on, 1(4):543–550, 2006.
[16] C. I. Podilchuk and E. J. Delp. Digital watermarking: algorithms and applications. Signal Processing Magazine, IEEE, 18(4):33–46, 2001.
[17] V. M. Potdar, S. Han, and E. Chang. A survey of digital image watermarking techniques. In Industrial Informatics, 2005. INDIN’05. 2005 3rd IEEE International Conference on, pages 709–716. IEEE, 2005.
[18] J. Wang and M. F. Cohen. Image and video matting: a survey. Now Publishers Inc, 2008.
[19] J. Wang, Q. Liu, L. Duan, H. Lu, and C. Xu. Automatic tv logo detection, tracking and removal in broadcast video. In ICMM, 2007.
[20] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. Image Processing, IEEE Transactions on, 13(4):600–612, 2004.
[21] W.-Q. Yan, J. Wang, and M. S. Kankanhalli. Automatic video logo detection and removal. Multimedia Systems, 2005.