MarkCleaner: High-Fidelity Watermark Removal via Imperceptible Micro-Geometric Perturbation

Xiaoxi Kong¹, Jieyu Yuan²*, Pengdi Chen¹, Yuanlin Zhang², Chongyi Li², Bin Li¹*†

¹Shenzhen University, ²Nankai University

*Corresponding Author, †Project Lead

TL;DR: MarkCleaner achieves high-fidelity watermark removal by exploiting the geometric vulnerability of semantic watermarks via micro-geometric perturbations and 2D Gaussian Splatting.

MarkCleaner removes watermarks without compromising visual fidelity.

(a) Trade-off between fidelity and erasure in existing methods.
(b) Our MarkCleaner offers a unified solution for both visible and invisible watermarks via mask-guided encoding and geometric perturbation.

Abstract

Semantic watermarks exhibit strong robustness against conventional image-space attacks. In this work, we show that such robustness does not survive under micro-geometric perturbations: spatial displacements can remove watermarks by breaking the phase alignment. Motivated by this observation, we introduce MarkCleaner, a watermark removal framework that avoids semantic drift caused by regeneration-based watermark removal.

Specifically, MarkCleaner is trained with micro-geometry-perturbed supervision, which encourages the model to separate semantic content from strict spatial alignment and enables robust reconstruction under subtle geometric displacements. The framework adopts a mask-guided encoder that learns explicit spatial representations and a 2D Gaussian Splatting–based decoder that explicitly parameterizes geometric perturbations while preserving semantic content. Extensive experiments demonstrate that MarkCleaner achieves superior performance in both watermark removal effectiveness and visual fidelity, while enabling efficient real-time inference.

Preliminary Analysis

We identify that semantic watermarks are fundamentally sensitive to geometric perturbation. Geometric transformations manifest as phase shifts in the frequency domain, which disrupt the precise phase alignment used by watermark detectors.
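For the simplest geometric transformation, a pure translation, this relationship is the standard Fourier shift theorem. It is stated here only as an illustrative special case; the symbols $f$, $g$, $F$, $G$, $\Delta x$, and $\Delta y$ are notation introduced for this note and are not taken from the paper.

```latex
g(x, y) = f(x - \Delta x,\; y - \Delta y)
\quad\Longrightarrow\quad
G(u, v) = F(u, v)\, e^{-j 2\pi (u\,\Delta x + v\,\Delta y)},
\qquad
|G(u, v)| = |F(u, v)|.
```

The amplitude spectrum is unchanged, while every frequency component acquires a phase offset proportional to the displacement. A detector that depends on exact phase alignment therefore sees a different signal even when the image looks the same.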

Panels: (a) Clean, (b) Watermarked, (c) Transformed.

Geometric transformation disrupts watermark-induced phase ripples while preserving amplitude structure, indicating watermark invalidation arises from phase modulation rather than content alteration.
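The amplitude-versus-phase behaviour described here can be checked numerically. The following is a minimal sketch, assuming a circular shift (`np.roll`) applied to a random array that stands in for an image; the function name `spectrum_after_shift` is illustrative and not part of MarkCleaner.

```python
import numpy as np

def spectrum_after_shift(image, dy, dx):
    """Return (max amplitude change, max phase change) caused by a circular shift."""
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    F = np.fft.fft2(image)
    G = np.fft.fft2(shifted)
    # Largest difference between the two amplitude spectra.
    amp_change = np.max(np.abs(np.abs(G) - np.abs(F)))
    # Per-frequency phase difference, wrapped to (-pi, pi].
    phase_change = np.max(np.abs(np.angle(G * np.conj(F))))
    return amp_change, phase_change

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # placeholder for a real image
amp_change, phase_change = spectrum_after_shift(image, dy=1, dx=2)
print(f"max amplitude change: {amp_change:.2e}")
print(f"max phase change (rad): {phase_change:.2f}")
```

In this idealized setting the amplitude change is at the level of floating-point error, while the phase change is large. For non-circular displacements the amplitude is preserved only approximately, because border pixels change.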

Methodology

MarkCleaner adopts a UNet-based encoder-decoder architecture. The core principle is to train the network to learn micro-geometric perturbations while maintaining visual consistency.

Mask-Guided Encoding: Disrupts potential watermark patterns in both frequency and spatial domains using dual-domain stochastic masking.
2D Gaussian Splatting (2DGS) Decoder: Explicitly parameterizes geometric perturbations using 2D Gaussian primitives to break spatial alignment while preserving content.
Content Alignment: Incorporates self-supervised visual features (DINOv2) to ensure semantic consistency under spatial displacement.
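The three components can be illustrated with a toy NumPy sketch. Everything in it is an assumption made for illustration: the drop rates, the use of isotropic Gaussians with normalized blending, the sub-pixel offset range, and the function names are hypothetical, and generic feature vectors stand in for DINOv2 features. It is not the paper's architecture or training code.

```python
import numpy as np

def dual_domain_mask(image, rng, spatial_drop=0.1, freq_drop=0.1):
    """Randomly zero out pixels, then randomly zero out Fourier coefficients."""
    spatial_keep = rng.random(image.shape) >= spatial_drop
    spectrum = np.fft.fft2(image * spatial_keep)
    freq_keep = rng.random(image.shape) >= freq_drop
    # The frequency mask is not conjugate-symmetric, so keep only the real part.
    return np.fft.ifft2(spectrum * freq_keep).real

def render_gaussians(centers, sigmas, colors, size):
    """Blend isotropic 2D Gaussians (normalized weights) on a size x size grid."""
    ys, xs = np.mgrid[0:size, 0:size]
    dist2 = ((ys[None] - centers[:, 0, None, None]) ** 2
             + (xs[None] - centers[:, 1, None, None]) ** 2)
    weights = np.exp(-dist2 / (2.0 * sigmas[:, None, None] ** 2))
    return (weights * colors[:, None, None]).sum(0) / (weights.sum(0) + 1e-8)

def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
size = 32

# Mask-guided encoding: perturb the input in both domains.
image = rng.random((size, size))
masked = dual_domain_mask(image, rng)

# 2DGS-style decoding: the same primitives, rendered with and without
# sub-pixel offsets on their centers.
centers = np.mgrid[2:size:4, 2:size:4].reshape(2, -1).T.astype(float)
sigmas = np.full(len(centers), 2.0)
colors = rng.random(len(centers))
base = render_gaussians(centers, sigmas, colors, size)
offsets = rng.uniform(-0.3, 0.3, size=centers.shape)
perturbed = render_gaussians(centers + offsets, sigmas, colors, size)
max_pixel_change = np.abs(base - perturbed).max()

# Content alignment: compare features of the two renders
# (flattened pixels stand in for a learned feature extractor).
feature_gap = cosine_distance(base.ravel(), perturbed.ravel())
print(max_pixel_change, feature_gap)
```

The point of the explicit parameterization is visible even in this toy: displacing the Gaussian centers changes where content lands on the pixel grid without changing what is drawn, so the feature gap between the two renders stays small.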

Quantitative Comparison

We evaluate MarkCleaner against 14 representative watermark removal approaches across 12 watermarking schemes. Results are shown as TPR@1%FPR / ACC. Lower values indicate more effective removal.

Attack Type	DwtDct	SSL	StegaStamp	Stable Signature	VINE	WOFA	Gaussian Shading	T2SMark	Tree-Ring	RingID	HSTR	HSQR	mTPR(↓) / mACC(↓)
None	.800/.888	1.0/1.0	1.0/.999	1.0/.993	1.0/.999	.977/.832	1.0/1.0	1.0/1.0	.943/.970	1.0/1.0	1.0/1.0	1.0/1.0	.977/.473
JPEG	.003/.490	.243/.680	1.0/.997	.782/.695	1.0/.989	.040/.458	1.0/1.0	1.0/.998	.083/.945	1.0/1.0	.980/.988	1.0/1.0	.678/.353
Crop&Scale	.007/.521	.927/.875	.033/.554	.999/.978	.013/.505	.800/.631	.003/.501	.000/.501	.000/.842	.010/.582	.070/.597	.153/.702	.251/.154
Blur	.357/.682	.987/.981	1.0/1.0	.818/.817	1.0/.998	.693/.707	1.0/1.0	1.0/1.0	.593/.965	1.0/1.0	1.0/1.0	1.0/1.0	.871/.429
Noise	.000/.505	.010/.508	.987/.872	.000/.543	1.0/.901	.069/.439	1.0/.991	.987/.951	.000/.925	.993/1.0	.103/.578	.987/.990	.511/.267
Rotation	.000/.522	.973/.918	.000/.510	.998/.813	.010/.497	.987/.845	.007/.539	.000/.499	.000/.850	1.0/1.0	.103/.578	.060/.613	.251/.154
Translation	.003/.501	.980/.923	.255/.587	1.0/.991	.013/.504	.822/.963	.027/.568	.000/.501	.000/.888	.013/.484	.087/.585	.057/.563	.271/.172
VAE-B (ICLR, '18)	.000/.499	.760/.818	1.0/.999	.730/.680	1.0/.990	.157/.439	1.0/1.0	1.0/1.0	.190/.957	1.0/1.0	1.0/1.0	1.0/1.0	.686/.365
VAE-C (CVPR, '20)	.000/.496	.410/.720	1.0/.998	.582/.652	.997/.959	.173/.440	1.0/1.0	1.0/1.0	.123/.947	1.0/1.0	.990/.993	1.0/1.0	.606/.351
DA (ICLR, '24)	.000/.494	.003/.510	.043/.531	.625/.463	.450/.609	.477/.479	1.0/.987	.993/.939	.000/.880	.927/.996	.800/.828	1.0/.998	.526/.226
CtrlRegen+ ('25)	.003/.494	.030/.557	.197/.603	.023/.479	.833/.671	.110/.416	1.0/.999	1.0/.987	.001/.885	.980/.999	.437/.890	1.0/.998	.468/.248
UnMarker ('25)	.010/.538	.973/.918	.990/.946	.999/.981	.007/.502	.997/.917	1.0/1.0	.017/.502	.010/.948	.460/.939	.097/.693	.033/.575	.466/.288
IRA (CVPR, '25)	.000/.487	.680/.759	.990/.905	.625/.508	1.0/.924	.100/.418	1.0/1.0	1.0/1.0	.800/.975	1.0/1.0	.990/.995	1.0/1.0	.682/.331
NFPA (NeurIPS, '25)	.000/.500	.030/.565	.013/.481	.043/.494	.013/.502	.087/.387	.003/.514	.000/.499	.003/.913	.030/.672	.153/.645	.367/.795	.061/.154
Ours	.003/.500	.000/.504	.010/.488	.000/.446	.063/.554	.002/.494	.003/.554	.003/.500	.001/.795	.003/.802	.051/.630	.022/.766	.014/.094
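For readers unfamiliar with the metric, TPR@1%FPR is typically computed by calibrating a detection threshold on non-watermarked images and then measuring how many attacked watermarked images still exceed it. The sketch below assumes scalar detector scores where higher means "more likely watermarked", and assumes ACC denotes bit accuracy (the page does not define it); the function names and sample values are illustrative only.

```python
import numpy as np

def tpr_at_fpr(clean_scores, attacked_scores, fpr=0.01):
    """Fraction of attacked watermarked images still detected at the given FPR."""
    # Threshold = (1 - fpr) quantile of scores on non-watermarked images,
    # so roughly `fpr` of clean images are flagged by mistake.
    threshold = np.quantile(clean_scores, 1.0 - fpr)
    return float(np.mean(np.asarray(attacked_scores) > threshold))

def bit_accuracy(decoded_bits, true_bits):
    """Fraction of message bits recovered correctly (0.5 is chance level)."""
    return float(np.mean(np.asarray(decoded_bits) == np.asarray(true_bits)))

# Made-up scores, purely to show the call pattern.
clean_scores = np.arange(100) / 100.0
attacked_scores = np.array([0.2, 0.5, 1.5, 2.0])
tpr = tpr_at_fpr(clean_scores, attacked_scores)
acc = bit_accuracy([1, 0, 1, 1], [1, 1, 1, 1])
print(tpr, acc)
```

Under this reading, a successful removal drives TPR toward 0 and bit accuracy toward the 0.5 chance level.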

As the table shows, MarkCleaner attains the lowest mean scores among all compared attacks (mTPR/mACC of .014/.094, versus .061/.154 for NFPA, the next-best by mTPR), although it is not the lowest on every individual scheme.

Qualitative Results

Traditional pixel-space distortions severely degrade visual quality, while generation-based methods tend to remove watermarks at the cost of semantic drift. Our MarkCleaner achieves more effective watermark suppression with better visual fidelity.

Comparison across different removal strategies. MarkCleaner preserves fine details without semantic drift.

Additional Results (Appendix)

We provide additional qualitative comparisons on various semantic watermarking schemes, including Tree-Ring, RingID, HSTR, and HSQR.

Tree-Ring Watermark Removal (A)

Tree-Ring Watermark Removal (B)

RingID Watermark Removal (A)

RingID Watermark Removal (B)

HSTR Watermark Removal (A)

HSTR Watermark Removal (B)

HSQR Watermark Removal (A)

HSQR Watermark Removal (B)

Ablation Study

Visualization of ablation studies. Red boxes highlight subtle geometric shifts, while yellow boxes highlight structural preservation.

We analyze the contribution of each component: Mask-Guided Encoder (ME), Gaussian Rendering (GR), Content Alignment (CA), and Geometric Attacks (GA).

Module	TPR(↓)	FID(↓)	CLIP(↑)	LPIPS(↓)
w/o ME	0.2467	145.145	0.3154	0.6846
w/o GR	0.1367	146.604	0.3163	0.6837
w/o CA	0.1567	149.416	0.2528	0.7064
w/o GA	1.0000	131.487	0.3324	0.6676
Full Model	0.0001	133.196	0.3705	0.6295

BibTeX

@misc{kong2026markcleanerhighfidelitywatermarkremoval,
  title={MarkCleaner: High-Fidelity Watermark Removal via Imperceptible Micro-Geometric Perturbation},
  author={Xiaoxi Kong and Jieyu Yuan and Pengdi Chen and Yuanlin Zhang and Chongyi Li and Bin Li},
  year={2026},
  eprint={2602.01513},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2602.01513}
}