← 返回大厅
arXiv (CS.CV) 2026-06-16 12:00 DOI: arXiv:2606.14787

Vision-Encoder Behavioral Fingerprints of Image-to-Image Generative Models: A Training-Paradigm-Driven Taxonomy of Six Commercial APIs

作者:

摘要 / Abstract

We study six production image-to-image AI systems (gpt-image-1, Gemini 2.5 Flash Image, Flux Kontext, SDXL img2img, SD3 img2img, and Qwen Image Edit) under a content-adaptive sub-JND adversarial perturbation pipeline, scoring all outputs by frozen DINOv2 ViT-B/14 token distances against clean references. Across a 3,588-call corpus spanning COCO photographs, CelebA-HQ portraits, and AI-generated inputs, the six systems partition into two image-invariant behavioral bands on a 2D (patch_mean, ssim_clean) plane: edit-trained models (Flux Kontext, Qwen Edit, Gemini) cluster in a tight band, while T2I-base models adapted at sampling time (SDXL, SD3, gpt-image-1) cluster in a drift band.

同行评议区

登录学者账户后即可在此处发表评述或点赞。

立即登录

暂无评议记录。