MeVer GAN Detector v1.0

Contact

Olga Papadopoulou {olgapapa@iti.gr}

Organization

CERTH

Short Description

This model detects images generated by the StyleGAN2 architecture, as well as images from similar GAN architectures. It is based on a ResNet-50 pretrained on ImageNet. The model is trained with the AdamW optimizer, a learning rate of 10⁻³, and a step scheduler applied every 5 epochs. Weight decay with a factor of 5·10⁻⁵ is also applied. A drop path rate of 0.1, which randomly drops entire paths (i.e., sequences of layers) in the model during training, is employed to prevent overfitting.
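
For readers who want to reproduce this setup, a minimal PyTorch sketch of the described training configuration follows; the timm model name, the single-logit binary head, the scheduler decay factor, and the "every 5 epochs" reading of the step scheduler are assumptions, not part of the original description.

```python
# Minimal sketch of the described training setup (assumptions noted inline).
import timm
import torch

# ResNet-50 pretrained on ImageNet with a drop path rate of 0.1.
# The single-logit binary head is an assumption; the original head is not specified.
model = timm.create_model("resnet50", pretrained=True, num_classes=1, drop_path_rate=0.1)

# AdamW with lr 1e-3 and weight decay 5e-5, as described.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-5)

# "Step scheduler with 5 epochs" is read here as decaying the lr every 5 epochs
# (decay factor 0.1 is an assumption).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

criterion = torch.nn.BCEWithLogitsLoss()

def training_step(images, labels):
    """One optimization step: images (N, 3, H, W), labels (N,) in {0, 1}."""
    optimizer.zero_grad()
    logits = model(images).squeeze(1)
    loss = criterion(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```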

Training Set

The training set contains 35,000 real images from FFHQ (human faces) and 35,000 images generated by the corresponding StyleGAN2 pretrained NVIDIA model (selected with the GIQA method). The model was trained with strong augmentation schemes in order to generalize well to different semantic domains (e.g., AFHQ).
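
The exact augmentation pipeline is not specified here; the sketch below illustrates a typical "strong augmentation" recipe (JPEG re-compression, random cropping, flipping, blurring) of the kind also mentioned for the RINE models further down. The specific transforms and parameters are assumptions.

```python
# Illustrative "strong augmentation" pipeline; transforms and parameters are assumptions.
import io
import random
from PIL import Image
from torchvision import transforms

def random_jpeg(img: Image.Image, quality_range=(30, 95)) -> Image.Image:
    """Re-encode the image as JPEG at a random quality to simulate compression artifacts."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(*quality_range))
    buf.seek(0)
    return Image.open(buf).convert("RGB")

train_transform = transforms.Compose([
    transforms.Lambda(random_jpeg),                                   # random JPEG compression
    transforms.RandomResizedCrop(224),                                # random cropping + resize
    transforms.RandomHorizontalFlip(),                                # horizontal flip
    transforms.RandomApply([transforms.GaussianBlur(3, (0.1, 2.0))], p=0.5),  # blurring
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),                  # ImageNet statistics
])
```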

Evaluation Set

The evaluation set contains real and generated images (from the corresponding StyleGAN2 models) from the domains FFHQ (human faces), AFHQ (animal faces), and CelebA (human faces).

Version

v1.0, created 15/06/2023

Related Paper

Dogoulis, P., Kordopatis-Zilos, G., Kompatsiaris, I., & Papadopoulos, S. (2023, June). Improving Synthetically Generated Image Detection in Cross-Concept Settings. In Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation (pp. 28-35).

MeVer LD Detector v1.0

Contact

Olga Papadopoulou {olgapapa@iti.gr}

Organization

CERTH

Short Description

This model detects images generated by the Latent Diffusion architecture, as well as images from similar Diffusion architectures. It is based on a ResNet-50 pretrained on ImageNet. The model is trained with the AdamW optimizer, a learning rate of 10⁻³, and a step scheduler applied every 5 epochs (the same training recipe as the MeVer GAN Detector above). Weight decay with a factor of 5·10⁻⁵ is also applied. A drop path rate of 0.1, which randomly drops entire paths (i.e., sequences of layers) in the model during training, is employed to prevent overfitting.

Training Set

The training set covers two domains, LSUN Churches (church images) and FFHQ (human faces). Each domain contains 10,000 real images and 10,000 images generated by pretrained Stable Diffusion models. The model was trained with strong augmentation schemes in order to generalize well to different semantic domains (e.g., LSUN Beds).

Evaluation Set

The evaluation set contains real and generated images (from the corresponding Diffusion models) from the domains: FFHQ (human faces), AFHQ (animal faces), LSUN Churches (images of churches) and LSUN Beds (images of beds).

Version

v1.0, created 15/06/2023

Related Paper

Dogoulis, P., Kordopatis-Zilos, G., Kompatsiaris, I., & Papadopoulos, S. (2023, June). Improving Synthetically Generated Image Detection in Cross-Concept Settings. In Proceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation (pp. 28-35).

UNINA SD Detector v1.0

Contact

Luisa Verdoliva {verdoliv@unina.it}

Organization

UNINA

Short Description

This model detects fully synthetic images generated by Latent Diffusion models or similar architectures. Its architecture is a variant of ResNet-50, pretrained on ImageNet, in which the stride is removed from the stem. The model is trained using the Adam optimizer. The learning rate starts at 10⁻⁴ and is reduced by a factor of 10 whenever the accuracy on the validation set does not improve for 5 consecutive epochs. Training is halted once the learning rate drops below 10⁻⁶.
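
A minimal PyTorch sketch of this configuration is given below; using torchvision's ResNet-50, removing the stride only from the first convolution, and driving the decay with ReduceLROnPlateau on validation accuracy are assumptions about what the description means, not the authors' code.

```python
# Sketch of the described setup; the stem modification and scheduler choice are assumptions.
import torch
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
model.conv1.stride = (1, 1)                           # remove the stride in the stem (assumed reading)
model.fc = torch.nn.Linear(model.fc.in_features, 1)   # binary real-vs-synthetic head (assumed)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Reduce the lr by a factor of 10 when validation accuracy stalls for 5 consecutive epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1, patience=5)

def end_of_epoch(val_accuracy: float) -> bool:
    """Step the scheduler on validation accuracy; return True when training should stop."""
    scheduler.step(val_accuracy)
    current_lr = optimizer.param_groups[0]["lr"]
    return current_lr < 1e-6                          # halt once the lr drops below 1e-6
```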

Training Set

The training set consists of 200K real images and 200K synthetic images. The real images are sourced from the COCO training set and LSUN, while the synthetic images are generated using 5 different Latent Diffusion models. For text-to-image generation, language prompts from the COCO training set are used. 10% of this set is reserved for validation.

Evaluation Set

The evaluation is carried out on images generated by several state-of-the-art generative models, including GANs, transformers, and diffusion models: ProGAN, StyleGAN2, StyleGAN3, BigGAN, EG3D, Taming Transformer, DALL·E Mini, DALL·E 2, GLIDE, Latent Diffusion, Stable Diffusion, and ADM (Ablated Diffusion Model). For text-to-image data, language prompts from the COCO validation set are used, while real data come from the COCO validation set, ImageNet, and UCID. In both cases, tests are performed on 1,000 synthetic images per model and 5,000 real images. The approach achieves a high AUC (99%) on Latent Diffusion models and an average AUC of 89% across all generators.
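
The per-generator AUC reported above can be computed from detector scores as in the sketch below; the score arrays and the label convention (1 = synthetic) are placeholders, not part of the original evaluation code.

```python
# Illustrative per-generator AUC computation; score arrays and labels are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

def generator_auc(real_scores: np.ndarray, fake_scores: np.ndarray) -> float:
    """AUC of separating one generator's images (label 1) from the real images (label 0)."""
    labels = np.concatenate([np.zeros_like(real_scores), np.ones_like(fake_scores)])
    scores = np.concatenate([real_scores, fake_scores])
    return roc_auc_score(labels, scores)

# Example: 5000 real scores reused against each generator's 1000 synthetic scores.
# aucs = {name: generator_auc(real_scores, fake_scores[name]) for name in fake_scores}
# print(np.mean(list(aucs.values())))   # average AUC across generators
```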

Version

v1.0, created 29/01/2023

Related Paper

Corvi, R., Cozzolino, D., Zingarini, G., Poggi, G., Nagano, K., & Verdoliva, L. (2023, June). On the detection of synthetic images generated by diffusion models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5).

MeVer RINE [LDM] v1.0

Contact

Christos Koutlis {ckoutlis@iti.gr}

Organization

CERTH

Short Description

This model is based on features extracted from intermediate blocks of the frozen CLIP image encoder. Linear projection layers and a Trainable Importance Estimator module are learned on top of the extracted features in order to construct a forgery-aware vector space. The model is trained on LDM data for five epochs with batch size 128 and learning rate 1e-3, using standard augmentations (blurring, JPEG compression, random cropping, random horizontal flipping). The objective function combines cross-entropy and contrastive losses. The model can recognize images generated by Diffusion models such as Midjourney, DALL-E 2, and Firefly, and it also performs well on GAN-generated content.
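
A minimal sketch of extracting intermediate-block features from a frozen CLIP image encoder is shown below, using the Hugging Face transformers implementation; the checkpoint name, the projection dimension, and the simple learned per-block weighting standing in for the Trainable Importance Estimator are assumptions, not the authors' implementation.

```python
# Sketch: intermediate CLIP features with a simple learned block weighting.
# Checkpoint and weighting module are assumptions, not the original implementation.
import torch
from transformers import CLIPVisionModel, CLIPImageProcessor

encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")  # prepares pixel_values
encoder.eval()
for p in encoder.parameters():        # the CLIP encoder stays frozen
    p.requires_grad_(False)

class WeightedBlockHead(torch.nn.Module):
    """Project each block's [CLS] feature and combine them with learned importance weights."""
    def __init__(self, num_blocks: int, dim: int, proj_dim: int = 128):
        super().__init__()
        self.proj = torch.nn.ModuleList(
            [torch.nn.Linear(dim, proj_dim) for _ in range(num_blocks)])
        self.importance = torch.nn.Parameter(torch.zeros(num_blocks))  # stand-in for the estimator
        self.classifier = torch.nn.Linear(proj_dim, 1)

    def forward(self, hidden_states):
        # hidden_states: tuple of (N, tokens, dim) tensors, one per transformer block.
        feats = torch.stack(
            [proj(h[:, 0]) for proj, h in zip(self.proj, hidden_states)], dim=1)
        weights = torch.softmax(self.importance, dim=0)
        pooled = (weights[None, :, None] * feats).sum(dim=1)
        return self.classifier(pooled).squeeze(1)

head = WeightedBlockHead(num_blocks=encoder.config.num_hidden_layers,
                         dim=encoder.config.hidden_size)

def forward(pixel_values):
    out = encoder(pixel_values=pixel_values, output_hidden_states=True)
    # Drop the embedding output; keep one hidden state per transformer block.
    return head(out.hidden_states[1:])
```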

Training Set

The training set of this model contains 200K images generated by the Latent Diffusion Model and 200K real images from the COCO and LSUN datasets. Specifically, it is the set provided by https://arxiv.org/abs/2211.00680 at https://github.com/grip-unina/DMimageDetection.

Evaluation Set

The evaluation set is Synthbuster. It contains images generated by Midjourney, Firefly, Glide, Stable Diffusion, DALL-E 2, and DALL-E 3. An average accuracy of 87.5% and an average AP of 90.5% are reported. Note that this model fails to detect DALL-E 3 images.

Version

v1.0, created Feb 2024

Related Paper

Koutlis, C., & Papadopoulos, S. (2024). Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection. arXiv preprint arXiv:2402.19091. https://arxiv.org/abs/2402.19091

MeVer RINE [ProGAN] v1.0

Contact

Christos Koutlis {ckoutlis@iti.gr}

Organization

CERTH

Short Description

This model is based on features extracted from intermediate blocks of the frozen CLIP image encoder. Linear projection layers and a Trainable Importance Estimator module are learned on top of the extracted features in order to construct a forgery-aware vector space. The model is trained on ProGAN data for only one epoch with batch size 128 and learning rate 1e-3, using standard augmentations (blurring, JPEG compression, random cropping, random horizontal flipping). The objective function combines cross-entropy and contrastive losses. The model can recognize images generated by GANs (e.g., ProGAN, StyleGAN, BigGAN) as well as by Diffusion models (e.g., DALL-E, Latent Diffusion). However, its performance degrades on images from Diffusion-based commercial tools such as Midjourney, DALL-E 2, DALL-E 3, and Firefly.
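
Both RINE models combine a cross-entropy term with a contrastive term. The sketch below shows one common way to implement such a combination (binary cross-entropy plus a supervised contrastive loss over the embedding batch); the exact contrastive formulation, the temperature, and the 0.5 weighting are assumptions, not the paper's definition.

```python
# Illustrative combination of cross-entropy and a supervised contrastive loss.
# The contrastive formulation, temperature, and weighting are assumptions.
import torch
import torch.nn.functional as F

def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Pull same-label embeddings together and push different-label ones apart."""
    z = F.normalize(embeddings, dim=1)                     # (N, D), unit norm
    sim = z @ z.t() / temperature                          # (N, N) cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # Log-softmax over all other samples, then average over each anchor's positives.
    logits = sim.masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts
    return loss[pos_mask.any(dim=1)].mean()                # ignore anchors with no positives

def combined_loss(logits, embeddings, labels, contrastive_weight=0.5):
    ce = F.binary_cross_entropy_with_logits(logits, labels.float())
    con = supervised_contrastive(embeddings, labels)
    return ce + contrastive_weight * con
```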

Training Set

The training set is part of the dataset used in Wang et al. (2020) (Wang, S. Y., Wang, O., Zhang, R., Owens, A., & Efros, A. A. (2020). CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8695-8704), https://arxiv.org/abs/1912.11035). It consists of 72K real images and 72K synthetic images. The real images are sourced from the LSUN dataset, while the synthetic images are generated using ProGAN. Four ProGAN models, each trained on a different LSUN object category (car, cat, chair, horse), are used, with 36K images per category (real and synthetic combined), giving 144K training images in total.

Evaluation Set

There are 20 evaluation sets. Specifically, the synthetic samples derive from ProGAN, StyleGAN, StyleGAN2, BigGAN, CycleGAN, StarGAN, GauGAN, DeepFake, SITD, SAN, CRN, IMLE, Guided, LDM (3 variants), Glide (3 variants), and DALL-E. An average accuracy of 91.5% and an average AP of 98.8% are reported.

Version

v1.0, created Feb 2024

Related Paper

Koutlis, C., & Papadopoulos, S. (2024). Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection. arXiv preprint arXiv:2402.19091. https://arxiv.org/abs/2402.19091

Synthetic Image Detection

Use the text field below to insert the link (URL) to the image you want to check, or upload an image from your computer. Our synthetic image detector will process the media and return the probability that it was generated by a neural network, i.e., a Generative Adversarial Network (GAN) or a Diffusion-based model.
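
As a rough illustration of what "return the probability" means for the detectors described above, the sketch below runs a trained binary detector on a single image and converts its logit to a probability; the preprocessing pipeline and the way the model is loaded are placeholders, not the deployed service.

```python
# Illustration only: turn a trained detector's output into a "probability synthetic".
# Preprocessing and model loading are placeholders, not the deployed service.
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def probability_synthetic(model: torch.nn.Module, image_path: str) -> float:
    """Return the detector's estimated probability that the image is synthetic."""
    model.eval()
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)               # (1, 3, 224, 224)
    with torch.no_grad():
        logit = model(x).squeeze()                 # single-logit binary detector
    return torch.sigmoid(logit).item()
```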

Contact: {olgapapa, gpan, papadop}@iti.gr

For each analyzed image, the results table reports a probability and a comment per detector: GAN-based, CERTH Diffusion-based, UNINA Diffusion-based, LDM-RINE, and PROGAN-RINE.