Tampered image detection

The spread of misinformation is not limited to misleading posts. Often, images are tampered with to provide “proof” for false claims. The field of image forensics offers a number of algorithms for analyzing such images. However, the outputs of these algorithms are often hard to interpret.

Challenge Description

The goal of an automatic tampered image detection system is to provide the user with a final estimate of whether the image is tampered or not, based on the outputs produced by a number of forensic algorithms.

A dataset of tampered and untampered images is provided:

Training set: 719 tampered, 719 untampered. The images are annotated with 0 meaning ‘untampered’ and 1 meaning ‘tampered’.
Test set: 128 tampered, 128 untampered, similarly annotated.

For each image, the outputs (also called ‘maps’) of 8 tampering detection algorithms are provided (‘ADQ1’, ‘BLK’, ‘CAGI’, ‘CFA1’, ‘DCT’, ‘NOI1’, ‘NOI3’, ‘SCNN’) [1], both in grayscale format (‘./ForensicResults_PNG/’) and colorized with the ‘jet’ colormap (‘./ForensicResults_PNG_jet/’). For tampered images in the training set, the ground truth mask is also provided.
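As a starting point for locating the maps of one image, a minimal sketch is given below. The sub-folder layout (one folder per algorithm, one PNG per image) and the identifier ‘example_image’ are assumptions made for illustration; check the extracted archive and adjust the path construction accordingly.

```python
from pathlib import Path

# The eight forensic algorithms named in the challenge description.
ALGORITHMS = ["ADQ1", "BLK", "CAGI", "CFA1", "DCT", "NOI1", "NOI3", "SCNN"]


def map_paths(image_id, root=".", colorized=False):
    """Return {algorithm: Path} for one image.

    ASSUMPTION: maps are stored as <root>/<results dir>/<ALGORITHM>/<image_id>.png.
    Adjust the join below to match the actual layout of the provided data.
    """
    results_dir = "ForensicResults_PNG_jet" if colorized else "ForensicResults_PNG"
    base = Path(root) / results_dir
    return {alg: base / alg / f"{image_id}.png" for alg in ALGORITHMS}


paths = map_paths("example_image")  # "example_image" is a made-up identifier
for alg, path in paths.items():
    print(alg, path)
```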

Baseline approach

The current baseline approach takes the grayscale maps, extracts a feature vector based on statistical information about the connected components plus image moments, and trains a Random Forest classifier using that information. The performance of this approach is not satisfactory (accuracy: 70-75%).
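To make the kind of features described above concrete, here is a rough sketch that computes connected-component statistics and intensity-weighted image moments for a single grayscale map. It is not the code in ‘01_extract_features.py’: the threshold, the chosen statistics and the function names are assumptions, and NumPy is assumed to be installed. The labelling is written out by hand for clarity; a library routine such as scipy.ndimage.label would be faster on full-size maps.

```python
import numpy as np


def connected_components(mask):
    """Return the pixel counts of the 4-connected regions of a boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    sizes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Iterative flood fill from an unvisited foreground pixel.
        stack = [(y0, x0)]
        seen[y0, x0] = True
        size = 0
        while stack:
            y, x = stack.pop()
            size += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    stack.append((ny, nx))
        sizes.append(size)
    return sizes


def map_features(gray, threshold=0.5):
    """Feature vector for one grayscale map scaled to [0, 1].

    ASSUMPTION: the threshold and the selected statistics are illustrative only.
    """
    gray = np.asarray(gray, dtype=float)
    sizes = connected_components(gray > threshold)
    area = gray.size
    largest = max(sizes) / area if sizes else 0.0
    mean_size = float(np.mean(sizes)) / area if sizes else 0.0

    # Intensity-weighted moments: total mass, centroid, central second moments.
    ys, xs = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]
    m00 = gray.sum()
    if m00 > 0:
        cy, cx = (ys * gray).sum() / m00, (xs * gray).sum() / m00
        mu20 = (((xs - cx) ** 2) * gray).sum() / m00
        mu02 = (((ys - cy) ** 2) * gray).sum() / m00
        mu11 = ((xs - cx) * (ys - cy) * gray).sum() / m00
    else:
        cy = cx = mu20 = mu02 = mu11 = 0.0
    return np.array([len(sizes), largest, mean_size, m00 / area, cx, cy, mu20, mu02, mu11])


# Toy 6x6 map with two bright blobs (synthetic data, not from the dataset).
toy = np.zeros((6, 6))
toy[0:2, 0:2] = 1.0
toy[4:6, 3:6] = 1.0
features = map_features(toy)
print(features)
```

Under this sketch, the vectors of the 8 maps of an image would be concatenated into one row of the feature matrix passed to the classifier.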

Provided code:

Two Python scripts are provided. ‘01_extract_features.py’ performs feature extraction and saves the extracted features to disk. ‘02_train_test.py’ loads the saved features, performs 10-fold cross-validation of the classifier on the training set, then trains the classifier on the entire training set and applies it to the test set. The results are written to an output file and evaluated.
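The 10-fold cross-validation step can be pictured with the following sketch, which only builds the train/validation index splits. Whether the provided script stratifies its folds is not stated, so the stratification here is an assumption, as are the function name and the seed. In practice a library routine such as scikit-learn's StratifiedKFold would normally be used instead.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Yield (train_idx, val_idx) pairs with the class balance preserved per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    for k in range(n_folds):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val


# Labels shaped like the training set: 719 untampered (0) and 719 tampered (1).
labels = [0] * 719 + [1] * 719
splits = list(stratified_folds(labels))
for k, (train, val) in enumerate(splits):
    tampered = sum(labels[i] for i in val)
    print(f"fold {k}: {len(train)} train, {len(val)} validation ({tampered} tampered)")
```

A classifier is then fitted on each ‘train’ index set and scored on the matching ‘val’ set, and the scores are averaged over the folds.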

Input

Inputs can be derived from the file structure and the provided file lists (‘train’ and ‘test’).

Output

The output is a list of values, one for each entry of the ‘test’ file list and in the same order. ‘02_train_test.py’ already provides a sample of how to create the output file.
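For orientation only, the sketch below writes one prediction per line in the order of the ‘test’ file list. The one-value-per-line layout, the file name ‘results.txt’ and the function name are assumptions; the authoritative format is whatever ‘02_train_test.py’ produces.

```python
import os
import tempfile


def write_predictions(predictions, out_path):
    """Write one 0/1 prediction per line, in the order of the 'test' file list.

    ASSUMPTION: the one-value-per-line layout is illustrative; follow the
    format produced by '02_train_test.py' for an actual submission.
    """
    with open(out_path, "w", encoding="utf-8") as handle:
        for value in predictions:
            if value not in (0, 1):
                raise ValueError(f"expected 0 or 1, got {value!r}")
            handle.write(f"{int(value)}\n")


# Hypothetical predictions for the first five test images.
predictions = [1, 0, 0, 1, 1]
out_path = os.path.join(tempfile.mkdtemp(), "results.txt")  # made-up file name
write_predictions(predictions, out_path)

with open(out_path, encoding="utf-8") as handle:
    written = handle.read().split()
print(written)
```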

Upload

On the right side of the webpage there is the Hackathonist Details field. Fill in your first name, last name and email.
Upload your zip file.
Click the Submit button.

Leaderboard

Accept the challenge and achieve better results!

Team name	Run	Precision	Recall	F-score
MKLab	Baseline	0.73	0.71	0.70

Best training set cross-validation accuracy: 0.70-0.75.
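The leaderboard columns can be reproduced locally with a few lines of code. The sketch below assumes that ‘tampered’ (1) is the positive class and computes the metrics for that class only; the exact averaging convention used by the leaderboard is not stated, so treat the numbers as a sanity check rather than the official score. The labels in the example are made up.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-score for the class given by `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels: 4 tampered and 4 untampered images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f-score={f1:.2f}")
```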

References:

[1] Zampoglou, M., Papadopoulos, S., & Kompatsiaris, Y. (2017). Large-scale evaluation of splicing localization algorithms for web images. Multimedia Tools and Applications, 76(4), 4801-4834.

Contact:

Markos Zampoglou: markzampoglou@iti.gr

Symeon Papadopoulos: papadop@iti.gr

Download Code Files

Download TampImg.zip to get the provided input data and code.
