Tampered image detection
The spread of misinformation is not limited to misleading posts. Often, images are tampered with to provide “proof” for false claims. The field of image forensics offers a number of image analysis algorithms, but their outputs are notoriously hard to interpret.
The goal of an automatic tampered image detection system is to provide the user with a final estimate of whether the image is tampered or not, based on the outputs produced by a number of forensic algorithms.
A dataset of tampered and untampered images is provided:
- Training set: 719 tampered, 719 untampered. The images are annotated with 0 meaning ‘untampered’ and 1 meaning ‘tampered’.
- Test set: 128 tampered, 128 untampered, similarly annotated.
For each image, the outputs of 8 tampering-detection algorithms (also called ‘maps’) are provided (‘ADQ1’, ‘BLK’, ‘CAGI’, ‘CFA1’, ‘DCT’, ‘NOI1’, ‘NOI3’, ‘SCNN’), both in grayscale format (‘./ForensicResults_PNG/’) and colorized with the ‘jet’ colormap (‘./ForensicResults_PNG_jet/’). For tampered images in the training set, the ground-truth mask is also provided.
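For illustration, the 8 grayscale maps for a single image might be loaded as follows. Note that the per-algorithm subfolder layout and the ‘image_id.png’ naming used here are assumptions; adapt them to the actual structure under ‘./ForensicResults_PNG/’.

```python
import numpy as np
from pathlib import Path
from PIL import Image

MAP_NAMES = ["ADQ1", "BLK", "CAGI", "CFA1", "DCT", "NOI1", "NOI3", "SCNN"]

def load_maps(image_id, root="./ForensicResults_PNG"):
    """Load the 8 grayscale forensic maps for one image as float arrays in [0, 1].

    Assumes one subfolder per algorithm with files named '<image_id>.png';
    check the provided file lists for the real layout.
    """
    maps = {}
    for name in MAP_NAMES:
        path = Path(root) / name / f"{image_id}.png"
        maps[name] = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    return maps
```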
The current baseline approach takes the grayscale maps, extracts a feature vector based on statistics of the connected components plus image moments, and trains a Random Forest classifier on that information. The performance of this approach is unsatisfactory (accuracy: 70-75%).
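As a sketch of the kind of per-map features this describes (the exact feature set in ‘01_extract_features.py’ may differ), one could combine connected-component statistics of the thresholded map with low-order moments of the grayscale map:

```python
import numpy as np
from scipy import ndimage

def map_features(m, thresh=0.5):
    """Illustrative feature vector for one forensic map: connected-component
    statistics of the thresholded map plus low-order image moments.
    The threshold and feature choices are placeholders, not the baseline's.
    """
    binary = m > thresh
    labels, n = ndimage.label(binary)
    sizes = ndimage.sum(binary, labels, range(1, n + 1)) if n else np.array([0.0])
    # Centroid and second-order central moments of the grayscale map
    ys, xs = np.mgrid[0:m.shape[0], 0:m.shape[1]]
    total = m.sum() + 1e-9
    cy, cx = (m * ys).sum() / total, (m * xs).sum() / total
    mu20 = (m * (ys - cy) ** 2).sum() / total
    mu02 = (m * (xs - cx) ** 2).sum() / total
    mu11 = (m * (ys - cy) * (xs - cx)).sum() / total
    return np.array([n, sizes.mean(), sizes.std(), sizes.max(),
                     m.mean(), m.std(), mu20, mu02, mu11])
```

Concatenating this vector across the 8 maps yields one fixed-length feature vector per image for the classifier.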
Two Python scripts are provided: ‘01_extract_features.py’, which performs feature extraction and saves the extracted features to disk, and ‘02_train_test.py’, which loads the saved features, performs 10-fold cross-validation of the classifier on the training set, then trains the classifier on the entire training set and applies it to the test set. The results are written to an output file and evaluated.
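The train/test workflow of ‘02_train_test.py’ can be sketched as follows. The function name and the Random Forest hyperparameters are placeholders, not the script's actual settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def train_and_predict(X_train, y_train, X_test, n_folds=10, seed=0):
    """10-fold cross-validation on the training set, then fit on the whole
    training set and predict test labels (0 = untampered, 1 = tampered)."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    cv_acc = cross_val_score(clf, X_train, y_train, cv=n_folds,
                             scoring="accuracy")
    clf.fit(X_train, y_train)
    return cv_acc.mean(), clf.predict(X_test)
```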
Inputs can be derived from the file structure and the provided file lists (‘train’ and ‘test’).
The output is a list of values corresponding one-to-one to the file list of ‘test’. ‘02_train_test.py’ already shows how to create the output file.
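A minimal sketch of writing such an output file is shown below. The space-separated ‘filename label’ layout is an assumption; the authoritative format is the sample produced by ‘02_train_test.py’.

```python
def write_predictions(file_list, preds, out_path="results.txt"):
    """Write one 'filename label' line per test image, in file-list order.

    'results.txt' and the line format are illustrative; match the format
    that 02_train_test.py produces before submitting.
    """
    with open(out_path, "w") as f:
        for name, p in zip(file_list, preds):
            f.write(f"{name} {int(p)}\n")
```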
- On the right side of the webpage there is the Hackathonist Details field. Fill in your first name, last name, and email.
- Upload your zip file.
- Click the Submit button.
Accept the challenge and achieve better results!
Best training set cross-validation results: 0.7-0.75 accuracy.
References: Zampoglou, M., Papadopoulos, S., & Kompatsiaris, Y. (2017). Large-scale evaluation of splicing localization algorithms for web images. Multimedia Tools and Applications, 76(4), 4801-4834.
Download Code Files
Download TampImg.zip to get the provided Input Data and code.