Resources

InVID FIVR-200K

The InVID FIVR-200K dataset was developed in the context of the InVID project to simulate the problem of Fine-grained Incident Video Retrieval (FIVR): given a query video, the objective is to retrieve all associated videos, considering several types of associations that range from duplicate videos to videos from the same incident. To address the benchmarking needs of this problem, the large-scale FIVR-200K dataset was constructed. It comprises 225,960 YouTube videos collected based on 4,687 major news events crawled from Wikipedia, and 100 query videos selected by an automatic selection process. For the annotation of the dataset, an annotation protocol was devised covering four types of video associations: Near-Duplicate Videos (ND), Duplicate Scene Videos (DS), Complementary Scene Videos (CS), and Incident Scene Videos (IS). The FIVR-200K dataset contains the list of collected YouTube IDs, the events crawled from Wikipedia, and the video annotations, which include the set of videos of each association type for each query in the dataset.
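A hedged sketch of how such annotations might be consumed, assuming a mapping from each query ID to lists of video IDs per association type (the variable names, IDs, and exact schema below are illustrative assumptions, not the dataset's documented format):

```python
# Toy annotation structure: query ID -> {association type -> video IDs}.
# All IDs here are placeholders, not real YouTube IDs.
annotations = {
    "query_video_id": {
        "ND": ["vid1"],          # Near-Duplicate Videos
        "DS": ["vid2", "vid3"],  # Duplicate Scene Videos
        "CS": ["vid4"],          # Complementary Scene Videos
        "IS": ["vid5"],          # Incident Scene Videos
    }
}

def relevant_videos(query_id, types=("ND", "DS", "CS", "IS")):
    """Collect all videos associated with a query for the given label types."""
    labels = annotations[query_id]
    return [v for t in types for v in labels.get(t, [])]

# Evaluating a retrieval task at a stricter level, e.g. ND and DS only:
print(relevant_videos("query_video_id", types=("ND", "DS")))
```

Restricting `types` lets an evaluation target a narrower retrieval task (e.g. near-duplicate retrieval only) or the full incident-level task.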

Created by: G. Kordopatis-Zilos (CERTH-ITI / QMUL), S. Papadopoulos (CERTH-ITI), I. Patras (QMUL), Y. Kompatsiaris (CERTH-ITI)

@Zenodo @Author's Website

InVID Fake Video Corpus 2018 (v3.0)

The InVID Fake Video Corpus was developed in the context of the InVID project with the aim of gaining a perspective on the types of fake videos that can be encountered in the real world. This third version of the dataset, published near the end of the project, has been extended to include 200 fake videos and 180 real ones. Furthermore, using crawling and near-duplicate retrieval, a large number of near-duplicates of each fake and real video have been collected, leading to a total of 3,957 videos annotated as fake and 2,458 annotated as real. The videos are temporally ordered in cascades and accompanied by their metadata. As we do not own the rights to the videos, the dataset only contains the video URLs and annotations.

Created by: O. Papadopoulou, M. Zampoglou, S. Papadopoulos, I. Kompatsiaris (CERTH-ITI)

@Zenodo @Author's Website

InVID Fake Video Corpus v2.0

This is the second version of the InVID Fake Video Corpus, containing 117 fake videos and 110 real videos, alongside annotations and descriptions. As we do not own the rights to the videos, the dataset only contains the video URLs and annotations.

Created by: O. Papadopoulou, S. Papadopoulos, M. Zampoglou, I. Kompatsiaris (CERTH-ITI), D. Teyssou (AFP)

@Zenodo

InVID Fake Video Corpus v1.0

The InVID Fake Video Corpus is a small collection of verified fake videos. It was developed in the context of the InVID project with the aim of gaining a perspective on the types of fake videos that can be encountered in the real world. The dataset does not aspire to serve as an exhaustive list of all forgeries that have circulated on the Web in the past, but we intend to maintain and extend it throughout the course of the project as new cases arise. The collection is a collaborative effort between AFP and CERTH-ITI. Currently, the corpus consists of 59 videos. For each video, information is provided describing the fake, its original source, and the evidence proving it is a fake. As we do not own the videos, the dataset only provides the video URLs and metadata, in the form of a tab-separated value (TSV) file.

Created by: S. Papadopoulos, M. Zampoglou, I. Kompatsiaris (CERTH-ITI), D. Teyssou (AFP)

@Zenodo

InVID TV Logo Dataset v2.0

This dataset was created with the purpose of providing a training and evaluation benchmark for TV logo detection in videos. It contains the results of the segmentation and annotation of 2,749 YouTube videos originating from a large number of news TV channels. The videos have been annotated with respect to the TV channel logos they contain (specifically, by the name of the organization to which the logo belongs) and with shot boundary information. Furthermore, a set of logo templates has been extracted from the videos and organized alongside the corresponding channel information. As we do not own the rights to the videos, the dataset only contains the YouTube video IDs alongside the corresponding annotations. It further contains 503 logo template files and the corresponding metadata (channel name, Wikipedia link). See the README file for details. This is the second version of the dataset, incorporating various annotation corrections.

Created by: O. Papadopoulou, M. Zampoglou, S. Papadopoulos, I. Kompatsiaris (CERTH-ITI)

@Zenodo

InVID TV Logo Dataset v1.0

This dataset was created with the purpose of providing a training and evaluation benchmark for TV logo detection in videos. It contains the results of the segmentation and annotation of 2,749 YouTube videos originating from a large number of news TV channels. The videos have been annotated with respect to the TV channel logos they contain (specifically, by the name of the organization to which the logo belongs) and with shot boundary information. Furthermore, a set of logo templates has been extracted from the videos and organized alongside the corresponding channel information. As we do not own the rights to the videos, the dataset only contains the YouTube video IDs alongside the corresponding annotations. It further contains 503 logo template files and the corresponding metadata (channel name, Wikipedia link).

Created by: O. Papadopoulou, M. Zampoglou, S. Papadopoulos, I. Kompatsiaris (CERTH-ITI)

@Zenodo

Image Forensics

This is an integrated framework for image forensic analysis. It includes a Java web service with seven splicing detection algorithm implementations plus additional forensic tools, as well as a MATLAB evaluation framework with implementations of a large number of splicing detection algorithms.

Maintained by: Markos Zampoglou

@GitHub

Intermediate CNN Features

This repository contains the implementation of the feature extraction process described in Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers. Given an input video, one frame per second is sampled and its visual descriptor is extracted from the activations of the intermediate convolution layers of a pre-trained Convolutional Neural Network. Then, the Maximum Activation of Convolutions (MAC) function is applied on the activations of each layer to generate a compact layer vector. Finally, the layer vectors are concatenated to generate a single frame descriptor.
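The aggregation step can be sketched as follows in NumPy; the layer shapes and channel counts below are illustrative assumptions, not the specific network used in the repository:

```python
import numpy as np

def mac(activation):
    """MAC: max-pool an (H, W, C) activation map over its spatial
    dimensions, yielding one C-dimensional layer vector."""
    return activation.max(axis=(0, 1))

def frame_descriptor(layer_activations):
    """Concatenate the MAC vectors of all intermediate layers into a
    single descriptor for one sampled frame."""
    return np.concatenate([mac(a) for a in layer_activations])

# Example: three hypothetical intermediate layers with 64, 128, 256 channels.
layers = [np.random.rand(56, 56, 64),
          np.random.rand(28, 28, 128),
          np.random.rand(14, 14, 256)]
desc = frame_descriptor(layers)
print(desc.shape)  # (448,): one compact vector per sampled frame
```

Each layer contributes a vector whose length equals its channel count, so the descriptor dimensionality is the sum of the channel counts across the selected layers.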

Maintained by: Giorgos Kordopatis-Zilos

@GitHub

Near-Duplicate Video Retrieval with Deep Metric Learning

This repository contains the TensorFlow implementation of the paper Near-Duplicate Video Retrieval with Deep Metric Learning. It provides code for training and evaluation of a Deep Metric Learning (DML) network on the problem of Near-Duplicate Video Retrieval (NDVR). During training, the DML network is fed with video triplets, generated by a triplet generator. The network is trained based on the triplet loss function.
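A minimal NumPy sketch of a standard triplet loss of this kind; the distance measure (squared Euclidean) and margin value are illustrative assumptions, not necessarily the paper's exact settings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss pushing the anchor embedding closer to the
    near-duplicate (positive) than to the dissimilar video (negative)
    by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings for one triplet.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # near-duplicate embedding
n = np.array([1.0, 1.0])   # unrelated embedding
print(triplet_loss(a, p, n))  # 0.0: anchor is already well separated
```

When the positive is already closer than the negative by more than the margin, the loss is zero and the triplet contributes no gradient, which is why triplet generators typically favor harder examples.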

Maintained by: Giorgos Kordopatis-Zilos

@GitHub

Computational Verification

A framework for "learning" how to classify social content as truthful/reliable or not. Features are extracted from the tweet text (Tweet-based features, TF) and from the user who published it (User-based features, UB). A two-level classification model is trained on these feature groups.
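A hedged sketch of the two-level scheme: one first-level classifier per feature group, whose outputs are fused by a second-level model. The classifiers below are trivial placeholders chosen for illustration, not the framework's actual trained models:

```python
def tweet_level(tf):
    """First-level score from tweet-based (TF) features (placeholder rule)."""
    return 1.0 if tf["has_source_link"] else 0.0

def user_level(ub):
    """First-level score from user-based (UB) features (placeholder rule)."""
    return 1.0 if ub["account_age_days"] > 365 else 0.0

def second_level(tf, ub):
    """Second level: fuse the two first-level scores into a decision."""
    score = 0.5 * tweet_level(tf) + 0.5 * user_level(ub)
    return "reliable" if score >= 0.5 else "unreliable"

print(second_level({"has_source_link": True},
                   {"account_age_days": 900}))  # reliable
```

Keeping the feature groups in separate first-level models lets the second level learn how much to trust each group rather than mixing all features in one classifier.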

Maintained by: Olga Papadopoulou

@GitHub

Multimedia Geotagging

This repository contains the implementation of algorithms that estimate the geographic location of multimedia items based on their textual content. The approach is described in the paper Geotagging Text Content With Language Models and Feature Mining.
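An illustrative sketch of language-model-based geotagging, assuming per-cell tag probabilities estimated from training data. The grid cells, tags, and probability values below are toy assumptions, not the paper's actual model:

```python
import math

# P(tag | cell) for two hypothetical grid cells, learned offline.
tag_models = {
    "paris_cell":  {"eiffel": 0.6, "tower": 0.3, "bridge": 0.1},
    "london_cell": {"bridge": 0.5, "tower": 0.4, "eiffel": 0.1},
}

def most_likely_cell(tags, unseen_prob=1e-6):
    """Score each cell by the log-likelihood of the item's tags under the
    cell's language model and return the highest-scoring cell."""
    def score(model):
        # Smooth unseen tags with a small constant probability.
        return sum(math.log(model.get(t, unseen_prob)) for t in tags)
    return max(tag_models, key=lambda cell: score(tag_models[cell]))

print(most_likely_cell(["eiffel", "tower"]))  # paris_cell
```

The smoothing constant for unseen tags keeps the log-likelihood finite when an item contains a tag never observed in a cell's training data.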

Maintained by: Giorgos Kordopatis-Zilos

@GitHub

Reveal Graph Embedding

Implementation of community-based graph embedding for user classification.

Maintained by: No longer maintained

@GitHub

Image Verification Assistant

The Media Verification Assistant features a multitude of image tampering detection algorithms plus metadata analysis, GPS Geolocation, EXIF Thumbnail extraction and integration with Google reverse image search.

@Demo @GitHub

Context Aggregation and Analysis

This is a demo platform aimed at facilitating the verification of UGC video content posted on YouTube, Twitter and Facebook. In contrast to other approaches, which attempt to analyze the videos themselves for traces of forgery, this platform analyzes the video context: the characteristics of the poster, any relevant user comments, the local weather reports at the time of the event, and other contextual pieces of information are aggregated and presented to the user for analysis.

@Demo

Tweet Verification Assistant

Get help in analyzing the veracity of a tweet. Given a tweet, a user can explore the verification result, including the extracted feature values and their distribution on the Verification Corpus.

@Demo