FaceForensics++ is a forensics dataset consisting of 1,000 original video sequences that have been manipulated with four automated face manipulation methods: Deepfakes, Face2Face, FaceSwap, and NeuralTextures. The data was sourced from 977 YouTube videos; all videos contain a trackable, mostly frontal face without occlusions, which enables the automated tampering methods to generate realistic forgeries.
FaceForensics is a video dataset consisting of more than 500,000 frames containing faces, drawn from 1,004 videos, that can be used to study image or video forgeries. All videos were downloaded from YouTube and cut down to short continuous clips that contain mostly frontal faces. The dataset has two versions:
Source-to-Target: the authors reenact over 1,000 videos with new facial expressions extracted from other videos; this version can be used, for example, to train a classifier to detect fake images or videos.
Self-reenactment: the authors use Face2Face to reenact the facial expressions of videos with their own facial expressions as input, yielding pairs of videos; this version can be used, for example, to train supervised generative refinement models.
OpenForensics is a large-scale, highly challenging dataset designed with rich face-wise annotations explicitly for face forgery detection and segmentation. With these annotations, OpenForensics has great potential for research in both deepfake prevention and general human face detection.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
TrueFace is the first dataset of social-media-processed real and synthetic faces; the synthetic faces were obtained with the StyleGAN family of generative models, and all images were shared on Facebook, Twitter, and Telegram.
Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status, or education. Unsurprisingly, though, their social impact has often been exploited for malicious purposes, like spreading misinformation and manipulating public opinion. With today's technologies, the ability to generate highly realistic fakes is within everyone's reach. A major threat comes in particular from synthetically generated faces, which are able to deceive even the most experienced observer. To counter this fake-news phenomenon, researchers have employed artificial intelligence to detect synthetic images by analysing patterns and artifacts introduced by the generative models. However, most online images are subject to repeated sharing operations on social media platforms. These platforms process uploaded images by applying operations (like compression) that progressively degrade those useful forensic traces, compromising the effectiveness of the developed detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.
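As a rough illustration of that degradation, the following Python sketch (using Pillow; the file name, quality, and scale values are placeholders, not TrueFace's actual platform pipelines) simulates successive sharing rounds by downscaling and JPEG re-encoding an image:

from io import BytesIO
from PIL import Image

def simulate_share(img, quality=75, scale=0.9):
    """One round of platform-like processing: downscale, then JPEG re-encode."""
    w, h = img.size
    img = img.resize((int(w * scale), int(h * scale)), Image.LANCZOS)
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

img = Image.open("stylegan_face.png")  # hypothetical input file
for _ in range(3):                     # three successive "shares"
    img = simulate_share(img, quality=70)
img.save("shared_3x.jpg")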
DeeperForensics-1.0 is by far the largest face forgery detection dataset, with 60,000 videos totalling 17.6 million frames, 10 times larger than existing datasets of the same kind. The full dataset includes 48,475 source videos and 11,000 manipulated videos. The source videos were collected from 100 paid actors, with consent, from 26 countries, and the manipulated videos were generated by a newly proposed many-to-many, end-to-end face swapping method, DF-VAE. Seven types of real-world perturbations at five intensity levels are applied to ensure larger scale and higher diversity.
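DeeperForensics' exact perturbation suite is not reproduced here, but a hedged sketch of the general idea, perturbations parameterized by an intensity level, might look like this in Python with OpenCV (the perturbation types and parameter ranges below are assumptions, not the dataset's specification):

import cv2
import numpy as np

def perturb(frame, kind, level):
    """Apply one perturbation at intensity level 1 (mild) to 5 (severe)."""
    assert 1 <= level <= 5
    if kind == "gaussian_blur":
        k = 2 * level + 1                 # kernel size grows with level
        return cv2.GaussianBlur(frame, (k, k), 0)
    if kind == "gaussian_noise":
        noise = np.random.normal(0, 5.0 * level, frame.shape)
        return np.clip(frame + noise, 0, 255).astype(np.uint8)
    if kind == "jpeg":
        quality = 90 - 15 * level         # stronger compression per level
        ok, enc = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
        return cv2.imdecode(enc, cv2.IMREAD_COLOR)
    raise ValueError(f"unknown perturbation: {kind}")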
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This research enhances face verification and recognition systems for digital forensics and information security using SVM and FaceNet. It addresses accuracy, bias, and privacy challenges, proposing ethical frameworks for deployment in law enforcement, access control, and fraud prevention while mitigating risks like spoofing and data breaches.
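The study's own implementation is not shown here; a minimal sketch of such a pipeline, assuming the facenet-pytorch and scikit-learn packages (face detection, alignment, and pair construction are elided), could pair FaceNet embeddings with an SVM verifier like this:

import numpy as np
import torch
from facenet_pytorch import InceptionResnetV1
from sklearn.svm import SVC

model = InceptionResnetV1(pretrained="vggface2").eval()

def embed(faces):
    """faces: (N, 3, 160, 160) aligned crops in [-1, 1] -> 512-d embeddings."""
    with torch.no_grad():
        return model(faces).cpu().numpy()

def pair_features(emb_a, emb_b):
    """Represent a face pair by the absolute difference of its embeddings."""
    return np.abs(emb_a - emb_b)

# X: pair features, y: 1 = same identity, 0 = different (hypothetical data)
# clf = SVC(kernel="rbf", probability=True).fit(X, y)
# same_prob = clf.predict_proba(pair_features(e1, e2))[:, 1]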
ariesta/forensic-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This user guide outlines a method for objective, dense facial tissue depth mapping of 3D face and skull STL models generated from computed tomography (CT) scans, in an effort to produce a standardized reference dataset for forensic craniofacial identification applications. Any 3D STL face and skull models that are in correct anatomical orientation to each other can be mapped with this method.

The methods described include hollowing and cropping of a face model to obtain a face "shell," mapping the face to the skull, colorizing and saving the mapped data, and visualization of the mapped data. The purpose of this guide is to provide an objective, standardized method for mapping facial tissue depth values for forensic craniofacial identification applications using free software.

We have also attached a zipped fileset of MeshLab scripts (with installation instructions included in the zip) that will assist with splitting the face and skull maps into 1 mm increments for enhanced visualizations. Images were generated with publicly available de-identified CT scans from The Cancer Imaging Archive (http://www.cancerimagingarchive.net/) (doi: 10.1007/s10278-013-9622-7). A link to the accompanying manuscript will be provided when published.
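As a hedged sketch of the core mapping step, implemented here with the open-source trimesh library rather than the software the guide itself uses, and with placeholder file names, the per-vertex tissue depth can be computed as the distance from each face-shell vertex to the nearest point on the skull surface:

import numpy as np
import trimesh

face = trimesh.load("face_shell.stl")   # hollowed, cropped face model
skull = trimesh.load("skull.stl")       # must share the face's anatomical frame

# For every face vertex, find the closest point on the skull surface; the
# returned distance is the local facial tissue depth in model units (mm).
closest, depth_mm, _ = trimesh.proximity.closest_point(skull, face.vertices)

# Colorize the face shell by depth, normalized to the observed range.
t = (depth_mm - depth_mm.min()) / np.ptp(depth_mm)
face.visual.vertex_colors = trimesh.visual.interpolate(t, color_map="viridis")
face.export("face_depth_mapped.ply")    # PLY preserves per-vertex color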
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Forensic facial identification examiners are required to match the identity of faces in images that vary substantially, owing to changes in viewing conditions and in a person's appearance. These identifications affect the course and outcome of criminal investigations and convictions. Despite calls for research on sources of human error in forensic examination, existing scientific knowledge of face matching accuracy is based, almost exclusively, on people without formal training. Here, we administered three challenging face matching tests to a group of forensic examiners with many years' experience of comparing face images for law enforcement and government agencies. Examiners outperformed untrained participants and computer algorithms, thereby providing the first evidence that these examiners are experts at this task. Notably, computationally fusing responses of multiple experts produced near-perfect performance. Results also revealed qualitative differences between expert and non-expert performance. First, examiners' superiority was greatest at longer exposure durations, suggesting a more deliberate comparison process in forensic examiners. Second, experts were less impaired by image inversion than non-expert students, contrasting with face memory studies that show larger face inversion effects in high performers. We conclude that expertise in matching identity across unfamiliar face images is supported by processes that differ qualitatively from those supporting memory for individual faces.
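To make the fusion result concrete, here is an illustrative Python sketch (with synthetic ratings, not the study's data) that mean-fuses per-pair similarity ratings across experts and scores the fused rating with ROC AUC, which typically climbs as experts are added:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_pairs, n_experts = 200, 4
labels = rng.integers(0, 2, n_pairs)               # 1 = same identity
# Simulated ratings: noisy per-expert scores centered on the true label.
ratings = labels[:, None] + rng.normal(0, 0.8, (n_pairs, n_experts))

for k in range(1, n_experts + 1):
    fused = ratings[:, :k].mean(axis=1)            # mean-fuse k experts
    print(f"{k} expert(s): AUC = {roc_auc_score(labels, fused):.3f}")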
bradchao/LLAMA3-Forensics dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Explore our extensive Deepfake Database, featuring diverse and real-world videos for deepfake detection, face recognition, and video forensics.
VideoForensicsHQ is a benchmark dataset for face video forgery detection that provides high-quality visual manipulations. It is one of the first face video manipulation benchmark sets that also contains audio, and it thus complements existing datasets along a new, challenging dimension. VideoForensicsHQ shows manipulations at much higher video quality and resolution, and its manipulations are provably much harder for humans to detect than the videos in other datasets.
VideoForensicsHQ contains 1,737 videos of speaking faces (44% male, 56% female) covering 8 different emotions, most of them in HD resolution. The videos amount to 1,666,816 frames.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This paper focuses on an identity sharing scheme known as face image morphing, or simply morphing. Morphing is the process of creating a composite face image, a morph, by digitally manipulating face images of different individuals, usually two. Under certain circumstances, the composite image looks like both contributors and can be used by one of them (the accomplice) to issue an ID document. The other contributor (the criminal) can then use the ID document for illegal activities, which is a serious security vulnerability. So far, researchers have focused on automated morphing detection solutions. Our main contribution is the evaluation of the effectiveness and limitations of two image forensics methods in visualizing morphing-related traces in digital images. Visualization of morphing traces is important as it can be used as hard evidence in a forensic context (i.e., court cases) and lead to the development of morphing-algorithm-specific feature extraction strategies for automated detection. To evaluate the two methods, we created morphs using two state-of-the-art morphing algorithms, complying with the face image requirements of three currently existing online passport application processes. We found that complementary use of the visualization methods can reveal morphing-related traces. We also show how some application process-specific requirements affect visualization results by testing three likely morphing attack scenarios with varied image processing parameters, and propose application process amendments that would make forensic image analysis more reliable.
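The two visualization methods evaluated in the paper are not named in this summary; as a generic illustration of residual-based trace visualization (not the paper's methods), one can subtract a denoised copy of an image from itself and inspect the high-frequency residual, where blending seams and locally inconsistent noise from morphing tend to stand out:

import cv2
import numpy as np

img = cv2.imread("suspected_morph.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
denoised = cv2.GaussianBlur(img.astype(np.float32), (5, 5), 1.0)
residual = img.astype(np.float32) - denoised

# Stretch the residual to 0-255 for viewing.
vis = cv2.normalize(residual, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("residual_map.png", vis)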
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The possibility of carrying out a meaningful forensic analysis of printed and scanned images plays a major role in many applications. First of all, printed documents are often associated with criminal activities, such as terrorist plans, child pornography pictures, and even fake packages. Additionally, printing and scanning can be used to hide the traces of image manipulation and even the synthetic nature of images, since the artifacts commonly found in manipulated and synthetic images are gone after the images are printed and scanned. A problem hindering research in this area is the lack of large-scale reference datasets to be used for algorithm development and benchmarking. Motivated by this issue, we share a new dataset composed of a large number of synthetic and natural printed face images. Such a dataset can be used with several computer vision and machine learning approaches for two tasks: pinpointing the printer source of a document and detecting printed pictures generated by deepfakes.
When using the dataset, don't forget to cite our paper:
@Article{jimaging7030050,
  AUTHOR = {Ferreira, Anselmo and Nowroozi, Ehsan and Barni, Mauro},
  TITLE = {VIPPrint: Validating Synthetic Image Detection and Source Linking Methods on a Large Scale Dataset of Printed Documents},
  JOURNAL = {Journal of Imaging},
  VOLUME = {7},
  YEAR = {2021},
  NUMBER = {3},
  ARTICLE-NUMBER = {50},
  URL = {https://www.mdpi.com/2313-433X/7/3/50},
  ISSN = {2313-433X},
  DOI = {10.3390/jimaging7030050}
}
The GitHub folder contains the scripts to replicate the analysis corresponding to the paper Who Is That? Perceptual Expertise on Other-Race Face Comparisons, Disguised Face Comparisons, and Face Memory (https://doi.org/10.31234/osf.io/s87na). The data in the paper was collected from forensic facial professionals (examiners and reviewers) and undergraduate students. Participants rated the similarity of faces and chose which faces they memorized. Student data can be obtained by contacting Prof. Alice J. O'Toole at The University of Texas at Dallas; forensic facial professional data can be obtained by contacting Dr. Amy N. Yates at NIST. The file WhoIsThat.Rmd is the code used for figures and analysis in the paper, e.g., the Mann-Whitney tests and violin plots. The resultant output can be seen in the file WhoIsThat.html.
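The repository's analysis is written in R (WhoIsThat.Rmd), but its central test translates directly; a Python equivalent of a Mann-Whitney U comparison between the two participant groups (with made-up ratings for illustration) would be:

from scipy.stats import mannwhitneyu

professionals = [4.5, 4.0, 3.5, 4.8, 4.2]  # hypothetical per-participant scores
students = [3.0, 3.2, 2.8, 3.9, 3.1]

stat, p = mannwhitneyu(professionals, students, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")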
NIST License: https://www.nist.gov/open/license
Multiple Encounter Dataset (MEDS-II) is a test corpus organized from an extract of submissions of deceased persons with prior multiple encounters. MEDS is provided to assist the FBI and partner organizations in refining tools, techniques, and procedures for face recognition as it supports Next Generation Identification (NGI), forensic comparison, training, analysis, face image conformance, and inter-agency exchange standards. The MITRE Corporation (MITRE) prepared MEDS in the FBI Data Analysis Support Laboratory (DASL) with support from the FBI Biometric Center of Excellence.
MIT License: https://opensource.org/licenses/MIT
forensicsman/text-images-white dataset hosted on Hugging Face and contributed by the HF Datasets community
Celeb-DF is a large-scale, challenging dataset for deepfake forensics. It includes 590 original videos collected from YouTube, with subjects of different ages, ethnic groups, and genders, and 5,639 corresponding DeepFake videos.
LFW-Beautified: A Dataset of Face Images with Beautification and Augmented Reality Filters