The DFDC (Deepfake Detection Challenge) is a dataset for deepfake detection consisting of more than 100,000 videos.
The DFDC dataset consists of two versions:
- Preview dataset, with 5k videos, featuring two facial modification algorithms
- Full dataset, with 124k videos, featuring eight facial modification algorithms
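For readers loading the Kaggle release, here is a minimal sketch of reading the labels, assuming the training metadata.json maps each video filename to a record with a "label" field of "REAL" or "FAKE" (the filenames below are hypothetical):

```python
import json
from collections import Counter

# Hedged sketch: tally REAL vs FAKE labels in a DFDC-style metadata.json.
# Assumes the Kaggle training layout, where metadata.json maps each video
# filename to a record containing a "label" field ("REAL" or "FAKE").

def label_counts(metadata: dict) -> Counter:
    """Count how many videos carry each label."""
    return Counter(record["label"] for record in metadata.values())

# Example with an inline metadata snippet (filenames are hypothetical):
metadata = json.loads("""
{
  "aagfhgtpmv.mp4": {"label": "FAKE", "split": "train", "original": "vudstovrck.mp4"},
  "vudstovrck.mp4": {"label": "REAL", "split": "train"}
}
""")
print(dict(label_counts(metadata)))
```

Because fake videos heavily outnumber real ones in the full dataset, a quick count like this is a common first step before deciding on a sampling strategy.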
https://www.nist.gov/open/license
The datasets contain the following parts for Open Media Forensics Challenge (OpenMFC) evaluations:
1. NC16 Kickoff dataset
2. NC17 development and evaluation datasets
3. MFC18 development and evaluation datasets
4. MFC19 development and evaluation datasets
5. MFC20 development and evaluation datasets
6. OpenMFC2022 steg datasets
7. OpenMFC2022 deepfake datasets
This dataset includes all detectable faces of the sample training dataset in Deepfake Detection Challenge. Kaggle and the host expected and encouraged us to train our models outside of Kaggle’s notebooks environment; however, for someone who prefers to stick to Kaggle's kernels, this dataset would help a lot 😄.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
For more information about SVDD Challenge 2024, please refer to https://challenge.singfake.org/.
We have released the test set here.
Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Unlike previous challenges (e.g. ADD 2022), ADD 2023 goes beyond binary real/fake classification: it asks participants to localize the manipulated intervals in partially fake speech and to pinpoint the source responsible for generating any fake audio. Furthermore, ADD 2023 includes more rounds of evaluation for the fake audio game sub-challenge. The ADD 2023 challenge (http://addchallenge.cn/add2023) includes three sub-challenges: audio fake game (FG), manipulation region location (RL) and deepfake algorithm recognition (AR). This paper describes the datasets, evaluation metrics, and protocols, and also reports some findings in audio deepfake detection tasks.

The ADD 2023 dataset is publicly available. This dataset is licensed under a CC BY-NC-ND 4.0 license. If you use this dataset, please cite the following paper:

Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li: ADD 2023: the Second Audio Deepfake Detection Challenge. DADA@IJCAI 2023: 125-130
The dataset used for the manipulated region location task in the second Audio Deepfake Detection Challenge (ADD 2023).
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li: ADD 2023: the Second Audio Deepfake Detection Challenge. DADA@IJCAI 2023: 125-130
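The manipulation region location task scores how well a system localizes the fake intervals within an utterance. As a hedged illustration only (the official ADD 2023 metric is defined in the challenge paper cited above), a frame-level F1 over binary fake/real frame labels might look like:

```python
# Hedged sketch: frame-level F1 for manipulated-region location.
# Assumes each utterance is labeled per frame: 1 = fake frame, 0 = real.
# (The official ADD 2023 RL metric is defined in the challenge paper;
# this is only an illustrative frame-level variant.)

def frame_f1(ref, hyp):
    """F1 score treating 'fake' frames as the positive class."""
    assert len(ref) == len(hyp)
    tp = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 1)
    fp = sum(1 for r, h in zip(ref, hyp) if r == 0 and h == 1)
    fn = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: a 10-frame utterance with frames 4-7 manipulated;
# the prediction is shifted by one frame.
ref = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
hyp = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(round(frame_f1(ref, hyp), 3))  # 0.75
```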
Original Thread: https://www.kaggle.com/c/deepfake-detection-challenge/discussion/122786
All the links:
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li: ADD 2023: the Second Audio Deepfake Detection Challenge. DADA@IJCAI 2023: 125-130
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Audio deepfake detection is an emerging topic that was included in ASVspoof 2021. However, recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill in this gap. ADD 2022 (http://addchallenge.cn/add2022) includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake game (FG). The LF track focuses on bona fide and fully fake utterances corrupted by various real-world noises. The PF track aims to distinguish partially fake audio from real audio. The FG track is a rivalry game comprising two tasks: an audio generation task and an audio fake detection task. In this paper, we describe the datasets, evaluation metrics, and protocols. We also report major findings that reflect recent advances in audio deepfake detection tasks.
The ADD 2022 dataset is publicly available.
This data set is licensed with a CC BY-NC-ND 4.0 license.
If you use this dataset, please cite the following paper:
Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li:
ADD 2022: the first Audio Deep Synthesis Detection Challenge. ICASSP 2022: 9216-9220
https://www.verifiedmarketresearch.com/privacy-policy/
Fake Image Detection Market size was valued at USD 276.65 Million in 2024 and is projected to reach USD 1417.59 Million by 2031, growing at a CAGR of 22.66% from 2024 to 2031.
Global Fake Image Detection Market Overview
The widespread availability of image editing software and social media platforms has led to a surge in fake images, including digitally altered photos and manipulated visual content. This trend has fueled the demand for advanced detection solutions capable of identifying and flagging fake images in real-time. With the proliferation of fake news and misinformation online, there is an increasing awareness among consumers, businesses, and governments about the importance of combating digital fraud and preserving the authenticity of visual content. This heightened concern is driving investments in fake image detection technologies to mitigate the risks associated with misinformation.
However, despite advancements in AI and ML, detecting fake images remains a complex and challenging task, especially when dealing with sophisticated techniques such as deepfakes and generative adversarial networks (GANs). Developing robust detection algorithms capable of identifying increasingly sophisticated forms of image manipulation poses a significant challenge for researchers and developers. The deployment of fake image detection technologies raises concerns about privacy and data ethics, particularly regarding the collection and analysis of visual content shared online. Balancing the need for effective detection with respect for user privacy and ethical considerations remains a key challenge for stakeholders in the Fake Image Detection Market.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset is the training set (part 2 of 3) of the Codecfake dataset, corresponding to the manuscript "The Codecfake Dataset and Countermeasures for Universal Deepfake Audio Detection".
With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for effective detection methods. Unlike traditional deepfake audio generation, which often involves multi-step processes culminating in vocoder usage, ALM directly utilizes neural codec methods to decode discrete codes into audio. Moreover, driven by large-scale data, ALMs exhibit remarkable robustness and versatility, posing a significant challenge to current audio deepfake detection (ADD) models. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method: the conversion from neural codec to waveform. We initially construct the Codecfake dataset, an open-source large-scale dataset, including two languages, millions of audio samples, and various test conditions, tailored for ALM-based audio detection. Additionally, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn a domain-balanced and generalized minima. Experiment results demonstrate that co-training on the Codecfake dataset and a vocoded dataset with the CSAM strategy yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models.
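The EER quoted above is the standard metric for audio deepfake detection: the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). As a minimal sketch, here is a simple threshold sweep (real evaluation toolkits interpolate the ROC curve instead):

```python
# Hedged sketch of Equal Error Rate (EER), the metric quoted above.
# EER is the operating point where the false acceptance rate (FAR)
# equals the false rejection rate (FRR). Real toolkits interpolate
# the ROC curve; this minimal version just sweeps thresholds.

def eer(fake_scores, real_scores):
    """Higher score = more likely fake; returns an approximate EER in [0, 1]."""
    thresholds = sorted(set(fake_scores) | set(real_scores))
    best = 1.0
    for t in thresholds:
        far = sum(s >= t for s in real_scores) / len(real_scores)  # real flagged as fake
        frr = sum(s < t for s in fake_scores) / len(fake_scores)   # fake passed as real
        best = min(best, max(far, frr))  # closest point to FAR == FRR
    return best

# Toy example: a perfectly separating detector has EER 0.
print(eer([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 0.0
```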
Due to platform restrictions on the size of zenodo repositories, we have divided the Codecfake dataset into various subsets as shown in the table below:
| Codecfake dataset | description | link |
| --- | --- | --- |
| training set (part 1 of 3) & label | train_split.zip & train_split.z01 - train_split.z06 | https://zenodo.org/records/11171708 |
| training set (part 2 of 3) | train_split.z07 - train_split.z14 | https://zenodo.org/records/11171720 |
| training set (part 3 of 3) | train_split.z15 - train_split.z19 | https://zenodo.org/records/11171724 |
| development set | dev_split.zip & dev_split.z01 - dev_split.z02 | https://zenodo.org/records/11169872 |
| test set (part 1 of 2) | Codec test: C1.zip - C6.zip & ALM test: A1.zip - A3.zip | https://zenodo.org/records/11169781 |
| test set (part 2 of 2) | Codec unseen test: C7.zip | https://zenodo.org/records/11125029 |
The source code of the countermeasure and pre-trained model are available on GitHub https://github.com/xieyuankun/Codecfake.
The Codecfake dataset and pre-trained model are licensed with CC BY-NC-ND 4.0 license.