This dataset contains examples of real human speech and DeepFake versions of those speeches generated using Retrieval-based Voice Conversion.
Can machine learning be used to detect when speech is AI-generated?
There are growing implications surrounding generative AI in the speech domain that enable voice cloning and real-time voice conversion from one individual to another. This technology poses a significant ethical threat and could lead to breaches of privacy and misrepresentation; there is therefore an urgent need for real-time detection of AI-generated speech produced by DeepFake Voice Conversion.
To address these emerging issues, we introduce the DEEP-VOICE dataset. DEEP-VOICE comprises real human speech from eight well-known figures and their speech converted to one another's voices using Retrieval-based Voice Conversion.
For each speech, the accompaniment ("background noise") was removed before RVC conversion, and the original accompaniment was then added back to the DeepFake speech:
(Figure: Overview of the Retrieval-based Voice Conversion process to generate DeepFake speech, with Ryan Gosling's speech converted to Margot Robbie's voice. Conversion is run on the extracted vocals before they are layered over the original background ambience.)
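As a minimal illustration of the final re-layering step, the sketch below mixes converted vocals back onto the original accompaniment with plain array arithmetic. The filenames are hypothetical placeholders, and the vocal/accompaniment separation and RVC conversion are assumed to have been done beforehand.

```python
# Minimal sketch of the re-layering step: add the original accompaniment
# back onto the RVC-converted vocals. Filenames are hypothetical.
import numpy as np
import soundfile as sf

vocals, sr = sf.read("converted_vocals.wav")              # DeepFake vocals from RVC
ambience, sr_amb = sf.read("original_accompaniment.wav")  # accompaniment removed before conversion
assert sr == sr_amb, "both stems must share the same sample rate"

n = min(len(vocals), len(ambience))                # trim to the shorter stem
mix = vocals[:n] + ambience[:n]
mix = mix / max(1.0, float(np.max(np.abs(mix))))   # avoid clipping

sf.write("deepfake_with_ambience.wav", mix, sr)
```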
The dataset is made available in two forms.
First, the raw audio can be found in the "AUDIO" directory. The files are arranged into "REAL" and "FAKE" class directories. The audio filenames note which speaker provided the real speech and which voice it was converted to. For example, "Obama-to-Biden" denotes that Barack Obama's speech has been converted to Joe Biden's voice.
Second, the extracted features can be found in the "DATASET-balanced.csv" file. This is the data used in the study below; each feature is extracted from one-second windows of audio, and the classes are balanced through random sampling.
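For orientation, consuming the tabular form might look like the sketch below. It assumes the label column is named "LABEL", which may differ in the actual file, and it is not the modeling pipeline from the study.

```python
# Rough sketch: load the extracted features and fit a baseline classifier.
# The label column name ("LABEL") is an assumption about the CSV layout.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("DATASET-balanced.csv")
X, y = df.drop(columns=["LABEL"]), df["LABEL"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```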
**Note:** All experimental data is found within the "KAGGLE" directory. The "DEMONSTRATION" directory is used for playing cropped and compressed demos in notebooks, due to Kaggle's limitations on file size.
A successful system could potentially be used as follows:
(Figure: Usage of the real-time system. The end user is notified when the machine learning model has processed the speech audio (e.g. a phone or conference call) and predicted that audio chunks contain AI-generated speech.)
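One way to approximate this chunked, near-real-time usage is sketched below: slide a one-second window over the call audio, compute features per window, and flag the windows a trained classifier predicts as fake. The MFCC-mean features and the "FAKE" label value are simplifying assumptions, not the exact setup from the study.

```python
# Sketch of chunk-wise screening of call audio. The MFCC-mean features and
# the "FAKE" label value are simplifying assumptions; `clf` is any classifier
# trained on matching per-second features.
import librosa
import numpy as np

def one_second_chunks(path, sr=16000):
    audio, _ = librosa.load(path, sr=sr)
    for start in range(0, len(audio) - sr + 1, sr):
        yield audio[start:start + sr]

def chunk_features(chunk, sr=16000):
    mfcc = librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)            # one 20-dim vector per second of audio

def flag_ai_speech(path, clf, sr=16000):
    flagged = []
    for second, chunk in enumerate(one_second_chunks(path, sr)):
        pred = clf.predict(chunk_features(chunk, sr).reshape(1, -1))[0]
        if pred == "FAKE":
            flagged.append(second)      # offsets (in seconds) predicted as AI-generated
    return flagged
```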
The dataset and all studies using it are linked on its Papers with Code page.
This dataset was produced from the study "Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion"
Bird, J.J. and Lotfi, A., 2023. Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion. arXiv preprint arXiv:2308.12734.
The preprint, "Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion", is available on arXiv.
This dataset is provided under the MIT License:
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The main purpose of this data set is to facilitate research into audio DeepFakes. We hope that this work helps in finding new detection methods to prevent such attempts. These generated media files have increasingly been used for impersonation attempts and online harassment.
The data set consists of 104,885 generated audio clips (16-bit PCM wav). We examine multiple networks trained on two reference data sets. First, the LJSpeech data set consisting of 13,100 short audio clips (on average 6 seconds each; roughly 24 hours total) read by a female speaker. It features passages from 7 non-fiction books and the audio was recorded on a MacBook Pro microphone. Second, we include samples based on the JSUT data set, specifically, basic5000 corpus. This corpus consists of 5,000 sentences covering all basic kanji of the Japanese language (4.8 seconds on average; roughly 6.7 hours total). The recordings were performed by a female native Japanese speaker in an anechoic room. Finally, we include samples from a full text-to-speech pipeline (16,283 phrases; 3.8s on average; roughly 17.5 hours total). Thus, our data set consists of approximately 175 hours of generated audio files in total. Note that we do not redistribute the reference data.
We included a range of architectures in our data set:
MelGAN
Parallel WaveGAN
Multi-Band MelGAN
Full-Band MelGAN
WaveGlow
Additionally, we examined a bigger version of MelGAN and include samples from a full TTS pipeline consisting of a Conformer and a Parallel WaveGAN model.
Collection Process
For WaveGlow, we utilize the official implementation (commit 8afb643) in conjunction with the official pre-trained network on PyTorch Hub. For the remaining networks, we use a popular implementation available on GitHub (commit 12c677e), whose repository also offers pre-trained models. We used the pre-trained networks to generate samples that are similar to their respective training distributions, LJ Speech and JSUT. When sampling the data set, we first extract Mel spectrograms from the original audio files, using the pre-processing scripts of the corresponding repositories. We then feed these Mel spectrograms to the respective models to obtain the data set. For sampling the full TTS results, we use the ESPnet project. To make sure the generated phrases do not overlap with the training set, we downloaded the Common Voice data set and extracted 16,285 phrases from it.
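For readers who want a feel for the "extract Mel spectrograms, then re-synthesize" step, the snippet below computes a log-mel spectrogram with librosa. The FFT size, hop length, and mel-band count are generic placeholders; the actual pre-processing scripts of the respective repositories use their own model-specific settings.

```python
# Generic log-mel extraction standing in for the repositories' own
# pre-processing scripts (parameter values here are placeholders).
import librosa
import numpy as np

def log_mel_spectrogram(path, sr=22050, n_fft=1024, hop_length=256, n_mels=80):
    audio, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=audio, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return np.log(np.clip(mel, 1e-5, None))   # (n_mels, frames) conditioning input

# A neural vocoder (e.g. WaveGlow or Parallel WaveGAN) then takes this
# mel spectrogram as input and generates the corresponding waveform.
```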
This data set is licensed with a CC-BY-SA 4.0 license.
This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy -- EXC-2092 CaSa -- 390781972.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Deepfake cross-lingual evaluation dataset (DECRO) is constructed to evaluate the influence of language differences on deepfake detection.
If you use the DECRO dataset for deepfake detection, please cite the paper "Transferring Audio Deepfake Detection Capability across Languages", published at WWW '23.
Deepfake Detection Demo
This is a demo evaluation dataset for the task of Deepfake Detection on human speech. It has been created to demonstrate the capabilities of the Behavioral Signals API.
Information
The dataset contains 22 utterances, with an equal number of genuine ("bonafide") and fake ("spoofed") utterances. Utterances from the "bonafide" class have been sourced from the test set of the CommonVoice-17.0 corpus. The "deepfake" utterances have been cloned… See the full description on the dataset page: https://huggingface.co/datasets/behavioralsignals/deepfake-detection-demo.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CVoiceFake (Full) encompasses five common languages (English, Chinese, German, French, and Italian) and utilizes multiple advanced and classical voice cloning techniques (Parallel WaveGAN, Multi-band MelGAN, Style MelGAN, Griffin-Lim, WORLD, and DiffWave) to produce audio samples that bear a high resemblance to authentic audio.
Please contact xinfengli@zju.edu.cn if you are interested in the dataset, particularly the DiffWave portion. Any additional discussion is also welcome.
The sampled small dataset is available as CVoiceFake Small as well. Please also refer to the project page: SafeEar Website.
If you find our paper/code/dataset helpful, please consider citing this work with the following reference:
@inproceedings{li2024safeear,
author = {Li, Xinfeng and Li, Kai and Zheng, Yifan and Yan, Chen and Ji, Xiaoyu and Xu, Wenyuan},
title = {{SafeEar: Content Privacy-Preserving Audio Deepfake Detection}},
booktitle = {Proceedings of the 2024 {ACM} {SIGSAC} Conference on Computer and Communications Security (CCS)},
year = {2024},
}
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This is the Zenodo repository for the ASVspoof 5 database. ASVspoof 5 is the fifth edition in a series of challenges which promote the study of speech spoofing and deepfake attacks, and the design of detection solutions. Compared to previous challenges, the ASVspoof 5 database is built from crowdsourced data collected from around 2,000 speakers in diverse acoustic conditions. More than 20 attacks, also crowdsourced, are generated and optionally tested using surrogate detection models, while seven adversarial attacks are incorporated for the first time.
Please check README.txt and LICENSE.txt before downloading the database.
Database paper (to be submitted): https://arxiv.org/abs/2502.08857
Please consider citing the reference listed at the bottom of this page.
It is highly recommended to follow the rules and instructions in the ASVspoof 5 challenge evaluation plan (phase 2, https://www.asvspoof.org/) if you want to produce results comparable with the literature.
The latest work using the ASVspoof 5 database can be found in the Automatic Speaker Verification Spoofing Countermeasures Workshop proceedings: https://www.isca-archive.org/asvspoof_2024/index.html
If you are interested in creating spoofed data for research purposes using the ASVspoof 5 protocol, please send a request to info@asvspoof.org
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the rapid development of deep learning techniques, the generation and counterfeiting of multimedia material are becoming increasingly straightforward to perform. At the same time, sharing fake content on the web has become so simple that malicious users can create unpleasant situations with minimal effort. Forged media are also getting more and more complex, with manipulated videos (e.g., deepfakes, where both the visual and audio content can be counterfeited) taking over the scene from still images. The multimedia forensic community has addressed the possible threats that this situation implies by developing detectors that verify the authenticity of multimedia objects. However, the vast majority of these tools only analyze one modality at a time. This was not a problem as long as still images were considered the most widely edited media, but now that manipulated videos are becoming customary, performing monomodal analyses could be reductive. Nonetheless, there is a lack of multimodal detectors (systems that consider both audio and video components) in the literature. This is due to the difficulty of developing them, but also to the scarcity of datasets containing forged multimodal data to train and test the designed algorithms.
In this paper we focus on the generation of an audio-visual deepfake dataset. First, we present a general pipeline for synthesizing speech deepfake content from a given real or fake video, facilitating the creation of counterfeit multimodal material. The proposed method uses Text-to-Speech (TTS) and Dynamic Time Warping (DTW) techniques to achieve realistic speech tracks. Then, we use the pipeline to generate and release TIMIT-TTS, a synthetic speech dataset containing the most cutting-edge methods in the TTS field. This can be used as a standalone audio dataset, or combined with DeepfakeTIMIT and VidTIMIT video datasets to perform multimodal research. Finally, we present numerous experiments to benchmark the proposed dataset in both monomodal (i.e., audio) and multimodal (i.e., audio and video) conditions. This highlights the need for multimodal forensic detectors and more multimodal deepfake data.
For the initial version of TIMIT-TTS v1.0
Arxiv: https://arxiv.org/abs/2209.08000
TIMIT-TTS Database v1.0: https://zenodo.org/record/6560159
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
Code for the baseline models is available at https://github.com/EchoFake/EchoFake. The auto-recording tools are available at https://github.com/EchoFake/EchoFake/tree/main/tools.
Abstract
The growing prevalence of speech deepfakes has raised serious concerns, particularly in real-world scenarios such as telephone fraud and identity theft. While many anti-spoofing systems have demonstrated promising performance… See the full description on the dataset page: https://huggingface.co/datasets/EchoFake/EchoFake.
GNU Lesser General Public License v3.0 http://www.gnu.org/licenses/lgpl-3.0.html
The Fake-or-Real (FoR) dataset is a collection of more than 195,000 utterances from real humans and computer generated speech. The dataset can be used to train classifiers to detect synthetic speech.
The dataset aggregates data from the latest TTS solutions (such as Deep Voice 3 and Google Wavenet TTS) as well as a variety of real human speech, including the Arctic Dataset (http://festvox.org/cmu_arctic/), LJSpeech Dataset (https://keithito.com/LJ-Speech-Dataset/), VoxForge Dataset (http://www.voxforge.org) and our own speech recordings.
The dataset is published in four versions: for-original, for-norm, for-2sec and for-rerec.
The first version, named for-original, contains the files as collected from the speech sources, without any modification (balanced version).
The second version, called for-norm, contains the same files, but balanced in terms of gender and class and normalized in terms of sample rate, volume and number of channels.
The third one, named for-2sec is based on the second one, but with the files truncated at 2 seconds.
The last version, named for-rerec, is a rerecorded version of the for-2sec dataset, to simulate a scenario where an attacker sends an utterance through a voice channel (e.g., a phone call or a voice message).
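As a small illustration, truncating a clip to its first two seconds (the transformation behind for-2sec) could be done as follows; this is a sketch, not the dataset's original processing script.

```python
# Illustrative truncation of a clip to its first two seconds, mirroring how
# for-2sec is derived from for-norm (not the dataset authors' original script).
import soundfile as sf

def truncate_to_two_seconds(in_path, out_path):
    audio, sr = sf.read(in_path)
    sf.write(out_path, audio[: 2 * sr], sr)

truncate_to_two_seconds("utterance.wav", "utterance_2sec.wav")
```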
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
SINE Dataset
Overview
The Speech INfilling Edit (SINE) dataset is a comprehensive collection for speech deepfake detection and audio authenticity verification. This dataset contains ~87GB of audio data distributed across 32 splits, featuring both authentic and synthetically manipulated speech samples.
Dataset Statistics
Total Size: ~87GB
Number of Splits: 32 (split-0.tar.gz to split-31.tar.gz)
Audio Format: WAV files
Source: Speech edited from LibriLight… See the full description on the dataset page: https://huggingface.co/datasets/PeacefulData/SINE.
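Since the corpus ships as 32 tarballs, a small loop such as the following can unpack them locally; the directory names are assumptions about where the downloaded files live, not part of the dataset.

```python
# Unpack split-0.tar.gz ... split-31.tar.gz into one local directory.
# The directory names below are assumptions, not part of the dataset.
import tarfile
from pathlib import Path

archive_dir = Path("SINE_downloads")   # where the downloaded tarballs live
out_dir = Path("SINE_extracted")
out_dir.mkdir(exist_ok=True)

for i in range(32):
    with tarfile.open(archive_dir / f"split-{i}.tar.gz", "r:gz") as tar:
        tar.extractall(out_dir)
```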
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Speech deepfakes are artificial voices generated by machine learning models. Previous literature has highlighted deepfakes as one of the biggest security threats arising from progress in artificial intelligence due to their potential for misuse. However, studies investigating human detection capabilities are limited. We presented genuine and deepfake audio to n = 529 individuals and asked them to identify the deepfakes. We ran our experiments in English and Mandarin to understand if language affects detection performance and decision-making rationale. We found that detection capability is unreliable. Listeners only correctly spotted the deepfakes 73% of the time, and there was no difference in detectability between the two languages. Increasing listener awareness by providing examples of speech deepfakes only improves results slightly. As speech synthesis algorithms improve and become more realistic, we can expect the detection task to become harder. The difficulty of detecting speech deepfakes confirms their potential for misuse and signals that defenses against this threat are needed.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In this spoof detection dataset, the bonafide speech is resynthesized using various popular neural audio codecs, which are used for compression and low-bandwidth transmission of speech signals. The spoofed speech samples we provide are generated with a selection of popular and well performing language model based speech synthesis methods, which utilize the same codecs as the bonafide audios to obtain discretized speech tokens. This takes the artifacts of the codecs out of the equation and lets… See the full description on the dataset page: https://huggingface.co/datasets/Flux9665/CodecDeepfakeDetection.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li: ADD 2023: the Second Audio Deepfake Detection Challenge. DADA@IJCAI 2023: 125-130
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is part of the dataset we curated based on VCTK to study partial speech deepfake detection in the era of neural speech editing. For more details, please refer to our Interspeech 2025 paper: "PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing".
In the paper, we curated four subsets: E1: VoiceCraft, E2: SSR-Speech, E3: Audiobox-Speech, and E4: Audiobox. Adhering to Audiobox's license, we cannot release the E3 and E4 subsets.
The folder structure is as follows:
PartialEdit/
├── PartialEdit_E1E2.csv
├── E1/
│   ├── p225/
│   │   ├── p225_001_edited_partial_16k.wav
│   │   ├── p225_002_edited_partial_16k.wav
│   │   └── ...
│   ├── p231/
│   │   ├── p231_001_edited_partial_16k.wav
│   │   ├── p231_002_edited_partial_16k.wav
│   │   └── ...
│   └── ...
├── E1-Codec/
│   └── (same structure as E1)
├── E2/
│   └── (same structure as E1)
├── E2-Codec/
│   └── (same structure as E1)
└── modified_txt/
    ├── p225/
    │   ├── p225_001_modified.txt
    │   ├── p225_002_modified.txt
    │   ├── p225_003_modified.txt
    │   └── ...
    ├── p231/
    │   ├── p231_001_modified.txt
    │   ├── p231_002_modified.txt
    │   └── ...
    └── ...
This is version 1.0, and we will include links to the paper and demo page soon.
The `PartialEdit_E1E2.csv` file contains information about the edited regions in each audio file. Each row contains the following columns (a minimal parsing sketch follows the column lists below):
- `filename`: The name of the audio file.
- `start of the edited region (s)`: The starting time (in seconds) of the first edited region.
- `end of the edited region (s)`: The ending time (in seconds) of the first edited region.
- `total duration (s)`: The total duration (in seconds) of the audio file.
If there are two edited regions within a file, the row format expands to include:
- `filename`: The name of the audio file.
- `start of the edited region (s)`: The starting time (in seconds) of the first edited region.
- `end of the edited region (s)`: The ending time (in seconds) of the first edited region.
- `start of the second edited region (s)`: The starting time (in seconds) of the second edited region.
- `end of the second edited region (s)`: The ending time (in seconds) of the second edited region.
- `total duration (s)`: The total duration (in seconds) of the audio file.
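The following is a minimal sketch for reading these rows, assuming the file is a plain comma-separated table laid out as described above; the header handling is a guess and may need adjusting.

```python
# Minimal parser for PartialEdit_E1E2.csv. Rows carry either one or two
# edited regions, so the numeric column count varies per row.
import csv

def edited_regions(csv_path):
    """Yield (filename, [(start_s, end_s), ...], total_duration_s) per row."""
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            filename, *values = row
            try:
                values = [float(v) for v in values]
            except ValueError:
                continue                      # skip a header row, if present
            total_duration = values[-1]
            bounds = values[:-1]              # 2 or 4 region boundaries
            yield filename, list(zip(bounds[0::2], bounds[1::2])), total_duration
```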
To make sure the download is complete, you can check the MD5 code with the following command:
md5sum *
https://www.datainsightsmarket.com/privacy-policy
The AI voice changer tool market is experiencing robust growth, projected to reach $237 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 9.6% from 2025 to 2033. This expansion is driven by several key factors. The increasing demand for personalized and accessible content creation across various sectors, including entertainment, education, and accessibility solutions, fuels market adoption. Advances in artificial intelligence, specifically in natural language processing and speech synthesis, are continuously improving the quality and realism of AI-generated voices, making them more appealing to both individual users and businesses. Furthermore, the rising affordability and ease of access to AI voice changer tools through cloud-based platforms and user-friendly software are broadening the market's reach. The market is also being shaped by trends towards greater voice-based interaction in applications and the increasing need for efficient and cost-effective audio production.

Despite these positive trends, the market faces certain restraints. Concerns about ethical implications, particularly the potential misuse for malicious purposes such as deepfakes, represent a significant challenge. The market also needs to overcome technological limitations in perfectly replicating nuanced human speech patterns and emotions. Addressing these challenges through technological advancements and robust ethical guidelines will be crucial for the sustainable and responsible growth of the AI voice changer tool market.

Competition among numerous players such as Voice-Swap, Clipchamp, Lovo.ai, Speechify, PlayHT, Murf, Synthesys, VocaliD, Respeecher, Speechelo, Wavve, Altered, Listnr AI, and ReadSpeaker will further influence market dynamics. The market segmentation, while not explicitly provided, can be logically inferred as encompassing different pricing tiers, software vs. cloud-based solutions, and specific application areas (e.g., gaming, e-learning).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
Paper,
Code,
Project Page
Interspeech 2024
TL;DR: We show that better detection of deepfake speech from codec-based TTS systems can be achieved by training models on speech re-synthesized with neural audio codecs. This dataset is released for this purpose. See our paper and Github for more details on using our dataset.
Acknowledgement… See the full description on the dataset page: https://huggingface.co/datasets/rogertseng/CodecFake_wavs.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Fake audio detection is a growing concern, and several relevant datasets have been designed for research, but there is no standard public Chinese dataset under additive noise conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate the fake audio. To simulate real-life scenarios, three noise datasets are selected for noise addition at five different signal-to-noise ratios. The FAD dataset can be used not only for fake audio detection but also for recognizing which algorithm produced a fake utterance, which is useful for audio forensics. Baseline results are presented with analysis; they show that generalizable fake audio detection remains challenging.
The FAD dataset is publicly available. The source code of the baselines is available on GitHub: https://github.com/ADDchallenge/FAD
The FAD dataset is designed to evaluate methods for fake audio detection, fake-algorithm recognition, and other relevant studies. To better study the robustness of these methods under the noisy conditions encountered in real life, we also construct a corresponding noisy dataset. The full FAD dataset therefore comes in two versions: a clean version and a noisy version. Both versions are divided into disjoint training, development, and test sets in the same way, with no speaker overlap across the three subsets. Each test set is further divided into seen and unseen test sets; the unseen test sets evaluate the generalization of the methods to unknown types. It is worth mentioning that both the real audio and the fake audio in the unseen test set are unknown to the model.
For the noisy speech part, we select three noise databases for simulation. Additive noise is added to each audio clip in the clean dataset at 5 different SNRs. The additive noise of the unseen test set and that of the remaining subsets come from different noise databases. In each version of the FAD dataset, there are 138,400 utterances in the training set, 14,400 in the development set, 42,000 in the seen test set, and 21,000 in the unseen test set. More detailed statistics are given in Table 2.
Clean Real Audios Collection
To eliminate the interference of irrelevant factors, we collect clean real audio from two sources: five open resources from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.
Clean Fake Audios Generation
We select 11 representative speech synthesis methods to generate fully fake audio, plus one method that produces partially fake audio.
Noisy Audios Simulation
Noisy audio aims to quantify the robustness of the methods under noisy conditions. To simulate real-life scenarios, we artificially sample noise signals and add them to the clean audio at 5 different SNRs: 0 dB, 5 dB, 10 dB, 15 dB and 20 dB. The additive noise is selected from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.
This data set is licensed with a CC BY-NC-ND 4.0 license.
You can cite the data using the following BibTeX entry:
@inproceedings{ma2022fad,
title={FAD: A Chinese Dataset for Fake Audio Detection},
author={Haoxin Ma and Jiangyan Yi and Chenglong Wang and Xunrui Yan and Jianhua Tao and Tao Wang and Shiming Wang and Le Xu and Ruibo Fu},
booktitle={Submitted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
year={2022},
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Famous Figures: Collecting, Curating, and Annotating Good Quality Speech Deepfake Dataset for Famous Figures – Process and Challenges
Paper, Project Page
Authors: Hashim Ali, Surya Subramani, Raksha Varahamurthy, Nithin Adupa, Lekha Bollinani, Hafiz Malik
Interspeech 2025
Abstract: Current audio deepfake detection systems fail to protect specific individuals from targeted voice spoofing attacks. A comprehensive methodology for collecting, curating, and generating high-quality speech… See the full description on the dataset page: https://huggingface.co/datasets/issf/famousfigures.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The FaciaVox dataset is an extensive multimodal biometric resource designed to enable in-depth exploration of face-image and voice recording research areas in both masked and unmasked scenarios.
Features of the Dataset:
Multimodal Data: A total of 1,800 face images (JPG) and 6,000 audio recordings (WAV) were collected, enabling cross-domain analysis of visual and auditory biometrics.
Participants were categorized into four age groups for structured labeling:
- Label 1: Under 16 years
- Label 2: 16 to less than 31 years
- Label 3: 31 to less than 46 years
- Label 4: 46 years and above
Sibling Data: Some participants are siblings, adding a challenging layer for speaker identification and facial recognition tasks due to genetic similarities in vocal and facial features. Sibling relationships are documented in the accompanying "FaciaVox List" data file.
Standardized Filenames: The dataset uses a consistent, intuitive naming convention for both facial images and voice recordings. Each filename includes:
- Type (F: Face Image, V: Voice Recording)
- Participant ID (e.g., sub001)
- Mask Type (e.g., a: unmasked, b: disposable mask, etc.)
- Zoom Level or Sentence ID (e.g., 1x, 3x, 5x for images, or a specific sentence identifier {01, 02, 03, ..., 10} for recordings)
Diverse Demographics: 19 different countries.
A challenging face recognition problem involving reflective mask shields and severe lighting conditions.
Each participant uttered 7 English statements and 3 Arabic statements, regardless of their native language. This adds a challenge for speaker identification.
Research Applications
FaciaVox is a versatile dataset supporting a wide range of research domains, including but not limited to:
- Speaker Identification (SI) and Face Recognition (FR): Evaluating biometric systems under varying conditions.
- Impact of Masks on Biometrics: Investigating how different facial coverings affect recognition performance.
- Language Impact on SI: Exploring the effects of native and non-native speech on speaker identification.
- Age and Gender Estimation: Inferring demographic information from voice and facial features.
- Race and Ethnicity Matching: Studying biometrics across diverse populations.
- Synthetic Voice and Deepfake Detection: Detecting cloned or generated speech.
- Cross-Domain Biometric Fusion: Combining facial and vocal data for robust authentication.
- Speech Intelligibility: Assessing how masks influence speech clarity.
- Image Inpainting: Reconstructing occluded facial regions for improved recognition.
Researchers can use the facial images and voice recordings independently or in combination to explore multimodal biometric systems. The standardized filenames and accompanying metadata make it easy to align visual and auditory data for cross-domain analyses. Sibling relationships and demographic labels add depth for tasks such as familial voice recognition, demographic profiling, and model bias evaluation.