5 datasets found
  1. In The Wild (audio Deepfake)

    • kaggle.com
    zip
    Updated Apr 20, 2024
    Cite
    Abdalla Mohamed (2024). In The Wild (audio Deepfake) [Dataset]. https://www.kaggle.com/datasets/abdallamohamed312/in-the-wild-audio-deepfake
    Explore at:
    10 scholarly articles cite this dataset (per Google Scholar)
    Available download formats: zip (0 bytes)
    Dataset updated
    Apr 20, 2024
    Authors
    Abdalla Mohamed
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    'In-the-Wild' Dataset

    We present a dataset of audio deepfakes (and corresponding benign audio) for a set of politicians and other public figures, collected from publicly available sources such as social networks and video streaming platforms. For n = 58 celebrities and politicians, we collect both bona-fide and spoofed audio. In total, we collect 20.8 hours of bona-fide and 17.2 hours of spoofed audio. On average, there are 23 minutes of bona-fide and 18 minutes of spoofed audio per speaker.

    The dataset is intended to be used for evaluating deepfake detection and voice anti-spoofing machine-learning models. It is especially useful to judge a model's capability to generalize to realistic, in-the-wild audio samples. Find more information in our paper, and download the dataset here.

    The most interesting deepfake detection models we used in our experiments are open-source on GitHub:

    • RawNet 2
    • RawGAT-ST
    • PC-Darts

    This dataset and the associated documentation are licensed under the Apache License, Version 2.0.
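    As a minimal sketch of how this dataset might be used for evaluation, the snippet below scores each file with a detector and computes the equal error rate (EER), the usual anti-spoofing metric. The meta.csv file name and its "file"/"label" columns are assumptions for illustration, not a confirmed schema; substitute your own file list and a real model's scores.

      # Sketch: compute EER from per-file detector scores.
      # Assumes a metadata CSV with "file" and "label" columns, where
      # label is "bona-fide" or "spoof" -- adjust to the actual layout.
      import numpy as np
      import pandas as pd
      from sklearn.metrics import roc_curve

      def eer(labels, scores):
          """EER: operating point where false-accept and false-reject rates meet."""
          fpr, tpr, _ = roc_curve(labels, scores)
          fnr = 1.0 - tpr
          idx = np.nanargmin(np.abs(fnr - fpr))
          return (fpr[idx] + fnr[idx]) / 2.0

      meta = pd.read_csv("meta.csv")  # hypothetical metadata file
      labels = (meta["label"] == "spoof").astype(int).to_numpy()
      scores = np.random.default_rng(0).random(len(meta))  # placeholder for real detector scores
      print(f"EER: {eer(labels, scores):.3f}")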

  2. Speaker Recognition - CMU ARCTIC

    • kaggle.com
    Updated Nov 21, 2022
    Cite
    Gabriel Lins (2022). Speaker Recognition - CMU ARCTIC [Dataset]. https://www.kaggle.com/datasets/mrgabrielblins/speaker-recognition-cmu-arctic
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 21, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gabriel Lins
    License

    CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description
    • Can you predict which speaker is talking?
    • Can you predict what they are saying?

    This dataset makes both possible. Perfect for a school project, research project, or resume builder.

    File information

    • train.csv - contains all the data you need for training, with 4 columns: id (file id), file_path (path to the .wav file), speech (transcription of the audio file), and speaker (the target column)
    • test.csv - contains all the data you need to test your model (20% of the audio files); it has the same columns as train.csv
    • train/ - folder with the training data, subdivided into one folder per speaker
      • aew/ - folder containing .wav audio files for speaker aew
      • ...
    • test/ - folder containing the audio files for the test data

    Column description

    Column      Description
    id          file id (string)
    file_path   path to the .wav file (string)
    speech      transcription of the audio file (string)
    speaker     speaker name; use this as the target variable for audio classification (string)
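    As a rough baseline using the columns above, one could average MFCC features per clip and fit a linear classifier, as sketched below. The paths and hyperparameters are illustrative, and librosa/scikit-learn are one choice of tooling, not part of the dataset.

      # Sketch: speaker classification from train.csv / test.csv using
      # the documented columns file_path (wav path) and speaker (target).
      import librosa
      import numpy as np
      import pandas as pd
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import accuracy_score

      def clip_features(path):
          """Mean MFCC vector for one clip: a crude but serviceable embedding."""
          y, sr = librosa.load(path, sr=16000)
          return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)

      train = pd.read_csv("train.csv")
      test = pd.read_csv("test.csv")
      X_train = np.stack([clip_features(p) for p in train["file_path"]])
      X_test = np.stack([clip_features(p) for p in test["file_path"]])

      clf = LogisticRegression(max_iter=1000).fit(X_train, train["speaker"])
      print("accuracy:", accuracy_score(test["speaker"], clf.predict(X_test)))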

    More Details

    The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, US-English single-speaker databases designed for unit-selection speech-synthesis research. A detailed report on the structure and content of the databases, the recording environment, etc. is available as Carnegie Mellon University Language Technologies Institute Tech Report CMU-LTI-03-177.

    The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers.

    The 1132-sentence prompt list is available from cmuarctic.data.

    The distributions include 16 kHz waveforms and simultaneous EGG signals. Full phonetic labeling was performed with CMU Sphinx using the FestVox-based labeling scripts. Complete runnable Festival voices are included with the database distributions as examples, though better voices can be made by improving the labeling, etc.

    Acknowledgements

    This work was partially supported by the U.S. National Science Foundation under Grant No. 0219687, "ITR/CIS Evaluation and Personalization of Synthetic Voices". Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

  3. A Kiswahili Dataset for Development of Text-To-Speech System

    • data.mendeley.com
    Updated Nov 30, 2021
    Cite
    Kiptoo Rono (2021). A kiswahili Dataset for Development of Text-To-Speech System [Dataset]. http://doi.org/10.17632/vbvj6j6pm9.1
    Explore at:
    Dataset updated
    Nov 30, 2021
    Authors
    Kiptoo Rono
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains paired Kiswahili text and audio: 7,108 text files with corresponding audio files. It was created from open-source, non-copyrighted material, a Kiswahili audio Bible, whose authors permit use for non-profit, educational, and public-benefit purposes. The downloaded audio files were longer than 12.5 s, so they were programmatically split into short clips based on silence and then recombined to random lengths such that each resulting audio file is between 1 and 12.5 s long. This was done using Python 3. The audio files were saved as single-channel, 16-bit PCM WAVE files with a sampling rate of 22.05 kHz.

    The dataset contains approximately 106,000 Kiswahili words, a mean of 14.96 words per text file, transcribed and saved in CSV format. Each text file is divided into three parts: a unique ID, the transcribed words, and the normalized words. The unique ID is a number assigned to each text file; the transcribed words are the text spoken by the reader; the normalized text expands abbreviations and numbers into full words. Each audio clip is assigned the same unique ID as its corresponding text file.
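    The preprocessing described above can be approximated with pydub, as sketched below. The silence thresholds are guesses rather than the authors' actual parameters, and the random-length recombination step is simplified here to a length filter.

      # Sketch: split a long recording on silence, then export mono,
      # 16-bit PCM WAV at 22.05 kHz, keeping clips of 1-12.5 s.
      from pydub import AudioSegment
      from pydub.silence import split_on_silence

      audio = AudioSegment.from_file("chapter.mp3")  # hypothetical input file
      chunks = split_on_silence(
          audio,
          min_silence_len=300,             # ms of silence counted as a break (assumed)
          silence_thresh=audio.dBFS - 16,  # threshold relative to loudness (assumed)
          keep_silence=100,                # ms of padding kept on each side
      )

      for i, chunk in enumerate(chunks):
          clip = chunk.set_channels(1).set_frame_rate(22050).set_sample_width(2)
          if 1000 <= len(clip) <= 12500:   # len() is in milliseconds
              clip.export(f"clip_{i:05d}.wav", format="wav")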

  4. Wearable SELD Dataset

    • zenodo.org
    • explore.openaire.eu
    • +1 more
    bin, pdf, zip
    Updated Jul 17, 2024
    Cite
    Nagatomo Kento; Yasuda Masahiro; Yatabe Kohei; Saito Shoichiro; Oikawa Yasuhiro (2024). Wearable SELD Dataset [Dataset]. http://doi.org/10.5281/zenodo.6030111
    Explore at:
    Available download formats: bin, pdf, zip
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nagatomo Kento; Yasuda Masahiro; Yatabe Kohei; Saito Shoichiro; Oikawa Yasuhiro
    Description

    The Wearable SELD dataset is a dataset for developing sound event localization and detection (SELD) systems with wearable devices. It contains recordings collected using wearable devices such as earphones, a neck speaker, headphones, and glasses. The dataset comprises the three subsets below.

    • Earphone-type dataset: recordings collected by 12 microphones placed around the ears, mimicking earphones.
    • Mounting-type dataset: recordings collected by 12 microphones placed around the head on accessories mimicking glasses, a headphone, and a neck speaker.
    • FOA-format dataset: 4-channel recordings collected by an ambisonic microphone, allowing comparison between conventional methods using the first-order Ambisonics (FOA) format and methods using the datasets above.
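    A quick sanity check when working with these subsets is to verify the channel count of each recording; a sketch with the soundfile library follows. The file paths are hypothetical, and the expected channel counts are taken from the subset descriptions above.

      # Sketch: confirm expected channel counts per subset.
      import soundfile as sf

      for path, expected in [
          ("earphone/mix001.wav", 12),  # earphone-type: 12 channels
          ("mounting/mix001.wav", 12),  # mounting-type: 12 channels
          ("foa/mix001.wav", 4),        # FOA: 4 channels
      ]:
          audio, sr = sf.read(path)     # audio shape: (num_samples, num_channels)
          assert audio.ndim == 2 and audio.shape[1] == expected, path
          print(f"{path}: {audio.shape[1]} channels at {sr} Hz")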

    Further information is available at https://github.com/nttrd-mdlab/wearable-seld-dataset/

    License: see the file named LICENSE.pdf

  5. MusicCaps

    • huggingface.co
    Updated Jan 27, 2023
    + more versions
    Cite
    Google (2023). MusicCaps [Dataset]. https://huggingface.co/datasets/google/MusicCaps
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 27, 2023
    Dataset authored and provided by
    Google (http://google.com/)
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for MusicCaps

    Dataset Summary

    The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians. An aspect list is for example "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead", while the caption consists of multiple sentences about the music, e.g., "A low sounding male voice is rapping over a fast paced drums… See the full description on the dataset page: https://huggingface.co/datasets/google/MusicCaps.
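    The dataset can be loaded directly from the Hugging Face Hub, as in the minimal sketch below. The aspect_list and caption field names follow the summary above but should be verified against the dataset page; note the audio itself is referenced from YouTube rather than bundled.

      # Sketch: load MusicCaps and inspect one labeled example.
      from datasets import load_dataset

      ds = load_dataset("google/MusicCaps", split="train")
      print(len(ds))                 # expected: 5521 examples
      example = ds[0]
      print(example["aspect_list"])  # e.g., "pop, tinny wide hi hats, ..."
      print(example["caption"])      # multi-sentence free-text description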

