Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes both VoxCeleb and VoxCeleb2
Multipart Zips
Already-joined zips are also provided for convenience. Note that these files are NOT part of the original datasets: vox2_mp4_1.zip - vox2_mp4_6.zip and vox2_aac_1.zip - vox2_aac_2.zip
Joining Zip
cat vox1_dev* > vox1_dev_wav.zip
cat vox2_dev_aac* > vox2_aac.zip
cat vox2_dev_mp4* > vox2_mp4.zip
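Where `cat` is not available (for example on Windows), the same byte-wise concatenation can be sketched in Python. This is a minimal sketch, not part of the dataset's own tooling: the function name `join_parts` is hypothetical, and it assumes that sorting the part file names lexicographically yields the correct order, as the shell glob does for these names.

```python
import glob
import os
import shutil
from typing import List


def join_parts(pattern: str, out_path: str) -> List[str]:
    """Concatenate every file matching `pattern` into `out_path`, in sorted order.

    Returns the list of part files that were joined.
    """
    out_abs = os.path.abspath(out_path)
    # Sort the matches (assumed to give the right part order) and skip the
    # output file itself in case it already exists and matches the pattern.
    parts = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the copy so large parts are never fully loaded into memory.
                shutil.copyfileobj(src, out)
    return parts
```

Under these assumptions, `join_parts("vox1_dev*", "vox1_dev_wav.zip")` corresponds to the first `cat` command above.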
Citation Information
@article{Nagrani19, author = "Arsha Nagrani and Joon~Son Chung and Weidi Xie and Andrew… See the full description on the dataset page: https://huggingface.co/datasets/ProgramComputer/voxceleb.
A large-scale dataset for speaker identification. The data is collected from 1,251 speakers, with over 150k samples in total. This release contains the audio part of the voxceleb1.1 dataset.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('voxceleb', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
This dataset contains only test data, which is integrated into the UltraEval-Audio (https://github.com/OpenBMB/UltraEval-Audio) framework.
python audio_evals/main.py --dataset voxceleb1 --model gpt4o_audio
python audio_evals/main.py --dataset voxceleb2 --model gpt4o_audio
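🚀 An exceptional experience, all in UltraEval-Audio 🚀
UltraEval-Audio is the world's first open-source framework to support evaluation of both speech understanding and speech generation. Built specifically for evaluating large speech models, it brings together 34 authoritative benchmarks covering four domains (speech, sound, medicine, and music), supports ten languages, and spans twelve task types. With UltraEval-Audio, you get unprecedented convenience and efficiency:
One-click benchmark management… See the full description on the dataset page: https://huggingface.co/datasets/TwinkStart/VoxCeleb.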
https://academictorrents.com/nolicensespecified
This torrent shares the VoxCeleb1 and VoxCeleb2 datasets. The original dataset creators no longer provide access to the dataset. To ensure that papers in the field of speaker recognition can be reproduced (many have used VoxCeleb in recent years), the data should be available for academic purposes. The audio data is stored as mono-channel, 16,000 Hz, signed 16-bit (little-endian) PCM WAV files. This torrent does not include video data.
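The stated audio format can be checked per file with Python's standard `wave` module. The following is a minimal sketch, and the function name `matches_voxceleb_wav_format` is hypothetical. It relies on the fact that 16-bit PCM in a WAV container is signed and little-endian by definition, so only channel count, sample rate, and sample width need to be inspected.

```python
import wave


def matches_voxceleb_wav_format(path: str) -> bool:
    """Return True if the WAV file is mono, 16,000 Hz, 16-bit uncompressed PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1          # mono
            and wav.getframerate() == 16000  # 16,000 Hz sample rate
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16-bit
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )
```

Running this over a downloaded directory is a cheap way to catch files that were resampled or re-encoded along the way.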
DynamicSuperb/SentimentAnalysis_SLUE-VoxCeleb dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Arya Gokhale
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The training lists with noisy labels based on the VoxCeleb dataset
The clean training list for VoxCeleb1 is vox1_clean.txt.
The clean training list for VoxCeleb2 is vox2_clean.txt.
The noisy training lists for VoxCeleb1 are formatted as vox1_[noisy_type]_[noisy_rate].txt.
The noisy training lists for VoxCeleb2 are formatted as vox2_[noisy_type]_[noisy_rate].txt.
The noisy training lists for VoxCeleb1K-O are formatted as vox1k_[noisy_type]_[noisy_rate].txt.
The noisy training lists for VoxCeleb5K-O are formatted as vox5k_[noisy_type]_[noisy_rate].txt.
The evaluation lists are vox_O.txt, vox_E.txt, and vox_H.txt.
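The naming scheme for the noisy training lists can be parsed with a small helper. This is a sketch under assumptions: the function and class names are hypothetical, the actual values of `noisy_type` and `noisy_rate` are not given in the description (so both are kept as raw strings), and the rate is assumed to contain no underscore.

```python
import re
from typing import NamedTuple, Optional


class NoisyList(NamedTuple):
    dataset: str     # "vox1", "vox2", "vox1k", or "vox5k"
    noisy_type: str  # noise type tag exactly as it appears in the file name
    noisy_rate: str  # noise rate tag, kept as a string since its format is unspecified


# Prefix, then the noise type, then the noise rate as the last underscore-separated field.
_PATTERN = re.compile(r"^(vox1k|vox5k|vox1|vox2)_(.+)_([^_]+)\.txt$")


def parse_noisy_list_name(filename: str) -> Optional[NoisyList]:
    """Split a noisy-list file name into its parts, or return None if it does not fit."""
    match = _PATTERN.match(filename)
    if match is None:
        return None  # e.g. the clean lists and the evaluation lists
    return NoisyList(*match.groups())
```

Names such as `vox1_clean.txt` or `vox_O.txt` do not follow the two-field pattern, so the helper returns `None` for them.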
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
VoxCeleb2 Language-Detected Subset
This dataset is a language-labeled version of the VoxCeleb2 speaker identification dataset. It was created using the ProgramComputer/voxceleb Hugging Face dataset and the speechbrain/lang-id-voxlingua107-ecapa language identification model.
Dataset Contents
The dataset consists of two CSV files:
audio_clips_meta_data.csv: Contains metadata for each audio clip, including:
clip_id: Unique identifier for the audio clip. speaker_id: ID… See the full description on the dataset page: https://huggingface.co/datasets/johbac/voxceleb-language-metadata.
This dataset was created by Yosra Hashem
It contains the following files:
This dataset was created by K Tharun Chowdary
diabolocom/voxceleb-Mimi-filtered dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was constructed from the test split of the VoxCeleb 2 dataset (VoxCeleb). The VoxCeleb 2 test set contains 118 speakers, each appearing in several different videos. To build this dataset, only one video per speaker was selected. A face image was extracted from each video, along with a low-resolution (8x8) version of that image. The age, gender, and ethnicity of the person in the face image were estimated using DeepFace, a face recognition and facial attribute analysis library.
This dataset can be used to evaluate speech2face, speech-conditioned face generation, and speech-conditioned face super-resolution systems.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
We propose a novel Voxceleb-3D dataset that includes paired voices and 3D face models. Voxceleb-3D is derived from two widely used datasets, Voxceleb and VGGFace, which include voices and face images of celebrities, respectively.
mteb/voxceleb-sentiment dataset hosted on Hugging Face and contributed by the HF Datasets community
ahmedelsayed/VoxCeleb-Gender dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Cleaned dataset for voice gender detection, built from the VoxCeleb dataset (7000+ unique speakers and utterances, 3683 males / 2312 females). VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube. VoxCeleb contains speech from speakers spanning a wide range of different ethnicities, accents, professions and ages.
VoxCeleb 2
VoxCeleb2 contains over 1 million utterances for 6,112 celebrities, extracted from videos uploaded to YouTube.
Verification Split
             train     validation  test
speakers     5,994     5,994       118
utterances   982,808   109,201     36,237
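As a quick sanity check, the split figures can be summed and compared with the headline numbers. This sketch assumes the first row of the table counts speakers and the second counts utterances, and that train and validation share the same speakers while the test speakers are disjoint from them.

```python
# Figures copied from the VoxCeleb2 verification split table.
speakers = {"train": 5994, "validation": 5994, "test": 118}
utterances = {"train": 982808, "validation": 109201, "test": 36237}

# Total utterances across all three splits.
total_utterances = sum(utterances.values())

# Train and validation are assumed to share speakers, so distinct
# speakers = train speakers + (assumed disjoint) test speakers.
total_speakers = speakers["train"] + speakers["test"]

print(total_utterances)  # 1128246, consistent with "over 1 million utterances"
print(total_speakers)    # 6112, consistent with "6,112 celebrities"
```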
Data Fields
ID (string): The ID of the sample with format
morateng/CapTTS-SFT-voxceleb-cleaned dataset hosted on Hugging Face and contributed by the HF Datasets community
VoxCeleb 1
VoxCeleb1 contains over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube.
Identification Split
             train     validation  test
speakers     1,251     1,251       1,251
utterances   138,361   6,904       8,251