100+ datasets found

test-audio-dataset
huggingface.co
Updated Jul 10, 2024
Cite
Apolinário from multimodal AI art (2024). test-audio-dataset [Dataset]. https://huggingface.co/datasets/multimodalart/test-audio-dataset
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 10, 2024
Authors
Apolinário from multimodal AI art
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
multimodalart/test-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Deepfake-Audio-Dataset
huggingface.co
Updated Mar 28, 2024
Cite
Hem Bahadur Gurung (2024). Deepfake-Audio-Dataset [Dataset]. https://huggingface.co/datasets/Hemg/Deepfake-Audio-Dataset
Dataset updated
Mar 28, 2024
Authors
Hem Bahadur Gurung
Description
Dataset Card for "Deepfake-Audio-Dataset"

More Information needed
audio-datasets
huggingface.co
Updated Jan 7, 2025
Cite
Gijs (2025). audio-datasets [Dataset]. https://huggingface.co/datasets/gijs/audio-datasets
Dataset updated
Jan 7, 2025
Authors
Gijs
Description
This dataset repository contains all the text files of the datasets analysed in the Survey Paper on Audio Datasets of Scenes and Events. See here for the paper. The GitHub repository containing the scripts is shared here, including a bash script to download the audio data for each of the datasets. This repository also includes a Python file, dataset.py, for easy importing of each of the datasets. Please respect the original license of the dataset owner when downloading the data:… See the full description on the dataset page: https://huggingface.co/datasets/gijs/audio-datasets.
Audio-FLAN-Dataset
huggingface.co
Updated Feb 24, 2025
+ more versions
Cite
HKUST Audio (2025). Audio-FLAN-Dataset [Dataset]. https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset
Dataset updated
Feb 24, 2025
Dataset authored and provided by
HKUST Audio
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Audio-FLAN Dataset (Paper)

(The full audio files and jsonl files are still being updated.) An Instruction-Tuning Dataset for Unified Audio Understanding and Generation Across Speech, Music, and Sound.

1. Dataset Structure

The Audio-FLAN-Dataset has the following directory structure:

Audio-FLAN-Dataset/
├── audio_files/
│   ├── audio/
│   │   └── 177_TAU_Urban_Acoustic_Scenes_2022/
│   │   └── 179_Audioset_for_Audio_Inpainting/
│   │   └── ...
│   ├── music/
│   │   └──…

See the full description on the dataset page: https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset.
audio-alpaca
huggingface.co
Updated Apr 16, 2024
Cite
Deep Cognition and Language Research (DeCLaRe) Lab (2024). audio-alpaca [Dataset]. https://huggingface.co/datasets/declare-lab/audio-alpaca
Dataset updated
Apr 16, 2024
Dataset authored and provided by
Deep Cognition and Language Research (DeCLaRe) Lab
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Audio-alpaca: A preference dataset for aligning text-to-audio models

Audio-alpaca is a pairwise preference dataset containing about 15k (prompt, chosen, rejected) triplets where, given a textual prompt, chosen is the preferred generated audio and rejected is the undesirable audio.

Field details

prompt: Given textual prompt
chosen: The preferred audio sample
rejected: The rejected audio sample
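
As a rough illustration of the three fields above, the following minimal sketch models one preference triplet in Python. The class name, the helper function, and the use of file paths to stand in for audio are assumptions made for this example; the dataset card excerpt does not say how the audio is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceTriplet:
    """One (prompt, chosen, rejected) record.

    Audio is referenced by file path here, which is an assumption
    made for illustration only.
    """
    prompt: str    # textual prompt given to the text-to-audio model
    chosen: str    # hypothetical path to the preferred audio sample
    rejected: str  # hypothetical path to the undesirable audio sample


def to_pairwise_examples(triplet):
    """Expand a triplet into two (prompt, audio, label) rows; label 1 means preferred."""
    return [
        (triplet.prompt, triplet.chosen, 1),
        (triplet.prompt, triplet.rejected, 0),
    ]


# Hypothetical record; the prompt text and file names are made up.
example = PreferenceTriplet(
    prompt="A dog barking in the rain",
    chosen="chosen_0001.wav",
    rejected="rejected_0001.wav",
)
rows = to_pairwise_examples(example)
```

Flattening a triplet into labelled rows like this is one common way to feed preference data to a classifier-style reward model; other training setups may consume the triplet directly.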
VALID
huggingface.co
Updated Dec 5, 2024
Cite
Ontocord.AI (2024). VALID [Dataset]. https://huggingface.co/datasets/ontocord/VALID
Dataset updated
Dec 5, 2024
Dataset provided by
Ontocord.AI
Description
VALID (Video-Audio Large Interleaved Dataset)

Overview

The VALID (Video-Audio Large Interleaved Dataset) is a multimodal dataset comprising approximately 720,000 Creative Commons licensed videos crawled from YouTube, and processed into audio-video-text data records for machine learning research. The dataset provides a unique opportunity for training models to understand relationships between modalities such as video frames, audio clips, and multilingual textual data… See the full description on the dataset page: https://huggingface.co/datasets/ontocord/VALID.
quran-audio-dataset
huggingface.co
Updated Aug 2, 2025
+ more versions
Cite
saeed ahmed (2025). quran-audio-dataset [Dataset]. https://huggingface.co/datasets/saeedahmedv/quran-audio-dataset
Dataset updated
Aug 2, 2025
Authors
saeed ahmed
Description
saeedahmedv/quran-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
audio-files-v3
huggingface.co
Updated Aug 3, 2025
+ more versions
Cite
Ramon (2025). audio-files-v3 [Dataset]. https://huggingface.co/datasets/mon1111/audio-files-v3
Dataset updated
Aug 3, 2025
Authors
Ramon
Description
mon1111/audio-files-v3 dataset hosted on Hugging Face and contributed by the HF Datasets community
Golos
huggingface.co
Updated Sep 5, 2022
+ more versions
Cite
SberDevices (2022). Golos [Dataset]. https://huggingface.co/datasets/SberDevices/Golos
Dataset updated
Sep 5, 2022
Authors
SberDevices
Description
Golos dataset

Golos is a Russian corpus suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on a crowd-sourcing platform. The total duration of the audio is about 1240 hours. We have made the corpus freely available for downloading, along with the acoustic model prepared on this corpus. We also created a 3-gram KenLM language model using an open Common Crawl corpus.

Dataset structure

Domain Train files Train hours… See the full description on the dataset page: https://huggingface.co/datasets/SberDevices/Golos.
DODa-audio-dataset
huggingface.co
Updated Nov 7, 2025
Cite
AtlasIA (2025). DODa-audio-dataset [Dataset]. https://huggingface.co/datasets/atlasia/DODa-audio-dataset
Dataset updated
Nov 7, 2025
Dataset authored and provided by
AtlasIA
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Moroccan Darija Speech Dataset

Overview

This dataset consists of 12,743 parallel text and speech samples for Moroccan Darija, including its transcription in both Latin and Arabic scripts and English translations. It was created to support speech recognition, language modeling, and NLP tasks for Moroccan Darija.

Dataset Source

The dataset was originally sourced from this repository, where it was available as a CSV file containing three columns:

darija:… See the full description on the dataset page: https://huggingface.co/datasets/atlasia/DODa-audio-dataset.
doc-audio-11
huggingface.co
Updated Aug 2, 2024
Cite
Datasets examples (2024). doc-audio-11 [Dataset]. https://huggingface.co/datasets/datasets-examples/doc-audio-11
Dataset updated
Aug 2, 2024
Dataset authored and provided by
Datasets examples
Description
[doc] audio dataset 11

This dataset contains two tar files that contain pairs of samples with one audio file and one JSON file.
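
To make the "one audio file and one JSON file per sample" layout concrete, here is a minimal sketch using only the Python standard library. It builds a tiny in-memory tar and then groups members by basename. The file names, the `.wav` extension, and the JSON contents are assumptions made for this example; the actual archives in the dataset may be organised differently.

```python
import io
import json
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def build_demo_tar():
    """Create a tiny in-memory tar holding one fake audio file and one JSON file per sample."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key in ("sample_000", "sample_001"):
            members = {
                f"{key}.wav": b"RIFF-fake-audio-bytes",  # placeholder bytes, not real audio
                f"{key}.json": json.dumps({"id": key}).encode("utf-8"),
            }
            for name, payload in members.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def read_pairs(fileobj):
    """Group tar members by basename and return {key: (audio_bytes, metadata_dict)}."""
    grouped = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            path = PurePosixPath(member.name)
            grouped[path.stem][path.suffix] = tar.extractfile(member).read()

    pairs = {}
    for key, files in grouped.items():
        # Keep only complete pairs: exactly one JSON file plus one other file.
        if ".json" in files and len(files) == 2:
            audio_ext = next(ext for ext in files if ext != ".json")
            pairs[key] = (files[audio_ext], json.loads(files[".json"]))
    return pairs


pairs = read_pairs(build_demo_tar())
```

Pairing by shared basename is a common convention for tar-based sample archives; whether this dataset follows it exactly should be checked against the dataset page.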
esb-datasets-test-only
huggingface.co
Updated Sep 9, 2023
+ more versions
Cite
Hugging Face for Audio (2023). esb-datasets-test-only [Dataset]. https://huggingface.co/datasets/hf-audio/esb-datasets-test-only
Dataset updated
Sep 9, 2023
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
Hugging Face for Audio
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
All eight datasets in ESB can be downloaded and prepared in just a single line of code through the Hugging Face Datasets library:

from datasets import load_dataset

librispeech = load_dataset("esb/datasets", "librispeech", split="train")

"esb/datasets": the repository namespace. This is fixed for all ESB datasets.

"librispeech": the dataset name. This can be changed to any one of the eight datasets in ESB to download that dataset.

split="train": the split. Set this to one of… See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only.
Audio-Reasoner-CoTA
huggingface.co
Updated May 16, 2025
+ more versions
Cite
XIE ZHIFEI (2025). Audio-Reasoner-CoTA [Dataset]. https://huggingface.co/datasets/zhifeixie/Audio-Reasoner-CoTA
Dataset updated
May 16, 2025
Authors
XIE ZHIFEI
Description
zhifeixie/Audio-Reasoner-CoTA dataset hosted on Hugging Face and contributed by the HF Datasets community
audio-dataset-flickr-soundnet
huggingface.co
Updated Dec 3, 2023
Cite
christian simon (2023). audio-dataset-flickr-soundnet [Dataset]. https://huggingface.co/datasets/cssen/audio-dataset-flickr-soundnet
Dataset updated
Dec 3, 2023
Authors
christian simon
Description
cssen/audio-dataset-flickr-soundnet dataset hosted on Hugging Face and contributed by the HF Datasets community
YouTube-Commons
huggingface.co
Updated Apr 17, 2024
+ more versions
Cite
PleIAs (2024). YouTube-Commons [Dataset]. https://huggingface.co/datasets/PleIAs/YouTube-Commons
Dataset updated
Apr 17, 2024
Dataset authored and provided by
PleIAs
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
YouTube
Description
📺 YouTube-Commons 📺

YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-By license.

Content

The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels). In total, this represents nearly 45 billion words (44,811,518,375). All the videos were shared on YouTube with a CC-BY license: the dataset provides all the necessary provenance information… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.
AudioSet
huggingface.co
opendatalab.com
Updated Jul 4, 2024
Cite
Aaron Keesing (2024). AudioSet [Dataset]. https://huggingface.co/datasets/agkphysics/AudioSet
Dataset updated
Jul 4, 2024
Authors
Aaron Keesing
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Dataset Card for AudioSet

Dataset Summary

AudioSet is a dataset of 10-second clips from YouTube, annotated into one or more sound categories, following the AudioSet ontology.

Supported Tasks and Leaderboards

audio-classification: Classify audio clips into categories. The leaderboard is available here

Languages

The class labels in the dataset are in English.

Dataset Structure

Data Instances

Example instance from the dataset: {… See the full description on the dataset page: https://huggingface.co/datasets/agkphysics/AudioSet.
my-audio-dataset
huggingface.co
Updated Apr 7, 2025
+ more versions
Cite
Sandipan Ray (2025). my-audio-dataset [Dataset]. https://huggingface.co/datasets/nickfuryavg/my-audio-dataset
Dataset updated
Apr 7, 2025
Authors
Sandipan Ray
Description
nickfuryavg/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
sn105-audio-dataset-1
huggingface.co
Updated Jun 1, 2025
+ more versions
Cite
Bohdan Drozd (2025). sn105-audio-dataset-1 [Dataset]. https://huggingface.co/datasets/softdev629/sn105-audio-dataset-1
Dataset updated
Jun 1, 2025
Authors
Bohdan Drozd
Description
softdev629/sn105-audio-dataset-1 dataset hosted on Hugging Face and contributed by the HF Datasets community
my-audio-dataset
huggingface.co
Updated Oct 24, 2025
+ more versions
Cite
Minhui (2025). my-audio-dataset [Dataset]. https://huggingface.co/datasets/RenMinhui/my-audio-dataset
Dataset updated
Oct 24, 2025
Authors
Minhui
Description
RenMinhui/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
gigaspeech
huggingface.co
opendatalab.com
Updated Aug 30, 2022
+ more versions
Cite
SpeechColab (2022). gigaspeech [Dataset]. http://doi.org/10.57967/hf/6261
Unique identifier
https://doi.org/10.57967/hf/6261
Dataset updated
Aug 30, 2022
Dataset authored and provided by
SpeechColab
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality.
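
The description mentions capping the word error rate (WER) when filtering segments. The sketch below shows one conventional way such a filter could work: a word-level edit distance divided by the reference length, followed by a threshold check. The function names, the segment dictionary layout, and the sample texts are assumptions for illustration; the description does not specify how the GigaSpeech pipeline computes or applies WER.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def filter_segments(segments, max_wer):
    """Keep segments whose WER against the reference does not exceed max_wer."""
    return [
        seg for seg in segments
        if word_error_rate(seg["reference"], seg["hypothesis"]) <= max_wer
    ]


# Hypothetical segments: an exact match, one error in 50 words, one error in 6 words.
long_ref = " ".join(f"w{i}" for i in range(50))
long_hyp = " ".join("oops" if i == 10 else f"w{i}" for i in range(50))
segments = [
    {"id": "exact", "reference": "the cat sat on the mat", "hypothesis": "the cat sat on the mat"},
    {"id": "one_error_in_50", "reference": long_ref, "hypothesis": long_hyp},
    {"id": "one_error_in_6", "reference": "the cat sat on the mat", "hypothesis": "the cat sat on a mat"},
]

kept_at_4_percent = [seg["id"] for seg in filter_segments(segments, 0.04)]
kept_at_0_percent = [seg["id"] for seg in filter_segments(segments, 0.0)]
```

With a 4% cap, a segment with one wrong word in fifty (2% WER) passes, while a 0% cap keeps only exact matches, which mirrors the difference between the XL subset and the smaller subsets described above.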

test-audio-dataset (multimodalart/test-audio-dataset): 9 scholarly articles cite this dataset (per Google Scholar).
