100+ datasets found

Audio-FLAN-Dataset
huggingface.co
Updated Apr 27, 2025
Cite
HKUST Audio (2025). Audio-FLAN-Dataset [Dataset]. https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset
Dataset updated
Apr 27, 2025
Dataset authored and provided by
HKUST Audio
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Audio-FLAN Dataset (Paper)

An Instruction-Tuning Dataset for Unified Audio Understanding and Generation Across Speech, Music, and Sound. (The full audio files and JSONL files are still being updated.)

1. Dataset Structure

The Audio-FLAN-Dataset has the following directory structure:

Audio-FLAN-Dataset/
├── audio_files/
│   ├── audio/
│   │   └── 177_TAU_Urban_Acoustic_Scenes_2022/
│   │   └── 179_Audioset_for_Audio_Inpainting/
│   │   └── ...
│   ├── music/
│   │   └──…

See the full description on the dataset page: https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset.
esb-datasets-test-only
huggingface.co
Updated Sep 9, 2023
Cite
Hugging Face for Audio (2023). esb-datasets-test-only [Dataset]. https://huggingface.co/datasets/hf-audio/esb-datasets-test-only
Dataset updated
Sep 9, 2023
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
Hugging Face for Audio
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
All eight datasets in ESB can be downloaded and prepared in a single line of code through the Hugging Face Datasets library:

from datasets import load_dataset

librispeech = load_dataset("esb/datasets", "librispeech", split="train")

"esb/datasets": the repository namespace. This is fixed for all ESB datasets.

"librispeech": the dataset name. This can be changed to any one of the eight datasets in ESB to download that dataset.

split="train": the split. Set this to one of… See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only.
medical_asr_recording_dataset
huggingface.co
Updated Oct 19, 2023
Cite
Hani. M (2023). medical_asr_recording_dataset [Dataset]. https://huggingface.co/datasets/Hani89/medical_asr_recording_dataset
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Oct 19, 2023
Authors
Hani. M
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Data Source: Kaggle Medical Speech, Transcription, and Intent

Context

8.5 hours of audio utterances paired with text for common medical symptoms.

Content

This data contains thousands of audio utterances for common medical symptoms like “knee pain” or “headache,” totaling more than 8 hours in aggregate. Each utterance was created by individual human contributors based on a given symptom. These audio snippets can be used to train conversational agents in the medical field. This Figure Eight… See the full description on the dataset page: https://huggingface.co/datasets/Hani89/medical_asr_recording_dataset.
wolof-audio-data
huggingface.co
Updated Dec 14, 2024
Cite
Abdoulaye Diallo (2024). wolof-audio-data [Dataset]. https://huggingface.co/datasets/vonewman/wolof-audio-data
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Dec 14, 2024
Authors
Abdoulaye Diallo
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Wolof Audio Dataset

The Wolof Audio Dataset is a collection of audio recordings and their corresponding transcriptions in Wolof. This dataset is designed to support the development of Automatic Speech Recognition (ASR) models for the Wolof language. It was created by combining three existing datasets:

ALFFA: Available at serge-wilson/wolof_speech_transcription
FLEURS: Available at vonewman/fleurs-wolof-dataset
Urban Bus Wolof Speech Dataset: Available at vonewman/urban-bus-wolof…

See the full description on the dataset page: https://huggingface.co/datasets/vonewman/wolof-audio-data.
viet_bud500
huggingface.co
Updated Feb 29, 2024
Cite
Tran Khanh Linh (2024). viet_bud500 [Dataset]. https://huggingface.co/datasets/linhtran92/viet_bud500
Dataset updated
Feb 29, 2024
Authors
Tran Khanh Linh
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
Bud500: A Comprehensive Vietnamese ASR Dataset

Introducing Bud500, a diverse Vietnamese speech corpus designed to support the ASR research community. With approximately 500 hours of audio, it covers a broad spectrum of topics including podcasts, travel, books, food, and so on, while spanning accents from Vietnam's North, South, and Central regions. Derived from free public audio resources, this publicly accessible dataset is designed to significantly enhance the work of developers and… See the full description on the dataset page: https://huggingface.co/datasets/linhtran92/viet_bud500.
esb-datasets-test-only-sorted
huggingface.co
Updated Jul 2, 2024
Cite
Hugging Face for Audio (2024). esb-datasets-test-only-sorted [Dataset]. https://huggingface.co/datasets/hf-audio/esb-datasets-test-only-sorted
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Jul 2, 2024
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
Hugging Face for Audio
Description
ESB Test Sets: Parquet & Sorted

This dataset takes the open-asr-leaderboard/datasets-test-only data and sorts each split by audio length. The format is also changed from a custom loading script (unsafe remote code) to Parquet (safe). Broadly speaking, this dataset was generated with the following code snippet:

from datasets import load_dataset, get_dataset_config_names

DATASET = "open-asr-leaderboard/datasets-test-only" # dataset to load from
HUB_DATASET_ID =…

See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only-sorted.
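The snippet quoted above is truncated in this listing, so the author's actual code is not reproduced here. As a rough illustration of what "sorting each split by audio length" can mean, the following minimal sketch orders a few in-memory examples by duration. It assumes each example stores its audio as a dictionary with "array" and "sampling_rate" keys; the helper names audio_seconds and sort_by_audio_length, and the sample data, are hypothetical.

```python
from typing import Any


def audio_seconds(example: dict[str, Any]) -> float:
    """Return the clip duration in seconds (samples / sampling rate)."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def sort_by_audio_length(examples: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a new list ordered from the shortest clip to the longest."""
    return sorted(examples, key=audio_seconds)


# Hypothetical stand-in data: three silent clips of 2.0 s, 0.5 s and 1.0 s.
examples = [
    {"id": "a", "audio": {"array": [0.0] * 32000, "sampling_rate": 16000}},
    {"id": "b", "audio": {"array": [0.0] * 8000, "sampling_rate": 16000}},
    {"id": "c", "audio": {"array": [0.0] * 16000, "sampling_rate": 16000}},
]

ordered = sort_by_audio_length(examples)
print([example["id"] for example in ordered])  # prints ['b', 'c', 'a']
```

Grouping clips of similar length is commonly done so that evaluation batches need less padding; whether that was the motivation for this dataset is not stated in the description.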
my-audio-dataset
huggingface.co
Updated Jun 12, 2025
Cite
Rohini Koli (2025). my-audio-dataset [Dataset]. https://huggingface.co/datasets/Rohini076/my-audio-dataset
Dataset updated
Jun 12, 2025
Authors
Rohini Koli
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Rohini076/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
audio-dataset
huggingface.co
Updated Oct 20, 2024
Cite
Swapnik Varala (2024). audio-dataset [Dataset]. https://huggingface.co/datasets/Swapnik/audio-dataset
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Oct 20, 2024
Authors
Swapnik Varala
Description
Swapnik/audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
my-audio-dataset
huggingface.co
Updated Apr 7, 2025
Cite
Sandipan Ray (2025). my-audio-dataset [Dataset]. https://huggingface.co/datasets/nickfuryavg/my-audio-dataset
Dataset updated
Apr 7, 2025
Authors
Sandipan Ray
Description
nickfuryavg/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
clean-audio-dataset
huggingface.co
Updated Sep 15, 2024
Cite
Radhika Singh (2024). clean-audio-dataset [Dataset]. https://huggingface.co/datasets/radhika-singh/clean-audio-dataset
Dataset updated
Sep 15, 2024
Authors
Radhika Singh
Description
radhika-singh/clean-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
quranic_audio_dataset
huggingface.co
Cite
Raghad Salameh, quranic_audio_dataset [Dataset]. https://huggingface.co/datasets/RetaSy/quranic_audio_dataset
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Authors
Raghad Salameh
Description
Dataset Card for Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers

Dataset Summary

We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We developed a crowdsourcing platform called Quran Voice for annotating the… See the full description on the dataset page: https://huggingface.co/datasets/RetaSy/quranic_audio_dataset.
turkishvoicedataset
huggingface.co
Updated Jun 14, 2023
Cite
EREN FAZLIOĞLU (2023). turkishvoicedataset [Dataset]. https://huggingface.co/datasets/erenfazlioglu/turkishvoicedataset
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Jun 14, 2023
Authors
EREN FAZLIOĞLU
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Dataset Card for "turkishneuralvoice"

Dataset Overview

Dataset Name: Turkish Neural Voice

Description: This dataset contains Turkish audio samples generated using Microsoft Text to Speech services. The dataset includes audio files and their corresponding transcriptions.

Dataset Structure

Configs:

default

Data Files:

Split: train
Path: data/train-*

Dataset Info:

Features:
audio: Audio file
transcription: Corresponding text transcription

Splits: train… See the full description on the dataset page: https://huggingface.co/datasets/erenfazlioglu/turkishvoicedataset.
MusicCaps
huggingface.co
Updated Jan 27, 2023
Cite
Google (2023). MusicCaps [Dataset]. https://huggingface.co/datasets/google/MusicCaps
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Jan 27, 2023
Dataset authored and provided by
Google (http://google.com/)
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Card for MusicCaps

Dataset Summary

The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians. An aspect list is for example "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead", while the caption consists of multiple sentences about the music, e.g., "A low sounding male voice is rapping over a fast paced drums… See the full description on the dataset page: https://huggingface.co/datasets/google/MusicCaps.
Golos
huggingface.co
Updated Sep 5, 2022
Cite
SberDevices (2022). Golos [Dataset]. https://huggingface.co/datasets/SberDevices/Golos
Dataset updated
Sep 5, 2022
Dataset authored and provided by
SberDevices
Description
Golos dataset

Golos is a Russian corpus suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on a crowd-sourcing platform. The total duration of the audio is about 1240 hours. We have made the corpus freely available for download, along with the acoustic model prepared on this corpus. We also created a 3-gram KenLM language model using an open Common Crawl corpus.

Dataset structure

Domain Train files Train… See the full description on the dataset page: https://huggingface.co/datasets/SberDevices/Golos.
wavefake-audio
huggingface.co
Updated Feb 18, 2025
Cite
Ajay Karthick Senthil Kumar (2025). wavefake-audio [Dataset]. https://huggingface.co/datasets/ajaykarthick/wavefake-audio
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Feb 18, 2025
Authors
Ajay Karthick Senthil Kumar
Description
ajaykarthick/wavefake-audio dataset hosted on Hugging Face and contributed by the HF Datasets community
sn105-audio-dataset
huggingface.co
Updated Jun 1, 2025
Cite
Bohdan Drozd (2025). sn105-audio-dataset [Dataset]. https://huggingface.co/datasets/softdev629/sn105-audio-dataset
Dataset updated
Jun 1, 2025
Authors
Bohdan Drozd
Description
softdev629/sn105-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
torgo-audio-dataset-with-audio
huggingface.co
Updated Sep 18, 2024
Cite
RSTV (2024). torgo-audio-dataset-with-audio [Dataset]. https://huggingface.co/datasets/RSTV-24/torgo-audio-dataset-with-audio
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Sep 18, 2024
Dataset authored and provided by
RSTV
Description
RSTV-24/torgo-audio-dataset-with-audio dataset hosted on Hugging Face and contributed by the HF Datasets community
LAV-DF
huggingface.co
Updated Jul 11, 2023
Cite
ControlNet (2023). LAV-DF [Dataset]. https://huggingface.co/datasets/ControlNet/LAV-DF
Dataset updated
Jul 11, 2023
Authors
ControlNet
License
https://choosealicense.com/licenses/cc/
Description
Localized Audio Visual DeepFake Dataset (LAV-DF)

This repository contains the dataset for the DICTA paper "Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization" (Best Award), and the journal paper "Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization submitted to CVIU.

LAV-DF Dataset Download

To use this LAV-DF dataset, you should… See the full description on the dataset page: https://huggingface.co/datasets/ControlNet/LAV-DF.
parlertts-pony-speech-audio
huggingface.co
Updated Sep 15, 2024
Cite
vul traz (2024). parlertts-pony-speech-audio [Dataset]. https://huggingface.co/datasets/therealvul/parlertts-pony-speech-audio
Explore at: Croissant, a format for machine-learning datasets (mlcommons.org/croissant)
Dataset updated
Sep 15, 2024
Authors
vul traz
Description
therealvul/parlertts-pony-speech-audio dataset hosted on Hugging Face and contributed by the HF Datasets community
speech_commands
huggingface.co
tensorflow.org
Cite
Google, speech_commands [Dataset]. https://huggingface.co/datasets/google/speech_commands
Dataset authored and provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is a set of one-second .wav audio files, each containing a single spoken English word or background noise. These words are from a small set of commands, and are spoken by a variety of different speakers. This data set is designed to help train simple machine learning models. This dataset is covered in more detail at https://arxiv.org/abs/1804.03209.

Version 0.01 of the data set (configuration "v0.01") was released on August 3rd 2017 and contains 64,727 audio files.

In version 0.01 thirty different words were recorded: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go", "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin", "Sheila", "Tree", "Wow".

In version 0.02 more words were added: "Backward", "Forward", "Follow", "Learn", "Visual".

In both versions, ten of the words are used as commands by convention: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go". The other words are considered auxiliary (in the current implementation this is marked by a True value of the "is_unknown" feature). Their function is to teach a model to distinguish core words from unrecognized ones.

The _silence_ class contains a set of longer audio clips that are either recordings or a mathematical simulation of noise.
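The split between core commands, auxiliary words and the _silence_ class described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the dataset's own loading code: the function name word_kind and the three category strings are hypothetical, and treating _silence_ as its own category separate from "is_unknown" is an assumption.

```python
# The ten core command words listed in the description, lower-cased.
CORE_COMMANDS = frozenset(
    {"yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"}
)

# Label used for the background-noise clips, as named in the description.
SILENCE_LABEL = "_silence_"


def word_kind(word: str) -> str:
    """Classify a label as 'command', 'auxiliary' or 'silence' (hypothetical names)."""
    if word == SILENCE_LABEL:
        return "silence"
    # Case-insensitive match against the ten core commands.
    return "command" if word.lower() in CORE_COMMANDS else "auxiliary"


for label in ["Yes", "Marvin", "Backward", "_silence_"]:
    print(label, "->", word_kind(label))
```

Under this sketch, an auxiliary word is one for which the "is_unknown" feature would be True in the description's terms.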
