100+ datasets found
  1. Audio-FLAN-Dataset

    • huggingface.co
    Updated Apr 27, 2025
    HKUST Audio (2025). Audio-FLAN-Dataset [Dataset]. https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset
    Dataset updated: Apr 27, 2025
    Dataset authored and provided by: HKUST Audio
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Audio-FLAN Dataset (Paper)

    An Instruction-Tuning Dataset for Unified Audio Understanding and Generation Across Speech, Music, and Sound.

    (The full audio files and JSONL files are still being updated.)

    1. Dataset Structure

    The Audio-FLAN-Dataset has the following directory structure:

        Audio-FLAN-Dataset/
        ├── audio_files/
        │   ├── audio/
        │   │   └── 177_TAU_Urban_Acoustic_Scenes_2022/
        │   │   └── 179_Audioset_for_Audio_Inpainting/
        │   │   └── ...
        │   ├── music/
        │   │   └──…

    See the full description on the dataset page: https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset.
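
A minimal sketch for inspecting a local copy of the repository, assuming it has been downloaded into a folder laid out like the tree above (an `audio_files/` folder whose subfolders, such as `audio/` and `music/`, each hold numbered task folders). The local path `Audio-FLAN-Dataset` and the function name `list_task_dirs` are hypothetical; the sketch relies only on the levels visible in the truncated tree and uses the standard library.

```python
from pathlib import Path


def list_task_dirs(root):
    """Map each domain folder under <root>/audio_files to its sorted task-folder names.

    Assumes the layout shown in the dataset card:
    <root>/audio_files/<domain>/<task_folder>/...
    """
    audio_files = Path(root) / "audio_files"
    result = {}
    for domain in sorted(p for p in audio_files.iterdir() if p.is_dir()):
        # Task folders (e.g. "177_TAU_Urban_Acoustic_Scenes_2022") are immediate subdirectories.
        result[domain.name] = sorted(p.name for p in domain.iterdir() if p.is_dir())
    return result


if __name__ == "__main__":
    local_root = Path("Audio-FLAN-Dataset")  # hypothetical download location
    if (local_root / "audio_files").is_dir():
        for domain, tasks in list_task_dirs(local_root).items():
            print(f"{domain}: {len(tasks)} task folders")
```

Because the card notes the files are still being updated, counting task folders this way is a quick check of how much of the release a local copy actually contains.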

  2. esb-datasets-test-only

    • huggingface.co
    Updated Sep 9, 2023
    Hugging Face for Audio (2023). esb-datasets-test-only [Dataset]. https://huggingface.co/datasets/hf-audio/esb-datasets-test-only
    Dataset updated: Sep 9, 2023
    Dataset provided by: Hugging Face, https://huggingface.co/
    Authors: Hugging Face for Audio
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All eight datasets in ESB can be downloaded and prepared in just a single line of code through the Hugging Face Datasets library:

        from datasets import load_dataset

        librispeech = load_dataset("esb/datasets", "librispeech", split="train")

    "esb/datasets": the repository namespace. This is fixed for all ESB datasets.

    "librispeech": the dataset name. This can be changed to any one of the eight datasets in ESB to download that dataset.

    split="train": the split. Set this to one of… See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only.
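
The single-line call above extends naturally to a loop over several dataset names. A minimal sketch, assuming the `esb/datasets` namespace from the description; the helper name `load_esb_subsets` is hypothetical, the list of names is left to the caller rather than guessed, and the loader is injectable so the logic can be exercised offline. Depending on the installed `datasets` version, script-based repositories may additionally need `trust_remote_code=True` or may no longer load at all, so check the dataset page first.

```python
def load_esb_subsets(names, split="train", loader=None):
    """Load several ESB datasets by name from the "esb/datasets" repository.

    `loader` defaults to `datasets.load_dataset`; passing another callable
    makes it possible to test the function without network access.
    """
    if loader is None:
        # Imported lazily so this module can be imported without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset
    # One call per dataset name, mirroring the single-line example above.
    return {name: loader("esb/datasets", name, split=split) for name in names}
```

For example, `load_esb_subsets(["librispeech"])` performs the same call as the one-liner and returns it under the key `"librispeech"`.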

  3. medical_asr_recording_dataset

    • huggingface.co
    Updated Oct 19, 2023
    Hani. M (2023). medical_asr_recording_dataset [Dataset]. https://huggingface.co/datasets/Hani89/medical_asr_recording_dataset
    Dataset updated: Oct 19, 2023
    Authors: Hani. M
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Data source: Kaggle, "Medical Speech, Transcription, and Intent"

    Context

    8.5 hours of audio utterances paired with text for common medical symptoms.

    Content

    This data contains thousands of audio utterances for common medical symptoms like “knee pain” or “headache,” totaling more than 8 hours in aggregate. Each utterance was created by an individual human contributor based on a given symptom. These audio snippets can be used to train conversational agents in the medical field. This Figure Eight… See the full description on the dataset page: https://huggingface.co/datasets/Hani89/medical_asr_recording_dataset.
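
Aggregate figures such as "8.5 hours" are sums of per-utterance durations, each equal to the number of samples divided by the sampling rate. A minimal sketch of that arithmetic, assuming each example exposes decoded audio as a dict with an `array` of samples and a `sampling_rate` (how older `datasets` releases decode the `Audio` feature; newer releases may return a decoder object instead). The function name `total_hours` is hypothetical.

```python
def total_hours(examples):
    """Sum clip durations (samples / sampling rate, in seconds) and convert to hours."""
    seconds = sum(
        len(ex["audio"]["array"]) / ex["audio"]["sampling_rate"] for ex in examples
    )
    return seconds / 3600
```

Running it over every split of the dataset is one way to check the stated total against the files actually downloaded.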

  4. wolof-audio-data

    • huggingface.co
    Updated Dec 14, 2024
    Abdoulaye Diallo (2024). wolof-audio-data [Dataset]. https://huggingface.co/datasets/vonewman/wolof-audio-data
    Dataset updated: Dec 14, 2024
    Authors: Abdoulaye Diallo
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Wolof Audio Dataset

    The Wolof Audio Dataset is a collection of audio recordings and their corresponding transcriptions in Wolof. This dataset is designed to support the development of Automatic Speech Recognition (ASR) models for the Wolof language. It was created by combining three existing datasets:

    ALFFA: available at serge-wilson/wolof_speech_transcription
    FLEURS: available at vonewman/fleurs-wolof-dataset
    Urban Bus Wolof Speech Dataset: available at vonewman/urban-bus-wolof…

    See the full description on the dataset page: https://huggingface.co/datasets/vonewman/wolof-audio-data.

  5. viet_bud500

    • huggingface.co
    Updated Feb 29, 2024
    Tran Khanh Linh (2024). viet_bud500 [Dataset]. https://huggingface.co/datasets/linhtran92/viet_bud500
    Dataset updated: Feb 29, 2024
    Authors: Tran Khanh Linh
    License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Bud500: A Comprehensive Vietnamese ASR Dataset

    Introducing Bud500, a diverse Vietnamese speech corpus designed to support the ASR research community. With approximately 500 hours of audio, it covers a broad spectrum of topics, including podcasts, travel, books, and food, and spans accents from Vietnam's Northern, Southern, and Central regions. Derived from free public audio resources, this publicly accessible dataset is designed to significantly enhance the work of developers and… See the full description on the dataset page: https://huggingface.co/datasets/linhtran92/viet_bud500.
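
At roughly 500 hours of audio, the corpus is large to download just to look at a few samples. A minimal sketch using the `datasets` streaming mode (`streaming=True`), assuming a split named `train` exists (check the dataset page for the actual split names). The helper name `preview` is hypothetical, and the loader is injectable so the logic can be tested offline.

```python
from itertools import islice


def preview(repo_id, n=3, split="train", loader=None):
    """Return the first `n` examples of a split without downloading the whole dataset."""
    if loader is None:
        # Lazy import: requires the Hugging Face `datasets` package at call time.
        from datasets import load_dataset
        loader = load_dataset
    # streaming=True yields an iterable dataset that fetches data on demand.
    stream = loader(repo_id, split=split, streaming=True)
    return list(islice(stream, n))
```

For instance, `preview("linhtran92/viet_bud500", n=2)` would fetch only enough data to return two examples.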

  6. esb-datasets-test-only-sorted

    • huggingface.co
    Updated Jul 2, 2024
    Hugging Face for Audio (2024). esb-datasets-test-only-sorted [Dataset]. https://huggingface.co/datasets/hf-audio/esb-datasets-test-only-sorted
    Dataset updated: Jul 2, 2024
    Dataset provided by: Hugging Face, https://huggingface.co/
    Authors: Hugging Face for Audio

    Description

    ESB Test Sets: Parquet & Sorted

    This dataset takes the open-asr-leaderboard/datasets-test-only data and sorts each split by audio length. The format is also changed from a custom loading script (unsafe remote code) to Parquet (safe). Broadly speaking, this dataset was generated with the following code snippet:

        from datasets import load_dataset, get_dataset_config_names

        DATASET = "open-asr-leaderboard/datasets-test-only"  # dataset to load from
        HUB_DATASET_ID =…

    See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only-sorted.
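
The snippet above is truncated, so the authors' exact code is not reproduced here. As a rough illustration of the core step ("sorts each split by audio length"), here is a standard-library sketch, assuming each example carries decoded audio as a dict with `array` and `sampling_rate`; the names `with_duration` and `sort_by_duration` are hypothetical. With the `datasets` library itself, the equivalent would typically be a `map` that adds a length column followed by `Dataset.sort` on that column.

```python
def with_duration(example):
    """Return a copy of the example with an added 'duration' field in seconds."""
    audio = example["audio"]
    return {**example, "duration": len(audio["array"]) / audio["sampling_rate"]}


def sort_by_duration(examples, descending=False):
    """Sort examples by audio duration, shortest first unless descending=True."""
    return sorted(
        (with_duration(ex) for ex in examples),
        key=lambda ex: ex["duration"],
        reverse=descending,
    )
```

Sorting by length groups clips of similar duration, which typically reduces padding when a test set is evaluated in batches.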

  7. my-audio-dataset

    • huggingface.co
    Updated Jun 12, 2025
    Rohini Koli (2025). my-audio-dataset [Dataset]. https://huggingface.co/datasets/Rohini076/my-audio-dataset
    Dataset updated: Jun 12, 2025
    Authors: Rohini Koli
    License: MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Rohini076/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. audio-dataset

    • huggingface.co
    Updated Oct 20, 2024
    Swapnik Varala (2024). audio-dataset [Dataset]. https://huggingface.co/datasets/Swapnik/audio-dataset
    Dataset updated: Oct 20, 2024
    Authors: Swapnik Varala

    Description

    Swapnik/audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. my-audio-dataset

    • huggingface.co
    Updated Apr 7, 2025
    Sandipan Ray (2025). my-audio-dataset [Dataset]. https://huggingface.co/datasets/nickfuryavg/my-audio-dataset
    Dataset updated: Apr 7, 2025
    Authors: Sandipan Ray

    Description

    nickfuryavg/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. clean-audio-dataset

    • huggingface.co
    Updated Sep 15, 2024
    Radhika Singh (2024). clean-audio-dataset [Dataset]. https://huggingface.co/datasets/radhika-singh/clean-audio-dataset
    Dataset updated: Sep 15, 2024
    Authors: Radhika Singh

    Description

    radhika-singh/clean-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. quranic_audio_dataset

    • huggingface.co
    Raghad Salameh, quranic_audio_dataset [Dataset]. https://huggingface.co/datasets/RetaSy/quranic_audio_dataset
    Authors: Raghad Salameh

    Description

    Dataset Card for Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers

    Dataset Summary

    We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We developed a crowdsourcing platform called Quran Voice for annotating the… See the full description on the dataset page: https://huggingface.co/datasets/RetaSy/quranic_audio_dataset.

  12. turkishvoicedataset

    • huggingface.co
    Updated Jun 14, 2023
    EREN FAZLIOĞLU (2023). turkishvoicedataset [Dataset]. https://huggingface.co/datasets/erenfazlioglu/turkishvoicedataset
    Dataset updated: Jun 14, 2023
    Authors: EREN FAZLIOĞLU
    License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for "turkishneuralvoice"

    Dataset Overview

    Dataset Name: Turkish Neural Voice
    Description: This dataset contains Turkish audio samples generated using Microsoft Text to Speech services. The dataset includes audio files and their corresponding transcriptions.

    Dataset Structure

    Configs: default
    Data Files: Split: train, Path: data/train-*
    Dataset Info:
      Features: audio (audio file), transcription (corresponding text transcription)
      Splits: train… See the full description on the dataset page: https://huggingface.co/datasets/erenfazlioglu/turkishvoicedataset.
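
The `data/train-*` entry is a glob pattern: every repository file under `data/` whose name begins with `train-` is assigned to the `train` split. A minimal standard-library sketch of that matching; the function name `files_for_split` is hypothetical, and the shard names in the usage line are made up, since the card does not list the actual files.

```python
from fnmatch import fnmatch


def files_for_split(repo_paths, pattern="data/train-*"):
    """Return the repository file paths selected by a split's data_files glob pattern.

    Note: fnmatch's "*" also matches "/", so this only approximates how the Hub
    resolves patterns; it is adequate for a flat directory such as data/.
    """
    return sorted(p for p in repo_paths if fnmatch(p, pattern))
```

For example, `files_for_split(["README.md", "data/train-00000-of-00001.parquet"])` returns `["data/train-00000-of-00001.parquet"]`. In practice, `load_dataset("erenfazlioglu/turkishvoicedataset", split="train")` resolves these files itself, and each example should then expose the `audio` and `transcription` features listed above.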

  13. MusicCaps

    • huggingface.co
    Updated Jan 27, 2023
    Google (2023). MusicCaps [Dataset]. https://huggingface.co/datasets/google/MusicCaps
    Dataset updated: Jan 27, 2023
    Dataset authored and provided by: Google, http://google.com/
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for MusicCaps

    Dataset Summary

    The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free-text caption written by musicians. An example aspect list is "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead", while the caption consists of multiple sentences about the music, e.g., "A low sounding male voice is rapping over a fast paced drums… See the full description on the dataset page: https://huggingface.co/datasets/google/MusicCaps.
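
The aspect list is a set of short, comma-separated tags, which usually needs to be split into individual labels before use. A minimal parsing sketch; the function name `parse_aspects` is hypothetical, and because the stored field may be either a plain comma-separated string (as in the example above) or a string holding a Python-style list literal, the sketch accepts both. That second form is an assumption, not a documented format.

```python
import ast


def parse_aspects(aspect_list):
    """Split a MusicCaps-style aspect list into individual tags.

    Accepts a comma-separated string or a string holding a list literal
    such as "['pop', 'mellow piano melody']".
    """
    text = aspect_list.strip()
    if text.startswith("["):
        # literal_eval parses the list literal without executing arbitrary code.
        return [str(tag).strip() for tag in ast.literal_eval(text)]
    return [tag.strip() for tag in text.split(",") if tag.strip()]
```

Applied to the example above, it yields five tags, from `"pop"` through `"sustained pulsating synth lead"`.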

  14. Golos

    • huggingface.co
    Updated Sep 5, 2022
    SberDevices (2022). Golos [Dataset]. https://huggingface.co/datasets/SberDevices/Golos
    Dataset updated: Sep 5, 2022
    Dataset authored and provided by: SberDevices

    Description

    Golos dataset

    Golos is a Russian corpus suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on a crowdsourcing platform. The total duration of the audio is about 1,240 hours. We have made the corpus freely available for download, along with the acoustic model trained on this corpus. We have also created a 3-gram KenLM language model using an open Common Crawl corpus.

    Dataset structure

    Domain Train files Train… See the full description on the dataset page: https://huggingface.co/datasets/SberDevices/Golos.
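
The description above mentions a 3-gram KenLM language model, i.e. a model that estimates the probability of each word from the two words before it. The toy sketch below shows only the underlying counting, with unsmoothed maximum-likelihood estimates; it is not KenLM, which builds smoothed (modified Kneser-Ney) models from large corpora such as Common Crawl text. The function names are hypothetical.

```python
from collections import Counter


def trigram_counts(sentences):
    """Count trigrams and their two-word contexts, padding each sentence with <s> and </s>."""
    tri, ctx = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(2, len(words)):
            tri[(words[i - 2], words[i - 1], words[i])] += 1
            ctx[(words[i - 2], words[i - 1])] += 1
    return tri, ctx


def mle_prob(tri, ctx, w1, w2, w3):
    """Unsmoothed P(w3 | w1, w2); returns 0.0 for an unseen context."""
    denom = ctx[(w1, w2)]
    return tri[(w1, w2, w3)] / denom if denom else 0.0
```

Unseen trigrams get probability zero here, which is exactly the problem smoothing in real toolkits like KenLM addresses.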

  15. wavefake-audio

    • huggingface.co
    Updated Feb 18, 2025
    Ajay Karthick Senthil Kumar (2025). wavefake-audio [Dataset]. https://huggingface.co/datasets/ajaykarthick/wavefake-audio
    Dataset updated: Feb 18, 2025
    Authors: Ajay Karthick Senthil Kumar

    Description

    ajaykarthick/wavefake-audio dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. sn105-audio-dataset

    • huggingface.co
    Updated Jun 1, 2025
    Bohdan Drozd (2025). sn105-audio-dataset [Dataset]. https://huggingface.co/datasets/softdev629/sn105-audio-dataset
    Dataset updated: Jun 1, 2025
    Authors: Bohdan Drozd

    Description

    softdev629/sn105-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. torgo-audio-dataset-with-audio

    • huggingface.co
    Updated Sep 18, 2024
    RSTV (2024). torgo-audio-dataset-with-audio [Dataset]. https://huggingface.co/datasets/RSTV-24/torgo-audio-dataset-with-audio
    Dataset updated: Sep 18, 2024
    Dataset authored and provided by: RSTV

    Description

    RSTV-24/torgo-audio-dataset-with-audio dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. LAV-DF

    • huggingface.co
    Updated Jul 11, 2023
    ControlNet (2023). LAV-DF [Dataset]. https://huggingface.co/datasets/ControlNet/LAV-DF
    Dataset updated: Jul 11, 2023
    Authors: ControlNet
    License: https://choosealicense.com/licenses/cc/

    Description

    Localized Audio Visual DeepFake Dataset (LAV-DF)

    This repository hosts the dataset for the DICTA paper "Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization" (Best Award) and for the journal paper "Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization, submitted to CVIU.

    LAV-DF Dataset

    Download

    To use this LAV-DF dataset, you should… See the full description on the dataset page: https://huggingface.co/datasets/ControlNet/LAV-DF.
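
Temporal forgery localization is commonly scored by how well predicted fake segments overlap the ground-truth ones. A minimal sketch of temporal intersection-over-union (IoU) between two segments given as (start, end) times in seconds; this is a generic metric shown for illustration, not the evaluation code shipped with LAV-DF, and the function name `temporal_iou` is hypothetical.

```python
def temporal_iou(pred, gt):
    """IoU of two time segments (start, end); returns 0.0 when they do not overlap."""
    (ps, pe), (gs, ge) = pred, gt
    if pe < ps or ge < gs:
        raise ValueError("segments must satisfy start <= end")
    # Overlap length is clipped at zero for disjoint segments.
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0
```

For example, a prediction of (0, 2) against a ground truth of (1, 3) overlaps for 1 second out of a 3-second union, giving an IoU of 1/3; detection metrics then count a prediction as correct when its IoU exceeds a chosen threshold.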

  19. parlertts-pony-speech-audio

    • huggingface.co
    Updated Sep 15, 2024
    vul traz (2024). parlertts-pony-speech-audio [Dataset]. https://huggingface.co/datasets/therealvul/parlertts-pony-speech-audio
    Dataset updated: Sep 15, 2024
    Authors: vul traz

    Description

    therealvul/parlertts-pony-speech-audio dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. speech_commands

    • huggingface.co
    • tensorflow.org
    Google, speech_commands [Dataset]. https://huggingface.co/datasets/google/speech_commands
    Dataset authored and provided by: Google, http://google.com/
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a set of one-second .wav audio files, each containing a single spoken English word or background noise. These words are from a small set of commands, and are spoken by a variety of different speakers. This data set is designed to help train simple machine learning models. This dataset is covered in more detail at https://arxiv.org/abs/1804.03209.

    Version 0.01 of the data set (configuration "v0.01") was released on August 3, 2017 and contains 64,727 audio files.

    In version 0.01, thirty different words were recorded: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go", "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin", "Sheila", "Tree", "Wow".

    In version 0.02, five more words were added: "Backward", "Forward", "Follow", "Learn", "Visual".

    In both versions, ten of these words are used as commands by convention: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go". The other words are considered auxiliary (in the current implementation, they are marked by a True value of the "is_unknown" feature). Their function is to teach a model to distinguish core words from unrecognized ones.

    The _silence_ class contains a set of longer audio clips that are either recordings or a mathematical simulation of noise.
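
The convention above (ten core commands, auxiliary words treated as unrecognized, plus a silence class) is often applied by collapsing every auxiliary word into a single unknown label. A minimal sketch under that assumption; the word lists are copied from the description, while the `_unknown_` class name and the function names are hypothetical and not taken from the dataset's own label set.

```python
# Word lists as given in the description above.
CORE_COMMANDS = ["Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go"]
AUXILIARY_V001 = [
    "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine",
    "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin", "Sheila", "Tree", "Wow",
]
ADDED_IN_V002 = ["Backward", "Forward", "Follow", "Learn", "Visual"]


def vocabulary(version):
    """Return the spoken-word vocabulary for configuration "v0.01" or "v0.02"."""
    if version not in ("v0.01", "v0.02"):
        raise ValueError(f"unknown version: {version}")
    words = CORE_COMMANDS + AUXILIARY_V001
    return words + ADDED_IN_V002 if version == "v0.02" else words


def to_command_label(word):
    """Keep core commands and "_silence_"; collapse every auxiliary word into "_unknown_".

    "_unknown_" is a hypothetical class name used only in this sketch.
    """
    if word == "_silence_":
        return word
    normalized = word.capitalize()  # tolerate lowercase labels such as "yes"
    return normalized if normalized in CORE_COMMANDS else "_unknown_"
```

With this mapping a classifier has twelve targets (ten commands, silence, and unknown), and the vocabulary sizes work out to 30 words for v0.01 and 35 for v0.02.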
