100+ datasets found
  1. test-audio-dataset

    • huggingface.co
    Updated Jul 10, 2024
    Cite
    Apolinário from multimodal AI art (2024). test-audio-dataset [Dataset]. https://huggingface.co/datasets/multimodalart/test-audio-dataset
    Dataset updated
    Jul 10, 2024
    Authors
    Apolinário from multimodal AI art
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    multimodalart/test-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. Deepfake-Audio-Dataset

    • huggingface.co
    Updated Mar 28, 2024
    Cite
    Hem Bahadur Gurung (2024). Deepfake-Audio-Dataset [Dataset]. https://huggingface.co/datasets/Hemg/Deepfake-Audio-Dataset
    Dataset updated
    Mar 28, 2024
    Authors
    Hem Bahadur Gurung
    Description

    Dataset Card for "Deepfake-Audio-Dataset"

    More Information needed

  3. audio-datasets

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    Gijs (2025). audio-datasets [Dataset]. https://huggingface.co/datasets/gijs/audio-datasets
    Dataset updated
    Jan 7, 2025
    Authors
    Gijs
    Description

    This dataset repository contains all the text files of the datasets analysed in the Survey Paper on Audio Datasets of Scenes and Events. See here for the paper. The GitHub repository containing the scripts is shared here, including a bash script to download the audio data for each of the datasets. In this repository, we also included a Python file, dataset.py, for easy importing of each of the datasets. Please respect the original license of the dataset owner when downloading the data:… See the full description on the dataset page: https://huggingface.co/datasets/gijs/audio-datasets.

  4. Audio-FLAN-Dataset

    • huggingface.co
    Updated Feb 24, 2025
    + more versions
    Cite
    HKUST Audio (2025). Audio-FLAN-Dataset [Dataset]. https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    HKUST Audio
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Audio-FLAN Dataset (Paper)

    (The full audio files and JSONL files are still being updated.) An Instruction-Tuning Dataset for Unified Audio Understanding and Generation Across Speech, Music, and Sound.

      1. Dataset Structure
    

    The Audio-FLAN-Dataset has the following directory structure:

    Audio-FLAN-Dataset/
    ├── audio_files/
    │   ├── audio/
    │   │   └── 177_TAU_Urban_Acoustic_Scenes_2022/
    │   │   └── 179_Audioset_for_Audio_Inpainting/
    │   │   └── ...
    │   ├── music/
    │   │   └──…

    See the full description on the dataset page: https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset.

  5. audio-alpaca

    • huggingface.co
    Updated Apr 16, 2024
    Cite
    Deep Cognition and Language Research (DeCLaRe) Lab (2024). audio-alpaca [Dataset]. https://huggingface.co/datasets/declare-lab/audio-alpaca
    Dataset updated
    Apr 16, 2024
    Dataset authored and provided by
    Deep Cognition and Language Research (DeCLaRe) Lab
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Audio-alpaca: A preference dataset for aligning text-to-audio models

    Audio-alpaca is a pairwise preference dataset containing about 15k (prompt, chosen, rejected) triplets where, given a textual prompt, chosen is the preferred generated audio and rejected is the undesirable audio.

      Field details
    

    prompt: Given textual prompt
    chosen: The preferred audio sample
    rejected: The rejected audio sample
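
    As a minimal sketch of the triplet layout described above, one record could be modelled as follows. The class name PreferenceTriplet, the helper to_pairs, the sample values, and the use of file paths for the two audio fields are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PreferenceTriplet:
    """One hypothetical record: a text prompt plus two candidate audio clips."""
    prompt: str    # given textual prompt
    chosen: str    # preferred audio sample (a file path here, by assumption)
    rejected: str  # rejected audio sample (a file path here, by assumption)


def to_pairs(triplet: PreferenceTriplet) -> List[Tuple[str, str, int]]:
    """Flatten a triplet into (prompt, audio, label) rows; label 1 = preferred."""
    return [
        (triplet.prompt, triplet.chosen, 1),
        (triplet.prompt, triplet.rejected, 0),
    ]


# Made-up example values, for illustration only.
example = PreferenceTriplet(
    prompt="a dog barking in the rain",
    chosen="clip_a.wav",
    rejected="clip_b.wav",
)
rows = to_pairs(example)
```

    Flattening into labelled rows is only one possible design; a pairwise preference loss would more likely consume the triplet directly.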

  6. VALID

    • huggingface.co
    Updated Dec 5, 2024
    Cite
    Ontocord.AI (2024). VALID [Dataset]. https://huggingface.co/datasets/ontocord/VALID
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    Ontocord.AI
    Description

    VALID (Video-Audio Large Interleaved Dataset)

      Overview
    

    The VALID (Video-Audio Large Interleaved Dataset) is a multimodal dataset comprising approximately 720,000 Creative Commons licensed videos crawled from YouTube, and processed into audio-video-text data records for machine learning research. The dataset provides a unique opportunity for training models to understand relationships between modalities such as video frames, audio clips, and multilingual textual data… See the full description on the dataset page: https://huggingface.co/datasets/ontocord/VALID.

  7. quran-audio-dataset

    • huggingface.co
    Updated Aug 2, 2025
    + more versions
    Cite
    saeed ahmed (2025). quran-audio-dataset [Dataset]. https://huggingface.co/datasets/saeedahmedv/quran-audio-dataset
    Dataset updated
    Aug 2, 2025
    Authors
    saeed ahmed
    Description

    saeedahmedv/quran-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. audio-files-v3

    • huggingface.co
    Updated Aug 3, 2025
    + more versions
    Cite
    Ramon (2025). audio-files-v3 [Dataset]. https://huggingface.co/datasets/mon1111/audio-files-v3
    Dataset updated
    Aug 3, 2025
    Authors
    Ramon
    Description

    mon1111/audio-files-v3 dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. Golos

    • huggingface.co
    Updated Sep 5, 2022
    + more versions
    Cite
    SberDevices (2022). Golos [Dataset]. https://huggingface.co/datasets/SberDevices/Golos
    Dataset updated
    Sep 5, 2022
    Authors
    SberDevices
    Description

    Golos dataset

    Golos is a Russian corpus suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on a crowd-sourcing platform. The total duration of the audio is about 1240 hours. We have made the corpus freely available for downloading, along with the acoustic model prepared on this corpus. We also created a 3-gram KenLM language model using an open Common Crawl corpus.

      Dataset structure
    

    Domain Train files Train hours… See the full description on the dataset page: https://huggingface.co/datasets/SberDevices/Golos.

  10. DODa-audio-dataset

    • huggingface.co
    Updated Nov 7, 2025
    Cite
    AtlasIA (2025). DODa-audio-dataset [Dataset]. https://huggingface.co/datasets/atlasia/DODa-audio-dataset
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    AtlasIA
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Moroccan Darija Speech Dataset

      Overview
    

    This dataset consists of 12,743 parallel text and speech samples for Moroccan Darija, including its transcription in both Latin and Arabic scripts and English translations. It was created to support speech recognition, language modeling, and NLP tasks for Moroccan Darija.

      Dataset Source
    

    The dataset was originally sourced from this repository, where it was available as a CSV file containing three columns:

    darija:… See the full description on the dataset page: https://huggingface.co/datasets/atlasia/DODa-audio-dataset.

  11. doc-audio-11

    • huggingface.co
    Updated Aug 2, 2024
    Cite
    Datasets examples (2024). doc-audio-11 [Dataset]. https://huggingface.co/datasets/datasets-examples/doc-audio-11
    Dataset updated
    Aug 2, 2024
    Dataset authored and provided by
    Datasets examples
    Description

    [doc] audio dataset 11

    This dataset contains two tar files that contain pairs of samples with one audio file and one JSON file.
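
    A tar layout of this kind can be read with the Python standard library alone. The sketch below assumes that the audio file and the JSON file of one sample share a basename (for example 0001.wav and 0001.json), which the description does not confirm; the file names, the .wav extension, the payloads, and both helper functions are hypothetical.

```python
import io
import json
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict


def group_pairs(tar_bytes: bytes) -> Dict[str, Dict[str, bytes]]:
    """Group tar members by shared basename, e.g. 0001.wav + 0001.json."""
    samples: Dict[str, Dict[str, bytes]] = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and other special entries
            path = PurePosixPath(member.name)
            key = str(path.with_suffix(""))  # sample key shared by the pair
            ext = path.suffix.lstrip(".")    # "wav", "json", ...
            samples[key][ext] = archive.extractfile(member).read()
    return dict(samples)


def make_demo_tar() -> bytes:
    """Build a tiny in-memory archive holding one fake audio/JSON pair."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in [
            ("0001.wav", b"RIFF-fake-audio"),  # placeholder bytes, not real audio
            ("0001.json", json.dumps({"text": "hello"}).encode()),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


pairs = group_pairs(make_demo_tar())
metadata = json.loads(pairs["0001"]["json"])
```

    Reading every member into memory keeps the sketch short; for large archives, streaming the members one at a time would be the more realistic choice.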

  12. esb-datasets-test-only

    • huggingface.co
    Updated Sep 9, 2023
    + more versions
    Cite
    Hugging Face for Audio (2023). esb-datasets-test-only [Dataset]. https://huggingface.co/datasets/hf-audio/esb-datasets-test-only
    Dataset updated
    Sep 9, 2023
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face for Audio
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    All eight datasets in ESB can be downloaded and prepared in a single line of code through the Hugging Face Datasets library:

    from datasets import load_dataset

    librispeech = load_dataset("esb/datasets", "librispeech", split="train")

    "esb/datasets": the repository namespace. This is fixed for all ESB datasets.

    "librispeech": the dataset name. This can be changed to any one of the eight datasets in ESB to download that dataset.

    split="train": the split. Set this to one of… See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only.

  13. Audio-Reasoner-CoTA

    • huggingface.co
    Updated May 16, 2025
    + more versions
    Cite
    XIE ZHIFEI (2025). Audio-Reasoner-CoTA [Dataset]. https://huggingface.co/datasets/zhifeixie/Audio-Reasoner-CoTA
    Dataset updated
    May 16, 2025
    Authors
    XIE ZHIFEI
    Description

    zhifeixie/Audio-Reasoner-CoTA dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. audio-dataset-flickr-soundnet

    • huggingface.co
    Updated Dec 3, 2023
    Cite
    christian simon (2023). audio-dataset-flickr-soundnet [Dataset]. https://huggingface.co/datasets/cssen/audio-dataset-flickr-soundnet
    Dataset updated
    Dec 3, 2023
    Authors
    christian simon
    Description

    cssen/audio-dataset-flickr-soundnet dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. YouTube-Commons

    • huggingface.co
    Updated Apr 17, 2024
    + more versions
    Cite
    PleIAs (2024). YouTube-Commons [Dataset]. https://huggingface.co/datasets/PleIAs/YouTube-Commons
    Dataset updated
    Apr 17, 2024
    Dataset authored and provided by
    PleIAs
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    YouTube
    Description

    📺 YouTube-Commons 📺

    YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-By license.

      Content
    

    The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels). In total, this represents nearly 45 billion words (44,811,518,375). All the videos were shared on YouTube with a CC-BY license: the dataset provides all the necessary provenance information… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.

  16. AudioSet

    • huggingface.co
    • opendatalab.com
    Updated Jul 4, 2024
    Cite
    Aaron Keesing (2024). AudioSet [Dataset]. https://huggingface.co/datasets/agkphysics/AudioSet
    Dataset updated
    Jul 4, 2024
    Authors
    Aaron Keesing
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset Card for AudioSet

      Dataset Summary
    

    AudioSet is a dataset of 10-second clips from YouTube, annotated into one or more sound categories, following the AudioSet ontology.

      Supported Tasks and Leaderboards
    

    audio-classification: Classify audio clips into categories. The leaderboard is available here

      Languages
    

    The class labels in the dataset are in English.

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    Example instance from the dataset: {… See the full description on the dataset page: https://huggingface.co/datasets/agkphysics/AudioSet.

  17. my-audio-dataset

    • huggingface.co
    Updated Apr 7, 2025
    + more versions
    Cite
    Sandipan Ray (2025). my-audio-dataset [Dataset]. https://huggingface.co/datasets/nickfuryavg/my-audio-dataset
    Dataset updated
    Apr 7, 2025
    Authors
    Sandipan Ray
    Description

    nickfuryavg/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. sn105-audio-dataset-1

    • huggingface.co
    Updated Jun 1, 2025
    + more versions
    Cite
    Bohdan Drozd (2025). sn105-audio-dataset-1 [Dataset]. https://huggingface.co/datasets/softdev629/sn105-audio-dataset-1
    Dataset updated
    Jun 1, 2025
    Authors
    Bohdan Drozd
    Description

    softdev629/sn105-audio-dataset-1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. my-audio-dataset

    • huggingface.co
    Updated Oct 24, 2025
    + more versions
    Cite
    Minhui (2025). my-audio-dataset [Dataset]. https://huggingface.co/datasets/RenMinhui/my-audio-dataset
    Dataset updated
    Oct 24, 2025
    Authors
    Minhui
    Description

    RenMinhui/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. gigaspeech

    • huggingface.co
    • opendatalab.com
    Updated Aug 30, 2022
    + more versions
    Cite
    SpeechColab (2022). gigaspeech [Dataset]. http://doi.org/10.57967/hf/6261
    Dataset updated
    Aug 30, 2022
    Dataset authored and provided by
    SpeechColab
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality.
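
    The five training subset sizes listed above lend themselves to a simple selection rule. The following is only a sketch: the constant SUBSET_HOURS and the function largest_subset_within are hypothetical names, and choosing a subset by an hours budget is an assumed use case, not something the corpus prescribes.

```python
from typing import Optional

# Training subset sizes in hours, as listed in the description above.
SUBSET_HOURS = (10, 250, 1000, 2500, 10000)


def largest_subset_within(budget_hours: float) -> Optional[int]:
    """Return the largest listed subset (in hours) that fits the budget."""
    fitting = [hours for hours in SUBSET_HOURS if hours <= budget_hours]
    return max(fitting) if fitting else None


choice = largest_subset_within(3000)    # a 3,000-hour budget fits the 2500h subset
too_small = largest_subset_within(5)    # no subset is this small
```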

test-audio-dataset

multimodalart/test-audio-dataset

9 scholarly articles cite this dataset (View in Google Scholar)