MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
multimodalart/test-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "Deepfake-Audio-Dataset"
More Information needed
This dataset repository contains the text files of all the datasets analysed in the survey paper on audio datasets of scenes and events; see here for the paper. The GitHub repository containing the scripts is shared here, including a bash script to download the audio data for each of the datasets. This repository also includes a Python file, dataset.py, for easy importing of each of the datasets. Please respect the original license of the dataset owner when downloading the data:… See the full description on the dataset page: https://huggingface.co/datasets/gijs/audio-datasets.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Audio-FLAN Dataset (Paper)
(the full audio files and JSONL files are still being updated) An instruction-tuning dataset for unified audio understanding and generation across speech, music, and sound.
1. Dataset Structure
The Audio-FLAN-Dataset has the following directory structure:

Audio-FLAN-Dataset/
├── audio_files/
│   ├── audio/
│   │   └── 177_TAU_Urban_Acoustic_Scenes_2022/
│   │   └── 179_Audioset_for_Audio_Inpainting/
│   │   └── ...
│   ├── music/
│   │   └── …

See the full description on the dataset page: https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset.
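A layout like the one above can be indexed with a short walk over the domain directories. This is a sketch under stated assumptions: the directory names come from the tree shown, and the temporary folders created here are placeholders only to make the example self-contained.

```python
# Minimal sketch: index dataset folders under an Audio-FLAN-style layout.
import tempfile
from pathlib import Path

def index_audio_files(root: Path) -> dict[str, list[str]]:
    """Map each domain (audio/music/...) to the dataset folders it contains."""
    index: dict[str, list[str]] = {}
    for domain_dir in sorted((root / "audio_files").iterdir()):
        if domain_dir.is_dir():
            index[domain_dir.name] = sorted(
                p.name for p in domain_dir.iterdir() if p.is_dir()
            )
    return index

# Recreate a tiny slice of the documented layout to demonstrate the indexer.
tmp = Path(tempfile.mkdtemp())
for sub in ("audio/177_TAU_Urban_Acoustic_Scenes_2022",
            "audio/179_Audioset_for_Audio_Inpainting",
            "music"):
    (tmp / "audio_files" / sub).mkdir(parents=True)

index = index_audio_files(tmp)
```

Here `index["audio"]` lists the two dataset folders created above, while `index["music"]` is empty.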
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Audio-alpaca: A preference dataset for aligning text-to-audio models
Audio-alpaca is a pairwise preference dataset containing about 15k (prompt, chosen, rejected) triplets: given a textual prompt, chosen is the preferred generated audio and rejected is the undesirable audio.
Field details
prompt: the given textual prompt
chosen: the preferred audio sample
rejected: the rejected audio sample
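A record with these three fields maps directly onto the pair format used by preference-tuning methods such as DPO. A minimal sketch, assuming only the field names from the card above (the prompt text and file paths are placeholders):

```python
# Sketch of one Audio-alpaca-style record and its preference-pair view.
from dataclasses import dataclass

@dataclass
class PreferenceTriplet:
    prompt: str    # the given textual prompt
    chosen: str    # path/id of the preferred audio sample
    rejected: str  # path/id of the rejected audio sample

def as_dpo_pair(t: PreferenceTriplet) -> dict:
    """DPO-style view: one prompt, one preferred and one dispreferred output."""
    return {"prompt": t.prompt, "chosen": t.chosen, "rejected": t.rejected}

example = PreferenceTriplet(
    prompt="a dog barking in the rain",                 # placeholder prompt
    chosen="audio/sample_0001_chosen.wav",              # placeholder path
    rejected="audio/sample_0001_rejected.wav",          # placeholder path
)
pair = as_dpo_pair(example)
```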
VALID (Video-Audio Large Interleaved Dataset)
Overview
The VALID (Video-Audio Large Interleaved Dataset) is a multimodal dataset comprising approximately 720,000 Creative Commons licensed videos crawled from YouTube, and processed into audio-video-text data records for machine learning research. The dataset provides a unique opportunity for training models to understand relationships between modalities such as video frames, audio clips, and multilingual textual data… See the full description on the dataset page: https://huggingface.co/datasets/ontocord/VALID.
saeedahmedv/quran-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
mon1111/audio-files-v3 dataset hosted on Hugging Face and contributed by the HF Datasets community
Golos dataset
Golos is a Russian corpus suitable for speech research. The dataset consists mainly of recorded audio files manually annotated on a crowd-sourcing platform. The total duration of the audio is about 1240 hours. We have made the corpus freely available for download, along with an acoustic model trained on this corpus. We also created a 3-gram KenLM language model using an open Common Crawl corpus.
Dataset structure
Domain Train files Train hours… See the full description on the dataset page: https://huggingface.co/datasets/SberDevices/Golos.
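The Golos card mentions a 3-gram KenLM language model. KenLM itself is a separate C++ toolkit; the pure-Python sketch below only illustrates what a 3-gram count table looks like, using a two-sentence toy corpus as a placeholder:

```python
# Toy 3-gram counting, illustrating the n-gram statistics behind a 3-gram LM.
from collections import Counter

def trigram_counts(sentences: list[str]) -> Counter:
    counts: Counter = Counter()
    for s in sentences:
        # Pad with start/end markers so boundary trigrams are counted too.
        tokens = ["<s>", "<s>"] + s.split() + ["</s>"]
        for i in range(len(tokens) - 2):
            counts[tuple(tokens[i:i + 3])] += 1
    return counts

corpus = ["привет мир", "привет всем"]  # placeholder sentences
counts = trigram_counts(corpus)
```

Both sentences begin with the same word, so the boundary trigram `("<s>", "<s>", "привет")` is counted twice; a real toolkit like KenLM additionally applies smoothing and backoff on top of such counts.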
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Moroccan Darija Speech Dataset
Overview
This dataset consists of 12,743 parallel text and speech samples for Moroccan Darija, each including transcriptions in both Latin and Arabic scripts and an English translation. It was created to support speech recognition, language modeling, and NLP tasks for Moroccan Darija.
Dataset Source
The dataset was originally sourced from this repository, where it was available as a CSV file containing three columns:
darija:… See the full description on the dataset page: https://huggingface.co/datasets/atlasia/DODa-audio-dataset.
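The card says the source CSV had three columns beginning with `darija`; the other two column names used below (`darija_arabic`, `english`) are hypothetical, chosen only to make this parsing sketch self-contained, and the row content is a placeholder:

```python
# Sketch: parsing a three-column Darija CSV with the stdlib csv module.
# Column names other than `darija` are assumptions, not from the card.
import csv
import io

sample_csv = """darija,darija_arabic,english
kidayr?,كيداير؟,How are you?
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
```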
[doc] audio dataset 11
This dataset contains two tar files holding pairs of samples, each pair consisting of one audio file and one JSON file.
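Pairs like these are typically matched by their shared filename stem inside the archive. A minimal sketch, which first builds a tiny in-memory tar with placeholder contents so it runs standalone (the member names and bytes are assumptions, not from the dataset):

```python
# Sketch: pairing audio + JSON members of a tar archive by filename stem.
import io
import json
import tarfile
from pathlib import PurePosixPath

def add_member(tar: tarfile.TarFile, name: str, data: bytes) -> None:
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Build a tiny placeholder archive: one audio/JSON pair.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_member(tar, "sample_0.flac", b"\x00\x01")  # placeholder audio bytes
    add_member(tar, "sample_0.json", json.dumps({"id": "sample_0"}).encode())

# Read it back, grouping members that share a stem.
buf.seek(0)
pairs: dict[str, dict] = {}
with tarfile.open(fileobj=buf, mode="r") as tar:
    for member in tar.getmembers():
        stem = PurePosixPath(member.name).stem
        kind = PurePosixPath(member.name).suffix.lstrip(".")
        pairs.setdefault(stem, {})[kind] = tar.extractfile(member).read()

meta = json.loads(pairs["sample_0"]["json"])
```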
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All eight datasets in ESB can be downloaded and prepared with a single line of code through the Hugging Face Datasets library:

from datasets import load_dataset

librispeech = load_dataset("esb/datasets", "librispeech", split="train")
"esb/datasets": the repository namespace. This is fixed for all ESB datasets.
"librispeech": the dataset name. This can be changed to any one of the eight datasets in ESB to download that dataset.
split="train": the split. Set this to one of… See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only.
zhifeixie/Audio-Reasoner-CoTA dataset hosted on Hugging Face and contributed by the HF Datasets community
cssen/audio-dataset-flickr-soundnet dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
📺 YouTube-Commons 📺
YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-By license.
Content
The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels). In total, this represents nearly 45 billion words (44,811,518,375). All the videos were shared on YouTube under a CC-BY license: the dataset provides all the necessary provenance information… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.
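A quick sanity check on the figures quoted above: dividing the total word count by the number of transcripts gives the average transcript length implied by the card.

```python
# Average words per transcript, from the card's own totals.
total_words = 44_811_518_375
total_transcripts = 22_709_724

words_per_transcript = total_words / total_transcripts
```

This works out to roughly 1,970 words per transcript, i.e. on the order of a ten-to-fifteen-minute spoken video.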
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for AudioSet
Dataset Summary
AudioSet is a dataset of 10-second clips from YouTube, annotated into one or more sound categories, following the AudioSet ontology.
Supported Tasks and Leaderboards
audio-classification: Classify audio clips into categories. The leaderboard is available here
Languages
The class labels in the dataset are in English.
Dataset Structure
Data Instances
Example instance from the dataset: {… See the full description on the dataset page: https://huggingface.co/datasets/agkphysics/AudioSet.
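Since each 10-second clip can carry one or more ontology labels, classification targets are usually multi-hot vectors rather than single class indices. A minimal sketch: the three class names below are a tiny hand-picked subset standing in for the full AudioSet ontology, and the example clip is a placeholder.

```python
# Multi-hot encoding of AudioSet-style labels over a tiny placeholder ontology.
classes = ["Speech", "Music", "Dog"]
class_to_idx = {c: i for i, c in enumerate(classes)}

def multi_hot(labels: list[str]) -> list[int]:
    """One vector slot per class; 1 where the clip carries that label."""
    vec = [0] * len(classes)
    for label in labels:
        vec[class_to_idx[label]] = 1
    return vec

encoding = multi_hot(["Speech", "Dog"])  # a clip with speech and a dog barking
```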
nickfuryavg/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
softdev629/sn105-audio-dataset-1 dataset hosted on Hugging Face and contributed by the HF Datasets community
RenMinhui/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality.
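The card above describes capping word error rate (WER) at 4% when validating the XL training subset. A minimal sketch of such a filter, using a pure-Python word-level edit distance (a production pipeline would typically use a dedicated library; the 4% cap is the only number taken from the card):

```python
# WER via word-level edit distance, plus the described keep/drop filter.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance, one row at a time.
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = min(dp[j] + 1,                         # deletion
                      dp[j - 1] + 1,                     # insertion
                      prev + (ref[i - 1] != hyp[j - 1])) # substitution
            prev, dp[j] = dp[j], cur
    return dp[len(hyp)] / max(len(ref), 1)

def keep_segment(reference: str, hypothesis: str, cap: float = 0.04) -> bool:
    """Keep a segment only if its WER is at or below the cap (4% by default)."""
    return wer(reference, hypothesis) <= cap
```

For example, a one-word substitution in a four-word reference gives a WER of 0.25, so that segment would be dropped under the 4% cap.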