Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Audio-FLAN Dataset (Paper)
(The full audio files and JSONL files are still being updated.) An Instruction-Tuning Dataset for Unified Audio Understanding and Generation Across Speech, Music, and Sound.
1. Dataset Structure
The Audio-FLAN-Dataset has the following directory structure:
Audio-FLAN-Dataset/
├── audio_files/
│   ├── audio/
│   │   └── 177_TAU_Urban_Acoustic_Scenes_2022/
│   │   └── 179_Audioset_for_Audio_Inpainting/
│   │   └── ...
│   ├── music/
│   │   └──…
See the full description on the dataset page: https://huggingface.co/datasets/HKUSTAudio/Audio-FLAN-Dataset.
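The task folders in this tree appear to follow a `<number>_<name>` pattern (for example `177_TAU_Urban_Acoustic_Scenes_2022`). A minimal sketch for listing them, assuming that pattern holds for every task folder and that the domain folders (`audio/`, `music/`, ...) sit directly under `audio_files/`; `parse_task_dir` and `list_tasks` are hypothetical helpers, not part of the dataset.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed naming pattern: a numeric task index, an underscore, then the source name.
TASK_DIR_RE = re.compile(r"^(\d+)_(.+)$")


def parse_task_dir(name: str) -> tuple[int, str]:
    """Split e.g. '177_TAU_Urban_Acoustic_Scenes_2022' into (177, 'TAU_Urban_Acoustic_Scenes_2022')."""
    match = TASK_DIR_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected task folder name: {name!r}")
    return int(match.group(1)), match.group(2)


def list_tasks(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Map each domain folder under root/audio_files to its tasks, sorted by task index."""
    tasks: dict[str, list[tuple[int, str]]] = {}
    for domain_dir in sorted((root / "audio_files").iterdir()):
        if not domain_dir.is_dir():
            continue  # skip stray files next to the domain folders
        entries = [parse_task_dir(d.name) for d in domain_dir.iterdir() if d.is_dir()]
        tasks[domain_dir.name] = sorted(entries)
    return tasks
```

Sorting the `(index, name)` tuples orders tasks numerically by their prefix rather than alphabetically by folder name.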
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
All eight datasets in ESB can be downloaded and prepared with a single line of code through the Hugging Face Datasets library:
from datasets import load_dataset
librispeech = load_dataset("esb/datasets", "librispeech", split="train")
"esb/datasets": the repository namespace. This is fixed for all ESB datasets.
"librispeech": the dataset name. Change this to any one of the eight datasets in ESB to download that dataset.
split="train": the split. Set this to one of… See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only.
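The three arguments described above can be wrapped in a small helper that validates the dataset name before calling `load_dataset`. This is a sketch: only `"librispeech"` is confirmed by the text, so the other seven config names below are assumptions based on the datasets in the ESB benchmark and should be checked against the dataset card; `esb_load_args` and `load_esb` are hypothetical names.

```python
from __future__ import annotations

ESB_REPO = "esb/datasets"  # fixed repository namespace for all ESB datasets

# Assumed config names for the eight ESB datasets; only "librispeech" appears in the
# card text, so verify the rest against the dataset page before relying on them.
ESB_CONFIGS = (
    "librispeech", "common_voice", "voxpopuli", "tedlium",
    "gigaspeech", "spgispeech", "earnings22", "ami",
)


def esb_load_args(name: str, split: str = "train") -> tuple[tuple[str, str], dict[str, str]]:
    """Return the positional and keyword arguments for load_dataset."""
    if name not in ESB_CONFIGS:
        raise ValueError(f"unknown ESB dataset {name!r}; expected one of {ESB_CONFIGS}")
    return (ESB_REPO, name), {"split": split}


def load_esb(name: str, split: str = "train", streaming: bool = False):
    """Download (or stream) one ESB dataset; requires the `datasets` package."""
    from datasets import load_dataset  # imported lazily so esb_load_args works without it

    args, kwargs = esb_load_args(name, split)
    return load_dataset(*args, streaming=streaming, **kwargs)
```

Passing `streaming=True` (a standard `load_dataset` option) iterates over examples without downloading the full split first, which can help with the larger ESB datasets.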
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Data Source: Kaggle Medical Speech, Transcription, and Intent
Context
8.5 hours of audio utterances paired with text for common medical symptoms.
Content
This data contains thousands of audio utterances for common medical symptoms like “knee pain” or “headache,” totaling more than 8 hours in aggregate. Each utterance was created by individual human contributors based on a given symptom. These audio snippets can be used to train conversational agents in the medical field. This Figure Eight… See the full description on the dataset page: https://huggingface.co/datasets/Hani89/medical_asr_recording_dataset.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Wolof Audio Dataset
The Wolof Audio Dataset is a collection of audio recordings and their corresponding transcriptions in Wolof. This dataset is designed to support the development of Automatic Speech Recognition (ASR) models for the Wolof language. It was created by combining three existing datasets:
ALFFA: Available at serge-wilson/wolof_speech_transcription
FLEURS: Available at vonewman/fleurs-wolof-dataset
Urban Bus Wolof Speech Dataset: Available at vonewman/urban-bus-wolof… See the full description on the dataset page: https://huggingface.co/datasets/vonewman/wolof-audio-data.
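Combining corpora like these usually means mapping each source onto one shared schema before concatenating them. A pure-Python sketch of that step, assuming each example is a plain dict; the per-source column names in `COLUMN_MAPS` are hypothetical placeholders (the real ones are on each source dataset's card), and the added `source` field is an illustrative choice, not something the card describes.

```python
from __future__ import annotations

from collections.abc import Iterable

# Hypothetical column names for each source; check the real schemas before use.
COLUMN_MAPS = {
    "alffa": {"audio": "audio", "transcription": "transcription"},
    "fleurs": {"audio": "audio", "transcription": "raw_transcription"},
    "urban_bus": {"audio": "audio", "transcription": "sentence"},
}


def normalize(source: str, example: dict) -> dict:
    """Rename one example's fields to the shared schema and record where it came from."""
    mapping = COLUMN_MAPS[source]
    return {
        "audio": example[mapping["audio"]],
        "transcription": example[mapping["transcription"]],
        "source": source,
    }


def combine(sources: dict[str, Iterable[dict]]) -> list[dict]:
    """Concatenate the sources, in the order given, into one list of normalized examples."""
    combined: list[dict] = []
    for name, examples in sources.items():
        combined.extend(normalize(name, ex) for ex in examples)
    return combined
```

With the Datasets library, `concatenate_datasets` plays the same role, but it expects the inputs to share identical features, which is why the renaming step comes first.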
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Bud500: A Comprehensive Vietnamese ASR Dataset
Introducing Bud500, a diverse Vietnamese speech corpus designed to support the ASR research community. With approximately 500 hours of audio, it covers a broad spectrum of topics, including podcasts, travel, books, and food, while spanning accents from Vietnam's North, South, and Central regions. Derived from free public audio resources, this publicly accessible dataset is designed to significantly enhance the work of developers and… See the full description on the dataset page: https://huggingface.co/datasets/linhtran92/viet_bud500.
ESB Test Sets: Parquet & Sorted
This dataset takes the open-asr-leaderboard/datasets-test-only data and sorts each split by audio length. The format is also changed from a custom loading script (unsafe remote code) to Parquet (safe). Broadly speaking, this dataset was generated with the following code snippet:
from datasets import load_dataset, get_dataset_config_names
DATASET = "open-asr-leaderboard/datasets-test-only"  # dataset to load from
HUB_DATASET_ID =… See the full description on the dataset page: https://huggingface.co/datasets/hf-audio/esb-datasets-test-only-sorted.
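The card states that each split is sorted by audio length but not in which direction. A minimal sketch of that step on in-memory examples, assuming each example stores its decoded audio as a dict with `array` and `sampling_rate` keys (the layout the Datasets `Audio` feature has traditionally decoded to); `duration_seconds`, `sort_by_length`, and the `descending` flag are hypothetical.

```python
from __future__ import annotations


def duration_seconds(example: dict) -> float:
    """Length of one decoded clip in seconds: number of samples / sampling rate."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def sort_by_length(examples: list[dict], descending: bool = False) -> list[dict]:
    """Return a new list ordered by clip duration; sorted() is stable, so ties keep input order."""
    return sorted(examples, key=duration_seconds, reverse=descending)
```

On a `datasets.Dataset`, one way to get the same ordering is to add a duration column with `Dataset.map` and then call `Dataset.sort` on that column.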
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Rohini076/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Swapnik/audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
nickfuryavg/my-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
radhika-singh/clean-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers
Dataset Summary
We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We developed a crowdsourcing platform called Quran Voice for annotating the… See the full description on the dataset page: https://huggingface.co/datasets/RetaSy/quranic_audio_dataset.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Dataset Card for "turkishneuralvoice"
Dataset Overview
Dataset Name: Turkish Neural Voice
Description: This dataset contains Turkish audio samples generated using Microsoft Text to Speech services. The dataset includes audio files and their corresponding transcriptions.
Dataset Structure
Configs:
default
Data Files:
Split: train
Path: data/train-*
Dataset Info:
Features:
audio: Audio file
transcription: Corresponding text transcription
Splits: train… See the full description on the dataset page: https://huggingface.co/datasets/erenfazlioglu/turkishvoicedataset.
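The `default` config maps the `train` split to the glob `data/train-*`. A sketch of how such a pattern selects files, using the standard-library `fnmatch`; the file names in the usage are hypothetical, and the Hub's own resolver may differ in details (for instance, `fnmatch`'s `*` also matches `/`).

```python
from fnmatch import fnmatch

# Split-to-pattern mapping taken from the card's "default" config.
DATA_FILES = {"train": "data/train-*"}


def files_for_split(repo_files: list[str], split: str) -> list[str]:
    """Return the repository files that the split's glob pattern selects, in sorted order."""
    pattern = DATA_FILES[split]
    return sorted(path for path in repo_files if fnmatch(path, pattern))
```

For example, given `["README.md", "data/train-00001-of-00002.parquet", "data/train-00000-of-00002.parquet"]`, only the two `data/train-` shards are returned, in shard order.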
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Dataset Card for MusicCaps
Dataset Summary
The MusicCaps dataset contains 5,521 music examples, each of which is labeled with an English aspect list and a free text caption written by musicians. An aspect list is for example "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead", while the caption consists of multiple sentences about the music, e.g., "A low sounding male voice is rapping over a fast paced drums… See the full description on the dataset page: https://huggingface.co/datasets/google/MusicCaps.
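An aspect list like the one quoted above is essentially a set of comma-separated tags. A small parsing sketch: whether the stored field is a plain comma-separated string or a Python-style list literal (such as `"['pop', 'mellow piano melody']"`) is an assumption here, so the hypothetical `parse_aspects` accepts both.

```python
import ast


def parse_aspects(aspect_list: str) -> list[str]:
    """Turn an aspect list into a list of tag strings.

    Accepts a Python-style list literal or a plain comma-separated string;
    which form the stored data uses is an assumption.
    """
    text = aspect_list.strip()
    if text.startswith("["):
        items = ast.literal_eval(text)  # parses the literal without executing code
    else:
        items = text.split(",")
    return [str(item).strip() for item in items if str(item).strip()]
```

Applied to the example aspect list from the card, this yields five tags, from `"pop"` through `"sustained pulsating synth lead"`.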
Golos dataset
Golos is a Russian corpus suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on the crowdsourcing platform. The total duration of the audio is about 1,240 hours. We have made the corpus freely available for download, along with the acoustic model trained on it. We also created a 3-gram KenLM language model using the open Common Crawl corpus.
Dataset structure
Domain Train files Train… See the full description on the dataset page: https://huggingface.co/datasets/SberDevices/Golos.
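To show what a 3-gram language model conditions on, here is an unsmoothed maximum-likelihood sketch, P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2), with `<s>`/`</s>` sentence-boundary tokens. This is a deliberately simplified illustration, not how KenLM works: KenLM estimates smoothed models (modified Kneser-Ney). The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def train_trigram(sentences: list[list[str]]) -> tuple[Counter, Counter]:
    """Count trigrams and their two-word histories, padding each sentence with boundary tokens."""
    trigrams: Counter = Counter()
    histories: Counter = Counter()
    for words in sentences:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        for i in range(len(padded) - 2):
            history = (padded[i], padded[i + 1])
            trigrams[history + (padded[i + 2],)] += 1
            histories[history] += 1
    return trigrams, histories


def trigram_prob(trigrams: Counter, histories: Counter, w1: str, w2: str, w3: str) -> float:
    """Maximum-likelihood estimate of P(w3 | w1, w2); 0.0 for an unseen history."""
    denom = histories[(w1, w2)]
    return trigrams[(w1, w2, w3)] / denom if denom else 0.0
```

Without smoothing, any trigram absent from the training text gets probability zero, which is the problem Kneser-Ney-style smoothing in tools like KenLM addresses.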
ajaykarthick/wavefake-audio dataset hosted on Hugging Face and contributed by the HF Datasets community
softdev629/sn105-audio-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
RSTV-24/torgo-audio-dataset-with-audio dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/cc/
Localized Audio Visual DeepFake Dataset (LAV-DF)
This repo is the dataset for the DICTA paper Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization (Best Award), and the journal paper "Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization submitted to CVIU.
LAV-DF Dataset
Download
To use this LAV-DF dataset, you should… See the full description on the dataset page: https://huggingface.co/datasets/ControlNet/LAV-DF.
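Temporal forgery localization means predicting the start and end times of manipulated segments. Such predictions are commonly scored by temporal intersection-over-union (IoU) against ground-truth segments; the text above does not state LAV-DF's exact evaluation protocol, so treat this as a generic sketch with a hypothetical function name.

```python
from __future__ import annotations


def segment_iou(pred: tuple[float, float], truth: tuple[float, float]) -> float:
    """Temporal IoU of two (start, end) segments in seconds; 0.0 if they do not overlap."""
    (p_start, p_end), (t_start, t_end) = pred, truth
    if p_end < p_start or t_end < t_start:
        raise ValueError("segments must satisfy start <= end")
    intersection = max(0.0, min(p_end, t_end) - max(p_start, t_start))
    union = (p_end - p_start) + (t_end - t_start) - intersection
    return intersection / union if union > 0 else 0.0
```

For instance, a predicted fake segment (1.0, 3.0) against a true segment (2.0, 4.0) overlaps for 1 s out of a 3 s union, giving an IoU of 1/3.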
therealvul/parlertts-pony-speech-audio dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This is a set of one-second .wav audio files, each containing a single spoken English word or background noise. These words are from a small set of commands, and are spoken by a variety of different speakers. This data set is designed to help train simple machine learning models. This dataset is covered in more detail at https://arxiv.org/abs/1804.03209.
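Because the clips are meant to be one second long, a common preprocessing step is to force every waveform to exactly one second so clips can be batched together. A sketch assuming 16 kHz audio (check the `sampling_rate` of the decoded audio) and zero-padding at the end for shorter clips; `fix_length` is a hypothetical helper that works on plain Python lists.

```python
from __future__ import annotations

SAMPLE_RATE = 16_000  # assumed sampling rate; check the decoded audio's sampling_rate


def fix_length(samples: list[float], sample_rate: int = SAMPLE_RATE, seconds: float = 1.0) -> list[float]:
    """Trim or zero-pad a waveform (a plain list) so it lasts exactly `seconds`."""
    target = int(round(sample_rate * seconds))
    if len(samples) >= target:
        return samples[:target]
    # For NumPy arrays, use np.pad instead: `+` on an array adds element-wise.
    return samples + [0.0] * (target - len(samples))
```

Padding at the end keeps the spoken word's onset where it was in the recording.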
Version 0.01 of the data set (configuration "v0.01") was released on August 3rd, 2017 and contains 64,727 audio files.
In version 0.01, thirty different words were recorded: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go", "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin", "Sheila", "Tree", "Wow".
In version 0.02, more words were added: "Backward", "Forward", "Follow", "Learn", "Visual".
In both versions, ten of them are used as commands by convention: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go". The other words are considered auxiliary (in the current implementation, this is marked by a True value of the "is_unknown" feature). Their function is to teach a model to distinguish core words from unrecognized ones.
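The labelling convention just described can be sketched as a single mapping: the ten command words keep their own label, and every other word is collapsed into one unknown class flagged with `is_unknown`. The `_unknown_` label string, the case-insensitive matching, and the helper name are assumptions for illustration; the `_silence_` class is outside this helper.

```python
from __future__ import annotations

# The ten words used as commands by convention, per the description above.
COMMANDS = frozenset(
    {"yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"}
)


def label_word(word: str) -> tuple[str, bool]:
    """Return (training label, is_unknown) for one spoken word (not for _silence_ clips)."""
    normalized = word.lower()
    if normalized in COMMANDS:
        return normalized, False
    return "_unknown_", True  # auxiliary words such as "Marvin", "Sheila", or "Follow"
```

Collapsing all auxiliary words into one class gives the model explicit negative examples, which is the stated purpose of those words.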
The _silence_ class contains a set of longer audio clips that are either recordings or a mathematical simulation of noise.
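Since the `_silence_` clips are longer than the one-second word clips, one common approach is to cut them into one-second windows so every training example has the same length. A sketch assuming 16 kHz audio and a hypothetical `hop` parameter (the step between window starts, defaulting to non-overlapping windows); a leftover tail shorter than a full window is dropped.

```python
from __future__ import annotations


def one_second_windows(samples: list[float], sample_rate: int = 16_000, hop: int | None = None) -> list[list[float]]:
    """Cut a long noise clip into full one-second windows starting every `hop` samples."""
    window = sample_rate  # one second of samples
    step = window if hop is None else hop
    if step <= 0:
        raise ValueError("hop must be positive")
    # The last start index is len(samples) - window, so every window is complete.
    return [samples[start:start + window] for start in range(0, len(samples) - window + 1, step)]
```

A smaller `hop` produces overlapping windows and therefore more silence examples from the same recording.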