According to data from February 2022, 32.5 million people in the United States were said to listen to podcasts on Spotify that year, while Apple had 28.5 million podcast listeners. Spotify's and Apple's figures were projected to reach 42.4 million and 29.2 million by 2025, respectively.
https://www.listennotes.com/podcast-datasets/solutions/#terms
Batch export all publicly accessible podcasts to a SQLite file.
Attribution-NonCommercial 2.0 (CC BY-NC 2.0) https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
OVERVIEW: The PodcastFillers dataset consists of 199 full-length podcast episodes in English with manually annotated filler words and automatically generated transcripts. The podcast audio recordings, sourced from SoundCloud (www.soundcloud.com), are CC-licensed, gender-balanced, and total 145 hours of audio from over 350 speakers. The annotations are provided under a non-commercial license and consist of 85,803 manually annotated audio events including approximately 35,000 filler words (“uh” and “um”) and 50,000 non-filler events such as breaths, music, laughter, repeated words, and noise. The annotated events are also provided as pre-processed 1-second audio clips. The dataset also includes automatically generated speech transcripts from a speech-to-text system. A detailed description is provided below.
The PodcastFillers dataset homepage: PodcastFillers.github.io
The preprocessing utility functions and code repository for reproducing our experimental results: PodcastFillersUtils
LICENSE:
The PodcastFillers dataset has separate licenses for the audio data and for the metadata. The metadata includes all annotations, speech-to-text transcriptions, and model outputs including VAD activations and FillerNet classification predictions.
Note: PodcastFillers is provided for research purposes only. The metadata license prohibits commercial use, which in turn prohibits deploying technology developed using the PodcastFillers metadata (such as the CSV annotations or audio clips extracted based on these annotations) in commercial applications.
This license agreement (the “License”) between Adobe Inc., having a place of business at 345 Park Avenue, San Jose, California 95110-2704 (“Adobe”), and you, the individual or entity exercising rights under this License (“you” or “your”), sets forth the terms for your use of certain research materials that are owned by Adobe (the “Licensed Materials”). By exercising rights under this License, you accept and agree to be bound by its terms. If you are exercising rights under this License on behalf of an entity, then “you” means you and such entity, and you (personally) represent and warrant that you (personally) have all necessary authority to bind that entity to the terms of this License.
All of the podcast episode audio files come from SoundCloud. Please see podcast_episode_license.csv (included in the dataset) for detailed license information for each episode. The licenses include CC-BY-3.0, CC-BY-SA-3.0 and CC-BY-ND-3.0.
ACKNOWLEDGEMENT: Please cite the following paper in work that makes use of this dataset:
Ge Zhu, Juan-Pablo Caceres and Justin Salamon. “Filler Word Detection and Classification: A Dataset and Benchmark.” In 23rd Annual Cong. of the Int. Speech Communication Association (INTERSPEECH), Incheon, Korea, Sep. 2022.
Bibtex
@inproceedings{Zhu:FillerWords:INTERSPEECH:22,
  title = {Filler Word Detection and Classification: A Dataset and Benchmark},
  booktitle = {23rd Annual Cong.~of the Int.~Speech Communication Association (INTERSPEECH)},
  address = {Incheon, Korea},
  month = {Sep.},
  url = {https://arxiv.org/abs/2203.15135},
  author = {Zhu, Ge and Caceres, Juan-Pablo and Salamon, Justin},
  year = {2022},
}
ANNOTATIONS: The annotations include 85,803 manually annotated audio events covering common English filler-word and non-filler-word events. We also provide automatically generated speech transcripts from a speech-to-text system, which do not contain the manually annotated events.

Full label vocabulary
Each of the 85,803 manually annotated events is labeled as one of 5 filler classes or 8 non-filler classes (label: number of events).

Fillers
- Uh: 17,907
- Um: 17,078
- You know: 668
- Other: 315
- Like: 157

Non-fillers
- Words: 12,709
- Repetitions: 9,024
- Breath: 8,288
- Laughter: 6,623
- Music: 5,060
- Agree (agreement sounds, e.g., “mm-hmm”, “ah-ha”): 3,755
- Noise: 2,735
- Overlap (overlapping speakers): 1,484
Total: 85,803

Consolidated label vocabulary
76,689 of the audio events are also labeled with a smaller, consolidated vocabulary of 6 classes. The consolidated vocabulary was obtained by removing classes with fewer than 5,000 annotations (like, you know, other, agreement sounds, overlapping speakers, noise) and merging “repetitions” into “words”.

- Words: 21,733
- Uh: 17,907
- Um: 17,078
- Breath: 8,288
- Laughter: 6,623
- Music: 5,060

Total: 76,689

The consolidated vocabulary was used to train FillerNet.
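As a rough illustration of the consolidation rule described above, the sketch below maps the full label vocabulary onto the 6 consolidated classes and checks the totals. The counts are copied from the lists in this section. The dictionary names and the exact label strings are assumptions made for this example and may not match the strings used in the dataset's CSV files.

```python
# Event counts for the full label vocabulary, copied from the lists above.
FULL_COUNTS = {
    "Uh": 17907, "Um": 17078, "You know": 668, "Other": 315, "Like": 157,
    "Words": 12709, "Repetitions": 9024, "Breath": 8288, "Laughter": 6623,
    "Music": 5060, "Agree": 3755, "Noise": 2735, "Overlap": 1484,
}

# Hypothetical mapping: full label -> consolidated label (None = dropped).
CONSOLIDATE = {
    "Uh": "Uh", "Um": "Um", "Breath": "Breath",
    "Laughter": "Laughter", "Music": "Music",
    "Words": "Words", "Repetitions": "Words",  # repetitions are merged into words
    "You know": None, "Other": None, "Like": None,
    "Agree": None, "Noise": None, "Overlap": None,
}


def consolidated_counts(full_counts):
    """Sum the full-vocabulary counts into the consolidated classes."""
    out = {}
    for label, count in full_counts.items():
        target = CONSOLIDATE[label]
        if target is not None:
            out[target] = out.get(target, 0) + count
    return out


counts = consolidated_counts(FULL_COUNTS)
dropped = sum(FULL_COUNTS.values()) - sum(counts.values())
```

Under these assumptions, the merged “Words” class holds 12,709 + 9,024 = 21,733 events, and the dropped classes account for the 9,114 events that have no consolidated label.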
For a detailed description of how the dataset was created, please see our paper.

Data Split for Machine Learning: To facilitate machine learning experiments, the audio data in this dataset (full-length recordings and preprocessed 1-sec clips) are pre-arranged into “train”, “validation”, and “test” folders. This split ensures that episodes from the same podcast show are always in the same subset (train, validation, or test), to prevent speaker leakage. We also ensured that each subset in this split remains gender balanced, like the complete dataset.
We strongly recommend using this split in your experiments. It will ensure your results are not inflated due to overfitting, and that they are comparable to the results published in the FillerNet paper.
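If you build a custom split instead, the show-level constraint described above can be checked with a few lines of code. The following is a minimal sketch, not part of the dataset's tooling: the function name and the (subset, show) input format are hypothetical, and extracting the show name from a filename is left to the caller.

```python
from collections import defaultdict


def shows_in_multiple_subsets(episodes):
    """Return {show: sorted list of subsets} for shows found in more than one subset.

    `episodes` is an iterable of (subset, show) pairs, one pair per episode.
    An empty result means no show is shared between subsets.
    """
    subsets_by_show = defaultdict(set)
    for subset, show in episodes:
        subsets_by_show[show].add(subset)
    return {
        show: sorted(subsets)
        for show, subsets in subsets_by_show.items()
        if len(subsets) > 1
    }


# Made-up example: "Show A" has episodes in two subsets and would leak speakers.
example = [
    ("train", "Show A"),
    ("train", "Show A"),
    ("test", "Show B"),
    ("validation", "Show A"),
]
leaks = shows_in_multiple_subsets(example)
```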
AUDIO FILES:
Full-length podcast episodes (MP3)
199 audio files of the full-length podcast episode recordings in mp3 format, stereo channels, 44.1 kHz sample rate and 32-bit depth. Filename format: [show name]_[episode name].mp3.

Pre-processed full-length podcast episodes (WAV)
199 audio files of the full-length podcast episode recordings in wav format, mono channel, 16 kHz sample rate and 32-bit depth. The files are split into train, validation and test partitions (folders), see Data Split for Machine Learning above. Filename format: [show name]_[episode name].wav.

Pre-processed WAV clips
Pre-processed 1-second audio clips of the annotated events, where each clip is centered on the center of the event. For annotated events longer than 1 second, only the central 1 second is kept. The clips are in the same format as the pre-processed full-length podcast episodes: wav format, mono channel, 16 kHz sample rate and 32-bit depth.
The clips that have consolidated vocabulary labels (76,689) are split into “train”, “validation” and “test” partitions (folders), see Data Split for Machine Learning above. The remainder of the clips (9,114) are placed in an “extra” folder.
Filename format: [pfID].wav where:
[pfID] = the PodcastFillers ID of the audio clip (see metadata below)
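To make the clip geometry concrete, the sketch below computes the sample range of a 1-second window centered on an annotated event at the 16 kHz sample rate stated above. It is only an illustration: the function name is hypothetical, and the description does not say how events near the start or end of a recording are handled, so the sketch leaves out-of-range indices to the caller.

```python
SAMPLE_RATE = 16_000   # Hz, as stated for the pre-processed WAV files
CLIP_SECONDS = 1.0     # clip length stated in the dataset description


def clip_window(event_start, event_end, sample_rate=SAMPLE_RATE):
    """Return (first_sample, end_sample) of a 1-second clip centered on the event.

    Times are in seconds. Events longer than 1 second are implicitly truncated
    to their central 1 second. The start index may be negative for events
    within 0.5 s of the beginning of a recording (edge handling is an open
    assumption here).
    """
    center = (event_start + event_end) / 2.0
    first = round((center - CLIP_SECONDS / 2.0) * sample_rate)
    return first, first + int(CLIP_SECONDS * sample_rate)


short_event = clip_window(10.0, 10.4)   # 0.4 s event, padded with context
long_event = clip_window(20.0, 23.0)    # 3 s event, central 1 s kept
```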
METADATA:
Each word in the transcript is annotated as a dictionary: {“confidence”:(float), “duration”:(int), “offset”:(int), “text”:(string)} where “confidence” indicates the STT confidence in the prediction, “duration” (unit:microsecond or 1e-6 second) is the duration of the transcribed word, “offset” (unit:microsecond or 1e-6 second) is the start time of the transcribed word in the full-length recording.
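Because “offset” and “duration” are integers in microseconds, a small conversion step is usually needed before aligning transcript words with annotations expressed in seconds. A minimal sketch, assuming a word dictionary shaped as described above (the helper name and the example values are made up for illustration):

```python
MICROSECONDS_PER_SECOND = 1_000_000


def word_span_seconds(word):
    """Return (start, end) of a transcribed word in seconds.

    `word` is a dictionary with integer "offset" and "duration" fields,
    both in microseconds, as described in the metadata section.
    """
    start = word["offset"] / MICROSECONDS_PER_SECOND
    end = (word["offset"] + word["duration"]) / MICROSECONDS_PER_SECOND
    return start, end


# Made-up example entry, not taken from the dataset.
example_word = {"confidence": 0.93, "duration": 250000, "offset": 1500000, "text": "hello"}
span = word_span_seconds(example_word)
```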
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There are currently more than 4 million podcast titles on the platform.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books, filtered to the book “How to get your message out fast & free using podcasts: everything you need to know about podcasting explained simply”. It has 7 columns, including author, BNB id, book, book publisher, and ISBN. The preview is ordered by publication date (descending).
https://www.listennotes.com/podcast-datasets/category/#terms
Batch export all podcasts in specific countries, languages or genres.
When asked about "Digital audio purchases", 12 percent of Argentinian respondents answered "Yes, on downloads". This online survey was conducted in 2024, among 1,045 consumers. As an element of Statista Consumer Insights, our Consumer Insights Global survey offers you up-to-date market research data from over 50 countries and territories worldwide.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality.
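The word-error-rate cap mentioned above can be pictured as a simple per-segment filter. The sketch below is not GigaSpeech's actual pipeline, whose details are not given here. It is a generic word-level WER computed with edit distance, with hypothetical function names, and it assumes that each segment has a reference transcript and a second, hypothesis transcript to compare against.

```python
def word_error_rate(reference, hypothesis):
    """Word-level WER: edit distance between word lists / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def keep_segment(reference, hypothesis, max_wer):
    """Keep a segment only if its WER does not exceed the cap."""
    return word_error_rate(reference, hypothesis) <= max_wer


# Caps taken from the description: 4% for the XL subset, 0% for the smaller ones.
XL_CAP, SMALL_CAP = 0.04, 0.0
```

With a 0% cap only segments whose two transcripts match word for word survive, while the 4% cap tolerates roughly one word error in every 25 reference words.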
https://www.listennotes.com/podcast-datasets/faq/#terms
Batch export all podcasts by Apple Podcasts IDs (iTunes IDs).