9 datasets found

Audience of Apple and Spotify podcasts in the U.S. 2020-2025
statista.com
Updated May 29, 2024
Cite
Statista (2024). Audience of Apple and Spotify podcasts in the U.S. 2020-2025 [Dataset]. https://www.statista.com/statistics/1303252/apple-spotify-podcast-listeners-united-states/
Dataset updated
May 29, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
According to data from February 2022, 32.5 million people in the United States were said to listen to podcasts on Spotify that year, while Apple had 28.5 million podcast listeners. Spotify's and Apple's figures were projected to reach 42.4 million and 29.2 million by 2025, respectively.
Metadata of all public podcasts
listennotes.com
sqlite
Updated Mar 23, 2022
Cite
Listen Notes, Inc. (2022). Metadata of all public podcasts [Dataset]. https://www.listennotes.com/podcast-datasets/solutions/
Available download formats: sqlite
Dataset updated
Mar 23, 2022
Dataset provided by
Listen Notes
Authors
Listen Notes, Inc.
License
https://www.listennotes.com/podcast-datasets/solutions/#terms
Description
Batch export all publicly accessible podcasts to a SQLite file.
PodcastFillers
data.niaid.nih.gov
explore.openaire.eu
Updated Oct 9, 2022
Cite
Juan-Pablo Caceres (2022). PodcastFillers [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6609214
Dataset updated
Oct 9, 2022
Dataset provided by
Juan-Pablo Caceres
Justin Salamon
Ge Zhu
License
Attribution-NonCommercial 2.0 (CC BY-NC 2.0): https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
Description
OVERVIEW: The PodcastFillers dataset consists of 199 full-length podcast episodes in English with manually annotated filler words and automatically generated transcripts. The podcast audio recordings, sourced from SoundCloud (www.soundcloud.com), are CC-licensed, gender-balanced, and total 145 hours of audio from over 350 speakers. The annotations are provided under a non-commercial license and consist of 85,803 manually annotated audio events including approximately 35,000 filler words (“uh” and “um”) and 50,000 non-filler events such as breaths, music, laughter, repeated words, and noise. The annotated events are also provided as pre-processed 1-second audio clips. The dataset also includes automatically generated speech transcripts from a speech-to-text system. A detailed description is provided below.

The PodcastFillers dataset homepage: PodcastFillers.github.io

The preprocessing utility functions and code repository for reproducing our experimental results: PodcastFillersUtils

LICENSE:

The PodcastFillers dataset has separate licenses for the audio data and for the metadata. The metadata includes all annotations, speech-to-text transcriptions, and model outputs including VAD activations and FillerNet classification predictions.

Note: PodcastFillers is provided for research purposes only. The metadata license prohibits commercial use, which in turn prohibits deploying technology developed using the PodcastFillers metadata (such as the CSV annotations or audio clips extracted based on these annotations) in commercial applications.

License for PodcastFillers Dataset metadata

This license agreement (the “License”) between Adobe Inc., having a place of business at 345 Park Avenue, San Jose, California 95110-2704 (“Adobe”), and you, the individual or entity exercising rights under this License (“you” or “your”), sets forth the terms for your use of certain research materials that are owned by Adobe (the “Licensed Materials”). By exercising rights under this License, you accept and agree to be bound by its terms. If you are exercising rights under this License on behalf of an entity, then “you” means you and such entity, and you (personally) represent and warrant that you (personally) have all necessary authority to bind that entity to the terms of this License.

GRANT OF LICENSE. 1.1 Adobe grants you a nonexclusive, worldwide, royalty-free, revocable, fully paid license to (A) reproduce, use, modify, and publicly display the Licensed Materials for noncommercial research purposes only; and (B) redistribute the Licensed Materials, and modifications or derivative works thereof, for noncommercial research purposes only, provided that you give recipients a copy of this License upon redistribution. 1.2 You may add your own copyright statement to your modifications and/or provide additional or different license terms for use, reproduction, modification, public display, and redistribution of your modifications and derivative works, provided that such license terms limit the use, reproduction, modification, public display, and redistribution of such modifications and derivative works to noncommercial research purposes only. 1.3 For purposes of this License, noncommercial research purposes include academic research and teaching only. Noncommercial research purposes do not include commercial licensing or distribution, development of commercial products, or any other activity that results in commercial gain.

OWNERSHIP AND ATTRIBUTION. Adobe and its licensors own all right, title, and interest in the Licensed Materials. You must retain all copyright notices and/or disclaimers in the Licensed Materials.

DISCLAIMER OF WARRANTIES. THE LICENSED MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. THE ENTIRE RISK AS TO THE USE, RESULTS, AND PERFORMANCE OF THE LICENSED MATERIALS IS ASSUMED BY YOU. ADOBE DISCLAIMS ALL WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, WITH REGARD TO YOUR USE OF THE LICENSED MATERIALS, INCLUDING, BUT NOT LIMITED TO, NONINFRINGEMENT OF THIRD-PARTY RIGHTS.

LIMITATION OF LIABILITY. IN NO EVENT WILL ADOBE BE LIABLE FOR ANY ACTUAL, INCIDENTAL, SPECIAL OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION, LOSS OF PROFITS OR OTHER COMMERCIAL LOSS, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE LICENSED MATERIALS, EVEN IF ADOBE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

TERM AND TERMINATION.
5.1 The License is effective upon acceptance by you and will remain in effect unless terminated earlier in accordance with Section 5.2. 5.2 Any breach of any material provision of this License will automatically terminate the rights granted herein. 5.3 Sections 2 (Ownership and Attribution), 3 (Disclaimer of Warranties), and 4 (Limitation of Liability) will survive termination of this License.

License for PodcastFillers Dataset audio files

All of the podcast episode audio files come from SoundCloud. Please see podcast_episode_license.csv (included in the dataset) for detailed license information for each episode. The licenses include CC-BY-3.0, CC-BY-SA-3.0, and CC-BY-ND-3.0.

ACKNOWLEDGEMENT: Please cite the following paper in work that makes use of this dataset:

Filler Word Detection and Classification: A Dataset and Benchmark. Ge Zhu, Juan-Pablo Caceres, and Justin Salamon. In 23rd Annual Conf. of the Int. Speech Communication Association (INTERSPEECH), Incheon, Korea, Sep. 2022.

Bibtex

@inproceedings{Zhu:FillerWords:INTERSPEECH:22,
  title = {Filler Word Detection and Classification: A Dataset and Benchmark},
  booktitle = {23rd Annual Conf.~of the Int.~Speech Communication Association (INTERSPEECH)},
  address = {Incheon, Korea},
  month = {Sep.},
  url = {https://arxiv.org/abs/2203.15135},
  author = {Zhu, Ge and Caceres, Juan-Pablo and Salamon, Justin},
  year = {2022},
}

ANNOTATIONS: The annotations include 85,803 manually annotated audio events covering common English filler-word and non-filler-word events. We also provide automatically generated speech transcripts from a speech-to-text system, which do not contain the manually annotated events.

Full label vocabulary

Each of the 85,803 manually annotated events is labeled as one of 5 filler classes or 8 non-filler classes (label: number of events).

Fillers
- Uh: 17,907
- Um: 17,078
- You know: 668
- Other: 315
- Like: 157

Non-fillers
- Words: 12,709
- Repetitions: 9,024
- Breath: 8,288
- Laughter: 6,623
- Music: 5,060
- Agree (agreement sounds, e.g., “mm-hmm”, “ah-ha”): 3,755
- Noise: 2,735
- Overlap (overlapping speakers): 1,484

Total: 85,803

Consolidated label vocabulary

76,689 of the audio events are also labeled with a smaller, consolidated vocabulary with 6 classes. The consolidated vocabulary was obtained by removing classes with fewer than 5,000 annotations (like, you know, other, agreement sounds, overlapping speakers, noise) and grouping “repetitions” and “words” into “words”.

Words: 21,733

Uh: 17,907

Um: 17,078

Breath: 8,288

Laughter: 6,623

Music: 5,060

Total: 76,689

The consolidated vocabulary was used to train FillerNet.
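The consolidation rule described above can be checked with a short sketch. The counts are copied from the label vocabulary in this description; the function name `consolidate` and the dictionary layout are illustrative assumptions, not part of the dataset's tooling.

```python
# Event counts copied from the full label vocabulary above.
FILLERS = {"Uh": 17907, "Um": 17078, "You know": 668, "Other": 315, "Like": 157}
NON_FILLERS = {
    "Words": 12709, "Repetitions": 9024, "Breath": 8288, "Laughter": 6623,
    "Music": 5060, "Agree": 3755, "Noise": 2735, "Overlap": 1484,
}
MIN_EVENTS = 5000  # classes with fewer annotations are dropped


def consolidate(fillers, non_fillers, min_events=MIN_EVENTS):
    """Merge "Repetitions" into "Words", then drop the small classes."""
    merged = {**fillers, **non_fillers}
    merged["Words"] += merged.pop("Repetitions")
    return {label: n for label, n in merged.items() if n >= min_events}


full_total = sum(FILLERS.values()) + sum(NON_FILLERS.values())
consolidated = consolidate(FILLERS, NON_FILLERS)
print(full_total, sum(consolidated.values()), sorted(consolidated))
```

Under these assumptions the full vocabulary sums to the stated 85,803 events and the six remaining classes sum to the stated 76,689.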

For a detailed description of how the dataset was created, please see our paper.

Data Split for Machine Learning: To facilitate machine learning experiments, the audio data in this dataset (full-length recordings and preprocessed 1-sec clips) are pre-arranged into “train”, “validation”, and “test” folders. This split ensures that episodes from the same podcast show are always in the same subset (train, validation, or test), to prevent speaker leakage. We also ensured that each subset in this split remains gender-balanced, like the complete dataset.

We strongly recommend using this split in your experiments. It will ensure your results are not inflated due to overfitting, and that they are comparable to the results published in the FillerNet paper.
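To illustrate why grouping by show prevents speaker leakage, here is a minimal sketch of a show-level split. It is not the procedure used to build the provided folders, and it does not attempt gender balancing; the function name `split_by_show`, the 80/10/10 fractions, and the `(show, episode)` input layout are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_show(episodes, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole shows (not single episodes) to train/validation/test.

    episodes: iterable of (show, episode) pairs (hypothetical layout).
    """
    by_show = defaultdict(list)
    for show, episode in episodes:
        by_show[show].append(episode)

    shows = sorted(by_show)
    random.Random(seed).shuffle(shows)  # deterministic for a given seed

    n_train = int(len(shows) * fractions[0])
    n_val = int(len(shows) * fractions[1])
    assignment = {
        "train": shows[:n_train],
        "validation": shows[n_train:n_train + n_val],
        "test": shows[n_train + n_val:],
    }
    # Every episode follows its show, so no show spans two subsets.
    return {
        subset: [(show, ep) for show in subset_shows for ep in by_show[show]]
        for subset, subset_shows in assignment.items()
    }


demo = [(f"show{i}", f"ep{j}") for i in range(10) for j in range(3)]
parts = split_by_show(demo)
print({name: len(items) for name, items in parts.items()})
```

For the PodcastFillers data itself, the pre-arranged folders remain the split to use.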

AUDIO FILES:

Full-length podcast episodes (MP3)

199 audio files of the full-length podcast episode recordings in MP3 format, stereo channels, 44.1 kHz sample rate, and 32-bit depth. Filename format: [show name]_[episode name].mp3.

Pre-processed full-length podcast episodes (WAV)

199 audio files of the full-length podcast episode recordings in WAV format, mono channel, 16 kHz sample rate, and 32-bit depth. The files are split into train, validation, and test partitions (folders); see Data Split for Machine Learning above. Filename format: [show name]_[episode name].wav.

Pre-processed WAV clips

Pre-processed 1-second audio clips of the annotated events, where each clip is centered on the center of the event. Annotated events longer than 1 second are truncated to 1 second around their center. The clips are in the same format as the pre-processed full-length podcast episodes: WAV format, mono channel, 16 kHz sample rate, and 32-bit depth.

The clips that have consolidated vocabulary labels (76,689) are split into “train”, “validation” and “test” partitions (folders), see Data Split for Machine Learning above. The remainder of the clips (9,114) are placed in an “extra” folder.

Filename format: [pfID].wav where:

[pfID] = the PodcastFillers ID of the audio clip (see metadata below)
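The clip centering described above can be sketched as index arithmetic on the 16 kHz audio. This is only an illustration: the function name `clip_bounds` and the assumption that event boundaries are available as start and end times in seconds are hypothetical, and the sketch does not reproduce the dataset's own preprocessing code.

```python
SAMPLE_RATE = 16_000   # Hz, as stated for the pre-processed WAV files
CLIP_SECONDS = 1.0     # clip length stated in the description


def clip_bounds(event_start_s, event_end_s,
                sample_rate=SAMPLE_RATE, clip_seconds=CLIP_SECONDS):
    """Return (start, end) sample indices of a clip centered on the event.

    Events longer than the clip are implicitly truncated around their
    center. The start is clamped at 0; handling of clips that run past
    the end of the recording is left to the caller.
    """
    center = (event_start_s + event_end_s) / 2.0
    clip_len = int(round(clip_seconds * sample_rate))
    start = max(0, int(round(center * sample_rate)) - clip_len // 2)
    return start, start + clip_len


print(clip_bounds(2.0, 2.4))    # short event
print(clip_bounds(10.0, 13.0))  # event longer than 1 second
```

Both calls return a window of exactly 16,000 samples, whether the event is shorter or longer than the clip.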

METADATA:

Speech-to-text podcast transcripts

Speech transcript in JSON format for each podcast episode, generated using SpeechMatics STT. Filename format: [show name]_[episode name].json.

Each word in the transcript is annotated as a dictionary: {“confidence”:(float), “duration”:(int), “offset”:(int), “text”:(string)} where “confidence” indicates the STT confidence in the prediction, “duration” (unit:microsecond or 1e-6 second) is the duration of the transcribed word, “offset” (unit:microsecond or 1e-6 second) is the start time of the transcribed word in the full-length recording.
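A minimal sketch of reading such word entries and converting the microsecond fields to seconds is shown below. The four keys come from the description above; the assumption that the words arrive as a flat JSON list, the sample values, and the function name `word_times` are illustrative only, since the top-level layout of the transcript files is not specified here.

```python
import json

MICROSECONDS_PER_SECOND = 1_000_000


def word_times(words):
    """Convert word dicts to (text, start_s, end_s, confidence) tuples."""
    result = []
    for word in words:
        start_s = word["offset"] / MICROSECONDS_PER_SECOND
        end_s = (word["offset"] + word["duration"]) / MICROSECONDS_PER_SECOND
        result.append((word["text"], start_s, end_s, word["confidence"]))
    return result


# Hypothetical sample entry, assuming a flat list of word dictionaries.
sample = json.loads(
    '[{"confidence": 0.93, "duration": 250000, '
    '"offset": 1500000, "text": "hello"}]'
)
print(word_times(sample))
```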

2.
Spotify’s Podcasts
searchlogistics.com
Updated Mar 24, 2025
Cite
(2025). Spotify’s Podcasts [Dataset]. https://www.searchlogistics.com/learn/statistics/spotify-statistics/
Dataset updated
Mar 24, 2025
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
There are currently more than 4 million podcast titles on Spotify.
Books called How to get your message out fast & free using podcasts :...
workwithdata.com
Updated Aug 6, 2024
Cite
Work With Data (2024). Books called How to get your message out fast & free using podcasts : everything you need to know about podcasting explained simply [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=How+to+get+your+message+out+fast+%26+free+using+podcasts+%3A+everything+you+need+to+know+about+podcasting+explained+simply
Dataset updated
Aug 6, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset lists books, filtered to the title "How to get your message out fast & free using podcasts : everything you need to know about podcasting explained simply". It has 7 columns, including author, BNB id, book, book publisher, and ISBN. The preview is ordered by publication date (descending).
Podcast metadata by category
listennotes.com
csv
Updated Nov 21, 2019
Cite
Listen Notes, Inc. (2019). Podcast metadata by category [Dataset]. https://www.listennotes.com/podcast-datasets/category/
Available download formats: csv
Dataset updated
Nov 21, 2019
Dataset provided by
Listen Notes
Authors
Listen Notes, Inc.
License
https://www.listennotes.com/podcast-datasets/category/#terms
Description
Batch export all podcasts in specific countries, languages or genres.
Digital audio purchases in Argentina 2024
statista.com
Updated Feb 28, 2025
Cite
Umair Bashir (2025). Digital audio purchases in Argentina 2024 [Dataset]. https://www.statista.com/topics/9770/podcast-consumption-in-latin-america/
Dataset updated
Feb 28, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Umair Bashir
Description
When asked about "Digital audio purchases", 12 percent of Argentinian respondents answered "Yes, on downloads". This online survey was conducted in 2024 among 1,045 consumers. As an element of Statista Consumer Insights, our Consumer Insights Global survey offers you up-to-date market research data from over 50 countries and territories worldwide.
gigaspeech
huggingface.co
paperswithcode.com
Cite
SpeechColab, gigaspeech [Dataset]. https://huggingface.co/datasets/speechcolab/gigaspeech
Dataset authored and provided by
SpeechColab
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality.
Podcast metadata by Apple Podcasts IDs (iTunes IDs)
listennotes.com
csv
Updated Apr 14, 2021
Cite
Listen Notes, Inc. (2021). Podcast metadata by Apple Podcasts IDs (iTunes IDs) [Dataset]. https://www.listennotes.com/podcast-datasets/faq/
Available download formats: csv
Dataset updated
Apr 14, 2021
Dataset provided by
Listen Notes
Authors
Listen Notes, Inc.
License
https://www.listennotes.com/podcast-datasets/faq/#terms
Description
Batch export all podcasts by Apple Podcasts IDs (iTunes IDs).