http://www.gnu.org/licenses/agpl-3.0.html
Podcasting is a unique space where people can share their voices, ideas, and stories freely. Unlike platforms controlled by a single company (like YouTube or Instagram), podcasting supports true freedom of expression. However, this openness is now being threatened by AI tools, such as Notebook LM, which make it easy to produce fake, low-quality podcasts. Unfortunately, many of these AI-generated shows are created by spammers, scammers, or blackhat SEOs, and they are harming both listeners and genuine podcast creators.
At Listen Notes, the leading podcast search engine and podcast API, we believe that creating a quality podcast takes real effort. Listeners can tell when a show has been crafted with care, and that’s why we are committed to stopping the spread of fake, AI-generated podcasts on our platform.
This dataset represents a small subset of AI-generated fake podcasts that were flagged during attempts to add them to the Listen Notes podcast database. These "podcasts" were predominantly created using Notebook LM and are not designed for human consumption.
The goal of sharing this dataset is to support the AI community in developing more effective tools to combat spam. While it may not be possible to eliminate spam entirely, we can work together to minimize its impact and contribute to making the digital world a better place.
If you're building a podcast app for discovering human-made shows, PodcastAPI.com is your best bet. Apple Podcasts and Spotify are increasingly flooded with AI-generated fakes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This SPSS dataset is from a 2019 survey conducted via . There are 323 participants in the file, 306 with complete data for the key measures. Measures include the Big Five Inventory, the Interest/Deprivation Curiosity Scale, the Need for Cognition Scale, the Need to Belong Scale, the Basic Psychological Need Satisfaction Scale, the General Belongingness Scale, the Meaning in Life Questionnaire, the Mindful Attention Awareness Scale, the Smartphone Addiction Scale, and some questions about listening to podcasts.
In relation to podcasts, participants were first asked if they had ever listened to a podcast. Those who said yes (N = 240) were asked questions related to amount of listening, categories and format of podcasts, setting of listening, device used, social engagement around podcasts, and parasocial relationships with their favourite podcast host. Participants also indicated their age, gender, and country of residence.
The datafile contains item ratings and scale scores for all measures. Item wording and response labels are provided in the variable view tab of the downloaded file. Other files available on the OSF site include a syntax file related to the analyses reported in a published paper and a copy of the survey.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The SEP-28k dataset contains stuttering event annotations for approximately 28,000 3-second clips. In addition, we include stuttering event annotations for about 4,000 3-second clips from the FluencyBank dataset. Audio files are not part of this released dataset but may be downloaded using the URLs provided in the *_episodes.csv files. Original copyright remains with the podcast owners.
Each 3-second clip was annotated with the following labels by three annotators who were not clinicians but did have training on how to identify each type of stuttering event. Label files contain counts (out of three) corresponding to how many reviewers selected a given label. Multiple labels may be selected for a given clip.
Episode files (with audio URLs): SEP-28k_episodes.csv and fluencybank_episodes.csv.
Label files: SEP-28k_labels.csv and fluencybank_labels.csv.
If you find the SEP-28k dataset or this code useful in your research, please cite the following paper:
@inproceedings{stuttering-event-detection,
title = {Sep-28k: A Dataset for Stuttering Event Detection from Podcasts with People Who Stutter},
author = {Colin Lea and Vikramjit Mitra and Aparna Joshi and Sachin Kajarekar and Jeffrey Bigham},
year = {2021},
URL = {https://arxiv.org/pdf/2102.12394.pdf}
}
The SEP-28k dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/4.0/.
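Because the label files store, for each clip, how many of the three annotators selected each label, a common first step is to reduce those counts to a per-clip label set. The following is a minimal sketch of a majority-vote rule, not part of the released code. The column names (`ClipId`, `Block`, `Prolongation`, `Interjection`) and the inline sample rows are illustrative assumptions; check the header of the actual *_labels.csv files before use.

```python
import csv
import io

# Hypothetical excerpt of a labels file. Column names and values are
# assumptions for illustration; each label cell is a count from 0 to 3.
SAMPLE_LABELS_CSV = """ClipId,Block,Prolongation,Interjection
0,0,2,0
1,3,0,1
2,1,1,0
"""

# Assumed label columns to consider for each clip.
LABEL_COLUMNS = ["Block", "Prolongation", "Interjection"]


def majority_labels(row, label_columns, min_votes=2):
    """Return the labels chosen by at least `min_votes` of the three annotators."""
    return [name for name in label_columns if int(row[name]) >= min_votes]


def summarize(csv_text, label_columns, min_votes=2):
    """Map each clip id to the list of labels that reach the vote threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["ClipId"]: majority_labels(row, label_columns, min_votes)
        for row in reader
    }


result = summarize(SAMPLE_LABELS_CSV, LABEL_COLUMNS)
print(result)
```

Since multiple labels may be selected for one clip, the function returns a list rather than a single label. Lowering `min_votes` to 1 keeps every label that any annotator chose, and raising it to 3 keeps only unanimous labels.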
This dataset is provided by Apple and does not, by any means, belong to me. Find this dataset on GitHub.
https://www.listennotes.com/podcast-datasets/keyword/#terms
Batch export all podcasts or episodes by full-text keyword search, e.g., people, brands, topics...