AbstractTTS/PODCAST dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "lexFridmanPodcast-transcript-audio"
Dataset Summary
This dataset was created by applying Whisper to the videos of the Lex Fridman Podcast YouTube channel. The transcripts were generated with a medium-size Whisper model.
Languages
Language: English
Dataset Structure
The dataset contains all the transcripts plus the audio of the different videos of Lex Fridman Podcast.
Data Fields
The dataset is composed of:
id: Id of the youtube… See the full description on the dataset page: https://huggingface.co/datasets/Whispering-GPT/lex-fridman-podcast-transcript-audio.
https://www.listennotes.com/podcast-datasets/keyword/#terms
Batch export all podcasts or episodes by full-text keyword search, e.g., people, brands, topics...
https://choosealicense.com/licenses/cc/
Some Podcasts
Podcasts are taken from the PodcastFillers dataset. The PodcastFillers dataset consists of 199 full-length podcast episodes in English with manually annotated filler words and automatically generated transcripts. The podcast audio recordings, sourced from SoundCloud, are CC-licensed, gender-balanced, and total 145 hours of audio from over 350 speakers.
Tip: This dataset does not include the PodcastFillers annotations, which are under a non-commercial license. See here… See the full description on the dataset page: https://huggingface.co/datasets/ylacombe/podcast_fillers_by_license.
https://www.listennotes.com/podcast-datasets/category/#terms
Batch export all podcasts in specific countries, languages or genres.
== Quick facts ==
- The most up-to-date and comprehensive podcast database available
- All languages & all countries
- Includes over 3,500,000 podcasts
- Features 35+ data fields, such as basic metadata, global rank, RSS feed (with audio URLs), Spotify links, and more
- Delivered in SQLite format
Learn how we build a high-quality podcast database: https://www.listennotes.help/article/105-high-quality-podcast-database-from-listen-notes
== Use Cases ==
- AI training, including speech recognition, generative AI, voice cloning / synthesis, and news analysis
- Alternative data for investment research, such as sentiment analysis of executive interviews, market research and tracking investment themes
- PR and marketing, including social monitoring, content research, outreach, and guest booking
- ...
== Data Attributes ==
See the full list of data attributes on this page: https://www.listennotes.com/podcast-datasets/fields/?filter=podcast_only
How to access podcast audio files: Our dataset includes RSS feed URLs for all podcasts. You can retrieve audio for over 170 million episodes directly from these feeds. With access to the raw audio, you’ll have high-quality podcast speech data ideal for AI training and related applications.
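As a rough illustration of retrieving audio URLs from an RSS feed, the sketch below parses a feed and collects each episode's `<enclosure>` URL, which is where podcast RSS feeds conventionally carry the audio file link. The sample feed, its example.com URLs, and the helper name `extract_audio_urls` are hypothetical and are not part of the Listen Notes dataset; a real feed would first be downloaded from the RSS URL stored in the database.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/audio/ep1.mp3" type="audio/mpeg" length="12345"/>
    </item>
    <item>
      <title>Episode 2 (no audio)</title>
    </item>
    <item>
      <title>Episode 3</title>
      <enclosure url="https://example.com/audio/ep3.mp3" type="audio/mpeg" length="67890"/>
    </item>
  </channel>
</rss>"""


def extract_audio_urls(feed_xml):
    """Return (episode title, audio URL) pairs for items that have an enclosure."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # some items carry no audio file
        url = enclosure.get("url")
        if url:
            results.append((item.findtext("title", default=""), url))
    return results


for title, url in extract_audio_urls(SAMPLE_FEED):
    print(title, url)
```

Real-world feeds vary in quality, so a production pipeline would likely need error handling for malformed XML and missing attributes.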
== Custom Offers ==
We can provide custom datasets based on your needs, such as language-specific data, daily/weekly/monthly update frequency, or one-time purchases.
We also provide a RESTful API at PodcastAPI.com
Contact us: hello@listennotes.com
== Need Help? ==
If you have any questions about our products, feel free to reach out at hello@listennotes.com
== About Listen Notes, Inc. ==
Since 2017, Listen Notes, Inc. has provided the leading podcast search engine and podcast database.
According to data from April 2025, the number of podcasts reached roughly **** million that year. At the same time, the number of episodes published up to then stood at more than *** million.
The number of podcast consumers in the United States has been growing steadily. According to estimates, around *** million people consumed podcasts of any format. This marks an increase of around ** million Americans. For the first time, these estimates included both audio and video podcasts, compared to previous years, when the data only covered audio consumption.
In 2024, a survey on podcast consumption revealed that ** percent of U.S. adults had either listened to or watched a podcast within the last month, a figure which has more than tripled over the past decade. Weekly podcast consumption has also sharply increased, and some of the world’s leading podcast publishers achieve millions of unique streams and downloads per month.

Podcast consumption in the U.S.
Once a niche format, podcasts have now become part of the mainstream media landscape. Between 2011 and 2025, the share of Americans who had ever consumed a podcast almost tripled, growing from ** to ** percent. As podcasts have grown in popularity, so has the variety of content available in the format. Some of the more popular podcast genres are music and comedy, but tens of millions of U.S. households have fans of sports, science, news and arts podcasts too. Podcasts are often also used as part of marketing strategies or to generate engagement between bloggers, news publications, or even different departments within a company.

Like most forms of modern media, podcasts frequently include ads, and podcast ad revenue reached over *** billion U.S. dollars in the United States in 2023. By 2024, it is expected that advertising revenue in this sector will grow by around *** million each year and will exceed *** billion U.S. dollars in 2026.

For U.S. consumers, podcasts are not just a source of inspiration or a way to escape from daily life but also an opportunity to educate themselves. In a survey held in early 2019, the majority of respondents said that their main reason for listening to podcasts was to learn new things. There are podcasts on philosophy, history, travel, and business, as well as much more, including content aimed solely at educating children.
The Spotify Podcast Dataset consists of 105,360 episodes with transcripts and creator descriptions, and is provided as a training dataset for the summarization task.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
2 million podcast reviews for 100k podcasts, updated monthly. This dataset is intended to aid in analysis of text feedback and review data.
Dataset built with PointScrape.
Photo credit: Kati at xilophotography.com; Instagram @xilophotography. The creator featured can be found on Twitter @csallen. Full story: convertkit.com/courtland
For commercial applications and use of full dataset, please contact stuart@thoughtvector.io.
https://www.listennotes.com/podcast-datasets/playlist/#terms
Batch export all podcasts or episodes in a specific playlist.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of 1024 freely accessible podcast episodes. Link to the respective audio file is provided.
== Quick facts ==
- The most up-to-date and comprehensive podcast database available
- Includes over 3,500,000 podcasts and over 176 million episodes (including direct playable audio URLs)
- Features 35+ data fields, such as basic metadata, global rank, RSS feed (with audio URLs), Spotify links, and more
- Delivered in SQLite format
== Use Cases ==
- AI training, including speech recognition, generative AI, voice cloning / synthesis, and news analysis
- Alternative data for investment research, such as sentiment analysis of executive interviews, market research and tracking investment themes
- PR and marketing, including social monitoring, content research, outreach, and guest booking
- ...
== Quick starts ==
Batch export podcast metadata to CSV files:
1) Export by search keyword: https://www.listennotes.com/podcast-datasets/keyword/
2) Export by category: https://www.listennotes.com/podcast-datasets/category/
== Quick facts ==
- The most up-to-date and comprehensive podcast database available
- All languages & all countries
- Includes over 3,500,000 podcasts
- Features 35+ data fields, such as basic metadata, global rank, RSS feed (with audio URLs), Spotify links, and more
- Delivered in CSV format
== Data Attributes ==
See the full list of data attributes on this page: https://www.listennotes.com/podcast-datasets/fields/?filter=podcast_only
How to access podcast audio files: Our dataset includes RSS feed URLs for all podcasts. You can retrieve audio for over 170 million episodes directly from these feeds. With access to the raw audio, you’ll have high-quality podcast speech data ideal for AI training and related applications.
In October 2020, it was found that the most popular podcast genre in the United States was comedy, with ** percent of respondents to a survey stating that they were very interested in podcasts designed to make them laugh. News podcasts and those based on true crime were also popular choices, as well as sport and health and fitness.

Podcasts are becoming increasingly popular
Podcasts have become a go-to form of audio entertainment, with digital episodes on different topics being either streamable or downloadable for easily accessible consumption. The number of podcast listeners within the United States is estimated to increase from **** million listeners to over *********** in 2024.

Podcast market leaders
Within the constantly growing market for podcasts, the globally leading podcast publisher in 2020 was iHeartRadio with 266.06 million unique streams and downloads. At the same time, a study found that Apple Podcasts and Spotify were the most popular platforms for accessing podcasts among Americans in 2020. Spotify was especially popular among the age group of 18 to 34, while younger and older consumers also often used Apple Podcasts.
According to a forecast from August 2023 on global podcast consumption, the number of podcast listeners worldwide has steadily increased and is predicted to rise even further. In 2023, the number of podcast listeners worldwide amounted to over *** million internet users, while this number was predicted to grow to more than *** million in 2027.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We release a new dataset consisting of podcast metadata (title and description) for 29,539 shows. This dataset can be used to reproduce the experiments from the article "Topic Modeling on Podcast Short-Text Metadata", accepted at the ECIR 2022 conference.
More information about this data and how it should be used in experiments can be found in our paper and GitHub repository.
Please cite our paper if you use the code or data.
https://www.listennotes.com/podcast-datasets/solutions/#terms
Batch export all publicly accessible podcasts to a SQLite file.
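For readers unfamiliar with working against a SQLite export, the following is a minimal sketch of querying such a file with Python's standard `sqlite3` module. The table name `podcasts` and the columns `title`, `language`, and `rss_url` are hypothetical placeholders, not the documented Listen Notes schema; the real field names should be taken from the data attributes page. An in-memory database stands in for the exported file.

```python
import sqlite3


def feeds_for_language(conn, language):
    """Return (title, rss_url) rows for one language, sorted by title."""
    cursor = conn.execute(
        "SELECT title, rss_url FROM podcasts WHERE language = ? ORDER BY title",
        (language,),
    )
    return cursor.fetchall()


# Hypothetical schema and rows; a real export would be opened with
# sqlite3.connect("path/to/export.sqlite") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE podcasts (title TEXT, language TEXT, rss_url TEXT)")
conn.executemany(
    "INSERT INTO podcasts VALUES (?, ?, ?)",
    [
        ("Show C", "English", "https://example.com/c.xml"),
        ("Show B", "German", "https://example.com/b.xml"),
        ("Show A", "English", "https://example.com/a.xml"),
    ],
)

english = feeds_for_language(conn, "English")
for title, rss_url in english:
    print(title, rss_url)
```

Using a parameterized query (the `?` placeholder) avoids building SQL strings by hand when the filter value comes from user input.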
https://www.archivemarketresearch.com/privacy-policy
The global podcast market is experiencing robust growth, projected to reach a market size of $2,312.1 million in 2025 and to expand at a Compound Annual Growth Rate (CAGR) of 8.5% from 2025 to 2033. This expansion is driven by several key factors. The increasing accessibility of podcasts through platforms such as Apple Podcasts and Spotify, coupled with the rise of smart speakers and mobile devices, has broadened the audience significantly. The diverse content formats, ranging from interviews and conversational podcasts to storytelling and investigative pieces, cater to a wide range of interests and preferences, fueling user engagement. Furthermore, the growing popularity of podcast advertising and sponsorship opportunities has attracted significant investment, fostering market expansion. The segmentation by podcast type (interview, conversational, monologue, etc.) and application (mobile, desktop) reveals specific areas of high demand, which further inform growth strategies. Geographic distribution shows a strong presence across North America and Europe, with Asia-Pacific expected to exhibit significant growth potential in the coming years. The continued evolution of podcasting technology, including improvements in audio quality and accessibility features, will further enhance the user experience. The emergence of new platforms and innovative monetization strategies will play a crucial role in shaping the future of the market. While potential restraints exist, such as competition and the difficulty of maintaining consistently high-quality content, the overall growth trajectory remains positive, fueled by increasing listener engagement and a dynamic market landscape. The diverse range of podcast formats and applications supports the market’s continued appeal and sustained expansion throughout the forecast period. This creates numerous opportunities for both established players and new entrants within the podcasting ecosystem.
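The report gives only the 2025 market size and the CAGR, so the implied 2033 figure has to be worked out by compounding. The sketch below does that arithmetic, assuming 2025 is the base year and the 8.5% rate compounds once per year over the eight years to 2033. That reading of the forecast window is an assumption, and the result is an illustration of the formula, not a number taken from the source.

```python
def project_value(base_value, annual_rate, years):
    """Compound a base value at a constant annual growth rate."""
    return base_value * (1 + annual_rate) ** years


base_2025 = 2312.1   # USD millions, as stated in the report
cagr = 0.085         # 8.5% per year, as stated in the report
years = 2033 - 2025  # assumed number of compounding periods

projected_2033 = project_value(base_2025, cagr, years)
print(f"Implied 2033 market size: ${projected_2033:,.1f} million")
```

Under these assumptions the market would roughly double over the forecast period, since 1.085 raised to the eighth power is a little over 1.9.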