100+ datasets found

YouTube Datasets
brightdata.com
.json, .csv, .xlsx
Updated Jan 9, 2023
Cite
Bright Data (2023). YouTube Datasets [Dataset]. https://brightdata.com/products/datasets/youtube
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Jan 9, 2023
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide, YouTube
Description
Use our YouTube profiles dataset to extract both business and non-business information from public channels and filter by channel name, views, creation date, or subscribers. Datapoints include URL, handle, banner image, profile image, name, subscribers, description, video count, create date, views, details, and more. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases for this dataset include sentiment analysis, brand monitoring, influencer marketing, and more.
YouTube-Commons
huggingface.co
Updated Apr 17, 2024
Cite
PleIAs (2024). YouTube-Commons [Dataset]. https://huggingface.co/datasets/PleIAs/YouTube-Commons
Explore at:
Dataset updated
Apr 17, 2024
Dataset authored and provided by
PleIAs
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
📺 YouTube-Commons 📺

YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-By license.

Content

The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels). In total, this represents nearly 45 billion words (44,811,518,375). All the videos were shared on YouTube with a CC-BY license: the dataset provides all the necessary provenance information… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.
MLB-YouTube Dataset Dataset
paperswithcode.com
Updated Mar 23, 2021
+ more versions
Cite
AJ Piergiovanni; Michael S. Ryoo (2021). MLB-YouTube Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/mlb-youtube-dataset
Explore at:
Dataset updated
Mar 23, 2021
Authors
AJ Piergiovanni; Michael S. Ryoo
Area covered
YouTube
Description
The MLB-YouTube dataset is a new, large-scale dataset consisting of 20 baseball games from the 2017 MLB post-season available on YouTube with over 42 hours of video footage. The dataset consists of two components: segmented videos for activity recognition and continuous videos for activity classification. It is quite challenging as it is created from TV broadcast baseball games where multiple different activities share the camera angle. Further, the motion/appearance difference between the various activities is quite small.
youtube-transcriptions
huggingface.co
Updated Jul 6, 2021
+ more versions
Cite
James Briggs (2021). youtube-transcriptions [Dataset]. https://huggingface.co/datasets/jamescalam/youtube-transcriptions
Explore at:
Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Dataset updated
Jul 6, 2021
Authors
James Briggs
License
https://choosealicense.com/licenses/afl-3.0/
Area covered
YouTube
Description
The YouTube transcriptions dataset contains technical tutorials (currently from James Briggs, Daniel Bourke, and AI Coffee Break) transcribed using OpenAI's Whisper (large). Each row represents roughly a sentence-length chunk of text alongside the video URL and timestamp. Note that each item in the dataset contains just a short chunk of text. For most use cases you will likely need to merge multiple rows to create more substantial chunks of text. If you need to do that, this code snippet will… See the full description on the dataset page: https://huggingface.co/datasets/jamescalam/youtube-transcriptions.
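As a rough illustration of the row-merging step mentioned in the description, the sketch below joins consecutive sentence-length rows from the same video into larger chunks. The field names (`video_id`, `text`, `start`), the `merge_rows` helper, and the window and stride values are assumptions made for illustration; this is not the snippet from the dataset page.

```python
from itertools import groupby


def merge_rows(rows, window=4, stride=2):
    """Merge consecutive rows of the same video into larger text chunks.

    rows: list of dicts with the hypothetical keys 'video_id', 'text' and
    'start', assumed to be ordered by video and then by timestamp.
    """
    merged = []
    for video_id, group in groupby(rows, key=lambda r: r["video_id"]):
        group = list(group)
        # Slide over the rows of one video; chunks overlap when stride < window.
        for i in range(0, len(group), stride):
            chunk = group[i:i + window]
            merged.append({
                "video_id": video_id,
                "start": chunk[0]["start"],  # timestamp of the first row
                "text": " ".join(r["text"] for r in chunk),
            })
    return merged


# Toy rows standing in for dataset records (not real data).
rows = [
    {"video_id": "a", "text": "one", "start": 0.0},
    {"video_id": "a", "text": "two", "start": 2.0},
    {"video_id": "a", "text": "three", "start": 4.0},
    {"video_id": "b", "text": "four", "start": 0.0},
]
chunks = merge_rows(rows, window=2, stride=2)
```

Grouping by video first keeps a chunk from spanning two different videos; choosing a stride smaller than the window gives overlapping chunks, which may help when the text is later used for retrieval.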
YouTube-100M Dataset
paperswithcode.com
Cite
Shawn Hershey; Sourish Chaudhuri; Daniel P. W. Ellis; Jort F. Gemmeke; Aren Jansen; R. Channing Moore; Manoj Plakal; Devin Platt; Rif A. Saurous; Bryan Seybold; Malcolm Slaney; Ron J. Weiss; Kevin Wilson, YouTube-100M Dataset [Dataset]. https://paperswithcode.com/dataset/youtube-100m
Explore at:
Authors
Shawn Hershey; Sourish Chaudhuri; Daniel P. W. Ellis; Jort F. Gemmeke; Aren Jansen; R. Channing Moore; Manoj Plakal; Devin Platt; Rif A. Saurous; Bryan Seybold; Malcolm Slaney; Ron J. Weiss; Kevin Wilson
Area covered
YouTube
Description
The YouTube-100M data set consists of 100 million YouTube videos: 70M training videos, 10M evaluation videos, and 20M validation videos. Videos average 4.6 minutes each for a total of 5.4M training hours. Each of these videos is labeled with 1 or more topic identifiers from a set of 30,871 labels. There are an average of around 5 labels per video. The labels are assigned automatically based on a combination of metadata (title, description, comments, etc.), context, and image content for each video. The labels apply to the entire video and range from very generic (e.g. “Song”) to very specific (e.g. “Cormorant”). Being machine generated, the labels are not 100% accurate and of the 30K labels, some are clearly acoustically relevant (“Trumpet”) and others are less so (“Web Page”). Videos often bear annotations with multiple degrees of specificity. For example, videos labeled with “Trumpet” are often labeled “Entertainment” as well, although no hierarchy is enforced.
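The training-hours figure follows from the video count and the average length quoted in the description. A quick check, using only those two numbers:

```python
training_videos = 70_000_000  # training split size, from the description
avg_minutes = 4.6             # average video length, from the description

# Total training duration in hours.
training_hours = training_videos * avg_minutes / 60
print(f"{training_hours / 1e6:.1f}M training hours")
```

This gives roughly 5.37 million hours, which rounds to the 5.4M stated above.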
Youtube programming videos sample dataset
crawlfeeds.com
json, zip
Updated Apr 27, 2025
Cite
Crawl Feeds (2025). Youtube programming videos sample dataset [Dataset]. https://crawlfeeds.com/datasets/youtube-programming-videos-sample-dataset
Explore at:
Available download formats: zip, json
Dataset updated
Apr 27, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Area covered
YouTube
Description
Programming YouTube videos dataset. More than 300 records extracted in total. Last extracted on 24 Jan 2022.

Get in touch with the Crawl Feeds team for large datasets and customized YouTube datasets.
YouTube 8 Million - Data Lakehouse Ready
registry.opendata.aws
Updated Feb 17, 2022
Cite
Amazon Web Services (2022). YouTube 8 Million - Data Lakehouse Ready [Dataset]. https://registry.opendata.aws/yt8m/
Explore at:
Dataset updated
Feb 17, 2022
Dataset provided by
Amazon Web Services (https://aws.amazon.com/)
Area covered
YouTube
Description
This is both the original .tfrecords and a Parquet representation of the YouTube 8 Million dataset. YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. It comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk. This dataset also includes the YouTube-8M Segments data from June 2019. This dataset is 'Lakehouse Ready', meaning you can query this data in place straight out of the Registry of Open Data S3 bucket. Deploy this dataset's corresponding CloudFormation template to create the AWS Glue Catalog entries in your account in about 30 seconds. That one step will enable you to interact with the data with AWS Athena, AWS SageMaker, AWS EMR, or join it into your AWS Redshift clusters. More detail in the documentation: https://github.com/aws-samples/data-lake-as-code/blob/roda-ml/README.md
Data from: Youtube social network
kaggle.com
zip
Updated Sep 1, 2019
Cite
Lorenzo De Tomasi (2019). Youtube social network [Dataset]. https://www.kaggle.com/datasets/lodetomasi1995/youtube-social-network
Explore at:
Available download formats: zip (10604317 bytes)
Dataset updated
Sep 1, 2019
Authors
Lorenzo De Tomasi
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
YouTube
Description
Youtube social network and ground-truth communities

Dataset information

Youtube is a video-sharing web site that includes a social network. In the Youtube social network, users form friendships with each other and can create groups which other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.

We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
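The community construction described above (split each group into connected components, then drop components with fewer than 3 nodes) can be sketched as follows. The function name, the edge-list input format, and the toy data are assumptions for illustration; this is not the authors' code.

```python
from collections import defaultdict


def ground_truth_communities(edges, group_members, min_size=3):
    """Split one user-defined group into connected components of the
    friendship graph and keep those with at least `min_size` nodes."""
    members = set(group_members)

    # Adjacency restricted to edges whose endpoints are both in the group.
    adj = defaultdict(set)
    for u, v in edges:
        if u in members and v in members:
            adj[u].add(v)
            adj[v].add(u)

    seen, communities = set(), []
    for node in sorted(members):
        if node in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [node], set()
        seen.add(node)
        while stack:
            cur = stack.pop()
            component.add(cur)
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) >= min_size:
            communities.append(component)
    return communities


# Toy friendship edges and one toy group (not real data).
edges = [(1, 2), (2, 3), (4, 5), (3, 6)]
group = [1, 2, 3, 4, 5]
result = ground_truth_communities(edges, group)
```

In the toy input, users 4 and 5 form a component of only two nodes, so it is discarded, and the edge to user 6 is ignored because 6 is not a group member.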

More info: https://snap.stanford.edu/data/com-Youtube.html
Hours of video uploaded to YouTube every minute 2007-2022
statista.com
Updated Jun 20, 2025
Cite
Statista (2025). Hours of video uploaded to YouTube every minute 2007-2022 [Dataset]. https://www.statista.com/statistics/259477/hours-of-video-uploaded-to-youtube-every-minute/
Explore at:
Dataset updated
Jun 20, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jun 2007 - Jun 2022
Area covered
Worldwide, YouTube
Description
As of June 2022, more than *** hours of video were uploaded to YouTube every minute. This equates to approximately ****** hours of newly uploaded content per hour. The amount of content on YouTube has increased dramatically as consumers’ appetite for online video has grown. In fact, the number of video content hours uploaded every 60 seconds grew by around ** percent between 2014 and 2020.

YouTube global users

Online video is one of the most popular digital activities worldwide, with ** percent of internet users worldwide watching more than ** hours of online videos on a weekly basis in 2023. It was estimated that in 2023 YouTube would reach approximately *** million users worldwide. In 2022, the video platform was one of the leading media and entertainment brands worldwide, with a value of more than ** billion U.S. dollars.

YouTube video content consumption

The most viewed YouTube channels of all time have racked up billions of viewers and millions of subscribers, and cover a wide variety of topics ranging from music to cosmetics. The YouTube channel owner with the most video views is Indian music label T-Series, which counted ****** billion lifetime views. Other popular YouTubers are gaming personalities such as PewDiePie, DanTDM and Markiplier.
youtube-dataset
huggingface.co
Updated Feb 8, 2025
Cite
DrTim-AI (2025). youtube-dataset [Dataset]. https://huggingface.co/datasets/BitiBytes123/youtube-dataset
Explore at:
Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Dataset updated
Feb 8, 2025
Authors
DrTim-AI
Area covered
YouTube
Description
BitiBytes123/youtube-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
YouTube users worldwide 2020-2029
statista.com
Updated Mar 3, 2025
Cite
Statista (2025). YouTube users worldwide 2020-2029 [Dataset]. https://www.statista.com/forecasts/1144088/youtube-users-in-the-world
Explore at:
Dataset updated
Mar 3, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
World
Description
The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of 232.5 million users (+24.91 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 1.2 billion users, and therefore a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years.

User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of YouTube users in regions such as Africa and South America.
youtube-dataset
huggingface.co
Updated Oct 23, 2024
Cite
Prompt Engineer (2024). youtube-dataset [Dataset]. https://huggingface.co/datasets/Prompt48/youtube-dataset
Explore at:
Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Dataset updated
Oct 23, 2024
Authors
Prompt Engineer
Area covered
YouTube
Description
Prompt48/youtube-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Countries with the most YouTube users 2025
statista.com
ai-chatbox.pro
Updated Feb 17, 2025
Cite
Statista (2025). Countries with the most YouTube users 2025 [Dataset]. https://www.statista.com/statistics/280685/number-of-monthly-unique-youtube-users/
Explore at:
Dataset updated
Feb 17, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2025
Area covered
YouTube, Worldwide
Description
As of February 2025, India was the country with the largest YouTube audience by far, with approximately 491 million users engaging with the popular social video platform. The United States followed, with around 253 million YouTube viewers. Brazil came in third, with 144 million users watching content on YouTube. The United Kingdom saw around 54.8 million internet users engaging with the platform in the examined period.

What country has the highest percentage of YouTube users?

In July 2024, the United Arab Emirates was the country with the highest YouTube penetration worldwide, as around 94 percent of the country's digital population engaged with the service. In 2024, YouTube counted around 100 million paid subscribers for its YouTube Music and YouTube Premium services.

YouTube mobile markets

In 2024, YouTube was among the most popular social media platforms worldwide. In terms of revenues, the YouTube app generated approximately 28 million U.S. dollars in revenues in the United States in January 2024, as well as 19 million U.S. dollars in Japan.
A YouTube Dataset with User-Level Usage Data
kaggle.com
Updated May 28, 2025
Cite
Shruti Lall (2025). A YouTube Dataset with User-Level Usage Data [Dataset]. https://www.kaggle.com/datasets/shrutilall/a-youtube-dataset-with-user-level-usage-data
Explore at:
Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Dataset updated
May 28, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shruti Lall
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
YouTube
Description
This dataset contains anonymized logs of user-level YouTube viewing activity, collected via Amazon Mechanical Turk. Each user in the dataset provided at least six months of their YouTube watch history, enabling longitudinal analysis of personal viewing patterns.

Each row in the dataset represents a single watch event and includes metadata such as:
- the video ID
- the watch timestamp
- whether the user was subscribed to the channel at the time
- whether the video was part of a playlist
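To make the row structure concrete, here is a minimal sketch of a watch-event record and a per-user monthly count, the kind of aggregation a longitudinal analysis might start from. The `WatchEvent` class, its field names (including `user_id`), and the sample values are hypothetical; the actual column names in the dataset may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WatchEvent:
    user_id: str          # hypothetical anonymized user identifier
    video_id: str
    watched_at: datetime  # watch timestamp
    subscribed: bool      # subscribed to the channel at watch time
    in_playlist: bool     # video was part of a playlist


def monthly_counts(events):
    """Count watch events per (user, year-month) pair."""
    counts = defaultdict(int)
    for e in events:
        counts[(e.user_id, e.watched_at.strftime("%Y-%m"))] += 1
    return dict(counts)


# Toy events (not real data).
events = [
    WatchEvent("u1", "v1", datetime(2019, 1, 5), True, False),
    WatchEvent("u1", "v2", datetime(2019, 1, 20), False, True),
    WatchEvent("u1", "v3", datetime(2019, 2, 1), False, False),
]
counts = monthly_counts(events)
```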

This dataset is intended to support research in user behavior modeling, content recommendation systems, temporal video engagement, and personalized analytics.

The dataset accompanies the paper:

"A YouTube dataset with user-level usage data: Baseline characteristics and key insights"
Authors: Shruti Lall, Mohit Agarwal, Raghupathy Sivakumar
Conference: IEEE ICC 2020 – International Conference on Communications

If you use this dataset in your research, please cite the paper above.
Data from: YouTube Videos Datasets
brightdata.com
.json, .csv, .xlsx
Updated Dec 20, 2024
Cite
Bright Data (2024). YouTube Videos Datasets [Dataset]. https://brightdata.com/products/datasets/youtube/videos
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Dec 20, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide, YouTube
Description
Use our YouTube Videos dataset to extract detailed information from public videos and filter by video title, views, upload date, or likes. Data points include video URL, title, description, thumbnail, upload date, view count, like count, comment count, tags, and more. You can purchase the entire dataset or a customized subset, tailored to your needs. Popular use cases for this dataset include trend analysis, content performance tracking, brand monitoring, and influencer campaign optimization.
YouTube Dataset on Mobile Streaming for Internet Traffic Modeling, Network...
figshare.com
txt
Updated Apr 14, 2022
Cite
Frank Loh; Florian Wamser; Fabian Poignée; Stefan Geißler; Tobias Hoßfeld (2022). YouTube Dataset on Mobile Streaming for Internet Traffic Modeling, Network Management, and Streaming Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.19096823.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.19096823.v2
Dataset updated
Apr 14, 2022
Dataset provided by
figshare
Authors
Frank Loh; Florian Wamser; Fabian Poignée; Stefan Geißler; Tobias Hoßfeld
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
Streaming is by far the predominant type of traffic in communication networks. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at the network, transport, and application layers with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 different individual bandwidth settings measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3G/4G traces, and 4,022 runs with pre-defined bandwidth changes. This corresponds to 332 GB of video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.
Most Watched Youtube Videos
kaggle.com
zip
Updated Apr 19, 2024
Cite
Jatinthakur706 (2024). Most Watched Youtube Videos [Dataset]. https://www.kaggle.com/datasets/jatinthakur706/most-watched-youtube-videos
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Apr 19, 2024
Authors
Jatinthakur706
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
YouTube
Description
This dataset contains data on the most watched YouTube videos up to April 2024. It contains columns such as views, artist, and channel. The data is ranked by number of views.
Yale YouTube Video Text
academictorrents.com
bittorrent
Updated Oct 20, 2014
Cite
Yale (2014). Yale YouTube Video Text [Dataset]. https://academictorrents.com/details/156802226bcf5747e0bea4e4f14c03b3b952de80
Explore at:
Available download formats: bittorrent (434765881)
Dataset updated
Oct 20, 2014
Dataset authored and provided by
Yale
License
https://academictorrents.com/nolicensespecified
Area covered
YouTube
Description
YouTube Video Text (YVT) contains 30 videos. Each video is 15 seconds long, at 30 frames per second and in HD 720p quality, and was collected from YouTube. The text content in the dataset can be divided into two categories: overlay text (e.g., captions, song titles, logos) and scene text (e.g., street signs, business signs, words on shirts).
youtube_vis
tensorflow.org
Updated Feb 11, 2021
Cite
(2021). youtube_vis [Dataset]. https://www.tensorflow.org/datasets/catalog/youtube_vis
Explore at:
Dataset updated
Feb 11, 2021
Area covered
YouTube
Description
Youtube-vis is a video instance segmentation dataset. It contains 2,883 high-resolution YouTube videos, a per-pixel category label set including 40 common objects such as person, animals and vehicles, 4,883 unique video instances, and 131k high-quality manual annotations.

The YouTube-VIS dataset is split into 2,238 training videos, 302 validation videos and 343 test videos.

No files were removed or altered during preprocessing.

To use this dataset:

import tensorflow_datasets as tfds
ds = tfds.load('youtube_vis', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
Racoons Youtube Dataset
universe.roboflow.com
zip
Updated Oct 18, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
PUP (2022). Racoons Youtube Dataset [Dataset]. https://universe.roboflow.com/pup/racoons-youtube-io8ou
Explore at:
Available download formats: zip
Dataset updated
Oct 18, 2022
Dataset authored and provided by
PUP
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Variables measured
Coons Bounding Boxes
Description
Racoons Youtube

## Overview

Racoons Youtube is a dataset for object detection tasks - it contains Coons annotations for 400 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
