52 datasets found

YouTube Datasets
brightdata.com
.json, .csv, .xlsx
Updated Jan 9, 2023
Cite
Bright Data (2023). YouTube Datasets [Dataset]. https://brightdata.com/products/datasets/youtube
Available download formats: .json, .csv, .xlsx
Dataset updated
Jan 9, 2023
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide, YouTube
Description
Use our YouTube profiles dataset to extract both business and non-business information from public channels and filter by channel name, views, creation date, or subscribers. Datapoints include URL, handle, banner image, profile image, name, subscribers, description, video count, create date, views, details, and more. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases for this dataset include sentiment analysis, brand monitoring, influencer marketing, and more.
YouTube Video and Channel Analytics
kaggle.com
Updated Dec 8, 2023
Cite
The Devastator (2023). YouTube Video and Channel Analytics [Dataset]. https://www.kaggle.com/datasets/thedevastator/youtube-video-and-channel-analytics/code
Dataset updated
Dec 8, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
Area covered
YouTube
Description
YouTube Video and Channel Analytics

YouTube Video and Channel Analytics: Statistics and Features

By VISHWANATH SESHAGIRI [source]

About this dataset

The YouTube Video and Channel Metadata dataset is a comprehensive collection of data related to YouTube videos and channels. It consists of various features and statistics that provide insights into the performance and engagement of videos, as well as the overall popularity and success of channels.

The dataset includes both direct features, such as total views, channel elapsed time, channel ID, video category ID, channel view count, likes per subscriber, dislikes per subscriber, comments per subscriber, and more. Additionally, there are indirect features derived from YouTube's API that provide additional metrics for analysis.

One important aspect covered in this dataset is the ratio between certain metrics. For example:

- The totalviews/channelelapsedtime ratio represents the average number of views a video has received relative to the elapsed time since the channel was created.
- The likes/dislikes ratio indicates the proportion of likes on a video compared to dislikes.
- The views/subscribers ratio showcases how engaged subscribers are by measuring the number of views relative to the number of subscribers.
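As a rough illustration of how such ratios could be computed from raw counts, here is a minimal Python sketch. The input values and the `safe_ratio` helper are hypothetical and not part of the dataset. The dataset already ships these ratios precomputed, so this only shows the arithmetic behind them.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical raw counts for one video and its channel (not taken from the dataset).
video = {"views": 12000, "likes": 300, "dislikes": 20, "comments": 45}
channel = {"subscribers": 4000}

ratios = {
    "likes/dislikes": safe_ratio(video["likes"], video["dislikes"]),
    "views/subscribers": safe_ratio(video["views"], channel["subscribers"]),
    "comments/views": safe_ratio(video["comments"], video["views"]),
}
print(ratios)
```

Returning None for a zero denominator is one possible convention. How the original authors handled videos with no dislikes or channels with no subscribers is not stated in the description.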

Other metrics explored in this dataset include comments/views ratio (representing viewer engagement), dislikes/views ratio (measuring viewer sentiment), comments/subscriber ratio (indicating community participation), likes/subscriber ratio (reflecting audience loyalty), dislikes/subscriber ratio (highlighting dissatisfaction levels), total number of subscribers for a channel (subscriberCount), total views on a channel (channelViewCount), total number of comments on a channel (channelCommentCount), among others.

By analyzing these features and statistics within this dataset, researchers or data analysts can gain valuable insights into various aspects related to YouTube videos and channels. Furthermore, it may be possible to build statistical relationships between videos based on their performance characteristics or even develop topic trees based on similarities between different content categories. This dataset serves as an excellent resource for studying YouTube's ecosystem comprehensively.

To access additional resources related to this dataset or explore the code repositories associated with it, users can refer to the provided GitHub repository.

How to use the dataset

Introduction:

Step 1: Understanding the Dataset

Start by familiarizing yourself with the columns in the dataset. Here are some key features to pay attention to:

totalviews/channelelapsedtime: The ratio of total views of a video to the elapsed time of the channel.

channelViewCount: The total number of views on the channel.

likes/subscriber: The ratio of likes on a video to the number of subscribers of the channel.

views/subscribers: The ratio of views on a video to the number of subscribers of the channel.

subscriberCount: The total number of subscribers for a channel.

dislikes/views: The ratio of dislikes on a video to its total views.

comments/subscriber: The ratio of comments on a video to the number of subscribers of the channel.

Step 2: Determining Data Analysis Objectives

Define your objectives or research questions before diving into data analysis using this dataset. For example, you may want to explore relationships between viewership, engagement metrics, and various attributes such as category ID or elapsed time.

Step 3: Analyzing Relationships between Variables

Use statistical techniques like correlation analysis or visualization tools like scatter plots, bar graphs, or heatmaps to understand relationships between variables in this dataset.

For example:

- Plotting totalviews/channelelapsedtime against channelViewCount can help identify patterns between overall video popularity and channels' view count growth over time.
- Comparing likes/dislikes with comments/views can give insights into viewer engagement levels across different videos.
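The correlation analysis mentioned in this step can be sketched with a plain Pearson coefficient. The two short lists below are made-up values standing in for two columns of the dataset, and the function name `pearson` is an illustrative choice.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up stand-ins for two columns, e.g. likes/dislikes and comments/views.
likes_per_dislike = [12.0, 30.0, 8.0, 45.0, 20.0]
comments_per_view = [0.002, 0.004, 0.001, 0.006, 0.003]
print(round(pearson(likes_per_dislike, comments_per_view), 3))
```

A value near +1 or -1 suggests a strong linear relationship, and a value near 0 suggests little linear relationship. Ratio columns are often heavily skewed, so a rank-based measure may be worth considering as well.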

Step 4: Building Machine Learning Models (Optional)

If your objective includes predictive analysis or building machine learning models, select relevant features as predictors and the target variable (e.g., totalviews/channelelapsedtime) for training and evaluation.

You can use various algorithms such as linear regression, decision trees, or neural networks to predict video performance or channel growth based on available attributes.
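As one small instance of the linear regression option, the sketch below fits a single-predictor least-squares line. The numbers are invented for illustration, and the choice of predictor and target is an assumption rather than something the dataset description prescribes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented values: predictor could be views/subscribers, target totalviews/channelelapsedtime.
predictor = [1.0, 2.0, 3.0, 4.0]
target = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(predictor, target)
print(slope, intercept)
```

With several predictors, a library such as scikit-learn would be the more practical route. This version only makes the underlying calculation visible.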

Step 5: Evaluating Model Performance

Assess the predictive model's performance using appropriate evaluation metrics like mean square...
YouTube users worldwide 2020-2029
statista.com
Updated Jul 7, 2025
Cite
Statista (2025). YouTube users worldwide 2020-2029 [Dataset]. https://www.statista.com/forecasts/1144088/youtube-users-in-the-world
Dataset updated
Jul 7, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide, YouTube
Description
The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach *** billion users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures, shown here for the YouTube platform, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like Africa and South America.
YouTube Videos and Channels Metadata
kaggle.com
Updated Dec 14, 2022
Cite
The Devastator (2022). YouTube Videos and Channels Metadata [Dataset]. https://www.kaggle.com/datasets/thedevastator/revealing-insights-from-youtube-video-and-channe
Dataset updated
Dec 14, 2022
Dataset provided by
Kaggle
Authors
The Devastator
Area covered
YouTube
Description
YouTube Videos and Channels Metadata

Analyze the statistical relation between videos and form a topic tree

By VISHWANATH SESHAGIRI [source]

About this dataset

This dataset contains YouTube video and channel metadata to analyze the statistical relation between videos and form a topic tree. With 9 direct features, 13 more indirect features, it has all that you need to build a deep understanding of how videos are related – including information like total views per unit time, channel views, likes/subscribers ratio, comments/views ratio, dislikes/subscribers ratio etc. This data provides us with a unique opportunity to gain insights on topics such as subscriber count trends over time or calculating the impact of trends on subscriber engagement. We can develop powerful models that show us how different types of content drive viewership and identify the most popular styles or topics within YouTube's vast catalogue. Additionally this data offers an intriguing look into consumer behaviour as we can explore what drives people to watch specific videos at certain times or appreciate certain channels more than others - by analyzing things like likes per subscribers and dislikes per views ratios for example! Finally this dataset is completely open source with an easy-to-understand Github repo making it an invaluable resource for anyone looking to gain better insights into how their audience interacts with their content and how they might improve it in the future

How to use the dataset

In general, it is important to understand each parameter in the dataset before proceeding with analysis. The parameters included are totalviews/channelelapsedtime, channelViewCount, likes/subscriber, views/subscribers, subscriberCount, dislikes/views, comments/subscriber, channelCommentCount, likes/dislikes, comments/views, dislikes/subscriber, totviews/totsubs, and views/elapsedtime.

To use this dataset for your own analysis:

1) Review each parameter's meaning and purpose in the dataset.
2) Get familiar with basic descriptive statistics such as mean, median, mode, and range.
3) Create visualizations or tables based on subsets of the data.
4) Understand correlations between different sets of variables or parameters.
5) Generate meaningful conclusions about specific channels or topics based on organized graph hierarchies or tables.
6) Analyze trends over time for individual parameters, as well as the aggregate reaction from all users when videos are released.
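The descriptive statistics named in item 2 can be computed with Python's standard `statistics` module. The sketch below uses a short made-up list in place of a real column such as subscriberCount, and the `describe` helper is a hypothetical name.

```python
import statistics


def describe(values):
    """Return the mean, median, mode, and range of a non-empty list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
    }


# Made-up stand-in for one numeric column of the dataset.
subscriber_counts = [10, 20, 20, 40]
summary = describe(subscriber_counts)
print(summary)
```

For continuous ratio columns, the mode is rarely informative because values seldom repeat. The mean, median, and range are usually the more useful summaries there.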

Research Ideas

Predicting the Relative Popularity of Videos: This dataset can be used to build a statistical model that can predict the relative popularity of videos based on various factors such as total views, channel viewers, likes/dislikes ratio, and comments/views ratio. This model could then be used to make recommendations and predict which videos are likely to become popular or go viral.

Creating Topic Trees: The dataset can also be used to create topic trees or taxonomies by analyzing the content of videos and looking at what topics they cover. For example, one could analyze the most popular YouTube channels in a specific subject area, group together those that discuss similar topics, and then build an organized tree structure around those topics in order to better understand viewer interests in that area.

Viewer Engagement Analysis: This dataset could also be used for viewer engagement analysis by examining factors such as subscriber count, average time spent watching a video per user (elapsed time), and comments made per view, to gain insights into how engaged viewers are with specific content or channels on YouTube. From this information it would be possible to adjust content strategy in order to improve overall engagement rates across various types of video content and channel types.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

Data Source

License

Unknown License - Please check the dataset description for more information.

Columns

File: YouTubeDataset_withChannelElapsed.csv

| Column name | Description |
|:----------------------------------|:-------------------------------------------------------|
| totalviews/channelelapsedtime | Ratio of total views to channel elapsed time. (Ratio) |
| channelViewCount | Total number of views for the channel. (Integer) |
| likes/subscriber ...
YouTube users in India 2020-2029
statista.com
Updated Jul 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). YouTube users in India 2020-2029 [Dataset]. https://www.statista.com/forecasts/1146150/youtube-users-in-india
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
India
Description
The number of YouTube users in India was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach ****** million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures, shown here for the YouTube platform, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in countries like Sri Lanka and Nepal.
YouTube-Commons
huggingface.co
Updated Apr 17, 2024
Cite
PleIAs (2024). YouTube-Commons [Dataset]. https://huggingface.co/datasets/PleIAs/YouTube-Commons
Dataset updated
Apr 17, 2024
Dataset authored and provided by
PleIAs
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
📺 YouTube-Commons 📺

YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-By license.

Content

The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels). In total, this represents nearly 45 billion words (44,811,518,375). All the videos were shared on YouTube with a CC-BY license: the dataset provides all the necessary provenance information… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.
Youtube video statistics for 1 million videos
kaggle.com
Updated Jun 29, 2020
Cite
Mattia Zeni (2020). Youtube video statistics for 1 million videos [Dataset]. https://www.kaggle.com/datasets/mattiazeni/youtube-video-statistics-1million-videos/data
Dataset updated
Jun 29, 2020
Dataset provided by
Kaggle
Authors
Mattia Zeni
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Area covered
YouTube
Description
Motivation

Studying how YouTube videos go viral or, more generally, how they evolve in terms of views, likes and subscriptions is a topic of interest in many disciplines. With this dataset you can study such phenomena, using statistics about 1 million YouTube videos. The information was collected in 2013, when YouTube was exposing the data publicly. YouTube has since removed this functionality, and such statistics are now available only to the owner of the video. This makes the dataset unique.

Context

This dataset was generated with YOUStatAnalyzer, a tool I (Mattia Zeni) developed while working for CREATE-NET (www.create-net.org) within the framework of the CONGAS FP7 project (http://www.congas-project.eu). For the project we needed to collect and analyse the dynamics of YouTube video popularity. The dataset contains statistics for more than 1 million YouTube videos, chosen according to random keywords extracted from the WordNet library (http://wordnet.princeton.edu).

The motivation that led us to develop the YOUStatAnalyzer data collection tool and create this dataset is that there is an active research community working on the interplay among individual user preferences, social dynamics and advertising mechanisms, and a common problem is the lack of open large-scale datasets. No comparable tool was available at that time either. YouTube has since removed the possibility of viewing these data on each video's page, making this dataset unique.

When using our dataset for research purposes, please cite it as:

@INPROCEEDINGS{YOUStatAnalyzer,
  author = {Mattia Zeni and Daniele Miorandi and Francesco {De Pellegrini}},
  title = {{YOUStatAnalyzer}: a Tool for Analysing the Dynamics of {YouTube} Content Popularity},
  booktitle = {Proc. 7th International Conference on Performance Evaluation Methodologies and Tools (Valuetools, Torino, Italy, December 2013)},
  address = {Torino, Italy},
  year = {2013}
}

Content

The dataset contains statistics and metadata of 1 million YouTube videos, collected in 2013. The videos were chosen according to random keywords extracted from the WordNet library (http://wordnet.princeton.edu).

Dataset structure

The structure of a dataset is the following:

{
  u'_id': u'9eToPjUnwmU',
  u'title': u'Traitor Compilation # 1 (Trouble ...',
  u'description': u'A traitor compilation by one are ...',
  u'category': u'Games',
  u'commentsNumber': u'6',
  u'publishedDate': u'2012-10-09T23:42:12.000Z',
  u'author': u'ServilityGaming',
  u'duration': u'208',
  u'type': u'video/3gpp',
  u'relatedVideos': [u'acjHy7oPmls', u'EhW2LbCjm7c', u'UUKigFAQLMA', ...],
  u'accessControl': {
    u'comment': {u'permission': u'allowed'},
    u'list': {u'permission': u'allowed'},
    u'videoRespond': {u'permission': u'moderated'},
    u'rate': {u'permission': u'allowed'},
    u'syndicate': {u'permission': u'allowed'},
    u'embed': {u'permission': u'allowed'},
    u'commentVote': {u'permission': u'allowed'},
    u'autoPlay': {u'permission': u'allowed'}
  },
  u'views': {
    u'cumulative': { u'data': [15.0, 25.0, 26.0, 26.0, ...] },
    u'daily': { u'data': [15.0, 10.0, 1.0, 0.0, ...] }
  },
  u'shares': {
    u'cumulative': { u'data': [0.0, 0.0, 0.0, 0.0, ...] },
    u'daily': { u'data': [0.0, 0.0, 0.0, 0.0, ...] }
  },
  u'watchtime': {
    u'cumulative': { u'data': [22.5666666667, 36.5166666667, 36.7, 36.7, ...] },
    u'daily': { u'data': [22.5666666667, 13.95, 0.166666666667, 0.0, ...] }
  },
  u'subscribers': {
    u'cumulative': { u'data': [0.0, 0.0, 0.0, 0.0, ...] },
    u'daily': { u'data': [-1.0, 0.0, 0.0, 0.0, ...] }
  },
  u'day': {
    u'data': [1349740800000.0, 1349827200000.0, 1349913600000.0, 1350000000000.0, ...]
  }
}

The structure above shows which fields an entry in the dataset has. They can be divided into 2 sections:

1) Video Information.

_id -> Corresponds to the video ID and is the unique identifier of an entry in the database.
title -> The video's title.
description -> The video's description.
category -> The YouTube category the video is listed in.
commentsNumber -> The number of comments posted by users.
publishedDate -> The date the video was published.
author -> The author of the video.
duration -> The video duration in seconds.
type -> The encoding type of the video.
relatedVideos -> A list of related videos.
accessControl -> A list of access policies for different aspects related to the video.

2) Video Statistics.

Each video can have 4 different statistics variables: views, shares, subscribers and watchtime. Recent videos have all of them, while older videos may have only the 'views' variable. Each variable has 2 dimensions, daily and cumulative.
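To make the relationship between the two dimensions concrete, the sketch below derives a daily series from a cumulative one and converts the 'day' values to calendar dates. It rests on two assumptions that the description does not state outright: that 'day' holds Unix timestamps in milliseconds (UTC), and that each daily value is the difference between consecutive cumulative values. The second holds for the 'views' sample shown above but not for the 'subscribers' sample, so treat it as a check rather than a rule. The helper names are illustrative.

```python
from datetime import datetime, timezone


def daily_from_cumulative(cumulative):
    """Differences between consecutive cumulative values (first day counts from zero)."""
    daily = []
    previous = 0.0
    for value in cumulative:
        daily.append(value - previous)
        previous = value
    return daily


def day_to_date(milliseconds):
    """Convert an assumed Unix-epoch millisecond timestamp to a UTC date."""
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc).date()


# Values copied from the sample entry's 'views' and 'day' fields.
cumulative_views = [15.0, 25.0, 26.0, 26.0]
days_ms = [1349740800000.0, 1349827200000.0]

print(daily_from_cumulative(cumulative_views))       # [15.0, 10.0, 1.0, 0.0]
print([day_to_date(ms).isoformat() for ms in days_ms])  # ['2012-10-09', '2012-10-10']
```

The first converted date matches the sample entry's publishedDate of 2012-10-09, which is consistent with the millisecond assumption.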

`views -> number of views collected by the vi...
Data from: YouTube Video Network Dataset for Israel-Hamas War
ieee-dataport.org
Updated Dec 23, 2023
Cite
Thejas T (2023). YouTube Video Network Dataset for Israel-Hamas War [Dataset]. https://ieee-dataport.org/documents/youtube-video-network-dataset-israel-hamas-war
Dataset updated
Dec 23, 2023
Authors
Thejas T
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Israel, YouTube
Description
Over the past few years, YouTube has become a popular site for broadcasting video and for earning money by showcasing various skills in the form of videos. For some people it has become a main source of income. Getting videos to trend among viewers is one of the major goals of every content creator. The popularity of any video and its reach to the audience is completely based on YouTube's recommendation algorithm. This document is a dataset descriptor for the dataset collected over a time span of about 45 days during the Israel-Hamas War.
YouTube Channel Statistics Dataset
kaggle.com
Updated Jul 11, 2023
Cite
Vamshi krishna Pennakoduru (2023). YouTube Channel Statistics Dataset [Dataset]. https://www.kaggle.com/datasets/vamshikrishna305/youtube-channel-statistics-dataset/versions/1
Dataset updated
Jul 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Vamshi krishna Pennakoduru
Area covered
YouTube
Description
This comprehensive YouTube Video Analytics Dataset provides insights into the performance of a wide range of videos on the platform. Spanning various genres, the dataset includes: 1. genre, 2. video titles, 3. publish times, 4. view counts, 5. watch time (in hours), 6. subscriber counts, 7. average view durations, 8. impressions, and 9. impressions click-through rates (%).

By leveraging this dataset, researchers, analysts, and data enthusiasts can delve into the factors that influence video success on YouTube. Analyze the correlation between genre and view counts, investigate the impact of subscriber counts on watch time, or explore how average view durations and click-through rates affect video impressions.

Whether you're interested in exploring video trends, identifying patterns in user behavior, or developing machine learning models, this dataset serves as a valuable resource. Gain actionable insights into YouTube video performance and contribute to the ever-growing field of online content analysis. LICENCE NOTE - This is the dataset of my own channel.
Dataset of Video Comments of a Vision Video Classified by Their Relevance,...
data.niaid.nih.gov
Updated Jul 19, 2024
Cite
Kristo, Eklekta (2024). Dataset of Video Comments of a Vision Video Classified by Their Relevance, Polarity, Intention, and Topic [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4533301
Dataset updated
Jul 19, 2024
Dataset provided by
Karras, Oliver
Kristo, Eklekta
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains all comments (comments and replies) of the YouTube vision video "Tunnels" by "The Boring Company", fetched on 2020-10-13 using the YouTube API. The comments were classified manually by three persons. We performed a single-class labeling of the video comments regarding their relevance for requirements engineering (RE) (ham/spam) and their polarity (positive/neutral/negative). Furthermore, we performed a multi-class labeling of the comments regarding their intention (feature request and problem report) and their topic (efficiency and safety). While a comment can only be relevant or not relevant and have only one polarity, a comment can have one or more intentions and also one or more topics.

For the replies, one person also classified them regarding their relevance for RE. However, the investigation of the replies is ongoing and future work.

Remark: For 126 comments and 26 replies, we could not determine the date and time since they were no longer accessible on YouTube at the time this data set was created. In the case of a missing date and time, we inserted "NULL" in the corresponding cell.

This data set includes the following files:

Dataset.xlsx contains the raw and labeled video comments and replies:

For each comment, the data set contains:

ID: An identification number generated by YouTube for the comment

Date: The date and time of the creation of the comment

Author: The username of the author of the comment

Likes: The number of likes of the comment

Replies: The number of replies to the comment

Comment: The written comment

Relevance: Label indicating the relevance of the comment for RE (ham = relevant, spam = irrelevant)

Polarity: Label indicating the polarity of the comment

Feature request: Label indicating that the comment requests a feature

Problem report: Label indicating that the comment reports a problem

Efficiency: Label indicating that the comment deals with the topic efficiency

Safety: Label indicating that the comment deals with the topic safety

For each reply, the data set contains:

ID: The identification number of the comment to which the reply belongs

Date: The date and time of the creation of the reply

Author: The username of the author of the reply

Likes: The number of likes of the reply

Comment: The written reply

Relevance: Label indicating the relevance of the reply for RE (ham = relevant, spam = irrelevant)

Detailed analysis results.xlsx contains the detailed results of all ten times repeated 10-fold cross validation analyses for each of all considered combinations of machine learning algorithms and features

Guide Sheet - Multi-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual multi-class labeling

Guide Sheet - Single-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual single-class labeling

Python scripts for analysis.zip contains the scripts (as jupyter notebooks) and prepared data (as csv-files) for the analyses
Top 1000 YouTube Channels in the World 🌐📊🎥
kaggle.com
Updated Jun 25, 2024
Cite
Mayank Anand (2024). Top 1000 YouTube Channels in the World 🌐📊🎥 [Dataset]. https://www.kaggle.com/datasets/mayankanand2701/top-1000-youtube-channels-in-the-world/data
Dataset updated
Jun 25, 2024
Dataset provided by
Kaggle
Authors
Mayank Anand
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
YouTube
Description
YouTube is the world's largest video-sharing platform, launched in 2005. It allows users to upload, view, and share videos, and has grown to be a central hub for content creators across various fields, including entertainment, education, music, and more. With over 2 billion logged-in users monthly, YouTube has become an essential platform for digital content and marketing.

The Top 1000 YouTube Channels Dataset captures detailed information about the top-performing YouTube channels globally. This dataset includes the following columns:

Rank : The ranking of the YouTube channel based on its overall popularity and performance.

Youtuber : The name of the YouTuber or the title of the YouTube channel.

Subscribers : The total number of subscribers to the channel, indicating its reach and popularity.

Video Views : The total number of video views the channel has accumulated, reflecting its engagement and audience interaction.

Video Count : The total number of videos uploaded by the channel, showing the content volume produced.

Category : The genre or category the channel belongs to, such as music, education, entertainment, etc.

Started : The year the channel was created, providing insight into its longevity and growth over time.

This dataset is invaluable for analyzing trends, understanding content strategies, and benchmarking channel performances within the YouTube ecosystem.
Youtube users in Vietnam 2017-2025
statista.com
Updated Jul 10, 2025
Cite
Statista (2025). Youtube users in Vietnam 2017-2025 [Dataset]. https://www.statista.com/forecasts/1146013/youtube-users-in-vietnam
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2017 - 2019
Area covered
Vietnam
Description
In 2021, YouTube's user base in Vietnam amounted to approximately ***** million users. The number of YouTube users in Vietnam is projected to reach ***** million users by 2025. User figures have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
YouTube Dataset - Medical papers analising YouTube videos
figshare.com
xlsx
Updated Jun 1, 2019
Cite
Juan-José Boté (2019). YouTube Dataset - Medical papers analising YouTube videos [Dataset]. http://doi.org/10.6084/m9.figshare.7108511.v3
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.7108511.v3
Dataset updated
Jun 1, 2019
Dataset provided by
figshare
Authors
Juan-José Boté
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
This dataset is a review of papers on YouTube content analysis in the health field. The data were used to publish a review paper in the Revista Cubana de Información en Ciencias de la Salud. The dataset is a list of scientific papers that were reviewed to analyse how health care scientists classify the videos they analyse.

Dataset and Supplementary Tables on Retracted Articles Referenced in YouTube...

zenodo.org

Updated Jun 29, 2025

Cite

Jiro Kikkawa; Masao Takaku (2025). Dataset and Supplementary Tables on Retracted Articles Referenced in YouTube Videos (TPDL 2025) [Dataset]. http://doi.org/10.5281/zenodo.15377209

Explore at:

Unique identifier

https://doi.org/10.5281/zenodo.15377209

Dataset updated

Jun 29, 2025

Dataset provided by

Zenodo http://zenodo.org/

Authors

Jiro Kikkawa; Masao Takaku

Area covered

YouTube

Description

This dataset and supplementary tables are released in conjunction with the TPDL 2025 paper titled “How Retracted Research Persists on YouTube: Retraction Severity, Visibility, and Disclosure.” They provide detailed information used in the analysis to promote transparency, ensure reproducibility, and facilitate future studies on scholarly communication and retractions.

The dataset contains the following files:

Filename	Data Format	Description
01_dataset_scholarly_references_on_YouTube.json.gz	JSON Lines	An integrated dataset of scholarly references in YouTube video descriptions, covering videos posted up to the end of December 2023. This dataset combines the Altmetric dataset and the YA Domain Dataset and is the basis for identifying references to retracted articles. This dataset contains 743,529 scholarly references (386,628 unique DOIs) found in 322,521 YouTube videos uploaded by 77,974 channels.
02_dataset_references_to_retracted_articles_on_YouTube.json.gz	JSON Lines	A dataset of retracted articles referenced in YouTube videos, used as the primary source for analysis in this paper. The dataset was created by cross-referencing the integrated reference dataset with the Retraction Watch database. It includes metadata such as DOI, article title, retraction reason, and severity classification (Severe, Moderate, or Minor) based on Woo and Walsh (2024), along with video- and channel-level statistics (e.g., view counts and subscriber counts) retrieved via the YouTube Data API v3 as of April 22, 2025. This dataset contains 1,002 retracted articles (360 unique DOIs) found in 956 YouTube videos uploaded by 714 channels.
03_full_list_table3_sorted_by_reference_count_retracted_articles_on_YouTube.json.gz	JSON Lines	Complete list corresponding to Table 3, "Top 7 retracted articles ranked by the number of YouTube videos in which they are referenced." in the paper.
04_full_list_table5_top10_most-viewed_video.json.gz	JSON Lines	Complete list corresponding to Table 5, "Top 10 most-viewed YouTube videos that reference retracted articles, sorted by video view count." in the paper.
05_detailed_manual_coding_40_sampled_retracted_articles.xlsx	XLSX	This file provides detailed annotations for a manually coded sample of 40 YouTube videos referencing retracted scholarly articles. The sample includes 10 randomly selected videos from each of the four analytical groups categorized by publication timing (before/after retraction) and retraction severity (Moderate/Severe). The file includes the reference stance for each video, visual/verbal mentions of the article, and relevant timestamps when applicable. This dataset supplements the manual analysis results presented in Tables 6 and 7 in the paper.

Due to concerns over potential misuse (e.g., identification or harassment of individual content creators), this dataset is not made publicly available.
Researchers who wish to use this dataset for scholarly purposes may contact the authors to request access.
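
The .json.gz files listed above are described as JSON Lines, i.e. one JSON object per line inside a gzip archive. A minimal sketch of reading such a file with the Python standard library follows; the helper names are hypothetical, and the record keys are an assumption, since the listing does not document the exact field names.

```python
import gzip
import json


def read_json_lines_gz(path):
    """Yield one parsed record per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def count_unique(records, key):
    """Count distinct values of `key` across the records that contain it."""
    return len({record[key] for record in records if key in record})
```

For example, count_unique(read_json_lines_gz("02_dataset_references_to_retracted_articles_on_YouTube.json.gz"), "doi") would count unique DOIs, assuming the records carry a key named "doi"; the actual key may be different.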

References

Woo, S., Walsh, J.P.: On the shoulders of fallen giants: What do references to retracted research tell us about citation behaviors? Quantitative Science Studies 5(1), 1–30 (2024). https://doi.org/10.1162/qss_a_00303
Kikkawa, J., Takaku, M.: How Retracted Article Persists on YouTube: Retraction Severity, Visibility, and Disclosure. Accepted for publication in the Proceedings of the 29th International Conference on Theory and Practice of Digital Libraries (TPDL 2025).
Accepted Papers (TPDL2025) - https://tpdl2025.github.io/Program/accepted_papers.html

Funding

JSPS KAKENHI Grant Numbers JP22K18147 and JP23K11761.

YouTube users in Europe 2020-2029
statista.com
Updated May 21, 2025
Cite
Statista Research Department (2025). YouTube users in Europe 2020-2029 [Dataset]. https://www.statista.com/topics/3853/internet-usage-in-europe/
Explore at:
Dataset updated
May 21, 2025
Dataset provided by
Statista http://statista.com/
Authors
Statista Research Department
Description
The number of YouTube users in Europe was forecast to increase continuously between 2024 and 2029 by a total of 7.8 million users (+3.61 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 223.61 million users in 2029, a new peak. Notably, the number of YouTube users has increased continuously over the past years. User figures, shown here for the YouTube platform, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like North America and Australia & Oceania.
Data from: Tag Recommendation Datasets
figshare.com
txt
Updated Jan 25, 2016
Cite
Fabiano Belem (2016). Tag Recommendation Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.2067183.v4
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.2067183.v4
Dataset updated
Jan 25, 2016
Dataset provided by
Figshare http://figshare.com/
Authors
Fabiano Belem
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Associative Tag Recommendation Exploiting Multiple Textual Features
Fabiano Belem, Eder Martins, Jussara M. Almeida, Marcos Goncalves
In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, July 2011

Abstract
This work addresses the task of recommending relevant tags to a target object by jointly exploiting three dimensions of the problem: (i) term co-occurrence with tags preassigned to the target object, (ii) terms extracted from multiple textual features, and (iii) several metrics of tag relevance. In particular, we propose several new heuristic methods, which extend previous, highly effective and efficient, state-of-the-art strategies by including new metrics that try to capture how accurately a candidate term describes the object’s content. We also exploit two learning to rank techniques, namely RankSVM and Genetic Programming, for the task of generating ranking functions that combine multiple metrics to accurately estimate the relevance of a tag to a given object. We evaluate all proposed methods in various scenarios for three popular Web 2.0 applications, namely, LastFM, YouTube and YahooVideo. We found that our new heuristics greatly outperform the methods on which they are based, producing gains in precision of up to 181%, as well as another state-of-the-art technique, with improvements in precision of up to 40% over the best baseline in any scenario. Some further improvements can also be achieved, in some scenarios, with the new learning-to-rank based strategies, which have the additional advantage of being quite flexible and easily extensible to exploit other aspects of the tag recommendation problem.

Bibtex Citation
@inproceedings{belem@sigir11,
  author = {Fabiano Bel\'em and Eder Martins and Jussara Almeida and Marcos Gon\c{c}alves},
  title = {Associative Tag Recommendation Exploiting Multiple Textual Features},
  booktitle = {{Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (SIGIR'11)}},
  month = {{July}},
  year = {2011}
}
Dataset for "The Good, the Bad and the Bait: Detecting and Characterizing...
explore.openaire.eu
zenodo.org
+1more
Updated Jan 22, 2019
Cite
Savvas Zannettou; Sotirios Chatzis; Kostantinos Papadamou; Michael Sirivianos (2019). Dataset for "The Good, the Bad and the Bait: Detecting and Characterizing Clickbait on YouTube" [Dataset]. http://doi.org/10.5281/zenodo.2546907
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.2546907
Dataset updated
Jan 22, 2019
Authors
Savvas Zannettou; Sotirios Chatzis; Kostantinos Papadamou; Michael Sirivianos
Area covered
YouTube
Description
This is the dataset used for the research "The Good, the Bad and the Bait: Detecting and Characterizing Clickbait on YouTube", with DOI: 10.1109/SPW.2018.00018. The dataset consists of three files:
1. groundtruth.json: This is the groundtruth dataset. We have 3443 manually annotated videos (we manually annotated more after the acceptance of the paper), and 17,648 videos that were obtained from channels that post clickbait or not. You can distinguish the method of annotation by observing the field "comments" in "clickbaitClassification" (the ones that have the comment "channels" are the ones obtained from the channels).
2. videos.json: Contains the data for 206K videos that were obtained as described in the paper.
3. predictions.json: It contains the mapping between the video id and the probability of our classifier. In our paper, we treat a video as clickbait if the probability is larger than 0.5.
The related software produced for this study may be found here.
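
The 0.5 decision rule described for predictions.json can be sketched as follows. This is only an illustration: it assumes the file is a JSON object mapping each video id to a probability, which the description does not confirm, and the function name and sample ids are hypothetical.

```python
import json

CLICKBAIT_THRESHOLD = 0.5  # the description treats probability > 0.5 as clickbait


def label_clickbait(predictions, threshold=CLICKBAIT_THRESHOLD):
    """Map each video id to True when its probability is strictly above the threshold."""
    return {video_id: float(prob) > threshold for video_id, prob in predictions.items()}


# Hypothetical sample in the assumed {video_id: probability} layout.
sample = json.loads('{"vid_a": 0.91, "vid_b": 0.5, "vid_c": 0.12}')
labels = label_clickbait(sample)
```

The comparison is strict, so a video scored at exactly 0.5 is not labelled clickbait, matching the "larger than 0.5" wording.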
youtube_subs_howto100M
huggingface.co
Updated Mar 31, 2023
Cite
Wonchang Chung (2023). youtube_subs_howto100M [Dataset]. https://huggingface.co/datasets/totuta/youtube_subs_howto100M
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 31, 2023
Authors
Wonchang Chung
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
YouTube
Description
Dataset Card for youtube_subs_howto100M

Dataset Summary

The youtube_subs_howto100M dataset is an English-language dataset of instruction-response pairs extracted from 309,136 YouTube videos. The dataset was originally inspired by and sourced from the HowTo100M dataset, which was developed for natural language search for video clips.

Supported Tasks and Leaderboards

conversational: The dataset can be used to train a model for instruction(request) and a long form… See the full description on the dataset page: https://huggingface.co/datasets/totuta/youtube_subs_howto100M.
Youtube Statistics and MacroEconomics - 2023
kaggle.com
Updated May 20, 2024
Cite
raahul raj (2024). Youtube Statistics and MacroEconomics - 2023 [Dataset]. https://www.kaggle.com/datasets/raahulraj/youtube-statistics-and-macroeconomics-2023
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 20, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
raahul raj
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
YouTube
Description
The dataset provides a comprehensive overview of leading YouTube channels, capturing key metrics such as subscriber counts, video views, and estimated annual earnings. It includes information on the channel's category, number of uploads, and geographical data like country and urban population. Additionally, socio-economic indicators such as gross tertiary education enrollment, unemployment rate, and development status of the channel's country are included. For instance, T-Series, the top-ranked channel, has 245 million subscribers and 228 billion video views, generating significant annual earnings. This dataset is invaluable for analyzing the dynamics of content creation on YouTube and understanding how geographical and economic factors influence channel success.
Netflix
ieee-dataport.org
Updated Oct 1, 2021
Cite
Danil Shamsimukhametov (2021). Netflix [Dataset]. https://ieee-dataport.org/documents/youtube-netflix-web-dataset-encrypted-traffic-classification
Explore at:
Dataset updated
Oct 1, 2021
Authors
Danil Shamsimukhametov
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
YouTube flows
