52 datasets found
  1. YouTube Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jan 9, 2023
    Cite
    Bright Data (2023). YouTube Datasets [Dataset]. https://brightdata.com/products/datasets/youtube
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Jan 9, 2023
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide, YouTube
    Description

    Use our YouTube profiles dataset to extract both business and non-business information from public channels and filter by channel name, views, creation date, or subscribers. Datapoints include URL, handle, banner image, profile image, name, subscribers, description, video count, create date, views, details, and more. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases for this dataset include sentiment analysis, brand monitoring, influencer marketing, and more.

  2. YouTube Video and Channel Analytics

    • kaggle.com
    Updated Dec 8, 2023
    Cite
    The Devastator (2023). YouTube Video and Channel Analytics [Dataset]. https://www.kaggle.com/datasets/thedevastator/youtube-video-and-channel-analytics/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Area covered
    YouTube
    Description

    YouTube Video and Channel Analytics

    YouTube Video and Channel Analytics: Statistics and Features

    By VISHWANATH SESHAGIRI [source]

    About this dataset

    The YouTube Video and Channel Metadata dataset is a comprehensive collection of data related to YouTube videos and channels. It consists of various features and statistics that provide insights into the performance and engagement of videos, as well as the overall popularity and success of channels.

    The dataset includes direct features such as total views, channel elapsed time, channel ID, video category ID, channel view count, likes per subscriber, dislikes per subscriber, and comments per subscriber. It also includes indirect features, derived from YouTube's API, that provide additional metrics for analysis.

    One important aspect covered in this dataset is the ratio between certain metrics. For example:

    • The totalviews/channelelapsedtime ratio represents the average number of views a video has received relative to the elapsed time since the channel was created.
    • The likes/dislikes ratio indicates the proportion of likes on a video compared to dislikes.
    • The views/subscribers ratio showcases how engaged subscribers are by measuring the number of views relative to the number of subscribers.

    Other metrics explored in this dataset include comments/views ratio (representing viewer engagement), dislikes/views ratio (measuring viewer sentiment), comments/subscriber ratio (indicating community participation), likes/subscriber ratio (reflecting audience loyalty), dislikes/subscriber ratio (highlighting dissatisfaction levels), total number of subscribers for a channel (subscriberCount), total views on a channel (channelViewCount), total number of comments on a channel (channelCommentCount), among others.
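The ratios described above can be recomputed from raw counts. The following is a minimal sketch, assuming hypothetical raw values for views, likes, dislikes, comments, and subscribers; the function names are illustrative and are not part of the dataset or its repository.

```python
from typing import Dict, Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def engagement_ratios(views: float, likes: float, dislikes: float,
                      comments: float, subscribers: float) -> Dict[str, Optional[float]]:
    """Compute a few of the ratio features named in the description."""
    return {
        "likes/dislikes": safe_ratio(likes, dislikes),
        "views/subscribers": safe_ratio(views, subscribers),
        "comments/views": safe_ratio(comments, views),
        "dislikes/views": safe_ratio(dislikes, views),
        "likes/subscriber": safe_ratio(likes, subscribers),
    }


# Hypothetical counts, made up purely for illustration.
example = engagement_ratios(views=1000, likes=50, dislikes=5,
                            comments=20, subscribers=200)
```

Returning None for a zero denominator is one possible convention; how the published dataset encodes such cases is not stated in the description.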

    By analyzing these features and statistics within this dataset, researchers or data analysts can gain valuable insights into various aspects related to YouTube videos and channels. Furthermore, it may be possible to build statistical relationships between videos based on their performance characteristics or even develop topic trees based on similarities between different content categories. This dataset serves as an excellent resource for studying YouTube's ecosystem comprehensively.

    To access additional resources related to this dataset or to explore the associated code, users can refer to the provided GitHub repository.

    How to use the dataset

    Introduction:

    Step 1: Understanding the Dataset

    Start by familiarizing yourself with the columns in the dataset. Here are some key features to pay attention to:

    • totalviews/channelelapsedtime: The ratio of total views of a video to the elapsed time of the channel.
    • channelViewCount: The total number of views on the channel.
    • likes/subscriber: The ratio of likes on a video to the number of subscribers of the channel.
    • views/subscribers: The ratio of views on a video to the number of subscribers of the channel.
    • subscriberCount: The total number of subscribers for a channel.
    • dislikes/views: The ratio of dislikes on a video to its total views.
    • comments/subscriber: The ratio of comments on a video to the number of subscribers of the channel.
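Because several column names contain a slash, it can be convenient to read rows as dictionaries keyed by the exact column name. The sketch below uses only the standard library and an in-memory two-row excerpt; the values are invented for illustration, and the assumption that every selected column is numeric may not hold for the full file.

```python
import csv
import io

# Hypothetical excerpt using column names from the list above; values are made up.
SAMPLE_CSV = """totalviews/channelelapsedtime,channelViewCount,likes/subscriber,subscriberCount
0.5,120000,0.02,3000
1.25,980000,0.10,45000
"""


def load_rows(handle):
    """Read CSV rows into dicts and convert every value to float."""
    reader = csv.DictReader(handle)
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Pick the row with the highest views-per-elapsed-time ratio.
top = max(rows, key=lambda r: r["totalviews/channelelapsedtime"])
```

To read a real file, the same function can be called with an open file handle instead of the in-memory string.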

    Step 2: Determining Data Analysis Objectives

    Define your objectives or research questions before diving into data analysis using this dataset. For example, you may want to explore relationships between viewership, engagement metrics, and various attributes such as category ID or elapsed time.

    Step 3: Analyzing Relationships between Variables

    Use statistical techniques like correlation analysis or visualization tools like scatter plots, bar graphs, or heatmaps to understand relationships between variables in this dataset.

    For example:

    • Plotting totalviews/channelelapsedtime against channelViewCount can help identify patterns between overall video popularity and channels' view count growth over time.
    • Comparing likes/dislikes with comments/views can give insights into viewer engagement levels across different videos.
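As one concrete form of the correlation analysis mentioned in Step 3, the Pearson coefficient between two columns can be computed directly. This is a minimal standard-library sketch; the sample values are hypothetical and stand in for two columns such as likes/dislikes and comments/views.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for two ratio columns.
likes_dislikes = [1.0, 2.0, 3.0, 4.0]
comments_views = [2.0, 4.0, 6.0, 8.0]
r = pearson(likes_dislikes, comments_views)
```

A coefficient near 1 or -1 suggests a strong linear relationship; it says nothing about causation, and heavy-tailed count data may call for a rank-based measure instead.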

    Step 4: Building Machine Learning Models (Optional)

    If your objective includes predictive analysis or building machine learning models, select relevant features as predictors and the target variable (e.g., totalviews/channelelapsedtime) for training and evaluation.

    You can use various algorithms such as linear regression, decision trees, or neural networks to predict video performance or channel growth based on available attributes.
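The simplest of the algorithms listed, linear regression with a single predictor, can be sketched without external libraries. The example below fits a line by ordinary least squares and scores it with mean squared error, one common evaluation metric. The data points are hypothetical, and a real analysis would hold out separate data for evaluation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(ys, predictions):
    """Average squared difference between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical predictor (e.g. a ratio feature) and target values.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
mse = mean_squared_error(ys, predictions)
```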

    Step 5: Evaluating Model Performance

    Assess the predictive model's performance using appropriate evaluation metrics like mean square...

  3. YouTube users worldwide 2020-2029

    • statista.com
    Updated Jul 7, 2025
    Cite
    Statista (2025). YouTube users worldwide 2020-2029 [Dataset]. https://www.statista.com/forecasts/1144088/youtube-users-in-the-world
    Explore at:
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide, YouTube
    Description

    The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach *** billion users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures for YouTube have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like Africa and South America.

  4. YouTube Videos and Channels Metadata

    • kaggle.com
    Updated Dec 14, 2022
    Cite
    The Devastator (2022). YouTube Videos and Channels Metadata [Dataset]. https://www.kaggle.com/datasets/thedevastator/revealing-insights-from-youtube-video-and-channe
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Area covered
    YouTube
    Description

    YouTube Videos and Channels Metadata

    Analyze the statistical relation between videos and form a topic tree

    By VISHWANATH SESHAGIRI [source]

    About this dataset

    This dataset contains YouTube video and channel metadata for analyzing the statistical relation between videos and forming a topic tree. With 9 direct features and 13 indirect features, it covers how videos are related, including total views per unit time, channel views, likes/subscribers ratio, comments/views ratio, dislikes/subscribers ratio, etc. The data can be used to study topics such as subscriber count trends over time or the impact of trends on subscriber engagement. Models built on it can show how different types of content drive viewership and identify the most popular styles or topics within YouTube's catalogue. Additionally, the data offers a look into consumer behaviour: for example, ratios such as likes per subscriber and dislikes per view can indicate what drives people to watch specific videos at certain times or to appreciate certain channels more than others. Finally, the dataset is open source and comes with a GitHub repo, making it a useful resource for anyone who wants to understand how their audience interacts with their content and how they might improve it in the future.


    How to use the dataset


    In general, it is important to understand each parameter in the data set before proceeding with analysis. The parameters included are totalviews/channelelapsedtime, channelViewCount, likes/subscriber, views/subscribers, subscriberCount, dislikes/views, comments/subscriber, channelCommentCount, likes/dislikes, comments/views, dislikes/subscriber, totviewes /totsubsvews /elapsedtime.

    To use this dataset for your own analysis:

    1) Review each parameter's meaning and purpose in the dataset.
    2) Get familiar with basic descriptive statistics such as mean, median, mode, and range.
    3) Create visualizations or tables based on subsets of the data.
    4) Understand correlations between different sets of variables or parameters.
    5) Generate meaningful conclusions about specific channels or topics based on organized graph hierarchies or tables.
    6) Analyze trends over time for individual parameters, as well as the aggregate reaction from all users when videos are released.
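The descriptive statistics named in the steps above (mean, median, mode, range) are all available from Python's standard `statistics` module. A minimal sketch, assuming a hypothetical list of values for one column such as subscriberCount:

```python
import statistics

# Hypothetical values for a single numeric column.
subscriber_counts = [10, 20, 20, 40]


def describe(values):
    """Return the basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
    }


summary = describe(subscriber_counts)
```

For continuous ratio columns the mode is rarely informative; it is included here only because the steps above list it.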

    Research Ideas

    • Predicting the Relative Popularity of Videos: This dataset can be used to build a statistical model that can predict the relative popularity of videos based on various factors such as total views, channel viewers, likes/dislikes ratio, and comments/views ratio. This model could then be used to make recommendations and predict which videos are likely to become popular or go viral.

    • Creating Topic Trees: The dataset can also be used to create topic trees or taxonomies by analyzing the content of videos and looking at what topics they cover. For example, one could analyze the most popular YouTube channels in a specific subject area, group together those that discuss similar topics, and then build an organized tree structure around those topics in order to better understand viewer interests in that area.

    • Viewer Engagement Analysis: This dataset could also be used for viewer engagement analysis by examining factors such as subscriber count, average time spent watching a video per user (elapsed time), and comments made per view, to gain insights into how engaged viewers are with specific content or channels on YouTube. From this information it would be possible to adjust content strategy in order to improve overall engagement rates across various types of video content and channel types.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: YouTubeDataset_withChannelElapsed.csv

    | Column name                   | Description                                           |
    |:------------------------------|:------------------------------------------------------|
    | totalviews/channelelapsedtime | Ratio of total views to channel elapsed time. (Ratio) |
    | channelViewCount              | Total number of views for the channel. (Integer)      |
    | likes/subscriber ...

  5. YouTube users in India 2020-2029

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). YouTube users in India 2020-2029 [Dataset]. https://www.statista.com/forecasts/1146150/youtube-users-in-india
    Explore at:
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    The number of YouTube users in India was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach ****** million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures for YouTube have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in countries like Sri Lanka and Nepal.

  6. YouTube-Commons

    • huggingface.co
    Updated Apr 17, 2024
    Cite
    PleIAs (2024). YouTube-Commons [Dataset]. https://huggingface.co/datasets/PleIAs/YouTube-Commons
    Explore at:
    Dataset updated
    Apr 17, 2024
    Dataset authored and provided by
    PleIAs
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    📺 YouTube-Commons 📺

    YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-By license.

    Content

    The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels). In total, this represents nearly 45 billion words (44,811,518,375). All the videos were shared on YouTube with a CC-BY license: the dataset provides all the necessary provenance information… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.

  7. Youtube video statistics for 1 million videos

    • kaggle.com
    Updated Jun 29, 2020
    Cite
    Mattia Zeni (2020). Youtube video statistics for 1 million videos [Dataset]. https://www.kaggle.com/datasets/mattiazeni/youtube-video-statistics-1million-videos/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 29, 2020
    Dataset provided by
    Kaggle
    Authors
    Mattia Zeni
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Motivation

    Studying how YouTube videos become viral or, more generally, how they evolve in terms of views, likes and subscriptions is a topic of interest in many disciplines. With this dataset you can study such phenomena, using statistics about 1 million YouTube videos. The information was collected in 2013, when YouTube was exposing the data publicly. YouTube has since removed this functionality, and such statistics are now available only to the owner of the video. This makes this dataset unique.

    Context

    This dataset was generated with YOUStatAnalyzer, a tool developed by myself (Mattia Zeni) when I was working for CREATE-NET (www.create-net.org) within the framework of the CONGAS FP7 project (http://www.congas-project.eu). For the project we needed to collect and analyse the dynamics of YouTube video popularity. The dataset contains statistics of more than 1 million YouTube videos, chosen according to random keywords extracted from the WordNet library (http://wordnet.princeton.edu).

    The motivation behind the development of the YOUStatAnalyzer data collection tool and the creation of this dataset is that there is an active research community working on the interplay among individual user preferences, social dynamics, and advertising mechanisms, and a common problem is the lack of open large-scale datasets. In addition, no such tool existed at the time. Today, YouTube has removed the possibility of viewing these data on each video's page, making this dataset unique.

    When using our dataset for research purposes, please cite it as:

    @INPROCEEDINGS{YOUStatAnalyzer,
      author = {Mattia Zeni and Daniele Miorandi and Francesco {De Pellegrini}},
      title = {{YOUStatAnalyzer}: a Tool for Analysing the Dynamics of {YouTube} Content Popularity},
      booktitle = {Proc. 7th International Conference on Performance Evaluation Methodologies and Tools (Valuetools, Torino, Italy, December 2013)},
      address = {Torino, Italy},
      year = {2013}
    }

    Content

    The dataset contains statistics and metadata of 1 million YouTube videos, collected in 2013. The videos were chosen according to random keywords extracted from the WordNet library (http://wordnet.princeton.edu).

    Dataset structure

    The structure of a dataset entry is the following:

    {
      u'_id': u'9eToPjUnwmU',
      u'title': u'Traitor Compilation # 1 (Trouble ...',
      u'description': u'A traitor compilation by one are ...',
      u'category': u'Games',
      u'commentsNumber': u'6',
      u'publishedDate': u'2012-10-09T23:42:12.000Z',
      u'author': u'ServilityGaming',
      u'duration': u'208',
      u'type': u'video/3gpp',
      u'relatedVideos': [u'acjHy7oPmls', u'EhW2LbCjm7c', u'UUKigFAQLMA', ...],
      u'accessControl': {
        u'comment': {u'permission': u'allowed'},
        u'list': {u'permission': u'allowed'},
        u'videoRespond': {u'permission': u'moderated'},
        u'rate': {u'permission': u'allowed'},
        u'syndicate': {u'permission': u'allowed'},
        u'embed': {u'permission': u'allowed'},
        u'commentVote': {u'permission': u'allowed'},
        u'autoPlay': {u'permission': u'allowed'}
      },
      u'views': {
        u'cumulative': { u'data': [15.0, 25.0, 26.0, 26.0, ...] },
        u'daily': { u'data': [15.0, 10.0, 1.0, 0.0, ...] }
      },
      u'shares': {
        u'cumulative': { u'data': [0.0, 0.0, 0.0, 0.0, ...] },
        u'daily': { u'data': [0.0, 0.0, 0.0, 0.0, ...] }
      },
      u'watchtime': {
        u'cumulative': { u'data': [22.5666666667, 36.5166666667, 36.7, 36.7, ...] },
        u'daily': { u'data': [22.5666666667, 13.95, 0.166666666667, 0.0, ...] }
      },
      u'subscribers': {
        u'cumulative': { u'data': [0.0, 0.0, 0.0, 0.0, ...] },
        u'daily': { u'data': [-1.0, 0.0, 0.0, 0.0, ...] }
      },
      u'day': {
        u'data': [1349740800000.0, 1349827200000.0, 1349913600000.0, 1350000000000.0, ...]
      }
    }

    From the structure above, it is possible to see which fields an entry in the dataset has. They can be divided into 2 sections:

    1) Video Information.

    _id -> Corresponds to the video ID and is the unique identifier of an entry in the database.
    title -> The video's title.
    description -> The video's description.
    category -> The YouTube category the video is listed in.
    commentsNumber -> The number of comments posted by users.
    publishedDate -> The date the video was published.
    author -> The author of the video.
    duration -> The video duration in seconds.
    type -> The encoding type of the video.
    relatedVideos -> A list of related videos.
    accessControl -> A list of access policies for different aspects related to the video.

    2) Video Statistics.

    Each video can have 4 different statistics variables: views, shares, subscribers and watchtime. Recent videos have all of them, while older videos may have only the 'views' variable. Each variable has 2 dimensions, daily and cumulative.
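The relation between the daily and cumulative dimensions can be checked on the sample entry shown earlier. In the sketch below, the views values are copied from that sample. Treating the `day` values as milliseconds since the Unix epoch is an assumption inferred from the sample (it yields 2012-10-09, matching `publishedDate`), not something the description states.

```python
from datetime import datetime, timezone
from itertools import accumulate

# Values copied from the sample entry (the elided "..." tails are omitted).
daily_views = [15.0, 10.0, 1.0, 0.0]
sample_cumulative_views = [15.0, 25.0, 26.0, 26.0]
day_ms = [1349740800000.0, 1349827200000.0, 1349913600000.0, 1350000000000.0]

# Running sum of the daily series.
cumulative_views = list(accumulate(daily_views))

# Assumption: 'day' holds milliseconds since the Unix epoch, in UTC.
dates = [
    datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()
    for ms in day_ms
]

matches_sample = cumulative_views == sample_cumulative_views
```

For views, the running sum reproduces the sample's cumulative series. The same does not hold for every variable in the sample: the subscribers series shows a daily value of -1.0 against a cumulative value of 0.0, so the running-sum relation should be verified per variable rather than assumed.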

    `views -> number of views collected by the vi...

  8. Data from: YouTube Video Network Dataset for Israel-Hamas War

    • ieee-dataport.org
    Updated Dec 23, 2023
    Cite
    Thejas T (2023). YouTube Video Network Dataset for Israel-Hamas War [Dataset]. https://ieee-dataport.org/documents/youtube-video-network-dataset-israel-hamas-war
    Explore at:
    Dataset updated
    Dec 23, 2023
    Authors
    Thejas T
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Israel, YouTube
    Description

    Over the past few years, YouTube has become a popular site for broadcasting video and for earning money by showcasing various skills in video form. For some people it has become a main source of income. Getting videos to trend among viewers is a major goal for every content creator. The popularity of a video and its reach are largely determined by YouTube's recommendation algorithm. This document is a dataset descriptor for the dataset collected over a time span of about 45 days during the Israel-Hamas War.

  9. YouTube Channel Statistics Dataset

    • kaggle.com
    Updated Jul 11, 2023
    Cite
    Vamshi krishna Pennakoduru (2023). YouTube Channel Statistics Dataset [Dataset]. https://www.kaggle.com/datasets/vamshikrishna305/youtube-channel-statistics-dataset/versions/1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vamshi krishna Pennakoduru
    Area covered
    YouTube
    Description

    This YouTube Video Analytics Dataset provides insights into the performance of a range of videos on the platform. Spanning various genres, the dataset includes: 1. genre, 2. video titles, 3. publish times, 4. view counts, 5. watch time (in hours), 6. subscriber counts, 7. average view durations, 8. impressions, and 9. impressions click-through rates (%).

    By leveraging this dataset, researchers, analysts, and data enthusiasts can delve into the factors that influence video success on YouTube. Analyze the correlation between genre and view counts, investigate the impact of subscriber counts on watch time, or explore how average view durations and click-through rates affect video impressions.

    Whether you're interested in exploring video trends, identifying patterns in user behavior, or developing machine learning models, this dataset serves as a valuable resource. Gain actionable insights into YouTube video performance and contribute to the ever-growing field of online content analysis. LICENCE NOTE - This is the dataset of my own channel.
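One of the analyses suggested above, relating genre to view counts, can start from a simple per-genre average. This is a minimal sketch on hypothetical rows; the keys `genre` and `views` are illustrative and may not match the dataset's actual column headers.

```python
from collections import defaultdict

# Hypothetical rows; real column names in the file may differ.
videos = [
    {"genre": "Education", "views": 1200},
    {"genre": "Education", "views": 800},
    {"genre": "Gaming", "views": 3000},
]


def mean_views_by_genre(rows):
    """Average the view count within each genre."""
    totals = defaultdict(lambda: [0, 0])  # genre -> [sum of views, row count]
    for row in rows:
        entry = totals[row["genre"]]
        entry[0] += row["views"]
        entry[1] += 1
    return {genre: total / count for genre, (total, count) in totals.items()}


averages = mean_views_by_genre(videos)
```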

  10. Dataset of Video Comments of a Vision Video Classified by Their Relevance,...

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Kristo, Eklekta (2024). Dataset of Video Comments of a Vision Video Classified by Their Relevance, Polarity, Intention, and Topic [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4533301
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Karras, Oliver
    Kristo, Eklekta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all comments (comments and replies) of the YouTube vision video "Tunnels" by "The Boring Company", fetched on 2020-10-13 using the YouTube API. The comments were classified manually by three persons. We performed a single-class labeling of the video comments regarding their relevance for requirements engineering (RE) (ham/spam) and their polarity (positive/neutral/negative). Furthermore, we performed a multi-class labeling of the comments regarding their intention (feature request and problem report) and their topic (efficiency and safety). While a comment can only be relevant or not relevant and have only one polarity, a comment can have one or more intentions and also one or more topics.

    For the replies, one person also classified them regarding their relevance for RE. However, the investigation of the replies is ongoing and future work.

    Remark: For 126 comments and 26 replies, we could not determine the date and time since they were no longer accessible on YouTube at the time this data set was created. In the case of a missing date and time, we inserted "NULL" in the corresponding cell.
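The labeling scheme described above (one relevance label, one polarity label, and sets of intentions and topics) together with the "NULL" convention for missing dates can be captured in a small record type. The sketch below is illustrative only: the class name, the field names, and the decision to allow empty intention and topic sets are assumptions, not part of the published files.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Label vocabularies taken from the description above.
RELEVANCE = {"ham", "spam"}
POLARITY = {"positive", "neutral", "negative"}
INTENTIONS = {"feature request", "problem report"}
TOPICS = {"efficiency", "safety"}


@dataclass
class LabeledComment:
    """Hypothetical in-memory record for one labeled comment."""
    text: str
    relevance: str                                       # single-class
    polarity: str                                        # single-class
    intentions: Set[str] = field(default_factory=set)    # multi-class
    topics: Set[str] = field(default_factory=set)        # multi-class
    date: Optional[str] = None

    def __post_init__(self):
        if self.relevance not in RELEVANCE:
            raise ValueError(f"unknown relevance: {self.relevance!r}")
        if self.polarity not in POLARITY:
            raise ValueError(f"unknown polarity: {self.polarity!r}")
        if not self.intentions <= INTENTIONS:
            raise ValueError("unknown intention label")
        if not self.topics <= TOPICS:
            raise ValueError("unknown topic label")
        # The data set stores "NULL" where date and time could not be determined.
        if self.date == "NULL":
            self.date = None


comment = LabeledComment(
    text="Made-up comment text",
    relevance="ham",
    polarity="positive",
    intentions={"feature request"},
    topics={"safety", "efficiency"},
    date="NULL",
)
```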

    This data set includes the following files:

    Dataset.xlsx contains the raw and labeled video comments and replies:

    For each comment, the data set contains:

    ID: An identification number generated by YouTube for the comment

    Date: The date and time of the creation of the comment

    Author: The username of the author of the comment

    Likes: The number of likes of the comment

    Replies: The number of replies to the comment

    Comment: The written comment

    Relevance: Label indicating the relevance of the comment for RE (ham = relevant, spam = irrelevant)

    Polarity: Label indicating the polarity of the comment

    Feature request: Label indicating that the comment requests a feature

    Problem report: Label indicating that the comment reports a problem

    Efficiency: Label indicating that the comment deals with the topic efficiency

    Safety: Label indicating that the comment deals with the topic safety

    For each reply, the data set contains:

    ID: The identification number of the comment to which the reply belongs

    Date: The date and time of the creation of the reply

    Author: The username of the author of the reply

    Likes: The number of likes of the reply

    Comment: The written reply

    Relevance: Label indicating the relevance of the reply for RE (ham = relevant, spam = irrelevant)

    Detailed analysis results.xlsx contains the detailed results of the ten-times-repeated 10-fold cross-validation analyses for each considered combination of machine learning algorithms and features

    Guide Sheet - Multi-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual multi-class labeling

    Guide Sheet - Single-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual single-class labeling

    Python scripts for analysis.zip contains the scripts (as jupyter notebooks) and prepared data (as csv-files) for the analyses

  11. Top 1000 YouTube Channels in the World 🌐📊🎥

    • kaggle.com
    Updated Jun 25, 2024
    Cite
    Mayank Anand (2024). Top 1000 YouTube Channels in the World 🌐📊🎥 [Dataset]. https://www.kaggle.com/datasets/mayankanand2701/top-1000-youtube-channels-in-the-world/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    Kaggle
    Authors
    Mayank Anand
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    YouTube
    Description

    YouTube is the world's largest video-sharing platform, launched in 2005. It allows users to upload, view, and share videos, and has grown to be a central hub for content creators across various fields, including entertainment, education, music, and more. With over 2 billion logged-in users monthly, YouTube has become an essential platform for digital content and marketing.

    The Top 1000 YouTube Channels Dataset captures detailed information about the top-performing YouTube channels globally. This dataset includes the following columns:

    • Rank : The ranking of the YouTube channel based on its overall popularity and performance.
    • Youtuber : The name of the YouTuber or the title of the YouTube channel.
    • Subscribers : The total number of subscribers to the channel, indicating its reach and popularity.
    • Video Views : The total number of video views the channel has accumulated, reflecting its engagement and audience interaction.
    • Video Count : The total number of videos uploaded by the channel, showing the content volume produced.
    • Category : The genre or category the channel belongs to, such as music, education, entertainment, etc.
    • Started : The year the channel was created, providing insight into its longevity and growth over time.

    This dataset is invaluable for analyzing trends, understanding content strategies, and benchmarking channel performances within the YouTube ecosystem.
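The columns listed above lend themselves to simple derived measures for benchmarking. As a minimal sketch, the code below computes average views per video and ranks channels by subscribers. The two rows are invented for illustration, and it assumes the numeric columns have already been parsed to integers, which may require cleaning in the real file.

```python
# Hypothetical rows using the column names listed above; values are made up.
channels = [
    {"Youtuber": "Channel A", "Subscribers": 1_000_000,
     "Video Views": 500_000_000, "Video Count": 250},
    {"Youtuber": "Channel B", "Subscribers": 2_000_000,
     "Video Views": 100_000_000, "Video Count": 0},
]


def views_per_video(channel):
    """Average views per uploaded video, or None when no videos are listed."""
    count = channel["Video Count"]
    if count == 0:
        return None
    return channel["Video Views"] / count


# Rank channels by subscriber count, largest first.
ranked = sorted(channels, key=lambda c: c["Subscribers"], reverse=True)
per_video = {c["Youtuber"]: views_per_video(c) for c in channels}
```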

  12. Youtube users in Vietnam 2017-2025

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). Youtube users in Vietnam 2017-2025 [Dataset]. https://www.statista.com/forecasts/1146013/youtube-users-in-vietnam
    Explore at:
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2017 - 2019
    Area covered
    Vietnam
    Description

    In 2021, YouTube's user base in Vietnam amounted to approximately ***** million users. The number of YouTube users in Vietnam is projected to reach ***** million users by 2025. User figures have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  13. YouTube Dataset - Medical papers analising YouTube videos

    • figshare.com
    xlsx
    Updated Jun 1, 2019
    Cite
    Juan-José Boté (2019). YouTube Dataset - Medical papers analising YouTube videos [Dataset]. http://doi.org/10.6084/m9.figshare.7108511.v3
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2019
    Dataset provided by
    figshare
    Authors
    Juan-José Boté
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    This dataset is a review of YouTube content analysis papers in health. The data were used to publish a review paper in the Revista Cubana de Información en Ciencias de la Salud. The dataset is a list of scientific papers that were reviewed to analyse how health care scientists classify the videos they analyse.

  14. Dataset and Supplementary Tables on Retracted Articles Referenced in YouTube...

    • zenodo.org
    Updated Jun 29, 2025
    Cite
    Jiro Kikkawa; Masao Takaku (2025). Dataset and Supplementary Tables on Retracted Articles Referenced in YouTube Videos (TPDL 2025) [Dataset]. http://doi.org/10.5281/zenodo.15377209
    Explore at:
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jiro Kikkawa; Masao Takaku
    Area covered
    YouTube
    Description
    This dataset and supplementary tables are released in conjunction with the TPDL 2025 paper titled “How Retracted Research Persists on YouTube: Retraction Severity, Visibility, and Disclosure.” They provide detailed information used in the analysis to promote transparency, ensure reproducibility, and facilitate future studies on scholarly communication and retractions.

    The dataset contains the following files:

    Filename: 01_dataset_scholarly_references_on_YouTube.json.gz
    Data format: JSON Lines
    Description: An integrated dataset of scholarly references in YouTube video descriptions, covering videos posted up to the end of December 2023. This dataset combines the Altmetric dataset and the YA Domain Dataset and is the basis for identifying references to retracted articles. This dataset contains 743,529 scholarly references (386,628 unique DOIs) found in 322,521 YouTube videos uploaded by 77,974 channels.

    Filename: 02_dataset_references_to_retracted_articles_on_YouTube.json.gz
    Data format: JSON Lines
    Description: A dataset of retracted articles referenced in YouTube videos, used as the primary source for analysis in this paper. The dataset was created by cross-referencing the integrated reference dataset with the Retraction Watch database. It includes metadata such as DOI, article title, retraction reason, and severity classification (Severe, Moderate, or Minor) based on Woo and Walsh (2024), along with video- and channel-level statistics (e.g., view counts and subscriber counts) retrieved via the YouTube Data API v3 as of April 22, 2025. This dataset contains 1,002 retracted articles (360 unique DOIs) found in 956 YouTube videos uploaded by 714 channels.

    Filename: 03_full_list_table3_sorted_by_reference_count_retracted_articles_on_YouTube.json.gz
    Data format: JSON Lines
    Description: Complete list corresponding to Table 3, "Top 7 retracted articles ranked by the number of YouTube videos in which they are referenced." in the paper.

    Filename: 04_full_list_table5_top10_most-viewed_video.json.gz
    Data format: JSON Lines
    Description: Complete list corresponding to Table 5, "Top 10 most-viewed YouTube videos that reference retracted articles, sorted by video view count." in the paper.

    Filename: 05_detailed_manual_coding_40_sampled_retracted_articles.xlsx
    Data format: XLSX
    Description: This file provides detailed annotations for a manually coded sample of 40 YouTube videos referencing retracted scholarly articles. The sample includes 10 randomly selected videos from each of the four analytical groups categorized by publication timing (before/after retraction) and retraction severity (Moderate/Severe). The file includes reference stance for each video, visual/verbal mention of the article, and relevant timestamps when applicable. This dataset supplements the manual analysis results presented in Tables 6 and 7 in the paper.

    Due to concerns over potential misuse (e.g., identification or harassment of individual content creators), this dataset is not made publicly available.
    Researchers who wish to use this dataset for scholarly purposes may contact the authors to request access.

    References

    • Woo, S., Walsh, J.P.: On the shoulders of fallen giants: What do references to retracted research tell us about citation behaviors? Quantitative Science Studies 5(1), 1–30 (2024). https://doi.org/10.1162/qss_a_00303
    • Kikkawa, J., Takaku, M.: How Retracted Article Persists on YouTube: Retraction Severity, Visibility, and Disclosure. Accepted for publication in the Proceedings of the 29th International Conference on Theory and Practice of Digital Libraries (TPDL 2025).
    • Accepted Papers (TPDL2025) - https://tpdl2025.github.io/Program/accepted_papers.html

    Funding

    JSPS KAKENHI Grant Numbers JP22K18147 and JP23K11761.
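
    Several of the files in this entry are described as gzip-compressed JSON Lines. A minimal sketch of how such a file might be read with the Python standard library follows. It assumes one JSON object per line; the function names and the key passed to count_unique are hypothetical, since the entry does not document the field names.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator


def iter_json_lines_gz(path: str) -> Iterator[Dict]:
    """Yield one parsed record per non-empty line of a gzip-compressed JSON Lines file."""
    # "rt" opens the gzip stream in text mode so each line can be decoded as UTF-8.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def count_unique(records: Iterable[Dict], key: str) -> int:
    """Count distinct values of `key` across records that contain it.

    The key name is an assumption supplied by the caller; check the real
    files for the actual field names before relying on it.
    """
    return len({record[key] for record in records if key in record})
```

    Streaming the file line by line avoids loading the whole archive into memory, which may matter for the larger reference dataset.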

  15. YouTube users in Europe 2020-2029

    • statista.com
    Updated May 21, 2025
    Cite
    Statista Research Department (2025). YouTube users in Europe 2020-2029 [Dataset]. https://www.statista.com/topics/3853/internet-usage-in-europe/
    Explore at:
    Dataset updated
    May 21, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    The number of YouTube users in Europe was forecast to increase continuously between 2024 and 2029 by a total of 7.8 million users (+3.61 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 223.61 million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years.

    User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

    The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

    Find more key insights for the number of YouTube users in regions like North America and Australia & Oceania.

  16. Data from: Tag Recommendation Datasets

    • figshare.com
    txt
    Updated Jan 25, 2016
    Cite
    Fabiano Belem (2016). Tag Recommendation Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.2067183.v4
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 25, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Fabiano Belem
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Associative Tag Recommendation Exploiting Multiple Textual Features

    Fabiano Belem, Eder Martins, Jussara M. Almeida, Marcos Goncalves

    In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, July 2011

    Abstract

    This work addresses the task of recommending relevant tags to a target object by jointly exploiting three dimensions of the problem: (i) term co-occurrence with tags preassigned to the target object, (ii) terms extracted from multiple textual features, and (iii) several metrics of tag relevance. In particular, we propose several new heuristic methods, which extend previous, highly effective and efficient, state-of-the-art strategies by including new metrics that try to capture how accurately a candidate term describes the object’s content. We also exploit two learning to rank techniques, namely RankSVM and Genetic Programming, for the task of generating ranking functions that combine multiple metrics to accurately estimate the relevance of a tag to a given object. We evaluate all proposed methods in various scenarios for three popular Web 2.0 applications, namely, LastFM, YouTube and YahooVideo. We found that our new heuristics greatly outperform the methods on which they are based, producing gains in precision of up to 181%, as well as another state-of-the-art technique, with improvements in precision of up to 40% over the best baseline in any scenario. Some further improvements can also be achieved, in some scenarios, with the new learning-to-rank based strategies, which have the additional advantage of being quite flexible and easily extensible to exploit other aspects of the tag recommendation problem.

    Bibtex Citation

    @inproceedings{belem@sigir11,
      author = {Fabiano Bel\'em and Eder Martins and Jussara Almeida and Marcos Gon\c{c}alves},
      title = {Associative Tag Recommendation Exploiting Multiple Textual Features},
      booktitle = {{Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (SIGIR'11)}},
      month = {{July}},
      year = {2011}
    }

  17. Dataset for "The Good, the Bad and the Bait: Detecting and Characterizing...

    • explore.openaire.eu
    • zenodo.org
    • +1 more
    Updated Jan 22, 2019
    Cite
    Savvas Zannettou; Sotirios Chatzis; Kostantinos Papadamou; Michael Sirivianos (2019). Dataset for "The Good, the Bad and the Bait: Detecting and Characterizing Clickbait on YouTube" [Dataset]. http://doi.org/10.5281/zenodo.2546907
    Explore at:
    Dataset updated
    Jan 22, 2019
    Authors
    Savvas Zannettou; Sotirios Chatzis; Kostantinos Papadamou; Michael Sirivianos
    Area covered
    YouTube
    Description

    This is the dataset used for the research "The Good, the Bad and the Bait: Detecting and Characterizing Clickbait on YouTube", with DOI: 10.1109/SPW.2018.00018. The dataset consists of three files:

    1. groundtruth.json: This is the groundtruth dataset. We have 3443 manually annotated videos (we manually annotated more after the acceptance of the paper), and 17,648 videos that were obtained from channels that post clickbait or not. You can distinguish the method of annotation by observing the field "comments" in "clickbaitClassification" (the ones that have the comment "channels" are the ones obtained from the channels).
    2. videos.json: Contains the data for 206K videos that were obtained as described in the paper.
    3. predictions.json: It contains the mapping between the video id and the probability of our classifier. In our paper, we treat a video as clickbait if the probability is larger than 0.5.

    The related software produced for this study may be found here.
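
    The decision rule stated for predictions.json (clickbait when the probability is larger than 0.5) can be sketched as below. This is an illustration under assumptions: it takes the predictions as an already-loaded dictionary from video id to probability, which may not match the file's actual layout, and label_clickbait is a hypothetical name rather than part of the authors' software.

```python
from typing import Dict


def label_clickbait(predictions: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Map each video id to True when its probability is strictly above the threshold.

    The strict comparison mirrors the description's wording ("larger than 0.5"),
    so a probability of exactly 0.5 is not labeled clickbait.
    """
    return {video_id: probability > threshold
            for video_id, probability in predictions.items()}


# Hypothetical example values, not taken from the dataset.
example = {"video_a": 0.91, "video_b": 0.50, "video_c": 0.12}
labels = label_clickbait(example)
```

    Keeping the threshold as a parameter makes it easy to explore stricter or looser cut-offs than the one used in the paper.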

  18. youtube_subs_howto100M

    • huggingface.co
    Updated Mar 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wonchang Chung (2023). youtube_subs_howto100M [Dataset]. https://huggingface.co/datasets/totuta/youtube_subs_howto100M
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 31, 2023
    Authors
    Wonchang Chung
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    YouTube
    Description

    Dataset Card for youtube_subs_howto100M

      Dataset Summary
    

    The youtube_subs_howto100M dataset is an English-language dataset of instruction-response pairs extracted from 309136 YouTube videos. The dataset was originally inspired by and sourced from the HowTo100M dataset, which was developed for natural language search for video clips.

      Supported Tasks and Leaderboards
    

    conversational: The dataset can be used to train a model for instruction(request) and a long form… See the full description on the dataset page: https://huggingface.co/datasets/totuta/youtube_subs_howto100M.

  19. Youtube Statistics and MacroEconomics - 2023

    • kaggle.com
    Updated May 20, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    raahul raj (2024). Youtube Statistics and MacroEconomics - 2023 [Dataset]. https://www.kaggle.com/datasets/raahulraj/youtube-statistics-and-macroeconomics-2023
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    raahul raj
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    YouTube
    Description

    The dataset provides a comprehensive overview of leading YouTube channels, capturing key metrics such as subscriber counts, video views, and estimated annual earnings. It includes information on the channel's category, number of uploads, and geographical data like country and urban population. Additionally, socio-economic indicators such as gross tertiary education enrollment, unemployment rate, and development status of the channel's country are included. For instance, T-Series, the top-ranked channel, has 245 million subscribers and 228 billion video views, generating significant annual earnings. This dataset is invaluable for analyzing the dynamics of content creation on YouTube and understanding how geographical and economic factors influence channel success.

  20. Netflix

    • ieee-dataport.org
    Updated Oct 1, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Danil Shamsimukhametov (2021). Netflix [Dataset]. https://ieee-dataport.org/documents/youtube-netflix-web-dataset-encrypted-traffic-classification
    Explore at:
    Dataset updated
    Oct 1, 2021
    Authors
    Danil Shamsimukhametov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    YouTube flows
