36 datasets found
  1. YouTube users worldwide 2020-2029

    • statista.com
    Updated Jul 7, 2025
    Cite
    Statista (2025). YouTube users worldwide 2020-2029 [Dataset]. https://www.statista.com/forecasts/1144088/youtube-users-in-the-world
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide, YouTube
    Description

    The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach *** billion users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures for the YouTube platform have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like Africa and South America.

  2. Top Youtube Artist

    • kaggle.com
    Updated Jan 12, 2023
    Cite
    Mrityunjay Pathak (2023). Top Youtube Artist [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/top-youtube-artist
    Dataset updated
    Jan 12, 2023
    Dataset provided by
    Kaggle
    Authors
    Mrityunjay Pathak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    YouTube
    Description

    YouTube was created in 2005, with the first video, Me at the Zoo, uploaded on 23 April 2005. Since then, 1.3 billion people have set up YouTube accounts. In 2018, people watched nearly 5 billion videos each day. People upload 300 hours of video to the site every minute.

    According to 2016 research undertaken by Pexeso, music accounts for only 4.3% of YouTube’s content, yet it accounts for 11% of the views. Clearly, an awful lot of people watch a comparatively small number of music videos. It should be no surprise, therefore, that the most watched videos of all time on YouTube are predominantly music videos.

    On August 13, BTS became the most-viewed artist in YouTube history, accumulating over 26.7 billion views across all their official channels. This count includes all music videos and dance practice videos.

    Justin Bieber and Ed Sheeran now hold the records for second and third-highest views, with over 26 billion views each.

    Currently, BTS’s most viewed videos are their music videos for “**Boy With Luv**,” “**Dynamite**,” and “**DNA**,” which all have over 1.4 billion views.

    Headers of the Dataset

    • Total = Total views (in millions) across all official channels
    • Avg = Current daily average of all videos combined
    • 100M = Number of videos with more than 100 million views

  3. Top 1000 YouTube Channels in the World 🌐📊🎥

    • kaggle.com
    Updated Jun 25, 2024
    Cite
    Mayank Anand (2024). Top 1000 YouTube Channels in the World 🌐📊🎥 [Dataset]. https://www.kaggle.com/datasets/mayankanand2701/top-1000-youtube-channels-in-the-world/data
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    Kaggle
    Authors
    Mayank Anand
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    YouTube
    Description

    YouTube is the world's largest video-sharing platform, launched in 2005. It allows users to upload, view, and share videos, and has grown to be a central hub for content creators across various fields, including entertainment, education, music, and more. With over 2 billion logged-in users monthly, YouTube has become an essential platform for digital content and marketing.

    The Top 1000 YouTube Channels Dataset captures detailed information about the top-performing YouTube channels globally. This dataset includes the following columns:

    • Rank : The ranking of the YouTube channel based on its overall popularity and performance.
    • Youtuber : The name of the YouTuber or the title of the YouTube channel.
    • Subscribers : The total number of subscribers to the channel, indicating its reach and popularity.
    • Video Views : The total number of video views the channel has accumulated, reflecting its engagement and audience interaction.
    • Video Count : The total number of videos uploaded by the channel, showing the content volume produced.
    • Category : The genre or category the channel belongs to, such as music, education, entertainment, etc.
    • Started : The year the channel was created, providing insight into its longevity and growth over time.

    This dataset is invaluable for analyzing trends, understanding content strategies, and benchmarking channel performances within the YouTube ecosystem.
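
    As a minimal sketch of how the columns listed above could be used, the following groups channels by Category and computes average views per video. The sample rows are hypothetical and made up for illustration, and the sketch assumes the numeric columns are stored as plain integers; the actual file may format them differently.

```python
import csv
import io

# Hypothetical sample rows using the column names listed above;
# the values are invented for illustration and are not taken from the dataset.
SAMPLE_CSV = """Rank,Youtuber,Subscribers,Video Views,Video Count,Category,Started
1,Channel A,1000000,500000000,250,Music,2010
2,Channel B,800000,120000000,1200,Education,2015
3,Channel C,600000,90000000,300,Music,2012
"""


def views_per_video_by_category(csv_text):
    """Return {category: total views / total videos} over all channels in csv_text."""
    totals = {}  # category -> [total views, total videos]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals.setdefault(row["Category"], [0, 0])
        entry[0] += int(row["Video Views"])
        entry[1] += int(row["Video Count"])
    # Skip categories with no videos to avoid dividing by zero.
    return {cat: views / videos for cat, (views, videos) in totals.items() if videos > 0}


result = views_per_video_by_category(SAMPLE_CSV)
for category, average in sorted(result.items()):
    print(f"{category}: {average:,.0f} views per video")
```

    To run this against the real file, replace `io.StringIO(csv_text)` with an opened file handle and adjust the integer parsing if the counts contain thousands separators.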

  4. YouTube users in India 2020-2029

    • statista.com
    Updated Mar 3, 2025
    Cite
    Statista (2025). YouTube users in India 2020-2029 [Dataset]. https://www.statista.com/forecasts/1146150/youtube-users-in-india
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    The number of YouTube users in India was forecast to increase continuously between 2024 and 2029 by a total of 222.2 million users (+34.88 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 859.26 million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures for the YouTube platform have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in countries like Sri Lanka and Nepal.
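
    The three figures quoted in this description can be cross-checked against each other. The short sketch below derives the implied 2024 base from the 2029 forecast and the stated increase, then recomputes the percentage growth; the variable names are illustrative only.

```python
# Figures quoted in the description above (millions of users).
increase_2024_2029 = 222.2
forecast_2029 = 859.26

# Implied 2024 base, and the growth relative to that base.
baseline_2024 = forecast_2029 - increase_2024_2029
growth_percent = increase_2024_2029 / baseline_2024 * 100

print(f"Implied 2024 base: {baseline_2024:.2f} million")   # 637.06 million
print(f"Growth 2024-2029: {growth_percent:.2f} percent")   # 34.88 percent
```

    The recomputed growth matches the +34.88 percent stated in the description, so the three numbers are mutually consistent.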

  5. YouTube-Commons

    • huggingface.co
    Updated Apr 17, 2024
    Cite
    PleIAs (2024). YouTube-Commons [Dataset]. https://huggingface.co/datasets/PleIAs/YouTube-Commons
    Dataset updated
    Apr 17, 2024
    Dataset authored and provided by
    PleIAs
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    YouTube
    Description

    📺 YouTube-Commons 📺

    YouTube-Commons is a collection of audio transcripts of 2,063,066 videos shared on YouTube under a CC-By license.

    Content

    The collection comprises 22,709,724 original and automatically translated transcripts from 3,156,703 videos (721,136 individual channels). In total, this represents nearly 45 billion words (44,811,518,375). All the videos were shared on YouTube with a CC-BY license: the dataset provides all the necessary provenance information… See the full description on the dataset page: https://huggingface.co/datasets/PleIAs/YouTube-Commons.

  6. YouTube's Channels Dataset

    • kaggle.com
    Updated Mar 31, 2021
    Cite
    HarshitHGupta (2021). YouTube's Channels Dataset [Dataset]. https://www.kaggle.com/datasets/harshithgupta/youtubes-channels-dataset
    Dataset updated
    Mar 31, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    HarshitHGupta
    Area covered
    YouTube
    Description

    Context

    YouTube is an American online video-sharing platform headquartered in San Bruno, California. The service, created in February 2005 by three former PayPal employees—Chad Hurley, Steve Chen, and Jawed Karim—was bought by Google in November 2006 for US$1.65 billion and now operates as one of the company's subsidiaries. YouTube is the second most-visited website after Google Search, according to Alexa Internet rankings.

    YouTube allows users to upload, view, rate, share, add to playlists, report, comment on videos, and subscribe to other users. Available content includes video clips, TV show clips, music videos, short and documentary films, audio recordings, movie trailers, live streams, video blogging, short original videos, and educational videos.

    YouTube (the world-famous video sharing website) maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments, and likes). Note that they’re not the most-viewed videos overall for the calendar year”. Top performers on the YouTube trending list are music videos (such as the famously viral “Gangnam Style”), celebrity and/or reality TV performances, and the random dude-with-a-camera viral videos that YouTube is well-known for.

    This dataset is a daily record of the top trending YouTube videos.

    Note that this dataset is a structurally improved version of this dataset.

    Acknowledgements

    This dataset was collected using the YouTube API. This Description is cited in Wikipedia.

  7. Spotify and Youtube

    • data.niaid.nih.gov
    Updated Dec 4, 2023
    Cite
    Guarisco, Marco (2023). Spotify and Youtube [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10253414
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Guarisco, Marco
    Sallustio, Marco
    Rastelli, Salvatore
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    YouTube
    Description

    These are statistics for the top 10 songs of various Spotify artists and their YouTube videos. The creators above generated the data and uploaded it to Kaggle on February 6-7, 2023. The license to use this data is "CC0: Public Domain", allowing the data to be copied, modified, distributed, and worked on without having to ask permission. The data is in numerical and textual CSV format as attached. This dataset contains the statistics and attributes of the top 10 songs of various artists in the world. As described by the creators above, it includes 26 variables for each of the songs collected from Spotify. These variables are briefly described next:

    • Track: name of the song, as visible on the Spotify platform.
    • Artist: name of the artist.
    • Url_spotify: the URL of the artist.
    • Album: the album in which the song is contained on Spotify.
    • Album_type: indicates whether the song is released on Spotify as a single or contained in an album.
    • Uri: a Spotify link used to find the song through the API.
    • Danceability: describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
    • Energy: a measure from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
    • Key: the key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
    • Loudness: the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
    • Speechiness: detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
    • Acousticness: a confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
    • Instrumentalness: predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater the likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
    • Liveness: detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
    • Valence: a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
    • Tempo: the overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
    • Duration_ms: the duration of the track in milliseconds.
    • Stream: number of streams of the song on Spotify.
    • Url_youtube: URL of the video linked to the song on YouTube, if it has any.
    • Title: title of the video clip on YouTube.
    • Channel: name of the channel that published the video.
    • Views: number of views.
    • Likes: number of likes.
    • Comments: number of comments.
    • Description: description of the video on YouTube.
    • Licensed: indicates whether the video represents licensed content, which means that the content was uploaded to a channel linked to a YouTube content partner and then claimed by that partner.
    • official_video: boolean value that indicates whether the video found is the official video of the song.

    The data was last updated on February 7, 2023.
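
    Two of the variables above encode meaning in their numeric values: Key uses pitch-class integers and Speechiness uses the 0.33 and 0.66 thresholds. A minimal sketch of decoding both is shown below. The function names and bucket labels are hypothetical, and the handling of values exactly at 0.33 or 0.66 is an assumption, since the description only covers values above, between, and below the thresholds.

```python
# Standard pitch-class names, indexed 0..11 (0 = C, 1 = C♯/D♭, 2 = D, ...).
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def key_name(key):
    """Map the integer Key value to a pitch name; -1 means no key was detected."""
    if key == -1:
        return None
    if not 0 <= key <= 11:
        raise ValueError(f"unexpected key value: {key}")
    return PITCH_CLASSES[key]


def speechiness_bucket(value):
    """Bucket a Speechiness value using the thresholds from the description.

    Assumption: values exactly equal to 0.33 or 0.66 fall in the middle bucket.
    """
    if value > 0.66:
        return "spoken word"
    if value >= 0.33:
        return "music and speech"
    return "music"


print(key_name(1))               # C♯/D♭
print(speechiness_bucket(0.5))   # music and speech
```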

  8. YouTube Trending Video Dataset (updated daily)

    • kaggle.com
    zip
    Updated Apr 15, 2024
    Cite
    Rishav Sharma (2024). YouTube Trending Video Dataset (updated daily) [Dataset]. https://www.kaggle.com/rsrishav/youtube-trending-video-dataset
    Dataset updated
    Apr 15, 2024
    Authors
    Rishav Sharma
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    YouTube
    Description

    This dataset is a daily record of the top trending YouTube videos and it will be updated daily.

    Context

    YouTube maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments and likes). Note that they’re not the most-viewed videos overall for the calendar year”.

    Note that this dataset is a structurally improved version of this dataset.

    Content

    This dataset includes several months (and counting) of data on daily trending YouTube videos. Data is included for the IN, US, GB, DE, CA, FR, RU, BR, MX, KR, and JP regions (India, USA, Great Britain, Germany, Canada, France, Russia, Brazil, Mexico, South Korea, and Japan, respectively), with up to 200 listed trending videos per day.

    Each region’s data is in a separate file. Data includes the video title, channel title, publish time, tags, views, likes and dislikes, description, and comment count.

    The data also includes a category_id field, which varies between regions. To retrieve the categories for a specific video, find it in the associated JSON. One such file is included for each of the 11 regions in the dataset.
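
    A minimal sketch of the category lookup described above follows. It assumes the per-region JSON file has the shape of a YouTube Data API video-category listing (an `items` array whose entries carry an `id` and a `snippet.title`); that structure, the two sample entries, and the function name are assumptions for illustration, so check them against the actual file before relying on this.

```python
import json

# Assumed file structure, modeled on a YouTube Data API category listing.
# The two entries are illustrative, not copied from the dataset.
SAMPLE_CATEGORY_JSON = """
{"items": [
  {"id": "10", "snippet": {"title": "Music"}},
  {"id": "24", "snippet": {"title": "Entertainment"}}
]}
"""


def load_category_names(json_text):
    """Build a {category_id: title} lookup from one region's category JSON."""
    data = json.loads(json_text)
    return {int(item["id"]): item["snippet"]["title"] for item in data["items"]}


categories = load_category_names(SAMPLE_CATEGORY_JSON)

# A hypothetical row from the matching region's video file.
video = {"title": "hypothetical video", "category_id": 10}
category = categories.get(video["category_id"], "unknown")
print(category)  # Music
```

    Because category_id varies between regions, the lookup should be built from the JSON file of the same region as the video file being read.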

    For more information on specific columns in the dataset refer to the column metadata.

    Acknowledgements

    This dataset was collected using the YouTube API. This dataset is the updated version of Trending YouTube Video Statistics.

    Inspiration

    Possible uses for this dataset could include:

    • Sentiment analysis in a variety of forms
    • Categorizing YouTube videos based on their comments and statistics
    • Training ML algorithms like RNNs to generate their own YouTube comments
    • Analyzing what factors affect how popular a YouTube video will be
    • Statistical analysis over time

    For further inspiration, see the kernels on this dataset!

  9. ‘5-Minute Crafts: Video Clickbait Titles?’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 12, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘5-Minute Crafts: Video Clickbait Titles?’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-5-minute-crafts-video-clickbait-titles-7f86/72f4a841/?iid=022-316&v=presentation
    Dataset updated
    Nov 12, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘5-Minute Crafts: Video Clickbait Titles?’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shivamb/5minute-crafts-video-views-dataset on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    5-Minute Crafts - Youtube ClickBait Titles?

    5-Minute Crafts is a DIY-style YouTube channel owned by TheSoul Publishing. As of October 2021, it is the 9th most-subscribed channel on the platform. It is also one of the most viewed channels. The channel has drawn criticism for unusual and potentially dangerous life hacks and its reliance on clickbait. Irrespective of the criticism, 5-Minute Crafts videos do get a lot of views.

    In this dataset, a complete record of video titles from the 5-Minute Crafts YouTube channels is collected along with many other meta-features of the titles. It also contains total video views, duration, active since, sentiment, etc.

    Key Tasks

    Use this dataset to perform the following types of analysis and modeling:

    1. Relation between the different words used in the titles and the total views garnered
    2. Which features of a video title are most important with respect to total views
    3. Is there any correlation between title meta-features, total views, duration, and sentiment?

    --- Original source retains full ownership of the source dataset ---

  10. YouTube RPM by Niche (2025)

    • learningrevolution.net
    html
    Updated Jun 23, 2025
    Cite
    Jawad Khan (2025). YouTube RPM by Niche (2025) [Dataset]. https://www.learningrevolution.net/how-much-money-does-youtube-pay-for-1-million-views/
    Dataset updated
    Jun 23, 2025
    Dataset provided by
    Learning Revolution
    Authors
    Jawad Khan
    Area covered
    YouTube
    Variables measured
    Gaming, Travel, Finance, Education, Technology, Memes/Vlogs
    Description

    This dataset provides estimated YouTube RPM (Revenue Per Mille) ranges for different niches in 2025, based on ad revenue earned per 1,000 monetized views.
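
    Since RPM is defined here as ad revenue per 1,000 monetized views, an RPM range converts to an earnings range with one multiplication. The sketch below shows that arithmetic; the function names and the RPM values in the usage line are hypothetical and are not taken from the dataset.

```python
def estimate_revenue(monetized_views, rpm):
    """Estimated ad revenue for a view count, given RPM (revenue per 1,000 monetized views)."""
    return monetized_views / 1000 * rpm


def estimate_revenue_range(monetized_views, rpm_low, rpm_high):
    """Apply the low and high ends of an RPM range to the same view count."""
    return (
        estimate_revenue(monetized_views, rpm_low),
        estimate_revenue(monetized_views, rpm_high),
    )


# Hypothetical RPM range of 2.0 to 5.0 applied to one million monetized views.
low, high = estimate_revenue_range(1_000_000, 2.0, 5.0)
print(low, high)  # 2000.0 5000.0
```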

  11. Pakistani Top 1000 Youtubers in 2022

    • kaggle.com
    Updated Oct 26, 2022
    Cite
    Anees Ayoub (2022). Pakistani Top 1000 Youtubers in 2022 [Dataset]. https://www.kaggle.com/datasets/aneesayoub/pakistani-top-1000-youtubers-in-2022
    Dataset updated
    Oct 26, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anees Ayoub
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Pakistan
    Description

    Top 1000 YouTubers in Pakistan, the world's 5th largest country by population. This data contains the total views of the channel, the channel category, the number of subscribers, and the total videos on the channel.

    Inspiration

    I want to see what Pakistanis are watching.

    channel_name : Name of YouTube Channel

    subscribers : Total No. of Subscribers

    total_views : Total Views of All Videos

    total_videos : Total video content of a channel

    category : Category of YouTube Channel, such as education, food, etc.

    started : Starting Year of Channel.

  12. 5-Minute Crafts: Video Clickbait Titles?

    • kaggle.com
    Updated Nov 29, 2021
    Cite
    Shivam Bansal (2021). 5-Minute Crafts: Video Clickbait Titles? [Dataset]. https://www.kaggle.com/shivamb/5minute-crafts-video-views-dataset/metadata
    Dataset updated
    Nov 29, 2021
    Dataset provided by
    Kaggle
    Authors
    Shivam Bansal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    5-Minute Crafts - Youtube ClickBait Titles?

    5-Minute Crafts is a DIY-style YouTube channel owned by TheSoul Publishing. As of October 2021, it is the 9th most-subscribed channel on the platform. It is also one of the most viewed channels. The channel has drawn criticism for unusual and potentially dangerous life hacks and its reliance on clickbait. Irrespective of the criticism, 5-Minute Crafts videos do get a lot of views.

    In this dataset, a complete record of video titles from the 5-Minute Crafts YouTube channels is collected along with many other meta-features of the titles. It also contains total video views, duration, active since, sentiment, etc.

    Key Tasks

    Use this dataset to perform the following types of analysis and modeling:

    1. Relation between the different words used in the titles and the total views garnered
    2. Which features of a video title are most important with respect to total views
    3. Is there any correlation between title meta-features, total views, duration, and sentiment?

  13. Videos published by the top 25 indie animation channels on Youtube...

    • b2find.eudat.eu
    Updated Feb 13, 2025
    Cite
    (2025). Videos published by the top 25 indie animation channels on Youtube (2006-2018) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/7674df7b-b487-574b-b0e0-8064d4fa281b
    Dataset updated
    Feb 13, 2025
    Area covered
    YouTube
    Description

    The file stores the record of the references that were used as units of analysis in the research that resulted in the publication “Is the YouTube Animation Algorithm-Friendly? How YouTube's Algorithm Influences the Evolution of Animation Production on the Internet”. The data set consists of 3,376 videos published by the 25 channels, which total 8,822,179,453 views from the day of publication to the day of sampling.
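
    The totals quoted above imply some simple per-video and per-channel averages. The sketch below works them out from the three stated figures only; these averages are derived here for illustration and are not reported in the description itself.

```python
# Totals quoted in the description above.
total_views = 8_822_179_453
video_count = 3_376
channel_count = 25

avg_views_per_video = total_views / video_count
avg_videos_per_channel = video_count / channel_count

print(f"Average views per video: {avg_views_per_video:,.0f}")
print(f"Average videos per channel: {avg_videos_per_channel:.2f}")
```

    This gives roughly 2.6 million views per video and about 135 videos per channel. Averages like these hide how skewed view counts usually are, so they are best read as a sanity check on scale.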

  14. YouTube NSI Captioning Dataset

    • zenodo.org
    bin, csv
    Updated Mar 1, 2024
    Cite
    Lloyd May; Keita Ohshiro; Khang Dang; Sripathi Sridhar; Jhanvi Pai; Magdalena Fuentes; Sooyeon Lee; Mark Cartwright (2024). YouTube NSI Captioning Dataset [Dataset]. http://doi.org/10.5281/zenodo.10681804
    Dataset updated
    Mar 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lloyd May; Keita Ohshiro; Khang Dang; Sripathi Sridhar; Jhanvi Pai; Magdalena Fuentes; Sooyeon Lee; Mark Cartwright
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    YouTube
    Description

    Version 1.0, March 2024

    Created by

    Lloyd May (1), Keita Ohshiro (2,3), Khang Dang (2,3), Sripathi Sridhar (2,3), Jhanvi Pai (2,3), Magdalena Fuentes (4), Sooyeon Lee (3), Mark Cartwright (2,3,4)

    1. Center for Computer Research in Music and Acoustics, Stanford University
    2. Sound Interaction and Computing Lab, New Jersey Institute of Technology
    3. Department of Informatics, New Jersey Institute of Technology
    4. Music and Audio Research Lab, New York University

    Publication

    If using this data in an academic work, please reference the DOI and version, as well as cite the following paper, which presented the data collection procedure and the first version of the dataset:

    May, L., Ohshiro, K., Dang, K., Sridhar, S., Pai, J., Fuentes, M., Lee, S., Cartwright, M. Unspoken Sound: Identifying Trends in Non-Speech Audio Captioning on YouTube. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), 2024.

    Description

    The YouTube NSI Captioning Dataset was developed to analyze the contemporary and historical state of non-speech information (NSI) captioning on YouTube. NSI includes information about non-speech sounds such as environmental sounds, sound effects, incidental sounds, and music, as well as additional narrative information and extra-speech information (ESI), which gives context to spoken or signed language such as manner of speech (e.g., "[Whispering] Oh no") or speaker label (e.g., "[Juan] Oh no"). The dataset contains measures of estimated and annotated NSI in the captions of two different samples of videos: a popular video sample and a studio video sample. The aim of the popular sample is to understand the captioning practices in a broad spectrum of popular, impactful videos on YouTube. In contrast, the aim of the studio sample is to examine captioning practices among the top-tier production houses, often viewed as industry benchmarks due to their influence and the vast resources they have available for accessibility. Using the YouTube API, we queried for videos in these two samples for each month from 2013 to 2022. We then estimated which captions contain NSI by searching for non-alphanumeric symbols that are indicative of NSI, e.g., "[" and "]" (see Section 3.2 of the paper for a full list). In addition, the research team manually annotated which captions have NSI from a subset of approximately 1800 videos from the years 2013, 2018, and 2022. Please see Section 3.3 of the paper for details of the annotation process.
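
    The symbol-based estimation step can be sketched roughly as follows. This is not the authors' implementation: it checks only the two symbols quoted above, "[" and "]", whereas the full list is given in Section 3.2 of the paper, and the function names are hypothetical.

```python
# Only the two symbols quoted in the description; the paper's full list is longer.
NSI_MARKERS = ("[", "]")


def looks_like_nsi(caption_line):
    """Return True if the caption line contains any symbol treated as indicating NSI."""
    return any(marker in caption_line for marker in NSI_MARKERS)


def estimated_nsi_share(caption_lines):
    """Fraction of caption lines flagged as estimated NSI (0.0 for an empty list)."""
    if not caption_lines:
        return 0.0
    flagged = sum(1 for line in caption_lines if looks_like_nsi(line))
    return flagged / len(caption_lines)


lines = ["[Whispering] Oh no", "Oh no", "[Juan] Oh no", "See you tomorrow"]
print(estimated_nsi_share(lines))  # 0.5
```

    A symbol search like this is only an estimate: it flags any bracketed text, whether or not it describes a sound, which is presumably why the dataset also includes a manually annotated subset.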

    The resulting YouTube NSI Captioning Dataset consists of NSI information from ~715k videos containing ~273M lines of captions, ~6M of which are estimated instances of NSI. These videos span 10 years and 21 topics. The annotated subset consists of 1799 videos with a total of ~36k annotated caption lines, ~114k of which are instances of NSI annotated on 7 different categories. These videos span 3 years (2013, 2018, and 2022) and 20 YouTube-assigned topics. Each video was annotated by two annotators along with the consensus annotation. The dataset contains the links to the YouTube videos, video metadata from the YouTube API, and measures of both estimated and annotated NSI. Due to copyright concerns, we are only publicly releasing data consisting of summary NSI measures for each video. If you need access to the raw data used to create these summary NSI measures, contact Mark Cartwright at mark.cartwright@njit.edu.

    Files

    • estimated_full_set_aggregate.csv : Data file containing the full set of video data with measures of estimated NSI.

    • annotated_subset_aggregate.csv : Data file containing the smaller annotated subset of video data with measures of both annotated and estimated NSI.

    Columns

    The following columns are present in both data files.

    • video_id : The YouTube video ID

    • year : The year associated with the time period from which the video was sampled.

    • sample : The sample which the video is from (i.e., popular or studio)

    • sampling_period_start_date : The start date of the time period from which the video was sampled.

    • sampling_period_end_date : The end date of the time period from which the video was sampled.

    • caption_type : This can take one of three values: auto which indicates a caption was provided by YouTube's automated caption system, manual which indicates a caption was provided by the uploader, or none which indicates that no captions are present for the video.

    • duration_minutes : The duration of the video in minutes.

    • channel_id : The ID that YouTube uses to uniquely identify the channel.

    • published_datetime : The date and time at which the video was published on YouTube.

    • youtube_topics : The YouTube-provided list of Wikipedia URLs that provide a description of the video's content.

    • category_id : The YouTube video category associated with the video.

    • view_count : The count of views on YouTube at the time of sampling (Spring 2023).

    • like_count : The count of likes on YouTube at the time of sampling (Spring 2023).

    • comment_count : The count of comments on YouTube at the time of sampling (Spring 2023).

    • high_level_topics : List of topics at a higher semantic level than youtube_topics that provide a description of the video's content. See paper for details on the mapping between youtube_topics and high_level_topics.

    • The remaining columns are named for an NSI type combined with a measure, using the values listed below.

    Values for the NSI type:

    • estimated_nsi : This NSI type is an estimation of NSI based on the presence of particular non-alphanumeric characters that are indicative of NSI as described in Section 3.2 of the paper.

    • general_nsi (only in annotated_subset_aggregate.csv) : The most general of NSI types that is inclusive of music_nsi, environmental_nsi, additionalnarrative_nsi, and quotedspeech_nsi. All of these NSI types are included in the calculation of measures associated with general_nsi. Note that misc_nsi and nonenglish_captions are not included as those may or may not contain NSI, and thus, we opt for precision over recall. Not present for the unlabeled

    • music_nsi (only in annotated_subset_aggregate.csv) : Any genre of music, whether diegetic or not.

    • environmental_nsi (only in annotated_subset_aggregate.csv) : Environmental sounds, sound effects, and incidental sounds, i.e., non-music and non-speech sounds. This includes non-verbal vocalizations like laughter, grunts, and crying, provided they aren't used to modify speech.

    • extraspeech_nsi (only in annotated_subset_aggregate.csv) : Extra-speech Information (ESI), i.e., text that gives added context to spoken or signed language.

    • additionalnarrative_nsi (only in annotated_subset_aggregate.csv) : Additional narrative information in the form of descriptive text that doesn't pertain directly to sounds.

    • quotedspeech_nsi (only in annotated_subset_aggregate.csv) : Quoted speech, i.e., captions containing internal quotation marks.

    • misc_nsi (only in annotated_subset_aggregate.csv) : Unsure, misc, or ambiguous, i.e., instances where the appropriate label is unclear or the caption doesn't fit current categories.

    • nonenglish_captions (only in annotated_subset_aggregate.csv) : Captions that are not written in English and thus have uncertain NSI status.

    Values for the measure:

    • count : The number of captions identified as containing NSI of the specified type in the video.

    • presence : Indication of whether there is NSI of the specified type present in the video. 1 if present (i.e., count > 0), 0 if not present (i.e., count == 0).

    • count_per_minute : A measure of the density of NSI captions. count_per_minute = count / duration_minutes

    • count_per_minute_if_present : If presence == 1, then count_per_minute; otherwise NaN. This is used for computing the aggregate CPMIP measure, which, as discussed in the paper, is intended to be a measure of the quality of NSI captions based on the assumption that more frequently captioned NSI within a video is an indicator of better NSI captioning. See Section 5 of the paper for details.
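The four measures above can be derived from a raw caption count and a video duration. The following is a minimal sketch, not the dataset authors' code: the function names are hypothetical, and the aggregate shown (a mean over videos where NSI is present) is only an assumption about how CPMIP might be computed; see the paper for the actual definition.

```python
import math


def nsi_measures(count, duration_minutes):
    """Derive presence, count_per_minute, and count_per_minute_if_present.

    `count` is the number of captions flagged as NSI for one video and
    `duration_minutes` is the video length; both names follow the column
    descriptions above. Assumes duration_minutes > 0.
    """
    presence = 1 if count > 0 else 0
    count_per_minute = count / duration_minutes
    # NaN marks videos without NSI so they can be skipped when aggregating.
    count_per_minute_if_present = count_per_minute if presence == 1 else math.nan
    return {
        "count": count,
        "presence": presence,
        "count_per_minute": count_per_minute,
        "count_per_minute_if_present": count_per_minute_if_present,
    }


def mean_ignoring_nan(values):
    """Average the non-NaN values (an assumed form of the aggregate CPMIP)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


# Three hypothetical videos: (count, duration_minutes)
videos = [(0, 10.0), (6, 3.0), (3, 6.0)]
rows = [nsi_measures(c, d) for c, d in videos]
cpmip = mean_ignoring_nan([r["count_per_minute_if_present"] for r in rows])
```

Because videos without NSI carry NaN rather than 0, they do not pull the aggregate down; only videos that caption NSI at all contribute to it.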

    Conditions of use

    Dataset created by Lloyd May, Keita Ohshiro, Khang Dang, Sripathi Sridhar, Jhanvi Pai, Magdalena Fuentes, Sooyeon Lee, and Mark Cartwright

    The YouTube NSI Captioning Dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/

    Feedback

    Please help us improve the YouTube NSI Captioning Dataset by sending your feedback to:

    • Mark

  15. Table S1 - The Potential of Accelerating Early Detection of Autism through...

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Vincent A. Fusaro; Jena Daniels; Marlena Duda; Todd F. DeLuca; Olivia D’Angelo; Jenna Tamburello; James Maniscalco; Dennis P. Wall (2023). Table S1 - The Potential of Accelerating Early Detection of Autism through Content Analysis of YouTube Videos [Dataset]. http://doi.org/10.1371/journal.pone.0093533.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Vincent A. Fusaro; Jena Daniels; Marlena Duda; Todd F. DeLuca; Olivia D’Angelo; Jenna Tamburello; James Maniscalco; Dennis P. Wall
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Details on the 100 YouTube videos. Forty-five autism spectrum disorder (ASD) and 55 non-ASD samples were collected in total. The Table contains the original download URL, the self-reported diagnosis, the gender and age of the subject, length of video, and the total number of YouTube views (as of 12/10/13). (XLSX)

  16. Professional Shogi Players Youtube Channel Data

    • kaggle.com
    Updated Aug 10, 2022
    Cite
    Satoshi_S (2022). Professional Shogi Players Youtube Channel Data [Dataset]. https://www.kaggle.com/datasets/satoshiss/professional-shogi-players-youtube-channel-data/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 10, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Satoshi_S
    Area covered
    YouTube
    Description

    Context

    This dataset was built from the YouTube channels of 17 professional shogi players using the YouTube Data API. I previously built a dataset from one of these channels with Selenium: https://www.kaggle.com/datasets/satoshiss/shogi-channels-data.

    If you are interested in shogi (Japanese chess), please check any of the videos listed.

    Content

    The channel stats file has overall stats for each YouTube channel, and the video_details file has information on each video, including title, views, likes, comment counts, tags, description, and published date.

  17. YouTube & Google Maps Data | 21+ Attributes | Channel metrics, Creator Info,...

    • datarade.ai
    Updated May 27, 2024
    Cite
    Exellius Systems (2024). YouTube & Google Maps Data | 21+ Attributes | Channel metrics, Creator Info, Video Metrics | Google My Business Rating, Maps | Social Media Data [Dataset]. https://datarade.ai/data-products/youtube-google-maps-data-20-attributes-channel-metrics-exellius-systems
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    May 27, 2024
    Dataset authored and provided by
    Exellius Systems
    Area covered
    Mayotte, Taiwan, Bonaire, Cameroon, Lesotho, Burkina Faso, Sao Tome and Principe, Honduras, Jersey, United Kingdom, YouTube
    Description

    Our dataset offers a unique blend of attributes from YouTube and Google Maps, empowering users with comprehensive insights into online content and geographical reach. Let's delve into what makes our data stand out:

    Unique Attributes:
    • From YouTube: Detailed video information including title, description, upload date, video ID, and channel URL. Video metrics such as views, likes, comments, and duration are also provided.
    • Creator Info: Access author details like name and channel URL.
    • Channel Information: Gain insights into channel title, description, location, join date, and visual branding elements like logo and banner URLs.
    • Channel Metrics: Understand a channel's performance with metrics like total views, subscribers, and video count.
    • Google Maps Integration: Explore business ratings from Google My Business and location data from Google Maps.

    Data Sourcing:
    • Our data is meticulously sourced from publicly available information on YouTube and Google Maps, ensuring accuracy and reliability.

    Primary Use-Cases:
    • Marketing: Analyze video performance metrics to optimize content strategies.
    • Research: Explore trends in creator behavior and audience engagement.
    • Location-Based Insights: Utilize Google Maps data for market research, competitor analysis, and location-based targeting.

    Fit within Broader Offering:
    • This dataset complements our broader data offering by providing rich insights into online content consumption and geographical presence. It enhances decision-making processes across various industries, including marketing, advertising, research, and business intelligence.

    Usage Examples:
    • Marketers can identify popular video topics and optimize advertising campaigns accordingly.
    • Researchers can analyze audience engagement patterns to understand viewer preferences.
    • Businesses can assess their Google My Business ratings and geographical distribution for strategic planning.

    With scalable solutions and high-quality data, our dataset offers unparalleled depth for extracting actionable insights and driving informed decisions in the digital landscape.

  18. Most Streamed Spotify Songs 2024

    • kaggle.com
    Updated Jun 15, 2024
    Cite
    Nidula Elgiriyewithana ⚡ (2024). Most Streamed Spotify Songs 2024 [Dataset]. http://doi.org/10.34740/kaggle/dsv/8700156
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 15, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nidula Elgiriyewithana ⚡
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description


    This dataset presents a comprehensive compilation of the most streamed songs on Spotify in 2024. It provides extensive insights into each track's attributes, popularity, and presence on various music platforms, offering a valuable resource for music analysts, enthusiasts, and industry professionals. The dataset includes information such as track name, artist, release date, ISRC, streaming statistics, and presence on platforms like YouTube, TikTok, and more.

    DOI

    Here is the link for the 2023 data: Most Streamed Spotify Songs 2023 (https://www.kaggle.com/datasets/nelgiriyewithana/top-spotify-songs-2023) 🟢

    Key Features

    • Track Name: Name of the song.
    • Album Name: Name of the album the song belongs to.
    • Artist: Name of the artist(s) of the song.
    • Release Date: Date when the song was released.
    • ISRC: International Standard Recording Code for the song.
    • All Time Rank: Ranking of the song based on its all-time popularity.
    • Track Score: Score assigned to the track based on various factors.
    • Spotify Streams: Total number of streams on Spotify.
    • Spotify Playlist Count: Number of Spotify playlists the song is included in.
    • Spotify Playlist Reach: Reach of the song across Spotify playlists.
    • Spotify Popularity: Popularity score of the song on Spotify.
    • YouTube Views: Total views of the song's official video on YouTube.
    • YouTube Likes: Total likes on the song's official video on YouTube.
    • TikTok Posts: Number of TikTok posts featuring the song.
    • TikTok Likes: Total likes on TikTok posts featuring the song.
    • TikTok Views: Total views on TikTok posts featuring the song.
    • YouTube Playlist Reach: Reach of the song across YouTube playlists.
    • Apple Music Playlist Count: Number of Apple Music playlists the song is included in.
    • AirPlay Spins: Number of times the song has been played on radio stations.
    • SiriusXM Spins: Number of times the song has been played on SiriusXM.
    • Deezer Playlist Count: Number of Deezer playlists the song is included in.
    • Deezer Playlist Reach: Reach of the song across Deezer playlists.
    • Amazon Playlist Count: Number of Amazon Music playlists the song is included in.
    • Pandora Streams: Total number of streams on Pandora.
    • Pandora Track Stations: Number of Pandora stations featuring the song.
    • Soundcloud Streams: Total number of streams on Soundcloud.
    • Shazam Counts: Total number of times the song has been Shazamed.
    • TIDAL Popularity: Popularity score of the song on TIDAL.
    • Explicit Track: Indicates whether the song contains explicit content.

    Potential Use Cases

    • Music Analysis: Analyze trends in audio features to understand popular song characteristics.
    • Platform Comparison: Compare song popularity across different music platforms.
    • Artist Impact: Study the relationship between artist attributes and song success.
    • Temporal Trends: Identify changes in music attributes and preferences over time.
    • Cross-Platform Presence: Investigate song performance across various streaming services.

    Your support through an upvote would be greatly appreciated if you find this dataset useful! ❤️🙂 Thank you.

  19. DEyeAdicContact - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Oct 22, 2023
    Cite
    (2023). DEyeAdicContact - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/a5b5c1d4-f3bf-54d0-891d-4b285cf94b55
    Explore at:
    Dataset updated
    Oct 22, 2023
    Description

    We created our own dataset of natural dyadic interactions with fine-grained eye contact annotations using videos of dyadic interviews published on YouTube. Especially compared to lab-based recordings, these YouTube interviews allow us to analyse behaviour in a natural situation. All interviews were conducted via video conferencing and provide frontal views of interviewer and interviewee side-by-side. Specifically, we downloaded videos from the YouTube channels “Wisdom From North” and “The Spa Dr.” that both provide a large number of interviews, often with a high video quality. Each channel features a single host interviewing different guests in each session. We manually selected videos with high video quality, resulting in 60 videos for “The Spa Dr.” and 61 videos for “Wisdom From North”. All videos are recorded at a frame rate between 24 and 30 fps and vary in length from 17 minutes to 58 minutes (average: 37 minutes). In total, the videos contain 74 hours of conversations, amounting to 7,817,821 video frames.

    We instructed five human annotators to classify the gaze of interviewer and interviewee (in the following referred to as “subjects”). Even though in this study we were only interested in a binary classification of averted gaze versus eye contact, a more fine-grained distinction of averted gaze might prove beneficial for future research. To this end, we used a total of 11 mutually exclusive classes during annotation. Annotators were asked to select the class “eye contact” if the subject was looking at the location of the other person on her screen or the camera from which she was recorded. We found that annotators were able to reliably determine the placements of camera and screen by skimming through the video prior to starting the annotation. If there was no eye contact, annotators classified whether the subject gazed “up”, “down”, “left”, “right”, or to the “upper left”, “lower left”, “upper right” or “lower right”. In the following, we refer to the union of these classes as the “no eye contact class”. A separate class was dedicated to blinks, while yet another class indicated instances in which annotators were unsure about how to decide, e.g. as a result of low image quality. As annotators worked on disjoint sets of videos, one of the authors was present throughout the first sessions in order to ensure consistency.

    To strike a good balance between sufficient coverage and annotation effort, we collected these annotations on a frame-by-frame basis every 30 seconds for the Wisdom From North interviews, and every 15 seconds for The Spa Dr. interviews. We collected annotations for The Spa Dr. on a finer timescale given that the host of that channel almost always keeps eye contact with her interviewees. A coarser time scale would have increased the risk of missing the no eye contact classes in the annotation. In total, we collected 23,131 annotated video frames of which 83% were labelled as "eye contact".
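The reduction from the 11 annotation classes to the binary "eye contact" versus "no eye contact" distinction can be sketched as follows. This is an illustrative sketch only: the label strings "blink" and "unsure" and both function names are hypothetical, and dropping blink and unsure frames from the binary share is an assumption, since the description does not say how those frames were treated.

```python
from collections import Counter

# The eight averted-gaze directions named in the description.
AVERTED = {
    "up", "down", "left", "right",
    "upper left", "lower left", "upper right", "lower right",
}


def to_binary(label):
    """Map a fine-grained gaze label to a binary one, or None if excluded."""
    if label == "eye contact":
        return "eye contact"
    if label in AVERTED:
        return "no eye contact"
    # Assumption: blinks, unsure frames, and unknown labels are left out.
    return None


def eye_contact_share(labels):
    """Fraction of binary-labelled frames that are eye contact."""
    binary = [b for b in map(to_binary, labels) if b is not None]
    if not binary:
        return None
    return Counter(binary)["eye contact"] / len(binary)


# Hypothetical annotations for six sampled frames.
frames = ["eye contact", "up", "blink", "eye contact", "unsure", "lower left"]
share = eye_contact_share(frames)
```

Keeping the fine-grained labels in the data and collapsing them only at analysis time preserves the directional information that the description notes may be useful for future research.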

  20. Videos published by the top 25 indie animation channels on Youtube...

    • dataverse.csuc.cat
    csv, txt
    Updated Jul 14, 2025
    Cite
    Xavier Ribes Guàrdia; Xavier Ribes Guàrdia (2025). Videos published by the top 25 indie animation channels on Youtube (2006-2018) [Dataset]. http://doi.org/10.34810/data1987
    Explore at:
    Available download formats: txt (6795), csv (4127420)
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    CORA.Repositori de Dades de Recerca
    Authors
    Xavier Ribes Guàrdia; Xavier Ribes Guàrdia
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    The file stores the record of the references that were used as units of analysis in the research that resulted in the publication “Is the YouTube Animation Algorithm-Friendly? How YouTube's Algorithm Influences the Evolution of Animation Production on the Internet”. The data set consists of 3,376 videos published by the 25 channels, which total 8,822,179,453 views from the day of publication to the day of sampling.

Statista (2025). YouTube users worldwide 2020-2029 [Dataset]. https://www.statista.com/forecasts/1144088/youtube-users-in-the-world
YouTube users worldwide 2020-2029

Explore at:
51 scholarly articles cite this dataset
