The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach *** billion users in 2029, a new peak. Notably, the number of YouTube users has increased continuously over the past years. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions such as Africa and South America.
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
YouTube was launched in 2005. It was founded by three PayPal employees: Chad Hurley, Steve Chen, and Jawed Karim, who ran the company from an office above a small restaurant in San Mateo. The first...
In 2024, users engaged more with the videos they watched on YouTube than in the previous year. The average number of interactions on YouTube grew to 2.36 in the last measured year, up from 2023, when the comments, likes, and shares on pieces of content hosted on YouTube amounted to approximately 2.1 interactions on average.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains all comments (comments and replies) on the YouTube vision video "Tunnels" by "The Boring Company", fetched on 2020-10-13 using the YouTube API. The comments were classified manually by three people. We performed a single-class labeling of the video comments regarding their relevance for requirements engineering (RE) (ham/spam) and their polarity (positive/neutral/negative). Furthermore, we performed a multi-class labeling of the comments regarding their intention (feature request and problem report) and their topic (efficiency and safety). While a comment can only be either relevant or not relevant and has exactly one polarity, it can have one or more intentions and one or more topics.
One person also classified the replies regarding their relevance for RE. However, the investigation of the replies is ongoing and left for future work.
Remark: For 126 comments and 26 replies, we could not determine the date and time since they were no longer accessible on YouTube at the time this data set was created. In the case of a missing date and time, we inserted "NULL" in the corresponding cell.
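Before any date arithmetic, the "NULL" placeholder has to be turned into a proper missing value. A minimal sketch, assuming (hypothetically) that available dates are stored as ISO 8601 text such as "2020-10-13T08:15:00"; if the spreadsheet reader already returns datetime objects, only the NULL check is needed. The function name parse_comment_date is illustrative and not part of the dataset.

```python
from datetime import datetime
from typing import Optional


def parse_comment_date(cell: str) -> Optional[datetime]:
    """Return the creation time of a comment, or None if it is marked "NULL".

    Assumption: non-missing cells hold ISO 8601 strings (hypothetical format).
    """
    value = cell.strip()
    if value.upper() == "NULL":
        # The comment was no longer accessible, so no date could be determined.
        return None
    return datetime.fromisoformat(value)


# Hypothetical example cells, not taken from the dataset.
cells = ["2020-10-13T08:15:00", "NULL", "2020-10-12T21:03:10"]
dates = [parse_comment_date(c) for c in cells]
missing = sum(d is None for d in dates)
print(f"{missing} of {len(cells)} dates are missing")  # 1 of 3 dates are missing
```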
This data set includes the following files:
Dataset.xlsx contains the raw and labeled video comments and replies:
For each comment, the data set contains:
ID: An identification number generated by YouTube for the comment
Date: The date and time of the creation of the comment
Author: The username of the author of the comment
Likes: The number of likes of the comment
Replies: The number of replies to the comment
Comment: The written comment
Relevance: Label indicating the relevance of the comment for RE (ham = relevant, spam = irrelevant)
Polarity: Label indicating the polarity of the comment
Feature request: Label indicating that the comment requests a feature
Problem report: Label indicating that the comment reports a problem
Efficiency: Label indicating that the comment deals with the topic efficiency
Safety: Label indicating that the comment deals with the topic safety
For each reply, the data set contains:
ID: The identification number of the comment to which the reply belongs
Date: The date and time of the creation of the reply
Author: The username of the author of the reply
Likes: The number of likes of the reply
Comment: The written reply
Relevance: Label indicating the relevance of the reply for RE (ham = relevant, spam = irrelevant)
Detailed analysis results.xlsx contains the detailed results of the 10-fold cross-validation analyses, each repeated ten times, for every considered combination of machine learning algorithm and features
Guide Sheet - Multi-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual multi-class labeling
Guide Sheet - Single-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual single-class labeling
Python scripts for analysis.zip contains the scripts (as Jupyter notebooks) and prepared data (as CSV files) for the analyses
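The evaluation protocol behind Detailed analysis results.xlsx, 10-fold cross-validation repeated ten times, can be sketched with the standard library as below. This is a plain (unstratified) illustration of the protocol, not the authors' notebook code; whether their folds were stratified by label is not stated here. scikit-learn offers the same scheme as RepeatedKFold and RepeatedStratifiedKFold.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int, k: int = 10, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for `repeats` rounds of k-fold CV.

    Every round reshuffles the sample indices and deals them into k folds of
    nearly equal size; each fold is used exactly once as the test set.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(repeats):
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for f, test in enumerate(folds):
            train = [idx for g, fold in enumerate(folds) if g != f for idx in fold]
            yield train, test


# 100 hypothetical samples -> 10 repeats x 10 folds = 100 train/test splits.
splits = list(repeated_kfold(100))
print(len(splits))  # 100
```

Averaging a metric over all 100 splits, rather than over a single 10-fold run, reduces the variance caused by one particular random partition.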
License: GNU General Public License v2.0, http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
With the channel IDs of the top ~5,000 YouTube channels, you can more easily extract data such as videos and playlists from top creators.
The dataset is simple and consists of two columns, name and channel_id. name is the given name of the channel (around 200 names are truncated with '...' rather than being the full name), and channel_id is the first search result returned by the YouTube Search API when sorting by relevance.
Thanks to the socialblade dataset: https://www.kaggle.com/mdhrumil/top-5000-youtube-channels-data-from-socialblade
This data can be used to dig into data from the most active channels on YouTube.
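Because around 200 names are truncated, lookups and joins are safer on channel_id than on name. A minimal sketch that loads the two columns and flags truncated names; it assumes a header row "name,channel_id", and the sample rows and IDs are hypothetical placeholders, not real YouTube channel IDs.

```python
import csv
import io

# Hypothetical two-row sample in the dataset's layout (name, channel_id);
# the channel IDs below are placeholders, not real YouTube IDs.
SAMPLE = """name,channel_id
Example Music Channel,UC_PLACEHOLDER_1
A Very Long Channel Name That Got Cut...,UC_PLACEHOLDER_2
"""


def load_channels(text: str):
    """Parse the CSV text and return (all rows, rows whose name is truncated)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    truncated = [row for row in rows if row["name"].endswith("...")]
    return rows, truncated


rows, truncated = load_channels(SAMPLE)
print(len(rows), len(truncated))  # 2 1
```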
In 2021, YouTube's user base in the United Kingdom amounted to approximately ***** million users. The number of YouTube users in the United Kingdom is projected to reach ***** million users by 2025. User figures have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
I was looking for a specific dataset but never found one.
This is a text dataset focusing on the top comments on the most-viewed YouTube videos (more than 1B views).
I want to thank the YouTube API for helping me, lol, and MongoDB, where I stored all the raw data.
I shared this dataset to see how the world will react and what people will do with it. I hope it helps me learn more about NLP and ML.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The COVYT dataset contains speech samples from individuals who self-reported their COVID-19 infection on public social media platforms (YouTube, Xiaohongshu). These videos, as well as accompanying videos of the same people prior to infection, were mined in an attempt to gather publicly-available data for COVID-19 research. This release includes the links to the original videos along with the accompanying manual segmentation and diarisation that identifies the utterances of the target individuals. We are additionally releasing features derived from the segmented utterances. Finally, the dataset includes partitioning information according to 4 different cross-validation schemes. See the arxiv pre-print for more details: https://arxiv.org/abs/2206.11045
In 2021, YouTube's user base in the United States amounted to approximately ****** million users. The number of YouTube users in the United States is projected to reach ****** million users by 2025. User figures have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
After getting mixed results from the news sources, I decided to analyze the Vice Presidential and Presidential debates using data science. The idea is to use YouTube comments as a medium to gauge sentiment about the debates and to gain insights from the data. In this analysis, we plot common phrases and common words, analyze sentiment, and finally present, for all fellow data science practitioners, a full-fledged dataset containing YouTube comments on the VP and Presidential debates.
Why: After getting mixed results from the news sources about the outcome of the debate, I decided to use data science to help me judge the outcome. With the elections around the corner, technology, or to be precise analytics, plays a key role in shaping our thoughts and supporting our hypotheses.
How: To analyze the YouTube comments, we use Python and various NLP libraries, followed by some data visualization tools. We use the wonders of the awesome data wrangling library pandas and hope to find some interesting insights.
The dataset contains scraped YouTube comments and a sentiment value calculated using the TextBlob library.
Source: YouTube Data API
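TextBlob's sentiment.polarity is a float between -1.0 (negative) and 1.0 (positive). The description does not say whether the scores were turned into classes; a minimal sketch that buckets a polarity score into three labels, where the zero-width neutral band is an assumption you may want to widen:

```python
def polarity_label(score: float, neutral_band: float = 0.0) -> str:
    """Map a TextBlob-style polarity score in [-1, 1] to a coarse label.

    With textblob installed, a score can be obtained via
    TextBlob(comment).sentiment.polarity. The neutral band is an assumption.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity must lie in [-1, 1], got {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


print([polarity_label(s) for s in (0.4, 0.0, -0.25)])
# ['positive', 'neutral', 'negative']
```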
YouTube-VIS is a video instance segmentation dataset. It contains 2,883 high-resolution YouTube videos, a per-pixel category label set covering 40 common object categories such as person, animals and vehicles, 4,883 unique video instances, and 131k high-quality manual annotations.
The YouTube-VIS dataset is split into 2,238 training videos, 302 validation videos and 343 test videos.
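As a quick consistency check, the three splits add up to the 2,883 videos mentioned above, with roughly three quarters of the videos used for training:

```python
splits = {"train": 2238, "validation": 302, "test": 343}
total = sum(splits.values())
assert total == 2883  # matches the stated number of videos

for name, count in splits.items():
    print(f"{name}: {count} videos ({count / total:.1%})")
# train: 2238 videos (77.6%)
# validation: 302 videos (10.5%)
# test: 343 videos (11.9%)
```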
No files were removed or altered during preprocessing.
To use this dataset:
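```python
import tensorflow_datasets as tfds

# Load the training split of YouTube-VIS as a tf.data.Dataset.
ds = tfds.load('youtube_vis', split='train')
# Print the first four examples.
for ex in ds.take(4):
    print(ex)
```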
See the guide for more information on tensorflow_datasets.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
News dissemination plays a vital role in helping people adopt beneficial actions during public health emergencies, thereby significantly reducing the adverse effects of such events. Based on big data from YouTube, this research study takes the declaration of the COVID-19 National Public Health Emergency (PHE) as the event impact and employs a difference-in-differences (DiD) model to investigate the effect of the PHE on the news dissemination strength of relevant videos. The study findings indicate that the views, comments, and likes on relevant videos increased significantly during the COVID-19 public health emergency. Moreover, the public's response to the PHE was rapid: growth in comments and views was highest within the first week of the public health emergency, followed by a gradual decline and a return to normal levels within four weeks. In addition, during the COVID-19 public health emergency, and across different types of media, lifestyle bloggers, local media, and institutional media showed higher growth in the news dissemination strength of relevant videos than news & political bloggers, foreign media, and personal media, respectively. Further, the audience attracted by related news tends to display a certain level of stickiness, so this audience may subscribe to these channels during public health emergencies, which confirms the incentive mechanisms of social media platforms to foster relevant news dissemination during public health emergencies. The proposed findings provide essential insights into effective news dissemination in potential future public health events.
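The core idea of a difference-in-differences (DiD) estimate is to compare the before/after change in a treated group with the change in a control group over the same period. The study's exact specification (covariates, fixed effects, how treated and control videos are defined) is not given here, so the sketch below shows only the textbook 2x2 estimator on invented numbers (for example, mean daily views); it is not the authors' model. In the regression y = b0 + b1*Treat + b2*Post + b3*(Treat x Post) + e, the OLS estimate of b3 equals this same quantity.

```python
from statistics import mean
from typing import Sequence


def did_estimate(treated_pre: Sequence[float], treated_post: Sequence[float],
                 control_pre: Sequence[float], control_post: Sequence[float]) -> float:
    """Classic 2x2 difference-in-differences estimate:
    (change in the treated group) - (change in the control group)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Invented toy values (e.g. mean daily views per video), not data from the study.
effect = did_estimate(treated_pre=[100, 120], treated_post=[180, 200],
                      control_pre=[90, 110], control_post=[100, 120])
print(effect)  # 70
```

Subtracting the control group's change removes trends that would have occurred anyway, which is what lets the remainder be attributed to the event.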
In 2023, all the analyzed channels with an audience between 50,000 and 55 million subscribers received a total of over 418,000 dislikes on YouTube, against approximately 17 million likes. In comparison, all the tiny accounts analyzed, which had up to 500 subscribers, accumulated a total of one million likes, as well as 53,600 dislikes and 41,430 comments.
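Putting the two groups on a common scale shows that the tiny accounts received more than twice as many dislikes relative to their likes. The calculation uses only the totals quoted above; since "over 418,000" and "approximately 17 million" are rounded, the ratios are approximate.

```python
# Totals as stated above (approximate, so the ratios are approximate too).
groups = {
    "50,000 to 55 million subscribers": {"likes": 17_000_000, "dislikes": 418_000},
    "up to 500 subscribers": {"likes": 1_000_000, "dislikes": 53_600},
}

dislikes_per_100_likes = {
    name: 100 * g["dislikes"] / g["likes"] for name, g in groups.items()
}
for name, ratio in dislikes_per_100_likes.items():
    print(f"{name}: {ratio:.2f} dislikes per 100 likes")
# 50,000 to 55 million subscribers: 2.46 dislikes per 100 likes
# up to 500 subscribers: 5.36 dislikes per 100 likes
```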
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
RealVAD: A Real-world Dataset for Voice Activity Detection
The task of automatically detecting "Who is Speaking and When" is broadly named Voice Activity Detection (VAD). Automatic VAD is an important task and the foundation of several domains, e.g., analyses of human-human and human-computer/robot/virtual-agent interaction, as well as industrial applications.
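Answering "who is speaking and when" usually means turning frame-level decisions into time segments per speaker. The RealVAD annotation format is documented in its ReadMe.txt and is not reproduced here; the sketch below simply assumes one binary label per video frame for a single panelist (1 = speaking) and a hypothetical frame rate.

```python
from typing import List, Optional, Tuple


def frames_to_segments(labels: List[int], fps: float) -> List[Tuple[float, float]]:
    """Convert per-frame speaking labels (1 = speaking, 0 = silent) into
    (start_seconds, end_seconds) segments for one speaker."""
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for i, speaking in enumerate(labels):
        if speaking and start is None:
            start = i  # a speaking run begins at frame i
        elif not speaking and start is not None:
            segments.append((start / fps, i / fps))  # run ended before frame i
            start = None
    if start is not None:
        # The speaker was still talking in the last frame.
        segments.append((start / fps, len(labels) / fps))
    return segments


# Hypothetical labels for one panelist at an assumed 2 frames per second.
print(frames_to_segments([0, 1, 1, 0, 0, 1], fps=2.0))
# [(0.5, 1.5), (2.5, 3.0)]
```

Running the function once per panelist yields a per-speaker timeline, which is the "who" and "when" of the task.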
The RealVAD dataset is constructed from a YouTube video of a panel discussion lasting approx. 83 minutes. The audio is available from a single channel. One static camera captures all panelists, the moderator and the audience.
Particular aspects of RealVAD dataset are:
The annotations include:
All information regarding the annotations is given in the ReadMe.txt and Acoustic Features README.txt files.
When using this dataset for your research, please cite the following paper in your publication:
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
I never look at a group’s chart until after I’ve fallen for their music. But once that happens, my astrologer brain kicks in. Was there something in the stars that day? This project is my way of testing that idea, using data from 120 K-pop groups.
What’s in the dataset?
Astrological data: Sun signs, moon signs, rising signs (when available), planetary retrogrades, and moon phases at debut
Career metrics: PAKs, music show wins, physical album sales, YouTube views
Time reliability: "Reliable" (verified debut time) or "Unreliable" (date only)
For years, I’ve casually tracked K-pop debuts (read: my YouTube history is 60% comeback stages, 30% astrology videos). When I started learning data analysis, I realized I could finally ask properly: do certain planetary alignments show up more often in "successful" groups? No mysticism. Just dates, numbers, and a lot of spreadsheet tabs.
How the data was collected
Group info and career stats come from Kpopping and SoriData
Debut times were taken from YouTube when available (for newer groups)
For older groups, exact debut times are often unavailable because many didn’t debut with YouTube videos in the early years
All astrological calculations were done using Astro-Seek’s calculator with Seoul as the default location
Some interesting notes
Leo sun signs appear frequently among award-winning boy groups
Want to explore?
Compare different generations: Are 4th-gen groups more likely to have certain signs?
Check if Mercury retrograde at debut had any impact on a group’s early success
This isn’t about proving astrology works. It’s about exploring whether patterns exist between the stars and K-pop success. The data is here for you to analyze and draw your own conclusions.
P.S. If your bias’s Moon sign matches yours… welcome to the "wait, why do I feel so seen?" club.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
These are the publication dates of the music videos of every song in
https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking
In most cases, a music video is published on the same date as the song itself.
It is therefore reasonable to use these dates as release dates.
No source is better than YouTube for covering as many songs as possible.
The file contains no header row.
For 20 songs, no related video could be found, so their dates remain NaN.
This data was retrieved with the YouTube API.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
This is a list of the top 10 YouTubers worldwide by number of subscribers. From this dataset, researchers can analyze the following:
- Which countries use YouTube the most
- Which types of content people want to watch
- Which age groups use social media the most
- How much they earn from YouTube
- What was trending in 2022
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
https://snap.stanford.edu/data/com-Youtube.html
Dataset information
YouTube (http://www.youtube.com/) is a video-sharing web site that includes
a social network. In the YouTube social network, users form friendships with
each other, and users can create groups that other users can join. We consider
such user-defined groups as ground-truth communities. This data is provided
by Alan Mislove et al.
(http://socialnetworks.mpi-sws.org/data-imc2007.html)
We regard each connected component in a group as a separate ground-truth
community. We remove the ground-truth communities that have fewer than 3
nodes. We also provide the top 5,000 communities with the highest quality,
which are described in our paper (http://arxiv.org/abs/1205.6233). As for
the network, we provide the largest connected component.
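The two preprocessing rules above (split every user-defined group into its connected components, then discard components with fewer than 3 nodes) can be sketched as follows. This reads "connected component in a group" as a component of the subgraph induced by the group's members, which is an interpretation; the edge and group lists in the example are toy data.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Set, Tuple


def ground_truth_communities(edges: Iterable[Tuple[int, int]],
                             groups: Iterable[Set[int]],
                             min_size: int = 3) -> List[List[int]]:
    """Split each group into connected components of its induced subgraph.

    Components with fewer than `min_size` nodes are discarded.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    communities = []
    for group in groups:
        members = set(group)
        seen = set()
        for start in members:
            if start in seen:
                continue
            # Breadth-first search that only walks edges between group members.
            component = []
            queue = deque([start])
            seen.add(start)
            while queue:
                node = queue.popleft()
                component.append(node)
                for nbr in adj[node]:
                    if nbr in members and nbr not in seen:
                        seen.add(nbr)
                        queue.append(nbr)
            if len(component) >= min_size:
                communities.append(sorted(component))
    return sorted(communities)


toy_edges = [(1, 2), (2, 3), (4, 5), (5, 6), (6, 4), (7, 8)]
toy_groups = [{1, 2, 3, 4, 5, 6, 7, 8}, {1, 3}]
print(ground_truth_communities(toy_edges, toy_groups))  # [[1, 2, 3], [4, 5, 6]]
```

Here the component {7, 8} and the two singletons of the group {1, 3} (which has no internal edge) fall below the size threshold.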
Network statistics
Nodes: 1,134,890
Edges: 2,987,624
Nodes in largest WCC: 1,134,890 (1.000)
Edges in largest WCC: 2,987,624 (1.000)
Nodes in largest SCC: 1,134,890 (1.000)
Edges in largest SCC: 2,987,624 (1.000)
Average clustering coefficient: 0.0808
Number of triangles: 3,056,386
Fraction of closed triangles: 0.002081
Diameter (longest shortest path): 20
90-percentile effective diameter: 6.5
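For intuition about the triangle-based figures, here is a small sketch that counts triangles and computes the average clustering coefficient (the mean over nodes of the fraction of neighbour pairs that are themselves connected). Nodes with degree below 2 are assigned a coefficient of 0, an assumed convention (also NetworkX's default); SNAP's exact conventions behind 0.0808 are not restated here. The per-node pair loop is only suitable for toy graphs; for the full network a graph library such as SNAP or NetworkX is the practical choice.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, Tuple


def triangle_stats(edges: Iterable[Tuple[int, int]]) -> Tuple[int, float]:
    """Return (number of triangles, average local clustering coefficient).

    Nodes with degree < 2 get a local coefficient of 0 (assumed convention).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    if not adj:
        return 0, 0.0
    corner_count = 0
    local = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Edges among the node's neighbours = triangles having this node as a corner.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        corner_count += links
        local[node] = 2 * links / (k * (k - 1)) if k >= 2 else 0.0
    # Each triangle is counted once at each of its three corners.
    return corner_count // 3, sum(local.values()) / len(local)


# Toy graph: triangle 1-2-3 plus a pendant node 4 attached to 3.
n_triangles, avg_clustering = triangle_stats([(1, 2), (2, 3), (3, 1), (3, 4)])
print(n_triangles, round(avg_clustering, 4))  # 1 0.5833
```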
Community statistics
Number of communities: 8,385
Average community size: 13.50
Average membership size: 0.10
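The "average membership size" of 0.10 is consistent with reading it as the average number of communities a node belongs to, i.e. total node-community memberships divided by the number of nodes. That interpretation is an assumption; under it, at most about 10% of the users belong to any ground-truth community. The arithmetic with the figures above:

```python
n_nodes = 1_134_890
n_communities = 8_385
avg_community_size = 13.50

# Total node-community memberships implied by the community statistics.
total_memberships = n_communities * avg_community_size
avg_membership = total_memberships / n_nodes
print(f"{total_memberships:,.1f} memberships, {avg_membership:.2f} per node")
# 113,197.5 memberships, 0.10 per node
```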
Source (citation)
J. Yang and J. Leskovec. Defining and Evaluating Network Communities based
on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233
Files
com-youtube.ungraph.txt.gz: Undirected YouTube network
com-youtube.all.cmty.txt.gz: YouTube communities
com-youtube.top5000.cmty.txt.gz: YouTube communities (Top 5,000)
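A minimal reader for the two kinds of files, assuming the usual SNAP text layout: the edge list has '#' comment lines followed by one whitespace-separated node pair per line, and each line of a .cmty file lists the (tab-separated) node IDs of one community. The example writes two tiny hypothetical files to a temporary directory instead of using the real downloads.

```python
import gzip
import os
import tempfile
from typing import List, Tuple


def read_edges(path: str) -> List[Tuple[int, int]]:
    """Read a gzipped SNAP edge list, skipping '#' comment lines."""
    edges = []
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            u, v = line.split()[:2]
            edges.append((int(u), int(v)))
    return edges


def read_communities(path: str) -> List[List[int]]:
    """Read a gzipped .cmty file: one community (list of node IDs) per line."""
    with gzip.open(path, "rt") as fh:
        return [[int(tok) for tok in line.split()] for line in fh if line.strip()]


# Demo on two tiny hypothetical files instead of the real downloads.
with tempfile.TemporaryDirectory() as tmp:
    edge_path = os.path.join(tmp, "toy.ungraph.txt.gz")
    cmty_path = os.path.join(tmp, "toy.all.cmty.txt.gz")
    with gzip.open(edge_path, "wt") as fh:
        fh.write("# Undirected graph\n# FromNodeId\tToNodeId\n1\t2\n1\t3\n")
    with gzip.open(cmty_path, "wt") as fh:
        fh.write("1\t2\t3\n2\t3\n")
    sample_edges = read_edges(edge_path)
    sample_communities = read_communities(cmty_path)

print(sample_edges)        # [(1, 2), (1, 3)]
print(sample_communities)  # [[1, 2, 3], [2, 3]]
```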
The graph in the SNAP data set is 1-based, with nodes numbered 1 to
1,157,827.
In the SuiteSparse Matrix Collection, Problem.A is the undirected YouTube
network, a matrix of size n-by-n with n=1,134,890, which is the number of
unique user IDs appearing in any edge.
Problem.aux.nodeid is a list of the node IDs that appear in the SNAP data
set. A(i,j)=1 if person nodeid(i) is friends with person nodeid(j). The
node IDs are the same as in the SNAP data set (1-based).
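The relationship between SNAP IDs and matrix rows can be sketched in Python as below; compressing the IDs that actually occur in edges is why n=1,134,890 even though SNAP IDs run up to 1,157,827. Python indices are 0-based, so row i here corresponds to MATLAB row i+1. Ordering nodeid by ascending SNAP ID is an assumption made for illustration; in practice the order should be taken from Problem.aux.nodeid itself. For the full network the (row, column) pairs would normally be fed into a sparse matrix type such as scipy.sparse.coo_matrix.

```python
from typing import Dict, List, Sequence, Tuple


def build_adjacency(edges: Sequence[Tuple[int, int]]):
    """Map SNAP node IDs to 0-based indices and list the nonzeros of symmetric A."""
    # Unique IDs appearing in any edge (sorted order is an assumption).
    nodeid: List[int] = sorted({n for edge in edges for n in edge})
    index: Dict[int, int] = {nid: i for i, nid in enumerate(nodeid)}
    nonzeros = set()
    for u, v in edges:
        i, j = index[u], index[v]
        # Friendship is undirected, so store both A(i,j) and A(j,i).
        nonzeros.add((i, j))
        nonzeros.add((j, i))
    return nodeid, index, sorted(nonzeros)


# Toy edges using SNAP-style (1-based, non-contiguous) IDs.
nodeid, index, nonzeros = build_adjacency([(5, 9), (9, 42)])
print(nodeid)    # [5, 9, 42]
print(nonzeros)  # [(0, 1), (1, 0), (1, 2), (2, 1)]
```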
C = Problem.aux.Communities_all is a sparse matrix of size n by 16,386
which represents the communities in the com-youtube.all.cmty.txt file.
The kth line in that file defines the kth community, and is the column
C(:,k), where C(i,k)=1 if person nodeid(i) is in the kth community. Row
C(i,:) and row/column i of the A matrix thus refer to the same person,
nodeid(i).
Ctop = Problem.aux.Communities_top5000 is n-by-5000, with the same
structure as the C array above, with the content of the
com-youtube.top5000.cmty.txt.gz file.
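Building C from a community file follows the same pattern: line k of the file becomes column k. A minimal sketch with toy values, where toy_nodeid is a hypothetical stand-in for Problem.aux.nodeid; Ctop would be built the same way from the top-5,000 file.

```python
from typing import List, Sequence, Tuple


def build_membership(nodeid: Sequence[int],
                     communities: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return the nonzeros (i, k) of C, with C(i, k) = 1 if nodeid[i] is in community k.

    Indices are 0-based here (MATLAB's C(i,k) corresponds to (i-1, k-1)).
    """
    index = {nid: i for i, nid in enumerate(nodeid)}
    return sorted({(index[n], k)
                   for k, members in enumerate(communities)
                   for n in members})


toy_nodeid = [5, 9, 42]                 # hypothetical stand-in for Problem.aux.nodeid
toy_communities = [[5, 9], [9, 42, 5]]  # one line of a .cmty file per community
print(build_membership(toy_nodeid, toy_communities))
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1)]
```

Because the same index map is used for rows of C and rows/columns of A, row i of both matrices refers to the same person, as stated above.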
This data set was prepared from 88 open-source YouTube cooking videos. The YouCook dataset contains videos of people cooking various recipes. The videos were downloaded from YouTube and are all in the third-person viewpoint; they represent a significantly more challenging visual problem than existing cooking and kitchen datasets (the background kitchen/scene is different for many and most videos have dynamic camera changes). In addition, frame-by-frame object and action annotations are provided for training data (as well as a number of precomputed low-level features). Finally, each video has a number of human provided natural language descriptions (on average, there are eight different descriptions per video). This dataset has been created to serve as a benchmark in describing complex real-world videos with natural language descriptions.
In 2024, the engagement rate on YouTube content decreased slightly compared to the previous year. The average engagement rate on YouTube was 3.87 percent in the last examined period, down from the 3.97 percent recorded in 2023.