The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of 232.5 million users (+24.91 percent). After a ninth consecutive year of growth, the YouTube user base is estimated to reach 1.2 billion users in 2029, a new peak. Notably, the number of YouTube users has increased continuously over the past years. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights on the number of YouTube users in regions such as Africa and South America.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the past few years, YouTube has become a popular site for broadcasting videos and earning money by publishing videos that showcase a wide variety of skills. For some people it has become their main source of income. Getting videos to trend among viewers is one of the major goals of every content creator. The popularity of any video and its reach to the audience depend entirely on YouTube's recommendation algorithm. This document is a dataset descriptor for a dataset collected over a time span of about 45 days during the Israel-Hamas War.
This data set was prepared from 88 open-source YouTube cooking videos. The YouCook dataset contains videos of people cooking various recipes. The videos were downloaded from YouTube and are all in the third-person viewpoint; they represent a significantly more challenging visual problem than existing cooking and kitchen datasets (the background kitchen/scene differs across many videos, and most videos have dynamic camera changes). In addition, frame-by-frame object and action annotations are provided for the training data (as well as a number of precomputed low-level features). Finally, each video has a number of human-provided natural language descriptions (on average, there are eight different descriptions per video). This dataset has been created to serve as a benchmark for describing complex real-world videos with natural language descriptions.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
When modelling social phenomena, we need to consider more than one medium. Little is known about how platform community characteristics shape discussion and how communicators could best engage each community, taking these characteristics into consideration. In this dataset, we consider comments on TED videos featuring roboticists, shared at TED.com and on YouTube. The textual comments were analysed with the Linguistic Inquiry and Word Count (LIWC) tool.
https://creativecommons.org/publicdomain/zero/1.0/
I have created this dataset for people interested in League of Legends who want to approach the game from a more analytical side.
Most of the data was acquired from Games of Legends (https://gol.gg/tournament/tournament-stats/LEC%20Spring%20Season%202024/) and from the official YouTube channel of the League of Legends EMEA Championship (https://www.youtube.com/c/LEC).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We present an English YouTube dataset manually annotated for hate speech types and targets. The comments to be annotated were sampled from English YouTube comments on videos about the Covid-19 pandemic posted between January 2020 and May 2020. The following sets were annotated: a training set with 51,655 comments (IMSyPP_EN_YouTube_comments_train.csv) and two evaluation sets, one annotated in-context (IMSyPP_EN_YouTube_comments_evaluation_context.csv) and one out-of-context (IMSyPP_EN_YouTube_comments_evaluation_no_context.csv), each based on the same 10,759 comments. The dataset was annotated by 10 annotators, with most (99.9%) of the comments being annotated by two annotators. It was used to train a classification model for hate speech type detection that is publicly available at the following URL: https://huggingface.co/IMSyPP/hate_speech_en.
The dataset consists of the following fields:
Video_ID - YouTube ID of the video under which the comment was posted
Comment_ID - YouTube ID of the comment
Text - text of the comment
Type - type of hate speech
Target - target of the hate speech
Annotator - code of the human annotator
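As a rough sketch of how the released classifier might be queried (assuming the model linked above loads with the standard Hugging Face transformers text-classification pipeline; the example comments are made up and the output labels are whatever the model card defines):

```python
# Minimal sketch: querying the publicly released hate speech type classifier.
# Assumption: the model loads with the standard text-classification pipeline.
from transformers import pipeline

classifier = pipeline("text-classification", model="IMSyPP/hate_speech_en")

comments = [
    "This video explains the situation really well.",
    "People like you should not be allowed to post here.",
]
for comment in comments:
    result = classifier(comment)[0]
    print(f"{result['label']!r} ({result['score']:.2f}): {comment}")
```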
YouTube-BoundingBoxes (YT-BB) is a large-scale data set of video URLs with densely-sampled object bounding box annotations. The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. The objects represent a subset of the MS COCO label set. All video segments were human-annotated with high-precision classification labels and bounding boxes at 1 frame per second.
This dataset provides estimated YouTube RPM (Revenue Per Mille) ranges for different niches in 2025, based on ad revenue earned per 1,000 monetized views.
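For reference, a minimal sketch of the RPM calculation the dataset is based on (the figures are illustrative, not values taken from the dataset):

```python
# RPM (Revenue Per Mille) = ad revenue earned per 1,000 monetized views.
def rpm(ad_revenue_usd: float, monetized_views: int) -> float:
    """Return ad revenue per 1,000 monetized views."""
    return ad_revenue_usd / monetized_views * 1000

print(rpm(ad_revenue_usd=480.0, monetized_views=120_000))  # 4.0 USD RPM
```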
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
News dissemination plays a vital role in helping people adopt beneficial actions during public health emergencies, thereby significantly reducing the adverse influences of such events. Based on big data from YouTube, this study takes the declaration of the COVID-19 National Public Health Emergency (PHE) as the event of interest and employs a difference-in-differences (DiD) model to investigate the effect of the PHE on the news dissemination strength of relevant videos. The findings indicate that views, comments, and likes on relevant videos significantly increased during the COVID-19 public health emergency. Moreover, the public’s response to the PHE was rapid, with the highest growth in comments and views on videos observed within the first week of the public health emergency, followed by a gradual decline and a return to normal levels within four weeks. In addition, during the COVID-19 public health emergency, across different types of media, lifestyle bloggers, local media, and institutional media demonstrated higher growth in the news dissemination strength of relevant videos than news & political bloggers, foreign media, and personal media, respectively. Further, the audience attracted by related news tends to display a certain level of stickiness; this audience may therefore subscribe to these channels during public health emergencies, which confirms the incentive mechanisms of social media platforms to foster relevant news dissemination during public health emergencies. The proposed findings provide essential insights into effective news dissemination in potential future public health events.
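A minimal sketch of the kind of difference-in-differences specification described above (the column names and toy data are assumptions for illustration, not the study's actual variables):

```python
# Toy difference-in-differences (DiD) sketch: effect of the PHE declaration on views.
# "relevant" marks COVID-related videos, "post_phe" marks the period after the
# declaration; the coefficient on relevant:post_phe is the DiD estimate.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "views":    [1200, 1250, 3100, 3300, 900, 950, 1000, 980],
    "relevant": [1, 1, 1, 1, 0, 0, 0, 0],
    "post_phe": [0, 0, 1, 1, 0, 0, 1, 1],
})

model = smf.ols("views ~ relevant * post_phe", data=df).fit()
print(model.summary().tables[1])
```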
The Kinetics dataset is a large-scale, high-quality dataset for human action recognition in videos. The dataset consists of around 500,000 video clips covering 600 human action classes with at least 600 video clips for each action class. Each video clip lasts around 10 seconds and is labeled with a single action class. The videos are collected from YouTube.
Researchers mostly use the dataset launched for ChaLearn Looking At People First Impression Challenge (ECCV Challenge). The CVPR’17 dataset (an extension to the ECCV challenge dataset) consists of video files labelled with Big Five Personality Traits. The dataset consists of 3,000 high-definition YouTube videos featuring YouTubers speaking in English. To create the dataset, selected videos were divided into 10,000 clips with an average duration of 15 seconds. The dataset comprises three sets for training, validation, and testing, with a ratio of 3:1:1. The videos feature YouTubers from various nationalities, genders, and age groups. To label the videos with Big-Five personality traits, Amazon Mechanical Turk (AMT) was used. Each video clip was assigned a label corresponding to its Big-Five values, which range from 0 to 1.
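As a simple illustration of the 3:1:1 split described above (the clip identifiers below are placeholders, not the dataset's actual IDs):

```python
# 10,000 clips split 3:1:1 into train/validation/test (6,000 / 2,000 / 2,000).
import random

clip_ids = [f"clip_{i:05d}" for i in range(10_000)]  # placeholder identifiers
random.seed(0)
random.shuffle(clip_ids)

train, val, test = clip_ids[:6000], clip_ids[6000:8000], clip_ids[8000:]
print(len(train), len(val), len(test))  # 6000 2000 2000
```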
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Dataset Name
A data set of images of faces of people affected with Bell's palsy (Facial palsy).
Dataset Details
Dataset Description
A data set of images of faces of people affected by Bell's palsy (facial palsy), created by curating and editing publicly available YouTube videos. Also included are images of people not affected by it, collected using the same method.
License: CC-BY-4.0
Uses
Can be used to train image models to detect… See the full description on the dataset page: https://huggingface.co/datasets/jasir/palsynet-data.
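If the images are published as a standard Hugging Face dataset, loading them might look like the sketch below (split and column names are assumptions; check the dataset page linked above):

```python
# Minimal sketch: loading the dataset from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("jasir/palsynet-data")   # dataset id taken from the page above
print(ds)                                  # available splits and features
first_split = list(ds.keys())[0]
print(ds[first_split][0].keys())           # e.g. an image column plus a label
```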
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Content
The dataset contains two basic attributes from which you can extract a range of interesting features, from DateTime-based features to text-based features.
The first is the time in the video at which the comment was posted; it is important to note that the live stream started at 2:15 EST.
The second is the comment that was posted; note that non-English comments were removed.
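A minimal sketch of turning the in-video comment time into an EST wall-clock time, assuming the 2:15 start refers to 2:15 PM EST on landing day and that the columns are named video_time and comment (both assumptions):

```python
# Convert an offset into the live stream to an EST wall-clock timestamp.
from datetime import datetime, timedelta

import pandas as pd

# Assumption: stream start interpreted as 2:15 PM EST, 18 Feb 2021 (landing day).
stream_start = datetime(2021, 2, 18, 14, 15)

df = pd.DataFrame({
    "video_time": ["0:05:30", "1:12:04"],   # hours:minutes:seconds into the stream
    "comment": ["Go Percy!", "Touchdown confirmed!"],
})

def to_est(offset: str) -> datetime:
    h, m, s = (int(x) for x in offset.split(":"))
    return stream_start + timedelta(hours=h, minutes=m, seconds=s)

df["est_time"] = df["video_time"].map(to_est)
print(df)
```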
Inspiration
I think it might be interesting to get a better understanding of how people around the world reacted to the rover landing on Mars and the content shown in the video. There were many points where the video lagged, or the site crashed.
CC0
Original Data Source: Perseverance Land on Mars YouTube Live Comments
The ShareGPT4Video dataset is a large-scale resource designed to improve video understanding and generation¹. It features 1.2 million highly descriptive captions⁴ for video clips, surpassing existing datasets in diversity and information content⁴. The captions cover a wide range of aspects, including world knowledge, object properties, spatial relationships, and aesthetic evaluations⁴.
The dataset includes detailed captions of 40K videos generated by GPT-4V¹ and 4.8M videos generated by ShareCaptioner-Video¹. The videos are sourced from YouTube and other user-uploaded video websites, and they cover a variety of scenarios, such as human activities and auto-driving¹.
The ShareGPT4Video dataset also provides the basis for ShareCaptioner-Video, an exceptional video captioner capable of efficiently generating high-quality captions for videos with a wide range of resolutions, aspect ratios, and durations¹.
For example, the dataset includes a detailed caption of a video documenting a meticulous meal preparation by an individual with tattooed forearms¹. The caption describes the individual's actions in detail, from slicing a cucumber to mixing the dressing and adding croutons to the salad¹.
In addition to its use in research, the ShareGPT4Video dataset has been used to train the sharegpt4video-8b model, an open-source video chatbot². This model was trained on open-source video instruction data and is primarily intended for researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence².
(1) arXiv:2406.04325v1 [cs.CV] 6 Jun 2024. https://arxiv.org/pdf/2406.04325. (2) ShareGPT4V: Improving Large Multi-Modal Models with Better Captions. https://arxiv.org/abs/2311.12793. (3) Lin-Chen/sharegpt4video-8b · Hugging Face. https://huggingface.co/Lin-Chen/sharegpt4video-8b. (4) ShareGPT4Video: Improving Video Understanding and Generation with .... https://www.aimodels.fyi/papers/arxiv/sharegpt4video-improving-video-understanding-generation-better-captions. (5) GitHub - ShareGPT4Omni/ShareGPT4Video: An official implementation of .... https://github.com/ShareGPT4Omni/ShareGPT4Video. (6) ShareGPT4Video project page. https://sharegpt4video.github.io/.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Concise comparison of the top 10 YouTube alternatives for content creators in 2025. Covers monetization, audience size, and ideal use cases.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RealVAD: A Real-world Dataset for Voice Activity Detection
The task of automatically detecting “Who is Speaking and When” is broadly known as Voice Activity Detection (VAD). Automatic VAD is a very important task and the foundation of several domains, e.g., human-human, human-computer/robot/virtual-agent interaction analyses, and industrial applications.
The RealVAD dataset is constructed from a YouTube video of a panel discussion lasting approximately 83 minutes. The audio is available from a single channel. A single static camera captures all panelists, the moderator, and the audience.
Particular aspects of the RealVAD dataset are:
It is composed of panelists with different nationalities (British, Dutch, French, German, Italian, American, Mexican, Colombian, Thai). This aspect allows studying the effect of ethnic-origin variety on automatic VAD.
There is a gender balance such that there are four female and five male panelists.
The panelists are sitting in two rows, and they can be gazing at the audience, other panelists, their laptops, the moderator, or anywhere in the room while speaking or not speaking. Therefore, they were captured not only from a frontal view but also from side views, varying with their instantaneous posture and head orientation.
The panelists move freely and perform various spontaneous actions (e.g., drinking water, checking their cell phones, using their laptops), resulting in different postures.
The panelists’ body parts are sometimes partially occluded by their own or others' body parts or belongings (e.g., a laptop).
There are also natural changes in illumination and shadows cast on the wall behind the panelists in the back row.
In particular, for the panelists sitting in the front row, there is sometimes background motion when the person(s) behind them move.
The annotations include:
Upper-body detections of the nine panelists in bounding-box form.
Associated VAD ground-truth (speaking, not-speaking) for nine panelists.
Acoustic features extracted from the video: MFCC and raw filterbank energies (a brief extraction sketch is shown below).
All information regarding the annotations is given in the ReadMe.txt and Acoustic Features README.txt files.
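As an illustration of how MFCCs and filterbank energies of the kind provided could be recomputed from the audio track (a sketch using librosa; the file name, sampling rate, and feature parameters are assumptions, not the dataset's exact settings):

```python
# Minimal sketch: MFCCs and mel filterbank energies from the panel-discussion audio.
import librosa

y, sr = librosa.load("panel_discussion.wav", sr=16_000)        # hypothetical file name
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)             # (13, n_frames)
fbank = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)  # raw filterbank energies
print(mfcc.shape, fbank.shape)
```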
When using this dataset for your research, please cite the following paper in your publication:
C. Beyan, M. Shahid and V. Murino, "RealVAD: A Real-world Dataset and A Method for Voice Activity Detection by Body Motion Analysis", in IEEE Transactions on Multimedia, 2020.
https://researchdata.ntu.edu.sg/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.21979/N9/5G18B1
Existing research on avatar creation is typically limited to laboratory datasets, which are costly to scale and insufficiently representative of the real world. On the other hand, the web abounds with off-the-shelf real-world human videos, but these videos vary in quality and require accurate annotations for avatar creation. To this end, we propose an automatic annotation pipeline with filtering protocols to curate these humans from the web. Our pipeline surpasses state-of-the-art methods on the EMDB benchmark, and the filtering protocols boost verification metrics on web videos. We then curate WildAvatar, a web-scale in-the-wild human avatar creation dataset extracted from YouTube, with 10,000+ different human subjects and scenes. WildAvatar is at least 10x richer than previous datasets for 3D human avatar creation and closer to the real world. To explore its potential, we demonstrate the quality and generalizability of avatar creation methods on WildAvatar. We will publicly release our code, data source links and annotations to push forward 3D human avatar creation and other related fields for real-world applications.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To enable the development of an objective, real-time CPR (cardiopulmonary resuscitation) quality assessment system using object detection, this specialized dataset was created. It was manually compiled and annotated from existing open-source datasets and from screenshots extracted from YouTube video segments, focusing on critical elements such as people, hands, manikins and the CPR action.
The dataset includes five classes representing essential hands-only CPR process components: "CPR_Massage", "man", "Dummy", "hands". These objects were determined to be the main parts of tracking CPR actions, giving a model access to enough contextual data for evaluating the quality of the CPR performed using computer vision.
The purpose of creating this dataset was to provide a targeted, clinically relevant resource for training deep learning models capable of recognizing critical CPR objects. Publicly available datasets often lack detailed medical context, do not focus on CPR-specific objects, or lack sufficient variety and quality in the performed CPR. As a result, this dataset fills an important gap, especially for developing automated medical quality control systems that focus on metrics like CPM (compressions per minute), compression depth and pause time.
Note: This dataset is not meant for the assessment of posture and hand placement accuracy. Even though this is one of the biggest datasets for hands-only CPR object detection, it still lacks enough contextual data for detection under new angles, distances and image distortions.
Overview
This data set consists of links to social network items for 34 different forensic events that took place between August 14th, 2018 and January 6th, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook and Google+), and every video is from YouTube.
Data Collection
We used Social Tracker, along with the social media APIs, to gather most of the collections. For a minor part, we used Twint. In both cases, we provided keywords related to the event to receive the data. It is important to mention that, in procedures like this one, usually only a small fraction of the collected data is in fact related to the event and useful for a further forensic analysis.
Content
We have data from 34 events, and for each of them we provide the following files (a loading sketch follows below):
items_full.csv: Contains links to every social media post that was collected.
images.csv: Lists the collected images. In some files there is a field called "ItemUrl" that refers to the social network post (e.g., a tweet) that mentions that media.
video.csv: URLs of YouTube videos that were gathered about the event.
video_tweet.csv: Contains IDs of tweets and IDs of YouTube videos. A tweet whose ID is in this file has a video in its content. In turn, the link of a YouTube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file.
description.txt: Contains some standard information about the event, and possibly some comments about any specific issue related to it.
In fact, most of the collections do not have all the files above. This is due to changes in our collection procedure over the course of this work.
Events
We divided the events into six groups:
Fire: Devastating fire is the main issue of the event, so most of the informative pictures show flames or burned constructions. 14 events.
Collapse: Most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire). 5 events.
Shooting: Likely images of guns and police officers. Little or no destruction of the environment. 5 events.
Demonstration: Plethora of people on the streets. Possibly some problem took place, but in most cases the demonstration is the actual event. 7 events.
Collision: Traffic collision. Pictures of damaged vehicles in an urban landscape. Possibly there are images with victims on the street. 1 event.
Flood: Events that range from fierce rain to a tsunami. Many pictures depict water. 2 events.
Media Content
Due to the terms of use of the social networks, we do not make the collected texts, images, and videos publicly available. However, we can provide some extra media content related to one (or more) events; please contact the authors.
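A minimal loading sketch for the per-event files listed above (the folder layout and any columns other than "ItemUrl" are assumptions; check each collection's description.txt):

```python
# Load the CSV files of one event collection with pandas.
from pathlib import Path

import pandas as pd

event_dir = Path("events/some_event")               # hypothetical path to one collection

items = pd.read_csv(event_dir / "items_full.csv")   # links to all collected posts
images = pd.read_csv(event_dir / "images.csv")      # collected images (may have ItemUrl)
videos = pd.read_csv(event_dir / "video.csv")       # YouTube video URLs for the event

print(len(items), len(images), len(videos))
```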
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A structured dataset comparing viral view thresholds and timeframes across major platforms, including TikTok, YouTube (long-form & Shorts), Instagram Reels, Facebook, Twitter (X), LinkedIn Video, and LinkedIn Posts.