As of June 2022, more than 500 hours of video were uploaded to YouTube every minute. This equates to approximately 30,000 hours of newly uploaded content per hour. The amount of content on YouTube has increased dramatically as consumers' appetite for online video has grown. In fact, the number of video content hours uploaded every 60 seconds grew by around 40 percent between 2014 and 2020.
YouTube global users
Online video is one of the most popular digital activities worldwide, with 27 percent of internet users worldwide watching more than 17 hours of online videos on a weekly basis in 2023. It was estimated that in 2023 YouTube would reach approximately 900 million users worldwide. In 2022, the video platform was one of the leading media and entertainment brands worldwide, with a value of more than 86 billion U.S. dollars.
YouTube video content consumption
The most viewed YouTube channels of all time have racked up billions of viewers, millions of subscribers and cover a wide variety of topics ranging from music to cosmetics. The YouTube channel owner with the most video views is Indian music label T-Series, which counted 217.25 billion lifetime views. Other popular YouTubers are gaming personalities such as PewDiePie, DanTDM and Markiplier.
https://creativecommons.org/publicdomain/zero/1.0/
YouTube maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments and likes). Note that they’re not the most-viewed videos overall for the calendar year”.
Note that this dataset is a structurally improved version of this dataset.
This dataset includes several months (and counting) of data on daily trending YouTube videos. Data is included for the IN, US, GB, DE, CA, FR, RU, BR, MX, KR, and JP regions (India, USA, Great Britain, Germany, Canada, France, Russia, Brazil, Mexico, South Korea, and Japan, respectively), with up to 200 listed trending videos per day.
Each region’s data is in a separate file. Data includes the video title, channel title, publish time, tags, views, likes and dislikes, description, and comment count.
The data also includes a category_id field, which varies between regions. To retrieve the categories for a specific video, find it in the associated JSON. One such file is included for each of the 11 regions in the dataset.
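The category lookup described above can be sketched as follows. This is a minimal illustration that assumes each region's JSON file follows the layout of the YouTube Data API `videoCategories` response (an `items` list whose entries carry an `id` and a `snippet.title`). The inline JSON excerpt, the sample rows, and the helper name `build_category_lookup` are hypothetical and only stand in for the real files.

```python
import json

# Hypothetical excerpt of one region's category file. The layout is an
# assumption based on the YouTube Data API "videoCategories" response.
CATEGORY_JSON = """
{
  "items": [
    {"id": "10", "snippet": {"title": "Music"}},
    {"id": "24", "snippet": {"title": "Entertainment"}}
  ]
}
"""


def build_category_lookup(raw_json: str) -> dict[int, str]:
    """Map each numeric category id to its human-readable title."""
    payload = json.loads(raw_json)
    return {int(item["id"]): item["snippet"]["title"] for item in payload["items"]}


categories = build_category_lookup(CATEGORY_JSON)

# Hypothetical rows standing in for records read from one region's data file.
videos = [
    {"title": "Example song", "category_id": 10},
    {"title": "Example sketch", "category_id": 24},
    {"title": "Uncategorised clip", "category_id": 999},
]

# Attach a category name to every row; ids missing from the JSON become "Unknown".
labelled = [
    {**video, "category": categories.get(video["category_id"], "Unknown")}
    for video in videos
]

for row in labelled:
    print(row["title"], "->", row["category"])
```

Because category_id varies between regions, the lookup should be built from the JSON file that belongs to the same region as the data being labelled.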
For more information on specific columns in the dataset refer to the column metadata.
This dataset was collected using the YouTube API. This dataset is the updated version of Trending YouTube Video Statistics.
Possible uses for this dataset could include:
- Sentiment analysis in a variety of forms.
- Categorizing YouTube videos based on their comments and statistics.
- Training ML algorithms like RNNs to generate their own YouTube comments.
- Analyzing what factors affect how popular a YouTube video will be.
- Statistical analysis over time.
For further inspiration, see the kernels on this dataset!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Streaming is by far the predominant type of traffic in communication networks. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at network, transport, and application layer with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 different individual bandwidth settings measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3G/4G traces, and 4,022 runs with pre-defined bandwidth changes. This corresponds to 332 GB of video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.
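As a rough sanity check on the figures above, the run counts can be totalled and related to the measured hours. The sketch assumes the three groups of runs are disjoint and together account for all 1,081 hours, which the description does not state explicitly. The variable names are illustrative.

```python
# Run counts and total measurement time as quoted in the description.
limited_bandwidth_runs = 5181
trace_runs = 1939              # emulated 3G/4G traces
bandwidth_change_runs = 4022   # pre-defined bandwidth changes
total_hours = 1081

# Assumption: the three groups do not overlap and cover the whole dataset.
total_runs = limited_bandwidth_runs + trace_runs + bandwidth_change_runs
mean_minutes_per_run = total_hours * 60 / total_runs

print(f"total runs: {total_runs}")
print(f"mean measured time per run: {mean_minutes_per_run:.2f} min")
```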
The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of 232.5 million users (+24.91 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 1.2 billion users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like Africa and South America.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains data on the most-watched YouTube videos up to April 2024. It contains columns such as views, artist, and channel. The data is ranked by number of views.
The number of YouTube users in India was forecast to increase continuously between 2024 and 2029 by a total of 222.2 million users (+34.88 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 859.26 million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in countries like Sri Lanka and Nepal.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If using this dataset, please cite the following paper and the current Zenodo repository.
This dataset is described in detail in the following paper:
The research work leading to this dataset was conducted at the Department of Electrical Engineering (ESAT), KU Leuven.
This dataset contains electroencephalogram (EEG) data collected from 19 young participants with normal or corrected-to-normal eyesight when they were watching a series of carefully selected YouTube videos. The videos were muted to avoid the confounds introduced by audio. For synchronization, a square box was encoded outside of the original frames and flashed every 30 seconds in the top right corner of the screen. A photosensor, detecting the light changes from this flashing box, was affixed to that region using black tape to ensure that the box did not distract participants. The EEG data was recorded using a BioSemi ActiveTwo system at a sample rate of 2048 Hz. Participants wore a 64-channel EEG cap, and 4 electrooculogram (EOG) sensors were positioned around the eyes to track eye movements.
The dataset includes a total of (19 subjects x 63 min + 9 subjects x 24 min) of data. Further details can be found in the following section.
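The total recording time implied by the expression above can be worked out directly. This is plain arithmetic on the quoted figures and uses the nominal 63 and 24 minute durations rather than exact clip lengths.

```python
# Nominal viewing time per subset, as quoted in the description.
single_shot_minutes = 19 * 63   # 19 subjects x 63 min
mr_bean_minutes = 9 * 24        # 9 subjects x 24 min

total_minutes = single_shot_minutes + mr_bean_minutes
total_hours = total_minutes / 60

print(f"single-shot: {single_shot_minutes} min, MrBean: {mr_bean_minutes} min")
print(f"total: {total_minutes} min (about {total_hours:.2f} h)")
```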
The dataset is divided into two subsets: Single-shot and MrBean, based on the characteristics of the video stimuli.
The stimuli of this dataset consist of 13 single-shot videos (63 min in total), each depicting a single individual engaging in various activities such as dancing, mime, acrobatics, and magic shows. All the participants watched this video collection.
Video ID | Link | Start time (s) | End time (s) |
---|---|---|---|
01_Dance_1 | https://youtu.be/uOUVE5rGmhM | 8.54 | 231.20 |
03_Acrob_1 | https://youtu.be/DjihbYg6F2Y | 4.24 | 231.91 |
04_Magic_1 | https://youtu.be/CvzMqIQLiXE | 3.68 | 348.17 |
05_Dance_2 | https://youtu.be/f4DZp0OEkK4 | 5.05 | 227.99 |
06_Mime_2 | https://youtu.be/u9wJUTnBdrs | 5.79 | 347.05 |
07_Acrob_2 | https://youtu.be/kRqdxGPLajs | 183.61 | 519.27 |
08_Magic_2 | https://youtu.be/FUv-Q6EgEFI | 3.36 | 270.62 |
09_Dance_3 | https://youtu.be/LXO-jKksQkM | 5.61 | 294.17 |
12_Magic_3 | https://youtu.be/S84AoWdTq3E | 1.76 | 426.36 |
13_Dance_4 | https://youtu.be/0wc60tA1klw | 14.28 | 217.18 |
14_Mime_3 | https://youtu.be/0Ala3ypPM3M | 21.87 | 386.84 |
15_Dance_5 | https://youtu.be/mg6-SnUl0A0 | 15.14 | 233.85 |
16_Mime_6 | https://youtu.be/8V7rhAJF6Gc | 31.64 | 388.61 |
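The start and end times in the table give the length of each excerpt. The sketch below copies those values and sums them, which gives a total consistent with the stated 63 minutes. The names `CLIPS` and `durations` are illustrative and not part of the dataset.

```python
# (video id, start time in s, end time in s), copied from the table above.
CLIPS = [
    ("01_Dance_1", 8.54, 231.20),
    ("03_Acrob_1", 4.24, 231.91),
    ("04_Magic_1", 3.68, 348.17),
    ("05_Dance_2", 5.05, 227.99),
    ("06_Mime_2", 5.79, 347.05),
    ("07_Acrob_2", 183.61, 519.27),
    ("08_Magic_2", 3.36, 270.62),
    ("09_Dance_3", 5.61, 294.17),
    ("12_Magic_3", 1.76, 426.36),
    ("13_Dance_4", 14.28, 217.18),
    ("14_Mime_3", 21.87, 386.84),
    ("15_Dance_5", 15.14, 233.85),
    ("16_Mime_6", 31.64, 388.61),
]

# Length of each excerpt in seconds.
durations = {video_id: end - start for video_id, start, end in CLIPS}

total_seconds = sum(durations.values())
total_minutes = total_seconds / 60
longest = max(durations, key=durations.get)

print(f"{len(CLIPS)} clips, {total_seconds:.2f} s in total ({total_minutes:.1f} min)")
print(f"longest excerpt: {longest} ({durations[longest]:.2f} s)")
```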
Additionally, 9 participants watched an extra 24-minute clip from the first episode of Mr. Bean, where multiple (moving) objects may exist and interact, and the camera viewpoint may change. The subject IDs and the signal quality files are inherited from the single-shot dataset.
Video ID | Link | Start time (s) | End time (s) |
---|---|---|---|
Mr_Bean | https://www.youtube.com/watch?v=7Im2I6STbms | 39.77 | 1495.00 |
This research is funded by the Research Foundation - Flanders (FWO) project No G081722N, junior postdoctoral fellowship fundamental research of the FWO (for S. Geirnaert, No. 1242524N), the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No 802895), the Flemish Government (AI Research Program), and the PDM mandate from KU Leuven (for S. Geirnaert, No PDMT1/22/009).
We also thank the participants for their time and effort in the experiments.
Executive researcher: Yuanyuan Yao, yuanyuan.yao@kuleuven.be
Led by: Prof. Alexander Bertrand, alexander.bertrand@kuleuven.be
https://creativecommons.org/publicdomain/zero/1.0/
Car crash dataset RUSSIA 2022-2023 is a large driving video dataset that contains over 500 high-resolution videos of various driving scenarios. The dataset was created to aid the development and testing of autonomous driving systems and other related technologies. It includes videos from Russia, captured in a diverse set of locations, weather conditions, and lighting conditions, with each video lasting about 10 seconds. The videos are annotated with bounding boxes around objects such as different types of cars, pedestrians, and cyclists, as well as traffic signs and traffic lights. Additionally, the dataset includes metadata for each video. Car crash dataset RUSSIA 2022-2023 is considered to be one of the few datasets from Russia on this topic. It was created by 7 students from MIEM HSE, Moscow. The first version was published on 9 May 2023.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” Proceedings of the 26th International Conference on Human-Computer Interaction (HCII 2024), Washington, USA, 29 June - 4 July 2024. (Accepted as a Late Breaking Paper, Preprint Available at: https://doi.org/10.48550/arXiv.2406.07693)
Abstract
This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.
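The three-way sentiment labelling mentioned above can be illustrated with the cut-offs commonly used with VADER's compound score (at least 0.05 for positive, at most -0.05 for negative, neutral otherwise). This is only a sketch: the thresholds actually used by the dataset authors are not stated here, the compound scores below are made-up inputs rather than VADER output, and `sentiment_class` is a hypothetical helper.

```python
def sentiment_class(compound: float) -> str:
    """Map a compound score in [-1, 1] to one of three sentiment classes.

    The +/-0.05 cut-offs are the conventional VADER defaults and are an
    assumption here, not a value taken from the dataset description.
    """
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


# Hypothetical compound scores standing in for scored video titles.
example_scores = {
    "title_a": 0.62,
    "title_b": -0.41,
    "title_c": 0.01,
}

labels = {title: sentiment_class(score) for title, score in example_scores.items()}
print(labels)
```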
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ITTV is a publicly available dataset of Italian TV programs introduced in
Alessandro Ilic Mezza, Paolo Sani, and Augusto Sarti, "Automatic TV Genre Classification Based on Visually-Conditioned Deep Audio Features," in 2023 31st European Signal Processing Conference (EUSIPCO), 2023.
ITTV consists of 2625 manually annotated YouTube videos, totaling over 670 hours. Each clip is assigned one of seven classes:
ITTV genre taxonomy is similar to that of the well-known RAI dataset described in
Maurizio Montagnuolo and Alberto Messina, "Parallel neural networks for multimodal video genre classification,” Multimedia Tools and Applications, vol. 41, no. 1, pp. 125–159, 2009.
The dataset contains genre annotations and metadata in CSV format. Please note that audio data is not provided.
We provide the annotations for a balanced training (1575 clips) and validation (525 clips) split, as well as for a disjoint test set containing 525 installments from TV programs not included in the development set.
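The split sizes above can be checked against the stated total of 2625 clips. The sketch also derives per-class counts for the training and validation sets under the assumption that "balanced" means an equal number of clips for each of the seven classes. The description does not say whether the test set is balanced, so it is left out of that step.

```python
NUM_CLASSES = 7
splits = {"train": 1575, "validation": 525, "test": 525}

# The three splits should add up to the full dataset.
total_clips = sum(splits.values())
fractions = {name: count / total_clips for name, count in splits.items()}

# Assumption: "balanced" means the same number of clips per class.
per_class = {
    name: splits[name] // NUM_CLASSES for name in ("train", "validation")
}

print(f"total clips: {total_clips}")
for name, fraction in fractions.items():
    print(f"{name}: {splits[name]} clips ({fraction:.0%})")
print(f"clips per class: {per_class}")
```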
As YouTube continuously updates, some videos may not be available in the future. Although we intend to keep ITTV updated as best as possible, please note that some content may not be available at any given time.
Some YouTube videos (especially from the Football class and, to a lesser extent, the Cartoons class) may only be available in some countries due to regional restrictions imposed by the content creator. All videos are known to be accessible from Italy (last accessed on Nov. 25th, 2022).
Please contact Alessandro Ilic Mezza for further questions (e-mail: alessandroilic.mezza@polimi.it).
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The data was collected from well-known cookery YouTube channels in India. The main focus was on collecting viewers' comments written in Hinglish. The datasets are taken from the top two Indian cooking channels, Nisha Madhulika and Kabita's Kitchen.
The comments in both datasets are divided into seven categories:
Label 1- Gratitude
Label 2- About the recipe
Label 3- About the video
Label 4- Praising
Label 5- Hybrid
Label 6- Undefined
Label 7- Suggestions and queries
All the labelling has been done manually.
Nisha Madhulika dataset:
Dataset characteristics: Multivariate
Number of instances: 4900
Area: Cooking
Attribute characteristics: Real
Number of attributes: 3
Date donated: March, 2019
Associate tasks: Classification
Missing values: Null
Kabita Kitchen dataset:
Dataset characteristics: Multivariate
Number of instances: 4900
Area: Cooking
Attribute characteristics: Real
Number of attributes: 3
Date donated: March, 2019
Associate tasks: Classification
Missing values: Null
Each channel has two separate dataset files, named the preprocessing file and the main file.
The files with preprocessing names are generated after doing the preprocessing and exploratory data analysis on both the datasets. This file includes:
The main file includes:
Please cite the paper
https://www.mdpi.com/2504-2289/3/3/37
MDPI and ACS Style
Kaur, G.; Kaushik, A.; Sharma, S. Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach. Big Data Cogn. Comput. 2019, 3, 37.
Abstract: We present a method for estimating the ideology of political YouTube videos. The subfield of estimating ideology as a latent variable has often focused on traditional actors such as legislators while more recent work has used social media data to estimate the ideology of ordinary users, political elites, and media sources. We build on this work to estimate the ideology of a political YouTube video. First, we start with a matrix of political Reddit posts linking to YouTube videos and apply correspondence analysis to place those videos in an ideological space. Second, we train a language model with those estimated ideologies as training labels, enabling us to estimate the ideologies of videos not posted on Reddit. These predicted ideologies are then validated against human labels. We demonstrate the utility of this method by applying it to the watch histories of survey respondents to evaluate the prevalence of echo chambers on YouTube in addition to the association between video ideology and viewer engagement. Our approach gives video-level scores based only on supplied text metadata, is scalable, and can be easily adjusted to account for changes in the ideological landscape. Keywords: Ideology estimation, YouTube, latent variable This folder contains the replication materials for "Estimating the Ideology of Political YouTube Videos."
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
These instructional videos walk users through the portal and its different features.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT This research aims to analyze the use of YouTube as a useful platform for the activities of library and information science professionals in Brazilian academic libraries. It relates the audiovisual practices of university libraries to the encouragement of these activities and focuses on the importance of the librarian as a content producer in the digital environment. The survey results serve as reference material for information scientists and managers of information units interested in sharing audiovisual information as a new way of relating to their users. Finally, based on the results, it is recommended to plan the communication strategy on social media platforms such as YouTube and to prepare relevant content to engage with subscribers and users.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for First Impressions V2
The First Impressions data set comprises 10,000 clips (average duration 15 s) extracted from more than 3,000 different YouTube high-definition (HD) videos of people facing a camera and speaking in English. The videos are split into training, validation and test sets with a 3:1:1 ratio. People in the videos vary in gender, age, nationality, and ethnicity. Videos are labeled with personality trait variables. Amazon Mechanical Turk (AMT) was… See the full description on the dataset page: https://huggingface.co/datasets/yeray142/first-impressions-v2.
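The 3:1:1 ratio translates into concrete split sizes as sketched below. The helper `split_by_ratio` is hypothetical and simply divides a total count proportionally. Which clips end up in which split is defined by the dataset itself and is not reproduced here.

```python
def split_by_ratio(total: int, ratio: tuple[int, ...]) -> list[int]:
    """Divide `total` items into parts proportional to `ratio`.

    Any remainder left by integer division is added to the first part so
    that the sizes always sum to `total`.
    """
    parts = sum(ratio)
    sizes = [total * share // parts for share in ratio]
    sizes[0] += total - sum(sizes)
    return sizes


train, validation, test = split_by_ratio(10_000, (3, 1, 1))
print(f"train={train}, validation={validation}, test={test}")
```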
Characteristics of published YouTube video reviews. Data extracted from published manuscripts: an Excel spreadsheet with two tabs, one for the original sample and another for newer manuscripts. The row titled PMID indicates the PubMed ID number and identifies the article that the data is taken from. File: DE on review methods V2.xls
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Nara Association of Medical Technologists introduced online Off-Job Training (Off-JT) starting from FY2020 in response to the COVID-19 pandemic. This study aims to evaluate the online Off-JT, which differs from the traditional face-to-face format. First, we compared the online format's ability to attract participants with that of the face-to-face format, based on the number of training sessions and attendees. Despite having fewer training sessions (40.8% fewer), the online format had an average attendance 105.4% higher (39.7 vs. 19.3) than the face-to-face format. To enhance participant convenience, we offered a limited number of live and video-on-demand (VOD) sessions on YouTube and evaluated their usefulness through an online survey focusing on work-life balance (WLB). The survey results showed that 81.9% (458/559) of respondents reported an improvement in WLB. The effect on WLB improvement varied depending on the viewing method, with VOD sessions showing 84.1% (376/447) and live sessions showing 73.2% (82/112). We believe that the increased ability of the online Off-JT to attract participants is mainly due to the elimination of travel burdens through internet-connected devices. The combination of live and VOD sessions on YouTube allowed participants to adjust their viewing time, leading to better allocation of free time and improved WLB. The online Off-JT and VOD delivery have been shown to enhance convenience for participants by removing geographical and time constraints, resulting in positive effects.
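The survey percentages quoted above can be reproduced from the reported counts. The sketch also confirms that the VOD and live groups add up to the overall figures. The helper `pct` is illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# (respondents reporting improved WLB, all respondents), as quoted above.
vod = (376, 447)
live = (82, 112)

# The overall figure is the sum of the two viewing methods.
overall = (vod[0] + live[0], vod[1] + live[1])

print(f"overall: {overall[0]}/{overall[1]} = {pct(*overall)}%")
print(f"VOD:     {vod[0]}/{vod[1]} = {pct(*vod)}%")
print(f"live:    {live[0]}/{live[1]} = {pct(*live)}%")
```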
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The video subtitle images were collected from 24 videos shared on Facebook and YouTube. The subtitle text is in Thai and English and covers Thai characters, Roman characters, Thai numerals, Arabic numerals, and special characters, for a total of 157 distinct characters.
In the data-preprocessing step, we converted all 24 videos to images and obtained 2,700 images with subtitle text. The size of the subtitle text image was 1280x720 pixels and it was stored in JPG format. Further, we generated the ground truth from 4,224 subtitle images using the labelImg program. Also, the labels were then assigned to each subtitle image. Note that the number before the label is the order of the subtitle text image.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RealVAD: A Real-world Dataset for Voice Activity Detection
The task of automatically detecting “Who is Speaking and When” is broadly named as Voice Activity Detection (VAD). Automatic VAD is a very important task and also the foundation of several domains, e.g., human-human, human-computer/ robot/ virtual-agent interaction analyses, and industrial applications.
The RealVAD dataset is constructed from a YouTube video of a panel discussion lasting approx. 83 minutes. The audio is available from a single channel. There is one static camera capturing all panelists, the moderator and the audience.
Particular aspects of RealVAD dataset are:
It is composed of panelists with different nationalities (British, Dutch, French, German, Italian, American, Mexican, Colombian, Thai). This aspect allows studying the effect of the variety of ethnic origins on automatic VAD.
There is a gender balance such that there are four female and five male panelists.
The panelists are sitting in two rows and they can be gazing at the audience, other panelists, their laptop, the moderator or anywhere in the room while speaking or not speaking. Therefore, they were captured not only from a frontal view but also from a side view, varying with their instantaneous posture and head orientation.
The panelists are moving freely and are doing various spontaneous actions (e.g., drinking water, checking their cell phone, using their laptop, etc.), resulting in different postures.
The panelists’ body parts are sometimes partially occluded by their own or others' body parts or belongings (e.g., a laptop).
There are also natural changes of illumination and shadow rising on the wall behind the panelists in the back row.
For the panelists sitting in the front row in particular, there is sometimes background motion when the person(s) behind them move.
The annotations include:
The upper body detection of nine panelists in bounding box form.
Associated VAD ground-truth (speaking, not-speaking) for nine panelists.
Acoustic features extracted from the video: MFCC and raw filterbank energies.
All information regarding the annotations is given in the ReadMe.txt and Acoustic Features README.txt files.
When using this dataset for your research, please cite the following paper in your publication:
C. Beyan, M. Shahid and V. Murino, "RealVAD: A Real-world Dataset and A Method for Voice Activity Detection by Body Motion Analysis", in IEEE Transactions on Multimedia, 2020.
The proposed Extended-YouTube Faces (E-YTF) is an extension of the famous YouTube Faces (YTF) dataset and is specifically designed to further push the challenges of face recognition by addressing the problem of open-set face identification from heterogeneous data i.e. still images vs video.