This dataset contains statistics for a selection of YouTube videos, capturing metrics such as views, comments, likes, dislikes, and the timestamp when the data was recorded. The dataset provides insights into the popularity and engagement levels of these videos as of April 15, 2019. This data can be useful for analyzing trends in video performance, user engagement, and the impact of content over time.
File Description: This CSV file contains detailed statistics for a set of YouTube videos, including unique video identifiers and various engagement metrics. Each row represents a different video, and the columns provide specific data points related to the video's performance.
videostatsid: Unique identifier for each video statistics entry.
ytvideoid: Unique YouTube video identifier.
views: The total number of views the video has received.
comments: The total number of comments posted on the video.
likes: The total number of likes the video has received.
dislikes: The total number of dislikes the video has received.
timestamp: The date and time when the statistics were recorded, in the format YYYY-MM-DD HH:MM.
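A file with this schema can be loaded and sanity-checked with pandas; a minimal sketch, in which the sample rows and the derived `likes_per_1k_views` column are invented for illustration:

```python
import io
import pandas as pd

# Hypothetical two-row sample matching the documented schema;
# the real file's values and name will differ.
sample = io.StringIO(
    "videostatsid,ytvideoid,views,comments,likes,dislikes,timestamp\n"
    "1,dQw4w9WgXcQ,1000,10,100,5,2019-04-15 12:30\n"
    "2,abc123xyz45,2500,42,310,12,2019-04-15 12:31\n"
)

df = pd.read_csv(sample, parse_dates=["timestamp"])

# A simple engagement ratio: likes per 1,000 views
df["likes_per_1k_views"] = df["likes"] / df["views"] * 1000
print(df[["ytvideoid", "views", "likes_per_1k_views"]])
```

Parsing the timestamp column up front makes time-based grouping and trend analysis straightforward later on.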
The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After a ninth consecutive year of growth, the YouTube user base is estimated to reach *** billion users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over recent years. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like Africa and South America.
The number of YouTube users in India was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After a ninth consecutive year of growth, the YouTube user base is estimated to reach ****** million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over recent years. User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in countries like Sri Lanka and Nepal.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all comments (comments and replies) of the YouTube vision video "Tunnels" by "The Boring Company", fetched on 2020-10-13 using the YouTube API. The comments were classified manually by three persons. We performed single-class labeling of the video comments regarding their relevance for requirements engineering (RE) (ham/spam) and their polarity (positive/neutral/negative). Furthermore, we performed multi-class labeling of the comments regarding their intention (feature request and problem report) and their topic (efficiency and safety). While a comment can only be relevant or not relevant and have only one polarity, it can have one or more intentions and one or more topics.
For the replies, one person also classified them regarding their relevance for RE. However, the investigation of the replies is ongoing and left for future work.
Remark: For 126 comments and 26 replies, we could not determine the date and time since they were no longer accessible on YouTube at the time this data set was created. In the case of a missing date and time, we inserted "NULL" in the corresponding cell.
This data set includes the following files:
Dataset.xlsx contains the raw and labeled video comments and replies:
For each comment, the data set contains:
ID: An identification number generated by YouTube for the comment
Date: The date and time of the creation of the comment
Author: The username of the author of the comment
Likes: The number of likes of the comment
Replies: The number of replies to the comment
Comment: The written comment
Relevance: Label indicating the relevance of the comment for RE (ham = relevant, spam = irrelevant)
Polarity: Label indicating the polarity of the comment
Feature request: Label indicating that the comment requests a feature
Problem report: Label indicating that the comment reports a problem
Efficiency: Label indicating that the comment deals with the topic efficiency
Safety: Label indicating that the comment deals with the topic safety
For each reply, the data set contains:
ID: The identification number of the comment to which the reply belongs
Date: The date and time of the creation of the reply
Author: The username of the author of the reply
Likes: The number of likes of the reply
Comment: The written reply
Relevance: Label indicating the relevance of the reply for RE (ham = relevant, spam = irrelevant)
Detailed analysis results.xlsx contains the detailed results of the ten-times repeated 10-fold cross-validation analyses for all considered combinations of machine learning algorithms and features
Guide Sheet - Multi-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual multi-class labeling
Guide Sheet - Single-class labeling.pdf describes the coding task, defines the categories, and lists examples to reduce inconsistencies and increase the quality of manual single-class labeling
Python scripts for analysis.zip contains the scripts (as jupyter notebooks) and prepared data (as csv-files) for the analyses
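The labeled comments can be loaded with pandas (e.g., `pd.read_excel("Dataset.xlsx", na_values=["NULL"])` for the actual workbook); the sketch below uses an invented two-row CSV stand-in for the sheet, since the column names above are documented but the exact sheet layout is not:

```python
import io
import pandas as pd

# Invented stand-in rows following the documented columns.
# "NULL" marks comments whose date/time could no longer be retrieved.
sample = io.StringIO(
    "ID,Date,Author,Likes,Replies,Comment,Relevance\n"
    "Ugx1,2020-10-13 08:15,alice,3,1,Great idea!,spam\n"
    "Ugx2,NULL,bob,0,0,Tunnels need emergency exits,ham\n"
)

df = pd.read_csv(sample, na_values=["NULL"], parse_dates=["Date"])

# Comments with missing timestamps become NaT (not-a-time)
print(df["Date"].isna().sum())
```

Declaring `"NULL"` as a missing-value marker keeps the Date column as a proper datetime dtype instead of mixed strings.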
CC0 1.0 Universal (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
Late Night Talk Shows are a staple of American television culture, and with the shows establishing a digital presence in the form of YouTube channels, this culture has become more global. Some of the channels here have more than 20 million subscribers, which shows the amount of influence they hold on this platform.
The data is organized on a per-show channel basis and includes the most important information, such as video titles and the numeric counts of likes, dislikes, comments, and views (as of 13 June 2020).
All of this data is responsibly scraped from YouTube and I would like to acknowledge all the respective Talk Shows for making their content free for the public.
The main inspiration for this dataset is examining how a video title, or a particular celebrity appearing on the talk show, can affect the engagement rate of a video.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We present an English YouTube dataset manually annotated for hate speech types and targets. The comments to be annotated were sampled from English YouTube comments on videos about the Covid-19 pandemic in the period from January 2020 to May 2020. Three sets were annotated: a training set with 51,655 comments (IMSyPP_EN_YouTube_comments_train.csv) and two evaluation sets, one annotated in-context (IMSyPP_EN_YouTube_comments_evaluation_context.csv) and one out-of-context (IMSyPP_EN_YouTube_comments_evaluation_no_context.csv), each based on the same 10,759 comments. The dataset was annotated by 10 annotators, with most (99.9%) of the comments annotated by two annotators. It was used to train a classification model for hate speech type detection that is publicly available at: https://huggingface.co/IMSyPP/hate_speech_en. The dataset consists of the following fields:
Video_ID - YouTube ID of the video under which the comment was posted
Comment_ID - YouTube ID of the comment
Text - text of the comment
Type - type of hate speech
Target - target of the hate speech
Annotator - code of the human annotator
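Since most comments carry labels from two annotators, inter-annotator agreement can be quantified with Cohen's kappa. A minimal sketch using scikit-learn; the six toy labels below are invented stand-ins, not values from the dataset:

```python
from sklearn.metrics import cohen_kappa_score

# Invented labels from two hypothetical annotators over six comments;
# the real dataset stores the annotator code alongside each label.
annotator_a = ["acceptable", "offensive", "acceptable",
               "violent", "offensive", "acceptable"]
annotator_b = ["acceptable", "offensive", "inappropriate",
               "violent", "offensive", "acceptable"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(round(kappa, 2))  # → 0.76
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters when one label class dominates, as is typical for hate-speech data.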
During the first quarter of 2024, Huge YouTube accounts, which had over 50,000 followers, reported an engagement rate of approximately 6.2 percent on their short-format content. In comparison, engagement was considerably lower on long-format videos, which reported an engagement rate of 1.72 percent for Huge accounts. Medium YouTube accounts, which had a following of between 2,001 and 10,000 users, reported engagement rates of almost three percent on their Shorts, while long videos had an engagement rate of around 0.15 percent.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The collection “Protests Belarus 2020” contains 101 videos (mp4) on protests in the second half of 2020 (mainly in the Minsk area) triggered by the re-election of Lukashenko and the treatment of the opposition during the presidential elections. We downloaded all data in September 2020 and made screenshots (pdf) of the websites so that the discussion and comments on the individual video posts can be followed. All data is processed in an MS Excel database with metadata.
We collect all videos that are 1) related to the event AND show actions of this event, 2) downloadable, and 3) findable with our search words during a particular period. We strictly aim at a systematic and objective selection and organized storage of protest-related videos. We identify particular event-related search words after intense research on the event. Following the snowball principle, we then start collecting videos with the help of these search words and try to download as much relevant content as possible. However, we cannot guarantee the completeness of protest videos for the particular event. We search for videos and include them in the collection until a certain degree of saturation has been reached. Due to copyright restrictions, we are only allowed to give access to the database of the collected video files, including the hyperlinks with their metadata, and not to the videos themselves.
The videos have been posted mainly by the participants of the events. Therefore, the material is only an extract and biased by the perspective of the single creator.
The collection is part of a larger and ongoing collection of videos on protest events in the post-Soviet region.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RealVAD: A Real-world Dataset for Voice Activity Detection
The task of automatically detecting “Who is Speaking and When” is broadly named Voice Activity Detection (VAD). Automatic VAD is a very important task and the foundation of several domains, e.g., human-human, human-computer/robot/virtual-agent interaction analyses, and industrial applications.
The RealVAD dataset is constructed from a YouTube video of a panel discussion lasting approximately 83 minutes. The audio is available from a single channel. A single static camera captures all panelists, the moderator, and the audience.
Particular aspects of the RealVAD dataset are:
It is composed of panelists of different nationalities (British, Dutch, French, German, Italian, American, Mexican, Colombian, Thai). This aspect allows studying the effect of ethnic-origin variety on automatic VAD.
There is a gender balance such that there are four female and five male panelists.
The panelists sit in two rows, and they may be gazing at the audience, other panelists, their laptops, the moderator, or anywhere in the room while speaking or not speaking. Therefore, they were captured not only from a frontal view but also from side views, varying with their momentary posture and head orientation.
The panelists are moving freely and are doing various spontaneous actions (e.g., drinking water, checking their cell phone, using their laptop, etc.), resulting in different postures.
The panelists’ body parts are sometimes partially occluded by their own or another's body parts or belongings (e.g., a laptop).
There are also natural changes of illumination and shadows appearing on the wall behind the panelists in the back row.
For the panelists sitting in the front row in particular, there is sometimes background motion when the person(s) behind them move.
The annotations include:
Upper-body detections of the nine panelists in bounding-box form.
Associated VAD ground-truth (speaking, not-speaking) for nine panelists.
Acoustic features extracted from the video: MFCC and raw filterbank energies.
All information regarding the annotations is given in the ReadMe.txt and Acoustic Features README.txt files.
When using this dataset for your research, please cite the following paper in your publication:
C. Beyan, M. Shahid and V. Murino, "RealVAD: A Real-world Dataset and A Method for Voice Activity Detection by Body Motion Analysis", in IEEE Transactions on Multimedia, 2020.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We present an Italian YouTube dataset manually annotated for hate speech types and targets. The comments to be annotated were sampled from Italian YouTube comments on videos about the Covid-19 pandemic in the period from January 2020 to May 2020. Two sets were annotated: a training set with 59,870 comments (IMSyPP_IT_YouTube_comments_train.csv) and an evaluation set with 10,536 comments (IMSyPP_IT_YouTube_comments_evaluation.csv). The dataset was annotated by 8 annotators, with each comment annotated by two annotators. It was used to train a classification model for hate speech type detection that is publicly available at: https://huggingface.co/IMSyPP/hate_speech_it. The dataset consists of the following fields:
ID_Commento - YouTube ID of the comment
ID_Video - YouTube ID of the video under which the comment was posted
Testo - text of the comment
Tipo - type of hate speech
Target - target of the hate speech
Additionally, we have included the Italian YouTube data (SR_YT_comments.csv), which was collected in the same period as the training data and annotated using the aforementioned model. The automatically labeled data was used to analyze the relationship between hate speech and misinformation on Italian YouTube; the results of this analysis are presented in the associated paper. The analyzed data are represented with the following fields:
ID_Commento - YouTube ID of the comment
Label - label automatically assigned by the model
is_questionable - the type of channel from which the comment was collected; channels are categorized as spreading either reliable or questionable information
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If using this dataset, please cite the following paper and the current Zenodo repository.
This dataset is described in detail in the following paper:
[1] Yao, Y., Stebner, A., Tuytelaars, T., Geirnaert, S., & Bertrand, A. (2024). Identifying temporal correlations between natural single-shot videos and EEG signals. Journal of Neural Engineering, 21(1), 016018. doi:10.1088/1741-2552/ad2333
The associated code is available at: https://github.com/YYao-42/Identifying-Temporal-Correlations-Between-Natural-Single-shot-Videos-and-EEG-Signals?tab=readme-ov-file
Introduction
The research work leading to this dataset was conducted at the Department of Electrical Engineering (ESAT), KU Leuven.
This dataset contains electroencephalogram (EEG) data collected from 19 young participants with normal or corrected-to-normal eyesight when they were watching a series of carefully selected YouTube videos. The videos were muted to avoid the confounds introduced by audio. For synchronization, a square box was encoded outside of the original frames and flashed every 30 seconds in the top right corner of the screen. A photosensor, detecting the light changes from this flashing box, was affixed to that region using black tape to ensure that the box did not distract participants. The EEG data was recorded using a BioSemi ActiveTwo system at a sample rate of 2048 Hz. Participants wore a 64-channel EEG cap, and 4 electrooculogram (EOG) sensors were positioned around the eyes to track eye movements.
The dataset includes a total of (19 subjects x 63 min + 9 subjects x 24 min) of data. Further details can be found in the following section.
Content
YouTube Videos: Due to copyright constraints, the dataset includes links to the original YouTube videos along with precise timestamps for the segments used in the experiments. The features proposed in [1] have been extracted and can be downloaded here: https://drive.google.com/file/d/1J1tYrxVizrl1xP-W1imvlA_v-DPzZ2Qh/view?usp=sharing.
Raw EEG Data: Organized by subject ID, the dataset contains EEG segments corresponding to the presented videos. Both EEGLAB .set files (containing metadata) and .fdt files (containing raw data) are provided, which can also be read by popular EEG analysis Python packages such as MNE.
The naming convention links each EEG segment to its corresponding video. E.g., the EEG segment 01_eeg corresponds to video 01_Dance_1, 03_eeg corresponds to video 03_Acrob_1, Mr_eeg corresponds to video Mr_Bean, etc.
The raw data have 68 channels. The first 64 channels are EEG data, and the last 4 channels are EOG data. The position coordinates of the standard BioSemi headcaps can be downloaded here: https://www.biosemi.com/download/Cap_coords_all.xls.
Due to minor synchronization ambiguities, different clocks in the PC and EEG recorder, and missing or extra video frames during playback (which rarely occurred), the length of the EEG data may not perfectly match the corresponding video data. The difference, typically within a few milliseconds, can be resolved by truncating the modality with the excess samples.
Signal Quality Information: A supplementary .txt file detailing potential bad channels. Users can opt to create their own criteria for identifying and handling bad channels.
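The .set/.fdt pairs can be read with, e.g., `mne.io.read_raw_eeglab`. The truncation step for mismatched EEG/video lengths described above can be sketched with NumPy; the video frame rate and the toy durations below are invented for illustration:

```python
import numpy as np

FS_EEG = 2048  # EEG sample rate (Hz), per the recording setup above
FPS = 30       # assumed video frame rate; check the actual clips

# Toy stand-ins: 10.0 s of video frames vs. slightly longer EEG
n_frames = int(10.0 * FPS)
eeg = np.random.randn(68, int(10.004 * FS_EEG))  # 68 = 64 EEG + 4 EOG

# Truncate whichever modality has excess samples so both modalities
# cover the same duration (here: trim the EEG to the video length)
common_sec = min(eeg.shape[1] / FS_EEG, n_frames / FPS)
eeg_aligned = eeg[:, : int(common_sec * FS_EEG)]
print(eeg_aligned.shape)
```

Since the mismatch is only a few milliseconds, truncating the longer modality loses a negligible amount of data while restoring sample-level alignment.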
The dataset is divided into two subsets: Single-shot and MrBean, based on the characteristics of the video stimuli.
Single-shot Dataset
The stimuli of this dataset consist of 13 single-shot videos (63 min in total), each depicting a single individual engaging in various activities such as dancing, mime, acrobatics, and magic shows. All the participants watched this video collection.
Video ID Link Start time (s) End time (s)
01_Dance_1 https://youtu.be/uOUVE5rGmhM 8.54 231.20
03_Acrob_1 https://youtu.be/DjihbYg6F2Y 4.24 231.91
04_Magic_1 https://youtu.be/CvzMqIQLiXE 3.68 348.17
05_Dance_2 https://youtu.be/f4DZp0OEkK4 5.05 227.99
06_Mime_2 https://youtu.be/u9wJUTnBdrs 5.79 347.05
07_Acrob_2 https://youtu.be/kRqdxGPLajs 183.61 519.27
08_Magic_2 https://youtu.be/FUv-Q6EgEFI 3.36 270.62
09_Dance_3 https://youtu.be/LXO-jKksQkM 5.61 294.17
12_Magic_3 https://youtu.be/S84AoWdTq3E 1.76 426.36
13_Dance_4 https://youtu.be/0wc60tA1klw 14.28 217.18
14_Mime_3 https://youtu.be/0Ala3ypPM3M 21.87 386.84
15_Dance_5 https://youtu.be/mg6-SnUl0A0 15.14 233.85
16_Mime_6 https://youtu.be/8V7rhAJF6Gc 31.64 388.61
MrBean Dataset
Additionally, 9 participants watched an extra 24-minute clip from the first episode of Mr. Bean, where multiple (moving) objects may exist and interact, and the camera viewpoint may change. The subject IDs and the signal quality files are inherited from the single-shot dataset.
Video ID Link Start time (s) End time (s)
Mr_Bean https://www.youtube.com/watch?v=7Im2I6STbms 39.77 1495.00
Acknowledgement
This research is funded by the Research Foundation - Flanders (FWO) project No G081722N, junior postdoctoral fellowship fundamental research of the FWO (for S. Geirnaert, No. 1242524N), the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No 802895), the Flemish Government (AI Research Program), and the PDM mandate from KU Leuven (for S. Geirnaert, No PDMT1/22/009).
We also thank the participants for their time and effort in the experiments.
Contact Information
Executive researcher: Yuanyuan Yao, yuanyuan.yao@kuleuven.be
Led by: Prof. Alexander Bertrand, alexander.bertrand@kuleuven.be
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rivas Brousse, I. & Rams, S. (2020). Dataset del Trabajo Fin de Grado "¿Qué puedo aprender en YouTube sobre microscopía escolar?". [Dataset] Zenodo. DOI: 10.5281/zenodo.4300668
In 2024, users engaged more with the videos they watched on YouTube than in the previous year. The average number of interactions on YouTube grew to 2.36 in the last measured year. This is an increase compared to 2023, when content hosted on YouTube received approximately 2.1 interactions (comments, likes, and shares) on average.
This dataset was compiled as a resource for analyzing viewer engagement, sentiment, and discussion trends on the Ben Shapiro YouTube channel over the specified period. It comprises user-generated comments extracted from the channel. The collection process involved first cataloging a comprehensive list of all videos published on the channel. These videos were then categorized into three distinct time frames, and from each time frame the ten videos that garnered the highest number of comments were identified for detailed comment extraction. Videos and their associated comments were extracted using YouTube Data Tools (Rieder, 2015). The dataset was finalized on September 12, 2022, and encompasses 711,909 comments ranging from September 1, 2020, to September 12, 2022. The dataset was uploaded to and analyzed in the 4CAT: Capture & Analysis Toolkit (Peeters & Hagen, 2022).
References:
Peeters, S., & Hagen, S. (2022). The 4CAT Capture and Analysis Toolkit: A Modular Tool for Transparent and Traceable Social Media Research. Computational Communication Research, 4(2), 571–589. https://doi.org/10.5117/CCR2022.2.007.HAGE
Rieder, B. (2015). YouTube Data Tools (1.11) [Computer software].
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The collection “Protests Georgia 2019” contains 50 videos (mp4) on protests in July 2019 (mainly in the Tbilisi area) triggered by the actions of the Russian politician Sergei Gavrilov, who visited Georgia in June 2019. We downloaded all data in August 2020 and made screenshots (pdf) of the websites so that the discussion and comments on the individual video posts can be followed. All data is processed in an MS Excel database with metadata.
We collect all videos that are 1) related to the event AND show actions of this event, 2) downloadable, and 3) findable with our search words during a particular period. We strictly aim at a systematic and objective selection and organized storage of protest-related videos. We identify particular event-related search words after intense research on the event. Following the snowball principle, we then start collecting videos with the help of these search words and try to download as much relevant content as possible. However, we cannot guarantee the completeness of protest videos for the particular event. We search for videos and include them in the collection until a certain degree of saturation has been reached. Due to copyright restrictions, we are only allowed to give access to the database of the collected video files, including the hyperlinks with their metadata, and not to the videos themselves.
The videos have been posted mainly by the participants of the events. Therefore, the material is only an extract and biased by the perspective of the single creator.
The collection is part of a larger and ongoing collection of videos on protest events in the post-Soviet region.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Youtube-Dataset for Language Identification in Speech Signals
- for scientific use only, for questions contact: jakob.abesser@idmt.fraunhofer.de
Reference
In case you use this dataset for your research, please cite
Alexandra Draghici, Jakob Abeßer & Hanna Lukashevich: A Study on Spoken Language Identification
using Deep Neural Networks, Proceedings of the Audio Mostly Conference 2020
Dataset
The YouTube News Collection is a collection of videos from various
YouTube news channels. We gathered data from channels such as BBC
News, France24, DW News, and Noticias Telemundo.
- 135,664 .npy files (NumPy matrices exported from Python)
- each .npy file contains a mel spectrogram (see below) of an audio file
- the subfolders "0" - "5" encode the language id:
0 - English
1 - French
2 - German
3 - Greek
4 - Italian
5 - Spanish
Audio Processing
- mono, sample rate 22.05 kHz
- mel spectrogram (librosa Python package)
- window size: 512 samples
- hop size: 441 samples (20 ms)
- 129 mel bands
- file-level spectrograms are normalized to a maximum of 1
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The YouTube-ASMR dataset contains URLs for over 900 hours of ASMR video clips with stereo/binaural audio produced by various YouTube artists. The following paper contains a detailed description of the dataset and how it was compiled:
K. Yang, B. Russell and J. Salamon, "Telling Left from Right: Learning Spatial Correspondence of Sight and Sound", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, June 2020.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Emergency evacuations. 126 publicly available videos in which people are, or should be, evacuating. Sources: YouTube and news sites. See detailed information about the videos and the data collection method in the Excel file vanderWal2020-details-onlinerepository.xlsx. A few videos could not be uploaded to figshare; please see the Excel file for the source to download them yourself, or request the complete set as a zip file (the zip file also could not be uploaded to figshare).
The file "2020.03.10 - Análise_Campanhas.xlsx" contains the names of all YouTube videos that were evaluated in the survey, along with their corresponding access links (for the paper; access occurred in March 2020) and the number of shares up to that date. The file "Conjoint Analysis.sav" contains the data collection used for the Conjoint Analysis (Study 2 of the article). If you want the same data in SAV, I can provide it. The file "Appendix.pdf" contains the images used in the data collection.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This data set consists of links to social network items for 34 different forensic events that took place between August 14th, 2018 and January 6th, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook, and Google+), and every video is from YouTube.
Data Collection
We used Social Tracker (https://github.com/MKLab-ITI/mmdemo-dockerized), along with the social media platforms' APIs, to gather most of the collections. For a minor part, we used Twint (https://github.com/twintproject/twint). In both cases, we provided keywords related to the event to retrieve the data.
It is important to mention that, in procedures like this one, usually only a small fraction of the collected data is in fact related to the event and useful for further forensic analysis.
Content
We have data from 34 events, and for each of them we provide the files:
items_full.csv: It contains links to any social media post that was collected.
images.csv: Lists the collected images. Some files include a field called "ItemUrl" that refers to the social network post (e.g., a tweet) that mentions that media item.
video.csv: URLs of YouTube videos that were gathered about the event.
video_tweet.csv: This file contains IDs of tweets and IDs of YouTube videos. A tweet whose ID is in this file has a video in its content; in turn, the link of a YouTube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file.
description.txt: Contains some standard information about the event, and possibly some comments about any specific issue related to it.
Most of the collections do not include all of the files above, due to changes in our collection procedure over the course of this work.
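Because collections vary in which files they include, a loader should probe for each file rather than assume all five exist. A minimal sketch (the per-event folder layout is an assumption, not specified by the authors):

```python
from pathlib import Path
import tempfile

# Files an event folder may contain, per the list above
EXPECTED = ["items_full.csv", "images.csv", "video.csv",
            "video_tweet.csv", "description.txt"]

def inventory(event_dir: Path) -> dict:
    """Report which of the expected files an event collection provides."""
    return {name: (event_dir / name).exists() for name in EXPECTED}

# Tiny fake event folder for demonstration (real folders differ)
root = Path(tempfile.mkdtemp())
(root / "video.csv").write_text("url\nhttps://youtu.be/example\n")

print(inventory(root))
```

Running such an inventory over all 34 event folders gives a quick completeness report before any per-file parsing begins.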
Events
We divided the events into six groups:
- Devastating fire is the main issue of the event, so most of the informative pictures show flames or burned constructions. (14 events)
- Most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire). (5 events)
- Likely images of guns and police officers; few or no destruction of the environment. (5 events)
- A plethora of people on the streets. Possibly some problem took place, but in most cases the demonstration is the actual event. (7 events)
- Traffic collision: pictures of damaged vehicles in an urban landscape, possibly including images with victims on the street. (1 event)
- Events that range from fierce rain to a tsunami; many pictures depict water. (2 events)
The events are listed in the file recod-ai-events-dataset-list.pdf.
Media Content
Due to the terms of use of the social networks, we do not make the collected texts, images, and videos publicly available. However, extra media content related to one or more events can be provided; please contact the authors.
Funding
DéjàVu thematic project, São Paulo Research Foundation (grants 2017/12646-3, 2018/18264-8 and 2020/02241-9)