56 datasets found

Top Youtube Artist
kaggle.com
Updated Jan 12, 2023
Cite
Mrityunjay Pathak (2023). Top Youtube Artist [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/top-youtube-artist
Dataset updated
Jan 12, 2023
Dataset provided by
Kaggle
Authors
Mrityunjay Pathak
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
YouTube
Description
YouTube was created in 2005, with the first video, Me at the Zoo, uploaded on 23 April 2005. Since then, 1.3 billion people have set up YouTube accounts. In 2018, people watched nearly 5 billion videos each day. People upload 300 hours of video to the site every minute.

According to 2016 research undertaken by Pexeso, music accounts for only 4.3% of YouTube’s content, yet it accounts for 11% of the views. Clearly, an awful lot of people watch a comparatively small number of music videos. It should be no surprise, therefore, that the most watched videos of all time on YouTube are predominantly music videos.

On August 13, BTS became the most-viewed artist in YouTube history, accumulating over 26.7 billion views across all their official channels. This count includes all music videos and dance practice videos.

Justin Bieber and Ed Sheeran now hold the records for second and third-highest views, with over 26 billion views each.

Currently, BTS’s most viewed videos are their music videos for “**Boy With Luv**,” “**Dynamite**,” and “**DNA**,” which all have over 1.4 billion views.

Headers of the Dataset

Total = Total views (in millions) across all official channels
Avg = Current daily average of all videos combined
100M = Number of videos with more than 100 million views
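
Because Total is expressed in millions of views, a figure such as 26.7 billion views would appear as 26,700. The sketch below shows that unit conversion on rows shaped like the three documented headers. The `Artist` key and all row values are hypothetical and are not taken from the dataset file.

```python
def total_in_billions(total_millions: float) -> float:
    """Convert a Total value (millions of views) to billions of views."""
    return total_millions / 1000.0


# Hypothetical rows using the documented headers; the values are illustrative only.
rows = [
    {"Artist": "Example Artist A", "Total": 26700.0, "Avg": 10.0, "100M": 50},
    {"Artist": "Example Artist B", "Total": 1500.0, "Avg": 1.2, "100M": 3},
]

for row in rows:
    print(row["Artist"], total_in_billions(row["Total"]), "billion views")
```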
YouTube users worldwide 2020-2029
statista.com
Updated Jul 7, 2025
Cite
Statista (2025). YouTube users worldwide 2020-2029 [Dataset]. https://www.statista.com/forecasts/1144088/youtube-users-in-the-world
Dataset updated
Jul 7, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide, YouTube
Description
The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of ***** million users (+***** percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach *** billion users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years.

User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of YouTube users in regions like Africa and South America.
Data from: YouTube Video Network Dataset for Israel-Hamas War
ieee-dataport.org
Updated Dec 23, 2023
Cite
Thejas T (2023). YouTube Video Network Dataset for Israel-Hamas War [Dataset]. https://ieee-dataport.org/documents/youtube-video-network-dataset-israel-hamas-war
Dataset updated
Dec 23, 2023
Authors
Thejas T
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube, Israel
Description
Over the past few years, YouTube has become a popular site for broadcasting videos and earning money by showcasing various skills in video form. For some people it has become a main source of income. Getting videos to trend among viewers is a major goal of every content creator. The popularity of a video and its reach to the audience depend entirely on YouTube's recommendation algorithm. This document is a dataset descriptor for the dataset collected over a time span of about 45 days during the Israel-Hamas War.
YouTube users in India 2020-2029
statista.com
Updated Mar 3, 2025
Cite
Statista (2025). YouTube users in India 2020-2029 [Dataset]. https://www.statista.com/forecasts/1146150/youtube-users-in-india
Dataset updated
Mar 3, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
India
Description
The number of YouTube users in India was forecast to increase continuously between 2024 and 2029 by a total of 222.2 million users (+34.88 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 859.26 million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years.

User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of YouTube users in countries like Sri Lanka and Nepal.
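
The three figures quoted for India (absolute increase, percentage increase, and 2029 value) can be cross-checked against each other. The sketch below assumes the percentage is measured relative to the 2024 value and that the 2024 value equals the 2029 value minus the increase; both are assumptions about how the forecast is stated, and the function name is illustrative.

```python
def implied_base_and_growth(increase_millions: float, final_millions: float):
    """Return (implied starting value, percentage growth relative to it)."""
    base = final_millions - increase_millions
    growth_pct = 100.0 * increase_millions / base
    return base, growth_pct


# Figures quoted in the description above, in millions of users.
base, pct = implied_base_and_growth(increase_millions=222.2, final_millions=859.26)
print(f"implied 2024 base: {base:.2f} million users, growth: {pct:.2f} percent")
```

Under these assumptions the computed growth rounds to the quoted 34.88 percent, so the three numbers are mutually consistent.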
BBC YouTube Videos Metadata
kaggle.com
zip
Updated Aug 13, 2020
Cite
Gabriel Preda (2020). BBC YouTube Videos Metadata [Dataset]. https://www.kaggle.com/gpreda/bbc-youtube-videos-metadata
Available download formats: zip (1856076 bytes)
Dataset updated
Aug 13, 2020
Authors
Gabriel Preda
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Introduction

The data is collected using YouTube Data Tools from the BBC YouTube channel. It shows information about all videos from this channel, starting from 2007.

Data collection

Using YouTube Data Tools one can access the metadata for YouTube channels, videos, comments, upvotes.

References

YouTube Data Tools, https://tools.digitalmethods.net/

Inspiration

Use this amazing dataset to analyze the impact of these videos by looking at view, like, dislike, favorite, and comment counts. Try to understand from the description of each video whether some subjects have a larger impact. Factor in the "age" of each video, since this dataset collects video metadata starting from 2007.
YouCook
opendatalab.com
paperswithcode.com
zip
Updated Mar 22, 2023
Cite
State University of New York (2023). YouCook [Dataset]. https://opendatalab.com/OpenDataLab/YouCook
Available download formats: zip (1865855952 bytes)
Dataset updated
Mar 22, 2023
Dataset provided by
State University of New York
Description
This data set was prepared from 88 open-source YouTube cooking videos. The YouCook dataset contains videos of people cooking various recipes. The videos were downloaded from YouTube and are all in the third-person viewpoint; they represent a significantly more challenging visual problem than existing cooking and kitchen datasets (the background kitchen/scene is different for many and most videos have dynamic camera changes). In addition, frame-by-frame object and action annotations are provided for training data (as well as a number of precomputed low-level features). Finally, each video has a number of human provided natural language descriptions (on average, there are eight different descriptions per video). This dataset has been created to serve as a benchmark in describing complex real-world videos with natural language descriptions.
MOST LIKED COMMENTS ON YOUTUBE
kaggle.com
Updated Sep 9, 2020
Cite
Nipun Arora (2020). MOST LIKED COMMENTS ON YOUTUBE [Dataset]. https://www.kaggle.com/nipunarora8/most-liked-comments-on-youtube/notebooks
Dataset updated
Sep 9, 2020
Dataset provided by
Kaggle
Authors
Nipun Arora
Area covered
YouTube
Description
Context

I was looking for a specific dataset but never found one.

Content

This is a text dataset focussing on the top comments on the best YouTube videos (views>1B).

Acknowledgements

I wanna thank the YouTube API for helping me, lol, and MongoDB, where I stored all the raw data.

Inspiration

I shared this dataset to see how the world will react and what people will do with this dataset. I hope this helps me learn more about NLP and ML.
TED talks - Youtube
kaggle.com
Updated May 11, 2024
Cite
Ulrike Herold (2024). TED talks - Youtube [Dataset]. https://www.kaggle.com/datasets/ulrikeherold/ted-talks-youtube
Dataset updated
May 11, 2024
Dataset provided by
Kaggle
Authors
Ulrike Herold
License
https://cdla.io/sharing-1-0/
Area covered
YouTube
Description
Data is from the YouTube channel "TED".

The data was scraped on March 16th, 2024 through an API, by fetching the channel's ID with Node.js code.

"The TED Talks channel features the best talks and performances from the TED Conference, where the world's leading thinkers and doers give the talk of their lives in 18 minutes (or less). Look for talks on Technology, Entertainment and Design -- plus science, business, global issues, the arts and more. You're welcome to link to or embed these videos, forward them to others and share these ideas with people you know." - Information from the Ted talks - Youtube page https://www.youtube.com/@TED

Deleted columns: "channelId", "publishedAt", "position", "duration", "dimension", "definition", "defaultLanguage", "thumbnail_maxres", "licensedContent", "locationDescription", "latitude", "longitude", "dislikeCount", "favoriteCount"

Split column publishedAtSQL into Date (release_date) and Time (release_time).

Changed durationSec - duration of video in seconds - to duration - duration of video mm:ss.

Split information in "Title" into "Title" of episode and "Speaker".
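
The three column transformations listed above can be sketched in plain Python. The timestamp layout (`YYYY-MM-DD HH:MM:SS`) and the `" | "` separator between talk title and speaker are assumptions made for illustration; the dataset description does not state either, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Tuple


def split_published_at(published_at_sql: str) -> Tuple[str, str]:
    """Split a timestamp into (release_date, release_time).

    Assumes the 'YYYY-MM-DD HH:MM:SS' layout.
    """
    parsed = datetime.strptime(published_at_sql, "%Y-%m-%d %H:%M:%S")
    return parsed.date().isoformat(), parsed.time().isoformat()


def seconds_to_mmss(duration_sec: int) -> str:
    """Format a duration in seconds as mm:ss (minutes are not capped at 59)."""
    minutes, seconds = divmod(duration_sec, 60)
    return f"{minutes:02d}:{seconds:02d}"


def split_title(title: str, separator: str = " | ") -> Tuple[str, str]:
    """Split a combined title into (talk title, speaker).

    The separator is an assumption; the speaker is empty when it is absent.
    """
    talk, _, speaker = title.partition(separator)
    return talk.strip(), speaker.strip()


print(split_published_at("2024-03-16 14:05:09"))
print(seconds_to_mmss(754))
print(split_title("An example talk | Jane Doe"))
```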
RECOD.ai events dataset
redu.unicamp.br
Updated Mar 21, 2025
Cite
Repositório de Dados de Pesquisa da Unicamp (2025). RECOD.ai events dataset [Dataset]. http://doi.org/10.25824/redu/BLIYYR
Unique identifier
https://doi.org/10.25824/redu/BLIYYR
Dataset updated
Mar 21, 2025
Dataset provided by
Repositório de Dados de Pesquisa da Unicamp
Dataset funded by
Fundação de Amparo à Pesquisa do Estado de São Paulo
Description
Overview

This data set consists of links to social network items for 34 different forensic events that took place between August 14th, 2018 and January 06th, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook and Google+), and every video is from YouTube.

Data Collection

We used Social Tracker, along with the social media platforms' APIs, to gather most of the collections. For a minor part, we used Twint. In both cases, we provided keywords related to the event to receive the data. It is important to mention that, in procedures like this one, usually only a small fraction of the collected data is in fact related to the event and useful for further forensic analysis.

Content

We have data from 34 events, and for each of them we provide the files:

items_full.csv: Contains links to every social media post that was collected.
images.csv: Lists the images collected. In some files there is a field called "ItemUrl" that refers to the social network post (e.g., a tweet) that mentions that media.
video.csv: URLs of YouTube videos that were gathered about the event.
video_tweet.csv: Contains IDs of tweets and IDs of YouTube videos. A tweet whose ID is in this file has a video in its content. In turn, the link of a YouTube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file.
description.txt: Contains some standard information about the event, and possibly some comments about any specific issue related to it.

In fact, most of the collections do not have all the files above. This is due to changes in our collection procedure over the course of this work.

Events

We divided the events into six groups. They are:

Fire (14 events): Devastating fire is the main issue of the event, therefore most of the informative pictures show flames or burned constructions.
Collapse (5 events): Most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire).
Shooting (5 events): Likely images of guns and police officers. Little or no destruction of the environment.
Demonstration (7 events): Plethora of people on the streets. Possibly some problem took place during it, but in most cases the demonstration is the actual event.
Collision (1 event): Traffic collision. Pictures of damaged vehicles in an urban landscape. Possibly there are images with victims on the street.
Flood (2 events): Events that range from fierce rain to a tsunami. Many pictures depict water.

Media Content

Due to the terms of use of the social networks, we do not make publicly available the texts, images and videos that were collected. However, we can provide some extra media content related to one (or more) events if you contact the authors.
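
Since most collections do not contain every file, it may help to check which files are present before loading an event. The sketch below assumes each event lives in its own directory containing the file names listed in the description; that layout and the directory name `fire_event_01` are hypothetical.

```python
from pathlib import Path

# File names as listed in the dataset description.
EXPECTED_FILES = [
    "items_full.csv",
    "images.csv",
    "video.csv",
    "video_tweet.csv",
    "description.txt",
]


def available_files(event_dir):
    """Map each expected file name to True if it exists in event_dir."""
    event_path = Path(event_dir)
    return {name: (event_path / name).is_file() for name in EXPECTED_FILES}


# Hypothetical directory name; missing paths are simply reported as False.
for name, present in available_files("fire_event_01").items():
    print(f"{name}: {'present' if present else 'missing'}")
```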
How to make google plus posts private - Dataset - openAFRICA
open.africa
Updated Jan 4, 2018
Cite
(2018). How to make google plus posts private - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/how-to-make-google-plus-posts-private
Dataset updated
Jan 4, 2018
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
So if you have to have a G+ account (for YouTube, location services, or other reasons), here's how you can make it totally private! No one will be able to add you, send you spammy links, or otherwise annoy you. You need to visit the "Audience Settings" page: https://plus.google.com/u/0/settings/audience. You can then set a "custom audience". Usually you would use this to restrict your account to people from a specific geographic location, or within a specific age range. In this case, we're going to choose a custom audience of "No-one". Check the box and hit save. Now, when people try to visit your Google+ profile, they'll see this "restricted" message. You can visit my G+ Profile if you want to see this working (https://plus.google.com/114725651137252000986). If anything is unclear, you can follow this website: http://www.livehuntz.com/google-plus/support-phone-number
Data from: Introducing the COVID-19 YouTube (COVYT) speech dataset featuring...
data.niaid.nih.gov
zenodo.org
Updated Sep 8, 2022
Cite
Andreas Triantafyllopoulos (2022). Introducing the COVID-19 YouTube (COVYT) speech dataset featuring the same speakers with and without infection [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6962929
Dataset updated
Sep 8, 2022
Dataset provided by
Meishu Song
Anastasia Semertzidou
Andreas Triantafyllopoulos
Florian B. Pokorny
Björn W. Schuller
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
The COVYT dataset contains speech samples from individuals who self-reported their COVID-19 infection on public social media platforms (YouTube, Xiaohongshu). These videos, as well as accompanying videos of the same people prior to infection, were mined in an attempt to gather publicly-available data for COVID-19 research. This release includes the links to the original videos along with the accompanying manual segmentation and diarisation that identifies the utterances of the target individuals. We are additionally releasing features derived from the segmented utterances. Finally, the dataset includes partitioning information according to 4 different cross-validation schemes. See the arxiv pre-print for more details: https://arxiv.org/abs/2206.11045
YouTube users in Europe 2020-2029
statista.com
Updated May 21, 2025
Cite
Statista Research Department (2025). YouTube users in Europe 2020-2029 [Dataset]. https://www.statista.com/topics/3853/internet-usage-in-europe/
Dataset updated
May 21, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
Europe
Description
The number of YouTube users in Europe was forecast to increase continuously between 2024 and 2029 by a total of 7.8 million users (+3.61 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 223.61 million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years.

User figures, shown here for the platform YouTube, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of YouTube users in regions like North America and Australia & Oceania.
VLEP Dataset
paperswithcode.com
Updated Oct 12, 2021
Cite
Jie Lei; Licheng Yu; Tamara L. Berg; Mohit Bansal (2021). VLEP Dataset [Dataset]. https://paperswithcode.com/dataset/vlep
Dataset updated
Oct 12, 2021
Authors
Jie Lei; Licheng Yu; Tamara L. Berg; Mohit Bansal
Description
VLEP contains 28,726 future event prediction examples (along with their rationales) from 10,234 diverse TV Show and YouTube Lifestyle Vlog video clips. Each example (see Figure 1) consists of a Premise Event (a short video clip with dialogue), a Premise Summary (a text summary of the premise event), and two potential natural language Future Events (along with Rationales) written by people. These clips are on average 6.1 seconds long and are harvested from diverse event-rich sources, i.e., TV show and YouTube Lifestyle Vlog videos.
ATM Anomaly Video Dataset (ATMA-V)
kaggle.com
Updated Apr 13, 2022
Cite
Mehant Kammakomati (2022). ATM Anomaly Video Dataset (ATMA-V) [Dataset]. http://doi.org/10.34740/kaggle/dsv/3455016
Unique identifier
https://doi.org/10.34740/kaggle/dsv/3455016
Dataset updated
Apr 13, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mehant Kammakomati
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
ATMA-V Dataset

The video dataset comprises 65 videos that consist of both anomalous and normal video segments. These videos are temporally annotated by human annotators for anomalous and normal segments. Annotations are cross-validated by a different person who was not part of the annotators' group; this is done to minimize human error to a certain extent. Annotation data for each video is represented as a set of frame ranges that contain anomalous segments; frames that are not included within any range are considered normal video segments.
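
The range-based annotation scheme can be expanded into per-frame labels as in the sketch below. The zero-based, inclusive `(start, end)` convention and the label strings are assumptions made for illustration; the description does not specify how ranges are encoded in the annotation files.

```python
from typing import List, Sequence, Tuple


def label_frames(num_frames: int, anomalous_ranges: Sequence[Tuple[int, int]]) -> List[str]:
    """Expand anomalous frame ranges into one label per frame.

    Ranges are assumed to be zero-based and inclusive on both ends.
    Frames outside every range are labelled "normal".
    """
    labels = ["normal"] * num_frames
    for start, end in anomalous_ranges:
        # Clamp each range to the valid frame indices of the video.
        first = max(start, 0)
        last = min(end, num_frames - 1)
        for index in range(first, last + 1):
            labels[index] = "anomalous"
    return labels


# Hypothetical 6-frame video with two anomalous ranges.
print(label_frames(6, [(1, 2), (4, 4)]))
```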

Data Collection

To ensure diversification in terms of location and people, the data for both image and video formats have been collected manually from the internet. Mostly, multimedia sharing platforms such as YouTube, Kaotic, Dailymail, Itemfix, leakedreality, GettyImages, and Shutterstock are leveraged as sources. Collection from internet sources is done with the help of multiple text-based search queries that are slightly varied in terms of vocabulary and language such as "atm robbery", "atm theft", "atm chori", and "atm Diebstahl". Genuine ATM-based data on the internet is meager, so this approach of search and collection has mitigated the challenge to some extent. To prepare a high-quality dataset, certain conditions are imposed during the collection process such as: avoiding shaky, overly labeled videos/images, and videos that are compiled.
Youtube users in Vietnam 2017-2025
statista.com
Updated Jul 10, 2025
Cite
Statista (2025). Youtube users in Vietnam 2017-2025 [Dataset]. https://www.statista.com/forecasts/1146013/youtube-users-in-vietnam
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2017 - 2019
Area covered
Vietnam
Description
In 2021, YouTube's user base in Vietnam amounted to approximately ***** million users. The number of YouTube users in Vietnam is projected to reach ***** million users by 2025. User figures have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
IdiapVideoAge
zenodo.org
explore.openaire.eu
application/gzip
Updated Sep 7, 2022
Cite
Pavel Korshunov; Sébastien Marcel (2022). IdiapVideoAge [Dataset]. http://doi.org/10.34777/e6vt-fz55
Available download formats: application/gzip
Unique identifier
https://doi.org/10.34777/e6vt-fz55
Dataset updated
Sep 7, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Pavel Korshunov; Sébastien Marcel
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The IdiapVideoAge dataset is a set of YouTube video IDs with age labels, intended to facilitate research in audio-visual age verification with a focus on detecting people below 18 years old. The dataset contains 4260 IDs of YouTube videos that come from two existing video databases: VoxCeleb2 and a child speech dataset from Google. Our main contribution is the age labels of the people in the videos. Three different human annotators were used for labeling. They were instructed to give a valid age label if a person's face in a video is visible in more than 80% of the frames and it is clear that the audible speech matches the person in the video. As the age label, we used the average of the three annotators. Out of the total 4260 videos, 1973 videos are of minors below 18 years old.
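
The labeling rule described above (average of three annotators, with minors defined as below 18) can be sketched as follows. The function names and the example ages are hypothetical, and how the original authors handled rounding or missing annotations is not stated, so this sketch does neither.

```python
from statistics import mean
from typing import Sequence


def video_age_label(annotator_ages: Sequence[float]) -> float:
    """Average the age estimates of exactly three annotators."""
    if len(annotator_ages) != 3:
        raise ValueError("expected exactly three annotator ages")
    return mean(annotator_ages)


def is_minor(age_label: float, threshold: float = 18) -> bool:
    """A video counts as showing a minor when the label is below the threshold."""
    return age_label < threshold


# Hypothetical annotations for a single video.
label = video_age_label([16, 17, 18])
print(label, is_minor(label))
```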

Reference

If you use this dataset, please cite the following publication:

Pavel Korshunov and Sebastien Marcel, "Face Anthropometry Aware Audio-visual Age Verification", ACM Multimedia international conference (MM'22), October 2022.
https://publications.idiap.ch/index.php/publications/show/4862
Accident Detection Model Dataset
universe.roboflow.com
zip
Updated Apr 8, 2024
Cite
Accident detection model (2024). Accident Detection Model Dataset [Dataset]. https://universe.roboflow.com/accident-detection-model/accident-detection-model/model/1
Available download formats: zip
Dataset updated
Apr 8, 2024
Dataset authored and provided by
Accident detection model
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Accident Bounding Boxes
Description
Accident-Detection-Model

Accident Detection Model is made using YOLOv8, Google Colab, Python, Roboflow, Deep Learning, OpenCV, Machine Learning, and Artificial Intelligence. It can detect an accident from a live camera feed, image, or video. The model is trained on a dataset of 3200+ images, which were annotated on Roboflow.

Problem Statement

Road accidents are a major problem in India, with thousands of people losing their lives and many more suffering serious injuries every year.

According to the Ministry of Road Transport and Highways, India witnessed around 4.5 lakh road accidents in 2019, which resulted in the deaths of more than 1.5 lakh people.

The age range that is most severely hit by road accidents is 18 to 45 years old, which accounts for almost 67 percent of all accidental deaths.

Accidents survey

[Image: Survey]

Literature Survey

Sreyan Ghosh (Mar 2019): The goal is to develop a system using a deep learning convolutional neural network trained to classify video frames as accident or non-accident.

Deeksha Gour (Sep 2019): Uses computer vision technology, neural networks, deep learning, and various approaches and algorithms to detect objects.

Research Gap

Lack of real-world data - We trained the model on more than 3200 images.

Large interpretability time and space needed - We use Google Colab to reduce the interpretability time and space required.

Outdated versions in previous works - We are using the latest version, YOLOv8.

Proposed methodology

We are using YOLOv8 to train on our custom dataset, which has 3200+ images collected from different platforms.

After training for 25 iterations, the model is ready to detect an accident with a significant probability.

Model Set-up

Preparing Custom dataset

We have collected 1200+ images from different sources like YouTube, Google Images, Kaggle.com, etc.

Then we annotated all of them individually in a tool called Roboflow.

During annotation, we marked the images with no accident as NULL, and we drew a box around the site of the accident on the images containing an accident.

Then we divided the dataset into train, val, and test sets in the ratio 8:1:1.

As the final step, we downloaded the dataset in YOLOv8 format.
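
An 8:1:1 split like the one described in the steps above can be sketched in plain Python. This is an illustrative stand-in, not the authors' procedure (the text does not say which tool performed the split); the function name, the fixed seed, and the file names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_8_1_1(items: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle items reproducibly and split them into train/val/test at 8:1:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # The test set takes whatever remains, so no item is dropped.
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical image names standing in for the annotated images.
images = [f"image_{i:04d}.jpg" for i in range(3200)]
train, val, test = split_8_1_1(images)
print(len(train), len(val), len(test))
```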

Using Google Colab

We are using Google Colaboratory to code this model because Google Colab provides a GPU, which is faster than our local environments.

You can use Jupyter notebooks, which let you blend code, text, and visualisations in a single document, to write and run Python code using Google Colab.

Users can run individual code cells in Jupyter Notebooks and quickly view the results, which is helpful for experimenting and debugging. Additionally, they enable the development of visualisations that make use of well-known frameworks like Matplotlib, Seaborn, and Plotly.

In Google Colab, we first changed the runtime from TPU to GPU.

We cross-checked it by running the command:

!nvidia-smi

Coding

First, we installed YOLOv8 with the command:

!pip install ultralytics==8.0.20

Next, we checked that YOLOv8 imports correctly with:

from ultralytics import YOLO
from IPython.display import display, Image

Then we connected and mounted our Google Drive account with the code:

from google.colab import drive
drive.mount('/content/drive')

Then we ran our main command to start the training process:

%cd /content/drive/MyDrive/Accident Detection model
!yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=1 imgsz=640 plots=True

After training, we ran the commands to validate and test our model:

!yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml
!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt conf=0.25 source=data/test/images

Finally, to get a result from any video or image, we ran this command:

!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt source="/content/drive/MyDrive/Accident-Detection-model/data/testing1.jpg/mp4"

The results are stored in the runs/detect/predict folder.
Hence our model is trained, validated and tested to be able to detect accidents on any video or image.

Challenges I ran into

I ran into three major problems while making this model.

I had difficulty saving the results to a folder. YOLOv8 is the latest version, so it is still under development. After reading some blogs and referring to Stack Overflow, I learned that the new v8 needs an extra argument, ‘save=true’. This let me save my results in a folder.

I was facing problem on cvat website because i was not sure what
TL;DR Dataset: Best YouTube Alternatives for Creators in 2025
learningrevolution.net
html
Updated Sep 25, 2024
Cite
Jawad Khan (2024). TL;DR Dataset: Best YouTube Alternatives for Creators in 2025 [Dataset]. https://www.learningrevolution.net/youtube-alternatives/
Available download formats: html
Dataset updated
Sep 25, 2024
Dataset provided by
Learning Revolution
Authors
Jawad Khan
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Variables measured
Platform, Best Use Case
Description
Concise comparison of the top 10 YouTube alternatives for content creators in 2025. Covers monetization, audience size, and ideal use cases.
TikTok Dataset Dataset
paperswithcode.com
Updated Jul 22, 2024
Cite
Yasamin Jafarian; Hyun Soo Park (2024). TikTok Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/tiktok-dataset
Dataset updated
Jul 22, 2024
Authors
Yasamin Jafarian; Hyun Soo Park
Description
We learn high fidelity human depths by leveraging a collection of social media dance videos scraped from the TikTok mobile social networking application. It is by far one of the most popular video sharing applications across generations, and it includes short videos (10-15 seconds) of diverse dance challenges. We manually selected more than 300 dance videos, each capturing a single person performing dance moves, from TikTok dance challenge compilations covering each month, variety, and type of dance, with moderate movements that do not generate excessive motion blur. For each video, we extract RGB images at 30 frames per second, resulting in more than 100K images. We segmented these images using the Removebg application and computed the UV coordinates from DensePose.
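
The "more than 100K images" figure can be sanity-checked from the other numbers in the description. The sketch below assumes exactly 300 videos and that each selected video falls within the stated 10-15 second range; both are simplifying assumptions, since the text says "more than 300" and does not give per-video lengths.

```python
def frame_count(num_videos: int, seconds_per_video: int, fps: int = 30) -> int:
    """Number of extracted frames for equally long videos at a fixed frame rate."""
    return num_videos * seconds_per_video * fps


# Lower and upper estimates for 300 videos of 10 and 15 seconds at 30 frames per second.
low = frame_count(300, 10)
high = frame_count(300, 15)
print(f"between {low:,} and {high:,} frames")
```

The stated total of more than 100K images falls inside this estimated range.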

Download TikTok Dataset:

Please use the dataset for research purposes only.

The dataset can be viewed and downloaded from the Kaggle page. (you need to make an account in Kaggle to be able to download the data. It is free!)

The dataset can also be downloaded from here (42 GB). The dataset resolution is: (1080 x 604)

The original YouTube videos corresponding to each sequence and the dance name can be downloaded from here (2.6 GB).

GAViD: Group Affect from ViDeos

zenodo.org

csv, zip

Updated Jun 5, 2025

Deepak Kumar; Puneet Kumar; Xiaobai Li; Balasubramanian Raman (2025). GAViD: Group Affect from ViDeos [Dataset]. http://doi.org/10.5281/zenodo.15448846

Explore at:

Available download formats: csv, zip

Unique identifier

https://doi.org/10.5281/zenodo.15448846

Dataset updated

Jun 5, 2025

Dataset provided by

Zenodo http://zenodo.org/

Authors

Deepak Kumar; Puneet Kumar; Xiaobai Li; Balasubramanian Raman

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Jun 1, 2025

Description

Overview

We introduce the Group Affect from ViDeos (GAViD) dataset, which comprises 5091 video clips with multimodal data (video, audio, and context), annotated with ternary valence and discrete emotion labels, and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present CAGNet, a baseline model for multimodal context-aware group affect recognition. CAGNet achieves 61.20% test accuracy on GAViD, comparable to state-of-the-art performance in the field.

NOTE: For now, we are providing only the train video clips. The corresponding paper is under review at the ACM Multimedia 2025 Dataset Track. After its publication, access to the validation and test sets will be granted upon request and approval, in accordance with the Responsible Use Policy.

Dataset Description

GAViD is a large-scale, in-the-wild multimodal dataset of 5091 samples, each annotated with the elements listed below. The following sections describe its key details and compilation procedure.

Raw video clips of an average duration of five seconds,
Audio aligned with the video clips,
Contextual metadata (scene descriptions, event labels) generated by a multimodal LLM and human-verified,
Group affect labels: ternary valence (positive, neutral, negative) and five discrete emotions (happy, sad, fear, anger, neutral),
Emotion intensity ratings (high, medium, low),
Interaction type labels (cooperative, hostile, neutral),
Action cues (e.g. smiling, clapping, shouting, dancing, singing).

Dataset details

Number of clips (samples) in GAViD: 5,130
Number of samples with some problem: 39
Number of samples after filtering: 5,091
Duration per clip: 5 sec
Clip count per video: 1–35
Dataset split: Train: 3,503; Val: 542; Test: 1,046
Affect labels (classwise distribution): Positive: 2,600; Negative: 1,189; Neutral: 1,302
Emotion label distribution: Neutral: 1,522; Happy: 2,428; Anger: 884; Sad: 201; Fear: 56
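The counts listed above can be cross-checked against each other. The short sketch below only re-adds the published numbers; the variable names are chosen here for illustration.

```python
total_clips = 5130
problem_clips = 39
filtered = total_clips - problem_clips

split = {"train": 3503, "val": 542, "test": 1046}
affect = {"positive": 2600, "negative": 1189, "neutral": 1302}
emotion = {"neutral": 1522, "happy": 2428, "anger": 884, "sad": 201, "fear": 56}

# Every breakdown should sum to the number of samples left after filtering.
for name, counts in (("split", split), ("affect", affect), ("emotion", emotion)):
    print(name, sum(counts.values()), sum(counts.values()) == filtered)
```

All three breakdowns add up to the 5,091 clips that remain after filtering.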

Keywords used to search the raw videos from YouTube

Positive	Positive	Negative	Negative	Neutral	Neutral
Team Celebration	Happy	Protest	Angry Sport	Group Meeting	Panel Discussion
Group Meeting	Video Conference	Heated Argument	Violent Protest	Parliament speech	People on street
Get Together	Meeting	Emotional breakdown in Public	Aggressive Argument	People walking on street	Team brainstorming Session
Celebration	Press Conference	Spiritual Gathering	Aggressive Group	Team Building Activities	Group Discussion
Religious gathering	Talk Show	Street Race	Condolence	Group work session	Team Planning session
Farewell	Group Performance	Group Fight	Wrestling	Students in Discussion	Wedding Group Dance
People Dancing on Street	Street Comedy	MMA Fight	Violence	Roundtable Discussion	Oath
Wedding Performance	Dhol masti	Boxing	Silent Protest	Mental health address	General Talk
Couple group dance	Comedy show	People in the fight	Group Fight	Wedding Celebration	Festival Celebration

Emotion Recognition Results using CAGNet

Model	Val Acc.	Val F1	Test Acc.	Test F1
CAGNet	62.55%	0.454	60.33%	0.448

Components of the Dataset

The dataset comprises two main components:

GAViD_train.csv file: Contains the bin number used by Labelbox in the annotation process, video_id, group_emotion (Positive, Negative, Neutral), specific_emotion (happy, sad, fear, anger, neutral), emotion_intensity, interaction_type, action_cuse, and the video description generated using the Video-ChatGPT model.
GAViD_Train_VideoClips.zip folder: Contains the video clips of the train set [for now we are providing only the train video clips; validation and test set video clips will be provided on request].

Data Format and Fields of the CSV File

The dataset is structured as a GAViD.csv file, with the corresponding videos in related folders. This CSV file includes the following fields:

Video_ID: Unique Identifier of a video
Group_Affect: Positive, Negative, Neutral
Descrete_Emotion: Happy, Sad, Fear, Anger, Neutral
Emotion_Intensity: High, Medium, Low
Interaction_Type: Cooperative, Hostile, Neutral
Action_Cues: e.g. Smiling, Clapping, Shouting, Dancing, Singing etc.
Context: Each video clip's summary generated from the Video-ChatGPT model.
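A minimal sketch of how the fields above might be sanity-checked after download, using only the Python standard library. The column names and allowed label values are taken from the list above, but the exact headers and capitalization in the released CSV are an assumption, and the two sample rows are hypothetical.

```python
import csv
import io

# Allowed values per field, as documented in the field list (assumed to match the file).
ALLOWED = {
    "Group_Affect": {"Positive", "Negative", "Neutral"},
    "Descrete_Emotion": {"Happy", "Sad", "Fear", "Anger", "Neutral"},
    "Emotion_Intensity": {"High", "Medium", "Low"},
    "Interaction_Type": {"Cooperative", "Hostile", "Neutral"},
}


def invalid_rows(csv_text):
    """Return (Video_ID, field, value) for each value outside the documented label sets."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field, allowed in ALLOWED.items():
            value = (row.get(field) or "").strip()
            if value not in allowed:
                problems.append((row.get("Video_ID"), field, value))
    return problems


# Hypothetical rows for illustration only; they are not taken from the dataset.
SAMPLE = """Video_ID,Group_Affect,Descrete_Emotion,Emotion_Intensity,Interaction_Type,Action_Cues,Context
clip_0001,Positive,Happy,High,Cooperative,"Smiling, Clapping",A group celebrates together.
clip_0002,Negative,Angry,Medium,Hostile,Shouting,A heated argument in a crowd.
"""

problems = invalid_rows(SAMPLE)
print(problems)
```

Here the second row is flagged because "Angry" is not one of the documented discrete emotion labels.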

Ethical considerations, data privacy and misuse prevention

Data Collection and Consent: The data collection and annotation strictly followed established ethical protocols in line with YouTube's Terms, which state "Public videos with a Creative Commons license may be reused". We downloaded only public-domain videos licensed under Creative Commons (CC BY 4.0), which "allows others to share, copy and redistribute the material in any medium or format, and to adapt, remix, transform, and build upon it for any purpose, even commercially".
Privacy: All content was reviewed to ensure no private or sensitive information is present. Faces are included only from public domain videos as needed for group affect research; only group-level content is released, with no attempt or risk of individual identification. Other personally identifiable information, such as names, addresses and contacts, was removed.

Code and Citation

Code Repository: https://github.com/deepakkumar-iitr/GAViD/tree/main
Citing the Dataset: Users of the dataset should cite the corresponding paper described at the above GitHub Repository.

License & Access

This dataset is released for academic research only and is free to researchers from educational or
