https://brightdata.com/license
Use our YouTube profiles dataset to extract both business and non-business information from public channels and filter by channel name, views, creation date, or subscribers. Datapoints include URL, handle, banner image, profile image, name, subscribers, description, video count, create date, views, details, and more. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases for this dataset include sentiment analysis, brand monitoring, influencer marketing, and more.
The global number of YouTube users was forecast to increase continuously between 2024 and 2029 by a total of 232.5 million users (+24.91 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 1.2 billion users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures for the YouTube platform have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like Africa and South America.
The number of YouTube users in India was forecast to increase continuously between 2024 and 2029 by a total of 222.2 million users (+34.88 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 859.26 million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures for the YouTube platform have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in countries like Sri Lanka and Nepal.
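As a quick sanity check on the forecasts above, an absolute increase together with a relative increase implies both a 2024 baseline and a 2029 level. The sketch below only illustrates that arithmetic; the function name `implied_levels` is hypothetical and not part of the Statista data, and small deviations from the published figures are expected because the percentages are rounded.

```python
from typing import Tuple


def implied_levels(increase_millions: float, increase_percent: float) -> Tuple[float, float]:
    """Return the (baseline, final) user counts in millions implied by an
    absolute increase and the matching relative increase."""
    baseline = increase_millions / (increase_percent / 100.0)
    return baseline, baseline + increase_millions


# Figures quoted in the two forecasts above.
india_2024, india_2029 = implied_levels(222.2, 34.88)
world_2024, world_2029 = implied_levels(232.5, 24.91)

print(f"India: {india_2024:.1f}M (2024) -> {india_2029:.1f}M (2029)")
print(f"World: {world_2024:.1f}M (2024) -> {world_2029:.1f}M (2029)")
```

The implied 2029 value for India lands close to the quoted 859.26 million, and the implied global 2029 value rounds to about 1.2 billion.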
This dataset was extracted for one of the assignments during a Data Science course. The data was extracted from "https://www.youtube.com/c/ZeeshanUsmani78". If you are interested in the Python code I used for the extraction, you can view it in my profile: "https://github.com/meayyaz/ParsingInPython/blob/main/ChannelData.py". This kind of data can help you learn about any YouTube channel's statistics.
Dataset: There are only 325 rows in this dataset, and the columns are "VideoId", "Title" (title of video), "PublishTime", "ViewCount", "LikeCount", "DislikeCount", "favoriteCount", "commentCount".
I would like to thank Zeeshan-ul-hassan Usmani for allowing me to upload this data and for providing such a good live example.
I would like to learn Data Science and Machine Learning together with my fellow learners. Here is what I think we should get from this dataset:
- Main target: after a new video is uploaded, what will its 'view-count' and 'Like-count' be in the next 24 hours, after 7 days, ...?
- What kind of videos get more views?
- Is there any relationship with the video publish timestamp?
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the past few years, YouTube has become a popular site for video broadcasting and for earning money by showcasing various skills in the form of videos. For some people it has become a main source of income. Getting videos to trend among viewers is one of the major goals of every content creator. The popularity of a video and its reach to the audience are based on YouTube's recommendation algorithm. This document is a dataset descriptor for the dataset collected over a time span of about 45 days during the Israel-Hamas War.
The YouTube-100M data set consists of 100 million YouTube videos: 70M training videos, 10M evaluation videos, and 20M validation videos. Videos average 4.6 minutes each for a total of 5.4M training hours. Each of these videos is labeled with 1 or more topic identifiers from a set of 30,871 labels. There are an average of around 5 labels per video. The labels are assigned automatically based on a combination of metadata (title, description, comments, etc.), context, and image content for each video. The labels apply to the entire video and range from very generic (e.g. “Song”) to very specific (e.g. “Cormorant”). Being machine generated, the labels are not 100% accurate and of the 30K labels, some are clearly acoustically relevant (“Trumpet”) and others are less so (“Web Page”). Videos often bear annotations with multiple degrees of specificity. For example, videos labeled with “Trumpet” are often labeled “Entertainment” as well, although no hierarchy is enforced.
YouTube is now one of the biggest online earning platforms for content creators. Many new creators join every day and upload thousands of videos daily, which produces an enormous amount of data that can be put to many uses. Here I have taken data from T-Series, one of the most subscribed channels on YouTube: the views and ratings of its past videos, along with an estimate of the revenue for each video.
This dataset has very few features, namely:
Date: The date when the particular video was released
Name: Name of the video on YouTube
Views: The views on YouTube as of December 2020
Ratings: The ratings of the video
Comments: Number of comments on the video
Estimated Revenue: The revenue generated by the video on YouTube
This dataset wouldn't have been possible without my sister: she was constantly watching videos on YouTube, which led me to this idea and got me started on this dataset.
In 2021, YouTube's user base in Vietnam amounted to approximately 66.63 million users. The number of YouTube users in Vietnam is projected to reach 75.44 million users by 2025. User figures have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
The ranking by number of YouTube users is led by India with 637.1 million users, followed by Russia with 95.38 million users. In contrast, Iceland is at the bottom of the ranking with 0.26 million users, a difference of 636.84 million users compared to India. User figures for the YouTube platform have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Associative Tag Recommendation Exploiting Multiple Textual Features
Fabiano Belem, Eder Martins, Jussara M. Almeida, Marcos Goncalves
In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, July 2011
Abstract
This work addresses the task of recommending relevant tags to a target object by jointly exploiting three dimensions of the problem: (i) term co-occurrence with tags preassigned to the target object, (ii) terms extracted from multiple textual features, and (iii) several metrics of tag relevance. In particular, we propose several new heuristic methods, which extend previous, highly effective and efficient, state-of-the-art strategies by including new metrics that try to capture how accurately a candidate term describes the object's content. We also exploit two learning to rank techniques, namely RankSVM and Genetic Programming, for the task of generating ranking functions that combine multiple metrics to accurately estimate the relevance of a tag to a given object. We evaluate all proposed methods in various scenarios for three popular Web 2.0 applications, namely, LastFM, YouTube and YahooVideo. We found that our new heuristics greatly outperform the methods on which they are based, producing gains in precision of up to 181%, as well as another state-of-the-art technique, with improvements in precision of up to 40% over the best baseline in any scenario. Some further improvements can also be achieved, in some scenarios, with the new learning-to-rank based strategies, which have the additional advantage of being quite flexible and easily extensible to exploit other aspects of the tag recommendation problem.
Bibtex Citation
@inproceedings{belem@sigir11,
  author = {Fabiano Bel\'em and Eder Martins and Jussara Almeida and Marcos Gon\c{c}alves},
  title = {Associative Tag Recommendation Exploiting Multiple Textual Features},
  booktitle = {{Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (SIGIR'11)}},
  month = {{July}},
  year = {2011}
}
This statistic shows a ranking of the estimated number of YouTube users in 2020 in Latin America and the Caribbean, differentiated by country. The user numbers have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in more than 150 countries and regions worldwide. All input data are sourced from international institutions, national statistical offices, and trade associations. All data have been processed to generate comparable datasets (see supplementary notes under details for more information).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We present an English YouTube dataset manually annotated for hate speech types and targets. The comments to be annotated were sampled from the English YouTube comments on videos about the Covid-19 pandemic in the period from January 2020 to May 2020. Two sets were annotated: a training set with 51,655 comments (IMSyPP_EN_YouTube_comments_train.csv) and two evaluation sets, one annotated in-context (IMSyPP_EN_YouTube_comments_evaluation_context.csv), another out-of-context (IMSyPP_EN_YouTube_comments_evaluation_no_context.csv), each based on the same 10,759 comments. The dataset was annotated by 10 annotators with most (99.9%) of the comments being annotated by two annotators. It was used to train a classification model for hate speech types detection that is publicly available at the following URL: https://huggingface.co/IMSyPP/hate_speech_en.
The dataset consists of the following fields:
Video_ID - YouTube ID of the video under which the comment was posted
Comment_ID - YouTube ID of the comment
Text - text of the comment
Type - type of hate speech
Target - the target of hate speech
Annotator - code of the human annotator
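To illustrate how the fields listed above might be used, the following is a minimal sketch that counts label values and collects the annotators per comment. It assumes a comma-separated file with a header row that uses exactly these field names, which is not confirmed here. The sample rows, IDs, and label values are made-up placeholders, not real data, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows; IDs and label values are NOT from the real dataset.
SAMPLE_CSV = """Video_ID,Comment_ID,Text,Type,Target,Annotator
vid1,c1,example comment one,label_a,target_x,A01
vid1,c1,example comment one,label_b,target_x,A02
vid2,c2,example comment two,label_a,target_y,A01
"""


def count_labels(csv_file, field="Type"):
    """Count how often each value of `field` occurs across annotation rows."""
    return Counter(row[field] for row in csv.DictReader(csv_file))


def annotators_per_comment(csv_file):
    """Map each Comment_ID to the set of annotator codes that labeled it."""
    result = {}
    for row in csv.DictReader(csv_file):
        result.setdefault(row["Comment_ID"], set()).add(row["Annotator"])
    return result


type_counts = count_labels(io.StringIO(SAMPLE_CSV))
per_comment = annotators_per_comment(io.StringIO(SAMPLE_CSV))
print(type_counts)
print(per_comment)
```

For the real files, an open file handle (for example one opened with `newline=""`) could be passed in place of the `io.StringIO` object, provided the header assumption holds.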
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for youtube_subs_howto100M
Dataset Summary
The youtube_subs_howto100M dataset is an English-language dataset of instruction-response pairs extracted from 309136 YouTube videos. The dataset was originally inspired by and sourced from the HowTo100M dataset, which was developed for natural language search for video clips.
Supported Tasks and Leaderboards
conversational: The dataset can be used to train a model for instruction(request) and a long form… See the full description on the dataset page: https://huggingface.co/datasets/totuta/youtube_subs_howto100M.
YouTube-BoundingBoxes (YT-BB) is a large-scale data set of video URLs with densely-sampled object bounding box annotations. The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. The objects represent a subset of the MS COCO label set. All video segments were human-annotated with high-precision classification labels and bounding boxes at 1 frame per second.
MeLa BitChute is a near-complete dataset of over 3M videos from 61K channels over 2.5 years (June 2019 to December 2021) from the social video hosting platform BitChute, a commonly used alternative to YouTube. Additionally, the dataset includes a variety of video-level metadata, including comments, channel descriptions, and views for each video.
The dataset contains data from 3,036,190 videos, 61,229 channels, and 11,434,571 comments between June 28th, 2019 and December 31st, 2021. This dataset provides timestamped activities and estimates on views for the majority of channels and videos on the platform, allowing researchers to align BitChute videos with behavior on other platforms. Therefore, this dataset can facilitate both studies of BitChute in isolation and studies of BitChute’s role in the larger ecosystem.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
YouTube flows
The number of YouTube users in Europe was forecast to increase continuously between 2024 and 2029 by a total of 7.8 million users (+3.61 percent). After the ninth consecutive year of growth, the YouTube user base is estimated to reach 223.61 million users, a new peak, in 2029. Notably, the number of YouTube users has increased continuously over the past years. User figures for the YouTube platform have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users in regions like North America and Australia & Oceania.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This data set consists of links to social network items for 34 different forensic events that took place between August 14th, 2018 and January 06th, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook and Google+), and every video is from YouTube.
Data Collection
We used Social Tracker (https://github.com/MKLab-ITI/mmdemo-dockerized), along with the social medias' APIs, to gather most of the collections. For a minor part, we used Twint (https://github.com/twintproject/twint). In both cases, we provided keywords related to the event to receive the data.
It is important to mention that, in procedures like this one, usually only a small fraction of the collected data is in fact related to the event and useful for a further forensic analysis.
Content
We have data from 34 events, and for each of them we provide the files:
items_full.csv: It contains links to any social media post that was collected.
images.csv: Lists the images collected. In some files there is a field called "ItemUrl", which refers to the social network post (e.g., a tweet) that mentions that media item.
video.csv: Urls of YouTube videos that were gathered about the event.
video_tweet.csv: This file contains IDs of tweets and IDs of YouTube videos. A tweet whose ID is in this file has a video in its content. In turn, the link of a Youtube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file.
description.txt: Contains some standard information about the event, and possibly some comments about any specific issue related to it.
In fact, most of the collections do not have all the files above. This is due to changes in our collection procedure over the course of this work.
Events
We divided the events into six groups. They are:
A devastating fire is the main issue of the event; therefore, most of the informative pictures show flames or burned constructions.
14 Events
Most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire).
5 Events
Likely images of guns and police officers. Little or no destruction of the environment.
5 Events
A large number of people on the streets. Some incident may have occurred during the gathering, but in most cases the demonstration is the actual event.
7 Events
Traffic collision. Pictures of damaged vehicles on an urban landscape. Possibly there are images with victims on the street.
1 Event
Events that range from fierce rain to a tsunami. Many pictures depict water.
2 Events
We list the events in the file recod-ai-events-dataset-list.pdf
Media Content
Due to the social networks' terms of use, we do not make publicly available the texts, images and videos that were collected. However, additional media content related to one or more events can be provided upon request to the authors.
Funding
DéjàVu thematic project, São Paulo Research Foundation (grants 2017/12646-3, 2018/18264-8 and 2020/02241-9)
The number of YouTube users in Africa was forecast to increase continuously between 2024 and 2029 by a total of 0.03 million users (+3.95 percent). The YouTube user base is estimated to amount to 0.79 million users in 2029. User figures for the YouTube platform have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of YouTube users worldwide and in the Americas.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EOAD is a collection of videos captured by wearable cameras, mostly of sports activities. It contains both visual and audio modalities.
It was initiated by the HUJI and FPVSum egocentric activity datasets. However, the number of samples and diversity of activities for HUJI and FPVSum were insufficient. Therefore, we combined these datasets and populated them with new YouTube videos.
The selection of videos was based on the following criteria:
The videos should not include text overlays.
The videos should contain natural sound (no external music).
The actions in videos should be continuous (no cutting the scene or jumping in time).
Video samples were trimmed depending on scene changes for long videos (such as driving, scuba diving, and cycling). As a result, a video may have several clips depicting egocentric actions. Hence, video clips were extracted from carefully defined time intervals within videos. The final dataset includes video clips with a single action and natural audio information.
Statistics for EOAD:
30 activities
303 distinct videos
1392 video clips
2243 minutes of labeled video clips
The detailed statistics for the selected datasets and the video clips crawled from YouTube are given below:
HUJI: 49 distinct videos - 148 video clips for 9 activities (driving, biking, motorcycle, walking, boxing, horse riding, running, skiing, stair climbing)
FPVSum: 39 distinct videos - 124 video segments for 8 activities (biking, horse riding, skiing, longboarding, rock climbing, scuba, skateboarding, surfing)
YouTube: 216 distinct videos - 1120 video clips for 27 activities (american football, basketball, bungee jumping, driving, go-kart, horse riding, ice hockey, jet ski, kayaking, kitesurfing, longboarding, motorcycle, paintball, paragliding, rafting, rock climbing, rowing, running, sailing, scuba diving, skateboarding, soccer, stair climbing, surfing, tennis, volleyball, walking)
The video clips used for the training, validation and test sets for each activity are listed in Table 1. Multiple video clips may belong to a single video because the video was trimmed for reasons such as scene cuts, temporarily overlaid text, or video parts unrelated to the activities.
While splitting the dataset, the minimum number of videos for each activity was set to 8. The video samples were divided 50%, 25%, and 25% into training (minimum four videos), validation (minimum two videos), and testing (minimum two videos), respectively. Videos were split according to the raw video footage to prevent similar video clips (having the same actors and scenes) from being mixed across the training, validation, and test sets. Therefore, we ensured that the video clips trimmed from the same video were placed together in the training, validation, or test set to allow a fair comparison.
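The splitting rule described above can be sketched as a group-aware split, in which videos (not clips) are assigned to a set and every clip follows its source video. This is only an illustration and not the authors' script: the function name `split_by_video`, the random shuffling, and the rounding of the 25% shares are assumptions.

```python
import random
from collections import defaultdict


def split_by_video(clips, seed=0, min_videos=8):
    """Assign clips to train/validation/test so that all clips trimmed from
    the same video share one split.

    `clips` is a list of (activity, video_id, clip_id) tuples.
    Returns a dict mapping clip_id to a split name.
    """
    videos = defaultdict(set)
    for activity, video_id, _clip_id in clips:
        videos[activity].add(video_id)

    rng = random.Random(seed)
    video_to_split = {}
    for activity in sorted(videos):
        ordered = sorted(videos[activity])
        if len(ordered) < min_videos:
            raise ValueError(f"{activity}: need at least {min_videos} videos")
        rng.shuffle(ordered)  # assumption: videos are assigned at random
        # Roughly 25% each for validation and test, at least two videos each.
        n_val = max(2, round(0.25 * len(ordered)))
        n_test = max(2, round(0.25 * len(ordered)))
        n_train = len(ordered) - n_val - n_test
        for i, video_id in enumerate(ordered):
            if i < n_train:
                name = "train"
            elif i < n_train + n_val:
                name = "validation"
            else:
                name = "test"
            video_to_split[(activity, video_id)] = name

    return {clip_id: video_to_split[(activity, video_id)]
            for activity, video_id, clip_id in clips}


# Hypothetical demo data: one activity, 8 videos, 3 clips per video.
demo_clips = [("Rafting", f"video{v}", f"video{v}_clip{c}")
              for v in range(8) for c in range(3)]
demo_split = split_by_video(demo_clips)
```

With 8 videos for an activity, this yields four training, two validation, and two test videos. Because the assignment is made per video, the number of clips per split can still be imbalanced.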
Some activities, such as scuba, longboarding, or horse riding, are continuous throughout the video, so the number of video segments equals the number of videos. However, some activities, such as skating, occur over a short time, making the number of video segments higher than for the others. As a result, the number of video clips in the training, validation, and test sets was highly imbalanced for the selected activities (e.g., jet ski and rafting have 4, whereas soccer has 99 video clips for training).
Table 1 - Dataset splitting for EOAD
| Action Label | Train Clips | Train Total Duration | Validation Clips | Validation Total Duration | Test Clips | Test Total Duration |
|---|---|---|---|---|---|---|
| AmericanFootball | 34 | 00:06:09 | 36 | 00:05:03 | 9 | 00:01:20 |
| Basketball | 43 | 01:13:22 | 19 | 00:08:13 | 10 | 00:28:46 |
| Biking | 9 | 01:58:01 | 6 | 00:32:22 | 11 | 00:36:16 |
| Boxing | 7 | 00:24:54 | 11 | 00:14:14 | 5 | 00:17:30 |
| BungeeJumping | 7 | 00:02:22 | 4 | 00:01:36 | 4 | 00:01:31 |
| Driving | 19 | 00:37:23 | 9 | 00:24:46 | 9 | 00:29:23 |
| GoKart | 5 | 00:40:00 | 3 | 00:11:46 | 3 | 00:19:46 |
| Horseback | 5 | 01:15:14 | 5 | 01:02:26 | 2 | 00:20:38 |
| IceHockey | 52 | 00:19:22 | 46 | 00:20:34 | 10 | 00:36:59 |
| Jetski | 4 | 00:23:35 | 5 | 00:18:42 | 6 | 00:02:43 |
| Kayaking | 28 | 00:43:11 | 22 | 00:14:23 | 4 | 00:11:05 |
| Kitesurfing | 30 | 00:21:51 | 17 | 00:05:38 | 6 | 00:01:32 |
| Longboarding | 5 | 00:15:40 | 4 | 00:18:03 | 4 | 00:09:11 |
| Motorcycle | 20 | 00:49:38 | 21 | 00:13:53 | 8 | 00:20:30 |
| Paintball | 7 | 00:33:52 | 4 | 00:12:08 | 4 | 00:08:52 |
| Paragliding | 11 | 00:28:42 | 4 | 00:10:16 | 4 | 00:19:50 |
| Rafting | 4 | 00:15:41 | 3 | 00:07:27 | 3 | 00:06:13 |
| RockClimbing | 6 | 00:49:38 | 2 | 00:21:59 | 2 | 00:18:50 |
| Rowing | 5 | 00:47:05 | 3 | 00:13:21 | 3 | 00:03:26 |
| Running | 21 | 01:21:56 | 19 | 00:46:29 | 11 | 00:42:59 |
| Sailing | 7 | 00:39:30 | 4 | 00:14:39 | 6 | 00:15:43 |
| Scuba | 5 | 00:35:02 | 3 | 00:23:43 | 2 | 00:18:52 |
| Skate | 91 | 00:15:53 | 30 | 00:07:01 | 10 | 00:02:03 |
| Ski | 14 | 01:48:15 | 17 | 01:01:59 | 7 | 00:39:15 |
| Soccer | 102 | 00:48:39 | 52 | 00:13:17 | 16 | 00:06:54 |
| StairClimbing | 6 | 01:05:32 | 6 | 00:17:18 | 5 | 00:20:22 |
| Surfing | 23 | 00:12:51 | 17 | 00:06:52 | 10 | 00:07:04 |
| Tennis | 34 | 00:27:04 | 9 | 00:06:03 | 9 | 00:03:14 |
| Volleyball | 87 | 00:19:14 | 35 | 00:07:46 | 7 | 00:18:58 |
| Walking | 49 | 00:43:02 | 36 | 00:38:25 | 10 | 00:10:23 |
| Total (30 action labels) | 740 | 20:22:37 | 452 | 09:20:23 | 200 | 08:00:08 |
EOAD Code Repository
Scripts for downloading the raw videos and trimming them into video clips are provided in this GitHub repository.
For questions, please contact mali.arabaci@gmail.com.