22 datasets found

Data from: YouTube Videos Datasets
brightdata.com
.json, .csv, .xlsx
Updated Dec 20, 2024
Cite
Bright Data (2024). YouTube Videos Datasets [Dataset]. https://brightdata.com/products/datasets/youtube/videos
Explore at:
Available download formats
.json, .csv, .xlsx
Dataset updated
Dec 20, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide, YouTube
Description
Use our YouTube Videos dataset to extract detailed information from public videos and filter by video title, views, upload date, or likes. Data points include video URL, title, description, thumbnail, upload date, view count, like count, comment count, tags, and more. You can purchase the entire dataset or a customized subset, tailored to your needs. Popular use cases for this dataset include trend analysis, content performance tracking, brand monitoring, and influencer campaign optimization.
Hours of video uploaded to YouTube every minute 2007-2022
statista.com
Updated Jun 20, 2025
Cite
Statista (2025). Hours of video uploaded to YouTube every minute 2007-2022 [Dataset]. https://www.statista.com/statistics/259477/hours-of-video-uploaded-to-youtube-every-minute/
Explore at:
Dataset updated
Jun 20, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jun 2007 - Jun 2022
Area covered
Worldwide, YouTube
Description
As of June 2022, more than *** hours of video were uploaded to YouTube every minute. This equates to approximately ****** hours of newly uploaded content per hour. The amount of content on YouTube has increased dramatically as consumers’ appetite for online video has grown. In fact, the number of video content hours uploaded every 60 seconds grew by around ** percent between 2014 and 2020.

YouTube global users

Online video is one of the most popular digital activities worldwide, with ** percent of internet users worldwide watching more than ** hours of online videos on a weekly basis in 2023. It was estimated that in 2023 YouTube would reach approximately *** million users worldwide. In 2022, the video platform was one of the leading media and entertainment brands worldwide, with a value of more than ** billion U.S. dollars.

YouTube video content consumption

The most viewed YouTube channels of all time have racked up billions of viewers and millions of subscribers, and cover a wide variety of topics ranging from music to cosmetics. The YouTube channel owner with the most video views is Indian music label T-Series, which counted ****** billion lifetime views. Other popular YouTubers are gaming personalities such as PewDiePie, DanTDM and Markiplier.
Top 1000 YouTube Channels in the World 🌐📊🎥
kaggle.com
Updated Jun 25, 2024
Cite
Mayank Anand (2024). Top 1000 YouTube Channels in the World 🌐📊🎥 [Dataset]. https://www.kaggle.com/datasets/mayankanand2701/top-1000-youtube-channels-in-the-world/data
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jun 25, 2024
Dataset provided by
Kaggle
Authors
Mayank Anand
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Area covered
YouTube
Description
YouTube is the world's largest video-sharing platform, launched in 2005. It allows users to upload, view, and share videos, and has grown to be a central hub for content creators across various fields, including entertainment, education, music, and more. With over 2 billion logged-in users monthly, YouTube has become an essential platform for digital content and marketing.

The Top 1000 YouTube Channels Dataset captures detailed information about the top-performing YouTube channels globally. This dataset includes the following columns:

Rank : The ranking of the YouTube channel based on its overall popularity and performance.

Youtuber : The name of the YouTuber or the title of the YouTube channel.

Subscribers : The total number of subscribers to the channel, indicating its reach and popularity.

Video Views : The total number of video views the channel has accumulated, reflecting its engagement and audience interaction.

Video Count : The total number of videos uploaded by the channel, showing the content volume produced.

Category : The genre or category the channel belongs to, such as music, education, entertainment, etc.

Started : The year the channel was created, providing insight into its longevity and growth over time.

This dataset is invaluable for analyzing trends, understanding content strategies, and benchmarking channel performances within the YouTube ecosystem.
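
As a rough illustration of how the columns listed above might be combined, the sketch below computes average views per video for each channel. The sample rows are hypothetical, and the exact header spelling and the use of thousands separators in the real file are assumptions.

```python
import csv
import io

# Hypothetical sample rows. The real file's header spelling and number
# formatting (e.g. thousands separators) are assumptions.
SAMPLE = """Rank,Youtuber,Subscribers,Video Views,Video Count,Category,Started
1,Channel A,"250,000,000","230,000,000,000","20,000",Music,2006
2,Channel B,"150,000,000","30,000,000,000",800,Entertainment,2012
"""


def to_int(text):
    """Parse a count that may contain thousands separators."""
    return int(text.replace(",", "").strip())


def views_per_video(rows):
    """Return {channel name: average views per uploaded video}."""
    result = {}
    for row in rows:
        videos = to_int(row["Video Count"])
        if videos == 0:
            continue  # skip channels with no uploads to avoid dividing by zero
        result[row["Youtuber"]] = to_int(row["Video Views"]) / videos
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
averages = views_per_video(rows)
```

To run this against the downloaded file, replace `io.StringIO(SAMPLE)` with an opened file handle and adjust the column names if they differ.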
Spotify and Youtube
data.niaid.nih.gov
Updated Dec 4, 2023
Cite
Guarisco, Marco (2023). Spotify and Youtube [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10253414
Explore at:
Dataset updated
Dec 4, 2023
Dataset provided by
Rastelli, Salvatore
Guarisco, Marco
Sallustio, Marco
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
YouTube
Description
These are the statistics for the top 10 songs of various Spotify artists and their YouTube videos. The creators listed above generated the data and uploaded it to Kaggle on February 6-7, 2023. The license to use this data is "CC0: Public Domain", allowing the data to be copied, modified, distributed, and worked on without having to ask permission. The data is in numerical and textual CSV format as attached. This dataset contains the statistics and attributes of the top 10 songs of various artists in the world. As described by the creators, it includes 26 variables for each of the songs collected from Spotify. These variables are briefly described next:

Track: name of the song, as visible on the Spotify platform.
Artist: name of the artist.
Url_spotify: the URL of the artist.
Album: the album in which the song is contained on Spotify.
Album_type: indicates if the song is released on Spotify as a single or contained in an album.
Uri: a Spotify link used to find the song through the API.
Danceability: describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: a measure from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
Key: the key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
Loudness: the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
Speechiness: detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
Acousticness: a confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
Instrumentalness: predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
Liveness: detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
Valence: a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
Tempo: the overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
Duration_ms: the duration of the track in milliseconds.
Stream: number of streams of the song on Spotify.
Url_youtube: URL of the video linked to the song on YouTube, if it has one.
Title: title of the video clip on YouTube.
Channel: name of the channel that published the video.
Views: number of views.
Likes: number of likes.
Comments: number of comments.
Description: description of the video on YouTube.
Licensed: indicates whether the video represents licensed content, which means that the content was uploaded to a channel linked to a YouTube content partner and then claimed by that partner.
official_video: boolean value that indicates if the video found is the official video of the song.

The data was last updated on February 7, 2023.
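
The Key and Speechiness encodings described above can be decoded with a few lines of code. The following is a minimal sketch, not part of the dataset: the function names are hypothetical, and the handling of values exactly at 0.33 and 0.66 is an assumption, since the description does not say which side those boundaries fall on.

```python
# Pitch Class notation: integer 0..11 mapped to a pitch name.
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def key_name(key):
    """Return the pitch name for a Key value, or None when no key was detected (-1)."""
    if key == -1:
        return None
    if not 0 <= key <= 11:
        raise ValueError(f"Key must be -1 or in 0..11, got {key}")
    return PITCH_CLASSES[key]


def speechiness_label(value):
    """Bucket a Speechiness value using the 0.33 and 0.66 thresholds.

    Assumption: values exactly on a threshold go to the middle bucket.
    """
    if value > 0.66:
        return "probably spoken word"
    if value >= 0.33:
        return "music and speech"
    return "music or other non-speech"
```

For example, `key_name(2)` gives the name for D, and a Speechiness of 0.5 falls in the mixed "music and speech" bucket.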
Top Youtube Artist
kaggle.com
Updated Jan 12, 2023
Cite
Mrityunjay Pathak (2023). Top Youtube Artist [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/top-youtube-artist
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jan 12, 2023
Dataset provided by
Kaggle
Authors
Mrityunjay Pathak
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
YouTube
Description
YouTube was created in 2005, with the first video – Me at the Zoo - being uploaded on 23 April 2005. Since then, 1.3 billion people have set up YouTube accounts. In 2018, people watch nearly 5 billion videos each day. People upload 300 hours of video to the site every minute.

According to 2016 research undertaken by Pexeso, music only accounts for 4.3% of YouTube’s content. Yet it makes 11% of the views. Clearly, an awful lot of people watch a comparatively small number of music videos. It should be no surprise, therefore, that the most watched videos of all time on YouTube are predominantly music videos.

On August 13, BTS became the most-viewed artist in YouTube history, accumulating over 26.7 billion views across all their official channels. This count includes all music videos and dance practice videos.

Justin Bieber and Ed Sheeran now hold the records for second and third-highest views, with over 26 billion views each.

Currently, BTS’s most viewed videos are their music videos for “Boy With Luv,” “Dynamite,” and “DNA,” which all have over 1.4 billion views.

Headers of the Dataset
Total = Total views (in millions) across all official channels
Avg = Current daily average of all videos combined
100M = Number of videos with more than 100 million views
YouTube's Channels Dataset
kaggle.com
Updated Mar 31, 2021
Cite
HarshitHGupta (2021). YouTube's Channels Dataset [Dataset]. https://www.kaggle.com/datasets/harshithgupta/youtubes-channels-dataset
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Mar 31, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
HarshitHGupta
Area covered
YouTube
Description
Context

YouTube is an American online video-sharing platform headquartered in San Bruno, California. The service, created in February 2005 by three former PayPal employees—Chad Hurley, Steve Chen, and Jawed Karim—was bought by Google in November 2006 for US$1.65 billion and now operates as one of the company's subsidiaries. YouTube is the second most-visited website after Google Search, according to Alexa Internet rankings.

YouTube allows users to upload, view, rate, share, add to playlists, report, comment on videos, and subscribe to other users. Available content includes video clips, TV show clips, music videos, short and documentary films, audio recordings, movie trailers, live streams, video blogging, short original videos, and educational videos.

YouTube (the world-famous video sharing website) maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments, and likes). Note that they’re not the most-viewed videos overall for the calendar year”. Top performers on the YouTube trending list are music videos (such as the famously viral “Gangnam Style”), celebrity and/or reality TV performances, and the random dude-with-a-camera viral videos that YouTube is well-known for.

This dataset is a daily record of the top trending YouTube videos.

Note that this dataset is a structurally improved version of this dataset.

Acknowledgements

This dataset was collected using the YouTube API. This Description is cited in Wikipedia.
Data from: Using Multistreaming Social Media Video as a Research Method for...
research.usc.edu.au
researchdata.edu.au
Updated Mar 23, 2022
Cite
Karen Sutherland; Krisztina Morris (2022). Using Multistreaming Social Media Video as a Research Method for Interview Data Collection [Dataset]. https://research.usc.edu.au/esploro/outputs/dataset/Using-Multistreaming-Social-Media-Video-as/99620208702621
Explore at:
Dataset updated
Mar 23, 2022
Dataset provided by
Sage (http://www.sagepublications.com/)
Authors
Karen Sutherland; Krisztina Morris
Time period covered
2022
Description
This dataset is designed to explore multistreaming social media video as a research method used to collect semi-structured interview data. The data are provided by Dr Karen E. Sutherland and Ms Krisztina Morris from the School of Business and Creative Industries at the University of the Sunshine Coast in Queensland, Australia. The dataset is drawn from the publicly available video recording of an interview undertaken as part of the research project called: ‘Like, Share, Follow’, a multistreaming show, featuring Dr Sutherland interviewing university graduates about their career journeys, that is broadcast across Facebook, LinkedIn, and Twitter and later uploaded to YouTube. This dataset examines how multistreaming video interview data can be used to answer research questions and the benefits and challenges this specific method of data collection can pose in the process of data analysis. The video example is accompanied by a teaching guide and a student guide.
iShowSpeed YouTube Channel Videos' Stats
kaggle.com
Updated Jan 4, 2025
Cite
Chik0di (2025). iShowSpeed YouTube Channel Videos' Stats [Dataset]. https://www.kaggle.com/datasets/chik0di/ishowspeed-youtube-channel-videos-and-stats
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jan 4, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Chik0di
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
YouTube
Description
This dataset contains detailed information about IShowSpeed's YouTube channel, including metadata for all his videos and associated performance metrics. IShowSpeed, widely known for his energetic and entertaining content, has amassed a massive following on YouTube. This dataset provides insights into his content strategy, video performance, and audience engagement patterns.

Use Cases:
- Analyze patterns in video uploads and performance over time.
- Study audience interaction through likes, comments, and views.
- Forecast future growth and engagement based on historical data.
- Identify video features that resonate most with viewers.

Check out Notebook here

Dataset and Supplementary Tables on Retracted Articles Referenced in YouTube...

zenodo.org

Updated Jun 29, 2025

Cite

Jiro Kikkawa; Masao Takaku (2025). Dataset and Supplementary Tables on Retracted Articles Referenced in YouTube Videos (TPDL 2025) [Dataset]. http://doi.org/10.5281/zenodo.15377209

Explore at:

Unique identifier

https://doi.org/10.5281/zenodo.15377209

Dataset updated

Jun 29, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Jiro Kikkawa; Masao Takaku

Area covered

YouTube

Description

This dataset and supplementary tables are released in conjunction with the TPDL 2025 paper titled “How Retracted Research Persists on YouTube: Retraction Severity, Visibility, and Disclosure.” They provide detailed information used in the analysis to promote transparency, ensure reproducibility, and facilitate future studies on scholarly communication and retractions.

The dataset contains the following files:

Filename	Data Format	Description
01_dataset_scholarly_references_on_YouTube.json.gz	JSON Lines	An integrated dataset of scholarly references in YouTube video descriptions, covering videos posted up to the end of December 2023. This dataset combines the Altmetric dataset and the YA Domain Dataset and is the basis for identifying references to retracted articles. This dataset contains 743,529 scholarly references (386,628 unique DOIs) found in 322,521 YouTube videos uploaded by 77,974 channels.
02_dataset_references_to_retracted_articles_on_YouTube.json.gz	JSON Lines	A dataset of retracted articles referenced in YouTube videos, used as the primary source for analysis in this paper. The dataset was created by cross-referencing the integrated reference dataset with the Retraction Watch database. It includes metadata such as DOI, article title, retraction reason, and severity classification (Severe, Moderate, or Minor) based on Woo and Walsh (2024), along with video- and channel-level statistics (e.g., view counts and subscriber counts) retrieved via the YouTube Data API v3 as of April 22, 2025. This dataset contains 1,002 retracted articles (360 unique DOIs) found in 956 YouTube videos uploaded by 714 channels.
03_full_list_table3_sorted_by_reference_count_retracted_articles_on_YouTube.json.gz	JSON Lines	Complete list corresponding to Table 3, "Top 7 retracted articles ranked by the number of YouTube videos in which they are referenced." in the paper.
04_full_list_table5_top10_most-viewed_video.json.gz	JSON Lines	Complete list corresponding to Table 5, "Top 10 most-viewed YouTube videos that reference retracted articles, sorted by video view count." in the paper.
05_detailed_manual_coding_40_sampled_retracted_articles.xlsx	XLSX	This file provides detailed annotations for a manually coded sample of 40 YouTube videos referencing retracted scholarly articles. The sample includes 10 randomly selected videos from each of the four analytical groups categorized by publication timing (before/after retraction) and retraction severity (Moderate/Severe). The file includes reference stance for each video, visual/verbal mention of the article, and relevant timestamps when applicable. This dataset supplements the manual analysis results presented in Tables 6 and 7 in paper.

Due to concerns over potential misuse (e.g., identification or harassment of individual content creators), this dataset is not made publicly available.
Researchers who wish to use this dataset for scholarly purposes may contact the authors to request access.
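
The `.json.gz` files listed in the table above are described as gzip-compressed JSON Lines, so they can be streamed one record at a time without unpacking them first. The helper below is a minimal sketch using only the Python standard library; the function name is hypothetical, and nothing is assumed about the field names inside each record.

```python
import gzip
import json


def read_jsonl_gz(path):
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

A call such as `for record in read_jsonl_gz("02_dataset_references_to_retracted_articles_on_YouTube.json.gz"): ...` would then iterate over the records, assuming the file has been downloaded to the working directory.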

References

Woo, S., Walsh, J.P.: On the shoulders of fallen giants: What do references to retracted research tell us about citation behaviors? Quantitative Science Studies 5(1), 1–30 (2024). https://doi.org/10.1162/qss_a_00303
Kikkawa, J., Takaku, M.: How Retracted Article Persists on YouTube: Retraction Severity, Visibility, and Disclosure. Accepted for publication in the Proceedings of the 29th International Conference on Theory and Practice of Digital Libraries (TPDL 2025).
Accepted Papers (TPDL2025) - https://tpdl2025.github.io/Program/accepted_papers.html

Funding

JSPS KAKENHI Grant Numbers JP22K18147 and JP23K11761.

Replication Data for: Beyond Views: Measuring and Predicting Engagement in...
dataverse.harvard.edu
Updated Aug 23, 2019
Cite
Siqi Wu; Marian-Andrei Rizoiu; Lexing Xie (2019). Replication Data for: Beyond Views: Measuring and Predicting Engagement in Online Videos [Dataset]. http://doi.org/10.7910/DVN/L3UWZT
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/L3UWZT
Dataset updated
Aug 23, 2019
Dataset provided by
Harvard Dataverse
Authors
Siqi Wu; Marian-Andrei Rizoiu; Lexing Xie
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The dataset was first introduced in the following paper: Siqi Wu, Marian-Andrei Rizoiu, and Lexing Xie. Beyond Views: Measuring and Predicting Engagement in Online Videos. In AAAI International Conference on Weblogs and Social Media (ICWSM), 2018.

Tweeted videos dataset
This dataset contains YouTube videos published between July 1st and August 31st, 2016. To be collected, a video needs to (a) be mentioned on Twitter during the aforementioned collection period; (b) have insight statistics available; (c) have at least 100 views within the first 30 days after upload.

Quality videos datasets
These datasets contain videos deemed of high quality by domain experts.
Vevo videos: Videos of verified Vevo artists, as of August 31st, 2016.
Billboard16 videos: Videos of the 2016 Billboard Hot 100 chart.
Top news videos: Videos of the top 100 most viewed News channels.

freebase_mid_type_name.csv
It maps a freebase mid to a real-world entity. See more details in this data description.
Copyright claims to YouTube H1 2023, by detection method
statista.com
ai-chatbox.pro
Updated Apr 11, 2024
Cite
Statista (2024). Copyright claims to YouTube H1 2023, by detection method [Dataset]. https://www.statista.com/statistics/1281164/copyright-claims-youtube-by-detection-method/
Explore at:
Dataset updated
Apr 11, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide, YouTube
Description
During the first half of 2023, the majority of copyright claims received by YouTube were spotted by the platform's Content ID tool, which cross-checks uploaded videos against a larger file database. Over 2.75 million claims were submitted via the Copyright Match Tool, while approximately two million claims were submitted to the platform via webforms.
YouTube & Google Maps Data | 21+ Attributes | Channel metrics, Creator Info,...
datarade.ai
Updated May 27, 2024
Cite
Exellius Systems (2024). YouTube & Google Maps Data | 21+ Attributes | Channel metrics, Creator Info, Video Metrics | Google My Business Rating, Maps | Social Media Data [Dataset]. https://datarade.ai/data-products/youtube-google-maps-data-20-attributes-channel-metrics-exellius-systems
Explore at:
Available download formats
.bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
May 27, 2024
Dataset authored and provided by
Exellius Systems
Area covered
Lesotho, Sao Tome and Principe, Honduras, Bonaire, Mayotte, Taiwan, Cameroon, Burkina Faso, Jersey, United Kingdom, YouTube
Description
Our dataset offers a unique blend of attributes from YouTube and Google Maps, empowering users with comprehensive insights into online content and geographical reach. Let's delve into what makes our data stand out:

Unique Attributes:
- From YouTube: Detailed video information including title, description, upload date, video ID, and channel URL. Video metrics such as views, likes, comments, and duration are also provided.
- Creator Info: Access author details like name and channel URL.
- Channel Information: Gain insights into channel title, description, location, join date, and visual branding elements like logo and banner URLs.
- Channel Metrics: Understand a channel's performance with metrics like total views, subscribers, and video count.
- Google Maps Integration: Explore business ratings from Google My Business and location data from Google Maps.

Data Sourcing:
- Our data is meticulously sourced from publicly available information on YouTube and Google Maps, ensuring accuracy and reliability.

Primary Use-Cases:
- Marketing: Analyze video performance metrics to optimize content strategies.
- Research: Explore trends in creator behavior and audience engagement.
- Location-Based Insights: Utilize Google Maps data for market research, competitor analysis, and location-based targeting.

Fit within Broader Offering:
- This dataset complements our broader data offering by providing rich insights into online content consumption and geographical presence. It enhances decision-making processes across various industries, including marketing, advertising, research, and business intelligence.

Usage Examples:
- Marketers can identify popular video topics and optimize advertising campaigns accordingly.
- Researchers can analyze audience engagement patterns to understand viewer preferences.
- Businesses can assess their Google My Business ratings and geographical distribution for strategic planning.

With scalable solutions and high-quality data, our dataset offers unparalleled depth for extracting actionable insights and driving informed decisions in the digital landscape.
Youtube Dataset
kaggle.com
Updated Jan 11, 2023
Cite
Arbaz Mohammad (2023). Youtube Dataset [Dataset]. https://www.kaggle.com/datasets/arbazmohammad/youtube-dataset
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jan 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arbaz Mohammad
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
YouTube
Description
Dataset Description:

Column1: Video ID of 11 characters.
Column2: Uploader of the video (string).
Column3: Interval between the day YouTube was established and the date the video was uploaded (integer).
Column4: Category of the video (string).
Column5: Length of the video (integer).
Column6: Number of views for the video (integer).
Column7: Rating of the video (float).
Column8: Number of ratings given to the video.
Column9: Number of comments on the video (integer).
Column10: IDs of videos related to the uploaded video.
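
The column layout above can be turned into a typed record as follows. This is only a sketch under stated assumptions: the description does not name the delimiter, so a tab is assumed, and the related video IDs are assumed to occupy every field after the ninth. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoRecord:
    video_id: str          # Column1
    uploader: str          # Column2
    age: int               # Column3: interval since YouTube was established
    category: str          # Column4
    length: int            # Column5
    views: int             # Column6
    rating: float          # Column7
    rating_count: int      # Column8
    comment_count: int     # Column9
    related_ids: List[str]  # Column10 (assumed to span all remaining fields)


def parse_record(line, sep="\t"):
    """Parse one line of the dataset; `sep` is an assumption, not documented."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(fields)}")
    return VideoRecord(
        video_id=fields[0],
        uploader=fields[1],
        age=int(fields[2]),
        category=fields[3],
        length=int(fields[4]),
        views=int(fields[5]),
        rating=float(fields[6]),
        rating_count=int(fields[7]),
        comment_count=int(fields[8]),
        related_ids=[f for f in fields[9:] if f],
    )
```

If the file turns out to be comma-separated, passing `sep=","` is enough as long as no field contains a comma; otherwise the `csv` module is the safer choice.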
Webis YouTube 8M Augmented 2018
live.european-language-grid.eu
json
Updated Mar 19, 2024
Cite
(2024). Webis YouTube 8M Augmented 2018 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7585
Explore at:
Available download formats
json
Dataset updated
Mar 19, 2024
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
We used the YouTube Data API to augment the YouTube 8M corpus by crawling a variety of meta data for the videos.
First point of interest was the "video resource," which comprises data about the video, such as the video’s title, description, uploader name, tags, view count, and more. Also included in the meta data is whether comments have been left for the video. If so, we downloaded them as well, including information about their authors, likes, dislikes, and responses.
There is no property which specifies a video’s language, since this information is not mandatory when uploading a video. Also, the API provides only information about the available captions, but not the captions themselves. Only the uploader of a video is given access to its captions via the API; we extracted them using youtube-dl. For each video, all manually created captions were downloaded, and auto-generated captions in the "default" language and English. The "default" auto-generated caption gives perhaps the only hint at a video’s original language.
Finally, we downloaded all thumbnails used to advertise a video, which are not available via the API, but only via a canonical URL. Our corpus provides the possibility to recreate the way a video is presented on YouTube (meta data and thumbnail), what the actual content is ((sub)titles and descriptions), and how its viewers reacted (comments).
If you use this dataset in your publication, please cite the dataset as outlined in the right column.
Dataset used for HTTPS traffic classification using packet burst statistics
data.niaid.nih.gov
zenodo.org
Updated Apr 11, 2022
Cite
Cejka Tomas (2022). Dataset used for HTTPS traffic classification using packet burst statistics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4911550
Explore at:
Dataset updated
Apr 11, 2022
Dataset provided by
Hynek Karel
Tropkova Zdena
Cejka Tomas
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
We are publishing a dataset we created for the HTTPS traffic classification.

Since the data were captured mainly in the real backbone network, we omitted IP addresses and ports. The datasets consist of data calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.

During our research, we divided HTTPS into five categories: L -- Live Video Streaming, P -- Video Player, M -- Music Player, U -- File Upload, D -- File Download, W -- Website, and other traffic.

We have chosen the service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the most popular 500 websites for each category. We also used several popular websites that primarily focus on the audience in our country. The identified traffic classes and their representatives are provided below:

Live Video Stream: Twitch, Czech TV, YouTube Live

Video Player: DailyMotion, Stream.cz, Vimeo, YouTube

Music Player: AppleMusic, Spotify, SoundCloud

File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive

Website and Other Traffic: Websites from Alexa Top 1M list
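
For readers who want to work with these classes programmatically, the letter codes and representatives given in the description can be captured in two small lookup tables. This is an illustrative sketch only: whether the published files use these letters as labels is an assumption, and folding U and D into a single File Upload/Download class is one reading of the text, which names five categories but lists six codes.

```python
# Letter codes from the description mapped to the five listed traffic classes.
# Assumption: U and D share one class, matching the "File Upload/Download" row.
TRAFFIC_CLASSES = {
    "L": "Live Video Stream",
    "P": "Video Player",
    "M": "Music Player",
    "U": "File Upload/Download",
    "D": "File Upload/Download",
    "W": "Website and Other Traffic",
}

# Service representatives per class, as listed in the description.
REPRESENTATIVES = {
    "Live Video Stream": ["Twitch", "Czech TV", "YouTube Live"],
    "Video Player": ["DailyMotion", "Stream.cz", "Vimeo", "YouTube"],
    "Music Player": ["AppleMusic", "Spotify", "SoundCloud"],
    "File Upload/Download": ["FileSender", "OwnCloud", "OneDrive", "Google Drive"],
    "Website and Other Traffic": ["Websites from Alexa Top 1M list"],
}


def class_of(code):
    """Return the traffic class name for a single-letter code (case-insensitive)."""
    return TRAFFIC_CLASSES[code.strip().upper()]
```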
ChatGPT - Youtube Data
kaggle.com
Updated Mar 9, 2023
Cite
dekomori_sanae09 (2023). ChatGPT - Youtube Data [Dataset]. https://www.kaggle.com/datasets/dekomorisanae09/chatgpt-youtube-analysis-data/code
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Mar 9, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
dekomori_sanae09
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
YouTube
Description
The data was scraped using the YouTube API.

Index

videoId: A unique video ID of the YouTube video.

publishedAt: Date of upload of the video.

channelID: A unique channel ID of the YouTube channel.

title: The title of the YouTube video.

channelTitle: The name of the channel.

channelType: The YouTube category ID of the channel type.
ABOME: A Multi-platform Data Repository of Artificially Boosted Online Media...
zenodo.org
Updated Jan 15, 2021
Cite
Hridoy Sankar Dutta; Udit Arora; Tanmoy Chakraborty (2021). ABOME: A Multi-platform Data Repository of Artificially Boosted Online Media Entities [Dataset]. http://doi.org/10.5281/zenodo.3609250
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.3609250
Dataset updated
Jan 15, 2021
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Hridoy Sankar Dutta; Udit Arora; Tanmoy Chakraborty
Description
Motivation

The rise of online media has enabled users to choose various unethical and artificial ways of gaining social growth to boost their credibility (number of followers/retweets/views/likes/subscriptions) within a short time period. In this work, we present ABOME, a novel data repository consisting of datasets collected from multiple platforms for the analysis of blackmarket-driven collusive activities, which are prevalent but often unnoticed in online media. ABOME contains data related to tweets and users on Twitter, YouTube videos, and YouTube channels. We believe ABOME is a unique data repository that one can leverage to identify and analyze blackmarket-based temporal fraudulent activities in online media as well as the network dynamics.

License

Creative Commons License.

Description of the dataset

- Historical Data

We collected the metadata of each entity present in the historical data

Twitter:

We collected the following fields for retweets and followers on Twitter:

user_details: A JSON object representing a Twitter user.

tweet_details: A JSON object representing a tweet.

tweet_retweets: A JSON list of tweet objects representing the most recent 100 retweets of a given tweet.

User object reference: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object

Tweet object reference: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object

YouTube:

We collected the following fields for YouTube likes and comments:

is_family_friendly: Whether the video is marked as family friendly or not.

genre: Genre of the video.

duration: Duration of the video in ISO 8601 format (duration type). This format is generally used when the duration denotes the amount of intervening time in a time interval.

description: Description of the video.

upload_date: Date that the video was uploaded.

is_paid: Whether the video is paid or not.

is_unlisted: The privacy status of the video, i.e., whether the video is unlisted or not. Here, the flag unlisted indicates that the video can only be accessed by people who have a direct link to it.

statistics: A JSON object containing the number of dislikes, views and likes for the video.

comments: A list of comments for the video. Each element in the list is a JSON object of the text (the comment text) and time (the time when the comment was posted).
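
The duration field above is stored as an ISO 8601 duration string, which usually has to be converted to a number before analysis. The following is a minimal sketch of such a conversion; the function name duration_to_seconds is hypothetical and not part of the dataset, and the sketch assumes the durations use only day, hour, minute, and whole-second components (for example "PT4M13S"). Week, month, and year components are rejected because the dataset description does not say whether they occur.

```python
import re

# Accepts strings such as "PT4M13S", "PT1H", or "P1DT2H3M4S".
# Assumption: only D/H/M/S components with integer values appear.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(value: str) -> int:
    """Convert a simple ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(value)
    # Reject non-matching input and the degenerate strings "P" and "PT".
    if match is None or value in ("P", "PT"):
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    parts = {name: int(text) if text else 0
             for name, text in match.groupdict().items()}
    return (parts["days"] * 86400
            + parts["hours"] * 3600
            + parts["minutes"] * 60
            + parts["seconds"])


if __name__ == "__main__":
    for example in ("PT4M13S", "PT1H", "P1DT2H3M4S"):
        print(example, duration_to_seconds(example))
```

Returning an integer keeps the result easy to aggregate or compare across videos; a fractional-seconds variant would need a float-aware pattern.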

We collected the following fields for YouTube channels:

channel_description: Description of the channel.

hidden_subscriber_count: Total number of hidden subscribers of the channel.

published_at: Time when the channel was created. The time is specified in ISO 8601 format (YYYY-MM-DDThh:mm:ss.sZ).

video_count: Total number of videos uploaded to the channel.

subscriber_count: Total number of subscribers of the channel.

view_count: The number of times the channel has been viewed.

kind: The API resource type (e.g., youtube#channel for YouTube channels).

country: The country the channel is associated with.

comment_count: Total number of comments the channel has received.

etag: The ETag of the channel which is an HTTP header used for web browser cache validation.

The historical data is stored in five directories named according to the type of data inside it. Each directory contains json files corresponding to the data described above.

- Time-series Data

We collect the following time-series data for retweets and followers on Twitter:

user_timeline: This is a JSON list of tweet objects in the user’s timeline, which consists of the tweets posted, retweeted and quoted by the user. The file created at each time interval contains the new tweets posted by the user during each time interval.

user_followers: This is a JSON file containing the user ids of all the followers of a user that were added or removed from the follower list during each time interval.

user_followees: This is a JSON file consisting of the user ids of all the users followed by a user, i.e., the followees of a user, that were added or removed from the followee list during each time interval.

tweet_details: This is a JSON object representing a given tweet, collected after every time interval.

tweet_retweets: This is a JSON list of tweet objects representing the most recent 100 retweets of a given tweet, collected after every time interval.
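
The user_followers and user_followees entries above record the ids that were added or removed during each time interval. As an illustration of that idea only, the sketch below computes such a change set from two complete follower snapshots. The function name diff_followers and the snapshot inputs are hypothetical; the repository's actual file layout and collection code are not described here.

```python
from typing import Dict, Iterable, List


def diff_followers(previous: Iterable[str],
                   current: Iterable[str]) -> Dict[str, List[str]]:
    """Return the ids added to and removed from a follower list.

    Both arguments are assumed to be complete snapshots of pseudo-IDs
    taken at two consecutive collection times.
    """
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": sorted(curr_ids - prev_ids),    # present now, absent before
        "removed": sorted(prev_ids - curr_ids),  # present before, absent now
    }


if __name__ == "__main__":
    earlier = ["u1", "u2", "u3"]
    later = ["u2", "u3", "u4"]
    print(diff_followers(earlier, later))
```

Sorting the results makes the output deterministic, which simplifies comparing change sets across intervals.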

The time-series data is stored in directories named according to the timestamp of the collection time. Each directory contains sub-directories corresponding to the data described above.

Data Anonymization

The data is anonymized by removing all Personally Identifiable Information (PII) and generating pseudo-IDs corresponding to the original IDs. A consistent mapping between the original IDs and the pseudo-IDs is kept to preserve the integrity of the data.
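
One way to obtain a consistent mapping between original IDs and pseudo-IDs is keyed hashing, sketched below. This is an assumption for illustration: the ABOME authors do not state how their pseudo-IDs were generated, and the names make_pseudonymizer and secret_key are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, Dict, Tuple


def make_pseudonymizer(secret_key: bytes) -> Tuple[Callable[[str], str], Dict[str, str]]:
    """Build a function that maps each original ID to a stable pseudo-ID.

    The returned dictionary records every mapping produced so far, so the
    same original ID always resolves to the same pseudo-ID across files.
    """
    mapping: Dict[str, str] = {}

    def pseudonymize(original_id: str) -> str:
        if original_id not in mapping:
            # HMAC-SHA256 keyed with a secret; truncated to 16 hex characters
            # for readability (an arbitrary choice for this sketch).
            digest = hmac.new(secret_key,
                              original_id.encode("utf-8"),
                              hashlib.sha256).hexdigest()
            mapping[original_id] = digest[:16]
        return mapping[original_id]

    return pseudonymize, mapping


if __name__ == "__main__":
    pseudonymize, mapping = make_pseudonymizer(b"example-secret")
    print(pseudonymize("12345"), pseudonymize("12345"), pseudonymize("67890"))
```

Because the mapping is deterministic, references between entities (for example a tweet and its author) remain joinable after anonymization without exposing the original IDs.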
ckanext-videoviewer
catalog.civicdataecosystem.org
Updated Jun 4, 2025
Cite
(2025). ckanext-videoviewer [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-videoviewer
Explore at:
Dataset updated
Jun 4, 2025
Description
The videoviewer extension for CKAN aims to enhance the data catalog's capabilities by enabling direct video viewing or embedding of videos associated with datasets. While the provided documentation is minimal, it suggests the extension focuses on facilitating the integration and playback of video resources within the CKAN platform. It appears to allow CKAN to better handle and present video-based data resources, making them more accessible to users.

Key Features (Inferred from the context of a video viewer extension):

Video Resource Integration: Likely allows linking or embedding video resources (e.g., from YouTube, Vimeo, or direct file uploads) to datasets within CKAN.

Inline Video Playback: Potentially provides a built-in video player within the CKAN interface, allowing users to view videos directly without leaving the platform.

Configuration Settings (Assumed): May offer configuration options for specifying supported video formats, player settings, or integration with third-party video hosting services.

Metadata Display (Inferred): Could display video-related metadata, such as duration, resolution, or upload date, alongside the video player.

Theming Integration (Expected): Should seamlessly integrate with CKAN's theming system to provide a consistent user experience.

Technical Integration: Though specific details are not provided, installation instructions suggest integrating the extension by adding videoviewer to the ckan.plugins setting in the CKAN configuration file. Activation involves installing the Python package and restarting CKAN.

Benefits & Impact (Predicted): While the documentation is sparse, based on common video viewer features, we can assume the video viewer extension would improve the accessibility and utility of video-based data resources managed within CKAN. This enhancement will likely increase user engagement and provide a richer data discovery experience.
Data from: HTTPS traffic classification
kaggle.com
Updated Mar 11, 2024
Cite
Đinh Ngọc Ân (2024). HTTPS traffic classification [Dataset]. https://www.kaggle.com/datasets/inhngcn/https-traffic-classification/code
Explore at:
Dataset updated
Mar 11, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Đinh Ngọc Ân
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Researchers from the Czech Republic published a dataset for HTTPS traffic classification.

Since the data were captured mainly in a real backbone network, they omitted IP addresses and ports. The datasets consist of features calculated from bidirectional flows exported with the flow probe ipfixprobe. This exporter can export a sequence of packet lengths and times and a sequence of packet bursts and times. For more information, please visit the ipfixprobe repository.

During research, they divided HTTPS into five categories: L -- Live Video Streaming, P -- Video Player, M -- Music Player, U -- File Upload, D -- File Download, W -- Website, and other traffic.

They chose the service representatives known for particular traffic types based on the Alexa Top 1M list and Moz's list of the most popular 500 websites for each category. They also used several popular websites that primarily focus on the audience in the Czech Republic. The identified traffic classes and their representatives are provided below:

Live Video Stream: Twitch, Czech TV, YouTube Live

Video Player: DailyMotion, Stream.cz, Vimeo, YouTube

Music Player: AppleMusic, Spotify, SoundCloud

File Upload/Download: FileSender, OwnCloud, OneDrive, Google Drive

Website and Other Traffic: Websites from Alexa Top 1M list
Reasons for excluding videos for each search term.
plos.figshare.com
xls
Updated Mar 6, 2024
Cite
Jiali Wu; Danlin Li; Minkui Lin (2024). Reasons for excluding videos for each search term. [Dataset]. http://doi.org/10.1371/journal.pone.0298597.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0298597.t005
Dataset updated
Mar 6, 2024
Dataset provided by
PLOS ONE
Authors
Jiali Wu; Danlin Li; Minkui Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Reasons for excluding videos for each search term.
