40 datasets found
  1. Data from: YouTube Video Network Dataset for Israel-Hamas War

    • ieee-dataport.org
    Updated Dec 23, 2023
    Cite
    Thejas T (2023). YouTube Video Network Dataset for Israel-Hamas War [Dataset]. https://ieee-dataport.org/documents/youtube-video-network-dataset-israel-hamas-war
    Explore at:
    Dataset updated
    Dec 23, 2023
    Authors
    Thejas T
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Israel, YouTube
    Description

    Over the past few years, YouTube has become a popular platform for video broadcasting and for earning money by publishing videos that showcase a wide range of skills. For some people it has become a primary source of income. Getting videos to trend among viewers is a major goal for every content creator, and a video's popularity and reach depend largely on YouTube's recommendation algorithm. This document is a dataset descriptor for a dataset collected over a span of about 45 days during the Israel-Hamas war.

  2. Long Video Dataset Dataset

    • paperswithcode.com
    Updated Nov 18, 2020
    + more versions
    Cite
    Yongqing Liang; Xin Li; Navid Jafari; Qin Chen (2020). Long Video Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/long-video-dataset
    Explore at:
    Dataset updated
    Nov 18, 2020
    Authors
    Yongqing Liang; Xin Li; Navid Jafari; Qin Chen
    Description

    We randomly selected three videos from the Internet that are longer than 1.5K frames and whose main objects appear continuously. Each video has 20 uniformly sampled frames manually annotated for evaluation.
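    The annotation protocol above (20 uniformly sampled frames per video) is easy to reproduce; the sketch below is a minimal illustration using OpenCV and NumPy, with a placeholder filename since the dataset's own file layout is not described here.

        import cv2
        import numpy as np

        def sample_uniform_frames(video_path, num_frames=20):
            """Return num_frames frames sampled at uniform intervals across the video."""
            cap = cv2.VideoCapture(video_path)
            total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
            frames = []
            for idx in np.linspace(0, total - 1, num_frames, dtype=int):
                cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
                ok, frame = cap.read()
                if ok:
                    frames.append(frame)
            cap.release()
            return frames

        # Hypothetical usage; "long_video_01.mp4" is a placeholder filename.
        frames = sample_uniform_frames("long_video_01.mp4")
        print(f"Sampled {len(frames)} frames")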

  3. A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...

    • search.dataone.org
    Updated Sep 24, 2024
    + more versions
    Cite
    Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.7910/DVN/QTJ9HC
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew
    Time period covered
    Jan 1, 2024 - May 31, 2024
    Area covered
    YouTube
    Description

    Please cite the following paper when using this dataset: N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” arXiv [cs.CY], 2024. Available: http://arxiv.org/abs/2406.07693

    Abstract

    This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.
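    As a rough illustration of the sentiment and subjectivity analyses described above, the sketch below applies VADER and TextBlob to a single video title. It is a minimal example under assumed class thresholds, not the authors' exact pipeline, and the sample title is invented.

        # pip install vaderSentiment textblob
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        from textblob import TextBlob

        title = "Measles outbreak: what parents need to know"  # invented example title

        # Sentiment class from VADER's compound score (common default thresholds).
        compound = SentimentIntensityAnalyzer().polarity_scores(title)["compound"]
        sentiment = "positive" if compound >= 0.05 else "negative" if compound <= -0.05 else "neutral"

        # Subjectivity from TextBlob (0 = objective, 1 = subjective); the cut-offs here are assumptions.
        subjectivity = TextBlob(title).sentiment.subjectivity
        if subjectivity > 0.6:
            subjectivity_class = "highly opinionated"
        elif subjectivity > 0.3:
            subjectivity_class = "neutral opinionated"
        else:
            subjectivity_class = "least opinionated"

        print(sentiment, subjectivity_class)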

  4. Average daily time spent on social media worldwide 2012-2025

    • statista.com
    Updated Jun 19, 2025
    + more versions
    Cite
    Statista (2025). Average daily time spent on social media worldwide 2012-2025 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
    Explore at:
    Dataset updated
    Jun 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just 2 hours and 16 minutes.

    Global social media usage: Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with friends and current events.

    Global impact of social media: Social media has a wide-reaching and significant impact not only on online activities but also on offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.

  5. Custom dataset from any website on the Internet

    • datarade.ai
    Updated Sep 21, 2022
    Cite
    ScrapeLabs (2022). Custom dataset from any website on the Internet [Dataset]. https://datarade.ai/data-products/custom-dataset-from-any-website-on-the-internet-scrapelabs
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txt (available download formats)
    Dataset updated
    Sep 21, 2022
    Dataset authored and provided by
    ScrapeLabs
    Area covered
    Kazakhstan, Bulgaria, India, Argentina, Tunisia, Lebanon, Jordan, Guinea-Bissau, Aruba, Turks and Caicos Islands
    Description

    We'll extract any data from any website on the Internet. You don't have to worry about buying and maintaining complex and expensive software, or hiring developers.

    Some common use cases our customers use the data for: • Data Analysis • Market Research • Price Monitoring • Sales Leads • Competitor Analysis • Recruitment

    We can get data from websites with pagination or scroll, with captchas, and even from behind logins. Text, images, videos, documents.

    Receive data in any format you need: Excel, CSV, JSON, or any other.

  6. Facebook Spam Dataset

    • kaggle.com
    Updated Apr 11, 2021
    Cite
    Khaja Hussain SK (2021). Facebook Spam Dataset [Dataset]. https://www.kaggle.com/khajahussainsk/facebook-spam-dataset/activity
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 11, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Khaja Hussain SK
    Description

    Context

    Collection of Facebook spam and legitimate profiles with profile- and content-based features. It can be used for classification tasks.

    Content

    The dataset can be used for building machine learning models. To collect the dataset, the Facebook API and Facebook Graph API were used, and the data was collected from public profiles. There are 500 legit profiles and 100 spam profiles. The features are as follows, with Label (0 = legit, 1 = spam):

    1. Number of friends
    2. Number of followings
    3. Number of communities
    4. Age of the user account (in days)
    5. Total number of posts shared
    6. Total number of URLs shared
    7. Total number of photos/videos shared
    8. Fraction of posts containing URLs
    9. Fraction of posts containing photos/videos
    10. Average number of comments per post
    11. Average number of likes per post
    12. Average number of tags in a post (rate of tagging)
    13. Average number of hashtags present in a post

    Inspiration

    The dataset helps the community understand how these features can distinguish legitimate Facebook users from spam users.
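    A minimal sketch of the classification task this dataset targets, assuming it is exported as a CSV with one column per feature above plus a Label column (the filename and column name are placeholders, not the actual Kaggle headers).

        # pip install pandas scikit-learn
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import classification_report
        from sklearn.model_selection import train_test_split

        df = pd.read_csv("facebook_spam_dataset.csv")  # hypothetical filename
        X = df.drop(columns=["Label"])
        y = df["Label"]  # 0 = legit, 1 = spam

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=42
        )
        clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
        print(classification_report(y_test, clf.predict(X_test)))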

  7. youtube

    • kaggle.com
    Updated Sep 25, 2020
    Cite
    Ramin Rahimzada (2020). youtube [Dataset]. https://www.kaggle.com/raminrahimzada/youtube/tasks
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 25, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ramin Rahimzada
    Area covered
    YouTube
    Description

    Context

    A portion of data grabbed from YouTube.

    Content

    The dataset contains YouTube channels, videos, and comments.

    Acknowledgements

    Values shown in the dataset, such as like counts, may have changed since collection, so a GrabDate column records when the data was grabbed.

  8. Replication Data for: Automated Coding of Political Campaign Advertisement...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Tarr, Alex; Imai, Kosuke; Hwang, June (2023). Replication Data for: Automated Coding of Political Campaign Advertisement Videos: An Empirical Validation Study [Dataset]. http://doi.org/10.7910/DVN/6SWKPR
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Tarr, Alex; Imai, Kosuke; Hwang, June
    Description

    Video advertisements, whether on television or the Internet, play an essential role in modern political campaigns. For over two decades, researchers have studied television video ads by analyzing the hand-coded data from the Wisconsin Advertising Project and its successor, the Wesleyan Media Project (WMP). Unfortunately, manually coding more than a hundred variables, such as issue mentions, opponent appearance, and negativity, for many videos is a laborious and expensive process. We propose to automatically code campaign advertisement videos. Applying state-of-the-art machine learning methods, we extract various audio and image features from each video file. We show that our machine coding is comparable to human coding for many variables of the WMP data sets. Since many candidates make their advertisement videos available on the Internet, automated coding can dramatically improve the efficiency and scope of campaign advertisement research. An open-source software package is available for implementing the proposed methodology.
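    The paper's actual feature-extraction pipeline is not reproduced here; the sketch below only illustrates the general idea of pulling simple image and audio descriptors from a campaign video, using OpenCV for frames and librosa for audio, and assuming the audio track has already been exported to a WAV file.

        # pip install opencv-python librosa numpy
        import cv2
        import librosa
        import numpy as np

        def basic_video_features(video_path, audio_path, num_frames=8):
            """Very rough image + audio descriptors for one advertisement video."""
            cap = cv2.VideoCapture(video_path)
            total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
            frame_feats = []
            for idx in np.linspace(0, total - 1, num_frames, dtype=int):
                cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
                ok, frame = cap.read()
                if ok:
                    # Mean colour per channel as a trivial image descriptor.
                    frame_feats.append(frame.reshape(-1, 3).mean(axis=0))
            cap.release()

            # MFCCs summarise the audio track (speech/music characteristics).
            y, sr = librosa.load(audio_path, sr=16000)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
            return np.concatenate([np.mean(frame_feats, axis=0), mfcc])

        # Hypothetical paths; the replication archive uses its own file layout.
        # features = basic_video_features("ad_video.mp4", "ad_audio.wav")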

  9. EDUVSUM Dataset

    • paperswithcode.com
    + more versions
    Cite
    Junaid Ahmed Ghauri; Sherzod Hakimov; Ralph Ewerth, EDUVSUM Dataset [Dataset]. https://paperswithcode.com/dataset/eduvsum
    Explore at:
    Authors
    Junaid Ahmed Ghauri; Sherzod Hakimov; Ralph Ewerth
    Description

    EDUVSUM contains educational videos with subtitles from three popular e-learning platforms (EdX, YouTube, and the TIB AV-Portal) covering the following topics: a crash course on the history of science and engineering, computer science, Python and web programming, machine learning and computer vision, the Internet of Things (IoT), and software engineering. In total, the current version of the dataset contains 98 videos with ground-truth values annotated by a user with an academic background in computer science.

  10. RECOD.ai Events Dataset

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, pdf
    Updated Jul 17, 2024
    + more versions
    Cite
    José Nascimento; Anderson Rocha (2024). RECOD.ai Events Dataset [Dataset]. http://doi.org/10.5281/zenodo.5547606
    Explore at:
    pdf, application/gzip (available download formats)
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    José Nascimento; Anderson Rocha
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This data set consists of links to social network items for 34 different forensic events that took place between August 14th, 2018 and January 6th, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook and Google+), and every video is from YouTube.

    Data Collection

    We used Social Tracker (https://github.com/MKLab-ITI/mmdemo-dockerized), along with the social media platforms' APIs, to gather most of the collections. For a minor part, we used Twint (https://github.com/twintproject/twint). In both cases, we provided keywords related to the event to retrieve the data.

    It is important to mention that, in procedures like this one, usually only a small fraction of the collected data is actually related to the event and useful for further forensic analysis.

    Content

    We have data from 34 events, and for each of them we provide the files:

    items_full.csv: contains links to every social media post that was collected.

    images.csv: lists the images collected. Some files include a field called "ItemUrl" that refers to the social network post (e.g., a tweet) that mentions that media.

    video.csv: URLs of YouTube videos gathered about the event.

    video_tweet.csv: contains IDs of tweets and IDs of YouTube videos. A tweet whose ID is in this file has a video in its content; in turn, the link of a YouTube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file.

    description.txt: Contains some standard information about the event, and possibly some comments about any specific issue related to it.

    Note that most of the collections do not have all of the files above, due to changes in our collection procedure over the course of this work.
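    A minimal sketch, under the assumption that each event's files sit in their own folder, of loading the link lists above with pandas; the folder name is a placeholder, and the optional files are checked before reading.

        # pip install pandas
        import os
        import pandas as pd

        event_dir = "event_01"  # hypothetical folder for one of the 34 events

        # items_full.csv: links to every collected social media post.
        items = pd.read_csv(os.path.join(event_dir, "items_full.csv"))
        print(f"{len(items)} collected items")

        # video.csv: YouTube video URLs gathered for the event (not present for every event).
        video_path = os.path.join(event_dir, "video.csv")
        if os.path.exists(video_path):
            videos = pd.read_csv(video_path)
            print(f"{len(videos)} YouTube video links")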

    Events

    We divided the events into six groups:

    1. Fire

    • Devastating fire is the main issue of the event, so most of the informative pictures show flames or burned structures.

    • 14 Events

    2. Collapse

    • Most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire).

    • 5 Events

    3. Shooting

    • Images likely show guns and police officers, with little or no destruction of the environment.

    • 5 Events

    4. Demonstration

    • Large numbers of people on the streets. Some problem may have occurred during the demonstration, but in most cases the demonstration itself is the event.

    • 7 Events

    5. Collision

    • Traffic collision. Pictures of damaged vehicles in an urban landscape; there may be images of victims on the street.

    • 1 Event

    6. Flood

    • Events that range from fierce rain to a tsunami. Many pictures depict water.

    • 2 Events

    We list the events in the file recod-ai-events-dataset-list.pdf.

    Media Content

    Due to the social networks' terms of use, we do not make the collected texts, images, and videos publicly available. However, additional media content related to one or more events may be provided by contacting the authors.

    Funding

    DéjàVu thematic project, São Paulo Research Foundation (grants 2017/12646-3, 2018/18264-8 and 2020/02241-9)

  11. Data from: Faces in the wild: A naturalistic study of children’s facial...

    • tandf.figshare.com
    avi
    Updated May 30, 2023
    Cite
    Michael M. Shuster; Linda A. Camras; Adam Grabell; Susan B. Perlman (2023). Faces in the wild: A naturalistic study of children’s facial expressions in response to an Internet prank [Dataset]. http://doi.org/10.6084/m9.figshare.8121359.v2
    Explore at:
    avi (available download formats)
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Michael M. Shuster; Linda A. Camras; Adam Grabell; Susan B. Perlman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There is surprisingly little empirical evidence supporting theoretical and anecdotal claims regarding the spontaneous production of prototypic facial expressions used in numerous emotion recognition studies. Proponents of innate prototypic expressions believe that this lack of evidence may be due to ethical restrictions against presenting powerful elicitors in the lab. The current popularity of internet platforms designed for public sharing of videos allows investigators to shed light on this debate by examining naturally-occurring facial expressions outside the laboratory. An Internet prank (“Scary Maze”) has provided a unique opportunity to observe children reacting to a consistent fear- and surprise-inducing stimulus: the unexpected presentation of a “scary face” during an online maze game. The purpose of this study was to examine children’s facial expressions in this naturalistic setting. Emotion ratings of non-facial behaviour (provided by untrained undergraduates) and anatomically-based facial codes were obtained from 60 videos of children (ages 4–7) found on YouTube. Emotion ratings were highest for fear and surprise. Correspondingly, children displayed more facial expressions of fear and surprise than of other emotions (e.g. anger, joy). These findings provide partial support for the ecological validity of fear and surprise expressions. Still, prototypic expressions were produced by fewer than half of the children.

  12. UMAHand: Hand Activity Dataset (Universidad de Málaga)

    • figshare.com
    • portaldelainvestigacion.uma.es
    zip
    Updated Jul 2, 2024
    Cite
    Eduardo Casilari; Jennifer Barbosa-Galeano; Francisco Javier González-Cañete (2024). UMAHand: Hand Activity Dataset (Universidad de Málaga) [Dataset]. http://doi.org/10.6084/m9.figshare.25638246.v3
    Explore at:
    zip (available download formats)
    Dataset updated
    Jul 2, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Eduardo Casilari; Jennifer Barbosa-Galeano; Francisco Javier González-Cañete
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The objective of the UMAHand dataset is to provide a systematic, Internet-accessible benchmarking database for evaluating algorithms for the automatic identification of manual activities. The database was created by monitoring 29 predefined activities involving specific movements of the dominant hand. These activities were performed by 25 participants, each completing a certain number of repetitions. During each movement, participants wore a 'mote' or Shimmer sensor device on their dominant hand's wrist. This sensor, comparable in weight and volume to a wristwatch, was attached with an elastic band in a predetermined orientation. The Shimmer device contains an Inertial Measurement Unit (IMU) with a triaxial accelerometer, gyroscope, magnetometer, and barometer. These sensors recorded measurements of acceleration, angular velocity, magnetic field, and atmospheric pressure at a constant sampling frequency of 100 Hz during each movement.

    The UMAHand dataset comprises a main directory and three subdirectories: TRACES (containing the measurements), VIDEOS (containing video sequences) and SCRIPTS (with two scripts that automate the downloading, unzipping and processing of the dataset). The main directory also includes three descriptive plain text files and an image:

    • "readme.txt": a brief guide to the dataset which describes its basic characteristics, the testbed or experimental framework used to generate it, and the organization of the data files.

    • "user_characteristics.txt": contains, for each participant, a line of six comma-separated numerical values describing their personal characteristics in the following order: 1) an abstract user identifier (a number from 01 to 25), 2) a binary value indicating whether the participant is left-handed (0) or right-handed (1), 3) a numerical value indicating gender: male (0), female (1), undefined or undisclosed (2), 4) the weight in kg, 5) the height in cm, and 6) the age in years.

    • "activity_description.txt": for each activity, this text file contains a line with the activity identifier (numbered from 01 to 29) and an alphanumeric string that briefly describes the performed action.

    • "sensor_orientation.jpg": a JPEG image illustrating how the sensor is worn and the orientation of the measurement axes.

    The TRACES subfolder with the data is, in turn, organized into 25 secondary subfolders, one per participant, named with the word "output" followed by an underscore (_) and the corresponding participant identifier (a number from 1 to 25). Each subdirectory contains one CSV (Comma Separated Values) file for each trial (each repetition of any activity) performed by the corresponding volunteer. The filenames of the monitored data follow the format "user_XX_activity_YY_trial_ZZ.csv", where XX, YY, and ZZ are the identifiers of the participant (XX), the activity (YY), and the repetition number (ZZ), respectively.

    The files do not include any header, and each line corresponds to one sample taken by the sensing node, i.e. a set of simultaneous measurements captured by the sensors of the Shimmer mote at a certain instant. The values in each line are arranged as follows:

    Timestamp, Ax, Ay, Az, Gx, Gy, Gz, Mx, My, Mz, P

    where:

    • Timestamp is the time at which the following measurements were taken, in milliseconds elapsed since the start of the recording. The first sample, in the first line of the file, therefore has a zero value, while the rest of the timestamps in the file are relative to this first sample.

    • Ax, Ay, Az are the measurements of the three axes of the triaxial accelerometer (in g units).

    • Gx, Gy, Gz are the components of the angular velocity measured by the triaxial gyroscope (in degrees per second, dps).

    • Mx, My, Mz are the 3-axis data in microteslas (µT) captured by the magnetometer.

    • P is the pressure measurement in millibars.

    The VIDEOS directory includes 29 anonymized video clips that illustrate, with corresponding examples, the 29 manual activities carried out by the participants. The video files are encoded in MPEG4 format and named according to the format "Example_Activity_XX.mp4", where XX is the identifier of the movement (as described in the activity_description.txt file).

    Finally, the SCRIPTS subfolder comprises two scripts written in Python and Matlab. These two programs (both named Load_traces), which perform the same function, automate the downloading and processing of the data. Specifically, the scripts perform the following tasks:

    1. Download the database from the public repository as a single compressed zip file.

    2. Unzip the file and create the subfolder structure of the dataset in a directory named UMAHand_Dataset. As mentioned above, the TRACES subfolder contains one CSV trace file per experiment (i.e. per movement, user, and trial).

    3. Read all the CSV files and store their information in a list of dictionaries (Python) or a matrix of structures (Matlab) named datasetTraces. Each element in that list/matrix has two fields: the filename (which identifies the user, the type of performed activity, and the trial number) and a numerical array of 11 columns containing the timestamps and the measurements of the sensors for that experiment (arranged as described above).

    All experiments and data acquisition were conducted in private home environments. Participants were asked to perform activities involving sustained or continuous hand movements (e.g. clapping hands) for at least 10 seconds. For brief, punctual movements that might take less than 10 seconds (e.g. picking up an object from the floor), volunteers were simply asked to execute the action until its conclusion. In total, 752 samples were collected, with durations ranging from 1.98 to 119.98 seconds.
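    A minimal sketch of reading one trace file into a pandas DataFrame using the column order documented above; the filename is a placeholder, and the official Load_traces scripts in the SCRIPTS folder remain the reference implementation.

        # pip install pandas
        import pandas as pd

        # Column order as documented: timestamp plus accelerometer, gyroscope,
        # magnetometer and barometer channels.
        COLUMNS = ["Timestamp", "Ax", "Ay", "Az", "Gx", "Gy", "Gz", "Mx", "My", "Mz", "P"]

        # Hypothetical trace: participant 01, activity 05, trial 02 (no header row in the files).
        trace = pd.read_csv("user_01_activity_05_trial_02.csv", header=None, names=COLUMNS)

        duration_s = trace["Timestamp"].iloc[-1] / 1000.0  # timestamps are in milliseconds
        print(f"{len(trace)} samples, {duration_s:.2f} s at ~100 Hz")
        print(trace[["Ax", "Ay", "Az"]].describe())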

  13. Network traffic and code for machine learning classification

    • data.mendeley.com
    Updated Feb 20, 2020
    + more versions
    Cite
    Víctor Labayen (2020). Network traffic and code for machine learning classification [Dataset]. http://doi.org/10.17632/5pmnkshffm.2
    Explore at:
    Dataset updated
    Feb 20, 2020
    Authors
    Víctor Labayen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into 5 different activities (Video, Bulk, Idle, Web, and Interactive), and the label is shown in the filename. There is also a file (mapping.csv) with the mapping of the host's IP address, the csv/pcap filename and the activity label.

    Activities:

    Interactive: applications that perform real-time interactions to provide a suitable user experience, such as editing a file in Google Docs or remote CLI sessions over SSH.

    Bulk data transfer: applications that transfer large data volumes over the network, for example SCP/FTP applications and direct downloads of large files from web servers such as Mediafire, Dropbox or the university repository.

    Web browsing: all traffic generated while searching and consuming different web pages, for example several blogs, news sites and the university's Moodle.

    Video playback: traffic from applications that consume video via streaming or pseudo-streaming. The best-known servers used are Twitch and YouTube, but the university's online classroom has also been used.

    Idle behaviour: the background traffic generated by the user's computer while the user is idle. This traffic was captured with every application closed and with some pages open (e.g. Google Docs, YouTube and several other web pages), but always without user interaction.

    The capture is performed on a network probe, attached via a SPAN port to the router that forwards the user's network traffic. The traffic is stored in pcap format with all the packet payload. In the CSV files, every non-TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the CSV files are the following (one line per packet): timestamp, protocol, payload size, source and destination IP address, and source and destination UDP/TCP port. The fields are also included as a header in every CSV file.
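    A minimal sketch of loading one of the per-activity CSV traces with pandas. The filename is a placeholder and, since the exact header names are not given in this description, the code only assumes the field order listed above (with payload size as the third column).

        # pip install pandas
        import pandas as pd

        # Hypothetical filename; the activity label is encoded in the real filenames.
        df = pd.read_csv("video_trace_01.csv")

        print(df.columns.tolist())           # header row as shipped in the file
        payload_bytes = df.iloc[:, 2].sum()  # assumes the third column is payload size
        print(f"{len(df)} packets, {payload_bytes / 1e6:.1f} MB of payload")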

    The amount of data is stated as follows:

    Bulk: 19 traces, 3599 s of total duration, 8704 MBytes of pcap files
    Video: 23 traces, 4496 s, 1405 MBytes
    Web: 23 traces, 4203 s, 148 MBytes
    Interactive: 42 traces, 8934 s, 30.5 MBytes
    Idle: 52 traces, 6341 s, 0.69 MBytes

    The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.

  14. Shot2Story20K Dataset

    • paperswithcode.com
    Updated May 30, 2024
    Cite
    Mingfei Han; Linjie Yang; Xiaojun Chang; Heng Wang (2024). Shot2Story20K Dataset [Dataset]. https://paperswithcode.com/dataset/shot2story20k
    Explore at:
    Dataset updated
    May 30, 2024
    Authors
    Mingfei Han; Linjie Yang; Xiaojun Chang; Heng Wang
    Description

    A short video clip may contain the progression of multiple events and an interesting story line. A human needs to capture the event in every shot and associate the shots in order to understand the story behind them.

    In this work, we present a new multi-shot video understanding benchmark Shot2Story with detailed shot-level captions and comprehensive video summaries. To facilitate better semantic understanding of videos, we provide captions for both visual signals and human narrations. We design several distinct tasks including single-shot video and narration captioning, multi-shot video summarization, and video retrieval with shot descriptions.

    Preliminary experiments show that generating a long and comprehensive video summary remains challenging. Nevertheless, the generated imperfect summaries can already significantly boost the performance of existing video understanding tasks such as video question-answering, promoting an underexplored setting of video understanding with detailed summaries.

  15. Data from: Television shows based on video games 1975-2019: Original data...

    • jyx.jyu.fi
    Updated Mar 23, 2025
    Cite
    Tero Kerttula (2025). Television shows based on video games 1975-2019: Original data and preliminary analysis [Dataset]. http://doi.org/10.17011/jyx/dataset/71622
    Explore at:
    Dataset updated
    Mar 23, 2025
    Authors
    Tero Kerttula
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of data on different types of television shows based on video games from the years mentioned in the title. The data has been used in articles and conference presentations before (e.g. Kerttula 2019; Kerttula 2020). The data is free to use in any future publications with proper references to the author and the original data. Should the data be used in further research, note that the dataset is not 100% complete. The reasons for this are language and cultural barriers. It also needs to be mentioned that some of the television shows and production companies have probably been forgotten over time, which means that a complete list would quite likely prove very difficult to gather. Some of the data included is missing classification information because the required data was not available or was hard to determine; for example, time-slot data was missing for these shows, or there was not enough information available to draw conclusions about the structure of the show. This applies only to a handful of shows, however. This data does not compromise or endanger any copyrights or personal information. All the data gathered here is publicly available from different internet sources. No personal information, such as addresses, phone numbers or contact persons, was recorded in the data. Some shows feature episodes from video repositories around the internet, but if a production company wants to take the episodes offline, it does not harm the dataset.

  16. A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...

    • zenodo.org
    • data.niaid.nih.gov
    • +2more
    csv
    Updated Jul 20, 2024
    + more versions
    Cite
    Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and other sources about the 2024 outbreak of Measles [Dataset]. http://doi.org/10.5281/zenodo.11711230
    Explore at:
    csv (available download formats)
    Dataset updated
    Jul 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 15, 2024
    Area covered
    YouTube
    Description

    Please cite the following paper when using this dataset:

    N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” Proceedings of the 26th International Conference on Human-Computer Interaction (HCII 2024), Washington, USA, 29 June - 4 July 2024. (Accepted as a Late Breaking Paper, Preprint Available at: https://doi.org/10.48550/arXiv.2406.07693)

    Abstract

    This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.
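    To complement the VADER/TextBlob sketch shown for the Dataverse copy of this dataset (entry 3), the snippet below illustrates fine-grain (emotion) classification with a DistilRoBERTa-based model through the Hugging Face transformers pipeline; the specific checkpoint named here is an assumption, not necessarily the one the authors used.

        # pip install transformers torch
        from transformers import pipeline

        # Assumed checkpoint: a DistilRoBERTa model fine-tuned for emotion classification.
        emotion = pipeline("text-classification",
                           model="j-hartmann/emotion-english-distilroberta-base",
                           top_k=1)

        title = "Measles cases surge as vaccination rates drop"  # invented example title
        print(emotion(title))  # e.g. [[{'label': 'fear', 'score': ...}]]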

  17. youtube

    • networkrepository.com
    csv
    Updated Jun 23, 2016
    + more versions
    Cite
    Network Data Repository (2016). youtube [Dataset]. https://networkrepository.com/soc-youtube.php
    Explore at:
    csv (available download formats)
    Dataset updated
    Jun 23, 2016
    Dataset authored and provided by
    Network Data Repository
    License

    https://networkrepository.com/policy.php

    Area covered
    YouTube
    Description

    YouTube online social network. YouTube is a video-sharing website that includes a social network. The dataset contains a list of all of the user-to-user links.
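    A minimal sketch, assuming the user-to-user links are distributed as a plain edge list (one "source target" pair per line), of loading the graph with NetworkX; the filename is a placeholder.

        # pip install networkx
        import networkx as nx

        # Hypothetical filename; comments="%" skips any '%'-prefixed header lines if present.
        G = nx.read_edgelist("soc-youtube.edges", comments="%", nodetype=int)

        print(G.number_of_nodes(), "users,", G.number_of_edges(), "user-to-user links")
        # The ten most connected accounts by degree.
        print(sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:10])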

  18. QUILT-1M Dataset

    • paperswithcode.com
    Updated Feb 10, 2025
    Cite
    (2025). QUILT-1M Dataset [Dataset]. https://paperswithcode.com/dataset/quilt-1m
    Explore at:
    Dataset updated
    Feb 10, 2025
    Description

    Recent accelerations in multi-modal applications have been made possible by the plethora of image and text data available online. However, the scarcity of similar data in the medical field, specifically in histopathology, has halted similar progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource of videos, offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 768,826 image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with 1M paired image-text samples, making it the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new pathology images across 13 diverse patch-level datasets of 8 different sub-pathologies, as well as on cross-modal retrieval tasks.
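    The fine-tuned Quilt-1M model itself is not included here; as a rough illustration of the zero-shot evaluation described, the sketch below runs a stock CLIP checkpoint from Hugging Face on a single patch image with hypothetical class prompts.

        # pip install transformers torch pillow
        from PIL import Image
        from transformers import CLIPModel, CLIPProcessor

        model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

        # Hypothetical patch image and class prompts for a histopathology task.
        image = Image.open("patch.png")
        prompts = ["a histopathology image of benign tissue",
                   "a histopathology image of malignant tissue"]

        inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
        print(dict(zip(prompts, probs[0].tolist())))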

  19. activities-videos-of-120-uinque-people-different-skin-tones

    • huggingface.co
    Updated May 20, 2025
    Cite
    AIxBlock (2025). activities-videos-of-120-uinque-people-different-skin-tones [Dataset]. https://huggingface.co/datasets/AIxBlock/activities-videos-of-120-uinque-people-different-skin-tones
    Explore at:
    Dataset updated
    May 20, 2025
    Authors
    AIxBlock
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset is provided by AIxBlock and features 40 sets, each containing 120 unique individuals with diverse skin tones. Each set includes multiple participants performing a variety of activities across different backgrounds. All videos were recorded using smartphones following a strict guideline—none of the footage was sourced from the internet. These videos are designed for training computer vision (CV) models to recognize human activities and movement across a range of environments and… See the full description on the dataset page: https://huggingface.co/datasets/AIxBlock/activities-videos-of-120-uinque-people-different-skin-tones.

  20. Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data

    • catalog.data.gov
    • data.transportation.gov
    • +5more
    Updated Jun 16, 2025
    Cite
    Federal Highway Administration (2025). Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data [Dataset]. https://catalog.data.gov/dataset/next-generation-simulation-ngsim-vehicle-trajectories-and-supporting-data
    Explore at:
    Dataset updated
    Jun 16, 2025
    Dataset provided by
    Federal Highway Administration
    Description

    Click “Export” on the right to download the vehicle trajectory data. The associated metadata and additional data can be downloaded below under "Attachments". Researchers for the Next Generation Simulation (NGSIM) program collected detailed vehicle trajectory data on southbound US 101 and Lankershim Boulevard in Los Angeles, CA, eastbound I-80 in Emeryville, CA and Peachtree Street in Atlanta, Georgia. Data was collected through a network of synchronized digital video cameras. NGVIDEO, a customized software application developed for the NGSIM program, transcribed the vehicle trajectory data from the video. This vehicle trajectory data provided the precise location of each vehicle within the study area every one-tenth of a second, resulting in detailed lane positions and locations relative to other vehicles. Click the "Show More" button below to find additional contextual data and metadata for this dataset. For site-specific NGSIM video file datasets, please see the following: - NGSIM I-80 Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-I-80-Vide/2577-gpny - NGSIM US-101 Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-US-101-Vi/4qzi-thur - NGSIM Lankershim Boulevard Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-Lankershi/uv3e-y54k - NGSIM Peachtree Street Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-Peachtree/mupt-aksf
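    A minimal sketch of working with the exported trajectory table, assuming standard NGSIM column names such as Vehicle_ID, Frame_ID, Local_X and Local_Y (positions in feet, one frame per 0.1 s); these names should be verified against the dataset's own metadata before use.

        # pip install pandas numpy
        import numpy as np
        import pandas as pd

        df = pd.read_csv("ngsim_trajectories.csv")  # hypothetical export filename

        def add_speed(group):
            """Approximate speed (ft/s) from successive positions 0.1 s apart."""
            group = group.sort_values("Frame_ID")
            dx = group["Local_X"].diff()
            dy = group["Local_Y"].diff()
            group["speed_ftps"] = np.hypot(dx, dy) / 0.1
            return group

        df = df.groupby("Vehicle_ID", group_keys=False).apply(add_speed)
        print(df[["Vehicle_ID", "Frame_ID", "speed_ftps"]].head())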
