Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset contains a comprehensive collection of human activity videos spanning 7 distinct classes. These classes are clapping, meeting and splitting, sitting, standing still, walking, walking while reading a book, and walking while using the phone.
Each video clip in the dataset showcases a specific human activity and has been labeled with the corresponding class to facilitate supervised learning.
The primary inspiration behind creating this dataset is to enable machines to recognize and classify human activities accurately. With the advent of computer vision and deep learning techniques, it has become increasingly important to train machine learning models on large and diverse datasets to improve their accuracy and robustness.
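For supervised learning on a dataset like this, clips are typically paired with integer class indices. The sketch below builds such a mapping for the seven classes listed above; the per-class folder layout and the exact folder names are assumptions, not documented by the dataset.

```python
# Sketch: map the 7 activity classes to indices and pair clips with labels.
# The one-folder-per-class layout and folder names are assumptions.
from pathlib import Path

ACTIVITY_CLASSES = [
    "clapping",
    "meeting_and_splitting",
    "sitting",
    "standing_still",
    "walking",
    "walking_while_reading_book",
    "walking_while_using_phone",
]
CLASS_TO_IDX = {name: i for i, name in enumerate(ACTIVITY_CLASSES)}

def label_clips(root: str) -> list[tuple[str, int]]:
    """Pair each video file with its class index, assuming one folder per class."""
    samples = []
    for class_name, idx in CLASS_TO_IDX.items():
        for clip in sorted(Path(root, class_name).glob("*.mp4")):
            samples.append((str(clip), idx))
    return samples
```

The resulting `(path, label)` pairs can be fed to any video data loader for training.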
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains flood data for the city of Parepare, South Sulawesi Province, collected as video from the social media platform Instagram. It was created to support deep learning methods for recognizing floods and surrounding objects, with a focus on semantic segmentation. The dataset consists of three folders: raw video data collected from Instagram, image data produced by splitting the videos into frames, and annotation data containing images that have been color-labeled according to their objects. There are 6 object classes, identified by color label: floods (light blue), buildings (red), plants (green), people (sage), vehicles (orange), and sky (dark blue). The data are in image (JPEG/PNG) and video (MP4) formats. The dataset is suited to object recognition tasks using semantic segmentation; because it also includes the original videos and images, it can be repurposed for other tasks in the future. If you intend to use this dataset, please ensure that you comply with applicable copyright, privacy, and regulatory requirements. The accompanying paper is available at: https://doi.org/10.1016/j.dib.2023.109768
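Color-labeled annotation images like these are usually converted to per-pixel class-id maps before training a segmentation model. A minimal sketch follows; the exact RGB values in the palette are assumptions (the dataset's true palette should be read from its annotation files), but the conversion technique is standard.

```python
import numpy as np

# Hypothetical RGB values for the six color labels described above;
# the dataset's exact palette must be taken from its annotation files.
PALETTE = {
    (173, 216, 230): 0,  # flood (light blue)
    (255, 0, 0):     1,  # building (red)
    (0, 128, 0):     2,  # plant (green)
    (176, 208, 176): 3,  # person (sage)
    (255, 165, 0):   4,  # vehicle (orange)
    (0, 0, 139):     5,  # sky (dark blue)
}

def mask_to_class_ids(mask: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) color-annotated mask to an (H, W) class-id map.

    Pixels matching no palette color are set to 255 (a common 'ignore' value).
    """
    out = np.full(mask.shape[:2], 255, dtype=np.uint8)
    for rgb, cls in PALETTE.items():
        out[np.all(mask == rgb, axis=-1)] = cls
    return out
```

The inverse mapping (class ids back to colors) is useful for visualizing model predictions.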
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Video Dataset on Hugging Face
This repository hosts the video dataset, a widely used benchmark dataset for human action recognition in videos. The dataset has been processed and uploaded to the Hugging Face Hub for easy access, sharing, and integration into machine learning workflows.
Introduction
The dataset is a large-scale video dataset designed for action recognition tasks. It contains 13,320 video clips across 101 action categories, making it one of the most… See the full description on the dataset page: https://huggingface.co/datasets/ProgramerSalar/video-dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
9x9 views
https://data.macgence.com/terms-and-conditions
Comprehensive video dataset of interviews designed to train AI/ML models. Enhance machine learning with diverse, high-quality, and realistic interview scenarios.
Video dataset capturing diverse facial expressions and emotions from 1000+ people, suitable for emotion recognition AI training
About
We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 500 hours of footage and featuring 20,841 unique identities from around the world.
Distribution
Detailing the format, size, and structure of the dataset:
-Total Size: 2.7TB
-Total Videos: 47,547
-Identities Covered: 20,841
-Resolution: 60% 4K, 33% Full HD (1080p)
-Formats: MP4
-Full-length videos with visible mouth movements in every frame.
-Minimum face size of 400 pixels.
-Video durations range from 20 seconds to 5 minutes.
-Faces are not cropped out; videos are full frame, including backgrounds.
Usage
This dataset is ideal for a variety of applications:
Face Recognition & Verification: Training and benchmarking facial recognition models.
Action Recognition: Identifying human activities and behaviors.
Re-Identification (Re-ID): Tracking identities across different videos and environments.
Deepfake Detection: Developing methods to detect manipulated videos.
Generative AI: Training high-resolution video generation models.
Lip Syncing Applications: Enhancing AI-driven lip-syncing models for dubbing and virtual avatars.
Background AI Applications: Developing AI models for automated background replacement, segmentation, and enhancement.
Coverage
Explaining the scope and coverage of the dataset:
Geographic Coverage: Worldwide
Time Range: The time range and size of each video are noted in the CSV file.
Demographics: Includes information about age, gender, ethnicity, format, resolution, and file size.
Languages Covered (Videos):
English: 23,038 videos
Portuguese: 1,346 videos
Spanish: 677 videos
Norwegian: 1,266 videos
Swedish: 1,056 videos
Korean: 848 videos
Polish: 1,807 videos
Indonesian: 1,163 videos
French: 1,102 videos
German: 1,276 videos
Japanese: 1,433 videos
Dutch: 1,666 videos
Indian: 1,163 videos
Czech: 590 videos
Chinese: 685 videos
Italian: 975 videos
Filipino: 920 videos
Bulgarian: 340 videos
Romanian: 1,144 videos
Arabic: 1,691 videos
Who Can Use It
List examples of intended users and their use cases:
Data Scientists: Training machine learning models for video-based AI applications.
Researchers: Studying human behavior, facial analysis, or video AI advancements.
Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.
Additional Notes
Ensure ethical usage and compliance with privacy regulations. The dataset’s quality and scale make it valuable for high-performance AI training. Preprocessing (cropping, downsampling) may be needed for some use cases. The dataset is not yet complete and expands daily; please contact us for the most up-to-date CSV file. The dataset has been divided into 100 GB zipped files and is hosted on a private server (with the option to upload to the cloud if needed). To verify the dataset's quality, please contact me for the full CSV file; I’d be happy to provide example videos selected by the potential buyer.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
To address the limitations of current datasets used for training automated crime and violence detection systems, we have created a new, balanced dataset consisting of 3,000 video clips. The dataset, which includes an equal number of violent and non-violent real-world scenarios recorded by non-professional actors, provides a more comprehensive and representative source for the development and assessment of these systems. Security and law enforcement professionals can use this comprehensive approach to analyze surveillance footage and identify pertinent incidents more efficiently.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
🇬🇧 English:
This synthetic dataset is designed for predicting the popularity of YouTube videos using metadata. It includes fields like video title, duration, tags, and view count. Useful for regression modeling, feature engineering, and exploring social media analytics.
Use this dataset to:
Features:
🇹🇷 Turkish:
This synthetic dataset was created to predict the popularity (view count) of YouTube videos. It contains metadata such as title length, tag count, and video duration. It is suitable for social media analysis and for building regression models.
With this dataset:
Variables:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The SAW-IT-Plus dataset contains 11,458 videos collected in the wild, plus 22 homemade videos (snake category). Videos are arranged in 8 main animal categories: frogs (0), snakes (1), lizards (2), birds (3), small mammals < 2 kg (4), medium or large mammals > 2 kg (5), spiders (7), and scorpions (8). Echidnas, originally category 6, were merged with big mammals. Some videos of crustaceans and other reptiles are available but not classified. Empty videos (7,896) were added to allow further testing of the algorithm; they are separated into 3 categories (details in Table 1).
CSV files detail the species of frogs, lizards, birds, and small mammals for each video. Because the videos were mainly collected from real-world data, the number of videos per animal category is unbalanced (Table 1). This folder also contains the training images used to automatically detect videos containing animals in the overall dataset. More information is available in the ReadMe files.
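Because category 6 (echidnas) was retired and folded into category 5, labels read from older annotation files may need remapping. A minimal sketch of the category scheme described above, with that merge applied:

```python
# Sketch of the SAW-IT-Plus numeric category scheme described above.
# Category 6 (echidnas) was merged into category 5, hence the remap.
CATEGORIES = {
    0: "frogs",
    1: "snakes",
    2: "lizards",
    3: "birds",
    4: "small mammals (< 2 kg)",
    5: "medium or large mammals (> 2 kg)",
    7: "spiders",
    8: "scorpions",
}

def remap(label: int) -> int:
    """Fold the retired echidna category (6) into big mammals (5)."""
    return 5 if label == 6 else label
```

Applying `remap` before training keeps old and new annotations consistent.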
The dataset was collected in Victoria, Australia, from February to October 2021 as part of the ERP22 (formerly ARI-PPD 05) grant.
The UCF Crime Dataset in its most convenient structure. It contains 1,900 videos from 13 categories. To ensure the quality of the dataset, ten annotators (with different levels of computer vision expertise) were trained to collect it, searching for videos on YouTube and LiveLeak using text queries (with slight variations, e.g., “car crash”, “road accident”) for each anomaly.
Population distribution : race distribution: Asians, Caucasians, Black people; gender distribution: balanced; age distribution: from children to the elderly, with young and middle-aged people in the majority
Collection environment : indoor and outdoor scenes
Collection diversity : various postures, expressions, lighting conditions, scenes, time periods, and distances
Collection devices : iPhone, Android phone, iPad
Collection time : daytime, night
Image parameters : video format is .mov or .mp4; image format is .jpg
Accuracy : the accuracy of actions exceeds 97%
EDUVSUM contains educational videos with subtitles from three popular e-learning platforms: Edx, YouTube, and TIB AV-Portal. The videos cover the following topics: a crash course on the history of science and engineering, computer science, Python and web programming, machine learning and computer vision, the Internet of Things (IoT), and software engineering. In total, the current version of the dataset contains 98 videos with ground-truth values annotated by a user with an academic background in computer science.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive collection of high-quality image and video datasets for computer vision, AI training, and machine learning research.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
namely VSOD
The ShareGPT4Video dataset is a large-scale resource designed to improve video understanding and generation¹. It features 1.2 million highly descriptive captions⁴ for video clips, surpassing existing datasets in diversity and information content⁴. The captions cover a wide range of aspects, including world knowledge, object properties, spatial relationships, and aesthetic evaluations⁴.
The dataset includes detailed captions of 40K videos generated by GPT-4V¹ and 4.8M videos generated by ShareCaptioner-Video¹. The videos are sourced from YouTube and other user-uploaded video websites, and they cover a variety of scenarios, such as human activities and auto-driving¹.
The ShareGPT4Video dataset also provides the basis for ShareCaptioner-Video, an exceptional video captioner capable of efficiently generating high-quality captions for videos across a wide range of resolutions, aspect ratios, and durations¹.
For example, the dataset includes a detailed caption of a video documenting a meticulous meal preparation by an individual with tattooed forearms¹. The caption describes the individual's actions in detail, from slicing a cucumber to mixing the dressing and adding croutons to the salad¹.
In addition to its use in research, the ShareGPT4Video dataset has been used to train the sharegpt4video-8b model, an open-source video chatbot². This model was trained on open-source video instruction data and is primarily intended for researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence².
(1) arXiv:2406.04325 [cs.CV], 6 Jun 2024. https://arxiv.org/pdf/2406.04325
(2) ShareGPT4V: Improving Large Multi-Modal Models with Better Captions. https://arxiv.org/abs/2311.12793
(3) Lin-Chen/sharegpt4video-8b · Hugging Face. https://huggingface.co/Lin-Chen/sharegpt4video-8b
(4) ShareGPT4Video: Improving Video Understanding and Generation with Better Captions. https://www.aimodels.fyi/papers/arxiv/sharegpt4video-improving-video-understanding-generation-better-captions
(5) GitHub - ShareGPT4Omni/ShareGPT4Video. https://github.com/ShareGPT4Omni/ShareGPT4Video
(6) ShareGPT4Video project page. https://sharegpt4video.github.io/
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was created by me. It contains videos of people doing workouts; each workout corresponds to the name of its folder.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
WLASL (Word-Level American Sign Language) dataset with 12,000 processed videos covering 2,000 ASL words. Ideal for research and machine learning in sign language recognition; licensed under C-UDA.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
postures
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
datasets of aerial videos captured from drones are essential.