UCF Crime Dataset in the most suitable structure. It contains 1,900 videos from 13 different categories. To ensure the quality of the dataset, ten annotators (with different levels of computer vision expertise) were trained to collect it. Videos were found by searching YouTube and LiveLeak with text queries for each anomaly (with slight variations, e.g. “car crash”, “road accident”).
hf-internal-testing/tiny-video-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
PE Video Dataset (PVD)
[📃 Tech Report] [📂 Github] The PE Video Dataset (PVD) is a large-scale collection of 1 million diverse videos, featuring 120,000+ expertly annotated clips. The dataset was introduced in our paper "Perception Encoder".
Overview
PE Video Dataset (PVD) comprises 1M high-quality and diverse videos. Among them, 120K videos are accompanied by automated and human-verified annotations, and all videos come with a video description and keywords.… See the full description on the dataset page: https://huggingface.co/datasets/facebook/PE-Video.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Video Dataset on Hugging Face
This repository hosts the video dataset, a widely used benchmark dataset for human action recognition in videos. The dataset has been processed and uploaded to the Hugging Face Hub for easy access, sharing, and integration into machine learning workflows.
Introduction
The dataset is a large-scale video dataset designed for action recognition tasks. It contains 13,320 video clips across 101 action categories, making it one of the most… See the full description on the dataset page: https://huggingface.co/datasets/ProgramerSalar/video-dataset.
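Datasets hosted on the Hub like this one can typically be pulled with the `datasets` library. The snippet below is a minimal sketch: the `train` split name and `label` column are assumptions and may not match the repository's actual layout.

```python
def class_distribution(labels):
    """Count how many clips fall into each action category."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts

if __name__ == "__main__":
    # Requires `pip install datasets`; split and column names are assumptions.
    from datasets import load_dataset
    ds = load_dataset("ProgramerSalar/video-dataset", split="train")
    print(class_distribution(ds["label"]))
```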
3D video data asset of CVPR 2022 Paper "Neural 3D Video Synthesis"
We randomly selected three videos from the Internet that are longer than 1.5K frames and whose main objects appear continuously. Each video has 20 uniformly sampled frames manually annotated for evaluation.
The YCB-Video dataset is a large-scale video dataset for 6D object pose estimation. It provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
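A 6D pose, as annotated in datasets like this one, is a 3D rotation plus a 3D translation. As a quick illustrative sketch (not tied to any dataset's actual file format), applying such a pose to a model point looks like:

```python
def apply_pose(R, t, p):
    """Rotate point p by 3x3 matrix R, then translate by vector t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

# 90-degree rotation about the z-axis, followed by a shift along x.
R = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]
t = [1.0, 0.0, 0.0]
print(apply_pose(R, t, [1.0, 0.0, 0.0]))  # -> [1.0, 1.0, 0.0]
```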
## Overview
Dataset Video is a dataset for object detection tasks - it contains Senang, Murung, Bingung, and Normal annotations for 226 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://www.sapien.io/terms
High-quality image and video datasets for AI training in computer vision applications, including object recognition, scene understanding, and more.
The i3-video dataset contains "is-it-instructional" annotations for 6.4k videos from YouTube-8M. Videos are considered instructional if they focus on real-world human actions accompanied by procedural language that explains, in reasonable detail, what is happening on screen.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A large-scale synthetic dataset of dynamic human-object interactions. It features about 10 hours of video with 8,337 sequences and 2M images. The generation of this dataset is described in the paper "InterTrack: Tracking Human Object Interaction without Object Templates" (3DV'25). Please check the GitHub repo for the detailed file structure of the dataset: https://github.com/xiexh20/ProciGen If you use our data, please cite:

@inproceedings{xie2024InterTrack,
  title     = {InterTrack: Tracking Human Object Interaction without Object Templates},
  author    = {Xie, Xianghui and Lenssen, Jan Eric and Pons-Moll, Gerard},
  booktitle = {International Conference on 3D Vision (3DV)},
  month     = {March},
  year      = {2025},
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
9x9 views
https://creativecommons.org/publicdomain/zero/1.0/
It is used to develop human activity recognition and classification.
The Kinetics dataset is a large-scale, high-quality dataset for human action recognition in videos. The dataset consists of around 500,000 video clips covering 600 human action classes with at least 600 video clips for each action class. Each video clip lasts around 10 seconds and is labeled with a single action class. The videos are collected from YouTube.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
D²-City is a large-scale driving video dataset that provides more than 10,000 dashcam videos recorded in 720p HD or 1080p FHD. Around 1,000 of the videos come with detection and tracking annotations in every frame for all road objects, including bounding boxes and tracking IDs of cars, vans, buses, trucks, pedestrians, motorcycles, bicycles, open and closed tricycles, forklifts, and large and small blocks. Some of the remaining videos come with road objects annotated in keyframes. Compared with existing datasets, D²-City benefits from great diversity, as data was collected from several cities throughout China and features varying weather, road, and traffic conditions. D²-City pays special attention to challenges in complex and varied traffic scenarios. By bringing more challenging cases to the community, we hope that this dataset will encourage and help new advances in perception for intelligent driving. The D²-City dataset and the corresponding challenges were originally hosted on DiDi GAIA's platform (URL: https://outreach.didichuxing.com/d2city/d2city)
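Per-frame detection-and-tracking annotations of this kind are commonly consumed by grouping boxes that share a tracking ID into per-object tracks. A minimal sketch follows; the `(frame, track_id, bbox)` tuple layout is an illustrative assumption, not D²-City's actual annotation format.

```python
def group_tracks(detections):
    """Group (frame_index, track_id, bbox) detections into per-object
    tracks, each sorted by frame index."""
    tracks = {}
    for frame, track_id, bbox in detections:
        tracks.setdefault(track_id, []).append((frame, bbox))
    for track in tracks.values():
        track.sort()  # order each object's observations by frame
    return tracks

# Two objects observed across two frames (bbox = x, y, w, h).
dets = [
    (1, "car-1", (12, 21, 50, 30)),
    (0, "car-1", (10, 20, 50, 30)),
    (0, "ped-7", (80, 40, 15, 40)),
]
tracks = group_tracks(dets)
print(sorted(tracks))  # -> ['car-1', 'ped-7']
```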
https://www.nist.gov/open/license
The BBC Land Girls TV series is a three-season series; each season has five episodes of about 45 minutes each. The TRECVID group at NIST worked with the BBC to release the dataset to the research community for work on video understanding tasks. Unfortunately, the hosting arrangement for the dataset fell through and the video data could not be released. We are releasing the annotations produced by NIST, without any video data, so that researchers interested in knowledge graph understanding and natural language analysis can take advantage of them.
One of the first AI-generated video detection datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ATTENTION: THIS DATASET DOES NOT HOST ANY SOURCE VIDEOS. WE PROVIDE ONLY HIDDEN FEATURES GENERATED BY PRE-TRAINED DEEP MODELS AS DATA
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 600 hours of footage and featuring 20,841 unique identities from around the world.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
20 Video is a dataset for object detection tasks - it contains Bcdj annotations for 760 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).