Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This repository contains the data presented in Video-R1: Reinforcing Video Reasoning in MLLMs.
Code: https://github.com/tulerfeng/Video-R1
Video data folders: CLEVRER, LLaVA-Video-178K, NeXT-QA, PerceptionTest, STAR
Image data folders: Chart, General, Knowledge, Math, OCR, Spatial
Video-R1-COT-165k.json is for the SFT cold start; Video-R1-260k.json is for RL training.
Data format in Video-R1-COT-165k: { "problem_id": 2, "problem": "What appears on the screen in Russian during the… See the full description on the dataset page: https://huggingface.co/datasets/Video-R1/Video-R1-data.
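A minimal sketch of inspecting this format in Python, assuming the file is a flat JSON array of records; only the "problem_id" and "problem" fields are confirmed by the snippet above, so the code prints the keys rather than assuming the rest of the schema:

import json

# Load the SFT cold-start annotations (downloaded from the dataset page).
with open("Video-R1-COT-165k.json", encoding="utf-8") as f:
    records = json.load(f)  # assumed: a JSON array of objects

sample = records[0]
print(sample["problem_id"], sample["problem"][:80])
print(sorted(sample.keys()))  # list the fields actually present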
ahmedheakl/video-r1-RL dataset hosted on Hugging Face and contributed by the HF Datasets community
conctsai/video-r1-image dataset hosted on Hugging Face and contributed by the HF Datasets community
DLNorb/video-r1-processed-mini dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Video Dataset on Hugging Face
This repository hosts a video dataset that is widely used as a benchmark for human action recognition. The dataset has been processed and uploaded to the Hugging Face Hub for easy access, sharing, and integration into machine learning workflows.
Introduction
It is a large-scale video dataset designed for action recognition tasks, containing 13,320 video clips across 101 action categories, making it one of the most… See the full description on the dataset page: https://huggingface.co/datasets/ProgramerSalar/video-dataset.
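A minimal sketch of pulling the files, assuming this is a standard Hub dataset repository; the internal layout is not documented above, so inspect the snapshot after downloading:

from huggingface_hub import snapshot_download

# Download the whole dataset repository to the local HF cache.
local_dir = snapshot_download(
    repo_id="ProgramerSalar/video-dataset",
    repo_type="dataset",
)
print("Downloaded to:", local_dir)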
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This repository contains the data presented in Video-R1: Reinforcing Video Reasoning in MLLMs. Code: https://github.com/tulerfeng/Video-R1
ahmedheakl/videos-ours-r1 dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TinyLLaVA-Video-R1
We select multiple-choice questions from the NextQA subset of LLaVA-Video-178K as training data. To keep training time manageable with limited computational resources, we choose only the subset of data with durations of 0 to 30 seconds, which contains 5,496 samples. In addition, we manually annotate 16 samples for cold starting and provide the annotations.
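A hypothetical sketch of the 0-to-30-second filter described above; the annotation file name and the "duration" field are assumptions for illustration, not the released schema:

import json

# Load the NextQA-style annotations (hypothetical file name).
with open("nextqa_annotations.json", encoding="utf-8") as f:
    items = json.load(f)

# Keep only clips no longer than 30 seconds, as in the setup above.
short_clips = [x for x in items if x.get("duration", float("inf")) <= 30]
print(len(short_clips))  # the description above reports 5,496 such samples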
Organize Data
Organize the files and annotation files as follows in path/to/your/dataset:
dataset
├── … See the full description on the dataset page: https://huggingface.co/datasets/Zhang199/TinyLLaVA-Video-R1-training-data.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This repository contains the datasets presented in Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🧠 Ego-R1 Data
Welcome to Ego-R1 Data, a comprehensive collection designed to facilitate the training of large language models for tool-augmented reasoning and reinforcement learning. This dataset is used by the Ego-R1 codebase and was presented in the paper Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning.
📊 Dataset Overview
The Ego-R1 Dataset consists of two main components:
Ego-CoTT-25K: 25,000 Chain-of-Tool-Thought examples for Supervised… See the full description on the dataset page: https://huggingface.co/datasets/Ego-R1/Ego-R1-Data.
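A minimal loading sketch with the datasets library, assuming the repository exposes a default configuration; the actual config and split names should be checked on the dataset page:

from datasets import load_dataset

# Load Ego-R1 data from the Hub; a config name may be required.
ds = load_dataset("Ego-R1/Ego-R1-Data")
print(ds)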
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks
For more information, please visit the official VersaVid-R1 GitHub Repository.
License
VersaVid-R1 and its training data are intended solely for academic research purposes, and any form of commercial use is strictly prohibited. The copyright of all videos belongs to the video owners. If there is any infringement in VersaVid-R1 training data, please email… See the full description on the dataset page: https://huggingface.co/datasets/VersaVid-R1/VersaVid-R1_training_data.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Multi Task Video Reasoning Dataset
This is the official training dataset for Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning. [Project] [arXiv] [Code]
Data Structure
└── MultiTaskVideoReasoning
    ├── MTVR_CoT
    │   ├── actnet.json
    │   ├── charades.json
    │   ├── longvideo-reason.json
    │   ├── nextgqa.json
    │   ├── rextime.json
    │   ├── vidchapters.json
    │   ├── Video-R1-data-image.json
    │   └── …
See the full description on the dataset page: https://huggingface.co/datasets/zhang9302002/MultiTaskVideoReasoning.
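A minimal sketch of reading one of the MTVR_CoT annotation files listed above; the per-record schema is not shown on this page, so the code prints the keys instead of assuming them:

import json

with open("MultiTaskVideoReasoning/MTVR_CoT/nextgqa.json", encoding="utf-8") as f:
    data = json.load(f)

print(type(data).__name__, len(data))
if isinstance(data, list) and data:
    print(sorted(data[0].keys()))  # reveal the actual annotation fields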
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
VAU-R1 is a data-efficient framework for video anomaly reasoning that combines Multimodal Large Language Models (MLLMs) with Reinforcement Fine-Tuning (RFT). This repository contains VAU-Bench, the first Chain-of-Thought (CoT) benchmark specifically designed for video anomaly understanding. It enables multimodal tasks such as multiple-choice question answering, temporal anomaly grounding, rationale-based… See the full description on the dataset page: https://huggingface.co/datasets/7xiang/VAU-Bench.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains two datasets for instructional video analysis tasks:
1. DenseStep200K.json
Description
A large-scale dataset of 222,000 detailed, temporally grounded instructional steps annotated across 10,000 high-quality instructional videos (732 hours in total). It was constructed through a training-free automated pipeline that leverages multimodal foundation models (Qwen2.5-VL-72B and DeepSeek-R1-671B) to process noisy HowTo100M videos, achieving precise… See the full description on the dataset page: https://huggingface.co/datasets/gmj03/DenseStep200K.
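A hypothetical sketch of iterating the temporally grounded steps; every field name below ("video_id", "steps", "start", "end", "text") is an assumption for illustration and must be verified against the actual file:

import json

with open("DenseStep200K.json", encoding="utf-8") as f:
    videos = json.load(f)

# Print the grounded steps of the first few videos (assumed schema).
for video in videos[:3]:
    for step in video.get("steps", []):
        print(video.get("video_id"), step.get("start"), step.get("end"), step.get("text"))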
🧠 Ego-R1 Benchmark
We establish the Ego-R1 Benchmark for ultra-long egocentric video understanding. It was proposed in the paper Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning.
📁 Dataset Structure
The Ego-R1 Benchmark contains 300 carefully curated question-answer pairs in total:
🏷️ 150 Human-Labeled: questions manually crafted by 6 annotators, with 25 QA pairs from each perspective. 🤖 150 Gemini-Generated + Human-Verified: AI-generated… See the full description on the dataset page: https://huggingface.co/datasets/Ego-R1/Ego-R1-Bench.