Dataset Card for LLaVA-Video-178K
Uses
This dataset is used to train the LLaVA-Video model. We permit its use for academic research and educational purposes only. For data generated with OpenAI GPT-4, we recommend that users review the OpenAI Usage Policy.
Data Sources
To train LLaVA-Video, we used video-language data from five primary sources:
LLaVA-Video-178K: This dataset includes 178,510 caption entries, 960,792 open-ended… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for LLaVA-Video-small-swift
A small subset of LLaVA-Video-178K, intended as educational material for learning how to fine-tune video models.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
farewellthree/llava-video-json dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TinyLLaVA-Video
This dataset combines data from multiple sources for pre-training and fine-tuning. Pretrain Data: Four subsets of LLaVA-Video-178K (0_30_s_academic_v0_1, 30_60_s_academic_v0_1, 0_30_s_youtube_v0_1, 30_60_s_youtube_v0_1), supplemented with filtered Video-LLaVA data (https://huggingface.co/datasets/LanguageBind/Video-LLaVA) and data from Valley (https://github.com/RupertLuo/Valley). The video data can be downloaded from the linked datasets, and cleaned annotations are provided… See the full description on the dataset page: https://huggingface.co/datasets/Zhang199/TinyLLaVA-Video-v1-training-data.
weili-0234/llava-video-178k-frames dataset hosted on Hugging Face and contributed by the HF Datasets community
Xiaodong/LLaVA-Video-2_3_m_youtube_mc-qwen_filter_1 dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TinyLLaVA-Video-R1
We select multiple-choice questions from the NextQA subset of LLaVA-Video-178K as training data. To keep training time manageable with limited computational resources, we choose only the subset with durations of 0 to 30 seconds, which contains 5,496 samples. In addition, we manually annotate 16 samples for cold-start training and provide the annotations.
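The selection step above can be sketched as follows. This is a hypothetical illustration: the field names (`"source"`, `"question_type"`, `"duration"`) are assumptions, since the card does not specify the annotation schema.

```python
# Hypothetical sketch of the data selection described above: keep only
# multiple-choice NextQA samples whose clips run 0-30 seconds.
# Field names ("source", "question_type", "duration") are assumed, not
# taken from the dataset's actual schema.

def select_short_mc_samples(samples, max_duration=30):
    """Return NextQA samples that are multiple choice and at most max_duration seconds long."""
    return [
        s for s in samples
        if s.get("source") == "NextQA"
        and s.get("question_type") == "multi_choice"
        and 0 <= s.get("duration", float("inf")) <= max_duration
    ]

if __name__ == "__main__":
    demo = [
        {"source": "NextQA", "question_type": "multi_choice", "duration": 12},
        {"source": "NextQA", "question_type": "open_ended", "duration": 8},
        {"source": "NextQA", "question_type": "multi_choice", "duration": 95},
    ]
    print(len(select_short_mc_samples(demo)))  # 1
```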
Organize Data
Organize the files and annotation files as follows in path/to/your/dataset: dataset ├──… See the full description on the dataset page: https://huggingface.co/datasets/Zhang199/TinyLLaVA-Video-R1-training-data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Active lava lakes represent a variety of open-vent volcanism in which a sizable body of lava accumulates at the top of the magma column, constrained by the vent and/or crater geometry. The longevity of lava lakes reflects a balancing of cooling and outgassing occurring at the surface and input of hot and gas-rich magma from below. Due to their longevity and relative accessibility, lava lakes provide a natural laboratory for studying fundamental volcanic processes such as degassing, convection and cooling. This article examines all seven lakes that existed at the time of writing in 2018, located in the Pacific, Antarctica, Africa, and South and Central America. These lakes span all tectonic environments, and a range of magma compositions. We focus on analysis of the lake surface motion using image velocimetry, which reveals both similarities and contrasts in outgassing and lake dynamics when comparing the different lakes. We identify two categories of lake behavior: Organized (Erta'Ale, Nyiragongo, Kīlauea after 2011, and Erebus) and Chaotic (Villarrica, Masaya, Marum). This division does not map directly to lake size, viscosity, gas emission rate, or temperature. Instead, when examined together, we find that the lakes follow a linear relationship between average surface speed and the ratio of total gas flux to lake surface area. This relationship points to the combined importance of both flux and lake size in addition to the total volume of gas emission, and suggests that a shared deep mechanism controls the supply of heat and gas to all lakes. On the other hand, the differences between Chaotic and Organized lakes highlight the important role of the geometry of the conduit-lake transition, which superimposes a shallow signal on that of the deep circulation. 
The spatial patterns of surface motion we document suggest that the release of gas bubbles at Chaotic lakes is more efficient (i.e., bubbles are less likely to be retained and recycled) compared with Organized lakes. In addition, the data presented here indicate that the solidified crust of Organized lakes plays a role in regulating convection and outgassing in lava lakes.
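The linear relationship described above can be written compactly; the notation here is ours, not the article's:

```latex
\bar{v} \;\approx\; k \,\frac{Q_{\mathrm{gas}}}{A_{\mathrm{lake}}}
```

where \(\bar{v}\) is the average lake surface speed, \(Q_{\mathrm{gas}}\) the total gas flux, \(A_{\mathrm{lake}}\) the lake surface area, and \(k\) an empirical constant shared across the lakes studied.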
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This repository contains the data presented in Video-R1: Reinforcing Video Reasoning in MLLMs.
Code: https://github.com/tulerfeng/Video-R1
Video data folders: CLEVRER, LLaVA-Video-178K, NeXT-QA, PerceptionTest, STAR
Image data folders: Chart, General, Knowledge, Math, OCR, Spatial
Video-R1-COT-165k.json is for SFT cold start, and Video-R1-260k.json is for RL training.
Data format in Video-R1-COT-165k: { "problem_id": 2, "problem": "What appears on the screen in Russian during the… See the full description on the dataset page: https://huggingface.co/datasets/Video-R1/Video-R1-data.
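A minimal sketch of consuming the annotation file, assuming Video-R1-COT-165k.json holds a JSON array of records shaped like the excerpt above; only the `problem_id` and `problem` keys are confirmed by the card, and any other keys in the real file are simply ignored here.

```python
import json

def load_problems(path):
    """Map problem_id -> problem text from a Video-R1-style JSON file.

    Assumes the file is a JSON array of objects; only the "problem_id"
    and "problem" keys shown in the card's excerpt are used.
    """
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    return {r["problem_id"]: r["problem"] for r in records}
```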
VideoEspresso
This dataset is the multi-image version.
Leaderboard
| Model | Params | Frames | Overall | Narrative Analysis | Event Dynamic | Preparation Steps | Causal Analysis | Theme Analysis | Contextual Analysis | Influence Analysis | Role Analysis | Interaction Analysis | Behavior Analysis | Emotion Analysis | Cooking Process | Traffic Analysis | Situation Analysis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LLaVA-Video | 72B | 64 | 66.3% | 68.4% | 66.2% | 74.5% | 62.7% | 62.3% | 71.6% | 62.5% | 63.5% | 67.7% | 63.2% | 60.0% | 75.5% | 76.7% | 74.0% |
LLaVA-OneVision… See the full description on the dataset page: https://huggingface.co/datasets/hshjerry0315/VideoEspresso_train_multi_image.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
V-NIAH-D Benchmark
A Visual Needle-In-A-Haystack Benchmark with Periodic Distractors. It was presented in VideoRoPE: What Makes for Good Video Rotary Position Embedding?. One can use it by following steps similar to V-NIAH.
VideoRoPE Training Data
To facilitate the reproduction of our experimental results, we have also uploaded the data used by VideoRoPE. We use a subset of the LLaVA-Video-178K dataset to train VideoRoPE. The LLaVA-Video-178K dataset consists of 178K… See the full description on the dataset page: https://huggingface.co/datasets/Wiselnn/VideoRoPE.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
🎞MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
MMTrail is a large-scale multi-modality video-language dataset with over 20M trailer clips, featuring high-quality multimodal captions that integrate context, visual frames, and background music, aiming to enhance cross-modality studies and fine-grained multimodal-language model training. In short, we provided 2M+ LLaVA Video captions, 2M+ Music captions, and 60M+ Coca frame captions for 27.1khrs of… See the full description on the dataset page: https://huggingface.co/datasets/litwell/MMTrail-20M.
Xiaodong/LaVA-Video-2_3_m_youtube_mc-4o dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
🧬 ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos
ViDRiP-LLaVA is a vision-language framework designed for instruction-based diagnostic reasoning using both image patches and video clips from pathology slides. It builds on LLaVA and extends it to the medical domain with domain-specific datasets and fine-tuned models. 🧠 Introducing our ViDRiP-LLaVA: the first multimodal model for diagnostic reasoning in pathology through video-based instruction.… See the full description on the dataset page: https://huggingface.co/datasets/trinhvg/ViDRiP_Instruct_Test.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Dataset Name
YouTube video clips processed for a conversational LLaVA model. This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Description
Video data are segmented into 30-second intervals. Each interval is converted into a collage of 3 × 3 uniformly selected frames. The dataset is generated in two stages:
Basic Llava model tasked with describing the 3 x 3 collage. Llama 3 prompted… See the full description on the dataset page: https://huggingface.co/datasets/Ftest/VTdataset.
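The interval-to-collage step described above can be sketched with NumPy. This is an illustration under stated assumptions: the card only says that frames are uniformly selected into a 3 × 3 grid, so the frame rate and the row-major tiling order here are our choices, not the pipeline's confirmed details.

```python
import numpy as np

# Sketch of the collage step: pick 9 uniformly spaced frames from a
# decoded 30-second interval and tile them into a 3 x 3 grid.
# Tiling order (row-major) and frame rate are assumptions.

def make_collage(frames: np.ndarray) -> np.ndarray:
    """frames: (num_frames, H, W, C) array -> (3H, 3W, C) collage."""
    idx = np.linspace(0, len(frames) - 1, num=9).round().astype(int)
    picked = frames[idx]  # (9, H, W, C)
    rows = [np.concatenate(picked[r * 3:(r + 1) * 3], axis=1) for r in range(3)]
    return np.concatenate(rows, axis=0)

if __name__ == "__main__":
    clip = np.zeros((90, 32, 32, 3), dtype=np.uint8)  # fake 30 s clip at 3 fps
    print(make_collage(clip).shape)  # (96, 96, 3)
```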
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Causal2Needles
Overview
Causal2Needles is a benchmark dataset and evaluation toolkit designed to assess the capabilities of vision-language models (e.g., Gemini-1.5-Pro and LLaVA-Next-Video-7B) in long-video understanding and causal reasoning. This repository provides:
Dataset (Videos, Questions, Narration...) Instructions for downloading and setting up the dataset Example scripts for testing models Automated evaluation of model performance across three types… See the full description on the dataset page: https://huggingface.co/datasets/causal2needles/Causal2Needles.
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
⚠️ Access Required: To access the files in this dataset, you must agree to the CC BY-NC-ND 3.0 license terms. This dataset is for academic research use only and is not intended for commercial or clinical applications.
🧬 ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos
ViDRiP-LLaVA is a vision-language framework designed for instruction-based diagnostic reasoning using both image patches and video clips from pathology slides. It builds on LLaVA and… See the full description on the dataset page: https://huggingface.co/datasets/trinhvg/ViDRiP_Instruct_Train.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
M4-IT
This dataset, M4-IT, is a synthetic instruction finetuning dataset used in the development of the M4 framework, designed to enhance real-time interactive reasoning in multi-modal language models. The M4 framework is evaluated on OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts.
Data Description
Building on the LLaVA-NeXT-Data, we crafted a small video-free synthetic instruction finetuning dataset, M4-IT, with the assistance… See the full description on the dataset page: https://huggingface.co/datasets/ColorfulAI/M4-IT.