18 datasets found
  1. LLaVA-Video-178K

    • huggingface.co
    Updated Sep 16, 2024
    Cite
    LMMs-Lab (2024). LLaVA-Video-178K [Dataset]. https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset authored and provided by
    LMMs-Lab
    Description

    Dataset Card for LLaVA-Video-178K

      Uses
    

    This dataset is used to train the LLaVA-Video model. We permit use of this dataset only for academic research and educational purposes. For OpenAI GPT-4-generated data, we recommend that users review the OpenAI Usage Policy.

      Data Sources
    

    For the training of LLaVA-Video, we utilized video-language data from five primary sources:

    LLaVA-Video-178K: This dataset includes 178,510 caption entries, 960,792 open-ended… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K.
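    Datasets in this family typically store LLaVA-style annotations: a video path plus alternating human/gpt conversation turns. As a hedged illustration (the exact schema is not shown on this page, so the field names and the `<image>` placeholder convention below are assumptions), such a record can be flattened into question–answer pairs:

    ```python
    # Hedged sketch: the record layout below is an assumption based on the
    # common LLaVA conversation format, not confirmed by this listing.
    def to_qa_pairs(record):
        """Pair each human turn with the gpt turn that follows it."""
        turns = record["conversations"]
        pairs = []
        for human, gpt in zip(turns[::2], turns[1::2]):
            # Drop the media placeholder token if present (assumed convention).
            question = human["value"].replace("<image>", "").strip()
            pairs.append((question, gpt["value"]))
        return pairs

    example = {
        "video": "academic/clip_0001.mp4",  # hypothetical path
        "conversations": [
            {"from": "human", "value": "<image>\nWhat happens in the video?"},
            {"from": "gpt", "value": "A person pours liquid into a beaker."},
        ],
    }
    print(to_qa_pairs(example))
    # → [('What happens in the video?', 'A person pours liquid into a beaker.')]
    ```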

  2. LLaVA-Video-small-swift

    • huggingface.co
    Updated Nov 22, 2024
    + more versions
    Cite
    Malte (2024). LLaVA-Video-small-swift [Dataset]. https://huggingface.co/datasets/malterei/LLaVA-Video-small-swift
    Explore at: Croissant
    Authors
    Malte
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card LLaVA-Video-small-swift

    A small subset of LLaVA-Video-178K for educational purposes: learning how to fine-tune video models.

  3. llava-video-json

    • huggingface.co
    Updated Feb 12, 2025
    Cite
    Ruyang Liu (2025). llava-video-json [Dataset]. https://huggingface.co/datasets/farewellthree/llava-video-json
    Explore at: Croissant
    Authors
    Ruyang Liu
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0

    Description

    farewellthree/llava-video-json dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. TinyLLaVA-Video-v1-training-data

    • huggingface.co
    Updated Apr 14, 2025
    Cite
    Zhang Xingjian (2025). TinyLLaVA-Video-v1-training-data [Dataset]. https://huggingface.co/datasets/Zhang199/TinyLLaVA-Video-v1-training-data
    Authors
    Zhang Xingjian
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0

    Description

    TinyLLaVA-Video

    This dataset combines data from multiple sources for pre-training and fine-tuning. Pretrain Data: Four subsets of LLaVA-Video-178K (0_30_s_academic_v0_1, 30_60_s_academic_v0_1, 0_30_s_youtube_v0_1, 30_60_s_youtube_v0_1), supplemented with filtered Video-LLaVA data (https://huggingface.co/datasets/LanguageBind/Video-LLaVA) and data from Valley (https://github.com/RupertLuo/Valley). The video data can be downloaded from the linked datasets, and cleaned annotations are provided… See the full description on the dataset page: https://huggingface.co/datasets/Zhang199/TinyLLaVA-Video-v1-training-data.

  5. llava-video-178k-frames

    • huggingface.co
    Updated Mar 30, 2025
    Cite
    Weili Xu (2025). llava-video-178k-frames [Dataset]. https://huggingface.co/datasets/weili-0234/llava-video-178k-frames
    Authors
    Weili Xu
    Description

    weili-0234/llava-video-178k-frames dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. LLaVA-Video-2_3_m_youtube_mc-qwen_filter_1

    • huggingface.co
    Cite
    Wang, LLaVA-Video-2_3_m_youtube_mc-qwen_filter_1 [Dataset]. https://huggingface.co/datasets/Xiaodong/LLaVA-Video-2_3_m_youtube_mc-qwen_filter_1
    Authors
    Wang
    Area covered
    YouTube
    Description

    Xiaodong/LLaVA-Video-2_3_m_youtube_mc-qwen_filter_1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. TinyLLaVA-Video-R1-training-data

    • huggingface.co
    Updated Apr 15, 2025
    Cite
    Zhang Xingjian (2025). TinyLLaVA-Video-R1-training-data [Dataset]. https://huggingface.co/datasets/Zhang199/TinyLLaVA-Video-R1-training-data
    Authors
    Zhang Xingjian
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0

    Description

    TinyLLaVA-Video-R1

    We select multiple choice questions from the NextQA subset of LLaVA-Video-178K as training data. To maintain manageable training time with limited computational resources, we only choose the subset of data with a duration of 0 to 30 seconds, which contains 5,496 samples. In addition, we manually annotate 16 samples for cold-starting and provide the annotations.

      Organize Data
    

    Organize the files and annotation files as follows in path/to/your/dataset: dataset ├──… See the full description on the dataset page: https://huggingface.co/datasets/Zhang199/TinyLLaVA-Video-R1-training-data.
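    The duration-based subsetting described above (keeping only clips of 0 to 30 seconds) can be sketched as a simple filter. This is an illustrative reconstruction, not the dataset's actual code; the `duration` and `id` field names are hypothetical:

    ```python
    def filter_by_duration(records, max_seconds=30):
        """Keep records whose clip duration falls in (0, max_seconds]."""
        return [r for r in records if 0 < r["duration"] <= max_seconds]

    records = [  # hypothetical annotations; field names are assumptions
        {"id": "a", "duration": 12.5},
        {"id": "b", "duration": 45.0},
        {"id": "c", "duration": 29.9},
    ]
    print([r["id"] for r in filter_by_duration(records)])
    # → ['a', 'c']
    ```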

  8. Thermal and visible videos from lava lakes around the world - Vdataset - LDM...

    • service.tib.eu
    Updated Nov 30, 2024
    + more versions
    Cite
    (2024). Thermal and visible videos from lava lakes around the world - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-899433
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

    Area covered
    World
    Description

    Active lava lakes represent a variety of open-vent volcanism in which a sizable body of lava accumulates at the top of the magma column, constrained by the vent and/or crater geometry. The longevity of lava lakes reflects a balancing of cooling and outgassing occurring at the surface and input of hot and gas-rich magma from below. Due to their longevity and relative accessibility, lava lakes provide a natural laboratory for studying fundamental volcanic processes such as degassing, convection and cooling. This article examines all seven lakes that existed at the time of writing in 2018, located in the Pacific, Antarctica, Africa, and South and Central America. These lakes span all tectonic environments, and a range of magma compositions. We focus on analysis of the lake surface motion using image velocimetry, which reveals both similarities and contrasts in outgassing and lake dynamics when comparing the different lakes. We identify two categories of lake behavior: Organized (Erta'Ale, Nyiragongo, Kīlauea after 2011, and Erebus) and Chaotic (Villarrica, Masaya, Marum). This division does not map directly to lake size, viscosity, gas emission rate, or temperature. Instead, when examined together, we find that the lakes follow a linear relationship between average surface speed and the ratio of total gas flux to lake surface area. This relationship points to the combined importance of both flux and lake size in addition to the total volume of gas emission, and suggests that a shared deep mechanism controls the supply of heat and gas to all lakes. On the other hand, the differences between Chaotic and Organized lakes highlight the important role of the geometry of the conduit-lake transition, which superimposes a shallow signal on that of the deep circulation. 
The spatial patterns of surface motion we document suggest that the release of gas bubbles at Chaotic lakes is more efficient (i.e., bubbles are less likely to be retained and recycled) compared with Organized lakes. In addition, the data presented here indicate that the solidified crust of Organized lakes plays a role in regulating convection and outgassing in lava lakes.

  9. Video-R1-data

    • huggingface.co
    Updated Mar 29, 2025
    Cite
    Video-R1 (2025). Video-R1-data [Dataset]. https://huggingface.co/datasets/Video-R1/Video-R1-data
    Dataset authored and provided by
    Video-R1
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0

    Description

    This repository contains the data presented in Video-R1: Reinforcing Video Reasoning in MLLMs.
    Code: https://github.com/tulerfeng/Video-R1
    Video data folder: CLEVRER, LLaVA-Video-178K, NeXT-QA, PerceptionTest, STAR
    Image data folder: Chart, General, Knowledge, Math, OCR, Spatial
    Video-R1-COT-165k.json is for SFT cold start, and Video-R1-260k.json is for RL training.
    Data Format in Video-R1-COT-165k: { "problem_id": 2, "problem": "What appears on the screen in Russian during the… See the full description on the dataset page: https://huggingface.co/datasets/Video-R1/Video-R1-data.
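    The card shows only the first two fields of a record (`problem_id`, `problem`). A minimal, hedged sketch of working with such a JSON file, assuming it is a flat array of objects keyed that way (the sample record text below is a hypothetical stand-in, not the real entry):

    ```python
    import json

    def index_by_problem_id(path):
        """Load a flat JSON array of records and index them by problem_id."""
        with open(path) as f:
            records = json.load(f)
        return {r["problem_id"]: r for r in records}

    # Tiny stand-in file using only the fields visible on the card.
    with open("sample.json", "w") as f:
        json.dump([{"problem_id": 2, "problem": "What appears on the screen?"}], f)

    by_id = index_by_problem_id("sample.json")
    print(by_id[2]["problem"])  # → What appears on the screen?
    ```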

  10. VideoEspresso_train_multi_image

    • huggingface.co
    Updated Jan 24, 2025
    Cite
    Songhao Han (2025). VideoEspresso_train_multi_image [Dataset]. https://huggingface.co/datasets/hshjerry0315/VideoEspresso_train_multi_image
    Authors
    Songhao Han
    Description

    VideoEspresso

    This dataset is the multi-image version.

      Leaderboard
    

    Model: LLaVA-Video | Params: 72B | Frames: 64
    Overall: 66.3% | Narrative Analysis: 68.4% | Event Dynamic: 66.2% | Preparation Steps: 74.5% | Causal Analysis: 62.7% | Theme Analysis: 62.3% | Contextual Analysis: 71.6% | Influence Analysis: 62.5% | Role Analysis: 63.5% | Interaction Analysis: 67.7% | Behavior Analysis: 63.2% | Emotion Analysis: 60.0% | Cooking Process: 75.5% | Traffic Analysis: 76.7% | Situation Analysis: 74.0%

    LLaVA-OneVision… See the full description on the dataset page: https://huggingface.co/datasets/hshjerry0315/VideoEspresso_train_multi_image.

  11. VideoRoPE

    • huggingface.co
    Updated Jun 17, 2025
    Cite
    Xilin Wei (2025). VideoRoPE [Dataset]. https://huggingface.co/datasets/Wiselnn/VideoRoPE
    Authors
    Xilin Wei
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0

    Description

    V-NIAH-D Benchmark

    A Visual Needle-In-A-Haystack Benchmark with Periodic Distractors. It was presented in VideoRoPE: What Makes for Good Video Rotary Position Embedding?. One can use it by following steps similar to V-NIAH.

      VideoRoPE Training Data
    

    To facilitate the reproduction of our experimental results, we have also uploaded the data used by VideoRoPE. We use a subset of the LLaVA-Video-178K dataset to train VideoRoPE. The LLaVA-Video-178K dataset consists of 178K… See the full description on the dataset page: https://huggingface.co/datasets/Wiselnn/VideoRoPE.

  12. MMTrail-20M

    • huggingface.co
    Updated Jul 30, 2024
    Cite
    c (2024). MMTrail-20M [Dataset]. https://huggingface.co/datasets/litwell/MMTrail-20M
    Authors
    c
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/

    Description

    🎞MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    MMTrail is a large-scale multi-modality video-language dataset with over 20M trailer clips, featuring high-quality multimodal captions that integrate context, visual frames, and background music, aiming to enhance cross-modality studies and fine-grained multimodal-language model training. In short, we provided 2M+ LLaVA Video captions, 2M+ Music captions, and 60M+ Coca frame captions for 27.1khrs of… See the full description on the dataset page: https://huggingface.co/datasets/litwell/MMTrail-20M.

  13. LaVA-Video-2_3_m_youtube_mc-4o

    • huggingface.co
    Cite
    Wang, LaVA-Video-2_3_m_youtube_mc-4o [Dataset]. https://huggingface.co/datasets/Xiaodong/LaVA-Video-2_3_m_youtube_mc-4o
    Authors
    Wang
    Area covered
    YouTube
    Description

    Xiaodong/LaVA-Video-2_3_m_youtube_mc-4o dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. ViDRiP_Instruct_Test

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Trinh Vuong (2025). ViDRiP_Instruct_Test [Dataset]. https://huggingface.co/datasets/trinhvg/ViDRiP_Instruct_Test
    Authors
    Trinh Vuong
    License

    Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/

    Description

    🧬 ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos

    ViDRiP-LLaVA is a vision-language framework designed for instruction-based diagnostic reasoning using both image patches and video clips from pathology slides. It builds on LLaVA and extends it to the medical domain with domain-specific datasets and fine-tuned models. 🧠 Introducing our ViDRiP-LLaVA: the first multimodal model for diagnostic reasoning in pathology through video-based instruction.… See the full description on the dataset page: https://huggingface.co/datasets/trinhvg/ViDRiP_Instruct_Test.

  15. VTdataset

    • huggingface.co
    Updated Apr 21, 2024
    Cite
    Frank Hu (2024). VTdataset [Dataset]. https://huggingface.co/datasets/Ftest/VTdataset
    Explore at: Croissant
    Authors
    Frank Hu
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0

    Description

    Dataset Card for Dataset Name

    YouTube video clips processed for a conversational LLaVA model.

      Dataset Description
    

    Video data are segmented into 30-second intervals. Each interval is converted into a collage of 3 × 3 uniformly selected frames. The dataset is generated in two stages:

    1. A basic LLaVA model is tasked with describing the 3 × 3 collage.
    2. Llama 3 is prompted… See the full description on the dataset page: https://huggingface.co/datasets/Ftest/VTdataset.
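    The sampling step above (nine frames chosen uniformly from each 30-second interval, arranged 3 × 3) comes down to index arithmetic. The sketch below is an illustrative reconstruction, not the dataset's actual code:

    ```python
    def collage_frame_indices(n_frames, grid=3):
        """Indices of grid*grid frames spread uniformly across an interval,
        always including the first and last frame."""
        k = grid * grid
        if n_frames < k:
            raise ValueError("interval has fewer frames than collage cells")
        return [i * (n_frames - 1) // (k - 1) for i in range(k)]

    # A 30 s interval decoded at 3 fps has 90 frames:
    print(collage_frame_indices(90))
    # → [0, 11, 22, 33, 44, 55, 66, 77, 89]
    ```

    The selected frames would then be pasted into a 3 × 3 grid image (e.g. with Pillow) before captioning.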

  16. Causal2Needles

    • huggingface.co
    Updated Apr 28, 2025
    Cite
    causal2needles (2025). Causal2Needles [Dataset]. https://huggingface.co/datasets/causal2needles/Causal2Needles
    Authors
    causal2needles
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/

    Description

    Causal2Needles

      Overview
    

    Causal2Needles is a benchmark dataset and evaluation toolkit designed to assess the capabilities of vision-language models (e.g., Gemini-1.5-Pro and LLaVA-Next-Video-7B) in long-video understanding and causal reasoning. This repository provides:

    • Dataset (videos, questions, narration...)
    • Instructions for downloading and setting up the dataset
    • Example scripts for testing models
    • Automated evaluation of model performance across three types… See the full description on the dataset page: https://huggingface.co/datasets/causal2needles/Causal2Needles.

  17. ViDRiP_Instruct_Train

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Trinh Vuong (2025). ViDRiP_Instruct_Train [Dataset]. https://huggingface.co/datasets/trinhvg/ViDRiP_Instruct_Train
    Authors
    Trinh Vuong
    License

    Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/

    Description

    ⚠️ Access Required: To access the files in this dataset, you must agree to the CC BY-NC-ND 3.0 license terms. This dataset is for academic research use only and is not intended for commercial or clinical applications.

      🧬 ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos
    

    ViDRiP-LLaVA is a vision-language framework designed for instruction-based diagnostic reasoning using both image patches and video clips from pathology slides. It builds on LLaVA and… See the full description on the dataset page: https://huggingface.co/datasets/trinhvg/ViDRiP_Instruct_Train.

  18. M4-IT

    • huggingface.co
    Updated Apr 1, 2025
    Cite
    Yuxuan Wang (2025). M4-IT [Dataset]. https://huggingface.co/datasets/ColorfulAI/M4-IT
    Authors
    Yuxuan Wang
    License

    MIT License: https://opensource.org/licenses/MIT

    Description

    M4-IT

    This dataset, M4-IT, is a synthetic instruction finetuning dataset used in the development of the M4 framework, designed to enhance real-time interactive reasoning in multi-modal language models. The M4 framework is evaluated on OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts.

      Data Description
    

    Building on the LLaVA-NeXT-Data, we crafted a small video-free synthetic instruction finetuning dataset, M4-IT, with the assistance… See the full description on the dataset page: https://huggingface.co/datasets/ColorfulAI/M4-IT.


LLaVA-Video-178K: 29 scholarly articles cite this dataset (per Google Scholar).