MIT License https://opensource.org/licenses/MIT
License information was derived automatically
MVBench
Important Update
[18/10/2024] Due to the NTU RGB+D license, 320 videos from NTU RGB+D must be downloaded manually. Please visit ROSE Lab to access the data. We also provide a list of the 320 videos used in MVBench for your reference.
We introduce a novel static-to-dynamic method for defining temporal-related tasks. By converting static tasks into dynamic ones, we facilitate systematic generation of video tasks necessitating a wide range of temporal abilities, from… See the full description on the dataset page: https://huggingface.co/datasets/OpenGVLab/MVBench.
MVBench is a comprehensive Multi-modal Video understanding Benchmark. It was introduced to evaluate the comprehension capabilities of Multi-modal Large Language Models (MLLMs), particularly their temporal understanding in dynamic video tasks. MVBench covers 20 challenging video tasks that cannot be effectively solved with a single frame. It introduces a novel static-to-dynamic method to define these temporal-related tasks. By transforming various static tasks into dynamic ones, it enables the systematic generation of video tasks that require a broad spectrum of temporal skills, ranging from perception to cognition.
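Since MVBench is distributed as multiple-choice QA over short videos, a typical evaluation loop just renders each annotation as a lettered prompt. Below is a minimal loading sketch using the Hugging Face `datasets` library; the config name `action_sequence` and the field names `question`, `candidates`, and `answer` are assumptions based on MVBench's published annotation layout, so verify the exact schema on the dataset page.

```python
# Minimal sketch: load one MVBench task and render its multiple-choice
# QA items as lettered prompts.
# Assumptions: the config name "action_sequence", the split name, and
# the fields "question" / "candidates" / "answer" follow MVBench's
# published annotation layout; check the dataset page to confirm.
from datasets import load_dataset

ds = load_dataset("OpenGVLab/MVBench", "action_sequence", split="train")

LETTERS = "ABCDEFGH"

def format_mcqa(item):
    """Render one annotation as a lettered multiple-choice prompt."""
    options = "\n".join(
        f"({LETTERS[i]}) {c}" for i, c in enumerate(item["candidates"])
    )
    return f"{item['question']}\n{options}"

print(format_mcqa(ds[0]))
```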
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains optimized video files based on the MVBench dataset. All non-video data remains the same; users are encouraged to refer to the original dataset for the remaining data and annotations. Original MVBench dataset: https://huggingface.co/datasets/OpenGVLab/MVBench
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The evaluation data for MVBench. The data structure follows the evaluation code of PixelReasoner.
MVBench
We introduce a novel static-to-dynamic method for defining temporal-related tasks. By converting static tasks into dynamic ones, we facilitate systematic generation of video tasks necessitating a wide range of temporal abilities, from perception to cognition. Guided by task definitions, we then automatically transform public video annotations into multiple-choice QA for task evaluation. This unique paradigm enables efficient creation of MVBench with minimal manual… See the full description on the dataset page: https://huggingface.co/datasets/Mitzi4132/VideoLLava.
The firstep-ai/MV-Bench-mini dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
MVTamperBench Dataset
Overview
MVTamperBenchEnd is a robust benchmark designed to evaluate Vision-Language Models (VLMs) against adversarial video tampering effects. It leverages the diverse and well-structured MVBench dataset, systematically augmented with four distinct tampering techniques:
Masking: Overlays a black rectangle on a 1-second segment, simulating visual data loss.
Repetition: Repeats a 1-second segment, introducing temporal redundancy.
Rotation: Rotates a… See the full description on the dataset page: https://huggingface.co/datasets/Srikant86/MVTamperBenchEnd.
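To make these effects concrete, here is a hedged sketch of the two fully described techniques above, operating on a decoded (T, H, W, C) frame array. This is not the benchmark's actual implementation; segment placement, rectangle size, and fps handling are illustrative assumptions.

```python
import numpy as np

def mask_segment(frames: np.ndarray, fps: float, start_s: float = 0.0) -> np.ndarray:
    """'Masking': overlay a black rectangle on a 1-second segment.
    Rectangle placement and size are illustrative assumptions."""
    out = frames.copy()
    t0 = int(start_s * fps)
    t1 = min(t0 + int(round(fps)), len(out))  # one second of frames
    h, w = out.shape[1:3]
    out[t0:t1, h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = 0
    return out

def repeat_segment(frames: np.ndarray, fps: float, start_s: float = 0.0) -> np.ndarray:
    """'Repetition': duplicate a 1-second segment in place,
    lengthening the clip by one second."""
    t0 = int(start_s * fps)
    t1 = min(t0 + int(round(fps)), len(frames))
    return np.concatenate([frames[:t1], frames[t0:t1], frames[t1:]])
```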
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
MVTamperBench is a novel benchmark that systematically evaluates the adversarial robustness of VLMs against video-specific tampering techniques, with a focus on temporal reasoning and multimodal coherence.
Dataset Description
MVTamperBench applies five distinct tampering techniques to the original MVBench videos: Dropping, Masking, Substitution, Repetition, and Rotation. Each tampering effect introduces unique adversarial challenges to test VLM robustness under… See the full description on the dataset page: https://huggingface.co/datasets/Srikant86/MVTamperBenchSample.
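For completeness, the remaining effects named above (Dropping, Substitution, and Rotation) can be sketched in the same frame-array style as the masking and repetition examples earlier. Again, these are assumptions about what each named technique does, not the benchmark's own code.

```python
import numpy as np

def drop_segment(frames: np.ndarray, fps: float, start_s: float = 0.0) -> np.ndarray:
    """'Dropping': remove a 1-second segment entirely."""
    t0 = int(start_s * fps)
    t1 = min(t0 + int(round(fps)), len(frames))
    return np.concatenate([frames[:t0], frames[t1:]])

def substitute_segment(frames: np.ndarray, donor: np.ndarray,
                       fps: float, start_s: float = 0.0) -> np.ndarray:
    """'Substitution': replace a 1-second segment with frames from an
    unrelated donor clip of the same resolution (assumed available)."""
    out = frames.copy()
    t0 = int(start_s * fps)
    t1 = min(t0 + int(round(fps)), len(out))
    out[t0:t1] = donor[: t1 - t0]
    return out

def rotate_segment(frames: np.ndarray, fps: float, start_s: float = 0.0) -> np.ndarray:
    """'Rotation': rotate a 1-second segment by 90 degrees in the
    image plane (assumes square frames so the shape is preserved)."""
    out = frames.copy()
    t0 = int(start_s * fps)
    t1 = min(t0 + int(round(fps)), len(out))
    out[t0:t1] = np.rot90(out[t0:t1], axes=(1, 2))
    return out
```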