Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for LongVideoBench
Large multimodal models (LMMs) are handling increasingly longer and more complex inputs. However, few public benchmarks are available to assess these advancements. To address this, we introduce LongVideoBench, a question-answering benchmark with video-language interleaved inputs up to an hour long. It comprises 3,763 web-collected videos with subtitles across diverse themes, designed to evaluate LMMs on long-term multimodal understanding. The… See the full description on the dataset page: https://huggingface.co/datasets/longvideobench/LongVideoBench.
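As a quick orientation, here is a minimal sketch of pulling the benchmark with the Hugging Face datasets library. The split name is an assumption, and the dataset page may document a dedicated loader for the hour-long video inputs, so treat this as a starting point rather than the official recipe.

```python
# Minimal sketch, assuming the standard Hugging Face datasets API applies.
# The split name ("test") is an assumption; check the dataset page, which may
# also provide a dedicated loader for the long video inputs.
from datasets import load_dataset

ds = load_dataset("longvideobench/LongVideoBench", split="test")

print(len(ds))        # number of QA items in this split
print(ds[0].keys())   # inspect the real schema before relying on field names
```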
longvideobench/LongVideoBench-Meta dataset hosted on Hugging Face and contributed by the HF Datasets community
topyun/LongVideoBench-Long dataset hosted on Hugging Face and contributed by the HF Datasets community
Jialuo21/LongVideoBench dataset hosted on Hugging Face and contributed by the HF Datasets community
VideoEval-Pro
VideoEval-Pro is a robust and realistic long video understanding benchmark containing open-ended, short-answer QA problems. The dataset is constructed by reformatting multiple-choice questions from four existing long video understanding benchmarks (Video-MME, MLVU, LVBench, and LongVideoBench) into free-form questions. The paper can be found here. The evaluation code and scripts are available at TIGER-AI-Lab/VideoEval-Pro.
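To make the construction concrete, below is a toy illustration of how a multiple-choice item can be recast as an open-ended, short-answer item: drop the options and keep the correct option's text as the reference answer. This is not the authors' pipeline, and the field names are hypothetical.

```python
# Toy illustration (not the authors' pipeline): recasting an MCQ item as an
# open-ended, short-answer item. All field names here are hypothetical.
def mcq_to_free_form(item: dict) -> dict:
    answer_key = item["answer"]  # e.g. "B"
    return {
        "question": item["question"],  # the stem becomes the open-ended question
        "reference_answer": item["options"][answer_key],
    }

mcq = {
    "question": "What object does the presenter pick up first?",
    "options": {"A": "a book", "B": "a camera", "C": "a mug", "D": "a pen"},
    "answer": "B",
}
print(mcq_to_free_form(mcq))
# {'question': 'What object does the presenter pick up first?', 'reference_answer': 'a camera'}
```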
Dataset Structure
Each example in the… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/VideoEval-Pro.