MMBench is a multi-modality benchmark. It methodically develops a comprehensive evaluation pipeline composed of two elements. The first is a meticulously curated dataset that surpasses existing similar benchmarks in the number and variety of evaluation questions and abilities. The second is a novel CircularEval strategy that uses ChatGPT to convert free-form predictions into the pre-defined choices, enabling a more robust evaluation of model predictions.
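CircularEval is commonly described as asking each multiple-choice question once per circular shift of its answer options and counting the question correct only if the model answers correctly under every shift. The sketch below illustrates that idea under those assumptions; it is a minimal illustration, not the official MMBench implementation. `ask_model` is a hypothetical callable, and the single-letter matching stands in for the ChatGPT-based mapping of free-form output to a choice described above.

```python
from typing import Callable, Sequence

def circular_eval(
    question: str,
    options: Sequence[str],           # e.g. ["cat", "dog", "bird", "fish"]
    correct_index: int,               # index of the right answer in `options`
    ask_model: Callable[[str], str],  # hypothetical: prompt in, answer letter out
) -> bool:
    """Count the question correct only if every rotation is answered correctly."""
    n = len(options)
    letters = "ABCD"[:n]  # sketch assumes at most four options
    for shift in range(n):
        # Rotate the options so the correct letter changes on each pass.
        rotated = [options[(i + shift) % n] for i in range(n)]
        prompt = question + "\n" + "\n".join(
            f"{letters[i]}. {opt}" for i, opt in enumerate(rotated)
        )
        # The real pipeline maps free-form output to a choice via ChatGPT;
        # here we assume the model already replies with a single letter.
        predicted = ask_model(prompt).strip().upper()[:1]
        correct_letter = letters[(correct_index - shift) % n]
        if predicted != correct_letter:
            return False  # one failed rotation fails the whole question
    return True
```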
MIT License (https://opensource.org/licenses/MIT)
MM-SpuBench Datacard
Basic Information
Title: The Multimodal Spurious Benchmark (MM-SpuBench)
Description: MM-SpuBench is a comprehensive benchmark designed to evaluate the robustness of multimodal large language models (MLLMs) to spurious biases. It systematically assesses how well these models distinguish between core and spurious features, providing a detailed framework for understanding and quantifying spurious biases.
Data Structure:
├── data/images
│   ├── 000000.jpg
│   ├── 000001.jpg
│   …
See the full description on the dataset page: https://huggingface.co/datasets/mmbench/MM-SpuBench.
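For local inspection, the card's dataset ID can be loaded with the Hugging Face `datasets` library. This is a minimal sketch: the exact configuration and split names are not shown above, so treat them as assumptions.

```python
from datasets import load_dataset

# Dataset ID taken from the URL above; config/split names are assumptions.
ds = load_dataset("mmbench/MM-SpuBench")
print(ds)                  # inspect which splits are actually available
# print(ds["train"][0])    # first example, if a "train" split exists
```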
Dataset Card for "MMBench_dev"
Dataset Summary
Recent years have seen a surge of vision-language (VL) models, such as MiniGPT-4 and LLaVA, that show promising performance on previously challenging tasks. However, effectively evaluating these models has become a primary obstacle to further progress in large VL models. Traditional benchmarks such as VQAv2 and COCO Caption are widely used to… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceM4/MMBench_dev.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
🖥️ MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
Introduction
We are happy to release MMBench-GUI, a hierarchical, multi-platform benchmark framework and toolbox for evaluating GUI agents. MMBench-GUI comprises four evaluation levels: GUI Content Understanding, GUI Element Grounding, GUI Task Automation, and GUI Task Collaboration. We also propose the Efficiency–Quality Area (EQA) metric for agent navigation, integrating… See the full description on the dataset page: https://huggingface.co/datasets/OpenGVLab/MMBench-GUI.
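The EQA definition is cut off above. Purely as an illustration of what an area-style metric that integrates task quality over an efficiency axis could look like, here is a hypothetical sketch; the function name, inputs, and normalization are all assumptions, not the official MMBench-GUI formulation.

```python
import numpy as np

def efficiency_quality_area(step_budgets, quality):
    """Hypothetical area-style score: integrate task quality over the
    step-budget axis and normalize, so agents that reach high quality
    with fewer steps score higher. NOT the official MMBench-GUI EQA,
    whose definition is truncated in the card above.
    """
    x = np.asarray(step_budgets, dtype=float)  # allowed steps per task
    y = np.asarray(quality, dtype=float)       # success rate at each budget
    # Trapezoidal rule, written out to avoid NumPy-version differences.
    area = float(np.sum(0.5 * (y[1:] + y[:-1]) * (x[1:] - x[:-1])))
    return area / (x[-1] - x[0])               # normalize to [0, 1]

# Example: quality climbs quickly at small budgets -> larger area.
print(efficiency_quality_area([5, 10, 20, 40], [0.4, 0.7, 0.8, 0.85]))
```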
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
YaxinLuo/mmbench dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0, CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Based on Kuaishou short-video data, we constructed six datasets to evaluate the performance of Vision-Language Models (VLMs) such as Kwai Keye-VL-8B, Qwen2.5-VL, and InternVL.
Tasks
| Task | Description |
| --- | --- |
| CPV | Predicting product attributes in e-commerce. |
| Hot_Videos_Aggregation | Determining whether multiple videos belong to the same topic. |
| Collection_Order | Determining the logical order among multiple videos on the same topic. |

… See the full description on the dataset page: https://huggingface.co/datasets/Kwai-Keye/KC-MMbench.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
VLMEval/GMAI-MMBench dataset hosted on Hugging Face and contributed by the HF Datasets community
lscpku/MMBench-Video dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
This is a subset of the video understanding benchmark MMBench-Video.
ko-vlm/K-MMBench dataset hosted on Hugging Face and contributed by the HF Datasets community