12 datasets found
  1. MMBench Dataset

    • paperswithcode.com
    Updated Apr 13, 2025
    Citation
    YuAn Liu; Haodong Duan; Yuanhan Zhang; Bo Li; Songyang Zhang; Wangbo Zhao; Yike Yuan; Jiaqi Wang; Conghui He; Ziwei Liu; Kai Chen; Dahua Lin (2023). MMBench Dataset [Dataset]. https://paperswithcode.com/dataset/mmbench
    Authors
    YuAn Liu; Haodong Duan; Yuanhan Zhang; Bo Li; Songyang Zhang; Wangbo Zhao; Yike Yuan; Jiaqi Wang; Conghui He; Ziwei Liu; Kai Chen; Dahua Lin
    Description

    MMBench is a multimodal benchmark built around a comprehensive evaluation pipeline with two main components. The first is a carefully curated dataset that exceeds comparable benchmarks in the number and variety of its evaluation questions and of the abilities they cover. The second is a new CircularEval strategy, combined with ChatGPT-based answer extraction that converts free-form predictions into predefined choices; together these make the evaluation of model predictions more robust.

  2. MMBench

    • huggingface.co
    Updated Apr 4, 2025
    Citation
    xize cheng (2025). MMBench [Dataset]. https://huggingface.co/datasets/Exgc/MMBench
    Authors
    xize cheng
    Description

    Exgc/MMBench dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. GMAI-MMBench Dataset

    • paperswithcode.com
    Updated May 31, 2025
    Citation
    Pengcheng Chen; Jin Ye; Guoan Wang; Yanjun Li; Zhongying Deng; Wei Li; Tianbin Li; Haodong Duan; Ziyan Huang; Yanzhou Su; Benyou Wang; Shaoting Zhang; Bin Fu; Jianfei Cai; Bohan Zhuang; Eric J Seibel; Junjun He; Yu Qiao (2025). GMAI-MMBench Dataset [Dataset]. https://paperswithcode.com/dataset/gmai-mmbench
    Authors
    Pengcheng Chen; Jin Ye; Guoan Wang; Yanjun Li; Zhongying Deng; Wei Li; Tianbin Li; Haodong Duan; Ziyan Huang; Yanzhou Su; Benyou Wang; Shaoting Zhang; Bin Fu; Jianfei Cai; Bohan Zhuang; Eric J Seibel; Junjun He; Yu Qiao

  4. MM-SpuBench

    • huggingface.co
    Updated Jun 16, 2024
    Citation
    Wenqian Ye (2024). MM-SpuBench [Dataset]. https://huggingface.co/datasets/mmbench/MM-SpuBench
    Authors
    Wenqian Ye
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    MM-SpuBench Datacard

    Basic Information

    Title: The Multimodal Spurious Benchmark (MM-SpuBench)
    Description: MM-SpuBench is a comprehensive benchmark designed to evaluate the robustness of multimodal large language models (MLLMs) to spurious biases. It systematically assesses how well these models distinguish between core and spurious features, providing a detailed framework for understanding and quantifying spurious biases.
    Data Structure:
    ├── data/images
    │   ├── 000000.jpg
    │   ├── 000001.jpg
    │   …
    See the full description on the dataset page: https://huggingface.co/datasets/mmbench/MM-SpuBench.

  5. MMBench_dev

    • huggingface.co
    Updated Aug 1, 2023
    Citation
    HuggingFaceM4 (2023). MMBench_dev [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/MMBench_dev
    Dataset authored and provided by
    HuggingFaceM4
    Description

    Dataset Card for "MMBench_dev"

    Dataset Summary

    In recent years, the field has seen a surge in the development of vision-language (VL) models such as MiniGPT-4 and LLaVA, which show promising performance on previously challenging tasks. However, evaluating these models effectively has become a primary obstacle to further progress in large VL models. Traditional benchmarks like VQAv2 and COCO Caption are widely used to… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceM4/MMBench_dev.

  6. MMBench-GUI

    • huggingface.co
    Updated Jun 25, 2025
    Citation
    OpenGVLab (2025). MMBench-GUI [Dataset]. https://huggingface.co/datasets/OpenGVLab/MMBench-GUI
    Dataset authored and provided by
    OpenGVLab
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🖥️ MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

    Introduction

    We are happy to release MMBench-GUI, a hierarchical, multi-platform benchmark framework and toolbox for evaluating GUI agents. MMBench-GUI comprises four evaluation levels: GUI Content Understanding, GUI Element Grounding, GUI Task Automation, and GUI Task Collaboration. We also propose the Efficiency–Quality Area (EQA) metric for agent navigation, integrating… See the full description on the dataset page: https://huggingface.co/datasets/OpenGVLab/MMBench-GUI.

  7. mmbench

    • huggingface.co
    Citation
    YaxinLuo, mmbench [Dataset]. https://huggingface.co/datasets/YaxinLuo/mmbench
    Authors
    YaxinLuo
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    YaxinLuo/mmbench dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. KC-MMbench

    • huggingface.co
    Updated Jun 26, 2025
    Citation
    Kwai-Keye (2025). KC-MMbench [Dataset]. https://huggingface.co/datasets/Kwai-Keye/KC-MMbench
    Authors
    Kwai-Keye
    License

    Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Based on Kuaishou short-video data, we constructed six datasets for evaluating the performance of vision-language models (VLMs) such as Kwai Keye-VL-8B, Qwen2.5-VL, and InternVL.

    Tasks

    • CPV: predicting product attributes in e-commerce.
    • Hot_Videos_Aggregation: determining whether multiple videos belong to the same topic.
    • Collection_Order: determining the logical order among multiple videos on the same topic.… See the full description on the dataset page: https://huggingface.co/datasets/Kwai-Keye/KC-MMbench.

  9. GMAI-MMBench

    • huggingface.co
    Updated Aug 20, 2024
    Citation
    VLMEval (2024). GMAI-MMBench [Dataset]. https://huggingface.co/datasets/VLMEval/GMAI-MMBench
    Dataset authored and provided by
    VLMEval
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    VLMEval/GMAI-MMBench dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. MMBench-Video

    • huggingface.co
    Updated Jul 30, 2024
    Citation
    Li Shicheng (2024). MMBench-Video [Dataset]. https://huggingface.co/datasets/lscpku/MMBench-Video
    Authors
    Li Shicheng
    Description

    lscpku/MMBench-Video dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. mini-MMBench-Video

    • huggingface.co
    Updated Jun 26, 2025
    Citation
    Mingyang Mao (2025). mini-MMBench-Video [Dataset]. https://huggingface.co/datasets/Maoger/mini-MMBench-Video
    Authors
    Mingyang Mao
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is a subset of the video understanding benchmark MMBench-Video.

  12. K-MMBench

    • huggingface.co
    Updated Dec 5, 2024
    Citation
    korean-vision-language (2024). K-MMBench [Dataset]. https://huggingface.co/datasets/ko-vlm/K-MMBench
    Dataset authored and provided by
    korean-vision-language
    Description

    ko-vlm/K-MMBench dataset hosted on Hugging Face and contributed by the HF Datasets community
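
The MMBench description in result 1 names CircularEval without spelling it out. As described in the MMBench paper, CircularEval presents each N-choice question N times with the options circularly shifted and counts the question as solved only if the model answers correctly in every pass, which penalizes a model that guesses or favors a particular answer position. The Python sketch illustrates that idea under stated assumptions: the `Model` callable interface and the helpers `extract_choice` and `circular_eval` are hypothetical names, and the simple string-matching `extract_choice` stands in for the ChatGPT-based answer extraction. It is not MMBench's official implementation.

```python
import re
from typing import Callable, Optional, Sequence

# Option letters; this sketch supports at most ten choices (A-J).
LETTERS = "ABCDEFGHIJ"

# Hypothetical model interface: takes a question and an ordered list of
# options and returns a free-form text prediction.
Model = Callable[[str, Sequence[str]], str]


def extract_choice(prediction: str, options: Sequence[str]) -> Optional[int]:
    """Map a free-form prediction to an option index, or None if unclear.

    Hypothetical string-matching heuristic used in place of the
    ChatGPT-based extraction; it is not MMBench's actual extractor.
    """
    text = prediction.strip()
    # Case 1: the answer opens with an option letter, e.g. "B", "B.", "(B)".
    m = re.match(r"\(?([A-J])\)?(?:[.:)]|$)", text)
    if m and LETTERS.index(m.group(1)) < len(options):
        return LETTERS.index(m.group(1))
    # Case 2: exactly one option's text appears in the answer.
    hits = [i for i, opt in enumerate(options) if opt.lower() in text.lower()]
    return hits[0] if len(hits) == 1 else None


def circular_eval(model: Model, question: str,
                  options: Sequence[str], answer_idx: int) -> bool:
    """Return True only if the model picks the right option in all N rotations."""
    n = len(options)
    if not 0 <= answer_idx < n:
        raise ValueError("answer_idx must index into options")
    for k in range(n):
        # Rotate the options left by k positions: shifted[p] == options[(p + k) % n].
        shifted = [options[(p + k) % n] for p in range(n)]
        # Position of the ground-truth option after this rotation.
        target = (answer_idx - k) % n
        if extract_choice(model(question, shifted), shifted) != target:
            return False  # a single failed pass fails the whole question
    return True


# Usage: two toy "models" (hypothetical, for illustration only).
def always_a(question: str, options: Sequence[str]) -> str:
    return "A"  # always picks the first position


def knows_cat(question: str, options: Sequence[str]) -> str:
    return f"{LETTERS[list(options).index('cat')]}."  # finds "cat" wherever it is


opts = ["dog", "cat", "bird", "fish"]
print(circular_eval(always_a, "Which animal is shown?", opts, 0))   # False
print(circular_eval(knows_cat, "Which animal is shown?", opts, 1))  # True
```

The position-biased `always_a` would score correctly on a single pass when the answer happens to be option A, but CircularEval exposes it on the second rotation; only `knows_cat`, which tracks the content of the options rather than their positions, passes every rotation.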
