6 datasets found
  1. MMMU

    • huggingface.co
    Updated Dec 4, 2023
    Cite
    MMMU (2023). MMMU [Dataset]. https://huggingface.co/datasets/MMMU/MMMU
    Explore at:
    2 scholarly articles cite this dataset (View in Google Scholar)
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 4, 2023
    Dataset authored and provided by
    MMMU
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    MMMU (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI)

    🌐 Homepage | 🏆 Leaderboard | 🤗 Dataset | 🤗 Paper | 📖 arXiv | GitHub

      🔔News
    

    🛠️ [2024-05-30]: Fixed duplicate option issues in Materials dataset items (validation_Materials_25; test_Materials_17, 242) and a content error in validation_Materials_25.
    🛠️ [2024-04-30]: Fixed missing "-" or "^" signs in Math dataset items (dev_Math_2, validation_Math_11, 12, 16; test_Math_8… See the full description on the dataset page: https://huggingface.co/datasets/MMMU/MMMU.

  2. JMMMU

    • huggingface.co
    Updated Oct 1, 2024
    Cite
    JMMMU (2024). JMMMU [Dataset]. https://huggingface.co/datasets/JMMMU/JMMMU
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Oct 1, 2024
    Dataset authored and provided by
    JMMMU
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark

    🌐 Homepage | 🤗 Dataset | 🏆 HF Leaderboard | 📖 arXiv | 💻 Code

      Introduction
    

    We introduce JMMMU (Japanese MMMU), a multimodal benchmark that can truly evaluate LMM performance in Japanese. To create JMMMU, we first carefully analyzed the existing MMMU benchmark and examined its cultural dependencies. Then, for questions in culture-agnostic subjects, we employed native Japanese speakers who… See the full description on the dataset page: https://huggingface.co/datasets/JMMMU/JMMMU.

  3. Benchmark comparison between Google's Gemini and OpenAI's GPT-4 in 2024

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Benchmark comparison between Google's Gemini and OpenAI's GPT-4 in 2024 [Dataset]. https://www.statista.com/statistics/1446321/gemini-and-gpt-4-comparison/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023 - 2024
    Area covered
    Worldwide
    Description

    Gemini Ultra, developed by Google, beat OpenAI's GPT-4 on the MMMU benchmark; only in business and science did GPT-4 perform better. The overall quality of the two models is very similar, with Gemini holding only a * point lead over its OpenAI-developed competitor.

  4. MMMU_with_difficulty_level

    • huggingface.co
    Updated Jul 15, 2025
    + more versions
    Cite
    Jierun Chen (2025). MMMU_with_difficulty_level [Dataset]. https://huggingface.co/datasets/JierunChen/MMMU_with_difficulty_level
    Explore at:
    Dataset updated
    Jul 15, 2025
    Authors
    Jierun Chen
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    MMMU with difficulty level tags

    This dataset extends the 🤗 MMMU val benchmark by introducing two additional tags: passrate_for_qwen2.5_vl_7b and difficulty_level_for_qwen2.5_vl_7b. Further details are available in our paper The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs.

      🚀 Data Usage
    

    from datasets import load_dataset

    dataset = load_dataset("JierunChen/MMMU_with_difficulty_level")
    print(dataset)
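    For intuition, the difficulty tag can be thought of as a bucketed pass rate. The sketch below is illustrative only: the thresholds are hypothetical guesses, not the mapping the dataset authors actually used to derive difficulty_level_for_qwen2.5_vl_7b from passrate_for_qwen2.5_vl_7b (see their paper for the real definition).

```python
def difficulty_from_passrate(passrate: float) -> str:
    """Bucket a pass rate in [0, 1] into a coarse difficulty label.

    NOTE: these thresholds are illustrative guesses, not the ones used
    to build the dataset's difficulty_level_for_qwen2.5_vl_7b tag.
    """
    if not 0.0 <= passrate <= 1.0:
        raise ValueError("pass rate must be in [0, 1]")
    if passrate >= 0.75:
        return "easy"
    if passrate >= 0.25:
        return "medium"
    return "hard"
```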

      📑… See the full description on the dataset page: https://huggingface.co/datasets/JierunChen/MMMU_with_difficulty_level.
    
  5. Google Gemini Statistics By Features, Performance and AI Versions

    • enterpriseappstoday.com
    Updated Dec 20, 2023
    Cite
    EnterpriseAppsToday (2023). Google Gemini Statistics By Features, Performance and AI Versions [Dataset]. https://www.enterpriseappstoday.com/stats/google-gemini-statistics.html
    Explore at:
    Dataset updated
    Dec 20, 2023
    Dataset authored and provided by
    EnterpriseAppsToday
    License

    https://www.enterpriseappstoday.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Google Gemini Statistics: In 2023, Google unveiled its most powerful AI model to date. Google positions Gemini as the world's most advanced AI, ahead of ChatGPT-4. The model comes in three sizes, each suited to different tasks. According to Google Gemini Statistics, these models can understand and solve complex problems across a wide range of domains. Google has also said it will develop AI in a way that demonstrates how helpful it can be in daily routines. We hope the next generation won't become fully dependent on such technologies, or we risk losing our natural talents!

    Editor's Choice: Google Gemini can follow natural and engaging conversations. According to Google Gemini Statistics, Gemini Ultra scored 90.0% on the MMLU benchmark, which tests knowledge and problem-solving across subjects including history, physics, math, law, ethics, and medicine. Given a description of raw materials, Gemini can propose ideas as text or images based on the input. Gemini has outperformed ChatGPT-4 in the majority of tests. According to the report, this LLM is notable because it can process multiple types of data at once, including video, images, computer code, and text. Google frames this development as "The Gemini Era," underscoring how significant AI is in improving daily life. Gemini can converse like a real person. Gemini Ultra is the largest model and can solve extremely complex problems. Gemini models are trained on multilingual and multimodal datasets. On the MMMU benchmark, Gemini Ultra outperformed GPT-4V in the following categories: Art and Design (74.2), Business (62.7), Health and Medicine (71.3), Humanities and Social Science (78.3), and Technology and Engineering (53.0).
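    As a quick sanity check on the subject scores quoted above, the unweighted mean across the five listed MMMU categories can be computed directly. Note this is only a rough figure: the official MMMU overall score is aggregated per question, not as a simple average of subject scores.

```python
# MMMU subject scores for Gemini Ultra as quoted in the passage above
scores = {
    "Art and Design": 74.2,
    "Business": 62.7,
    "Health and Medicine": 71.3,
    "Humanities and Social Science": 78.3,
    "Technology and Engineering": 53.0,
}

# Unweighted mean over the five listed subjects; the official overall
# MMMU score weights by question count, so treat this as approximate.
mean_score = sum(scores.values()) / len(scores)
print(round(mean_score, 1))  # 67.9
```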

  6. Medical_Multimodal_Evaluation_Data

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    FreedomAI (2025). Medical_Multimodal_Evaluation_Data [Dataset]. https://huggingface.co/datasets/FreedomIntelligence/Medical_Multimodal_Evaluation_Data
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    FreedomAI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Evaluation Guide

    This dataset is used to evaluate medical multimodal LLMs, as used in HuatuoGPT-Vision. It includes benchmarks such as VQA-RAD, SLAKE, PathVQA, PMC-VQA, OmniMedVQA, and MMMU-Medical-Tracks.
    To get started:

    Download the dataset and extract the images.zip file.
    Find evaluation code on our GitHub: HuatuoGPT-Vision.
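    The extract step above can be sketched with the standard library. The archive name images.zip comes from the instructions; the destination directory is an arbitrary choice, and `extract_images` is a hypothetical helper name, not part of the HuatuoGPT-Vision codebase.

```python
import zipfile
from pathlib import Path

def extract_images(zip_path: str, dest_dir: str) -> list[str]:
    """Extract an images archive (e.g. images.zip) into dest_dir
    and return the list of archived file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```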

    This open-source release aims to simplify the evaluation of medical multimodal capabilities in large models. Please cite the relevant benchmark… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/Medical_Multimodal_Evaluation_Data.

