Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
MMMU (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI)
🌐 Homepage | 🏆 Leaderboard | 🤗 Dataset | 🤗 Paper | 📖 arXiv | GitHub
🔔News
🛠️ [2024-05-30]: Fixed duplicate-option issues in Materials dataset items (validation_Materials_25; test_Materials_17, 242) and a content error in validation_Materials_25.
🛠️ [2024-04-30]: Fixed missing "-" or "^" signs in Math dataset items (dev_Math_2; validation_Math_11, 12, 16; test_Math_8… See the full description on the dataset page: https://huggingface.co/datasets/MMMU/MMMU.
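For readers who want to inspect the benchmark directly, here is a minimal loading sketch using the Hugging Face `datasets` library. MMMU is organized as one configuration per subject; the subject name "Math" is taken from the fix notes above, while the "question" field name is an assumption, so check the dataset card for the exact schema.

```python
from datasets import load_dataset

# MMMU ships one config per subject; "Math" is one of the subjects
# mentioned in the fix notes above.
math_val = load_dataset("MMMU/MMMU", "Math", split="validation")

print(math_val)                  # dataset summary: features and row count
print(math_val[0]["question"])   # "question" field name is an assumption -- see the card
```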
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark
🌐 Homepage | 🤗 Dataset | 🏆 HF Leaderboard | 📖 arXiv | 💻 Code
Introduction
We introduce JMMMU (Japanese MMMU), a multimodal benchmark that can truly evaluate LMM performance in Japanese. To create JMMMU, we first carefully analyzed the existing MMMU benchmark and examined its cultural dependencies. Then, for questions in culture-agnostic subjects, we employed native Japanese speakers who… See the full description on the dataset page: https://huggingface.co/datasets/JMMMU/JMMMU.
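A minimal sketch for exploring JMMMU, assuming it follows the same per-subject configuration layout as MMMU. Rather than guessing a config name, this enumerates the available configs; the "test" split name is an assumption, so consult the dataset card.

```python
from datasets import get_dataset_config_names, load_dataset

# Enumerate subject configs rather than hard-coding one.
configs = get_dataset_config_names("JMMMU/JMMMU")
print(configs)

# Load the first subject; the "test" split name is an assumption.
subject = load_dataset("JMMMU/JMMMU", configs[0], split="test")
print(subject)
```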
Gemini Ultra, developed by Google, has beaten OpenAI's GPT-4 on the MMMU benchmark; only in Business and Science did GPT-4 perform better. The overall quality of the two models is very similar, with Gemini holding only a narrow lead over its OpenAI-developed competitor.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
MMMU with difficulty level tags
This dataset extends the 🤗 MMMU val benchmark by introducing two additional tags: passrate_for_qwen2.5_vl_7b and difficulty_level_for_qwen2.5_vl_7b. Further details are available in our paper The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs.
🚀 Data Usage
```python
from datasets import load_dataset

dataset = load_dataset("JierunChen/MMMU_with_difficulty_level")
print(dataset)
```
📑… See the full description on the dataset page: https://huggingface.co/datasets/JierunChen/MMMU_with_difficulty_level.
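Once loaded, the two extra tags can be used to slice the benchmark by difficulty. A minimal sketch, assuming `passrate_for_qwen2.5_vl_7b` is stored as a float in [0, 1] and that the threshold 0.25 is an arbitrary illustrative cutoff; see the paper and dataset card for the exact schema.

```python
from datasets import load_dataset

dataset = load_dataset("JierunChen/MMMU_with_difficulty_level")
split = next(iter(dataset.values()))  # take whichever split is present

# Keep only items the reference model rarely solved, assuming the
# pass rate is stored as a float in [0, 1] (an assumption).
hard = split.filter(lambda ex: ex["passrate_for_qwen2.5_vl_7b"] < 0.25)
print(len(hard), "hard items out of", len(split))
```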
https://www.enterpriseappstoday.com/privacy-policy
Google Gemini Statistics: In 2023, Google unveiled its most powerful AI model to date. Google Gemini is billed as the world's most advanced AI, leaving GPT-4 behind. Google offers the model in three different sizes, each suited to tasks of different complexity. According to Google Gemini Statistics, these models can understand and help solve complex problems of almost any kind. Google has even said it will develop AI in a way that shows how helpful it can be in our daily routines. Well, we hope our next generation won't become fully dependent on such technologies; otherwise, we will lose our natural talents!
Editor's Choice
- Google Gemini can follow natural and engaging conversations.
- According to Google Gemini Statistics, Gemini Ultra scores 90.0% on the MMLU benchmark, which tests knowledge and problem-solving in subjects including history, physics, math, law, ethics, and medicine.
- If you ask Gemini what to do with a raw material, it can respond with ideas as text or images according to the given input.
- Gemini has outperformed GPT-4 in the majority of benchmark tests.
- According to the report, this LLM is unique because it can process multiple types of data at the same time, including video, images, computer code, and text.
- Google describes this development as "The Gemini Era," signaling how significant it considers AI to be for improving our daily lives.
- Google Gemini can talk like a real person.
- Gemini Ultra is the largest model and can solve extremely complex problems.
- Gemini models are trained on multilingual and multimodal datasets.
- On the MMMU benchmark, Gemini Ultra outperformed GPT-4V in Art and Design (74.2), Business (62.7), Health and Medicine (71.3), Humanities and Social Science (78.3), and Technology and Engineering (53.0).
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Evaluation Guide
This dataset is used to evaluate medical multimodal LLMs, as used in HuatuoGPT-Vision. It includes benchmarks such as VQA-RAD, SLAKE, PathVQA, PMC-VQA, OmniMedVQA, and MMMU-Medical-Tracks.
To get started:
Download the dataset and extract the images.zip file (see the download sketch below).
Find evaluation code on our GitHub: HuatuoGPT-Vision.
This open-source release aims to simplify the evaluation of medical multimodal capabilities in large models. Please cite the relevant benchmark… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/Medical_Multimodal_Evaluation_Data.
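As a companion to the steps above, a minimal download-and-extract sketch using `huggingface_hub`. The presence and location of images.zip inside the repository are assumptions based on the description; adjust paths after inspecting the repo.

```python
import zipfile
from pathlib import Path

from huggingface_hub import snapshot_download

# Fetch the evaluation data repository (a dataset repo, not a model).
local_dir = snapshot_download(
    repo_id="FreedomIntelligence/Medical_Multimodal_Evaluation_Data",
    repo_type="dataset",
)

# Extract images.zip next to the benchmark files; the archive name comes
# from the instructions above, and its exact path is an assumption.
zip_path = Path(local_dir) / "images.zip"
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(Path(local_dir) / "images")
```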