Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
MMMU (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI)
🌐 Homepage | 🏆 Leaderboard | 🤗 Dataset | 🤗 Paper | 📖 arXiv | GitHub
🔔News
🛠️ [2024-05-30]: Fixed duplicate-option issues in Materials dataset items (validation_Materials_25; test_Materials_17, 242) and a content error in validation_Materials_25.
🛠️ [2024-04-30]: Fixed missing "-" or "^" signs in Math dataset items (dev_Math_2; validation_Math_11, 12, 16; test_Math_8… See the full description on the dataset page: https://huggingface.co/datasets/MMMU/MMMU.
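For readers who want to inspect the benchmark directly, here is a minimal loading sketch using the Hugging Face `datasets` library. MMMU is organized as one configuration per subject; the subject name "Math" is taken from the fix notes above, while the "question" field name is an assumption, so check the dataset card for the exact schema.

```python
from datasets import load_dataset

# MMMU ships one config per subject; "Math" is one of the subjects
# mentioned in the fix notes above.
math_val = load_dataset("MMMU/MMMU", "Math", split="validation")

print(math_val)                  # dataset summary: features and row count
print(math_val[0]["question"])   # "question" field name is an assumption -- see the card
```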
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark
🌐 Homepage | 🤗 Dataset | 🏆 HF Leaderboard | 📖 arXiv | 💻 Code
Introduction
We introduce JMMMU (Japanese MMMU), a multimodal benchmark that can truly evaluate LMM performance in Japanese. To create JMMMU, we first carefully analyzed the existing MMMU benchmark and examined its cultural dependencies. Then, for questions in culture-agnostic subjects, we employed native Japanese speakers who… See the full description on the dataset page: https://huggingface.co/datasets/JMMMU/JMMMU.
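A minimal sketch for exploring JMMMU, assuming it follows the same per-subject configuration layout as MMMU. Rather than guessing a config name, this enumerates the available configs; the "test" split name is an assumption, so consult the dataset card.

```python
from datasets import get_dataset_config_names, load_dataset

# Enumerate subject configs rather than hard-coding one.
configs = get_dataset_config_names("JMMMU/JMMMU")
print(configs)

# Load the first subject; the "test" split name is an assumption.
subject = load_dataset("JMMMU/JMMMU", configs[0], split="test")
print(subject)
```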
Gemini Ultra, developed by Google, has beaten OpenAI's GPT-4 on the MMMU benchmark; only in Business and Science did GPT-4 perform better. The overall quality of the two models is very similar, with Gemini holding only a narrow lead over its OpenAI-developed competitor.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
MMMU with difficulty level tags
This dataset extends the 🤗 MMMU val benchmark by introducing two additional tags: passrate_for_qwen2.5_vl_7b and difficulty_level_for_qwen2.5_vl_7b. Further details are available in our paper The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs.
🚀 Data Usage
```python
from datasets import load_dataset

dataset = load_dataset("JierunChen/MMMU_with_difficulty_level")
print(dataset)
```
📑… See the full description on the dataset page: https://huggingface.co/datasets/JierunChen/MMMU_with_difficulty_level.
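Once loaded, the two extra tags can be used to slice the benchmark by difficulty. A minimal sketch, assuming `passrate_for_qwen2.5_vl_7b` is stored as a float in [0, 1] and that the threshold 0.25 is an arbitrary illustrative cutoff; see the paper and dataset card for the exact schema.

```python
from datasets import load_dataset

dataset = load_dataset("JierunChen/MMMU_with_difficulty_level")
split = next(iter(dataset.values()))  # take whichever split is present

# Keep only items the reference model rarely solved, assuming the
# pass rate is stored as a float in [0, 1] (an assumption).
hard = split.filter(lambda ex: ex["passrate_for_qwen2.5_vl_7b"] < 0.25)
print(len(hard), "hard items out of", len(split))
```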
https://www.enterpriseappstoday.com/privacy-policy
Google Gemini Statistics: In 2023, Google unveiled its most powerful AI model to date. Google Gemini is billed as the world's most advanced AI, leaving GPT-4 behind. Google offers the model in three different sizes, each suited to tasks of different complexity. According to Google Gemini Statistics, these models can understand and help solve complex problems of almost any kind. Google has even said it will develop AI in a way that shows how helpful it can be in our daily routines. Well, we hope our next generation won't become fully dependent on such technologies; otherwise, we will lose our natural talents!
Editor's Choice
- Google Gemini can follow natural and engaging conversations.
- According to Google Gemini Statistics, Gemini Ultra scores 90.0% on the MMLU benchmark, which tests knowledge and problem-solving in subjects including history, physics, math, law, ethics, and medicine.
- If you ask Gemini what to do with a raw material, it can respond with ideas as text or images according to the given input.
- Gemini has outperformed GPT-4 in the majority of benchmark tests.
- According to the report, this LLM is unique because it can process multiple types of data at the same time, including video, images, computer code, and text.
- Google describes this development as "The Gemini Era," signaling how significant it considers AI to be for improving our daily lives.
- Google Gemini can talk like a real person.
- Gemini Ultra is the largest model and can solve extremely complex problems.
- Gemini models are trained on multilingual and multimodal datasets.
- On the MMMU benchmark, Gemini Ultra outperformed GPT-4V in Art and Design (74.2), Business (62.7), Health and Medicine (71.3), Humanities and Social Science (78.3), and Technology and Engineering (53.0).
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Evaluation Guide
This dataset is used to evaluate medical multimodal LLMs, as used in HuatuoGPT-Vision. It includes benchmarks such as VQA-RAD, SLAKE, PathVQA, PMC-VQA, OmniMedVQA, and MMMU-Medical-Tracks.
To get started:
Download the dataset and extract the images.zip file (see the download sketch below).
Find evaluation code on our GitHub: HuatuoGPT-Vision.
This open-source release aims to simplify the evaluation of medical multimodal capabilities in large models. Please cite the relevant benchmark… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/Medical_Multimodal_Evaluation_Data.
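As a companion to the steps above, a minimal download-and-extract sketch using `huggingface_hub`. The presence and location of images.zip inside the repository are assumptions based on the description; adjust paths after inspecting the repo.

```python
import zipfile
from pathlib import Path

from huggingface_hub import snapshot_download

# Fetch the evaluation data repository (a dataset repo, not a model).
local_dir = snapshot_download(
    repo_id="FreedomIntelligence/Medical_Multimodal_Evaluation_Data",
    repo_type="dataset",
)

# Extract images.zip next to the benchmark files; the archive name comes
# from the instructions above, and its exact path is an assumption.
zip_path = Path(local_dir) / "images.zip"
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(Path(local_dir) / "images")
```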