MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
MMLU-Pro Dataset
The MMLU-Pro dataset is a more robust and challenging massive multi-task understanding dataset, tailored to more rigorously benchmark large language models' capabilities. It contains 12K complex questions across various disciplines. | GitHub | 🏆 Leaderboard | 📖 Paper |
🚀 What's New
[2025.04.06] We corrected 15 answers in the medical domain based on the recommendations of medical professionals, thanks to Dr. Robert (Bob) Hoyt and the subspecialists… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro.
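A minimal sketch of the rough shape of an MMLU-Pro record and how an answer letter maps onto the (up to 10) options. The field names follow the dataset card; treat them, and the example record, as illustrative assumptions rather than a spec.

```python
# Sketch of an MMLU-Pro-style record: a question, up to 10 options,
# and the correct answer as both a letter and a 0-based index.
# Field names are assumed from the dataset card, not guaranteed.
from string import ascii_uppercase

record = {
    "question": "Which gas makes up most of Earth's atmosphere?",
    "options": ["Oxygen", "Nitrogen", "Carbon dioxide", "Argon",
                "Helium", "Neon", "Methane", "Hydrogen", "Ozone", "Krypton"],
    "answer": "B",          # letter of the correct option
    "answer_index": 1,      # 0-based index into options
    "category": "chemistry",
}

def letter_to_index(letter: str) -> int:
    """Map an answer letter ('A'..'J') to a 0-based option index."""
    return ascii_uppercase.index(letter)

# The letter and index forms of the answer should agree.
assert letter_to_index(record["answer"]) == record["answer_index"]
print(letter_to_index("J"))  # the tenth option -> 9
```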
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A clone published to ensure reproducibility of evaluation scores and to host the SB Intuitions corrected version. Source: TIGER-Lab/MMLU-Pro on Hugging Face
MMLU-Pro
The MMLU-Pro dataset is a more robust and challenging massive multi-task understanding dataset, tailored to more rigorously benchmark large language models' capabilities. It contains 12K complex questions across various disciplines.
Licensing Information
MIT
Citation Information
@misc{wang2024mmlupro, title={MMLU-Pro: A More Robust and Challenging Multi-Task… See the full description on the dataset page: https://huggingface.co/datasets/sbintuitions/MMLU-Pro.
jaypyon/MMLU-Pro dataset hosted on Hugging Face and contributed by the HF Datasets community
Comparison of benchmark scores, independently conducted by Artificial Analysis, by model
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
MMLU-Pro-ita Dataset Introduction
This is an Italian translation of MMLU-Pro, a more robust and challenging massive multi-task understanding dataset tailored to more rigorously benchmark large language models' capabilities. This dataset contains 12K complex questions across various disciplines.
1. What's new about MMLU-Pro
Compared to the original MMLU, there are three major differences:
The original MMLU dataset contains only 4 options per question; MMLU-Pro increases this to 10… See the full description on the dataset page: https://huggingface.co/datasets/efederici/MMLU-Pro-ita.
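One reason widening 4 options to 10 matters is the random-guess baseline: a model guessing blindly drops from 25% to 10% expected accuracy, so score inflation from lucky guessing shrinks accordingly. A quick illustration:

```python
# Random-guess accuracy on a multiple-choice benchmark is simply
# 1 / (number of options), assuming one correct option per question.
def random_guess_accuracy(num_options: int) -> float:
    return 1.0 / num_options

print(f"MMLU (4 options):      {random_guess_accuracy(4):.0%}")   # 25%
print(f"MMLU-Pro (10 options): {random_guess_accuracy(10):.0%}")  # 10%
```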
Artificial intelligence models continue to push the boundaries of language understanding and generation, with DeepSeek-R1 leading the pack in 2025 with an impressive ** percent accuracy rate on the AI MMLU benchmark. This achievement highlights the rapid progress in AI capabilities, as all major models now demonstrate success rates exceeding ** percent, indicating a significant leap in machine comprehension across various domains.

Multilingual capabilities
The AI landscape is not just about general language understanding. In 2024, the Artificial Analysis multilingual index ranked AI models by their ability to handle multiple languages, with o1 leading at ** percent. Testing covers Spanish, Bengali, German, Japanese, English, Chinese, Swahili, and French.

Challenging exams
This multilingual proficiency is further tested by Humanity's Last Exam (HLE), an exceptionally tough evaluation consisting of ***** challenging questions across numerous subjects. On this rigorous test, o1 again emerged as the top performer with an *** percent score, followed by Gemini *** Flash at *** percent, showcasing the current limits of AI in tackling highly complex, multidisciplinary problems.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We investigated DeepSeek R1's ability to diagnose 162 medical scenarios from the MMLU-Pro question-and-answer dataset
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
leafspark/MMLU-Pro-Results dataset hosted on Hugging Face and contributed by the HF Datasets community
MMLU-Pro-NoMath
MMLU-Pro-NoMath and MMLU-Pro-NoMath-Sml are subsets of MMLU-Pro with questions requiring multi-step calculation removed (43% of the original test set). We used claude-3.5-sonnet as the classifier. Questions were capped to an upper length limit to make logprobs evals faster and less likely to OOM. It's fast: 20 mins for NoMath and 7 mins for NoMath-Sml to evaluate gemma-2-9b using the Eleuther harness.
Contents
Why do this? NoMath Subset Details What… See the full description on the dataset page: https://huggingface.co/datasets/sam-paech/mmlu-pro-nomath-sml.
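The NoMath-style filtering described above can be sketched as two simple predicates: drop questions the classifier flagged as requiring multi-step calculation, and cap question length. The `needs_math` labels and the character cap below are hypothetical stand-ins for the claude-3.5-sonnet classifications and the actual limit mentioned in the card.

```python
# Hedged sketch of subset construction: keep only questions that are
# (a) not flagged as multi-step calculation and (b) under a length cap.
MAX_CHARS = 2000  # illustrative cap, not the dataset's actual limit

questions = [
    {"question": "Compute the integral of x^2 ...", "needs_math": True},
    {"question": "Which organ produces insulin?", "needs_math": False},
    {"question": "Q" * 5000, "needs_math": False},  # over the cap
]

nomath = [q for q in questions
          if not q["needs_math"] and len(q["question"]) <= MAX_CHARS]
print(len(nomath))  # 1 question survives both filters
```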
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
bengaliAI/MMLU-PRO dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
An up-to-date leaderboard of large language model (LLM) performance on the MMLU-Pro benchmark, including each model's score, releasing organization, release date, and other data.
MMLU (Massive Multitask Language Understanding) is a benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to how we evaluate humans. The benchmark covers 57 subjects across STEM, the humanities, the social sciences, and more. It ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem-solving ability. Subjects range from traditional areas, such as mathematics and history, to more specialized areas like law and ethics. The granularity and breadth of the subjects make the benchmark ideal for identifying a model's blind spots.
dododododo/MMLU-Pro-sample dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
MMLU-Pro json
This is a reupload of MMLU-Pro in JSON format. Please refer to the original dataset for details.
Comparison of the Artificial Analysis Intelligence Index, which incorporates 7 evaluations (MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, MATH-500), by model
guanning-ai/mmlu-pro dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for MMLU Pro with education levels
MMLU Pro dataset with education levels
Dataset Details
Dataset Description
A popular human-like complexity metric is the education level appropriate for a question. To obtain it for the MMLU Pro dataset, we ask a large LLM (Mistral 123B) to act as a judge and return its estimate. Next, we query the large LLM again to estimate the quality of the previous assessment from 1 to 10, following the practice introduced… See the full description on the dataset page: https://huggingface.co/datasets/LabARSS/MMLU-Pro-education-level.
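The two-pass judge flow described above can be sketched as follows. `ask_judge` stands in for a call to the large LLM (Mistral 123B in the card); here it is stubbed with canned replies for illustration, and the prompt wording is hypothetical, not the authors' actual prompts.

```python
# Hedged sketch: pass 1 asks the judge for an education level,
# pass 2 asks it to rate its own assessment on a 1-10 scale.
def ask_judge(prompt: str) -> str:
    # Stub replacing a real LLM call; returns canned answers.
    canned = {"level": "graduate", "quality": "8"}
    return canned["quality" if "1 to 10" in prompt else "level"]

question = "Which gas makes up most of Earth's atmosphere?"
level = ask_judge(f"What education level suits this question? {question}")
quality = int(ask_judge(
    f"Rate the previous assessment from 1 to 10. Level given: {level}"))
assert 1 <= quality <= 10
print(level, quality)  # graduate 8
```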
Comparison of the Artificial Analysis Intelligence Index, which incorporates 7 evaluations (MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, MATH-500), by model
dvilasuero/mmlu-pro-prep-eval-Llama-3.1-8B-Instruct-thinking dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for MMLU Pro with reasoning scores
MMLU Pro dataset with reasoning scores
Dataset Details
Dataset Description
As discovered in "When an LLM is apprehensive about its answers -- and when its uncertainty is justified", the amount of reasoning required to answer a question (a.k.a. the reasoning score) is a better metric for estimating model uncertainty than the more human-like education level. Following the footsteps outlined in that paper, we ask a… See the full description on the dataset page: https://huggingface.co/datasets/LabARSS/MMLU-Pro-reasoning-score.