Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for MMLU-Redux
[!TIP] Please consider using MMLU-Redux-2.0, which contains all 57 MMLU subjects.
MMLU-Redux is a subset of 3,000 manually re-annotated questions across 30 MMLU subjects.
News
[2025.02.08] We corrected one annotation in the High School Mathematics subset, as noted in the PlatinumBench paper.
[2025.01.23] MMLU-Redux was accepted to NAACL 2025!
Dataset Details
Dataset Description
Each data point in MMLU-Redux contains… See the full description on the dataset page: https://huggingface.co/datasets/edinburgh-dawg/mmlu-redux.
aqweteddy/MMLU-Redux-MCQ dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/cc/
MMLU REDUX FILTERED
I postprocessed the "edinburgh-dawg/mmlu-redux" dataset into macro-categories, keeping in each category only the samples whose type_error label is ok. The macro-category division is:
Medicine and Health: [anatomy, clinical_knowledge, college_medicine, human_aging]
Science: [college_chemistry, college_physics, high_school_chemistry, high_school_physics, virology, conceptual_physics, astronomy]
Mathematics: [college_mathematics, high_school_mathematics, high_school_statistics… See the full description on the dataset page: https://huggingface.co/datasets/miguelamendez/mmlu_redux_filtered.
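The filtering described in this entry could look roughly like the sketch below. It is not the dataset author's code: the record layout, the `subject` field name, and the `group_ok_samples` helper are assumptions made for illustration, and `type_error` is used as the label field only because the description names it that way. Only the two macro-categories listed in full are included, since the Mathematics list is truncated in the description.

```python
from collections import defaultdict

# Macro-categories copied from the description above. The Mathematics list is
# truncated there, so it is deliberately left out of this sketch.
MACRO_CATEGORIES = {
    "Medicine and Health": [
        "anatomy", "clinical_knowledge", "college_medicine", "human_aging",
    ],
    "Science": [
        "college_chemistry", "college_physics", "high_school_chemistry",
        "high_school_physics", "virology", "conceptual_physics", "astronomy",
    ],
}


def group_ok_samples(samples, macro_categories=MACRO_CATEGORIES):
    """Group samples by macro-category, keeping only those labelled "ok".

    Each sample is assumed to be a dict with a hypothetical "subject" key
    and a "type_error" key (the field name used in the description).
    """
    # Invert the mapping so each subject points at its macro-category.
    subject_to_macro = {
        subject: macro
        for macro, subjects in macro_categories.items()
        for subject in subjects
    }
    grouped = defaultdict(list)
    for sample in samples:
        macro = subject_to_macro.get(sample["subject"])
        # Skip subjects outside the known categories and flagged samples.
        if macro is not None and sample["type_error"] == "ok":
            grouped[macro].append(sample)
    return dict(grouped)


# Toy records, invented purely for illustration.
toy_samples = [
    {"subject": "anatomy", "type_error": "ok", "question": "Q1"},
    {"subject": "anatomy", "type_error": "wrong_groundtruth", "question": "Q2"},
    {"subject": "virology", "type_error": "ok", "question": "Q3"},
    {"subject": "philosophy", "type_error": "ok", "question": "Q4"},
]
grouped = group_ok_samples(toy_samples)
```

With the toy records, only Q1 and Q3 survive: Q2 is dropped because its label is not ok, and Q4 because its subject is not in either listed macro-category.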
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Model Name
MMLU-Pro-Plus Baseline Drop MMLU-Pro Baseline Drop Added Exp MMLU Pro Plus Added MMLU-redux 2.0 Baseline Drop AQUA-RAT Baseline Drop
CohereLabs/c4ai-command-a-03-2025 111B ✅ (single inference) ✅ done ✅ (HF naive batch) ✅ done ✅ done - - -
google/gemma-3-12b-it 12B ✅ (HF naive batch) ✅ done ✅ (HF naive batch) ✅ done ✅ done - - -
meta-llama/Llama-4-Scout-17B-16E 17B ✅ (HF naive batch) ✅ done ✅ (HF naive batch) ✅ done ✅ done - - -
Qwen/Qwen3-4B 4B… See the full description on the dataset page: https://huggingface.co/datasets/sleeping-ai/Judgement-baseline.