MATH-500 test set with the remaining 12000 examples in train.
import datasets
from math_utils import last_boxed_only_string, remove_boxed
math = datasets.load_dataset('DigitalLearningGmbH/MATH-lighteval', 'default')
math500 = datasets.load_dataset('HuggingFaceH4/MATH-500')
def map_to_500(example): return { 'problem':… See the full description on the dataset page: https://huggingface.co/datasets/ricdomolm/MATH-500.
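The excerpt imports last_boxed_only_string and remove_boxed from a math_utils module that is not shown here. As a rough illustration only, the following sketch shows what helpers with those names plausibly do: find the last \boxed{...} span in a solution string and strip the wrapper. The bodies below are assumptions written for this listing, not the dataset author's code.

```python
def last_boxed_only_string(text):
    """Return the last '\\boxed{...}' substring of text, or None if absent.

    Hypothetical stand-in for the helper of the same name in math_utils.
    """
    start = text.rfind("\\boxed{")
    if start < 0:
        return None
    depth = 0
    # Walk forward from the opening brace, tracking nesting depth so that
    # inner braces such as \frac{1}{2} do not end the match early.
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # braces never balanced


def remove_boxed(boxed):
    """Strip the leading '\\boxed{' and the trailing '}' (assumed behavior)."""
    prefix = "\\boxed{"
    if not (boxed.startswith(prefix) and boxed.endswith("}")):
        raise ValueError("expected a \\boxed{...} string")
    return boxed[len(prefix):-1]


solution = "So the answer is $\\boxed{\\frac{1}{2}}$."
answer = remove_boxed(last_boxed_only_string(solution))
```

Under these assumptions, answer holds the bare LaTeX of the final answer, which is the kind of field a mapping function like map_to_500 could compare against.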
Comparison of Independently conducted by Artificial Analysis by Model
gudleifrr/MATH-500 dataset hosted on Hugging Face and contributed by the HF Datasets community
MATH-500 Multilingual Problem Set
A multilingual subset of OpenAI's MATH benchmark. Designed for testing math skills across languages, this dataset includes the same problems in English, French, Italian, Turkish, and Spanish.
Available Languages
English
French
Italian
Turkish
Spanish
Source & Attribution
Original Dataset: Sourced from HuggingFaceH4/MATH-500.
Quick Start
Load the dataset… See the full description on the dataset page: https://huggingface.co/datasets/bezir/MATH-500-multilingual.
ko-math-500
ko-math-500 is a Korean-translated subset of 500 representative problems from the widely used MATH (Mathematics Aptitude Test of Heuristics) dataset, designed to evaluate the mathematical reasoning abilities of large language models. The ko-math-500 subset is based on the standard evaluation set of 500 problems used in the 2023 paper Let's Verify Step by Step for model performance comparison. The original dataset is publicly available at HuggingFaceH4/MATH-500. The… See the full description on the dataset page: https://huggingface.co/datasets/davidkim205/ko-math-500.
appier-ai-research/MATH-500-translated dataset hosted on Hugging Face and contributed by the HF Datasets community
jacobmorrison/MATH-500-uppercase dataset hosted on Hugging Face and contributed by the HF Datasets community
Comparison of Represents the average of math benchmarks in the Artificial Analysis Intelligence Index (AIME 2024 & Math-500) by Model
In 2024, the Artificial Analysis math index ranked AI models by mathematical reasoning, using benchmarks such as AIME 2024 and MATH-500. o1, QwQ-32B, and DeepSeek R1 led the rankings, showing the highest proficiency in mathematical problem solving.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual math proficiency from 2011 to 2023 for Princeton HSD 500 School District vs. Illinois
jacobmorrison/MATH-500 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual math proficiency from 2010 to 2011 for The 500 Role Model Academy vs. Florida and Miami-Dade School District
Comparison of Artificial Analysis Intelligence Index incorporates 7 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, MATH-500 by Model
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The term common jump days used here only means that the two indices both have jumps on these days. The mean and standard deviation of are calculated conditional on The quantities of price variations shown are all scaled by 10000.
matsant01/omni-MATH-500 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual math proficiency from 2012 to 2023 for Cape Elizabeth Middle School vs. Maine and Cape Elizabeth School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual math proficiency from 2012 to 2023 for Yarmouth Elementary School vs. Maine and Yarmouth Schools School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks annual math proficiency from 2012 to 2023 for Frank H Harrison Middle School vs. Maine and Yarmouth Schools School District
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FDR is controlled at level and . The quantities of price variations shown are all scaled by 10000.
Youthquake123/top-3-MATH-500-questions dataset hosted on Hugging Face and contributed by the HF Datasets community