Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for AIMO Validation AMC
All 83 problems come from AMC12 2022 and AMC12 2023 and were extracted from the AOPS wiki page https://artofproblemsolving.com/wiki/index.php/AMC_12_Problems_and_Solutions This dataset serves as an internal validation set during our participation in the AIMO progress prize competition. Data after 2021 is used to avoid potential overlap with the MATH training set. Here are the different columns in the dataset: problem: the modified problem statement… See the full description on the dataset page: https://huggingface.co/datasets/AI-MO/aimo-validation-amc.
zwhe99/amc23 dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
chandrabhuma/AMC dataset hosted on Hugging Face and contributed by the HF Datasets community
yiboowang/aimo-validation-amc-repeated3 dataset hosted on Hugging Face and contributed by the HF Datasets community
felixZzz/math_eval_suite-amc dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Easy2Hard-Bench
Dataset Description
Easy2Hard-Bench is a benchmark consisting of 6 datasets in different domains (mathematics, programming, chess, and various reasoning tasks). The problems from each dataset are labeled with continuous-valued difficulty levels.
Columns: Topic | Source | Statistics Used to Infer Difficulty | Source Type | Estimation Method
E2H-AMC: Math Competitions | AMC, AIME, HMMT | Item difficulties | Human | IRT
E2H-Codeforces Competitive Programming… See the full description on the dataset page: https://huggingface.co/datasets/furonghuang-lab/Easy2Hard-Bench.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for ToM-in-AMC
The dataset consists of ∼1,000 parsed movie scripts from IMSDb, each corresponding to a character understanding task.
Citation
BibTeX: @inproceedings{yu2024few, title = {Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind}, author = {Yu, Mo and Wang, Qiujing and Zhang, Shunchi and Sang, Yisi and Pu, Kangsheng and Wei, Zekai and Wang, Han and Xu, Liyan and Li, Jing and Yu, Yue and Zhou, Jie}… See the full description on the dataset page: https://huggingface.co/datasets/ShunchiZhang/ToM-in-AMC.
All problems are copyrighted by the Mathematical Association of America's American Mathematics Competitions. Source:
https://artofproblemsolving.com/wiki/index.php/2024_AMC_12A_Problems https://artofproblemsolving.com/wiki/index.php/2024_AMC_12B_Problems
Removed problems with figures:
12A: problems 14, 18, 22
12B: problems 7, 19
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Mathematics Aptitude Test of Heuristics, hard subset (MATH-Hard) dataset
Dataset Summary
The Mathematics Aptitude Test of Heuristics (MATH) dataset consists of problems from mathematics competitions, including the AMC 10, AMC 12, AIME, and more. Each problem in MATH has a full step-by-step solution, which can be used to teach models to generate answer derivations and explanations. For MATH-Hard, only the hardest questions were kept (Level 5).… See the full description on the dataset page: https://huggingface.co/datasets/lighteval/MATH-Hard.
AMC/AIME Mathematics Problem and Solution Dataset
Dataset Details
Dataset Name: AMC/AIME Mathematics Problem and Solution Dataset
Version: 1.0
Release Date: 2024-06-1
Authors: Kevin Amiri
Intended Use
Primary Use: The dataset is created and intended for research and an AI Mathematical Olympiad Kaggle competition.
Intended Users: Researchers in AI & mathematics or science.
Dataset Composition
Number of Examples: 20,300 problems and solution sets… See the full description on the dataset page: https://huggingface.co/datasets/kevin009/olympiad-math-contest-llama3-20k.
Dataset Sources: AMC 8, AMC 10, AMC 12. Both problems and solutions were scraped from their original URLs, preserving LaTeX format.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Data
Our training dataset consists of approximately 40,000 unique mathematics problem-answer pairs compiled from:
AIME (American Invitational Mathematics Examination) problems (1984-2023)
AMC (American Mathematics Competition) problems (prior to 2023)
Omni-MATH dataset
Still dataset
Format
Each row in the JSON dataset contains:
problem: The mathematical question text, formatted with LaTeX notation.
solution: Official solution to the problem, including LaTeX formatting… See the full description on the dataset page: https://huggingface.co/datasets/agentica-org/DeepScaleR-Preview-Dataset.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
MATH Dataset
The Mathematics Aptitude Test of Heuristics (MATH) dataset consists of problems from mathematics competitions, including the AMC 10, AMC 12, AIME, and more. Each problem in MATH has a full step-by-step solution, which can be used to teach models to generate answer derivations and explanations. This is a converted version of the hendrycks/competition_math originally created by Hendrycks et al. The dataset has been converted to parquet format for easier loading and usage.… See the full description on the dataset page: https://huggingface.co/datasets/Maxwell-Jia/MATH.
Description
Turkish-translated version of barandinho/amc_2k_answers (the solution column is dropped and not translated).
Dataset Curation Process
AMC 8, 10 and 12 problems were scraped (Acknowledgment: zypchn).
The scraped data was then deduplicated with a basic Jaccard similarity method.
Then the answer column was created from the scraped solutions.
Finally, the Turkish translation was done via claude-3-7-sonnet-20250219 batch processing (cost us approx. $5).
Note: we discarded rows that include string… See the full description on the dataset page: https://huggingface.co/datasets/barandinho/amc_turkish.
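The deduplication step mentioned above can be sketched roughly as follows. This is only an illustration of Jaccard-based near-duplicate filtering, not the dataset authors' actual code: the whitespace tokenization, the 0.8 threshold, and the function names `tokenize`, `jaccard`, and `deduplicate` are all assumptions.

```python
def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very basic tokenizer)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two token sets."""
    if not a and not b:
        return 1.0  # treat two empty sets as identical
    return len(a & b) / len(a | b)


def deduplicate(problems, threshold=0.8):
    """Keep a problem only if it is below `threshold` similarity to every kept one.

    The threshold value is a hypothetical choice for illustration.
    """
    kept = []
    kept_tokens = []
    for problem in problems:
        tokens = tokenize(problem)
        if all(jaccard(tokens, seen) < threshold for seen in kept_tokens):
            kept.append(problem)
            kept_tokens.append(tokens)
    return kept


problems = [
    "What is 2 + 2 ?",
    "what is 2 + 2 ?",
    "Find the area of a circle with radius 3 .",
]
unique_problems = deduplicate(problems)
```

This pairwise comparison is quadratic in the number of problems, which is usually acceptable for a few thousand rows; larger collections would typically call for an approximate method such as MinHash.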
MiniF2F is a formal mathematics benchmark (translated across multiple formal systems) consisting of exercise statements from olympiads (AMC, AIME, IMO) as well as high-school and undergraduate maths classes. This dataset contains formal statements in Isabelle. Each statement is paired with an informal statement and an informal proof, as described in Draft, Sketch, Prove [Jiang et al 2023]. The problems in this dataset use the most recent facebookresearch/miniF2F commit on July 3, 2023.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Difficulty Estimation on MATH
We annotate the entire MATH dataset with a difficulty score based on the performance of the Qwen 2.5-MATH-7B model. This provides an adaptive signal for curriculum construction and model evaluation. The Mathematics Aptitude Test of Heuristics (MATH) dataset consists of problems from mathematics competitions, including the AMC 10, AMC 12, AIME, and more. Each problem in MATH has a full step-by-step solution, which can be used to teach models to generate… See the full description on the dataset page: https://huggingface.co/datasets/lime-nlp/MATH_Difficulty.
Test Dataset Compilation For Self-Rewarding Training
This is our test dataset compilation for our paper, "Can Large Reasoning Models Self-Train?" Please see our project page for more information about our project. In our paper, we use the three following datasets for evaluation:
AIME 2024
AIME 2025
AMC
Moreover, we also subsample 1% of the DAPO dataset for additional validation purposes. In this dataset, we compile all 4 of them together. This, together with our data preprocessing… See the full description on the dataset page: https://huggingface.co/datasets/ftajwar/srt_test_dataset.
OREAL-RL-Prompts
Introduction
This repository contains the prompts used in the RL training phase of the OREAL project. The prompts are collected from MATH, Numina, and historical AMC/AIME (2024 is excluded). The pass rate of each prompt is calculated from 16 inference runs with OREAL-7B-SFT.
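As a rough illustration of what a per-prompt pass rate means, the sketch below computes the fraction of sampled answers that match a reference answer. It assumes exact string matching after whitespace stripping; the answer checker actually used by the OREAL project is not described here, and the function name `pass_rate` is hypothetical.

```python
def pass_rate(sampled_answers, reference_answer):
    """Fraction of sampled answers equal to the reference.

    Exact string match after stripping whitespace is an assumption made
    for illustration; a real pipeline would likely use a math-aware verifier.
    """
    if not sampled_answers:
        raise ValueError("at least one sampled answer is required")
    target = reference_answer.strip()
    correct = sum(1 for answer in sampled_answers if answer.strip() == target)
    return correct / len(sampled_answers)


# 16 hypothetical samples for one prompt, 4 of which are correct.
samples = ["42"] * 4 + ["41"] * 12
rate = pass_rate(samples, "42")
```

A prompt that is answered correctly in 4 of 16 samples would get a pass rate of 0.25 under this definition.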