Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for NuminaMath-1.5-RL-Verifiable
Dataset Summary
NuminaMath-1.5-RL-Verifiable is a curated subset of the NuminaMath-1.5 dataset, specifically filtered to support reinforcement learning applications requiring verifiable outcomes. This collection consists of 131,063 math word problems from the original dataset that meet strict filtering criteria: all problems have definitive numerical answers, validated problem statements and solutions, and come from… See the full description on the dataset page: https://huggingface.co/datasets/nlile/NuminaMath-1.5-RL-Verifiable.
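Every problem in this subset has a definitive numerical answer, so an RL reward can be computed by comparing a model's final answer with the reference. The sketch below shows one way to do that. It is an illustration only. The `\boxed{...}` answer convention, the helper names `extract_boxed`, `to_number`, and `numeric_reward`, and the treatment of commas as thousands separators are all assumptions, not details from the dataset card.

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical convention: the final answer is the content of the last \boxed{...}.
# Simplification: nested braces inside the box are not supported.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def extract_boxed(text: str) -> Optional[str]:
    """Return the stripped contents of the last \\boxed{...}, or None if absent."""
    matches = BOXED.findall(text)
    return matches[-1].strip() if matches else None

def to_number(s: str) -> Optional[Fraction]:
    """Parse integers, decimals, or simple fractions like '3/4' exactly.

    Assumes commas are thousands separators (so '1,000' -> 1000).
    """
    try:
        return Fraction(s.replace(",", "").replace(" ", ""))
    except (ValueError, ZeroDivisionError):
        return None

def numeric_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the boxed answer equals the reference, else 0.0."""
    predicted = extract_boxed(completion)
    if predicted is None:
        return 0.0
    p, r = to_number(predicted), to_number(reference)
    if p is None or r is None:
        # Non-numeric answer: fall back to an exact string comparison.
        return float(predicted == reference.strip())
    # Fractions compare exactly, so 0.5 and 1/2 are treated as equal.
    return float(p == r)
```

Using `Fraction` instead of `float` avoids rounding surprises when a decimal answer is compared with a fractional reference.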
NuminaMath-1.5 Proofs Only
This is a filtered subset of the AI-MO/NuminaMath-1.5 dataset containing only proof problems.
Dataset Information
Total Problems: 110,998
Filter Criteria: question_type == 'proof'
Original Dataset: AI-MO/NuminaMath-1.5
License: CC BY-NC 4.0
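The filter criterion `question_type == 'proof'` is a row-level predicate. Below is a minimal sketch over in-memory rows. Only the `question_type` column comes from the card. The `problem` field and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List

def proofs_only(rows: Iterable[Dict]) -> List[Dict]:
    """Keep only rows whose question_type is 'proof' (the card's filter criterion)."""
    return [row for row in rows if row.get("question_type") == "proof"]

# Hypothetical sample rows; real NuminaMath-1.5 rows carry more columns.
sample = [
    {"problem": "Prove that sqrt(2) is irrational.", "question_type": "proof"},
    {"problem": "Compute 2 + 2.", "question_type": "math-word-problem"},
]
print(len(proofs_only(sample)))  # prints 1
```

With the Hugging Face `datasets` library, you could pass the same predicate to `Dataset.filter`, for example `ds.filter(lambda row: row["question_type"] == "proof")`.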
Usage
This dataset contains high-quality proof problems from various mathematical competitions and sources, formatted in a Chain-of-Thought (CoT) style.
Source Breakdown
The proof… See the full description on the dataset page: https://huggingface.co/datasets/nlile/NuminaMath-1.5-proofs-only.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
📃 Paper
This dataset contains EFAs inferred for a subset of NuminaMath_CoT, specifically the first 5,000 problems.
These EFAs were inferred by this model, and the prompts used for training are linked in the model card.
The dataset contains multiple EFA candidates for most of the first 5,000 problems in NuminaMath.
Each row in the dataset is described by the Row class below:
from pydantic import BaseModel
class ProblemVariant(BaseModel):
    """Synthetic problem variants constructed by…
See the full description on the dataset page: https://huggingface.co/datasets/codezakh/NuminaMath-1.5-EFA-Subset.
laolaorkk/collect-data-NuminaMath-1.5-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
jnanliu/NuminaMath-1.5-550k dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for NuminaMath CoT
Dataset Summary
Tool-integrated reasoning (TIR) plays a crucial role in this competition. However, collecting and annotating such data is both costly and time-consuming. To address this, we selected approximately 70k problems from the NuminaMath-CoT dataset, focusing on those with numerical outputs, most of which are integers. We then utilized a pipeline leveraging GPT-4 to generate TORA-like reasoning paths, executing the code and… See the full description on the dataset page: https://huggingface.co/datasets/AI-MO/NuminaMath-TIR.
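In tool-integrated reasoning, natural-language steps alternate with code whose output is fed back into the trace. The sketch below roughly illustrates the "execute the code" step. It is not the authors' pipeline. It pulls fenced Python blocks out of a generated trace, runs each one, and captures what it prints. The function name `run_code_blocks` is hypothetical. Running each block in a fresh namespace is a simplifying assumption. Note that `exec` provides no sandboxing.

```python
import contextlib
import io
import re
from typing import List

# Build the fence marker programmatically so this snippet can itself sit in a fenced block.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def run_code_blocks(trace: str) -> List[str]:
    """Execute each fenced Python block in a reasoning trace and return its captured stdout.

    Warning: exec() is not a sandbox; untrusted model output should run in isolation.
    """
    outputs = []
    for code in CODE_BLOCK.findall(trace):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            # Fresh globals per block (a simplification; a stateful pipeline
            # might share one namespace across blocks).
            exec(code, {})
        outputs.append(buffer.getvalue())
    return outputs

trace = (
    "Let us count the divisors of 36.\n"
    + FENCE + "python\n"
    + "print(sum(1 for d in range(1, 37) if 36 % d == 0))\n"
    + FENCE + "\n"
    + "So the answer is 9."
)
print(run_code_blocks(trace))  # prints ['9\n']
```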
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
daffapadantya/OpenR1-Math-220k-NuminaMath-1.5-Big-Math-RL-Verified-Cleaned dataset hosted on Hugging Face and contributed by the HF Datasets community
chenggong1995/NuminaMath-1.5-hard dataset hosted on Hugging Face and contributed by the HF Datasets community
wentingzhao/NuminaMath-1.5-RL-Verifiable_Qwen3-8B_zero_solve dataset hosted on Hugging Face and contributed by the HF Datasets community
Created by processing Numina [1] and Goedel-Pset-v1 [2] with LeanInteract [3].
[1] https://huggingface.co/datasets/AI-MO/NuminaMath-1.5
[2] https://huggingface.co/datasets/Goedel-LM/Goedel-Pset-v1
[3] https://github.com/augustepoiroux/LeanInteract
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
IMPORTANT NOTE
This dataset is a work in progress. Current translation progress: 24.85% (2024-09-18 01:32 KST). I'm taking a short break for personal reasons and will be back in a month.
TODO-LIST
Finish translation
Translation
I used gemini-1.5-pro-exp-0827. The prompt used for translation will be disclosed at the end.
Dataset Card for NuminaMath CoT
Dataset Summary
Tool-integrated reasoning (TIR) plays a crucial role in this… See the full description on the dataset page: https://huggingface.co/datasets/ChuGyouk/AI-MO-NuminaMath-TIR-korean-240918.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is the math reasoning dataset used to train the Notbad v1.0 Mistral 24B reasoning model. The reasoning data were sampled from an RL-based, self-improved Mistral-Small-24B-Instruct-2501 model. The questions were sourced from:
NuminaMath 1.5
GSM8k Training Set
MATH Training Set
You can try Notbad v1.0 Mistral 24B on chat.labml.ai.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
OpenR1-Math-220k
Dataset description
OpenR1-Math-220k is a large-scale dataset for mathematical reasoning. It consists of 220k math problems with two to four reasoning traces generated by DeepSeek R1 for problems from NuminaMath 1.5. The traces were verified using Math Verify for most samples and Llama-3.3-70B-Instruct as a judge for 12% of the samples, and each problem contains at least one reasoning trace with a correct answer. The dataset consists of two splits:… See the full description on the dataset page: https://huggingface.co/datasets/oieieio/OpenR1-Math-220k.
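According to the card, most traces were checked with Math Verify and about 12% were checked with Llama-3.3-70B-Instruct acting as a judge. Every problem keeps at least one correct trace. The sketch below shows a verification cascade and the "at least one correct trace" rule. It assumes the judge only handles cases the rule-based check cannot decide, and the card does not confirm that. `rule_check` and `judge_check` are hypothetical placeholders. They are not the Math Verify API or a real judge prompt.

```python
from typing import Callable, List, Optional

# rule_check returns True/False when it can decide, or None when it cannot parse the answer.
RuleCheck = Callable[[str, str], Optional[bool]]
JudgeCheck = Callable[[str, str], bool]

def is_correct(answer: str, gold: str, rule_check: RuleCheck, judge_check: JudgeCheck) -> bool:
    """Try the rule-based verifier first; fall back to the judge only if it is undecided."""
    verdict = rule_check(answer, gold)
    if verdict is None:
        return judge_check(answer, gold)
    return verdict

def keep_problem(answers: List[str], gold: str, rule_check: RuleCheck, judge_check: JudgeCheck) -> bool:
    """Keep a problem if at least one of its traces is verified correct."""
    return any(is_correct(a, gold, rule_check, judge_check) for a in answers)

# Toy stand-ins for illustration only.
def toy_rule(answer: str, gold: str) -> Optional[bool]:
    try:
        return float(answer) == float(gold)
    except ValueError:
        return None  # not numeric: let the judge decide

def toy_judge(answer: str, gold: str) -> bool:
    # A real judge would be an LLM call; here, a whitespace/case-insensitive match.
    return answer.replace(" ", "").lower() == gold.replace(" ", "").lower()

print(keep_problem(["3", "4"], "4", toy_rule, toy_judge))  # prints True
```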
concat: OpenMathInstruct-2, OpenMathReasoning, AceMath, OpenR1-Math, Numinamath-CoT, Numinamath 1.5, OpenThoughts2-1M, MetaMathQA, Maths-College
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
OpenR1-Math-Raw
Dataset description
OpenR1-Math-Raw is a large-scale dataset for mathematical reasoning. It consists of 516k math problems sourced from AI-MO/NuminaMath-1.5 with 1 to 8 reasoning traces generated by DeepSeek R1. The traces were verified using Math Verify and an LLM-as-a-judge verifier (Llama-3.3-70B-Instruct). The dataset contains:
516,499 problems
1,209,403 R1-generated solutions, with 2.3 solutions per problem on average
re-parsed answers… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenR1-Math-Raw.
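As a quick consistency check, the stated average follows from the two counts:

```latex
\frac{1{,}209{,}403\ \text{solutions}}{516{,}499\ \text{problems}} \approx 2.34\ \text{solutions per problem}
```

The card rounds this to 2.3.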
HuggingFaceFW/fineweb-edu (20%) (common knowledge)
devngho/the-stack-llm-annotations-v2 (25%) (code)
AI-MO/NuminaMath-1.5 (20%) (math)
HuggingFaceH4/ultrachat_200k (20%) (chat)
HuggingFaceFW/fineweb-2 (15%) (multilingual: [cmn_Hani, deu_Latn, jpn_Jpan, spa_Latn, fra_Latn, ita_Latn, por_Latn, nld_Latn, arb_Arab])
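The percentages above describe a data mixture that sums to 100%. One common way to build such a mixture is to pick each example's source in proportion to its weight. The sketch below does this with Python's `random.choices`. It rests on an assumption, because the listing does not say whether the percentages count documents, tokens, or some other unit. `sample_sources` is a hypothetical helper.

```python
import random
from collections import Counter
from typing import Dict, List

# Weights copied from the listing above (percent of the mixture).
MIXTURE: Dict[str, int] = {
    "HuggingFaceFW/fineweb-edu": 20,
    "devngho/the-stack-llm-annotations-v2": 25,
    "AI-MO/NuminaMath-1.5": 20,
    "HuggingFaceH4/ultrachat_200k": 20,
    "HuggingFaceFW/fineweb-2": 15,
}
assert sum(MIXTURE.values()) == 100

def sample_sources(n: int, seed: int = 0) -> List[str]:
    """Draw n source names with probability proportional to their mixture weight."""
    rng = random.Random(seed)
    names = list(MIXTURE)
    return rng.choices(names, weights=[MIXTURE[k] for k in names], k=n)

# Empirical shares should land close to the listed percentages.
counts = Counter(sample_sources(10_000))
for name, count in counts.most_common():
    print(f"{name}: {count / 10_000:.1%}")
```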
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
OpenR1-Math-220k
Dataset description
OpenR1-Math-220k is a large-scale dataset for mathematical reasoning. It consists of 220k math problems with two to four reasoning traces generated by DeepSeek R1 for problems from NuminaMath 1.5. The traces were verified using Math Verify for most samples and Llama-3.3-70B-Instruct as a judge for 12% of the samples, and each problem contains at least one reasoning trace with a correct answer. The dataset consists of two splits:… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k.
Turkish Math Dataset
This dataset is a subset of the AI-MO/NuminaMath-1.5 dataset translated into Turkish. The version we share contains approximately 186 thousand rows from the original dataset. Detailed information about the columns and other aspects of the dataset is available on the original dataset page. The gemini-2.0-flash model was used to translate the problems and solutions, and the translation quality of the dataset, especially for mathematical notation, at a high… See the full description on the dataset page: https://huggingface.co/datasets/ituperceptron/turkish-math-186k.