Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
OpenR1-Math-220k
Dataset description
OpenR1-Math-220k is a large-scale dataset for mathematical reasoning. It consists of 220k math problems with two to four reasoning traces generated by DeepSeek R1 for problems from NuminaMath 1.5. The traces were verified using Math Verify for most samples and Llama-3.3-70B-Instruct as a judge for 12% of the samples, and each problem contains at least one reasoning trace with a correct answer. The dataset consists of two splits:… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k.
This is a filtered and metadata-enriched version of open-thoughts/OpenThoughts-114k. While the original dataset is a valuable resource containing DeepSeek-R1 outputs, it has very little metadata (only two fields: system and conversations). It does not contain, for instance, the original solution label, which means we cannot verify the model answers.
What we did
- filtered the dataset for math content (math questions were prefixed by "Return your final response within… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
OpenR1-Math-Raw
Dataset description
OpenR1-Math-Raw is a large-scale dataset for mathematical reasoning. It consists of 516k math problems sourced from AI-MO/NuminaMath-1.5, with one to eight reasoning traces generated by DeepSeek R1. The traces were verified using Math Verify and an LLM-as-judge verifier (Llama-3.3-70B-Instruct). The dataset contains:
- 516,499 problems
- 1,209,403 R1-generated solutions (2.3 solutions per problem on average)
- re-parsed answers… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenR1-Math-Raw.
Dataset Card for Big-Math-RL-Verified-Processed
This is a processed version of SynthLabsAI/Big-Math-RL-Verified where we have applied the following filters:
- Removed samples where llama8b_solve_rate is None
- Removed samples that could not be parsed by math-verify (empty lists)
We have also created 5 additional subsets to indicate difficulty level, similar to the MATH dataset. To do so, we computed quintiles on the llama8b_solve_rate values and then filtered the dataset into the… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/Big-Math-RL-Verified-Processed.
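The quintile-based split described above can be sketched with the standard library alone. This is an illustrative reconstruction, not the actual processing script: only the field name `llama8b_solve_rate` comes from the dataset card; the toy records and bucket numbering are assumptions.

```python
from statistics import quantiles

def difficulty_level(rate, cut_points):
    """Map a solve rate to a bucket 1 (lowest rates) .. 5 (highest)."""
    level = 1
    for cut in cut_points:
        if rate > cut:
            level += 1
    return level

# Toy records standing in for Big-Math rows; only the field name
# `llama8b_solve_rate` comes from the dataset card.
records = [{"llama8b_solve_rate": r / 100} for r in range(0, 100, 5)]
rates = [rec["llama8b_solve_rate"] for rec in records]

# Four cut points at the 20/40/60/80th percentiles -> five quintile buckets.
cuts = quantiles(rates, n=5)

subsets = {lvl: [] for lvl in range(1, 6)}
for rec in records:
    subsets[difficulty_level(rec["llama8b_solve_rate"], cuts)].append(rec)
```

A low solve rate means a hard problem, so bucket 1 holds the hardest fifth of the data under this numbering.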
Dataset summary
Mixture-of-Thoughts is a curated dataset of 350k verified reasoning traces distilled from DeepSeek-R1. The dataset spans tasks in mathematics, coding, and science, and is designed to teach language models to reason step-by-step. It was used in the Open R1 project to train OpenR1-Distill-7B, an SFT model that replicates the reasoning capabilities of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B from the same base model. To load the dataset, run: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts.
HINT-lab/multimodal-open-r1-8k-verified dataset hosted on Hugging Face and contributed by the HF Datasets community
Creative Commons Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Dataset Card for CodeForces
Dataset description
CodeForces is one of the most popular websites among competitive programmers, hosting regular contests where participants must solve challenging algorithmic optimization problems. The challenging nature of these problems makes them an interesting dataset to improve and test models’ code reasoning capabilities. This dataset includes more than 10k unique problems covering the very first contests all the way to 2025.… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/codeforces.
open-r1/verifiable-coding-problems-python_decontaminated-tested-shuffled dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for DAPO-Math-17k-Processed
This is a processed version of BytedTsinghua-SIA/DAPO-Math-17k where we have:
- Deduplicated the prompts
- Reformatted the prompts and ground-truth answers to be compatible with TRL's GRPO trainer
We have also derived pure English and Chinese subsets. The full dataset processing logic can be found in create_dataset.py. If you find this dataset useful in your work, please cite the original source with: @misc{yu2025dapoopensourcellmreinforcement… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed.
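The two processing steps named above (deduplication and reshaping into the conversational `prompt` column that TRL's GRPO trainer consumes) might look roughly like the sketch below. The raw field names `prompt` and `solution` are assumptions for illustration, not the actual DAPO-Math-17k schema, and this is not the project's real `create_dataset.py`.

```python
def process(raw_records):
    """Drop exact duplicate prompts and wrap each prompt as a chat message list."""
    seen, out = set(), []
    for rec in raw_records:
        text = rec["prompt"].strip()
        if text in seen:  # deduplicate on the prompt string
            continue
        seen.add(text)
        out.append({
            # GRPO-style conversational format: a list of chat messages.
            "prompt": [{"role": "user", "content": text}],
            "solution": rec["solution"],
        })
    return out

raw = [
    {"prompt": "What is 2 + 2?", "solution": "4"},
    {"prompt": "What is 2 + 2?", "solution": "4"},  # duplicate, dropped
    {"prompt": "What is 3 * 3?", "solution": "9"},
]
processed = process(raw)
```

Keeping the prompt as a message list (rather than a bare string) lets the trainer apply the model's chat template before generation.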
Creative Commons Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Dataset Card for CodeForces-CoTs
Dataset description
CodeForces-CoTs is a large-scale dataset for training reasoning models on competitive programming tasks. It consists of 10k CodeForces problems with up to five reasoning traces generated by DeepSeek R1. We did not filter the traces for correctness, but found that around 84% of the Python ones pass the public tests. The dataset consists of several subsets:
solutions: we prompt R1 to solve the problem and produce code.… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/codeforces-cots.
open-r1/ioi-2024-model-solutions dataset hosted on Hugging Face and contributed by the HF Datasets community
LLMcompe-Team-Watanabe/open-r1-math dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
multimodal-open-r1-simple-filtered
Original dataset structure preserved, filtered by token length and image quality
Dataset Description
This dataset was processed using the data-preproc package for vision-language model training.
Processing Configuration
- Base Model: allenai/Molmo-7B-O-0924
- Tokenizer: allenai/Molmo-7B-O-0924
- Sequence Length: 8192
- Processing Type: Vision Language (VL)
Dataset Features
input_ids: Tokenized input sequences… See the full description on the dataset page: https://huggingface.co/datasets/penfever/multimodal-open-r1-simple-filtered.
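Fixed-length preprocessing of the `input_ids` feature typically means truncating long token sequences and right-padding short ones. The sketch below is a minimal illustration under assumptions: the sequence length 8192 matches the card, but the pad id and the token values are placeholders, not Molmo's real vocabulary or the data-preproc package's actual logic.

```python
SEQ_LEN = 8192  # from the processing configuration above
PAD_ID = 0      # placeholder pad token id

def pad_or_truncate(input_ids, seq_len=SEQ_LEN, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly seq_len long."""
    ids = input_ids[:seq_len]            # truncate anything too long
    attention_mask = [1] * len(ids)      # real tokens get mask 1
    pad = seq_len - len(ids)
    return ids + [pad_id] * pad, attention_mask + [0] * pad

ids, mask = pad_or_truncate(list(range(10)))
```

The attention mask lets the model ignore the padded positions during training.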
Creative Commons Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
IOI
The International Olympiad in Informatics (IOI) is one of five international science olympiads (if you are familiar with AIME: IOI is the programming equivalent of the IMO, the olympiad to which the very best AIME participants are invited) and tests a very select group of high school students (four per country) on complex algorithmic problems. The problems are extremely challenging, and the full test sets are available and released under a permissive (CC-BY) license. This means that… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/ioi-test-cases.
North2ICESea/multimodal-open-r1-8k-verified_and_geometry3k dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
ykarout/open-r1-sampled dataset hosted on Hugging Face and contributed by the HF Datasets community
meoconxinhxan/open-r1-math-220k-chatml-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
drproduck/open-r1-integral-answer dataset hosted on Hugging Face and contributed by the HF Datasets community
Ta1k1/Open-R1-Math_DESC dataset hosted on Hugging Face and contributed by the HF Datasets community
FUfu99/DeepSeek-R1-Zero-best_of_n-VLLM-Skywork-o1-Open-PRM-Qwen-2.5-7B-completions dataset hosted on Hugging Face and contributed by the HF Datasets community