Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for CodeForces-CoTs
Dataset description
CodeForces-CoTs is a large-scale dataset for training reasoning models on competitive programming tasks. It consists of 10k CodeForces problems with up to five reasoning traces generated by DeepSeek R1. We did not filter the traces for correctness, but found that around 84% of the Python ones pass the public tests. The dataset consists of several subsets:
solutions: we prompt R1 to solve the problem and produce code.… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/codeforces-cots.
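The card does not say how the public-test pass rate was measured. As a rough illustration only, the sketch below shows one way such a check could be done for Python solutions: run each candidate program on a test's input and compare its output, token by token, with the expected output. The function names (passes_public_tests, pass_rate), the timeout, and the toy A+B problem are assumptions made for this example; they are not part of the dataset or of the authors' evaluation code.

```python
import subprocess
import sys


def passes_public_tests(source_code, tests, timeout_s=5.0):
    """Return True if the program prints the expected output for every test.

    `tests` is a list of (stdin_text, expected_stdout) pairs. This layout is
    an assumption for the sketch, not the dataset's actual schema.
    """
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", source_code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # treat a timeout as a failed test
        if result.returncode != 0:
            return False  # runtime error or non-zero exit
        # Compare whitespace-separated tokens so trailing newlines do not matter.
        if result.stdout.split() != expected.split():
            return False
    return True


def pass_rate(solutions, tests):
    """Fraction of candidate solutions that pass all of the given tests."""
    if not solutions:
        return 0.0
    passed = sum(passes_public_tests(code, tests) for code in solutions)
    return passed / len(solutions)


# Toy problem (hypothetical): read two integers and print their sum.
toy_tests = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]
good_solution = "a, b = map(int, input().split())\nprint(a + b)"
bad_solution = "a, b = map(int, input().split())\nprint(a - b)"

if __name__ == "__main__":
    print(pass_rate([good_solution, bad_solution], toy_tests))
```

Running untrusted generated code this way is unsafe outside a sandbox; a real evaluation would isolate each run and enforce the problem's own time and memory limits.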
Dataset Card for open-r1-codeforces-cot-kd-solutions-sample
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/winglian/open-r1-codeforces-cot-kd-solutions-sample/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info… See the full description on the dataset page: https://huggingface.co/datasets/winglian/open-r1-codeforces-cot-kd-solutions-sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for CodeForces
Dataset description
CodeForces is one of the most popular websites among competitive programmers, hosting regular contests where participants must solve challenging algorithmic optimization problems. The challenging nature of these problems makes them an interesting dataset to improve and test models’ code reasoning capabilities. This dataset includes more than 10k unique problems covering the very first contests all the way to 2025.… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/codeforces.
Astro-R1
Astro-R1 is designed to enhance AI models' reasoning capabilities by blending datasets for math, coding, and conversational tasks. The combined data has been shuffled. It is a mix of the following datasets (in different amounts): simplescaling/s1K-1.1, LucidityAI/QWQ-Distill, ServiceNow-AI/R1-Distill-SFT, open-r1/codeforces-cots, and simplescaling/s1K-claude-3-7-sonnet.