This is a filtered and metadata-enriched version of open-thoughts/OpenThoughts-114k. While the original dataset is a valuable resource containing DeepSeek-R1 outputs, it carries very little metadata (only two fields: system and conversations). In particular, it does not contain the original solution label, which means that we cannot verify the model answers.
What we did
- filtered the dataset for math content (math questions were prefixed by "Return your final response within…"); a filtering sketch follows below. See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math.
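A minimal sketch of this kind of prefix filtering, assuming the ShareGPT-style conversations schema (a list of {"from", "value"} turns); the visible part of the truncated prefix and the schema are assumptions, not taken from the script that actually produced the dataset:

```python
from datasets import load_dataset

# Only the part of the prefix visible in the card above is used here;
# the full string is truncated in the original description.
MATH_PREFIX = "Return your final response within"

ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

# Keep rows whose first conversation turn (the question) carries the math prefix.
# The {"from", "value"} turn layout is an assumption based on ShareGPT-style data.
math_ds = ds.filter(
    lambda row: row["conversations"][0]["value"].startswith(MATH_PREFIX)
)
print(f"Kept {len(math_ds)} of {len(ds)} samples")
```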
Dataset description
This dataset is the same as open-r1/OpenThoughts-114k-Code, decontaminated against the benchmark datasets.
The decontamination was run using the script in huggingface/open-r1:
```shell
python scripts/decontaminate.py \
    --dataset "open-r1/OpenThoughts-114k-Code" \
    -c
```

which reported:

```
...
Removed 2 samples from 'aime_2025'
Removed 28 samples from 'math_500'
Removed 3482 samples from 'lcb'
Initial size: 19890, Final size: 16378
```
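The script itself lives in the huggingface/open-r1 repository; the core idea is n-gram overlap against benchmark problems. A minimal sketch of that approach, where the n-gram size, the benchmark dataset, and the column names are illustrative assumptions rather than the script's actual configuration:

```python
from datasets import load_dataset


def ngrams(text: str, n: int = 8):
    """Yield word-level n-grams from lowercased text."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i : i + n])


# Collect every n-gram that appears in a benchmark problem.
# Dataset and column names below are assumptions for illustration.
bench = load_dataset("HuggingFaceH4/MATH-500", split="test")
bench_ngrams = {g for row in bench for g in ngrams(row["problem"])}

# Drop any training sample that shares at least one n-gram with a benchmark.
ds = load_dataset("open-r1/OpenThoughts-114k-Code", split="train")
clean = ds.filter(
    lambda row: not any(g in bench_ngrams for g in ngrams(row["problem"]))
)
print(f"Initial size: {len(ds)}, Final size: {len(clean)}")
```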
Related datasets hosted on Hugging Face and contributed by the HF Datasets community:

- justinj92/OpenThoughts-GRPO
- sandeshghimire/openthoughts-dataset-sj-ghimmire
- trungtvu/open-thoughts-science
- shivikai/OpenThoughts-114k
- andreuka18/OpenThoughts-10k-DeepSeek-R1
- chansung/openthoughts-coding-llama-factory
- dome015/OpenThoughts-1k-Sampled
- reinhardh/open-thoughts-subset
- gadkins/open-thoughts-verified-mix-dry-run
mlfoundations-dev/openthoughts-114k-no-special-template_eval_03-11-25_05-44-46_f912
Precomputed model outputs for evaluation.
Evaluation Results
GPQADiamond
Average Accuracy: 31.31% ± 4.97% (Number of Runs: 3)
| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1   | 24.24%   | 48               | 198             |
| 2   | 26.26%   | 52               | 198             |
| 3   | 43.43%   | 86               | 198             |
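For reference, the reported ± value is consistent with the standard error of the mean over the three runs (population standard deviation divided by √3), which this quick check reproduces:

```python
import statistics

accs = [24.24, 26.26, 43.43]  # per-run accuracies from the table above

mean = statistics.fmean(accs)
stderr = statistics.pstdev(accs) / len(accs) ** 0.5
print(f"Average Accuracy: {mean:.2f}% ± {stderr:.2f}%")  # -> 31.31% ± 4.97%
```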
Further related datasets hosted on Hugging Face and contributed by the HF Datasets community:

- andreuka18/DeepSeek-R1-Distill-Llama-8B-OpenThoughts-114k-tokenized
- Seerkfang/OpenThoughts-2048-98K
- O2iginal/open-thoughts-code-dry-run
- mlfoundations-dev/openthoughts-114k-no-special-template
- SmallDoge/OpenThoughts-25k
- mlfoundations-dev/open-thoughts-unverified-mix-claude
- PJMixers-Dev/OpenThoughts-114k-Code_decontaminated-4k-think-2k-response-filtered-ShareGPT
- trungtvu/open-thoughts-verified-mix