This is a filtered and metadata-enriched version of open-thoughts/OpenThoughts-114k. While the original dataset is a valuable resource containing DeepSeek-R1 outputs, it carries very little metadata (only two fields: system and conversations). In particular, it does not contain the original solution label, which means that we cannot verify the model answers.
What we did
- filtered the dataset for math content (math questions were prefixed by "Return your final response within…"); a filtering sketch follows below. See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math.
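A minimal sketch of this kind of prefix filtering, assuming the ShareGPT-style conversations schema (a list of {"from", "value"} turns); the visible part of the truncated prefix and the schema are assumptions, not taken from the script that actually produced the dataset:

```python
from datasets import load_dataset

# Only the part of the prefix visible in the card above is used here;
# the full string is truncated in the original description.
MATH_PREFIX = "Return your final response within"

ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

# Keep rows whose first conversation turn (the question) carries the math prefix.
# The {"from", "value"} turn layout is an assumption based on ShareGPT-style data.
math_ds = ds.filter(
    lambda row: row["conversations"][0]["value"].startswith(MATH_PREFIX)
)
print(f"Kept {len(math_ds)} of {len(ds)} samples")
```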
Dataset description
This dataset is the same as open-r1/OpenThoughts-114k-Code, decontaminated against the benchmark datasets.
The decontamination was run using the script in huggingface/open-r1:
```shell
python scripts/decontaminate.py \
    --dataset "open-r1/OpenThoughts-114k-Code" \
    -c
```

which reported:

```
...
Removed 2 samples from 'aime_2025'
Removed 28 samples from 'math_500'
Removed 3482 samples from 'lcb'
Initial size: 19890, Final size: 16378
```
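The script itself lives in the huggingface/open-r1 repository; the core idea is n-gram overlap against benchmark problems. A minimal sketch of that approach, where the n-gram size, the benchmark dataset, and the column names are illustrative assumptions rather than the script's actual configuration:

```python
from datasets import load_dataset


def ngrams(text: str, n: int = 8):
    """Yield word-level n-grams from lowercased text."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i : i + n])


# Collect every n-gram that appears in a benchmark problem.
# Dataset and column names below are assumptions for illustration.
bench = load_dataset("HuggingFaceH4/MATH-500", split="test")
bench_ngrams = {g for row in bench for g in ngrams(row["problem"])}

# Drop any training sample that shares at least one n-gram with a benchmark.
ds = load_dataset("open-r1/OpenThoughts-114k-Code", split="train")
clean = ds.filter(
    lambda row: not any(g in bench_ngrams for g in ngrams(row["problem"]))
)
print(f"Initial size: {len(ds)}, Final size: {len(clean)}")
```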
Related datasets hosted on Hugging Face and contributed by the HF Datasets community:

- justinj92/OpenThoughts-GRPO
- sandeshghimire/openthoughts-dataset-sj-ghimmire
- trungtvu/open-thoughts-science
- shivikai/OpenThoughts-114k
- andreuka18/OpenThoughts-10k-DeepSeek-R1
- chansung/openthoughts-coding-llama-factory
- dome015/OpenThoughts-1k-Sampled
- reinhardh/open-thoughts-subset
- gadkins/open-thoughts-verified-mix-dry-run
mlfoundations-dev/openthoughts-114k-no-special-template_eval_03-11-25_05-44-46_f912
Precomputed model outputs for evaluation.
Evaluation Results
GPQADiamond
Average Accuracy: 31.31% ± 4.97% (Number of Runs: 3)
| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1   | 24.24%   | 48               | 198             |
| 2   | 26.26%   | 52               | 198             |
| 3   | 43.43%   | 86               | 198             |
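For reference, the reported ± value is consistent with the standard error of the mean over the three runs (population standard deviation divided by √3), which this quick check reproduces:

```python
import statistics

accs = [24.24, 26.26, 43.43]  # per-run accuracies from the table above

mean = statistics.fmean(accs)
stderr = statistics.pstdev(accs) / len(accs) ** 0.5
print(f"Average Accuracy: {mean:.2f}% ± {stderr:.2f}%")  # -> 31.31% ± 4.97%
```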
Further related datasets hosted on Hugging Face and contributed by the HF Datasets community:

- andreuka18/DeepSeek-R1-Distill-Llama-8B-OpenThoughts-114k-tokenized
- Seerkfang/OpenThoughts-2048-98K
- O2iginal/open-thoughts-code-dry-run
- mlfoundations-dev/openthoughts-114k-no-special-template
- SmallDoge/OpenThoughts-25k
- mlfoundations-dev/open-thoughts-unverified-mix-claude
- PJMixers-Dev/OpenThoughts-114k-Code_decontaminated-4k-think-2k-response-filtered-ShareGPT
- trungtvu/open-thoughts-verified-mix