37 datasets found
  1.

    OpenThoughts-114k-math

    • huggingface.co
    Updated Jan 30, 2025
    Cite
    Open R1 (2025). OpenThoughts-114k-math [Dataset]. https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 30, 2025
    Dataset authored and provided by
    Open R1
    Description

    This is a filtered and metadata-enriched version of open-thoughts/OpenThoughts-114k. While the original dataset is a valuable resource containing DeepSeek-R1 outputs, it has very little metadata (only 2 fields: system and conversations). It does not contain, for instance, the original solution label, which means that we cannot verify the model answers.

    What we did

    filtered the dataset for math content (math questions were prefixed by "Return your final response within… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math.

  2.

    OpenThoughts-114k-Code_decontaminated

    • huggingface.co
    Updated Mar 21, 2025
    Cite
    Open R1 (2025). OpenThoughts-114k-Code_decontaminated [Dataset]. https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code_decontaminated
    Dataset updated
    Mar 21, 2025
    Dataset authored and provided by
    Open R1
    Description

    Dataset description

    This dataset is the same as open-r1/OpenThoughts-114k-Code, decontaminated against the benchmark datasets. The decontamination has been run using the script in huggingface/open-r1:

    python scripts/decontaminate.py \
        --dataset "open-r1/OpenThoughts-114k-Code" \
        -c ...

    Removed 2 samples from 'aime_2025'
    Removed 28 samples from 'math_500'
    Removed 3482 samples from 'lcb'
    Initial size: 19890, Final size: 16378
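
The removal counts in the log above can be checked against the reported sizes. The following is a minimal sketch that only re-does the arithmetic from the quoted figures; the variable names are illustrative and are not taken from scripts/decontaminate.py.

```python
# Figures copied from the decontamination log quoted above.
# Variable names are hypothetical; they do not come from the script itself.
initial_size = 19890
removed = {"aime_2025": 2, "math_500": 28, "lcb": 3482}
reported_final_size = 16378

# Total number of samples dropped across all benchmarks.
total_removed = sum(removed.values())

# Size we expect after removal, to compare with the reported final size.
final_size = initial_size - total_removed

print(f"removed {total_removed} of {initial_size} samples")
print(f"final size {final_size}, matches report: {final_size == reported_final_size}")
```

Nearly all removed samples come from 'lcb', so the other two benchmarks barely change the dataset size.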

  3.

    OpenThoughts-GRPO

    • huggingface.co
    Updated Feb 12, 2025
    Cite
    Justin J (2025). OpenThoughts-GRPO [Dataset]. https://huggingface.co/datasets/justinj92/OpenThoughts-GRPO
    Dataset updated
    Feb 12, 2025
    Authors
    Justin J
    Description

    justinj92/OpenThoughts-GRPO dataset hosted on Hugging Face and contributed by the HF Datasets community

  4.

    openthoughts-dataset-sj-ghimmire

    • huggingface.co
    Updated Mar 2, 2025
    Cite
    Sandesh Ghimire (2025). openthoughts-dataset-sj-ghimmire [Dataset]. https://huggingface.co/datasets/sandeshghimire/openthoughts-dataset-sj-ghimmire
    Dataset updated
    Mar 2, 2025
    Authors
    Sandesh Ghimire
    Description

    sandeshghimire/openthoughts-dataset-sj-ghimmire dataset hosted on Hugging Face and contributed by the HF Datasets community

  5.

    open-thoughts-science

    • huggingface.co
    Updated Jan 29, 2025
    Cite
    open-thoughts-science [Dataset]. https://huggingface.co/datasets/trungtvu/open-thoughts-science
    Dataset updated
    Jan 29, 2025
    Authors
    Trung Vu
    Description

    trungtvu/open-thoughts-science dataset hosted on Hugging Face and contributed by the HF Datasets community

  6.

    OpenThoughts-114k

    • huggingface.co
    Updated Mar 11, 2025
    Cite
    SHIVIK AI (2025). OpenThoughts-114k [Dataset]. https://huggingface.co/datasets/shivikai/OpenThoughts-114k
    Dataset updated
    Mar 11, 2025
    Authors
    SHIVIK AI
    Description

    shivikai/OpenThoughts-114k dataset hosted on Hugging Face and contributed by the HF Datasets community

  7.

    OpenThoughts-10k-DeepSeek-R1

    • huggingface.co
    Updated Feb 2, 2025
    Cite
    Andrey Galichin (2025). OpenThoughts-10k-DeepSeek-R1 [Dataset]. https://huggingface.co/datasets/andreuka18/OpenThoughts-10k-DeepSeek-R1
    Dataset updated
    Feb 2, 2025
    Authors
    Andrey Galichin
    Description

    andreuka18/OpenThoughts-10k-DeepSeek-R1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  8.

    openthoughts-coding-llama-factory

    • huggingface.co
    Updated Mar 12, 2025
    Cite
    openthoughts-coding-llama-factory [Dataset]. https://huggingface.co/datasets/chansung/openthoughts-coding-llama-factory
    Dataset updated
    Mar 12, 2025
    Authors
    chansung park
    Description

    chansung/openthoughts-coding-llama-factory dataset hosted on Hugging Face and contributed by the HF Datasets community

  9.

    OpenThoughts-1k-Sampled

    • huggingface.co
    Cite
    Domenico Manuardi, OpenThoughts-1k-Sampled [Dataset]. https://huggingface.co/datasets/dome015/OpenThoughts-1k-Sampled
    Authors
    Domenico Manuardi
    Description

    dome015/OpenThoughts-1k-Sampled dataset hosted on Hugging Face and contributed by the HF Datasets community

  10.

    open-thoughts-subset

    • huggingface.co
    Cite
    Reinhard Heckel, open-thoughts-subset [Dataset]. https://huggingface.co/datasets/reinhardh/open-thoughts-subset
    Authors
    Reinhard Heckel
    Description

    reinhardh/open-thoughts-subset dataset hosted on Hugging Face and contributed by the HF Datasets community

  11.

    open-thoughts-verified-mix-dry-run

    • huggingface.co
    Updated Feb 6, 2025
    Cite
    Grayson Adkins (2025). open-thoughts-verified-mix-dry-run [Dataset]. https://huggingface.co/datasets/gadkins/open-thoughts-verified-mix-dry-run
    Dataset updated
    Feb 6, 2025
    Authors
    Grayson Adkins
    Description

    gadkins/open-thoughts-verified-mix-dry-run dataset hosted on Hugging Face and contributed by the HF Datasets community

  12.

    openthoughts-114k-no-special-template_eval_03-11-25_05-44-46_f912

    • huggingface.co
    Cite
    ML Foundations Development, openthoughts-114k-no-special-template_eval_03-11-25_05-44-46_f912 [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/openthoughts-114k-no-special-template_eval_03-11-25_05-44-46_f912
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/openthoughts-114k-no-special-template_eval_03-11-25_05-44-46_f912

    Precomputed model outputs for evaluation.

    Evaluation Results

    GPQADiamond

    Average Accuracy: 31.31% ± 4.97%
    Number of Runs: 3

    Run   Accuracy   Questions Solved   Total Questions
    1     24.24%     48                 198
    2     26.26%     52                 198
    3     43.43%     86                 198
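
The summary line can be reproduced from the per-run counts. The sketch below recomputes the mean accuracy and assumes the ± value is the population standard deviation divided by the square root of the number of runs. The source does not state how the ± value was computed; this choice is an inference from the numbers, and the variable names are illustrative.

```python
import math

# (questions solved, total questions) for each run, copied from the table above.
runs = [(48, 198), (52, 198), (86, 198)]

# Per-run accuracy as a fraction.
accuracies = [solved / total for solved, total in runs]
n = len(accuracies)

# Mean accuracy across runs.
mean_acc = sum(accuracies) / n

# ASSUMPTION: the reported "±" is population std (ddof=0) divided by sqrt(n).
# The source does not say which estimator was used.
pop_std = math.sqrt(sum((a - mean_acc) ** 2 for a in accuracies) / n)
std_err = pop_std / math.sqrt(n)

print(f"Average accuracy: {mean_acc:.2%} ± {std_err:.2%} over {n} runs")
```

Under that assumption the result agrees with the reported 31.31% ± 4.97%. The wide interval reflects the third run, which solved far more questions than the first two.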

  13.

    DeepSeek-R1-Distill-Llama-8B-OpenThoughts-114k-tokenized

    • huggingface.co
    Updated Mar 7, 2025
    Cite
    Andrey Galichin (2025). DeepSeek-R1-Distill-Llama-8B-OpenThoughts-114k-tokenized [Dataset]. https://huggingface.co/datasets/andreuka18/DeepSeek-R1-Distill-Llama-8B-OpenThoughts-114k-tokenized
    Dataset updated
    Mar 7, 2025
    Authors
    Andrey Galichin
    Description

    andreuka18/DeepSeek-R1-Distill-Llama-8B-OpenThoughts-114k-tokenized dataset hosted on Hugging Face and contributed by the HF Datasets community

  14.

    OpenThoughts-2048-98K

    • huggingface.co
    Updated Mar 23, 2025
    Cite
    Yunhao Fang (2025). OpenThoughts-2048-98K [Dataset]. https://huggingface.co/datasets/Seerkfang/OpenThoughts-2048-98K
    Dataset updated
    Mar 23, 2025
    Authors
    Yunhao Fang
    Description

    Seerkfang/OpenThoughts-2048-98K dataset hosted on Hugging Face and contributed by the HF Datasets community

  15.

    open-thoughts-code-dry-run

    • huggingface.co
    Updated Feb 20, 2025
    Cite
    Yan Gao (2025). open-thoughts-code-dry-run [Dataset]. https://huggingface.co/datasets/O2iginal/open-thoughts-code-dry-run
    Dataset updated
    Feb 20, 2025
    Authors
    Yan Gao
    Description

    O2iginal/open-thoughts-code-dry-run dataset hosted on Hugging Face and contributed by the HF Datasets community

  16.

    openthoughts-114k-no-special-template

    • huggingface.co
    Updated Feb 13, 2025
    Cite
    ML Foundations Development (2025). openthoughts-114k-no-special-template [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/openthoughts-114k-no-special-template
    Dataset updated
    Feb 13, 2025
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/openthoughts-114k-no-special-template dataset hosted on Hugging Face and contributed by the HF Datasets community

  17.

    OpenThoughts-25k

    • huggingface.co
    Cite
    Doge Face, OpenThoughts-25k [Dataset]. https://huggingface.co/datasets/SmallDoge/OpenThoughts-25k
    Dataset authored and provided by
    Doge Face
    Description

    SmallDoge/OpenThoughts-25k dataset hosted on Hugging Face and contributed by the HF Datasets community

  18.

    open-thoughts-unverified-mix-claude

    • huggingface.co
    Cite
    ML Foundations Development, open-thoughts-unverified-mix-claude [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/open-thoughts-unverified-mix-claude
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/open-thoughts-unverified-mix-claude dataset hosted on Hugging Face and contributed by the HF Datasets community

  19.

    OpenThoughts-114k-Code_decontaminated-4k-think-2k-response-filtered-ShareGPT...

    • huggingface.co
    Updated Mar 25, 2025
    Cite
    Peanut Jar Mixers Development (2025). OpenThoughts-114k-Code_decontaminated-4k-think-2k-response-filtered-ShareGPT [Dataset]. https://huggingface.co/datasets/PJMixers-Dev/OpenThoughts-114k-Code_decontaminated-4k-think-2k-response-filtered-ShareGPT
    Dataset updated
    Mar 25, 2025
    Dataset authored and provided by
    Peanut Jar Mixers Development
    Description

    PJMixers-Dev/OpenThoughts-114k-Code_decontaminated-4k-think-2k-response-filtered-ShareGPT dataset hosted on Hugging Face and contributed by the HF Datasets community

  20.

    open-thoughts-verified-mix

    • huggingface.co
    Updated Jan 29, 2025
    Cite
    open-thoughts-verified-mix [Dataset]. https://huggingface.co/datasets/trungtvu/open-thoughts-verified-mix
    Dataset updated
    Jan 29, 2025
    Authors
    Trung Vu
    Description

    trungtvu/open-thoughts-verified-mix dataset hosted on Hugging Face and contributed by the HF Datasets community

Cite
Open R1 (2025). OpenThoughts-114k-math [Dataset]. https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math

OpenThoughts-114k-math

open-r1/OpenThoughts-114k-math

2 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jan 30, 2025
Dataset authored and provided by
Open R1
Description

This is a filtered and metadata-enriched version of open-thoughts/OpenThoughts-114k. While the original dataset is a valuable resource containing DeepSeek-R1 outputs, it has very little metadata (only 2 fields: system and conversations). It does not contain, for instance, the original solution label, which means that we cannot verify the model answers.

What we did

filtered the dataset for math content (math questions were prefixed by "Return your final response within… See the full description on the dataset page: https://huggingface.co/datasets/open-r1/OpenThoughts-114k-math.
