38 datasets found
  1. olmo-2-1124-13b-preference-mix

    • huggingface.co
    Updated Nov 26, 2024
    Cite
    Ai2 (2024). olmo-2-1124-13b-preference-mix [Dataset]. https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    OLMo 2 1124 13B Preference Mixture

    Note that this collection is licensed under ODC-BY-1.0; different licenses apply to subsets of the data, and some portions of the dataset are non-commercial. We present the mixture as a research artifact. This mix is made up of the following on-policy preference datasets, generated using a synthetic data generation pipeline similar to Tulu:

    Reused prompts from the SFT mix (via ai2-adapt-dev/sft_v3.9_used_on_policy_po_olmo2_13b and… See the full description on the dataset page: https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix.

  2. olmo-mix-1124

    • huggingface.co
    Updated Jun 4, 2024
    Cite
    Ai2 (2024). olmo-mix-1124 [Dataset]. https://huggingface.co/datasets/allenai/olmo-mix-1124
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    OLMo 2 (November 2024) Pretraining set

    Collection of data used to train OLMo-2-1124 models. The majority of this dataset comes from DCLM-Baseline with no additional filtering, but we provide the explicit breakdowns below.

    Name             Tokens  Bytes (uncompressed)  Documents  License
    DCLM-Baseline    3.70T   21.3TB                2.95B      CC-BY-4.0
    Arxiv            20.8B   77.2GB                3.95M      ODC-BY
    pes2o            58.6B   412GB                 38M        ODC-BY
    starcoder        83.0B   458GB                 78.7M      ODC-BY
    Algebraic-stack  11.8B   44.0GB                2.83M      ODC-BY

    OpenWebMath… See the full description on the dataset page: https://huggingface.co/datasets/allenai/olmo-mix-1124.
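
    The claim that most of the mix comes from DCLM-Baseline can be checked against the rows visible in this listing. The sketch below is a rough illustration only: it uses just the five rows shown here (the table is truncated after Algebraic-stack), and the helper names `parse_tokens` and `listed_shares` are hypothetical, not part of any Ai2 tooling.

```python
# Token counts copied from the rows visible in the listing.
# The breakdown is truncated after Algebraic-stack, so this is a partial view.
LISTED_TOKENS = {
    "DCLM-Baseline": "3.70T",
    "Arxiv": "20.8B",
    "pes2o": "58.6B",
    "starcoder": "83.0B",
    "Algebraic-stack": "11.8B",
}

# Multipliers for the suffixes used in the table (B = billion, T = trillion).
SUFFIXES = {"B": 1e9, "T": 1e12}


def parse_tokens(text: str) -> float:
    """Convert a string such as '20.8B' or '3.70T' into a token count."""
    return float(text[:-1]) * SUFFIXES[text[-1]]


def listed_shares(tokens: dict) -> dict:
    """Return each source's fraction of the tokens among the listed rows only."""
    counts = {name: parse_tokens(value) for name, value in tokens.items()}
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    for name, share in listed_shares(LISTED_TOKENS).items():
        print(f"{name:16s} {share:6.1%}")
```

    Among the five listed rows, DCLM-Baseline accounts for roughly 95% of the tokens. The rows cut off by the truncation would lower that figure somewhat.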

  3. olmo2-32b-combined-outputs

    • huggingface.co
    Updated Aug 12, 2025
    Cite
    Jacob Morrison (2025). olmo2-32b-combined-outputs [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo2-32b-combined-outputs
    Dataset updated
    Aug 12, 2025
    Authors
    Jacob Morrison
    Description

    jacobmorrison/olmo2-32b-combined-outputs dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. olmo-2-pref-mix-no-source

    • huggingface.co
    Updated Aug 15, 2025
    Cite
    Saumya Malik (2025). olmo-2-pref-mix-no-source [Dataset]. https://huggingface.co/datasets/saumyamalik/olmo-2-pref-mix-no-source
    Dataset updated
    Aug 15, 2025
    Authors
    Saumya Malik
    Description

    saumyamalik/olmo-2-pref-mix-no-source dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. olmo-2-7b-pref-mix-delta-olmo2

    • huggingface.co
    Updated Aug 14, 2025
    Cite
    Saumya Malik (2025). olmo-2-7b-pref-mix-delta-olmo2 [Dataset]. https://huggingface.co/datasets/saumyamalik/olmo-2-7b-pref-mix-delta-olmo2
    Dataset updated
    Aug 14, 2025
    Authors
    Saumya Malik
    Description

    saumyamalik/olmo-2-7b-pref-mix-delta-olmo2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. olmo-2-1124-7b-preference-mix-filtered-overlapping

    • huggingface.co
    Updated Aug 13, 2025
    Cite
    Jacob Morrison (2025). olmo-2-1124-7b-preference-mix-filtered-overlapping [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo-2-1124-7b-preference-mix-filtered-overlapping
    Dataset updated
    Aug 13, 2025
    Authors
    Jacob Morrison
    Description

    jacobmorrison/olmo-2-1124-7b-preference-mix-filtered-overlapping dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. olmo2-delta-qwen2.5_3b_over_1.5b

    • huggingface.co
    Cite
    Scott Geng, olmo2-delta-qwen2.5_3b_over_1.5b [Dataset]. https://huggingface.co/datasets/scottgeng00/olmo2-delta-qwen2.5_3b_over_1.5b
    Authors
    Scott Geng
    Description

    scottgeng00/olmo2-delta-qwen2.5_3b_over_1.5b dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. olmo-2-1124-13b-preference-mix-randomcase

    • huggingface.co
    Updated Jul 15, 2025
    Cite
    Jacob Morrison (2025). olmo-2-1124-13b-preference-mix-randomcase [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo-2-1124-13b-preference-mix-randomcase
    Dataset updated
    Jul 15, 2025
    Authors
    Jacob Morrison
    Description

    jacobmorrison/olmo-2-1124-13b-preference-mix-randomcase dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. rlhf-library-OLMo-2-1124-7B-DPO

    • huggingface.co
    Updated Sep 20, 2025
    Cite
    Nathan Lambert (2025). rlhf-library-OLMo-2-1124-7B-DPO [Dataset]. https://huggingface.co/datasets/natolambert/rlhf-library-OLMo-2-1124-7B-DPO
    Dataset updated
    Sep 20, 2025
    Authors
    Nathan Lambert
    Description

    natolambert/rlhf-library-OLMo-2-1124-7B-DPO dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. ultrafeedback-cleaned-olmo2-7b-unused-gemma3

    • huggingface.co
    Cite
    Saumya Malik, ultrafeedback-cleaned-olmo2-7b-unused-gemma3 [Dataset]. https://huggingface.co/datasets/saumyamalik/ultrafeedback-cleaned-olmo2-7b-unused-gemma3
    Authors
    Saumya Malik
    Description

    saumyamalik/ultrafeedback-cleaned-olmo2-7b-unused-gemma3 dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. olmo2-13b-generated

    • huggingface.co
    Cite
    Federico Barbero, olmo2-13b-generated [Dataset]. https://huggingface.co/datasets/fedzbar/olmo2-13b-generated
    Authors
    Federico Barbero
    Description

    fedzbar/olmo2-13b-generated dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. tulu-3-sft-olmo-2-mixture-0225

    • huggingface.co
    Updated Mar 13, 2025
    Cite
    Ai2 (2025). tulu-3-sft-olmo-2-mixture-0225 [Dataset]. https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture-0225
    Dataset updated
    Mar 13, 2025
    Dataset authored and provided by
    Ai2
    Description

    Used to train OLMo 2 32B. From the blog post:

    Filtered out instructions from the SFT dataset and the chosen responses of the preference data that included mentions of a date cutoff from the synthetic data generation process. This resulted in a new version of the instruction dataset, Tulu 3 SFT Mixture 0225, and preference dataset, OLMo-2-32B-pref-mix-0325. We use majority voting to improve the quality of answers to our synthetic math questions. For our Persona MATH and Grade School Math… See the full description on the dataset page: https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture-0225.
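
    The majority-voting step mentioned above can be illustrated with a small sketch. The description does not say how the voting was implemented, so everything here is an assumption: sampled answers are treated as strings, normalized only by stripping whitespace, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return (winning answer, vote share) for a list of sampled answers.

    Assumption: answers are compared as whitespace-stripped strings.
    Returns (None, 0.0) when no answers are given.
    """
    if not answers:
        return None, 0.0
    counts = Counter(answer.strip() for answer in answers)
    # most_common(1) yields the answer with the highest count.
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


if __name__ == "__main__":
    # Three hypothetical sampled answers to one synthetic math question.
    samples = ["42", "41", " 42 "]
    print(majority_vote(samples))
```

    A pipeline built on this idea might keep a question only when the vote share clears some threshold; the source does not state whether such a threshold was used.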

  13. olmo-2-0325-32b-preference-mix-leetspeak

    • huggingface.co
    Updated Jul 15, 2025
    Cite
    Jacob Morrison (2025). olmo-2-0325-32b-preference-mix-leetspeak [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo-2-0325-32b-preference-mix-leetspeak
    Dataset updated
    Jul 15, 2025
    Authors
    Jacob Morrison
    Description

    jacobmorrison/olmo-2-0325-32b-preference-mix-leetspeak dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. olmo-2-0325-32b-preference-mix-20-pct-perturbed

    • huggingface.co
    Updated Jul 12, 2025
    Cite
    Saumya Malik (2025). olmo-2-0325-32b-preference-mix-20-pct-perturbed [Dataset]. https://huggingface.co/datasets/saumyamalik/olmo-2-0325-32b-preference-mix-20-pct-perturbed
    Dataset updated
    Jul 12, 2025
    Authors
    Saumya Malik
    Description

    saumyamalik/olmo-2-0325-32b-preference-mix-20-pct-perturbed dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. olmo-2-0325-32b-preference-mix

    • huggingface.co
    Updated Mar 13, 2025
    Cite
    Victoria Graf (2025). olmo-2-0325-32b-preference-mix [Dataset]. https://huggingface.co/datasets/VGraf/olmo-2-0325-32b-preference-mix
    Dataset updated
    Mar 13, 2025
    Authors
    Victoria Graf
    Description

    VGraf/olmo-2-0325-32b-preference-mix dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. allenai_OLMo-2-1124-7B-Instruct-details

    • huggingface.co
    Updated Jul 30, 2025
    Cite
    Open LLM Leaderboard (2025). allenai_OLMo-2-1124-7B-Instruct-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/allenai_OLMo-2-1124-7B-Instruct-details
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of allenai/OLMo-2-1124-7B-Instruct

    Dataset automatically created during the evaluation run of model allenai/OLMo-2-1124-7B-Instruct. The dataset is composed of 38 configurations, each one corresponding to one of the evaluated tasks. The dataset has been created from 1 run. Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/allenai_OLMo-2-1124-7B-Instruct-details.
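
    Since each run is stored as a split named after its timestamp, picking the most recent run amounts to parsing and comparing split names. The sketch below is illustrative only: the timestamp pattern in `TIMESTAMP_FORMAT` and the example split names are assumptions, not taken from the dataset card, and `latest_run_split` is a hypothetical helper.

```python
from datetime import datetime

# Assumed naming pattern for run splits; check the dataset page for the real one.
TIMESTAMP_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def latest_run_split(split_names):
    """Return the timestamp-named split with the most recent timestamp.

    Names that do not parse as timestamps (for example "train") are skipped.
    """
    stamped = []
    for name in split_names:
        try:
            stamped.append((datetime.strptime(name, TIMESTAMP_FORMAT), name))
        except ValueError:
            continue  # not a timestamp-named split
    if not stamped:
        raise ValueError("no timestamp-named splits found")
    # Tuples compare by timestamp first, so max() picks the newest run.
    return max(stamped)[1]


if __name__ == "__main__":
    # Hypothetical split names for one configuration.
    names = ["2025_07_29T09_00_00.000000", "train", "2025_07_30T10_00_00.000000"]
    print(latest_run_split(names))
```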

  17. OLMo-2-0425-1B-Instruct_DPO

    • huggingface.co
    Updated Dec 2, 2025
    Cite
    Neel Rajani (2025). OLMo-2-0425-1B-Instruct_DPO [Dataset]. https://huggingface.co/datasets/Neelectric/OLMo-2-0425-1B-Instruct_DPO
    Dataset updated
    Dec 2, 2025
    Authors
    Neel Rajani
    Description

    Neelectric/OLMo-2-0425-1B-Instruct_DPO dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. allenai_tulu-3-sft-olmo-2-mixture-0225-filtered-ShareGPT

    • huggingface.co
    Updated Jun 12, 2025
    Cite
    Peanut Jar Mixers Development (2025). allenai_tulu-3-sft-olmo-2-mixture-0225-filtered-ShareGPT [Dataset]. https://huggingface.co/datasets/PJMixers-Dev/allenai_tulu-3-sft-olmo-2-mixture-0225-filtered-ShareGPT
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    Peanut Jar Mixers Development
    Description

    Removed sources:

    ai2-adapt-dev/tulu_v3.9_wildjailbreak_decontaminated_50k
    ai2-adapt-dev/tulu_v3.9_synthetic_finalresp_wildguardmixtrain_decontaminated_50k
    ai2-adapt-dev/coconot_converted
    ai2-adapt-dev/tulu_hard_coded_repeated_10
    ai2-adapt-dev/tulu_v3.9_aya_100k
    ai2-adapt-dev/numinamath_tir_math_decontaminated
    ai2-adapt-dev/tulu_v3.9_open_math_2_gsm8k_50k
    ai2-adapt-dev/tulu_v3.9_personahub_math_interm_algebra_20k
    allenai/tulu-3-sft-personas-math-filtered… See the full description on the dataset page: https://huggingface.co/datasets/PJMixers-Dev/allenai_tulu-3-sft-olmo-2-mixture-0225-filtered-ShareGPT.
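
    Removing sources from a mixture like this is, in essence, a filter on a per-record source label. The sketch below shows the idea under stated assumptions: records are plain dictionaries, the label lives under a key named "source" (an assumption about the schema), only three of the listed source names are included, and `drop_removed_sources` is a hypothetical helper rather than the dataset author's actual script.

```python
# A few of the removed source names from the listing; the full list is truncated there.
REMOVED_SOURCES = frozenset({
    "ai2-adapt-dev/tulu_v3.9_wildjailbreak_decontaminated_50k",
    "ai2-adapt-dev/coconot_converted",
    "ai2-adapt-dev/tulu_v3.9_aya_100k",
})


def drop_removed_sources(records, removed=REMOVED_SOURCES, key="source"):
    """Keep only records whose source label is not in the removed set.

    Assumption: each record is a dict carrying its origin under `key`.
    """
    return [record for record in records if record.get(key) not in removed]


if __name__ == "__main__":
    # Hypothetical records; "example/kept_source" is a made-up label.
    rows = [
        {"id": 1, "source": "ai2-adapt-dev/coconot_converted"},
        {"id": 2, "source": "example/kept_source"},
    ]
    print(drop_removed_sources(rows))
```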

  19. daringanteater-prefs-olmo2-7b-unused-gemma3

    • huggingface.co
    Updated Nov 25, 2025
    Cite
    Saumya Malik (2025). daringanteater-prefs-olmo2-7b-unused-gemma3 [Dataset]. https://huggingface.co/datasets/saumyamalik/daringanteater-prefs-olmo2-7b-unused-gemma3
    Dataset updated
    Nov 25, 2025
    Authors
    Saumya Malik
    Description

    saumyamalik/daringanteater-prefs-olmo2-7b-unused-gemma3 dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. olmo-2-0325-32b-preference-mix-mis-sense

    • huggingface.co
    Cite
    Jacob Morrison, olmo-2-0325-32b-preference-mix-mis-sense [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo-2-0325-32b-preference-mix-mis-sense
    Authors
    Jacob Morrison
    Description

    jacobmorrison/olmo-2-0325-32b-preference-mix-mis-sense dataset hosted on Hugging Face and contributed by the HF Datasets community
