38 datasets found

olmo-2-1124-13b-preference-mix
huggingface.co
Updated Nov 26, 2024
Cite
Ai2 (2024). olmo-2-1124-13b-preference-mix [Dataset]. https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix
Dataset updated
Nov 26, 2024
Dataset provided by
Allen Institute for AI (http://allenai.org/)
Authors
Ai2
License
https://choosealicense.com/licenses/odc-by/
Description
OLMo 2 1124 13B Preference Mixture

Note that this collection is licensed under the ODC-BY-1.0 license; different licenses apply to subsets of the data, and some portions of the dataset are non-commercial. We present the mixture as a research artifact. This mix is made up of the following on-policy preference datasets, generated using a synthetic data generation pipeline similar to Tulu

Reused prompts from the SFT mix (via ai2-adapt-dev/sft_v3.9_used_on_policy_po_olmo2_13b and… See the full description on the dataset page: https://huggingface.co/datasets/allenai/olmo-2-1124-13b-preference-mix.
olmo-mix-1124
huggingface.co
Updated Jun 4, 2024
Cite
Ai2 (2024). olmo-mix-1124 [Dataset]. https://huggingface.co/datasets/allenai/olmo-mix-1124
Dataset updated
Jun 4, 2024
Dataset provided by
Allen Institute for AI (http://allenai.org/)
Authors
Ai2
License
https://choosealicense.com/licenses/odc-by/
Description
OLMo 2 (November 2024) Pretraining set

Collection of data used to train OLMo-2-1124 models. The majority of this dataset comes from DCLM-Baseline with no additional filtering, but we provide the explicit breakdowns below.

Name             Tokens  Bytes (uncompressed)  Documents  License
DCLM-Baseline    3.70T   21.3TB                2.95B      CC-BY-4.0
Arxiv            20.8B   77.2GB                3.95M      ODC-BY
pes2o            58.6B   412GB                 38M        ODC-BY
starcoder        83.0B   458GB                 78.7M      ODC-BY
Algebraic-stack  11.8B   44.0GB                2.83M      ODC-BY

OpenWebMath… See the full description on the dataset page: https://huggingface.co/datasets/allenai/olmo-mix-1124.
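To make the "majority comes from DCLM-Baseline" statement concrete, the following is a minimal sketch that computes each source's share of the token counts listed above. It uses only the five rows visible in this excerpt; the breakdown is truncated after Algebraic-stack, so the shares are relative to the listed rows only and somewhat overstate each source's share of the full mix. The names `listed_tokens_b` and `token_shares` are illustrative, not part of any library.

```python
# Token counts in billions, copied from the rows listed above.
# Assumption: only the five visible rows are included; the source
# breakdown is truncated, so later rows are missing from this total.
listed_tokens_b = {
    "DCLM-Baseline": 3700.0,  # 3.70T
    "Arxiv": 20.8,
    "pes2o": 58.6,
    "starcoder": 83.0,
    "Algebraic-stack": 11.8,
}


def token_shares(tokens):
    """Return each source's fraction of the summed token counts."""
    total = sum(tokens.values())
    return {name: count / total for name, count in tokens.items()}


shares = token_shares(listed_tokens_b)

# Print the sources from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {share:6.1%}")
```

Among the listed rows, DCLM-Baseline accounts for more than 95% of the tokens, which is consistent with the description's claim that it makes up the majority of the set.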
olmo2-32b-combined-outputs
huggingface.co
Updated Aug 12, 2025
Cite
Jacob Morrison (2025). olmo2-32b-combined-outputs [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo2-32b-combined-outputs
Dataset updated
Aug 12, 2025
Authors
Jacob Morrison
Description
jacobmorrison/olmo2-32b-combined-outputs dataset hosted on Hugging Face and contributed by the HF Datasets community
olmo-2-pref-mix-no-source
huggingface.co
Updated Aug 15, 2025
Cite
Saumya Malik (2025). olmo-2-pref-mix-no-source [Dataset]. https://huggingface.co/datasets/saumyamalik/olmo-2-pref-mix-no-source
Dataset updated
Aug 15, 2025
Authors
Saumya Malik
Description
saumyamalik/olmo-2-pref-mix-no-source dataset hosted on Hugging Face and contributed by the HF Datasets community
olmo-2-7b-pref-mix-delta-olmo2
huggingface.co
Updated Aug 14, 2025
Cite
Saumya Malik (2025). olmo-2-7b-pref-mix-delta-olmo2 [Dataset]. https://huggingface.co/datasets/saumyamalik/olmo-2-7b-pref-mix-delta-olmo2
Dataset updated
Aug 14, 2025
Authors
Saumya Malik
Description
saumyamalik/olmo-2-7b-pref-mix-delta-olmo2 dataset hosted on Hugging Face and contributed by the HF Datasets community
olmo-2-1124-7b-preference-mix-filtered-overlapping
huggingface.co
Updated Aug 13, 2025
Cite
Jacob Morrison (2025). olmo-2-1124-7b-preference-mix-filtered-overlapping [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo-2-1124-7b-preference-mix-filtered-overlapping
Dataset updated
Aug 13, 2025
Authors
Jacob Morrison
Description
jacobmorrison/olmo-2-1124-7b-preference-mix-filtered-overlapping dataset hosted on Hugging Face and contributed by the HF Datasets community
olmo2-delta-qwen2.5_3b_over_1.5b
huggingface.co
Cite
Scott Geng, olmo2-delta-qwen2.5_3b_over_1.5b [Dataset]. https://huggingface.co/datasets/scottgeng00/olmo2-delta-qwen2.5_3b_over_1.5b
Authors
Scott Geng
Description
scottgeng00/olmo2-delta-qwen2.5_3b_over_1.5b dataset hosted on Hugging Face and contributed by the HF Datasets community
olmo-2-1124-13b-preference-mix-randomcase
huggingface.co
Updated Jul 15, 2025
Cite
Jacob Morrison (2025). olmo-2-1124-13b-preference-mix-randomcase [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo-2-1124-13b-preference-mix-randomcase
Dataset updated
Jul 15, 2025
Authors
Jacob Morrison
Description
jacobmorrison/olmo-2-1124-13b-preference-mix-randomcase dataset hosted on Hugging Face and contributed by the HF Datasets community
rlhf-library-OLMo-2-1124-7B-DPO
huggingface.co
Updated Sep 20, 2025
Cite
Nathan Lambert (2025). rlhf-library-OLMo-2-1124-7B-DPO [Dataset]. https://huggingface.co/datasets/natolambert/rlhf-library-OLMo-2-1124-7B-DPO
Dataset updated
Sep 20, 2025
Authors
Nathan Lambert
Description
natolambert/rlhf-library-OLMo-2-1124-7B-DPO dataset hosted on Hugging Face and contributed by the HF Datasets community
ultrafeedback-cleaned-olmo2-7b-unused-gemma3
huggingface.co
Cite
Saumya Malik, ultrafeedback-cleaned-olmo2-7b-unused-gemma3 [Dataset]. https://huggingface.co/datasets/saumyamalik/ultrafeedback-cleaned-olmo2-7b-unused-gemma3
Authors
Saumya Malik
Description
saumyamalik/ultrafeedback-cleaned-olmo2-7b-unused-gemma3 dataset hosted on Hugging Face and contributed by the HF Datasets community
olmo2-13b-generated
huggingface.co
Cite
Federico Barbero, olmo2-13b-generated [Dataset]. https://huggingface.co/datasets/fedzbar/olmo2-13b-generated
Authors
Federico Barbero
Description
fedzbar/olmo2-13b-generated dataset hosted on Hugging Face and contributed by the HF Datasets community
tulu-3-sft-olmo-2-mixture-0225
huggingface.co
Updated Mar 13, 2025
Cite
Ai2 (2025). tulu-3-sft-olmo-2-mixture-0225 [Dataset]. https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture-0225
Dataset updated
Mar 13, 2025
Dataset authored and provided by
Ai2
Description
Used to train OLMo 2 32B. From the blog post:

Filtered out instructions from the SFT dataset and the chosen responses of the preference data that included mentions of a date cutoff from the synthetic data generation process. This resulted in a new version of the instruction dataset, Tulu 3 SFT Mixture 0225, and preference dataset, OLMo-2-32B-pref-mix-0325. We use majority voting to improve the quality of answers to our synthetic math questions. For our Persona MATH and Grade School Math… See the full description on the dataset page: https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture-0225.
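The excerpt above mentions majority voting over answers to synthetic math questions but does not describe the procedure. As a rough illustration only, here is a minimal sketch of one common form of majority voting: sample several answers for the same question and keep the most frequent one. The function name `majority_vote`, the whitespace normalization, and the choice to return `None` on a tie are all assumptions made for this example, not details from the blog post.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer string.

    Returns None when there are no answers or when the two most
    common answers are tied (an illustrative policy, not one taken
    from the source).
    """
    # Normalize surrounding whitespace so " 42 " and "42" count together.
    counts = Counter(a.strip() for a in answers)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # no clear majority
    return ranked[0][0]


# Hypothetical sampled final answers for a single math question.
samples = ["42", "42", "41", " 42 "]
print(majority_vote(samples))  # prints 42
```

In a pipeline of this kind, questions with no clear majority might be discarded or resampled; the source does not say which approach was used.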
olmo-2-0325-32b-preference-mix-leetspeak
huggingface.co
Updated Jul 15, 2025
Cite
Jacob Morrison (2025). olmo-2-0325-32b-preference-mix-leetspeak [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo-2-0325-32b-preference-mix-leetspeak
Dataset updated
Jul 15, 2025
Authors
Jacob Morrison
Description
jacobmorrison/olmo-2-0325-32b-preference-mix-leetspeak dataset hosted on Hugging Face and contributed by the HF Datasets community
olmo-2-0325-32b-preference-mix-20-pct-perturbed
huggingface.co
Updated Jul 12, 2025
Cite
Saumya Malik (2025). olmo-2-0325-32b-preference-mix-20-pct-perturbed [Dataset]. https://huggingface.co/datasets/saumyamalik/olmo-2-0325-32b-preference-mix-20-pct-perturbed
Dataset updated
Jul 12, 2025
Authors
Saumya Malik
Description
saumyamalik/olmo-2-0325-32b-preference-mix-20-pct-perturbed dataset hosted on Hugging Face and contributed by the HF Datasets community
olmo-2-0325-32b-preference-mix
huggingface.co
Updated Mar 13, 2025
Cite
Victoria Graf (2025). olmo-2-0325-32b-preference-mix [Dataset]. https://huggingface.co/datasets/VGraf/olmo-2-0325-32b-preference-mix
Dataset updated
Mar 13, 2025
Authors
Victoria Graf
Description
VGraf/olmo-2-0325-32b-preference-mix dataset hosted on Hugging Face and contributed by the HF Datasets community
allenai_OLMo-2-1124-7B-Instruct-details
huggingface.co
Updated Jul 30, 2025
Cite
Open LLM Leaderboard (2025). allenai_OLMo-2-1124-7B-Instruct-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/allenai_OLMo-2-1124-7B-Instruct-details
Dataset updated
Jul 30, 2025
Dataset authored and provided by
Open LLM Leaderboard
Description
Dataset Card for Evaluation run of allenai/OLMo-2-1124-7B-Instruct

Dataset automatically created during the evaluation run of model allenai/OLMo-2-1124-7B-Instruct. The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated tasks. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split is always pointing to the latest… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/allenai_OLMo-2-1124-7B-Instruct-details.
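The description above says each run is stored as a split named after the run's timestamp, alongside a "train" split. As a small illustration, the sketch below picks the most recent run from a list of split names. The split names shown are hypothetical, and the sketch assumes the timestamps use a zero-padded, year-first layout so that plain string comparison matches chronological order; the actual naming format is not given in this excerpt.

```python
def latest_split(split_names):
    """Return the newest timestamp-named split, ignoring the 'train' alias.

    Assumes the names sort chronologically as plain strings
    (zero-padded, year-first timestamps).
    """
    runs = [name for name in split_names if name != "train"]
    if not runs:
        raise ValueError("no timestamp-named splits found")
    return max(runs)


# Hypothetical split names for one configuration.
splits = ["2025_06_01T09_30_00", "2025_07_30T10_00_00", "train"]
print(latest_split(splits))  # prints 2025_07_30T10_00_00
```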
OLMo-2-0425-1B-Instruct_DPO
huggingface.co
Updated Dec 2, 2025
Cite
Neel Rajani (2025). OLMo-2-0425-1B-Instruct_DPO [Dataset]. https://huggingface.co/datasets/Neelectric/OLMo-2-0425-1B-Instruct_DPO
Dataset updated
Dec 2, 2025
Authors
Neel Rajani
Description
Neelectric/OLMo-2-0425-1B-Instruct_DPO dataset hosted on Hugging Face and contributed by the HF Datasets community
allenai_tulu-3-sft-olmo-2-mixture-0225-filtered-ShareGPT
huggingface.co
Updated Jun 12, 2025
Cite
Peanut Jar Mixers Development (2025). allenai_tulu-3-sft-olmo-2-mixture-0225-filtered-ShareGPT [Dataset]. https://huggingface.co/datasets/PJMixers-Dev/allenai_tulu-3-sft-olmo-2-mixture-0225-filtered-ShareGPT
Dataset updated
Jun 12, 2025
Dataset authored and provided by
Peanut Jar Mixers Development
Description
Removed sources:

ai2-adapt-dev/tulu_v3.9_wildjailbreak_decontaminated_50k
ai2-adapt-dev/tulu_v3.9_synthetic_finalresp_wildguardmixtrain_decontaminated_50k
ai2-adapt-dev/coconot_converted
ai2-adapt-dev/tulu_hard_coded_repeated_10
ai2-adapt-dev/tulu_v3.9_aya_100k
ai2-adapt-dev/numinamath_tir_math_decontaminated
ai2-adapt-dev/tulu_v3.9_open_math_2_gsm8k_50k
ai2-adapt-dev/tulu_v3.9_personahub_math_interm_algebra_20k
allenai/tulu-3-sft-personas-math-filtered…
See the full description on the dataset page: https://huggingface.co/datasets/PJMixers-Dev/allenai_tulu-3-sft-olmo-2-mixture-0225-filtered-ShareGPT.
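Removing sources like those listed above amounts to dropping every row whose source label is in an exclusion set. The following is a minimal sketch of that idea on in-memory rows. It assumes each row carries a `source` field holding the originating dataset name; that field name, the `drop_removed` helper, and the toy rows (including `example/kept_source`) are illustrative assumptions, and only two of the listed sources are included for brevity.

```python
# Two of the removed sources named above; the full list is longer
# and is truncated in this excerpt.
REMOVED_SOURCES = {
    "ai2-adapt-dev/tulu_v3.9_wildjailbreak_decontaminated_50k",
    "ai2-adapt-dev/coconot_converted",
}


def drop_removed(rows, removed=REMOVED_SOURCES, key="source"):
    """Keep only rows whose source label is not in the removed set."""
    return [row for row in rows if row.get(key) not in removed]


# Hypothetical rows; "example/kept_source" is a made-up label.
rows = [
    {"id": 1, "source": "ai2-adapt-dev/coconot_converted"},
    {"id": 2, "source": "example/kept_source"},
    {"id": 3, "source": "ai2-adapt-dev/tulu_v3.9_wildjailbreak_decontaminated_50k"},
]
kept = drop_removed(rows)
print([row["id"] for row in kept])  # prints [2]
```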
daringanteater-prefs-olmo2-7b-unused-gemma3
huggingface.co
Updated Nov 25, 2025
Cite
Saumya Malik (2025). daringanteater-prefs-olmo2-7b-unused-gemma3 [Dataset]. https://huggingface.co/datasets/saumyamalik/daringanteater-prefs-olmo2-7b-unused-gemma3
Dataset updated
Nov 25, 2025
Authors
Saumya Malik
Description
saumyamalik/daringanteater-prefs-olmo2-7b-unused-gemma3 dataset hosted on Hugging Face and contributed by the HF Datasets community
olmo-2-0325-32b-preference-mix-mis-sense
huggingface.co
Cite
Jacob Morrison, olmo-2-0325-32b-preference-mix-mis-sense [Dataset]. https://huggingface.co/datasets/jacobmorrison/olmo-2-0325-32b-preference-mix-mis-sense
Authors
Jacob Morrison
Description
jacobmorrison/olmo-2-0325-32b-preference-mix-mis-sense dataset hosted on Hugging Face and contributed by the HF Datasets community
