100+ datasets found
  1. truthy-dpo-v0.1

    • huggingface.co
    Updated: Dec 11, 2023
    Authors: Jon Durbin
    License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Cite: Jon Durbin (2023). truthy-dpo-v0.1 [Dataset]. https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1
    Cited by 11 scholarly articles (per Google Scholar)

    Description

    Truthy DPO

    This dataset is designed to enhance the overall truthfulness of LLMs without sacrificing immersion when roleplaying as a human. For example, as a normal AI assistant the model should not try to describe what the warmth of the sun feels like, but if the system prompt indicates it is a human, it should. It mostly targets corporeal, spatial, and temporal awareness, and common misconceptions.

      Contribute
    

    If you're interested in new functionality/datasets, take a… See the full description on the dataset page: https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1.
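
    As a quick orientation, here is a minimal sketch of loading the dataset with the Hugging Face datasets library and inspecting one preference pair. The column names (system, prompt, chosen, rejected) are assumptions based on common DPO layouts; check the dataset card for the actual schema.

      from datasets import load_dataset

      # Load the train split from the Hugging Face Hub.
      ds = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")

      # Peek at one preference pair; column names are assumed, not verified.
      row = ds[0]
      for key in ("system", "prompt", "chosen", "rejected"):
          if key in row:
              print(f"{key}: {str(row[key])[:120]}")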

  2. orpo-dpo-mix-40k

    • huggingface.co
    Updated: Apr 18, 2024
    Authors: Maxime Labonne
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Maxime Labonne (2024). orpo-dpo-mix-40k [Dataset]. https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k

    Description

    ORPO-DPO-mix-40k v1.2

    This dataset is designed for ORPO or DPO training. See Fine-tune Llama 3 with ORPO for more information about how to use it. It is a combination of the following high-quality DPO datasets:

    • argilla/Capybara-Preferences: highly scored chosen answers >=5 (7,424 samples)
    • argilla/distilabel-intel-orca-dpo-pairs: highly scored chosen answers >=9, not in GSM8K (2,299 samples)
    • argilla/ultrafeedback-binarized-preferences-cleaned: highly scored chosen answers >=5 (22… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k.
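
    Since the dataset is meant for DPO training, a minimal TRL sketch may help. This is an illustrative setup, not the author's recipe; the exact DPOConfig/DPOTrainer arguments vary across TRL versions, and the model name is a placeholder.

      from datasets import load_dataset
      from transformers import AutoModelForCausalLM, AutoTokenizer
      from trl import DPOConfig, DPOTrainer

      model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any small chat model works for a smoke test
      model = AutoModelForCausalLM.from_pretrained(model_name)
      tokenizer = AutoTokenizer.from_pretrained(model_name)

      train_dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

      # DPOTrainer expects "chosen"/"rejected" columns; this dataset stores them
      # as lists of chat messages, which recent TRL versions accept directly.
      args = DPOConfig(output_dir="dpo-mix-40k-test", per_device_train_batch_size=1, max_steps=10)
      trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset, processing_class=tokenizer)
      trainer.train()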

  3. Human-Like-DPO-Dataset

    • huggingface.co
    Updated: May 19, 2024
    Authored and provided by: Human-Like LLMs
    License: Llama 3 (https://choosealicense.com/licenses/llama3/)
    Cite: Human-Like LLMs (2024). Human-Like-DPO-Dataset [Dataset]. https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset

    Description

    Enhancing Human-Like Responses in Large Language Models

    🤗 Models | 📊 Dataset | 📄 Paper

      Human-Like-DPO-Dataset
    

    This dataset was created as part of research aimed at improving conversational fluency and engagement in large language models. It is suitable for formats like Direct Preference Optimization (DPO) to guide models toward generating more human-like responses. The dataset includes 10,884 samples across 256 topics, including Technology, Daily Life, Science… See the full description on the dataset page: https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset.

  4. toxic-dpo-v0.2

    • huggingface.co
    Updated: Jan 11, 2024
    Authored and provided by: unalignment
    License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Cite: toxic-dpo-v0.2 [Dataset]. https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2

    Description

    Toxic-DPO

    This is a highly toxic, "harmful" dataset meant to illustrate how direct preference optimization (DPO) can be used to de-censor/unalign a model quite easily with very few examples. Many of the examples still contain some amount of warnings/disclaimers, so it is still somewhat editorialized.

      Usage restriction
    

    To use this data, you must acknowledge/agree to the following:

    data contained within is "toxic"/"harmful", and contains profanity and other types… See the full description on the dataset page: https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2.

  5. dataset-tldr-preference-dpo

    • huggingface.co
    Updated: Jun 26, 2024
    Authors: Daniel van Strien
    Cite: dataset-tldr-preference-dpo [Dataset]. https://huggingface.co/datasets/davanstrien/dataset-tldr-preference-dpo

    Description

    Dataset Card for dataset-tldr-preference-dpo

    This dataset has been created with distilabel.

      Dataset Summary
    

    This is a dataset intended for training models using DPO/ORPO for the task of producing concise tl;dr summaries of machine learning datasets based on their dataset cards. The dataset was created with distilabel. Each row of the dataset contains a dataset card which has been parsed to remove empty sections and placeholder text. The instruction request… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/dataset-tldr-preference-dpo.

  6. MemGPT-DPO-Dataset

    • huggingface.co
    Updated: Feb 8, 2024
    Authored and provided by: MemGPT
    Cite: MemGPT (2024). MemGPT-DPO-Dataset [Dataset]. https://huggingface.co/datasets/MemGPT/MemGPT-DPO-Dataset

    Description

    MemGPT-DPO-Dataset is our initial release of a potential series of datasets. Please check the "Files" tab for other languages!

      Details
    

    The dataset is synthetically generated by GPT-4, led by @starsnatched and @cpacker. This dataset is intended to be used with text-generation models, such as Mistral-7B-Instruct. The dataset allows the LLM to learn to use MemGPT-specific tools.

      → Features
    

    Teaches an LLM to prefer one function over another.

      → Dataset size & splits… See the full description on the dataset page: https://huggingface.co/datasets/MemGPT/MemGPT-DPO-Dataset.
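
    For intuition, here is a purely hypothetical record shape for a function-calling preference pair; the schema, field names, and function names below are illustrative assumptions, not the dataset's verified layout.

      # Hypothetical record: the chosen response calls a suitable MemGPT-style
      # tool, while the rejected response picks a less appropriate one.
      example = {
          "prompt": "Remember that my favorite color is blue.",
          "chosen": '{"function": "core_memory_append", "params": {"content": "Favorite color: blue"}}',
          "rejected": '{"function": "send_message", "params": {"message": "Okay, noted!"}}',
      }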
    
  7. Ling-Coder-DPO

    • huggingface.co
    Updated: Oct 17, 2019
    Authored and provided by: inclusionAI
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: inclusionAI (2019). Ling-Coder-DPO [Dataset]. https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO

    Description

    🤗 Hugging Face 🤖 ModelScope 🖥️ GitHub

      Ling-Coder Dataset
    

    The Ling-Coder Dataset comprises the following components:

    • Ling-Coder-SFT: a subset of SFT data used for training Ling-Coder Lite, containing more than 5 million samples.
    • Ling-Coder-DPO: a subset of DPO data used for training Ling-Coder Lite, containing 250k samples.
    • Ling-Coder-SyntheticQA: a subset of synthetic data used for annealing training of Ling-Coder Lite, containing more… See the full description on the dataset page: https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO.

  8. Multifaceted-Collection-DPO

    • huggingface.co
    Updated: Jun 7, 2024
    Authored and provided by: KAIST AI
    License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Cite: Multifaceted-Collection-DPO [Dataset]. https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-DPO

    Description

    Dataset Card for Multifaceted Collection DPO

      Links for Reference
    

    • Homepage: https://lklab.kaist.ac.kr/Janus/
    • Repository: https://github.com/kaistAI/Janus
    • Paper: https://arxiv.org/abs/2405.17977
    • Point of Contact: suehyunpark@kaist.ac.kr

      TL;DR
    

    Multifaceted Collection is a preference dataset for aligning LLMs to diverse human preferences, where system messages are used to represent individual preferences. The instructions are acquired from five existing… See the full description on the dataset page: https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-DPO.
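
    Because the preference signal lives in the system message here, a record sketch may clarify the idea. This is an illustrative assumption about the shape of the data, not the verified schema; see the dataset card.

      # Illustrative record: the system message encodes an individual preference
      # that the chosen response satisfies and the rejected response ignores.
      record = {
          "system": "You are an assistant who answers concisely and cites primary sources.",
          "prompt": "Explain why the sky is blue.",
          "chosen": "Rayleigh scattering: shorter (blue) wavelengths scatter more strongly... [Strutt, 1871]",
          "rejected": "The sky is blue because of the way sunlight interacts with the air. It's quite pretty!",
      }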

  9. Snorkel-Mistral-PairRM-DPO-Dataset

    • huggingface.co
    Updated: Jan 24, 2024
    Authored and provided by: Snorkel AI
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Snorkel AI (2024). Snorkel-Mistral-PairRM-DPO-Dataset [Dataset]. https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset

    Description

    Dataset:

    This is the data used for training the Snorkel model. We use ONLY the prompts from UltraFeedback; no external LLM responses are used.

      Methodology:
    

    1. Generate 5 response variations for each prompt from a subset of 20,000 using the LLM (to start, we used Mistral-7B-Instruct-v0.2).
    2. Apply PairRM for response reranking.
    3. Update the LLM by applying Direct Preference Optimization (DPO) on the top (chosen) and bottom (rejected) responses.
    4. Use this LLM as the base model for the next… See the full description on the dataset page: https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.
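
    A schematic of that generate-rerank-train loop follows. generate_responses, pairrm_rank, and dpo_update are hypothetical helpers standing in for, e.g., vLLM sampling, llm-blender's PairRM, and TRL's DPOTrainer; this sketches the control flow, not Snorkel's actual code.

      # Schematic of the iterative PairRM + DPO loop described above.
      def iterate_pairrm_dpo(model, prompts, num_iterations=3, k=5):
          for _ in range(num_iterations):
              pairs = []
              for prompt in prompts:
                  candidates = generate_responses(model, prompt, n=k)  # sample k variations
                  ranked = pairrm_rank(prompt, candidates)             # best first
                  # Chosen = top-ranked response, rejected = bottom-ranked.
                  pairs.append({"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]})
              model = dpo_update(model, pairs)                         # one round of DPO
          return model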

  10. h4-tests-format-dpo-dataset

    • huggingface.co
    Updated: Mar 10, 2024
    Provided by: Hugging Face (https://huggingface.co/)
    Authors: Hugging Face H4
    Cite: h4-tests-format-dpo-dataset [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/h4-tests-format-dpo-dataset

    Description

    HuggingFaceH4/h4-tests-format-dpo-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. orpo-dpo-mix-40k-flat

    • huggingface.co
    Updated: Jun 7, 2024
    Authors: Maxime Labonne
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Maxime Labonne (2024). orpo-dpo-mix-40k-flat [Dataset]. https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k-flat

    Description

    ORPO-DPO-mix-40k-flat

    This dataset is designed for ORPO or DPO training. See Uncensor any LLM with Abliteration for more information about how to use it. This version uses raw text instead of the lists of dicts found in the original version, which makes it easier to parse in Axolotl, especially for DPO. ORPO-DPO-mix-40k-flat is a combination of the following high-quality DPO datasets:

    • argilla/Capybara-Preferences: highly scored chosen answers >=5 (7,424 samples)… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k-flat.
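
    To illustrate the flat format, here is a sketch of collapsing a conversational record (lists of role/content dicts) into raw text. The column names and message layout are assumptions based on the original dataset's description, so verify them against the actual schema.

      # Sketch: flatten conversational "chosen"/"rejected" fields (lists of
      # {"role": ..., "content": ...} dicts) into raw text, as in the flat variant.
      def flatten(example):
          def last_assistant_text(messages):
              # Keep the final assistant turn as the raw-text response.
              return next(m["content"] for m in reversed(messages) if m["role"] == "assistant")

          return {
              "prompt": example["chosen"][0]["content"],  # assumed: first turn is the user prompt
              "chosen": last_assistant_text(example["chosen"]),
              "rejected": last_assistant_text(example["rejected"]),
          }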

  12. distilabel-capybara-dpo-7k-binarized

    • huggingface.co
    Updated: Jan 31, 2024
    Authored and provided by: Argilla
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Argilla (2024). distilabel-capybara-dpo-7k-binarized [Dataset]. https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized

    Description

    Capybara-DPO 7K binarized

    A DPO dataset built with distilabel atop the awesome LDJnr/Capybara

    This is a preview version to collect feedback from the community. v2 will include the full base dataset and responses from more powerful models.

      Why?
    

    Multi-turn dialogue data is key to fine-tuning capable chat models. Multi-turn preference data has been used by the most relevant RLHF works (Anthropic, Meta Llama 2, etc.). Unfortunately, there are very few… See the full description on the dataset page: https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized.
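
    For intuition, a sketch of what a multi-turn binarized preference record typically looks like: a shared conversation history with chosen and rejected variants of the final assistant turn. Field names are illustrative assumptions, not this dataset's verified schema.

      # Illustrative multi-turn preference record: chosen/rejected differ only
      # in the final assistant turn, after a shared conversation history.
      record = {
          "history": [
              {"role": "user", "content": "What is DPO?"},
              {"role": "assistant", "content": "Direct Preference Optimization is..."},
              {"role": "user", "content": "How does it differ from PPO-based RLHF?"},
          ],
          "chosen": {"role": "assistant", "content": "Unlike PPO, DPO needs no separate reward model..."},
          "rejected": {"role": "assistant", "content": "They are basically the same thing."},
      }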

  13. LLM-QE-DPO-Training-Data

    • huggingface.co
    Updated: Mar 12, 2025
    Authors: chengpingan
    Cite: chengpingan (2025). LLM-QE-DPO-Training-Data [Dataset]. https://huggingface.co/datasets/chengpingan/LLM-QE-DPO-Training-Data

    Description

    chengpingan/LLM-QE-DPO-Training-Data dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. distilabel-math-preference-dpo

    • huggingface.co
    Updated: Nov 22, 2023
    Authored and provided by: Argilla
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Argilla (2023). distilabel-math-preference-dpo [Dataset]. https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo

    Description

    Dataset Card for "distilabel-math-preference-dpo"

    More Information needed

  15. distilabel-intel-orca-dpo-pairs

    • huggingface.co
    Updated: Dec 11, 2024
    Authored and provided by: Argilla
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: distilabel-intel-orca-dpo-pairs [Dataset]. https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs

    Description

    distilabel Orca Pairs for DPO

    The dataset is a "distilabeled" version of the widely used Intel/orca_dpo_pairs dataset. The original has been used by hundreds of open-source practitioners and models. We knew from fixing UltraFeedback (and before that, Alpacas and Dollys) that this dataset could be highly improved. Continuing our mission to build the best alignment datasets for open-source LLMs and the community, we spent a few hours improving it with… See the full description on the dataset page: https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs.
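
    Entry 2 above filters this dataset down to highly scored chosen answers (>=9) that are not in GSM8K. A sketch of such a filter follows; the column names "rating" and "in_gsm8k_train" are assumptions, so check the dataset card before relying on them.

      from datasets import load_dataset

      ds = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

      # Assumed columns: "rating" (scores for the paired responses) and
      # "in_gsm8k_train" (GSM8K contamination flag).
      filtered = ds.filter(
          lambda row: row["rating"] is not None
          and max(row["rating"]) >= 9
          and not row["in_gsm8k_train"]
      )
      print(len(filtered), "high-scoring, GSM8K-free pairs")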

  16. Math-Step-DPO-10K

    • huggingface.co
    Updated: Jul 1, 2024
    Authors: Xin Lai
    Cite: Xin Lai (2024). Math-Step-DPO-10K [Dataset]. https://huggingface.co/datasets/xinlai/Math-Step-DPO-10K

    Description

    Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

    🖥️ Code | 🤗 Data | 📄 Paper. This repo contains the Math-Step-DPO-10K dataset for our paper Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs. Step-DPO is a simple, effective, and data-efficient method for boosting the mathematical reasoning ability of LLMs. Notably, Step-DPO, when applied to Qwen2-72B-Instruct, achieves scores of 70.8% and 94.0% on the test sets of MATH and GSM8K… See the full description on the dataset page: https://huggingface.co/datasets/xinlai/Math-Step-DPO-10K.
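
    The step-wise idea is that preference is expressed over the next reasoning step given a shared prefix, rather than over whole solutions. A purely illustrative record shape follows; the actual columns in Math-Step-DPO-10K may differ.

      # Illustrative Step-DPO record: chosen/rejected are competing *next steps*
      # of a solution, conditioned on the problem and the shared initial steps.
      record = {
          "prompt": "Natalia sold clips to 48 friends in April, then half as many in May. How many in total?",
          "initial_reason_steps": "Step 1: In May she sold 48 / 2 = 24 clips.\n",
          "chosen": "Step 2: In total she sold 48 + 24 = 72 clips.",
          "rejected": "Step 2: In total she sold 48 * 24 = 1152 clips.",
      }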

  17. gutenberg-dpo-v0.1

    • huggingface.co
    Updated: Jan 11, 2024
    Authors: Jon Durbin
    License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Cite: Jon Durbin (2024). gutenberg-dpo-v0.1 [Dataset]. https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1

    Description

    Gutenberg DPO

      Overview
    

    This is a dataset meant to enhance the novel-writing capabilities of LLMs by using public domain books from Project Gutenberg.

      Process
    

    First, each book is parsed, split into chapters, and cleaned up from the original format (removing superfluous newlines, illustration tags, etc.). Once we have chapters, an LLM is prompted with each chapter to create a synthetic prompt that would result in that chapter being written. Each chapter has a summary… See the full description on the dataset page: https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1.
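
    A schematic of that process follows. split_into_chapters, clean_text, and ask_llm are hypothetical helpers, and pairing the human-written chapter as "chosen" against a model draft as "rejected" is an assumption based on the dataset's stated goal, not something confirmed by the excerpt above.

      # Schematic of the chapter -> synthetic-prompt pipeline described above.
      def build_gutenberg_pairs(book_text, base_model):
          pairs = []
          for chapter in split_into_chapters(clean_text(book_text)):
              # Reverse-engineer a writing prompt that would yield this chapter.
              prompt = ask_llm(f"Write a prompt that would produce this chapter:\n{chapter}")
              # Assumed pairing: human chapter as chosen, model draft as rejected.
              rejected = ask_llm(prompt, model=base_model)
              pairs.append({"prompt": prompt, "chosen": chapter, "rejected": rejected})
          return pairs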

  18. dpo-dataset

    • huggingface.co
    Updated: Aug 8, 2024
    Authors: Sam
    Cite: Sam (2024). dpo-dataset [Dataset]. https://huggingface.co/datasets/srbdtwentyfour/dpo-dataset

    Description

    srbdtwentyfour/dpo-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. ChemPref-DPO-for-Chemistry-data-en

    • huggingface.co
    Updated: Apr 11, 2024
    Authored and provided by: AI4Chem
    License: MIT (https://opensource.org/licenses/MIT); license information was derived automatically
    Cite: AI4Chem (2024). ChemPref-DPO-for-Chemistry-data-en [Dataset]. https://huggingface.co/datasets/AI4Chem/ChemPref-DPO-for-Chemistry-data-en

    Description

    Citation

    @misc{zhang2024chemllm,
      title={ChemLLM: A Chemical Large Language Model},
      author={Di Zhang and Wei Liu and Qian Tan and Jingdan Chen and Hang Yan and Yuliang Yan and Jiatong Li and Weiran Huang and Xiangyu Yue and Dongzhan Zhou and Shufei Zhang and Mao Su and Hansen Zhong and Yuqiang Li and Wanli Ouyang},
      year={2024},
      eprint={2402.06852},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
    }

  20. alpaca-vs-alpaca-orpo-dpo

    • huggingface.co
    Updated: Feb 6, 2024
    Authors: Edoardo Federici
    Cite: Edoardo Federici (2024). alpaca-vs-alpaca-orpo-dpo [Dataset]. https://huggingface.co/datasets/efederici/alpaca-vs-alpaca-orpo-dpo

    Description

    Alpaca vs. Alpaca

      Dataset Description
    

    The Alpaca vs. Alpaca dataset is a curated blend of the Alpaca dataset and the Alpaca GPT-4 dataset, both available on Hugging Face Datasets. It uses the standard GPT dataset as the 'rejected' answer, steering the model towards the GPT-4 answer, which is considered the 'chosen' one. However, it's important to note that 'correctness' here is not absolute. The premise is based on the assumption that GPT-4 answers are generally… See the full description on the dataset page: https://huggingface.co/datasets/efederici/alpaca-vs-alpaca-orpo-dpo.
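
    A sketch of the pairing idea: same instruction, GPT-4 output as chosen, original Alpaca output as rejected. The repository IDs below (tatsu-lab/alpaca, vicgalle/alpaca-gpt4) are plausible sources, not ones named in this listing, and the shared "instruction"/"output" columns are assumptions.

      from datasets import load_dataset

      alpaca = load_dataset("tatsu-lab/alpaca", split="train")
      alpaca_gpt4 = load_dataset("vicgalle/alpaca-gpt4", split="train")

      # Pair rows by instruction: GPT-4 answer is "chosen", GPT answer is "rejected".
      pairs = [
          {"prompt": a["instruction"], "chosen": g["output"], "rejected": a["output"]}
          for a, g in zip(alpaca, alpaca_gpt4)
          if a["instruction"] == g["instruction"]
      ]
      print(len(pairs), "preference pairs")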
