Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Truthy DPO
This is a dataset designed to enhance the overall truthfulness of LLMs without sacrificing immersion when roleplaying as a human. For example, when acting as a normal AI assistant, the model should not try to describe what the warmth of the sun feels like, but if the system prompt indicates it is a human, it should. It mostly targets corporeal, spatial, and temporal awareness, as well as common misconceptions.
Contribute
If you're interested in new functionality/datasets, take a… See the full description on the dataset page: https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1.
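To get a quick feel for how the system prompt toggles the "human" persona, the records can be inspected with the 🤗 datasets library. This is a minimal sketch; the column names (system, prompt, chosen, rejected) are assumptions based on the common DPO layout and should be checked against the dataset viewer.

```python
# Minimal sketch: load Truthy DPO and print one preference pair.
# Column names (system/prompt/chosen/rejected) are assumed, not verified.
from datasets import load_dataset

ds = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")
example = ds[0]
for key in ("system", "prompt", "chosen", "rejected"):
    if key in example:  # guard in case the schema differs
        print(f"--- {key} ---\n{example[key]}\n")
```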
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
ORPO-DPO-mix-40k v1.2
This dataset is designed for ORPO or DPO training. See Fine-tune Llama 3 with ORPO for more information about how to use it. It is a combination of the following high-quality DPO datasets:
argilla/Capybara-Preferences: highly scored chosen answers >=5 (7,424 samples)
argilla/distilabel-intel-orca-dpo-pairs: highly scored chosen answers >=9, not in GSM8K (2,299 samples)
argilla/ultrafeedback-binarized-preferences-cleaned: highly scored chosen answers >=5 (22… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k.
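As rough orientation for the "ORPO or DPO training" use case, a minimal TRL-based sketch follows. The placeholder model, output directory, and hyperparameters are assumptions, and the exact trainer arguments vary across TRL versions; recent releases handle the conversational chosen/rejected columns directly, while older ones may require pre-formatting.

```python
# Minimal ORPO training sketch (hedged): argument names differ across TRL
# versions, and the model below is only a small placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; swap in your target base model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

args = ORPOConfig(output_dir="orpo-mix-40k-run", beta=0.1, max_length=1024)
trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases call this `tokenizer=`
)
trainer.train()
```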
https://choosealicense.com/licenses/llama3/
Enhancing Human-Like Responses in Large Language Models
🤗 Models | 📊 Dataset | 📄 Paper
Human-Like-DPO-Dataset
This dataset was created as part of research aimed at improving conversational fluency and engagement in large language models. It is suitable for formats like Direct Preference Optimization (DPO) to guide models toward generating more human-like responses. The dataset includes 10,884 samples across 256 topics, including Technology, Daily Life, Science… See the full description on the dataset page: https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Toxic-DPO
This is a highly toxic, "harmful" dataset meant to illustrate how easily direct preference optimization (DPO) can be used to de-censor/unalign a model using very few examples. Many of the examples still contain some amount of warnings/disclaimers, so it's still somewhat editorialized.
Usage restriction
To use this data, you must acknowledge/agree to the following:
data contained within is "toxic"/"harmful", and contains profanity and other types… See the full description on the dataset page: https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2.
Dataset Card for dataset-tldr-preference-dpo
This dataset has been created with distilabel.
Dataset Summary
This is a dataset intended for training models using DPO/ORPO for the task of producing concise tl;dr summaries of machine learning datasets based on their dataset cards. The dataset was created with distilabel. Each row of the dataset contains a dataset card which has been parsed to remove empty sections and placeholder text. The instruction request… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/dataset-tldr-preference-dpo.
MemGPT-DPO-Dataset is our initial release of a potential series of datasets. Please check the "Files" tab for other languages!
Details
The dataset is synthetically generated by GPT-4, led by @starsnatched and @cpacker. This dataset is intended to be used with text-generation models, such as Mistral-7B-Instruct. The dataset allows the LLM to learn to use MemGPT-specific tools.
→ Features
Teaches an LLM to prefer one function over another.
→ Dataset size & splits… See the full description on the dataset page: https://huggingface.co/datasets/MemGPT/MemGPT-DPO-Dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
🤗 Hugging Face 🤖 ModelScope 🖥️ GitHub
Ling-Coder Dataset
The Ling-Coder Dataset comprises the following components:
Ling-Coder-SFT: A subset of SFT data used for training Ling-Coder Lite, containing more than 5 million samples.
Ling-Coder-DPO: A subset of DPO data used for training Ling-Coder Lite, containing 250k samples.
Ling-Coder-SyntheticQA: A subset of synthetic data used for annealing training of Ling-Coder Lite, containing more… See the full description on the dataset page: https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Multifaceted Collection DPO
Links for Reference
Homepage: https://lklab.kaist.ac.kr/Janus/
Repository: https://github.com/kaistAI/Janus
Paper: https://arxiv.org/abs/2405.17977
Point of Contact: suehyunpark@kaist.ac.kr
TL;DR
Multifaceted Collection is a preference dataset for aligning LLMs to diverse human preferences, where system messages are used to represent individual preferences. The instructions are acquired from five existing… See the full description on the dataset page: https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-DPO.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset:
This is the data used for training the Snorkel model. We use ONLY the prompts from UltraFeedback; no external LLM responses are used.
Methodology:
1. Generate 5 response variations for each prompt from a subset of 20,000 using the LLM; to start, we used Mistral-7B-Instruct-v0.2.
2. Apply PairRM for response reranking.
3. Update the LLM by applying Direct Preference Optimization (DPO) on the top (chosen) and bottom (rejected) responses.
4. Use this LLM as the base model for the next… See the full description on the dataset page: https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.
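The iterative loop above can be summarised in schematic Python. The three helper functions below are stubs standing in for real generation, PairRM reranking (in practice available via the llm-blender package), and DPO training; they are not part of any published implementation.

```python
# Schematic sketch of the iterative PairRM + DPO loop described above.
def generate_responses(model, prompt, n=5):
    return [f"{model} response {i} to: {prompt}" for i in range(n)]  # stub

def pairrm_rank(prompt, candidates):
    return candidates  # stub: pretend candidates arrive best-to-worst

def train_dpo(model, preference_pairs):
    return model + "+dpo"  # stub: a real step would update model weights

def iterative_pairrm_dpo(base_model, prompts, num_iterations=3, n_samples=5):
    model = base_model
    for _ in range(num_iterations):
        pairs = []
        for prompt in prompts:  # e.g. a 20,000-prompt UltraFeedback subset
            candidates = generate_responses(model, prompt, n=n_samples)
            ranked = pairrm_rank(prompt, candidates)   # best first
            pairs.append({"prompt": prompt,
                          "chosen": ranked[0],          # top-ranked response
                          "rejected": ranked[-1]})      # bottom-ranked response
        model = train_dpo(model, pairs)                 # DPO update
    return model

print(iterative_pairrm_dpo("Mistral-7B-Instruct-v0.2", ["What is DPO?"]))
```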
HuggingFaceH4/h4-tests-format-dpo-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
ORPO-DPO-mix-40k-flat
This dataset is designed for ORPO or DPO training. See Uncensor any LLM with Abliteration for more information about how to use it. This is a version with raw text instead of lists of dicts as in the original version here, which makes it easier to parse in Axolotl, especially for DPO. ORPO-DPO-mix-40k-flat is a combination of the following high-quality DPO datasets:
argilla/Capybara-Preferences: highly scored chosen answers >=5 (7,424 samples)… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k-flat.
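To make the "flat" distinction concrete, the snippet below contrasts the two layouts. The keys and example strings are illustrative assumptions based on common DPO conventions and the description above, not the datasets' exact schemas.

```python
# Sketch: original (lists of message dicts) layout vs. flat (raw text) layout.
nested_example = {
    "chosen": [
        {"role": "user", "content": "What is DPO?"},
        {"role": "assistant", "content": "Direct Preference Optimization is..."},
    ],
    "rejected": [
        {"role": "user", "content": "What is DPO?"},
        {"role": "assistant", "content": "I am not sure."},
    ],
}

flat_example = {
    "prompt": "What is DPO?",
    "chosen": "Direct Preference Optimization is...",
    "rejected": "I am not sure.",
}
```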
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Capybara-DPO 7K binarized
A DPO dataset built with distilabel atop the awesome LDJnr/Capybara
This is a preview version to collect feedback from the community. v2 will include the full base dataset and responses from more powerful models.
Why?
Multi-turn dialogue data is key to fine-tuning capable chat models. Multi-turn preference data has been used by the most relevant RLHF works (Anthropic, Meta Llama 2, etc.). Unfortunately, there are very few… See the full description on the dataset page: https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized.
chengpingan/LLM-QE-DPO-Training-Data dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for "distilabel-math-preference-dpo"
More Information needed
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
distilabel Orca Pairs for DPO
The dataset is a "distilabeled" version of the widely used dataset: Intel/orca_dpo_pairs. The original dataset has been used by 100s of open-source practitioners and models. We knew from fixing UltraFeedback (and before that, Alpacas and Dollys) that this dataset could be highly improved. Continuing with our mission to build the best alignment datasets for open-source LLMs and the community, we spent a few hours improving it with… See the full description on the dataset page: https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs.
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
🖥️ Code | 🤗 Data | 📄 Paper
This repo contains the Math-Step-DPO-10K dataset for our paper Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs. Step-DPO is a simple, effective, and data-efficient method for boosting the mathematical reasoning ability of LLMs. Notably, Step-DPO, when applied to Qwen2-72B-Instruct, achieves scores of 70.8% and 94.0% on the test sets of MATH and GSM8K… See the full description on the dataset page: https://huggingface.co/datasets/xinlai/Math-Step-DPO-10K.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gutenberg DPO
Overview
This is a dataset meant to enhance the novel-writing capabilities of LLMs by using public domain books from Project Gutenberg.
Process
First, each book is parsed, split into chapters, and cleaned up from the original format (removing superfluous newlines, illustration tags, etc.). Once we have chapters, an LLM is prompted with each chapter to create a synthetic prompt that would result in that chapter being written. Each chapter has a summary… See the full description on the dataset page: https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1.
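A rough sketch of that pipeline is shown below. The regular expression, prompt wording, and the summarize/synthesize_writing_prompt helpers are placeholders for LLM calls, not the author's actual code, and the dataset's "rejected" side (an LLM-written chapter for the same prompt) is omitted.

```python
import re

# Placeholder helpers: in the real process an LLM produces these outputs.
def summarize(chapter: str) -> str:
    return chapter[:200]  # stub for an LLM-written chapter summary

def synthesize_writing_prompt(chapter: str, story_so_far: str) -> str:
    return f"Write the next chapter of a novel. Story so far: {story_so_far[:100]}"

def build_gutenberg_examples(book_text: str):
    # Crude chapter split; real books need per-format cleanup (superfluous
    # newlines, illustration tags, etc.) as described above.
    chapters = re.split(r"\n\s*CHAPTER\s+[IVXLC\d]+", book_text)[1:]
    examples, story_so_far = [], ""
    for chapter in chapters:
        prompt = synthesize_writing_prompt(chapter, story_so_far)
        # The 'rejected' response for DPO (generated by an LLM) is not shown.
        examples.append({"prompt": prompt, "chosen": chapter.strip()})
        story_so_far += " " + summarize(chapter)
    return examples
```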
srbdtwentyfour/dpo-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Citation
@misc{zhang2024chemllm,
  title={ChemLLM: A Chemical Large Language Model},
  author={Di Zhang and Wei Liu and Qian Tan and Jingdan Chen and Hang Yan and Yuliang Yan and Jiatong Li and Weiran Huang and Xiangyu Yue and Dongzhan Zhou and Shufei Zhang and Mao Su and Hansen Zhong and Yuqiang Li and Wanli Ouyang},
  year={2024},
  eprint={2402.06852},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
Alpaca vs. Alpaca
Dataset Description
The Alpaca vs. Alpaca dataset is a curated blend of the Alpaca dataset and the Alpaca GPT-4 dataset, both available on HuggingFace Datasets. It uses the standard GPT dataset as the 'rejected' answer, steering the model towards the GPT-4 answer, which is considered the 'chosen' one. However, it's important to note that the 'correctness' here is not absolute. The premise is based on the assumption that GPT-4 answers are generally… See the full description on the dataset page: https://huggingface.co/datasets/efederici/alpaca-vs-alpaca-orpo-dpo.
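A minimal sketch of how such a blend can be constructed is below. The source dataset IDs (tatsu-lab/alpaca, vicgalle/alpaca-gpt4) and the join-on-instruction logic are assumptions for illustration, not the author's exact recipe.

```python
# Sketch: pair original Alpaca outputs (rejected) with GPT-4 Alpaca outputs
# (chosen) on matching instructions. Dataset IDs are assumptions.
from datasets import load_dataset

alpaca = load_dataset("tatsu-lab/alpaca", split="train")
alpaca_gpt4 = load_dataset("vicgalle/alpaca-gpt4", split="train")

gpt4_by_instruction = {row["instruction"]: row["output"] for row in alpaca_gpt4}

pairs = []
for row in alpaca:
    gpt4_output = gpt4_by_instruction.get(row["instruction"])
    if gpt4_output and gpt4_output != row["output"]:
        pairs.append({
            "prompt": row["instruction"],
            "chosen": gpt4_output,      # GPT-4 answer, assumed better
            "rejected": row["output"],  # original Alpaca answer
        })

print(f"Built {len(pairs)} preference pairs")
```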