100+ datasets found
  1. truthy-dpo-v0.1

    • huggingface.co
    Updated: Dec 11, 2023
    Authors: Jon Durbin
    License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Cite: Jon Durbin (2023). truthy-dpo-v0.1 [Dataset]. https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1
    Cited by 11 scholarly articles (per Google Scholar)

    Description

    Truthy DPO

    This dataset is designed to enhance the overall truthfulness of LLMs without sacrificing immersion when roleplaying as a human. For example, as a normal AI assistant the model should not try to describe what the warmth of the sun feels like, but if the system prompt indicates it is a human, it should. It mostly targets corporeal, spatial, and temporal awareness, and common misconceptions.

      Contribute
    

    If you're interested in new functionality/datasets, take a… See the full description on the dataset page: https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1.
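
    As a quick orientation, here is a minimal sketch of loading the dataset with the Hugging Face datasets library and inspecting one preference pair. The column names (system, prompt, chosen, rejected) are assumptions based on common DPO layouts; check the dataset card for the actual schema.

      from datasets import load_dataset

      # Load the train split from the Hugging Face Hub.
      ds = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")

      # Peek at one preference pair; column names are assumed, not verified.
      row = ds[0]
      for key in ("system", "prompt", "chosen", "rejected"):
          if key in row:
              print(f"{key}: {str(row[key])[:120]}")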

  2. orpo-dpo-mix-40k

    • huggingface.co
    Updated: Apr 18, 2024
    Authors: Maxime Labonne
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Maxime Labonne (2024). orpo-dpo-mix-40k [Dataset]. https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k

    Description

    ORPO-DPO-mix-40k v1.2

    This dataset is designed for ORPO or DPO training. See Fine-tune Llama 3 with ORPO for more information about how to use it. It is a combination of the following high-quality DPO datasets:

    • argilla/Capybara-Preferences: highly scored chosen answers >=5 (7,424 samples)
    • argilla/distilabel-intel-orca-dpo-pairs: highly scored chosen answers >=9, not in GSM8K (2,299 samples)
    • argilla/ultrafeedback-binarized-preferences-cleaned: highly scored chosen answers >=5 (22… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k.
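
    Since the dataset is meant for DPO training, a minimal TRL sketch may help. This is an illustrative setup, not the author's recipe; the exact DPOConfig/DPOTrainer arguments vary across TRL versions, and the model name is a placeholder.

      from datasets import load_dataset
      from transformers import AutoModelForCausalLM, AutoTokenizer
      from trl import DPOConfig, DPOTrainer

      model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any small chat model works for a smoke test
      model = AutoModelForCausalLM.from_pretrained(model_name)
      tokenizer = AutoTokenizer.from_pretrained(model_name)

      train_dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

      # DPOTrainer expects "chosen"/"rejected" columns; this dataset stores them
      # as lists of chat messages, which recent TRL versions accept directly.
      args = DPOConfig(output_dir="dpo-mix-40k-test", per_device_train_batch_size=1, max_steps=10)
      trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset, processing_class=tokenizer)
      trainer.train()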

  3. Human-Like-DPO-Dataset

    • huggingface.co
    Updated: May 19, 2024
    Authored and provided by: Human-Like LLMs
    License: Llama 3 (https://choosealicense.com/licenses/llama3/)
    Cite: Human-Like LLMs (2024). Human-Like-DPO-Dataset [Dataset]. https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset

    Description

    Enhancing Human-Like Responses in Large Language Models

    🤗 Models | 📊 Dataset | 📄 Paper

      Human-Like-DPO-Dataset
    

    This dataset was created as part of research aimed at improving conversational fluency and engagement in large language models. It is suitable for formats like Direct Preference Optimization (DPO) to guide models toward generating more human-like responses. The dataset includes 10,884 samples across 256 topics, including Technology, Daily Life, Science… See the full description on the dataset page: https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset.

  4. toxic-dpo-v0.2

    • huggingface.co
    Updated: Jan 11, 2024
    Authored and provided by: unalignment
    License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Cite: toxic-dpo-v0.2 [Dataset]. https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2

    Description

    Toxic-DPO

    This is a highly toxic, "harmful" dataset meant to illustrate how direct preference optimization (DPO) can be used to de-censor/unalign a model quite easily with very few examples. Many of the examples still contain some amount of warnings/disclaimers, so it is still somewhat editorialized.

      Usage restriction
    

    To use this data, you must acknowledge/agree to the following:

    data contained within is "toxic"/"harmful", and contains profanity and other types… See the full description on the dataset page: https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2.

  5. dataset-tldr-preference-dpo

    • huggingface.co
    Updated: Jun 26, 2024
    Authors: Daniel van Strien
    Cite: dataset-tldr-preference-dpo [Dataset]. https://huggingface.co/datasets/davanstrien/dataset-tldr-preference-dpo

    Description

    Dataset Card for dataset-tldr-preference-dpo

    This dataset has been created with distilabel.

      Dataset Summary
    

    This is a dataset intended for training models using DPO/ORPO for the task of producing concise tl;dr summaries of machine learning datasets based on their dataset cards. The dataset was created with distilabel. Each row of the dataset contains a dataset card which has been parsed to remove empty sections and placeholder text. The instruction request… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/dataset-tldr-preference-dpo.

  6. MemGPT-DPO-Dataset

    • huggingface.co
    Updated: Feb 8, 2024
    Authored and provided by: MemGPT
    Cite: MemGPT (2024). MemGPT-DPO-Dataset [Dataset]. https://huggingface.co/datasets/MemGPT/MemGPT-DPO-Dataset

    Description

    MemGPT-DPO-Dataset is our initial release of a potential series of datasets. Please check the "Files" tab for other languages!

      Details
    

    The dataset is synthetically generated by GPT-4, led by @starsnatched and @cpacker. This dataset is intended to be used with text-generation models, such as Mistral-7B-Instruct. The dataset allows the LLM to learn to use MemGPT-specific tools.

      → Features
    

    Teaches an LLM to prefer one function over another.

      → Dataset size & splits… See the full description on the dataset page: https://huggingface.co/datasets/MemGPT/MemGPT-DPO-Dataset.
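
    For intuition, here is a purely hypothetical record shape for a function-calling preference pair; the schema, field names, and function names below are illustrative assumptions, not the dataset's verified layout.

      # Hypothetical record: the chosen response calls a suitable MemGPT-style
      # tool, while the rejected response picks a less appropriate one.
      example = {
          "prompt": "Remember that my favorite color is blue.",
          "chosen": '{"function": "core_memory_append", "params": {"content": "Favorite color: blue"}}',
          "rejected": '{"function": "send_message", "params": {"message": "Okay, noted!"}}',
      }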
    
  7. Ling-Coder-DPO

    • huggingface.co
    Updated: Oct 17, 2019
    Authored and provided by: inclusionAI
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: inclusionAI (2019). Ling-Coder-DPO [Dataset]. https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO

    Description

    🤗 Hugging Face 🤖 ModelScope 🖥️ GitHub

      Ling-Coder Dataset
    

    The Ling-Coder Dataset comprises the following components:

    • Ling-Coder-SFT: a subset of SFT data used for training Ling-Coder Lite, containing more than 5 million samples.
    • Ling-Coder-DPO: a subset of DPO data used for training Ling-Coder Lite, containing 250k samples.
    • Ling-Coder-SyntheticQA: a subset of synthetic data used for annealing training of Ling-Coder Lite, containing more… See the full description on the dataset page: https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO.

  8. Multifaceted-Collection-DPO

    • huggingface.co
    Updated: Jun 7, 2024
    Authored and provided by: KAIST AI
    License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Cite: Multifaceted-Collection-DPO [Dataset]. https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-DPO

    Description

    Dataset Card for Multifaceted Collection DPO

      Links for Reference
    

    • Homepage: https://lklab.kaist.ac.kr/Janus/
    • Repository: https://github.com/kaistAI/Janus
    • Paper: https://arxiv.org/abs/2405.17977
    • Point of Contact: suehyunpark@kaist.ac.kr

      TL;DR
    

    Multifaceted Collection is a preference dataset for aligning LLMs to diverse human preferences, where system messages are used to represent individual preferences. The instructions are acquired from five existing… See the full description on the dataset page: https://huggingface.co/datasets/kaist-ai/Multifaceted-Collection-DPO.
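
    Because the preference signal lives in the system message here, a record sketch may clarify the idea. This is an illustrative assumption about the shape of the data, not the verified schema; see the dataset card.

      # Illustrative record: the system message encodes an individual preference
      # that the chosen response satisfies and the rejected response ignores.
      record = {
          "system": "You are an assistant who answers concisely and cites primary sources.",
          "prompt": "Explain why the sky is blue.",
          "chosen": "Rayleigh scattering: shorter (blue) wavelengths scatter more strongly... [Strutt, 1871]",
          "rejected": "The sky is blue because of the way sunlight interacts with the air. It's quite pretty!",
      }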

  9. Snorkel-Mistral-PairRM-DPO-Dataset

    • huggingface.co
    Updated: Jan 24, 2024
    Authored and provided by: Snorkel AI
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Snorkel AI (2024). Snorkel-Mistral-PairRM-DPO-Dataset [Dataset]. https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset

    Description

    Dataset:

    This is the data used for training the Snorkel model. We use ONLY the prompts from UltraFeedback; no external LLM responses are used.

      Methodology:
    

    1. Generate 5 response variations for each prompt from a subset of 20,000 using the LLM (to start, we used Mistral-7B-Instruct-v0.2).
    2. Apply PairRM for response reranking.
    3. Update the LLM by applying Direct Preference Optimization (DPO) on the top (chosen) and bottom (rejected) responses.
    4. Use this LLM as the base model for the next… See the full description on the dataset page: https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset.
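
    A schematic of that generate-rerank-train loop follows. generate_responses, pairrm_rank, and dpo_update are hypothetical helpers standing in for, e.g., vLLM sampling, llm-blender's PairRM, and TRL's DPOTrainer; this sketches the control flow, not Snorkel's actual code.

      # Schematic of the iterative PairRM + DPO loop described above.
      def iterate_pairrm_dpo(model, prompts, num_iterations=3, k=5):
          for _ in range(num_iterations):
              pairs = []
              for prompt in prompts:
                  candidates = generate_responses(model, prompt, n=k)  # sample k variations
                  ranked = pairrm_rank(prompt, candidates)             # best first
                  # Chosen = top-ranked response, rejected = bottom-ranked.
                  pairs.append({"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]})
              model = dpo_update(model, pairs)                         # one round of DPO
          return model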

  10. h4-tests-format-dpo-dataset

    • huggingface.co
    Updated: Mar 10, 2024
    Provided by: Hugging Face (https://huggingface.co/)
    Authors: Hugging Face H4
    Cite: h4-tests-format-dpo-dataset [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/h4-tests-format-dpo-dataset

    Description

    HuggingFaceH4/h4-tests-format-dpo-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. orpo-dpo-mix-40k-flat

    • huggingface.co
    Updated: Jun 7, 2024
    Authors: Maxime Labonne
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Maxime Labonne (2024). orpo-dpo-mix-40k-flat [Dataset]. https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k-flat

    Description

    ORPO-DPO-mix-40k-flat

    This dataset is designed for ORPO or DPO training. See Uncensor any LLM with Abliteration for more information about how to use it. This version uses raw text instead of the lists of dicts found in the original version, which makes it easier to parse in Axolotl, especially for DPO. ORPO-DPO-mix-40k-flat is a combination of the following high-quality DPO datasets:

    • argilla/Capybara-Preferences: highly scored chosen answers >=5 (7,424 samples)… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k-flat.
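
    To illustrate the flat format, here is a sketch of collapsing a conversational record (lists of role/content dicts) into raw text. The column names and message layout are assumptions based on the original dataset's description, so verify them against the actual schema.

      # Sketch: flatten conversational "chosen"/"rejected" fields (lists of
      # {"role": ..., "content": ...} dicts) into raw text, as in the flat variant.
      def flatten(example):
          def last_assistant_text(messages):
              # Keep the final assistant turn as the raw-text response.
              return next(m["content"] for m in reversed(messages) if m["role"] == "assistant")

          return {
              "prompt": example["chosen"][0]["content"],  # assumed: first turn is the user prompt
              "chosen": last_assistant_text(example["chosen"]),
              "rejected": last_assistant_text(example["rejected"]),
          }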

  12. distilabel-capybara-dpo-7k-binarized

    • huggingface.co
    Updated: Jan 31, 2024
    Authored and provided by: Argilla
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Argilla (2024). distilabel-capybara-dpo-7k-binarized [Dataset]. https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized

    Description

    Capybara-DPO 7K binarized

    A DPO dataset built with distilabel atop the awesome LDJnr/Capybara

    This is a preview version to collect feedback from the community. v2 will include the full base dataset and responses from more powerful models.

      Why?
    

    Multi-turn dialogue data is key to fine-tuning capable chat models. Multi-turn preference data has been used by the most relevant RLHF works (Anthropic, Meta Llama 2, etc.). Unfortunately, there are very few… See the full description on the dataset page: https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized.
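
    For intuition, a sketch of what a multi-turn binarized preference record typically looks like: a shared conversation history with chosen and rejected variants of the final assistant turn. Field names are illustrative assumptions, not this dataset's verified schema.

      # Illustrative multi-turn preference record: chosen/rejected differ only
      # in the final assistant turn, after a shared conversation history.
      record = {
          "history": [
              {"role": "user", "content": "What is DPO?"},
              {"role": "assistant", "content": "Direct Preference Optimization is..."},
              {"role": "user", "content": "How does it differ from PPO-based RLHF?"},
          ],
          "chosen": {"role": "assistant", "content": "Unlike PPO, DPO needs no separate reward model..."},
          "rejected": {"role": "assistant", "content": "They are basically the same thing."},
      }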

  13. LLM-QE-DPO-Training-Data

    • huggingface.co
    Updated: Mar 12, 2025
    Authors: chengpingan
    Cite: chengpingan (2025). LLM-QE-DPO-Training-Data [Dataset]. https://huggingface.co/datasets/chengpingan/LLM-QE-DPO-Training-Data

    Description

    chengpingan/LLM-QE-DPO-Training-Data dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. distilabel-math-preference-dpo

    • huggingface.co
    Updated: Nov 22, 2023
    Authored and provided by: Argilla
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: Argilla (2023). distilabel-math-preference-dpo [Dataset]. https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo

    Description

    Dataset Card for "distilabel-math-preference-dpo"

    More Information needed

  15. distilabel-intel-orca-dpo-pairs

    • huggingface.co
    Updated: Dec 11, 2024
    Authored and provided by: Argilla
    License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Cite: distilabel-intel-orca-dpo-pairs [Dataset]. https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs

    Description

    distilabel Orca Pairs for DPO

    The dataset is a "distilabeled" version of the widely used Intel/orca_dpo_pairs dataset. The original has been used by hundreds of open-source practitioners and models. We knew from fixing UltraFeedback (and before that, Alpacas and Dollys) that this dataset could be highly improved. Continuing our mission to build the best alignment datasets for open-source LLMs and the community, we spent a few hours improving it with… See the full description on the dataset page: https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs.
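
    Entry 2 above filters this dataset down to highly scored chosen answers (>=9) that are not in GSM8K. A sketch of such a filter follows; the column names "rating" and "in_gsm8k_train" are assumptions, so check the dataset card before relying on them.

      from datasets import load_dataset

      ds = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

      # Assumed columns: "rating" (scores for the paired responses) and
      # "in_gsm8k_train" (GSM8K contamination flag).
      filtered = ds.filter(
          lambda row: row["rating"] is not None
          and max(row["rating"]) >= 9
          and not row["in_gsm8k_train"]
      )
      print(len(filtered), "high-scoring, GSM8K-free pairs")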

  16. Math-Step-DPO-10K

    • huggingface.co
    Updated: Jul 1, 2024
    Authors: Xin Lai
    Cite: Xin Lai (2024). Math-Step-DPO-10K [Dataset]. https://huggingface.co/datasets/xinlai/Math-Step-DPO-10K

    Description

    Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

    🖥️ Code | 🤗 Data | 📄 Paper. This repo contains the Math-Step-DPO-10K dataset for our paper Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs. Step-DPO is a simple, effective, and data-efficient method for boosting the mathematical reasoning ability of LLMs. Notably, Step-DPO, when applied to Qwen2-72B-Instruct, achieves scores of 70.8% and 94.0% on the test sets of MATH and GSM8K… See the full description on the dataset page: https://huggingface.co/datasets/xinlai/Math-Step-DPO-10K.
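
    The step-wise idea is that preference is expressed over the next reasoning step given a shared prefix, rather than over whole solutions. A purely illustrative record shape follows; the actual columns in Math-Step-DPO-10K may differ.

      # Illustrative Step-DPO record: chosen/rejected are competing *next steps*
      # of a solution, conditioned on the problem and the shared initial steps.
      record = {
          "prompt": "Natalia sold clips to 48 friends in April, then half as many in May. How many in total?",
          "initial_reason_steps": "Step 1: In May she sold 48 / 2 = 24 clips.\n",
          "chosen": "Step 2: In total she sold 48 + 24 = 72 clips.",
          "rejected": "Step 2: In total she sold 48 * 24 = 1152 clips.",
      }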

  17. gutenberg-dpo-v0.1

    • huggingface.co
    Updated: Jan 11, 2024
    Authors: Jon Durbin
    License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically
    Cite: Jon Durbin (2024). gutenberg-dpo-v0.1 [Dataset]. https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1

    Description

    Gutenberg DPO

      Overview
    

    This is a dataset meant to enhance the novel-writing capabilities of LLMs by using public domain books from Project Gutenberg.

      Process
    

    First, each book is parsed, split into chapters, and cleaned up from the original format (removing superfluous newlines, illustration tags, etc.). Once we have chapters, an LLM is prompted with each chapter to create a synthetic prompt that would result in that chapter being written. Each chapter has a summary… See the full description on the dataset page: https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1.
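
    A schematic of that process follows. split_into_chapters, clean_text, and ask_llm are hypothetical helpers, and pairing the human-written chapter as "chosen" against a model draft as "rejected" is an assumption based on the dataset's stated goal, not something confirmed by the excerpt above.

      # Schematic of the chapter -> synthetic-prompt pipeline described above.
      def build_gutenberg_pairs(book_text, base_model):
          pairs = []
          for chapter in split_into_chapters(clean_text(book_text)):
              # Reverse-engineer a writing prompt that would yield this chapter.
              prompt = ask_llm(f"Write a prompt that would produce this chapter:\n{chapter}")
              # Assumed pairing: human chapter as chosen, model draft as rejected.
              rejected = ask_llm(prompt, model=base_model)
              pairs.append({"prompt": prompt, "chosen": chapter, "rejected": rejected})
          return pairs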

  18. dpo-dataset

    • huggingface.co
    Updated: Aug 8, 2024
    Authors: Sam
    Cite: Sam (2024). dpo-dataset [Dataset]. https://huggingface.co/datasets/srbdtwentyfour/dpo-dataset

    Description

    srbdtwentyfour/dpo-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. ChemPref-DPO-for-Chemistry-data-en

    • huggingface.co
    Updated: Apr 11, 2024
    Authored and provided by: AI4Chem
    License: MIT (https://opensource.org/licenses/MIT); license information was derived automatically
    Cite: AI4Chem (2024). ChemPref-DPO-for-Chemistry-data-en [Dataset]. https://huggingface.co/datasets/AI4Chem/ChemPref-DPO-for-Chemistry-data-en

    Description

    Citation

    @misc{zhang2024chemllm,
      title={ChemLLM: A Chemical Large Language Model},
      author={Di Zhang and Wei Liu and Qian Tan and Jingdan Chen and Hang Yan and Yuliang Yan and Jiatong Li and Weiran Huang and Xiangyu Yue and Dongzhan Zhou and Shufei Zhang and Mao Su and Hansen Zhong and Yuqiang Li and Wanli Ouyang},
      year={2024},
      eprint={2402.06852},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
    }

  20. alpaca-vs-alpaca-orpo-dpo

    • huggingface.co
    Updated: Feb 6, 2024
    Authors: Edoardo Federici
    Cite: Edoardo Federici (2024). alpaca-vs-alpaca-orpo-dpo [Dataset]. https://huggingface.co/datasets/efederici/alpaca-vs-alpaca-orpo-dpo

    Description

    Alpaca vs. Alpaca

      Dataset Description
    

    The Alpaca vs. Alpaca dataset is a curated blend of the Alpaca dataset and the Alpaca GPT-4 dataset, both available on Hugging Face Datasets. It uses the standard GPT dataset as the 'rejected' answer, steering the model towards the GPT-4 answer, which is considered the 'chosen' one. However, it's important to note that 'correctness' here is not absolute. The premise is based on the assumption that GPT-4 answers are generally… See the full description on the dataset page: https://huggingface.co/datasets/efederici/alpaca-vs-alpaca-orpo-dpo.
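
    A sketch of the pairing idea: same instruction, GPT-4 output as chosen, original Alpaca output as rejected. The repository IDs below (tatsu-lab/alpaca, vicgalle/alpaca-gpt4) are plausible sources, not ones named in this listing, and the shared "instruction"/"output" columns are assumptions.

      from datasets import load_dataset

      alpaca = load_dataset("tatsu-lab/alpaca", split="train")
      alpaca_gpt4 = load_dataset("vicgalle/alpaca-gpt4", split="train")

      # Pair rows by instruction: GPT-4 answer is "chosen", GPT answer is "rejected".
      pairs = [
          {"prompt": a["instruction"], "chosen": g["output"], "rejected": a["output"]}
          for a, g in zip(alpaca, alpaca_gpt4)
          if a["instruction"] == g["instruction"]
      ]
      print(len(pairs), "preference pairs")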
