96 datasets found
  1. Human-Like-DPO

    • huggingface.co
    Updated Jan 22, 2025
    Cite
    MLX Community (2025). Human-Like-DPO [Dataset]. https://huggingface.co/datasets/mlx-community/Human-Like-DPO
    Explore at:
    Dataset updated
    Jan 22, 2025
    Dataset authored and provided by
    MLX Community
    Description

    Human-Like DPO Test Dataset

    This repository provides a test dataset for Direct Preference Optimization (DPO) training, derived from the Human-Like DPO Dataset by HumanLLMs. The dataset is designed for experimentation and evaluation of DPO models in smaller-scale scenarios.

      Dataset Overview
    

    The dataset comprises a total of 1,000 examples, divided as follows:

    Training Set: 800 examples (train.json)
    Validation Set: 100 examples (validation.json)
    Test Set: 100 examples… See the full description on the dataset page: https://huggingface.co/datasets/mlx-community/Human-Like-DPO.
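
    For quick experimentation, a minimal sketch of loading the dataset with the Hugging Face datasets library; the split names are assumptions based on the train.json / validation.json / test.json layout described above, not taken from the dataset card:

    ```python
    from datasets import load_dataset

    # Load directly from the Hub. Split names are assumptions based on the
    # train.json / validation.json / test.json layout described above.
    ds = load_dataset("mlx-community/Human-Like-DPO")
    print(ds)  # expect roughly 800/100/100 train/validation/test examples
    ```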

  2. mmlu-auxilary-train-dpo

    • huggingface.co
    Updated Dec 18, 2023
    Cite
    xzuyn (2023). mmlu-auxilary-train-dpo [Dataset]. https://huggingface.co/datasets/xzuyn/mmlu-auxilary-train-dpo
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 18, 2023
    Authors
    xzuyn
    Description

    MMLU (GitHub). Only the auxiliary test set was used. I have not checked for similarity or contamination, but it's something I need to figure out soon. Each example has a randomized starting message indicating that it is a multiple-choice question and that the response must be a single letter. For the rejected response, I randomly chose an incorrect answer, or randomly chose any answer written out in full rather than as a single letter. This was done to hopefully teach a model how to properly follow the task of answering a… See the full description on the dataset page: https://huggingface.co/datasets/xzuyn/mmlu-auxilary-train-dpo.
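
    The construction described above can be illustrated with a short sketch. The record layout (`question`, `choices`, `answer_index`) is hypothetical, not the dataset's actual schema:

    ```python
    import random

    # Hypothetical MMLU-style record; field names are assumptions.
    example = {
        "question": "What is the capital of France?",
        "choices": ["London", "Paris", "Rome", "Berlin"],
        "answer_index": 1,
    }

    letters = ["A", "B", "C", "D"]
    chosen = letters[example["answer_index"]]  # correct single-letter answer

    # Rejected response: either an incorrect letter, or any answer written
    # out in full rather than as a single letter.
    if random.random() < 0.5:
        wrong = [l for i, l in enumerate(letters) if i != example["answer_index"]]
        rejected = random.choice(wrong)
    else:
        rejected = random.choice(example["choices"])

    pair = {"prompt": example["question"], "chosen": chosen, "rejected": rejected}
    print(pair)
    ```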

  3. dpo_data

    • huggingface.co
    Updated Dec 28, 2024
    Cite
    Share4oReasoning (2024). dpo_data [Dataset]. https://huggingface.co/datasets/Share4oReasoning/dpo_data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 28, 2024
    Dataset authored and provided by
    Share4oReasoning
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    ShareGPT4oReasoning Training Data DPO

    All datasets and models can be found at Share4oReasoning.

      Contents:
    

    DPO_preview: contains model-generated CoT judged by outcome reward.

    Images (same as in the SFT repo): contains the zipped image data (see below for details) used for the SFT above.

    Inference and instruction for DPO: uploading now. For the training pipeline, refer to LLaVA-Reasoner-DPO training; TODO: separate README for setup and training.

      Set up:
    

    git clone… See the full description on the dataset page: https://huggingface.co/datasets/Share4oReasoning/dpo_data.

  4. Orca DPO Dialogue Pairs

    • kaggle.com
    zip
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). Orca DPO Dialogue Pairs [Dataset]. https://www.kaggle.com/datasets/thedevastator/intel-orca-dialogue-pairs/code
    Explore at:
    Available download formats: zip (11,090,412 bytes)
    Dataset updated
    Nov 23, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Intel Orca Dialogue Pairs

    Orca style for preference training (Intel's DPO dataset)

    By Huggingface Hub [source]

    About this dataset

    The Intel/Orca/DPO Dialogue Pairs dataset is a resource for natural language processing (NLP) research, combining AI and human conversations collected from online sources. It is useful for exploring how human conversations can inform the development of conversational AI models. With columns such as System and Question extracted from chat logs, the dataset can help researchers understand how to better connect people with technology through meaningful dialogue. The data also includes columns for ChatGPT and Llama2-13b-Chat, two widely used conversational AI models. With this dataset, researchers can explore conversational techniques that enable humans and machines to communicate in natural language.

    More Datasets

    For more datasets, click here.

    How to use the dataset

    This guide will provide an overview of how to use the Intel/Orca/DPO Dialogue Pairs dataset efficiently for human-centric natural language processing research.

    Step 1: Understand the dataset

    The Intel/Orca/DPO Dialogue Pairs dataset is composed of two main columns: System and Question. The System column contains responses from AI systems, and the Question column contains questions asked by humans. The dataset also contains columns for ChatGPT and Llama2-13b-Chat, two models used in developing conversational AI systems.

    Step 2: Prepare your environment

    Before getting started with analyzing data from this dataset, you should first prepare your environment accordingly. Make sure that any necessary libraries or services are installed on your machine before attempting to work with the data from this dataset in order to avoid potential issues or errors during usage.

    Step 3: Access the data

    To access and start working with the data in this dataset, you can either download it directly via a Kaggle account or access it through one of its REST endpoints if available on other services (e.g., Databricks).

    Step 4: Explore and analyze the data

    Step 5: Report results

    Lastly, once explorations and analyses are complete, it is important to report results accurately, especially for dialogue-pair datasets, where disseminating misinformation could have serious consequences. Reporting should declare the standard relevant indicators and apply appropriate statistical tests to rule out incorrect or anomalous outcomes.

    Research Ideas

    • Developing and improving natural language processing algorithms for AI-human conversation.
    • Building user-friendly chatbots that are better at recognizing and understanding human intent by training the model using this dataset.
    • Designing recommendation systems to predict user questions and generate more accurate responses based on previous conversations in the dataset.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub (see the Data Source).

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name     | Description                                                                   |
    |:----------------|:------------------------------------------------------------------------------|
    | system          | Contains the AI system's response to the user's question. (Text)              |
    | chatgpt         | Contains the ChatGPT model's response to the user's question. (Text)          |
    | llama2-13b-chat | Contains the Llama2-13b-Chat model's response to the user's question. (Text)  |
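
    As a starting point for Steps 3 and 4, a minimal sketch of loading and inspecting the file with pandas; the local file path assumes train.csv has already been downloaded from the Kaggle page:

    ```python
    import pandas as pd

    # Assumes train.csv has been downloaded from the Kaggle dataset page.
    df = pd.read_csv("train.csv")

    # Inspect the columns described in the table above.
    print(df.columns.tolist())
    print(df.head())
    ```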


  5. dpo-training-data

    • huggingface.co
    Updated Aug 15, 2007
    Cite
    Zeynep (2007). dpo-training-data [Dataset]. https://huggingface.co/datasets/Tandogan/dpo-training-data
    Explore at:
    Dataset updated
    Aug 15, 2007
    Authors
    Zeynep
    Description

    Tandogan/dpo-training-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Toxic-DPO Dataset

    • resodate.org
    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    Unalignment (2024). Toxic-DPO Dataset [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdG94aWMtZHBvLWRhdGFzZXQ=
    Explore at:
    Dataset updated
    Dec 2, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Unalignment
    Description

    The paper uses the Toxic-DPO dataset for reinforcement learning from human feedback.

  7. dpo-training-dataset

    • huggingface.co
    Updated Jul 15, 2025
    Cite
    Dhruv Harchandani (2025). dpo-training-dataset [Dataset]. https://huggingface.co/datasets/vulcan2506/dpo-training-dataset
    Explore at:
    Dataset updated
    Jul 15, 2025
    Authors
    Dhruv Harchandani
    Description

    vulcan2506/dpo-training-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. dpo checkpoint 180

    • kaggle.com
    zip
    Updated Feb 13, 2024
    Cite
    HinePo (2024). dpo checkpoint 180 [Dataset]. https://www.kaggle.com/datasets/hinepo/dpo-checkpoint-180
    Explore at:
    Available download formats: zip (310,487,423 bytes)
    Dataset updated
    Feb 13, 2024
    Authors
    HinePo
    Description

    Mistral-7B-Instruct-v0.1 DPO finetuned for 180 steps on 'Intel/orca_dpo_pairs' dataset.

    Training was done in Google Colab (A100) and took about 4.5 hours with a batch size of 8.

    Dataset link: https://huggingface.co/datasets/tatsu-lab/alpaca

  9. orpo-dpo-mix-40k

    • huggingface.co
    Updated Apr 18, 2024
    Cite
    Maxime Labonne (2024). orpo-dpo-mix-40k [Dataset]. https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 18, 2024
    Authors
    Maxime Labonne
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    ORPO-DPO-mix-40k v1.2

    This dataset is designed for ORPO or DPO training. See Fine-tune Llama 3 with ORPO for more information about how to use it. It is a combination of the following high-quality DPO datasets:

    argilla/Capybara-Preferences: highly scored chosen answers >=5 (7,424 samples)
    argilla/distilabel-intel-orca-dpo-pairs: highly scored chosen answers >=9, not in GSM8K (2,299 samples)
    argilla/ultrafeedback-binarized-preferences-cleaned: highly scored chosen answers >=5 (22… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k.
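
    A minimal sketch of loading the mix for ORPO/DPO training; the split name and column layout are assumptions based on typical preference datasets, so verify them against the dataset page:

    ```python
    from datasets import load_dataset

    # Split name and column layout are assumptions; check the dataset page.
    ds = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
    print(ds.column_names)  # typically includes "chosen" and "rejected"
    ```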

  10. Self-taught-evaluator-DPO-data

    • huggingface.co
    Updated Sep 27, 2024
    Cite
    AI at Meta (2024). Self-taught-evaluator-DPO-data [Dataset]. https://huggingface.co/datasets/facebook/Self-taught-evaluator-DPO-data
    Explore at:
    Dataset updated
    Sep 27, 2024
    Dataset authored and provided by
    AI at Meta
    License

    Other (https://choosealicense.com/licenses/other/)

    Description

    This dataset is released as part of the Self-taught Evaluators research project. Please refer to our project materials here for training and evaluation details.

      Loading the dataset with transformers
    

    This dataset is built upon WildChat prompts, using Llama-3.1-70B-Instruct to generate responses and evaluation plans. Details on how to build such a self-taught dataset can be found in Self-taught Evaluators. A minimal example of preparing the training data begins: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/facebook/Self-taught-evaluator-DPO-data.
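
    The card's own example is truncated above; as a stand-in, a minimal sketch of loading the data with the datasets library, assuming a default train split (check the dataset page for the actual configuration):

    ```python
    from datasets import load_dataset

    # Assumes a default "train" split; verify the configurations and column
    # names on the dataset page before preparing training data.
    ds = load_dataset("facebook/Self-taught-evaluator-DPO-data", split="train")
    print(ds.column_names)
    print(ds[0])
    ```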

  11. Training Courses under the “Smart Silver” Digital Inclusion Programmes

    • data.gov.hk
    Cite
    data.gov.hk, Training Courses under the “Smart Silver” Digital Inclusion Programmes [Dataset]. https://data.gov.hk/en-data/dataset/hk-dpo-dpo_hp-digital-inclusion-programmes-training-courses
    Explore at:
    Dataset provided by
    data.gov.hk
    Description

    List of training courses provided under the “Smart Silver” digital inclusion programmes. The DPO (Digital Policy Office) launched the “Smart Silver” digital inclusion programmes to help those in need (especially the elderly) understand and use digital technology products and services.

  12. DPO-hh-rlhf

    • huggingface.co
    Updated Jul 10, 2024
    Cite
    Columbia NLP (2024). DPO-hh-rlhf [Dataset]. https://huggingface.co/datasets/Columbia-NLP/DPO-hh-rlhf
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 10, 2024
    Dataset authored and provided by
    Columbia NLP
    Description

    Dataset Card for DPO-hh-rlhf

    Reformatted from the Anthropic/hh-rlhf dataset. The LION-series models are trained using an empirically optimized pipeline that consists of three stages: SFT, DPO, and online preference learning (online DPO). We find that simple techniques, such as sequence packing, loss masking in SFT, increasing the preference dataset size in DPO, and online DPO training, can significantly improve the performance of language models. Our best models (the LION series) exceed the… See the full description on the dataset page: https://huggingface.co/datasets/Columbia-NLP/DPO-hh-rlhf.

  13. LLM-QE-DPO-Training-Data

    • huggingface.co
    Updated Mar 12, 2025
    Cite
    chengpingan (2025). LLM-QE-DPO-Training-Data [Dataset]. https://huggingface.co/datasets/chengpingan/LLM-QE-DPO-Training-Data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 12, 2025
    Authors
    chengpingan
    Description

    chengpingan/LLM-QE-DPO-Training-Data dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. List of online learning resources in the "Smart Silver" Elderly IT Learning...

    • data.gov.hk
    Updated Jul 25, 2024
    Cite
    data.gov.hk (2024). List of online learning resources in the "Smart Silver" Elderly IT Learning Portal [Dataset]. https://data.gov.hk/en-data/dataset/hk-dpo-dpo_hp-online-learning-resources-in-the-elderly-it-learning-portal
    Explore at:
    Dataset updated
    Jul 25, 2024
    Dataset provided by
    data.gov.hk
    Description

    List of online learning resources in the "Smart Silver" Elderly IT Learning Portal and the respective links.

  15. Ling-Coder-DPO

    • huggingface.co
    Updated Oct 17, 2019
    Cite
    inclusionAI (2019). Ling-Coder-DPO [Dataset]. https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO
    Explore at:
    Dataset updated
    Oct 17, 2019
    Dataset authored and provided by
    inclusionAI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    🤗 Hugging Face 🤖 ModelScope 🖥️ GitHub

      Ling-Coder Dataset
    

    The Ling-Coder Dataset comprises the following components:

    Ling-Coder-SFT: a subset of SFT data used for training Ling-Coder Lite, containing more than 5 million samples.
    Ling-Coder-DPO: a subset of DPO data used for training Ling-Coder Lite, containing 250k samples.
    Ling-Coder-SyntheticQA: a subset of synthetic data used for annealing training of Ling-Coder Lite, containing more… See the full description on the dataset page: https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO.

  16. Data_Sheet_1_Analysis of an “international teaching practicum” as a program...

    • frontiersin.figshare.com
    • figshare.com
    docx
    Updated Aug 31, 2023
    Cite
    Youn Ock Kim; Soyoung Yun; Yang Hwan Sol (2023). Data_Sheet_1_Analysis of an “international teaching practicum” as a program for achieving “teacher agency” and strengthening “technological pedagogical content knowledge”.docx [Dataset]. http://doi.org/10.3389/feduc.2023.1200092.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Aug 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Youn Ock Kim; Soyoung Yun; Yang Hwan Sol
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: This research aimed to study the achievement of teacher agency by Korean pre-service teachers through their experiences in an international teaching practicum (ITP). The study also examined the domestic pre-orientation (DPO) program of G University to explore the possibility of developing a TPACK-strengthening program. The participants were five female Korean pre-service teachers who took part in the ITP in 2015-2016.

    Methods: The data were collected at two different times: the teaching diaries and ITP reports from the Korean pre-service teachers, the DPO teaching materials, and the program instructor's field notes were collected in 2016, and the interviews were conducted in 2018.

    Results and discussion: To study their achievement of teacher agency, the three chordal triads of agency (the iterational, practical-evaluative, and projective dimensions) were used as the lens for analyzing the data. In addition, the DPO program was analyzed against the elements of TPACK competencies. The research shows that the ITP was a trigger experience for the Korean pre-service teachers in terms of achieving teacher agency. The participants could project their aspirations and then decide on and execute what they had learned from the ITP in their actual Korean classrooms. The need to reconstruct the DPO program so that it can assist pre-service teachers' TPACK achievement has also been raised.

  17. DPO

    • huggingface.co
    Updated May 28, 2025
    Cite
    Madeleine Hueber (2025). DPO [Dataset]. https://huggingface.co/datasets/madhueb/DPO
    Explore at:
    Dataset updated
    May 28, 2025
    Authors
    Madeleine Hueber
    Description

    This dataset was created for training and evaluating a DPO-based language model in the context of STEM questions. It supports both instruction tuning and preference-based fine-tuning using the DPO framework. The dataset was developed for the M2 deliverable of the CS-552 course Modern Natural Language Processing.

      Dataset structure :
    

    train: For DPO training (chosen/rejected pairs). This data comes from Milestone 1 (pref pairs collected by students) or from mlabonne/orpo-dpo-mix-40k… See the full description on the dataset page: https://huggingface.co/datasets/madhueb/DPO.

  18. Peace and Security Pillar: UN Peacekeeping Training Gender Aggregated Data

    • data.humdata.org
    csv
    Updated Oct 28, 2025
    Cite
    United Nations Peace and Security Data Hub (2025). Peace and Security Pillar: UN Peacekeeping Training Gender Aggregated Data [Dataset]. https://data.humdata.org/dataset/dpo-pktraining
    Explore at:
    Available download formats: csv (1,599 bytes)
    Dataset updated
    Oct 28, 2025
    Dataset provided by
    United Nations (http://un.org/)
    United Nations Peacekeeping Forces (http://un.org/)
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Area covered
    United Nations
    Description

    The following gender-disaggregated training data is organized annually, covering the period from 1 July to 30 June. The data represent military, police, and civilian training.

    Member States are responsible for delivering pre-deployment training (PDT) to all units and personnel provided to UN peacekeeping operations. ITS delivers training-of-trainers courses for Member State trainers to build national capacity to deliver training to UN standards. Civilian Pre-Deployment Training (CPT) improves the preparedness and effectiveness of civilian peacekeepers; ITS has a dedicated team that delivers CPT at the UN Regional Service Centre in Entebbe, Uganda. Senior Leadership Training targets the highest levels of field mission leadership (SRSG, DSRSG, Force Commander or Head of Military Component, Police Commissioner, and Director of Mission Support) to provide them with the knowledge needed to lead and manage field missions.

    This dataset is managed by the Integrated Training Service of the UN Department of Peace Operations.

  19. HelpSteer2-DPO-Atsunori

    • huggingface.co
    Updated Jul 11, 2024
    Cite
    GenRM: Generative Reward Models (2024). HelpSteer2-DPO-Atsunori [Dataset]. https://huggingface.co/datasets/GenRM/HelpSteer2-DPO-Atsunori
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset authored and provided by
    GenRM: Generative Reward Models
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a conversion of nvidia/HelpSteer2 into preference pairs, based on the helpfulness score, for DPO training. HelpSteer2-DPO is also licensed under CC-BY-4.0.

      Dataset Description
    

    In accordance with the paper HelpSteer2: Open-source dataset for training top-performing reward models, we converted the nvidia/HelpSteer2 dataset into a preference dataset by taking the response with the higher helpfulness score as the chosen response, with the remaining response being the… See the full description on the dataset page: https://huggingface.co/datasets/GenRM/HelpSteer2-DPO-Atsunori.
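
    A sketch of the conversion described above. It assumes HelpSteer2 stores two consecutive rows per prompt, each carrying a "helpfulness" score; verify this layout against the actual dataset before use:

    ```python
    from datasets import load_dataset

    # Assumption: two consecutive rows per prompt, each with a
    # "helpfulness" score; check the actual HelpSteer2 layout first.
    ds = load_dataset("nvidia/HelpSteer2", split="train")

    pairs = []
    for i in range(0, len(ds) - 1, 2):
        a, b = ds[i], ds[i + 1]
        if a["prompt"] != b["prompt"] or a["helpfulness"] == b["helpfulness"]:
            continue  # skip mismatched rows and ties
        chosen, rejected = (a, b) if a["helpfulness"] > b["helpfulness"] else (b, a)
        pairs.append({
            "prompt": a["prompt"],
            "chosen": chosen["response"],
            "rejected": rejected["response"],
        })

    print(len(pairs), "preference pairs")
    ```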

  20. Training Courses under the “Smart Silver” Digital Inclusion Programmes (Simplified Chinese version)

    • data.gov.hk
    Updated Dec 2, 2025
    Cite
    data.gov.hk (2025). Training Courses under the “Smart Silver” Digital Inclusion Programmes (「友智识」数码共融计划培训课程) [Dataset]. https://data.gov.hk/sc-data/dataset/hk-dpo-dpo_hp-digital-inclusion-programmes-training-courses
    Explore at:
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    data.gov.hk (資料一線通)
    Description

    List of training courses under the “Smart Silver” digital inclusion programmes. The Digital Policy Office launched the “Smart Silver” digital inclusion programmes to help those in need (especially the elderly) learn about and use digital technology products and services.
