96 datasets found
  1. Human-Like-DPO

    • huggingface.co
    Updated Jan 22, 2025
    Cite
    MLX Community (2025). Human-Like-DPO [Dataset]. https://huggingface.co/datasets/mlx-community/Human-Like-DPO
    Explore at:
    Dataset updated
    Jan 22, 2025
    Dataset authored and provided by
    MLX Community
    Description

    Human-Like DPO Test Dataset

    This repository provides a test dataset for Direct Preference Optimization (DPO) training, derived from the Human-Like DPO Dataset by HumanLLMs. The dataset is designed for experimentation and evaluation of DPO models in smaller-scale scenarios.

      Dataset Overview
    

    The dataset comprises a total of 1,000 examples, divided as follows:

    Training Set: 800 examples (train.json)
    Validation Set: 100 examples (validation.json)
    Test Set: 100 examples… See the full description on the dataset page: https://huggingface.co/datasets/mlx-community/Human-Like-DPO.
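
    For quick experimentation, a minimal sketch of loading the dataset with the Hugging Face datasets library; the split names are assumptions based on the train.json / validation.json / test.json layout described above, not taken from the dataset card:

    ```python
    from datasets import load_dataset

    # Load directly from the Hub. Split names are assumptions based on the
    # train.json / validation.json / test.json layout described above.
    ds = load_dataset("mlx-community/Human-Like-DPO")
    print(ds)  # expect roughly 800/100/100 train/validation/test examples
    ```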

  2. mmlu-auxilary-train-dpo

    • huggingface.co
    Updated Dec 18, 2023
    Cite
    xzuyn (2023). mmlu-auxilary-train-dpo [Dataset]. https://huggingface.co/datasets/xzuyn/mmlu-auxilary-train-dpo
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 18, 2023
    Authors
    xzuyn
    Description

    MMLU (GitHub). Only the auxiliary test set was used. I have not checked for similarity or contamination, but it's something I need to figure out soon. Each example has a randomized starting message indicating that it is a multiple-choice question and that the response must be a single letter. For the rejected response, I randomly chose an incorrect answer, or randomly chose any answer written out in full rather than as a single letter. This was done to hopefully teach a model how to properly follow the task of answering a… See the full description on the dataset page: https://huggingface.co/datasets/xzuyn/mmlu-auxilary-train-dpo.
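
    The construction described above can be illustrated with a short sketch. The record layout (`question`, `choices`, `answer_index`) is hypothetical, not the dataset's actual schema:

    ```python
    import random

    # Hypothetical MMLU-style record; field names are assumptions.
    example = {
        "question": "What is the capital of France?",
        "choices": ["London", "Paris", "Rome", "Berlin"],
        "answer_index": 1,
    }

    letters = ["A", "B", "C", "D"]
    chosen = letters[example["answer_index"]]  # correct single-letter answer

    # Rejected response: either an incorrect letter, or any answer written
    # out in full rather than as a single letter.
    if random.random() < 0.5:
        wrong = [l for i, l in enumerate(letters) if i != example["answer_index"]]
        rejected = random.choice(wrong)
    else:
        rejected = random.choice(example["choices"])

    pair = {"prompt": example["question"], "chosen": chosen, "rejected": rejected}
    print(pair)
    ```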

  3. dpo_data

    • huggingface.co
    Updated Dec 28, 2024
    Cite
    Share4oReasoning (2024). dpo_data [Dataset]. https://huggingface.co/datasets/Share4oReasoning/dpo_data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 28, 2024
    Dataset authored and provided by
    Share4oReasoning
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    ShareGPT4oReasoning Training Data DPO

    All datasets and models can be found at Share4oReasoning.

      Contents:
    

    DPO_preview: contains model-generated CoT judged by outcome reward.

    Images (same as in the SFT repo): contains the zipped image data (see below for details) used for the SFT above.

    Inference and instruction for DPO: uploading now. For the training pipeline, refer to LLaVA-Reasoner-DPO training; TODO: separate README for setup and training.

      Set up:
    

    git clone… See the full description on the dataset page: https://huggingface.co/datasets/Share4oReasoning/dpo_data.

  4. Orca DPO Dialogue Pairs

    • kaggle.com
    zip
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). Orca DPO Dialogue Pairs [Dataset]. https://www.kaggle.com/datasets/thedevastator/intel-orca-dialogue-pairs/code
    Explore at:
    Available download formats: zip (11,090,412 bytes)
    Dataset updated
    Nov 23, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Intel Orca Dialogue Pairs

    Orca style for preference training (Intel's DPO dataset)

    By Huggingface Hub [source]

    About this dataset

    The Intel/Orca/DPO Dialogue Pairs dataset is a resource for natural language processing (NLP) research, combining AI and human conversations collected from online sources. It is useful for exploring how human conversations can inform the development of conversational AI models. With columns such as System and Question extracted from chat logs, the dataset can help researchers understand how to better connect people with technology through meaningful dialogue. The data also includes columns for ChatGPT and Llama2-13b-Chat, two widely used conversational AI models. With this dataset, researchers can explore conversational techniques that enable humans and machines to communicate in natural language.

    More Datasets

    For more datasets, click here.

    How to use the dataset

    This guide will provide an overview of how to use the Intel/Orca/DPO Dialogue Pairs dataset efficiently for human-centric natural language processing research.

    Step 1: Understand the dataset

    The Intel/Orca/DPO Dialogue Pairs dataset is composed of two main columns: System and Question. The System column contains responses from AI systems, and the Question column contains questions asked by humans. The dataset also contains columns for ChatGPT and Llama2-13b-Chat, two models used in developing conversational AI systems.

    Step 2: Prepare your environment

    Before getting started with analyzing data from this dataset, you should first prepare your environment accordingly. Make sure that any necessary libraries or services are installed on your machine before attempting to work with the data from this dataset in order to avoid potential issues or errors during usage.

    Step 3: Access the data

    To access and start working with the data in this dataset, you can either download it directly via a Kaggle account or access it through one of its REST endpoints if available on other services (e.g., Databricks).

    Step 4: Explore and analyze the data

    Step 5: Report results

    Lastly, once explorations and analyses are complete, it is important to report results accurately, especially for dialogue-pair datasets, where disseminating misinformation could have serious consequences. Reporting should declare the standard relevant indicators and apply appropriate statistical tests to rule out incorrect or anomalous outcomes.

    Research Ideas

    • Developing and improving natural language processing algorithms for AI-human conversation.
    • Building user-friendly chatbots that are better at recognizing and understanding human intent by training the model using this dataset.
    • Designing recommendation systems to predict user questions and generate more accurate responses based on previous conversations in the dataset.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub (see the Data Source).

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name     | Description                                                                   |
    |:----------------|:------------------------------------------------------------------------------|
    | system          | Contains the AI system's response to the user's question. (Text)              |
    | chatgpt         | Contains the ChatGPT model's response to the user's question. (Text)          |
    | llama2-13b-chat | Contains the Llama2-13b-Chat model's response to the user's question. (Text)  |
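
    As a starting point for Steps 3 and 4, a minimal sketch of loading and inspecting the file with pandas; the local file path assumes train.csv has already been downloaded from the Kaggle page:

    ```python
    import pandas as pd

    # Assumes train.csv has been downloaded from the Kaggle dataset page.
    df = pd.read_csv("train.csv")

    # Inspect the columns described in the table above.
    print(df.columns.tolist())
    print(df.head())
    ```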


  5. dpo-training-data

    • huggingface.co
    Updated Aug 15, 2007
    Cite
    Zeynep (2007). dpo-training-data [Dataset]. https://huggingface.co/datasets/Tandogan/dpo-training-data
    Explore at:
    Dataset updated
    Aug 15, 2007
    Authors
    Zeynep
    Description

    Tandogan/dpo-training-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Toxic-DPO Dataset

    • resodate.org
    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    Unalignment (2024). Toxic-DPO Dataset [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdG94aWMtZHBvLWRhdGFzZXQ=
    Explore at:
    Dataset updated
    Dec 2, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Unalignment
    Description

    The paper uses the Toxic-DPO dataset for reinforcement learning from human feedback.

  7. dpo-training-dataset

    • huggingface.co
    Updated Jul 15, 2025
    Cite
    Dhruv Harchandani (2025). dpo-training-dataset [Dataset]. https://huggingface.co/datasets/vulcan2506/dpo-training-dataset
    Explore at:
    Dataset updated
    Jul 15, 2025
    Authors
    Dhruv Harchandani
    Description

    vulcan2506/dpo-training-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. dpo checkpoint 180

    • kaggle.com
    zip
    Updated Feb 13, 2024
    Cite
    HinePo (2024). dpo checkpoint 180 [Dataset]. https://www.kaggle.com/datasets/hinepo/dpo-checkpoint-180
    Explore at:
    Available download formats: zip (310,487,423 bytes)
    Dataset updated
    Feb 13, 2024
    Authors
    HinePo
    Description

    Mistral-7B-Instruct-v0.1 DPO finetuned for 180 steps on 'Intel/orca_dpo_pairs' dataset.

    Training was done in Google Colab (A100) and took about 4.5 hours with a batch size of 8.

    Dataset link: https://huggingface.co/datasets/tatsu-lab/alpaca

  9. orpo-dpo-mix-40k

    • huggingface.co
    Updated Apr 18, 2024
    Cite
    Maxime Labonne (2024). orpo-dpo-mix-40k [Dataset]. https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 18, 2024
    Authors
    Maxime Labonne
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    ORPO-DPO-mix-40k v1.2

    This dataset is designed for ORPO or DPO training. See Fine-tune Llama 3 with ORPO for more information about how to use it. It is a combination of the following high-quality DPO datasets:

    argilla/Capybara-Preferences: highly scored chosen answers >=5 (7,424 samples)
    argilla/distilabel-intel-orca-dpo-pairs: highly scored chosen answers >=9, not in GSM8K (2,299 samples)
    argilla/ultrafeedback-binarized-preferences-cleaned: highly scored chosen answers >=5 (22… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k.
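
    A minimal sketch of loading the mix for ORPO/DPO training; the split name and column layout are assumptions based on typical preference datasets, so verify them against the dataset page:

    ```python
    from datasets import load_dataset

    # Split name and column layout are assumptions; check the dataset page.
    ds = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
    print(ds.column_names)  # typically includes "chosen" and "rejected"
    ```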

  10. Self-taught-evaluator-DPO-data

    • huggingface.co
    Updated Sep 27, 2024
    Cite
    AI at Meta (2024). Self-taught-evaluator-DPO-data [Dataset]. https://huggingface.co/datasets/facebook/Self-taught-evaluator-DPO-data
    Explore at:
    Dataset updated
    Sep 27, 2024
    Dataset authored and provided by
    AI at Meta
    License

    Other (https://choosealicense.com/licenses/other/)

    Description

    This dataset is released as part of the Self-taught Evaluators research project. Please refer to our project materials here for training and evaluation details.

      Loading the dataset with transformers
    

    This dataset is built upon WildChat prompts, using Llama-3.1-70B-Instruct to generate responses and evaluation plans. Details on how to build such a self-taught dataset can be found in Self-taught Evaluators. A minimal example of preparing the training data begins: from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/facebook/Self-taught-evaluator-DPO-data.
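
    The card's own example is truncated above; as a stand-in, a minimal sketch of loading the data with the datasets library, assuming a default train split (check the dataset page for the actual configuration):

    ```python
    from datasets import load_dataset

    # Assumes a default "train" split; verify the configurations and column
    # names on the dataset page before preparing training data.
    ds = load_dataset("facebook/Self-taught-evaluator-DPO-data", split="train")
    print(ds.column_names)
    print(ds[0])
    ```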

  11. Training Courses under the “Smart Silver” Digital Inclusion Programmes

    • data.gov.hk
    Cite
    data.gov.hk, Training Courses under the “Smart Silver” Digital Inclusion Programmes [Dataset]. https://data.gov.hk/en-data/dataset/hk-dpo-dpo_hp-digital-inclusion-programmes-training-courses
    Explore at:
    Dataset provided by
    data.gov.hk
    Description

    List of training courses provided under the “Smart Silver” digital inclusion programmes. The DPO (Digital Policy Office) launched the “Smart Silver” digital inclusion programmes to help those in need (especially the elderly) understand and use digital technology products and services.

  12. DPO-hh-rlhf

    • huggingface.co
    Updated Jul 10, 2024
    Cite
    Columbia NLP (2024). DPO-hh-rlhf [Dataset]. https://huggingface.co/datasets/Columbia-NLP/DPO-hh-rlhf
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 10, 2024
    Dataset authored and provided by
    Columbia NLP
    Description

    Dataset Card for DPO-hh-rlhf

    Reformatted from the Anthropic/hh-rlhf dataset. The LION-series models are trained using an empirically optimized pipeline that consists of three stages: SFT, DPO, and online preference learning (online DPO). We find that simple techniques, such as sequence packing, loss masking in SFT, increasing the preference dataset size in DPO, and online DPO training, can significantly improve the performance of language models. Our best models (the LION series) exceed the… See the full description on the dataset page: https://huggingface.co/datasets/Columbia-NLP/DPO-hh-rlhf.

  13. LLM-QE-DPO-Training-Data

    • huggingface.co
    Updated Mar 12, 2025
    Cite
    chengpingan (2025). LLM-QE-DPO-Training-Data [Dataset]. https://huggingface.co/datasets/chengpingan/LLM-QE-DPO-Training-Data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 12, 2025
    Authors
    chengpingan
    Description

    chengpingan/LLM-QE-DPO-Training-Data dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. List of online learning resources in the "Smart Silver" Elderly IT Learning...

    • data.gov.hk
    Updated Jul 25, 2024
    Cite
    data.gov.hk (2024). List of online learning resources in the "Smart Silver" Elderly IT Learning Portal [Dataset]. https://data.gov.hk/en-data/dataset/hk-dpo-dpo_hp-online-learning-resources-in-the-elderly-it-learning-portal
    Explore at:
    Dataset updated
    Jul 25, 2024
    Dataset provided by
    data.gov.hk
    Description

    List of online learning resources in the "Smart Silver" Elderly IT Learning Portal and the respective links.

  15. Ling-Coder-DPO

    • huggingface.co
    Updated Oct 17, 2019
    Cite
    inclusionAI (2019). Ling-Coder-DPO [Dataset]. https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO
    Explore at:
    Dataset updated
    Oct 17, 2019
    Dataset authored and provided by
    inclusionAI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    🤗 Hugging Face 🤖 ModelScope 🖥️ GitHub

      Ling-Coder Dataset
    

    The Ling-Coder Dataset comprises the following components:

    Ling-Coder-SFT: a subset of SFT data used for training Ling-Coder Lite, containing more than 5 million samples.
    Ling-Coder-DPO: a subset of DPO data used for training Ling-Coder Lite, containing 250k samples.
    Ling-Coder-SyntheticQA: a subset of synthetic data used for annealing training of Ling-Coder Lite, containing more… See the full description on the dataset page: https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO.

  16. Data_Sheet_1_Analysis of an “international teaching practicum” as a program...

    • frontiersin.figshare.com
    • figshare.com
    docx
    Updated Aug 31, 2023
    Cite
    Youn Ock Kim; Soyoung Yun; Yang Hwan Sol (2023). Data_Sheet_1_Analysis of an “international teaching practicum” as a program for achieving “teacher agency” and strengthening “technological pedagogical content knowledge”.docx [Dataset]. http://doi.org/10.3389/feduc.2023.1200092.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Aug 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Youn Ock Kim; Soyoung Yun; Yang Hwan Sol
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: This research aimed to study the achievement of teacher agency by Korean pre-service teachers through their experiences in an international teaching practicum (ITP). The study also examined the domestic pre-orientation (DPO) program of G University to explore the possibility of developing a TPACK-strengthening program. The participants were five female Korean pre-service teachers who took part in the ITP in 2015-2016.

    Methods: The data were collected at two different times: the teaching diaries and ITP reports from the Korean pre-service teachers, the DPO teaching materials, and the program instructor's field notes were collected in 2016, and the interviews were conducted in 2018.

    Results and discussion: To study their achievement of teacher agency, the three chordal triads of agency (the iterational, practical-evaluative, and projective dimensions) were used as the lens for analyzing the data. In addition, the DPO program was analyzed against the elements of TPACK competencies. The research shows that the ITP was a trigger experience for the Korean pre-service teachers in terms of achieving teacher agency. The participants could project their aspirations and then decide on and execute what they had learned from the ITP in their actual Korean classrooms. The need to reconstruct the DPO program so that it can assist pre-service teachers' TPACK achievement has also been raised.

  17. DPO

    • huggingface.co
    Updated May 28, 2025
    Cite
    Madeleine Hueber (2025). DPO [Dataset]. https://huggingface.co/datasets/madhueb/DPO
    Explore at:
    Dataset updated
    May 28, 2025
    Authors
    Madeleine Hueber
    Description

    This dataset was created for training and evaluating a DPO-based language model in the context of STEM questions. It supports both instruction tuning and preference-based fine-tuning using the DPO framework. The dataset was developed for the M2 deliverable of the CS-552 course Modern Natural Language Processing.

      Dataset structure :
    

    train: For DPO training (chosen/rejected pairs). This data comes from Milestone 1 (pref pairs collected by students) or from mlabonne/orpo-dpo-mix-40k… See the full description on the dataset page: https://huggingface.co/datasets/madhueb/DPO.

  18. Peace and Security Pillar: UN Peacekeeping Training Gender Aggregated Data

    • data.humdata.org
    csv
    Updated Oct 28, 2025
    Cite
    United Nations Peace and Security Data Hub (2025). Peace and Security Pillar: UN Peacekeeping Training Gender Aggregated Data [Dataset]. https://data.humdata.org/dataset/dpo-pktraining
    Explore at:
    Available download formats: csv (1,599 bytes)
    Dataset updated
    Oct 28, 2025
    Dataset provided by
    United Nations (http://un.org/)
    United Nations Peacekeeping Forces (http://un.org/)
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Area covered
    United Nations
    Description

    The following gender-disaggregated training data is organized annually, covering the period from 1 July to 30 June. The data represent military, police, and civilian training.

    Member States are responsible for delivering pre-deployment training (PDT) to all units and personnel provided to UN peacekeeping operations. ITS delivers training-of-trainers courses for Member State trainers to build national capacity to deliver training to UN standards. Civilian Pre-Deployment Training (CPT) improves the preparedness and effectiveness of civilian peacekeepers; ITS has a dedicated team that delivers CPT at the UN Regional Service Centre in Entebbe, Uganda. Senior Leadership Training targets the highest levels of field mission leadership (SRSG, DSRSG, Force Commander or Head of Military Component, Police Commissioner, and Director of Mission Support) to provide them with the knowledge needed to lead and manage field missions.

    This dataset is managed by the Integrated Training Service of the UN Department of Peace Operations.

  19. HelpSteer2-DPO-Atsunori

    • huggingface.co
    Updated Jul 11, 2024
    Cite
    GenRM: Generative Reward Models (2024). HelpSteer2-DPO-Atsunori [Dataset]. https://huggingface.co/datasets/GenRM/HelpSteer2-DPO-Atsunori
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset authored and provided by
    GenRM: Generative Reward Models
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a conversion of nvidia/HelpSteer2 into preference pairs, based on the helpfulness score, for DPO training. HelpSteer2-DPO is also licensed under CC-BY-4.0.

      Dataset Description
    

    In accordance with the paper HelpSteer2: Open-source dataset for training top-performing reward models, we converted the nvidia/HelpSteer2 dataset into a preference dataset by taking the response with the higher helpfulness score as the chosen response, with the remaining response being the… See the full description on the dataset page: https://huggingface.co/datasets/GenRM/HelpSteer2-DPO-Atsunori.
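
    A sketch of the conversion described above. It assumes HelpSteer2 stores two consecutive rows per prompt, each carrying a "helpfulness" score; verify this layout against the actual dataset before use:

    ```python
    from datasets import load_dataset

    # Assumption: two consecutive rows per prompt, each with a
    # "helpfulness" score; check the actual HelpSteer2 layout first.
    ds = load_dataset("nvidia/HelpSteer2", split="train")

    pairs = []
    for i in range(0, len(ds) - 1, 2):
        a, b = ds[i], ds[i + 1]
        if a["prompt"] != b["prompt"] or a["helpfulness"] == b["helpfulness"]:
            continue  # skip mismatched rows and ties
        chosen, rejected = (a, b) if a["helpfulness"] > b["helpfulness"] else (b, a)
        pairs.append({
            "prompt": a["prompt"],
            "chosen": chosen["response"],
            "rejected": rejected["response"],
        })

    print(len(pairs), "preference pairs")
    ```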

  20. Training Courses under the “Smart Silver” Digital Inclusion Programmes (Simplified Chinese version)

    • data.gov.hk
    Updated Dec 2, 2025
    Cite
    data.gov.hk (2025). Training Courses under the “Smart Silver” Digital Inclusion Programmes (「友智识」数码共融计划培训课程) [Dataset]. https://data.gov.hk/sc-data/dataset/hk-dpo-dpo_hp-digital-inclusion-programmes-training-courses
    Explore at:
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    data.gov.hk (資料一線通)
    Description

    List of training courses under the “Smart Silver” digital inclusion programmes. The Digital Policy Office launched the “Smart Silver” digital inclusion programmes to help those in need (especially the elderly) learn about and use digital technology products and services.
