Human-Like DPO Test Dataset
This repository provides a test dataset for Direct Preference Optimization (DPO) training, derived from the Human-Like DPO Dataset by HumanLLMs. The dataset is designed for experimentation and evaluation of DPO models in smaller-scale scenarios.
Dataset Overview
The dataset comprises a total of 1,000 examples, divided as follows:
- Training Set: 800 examples (train.json)
- Validation Set: 100 examples (validation.json)
- Test Set: 100 examples…

See the full description on the dataset page: https://huggingface.co/datasets/mlx-community/Human-Like-DPO.
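For quick experimentation, a minimal loading sketch is shown below using the Hugging Face datasets library; it assumes the three JSON files named above have been downloaded locally (loading directly by repository ID may also work but is not shown here).

```python
from datasets import load_dataset

# Minimal sketch: load the three JSON splits described above.
# File names are taken from the split list; paths assume a local download.
dataset = load_dataset(
    "json",
    data_files={
        "train": "train.json",
        "validation": "validation.json",
        "test": "test.json",
    },
)
print({split: len(dataset[split]) for split in dataset})
```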
MMLU (GitHub)
Only used the auxiliary test set. I have not checked for similarity or contamination, but it's something I need to figure out soon. Each example has a randomized starting message indicating that it is a multiple-choice question and that the response needs to be a single letter. For the rejected response, I randomly chose an incorrect answer, or randomly chose any answer written out fully rather than as a single letter. This was done to hopefully teach a model how to properly follow the task of answering a… See the full description on the dataset page: https://huggingface.co/datasets/xzuyn/mmlu-auxilary-train-dpo.
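A rough sketch of the rejected-answer construction described above (an illustrative reconstruction, not the author's actual script): with some probability the rejected response is a wrong letter, otherwise it is an answer written out in full instead of the required single letter.

```python
import random

LETTERS = ["A", "B", "C", "D"]

def make_rejected(choices: list[str], correct_idx: int) -> str:
    """Build a rejected response as described above: either an incorrect
    letter, or any answer written out in full instead of a single letter.
    This is an illustrative reconstruction, not the original script."""
    if random.random() < 0.5:
        wrong_idx = random.choice([i for i in range(len(choices)) if i != correct_idx])
        return LETTERS[wrong_idx]
    # A full answer text violates the single-letter format the prompt asks for.
    return random.choice(choices)

# Example usage with a hypothetical MMLU-style question
choices = ["Paris", "London", "Berlin", "Madrid"]
print(make_rejected(choices, correct_idx=0))
```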
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
ShareGPT4oReasoning Training Data DPO
All datasets and models can be found at Share4oReasoning.
Contents:
- DPO_preview: contains model-generated CoT judged by outcome reward.
- Image (same as in the SFT repo): contains the zipped image data (see below for details) used for SFT above.
- Inference and instructions for DPO: uploading now. For the training pipeline, refer to LLaVA-Reasoner-DPO training. TODO: separate README for setup and training.
Set up:
git clone… See the full description on the dataset page: https://huggingface.co/datasets/Share4oReasoning/dpo_data.
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
The Intel/Orca/DPO Dialogue Pairs dataset is a unique resource for natural language processing (NLP) research, combining AI and human conversations collected from online sources. This dataset is invaluable for exploring how human conversations can inform the development of conversational AI models. With columns such as System and Question extracted from chat logs, this dataset can help researchers understand more about how to better connect people with technology using meaningful dialogue. Furthermore, the data also includes columns for ChatGPT and Llama2-13b-Chat, two of the most widely used conversational AI models. By leveraging this dataset, researchers have an exceptional opportunity to explore conversational techniques that enable humans and machines to communicate in natural language.
This guide will provide an overview of how to use the Intel/Orca/DPO Dialogue Pairs dataset efficiently for human-centric natural language processing research.
Step 1: Understand the dataset
The Intel/Orca/DPO Dialogue Pairs dataset is composed of two main columns: System and Question. The System column contains responses from AI systems, and the Question column contains questions asked by humans. Additionally, this dataset also contains columns for ChatGPT and Llama2–13b-Chat, two models used in developing conversational AI systems.
Step 2: Prepare your environment
Before getting started with analyzing data from this dataset, you should first prepare your environment accordingly. Make sure that any necessary libraries or services are installed on your machine before attempting to work with the data from this dataset in order to avoid potential issues or errors during usage.
Step 3: Access the data
To access and start working with the data in this dataset, you can either download it directly via a Kaggle account or access it through one of its REST endpoints if available on other services (e.g., Databricks).
Step 4: Explore and analyze the data
Step 5: Report the results
Lastly, once explorations and analyses have been completed, it is important that results are reported accurately, especially when dealing with sensitive datasets such as dialogue pairs, since the consequences of disseminating misinformation can be serious. Reporting results should usually involve declaring the standard relevant indicators and taking care to conduct appropriate statistical tests that rule out anomalous outcomes.
- Developing and improving natural language processing algorithms for AI-human conversation.
- Building user-friendly chatbots that are better at recognizing and understanding human intent by training the model using this dataset.
- Designing recommendation systems to predict user questions and generate more accurate responses based on previous conversations in the dataset.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:----------------|:------------------------------------------------------------------------------|
| system | Contains the AI system's response to the user's question. (Text) |
| chatgpt | Contains the ChatGPT model's response to the user's question. (Text) |
| llama2-13b-chat | Contains the Llama2-13b-Chat model's response to the user's question. (Text) |
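A minimal sketch for inspecting train.csv with pandas, assuming the file has been downloaded locally from the dataset page:

```python
import pandas as pd

# Load the file described in the table above and inspect its columns.
df = pd.read_csv("train.csv")  # path assumes a local download
print(df.columns.tolist())
print(df.head())
```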
If you use this dataset in your research, please credit the original authors and Huggingface Hub.
Tandogan/dpo-training-data dataset hosted on Hugging Face and contributed by the HF Datasets community
The dataset used in the paper is the Toxic-DPO dataset, which is used for reinforcement learning from human feedback.
vulcan2506/dpo-training-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Mistral-7B-Instruct-v0.1, DPO fine-tuned for 180 steps on the 'Intel/orca_dpo_pairs' dataset.
Training was done in Google Colab (A100) and it took ~4h30min. Batch size = 8.
Dataset link: https://huggingface.co/datasets/tatsu-lab/alpaca
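A minimal sketch of a comparable run with TRL's DPOTrainer, reusing the batch size and step count mentioned above. The Intel/orca_dpo_pairs column names, the DPO beta value, and the processing_class argument (named tokenizer in older trl releases) are assumptions, and memory-saving settings such as LoRA or quantization that a Colab A100 run would likely need are omitted.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Intel/orca_dpo_pairs is assumed to expose "system", "question", "chosen",
# and "rejected" columns; DPOTrainer expects "prompt"/"chosen"/"rejected".
def to_dpo_format(example):
    prompt = (example["system"] + "\n\n" + example["question"]).strip()
    return {"prompt": prompt, "chosen": example["chosen"], "rejected": example["rejected"]}

dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(to_dpo_format, remove_columns=["system", "question"])

args = DPOConfig(
    output_dir="mistral-7b-dpo",
    per_device_train_batch_size=8,  # batch size mentioned above
    max_steps=180,                  # step count mentioned above
    beta=0.1,                       # assumed DPO temperature, not stated above
)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # "tokenizer" in older trl releases
)
trainer.train()
```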
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
ORPO-DPO-mix-40k v1.2
This dataset is designed for ORPO or DPO training. See Fine-tune Llama 3 with ORPO for more information about how to use it. It is a combination of the following high-quality DPO datasets:
- argilla/Capybara-Preferences: highly scored chosen answers >= 5 (7,424 samples)
- argilla/distilabel-intel-orca-dpo-pairs: highly scored chosen answers >= 9, not in GSM8K (2,299 samples)
- argilla/ultrafeedback-binarized-preferences-cleaned: highly scored chosen answers >= 5 (22…

See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k.
https://choosealicense.com/licenses/other/
This dataset is released as part of the Self-taught evaluators research project. Please refer to our project materials here for training and evaluation details.
Loading the dataset with transformers
This dataset is built upon WildChat prompts by using Llama-3.1-70B-Instruct to generate responses and evaluation plans. Details on how to build such a self-taught dataset can be found in Self-taught evaluators. Minimal example below showing how to prepare training data. from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/facebook/Self-taught-evaluator-DPO-data.
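Since the card's own example is truncated above, here is a hedged stand-in that simply loads the dataset and inspects one record; the split name "train" and the column layout are assumptions to verify on the dataset page.

```python
from datasets import load_dataset

# A stand-in for the truncated example above; the split name and column
# layout are assumptions to verify on the dataset page.
ds = load_dataset("facebook/Self-taught-evaluator-DPO-data", split="train")
print(ds.column_names)
print(ds[0])
```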
List of training courses provided under the “Smart Silver” digital inclusion programme - the Digital Policy Office (DPO) launched the “Smart Silver” digital inclusion programme to help those in need (especially the elderly) understand and use digital technology products and services.
Dataset Card for DPO-hh-rlhf
Reformatted from Anthropic/hh-rlhf dataset. The LION-series are trained using an empirically optimized pipeline that consists of three stages: SFT, DPO, and online preference learning (online DPO). We find simple techniques such as sequence packing, loss masking in SFT, increasing the preference dataset size in DPO, and online DPO training can significantly improve the performance of language models. Our best models (the LION-series) exceed the… See the full description on the dataset page: https://huggingface.co/datasets/Columbia-NLP/DPO-hh-rlhf.
chengpingan/LLM-QE-DPO-Training-Data dataset hosted on Hugging Face and contributed by the HF Datasets community
List of online learning resources in the "Smart Silver" Elderly IT Learning Portal and the respective links.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
🤗 Hugging Face 🤖 ModelScope 🖥️ GitHub
Ling-Coder Dataset
The Ling-Coder Dataset comprises the following components:
- Ling-Coder-SFT: a subset of SFT data used for training Ling-Coder Lite, containing more than 5 million samples.
- Ling-Coder-DPO: a subset of DPO data used for training Ling-Coder Lite, containing 250k samples.
- Ling-Coder-SyntheticQA: a subset of synthetic data used for annealing training of Ling-Coder Lite, containing more…

See the full description on the dataset page: https://huggingface.co/datasets/inclusionAI/Ling-Coder-DPO.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: This research aimed to study the achievement of teacher agency of Korean pre-service teachers through their experiences in an international teaching practicum (ITP). This study also diagnosed the domestic pre-orientation (DPO) program of G University to seek the possibilities of developing a TPACK strengthening program. The participants of this study were five female Korean pre-service teachers who were in the ITP from 2015-2016.
Methods: The data were collected in two different time slots: the teaching diaries and ITP reports from the Korean pre-service teachers, DPO teaching materials, and the program instructor's field notes were collected in 2016, and the interview was conducted in 2018.
Results and discussion: To study their teacher agency achievement, the three dimensions of the chordal triad of agency (the iterational, the practical-evaluative, and the projective) were spotlighted and used as the lens to analyze the data. In addition, the DPO program was analyzed based on the elements of TPACK competencies. The research shows that the ITP was a trigger experience for the Korean pre-service teachers in terms of the achievement of teacher agency. The participants could project their aspirations and then decide and execute what they had learned from the ITP in their actual Korean classrooms. Also, the need to reconstruct the DPO program to be able to assist the pre-service teachers' TPACK achievement has been raised.
This dataset was created for training and evaluating a DPO-based language model in the context of STEM questions. It supports both instruction tuning and preference-based fine-tuning using the DPO framework. The dataset was developed for the M2 deliverable of the CS-552 course Modern Natural Language Processing.
Dataset structure:
train: For DPO training (chosen/rejected pairs). This data comes from Milestone 1 (pref pairs collected by students) or from mlabonne/orpo-dpo-mix-40k… See the full description on the dataset page: https://huggingface.co/datasets/madhueb/DPO.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The following gender-disaggregated training data is organized annually, covering the period from 1 July to 30 June. The data represents military, police and civilian training.
Member States are responsible for delivering pre-deployment training (PDT) to all units and personnel provided to UN peacekeeping operations. ITS delivers training-of-trainers courses for Member State trainers to build national capacity to deliver training to UN standards. Civilian Pre-Deployment Training (CPT) improves the preparedness and effectiveness of civilian peacekeepers. ITS has a dedicated team that delivers CPT at the UN Regional Service Centre in Entebbe, Uganda. Senior Leadership Training targets the highest levels of field mission leadership (SRSG, DSRSG, Force Commander or Head of Military Component, Police Commissioner and Director of Mission Support) to provide them with the knowledge needed to lead and manage field missions.
This dataset is managed by the Integrated Training Service of the UN Department of Peace Operations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a conversion of nvidia/HelpSteer2 into preference pairs based on the helpfulness score for training DPO. HelpSteer2-DPO is also licensed under CC-BY-4.0.
Dataset Description
In accordance with the paper HelpSteer2: Open-source dataset for training top-performing reward models, we converted the nvidia/HelpSteer2 dataset into a preference dataset by taking the response with the higher helpfulness score as the chosen response, with the remaining response being the… See the full description on the dataset page: https://huggingface.co/datasets/GenRM/HelpSteer2-DPO-Atsunori.
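A sketch of the conversion described above, assuming HelpSteer2 stores its two scored responses for each prompt in consecutive rows with prompt, response, and helpfulness fields (these schema assumptions should be checked against the source dataset):

```python
from datasets import load_dataset

ds = load_dataset("nvidia/HelpSteer2", split="train")
rows = list(ds)

pairs = []
# Assumption: each prompt appears in two consecutive rows, one per response.
for a, b in zip(rows[0::2], rows[1::2]):
    if a["prompt"] != b["prompt"] or a["helpfulness"] == b["helpfulness"]:
        continue  # skip mismatched pairs and ties
    chosen, rejected = (a, b) if a["helpfulness"] > b["helpfulness"] else (b, a)
    pairs.append({
        "prompt": a["prompt"],
        "chosen": chosen["response"],
        "rejected": rejected["response"],
    })

print(len(pairs), "preference pairs")
```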
List of training courses under the 「友智识」 digital inclusion programme - the Digital Policy Office launched the 「友智识」 digital inclusion programme to help those in need (especially the elderly) understand and use digital technology products and services.