MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Introduction
GitHub Repo UltraRM-13b UltraCM-13b
UltraFeedback is a large-scale, fine-grained, diverse preference dataset, used for training powerful reward models and critic models. We collect about 64k prompts from diverse resources (including UltraChat, ShareGPT, Evol-Instruct, TruthfulQA, FalseQA, and FLAN). We then use these prompts to query multiple LLMs (see Table for model lists) and generate 4 different responses for each prompt, resulting in a total of 256k samples. To… See the full description on the dataset page: https://huggingface.co/datasets/openbmb/UltraFeedback.
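To make the scale described above concrete, here is a toy record in the rough shape the description implies. The field names and scores are illustrative assumptions for this sketch, not the dataset's verified schema:

```python
# Hypothetical record mirroring the described UltraFeedback layout:
# one instruction, four completions sampled from different models.
record = {
    "instruction": "Explain photosynthesis in one sentence.",
    "completions": [
        {"model": "model_a", "response": "...", "overall_score": 7.5},
        {"model": "model_b", "response": "...", "overall_score": 9.0},
        {"model": "model_c", "response": "...", "overall_score": 6.0},
        {"model": "model_d", "response": "...", "overall_score": 8.0},
    ],
}

# 64k prompts, each answered by 4 models, gives the 256k samples quoted above.
n_prompts = 64_000
n_responses_per_prompt = len(record["completions"])  # 4
total_samples = n_prompts * n_responses_per_prompt
print(total_samples)  # 256000
```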
We include all the possible comparisons following InstructGPT. We use the fine-grained_score.

import os
import itertools
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer
from tqdm import tqdm

ds = load_dataset("openbmb/UltraFeedback", split="train")
data = []
for example in ds:
    prompt = example['instruction']
    responses = {}
… See the full description on the dataset page: https://huggingface.co/datasets/RLHFlow/UltraFeedback-preference-standard.
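The "all possible comparisons" construction mentioned above can be sketched in a few lines. The model names and scores below are made up for illustration, not the dataset's actual contents:

```python
import itertools

# Hypothetical fine-grained scores for the 4 responses to one prompt.
responses = {
    "model_a": 7.0,
    "model_b": 9.0,
    "model_c": 5.5,
    "model_d": 8.0,
}

# Every unordered pair of distinct responses yields one preference example,
# with the higher-scored response as "chosen" (the InstructGPT-style recipe).
pairs = []
for (m1, s1), (m2, s2) in itertools.combinations(responses.items(), 2):
    if s1 == s2:
        continue  # ties carry no preference signal; skip them
    chosen, rejected = (m1, m2) if s1 > s2 else (m2, m1)
    pairs.append({"chosen": chosen, "rejected": rejected})

print(len(pairs))  # C(4, 2) = 6 comparisons per prompt
```

With 4 responses per prompt this yields up to 6 pairwise examples per prompt, which is why the pairwise-standard dataset is several times larger than the raw one.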
UltraFeedback - Prompts Dataset
Summary
The UltraFeedback - Prompts dataset is a processed version of the UltraFeedback dataset for model evaluation on specific aspects like helpfulness, honesty, and instruction-following.
Data Structure
Format: Conversational
Type: Prompt-only
Column:
"prompt": The input question or instruction provided to the model.
Generation script
The script used to generate this dataset can be found here.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned)
This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about Argilla's approach towards UltraFeedback binarization at argilla/ultrafeedback-binarized-preferences/README.md.
Differences with argilla/ultrafeedback-binarized-preferences… See the full description on the dataset page: https://huggingface.co/datasets/pharaouk/ultrafeedback-binarized-preferences-cleaned.
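The "average of preference ratings" binarization named in the title can be sketched as follows. The aspect names follow the four aspects UltraFeedback is described as rating; the values, and the choice of the lowest-rated completion as "rejected", are illustrative assumptions (Argilla's README documents their exact selection strategy):

```python
from statistics import mean

# Made-up per-aspect ratings for four completions of one prompt.
completions = [
    {"text": "response A", "ratings": {"helpfulness": 4, "honesty": 5, "instruction_following": 4, "truthfulness": 5}},
    {"text": "response B", "ratings": {"helpfulness": 2, "honesty": 3, "instruction_following": 2, "truthfulness": 4}},
    {"text": "response C", "ratings": {"helpfulness": 5, "honesty": 5, "instruction_following": 5, "truthfulness": 5}},
    {"text": "response D", "ratings": {"helpfulness": 1, "honesty": 2, "instruction_following": 3, "truthfulness": 2}},
]

def avg_rating(completion):
    # Average the fine-grained aspect ratings instead of trusting the
    # single overall_score, which is the fix this dataset family applies.
    return mean(completion["ratings"].values())

ranked = sorted(completions, key=avg_rating, reverse=True)
binarized = {"chosen": ranked[0]["text"], "rejected": ranked[-1]["text"]}
print(binarized)  # {'chosen': 'response C', 'rejected': 'response D'}
```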
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Ultrafeedback Curated
This dataset is a curated version of the UltraFeedback dataset, produced by Argilla (using distilabel).
Introduction
You can take a look at argilla/ultrafeedback-binarized-preferences for more context on the UltraFeedback error, but the following excerpt sums up the problem found: After visually browsing around some examples using the sort and filter feature of Argilla (sort by highest rating for chosen responses), we noticed a strong mismatch between… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-curated.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned)
This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about Argilla's approach towards UltraFeedback binarization at argilla/ultrafeedback-binarized-preferences/README.md.
Differences with argilla/ultrafeedback-binarized-preferences… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for UltraFeedback Binarized
Dataset Description
This is a pre-processed version of the UltraFeedback dataset and was used to train Zephyr-7B-β, a state-of-the-art chat model at the 7B parameter scale. The original UltraFeedback dataset consists of 64k prompts, where each prompt is accompanied by four model completions from a wide variety of open and proprietary models. GPT-4 is then used to assign a score to each completion, along criteria like helpfulness… See the full description on the dataset page: https://huggingface.co/datasets/zhengr/ultrafeedback_binarized.
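Binarized datasets like this one typically store each side of the preference pair as a full chat-message list, ready for DPO-style trainers. A minimal sketch of that conversion (the exact column layout of the Zephyr training set may differ; this only illustrates the idea):

```python
def to_dpo_record(prompt, better, worse):
    """Build a DPO-style record: each side is a complete message list."""
    return {
        "prompt": prompt,
        "chosen": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": better},
        ],
        "rejected": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": worse},
        ],
    }

record = to_dpo_record(
    "Name a primary color.",
    "Red is a primary color.",
    "Purple.",
)
print(record["chosen"][1]["content"])  # "Red is a primary color."
```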
Ultrafeedback binarized dataset using the mean of preference ratings
Introduction
This dataset contains the result of curation work performed by Argilla (using Argilla 😃). After visually browsing around 200 examples using the sort and filter feature of Argilla, we noticed a strong mismatch between the overall_score in the original UF dataset (and the Zephyr train_prefs dataset) and the quality of the chosen response. By adding the critique rationale to our Argilla… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-curation.
Ultrafeedback binarized dataset using the mean of preference ratings
Introduction
This dataset contains the result of curation work performed by Argilla (using Argilla 😃). After visually browsing around some examples using the sort and filter feature of Argilla (sort by highest rating for chosen responses), we noticed a strong mismatch between the overall_score in the original UF dataset (and the Zephyr train_prefs dataset) and the quality of the chosen response. By… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences.
UltraFeedback GPT-3.5-Turbo Helpfulness Dataset
Summary
The UltraFeedback GPT-3.5-Turbo Helpfulness dataset contains processed user-assistant interactions filtered for helpfulness, derived from the openbmb/UltraFeedback dataset. It is designed for fine-tuning and evaluating models in alignment tasks.
Data Structure
Format: Conversational
Type: Unpaired preference
Column:
"prompt": The input question or instruction provided to the model.
"completion": The… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness.
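An unpaired-preference row carries a single completion plus a boolean label rather than a chosen/rejected pair. A sketch of deriving such rows from helpfulness ratings; the threshold of 4 and all field values are illustrative assumptions, not the dataset's actual filtering rule:

```python
# Hypothetical rated completions for conversion into unpaired records.
rated = [
    {"prompt": "What is 2+2?", "completion": "4.", "helpfulness": 5},
    {"prompt": "What is 2+2?", "completion": "I refuse to answer.", "helpfulness": 1},
]

# Each completion becomes its own row; the label marks it as desirable
# or undesirable instead of comparing it against a sibling response.
unpaired = [
    {
        "prompt": r["prompt"],
        "completion": r["completion"],
        "label": r["helpfulness"] >= 4,  # illustrative cutoff
    }
    for r in rated
]
print([r["label"] for r in unpaired])  # [True, False]
```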
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
juyoungml/UltraFeedback dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
UltraFeedback Chinese Dataset
UltraFeedback-Chinese is a Chinese version developed based on the construction method of the UltraFeedback dataset, designed specifically for training robust reward and critic models. This dataset supports two training methods: PPO (Proximal Policy Optimization) and DPO (Direct Preference Optimization). UltraFeedback-Chinese… See the full description on the dataset page: https://huggingface.co/datasets/opencsg/UltraFeedback-chinese.
princeton-nlp/mistral-instruct-ultrafeedback dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset for Training wzhouad/gemma-2-9b-it-WPO-HB
This dataset was curated specifically for training the wzhouad/gemma-2-9b-it-WPO-HB model in a hybrid RL setting. The prompts are sourced from the Ultrafeedback dataset, and the corresponding outputs are as follows:
On-Policy Outputs: 5 outputs generated using the gemma-2-9b-it model, based on Ultrafeedback prompts. GPT-4-turbo Outputs: 1 output generated using GPT-4-turbo, based on the same Ultrafeedback prompts.
Due to challenges… See the full description on the dataset page: https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for llama3-ultrafeedback
This dataset was used to train princeton-nlp/Llama-3-Instruct-8B-SimPO. We released an updated version of this dataset annotated with a stronger reward model: princeton-nlp/llama3-ultrafeedback-armorm. If you are interested in training other model types (e.g., Mistral, Gemma-2), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback, and princeton-nlp/gemma2-ultrafeedback-armorm.
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for gemma2-ultrafeedback-armorm
This dataset was used to train princeton-nlp/gemma-2-9b-it-SimPO. If you are interested in training other model types (e.g., Mistral, Llama-3), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback, princeton-nlp/llama3-ultrafeedback, and princeton-nlp/llama3-ultrafeedback-armorm.
Dataset Structure
This dataset contains around 60k training samples and 2k testing samples, following the… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm.
UltraFeedback Extended
An extended version of UltraFeedback with more responses per instruction and a diverse pool of LLM judges.
Overview
The original UltraFeedback dataset pairs each instruction with 4 model responses scored by GPT-4. This dataset extends it in two ways:
1. 10 response models (up from 4), using more recent and diverse LLMs.
2. 10 judge models (instead of GPT-4 alone), each independently scoring every response on a 1-10 scale.
Importantly, the sets of… See the full description on the dataset page: https://huggingface.co/datasets/ltg/ultrafeedback-extended.
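Pooling several judges both smooths out single-judge quirks and exposes contested responses through the spread of their scores. A sketch with made-up scores from 10 judges on a 1-10 scale:

```python
from statistics import mean, stdev

# Made-up scores: 10 judge models each scoring three responses on 1-10.
judge_scores = {
    "response_1": [8, 7, 9, 8, 8, 7, 9, 8, 7, 8],
    "response_2": [4, 5, 3, 4, 6, 4, 5, 4, 3, 5],
    "response_3": [9, 2, 8, 3, 9, 2, 8, 3, 9, 2],  # judges disagree sharply
}

for name, scores in judge_scores.items():
    # The mean ranks responses; the standard deviation flags disagreement
    # that a single-judge setup (GPT-4 alone) would never surface.
    print(name, round(mean(scores), 1), round(stdev(scores), 1))

best = max(judge_scores, key=lambda n: mean(judge_scores[n]))
print(best)  # response_1
```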
sailor2/sea-ultrafeedback dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned) KTO
A KTO signal transformed version of the highly loved UltraFeedback Binarized Preferences Cleaned, the preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback.
This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned-kto.
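KTO trains on unpaired (prompt, completion, label) rows, so a common "KTO signal" transform splits each binarized chosen/rejected pair into two such rows. A sketch of that transform; the column names are illustrative, not necessarily the dataset's exact schema:

```python
def dpo_pair_to_kto_rows(record):
    """Split one chosen/rejected pair into two unpaired KTO examples.

    The chosen completion becomes a desirable (label=True) row and the
    rejected completion an undesirable (label=False) row.
    """
    return [
        {"prompt": record["prompt"], "completion": record["chosen"], "label": True},
        {"prompt": record["prompt"], "completion": record["rejected"], "label": False},
    ]

pair = {
    "prompt": "Define entropy.",
    "chosen": "A measure of disorder in a system.",
    "rejected": "No idea.",
}
rows = dpo_pair_to_kto_rows(pair)
print(len(rows), [r["label"] for r in rows])  # 2 [True, False]
```

Applied over the whole binarized dataset, this doubles the row count while preserving exactly the same preference information.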
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for "ultrafeedback_binarized_cleaned"
Update 1/12/2023: I've removed examples identified as faulty by Argilla - see their awesome work for more details. This is a version of the UltraFeedback binarized dataset but with TruthfulQA prompts removed and source annotations added (so you can filter out samples from different sources yourself if you want!). Please see the binarized dataset card for more information, or the original UltraFeedback dataset card.
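With the source annotations added, filtering out rows from a particular origin is a one-liner. A sketch over an assumed in-memory list (the "source" values here are illustrative):

```python
# Assumed rows carrying the added "source" annotation.
rows = [
    {"prompt": "p1", "source": "ultrachat"},
    {"prompt": "p2", "source": "truthful_qa"},
    {"prompt": "p3", "source": "flan"},
    {"prompt": "p4", "source": "truthful_qa"},
]

# With the Hugging Face datasets library, the equivalent would be
#   ds = ds.filter(lambda row: row["source"] != "truthful_qa")
kept = [r for r in rows if r["source"] != "truthful_qa"]
print([r["prompt"] for r in kept])  # ['p1', 'p3']
```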