MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Introduction
GitHub Repo | UltraRM-13b | UltraCM-13b
UltraFeedback is a large-scale, fine-grained, diverse preference dataset, used for training powerful reward models and critic models. We collect about 64k prompts from diverse resources (including UltraChat, ShareGPT, Evol-Instruct, TruthfulQA, FalseQA, and FLAN). We then use these prompts to query multiple LLMs (see Table for model lists) and generate 4 different responses for each prompt, resulting in a total of 256k samples. To… See the full description on the dataset page: https://huggingface.co/datasets/openbmb/UltraFeedback.
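Each UltraFeedback sample pairs one prompt with several model completions, each annotated with fine-grained aspect ratings. The sketch below shows how a single sample's completions might be scored by averaging those ratings; the field names (`instruction`, `completions`, `ratings`) are illustrative assumptions and may differ from the exact schema on the Hub.

```python
# Sketch of averaging fine-grained aspect ratings for one sample's
# completions (field names are illustrative, not the exact Hub schema).
sample = {
    "instruction": "Explain what a reward model is.",
    "completions": [
        {"model": "model-a", "response": "A reward model scores outputs...",
         "ratings": {"helpfulness": 5, "honesty": 4,
                     "instruction_following": 5, "truthfulness": 4}},
        {"model": "model-b", "response": "It is a neural net.",
         "ratings": {"helpfulness": 2, "honesty": 4,
                     "instruction_following": 3, "truthfulness": 4}},
    ],
}

def mean_rating(completion):
    # Average the per-aspect scores into one scalar per completion.
    ratings = list(completion["ratings"].values())
    return sum(ratings) / len(ratings)

scores = [mean_rating(c) for c in sample["completions"]]
print(scores)  # [4.5, 3.25]
```

With four completions per prompt, this per-completion scalar is what downstream binarized variants typically rank on.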
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned)
This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about Argilla's approach towards UltraFeedback binarization at argilla/ultrafeedback-binarized-preferences/README.md.
Differences with argilla/ultrafeedback-binarized-preferences… See the full description on the dataset page: https://huggingface.co/datasets/pharaouk/ultrafeedback-binarized-preferences-cleaned.
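Binarization by average of preference ratings can be sketched as: rank each prompt's responses by their mean rating, keep the top-ranked response as "chosen", and pick one of the remaining responses as "rejected". This is a simplified illustration of the idea, not Argilla's exact pipeline.

```python
# Simplified binarization: highest mean rating becomes "chosen",
# one of the rest becomes "rejected" (an illustration only, not
# Argilla's exact procedure).
import random

responses = [
    ("resp_a", 4.5),
    ("resp_b", 3.25),
    ("resp_c", 2.0),
    ("resp_d", 3.75),
]

def binarize(scored, rng=random.Random(0)):
    ranked = sorted(scored, key=lambda rs: rs[1], reverse=True)
    chosen = ranked[0]
    rejected = rng.choice(ranked[1:])  # any non-best response
    return {"chosen": chosen[0], "rejected": rejected[0]}

pair = binarize(responses)
```

The resulting `{"chosen", "rejected"}` pairs are the format expected by DPO-style trainers.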
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Ultrafeedback Curated
This dataset is a curated version of the UltraFeedback dataset, prepared by Argilla (using distilabel).
Introduction
You can take a look at argilla/ultrafeedback-binarized-preferences for more context on the UltraFeedback error, but the following excerpt sums up the problem found: After visually browsing around some examples using the sort and filter feature of Argilla (sort by highest rating for chosen responses), we noticed a strong mismatch between… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-curated.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for UltraFeedback Binarized
Dataset Description
This is a pre-processed version of the UltraFeedback dataset and was used to train Zephyr-7B-β, a state-of-the-art chat model at the 7B parameter scale. The original UltraFeedback dataset consists of 64k prompts, where each prompt is accompanied by four model completions from a wide variety of open and proprietary models. GPT-4 is then used to assign a score to each completion, against criteria like helpfulness… See the full description on the dataset page: https://huggingface.co/datasets/zhengr/ultrafeedback_binarized.
Ultrafeedback binarized dataset using the mean of preference ratings
Introduction
This dataset contains the result of curation work performed by Argilla (using Argilla 😃). After visually browsing around 200 examples using the sort and filter feature of Argilla, we noticed a strong mismatch between the overall_score in the original UF dataset (and the Zephyr train_prefs dataset) and the quality of the chosen response. By adding the critique rationale to our Argilla… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-curation.
UltraFeedback - Prompts Dataset
Summary
The UltraFeedback - Prompts dataset is a processed version of the UltraFeedback dataset for model evaluation on specific aspects like helpfulness, honesty, and instruction-following.
Data Structure
Format: Conversational
Type: Prompt-only
Column:
"prompt": The input question or instruction provided to the model.
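A prompt-only conversational record usually stores the prompt as a list of chat messages rather than a bare string. The shape below is an assumed illustration, not the exact schema of this dataset.

```python
# Hypothetical shape of one prompt-only conversational record
# (the actual schema on the Hub may differ).
record = {
    "prompt": [
        {"role": "user", "content": "List three uses of a reward model."},
    ],
}

# A prompt-only record carries no completions, only the prompt column.
print(list(record))  # ['prompt']
```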
Generation script
The script used to generate this dataset can be found here.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for gemma2-ultrafeedback-armorm
This dataset was used to train princeton-nlp/gemma-2-9b-it-SimPO. If you are interested in training other model types (e.g., Mistral, Llama-3), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback, princeton-nlp/llama3-ultrafeedback, and princeton-nlp/llama3-ultrafeedback-armorm.
Dataset Structure
This dataset contains around 60k training samples and 2k testing samples, following the… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for llama3-ultrafeedback
This dataset was used to train princeton-nlp/Llama-3-Instruct-8B-SimPO. We released an updated version of this dataset annotated with a stronger reward model: princeton-nlp/llama3-ultrafeedback-armorm. If you are interested in training other model types (e.g., Mistral, Gemma-2), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback and princeton-nlp/gemma2-ultrafeedback-armorm.
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback.
withpi/openbmb-UltraFeedback-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
UltraFeedback Chinese Dataset
UltraFeedback-Chinese is a Chinese version developed based on the construction method of the UltraFeedback dataset, designed specifically for training robust reward and critic models. This dataset supports two training methods: PPO (Proximal Policy Optimization) and DPO (Direct Preference Optimization). UltraFeedback-Chinese… See the full description on the dataset page: https://huggingface.co/datasets/opencsg/UltraFeedback-chinese.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned) KTO
A KTO signal-transformed version of the highly loved UltraFeedback Binarized Preferences (Cleaned), the dataset preferred by Argilla to use from now on when fine-tuning on UltraFeedback.
This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned-kto.
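KTO trains on unpaired (prompt, completion, label) examples rather than chosen/rejected pairs, so a paired preference example can be split into two KTO rows: the chosen completion labeled desirable, the rejected one labeled undesirable. A minimal sketch, assuming illustrative column names:

```python
# Split one paired preference example into two unpaired KTO examples
# (column names are illustrative assumptions).
def to_kto(example):
    return [
        {"prompt": example["prompt"], "completion": example["chosen"], "label": True},
        {"prompt": example["prompt"], "completion": example["rejected"], "label": False},
    ]

rows = to_kto({"prompt": "Hi", "chosen": "Hello!", "rejected": "Go away."})
print(len(rows))  # 2
```

Applied over the whole binarized dataset, this doubles the row count while preserving the preference signal as boolean labels.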
alvarobartt/ultrafeedback-mini dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for ultrafeedback-instruction-dataset
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/hassaan-qaisar/ultrafeedback-instruction-dataset/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/hassaan-qaisar/ultrafeedback-instruction-dataset.
Llama 3.1 Tulu 3 Ultrafeedback (Cleaned) (on-policy 8B)
Note that this collection is licensed under the ODC-BY-1.0 license; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact. This preference dataset is part of our Tulu 3 preference mixture. It contains prompts from Ai2's cleaned version of Ultrafeedback, which removes instances of TruthfulQA. We further filtered this dataset to remove… See the full description on the dataset page: https://huggingface.co/datasets/allenai/tulu-3-ultrafeedback-cleaned-on-policy-8b.
UltraFeedback GPT-3.5-Turbo Helpfulness Dataset
Summary
The UltraFeedback GPT-3.5-Turbo Helpfulness dataset contains processed user-assistant interactions filtered for helpfulness, derived from the openbmb/UltraFeedback dataset. It is designed for fine-tuning and evaluating models in alignment tasks.
Data Structure
Format: Conversational
Type: Unpaired preference
Columns:
"prompt": The input question or instruction provided to the model. "completion": The… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness.
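A helpfulness-filtered unpaired preference set can be derived by keeping only the rows whose label marks the completion as helpful. The toy sketch below illustrates the idea; the row shape and field names are assumptions, not the dataset's exact schema.

```python
# Toy filter over unpaired preference rows: keep only completions
# labeled as helpful (field names are illustrative assumptions).
rows = [
    {"prompt": "Q1", "completion": "a thorough, on-topic answer", "label": True},
    {"prompt": "Q2", "completion": "an off-topic answer", "label": False},
]

helpful = [r for r in rows if r["label"]]
print(len(helpful))  # 1
```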
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "ultrafeedback_binarized_cleaned"
Update 1/12/2023: I've removed examples identified as faulty by Argilla - see their awesome work for more details. This is a version of the UltraFeedback binarized dataset but with TruthfulQA prompts removed and source annotations added (so you can filter out samples from different sources yourself if you want!). Please see the binarized dataset card for more information, or the original UltraFeedback dataset card.
Dataset for Training wzhouad/gemma-2-9b-it-WPO-HB
This dataset was curated specifically for training the wzhouad/gemma-2-9b-it-WPO-HB model in a hybrid RL setting. The prompts are sourced from the Ultrafeedback dataset, and the corresponding outputs are as follows:
On-Policy Outputs: 5 outputs generated using the gemma-2-9b-it model, based on Ultrafeedback prompts.
GPT-4-turbo Outputs: 1 output generated using GPT-4-turbo, based on the same Ultrafeedback prompts.
Due to challenges… See the full description on the dataset page: https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid.
sailor2/sea-ultrafeedback dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset for Training wzhouad/Llama3-Instruct-8B-WPO-HB
This dataset was curated specifically for training the wzhouad/Llama3-Instruct-8B-WPO-HB model in a hybrid RL setting. The prompts are sourced from the Ultrafeedback dataset, and the corresponding outputs are as follows:
On-Policy Outputs: 5 outputs generated using the meta-llama/Meta-Llama-3-8B-Instruct model, based on Ultrafeedback prompts.
GPT-4-turbo Outputs: 1 output generated using GPT-4-turbo, based on the same… See the full description on the dataset page: https://huggingface.co/datasets/wzhouad/llama3-ultrafeedback-hybrid.
Dataset Card for Ultrafeedback
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/Andresckamilo/Ultrafeedback/raw/main/pipeline.yaml"
or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/Andresckamilo/Ultrafeedback.