MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Introduction
GitHub Repo UltraRM-13b UltraCM-13b
UltraFeedback is a large-scale, fine-grained, diverse preference dataset, used for training powerful reward models and critic models. We collect about 64k prompts from diverse resources (including UltraChat, ShareGPT, Evol-Instruct, TruthfulQA, FalseQA, and FLAN). We then use these prompts to query multiple LLMs (see Table for model lists) and generate 4 different responses for each prompt, resulting in a total of 256k samples. To… See the full description on the dataset page: https://huggingface.co/datasets/openbmb/UltraFeedback.
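To make the scale described above concrete, here is a toy record in the rough shape the description implies. The field names and scores are illustrative assumptions for this sketch, not the dataset's verified schema:

```python
# Hypothetical record mirroring the described UltraFeedback layout:
# one instruction, four completions sampled from different models.
record = {
    "instruction": "Explain photosynthesis in one sentence.",
    "completions": [
        {"model": "model_a", "response": "...", "overall_score": 7.5},
        {"model": "model_b", "response": "...", "overall_score": 9.0},
        {"model": "model_c", "response": "...", "overall_score": 6.0},
        {"model": "model_d", "response": "...", "overall_score": 8.0},
    ],
}

# 64k prompts, each answered by 4 models, gives the 256k samples quoted above.
n_prompts = 64_000
n_responses_per_prompt = len(record["completions"])  # 4
total_samples = n_prompts * n_responses_per_prompt
print(total_samples)  # 256000
```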
We include all the possible comparisons following InstructGPT. We use the fine-grained_score.

import os
import itertools
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer
from tqdm import tqdm

ds = load_dataset("openbmb/UltraFeedback", split="train")
data = []
for example in ds:
    prompt = example['instruction']
    responses = {}
… See the full description on the dataset page: https://huggingface.co/datasets/RLHFlow/UltraFeedback-preference-standard.
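The "all possible comparisons" construction mentioned above can be sketched in a few lines. The model names and scores below are made up for illustration, not the dataset's actual contents:

```python
import itertools

# Hypothetical fine-grained scores for the 4 responses to one prompt.
responses = {
    "model_a": 7.0,
    "model_b": 9.0,
    "model_c": 5.5,
    "model_d": 8.0,
}

# Every unordered pair of distinct responses yields one preference example,
# with the higher-scored response as "chosen" (the InstructGPT-style recipe).
pairs = []
for (m1, s1), (m2, s2) in itertools.combinations(responses.items(), 2):
    if s1 == s2:
        continue  # ties carry no preference signal; skip them
    chosen, rejected = (m1, m2) if s1 > s2 else (m2, m1)
    pairs.append({"chosen": chosen, "rejected": rejected})

print(len(pairs))  # C(4, 2) = 6 comparisons per prompt
```

With 4 responses per prompt this yields up to 6 pairwise examples per prompt, which is why the pairwise-standard dataset is several times larger than the raw one.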
UltraFeedback - Prompts Dataset
Summary
The UltraFeedback - Prompts dataset is a processed version of the UltraFeedback dataset for model evaluation on specific aspects like helpfulness, honesty, and instruction-following.
Data Structure
Format: Conversational
Type: Prompt-only
Column:
"prompt": The input question or instruction provided to the model.
Generation script
The script used to generate this dataset can be found here.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned)
This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about Argilla's approach towards UltraFeedback binarization at argilla/ultrafeedback-binarized-preferences/README.md.
Differences with argilla/ultrafeedback-binarized-preferences… See the full description on the dataset page: https://huggingface.co/datasets/pharaouk/ultrafeedback-binarized-preferences-cleaned.
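The "average of preference ratings" binarization named in the title can be sketched as follows. The aspect names follow the four aspects UltraFeedback is described as rating; the values, and the choice of the lowest-rated completion as "rejected", are illustrative assumptions (Argilla's README documents their exact selection strategy):

```python
from statistics import mean

# Made-up per-aspect ratings for four completions of one prompt.
completions = [
    {"text": "response A", "ratings": {"helpfulness": 4, "honesty": 5, "instruction_following": 4, "truthfulness": 5}},
    {"text": "response B", "ratings": {"helpfulness": 2, "honesty": 3, "instruction_following": 2, "truthfulness": 4}},
    {"text": "response C", "ratings": {"helpfulness": 5, "honesty": 5, "instruction_following": 5, "truthfulness": 5}},
    {"text": "response D", "ratings": {"helpfulness": 1, "honesty": 2, "instruction_following": 3, "truthfulness": 2}},
]

def avg_rating(completion):
    # Average the fine-grained aspect ratings instead of trusting the
    # single overall_score, which is the fix this dataset family applies.
    return mean(completion["ratings"].values())

ranked = sorted(completions, key=avg_rating, reverse=True)
binarized = {"chosen": ranked[0]["text"], "rejected": ranked[-1]["text"]}
print(binarized)  # {'chosen': 'response C', 'rejected': 'response D'}
```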
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Ultrafeedback Curated
This dataset is a curated version of the UltraFeedback dataset, produced by Argilla (using distilabel).
Introduction
You can take a look at argilla/ultrafeedback-binarized-preferences for more context on the UltraFeedback error, but the following excerpt sums up the problem found: After visually browsing around some examples using the sort and filter feature of Argilla (sort by highest rating for chosen responses), we noticed a strong mismatch between… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-curated.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned)
This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about Argilla's approach towards UltraFeedback binarization at argilla/ultrafeedback-binarized-preferences/README.md.
Differences with argilla/ultrafeedback-binarized-preferences… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for UltraFeedback Binarized
Dataset Description
This is a pre-processed version of the UltraFeedback dataset and was used to train Zephyr-7B-β, a state-of-the-art chat model at the 7B parameter scale. The original UltraFeedback dataset consists of 64k prompts, where each prompt is accompanied by four model completions from a wide variety of open and proprietary models. GPT-4 is then used to assign a score to each completion, along criteria like helpfulness… See the full description on the dataset page: https://huggingface.co/datasets/zhengr/ultrafeedback_binarized.
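Binarized datasets like this one typically store each side of the preference pair as a full chat-message list, ready for DPO-style trainers. A minimal sketch of that conversion (the exact column layout of the Zephyr training set may differ; this only illustrates the idea):

```python
def to_dpo_record(prompt, better, worse):
    """Build a DPO-style record: each side is a complete message list."""
    return {
        "prompt": prompt,
        "chosen": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": better},
        ],
        "rejected": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": worse},
        ],
    }

record = to_dpo_record(
    "Name a primary color.",
    "Red is a primary color.",
    "Purple.",
)
print(record["chosen"][1]["content"])  # "Red is a primary color."
```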
Ultrafeedback binarized dataset using the mean of preference ratings
Introduction
This dataset contains the result of curation work performed by Argilla (using Argilla 😃). After visually browsing around 200 examples using the sort and filter feature of Argilla, we noticed a strong mismatch between the overall_score in the original UF dataset (and the Zephyr train_prefs dataset) and the quality of the chosen response. By adding the critique rationale to our Argilla… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-curation.
Ultrafeedback binarized dataset using the mean of preference ratings
Introduction
This dataset contains the result of curation work performed by Argilla (using Argilla 😃). After visually browsing around some examples using the sort and filter feature of Argilla (sort by highest rating for chosen responses), we noticed a strong mismatch between the overall_score in the original UF dataset (and the Zephyr train_prefs dataset) and the quality of the chosen response. By… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences.
UltraFeedback GPT-3.5-Turbo Helpfulness Dataset
Summary
The UltraFeedback GPT-3.5-Turbo Helpfulness dataset contains processed user-assistant interactions filtered for helpfulness, derived from the openbmb/UltraFeedback dataset. It is designed for fine-tuning and evaluating models in alignment tasks.
Data Structure
Format: Conversational
Type: Unpaired preference
Column:
"prompt": The input question or instruction provided to the model.
"completion": The… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness.
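An unpaired-preference row carries a single completion plus a boolean label rather than a chosen/rejected pair. A sketch of deriving such rows from helpfulness ratings; the threshold of 4 and all field values are illustrative assumptions, not the dataset's actual filtering rule:

```python
# Hypothetical rated completions for conversion into unpaired records.
rated = [
    {"prompt": "What is 2+2?", "completion": "4.", "helpfulness": 5},
    {"prompt": "What is 2+2?", "completion": "I refuse to answer.", "helpfulness": 1},
]

# Each completion becomes its own row; the label marks it as desirable
# or undesirable instead of comparing it against a sibling response.
unpaired = [
    {
        "prompt": r["prompt"],
        "completion": r["completion"],
        "label": r["helpfulness"] >= 4,  # illustrative cutoff
    }
    for r in rated
]
print([r["label"] for r in unpaired])  # [True, False]
```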
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
juyoungml/UltraFeedback dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
UltraFeedback Chinese Dataset
UltraFeedback-Chinese is a Chinese version developed based on the construction method of the UltraFeedback dataset, designed specifically for training robust reward and critic models. This dataset supports two training methods: PPO (Proximal Policy Optimization) and DPO (Direct Preference Optimization). UltraFeedback-Chinese… See the full description on the dataset page: https://huggingface.co/datasets/opencsg/UltraFeedback-chinese.
princeton-nlp/mistral-instruct-ultrafeedback dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset for Training wzhouad/gemma-2-9b-it-WPO-HB
This dataset was curated specifically for training the wzhouad/gemma-2-9b-it-WPO-HB model in a hybrid RL setting. The prompts are sourced from the Ultrafeedback dataset, and the corresponding outputs are as follows:
On-Policy Outputs: 5 outputs generated using the gemma-2-9b-it model, based on Ultrafeedback prompts. GPT-4-turbo Outputs: 1 output generated using GPT-4-turbo, based on the same Ultrafeedback prompts.
Due to challenges… See the full description on the dataset page: https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for llama3-ultrafeedback
This dataset was used to train princeton-nlp/Llama-3-Instruct-8B-SimPO. We released an updated version of this dataset annotated with a stronger reward model: princeton-nlp/llama3-ultrafeedback-armorm. If you are interested in training other model types (e.g., Mistral, Gemma-2), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback, and princeton-nlp/gemma2-ultrafeedback-armorm.
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for gemma2-ultrafeedback-armorm
This dataset was used to train princeton-nlp/gemma-2-9b-it-SimPO. If you are interested in training other model types (e.g., Mistral, Llama-3), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback, princeton-nlp/llama3-ultrafeedback, and princeton-nlp/llama3-ultrafeedback-armorm.
Dataset Structure
This dataset contains around 60k training samples and 2k testing samples, following the… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm.
UltraFeedback Extended
An extended version of UltraFeedback with more responses per instruction and a diverse pool of LLM judges.
Overview
The original UltraFeedback dataset pairs each instruction with 4 model responses scored by GPT-4. This dataset extends it in two ways:
1. 10 response models (up from 4), using more recent and diverse LLMs.
2. 10 judge models (instead of GPT-4 alone), each independently scoring every response on a 1-10 scale.
Importantly, the sets of… See the full description on the dataset page: https://huggingface.co/datasets/ltg/ultrafeedback-extended.
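Pooling several judges both smooths out single-judge quirks and exposes contested responses through the spread of their scores. A sketch with made-up scores from 10 judges on a 1-10 scale:

```python
from statistics import mean, stdev

# Made-up scores: 10 judge models each scoring three responses on 1-10.
judge_scores = {
    "response_1": [8, 7, 9, 8, 8, 7, 9, 8, 7, 8],
    "response_2": [4, 5, 3, 4, 6, 4, 5, 4, 3, 5],
    "response_3": [9, 2, 8, 3, 9, 2, 8, 3, 9, 2],  # judges disagree sharply
}

for name, scores in judge_scores.items():
    # The mean ranks responses; the standard deviation flags disagreement
    # that a single-judge setup (GPT-4 alone) would never surface.
    print(name, round(mean(scores), 1), round(stdev(scores), 1))

best = max(judge_scores, key=lambda n: mean(judge_scores[n]))
print(best)  # response_1
```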
sailor2/sea-ultrafeedback dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned) KTO
A KTO signal transformed version of the highly loved UltraFeedback Binarized Preferences Cleaned, the preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback.
This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned-kto.
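KTO trains on unpaired (prompt, completion, label) rows, so a common "KTO signal" transform splits each binarized chosen/rejected pair into two such rows. A sketch of that transform; the column names are illustrative, not necessarily the dataset's exact schema:

```python
def dpo_pair_to_kto_rows(record):
    """Split one chosen/rejected pair into two unpaired KTO examples.

    The chosen completion becomes a desirable (label=True) row and the
    rejected completion an undesirable (label=False) row.
    """
    return [
        {"prompt": record["prompt"], "completion": record["chosen"], "label": True},
        {"prompt": record["prompt"], "completion": record["rejected"], "label": False},
    ]

pair = {
    "prompt": "Define entropy.",
    "chosen": "A measure of disorder in a system.",
    "rejected": "No idea.",
}
rows = dpo_pair_to_kto_rows(pair)
print(len(rows), [r["label"] for r in rows])  # 2 [True, False]
```

Applied over the whole binarized dataset, this doubles the row count while preserving exactly the same preference information.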
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for "ultrafeedback_binarized_cleaned"
Update 1/12/2023: I've removed examples identified as faulty by Argilla - see their awesome work for more details. This is a version of the UltraFeedback binarized dataset but with TruthfulQA prompts removed and source annotations added (so you can filter out samples from different sources yourself if you want!). Please see the binarized dataset card for more information, or the original UltraFeedback dataset card.
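With the source annotations added, filtering out rows from a particular origin is a one-liner. A sketch over an assumed in-memory list (the "source" values here are illustrative):

```python
# Assumed rows carrying the added "source" annotation.
rows = [
    {"prompt": "p1", "source": "ultrachat"},
    {"prompt": "p2", "source": "truthful_qa"},
    {"prompt": "p3", "source": "flan"},
    {"prompt": "p4", "source": "truthful_qa"},
]

# With the Hugging Face datasets library, the equivalent would be
#   ds = ds.filter(lambda row: row["source"] != "truthful_qa")
kept = [r for r in rows if r["source"] != "truthful_qa"]
print([r["prompt"] for r in kept])  # ['p1', 'p3']
```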