100+ datasets found
  1. UltraFeedback

    • huggingface.co
    • opendatalab.com
    Updated Oct 2, 2023
    Cite: OpenBMB (2023). UltraFeedback [Dataset]. https://huggingface.co/datasets/openbmb/UltraFeedback
    Explore at: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset authored and provided by OpenBMB
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Introduction

    GitHub Repo UltraRM-13b UltraCM-13b

    UltraFeedback is a large-scale, fine-grained, diverse preference dataset, used for training powerful reward models and critic models. We collect about 64k prompts from diverse resources (including UltraChat, ShareGPT, Evol-Instruct, TruthfulQA, FalseQA, and FLAN). We then use these prompts to query multiple LLMs (see Table for model lists) and generate 4 different responses for each prompt, resulting in a total of 256k samples. To… See the full description on the dataset page: https://huggingface.co/datasets/openbmb/UltraFeedback.

  2. UltraFeedback-preference-standard

    • huggingface.co
    Updated Mar 2, 2026
    Cite: RLHFlow (2026). UltraFeedback-preference-standard [Dataset]. https://huggingface.co/datasets/RLHFlow/UltraFeedback-preference-standard
    Dataset authored and provided by RLHFlow
    Description

    We include all the possible comparisons following Instruct-GPT. We use the fine-grained_score.

        import os
        import itertools

        import matplotlib.pyplot as plt
        import numpy as np
        import pandas as pd
        from datasets import load_dataset, DatasetDict
        from tqdm import tqdm
        from transformers import AutoTokenizer

        ds = load_dataset("openbmb/UltraFeedback", split="train")

        data = []
        for example in ds:
            prompt = example['instruction']
            responses = {}
            …

    See the full description on the dataset page: https://huggingface.co/datasets/RLHFlow/UltraFeedback-preference-standard.

  3. ultrafeedback-prompt

    • huggingface.co
    Updated Sep 18, 2024
    Cite: TRL (2024). ultrafeedback-prompt [Dataset]. https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt
    Dataset authored and provided by TRL
    Description

    UltraFeedback - Prompts Dataset

      Summary
    

    The UltraFeedback - Prompts dataset is a processed version of the UltraFeedback dataset for model evaluation on specific aspects like helpfulness, honesty, and instruction-following.

      Data Structure
    

    Format: Conversational
    Type: Prompt-only

    Column:

    "prompt": The input question or instruction provided to the model.

      Generation script
    

    The script used to generate this dataset can be found here.

  4. ultrafeedback-binarized-preferences-cleaned

    • huggingface.co
    Updated Apr 5, 2024
    Cite: Farouk (2024). ultrafeedback-binarized-preferences-cleaned [Dataset]. https://huggingface.co/datasets/pharaouk/ultrafeedback-binarized-preferences-cleaned
    Authors: Farouk
    License: MIT License (https://opensource.org/licenses/MIT)

    Description

    UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned)

    This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about Argilla's approach towards UltraFeedback binarization at argilla/ultrafeedback-binarized-preferences/README.md.

      Differences with argilla/ultrafeedback-binarized-preferences… See the full description on the dataset page: https://huggingface.co/datasets/pharaouk/ultrafeedback-binarized-preferences-cleaned.
    
  5. ultrafeedback-curated

    • huggingface.co
    Updated Dec 26, 2024
    Cite: Argilla (2024). ultrafeedback-curated [Dataset]. https://huggingface.co/datasets/argilla/ultrafeedback-curated
    Dataset authored and provided by Argilla
    License: MIT License (https://opensource.org/licenses/MIT)

    Description

    Ultrafeedback Curated

    This dataset is a curated version of the UltraFeedback dataset, produced by Argilla (using distilabel).

      Introduction
    

    You can take a look at argilla/ultrafeedback-binarized-preferences for more context on the UltraFeedback error, but the following excerpt sums up the problem found: After visually browsing around some examples using the sort and filter feature of Argilla (sort by highest rating for chosen responses), we noticed a strong mismatch between… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-curated.

  6. ultrafeedback-binarized-preferences-cleaned

    • huggingface.co
    Updated Dec 11, 2024
    Cite: Argilla (2024). ultrafeedback-binarized-preferences-cleaned [Dataset]. https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned
    Dataset authored and provided by Argilla
    License: MIT License (https://opensource.org/licenses/MIT)

    Description

    UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned)

    This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about Argilla's approach towards UltraFeedback binarization at argilla/ultrafeedback-binarized-preferences/README.md.

      Differences with argilla/ultrafeedback-binarized-preferences… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned.
    
  7. ultrafeedback_binarized

    • huggingface.co
    Updated Oct 25, 2023
    Cite: RobinZ (2023). ultrafeedback_binarized [Dataset]. https://huggingface.co/datasets/zhengr/ultrafeedback_binarized
    Authors: RobinZ
    License: MIT License (https://opensource.org/licenses/MIT)

    Description

    Dataset Card for UltraFeedback Binarized

      Dataset Description
    

    This is a pre-processed version of the UltraFeedback dataset and was used to train Zephyr-7B-β, a state-of-the-art chat model at the 7B parameter scale. The original UltraFeedback dataset consists of 64k prompts, where each prompt is accompanied by four model completions from a wide variety of open and proprietary models. GPT-4 is then used to assign a score to each completion, along criteria like helpfulness… See the full description on the dataset page: https://huggingface.co/datasets/zhengr/ultrafeedback_binarized.
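    The binarization step described above can be sketched as follows. This is an illustrative reconstruction, not the exact script used to build this dataset, and the record layout ("completions" with a "score" per entry) is an assumption for the example: take the highest-scored of the four completions as "chosen" and one of the lower-scored completions as "rejected".

    ```python
    # Illustrative sketch of preference binarization (assumed record layout,
    # not the actual ultrafeedback_binarized build script).
    import random

    def binarize(example, seed=0):
        """Turn {prompt, completions: [{text, score}, ...]} into a preference pair."""
        rng = random.Random(seed)
        ranked = sorted(example["completions"], key=lambda c: c["score"], reverse=True)
        chosen = ranked[0]                  # best-scored completion
        rejected = rng.choice(ranked[1:])   # any completion scored below the best
        return {
            "prompt": example["prompt"],
            "chosen": chosen["text"],
            "rejected": rejected["text"],
        }

    example = {
        "prompt": "Name a primary color.",
        "completions": [
            {"text": "Blue is a primary color.", "score": 9.0},
            {"text": "Green.", "score": 4.0},
            {"text": "Primary colors include red.", "score": 7.5},
            {"text": "I cannot answer that.", "score": 2.0},
        ],
    }
    pair = binarize(example)  # chosen = the 9.0-scored completion
    ```

    Note that the choice of which score to rank by matters: as the Argilla entries below discuss, ranking by the original overall_score rather than the fine-grained ratings produced mismatches with response quality.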

  8. ultrafeedback-binarized-curation

    • huggingface.co
    Updated Nov 14, 2023
    Cite: Argilla (2023). ultrafeedback-binarized-curation [Dataset]. https://huggingface.co/datasets/argilla/ultrafeedback-binarized-curation
    Dataset authored and provided by Argilla
    Description

    Ultrafeedback binarized dataset using the mean of preference ratings

      Introduction
    

    This dataset contains the result of curation work performed by Argilla (using Argilla 😃). After visually browsing around 200 examples using the sort and filter feature of Argilla, we noticed a strong mismatch between the overall_score in the original UF dataset (and the Zephyr train_prefs dataset) and the quality of the chosen response. By adding the critique rationale to our Argilla… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-curation.

  9. ultrafeedback-binarized-preferences

    • huggingface.co
    Cite: Argilla, ultrafeedback-binarized-preferences [Dataset]. https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences
    Dataset authored and provided by Argilla
    Description

    Ultrafeedback binarized dataset using the mean of preference ratings

      Introduction
    

    This dataset contains the result of curation work performed by Argilla (using Argilla 😃). After visually browsing around some examples using the sort and filter feature of Argilla (sort by highest rating for chosen responses), we noticed a strong mismatch between the overall_score in the original UF dataset (and the Zephyr train_prefs dataset) and the quality of the chosen response. By… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences.

  10. ultrafeedback-gpt-3.5-turbo-helpfulness

    • huggingface.co
    Updated Oct 20, 2025
    Cite: TRL (2025). ultrafeedback-gpt-3.5-turbo-helpfulness [Dataset]. https://huggingface.co/datasets/trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness
    Dataset authored and provided by TRL
    Description

    UltraFeedback GPT-3.5-Turbo Helpfulness Dataset

      Summary
    

    The UltraFeedback GPT-3.5-Turbo Helpfulness dataset contains processed user-assistant interactions filtered for helpfulness, derived from the openbmb/UltraFeedback dataset. It is designed for fine-tuning and evaluating models in alignment tasks.

      Data Structure
    

    Format: Conversational
    Type: Unpaired preference

    Columns:

    "prompt": The input question or instruction provided to the model.
    "completion": The… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness.

  11. UltraFeedback

    • huggingface.co
    Updated Jan 8, 2024
    Cite: Juyoung Suk (2024). UltraFeedback [Dataset]. https://huggingface.co/datasets/juyoungml/UltraFeedback
    Authors: Juyoung Suk
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)

    Description

    juyoungml/UltraFeedback dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. UltraFeedback-chinese

    • huggingface.co
    Updated Jan 7, 2025
    Cite: opencsg (2025). UltraFeedback-chinese [Dataset]. https://huggingface.co/datasets/opencsg/UltraFeedback-chinese
    Dataset authored and provided by opencsg
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)

    Description


      UltraFeedback Chinese Dataset
    

    UltraFeedback-Chinese is a Chinese-language version built following the construction method of the UltraFeedback dataset, designed specifically for training robust reward and critic models. This dataset supports two training methods: PPO (Proximal Policy Optimization) and DPO (Direct Preference Optimization). UltraFeedback-Chinese… See the full description on the dataset page: https://huggingface.co/datasets/opencsg/UltraFeedback-chinese.

  13. mistral-instruct-ultrafeedback

    • huggingface.co
    Updated Aug 31, 2024
    Cite: Princeton NLP group (2024). mistral-instruct-ultrafeedback [Dataset]. https://huggingface.co/datasets/princeton-nlp/mistral-instruct-ultrafeedback
    Authors: Princeton NLP group
    Description

    princeton-nlp/mistral-instruct-ultrafeedback dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. gemma-2-ultrafeedback-hybrid

    • huggingface.co
    Updated Sep 3, 2007
    Cite: Wenxuan Zhou (2007). gemma-2-ultrafeedback-hybrid [Dataset]. https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid
    Authors: Wenxuan Zhou
    Description

    Dataset for Training wzhouad/gemma-2-9b-it-WPO-HB

    This dataset was curated specifically for training the wzhouad/gemma-2-9b-it-WPO-HB model in a hybrid RL setting. The prompts are sourced from the Ultrafeedback dataset, and the corresponding outputs are as follows:

    On-Policy Outputs: 5 outputs generated using the gemma-2-9b-it model, based on Ultrafeedback prompts.
    GPT-4-turbo Outputs: 1 output generated using GPT-4-turbo, based on the same Ultrafeedback prompts.

    Due to challenges… See the full description on the dataset page: https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid.

  15. llama3-ultrafeedback

    • huggingface.co
    Updated May 23, 2024
    Cite: Princeton NLP group (2024). llama3-ultrafeedback [Dataset]. https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback
    Authors: Princeton NLP group
    License: MIT License (https://opensource.org/licenses/MIT)

    Description

    Dataset Card for llama3-ultrafeedback

    This dataset was used to train princeton-nlp/Llama-3-Instruct-8B-SimPO. We released an updated version of this dataset annotated with a stronger reward model: princeton-nlp/llama3-ultrafeedback-armorm. If you are interested in training other model types (e.g., Mistral, Gemma-2), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback, and princeton-nlp/gemma2-ultrafeedback-armorm.

      Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback.
    
  16. gemma2-ultrafeedback-armorm

    • huggingface.co
    Updated Feb 17, 2025
    Cite: Princeton NLP group (2025). gemma2-ultrafeedback-armorm [Dataset]. https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm
    Authors: Princeton NLP group
    License: MIT License (https://opensource.org/licenses/MIT)

    Description

    Dataset Card for gemma2-ultrafeedback-armorm

    This dataset was used to train princeton-nlp/gemma-2-9b-it-SimPO. If you are interested in training other model types (e.g., Mistral, Llama-3), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback, princeton-nlp/llama3-ultrafeedback, and princeton-nlp/llama3-ultrafeedback-armorm.

      Dataset Structure
    

    This dataset contains around 60k training samples and 2k testing samples, following the… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm.

  17. ultrafeedback-extended

    • huggingface.co
    Updated Mar 25, 2026
    Cite: Language Technology Group (University of Oslo) (2026). ultrafeedback-extended [Dataset]. https://huggingface.co/datasets/ltg/ultrafeedback-extended
    Dataset authored and provided by Language Technology Group (University of Oslo)
    Description

    UltraFeedback Extended

    An extended version of UltraFeedback with more responses per instruction and a diverse pool of LLM judges.

      Overview
    

    The original UltraFeedback dataset pairs each instruction with 4 model responses scored by GPT-4. This dataset extends it in two ways:

    10 response models (up from 4), using more recent and diverse LLMs.
    10 judge models (instead of GPT-4 alone), each independently scoring every response on a 1-10 scale.

    Importantly, the sets of… See the full description on the dataset page: https://huggingface.co/datasets/ltg/ultrafeedback-extended.
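    With ten judges independently scoring each response, one simple way to use the scores is to aggregate them per response, for example by the mean, and rank responses best-first. The snippet below is a sketch under that assumption; the response identifiers and score layout are illustrative, not the actual ltg/ultrafeedback-extended schema.

    ```python
    # Sketch: rank responses by mean judge score (illustrative data layout,
    # not the actual ltg/ultrafeedback-extended field names).
    from statistics import mean

    def rank_responses(judge_scores):
        """judge_scores maps response id -> list of per-judge scores (1-10).
        Returns response ids sorted from highest to lowest mean score."""
        return sorted(judge_scores, key=lambda r: mean(judge_scores[r]), reverse=True)

    scores = {
        "resp_a": [7, 8, 6, 9, 7, 8, 7, 6, 8, 7],   # mean 7.3
        "resp_b": [5, 4, 6, 5, 5, 6, 4, 5, 5, 5],   # mean 5.0
        "resp_c": [9, 8, 9, 10, 9, 8, 9, 9, 8, 9],  # mean 8.8
    }
    ranking = rank_responses(scores)  # best response first
    ```

    Averaging over a pool of judges rather than relying on a single GPT-4 score is one way to reduce the influence of any one judge's biases.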

  18. sea-ultrafeedback

    • huggingface.co
    Updated Feb 17, 2025
    Cite: Sailor2 (2025). sea-ultrafeedback [Dataset]. https://huggingface.co/datasets/sailor2/sea-ultrafeedback
    Dataset authored and provided by Sailor2
    Description

    sailor2/sea-ultrafeedback dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. ultrafeedback-binarized-preferences-cleaned-kto

    • huggingface.co
    Updated Feb 2, 2024
    Cite: Argilla (2024). ultrafeedback-binarized-preferences-cleaned-kto [Dataset]. https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned-kto
    Dataset authored and provided by Argilla
    License: MIT License (https://opensource.org/licenses/MIT)

    Description

    UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned) KTO

    A KTO-signal-transformed version of the highly loved UltraFeedback Binarized Preferences Cleaned, the dataset preferred by Argilla to use from now on when fine-tuning on UltraFeedback.

    This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned-kto.
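    A KTO-style transformation like the one described above can be sketched as follows: each (prompt, chosen, rejected) preference pair is split into two unpaired rows, one marked desirable and one undesirable. The field names here are assumptions for illustration, not necessarily the exact columns of this dataset.

    ```python
    # Sketch of a KTO-style signal transformation (assumed field names):
    # one paired preference row becomes two unpaired rows with a binary label.
    def pair_to_kto(row):
        return [
            {"prompt": row["prompt"], "completion": row["chosen"], "label": True},
            {"prompt": row["prompt"], "completion": row["rejected"], "label": False},
        ]

    pairs = [{"prompt": "2+2?", "chosen": "4", "rejected": "5"}]
    kto_rows = [r for p in pairs for r in pair_to_kto(p)]  # 2 rows per pair
    ```

    This unpaired format is what KTO-style trainers consume: instead of comparing two completions directly, each completion carries its own desirable/undesirable label.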

  20. ultrafeedback_binarized_cleaned

    • huggingface.co
    Cite: Ai2, ultrafeedback_binarized_cleaned [Dataset]. https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned
    Dataset authored and provided by Ai2
    License: MIT License (https://opensource.org/licenses/MIT)

    Description

    Dataset Card for "ultrafeedback_binarized_cleaned"

    Update 1/12/2023: I've removed examples identified as faulty by Argilla - see their awesome work for more details. This is a version of the UltraFeedback binarized dataset but with TruthfulQA prompts removed and source annotations added (so you can filter out samples from different sources yourself if you want!). Please see the binarized dataset card for more information, or the original UltraFeedback dataset card.
