100+ datasets found
  1. UltraFeedback

    • huggingface.co
    • opendatalab.com
    Updated Sep 26, 2023
    + more versions
    Cite
    OpenBMB (2023). UltraFeedback [Dataset]. https://huggingface.co/datasets/openbmb/UltraFeedback
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 26, 2023
    Dataset authored and provided by
    OpenBMB
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Introduction

    GitHub Repo UltraRM-13b UltraCM-13b

    UltraFeedback is a large-scale, fine-grained, diverse preference dataset, used for training powerful reward models and critic models. We collect about 64k prompts from diverse resources (including UltraChat, ShareGPT, Evol-Instruct, TruthfulQA, FalseQA, and FLAN). We then use these prompts to query multiple LLMs (see Table for model lists) and generate 4 different responses for each prompt, resulting in a total of 256k samples. To… See the full description on the dataset page: https://huggingface.co/datasets/openbmb/UltraFeedback.
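The arithmetic in the description (64k prompts, 4 responses each, 256k total samples) can be sketched as follows. The field names here are illustrative assumptions, not the dataset's actual schema; see the dataset card for the real column layout.

```python
# Sketch of the record shape described above (assumed field names for
# illustration only; the real UltraFeedback schema may differ). Each
# prompt carries 4 model completions with per-aspect annotations.
record = {
    "instruction": "Explain why the sky is blue.",
    "completions": [
        {"model": f"model-{i}", "response": f"answer {i}",
         "annotations": {"helpfulness": {"Rating": str(3 + i % 2)}}}
        for i in range(4)
    ],
}

def total_samples(records):
    """One sample per (prompt, completion) pair."""
    return sum(len(r["completions"]) for r in records)

# 64k prompts with 4 completions each -> 256k samples, as the card states.
print(total_samples([record] * 64_000))  # 256000
```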

  2. ultrafeedback-binarized-preferences-cleaned

    • huggingface.co
    Updated Apr 5, 2024
    + more versions
    Cite
    Farouk (2024). ultrafeedback-binarized-preferences-cleaned [Dataset]. https://huggingface.co/datasets/pharaouk/ultrafeedback-binarized-preferences-cleaned
    Explore at:
    Croissant
    Dataset updated
    Apr 5, 2024
    Authors
    Farouk
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned)

    This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about Argilla's approach towards UltraFeedback binarization at argilla/ultrafeedback-binarized-preferences/README.md.

      Differences with argilla/ultrafeedback-binarized-preferences… See the full description on the dataset page: https://huggingface.co/datasets/pharaouk/ultrafeedback-binarized-preferences-cleaned.
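The "binarized using the average of preference ratings" idea can be sketched as below. This is a simplified reading of the title, not Argilla's actual pipeline (which differs in details such as how the rejected response is sampled and which records are cleaned out).

```python
from statistics import mean

def binarize(prompt, completions):
    """Collapse N rated completions into one (chosen, rejected) pair by
    the average of their per-aspect preference ratings. Simplified
    sketch: chosen = best average, rejected = worst average."""
    ranked = sorted(completions,
                    key=lambda c: mean(c["ratings"].values()),
                    reverse=True)
    return {"prompt": prompt,
            "chosen": ranked[0]["text"],
            "rejected": ranked[-1]["text"]}

pair = binarize("What is 2+2?", [
    {"text": "4",    "ratings": {"helpfulness": 5, "honesty": 5}},
    {"text": "5",    "ratings": {"helpfulness": 1, "honesty": 2}},
    {"text": "four", "ratings": {"helpfulness": 4, "honesty": 5}},
])
print(pair["chosen"], pair["rejected"])  # 4 5
```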
    
  3. ultrafeedback-curated

    • huggingface.co
    Updated Dec 13, 2023
    Cite
    Argilla (2023). ultrafeedback-curated [Dataset]. https://huggingface.co/datasets/argilla/ultrafeedback-curated
    Explore at:
    Croissant
    Dataset updated
    Dec 13, 2023
    Dataset authored and provided by
    Argilla
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Ultrafeedback Curated

    This dataset is a curated version of the UltraFeedback dataset, produced by Argilla (using distilabel).

      Introduction
    

    You can take a look at argilla/ultrafeedback-binarized-preferences for more context on the UltraFeedback error, but the following excerpt sums up the problem found: After visually browsing around some examples using the sort and filter feature of Argilla (sort by highest rating for chosen responses), we noticed a strong mismatch between… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-curated.

  4. ultrafeedback_binarized

    • huggingface.co
    Updated Nov 8, 2023
    Cite
    RobinZ (2023). ultrafeedback_binarized [Dataset]. https://huggingface.co/datasets/zhengr/ultrafeedback_binarized
    Explore at:
    Croissant
    Dataset updated
    Nov 8, 2023
    Authors
    RobinZ
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for UltraFeedback Binarized

      Dataset Description
    

    This is a pre-processed version of the UltraFeedback dataset and was used to train Zephyr-7B-β, a state-of-the-art chat model at the 7B parameter scale. The original UltraFeedback dataset consists of 64k prompts, where each prompt is accompanied by four model completions from a wide variety of open and proprietary models. GPT-4 is then used to assign a score to each completion, along criteria like helpfulness… See the full description on the dataset page: https://huggingface.co/datasets/zhengr/ultrafeedback_binarized.

  5. ultrafeedback-binarized-curation

    • huggingface.co
    Updated Nov 14, 2023
    + more versions
    Cite
    Argilla (2023). ultrafeedback-binarized-curation [Dataset]. https://huggingface.co/datasets/argilla/ultrafeedback-binarized-curation
    Explore at:
    Croissant
    Dataset updated
    Nov 14, 2023
    Dataset authored and provided by
    Argilla
    Description

    Ultrafeedback binarized dataset using the mean of preference ratings

      Introduction
    

    This dataset contains the result of curation work performed by Argilla (using Argilla 😃). After visually browsing around 200 examples using the sort and filter feature of Argilla, we noticed a strong mismatch between the overall_score in the original UF dataset (and the Zephyr train_prefs dataset) and the quality of the chosen response. By adding the critique rationale to our Argilla… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-curation.

  6. ultrafeedback-prompt

    • huggingface.co
    Updated Sep 16, 2024
    Cite
    TRL (2024). ultrafeedback-prompt [Dataset]. https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt
    Explore at:
    Croissant
    Dataset updated
    Sep 16, 2024
    Dataset authored and provided by
    TRL
    Description

    UltraFeedback - Prompts Dataset

      Summary
    

    The UltraFeedback - Prompts dataset is a processed version of the UltraFeedback dataset for model evaluation on specific aspects like helpfulness, honesty, and instruction-following.

      Data Structure
    

    Format: conversational
    Type: prompt-only

    Columns:

    "prompt": The input question or instruction provided to the model.

      Generation script
    

    The script used to generate this dataset can be found here.
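A "conversational, prompt-only" record as described above can be sketched like this. The message shape follows the common Hugging Face chat convention ("role"/"content"); that convention, and the helper below, are assumptions for illustration.

```python
# Sketch of a conversational, prompt-only example: a list of chat
# messages ending with the user turn, and no completion attached.
example = {
    "prompt": [
        {"role": "user",
         "content": "How can I improve my time management skills?"},
    ],
}

def is_prompt_only(ex):
    """Prompt-only: a prompt is present but no completion or
    chosen/rejected columns (hypothetical check for illustration)."""
    return "prompt" in ex and "completion" not in ex and "chosen" not in ex

print(is_prompt_only(example))  # True
```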

  7. gemma2-ultrafeedback-armorm

    • huggingface.co
    Updated Jul 16, 2024
    + more versions
    Cite
    Princeton NLP group (2024). gemma2-ultrafeedback-armorm [Dataset]. https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm
    Explore at:
    Croissant
    Dataset updated
    Jul 16, 2024
    Authors
    Princeton NLP group
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for gemma2-ultrafeedback-armorm

    This dataset was used to train princeton-nlp/gemma-2-9b-it-SimPO. If you are interested in training other model types (e.g., Mistral, Llama-3), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback, princeton-nlp/llama3-ultrafeedback, and princeton-nlp/llama3-ultrafeedback-armorm.

      Dataset Structure
    

    This dataset contains around 60k training samples and 2k testing samples, following the… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm.

  8. llama3-ultrafeedback

    • huggingface.co
    Updated May 27, 2024
    Cite
    Princeton NLP group (2024). llama3-ultrafeedback [Dataset]. https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback
    Explore at:
    Croissant
    Dataset updated
    May 27, 2024
    Authors
    Princeton NLP group
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for llama3-ultrafeedback

    This dataset was used to train princeton-nlp/Llama-3-Instruct-8B-SimPO. We released an updated version of this dataset annotated with a stronger reward model: princeton-nlp/llama3-ultrafeedback-armorm. If you are interested in training other model types (e.g., Mistral, Gemma-2), please refer to their corresponding datasets: princeton-nlp/mistral-instruct-ultrafeedback, and princeton-nlp/gemma2-ultrafeedback-armorm.

      Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback.
    
  9. openbmb-UltraFeedback-v2

    • huggingface.co
    Updated Feb 21, 2025
    + more versions
    Cite
    Pi Labs Inc. (2025). openbmb-UltraFeedback-v2 [Dataset]. https://huggingface.co/datasets/withpi/openbmb-UltraFeedback-v2
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Pi Labs, Inc.
    Authors
    Pi Labs Inc.
    Description

    The withpi/openbmb-UltraFeedback-v2 dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  10. UltraFeedback-chinese

    • huggingface.co
    Updated Jan 7, 2025
    Cite
    opencsg (2025). UltraFeedback-chinese [Dataset]. https://huggingface.co/datasets/opencsg/UltraFeedback-chinese
    Explore at:
    Croissant
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    opencsg
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description


      UltraFeedback Chinese Dataset
    

    UltraFeedback-Chinese is a Chinese version developed based on the construction method of the UltraFeedback dataset, designed specifically for training robust reward and critic models. This dataset supports two training methods: PPO (Proximal Policy Optimization) and DPO (Direct Preference Optimization). UltraFeedback-Chinese… See the full description on the dataset page: https://huggingface.co/datasets/opencsg/UltraFeedback-chinese.

  11. ultrafeedback-binarized-preferences-cleaned-kto

    • huggingface.co
    Updated Dec 11, 2024
    Cite
    Argilla (2024). ultrafeedback-binarized-preferences-cleaned-kto [Dataset]. https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned-kto
    Explore at:
    Croissant
    Dataset updated
    Dec 11, 2024
    Dataset authored and provided by
    Argilla
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    UltraFeedback - Binarized using the Average of Preference Ratings (Cleaned) KTO

    A KTO-signal-transformed version of the widely loved UltraFeedback Binarized Preferences Cleaned, the dataset Argilla now recommends for fine-tuning on UltraFeedback.

    This dataset represents a new iteration on top of argilla/ultrafeedback-binarized-preferences, and is the recommended and preferred dataset by Argilla to use from now on when fine-tuning on UltraFeedback. Read more about… See the full description on the dataset page: https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned-kto.
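A "KTO signal transformation" of a paired preference dataset can be sketched as below: each (prompt, chosen, rejected) pair becomes two unpaired examples with a boolean desirability label. This is a minimal sketch of the general technique; the released dataset's exact columns may differ.

```python
def to_kto(paired):
    """Split each (prompt, chosen, rejected) preference pair into two
    unpaired KTO examples: the chosen completion labeled desirable,
    the rejected one labeled undesirable."""
    out = []
    for ex in paired:
        out.append({"prompt": ex["prompt"], "completion": ex["chosen"],
                    "label": True})
        out.append({"prompt": ex["prompt"], "completion": ex["rejected"],
                    "label": False})
    return out

kto = to_kto([{"prompt": "Say hi", "chosen": "Hi!", "rejected": "Go away."}])
print(len(kto), kto[0]["label"], kto[1]["label"])  # 2 True False
```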

  12. ultrafeedback-mini

    • huggingface.co
    Cite
    Alvaro Bartolome, ultrafeedback-mini [Dataset]. https://huggingface.co/datasets/alvarobartt/ultrafeedback-mini
    Explore at:
    Croissant
    Authors
    Alvaro Bartolome
    Description

    The alvarobartt/ultrafeedback-mini dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  13. ultrafeedback-instruction-dataset

    • huggingface.co
    Updated Aug 25, 2024
    + more versions
    Cite
    Hassaan Qaisar (2024). ultrafeedback-instruction-dataset [Dataset]. https://huggingface.co/datasets/hassaan-qaisar/ultrafeedback-instruction-dataset
    Explore at:
    Croissant
    Dataset updated
    Aug 25, 2024
    Authors
    Hassaan Qaisar
    Description

    Dataset Card for ultrafeedback-instruction-dataset

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/hassaan-qaisar/ultrafeedback-instruction-dataset/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/hassaan-qaisar/ultrafeedback-instruction-dataset.

  14. tulu-3-ultrafeedback-cleaned-on-policy-8b

    • huggingface.co
    Updated Apr 30, 2025
    + more versions
    Cite
    Ai2 (2025). tulu-3-ultrafeedback-cleaned-on-policy-8b [Dataset]. https://huggingface.co/datasets/allenai/tulu-3-ultrafeedback-cleaned-on-policy-8b
    Explore at:
    Croissant
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    Description

    Llama 3.1 Tulu 3 Ultrafeedback (Cleaned) (on-policy 8B)

    Note that this collection is licensed under ODC-BY-1.0 license; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact. This preference dataset is part of our Tulu 3 preference mixture. It contains prompts from Ai2's cleaned version of Ultrafeedback which removes instances of TruthfulQA. We further filtered this dataset to remove… See the full description on the dataset page: https://huggingface.co/datasets/allenai/tulu-3-ultrafeedback-cleaned-on-policy-8b.

  15. ultrafeedback-gpt-3.5-turbo-helpfulness

    • huggingface.co
    Updated Jan 8, 2025
    Cite
    TRL (2025). ultrafeedback-gpt-3.5-turbo-helpfulness [Dataset]. https://huggingface.co/datasets/trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness
    Explore at:
    Croissant
    Dataset updated
    Jan 8, 2025
    Dataset authored and provided by
    TRL
    Description

    UltraFeedback GPT-3.5-Turbo Helpfulness Dataset

      Summary
    

    The UltraFeedback GPT-3.5-Turbo Helpfulness dataset contains processed user-assistant interactions filtered for helpfulness, derived from the openbmb/UltraFeedback dataset. It is designed for fine-tuning and evaluating models in alignment tasks.

      Data Structure
    

    Format: conversational
    Type: unpaired preference

    Columns:

    "prompt": The input question or instruction provided to the model. "completion": The… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness.

  16. ultrafeedback_binarized_cleaned

    • huggingface.co
    + more versions
    Cite
    Ai2 (2023). ultrafeedback_binarized_cleaned [Dataset]. https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned
    Explore at:
    Croissant
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for "ultrafeedback_binarized_cleaned"

    Update 1/12/2023: I've removed examples identified as faulty by Argilla - see their awesome work for more details. This is a version of the UltraFeedback binarized dataset but with TruthfulQA prompts removed and source annotations added (so you can filter out samples from different sources yourself if you want!). Please see the binarized dataset card for more information, or the original UltraFeedback dataset card.

  17. gemma-2-ultrafeedback-hybrid

    • huggingface.co
    Updated Aug 22, 2024
    Cite
    Wenxuan Zhou (2024). gemma-2-ultrafeedback-hybrid [Dataset]. https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid
    Explore at:
    Croissant
    Dataset updated
    Aug 22, 2024
    Authors
    Wenxuan Zhou
    Description

    Dataset for Training wzhouad/gemma-2-9b-it-WPO-HB

    This dataset was curated specifically for training the wzhouad/gemma-2-9b-it-WPO-HB model in a hybrid RL setting. The prompts are sourced from the Ultrafeedback dataset, and the corresponding outputs are as follows:

    On-policy outputs: 5 outputs generated using the gemma-2-9b-it model, based on Ultrafeedback prompts.
    GPT-4-turbo outputs: 1 output generated using GPT-4-turbo, based on the same Ultrafeedback prompts.

    Due to challenges… See the full description on the dataset page: https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid.
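The hybrid candidate pool described above (5 on-policy generations plus 1 GPT-4-turbo output per prompt) can be sketched as follows. The function and field names are hypothetical; the released dataset's layout is not reproduced here.

```python
def hybrid_pool(prompt, on_policy, gpt4_out):
    """Assemble the 6-candidate pool described above: 5 on-policy
    generations from the policy model plus 1 GPT-4-turbo output.
    Illustrative sketch only."""
    assert len(on_policy) == 5, "the card describes 5 on-policy outputs"
    candidates = [{"text": t, "source": "gemma-2-9b-it"} for t in on_policy]
    candidates.append({"text": gpt4_out, "source": "gpt-4-turbo"})
    return {"prompt": prompt, "candidates": candidates}

pool = hybrid_pool("Hello?", [f"gen {i}" for i in range(5)], "Hi there!")
print(len(pool["candidates"]))  # 6
```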

  18. sea-ultrafeedback

    • huggingface.co
    Updated Feb 17, 2025
    + more versions
    Cite
    Sailor2 (2025). sea-ultrafeedback [Dataset]. https://huggingface.co/datasets/sailor2/sea-ultrafeedback
    Explore at:
    Croissant
    Dataset updated
    Feb 17, 2025
    Dataset authored and provided by
    Sailor2
    Description

    The sailor2/sea-ultrafeedback dataset is hosted on Hugging Face and contributed by the HF Datasets community.

  19. llama3-ultrafeedback-hybrid

    • huggingface.co
    Updated Aug 22, 2024
    + more versions
    Cite
    Wenxuan Zhou (2024). llama3-ultrafeedback-hybrid [Dataset]. https://huggingface.co/datasets/wzhouad/llama3-ultrafeedback-hybrid
    Explore at:
    Croissant
    Dataset updated
    Aug 22, 2024
    Authors
    Wenxuan Zhou
    Description

    Dataset for Training wzhouad/Llama3-Instruct-8B-WPO-HB

    This dataset was curated specifically for training the wzhouad/Llama3-Instruct-8B-WPO-HB model in a hybrid RL setting. The prompts are sourced from the Ultrafeedback dataset, and the corresponding outputs are as follows:

    On-policy outputs: 5 outputs generated using the meta-llama/Meta-Llama-3-8B-Instruct model, based on Ultrafeedback prompts.
    GPT-4-turbo outputs: 1 output generated using GPT-4-turbo, based on the same… See the full description on the dataset page: https://huggingface.co/datasets/wzhouad/llama3-ultrafeedback-hybrid.

  20. Ultrafeedback

    • huggingface.co
    Updated Feb 10, 2023
    + more versions
    Cite
    Andres Montenegro (2023). Ultrafeedback [Dataset]. https://huggingface.co/datasets/Andresckamilo/Ultrafeedback
    Explore at:
    Croissant
    Dataset updated
    Feb 10, 2023
    Authors
    Andres Montenegro
    Description

    Dataset Card for Ultrafeedback

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/Andresckamilo/Ultrafeedback/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/Andresckamilo/Ultrafeedback.
