MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is part of Anthropic's HH data used to train their RLHF assistant (https://github.com/anthropics/hh-rlhf). The data contains the first utterance from the human to the dialog agent and the number of words in that utterance. The sampled version is a random sample of size 200.
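The word-count feature described above can be reproduced with a simple whitespace split. A minimal sketch follows; the field names `utterance` and `num_words` are illustrative, not the dataset's actual column names:

```python
# Minimal sketch: derive the word count of a first utterance, as
# described above. Field names here are illustrative only.

def count_words(utterance: str) -> int:
    """Count whitespace-separated words in an utterance."""
    return len(utterance.split())

examples = [
    {"utterance": "How do I bake sourdough bread at home?"},
    {"utterance": "Tell me a joke."},
]

for ex in examples:
    ex["num_words"] = count_words(ex["utterance"])

print([ex["num_words"] for ex in examples])  # [8, 4]
```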
Dahoas/rm-hh-rlhf dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "hh-rlhf-h4"
More Information needed
HFXM/hh-rlhf-Rule17 dataset hosted on Hugging Face and contributed by the HF Datasets community
zekeZZ/hh-rlhf-dpo dataset hosted on Hugging Face and contributed by the HF Datasets community
Llama 2 Community License: https://choosealicense.com/licenses/llama2/
Chinese translation of hh-rlhf
The helpful and harmless data open-sourced alongside Anthropic's paper Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, translated with machine-translation tools. File breakdown:
hh_rlhf_train.jsonl: merged Chinese and English training sets, about 170,000 examples after cleaning
hh_rlhf_test.jsonl: merged Chinese and English test sets, about 9,000 examples after cleaning
harmless_base_cn_train.jsonl: 42,394 examples
harmless_base_cn_test.jsonl: 2,304 examples
helpful_base_cn_train.jsonl: 43,722 examples
helpful_base_cn_test.jsonl: 2,346 examples
Experiment report
Related RLHF experiment report: https://zhuanlan.zhihu.com/p/652044120
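The merged .jsonl files listed above store one JSON object per line. A minimal, self-contained sketch of loading such a file with the standard library; the `chosen`/`rejected` keys are an assumption about the record schema, not confirmed by the card:

```python
import json
import tempfile
from pathlib import Path

# Minimal sketch: read a JSON-lines preference file such as
# hh_rlhf_train.jsonl. The record keys below are assumed, not
# taken from the actual files.
def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Create a tiny stand-in file so the sketch is self-contained.
tmp = Path(tempfile.mkdtemp()) / "sample.jsonl"
records = [
    {"chosen": "Human: Hi\nAssistant: Hello!",
     "rejected": "Human: Hi\nAssistant: Go away."},
]
tmp.write_text("\n".join(json.dumps(r, ensure_ascii=False) for r in records),
               encoding="utf-8")

data = load_jsonl(tmp)
print(len(data))  # 1
```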
HuggingFaceH4/h4-anthropic-hh-rlhf-helpful-base-gen dataset hosted on Hugging Face and contributed by the HF Datasets community
shbyun080/hh-rlhf-en dataset hosted on Hugging Face and contributed by the HF Datasets community
HH-RLHF-Harmless-Base Dataset
Summary
The HH-RLHF-Harmless-Base dataset is a processed version of Anthropic's HH-RLHF dataset, specifically curated to train models using the TRL library for preference learning and alignment tasks. It contains pairs of text samples, each labeled as either "chosen" or "rejected," based on human preferences regarding the harmlessness of the responses. This dataset enables models to learn human preferences in generating harmless responses… See the full description on the dataset page: https://huggingface.co/datasets/xinpeng/hh-rlhf-base.
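The chosen/rejected pair structure described above can be illustrated with a toy example. In HH-style pairs the two transcripts share the same human prompt and diverge at the assistant's reply; the sketch below recovers that shared prompt as a longest common prefix. The transcripts and schema are invented for illustration; the hosted dataset's exact column layout may differ:

```python
# Toy illustration of a preference pair: each example holds a
# "chosen" and a "rejected" dialog transcript. Contents are
# invented for illustration, not drawn from the dataset.
pair = {
    "chosen": "Human: How do I dispose of old paint?\n\n"
              "Assistant: Take it to a hazardous-waste collection site.",
    "rejected": "Human: How do I dispose of old paint?\n\n"
                "Assistant: Just pour it down the drain.",
}

def shared_prompt(chosen: str, rejected: str) -> str:
    """Return the longest common prefix of the two transcripts,
    i.e. the shared human prompt in HH-style pairs."""
    i = 0
    while i < min(len(chosen), len(rejected)) and chosen[i] == rejected[i]:
        i += 1
    return chosen[:i]

prompt = shared_prompt(pair["chosen"], pair["rejected"])
print(prompt.strip())
```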
Jise/hh-rlhf-helpful-base dataset hosted on Hugging Face and contributed by the HF Datasets community
rshwndsz/processed-hh-rlhf dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
Lumen1123/hh-rlhf dataset hosted on Hugging Face and contributed by the HF Datasets community
Baidicoot/trojan-hh-rlhf-golden dataset hosted on Hugging Face and contributed by the HF Datasets community
HH-RLHF-Helpful-Base Dataset
Summary
The HH-RLHF-Helpful-Base dataset is a processed version of Anthropic's HH-RLHF dataset, specifically curated to train models using the TRL library for preference learning and alignment tasks. It contains pairs of text samples, each labeled as either "chosen" or "rejected," based on human preferences regarding the helpfulness of the responses. This dataset enables models to learn human preferences in generating helpful responses… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/hh-rlhf-helpful-base.
ondevicellm/hh-rlhf-h4 dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "hh-rlhf_with_features_flan_t5_large_lll_relabeled"
More Information needed
HFXM/hh-rlhf-Rule2 dataset hosted on Hugging Face and contributed by the HF Datasets community
fjxdaisy/hh-rlhf-entropy-rule5-b0-84 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
## Overview
Hh is a dataset for object detection tasks; it contains annotations for 1,436 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
weepcat/hh-rlhf-eval dataset hosted on Hugging Face and contributed by the HF Datasets community