Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Datasets formats on the Hugging Face Hub
Every day, we check the proportion of data formats among the datasets published on Hugging Face. The data is published at https://huggingface.co/datasets/severo/dataset-formats. The count includes all the datasets supported by the dataset viewer, and only for the supported formats. By dataset format, we refer to the native format of the data. All the supported datasets are also available as Parquet. See… See the full description on the dataset page: https://huggingface.co/datasets/severo/dataset-formats.
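As a rough illustration of what "proportion of data formats" means here, the following minimal sketch computes shares from a list of native-format labels. The function name `format_proportions` and the sample labels are hypothetical and are not taken from the severo/dataset-formats dataset or its tooling.

```python
from collections import Counter


def format_proportions(native_formats):
    """Return the share of each format label in the input list (hypothetical helper)."""
    counts = Counter(native_formats)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {fmt: n / total for fmt, n in counts.items()}


# Made-up sample: one label per dataset, naming its native format.
sample = ["parquet", "json", "csv", "parquet"]
proportions = format_proportions(sample)
```

Counting one label per dataset matches the description above, where each supported dataset is attributed to its native format even though a Parquet version is also available.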
HuggingFaceH4/h4-tests-format-dpo-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
chupei/format-text dataset hosted on Hugging Face and contributed by the HF Datasets community
from datasets import load_dataset, features

def format(examples):
    """
    Convert prompt from "xxx" to [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "xxx"}]}]
    and chosen and rejected from "xxx" to [{"role": "assistant", "content": [{"type": "text", "text": "xxx"}]}].
    Images are wrapped in a list.
    """
    output = {"images": [], "prompt": [], "chosen": [], "rejected": []}
    for image, question, chosen, rejected in zip(examples["image"]…

See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceH4/rlaif-v_formatted.
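The snippet above is truncated on the dataset page, so the rest of the function is not known. The following is a minimal sketch of the conversion that the docstring describes, not the authors' code. It assumes the batch has columns named `image`, `question`, `chosen`, and `rejected` (inferred from the loop variables; only `image` is confirmed by the source), and the name `to_conversational` is hypothetical.

```python
def to_conversational(examples):
    """Batched conversion of plain-string preference data into chat-style messages."""
    output = {"images": [], "prompt": [], "chosen": [], "rejected": []}
    # Assumed column names; only "image" appears in the truncated original.
    columns = zip(
        examples["image"],
        examples["question"],
        examples["chosen"],
        examples["rejected"],
    )
    for image, question, chosen, rejected in columns:
        # Images are wrapped in a list, as the docstring states.
        output["images"].append([image])
        output["prompt"].append(
            [{"role": "user",
              "content": [{"type": "image"}, {"type": "text", "text": question}]}]
        )
        output["chosen"].append(
            [{"role": "assistant", "content": [{"type": "text", "text": chosen}]}]
        )
        output["rejected"].append(
            [{"role": "assistant", "content": [{"type": "text", "text": rejected}]}]
        )
    return output


# Toy batch with a placeholder string standing in for an image object.
batch = {
    "image": ["<image-0>"],
    "question": ["What is shown?"],
    "chosen": ["A cat."],
    "rejected": ["A dog."],
}
converted = to_conversational(batch)
```

A function of this shape takes and returns a dict of lists, which is the layout a batched mapping over a dataset would pass in.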
skdrx/python-dpo-dataset-complete-just-formatting dataset hosted on Hugging Face and contributed by the HF Datasets community
04RR/formatted-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Estwld/formatted-hh-rlhf dataset hosted on Hugging Face and contributed by the HF Datasets community
Na0s/sft-ready-Text-Generation-Augmented-Data-Alpaca-Format dataset hosted on Hugging Face and contributed by the HF Datasets community
Delta-Vector/Tauri-Complex-JSON-Formatting dataset hosted on Hugging Face and contributed by the HF Datasets community
couchpotato888/dolly-lora-data-format dataset hosted on Hugging Face and contributed by the HF Datasets community
KLL505/Med-Dataset-Formatted dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
ShareGPT unfiltered dataset in RedPajama-Chat format
This dataset was created by converting the alpaca-lora formatted ShareGPT dataset to the format required by RedPajama-Chat. The conversion used this script: https://github.com/fredi-python/Alpaca2INCITE-Dataset-Converter/blob/main/convert.py
WARNING: Only the first human and gpt text of each conversation from the original dataset is included in the dataset.
The format
{"text": "
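The format example above is cut off in the source, so the exact record layout is not shown here. The sketch below illustrates the general idea of keeping only the first human and first gpt turn of a conversation. It is not the linked convert.py script. It assumes the input is a list of turns with `from` and `value` keys, and that the target text uses `<human>:` and `<bot>:` prefixes as in the RedPajama-INCITE-Chat prompt convention; both are assumptions, and `first_exchange_to_record` is a hypothetical name.

```python
import json


def first_exchange_to_record(conversation):
    """Build one {"text": ...} record from the first human and first gpt turn.

    Assumes each turn looks like {"from": "human" | "gpt", "value": str}.
    Returns None when either side of the exchange is missing.
    """
    human = next((t["value"] for t in conversation if t["from"] == "human"), None)
    bot = next((t["value"] for t in conversation if t["from"] == "gpt"), None)
    if human is None or bot is None:
        return None
    # Assumed target layout: "<human>: ...\n<bot>: ..." inside a "text" field.
    return {"text": f"<human>: {human}\n<bot>: {bot}"}


# Made-up conversation; later turns are dropped, as the warning above describes.
conversation = [
    {"from": "human", "value": "Hi"},
    {"from": "gpt", "value": "Hello!"},
    {"from": "human", "value": "Tell me more"},
    {"from": "gpt", "value": "Sure."},
]
record = first_exchange_to_record(conversation)
line = json.dumps(record)  # one JSON Lines row
```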
BookingCare/coqa-sharegpt-format dataset hosted on Hugging Face and contributed by the HF Datasets community
swakhil09/formatted-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is the blind eval dataset of high-quality, diverse, human-written instructions with demonstrations. We will be using this for step 3 evaluations in our RLHF pipeline.
Bisher/ClArTTS-HF-format dataset hosted on Hugging Face and contributed by the HF Datasets community
allenai/rlvr-code-data-python-r1-format-filtered dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
artemkoverchik/taxonomies-dataset-alpaca-prompt-format dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
AgentWaller/german-oasst1-qa-format dataset hosted on Hugging Face and contributed by the HF Datasets community
DO NOT DELETE ME! I'M USED IN THE H4 UNIT TESTS :)