100+ datasets found
  1. dataset-formats

    • huggingface.co
    Updated Apr 17, 2024
    Cite
    Sylvain Lesage (2024). dataset-formats [Dataset]. https://huggingface.co/datasets/severo/dataset-formats
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Apr 17, 2024
    Authors
    Sylvain Lesage
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Datasets formats on the Hugging Face Hub

    Every day, we check the proportion of data formats among the datasets published on Hugging Face. The data is published at https://huggingface.co/datasets/severo/dataset-formats. The count includes all the datasets supported by the dataset viewer, and only for the supported formats. By dataset format, we refer to the native format of the data. All the supported datasets are also available as Parquet. See… See the full description on the dataset page: https://huggingface.co/datasets/severo/dataset-formats.

  2. h4-tests-format-dpo-dataset

    • huggingface.co
    Updated Mar 10, 2024
    Cite
    Hugging Face H4 (2024). h4-tests-format-dpo-dataset [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/h4-tests-format-dpo-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 10, 2024
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face H4
    Description

    HuggingFaceH4/h4-tests-format-dpo-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. format-text

    • huggingface.co
    Updated Sep 12, 2024
    + more versions
    Cite
    Pei Chu (2024). format-text [Dataset]. https://huggingface.co/datasets/chupei/format-text
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 12, 2024
    Authors
    Pei Chu
    Description

    chupei/format-text dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. rlaif-v_formatted

    • huggingface.co
    Updated Nov 26, 2024
    Cite
    Hugging Face H4 (2024). rlaif-v_formatted [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/rlaif-v_formatted
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face H4
    Description

    from datasets import load_dataset, features

    def format(examples):
        """
        Convert prompt from "xxx" to [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "xxx"}]}]
        and chosen and rejected from "xxx" to [{"role": "assistant", "content": [{"type": "text", "text": "xxx"}]}].
        Images are wrapped in a list.
        """
        output = {"images": [], "prompt": [], "chosen": [], "rejected": []}
        for image, question, chosen, rejected in zip(examples["image"]…

    See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceH4/rlaif-v_formatted.
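
The description above is cut off before the loop body. As a rough illustration only, the sketch below shows one way a batch formatter could carry out the conversion the docstring describes. It is not the dataset authors' code: the function name `format_batch` is hypothetical, and the column names "question", "chosen", and "rejected" are assumed from the loop variables, since only `examples["image"]` is visible in the excerpt.

```python
def format_batch(examples):
    """Turn plain-string columns into chat-style message lists (sketch)."""
    output = {"images": [], "prompt": [], "chosen": [], "rejected": []}
    # Assumption: the batch has "image", "question", "chosen" and "rejected"
    # columns; only examples["image"] appears in the truncated original.
    for image, question, chosen, rejected in zip(
        examples["image"],
        examples["question"],
        examples["chosen"],
        examples["rejected"],
    ):
        # Images are wrapped in a list, as the docstring states.
        output["images"].append([image])
        # The prompt becomes one user turn: an image placeholder plus the text.
        output["prompt"].append(
            [
                {
                    "role": "user",
                    "content": [
                        {"type": "image"},
                        {"type": "text", "text": question},
                    ],
                }
            ]
        )
        # Chosen and rejected each become one assistant turn.
        output["chosen"].append(
            [{"role": "assistant", "content": [{"type": "text", "text": chosen}]}]
        )
        output["rejected"].append(
            [{"role": "assistant", "content": [{"type": "text", "text": rejected}]}]
        )
    return output


# Tiny made-up batch; a string stands in for a real image object.
batch = {
    "image": ["<image-0>"],
    "question": ["What is shown?"],
    "chosen": ["A cat."],
    "rejected": ["A dog."],
}
formatted = format_batch(batch)
```

With the `datasets` library, a function of this shape would typically be applied through `Dataset.map(..., batched=True)`; whether the original script does so is not visible in the truncated description.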

  5. python-dpo-dataset-complete-just-formatting

    • huggingface.co
    Updated Dec 19, 2024
    + more versions
    Cite
    michael ilie (2024). python-dpo-dataset-complete-just-formatting [Dataset]. https://huggingface.co/datasets/skdrx/python-dpo-dataset-complete-just-formatting
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 19, 2024
    Authors
    michael ilie
    Description

    skdrx/python-dpo-dataset-complete-just-formatting dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. formatted-dataset

    • huggingface.co
    Updated Sep 7, 2023
    + more versions
    Cite
    Rohit Rajesh (2023). formatted-dataset [Dataset]. https://huggingface.co/datasets/04RR/formatted-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 7, 2023
    Authors
    Rohit Rajesh
    Description

    04RR/formatted-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. formatted-hh-rlhf

    • huggingface.co
    Updated May 11, 2024
    + more versions
    Cite
    zhangyiqun (2024). formatted-hh-rlhf [Dataset]. https://huggingface.co/datasets/Estwld/formatted-hh-rlhf
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 11, 2024
    Authors
    zhangyiqun
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Estwld/formatted-hh-rlhf dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. sft-ready-Text-Generation-Augmented-Data-Alpaca-Format

    • huggingface.co
    Updated Dec 11, 2024
    + more versions
    Cite
    Ali Janati (2024). sft-ready-Text-Generation-Augmented-Data-Alpaca-Format [Dataset]. https://huggingface.co/datasets/Na0s/sft-ready-Text-Generation-Augmented-Data-Alpaca-Format
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 11, 2024
    Authors
    Ali Janati
    Description

    Na0s/sft-ready-Text-Generation-Augmented-Data-Alpaca-Format dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. Tauri-Complex-JSON-Formatting

    • huggingface.co
    Updated Oct 20, 2025
    Cite
    DV (2025). Tauri-Complex-JSON-Formatting [Dataset]. https://huggingface.co/datasets/Delta-Vector/Tauri-Complex-JSON-Formatting
    Explore at:
    Dataset updated
    Oct 20, 2025
    Authors
    DV
    Description

    Delta-Vector/Tauri-Complex-JSON-Formatting dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. dolly-lora-data-format

    • huggingface.co
    Cite
    Phil Wee, dolly-lora-data-format [Dataset]. https://huggingface.co/datasets/couchpotato888/dolly-lora-data-format
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Authors
    Phil Wee
    Description

    couchpotato888/dolly-lora-data-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Med-Dataset-Formatted

    • huggingface.co
    Updated Jan 31, 2024
    + more versions
    Cite
    Kenneth Ly (2024). Med-Dataset-Formatted [Dataset]. https://huggingface.co/datasets/KLL505/Med-Dataset-Formatted
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 31, 2024
    Authors
    Kenneth Ly
    Description

    KLL505/Med-Dataset-Formatted dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. ShareGPT-Unfiltered-RedPajama-Chat-format

    • huggingface.co
    Updated Jun 6, 2023
    Cite
    Fredi (2023). ShareGPT-Unfiltered-RedPajama-Chat-format [Dataset]. https://huggingface.co/datasets/Fredithefish/ShareGPT-Unfiltered-RedPajama-Chat-format
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jun 6, 2023
    Authors
    Fredi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    ShareGPT unfiltered dataset in RedPajama-Chat format

    This dataset was created by converting the alpaca-lora formatted ShareGPT dataset to the format required by RedPajama-Chat. This script was used for the conversion: https://github.com/fredi-python/Alpaca2INCITE-Dataset-Converter/blob/main/convert.py

    WARNING: Only the first human and gpt text of each conversation from the original dataset is included in this dataset.

    The format

    {"text": "

  13. coqa-sharegpt-format

    • huggingface.co
    Updated Mar 4, 2025
    Cite
    BookingCare Technology .,JSC (2025). coqa-sharegpt-format [Dataset]. https://huggingface.co/datasets/BookingCare/coqa-sharegpt-format
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 4, 2025
    Dataset authored and provided by
    BookingCare Technology .,JSC
    Description

    BookingCare/coqa-sharegpt-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. formatted-dataset

    • huggingface.co
    Updated Oct 25, 2024
    Cite
    Swakhil M (2024). formatted-dataset [Dataset]. https://huggingface.co/datasets/swakhil09/formatted-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Oct 25, 2024
    Authors
    Swakhil M
    Description

    swakhil09/formatted-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. instruction-dataset

    • huggingface.co
    • opendatalab.com
    Updated Feb 10, 2023
    Cite
    Hugging Face H4 (2023). instruction-dataset [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/instruction-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Feb 10, 2023
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face H4
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This is the blind eval dataset of high-quality, diverse, human-written instructions with demonstrations. We will be using this for step 3 evaluations in our RLHF pipeline.

  16. ClArTTS-HF-format

    • huggingface.co
    Updated Jul 25, 2025
    + more versions
    Cite
    Mohamad Bisher tello (2025). ClArTTS-HF-format [Dataset]. https://huggingface.co/datasets/Bisher/ClArTTS-HF-format
    Explore at:
    Dataset updated
    Jul 25, 2025
    Authors
    Mohamad Bisher tello
    Description

    Bisher/ClArTTS-HF-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. rlvr-code-data-python-r1-format-filtered

    • huggingface.co
    Updated Jul 5, 2025
    + more versions
    Cite
    Ai2 (2025). rlvr-code-data-python-r1-format-filtered [Dataset]. https://huggingface.co/datasets/allenai/rlvr-code-data-python-r1-format-filtered
    Explore at:
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    Description

    allenai/rlvr-code-data-python-r1-format-filtered dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. taxonomies-dataset-alpaca-prompt-format

    • huggingface.co
    Updated Jan 29, 2025
    Cite
    Artem Koverchik (2025). taxonomies-dataset-alpaca-prompt-format [Dataset]. https://huggingface.co/datasets/artemkoverchik/taxonomies-dataset-alpaca-prompt-format
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 29, 2025
    Authors
    Artem Koverchik
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    artemkoverchik/taxonomies-dataset-alpaca-prompt-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. german-oasst1-qa-format

    • huggingface.co
    Updated Jan 4, 2025
    + more versions
    Cite
    Kevin Waller (2025). german-oasst1-qa-format [Dataset]. https://huggingface.co/datasets/AgentWaller/german-oasst1-qa-format
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 4, 2025
    Authors
    Kevin Waller
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    AgentWaller/german-oasst1-qa-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. h4-tests-format-sft-dataset

    • huggingface.co
    Updated Mar 7, 2024
    Cite
    Hugging Face H4 (2024). h4-tests-format-sft-dataset [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/h4-tests-format-sft-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face H4
    Description

    DO NOT DELETE ME! I'M USED IN THE H4 UNIT TESTS :)
