100+ datasets found
  1. h

    ShareGPT52K

    • huggingface.co
    Updated Apr 5, 2023
    + more versions
    Cite
    Ryoko AI (2023). ShareGPT52K [Dataset]. https://huggingface.co/datasets/RyokoAI/ShareGPT52K
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 5, 2023
    Dataset authored and provided by
    Ryoko AI
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for ShareGPT90K

      Dataset Summary
    

    This dataset is a collection of approximately 90,000 conversations scraped via the ShareGPT API before it was shut down. These conversations include both user prompts and responses from OpenAI's ChatGPT. This repository now contains the new 90K conversations version. The previous 52K version may be found in the old/ directory.

      Supported Tasks and Leaderboards
    

    text-generation

      Languages
    

    This dataset is… See the full description on the dataset page: https://huggingface.co/datasets/RyokoAI/ShareGPT52K.

  2. h

    ShareGPT_Vicuna_unfiltered

    • huggingface.co
    Updated Apr 12, 2023
    Cite
    z. (2023). ShareGPT_Vicuna_unfiltered [Dataset]. https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered
    Explore at:
    Dataset updated
    Apr 12, 2023
    Authors
    z.
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Further cleaning done. Please look through the dataset and ensure that I didn't miss anything.

    Update: Confirmed working method for training the model: https://huggingface.co/AlekseyKorshuk/vicuna-7b/discussions/4#64346c08ef6d5abefe42c12c

    Two choices:

    Removes instances of "I'm sorry, but": https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json
    Has instances of "I'm sorry, but":… See the full description on the dataset page: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered.
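
    The description above distinguishes a file with instances of "I'm sorry, but" removed from one that keeps them. The cleaning script itself is not shown in this listing, so the following is only a minimal sketch of one way such a filter could work. It assumes the commonly used ShareGPT record layout (a `conversations` list of turns with `from` and `value` keys) and assumes whole conversations are dropped; the function names are hypothetical.

```python
REFUSAL_MARKER = "I'm sorry, but"  # phrase quoted in the dataset description


def has_refusal(record):
    """Return True if any turn of the conversation contains the marker phrase."""
    # Assumed layout: {"conversations": [{"from": ..., "value": ...}, ...]}
    return any(
        REFUSAL_MARKER in turn.get("value", "")
        for turn in record.get("conversations", [])
    )


def drop_refusals(records):
    """Keep only the conversations in which no turn contains the marker phrase."""
    return [record for record in records if not has_refusal(record)]


# Tiny in-memory example; the real files are far larger.
sample = [
    {"id": "a", "conversations": [
        {"from": "human", "value": "Hi"},
        {"from": "gpt", "value": "Hello!"},
    ]},
    {"id": "b", "conversations": [
        {"from": "human", "value": "Do X"},
        {"from": "gpt", "value": "I'm sorry, but I can't help with that."},
    ]},
]
kept = drop_refusals(sample)
print([record["id"] for record in kept])
```

    Whether the published file dropped whole conversations or only the matching turns is not stated here, so check the dataset page before relying on either behaviour.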

  3. h

    ShareGPT-Unfiltered-RedPajama-Chat-format

    • huggingface.co
    Updated Jun 6, 2023
    Cite
    Fredi (2023). ShareGPT-Unfiltered-RedPajama-Chat-format [Dataset]. https://huggingface.co/datasets/Fredithefish/ShareGPT-Unfiltered-RedPajama-Chat-format
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 6, 2023
    Authors
    Fredi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    ShareGPT unfiltered dataset in RedPajama-Chat format

    This dataset was created by converting the alpaca-lora formatted ShareGPT dataset to the format required by RedPajama-Chat. This script was used for the conversion: https://github.com/fredi-python/Alpaca2INCITE-Dataset-Converter/blob/main/convert.py

    WARNING: Only the first human and gpt text of each conversation from the original dataset is included in the dataset.

      The format
    

    {"text": "
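
    The warning above says that only the first human and gpt text of each conversation is kept, and the truncated format line shows that each record has a single "text" field. The sketch below illustrates that kind of reduction; it is not the linked convert.py. The input layout (turns with `from` and `value` keys) and the `<human>:` / `<bot>:` tags are assumptions, since the listing cuts the format off before showing them.

```python
def first_exchange(conversations):
    """Return (first human text, first gpt text after it), or None if absent."""
    human = None
    for turn in conversations:
        if human is None and turn.get("from") == "human":
            human = turn.get("value", "")
        elif human is not None and turn.get("from") == "gpt":
            return human, turn.get("value", "")
    return None


def to_text_record(conversations, human_tag="<human>:", bot_tag="<bot>:"):
    """Collapse a conversation into one {"text": ...} record (tags are assumed)."""
    pair = first_exchange(conversations)
    if pair is None:
        return None
    prompt, reply = pair
    return {"text": f"{human_tag} {prompt}\n{bot_tag} {reply}"}


conversation = [
    {"from": "human", "value": "Hi"},
    {"from": "gpt", "value": "Hello"},
    {"from": "human", "value": "Tell me more"},
    {"from": "gpt", "value": "Sure"},
]
print(to_text_record(conversation))
```

    Later turns are discarded on purpose here, which mirrors the warning in the description; a multi-turn conversion would need a different record layout.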

  4. h

    Reflection-Dataset-ShareGPT-v2

    • huggingface.co
    Updated Sep 10, 2024
    + more versions
    Cite
    Maheswar KK (2024). Reflection-Dataset-ShareGPT-v2 [Dataset]. https://huggingface.co/datasets/mahiatlinux/Reflection-Dataset-ShareGPT-v2
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 10, 2024
    Authors
    Maheswar KK
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Simple "Reflection" method dataset inspired by mattshumer

      This is the ShareGPT version. Find prompt and response pair dataset here
    

    This dataset was synthetically generated using Glaive AI. The structure has been improved and more rows have been added.

  5. h

    coqa-sharegpt-format

    • huggingface.co
    Updated Mar 4, 2025
    Cite
    BookingCare Technology .,JSC (2025). coqa-sharegpt-format [Dataset]. https://huggingface.co/datasets/BookingCare/coqa-sharegpt-format
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 4, 2025
    Dataset authored and provided by
    BookingCare Technology .,JSC
    Description

    BookingCare/coqa-sharegpt-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. h

    PIPPA-shareGPT

    • huggingface.co
    Updated Sep 2, 2023
    Cite
    Brian (2023). PIPPA-shareGPT [Dataset]. https://huggingface.co/datasets/kingbri/PIPPA-shareGPT
    Explore at:
    Dataset updated
    Sep 2, 2023
    Authors
    Brian
    License

    https://choosealicense.com/licenses/agpl-3.0/

    Description

    Dataset Card: PIPPA-ShareGPT

    This is a conversion of PygmalionAI's PIPPA deduped dataset to ShareGPT format for finetuning with Axolotl. The reformat was done with a TypeScript project called ShareGPT-Reformat.

      Files and explanations
    

    pippa_sharegpt_raw.jsonl: The raw deduped dataset file converted to ShareGPT format. Roles default to whatever your finetuning software uses.
    pippa_sharegpt.jsonl: A ShareGPT dataset with the roles set to USER: and CHARACTER: for finetuning… See the full description on the dataset page: https://huggingface.co/datasets/kingbri/PIPPA-shareGPT.

  7. h

    WizardLM_evol_instruct_v2_196K-ShareGPT

    • huggingface.co
    Updated Apr 17, 2024
    + more versions
    Cite
    Maxime Labonne (2024). WizardLM_evol_instruct_v2_196K-ShareGPT [Dataset]. https://huggingface.co/datasets/mlabonne/WizardLM_evol_instruct_v2_196K-ShareGPT
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 17, 2024
    Authors
    Maxime Labonne
    Description

    mlabonne/WizardLM_evol_instruct_v2_196K-ShareGPT dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. h

    ShareGPT-4o

    • huggingface.co
    Updated May 28, 2024
    + more versions
    Cite
    OpenGVLab (2024). ShareGPT-4o [Dataset]. https://huggingface.co/datasets/OpenGVLab/ShareGPT-4o
    Explore at:
    Dataset updated
    May 28, 2024
    Dataset authored and provided by
    OpenGVLab
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    OpenGVLab/ShareGPT-4o dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. h

    sharegpt-deutsch

    • huggingface.co
    Updated Aug 14, 2023
    Cite
    FreedomAI (2023). sharegpt-deutsch [Dataset]. https://huggingface.co/datasets/FreedomIntelligence/sharegpt-deutsch
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 14, 2023
    Dataset authored and provided by
    FreedomAI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    German ShareGPT data translated by gpt-3.5-turbo. The dataset is used in research related to MultilingualSIFT.

  10. reasoning-sharegpt

    • huggingface.co
    Updated Jul 5, 2024
    Cite
    Arcee AI (2024). reasoning-sharegpt [Dataset]. https://huggingface.co/datasets/arcee-ai/reasoning-sharegpt
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 5, 2024
    Dataset provided by
    Arcee AI, Inc.
    Authors
    Arcee AI
    Description

    arcee-ai/reasoning-sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. h

    guanaco-sharegpt-style

    • huggingface.co
    Updated Nov 13, 2023
    + more versions
    Cite
    Philipp Schmid (2023). guanaco-sharegpt-style [Dataset]. https://huggingface.co/datasets/philschmid/guanaco-sharegpt-style
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 13, 2023
    Authors
    Philipp Schmid
    Description

    Dataset Card for "guanaco-sharegpt-style"

    More Information needed

  12. h

    alpaca-gpt4-sharegpt

    • huggingface.co
    Updated Jun 8, 2025
    + more versions
    Cite
    Abhinand Balachandran (2025). alpaca-gpt4-sharegpt [Dataset]. https://huggingface.co/datasets/abhinand/alpaca-gpt4-sharegpt
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 8, 2025
    Authors
    Abhinand Balachandran
    Description

    abhinand/alpaca-gpt4-sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. h

    sharegpt

    • huggingface.co
    Updated Apr 13, 2025
    + more versions
    Cite
    abaojiangoa (2025). sharegpt [Dataset]. https://huggingface.co/datasets/jwjiangb/sharegpt
    Explore at:
    Dataset updated
    Apr 13, 2025
    Authors
    abaojiangoa
    Description

    jwjiangb/sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. h

    ultrachat-sharegpt

    • huggingface.co
    Updated Apr 16, 2024
    Cite
    OpenChat (2024). ultrachat-sharegpt [Dataset]. https://huggingface.co/datasets/openchat/ultrachat-sharegpt
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 16, 2024
    Dataset authored and provided by
    OpenChat
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    UltraChat dataset in ShareGPT format

    This is the full UltraChat dataset converted to ShareGPT format.
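
    The entry above describes a conversion to ShareGPT format without showing it. As a rough illustration only, the sketch below turns a flat list of alternating utterances into ShareGPT-style turns. It assumes the source dialogue starts with the user and strictly alternates speakers; the function name and the sample data are hypothetical, and this is not the script OpenChat used.

```python
def utterances_to_sharegpt(utterances):
    """Map alternating user/assistant strings to ShareGPT-style turns.

    Assumption: even positions are user messages and odd positions are
    model replies.
    """
    roles = ("human", "gpt")
    return {
        "conversations": [
            {"from": roles[index % 2], "value": text}
            for index, text in enumerate(utterances)
        ]
    }


dialogue = ["What is 2 + 2?", "4.", "And doubled?", "8."]
record = utterances_to_sharegpt(dialogue)
print(record["conversations"][1])
```

    The "from"/"value" keys and the "human"/"gpt" role names match the ShareGPT-style snippet quoted elsewhere in this listing.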

  15. h

    ShareGPT-Processed

    • huggingface.co
    Updated Jul 8, 2023
    Cite
    Pokai Chang (2023). ShareGPT-Processed [Dataset]. https://huggingface.co/datasets/zetavg/ShareGPT-Processed
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 8, 2023
    Authors
    Pokai Chang
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    ShareGPT-Processed

    The RyokoAI/ShareGPT52K dataset, converted to Markdown and labeled with the language used.

      Acknowledgements
    

    vinta/pangu.js: To insert whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols).
    matthewwithanm/python-markdownify: Provides a starting point to convert HTML to Markdown.
    BYVoid/OpenCC: Conversions between Traditional Chinese and Simplified Chinese.
    aboSamoor/polyglot… See the full description on the dataset page: https://huggingface.co/datasets/zetavg/ShareGPT-Processed.

  16. h

    mmlu-sharegpt-all

    • huggingface.co
    Updated Jan 20, 2025
    + more versions
    Cite
    Horus AI Labs (2025). mmlu-sharegpt-all [Dataset]. https://huggingface.co/datasets/horus-ai-labs/mmlu-sharegpt-all
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 20, 2025
    Authors
    Horus AI Labs
    Description

    horus-ai-labs/mmlu-sharegpt-all dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. h

    ShareGPT-4o-Image

    • huggingface.co
    Updated Jun 24, 2025
    Cite
    FreedomAI (2025). ShareGPT-4o-Image [Dataset]. https://huggingface.co/datasets/FreedomIntelligence/ShareGPT-4o-Image
    Explore at:
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    FreedomAI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    📚 ShareGPT-4o-Image

    ShareGPT-4o-Image is a large-scale and high-quality image generation dataset, where all images are produced by GPT-4o's image generation capabilities. This dataset is designed to align open multimodal models with GPT-4o's strengths in visual content creation. It includes 45K text-to-image and 46K text-and-image-to-image samples, making it a useful resource for enhancing multimodal models in both image generation and editing tasks.

      Dataset Overview… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/ShareGPT-4o-Image.
    
  18. h

    alpaca-sharegpt-data

    • huggingface.co
    Updated Oct 21, 2024
    + more versions
    Cite
    RS (2024). alpaca-sharegpt-data [Dataset]. https://huggingface.co/datasets/HappyAIUser/alpaca-sharegpt-data
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 21, 2024
    Authors
    RS
    Description

    HappyAIUser/alpaca-sharegpt-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. h

    synthetic_text_to_sql-ShareGPT

    • huggingface.co
    Updated Jul 7, 2025
    Cite
    Maxime Labonne (2025). synthetic_text_to_sql-ShareGPT [Dataset]. https://huggingface.co/datasets/mlabonne/synthetic_text_to_sql-ShareGPT
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 7, 2025
    Authors
    Maxime Labonne
    Description

    synthetic_text_to_sql

    ShareGPT version of gretelai/synthetic_text_to_sql using the following code:

        from datasets import load_dataset, DatasetDict

        # Load the dataset
        dataset = load_dataset('gretelai/synthetic_text_to_sql', split='all')

        def format_sample(sample):
            conversations = [
                {
                    "from": "human",
                    "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}"
                },
                {
                    "from": "gpt",
                    "value":… See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/synthetic_text_to_sql-ShareGPT.

  20. h

    openchat_sharegpt_v3

    • huggingface.co
    Updated May 7, 2025
    Cite
    OpenChat (2025). openchat_sharegpt_v3 [Dataset]. https://huggingface.co/datasets/openchat/openchat_sharegpt_v3
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 7, 2025
    Dataset authored and provided by
    OpenChat
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    ShareGPT dataset for training OpenChat V3 series. See OpenChat repository for instructions. Contents:

    sharegpt_clean.json: ShareGPT dataset in original format, converted to Markdown, and with model labels.
    sharegpt_gpt4.json: All instances in sharegpt_clean.json with model == "Model: GPT-4".
    *.parquet: Pre-tokenized dataset for training specified version of OpenChat.

    Note: The dataset is NOT currently compatible with HF dataset loader. Licensed under MIT.
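
    Since the note above says the files do not work with the HF dataset loader, one plausible fallback is to read the JSON directly. The sketch below assumes each file is a single JSON array and that every record carries a `model` field, as the `model == "Model: GPT-4"` condition suggests; the helper names and the sample records are hypothetical, so check the real file layout first.

```python
import json


def load_json_records(path):
    """Read a file assumed to hold one JSON array of conversation records."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def only_gpt4(records, label="Model: GPT-4"):
    """Keep records whose 'model' field equals the label quoted in the listing."""
    return [record for record in records if record.get("model") == label]


# In-memory stand-in for the contents of sharegpt_clean.json (made-up records).
RAW = """[
  {"id": "x1", "model": "Model: GPT-4", "items": []},
  {"id": "x2", "model": "Model: Default (GPT-3.5)", "items": []},
  {"id": "x3", "items": []}
]"""
records = json.loads(RAW)
print([record["id"] for record in only_gpt4(records)])
```

    With a downloaded copy, `only_gpt4(load_json_records("sharegpt_clean.json"))` would apply the same selection, under the same assumptions about the file.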

Cite
Ryoko AI (2023). ShareGPT52K [Dataset]. https://huggingface.co/datasets/RyokoAI/ShareGPT52K

ShareGPT52K

RyokoAI/ShareGPT52K

ShareGPT 90K Conversations

Explore at:
50 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 5, 2023
Dataset authored and provided by
Ryoko AI
License

https://choosealicense.com/licenses/cc0-1.0/

Description

Dataset Card for ShareGPT90K

  Dataset Summary

This dataset is a collection of approximately 90,000 conversations scraped via the ShareGPT API before it was shut down. These conversations include both user prompts and responses from OpenAI's ChatGPT. This repository now contains the new 90K conversations version. The previous 52K version may be found in the old/ directory.

  Supported Tasks and Leaderboards

text-generation

  Languages

This dataset is… See the full description on the dataset page: https://huggingface.co/datasets/RyokoAI/ShareGPT52K.
