100+ datasets found

ShareGPT52K
huggingface.co
Updated Apr 5, 2023
Cite
Ryoko AI (2023). ShareGPT52K [Dataset]. https://huggingface.co/datasets/RyokoAI/ShareGPT52K
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 5, 2023
Dataset authored and provided by
Ryoko AI
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for ShareGPT90K (formerly ShareGPT52K)

Dataset Summary

This dataset is a collection of approximately 90,000 conversations (52,000 in the previous version) scraped via the ShareGPT API before it was shut down. These conversations include both user prompts and responses from OpenAI's ChatGPT. This repository now contains the new 90K conversations version. The previous 52K version may be found in the old/ directory.

Supported Tasks and Leaderboards

text-generation

Languages

This dataset is… See the full description on the dataset page: https://huggingface.co/datasets/RyokoAI/ShareGPT52K.
ShareGPT_Vicuna_unfiltered
huggingface.co
Updated Apr 12, 2023
Cite
z. (2023). ShareGPT_Vicuna_unfiltered [Dataset]. https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered
Dataset updated
Apr 12, 2023
Authors
z.
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Further cleaning done. Please look through the dataset and ensure that I didn't miss anything.

Update: Confirmed working method for training the model: https://huggingface.co/AlekseyKorshuk/vicuna-7b/discussions/4#64346c08ef6d5abefe42c12c

Two choices:

Removes instances of "I'm sorry, but": https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json
Has instances of "I'm sorry, but":… See the full description on the dataset page: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered.
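The difference between the two files can be illustrated with a minimal sketch. It assumes each record holds a "conversations" list of {"from", "value"} turns and that a whole conversation is dropped when any turn contains the phrase; the description does not say whether the author removed whole conversations or only individual turns, so this is one plausible reading and not the author's actual script. The helper names are hypothetical.

```python
# Phrase named in the dataset description.
REFUSAL_PHRASE = "I'm sorry, but"


def has_refusal(record, phrase=REFUSAL_PHRASE):
    """Return True if any turn in the record contains the phrase.

    Assumes a ShareGPT-style record: {"conversations": [{"from": ..., "value": ...}]}.
    """
    turns = record.get("conversations", [])
    return any(phrase in turn.get("value", "") for turn in turns)


def drop_refusals(records, phrase=REFUSAL_PHRASE):
    """Keep only the records in which no turn contains the phrase."""
    return [r for r in records if not has_refusal(r, phrase)]


# Tiny in-memory example (made-up data, not taken from the dataset).
records = [
    {"id": "a", "conversations": [
        {"from": "human", "value": "What is 2 + 2?"},
        {"from": "gpt", "value": "4"},
    ]},
    {"id": "b", "conversations": [
        {"from": "human", "value": "Tell me a secret."},
        {"from": "gpt", "value": "I'm sorry, but I can't help with that."},
    ]},
]

kept = drop_refusals(records)
print([r["id"] for r in kept])
```

Matching on a literal substring is deliberately simple; a real cleaning pass would likely need to handle curly apostrophes and other phrasing variants.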
ShareGPT-Unfiltered-RedPajama-Chat-format
huggingface.co
Updated Jun 6, 2023
Cite
Fredi (2023). ShareGPT-Unfiltered-RedPajama-Chat-format [Dataset]. https://huggingface.co/datasets/Fredithefish/ShareGPT-Unfiltered-RedPajama-Chat-format
Dataset updated
Jun 6, 2023
Authors
Fredi
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
ShareGPT unfiltered dataset in RedPajama-Chat format

This dataset was created by converting the alpaca-lora-formatted ShareGPT dataset to the format required by RedPajama-Chat. This script was used for the conversion: https://github.com/fredi-python/Alpaca2INCITE-Dataset-Converter/blob/main/convert.py

WARNING: Only the first human and gpt text of each conversation from the original dataset is included in the dataset.
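The warning above means every multi-turn conversation is reduced to a single exchange. The sketch below illustrates that truncation. It assumes the input is a ShareGPT-style list of {"from", "value"} turns, whereas the linked converter reads alpaca-lora formatted data. The helper names are hypothetical, and the `template` argument is a placeholder, not the actual RedPajama-Chat layout, which is cut off in this listing.

```python
def first_exchange(turns):
    """Return (human_text, gpt_text) for the first human turn and the first
    gpt turn that follows it, or None if either is missing."""
    human_idx = next(
        (i for i, t in enumerate(turns) if t.get("from") == "human"), None
    )
    if human_idx is None:
        return None
    gpt_turn = next(
        (t for t in turns[human_idx + 1:] if t.get("from") == "gpt"), None
    )
    if gpt_turn is None:
        return None
    return turns[human_idx]["value"], gpt_turn["value"]


def to_text_record(turns, template="{human}\n{gpt}"):
    """Build a {"text": ...} record from the first exchange only.

    `template` is a hypothetical placeholder; substitute the real chat layout.
    """
    pair = first_exchange(turns)
    if pair is None:
        return None
    human, gpt = pair
    return {"text": template.format(human=human, gpt=gpt)}


# Made-up four-turn conversation: only the first two turns survive.
turns = [
    {"from": "human", "value": "Hi"},
    {"from": "gpt", "value": "Hello!"},
    {"from": "human", "value": "How are you?"},
    {"from": "gpt", "value": "Fine, thanks."},
]
print(to_text_record(turns))
```

Anyone training on the converted data should keep in mind that follow-up turns are absent, so the result behaves like a single-turn instruction dataset.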

The format

{"text": "
Reflection-Dataset-ShareGPT-v2
huggingface.co
Updated Sep 10, 2024
Cite
Maheswar KK (2024). Reflection-Dataset-ShareGPT-v2 [Dataset]. https://huggingface.co/datasets/mahiatlinux/Reflection-Dataset-ShareGPT-v2
Dataset updated
Sep 10, 2024
Authors
Maheswar KK
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Simple "Reflection" method dataset inspired by mattshumer

This is the ShareGPT version. Find prompt and response pair dataset here

This dataset was synthetically generated using Glaive AI. The structure has been improved and more rows have been added.
coqa-sharegpt-format
huggingface.co
Updated Mar 4, 2025
Cite
BookingCare Technology .,JSC (2025). coqa-sharegpt-format [Dataset]. https://huggingface.co/datasets/BookingCare/coqa-sharegpt-format
Dataset updated
Mar 4, 2025
Dataset authored and provided by
BookingCare Technology .,JSC
Description
BookingCare/coqa-sharegpt-format dataset hosted on Hugging Face and contributed by the HF Datasets community
PIPPA-shareGPT
huggingface.co
Updated Sep 2, 2023
Cite
Brian (2023). PIPPA-shareGPT [Dataset]. https://huggingface.co/datasets/kingbri/PIPPA-shareGPT
Dataset updated
Sep 2, 2023
Authors
Brian
License
https://choosealicense.com/licenses/agpl-3.0/
Description
Dataset Card: PIPPA-ShareGPT

This is a conversion of PygmalionAI's PIPPA deduped dataset to ShareGPT format for finetuning with Axolotl. The reformat was completed via the following TypeScript project called ShareGPT-Reformat.

Files and explanations

pippa_sharegpt_raw.jsonl: The raw deduped dataset file converted to shareGPT. Roles will be defaulted to your finetuning software.
pippa_sharegpt.jsonl: A shareGPT dataset with the roles as USER: and CHARACTER: for finetuning… See the full description on the dataset page: https://huggingface.co/datasets/kingbri/PIPPA-shareGPT.
WizardLM_evol_instruct_v2_196K-ShareGPT
huggingface.co
Updated Apr 17, 2024
Cite
Maxime Labonne (2024). WizardLM_evol_instruct_v2_196K-ShareGPT [Dataset]. https://huggingface.co/datasets/mlabonne/WizardLM_evol_instruct_v2_196K-ShareGPT
Dataset updated
Apr 17, 2024
Authors
Maxime Labonne
Description
mlabonne/WizardLM_evol_instruct_v2_196K-ShareGPT dataset hosted on Hugging Face and contributed by the HF Datasets community
ShareGPT-4o
huggingface.co
Updated May 28, 2024
Cite
OpenGVLab (2024). ShareGPT-4o [Dataset]. https://huggingface.co/datasets/OpenGVLab/ShareGPT-4o
Dataset updated
May 28, 2024
Dataset authored and provided by
OpenGVLab
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
OpenGVLab/ShareGPT-4o dataset hosted on Hugging Face and contributed by the HF Datasets community
sharegpt-deutsch
huggingface.co
Updated Aug 14, 2023
Cite
FreedomAI (2023). sharegpt-deutsch [Dataset]. https://huggingface.co/datasets/FreedomIntelligence/sharegpt-deutsch
Dataset updated
Aug 14, 2023
Dataset authored and provided by
FreedomAI
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
German (Deutsch) ShareGPT data translated by gpt-3.5-turbo. The dataset is used in research related to MultilingualSIFT.
reasoning-sharegpt
huggingface.co
Updated Jul 5, 2024
Cite
Arcee AI (2024). reasoning-sharegpt [Dataset]. https://huggingface.co/datasets/arcee-ai/reasoning-sharegpt
Dataset updated
Jul 5, 2024
Dataset provided by
Arcee AI, Inc.
Authors
Arcee AI
Description
arcee-ai/reasoning-sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community
guanaco-sharegpt-style
huggingface.co
Updated Nov 13, 2023
Cite
Philipp Schmid (2023). guanaco-sharegpt-style [Dataset]. https://huggingface.co/datasets/philschmid/guanaco-sharegpt-style
Dataset updated
Nov 13, 2023
Authors
Philipp Schmid
Description
Dataset Card for "guanaco-sharegpt-style"

More Information needed
alpaca-gpt4-sharegpt
huggingface.co
Updated Jun 8, 2025
Cite
Abhinand Balachandran (2025). alpaca-gpt4-sharegpt [Dataset]. https://huggingface.co/datasets/abhinand/alpaca-gpt4-sharegpt
Dataset updated
Jun 8, 2025
Authors
Abhinand Balachandran
Description
abhinand/alpaca-gpt4-sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community
sharegpt
huggingface.co
Updated Apr 13, 2025
Cite
abaojiangoa (2025). sharegpt [Dataset]. https://huggingface.co/datasets/jwjiangb/sharegpt
Dataset updated
Apr 13, 2025
Authors
abaojiangoa
Description
jwjiangb/sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community
ultrachat-sharegpt
huggingface.co
Updated Apr 16, 2024
Cite
OpenChat (2024). ultrachat-sharegpt [Dataset]. https://huggingface.co/datasets/openchat/ultrachat-sharegpt
Dataset updated
Apr 16, 2024
Dataset authored and provided by
OpenChat
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
UltraChat dataset in ShareGPT format

This is the full UltraChat dataset converted to ShareGPT format.
ShareGPT-Processed
huggingface.co
Updated Jul 8, 2023
Cite
Pokai Chang (2023). ShareGPT-Processed [Dataset]. https://huggingface.co/datasets/zetavg/ShareGPT-Processed
Dataset updated
Jul 8, 2023
Authors
Pokai Chang
License
https://choosealicense.com/licenses/cc0-1.0/
Description
ShareGPT-Processed

The RyokoAI/ShareGPT52K dataset, converted to Markdown and labeled with the language used.

Acknowledgements

vinta/pangu.js: To insert whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols).
matthewwithanm/python-markdownify: Provides a starting point to convert HTML to Markdown.
BYVoid/OpenCC: Conversions between Traditional Chinese and Simplified Chinese.
aboSamoor/polyglot… See the full description on the dataset page: https://huggingface.co/datasets/zetavg/ShareGPT-Processed.
mmlu-sharegpt-all
huggingface.co
Updated Jan 20, 2025
Cite
Horus AI Labs (2025). mmlu-sharegpt-all [Dataset]. https://huggingface.co/datasets/horus-ai-labs/mmlu-sharegpt-all
Dataset updated
Jan 20, 2025
Authors
Horus AI Labs
Description
horus-ai-labs/mmlu-sharegpt-all dataset hosted on Hugging Face and contributed by the HF Datasets community
ShareGPT-4o-Image
huggingface.co
Updated Jun 24, 2025
Cite
FreedomAI (2025). ShareGPT-4o-Image [Dataset]. https://huggingface.co/datasets/FreedomIntelligence/ShareGPT-4o-Image
Dataset updated
Jun 24, 2025
Dataset authored and provided by
FreedomAI
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
📚 ShareGPT-4o-Image

ShareGPT-4o-Image is a large-scale and high-quality image generation dataset, where all images are produced by GPT-4o’s image generation capabilities. This dataset is designed to align open multimodal models with GPT-4o’s strengths in visual content creation. It includes 45K text-to-image and 46K text-and-image-to-image samples, making it a useful resource for enhancing multimodal models in both image generation and editing tasks.

Dataset Overview… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/ShareGPT-4o-Image.
alpaca-sharegpt-data
huggingface.co
Updated Oct 21, 2024
Cite
RS (2024). alpaca-sharegpt-data [Dataset]. https://huggingface.co/datasets/HappyAIUser/alpaca-sharegpt-data
Dataset updated
Oct 21, 2024
Authors
RS
Description
HappyAIUser/alpaca-sharegpt-data dataset hosted on Hugging Face and contributed by the HF Datasets community
synthetic_text_to_sql-ShareGPT
huggingface.co
Updated Jul 7, 2025
Cite
Maxime Labonne (2025). synthetic_text_to_sql-ShareGPT [Dataset]. https://huggingface.co/datasets/mlabonne/synthetic_text_to_sql-ShareGPT
Dataset updated
Jul 7, 2025
Authors
Maxime Labonne
Description
synthetic_text_to_sql

ShareGPT version of gretelai/synthetic_text_to_sql using the following code:

from datasets import load_dataset, DatasetDict

# Load the dataset
dataset = load_dataset('gretelai/synthetic_text_to_sql', split='all')

def format_sample(sample):
    conversations = [
        {
            "from": "human",
            "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}"
        },
        {
            "from": "gpt",
            "value":…

See the full description on the dataset page: https://huggingface.co/datasets/mlabonne/synthetic_text_to_sql-ShareGPT.
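Since the listing cuts the code off before the "gpt" value, the following is only a sketch of how such a formatter could be completed. The human turn follows the excerpt; the `answer_key` parameter and its default of "sql" are assumptions introduced here and are not confirmed by the excerpt. The example runs on an in-memory dictionary so it needs no download.

```python
def format_sample(sample, answer_key="sql"):
    """Turn one text-to-SQL row into a ShareGPT-style conversation.

    The human turn joins the schema context and the prompt, as in the excerpt.
    `answer_key` is a hypothetical parameter: the excerpt is truncated before
    it shows which field(s) fill the gpt turn.
    """
    conversations = [
        {
            "from": "human",
            "value": f"{sample['sql_context']}\n\n{sample['sql_prompt']}",
        },
        {
            "from": "gpt",
            "value": sample[answer_key],
        },
    ]
    return {"conversations": conversations}


# Made-up row for illustration only.
row = {
    "sql_context": "CREATE TABLE t (id INT);",
    "sql_prompt": "Count the rows in t.",
    "sql": "SELECT COUNT(*) FROM t;",
}
print(format_sample(row))
```

With the Hugging Face `datasets` library, a function of this shape would typically be applied with `dataset.map(format_sample)`.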
openchat_sharegpt_v3
huggingface.co
Updated May 7, 2025
Cite
OpenChat (2025). openchat_sharegpt_v3 [Dataset]. https://huggingface.co/datasets/openchat/openchat_sharegpt_v3
Dataset updated
May 7, 2025
Dataset authored and provided by
OpenChat
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
ShareGPT dataset for training OpenChat V3 series. See OpenChat repository for instructions. Contents:

sharegpt_clean.json: ShareGPT dataset in original format, converted to Markdown, and with model labels.
sharegpt_gpt4.json: All instances in sharegpt_clean.json with model == "Model: GPT-4".
*.parquet: Pre-tokenized dataset for training specified version of OpenChat.

Note: The dataset is NOT currently compatible with HF dataset loader. Licensed under MIT.
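The relationship between sharegpt_clean.json and sharegpt_gpt4.json described above amounts to a filter on the model label. A minimal sketch follows, assuming the file parses to a list of dictionaries that each carry a "model" key; the overall layout of the records and the function name are assumptions, and only the label value comes from the description. It uses plain JSON handling because the note says the HF dataset loader is not supported.

```python
import json

# Label value quoted in the dataset description.
GPT4_LABEL = "Model: GPT-4"


def select_gpt4(records, label=GPT4_LABEL):
    """Return the records whose "model" field equals the GPT-4 label."""
    return [r for r in records if r.get("model") == label]


# Made-up stand-in for the parsed contents of sharegpt_clean.json.
clean = json.loads("""
[
  {"id": "1", "model": "Model: GPT-4", "items": []},
  {"id": "2", "model": "Model: Default (GPT-3.5)", "items": []},
  {"id": "3", "items": []}
]
""")

gpt4_only = select_gpt4(clean)
print([r["id"] for r in gpt4_only])
```

Using `r.get("model")` means records without a label are skipped and do not raise an error.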

ShareGPT52K (RyokoAI/ShareGPT52K): 50 scholarly articles cite this dataset, according to Google Scholar.