BookingCare/coqa-sharegpt-format dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Code-74k-ShareGPT-Vicuna This dataset is in Vicuna/ShareGPT format. There are around 74,000 conversation sets, each containing 2 conversations. Python, Java, JavaScript, Go, C++, Rust, etc. code with detailed explanations is provided. Around 60-65% of the dataset is Python code.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Atma3.2-ShareGPT
This dataset contains instruction-input-output pairs converted to ShareGPT format, designed for instruction tuning and text generation tasks.
Dataset Description
The dataset consists of carefully curated instruction-input-output pairs, formatted for conversational AI training. Each entry contains:
- An instruction that specifies the task
- An optional input providing context
- A detailed output that addresses the instruction
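The instruction-input-output to ShareGPT conversion described above can be sketched as follows. This is a minimal illustration, not the card's actual conversion script; the field names (`instruction`, `input`, `output`) follow the common Alpaca convention, and the `human`/`gpt` role names follow the usual ShareGPT convention.

```python
def alpaca_to_sharegpt(record):
    """Convert one instruction-input-output record into a ShareGPT entry."""
    # Fold the optional input into the human turn as extra context.
    prompt = record["instruction"]
    if record.get("input"):
        prompt += "\n\n" + record["input"]
    return {
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": record["output"]},
        ]
    }
```

Applying this per-row (e.g. with `datasets.Dataset.map`) yields the two-turn conversations that ShareGPT-style trainers expect.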
Usage… See the full description on the dataset page: https://huggingface.co/datasets/HappyAIUser/Atma3.2-ShareGPT.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
shisa-v2-sharegpt
This is an updated version of the original shisa-v1 dataset augmxnt/ultra-orca-boros-en-ja-v1. It retains the same conversations field and ShareGPT formatting to facilitate its use as a drop-in replacement for the original dataset. The shisa-v2 revision filters out a few entries but largely retains the exact composition and prompts of the original.
All responses have been entirely regenerated from open-weight models (Athene V2, Llama 3.3 70B, and Tulu 3 405B). Outputs… See the full description on the dataset page: https://huggingface.co/datasets/shisa-ai/shisa-v2-sharegpt.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Nectar ShareGPT Clean
This dataset is cleaned and created with 04_convert_nectar.ipynb based on berkeley-nest/Nectar. Main changes:
- convert to the conversations format supported by Axolotl (see ShareGPT)
- use only the best-rank answers
- clean invisible characters and strip (see mltb2.text.clean_all_invisible_chars_and_strip())
- remove rows with empty text
- remove rows from multiple sources (see the source column)
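The cleaning steps above can be sketched roughly as below. This is an illustrative reconstruction, not the card's 04_convert_nectar.ipynb: the row schema (`prompt`, rank-ordered `answers`) is assumed from Nectar's layout, and a stdlib stand-in replaces mltb2's `clean_all_invisible_chars_and_strip()` by stripping Unicode format-category (Cf) characters.

```python
import unicodedata

def clean_invisible(text):
    """Remove invisible (Unicode category Cf) characters and strip whitespace."""
    cleaned = "".join(c for c in text if unicodedata.category(c) != "Cf")
    return cleaned.strip()

def nectar_row_to_sharegpt(row):
    """Keep only the best-ranked answer; drop rows with empty text."""
    answers = row.get("answers") or []
    if not answers:
        return None  # nothing to keep
    best = min(answers, key=lambda a: a["rank"])  # rank 1 is the best answer
    prompt = clean_invisible(row["prompt"])
    answer = clean_invisible(best["answer"])
    if not prompt or not answer:
        return None  # drop rows with empty text
    return {
        "conversations": [
            {"from": "human", "value": prompt},
            {"from": "gpt", "value": answer},
        ]
    }
```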
Licensing
Copyright (c) 2024 Philip May. Copyright (c) Banghua Zhu… See the full description on the dataset page: https://huggingface.co/datasets/PhilipMay/Nectar-ShareGPT-clean.
Dataset Card for UltraChat 200k
This is just the original UltraChat 200k dataset converted to ShareGPT format.
Dataset Description
This is a heavily filtered version of the UltraChat dataset that was used to train Zephyr-7B-β, a state-of-the-art 7B chat model. The original dataset consists of 1.4M dialogues generated by ChatGPT, spanning a wide range of topics. To create UltraChat 200k, we applied the following logic:
Selection of a subset of data for faster… See the full description on the dataset page: https://huggingface.co/datasets/abhinand/ultrachat_200k_sharegpt.
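The conversion to ShareGPT format mentioned in this card amounts to a role-name mapping. The sketch below assumes the common chat layout of a `messages` list with `role`/`content` keys (as used by ultrachat_200k) and the usual ShareGPT role names; it is an illustration, not the card author's script.

```python
# Conventional mapping from OpenAI-style roles to ShareGPT roles.
ROLE_MAP = {"system": "system", "user": "human", "assistant": "gpt"}

def messages_to_sharegpt(messages):
    """Map a role/content message list onto ShareGPT's from/value turns."""
    return {
        "conversations": [
            {"from": ROLE_MAP[m["role"]], "value": m["content"]}
            for m in messages
        ]
    }
```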
shidowake/cosmopedia-japanese-subset_from_aixsatoshi_filtered-sharegpt-format-no-system-prompt_split_5 dataset hosted on Hugging Face and contributed by the HF Datasets community
Merged UI Dataset: SCP_40k-claude-3-7-sonnet-16k-sharegpt
This dataset was automatically generated by merging and processing the following sources: mlfoundations-dev/SCP_40k-claude-3-7-sonnet-16k
Generation Timestamp: 2025-04-03 17:50:36
Processing Time: 14.17 seconds
Output Format: sharegpt
Processing Summary
Total Datasets Attempted: 1
Datasets Successfully Processed: 1
Datasets Failed/Skipped: 0
Total Input Rows Scanned: 49,603
Total Formatted Entries Generated: 49… See the full description on the dataset page: https://huggingface.co/datasets/marcuscedricridia/SCP_40k-claude-3-7-sonnet-16k-sharegpt.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for ATCgpt-Fixed
This dataset contains instruction-input-output pairs converted to ShareGPT format, designed for instruction tuning and text generation tasks.
Dataset Description
The dataset consists of carefully curated instruction-input-output pairs, formatted for conversational AI training. Each entry contains:
- An instruction that specifies the task
- An optional input providing context
- A detailed output that addresses the instruction
Usage
This… See the full description on the dataset page: https://huggingface.co/datasets/HappyAIUser/ATCgpt-Fixed.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for MMLU-Alpaca
This dataset contains instruction-input-output pairs converted to ShareGPT format, designed for instruction tuning and text generation tasks.
Dataset Description
The dataset consists of carefully curated instruction-input-output pairs, formatted for conversational AI training. Each entry contains:
- An instruction that specifies the task
- An optional input providing context
- A detailed output that addresses the instruction
Usage
This… See the full description on the dataset page: https://huggingface.co/datasets/HappyAIUser/MMLU-Alpaca.
https://choosealicense.com/licenses/other/
I moved AEZAKMI V2 in ShareGPT format to a different repo so that it's easier to use with the HF datasets library.
https://choosealicense.com/licenses/unknown/
A new version with more output examples, in ShareGPT format.
Dataset Overview
This dataset was created from 42,678 Vietnamese 🇻🇳 images with the latest GPT-4o. The dataset has superior quality compared to other existing datasets, with:
- Highly detailed descriptions, from the overall composition of the image to descriptions of each object, including their location, quantity, etc.
- Descriptions of text that include not only recognition but also the font style, color, position, and size of the text.
- Answers that are very long and detailed, including… See the full description on the dataset page: https://huggingface.co/datasets/5CD-AI/Viet-ShareGPT-4o-Text-VQA.
Crawlify Pronoun Replacement Dataset
This dataset contains conversation pairs for training a model to replace pronouns with full names and relevant details.
Format
Each example in the dataset follows the ShareGPT format:
{
  "conversations": [
    { "from": "system", "value": "system message" },
    { "from": "human", "value": "input text" },
    { "from": "assistant"… See the full description on the dataset page: https://huggingface.co/datasets/ZySec-AI/Contexual-RAG-Relations-Dataset.
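A small validator for entries shaped like the format above can be useful before training. The sketch below is an assumption about common ShareGPT conventions (allowed role names, human-first ordering), not a formal specification of this dataset.

```python
def is_valid_sharegpt(entry):
    """Check that an entry looks like a well-formed ShareGPT conversation."""
    allowed = {"system", "human", "gpt", "assistant"}
    convs = entry.get("conversations")
    if not isinstance(convs, list) or not convs:
        return False
    for turn in convs:
        if turn.get("from") not in allowed or not isinstance(turn.get("value"), str):
            return False
    # Conventionally, the first non-system turn comes from the human side.
    turns = [t for t in convs if t["from"] != "system"]
    return bool(turns) and turns[0]["from"] == "human"
```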
Uses ShareGPT. This is just a quick test; I was gonna do more, but Grok 3 is not that cheap, and scaling it is gonna cost. But it seems to at least know what I wanted it to do. (Other models had annoying issues.) System prompt for the generation of this data: You're a bot that transforms stories into human/gpt-roled conversations in ShareGPT formatting in .json, meaning new lines use \n and so on. You're supposed to transform the story into a roleplay conversation between a user (human) and the… See the full description on the dataset page: https://huggingface.co/datasets/mpasila/Literotica-RP-Conversion-test-1.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for CoTton-6k
🧠 Dataset Summary
CoTton-6k is a 6,000-example dataset of soft reasoning conversations in the ShareGPT format. Each entry contains an exchange between a user and a model, showcasing high-quality Chain-of-Thought (CoT) reasoning in natural language. The dataset is distilled from 3 cutting-edge open LLMs:
- Qwen3
- AM Thinking
- QwQ
The name CoTton encodes multiple layers of meaning:
- CoT: Chain-of-Thought is embedded in the name.
- TON: The dataset… See the full description on the dataset page: https://huggingface.co/datasets/NewstaR/CoTton-6k.
I forgot whether this dataset is the dirty version of Reddit Writing Prompts or not; it's probably a mix of both. The data was filtered and classified using Lilac with two embedding models:
- jinaai/jina-embeddings-v2-base-en
- BAAI/bge-m3
(Note: Lilac is amazing, BTW, and the UI is nice. Highly recommended for data processing tasks.) The dataset has been converted to ShareGPT format, including word counts for responses and labeled perspectives. While the labeling may not be 100% accurate, ambiguous… See the full description on the dataset page: https://huggingface.co/datasets/BintangFortuna/Reddit-Writing-SGPT.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
ichikara-instruction-003-sharegpt Dataset by DataPilot
Dataset Summary
This dataset is a conversion of the Japanese instruction data published at kinokokoro/ichikara-instruction-003 into the widely used ShareGPT format. The conversion and release were carried out by DataPilot. The original dataset contains human-written answers to a variety of questions and is useful for fine-tuning Japanese large language models (LLMs). This ShareGPT-format version is especially well suited to training models that expect conversational data input. Note: the original dataset may contain multiple answers to a single question. In this ShareGPT-format dataset, each question-answer pair is treated as its own independent conversation.
Data Format
The data is in JSON… See the full description on the dataset page: https://huggingface.co/datasets/DataPilot/ichikara-instruction-003-sharegpt.
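The note above, that each question-answer pair becomes its own independent conversation, can be sketched as follows. The function name and argument layout are hypothetical; only the one-pair-per-conversation behavior is taken from the card.

```python
def expand_to_conversations(question, answers):
    """Turn one question with several answers into independent conversations."""
    return [
        {
            "conversations": [
                {"from": "human", "value": question},
                {"from": "gpt", "value": answer},
            ]
        }
        for answer in answers
    ]
```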
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
German-RAG-ORPO (Odds Ratio Preference Optimization) ShareGPT-Format
German-RAG - German Retrieval Augmented Generation
Dataset Summary
The ORPO Tasks Dataset represents a specialized collection for fine-tuning language models with a focus on RAG-specific capabilities. The subsets used for this training step are derived from 3 different sources:
SauerkrautLM Preference Datasets: SauerkrautLM-Fermented-GER-DPO is a specialized dataset designed for training… See the full description on the dataset page: https://huggingface.co/datasets/avemio/German-RAG-ORPO-ShareGPT-HESSIAN-AI.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset taken from hiyouga/glaive-function-calling-v2-sharegpt. Image tokens (min: 60, max: 2099). Formatted with the Gemma template.