https://creativecommons.org/publicdomain/zero/1.0/
By Ilya Gusev (From Huggingface) [source]
The GPT Roleplay Realm dataset is a resource for improving the capabilities of language models in character role-playing. Designed to facilitate immersive role-playing experiences, this dataset consists of character cards generated by GPT models. These character cards contain essential information such as names, greetings, example dialogues, context, topics of interest, dialogues involving the characters, and image prompts.
With a focus on enriching language models' ability to engage in dynamic and realistic interactions with fictional characters, this dataset provides users with a diverse range of well-rounded characters to incorporate into their role-playing scenarios. Each character card includes a name that gives them an individual identity and distinction within the narrative.
Additionally, context descriptions offer crucial background information about each character's history or personality traits that can lend depth and authenticity to their portrayal. Greetings act as introductory statements that set the tone for interactions with these virtual personas.
Example dialogues showcase how these characters might converse within specific scenarios or settings. These conversations serve as guidelines for users when constructing interactive narratives or engaging in linguistic exchanges with these language model-generated characters.
Moreover, topics provided on each character card indicate the areas of expertise or interests that are inherent to each persona within the realm created by GPT models. This information enables users to generate dialogue that aligns with each character's unique knowledge base or passions.
Furthermore, dialogues involving additional participants allow for multi-person exchanges and enable more intricate storytelling possibilities within virtual worlds. This feature enhances user engagement by promoting collaborative storytelling among multiple AI-generated characters.
To enhance visual immersion and aid user creativity during role-playing experiences, image prompts are also included on each character card. These suggestive visuals stimulate users' imagination regarding how each character may appear physically based on their described features or characteristics.
In conclusion, by providing detailed fictional personas generated by language models, each with sample dialogues, context descriptions, topics of interest, and image prompts, the GPT Roleplay Realm dataset supports language models in creating immersive and engaging role-playing experiences.
How to Use This Dataset: GPT Roleplay Realm
Welcome to the GPT Roleplay Realm dataset! This guide will help you navigate and make the best use of this enhanced character role-playing dataset.
Overview
The GPT Roleplay Realm dataset consists of character cards generated by GPT models. These character cards contain names, greetings, example dialogues, context, topics of interest, dialogues involving the characters, and image prompts. The purpose of this dataset is to provide language models with rich information about fictional characters that can be used for immersive role-playing experiences.
Understanding the Columns
The dataset is primarily organized into several columns:
name: The name of the character.
context: A brief description or background information about the character.
greeting: The initial greeting or introduction phrase of each character.
example_dialogue: A sample dialogue or conversation involving each character.
topics: The topics or themes that each character is knowledgeable or interested in.
dialogues: Additional dialogues or conversations involving each character.
image_prompt: Prompts or descriptions for images that represent each character.
Getting Started
When exploring this dataset, it may be helpful to first get a sense of all the available characters by examining their names using the name column.
You can then dive deeper into a specific character's information by exploring their context in order to understand their background and story.
To engage with a specific character in a role-playing scenario, start by using their provided greeting as an introductory statement towards them.
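The steps above can be sketched in Python as follows. This is a minimal illustration, not the dataset's official loading code: the two sample cards, their values, and the helper names `list_names`, `find_card`, and `opening_prompt` are all hypothetical, and the real dataset's field types may differ from the plain strings and lists assumed here.

```python
# Hypothetical sample cards: the values below are invented for illustration,
# and the real dataset's field types may differ.
SAMPLE_CARDS = [
    {
        "name": "Captain Mira",
        "context": "A retired starship captain who now runs a harbor tavern.",
        "greeting": "Welcome aboard, traveler. What brings you to my tavern?",
        "topics": ["navigation", "old war stories"],
    },
    {
        "name": "Professor Quill",
        "context": "An absent-minded historian obsessed with lost libraries.",
        "greeting": "Ah, a visitor! Mind the books on the floor.",
        "topics": ["ancient manuscripts", "cartography"],
    },
]


def list_names(cards):
    """Step 1: survey the available characters via the name column."""
    return [card["name"] for card in cards]


def find_card(cards, name):
    """Step 2: look up one character so its context can be read."""
    for card in cards:
        if card["name"] == name:
            return card
    raise KeyError(f"no character named {name!r}")


def opening_prompt(card):
    """Step 3: open a role-play session with the character's greeting."""
    return f"{card['context']}\n\n{card['name']}: {card['greeting']}"


if __name__ == "__main__":
    print(list_names(SAMPLE_CARDS))
    print(opening_prompt(find_card(SAMPLE_CARDS, "Captain Mira")))
```

Placing the context ahead of the greeting is one possible design choice: it gives a language model the character's background before the first in-character line.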
If you want to understand how different characters interact with ...
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
ChatGPT-RealUser-2.2M: A Large-Scale Dataset of Real-User, Real-World ChatGPT Conversations
ChatGPT-RealUser-2.2M is a large-scale dataset of real-user, real-world ChatGPT conversations developed by Gata. From 2024 to 2025, participants using Gata’s GPT-to-Earn product opted in to share their chats and earned points based on conversation quality. The dataset covers GPT-3.5, GPT-4, and o1 models, and contains 2,244,389 conversations from 15,316 unique users. Because many chats are… See the full description on the dataset page: https://huggingface.co/datasets/Gata-community/ChatGPT-RealUser-2.2M-preview.
https://choosealicense.com/licenses/cdla-sharing-1.0/
Bitext - Insurance Tagged Training Dataset for LLM-based Virtual Assistants
Overview
This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [insurance] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-insurance-llm-chatbot-training-dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Conversational agents based on large language models (LLMs) have shown moderate efficacy in reducing depressive and anxiety symptoms. However, most existing evaluations lack methodological transparency, rely on closed-source models, and show limited standardization in performance and safety assessment.
Objective: We have two study objectives: (1) to develop an LLM-based conversational agent through system design analysis and initial functionality testing, and (2) to evaluate its safety and performance with two LLMs (GPT-4o and Llama 3.1-8B) through standardized assessment in controlled simulated interactions focused on depression and anxiety.
Methods: We conducted a cross-sectional study in two phases. First, we developed a mental health platform integrating a conversational agent with functionalities including personalized context, pretrained therapeutic modules, self-assessment tools, and an emergency alert system. Second, we evaluated the agent’s responses in simulated interactions based on predefined user personas for each LLM. Four expert raters assessed 816 interaction pairs using a 5-criterion Likert scale evaluating tone, clarity, domain accuracy (correctness), robustness, completeness, boundaries, target language, and safety. In addition, we used quantitative performance metrics such as cost, response length, and number of tokens. Multiple linear regression models were used to compare LLM performance and assess metric interrelations.
Results: First, we developed a web-based mental health platform using a user-centered design, structured into frontend, backend, and database layers. The system integrates therapeutic chat (GPT-4o and Llama 3.1-8B), psychological assessments (PHQ-9, GAD-7), CBT-based tasks, and an emergency alert system. The platform supports secure user authentication, data encryption, multilingual access, and session tracking. Second, GPT-4o outperformed Llama 3.1-8B in both quantitative and qualitative metrics, generating longer and more lexically diverse responses, using more tokens, and scoring higher in clarity, robustness, completeness, boundaries, and target language. However, it incurred higher costs, and there were no significant differences in tone, accuracy, or safety.
Conclusion: Our study presents a conversational agent with multiple functionalities and shows that GPT-4o outperforms Llama 3.1-8B in performance, although at a higher cost. This platform could be used in future clinical trials or real-world implementation studies.
https://choosealicense.com/licenses/odc-by/
Dataset Card for WildChat
Note: a newer version with 4.8 million conversations and demographic information can be found here.
Dataset Description
Paper: https://arxiv.org/abs/2405.01470
Interactive Search Tool: https://wildvisualizer.com (paper)
License: ODC-BY
Language(s) (NLP): multi-lingual
Point of Contact: Yuntian Deng
Dataset Summary
WildChat is a collection of 650K conversations between human users and ChatGPT. We collected WildChat… See the full description on the dataset page: https://huggingface.co/datasets/allenai/WildChat.
https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a dataset repository of Awesome ChatGPT Prompts. View all prompts on GitHub.
License
CC-0
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
MemGPT
This is the self-instruct dataset of MSC conversations used for the MemGPT paper. For more information, please refer to memgpt.ai. The MSC dataset consists of multi-round human conversations. In this dataset, our goal is to come up with a conversation opener that is personalized to the user by referencing topics from the previous conversations. These were generated while evaluating MemGPT.