7 datasets found
  1. GPT Roleplay Realm: Enhanced Character

    • kaggle.com
    Updated Nov 30, 2023
    Cite
    The Devastator (2023). GPT Roleplay Realm: Enhanced Character [Dataset]. https://www.kaggle.com/datasets/thedevastator/gpt-roleplay-realm-enhanced-character-role-playi
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    GPT Roleplay Realm: Enhanced Character Role-playing Dataset

    Character Cards and Dialogues for immersive role-playing experiences

    By Ilya Gusev (From Huggingface) [source]

    About this dataset

    The GPT Roleplay Realm dataset is a resource for improving how language models handle character role-playing. It comprises character cards generated by GPT models. Each card contains a name, a greeting, example dialogues, context, topics of interest, dialogues involving the character, and an image prompt.

    With a focus on enriching language models' ability to engage in dynamic and realistic interactions with fictional characters, this dataset provides users with a diverse range of well-rounded characters to incorporate into their role-playing scenarios. Each character card includes a name that gives them an individual identity and distinction within the narrative.

    Additionally, context descriptions offer crucial background information about each character's history or personality traits that can lend depth and authenticity to their portrayal. Greetings act as introductory statements that set the tone for interactions with these virtual personas.

    Example dialogues showcase how these characters might converse within specific scenarios or settings. These conversations serve as guidelines for users when constructing interactive narratives or engaging in linguistic exchanges with these language model-generated characters.

    Moreover, topics provided on each character card indicate the areas of expertise or interests that are inherent to each persona within the realm created by GPT models. This information enables users to generate dialogue that aligns with each character's unique knowledge base or passions.

    Furthermore, dialogues involving additional participants allow for multi-person exchanges and enable more intricate storytelling possibilities within virtual worlds. This feature enhances user engagement by promoting collaborative storytelling among multiple AI-generated characters.

    To aid visual immersion and user creativity during role-playing, each character card also includes an image prompt: a description of how the character may look, based on the character's described features or characteristics.

    In conclusion, the GPT Roleplay Realm dataset pairs sample dialogues with context descriptions, topic lists, and image prompts for each persona generated by language models, giving language models detailed material for creating immersive and engaging role-playing experiences.

    How to use the dataset

    Welcome to the GPT Roleplay Realm dataset! This guide will help you navigate and make the best use of this enhanced character role-playing dataset.

    Overview

    The GPT Roleplay Realm dataset consists of character cards generated by GPT models. These character cards contain names, greetings, example dialogues, context, topics of interest, dialogues involving the characters, and image prompts. The purpose of this dataset is to provide language models with rich information about fictional characters that can be used for immersive role-playing experiences.

    Understanding the Columns

    The dataset is organized into the following columns:

    • name: The name of the character.
    • context: A brief description or background information about the character.
    • greeting: The initial greeting or introduction phrase of each character.
    • example_dialogue: A sample dialogue or conversation involving each character.
    • topics: The topics or themes that each character is knowledgeable or interested in.
    • dialogues: Additional dialogues or conversations involving each character.
    • image_prompt: Prompts or descriptions for images that represent each character.
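
The column layout above can be sketched with the Python standard library. This is a minimal illustration, not the dataset's official loader: the two sample rows are invented, and the way `topics` and `dialogues` are encoded as strings here is an assumption, since the listing does not show the actual file format.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = ["name", "context", "greeting", "example_dialogue",
           "topics", "dialogues", "image_prompt"]

# Hypothetical sample rows, invented for illustration only.
SAMPLE_CSV = """name,context,greeting,example_dialogue,topics,dialogues,image_prompt
Ada,"A retired starship engineer.","Hello, traveler!","User: Hi / Ada: Welcome aboard.","engines; navigation","[]","An elderly engineer in a workshop"
Bram,"A village baker with a secret.","Fresh bread today!","User: Hello / Bram: Care for a roll?","baking; folklore","[]","A baker holding a loaf"
"""

def load_cards(text):
    """Parse CSV text into a list of dicts, checking the expected columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

cards = load_cards(SAMPLE_CSV)
names = [card["name"] for card in cards]
print(names)
```

With a real download, the same function could be given the file's contents instead of `SAMPLE_CSV`; the column check fails early if the file does not match the layout described here.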

    Getting Started

    When exploring this dataset, it may be helpful to first get a sense of all the available characters by examining their names using the name column.

    You can then dive deeper into a specific character's information by exploring their context in order to understand their background and story.

    To engage with a specific character in a role-playing scenario, start from the character's provided greeting, which serves as the opening line of the conversation.
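
The steps above (pick a character by name, read its context, open with its greeting) could be combined into a plain-text prompt as follows. This is a sketch under stated assumptions: `build_opening` is a hypothetical helper, the prompt layout is one arbitrary choice, and the card shown is invented rather than taken from the dataset.

```python
def build_opening(card, user_message):
    """Assemble a plain-text role-play prompt from one character card."""
    lines = [
        f"Character: {card['name']}",
        f"Background: {card['context']}",
        "",
        # The character's greeting opens the conversation.
        f"{card['name']}: {card['greeting']}",
        f"User: {user_message}",
        # Trailing speaker tag leaves the next turn to the character.
        f"{card['name']}:",
    ]
    return "\n".join(lines)

# Hypothetical card, invented for illustration only.
card = {
    "name": "Ada",
    "context": "A retired starship engineer.",
    "greeting": "Hello, traveler!",
}
prompt = build_opening(card, "What did you work on?")
print(prompt)
```

The resulting string can be passed to whichever language model is being used; nothing here depends on a particular model API.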

    If you want to understand how different characters interact with ...

  2. ChatGPT-RealUser-2.2M-preview

    • huggingface.co
    Updated Aug 30, 2025
    Cite
    Gata (2025). ChatGPT-RealUser-2.2M-preview [Dataset]. https://huggingface.co/datasets/Gata-community/ChatGPT-RealUser-2.2M-preview
    Dataset updated
    Aug 30, 2025
    Dataset authored and provided by
    Gata
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    ChatGPT-RealUser-2.2M: A Large-Scale Dataset of Real-User, Real-World ChatGPT Conversations

    ChatGPT-RealUser-2.2M is a large-scale dataset of real-user, real-world ChatGPT conversations developed by Gata. From 2024 to 2025, participants using Gata’s GPT-to-Earn product opted in to share their chats and earned points based on conversation quality. The dataset covers GPT-3.5, GPT-4, and o1 models, and contains 2,244,389 conversations from 15,316 unique users. Because many chats are… See the full description on the dataset page: https://huggingface.co/datasets/Gata-community/ChatGPT-RealUser-2.2M-preview.

  3. Bitext-insurance-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 24, 2024
    + more versions
    Cite
    Bitext (2024). Bitext-insurance-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-insurance-llm-chatbot-training-dataset
    Dataset updated
    Aug 24, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Insurance Tagged Training Dataset for LLM-based Virtual Assistants

    Overview

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [insurance] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-insurance-llm-chatbot-training-dataset.

  4. Development, system design, safety, and performance metrics of a...

    • figshare.com
    xlsx
    Updated Jul 21, 2025
    Cite
    David Villarreal-Zegarra (2025). Development, system design, safety, and performance metrics of a conversational agent for reducing depressive and anxious symptoms: The MHAI Study [Dataset]. http://doi.org/10.6084/m9.figshare.29606618.v1
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    figshare
    Authors
    David Villarreal-Zegarra
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: Conversational agents based on large language models (LLMs) have shown moderate efficacy in reducing depressive and anxiety symptoms. However, most existing evaluations lack methodological transparency, rely on closed-source models, and show limited standardization in performance and safety assessment.

    Objective: We have two study objectives: (1) to develop an LLM-based conversational agent through system design analysis and initial functionality testing, and (2) to evaluate its safety and performance, with each of two LLMs (GPT-4o and Llama 3.1-8B), through standardized assessment in controlled simulated interactions focused on depression and anxiety.

    Methods: We conducted a cross-sectional study in two phases. First, we developed a mental health platform integrating a conversational agent with functionalities including personalized context, pretrained therapeutic modules, self-assessment tools, and an emergency alert system. Second, we evaluated the agent’s responses in simulated interactions based on predefined user personas for each LLM. Four expert raters assessed 816 interaction pairs using a 5-criterion Likert scale evaluating tone, clarity, domain accuracy (correctness), robustness, completeness, boundaries, target language, and safety. In addition, we used quantitative performance metrics such as cost, response length, and number of tokens. Multiple linear regression models were used to compare LLM performance and assess metric interrelations.

    Results: First, we developed a web-based mental health platform using a user-centered design, structured into frontend, backend, and database layers. The system integrates therapeutic chat (GPT-4o and Llama 3.1-8B), psychological assessments (PHQ-9, GAD-7), CBT-based tasks, and an emergency alert system. The platform supports secure user authentication, data encryption, multilingual access, and session tracking. Second, GPT-4o outperformed Llama 3.1-8B in both quantitative and qualitative metrics, generating longer and more lexically diverse responses, using more tokens, and scoring higher in clarity, robustness, completeness, boundaries, and target language. However, it incurred higher costs, with no significant differences in tone, accuracy, or safety.

    Conclusion: Our study presents a conversational agent with multiple functionalities and shows that GPT-4o outperforms Llama 3.1-8B in performance, although at a higher cost. This platform could be used in future clinical trials or real-world implementation studies.

  5. WildChat

    • huggingface.co
    Updated Jul 23, 2024
    Cite
    Ai2 (2024). WildChat [Dataset]. https://huggingface.co/datasets/allenai/WildChat
    Dataset updated
    Jul 23, 2024
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    Dataset Card for WildChat

    Note: a newer version with 4.8 million conversations and demographic information is available.

    Dataset Description
    

    Paper: https://arxiv.org/abs/2405.01470

    Interactive Search Tool: https://wildvisualizer.com (paper)

    License: ODC-BY

    Language(s) (NLP): multi-lingual

    Point of Contact: Yuntian Deng

    Dataset Summary

    WildChat is a collection of 650K conversations between human users and ChatGPT. We collected WildChat… See the full description on the dataset page: https://huggingface.co/datasets/allenai/WildChat.

  6. awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a dataset repository of Awesome ChatGPT Prompts. View all prompts on GitHub.

    License

    CC-0

  7. MSC-Self-Instruct

    • huggingface.co
    Updated Oct 17, 2023
    Cite
    MemGPT (2023). MSC-Self-Instruct [Dataset]. https://huggingface.co/datasets/MemGPT/MSC-Self-Instruct
    Dataset updated
    Oct 17, 2023
    Dataset authored and provided by
    MemGPT
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    MemGPT

    This is the self-instruct dataset of MSC conversations used for the MemGPT paper. For more information, please refer to memgpt.ai. The MSC dataset consists of multi-round human conversations. In this dataset, our goal is to come up with a conversation opener that is personalized to the user by referencing topics from previous conversations. These were generated while evaluating MemGPT.


