41 datasets found
  1. h

    persona-chat

    • huggingface.co
    Updated Apr 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aleksey Korshuk (2023). persona-chat [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/persona-chat
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 25, 2023
    Authors
    Aleksey Korshuk
    Description

    AlekseyKorshuk/persona-chat dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. Synthetic-Persona-Chat

    • huggingface.co
    Updated Dec 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google (2023). Synthetic-Persona-Chat [Dataset]. https://huggingface.co/datasets/google/Synthetic-Persona-Chat
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 20, 2023
    Dataset authored and provided by
    Googlehttp://google.com/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for SPC: Synthetic-Persona-Chat Dataset

    Abstract from the paper introducing this dataset:

    High-quality conversational datasets are essential for developing AI models that can communicate with users. One way to foster deeper interactions between a chatbot and its user is through personas, aspects of the user's character that provide insights into their personality, motivations, and behaviors. Training Natural Language Processing (NLP) models on a diverse and… See the full description on the dataset page: https://huggingface.co/datasets/google/Synthetic-Persona-Chat.

  3. h

    persona-chat

    • huggingface.co
    Updated Dec 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cynaptics Club, IIT Indore (2024). persona-chat [Dataset]. https://huggingface.co/datasets/Cynaptics/persona-chat
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 26, 2024
    Dataset authored and provided by
    Cynaptics Club, IIT Indore
    Description

    Dataset Description

    This persona chat dataset consists of 20,000 conversations. This dataset is crafted to enhance personalized conversational text generation models that consistently reflect a character's persona in the generated response across many conversation turns. Each dialogue in the dataset is structured to reflect a back-and-forth exchange between two personas, offering a window into how individual characteristics, backgrounds, and personal narratives can influence… See the full description on the dataset page: https://huggingface.co/datasets/Cynaptics/persona-chat.

  4. h

    persona-chat

    • huggingface.co
    Updated Jul 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Awsaf (2025). persona-chat [Dataset]. https://huggingface.co/datasets/awsaf49/persona-chat
    Explore at:
    Dataset updated
    Jul 3, 2025
    Authors
    Awsaf
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for PersonaChat

      Dataset Description
    

    PersonaChat is a multi-turn dialogue dataset introduced by Zhang et al. (2018) for training and evaluating persona-grounded conversational agents. Each conversation is between two crowdworkers, each assigned a randomly selected persona consisting of several simple facts. The dataset aims to assess whether models can maintain consistent character traits throughout a conversation.

    Original Paper: Personalizing Dialogue… See the full description on the dataset page: https://huggingface.co/datasets/awsaf49/persona-chat.

  5. O

    PERSONA-CHAT

    • opendatalab.com
    zip
    Updated Sep 22, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Montreal Institute for Learning Algorithms (2022). PERSONA-CHAT [Dataset]. https://opendatalab.com/OpenDataLab/PERSONA-CHAT
    Explore at:
    zip(247211 bytes)Available download formats
    Dataset updated
    Sep 22, 2022
    Dataset provided by
    Facebook
    Montreal Institute for Learning Algorithms
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present the PERSONA-CHAT dataset, a new dialogue dataset consisting of 162,064 utterances between crowdworkers who were randomly paired and each asked to act the part of a given provided persona (randomly assigned, and created by another set of crowdworkers). The paired workers were asked to chat naturally and to get to know each other during the conversation. This produces interesting and engaging conversations that our agents can try to learn to mimic.

  6. Synthetic Persona Chat

    • kaggle.com
    zip
    Updated Sep 22, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kawindu Wijewardhane (2024). Synthetic Persona Chat [Dataset]. https://www.kaggle.com/datasets/kawinduwijewardhane/synthetic-persona-chat/code
    Explore at:
    zip(4045494 bytes)Available download formats
    Dataset updated
    Sep 22, 2024
    Authors
    Kawindu Wijewardhane
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Kawindu Wijewardhane

    Released under MIT

    Contents

  7. Facebook AI - PersonaChat (8784 examples)

    • kaggle.com
    zip
    Updated Mar 19, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Atharv Jairath (2022). Facebook AI - PersonaChat (8784 examples) [Dataset]. https://www.kaggle.com/datasets/atharvjairath/personachat/code
    Explore at:
    zip(2816727 bytes)Available download formats
    Dataset updated
    Mar 19, 2022
    Authors
    Atharv Jairath
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Personalizing Dialogue Agents: I have a dog, do you have pets too?

    Paper

    Content

    A chit-chat dataset where paired Turkers are given assigned personas and chat to try to get to know each other.

    Abstract

    Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.

    Acknowledgements

  8. h

    persona-chat

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anezatra, persona-chat [Dataset]. https://huggingface.co/datasets/anezatra/persona-chat
    Explore at:
    Authors
    Anezatra
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Persona-Chat

      Dataset Summary
    

    Persona-Chat is a high-quality multi-turn dialogue dataset designed to train conversational AI systems with consistent personality and style. Each participant in the dataset is assigned a persona—a short description or set of traits—which guides their responses throughout the conversation. This dataset enables AI models to learn to maintain coherent personas across dialogue turns and produce responses that reflect consistent characteristics… See the full description on the dataset page: https://huggingface.co/datasets/anezatra/persona-chat.

  9. Toloka Persona Chat Rus

    • kaggle.com
    zip
    Updated Aug 12, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Valentin Biryukov (2021). Toloka Persona Chat Rus [Dataset]. https://www.kaggle.com/valentinbiryukov/toloka-persona-chat-rus
    Explore at:
    zip(6644148 bytes)Available download formats
    Dataset updated
    Aug 12, 2021
    Authors
    Valentin Biryukov
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Content

    This dataset of 10,000 dialogues will help researchers of dialogue systems to develop approaches for training chat bots. Prepared in collaboration with MIPT’s Neural Networks and Deep Learning Lab, the dataset contains profiles with a description of each individual's personality and dialogues between the research participants. A chatbot that is trained on the dataset will be able to communicate on behalf of a certain persona and get to know people by chatting with them on general topics.

  10. PMPC (Persona Match on Persona-Chat)

    • opendatalab.com
    zip
    Updated Sep 22, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    University of Science and Technology of China (2022). PMPC (Persona Match on Persona-Chat) [Dataset]. https://opendatalab.com/OpenDataLab/PMPC
    Explore at:
    zip(141185672 bytes)Available download formats
    Dataset updated
    Sep 22, 2022
    Dataset provided by
    科大讯飞http://www.iflytek.com/
    Queen’s University
    Microsoft Research Asia
    University of Science and Technology of China
    Description

    PMPC (Persona Match on Persona-Chat) is a dataset for Speaker Persona Detection (SPD) which aims to detect speaker personas based on the plain conversational text.

  11. h

    korean-persona-chat-v1

    • huggingface.co
    Updated Feb 7, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SeongUk Moon (2025). korean-persona-chat-v1 [Dataset]. https://huggingface.co/datasets/ANTEGRAL/korean-persona-chat-v1
    Explore at:
    Dataset updated
    Feb 7, 2025
    Authors
    SeongUk Moon
    Description

    ANTEGRAL/korean-persona-chat-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. h

    PERSONA-CHAT对话数文本据集 - Dataset - 海数据

    • haidatas.com
    Updated Feb 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). PERSONA-CHAT对话数文本据集 - Dataset - 海数据 [Dataset]. https://haidatas.com/dataset/persona-chat
    Explore at:
    Dataset updated
    Feb 11, 2025
    Description

    PERSONA-CHAT 数据集,这是一个新的对话数据集,由随机配对的众包工作人员之间的 162,064 个话语组成 并且每个人都要求扮演给定的角色(随机分配,由另一组众包创建)。配对的工人被要求自然地聊天,并在谈话中相互了解。这会产生有趣且引人入胜的对话,我们的代理可以尝试学习模仿。

  13. h

    rp-chat-persona-sharegpt

    • huggingface.co
    Updated Oct 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    jinsu kim (2025). rp-chat-persona-sharegpt [Dataset]. https://huggingface.co/datasets/suchievement/rp-chat-persona-sharegpt
    Explore at:
    Dataset updated
    Oct 25, 2025
    Authors
    jinsu kim
    Description

    suchievement/rp-chat-persona-sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. h

    qa-chat-persona-education

    • huggingface.co
    Updated Oct 17, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Kaitchup (2024). qa-chat-persona-education [Dataset]. https://huggingface.co/datasets/kaitchup/qa-chat-persona-education
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 17, 2024
    Dataset authored and provided by
    The Kaitchup
    Description

    kaitchup/qa-chat-persona-education dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. h

    persona-based-chat-messages

    • huggingface.co
    Updated May 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Maks (2024). persona-based-chat-messages [Dataset]. https://huggingface.co/datasets/Kkordik/persona-based-chat-messages
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 28, 2024
    Authors
    Maks
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is reformated nazlicanto/persona-based-chat Original dataset Synthetic Persona Chat

      Changes
    

    Added system column, which is reformated persona_b, unified into one string, replaced "I", "my"... on "You", "your"... and corrected capital letter usage (now after dot goes capital letter) Added messages column, which is dialogue reformated to be in conversational format + system message Splitted on train and test

      More about reformating
    

    You can find all the… See the full description on the dataset page: https://huggingface.co/datasets/Kkordik/persona-based-chat-messages.

  16. Data from: AstroChat

    • kaggle.com
    • huggingface.co
    zip
    Updated Jun 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    astro_pat (2024). AstroChat [Dataset]. https://www.kaggle.com/datasets/patrickfleith/astrochat
    Explore at:
    zip(1214166 bytes)Available download formats
    Dataset updated
    Jun 9, 2024
    Authors
    astro_pat
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose and Scope

    The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.

    Intended Use

    The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).

    Quickstart

    To be completed

    DATASET DESCRIPTION

    Access

    Structure

    901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column): - id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets. - topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split. - subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc. - persona: description of the persona used to simulate a user - opening_question: the first question asked by the user to start a conversation with the AI-assistant - messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields: - role: the role of the speaker, either user or assistant - content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.

    Important See the full list of topics and subtopics covered below.

    Metadata

    Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main

    Generation Method

    We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:

    Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.

    Step-by-step description

    • Defined a set of user persona
    • Defined a set of topics/ disciplines within the domain of Astronautics / Space Mission Engineering
    • For each topics, we defined a set of subtopics to narrow down the conversation to more specific and niche conversations (see below the full list)
    • For each subtopic we generate a set of opening questions that the user could ask to start a conversation (see below the full list)
    • We then distil the knowledge of an strong Chat Model (in our case ChatGPT through then api with gpt-4-turbo model) to generate the answers to the opening questions
    • We simulate follow-up questions from the user to the assistant, and the assistant's answers to these questions which builds up the messages.

    Future work and contributions appreciated

    • Distil knowledge from more models (Anthropic, Mixtral, GPT-4o, etc...)
    • Implement more creativity in the opening questions and follow-up questions
    • Filter-out questions and conversations which are too similar
    • Ask topic and subtopic expert to validate the generated conversations to have a sense on how reliable is the overall dataset

    Languages

    All instances in the dataset are in english

    Size

    901 synthetically-generated dialogue

    USAGE AND GUIDELINES

    License

    AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International

    Restrictions

    No restriction. Please provide the correct attribution following the license terms.

    Citation

    Patrick Fleith. (2024). AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11531579

    Update Frequency

    Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)

    Have a feedback or spot an error?

    Use the ...

  17. F

    Spanish Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Spanish Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/spanish-healthcare-domain-conversation-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Spanish Healthcare Chat Dataset is a rich collection of over 10,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Spanish-speaking regions.

    Participant & Chat Overview

    Participants: 150+ native Spanish speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Spanish healthcare communication and includes:

    Authentic Naming Patterns: Spanish personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Spanish formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Spanish-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines

    Applications

    <p

  18. g

    Create persona using a template - AI Prompt Template

    • godtierprompts.com
    jsonld
    Updated Jul 1, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anonymous (2025). Create persona using a template - AI Prompt Template [Dataset]. https://www.godtierprompts.com/prompt/ccff9fbd-dcbd-406f-9848-a67104964aef
    Explore at:
    jsonldAvailable download formats
    Dataset updated
    Jul 1, 2025
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Quality Score
    Description

    A curated prompt template for AI language models: Create a persona using a template very useful

  19. h

    genz-persona-chat-style

    • huggingface.co
    Updated Nov 8, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datthesh Padmanabh Shenoy (2025). genz-persona-chat-style [Dataset]. https://huggingface.co/datasets/dattheshshenoy/genz-persona-chat-style
    Explore at:
    Dataset updated
    Nov 8, 2025
    Authors
    Datthesh Padmanabh Shenoy
    Description

    dattheshshenoy/genz-persona-chat-style dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. h

    persona_chat-informal_indonesian

    • huggingface.co
    Updated Nov 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pradana Setialana (2024). persona_chat-informal_indonesian [Dataset]. https://huggingface.co/datasets/psetialana/persona_chat-informal_indonesian
    Explore at:
    Dataset updated
    Nov 1, 2024
    Authors
    Pradana Setialana
    Description

    This dataset is a translation of the Persona Chat dataset into informal Indonesian, reflecting the language commonly used by Indonesian teenagers in instant messaging conversations. It is derived from the repository psetialana/multi_session_chat-informal_indonesian-transformed, which serves as a translated version of gonced8/multi-session_chat. The conversations in the first session of the multi-session chat dataset originate from the Persona Chat dataset.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Aleksey Korshuk (2023). persona-chat [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/persona-chat

persona-chat

AlekseyKorshuk/persona-chat

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 25, 2023
Authors
Aleksey Korshuk
Description

AlekseyKorshuk/persona-chat dataset hosted on Hugging Face and contributed by the HF Datasets community

Search
Clear search
Close search
Google apps
Main menu