41 datasets found

h
persona-chat
huggingface.co
Updated Apr 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aleksey Korshuk (2023). persona-chat [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/persona-chat
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 25, 2023
Authors
Aleksey Korshuk
Description
AlekseyKorshuk/persona-chat dataset hosted on Hugging Face and contributed by the HF Datasets community
Synthetic-Persona-Chat
huggingface.co
Updated Dec 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google (2023). Synthetic-Persona-Chat [Dataset]. https://huggingface.co/datasets/google/Synthetic-Persona-Chat
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 20, 2023
Dataset authored and provided by
Googlehttp://google.com/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for SPC: Synthetic-Persona-Chat Dataset

Abstract from the paper introducing this dataset:

High-quality conversational datasets are essential for developing AI models that can communicate with users. One way to foster deeper interactions between a chatbot and its user is through personas, aspects of the user's character that provide insights into their personality, motivations, and behaviors. Training Natural Language Processing (NLP) models on a diverse and… See the full description on the dataset page: https://huggingface.co/datasets/google/Synthetic-Persona-Chat.
h
persona-chat
huggingface.co
Updated Dec 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cynaptics Club, IIT Indore (2024). persona-chat [Dataset]. https://huggingface.co/datasets/Cynaptics/persona-chat
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 26, 2024
Dataset authored and provided by
Cynaptics Club, IIT Indore
Description
Dataset Description

This persona chat dataset consists of 20,000 conversations. This dataset is crafted to enhance personalized conversational text generation models that consistently reflect a character's persona in the generated response across many conversation turns. Each dialogue in the dataset is structured to reflect a back-and-forth exchange between two personas, offering a window into how individual characteristics, backgrounds, and personal narratives can influence… See the full description on the dataset page: https://huggingface.co/datasets/Cynaptics/persona-chat.
h
persona-chat
huggingface.co
Updated Jul 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Awsaf (2025). persona-chat [Dataset]. https://huggingface.co/datasets/awsaf49/persona-chat
Explore at:
Dataset updated
Jul 3, 2025
Authors
Awsaf
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for PersonaChat

Dataset Description

PersonaChat is a multi-turn dialogue dataset introduced by Zhang et al. (2018) for training and evaluating persona-grounded conversational agents. Each conversation is between two crowdworkers, each assigned a randomly selected persona consisting of several simple facts. The dataset aims to assess whether models can maintain consistent character traits throughout a conversation.

Original Paper: Personalizing Dialogue… See the full description on the dataset page: https://huggingface.co/datasets/awsaf49/persona-chat.
O
PERSONA-CHAT
opendatalab.com
zip
Updated Sep 22, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Montreal Institute for Learning Algorithms (2022). PERSONA-CHAT [Dataset]. https://opendatalab.com/OpenDataLab/PERSONA-CHAT
Explore at:
zip(247211 bytes)Available download formats
Dataset updated
Sep 22, 2022
Dataset provided by
Facebook
Montreal Institute for Learning Algorithms
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present the PERSONA-CHAT dataset, a new dialogue dataset consisting of 162,064 utterances between crowdworkers who were randomly paired and each asked to act the part of a given provided persona (randomly assigned, and created by another set of crowdworkers). The paired workers were asked to chat naturally and to get to know each other during the conversation. This produces interesting and engaging conversations that our agents can try to learn to mimic.
Synthetic Persona Chat
kaggle.com
zip
Updated Sep 22, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kawindu Wijewardhane (2024). Synthetic Persona Chat [Dataset]. https://www.kaggle.com/datasets/kawinduwijewardhane/synthetic-persona-chat/code
Explore at:
zip(4045494 bytes)Available download formats
Dataset updated
Sep 22, 2024
Authors
Kawindu Wijewardhane
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Kawindu Wijewardhane

Released under MIT

Contents
Facebook AI - PersonaChat (8784 examples)
kaggle.com
zip
Updated Mar 19, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Atharv Jairath (2022). Facebook AI - PersonaChat (8784 examples) [Dataset]. https://www.kaggle.com/datasets/atharvjairath/personachat/code
Explore at:
zip(2816727 bytes)Available download formats
Dataset updated
Mar 19, 2022
Authors
Atharv Jairath
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Personalizing Dialogue Agents: I have a dog, do you have pets too?

Paper

Content

A chit-chat dataset where paired Turkers are given assigned personas and chat to try to get to know each other.

Abstract

Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.

Acknowledgements

Paper

Code
h
persona-chat
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anezatra, persona-chat [Dataset]. https://huggingface.co/datasets/anezatra/persona-chat
Explore at:
Authors
Anezatra
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Persona-Chat

Dataset Summary

Persona-Chat is a high-quality multi-turn dialogue dataset designed to train conversational AI systems with consistent personality and style. Each participant in the dataset is assigned a persona—a short description or set of traits—which guides their responses throughout the conversation. This dataset enables AI models to learn to maintain coherent personas across dialogue turns and produce responses that reflect consistent characteristics… See the full description on the dataset page: https://huggingface.co/datasets/anezatra/persona-chat.
Toloka Persona Chat Rus
kaggle.com
zip
Updated Aug 12, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Valentin Biryukov (2021). Toloka Persona Chat Rus [Dataset]. https://www.kaggle.com/valentinbiryukov/toloka-persona-chat-rus
Explore at:
zip(6644148 bytes)Available download formats
Dataset updated
Aug 12, 2021
Authors
Valentin Biryukov
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Content

This dataset of 10,000 dialogues will help researchers of dialogue systems to develop approaches for training chat bots. Prepared in collaboration with MIPT’s Neural Networks and Deep Learning Lab, the dataset contains profiles with a description of each individual's personality and dialogues between the research participants. A chatbot that is trained on the dataset will be able to communicate on behalf of a certain persona and get to know people by chatting with them on general topics.
PMPC (Persona Match on Persona-Chat)
opendatalab.com
zip
Updated Sep 22, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
University of Science and Technology of China (2022). PMPC (Persona Match on Persona-Chat) [Dataset]. https://opendatalab.com/OpenDataLab/PMPC
Explore at:
zip(141185672 bytes)Available download formats
Dataset updated
Sep 22, 2022
Dataset provided by
科大讯飞http://www.iflytek.com/
Queen’s University
Microsoft Research Asia
University of Science and Technology of China
Description
PMPC (Persona Match on Persona-Chat) is a dataset for Speaker Persona Detection (SPD) which aims to detect speaker personas based on the plain conversational text.
h
korean-persona-chat-v1
huggingface.co
Updated Feb 7, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SeongUk Moon (2025). korean-persona-chat-v1 [Dataset]. https://huggingface.co/datasets/ANTEGRAL/korean-persona-chat-v1
Explore at:
Dataset updated
Feb 7, 2025
Authors
SeongUk Moon
Description
ANTEGRAL/korean-persona-chat-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community
h
PERSONA-CHAT对话数文本据集 - Dataset - 海数据
haidatas.com
Updated Feb 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). PERSONA-CHAT对话数文本据集 - Dataset - 海数据 [Dataset]. https://haidatas.com/dataset/persona-chat
Explore at:
Dataset updated
Feb 11, 2025
Description
PERSONA-CHAT 数据集，这是一个新的对话数据集，由随机配对的众包工作人员之间的 162,064 个话语组成并且每个人都要求扮演给定的角色（随机分配，由另一组众包创建）。配对的工人被要求自然地聊天，并在谈话中相互了解。这会产生有趣且引人入胜的对话，我们的代理可以尝试学习模仿。
h
rp-chat-persona-sharegpt
huggingface.co
Updated Oct 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
jinsu kim (2025). rp-chat-persona-sharegpt [Dataset]. https://huggingface.co/datasets/suchievement/rp-chat-persona-sharegpt
Explore at:
Dataset updated
Oct 25, 2025
Authors
jinsu kim
Description
suchievement/rp-chat-persona-sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community
h
qa-chat-persona-education
huggingface.co
Updated Oct 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Kaitchup (2024). qa-chat-persona-education [Dataset]. https://huggingface.co/datasets/kaitchup/qa-chat-persona-education
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 17, 2024
Dataset authored and provided by
The Kaitchup
Description
kaitchup/qa-chat-persona-education dataset hosted on Hugging Face and contributed by the HF Datasets community
h
persona-based-chat-messages
huggingface.co
Updated May 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maks (2024). persona-based-chat-messages [Dataset]. https://huggingface.co/datasets/Kkordik/persona-based-chat-messages
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 28, 2024
Authors
Maks
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is reformated nazlicanto/persona-based-chat Original dataset Synthetic Persona Chat

Changes

Added system column, which is reformated persona_b, unified into one string, replaced "I", "my"... on "You", "your"... and corrected capital letter usage (now after dot goes capital letter) Added messages column, which is dialogue reformated to be in conversational format + system message Splitted on train and test

More about reformating

You can find all the… See the full description on the dataset page: https://huggingface.co/datasets/Kkordik/persona-based-chat-messages.
Data from: AstroChat
kaggle.com
huggingface.co
zip
Updated Jun 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
astro_pat (2024). AstroChat [Dataset]. https://www.kaggle.com/datasets/patrickfleith/astrochat
Explore at:
zip(1214166 bytes)Available download formats
Dataset updated
Jun 9, 2024
Authors
astro_pat
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose and Scope

The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.

Intended Use

The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).

Quickstart

To be completed

DATASET DESCRIPTION

Access

Manual download from Hugging face hub: https://huggingface.co/datasets/patrickfleith/AstroChat

Or with python: python from datasets import load_dataset dataset = load_dataset("patrickfleith/AstroChat")

Structure

901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column): - id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets. - topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split. - subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc. - persona: description of the persona used to simulate a user - opening_question: the first question asked by the user to start a conversation with the AI-assistant - messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields: - role: the role of the speaker, either user or assistant - content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.

Important See the full list of topics and subtopics covered below.

Metadata

Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main

Generation Method

We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:

Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.

Step-by-step description

Defined a set of user persona

Defined a set of topics/ disciplines within the domain of Astronautics / Space Mission Engineering

For each topics, we defined a set of subtopics to narrow down the conversation to more specific and niche conversations (see below the full list)

For each subtopic we generate a set of opening questions that the user could ask to start a conversation (see below the full list)

We then distil the knowledge of an strong Chat Model (in our case ChatGPT through then api with gpt-4-turbo model) to generate the answers to the opening questions

We simulate follow-up questions from the user to the assistant, and the assistant's answers to these questions which builds up the messages.

Future work and contributions appreciated

Distil knowledge from more models (Anthropic, Mixtral, GPT-4o, etc...)

Implement more creativity in the opening questions and follow-up questions

Filter-out questions and conversations which are too similar

Ask topic and subtopic expert to validate the generated conversations to have a sense on how reliable is the overall dataset

Languages

All instances in the dataset are in english

Size

901 synthetically-generated dialogue

USAGE AND GUIDELINES

License

AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International

Restrictions

No restriction. Please provide the correct attribution following the license terms.

Citation

Patrick Fleith. (2024). AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11531579

Update Frequency

Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)

Have a feedback or spot an error?

Use the ...
F
Spanish Agent-Customer Chat Dataset for Healthcare Domain
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Spanish Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/spanish-healthcare-domain-conversation-text-dataset
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The Spanish Healthcare Chat Dataset is a rich collection of over 10,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Spanish-speaking regions.
Participant & Chat Overview
•
Participants: 150+ native Spanish speakers from the FutureBeeAI Crowd Community

•
Conversation Length: 300–700 words per chat

•
Turns per Chat: 50–150 dialogue turns across both participants

•
Chat Types: Inbound and outbound

•
Sentiment Coverage: Positive, neutral, and negative outcomes included

Topic Diversity
The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:
•
Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups

•
Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.
Language Diversity & Realism
This dataset reflects the natural flow of Spanish healthcare communication and includes:
•
Authentic Naming Patterns: Spanish personal names, clinic names, and brands

•
Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Spanish formats

•
Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Spanish-speaking regions

•
Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.
Conversational Flow & Structure
Conversations range from simple inquiries to complex advisory sessions, including:
•General inquiries
•Detailed problem-solving
•Routine status updates
•Treatment recommendations
•Support and feedback interactions
Each conversation typically includes these structural components:
•Greetings and verification
•Information gathering
•Problem definition
•Solution delivery
•Closing messages
•Follow-up and feedback (where applicable)
This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.
Data Format & Structure
Available in JSON, CSV, and TXT formats, each conversation includes:
•Full message history with clear speaker labels
•Participant identifiers
•Metadata (e.g., topic tags, region, sentiment)
•Compatibility with common NLP and ML pipelines
Applications
<p
g
Create persona using a template - AI Prompt Template
godtierprompts.com
jsonld
Updated Jul 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anonymous (2025). Create persona using a template - AI Prompt Template [Dataset]. https://www.godtierprompts.com/prompt/ccff9fbd-dcbd-406f-9848-a67104964aef
Explore at:
jsonldAvailable download formats
Dataset updated
Jul 1, 2025
Authors
Anonymous
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Quality Score
Description
A curated prompt template for AI language models: Create a persona using a template very useful
h
genz-persona-chat-style
huggingface.co
Updated Nov 8, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Datthesh Padmanabh Shenoy (2025). genz-persona-chat-style [Dataset]. https://huggingface.co/datasets/dattheshshenoy/genz-persona-chat-style
Explore at:
Dataset updated
Nov 8, 2025
Authors
Datthesh Padmanabh Shenoy
Description
dattheshshenoy/genz-persona-chat-style dataset hosted on Hugging Face and contributed by the HF Datasets community
h
persona_chat-informal_indonesian
huggingface.co
Updated Nov 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pradana Setialana (2024). persona_chat-informal_indonesian [Dataset]. https://huggingface.co/datasets/psetialana/persona_chat-informal_indonesian
Explore at:
Dataset updated
Nov 1, 2024
Authors
Pradana Setialana
Description
This dataset is a translation of the Persona Chat dataset into informal Indonesian, reflecting the language commonly used by Indonesian teenagers in instant messaging conversations. It is derived from the repository psetialana/multi_session_chat-informal_indonesian-transformed, which serves as a translated version of gonced8/multi-session_chat. The conversations in the first session of the multi-session chat dataset originate from the Persona Chat dataset.

Facebook

Twitter

Click to copy link

Link copied

Cite

Aleksey Korshuk (2023). persona-chat [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/persona-chat

persona-chat

AlekseyKorshuk/persona-chat

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Apr 25, 2023

Authors

Aleksey Korshuk

Description

AlekseyKorshuk/persona-chat dataset hosted on Hugging Face and contributed by the HF Datasets community

Clear search

Close search

Google apps

Main menu

persona-chat

Synthetic-Persona-Chat

persona-chat

persona-chat

PERSONA-CHAT

Synthetic Persona Chat

Dataset

Contents

Facebook AI - PersonaChat (8784 examples)

Personalizing Dialogue Agents: I have a dog, do you have pets too?

Content

Abstract

Acknowledgements

persona-chat

Toloka Persona Chat Rus

Content

PMPC (Persona Match on Persona-Chat)

korean-persona-chat-v1

PERSONA-CHAT对话数文本据集 - Dataset - 海数据

rp-chat-persona-sharegpt

qa-chat-persona-education

persona-based-chat-messages

Data from: AstroChat

Purpose and Scope

Intended Use

Quickstart

DATASET DESCRIPTION

Access

Structure

Metadata

Generation Method

Step-by-step description

Future work and contributions appreciated

Languages

Size

USAGE AND GUIDELINES

License

Restrictions

Citation

Update Frequency

Have a feedback or spot an error?

Spanish Agent-Customer Chat Dataset for Healthcare Domain

Introduction

Participant & Chat Overview

Topic Diversity

Language Diversity & Realism

Conversational Flow & Structure

Data Format & Structure

Applications

Create persona using a template - AI Prompt Template

genz-persona-chat-style

persona_chat-informal_indonesian

persona-chat

AlekseyKorshuk/persona-chat