25 datasets found

personachat_truecased
huggingface.co
opendatalab.com
Updated Sep 27, 2021
Cite
Bavard AI, Inc. (2021). personachat_truecased [Dataset]. https://huggingface.co/datasets/bavard/personachat_truecased
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 27, 2021
Dataset authored and provided by
Bavard AI, Inc.
Description
A version of the PersonaChat dataset that has been true-cased and given normalized punctuation. The original PersonaChat dataset is all lower case and has extra whitespace around each clause- or sentence-separating punctuation mark. This version reads more like natural language, with sentence capitalization, proper-noun capitalization, and normalized whitespace. In addition, each dialogue turn includes a pool of distractor candidate responses, which can be used for a multiple-choice regularization loss during training.
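To illustrate the kind of clean-up this description refers to, here is a minimal Python sketch. It is not Bavard's actual pipeline, which the description does not show: the function names and regular expressions are assumptions for illustration only, and the sketch covers just whitespace normalization and sentence capitalization (proper-noun capitalization would need a real true-casing model).

```python
import re


def normalize_spacing(text: str) -> str:
    """Remove extra whitespace around punctuation (hypothetical helper)."""
    # Drop whitespace that precedes a clause/sentence punctuation mark.
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)
    # Collapse any remaining runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def capitalize_sentences(text: str) -> str:
    """Capitalize sentence starts and the pronoun 'i' (hypothetical helper)."""
    # Upper-case a lowercase letter at the start of the text or after . ! ?
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    # Upper-case the standalone pronoun "i" (also covers "i'm", "i've").
    return re.sub(r"\bi\b", "I", text)


def truecase_sketch(text: str) -> str:
    """Apply both steps; proper nouns are deliberately left untouched."""
    return capitalize_sentences(normalize_spacing(text))
```

For example, `truecase_sketch("i have a dog . do you have pets too ?")` yields a sentence-cased string with no space before the punctuation marks.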
PersonaChat - Dataset - LDM
service.tib.eu
Updated Nov 25, 2024
Cite
(2024). PersonaChat - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/personachat
Dataset updated
Nov 25, 2024
Description
Persona-Chat is sourced from authentic conversations between human annotators who are randomly matched and assigned persona information.
persona-chat
huggingface.co
Updated Apr 25, 2023
Cite
Aleksey Korshuk (2023). persona-chat [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/persona-chat
Dataset updated
Apr 25, 2023
Authors
Aleksey Korshuk
Description
AlekseyKorshuk/persona-chat dataset hosted on Hugging Face and contributed by the HF Datasets community
Synthetic-Persona-Chat
huggingface.co
Updated Dec 20, 2023
Cite
Google (2023). Synthetic-Persona-Chat [Dataset]. https://huggingface.co/datasets/google/Synthetic-Persona-Chat
Dataset updated
Dec 20, 2023
Dataset authored and provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for SPC: Synthetic-Persona-Chat Dataset

Abstract from the paper introducing this dataset:

High-quality conversational datasets are essential for developing AI models that can communicate with users. One way to foster deeper interactions between a chatbot and its user is through personas, aspects of the user's character that provide insights into their personality, motivations, and behaviors. Training Natural Language Processing (NLP) models on a diverse and… See the full description on the dataset page: https://huggingface.co/datasets/google/Synthetic-Persona-Chat.
PersonaChat dataset - Dataset - LDM
service.tib.eu
Updated Dec 16, 2024
Cite
(2024). PersonaChat dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/personachat-dataset
Dataset updated
Dec 16, 2024
Description
The PersonaChat dataset is a large persona-conditioned chit-chat style dialogue dataset.
persona-chat
huggingface.co
Updated Dec 26, 2024
Cite
Cynaptics Club, IIT Indore (2024). persona-chat [Dataset]. https://huggingface.co/datasets/Cynaptics/persona-chat
Dataset updated
Dec 26, 2024
Dataset authored and provided by
Cynaptics Club, IIT Indore
Description
Dataset Description

This persona chat dataset consists of 20,000 conversations. This dataset is crafted to enhance personalized conversational text generation models that consistently reflect a character's persona in the generated response across many conversation turns. Each dialogue in the dataset is structured to reflect a back-and-forth exchange between two personas, offering a window into how individual characteristics, backgrounds, and personal narratives can influence… See the full description on the dataset page: https://huggingface.co/datasets/Cynaptics/persona-chat.
Facebook AI - PersonaChat (8784 examples)
kaggle.com
Updated Mar 19, 2022
Cite
Atharv Jairath (2022). Facebook AI - PersonaChat (8784 examples) [Dataset]. https://www.kaggle.com/datasets/atharvjairath/personachat/versions/2
Dataset updated
Mar 19, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Atharv Jairath
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Personalizing Dialogue Agents: I have a dog, do you have pets too?

Paper

Content

A chit-chat dataset where paired Turkers are given assigned personas and chat to try to get to know each other.

Abstract

Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.

Acknowledgements

Paper

Code
persona-chat
huggingface.co
Updated Jul 3, 2025
Cite
Awsaf (2025). persona-chat [Dataset]. https://huggingface.co/datasets/awsaf49/persona-chat
Dataset updated
Jul 3, 2025
Authors
Awsaf
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for PersonaChat

Dataset Description

PersonaChat is a multi-turn dialogue dataset introduced by Zhang et al. (2018) for training and evaluating persona-grounded conversational agents. Each conversation is between two crowdworkers, each assigned a randomly selected persona consisting of several simple facts. The dataset aims to assess whether models can maintain consistent character traits throughout a conversation.

Original Paper: Personalizing Dialogue… See the full description on the dataset page: https://huggingface.co/datasets/awsaf49/persona-chat.
USR-PersonaChat - Dataset - LDM
service.tib.eu
Updated Jan 2, 2025
+ more versions
Cite
(2025). USR-PersonaChat - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/usr-personachat
Dataset updated
Jan 2, 2025
Description
This dataset is used for dialogue response evaluation.
PERSONA-CHAT
opendatalab.com
zip
Updated Sep 22, 2022
Cite
Facebook (2022). PERSONA-CHAT [Dataset]. https://opendatalab.com/OpenDataLab/PERSONA-CHAT
Available download formats: zip (247211 bytes)
Dataset updated
Sep 22, 2022
Dataset provided by
Facebook
Montreal Institute for Learning Algorithms
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present the PERSONA-CHAT dataset, a new dialogue dataset consisting of 162,064 utterances between crowdworkers who were randomly paired and each asked to act the part of a given provided persona (randomly assigned, and created by another set of crowdworkers). The paired workers were asked to chat naturally and to get to know each other during the conversation. This produces interesting and engaging conversations that our agents can try to learn to mimic.
personachat
huggingface.co
Updated Nov 22, 2024
+ more versions
Cite
Anas Saleh Mousa (2024). personachat [Dataset]. https://huggingface.co/datasets/anassaleh218/personachat
Dataset updated
Nov 22, 2024
Authors
Anas Saleh Mousa
Description
anassaleh218/personachat dataset hosted on Hugging Face and contributed by the HF Datasets community
personachat-safe
huggingface.co
Updated Jul 26, 2024
Cite
Human Language Technology Lab @ NUS (2024). personachat-safe [Dataset]. https://huggingface.co/datasets/hlt-lab/personachat-safe
Dataset updated
Jul 26, 2024
Dataset authored and provided by
Human Language Technology Lab @ NUS
Description
Dataset Card for "personachat_safe"

More Information needed
korean-persona-chat-v1
huggingface.co
Updated Feb 7, 2025
+ more versions
Cite
SeongUk Moon (2025). korean-persona-chat-v1 [Dataset]. https://huggingface.co/datasets/ANTEGRAL/korean-persona-chat-v1
Dataset updated
Feb 7, 2025
Authors
SeongUk Moon
Description
ANTEGRAL/korean-persona-chat-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community
ConvAI2 (Conversational Intelligence Challenge 2)
opendatalab.com
zip
Updated Apr 1, 2023
Cite
Facebook AI Research (2023). ConvAI2 (Conversational Intelligence Challenge 2) [Dataset]. https://opendatalab.com/OpenDataLab/ConvAI2
Available download formats: zip
Dataset updated
Apr 1, 2023
Dataset provided by
McGill University
Carnegie Mellon University
Facebook AI Research
Moscow Institute of Physics and Technology
University of Montreal
Microsoft Research
Description
The ConvAI2 NeurIPS competition aimed at finding approaches to creating high-quality dialogue agents capable of meaningful open-domain conversation. The ConvAI2 dataset for training models is based on the PERSONA-CHAT dataset. The speaker pairs each have assigned profiles coming from a set of 1155 possible personas (at training time), each consisting of at least 5 profile sentences, with 100 never-seen-before personas set aside for validation. As the original PERSONA-CHAT test set was released, a new hidden test set consisting of 100 new personas and over 1,015 dialogs was created by crowdsourced workers. To avoid modeling that takes advantage of trivial word overlap, additional rewritten sets of the same train and test personas were crowdsourced, with related sentences that are rephrases, generalizations or specializations, rendering the task much more challenging. For example, “I just got my nails done” is revised as “I love to pamper myself on a regular basis” and “I am on a diet now” is revised as “I need to lose weight.” The training, validation and hidden test sets consist of 17,878, 1,000 and 1,015 dialogues, respectively.
PMPC (Persona Match on Persona-Chat)
opendatalab.com
zip
Updated Sep 22, 2022
Cite
iFlytek Research (2022). PMPC (Persona Match on Persona-Chat) [Dataset]. https://opendatalab.com/OpenDataLab/PMPC
Available download formats: zip (141185672 bytes)
Dataset updated
Sep 22, 2022
Dataset provided by
科大讯飞 (iFlytek), http://www.iflytek.com/
Queen’s University
University of Science and Technology of China
Microsoft Research Asia
Description
PMPC (Persona Match on Persona-Chat) is a dataset for Speaker Persona Detection (SPD) which aims to detect speaker personas based on the plain conversational text.
Synthetic-Persona-Chat-Reversal-Role
huggingface.co
Updated Mar 4, 2025
Cite
Taiwan Llama (2025). Synthetic-Persona-Chat-Reversal-Role [Dataset]. https://huggingface.co/datasets/tw-llama/Synthetic-Persona-Chat-Reversal-Role
Dataset updated
Mar 4, 2025
Dataset authored and provided by
Taiwan Llama
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
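Prerequisites

Python 3.x. A CSV file that must contain the following columns: user 1 personas, user 2 personas, Best Generated Conversation.

How to use

Prepare the CSV file: make sure the CSV file contains the three columns above, and name it input.csv (or change the file name in the script to match your setup).

Run the script: on the command line, run: python extract_conversations.py

Running it generates an output.json file containing the converted JSON data.

How to change the role mapping

By default, the script maps the conversation as follows:

User 1's messages are mapped to gpt; User 2's messages are mapped to human.

If you need to swap the roles, for example mapping User 1 to human and User 2 to gpt, modify the corresponding part of the script as follows:

Find the following code snippet (located in the pairing logic for each pair of dialogue turns): if first[0] == "1" and second[0] == "2":… See the full description on the dataset page: https://huggingface.co/datasets/tw-llama/Synthetic-Persona-Chat-Reversal-Role.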
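The role mapping described above can be sketched in Python as follows. This is not the dataset's extract_conversations.py, whose code is truncated in the description. Everything here is an assumption for illustration: the function name `convert_conversation`, the `ROLE_MAP` dictionary, the "User 1: ..." line format of the conversation text, and the `from`/`value` output keys.

```python
# Hypothetical default mapping, following the description:
# User 1 -> gpt, User 2 -> human.
ROLE_MAP = {"1": "gpt", "2": "human"}


def convert_conversation(conversation: str, role_map=None):
    """Turn 'User N: message' lines into a list of role-tagged turns.

    Assumes each turn sits on its own line and starts with 'User 1:' or
    'User 2:'; lines that do not match this pattern are skipped.
    """
    if role_map is None:
        role_map = ROLE_MAP
    turns = []
    for line in conversation.splitlines():
        line = line.strip()
        if not line.startswith("User ") or ":" not in line:
            continue
        speaker, message = line.split(":", 1)
        # Extract the speaker number that follows the "User " prefix.
        speaker_id = speaker[len("User "):].strip()
        if speaker_id in role_map:
            turns.append({"from": role_map[speaker_id], "value": message.strip()})
    return turns
```

Under these assumptions, swapping the roles only requires passing a different mapping, for example `convert_conversation(text, {"1": "human", "2": "gpt"})`, and the resulting list can be written out with `json.dump`.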
ConvAI2 Dataset - Dataset - LDM
service.tib.eu
Updated Nov 25, 2024
Cite
(2024). ConvAI2 Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/convai2-dataset
Dataset updated
Nov 25, 2024
Description
The ConvAI2 dataset, derived from Persona-Chat, contains dialogues between crowdworkers who role-play as assigned personas, enabling the development of conversational agents that can mimic engaging interactions.
Open-Dialogue
aifasthub.com
huggingface.co
Updated Sep 5, 2025
Cite
AI Box (2025). Open-Dialogue [Dataset]. https://aifasthub.com/datasets/RUCAIBox/Open-Dialogue
Dataset updated
Sep 5, 2025
Dataset authored and provided by
AI Box
Description
These are the open dialogue datasets collected by TextBox, including:

PersonaChat (pc)
DailyDialog (dd)
DSTC7-AVSD (da)
SGD (sgd)
Topical-Chat (tc)
Wizard of Wikipedia (wow)
Movie Dialog (md)
Cleaned OpenSubtitles Dialogs (cos)
Empathetic Dialogues (ed)
Curiosity (curio)
CMU Document Grounded Conversations (cmudog)
MuTual (mutual)
OpenDialKG (odkg)
DREAM (dream)

Details and the leaderboard for each dataset can be found on the TextBox page.
korean-persona-chat-dataset
huggingface.co
Updated Apr 8, 2024
Cite
Sun Donghae (2024). korean-persona-chat-dataset [Dataset]. https://huggingface.co/datasets/NLPBada/korean-persona-chat-dataset
Dataset updated
Apr 8, 2024
Authors
Sun Donghae
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
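Chat-persona pair dataset

This data was created from AI Hub's Korean multi-session conversation dataset. The text was converted from polite speech (jondaetmal) to casual speech (banmal) using the Korean speech-style conversion model korean-style-converter-6b, and 10,328 (chat, persona) pairs were then extracted from the portion of the dataset consisting of Sessions 1-2.

A refined version of the dataset will also be released later.

The refined version of the dataset has been released! NLPBada/korean-persona-chat-dataset-v2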
real-persona-chat
huggingface.co
Updated May 27, 2024
Cite
Team JINIAC (2024). real-persona-chat [Dataset]. https://huggingface.co/datasets/JINIAC/real-persona-chat
Dataset updated
May 27, 2024
Dataset authored and provided by
Team JINIAC
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
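Created by extracting dialogue_id, utterances, and speaker information (personas) from the following dataset and converting them into a format intended for role-play: https://github.com/nu-dialogue/real-persona-chat

References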

@inproceedings{yamashita-etal-2023-realpersonachat, title = "{R}eal{P}ersona{C}hat: A Realistic Persona Chat Corpus with Interlocutors{'} Own Personalities", author = "Yamashita, Sanae and Inoue, Koji and Guo, Ao and Mochizuki, Shota and Kawahara, Tatsuya and Higashinaka, Ryuichiro", booktitle = "Proceedings of… See the full description on the dataset page: https://huggingface.co/datasets/JINIAC/real-persona-chat.

personachat_truecased (bavard/personachat_truecased): 3 scholarly articles cite this dataset (View in Google Scholar)
