25 datasets found
  1. h

    personachat_truecased

    • huggingface.co
    • opendatalab.com
    Updated Sep 27, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bavard AI, Inc. (2021). personachat_truecased [Dataset]. https://huggingface.co/datasets/bavard/personachat_truecased
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 27, 2021
    Dataset authored and provided by
    Bavard AI, Inc.
    Description

    A version of the PersonaChat dataset that has been true-cased, and also has been given more normalized punctuation. The original PersonaChat dataset is in all lower case, and has extra space around each clause/sentence separating punctuation mark. This version of the dataset has more of a natural language look, with sentence capitalization, proper noun capitalization, and normalized whitespace. Also, each dialogue turn includes a pool of distractor candidate responses, which can be used by a multiple choice regularization loss during training.

  2. t

    PersonaChat - Dataset - LDM

    • service.tib.eu
    Updated Nov 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). PersonaChat - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/personachat
    Explore at:
    Dataset updated
    Nov 25, 2024
    Description

    Persona-Chat is sourced from authentic conversations between human annotators who are randomly matched and assigned persona information.

  3. h

    persona-chat

    • huggingface.co
    Updated Apr 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aleksey Korshuk (2023). persona-chat [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/persona-chat
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 25, 2023
    Authors
    Aleksey Korshuk
    Description

    AlekseyKorshuk/persona-chat dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. Synthetic-Persona-Chat

    • huggingface.co
    Updated Dec 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google (2023). Synthetic-Persona-Chat [Dataset]. https://huggingface.co/datasets/google/Synthetic-Persona-Chat
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 20, 2023
    Dataset authored and provided by
    Googlehttp://google.com/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for SPC: Synthetic-Persona-Chat Dataset

    Abstract from the paper introducing this dataset:

    High-quality conversational datasets are essential for developing AI models that can communicate with users. One way to foster deeper interactions between a chatbot and its user is through personas, aspects of the user's character that provide insights into their personality, motivations, and behaviors. Training Natural Language Processing (NLP) models on a diverse and… See the full description on the dataset page: https://huggingface.co/datasets/google/Synthetic-Persona-Chat.

  5. t

    PersonaChat dataset - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). PersonaChat dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/personachat-dataset
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The PersonaChat dataset is a large persona-conditioned chit-chat style dialogue dataset.

  6. h

    persona-chat

    • huggingface.co
    Updated Dec 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cynaptics Club, IIT Indore (2024). persona-chat [Dataset]. https://huggingface.co/datasets/Cynaptics/persona-chat
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 26, 2024
    Dataset authored and provided by
    Cynaptics Club, IIT Indore
    Description

    Dataset Description

    This persona chat dataset consists of 20,000 conversations. This dataset is crafted to enhance personalized conversational text generation models that consistently reflect a character's persona in the generated response across many conversation turns. Each dialogue in the dataset is structured to reflect a back-and-forth exchange between two personas, offering a window into how individual characteristics, backgrounds, and personal narratives can influence… See the full description on the dataset page: https://huggingface.co/datasets/Cynaptics/persona-chat.

  7. Facebook AI - PersonaChat (8784 examples)

    • kaggle.com
    Updated Mar 19, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Atharv Jairath (2022). Facebook AI - PersonaChat (8784 examples) [Dataset]. https://www.kaggle.com/datasets/atharvjairath/personachat/versions/2
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 19, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Atharv Jairath
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Personalizing Dialogue Agents: I have a dog, do you have pets too?

    Paper

    Content

    A chit-chat dataset where paired Turkers are given assigned personas and chat to try to get to know each other.

    Abstract

    Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.

    Acknowledgements

  8. h

    persona-chat

    • huggingface.co
    Updated Jul 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Awsaf (2025). persona-chat [Dataset]. https://huggingface.co/datasets/awsaf49/persona-chat
    Explore at:
    Dataset updated
    Jul 3, 2025
    Authors
    Awsaf
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for PersonaChat

      Dataset Description
    

    PersonaChat is a multi-turn dialogue dataset introduced by Zhang et al. (2018) for training and evaluating persona-grounded conversational agents. Each conversation is between two crowdworkers, each assigned a randomly selected persona consisting of several simple facts. The dataset aims to assess whether models can maintain consistent character traits throughout a conversation.

    Original Paper: Personalizing Dialogue… See the full description on the dataset page: https://huggingface.co/datasets/awsaf49/persona-chat.

  9. t

    USR-PersonaChat - Dataset - LDM

    • service.tib.eu
    Updated Jan 2, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). USR-PersonaChat - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/usr-personachat
    Explore at:
    Dataset updated
    Jan 2, 2025
    Description

    This dataset is used for dialogue response evaluation.

  10. O

    PERSONA-CHAT

    • opendatalab.com
    zip
    Updated Sep 22, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Facebook (2022). PERSONA-CHAT [Dataset]. https://opendatalab.com/OpenDataLab/PERSONA-CHAT
    Explore at:
    zip(247211 bytes)Available download formats
    Dataset updated
    Sep 22, 2022
    Dataset provided by
    Facebook
    Montreal Institute for Learning Algorithms
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present the PERSONA-CHAT dataset, a new dialogue dataset consisting of 162,064 utterances between crowdworkers who were randomly paired and each asked to act the part of a given provided persona (randomly assigned, and created by another set of crowdworkers). The paired workers were asked to chat naturally and to get to know each other during the conversation. This produces interesting and engaging conversations that our agents can try to learn to mimic.

  11. h

    personachat

    • huggingface.co
    Updated Nov 22, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anas Saleh Mousa (2024). personachat [Dataset]. https://huggingface.co/datasets/anassaleh218/personachat
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 22, 2024
    Authors
    Anas Saleh Mousa
    Description

    anassaleh218/personachat dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. h

    personachat-safe

    • huggingface.co
    Updated Jul 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Human Language Technology Lab @ NUS (2024). personachat-safe [Dataset]. https://huggingface.co/datasets/hlt-lab/personachat-safe
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 26, 2024
    Dataset authored and provided by
    Human Language Technology Lab @ NUS
    Description

    Dataset Card for "personachat_safe"

    More Information needed

  13. h

    korean-persona-chat-v1

    • huggingface.co
    Updated Feb 7, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SeongUk Moon (2025). korean-persona-chat-v1 [Dataset]. https://huggingface.co/datasets/ANTEGRAL/korean-persona-chat-v1
    Explore at:
    Dataset updated
    Feb 7, 2025
    Authors
    SeongUk Moon
    Description

    ANTEGRAL/korean-persona-chat-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. O

    ConvAI2 (Conversational Intelligence Challenge 2)

    • opendatalab.com
    zip
    Updated Apr 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Facebook AI Research (2023). ConvAI2 (Conversational Intelligence Challenge 2) [Dataset]. https://opendatalab.com/OpenDataLab/ConvAI2
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 1, 2023
    Dataset provided by
    McGill University
    Carnegie Mellon University
    Facebook AI Research
    Moscow Institute of Physics and Technology
    University of Montreal
    Microsoft Research
    Description

    The ConvAI2 NeurIPS competition aimed at finding approaches to creating high-quality dialogue agents capable of meaningful open domain conversation. The ConvAI2 dataset for training models is based on the PERSONA-CHAT dataset. The speaker pairs each have assigned profiles coming from a set of 1155 possible personas (at training time), each consisting of at least 5 profile sentences, setting aside 100 never seen before personas for validation. As the original PERSONA-CHAT test set was released, a new hidden test set consisted of 100 new personas and over 1,015 dialogs was created by crowdsourced workers. To avoid modeling that takes advantage of trivial word overlap, additional rewritten sets of the same train and test personas were crowdsourced, with related sentences that are rephrases, generalizations or specializations, rendering the task much more challenging. For example “I just got my nails done” is revised as “I love to pamper myself on a regular basis” and “I am on a diet now” is revised as “I need to lose weight.” The training, validation and hidden test sets consists of 17,878, 1,000 and 1,015 dialogues, respectively.

  15. PMPC (Persona Match on Persona-Chat)

    • opendatalab.com
    zip
    Updated Sep 22, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    iFlytek Research (2022). PMPC (Persona Match on Persona-Chat) [Dataset]. https://opendatalab.com/OpenDataLab/PMPC
    Explore at:
    zip(141185672 bytes)Available download formats
    Dataset updated
    Sep 22, 2022
    Dataset provided by
    科大讯飞http://www.iflytek.com/
    Queen’s University
    University of Science and Technology of China
    Microsoft Research Asia
    Description

    PMPC (Persona Match on Persona-Chat) is a dataset for Speaker Persona Detection (SPD) which aims to detect speaker personas based on the plain conversational text.

  16. h

    Synthetic-Persona-Chat-Reversal-Role

    • huggingface.co
    Updated Mar 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Taiwan Llama (2025). Synthetic-Persona-Chat-Reversal-Role [Dataset]. https://huggingface.co/datasets/tw-llama/Synthetic-Persona-Chat-Reversal-Role
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 4, 2025
    Dataset authored and provided by
    Taiwan Llama
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    使用前置需求

    Python 3.x CSV 文件,必須包含以下欄位: user 1 personas user 2 personas Best Generated Conversation

      如何使用
    

    準備 CSV 文件請確認 CSV 文件中包含上述三個欄位,並將 CSV 文件命名為 input.csv(或根據實際情況修改腳本中的檔案名稱)。

    運行腳本在命令列執行: python extract_conversations.py

    執行後會生成一個 output.json 文件,內含轉換後的 JSON 數據。

      如何更換角色映射
    

    預設情況下,腳本將對話中:

    User 1 的訊息映射為 gpt User 2 的訊息映射為 human

    若你需要更換角色,例如將 User 1 映射成 human、User 2 映射成 gpt,請按照以下步驟修改腳本中對應的部分:

    找到以下程式碼片段(位於每組對話配對邏輯中):if first[0] == "1" and second[0] == "2":… See the full description on the dataset page: https://huggingface.co/datasets/tw-llama/Synthetic-Persona-Chat-Reversal-Role.

  17. t

    ConvAI2 Dataset - Dataset - LDM

    • service.tib.eu
    Updated Nov 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). ConvAI2 Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/convai2-dataset
    Explore at:
    Dataset updated
    Nov 25, 2024
    Description

    The ConvAI2 dataset, derived from Persona-Chat, contains dialogues between crowdworkers who role-play as assigned personas, enabling the development of conversational agents that can mimic engaging interactions.

  18. a

    Open-Dialogue

    • aifasthub.com
    • huggingface.co
    Updated Sep 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    AI Box (2025). Open-Dialogue [Dataset]. https://aifasthub.com/datasets/RUCAIBox/Open-Dialogue
    Explore at:
    Dataset updated
    Sep 5, 2025
    Dataset authored and provided by
    AI Box
    Description

    This is the open dialogue datasets collected by TextBox, including:

    PersonaChat (pc) DailyDialog (dd) DSTC7-AVSD (da) SGD (sgd) Topical-Chat (tc) Wizard of Wikipedia (wow) Movie Dialog (md) Cleaned OpenSubtitles Dialogs (cos) Empathetic Dialogues (ed) Curiosity (curio) CMU Document Grounded Conversations (cmudog) MuTual (mutual) OpenDialKG (odkg) DREAM (dream).

    The detail and leaderboard of each dataset can be found in TextBox page.

  19. h

    korean-persona-chat-dataset

    • huggingface.co
    Updated Apr 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sun Donghae (2024). korean-persona-chat-dataset [Dataset]. https://huggingface.co/datasets/NLPBada/korean-persona-chat-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 8, 2024
    Authors
    Sun Donghae
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    채팅-페르소나 쌍 데이터셋

    위 데이터는 AI Hub의 한국어 멀티세션 대화 데이터 셋을

    한국어 어체 변환 모델 korean-style-converter-6b을 이용해 존댓말에서 반말로 변환 후

    Session1-2로 이루어진 데이터셋에서 10328개의 ( 채팅 - 페르소나 ) 쌍을 추출하여 제작하였습니다.

    추후, 정제된 버전의 데이터 셋도 공개 예정입니다.

    정제된 버전의 데이터셋이 공개되었습니다! NLPBada/korean-persona-chat-dataset-v2

  20. h

    real-persona-chat

    • huggingface.co
    Updated May 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Team JINIAC (2024). real-persona-chat [Dataset]. https://huggingface.co/datasets/JINIAC/real-persona-chat
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 27, 2024
    Dataset authored and provided by
    Team JINIAC
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    以下のデータセットから、dialogue_idとutterances、話者情報(ペルソナ)を抽出し、ロールプレイを想定した形式に変更して作成しました。https://github.com/nu-dialogue/real-persona-chat

      文献
    

    @inproceedings{yamashita-etal-2023-realpersonachat, title = "{R}eal{P}ersona{C}hat: A Realistic Persona Chat Corpus with Interlocutors{'} Own Personalities", author = "Yamashita, Sanae and Inoue, Koji and Guo, Ao and Mochizuki, Shota and Kawahara, Tatsuya and Higashinaka, Ryuichiro", booktitle = "Proceedings of… See the full description on the dataset page: https://huggingface.co/datasets/JINIAC/real-persona-chat.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Bavard AI, Inc. (2021). personachat_truecased [Dataset]. https://huggingface.co/datasets/bavard/personachat_truecased

personachat_truecased

bavard/personachat_truecased

Explore at:
3 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 27, 2021
Dataset authored and provided by
Bavard AI, Inc.
Description

A version of the PersonaChat dataset that has been true-cased, and also has been given more normalized punctuation. The original PersonaChat dataset is in all lower case, and has extra space around each clause/sentence separating punctuation mark. This version of the dataset has more of a natural language look, with sentence capitalization, proper noun capitalization, and normalized whitespace. Also, each dialogue turn includes a pool of distractor candidate responses, which can be used by a multiple choice regularization loss during training.

Search
Clear search
Close search
Google apps
Main menu