11 datasets found
  1. h

    PersonaHub_modified

    • huggingface.co
    Updated Jun 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yusheng Su (2024). PersonaHub_modified [Dataset]. https://huggingface.co/datasets/yushengsu/PersonaHub_modified
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 29, 2024
    Authors
    Yusheng Su
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description
  2. h

    CustomerPersonas

    • huggingface.co
    Updated Aug 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Liran Baba (2024). CustomerPersonas [Dataset]. https://huggingface.co/datasets/CordwainerSmith/CustomerPersonas
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 6, 2024
    Authors
    Liran Baba
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Synthetic Customer Experience Persona

      Overview
    

    The Synthetic Customer Experience Persona Dataset is a large-scale synthetic corpus of customer service personas, designed to aid in the development and evaluation of AI models for customer service applications. Inspired by Tencent AI Labs' Persona Hub, this dataset provides a diverse array of customer profiles across multiple industries.

      Dataset Statistics
    

    Total Personas: 250,000 Industries Covered: 6 (Retail… See the full description on the dataset page: https://huggingface.co/datasets/CordwainerSmith/CustomerPersonas.

  3. h

    persona-response-hub-10k-fratbro

    • huggingface.co
    Updated Nov 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lee Harrold (2025). persona-response-hub-10k-fratbro [Dataset]. https://huggingface.co/datasets/LeeHarrold/persona-response-hub-10k-fratbro
    Explore at:
    Dataset updated
    Nov 24, 2025
    Authors
    Lee Harrold
    Description

    LeeHarrold/persona-response-hub-10k-fratbro dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. a

    ArcGIS Hub - Personas

    • keep-your-city-clean-hubclub.hub.arcgis.com
    Updated Nov 25, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hub Club: ArcGIS Hub Demo Organization (2021). ArcGIS Hub - Personas [Dataset]. https://keep-your-city-clean-hubclub.hub.arcgis.com/datasets/arcgis-hub-personas
    Explore at:
    Dataset updated
    Nov 25, 2021
    Dataset authored and provided by
    Hub Club: ArcGIS Hub Demo Organization
    Description

    Started as a open data platform for governments to make their data accessible to communities - first step of engagementToday, Hub has evolved into a collaboration and engagement platform that is used by organizations across the globe to address critical issues in the community and world at largeAs the product evolved, adoption evolved as well. Today, Hub implementation are wide-ranged- Government, NGOs, Infrastructure, Education

  5. h

    TextBooksPersonaHub

    • huggingface.co
    Updated Jul 21, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nacer (2024). TextBooksPersonaHub [Dataset]. http://doi.org/10.57967/hf/2751
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 21, 2024
    Authors
    nacer
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    TextBooksPersonaHub

      Overview
    

    The TextBooksPersonaHub dataset is an extension of the proj-persona/PersonaHub dataset, created using the technique described in the paper Textbooks Are All You Need II. This dataset contains synthetically generated "textbook-like" passages tailored in french to specific personas, aimed at enhancing language model training with high-quality and diverse content.

      Dataset Creation
    
    
    
    
    
    
    
      Source Data
    

    The original personas… See the full description on the dataset page: https://huggingface.co/datasets/drodin/TextBooksPersonaHub.

  6. h

    persona-response-hub-10k

    • huggingface.co
    Updated Nov 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lee Harrold (2025). persona-response-hub-10k [Dataset]. https://huggingface.co/datasets/LeeHarrold/persona-response-hub-10k
    Explore at:
    Dataset updated
    Nov 23, 2025
    Authors
    Lee Harrold
    Description

    Dataset Card for persona-response-hub-10k

    This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into your Argilla server as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.

      Using this dataset with Argilla
    

    To load with Argilla, you'll just need to install Argilla as pip install argilla --upgrade and then use the following code: import argilla as rg

    ds =… See the full description on the dataset page: https://huggingface.co/datasets/LeeHarrold/persona-response-hub-10k.

  7. e

    Persona Image Managers

    • migrating2arcgispro.eagle.co.nz
    Updated Sep 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Eagle Technology Group Ltd (2023). Persona Image Managers [Dataset]. https://migrating2arcgispro.eagle.co.nz/documents/da6c05c8b1cc4ac78a01e524d89465c1
    Explore at:
    Dataset updated
    Sep 6, 2023
    Dataset authored and provided by
    Eagle Technology Group Ltd
    Description

    Image for use one the Migrating to ArcGIS Pro Hub page for the Managers Persona.

  8. Data from: AstroChat

    • kaggle.com
    • huggingface.co
    zip
    Updated Jun 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    astro_pat (2024). AstroChat [Dataset]. https://www.kaggle.com/datasets/patrickfleith/astrochat
    Explore at:
    zip(1214166 bytes)Available download formats
    Dataset updated
    Jun 9, 2024
    Authors
    astro_pat
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose and Scope

    The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.

    Intended Use

    The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).

    Quickstart

    To be completed

    DATASET DESCRIPTION

    Access

    Structure

    901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column): - id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets. - topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split. - subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc. - persona: description of the persona used to simulate a user - opening_question: the first question asked by the user to start a conversation with the AI-assistant - messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields: - role: the role of the speaker, either user or assistant - content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.

    Important See the full list of topics and subtopics covered below.

    Metadata

    Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main

    Generation Method

    We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:

    Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.

    Step-by-step description

    • Defined a set of user persona
    • Defined a set of topics/ disciplines within the domain of Astronautics / Space Mission Engineering
    • For each topics, we defined a set of subtopics to narrow down the conversation to more specific and niche conversations (see below the full list)
    • For each subtopic we generate a set of opening questions that the user could ask to start a conversation (see below the full list)
    • We then distil the knowledge of an strong Chat Model (in our case ChatGPT through then api with gpt-4-turbo model) to generate the answers to the opening questions
    • We simulate follow-up questions from the user to the assistant, and the assistant's answers to these questions which builds up the messages.

    Future work and contributions appreciated

    • Distil knowledge from more models (Anthropic, Mixtral, GPT-4o, etc...)
    • Implement more creativity in the opening questions and follow-up questions
    • Filter-out questions and conversations which are too similar
    • Ask topic and subtopic expert to validate the generated conversations to have a sense on how reliable is the overall dataset

    Languages

    All instances in the dataset are in english

    Size

    901 synthetically-generated dialogue

    USAGE AND GUIDELINES

    License

    AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International

    Restrictions

    No restriction. Please provide the correct attribution following the license terms.

    Citation

    Patrick Fleith. (2024). AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11531579

    Update Frequency

    Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)

    Have a feedback or spot an error?

    Use the ...

  9. d

    Personal well-being - Datasets - Data North Yorkshire

    • hub.datanorthyorkshire.org
    Updated Oct 4, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2016). Personal well-being - Datasets - Data North Yorkshire [Dataset]. https://hub.datanorthyorkshire.org/dataset/personal-well-being-estimates
    Explore at:
    Dataset updated
    Oct 4, 2016
    Area covered
    North Yorkshire, Yorkshire
    Description

    Estimates of personal well-being from the Annual Population Survey (APS) at district level on life satisfaction, feeling worthwhile, feeling happy and feeling anxious.

  10. Blended Skill Talk

    • kaggle.com
    zip
    Updated Nov 26, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Blended Skill Talk [Dataset]. https://www.kaggle.com/datasets/thedevastator/multi-modal-conversation-data/code
    Explore at:
    zip(38458951 bytes)Available download formats
    Dataset updated
    Nov 26, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Blended Skill Talk

    Personality, Empathy, and Knowledge

    By Huggingface Hub [source]

    About this dataset

    This dataset contains conversations between two personas with additional context, previous utterances, free messages, guided messages, suggestions and guided chosen suggestions, allowing for the creation of natural multi-modal conversations with personality, empathy and knowledge. The conversations are designed to measure a full range of technical competencies such as dialog flow management (including response times), topic control and coherence of conversation. It also provides a basis for exploring the impact of different conversational styles on user engagement. Additionally, the task is useful in validating distributed dialogue systems across various modalities while revealing potential biases present in different contexts. Finally, it enables benchmarking against data sets in similar areas towards development of an automatic evaluation system to effectively grade tactical skill talk performance over time

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset contains conversations between two personas, providing detailed insight into the conversation flow and the use of multi-modal communication tools. The data provides information about each particular conversational experience that can be used to develop a better understanding of how people interact.

    Within this dataset there are several columns that can be used to investigate the conversations, along with the context in which it is occurring. Personas are included as an indicator for who is engaging in conversation, additional context allows a framework for starting off conversations and accumulating meaningful results from each segment of dialogue. The previous utterance column captures recent history so any relevant reference points can be utilized during discourse. Free messages allow anyone partaking in the exchange to spontaneously add content and suggest topics that may have been otherwise overlooked. Guided messages help guide discussions while still allowing for creative dialogue choices with regards to how delivery will vary within earlier frames mentioned previously such as context or previous utterances. Suggestions serve a dual purpose offering approval or suggestions when necessary but subtlety implying how certain actions should or could be taken when engaging digital avatars; while guided chosen suggestion inserts weights User Neutrality vs sentiment engagement depending on what course might suit either users objectives best, minimizing bias and instead advocating fairness over personal preference towards either individual party's specific stance on disputed topics~

    Research Ideas

    • Building virtual assistants with conversational and multi-modal capabilities.
    • Creating interactive tutorials using a combination of personas, free messages, guided messages and suggestions for content that constantly adapts to user input

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: validation.csv | Column name | Description | |:-----------------------|:-------------------------------------------------------------------------------------------------------| | personas | This column contains the two personas involved in the conversation. (String) | | additional_context | This column contains additional context information that may be relevant to the conversation. (String) | | previous_utterance | This column contains the previous utterance from the conversation. (String) | | free_messages | This column contains free messages that can be used to create dynamic conversations. (String) | | suggestions | This column contains suggested messages that can be used to create dynamic conversations. (String) |

    File: train.csv | Column name | Description | |:--...

  11. h

    PersonaHub-ko

    • huggingface.co
    Updated Jul 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    유준혁 (2025). PersonaHub-ko [Dataset]. https://huggingface.co/datasets/youjunhyeok/PersonaHub-ko
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 15, 2025
    Authors
    유준혁
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Translated proj-persona/PersonaHub using nayohan/llama3-instrucTrans-enko-8b. For this dataset, we only used data that is 5000 characters or less in length and has language of English. Thanks for @proj-persona and @nayohan.

      Scaling Synthetic Data Creation with 1,000,000,000 Personas
    

    This repo releases data introduced in our paper Scaling Synthetic Data Creation with 1,000,000,000 Personas: We propose a novel persona-driven data synthesis methodology that leverages various… See the full description on the dataset page: https://huggingface.co/datasets/youjunhyeok/PersonaHub-ko.

  12. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Yusheng Su (2024). PersonaHub_modified [Dataset]. https://huggingface.co/datasets/yushengsu/PersonaHub_modified

PersonaHub_modified

yushengsu/PersonaHub_modified

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 29, 2024
Authors
Yusheng Su
License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description
Search
Clear search
Close search
Google apps
Main menu