11 datasets found

h
PersonaHub_modified
huggingface.co
Updated Jun 29, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yusheng Su (2024). PersonaHub_modified [Dataset]. https://huggingface.co/datasets/yushengsu/PersonaHub_modified
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 29, 2024
Authors
Yusheng Su
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
GPT-4 Generated Data Reference:

(Original) https://huggingface.co/datasets/proj-persona/PersonaHub (Original Github) https://github.com/tencent-ailab/persona-hub
h
CustomerPersonas
huggingface.co
Updated Aug 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Liran Baba (2024). CustomerPersonas [Dataset]. https://huggingface.co/datasets/CordwainerSmith/CustomerPersonas
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 6, 2024
Authors
Liran Baba
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Synthetic Customer Experience Persona

Overview

The Synthetic Customer Experience Persona Dataset is a large-scale synthetic corpus of customer service personas, designed to aid in the development and evaluation of AI models for customer service applications. Inspired by Tencent AI Labs' Persona Hub, this dataset provides a diverse array of customer profiles across multiple industries.

Dataset Statistics

Total Personas: 250,000 Industries Covered: 6 (Retail… See the full description on the dataset page: https://huggingface.co/datasets/CordwainerSmith/CustomerPersonas.
h
persona-response-hub-10k-fratbro
huggingface.co
Updated Nov 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lee Harrold (2025). persona-response-hub-10k-fratbro [Dataset]. https://huggingface.co/datasets/LeeHarrold/persona-response-hub-10k-fratbro
Explore at:
Dataset updated
Nov 24, 2025
Authors
Lee Harrold
Description
LeeHarrold/persona-response-hub-10k-fratbro dataset hosted on Hugging Face and contributed by the HF Datasets community
a
ArcGIS Hub - Personas
keep-your-city-clean-hubclub.hub.arcgis.com
Updated Nov 25, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hub Club: ArcGIS Hub Demo Organization (2021). ArcGIS Hub - Personas [Dataset]. https://keep-your-city-clean-hubclub.hub.arcgis.com/datasets/arcgis-hub-personas
Explore at:
Dataset updated
Nov 25, 2021
Dataset authored and provided by
Hub Club: ArcGIS Hub Demo Organization
Description
Started as a open data platform for governments to make their data accessible to communities - first step of engagementToday, Hub has evolved into a collaboration and engagement platform that is used by organizations across the globe to address critical issues in the community and world at largeAs the product evolved, adoption evolved as well. Today, Hub implementation are wide-ranged- Government, NGOs, Infrastructure, Education
h
TextBooksPersonaHub
huggingface.co
Updated Jul 21, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nacer (2024). TextBooksPersonaHub [Dataset]. http://doi.org/10.57967/hf/2751
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57967/hf/2751
Dataset updated
Jul 21, 2024
Authors
nacer
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
TextBooksPersonaHub

Overview

The TextBooksPersonaHub dataset is an extension of the proj-persona/PersonaHub dataset, created using the technique described in the paper Textbooks Are All You Need II. This dataset contains synthetically generated "textbook-like" passages tailored in french to specific personas, aimed at enhancing language model training with high-quality and diverse content.

Dataset Creation Source Data

The original personas… See the full description on the dataset page: https://huggingface.co/datasets/drodin/TextBooksPersonaHub.
h
persona-response-hub-10k
huggingface.co
Updated Nov 23, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lee Harrold (2025). persona-response-hub-10k [Dataset]. https://huggingface.co/datasets/LeeHarrold/persona-response-hub-10k
Explore at:
Dataset updated
Nov 23, 2025
Authors
Lee Harrold
Description
Dataset Card for persona-response-hub-10k

This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into your Argilla server as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.

Using this dataset with Argilla

To load with Argilla, you'll just need to install Argilla as pip install argilla --upgrade and then use the following code: import argilla as rg

ds =… See the full description on the dataset page: https://huggingface.co/datasets/LeeHarrold/persona-response-hub-10k.
e
Persona Image Managers
migrating2arcgispro.eagle.co.nz
Updated Sep 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eagle Technology Group Ltd (2023). Persona Image Managers [Dataset]. https://migrating2arcgispro.eagle.co.nz/documents/da6c05c8b1cc4ac78a01e524d89465c1
Explore at:
Dataset updated
Sep 6, 2023
Dataset authored and provided by
Eagle Technology Group Ltd
Description
Image for use one the Migrating to ArcGIS Pro Hub page for the Managers Persona.
Data from: AstroChat
kaggle.com
huggingface.co
zip
Updated Jun 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
astro_pat (2024). AstroChat [Dataset]. https://www.kaggle.com/datasets/patrickfleith/astrochat
Explore at:
zip(1214166 bytes)Available download formats
Dataset updated
Jun 9, 2024
Authors
astro_pat
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose and Scope

The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.

Intended Use

The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).

Quickstart

To be completed

DATASET DESCRIPTION

Access

Manual download from Hugging face hub: https://huggingface.co/datasets/patrickfleith/AstroChat

Or with python: python from datasets import load_dataset dataset = load_dataset("patrickfleith/AstroChat")

Structure

901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column): - id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets. - topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split. - subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc. - persona: description of the persona used to simulate a user - opening_question: the first question asked by the user to start a conversation with the AI-assistant - messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields: - role: the role of the speaker, either user or assistant - content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.

Important See the full list of topics and subtopics covered below.

Metadata

Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main

Generation Method

We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:

Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.

Step-by-step description

Defined a set of user persona

Defined a set of topics/ disciplines within the domain of Astronautics / Space Mission Engineering

For each topics, we defined a set of subtopics to narrow down the conversation to more specific and niche conversations (see below the full list)

For each subtopic we generate a set of opening questions that the user could ask to start a conversation (see below the full list)

We then distil the knowledge of an strong Chat Model (in our case ChatGPT through then api with gpt-4-turbo model) to generate the answers to the opening questions

We simulate follow-up questions from the user to the assistant, and the assistant's answers to these questions which builds up the messages.

Future work and contributions appreciated

Distil knowledge from more models (Anthropic, Mixtral, GPT-4o, etc...)

Implement more creativity in the opening questions and follow-up questions

Filter-out questions and conversations which are too similar

Ask topic and subtopic expert to validate the generated conversations to have a sense on how reliable is the overall dataset

Languages

All instances in the dataset are in english

Size

901 synthetically-generated dialogue

USAGE AND GUIDELINES

License

AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International

Restrictions

No restriction. Please provide the correct attribution following the license terms.

Citation

Patrick Fleith. (2024). AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11531579

Update Frequency

Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)

Have a feedback or spot an error?

Use the ...
d
Personal well-being - Datasets - Data North Yorkshire
hub.datanorthyorkshire.org
Updated Oct 4, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2016). Personal well-being - Datasets - Data North Yorkshire [Dataset]. https://hub.datanorthyorkshire.org/dataset/personal-well-being-estimates
Explore at:
Dataset updated
Oct 4, 2016
Area covered
North Yorkshire, Yorkshire
Description
Estimates of personal well-being from the Annual Population Survey (APS) at district level on life satisfaction, feeling worthwhile, feeling happy and feeling anxious.
Blended Skill Talk
kaggle.com
zip
Updated Nov 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Blended Skill Talk [Dataset]. https://www.kaggle.com/datasets/thedevastator/multi-modal-conversation-data/code
Explore at:
zip(38458951 bytes)Available download formats
Dataset updated
Nov 26, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Blended Skill Talk

Personality, Empathy, and Knowledge

By Huggingface Hub [source]

About this dataset

This dataset contains conversations between two personas with additional context, previous utterances, free messages, guided messages, suggestions and guided chosen suggestions, allowing for the creation of natural multi-modal conversations with personality, empathy and knowledge. The conversations are designed to measure a full range of technical competencies such as dialog flow management (including response times), topic control and coherence of conversation. It also provides a basis for exploring the impact of different conversational styles on user engagement. Additionally, the task is useful in validating distributed dialogue systems across various modalities while revealing potential biases present in different contexts. Finally, it enables benchmarking against data sets in similar areas towards development of an automatic evaluation system to effectively grade tactical skill talk performance over time

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset contains conversations between two personas, providing detailed insight into the conversation flow and the use of multi-modal communication tools. The data provides information about each particular conversational experience that can be used to develop a better understanding of how people interact.

Within this dataset there are several columns that can be used to investigate the conversations, along with the context in which it is occurring. Personas are included as an indicator for who is engaging in conversation, additional context allows a framework for starting off conversations and accumulating meaningful results from each segment of dialogue. The previous utterance column captures recent history so any relevant reference points can be utilized during discourse. Free messages allow anyone partaking in the exchange to spontaneously add content and suggest topics that may have been otherwise overlooked. Guided messages help guide discussions while still allowing for creative dialogue choices with regards to how delivery will vary within earlier frames mentioned previously such as context or previous utterances. Suggestions serve a dual purpose offering approval or suggestions when necessary but subtlety implying how certain actions should or could be taken when engaging digital avatars; while guided chosen suggestion inserts weights User Neutrality vs sentiment engagement depending on what course might suit either users objectives best, minimizing bias and instead advocating fairness over personal preference towards either individual party's specific stance on disputed topics~

Research Ideas

Building virtual assistants with conversational and multi-modal capabilities.

Creating interactive tutorials using a combination of personas, free messages, guided messages and suggestions for content that constantly adapts to user input

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: validation.csv | Column name | Description | |:-----------------------|:-------------------------------------------------------------------------------------------------------| | personas | This column contains the two personas involved in the conversation. (String) | | additional_context | This column contains additional context information that may be relevant to the conversation. (String) | | previous_utterance | This column contains the previous utterance from the conversation. (String) | | free_messages | This column contains free messages that can be used to create dynamic conversations. (String) | | suggestions | This column contains suggested messages that can be used to create dynamic conversations. (String) |

File: train.csv | Column name | Description | |:--...
h
PersonaHub-ko
huggingface.co
Updated Jul 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
유준혁 (2025). PersonaHub-ko [Dataset]. https://huggingface.co/datasets/youjunhyeok/PersonaHub-ko
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 15, 2025
Authors
유준혁
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Translated proj-persona/PersonaHub using nayohan/llama3-instrucTrans-enko-8b. For this dataset, we only used data that is 5000 characters or less in length and has language of English. Thanks for @proj-persona and @nayohan.

Scaling Synthetic Data Creation with 1,000,000,000 Personas

This repo releases data introduced in our paper Scaling Synthetic Data Creation with 1,000,000,000 Personas: We propose a novel persona-driven data synthesis methodology that leverages various… See the full description on the dataset page: https://huggingface.co/datasets/youjunhyeok/PersonaHub-ko.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Yusheng Su (2024). PersonaHub_modified [Dataset]. https://huggingface.co/datasets/yushengsu/PersonaHub_modified

PersonaHub_modified

yushengsu/PersonaHub_modified

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jun 29, 2024

Authors

Yusheng Su

License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

GPT-4 Generated Data Reference:

(Original) https://huggingface.co/datasets/proj-persona/PersonaHub (Original Github) https://github.com/tencent-ailab/persona-hub

Clear search

Close search

Google apps

Main menu

PersonaHub_modified

CustomerPersonas

persona-response-hub-10k-fratbro

ArcGIS Hub - Personas

TextBooksPersonaHub

persona-response-hub-10k

Persona Image Managers

Data from: AstroChat

Purpose and Scope

Intended Use

Quickstart

DATASET DESCRIPTION

Access

Structure

Metadata

Generation Method

Step-by-step description

Future work and contributions appreciated

Languages

Size

USAGE AND GUIDELINES

License

Restrictions

Citation

Update Frequency

Have a feedback or spot an error?

Personal well-being - Datasets - Data North Yorkshire

Blended Skill Talk

Blended Skill Talk

Personality, Empathy, and Knowledge

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

PersonaHub-ko

PersonaHub_modified

yushengsu/PersonaHub_modified