Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GPT-4 Generated Data Reference:
(Original) https://huggingface.co/datasets/proj-persona/PersonaHub (Original Github) https://github.com/tencent-ailab/persona-hub
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Synthetic Customer Experience Persona
Overview
The Synthetic Customer Experience Persona Dataset is a large-scale synthetic corpus of customer service personas, designed to aid in the development and evaluation of AI models for customer service applications. Inspired by Tencent AI Labs' Persona Hub, this dataset provides a diverse array of customer profiles across multiple industries.
Dataset Statistics
Total Personas: 250,000 Industries Covered: 6 (Retail… See the full description on the dataset page: https://huggingface.co/datasets/CordwainerSmith/CustomerPersonas.
Facebook
TwitterLeeHarrold/persona-response-hub-10k-fratbro dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
TwitterStarted as a open data platform for governments to make their data accessible to communities - first step of engagementToday, Hub has evolved into a collaboration and engagement platform that is used by organizations across the globe to address critical issues in the community and world at largeAs the product evolved, adoption evolved as well. Today, Hub implementation are wide-ranged- Government, NGOs, Infrastructure, Education
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TextBooksPersonaHub
Overview
The TextBooksPersonaHub dataset is an extension of the proj-persona/PersonaHub dataset, created using the technique described in the paper Textbooks Are All You Need II. This dataset contains synthetically generated "textbook-like" passages tailored in french to specific personas, aimed at enhancing language model training with high-quality and diverse content.
Dataset Creation
Source Data
The original personas… See the full description on the dataset page: https://huggingface.co/datasets/drodin/TextBooksPersonaHub.
Facebook
TwitterDataset Card for persona-response-hub-10k
This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into your Argilla server as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.
Using this dataset with Argilla
To load with Argilla, you'll just need to install Argilla as pip install argilla --upgrade and then use the following code: import argilla as rg
ds =… See the full description on the dataset page: https://huggingface.co/datasets/LeeHarrold/persona-response-hub-10k.
Facebook
TwitterImage for use one the Migrating to ArcGIS Pro Hub page for the Managers Persona.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.
The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).
To be completed
python
from datasets import load_dataset
dataset = load_dataset("patrickfleith/AstroChat")901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column):
- id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets.
- topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split.
- subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc.
- persona: description of the persona used to simulate a user
- opening_question: the first question asked by the user to start a conversation with the AI-assistant
- messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields:
- role: the role of the speaker, either user or assistant
- content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.
Important See the full list of topics and subtopics covered below.
Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main
We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:
Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.
gpt-4-turbo model) to generate the answers to the opening questionsAll instances in the dataset are in english
901 synthetically-generated dialogue
AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International
No restriction. Please provide the correct attribution following the license terms.
Patrick Fleith. (2024). AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11531579
Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)
Use the ...
Facebook
TwitterEstimates of personal well-being from the Annual Population Survey (APS) at district level on life satisfaction, feeling worthwhile, feeling happy and feeling anxious.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
This dataset contains conversations between two personas with additional context, previous utterances, free messages, guided messages, suggestions and guided chosen suggestions, allowing for the creation of natural multi-modal conversations with personality, empathy and knowledge. The conversations are designed to measure a full range of technical competencies such as dialog flow management (including response times), topic control and coherence of conversation. It also provides a basis for exploring the impact of different conversational styles on user engagement. Additionally, the task is useful in validating distributed dialogue systems across various modalities while revealing potential biases present in different contexts. Finally, it enables benchmarking against data sets in similar areas towards development of an automatic evaluation system to effectively grade tactical skill talk performance over time
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset contains conversations between two personas, providing detailed insight into the conversation flow and the use of multi-modal communication tools. The data provides information about each particular conversational experience that can be used to develop a better understanding of how people interact.
Within this dataset there are several columns that can be used to investigate the conversations, along with the context in which it is occurring. Personas are included as an indicator for who is engaging in conversation, additional context allows a framework for starting off conversations and accumulating meaningful results from each segment of dialogue. The previous utterance column captures recent history so any relevant reference points can be utilized during discourse. Free messages allow anyone partaking in the exchange to spontaneously add content and suggest topics that may have been otherwise overlooked. Guided messages help guide discussions while still allowing for creative dialogue choices with regards to how delivery will vary within earlier frames mentioned previously such as context or previous utterances. Suggestions serve a dual purpose offering approval or suggestions when necessary but subtlety implying how certain actions should or could be taken when engaging digital avatars; while guided chosen suggestion inserts weights User Neutrality vs sentiment engagement depending on what course might suit either users objectives best, minimizing bias and instead advocating fairness over personal preference towards either individual party's specific stance on disputed topics~
- Building virtual assistants with conversational and multi-modal capabilities.
- Creating interactive tutorials using a combination of personas, free messages, guided messages and suggestions for content that constantly adapts to user input
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: validation.csv | Column name | Description | |:-----------------------|:-------------------------------------------------------------------------------------------------------| | personas | This column contains the two personas involved in the conversation. (String) | | additional_context | This column contains additional context information that may be relevant to the conversation. (String) | | previous_utterance | This column contains the previous utterance from the conversation. (String) | | free_messages | This column contains free messages that can be used to create dynamic conversations. (String) | | suggestions | This column contains suggested messages that can be used to create dynamic conversations. (String) |
File: train.csv | Column name | Description | |:--...
Facebook
TwitterAttribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Translated proj-persona/PersonaHub using nayohan/llama3-instrucTrans-enko-8b. For this dataset, we only used data that is 5000 characters or less in length and has language of English. Thanks for @proj-persona and @nayohan.
Scaling Synthetic Data Creation with 1,000,000,000 Personas
This repo releases data introduced in our paper Scaling Synthetic Data Creation with 1,000,000,000 Personas: We propose a novel persona-driven data synthesis methodology that leverages various… See the full description on the dataset page: https://huggingface.co/datasets/youjunhyeok/PersonaHub-ko.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GPT-4 Generated Data Reference:
(Original) https://huggingface.co/datasets/proj-persona/PersonaHub (Original Github) https://github.com/tencent-ailab/persona-hub