Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for PERSONAS (Prism Filter)
PERSONAS (Prism filter) is one of the largest datasets of synthetic preferences, with over 200k preferences over thousands of questions and 1k personas. Details on the PERSONAS dataset can be found here (paper link). Note that you MUST also fill out the form on our site to receive access to the full dataset. The form is available here.
Dataset Details
Dataset Description
The personas dataset is a pluralistic… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/PERSONA.
https://choosealicense.com/licenses/other/
Persona-bias
Data accompanying the paper Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs at ICLR 2024. Paper || Code || Project website || License
Motivation
This is a dataset of model outputs supporting our extensive study of biases in persona-assigned LLMs. These model outputs can be used for many purposes, for instance:
developing a deeper understanding of persona-induced biases, e.g. by analyzing the inhibiting assumptions underlying model… See the full description on the dataset page: https://huggingface.co/datasets/allenai/persona-bias.
Datasets for paper "Socio-Culturally Aware Evaluation Framework for LLM-Based Content Moderation" https://arxiv.org/abs/2412.13578
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
One way to steer generations from large language models (LLMs) is to assign a persona: a role that describes how the user expects the LLM to behave (e.g., a helpful assistant, a teacher, a woman). This paper investigates how personas affect diverse aspects of model behavior. We assign 162 personas from 12 categories, spanning variables like gender, sexual orientation, and occupation, to seven LLMs. We prompt them to answer questions from five datasets covering objective tasks (e.g., questions about math and history) and subjective tasks (e.g., questions about beliefs and values). We also compare the personas' generations to two baseline settings: a control persona setting with 30 paraphrases of "a helpful assistant" to control for the models' prompt sensitivity, and an empty persona setting where no persona is assigned. We find that for all models and datasets, personas show greater variability than the control setting and that some measures of persona behavior generalize across models.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Persona group average ranks (out of 193: 162 personas + 30 control personas + the no-persona baseline; lower is better) for each knowledge domain. The rank of the best persona in each group is shown in parentheses. We show in bold the top persona group for each domain, and we underline the best domain of each persona group. The top-ranked persona for social sciences was the social scientist persona.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for SPC: Synthetic-Persona-Chat Dataset
Abstract from the paper introducing this dataset:
High-quality conversational datasets are essential for developing AI models that can communicate with users. One way to foster deeper interactions between a chatbot and its user is through personas, aspects of the user's character that provide insights into their personality, motivations, and behaviors. Training Natural Language Processing (NLP) models on a diverse and… See the full description on the dataset page: https://huggingface.co/datasets/google/Synthetic-Persona-Chat.
A novel large-scale multi-domain dataset for persona-based empathetic conversations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Persona ranks (out of 193, lower is better) for increasingly specialized domains. For persona groups with multiple personas, we show, in addition to the average rank, the rank of the best persona in the category in parentheses.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the datasets and materials used to analyze and replicate the results presented in our paper investigating how persona-based prompting affects the political orientations of Large Language Models (LLMs).
The repository includes files organized by model (Mistral, Llama, Qwen, and Zephyr) and experimental condition (base, right-authoritarian [ra], and left-libertarian [ll]):
*_persona_compass_base.pqt: Political compass test responses for each model using baseline persona descriptions
*_persona_compass_ra.pqt: Responses after injecting right-authoritarian descriptors
*_persona_compass_ll.pqt: Responses after injecting left-libertarian descriptors
personas.json: Collection of synthetic persona descriptions from PersonaHub used in the experiments
token_personas.json: Tokenized versions of the persona descriptions
political_compass_statements.json: The 62 statements from the Political Compass Test used for evaluation
prompts.json: Prompt templates used for model interactions
baseLLMsPoliticalView.json: Default political orientations of the models without persona prompting
The code used to analyze this data and reproduce the results presented in the paper can be found at: https://github.com/d-lab/llm-political-personas
After downloading, organize the files as follows:
Place all the configuration files in the data/raw/ directory.
Rename all model-specific .pqt files to persona_compass.pqt and place them in their respective directories:
data/processed/Llama-3.1-8B-Instruct/base/persona_compass.pqt
data/processed/Mistral-7B-Instruct-v0.3/base/persona_compass.pqt
data/processed/Qwen2.5-7B-Instruct/base/persona_compass.pqt
data/processed/zephyr-7b-beta/base/persona_compass.pqt
data/processed/Llama-3.1-8B-Instruct/right_authoritarian_personas/persona_compass.pqt
data/processed/Mistral-7B-Instruct-v0.3/right_authoritarian_personas/persona_compass.pqt
data/processed/Qwen2.5-7B-Instruct/right_authoritarian_personas/persona_compass.pqt
data/processed/zephyr-7b-beta/right_authoritarian_personas/persona_compass.pqt
data/processed/Llama-3.1-8B-Instruct/left_libertarian_personas/persona_compass.pqt
data/processed/Mistral-7B-Instruct-v0.3/left_libertarian_personas/persona_compass.pqt
data/processed/Qwen2.5-7B-Instruct/left_libertarian_personas/persona_compass.pqt
data/processed/zephyr-7b-beta/left_libertarian_personas/persona_compass.pqt
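The renaming step above can be sketched as a small helper. Note this is our own illustration, not code from the authors' repository: the mapping from the filename suffix to the condition directory, and the `<model>_persona_compass_<condition>.pqt` filename pattern, are assumptions on our part.

```python
from pathlib import Path

# Assumed mapping from the downloaded file's suffix to the condition
# directory names used in the repo layout shown above.
CONDITION_DIRS = {
    "base": "base",
    "ra": "right_authoritarian_personas",
    "ll": "left_libertarian_personas",
}

def target_path(pqt_name: str, root: str = "data/processed") -> Path:
    """Destination for a downloaded file assumed to be named like
    'zephyr-7b-beta_persona_compass_ll.pqt'."""
    model, _, condition = pqt_name.removesuffix(".pqt").partition("_persona_compass_")
    return Path(root) / model / CONDITION_DIRS[condition] / "persona_compass.pqt"
```

A caller could then iterate over the downloaded `.pqt` files and move each one with `shutil.move(name, target_path(name))`, creating parent directories first.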
The ConvAI2 NeurIPS competition aimed at finding approaches to creating high-quality dialogue agents capable of meaningful open-domain conversation. The ConvAI2 dataset for training models is based on the PERSONA-CHAT dataset. The speaker pairs each have assigned profiles coming from a set of 1,155 possible personas (at training time), each consisting of at least 5 profile sentences, with 100 never-before-seen personas set aside for validation. As the original PERSONA-CHAT test set had been released, a new hidden test set consisting of 100 new personas and 1,015 dialogues was created by crowdworkers.
To avoid modeling that takes advantage of trivial word overlap, additional rewritten sets of the same train and test personas were crowdsourced, with related sentences that are rephrases, generalizations or specializations, rendering the task much more challenging. For example “I just got my nails done” is revised as “I love to pamper myself on a regular basis” and “I am on a diet now” is revised as “I need to lose weight.”
The training, validation, and hidden test sets consist of 17,878, 1,000, and 1,015 dialogues, respectively.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Persona ranks for self-bias (out of 193), self-accuracy, overall bias, and overall accuracy.
https://choosealicense.com/licenses/cc/
Dataset Card for PERSONAS (Prism Filter)
PERSONAS (Prism filter) is one of the largest datasets of synthetic preferences, with over 200k preferences over thousands of questions and 1k personas. Details on the PERSONAS dataset can be found here (paper link). Note that this subset is 5% of the training split of PERSONAS. The full dataset is here, strictly available for academic use. You MUST request access to the full persona dataset here.
Dataset Details… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/PERSONA_subset.
This dataset was collected with the goal of assessing dialog evaluation metrics. In the paper, USR: An Unsupervised and Reference Free Evaluation Metric for Dialog (Mehri and Eskenazi, 2020), the authors collect this data to measure the quality of several existing word-overlap and embedding-based metrics, as well as their newly proposed USR metric.
SynthPAI was created to provide a dataset that can be used to investigate the personal attribute inference (PAI) capabilities of LLMs on online texts. Due to the privacy concerns associated with real-world data, open datasets are rare, if not non-existent, in the research community. SynthPAI is a synthetic dataset that aims to fill this gap.
Dataset Details Dataset Description SynthPAI was created using 300 GPT-4 agents seeded with individual personalities interacting with each other in a simulated online forum and consists of 103 threads and 7823 comments. For each profile, we further provide a set of personal attributes that a human could infer from the profile. We additionally conducted a user study to evaluate the quality of the synthetic comments, establishing that humans can barely distinguish between real and synthetic comments.
Curated by: The dataset was created by SRILab at ETH Zurich. It was not created on behalf of any outside entity. Funded by: Two authors of this work are supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) (SERI-funded ERC Consolidator Grant). This project did, however, not receive explicit funding by SERI and was devised independently. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the SERI-funded ERC Consolidator Grant. Shared by: SRILab at ETH Zurich Language(s) (NLP): English License: CC-BY-NC-SA-4.0
Dataset Sources
Repository: https://github.com/eth-sri/SynthPAI Paper: https://arxiv.org/abs/2406.07217
Uses The dataset is intended to be used as a privacy-preserving method of (i) evaluating PAI capabilities of language models and (ii) aiding the development of potential defenses against such automated inferences.
Direct Use As in the associated paper, where we include an analysis of the personal attribute inference (PAI) capabilities of 18 state-of-the-art LLMs across different attributes and on anonymized texts.
Out-of-Scope Use The dataset shall not be used as part of any system that performs attribute inferences on real natural persons without their consent or otherwise maliciously.
Dataset Structure We provide the instance descriptions below. Each data point consists of a single comment (that can be a top-level post):
Comment
author str: unique identifier of the person writing
username str: corresponding username
parent_id str: unique identifier of the parent comment
thread_id str: unique identifier of the thread
children list[str]: unique identifiers of children comments
profile Profile: profile making the comment - described below
text str: text of the comment
guesses list[dict]: Dict containing model estimates of attributes based on the comment. Only contains attributes for which a prediction exists.
reviews dict: Dict containing human estimates of attributes based on the comment. Each guess contains a corresponding hardness rating (and certainty rating). Contains all attributes
The associated profiles are structured as follows
Profile
username str: identifier
attributes: set of personal attributes that describe the user (directly listed below)
The corresponding attributes and values are
Attributes
Age continuous [18-99] The age of a user in years.
Place of Birth tuple [city, country] The place of birth of a user. We create tuples jointly for city and country in free-text format. (field name: birth_city_country)
Location tuple [city, country] The current location of a user. We create tuples jointly for city and country in free-text format. (field name: city_country)
Education free-text We use a free-text field to describe the user's education level. This includes additional details such as the degree and major. To ensure comparability with the evaluation of prior work, we later map these to a categorical scale: high school, college degree, master's degree, PhD.
Income Level free-text [low, medium, high, very high] The income level of a user. We first generate a continuous income level in the profile's local currency. In our code, we map this to a categorical value considering the distribution of income levels in the respective profile location. For this, we roughly follow the local equivalents of the following reference levels for the US: Low (<30k USD), Middle (30-60k USD), High (60-150k USD), Very High (>150k USD).
Occupation free-text The occupation of a user, described as a free-text field.
Relationship Status categorical [single, In a Relationship, married, divorced, widowed] The relationship status of a user as one of 5 categories.
Sex categorical [Male, Female] Biological Sex of a profile.
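The income-level bucketing described above can be sketched as a small function. This is a minimal sketch under two assumptions: it handles only the USD case (the card maps local-currency incomes via local equivalents of the same reference levels), and it emits the lowercase labels from the field's listed values ("medium" rather than the card's "Middle"); the released code may differ.

```python
def income_bucket(annual_usd: float) -> str:
    # US reference thresholds from the dataset card:
    # Low (<30k), Middle (30-60k), High (60-150k), Very High (>150k).
    if annual_usd < 30_000:
        return "low"
    if annual_usd < 60_000:
        return "medium"
    if annual_usd <= 150_000:
        return "high"
    return "very high"
```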
Dataset Creation Curation Rationale SynthPAI was created to provide a dataset that can be used to investigate the personal attribute inference (PAI) capabilities of LLMs on online texts. Due to the privacy concerns associated with real-world data, open datasets are rare, if not non-existent, in the research community. SynthPAI is a synthetic dataset that aims to fill this gap. We additionally conducted a user study to evaluate the quality of the synthetic comments, establishing that humans can barely distinguish between real and synthetic comments.
Source Data The dataset is fully synthetic and was created using GPT-4 agents (version gpt-4-1106-preview) seeded with individual personalities interacting with each other in a simulated online forum.
Data Collection and Processing The dataset was created by sampling comments from the agents in threads. A human then inferred a set of personal attributes from sets of comments associated with each profile. Further, it was manually reviewed to remove any offensive or inappropriate content. We give a detailed overview of our dataset-creation procedure in the corresponding paper.
Annotations
Annotations are provided by authors of the paper.
Personal and Sensitive Information
All contained personal information is purely synthetic and does not relate to any real individual.
Bias, Risks, and Limitations All profiles are synthetic and do not correspond to any real subpopulations. We provide a distribution of the personal attributes of the profiles in the accompanying paper. As the dataset has been created synthetically, data points can inherit limitations (e.g., biases) from the underlying model, GPT-4. While we manually reviewed comments individually, we cannot provide respective guarantees.
Citation BibTeX:
@misc{2406.07217,
  Author = {Hanna Yukhymenko and Robin Staab and Mark Vero and Martin Vechev},
  Title = {A Synthetic Dataset for Personal Attribute Inference},
  Year = {2024},
  Eprint = {arXiv:2406.07217},
}
APA:
Hanna Yukhymenko, Robin Staab, Mark Vero, Martin Vechev: “A Synthetic Dataset for Personal Attribute Inference”, 2024; arXiv:2406.07217.
Dataset Card Authors
Hanna Yukhymenko Robin Staab Mark Vero
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for PersonaChat
Dataset Description
PersonaChat is a multi-turn dialogue dataset introduced by Zhang et al. (2018) for training and evaluating persona-grounded conversational agents. Each conversation is between two crowdworkers, each assigned a randomly selected persona consisting of several simple facts. The dataset aims to assess whether models can maintain consistent character traits throughout a conversation.
Original Paper: Personalizing Dialogue… See the full description on the dataset page: https://huggingface.co/datasets/awsaf49/persona-chat.
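To make the structure concrete, here is a minimal, hypothetical sketch of a PersonaChat-style record. The field names are ours for illustration, not the dataset's actual schema; the persona facts echo the examples quoted elsewhere on this page.

```python
# Hypothetical PersonaChat-style record (illustrative field names, not
# the official schema): two personas, each a list of simple facts, plus
# alternating utterances grounded in those facts.
example = {
    "persona_a": ["i just got my nails done.", "i am on a diet now."],
    "persona_b": ["i love to hike.", "i have two dogs."],
    "utterances": [
        ("a", "hi! i just got back from the nail salon."),
        ("b", "nice! i spent the afternoon hiking with my dogs."),
    ],
}
```

Consistency evaluation then asks whether a model conditioned on `persona_a` keeps those facts stable across the whole dialogue.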
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example prompts (with an example persona) for all datasets.
We introduce a new dataset, called FoCus, that supports knowledge-grounded answers that reflect the user's persona. One of the situations in which people need different types of knowledge, based on their preferences, occurs when they travel around the world.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Differences between the average accuracy (across all personas) and the accuracy of personas when answering questions involving their own demographic.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Differences between the frequency that each demographic is selected as the answer by the persona of the same demographic and on average (across all personas).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Current research focuses on understanding influencer marketing, the theories behind it, and the factors that contribute to the success of such campaigns. Although many articles and research papers acknowledge that the relationship between the two parties is important and essential for influencer marketing, very few, if any, studies directly conduct empirical analysis on whether the relationship between KOLs and their followers influences, and to what magnitude it influences, the success of influencer marketing campaigns, eventually affecting a brand's choice of marketing tactic or of KOLs. This study, in the form of a case study of KOLs on the Instagram and Red platforms, helps to fill this void by addressing this currently underexplored issue and provides a deep dive into the relationship between influencers and their followers and its impact on those followers. Alongside the deep dive, the paper also reviews other factors that affect the effectiveness of influencer marketing. Empirical evidence from this research confirms that KOLs' ability to influence their followers affects the outcome of influencer marketing, but only through certain methods. Specifically, focusing on the two largest social platforms, Instagram and Red, the paper finds that the alignment of post content with the KOL's persona and the interactivity of the write-up or message are two factors determining the success of influencer marketing. Other factors, such as the relationship built between the KOL and followers, do not appear to influence the outcome of future campaigns, potentially suggesting that the relationship a KOL builds with her followers has a short horizon of influence, as the benefits of a strong relationship do not seem to carry forward. The findings in this paper offer marketers and KOLs theoretical guidance for conducting influencer marketing campaigns on Instagram and Red, as well as in the global and Chinese markets.
Keywords: Brand Marketing Strategy, Influencer Marketing, Key Opinion Leader, Social Media Platforms, Consumer Behavior