Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents ChatGPT usage patterns across different age groups, showing the percentage of users who have followed its advice, used it without following advice, or have never used it, based on a 2025 U.S. survey.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents ChatGPT usage patterns across U.S. Census regions, based on a 2025 nationwide survey. It tracks how often users followed, partially used, or never used ChatGPT by state region.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows the types of advice users sought from ChatGPT based on a 2025 U.S. survey, including education, financial, medical, and legal topics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We aggregated a Twitter dataset using the Twitter Archiving Google Sheet (TAGS) to interact with Twitter’s API and return relevant data. To analyze the marketing side of the conversation around ChatGPT, we selected #ChatGPT as a common hashtag to target tweets talking about AI. Because this is the marketing dataset, we also required the hashtags “marketing”, “content creation”, or “creator economy”: content creation is a field heavily impacted by ChatGPT’s writing capabilities as a chatbot, and “creator economy” is a term experts commonly use for the overarching industry. This gave us a more specific dataset for analyzing what people well-versed in marketing, ChatGPT’s ideal audience, thought about AI’s role in marketing. Because of TAGS limitations, both datasets were limited to tweets from January 21st to January 25th.
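The hashtag filter described above can be sketched as follows. This is an illustration only, not the study's actual code: TAGS performs the real collection, and the collapsed hashtag spellings (e.g. #contentcreation) are assumptions.

```python
# Illustrative filter: keep tweets that mention #ChatGPT together with at
# least one marketing-related hashtag. The spellings of the multi-word
# tags below are assumptions, not the study's exact query.
MARKETING_TAGS = {"#marketing", "#contentcreation", "#creatoreconomy"}

def is_marketing_tweet(text: str) -> bool:
    # Collect every hashtag token in the tweet, case-insensitively.
    tags = {token.lower() for token in text.split() if token.startswith("#")}
    return "#chatgpt" in tags and bool(tags & MARKETING_TAGS)
```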
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows how men and women in the U.S. reported using ChatGPT in a 2025 survey, including whether they followed its advice or chose not to use it.
https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a dataset repository of Awesome ChatGPT Prompts. View All Prompts on GitHub
License
CC-0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows the percentage of U.S. adults who say they trust ChatGPT more than a human expert, based on a 2025 national AI trust survey.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset reflects how Americans perceive ChatGPT's broader societal impact, based on a 2025 survey that asked whether the AI will help or harm humanity.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset includes individuals' responses, on a Likert scale, to the Cognitive Engagement Scale, together with some demographics.
The AgoraSpeech dataset is stored and provided as a CSV file. Each row corresponds to a paragraph from a campaign speech, totaling 5,279 rows, and each column represents a specific type of information related to that paragraph, with 20 columns in total. Each paragraph is described by metadata (such as the politician's name and the location of the speech), contextualized by its position within the full campaign speech, with the text available in both Greek and English, and accompanied by the results of ChatGPT's analysis across six NLP tasks, each in a dedicated column.

Metadata features:
- elections: The election period of the speech. Values: "1st Elections 2023-05-21" or "2nd Elections 2023-06-25".
- speech_id: The ID of the speech. Format: PoliticianName_YYYY_MM_DD_Location.
- politician: The name of the politician who gave the speech. Values: "Androulakis", "Koutsoumpas", "Mitsotakis", "Tsipras", "Varoufakis", or "Velopoulos".
- date: The date of the speech. Format: YYYY-MM-DD.
- location: The location where the speech took place. Values: cities in Greece.

Content features:
- paragraph: The number of the paragraph within the speech. Values: an integer (starting from 1).
- text: The text of the paragraph in English.
- text_el: The text of the paragraph in Greek.

Analysis features:
- criticism_or_agenda: Whether the text is identified as "political agenda" or "criticism".
- topic: The main topic of the text. Values: a string from a predefined list of topics.
- sentiment: The detected sentiment of the text. Values: a float between -1 (negative) and 1 (positive).
- polarization: The detected level of polarization in the text. Values: a float between -1 (none/low) and 1 (high).
- populism: The detected level of populism in the text. Values: a float between -1 (none/low) and 1 (high).
- named entities: The entities (people, locations, organizations, etc.) detected in the text, as string values along with metadata.

Remark: The list above presents only 14 of the dataset's 20 features. The first 8 features (metadata and content) are created before any annotations. The remaining 6 features correspond to NLP tasks, each appearing twice in the dataset: once as annotated by ChatGPT and once following verification through the human-in-the-loop process.

Political discourse datasets are important for gaining political insights and analyzing communication strategies or social science phenomena. Although numerous political discourse corpora exist, comprehensive, high-quality, annotated datasets are scarce, largely because of the substantial manual effort, multidisciplinarity, and expertise required for the nuanced annotation of rhetorical strategies and ideological contexts. In this paper, we present AgoraSpeech, a meticulously curated, high-quality dataset of 171 political speeches from six parties during the Greek national elections in 2023. The dataset includes per-paragraph annotations for six natural language processing (NLP) tasks: text classification, topic identification, sentiment analysis, named entity recognition, polarization detection, and populism detection. A two-step annotation process was employed, starting with ChatGPT-generated annotations followed by exhaustive human-in-the-loop validation. The dataset was initially used in a case study to provide insights during the pre-election period, but it has general applicability: it serves as a rich source of information for political and social scientists, journalists, and data scientists, and it can be used for benchmarking and fine-tuning NLP models and large language models (LLMs).
Code Availability
Exploratory Data Analysis: https://github.com/Datalab-AUTH/AgoraSpeech-EDA

Related Publications
Pavlos Sermpezis, Stelios Karamanidis, Eva Paraschou, Ilias Dimitriadis, Sofia Yfantidou, Filitsa-Ioanna Kouskouveli, Thanasis Troboukis, Kelly Kiki, Antonis Galanopoulos, and Athena Vakali, 2024. AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI (submitted for peer review).

Acknowledgments
We would like to thank Nota Vafea, Katerina Voutsina, Stefania Ibrishimova, Athina Thanasi, Chrysoula Marinou, and Georgios Schinas (iMEdD) for their contributions to the human data annotation process. We also thank Christos Nomikos, Nikos Sarantos (iMEdD), and Dimitrios-Panteleimon Giakatos (Datalab) for their IT support and software development of the online tool, as well as Anatoli Stavroulopoulou for cross-checking the political speeches' translations.
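As a toy illustration of working with the column layout described above, a short script might compute the mean detected sentiment per politician. The rows below are invented placeholders, not real AgoraSpeech data, and only three of the 20 columns are shown:

```python
import csv
import io
from collections import defaultdict

# Placeholder rows mimicking three of the AgoraSpeech columns described
# above (politician, paragraph, sentiment). The values are invented.
sample = io.StringIO(
    "politician,paragraph,sentiment\n"
    "Mitsotakis,1,0.4\n"
    "Tsipras,1,-0.2\n"
    "Mitsotakis,2,0.6\n"
)

totals, counts = defaultdict(float), defaultdict(int)
for row in csv.DictReader(sample):
    totals[row["politician"]] += float(row["sentiment"])
    counts[row["politician"]] += 1

# Mean sentiment per politician across their paragraphs.
mean_sentiment = {p: totals[p] / counts[p] for p in totals}
```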
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was made for a named entity recognition (NER) task: training a model to identify mountain names inside texts.
Each entry in the dataset corresponds to a tweet or a sentence that was generated by OpenAI's ChatGPT. It's a mixed dataset that includes a variety of tweets/texts, some of which are focused on mountain-related experiences, while others may discuss different topics.
The features of the dataset include:
Text Content: This feature contains the actual text content of each sentence/tweet. It captures the expressions, experiences, or sentiments related to mountainous regions and activities.
Markers: In the context of the provided code, the "marker" feature represents the start and end indices of the occurrences of specific mountain names within the tweet text.
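A minimal sketch of how such span markers might be consumed follows. The record layout (a dict with "text" and a list of (start, end) index pairs) is an assumption for illustration, not the dataset's exact schema:

```python
# Hypothetical record pairing a sentence with (start, end) character spans
# that mark mountain-name mentions, mirroring the "marker" feature above.
record = {
    "text": "We hiked near Mount Everest and later saw Kilimanjaro.",
    "marker": [(14, 27), (42, 53)],
}

def extract_entities(rec):
    """Slice out the mountain names referenced by the span markers."""
    return [rec["text"][start:end] for start, end in rec["marker"]]
```

Spans of this form are what a token-classification trainer would convert into per-token labels (e.g. BIO tags) before fine-tuning an NER model.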
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As ChatGPT emerges as a potential ally in healthcare decision-making, it is imperative to investigate how users leverage and perceive it. The repurposing of technology is innovative but brings risks, especially since AI's effectiveness depends on the data it is fed. In healthcare, ChatGPT might provide sound advice based on current medical knowledge, which could turn into misinformation if its data sources later include erroneous information. Our study assesses user perceptions of ChatGPT, particularly among those who used it for healthcare-related queries. By examining factors such as competence, reliability, transparency, trustworthiness, security, and persuasiveness, the research aimed to understand how users rely on ChatGPT for health-related decision-making. A web-based survey was distributed to U.S. adults who use ChatGPT at least once a month. Bayesian linear regression was used to understand how much ChatGPT aids informed decision-making; this analysis was conducted on two subsets of respondents, those who used ChatGPT for healthcare decisions and those who did not. Qualitative data from open-ended questions were analyzed using content analysis, with thematic coding to extract public opinions. Six hundred and seven individuals responded to the survey, distributed across 306 U.S. cities, of which 20 participants were from rural areas. Of all respondents, 44 used ChatGPT for health-related queries and decision-making. In the healthcare context, the most effective model highlights ’Competent + Trustworthy + ChatGPT for healthcare queries’, underscoring the critical importance of perceived competence and trustworthiness specifically in healthcare applications of ChatGPT.
On the other hand, the non-healthcare context reveals a broader spectrum of influential factors in its best model, which includes ’Trustworthy + Secure + Benefits outweigh risks + Satisfaction + Willing to take decisions + Intent to use + Persuasive’. In conclusion, our findings suggest a clear demarcation in user expectations and requirements from AI systems based on the context of their use. We advocate for a balanced approach in which technological advancement and user readiness are harmonized.
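The Bayesian linear regression step can be illustrated with a minimal conjugate-prior sketch. The single-predictor setup, the zero-mean prior with precision `alpha`, and the known noise precision `beta` are assumptions chosen for a closed-form posterior; the study's actual model specification is not given here:

```python
# Conjugate Bayesian linear regression for one predictor: prior
# w ~ N(0, 1/alpha), likelihood y ~ N(w*x, 1/beta). The posterior over
# the slope w is Gaussian with a closed-form mean and variance.
def posterior(xs, ys, alpha=1.0, beta=1.0):
    """Return the posterior mean and variance of the slope w."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    precision = alpha + beta * sxx      # posterior precision of w
    mean = beta * sxy / precision       # posterior mean of w
    return mean, 1.0 / precision
```

With a weak prior and high noise precision the posterior mean approaches the least-squares slope, which is the usual sanity check for this setup.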
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents how much users trust ChatGPT across different advice categories, including career, education, financial, legal, and medical advice, based on a 2025 U.S. survey.
Introduction and Generation: This is a synthetic dataset built around a fixed-structure prompt for LLMs and a "completion" element designed to train responses, including descriptions of the movies worked on by the two stars listed in the prompt, whether the two individuals have collaborated with each other, and a suggestion of what movie they could collaborate on next.
The data was generated with prompt suggestions from both the Gemma 2 2B model through Google AI Studio and ChatGPT-4o through its web app. The dataset was first generated with a request for random pairings of famous actors and directors, but entries were also visually confirmed by me using actual movie knowledge to avoid repetition and check for correctness, as recent collaboration data was noticeably absent.
To check the output step by step, the data was generated serially and cumulatively, prompting for suggestions in the same format but with other directors and actors that I knew were not yet included in the collective list and would add diversity to the included genre and style data.
Structure
"prompt": string, e.g. "Movie Collaboration between Christopher Nolan and Leonardo DiCaprio"
"completion": {
  "star_1":
  "star_2":
  "collaboration_check":
  "collaboration_idea":
}
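Assembling one record in this structure can be sketched as below. The helper name and the field values are illustrative; in the actual dataset the completion fields are model-generated text:

```python
import json

# Build one training example in the prompt/completion structure above.
# Field values here are placeholders, not entries from the dataset.
def make_record(star_1, star_2, collaboration_check, collaboration_idea):
    return {
        "prompt": f"Movie Collaboration between {star_1} and {star_2}",
        "completion": {
            "star_1": star_1,
            "star_2": star_2,
            "collaboration_check": collaboration_check,
            "collaboration_idea": collaboration_idea,
        },
    }

record = make_record(
    "Christopher Nolan", "Leonardo DiCaprio",
    "Yes, they collaborated on Inception (2010).",
    "A time-bending heist thriller.",
)
line = json.dumps(record)  # one JSON line per training example
```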
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for UltraChat 200k
Dataset Description
This is a heavily filtered version of the UltraChat dataset and was used to train Zephyr-7B-β, a state-of-the-art 7B chat model. The original dataset consists of 1.4M dialogues generated by ChatGPT, spanning a wide range of topics. To create UltraChat 200k, we applied the following logic:
Selection of a subset of the data for faster supervised fine-tuning. Truecasing of the dataset, as we observed around 5% of the data… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k.
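The truecasing step mentioned above can be illustrated with a naive heuristic sketch: capitalize sentence starts and the standalone pronoun "i". This is only a toy approximation; the dataset's actual truecasing method is not described here:

```python
import re

def naive_truecase(text: str) -> str:
    """Toy truecaser: restore capitals at sentence starts and for 'i'."""
    # Capitalize the first letter of the string and of each sentence.
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    # Capitalize the standalone pronoun "i".
    return re.sub(r"\bi\b", "I", text)
```

Real truecasers typically use statistics from well-cased text rather than fixed rules, which this sketch deliberately ignores for brevity.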
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Aphasia is associated with impairments in written language, including difficulty with sentence formulation, word finding, and editing. While writing aids show promise, artificial intelligence (AI) tools such as large language models (LLMs) offer new opportunities for individuals with language-based writing challenges.
Methods: This case report describes the use of the LLM ChatGPT to improve accuracy, complexity, and productivity in an adult with aphasia. The intervention combined self-generated content with AI-assisted editing, guided by a visual flow chart and structured prompts. Writing samples were analyzed for sentence count, complexity, and errors, while the patient's attitudes toward writing were evaluated through surveys.
Results: When using ChatGPT, the patient produced more sentences with fewer errors, while self-written samples showed reduced total errors but decreased sentence production and increased sentence length and syntactic complexity. Although the patient required clinician prompting and modeling to use ChatGPT effectively, he developed greater independence and confidence over time. One year later, he reported continued use of ChatGPT for creative and communicative tasks.
Discussion: This case highlights how AI tools can enhance written communication and promote participation in meaningful activities for individuals with aphasia, especially those with prior experience using technology.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Advances in artificial intelligence are gradually transforming various fields, but their applicability among ordinary people is unknown. This study aims to explore the ability of a large language model to address Helicobacter pylori related questions.
Methods: We created several prompts on the basis of guidelines and the clinical concerns of patients. ChatGPT's capacity on Helicobacter pylori queries was evaluated by experts, and ordinary people assessed its applicability.
Results: The responses to each prompt in ChatGPT-4 were good in terms of response length and repeatability. There was good agreement in each dimension (Fleiss' kappa ranged from 0.302 to 0.690, p
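The agreement statistic reported above, Fleiss' kappa, can be sketched in a few lines. This is a generic implementation of the standard formula, not the authors' code:

```python
# Fleiss' kappa for inter-rater agreement. table[i][j] counts the raters
# who assigned item i to category j; each item must have the same total
# number of raters.
def fleiss_kappa(table):
    n_items = len(table)
    n_raters = sum(table[0])
    # Mean per-item agreement P-bar.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_items
    # Expected chance agreement P-bar-e from category proportions.
    totals = [sum(row[j] for row in table) for j in range(len(table[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement across items yields kappa = 1, while systematic disagreement drives it negative, which matches the 0.302 to 0.690 range reported as moderate-to-substantial agreement.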
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset summarizes how ChatGPT users rated the outcomes of the advice they received, including whether it was helpful, harmful, neutral, or uncertain, based on a 2025 U.S. survey.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Personality research has traditionally relied on questionnaires, which bring with them inherent limitations, such as response style bias. With the emergence of large language models such as ChatGPT, the question arises as to what extent these models can be used in personality research. In this study, ChatGPT (GPT-4) generated 2000 text-based personas. Next, for each persona, ChatGPT completed a short form of the Big Five Inventory (BFI-10), the Brief Sensation Seeking Scale (BSSS), and a Short Dark Triad (SD3). The mean scores on the BFI-10 items were found to correlate strongly with means from previously published research, and principal component analysis revealed a clear five-component structure. Certain relationships between traits, such as a negative correlation between the age of the persona and the BSSS score, were clearly interpretable, while some other correlations diverged from the literature. An additional analysis using four new sets of 2000 personas each, including a set of ‘realistic’ personas and a set of cinematic personas, showed that the correlation matrix among personality constructs was affected by the persona set. It is concluded that evaluating questionnaires and research hypotheses prior to engaging with real individuals holds promise.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Every November Dublin City Council (DCC) conducts traffic counts at 33 locations on entry points into the city centre around a 'cordon' formed by the Royal and Grand Canals. As the name suggests, the cordon has been chosen to ensure (as far as possible) that any person entering the city centre from outside must pass through one of the 33 locations where the surveys are undertaken. In addition, every May a wider traffic count survey is carried out at approximately 60 locations: in addition to the canal cordon locations, further counts are carried out at bridges along the River Liffey and at points such as Parnell Street and St. Stephen's Green. These traffic counts provide a reliable measurement of the modal distribution of persons travelling into, and out of, Dublin City on a year-on-year comparable basis. The data collected is divided into the various transport modes, allowing us to better understand the changing usage trends for cycling, pedestrians, and the various vehicle types. Resources include a map with the 33 locations on the cordon where data is annually collected. All 33 cordon points are on routes for general traffic into the city centre, while 22 of the cordon points are on bus routes into the city. The numbers of people using Bus, Luas, DART, and suburban rail services to enter the city centre are collated from each of the various service providers, and an Annual Monitoring Report is prepared by the National Transport Authority.