MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
ChatGPT Gemini Claude Perplexity Human Evaluation Multi Aspect Review Dataset
Introduction
Human evaluations and reviews with scalar scores of AI service responses are very useful in LLM Finetuning, Human Preference Alignment, Few-Shot Learning, Bad Case Shooting, etc., but are extremely difficult to collect. This dataset is collected from the DeepNLP AI Service User Review panel (http://www.deepnlp.org/store), which is an open review website for users to give reviews and upload… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/ChatGPT-Gemini-Claude-Perplexity-Human-Evaluation-Multi-Aspects-Review-Dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents ChatGPT usage patterns across different age groups, showing the percentage of users who have followed its advice, used it without following advice, or have never used it, based on a 2025 U.S. survey.
This dataset provides a collection of user reviews for the ChatGPT mobile application on iOS. It captures valuable user insights and sentiments, making it suitable for understanding customer satisfaction, evaluating app performance, and identifying emerging trends. The data was gathered by scraping ChatGPT reviews from the App Store.
The dataset is typically provided in a CSV file format. It includes 2058 unique date values and 2257 unique review texts. The reviews span from 18th May 2023 to 25th July 2023. Review counts by period are as follows:
* 18th May 2023 - 25th May 2023: 1,475 reviews
* 25th May 2023 - 1st June 2023: 267 reviews
* 1st June 2023 - 7th June 2023: 117 reviews
* 7th June 2023 - 14th June 2023: 82 reviews
* 14th June 2023 - 21st June 2023: 60 reviews
* 21st June 2023 - 28th June 2023: 59 reviews
* 28th June 2023 - 4th July 2023: 73 reviews
* 4th July 2023 - 11th July 2023: 45 reviews
* 11th July 2023 - 18th July 2023: 57 reviews
* 18th July 2023 - 25th July 2023: 57 reviews
Rating distribution is also available:
* 1.00 - 1.40 stars: 495 reviews
* 1.80 - 2.20 stars: 139 reviews
* 3.00 - 3.40 stars: 220 reviews
* 3.80 - 4.20 stars: 304 reviews
* 4.60 - 5.00 stars: 1,134 reviews
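The two breakdowns above can be cross-checked with a little arithmetic. The following is a minimal sketch that uses only the counts quoted in this description. Mapping each rating range to a whole-star value (1 to 5) is an assumption, since the ranges read like histogram bins rather than raw ratings.

```python
# Review counts per period, as quoted in the description (18 May to 25 July 2023).
period_counts = [1475, 267, 117, 82, 60, 59, 73, 45, 57, 57]

# Review counts per rating range, as quoted in the description.
# Assumption: each range corresponds to one whole-star rating, 1 through 5.
rating_counts = {1: 495, 2: 139, 3: 220, 4: 304, 5: 1134}

total_by_period = sum(period_counts)
total_by_rating = sum(rating_counts.values())

# Share of all reviews that fall in the first listed period.
first_period_share = period_counts[0] / total_by_period

# Approximate mean rating under the whole-star assumption.
mean_rating = sum(stars * n for stars, n in rating_counts.items()) / total_by_rating

print(total_by_period, total_by_rating)
print(f"first-period share: {first_period_share:.1%}")
print(f"approximate mean rating: {mean_rating:.2f}")
```

Both totals come to 2,292, slightly above the 2,257 unique review texts, which would be consistent with some review texts appearing more than once. Roughly two thirds of the reviews fall in the first period.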
This dataset is ideal for:
* Sentiment analysis to gauge user emotions and opinions regarding the ChatGPT app.
* Performance evaluation to identify factors contributing to high or low user ratings.
* Pattern identification to uncover recurring themes and common issues in user feedback.
The dataset covers reviews globally, spanning a time range from 18th May 2023 to 25th July 2023.
CC-BY-NC
Original Data Source: ChatGPT App Reviews
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive ChatGPT statistics covering 800 million weekly users, $300 billion valuation, market share, demographics, and technical specifications for 2025.
https://creativecommons.org/publicdomain/zero/1.0/
Based on their Wikipedia page:
ChatGPT (Chat Generative Pre-trained Transformer) is a large language model-based chatbot developed by OpenAI and launched on November 30, 2022, that enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context.
These reviews were extracted from the Google Play Store.
This dataset should paint a good picture of the public's perception of the app over the years. Using this dataset, we can do the following
(AND MANY MORE!)
Images generated using Bing Image Generator
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows the types of advice users sought from ChatGPT based on a 2025 U.S. survey, including education, financial, medical, and legal topics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*both authors contributed equally
Automated query script for automated language bias studies in GPT-3.5
Dataset of the paper "How User Language Affects Conflict Fatality Estimates in ChatGPT", preprint available on arXiv
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset for this research project was meticulously constructed to investigate the adoption of ChatGPT among students in the United States. The primary objective was to gain insights into the technological barriers and resistances faced by students in integrating ChatGPT into their information systems. The dataset was designed to capture the diverse adoption patterns among students in various public and private schools and universities across the United States. By examining adoption rates, frequency of usage, and the contexts in which ChatGPT is employed, the research sought to provide a comprehensive understanding of how students are incorporating this technology into their information systems. Moreover, by including participants from diverse educational institutions, the research sought to ensure a comprehensive representation of the student population in the United States. This approach aimed to provide nuanced insights into how factors such as educational background, institution type, and technological familiarity influence ChatGPT adoption.
This dataset provides a daily-updated collection of user reviews and ratings specifically for the ChatGPT Android application. It includes crucial information such as the review text, associated ratings, and the dates when reviews were posted. The dataset also details the relevancy of each review. It serves as a valuable resource for understanding user sentiment, tracking app performance over time, and analysing trends within the AI and Large Language Model (LLM) application landscape.
The dataset is primarily available in a tabular format, typically a CSV file, facilitating easy integration and analysis. It comprises over 637,000 unique reviews, reflecting a substantial volume of user feedback. This dataset is updated on a daily basis, ensuring access to the latest user opinions and rating trends. While the exact file size is not specified, the number of records indicates a considerable volume of data.
This dataset is ideal for various analytical applications, including:
* Sentiment Analysis: Extracting and understanding user emotions and opinions towards the ChatGPT Android app.
* Natural Language Processing (NLP) Tasks: Training and testing NLP models for text classification, entity recognition, and language generation based on real-world user input.
* App Performance Monitoring: Tracking changes in user ratings and feedback over time to gauge application performance and identify areas for improvement.
* Market Research: Gaining insights into user perception of AI and LLM applications within the mobile market.
* Competitive Analysis: Comparing user feedback for the ChatGPT app against other similar applications.
* Feature Prioritisation: Identifying desired features or common pain points mentioned by users to inform product development.
This dataset offers global coverage, collecting reviews from users across the world. The time range for the reviews spans from 25 July 2023 to 30 June 2025. This extensive period allows for longitudinal studies of user sentiment and app evolution. It captures feedback from a diverse demographic of ChatGPT Android app users. Some data points, such as appVersion, may occasionally have null values.
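Because appVersion can be null, any per-version analysis needs to handle missing values explicitly. The sketch below shows one way to do that with the standard library. Only `appVersion` is named in the description; the `content`, `score`, and `at` columns and the inline sample rows are hypothetical and used purely for illustration.

```python
import csv
import io
from collections import Counter

# Tiny inline sample standing in for the real CSV file.
# Assumption: only `appVersion` is named in the description; `content`,
# `score`, and `at` are hypothetical column names.
SAMPLE_CSV = """content,score,at,appVersion
Great app,5,2023-07-25 10:00:00,1.0.0016
Keeps crashing,1,2023-08-02 09:30:00,
Very helpful,4,2024-01-15 18:45:00,1.2024.010
"""

def load_reviews(text):
    """Parse CSV text into dicts, turning empty appVersion cells into None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["score"] = int(row["score"])
        row["appVersion"] = row["appVersion"] or None
        rows.append(row)
    return rows

reviews = load_reviews(SAMPLE_CSV)

# Count reviews with no version, and group the rest under their version string.
missing = sum(1 for r in reviews if r["appVersion"] is None)
by_version = Counter(r["appVersion"] or "unknown" for r in reviews)

print(f"{missing} of {len(reviews)} reviews have no appVersion")
print(dict(by_version))
```

Grouping missing values under an explicit "unknown" label keeps those reviews in the totals instead of silently dropping them.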
CC-BY-NC-SA
Original Data Source: ChatGPT reviews [DAILY UPDATED]
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides user reviews for ChatGPT, offering valuable qualitative feedback, satisfaction ratings, and submission dates. It captures a diverse array of user sentiments, from concise remarks to more detailed feedback. The ratings are provided on a scale of 1 to 5, indicating different levels of user satisfaction. The dataset spans several months, which allows for temporal analysis of sentiment trends, as each review includes a timestamp. This data is ideal for gaining insights into user characteristics and for improving application features and services.
The dataset is provided as a free resource. A sample file will be uploaded to the platform separately. The data quality is rated 5 out of 5, and the current version is 1.0. It was listed on 08/06/2025, with 1 view and 0 downloads recorded so far. The dataset contains approximately 193,154 unique reviews.
This dataset is particularly useful for various analytical applications, including:
* Sentiment Analysis: Developing models to predict the emotional tone or sentiment conveyed in user reviews.
* Customer Feedback Analysis: Extracting actionable insights that can inform and guide improvements to application features and services.
* Review Classification: Building machine learning models to categorise user reviews, for instance, as positive or negative.
* Data Visualisation: Creating visual representations of review patterns and trends.
* Exploratory Data Analysis: Investigating the characteristics and underlying patterns within the review data.
* Natural Language Processing (NLP): Applying NLP techniques to understand and process the textual feedback.
* Text Mining: Discovering patterns and insights from the large collection of text reviews.
* Time-Series Analysis: Examining how sentiment and ratings evolve over time based on review timestamps.
This dataset comprises user reviews for ChatGPT collected from 25th July 2023 to 24th August 2024. The data collection is global, reflecting feedback from users worldwide.
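Two of the uses listed for this dataset, review classification and time-series analysis, can be sketched from the three fields the description mentions (review text, a 1 to 5 rating, and a timestamp). The sample rows, the timestamp layout, and the labeling thresholds below are all assumptions made for illustration, not part of the dataset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows: (review text, rating 1-5, timestamp).
SAMPLE = [
    ("Love it", 5, "2023-07-25 08:00:00"),
    ("Too many errors", 2, "2023-07-30 12:10:00"),
    ("Okay for quick answers", 3, "2023-08-03 21:05:00"),
    ("Best assistant so far", 5, "2023-08-19 07:45:00"),
]

def label(rating):
    """Assumed labeling rule: 4-5 positive, 1-2 negative, 3 neutral."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"

def monthly_mean_rating(rows):
    """Group ratings by calendar month (YYYY-MM) and average them."""
    buckets = defaultdict(list)
    for _, rating, stamp in rows:
        month = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m")
        buckets[month].append(rating)
    return {month: mean(values) for month, values in sorted(buckets.items())}

labels = [label(rating) for _, rating, _ in SAMPLE]
trend = monthly_mean_rating(SAMPLE)

print(labels)
print(trend)
```

Labels derived from ratings like this are a common starting point for training a text classifier, although the cut-off for "neutral" is a design choice that may need tuning.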
CC0
This dataset is ideal for a range of users interested in understanding user feedback and sentiment, including:
* Data Scientists and Machine Learning Engineers for building and training sentiment analysis and classification models.
* Product Managers and App Developers to gain actionable insights for product improvement and feature development.
* Market Researchers to understand user satisfaction and market perception of AI applications.
* Academic Researchers studying human-computer interaction, natural language processing, or user behaviour.
Original Data Source: ChatGPT Users Reviews
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains information related to ChatGPT, OpenAI's conversational AI model, gathered from social media. It includes keywords such as "chatgpt" and "chat gpt", as well as associated hashtags and mentions. The dataset's purpose is to help in understanding public opinion, identifying trends, and exploring potential applications of ChatGPT. By analysing tweet volume, sentiment, user engagement, and the influence of key AI events, this dataset offers valuable insights for various stakeholders.
This dataset is provided in CSV format and includes data on 500,000 tweets. It consists of two CSV files: an originally scraped dataset and a preprocessed dataset.
This dataset is ideal for:
* Understanding public sentiment and trends surrounding ChatGPT.
* Analysing tweet volume and user engagement related to AI-powered conversational technologies.
* Exploring the influence of key AI events on social media discussions.
* Supporting research into the societal impact and adoption of conversational AI.
The dataset covers the period from January to March 2023. The data collected is global in scope, capturing public opinion on social media platforms.
CC0
Original Data Source: 500k ChatGPT-related Tweets Jan-Mar 2023
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents how much users trust ChatGPT across different advice categories, including career, education, financial, legal, and medical advice, based on a 2025 U.S. survey.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents ChatGPT usage patterns across U.S. Census regions, based on a 2025 nationwide survey. It tracks how often users followed, partially used, or never used ChatGPT by state region.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.
Methods: This article examines how learners perform and their perspectives when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants among students and professionals without computational thinking skills. These participants developed a data analytics project in the context of a Data Analytics short session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.
Results: The results showed the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.
Discussion: It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure this connection via API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset has been used to write a book chapter on the topic of "Classifying User Intent for Effective Prompt Engineering: A Case of a Chatbot for Startup Teams". The dataset contains the following five resources:
* Startup questions and intent classifications - This resource demonstrates a list of possible questions and the classification of those questions into four intents, i.e. reflecting on own experience, seeking information, brainstorming, and seeking advice
* Prompt_Book_v1 - The file contains a brief guide on how questions are classified, a description of prompt patterns and templates, and lastly the matching of purposes to prompt patterns
* Questions_classification_script - The Python script used in our work to classify user intent
* Survey_questionnaire - The original survey questions asked of the participants
* survey_responses - Survey responses from study respondents
https://creativecommons.org/publicdomain/zero/1.0/
UPDATE: Under the new Twitter API conditions introduced by Elon Musk, the Twitter (X) API is no longer free to use, and pricing is $100/month on the hobby plan. As a result, my automated ETL notebook stopped adding new tweets to this dataset on May 13th 2023.
This dataset was updated every day with the addition of 1000 tweets/day containing any of the words "ChatGPT", "GPT3", or "GPT4", starting from the 3rd of April 2023. Each day's tweets were uploaded 24-72h later, so that the counters for likes, retweets, messages and impressions had enough time to become relevant. Tweets are in any language and were selected randomly from all hours of the day. Some basic filters were applied to try to discard sensitive tweets and spam.
This dataset can be used for many different applications in Data Analysis and Visualization, as well as NLP Sentiment Analysis techniques and more.
Consider upvoting this Dataset and the scheduled ETL Notebook that provided new data every day if you found them interesting, thanks! 🤗
tweet_id: Integer. Unique identifier for each tweet. Older tweets have smaller IDs.
tweet_created: Timestamp. Time of the tweet's creation.
tweet_extracted: Timestamp. The UTC time when the ETL pipeline pulled the tweet and its metadata (likes count, retweets count, etc).
text: String. The raw payload text from the tweet.
lang: String. Short name for the Tweet text's language.
user_id: Integer. Twitter's unique user id.
user_name: String. The author's public name on Twitter.
user_username: String. The author's Twitter account username (@example)
user_location: String. The author's public location.
user_description: String. The author's public profile's bio.
user_created: Timestamp. Timestamp of user's Twitter account creation.
user_followers_count: Integer. The number of followers of the author's account at the moment of the tweet extraction
user_following_count: Integer. The number of followed accounts from the author's account at the moment of the Tweet extraction
user_tweet_count: Integer. The number of Tweets that the author has published at the moment of the Tweet extraction.
user_verified: Boolean. True if the user is verified (blue mark).
source: The device/app used to publish the tweet (apparently not working; all values are NaN so far).
retweet_count: Integer. Number of retweets to the Tweet at the moment of the Tweet extraction.
like_count: Integer. Number of Likes to the Tweet at the moment of the Tweet extraction.
reply_count: Integer. Number of reply messages to the Tweet.
impression_count: Integer. Number of times the Tweet has been seen at the moment of the Tweet extraction.
More info: Tweets API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet Users API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
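The column list above can be turned into a small typed loader. The sketch below parses one hypothetical row restricted to a subset of the documented columns, then derives two quantities the description implies: the delay between tweet creation and extraction, and a simple engagement rate. The timestamp layout, the "True"/"False" encoding of booleans, the sample values, and the engagement formula are assumptions rather than properties confirmed by the dataset.

```python
import csv
import io
from datetime import datetime

# Assumption: timestamps are stored as "YYYY-MM-DD HH:MM:SS"; adjust if the
# real file uses another layout.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Documented integer columns that appear in the sample below.
INT_FIELDS = ("tweet_id", "retweet_count", "like_count", "reply_count", "impression_count")

# Hypothetical one-row sample using a subset of the documented columns.
SAMPLE_CSV = """tweet_id,tweet_created,tweet_extracted,text,lang,retweet_count,like_count,reply_count,impression_count,user_verified
1642800000000000000,2023-04-03 08:15:00,2023-04-05 09:00:00,Trying ChatGPT today,en,3,12,1,400,False
"""

def parse_row(row):
    """Convert the string cells of one CSV row to the documented types."""
    parsed = dict(row)
    for name in INT_FIELDS:
        parsed[name] = int(parsed[name])
    for name in ("tweet_created", "tweet_extracted"):
        parsed[name] = datetime.strptime(parsed[name], TIMESTAMP_FORMAT)
    # Assumption: booleans are serialized as the strings "True" / "False".
    parsed["user_verified"] = parsed["user_verified"] == "True"
    return parsed

tweet = parse_row(next(csv.DictReader(io.StringIO(SAMPLE_CSV))))

# Hours between creation and extraction (described as 24-72h in the text).
lag_hours = (tweet["tweet_extracted"] - tweet["tweet_created"]).total_seconds() / 3600

# Illustrative engagement rate: interactions divided by impressions.
interactions = tweet["retweet_count"] + tweet["like_count"] + tweet["reply_count"]
engagement_rate = interactions / tweet["impression_count"]

print(f"lag: {lag_hours:.2f} h, engagement rate: {engagement_rate:.2%}")
```

Since the counters are snapshots taken at extraction time, comparing tweets with very different lags may be misleading; filtering on `lag_hours` is one way to keep the comparison fair.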
https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a Dataset Repository of Awesome ChatGPT Prompts. View All Prompts on GitHub.
License
CC-0
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "Collective Cognition ChatGPT Conversations"
Dataset Description
Dataset Summary
The "Collective Cognition ChatGPT Conversations" dataset is a collection of chat logs between users and the ChatGPT model. These conversations have been shared by users on the "Collective Cognition" website. The dataset provides insights into user interactions with language models and can be utilized for multiple purposes, including training, research, and… See the full description on the dataset page: https://huggingface.co/datasets/CollectiveCognition/chats-data-2023-10-16.
This dataset features 100,000 user reviews of the ChatGPT app, collected from the Google Play Store. It offers diverse feedback from users across ten countries, providing valuable insights into user experiences and application performance. This dataset is well-suited for natural language processing tasks, sentiment analysis, and studies on user feedback.
The dataset contains 100,000 records, typically formatted as a CSV file. It includes detailed information such as user ratings, textual comments, and application versions. The ratings distribution shows a significant majority of high ratings, with 74,403 reviews in the 4.80-5.00 range. Thumbs Up counts range from 0 to 1712, with most reviews having fewer than 85.60 likes. There are 95,666 unique app versions and 87,220 unique review dates recorded.
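The figures "85.60" and "4.80-5.00" read like edges of equal-width histogram bins rather than properties of individual reviews. The quick check below tests that reading, assuming each column was summarized with 20 equal-width bins; this bin count is an inference from the numbers, not something the description states.

```python
def bin_width(lowest, highest, bins):
    """Width of each bin in an equal-width histogram over [lowest, highest]."""
    return (highest - lowest) / bins

# Assumption: each column summary uses 20 equal-width bins.
BINS = 20

thumbs_width = bin_width(0, 1712, BINS)   # thumbs-up counts range from 0 to 1712
rating_width = bin_width(1, 5, BINS)      # star ratings range from 1 to 5
top_rating_bin_start = 5 - rating_width   # lower edge of the highest rating bin

# Share of reviews in the top rating bin, from the counts quoted above.
top_bin_share = 74_403 / 100_000

print(round(thumbs_width, 2), round(rating_width, 2), round(top_rating_bin_start, 2))
print(f"top rating bin share: {top_bin_share:.1%}")
```

Under this reading, "fewer than 85.60 likes" means that most reviews fall in the first thumbs-up bin, and the 4.80-5.00 range simply collects the 5-star reviews.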
This dataset is ideal for:
* Sentiment Analysis: To evaluate user sentiment, assess satisfaction levels, and pinpoint areas for app improvement.
* Natural Language Processing (NLP): For applying techniques such as text classification, summarisation, and keyword extraction from user comments.
* Trend Analysis: To observe changes in user feedback over time and across different app versions.
* Market Research: To analyse user preferences and common issues across various geographic regions and demographic groups.
The dataset covers user reviews from ten specific countries: the United States, United Kingdom, Canada, Australia, India, Japan, Germany, France, South Korea, and Brazil. The reviews span a time range from 21 November 2023 to 19 July 2024. The data originates from publicly available user reviews on the Google Play Store.
CC BY-NC-SA
This dataset is suitable for:
* Researchers: Undertaking studies in natural language processing and user experience.
* Data Analysts: For sentiment analysis and identifying key trends in user feedback.
* Product Developers: Seeking to understand user satisfaction and pinpoint areas for product enhancement.
* Market Researchers: Interested in consumer preferences and challenges across different markets.
Original Data Source: ChatGPT Reviews
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Within a year of its launch, ChatGPT has seen a surge in popularity. While many are drawn to its effectiveness and user-friendly interface, ChatGPT also introduces moral concerns, such as the temptation to present generated text as one’s own. This led us to theorize that personality traits such as Machiavellianism and sensation-seeking may be predictive of ChatGPT usage. We launched two online questionnaires with 2,000 respondents each, in September 2023 and March 2024, respectively. In Questionnaire 1, 22% of respondents were students, and 54% were full-time employees; 32% indicated they used ChatGPT at least weekly. Analysis of our ChatGPT Acceptance Scale revealed two factors, Effectiveness and Concerns, which correlated positively and negatively, respectively, with ChatGPT use frequency. A specific aspect of Machiavellianism (manipulation tactics) was found to predict ChatGPT usage. Questionnaire 2 was a replication of Questionnaire 1, with 21% students and 54% full-time employees, of which 43% indicated using ChatGPT weekly. In Questionnaire 2, more extensive personality scales were used. We found a moderate correlation between Machiavellianism and ChatGPT usage (r = .22) and with an opportunistic attitude towards undisclosed use (r = .30), relationships that largely remained intact after controlling for gender, age, education level, and the respondents’ country. We conclude that covert use of ChatGPT is associated with darker personality traits, something that requires further attention.