Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents ChatGPT usage patterns across different age groups, showing the percentage of users who have followed its advice, used it without following advice, or have never used it, based on a 2025 U.S. survey.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents ChatGPT usage patterns across U.S. Census regions, based on a 2025 nationwide survey. It tracks how often users followed, partially used, or never used ChatGPT by state region.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows the types of advice users sought from ChatGPT based on a 2025 U.S. survey, including education, financial, medical, and legal topics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows how men and women in the U.S. reported using ChatGPT in a 2025 survey, including whether they followed its advice or chose not to use it.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We assembled a Twitter dataset using Twitter Archiving Google Sheet (TAGS) to query Twitter's API and return relevant tweets. To analyze the marketing side of the conversation around ChatGPT, we selected #ChatGPT as a common hashtag targeting tweets that discuss AI. Because this is the marketing dataset, we additionally required the hashtags "marketing", "content creation", or "creator economy": content creation is a field heavily affected by ChatGPT's writing capabilities as a chatbot, and "creator economy" is the term experts commonly use for the overarching industry. This gave us a more specific dataset for analyzing what people well-versed in marketing, ChatGPT's ideal audience, thought about AI's role in the field. Because of TAGS's limitations, both datasets were restricted to tweets posted between January 21st and January 25th.
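The hashtag filter described above can be sketched as a simple predicate over tweet text. This is a hypothetical illustration: the tweets are invented, and the single-string representation is an assumption, not the TAGS spreadsheet schema.

```python
# Hypothetical sketch of the hashtag filter; tweets are invented and this
# is not the TAGS spreadsheet schema.
def matches_marketing_filter(tweet_text: str) -> bool:
    """Keep tweets that pair #ChatGPT with a marketing-related hashtag."""
    text = tweet_text.lower()
    marketing_tags = ("#marketing", "#contentcreation", "#creatoreconomy")
    return "#chatgpt" in text and any(tag in text for tag in marketing_tags)

tweets = [
    "#ChatGPT is changing #marketing forever",
    "#ChatGPT writes poems",
    "The #creatoreconomy loves #ChatGPT",
]
# Keeps only the tweets that mention both #ChatGPT and a marketing tag.
selected = [t for t in tweets if matches_marketing_filter(t)]
```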
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents how much users trust ChatGPT across different advice categories, including career, education, financial, legal, and medical advice, based on a 2025 U.S. survey.
https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a dataset repository of Awesome ChatGPT Prompts. View all prompts on GitHub.
License
CC0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compares how much U.S. adults trust ChatGPT relative to Google Search, including responses from a 2025 national survey measuring perceptions of AI accuracy and reliability.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides over 15,000 prompt-response records designed to power dynamic ChatGPT-style applications. It was created by Databricks employees to facilitate the use of large language models (LLMs) for interactive dialogue. The records span eight distinct instruction categories and deliberately avoid information from external web sources, with the exception of Wikipedia for specific instruction sets. This open-source resource is ideal for exploring the boundaries of text-based conversations and uncovering new insights into natural language processing.
The dataset is typically provided as a data file, usually in CSV format, with the main train.csv file containing the full set of over 15,000 records. Each record represents a unique prompt-response pair, i.e. a single turn in a conversation between two individuals. All columns are of string type.
This dataset is suited for a variety of applications and use cases:
* Training dialogue systems by developing multiple funneling pipelines to enrich models with real-world conversations.
* Creating intelligent chatbot interactions.
* Generating natural language answers as part of Q&A systems.
* Utilising excerpts from Wikipedia for particular subsets of instruction categories.
* Leveraging the classification labels with supervised learning techniques, such as multi-class classification neural networks or logistic regression classifiers.
* Developing deep learning models to detect and respond to conversational intent.
* Training language models for customer service queries using natural language processing (NLP).
* Creating custom dialogue agents capable of handling more intricate conversational interactions.
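As a hypothetical illustration of the supervised-learning use case, the category column can serve as a class label. The sketch below parses a few invented rows in the Dolly column layout (instruction, context, response, category) and tallies labels with the standard library; the sample data is made up, not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Invented sample rows in the Dolly column layout; the real train.csv
# holds over 15,000 such records.
sample_csv = """instruction,context,response,category
What is the capital of France?,,Paris.,open_qa
Summarise the passage.,Long text here.,Short summary.,summarization
Name three fruits.,,"Apple, banana, cherry.",brainstorming
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
# The category column is a ready-made label for multi-class classification.
category_counts = Counter(row["category"] for row in rows)
```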
The dataset has a global reach. It was listed on 17/06/2025, and its content focuses on general conversational and Q&A interactions, without specific demographic limitations.
CC0
This dataset is valuable for a wide range of users, including AI/ML developers, researchers, and data scientists looking to:
* Build and train conversational AI models.
* Develop advanced chatbot applications.
* Explore new insights in natural language processing.
* Create bespoke dialogue agents for various sectors, such as customer service.
* Apply supervised learning to classify conversational data.
Original Data Source: Databricks Dolly (15K)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for UltraChat 200k
Dataset Description
This is a heavily filtered version of the UltraChat dataset that was used to train Zephyr-7B-β, a state-of-the-art 7B chat model. The original dataset consists of 1.4M dialogues generated by ChatGPT and spanning a wide range of topics. To create UltraChat 200k, we applied the following logic:
Selection of a subset of data for faster supervised fine tuning. Truecasing of the dataset, as we observed around 5% of the data… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset reflects how Americans perceive ChatGPT's broader societal impact, based on a 2025 survey that asked whether the AI will help or harm humanity.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is a unique resource for Natural Language Processing (NLP) research, combining conversations between AI and humans that were extracted from online chat logs. Its purpose is to explore how human conversations can inform the development of conversational AI models, offering insights into connecting people with technology through meaningful dialogue. The dataset includes responses from AI systems, questions from humans, and outputs from popular models such as ChatGPT and Llama2-13b-Chat.
The data is typically provided in CSV format; the train.csv file is part of this dataset. It contains conversations, with 12,552 unique values in the system column, 12,440 in the chatgpt column, and 12,851 in the llama2-13b-chat column.
This dataset is ideal for:
* Developing and improving natural language processing algorithms for AI-human conversation.
* Building user-friendly chatbots that are better at recognising and understanding human intent by training models using this dataset.
* Designing recommendation systems to predict user questions and generate more accurate responses based on prior conversations.
* Exploring conversational techniques that enable natural language communication between humans and machines.
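Since the source is a DPO (Direct Preference Optimization) pairs dataset, one plausible preprocessing step is assembling (prompt, chosen, rejected) triples from the per-model response columns. This is a hedged sketch with invented rows; the actual column names, and which response counts as preferred, depend on the real data rather than the column order assumed here.

```python
from typing import NamedTuple

class PreferencePair(NamedTuple):
    prompt: str
    chosen: str
    rejected: str

# Invented rows: a human question plus responses from two models.
rows = [
    {"question": "What is 2 + 2?",
     "chatgpt": "4.",
     "llama2-13b-chat": "I believe the answer is 4."},
]

# We arbitrarily treat the chatgpt response as preferred here; a real DPO
# dataset records the preference explicitly rather than by column.
pairs = [
    PreferencePair(r["question"], r["chatgpt"], r["llama2-13b-chat"])
    for r in rows
]
```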
The dataset's scope is global. While the specific time range of the included conversations is not detailed in the sources, the dataset was listed on 16th June 2025. It primarily covers interactions between AI systems and human users.
CC0
This dataset is intended for:
* Natural Language Processing (NLP) researchers seeking to understand and advance human-centric AI.
* Developers focused on building and refining conversational AI models and chatbots.
* Data scientists working on recommendation systems.
* Anyone interested in the development of meaningful dialogue between humans and technology.
Original Data Source: Orca DPO Dialogue Pairs
https://creativecommons.org/publicdomain/zero/1.0/
AI Awareness & Usage – Survey
Artificial Intelligence (AI) is no longer a concept of the future—it's already part of our everyday lives. From smart assistants to creative tools, AI is transforming how we live, work, and even express ourselves.
This short and thoughtful survey aims to explore not just how much people know about AI, but also how they feel about its growing role in society.
You'll be asked about:
Your awareness and understanding of AI
Whether you’ve used AI-powered tools like ChatGPT, Google Bard, etc.
Your opinion on whether AI makes life easier
How you would personify AI, if it were a human
Your openness to AI creating things like books, music, or movies
Your responses will help us better understand public perception of AI—both its current impact and its potential to shape the future of creativity, communication, and technology.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows the percentage of U.S. adults who say they trust ChatGPT more than a human expert, based on a 2025 national AI trust survey.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset summarizes how ChatGPT users rated the outcomes of the advice they received, including whether it was helpful, harmful, neutral, or uncertain, based on a 2025 U.S. survey.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Every November Dublin City Council (DCC) conducts traffic counts at 33 locations on entry points into the city centre, around a 'cordon' formed by the Royal and Grand Canals. As the name suggests, the cordon has been chosen to ensure (as far as possible) that any person entering the city centre from outside must pass through one of the 33 locations where the surveys are undertaken. In addition, every May a wider traffic count survey is carried out at approximately 60 locations: beyond the canal cordon locations, further counts are taken at bridges along the River Liffey and at points such as Parnell Street and St. Stephen's Green. These traffic counts provide a reliable measurement of the modal distribution of persons travelling into, and out of, Dublin City on a year-on-year comparable basis. The data collected is divided into the various transport modes, allowing us to better understand changing usage trends in cycling, walking and the various vehicle types. Resources include a map of the 33 cordon locations where data is annually collected. All 33 cordon points are on routes for general traffic into the city centre, while 22 are on bus routes into the city. The numbers of people using Bus, Luas, DART and suburban rail services to enter the city centre are collated from each of the service providers, and an Annual Monitoring Report is prepared by the National Transport Authority.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vision-language models are of interest in various domains, including automated driving, where computer vision techniques can accurately detect road users, but where the vehicle sometimes fails to understand context. This study examined the effectiveness of GPT-4V in predicting the level of ‘risk’ in traffic images as assessed by humans. We used 210 static images taken from a moving vehicle, each previously rated by approximately 650 people. Based on psychometric construct theory and using insights from the self-consistency prompting method, we formulated three hypotheses: 1) repeating the prompt under effectively identical conditions increases validity, 2) varying the prompt text and extracting a total score increases validity compared to using a single prompt, and 3) in a multiple regression analysis, the incorporation of object detection features, alongside the GPT-4V-based risk rating, significantly contributes to improving the model’s validity. Validity was quantified by the correlation coefficient with human risk scores, across the 210 images. The results confirmed the three hypotheses. The eventual validity coefficient was r = 0.83, indicating that population-level human risk can be predicted using AI with a high degree of accuracy. The findings suggest that GPT-4V must be prompted in a way equivalent to how humans fill out a multi-item questionnaire.
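The validity computation described above, averaging repeated GPT-4V ratings per image and then correlating with human risk scores, can be sketched with the standard library alone. All numbers below are invented for illustration; the study itself used 210 images rated by roughly 650 people each.

```python
# Invented numbers: the study used 210 images and ~650 human raters each.
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothesis 1: repeat the prompt and average the model's risk ratings.
model_runs = [[3, 4, 3], [7, 8, 8], [5, 5, 6], [2, 2, 3]]
model_scores = [sum(run) / len(run) for run in model_runs]

human_scores = [3.2, 7.9, 5.4, 2.1]  # population-level human risk scores
r = pearson_r(model_scores, human_scores)  # validity coefficient
```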