Facebook
TwitterAttribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...
Facebook
TwitterAttribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).
This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).
We appreciate it, if you cite the following article if you happen to use this dataset in any scientific publication:
Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset includes all chat conversations generated by GPT-4 that are hosted on open Huggingface datasets. Everything is converted to the same format so the datasets can be easily merged and used for large scale training of LLMs.
This dataset is a collection of several single chat datasets. If you use this dataset in your research, please credit the original authors of the internal datasets. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study aims to explore students' associations with Artificial Intelligence (AI) and how these perceptions have evolved following the release of Chat GPT. A free word association test was conducted with 836 German high school students aged 10–20. Associations were collected before and after the release of Chat GPT, processed, cleaned, and inductively categorized into nine groups: technical association, assistance system, future, human, negative, positive, artificial, others, and no association. In total, 355 distinct terms were mentioned, with “robot” emerging as the most frequently cited, followed by “computer” and “Chat GPT,” indicating a strong connection between AI and technological applications. The release of Chat GPT had a significant impact on students' associations, with a marked increase in mentions of Chat GPT and related assistance systems, such as Siri and Snapchat AI. The results reveal a shift in students' perception of AI-from abstract, futuristic concepts to more immediate, application-based associations. Network analysis further demonstrated how terms were semantically clustered, emphasizing the prominence of assistance systems in students' conceptions. The findings underscore the importance of integrating AI education that fosters both critical reflection and practical understanding of AI, encouraging responsible engagement with the technology. These insights are crucial for shaping the future of AI literacy in schools and universities.
Facebook
TwitterIn January 2024, ChatGPT online domain chat.openai.com registered over **** percent of its traffic as originating in the United States. Users based in India generated approximately **** percent of the total visits to the chatbot platform, while users in Indonesia accounted for *** percent of the total visits to the website. Visits from Brazil represented the fourth-largest group for the platform, generating more than **** percent of the total traffic recorded in the examined period.
Facebook
TwitterIn October 2025 alone, ChatGPT’s mobile app recorded over **** million iOS and Android downloads worldwide. This brings the total downloads to about *** billion since May 2023. The ChatGPT platform was launched on November 30, 2022, and has attracted global attention and interest. It has led the global artificial intelligence push with its chatbot format, transforming industries like online search.
Facebook
TwitterI will be updating it more in the future as I improve the prompts. If you notice anything odd or if you have any questions, please don't hesitate to ask!
The prompts were a combination of the PERSUADE corpus and some from GPT-4. Essays for the same prompt are generated with different temperatures, top_k values, and slightly different prompts.
All together there were 15 prompts from PERSUADE and 20 from GPT-4. All prompts are below:
persuade_prompts = ['Today the majority of humans own and operate cell phones on a daily basis. In essay form, explain if drivers should or should not be able to use cell phones in any capacity while operating a vehicle.',
'Write an explanatory essay to inform fellow citizens about the advantages of limiting car usage. Your essay must be based on ideas and information that can be found in the passage set. Manage your time carefully so that you can read the passages; plan your response; write your response; and revise and edit your response. Be sure to use evidence from multiple sources; and avoid overly relying on one source. Your response should be in the form of a multiparagraph essay. Write your essay in the space provided.',
'Some schools require students to complete summer projects to assure they continue learning during their break. Should these summer projects be teacher-designed or student-designed? Take a position on this question. Support your response with reasons and specific examples.',
"You have just read the article, 'A Cowboy Who Rode the Waves.' Luke's participation in the Seagoing Cowboys program allowed him to experience adventures and visit many unique places. Using information from the article, write an argument from Luke's point of view convincing others to participate in the Seagoing Cowboys program. Be sure to include: reasons to join the program; details from the article to support Luke's claims; an introduction, a body, and a conclusion to your essay.",
'Your principal has decided that all students must participate in at least one extracurricular activity. For example, students could participate in sports, work on the yearbook, or serve on the student council. Do you agree or disagree with this decision? Use specific details and examples to convince others to support your position. ',
'In "The Challenge of Exploring Venus," the author suggests studying Venus is a worthy pursuit despite the dangers it presents. Using details from the article, write an essay evaluating how well the author supports this idea. Be sure to include: a claim that evaluates how well the author supports the idea that studying Venus is a worthy pursuit despite the dangers; an explanation of the evidence from the article that supports your claim; an introduction, a body, and a conclusion to your essay.',
'In the article "Making Mona Lisa Smile," the author describes how a new technology called the Facial Action Coding System enables computers to identify human emotions. Using details from the article, write an essay arguing whether the use of this technology to read the emotional expressions of students in a classroom is valuable.',
"You have read the article 'Unmasking the Face on Mars.' Imagine you are a scientist at NASA discussing the Face with someone who thinks it was created by aliens. Using information in the article, write an argumentative essay to convince someone that the Face is just a natural landform.Be sure to include: claims to support your argument that the Face is a natural landform; evidence from the article to support your claims; an introduction, a body, and a conclusion to your argumentative essay.",
'Some of your friends perform community service. For example, some tutor elementary school children and others clean up litter. They think helping the community is very important. But other friends of yours think community service takes too much time away from what they need or want to do.
Your principal is deciding whether to require all students to perform community service.
Write a letter to your principal in which you take a position on whether students should be required to perform community service. Support your position with examples.',
"Your principal is considering changing school po...
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Chat GPT for text classification of Colon, Lung, and Thyroid Cancer. This data set was created in a novel way with the answers given to the questions created by Chat GPT.
In addition, the terms “colon, lung, and thyroid” have been removed in this dataset.
With this dataset, artificial intelligence-based models can be developed for the classification of Colon, Lung, and Thyroid Cancer.
Details of the dataset are available in the following article: - Sunnetci, K. M. (2025). Biomedical text-based detection of colon, lung, and thyroid cancer: A deep learning approach with novel dataset. Displays, 103068. https://doi.org/10.1016/j.displa.2025.103068
If you use this dataset, please cite the article above.
The authors would like to thank ChatGPT for its support in preparing the data.
Facebook
Twitterhttp://www.apache.org/licenses/LICENSE-2.0http://www.apache.org/licenses/LICENSE-2.0
This dataset contains 56,964 conversations between een AI assistant and a (fake) "Human" (generated) in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset.
Data Instances
{'id': 9, 'input': 'Het gesprek tussen de mens en de AI-assistent. ' '[|Human|] Ik heb wat hulp nodig met Spark SQL union en het ordenen ' 'van kolommen. ' '[|AI|] Natuurlijk, ik help je graag. Waar heb je specifiek hulp ' 'bij nodig? ' '[|Human|] Ik wil de output van een UNION-query ordenen op een ' 'kolom die niet aanwezig is in de SELECT-instructie. Is dat ' 'mogelijk? ' '[|AI|] Ja, het is mogelijk met Spark SQL. Je kunt de ordinale ' 'positie van de kolom in de ORDER BY-clausule gebruiken in plaats ' 'van de kolomnaam. Als je bijvoorbeeld wilt ordenen op de derde ' 'kolom in de SELECT-instructie, kun je ORDER BY 3 gebruiken. ' '[|Human|] Dat is nuttig. Kan ik ook een alias gebruiken in de ' 'ORDER BY-clausule? ' '[|AI|] Ja, je kunt een alias gebruiken in de ORDER BY-clausule. De ' 'alias moet worden gedefinieerd in de SELECT-instructie. Als je ' 'bijvoorbeeld de derde kolom als "col3" alias geeft, kun je ordenen ' 'met ORDER BY col3. ' '[|Human|] Oké, ik denk dat ik het begrijp. Kan ik dezelfde alias ' 'gebruiken voor kolommen uit beide tabellen in de UNION? ' '[|AI|] Nee, je kunt dezelfde alias niet gebruiken voor kolommen ' 'uit beide tabellen in de UNION. Elke tabel in de UNION moet een ' 'unieke alias hebben voor de kolommen. ' '[|Human|] ', 'topic': 'Spark SQL UNION - ORDER BY kolom niet in SELECT'},
Data Fields
id: the ID of the item. The following 82 IDs are not included because they could not be translated: [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992]
input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].
topic: the topic description
Dataset Creation
Both the translations and the topics were translated with OpenAI's API for gpt-3.5-turbo. max_tokens=1024, temperature=0 as parameters.
The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):
CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.
Here are the requirements that you should adhere to:
1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other;
2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places;
3. ensure accurate translation and keep the correctness of the conversation;
4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
5. translate the human's text using informal, but standard, language;
6. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang};
8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested;
9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.
Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.
"""
The prompt to translate the topic is:
TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct,"
" summarizing manner. Translate accurately and formally. Do not provide any explanation"
" about the translation and do not include the original title.
"
The system message was:
You are a helpful assistant that translates English to Dutch to the requirements that are given to you.
Note that 82 items (0.1%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992].
The translation quality has not been verified. Use at your own risk!
Licensing Information
Licensing info for Stack Overflow Questions is listed as Apache 2.0. If you use the current dataset, you should also adhere to the original license.
This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.
If you use this dataset, you must also follow the Sharing and Usage policies.
As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.
This dataset is also available on the Hugging Face hub with the same DOI and license. See that README for more info.
Facebook
TwitterIn the week from October 19 to 25, 2025, global Google searches for the word "ChatGPT" reached a peak of 100 index points, indicating a significant increase in interest and thus the highest interest over the observed period. On October 21, 2025, OpenAI introduced ChatGPT Atlas, a web browser with ChatGPT built in. Interest in the chatbot, developed by U.S.-based OpenAI and launched in November 2022, started rising in the week ending December 3, 2022. ChatGPT, which stands for Chat Generative Pre-trained Transformer, is an AI-powered auto-generative text system able to give human-sounding replies and reproduce human-like interactions when prompted.
Facebook
Twitterhttps://sqmagazine.co.uk/privacy-policy/https://sqmagazine.co.uk/privacy-policy/
The rivalry between ChatGPT and Google Gemini defines the generative AI landscape. ChatGPT remains the leader in active engagement, while Gemini closes the gap through mass distribution. From corporate reports to web traffic studies, figures speak clearly about adoption, reach, and momentum. Explore what makes each platform stand out, and what...
Facebook
Twitterhttps://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub
License
CC-0
Facebook
TwitterIn March 2025, ChatGPT.com received approximately *** billion visits from users worldwide. The most recent year under analysis has seen an increase in traffic to OpenAI's artificial intelligence chatbot. This is the highest traffic volume achieved by the site to date, with values for the most recent analyzed month exceeding twice the average monthly visits for the entire examined period between April 2023 and April 2024.
Facebook
Twitterhttps://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Discover the booming AI Skincare Advisor market! Learn about its projected $2.5 billion valuation by 2033, key growth drivers, leading companies like Reveive and Bioderma, and the challenges ahead. Get insights into personalized skincare, AI technology, and market trends.
Facebook
Twitterhttps://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
This dataset can be used to train Large Language Models such as GPT, Llama2 and Falcon, both for Fine Tuning and Domain Adaptation.
The dataset has the following specs:
The categories and intents have been selected from Bitext's collection of 20 vertical-specific datasets, covering the intents that are common across all 20 verticals. The verticals are:
For a full list of verticals and its intents see https://www.bitext.com/chatbot-verticals/.
The question/answer pairs have been generated using a hybrid methodology that uses natural texts as source text, NLP technology to extract seeds from these texts, and NLG technology to expand the seed texts. All steps in the process are curated by computational linguists.
The dataset contains an extensive amount of text data across its 'instruction' and 'response' columns. After processing and tokenizing the dataset, we've identified a total of 3.57 million tokens. This rich set of tokens is essential for training advanced LLMs for AI Conversational, AI Generative, and Question and Answering (Q&A) models.
Each entry in the dataset contains the following fields:
The categories and intents covered by the dataset are:
The entities covered by the dataset are:
Facebook
TwitterIn October 2025, ChatGPT’s mobile app recorded ***** million App Store and Google Play downloads worldwide. Google's Gemini AI Assistant mobile app was released on February 8, 2024, and was initially available in the U.S. market only. In the latest month observed, the app registered ***** million downloads, a *****percent decline from September 2025, but higher than the download count of ChatGPT. Regional preferences shape AI app adoption ChatGPT has a strong global presence with over ****** million monthly active users in February 2025, but regional preferences vary. In the United States, ChatGPT had a **-percent download market share, compared to Google Gemini's ** percent. However, Gemini emerged as the preferred generative AI app in India, representing a **-percent market share. This competitive landscape now also includes Chinese-based players like ByteDance's Doubao and DeepSeek, indicating an even more diverse and evolving AI worldwide ecosystem. The AI-powered revolution in online search The global AI market has experienced substantial growth, exceeding *** billion U.S. dollars in 2024 and projected to surpass *** billion U.S. dollars by 2030. This expansion is mirrored in user behavior, with around ** million adults in the United States using AI-powered tools as their first option for online search in 2024. Additionally, ** percent of U.S. adults reported the use of AI-powered search engines for exploring new topics in 2024, with another ** percent of respondents utilizing these tools to learn or explain concepts.
Facebook
TwitterDataset Card for "sales-conversations"
This dataset was created for the purpose of training a sales agent chatbot that can convince people. The initial idea came from: textbooks is all you need https://arxiv.org/abs/2306.11644 gpt-3.5-turbo was used for the generation
Structure
The conversations have a customer and a salesman which appear always in changing order. customer, salesman, customer, salesman, etc. The customer always starts the conversation Who ends the… See the full description on the dataset page: https://huggingface.co/datasets/goendalf666/sales-conversations.
Facebook
TwitterAmazon Question-Answer Dataset - Extracted and Preprocessed for GPT-2 Fine-Tuning
Description:
This dataset comprises raw data meticulously extracted from the Amazon Question-Answer dataset. Its primary focus is to extract question and answer pairs and convert them into a structured JSON format.
The extracted dataset has undergone thorough preprocessing to make it suitable for fine-tuning the GPT-2 language model. This refined dataset, known as "pre-processed-chat-dataset," is readily available for exploration on Kaggle and can also be accessed through my Kaggle account with the username "ben alla ismail."
Key Dataset Highlights:
Dataset Insights:
Here is a glimpse of the dataset structure:
- Sample Entry:
json
{
"noitseuq": "Does this panel come with the connection ribbon cable?",
"rewsna": "No, it doesn't. I used the old one."
}
Data Statistics: - Training Set: 2.17 GB - Test Set: 273.34 MB - Validation Set: 271.41 MB
For deeper insights, regular updates, and easy access to this valuable dataset, please explore the following links:
Facebook
TwitterI create a dashboard of Indian freedom fighters for that i have used excel and power bi First i have collected the data of freedom fighter by the help of excel power query and web scraping and chat gpt . i created two files one is for image URL and other one is for details of freedom fighter .Then i have stated cleaning up dataset and formatting according to my needs. then I have used power bi for data visualization.First i imporated excel datasets into power bi and also made relationship between that two file on the basis of common column of name in one to one cardinality . then i have made the dashboard design and used image web for photograph and card visual for name and then i made the second visual for the details information about person for that i have used only card visual for it and also have add drillthrough filter for details visuals . Then i have used play axis bar for the dynamic play of all the persons. and tried once with play option also checked for drill through it was worked. In this way i have made dashboard of indian freedom fighters dashboard.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterAttribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...