19 datasets found

b
ChatGPT Revenue and Usage Statistics (2025)
businessofapps.com
Updated Feb 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Business of Apps (2023). ChatGPT Revenue and Usage Statistics (2025) [Dataset]. https://www.businessofapps.com/data/chatgpt-statistics/
Explore at:
Dataset updated
Feb 9, 2023
Dataset authored and provided by
Business of Apps
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...
ChatGPT Classification Dataset
kaggle.com
zip
Updated Sep 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mahdi (2023). ChatGPT Classification Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimaktabdar/chatgpt-classification-dataset
Explore at:
zip(718710 bytes)Available download formats
Dataset updated
Sep 7, 2023
Authors
Mahdi
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).

This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).

We appreciate it, if you cite the following article if you happen to use this dataset in any scientific publication:

Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.

https://www.techrxiv.org/users/692552/articles/682641/master/file/data/ChatGPT_generated_Content_Detection/ChatGPT_generated_Content_Detection.pdf
All GPT-4 Conversations
kaggle.com
Updated Nov 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). All GPT-4 Conversations [Dataset]. https://www.kaggle.com/datasets/thedevastator/all-gpt-4-synthetic-chat-datasets
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 21, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
The Devastator
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description

All GPT-4 Generated Datasets

Every chat dataset generated by GPT-4 from Huggingface at the same format

From [Huggingface datasets]

About this dataset

How to use the dataset

The dataset includes all chat conversations generated by GPT-4 that are hosted on open Huggingface datasets. Everything is converted to the same format so the datasets can be easily merged and used for large scale training of LLMs.

Acknowledgements

This dataset is a collection of several single chat datasets. If you use this dataset in your research, please credit the original authors of the internal datasets. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
f
Data Sheet 1_Free word association analysis of students' perception of...
frontiersin.figshare.com
pdf
Updated May 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Marvin Henrich; Sandra Formella-Zimmermann; Sebastian Schneider; Paul Wilhelm Dierkes (2025). Data Sheet 1_Free word association analysis of students' perception of artificial intelligence.pdf [Dataset]. http://doi.org/10.3389/feduc.2025.1543746.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/feduc.2025.1543746.s001
Dataset updated
May 21, 2025
Dataset provided by
Frontiers
Authors
Marvin Henrich; Sandra Formella-Zimmermann; Sebastian Schneider; Paul Wilhelm Dierkes
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study aims to explore students' associations with Artificial Intelligence (AI) and how these perceptions have evolved following the release of Chat GPT. A free word association test was conducted with 836 German high school students aged 10–20. Associations were collected before and after the release of Chat GPT, processed, cleaned, and inductively categorized into nine groups: technical association, assistance system, future, human, negative, positive, artificial, others, and no association. In total, 355 distinct terms were mentioned, with “robot” emerging as the most frequently cited, followed by “computer” and “Chat GPT,” indicating a strong connection between AI and technological applications. The release of Chat GPT had a significant impact on students' associations, with a marked increase in mentions of Chat GPT and related assistance systems, such as Siri and Snapchat AI. The results reveal a shift in students' perception of AI-from abstract, futuristic concepts to more immediate, application-based associations. Network analysis further demonstrated how terms were semantically clustered, emphasizing the prominence of assistance systems in students' conceptions. The findings underscore the importance of integrating AI education that fosters both critical reflection and practical understanding of AI, encouraging responsible engagement with the technology. These insights are crucial for shaping the future of AI literacy in schools and universities.
ChatGPT website traffic share 2024, by country
statista.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista, ChatGPT website traffic share 2024, by country [Dataset]. https://www.statista.com/statistics/1463911/chatgpt-chat-open-ai-com-traffic-share-by-country/
Explore at:
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jan 2024
Area covered
Worldwide
Description
In January 2024, ChatGPT online domain chat.openai.com registered over **** percent of its traffic as originating in the United States. Users based in India generated approximately **** percent of the total visits to the chatbot platform, while users in Indonesia accounted for *** percent of the total visits to the website. Visits from Brazil represented the fourth-largest group for the platform, generating more than **** percent of the total traffic recorded in the examined period.
Global ChatGPT monthly app downloads 2025
statista.com
Updated Nov 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Global ChatGPT monthly app downloads 2025 [Dataset]. https://www.statista.com/statistics/1386342/chat-gpt-app-downloads/
Explore at:
Dataset updated
Nov 27, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
May 2023 - Oct 2025
Area covered
Worldwide, United States
Description
In October 2025 alone, ChatGPT’s mobile app recorded over **** million iOS and Android downloads worldwide. This brings the total downloads to about *** billion since May 2023. The ChatGPT platform was launched on November 30, 2022, and has attracted global attention and interest. It has led the global artificial intelligence push with its chatbot format, transforming industries like online search.

daigt data - llama 70b and falcon180b

kaggle.com

zip

Updated Nov 26, 2023

Facebook

Twitter

Click to copy link

Link copied

Cite

Nicholas Broad (2023). daigt data - llama 70b and falcon180b [Dataset]. https://www.kaggle.com/datasets/nbroad/daigt-data-llama-70b-and-falcon180b

Explore at:

zip(6163526 bytes)Available download formats

Dataset updated

Nov 26, 2023

Authors

Nicholas Broad

Description

This is for the LLM - Detect AI Generated Text (DAIGT) competition.

Versions

1. Very light processing out of LLM. 1k from llama-70b-chat, 1k from falcon-180b-chat across all persuade prompts and some extras from gpt-4

2. Added llama70b_v2.csv by cleaning up Llama 70b output as seen in this notebook. Same data, just with some text removed from samples.

3. 500 generated samples from llama 70b and falcon 180b for each prompt in RDizzl3_seven. (3,500 total for llama70b; 3,500 total for falcon180b). These had sources in the the prompt, unlike earlier versions.

I will be updating it more in the future as I improve the prompts. If you notice anything odd or if you have any questions, please don't hesitate to ask!

Prompts

The prompts were a combination of the PERSUADE corpus and some from GPT-4. Essays for the same prompt are generated with different temperatures, top_k values, and slightly different prompts.

All together there were 15 prompts from PERSUADE and 20 from GPT-4. All prompts are below:

persuade_prompts = ['Today the majority of humans own and operate cell phones on a daily basis. In essay form, explain if drivers should or should not be able to use cell phones in any capacity while operating a vehicle.',
 'Write an explanatory essay to inform fellow citizens about the advantages of limiting car usage. Your essay must be based on ideas and information that can be found in the passage set. Manage your time carefully so that you can read the passages; plan your response; write your response; and revise and edit your response. Be sure to use evidence from multiple sources; and avoid overly relying on one source. Your response should be in the form of a multiparagraph essay. Write your essay in the space provided.',
 'Some schools require students to complete summer projects to assure they continue learning during their break. Should these summer projects be teacher-designed or student-designed? Take a position on this question. Support your response with reasons and specific examples.',
 "You have just read the article, 'A Cowboy Who Rode the Waves.' Luke's participation in the Seagoing Cowboys program allowed him to experience adventures and visit many unique places. Using information from the article, write an argument from Luke's point of view convincing others to participate in the Seagoing Cowboys program. Be sure to include: reasons to join the program; details from the article to support Luke's claims; an introduction, a body, and a conclusion to your essay.",
 'Your principal has decided that all students must participate in at least one extracurricular activity. For example, students could participate in sports, work on the yearbook, or serve on the student council. Do you agree or disagree with this decision? Use specific details and examples to convince others to support your position. ',
 'In "The Challenge of Exploring Venus," the author suggests studying Venus is a worthy pursuit despite the dangers it presents. Using details from the article, write an essay evaluating how well the author supports this idea. Be sure to include: a claim that evaluates how well the author supports the idea that studying Venus is a worthy pursuit despite the dangers; an explanation of the evidence from the article that supports your claim; an introduction, a body, and a conclusion to your essay.',
 'In the article "Making Mona Lisa Smile," the author describes how a new technology called the Facial Action Coding System enables computers to identify human emotions. Using details from the article, write an essay arguing whether the use of this technology to read the emotional expressions of students in a classroom is valuable.',
 "You have read the article 'Unmasking the Face on Mars.' Imagine you are a scientist at NASA discussing the Face with someone who thinks it was created by aliens. Using information in the article, write an argumentative essay to convince someone that the Face is just a natural landform.Be sure to include: claims to support your argument that the Face is a natural landform; evidence from the article to support your claims; an introduction, a body, and a conclusion to your argumentative essay.",
 'Some of your friends perform community service. For example, some tutor elementary school children and others clean up litter. They think helping the community is very important. But other friends of yours think community service takes too much time away from what they need or want to do. 
Your principal is deciding whether to require all students to perform community service. 
Write a letter to your principal in which you take a position on whether students should be required to perform community service. Support your position with examples.',
 "Your principal is considering changing school po...

Colon, Lung, and Thyroid Cancer Classification
kaggle.com
zip
Updated May 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kubilay Sünnetci (2025). Colon, Lung, and Thyroid Cancer Classification [Dataset]. https://www.kaggle.com/datasets/kubilaysnnetci/colon-lung-and-thyroid-cancer-classification/code
Explore at:
zip(428170 bytes)Available download formats
Dataset updated
May 29, 2025
Authors
Kubilay Sünnetci
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset was created by Chat GPT for text classification of Colon, Lung, and Thyroid Cancer. This data set was created in a novel way with the answers given to the questions created by Chat GPT.

In addition, the terms “colon, lung, and thyroid” have been removed in this dataset.

With this dataset, artificial intelligence-based models can be developed for the classification of Colon, Lung, and Thyroid Cancer.

Details of the dataset are available in the following article: - Sunnetci, K. M. (2025). Biomedical text-based detection of colon, lung, and thyroid cancer: A deep learning approach with novel dataset. Displays, 103068. https://doi.org/10.1016/j.displa.2025.103068

If you use this dataset, please cite the article above.

The authors would like to thank ChatGPT for its support in preparing the data.
Z
Stack Overflow Chat Dutch
data.niaid.nih.gov
Updated Jun 20, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vanroy, Bram (2023). Stack Overflow Chat Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8052379
Explore at:
Dataset updated
Jun 20, 2023
Authors
Vanroy, Bram
License
http://www.apache.org/licenses/LICENSE-2.0http://www.apache.org/licenses/LICENSE-2.0
Description
This dataset contains 56,964 conversations between een AI assistant and a (fake) "Human" (generated) in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset.

Data Instances

{'id': 9, 'input': 'Het gesprek tussen de mens en de AI-assistent. ' '[|Human|] Ik heb wat hulp nodig met Spark SQL union en het ordenen ' 'van kolommen. ' '[|AI|] Natuurlijk, ik help je graag. Waar heb je specifiek hulp ' 'bij nodig? ' '[|Human|] Ik wil de output van een UNION-query ordenen op een ' 'kolom die niet aanwezig is in de SELECT-instructie. Is dat ' 'mogelijk? ' '[|AI|] Ja, het is mogelijk met Spark SQL. Je kunt de ordinale ' 'positie van de kolom in de ORDER BY-clausule gebruiken in plaats ' 'van de kolomnaam. Als je bijvoorbeeld wilt ordenen op de derde ' 'kolom in de SELECT-instructie, kun je ORDER BY 3 gebruiken. ' '[|Human|] Dat is nuttig. Kan ik ook een alias gebruiken in de ' 'ORDER BY-clausule? ' '[|AI|] Ja, je kunt een alias gebruiken in de ORDER BY-clausule. De ' 'alias moet worden gedefinieerd in de SELECT-instructie. Als je ' 'bijvoorbeeld de derde kolom als "col3" alias geeft, kun je ordenen ' 'met ORDER BY col3. ' '[|Human|] Oké, ik denk dat ik het begrijp. Kan ik dezelfde alias ' 'gebruiken voor kolommen uit beide tabellen in de UNION? ' '[|AI|] Nee, je kunt dezelfde alias niet gebruiken voor kolommen ' 'uit beide tabellen in de UNION. Elke tabel in de UNION moet een ' 'unieke alias hebben voor de kolommen. ' '[|Human|] ', 'topic': 'Spark SQL UNION - ORDER BY kolom niet in SELECT'},

Data Fields

id: the ID of the item. The following 82 IDs are not included because they could not be translated: [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992]

input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].

topic: the topic description

Dataset Creation

Both the translations and the topics were translated with OpenAI's API for gpt-3.5-turbo. max_tokens=1024, temperature=0 as parameters.

The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):

CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.

Here are the requirements that you should adhere to: 1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other; 2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places; 3. ensure accurate translation and keep the correctness of the conversation; 4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias; 5. translate the human's text using informal, but standard, language; 6. make sure to avoid biases (such as gender bias, grammatical bias, social bias); 7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang}; 8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested; 9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.

Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.

"""

The prompt to translate the topic is:

TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct,"
" summarizing manner. Translate accurately and formally. Do not provide any explanation"
" about the translation and do not include the original title.

"

The system message was:

You are a helpful assistant that translates English to Dutch to the requirements that are given to you.

Note that 82 items (0.1%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992].

The translation quality has not been verified. Use at your own risk!

Licensing Information

Licensing info for Stack Overflow Questions is listed as Apache 2.0. If you use the current dataset, you should also adhere to the original license.

This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

If you use this dataset, you must also follow the Sharing and Usage policies.

As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

This dataset is also available on the Hugging Face hub with the same DOI and license. See that README for more info.
Global interest in ChatGPT on Google search weekly 2022-2025
statista.com
Updated Nov 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Global interest in ChatGPT on Google search weekly 2022-2025 [Dataset]. https://www.statista.com/statistics/1366930/chatgpt-google-search-weekly-worldwide/
Explore at:
Dataset updated
Nov 22, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Nov 6, 2022 - Oct 25, 2025
Area covered
Worldwide
Description
In the week from October 19 to 25, 2025, global Google searches for the word "ChatGPT" reached a peak of 100 index points, indicating a significant increase in interest and thus the highest interest over the observed period. On October 21, 2025, OpenAI introduced ChatGPT Atlas, a web browser with ChatGPT built in. Interest in the chatbot, developed by U.S.-based OpenAI and launched in November 2022, started rising in the week ending December 3, 2022. ChatGPT, which stands for Chat Generative Pre-trained Transformer, is an AI-powered auto-generative text system able to give human-sounding replies and reproduce human-like interactions when prompted.
S
ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends
sqmagazine.co.uk
Updated Oct 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SQ Magazine (2025). ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends [Dataset]. https://sqmagazine.co.uk/chatgpt-vs-google-gemini-statistics/
Explore at:
Dataset updated
Oct 7, 2025
Dataset authored and provided by
SQ Magazine
License
https://sqmagazine.co.uk/privacy-policy/https://sqmagazine.co.uk/privacy-policy/
Time period covered
Jan 1, 2024 - Dec 31, 2025
Area covered
Global
Description
The rivalry between ChatGPT and Google Gemini defines the generative AI landscape. ChatGPT remains the leader in active engagement, while Gemini closes the gap through mass distribution. From corporate reports to web traffic studies, figures speak clearly about adoption, reach, and momentum. Explore what makes each platform stand out, and what...
h
awesome-chatgpt-prompts
huggingface.co
Updated Dec 15, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 15, 2023
Authors
Fatih Kadir Akın
License
https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
Description
🧠 Awesome ChatGPT Prompts [CSV dataset]

This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub

License

CC-0
ChatGPT global web traffic 2022-2024
statista.com
Updated Nov 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). ChatGPT global web traffic 2022-2024 [Dataset]. https://www.statista.com/statistics/1463713/chatgpt-chat-openai-com-monthly-visits/
Explore at:
Dataset updated
Nov 25, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Apr 2023 - Mar 2025
Area covered
Worldwide
Description
In March 2025, ChatGPT.com received approximately *** billion visits from users worldwide. The most recent year under analysis has seen an increase in traffic to OpenAI's artificial intelligence chatbot. This is the highest traffic volume achieved by the site to date, with values for the most recent analyzed month exceeding twice the average monthly visits for the entire examined period between April 2023 and April 2024.
A
AI Skincare Advisor Report
datainsightsmarket.com
doc, pdf, ppt
Updated Aug 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Insights Market (2025). AI Skincare Advisor Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-skincare-advisor-504886
Explore at:
pdf, doc, pptAvailable download formats
Dataset updated
Aug 29, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Discover the booming AI Skincare Advisor market! Learn about its projected $2.5 billion valuation by 2033, key growth drivers, leading companies like Reveive and Bioderma, and the challenges ahead. Get insights into personalized skincare, AI technology, and market trends.
Bitext Gen AI Chatbot Customer Support Dataset
kaggle.com
zip
Updated Mar 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bitext (2024). Bitext Gen AI Chatbot Customer Support Dataset [Dataset]. https://www.kaggle.com/datasets/bitext/bitext-gen-ai-chatbot-customer-support-dataset
Explore at:
zip(3007665 bytes)Available download formats
Dataset updated
Mar 18, 2024
Authors
Bitext
License
https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
Description
Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This dataset can be used to train Large Language Models such as GPT, Llama2 and Falcon, both for Fine Tuning and Domain Adaptation.

The dataset has the following specs:

Use Case: Intent Detection

Vertical: Customer Service

27 intents assigned to 10 categories

26872 question/answer pairs, around 1000 per intent

30 entity/slot types

12 different types of language generation tags

The categories and intents have been selected from Bitext's collection of 20 vertical-specific datasets, covering the intents that are common across all 20 verticals. The verticals are:

Automotive, Retail Banking, Education, Events & Ticketing, Field Services, Healthcare, Hospitality, Insurance, Legal Services, Manufacturing, Media Streaming, Mortgages & Loans, Moving & Storage, Real Estate/Construction, Restaurant & Bar Chains, Retail/E-commerce, Telecommunications, Travel, Utilities, Wealth Management

For a full list of verticals and its intents see https://www.bitext.com/chatbot-verticals/.

The question/answer pairs have been generated using a hybrid methodology that uses natural texts as source text, NLP technology to extract seeds from these texts, and NLG technology to expand the seed texts. All steps in the process are curated by computational linguists.

Dataset Token Count

The dataset contains an extensive amount of text data across its 'instruction' and 'response' columns. After processing and tokenizing the dataset, we've identified a total of 3.57 million tokens. This rich set of tokens is essential for training advanced LLMs for AI Conversational, AI Generative, and Question and Answering (Q&A) models.

Fields of the Dataset

Each entry in the dataset contains the following fields:

flags: tags (explained below in the Language Generation Tags section)

instruction: a user request from the Customer Service domain

category: the high-level semantic category for the intent

intent: the intent corresponding to the user instruction

response: an example expected response from the virtual assistant

Categories and Intents

The categories and intents covered by the dataset are:

ACCOUNT: create_account, delete_account, edit_account, recover_password, registration_problems, switch_account

CANCELLATION_FEE: check_cancellation_fee

CONTACT: contact_customer_service, contact_human_agent

DELIVERY: delivery_options, delivery_period

FEEDBACK: complaint, review

INVOICE: check_invoice, get_invoice

ORDER: cancel_order, change_order, place_order, track_order

PAYMENT: check_payment_methods, payment_issue

REFUND: check_refund_policy, get_refund, track_refund

SHIPPING_ADDRESS: change_shipping_address, set_up_shipping_address

SUBSCRIPTION: newsletter_subscription

Entities

The entities covered by the dataset are:

{{Order Number}}, typically present in:

Intents: cancel_order, change_order, change_shipping_address, check_invoice, check_refund_policy, complaint, delivery_options, delivery_period, get_invoice, get_refund, place_order, track_order, track_refund

{{Invoice Number}}, typically present in:

Intents: check_invoice, get_invoice

{{Online Order Interaction}}, typically present in:

Intents: cancel_order, change_order, check_refund_policy, delivery_period, get_refund, review, track_order, track_refund

{{Online Payment Interaction}}, typically present in:

Intents: cancel_order, check_payment_methods

{{Online Navigation Step}}, typically present in:

Intents: complaint, delivery_options

{{Online Customer Support Channel}}, typically present in:

Intents: check_refund_policy, complaint, contact_human_agent, delete_account, delivery_options, edit_account, get_refund, payment_issue, registration_problems, switch_account

{{Profile}}, typically present in:

Intent: switch_account

{{Profile Type}}, typically present in:

Intent: switch_account

{{Settings}}, typically present in:

Intents: cancel_order, change_order, change_shipping_address, check_cancellation_fee, check_invoice, check_payment_methods, contact_human_agent, delete_account, delivery_options, edit_account, get_invoice, newsletter_subscription, payment_issue, place_order, recover_password, registration_problems, set_up_shipping_address, switch_account, track_order, track_refund

{{Online Company Portal Info}}, typically present in:

Intents: cancel_order, edit_account

{{Date}}, typically present in:

Intents: check_invoice, check_refund_policy, get_refund, track_order, track_refund

{{Date Range}}, typically present in:

Intents: check_cancellation_fee, check_invoice, get_invoice

{{Shipping Cut-off Time}}, typically present in:

Intent: delivery_options

{{Delivery City}}, typically present in:

Inten...
ChatGPT and Gemini app downloads worldwide monthly 2023-2025
statista.com
abripper.com
Updated Dec 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). ChatGPT and Gemini app downloads worldwide monthly 2023-2025 [Dataset]. https://www.statista.com/statistics/1497377/global-chatgpt-vs-gemini-app-downloads/
Explore at:
Dataset updated
Dec 3, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
May 2023 - Oct 2025
Area covered
Worldwide
Description
In October 2025, ChatGPT’s mobile app recorded ***** million App Store and Google Play downloads worldwide. Google's Gemini AI Assistant mobile app was released on February 8, 2024, and was initially available in the U.S. market only. In the latest month observed, the app registered ***** million downloads, a *****percent decline from September 2025, but higher than the download count of ChatGPT. Regional preferences shape AI app adoption ChatGPT has a strong global presence with over ****** million monthly active users in February 2025, but regional preferences vary. In the United States, ChatGPT had a **-percent download market share, compared to Google Gemini's ** percent. However, Gemini emerged as the preferred generative AI app in India, representing a **-percent market share. This competitive landscape now also includes Chinese-based players like ByteDance's Doubao and DeepSeek, indicating an even more diverse and evolving AI worldwide ecosystem. The AI-powered revolution in online search The global AI market has experienced substantial growth, exceeding *** billion U.S. dollars in 2024 and projected to surpass *** billion U.S. dollars by 2030. This expansion is mirrored in user behavior, with around ** million adults in the United States using AI-powered tools as their first option for online search in 2024. Additionally, ** percent of U.S. adults reported the use of AI-powered search engines for exploring new topics in 2024, with another ** percent of respondents utilizing these tools to learn or explain concepts.
h
sales-conversations
huggingface.co
Updated Sep 28, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ENGEL (2023). sales-conversations [Dataset]. https://huggingface.co/datasets/goendalf666/sales-conversations
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 28, 2023
Authors
ENGEL
Description
Dataset Card for "sales-conversations"

This dataset was created for the purpose of training a sales agent chatbot that can convince people. The initial idea came from: textbooks is all you need https://arxiv.org/abs/2306.11644 gpt-3.5-turbo was used for the generation

Structure

The conversations have a customer and a salesman which appear always in changing order. customer, salesman, customer, salesman, etc. The customer always starts the conversation Who ends the… See the full description on the dataset page: https://huggingface.co/datasets/goendalf666/sales-conversations.
raw-chat-dataset
kaggle.com
zip
Updated Sep 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ben alla ismail (2023). raw-chat-dataset [Dataset]. https://www.kaggle.com/datasets/benallaismail/raw-chat-dataset
Explore at:
zip(892690951 bytes)Available download formats
Dataset updated
Sep 24, 2023
Authors
ben alla ismail
Description
Amazon Question-Answer Dataset - Extracted and Preprocessed for GPT-2 Fine-Tuning

Description:

This dataset comprises raw data meticulously extracted from the Amazon Question-Answer dataset. Its primary focus is to extract question and answer pairs and convert them into a structured JSON format.

The extracted dataset has undergone thorough preprocessing to make it suitable for fine-tuning the GPT-2 language model. This refined dataset, known as "pre-processed-chat-dataset," is readily available for exploration on Kaggle and can also be accessed through my Kaggle account with the username "ben alla ismail."

Key Dataset Highlights:

Data Source: Amazon Question-Answer Dataset

Format: JSON (Structured Question-Answer Pairs)

Purpose: Fine-Tuning the GPT-2 Model

Dataset Insights:

Here is a glimpse of the dataset structure: - Sample Entry: json { "noitseuq": "Does this panel come with the connection ribbon cable?", "rewsna": "No, it doesn't. I used the old one." }

Data Statistics: - Training Set: 2.17 GB - Test Set: 273.34 MB - Validation Set: 271.41 MB

For deeper insights, regular updates, and easy access to this valuable dataset, please explore the following links:

LinkedIn: Ismail Ben Alla

GitHub: benisalla

Instagram: ismail_ben_alla

Twitter: ismail_ben_alla
Freedom fighters Dashboard
kaggle.com
zip
Updated Jan 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Prajyot palam (2024). Freedom fighters Dashboard [Dataset]. https://www.kaggle.com/datasets/prajyotpalam/freedom-fighters-dashboard/code
Explore at:
zip(653521 bytes)Available download formats
Dataset updated
Jan 26, 2024
Authors
Prajyot palam
Description
I create a dashboard of Indian freedom fighters for that i have used excel and power bi First i have collected the data of freedom fighter by the help of excel power query and web scraping and chat gpt . i created two files one is for image URL and other one is for details of freedom fighter .Then i have stated cleaning up dataset and formatting according to my needs. then I have used power bi for data visualization.First i imporated excel datasets into power bi and also made relationship between that two file on the basis of common column of name in one to one cardinality . then i have made the dashboard design and used image web for photograph and card visual for name and then i made the second visual for the details information about person for that i have used only card visual for it and also have add drillthrough filter for details visuals . Then i have used play axis bar for the dynamic play of all the persons. and tried once with play option also checked for drill through it was worked. In this way i have made dashboard of indian freedom fighters dashboard.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Business of Apps (2023). ChatGPT Revenue and Usage Statistics (2025) [Dataset]. https://www.businessofapps.com/data/chatgpt-statistics/

ChatGPT Revenue and Usage Statistics (2025)

Explore at:

26 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Feb 9, 2023

Dataset authored and provided by

Business of Apps

License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...

Clear search

Close search

Google apps

Main menu

ChatGPT Revenue and Usage Statistics (2025)

ChatGPT Classification Dataset

All GPT-4 Conversations

All GPT-4 Generated Datasets

Every chat dataset generated by GPT-4 from Huggingface at the same format

About this dataset

How to use the dataset

Acknowledgements

License

Data Sheet 1_Free word association analysis of students' perception of...

ChatGPT website traffic share 2024, by country

Global ChatGPT monthly app downloads 2025

daigt data - llama 70b and falcon180b

This is for the LLM - Detect AI Generated Text (DAIGT) competition.

Versions

1. Very light processing out of LLM. 1k from llama-70b-chat, 1k from falcon-180b-chat across all persuade prompts and some extras from gpt-4

2. Added llama70b_v2.csv by cleaning up Llama 70b output as seen in this notebook. Same data, just with some text removed from samples.

3. 500 generated samples from llama 70b and falcon 180b for each prompt in RDizzl3_seven. (3,500 total for llama70b; 3,500 total for falcon180b). These had sources in the the prompt, unlike earlier versions.

Prompts

Colon, Lung, and Thyroid Cancer Classification

Stack Overflow Chat Dutch

Global interest in ChatGPT on Google search weekly 2022-2025

ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends

awesome-chatgpt-prompts

ChatGPT global web traffic 2022-2024

AI Skincare Advisor Report

Bitext Gen AI Chatbot Customer Support Dataset

Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Overview

Dataset Token Count

Fields of the Dataset

Categories and Intents

Entities

ChatGPT and Gemini app downloads worldwide monthly 2023-2025

sales-conversations

raw-chat-dataset

Freedom fighters Dashboard

ChatGPT Revenue and Usage Statistics (2025)