28 datasets found

b
ChatGPT Revenue and Usage Statistics (2025)
businessofapps.com
Updated Feb 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Business of Apps (2023). ChatGPT Revenue and Usage Statistics (2025) [Dataset]. https://www.businessofapps.com/data/chatgpt-statistics/
Explore at:
Dataset updated
Feb 9, 2023
Dataset authored and provided by
Business of Apps
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...
All GPT-4 Conversations
kaggle.com
Updated Nov 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). All GPT-4 Conversations [Dataset]. https://www.kaggle.com/datasets/thedevastator/all-gpt-4-synthetic-chat-datasets
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 21, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
The Devastator
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description

All GPT-4 Generated Datasets

Every chat dataset generated by GPT-4 from Huggingface at the same format

From [Huggingface datasets]

About this dataset

How to use the dataset

The dataset includes all chat conversations generated by GPT-4 that are hosted on open Huggingface datasets. Everything is converted to the same format so the datasets can be easily merged and used for large scale training of LLMs.

Acknowledgements

This dataset is a collection of several single chat datasets. If you use this dataset in your research, please credit the original authors of the internal datasets. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Global interest in ChatGPT on Google search weekly 2022-2025
statista.com
Updated Nov 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Global interest in ChatGPT on Google search weekly 2022-2025 [Dataset]. https://www.statista.com/statistics/1366930/chatgpt-google-search-weekly-worldwide/
Explore at:
Dataset updated
Nov 22, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Nov 6, 2022 - Oct 25, 2025
Area covered
Worldwide
Description
In the week from October 19 to 25, 2025, global Google searches for the word "ChatGPT" reached a peak of 100 index points, indicating a significant increase in interest and thus the highest interest over the observed period. On October 21, 2025, OpenAI introduced ChatGPT Atlas, a web browser with ChatGPT built in. Interest in the chatbot, developed by U.S.-based OpenAI and launched in November 2022, started rising in the week ending December 3, 2022. ChatGPT, which stands for Chat Generative Pre-trained Transformer, is an AI-powered auto-generative text system able to give human-sounding replies and reproduce human-like interactions when prompted.
h
Bitext-customer-support-llm-chatbot-training-dataset
huggingface.co
opendatalab.com
Updated Jul 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 16, 2024
Dataset authored and provided by
Bitext
License
https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.
d
Data from: Performance of GPT-3.5 and GPT-4 on standardized urology...
search.dataone.org
Updated Sep 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Max S. Yudovich; Elizaveta Makarova; Christian M. Hague; Jay D. Raman (2024). Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study [Dataset]. http://doi.org/10.7910/DVN/4EJOCL
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/4EJOCL
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Max S. Yudovich; Elizaveta Makarova; Christian M. Hague; Jay D. Raman
Description
This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States. In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
Bitext Gen AI Chatbot Customer Support Dataset
kaggle.com
zip
Updated Mar 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bitext (2024). Bitext Gen AI Chatbot Customer Support Dataset [Dataset]. https://www.kaggle.com/datasets/bitext/bitext-gen-ai-chatbot-customer-support-dataset
Explore at:
zip(3007665 bytes)Available download formats
Dataset updated
Mar 18, 2024
Authors
Bitext
License
https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
Description
Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This dataset can be used to train Large Language Models such as GPT, Llama2 and Falcon, both for Fine Tuning and Domain Adaptation.

The dataset has the following specs:

Use Case: Intent Detection

Vertical: Customer Service

27 intents assigned to 10 categories

26872 question/answer pairs, around 1000 per intent

30 entity/slot types

12 different types of language generation tags

The categories and intents have been selected from Bitext's collection of 20 vertical-specific datasets, covering the intents that are common across all 20 verticals. The verticals are:

Automotive, Retail Banking, Education, Events & Ticketing, Field Services, Healthcare, Hospitality, Insurance, Legal Services, Manufacturing, Media Streaming, Mortgages & Loans, Moving & Storage, Real Estate/Construction, Restaurant & Bar Chains, Retail/E-commerce, Telecommunications, Travel, Utilities, Wealth Management

For a full list of verticals and its intents see https://www.bitext.com/chatbot-verticals/.

The question/answer pairs have been generated using a hybrid methodology that uses natural texts as source text, NLP technology to extract seeds from these texts, and NLG technology to expand the seed texts. All steps in the process are curated by computational linguists.

Dataset Token Count

The dataset contains an extensive amount of text data across its 'instruction' and 'response' columns. After processing and tokenizing the dataset, we've identified a total of 3.57 million tokens. This rich set of tokens is essential for training advanced LLMs for AI Conversational, AI Generative, and Question and Answering (Q&A) models.

Fields of the Dataset

Each entry in the dataset contains the following fields:

flags: tags (explained below in the Language Generation Tags section)

instruction: a user request from the Customer Service domain

category: the high-level semantic category for the intent

intent: the intent corresponding to the user instruction

response: an example expected response from the virtual assistant

Categories and Intents

The categories and intents covered by the dataset are:

ACCOUNT: create_account, delete_account, edit_account, recover_password, registration_problems, switch_account

CANCELLATION_FEE: check_cancellation_fee

CONTACT: contact_customer_service, contact_human_agent

DELIVERY: delivery_options, delivery_period

FEEDBACK: complaint, review

INVOICE: check_invoice, get_invoice

ORDER: cancel_order, change_order, place_order, track_order

PAYMENT: check_payment_methods, payment_issue

REFUND: check_refund_policy, get_refund, track_refund

SHIPPING_ADDRESS: change_shipping_address, set_up_shipping_address

SUBSCRIPTION: newsletter_subscription

Entities

The entities covered by the dataset are:

{{Order Number}}, typically present in:

Intents: cancel_order, change_order, change_shipping_address, check_invoice, check_refund_policy, complaint, delivery_options, delivery_period, get_invoice, get_refund, place_order, track_order, track_refund

{{Invoice Number}}, typically present in:

Intents: check_invoice, get_invoice

{{Online Order Interaction}}, typically present in:

Intents: cancel_order, change_order, check_refund_policy, delivery_period, get_refund, review, track_order, track_refund

{{Online Payment Interaction}}, typically present in:

Intents: cancel_order, check_payment_methods

{{Online Navigation Step}}, typically present in:

Intents: complaint, delivery_options

{{Online Customer Support Channel}}, typically present in:

Intents: check_refund_policy, complaint, contact_human_agent, delete_account, delivery_options, edit_account, get_refund, payment_issue, registration_problems, switch_account

{{Profile}}, typically present in:

Intent: switch_account

{{Profile Type}}, typically present in:

Intent: switch_account

{{Settings}}, typically present in:

Intents: cancel_order, change_order, change_shipping_address, check_cancellation_fee, check_invoice, check_payment_methods, contact_human_agent, delete_account, delivery_options, edit_account, get_invoice, newsletter_subscription, payment_issue, place_order, recover_password, registration_problems, set_up_shipping_address, switch_account, track_order, track_refund

{{Online Company Portal Info}}, typically present in:

Intents: cancel_order, edit_account

{{Date}}, typically present in:

Intents: check_invoice, check_refund_policy, get_refund, track_order, track_refund

{{Date Range}}, typically present in:

Intents: check_cancellation_fee, check_invoice, get_invoice

{{Shipping Cut-off Time}}, typically present in:

Intent: delivery_options

{{Delivery City}}, typically present in:

Inten...
ChatGPT Classification Dataset
kaggle.com
zip
Updated Sep 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mahdi (2023). ChatGPT Classification Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimaktabdar/chatgpt-classification-dataset
Explore at:
zip(718710 bytes)Available download formats
Dataset updated
Sep 7, 2023
Authors
Mahdi
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).

This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).

We appreciate it, if you cite the following article if you happen to use this dataset in any scientific publication:

Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.

https://www.techrxiv.org/users/692552/articles/682641/master/file/data/ChatGPT_generated_Content_Detection/ChatGPT_generated_Content_Detection.pdf
a
End-to-End Response Time by Input Token Count by Models Model
artificialanalysis.ai
Updated Jan 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Artificial Analysis (2024). End-to-End Response Time by Input Token Count by Models Model [Dataset]. https://artificialanalysis.ai/models
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Artificial Analysis
Description
Comparison of Seconds to Output 500 Tokens, including reasoning model 'thinking' time; Lower is better by Model
a
Latency vs. Output Speed by Models Model
artificialanalysis.ai
Updated Jan 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Artificial Analysis (2024). Latency vs. Output Speed by Models Model [Dataset]. https://artificialanalysis.ai/models
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Artificial Analysis
Description
Comprehensive comparison of Latency (Time to First Token) vs. Output Speed (Output Tokens per Second) by Model
a
Latency by Input Token Count by Models Model
artificialanalysis.ai
Updated Jan 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Artificial Analysis (2024). Latency by Input Token Count by Models Model [Dataset]. https://artificialanalysis.ai/models
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Artificial Analysis
Description
Comparison of Seconds to First Token Received; Lower is better by Model
S
ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends
sqmagazine.co.uk
Updated Oct 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SQ Magazine (2025). ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends [Dataset]. https://sqmagazine.co.uk/chatgpt-vs-google-gemini-statistics/
Explore at:
Dataset updated
Oct 7, 2025
Dataset authored and provided by
SQ Magazine
License
https://sqmagazine.co.uk/privacy-policy/https://sqmagazine.co.uk/privacy-policy/
Time period covered
Jan 1, 2024 - Dec 31, 2025
Area covered
Global
Description
The rivalry between ChatGPT and Google Gemini defines the generative AI landscape. ChatGPT remains the leader in active engagement, while Gemini closes the gap through mass distribution. From corporate reports to web traffic studies, figures speak clearly about adoption, reach, and momentum. Explore what makes each platform stand out, and what...
a
Seconds to First Answer Token Received by Models Model
artificialanalysis.ai
Updated Jan 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Artificial Analysis (2024). Seconds to First Answer Token Received by Models Model [Dataset]. https://artificialanalysis.ai/models
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Artificial Analysis
Description
Comparison of Seconds to First Answer Token Received; Accounts for Reasoning Model 'Thinking' time by Model
Data from: AstroChat
kaggle.com
huggingface.co
zip
Updated Jun 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
astro_pat (2024). AstroChat [Dataset]. https://www.kaggle.com/datasets/patrickfleith/astrochat
Explore at:
zip(1214166 bytes)Available download formats
Dataset updated
Jun 9, 2024
Authors
astro_pat
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose and Scope

The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.

Intended Use

The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).

Quickstart

To be completed

DATASET DESCRIPTION

Access

Manual download from Hugging face hub: https://huggingface.co/datasets/patrickfleith/AstroChat

Or with python: python from datasets import load_dataset dataset = load_dataset("patrickfleith/AstroChat")

Structure

901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column): - id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets. - topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split. - subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc. - persona: description of the persona used to simulate a user - opening_question: the first question asked by the user to start a conversation with the AI-assistant - messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields: - role: the role of the speaker, either user or assistant - content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.

Important See the full list of topics and subtopics covered below.

Metadata

Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main

Generation Method

We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:

Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.

Step-by-step description

Defined a set of user persona

Defined a set of topics/ disciplines within the domain of Astronautics / Space Mission Engineering

For each topics, we defined a set of subtopics to narrow down the conversation to more specific and niche conversations (see below the full list)

For each subtopic we generate a set of opening questions that the user could ask to start a conversation (see below the full list)

We then distil the knowledge of an strong Chat Model (in our case ChatGPT through then api with gpt-4-turbo model) to generate the answers to the opening questions

We simulate follow-up questions from the user to the assistant, and the assistant's answers to these questions which builds up the messages.

Future work and contributions appreciated

Distil knowledge from more models (Anthropic, Mixtral, GPT-4o, etc...)

Implement more creativity in the opening questions and follow-up questions

Filter-out questions and conversations which are too similar

Ask topic and subtopic expert to validate the generated conversations to have a sense on how reliable is the overall dataset

Languages

All instances in the dataset are in english

Size

901 synthetically-generated dialogue

USAGE AND GUIDELINES

License

AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International

Restrictions

No restriction. Please provide the correct attribution following the license terms.

Citation

Patrick Fleith. (2024). AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11531579

Update Frequency

Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)

Have a feedback or spot an error?

Use the ...
a
Output Speed by Input Token Count by Models Model
artificialanalysis.ai
Updated Jan 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Artificial Analysis (2024). Output Speed by Input Token Count by Models Model [Dataset]. https://artificialanalysis.ai/models
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Artificial Analysis
Description
Comparison of Output Tokens per Second; Higher is better by Model
raw-chat-dataset
kaggle.com
zip
Updated Sep 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ben alla ismail (2023). raw-chat-dataset [Dataset]. https://www.kaggle.com/datasets/benallaismail/raw-chat-dataset
Explore at:
zip(892690951 bytes)Available download formats
Dataset updated
Sep 24, 2023
Authors
ben alla ismail
Description
Amazon Question-Answer Dataset - Extracted and Preprocessed for GPT-2 Fine-Tuning

Description:

This dataset comprises raw data meticulously extracted from the Amazon Question-Answer dataset. Its primary focus is to extract question and answer pairs and convert them into a structured JSON format.

The extracted dataset has undergone thorough preprocessing to make it suitable for fine-tuning the GPT-2 language model. This refined dataset, known as "pre-processed-chat-dataset," is readily available for exploration on Kaggle and can also be accessed through my Kaggle account with the username "ben alla ismail."

Key Dataset Highlights:

Data Source: Amazon Question-Answer Dataset

Format: JSON (Structured Question-Answer Pairs)

Purpose: Fine-Tuning the GPT-2 Model

Dataset Insights:

Here is a glimpse of the dataset structure: - Sample Entry: json { "noitseuq": "Does this panel come with the connection ribbon cable?", "rewsna": "No, it doesn't. I used the old one." }

Data Statistics: - Training Set: 2.17 GB - Test Set: 273.34 MB - Validation Set: 271.41 MB

For deeper insights, regular updates, and easy access to this valuable dataset, please explore the following links:

LinkedIn: Ismail Ben Alla

GitHub: benisalla

Instagram: ismail_ben_alla

Twitter: ismail_ben_alla
a
Intelligence vs. Seconds to Output 500 Tokens, including reasoning model...
artificialanalysis.ai
Updated Jan 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Artificial Analysis (2024). Intelligence vs. Seconds to Output 500 Tokens, including reasoning model 'thinking' time by Models Model [Dataset]. https://artificialanalysis.ai/models
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Artificial Analysis
Description
Comprehensive comparison of Artificial Analysis Intelligence Index vs. Seconds to Output 500 Tokens, including reasoning model 'thinking' time by Model
a
Intelligence vs. Output Speed by Models Model
artificialanalysis.ai
Updated Jan 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Artificial Analysis (2024). Intelligence vs. Output Speed by Models Model [Dataset]. https://artificialanalysis.ai/models
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Artificial Analysis
Description
Comprehensive comparison of Artificial Analysis Intelligence Index vs. Output Speed (Output Tokens per Second) by Model
qa_ds_chat_based_translation
kaggle.com
zip
Updated Sep 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ismail ben alla (2023). qa_ds_chat_based_translation [Dataset]. https://www.kaggle.com/benisalla/qa-ds-chat-based-translation
Explore at:
zip(754426593 bytes)Available download formats
Dataset updated
Sep 28, 2023
Authors
ismail ben alla
Description
Title: Preprocessed Dataset for Chatbot Model-Based Translation Transformer

Description:

This meticulously preprocessed dataset originates from raw chat data, accessible through the provided link. It has been transformed into a binary format to optimize efficiency and accessibility for deep learning applications.

The dataset is meticulously structured in binary format, essentially representing streams of integers. This format is particularly well-suited for the streamlined management of large-scale datasets in deep learning tasks.

Key Dataset Components:

Train.bin (5.85 GB): The training dataset.

Test.bin (731.35 MB): The testing dataset.

Val.bin (731.35 MB): The validation dataset.

To interact with this data, users can seamlessly utilize np.memmap from the NumPy library, offering effortless access and manipulation for various machine learning endeavors.

tably, the dataset employs the GPT-2 tokenizer with the inclusion of special tokens:
a
Tokens used to run all evaluations in the Artificial Analysis Intelligence...
artificialanalysis.ai
Updated Jan 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Artificial Analysis (2024). Tokens used to run all evaluations in the Artificial Analysis Intelligence Index by Models Model [Dataset]. https://artificialanalysis.ai/models
Explore at:
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Artificial Analysis
Description
Comparison of Tokens used to run all evaluations in the Artificial Analysis Intelligence Index by Model
h
awesome-chatgpt-prompts
huggingface.co
Updated Dec 15, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 15, 2023
Authors
Fatih Kadir Akın
License
https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
Description
🧠 Awesome ChatGPT Prompts [CSV dataset]

This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub

License

CC-0

Facebook

Twitter

Click to copy link

Link copied

Cite

Business of Apps (2023). ChatGPT Revenue and Usage Statistics (2025) [Dataset]. https://www.businessofapps.com/data/chatgpt-statistics/

ChatGPT Revenue and Usage Statistics (2025)

Explore at:

26 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Feb 9, 2023

Dataset authored and provided by

Business of Apps

License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...

Clear search

Close search

Google apps

Main menu

ChatGPT Revenue and Usage Statistics (2025)

All GPT-4 Conversations

All GPT-4 Generated Datasets

Every chat dataset generated by GPT-4 from Huggingface at the same format

About this dataset

How to use the dataset

Acknowledgements

License

Global interest in ChatGPT on Google search weekly 2022-2025

Bitext-customer-support-llm-chatbot-training-dataset

Data from: Performance of GPT-3.5 and GPT-4 on standardized urology...

Bitext Gen AI Chatbot Customer Support Dataset

Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Overview

Dataset Token Count

Fields of the Dataset

Categories and Intents

Entities

ChatGPT Classification Dataset

End-to-End Response Time by Input Token Count by Models Model

Latency vs. Output Speed by Models Model

Latency by Input Token Count by Models Model

ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends

Seconds to First Answer Token Received by Models Model

Data from: AstroChat

Purpose and Scope

Intended Use

Quickstart

DATASET DESCRIPTION

Access

Structure

Metadata

Generation Method

Step-by-step description

Future work and contributions appreciated

Languages

Size

USAGE AND GUIDELINES

License

Restrictions

Citation

Update Frequency

Have a feedback or spot an error?

Output Speed by Input Token Count by Models Model

raw-chat-dataset

Intelligence vs. Seconds to Output 500 Tokens, including reasoning model...

Intelligence vs. Output Speed by Models Model

qa_ds_chat_based_translation

tably, the dataset employs the GPT-2 tokenizer with the inclusion of special tokens:

Tokens used to run all evaluations in the Artificial Analysis Intelligence...

awesome-chatgpt-prompts

ChatGPT Revenue and Usage Statistics (2025)