28 datasets found
  1. b

    ChatGPT Revenue and Usage Statistics (2025)

    • businessofapps.com
    Updated Feb 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Business of Apps (2023). ChatGPT Revenue and Usage Statistics (2025) [Dataset]. https://www.businessofapps.com/data/chatgpt-statistics/
    Explore at:
    Dataset updated
    Feb 9, 2023
    Dataset authored and provided by
    Business of Apps
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...

  2. All GPT-4 Conversations

    • kaggle.com
    Updated Nov 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). All GPT-4 Conversations [Dataset]. https://www.kaggle.com/datasets/thedevastator/all-gpt-4-synthetic-chat-datasets
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Devastator
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    All GPT-4 Generated Datasets

    Every chat dataset generated by GPT-4 from Huggingface at the same format

    From [Huggingface datasets]

    About this dataset

    How to use the dataset

    The dataset includes all chat conversations generated by GPT-4 that are hosted on open Huggingface datasets. Everything is converted to the same format so the datasets can be easily merged and used for large scale training of LLMs.

    Acknowledgements

    This dataset is a collection of several single chat datasets. If you use this dataset in your research, please credit the original authors of the internal datasets. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

  3. Global interest in ChatGPT on Google search weekly 2022-2025

    • statista.com
    Updated Nov 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Global interest in ChatGPT on Google search weekly 2022-2025 [Dataset]. https://www.statista.com/statistics/1366930/chatgpt-google-search-weekly-worldwide/
    Explore at:
    Dataset updated
    Nov 22, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    Nov 6, 2022 - Oct 25, 2025
    Area covered
    Worldwide
    Description

    In the week from October 19 to 25, 2025, global Google searches for the word "ChatGPT" reached a peak of 100 index points, indicating a significant increase in interest and thus the highest interest over the observed period. On October 21, 2025, OpenAI introduced ChatGPT Atlas, a web browser with ChatGPT built in. Interest in the chatbot, developed by U.S.-based OpenAI and launched in November 2022, started rising in the week ending December 3, 2022. ChatGPT, which stands for Chat Generative Pre-trained Transformer, is an AI-powered auto-generative text system able to give human-sounding replies and reproduce human-like interactions when prompted.

  4. h

    Bitext-customer-support-llm-chatbot-training-dataset

    • huggingface.co
    • opendatalab.com
    Updated Jul 16, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.

  5. d

    Data from: Performance of GPT-3.5 and GPT-4 on standardized urology...

    • search.dataone.org
    Updated Sep 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Max S. Yudovich; Elizaveta Makarova; Christian M. Hague; Jay D. Raman (2024). Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study [Dataset]. http://doi.org/10.7910/DVN/4EJOCL
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Max S. Yudovich; Elizaveta Makarova; Christian M. Hague; Jay D. Raman
    Description

    This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States. In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.

  6. Bitext Gen AI Chatbot Customer Support Dataset

    • kaggle.com
    zip
    Updated Mar 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bitext (2024). Bitext Gen AI Chatbot Customer Support Dataset [Dataset]. https://www.kaggle.com/datasets/bitext/bitext-gen-ai-chatbot-customer-support-dataset
    Explore at:
    zip(3007665 bytes)Available download formats
    Dataset updated
    Mar 18, 2024
    Authors
    Bitext
    License

    https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/

    Description

    Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

    Overview

    This dataset can be used to train Large Language Models such as GPT, Llama2 and Falcon, both for Fine Tuning and Domain Adaptation.

    The dataset has the following specs:

    • Use Case: Intent Detection
    • Vertical: Customer Service
    • 27 intents assigned to 10 categories
    • 26872 question/answer pairs, around 1000 per intent
    • 30 entity/slot types
    • 12 different types of language generation tags

    The categories and intents have been selected from Bitext's collection of 20 vertical-specific datasets, covering the intents that are common across all 20 verticals. The verticals are:

    • Automotive, Retail Banking, Education, Events & Ticketing, Field Services, Healthcare, Hospitality, Insurance, Legal Services, Manufacturing, Media Streaming, Mortgages & Loans, Moving & Storage, Real Estate/Construction, Restaurant & Bar Chains, Retail/E-commerce, Telecommunications, Travel, Utilities, Wealth Management

    For a full list of verticals and its intents see https://www.bitext.com/chatbot-verticals/.

    The question/answer pairs have been generated using a hybrid methodology that uses natural texts as source text, NLP technology to extract seeds from these texts, and NLG technology to expand the seed texts. All steps in the process are curated by computational linguists.

    Dataset Token Count

    The dataset contains an extensive amount of text data across its 'instruction' and 'response' columns. After processing and tokenizing the dataset, we've identified a total of 3.57 million tokens. This rich set of tokens is essential for training advanced LLMs for AI Conversational, AI Generative, and Question and Answering (Q&A) models.

    Fields of the Dataset

    Each entry in the dataset contains the following fields:

    • flags: tags (explained below in the Language Generation Tags section)
    • instruction: a user request from the Customer Service domain
    • category: the high-level semantic category for the intent
    • intent: the intent corresponding to the user instruction
    • response: an example expected response from the virtual assistant

    Categories and Intents

    The categories and intents covered by the dataset are:

    • ACCOUNT: create_account, delete_account, edit_account, recover_password, registration_problems, switch_account
    • CANCELLATION_FEE: check_cancellation_fee
    • CONTACT: contact_customer_service, contact_human_agent
    • DELIVERY: delivery_options, delivery_period
    • FEEDBACK: complaint, review
    • INVOICE: check_invoice, get_invoice
    • ORDER: cancel_order, change_order, place_order, track_order
    • PAYMENT: check_payment_methods, payment_issue
    • REFUND: check_refund_policy, get_refund, track_refund
    • SHIPPING_ADDRESS: change_shipping_address, set_up_shipping_address
    • SUBSCRIPTION: newsletter_subscription

    Entities

    The entities covered by the dataset are:

    • {{Order Number}}, typically present in:
    • Intents: cancel_order, change_order, change_shipping_address, check_invoice, check_refund_policy, complaint, delivery_options, delivery_period, get_invoice, get_refund, place_order, track_order, track_refund
    • {{Invoice Number}}, typically present in:
      • Intents: check_invoice, get_invoice
    • {{Online Order Interaction}}, typically present in:
      • Intents: cancel_order, change_order, check_refund_policy, delivery_period, get_refund, review, track_order, track_refund
    • {{Online Payment Interaction}}, typically present in:
      • Intents: cancel_order, check_payment_methods
    • {{Online Navigation Step}}, typically present in:
      • Intents: complaint, delivery_options
    • {{Online Customer Support Channel}}, typically present in:
      • Intents: check_refund_policy, complaint, contact_human_agent, delete_account, delivery_options, edit_account, get_refund, payment_issue, registration_problems, switch_account
    • {{Profile}}, typically present in:
      • Intent: switch_account
    • {{Profile Type}}, typically present in:
      • Intent: switch_account
    • {{Settings}}, typically present in:
      • Intents: cancel_order, change_order, change_shipping_address, check_cancellation_fee, check_invoice, check_payment_methods, contact_human_agent, delete_account, delivery_options, edit_account, get_invoice, newsletter_subscription, payment_issue, place_order, recover_password, registration_problems, set_up_shipping_address, switch_account, track_order, track_refund
    • {{Online Company Portal Info}}, typically present in:
      • Intents: cancel_order, edit_account
    • {{Date}}, typically present in:
      • Intents: check_invoice, check_refund_policy, get_refund, track_order, track_refund
    • {{Date Range}}, typically present in:
      • Intents: check_cancellation_fee, check_invoice, get_invoice
    • {{Shipping Cut-off Time}}, typically present in:
      • Intent: delivery_options
    • {{Delivery City}}, typically present in:
      • Inten...
  7. ChatGPT Classification Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mahdi (2023). ChatGPT Classification Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimaktabdar/chatgpt-classification-dataset
    Explore at:
    zip(718710 bytes)Available download formats
    Dataset updated
    Sep 7, 2023
    Authors
    Mahdi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).

    This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).

    We appreciate it, if you cite the following article if you happen to use this dataset in any scientific publication:

    Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.

    https://www.techrxiv.org/users/692552/articles/682641/master/file/data/ChatGPT_generated_Content_Detection/ChatGPT_generated_Content_Detection.pdf

  8. a

    End-to-End Response Time by Input Token Count by Models Model

    • artificialanalysis.ai
    Updated Jan 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Artificial Analysis (2024). End-to-End Response Time by Input Token Count by Models Model [Dataset]. https://artificialanalysis.ai/models
    Explore at:
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Seconds to Output 500 Tokens, including reasoning model 'thinking' time; Lower is better by Model

  9. a

    Latency vs. Output Speed by Models Model

    • artificialanalysis.ai
    Updated Jan 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Artificial Analysis (2024). Latency vs. Output Speed by Models Model [Dataset]. https://artificialanalysis.ai/models
    Explore at:
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comprehensive comparison of Latency (Time to First Token) vs. Output Speed (Output Tokens per Second) by Model

  10. a

    Latency by Input Token Count by Models Model

    • artificialanalysis.ai
    Updated Jan 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Artificial Analysis (2024). Latency by Input Token Count by Models Model [Dataset]. https://artificialanalysis.ai/models
    Explore at:
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Seconds to First Token Received; Lower is better by Model

  11. S

    ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends

    • sqmagazine.co.uk
    Updated Oct 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SQ Magazine (2025). ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends [Dataset]. https://sqmagazine.co.uk/chatgpt-vs-google-gemini-statistics/
    Explore at:
    Dataset updated
    Oct 7, 2025
    Dataset authored and provided by
    SQ Magazine
    License

    https://sqmagazine.co.uk/privacy-policy/https://sqmagazine.co.uk/privacy-policy/

    Time period covered
    Jan 1, 2024 - Dec 31, 2025
    Area covered
    Global
    Description

    The rivalry between ChatGPT and Google Gemini defines the generative AI landscape. ChatGPT remains the leader in active engagement, while Gemini closes the gap through mass distribution. From corporate reports to web traffic studies, figures speak clearly about adoption, reach, and momentum. Explore what makes each platform stand out, and what...

  12. a

    Seconds to First Answer Token Received by Models Model

    • artificialanalysis.ai
    Updated Jan 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Artificial Analysis (2024). Seconds to First Answer Token Received by Models Model [Dataset]. https://artificialanalysis.ai/models
    Explore at:
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Seconds to First Answer Token Received; Accounts for Reasoning Model 'Thinking' time by Model

  13. Data from: AstroChat

    • kaggle.com
    • huggingface.co
    zip
    Updated Jun 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    astro_pat (2024). AstroChat [Dataset]. https://www.kaggle.com/datasets/patrickfleith/astrochat
    Explore at:
    zip(1214166 bytes)Available download formats
    Dataset updated
    Jun 9, 2024
    Authors
    astro_pat
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose and Scope

    The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.

    Intended Use

    The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).

    Quickstart

    To be completed

    DATASET DESCRIPTION

    Access

    Structure

    901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column): - id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets. - topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split. - subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc. - persona: description of the persona used to simulate a user - opening_question: the first question asked by the user to start a conversation with the AI-assistant - messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields: - role: the role of the speaker, either user or assistant - content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.

    Important See the full list of topics and subtopics covered below.

    Metadata

    Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main

    Generation Method

    We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:

    Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.

    Step-by-step description

    • Defined a set of user persona
    • Defined a set of topics/ disciplines within the domain of Astronautics / Space Mission Engineering
    • For each topics, we defined a set of subtopics to narrow down the conversation to more specific and niche conversations (see below the full list)
    • For each subtopic we generate a set of opening questions that the user could ask to start a conversation (see below the full list)
    • We then distil the knowledge of an strong Chat Model (in our case ChatGPT through then api with gpt-4-turbo model) to generate the answers to the opening questions
    • We simulate follow-up questions from the user to the assistant, and the assistant's answers to these questions which builds up the messages.

    Future work and contributions appreciated

    • Distil knowledge from more models (Anthropic, Mixtral, GPT-4o, etc...)
    • Implement more creativity in the opening questions and follow-up questions
    • Filter-out questions and conversations which are too similar
    • Ask topic and subtopic expert to validate the generated conversations to have a sense on how reliable is the overall dataset

    Languages

    All instances in the dataset are in english

    Size

    901 synthetically-generated dialogue

    USAGE AND GUIDELINES

    License

    AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International

    Restrictions

    No restriction. Please provide the correct attribution following the license terms.

    Citation

    Patrick Fleith. (2024). AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11531579

    Update Frequency

    Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)

    Have a feedback or spot an error?

    Use the ...

  14. a

    Output Speed by Input Token Count by Models Model

    • artificialanalysis.ai
    Updated Jan 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Artificial Analysis (2024). Output Speed by Input Token Count by Models Model [Dataset]. https://artificialanalysis.ai/models
    Explore at:
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Output Tokens per Second; Higher is better by Model

  15. raw-chat-dataset

    • kaggle.com
    zip
    Updated Sep 24, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ben alla ismail (2023). raw-chat-dataset [Dataset]. https://www.kaggle.com/datasets/benallaismail/raw-chat-dataset
    Explore at:
    zip(892690951 bytes)Available download formats
    Dataset updated
    Sep 24, 2023
    Authors
    ben alla ismail
    Description

    Amazon Question-Answer Dataset - Extracted and Preprocessed for GPT-2 Fine-Tuning

    Description:

    This dataset comprises raw data meticulously extracted from the Amazon Question-Answer dataset. Its primary focus is to extract question and answer pairs and convert them into a structured JSON format.

    The extracted dataset has undergone thorough preprocessing to make it suitable for fine-tuning the GPT-2 language model. This refined dataset, known as "pre-processed-chat-dataset," is readily available for exploration on Kaggle and can also be accessed through my Kaggle account with the username "ben alla ismail."

    Key Dataset Highlights:

    • Data Source: Amazon Question-Answer Dataset
    • Format: JSON (Structured Question-Answer Pairs)
    • Purpose: Fine-Tuning the GPT-2 Model

    Dataset Insights:

    Here is a glimpse of the dataset structure: - Sample Entry: json { "noitseuq": "Does this panel come with the connection ribbon cable?", "rewsna": "No, it doesn't. I used the old one." }

    Data Statistics: - Training Set: 2.17 GB - Test Set: 273.34 MB - Validation Set: 271.41 MB

    For deeper insights, regular updates, and easy access to this valuable dataset, please explore the following links:

  16. a

    Intelligence vs. Seconds to Output 500 Tokens, including reasoning model...

    • artificialanalysis.ai
    Updated Jan 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Artificial Analysis (2024). Intelligence vs. Seconds to Output 500 Tokens, including reasoning model 'thinking' time by Models Model [Dataset]. https://artificialanalysis.ai/models
    Explore at:
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comprehensive comparison of Artificial Analysis Intelligence Index vs. Seconds to Output 500 Tokens, including reasoning model 'thinking' time by Model

  17. a

    Intelligence vs. Output Speed by Models Model

    • artificialanalysis.ai
    Updated Jan 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Artificial Analysis (2024). Intelligence vs. Output Speed by Models Model [Dataset]. https://artificialanalysis.ai/models
    Explore at:
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comprehensive comparison of Artificial Analysis Intelligence Index vs. Output Speed (Output Tokens per Second) by Model

  18. qa_ds_chat_based_translation

    • kaggle.com
    zip
    Updated Sep 28, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ismail ben alla (2023). qa_ds_chat_based_translation [Dataset]. https://www.kaggle.com/benisalla/qa-ds-chat-based-translation
    Explore at:
    zip(754426593 bytes)Available download formats
    Dataset updated
    Sep 28, 2023
    Authors
    ismail ben alla
    Description

    Title: Preprocessed Dataset for Chatbot Model-Based Translation Transformer

    Description:

    This meticulously preprocessed dataset originates from raw chat data, accessible through the provided link. It has been transformed into a binary format to optimize efficiency and accessibility for deep learning applications.

    The dataset is meticulously structured in binary format, essentially representing streams of integers. This format is particularly well-suited for the streamlined management of large-scale datasets in deep learning tasks.

    Key Dataset Components:

    • Train.bin (5.85 GB): The training dataset.
    • Test.bin (731.35 MB): The testing dataset.
    • Val.bin (731.35 MB): The validation dataset.

    To interact with this data, users can seamlessly utilize np.memmap from the NumPy library, offering effortless access and manipulation for various machine learning endeavors.

    tably, the dataset employs the GPT-2 tokenizer with the inclusion of special tokens:

  19. a

    Tokens used to run all evaluations in the Artificial Analysis Intelligence...

    • artificialanalysis.ai
    Updated Jan 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Artificial Analysis (2024). Tokens used to run all evaluations in the Artificial Analysis Intelligence Index by Models Model [Dataset]. https://artificialanalysis.ai/models
    Explore at:
    Dataset updated
    Jan 15, 2024
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Tokens used to run all evaluations in the Artificial Analysis Intelligence Index by Model

  20. h

    awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub

      License
    

    CC-0

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Business of Apps (2023). ChatGPT Revenue and Usage Statistics (2025) [Dataset]. https://www.businessofapps.com/data/chatgpt-statistics/

ChatGPT Revenue and Usage Statistics (2025)

Explore at:
26 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Feb 9, 2023
Dataset authored and provided by
Business of Apps
License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...

Search
Clear search
Close search
Google apps
Main menu