GPT-3's water consumption for the training phase was estimated at roughly 4.8 billion liters of water, assuming the model was trained at Microsoft's Iowa data center (OpenAI has disclosed that this data center was used for training parts of the GPT-4 model). Had the model been fully trained at the Washington data center instead, water consumption could have been as high as 15 billion liters. That would have amounted to more than Microsoft's total water withdrawals in 2023.
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
ChatGPT was the chatbot that kickstarted the generative AI revolution, which has since driven hundreds of billions of dollars of spending on data centres, graphics chips and AI startups. Launched by...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The researcher tests the QA capability of ChatGPT in the medical field from the following aspects:
1. Its reserve of medical knowledge
2. Its ability to read and understand medical literature
3. Its ability to provide auxiliary diagnoses after reading case data
4. Its ability to correct errors in case data
5. Its ability to standardize medical terminology
6. Its ability to evaluate experts
7. Its ability to evaluate medical institutions
The conclusions are:
ChatGPT has great potential in medical and healthcare applications and may, in some fields, directly replace humans or even professionals at a certain level.
The researcher preliminarily believes that ChatGPT has basic medical knowledge and multi-round dialogue ability, and that its ability to understand Chinese is not weak.
ChatGPT is able to read, understand and correct cases.
ChatGPT is capable of information extraction and terminology standardization, and performs quite well at both.
ChatGPT can reason over medical knowledge.
ChatGPT is capable of continuous learning; after continuous training, its level improved significantly.
ChatGPT does not have the ability to academically evaluate Chinese medical talents, and the results are not ideal.
ChatGPT does not have the ability to academically evaluate Chinese medical institutions, and the results are not ideal.
ChatGPT is an epoch-making product that can become a useful assistant for medical diagnosis and treatment, knowledge services, literature reading, reviews and paper writing.
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).
This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).
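As an illustration of how this binary split can be consumed, the sketch below loads the article-level file and fits a simple TF-IDF baseline; the column names "text" and "label" are assumptions, since the CSV header is not listed here.

```python
# Minimal sketch of a human-vs-AI binary classifier on the article-level split.
# Column names "text" and "label" are assumptions; adjust to the actual CSV header.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("article_level_data.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

vectorizer = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

pred = clf.predict(vectorizer.transform(X_test))
print(classification_report(y_test, pred, target_names=["Human (0)", "AI (1)"]))
```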
We would appreciate it if you cite the following article should you use this dataset in any scientific publication:
Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The data has been created for use in an AI detection competition. Two prompts are passed to chatbots to elicit responses. The chatbots used are Bing, Bard, and ChatGPT. The data is also labeled to indicate whether the prompt includes the source text or not.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.
Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI's GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.
Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.
Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.
Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
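For orientation, the sketch below shows how the three fidelity checks named in the Methods (two-sample t-test, two-sample proportion test, 95% CI overlap) could be run with standard Python libraries; the file names and column names are hypothetical, not taken from the study.

```python
# Illustrative sketch of the fidelity checks described above.
# File and column names are assumptions; the study's actual parameters may differ.
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

real = pd.read_csv("vitaldb_cases.csv")        # real-world reference data (assumed file)
synth = pd.read_csv("gpt4o_synthetic.csv")     # LLM-generated data (assumed file)

# Continuous parameter: Welch's two-sample t-test, e.g. patient age.
t_stat, p_cont = stats.ttest_ind(real["age"], synth["age"], equal_var=False)

# Binary parameter: two-sample proportion z-test, e.g. an emergency-case flag.
counts = np.array([real["emergency"].sum(), synth["emergency"].sum()])
nobs = np.array([len(real), len(synth)])
z_stat, p_bin = proportions_ztest(counts, nobs)

# 95% CI overlap for a continuous parameter.
def ci95(x):
    m, se = x.mean(), stats.sem(x)
    return m - 1.96 * se, m + 1.96 * se

ci_real, ci_synth = ci95(real["age"]), ci95(synth["age"])
overlap = ci_real[0] <= ci_synth[1] and ci_synth[0] <= ci_real[1]
print(p_cont, p_bin, overlap)
```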
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
This is a small dataset of synthetically generated samples for the MLM task using ChatGPT.
For data construction, I used the following requests. All requests were generated sequentially within a single chat.
140 queries about general CV.
40 queries about datasets for CV.
40 queries about articles in CV.
20 queries about transformers in CV.
20 queries about training pipelines in CV.
20 queries about libraries for CV.
20 queries about hardware for CV.
Each prompt contains one [MASK] token; the task is to predict the correct word at that position.
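As a minimal sketch of how such [MASK] prompts can be evaluated with an off-the-shelf masked language model (the dataset's file layout is not specified above, so the prompt below is an assumed example):

```python
# Hedged sketch: score a [MASK] prompt with a standard MLM checkpoint.
# The prompt string is an illustrative example, not taken from the dataset.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")  # BERT uses the [MASK] token

prompt = "A convolutional neural network is widely used for [MASK] classification."
for candidate in fill(prompt, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```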
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The avoidance of mistakes by humans is achieved through continuous learning, error correction, and experience accumulation. This process is known to be both time-consuming and laborious, often involving numerous detours. In order to assist humans in their learning endeavors, ChatGPT (Generative Pre-trained Transformer) has been developed as a collection of large language models (LLMs) capable of generating responses that resemble human-like answers to a wide range of problems. In this study, we sought to assess the potential of LLMs as assistants in addressing queries related to orbital diseases. To accomplish this, we gathered a dataset consisting of 100 orbital questions, along with their corresponding answers, sourced from examinations administered to ophthalmologist residents and medical students. Five large language models (LLMs) were utilized for testing and comparison purposes, namely, GPT-4, GPT-3.5, PaLM2, Claude 2, and SenseNova. Subsequently, the LLM exhibiting the most exemplary performance was selected for comparison against ophthalmologists and medical students. Notably, GPT-4 and PaLM2 demonstrated a superior average correlation when compared to the other LLMs. Furthermore, GPT-4 exhibited a broader spectrum of accurate responses and attained the highest average score among all the LLMs. Additionally, GPT-4 demonstrated the highest level of confidence during the test. The performance of GPT-4 surpassed that of medical students, albeit falling short of that exhibited by ophthalmologists. Overall, the findings of the study indicate that GPT-4 exhibited superior performance within the orbital domain of ophthalmology. Given further refinement through training, LLMs possess considerable potential to be utilized as comprehensive instruments alongside medical students and ophthalmologists.
Privacy policy: https://www.promarketreports.com/privacy-policy
The Artificial Intelligence in Supply Chain Market was valued at USD 51.35 billion in 2024 and is projected to reach USD 86.87 billion by 2033, with an expected CAGR of 7.80% during the forecast period. Recent developments include: The recent rise of artificial intelligence (AI) has given the sector fresh optimism, and one particular technology, ChatGPT, is showing a lot of potential. The ChatGPT language model was developed by OpenAI to generate human-like responses to questions posed in natural language. It has been trained on a large amount of data, identifying patterns and producing solutions that are extremely accurate and pertinent to the situation. Just a few months after going live, the service had more than 100 million signups. ChatGPT has already shown enormous promise in the fields of healthcare and finance, and it is ready to change supply chain management for startups. Actyv.ai, a category pioneer in enterprise SaaS with embedded B2B BNPL and insurance, headquartered in Singapore, announced a strategic agreement with PwC India in March 2023 to promote embedded finance adoption in supply chain ecosystems for their clients. In addition to facilitating access to pertinent embedded financial and insurance products, the partnership seeks to concentrate on using the potential of artificial intelligence to spur growth prospects in the global supply chain ecosystem. Key drivers for this market are: increasing market growth of e-commerce, increasing growth in big data technology, and high demand for advanced solutions for transparency in the supply chain process. Potential restraints include: lack of technical expertise.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PM-TOM (Personalized Medicine: Therapy Optimization Method) is a decision support tool designed to find treatments with the minimal STOPP and Beers criteria risks and ADRs caused by ADEs, DDIs, DCIs, DGIs and DFIs. The tool optimizes a treatment by considering drug classes selected by health professionals and applicable to the patient's conditions.
This data set includes the details of a polypharmacy treatment of an older patient's case at admission to a deprescribing facility, discharge, and after applying the PM-TOM optimization. All three treatments were reviewed by ChatGPT 4.0, trained on a large set of medical literature, which confirmed the benefits of the optimized treatment and its alignment with the Beers and STOPP/START criteria.
The integrated PM-TOM and ChatGPT approach has several advantages: 1. PM-TOM simplifies finding and reviewing effective drug regimens, allowing healthcare providers to leverage their expertise more efficiently. 2. Detailed PM-TOM reports facilitate the involvement of pharmacists in monitoring and adjusting polypharmacy treatments. 3. Targeted PM-TOM recommendations help reduce non-actionable alerts, mitigating alert fatigue and minimizing prescribing errors. 4. PM-TOM rapidly evaluates and optimizes complex treatment plans, enabling consideration of multiple safe alternatives. 5. When applied at the primary care level, this approach helps prevent and reduce inappropriate prescribing, including prescribing cascades. 6. AI tools like ChatGPT, trained on up-to-date medical information, provide additional insights to help healthcare providers refine treatment plans.
Energy consumption of artificial intelligence (AI) models in training is considerable, with both GPT-3, the original release of the current iteration of OpenAI's popular ChatGPT, and Gopher consuming well over **********-megawatt hours of energy for training alone. As this figure covers only training, the energy consumption over the entire usage and lifetime of GPT-3 and other large language models (LLMs) is likely to be significantly higher. The largest consumer of energy, GPT-3, consumed roughly the equivalent of *** Germans in 2022. While not a staggering amount, it is a considerable use of energy.
Energy savings through AI
While it is undoubtedly true that training LLMs takes a considerable amount of energy, the energy savings are also likely to be substantial. Any AI model that improves processes by minute amounts might save hours on shipments, liters of fuel, or dozens of computations. Each of these uses energy as well, and the sum of the energy saved through an LLM might vastly outweigh its energy cost. A good example is mobile phone operators, of which a ***** expect that AI might reduce power consumption by *** to ******* percent. Considering that much of the world uses mobile phones, this would be a considerable energy saver.
Emissions are considerable
The amount of CO2 emissions from training LLMs is also considerable, with GPT-3 producing nearly *** tonnes of CO2. This could change radically depending on the types of energy production behind the emissions. Most data center operators, for instance, would prefer nuclear energy, a significantly low-emission energy source, to play a key role.
In the week from October 19 to 25, 2025, global Google searches for the word "ChatGPT" reached a peak of 100 index points, marking the highest level of interest over the observed period. On October 21, 2025, OpenAI introduced ChatGPT Atlas, a web browser with ChatGPT built in. Interest in the chatbot, developed by U.S.-based OpenAI and launched in November 2022, started rising in the week ending December 3, 2022. ChatGPT, which stands for Chat Generative Pre-trained Transformer, is an AI-powered generative text system able to give human-sounding replies and reproduce human-like interactions when prompted.
New release of the DAIGT train dataset! Improvements:
Everything that was already in V3 dataset, plus a little bit of extra magic!
8000+ texts I generated with llama-based models finetuned on Persuade corpus 🔥🔥🔥
Sources (please upvote the original datasets!):
License: MIT for the data I generated. Check source datasets for the other sources mentioned above.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Objective: Our objective is to evaluate the efficacy of ChatGPT 4 in accurately and effectively delivering genetic information, building on previous findings with ChatGPT 3.5. We focus on assessing the utility, limitations, and ethical implications of using ChatGPT in medical settings.
Materials and Methods: A structured questionnaire, including the Brief User Survey (BUS-15) and custom questions, was developed to assess ChatGPT 4's clinical value. An expert panel of genetic counselors and clinical geneticists independently evaluated ChatGPT 4's responses to these questions. We also conducted a comparative analysis with ChatGPT 3.5, utilizing descriptive statistics and using R for data analysis.
Results: ChatGPT 4 demonstrated improvements over 3.5 in context recognition, relevance, and informativeness. However, performance variability and concerns about the naturalness of the output were noted. No significant difference in accuracy was found between ChatGPT 3.5 and 4.0. Notably, the efficacy of ChatGPT 4 varied significantly across different genetic conditions, with specific differences identified between responses related to BRCA1 and HFE.
Discussion and Conclusion: This study highlights ChatGPT 4's potential in genomics, noting significant advancements over its predecessor. Despite these improvements, challenges remain, including the risk of outdated information and the necessity of ongoing refinement. The variability in performance across different genetic conditions underscores the need for expert oversight and continuous AI training. ChatGPT 4, while showing promise, emphasizes the importance of balancing technological innovation with ethical responsibility in healthcare information delivery.
Methods
Study Design: This study was conducted to evaluate the performance of ChatGPT 4 (March 23, 2023 model) in the context of genetic counseling and education. The evaluation involved a structured questionnaire, which included questions selected from the Brief User Survey (BUS-15) and additional custom questions designed to assess the clinical value of ChatGPT 4's responses.
Questionnaire Development: The questionnaire was built on Qualtrics and comprised twelve questions: seven selected from the BUS-15, preceded by two additional questions that we designed. The initial questions focused on quality and answer relevancy:
1. The overall quality of the Chatbot's response is: (5-point Likert: Very poor to Very good)
2. The Chatbot delivered an answer that provided the relevant information you would include if asked the question. (5-point Likert: Strongly disagree to Strongly agree)
The BUS-15 questions (7-point Likert: Strongly disagree to Strongly agree) focused on:
1. Recognition and facilitation of users' goal and intent: the chatbot seems able to recognize the user's intent and guide the user to its goals.
2. Relevance of information: the chatbot provides relevant and appropriate information/answers to people at each stage to bring them closer to their goal.
3. Maxim of quantity: the chatbot responds in an informative way without adding too much information.
4. Resilience to failure: the chatbot seems able to find ways to respond appropriately even when it encounters situations or arguments it is not equipped to handle.
5. Understandability and politeness: the chatbot seems able to understand input and convey correct statements and answers without ambiguity and with acceptable manners.
6. Perceived conversational credibility: the chatbot responds in a credible and informative way without adding too much information.
7. Meeting neurodiverse needs: the chatbot seems able to meet users' needs and be usable independently of their health conditions, well-being, age, etc.
Expert Panel and Data Collection: A panel of experts (two genetic counselors and two clinical geneticists) was provided with a link to the survey containing the questions. They independently evaluated the responses from ChatGPT 4 without discussing the questions or answers among themselves until after survey submission. This approach ensured unbiased evaluation.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of model accuracy on GPQA, token pricing, and latency for leading AI reasoning models.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "Collective Cognition ChatGPT Conversations"
Dataset Description
Dataset Summary
The "Collective Cognition ChatGPT Conversations" dataset is a collection of chat logs between users and the ChatGPT model. These conversations have been shared by users on the "Collective Cognition" website. The dataset provides insights into user interactions with language models and can be utilized for multiple purposes, including training, research, and… See the full description on the dataset page: https://huggingface.co/datasets/CollectiveCognition/chats-data-2023-09-27.
A June 2025 study found that ****** was the most frequently cited web domain by large language models (LLMs). The platform was referenced in approximately ** percent of the analyzed cases, likely due to the content licensing agreement concluded between Google and Reddit in early 2024 for the purpose of AI model training. ********* ranked second, being mentioned in roughly ** percent of cases, while ****** and ******* were each mentioned in ** percent.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the models' performance considering diverse prompt formulations and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our results statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT-3.5 and GPT-4 in the F1-score for premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that the local model's embeddings are as semantically rich as those from the OpenAI model. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.
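As a concrete illustration of example selection via semantic search (a sketch under assumed names, not the authors' exact pipeline), the snippet below retrieves the training examples most similar to a query sentence with a sentence-transformers embedding model and assembles them into a few-shot prompt.

```python
# Hedged sketch: choose few-shot examples for a prompt by embedding similarity.
# The embedding model, example sentences, and labels are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

train_examples = [
    ("The applicant was denied access to the case file.", "premise"),
    ("Therefore, there has been a violation of Article 6.", "conclusion"),
    # ... more labelled argument components ...
]

model = SentenceTransformer("all-MiniLM-L6-v2")
train_emb = model.encode([text for text, _ in train_examples], convert_to_tensor=True)

def select_examples(query: str, k: int = 2):
    """Return the k training examples most similar to the query sentence."""
    query_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, train_emb, top_k=k)[0]
    return [train_examples[h["corpus_id"]] for h in hits]

demos = select_examples("The court notes that the hearing was not public.")
prompt = "\n".join(f"Sentence: {s}\nLabel: {l}" for s, l in demos)
print(prompt)
```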
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Peptides are biologically ubiquitous and important molecules that self-assemble into diverse structures. While extensive research has explored the effects of chemical composition and environmental conditions on self-assembly, a systematic study consolidating this data to uncover global rules is lacking. In this work, we curate a peptide assembly database through a combination of manual processing by human experts and literature mining with a large language model. As a result, we collect more than 1,000 experimental data entries with information about peptide sequence, experimental conditions and corresponding self-assembly phases. Utilizing the data, machine learning models are trained and evaluated, demonstrating excellent accuracy (> 80%) and efficiency in assembly phase classification. Moreover, we fine-tune our GPT model for peptide literature mining with the developed dataset, which exhibits markedly superior performance in extracting information from academic publications relative to the pre-trained model. This workflow can improve efficiency when exploring potential self-assembling peptide candidates, through guiding experimental work, while also deepening our understanding of the mechanisms governing peptide self-assembly.
--- phase_data_clean.csv: stores 1,000+ peptide self-assembly data entries under different experimental conditions.
--- mined_paper_list.csv: stores the corresponding papers we used to collect data.
--- trainset.jsonl and testset.jsonl: data we used for fine-tuning the LLM.
--- fine-tuning.ipynb: code used to fine-tune the ChatGPT model.
--- pretrain.ipynb: code used to test the pretrained ChatGPT model.
--- train_and_inference.ipynb: code to use the mined data to train and test an ML predictor for phase classification.
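For orientation, a minimal sketch of launching a fine-tuning job on the trainset.jsonl file through the OpenAI Python client is shown below; the base model name and the follow-up call are assumptions, not taken from the repository's notebooks.

```python
# Hedged sketch of a fine-tuning job using the OpenAI Python client (v1 API).
# The base model name is an assumption; the repository's fine-tuning.ipynb may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the chat-formatted training examples.
train_file = client.files.create(file=open("trainset.jsonl", "rb"), purpose="fine-tune")

# Start the fine-tuning job on an assumed base model.
job = client.fine_tuning.jobs.create(training_file=train_file.id, model="gpt-3.5-turbo")
print(job.id, job.status)

# Once finished, the resulting model can be queried like any chat model:
# client.chat.completions.create(model=job.fine_tuned_model, messages=[...])
```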
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide data quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures.
Methods: Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were inputted into two versions of ChatGPT in English. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment, using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired sample t-test compared ChatGPT 3.5 and 4.0.
Results: Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40 and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5, and "adequate" in 69% for 4.0. In overall assessment, 3 points were most common for 3.5 (36%), while 4 points were predominant for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in three areas.
Conclusion: ChatGPT generated responses mostly ranging from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
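As an illustration of the paired comparison described above (not the study's code), the snippet below runs a paired-sample t-test on per-question scores for the two ChatGPT versions; the score values are hypothetical placeholders.

```python
# Hedged sketch: paired-sample t-test comparing ChatGPT 3.5 vs 4.0 overall scores.
# The score lists below are placeholders, not the study's data.
from scipy import stats

scores_v35 = [3, 4, 3, 2, 4, 3, 3, 4, 2, 3]  # hypothetical per-question means, version 3.5
scores_v40 = [4, 4, 3, 3, 5, 4, 3, 4, 3, 4]  # hypothetical per-question means, version 4.0

t_stat, p_value = stats.ttest_rel(scores_v40, scores_v35)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```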