45 datasets found
  1. Estimated water consumption for training GPT-3 2023

    • statista.com
    • ai-chatbox.pro
    Updated Nov 19, 2024
    Cite
    Statista (2024). Estimated water consumption for training GPT-3 2023 [Dataset]. https://www.statista.com/statistics/1536925/gpt-3-estimated-water-consumption-training/
    Dataset updated
    Nov 19, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2023
    Area covered
    Worldwide
    Description

    GPT-3's water consumption for the training phase was estimated at roughly 4.8 billion liters, assuming the model was trained in Microsoft's Iowa data center (OpenAI has disclosed that this data center was used for training parts of the GPT-4 model). Had the model been trained entirely in the Washington data center instead, water consumption could have been as high as 15 billion liters. That would have amounted to more than Microsoft's total water withdrawals in 2023.

  2. Energy consumption when training LLMs in 2022 (in MWh)

    • statista.com
    Updated Sep 10, 2024
    + more versions
    Cite
    Statista (2024). Energy consumption when training LLMs in 2022 (in MWh) [Dataset]. https://www.statista.com/statistics/1384401/energy-use-when-training-llm-models/
    Dataset updated
    Sep 10, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2022
    Area covered
    Worldwide
    Description

    Energy consumption of artificial intelligence (AI) models during training is considerable: both GPT-3, the original release of the current iteration of OpenAI's popular ChatGPT, and Gopher consumed well over a thousand megawatt-hours of energy for training alone. Since this figure covers only training, energy consumption over the entire usage and lifetime of GPT-3 and other large language models (LLMs) is likely significantly higher. The largest consumer of energy, GPT-3, consumed roughly the annual energy equivalent of 200 German residents in 2022. While not a staggering amount, it is a considerable use of energy.

    Energy savings through AI

    While training LLMs undoubtedly takes a considerable amount of energy, the resulting energy savings are also likely to be substantial. Any AI model that improves processes by small margins might save hours on shipments, liters of fuel, or dozens of computations; each of these consumes energy as well, and the total energy saved through an LLM might vastly exceed its energy cost. A good example is mobile phone operators, a third of whom expect AI to reduce power consumption by ten to fifteen percent. Considering how much of the world uses mobile phones, this would be a considerable energy saver.

    Emissions are considerable

    The amount of CO2 emitted in training LLMs is also considerable, with GPT-3 producing nearly 500 tonnes of CO2. This figure could change radically depending on the types of energy production behind the emissions: most data center operators, for instance, would prefer nuclear energy, a significantly low-emission source, to play a key role.

  3. Date Set: ChatGPT as an education and learning tool for engineering,...

    • data.mendeley.com
    Updated Jun 25, 2025
    Cite
    RAVINDRA BHARDWAJ (2025). Date Set: ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India [Dataset]. http://doi.org/10.17632/995zwcz5yt.2
    Dataset updated
    Jun 25, 2025
    Authors
    RAVINDRA BHARDWAJ
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    This is the raw data that is used in the publication: ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India.

  4. Replication Data for: ChatGPT outperforms crowd-workers for text-annotation...

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Gilardi, Fabrizio; Alizadeh, Meysam; Kubli, Maël (2023). Replication Data for: ChatGPT outperforms crowd-workers for text-annotation tasks [Dataset]. http://doi.org/10.7910/DVN/PQYF6M
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Gilardi, Fabrizio; Alizadeh, Meysam; Kubli, Maël
    Description

    Many NLP applications require manual text annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by crowd-workers on platforms such as MTurk as well as trained annotators, such as research assistants. Using four samples of tweets and news articles (n = 6,183), we show that ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frame detection. Across the four datasets, the zero-shot accuracy of ChatGPT exceeds that of crowd-workers by about 25 percentage points on average, while ChatGPT's intercoder agreement exceeds that of both crowd-workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003---about thirty times cheaper than MTurk. These results demonstrate the potential of large language models to drastically increase the efficiency of text classification.
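
    The annotation setup described above lends itself to a compact zero-shot sketch. The snippet below is illustrative only and assumes the openai v1 Python client; the label set, prompt wording, and model name are assumptions, not details from the replication package.

```python
# Hypothetical zero-shot annotation in the spirit of the study; the
# prompt, labels, and model are placeholders, not the authors' setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def annotate_stance(text: str) -> str:
    """Return a single stance label for one tweet or article snippet."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model
        temperature=0,          # deterministic labels for annotation
        messages=[
            {"role": "system", "content": "You are a text-annotation assistant."},
            {"role": "user", "content": (
                "Classify the stance of the following text toward content "
                "moderation as POSITIVE, NEGATIVE, or NEUTRAL. "
                "Reply with the label only.\n\n" + text
            )},
        ],
    )
    return resp.choices[0].message.content.strip()

print(annotate_stance("Platforms should be required to remove harmful posts."))
```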

  5. Global weekly interest in ChatGPT on Google search 2022-2024

    • statista.com
    Updated Jul 5, 2024
    Cite
    Statista (2024). Global weekly interest in ChatGPT on Google search 2022-2024 [Dataset]. https://www.statista.com/statistics/1366930/chatgpt-google-search-weekly-worldwide/
    Dataset updated
    Jul 5, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 6, 2022 - Jun 30, 2024
    Area covered
    Worldwide
    Description

    As of June 2024, global Google searches for the term "ChatGPT" had increased again after a slight decline at the end of 2023. Interest in the chatbot, developed by U.S.-based OpenAI and launched in November 2022, started rising in the week ending December 3, 2022. Growing demand for information on ChatGPT recently pushed the keyword to a peak of 100 index points in the week ending June 2, 2024. ChatGPT, which stands for Chat Generative Pre-trained Transformer, is a chatbot and AI-powered generative text system able to give human-sounding replies and reproduce human-like interactions when prompted.

  6. ChatGPT as an education and learning tool for engineering, technology and...

    • data.mendeley.com
    Updated May 14, 2024
    Cite
    RAVINDRA BHARDWAJ (2024). ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India [Dataset]. http://doi.org/10.17632/995zwcz5yt.1
    Dataset updated
    May 14, 2024
    Authors
    RAVINDRA BHARDWAJ
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    This is the raw data that is used in the publication: ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India.

  7. awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a dataset repository of Awesome ChatGPT Prompts. All prompts are also viewable on GitHub.

    License: CC-0
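
    Since the citation gives the Hugging Face repository id, a minimal loading sketch is straightforward; it assumes the `datasets` library, and the column names are an assumption to verify on first load.

```python
# A minimal sketch for loading the prompts with the Hugging Face
# `datasets` library; column names are assumed, so inspect them first.
from datasets import load_dataset

ds = load_dataset("fka/awesome-chatgpt-prompts", split="train")
print(ds.column_names)  # typically ["act", "prompt"]
print(ds[0])            # first prompt record
```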

  8. Data Sheet 1_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    xlsx
    Updated Feb 5, 2025
    + more versions
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 1_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s001
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

    Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI's GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

    Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

    Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

    Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
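
    As a rough illustration of the Phase 2 fidelity checks (two-sample t-tests and 95% CI overlap), the sketch below compares one continuous parameter between a real and a synthetic table; the file names and the "age" column are placeholders, not part of this record.

```python
# Illustrative fidelity check on one continuous parameter; file and
# column names are hypothetical placeholders.
import pandas as pd
from scipy import stats

real = pd.read_csv("vitaldb_subset.csv")    # placeholder: real-world data
synth = pd.read_csv("gpt4o_synthetic.csv")  # placeholder: LLM-generated data

# Two-sample t-test (Welch's, not assuming equal variances).
t_stat, p_value = stats.ttest_ind(real["age"], synth["age"], equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

def ci95(s: pd.Series) -> tuple[float, float]:
    """95% confidence interval for the mean of a numeric column."""
    half = stats.sem(s) * stats.t.ppf(0.975, len(s) - 1)
    return s.mean() - half, s.mean() + half

lo_r, hi_r = ci95(real["age"])
lo_s, hi_s = ci95(synth["age"])
print("95% CIs overlap:", max(lo_r, lo_s) <= min(hi_r, hi_s))
```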

  9. Table_1_Evaluation of the quality and quantity of artificial...

    • frontiersin.figshare.com
    docx
    Updated Jul 11, 2024
    Cite
    Jisun Choi; Ah Ran Oh; Jungchan Park; Ryung A. Kang; Seung Yeon Yoo; Dong Jae Lee; Kwangmo Yang (2024). Table_1_Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0.DOCX [Dataset]. http://doi.org/10.3389/fmed.2024.1400153.s001
    Available download formats: docx
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Frontiers
    Authors
    Jisun Choi; Ah Ran Oh; Jungchan Park; Ryung A. Kang; Seung Yeon Yoo; Dong Jae Lee; Kwangmo Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide data quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures.

    Methods: Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were inputted into two versions of ChatGPT in English. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment, using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired sample t-test compared ChatGPT 3.5 and 4.0.

    Results: Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40 and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5, and "adequate" in 69% for 4.0. In overall assessment, 3 points were most common for 3.5 (36%), while 4 points were predominant for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in three areas.

    Conclusion: ChatGPT generated responses mostly ranging from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
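
    The paired comparison in the methods reduces to a standard paired-sample t-test; the sketch below uses made-up overall scores, one per rater for each ChatGPT version, purely to show the computation.

```python
# Paired-sample t-test on per-rater overall scores; the numbers are
# invented for illustration and are not the study's data.
from scipy import stats

scores_gpt35 = [3, 3, 4, 3, 2, 4, 3, 3, 4, 3]  # hypothetical 5-point ratings
scores_gpt40 = [4, 3, 4, 4, 3, 5, 4, 3, 4, 4]  # same raters, version 4.0

t_stat, p_value = stats.ttest_rel(scores_gpt40, scores_gpt35)
print(f"paired t = {t_stat:.3f}, p = {p_value:.4f}")
```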

  10. Orca DPO Dialogue Pairs

    • kaggle.com
    • opendatabay.com
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). Orca DPO Dialogue Pairs [Dataset]. https://www.kaggle.com/datasets/thedevastator/intel-orca-dialogue-pairs
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Intel Orca Dialogue Pairs

    Orca style for preference training (Intel's DPO dataset)

    By Huggingface Hub [source]

    About this dataset

    The Intel Orca DPO Pairs dataset is a unique resource for natural language processing (NLP) research, combining AI and human conversations collected from online sources. It is invaluable for exploring how human conversations can inform the development of conversational AI models. With columns such as System and Question extracted from chat logs, this dataset can help researchers understand how to better connect people with technology through meaningful dialogue. The data also includes columns for ChatGPT and Llama2-13b-Chat, two widely used conversational AI models. By leveraging this dataset, researchers have an exceptional opportunity to explore conversational techniques that enable humans and machines to communicate in natural language.


    How to use the dataset

    This guide provides an overview of how to use the Intel Orca DPO Pairs dataset efficiently for human-centric natural language processing research.

    Step 1: Understand the dataset

    The Intel Orca DPO Pairs dataset is composed of two main columns: System and Question. The System column contains responses from AI systems, and the Question column contains questions asked by humans. Additionally, the dataset contains columns for ChatGPT and Llama2-13b-Chat, two models used in developing conversational AI systems.

    Step 2: Prepare your environment

    Before analyzing the data, prepare your environment: make sure any necessary libraries or services are installed on your machine to avoid issues or errors during usage.

    Step 3: Access the data

    To access and start working with the data in this dataset, you can either download it directly via a Kaggle account or access it through one of its REST endpoints where available on other services (e.g., Databricks).

    Step 4: Explore and analyze the data

    Step 5: Report results

    Lastly, once exploration and analysis are complete, it is important that results are reported accurately, especially for sensitive datasets such as dialogue pairs, since the consequences of disseminating misinformation can be serious. Reporting should declare the standard relevant indicators and apply appropriate statistical tests to rule out anomalous outcomes.

    Research Ideas

    • Developing and improving natural language processing algorithms for AI-human conversation.
    • Building user-friendly chatbots that are better at recognizing and understanding human intent by training the model using this dataset.
    • Designing recommendation systems to predict user questions and generate more accurate responses based on previous conversations in the dataset

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name     | Description                                                           |
    |:----------------|:----------------------------------------------------------------------|
    | system          | Contains the AI system's response to the user's question. (Text)      |
    | chatgpt         | Contains the ChatGPT model's response to the user's question. (Text)  |
    | llama2-13b-chat | Contains the Llama2-13b-Chat model's response to the user's question. (Text) |
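
    Given the column layout above, a quick inspection sketch with pandas might look like the following; it assumes you have downloaded train.csv from the Kaggle page.

```python
# Inspect the dialogue-pairs CSV; column names follow the table above.
import pandas as pd

df = pd.read_csv("train.csv")
print(df.columns.tolist())                         # e.g. system, chatgpt, llama2-13b-chat
print(df[["chatgpt", "llama2-13b-chat"]].head(3))  # compare the two models' responses
```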


  11. VideoInstruct Dataset

    • paperswithcode.com
    Updated May 17, 2024
    Cite
    Muhammad Maaz; Hanoona Rasheed; Salman Khan; Fahad Shahbaz Khan (2024). VideoInstruct Dataset [Dataset]. https://paperswithcode.com/dataset/videoinstruct
    Dataset updated
    May 17, 2024
    Authors
    Muhammad Maaz; Hanoona Rasheed; Salman Khan; Fahad Shahbaz Khan
    Description

    Video Instruction Dataset is used to train Video-ChatGPT. It consists of 100,000 high-quality video instruction pairs, produced through a combination of human-assisted and semi-automatic annotation techniques designed to yield high-quality video instruction data. These methods create question-answer pairs covering:

    • Video summarization
    • Description-based question-answers (exploring spatial, temporal, relationship, and reasoning concepts)
    • Creative/generative question-answers

  12. Awesome ChatGPT Prompts

    • opendatabay.com
    .csv
    Updated Jun 20, 2025
    Cite
    Datasimple (2025). Awesome ChatGPT Prompts [Dataset]. https://www.opendatabay.com/data/ai-ml/b19fe949-9f50-4a6e-ba87-7318e75458c2
    Available download formats: .csv
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    Welcome to the "Awesome ChatGPT Prompts" dataset on Kaggle! This is a collection of prompt examples to be used with the ChatGPT model.

    The ChatGPT model is a large language model trained by OpenAI that is capable of generating human-like text. By providing it with a prompt, it can generate responses that continue the conversation or expand on the given prompt.

    License

    CC0

    Original Data Source: Awesome ChatGPT Prompts

  13. Learning the Rules of Peptide Self-assembly through Data Mining with Large...

    • zenodo.org
    bin, csv
    Updated Feb 2, 2025
    Cite
    Zhenze Yang; Sarah K. Yorke; Tuomas Knowles; Markus Buehler (2025). Learning the Rules of Peptide Self-assembly through Data Mining with Large Language Models [Dataset]. http://doi.org/10.5281/zenodo.14791268
    Available download formats: csv, bin
    Dataset updated
    Feb 2, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zhenze Yang; Sarah K. Yorke; Tuomas Knowles; Markus Buehler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Peptides are biologically ubiquitous and important molecules that self-assemble into diverse structures. While extensive research has explored the effects of chemical composition and environmental conditions on self-assembly, a systematic study consolidating this data to uncover global rules is lacking. In this work, we curate a peptide assembly database through a combination of manual processing by human experts and literature mining with a large language model. As a result, we collect more than 1,000 experimental data entries with information about peptide sequence, experimental conditions and corresponding self-assembly phases. Utilizing the data, machine learning models are trained and evaluated, demonstrating excellent accuracy (> 80%) and efficiency in assembly phase classification. Moreover, we fine-tune our GPT model for peptide literature mining with the developed dataset, which exhibits markedly superior performance in extracting information from academic publications relative to the pre-trained model. This workflow can improve efficiency when exploring potential self-assembling peptide candidates, through guiding experimental work, while also deepening our understanding of the mechanisms governing peptide self-assembly.

    The record contains the following files:

    • phase_data_clean.csv: stores 1,000+ peptide self-assembly data entries under different experimental conditions.
    • trainset.jsonl and testset.jsonl: the data used for fine-tuning the LLM (see the sketch after this list).
    • fine-tuning.ipynb: code used to fine-tune the ChatGPT model.
    • pretrain.ipynb: code used to test the pretrained ChatGPT model.
    • train_and_inference.ipynb: code to use the mined data to train and test an ML predictor for phase classification.
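
    For orientation, a minimal sketch of the fine-tuning workflow the notebooks implement might look like this, assuming the openai v1 Python client; the base model name is an assumption, and the actual notebooks in this record are authoritative.

```python
# Sketch of fine-tuning on trainset.jsonl with the openai v1 client;
# the base model is assumed, see fine-tuning.ipynb for the real code.
from openai import OpenAI

client = OpenAI()

# Upload the training split distributed with this record.
train_file = client.files.create(
    file=open("trainset.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch a fine-tuning job on the uploaded file.
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo",  # assumed base model
)
print(job.id, job.status)
```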

  14. OffensiveLang Dataset

    • paperswithcode.com
    Updated Mar 3, 2024
    + more versions
    Cite
    Amit Das; Mostafa Rahgouy; Dongji Feng; Zheng Zhang; Tathagata Bhattacharya; Nilanjana Raychawdhary; Fatemeh Jamshidi; Vinija Jain; Aman Chadha; Mary Sandage; Lauramarie Pope; Gerry Dozier; Cheryl Seals (2024). OffensiveLang Dataset [Dataset]. https://paperswithcode.com/dataset/offensivelang
    Dataset updated
    Mar 3, 2024
    Authors
    Amit Das; Mostafa Rahgouy; Dongji Feng; Zheng Zhang; Tathagata Bhattacharya; Nilanjana Raychawdhary; Fatemeh Jamshidi; Vinija Jain; Aman Chadha; Mary Sandage; Lauramarie Pope; Gerry Dozier; Cheryl Seals
    Description

    OffensiveLang is a community-based implicit offensive language dataset generated by ChatGPT 3.5, containing data for 38 different target groups. It has been meticulously annotated by Amazon MTurk workers, ensuring high-quality labeling of hate speech. Additionally, a prompt-based zero-shot method was employed with ChatGPT, and detection results were compared between human annotation and ChatGPT annotation. This dataset is invaluable for researchers and practitioners working on implicit hate speech detection and large language models.

    Source: ChatGPT 3.5
    Length: 8,270 texts (train: 6,616; test: 1,654)

    OffensiveLang.csv columns:

    • Column 1, Text: text generated by ChatGPT 3.5.
    • Column 2, Category: the category of the target group.
    • Column 3, Target Group: the target group for the text.
    • Column 4, Final Annotation: the final human annotation determined by majority vote among three MTurk annotators (used as the ground-truth annotation for model training and evaluation).
    • Column 5, OpenAI_Annotation: annotation provided by ChatGPT 3.5.
    • Columns 6-8, Annotator1-3: individual annotations from the three human annotators.
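
    Using the columns above, comparing the majority-vote human label with ChatGPT's own annotation is a one-liner in pandas; the sketch assumes the CSV is in the working directory.

```python
# Fraction of rows where ChatGPT's label matches the human majority vote;
# column names are taken from the list above.
import pandas as pd

df = pd.read_csv("OffensiveLang.csv")
agreement = (df["Final Annotation"] == df["OpenAI_Annotation"]).mean()
print(f"Human-vs-ChatGPT label agreement: {agreement:.1%}")
```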

    Paper: For a detailed investigation of the OffensiveLang dataset, refer to the associated paper: OffensiveLang: A Community Based Implicit Offensive Language Dataset.

    Citation: If you use this dataset in your research, please cite the following paper:

    @article{das2024offlandat,
      title={OffLanDat: A Community Based Implicit Offensive Language Dataset Generated by Large Language Model Through Prompt Engineering},
      author={Das, Amit and Rahgouy, Mostafa and Feng, Dongji and Zhang, Zheng and Bhattacharya, Tathagata and Raychawdhary, Nilanjana and Sandage, Mary and Pope, Lauramarie and Dozier, Gerry and Seals, Cheryl},
      journal={arXiv preprint arXiv:2403.02472},
      year={2024}
    }

    License: CC BY

    Contact: For any questions or feedback, feel free to contact the authors of the paper or the dataset contributors. Contact information: azd0123@auburn.edu

  15. Artificial Intelligence in Supply Chain Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Jan 23, 2025
    Cite
    Pro Market Reports (2025). Artificial Intelligence in Supply Chain Market Report [Dataset]. https://www.promarketreports.com/reports/artificial-intelligence-in-supply-chain-market-8381
    Available download formats: doc, pdf, ppt
    Dataset updated
    Jan 23, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Artificial Intelligence (AI) in Supply Chain Market size was valued at USD 51.35 billion in 2025 and is projected to expand at a compound annual growth rate (CAGR) of 7.80% from 2025 to 2033, reaching USD 93.47 billion by 2033. Market growth is attributed to the increasing adoption of AI technologies by businesses to improve supply chain efficiency and optimize operations. Factors such as rising demand for real-time data analysis, predictive analytics, and automation in the supply chain are driving growth, and government initiatives and investments in AI research and development are contributing further to the expansion.

    The market is segmented by component, end-user, and technology. The software segment held the largest market share in 2025, owing to the increasing adoption of software solutions for data analysis, forecasting, and optimization in the supply chain. The manufacturing end-user segment is expected to see the highest growth rate during the forecast period due to the increasing implementation of AI technologies to improve production efficiency and reduce costs. Machine learning and natural language processing are the key technologies driving growth.

    Recent developments include: The recent rise of artificial intelligence has given the sector fresh optimism, and one particular technology, ChatGPT, is showing a lot of potential. The ChatGPT language model was developed by OpenAI to generate human-like responses to questions posed in natural language. It has been trained on a large amount of data, identifying patterns and producing solutions that are highly accurate and pertinent to the situation. Just a few months after going live, the site had more than 100 million signups. ChatGPT has already shown enormous promise in healthcare and finance, and it is poised to change supply chain management for startups. Separately, Actyv.ai, a category pioneer in enterprise SaaS with embedded B2B BNPL and insurance, headquartered in Singapore, announced a strategic agreement with PwC India in March 2023 to promote embedded-finance adoption in supply chain ecosystems for their clients. In addition to facilitating access to relevant embedded financial and insurance products, the partnership seeks to use the potential of artificial intelligence to spur growth in the global supply chain ecosystem.

    Key drivers for this market are: increasing market growth of e-commerce; increasing growth in big data technology; high demand for advanced solutions for transparency in the supply chain process. Potential restraints include: lack of technical expertise.

  16. Google Gemini Statistics By Features, Performance and AI Versions

    • enterpriseappstoday.com
    Updated Dec 20, 2023
    Cite
    EnterpriseAppsToday (2023). Google Gemini Statistics By Features, Performance and AI Versions [Dataset]. https://www.enterpriseappstoday.com/stats/google-gemini-statistics.html
    Dataset updated
    Dec 20, 2023
    Dataset authored and provided by
    EnterpriseAppsToday
    License

    https://www.enterpriseappstoday.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Google Gemini Statistics: In 2023, Google unveiled its most powerful AI model to date. Google Gemini is billed as the world's most advanced AI, leaving ChatGPT-4 behind. Google offers the model in three sizes, each suited to tasks of different complexity. According to Google Gemini Statistics, these models can understand and solve complex problems related to almost anything. Google has even said it will develop AI in a way that shows how helpful it can be in our daily routines. Well, we hope the next generation won't become fully dependent on such technologies, otherwise we will lose our natural talents!

    Editor's Choice

    • Google Gemini can follow natural and engaging conversations.
    • Gemini Ultra scores 90.0% on the MMLU benchmark, which tests knowledge and problem-solving in subjects including history, physics, math, law, ethics, and medicine.
    • If you ask Gemini what to do with your raw material, it can provide ideas in the form of text or images according to the given input.
    • Gemini has outperformed ChatGPT-4 in the majority of tests.
    • According to the report, this LLM is said to be unique because it can process multiple types of data at the same time, including video, images, computer code, and text.
    • Google is calling this development The Gemini Era, underscoring how significant AI is in improving our daily lives.
    • Google Gemini can talk like a real person.
    • Gemini Ultra is the largest model and can solve extremely complex problems.
    • Gemini models are trained on multilingual and multimodal datasets.
    • On the MMMU benchmark, Gemini Ultra outperformed GPT-4V with the following scores: Art and Design (74.2), Business (62.7), Health and Medicine (71.3), Humanities and Social Science (78.3), and Technology and Engineering (53.0).

  17. Replication Package for "Improving the Readability of Generated Tests Using...

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Oct 5, 2023
    Cite
    Gregory Gay (2023). Replication Package for "Improving the Readability of Generated Tests Using GPT-4 and ChatGPT Code Interpreter" [Dataset]. http://doi.org/10.5281/zenodo.8296610
    Available download formats: zip, bin
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gregory Gay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    While automated test generation can decrease the human burden associated with testing, it does not eliminate this burden. Humans must still work with generated test cases to interpret testing results, debug the code, build and maintain a comprehensive test suite, and perform many other tasks. A major challenge with automated test generation is therefore the understandability of generated test cases.

    Large language models (LLMs), machine learning models trained on massive corpora of textual data - including both natural language and programming languages - are an emerging technology with great potential for performing language-related predictive tasks such as translation, summarization, and decision support.

    In this study, we are exploring the capabilities of LLMs with regard to improving test case understandability.

    This package contains the data produced during this exploration:

    • The examples directory contains the three case studies we tested our transformation process on:
      • queue_example: Tests of a basic queue data structure
      • httpie_sessions: Tests of the sessions module from the httpie project.
      • string_utils_validation: Tests of the validation module from the python-string-utils project.
      • Each directory contains the modules-under-test, the original test cases generated by Pynguin, and the transformed test cases.
      • Two trials of the transformation technique were performed per case study to assess the impact of varying results from the LLM.
    • The survey directory contains the survey that was sent to assess the impact of the transformation on test readability.
      • survey.pdf contains the survey questions.
      • responses.xlsx contains the survey results.

  18. Data from: S1 Dataset -

    • figshare.com
    xlsx
    Updated Sep 26, 2024
    Cite
    Jaimin Patel; Peyton Robinson; Elisa Illing; Benjamin Anthony (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0306233.s001
    Available download formats: xlsx
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jaimin Patel; Peyton Robinson; Elisa Illing; Benjamin Anthony
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ObjectivesThis study compares the performance of the artificial intelligence (AI) platform Chat Generative Pre-Trained Transformer (ChatGPT) to Otolaryngology trainees on board-style exam questions.MethodsWe administered a set of 30 Otolaryngology board-style questions to medical students (MS) and Otolaryngology residents (OR). 31 MSs and 17 ORs completed the questionnaire. The same test was administered to ChatGPT version 3.5, five times. Comparisons of performance were achieved using a one-way ANOVA with Tukey Post Hoc test, along with a regression analysis to explore the relationship between education level and performance.ResultsThe average scores increased each year from MS1 to PGY5. A one-way ANOVA revealed that ChatGPT outperformed trainee years MS1, MS2, and MS3 (p =

  19. Data from: Evaluating the Diagnostic Accuracy and Management Recommendations...

    • tandf.figshare.com
    docx
    Updated Sep 24, 2024
    Cite
    William Rojas-Carabali; Carlos Cifuentes-González; Xin Wei; Ikhwanuliman Putera; Alok Sen; Zheng Xian Thng; Rajdeep Agrawal; Tobias Elze; Lucia Sobrin; John H. Kempen; Bernett Lee; Jyotirmay Biswas; Quan Dong Nguyen; Vishali Gupta; Alejandra de-la-Torre; Rupesh Agrawal (2024). Evaluating the Diagnostic Accuracy and Management Recommendations of ChatGPT in Uveitis [Dataset]. http://doi.org/10.6084/m9.figshare.27097932.v1
    Available download formats: docx
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    William Rojas-Carabali; Carlos Cifuentes-González; Xin Wei; Ikhwanuliman Putera; Alok Sen; Zheng Xian Thng; Rajdeep Agrawal; Tobias Elze; Lucia Sobrin; John H. Kempen; Bernett Lee; Jyotirmay Biswas; Quan Dong Nguyen; Vishali Gupta; Alejandra de-la-Torre; Rupesh Agrawal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accurate diagnosis and timely management are vital for favorable uveitis outcomes. Artificial Intelligence (AI) holds promise in medical decision-making, particularly in ophthalmology. Yet, the diagnostic precision and management advice from AI-based uveitis chatbots lack assessment. We appraised diagnostic accuracy and management suggestions of an AI-based chatbot, ChatGPT, versus five uveitis-trained ophthalmologists, using 25 standard cases aligned with new Uveitis Nomenclature guidelines. Participants predicted likely diagnoses, two differentials, and next management steps. Comparative success rates were computed. Ophthalmologists excelled (60–92%) in likely diagnosis, exceeding AI (60%). Considering fully and partially accurate diagnoses, ophthalmologists achieved 76–100% success; AI attained 72%. Despite an 8% AI improvement, its overall performance lagged. Ophthalmologists and AI agreed on diagnosis in 48% cases, with 91.6% exhibiting concurrence in management plans. The study underscores AI chatbots' potential in uveitis diagnosis and management, indicating their value in reducing diagnostic errors. Further research is essential to enhance AI chatbot precision in diagnosis and recommendations.

  20. Data from: Performance of GPT-3.5 and GPT-4 on standardized urology...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 24, 2024
    Cite
    Max S. Yudovich; Elizaveta Makarova; Christian M. Hague; Jay D. Raman (2024). Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study [Dataset]. http://doi.org/10.7910/DVN/4EJOCL
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Max S. Yudovich; Elizaveta Makarova; Christian M. Hague; Jay D. Raman
    Description

    This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States. In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
