45 datasets found
  1. Estimated water consumption for training GPT-3 2023

    • statista.com
    • ai-chatbox.pro
    Updated Nov 19, 2024
    Cite
    Statista (2024). Estimated water consumption for training GPT-3 2023 [Dataset]. https://www.statista.com/statistics/1536925/gpt-3-estimated-water-consumption-training/
    Dataset updated
    Nov 19, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2023
    Area covered
    Worldwide
    Description

    GPT-3's water consumption for the training phase was estimated at roughly 4.8 billion liters, assuming the model was trained in Microsoft's Iowa data center (OpenAI has disclosed that this data center was used for training parts of the GPT-4 model). Had the model been trained entirely in the Washington data center instead, water consumption could have been as high as 15 billion liters. That would have amounted to more than Microsoft's total water withdrawals in 2023.

  2. Energy consumption when training LLMs in 2022 (in MWh)

    • statista.com
    Updated Sep 10, 2024
    + more versions
    Cite
    Statista (2024). Energy consumption when training LLMs in 2022 (in MWh) [Dataset]. https://www.statista.com/statistics/1384401/energy-use-when-training-llm-models/
    Dataset updated
    Sep 10, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2022
    Area covered
    Worldwide
    Description

    Energy consumption of artificial intelligence (AI) models during training is considerable: both GPT-3, the original release of the current iteration of OpenAI's popular ChatGPT, and Gopher consumed well over a thousand megawatt-hours of energy for training alone. Since this figure covers only training, energy consumption over the entire usage and lifetime of GPT-3 and other large language models (LLMs) is likely significantly higher. The largest consumer of energy, GPT-3, consumed roughly the annual energy equivalent of 200 German residents in 2022. While not a staggering amount, it is a considerable use of energy.

    Energy savings through AI

    While training LLMs undoubtedly takes a considerable amount of energy, the resulting energy savings are also likely to be substantial. Any AI model that improves processes by small margins might save hours on shipments, liters of fuel, or dozens of computations; each of these consumes energy as well, and the total energy saved through an LLM might vastly exceed its energy cost. A good example is mobile phone operators, a third of whom expect AI to reduce power consumption by ten to fifteen percent. Considering how much of the world uses mobile phones, this would be a considerable energy saver.

    Emissions are considerable

    The amount of CO2 emitted in training LLMs is also considerable, with GPT-3 producing nearly 500 tonnes of CO2. This figure could change radically depending on the types of energy production behind the emissions: most data center operators, for instance, would prefer nuclear energy, a significantly low-emission source, to play a key role.

  3. Date Set: ChatGPT as an education and learning tool for engineering,...

    • data.mendeley.com
    Updated Jun 25, 2025
    Cite
    RAVINDRA BHARDWAJ (2025). Date Set: ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India [Dataset]. http://doi.org/10.17632/995zwcz5yt.2
    Dataset updated
    Jun 25, 2025
    Authors
    RAVINDRA BHARDWAJ
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    This is the raw data that is used in the publication: ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India.

  4. Replication Data for: ChatGPT outperforms crowd-workers for text-annotation...

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Gilardi, Fabrizio; Alizadeh, Meysam; Kubli, Maël (2023). Replication Data for: ChatGPT outperforms crowd-workers for text-annotation tasks [Dataset]. http://doi.org/10.7910/DVN/PQYF6M
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Gilardi, Fabrizio; Alizadeh, Meysam; Kubli, Maël
    Description

    Many NLP applications require manual text annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by crowd-workers on platforms such as MTurk as well as trained annotators, such as research assistants. Using four samples of tweets and news articles (n = 6,183), we show that ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frame detection. Across the four datasets, the zero-shot accuracy of ChatGPT exceeds that of crowd-workers by about 25 percentage points on average, while ChatGPT's intercoder agreement exceeds that of both crowd-workers and trained annotators for all tasks. Moreover, the per-annotation cost of ChatGPT is less than $0.003---about thirty times cheaper than MTurk. These results demonstrate the potential of large language models to drastically increase the efficiency of text classification.
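
    The annotation setup described above lends itself to a compact zero-shot sketch. The snippet below is illustrative only and assumes the openai v1 Python client; the label set, prompt wording, and model name are assumptions, not details from the replication package.

```python
# Hypothetical zero-shot annotation in the spirit of the study; the
# prompt, labels, and model are placeholders, not the authors' setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def annotate_stance(text: str) -> str:
    """Return a single stance label for one tweet or article snippet."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model
        temperature=0,          # deterministic labels for annotation
        messages=[
            {"role": "system", "content": "You are a text-annotation assistant."},
            {"role": "user", "content": (
                "Classify the stance of the following text toward content "
                "moderation as POSITIVE, NEGATIVE, or NEUTRAL. "
                "Reply with the label only.\n\n" + text
            )},
        ],
    )
    return resp.choices[0].message.content.strip()

print(annotate_stance("Platforms should be required to remove harmful posts."))
```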

  5. Global weekly interest in ChatGPT on Google search 2022-2024

    • statista.com
    Updated Jul 5, 2024
    Cite
    Statista (2024). Global weekly interest in ChatGPT on Google search 2022-2024 [Dataset]. https://www.statista.com/statistics/1366930/chatgpt-google-search-weekly-worldwide/
    Dataset updated
    Jul 5, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 6, 2022 - Jun 30, 2024
    Area covered
    Worldwide
    Description

    As of June 2024, global Google searches for the term "ChatGPT" had increased again after a slight decline at the end of 2023. Interest in the chatbot, developed by U.S.-based OpenAI and launched in November 2022, started rising in the week ending December 3, 2022. Growing demand for information on ChatGPT recently pushed the keyword to a peak of 100 index points in the week ending June 2, 2024. ChatGPT, which stands for Chat Generative Pre-trained Transformer, is a chatbot and AI-powered generative text system able to give human-sounding replies and reproduce human-like interactions when prompted.

  6. ChatGPT as an education and learning tool for engineering, technology and...

    • data.mendeley.com
    Updated May 14, 2024
    Cite
    RAVINDRA BHARDWAJ (2024). ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India [Dataset]. http://doi.org/10.17632/995zwcz5yt.1
    Dataset updated
    May 14, 2024
    Authors
    RAVINDRA BHARDWAJ
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    This is the raw data that is used in the publication: ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India.

  7. awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a dataset repository of Awesome ChatGPT Prompts. All prompts are also viewable on GitHub.

    License: CC-0
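
    Since the citation gives the Hugging Face repository id, a minimal loading sketch is straightforward; it assumes the `datasets` library, and the column names are an assumption to verify on first load.

```python
# A minimal sketch for loading the prompts with the Hugging Face
# `datasets` library; column names are assumed, so inspect them first.
from datasets import load_dataset

ds = load_dataset("fka/awesome-chatgpt-prompts", split="train")
print(ds.column_names)  # typically ["act", "prompt"]
print(ds[0])            # first prompt record
```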

  8. Data Sheet 1_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    xlsx
    Updated Feb 5, 2025
    + more versions
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 1_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s001
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

    Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI's GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

    Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

    Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

    Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
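
    As a rough illustration of the Phase 2 fidelity checks (two-sample t-tests and 95% CI overlap), the sketch below compares one continuous parameter between a real and a synthetic table; the file names and the "age" column are placeholders, not part of this record.

```python
# Illustrative fidelity check on one continuous parameter; file and
# column names are hypothetical placeholders.
import pandas as pd
from scipy import stats

real = pd.read_csv("vitaldb_subset.csv")    # placeholder: real-world data
synth = pd.read_csv("gpt4o_synthetic.csv")  # placeholder: LLM-generated data

# Two-sample t-test (Welch's, not assuming equal variances).
t_stat, p_value = stats.ttest_ind(real["age"], synth["age"], equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

def ci95(s: pd.Series) -> tuple[float, float]:
    """95% confidence interval for the mean of a numeric column."""
    half = stats.sem(s) * stats.t.ppf(0.975, len(s) - 1)
    return s.mean() - half, s.mean() + half

lo_r, hi_r = ci95(real["age"])
lo_s, hi_s = ci95(synth["age"])
print("95% CIs overlap:", max(lo_r, lo_s) <= min(hi_r, hi_s))
```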

  9. Table_1_Evaluation of the quality and quantity of artificial...

    • frontiersin.figshare.com
    docx
    Updated Jul 11, 2024
    Cite
    Jisun Choi; Ah Ran Oh; Jungchan Park; Ryung A. Kang; Seung Yeon Yoo; Dong Jae Lee; Kwangmo Yang (2024). Table_1_Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0.DOCX [Dataset]. http://doi.org/10.3389/fmed.2024.1400153.s001
    Available download formats: docx
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Frontiers
    Authors
    Jisun Choi; Ah Ran Oh; Jungchan Park; Ryung A. Kang; Seung Yeon Yoo; Dong Jae Lee; Kwangmo Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide data quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures.

    Methods: Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were inputted into two versions of ChatGPT in English. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment, using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired sample t-test compared ChatGPT 3.5 and 4.0.

    Results: Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40 and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5, and "adequate" in 69% for 4.0. In overall assessment, 3 points were most common for 3.5 (36%), while 4 points were predominant for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in three areas.

    Conclusion: ChatGPT generated responses mostly ranging from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
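
    The paired comparison in the methods reduces to a standard paired-sample t-test; the sketch below uses made-up overall scores, one per rater for each ChatGPT version, purely to show the computation.

```python
# Paired-sample t-test on per-rater overall scores; the numbers are
# invented for illustration and are not the study's data.
from scipy import stats

scores_gpt35 = [3, 3, 4, 3, 2, 4, 3, 3, 4, 3]  # hypothetical 5-point ratings
scores_gpt40 = [4, 3, 4, 4, 3, 5, 4, 3, 4, 4]  # same raters, version 4.0

t_stat, p_value = stats.ttest_rel(scores_gpt40, scores_gpt35)
print(f"paired t = {t_stat:.3f}, p = {p_value:.4f}")
```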

  10. Orca DPO Dialogue Pairs

    • kaggle.com
    • opendatabay.com
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). Orca DPO Dialogue Pairs [Dataset]. https://www.kaggle.com/datasets/thedevastator/intel-orca-dialogue-pairs
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Intel Orca Dialogue Pairs

    Orca style for preference training (Intel's DPO dataset)

    By Huggingface Hub [source]

    About this dataset

    The Intel Orca DPO Pairs dataset is a unique resource for natural language processing (NLP) research, combining AI and human conversations collected from online sources. It is invaluable for exploring how human conversations can inform the development of conversational AI models. With columns such as System and Question extracted from chat logs, this dataset can help researchers understand how to better connect people with technology through meaningful dialogue. The data also includes columns for ChatGPT and Llama2-13b-Chat, two widely used conversational AI models. By leveraging this dataset, researchers have an exceptional opportunity to explore conversational techniques that enable humans and machines to communicate in natural language.


    How to use the dataset

    This guide provides an overview of how to use the Intel Orca DPO Pairs dataset efficiently for human-centric natural language processing research.

    Step 1: Understand the dataset

    The Intel Orca DPO Pairs dataset is composed of two main columns: System and Question. The System column contains responses from AI systems, and the Question column contains questions asked by humans. Additionally, the dataset contains columns for ChatGPT and Llama2-13b-Chat, two models used in developing conversational AI systems.

    Step 2: Prepare your environment

    Before analyzing the data, prepare your environment: make sure any necessary libraries or services are installed on your machine to avoid issues or errors during usage.

    Step 3: Access the data

    To access and start working with the data in this dataset, you can either download it directly via a Kaggle account or access it through one of its REST endpoints where available on other services (e.g., Databricks).

    Step 4: Explore and analyze the data

    Step 5: Report results

    Lastly, once exploration and analysis are complete, it is important that results are reported accurately, especially for sensitive datasets such as dialogue pairs, since the consequences of disseminating misinformation can be serious. Reporting should declare the standard relevant indicators and apply appropriate statistical tests to rule out anomalous outcomes.

    Research Ideas

    • Developing and improving natural language processing algorithms for AI-human conversation.
    • Building user-friendly chatbots that are better at recognizing and understanding human intent by training the model using this dataset.
    • Designing recommendation systems to predict user questions and generate more accurate responses based on previous conversations in the dataset

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name     | Description                                                           |
    |:----------------|:----------------------------------------------------------------------|
    | system          | Contains the AI system's response to the user's question. (Text)      |
    | chatgpt         | Contains the ChatGPT model's response to the user's question. (Text)  |
    | llama2-13b-chat | Contains the Llama2-13b-Chat model's response to the user's question. (Text) |
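
    Given the column layout above, a quick inspection sketch with pandas might look like the following; it assumes you have downloaded train.csv from the Kaggle page.

```python
# Inspect the dialogue-pairs CSV; column names follow the table above.
import pandas as pd

df = pd.read_csv("train.csv")
print(df.columns.tolist())                         # e.g. system, chatgpt, llama2-13b-chat
print(df[["chatgpt", "llama2-13b-chat"]].head(3))  # compare the two models' responses
```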


  11. VideoInstruct Dataset

    • paperswithcode.com
    Updated May 17, 2024
    Cite
    Muhammad Maaz; Hanoona Rasheed; Salman Khan; Fahad Shahbaz Khan (2024). VideoInstruct Dataset [Dataset]. https://paperswithcode.com/dataset/videoinstruct
    Dataset updated
    May 17, 2024
    Authors
    Muhammad Maaz; Hanoona Rasheed; Salman Khan; Fahad Shahbaz Khan
    Description

    Video Instruction Dataset is used to train Video-ChatGPT. It consists of 100,000 high-quality video instruction pairs, produced through a combination of human-assisted and semi-automatic annotation techniques designed to yield high-quality video instruction data. These methods create question-answer pairs covering:

    • Video summarization
    • Description-based question-answers (exploring spatial, temporal, relationship, and reasoning concepts)
    • Creative/generative question-answers

  12. Awesome ChatGPT Prompts

    • opendatabay.com
    .csv
    Updated Jun 20, 2025
    Cite
    Datasimple (2025). Awesome ChatGPT Prompts [Dataset]. https://www.opendatabay.com/data/ai-ml/b19fe949-9f50-4a6e-ba87-7318e75458c2
    Available download formats: .csv
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    Welcome to the "Awesome ChatGPT Prompts" dataset on Kaggle! This is a collection of prompt examples to be used with the ChatGPT model.

    The ChatGPT model is a large language model trained by OpenAI that is capable of generating human-like text. By providing it with a prompt, it can generate responses that continue the conversation or expand on the given prompt.

    License

    CC0

    Original Data Source: Awesome ChatGPT Prompts

  13. Learning the Rules of Peptide Self-assembly through Data Mining with Large...

    • zenodo.org
    bin, csv
    Updated Feb 2, 2025
    Cite
    Zhenze Yang; Sarah K. Yorke; Tuomas Knowles; Markus Buehler (2025). Learning the Rules of Peptide Self-assembly through Data Mining with Large Language Models [Dataset]. http://doi.org/10.5281/zenodo.14791268
    Available download formats: csv, bin
    Dataset updated
    Feb 2, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zhenze Yang; Sarah K. Yorke; Tuomas Knowles; Markus Buehler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Peptides are biologically ubiquitous and important molecules that self-assemble into diverse structures. While extensive research has explored the effects of chemical composition and environmental conditions on self-assembly, a systematic study consolidating this data to uncover global rules is lacking. In this work, we curate a peptide assembly database through a combination of manual processing by human experts and literature mining with a large language model. As a result, we collect more than 1,000 experimental data entries with information about peptide sequence, experimental conditions and corresponding self-assembly phases. Utilizing the data, machine learning models are trained and evaluated, demonstrating excellent accuracy (> 80%) and efficiency in assembly phase classification. Moreover, we fine-tune our GPT model for peptide literature mining with the developed dataset, which exhibits markedly superior performance in extracting information from academic publications relative to the pre-trained model. This workflow can improve efficiency when exploring potential self-assembling peptide candidates, through guiding experimental work, while also deepening our understanding of the mechanisms governing peptide self-assembly.

    The record contains the following files:

    • phase_data_clean.csv: stores 1,000+ peptide self-assembly data entries under different experimental conditions.
    • trainset.jsonl and testset.jsonl: the data used for fine-tuning the LLM (see the sketch after this list).
    • fine-tuning.ipynb: code used to fine-tune the ChatGPT model.
    • pretrain.ipynb: code used to test the pretrained ChatGPT model.
    • train_and_inference.ipynb: code to use the mined data to train and test an ML predictor for phase classification.
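
    For orientation, a minimal sketch of the fine-tuning workflow the notebooks implement might look like this, assuming the openai v1 Python client; the base model name is an assumption, and the actual notebooks in this record are authoritative.

```python
# Sketch of fine-tuning on trainset.jsonl with the openai v1 client;
# the base model is assumed, see fine-tuning.ipynb for the real code.
from openai import OpenAI

client = OpenAI()

# Upload the training split distributed with this record.
train_file = client.files.create(
    file=open("trainset.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch a fine-tuning job on the uploaded file.
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo",  # assumed base model
)
print(job.id, job.status)
```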

  14. OffensiveLang Dataset

    • paperswithcode.com
    Updated Mar 3, 2024
    + more versions
    Cite
    Amit Das; Mostafa Rahgouy; Dongji Feng; Zheng Zhang; Tathagata Bhattacharya; Nilanjana Raychawdhary; Fatemeh Jamshidi; Vinija Jain; Aman Chadha; Mary Sandage; Lauramarie Pope; Gerry Dozier; Cheryl Seals (2024). OffensiveLang Dataset [Dataset]. https://paperswithcode.com/dataset/offensivelang
    Dataset updated
    Mar 3, 2024
    Authors
    Amit Das; Mostafa Rahgouy; Dongji Feng; Zheng Zhang; Tathagata Bhattacharya; Nilanjana Raychawdhary; Fatemeh Jamshidi; Vinija Jain; Aman Chadha; Mary Sandage; Lauramarie Pope; Gerry Dozier; Cheryl Seals
    Description

    OffensiveLang is a community-based implicit offensive language dataset generated by ChatGPT 3.5, containing data for 38 different target groups. It has been meticulously annotated by Amazon MTurk workers, ensuring high-quality labeling of hate speech. Additionally, a prompt-based zero-shot method was employed with ChatGPT, and detection results were compared between human annotation and ChatGPT annotation. This dataset is invaluable for researchers and practitioners working on implicit hate speech detection and large language models.

    Source: ChatGPT 3.5
    Length: 8,270 texts (train: 6,616; test: 1,654)

    OffensiveLang.csv columns:

    • Column 1, Text: text generated by ChatGPT 3.5.
    • Column 2, Category: the category of the target group.
    • Column 3, Target Group: the target group for the text.
    • Column 4, Final Annotation: the final human annotation determined by majority vote among three MTurk annotators (used as the ground-truth annotation for model training and evaluation).
    • Column 5, OpenAI_Annotation: annotation provided by ChatGPT 3.5.
    • Columns 6-8, Annotator1-3: individual annotations from the three human annotators.
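
    Using the columns above, comparing the majority-vote human label with ChatGPT's own annotation is a one-liner in pandas; the sketch assumes the CSV is in the working directory.

```python
# Fraction of rows where ChatGPT's label matches the human majority vote;
# column names are taken from the list above.
import pandas as pd

df = pd.read_csv("OffensiveLang.csv")
agreement = (df["Final Annotation"] == df["OpenAI_Annotation"]).mean()
print(f"Human-vs-ChatGPT label agreement: {agreement:.1%}")
```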

    Paper: For a detailed investigation of the OffensiveLang dataset, refer to the associated paper: OffensiveLang: A Community Based Implicit Offensive Language Dataset.

    Citation: If you use this dataset in your research, please cite the following paper:

    @article{das2024offlandat,
      title={OffLanDat: A Community Based Implicit Offensive Language Dataset Generated by Large Language Model Through Prompt Engineering},
      author={Das, Amit and Rahgouy, Mostafa and Feng, Dongji and Zhang, Zheng and Bhattacharya, Tathagata and Raychawdhary, Nilanjana and Sandage, Mary and Pope, Lauramarie and Dozier, Gerry and Seals, Cheryl},
      journal={arXiv preprint arXiv:2403.02472},
      year={2024}
    }

    License: CC BY

    Contact: For any questions or feedback, feel free to contact the authors of the paper or the dataset contributors. Contact information: azd0123@auburn.edu

  15. Artificial Intelligence in Supply Chain Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Jan 23, 2025
    Cite
    Pro Market Reports (2025). Artificial Intelligence in Supply Chain Market Report [Dataset]. https://www.promarketreports.com/reports/artificial-intelligence-in-supply-chain-market-8381
    Available download formats: doc, pdf, ppt
    Dataset updated
    Jan 23, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Artificial Intelligence (AI) in Supply Chain Market size was valued at USD 51.35 billion in 2025 and is projected to expand at a compound annual growth rate (CAGR) of 7.80% from 2025 to 2033, reaching USD 93.47 billion by 2033. Market growth is attributed to the increasing adoption of AI technologies by businesses to improve supply chain efficiency and optimize operations. Factors such as rising demand for real-time data analysis, predictive analytics, and automation in the supply chain are driving growth, and government initiatives and investments in AI research and development are contributing further to the expansion.

    The market is segmented by component, end-user, and technology. The software segment held the largest market share in 2025, owing to the increasing adoption of software solutions for data analysis, forecasting, and optimization in the supply chain. The manufacturing end-user segment is expected to see the highest growth rate during the forecast period due to the increasing implementation of AI technologies to improve production efficiency and reduce costs. Machine learning and natural language processing are the key technologies driving growth.

    Recent developments include: The recent rise of artificial intelligence has given the sector fresh optimism, and one particular technology, ChatGPT, is showing a lot of potential. The ChatGPT language model was developed by OpenAI to generate human-like responses to questions posed in natural language. It has been trained on a large amount of data, identifying patterns and producing solutions that are highly accurate and pertinent to the situation. Just a few months after going live, the site had more than 100 million signups. ChatGPT has already shown enormous promise in healthcare and finance, and it is poised to change supply chain management for startups. Separately, Actyv.ai, a category pioneer in enterprise SaaS with embedded B2B BNPL and insurance, headquartered in Singapore, announced a strategic agreement with PwC India in March 2023 to promote embedded-finance adoption in supply chain ecosystems for their clients. In addition to facilitating access to relevant embedded financial and insurance products, the partnership seeks to use the potential of artificial intelligence to spur growth in the global supply chain ecosystem.

    Key drivers for this market are: increasing market growth of e-commerce; increasing growth in big data technology; high demand for advanced solutions for transparency in the supply chain process. Potential restraints include: lack of technical expertise.

  16. Google Gemini Statistics By Features, Performance and AI Versions

    • enterpriseappstoday.com
    Updated Dec 20, 2023
    Cite
    EnterpriseAppsToday (2023). Google Gemini Statistics By Features, Performance and AI Versions [Dataset]. https://www.enterpriseappstoday.com/stats/google-gemini-statistics.html
    Dataset updated
    Dec 20, 2023
    Dataset authored and provided by
    EnterpriseAppsToday
    License

    https://www.enterpriseappstoday.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Google Gemini Statistics: In 2023, Google unveiled its most powerful AI model to date. Google Gemini is billed as the world's most advanced AI, leaving ChatGPT-4 behind. Google offers the model in three sizes, each suited to tasks of different complexity. According to Google Gemini Statistics, these models can understand and solve complex problems related to almost anything. Google has even said it will develop AI in a way that shows how helpful it can be in our daily routines. Well, we hope the next generation won't become fully dependent on such technologies, otherwise we will lose our natural talents!

    Editor's Choice

    • Google Gemini can follow natural and engaging conversations.
    • Gemini Ultra scores 90.0% on the MMLU benchmark, which tests knowledge and problem-solving in subjects including history, physics, math, law, ethics, and medicine.
    • If you ask Gemini what to do with your raw material, it can provide ideas in the form of text or images according to the given input.
    • Gemini has outperformed ChatGPT-4 in the majority of tests.
    • According to the report, this LLM is said to be unique because it can process multiple types of data at the same time, including video, images, computer code, and text.
    • Google is calling this development The Gemini Era, underscoring how significant AI is in improving our daily lives.
    • Google Gemini can talk like a real person.
    • Gemini Ultra is the largest model and can solve extremely complex problems.
    • Gemini models are trained on multilingual and multimodal datasets.
    • On the MMMU benchmark, Gemini Ultra outperformed GPT-4V with the following scores: Art and Design (74.2), Business (62.7), Health and Medicine (71.3), Humanities and Social Science (78.3), and Technology and Engineering (53.0).

  17. Replication Package for "Improving the Readability of Generated Tests Using...

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Oct 5, 2023
    Cite
    Gregory Gay (2023). Replication Package for "Improving the Readability of Generated Tests Using GPT-4 and ChatGPT Code Interpreter" [Dataset]. http://doi.org/10.5281/zenodo.8296610
    Available download formats: zip, bin
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gregory Gay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    While automated test generation can decrease the human burden associated with testing, it does not eliminate this burden. Humans must still work with generated test cases to interpret testing results, debug the code, build and maintain a comprehensive test suite, and perform many other tasks. A major challenge with automated test generation is therefore the understandability of generated test cases.

    Large language models (LLMs), machine learning models trained on massive corpora of textual data - including both natural language and programming languages - are an emerging technology with great potential for performing language-related predictive tasks such as translation, summarization, and decision support.

    In this study, we are exploring the capabilities of LLMs with regard to improving test case understandability.

    This package contains the data produced during this exploration:

    • The examples directory contains the three case studies we tested our transformation process on:
      • queue_example: Tests of a basic queue data structure
      • httpie_sessions: Tests of the sessions module from the httpie project.
      • string_utils_validation: Tests of the validation module from the python-string-utils project.
      • Each directory contains the modules-under-test, the original test cases generated by Pynguin, and the transformed test cases.
      • Two trials of the transformation technique were performed per case study to assess the impact of varying results from the LLM.
    • The survey directory contains the survey that was sent to assess the impact of the transformation on test readability.
      • survey.pdf contains the survey questions.
      • responses.xlsx contains the survey results.

  18. Data from: S1 Dataset -

    • figshare.com
    xlsx
    Updated Sep 26, 2024
    Cite
    Jaimin Patel; Peyton Robinson; Elisa Illing; Benjamin Anthony (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0306233.s001
    Available download formats: xlsx
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jaimin Patel; Peyton Robinson; Elisa Illing; Benjamin Anthony
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ObjectivesThis study compares the performance of the artificial intelligence (AI) platform Chat Generative Pre-Trained Transformer (ChatGPT) to Otolaryngology trainees on board-style exam questions.MethodsWe administered a set of 30 Otolaryngology board-style questions to medical students (MS) and Otolaryngology residents (OR). 31 MSs and 17 ORs completed the questionnaire. The same test was administered to ChatGPT version 3.5, five times. Comparisons of performance were achieved using a one-way ANOVA with Tukey Post Hoc test, along with a regression analysis to explore the relationship between education level and performance.ResultsThe average scores increased each year from MS1 to PGY5. A one-way ANOVA revealed that ChatGPT outperformed trainee years MS1, MS2, and MS3 (p =

  19. Data from: Evaluating the Diagnostic Accuracy and Management Recommendations...

    • tandf.figshare.com
    docx
    Updated Sep 24, 2024
    Cite
    William Rojas-Carabali; Carlos Cifuentes-González; Xin Wei; Ikhwanuliman Putera; Alok Sen; Zheng Xian Thng; Rajdeep Agrawal; Tobias Elze; Lucia Sobrin; John H. Kempen; Bernett Lee; Jyotirmay Biswas; Quan Dong Nguyen; Vishali Gupta; Alejandra de-la-Torre; Rupesh Agrawal (2024). Evaluating the Diagnostic Accuracy and Management Recommendations of ChatGPT in Uveitis [Dataset]. http://doi.org/10.6084/m9.figshare.27097932.v1
    Available download formats: docx
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    William Rojas-Carabali; Carlos Cifuentes-González; Xin Wei; Ikhwanuliman Putera; Alok Sen; Zheng Xian Thng; Rajdeep Agrawal; Tobias Elze; Lucia Sobrin; John H. Kempen; Bernett Lee; Jyotirmay Biswas; Quan Dong Nguyen; Vishali Gupta; Alejandra de-la-Torre; Rupesh Agrawal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accurate diagnosis and timely management are vital for favorable uveitis outcomes. Artificial Intelligence (AI) holds promise in medical decision-making, particularly in ophthalmology. Yet, the diagnostic precision and management advice from AI-based uveitis chatbots lack assessment. We appraised diagnostic accuracy and management suggestions of an AI-based chatbot, ChatGPT, versus five uveitis-trained ophthalmologists, using 25 standard cases aligned with new Uveitis Nomenclature guidelines. Participants predicted likely diagnoses, two differentials, and next management steps. Comparative success rates were computed. Ophthalmologists excelled (60–92%) in likely diagnosis, exceeding AI (60%). Considering fully and partially accurate diagnoses, ophthalmologists achieved 76–100% success; AI attained 72%. Despite an 8% AI improvement, its overall performance lagged. Ophthalmologists and AI agreed on diagnosis in 48% cases, with 91.6% exhibiting concurrence in management plans. The study underscores AI chatbots' potential in uveitis diagnosis and management, indicating their value in reducing diagnostic errors. Further research is essential to enhance AI chatbot precision in diagnosis and recommendations.

  20. Data from: Performance of GPT-3.5 and GPT-4 on standardized urology...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 24, 2024
    Cite
    Max S. Yudovich; Elizaveta Makarova; Christian M. Hague; Jay D. Raman (2024). Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study [Dataset]. http://doi.org/10.7910/DVN/4EJOCL
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Max S. Yudovich; Elizaveta Makarova; Christian M. Hague; Jay D. Raman
    Description

    This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States. In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
