55 datasets found
  1. Estimated water consumption for training GPT-3 2023

    • statista.com
    Cite
    Statista, Estimated water consumption for training GPT-3 2023 [Dataset]. https://www.statista.com/statistics/1536925/gpt-3-estimated-water-consumption-training/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2023
    Area covered
    Worldwide
    Description

    GPT-3's water consumption for the training phase was estimated at roughly 4.8 billion liters of water, assuming the model was trained in Microsoft's Iowa data center (OpenAI has disclosed that the data center was used for training parts of the GPT-4 model). If the model had been fully trained in the Washington data center, water consumption could have been as high as 15 billion liters. That would have amounted to more than Microsoft's total water withdrawals in 2023.

  2. All GPT-4 Conversations

    • kaggle.com
    Updated Nov 21, 2023
    Cite
    The Devastator (2023). All GPT-4 Conversations [Dataset]. https://www.kaggle.com/datasets/thedevastator/all-gpt-4-synthetic-chat-datasets
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    All GPT-4 Generated Datasets

    Every chat dataset generated by GPT-4 on Huggingface, converted to a single shared format.

    From Huggingface datasets

    About this dataset

    How to use the dataset

    The dataset includes all chat conversations generated by GPT-4 that are hosted on open Huggingface datasets. Everything is converted to the same format, so the datasets can be easily merged and used for large-scale training of LLMs.
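    Because every sub-dataset ends up with the same schema, merging reduces to concatenation with the Huggingface datasets library. A minimal sketch, assuming a shared "conversations" column; the repository names below are placeholders, not the actual source list:

```python
from datasets import load_dataset, concatenate_datasets

# Placeholder repository ids; the real sub-dataset list is on the dataset page.
SOURCES = ["some-org/gpt4-chats-a", "some-org/gpt4-chats-b"]

# Keep only the shared "conversations" column so all parts have identical
# features, then concatenate into one dataset for large-scale LLM training.
parts = [
    load_dataset(name, split="train").select_columns(["conversations"])
    for name in SOURCES
]
merged = concatenate_datasets(parts)
print(merged)
```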

    Acknowledgements

    This dataset is a collection of several single chat datasets. If you use this dataset in your research, please credit the original authors of the constituent datasets.

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

  3. airoboros-gpt4

    • huggingface.co
    Updated Jun 4, 2023
    + more versions
    Cite
    Jon Durbin (2023). airoboros-gpt4 [Dataset]. https://huggingface.co/datasets/jondurbin/airoboros-gpt4
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 4, 2023
    Authors
    Jon Durbin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The data was generated by GPT-4 and is therefore subject to the OpenAI ToS. The tool used to generate the data, airoboros, is Apache-2.0 licensed. Specific areas of focus for this training data:

    • trivia
    • math
    • nonsensical math
    • coding
    • closed context question answering
    • closed context question answering, with multiple contexts to choose from as confounding factors
    • writing
    • multiple choice

    Usage and License Notices

    All airoboros models and datasets are intended and licensed for research use only.… See the full description on the dataset page: https://huggingface.co/datasets/jondurbin/airoboros-gpt4.

  4. LLM Feedback Collection

    • kaggle.com
    zip
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). LLM Feedback Collection [Dataset]. https://www.kaggle.com/datasets/thedevastator/fine-grained-gpt-4-evaluation
    Explore at:
    zip (159502027 bytes)
    Dataset updated
    Nov 23, 2023
    Authors
    The Devastator
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    LLM Feedback Collection

    Induce fine-grained evaluation capabilities into language models

    By Huggingface Hub

    About this dataset

    This dataset contains 100,000 feedback responses from GPT-4 AI models along with rubrics designed to evaluate both absolute and ranking scores. Each response is collected through a comprehensive evaluation process that takes into account the model's feedback, instruction, scoring criteria, reference answer and input. This data provides researchers and developers with valuable insights into the performance of their AI models on various tasks, as well as the ability to compare them against one another using precise and accurate measures. Each response is accompanied by five descriptive scores that give a detailed overview of its quality in terms of relevance to the given input, accuracy with respect to the provided reference answer, coherence between different parts of the output such as grammar and organization, fluency in the expression of ideas without errors or unnecessary repetition, and overall productivity accounting for all other factors combined. With this dataset at your disposal, you can evaluate each output qualitatively without having to manually inspect every single response.


    How to use the dataset

    This dataset contains feedback from GPT-4 models, along with associated rubrics for absolute and ranking scoring. It can be used to evaluate the performance of GPT-4 models on different challenging tasks.

    In order to use this dataset effectively, it is important to understand the data provided in each column:
    • orig_feedback: Feedback given by the original GPT-4 model
    • orig_score2_description: Description of the second score given to the original GPT-4 model
    • orig_reference_answer: Reference answer used to evaluate the original GPT-4 model
    • output: Output from the fine-grained evaluation
    • orig_response: Response from the original GPT-4 model
    • orig_criteria: Criteria used to evaluate the original GPT-4 model
    • orig_instruction: Instruction given to the original GPT-4 model
    • orig_score3_description: Description of the third score given to the original GPT-4 model

    Research Ideas

    • Data-driven evaluation of GPT-4 models using the absolute and ranking scores collected from this dataset.
    • Training a deep learning model to automate the assessment of GPT-4 responses based on the rubrics provided in this dataset.
    • Building a semantic search engine using GPT-4 that is able to identify relevant responses more accurately with the help of this dataset's data collection metrics and rubrics for scoring.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name | Description |
    |:---|:---|
    | orig_feedback | Feedback from the evaluator. (Text) |
    | orig_score2_description | Description of the second score given by the evaluator. (Text) |
    | orig_reference_answer | Reference answer used to evaluate the model response. (Text) |
    | output | Output from the GPT-4 model. (Text) |
    | orig_response | Original response from the GPT-4 model. (Text) |
    | orig_criteria | Criteria used by the evaluator to rate the response. (Text) |
    | orig_instruction | Instructions provided by the evaluator. (Text) |
    | orig_score3_description | Description of the third score given by the evaluator. (Text) |
    | orig_score5_description | Description of the fifth score given by the evaluator. (Text) |
    | orig_score1_description | Description of the first score given by the evaluator. (Text) |
    | input | Input given to the evaluation. (Text) |
    | orig_score4_description | Description of the fourth score given by the evalua... |
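    A minimal pandas sketch for inspecting these columns; only the file name train.csv and the column names come from the table above, the rest is generic:

```python
import pandas as pd

df = pd.read_csv("train.csv")

# Inspect one evaluation record: the instruction, the reference answer,
# and the five rubric descriptions that anchor scores 1-5.
row = df.iloc[0]
print(row["orig_instruction"])
print(row["orig_reference_answer"])
for i in range(1, 6):
    print(i, row[f"orig_score{i}_description"])
```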

  5. Generative Pre-trained Transformer (GPT) Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Oct 27, 2025
    Cite
    Data Insights Market (2025). Generative Pre-trained Transformer (GPT) Report [Dataset]. https://www.datainsightsmarket.com/reports/generative-pre-trained-transformer-gpt-1443347
    Explore at:
    doc, pdf, ppt
    Dataset updated
    Oct 27, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Generative Pre-trained Transformer (GPT) market is experiencing explosive growth, projected to reach a substantial USD 35,000 million by 2025, driven by an impressive Compound Annual Growth Rate (CAGR) of 35%. This rapid expansion is fueled by the increasing adoption of advanced AI technologies across diverse applications, from enhancing customer service in large enterprises to empowering small and medium-sized businesses (SMEs) with innovative content creation and automation tools. The evolution of GPT models, particularly the advanced capabilities of GPT-4, is a significant driver, offering more sophisticated language understanding, generation, and reasoning. This surge in demand is also attributed to the growing need for personalized user experiences, efficient data analysis, and the development of intelligent applications that can streamline complex tasks. Companies are increasingly investing in GPT-powered solutions to gain a competitive edge, improve operational efficiency, and unlock new revenue streams. The market is poised for continued robust expansion through 2033, as the capabilities of GPT technology continue to mature and integrate more deeply into business processes.

    Key trends include the rise of specialized GPT models tailored for specific industries, the development of multimodal GPTs capable of processing and generating various forms of data (text, images, audio), and the growing focus on ethical AI development and deployment. While the market benefits from substantial growth drivers, potential restraints include the high computational costs associated with training and running large GPT models, ongoing concerns regarding data privacy and security, and the need for skilled AI professionals to effectively implement and manage these solutions. Nonetheless, the overarching trend points towards a transformative impact of GPT across nearly every sector, with significant opportunities for innovation and market leadership.

    This report delves into the dynamic Generative Pre-trained Transformer (GPT) market, encompassing its current state and future trajectory from 2019-2033. With 2025 serving as the Base Year and Estimated Year, the Forecast Period spans 2025-2033, building upon the Historical Period of 2019-2024. The analysis will quantify market opportunities, projecting significant growth, potentially reaching hundreds of millions of dollars in market value.

  6. alpaca-gpt4-data-zh

    • huggingface.co
    Updated Apr 11, 2023
    + more versions
    Cite
    Chris Alexiuk (2023). alpaca-gpt4-data-zh [Dataset]. https://huggingface.co/datasets/llm-wizard/alpaca-gpt4-data-zh
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 11, 2023
    Authors
    Chris Alexiuk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for "alpaca-gpt4-data-zh"

    All of the work is done by this team.

    Usage and License Notices

    The data is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes.

    English Dataset

    Found here

    Citation

    @article{peng2023gpt4llm, title={Instruction Tuning with GPT-4}, author={Baolin Peng, Chunyuan Li… See the full description on the dataset page: https://huggingface.co/datasets/llm-wizard/alpaca-gpt4-data-zh.

  7. AquilaMed-RL

    • huggingface.co
    Updated Jun 21, 2024
    Cite
    Beijing Academy of Artificial Intelligence (2024). AquilaMed-RL [Dataset]. https://huggingface.co/datasets/BAAI/AquilaMed-RL
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 21, 2024
    Dataset authored and provided by
    Beijing Academy of Artificial Intelligence
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Introduction

    This dataset is used for the human-preference training stage. Prompts are sampled from the SFT dataset, and each sampled prompt is answered by both a trained SFT model and GPT-4. GPT-4 is subsequently used to score the two responses, determining the positive and negative examples.
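    A schematic of that pairing logic; the field names below are illustrative assumptions, not the dataset's actual schema:

```python
# GPT-4 assigns a scalar score to each candidate response; the
# higher-scored response becomes the chosen (positive) example.
def build_preference_pair(prompt: str,
                          sft_response: str, sft_score: float,
                          gpt4_response: str, gpt4_score: float) -> dict:
    if gpt4_score >= sft_score:
        chosen, rejected = gpt4_response, sft_response
    else:
        chosen, rejected = sft_response, gpt4_response
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```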

    Cite

    If you find our work helpful, feel free to cite us. @misc{zhao2024aquliamed, title={Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models}… See the full description on the dataset page: https://huggingface.co/datasets/BAAI/AquilaMed-RL.

  8. Model Output of GPT-3.5 and GPT-4 for ECHR-AM

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Dec 13, 2024
    Cite
    Zubaer, Abdullah Al; Granitzer, Michael; Mitrović, Jelena (2024). Model Output of GPT-3.5 and GPT-4 for ECHR-AM [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_8246128
    Explore at:
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    University of Passau
    Institute for Artificial Intelligence Research and Development of Serbia, Novi Sad, Serbia
    Authors
    Zubaer, Abdullah Al; Granitzer, Michael; Mitrović, Jelena
    Description

    "gpt3.5-gpt4-input-output-echram.zip" :

    Input and output to GPT-3.5 and GPT-4 based on ECHR dataset published in JSON format in this paper for argument component classification only i.e. clauses that are argumentative (conclusion/premise), extracted from the JSON file

    Note: Output of the model is under OpenAI Terms & policies.

    Please cite our paper also if you use this dataset: Performance analysis of large language models in the domain of legal argument mining

    The BibTeX entry is given below.

    @ARTICLE{10.3389/frai.2023.1278796,
      AUTHOR={Al Zubaer, Abdullah and Granitzer, Michael and Mitrović, Jelena},
      TITLE={Performance analysis of large language models in the domain of legal argument mining},
      JOURNAL={Frontiers in Artificial Intelligence},
      VOLUME={6},
      YEAR={2023},
      URL={https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2023.1278796},
      DOI={10.3389/frai.2023.1278796},
      ISSN={2624-8212},
      ABSTRACT={Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the model's performance considering diverse prompt formulation and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our result statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT 3.5 and GPT-4 in the F1-score for premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that local models are as semantically rich as the embeddings from the OpenAI model. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.}}

  9. Summary of GPT-4 TR review.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 18, 2024
    Cite
    Martinez, Nicole; Gallifant, Jack; Strekalova, Yulia A. Levites; Celi, Leo Anthony; Demner-Fushman, Dina; Pierce, Robin; Mwavu, Rogers; Osorio-Valencia, Juan S.; Parke, Rachael; Ghassemi, Marzyeh; Gichoya, Judy Wawira; Fiske, Amelia; McCoy, Liam G. (2024). Summary of GPT-4 TR review. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001387486
    Explore at:
    Dataset updated
    Jan 18, 2024
    Authors
    Martinez, Nicole; Gallifant, Jack; Strekalova, Yulia A. Levites; Celi, Leo Anthony; Demner-Fushman, Dina; Pierce, Robin; Mwavu, Rogers; Osorio-Valencia, Juan S.; Parke, Rachael; Ghassemi, Marzyeh; Gichoya, Judy Wawira; Fiske, Amelia; McCoy, Liam G.
    Description

    The study provides a comprehensive review of OpenAI's Generative Pre-trained Transformer 4 (GPT-4) technical report, with an emphasis on applications in high-risk settings like healthcare. A diverse team, including experts in artificial intelligence (AI), natural language processing, public health, law, policy, social science, healthcare research, and bioethics, analyzed the report against established peer review guidelines. The GPT-4 report shows a significant commitment to transparent AI research, particularly in creating a systems card for risk assessment and mitigation. However, it reveals limitations such as restricted access to training data, inadequate confidence and uncertainty estimations, and concerns over privacy and intellectual property rights.

    Key strengths identified include the considerable time and economic investment in transparent AI research and the creation of a comprehensive systems card. On the other hand, the lack of clarity in training processes and data raises concerns about encoded biases and interests in GPT-4. The report also lacks confidence and uncertainty estimations, crucial in high-risk areas like healthcare, and fails to address potential privacy and intellectual property issues.

    Furthermore, this study emphasizes the need for diverse, global involvement in developing and evaluating large language models (LLMs) to ensure broad societal benefits and mitigate risks. The paper presents recommendations such as improving data transparency, developing accountability frameworks, establishing confidence standards for LLM outputs in high-risk settings, and enhancing industry research review processes. It concludes that while GPT-4's report is a step towards open discussions on LLMs, more extensive interdisciplinary reviews are essential for addressing bias, harm, and risk concerns, especially in high-risk domains. The review aims to expand the understanding of LLMs in general and highlights the need for new forms of reflection on how LLMs are reviewed, the data required for effective evaluation, and addressing critical issues like bias and risk.

  10. Data for "Flexible, Model-Agnostic Method for Materials Data Extraction from...

    • figshare.com
    xlsx
    Updated May 10, 2024
    Cite
    Maciej Polak; Dane Morgan; Shrey Modi; Jinming Zhang; Anna Latosinska; Shaonan Wang; Jasmine Wang; Ayan Deep Hazra (2024). Data for "Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models" [Dataset]. http://doi.org/10.6084/m9.figshare.21861948.v5
    Explore at:
    xlsx
    Dataset updated
    May 10, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Maciej Polak; Dane Morgan; Shrey Modi; Jinming Zhang; Anna Latosinska; Shaonan Wang; Jasmine Wang; Ayan Deep Hazra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets for the paper entitled "Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models" by Maciej P. Polak, Shrey Modi, Anna Latosinska, Jinming Zhang, Ching-Wen Wang, Shaonan Wang, Ayan Deep Hazra, and Dane Morgan.
    • MPPolak_BulkModulus_ValidationData.xlsx: a dataset of bulk modulus sentences, positive (containing bulk modulus data) and negative (not containing data), used for model assessment.
    • MPPolak_BulkModulus_AllTrainData.xlsx: a dataset of bulk modulus sentences, positive and negative as above, used for fine-tuning of the model and for model assessment.
    • MPPolak_CritCoolRate_Dataset.xlsx: a dataset of critical cooling rates for metallic glasses developed in this paper with the method presented in the paper, consisting of names of materials, values of critical cooling rates, their units, and DOIs of the source documents.
    • MPPolak_DataExtraction_codes.zip: simple example codes necessary to reproduce the results. The provided 'positive' and 'negative' files are shortened versions of the training data allowing for quick execution and testing; the 'pos' and 'neg' files contain the full testing sets. The 'plotting' directory contains data and scripts to reproduce the figures.

  11. Data Sheet 1_Evaluating the strengths and limitations of multimodal...

    • frontiersin.figshare.com
    docx
    Updated Jun 7, 2024
    + more versions
    Cite
    Saif Aldeen AlRyalat; Ayman Mohammed Musleh; Malik Y. Kahook (2024). Data Sheet 1_Evaluating the strengths and limitations of multimodal ChatGPT-4 in detecting glaucoma using fundus images.docx [Dataset]. http://doi.org/10.3389/fopht.2024.1387190.s001
    Explore at:
    docx
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    Frontiers
    Authors
    Saif Aldeen AlRyalat; Ayman Mohammed Musleh; Malik Y. Kahook
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview: This study evaluates the diagnostic accuracy of a multimodal large language model (LLM), ChatGPT-4, in recognizing glaucoma using color fundus photographs (CFPs) with a benchmark dataset and without prior training or fine-tuning.

    Methods: The publicly accessible Retinal Fundus Glaucoma Challenge "REFUGE" dataset was utilized for analyses. The input data consisted of the entire 400-image testing set. The task involved classifying fundus images into either 'Likely Glaucomatous' or 'Likely Non-Glaucomatous'. We constructed a confusion matrix to visualize the results of predictions from ChatGPT-4, focusing on the accuracy of binary classifications (glaucoma vs non-glaucoma).

    Results: ChatGPT-4 demonstrated an accuracy of 90% with a 95% confidence interval (CI) of 87.06%-92.94%. The sensitivity was found to be 50% (95% CI: 34.51%-65.49%), while the specificity was 94.44% (95% CI: 92.08%-96.81%). The precision was recorded at 50% (95% CI: 34.51%-65.49%), and the F1 score was 0.50.

    Conclusion: ChatGPT-4 achieved relatively high diagnostic accuracy without prior fine-tuning on CFPs. Considering the scarcity of data in specialized medical fields, including ophthalmology, the use of advanced AI techniques, such as LLMs, might require less data for training compared to other forms of AI, with potential savings in time and financial resources. It may also pave the way for the development of innovative tools to support specialized medical care, particularly those dependent on multimodal data for diagnosis and follow-up, irrespective of resource constraints.
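    The reported figures are mutually consistent with REFUGE's 400-image test set containing 40 glaucomatous images. A quick sanity check; the raw counts below are inferred from the published percentages, not taken from the paper:

```python
# Inferred confusion matrix: 40 glaucomatous, 360 non-glaucomatous images.
TP, FN = 20, 20    # half of the glaucoma cases detected -> sensitivity 50%
TN, FP = 340, 20   # 340/360 non-glaucoma correct -> specificity 94.44%

accuracy    = (TP + TN) / (TP + TN + FP + FN)   # 0.90
sensitivity = TP / (TP + FN)                    # 0.50
specificity = TN / (TN + FP)                    # 0.9444
precision   = TP / (TP + FP)                    # 0.50
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # 0.50
print(accuracy, sensitivity, specificity, precision, f1)
```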

  12. Data from: Comparison between GPT-4 and human raters in grading pharmacy...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 20, 2025
    Cite
    Wuan Shuen Yap; Pui San Saw; Li Ling Yeap; Shaun Wen Huey Lee; Wei Jin Wong; Ronald Seng Seng Lee (2025). Comparison between GPT-4 and human raters in grading pharmacy students’ exam responses in Malaysia: a cross-sectional study [Dataset]. http://doi.org/10.7910/DVN/CMT1TF
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 20, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Wuan Shuen Yap; Pui San Saw; Li Ling Yeap; Shaun Wen Huey Lee; Wei Jin Wong; Ronald Seng Seng Lee
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Malaysia
    Description

    Manual grading is time-consuming and prone to inconsistencies, prompting exploration of generative artificial intelligence tools like GPT-4 to enhance efficiency and reliability. This study investigates GPT-4's potential in grading pharmacy students' exam responses, focusing on the impact of optimized prompts. Specifically, it evaluated the alignment between GPT-4 and human raters' scores, assessed GPT-4's consistency over time, and determined its error rates in grading pharmacy students' exam responses. We conducted a comparative study comparing past exam responses graded by university-trained raters with grades assigned by GPT-4. Responses were randomized before evaluation by GPT-4, accessed via a Plus account between April and September 2024. Prompt optimization was performed on 16 responses, followed by evaluation of 3 prompt delivery methods. We then applied the optimized approach across 4 item types. Intraclass correlation coefficients and error analyses assessed consistency and agreement between GPT-4 and human ratings.

  13. LLM - Detect AI Datamix

    • kaggle.com
    zip
    Updated Jan 19, 2024
    Cite
    Raja Biswas (2024). LLM - Detect AI Datamix [Dataset]. https://www.kaggle.com/datasets/conjuring92/ai-mix-v26
    Explore at:
    zip (172818297 bytes)
    Dataset updated
    Jan 19, 2024
    Authors
    Raja Biswas
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is the datamix created by Team 🔍 📝 🕵️‍♂️ 🤖 during the LLM - Detect AI Generated Text competition; it helped us win the competition. It supports a text-classification task that separates LLM-generated essays from student-written ones.

    It was developed incrementally, with a focus on size, diversity and complexity. For each datamix iteration, we attempted to plug blind spots of the previous generation of models while maintaining robustness.

    To maximally leverage in-domain human texts, we used the entire PERSUADE corpus comprising all 15 prompts. We also included diverse human texts from sources such as the OpenAI GPT-2 output dataset, the ELLIPSE corpus, NarrativeQA, Wikipedia, the NLTK Brown corpus and IMDB movie reviews.

    Sources for our generated essays can be grouped under four categories:
    • Proprietary LLMs (gpt-3.5, gpt-4, claude, cohere, gemini, palm)
    • Open-source LLMs (llama, falcon, mistral, mixtral)
    • Existing LLM-generated text datasets:
      • Synthetic dataset made by T5
      • DAIGT V2 subset
      • OUTFOX
      • Ghostbuster
      • gpt-2-output-dataset

    • Fine-tuned open-source LLMs (mistral, llama, falcon, deci-lm, t5, pythia, OPT, BLOOM, GPT2). For LLM fine-tuning, we leveraged the PERSUADE corpus in different ways:
      • Instruction tuning: Instructions were composed of different metadata e.g. prompt name, holistic essay score, ELL status and grade level. Responses were the corresponding student essays.
      • One topic held out: LLMs fine-tuned on PERSUADE essays with one prompt held out. When generating, only the held out prompt essays were generated. This was done to encourage new writing styles.
      • Span wise generation: Generate one span (discourse) at a time conditioned on the remaining essay.

    We used a wide variety of generation configs and prompting strategies to promote diversity and complexity in the data. Generated essays leveraged a combination of the following:
    • Contrastive search
    • Use of guidance scale, typical_p, suppress_tokens
    • High temperature and large values of top-k
    • Prompting to fill in the blank: randomly mask words in an essay and ask the LLM to reconstruct the original essay (similar to MLM)
    • Prompting without source texts
    • Prompting with source texts
    • Prompting to rewrite existing essays

    Finally, we incorporated augmented essays to make our models aware of typical attacks on LLM content-detection systems and of obfuscations present in the provided training data. We mainly used a combination of the following augmentations on a random subset of essays:
    • Spelling correction
    • Deletion/insertion/swapping of characters
    • Replacement with synonyms
    • Introduction of obfuscations
    • Back-translation
    • Random capitalization
    • Sentence swapping
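    A toy sketch of two of these augmentations, adjacent-character swapping and random capitalization; the team's actual augmentation pipeline is not published with this dataset:

```python
import random

def swap_adjacent_chars(text: str, n_swaps: int = 3) -> str:
    """Swap a few randomly chosen adjacent character pairs."""
    chars = list(text)
    if len(chars) < 2:
        return text
    for _ in range(n_swaps):
        i = random.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def random_capitalization(text: str, p: float = 0.05) -> str:
    """Upper-case each character independently with probability p."""
    return "".join(c.upper() if random.random() < p else c for c in text)

print(random_capitalization(swap_adjacent_chars("an essay paragraph")))
```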

  14. Generative Pre-trained Transformer (GPT) Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 12, 2025
    Cite
    Archive Market Research (2025). Generative Pre-trained Transformer (GPT) Report [Dataset]. https://www.archivemarketresearch.com/reports/generative-pre-trained-transformer-gpt-27152
    Explore at:
    doc, ppt, pdf
    Dataset updated
    Feb 12, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Market Analysis: Generative Pre-trained Transformer (GPT)

    The Generative Pre-trained Transformer (GPT) market is poised for exponential growth, driven by its transformative capabilities in natural language processing (NLP) and artificial intelligence (AI). With a market size estimated at USD X million in 2025, the market is projected to reach USD X million by 2033, exhibiting a robust CAGR of XX% during the forecast period. The increasing demand for advanced NLP applications, such as chatbot development, text summarization, and language translation, is fueling the market growth. Furthermore, the integration of GPT into cloud computing platforms and the rise of cognitive computing are further driving adoption.

    Key market trends include the emergence of GPT-4, the latest iteration of GPT, offering enhanced capabilities. Additionally, the convergence of GPT with other AI technologies, such as computer vision and machine learning, is creating new opportunities. However, data privacy concerns and the potential for GPT-generated content to be used for malicious purposes pose challenges to the market.

    The market is dominated by leading technology companies such as OpenAI, Microsoft, Google, and Baidu, which invest heavily in GPT development. North America and Asia Pacific are anticipated to be major growth regions, driven by the presence of leading technology hubs and the increasing adoption of NLP technologies.

  15. data

    • huggingface.co
    Updated May 19, 2023
    + more versions
    Cite
    小蛮 (2023). data [Dataset]. https://huggingface.co/datasets/Amyww/data
    Explore at:
    Dataset updated
    May 19, 2023
    Authors
    小蛮
    License

    CDLA-Sharing-1.0: https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary. Described in the following paper: https://arxiv.org/abs/2305.07759. The models referred to in the paper were trained on TinyStories-train.txt (the file tinystories-valid.txt can be used for validation loss). These models can be found on Huggingface, at roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M. Additional resources: tinystories_all_data.tar.gz - contains a superset of… See the full description on the dataset page: https://huggingface.co/datasets/Amyww/data.

  16. Data from: Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0

    • live.european-language-grid.eu
    binary format
    Updated Nov 14, 2024
    Cite
    (2024). Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/23730
    Explore at:
    binary format
    Dataset updated
    Nov 14, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Knowledge-Enhanced Winograd Schema Challenge KE-WSC is an upgraded version of the original WSC dataset. It includes the following extensions:

    • Annotation of semantically or syntactically solvable examples: Some samples from the original dataset can be solved without deeper semantic processing due to the morphological richness of Slovene. For example, the sentence "Riba je pojedla črva. Bila je lačna." ("The fish ate the worm. It was hungry.") requires only knowledge of grammatical gender, and no deep semantic processing, to infer that the fish was hungry and not the worm. To have a representative set of syntactic samples, we created 197 new examples by modifying existing ones.
    • Two-level knowledge ontology: We developed a hierarchical scheme to categorize the knowledge required to successfully solve a problem. In our analysis, we identified 9 high-level knowledge categories (social knowledge, psychological knowledge, etc.) and 37 lower-level, more nuanced categories (physical laws/the laws of nature, social roles, causal relationships, etc.).
    • Semi-automatic explanation generation: Textual explanations were generated using GPT-4, followed by verification and correction by human annotators to ensure accuracy and clarity. For instance, the textual explanation for the sentence "Pokal ne gre v rjav kovček, ker je prevelik." ("The trophy does not go into the brown suitcase because it is too big.") is "Če je nekaj preveliko, se ne prilega v manjši prostor." ("If something is too big, it does not fit into a smaller space.")
    • Translation to English: The finalized explanations were translated into English by a trained translator, enabling broader applicability.
    • SPO triplet generation: Subject-Predicate-Object triplets were extracted using GPT-4 to highlight key semantic relationships within each example.

    The dataset can be used to study knowledge explanation in models and enables knowledge-enhanced machine learning. It can be used to train classification or generative models. It comprises 601 training samples, 200 validation samples, and 200 test samples, and is released in tabular TSV format. The README.txt file contains a description of the attributes. The test set labels are private, as the dataset is integrated into the SloBENCH evaluation framework (https://slobench.cjvt.si/). If you use the dataset to train your models, please consider submitting test set predictions to SloBENCH to obtain an evaluation score and see how it compares to others.
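    A minimal loading sketch; the split file names below are assumptions, and the actual attribute layout is documented in README.txt:

```python
import pandas as pd

# Assumed file names; see README.txt for the real layout.
train = pd.read_csv("ke-wsc/train.tsv", sep="\t")
val = pd.read_csv("ke-wsc/val.tsv", sep="\t")
test = pd.read_csv("ke-wsc/test.tsv", sep="\t")  # labels withheld for SloBENCH

print(len(train), len(val), len(test))  # expected: 601 200 200
```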

    References: Levesque, H., Davis, E., & Morgenstern, L. (2012). The Winograd Schema Challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning.

  17. GPQA Benchmarks and Pricing, Aug 2025

    • binaryverseai.com
    Updated Aug 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    VALS AI (2025). GPQA Benchmarks and Pricing, Aug 2025 [Dataset]. https://binaryverseai.com/chatgpt-o3-pro-review-benchmarks-hacks/
    Explore at:
    Dataset updated
    Aug 9, 2025
    Dataset authored and provided by
    VALS AI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of model accuracy on GPQA, token pricing, and latency for leading AI reasoning models.

  18. LLM Question-Answer Dataset

    • kaggle.com
    zip
    Updated Mar 6, 2024
    Cite
    Unique Data (2024). LLM Question-Answer Dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/llm-dataset/code
    Explore at:
    zip (543652 bytes)
    Dataset updated
    Mar 6, 2024
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    LLM Dataset - Prompts and Generated Texts

    The dataset contains prompts and texts generated by the Large Language Models (LLMs) in 32 different languages. The prompts are short sentences or phrases for the model to generate text. The texts generated by the LLM are responses to these prompts and can vary in length and complexity.

    Researchers and developers can use this dataset to train and fine-tune their own language models for multilingual applications. The dataset provides a rich and diverse collection of outputs from the model, demonstrating its ability to generate coherent and contextually relevant text in multiple languages.


    Models used for text generation:

    • GPT-3.5,
    • GPT-4

    Languages in the dataset:

    Arabic, Azerbaijani, Catalan, Chinese, Czech, Danish, German, Greek, English, Esperanto, Spanish, Persian, Finnish, French, Irish, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malayalam, Marathi, Dutch, Polish, Portuguese, Portuguese (Brazil), Slovak, Swedish, Thai, Turkish, Ukrainian



    Content

    The CSV file includes the following data:
    • from_language: language the prompt is made in
    • model: type of the model (GPT-3.5, GPT-4 or an uncensored GPT version)
    • time: time when the answer was generated
    • text: user prompt
    • response: response generated by the model



  19. Small OpenOrca Dataset (0.05)

    • kaggle.com
    zip
    Updated Mar 1, 2024
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    zip (197922288 bytes)
    Dataset updated
    Mar 1, 2024
    Authors
    fatih_kgg
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset is a subsample of the original OpenOrca dataset.
    The OpenOrca dataset is a collection of augmented FLAN Collection data: currently ~1M GPT-4 completions and ~3.2M GPT-3.5 completions. It is tabularized in alignment with the distributions presented in the ORCA paper and currently represents a partial completion of the full intended dataset, with ongoing generation to expand its scope. The data is primarily used for training and evaluation in the field of natural language processing.

    Each data instance in this dataset represents entries from the FLAN collection that have been augmented by submitting a listed question to either the GPT-4 or GPT-3.5 model. The response generated by the model is then recorded in the dataset.

    Original Dataset:
    OpenOrca (https://huggingface.co/datasets/Open-Orca/OpenOrca)

    Subsampling Methodology:
    This subsample preserves the original distribution of the 17 unique 'system_prompt' values in OpenOrca. We employed a stratified random sampling approach, selecting 5% (a 0.05 ratio) of the data points from each prompt-style category. This ensures that the subsample retains the relative representation of different 'system_prompt' values while reducing the overall dataset size for focused analysis. While the original dataset is around 4M rows, this dataset is 200K rows.
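    The described stratification corresponds to a per-group 5% draw. A pandas sketch, assuming a local export of OpenOrca; the file name is hypothetical:

```python
import pandas as pd

df = pd.read_parquet("openorca.parquet")  # assumed local copy of OpenOrca

# Draw 5% within each of the 17 system_prompt strata so their relative
# representation is preserved in the subsample.
subsample = (
    df.groupby("system_prompt", group_keys=False)
      .sample(frac=0.05, random_state=42)
)
print(len(df), len(subsample))  # ~4M rows -> ~200K rows
```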

    Supported Tasks and Leaderboards:
    This dataset supports a range of tasks including language modeling, text generation, and text augmentation. It has been instrumental in the generation of multiple high-performing model checkpoints which have exhibited exceptional performance in our unit testing. Further information on leaderboards will be updated as they become available.

    Use Cases
    The dataset can be used for tasks related to language understanding, natural language processing, machine learning model training, and model performance evaluation.

    Dataset Structure

    Data Instances
    A data instance in this dataset represents entries from the FLAN collection which have been augmented by submitting the listed question to either GPT-4 or GPT-3.5. The response is then entered into the response field.

    Features
    • 'id': a unique numbered identifier which includes one of 'niv', 't0', 'cot', or 'flan' to represent which source FLAN Collection submix the 'question' is sourced from
    • 'system_prompt': the System Prompt presented to the GPT-3.5 or GPT-4 API for the datapoint
    • 'question': a question entry as provided by the FLAN Collection
    • 'response': a response to that question received from a query to either GPT-3.5 or GPT-4

  20. airoboros-gpt4-m2.0

    • huggingface.co
    Updated Jul 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jon Durbin (2023). airoboros-gpt4-m2.0 [Dataset]. https://huggingface.co/datasets/jondurbin/airoboros-gpt4-m2.0
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 25, 2023
    Authors
    Jon Durbin
    License

    https://choosealicense.com/licenses/other/

    Description

    Overview

    This is a merge of https://hf.co/datasets/jondurbin/airoboros-gpt4-1.4.1 and https://hf.co/datasets/jondurbin/airoboros-gpt4-2.0.

    Category breakdown

    Licence and usage restrictions

    The data was generated by gpt-4 via OpenAI API calls. The ToS for OpenAI API usage has a clause preventing the output from being used to train a model that competes with OpenAI.

    What does "compete" actually mean here? These small open-source models will not produce output… See the full description on the dataset page: https://huggingface.co/datasets/jondurbin/airoboros-gpt4-m2.0.
