31 datasets found
  1. Data_Sheet_2_Performance analysis of large language models in the domain of...

    • frontiersin.figshare.com
    pdf
    Updated Nov 17, 2023
    + more versions
    Cite
    Abdullah Al Zubaer; Michael Granitzer; Jelena Mitrović (2023). Data_Sheet_2_Performance analysis of large language models in the domain of legal argument mining.pdf [Dataset]. http://doi.org/10.3389/frai.2023.1278796.s002
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Frontiers
    Authors
    Abdullah Al Zubaer; Michael Granitzer; Jelena Mitrović
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the model's performance considering diverse prompt formulation and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our result statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT 3.5 and GPT-4 in the F1-score for premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that local models are as semantically rich as the embeddings from the OpenAI model. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.
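
    A minimal sketch of the example-selection step described above (semantic search over a labeled pool to pick few-shot examples for the prompt), assuming the sentence-transformers library; the model name and the labeled clauses are illustrative, not the study's actual pool or embedding models.

    ```python
    # Minimal sketch: pick few-shot examples for the prompt via semantic search.
    # Model name and labeled clauses are illustrative assumptions.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed local embedding model

    labeled_pool = [  # hypothetical labeled ECHR-style clauses
        ("The applicant was denied access to counsel during questioning.", "premise"),
        ("Accordingly, there has been a violation of Article 6 of the Convention.", "conclusion"),
        ("The hearing took place on 4 March.", "non-argument"),
    ]
    pool_embeddings = encoder.encode([t for t, _ in labeled_pool], normalize_embeddings=True)

    def select_examples(query: str, k: int = 2):
        """Return the k labeled clauses most similar to the query sentence."""
        q = encoder.encode([query], normalize_embeddings=True)[0]
        similarities = pool_embeddings @ q  # cosine similarity (vectors are normalized)
        top = np.argsort(-similarities)[:k]
        return [labeled_pool[i] for i in top]

    shots = select_examples("The Court reiterates that detention must have a lawful basis.")
    prompt = "\n".join(f"Text: {text}\nLabel: {label}" for text, label in shots)
    print(prompt)
    ```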

  2. ChatGPT Usage Survey Data

    • webfx.com
    Updated Sep 2, 2025
    Cite
    WebFX (2025). ChatGPT Usage Survey Data [Dataset]. https://www.webfx.com/blog/ai/chatgpt-usage-statistics/
    Explore at:
    Dataset updated
    Sep 2, 2025
    Dataset authored and provided by
    WebFX
    Variables measured
    Average words in first message, Average words per ChatGPT conversation, Average number of messages per conversation, Percentage of conversations that are commands, Percentage of conversations that start as questions, Percentage of conversations in the "learning & understanding" category, Percentage of conversations using advanced features (persona assignment / data upload)
    Description

    Analysis of 13,252 publicly shared ChatGPT conversations by WebFX to uncover usage statistics - prompt length, message count, question vs command distribution, use-case categories.
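
    A minimal sketch of how conversation-level usage statistics like those listed above could be computed; the file name and columns (conversation_id, turn, role, text) are assumptions, not the dataset's actual schema.

    ```python
    # Compute usage statistics from a hypothetical long-format conversation export.
    import pandas as pd

    df = pd.read_csv("conversations.csv")  # assumed columns: conversation_id, turn, role, text
    user_msgs = df[df["role"] == "user"].sort_values("turn")
    first_msgs = user_msgs.groupby("conversation_id").first()

    stats = {
        "avg_words_in_first_message": first_msgs["text"].str.split().str.len().mean(),
        "avg_messages_per_conversation": df.groupby("conversation_id").size().mean(),
        # crude heuristic: a conversation "starts as a question" if the first user message ends with "?"
        "pct_starting_as_question": first_msgs["text"].str.strip().str.endswith("?").mean() * 100,
    }
    print(stats)
    ```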

  3. A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated Aug 1, 2025
    Cite
    Scott McGrath (2025). A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to selected genetics questions - Full study data [Dataset]. http://doi.org/10.5061/dryad.s4mw6m9cv
    Explore at:
    Dataset updated
    Aug 1, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Scott McGrath
    Time period covered
    Jan 1, 2023
    Description

    Objective: Our objective is to evaluate the efficacy of ChatGPT 4 in accurately and effectively delivering genetic information, building on previous findings with ChatGPT 3.5. We focus on assessing the utility, limitations, and ethical implications of using ChatGPT in medical settings.

    Materials and Methods: A structured questionnaire, including the Brief User Survey (BUS-15) and custom questions, was developed to assess ChatGPT 4's clinical value. An expert panel of genetic counselors and clinical geneticists independently evaluated ChatGPT 4's responses to these questions. We also conducted a comparative analysis with ChatGPT 3.5, using descriptive statistics and R for data analysis.

    Results: ChatGPT 4 demonstrated improvements over 3.5 in context recognition, relevance, and informativeness. However, performance variability and concerns about the naturalness of the output were noted. No significant difference in accuracy was found between ChatGPT 3.5 and 4.0. Notably, the effic...

    Study Design: This study was conducted to evaluate the performance of ChatGPT 4 (March 23rd, 2023 model) in the context of genetic counseling and education. The evaluation involved a structured questionnaire, which included questions selected from the Brief User Survey (BUS-15) and additional custom questions designed to assess the clinical value of ChatGPT 4's responses.

    Questionnaire Development: The questionnaire was built in Qualtrics and comprised twelve questions: seven selected from the BUS-15, preceded by two additional questions that we designed. The initial questions focused on quality and answer relevancy:

    1. The overall quality of the Chatbot's response is: (5-point Likert: Very poor to Very Good)
    2. The Chatbot delivered an answer that provided the relevant information you would include if asked the question. (5-point Likert: Strongly disagree to Strongly agree)

    The BUS-15 questions (7-point Likert: Strongly disagree to Strongly agree) focused on: 1. Recogniti...

    # A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to selected genetics questions - Full study data

    https://doi.org/10.5061/dryad.s4mw6m9cv

    This data was captured when evaluating the ability of ChatGPT to address questions patients may ask it about three genetic conditions (BRCA1, HFE, and MLH1). This data is associated with the JAMIA article of the same name, DOI 10.1093/jamia/ocae128.

    Description of the data and file structure

    1. Key: This tab contains the data structure, explaining the survey questions, and potential responses available.
    2. Prompt Responses: This tab contains the prompts used for ChatGPT, and the response provided from each model (3.5 and 4)
    3. GPT 4 Results: This tab provides the responses collected from the medical experts (genetic counselors and clinical geneticist) from the Qualtrics survey.
    4. Accuracy (Qx_1): This tab contains the subset of results from both the Ch...
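
    A minimal sketch of summarizing the expert Likert ratings per survey item and model version; the file name and columns (rater, question, model, score) are assumptions for illustration, not the workbook's actual layout.

    ```python
    # Descriptive statistics of expert ratings, split by ChatGPT version.
    import pandas as pd

    ratings = pd.read_csv("expert_ratings.csv")  # assumed columns: rater, question, model, score
    summary = ratings.pivot_table(index="question", columns="model",
                                  values="score", aggfunc=["mean", "std"])
    print(summary.round(2))
    ```
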
  4. Text Analytics Market Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Jun 20, 2025
    + more versions
    Cite
    Market Report Analytics (2025). Text Analytics Market Report [Dataset]. https://www.marketreportanalytics.com/reports/text-analytics-market-89598
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The text analytics market is experiencing robust growth, projected to reach $10.49 billion in 2025 and exhibiting a remarkable compound annual growth rate (CAGR) of 39.90% from 2019 to 2033. This expansion is fueled by several key drivers. The increasing volume of unstructured data generated across various industries, including healthcare, finance, and customer service, necessitates sophisticated tools for extracting actionable insights. Furthermore, advancements in natural language processing (NLP), machine learning (ML), and artificial intelligence (AI) are empowering text analytics solutions with enhanced capabilities, such as sentiment analysis, topic modeling, and entity recognition. The rising adoption of cloud-based solutions also contributes to market growth, offering scalability, cost-effectiveness, and ease of access. Major industry players like IBM, Microsoft, and SAP are actively investing in research and development, driving innovation and expanding the market's capabilities. Competitive pressures are fostering continuous improvement in the accuracy and efficiency of text analytics tools, making them increasingly attractive to businesses of all sizes. The growing demand for real-time insights and improved customer experience further propels market expansion.

    While the market enjoys significant growth momentum, certain challenges persist. Data security and privacy concerns remain paramount, necessitating robust security measures within text analytics platforms. The complexity of implementing and integrating these solutions into existing IT infrastructures can also pose a barrier to adoption, particularly for smaller businesses lacking dedicated data science teams. Furthermore, the accuracy and reliability of text analytics outputs can be affected by the quality and consistency of the input data. Overcoming these challenges through improved data governance, user-friendly interfaces, and robust customer support will be crucial for continued market expansion. Despite these restraints, the overall market outlook remains positive, driven by the continuous evolution of technology and the growing reliance on data-driven decision-making across diverse sectors.

    Recent developments include: January 2023 - Microsoft announced a new multibillion-dollar investment in ChatGPT maker OpenAI. ChatGPT generates text from written prompts in a way that is more creative and advanced than earlier chatbots. Through this investment, the company will accelerate breakthroughs in AI, and both companies will commercialize advanced technologies. November 2022 - Tntra and Invenio partnered to develop a platform that offers comprehensive data analysis on a firm. Throughout the process, Tntra offered complete engineering support and cooperation to Invenio. Tntra offers feeds, knowledge graphs, intelligent text extraction, and analytics, which enables Invenio to provide information on seven aspects of a business, such as false news identification, subject categorization, dynamic data extraction, article summaries, sentiment analysis, and keyword extraction.

    Key drivers for this market are: Growing Demand for Social Media Analytics, Rising Practice of Predictive Analytics. Potential restraints include: Growing Demand for Social Media Analytics, Rising Practice of Predictive Analytics. Notable trends are: Retail and E-commerce to Hold a Significant Share in Text Analytics Market.

  5. Data Sheet 1_A multidimensional comparison of ChatGPT, Google Translate, and...

    • frontiersin.figshare.com
    xlsx
    Updated Jul 24, 2025
    Cite
    Shiyue Chen; Yan Lin (2025). Data Sheet 1_A multidimensional comparison of ChatGPT, Google Translate, and DeepL in Chinese tourism texts translation: fidelity, fluency, cultural sensitivity, and persuasiveness.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1619489.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    Frontiers
    Authors
    Shiyue Chen; Yan Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study systematically compares the translation performance of ChatGPT, Google Translate, and DeepL on Chinese tourism texts, focusing on two prompt-engineering strategies. Using a mixed-methods approach that combines quantitative expert assessments with qualitative analysis, the evaluation centers on fidelity, fluency, cultural sensitivity, and persuasiveness. ChatGPT outperformed its counterparts across all metrics, especially when culturally tailored prompts were used. However, it occasionally introduced semantic shifts, highlighting a trade-off between accuracy and rhetorical adaptation. Despite its strong performance, human post-editing remains necessary to ensure semantic precision and professional standards. The study demonstrates ChatGPT’s potential in domain-specific translation tasks while calling for continued oversight in culturally nuanced content.
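
    An illustrative sketch of the two prompt-engineering strategies contrasted above (a plain translation prompt vs. a culturally tailored one); the wording is invented, not the authors' actual prompts.

    ```python
    # Build a baseline or a culturally tailored translation prompt (illustrative wording only).
    def build_prompt(source_text: str, culturally_tailored: bool) -> str:
        if culturally_tailored:
            instruction = (
                "You are a translator specializing in Chinese tourism texts. Translate the text "
                "into English for international visitors, briefly explaining culture-specific "
                "terms instead of transliterating them, and keep a persuasive tone.\n\n"
            )
        else:
            instruction = "Translate the following Chinese tourism text into English.\n\n"
        return instruction + source_text

    print(build_prompt("黄山以奇松、怪石、云海、温泉著称。", culturally_tailored=True))
    ```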

  6. awesome-chatgpt-prompts 112k stars in github

    • kaggle.com
    zip
    Updated Oct 16, 2024
    Cite
    Tony Li (2024). awesome-chatgpt-prompts 112k stars in github [Dataset]. https://www.kaggle.com/tonylica/awesome-chatgpt-prompts-112k-stars-in-github
    Explore at:
    Available download formats: zip (27819 bytes)
    Dataset updated
    Oct 16, 2024
    Authors
    Tony Li
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset features 170 curated ChatGPT prompts designed to enhance your experience with the model. It mirrors the popular GitHub repository with 112,000 stars, which you can find at Awesome ChatGPT Prompts, updated as of October 2024.

    ChatGPT, developed by OpenAI, is a sophisticated large language model that generates human-like text. By supplying it with a prompt, you can receive responses that not only continue the conversation but also elaborate on your initial input.

    In this dataset, you’ll discover a diverse array of prompts suitable for use with ChatGPT.

    The motivation for transferring this data from GitHub to Kaggle is to facilitate my development of a simple web UI, serving as a prompt helper. I will leverage machine learning and LLM models to categorize these 170 ChatGPT prompts.
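
    A minimal sketch of one way to categorize the 170 prompts: embed each prompt and cluster the embeddings. The file name and columns (act, prompt) follow the upstream GitHub CSV and are assumed here; the embedding model and cluster count are illustrative choices.

    ```python
    # Cluster the prompt texts into rough categories.
    import pandas as pd
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    df = pd.read_csv("prompts.csv")  # assumed columns: act, prompt
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(
        df["prompt"].tolist(), normalize_embeddings=True)

    df["category"] = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embeddings)
    # Inspect a few roles per discovered category.
    print(df.groupby("category")["act"].apply(lambda s: s.head(3).tolist()))
    ```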

  7. Data Sheet 1_On the emergent capabilities of ChatGPT 4 to estimate...

    • frontiersin.figshare.com
    zip
    Updated Feb 13, 2025
    Cite
    Marco Piastra; Patrizia Catellani (2025). Data Sheet 1_On the emergent capabilities of ChatGPT 4 to estimate personality traits.zip [Dataset]. http://doi.org/10.3389/frai.2025.1484260.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Frontiers
    Authors
    Marco Piastra; Patrizia Catellani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigates the potential of ChatGPT 4 in the assessment of personality traits based on written texts. Using two publicly available datasets containing both written texts and self-assessments of the authors’ psychological traits based on the Big Five model, we aimed to evaluate the predictive performance of ChatGPT 4. For each sample text, we asked for numerical predictions on an eleven-point scale and compared them with the self-assessments. We also asked for ChatGPT 4 confidence scores on an eleven-point scale for each prediction. To keep the study within a manageable scope, a zero-prompt modality was chosen, although more sophisticated prompting strategies could potentially improve performance. The results show that ChatGPT 4 has moderate but significant abilities to automatically infer personality traits from written text. However, it also shows limitations in recognizing whether the input text is appropriate or representative enough to make accurate inferences, which could hinder practical applications. Furthermore, the results suggest that improved benchmarking methods could increase the efficiency and reliability of the evaluation process. These results pave the way for a more comprehensive evaluation of the capabilities of Large Language Models in assessing personality traits from written texts.
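
    A minimal sketch of comparing model predictions with the authors' self-assessments per Big Five trait, as described above; the file name and columns are assumptions for illustration.

    ```python
    # Correlate ChatGPT 4's trait predictions with self-assessments, trait by trait.
    import pandas as pd
    from scipy.stats import pearsonr

    df = pd.read_csv("predictions.csv")  # assumed columns: trait, self_score, gpt_score, gpt_confidence
    for trait, group in df.groupby("trait"):
        r, p = pearsonr(group["self_score"], group["gpt_score"])
        print(f"{trait}: r = {r:.2f}, p = {p:.3f}, "
              f"mean confidence = {group['gpt_confidence'].mean():.1f}")
    ```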

  8. AI-Driven Mental Health Literacy - An Interventional Study from India...

    • psycharchives.org
    Updated Oct 2, 2023
    + more versions
    Cite
    (2023). AI-Driven Mental Health Literacy - An Interventional Study from India (Codebook for the data).csv [Dataset]. https://psycharchives.org/handle/20.500.12034/8771
    Explore at:
    Dataset updated
    Oct 2, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    The dataset is from an Indian study which made use of ChatGPT, a natural language processing model by OpenAI, to design a mental health literacy intervention for college students. Prompt engineering tactics were used to formulate prompts that acted as anchors in the conversations with the AI agent regarding mental health. An intervention lasting 20 days was designed, with sessions of 15-20 minutes on alternate days. Fifty-one students completed pre-test and post-test measures of mental health literacy, mental help-seeking attitude, stigma, mental health self-efficacy, positive and negative experiences, and flourishing in the main study, which were then analyzed using paired t-tests. The results suggest that the intervention is effective among college students, as statistically significant changes were noted in mental health literacy and mental health self-efficacy scores. The study affirms the practicality, acceptance, and initial indications of AI-driven methods in advancing mental health literacy and suggests the promising prospects of innovative platforms such as ChatGPT within the field of applied positive psychology. (A codebook for the dataset is provided.)
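
    A minimal sketch of the paired pre/post comparison described above; the file name and column names are assumptions, not the dataset's actual codebook.

    ```python
    # Paired t-test on pre- vs. post-intervention mental health literacy scores.
    import pandas as pd
    from scipy.stats import ttest_rel

    scores = pd.read_csv("mhl_scores.csv")  # assumed columns: participant, mhl_pre, mhl_post
    t, p = ttest_rel(scores["mhl_post"], scores["mhl_pre"])
    print(f"Mental health literacy: t({len(scores) - 1}) = {t:.2f}, p = {p:.4f}")
    ```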

  9. Can Developers Prompt? A Controlled Experiment for Code Documentation...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 11, 2024
    + more versions
    Cite
    Kruse, Hans-Alexander; Puhlfürß, Tim; Maalej, Walid (2024). Can Developers Prompt? A Controlled Experiment for Code Documentation Generation [Replication Package] [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13127237
    Explore at:
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    Universität Hamburg
    Authors
    Kruse, Hans-Alexander; Puhlfürß, Tim; Maalej, Walid
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Summary of Artifacts

    This is the replication package for the paper titled 'Can Developers Prompt? A Controlled Experiment for Code Documentation Generation' that is part of the 40th IEEE International Conference on Software Maintenance and Evolution (ICSME), from October 6 to 11, 2024, located in Flagstaff, AZ, USA.

    Full Abstract

    Large language models (LLMs) bear great potential for automating tedious development tasks such as creating and maintaining code documentation. However, it is unclear to what extent developers can effectively prompt LLMs to create concise and useful documentation. We report on a controlled experiment with 20 professionals and 30 computer science students tasked with code documentation generation for two Python functions. The experimental group freely entered ad-hoc prompts in a ChatGPT-like extension of Visual Studio Code, while the control group executed a predefined few-shot prompt. Our results reveal that professionals and students were unaware of or unable to apply prompt engineering techniques. Especially students perceived the documentation produced from ad-hoc prompts as significantly less readable, less concise, and less helpful than documentation from prepared prompts. Some professionals produced higher quality documentation by just including the keyword Docstring in their ad-hoc prompts. While students desired more support in formulating prompts, professionals appreciated the flexibility of ad-hoc prompting. Participants in both groups rarely assessed the output as perfect. Instead, they understood the tools as support to iteratively refine the documentation. Further research is needed to understand which prompting skills and preferences developers have and which support they need for certain tasks.
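
    An illustrative sketch of the structure of a predefined few-shot prompt for docstring generation, of the kind the control group executed; the shots below are invented, the study's actual shots are in appendix/few_shots.txt.

    ```python
    # Illustrative few-shot prompt assembly for docstring generation.
    FEW_SHOTS = """\
    def add(a, b):
        return a + b
    Docstring: Return the sum of a and b.

    def is_even(n):
        return n % 2 == 0
    Docstring: Return True if n is even, otherwise False.
    """

    def build_messages(function_source: str) -> list[dict]:
        """Assemble a chat-style message list for a documentation request."""
        return [
            {"role": "system", "content": "You write concise Python docstrings."},
            {"role": "user", "content": FEW_SHOTS + "\n" + function_source + "\nDocstring:"},
        ]

    print(build_messages("def area(r):\n    return 3.14159 * r ** 2"))
    ```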

    Author Information

    Name Affiliation Email

    Hans-Alexander Kruse Universität Hamburg hans-alexander.kruse@studium.uni-hamburg.de

    Tim Puhlfürß Universität Hamburg tim.puhlfuerss@uni-hamburg.de

    Walid Maalej Universität Hamburg walid.maalej@uni-hamburg.de

    Citation Information

    @inproceedings{kruse-icsme-2024, author={Kruse, Hans-Alexander and Puhlf{\"u}r{\ss}, Tim and Maalej, Walid}, booktitle={2024 IEEE International Conference on Software Maintenance and Evolution (ICSME)}, title={Can Developers Prompt? A Controlled Experiment for Code Documentation Generation}, year={2024}, doi={tba}}

    Artifacts Overview

    1. Preprint

    The file kruse-icsme-2024-preprint.pdf is the preprint version of the official paper. You should read the paper in detail to understand the study, especially its methodology and results.

    2. Results

    The folder results includes two subfolders, explained in the following.

    Demographics RQ1 RQ2

    The subfolder Demographics RQ1 RQ2 provides Jupyter Notebook file evaluation.ipynb for analyzing (1) the experiment participants' submissions of the digital survey and (2) the ad-hoc prompts that the experimental group entered into their tool. Hence, this file provides demographic information about the participants and results for the research questions 1 and 2. Please refer to the README file inside this subfolder for installation steps of the Jupyter Notebook file.

    RQ2

    The subfolder RQ2 contains further subfolders with Microsoft Excel files specific to the results of research question 2:

    The subfolder UEQ contains three times the official User Experience Questionnaire (UEQ) analysis Excel tool, with data entered from all participants/students/professionals.

    The subfolder Open Coding contains three Excel files with the open-coding results for the free-text answers that participants could enter at the end of the survey to state additional positive and negative comments about their experience during the experiment. The Consensus file provides the finalized version of the open coding process.

    3. Extension

    The folder extension contains the code of the Visual Studio Code (VS Code) extension developed in this study to generate code documentation with predefined prompts. Please refer to the README file inside the folder for installation steps. Alternatively, you can install the deployed version of this tool, called Code Docs AI, via the VS Code Marketplace.

    You can install the tool to generate code documentation with ad-hoc prompts directly via the VS Code Marketplace. We did not include the code of this extension in this replication package due to license conflicts (GNUv3 vs. MIT).

    4. Survey

    The folder survey contains PDFs of the digital survey in two versions:

    The file Survey.pdf contains the rendered version of the survey (how it was presented to participants).

    The file SurveyOptions.pdf is an export of the LimeSurvey web platform. Its main purpose is to provide the technical answer codes, e.g., AO01 and AO02, that refer to the rendered answer texts, e.g., Yes and No. This can help you if you want to analyze the CSV files inside the results folder (instead of using the Jupyter Notebook file), as the CSVs contain the answer codes, not the answer texts. Please note that an export issue caused page 9 to be almost blank. However, this problem is negligible as the question on this page only contained one free-text answer field.

    5. Appendix

    The folder appendix provides additional material about the study:

    The subfolder tool_screenshots contains screenshots of both tools.

    The file few_shots.txt lists the few shots used for the predefined prompt tool.

    The file test_functions.py lists the functions used in the experiment.

    Revisions

    Version Changelog

    1.0.0 Initial upload

    1.1.0 Add paper preprint. Update abstract.

    1.2.0 Update replication package based on ICSME Artifact Track reviews

    License

    See LICENSE file.

  10. LLM Data

    • figshare.com
    xlsx
    Updated Sep 5, 2025
    Cite
    Carter Emerson (2025). LLM Data [Dataset]. http://doi.org/10.6084/m9.figshare.30066574.v2
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Sep 5, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Carter Emerson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data from Prompts to politics: How political identity shapes AI-generated discourse on climate change

  11. AI Generated Documents

    • figshare.com
    pdf
    Updated Nov 8, 2025
    Cite
    Robert Myers (2025). AI Generated Documents [Dataset]. http://doi.org/10.6084/m9.figshare.30508832.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 8, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Robert Myers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains case studies and discussion prompts generated by the Ethics Case Study and Discussion Prompt Creator (https://chatgpt.com/g/g-676e10593bcc8191ba88a1386d3947a4-ethics-case-study-and-discussion-prompt-creator).

  12. Custom GPT Instructions and Knowledge

    • figshare.com
    txt
    Updated Nov 8, 2025
    Cite
    Robert Myers (2025). Custom GPT Instructions and Knowledge [Dataset]. http://doi.org/10.6084/m9.figshare.30510953.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Nov 8, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Robert Myers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the instructions and knowledge uploaded to create the custom GPT: Ethics Case Study and Discussion Prompt Creator (https://chatgpt.com/g/g-676e10593bcc8191ba88a1386d3947a4-ethics-case-study-and-discussion-prompt-creator).

  13. LLM dermatological patient handouts - supplementary data

    • data.mendeley.com
    Updated Sep 7, 2023
    + more versions
    Cite
    Crystal Chang (2023). LLM dermatological patient handouts - supplementary data [Dataset]. http://doi.org/10.17632/5ngxkzkdp9.2
    Explore at:
    Dataset updated
    Sep 7, 2023
    Authors
    Crystal Chang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary material for Assessment of Large Language Models to Generate Patient Handouts for the Dermatology Clinic: a single-blinded randomized study

    Supplementary material A describes the overall analysis and outputs for the PEMAT and readability scores.

    Supplementary material B is the code used for the statistical analysis.

    LLM_readability_scores, PEMAT, LLM_attending_rank, rater_df, and LLM_randomization_protocol are the raw data used for analysis.

    ChatGPT handouts, Bard handouts, and BingAI handouts are the respective handouts and prompts generated for this study.
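
    A minimal sketch of computing readability metrics for one handout, assuming the textstat package; this is illustrative only, the study's own analysis code is provided in Supplementary material B.

    ```python
    # Readability metrics for a single generated handout (illustrative).
    import textstat

    with open("chatgpt_handout_example.txt") as f:  # hypothetical handout file
        handout = f.read()

    print("Flesch Reading Ease:", textstat.flesch_reading_ease(handout))
    print("Flesch-Kincaid Grade:", textstat.flesch_kincaid_grade(handout))
    ```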

  14. Evaluation of large language model chatbot responses to psychotic prompts:...

    • search.dataone.org
    • datadryad.org
    Updated Nov 20, 2025
    Cite
    Elaine Shen; Fadi Hamati; Meghan Rose Donohue; Ragy Girgis; Jeremy Veenstra-VanderWeele; Amandeep Jutla (2025). Evaluation of large language model chatbot responses to psychotic prompts: numerical ratings of prompt-response pairs [Dataset]. http://doi.org/10.5061/dryad.x0k6djj00
    Explore at:
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Elaine Shen; Fadi Hamati; Meghan Rose Donohue; Ragy Girgis; Jeremy Veenstra-VanderWeele; Amandeep Jutla
    Description

    The large language model (LLM) "chatbot" product ChatGPT has accumulated 800 million weekly users since its 2022 launch. In 2025, several media outlets reported on individuals in whom apparent psychotic symptoms emerged or worsened in the context of using ChatGPT. As LLM chatbots are trained to align with user input and generate encouraging responses, they may have difficulty appropriately responding to psychotic content. To assess whether ChatGPT can reliably generate appropriate responses to prompts containing psychotic symptoms, we conducted a cross-sectional, experimental study of how multiple versions of the ChatGPT product respond to psychotic and control prompts, with blind clinician ratings of response appropriateness. We found that all three tested versions of ChatGPT were much more likely to generate inappropriate responses to psychotic than control prompts, with the "Free" product showing the poorest performance. In an exploratory analysis, prompts reflecting grandiosit...

    We created 79 psychotic prompts, first-person statements an individual experiencing psychosis could plausibly make to ChatGPT. Each reflected one of the five positive symptom domains assessed by the Structured Interview for Psychosis-Risk Syndromes (SIPS): unusual thought content/delusional ideas (n = 16), suspiciousness/persecutory ideas (n = 17), grandiose ideas (n = 15), perceptual disturbances/hallucinations (n = 15), and disorganized communication (n = 16). For each psychotic prompt, we created a corresponding control prompt similar in length, sentence structure, and content but without psychotic elements. This yielded a total of 158 unique prompts. On 8/28 and 8/29/2025, we presented these prompts to three versions of the ChatGPT product: GPT-5 Auto (paid default at time of experiment), GPT-4o (previous paid default), and "Free" (version accessible without subscription or account), yielding 474 prompt-response pairs. Two primary raters assigned an "appropriateness" r...

    # Evaluation of large language model chatbot responses to psychotic prompts: numerical ratings of prompt-response pairs

    Dataset DOI: 10.5061/dryad.x0k6djj00

    Description of the data and file structure

    This dataset contains numerical ratings of prompt-response pairs from our study, and can be used to reproduce our analyses. Note that the literal text of prompts and model responses are not provided here, but they are available from the corresponding author on reasonable request.

    Files and variables

    File: llm_psychosis_numeric_ratings.csv

    Description: This CSV file contains all numeric appropriateness ratings assigned to prompt-response pairs in a "long" format. The 1592 rows represent 474 ratings from each of the two primary raters (948 in total), 474 derived consensus ratings, and 170 ratings from a secondary rater. The seven columns are described below.

    Variables
    • pair_id: The ID of the prompt-response pair rat...,
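
    A minimal sketch of summarizing appropriateness ratings by model version from the long-format CSV; only pair_id is documented above, so the other column names used here (rater_type, model_version, prompt_type, rating) are assumptions.

    ```python
    # Summarize consensus ratings by model version and prompt type (assumed columns).
    import pandas as pd

    df = pd.read_csv("llm_psychosis_numeric_ratings.csv")
    consensus = df[df["rater_type"] == "consensus"]
    summary = consensus.groupby(["model_version", "prompt_type"])["rating"].mean().unstack()
    print(summary.round(2))
    ```
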
  15. PROSPECT: Professional Role Effects on Specialized Perspective Enhancement...

    • zenodo.org
    zip
    Updated Dec 29, 2024
    Cite
    Keisuke Sato; Keisuke Sato (2024). PROSPECT: Professional Role Effects on Specialized Perspective Enhancement in Conversational Task [Dataset]. http://doi.org/10.5281/zenodo.14567800
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Keisuke Sato; Keisuke Sato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 29, 2024
    Description

    ### Data Availability Statement (for the paper)

    All dialogue logs and final responses collected in this study are publicly available in the PROSPECT repository on Zenodo (DOI: [to be assigned]). The repository contains PDF files of complete dialogue histories and Markdown files of final comprehensive analyses for all conditions and models used in this study, allowing for reproducibility and further analysis.

    ### README.md for Zenodo

    # PROSPECT: Professional Role Effects on Specialized Perspective Enhancement in Conversational Task

    ## Overview
    This repository (PROSPECT) contains the dataset associated with the paper:
    > "Empirical Investigation of Expertise, Multiperspectivity, and Abstraction Induction in Conversational AI Outputs through Professional Role Assignment to Both User and AI"

    This research analyzed changes in dialogue logs and final responses when professional roles were assigned to both user and AI sides across multiple Large Language Models (LLMs). This repository provides the complete dialogue logs (PDF format) and final responses (Markdown format) used in the analysis.

    ## Directory Structure
    The repository structure under the top directory (`PROSPECT/`) is as follows:

    ```
    PROSPECT/
    ├── dialogue/        # Dialogue histories (PDF)
    │   ├── none/
    │   ├── ai_only/
    │   ├── user_only/
    │   └── both/
    └── final_answers/   # Final responses (Markdown)
        ├── none/
        ├── ai_only/
        ├── user_only/
        └── both/
    ```

    - **dialogue/**
    - Contains raw dialogue logs in PDF format. Subdirectories represent role assignment conditions:
    - `none/`: No roles assigned to either user or AI
    - `ai_only/`: Role assigned to AI only
    - `user_only/`: Role assigned to user only
    - `both/`: Roles assigned to both user and AI
    - **final_answers/**
    - Contains final comprehensive analysis responses in Markdown format. Directory structure mirrors that of `dialogue/`.

    ## File Naming Convention
    Files in each directory follow this naming convention:
    ```
    [AI]_[conditionNumber]-[roleNumber].pdf
    [AI]_[conditionNumber]-[roleNumber].md
    ```
    - `[AI]`: AI model name used for dialogue (e.g., ChatGPT, ChatGPT-o1, Claude, Gemini)
    - `[conditionNumber]`: Number indicating role assignment condition
    - 0: none
    - 1: ai_only
    - 2: user_only
    - 3: both
    - `[roleNumber]`: Professional role number
    - 0: No role
    - 1: Detective
    - 2: Psychologist
    - 3: Artist
    - 4: Architect
    - 5: Natural Scientist

    ### Examples:
    - `ChatGPT_3-1.pdf`: Dialogue log with ChatGPT-4o model under "both" condition (3) with detective role (1)
    - `Gemini_1-4.md`: Final response from Gemini model under "ai_only" condition (1) with architect role (4)

    ## Role Number Reference
    | roleNumber | Professional Role |
    |-----------:|:-----------------|
    | 0 | No role |
    | 1 | Detective |
    | 2 | Psychologist |
    | 3 | Artist |
    | 4 | Architect |
    | 5 | Natural Scientist|
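
    A small sketch of decoding the file naming convention above into its model, condition, and role components; the mappings follow the tables in this README.

    ```python
    # Parse PROSPECT file names like "ChatGPT_3-1.pdf" into (model, condition, role).
    import re

    CONDITIONS = {0: "none", 1: "ai_only", 2: "user_only", 3: "both"}
    ROLES = {0: "No role", 1: "Detective", 2: "Psychologist",
             3: "Artist", 4: "Architect", 5: "Natural Scientist"}

    def parse_filename(name: str):
        """Decode e.g. 'ChatGPT_3-1.pdf' into its model, condition, and role."""
        match = re.fullmatch(r"(.+)_(\d)-(\d)\.(?:pdf|md)", name)
        if match is None:
            raise ValueError(f"Unexpected file name: {name}")
        model, condition, role = match.group(1), int(match.group(2)), int(match.group(3))
        return model, CONDITIONS[condition], ROLES[role]

    print(parse_filename("ChatGPT_3-1.pdf"))  # ('ChatGPT', 'both', 'Detective')
    print(parse_filename("Gemini_1-4.md"))    # ('Gemini', 'ai_only', 'Architect')
    ```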

    ## Data Description
    - **Dialogue Histories (PDF format)**
    Complete logs of questions and responses from each session, preserved as captured during the research. All dialogues were conducted in Japanese. While assistant version information is not included, implementation dates and model names are recorded within the files.
    - **Final Responses (Markdown format)**
    Excerpted responses to the final "comprehensive analysis request" as Markdown files, intended for text analysis and keyword extraction. All responses are in Japanese.

    *Note: This dataset contains dialogues and responses exclusively in Japanese. Researchers interested in lexical analysis or content analysis should consider this language specification.

    ## How to Use
    1. Please maintain the folder hierarchy after downloading.
    2. For meta-analysis or lexical analysis, refer to PDFs for complete dialogues and Markdown files for final responses.
    3. Utilize for research reproduction, secondary analysis, or meta-analysis.

    ## License
    This dataset is released under the **CC BY 4.0** License.
    - Free to use and modify, but please cite this repository (DOI) and the associated paper when using the data.

    ## Related Publication


    ## Disclaimer
    - The dialogue logs contain no personal information or confidential data.
    - The provided logs and responses reflect the research timing; identical prompts may yield different responses due to AI model updates.
    - The creators assume no responsibility for any damages resulting from the use of this dataset.

    ## Contact
    For questions or requests, please contact skeisuke@ibaraki-ct.ac.jp.

  16. AI Image Generator Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Jan 3, 2025
    Cite
    Market Research Forecast (2025). AI Image Generator Market Report [Dataset]. https://www.marketresearchforecast.com/reports/ai-image-generator-market-5135
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    Jan 3, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The AI Image Generator Market size was valued at USD 356.1 million in 2023 and is projected to reach USD 1,094.58 million by 2032, exhibiting a CAGR of 17.4% during the forecast period.

    Recent developments include: September 2023 - OpenAI, a company specializing in the generative AI industry, introduced DALL-E 3, the latest version of its image generator. This upgrade, integrated with ChatGPT, produces high-quality images based on natural-language prompts and incorporates ethical safeguards. May 2023 - Stability AI introduced StableStudio, an open-source version of its DreamStudio AI application, specializing in converting text into images. This open-source release enabled developers and creators to access and utilize the technology, creating a wide range of applications for text-to-image generation. April 2023 - VanceAI launched an AI text-to-image generator called VanceAI Art Generator, powered by Stable Diffusion. This tool could interpret text descriptions and generate corresponding artworks. Users could combine image types, styles, and artists, and adjust sizes to transform their creative ideas into visual art. March 2023 - Adobe unveiled Adobe Firefly, a generative AI tool in beta, catering to users without graphic design skills and helping them create images and text effects. This announcement coincided with Microsoft's launch of Copilot, offering automatic content generation for 365 and Dynamics 365 users. These advancements in generative AI provided valuable support and opportunities for individuals facing challenges related to writing, design, or organization. March 2023 - Runway AI introduced Gen-2, a combination of AI models capable of producing short video clips from text prompts. Gen-2, an advancement over its predecessor Gen-1, would generate higher-quality clips and provide users with increased customization options.

    Key drivers for this market are: Growing Adoption of Augmented Reality (AR) and Virtual Reality (VR) to Fuel the Market Growth. Potential restraints include: Concerns related to Data Privacy and Creation of Malicious Content to Hamper the Market. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.

  17. Table_2_Development and evaluation of multimodal AI for diagnosis and triage...

    • figshare.com
    • frontiersin.figshare.com
    docx
    Updated Dec 8, 2023
    Cite
    Zhiyu Peng; Ruiqi Ma; Yihan Zhang; Mingxu Yan; Jie Lu; Qian Cheng; Jingjing Liao; Yunqiu Zhang; Jinghan Wang; Yue Zhao; Jiang Zhu; Bing Qin; Qin Jiang; Fei Shi; Jiang Qian; Xinjian Chen; Chen Zhao (2023). Table_2_Development and evaluation of multimodal AI for diagnosis and triage of ophthalmic diseases using ChatGPT and anterior segment images: protocol for a two-stage cross-sectional study.DOCX [Dataset]. http://doi.org/10.3389/frai.2023.1323924.s002
    Explore at:
    Available download formats: docx
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhiyu Peng; Ruiqi Ma; Yihan Zhang; Mingxu Yan; Jie Lu; Qian Cheng; Jingjing Liao; Yunqiu Zhang; Jinghan Wang; Yue Zhao; Jiang Zhu; Bing Qin; Qin Jiang; Fei Shi; Jiang Qian; Xinjian Chen; Chen Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Artificial intelligence (AI) technology has made rapid progress in disease diagnosis and triage. In the field of ophthalmic diseases, image-based diagnosis has achieved high accuracy but still encounters limitations due to the lack of medical history. The emergence of ChatGPT enables human-computer interaction, allowing for the development of a multimodal AI system that integrates interactive text and image information.

    Objective: To develop a multimodal AI system using ChatGPT and anterior segment images for diagnosing and triaging ophthalmic diseases, and to assess the AI system's performance through a two-stage cross-sectional study, starting with silent evaluation and followed by early clinical evaluation in outpatient clinics.

    Methods and analysis: Our study will be conducted across three distinct centers in Shanghai, Nanjing, and Suqian. The development of the smartphone-based multimodal AI system will take place in Shanghai with the goal of achieving ≥90% sensitivity and ≥95% specificity for diagnosing and triaging ophthalmic diseases. The first stage of the cross-sectional study will explore the system's performance in Shanghai's outpatient clinics. Medical histories will be collected without patient interaction, and anterior segment images will be captured using slit lamp equipment. This stage aims for ≥85% sensitivity and ≥95% specificity with a sample size of 100 patients. The second stage will take place at three locations, with Shanghai serving as the internal validation dataset, and Nanjing and Suqian as the external validation dataset. Medical history will be collected through patient interviews, and anterior segment images will be captured via smartphone devices. An expert panel will establish reference standards and assess AI accuracy for diagnosis and triage throughout all stages. A one-vs.-rest strategy will be used for data analysis, and a post-hoc power calculation will be performed to evaluate the impact of disease types on AI performance.

    Discussion: Our study may provide a user-friendly smartphone-based multimodal AI system for diagnosis and triage of ophthalmic diseases. This innovative system may support early detection of ocular abnormalities, facilitate establishment of a tiered healthcare system, and reduce the burdens on tertiary facilities.

    Trial registration: The study was registered in ClinicalTrials.gov on June 25th, 2023 (NCT05930444).
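
    A minimal sketch of the one-vs.-rest sensitivity/specificity computation mentioned in the analysis plan; the labels below are hypothetical examples, not study data.

    ```python
    # One-vs.-rest sensitivity and specificity for a single disease class.
    import numpy as np

    def one_vs_rest_metrics(y_true, y_pred, positive_class):
        """Sensitivity and specificity for one class against all others."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        is_pos = y_true == positive_class
        pred_pos = y_pred == positive_class
        sensitivity = (is_pos & pred_pos).sum() / max(is_pos.sum(), 1)
        specificity = (~is_pos & ~pred_pos).sum() / max((~is_pos).sum(), 1)
        return sensitivity, specificity

    truth = ["cataract", "keratitis", "normal", "cataract"]  # hypothetical reference labels
    preds = ["cataract", "normal", "normal", "cataract"]     # hypothetical AI outputs
    print(one_vs_rest_metrics(truth, preds, "cataract"))     # (1.0, 1.0)
    ```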

  18. databricks dolly 15k

    • kaggle.com
    zip
    Updated Apr 12, 2023
    + more versions
    Cite
    databricks (2023). databricks dolly 15k [Dataset]. https://www.kaggle.com/datasets/databricks/databricks-dolly-15k/code
    Explore at:
    Available download formats: zip (4737034 bytes)
    Dataset updated
    Apr 12, 2023
    Dataset provided by
    Databricks (http://databricks.com/)
    Authors
    databricks
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Summary

    databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

    This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.

    Supported Tasks: - Training LLMs - Synthetic Data Generation - Data Augmentation

    Languages: English Version: 1.0

    Owner: Databricks, Inc.

    Dataset Overview

    databricks-dolly-15k is a corpus of more than 15,000 records generated by thousands of Databricks employees to enable large language models to exhibit the magical interactivity of ChatGPT. Databricks employees were invited to create prompt / response pairs in each of eight different instruction categories, including the seven outlined in the InstructGPT paper, as well as an open-ended free-form category. The contributors were instructed to avoid using information from any source on the web with the exception of Wikipedia (for particular subsets of instruction categories), and explicitly instructed to avoid using generative AI in formulating instructions or responses. Examples of each behavior were provided to motivate the types of questions and instructions appropriate to each category.

    Halfway through the data generation process, contributors were given the option of answering questions posed by other contributors. They were asked to rephrase the original question and only select questions they could be reasonably expected to answer correctly.

    For certain categories contributors were asked to provide reference texts copied from Wikipedia. Reference text (indicated by the context field in the actual dataset) may contain bracketed Wikipedia citation numbers (e.g. [42]) which we recommend users remove for downstream applications.
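
    A small sketch of the recommended cleanup: stripping bracketed Wikipedia citation markers such as [42] from the context field before downstream use.

    ```python
    # Remove bracketed citation markers (e.g. [42]) from a reference-text passage.
    import re

    def strip_wiki_citations(context: str) -> str:
        return re.sub(r"\[\d+\]", "", context)

    print(strip_wiki_citations("The Eiffel Tower is 330 m tall.[42] It opened in 1889.[7]"))
    ```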

    Intended Uses

    While immediately valuable for instruction fine-tuning of large language models, as a corpus of human-generated instruction prompts this dataset also presents a valuable opportunity for synthetic data generation using the methods outlined in the Self-Instruct paper. For example, contributor-generated prompts could be submitted as few-shot examples to a large open language model to generate a corpus of millions of examples of instructions in each of the respective InstructGPT categories.

    Likewise, both the instructions and responses present fertile ground for data augmentation. A paraphrasing model might be used to restate each prompt or short responses, with the resulting text associated to the respective ground-truth sample. Such an approach might provide a form of regularization on the dataset that could allow for more robust instruction-following behavior in models derived from these synthetic datasets.

    Dataset

    Purpose of Collection

    As part of our continuing commitment to open source, Databricks developed what is, to the best of our knowledge, the first open source, human-generated instruction corpus specifically designed to enable large language models to exhibit the magical interactivity of ChatGPT. Unlike other datasets that are limited to non-commercial use, this dataset can be used, modified, and extended for any purpose, including academic or commercial applications.

    Sources

    • Human-generated data: Databricks employees were invited to create prompt / response pairs in each of eight different instruction categories.
    • Wikipedia: For instruction categories that require an annotator to consult a reference text (information extraction, closed QA, summarization) contributors selected passages from Wikipedia for particular subsets of instruction categories. No guidance was given to annotators as to how to select the target passages.

    Annotator Guidelines

    To create a record, employees were given a brief description of the annotation task as well as examples of the types of prompts typical of each annotation task. Guidelines were succinct by design so as to encourage a high task completion rate, possibly at the cost of rigorous compliance to an annotation rubric that concretely and reliably operationalizes the specific task. Caveat emptor.

    The annotation guidelines for each of the categories are as follows:

    • Creative Writing: Write a question or instruction that requires a creative, open-ended written response. The instruction should be reasonable to ask of a person with general world knowledge and should not require searching. In this task, your prompt should give very specific instructions to follow. Constraints, instructions, guidelines, or requirements all work, and the more of them the be...
  19. FeReRe: Feedback Requirements Relation using Large Language Models [data]

    • heidata.uni-heidelberg.de
    text/x-bibtex, txt +2
    Updated Jul 24, 2025
    Cite
    Michael Anders; Michael Anders (2025). FeReRe: Feedback Requirements Relation using Large Language Models [data] [Dataset]. http://doi.org/10.11588/DATA/8NHOER
    Explore at:
    Available download formats: xlsx (11218), txt (615), zip (372165), text/x-bibtex (6822)
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    heiDATA
    Authors
    Michael Anders; Michael Anders
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    Carl Zeiss Foundation
    Description

    This dataset consists of 3 parts: The "related_work.bib" contains citations for the Related Work section of the paper. The "ChatGPTPrompts.xlsx" contains a list of all prompt experiments conducted with ChatGPT on the Komoot dataset, including the final prompts and results. The "data" folder contains all 4 datasets used for training and testing of the BERT classifier in the paper. Each dataset contains feedback, requirements and a ground truth in which feedback IDs are assigned to requirement IDs. The folder can be copied into the FeReRe code (https://github.com/feeduvl/FeReRe) to reproduce results.
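
    A minimal sketch of reading one dataset's ground truth that assigns feedback IDs to requirement IDs; the file path and column names are assumptions, not the actual schema of the files in the "data" folder.

    ```python
    # Build a requirement -> feedback-IDs mapping from an assumed ground-truth sheet.
    import pandas as pd
    from collections import defaultdict

    ground_truth = pd.read_excel("data/komoot/ground_truth.xlsx")  # assumed columns: feedback_id, requirement_id

    feedback_per_requirement = defaultdict(set)
    for row in ground_truth.itertuples(index=False):
        feedback_per_requirement[row.requirement_id].add(row.feedback_id)

    print({req: len(ids) for req, ids in list(feedback_per_requirement.items())[:5]})
    ```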

  20. Limits of ChatGPT's Conversational Pragmatics in a Turing Test About Ethics,...

    • data-staging.niaid.nih.gov
    Updated Jan 31, 2025
    Cite
    Wagner, Wolfgang; Gaskell, George; Paraschou, Eva; Lyu, Siqi; Michali, Maria; Vakali, Athina (2025). Limits of ChatGPT's Conversational Pragmatics in a Turing Test About Ethics, Commonsense, and Cultural Sensitivity [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14762323
    Explore at:
    Dataset updated
    Jan 31, 2025
    Dataset provided by
    London School of Economics and Political Science
    Aristotle University of Thessaloniki
    University of Tartu
    South East European Research Centre
    Authors
    Wagner, Wolfgang; Gaskell, George; Paraschou, Eva; Lyu, Siqi; Michali, Maria; Vakali, Athina
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Does ChatGPT deliver on its explicit claim to be culturally sensitive and its implicit claim to be a friendly digital person when conversing with human users? These claims are investigated from the perspective of linguistic pragmatics, particularly Grice's cooperative principle in communication. Following the pattern of real-life communication, turn-taking conversations reveal limitations in the LLM's grasp of the entire contextual setting described in the prompt. The prompts included ethical issues, a hiking adventure, geographical orientation, and bodily movement. For cultural sensitivity, the prompts came from a Pakistani Muslim in English, from a Hindu in English, and from a Chinese speaker in Chinese. The issues were deeply cultural, involving feelings and affects. Qualitative analysis of the conversation pragmatics showed that ChatGPT is often unable to conduct conversations according to the pragmatic principles of quantity, reliable quality, remaining in focus, and being clear in expression. We conclude that ChatGPT should not be presented as a global LLM but should be subdivided into several culture-specific modules.
