Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Large language models present new opportunities for teaching and learning. The response accuracy of these models, however, is believed to depend on prompt quality, which can be a challenge for students. In this study, we aimed to explore how undergraduate students use ChatGPT for problem-solving, what prompting strategies they develop, the link between these strategies and the model’s response accuracy, the existence of individual prompting tendencies, and the impact of gender in this context. Our students used ChatGPT to solve five problems related to embedded systems and provided the solutions and the conversations with this model. We analyzed the conversations thematically to identify prompting strategies and applied different quantitative analyses to establish relationships between these strategies and the response accuracy and other factors. The findings indicate that students predominantly employ three types of prompting strategies: single copy-and-paste prompting (SCP), single reformulated prompting (SRP), and multiple-question prompting (MQP). ChatGPT’s response accuracy using SRP and MQP was significantly higher than using SCP, with effect sizes of -0.94 and -0.69, respectively. The student-by-student analysis revealed some tendencies. For example, 26 percent of the students consistently copied and pasted the questions into ChatGPT without any modification. Students who used MQP showed better performance in the final exam than those who did not use this prompting strategy. As for gender, female students tended to make extensive use of SCP, whereas male students tended to mix SCP and MQP. We conclude that students develop different prompting strategies that lead to different response qualities and learning. More research is needed to deepen our understanding and inform effective educational practices in the AI era.
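The reported effect sizes (e.g., -0.94 for SCP versus SRP) are consistent with a standardized mean difference such as Cohen's d. The snippet below is a minimal, illustrative sketch of how such a comparison could be computed; the accuracy values are placeholders, not the study data.

```python
# Illustrative only: Cohen's d for response accuracy under two prompting
# strategies. The numbers below are placeholders, not the study data.
import numpy as np
from scipy import stats

scp_accuracy = np.array([0.2, 0.4, 0.4, 0.6, 0.2])   # single copy-and-paste prompting
srp_accuracy = np.array([0.6, 0.8, 0.8, 1.0, 0.6])   # single reformulated prompting

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                        / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled_sd

t, p = stats.ttest_ind(scp_accuracy, srp_accuracy)
print(f"Cohen's d = {cohens_d(scp_accuracy, srp_accuracy):.2f}, t = {t:.2f}, p = {p:.3f}")
```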
Supplemental Material Contents:
· 1-Demographic Information.xlsx: contains the demographic information of the participants in the study.
· 2-Forms.zip: contains the forms and questionnaires used to collect data for the experiment: demographic form, pre-study, post-study, and AAR/AI questionnaires.
· 3-GitHub-Repository.zip: a copy of the GitHub repository used in the study.
· 4-Tutorial Scripts.zip: scripts used in the experiment with the groups to keep the procedure consistent across all participants.
· 5-Logs-Rubric-Grades.zip: contains the participant data log (commit and PR), the rubric for grading submissions, and the grades.
· 6-RQ1-Data-and-Analysis.zip: contains the data and analysis with respect to RQ1.
· 7-RQ2-Data-and-Analysis.zip: contains the data and analysis with respect to RQ2.
· 8-Participant Prompts.xlsx: contains the experimental group participants' prompts with ChatGPT.

2-Forms.zip contains the following files:
· Demographics.pdf: a form used to collect demographic information from participants before the study.
· Control Pre-Study Questionnaire.pdf: pre-study questionnaire for the control group (Self-Efficacy Questionnaire).
· Control Post-Study Questionnaire.pdf: post-study questionnaire for the control group (NASA-TLX, Self-Efficacy Questionnaire).
· Treatment - AAR_AI task.pdf: pre- and post-task AAR/AI questionnaire for the experimental group.
· Experimental Pre-Study Questionnaire.pdf: pre-study questionnaire for the experimental group (Self-Efficacy Questionnaire, question on familiarity with AI).
· Experimental-Post Study Questionnaire.pdf: post-study questionnaire for the experimental group (AAR/AI step 7, Continuance Intention, NASA-TLX, HAI Guideline Questions, Self-Efficacy Questionnaire).

3-GitHub-Repository.zip: the GitHub repository used in the study; contains the main.py code file and the Readme.md file (with the written instructions for the participants).

4-Tutorial Scripts.zip contains:
· Control-Script.pdf: script for the control group.
· Experimental-Script.pdf: script for the experimental group.

5-Logs-Rubric-Grades.zip contains:
· rubric.pdf: the rubric created for grading task performance.
· GitHub-Task3-Log.xlsx: data on the status of the commit made and the PR raised for each participant.
· grades.xlsx: detailed grades for each participant in the experimental (treatment) and control groups.

6-RQ1-Data-and-Analysis.zip
Note: the term 'treatment' is used in the files of this folder to denote the experimental group, i.e., participants using ChatGPT for the tasks.
· NASA TLX: folder containing the participant data (TLX.xlsx), the code for statistical analysis (Stat-TLX.py), and the statistical reports (analysis-TLX.csv).
· Task Performance: folder containing the participant data (grades.xlsx and Scores.xlsx, the overall grades), the code for statistical analysis (Stat-Correctness.py), and the statistical reports (analysis.csv); see the sketch after this list.
· Self-Efficacy: folder containing:
  o Self-Efficacy-detailed.xlsx: participant data.
  o Paired Stats: folder containing the data (Total Self Efficacy.csv), the code for statistical analysis (paired-stats.py), and the statistical reports (analysis.csv).
  o Box plot: folder containing the code for generating the box plot and its output.
· Continuance Intention.xlsx: participant data (experimental group) on continuance intention for ChatGPT.
· Stat-Table-H1-2-Paper.xlsx: statistics table for NASA TLX and task performance as presented in the paper.

7-RQ2-Data-and-Analysis.zip contains:
· AAR_AI-Responses.xlsx: AAR/AI responses filled in by participants in the experimental group.
· Quotation Manager-Faults&Conseq.xlsx: quotations from the AAR/AI responses along with the corresponding codes; a separate sheet contains the quotes that link faults to consequences.
· Codebook.xlsx: the final codebook (faults and consequences).
· HAI-data.xlsx: the reported guideline violations along with a disaggregated analysis (grouped by gender).
· Likert Plot-HAI: folder containing the code for generating the Likert plot figure presented in the paper.
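As referenced in the Task Performance item above, the archive ships its own analysis code (Stat-Correctness.py). The snippet below is only a hedged sketch of a between-group comparison of the grades; the column names ("group", "grade") and the choice of test are assumptions, not taken from the archive itself.

```python
# Illustrative sketch of a treatment-vs-control comparison of task grades.
# Column names ("group", "grade") and the test choice are assumptions.
import pandas as pd
from scipy import stats

grades = pd.read_excel("grades.xlsx")                      # from 5-Logs-Rubric-Grades.zip
treatment = grades.loc[grades["group"] == "treatment", "grade"]
control = grades.loc[grades["group"] == "control", "grade"]

# Mann-Whitney U is a reasonable default for small, possibly non-normal samples.
u, p = stats.mannwhitneyu(treatment, control, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.3f}")
```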
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the models' performance considering diverse prompt formulations and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our results statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT-3.5 and GPT-4 in the F1-score for the premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that local models are as semantically rich as the embeddings from the OpenAI model. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.
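The example-selection step described above (choosing few-shot examples for the prompt via semantic search over embeddings) can be sketched as follows. This is an illustrative outline using the sentence-transformers library, not the authors' released code; the model name, candidate pool, and prompt wording are assumptions.

```python
# Illustrative sketch of few-shot example selection via semantic search,
# in the spirit of the approach described above. Model name, candidate
# pool, and prompt wording are assumptions, not the authors' code.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # a local embedding model

# Labeled candidate examples (placeholders, not ECHR data).
pool = [
    ("The applicant submits that the search was unlawful.", "premise"),
    ("Accordingly, there has been a violation of Article 8.", "conclusion"),
]
pool_emb = embedder.encode([text for text, _ in pool], convert_to_tensor=True)

def select_examples(query: str, k: int = 1):
    """Return the k pool examples most similar to the query clause."""
    q_emb = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, pool_emb, top_k=k)[0]
    return [pool[hit["corpus_id"]] for hit in hits]

query = "The Court finds that the interference was not necessary."
examples = select_examples(query)
prompt = "\n".join(f"Clause: {text}\nLabel: {label}" for text, label in examples)
prompt += f"\nClause: {query}\nLabel:"
print(prompt)  # few-shot prompt to send to the GPT model
```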
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created using prompt engineering over ChatGPT and has the following labels: 0 = negative, 1 = neutral, 2 = positive.
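A minimal sketch of decoding the 0/1/2 label scheme when loading the data is shown below; the file name and column names are assumptions about the dataset layout, not documented here.

```python
# Illustrative only: decode the 0/1/2 sentiment labels described above.
# The file name and column names are assumptions about the dataset layout.
import pandas as pd

label_names = {0: "negative", 1: "neutral", 2: "positive"}
df = pd.read_csv("sentiment_data.csv")            # hypothetical file name
df["label_name"] = df["label"].map(label_names)   # "label" column is assumed
print(df["label_name"].value_counts())
```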
"gpt3.5-gpt4-input-output-echram.zip" : Input and output to GPT-3.5 and GPT-4 based on ECHR dataset published in JSON format in this paper for argument component classification only i.e. clauses that are argumentative (conclusion/premise), extracted from the JSON file Note: Output of the model is under OpenAI Terms & policies. Please cite our paper also if you use this dataset: Performance analysis of large language models in the domain of legal argument mining You can click here for BibTex or copy the text below. @ARTICLE{10.3389/frai.2023.1278796, AUTHOR={Al Zubaer, Abdullah and Granitzer, Michael and Mitrović, Jelena }, TITLE={Performance analysis of large language models in the domain of legal argument mining}, JOURNAL={Frontiers in Artificial Intelligence}, VOLUME={6}, YEAR={2023}, URL={https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2023.1278796}, DOI={10.3389/frai.2023.1278796}, ISSN={2624-8212}, ABSTRACT={Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the model's performance considering diverse prompt formulation and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our result statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT 3.5 and GPT-4 in the F1-score for premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that local models are as semantically rich as the embeddings from the OpenAI model. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.}}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generative AI, in particular large language models (LLMs) such as ChatGPT, has the potential to revolutionize research. I describe dozens of use cases across six domains in which LLMs are starting to become useful as both research assistants and tutors: ideation and feedback, writing, background research, data analysis, coding, and mathematical derivations. I provide general instructions and demonstrate specific examples of how to take advantage of each of these, classifying the LLM capabilities from experimental to highly useful. I argue that economists can reap significant productivity gains by taking advantage of generative AI to automate micro tasks. Moreover, these gains will grow as the performance of AI systems across all of these domains continues to improve. I also speculate on the longer-term implications of AI-powered cognitive automation for economic research. The resources provided here contain the prompts and code to reproduce the chats with GPT-3.5, GPT-4, ChatGPT and Claude 2 that are listed in the paper.
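Since the archive is described as containing prompts and code to reproduce the chats, the snippet below is a generic, hedged sketch of replaying a stored prompt against the OpenAI chat API; the prompt file name and model name are placeholders, and the archive's own replication scripts may be structured differently.

```python
# Illustrative sketch of replaying a stored prompt against a chat model.
# The prompt file and model name are placeholders; the archive's own
# scripts may differ in structure and parameters.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("prompt.txt", encoding="utf-8") as f:   # hypothetical prompt file
    prompt = f.read()

response = client.chat.completions.create(
    model="gpt-4",                                # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```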
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset is from an Indian study which used ChatGPT, a natural language processing model by OpenAI, to design a mental health literacy intervention for college students. Prompt engineering tactics were used to formulate prompts that acted as anchors in the conversations with the AI agent regarding mental health. An intervention lasting 20 days was designed, with sessions of 15-20 minutes on alternate days. Fifty-one students completed pre-test and post-test measures of mental health literacy, mental help-seeking attitude, stigma, mental health self-efficacy, positive and negative experiences, and flourishing in the main study, which were then analyzed using paired t-tests. The results suggest that the intervention is effective among college students, as statistically significant changes were noted in mental health literacy and mental health self-efficacy scores. The study affirms the practicality, acceptance, and initial indications of AI-driven methods in advancing mental health literacy and suggests the promising prospects of innovative platforms such as ChatGPT within the field of applied positive psychology. Included: the data used in the analysis for the intervention study.
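The pre-test/post-test comparison described above (paired t-tests on mental health literacy and related measures) can be sketched as follows; the file and column names are illustrative assumptions about how the data are laid out, not the study's actual variable names.

```python
# Illustrative only: paired t-test on pre- vs. post-intervention scores.
# File and column names are assumptions about the data layout.
import pandas as pd
from scipy import stats

df = pd.read_excel("intervention_data.xlsx")      # hypothetical file name
t, p = stats.ttest_rel(df["mhl_pre"], df["mhl_post"])   # assumed column names
print(f"Mental health literacy: t = {t:.2f}, p = {p:.3f}")
```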
https://www.marketresearchforecast.com/privacy-policy
The AI Image Generator Market was valued at USD 356.1 million in 2023 and is projected to reach USD 1,094.58 million by 2032, exhibiting a CAGR of 17.4% during the forecast period. An AI image generator is a software application that produces images by means of artificial intelligence; common model families include generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models. Essential capabilities include high-quality rendering of the generated image, conversion of a source image to another style, and image enhancement. These tools are used for art generation, design, virtual try-on, and in-game asset creation. They enable fast, low-cost visualization and image modification according to given parameters or styles, reshaping the creative landscape of various industries by improving efficiency and creativity. Recent developments include: September 2023 - OpenAI, a company specializing in the generative AI industry, introduced DALL-E 3, the latest version of its image generator; integrated with ChatGPT, it produces high-quality images from natural-language prompts and incorporates ethical safeguards. May 2023 - Stability AI introduced StableStudio, an open-source version of its DreamStudio AI application, specializing in converting text into images; this open-source release enabled developers and creators to access and build on the technology for a wide range of text-to-image applications. April 2023 - VanceAI launched an AI text-to-image generator called VanceAI Art Generator, powered by Stable Diffusion; the tool interprets text descriptions and generates corresponding artworks, and users can combine image types, styles, and artists, and adjust sizes to turn creative ideas into visual art. March 2023 - Adobe unveiled Adobe Firefly, a generative AI tool in beta that lets users without graphic design skills create images and text effects; the announcement coincided with Microsoft's launch of Copilot, which offers automatic content generation for 365 and Dynamics 365 users. March 2023 - Runway AI introduced Gen-2, a set of AI models capable of producing short video clips from text prompts; an advancement over its predecessor Gen-1, it generates higher-quality clips and gives users more customization options. Key drivers for this market: growing adoption of augmented reality (AR) and virtual reality (VR). Potential restraints: concerns related to data privacy and the creation of malicious content. Notable trends: growing implementation of touch-based and voice-based infotainment systems to increase the adoption of intelligent cars.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This pilot study is the first phase of a broader project aimed at developing an explainable artificial intelligence (AI) tool to support the ethical evaluation of Japanese-language clinical research documents. The tool is explicitly not intended to assist document drafting. We assessed the baseline performance of generative AI—Generative Pre-trained Transformer (GPT)-4 and GPT-4o—in analyzing clinical research protocols and informed consent forms (ICFs). The goal was to determine whether these models could accurately and consistently extract ethically relevant information, including the research objectives and background, research design, and participant-related risks and benefits. First, we compared the performance of GPT-4 and GPT-4o using custom agents developed via OpenAI’s Custom GPT functionality (hereafter “GPTs”). Then, using GPT-4o alone, we compared outputs generated by GPTs optimized with customized Japanese prompts to those generated by standard prompts. GPT-4o achieved 80% agreement in extracting research objectives and background and 100% in extracting research design, while both models demonstrated high reproducibility across ten trials. GPTs with customized prompts produced more accurate and consistent outputs than standard prompts. This study suggests the potential utility of generative AI in pre-institutional review board (IRB) review tasks; it also provides foundational data for future validation and standardization efforts involving retrieval-augmented generation and fine-tuning. Importantly, this tool is intended not to automate ethical review but rather to support IRB decision-making. Limitations include the absence of gold standard reference data, reliance on a single evaluator, lack of convergence and inter-rater reliability analysis, and the inability of AI to substitute for in-person elements such as site visits.
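The agreement and reproducibility figures quoted above (e.g., 80% and 100% extraction agreement, with reproducibility assessed over ten trials) amount to simple proportion calculations over repeated model runs. The sketch below is illustrative only; the per-document judgments and trial outputs are placeholders, and the paper's exact scoring procedure may differ.

```python
# Illustrative only: per-document agreement with the evaluator's judgment and
# reproducibility across repeated trials. All data here are placeholders.
from collections import Counter

# One extraction outcome per document (True = judged correct by the evaluator).
gpt4o_objectives = [True, True, False, True, True]
agreement = sum(gpt4o_objectives) / len(gpt4o_objectives)

# Reproducibility: fraction of ten trials yielding the modal (most common) output.
trials = ["design A"] * 10                        # identical outputs across trials
modal_count = Counter(trials).most_common(1)[0][1]
reproducibility = modal_count / len(trials)

print(f"agreement = {agreement:.0%}, reproducibility = {reproducibility:.0%}")
```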