GPT-3's water consumption during the training phase was estimated at roughly 4.8 billion liters, assuming the model was trained in Microsoft's Iowa data center (OpenAI has disclosed that this data center was used to train parts of the GPT-4 model). If the model had been trained entirely in the Washington data center, water consumption could have been as high as 15 billion liters, more than Microsoft's total water withdrawals in 2023.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Overview: This study evaluates the diagnostic accuracy of a multimodal large language model (LLM), ChatGPT-4, in recognizing glaucoma from color fundus photographs (CFPs) on a benchmark dataset, without prior training or fine-tuning.
Methods: The publicly accessible Retinal Fundus Glaucoma Challenge (“REFUGE”) dataset was used for the analyses. The input data consisted of the entire 400-image testing set. The task involved classifying fundus images as either ‘Likely Glaucomatous’ or ‘Likely Non-Glaucomatous’. We constructed a confusion matrix to visualize the results of ChatGPT-4’s predictions, focusing on the accuracy of the binary classification (glaucoma vs. non-glaucoma).
Results: ChatGPT-4 demonstrated an accuracy of 90% with a 95% confidence interval (CI) of 87.06%-92.94%. The sensitivity was 50% (95% CI: 34.51%-65.49%), while the specificity was 94.44% (95% CI: 92.08%-96.81%). The precision was 50% (95% CI: 34.51%-65.49%), and the F1 score was 0.50.
Conclusion: ChatGPT-4 achieved relatively high diagnostic accuracy without prior fine-tuning on CFPs. Given the scarcity of data in specialized medical fields, including ophthalmology, advanced AI techniques such as LLMs might require less training data than other forms of AI, with potential savings in time and financial resources. This may also pave the way for innovative tools to support specialized medical care, particularly care that depends on multimodal data for diagnosis and follow-up, irrespective of resource constraints.
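For context, the reported figures are internally consistent with a confusion matrix of about 20 true positives, 20 false negatives, 20 false positives, and 340 true negatives on the 400-image REFUGE test set (which contains 40 glaucomatous images). These cell counts are inferred from the reported metrics rather than taken from the study; the minimal sketch below shows how the metrics and their normal-approximation confidence intervals follow from such a matrix.

```python
# Hypothetical reconstruction: these cell counts are inferred from the
# reported metrics and the REFUGE test-set composition (40 glaucomatous,
# 360 non-glaucomatous images); they are not taken directly from the study.
from math import sqrt

tp, fn, fp, tn = 20, 20, 20, 340      # inferred confusion-matrix cells
n = tp + fn + fp + tn                 # 400 test images in total

def wald_ci(p, m, z=1.96):
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    half = z * sqrt(p * (1 - p) / m)
    return p - half, p + half

accuracy    = (tp + tn) / n                  # 0.90
sensitivity = tp / (tp + fn)                 # 0.50
specificity = tn / (tn + fp)                 # ~0.9444
precision   = tp / (tp + fp)                 # 0.50
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # 0.50

print("accuracy:   ", accuracy,    wald_ci(accuracy, n))
print("sensitivity:", sensitivity, wald_ci(sensitivity, tp + fn))
print("specificity:", specificity, wald_ci(specificity, tn + fp))
print("precision:  ", precision,   wald_ci(precision, tp + fp))
print("F1 score:   ", f1)
```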
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is the raw data used in the publication: ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States. In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
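As a rough illustration of how per-category accuracy comparisons of this kind can be computed (not the study's actual analysis code; the file name and column names below are hypothetical placeholders), a minimal pandas sketch:

```python
# Illustrative sketch only: the file name and column names ("topic",
# "complexity", "gpt35_correct", "gpt4_correct") are hypothetical
# placeholders, not the dataset's actual schema.
import pandas as pd

# One row per board-exam-style item, with 0/1 correctness flags per model
items = pd.read_csv("urology_items.csv")

for group_col in ["topic", "complexity"]:
    accuracy = items.groupby(group_col)[["gpt35_correct", "gpt4_correct"]].mean()
    print(f"\nAccuracy by {group_col}:")
    print(accuracy.round(3))
```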
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Weights assigned to the HEART variables by the five ChatGPT-4 models.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Background: Chat Generative Pre-trained Transformer (ChatGPT) is a 175-billion-parameter natural language processing model that uses deep learning algorithms trained on vast amounts of data to generate human-like texts such as essays. Consequently, it has introduced new challenges and threats to medical education. We assessed the use of ChatGPT and other AI tools among medical students in Uganda.
Methods: We conducted a descriptive cross-sectional study among medical students at four public universities in Uganda from 1st November 2023 to 20th December 2023. Participants were recruited by stratified random sampling. We used a semi-structured questionnaire to collect data on participants’ socio-demographics and use of AI tools such as ChatGPT. Our outcome variable was use of AI tools. Data were analyzed descriptively in Stata version 17.0. We conducted a modified Poisson regression to explore the association between use of AI tools and various exposures.
Results: A total of 564 students participated. Almost all (93%) had heard about AI tools and more than two-thirds (75.7%) had ever used AI tools. Regarding the AI tools used, the majority (72.2%) had ever used ChatGPT, followed by SnapChat AI (14.9%), Bing AI (11.5%), and Bard AI (6.9%). Most students used AI tools to complete assignments (55.5%), prepare for tutorials (39.9%), prepare for exams (34.8%), and support research writing (24.8%). Students also reported using AI tools for non-academic purposes, including emotional support, recreation, and spiritual growth. Older students were 31% less likely to use AI tools compared to younger ones (adjusted prevalence ratio (aPR): 0.69; 95% CI: [0.62, 0.76]). Students at Makerere University were 66% more likely to use AI tools compared to students at Gulu University (aPR: 1.66; 95% CI: [1.64, 1.69]).
Conclusion: The use of ChatGPT and other AI tools was widespread among medical students in Uganda. AI tools were used for both academic and non-academic purposes. Younger students were more likely to use AI tools than older students. There is a need to promote AI literacy in institutions to empower older students with essential skills for the digital age. Further, educators should assume students are using AI and adjust their teaching and examination methods to suit this new reality. Our research adds further evidence to existing voices calling for regulatory frameworks for AI in medical education.
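The modified Poisson regression mentioned above is a Poisson model fitted with robust (sandwich) standard errors, whose exponentiated coefficients are interpreted as prevalence ratios. The study used Stata 17; purely as an illustration of the same idea, here is a minimal sketch in Python with statsmodels, where the file and variable names are hypothetical:

```python
# Illustrative sketch of a "modified Poisson" regression: a Poisson GLM with
# robust (sandwich) standard errors, whose exponentiated coefficients are
# adjusted prevalence ratios (aPR). The file and variable names below are
# hypothetical placeholders, not the study's actual coding.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("ai_use_survey.csv")   # one row per student; uses_ai coded 0/1

model = smf.glm(
    "uses_ai ~ C(age_group) + C(university) + C(year_of_study)",
    data=df,
    family=sm.families.Poisson(),
).fit(cov_type="HC1")                    # robust standard errors

print(np.exp(model.params))              # adjusted prevalence ratios
print(np.exp(model.conf_int()))          # 95% confidence intervals
```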
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Introduction: Artificial intelligence (AI) has tremendous potential for use in psychology. Among the many applications that may benefit from the development of AI tools is narrative personality assessment. Use of these tools and research methods is notably time-consuming and resource-intensive. AI has the potential to address these issues in ways that would greatly reduce clinician and researcher burden. Nonetheless, it is unclear whether current AI models are sufficiently sophisticated to perform complex downstream tasks such as narrative assessment.
Methodology: The purpose of this study is to explore whether an expert-refined prompt generation process can enable AI-empowered chatbots to reliably and accurately rate narratives using the Social Cognition and Object Relations Scales – Global Rating Method (SCORS-G). Experts generated prompt inputs by engaging in a detailed review of SCORS-G training materials. Prompts were then improved using a systematic process in which experts worked with Llama-2-70b to refine prompts. The utility of the prompts was then tested on two AI-empowered chatbots, ChatGPT-4 (OpenAI, 2023) and CLAUDE-2-100k, that were not used in the prompt refinement process.
Results: Results showed that the refined prompts allowed chatbots to reliably rate narratives at the global level, though accuracy varied across subscales. Averaging ratings from two chatbots notably improved reliability for the global score and all subscale scores. Experimentation indicated that expert-refined prompts outperformed basic prompts regarding interrater reliability and absolute agreement with gold-standard ratings. Only the expert-refined prompts were able to generate acceptable single-rater interrater reliability estimates.
Discussion: Findings suggest that AI could significantly reduce the time and resource burdens on clinicians and researchers using narrative rating systems like the SCORS-G. Limitations and implications for future research are discussed.
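As a hedged illustration of the reliability analysis described above (not the authors' actual pipeline; the input file and column names are hypothetical), pooling two chatbots' ratings and estimating an intraclass correlation could look like this in Python with pingouin:

```python
# Hypothetical sketch: pool two chatbots' SCORS-G global ratings and check
# interrater reliability with an intraclass correlation (ICC). The input file
# and column names are placeholders, not the study's data or pipeline.
import pandas as pd
import pingouin as pg

# Long format: one row per (narrative, rater) pair
ratings = pd.read_csv("scors_g_ratings.csv")  # columns: narrative_id, rater, score
# 'rater' takes values such as "chatgpt4", "claude2", "gold_standard"

icc = pg.intraclass_corr(
    data=ratings, targets="narrative_id", raters="rater", ratings="score"
)
print(icc[["Type", "ICC", "CI95%"]])

# Composite: mean of the two chatbots' ratings per narrative,
# compared against the gold-standard rating
wide = ratings.pivot(index="narrative_id", columns="rater", values="score")
wide["chatbot_mean"] = wide[["chatgpt4", "claude2"]].mean(axis=1)
print(wide[["chatbot_mean", "gold_standard"]].corr())
```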
Comparison of the coding benchmark average in the Artificial Analysis Intelligence Index (LiveCodeBench & SciCode) by model
Comparison of context window (token limit; higher is better) by model
Comparison of Artificial Analysis Intelligence Index vs. context window (tokens) by model
Comparison of latency (time to first token) vs. output speed (output tokens per second) by model
Comparison of Artificial Analysis Intelligence Index vs. end-to-end seconds to output 100 tokens by model
Comparison of the math benchmark average in the Artificial Analysis Intelligence Index (AIME 2024 & Math-500) by model
Comparison of the Artificial Analysis Intelligence Index (incorporating 7 evaluations spanning reasoning, knowledge, math & coding) by model
Comparison of Artificial Analysis Intelligence Index vs. price (USD per M tokens) by model