Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data contain 63 UN policy documents and press releases from the Digital Library, covering 2017 through March 2024, and over 700 distinct codes for Thematic Analysis generated by the custom GPT model developed for the AI & Geopolitics Project (AIxGEO). More information can be found in the ReadMe file.
This entry discusses the potential of ChatGPT and AI to revolutionize the way we interact with computers, specifically in the field of medical diagnostics. ChatGPT can make conversations between doctors and patients more natural, while AI can analyze vast amounts of patient data to identify trends and estimate a patient's health. Patients can use ChatGPT to better understand their medical conditions, and both ChatGPT and AI can be used to automate tasks such as scheduling appointments and processing test results. However, there are limitations to using AI, including data bias, hard-to-interpret results, and analysis errors. To reduce errors, it is important to validate findings using multiple techniques and to ensure that data are accurate and up to date. ChatGPT also employs security measures to protect patient data privacy and confidentiality.
Comparison of Tokens used to run all evaluations in the Artificial Analysis Intelligence Index by Model
Comprehensive comparison of Artificial Analysis Intelligence Index vs. Seconds to Output 500 Tokens, including reasoning model 'thinking' time by Model
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We propose an instruction-based process for trustworthy data curation in materials science (MatSci-Instruct), which we then apply to finetune a LLaMA-based language model targeted for materials science (HoneyBee). MatSci-Instruct helps alleviate the scarcity of relevant, high-quality materials science textual data available in the open literature, and HoneyBee is the first billion-parameter language model specialized to materials science. In MatSci-Instruct we improve the trustworthiness of generated data by prompting multiple commercially available large language models for generation with an Instructor module (e.g. ChatGPT) and verification from an independent Verifier module (e.g. Claude). Using MatSci-Instruct, we construct a dataset of multiple tasks and measure the quality of our dataset along multiple dimensions, including accuracy against known facts, relevance to materials science, as well as completeness and reasonableness of the data. Moreover, we iteratively generate more targeted instructions and instruction-data in a finetuning-evaluation-feedback loop, leading to progressively better performance for our finetuned HoneyBee models. Our evaluation on the MatSci-NLP benchmark shows that HoneyBee outperforms existing language models on materials science tasks and improves iteratively across successive stages of instruction-data refinement. We study the quality of HoneyBee's language modeling through automatic evaluation and analyze case studies to further understand the model's capabilities and limitations. Our code and relevant datasets are publicly available at https://github.com/BangLab-UdeM-Mila/NLP4MatSci-HoneyBee.
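The generate-then-verify curation loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `instructor_generate` and `verifier_score` functions below are hypothetical stand-ins for the paper's Instructor and Verifier LLM modules (e.g. ChatGPT and Claude), and the quality dimensions and threshold are assumptions for the sketch.

```python
# Sketch of a MatSci-Instruct-style generate-then-verify curation loop.
# instructor_generate and verifier_score are stand-ins for real LLM calls.

def instructor_generate(topic):
    """Stand-in for the Instructor LLM: returns candidate instruction-data."""
    return [
        {"instruction": f"Define the term '{topic}'.",
         "response": f"{topic} is a materials science concept ..."},
        {"instruction": f"List applications of {topic}.",
         "response": "..."},  # deliberately incomplete, to be filtered out
    ]

def verifier_score(sample):
    """Stand-in for the independent Verifier LLM: scores a sample on the
    paper's quality dimensions, each on a 0-1 scale. A real Verifier would
    prompt a second LLM; here we use a fixed heuristic for illustration."""
    return {
        "accuracy": 0.9,
        "relevance": 0.8,
        "completeness": 0.2 if sample["response"] == "..." else 0.7,
        "reasonableness": 0.9,
    }

def curate(topics, threshold=0.6):
    """Keep only instruction-data whose lowest quality score clears the bar."""
    kept = []
    for topic in topics:
        for sample in instructor_generate(topic):
            scores = verifier_score(sample)
            if min(scores.values()) >= threshold:
                kept.append(sample)
    return kept

dataset = curate(["band gap", "perovskite"])
```

In the paper this filtered dataset then feeds the finetuning-evaluation-feedback loop, with evaluation results used to target the next round of instruction generation.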
Comprehensive comparison of Artificial Analysis Intelligence Index vs. Context Window (Tokens) by Model
Comprehensive comparison of Artificial Analysis Intelligence Index vs. Output Speed (Output Tokens per Second) by Model
Comprehensive comparison of Artificial Analysis Intelligence Index vs. Output Tokens Used in Artificial Analysis Intelligence Index (Log Scale) by Model
Comparison of the average of coding benchmarks in the Artificial Analysis Intelligence Index (LiveCodeBench & SciCode) by Model
Comparison of the average of math benchmarks in the Artificial Analysis Intelligence Index (AIME 2024 & MATH-500) by Model
Comprehensive comparison of Artificial Analysis Intelligence Index vs. Price (USD per M Tokens, Log Scale, More Expensive to Cheaper) by Model
Comprehensive comparison of Latency (Time to First Token) vs. Output Speed (Output Tokens per Second) by Model
Comparison of Cost (USD) to run all evaluations in the Artificial Analysis Intelligence Index by Model
Comparison of Image Input Price: USD per 1k images at 1MP (1024x1024) by Model
Comprehensive comparison of Output Speed (Output Tokens per Second) vs. Price (USD per M Tokens) by Model