MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
MrMaxMind99/Medical-QA-RS dataset hosted on Hugging Face and contributed by the HF Datasets community
The all-processed dataset is a concatenation of the medical-meadow-* and chatdoctor_healthcaremagic datasets. The term "Chat Doctor" is replaced by "chatbot" in the chatdoctor_healthcaremagic dataset. Following the literature, the medical_meadow_cord19 dataset is subsampled to 50,000 samples. truthful-qa-* is a benchmark dataset for evaluating the truthfulness of models in text generation, and it is used in the Llama 2 paper. Within this dataset, there are 55 and 16 questions related to Health and… See the full description on the dataset page: https://huggingface.co/datasets/lavita/medical-qa-datasets.
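The preprocessing described above (term replacement, subsampling, concatenation) can be sketched roughly as follows. This is a minimal illustration and not the dataset authors' code: the function names, the representation of records as dictionaries of strings, and the fixed random seed are all assumptions made for the example.

```python
import random


def replace_term(text, old="Chat Doctor", new="chatbot"):
    """Replace every occurrence of the assistant name with a neutral term."""
    return text.replace(old, new)


def subsample(records, k, seed=0):
    """Return at most k records, drawn uniformly without replacement."""
    if len(records) <= k:
        return list(records)
    return random.Random(seed).sample(records, k)


def build_all_processed(meadow_sets, healthcaremagic, cord19, cord19_cap=50_000):
    """Concatenate the sources after the two preprocessing steps.

    Hypothetical layout: every dataset is a list of dicts with string values.
    """
    combined = []
    for dataset in meadow_sets:
        combined.extend(dataset)
    # Cap the CORD-19 portion (50,000 samples in the description).
    combined.extend(subsample(cord19, cord19_cap))
    # Swap the assistant name in every field of the HealthCareMagic rows.
    combined.extend(
        {key: replace_term(value) for key, value in row.items()}
        for row in healthcaremagic
    )
    return combined
```

Fixing the seed keeps the subsample reproducible between runs; the real pipeline may have used a different sampling strategy.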
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
This dataset was designed and created to enable advancements in healthcare-focused large language models, particularly in the context of retrieval-augmented clinical question-answering capabilities. Developed using a self-constructed pipeline based on the 13-billion parameter Meta Llama 2 model, this dataset encompasses 21,466 medical discharge summaries extracted from the MIMIC-IV-Note dataset, with 156,599 synthetically generated question-and-answer pairs, a subset of which was verified for accuracy by a physician. These pairs were generated by providing the model with a discharge summary and instructing it to generate question-and-answer pairs based on the contextual information present in the summary. This work aims to generate data in support of the development of compact large language models capable of efficiently extracting information from medical notes and discharge summaries, thus enabling potential improvements for real-time decision-making processes in clinical settings. Additionally, the dataset is accompanied by code that facilitates question-and-answer pair generation from any medical or non-medical text. Despite the robustness of the presented dataset, it has certain limitations. The generation process was confined to a maximum context length of 6,000 input tokens, owing to hardware constraints. Because the pairs were generated by a large language model, they may carry an underlying bias or lack diversity and complexity. Future iterations should focus on rectifying these issues, possibly through diversified training and expanded verification procedures as well as the employment of more powerful large language models.
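One way to respect an input limit like the 6,000-token cap mentioned above is to trim each discharge summary before it is placed in the prompt. The sketch below is only illustrative: it counts whitespace-separated words as a stand-in for the model tokenizer, the prompt wording and function names are hypothetical, and the description does not say whether over-long summaries were truncated or skipped.

```python
MAX_INPUT_TOKENS = 6000  # limit stated in the dataset description


def truncate_to_budget(text, max_tokens=MAX_INPUT_TOKENS):
    """Keep the first max_tokens whitespace-separated tokens of text."""
    tokens = text.split()
    return " ".join(tokens[:max(max_tokens, 0)])


def build_prompt(summary, n_pairs=5, max_tokens=MAX_INPUT_TOKENS):
    """Build a generation prompt whose total length fits the token budget."""
    instruction = (
        f"Generate {n_pairs} question-and-answer pairs based only on "
        "the discharge summary below."
    )
    # Reserve room for the instruction, then give the rest to the summary.
    budget = max_tokens - len(instruction.split())
    return instruction + "\n\n" + truncate_to_budget(summary, budget)
```

A real pipeline would measure length with the same tokenizer the model uses, since word counts and subword token counts can differ substantially.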
Dataset Card for "medical-qa-shared-task-v1-toy-eval"
More Information needed
The current dataset is a reformatted subset of a larger source dataset. It contains only the US questions and answers, split into dev, train, and test sets in separate JSON files, a format that should be easier to use. This notebook captures how the conversion was done.
The source dataset was pulled from Papers with Code, which in turn took it from "A Large-scale Open Domain Question Answering Dataset from Medical Exams".
The dataset is collected from the professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively.
This dataset is released under the MIT License.
MIT License (as given on GitHub)
Copyright (c) 2022 Di Jin
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The researcher tests the QA capability of ChatGPT in the medical field from the following aspects:
1. Test its reserve of medical knowledge
2. Check its ability to read and understand medical literature
3. Test its ability to assist diagnosis after reading case data
4. Test its ability to correct errors in case data
5. Test its ability to standardize medical terms
6. Test its ability to evaluate experts
7. Check its ability to evaluate medical institutions
The conclusions are:
ChatGPT has great potential in medical and health care applications, and may directly replace human beings or even professionals at a certain level in some fields;
The researcher preliminarily believes that ChatGPT has basic medical knowledge and the ability to hold multi-round dialogue, and its ability to understand Chinese is not weak;
ChatGPT has the ability to read, understand, and correct cases;
ChatGPT has the ability to extract information and standardize terminology, and performs these tasks very well;
ChatGPT has the ability to reason over medical knowledge;
ChatGPT has the ability to learn continuously. After continuous training, its level improved significantly;
ChatGPT does not have the ability to evaluate Chinese medical talents academically, and the results are not ideal;
ChatGPT does not have the ability to evaluate Chinese medical institutions academically, and the results are not ideal;
ChatGPT is an epoch-making product, which can become a useful assistant for medical diagnosis and treatment, knowledge service, literature reading, review, and paper writing.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
A large non-factoid English consumer Question Answering (QA) dataset containing 51,000 pairs of consumer questions and their corresponding expert answers. This dataset is useful for benchmarking or training systems on more difficult real-world questions and responses, which may contain spelling or formatting errors, or lexical gaps between consumer and expert vocabularies.
By downloading this dataset, you agree to have obtained ethics approval from your institution. Lineage: We collected data from posts and comments to subreddit /r/askdocs, published between July 10, 2013, and April 2, 2022, totalling 600,000 submissions (original posts) and 1,700,000 comments (replies). We generated question-answer pairs by taking the highest scoring answer from a verified medical expert to a Reddit question. Questions with only images are removed, all links are removed and authors are removed.
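The pairing rule in the lineage above (keep the highest-scoring answer from a verified medical expert for each question) can be sketched as below. This is a hedged illustration rather than the authors' pipeline: the input field names (`id`, `post_id`, `score`, `verified`, `title`, `body`) are hypothetical, and treating an empty body as an image-only post is an assumption made for the example.

```python
def build_qa_pairs(submissions, comments):
    """Pair each question with its top-scoring comment from a verified expert."""
    best = {}
    for comment in comments:
        if not comment["verified"]:
            continue  # only verified medical experts may supply the answer
        current = best.get(comment["post_id"])
        if current is None or comment["score"] > current["score"]:
            best[comment["post_id"]] = comment

    pairs = []
    for post in submissions:
        answer = best.get(post["id"])
        # Assumption: posts with no text body stand in for image-only posts.
        if answer is not None and post["body"].strip():
            pairs.append(
                {
                    "Title": post["title"],
                    "Body": post["body"],
                    "Response": answer["body"],
                }
            )
    return pairs
```

Posts with no verified reply are dropped entirely, which is one plausible reason the number of pairs is far smaller than the number of submissions.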
We provide two separate datasets in this collection, with the following schemas.
MedRedQA - Reddit Medical Question and Answer pairs from /r/askdocs. CSV format.
i. The poster's question (Body)
ii. Title of the post
iii. The filtered answer from a verified physician comment (Response)
iv. Occupation indicated for verification status
v. Any PMCIDs found in the post
MedRedQA+PubMed - PubMed Enriched subset of MedRedQA. JSON format.
i. Question: The user's original question. This is equivalent to the Body field in MedRedQA.
ii. Document: The abstract of the PubMed document (if it exists and contains an abstract) for that particular post. Note: this does not necessarily mean the answer references this document, only that at least one verified physician in the responses has mentioned it.
iii. The filtered response. This is equivalent to the Response field in MedRedQA.
Faithality/merged-medical-qa dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Dataset Card for Dataset Name
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
Curated by: [More Information Needed] Funded by [optional]: [More Information Needed] Shared by [optional]: [More Information Needed] Language(s) (NLP): [More Information Needed] License: [More Information Needed]
Dataset Sources [optional]
Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/TUDB-Labs/medical-qa.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
# Patient Doctor Q&A TR 321179 Dataset
The Patient Doctor Q&A TR 321179 dataset is a combined and shuffled version of the [**Patient Doctor Q&A TR 19583**](https://www.kaggle.com/datasets/kaayra2000/patient-doctor-qa-dataset-tr), [**Patient Doctor Q&A TR 167732**](https://www.kaggle.com/datasets/kaayra2000/patient-doctor-q-and-a-tr-167732), [**Patient Doctor Q&A TR 5695**](https://www.kaggle.com/datasets/kaayra2000/patient-doctor-q-and-a-translated-from-id-to-tr), and [**Patient Doctor Q&A TR 95588**](https://www.kaggle.com/datasets/kaayra2000/patient-doctor-q-and-a-tr-95588) datasets.
## Main Features:
* Content: Patient questions and doctor answers covering various medical topics.
* Structure: Contains 2 columns: Question, Answer.
* Language: Turkish.
## Potential Uses:
* Medical research
* Natural Language Processing (NLP)
* Medical education
## Limitations:
* Data privacy concerns
* Variability in answer quality
* Potential biases
## General Assessment:
The Patient Doctor Q&A TR 321179 dataset is a valuable resource for understanding real-world medical communication and information exchange. This dataset, translated into Turkish, is an important resource for medical research and education, and can be used to analyze communication between patients and doctors. However, limitations such as data privacy and variability in answer quality should be considered.
This dataset offers researchers and educators the opportunity to conduct more in-depth analyses and apply natural language processing techniques using Turkish medical communication data.
Juliabot/medical-qa-formatted dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset contains a total of 364,420 medical QA items, some of which phrase the same question in several different ways. We extract medical QA pairs from plain texts (e.g., medical encyclopedias and medical articles). We collected 8,699 encyclopedia entries for diseases and 2,736 encyclopedia entries for medicines on Chinese Wikipedia. Moreover, we crawled 226,432 high-quality medical articles from the Qianwen Health website.
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Existing question answering (QA) datasets derived from electronic health records (EHR) are artificially generated and consequently fail to capture realistic physician information needs. We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question. The questions are generated by medical experts from 100+ MIMIC-III, version 1.4, discharge summaries. These discharge summaries overlap with the n2c2 challenge, so they are filled in with surrogate PHI. We analyze this dataset to characterize the types of information sought by medical experts. We also train baseline models for trigger detection and question generation (QG), paired with unsupervised answer retrieval over EHRs. Our baseline model is able to generate high quality questions in over 62% of cases when prompted with human selected triggers. We release this dataset (and a link to all code to reproduce baseline model results) to facilitate further research into realistic clinical QA and QG.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
bala1524/Medical-QA-Mistral7B-Finetuning dataset hosted on Hugging Face and contributed by the HF Datasets community
Starlord1010/Medical-QA-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset presents the number of visitors to public health centers in Qatar. It is categorized by the name of the health center and helps evaluate patient load, service demand, and regional distribution of healthcare access across the country.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains trade data on optical, medical, and precision instruments, including imports and re-exports. It supports analysis of Qatar’s scientific and medical equipment market.
https://www.datainsightsmarket.com/privacy-policy
Market Overview
According to market research, the global Daily QA Check Device market is projected to reach a significant X million value by 2033, expanding at a robust CAGR of X%. This expansion is attributed to various driving factors, including the increasing demand for quality assurance in healthcare, food production, and education. The growing adoption of artificial intelligence (AI) and IoT technologies in these sectors has also contributed to the market's growth.
Market Segmentation
The Daily QA Check Device market can be segmented based on application, type, and region. By application, the market caters to educational institutions, the food production industry, medical institutions, and others. Medical institutions dominate the market due to the stringent regulations for maintaining the accuracy and reliability of medical equipment. By type, the market is classified into basic, intelligent, and professional types. Professional-type devices offer advanced features and automation, leading to their popularity in hospitals and research labs. Regionally, North America holds the largest market share, followed by Europe and Asia Pacific. The Asia Pacific region is expected to witness substantial growth due to the rising demand for quality assurance in emerging economies. Key players in the industry include Guangzhou Raydose Software Technology LLC, Sichuan Jingwei Food Testing Technology, and Shenzhen Ruikang'an Technology Development.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset is a comprehensive compilation of questions related to dermatology, spanning inquiries about various skin diseases, their symptoms, recommended medications, and available treatment modalities. Each question is paired with a concise and informative response, making it an ideal resource for training and fine-tuning language models in the field of dermatological healthcare. The dataset is designed to facilitate the development of advanced medical chatbots and language models tailored to dermatology, providing valuable insights into skin health-related inquiries.
Explore the work here: https://github.com/Mreeb/llama2-Fine-tuning-On-Custom-Medical_data/tree/master
This report examines the number, percentage, and characteristics of specialty mental health treatment facilities in the United States that use three quality assurance practices related to the behavioral health workforce as part of their standard operating procedures.