100+ datasets found
  1. AI medical chatbot

    • kaggle.com
    Updated Aug 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Yousef Saeedian
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Description:

    This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

    Key Features:

    • Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.
    • Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.
    • Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.
    • Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.

    Potential Use Cases:

    • NLP Model Training: Train models to understand and generate medical dialogues.
    • Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.
    • Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.
    • Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

    This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.

  2. Medical_cost_dataset

    • kaggle.com
    Updated Aug 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nandita Pore (2023). Medical_cost_dataset [Dataset]. https://www.kaggle.com/datasets/nanditapore/medical-cost-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 19, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Nandita Pore
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Description:

    Explore the intricacies of medical costs and healthcare expenses with our meticulously curated Medical Cost Dataset. This dataset offers valuable insights into the factors influencing medical charges, enabling researchers, analysts, and healthcare professionals to gain a deeper understanding of the dynamics within the healthcare industry.

    Columns: 1. ID: A unique identifier assigned to each individual record, facilitating efficient data management and analysis. 2. Age: The age of the patient, providing a crucial demographic factor that often correlates with medical expenses. 3. Sex: The gender of the patient, offering insights into potential cost variations based on biological differences. 4. BMI: The Body Mass Index (BMI) of the patient, indicating the relative weight status and its potential impact on healthcare costs. 5. Children: The number of children or dependents covered under the medical insurance, influencing family-related medical expenses. 6. Smoker: A binary indicator of whether the patient is a smoker or not, as smoking habits can significantly impact healthcare costs. 7. Region: The geographic region of the patient, helping to understand regional disparities in healthcare expenditure. 8. Charges: The medical charges incurred by the patient, serving as the target variable for analysis and predictions.

    Whether you're aiming to uncover patterns in medical billing, predict future healthcare costs, or explore the relationships between different variables and charges, our Medical Cost Dataset provides a robust foundation for your research. Researchers can utilize this dataset to develop data-driven models that enhance the efficiency of healthcare resource allocation, insurers can refine pricing strategies, and policymakers can make informed decisions to improve the overall healthcare system.

    Unlock the potential of healthcare data with our comprehensive Medical Cost Dataset. Gain insights, make informed decisions, and contribute to the advancement of healthcare economics and policy. Start your analysis today and pave the way for a healthier future.

  3. Health Care Analytics

    • kaggle.com
    Updated Jan 10, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abishek Sudarshan (2022). Health Care Analytics [Dataset]. https://www.kaggle.com/datasets/abisheksudarshan/health-care-analytics
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 10, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Abishek Sudarshan
    Description

    Context

    Part of Janatahack Hackathon in Analytics Vidhya

    Content

    The healthcare sector has long been an early adopter of and benefited greatly from technological advances. These days, machine learning plays a key role in many health-related realms, including the development of new medical procedures, the handling of patient data, health camps and records, and the treatment of chronic diseases.

    MedCamp organizes health camps in several cities with low work life balance. They reach out to working people and ask them to register for these health camps. For those who attend, MedCamp provides them facility to undergo health checks or increase awareness by visiting various stalls (depending on the format of camp).

    MedCamp has conducted 65 such events over a period of 4 years and they see a high drop off between “Registration” and number of people taking tests at the Camps. In last 4 years, they have stored data of ~110,000 registrations they have done.

    One of the huge costs in arranging these camps is the amount of inventory you need to carry. If you carry more than required inventory, you incur unnecessarily high costs. On the other hand, if you carry less than required inventory for conducting these medical checks, people end up having bad experience.

    The Process:

    MedCamp employees / volunteers reach out to people and drive registrations.
    During the camp, People who “ShowUp” either undergo the medical tests or visit stalls depending on the format of health camp.
    

    Other things to note:

    Since this is a completely voluntary activity for the working professionals, MedCamp usually has little profile information about these people.
    For a few camps, there was hardware failure, so some information about date and time of registration is lost.
    MedCamp runs 3 formats of these camps. The first and second format provides people with an instantaneous health score. The third format provides  
    information about several health issues through various awareness stalls.
    

    Favorable outcome:

    For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least a stall.
    You need to predict the chances (probability) of having a favourable outcome.
    

    Train / Test split:

    Camps started on or before 31st March 2006 are considered in Train
    Test data is for all camps conducted on or after 1st April 2006.
    

    Acknowledgements

    Credits to AV

    Inspiration

    To share with the data science community to jump start their journey in Healthcare Analytics

  4. f

    Medical dataset

    • figshare.com
    png
    Updated Jan 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jyotismita Chaki (2023). Medical dataset [Dataset]. http://doi.org/10.6084/m9.figshare.21894681.v1
    Explore at:
    pngAvailable download formats
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    figshare
    Authors
    Jyotismita Chaki
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data is related to the medical book

  5. o

    Public Health Portfolio dataset

    • nihr.opendatasoft.com
    • nihr.aws-ec2-eu-central-1.opendatasoft.com
    csv, excel, json
    Updated May 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Public Health Portfolio dataset [Dataset]. https://nihr.opendatasoft.com/explore/dataset/phof-datase/
    Explore at:
    excel, json, csvAvailable download formats
    Dataset updated
    May 29, 2025
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    The NIHR is one of the main funders of public health research in the UK. Public health research falls within the remit of a range of NIHR Research Programmes, NIHR Centres of Excellence and Facilities, plus the NIHR Academy. NIHR awards from all NIHR Research Programmes and the NIHR Academy that were funded between January 2006 and the present extraction date are eligible for inclusion in this dataset. An agreed inclusion/exclusion criteria is used to categorise awards as public health awards (see below). Following inclusion in the dataset, public health awards are second level coded to one of the four Public Health Outcomes Framework domains. These domains are: (1) wider determinants (2) health improvement (3) health protection (4) healthcare and premature mortality.More information on the Public Health Outcomes Framework domains can be found here.This dataset is updated quarterly to include new NIHR awards categorised as public health awards. Please note that for those Public Health Research Programme projects showing an Award Budget of £0.00, the project is undertaken by an on-call team for example, PHIRST, Public Health Review Team, or Knowledge Mobilisation Team, as part of an ongoing programme of work.Inclusion criteriaThe NIHR Public Health Overview project team worked with colleagues across NIHR public health research to define the inclusion criteria for NIHR public health research awards. NIHR awards are categorised as public health awards if they are determined to be ‘investigations of interventions in, or studies of, populations that are anticipated to have an effect on health or on health inequity at a population level.’ This definition of public health is intentionally broad to capture the wide range of NIHR public health awards across prevention, health improvement, health protection, and healthcare services (both within and outside of NHS settings). This dataset does not reflect the NIHR’s total investment in public health research. The intention is to showcase a subset of the wider NIHR public health portfolio. This dataset includes NIHR awards categorised as public health awards from NIHR Research Programmes and the NIHR Academy. This dataset does not currently include public health awards or projects funded by any of the three NIHR Research Schools or any of the NIHR Centres of Excellence and Facilities. Therefore, awards from the NIHR Schools for Public Health, Primary Care and Social Care, NIHR Public Health Policy Research Unit and the NIHR Health Protection Research Units do not feature in this curated portfolio.DisclaimersUsers of this dataset should acknowledge the broad definition of public health that has been used to develop the inclusion criteria for this dataset. This caveat applies to all data within the dataset irrespective of the funding NIHR Research Programme or NIHR Academy award.Please note that this dataset is currently subject to a limited data quality review. We are working to improve our data collection methodologies. Please also note that some awards may also appear in other NIHR curated datasets. Further informationFurther information on the individual awards shown in the dataset can be found on the NIHR’s Funding & Awards website here. Further information on individual NIHR Research Programme’s decision making processes for funding health and social care research can be found here.Further information on NIHR’s investment in public health research can be found as follows: NIHR School for Public Health here. NIHR Public Health Policy Research Unit here. NIHR Health Protection Research Units here. NIHR Public Health Research Programme Health Determinants Research Collaborations (HDRC) here. NIHR Public Health Research Programme Public Health Intervention Responsive Studies Teams (PHIRST) here.

  6. m

    AHD: Arabic Healthcare Dataset

    • data.mendeley.com
    Updated Sep 4, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hezam Gawbah (2024). AHD: Arabic Healthcare Dataset [Dataset]. http://doi.org/10.17632/mgj29ndgrk.6
    Explore at:
    Dataset updated
    Sep 4, 2024
    Authors
    Hezam Gawbah
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    • Numerous language-centric research on healthcare is conducted day by day. To address shortcomings of Arabic natural language generation models, we introduce a large Arabic Healthcare Dataset (AHD) of textual data. For this motivation, we named our dataset ‘AHD’.
    • The largest Arabic Healthcare Dataset (AHD) as we know was collected from altibbi website.

    • The AHD consists of more than 808k Question and Answer into 90 variety categories. The AHD contains one file, and the file description will be discussed here. One file is the actual data which is in Arabic language.

      • AHD.xlsx file contains dataset in excel format, which includes the question, answer, and category in Arabic.

      • AHD_english.xlsx file contains dataset in excel format, which includes the question, answer, and category translated to English.

    • Distribution of Question and Answer per category.xlsex shows the distribution of the data set by category.

  7. h

    Kaggle-Mental-Health-Survey-Data

    • huggingface.co
    Updated Jul 21, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    shanti flagg (2024). Kaggle-Mental-Health-Survey-Data [Dataset]. https://huggingface.co/datasets/sflagg/Kaggle-Mental-Health-Survey-Data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 21, 2024
    Authors
    shanti flagg
    Description

    sflagg/Kaggle-Mental-Health-Survey-Data dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. i

    COVID-19 Posteroanterior Chest X-Ray fused (CPCXR) dataset

    • ieee-dataport.org
    Updated Oct 27, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Narinder Singh Punn (2020). COVID-19 Posteroanterior Chest X-Ray fused (CPCXR) dataset [Dataset]. https://ieee-dataport.org/documents/covid-19-posteroanterior-chest-x-ray-fused-cpcxr-dataset
    Explore at:
    Dataset updated
    Oct 27, 2020
    Authors
    Narinder Singh Punn
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    and U.S. national library of medicine (USNLM) collected Montgomery country - NLM(MC) (https://lhncbc.nlm.nih.gov/publication/pub9931). These datasets were annotated by expert radiologists.

  9. Medical Text Dataset -Cancer Doc Classification

    • kaggle.com
    Updated Aug 6, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Falgunipatel19 (2022). Medical Text Dataset -Cancer Doc Classification [Dataset]. https://www.kaggle.com/datasets/falgunipatel19/biomedical-text-publication-classification
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 6, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Falgunipatel19
    Description

    For Biomedical text document classification, abstract and full papers(whose length less than or equal to 6 pages) available and used. This dataset focused on long research paper whose page size more than 6 pages. Dataset includes cancer documents to be classified into 3 categories like 'Thyroid_Cancer','Colon_Cancer','Lung_Cancer'. Total publications=7569. it has 3 class labels in dataset. number of samples in each categories: colon cancer=2579, lung cancer=2180, thyroid cancer=2810

  10. i

    MedCD: A Medical Clinical Dataset

    • ieee-dataport.org
    Updated Feb 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ye Chen (2025). MedCD: A Medical Clinical Dataset [Dataset]. https://ieee-dataport.org/documents/medcd-medical-clinical-dataset
    Explore at:
    Dataset updated
    Feb 10, 2025
    Authors
    Ye Chen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    namely MedCD

  11. h

    medical_asr_recording_dataset

    • huggingface.co
    Updated Oct 19, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hani. M (2023). medical_asr_recording_dataset [Dataset]. https://huggingface.co/datasets/Hani89/medical_asr_recording_dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 19, 2023
    Authors
    Hani. M
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Data Source Kaggle Medical Speech, Transcription, and Intent Context

    8.5 hours of audio utterances paired with text for common medical symptoms.

    Content

    This data contains thousands of audio utterances for common medical symptoms like “knee pain” or “headache,” totaling more than 8 hours in aggregate. Each utterance was created by individual human contributors based on a given symptom. These audio snippets can be used to train conversational agents in the medical field. This Figure Eight… See the full description on the dataset page: https://huggingface.co/datasets/Hani89/medical_asr_recording_dataset.

  12. Disease Prediction Using Machine Learning

    • dataandsons.com
    csv, zip
    Updated Oct 31, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    test test (2022). Disease Prediction Using Machine Learning [Dataset]. https://www.dataandsons.com/categories/machine-learning/disease-prediction-using-machine-learning
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Oct 31, 2022
    Dataset provided by
    Authors
    test test
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    About this Dataset

    This dataset will help you apply your existing knowledge to great use. This dataset has 132 parameters on which 42 different types of diseases can be predicted. This dataset consists of 2 CSV files. One of them is for training and the other is for testing your model. Each CSV file has 133 columns. 132 of these columns are symptoms that a person experiences and the last column is the prognosis. These symptoms are mapped to 42 diseases you can classify these sets of symptoms. You are required to train your model on training data and test it on testing data.

    Category

    Machine Learning

    Keywords

    medicine,disease,Healthcare,ML,Machine Learning

    Row Count

    4962

    Price

    $109.00

  13. a

    Medical Segmentation Decathlon Datasets

    • academictorrents.com
    bittorrent
    Updated Sep 20, 2018
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    None (2018). Medical Segmentation Decathlon Datasets [Dataset]. https://academictorrents.com/details/274be65156ed14828fb7b30b82407a2417e1924a
    Explore at:
    bittorrent(75906970628)Available download formats
    Dataset updated
    Sep 20, 2018
    Authors
    None
    License

    https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified

    Description

    With recent advances in machine learning, semantic segmentation algorithms are becoming increasingly general purpose and translatable to unseen tasks. Many key algorithmic advances in the field of medical imaging are commonly validated on a small number of tasks, limiting our understanding of the generalisability of the proposed contributions. A model which works out-of-the-box on many tasks, in the spirit of AutoML, would have a tremendous impact on healthcare. The field of medical imaging is also missing a fully open source and comprehensive benchmark for general purpose algorithmic validation and testing covering a large span of challenges, such as: small data, unbalanced labels, large-ranging object scales, multi-class labels, and multimodal imaging, etc. This challenge and dataset aims to provide such resource thorugh the open sourcing of large medical imaging datasets on several highly different tasks, and by standardising the analysis and validati

  14. B

    Dataset 4: Analysis Plan

    • borealisdata.ca
    Updated Mar 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Global Strategy Lab (2023). Dataset 4: Analysis Plan [Dataset]. http://doi.org/10.5683/SP2/GZP24S
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 16, 2023
    Dataset provided by
    Borealis
    Authors
    The Global Strategy Lab
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The analysis plan is provided to guide interested readers through the stages of our study. We outline the research methods, statistical tools, and data sources undertaken in our study. All decisions were solidified before analysis work begun.

  15. p

    Data from: EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge...

    • physionet.org
    Updated Jan 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Konstantin Kotschenreuther (2024). EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems [Dataset]. http://doi.org/10.13026/25fx-f706
    Explore at:
    Dataset updated
    Jan 11, 2024
    Authors
    Konstantin Kotschenreuther
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    This dataset was designed and created to enable advancements in healthcare-focused large language models, particularly in the context of retrieval-augmented clinical question-answering capabilities. Developed using a self-constructed pipeline based on the 13-billion parameter Meta Llama 2 model, this dataset encompasses 21466 medical discharge summaries extracted from the MIMIC-IV-Note dataset, with 156599 synthetically generated question-and-answer pairs, a subset of which was verified for accuracy by a physician. These pairs were generated by providing the model with a discharge summary and instructing it to generate question-and-answer pairs based on the contextual information present in the summaries. This work aims to generate data in support of the development of compact large language models capable of efficiently extracting information from medical notes and discharge summaries, thus enabling potential improvements for real-time decision-making processes in clinical settings. Additionally, accompanying the dataset is code facilitating question-and-answer pair generation from any medical and non-medical text. Despite the robustness of the presented dataset, it has certain limitations. The generation process was confined to a maximum context length of 6000 input tokens, owing to hardware constraints. The large language model's nature in generating these question-and-answer pairs may introduce an underlying bias or a lack in diversity and complexity. Future iterations should focus on rectifying these issues, possibly through diversified training and expanded verification procedures as well as the employment of more powerful large language models.

  16. h

    medical-o1-reasoning-SFT

    • huggingface.co
    Updated Apr 22, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FreedomAI (2025). medical-o1-reasoning-SFT [Dataset]. https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT
    Explore at:
    Dataset updated
    Apr 22, 2025
    Dataset authored and provided by
    FreedomAI
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    News

    [2025/04/22] We split the data and kept only the medical SFT dataset (medical_o1_sft.json). The file medical_o1_sft_mix.json contains a mix of medical and general instruction data. [2025/02/22] We released the distilled dataset from Deepseek-R1 based on medical verifiable problems. You can use it to initialize your models with the reasoning chain from Deepseek-R1. [2024/12/25] We open-sourced the medical reasoning dataset for SFT, built on medical verifiable problems and an LLM… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT.

  17. Gold Standard/Manual Reviewed Annotated Datasets for Technical Validation

    • figshare.com
    xlsx
    Updated Nov 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zoie SY Wong (2023). Gold Standard/Manual Reviewed Annotated Datasets for Technical Validation [Dataset]. http://doi.org/10.6084/m9.figshare.23504922.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Nov 13, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Zoie SY Wong
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This page shares the technical validation datasets used to evaluate a Large Dataset of Annotated Incident Reports on Medication Errors and its machine annotator. The files contain in this repository include the IFMIR gold standard dataset (CrossValid_IFMIR_522.xlsx), randomly sampled labeled incident reports from 2010 – 2020 (InternalValid_JQ2010-20_40.xlsx), randomly sampled labeled incident reports from 2021 (ExternalValid_JQ2021_20.xlsx) and Error-free reports (Error_analysis.xlsx).

    To use any of these datasets, one should also cite this original data source: Medical Adverse Event Information Collection Project [Iryō jiko jōhō shūshū-tō jigyō]  Japan Council for Quality Health Care; 2022 [Available from: https://www.med-safe.jp/index.html.]

  18. i

    Data from: Disease Prediction Dataset

    • ieee-dataport.org
    Updated Feb 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ayush Nautiyal (2025). Disease Prediction Dataset [Dataset]. https://ieee-dataport.org/documents/disease-prediction-dataset
    Explore at:
    Dataset updated
    Feb 20, 2025
    Authors
    Ayush Nautiyal
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains symptoms and disease information. It contains total of 1325 symptoms covered with 391 disease.This dataset is refernced from website MedLinePlus. This dataset have training and testing dataset and can be used to train disease prediction algorithm . It is created on own for project disease prediction and do not involves any funding or promotional terms.

  19. Data from: State Health Expenditure Dataset (SHED), 2000-2013

    • icpsr.umich.edu
    ascii, delimited, r +3
    Updated May 12, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Resnick, Beth; Bishai, David; Leider, Jonathan P.; Colrick, Ian (2017). State Health Expenditure Dataset (SHED), 2000-2013 [Dataset]. http://doi.org/10.3886/ICPSR36741.v1
    Explore at:
    delimited, spss, sas, stata, ascii, rAvailable download formats
    Dataset updated
    May 12, 2017
    Dataset provided by
    Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
    Authors
    Resnick, Beth; Bishai, David; Leider, Jonathan P.; Colrick, Ian
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/36741/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/36741/terms

    Time period covered
    2000 - 2013
    Area covered
    United States
    Description

    The State Health Expenditure Dataset was designed to better understand the impact of cost-effectiveness of public spending on public health. The collection includes approximately 1.9 million individual records, which were characterized into over 60,000 individual program categories. This data was provided by the US Census, and was collected from state budget offices across the country from 2000-2013. This dataset only encompasses state records that the Census had identified as functional code 32 (health - other) and code 27 (environmental health).

  20. h

    medical-question-answering-datasets

    • huggingface.co
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Malikeh Ehghaghi, medical-question-answering-datasets [Dataset]. https://huggingface.co/datasets/Malikeh1375/medical-question-answering-datasets
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Malikeh Ehghaghi
    Description

    Malikeh1375/medical-question-answering-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
Organization logo

AI medical chatbot

A Dataset for Understanding Medical Conversations and Enhancing Healthcare

Explore at:
48 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 15, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Yousef Saeedian
License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

Dataset Description:

This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

Key Features:

  • Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.
  • Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.
  • Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.
  • Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.

Potential Use Cases:

  • NLP Model Training: Train models to understand and generate medical dialogues.
  • Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.
  • Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.
  • Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.

Search
Clear search
Close search
Google apps
Main menu