37 datasets found
  1. ‘DISEASE PREDICTION USING MACHINE LEARNING WITH GUI’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘DISEASE PREDICTION USING MACHINE LEARNING WITH GUI’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-disease-prediction-using-machine-learning-with-gui-5ad4/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘DISEASE PREDICTION USING MACHINE LEARNING WITH GUI’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/neelima98/disease-prediction-using-machine-learning on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Due to big data progress in biomedical and healthcare communities, accurate study of medical data benefits early disease recognition, patient care and community services. When medical data are incomplete, the accuracy of the analysis is reduced. Moreover, different regions exhibit unique characteristics of certain regional diseases, which may weaken the prediction of disease outbreaks. This project proposes machine learning models (decision tree, Naive Bayes, and random forest) that use structured and unstructured hospital data. It also uses a machine learning algorithm to partition the data. To the best of our knowledge, no existing work has focused on both data types in medical big data analytics. Compared with several typical prediction algorithms, the prediction accuracy of our proposed algorithm reaches 94.8%, at a speed faster than that of the unimodal disease risk prediction algorithm, and it produces a report.

    --- Original source retains full ownership of the source dataset ---

  2. Disease Prediction Using Machine Learning

    • dataandsons.com
    csv, zip
    Updated Oct 31, 2022
    Cite
    test test (2022). Disease Prediction Using Machine Learning [Dataset]. https://www.dataandsons.com/categories/machine-learning/disease-prediction-using-machine-learning
    Available download formats: csv, zip
    Dataset updated
    Oct 31, 2022
    Dataset provided by
    Authors
    test test
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    About this Dataset

    This dataset will help you put your existing knowledge to good use. It has 132 parameters from which 42 different types of diseases can be predicted. The dataset consists of 2 CSV files: one for training and one for testing your model. Each CSV file has 133 columns. 132 of these columns are symptoms that a person experiences, and the last column is the prognosis. The symptoms are mapped to 42 diseases, so you can classify each set of symptoms as a disease. You are required to train your model on the training data and test it on the testing data.
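
    The layout described above (symptom columns followed by a final prognosis column) can be read with a few lines of Python. This is only a sketch: the column name `prognosis`, the 0/1 integer encoding of symptoms, and the symptom names in the sample are assumptions for illustration, not details confirmed by the listing.

```python
import csv
import io


def load_symptom_table(fileobj, target_column="prognosis"):
    """Read a symptom CSV and return (feature_names, X, y).

    Assumes every non-target column holds an integer flag (0 or 1)
    and that the target column is named `target_column`.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    target_idx = header.index(target_column)
    feature_names = [h for i, h in enumerate(header) if i != target_idx]
    X, y = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        X.append([int(v) for i, v in enumerate(row) if i != target_idx])
        y.append(row[target_idx])
    return feature_names, X, y


# Tiny in-memory stand-in for the training file
# (3 hypothetical symptoms instead of 132).
sample = io.StringIO(
    "itching,fever,cough,prognosis\n"
    "1,0,0,Fungal infection\n"
    "0,1,1,Common Cold\n"
)
names, X, y = load_symptom_table(sample)
print(len(names), "symptom columns;", len(y), "rows")
```

    The same function would be called once on the training file and once on the testing file, so that the model never sees the test rows during fitting.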

    Category

    Machine Learning

    Keywords

    medicine,disease,Healthcare,ML,Machine Learning

    Row Count

    4962

    Price

    $109.00

  3. Data from: Disease Prediction Dataset

    • ieee-dataport.org
    Updated Feb 20, 2025
    Cite
    Ayush Nautiyal (2025). Disease Prediction Dataset [Dataset]. https://ieee-dataport.org/documents/disease-prediction-dataset
    Dataset updated
    Feb 20, 2025
    Authors
    Ayush Nautiyal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains symptom and disease information. It covers a total of 1325 symptoms across 391 diseases. The dataset is referenced from the MedlinePlus website. It includes training and testing sets and can be used to train a disease prediction algorithm. It was created independently for a disease prediction project and does not involve any funding or promotional terms.

  4. 👨‍🦯 Parkinson's Disease Detection Dataset 👨‍⚕️

    • kaggle.com
    Updated Jul 10, 2023
    Cite
    Kancharla Naveen Kumar (2023). 👨‍🦯 Parkinson's Disease Detection Dataset 👨‍⚕️ [Dataset]. https://www.kaggle.com/datasets/naveenkumar20bps1137/parkinsons-disease-detection
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kancharla Naveen Kumar
    Description

    Parkinson's data set

    This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD.

    The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around six recordings per patient; the name of the patient is identified in the first column. For further information or to pass on comments, please contact Max Little (littlem '@' robots.ox.ac.uk).

    Further details are contained in the following reference -- if you use this dataset, please cite: Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2008), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering (to appear).

    This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.

    Citation:

    Little, Max. (2008). Parkinsons. UCI Machine Learning Repository. https://doi.org/10.24432/C59C74.

    Matrix column entries (attributes):

    • name - ASCII subject name and recording number
    • MDVP:Fo(Hz) - Average vocal fundamental frequency
    • MDVP:Fhi(Hz) - Maximum vocal fundamental frequency
    • MDVP:Flo(Hz) - Minimum vocal fundamental frequency

    Five measures of variation in frequency:
    • MDVP:Jitter(%) - Percentage of cycle-to-cycle variability of the period duration
    • MDVP:Jitter(Abs) - Absolute value of cycle-to-cycle variability of the period duration
    • MDVP:RAP - Relative measure of the pitch disturbance
    • MDVP:PPQ - Pitch perturbation quotient
    • Jitter:DDP - Average absolute difference of differences between jitter cycles

    Six measures of variation in amplitude:
    • MDVP:Shimmer - Variations in the voice amplitude
    • MDVP:Shimmer(dB) - Variations in the voice amplitude in dB
    • Shimmer:APQ3 - Three point amplitude perturbation quotient measured against the average of the three amplitude
    • Shimmer:APQ5 - Five point amplitude perturbation quotient measured against the average of the three amplitude
    • MDVP:APQ - Amplitude perturbation quotient from MDVP
    • Shimmer:DDA - Average absolute difference between the amplitudes of consecutive periods

    Two measures of ratio of noise to tonal components in the voice:
    • NHR - Noise-to-harmonics ratio
    • HNR - Harmonics-to-noise ratio

    • status - Health status of the subject: (one) - Parkinson's, (zero) - healthy

    Two nonlinear dynamical complexity measures:
    • RPDE - Recurrence period density entropy
    • D2 - Correlation dimension

    • DFA - Signal fractal scaling exponent

    Three nonlinear measures of fundamental frequency variation:
    • spread1 - discrete probability distribution of occurrence of relative semitone variations
    • spread2 - Three nonlinear measures of fundamental frequency variation
    • PPE - Entropy of the discrete probability distribution of occurrence of relative semitone variations
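
    As a rough illustration of how the "name" and "status" columns described above might be separated from the voice measures, here is a minimal sketch using only the standard library. The column names `name` and `status` come from the description; the sample rows, recording labels, and numeric values are made up for the example.

```python
import csv
import io


def split_by_status(fileobj):
    """Return (features, labels, subjects) from a Parkinson's voice CSV.

    `name` identifies the recording, `status` is the 0/1 label, and
    every remaining column is treated as a numeric voice measure.
    """
    features, labels, subjects = [], [], []
    for row in csv.DictReader(fileobj):
        subjects.append(row.pop("name"))
        labels.append(int(row.pop("status")))
        features.append({key: float(value) for key, value in row.items()})
    return features, labels, subjects


# Hypothetical two-row sample with only two of the voice measures.
sample = io.StringIO(
    "name,MDVP:Fo(Hz),status,HNR\n"
    "rec_a,120.5,1,21.0\n"
    "rec_b,197.1,0,26.8\n"
)
features, labels, subjects = split_by_status(sample)
print(sum(labels), "PD recordings out of", len(labels))
```

    Because there are several recordings per person, a real evaluation would probably keep all recordings of one subject in the same fold; that grouping step is not shown here.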

  5. Heart_Dataset

    • kaggle.com
    Updated Mar 29, 2021
    Cite
    Reddy_Nitin (2021). Heart_Dataset [Dataset]. https://www.kaggle.com/reddynitin/heart-dataset/metadata
    Dataset updated
    Mar 29, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Reddy_Nitin
    Description

    This notebook will introduce some foundational machine learning and data science concepts by exploring the problem of heart disease classification.

    The original data came from the Cleveland database from UCI Machine Learning Repository.

    The original database contains 76 attributes, but here only 14 attributes will be used. Attributes (also called features) are the variables that we'll use to predict our target

  6. Cardiovascular Disease Dataset

    • ieee-dataport.org
    Updated Oct 25, 2022
    Cite
    Rajib Kumar Halder Halder (2022). Cardiovascular Disease Dataset [Dataset]. https://ieee-dataport.org/documents/cardiovascular-disease-dataset
    Dataset updated
    Oct 25, 2022
    Authors
    Rajib Kumar Halder Halder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This heart disease dataset is curated by combining 3 popular heart disease datasets. The first dataset (collected from Kaggle) contains 70000 records with 11 independent features, which makes it the largest heart disease dataset available so far for research purposes. These data were collected at the moment of medical examination and from information given by the patient. The second and third datasets contain 303 and 293 instances respectively, with 13 common features. The three datasets used for its curation are: Cardio Data (Kaggle Dataset)

  7. Kidney Disease Dataset

    • kaggle.com
    Updated Apr 14, 2025
    Cite
    amanik (2025). Kidney Disease Dataset [Dataset]. https://www.kaggle.com/datasets/amanik000/kidney-disease-dataset
    Dataset updated
    Apr 14, 2025
    Dataset provided by
    Kaggle
    Authors
    amanik
    Description

    The Kidney Disease Dataset is a rich collection of clinical and laboratory data from patients, curated to support the analysis, diagnosis, and prediction of chronic kidney disease (CKD). It includes 43 diverse features encompassing demographic details, vital signs, urine and blood test results, medical history, lifestyle factors, and biomarkers such as eGFR, serum creatinine, and Cystatin C. This dataset is ideal for building machine learning models, conducting statistical analysis, and exploring correlations between health indicators and kidney function. It provides a valuable resource for researchers and healthcare professionals working on early detection and management of kidney-related disorders. This dataset consists of detailed clinical information related to kidney health, intended for machine learning applications, statistical analysis, and healthcare research.

  8. Heart Disease Prediction

    • kaggle.com
    Updated Aug 10, 2024
    Cite
    Falah Gatea (2024). Heart Disease Prediction [Dataset]. https://www.kaggle.com/datasets/falahgatea/heart-disease-prediction
    Dataset updated
    Aug 10, 2024
    Dataset provided by
    Kaggle
    Authors
    Falah Gatea
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    About Dataset

    Context: The leading cause of death in the developed world is heart disease. Therefore, work needs to be done to help reduce the risk of having a heart attack or stroke.

    Content: Use this dataset to predict which patients are most likely to suffer from a heart disease in the near future using the features given.

    Acknowledgement: This data comes from the University of California Irvine's Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Heart+Disease.

  9. Medical Data for Disease Prediction

    • kaggle.com
    Updated Feb 28, 2025
    Cite
    Mohamed Elshoraky (2025). Medical Data for Disease Prediction [Dataset]. https://www.kaggle.com/datasets/mhmdelshoraky/medical-data-for-disease-prediction/suggestions
    Dataset updated
    Feb 28, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohamed Elshoraky
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview: This dataset is a synthetic collection of medical attributes designed for educational and research purposes. It provides structured health-related data, including patient demographics, vital signs, and electrocardiogram (ECG) measurements, along with a predicted disease classification.

    The dataset is intended to support machine learning practitioners and students in developing classification models for disease prediction. It allows users to explore patterns in health-related data and apply machine learning techniques in a controlled, educational setting.

    Dataset Details
    • Total Records: 695,551 entries
    • Target Variable: Predicted_Disease (Categorical: ‘Arrhythmia’, ‘Heart Failure’, ‘Coronary Artery Disease’, ‘Good’)

    Features: Age, Gender, Weight, Height, Heart_Rate, Oxygen_Saturation, Temperature, ECG_QT_Interval, ECG_ST_Segment, Predicted_Disease

    This dataset was generated by a script with predefined parameter ranges and is not derived from real-world medical data. It should not be considered reliable for medical or clinical decision-making.

    It is intended for educational purposes only and should not be used in real-world healthcare applications. The accuracy of the generated values is not guaranteed.

    I'm not responsible for any incorrect use, misinterpretation, or unintended consequences of this dataset.

  10. Breast Cancer Prediction Dataset

    • kaggle.com
    Updated Sep 26, 2018
    Cite
    Merishna Singh Suwal (2018). Breast Cancer Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/merishnasuwal/breast-cancer-prediction-dataset
    Dataset updated
    Sep 26, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Merishna Singh Suwal
    Description

    Worldwide, breast cancer is the most common type of cancer in women and the second highest in terms of mortality rates. Diagnosis of breast cancer is performed when an abnormal lump is found (from self-examination or x-ray) or a tiny speck of calcium is seen (on an x-ray). After a suspicious lump is found, the doctor will conduct a diagnosis to determine whether it is cancerous and, if so, whether it has spread to other parts of the body.

    This breast cancer dataset was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg.

  11. Heart Disease Dataset (Comprehensive)

    • ieee-dataport.org
    Updated Oct 24, 2019
    Cite
    MANU SIDDHARTHA (2019). Heart Disease Dataset (Comprehensive) [Dataset]. https://ieee-dataport.org/open-access/heart-disease-dataset-comprehensive
    Dataset updated
    Oct 24, 2019
    Authors
    MANU SIDDHARTHA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This heart disease dataset is curated by combining 5 popular heart disease datasets already available independently but not combined before. In this dataset

  12. Cardiovascular_Disease_Dataset

    • data.mendeley.com
    Updated Apr 16, 2021
    Cite
    Bhanu Prakash Doppala (2021). Cardiovascular_Disease_Dataset [Dataset]. http://doi.org/10.17632/dzz48mvjht.1
    Dataset updated
    Apr 16, 2021
    Authors
    Bhanu Prakash Doppala
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This heart disease dataset was acquired from one of the multispecialty hospitals in India. It has over 14 common features, which makes it one of the heart disease datasets available so far for research purposes. This dataset consists of 1000 subjects with 12 features. It will be useful for building early-stage heart disease detection systems as well as for generating predictive machine learning models.

  13. ‘Heart Failure Prediction’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Heart Failure Prediction’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-heart-failure-prediction-c926/1b358936/?iid=010-637&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Heart Failure Prediction’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/andrewmvd/heart-failure-clinical-data on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused by CVDs, and this dataset contains 12 features that can be used to predict mortality by heart failure.

    Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol using population-wide strategies.

    People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia or already established disease) need early detection and management wherein a machine learning model can be of great help.

    How to use this dataset

    • Create a model for predicting mortality caused by Heart Failure.

    Acknowledgements

    If you use this dataset in your research, please credit the authors

    Citation

    Davide Chicco, Giuseppe Jurman: Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making 20, 16 (2020). (link)

    License

    CC BY 4.0

    Splash icon

    Icon by Freepik, available on Flaticon.

    Splash banner

    Wallpaper by jcomp, available on Freepik.

    --- Original source retains full ownership of the source dataset ---

  14. Health Care Analytics

    • kaggle.com
    Updated Jan 10, 2022
    Cite
    Abishek Sudarshan (2022). Health Care Analytics [Dataset]. https://www.kaggle.com/datasets/abisheksudarshan/health-care-analytics
    Dataset updated
    Jan 10, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abishek Sudarshan
    Description

    Context

    Part of Janatahack Hackathon in Analytics Vidhya

    Content

    The healthcare sector has long been an early adopter of and benefited greatly from technological advances. These days, machine learning plays a key role in many health-related realms, including the development of new medical procedures, the handling of patient data, health camps and records, and the treatment of chronic diseases.

    MedCamp organizes health camps in several cities with low work-life balance. They reach out to working people and ask them to register for these health camps. For those who attend, MedCamp provides facilities to undergo health checks or increases awareness through visits to various stalls (depending on the format of the camp).

    MedCamp has conducted 65 such events over a period of 4 years, and they see a high drop-off between “Registration” and the number of people taking tests at the camps. In the last 4 years, they have stored data on ~110,000 registrations.

    One of the largest costs in arranging these camps is the amount of inventory you need to carry. If you carry more inventory than required, you incur unnecessarily high costs. On the other hand, if you carry less inventory than required for conducting these medical checks, people end up having a bad experience.

    The Process:

    MedCamp employees / volunteers reach out to people and drive registrations.
    During the camp, People who “ShowUp” either undergo the medical tests or visit stalls depending on the format of health camp.
    

    Other things to note:

    Since this is a completely voluntary activity for the working professionals, MedCamp usually has little profile information about these people.
    For a few camps, there was hardware failure, so some information about date and time of registration is lost.
    MedCamp runs 3 formats of these camps. The first and second formats provide people with an instantaneous health score. The third format provides information about several health issues through various awareness stalls.
    

    Favorable outcome:

    For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least one stall.
    You need to predict the chances (probability) of having a favourable outcome.
    

    Train / Test split:

    Camps started on or before 31st March 2006 are considered in Train
    Test data is for all camps conducted on or after 1st April 2006.
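
    The date-based split above can be sketched as a simple cutoff comparison. This is an illustration only: it assumes each camp has a single start date available as a `datetime.date`, and the camp identifiers and dates below are hypothetical.

```python
from datetime import date

# Cutoff taken from the description: camps on or before this date go to train.
TRAIN_CUTOFF = date(2006, 3, 31)


def assign_split(camp_start):
    """Return "train" or "test" for a camp based on its start date."""
    return "train" if camp_start <= TRAIN_CUTOFF else "test"


# Hypothetical camps used only to exercise the boundary dates.
camps = {
    "camp_a": date(2005, 11, 2),
    "camp_b": date(2006, 3, 31),
    "camp_c": date(2006, 4, 1),
}
splits = {camp_id: assign_split(start) for camp_id, start in camps.items()}
print(splits)
```

    Splitting by time rather than at random keeps every test camp later than every training camp, which mirrors how predictions would be made for future camps.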
    

    Acknowledgements

    Credits to AV

    Inspiration

    To share with the data science community to jump start their journey in Healthcare Analytics

  15. ‘Dementia Prediction Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 14, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Dementia Prediction Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-dementia-prediction-dataset-8ab0/latest
    Dataset updated
    Aug 14, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Dementia Prediction Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/dementia-prediction-dataset on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Dementia is a syndrome – usually of a chronic or progressive nature – in which there is deterioration in cognitive function (i.e. the ability to process thought) beyond what might be expected from normal aging. It affects memory, thinking, orientation, comprehension, calculation, learning capacity, language, and judgment. Consciousness is not affected. The impairment in cognitive function is commonly accompanied, and occasionally preceded, by deterioration in emotional control, social behaviour, or motivation.

    Dementia results from a variety of diseases and injuries that primarily or secondarily affect the brain, such as Alzheimer's disease or stroke.

    Dementia is one of the major causes of disability and dependency among older people worldwide. It can be overwhelming, not only for the people who have it, but also for their carers and families. There is often a lack of awareness and understanding of dementia, resulting in stigmatization and barriers to diagnosis and care. The impact of dementia on carers, family, and society at large can be physical, psychological, social, and economic.

    Content

    This set consists of a longitudinal collection of 150 subjects aged 60 to 96. Each subject was scanned on two or more visits, separated by at least one year for a total of 373 imaging sessions. For each subject, 3 or 4 individual T1-weighted MRI scans obtained in single scan sessions are included. The subjects are all right-handed and include both men and women. 72 of the subjects were characterized as nondemented throughout the study. 64 of the included subjects were characterized as demented at the time of their initial visits and remained so for subsequent scans, including 51 individuals with mild to moderate Alzheimer’s disease. Another 14 subjects were characterized as nondemented at the time of their initial visit and were subsequently characterized as demented at a later visit

    Acknowledgements

    Battineni, Gopi; Amenta, Francesco; Chintalapudi, Nalini (2019), “Data for: MACHINE LEARNING IN MEDICINE: CLASSIFICATION AND PREDICTION OF DEMENTIA BY SUPPORT VECTOR MACHINES (SVM)”, Mendeley Data, V1, doi: 10.17632/tsy6rbc5d4.1 * Dataset is available here.

    --- Original source retains full ownership of the source dataset ---

  16. Medical Diagnosis Prediction Dataset

    • opendatabay.com
    Updated Jul 7, 2025
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Public Safety & Security
    Description

    This dataset is designed for preliminary diagnosis prediction, supporting patient flow logistics and the second opinion concept during patient interactions through dialogue systems. It is part of a project initiated at ITMO University in 2022. The dataset maps symptoms to diseases, offering a valuable resource for developing AI and LLM-based diagnostic tools. It comprises two main columns, detailing symptoms and their corresponding diagnoses, with 132 unique symptoms and 40 unique diagnoses identified.

    Columns

    • симптомы (symptoms as list): Contains information regarding various patient symptoms, often provided as a list.
    • диагноз (disease name): Specifies the corresponding disease name or diagnosis associated with the listed symptoms.

    Distribution

    The dataset is typically provided in a CSV format. It structures information across two columns: symptoms and disease names. While the exact total number of rows or records is not specified, the dataset includes 132 unique symptoms and 40 unique diagnoses. This is a Version 1.0 dataset.

    Usage

    This dataset is ideally suited for:
    • Developing and training preliminary diagnosis prediction models.
    • Enhancing patient flow logistics in healthcare settings.
    • Supporting second opinion concepts through automated systems.
    • Building and refining dialogue systems for patient interactions.
    • Training AI and machine learning models for symptom-disease mapping.

    Coverage

    The dataset's scope is global, indicating its potential applicability across different regions. The project that developed these datasets has been active since 2022, suggesting the data reflects contemporary medical terminology and contexts. The dataset was listed on 26/06/2025.

    License

    CC-BY-NC

    Who Can Use It

    • AI/LLM developers: For training and fine-tuning models in medical diagnostics and conversational AI.
    • Medical researchers: To analyse symptom-disease correlations and develop predictive tools.
    • Healthcare technology developers: For creating applications that assist with patient intake, preliminary diagnoses, and medical information systems.
    • Academic institutions: For educational and research purposes in health informatics and AI in medicine.

    Dataset Name Suggestions

    • Patient Symptom-Disease Mapping Data
    • Medical Diagnosis Prediction Dataset
    • Healthcare Dialogue System Training Data
    • Symptom-Disease NLP Data
    • Clinical Symptom-Diagnosis Dataset

    Attributes

    Original Data Source: Patient Disease Dataset

  17. Disease Symptom Classifier Dataset

    • opendatabay.com
    Updated Jul 2, 2025
    Cite
    Datasimple (2025). Disease Symptom Classifier Dataset [Dataset]. https://www.opendatabay.com/data/healthcare/1df74ad4-cc10-46c0-9cbb-309f5922d042
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Healthcare Insurance & Costs
    Description

    This dataset provides a curated collection of disease labels paired with natural language descriptions of symptoms. Its primary purpose is to facilitate the development of language models capable of accurately predicting potential diseases based on user-provided symptom descriptions. Such models hold significant potential for enabling early disease identification, allowing individuals to seek prompt medical attention and treatment. Furthermore, it supports the creation of applications for remote diagnosis and treatment recommendations, particularly useful in situations where in-person consultations may not be feasible or desirable.

    Columns

    The dataset consists of two main columns:

    • label: The specific disease label associated with each symptom description.
    • text: The natural language description of the symptoms experienced.

    Distribution

    The dataset is typically provided in a CSV file format. It comprises a total of 1200 datapoints. These datapoints are structured around 24 distinct diseases, with each disease having 50 corresponding symptom descriptions.
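    The stated distribution (24 diseases with 50 descriptions each, 1200 rows in total) can be checked after download. The sketch below is a minimal example, assuming a CSV with the `label` and `text` columns described above; the in-memory sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file):
    """Count rows per disease label in a CSV with 'label' and 'text' columns."""
    reader = csv.DictReader(csv_file)
    return Counter(row["label"] for row in reader)


# Tiny in-memory stand-in for the real file (made-up rows, illustration only).
sample = io.StringIO(
    "label,text\n"
    "Psoriasis,I have a red scaly rash on my elbows\n"
    "Psoriasis,My skin is peeling and itchy\n"
    "Migraine,I get a throbbing headache with nausea\n"
    "Migraine,Bright light makes my head pound\n"
)

counts = label_counts(sample)
is_balanced = len(set(counts.values())) == 1  # True when every label has the same count
print(counts, "balanced:", is_balanced)
```

    On the full file, passing an opened file handle instead of `sample` should give 24 labels with 50 rows each if the description above holds.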

    Usage

    This dataset is ideal for various applications and use cases, including:

    • Developing and training natural language processing (NLP) models for disease prediction.
    • Creating AI-powered tools for early identification of health conditions.
    • Building virtual assistants or telemedicine platforms that offer remote diagnostic support.
    • Researching classification algorithms in the medical and healthcare domain.
    • Analysing disease patterns and symptom correlations.

    Coverage

    The dataset's coverage is global, making it suitable for a wide range of applications without regional limitations. It specifically includes 24 different diseases: Psoriasis, Varicose Veins, Typhoid, Chicken pox, Impetigo, Dengue, Fungal infection, Common Cold, Pneumonia, Dimorphic Hemorrhoids, Arthritis, Acne, Bronchial Asthma, Hypertension, Migraine, Cervical spondylosis, Jaundice, Malaria, urinary tract infection, allergy, gastroesophageal reflux disease, drug reaction, peptic ulcer disease, and diabetes. Information on specific time ranges or demographic scopes is not available in the provided details.

    License

    CC0

    Who Can Use It

    This dataset is intended for a variety of users, including:

    • Data Scientists and Machine Learning Engineers: To build and refine models for medical diagnostics and NLP tasks.
    • Healthcare Technology Developers: To integrate symptom analysis capabilities into healthcare applications and platforms.
    • Researchers: To conduct studies on disease prediction, language understanding in a medical context, and the application of deep learning to health data.
    • Students: As a valuable resource for learning and practicing data science and AI skills within the healthcare domain.

    Dataset Name Suggestions

    • Symptom2Disease Dataset
    • Disease Symptom Classifier
    • Medical Symptom Description Data
    • Healthcare NLP Diagnostic Dataset

    Attributes

    Original Data Source: Symptom2Disease

  18. ‘In Hospital Mortality Prediction’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘In Hospital Mortality Prediction’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-in-hospital-mortality-prediction-41fd/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘In Hospital Mortality Prediction’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/saurabhshahane/in-hospital-mortality-prediction on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    The predictors of in-hospital mortality for heart failure (HF) patients admitted to the intensive care unit (ICU) remain poorly characterized. We aimed to develop and validate a prediction model for all-cause in-hospital mortality among ICU-admitted HF patients.

    Content

    Using Structured Query Language (SQL) queries (PostgreSQL, version 9.6), demographic characteristics, vital signs, and laboratory values were extracted from the following tables in the MIMIC-III dataset: ADMISSIONS, PATIENTS, ICUSTAYS, D_ICD_DIAGNOSES, DIAGNOSES_ICD, LABEVENTS, D_LABITEMS, CHARTEVENTS, D_ITEMS, NOTEEVENTS, and OUTPUTEVENTS. Based on previous studies [7-9, 13-15], clinical relevance, and general availability at the time of presentation, we extracted the following data: demographic characteristics (age at the time of hospital admission, sex, ethnicity, weight, and height); vital signs (heart rate [HR], systolic blood pressure [SBP], diastolic blood pressure [DBP], mean blood pressure, respiratory rate, body temperature, saturation pulse oxygen [SPO2], urine output [first 24 h]); comorbidities (hypertension, atrial fibrillation, ischemic heart disease, diabetes mellitus, depression, hypoferric anemia, hyperlipidemia, chronic kidney disease [CKD], and chronic obstructive pulmonary disease [COPD]); and laboratory variables (hematocrit, red blood cells, mean corpuscular hemoglobin [MCH], mean corpuscular hemoglobin concentration [MCHC], mean corpuscular volume [MCV], red blood cell distribution width [RDW], platelet count, white blood cells, neutrophils, basophils, lymphocytes, prothrombin time [PT], international normalized ratio [INR], NT-proBNP, creatine kinase, creatinine, blood urea nitrogen [BUN], glucose, potassium, sodium, calcium, chloride, magnesium, the anion gap, bicarbonate, lactate, hydrogen ion concentration [pH], partial pressure of CO2 in arterial blood, and left ventricular ejection fraction [LVEF]).

    Demographic characteristics and vital signs were recorded during the first 24 hours of each admission, and laboratory variables were measured during the entire ICU stay. Comorbidities were identified using ICD-9 codes. For variables with multiple measurements, the mean value was included in the analysis. The primary outcome of the study was in-hospital mortality, defined as the vital status at the time of hospital discharge in survivors and non-survivors.
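    The step of reducing repeated measurements to a single mean per variable can be sketched as follows. This is an illustration only, not the authors' extraction code: the `(admission_id, variable, value)` row layout and the variable names are hypothetical, and skipping missing readings is an assumption, since the description does not say how gaps were handled.

```python
from collections import defaultdict
from statistics import mean


def mean_per_admission(measurements):
    """Collapse repeated (admission_id, variable, value) rows to one mean per pair."""
    grouped = defaultdict(list)
    for admission_id, variable, value in measurements:
        if value is not None:  # assumption: missing readings are simply skipped
            grouped[(admission_id, variable)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical readings; identifiers and values are made up for illustration.
rows = [
    (1, "heart_rate", 80.0),
    (1, "heart_rate", 90.0),
    (1, "sbp", 120.0),
    (2, "heart_rate", 70.0),
    (2, "sbp", None),
]

features = mean_per_admission(rows)
print(features)
```

    A pair with no valid readings, such as `(2, "sbp")` here, is absent from the result and would need separate handling before modelling.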

    Acknowledgements

    Zhou, Jingmin et al. (2021), Prediction model of in-hospital mortality in intensive care unit patients with heart failure: machine learning-based, retrospective analysis of the MIMIC-III database, Dryad, Dataset, https://doi.org/10.5061/dryad.0p2ngf1zd

    LICENSE - CC0 1.0 Universal (CC0 1.0) Public Domain Dedication

    Target Variable - Outcome: 0 = Alive, 1 = Death

    --- Original source retains full ownership of the source dataset ---

  19. Heart Disease Risk Prediction Dataset

    • kaggle.com
    Updated Apr 3, 2025
    Şahide ŞEKER (2025). Heart Disease Risk Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/sahideseker/heart-disease-risk-prediction-dataset/discussion
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Şahide ŞEKER
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    🇬🇧 English:

    This synthetic dataset helps build machine learning models to predict whether a patient is at risk of heart disease. It includes patient attributes such as age, cholesterol, blood pressure, sex, and diabetes history.

    Use this dataset to:

    • Train classification models (e.g., XGBoost, Decision Tree)
    • Analyze the relationship between health metrics and heart disease
    • Practice healthcare-related ML without privacy concerns

    Features:

    • age: Age of the patient
    • cholesterol: Cholesterol level (mg/dL)
    • bp: Blood pressure (mmHg)
    • sex: Biological sex (Male/Female)
    • diabetes: Diabetes status (Yes/No)
    • heart_disease: Presence of heart disease (1 = Yes, 0 = No)
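
    Because `sex` and `diabetes` are stored as text, most classifiers need them converted to numbers first. The sketch below shows one possible encoding, assuming the column names and value spellings listed above (Male/Female, Yes/No); the example record is made up, and the 0/1 mapping is an arbitrary choice rather than anything prescribed by the dataset.

```python
def encode_row(row):
    """Turn one record (a dict of strings) into numeric features and a 0/1 target."""
    features = [
        float(row["age"]),
        float(row["cholesterol"]),               # mg/dL
        float(row["bp"]),                        # mmHg
        1.0 if row["sex"] == "Male" else 0.0,    # assumed mapping: Male = 1, Female = 0
        1.0 if row["diabetes"] == "Yes" else 0.0,
    ]
    target = int(row["heart_disease"])           # 1 = Yes, 0 = No
    return features, target


# Hypothetical record for illustration; not taken from the dataset.
example = {
    "age": "54",
    "cholesterol": "230",
    "bp": "140",
    "sex": "Female",
    "diabetes": "Yes",
    "heart_disease": "1",
}

x, y = encode_row(example)
print(x, y)
```

    The resulting feature lists can be passed to any classifier that accepts numeric input, such as the decision tree or XGBoost models mentioned above.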

    🇹🇷 Turkish (translated):

    This synthetic dataset is designed for developing machine learning models that predict whether patients are at risk of heart disease. It includes features such as age, cholesterol, blood pressure, sex, and diabetes status.

    With this dataset:

    • Classification models such as XGBoost and Decision Tree can be trained
    • Risk analysis can be performed on health data
    • Health-focused projects can be developed without privacy concerns

  20. Data set presentation.

    • plos.figshare.com
    xls
    Updated Sep 30, 2024
    Wenguang Li; Yan Peng; Ke Peng (2024). Data set presentation. [Dataset]. http://doi.org/10.1371/journal.pone.0311222.t001
    Available download formats: xls
    Dataset updated
    Sep 30, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Wenguang Li; Yan Peng; Ke Peng
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Diabetes, as an incurable lifelong chronic disease, has profound and far-reaching effects on patients. Given this, early intervention is particularly crucial, as it can not only significantly improve the prognosis of patients but also provide valuable reference information for clinical treatment. This study selected the BRFSS (Behavioral Risk Factor Surveillance System) dataset, which is publicly available on the Kaggle platform, as the research object, aiming to provide a scientific basis for the early diagnosis and treatment of diabetes through advanced machine learning techniques. Firstly, the dataset was balanced using various sampling methods; secondly, a Stacking model based on GA-XGBoost (XGBoost model optimized by genetic algorithm) was constructed for the risk prediction of diabetes; finally, the interpretability of the model was deeply analyzed using Shapley values.

    The results show:

    (1) Random oversampling, ADASYN, SMOTE, and SMOTEENN were used for data balance processing, among which SMOTEENN showed better efficiency and effect in dealing with data imbalance.
    (2) The GA-XGBoost model optimized the hyperparameters of the XGBoost model through a genetic algorithm to improve the model’s predictive accuracy. Combined with the better-performing LightGBM model and random forest model, a two-layer Stacking model was constructed. This model not only outperforms single machine learning models in predictive effect but also provides a new idea and method in the field of model integration.
    (3) Shapley value analysis identified features that have a significant impact on the prediction of diabetes, such as age and body mass index. This analysis not only enhances the transparency of the model but also provides more precise treatment decision support for doctors and patients.

    In summary, this study has not only improved the accuracy of predicting the risk of diabetes by adopting advanced machine learning techniques and model integration strategies but also provided a powerful tool for the early diagnosis and personalized treatment of diabetes.
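
    Of the balancing methods named above, random oversampling is the simplest to illustrate. The sketch below is a plain-Python version, not the study's code and not SMOTEENN (the method the study found most effective): it duplicates randomly chosen minority-class rows until every class matches the largest one. The function name, seed, and toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes match the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # Pool of original rows belonging to this class.
        pool = [row for row, row_label in zip(rows, labels) if row_label == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy imbalanced data (made up): four negatives, one positive.
rows = [[25.0], [31.0], [47.0], [52.0], [63.0]]
labels = [0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

    Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.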


