24 datasets found

Disease and symptoms dataset 2023
data.mendeley.com
Updated Mar 3, 2025
Cite
Bran Stark (2025). Disease and symptoms dataset 2023 [Dataset]. http://doi.org/10.17632/2cxccsxydc.1
Unique identifier
https://doi.org/10.17632/2cxccsxydc.1
Dataset updated
Mar 3, 2025
Authors
Bran Stark
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains disease names along with the symptoms experienced by the respective patient. In total there are 773 unique diseases and 377 symptoms across ~246,000 rows. The dataset was artificially generated while preserving Symptom Severity and Disease Occurrence Possibility. Several distinct groups of symptoms may all indicate the same disease, and a row (sample) may contain only a single symptom contributing to a disease, which indicates a very high correlation between that symptom and the disease. A larger number of rows for a particular disease corresponds to a higher probability of that disease occurring in the real world. Similarly, if a row's feature vector contains only a single symptom, that symptom is more strongly correlated with the disease than any individual symptom in another sample whose feature vector contains multiple symptoms.
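
The description implies a simple tabular layout. Each row pairs one disease label with a binary (multi-hot) vector marking which symptoms are present, and per-disease row counts stand in for how common each disease is. The minimal sketch below shows that encoding. The disease and symptom names are hypothetical, since the real column names are not given here.

```python
from collections import Counter

# Hypothetical toy rows: (disease label, set of symptoms present in that row).
rows = [
    ("disease_a", {"fever", "cough"}),
    ("disease_a", {"fever"}),
    ("disease_a", {"cough", "fatigue"}),
    ("disease_b", {"rash"}),  # a single-symptom row, as described above
]

# Fixed, sorted symptom vocabulary so every row maps to the same column order.
symptoms = sorted(set().union(*(present for _, present in rows)))

def encode(present):
    """Return a 0/1 feature vector with a 1 for each symptom present in the row."""
    return [1 if name in present else 0 for name in symptoms]

X = [encode(present) for _, present in rows]
y = [disease for disease, _ in rows]

# Row frequency per disease, used as an estimate of its occurrence probability.
counts = Counter(y)
priors = {disease: n / len(y) for disease, n in counts.items()}
```

Because a disease's share of rows is meant to track its real-world frequency, `priors` could serve as a class prior, for example in a naive Bayes model.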
Diseases_Dataset
huggingface.co
Updated Jun 1, 2025
Cite
Kamruzzaman Asif (2025). Diseases_Dataset [Dataset]. https://huggingface.co/datasets/kamruzzaman-asif/Diseases_Dataset
Dataset updated
Jun 1, 2025
Authors
Kamruzzaman Asif
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
🩺 Diseases Dataset

A consolidated medical dataset combining disease names, symptoms, and treatments collected from multiple public datasets across Hugging Face and Kaggle. This dataset can be used for building disease prediction, symptom clustering, and medical assistant models.

📦 Dataset Summary

Disease (string): Name of the disease or condition

Symptoms (string): List of symptoms or description of symptoms

Treatments (string): (Optional)… See the full description on the dataset page: https://huggingface.co/datasets/kamruzzaman-asif/Diseases_Dataset.
Synthetic Respiratory Symptoms Dataset
opendatabay.com
.csv
Updated Jun 3, 2025
Cite
Opendatabay Labs (2025). Synthetic Respiratory Symptoms Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/38f90264-2312-43c5-9666-e1b18e7ce7e2
Available download formats: .csv
Dataset updated
Jun 3, 2025
Dataset authored and provided by
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Patient Health Records & Digital Health
Description
The Synthetic Respiratory Symptoms Dataset is created for educational and research use to analyze associations between respiratory symptoms, underlying diseases, treatment types, and the severity of the condition. This synthetic dataset helps simulate real-world scenarios involving respiratory illnesses while ensuring privacy compliance through full anonymization.

Dataset Features

Symptoms: Descriptions of reported respiratory symptoms (e.g., yellow cough, chest congestion).

Age: Age of the individual (in years).

Sex: Biological sex of the individual (Male/Female).

Disease: Diagnosed respiratory condition (e.g., Mesothelioma, bronchitis).

Treatment: Medical intervention applied (e.g., Chemotherapy, Antibiotic, or none).

Nature: Severity of the case (e.g., low, medium, high).

Distribution

Figure: Synthetic Respiratory Symptoms Dataset Distribution by Disease Count (https://storage.googleapis.com/opendatabay_public/38f90264-2312-43c5-9666-e1b18e7ce7e2/4fb39ac546e8_disease_count.png)

Figure: Synthetic Respiratory Symptoms Dataset Distribution by Nature (https://storage.googleapis.com/opendatabay_public/38f90264-2312-43c5-9666-e1b18e7ce7e2/9b0990ccf2df_age_by_nature.png)

Usage

This dataset can be used for the following applications:

Respiratory Disease Research: Study the correlation between symptoms, severity, and diagnosed diseases in synthetic patients.

Predictive Modelling: Train machine learning models to predict disease types or severity based on symptoms and demographic data.

Clinical Insights: Analyse how different symptoms align with specific respiratory conditions and the common treatments used.

Educational Purposes: Serve as a teaching tool for learners and educators in public health, respiratory medicine, and medical data science.

Coverage

This synthetic dataset is fully anonymised and complies with modern data privacy standards. It incorporates a diverse array of symptom profiles, diseases, and treatments relevant to respiratory illnesses, enabling a broad range of analytical and educational applications.

License

CC0 (Public Domain)

Who Can Use It

Medical Researchers: To investigate symptom-disease-treatment relationships in respiratory medicine.

Pulmonologists and Healthcare Professionals: To explore treatment trends and symptom patterns in respiratory conditions.

Data Scientists and Machine Learning Engineers: To build and evaluate predictive models for diagnosis or severity classification.

Educators and Students: As a practice dataset for courses in health data analytics and medical informatics.
Healthcare Data Industry Report
datainsightsmarket.com
doc, pdf, ppt
Updated Feb 24, 2025
Cite
Data Insights Market (2025). Healthcare Data Industry Report [Dataset]. https://www.datainsightsmarket.com/reports/healthcare-data-industry-8463
Available download formats: doc, pdf, ppt
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Healthcare Data Industry market was valued at USD XX Million in 2023 and is projected to reach USD XXX Million by 2032, growing at an expected CAGR of 16.20% over the forecast period. Healthcare data refers to all information created or collected within the healthcare industry. This includes patient records, electronic health records (EHRs), genomic data, health insurance claims, medical images, and clinical trial data. Such data underpins modern healthcare and supports many critical applications.

First, health data improves patient care. Analyzing patterns in patient records helps healthcare providers make accurate diagnoses and design personalized treatment plans. Medical images such as X-rays and MRIs help detect abnormalities and support surgical procedures. Genomic data reveals genetic susceptibility to disease, which enables customized treatment plans for diseases such as cancer.

Second, health data is crucial for research and for developing new medical knowledge. Researchers use large datasets to study disease epidemiology, develop new drugs and treatments, and evaluate the effectiveness of healthcare programs. For instance, clinical trial data builds evidence about the safety and efficacy of new treatment options. Health insurance claims data can reveal healthcare utilization patterns and identify areas in need of improvement.

Finally, healthcare data supports the administrative and operational functions of healthcare organizations. EHRs make patient data easier to maintain, improve communication among healthcare providers, and reduce errors.

In addition, analytics on health insurance claims support billing and reimbursement, helping ensure that healthcare providers are paid the correct amount for the services they render. Analytics can also optimize resource utilization, identify potential cost savings, and make healthcare organizations more efficient overall. Healthcare information is a valuable asset that drives innovation, promotes better patient outcomes, and supports the coherent functioning of the healthcare system. When providers, researchers, and administrators use it effectively, they can improve the quality and efficiency of care delivery and, in turn, the health of individuals and communities.

Recent developments include:

March 2022: Microsoft launched Azure Health Data Services in the United States. It is a platform-as-a-service (PaaS) offering designed specifically to support protected health information (PHI) in the cloud.

March 2022: The government of Thailand launched a big data portal for healthcare facilities. The National Reforms Committee on Public Health joined with 12 government agencies to improve the quality of healthcare services by implementing digital technologies.

Key drivers for this market are:
- increasing demand for analytics solutions for population health management;
- a rising need for business intelligence to optimize health administration and strategy;
- a surge in the adoption of big data in the healthcare industry.

Potential restraints include security concerns related to sensitive patient medical data and the high cost of implementation and deployment. A notable trend is that the cloud segment is expected to register a high growth rate over the forecast period.
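
The headline CAGR links the start and end values through annual compounding. The worked form below assumes the 16.20% rate applies to each of the nine years from the 2023 base to 2032. The dollar values themselves are elided in the report summary, so only the growth factor can be computed.

```latex
V_{2032} = V_{2023}\,(1 + r)^{2032 - 2023}, \qquad r = 0.162
\quad\Longrightarrow\quad
\frac{V_{2032}}{V_{2023}} = 1.162^{9} \approx 3.86
```

Under that assumption, a 16.20% CAGR implies a market roughly 3.9 times its 2023 size by 2032.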
MedQuAD Dataset
paperswithcode.com
Updated Feb 16, 2024
Cite
Asma Ben Abacha; Dina Demner-Fushman (2024). MedQuAD Dataset [Dataset]. https://paperswithcode.com/dataset/medquad
Dataset updated
Feb 16, 2024
Authors
Asma Ben Abacha; Dina Demner-Fushman
Description
MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.
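
The public MedQuAD release is commonly distributed as XML documents, one per medical entity, with a question type attached to each question. The sketch below counts pairs per question type using Python's standard `xml.etree.ElementTree`. The element and attribute names (`QAPair`, `Question`, `Answer`, `qtype`) and the sample document are assumptions about the release format, so check them against the actual files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; element and attribute names are assumptions.
SAMPLE = """<Document id="0000001" source="example">
  <Focus>Example condition</Focus>
  <QAPairs>
    <QAPair pid="1">
      <Question qid="0000001-1" qtype="information">What is example condition?</Question>
      <Answer>An illustrative answer.</Answer>
    </QAPair>
    <QAPair pid="2">
      <Question qid="0000001-2" qtype="treatment">How to treat example condition?</Question>
      <Answer>Another illustrative answer.</Answer>
    </QAPair>
  </QAPairs>
</Document>"""

def qa_pairs(xml_text):
    """Yield (question_type, question, answer) tuples from one XML document."""
    root = ET.fromstring(xml_text)
    for pair in root.iter("QAPair"):
        question = pair.find("Question")
        answer = pair.find("Answer")
        # Some answers may be empty, so fall back to "" instead of failing on None.
        answer_text = (answer.text or "").strip() if answer is not None else ""
        yield question.get("qtype"), question.text.strip(), answer_text

pairs = list(qa_pairs(SAMPLE))
type_counts = Counter(qtype for qtype, _, _ in pairs)
```

Run over every document in the collection, `type_counts` would give the distribution across the 37 question types.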
Covid19_Dataset
kaggle.com
Updated Apr 26, 2024
Cite
Somin Ramchiary (2024). Covid19_Dataset [Dataset]. https://www.kaggle.com/datasets/somin69/covid19-dataset
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 26, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Somin Ramchiary
Description
The dataset contains 199,999 rows and 21 variables describing patients' medical details and health indicators.

USMER (Medical Unit Code): An identifier for the medical unit where the patient was treated or admitted.

MEDICAL_UNIT: The name or code of the medical unit where the patient received treatment.

SEX: The gender of the patient (e.g., Male, Female).

PATIENT_TYPE: Indicates whether the patient is an outpatient or hospitalized (e.g., Outpatient, Hospitalized).

DATE_DIED: The date of death of the patient, if applicable. This variable may be null for surviving patients.

INTUBED: Indicates whether the patient was intubated (i.e., received mechanical ventilation) during treatment (e.g., Yes, No).

PNEUMONIA: Indicates whether the patient had pneumonia as a complication of COVID-19 infection (e.g., Yes, No).

AGE: The age of the patient at the time of diagnosis or admission.

PREGNANT: Indicates whether the patient was pregnant at the time of diagnosis or admission (e.g., Yes, No).

DIABETES: Indicates whether the patient had diabetes as a pre-existing condition (e.g., Yes, No).

COPD (Chronic Obstructive Pulmonary Disease): Indicates whether the patient had COPD as a pre-existing condition (e.g., Yes, No).

ASTHMA: Indicates whether the patient had asthma as a pre-existing condition (e.g., Yes, No).

INMSUPR (Immunosuppression): Indicates whether the patient had immunosuppression (weakened immune system) as a pre-existing condition (e.g., Yes, No).

HIPERTENSION (Hypertension): Indicates whether the patient had hypertension (high blood pressure) as a pre-existing condition (e.g., Yes, No).

OTHER_DISEASE: Indicates whether the patient had other pre-existing diseases or medical conditions not specifically listed (e.g., Yes, No).

CARDIOVASCULAR: Indicates whether the patient had cardiovascular disease as a pre-existing condition (e.g., Yes, No).

OBESITY: Indicates whether the patient had obesity as a pre-existing condition (e.g., Yes, No).

RENAL_CHRONIC (Chronic Renal Insufficiency): Indicates whether the patient had chronic renal insufficiency (chronic kidney disease) as a pre-existing condition (e.g., Yes, No).

TOBACCO: Indicates whether the patient was a tobacco user (e.g., Yes, No).

CLASIFFICATION_FINAL (Final Classification): The final classification of the patient's COVID-19 case based on severity or outcome (e.g., Mild, Severe, Deceased).

ICU (Intensive Care Unit): Indicates whether the patient was admitted to the intensive care unit (ICU) during treatment (e.g., Yes, No).
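
Because DATE_DIED is null for surviving patients, it can also serve as an outcome label. The minimal sketch below turns it into a death flag and a mortality rate. The function names are hypothetical. Treating the placeholder `9999-99-99` as "no death" is an assumption, included because some releases of similar data mark survivors with a sentinel date rather than a true null. Missing values loaded by pandas arrive as float NaN, which is also handled.

```python
import math

# Assumption: null/empty means "survived"; the sentinel date is a hypothetical extra case.
NO_DEATH_VALUES = {"", "9999-99-99"}

def died(date_died):
    """Return True when DATE_DIED holds an actual value, False for null/NaN/placeholder."""
    if date_died is None or (isinstance(date_died, float) and math.isnan(date_died)):
        return False
    return str(date_died).strip() not in NO_DEATH_VALUES

def mortality_rate(date_died_values):
    """Fraction of patients with a recorded date of death (0.0 when there are none)."""
    flags = [died(value) for value in date_died_values]
    return sum(flags) / len(flags) if flags else 0.0
```

The sketch deliberately does not parse the date itself, since the date format used in the CSV is not stated here.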
Synthetic Dry Eye Disease Patient Records
opendatabay.com
.csv
Updated Jun 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Opendatabay Labs (2025). Synthetic Dry Eye Disease Patient Records [Dataset]. https://www.opendatabay.com/data/synthetic/f4e9ad52-5d13-4d2e-ac19-207a5b71522e
Available download formats: .csv
Dataset updated
Jun 15, 2025
Dataset authored and provided by
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Patient Health Records & Digital Health
Description
The Dry Eye Disease Patient Records (Synthetic) is designed for educational and research purposes to analyze patterns in sleep behavior, stress levels, lifestyle factors, and their potential links to dry eye disease. It provides anonymized, synthetic data on various health conditions and behavioral habits.

Dataset Features

Gender: Gender of the individual (Male/Female).

Age: Age of the individual.

Sleep Duration: Average sleep duration in hours.

Sleep Quality: Subjective assessment of sleep quality (scale-based).

Stress Level: Measured stress level (scale-based).

Heart Rate: Resting heart rate (bpm).

Daily Steps: Number of steps taken per day.

Physical Activity: Minutes of physical activity per day.

Height & Weight: Individual’s height (cm) and weight (kg).

Sleep Disorder: Presence of a diagnosed sleep disorder (Yes/No).

Wake Up During Night: Whether the individual wakes up during the night (Yes/No).

Feel Sleepy During Day: Self-reported daytime sleepiness (Yes/No).

Caffeine Consumption: Whether the individual consumes caffeine (Yes/No).

Alcohol Consumption: Whether the individual consumes alcohol (Yes/No).

Smoking: Smoking habits (Yes/No).

Medical Issue: Presence of any medical conditions (Yes/No).

Ongoing Medication: Use of any ongoing medication (Yes/No).

Smart Device Before Bed: Usage of smart devices before sleeping (Yes/No).

Average Screen Time: Daily screen time in hours.

Blue-Light Filter: Use of blue-light filters on devices (Yes/No).

Eye Discomfort & Strain: Presence of discomfort and eye strain (Yes/No).

Redness in Eye: Occurrence of eye redness (Yes/No).

Itchiness/Irritation in Eye: Symptoms of eye itchiness or irritation (Yes/No).

Dry Eye Disease: Diagnosis of Dry Eye Disease (Yes/No).
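
Height and weight are recorded separately, and most other features are Yes/No flags. The minimal preprocessing sketch below derives body mass index (BMI = weight in kg divided by height in m squared) and maps Yes/No to 1/0. It assumes hypothetical column keys `Height`, `Weight`, `Blue-Light Filter`, and `Dry Eye Disease`, so check the actual CSV headers.

```python
# Hypothetical record; the keys are assumed column names, not confirmed headers.
record = {"Height": 170.0, "Weight": 65.0, "Blue-Light Filter": "Yes", "Dry Eye Disease": "No"}

def bmi(height_cm, weight_kg):
    """Body mass index: weight in kg divided by the square of height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)

def yes_no(value):
    """Map a Yes/No field to 1/0, raising on anything else so bad data is caught."""
    mapping = {"yes": 1, "no": 0}
    key = str(value).strip().lower()
    if key not in mapping:
        raise ValueError(f"unexpected Yes/No value: {value!r}")
    return mapping[key]

def derive(row):
    """Build model-ready features from one raw record."""
    return {
        "BMI": bmi(row["Height"], row["Weight"]),
        "Blue-Light Filter": yes_no(row["Blue-Light Filter"]),
        "Dry Eye Disease": yes_no(row["Dry Eye Disease"]),
    }
```

Raising on unknown values, rather than silently mapping them to 0, makes inconsistent labels in a synthetic export visible early.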

Distribution

Figure: Dry Eye Disease Patient Records Synthetic Data (https://storage.googleapis.com/opendatabay_public/f4e9ad52-5d13-4d2e-ac19-207a5b71522e/2e2e949519d7_eye.png)

Usage

This dataset can be used for the following applications:

Healthcare Analytics: Identify patterns between lifestyle factors and dry eye disease.

Predictive Modeling: Develop machine learning models to predict eye health risks.

Clinical Research: Investigate associations between screen time, sleep habits, and eye conditions.

Educational Purposes: Provide a dataset for students in medical, data science, and public health fields to analyze real-world health trends.

Coverage

This synthetic dataset is fully anonymized and complies with data privacy standards. It includes a variety of demographic and lifestyle factors to support a broad range of research and analysis.

License

CC0 (Public Domain)

Who Can Use It

Healthcare Researchers: To explore correlations between lifestyle habits and dry eye disease.

Clinicians and Medical Practitioners: To analyze factors contributing to eye health issues.

Data Scientists and Machine Learning Practitioners: To develop predictive models for eye-related conditions.

Educators and Students: As a resource for studying health analytics and medical research.
Data from: Discovering health topics in social media using topic models
figshare.com
txt
Updated Jan 18, 2016
Cite
Michael Paul (2016). Discovering health topics in social media using topic models [Dataset]. http://doi.org/10.6084/m9.figshare.1007712.v5
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1007712.v5
Dataset updated
Jan 18, 2016
Dataset provided by
figshare
Authors
Michael Paul
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data set for M. Paul and M. Dredze, "Discovering health topics in social media using topic models". This includes the set of tweets used in the experiments and the words associated with ailments discovered by the Ailment Topic Aspect Model (ATAM).

Contact: Michael Paul (mpaul39@gmail.com)
Released June 26, 2014

atam.topwords.csv - The most probable words for each ailment. The first column is the ailment ID. The second column indicates if it is a general (G), symptom (S), or treatment (T) word. The third column is the word. The fourth column is the probability. Words are shown in descending order of probability until 90% of the probability mass is accumulated for each ailment or until probabilities drop below 1.0e-4.

atam.tweets.x.csv (for x=[0-9]) - The tweets used in the study. The first column is the tweet ID. The second column indicates the ailment ID for the ailment sampled for that tweet. (See the atam.topwords.csv file for the most probable words associated with each ailment ID.) Full tweets can be downloaded using the tweet ID through the Twitter API (https://dev.twitter.com/docs/api/1.1).

keywords.txt - The set of 269 health-related keywords used in our keyword-filtered Twitter stream as part of our dataset.

keywords_x.txt (for x={diseases,symptoms,treatments}) - The set of approximately 20,000 keyphrases crawled from wrongdiagnosis.com describing the names of diseases, symptoms, and treatments and medications. These keyword lists are used to create input for ATAM (which requires phrases to be labeled as symptoms or treatments), and also to initially filter our dataset when constructing our health classifiers.
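
The truncation rule used for atam.topwords.csv can be expressed as a short function. This sketch assumes each ailment's words arrive as (word, probability) pairs. The function name and the sample words are hypothetical, not part of the release. The word whose probability carries the running total past 90% is kept, matching "until 90% of the probability mass is accumulated".

```python
def truncate_topwords(word_probs, mass=0.90, floor=1.0e-4):
    """Keep words in descending probability until `mass` is reached or p < `floor`."""
    kept, total = [], 0.0
    for word, p in sorted(word_probs, key=lambda wp: wp[1], reverse=True):
        # Stop once enough mass is accumulated or the remaining words are too improbable.
        if total >= mass or p < floor:
            break
        kept.append((word, p))
        total += p
    return kept

# Hypothetical distribution for one ailment.
dist = [("flu", 0.5), ("fever", 0.3), ("cough", 0.15), ("rare", 0.00005), ("sore", 0.04995)]
top = truncate_topwords(dist)  # keeps flu, fever, cough (cumulative mass 0.95)
```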
Synthetic Cardiovascular Disease Prediction Dataset
opendatabay.com
.csv
Updated Apr 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Opendatabay Labs (2025). Synthetic Cardiovascular Disease Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/fd585f02-456f-4aa1-b7d1-4ea4499c7824
Available download formats: .csv
Dataset updated
Apr 26, 2025
Dataset authored and provided by
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Patient Health Records & Digital Health
Description
This Synthetic Cardiovascular Disease Dataset is created for educational and research purposes in cardiology, public health, and data science. It provides demographic, medical, and diagnostic details related to cardiovascular diseases, enabling analysis of risk factors, disease progression, and treatment outcomes. The dataset can be utilized for building predictive models and exploring disease management strategies.

Dataset Features

Age: Age of the individual in years.

Gender: Biological sex of the individual (Male/Female).

Smoking: Current smoking status (Yes/No).

Hx Smoking: History of smoking (Yes/No).

Hx Hypertension: History of hypertension (Yes/No).

Cholesterol Level: Cholesterol level in mg/dL (e.g., Normal, Borderline, High).

Blood Pressure: Blood pressure reading in mmHg (e.g., Normal, Elevated, Hypertension Stage 1).

Diabetes: Presence of diabetes (Yes/No).

BMI: Body Mass Index, a measure of body fat based on height and weight.

Physical Activity: Physical activity level (e.g., Sedentary, Moderate, Active).

Risk Score: Cardiovascular risk score (Low, Moderate, High).

Angina: Presence of chest pain or discomfort (Yes/No).

Heart Attack: Indicates if the patient has suffered a myocardial infarction (Yes/No).

Heart Failure: Indicates the presence of heart failure (Yes/No).

Treatment Type: Type of treatment administered (e.g., Medication, Surgery, Lifestyle Changes).

Response: Patient's response to treatment (e.g., Excellent, Partial, Poor).

Recurred: Indicates whether cardiovascular issues have recurred (Yes/No).
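
Several of the features above are ordered categories rather than free labels, so an ordinal encoding preserves their ranking for models that can use it. This sketch uses only the example levels quoted in the feature list. Whether the CSV contains further levels is an assumption to verify, and the sketch raises an error on any unknown level rather than guessing a rank.

```python
# Ordered levels (lowest first) taken from the examples in the feature list.
ORDINAL_LEVELS = {
    "Risk Score": ["Low", "Moderate", "High"],
    "Physical Activity": ["Sedentary", "Moderate", "Active"],
    "Response": ["Poor", "Partial", "Excellent"],
}

def encode_ordinal(column, value):
    """Map a categorical level to its rank (0 = lowest) within its column."""
    levels = ORDINAL_LEVELS[column]
    if value not in levels:
        raise ValueError(f"unknown level {value!r} for {column}")
    return levels.index(value)

def encode_row(row):
    """Return a copy of `row` with each known ordinal column replaced by its rank."""
    return {
        key: encode_ordinal(key, value) if key in ORDINAL_LEVELS else value
        for key, value in row.items()
    }
```

Nominal features such as Treatment Type have no natural order and would usually be one-hot encoded instead.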

Distribution

Figure: Synthetic Cardiovascular Disease Prediction Data Distribution (https://storage.googleapis.com/opendatabay_public/fd585f02-456f-4aa1-b7d1-4ea4499c7824/73db4d04d538_cardio.png)

Usage

This dataset is suited for the following applications:

Risk Prediction: Develop predictive models to identify patients at risk of cardiovascular complications or recurrence.

Treatment Outcome Analysis: Evaluate the effectiveness of treatments based on response and recurrence data.

Disease Progression Modeling: Study the progression of cardiovascular diseases using features like cholesterol, hypertension, and risk scores.

Public Health Research: Analyze demographic and clinical patterns to inform cardiovascular disease management strategies.

Predictive Modeling: Build machine learning models to predict disease outcomes based on demographic and clinical features.

Coverage

This synthetic dataset is anonymized and adheres to data privacy standards. It is designed for research and learning purposes, with diverse cases representing varying levels of cardiovascular risk, treatment responses, and clinical conditions.

License

CC0 (Public Domain)

Who Can Use It

Data Science Practitioners: For practicing data preprocessing, classification, and regression tasks related to cardiovascular diseases.

Healthcare Professionals and Researchers: To explore relationships between clinical metrics and cardiovascular disease outcomes.

Public Health Analysts: To understand trends and design strategies for managing cardiovascular diseases.

Policy Makers and Regulators: For data-driven decision-making in cardiovascular disease prevention and treatment policies.
Lung Disease
kaggle.com
Updated Mar 9, 2024
Cite
Fatemeh Mehrparvar (2024). Lung Disease [Dataset]. https://www.kaggle.com/datasets/fatemehmehrparvar/lung-disease
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 9, 2024
Dataset provided by
Kaggle
Authors
Fatemeh Mehrparvar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

Lung disease encompasses a wide range of conditions that affect the lungs and their ability to function effectively. These conditions can be caused by various factors, including infections, environmental factors, genetic predispositions, and lifestyle choices. Lung diseases can result in symptoms such as coughing, shortness of breath, chest pain, and reduced lung function. Detecting and diagnosing lung diseases is crucial for patient care, as they can have a significant impact on an individual's health and quality of life.

Global Impact:

Lung diseases have a substantial global impact. According to the World Health Organization (WHO), respiratory diseases, including lung diseases, are responsible for a significant portion of global mortality. In 2016, respiratory diseases were the fourth leading cause of death worldwide, with an estimated 3.0 million deaths attributed to them. Conditions like pneumonia, chronic obstructive pulmonary disease (COPD), and lung cancer contribute to this high mortality rate. Early detection and accurate diagnosis are essential for reducing the burden of lung diseases on public health.

The Need to Detect Lung Diseases:

Detecting lung diseases is vital for several reasons:

Early Intervention: Early detection allows for timely medical intervention and treatment, increasing the chances of successful management and recovery.

Disease Classification: Differentiating between various lung diseases, such as pneumonia, tuberculosis, and lung cancer, is crucial for appropriate treatment planning.

Public Health: Effective disease detection and management can have a positive impact on public health by reducing the overall disease burden.

Lung X-Ray Image Dataset:

The "Lung X-Ray Image Dataset" is a comprehensive collection of X-ray images that plays a pivotal role in the detection and diagnosis of lung diseases. This dataset contains a large number of high-quality X-ray images, meticulously collected from diverse sources, including hospitals, clinics, and healthcare institutions.

Dataset Contents:

Total Number of Images: The dataset comprises a total of 3,475 X-ray images.

Classes within the Dataset:

Normal (1250 Images): These images represent healthy lung conditions, serving as a reference for comparison in diagnostic procedures.

Lung Opacity (1125 Images): This class includes X-ray images depicting various degrees of lung abnormalities, providing a diverse set of cases for analysis.

Viral Pneumonia (1100 Images): Images in this category are associated with viral pneumonia cases, contributing to the understanding and identification of this specific lung infection.
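
With 1250, 1125, and 1100 images, the three classes are close to balanced but not exactly so. One common correction is inverse-frequency class weighting. This sketch computes it with the same formula as scikit-learn's "balanced" heuristic, n_samples / (n_classes * n_c). Whether weighting is needed at all is a modelling choice, not something the dataset prescribes.

```python
# Class counts as stated in the dataset description above.
CLASS_COUNTS = {"Normal": 1250, "Lung Opacity": 1125, "Viral Pneumonia": 1100}

def balanced_weights(counts):
    """Inverse-frequency weights: total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# Rarer classes get larger weights; each class's weighted count equals total / n_classes.
weights = balanced_weights(CLASS_COUNTS)
```

The resulting weights (about 0.93, 1.03, and 1.05) can be passed to any loss function that accepts per-class weights.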

In conclusion, the "Lung X-Ray Image Dataset" plays a crucial role in the healthcare sector by providing a diverse and well-documented collection of X-ray images that support the detection, classification, and understanding of lung diseases. This resource is instrumental in advancing the field of respiratory medicine and improving patient outcomes.
Heart Disease Dataset (Comprehensive)
ieee-dataport.org
Updated Oct 24, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MANU SIDDHARTHA (2019). Heart Disease Dataset (Comprehensive) [Dataset]. https://ieee-dataport.org/open-access/heart-disease-dataset-comprehensive
Dataset updated
Oct 24, 2019
Authors
MANU SIDDHARTHA
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This heart disease dataset was curated by combining five popular heart disease datasets that were previously available independently but had not been combined before. In this dataset…
Medical Imaging Datasets for Multimodal Disease Detection and Diagnosis Research
ieee-dataport.org
Updated Aug 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nur Rusyidah Azri (2024). Medical Imaging Datasets for Multimodal Disease Detection and Diagnosis Research [Dataset]. https://ieee-dataport.org/documents/medical-imaging-datasets-multimodal-disease-detection-and-diagnosis-research
Dataset updated
Aug 10, 2024
Authors
Nur Rusyidah Azri
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
brain MRI
Synthetic Parkinson's Disease Detection Dataset
opendatabay.com
.csv
Updated Apr 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Opendatabay Labs (2025). Synthetic Parkinson's Disease Detection Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/b71f3ac4-9476-4c1f-b8f0-aa3f0277603b
Available download formats: .csv
Dataset updated
Apr 28, 2025
Dataset authored and provided by
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Patient Health Records & Digital Health
Description
This synthetic Parkinson's Disease Detection Dataset is designed for educational and research purposes in the fields of data science, healthcare analytics, and medical research. It contains key clinical and speech features from individuals with Parkinson's Disease, which can be used to build predictive models, analyze disease progression, and assess the impact of motor and speech symptoms. The dataset is ideal for tasks such as classification, regression, and the study of biomarkers for Parkinson’s disease.

Dataset Features

Index: Row identifier for each record.

Age: The age of the patient.

Sex: Gender of the patient (Male/Female).

Test_time: Duration or time of the test conducted (in minutes).

Motor_UPDRS: Motor component score from the Unified Parkinson’s Disease Rating Scale (UPDRS).

Total_UPDRS: Total score from the UPDRS, including both motor and non-motor components.

Jitter(%): Percentage of frequency variation in speech, a key indicator of Parkinson’s disease.

Jitter(Abs): Absolute jitter value, quantifying frequency variation.

Jitter:RAP: Jitter measured using the Relative Average Perturbation method.

Jitter:PPQ5: Jitter measured using the 5-point Period Perturbation Quotient.

Jitter:DDP: Jitter measured using the Difference of Differences of Periods method.

Shimmer: Amplitude variation in speech, indicating vocal instability.

Shimmer(dB): Amplitude variation in decibels.

Shimmer:APQ3: Shimmer measured using the 3-point Amplitude Perturbation Quotient.

Shimmer:APQ5: Shimmer measured using the 5-point Amplitude Perturbation Quotient.

Shimmer:APQ11: Shimmer measured using the 11-point Amplitude Perturbation Quotient.

Shimmer:DDA: Shimmer measured using the Difference of Differences of Amplitudes method.

NHR: Noise to Harmonics Ratio, a measure of voice quality and periodicity.

HNR: Harmonics to Noise Ratio, reflecting the periodicity of speech sounds.

RPDE: Recurrence Period Density Entropy, derived from voice signal analysis.

DFA: Detrended Fluctuation Analysis, studying self-similarity in speech signals.

PPE: Pitch Period Entropy, quantifying irregularity in pitch periods during speech.

Usage

This dataset is well suited to various applications related to Parkinson's Disease detection and analysis:

Disease Prediction: Develop machine learning models to predict the presence and progression of Parkinson’s Disease.

Speech Analysis: Study speech features like jitter and shimmer for early diagnosis and monitoring of Parkinson's Disease.

Predictive Modeling: Build models using clinical and speech features to assess disease severity.

Clinical Research: Investigate the relationship between motor and non-motor symptoms of Parkinson's Disease.

Healthcare Analytics: Apply data science techniques to improve the diagnosis and treatment of Parkinson’s Disease.

Coverage

This synthetic dataset is anonymized and designed for research and learning purposes. It includes a diverse range of speech and clinical data, simulating different stages of Parkinson’s Disease for analysis.

License

CC0 (Public Domain)

Who Can Use It

Data Science Practitioners: For practicing data preprocessing, classification, and regression tasks.

Healthcare Analysts and Researchers: To explore relationships between clinical and speech features in Parkinson's Disease.

Medical Professionals: To enhance understanding of Parkinson’s Disease symptoms and progression.

Machine Learning Enthusiasts: To experiment with models for predicting Parkinson’s Disease using diverse features.

Academic Institutions: For use in educational settings to teach data science applications in healthcare.
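
The jitter columns are supplied as precomputed values and summarize cycle-to-cycle variation in the pitch period. To show what they measure, the sketch below computes simplified versions of the definitions commonly used by voice-analysis tools such as Praat, starting from a sequence of consecutive pitch periods in seconds. The function name and input periods are hypothetical. Real pipelines first extract and filter periods from audio, which is not shown. Under these definitions DDP equals exactly three times RAP, since |p[i+1] - 2p[i] + p[i-1]| = 3·|p[i] - mean(p[i-1..i+1])|.

```python
def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def jitter_measures(periods):
    """Simplified jitter measures from consecutive pitch periods given in seconds."""
    n = len(periods)
    if n < 5:
        raise ValueError("need at least 5 periods for the 5-point PPQ5 window")
    mean_period = _mean(periods)

    # Local jitter: mean absolute difference between consecutive periods.
    abs_jitter = _mean([abs(periods[i + 1] - periods[i]) for i in range(n - 1)])

    # RAP: mean deviation of each period from the 3-point average centred on it.
    rap = _mean([abs(periods[i] - _mean(periods[i - 1:i + 2]))
                 for i in range(1, n - 1)]) / mean_period

    # PPQ5: the same idea with a 5-point window centred on each period.
    ppq5 = _mean([abs(periods[i] - _mean(periods[i - 2:i + 3]))
                  for i in range(2, n - 2)]) / mean_period

    # DDP: mean absolute difference of differences between consecutive periods.
    ddp = _mean([abs((periods[i + 1] - periods[i]) - (periods[i] - periods[i - 1]))
                 for i in range(1, n - 1)]) / mean_period

    return {
        "Jitter(%)": 100.0 * abs_jitter / mean_period,
        "Jitter(Abs)": abs_jitter,
        "Jitter:RAP": rap,
        "Jitter:PPQ5": ppq5,
        "Jitter:DDP": ddp,
    }
```

A perfectly regular voice (all periods equal) yields zero for every measure. The shimmer columns follow the same pattern applied to cycle amplitudes instead of periods.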
WHO Health Indicators - Dataset - Data Catalog Armenia
data.opendata.am
Updated May 31, 2023
(2023). WHO Health Indicators - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/who-health-indicators
Dataset updated
May 31, 2023
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Contains data from World Health Organization's data portal covering the following categories: Mortality and global health estimates, Sustainable development goals, Millennium Development Goals (MDGs), Health systems, Malaria, Tuberculosis, Child health, Infectious diseases, Neglected Tropical Diseases, World Health Statistics, Health financing, Tobacco, Substance use and mental health, Injuries and violence, HIV/AIDS and other STIs, Public health and environment, Nutrition, Urban health, Child mortality, Noncommunicable diseases, Noncommunicable diseases CCS, Negelected tropical diseases, Infrastructure, Essential health technologies, Medical equipment, Demographic and socioeconomic statistics, Health inequality monitor, Health Equity Monitor, Child malnutrition, TOBACCO, Neglected tropical diseases, International Health Regulations (2005) monitoring framework, 0, Insecticide resistance, Oral health, Universal Health Coverage, Global Observatory for eHealth (GOe), RSUD: GOVERNANCE, POLICY AND FINANCING : PREVENTION, RSUD: GOVERNANCE, POLICY AND FINANCING: TREATMENT, RSUD: GOVERNANCE, POLICY AND FINANCING: FINANCING, RSUD: SERVICE ORGANIZATION AND DELIVERY: TREATMENT SECTORS AND PROVIDERS, RSUD: SERVICE ORGANIZATION AND DELIVERY: TREATMENT CAPACITY AND TREATMENT COVERAGE, RSUD: SERVICE ORGANIZATION AND DELIVERY: PHARMACOLOGICAL TREATMENT, RSUD: SERVICE ORGANIZATION AND DELIVERY: SCREENING AND BRIEF INTERVENTIONS, RSUD: SERVICE ORGANIZATION AND DELIVERY: PREVENTION PROGRAMS AND PROVIDERS, RSUD: SERVICE ORGANIZATION AND DELIVERY: SPECIAL PROGRAMMES AND SERVICES, RSUD: HUMAN RESOURCES, RSUD: INFORMATION SYSTEMS, RSUD: YOUTH, FINANCIAL PROTECTION, AMR GLASS, Noncommunicable diseases and mental health, Health workforce, AMR GASP, ICD, SEXUAL AND REPRODUCTIVE HEALTH, Immunization, NLIS, AMC GLASS. For links to individual indicator metadata, see resource descriptions.
Medicaid Coverage Of Cessation Treatments And Barriers To Treatments
data.cdc.gov
data.virginia.gov
Updated Apr 22, 2014
Centers for Disease Control and Prevention National Center for Disease Prevention and Health Promotion Office on Smoking and Health (2014). Medicaid Coverage Of Cessation Treatments And Barriers To Treatments [Dataset]. https://data.cdc.gov/Cessation-Coverage-/Medicaid-Coverage-Of-Cessation-Treatments-And-Barr/ntaa-dtex
Available download formats: application/rdf+xml, json, csv, application/rss+xml, tsv, xml
Dataset updated
Apr 22, 2014
Dataset provided by
Centers for Disease Control and Prevention, http://www.cdc.gov/
Authors
Centers for Disease Control and Prevention National Center for Disease Prevention and Health Promotion Office on Smoking and Health
License
U.S. Government Works, https://www.usa.gov/government-works
License information was derived automatically
Description
2008-2024. American Lung Association. Cessation Coverage. Medicaid data compiled by the Centers for Disease Control and Prevention’s Office on Smoking and Health were obtained from the State Tobacco Cessation Coverage Database, developed and administered by the American Lung Association. Data from 2008-2012 are reported on an annual basis; beginning in 2013, data are reported on a quarterly basis. Data include state-level information on Medicaid coverage of tobacco cessation medications approved by the Food and Drug Administration (FDA), types of counseling recommended by the Public Health Service (PHS), and barriers to accessing cessation treatment. Note: Section 2502 of the Patient Protection and Affordable Care Act requires all state Medicaid programs to cover all FDA-approved tobacco cessation medications as of January 1, 2014. However, states are currently in the process of modifying their coverage to come into compliance with this requirement. Data in the STATE System on Medicaid coverage of tobacco cessation medications reflect evidence of coverage found in documentable sources, and may not yet reflect medications covered under this requirement.
Instruction formatted data for LLaMA2-chat fine tuning
figshare.com
application/x-rar
Updated Feb 10, 2024
ForexPredict BoEC (2024). Instruction formatted data for LLaMA2-chat fine tuning [Dataset]. http://doi.org/10.6084/m9.figshare.25200836.v1
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.25200836.v1
Dataset updated
Feb 10, 2024
Dataset provided by
figshare
Authors
ForexPredict BoEC
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Medical scope, Prompt #1 (61K instructions, PubMED-Unlabeled)
Instruction: You are a language model specialized in generating JSON or YAML output. Given an instruction, you should generate a valid JSON or YAML object according to the provided guidelines. Your output should conform to the following JSON format: { "key1": "value1", "key2": "value2", ... } Make sure to adhere to the specified key-value pairs and maintain the correct syntax throughout your response. Your goal is to generate accurate and properly structured JSON or YAML output in response to the given instructions without any extra comment.
Input: As a doctor that only speak in JSON, provide a "answer", "context and "keywords" about following question: Is naturopathy as effective as conventional therapy for treatment of menopausal symptoms?
Output:{"answer" :Naturopathy appears to be an effective alternative for relief of specific menopausal symptoms compared to conventional therapy.,"context" :Although the use of alternative medicine in the United States is increasing, no published studies have documented the effectiveness of naturopathy for treatment of menopausal symptoms compared to women receiving conventional therapy in the clinical setting."keywords" :['Anxiety', 'Cohort Studies', 'Confidence Intervals', 'Estrogen Replacement Therapy', 'Female', 'Health Status', 'Hot Flashes', 'Humans', 'Menopause', 'Middle Aged','Naturopathy', 'Odds Ratio', 'Patient Satisfaction', 'Retrospective Studies', 'Risk Factors', 'Severity of Illness Index', 'Sleep Initiation and Maintenance Disorders','Surveys and Questionnaires', 'Time Factors', 'Urinary Incontinence', 'Vaginal Diseases', 'Washington']}

Medical scope, Prompt #2 (1K instructions, symptom-disease dataset)
Instruction: You are a language model specialized in generating JSON or YAML output. Given an instruction, you should generate a valid JSON or YAML object according to the provided guidelines. Your output should conform to the following JSON format: { "key1": "value1", "key2": "value2", ... } Make sure to adhere to the specified key-value pairs and maintain the correct syntax throughout your response. Your goal is to generate accurate and properly structured JSON or YAML output in response to the given instructions without any extra comment.
Input: You are a doctor that only speak in JSON. Do not output that is'nt properly formated JSON. Provide following information about Panic disorder disease. Fill information about "symptom", "reason", "TestsAndProcedures" and "commonMedications" about Panic disorder disease in JSON format.
Output: {"symptom" :['Anxiety and nervousness', 'Depression', 'Shortness of breath', 'Depressive or psychotic symptoms', 'Sharp chest pain', 'Dizziness', 'Insomnia', 'Abnormal involuntary movements', 'Chest tightness', 'Palpitations', 'Irregular heartbeat', 'Breathing fast'],"reason" :Panic disorder is an anxiety disorder characterized by recurring severe panic attacks. It may also include significant behavioral changes lasting at least a month and of ongoing worry about the implications or concern about having other attacks. The latter are called anticipatory attacks (DSM-IVR). Panic disorder is not the same as agoraphobia (fear of public places), although many afflicted with panic disorder also suffer from agoraphobia. Panic attacks cannot be predicted, therefore an individual may become stressed, anxious or worried wondering when the next panic attack will occur. Panic disorder may be differentiated as a medical condition, or chemical imbalance. The DSM-IV-TR describes panic disorder and anxiety differently. Whereas anxiety is preceded by chronic stressors which build to reactions of moderate intensity that can last for days, weeks or months, panic attacks are acute events triggered by a sudden, out-of-the-blue cause: duration is short and symptoms are more intense. Panic attacks can occur in children, as well as adults. Panic in young people may be particularly distressing because children tend to have less insight about what is happening, and parents are also likely to experience distress when attacks occur. "TestsAndProcedures" : ['Psychotherapy', 'Mental health counseling', 'Electrocardiogram', 'Depression screen (Depression screening)', 'Toxicology screen', 'Psychological and psychiatric evaluation and therapy']"commonMedications" : ['Lorazepam', 'Alprazolam (Xanax)', 'Clonazepam', 'Paroxetine (Paxil)', 'Venlafaxine (Effexor)', 'Mirtazapine', 'Buspirone (Buspar)', 'Fluvoxamine (Luvox)', 'Imipramine', 'Desvenlafaxine (Pristiq)', 'Clomipramine', 'Acamprosate (Campral)']}

Medical scope, Prompt #3 (194K instructions, MED-MCQA dataset)
Instruction: You are a language model specialized in generating JSON or YAML output. Given an instruction, you should generate a valid JSON or YAML object according to the provided guidelines. Your output should conform to the following JSON format: { "key1": "value1", "key2": "value2", ... } Make sure to adhere to the specified key-value pairs and maintain the correct syntax throughout your response. Your goal is to generate accurate and properly structured JSON or YAML output in response to the given instructions without any extra comment.
Input: You are a medical teacher that only speak in JSON. Do not output that is'nt properly formated JSON. Generate a multichoice question and answer in Anatomy field and Urinary tract topic with following format: { "question": "expresion" , "OPA": "expresion", "OPB" : "expresion", "OPC": "expresion","OPD": "expresion", "Answer": "3"}.
Output: { "question": Chronic urethral obstruction due to benign prismatic hyperplasia can lead to the following change in kidney parenchyma, "opa": Hyperplasia, "opb": Hyperophy, "opc": Atrophy, "opd": Dyplasia, "Answer": 2.

Finance scope, Prompt #4 (38K instructions)
Instruction: You are a language model specialized in generating JSON or YAML output. Given an instruction, you should generate a valid JSON or YAML object according to the provided guidelines. Your output should conform to the following JSON format: { "key1": "value1", "key2": "value2", ... } Make sure to adhere to the specified key-value pairs and maintain the correct syntax throughout your response. Your goal is to generate accurate and properly structured JSON or YAML output in response to the given instructions without any extra comment.
Input: You are a financial news analyzer that only speaks in JSON. Do not output that isn't properly formatted JSON. Analyze this news title "Here Why Bitcoin is Likely to See Historic Volatility in the Next Few Days". Provide sentiment as a probability distribution also indicate target market and related asset to the news title in JSON format.
Output:{"sentiment": {"positive": 0.4463904500007629, "negative": 0.06607405841350555,"neutral":0.4875355064868927}, "market": "cryptocurrency" , "relatedAsset": ['BTC/USDT']}.
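Every prompt above asks the model for valid JSON containing specific keys. The following is a minimal Python sketch for checking a response against that requirement using only the standard `json` module. The helper names are hypothetical and not part of the dataset. Note that outputs written with single-quoted lists, such as the `relatedAsset` value in Prompt #4, are not valid JSON, and a strict parser rejects them.

```python
import json


def check_json_response(text, required_keys):
    """Return (ok, detail): ok is True only if `text` parses as a JSON object
    that contains every key in `required_keys`."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(obj, dict):
        return False, "top-level value is not a JSON object"
    missing = [key for key in required_keys if key not in obj]
    if missing:
        return False, f"missing keys: {missing}"
    return True, "ok"


def is_probability_distribution(scores, tol=1e-6):
    """True if all scores are non-negative and they sum to 1 within `tol`
    (for a 'sentiment as a probability distribution' field)."""
    values = list(scores.values())
    return all(v >= 0 for v in values) and abs(sum(values) - 1.0) <= tol
```

The tolerance in `is_probability_distribution` is an arbitrary choice. It allows for floating-point scores that sum to 1 only approximately.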
Prediction of Heart Attack
data.mendeley.com
Updated May 10, 2024
Rakin Sad Aftab (2024). Prediction of Heart Attack [Dataset]. http://doi.org/10.17632/yrwd336rkz.1
Unique identifier
https://doi.org/10.17632/yrwd336rkz.1
Dataset updated
May 10, 2024
Authors
Rakin Sad Aftab
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of 303 observations, each representing a unique patient, and 14 attributes associated with heart disease. It is intended as a resource for researchers working on predictive analytics for cardiovascular disease.

Variables Overview:
1. Age: A continuous variable indicating the age of the patient.
2. Sex: A categorical variable with two levels ('Male', 'Female'), indicating the gender of the patient.
3. CP (Chest Pain type): A categorical variable describing the type of chest pain experienced by the patient, with categories such as 'Asymptomatic', 'Atypical Angina', 'Typical Angina', and 'Non-Angina'.
4. TRTBPS (Resting Blood Pressure): A continuous variable indicating the resting blood pressure (in mm Hg) on admission to the hospital.
5. Chol (Serum Cholesterol): A continuous variable measuring the serum cholesterol in mg/dl.
6. FBS (Fasting Blood Sugar): A binary variable where 1 represents fasting blood sugar > 120 mg/dl, and 0 otherwise.
7. Rest ECG (Resting Electrocardiographic Results): Categorizes the resting electrocardiographic results of the patient into 'Normal', 'ST Elevation', and other categories.
8. Thalachh (Maximum Heart Rate Achieved): A continuous variable indicating the maximum heart rate achieved by the patient.
9. Exng (Exercise Induced Angina): A binary variable where 1 indicates the presence of exercise-induced angina, and 0 otherwise.
10. Oldpeak (ST Depression Induced by Exercise Relative to Rest): A continuous variable indicating the ST depression induced by exercise relative to rest.
11. Slope (Slope of the Peak Exercise ST Segment): A categorical variable with levels such as 'Flat' and 'Up Sloping', representing the slope of the peak exercise ST segment.
12. CA (Number of Major Vessels Colored by Fluoroscopy): An integer count ranging from 0 to 3, indicating the number of major vessels colored by fluoroscopy.
13. Thall (Thalassemia): A categorical variable indicating different types of thalassemia (a blood disorder).
14. Target: A binary target variable indicating the presence (1) or absence (0) of heart disease.

Descriptive Statistics: The patients' age ranges from 29 to 77 years, with a mean age of approximately 54 years. The resting blood pressure spans from 94 to 200 mm Hg, and the average cholesterol level is about 246 mg/dl. The maximum heart rate achieved varies widely among patients, from 71 to 202 beats per minute.

Importance for Research: This dataset covers a range of factors that may be linked to heart disease, which makes it useful for developing predictive models. By analyzing relationships and patterns among these variables, researchers can identify candidate predictors of heart disease and improve the accuracy of diagnostic tools. This could inform preventive measures and treatment strategies and, ultimately, improve patient outcomes in cardiovascular health.
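The variable overview and the descriptive statistics above can be checked with a short standard-library Python sketch. It assumes that the CSV headers match the variable names listed (Age, TRTBPS, Chol, Thalachh, Oldpeak), which may not hold for the actual file. `describe` and `fbs_flag` are illustrative helpers, not part of the dataset.

```python
import csv
import io
from statistics import fmean

# Assumed CSV headers for the continuous variables; the real file may name them differently.
CONTINUOUS = ["Age", "TRTBPS", "Chol", "Thalachh", "Oldpeak"]


def fbs_flag(fasting_glucose_mg_dl):
    """FBS encoding from the variable overview: 1 if fasting blood sugar > 120 mg/dl, else 0."""
    return 1 if fasting_glucose_mg_dl > 120 else 0


def describe(csv_text, columns=CONTINUOUS):
    """Return {column: (min, max, mean)} for each listed column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = {col: [] for col in columns}
    for row in reader:
        for col in columns:
            values[col].append(float(row[col]))
    return {col: (min(v), max(v), fmean(v)) for col, v in values.items() if v}
```

Pass the file contents to `describe` and compare the resulting minimum, maximum, and mean per column with the ranges reported above (for example, age 29 to 77 with a mean near 54).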
The medication rate for aspirin (AMI) patients within three months after...
data.gov.tw
csv
Updated Jun 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Health Insurance Administration (2025). The medication rate for aspirin (AMI) patients within three months after discharge - Aspirin (Disease-specific indicator) [Dataset]. https://data.gov.tw/en/datasets/39230
Available download formats: csv
Dataset updated
Jun 1, 2025
Dataset authored and provided by
National Health Insurance Administration
License
https://data.gov.tw/license
Description
The Bureau of National Health Insurance has selected evidence-based medical indicators to be made public in order to enhance the quality of medical care for patients with myocardial infarction, and to serve as a reference for disease treatment and healthcare quality. The public indicators are divided into three categories based on the treatment period and quality aspects: inpatient process assessment, assessment of continued medication after discharge, and outcome assessment. These include the ratio of inpatients with acute myocardial infarction (AMI) who undergo LDL cholesterol testing during hospitalization; the medication ratio for AMI patients during hospitalization and within three, six, and nine months after discharge; the ratio of patients returning to the emergency department within three days after discharge due to the same primary diagnosis or related conditions; and the unplanned readmission ratio within fourteen days after discharge due to the same primary diagnosis or related conditions.
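Most of the indicators described above are ratios computed over a window after discharge. The following is a minimal Python sketch of how such a medication rate might be computed. It assumes a hypothetical record layout (a discharge date plus the dates aspirin was dispensed) and treats "three months" as 90 days. The Administration's actual numerator and denominator definitions are not given in the listing and may differ.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class AmiDischarge:
    """Hypothetical record: one AMI discharge and the dates aspirin was later dispensed."""
    discharge_date: date
    aspirin_dispensed: List[date] = field(default_factory=list)


def post_discharge_rate(discharges, window_days=90):
    """Share of discharges with at least one aspirin dispensing in
    [discharge_date, discharge_date + window_days]."""
    if not discharges:
        raise ValueError("denominator is empty")
    covered = 0
    for record in discharges:
        end = record.discharge_date + timedelta(days=window_days)
        if any(record.discharge_date <= day <= end for day in record.aspirin_dispensed):
            covered += 1
    return covered / len(discharges)
```

The six- and nine-month variants follow from changing `window_days` (for example, to 180 or 270, again as an approximation of calendar months).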
Data from: ADNI Dataset
paperswithcode.com
opendatalab.com
Updated Jul 1, 2024
(2024). ADNI Dataset [Dataset]. https://paperswithcode.com/dataset/adni
Dataset updated
Jul 1, 2024
Description
Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multisite study that aims to improve clinical trials for the prevention and treatment of Alzheimer’s disease (AD).[1] This cooperative study combines expertise and funding from the private and public sectors to study subjects with AD, as well as those who may develop AD and controls with no signs of cognitive impairment.[2] Researchers at 63 sites in the US and Canada track the progression of AD in the human brain with neuroimaging, biochemical, and genetic biological markers.[2][3] This knowledge helps in designing better clinical trials for the prevention and treatment of AD. ADNI has made a global impact,[4] first by developing a set of standardized protocols that allow results from multiple centers to be compared,[4] and second through its data-sharing policy, which makes all of its data available without embargo to qualified researchers worldwide.[5] To date, over 1000 scientific publications have used ADNI data.[6] A number of other initiatives related to AD and other diseases have been designed and implemented using ADNI as a model.[4] ADNI has been running since 2004 and is currently funded until 2021.[7]

Source: Wikipedia, https://en.wikipedia.org/wiki/Alzheimer%27s_Disease_Neuroimaging_Initiative
Medication-Assisted Treatment in Medi-Cal for Opioid Use Disorders, by...
data.chhs.ca.gov
data.ca.gov
csv, zip
Updated Apr 28, 2025
Department of Health Care Services (2025). Medication-Assisted Treatment in Medi-Cal for Opioid Use Disorders, by County [Dataset]. https://data.chhs.ca.gov/dataset/medication-assisted-treatment-in-medi-cal-for-opioid-use-disorders-quarterly-by-county
Available download formats: csv (20602), zip, csv (107785)
Dataset updated
Apr 28, 2025
Dataset provided by
California Department of Health Care Services, http://www.dhcs.ca.gov/
Authors
Department of Health Care Services
Description
Medi-Cal claims for persons receiving medication-assisted treatment for opioid use disorders are unduplicated by medication. Methadone program participation is identified from claims submitted by the counties. Buprenorphine and naltrexone pharmacy claims are those used for treating opioid use disorders. Naloxone pharmacy claims are those dispensed as a first aid item (injectables and nasal sprays). The county source is taken from submitted claims.
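The description says that claims are "unduplicated by medication", which suggests each person is counted at most once per medication. The following is a minimal Python sketch of that counting rule. It assumes hypothetical claim tuples of (person ID, county, medication); the department's actual processing is not described in the listing. Because the county is taken from submitted claims, this sketch counts a person with claims in two counties once in each county.

```python
from collections import Counter


def unduplicated_counts(claims):
    """Count distinct persons per (county, medication), ignoring repeat claims.

    `claims` is an iterable of hypothetical (person_id, county, medication) tuples.
    """
    unique = {(person, county, med) for person, county, med in claims}
    return Counter((county, med) for _person, county, med in unique)
```

Collapsing to a set first means that repeated pharmacy fills for the same person and medication contribute a single count.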
