This dataset was created by Priyanka
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This heart disease dataset was curated by combining 3 popular heart disease datasets. The first dataset (collected from Kaggle) contains 70,000 records with 11 independent features, which makes it the largest heart disease dataset available so far for research purposes. These data were collected at the time of medical examination, together with information given by the patient. The second and third datasets contain 303 and 293 instances respectively, with 13 common features. The three datasets used for its curation are: Cardio Data (Kaggle Dataset)
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Ronak Kantariya
Released under CC0: Public Domain
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Heart disease data combined from the UCI repository, covering the following sites: Cleveland, Hungary, Switzerland, and VA Long Beach.
Features:
Age: Age of the individual (20-80).
Sex: Gender of the individual, represented as a binary value where 1 stands for male and 0 stands for female.
ChestPainType: Type of chest pain experienced by the individual. The values are:
Value 1: Typical angina, which is chest pain related to the heart.
Value 2: Atypical angina, which is chest pain not related to the heart.
Value 3: Non-anginal pain, which is typically sharp and non-continuous.
Value 4: Asymptomatic, meaning the individual experiences no symptoms.
RestingBP: The individual's resting blood pressure (in mm Hg).
Cholesterol: The individual's cholesterol level, measured in mg/dl.
FastingBS: Whether the individual's fasting blood sugar is greater than 120 mg/dl, represented as a binary value where 1 stands for true and 0 stands for false.
MaxHR: The maximum heart rate achieved by the individual.
ExerciseAngina: Whether the individual experiences angina (chest pain) induced by exercise, represented as a binary value where 1 stands for yes and 0 stands for no.
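The coded fields described above can be turned back into readable labels with a small lookup. The following is a minimal sketch using only the codes stated in the feature description; the function name `decode_record` and the example row are hypothetical and not taken from the dataset.

```python
# Code-to-label mapping copied from the feature description above.
CHEST_PAIN_TYPES = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}


def decode_record(record):
    """Return a copy of `record` with coded fields replaced by readable values."""
    decoded = dict(record)
    decoded["Sex"] = "male" if record["Sex"] == 1 else "female"
    decoded["ChestPainType"] = CHEST_PAIN_TYPES[record["ChestPainType"]]
    # FastingBS and ExerciseAngina are 0/1 flags; expose them as booleans.
    for flag in ("FastingBS", "ExerciseAngina"):
        decoded[flag] = bool(record[flag])
    return decoded


# Hypothetical example row for illustration only.
example = {
    "Age": 54, "Sex": 1, "ChestPainType": 3, "RestingBP": 130,
    "Cholesterol": 240, "FastingBS": 0, "MaxHR": 150, "ExerciseAngina": 1,
}
print(decode_record(example))
```

Numeric fields such as RestingBP and Cholesterol are passed through unchanged, since the description gives units but no coding for them.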
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Muneer Iqbal24
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Adaptation of http://archive.ics.uci.edu/ml/datasets/Heart+Disease
Ready for usage with ehrapy
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This synthetic dataset is designed to predict the risk of heart disease based on a combination of symptoms, lifestyle factors, and medical history. Each row in the dataset represents a patient, with binary (Yes/No) indicators for symptoms and risk factors, along with a computed risk label indicating whether the patient is at high or low risk of developing heart disease.
The dataset contains 70,000 samples, making it suitable for training machine learning models for classification tasks. The goal is to provide researchers, data scientists, and healthcare professionals with a clean and structured dataset to explore predictive modeling for cardiovascular health.
This dataset is a side project of EarlyMed, developed by students of Vellore Institute of Technology (VIT-AP). EarlyMed aims to leverage data science and machine learning for early detection and prevention of chronic diseases.
chest_pain: Presence of chest pain, a common symptom of heart disease.
shortness_of_breath: Difficulty breathing, often associated with heart conditions.
fatigue: Persistent tiredness without an obvious cause.
palpitations: Irregular or rapid heartbeat.
dizziness: Episodes of lightheadedness or fainting.
swelling: Swelling due to fluid retention, often linked to heart failure.
radiating_pain: Radiating pain, a hallmark of angina or heart attacks.
cold_sweats: Symptoms commonly associated with acute cardiac events.
age: Patient's age in years (continuous variable).
hypertension: History of hypertension (Yes/No).
cholesterol_high: Elevated cholesterol levels (Yes/No).
diabetes: Diagnosis of diabetes (Yes/No).
smoker: Whether the patient is a smoker (Yes/No).
obesity: Obesity status (Yes/No).
family_history: Family history of cardiovascular conditions (Yes/No).
risk_label: Binary label indicating the risk of heart disease:
0: Low risk
1: High risk
This dataset was synthetically generated using Python libraries such as numpy and pandas. The generation process ensured a balanced distribution of high-risk and low-risk cases while maintaining realistic correlations between features. For example:
- Patients with multiple risk factors (e.g., smoking, hypertension, and diabetes) were more likely to be labeled as high risk.
- Symptom patterns were modeled after clinical guidelines and research studies on heart disease.
The design of this dataset was inspired by the following resources:
This dataset can be used for a variety of purposes, including machine learning research, healthcare analytics, and educational purposes.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Heart Disease Data: Enhanced with Feature Engineering for Advanced Analysis
This dataset is an advanced version of the classic UCI Machine Learning heart disease dataset, enriched with feature engineering to support more sophisticated analyses. The original features have been supplemented with newly derived attributes that help to better understand and model cardiovascular risk factors.
These additional features facilitate a deeper analysis of cardiovascular health by incorporating various derived metrics and categorizations, enhancing the overall utility of the dataset for predictive modeling and data exploration.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Medical-Grade Explainable AI Project Assets
This dataset contains comprehensive assets for a production-ready Explainable AI (XAI) heart disease prediction system achieving 94.1% accuracy with full model transparency.
📊 CONTEXT: Healthcare AI faces a critical "black box" problem where models make predictions without explanations. This project demonstrates how to build trustworthy medical AI using SHAP and LIME for real-time explainability.
🎯 PROJECT GOAL: Create a clinically deployable AI system that not only predicts heart disease with high accuracy but also provides interpretable explanations for each prediction, enabling doctor-AI collaboration.
🚀 KEY FEATURES:
- 94.1% prediction accuracy (XGBoost + Optuna)
- Real-time SHAP & LIME explanations
- FastAPI backend with medical validation
- Gradio clinical dashboard
- Full MLOps pipeline (MLflow tracking)
- 4-Layer enterprise architecture
📁 ASSETS INCLUDED:
- heart_clean.csv - Clinical dataset ready for analysis
- SHAP summary plots for global explainability
- Performance metrics and visualizations
- Architecture diagrams
- Model evaluation results
🔗 COMPANION RESOURCES:
- Live Demo: https://huggingface.co/spaces/Ariyan-Pro/HeartDisease-Predictor
- Notebook: https://www.kaggle.com/code/ariyannadeem/heart-disease-prediction-with-explainable-ai
- Source Code: https://github.com/Ariyan-Pro/ExplainableAI-HeartDisease
Perfect for learning medical AI implementation, explainable AI techniques, and production deployment.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Four out of 5 CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur prematurely in people under 70 years of age. Heart failure is a common event caused by CVDs, and this dataset contains 11 features that can be used to predict possible heart disease.
People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia or already established disease) need early detection and management wherein a machine learning model can be of great help.
This dataset was created by combining different datasets already available independently but not combined before. In this dataset, 5 heart datasets are combined over 11 common features which makes it the largest heart disease dataset available so far for research purposes. The five datasets used for its curation are:
Total: 1190 observations
Duplicated: 272 observations
Final dataset: 918 observations
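The counts above follow from combining the source datasets and dropping repeated rows (1190 - 272 = 918). As a minimal sketch of that bookkeeping, assuming a duplicate means a row identical on every shared feature, the helper below (a hypothetical name, with made-up toy rows) reports the same three figures:

```python
def combine_and_deduplicate(*datasets):
    """Concatenate datasets and drop exact duplicate rows, keeping first occurrences.

    Returns (total_rows, duplicated_rows, unique_rows).
    """
    combined = [row for dataset in datasets for row in dataset]
    seen = set()
    unique = []
    for row in combined:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return len(combined), len(combined) - len(unique), unique


# Toy rows (age, sex, resting blood pressure); not real observations.
dataset_a = [(63, 1, 145), (37, 1, 130)]
dataset_b = [(37, 1, 130), (41, 0, 130)]

total, duplicated, final = combine_and_deduplicate(dataset_a, dataset_b)
print(total, duplicated, len(final))  # 4 1 3

# The figures reported for the real dataset are consistent with each other.
assert 1190 - 272 == 918
```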
Every dataset used can be found under the Index of heart disease datasets from UCI Machine Learning Repository on the following link: https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/
fedesoriano. (September 2021). Heart Failure Prediction Dataset. Retrieved [Date Retrieved] from https://www.kaggle.com/fedesoriano/heart-failure-prediction.
Creators:
Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779
http://opendatacommons.org/licenses/dbcl/1.0/
This data set dates from 1988 and consists of four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 9 attributes and is a shorter version of the original. The "target" field refers to the presence of heart disease in the patient. It is integer-valued: 0 = no disease and 1 = disease. The source of the original data can be found here: https://archive.ics.uci.edu/ml/datasets/heart+Disease
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains cleaned clinical, demographic, and physiological attributes collected from patients undergoing medical evaluation for potential heart disease. It is widely used for predictive modeling in healthcare, particularly to identify whether a patient is likely to have heart disease based on diagnostic measurements.
The target variable (num) indicates the presence or absence of heart disease (0 versus 1-4), so the dataset is suitable for binary classification tasks once the severity levels are collapsed into a single positive class.
Dataset Structure
Columns (features):
age → Age of the patient (years)
sex → Gender (Male / Female)
cp → Chest pain type
typical angina
atypical angina
non-anginal pain
asymptomatic
trestbps → Resting blood pressure (mm Hg)
chol → Serum cholesterol (mg/dl)
fbs → Fasting blood sugar (True if > 120 mg/dl, else False)
restecg → Resting electrocardiographic results (normal, lv hypertrophy, etc.)
thalch → Maximum heart rate achieved
exang → Exercise induced angina (True = yes, False = no)
oldpeak → ST depression induced by exercise relative to rest (numeric value)
num → Target variable (Heart disease diagnosis)
0 → No heart disease
1-4 → Heart disease present (severity levels)
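Because num carries severity levels 1-4 rather than a plain yes/no flag, a binary classification setup needs those levels collapsed first. A minimal sketch, assuming num is stored as an integer from 0 to 4 (the function name `binarize_num` is hypothetical):

```python
def binarize_num(num):
    """Map the num target to 0 (no heart disease) or 1 (heart disease present)."""
    if num not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected num value: {num!r}")
    return int(num > 0)


# Hypothetical target values for illustration only.
labels = [binarize_num(n) for n in [0, 2, 4, 1, 0]]
print(labels)  # [0, 1, 1, 1, 0]
```

Collapsing the levels discards severity information, so a multi-class or ordinal model would keep num as it is.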
Use Cases
Predictive modeling for heart disease classification
Exploratory data analysis (EDA) of risk factors
Machine learning projects in healthcare analytics
Medical research on correlations between risk factors and heart disease
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by QuangNguyen711
Released under MIT
This dataset was created by Nishant Bansal
The "Framingham" heart disease dataset includes over 4,240 records, 16 columns, and 15 attributes. The goal of the dataset is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is a multivariate dataset, meaning that each record involves several separate statistical variables. It is composed of 14 attributes: age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, oldpeak (ST depression induced by exercise relative to rest), the slope of the peak exercise ST segment, number of major vessels, thalassemia, and the predicted diagnosis attribute. The full database includes 76 attributes, but all published studies use a subset of 14 of them. The Cleveland database is the only one that has been used by ML researchers to date. One major task on this dataset is to predict, from a patient's attributes, whether that person has heart disease. The other is exploratory: to draw insights from the data that help in understanding the problem.
id: (Unique id for each patient)
age: (Age of the patient in years)
origin: (place of study)
sex: (Male/Female)
cp: chest pain type:
1. typical angina
2. atypical angina
3. non-anginal
4. asymptomatic
trestbps: resting blood pressure (in mm Hg on admission to the hospital)
chol: (serum cholesterol in mg/dl)
fbs: (if fasting blood sugar > 120 mg/dl)
restecg: (resting electrocardiographic results)
Values: [normal, stt abnormality, lv hypertrophy]
thalach: maximum heart rate achieved
exang: exercise-induced angina (True/ False)
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment
ca: number of major vessels (0-3) colored by fluoroscopy
thal: [normal; fixed defect; reversible defect]
num: the predicted attribute [0 indicates no disease; 1, 2, 3, and 4 indicate different levels of disease]
Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D.
University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., & Froelicher, V. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64, 304-310.
Aha, D. W., & Kibler, D. Instance-based prediction of heart-disease presence with the Cleveland database.
Gennari, J. H., Langley, P., & Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence, 40, 11-61.
The authors of the databases have requested that any publications resulting from the use of the data include the names of the principal investigator responsible for the data collection at each institution.
They would be:
Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D.
University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
This dataset integrates all the databases in the Heart Disease dataset available at the UCI Machine Learning Repository. The original contains 4 databases: Cleveland, Hungarian, Long Beach, and Switzerland. Most prior work has used the Cleveland dataset only.
The original dataset has 76 attributes, and which ones to select depends on one's needs. Here I've taken 10 attributes for the prediction.
https://creativecommons.org/publicdomain/zero/1.0/
Context: The leading cause of death in the developed world is heart disease. Therefore there needs to be work done to help prevent the risks of having a heart attack or stroke.
Content: Use this dataset to predict which patients are most likely to suffer from heart disease in the near future using the features given.
Acknowledgment: This data comes from the UCI at https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
This dataset was created by Saqlain Sheikh
This dataset was created by Priyanka