This dataset was created by Muneer Iqbal24
Released under CC0: Public Domain
This dataset was created by Priyanka
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This heart disease dataset was curated by combining 3 popular heart disease datasets. The first dataset (collected from Kaggle) contains 70,000 records with 11 independent features, which makes it the largest heart disease dataset available so far for research purposes. These data were collected at the time of medical examination and from information given by the patient. The second and third datasets contain 303 and 293 instances, respectively, with 13 common features. The three datasets used for its curation are: Cardio Data (Kaggle Dataset)
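Stacking sources that share only part of their columns can be sketched in pandas as follows. This assumes the column names of each source have already been harmonized to a shared vocabulary; the helper `combine_on_common_columns` and the `source` column are hypothetical, not part of the published dataset.

```python
import pandas as pd


def combine_on_common_columns(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack several source tables, keeping only the columns they all share."""
    # Tag every row with the name of the table it came from
    tagged = [df.assign(source=name) for name, df in frames.items()]
    # join="inner" keeps only the intersection of the columns
    return pd.concat(tagged, join="inner", ignore_index=True)


frames = {
    "cardio": pd.DataFrame({"age": [50], "sex": [1], "chol": [1]}),
    "cleveland": pd.DataFrame({"age": [63], "sex": [1], "thal": [6]}),
}
combined = combine_on_common_columns(frames)
print(combined)
```

Keeping a `source` column makes it possible to check later whether a model is picking up differences between the original collections rather than between patients.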
License (derived automatically): Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
Heart disease data from the UCI repository, combining records from the following places:
Cleveland, Hungary, Switzerland, and VA Long Beach
Features:
Age: Age of the individual (20-80).
Sex: Gender of the individual, represented as a binary value: 1 = male, 0 = female.
ChestPainType: Type of chest pain experienced by the individual:
Value 1: Typical angina, chest pain related to the heart.
Value 2: Atypical angina, chest pain that does not fully match the typical anginal pattern.
Value 3: Non-anginal pain, typically sharp and non-continuous.
Value 4: Asymptomatic, meaning the individual experiences no chest pain symptoms.
RestingBP: Resting blood pressure (mm Hg).
Cholesterol: Cholesterol level (mg/dl).
FastingBS: Whether fasting blood sugar is greater than 120 mg/dl, represented as a binary value: 1 = true, 0 = false.
MaxHR: Maximum heart rate achieved.
ExerciseAngina: Whether the individual experiences exercise-induced angina (chest pain), represented as a binary value: 1 = yes, 0 = no.
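A small sketch for turning the integer codes above into readable values. It assumes a pandas DataFrame whose columns are named exactly as in the feature list and coded as described (1-4 for ChestPainType, 0/1 for the binary fields); `decode_codes` is a hypothetical helper.

```python
import pandas as pd

# Code -> label mapping taken from the ChestPainType description above
CHEST_PAIN_LABELS = {
    1: "Typical angina",
    2: "Atypical angina",
    3: "Non-anginal pain",
    4: "Asymptomatic",
}


def decode_codes(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["ChestPainType"] = out["ChestPainType"].map(CHEST_PAIN_LABELS)
    out["Sex"] = out["Sex"].map({1: "Male", 0: "Female"})
    # 0/1 flags become booleans (1 = true / yes)
    for col in ("FastingBS", "ExerciseAngina"):
        out[col] = out[col].astype(bool)
    return out


sample = pd.DataFrame(
    {"ChestPainType": [4, 1], "Sex": [1, 0], "FastingBS": [0, 1], "ExerciseAngina": [1, 0]}
)
print(decode_codes(sample))
```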
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Adaptation of http://archive.ics.uci.edu/ml/datasets/Heart+Disease
Ready for use with ehrapy
License: https://choosealicense.com/licenses/cc/
Heart
The Heart dataset from the UCI ML repository. Does the patient have heart disease?
Configurations and tasks
Configuration → Task
hungary → Binary classification
Usage
```python
from datasets import load_dataset

# Load the "hungary" configuration and take its train split
dataset = load_dataset("mstz/heart", "hungary")["train"]
```
License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains cleaned clinical, demographic, and physiological attributes collected from patients undergoing medical evaluation for potential heart disease. It is widely used for predictive modeling in healthcare, particularly to identify whether a patient is likely to have heart disease based on diagnostic measurements.
The target variable (num) records the diagnosis on a 0-4 scale (0 = no heart disease, 1-4 = heart disease present); collapsing values 1-4 into a single positive class makes this dataset suitable for binary classification tasks.
Dataset Structure
Columns (features):
age → Age of the patient (years)
sex → Gender (Male / Female)
cp → Chest pain type
typical angina
atypical angina
non-anginal pain
asymptomatic
trestbps → Resting blood pressure (mm Hg)
chol → Serum cholesterol (mg/dl)
fbs → Fasting blood sugar (True if > 120 mg/dl, else False)
restecg → Resting electrocardiographic results (normal, lv hypertrophy, etc.)
thalch → Maximum heart rate achieved
exang → Exercise induced angina (True = yes, False = no)
oldpeak → ST depression induced by exercise relative to rest (numeric value)
num → Target variable (Heart disease diagnosis)
0 → No heart disease
1-4 → Heart disease present (severity levels)
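Because num ranges from 0 to 4, a binary target is obtained by treating every non-zero value as "heart disease present". A minimal pandas sketch, assuming the column is named `num` as listed above (`binarize_target` is a hypothetical helper):

```python
import pandas as pd


def binarize_target(df: pd.DataFrame, col: str = "num") -> pd.Series:
    """0 -> 0 (no heart disease); 1-4 -> 1 (heart disease present)."""
    return (df[col] > 0).astype(int)


y = binarize_target(pd.DataFrame({"num": [0, 1, 2, 0, 4]}))
print(y.tolist())  # [0, 1, 1, 0, 1]
print(y.value_counts(normalize=True))  # class balance after collapsing 1-4
```

This discards the severity information, so it is worth keeping the original num column when a multi-class model is also of interest.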
Use Cases
Predictive modeling for heart disease classification
Exploratory data analysis (EDA) of risk factors
Machine learning projects in healthcare analytics
Medical research on correlations between risk factors and heart disease
License (derived automatically): MIT License, https://opensource.org/licenses/MIT
❤️ Heart Disease Dataset (Enhanced with Feature Engineering)
📌 Overview
This dataset is an enhanced version of the classic UCI Heart Disease dataset, enriched with extensive feature engineering to support advanced data analysis and machine learning applications. In addition to the original clinical features, several derived variables have been introduced to provide deeper insights into cardiovascular risk patterns. These engineered features allow for improved predictive… See the full description on the dataset page: https://huggingface.co/datasets/nezahatkorkmaz/heart-disease-dataset.
License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Original Data from: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Changes made:
- four rows with missing values were removed, leaving 299 records
- Chest Pain Type, Restecg, and Thal variables were converted to indicator variables
- class attribute binarised to -1 (no disease) / +1 (disease; original values 1-4)
Attributes (cts = continuous):
Col 0: CLASS: -1 = no disease, +1 = disease
Col 1: Age (cts)
Col 2: Sex (0/1)
Col 3: indicator (0/1) for typical angina
Col 4: indicator for atypical angina
Col 5: indicator for non-anginal pain
Col 6: resting blood pressure (cts)
Col 7: serum cholesterol (cts)
Col 8: fasting blood sugar > 120 mg/dl (0/1)
Col 9: indicator for electrocardiographic value 1
Col 10: indicator for electrocardiographic value 2
Col 11: maximum heart rate (cts)
Col 12: exercise-induced angina (0/1)
Col 13: ST depression induced by exercise (cts)
Col 14: indicator for upward slope of the peak exercise segment
Col 15: indicator for downward slope of the peak exercise segment
Col 16: number of major vessels colored by fluoroscopy (roughly continuous: 0, 1, 2, 3)
Col 17: Thal reversible defect indicator
Col 18: Thal fixed defect indicator
Col 19: Class 0-4, where 0 = disease not present and 1-4 = disease present
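The conversion described above can be sketched in pandas. This is an illustration, not the curator's script: it assumes the raw Cleveland columns use the standard UCI attribute names (`cp`, `restecg`, `slope`, `thal`, `num`, and so on) and codes (cp 1-4 with 4 = asymptomatic, restecg 0-2, slope 1 = up / 2 = flat / 3 = down, thal 3 = normal / 6 = fixed / 7 = reversible). The output column names are hypothetical. Each categorical variable keeps one level as an implicit baseline, which is why four chest-pain types need only three indicators.

```python
import pandas as pd


def encode_like_card(raw: pd.DataFrame) -> pd.DataFrame:
    """Build the 20-column layout (Col 0 .. Col 19) from raw UCI Cleveland columns."""
    raw = raw.dropna()  # drop rows with any missing value
    out = pd.DataFrame(index=raw.index)
    out["class"] = (raw["num"] > 0).astype(int) * 2 - 1        # Col 0: -1 / +1
    out["age"] = raw["age"]                                    # Col 1
    out["sex"] = raw["sex"]                                    # Col 2
    out["cp_typical"] = (raw["cp"] == 1).astype(int)           # Col 3
    out["cp_atypical"] = (raw["cp"] == 2).astype(int)          # Col 4
    out["cp_nonanginal"] = (raw["cp"] == 3).astype(int)        # Col 5 (cp == 4 is the baseline)
    out["trestbps"] = raw["trestbps"]                          # Col 6
    out["chol"] = raw["chol"]                                  # Col 7
    out["fbs"] = raw["fbs"]                                    # Col 8
    out["restecg_1"] = (raw["restecg"] == 1).astype(int)       # Col 9 (restecg == 0 is the baseline)
    out["restecg_2"] = (raw["restecg"] == 2).astype(int)       # Col 10
    out["thalach"] = raw["thalach"]                            # Col 11
    out["exang"] = raw["exang"]                                # Col 12
    out["oldpeak"] = raw["oldpeak"]                            # Col 13
    out["slope_up"] = (raw["slope"] == 1).astype(int)          # Col 14 (flat is the baseline)
    out["slope_down"] = (raw["slope"] == 3).astype(int)        # Col 15
    out["ca"] = raw["ca"]                                      # Col 16
    out["thal_reversible"] = (raw["thal"] == 7).astype(int)    # Col 17 (normal is the baseline)
    out["thal_fixed"] = (raw["thal"] == 6).astype(int)         # Col 18
    out["num"] = raw["num"]                                    # Col 19: original 0-4 class
    return out


# Two illustrative rows in the raw UCI layout
raw = pd.DataFrame({
    "age": [63, 67], "sex": [1, 1], "cp": [1, 4], "trestbps": [145, 160],
    "chol": [233, 286], "fbs": [1, 0], "restecg": [2, 2], "thalach": [150, 108],
    "exang": [0, 1], "oldpeak": [2.3, 1.5], "slope": [3, 2], "ca": [0.0, 3.0],
    "thal": [6.0, 3.0], "num": [0, 2],
})
encoded = encode_like_card(raw)
print(encoded.shape)  # (2, 20)
```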
This dataset was created by Nishant Bansal
License (derived automatically): Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
Heart Disease Data: Enhanced with Feature Engineering for Advanced Analysis
This dataset is an advanced version of the classic UCI Machine Learning heart disease dataset, enriched with feature engineering to support more sophisticated analyses. The original features have been supplemented with newly derived attributes that help to better understand and model cardiovascular risk factors.
These additional features facilitate a deeper analysis of cardiovascular health by incorporating various derived metrics and categorizations, enhancing the overall utility of the dataset for predictive modeling and data exploration.
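The card does not list its derived attributes, so the following only illustrates the two kinds it mentions: a categorization and a derived metric. The column names `age`, `chol`, and `thalach` follow the UCI naming; the age bin edges, the 200/240 mg/dl cholesterol cut-offs, the age-predicted maximum heart rate (220 - age), and the helper `add_derived_features` are assumptions for illustration, not the dataset's actual engineered features.

```python
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Categorization: age bracket (bin edges are an illustrative choice; intervals are [a, b))
    out["age_group"] = pd.cut(out["age"], bins=[0, 40, 55, 70, 120],
                              labels=["<40", "40-54", "55-69", "70+"], right=False)
    # Categorization: cholesterol bands using common mg/dl cut-offs (200 and 240)
    out["chol_band"] = pd.cut(out["chol"], bins=[0, 200, 240, float("inf")],
                              labels=["desirable", "borderline", "high"], right=False)
    # Derived metric: fraction of the age-predicted maximum heart rate (220 - age) reached
    out["hr_pct_of_predicted"] = out["thalach"] / (220 - out["age"])
    return out


sample = pd.DataFrame({"age": [45, 63], "chol": [199, 250], "thalach": [150, 150]})
print(add_derived_features(sample))
```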
Source: https://archive.ics.uci.edu/ml/datasets/heart+Disease
The UCI Heart Disease Dataset contains a total of 76 attributes, but all published experiments refer to a subset of 14 of them, and the Cleveland database is the only one that ML researchers have used. The "goal" field refers to whether a patient has heart disease, and experiments on the Cleveland database have focused on distinguishing presence (values 1, 2, 3, 4) from absence (value 0).
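Reading the 14-attribute subset can be sketched with pandas, assuming the comma-separated, header-less layout of the repository's processed files (for example `processed.cleveland.data`), where `?` marks a missing value. The column names are the conventional UCI ones, with `num` being the diagnosis field described above; `read_processed` is a hypothetical helper.

```python
import io

import pandas as pd

# The 14 attributes used in published experiments, in file order
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def read_processed(source) -> pd.DataFrame:
    """Read a header-less, comma-separated file; '?' is parsed as NaN."""
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


# Two illustrative lines; the second has a missing 'ca' value
sample = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,?,3.0,2\n"
)
df = read_processed(sample)
print(df.isna().any(axis=1).sum(), "row(s) with missing values")
```

Counting rows with missing values before modeling matters here because several derived versions of this dataset differ precisely in how they handled those rows.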
This dataset was created by Ronak Kantariya
Released under CC0: Public Domain
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This heart disease dataset was curated by combining 5 popular heart disease datasets that were already available independently but had not been combined before. In this dataset…
Author: H. Altay Guvenir, Burak Acar, Haldun Muderrisoglu
Source: UCI
Please cite: UCI
Cardiac Arrhythmia Database
The aim is to determine the type of arrhythmia from the ECG recordings. This database contains 279 attributes, 206 of which are linear valued and the rest are nominal.
Concerning the study of H. Altay Guvenir: "The aim is to distinguish between the presence and absence of cardiac arrhythmia and to classify it in one of the 16 groups. Class 01 refers to 'normal' ECG classes, 02 to 15 refer to different classes of arrhythmia, and class 16 refers to the rest of the unclassified ones. For the time being, there exists a computer program that makes such a classification. However, there are differences between the cardiologist's and the program's classification. Taking the cardiologist's as a gold standard, we aim to minimize this difference by means of machine learning tools."
The names and id numbers of the patients were recently removed from the database.
1 Age: Age in years, linear
2 Sex: Sex (0 = male; 1 = female), nominal
3 Height: Height in centimeters, linear
4 Weight: Weight in kilograms, linear
5 QRS duration: Average of QRS duration in msec., linear
6 P-R interval: Average duration between onset of P and Q waves in msec., linear
7 Q-T interval: Average duration between onset of Q and offset of T waves in msec., linear
8 T interval: Average duration of T wave in msec., linear
9 P interval: Average duration of P wave in msec., linear
Vector angles in degrees on front plane of (linear):
10 QRS
11 T
12 P
13 QRST
14 J
15 Heart rate: Number of heart beats per minute, linear
Of channel DI:
Average width, in msec., of (linear):
16 Q wave
17 R wave
18 S wave
19 R' wave, small peak just after R
20 S' wave
21 Number of intrinsic deflections, linear
22 Existence of ragged R wave, nominal
23 Existence of diphasic derivation of R wave, nominal
24 Existence of ragged P wave, nominal
25 Existence of diphasic derivation of P wave, nominal
26 Existence of ragged T wave, nominal
27 Existence of diphasic derivation of T wave, nominal
Of channel DII:
28 .. 39 (similar to 16 .. 27 of channel DI)
Of channel DIII:
40 .. 51
Of channel AVR:
52 .. 63
Of channel AVL:
64 .. 75
Of channel AVF:
76 .. 87
Of channel V1:
88 .. 99
Of channel V2:
100 .. 111
Of channel V3:
112 .. 123
Of channel V4:
124 .. 135
Of channel V5:
136 .. 147
Of channel V6:
148 .. 159
Of channel DI:
Amplitude, in units of 0.1 millivolt, of:
160 JJ wave, linear
161 Q wave, linear
162 R wave, linear
163 S wave, linear
164 R' wave, linear
165 S' wave, linear
166 P wave, linear
167 T wave, linear
168 QRSA: Sum of areas of all segments divided by 10 (Area = width * height / 2), linear
169 QRSTA = QRSA + 0.5 * width of T wave * 0.1 * height of T wave (if T is diphasic, the bigger segment is considered), linear
Of channel DII:
170 .. 179
Of channel DIII:
180 .. 189
Of channel AVR:
190 .. 199
Of channel AVL:
200 .. 209
Of channel AVF:
210 .. 219
Of channel V1:
220 .. 229
Of channel V2:
230 .. 239
Of channel V3:
240 .. 249
Of channel V4:
250 .. 259
Of channel V5:
260 .. 269
Of channel V6:
270 .. 279
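The attribute numbering above follows a fixed stride per channel: 12 width and morphology attributes per channel starting at 16, and 10 amplitude attributes per channel starting at 160. Together with the 15 global attributes this gives 15 + 12 × 22 = 279. A small sketch that reproduces the ranges (1-based, as in the listing; subtract 1 for 0-based column positions); `channel_attributes` is a hypothetical helper.

```python
# Channel order as listed above
CHANNELS = ["DI", "DII", "DIII", "AVR", "AVL", "AVF",
            "V1", "V2", "V3", "V4", "V5", "V6"]


def channel_attributes(channel: str) -> dict[str, range]:
    """Return the 1-based attribute numbers belonging to one ECG channel."""
    i = CHANNELS.index(channel)
    return {
        # widths and ragged/diphasic flags: 12 attributes per channel (16 .. 27 for DI)
        "width_and_morphology": range(16 + 12 * i, 28 + 12 * i),
        # amplitudes plus QRSA and QRSTA: 10 attributes per channel (160 .. 169 for DI)
        "amplitude": range(160 + 10 * i, 170 + 10 * i),
    }


print(channel_attributes("DII"))
# {'width_and_morphology': range(28, 40), 'amplitude': range(170, 180)}
```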
Class code - class - number of instances:
01 - Normal - 245
02 - Ischemic changes (Coronary Artery Disease) - 44
03 - Old Anterior Myocardial Infarction - 15
04 - Old Inferior Myocardial Infarction - 15
05 - Sinus tachycardia - 13
06 - Sinus bradycardia - 25
07 - Ventricular Premature Contraction (PVC) - 3
08 - Supraventricular Premature Contraction - 2
09 - Left bundle branch block - 9
10 - Right bundle branch block - 50
11 - 1st degree atrioventricular block - 0
12 - 2nd degree AV block - 0
13 - 3rd degree AV block - 0
14 - Left ventricular hypertrophy - 4
15 - Atrial Fibrillation or Flutter - 5
16 - Others - 22
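The class table above can be summarized with a few lines of arithmetic. This shows the total number of records, the classes that have no instances at all (which matters for stratified splitting), and the normal vs. arrhythmia split used in the presence/absence task quoted earlier. The counts are transcribed from the table; the variable names are illustrative.

```python
# Instance counts per class code, transcribed from the table above
CLASS_COUNTS = {
    1: 245, 2: 44, 3: 15, 4: 15, 5: 13, 6: 25, 7: 3, 8: 2,
    9: 9, 10: 50, 11: 0, 12: 0, 13: 0, 14: 4, 15: 5, 16: 22,
}

total = sum(CLASS_COUNTS.values())                              # 452 records
empty_classes = [c for c, n in CLASS_COUNTS.items() if n == 0]  # [11, 12, 13]
normal_share = CLASS_COUNTS[1] / total                          # about 0.542
abnormal = total - CLASS_COUNTS[1]                              # 207 non-normal records

print(total, empty_classes, round(normal_share, 3), abnormal)
```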
License (derived automatically): Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
This synthetic dataset is designed to predict the risk of heart disease based on a combination of symptoms, lifestyle factors, and medical history. Each row in the dataset represents a patient, with binary (Yes/No) indicators for symptoms and risk factors, along with a computed risk label indicating whether the patient is at high or low risk of developing heart disease.
The dataset contains 70,000 samples, making it suitable for training machine learning models for classification tasks. The goal is to provide researchers, data scientists, and healthcare professionals with a clean and structured dataset to explore predictive modeling for cardiovascular health.
This dataset is a side project of EarlyMed, developed by students of Vellore Institute of Technology (VIT-AP). EarlyMed aims to leverage data science and machine learning for early detection and prevention of chronic diseases.
chest_pain: Presence of chest pain, a common symptom of heart disease.
shortness_of_breath: Difficulty breathing, often associated with heart conditions.
fatigue: Persistent tiredness without an obvious cause.
palpitations: Irregular or rapid heartbeat.
dizziness: Episodes of lightheadedness or fainting.
swelling: Swelling due to fluid retention, often linked to heart failure.
radiating_pain: Radiating pain, a hallmark of angina or heart attacks.
cold_sweats: Symptoms commonly associated with acute cardiac events.
age: Patient's age in years (continuous variable).
hypertension: History of hypertension (Yes/No).
cholesterol_high: Elevated cholesterol levels (Yes/No).
diabetes: Diagnosis of diabetes (Yes/No).
smoker: Whether the patient is a smoker (Yes/No).
obesity: Obesity status (Yes/No).
family_history: Family history of cardiovascular conditions (Yes/No).
risk_label: Binary label indicating the risk of heart disease:
0: Low risk
1: High risk
This dataset was synthetically generated using Python libraries such as numpy and pandas. The generation process ensured a balanced distribution of high-risk and low-risk cases while maintaining realistic correlations between features. For example:
- Patients with multiple risk factors (e.g., smoking, hypertension, and diabetes) were more likely to be labeled as high risk.
- Symptom patterns were modeled after clinical guidelines and research studies on heart disease.
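The card does not publish its generator, so the following is only a hypothetical sketch of the approach it describes: numpy and pandas, symptoms that grow more likely as risk factors accumulate, a label that favors patients with multiple risk factors, and a balanced class split. The feature names come from the list above; the probabilities, weights, noise level, and median threshold are assumptions, and `generate` is an illustrative helper, not EarlyMed's code.

```python
import numpy as np
import pandas as pd

# Column names taken from the card; all numbers below are illustrative assumptions
RISK_FACTORS = ["hypertension", "cholesterol_high", "diabetes",
                "smoker", "obesity", "family_history"]
SYMPTOMS = ["chest_pain", "shortness_of_breath", "fatigue", "palpitations",
            "dizziness", "swelling", "radiating_pain", "cold_sweats"]


def generate(n: int, seed: int = 0) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({"age": rng.integers(20, 85, size=n)})
    for col in RISK_FACTORS:
        df[col] = rng.integers(0, 2, size=n)  # 1 = Yes, 0 = No
    n_risk = df[RISK_FACTORS].sum(axis=1)
    # Symptoms become more likely as risk factors accumulate (10% baseline, +10% per factor)
    p_symptom = (0.1 + 0.1 * n_risk).to_numpy()
    for col in SYMPTOMS:
        df[col] = (rng.random(n) < p_symptom).astype(int)
    # Latent score: risk factors, symptoms and age plus Gaussian noise (arbitrary weights)
    score = (n_risk + 0.5 * df[SYMPTOMS].sum(axis=1)
             + (df["age"] - 20) / 20 + rng.normal(0.0, 1.0, size=n))
    # Splitting at the median yields a balanced 50/50 label distribution
    df["risk_label"] = (score > np.median(score)).astype(int)
    return df


data = generate(70_000)
print(data["risk_label"].mean())  # close to 0.5 by construction
```

Thresholding at the median is one simple way to force balance; it also means the label is defined relative to the generated population rather than by an absolute clinical cut-off.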
The design of this dataset was inspired by the following resources:
This dataset can be used for a variety of purposes:
Machine Learning Research:
Healthcare Analytics:
Educational Purposes:
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Heart Disease Dataset from UCI Repository
License: Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
This dataset dates from 1988 and consists of four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 9 attributes and is a shorter version of the original dataset. The "target" field refers to the presence of heart disease in the patient. It is integer-valued: 0 = no disease, 1 = disease. The original data can be found here: https://archive.ics.uci.edu/ml/datasets/heart+Disease
This dataset was created by QuangNguyen711
Released under MIT