This dataset was created by Nishant Bansal
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Heart disease data combined from the UCI repository, collected at the following sites:
Cleveland, Hungary, Switzerland, and VA Long Beach
Features:
Age: Age of the individual (20-80).
Sex: Gender of the individual, encoded as a binary value (1 = male, 0 = female).
ChestPainType: Type of chest pain experienced by the individual. The values are:
Value 1: Typical angina, which is chest pain related to the heart.
Value 2: Atypical angina, which is chest pain not related to the heart.
Value 3: Non-anginal pain, which is typically sharp and non-continuous.
Value 4: Asymptomatic, meaning the individual experiences no symptoms.
RestingBP: Resting blood pressure, in mm Hg.
Cholesterol: Cholesterol level, in mg/dl.
FastingBS: Whether fasting blood sugar is greater than 120 mg/dl, encoded as a binary value (1 = true, 0 = false).
MaxHR: Maximum heart rate achieved.
ExerciseAngina: Whether the individual experiences angina (chest pain) induced by exercise, encoded as a binary value (1 = yes, 0 = no).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This heart disease dataset was curated by combining 3 popular heart disease datasets. The first dataset (collected from Kaggle) contains 70000 records with 11 independent features, which its curator describes as the largest heart disease dataset available so far for research purposes. Its data were collected at the time of medical examination and include information given by the patient. The second and third datasets contain 303 and 293 instances respectively, with 13 common features. The three datasets used for its curation are: Cardio Data (Kaggle Dataset)
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Original data from: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Changes made:
- four rows with missing values were removed, leaving 299 records
- Chest Pain Type, Restecg, and Thal variables were converted to indicator variables
- class attribute binarised to -1 (no disease) / +1 (disease; original values 1-4)
Attributes:
Col 0: CLASS: -1: no disease, +1: disease
Col 1: Age (continuous)
Col 2: Sex (0/1)
Col 3: indicator (0/1) for typical angina
Col 4: indicator for atypical angina
Col 5: indicator for non-anginal pain
Col 6: resting blood pressure (continuous)
Col 7: serum cholesterol (continuous)
Col 8: fasting blood sugar > 120 mg/dl (0/1)
Col 9: indicator for electrocardiographic value 1
Col 10: indicator for electrocardiographic value 2
Col 11: max heart rate (continuous)
Col 12: exercise-induced angina (0/1)
Col 13: ST depression induced by exercise (continuous)
Col 14: indicator for slope of peak exercise up
Col 15: indicator for slope of peak exercise down
Col 16: number of major vessels colored by fluoroscopy (roughly continuous: 0, 1, 2, 3)
Col 17: Thal reversible defect indicator
Col 18: Thal fixed defect indicator
Col 19: Class 0-4, where 0 is disease not present and 1-4 is disease present
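The changes listed above can be sketched in a few lines of Python. This is only an illustration, not the script the dataset's author used. It assumes the raw record has 14 fields in the usual UCI order, with "?" marking a missing value and the original UCI codes (cp 1-4, slope 1 = up and 3 = down, thal 6 = fixed and 7 = reversible). The function name `encode_row` and the sample record are hypothetical.

```python
def encode_row(raw):
    """Map one 14-field raw record to the 20-column layout (Col 0..19).

    Assumed input order: age, sex, cp, trestbps, chol, fbs, restecg,
    thalach, exang, oldpeak, slope, ca, thal, num.
    Returns None when any field is missing (assumed to be marked '?').
    """
    if any(str(v).strip() == "?" for v in raw):
        return None  # rows with missing values are dropped
    (age, sex, cp, trestbps, chol, fbs, restecg, thalach,
     exang, oldpeak, slope, ca, thal, num) = (float(v) for v in raw)
    label = 1 if num > 0 else -1  # binarised class: +1 disease, -1 no disease
    return [
        label, age, sex,
        int(cp == 1), int(cp == 2), int(cp == 3),   # chest pain indicators
        trestbps, chol, fbs,
        int(restecg == 1), int(restecg == 2),       # ECG indicators
        thalach, exang, oldpeak,
        int(slope == 1), int(slope == 3),           # slope up / slope down
        ca,
        int(thal == 7), int(thal == 6),             # reversible / fixed defect
        int(num),                                   # original class 0-4
    ]


# Illustrative record (values chosen for the example only).
example = ["63", "1", "1", "145", "233", "1", "2", "150", "0", "2.3", "3", "0", "6", "0"]
encoded = encode_row(example)
print(encoded)
```

Asymptomatic chest pain, normal ECG, flat slope, and normal thal act as the reference levels, so each is represented by all of its indicators being 0.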
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Heart Disease Dataset from UCI Repository
This is just the Cleveland Heart Disease dataset from UCI
The Heart-Disease-Dataset database consists of 76 attributes, but only a subset of 14 attributes has been utilized in all published experiments thus far. Among these experiments, ML researchers have exclusively employed the Cleveland database. The attribute labeled "goal" indicates the presence of heart disease in a patient and is represented by an integer ranging from 0 (indicating no presence) to 4. Previous studies conducted using the Cleveland database have primarily focused on distinguishing between the presence (values 1, 2, 3, 4) and absence (value 0) of heart disease.
This dataset integrates all the databases present in the Heart Disease Dataset available at the UCI Machine Learning Repository. The original contains 4 databases: Cleveland, Hungarian, Long Beach, and Switzerland. Most of the work has been done using the Cleveland dataset only.
Originally there are 76 attributes in the dataset; the selection of attributes depends on one's needs. Here I've taken 10 attributes for the prediction.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is a multivariate dataset, meaning it involves several separate statistical variables. It is composed of 14 attributes: age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, oldpeak (ST depression induced by exercise relative to rest), the slope of the peak exercise ST segment, number of major vessels, thalassemia, and the predicted attribute. The full database includes 76 attributes, but all published studies use a subset of 14 of them. The Cleveland database is the only one used by ML researchers to date. One major task on this dataset is to predict, from a patient's attributes, whether that person has heart disease. The other is exploratory: to find insights in the data that could help in understanding the problem.
id: (Unique id for each patient)
age: (Age of the patient in years)
origin: (place of study)
sex: (Male/Female)
cp: chest pain type:
1. typical angina
2. atypical angina
3. non-anginal
4. asymptomatic
trestbps: resting blood pressure (in mm Hg on admission to the hospital)
chol: (serum cholesterol in mg/dl)
fbs: (if fasting blood sugar > 120 mg/dl)
restecg: (resting electrocardiographic results)
Values: [normal, stt abnormality, lv hypertrophy]
thalach: maximum heart rate achieved
exang: exercise-induced angina (True/ False)
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment
ca: number of major vessels (0-3) colored by fluoroscopy
thal: [normal; fixed defect; reversible defect]
num: the predicted attribute [0 indicates no disease; 1, 2, 3 and 4 indicate different levels of disease]
Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., & Froelicher, V. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64, 304-310.
David W. Aha & Dennis Kibler. "Instance-based prediction of heart-disease presence with the Cleveland database."
Gennari, J.H., Langley, P., & Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence, 40, 11-61.
The authors of the databases have requested that any publications resulting from the use of the data include the names of the principal investigator responsible for the data collection at each institution.
They would be:
Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
https://archive.ics.uci.edu/ml/datasets/heart+Disease
The UCI Heart Disease Dataset contains a total of 76 attributes, but all published experiments refer to a subset of 14 attributes, and the Cleveland database is the only one ML researchers have used. The "goal" field refers to whether a patient has heart disease or not, and the experiments on the Cleveland database focused on trying to distinguish between presence (values 1, 2, 3, 4) and absence (value 0).
This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by Machine Learning researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0).
Source: https://archive.ics.uci.edu/ml/datasets/heart+disease
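The presence/absence reduction of the "goal" field can be sketched as follows. The helper name `binarize_goal` and the sample values are made up for illustration; only the 0 to 4 range and the "values 1-4 mean presence" rule come from the description above.

```python
from collections import Counter


def binarize_goal(goal):
    """Collapse the 0-4 goal field to 0 (absence) or 1 (presence)."""
    if goal not in (0, 1, 2, 3, 4):
        raise ValueError(f"goal must be an integer from 0 to 4, got {goal!r}")
    return int(goal > 0)


# Hypothetical goal values, not taken from the dataset.
goals = [0, 2, 1, 0, 4, 3, 0]
labels = [binarize_goal(g) for g in goals]
print(labels)
print(Counter(labels))  # class balance after binarisation
```

Rejecting out-of-range values early makes it harder to silently treat a mis-parsed column as the target.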
https://www.reddit.com/wiki/api
The data is already available at https://www.kaggle.com/ronitf/heart-disease-uci, but some of the descriptions and values there are wrong, as discussed in https://www.kaggle.com/ronitf/heart-disease-uci/discussion/105877. So here is a re-processed dataset that was cross-checked with the original data at https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
There are 13 attributes:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
-- Value 0: typical angina
-- Value 1: atypical angina
-- Value 2: non-anginal pain
-- Value 3: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
-- Value 0: normal
-- Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
-- Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
-- Value 0: upsloping
-- Value 1: flat
-- Value 2: downsloping
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
and the label
14. condition: 0 = no disease, 1 = disease
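To make the coding above easier to read back, here is a minimal sketch that turns one coded record into labels. The `CODEBOOK` dictionary simply restates the values listed above. The `decode` helper and the sample record are hypothetical and not part of the dataset.

```python
# Codes as listed in the attribute description above (0-based coding).
CODEBOOK = {
    "sex": {0: "female", 1: "male"},
    "cp": {0: "typical angina", 1: "atypical angina",
           2: "non-anginal pain", 3: "asymptomatic"},
    "fbs": {0: "<= 120 mg/dl", 1: "> 120 mg/dl"},
    "restecg": {0: "normal", 1: "ST-T wave abnormality",
                2: "left ventricular hypertrophy"},
    "exang": {0: "no", 1: "yes"},
    "slope": {0: "upsloping", 1: "flat", 2: "downsloping"},
    "thal": {0: "normal", 1: "fixed defect", 2: "reversible defect"},
    "condition": {0: "no disease", 1: "disease"},
}


def decode(record):
    """Replace coded values with labels; numeric attributes pass through.

    An unknown code for a categorical attribute raises KeyError on purpose.
    """
    decoded = {}
    for name, value in record.items():
        labels = CODEBOOK.get(name)
        decoded[name] = labels[value] if labels is not None else value
    return decoded


# Made-up record for illustration only.
record = {"age": 54, "sex": 1, "cp": 3, "trestbps": 130, "chol": 250,
          "fbs": 0, "restecg": 0, "thalach": 160, "exang": 0,
          "oldpeak": 1.2, "slope": 1, "ca": 0, "thal": 2, "condition": 1}
readable = decode(record)
print(readable)
```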
Data posted on Kaggle: https://www.kaggle.com/ronitf/heart-disease-uci
Description of the data above: https://www.kaggle.com/ronitf/heart-disease-uci/discussion/105877
Original data: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Creators:
Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779
With the attributes described above, can you predict if a patient has heart disease?
Dataset is provided by the Cleveland Clinic Foundation for Heart Disease.
The dataset was downloaded from this link: http://storage.googleapis.com/download.tensorflow.org/data/heart.csv.
https://cdla.io/permissive-1-0/
This dataset contains measurements related to heart conditions, such as cholesterol, blood sugar, heart rate, vessel depression, and diagnosis. The data were collected in four different locations: Cleveland, Switzerland, Hungary, and the VA Long Beach. It is a useful dataset for classification tasks. Note: Refrain from inferring any medical conclusions from the dataset's findings.
Cited from: Janosi, Andras, Steinbrunn, William, Pfisterer, Matthias, and Detrano, Robert. (1988). Heart Disease. UCI Machine Learning Repository. https://doi.org/10.24432/C52P4X.
The dataset used can be found on the UCI Machine Learning Repository at the following location:
There are several copies of this dataset to be found on Kaggle, with people focusing on different types of analyses of the data. This specific copy can be analysed by anyone interested, but is primarily used by a study group from the Udacity Bertelsmann Technology Scholarship to practice analysis of association between variables as well as implementation and comparison of various Machine Learning models.
According to the paper by Detrano et al. (1989), as found on the UCI dataset webpage, the data were collected from 303 patients referred for coronary angiography at the Cleveland Clinic between May 1981 and September 1984. The 13 independent variables (features) can be divided into 3 groups as follows:
Routine evaluation (based on historical data):
Non-invasive test data (informed consent obtained for data as part of research protocol):
Other demographic and clinical variables (based on routine data):
[Image: data dictionary (data_dictionary2.png)]
UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779
The objective of the analysis is to use statistical learning to identify factors associated with Coronary Artery Disease as indicated by a coronary angiography interpreted by a Cardiologist (as per paper written by Detrano et al cited before).
This dataset consists of preprocessed heart disease data from the Cleveland database; the original data is from the UCI repository.
https://creativecommons.org/publicdomain/zero/1.0/
This heart disease dataset is a curated combination of five widely used heart disease datasets, previously available independently on the UCI Machine Learning Repository. For the first time, these datasets have been merged based on 11 common clinical features, creating the largest unified heart disease dataset currently available for research.
This consolidated dataset supports more robust training and evaluation of machine learning models for heart disease prediction. By bringing together diverse sources, it enables broader generalization, better pattern detection, and ultimately aims to contribute to early diagnosis and clinical decision-making in cardiology.
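Merging on common features can be sketched roughly as below. This is an assumption-laden illustration, not the curator's actual procedure. It treats each source as a non-empty list of dictionaries, keeps only the columns present in every source, and tags each row with its origin. The function name, the "source" column, and the toy data are all hypothetical.

```python
def merge_on_common_features(datasets):
    """Concatenate sources, keeping only columns shared by all of them.

    `datasets` maps a source name to a non-empty list of row dictionaries.
    Column order follows the first source.
    """
    names = list(datasets)
    common = set.intersection(*(set(datasets[n][0]) for n in names))
    columns = [c for c in datasets[names[0]][0] if c in common]
    merged = []
    for name in names:
        for row in datasets[name]:
            # Record the origin so site effects can be examined later.
            merged.append({"source": name, **{c: row[c] for c in columns}})
    return columns, merged


# Toy data for illustration only.
toy = {
    "site_a": [{"age": 63, "sex": 1, "chol": 233, "thal": 6}],
    "site_b": [{"age": 41, "sex": 0, "chol": 204},
               {"age": 57, "sex": 1, "chol": 192}],
}
columns, merged = merge_on_common_features(toy)
print(columns)
print(len(merged))
```

Keeping a source column is a design choice: the same patient records may appear in more than one published dataset, so duplicates are worth checking after a merge like this.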
Coronary heart disease (CHD), also called coronary artery disease (CAD), involves the reduction of blood flow to the heart muscle due to build-up of plaque in the arteries of the heart. It is the most common form of cardiovascular disease. Currently, invasive coronary angiography is the gold standard for establishing the presence, location, and severity of CAD; however, this diagnostic method is costly and associated with morbidity and mortality in CAD patients. Therefore, it would be beneficial to develop a non-invasive alternative to replace the current gold standard.
Other, less invasive diagnostic methods have been proposed in the scientific literature, including exercise electrocardiogram, thallium scintigraphy, and fluoroscopy of coronary calcification. However, the diagnostic accuracy of these tests only ranges between 35% and 75%. Therefore, it would be beneficial to develop a computer-aided diagnostic tool that combines the results of these non-invasive tests with other patient attributes to boost their diagnostic power, with the aim of ultimately replacing the current invasive gold standard.
In this vein (pun intended), the following dataset comprises 303 observations, 13 features, and 1 target attribute. The 13 features include the results of the aforementioned non-invasive diagnostic tests along with other relevant patient information. The target variable is the result of the invasive coronary angiogram, which represents the presence or absence of coronary artery disease in the patient, with 0 representing absence of CHD and labels 1-4 representing presence of CHD. Most research using this dataset has concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0).
The data was collected by Robert Detrano, M.D., Ph.D of the Cleveland Clinic Foundation. See here for protocol specifics.
Also, this paper provides a good summary of the dataset context.
The data set was downloaded from the UCI website.
Attribute Information:
Robert Detrano, M.D., Ph.D.: Principal investigator responsible for collecting data
Diagnosis of Coronary Heart Disease by non-invasive means.
https://www.reddit.com/wiki/api
This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4.
Attribute Information:
- age
- sex
- chest pain type (4 values)
- resting blood pressure
- serum cholesterol in mg/dl
- fasting blood sugar > 120 mg/dl
- resting electrocardiographic results (values 0,1,2)
- maximum heart rate achieved
- exercise induced angina
- oldpeak = ST depression induced by exercise relative to rest
- the slope of the peak exercise ST segment
- number of major vessels (0-3) colored by fluoroscopy
- thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
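Because thal is coded 3/6/7 here, a small recoding step is often handy before modelling. The sketch below assumes a comma-separated file with the 13 attributes in the order listed above followed by the goal field, and "?" marking missing values. The function name, the 0/1/2 target codes, and the two sample lines are illustrative assumptions.

```python
import csv
import io

# Map the 3/6/7 thal codes to consecutive integers (an illustrative choice).
THAL_REMAP = {3: 0, 6: 1, 7: 2}
THAL_INDEX = 12  # thal is assumed to be the 13th field (0-based index 12)


def parse_fields(fields):
    """Convert one row of strings to floats, None for '?', and recode thal."""
    values = [None if f.strip() == "?" else float(f) for f in fields]
    thal = values[THAL_INDEX]
    if thal is not None:
        values[THAL_INDEX] = THAL_REMAP[int(thal)]
    return values


# Two made-up lines in the assumed layout; the second has a missing 'ca'.
sample = (
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "52.0,1.0,3.0,172.0,199.0,1.0,0.0,162.0,0.0,0.5,1.0,?,7.0,0\n"
)
rows = [parse_fields(r) for r in csv.reader(io.StringIO(sample))]
print(rows[0][THAL_INDEX], rows[1][THAL_INDEX], rows[1][11])
```

Missing values are kept as None rather than dropped, so the choice between removing and imputing those rows stays with the analyst.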
The names and social security numbers of the patients were recently removed from the database, replaced with dummy values. One file has been "processed", that one containing the Cleveland database. All four unprocessed files also exist in this directory.
To see Test Costs (donated by Peter Turney), please see the folder "Costs"
Creators:
1. Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
2. University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
3. University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
4. V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779
Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0).
See if you can find any other trends in heart data to predict certain cardiovascular events or find any clear indications of heart health.