72 datasets found

UCI Machine Learning Repository
dknet.org
rrid.site
Cite
UCI Machine Learning Repository [Dataset]. http://identifiers.org/RRID:SCR_026571
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_026571
Description
Collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms. Datasets approved for the repository are assigned a Digital Object Identifier (DOI) if they do not already have one. Datasets are licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows sharing and adaptation of the datasets for any purpose, provided that appropriate credit is given.
UCI-dataset
kaggle.com
zip
Updated Aug 17, 2022
Cite
Md Waquar Azam (2022). UCI-dataset [Dataset]. https://www.kaggle.com/datasets/mdwaquarazam/ucidatasetlist
Explore at:
Available download formats: zip (20774 bytes)
Dataset updated
Aug 17, 2022
Authors
Md Waquar Azam
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is a list of the datasets provided by the UCI ML repository. If you are a learner and want data filtered by year, category, profession, or some other criterion, you can search for it here.

There are 8 columns in the dataset, which give all the details: --link --Data-Name --data type --default task --attribute-type --instances --attributes --year

Some missing values are also present.

You can analyse the data as per your requirements.

EDA
uci-uni
networkrepository.com
csv
Updated Feb 28, 2016
Cite
Network Data Repository (2016). uci-uni [Dataset]. https://networkrepository.com/socfb-uci-uni.php
Explore at:
Available download formats: csv
Dataset updated
Feb 28, 2016
Dataset authored and provided by
Network Data Repository
License
https://networkrepository.com/policy.php
Description
Facebook social network - A social friendship network extracted from Facebook consisting of people (nodes) with edges representing friendship ties.
UCI_drug_reviews
huggingface.co
Updated Nov 7, 2024
Cite
John Graham Reynolds (2024). UCI_drug_reviews [Dataset]. https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 7, 2024
Authors
John Graham Reynolds
Description
Data Description

This data comes from the UC Irvine Machine Learning Repository. It has been preprocessed to contain only reviews of at least 13 words in length. The raw data for this specific dataset can be found here. The base UCI ML url can be found here.
UCI dataset
springernature.figshare.com
bin
Updated Mar 13, 2023
Cite
Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen (2023). UCI dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20496258.v1
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.20496258.v1
Dataset updated
Mar 13, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Cuff-Less Blood Pressure Estimation Dataset [2] from the UCI Machine Learning Repository. It is a subset of the MIMIC-II Waveform Dataset that contains 12000 records of simultaneous PPG and ABP from 942 patients with a sampling rate of 125 Hz. The 12000 records were uniformly split into four parts with 3000 records each. However, as the subject information is lacking, the Hold-one-out strategy was utilized to generate training, validation, and test sets once the data was preprocessed. In the end, the UCI dataset had 291,078 segments, which was around 404 hours of recording, making it substantially the biggest data set with a considerably higher ratio of continuous segments per record (32.15).

[2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.
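The figures quoted in this description (291,078 segments, around 404 hours, 125 Hz) imply an average segment length that the text does not state. The sketch below derives it from those three numbers only. The function and variable names are illustrative, and the result is an inferred average, not a documented segment size.

```python
# Figures taken from the description above.
SAMPLING_RATE_HZ = 125      # PPG/ABP sampling rate
TOTAL_SEGMENTS = 291_078    # segments after preprocessing
TOTAL_HOURS = 404           # "around 404 hours" of recording


def mean_segment_seconds(total_hours: float, n_segments: int) -> float:
    """Average segment duration in seconds implied by the totals."""
    return total_hours * 3600 / n_segments


seg_seconds = mean_segment_seconds(TOTAL_HOURS, TOTAL_SEGMENTS)
samples_per_segment = seg_seconds * SAMPLING_RATE_HZ

print(f"about {seg_seconds:.2f} s per segment")
print(f"about {samples_per_segment:.0f} samples per segment at {SAMPLING_RATE_HZ} Hz")
```

Because the hour count is approximate, the derived values should be read as rough estimates.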
Travel Review Rating Dataset
kaggle.com
zip
Updated Sep 17, 2020
Cite
Wirach Leelakiatiwong (2020). Travel Review Rating Dataset [Dataset]. https://www.kaggle.com/wirachleelakiatiwong/travel-review-rating-dataset
Explore at:
Available download formats: zip (143705 bytes)
Dataset updated
Sep 17, 2020
Authors
Wirach Leelakiatiwong
Description
Context

This data set has been sourced from the Machine Learning Repository of University of California, Irvine (UC Irvine) : Travel Review Ratings Data Set. This data set is populated by capturing user ratings from Google reviews. Reviews on attractions from 24 categories across Europe are considered. Google user rating ranges from 1 to 5 and average user rating per category is calculated.

Content

Attribute 1 : Unique user id
Attribute 2 : Average ratings on churches
Attribute 3 : Average ratings on resorts
Attribute 4 : Average ratings on beaches
Attribute 5 : Average ratings on parks
Attribute 6 : Average ratings on theatres
Attribute 7 : Average ratings on museums
Attribute 8 : Average ratings on malls
Attribute 9 : Average ratings on zoo
Attribute 10 : Average ratings on restaurants
Attribute 11 : Average ratings on pubs/bars
Attribute 12 : Average ratings on local services
Attribute 13 : Average ratings on burger/pizza shops
Attribute 14 : Average ratings on hotels/other lodgings
Attribute 15 : Average ratings on juice bars
Attribute 16 : Average ratings on art galleries
Attribute 17 : Average ratings on dance clubs
Attribute 18 : Average ratings on swimming pools
Attribute 19 : Average ratings on gyms
Attribute 20 : Average ratings on bakeries
Attribute 21 : Average ratings on beauty & spas
Attribute 22 : Average ratings on cafes
Attribute 23 : Average ratings on view points
Attribute 24 : Average ratings on monuments
Attribute 25 : Average ratings on gardens

Acknowledgements

This data set has been sourced from the Machine Learning Repository of University of California, Irvine (UC Irvine) : Travel Review Ratings Data Set

The UCI page mentions the following publication as the original source of the data set: Renjith, Shini, A. Sreekumar, and M. Jathavedan. 2018. Evaluation of Partitioning Clustering Algorithms for Processing Social Media Data in Tourism Domain. In 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 12731. IEEE

Inspiration

I'm the kind of person who loves traveling. But sometimes I have problems like: where should I visit? Are there interesting places that match my lifestyle? I often spend hours searching for an interesting place to go. Such a waste of time.

What if we could build a recommender system that recommends several interesting venues based on your preferences? With information from Google reviews, I'll try to divide Google review users into clusters of similar interests, as groundwork for building a recommender system based on their preferences.
Adult Data Set ( Census Income dataset)
kaggle.com
zip
Updated Mar 7, 2021
Cite
KritiDoneria (2021). Adult Data Set ( Census Income dataset) [Dataset]. https://www.kaggle.com/datasets/kritidoneria/adultdatasetxai
Explore at:
Available download formats: zip (481687 bytes)
Dataset updated
Mar 7, 2021
Authors
KritiDoneria
Description
The dataset is US Census data, an extraction of the 1994 census data that was donated to UC Irvine’s Machine Learning Repository. The data contains approximately 32,000 observations with over 15 variables. The dataset was downloaded from: http://archive.ics.uci.edu/ml/datasets/Adult. The dependent variable in our analysis is income level, that is, who earns above $50,000 a year. The analysis uses SQL queries, proportion analysis with bar charts, and a simple decision tree to understand the important variables and their influence on prediction.
drug-reviews
huggingface.co
Cite
Mouwiya S. A. Al-Qaisieh, drug-reviews [Dataset]. https://huggingface.co/datasets/Mouwiya/drug-reviews
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Authors
Mouwiya S. A. Al-Qaisieh
License
https://choosealicense.com/licenses/odbl/
Description
Dataset Details

1. Dataset Loading:

Initially, we load the Drug Review Dataset from the UC Irvine Machine Learning Repository. This dataset contains patient reviews of different drugs, along with the medical condition being treated and the patients' satisfaction ratings.

2. Data Preprocessing:

The dataset is preprocessed to ensure data integrity and consistency. We handle missing values and ensure that each patient ID is unique across the dataset.

3.Text… See the full description on the dataset page: https://huggingface.co/datasets/Mouwiya/drug-reviews.
UCI ML Parkinsons dataset
kaggle.com
zip
Updated Jul 8, 2025
Cite
Elnaz Alikarami (2025). UCI ML Parkinsons dataset [Dataset]. https://www.kaggle.com/datasets/elnazalikarami/uci-ml-parkinsons-dataset
Explore at:
Available download formats: zip (316796 bytes)
Dataset updated
Jul 8, 2025
Authors
Elnaz Alikarami
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Oxford Parkinson's Disease Detection Dataset UCI Machine Learning Repository

dataset's original link : https://archive.ics.uci.edu/dataset/174/parkinsons

Dataset Characteristics: Multivariate

Subject Area: Health and Medicine

Associated Tasks: Classification

Feature Type: Real

Instances: 197

Features: 22

Dataset Information Additional Information

This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD.

The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around six recordings per patient; the name of the patient is identified in the first column. For further information or to pass on comments, please contact Max Little (littlem '@' robots.ox.ac.uk).

Further details are contained in the following reference -- if you use this dataset, please cite: Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2008), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering (to appear).

Has Missing Values? No
UCI News Aggregator Dataset With Content
kaggle.com
zip
Updated Feb 21, 2019
Cite
LouisKitLungLaw (2019). UCI News Aggregator Dataset With Content [Dataset]. https://www.kaggle.com/louislung/uci-news-aggregator-dataset-with-content
Explore at:
Available download formats: zip (14776686 bytes)
Dataset updated
Feb 21, 2019
Authors
LouisKitLungLaw
Description
Content

The columns included in this dataset are:

ID : the numeric ID of the article

TITLE : the headline of the article

URL : the URL of the article

PUBLISHER : the publisher of the article

CATEGORY : the category of the news item; one of: -- b : business -- t : science and technology -- e : entertainment -- m : health

STORY : alphanumeric ID of the news story that the article discusses

HOSTNAME : hostname where the article was posted

TIMESTAMP : approximate timestamp of the article's publication, given in Unix time (seconds since midnight on Jan 1, 1970)

MAIN_CONTENT: article's content

MAIN_CONTENT_LEN: length of main_content
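As a minimal sketch of how the CATEGORY codes and the TIMESTAMP column listed above could be decoded, the snippet below maps each one-letter code to its label and converts a Unix timestamp to a UTC datetime. The function names are illustrative. The conversion assumes seconds, following the column description; if a given copy of the file stores milliseconds, the value would need dividing by 1000 first.

```python
from datetime import datetime, timezone

# Category codes exactly as given in the column description.
CATEGORY_LABELS = {
    "b": "business",
    "t": "science and technology",
    "e": "entertainment",
    "m": "health",
}


def category_label(code: str) -> str:
    """Return the readable label for a one-letter CATEGORY code."""
    return CATEGORY_LABELS[code]


def timestamp_to_utc(ts_seconds: float) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts_seconds, tz=timezone.utc)


print(category_label("t"))
print(timestamp_to_utc(0).isoformat())
```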

Acknowledgments

This dataset comes from the UCI Machine Learning Repository. Any publications that use this data should cite the repository as follows:

Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

This specific dataset can be found in the UCI ML Repository at this URL
UCI Libraries' chatbot files (ANTswers)
search.dataone.org
datadryad.org
Updated Jun 21, 2025
Cite
Danielle Kane (2025). UCI Libraries' chatbot files (ANTswers) [Dataset]. http://doi.org/10.7280/D1P075
Explore at:
Unique identifier
https://doi.org/10.7280/D1P075
Dataset updated
Jun 21, 2025
Dataset provided by
Dryad Digital Repository
Authors
Danielle Kane
Time period covered
Oct 10, 2017
Description
ANTswers is an experimental chatbot that can answer questions about the UC Irvine Libraries. ANTswers is a web-based application, run on a remote library server and is accessed through a web interface page. ANTswers' personality and persona is based on the UCI mascot, Peter the Anteater. ANTswers responds to simple and short questions. The first link in a response opens in a preview window, all other links open in a new window. Each transaction is reviewed and a data form is filled out to track usage; such as date, time, answer rate, etc.
Heart Disease Data Set
berd-platform.de
bin
Updated Jul 31, 2025
Cite
Andras Janosi; William Steinbrunn; Matthias Pfisterer; Robert Detrano (2025). Heart Disease Data Set [Dataset]. http://doi.org/10.82939/15znh-yyr19
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.82939/15znh-yyr19
Dataset updated
Jul 31, 2025
Dataset provided by
UC Irvine Machine Learning Repository
Authors
Andras Janosi; William Steinbrunn; Matthias Pfisterer; Robert Detrano
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0).
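The presence-versus-absence simplification described above can be sketched as a small helper that collapses the 0 to 4 "goal" value into a binary label. The function name is illustrative and not part of the dataset.

```python
def has_heart_disease(goal: int) -> int:
    """Collapse the 0-4 'goal' field into absence (0) or presence (1)."""
    if goal not in range(5):
        raise ValueError(f"goal must be an integer from 0 to 4, got {goal!r}")
    return int(goal > 0)


# Values 1, 2, 3 and 4 all map to presence; only 0 maps to absence.
binary_labels = [has_heart_disease(g) for g in range(5)]
print(binary_labels)
```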
UCI Adult Census Data Dataset
kaggle.com
zip
Updated Aug 10, 2020
Cite
Sagnik (2020). UCI Adult Census Data Dataset [Dataset]. https://www.kaggle.com/datasets/sagnikpatra/uci-adult-census-data-dataset
Explore at:
Available download formats: zip (745972 bytes)
Dataset updated
Aug 10, 2020
Authors
Sagnik
Description
The US Adult income dataset was extracted by Barry Becker from the 1994 US Census Database. The data set consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. I encountered it during my course, and I wish to share it here because it is a good starter example for data pre-processing and machine learning practices.

Fields

The dataset contains 16 columns.
Target field: Income -- The income is divided into two classes: <=50K and >50K.
Number of attributes: 14 -- These are the demographics and other features to describe a person.

We can explore the possibility in predicting income level based on the individual’s personal information.

Acknowledgements

This dataset named “adult” is found in the UCI machine learning repository
Credit Card Approvals (Clean Data)
kaggle.com
zip
Updated Apr 25, 2022
Cite
Samuel Cortinhas (2022). Credit Card Approvals (Clean Data) [Dataset]. https://www.kaggle.com/datasets/samuelcortinhas/credit-card-approval-clean-data
Explore at:
Available download formats: zip (19448 bytes)
Dataset updated
Apr 25, 2022
Authors
Samuel Cortinhas
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains a cleaned version of this dataset from UCI machine learning repository on credit card approvals.

Missing values have been filled and feature names and categorical names have been inferred, resulting in more context and it being easier to use.

Your task is to predict which people in the dataset are successful in applying for a credit card.
Data from: A new hybrid ensemble model with voting-based outlier detection...
figshare.com
txt
Updated Aug 11, 2020
Cite
Wenyu Zhang; Dongqi Yang; Shuai Zhang (2020). A new hybrid ensemble model with voting-based outlier detection and balanced sampling for credit scoring [Dataset]. http://doi.org/10.6084/m9.figshare.12782552.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12782552.v2
Dataset updated
Aug 11, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Wenyu Zhang; Dongqi Yang; Shuai Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Three datasets from the UC Irvine (UCI) machine learning repository, that is, the Australian, German, and Japanese datasets, were adopted for the current study. The Australian credit dataset contains 690 samples, of which 307 are positive and 383 are negative. The dimensions of its input features are 15. The German credit dataset contains 1000 samples, 700 of which are positive and 300 are negative. The dimensions of its input features are 21. The Japanese credit dataset contains 690 samples, of which 383 are positive and 307 are negative. The dimensions of its input features are 16.
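The class counts quoted above can be checked and turned into class-balance ratios with a few lines of code. This is a minimal sketch using only the numbers in the description; the dictionary and function names are illustrative.

```python
# (positive, negative, total) sample counts as stated in the description.
CREDIT_DATASETS = {
    "Australian": (307, 383, 690),
    "German": (700, 300, 1000),
    "Japanese": (383, 307, 690),
}


def positive_rate(positive: int, negative: int) -> float:
    """Fraction of samples labelled positive."""
    return positive / (positive + negative)


for name, (pos, neg, total) in CREDIT_DATASETS.items():
    # The two class counts should add up to the stated sample total.
    assert pos + neg == total
    print(f"{name}: {positive_rate(pos, neg):.1%} positive")
```

On these counts the German dataset is the most imbalanced of the three, while the Australian and Japanese datasets have mirrored class counts.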
sms_spam
huggingface.co
Updated Aug 28, 2023
Cite
UC Irvine (2023). sms_spam [Dataset]. https://huggingface.co/datasets/ucirvine/sms_spam
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 28, 2023
Dataset authored and provided by
UC Irvine
License
https://choosealicense.com/licenses/unknown/
Description
Dataset Card for [Dataset Name]

Dataset Summary

The SMS Spam Collection v.1 is a public set of labeled SMS messages that have been collected for mobile phone spam research. It has one collection composed of 5,574 English, real and non-encoded messages, tagged according to whether they are legitimate (ham) or spam.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

English

Dataset Structure Data Instances

[More Information… See the full description on the dataset page: https://huggingface.co/datasets/ucirvine/sms_spam.
Data from: A novel multi-stage ensemble model with fuzzy-clustering and...
figshare.com
txt
Updated Sep 2, 2020
Cite
Dongqi Yang; Wenyu Zhang; Xin Wu; Jose H.Ablanedo; Wangzhi Yu (2020). A novel multi-stage ensemble model with fuzzy-clustering and optimized classifier composition for corporate bankruptcy prediction [Dataset]. http://doi.org/10.6084/m9.figshare.12103773.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12103773.v2
Dataset updated
Sep 2, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Dongqi Yang; Wenyu Zhang; Xin Wu; Jose H.Ablanedo; Wangzhi Yu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this experiment, the datasets are from the UC Irvine (UCI) machine learning repository (Zięba et al., 2016) and contain the financial indicators of real-world Polish manufacturing companies from 2007 to 2011. The datasets were separated into five parts, one per fiscal year, that describe the period from the 1st year (2007 fiscal year) to the 5th year (2011 fiscal year), which corresponds to five different bankruptcy cycles. The class labels (“0” is operating and “1” is bankruptcy) of the datasets are determined by the bankruptcy status of the enterprise in 2012. Furthermore, the real-world Creator dataset, published in 2019 by a Chinese intelligent government services provider called Creator Information Technology Co., Ltd, was also adopted. The Creator dataset includes company management information of 35960 Chinese companies.
Annotated Benchmark of Real-World Data for Approximate Functional Dependency...
zenodo.org
data.niaid.nih.gov
csv
Updated Jul 1, 2023
Cite
Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren (2023). Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery [Dataset]. http://doi.org/10.5281/zenodo.8098909
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8098909
Dataset updated
Jul 1, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery

This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.

The file ground_truth.csv is a comma separated file containing approximate functional dependencies. table describes the relation we refer to, lhs and rhs reference two columns of those relations where semantically we found that lhs implies rhs.

The file excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.
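To make the notion of an approximate functional dependency concrete, the sketch below scores a candidate lhs -> rhs pair with the classic g3 error: the smallest fraction of rows that would have to be removed for the dependency to hold exactly. This is an illustration under stated assumptions, not the benchmark's own code. The g3_prime variant mentioned above is not defined in this description, so plain g3 is used instead. Rows with a missing value in either column are skipped, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Fraction of rows to drop so that lhs -> rhs holds exactly (classic g3).

    Rows where either column is None are ignored. Returns None when no row
    has both values.
    """
    groups = defaultdict(Counter)  # lhs value -> counts of the rhs values seen with it
    n = 0
    for row in rows:
        left, right = row.get(lhs), row.get(rhs)
        if left is None or right is None:
            continue
        groups[left][right] += 1
        n += 1
    if n == 0:
        return None
    # For each lhs value, keep only the rows carrying its most frequent rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / n


# Hypothetical rows: one misspelled airport name violates AirportCode -> AirportName.
sample_rows = [
    {"AirportCode": "AMS", "AirportName": "Schiphol"},
    {"AirportCode": "AMS", "AirportName": "Schiphol"},
    {"AirportCode": "AMS", "AirportName": "Schipol"},
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy"},
]
print(g3_error(sample_rows, "AirportCode", "AirportName"))
```

An error of 0 means the dependency holds exactly on the rows considered; small positive values are what makes a dependency "approximate".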

Dataset References

adult.csv: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

claims.csv: TSA Claims Data 2002 to 2006, published by the U.S. Department of Homeland Security.

dblp10k.csv: Frequency-aware Similarity Measures. Lange, Dustin; Naumann, Felix (2011). 243–248. Made available as DBLP Dataset 2.

hospital.csv: Hospital dataset used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection to that paper.

t_biocase_... files: t_bioc_... files used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection to that paper.

tax.csv: Tax dataset used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection to that paper.
UCI Heart Disease Data Set
kaggle.com
zip
Updated Jan 1, 2021
Cite
Lourens Walters (2021). UCI Heart Disease Data Set [Dataset]. https://www.kaggle.com/lourenswalters/uci-heart-disease-data-set
Explore at:
Available download formats: zip (4110 bytes)
Dataset updated
Jan 1, 2021
Authors
Lourens Walters
Description
Context

The dataset used can be found on the UCI Machine Learning Repository at the following location:

Heart Disease Dataset

There are several copies of this dataset to be found on Kaggle, with people focusing on different types of analyses of the data. This specific copy can be analysed by anyone interested, but is primarily used by a study group from the Udacity Bertelsmann Technology Scholarship to practice analysis of association between variables as well as implementation and comparison of various Machine Learning models.

Content

According to the paper by Detrano et al. (1989), as found on the UCI dataset webpage, the data was collected for 303 patients referred for coronary angiography at the Cleveland Clinic between May 1981 and September 1984. The 13 independent (feature) variables can be divided into 3 groups as follows:

Routine evaluation (based on historical data):

ECG at rest

Serum Cholesterol

Fasting blood sugar

Non-invasive test data (informed consent obtained for data as part of research protocol):

Exercise ECG

ST-segment peak slope (upsloping, flat or downsloping)

ST-segment depression

Exercise Thallium scintigraphy (fixed, reversible or none)

Cardiac fluoroscopy (number of vessels appeared to contain calcium)

Other demographic and clinical variables (based on routine data):

Age

Sex

Chest pain type

Systolic blood pressure

ST-T-wave abnormality (T-wave abnormality)

Probable or definite ventricular hypertrophy (Estes' criteria)

The dependent/ response variable was the angiographic test result indicating a >50% diameter narrowing.

Data Dictionary

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3632459%2Fa01747fb0158dc51c12bc0824c9c4ae4%2Fdata_dictionary2.png?generation=1609522473018549&alt=media

Acknowledgements

UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. Donor:

David W. Aha (aha '@' ics.uci.edu) (714) 856-8779

Inspiration

The objective of the analysis is to use statistical learning to identify factors associated with Coronary Artery Disease as indicated by a coronary angiography interpreted by a Cardiologist (as per paper written by Detrano et al cited before).
Data from: A novel multi-stage ensemble model with enhanced outlier...
figshare.com
txt
Updated Jun 19, 2020
Cite
Wenyu Zhang; Dongqi Yang; Shuai Zhang; Jose H.Ablanedo; Yu Lou (2020). A novel multi-stage ensemble model with enhanced outlier adaptation for credit scoring [Dataset]. http://doi.org/10.6084/m9.figshare.12512360.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12512360.v1
Dataset updated
Jun 19, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Wenyu Zhang; Dongqi Yang; Shuai Zhang; Jose H.Ablanedo; Yu Lou
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Nine datasets from the UC Irvine (UCI) machine learning repository, i.e., the Australian, Japanese, German (Asuncion & Newman, 2007), Taiwan (Yeh & Lien, 2009) and Polish credit datasets (Zięba et al., 2016), were adopted for the current study. The Polish credit datasets comprise five datasets covering five classification cases that depend on the forecasting period (the Polish 1, the Polish 2, the Polish 3, the Polish 4 and the Polish 5). Also included are the AER credit dataset (Greene, 2003), which is a credit card dataset for econometric analysis, and the Creator dataset, which was published in 2019 by a Chinese digital government services provider named Creator Information Technology Co., Ltd[1]. The Creator dataset contains the property rights, financial statements, and basic company information of 35960 Chinese companies.

[1] http://www.chinacreator.com/cn/


UCI Machine Learning Repository

RRID:SCR_026571, r3d100010960, UCI Machine Learning Repository (RRID:SCR_026571), UC Irvine Machine Learning Repository
