72 datasets found
  1. UCI Machine Learning Repository

    • dknet.org
    • rrid.site
    UCI Machine Learning Repository [Dataset]. http://identifiers.org/RRID:SCR_026571
    Description

    A collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms. Datasets approved for the repository are assigned a Digital Object Identifier (DOI) if they do not already have one. Datasets are licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows sharing and adaptation of the datasets for any purpose, provided that appropriate credit is given.

  2. UCI-dataset

    • kaggle.com
    zip
    Updated Aug 17, 2022
    Md Waquar Azam (2022). UCI-dataset [Dataset]. https://www.kaggle.com/datasets/mdwaquarazam/ucidatasetlist
    Available download formats: zip (20774 bytes)
    Dataset updated
    Aug 17, 2022
    Authors
    Md Waquar Azam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is a list of the datasets provided by the UCI ML Repository. If you are a learner and want data by year, category, profession, or some other criterion, you can search for it here.

    The dataset has 8 columns that hold all the details: link, Data-Name, data type, default task, attribute-type, instances, attributes, year.

    Some missing values are also present.

    You can analyse the data as per your requirements.

    EDA

  3. uci-uni

    • networkrepository.com
    csv
    Updated Feb 28, 2016
    Network Data Repository (2016). uci-uni [Dataset]. https://networkrepository.com/socfb-uci-uni.php
    Available download formats: csv
    Dataset updated
    Feb 28, 2016
    Dataset authored and provided by
    Network Data Repository
    License

    https://networkrepository.com/policy.php

    Description

    Facebook social network - A social friendship network extracted from Facebook consisting of people (nodes) with edges representing friendship ties.

  4. UCI_drug_reviews

    • huggingface.co
    Updated Nov 7, 2024
    John Graham Reynolds (2024). UCI_drug_reviews [Dataset]. https://huggingface.co/datasets/MarioBarbeque/UCI_drug_reviews
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 7, 2024
    Authors
    John Graham Reynolds
    Description

    Data Description

    This data comes from the UC Irvine Machine Learning Repository. It has been preprocessed to contain only reviews of 13 or more words. The raw data for this specific dataset can be found here. The base UCI ML URL can be found here.

  5. UCI dataset

    • springernature.figshare.com
    bin
    Updated Mar 13, 2023
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen (2023). UCI dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20496258.v1
    Available download formats: bin
    Dataset updated
    Mar 13, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Cuff-Less Blood Pressure Estimation Dataset [2] from the UCI Machine Learning Repository. It is a subset of the MIMIC-II Waveform Dataset that contains 12000 records of simultaneous PPG and ABP from 942 patients with a sampling rate of 125 Hz. The 12000 records were uniformly split into four parts with 3000 records each. However, as the subject information is lacking, the Hold-one-out strategy was utilized to generate training, validation, and test sets once the data was preprocessed. In the end, the UCI dataset had 291,078 segments, which was around 404 hours of recording, making it substantially the biggest data set with a considerably higher ratio of continuous segments per record (32.15).

    [2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.
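    The segment and duration figures quoted above imply an average segment length. A minimal sketch of that arithmetic, assuming all segments have equal length (the description does not say so) and taking "around 404 hours" at face value:

```python
# Figures quoted in the description above.
SEGMENTS = 291_078
TOTAL_HOURS = 404          # stated as "around 404 hours", so results are approximate
SAMPLING_RATE_HZ = 125

# Assumption: every segment has the same duration.
seconds_per_segment = TOTAL_HOURS * 3600 / SEGMENTS
samples_per_segment = seconds_per_segment * SAMPLING_RATE_HZ

print(f"{seconds_per_segment:.2f} s per segment")
print(f"{samples_per_segment:.0f} samples per segment")
```

    Under these assumptions each segment covers roughly five seconds of signal.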

  6. Travel Review Rating Dataset

    • kaggle.com
    zip
    Updated Sep 17, 2020
    Wirach Leelakiatiwong (2020). Travel Review Rating Dataset [Dataset]. https://www.kaggle.com/wirachleelakiatiwong/travel-review-rating-dataset
    Available download formats: zip (143705 bytes)
    Dataset updated
    Sep 17, 2020
    Authors
    Wirach Leelakiatiwong
    Description

    Context

    This data set has been sourced from the Machine Learning Repository of University of California, Irvine (UC Irvine) : Travel Review Ratings Data Set. This data set is populated by capturing user ratings from Google reviews. Reviews on attractions from 24 categories across Europe are considered. Google user rating ranges from 1 to 5 and average user rating per category is calculated.

    Content

    Attribute 1: Unique user id
    Attribute 2: Average ratings on churches
    Attribute 3: Average ratings on resorts
    Attribute 4: Average ratings on beaches
    Attribute 5: Average ratings on parks
    Attribute 6: Average ratings on theatres
    Attribute 7: Average ratings on museums
    Attribute 8: Average ratings on malls
    Attribute 9: Average ratings on zoo
    Attribute 10: Average ratings on restaurants
    Attribute 11: Average ratings on pubs/bars
    Attribute 12: Average ratings on local services
    Attribute 13: Average ratings on burger/pizza shops
    Attribute 14: Average ratings on hotels/other lodgings
    Attribute 15: Average ratings on juice bars
    Attribute 16: Average ratings on art galleries
    Attribute 17: Average ratings on dance clubs
    Attribute 18: Average ratings on swimming pools
    Attribute 19: Average ratings on gyms
    Attribute 20: Average ratings on bakeries
    Attribute 21: Average ratings on beauty & spas
    Attribute 22: Average ratings on cafes
    Attribute 23: Average ratings on view points
    Attribute 24: Average ratings on monuments
    Attribute 25: Average ratings on gardens

    Acknowledgements

    This data set has been sourced from the Machine Learning Repository of University of California, Irvine (UC Irvine) : Travel Review Ratings Data Set

    The UCI page mentions the following publication as the original source of the data set: Renjith, Shini, A. Sreekumar, and M. Jathavedan. 2018. Evaluation of Partitioning Clustering Algorithms for Processing Social Media Data in Tourism Domain. In 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 12731. IEEE

    Inspiration

    I'm the kind of person who loves traveling. But sometimes I have problems like: where should I visit? Are there interesting places that match my lifestyle? I often spend hours searching for an interesting place to go. Such a waste of time.

    What if we could build a recommender system that recommends several interesting venues based on your preferences? With information from Google reviews, I'll try to divide Google review users into clusters of similar interests, as groundwork for building a recommender system based on their preferences.

  7. Adult Data Set ( Census Income dataset)

    • kaggle.com
    zip
    Updated Mar 7, 2021
    KritiDoneria (2021). Adult Data Set ( Census Income dataset) [Dataset]. https://www.kaggle.com/datasets/kritidoneria/adultdatasetxai
    Available download formats: zip (481687 bytes)
    Dataset updated
    Mar 7, 2021
    Authors
    KritiDoneria
    Description

    The dataset is US Census data, an extraction of the 1994 census data that was donated to UC Irvine’s Machine Learning Repository. The data contains approximately 32,000 observations with over 15 variables. The dataset was downloaded from: http://archive.ics.uci.edu/ml/datasets/Adult. The dependent variable in our analysis is income level (who earns above $50,000 a year). We use SQL queries, proportion analysis with bar charts, and a simple decision tree to understand the important variables and their influence on prediction.

  8. drug-reviews

    • huggingface.co
    Mouwiya S. A. Al-Qaisieh, drug-reviews [Dataset]. https://huggingface.co/datasets/Mouwiya/drug-reviews
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Mouwiya S. A. Al-Qaisieh
    License

    https://choosealicense.com/licenses/odbl/

    Description

    Dataset Details

    1. Dataset Loading:

    Initially, we load the Drug Review Dataset from the UC Irvine Machine Learning Repository. This dataset contains patient reviews of different drugs, along with the medical condition being treated and the patients' satisfaction ratings.

    2. Data Preprocessing:

    The dataset is preprocessed to ensure data integrity and consistency. We handle missing values and ensure that each patient ID is unique across the dataset.

    3. Text… See the full description on the dataset page: https://huggingface.co/datasets/Mouwiya/drug-reviews.
    
  9. UCI ML Parkinsons dataset

    • kaggle.com
    zip
    Updated Jul 8, 2025
    Elnaz Alikarami (2025). UCI ML Parkinsons dataset [Dataset]. https://www.kaggle.com/datasets/elnazalikarami/uci-ml-parkinsons-dataset
    Available download formats: zip (316796 bytes)
    Dataset updated
    Jul 8, 2025
    Authors
    Elnaz Alikarami
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Oxford Parkinson's Disease Detection Dataset UCI Machine Learning Repository

    dataset's original link : https://archive.ics.uci.edu/dataset/174/parkinsons

    Dataset Characteristics: Multivariate
    Subject Area: Health and Medicine
    Associated Tasks: Classification
    Feature Type: Real
    Instances: 197
    Features: 22

    Dataset Information Additional Information

    This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD.

    The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around six recordings per patient; the name of the patient is identified in the first column. For further information or to pass on comments, please contact Max Little (littlem '@' robots.ox.ac.uk).
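    As a rough illustration of the layout described above, the sketch below counts recordings per class from CSV text. Only the "name" and "status" column names come from the description; the sample rows and the helper name are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the "name" and "status" columns
# are taken from the dataset description.
SAMPLE_CSV = """name,status
rec_a,1
rec_b,1
rec_c,0
"""

def count_by_status(csv_text):
    """Count recordings per class: 0 = healthy, 1 = Parkinson's disease."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row["status"]) for row in reader)

counts = count_by_status(SAMPLE_CSV)
print(counts[0], "healthy,", counts[1], "PD")
```

    With the real file, the same function could be given the file's contents instead of the inline sample.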

    Further details are contained in the following reference -- if you use this dataset, please cite: Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2008), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering (to appear).

    Has Missing Values? No

  10. UCI News Aggregator Dataset With Content

    • kaggle.com
    zip
    Updated Feb 21, 2019
    LouisKitLungLaw (2019). UCI News Aggregator Dataset With Content [Dataset]. https://www.kaggle.com/louislung/uci-news-aggregator-dataset-with-content
    Available download formats: zip (14776686 bytes)
    Dataset updated
    Feb 21, 2019
    Authors
    LouisKitLungLaw
    Description

    Content

    The columns included in this dataset are:

    ID : the numeric ID of the article

    TITLE : the headline of the article

    URL : the URL of the article

    PUBLISHER : the publisher of the article

    CATEGORY : the category of the news item; one of: b (business), t (science and technology), e (entertainment), m (health)

    STORY : alphanumeric ID of the news story that the article discusses

    HOSTNAME : hostname where the article was posted

    TIMESTAMP : approximate timestamp of the article's publication, given in Unix time (seconds since midnight on Jan 1, 1970)

    MAIN_CONTENT: article's content

    MAIN_CONTENT_LEN: length of main_content
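    The single-letter CATEGORY codes listed above can be expanded with a small lookup. This is a minimal sketch; the function name is hypothetical, and the case-insensitive matching is an assumption about how the codes may appear in the file.

```python
# Mapping taken from the CATEGORY column description above.
CATEGORY_NAMES = {
    "b": "business",
    "t": "science and technology",
    "e": "entertainment",
    "m": "health",
}

def category_name(code):
    """Return the readable category for a one-letter CATEGORY code."""
    key = code.strip().lower()  # assumption: tolerate stray whitespace and case
    if key not in CATEGORY_NAMES:
        raise ValueError(f"unknown CATEGORY code: {code!r}")
    return CATEGORY_NAMES[key]

print(category_name("t"))
```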

    Acknowledgments

    This dataset comes from the UCI Machine Learning Repository. Any publications that use this data should cite the repository as follows:

    Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

    This specific dataset can be found in the UCI ML Repository at this URL

  11. UCI Libraries' chatbot files (ANTswers)

    • search.dataone.org
    • datadryad.org
    Updated Jun 21, 2025
    Danielle Kane (2025). UCI Libraries' chatbot files (ANTswers) [Dataset]. http://doi.org/10.7280/D1P075
    Dataset updated
    Jun 21, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Danielle Kane
    Time period covered
    Oct 10, 2017
    Description

    ANTswers is an experimental chatbot that can answer questions about the UC Irvine Libraries. ANTswers is a web-based application that runs on a remote library server and is accessed through a web interface page. ANTswers’ personality and persona are based on the UCI mascot, Peter the Anteater. ANTswers responds to simple and short questions. The first link in a response opens in a preview window; all other links open in a new window. Each transaction is reviewed and a data form is filled out to track usage, such as date, time, answer rate, etc.

  12. Heart Disease Data Set

    • berd-platform.de
    bin
    Updated Jul 31, 2025
    Andras Janosi; William Steinbrunn; Matthias Pfisterer; Robert Detrano (2025). Heart Disease Data Set [Dataset]. http://doi.org/10.82939/15znh-yyr19
    Available download formats: bin
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    UC Irvine Machine Learning Repository
    Authors
    Andras Janosi; William Steinbrunn; Matthias Pfisterer; Robert Detrano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0).
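    The presence-versus-absence framing described above amounts to collapsing the 0 to 4 "goal" value into a binary label. A minimal sketch, with a hypothetical function name:

```python
def has_heart_disease(goal):
    """Collapse the 0-4 "goal" field into a binary label.

    0 means absence; 1, 2, 3 and 4 all count as presence.
    """
    if goal not in range(5):
        raise ValueError(f"goal must be an integer from 0 to 4, got {goal!r}")
    return int(goal > 0)

labels = [has_heart_disease(g) for g in [0, 1, 2, 3, 4]]
print(labels)
```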

  13. UCI Adult Census Data Dataset

    • kaggle.com
    zip
    Updated Aug 10, 2020
    Sagnik (2020). UCI Adult Census Data Dataset [Dataset]. https://www.kaggle.com/datasets/sagnikpatra/uci-adult-census-data-dataset
    Available download formats: zip (745972 bytes)
    Dataset updated
    Aug 10, 2020
    Authors
    Sagnik
    Description

    The US Adult income dataset was extracted by Barry Becker from the 1994 US Census Database. The data set consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. I encountered it during my course, and I wish to share it here because it is a good starter example for data pre-processing and machine learning practices.

    Fields

    The dataset contains 16 columns.
    Target field: Income. The income is divided into two classes: <=50K and >50K.
    Number of attributes: 14. These are the demographics and other features that describe a person.

    We can explore the possibility of predicting income level based on the individual’s personal information.

    Acknowledgements

    This dataset named “adult” is found in the UCI machine learning repository

  14. Credit Card Approvals (Clean Data)

    • kaggle.com
    zip
    Updated Apr 25, 2022
    Samuel Cortinhas (2022). Credit Card Approvals (Clean Data) [Dataset]. https://www.kaggle.com/datasets/samuelcortinhas/credit-card-approval-clean-data
    Available download formats: zip (19448 bytes)
    Dataset updated
    Apr 25, 2022
    Authors
    Samuel Cortinhas
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a cleaned version of this dataset from UCI machine learning repository on credit card approvals.

    Missing values have been filled, and feature names and categorical names have been inferred, which gives more context and makes the data easier to use.

    Your task is to predict which people in the dataset are successful in applying for a credit card.

  15. Data from: A new hybrid ensemble model with voting-based outlier detection...

    • figshare.com
    txt
    Updated Aug 11, 2020
    Wenyu Zhang; Dongqi Yang; Shuai Zhang (2020). A new hybrid ensemble model with voting-based outlier detection and balanced sampling for credit scoring [Dataset]. http://doi.org/10.6084/m9.figshare.12782552.v2
    Available download formats: txt
    Dataset updated
    Aug 11, 2020
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Wenyu Zhang; Dongqi Yang; Shuai Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Three datasets from the UC Irvine (UCI) machine learning repository, that is, the Australian, German, and Japanese datasets, were adopted for the current study. The Australian credit dataset contains 690 samples, of which 307 are positive and 383 are negative. The dimensions of its input features are 15. The German credit dataset contains 1000 samples, 700 of which are positive and 300 are negative. The dimensions of its input features are 21. The Japanese credit dataset contains 690 samples, of which 383 are positive and 307 are negative. The dimensions of its input features are 16.
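    The class counts quoted above give each dataset's class balance directly. The sketch below only reworks those stated numbers; the dictionary and function names are hypothetical.

```python
# (positive, negative) sample counts as stated in the description above.
CREDIT_DATASETS = {
    "Australian": (307, 383),
    "German": (700, 300),
    "Japanese": (383, 307),
}

def positive_share(name):
    """Fraction of samples labeled positive in the named dataset."""
    positive, negative = CREDIT_DATASETS[name]
    return positive / (positive + negative)

for name, (positive, negative) in CREDIT_DATASETS.items():
    total = positive + negative
    print(f"{name}: {total} samples, {positive_share(name):.1%} positive")
```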

  16. sms_spam

    • huggingface.co
    Updated Aug 28, 2023
    UC Irvine (2023). sms_spam [Dataset]. https://huggingface.co/datasets/ucirvine/sms_spam
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 28, 2023
    Dataset authored and provided by
    UC Irvine
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for [Dataset Name]

      Dataset Summary
    

    The SMS Spam Collection v.1 is a public set of labeled SMS messages that have been collected for mobile phone spam research. It has one collection composed of 5,574 real, non-encoded English messages, tagged as being legitimate (ham) or spam.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    English

      Dataset Structure

      Data Instances

    [More Information… See the full description on the dataset page: https://huggingface.co/datasets/ucirvine/sms_spam.

  17. Data from: A novel multi-stage ensemble model with fuzzy-clustering and...

    • figshare.com
    txt
    Updated Sep 2, 2020
    Dongqi Yang; Wenyu Zhang; Xin Wu; Jose H.Ablanedo; Wangzhi Yu (2020). A novel multi-stage ensemble model with fuzzy-clustering and optimized classifier composition for corporate bankruptcy prediction [Dataset]. http://doi.org/10.6084/m9.figshare.12103773.v2
    Available download formats: txt
    Dataset updated
    Sep 2, 2020
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Dongqi Yang; Wenyu Zhang; Xin Wu; Jose H.Ablanedo; Wangzhi Yu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this experiment, the datasets are from the UC Irvine (UCI) machine learning repository (Zięba et al., 2016) and contain real-world financial indicators of Polish manufacturing corporates from 2007 to 2011. The datasets were separated into five parts, one per fiscal year, from the 1st year (2007 fiscal year) to the 5th year (2011 fiscal year), corresponding to five different bankruptcy cycles. The class labels (“0” is operating and “1” is bankruptcy) are determined by the bankruptcy status of the enterprise in 2012. Furthermore, the real-world Creator dataset, published in 2019 by a Chinese intelligent government services provider called Creator Information Technology Co., Ltd, was also adopted. The Creator dataset includes company management information of 35960 Chinese companies.

  18. Annotated Benchmark of Real-World Data for Approximate Functional Dependency...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 1, 2023
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren (2023). Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery [Dataset]. http://doi.org/10.5281/zenodo.8098909
    Available download formats: csv
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery

    This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.

    The file ground_truth.csv is a comma-separated file containing approximate functional dependencies. The "table" column names the relation we refer to; "lhs" and "rhs" reference two columns of that relation where, semantically, we found that lhs implies rhs.

    The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.
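    To make "lhs implies rhs, approximately" concrete, the sketch below computes the classic g3 error for one candidate: the smallest fraction of rows that must be removed for the dependency to hold exactly. This is a swapped-in, commonly used measure, not necessarily the g3_prime value mentioned above, which the listing does not define. The sample rows, their values, and the function name are hypothetical; only the AirportCode and AirportName column names come from the description.

```python
from collections import Counter, defaultdict

def g3_error(rows, lhs, rhs):
    """Fraction of rows to drop so that each lhs value maps to one rhs value.

    Rows where either attribute is missing are skipped. Returns None when
    no row has both values.
    """
    groups = defaultdict(Counter)
    usable = 0
    for row in rows:
        left, right = row.get(lhs), row.get(rhs)
        if left in (None, "") or right in (None, ""):
            continue
        groups[left][right] += 1
        usable += 1
    if usable == 0:
        return None
    # Keep the most frequent rhs value within each lhs group.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1 - kept / usable

# Hypothetical rows: one of four violates AirportCode -> AirportName.
rows = [
    {"AirportCode": "AAA", "AirportName": "Alpha Airport"},
    {"AirportCode": "AAA", "AirportName": "Alpha Airport"},
    {"AirportCode": "AAA", "AirportName": "Alpha Intl"},
    {"AirportCode": "BBB", "AirportName": "Beta Airport"},
]
error = g3_error(rows, "AirportCode", "AirportName")
print(error)
```

    An error of 0 means the dependency holds exactly on the rows that have both values.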

    Dataset References

  19. UCI Heart Disease Data Set

    • kaggle.com
    zip
    Updated Jan 1, 2021
    Lourens Walters (2021). UCI Heart Disease Data Set [Dataset]. https://www.kaggle.com/lourenswalters/uci-heart-disease-data-set
    Available download formats: zip (4110 bytes)
    Dataset updated
    Jan 1, 2021
    Authors
    Lourens Walters
    Description

    Context

    The dataset used can be found on the UCI Machine Learning Repository at the following location:

    Heart Disease Dataset

    There are several copies of this dataset to be found on Kaggle, with people focusing on different types of analyses of the data. This specific copy can be analysed by anyone interested, but is primarily used by a study group from the Udacity Bertelsmann Technology Scholarship to practice analysis of association between variables as well as implementation and comparison of various Machine Learning models.

    Content

    According to the paper by Detrano et al. (1989), as found on the UCI Dataset webpage, the data was collected from 303 patients referred for coronary angiography at the Cleveland Clinic between May 1981 and September 1984. The 13 independent (feature) variables can be divided into 3 groups as follows:

    Routine evaluation (based on historical data):

    • ECG at rest
    • Serum Cholesterol
    • Fasting blood sugar

    Non-invasive test data (informed consent obtained for data as part of research protocol):

    • Exercise ECG
      • ST-segment peak slope (upsloping, flat or downsloping)
      • ST-segment depression
    • Exercise Thallium scintigraphy (fixed, reversible or none)
    • Cardiac fluoroscopy (number of vessels that appeared to contain calcium)

    Other demographic and clinical variables (based on routine data):

    • Age
    • Sex
    • Chest pain type
    • Systolic blood pressure
    • ST-T-wave abnormality (T-wave abnormality)
    • Probable or definite ventricular hypertrophy (Estes' criteria)

    The dependent (response) variable was the angiographic test result indicating a >50% diameter narrowing.

    Data Dictionary

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3632459%2Fa01747fb0158dc51c12bc0824c9c4ae4%2Fdata_dictionary2.png?generation=1609522473018549&alt=media

    Acknowledgements

    UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. Donor:

    David W. Aha (aha '@' ics.uci.edu) (714) 856-8779

    Inspiration

    The objective of the analysis is to use statistical learning to identify factors associated with Coronary Artery Disease as indicated by a coronary angiography interpreted by a Cardiologist (as per paper written by Detrano et al cited before).

  20. Data from: A novel multi-stage ensemble model with enhanced outlier...

    • figshare.com
    txt
    Updated Jun 19, 2020
    Wenyu Zhang; Dongqi Yang; Shuai Zhang; Jose H.Ablanedo; Yu Lou (2020). A novel multi-stage ensemble model with enhanced outlier adaptation for credit scoring [Dataset]. http://doi.org/10.6084/m9.figshare.12512360.v1
    Available download formats: txt
    Dataset updated
    Jun 19, 2020
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Wenyu Zhang; Dongqi Yang; Shuai Zhang; Jose H.Ablanedo; Yu Lou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nine datasets from the UC Irvine (UCI) machine learning repository, i.e., the Australian, Japanese, German (Asuncion & Newman, 2007), Taiwan (Yeh & Lien, 2009) and Polish credit datasets (Zięba et al., 2016), were adopted for the current study. The Polish credit datasets comprise five datasets that distinguish five classification cases depending on the forecasting period (the Polish 1, the Polish 2, the Polish 3, the Polish 4 and the Polish 5). Also adopted were the AER credit dataset (Greene, 2003), a credit card dataset for econometric analysis, and the Creator dataset, published in 2019 by a Chinese digital government services provider named Creator Information Technology Co., Ltd[1]. The Creator dataset contains the property rights, financial statements, and basic company information of 35960 Chinese companies.

    [1] http://www.chinacreator.com/cn/


UCI Machine Learning Repository

RRID:SCR_026571, r3d100010960, UCI Machine Learning Repository (RRID:SCR_026571), UC Irvine Machine Learning Repository
