100+ datasets found
  1. UCI datasets

    • ieee-dataport.org
    Updated May 14, 2025
    Cite
    Yuan Sun (2025). UCI datasets [Dataset]. https://ieee-dataport.org/documents/uci-datasets
    Dataset updated
    May 14, 2025
    Authors
    Yuan Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    biology

  2. UCI Machine Learning Repository Dataset

    • paperswithcode.com
    Updated Mar 9, 2021
    Cite
    Jan N. van Rijn; Jonathan K. Vis, UCI Machine Learning Repository Dataset [Dataset]. https://paperswithcode.com/dataset/uci-machine-learning-repository
    Dataset updated
    Mar 9, 2021
    Authors
    Jan N. van Rijn; Jonathan K. Vis
    Description

    UCI Machine Learning Repository is a collection of over 550 datasets.

  3. UCI dataset

    • springernature.figshare.com
    bin
    Updated Mar 13, 2023
    Cite
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen (2023). UCI dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20496258.v1
    Available download formats: bin
    Dataset updated
    Mar 13, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Cuff-Less Blood Pressure Estimation Dataset [2] from the UCI Machine Learning Repository. It is a subset of the MIMIC-II Waveform Dataset that contains 12000 records of simultaneous PPG and ABP from 942 patients with a sampling rate of 125 Hz. The 12000 records were uniformly split into four parts with 3000 records each. However, as the subject information is lacking, the Hold-one-out strategy was utilized to generate training, validation, and test sets once the data was preprocessed. In the end, the UCI dataset had 291,078 segments, which was around 404 hours of recording, making it substantially the biggest data set with a considerably higher ratio of continuous segments per record (32.15).

    [2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.
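
As a rough sanity check on the figures in the description above, the average segment length can be back-computed. This is only a sketch: it assumes all segments have equal length and treats the rounded "around 404 hours" as exact. The variable names are illustrative and do not come from the dataset.

```python
# Figures quoted in the description above.
total_hours = 404          # approximate total recording time
n_segments = 291_078       # number of segments after preprocessing
sampling_rate_hz = 125     # PPG/ABP sampling rate

# Average duration of one segment, assuming equal-length segments.
seconds_per_segment = total_hours * 3600 / n_segments

# Number of samples in such a segment at the stated sampling rate.
samples_per_segment = seconds_per_segment * sampling_rate_hz

print(f"{seconds_per_segment:.2f} s per segment")
print(f"{samples_per_segment:.0f} samples per segment")
```

Under these assumptions the segments come out at roughly five seconds each, which is consistent with the totals quoted above.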

  4. UCI Machine Learning Repository

    • scicrunch.org
    Cite
    UCI Machine Learning Repository [Dataset]. http://identifiers.org/RRID:SCR_026571
    Description

    Collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. Datasets approved for the repository will be assigned a Digital Object Identifier (DOI) if they do not already have one. Datasets will be licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows sharing and adaptation of the datasets for any purpose, provided that appropriate credit is given.

  5. UCI and OpenML Data Sets for Ordinal Quantification

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 25, 2023
    Cite
    Bunse, Mirko (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8177301
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Bunse, Mirko
    Moreo, Alejandro
    Sebastiani, Fabrizio
    Senz, Martin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

    With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.
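
Binning a continuous regression label into ordinal classes can be sketched as follows. The bin edges and the 1-based class numbering here are assumptions made for illustration; the actual edges are defined by the provided extraction scripts.

```python
import bisect

def to_ordinal_class(value, bin_edges):
    """Map a continuous value to a 1-based ordinal class.

    `bin_edges` must be sorted; a value equal to an edge falls into the
    higher class.
    """
    return bisect.bisect_right(bin_edges, value) + 1

# Hypothetical edges that split the regression target into four classes.
example_edges = [10.0, 20.0, 30.0]
example_targets = [5.0, 10.0, 25.0, 35.0]
example_classes = [to_ordinal_class(t, example_edges) for t in example_targets]
print(example_classes)
```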

    We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

    Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
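
The two protocols can be illustrated with a small sketch. It is not the authors' implementation: drawing items with replacement, the `roughness` measure (sum of squared second differences of the label distribution), and the toy labels are all assumptions chosen for illustration.

```python
import random

def draw_prevalences(n_classes, rng):
    """Draw a label distribution uniformly from the probability simplex."""
    # Sorted uniform cut points split [0, 1] into n_classes gaps; the gap
    # lengths are uniformly distributed over the simplex.
    cuts = sorted(rng.random() for _ in range(n_classes - 1))
    edges = [0.0] + cuts + [1.0]
    return [edges[i + 1] - edges[i] for i in range(n_classes)]

def draw_sample_indices(labels, prevalences, sample_size, rng):
    """Draw item indices whose class proportions follow `prevalences`."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    classes = sorted(by_class)
    # Pick a class for every slot, then an item of that class
    # (with replacement, which is an assumption of this sketch).
    chosen = rng.choices(classes, weights=prevalences, k=sample_size)
    return [rng.choice(by_class[c]) for c in chosen]

def roughness(prevalences):
    """Assumed smoothness measure: smaller values mean smoother distributions."""
    p = prevalences
    return sum((-p[i - 1] + 2 * p[i] - p[i + 1]) ** 2
               for i in range(1, len(p) - 1))

rng = random.Random(0)
labels = [1, 1, 2, 2, 3, 3, 4, 4]  # toy ordinal labels, not from the data set

# APP: every label distribution is equally likely.
candidates = [draw_prevalences(4, rng) for _ in range(100)]

# APP-OQ-style filter: keep only the smoothest 20% of the APP draws.
smoothest = sorted(candidates, key=roughness)[:20]

sample = draw_sample_indices(labels, smoothest[0], sample_size=10, rng=rng)
```

Each row of the index files described above corresponds to one such `sample`, so the published indices can be used directly instead of redrawing.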

    Usage

    You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

    Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

    Data Extraction: In your terminal, you can call either

    make

    (recommended), or

    julia --project="." --eval "using Pkg; Pkg.instantiate()"
    julia --project="." extract-oq.jl

    Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.
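
A minimal sketch of reading such a file and computing the label distribution, which is the quantity that quantification targets. The in-memory CSV text, the feature column names, and the integer encoding of "class_label" are assumptions; only the header row and the "class_label" column come from the description above.

```python
import csv
import io
from collections import Counter

# Stand-in for one extracted CSV file; the feature columns are made up.
csv_text = (
    "class_label,feature_a,feature_b\n"
    "1,0.5,2.0\n"
    "3,0.1,1.5\n"
    "1,0.7,0.2\n"
)

# DictReader consumes the header row and keys each record by column name.
rows = list(csv.DictReader(io.StringIO(csv_text)))
class_labels = [int(row["class_label"]) for row in rows]

# Relative frequency of each ordinal class in this set of items.
counts = Counter(class_labels)
prevalence = {label: counts[label] / len(class_labels) for label in sorted(counts)}
print(prevalence)
```

For a real file, `io.StringIO(csv_text)` would be replaced by an opened file handle.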

    Further Reading

    Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

  6. UCI Machine Learning Datasets 12/2013

    • academictorrents.com
    bittorrent
    Updated Dec 20, 2013
    Cite
    UCI (2013). UCI Machine Learning Datasets 12/2013 [Dataset]. https://academictorrents.com/details/7fafb101f9c7961f9b840daeb4af43039107ddef
    Available download formats: bittorrent (16365432846)
    Dataset updated
    Dec 20, 2013
    Dataset authored and provided by
    UCI
    License

    https://academictorrents.com/nolicensespecified

    Description

    The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged. Many people deserve thanks for making the repository a success. Foremost among them are the d

  7. UCI Diabetes Data Set

    • kaggle.com
    Updated May 1, 2020
    Cite
    Ergin Altıntaş (2020). UCI Diabetes Data Set [Dataset]. https://www.kaggle.com/ealtintas/uci-machine-learning-repository-diabetes-data-set/tasks
    Dataset updated
    May 1, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ergin Altıntaş
    Description

    About this Dataset

    This CSV contains a data set prepared for participants in the 1994 AAAI Spring Symposium on Artificial Intelligence in Medicine.

    Content

    Original files were obtained from: https://archive.ics.uci.edu/ml/datasets/diabetes

    The archive diabetes-data.tar.z, which contains 70 sets of data recorded on diabetes patients (several weeks' to months' worth of glucose, insulin, and lifestyle data per patient, plus a description of the problem domain), was extracted, processed, and merged into a single CSV file.

    The Code field of the CSV is deciphered as follows:

    33 = Regular insulin dose
    34 = NPH insulin dose
    35 = UltraLente insulin dose
    48 = Unspecified blood glucose measurement
    57 = Unspecified blood glucose measurement
    58 = Pre-breakfast blood glucose measurement
    59 = Post-breakfast blood glucose measurement
    60 = Pre-lunch blood glucose measurement
    61 = Post-lunch blood glucose measurement
    62 = Pre-supper blood glucose measurement
    63 = Post-supper blood glucose measurement
    64 = Pre-snack blood glucose measurement
    65 = Hypoglycemic symptoms
    66 = Typical meal ingestion
    67 = More-than-usual meal ingestion
    68 = Less-than-usual meal ingestion
    69 = Typical exercise activity
    70 = More-than-usual exercise activity
    71 = Less-than-usual exercise activity
    72 = Unspecified special event
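
The code table above maps naturally onto a lookup dictionary. The sketch below shows one way to decode the Code field; the function name and the fallback text for unknown codes are illustrative choices, not part of the dataset.

```python
# Code-to-meaning mapping, copied from the list above.
CODE_MEANINGS = {
    33: "Regular insulin dose",
    34: "NPH insulin dose",
    35: "UltraLente insulin dose",
    48: "Unspecified blood glucose measurement",
    57: "Unspecified blood glucose measurement",
    58: "Pre-breakfast blood glucose measurement",
    59: "Post-breakfast blood glucose measurement",
    60: "Pre-lunch blood glucose measurement",
    61: "Post-lunch blood glucose measurement",
    62: "Pre-supper blood glucose measurement",
    63: "Post-supper blood glucose measurement",
    64: "Pre-snack blood glucose measurement",
    65: "Hypoglycemic symptoms",
    66: "Typical meal ingestion",
    67: "More-than-usual meal ingestion",
    68: "Less-than-usual meal ingestion",
    69: "Typical exercise activity",
    70: "More-than-usual exercise activity",
    71: "Less-than-usual exercise activity",
    72: "Unspecified special event",
}

def describe_code(code):
    """Return the meaning of a Code value, or a placeholder if it is unknown."""
    return CODE_MEANINGS.get(int(code), "Unknown code")

print(describe_code(58))
print(describe_code("65"))
```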

  8. UCI datasets

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 4, 2023
    Cite
    Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi (2023). UCI datasets [Dataset]. http://doi.org/10.5281/zenodo.7681792
    Available download formats: zip
    Dataset updated
    Apr 4, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Collection of two datasets from the UCI website that could be used for structure learning tasks. Includes datasets regarding

    • Air Quality
    • US census 1990

    Size: Two datasets of sizes 9471*17 and 2458285*68, respectively

    Number of features: 15-68

    Ground truth: No

    Type of Graph: No ground truth

    More information about the datasets is contained in the dataset_description.html files.

  9. Open-source data sets for classification task from UCI repository and...

    • figshare.com
    txt
    Updated Aug 31, 2024
    Cite
    xh niu (2024). Open-source data sets for classification task from UCI repository and Scikit-learn in section 4 [Dataset]. http://doi.org/10.6084/m9.figshare.26886055.v1
    Available download formats: txt
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    xh niu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets from Scikit-learn are: ‘Iris’, ‘Wine’, ‘Breast Cancer Wisconsin (Diagnostic)’. Datasets from the UCI repository are: ‘Seeds’, ‘Banknote Authentication’ (‘Banknotes’), ‘Heart disease’, ‘Parkinsons’, ‘Ecoli’, ‘Thyroid (Thyroid gland data)’.

  10. Replication Data for: Scalable Kernel Mean Matching

    • search.dataone.org
    Updated Nov 21, 2023
    Cite
    Chandra, Swarup (2023). Replication Data for: Scalable Kernel Mean Matching [Dataset]. http://doi.org/10.7910/DVN/ELFPEM
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Chandra, Swarup
    Description
  11. UCI Datasets: "Air quality" and "US Census (1990)"

    • zenodo.org
    bin, csv, html
    Updated Jan 27, 2025
    Cite
    Zenodo (2025). UCI Datasets: "Air quality" and "US Census (1990)" [Dataset]. http://doi.org/10.5281/zenodo.8063512
    Available download formats: bin, csv, html
    Dataset updated
    Jan 27, 2025
    Dataset provided by
    Zenodo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Two preprocessed datasets collected from the UCI repository that can be used for the purpose of structure learning from multivariate data of different types.

    Air Quality

    This dataset represents hourly averaged measurements of 5 metal oxide chemical sensors embedded in an air quality chemical multisensor device. The certified analyzer was located on the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of on-field deployed air quality chemical sensor device responses [1]. More information about the attributes and their type can be found in airqualitydataset_description.html.

    Size of dataset: 9358
    Number of Features: 16
    Type of data: discrete and continuous
    Ground Truth: No

    Contains the responses of a gas multisensor device deployed in the field in an Italian city. Hourly response averages are recorded along with gas concentration references from a certified analyzer. There are 15 attributes: Date and Time, as well as discrete and real covariates.

    0 Date (DD/MM/YYYY)
    1 Time (HH.MM.SS)
    2 True hourly averaged concentration CO in mg/m^3 (reference analyzer)
    3 PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
    4 True hourly averaged overall Non Metanic HydroCarbons concentration in microg/m^3 (reference analyzer)
    5 True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
    6 PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
    7 True hourly averaged NOx concentration in ppb (reference analyzer)
    8 PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
    9 True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
    10 PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
    11 PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
    12 Temperature in °C
    13 Relative Humidity (%)
    14 AH Absolute Humidity

    US Census (1990)

    This dataset is a discretized version of the USCensus1990raw dataset. The data was collected as part of the 1990 census, and it describes a one percent sample of the Public Use Microdata Samples (PUMS) person records drawn from the full 1990 census sample (all fifty states and the District of Columbia but not including "PUMA Cross State Lines One Percent Persons Records") [2]. More information about the attributes and their type can be found in census1990_description.html.

    Size of dataset: 2458285
    Number of features: 68
    Ground truth: No

    References:

    [1] S. De Vito and E. Massera and M. Piga and L. Martinotto and G. Di Francia, On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: Chemical, Volume 129, Issue 2, 22 February 2008, Pages 750-757, ISSN 0925-4005 https://doi.org/10.1016/j.snb.2007.09.060

    [2] Meek, Thiesson and Heckerman (2001), "The Learning Curve Method Applied to Clustering", The Journal of Machine Learning Research. (Also see MSR-TR-2001-34, available at https://www.microsoft.com/en-us/research/wp-content/uploads/2001/01/lc-aistats.pdf)

  12. ionosphere Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Fei Tony Liu; Kai Ming Ting; Zhi-Hua Zhou, ionosphere Dataset [Dataset]. https://paperswithcode.com/dataset/ionosphere
    Authors
    Fei Tony Liu; Kai Ming Ting; Zhi-Hua Zhou
    Description

    The original ionosphere dataset from the UCI machine learning repository is a binary classification dataset with dimensionality 34. One attribute has all-zero values and is discarded, so the total number of dimensions is 33. The ‘bad’ class is treated as the outlier class and the ‘good’ class as the inlier class.
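
The preprocessing described above can be sketched in a few lines. The helper names, the toy rows, and the label strings "good" and "bad" are assumptions for illustration; the raw file may encode the classes differently.

```python
def drop_all_zero_columns(rows):
    """Remove every column whose value is zero in all rows."""
    n_columns = len(rows[0])
    keep = [j for j in range(n_columns) if any(row[j] != 0 for row in rows)]
    return [[row[j] for j in keep] for row in rows]

def to_outlier_flags(class_labels):
    """Encode 'bad' as 1 (outlier) and every other label as 0 (inlier)."""
    return [1 if label == "bad" else 0 for label in class_labels]

# Toy data with three attributes; the middle attribute is zero everywhere.
toy_rows = [
    [1.0, 0, 0.5],
    [0.0, 0, -0.2],
    [0.3, 0, 0.9],
]
toy_labels = ["good", "bad", "good"]

features = drop_all_zero_columns(toy_rows)
outlier_flags = to_outlier_flags(toy_labels)
print(len(features[0]), outlier_flags)
```

Applied to the real data, the same column filter would reduce the 34 attributes to 33, as stated in the description.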

  13. Default of credit card clients

    • kaggle.com
    Updated Oct 3, 2019
    Cite
    Marios Michalopoulos (2019). Default of credit card clients [Dataset]. https://www.kaggle.com/mariosfish/default-of-credit-card-clients/metadata
    Dataset updated
    Oct 3, 2019
    Dataset provided by
    Kaggle
    Authors
    Marios Michalopoulos
    Description

    Context

    This notebook was created to analyze and make predictions on the Default of credit card clients Data Set from the UCI Machine Learning Repository. The data set can also be accessed separately from its UCI Machine Learning Repository page.

    Content

    In their paper "The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients" (Yeh I. C. & Lien C. H., 2009), Yeh and Lien review six data mining techniques (discriminant analysis, logistic regression, Bayes classifier, nearest neighbor, artificial neural networks, and classification trees) and their applications to credit scoring. Then, using real cardholders’ credit risk data from Taiwan, they compare the classification accuracy of these techniques.

    Models

    We will create 3 models in order to make predictions and compare them with the original paper. These models are:

    • Logistic Regression
    • Decision tree
    • Neural Network

    After the initial predictions, each model will be "optimized" by GridSearchCV estimator, which will search for the best set of hyperparameters for every model.

    Goal

    Using the models we created, we will try to predict the class value of dpnm column with better scores (accuracy and f1) than the scores presented in the original paper.
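
For reference, the two scores named in the goal can be computed as in the sketch below. It is a plain-Python illustration with toy labels; the notebook itself presumably relies on library implementations, and treating class 1 as the positive (default) class is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels: 1 = default, 0 = no default (not taken from the data set).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

On imbalanced targets such as credit default, F1 is often the more informative of the two, because accuracy can be high even when few positives are found.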

  14. Data from: Imbalanced dataset for benchmarking

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Lemaitre, Guillaume (2020). Imbalanced dataset for benchmarking [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_61452
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Oliveira, Dayvid V. R.
    Nogueira, Fernando
    Lemaitre, Guillaume
    Aridas, Christos K.
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Imbalanced dataset for benchmarking

    The different algorithms of the imbalanced-learn toolbox are evaluated on a set of common datasets, which are more or less balanced. These benchmarks were proposed in [1]. The following section presents the main characteristics of this benchmark.

    Characteristics

    ID | Name           | Repository & Target           | Ratio | # samples | # features
    1  | Ecoli          | UCI, target: imU              | 8.6:1 | 336       | 7
    2  | Optical Digits | UCI, target: 8                | 9.1:1 | 5,620     | 64
    3  | SatImage       | UCI, target: 4                | 9.3:1 | 6,435     | 36
    4  | Pen Digits     | UCI, target: 5                | 9.4:1 | 10,992    | 16
    5  | Abalone        | UCI, target: 7                | 9.7:1 | 4,177     | 8
    6  | Sick Euthyroid | UCI, target: sick euthyroid   | 9.8:1 | 3,163     | 25
    7  | Spectrometer   | UCI, target: >=44             | 11:1  | 531       | 93
    8  | Car_Eval_34    | UCI, target: good, v good     | 12:1  | 1,728     | 6
    9  | ISOLET         | UCI, target: A, B             | 12:1  | 7,797     | 617
    10 | US Crime       | UCI, target: >0.65            | 12:1  | 1,994     | 122
    11 | Yeast_ML8      | LIBSVM, target: 8             | 13:1  | 2,417     | 103
    12 | Scene          | LIBSVM, target: >one label    | 13:1  | 2,407     | 294
    13 | Libras Move    | UCI, target: 1                | 14:1  | 360       | 90
    14 | Thyroid Sick   | UCI, target: sick             | 15:1  | 3,772     | 28
    15 | Coil_2000      | KDD, CoIL, target: minority   | 16:1  | 9,822     | 85
    16 | Arrhythmia     | UCI, target: 06               | 17:1  | 452       | 279
    17 | Solar Flare M0 | UCI, target: M->0             | 19:1  | 1,389     | 10
    18 | OIL            | UCI, target: minority         | 22:1  | 937       | 49
    19 | Car_Eval_4     | UCI, target: vgood            | 26:1  | 1,728     | 6
    20 | Wine Quality   | UCI, wine, target: <=4        | 26:1  | 4,898     | 11
    21 | Letter Img     | UCI, target: Z                | 26:1  | 20,000    | 16
    22 | Yeast _ME2     | UCI, target: ME2              | 28:1  | 1,484     | 8
    23 | Webpage        | LIBSVM, w7a, target: minority | 33:1  | 49,749    | 300
    24 | Ozone Level    | UCI, ozone, data              | 34:1  | 2,536     | 72
    25 | Mammography    | UCI, target: minority         | 42:1  | 11,183    | 6
    26 | Protein homo.  | KDD CUP 2004, minority        | 111:1 | 145,751   | 74
    27 | Abalone_19     | UCI, target: 19               | 130:1 | 4,177     | 8
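
The Ratio column presumably gives the number of majority samples per minority sample after the listed target values are relabeled as the minority class. A minimal sketch of that computation follows; the function name and the toy labels are hypothetical and not taken from the benchmark.

```python
from collections import Counter

def imbalance_ratio(labels, minority_values):
    """Return majority count divided by minority count after binarizing.

    Labels contained in `minority_values` form the minority (positive) class;
    all other labels form the majority class.
    """
    counts = Counter(label in minority_values for label in labels)
    if counts[True] == 0:
        raise ValueError("no sample matches the minority target values")
    return counts[False] / counts[True]

# Toy multi-class labels in the style of Car_Eval_34 ("good, v good" vs. rest).
toy_labels = ["good"] * 1 + ["vgood"] * 1 + ["unacc"] * 20 + ["acc"] * 4
ratio = imbalance_ratio(toy_labels, {"good", "vgood"})
print(f"{ratio:g}:1")
```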

    References

    [1] Ding, Zejin, "Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics." Dissertation, Georgia State University, (2011).

    [2] Blake, Catherine, and Christopher J. Merz. "UCI Repository of machine learning databases." (1998).

    [3] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.

    [4] Caruana, Rich, Thorsten Joachims, and Lars Backstrom. "KDD-Cup 2004: results and analysis." ACM SIGKDD Explorations Newsletter 6.2 (2004): 95-108.

  15. heart-disease-data

    • kaggle.com
    zip
    Updated Aug 5, 2020
    Cite
    Nagaveda Reddy (2020). heart-disease-data [Dataset]. https://www.kaggle.com/nagavedareddy/heartdiseasedata
    Available download formats: zip (3494 bytes)
    Dataset updated
    Aug 5, 2020
    Authors
    Nagaveda Reddy
    Description

    Dataset

    This dataset was created by Nagaveda Reddy

    Contents

  16. Credit Default Data Set from UCI Repository

    • kaggle.com
    zip
    Updated Oct 30, 2018
    Cite
    somaktukai (2018). Credit Default Data Set from UCI Repository [Dataset]. https://www.kaggle.com/datasets/somaktukai/credit-default-data-set-from-uci-repository
    Available download formats: zip (1464078 bytes)
    Dataset updated
    Oct 30, 2018
    Authors
    somaktukai
    Description

    Dataset

    This dataset was created by somaktukai

    Contents

  17. DodgerLoopGame UCR Archive Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 14, 2024
    Cite
    University of Southampton (2024). DodgerLoopGame UCR Archive Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11186627
    Dataset updated
    May 14, 2024
    Dataset provided by
    University of California (http://universityofcalifornia.edu/)
    University of Southampton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the UCR Archive maintained by University of Southampton researchers. Please cite a relevant or the latest full archive release if you use the datasets. See http://www.timeseriesclassification.com/.

    The traffic data were collected with a loop sensor installed on a ramp for the 101 North freeway in Los Angeles. This location is close to Dodgers Stadium; therefore the traffic is affected by the volume of visitors to the stadium. The two classes are:

    • Class 1: Normal Day
    • Class 2: Game Day

    There is nothing to infer from the order of examples in the train and test set. Missing values are represented with NaN in the text file. Data created by Ihler, Alexander, Jon Hutchins, and Padhraic Smyth (see [1][2][3]). Data edited by Chin-Chia Michael Yeh.
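
Because missing values appear as NaN in the text file, downstream statistics need to skip them explicitly. The sketch below shows one way to do that; the whitespace- or comma-separated layout and the sample line are assumptions, not the archive's documented file format.

```python
import math

def parse_series(line):
    """Parse one series of numbers; the token 'NaN' becomes float('nan')."""
    return [float(token) for token in line.replace(",", " ").split()]

def mean_ignoring_nan(values):
    """Average the observed values, or return NaN if none are observed."""
    observed = [v for v in values if not math.isnan(v)]
    return sum(observed) / len(observed) if observed else math.nan

# Hypothetical line of traffic counts with one missing reading.
series = parse_series("10 NaN 14")
n_missing = sum(math.isnan(v) for v in series)
print(n_missing, mean_ignoring_nan(series))
```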

    [1] Ihler, Alexander, Jon Hutchins, and Padhraic Smyth. "Adaptive event detection with time-varying poisson processes." Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006.

    [2] “UCI Machine Learning Repository: Dodgers Loop Sensor Data Set.” UCI Machine Learning Repository, archive.ics.uci.edu/ml/datasets/dodgers+loop+sensor.

    [3] “Caltrans PeMS.” Caltrans, pems.dot.ca.gov/.

    Donator: C. Yeh

  18. UCI Heart Disease Data Set

    • kaggle.com
    Updated Jan 1, 2021
    Cite
    Lourens Walters (2021). UCI Heart Disease Data Set [Dataset]. https://www.kaggle.com/lourenswalters/uci-heart-disease-data-set/metadata
    Dataset updated
    Jan 1, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lourens Walters
    Description

    Context

    The dataset used can be found on the UCI Machine Learning Repository at the following location:

    Heart Disease Dataset

    There are several copies of this dataset to be found on Kaggle, with people focusing on different types of analyses of the data. This specific copy can be analysed by anyone interested, but is primarily used by a study group from the Udacity Bertelsmann Technology Scholarship to practice analysis of association between variables as well as implementation and comparison of various Machine Learning models.

    Content

    According to the paper by Detrano et al. (1989), as found on the UCI Dataset webpage, the data were collected for 303 patients referred for coronary angiography at the Cleveland Clinic between May 1981 and September 1984. The 13 independent (feature) variables can be divided into 3 groups as follows:

    Routine evaluation (based on historical data):

    • ECG at rest
    • Serum Cholesterol
    • Fasting blood sugar

    Non-invasive test data (informed consent obtained for data as part of research protocol):

    • Exercise ECG
      • ST-segment peak slope (upsloping, flat or downsloping)
      • ST-segment depression
    • Exercise Thallium scintigraphy (fixed, reversible or none)
    • Cardiac fluoroscopy (number of vessels appeared to contain calcium)

    Other demographic and clinical variables (based on routine data):

    • Age
    • Sex
    • Chest pain type
    • Systolic blood pressure
    • ST-T-wave abnormality (T-wave abnormality)
    • Probable or definite ventricular hypertrophy (Estes' criteria)

    The dependent (response) variable was the angiographic test result indicating a >50% diameter narrowing.

    Data Dictionary

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3632459%2Fa01747fb0158dc51c12bc0824c9c4ae4%2Fdata_dictionary2.png?generation=1609522473018549&alt=media

    Acknowledgements

    UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. Donor:

    David W. Aha (aha '@' ics.uci.edu) (714) 856-8779

    Inspiration

    The objective of the analysis is to use statistical learning to identify factors associated with Coronary Artery Disease as indicated by a coronary angiography interpreted by a Cardiologist (as per paper written by Detrano et al cited before).

  19. UCI Communities and Crime Unnormalized Data Set

    • kaggle.com
    Updated Feb 21, 2018
    Cite
    Kavitha (2018). UCI Communities and Crime Unnormalized Data Set [Dataset]. https://www.kaggle.com/kkanda/communities%20and%20crime%20unnormalized%20data%20set/notebooks
    Dataset updated
    Feb 21, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kavitha
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Introduction: The dataset used for this experiment is real and authentic. It was acquired from the UCI Machine Learning Repository website [13]. The title of the dataset is ‘Crime and Communities’. It was prepared by combining socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR [13]. The dataset contains a total of 147 attributes and 2216 instances.

    The per capita crimes variables were calculated using population values included in the 1995 FBI data (which differ from the 1990 Census values).

    Content

    The variables included in the dataset describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units. The crime attributes (N=18) that could be predicted are the 8 crimes considered 'Index Crimes' by the FBI (Murders, Rape, Robbery, ....), per capita (actually per 100,000 population) versions of each, and Per Capita Violent Crimes and Per Capita Nonviolent Crimes.

    • predictive variables: 125
    • non-predictive variables: 4
    • potential goal/response variables: 18
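
The "per capita (actually per 100,000 population)" crime variables follow a simple rate formula. The sketch below illustrates it with made-up numbers; the function name is hypothetical, and the population argument stands for the 1995 FBI population value mentioned in the context section.

```python
def per_100k(count, population):
    """Number of incidents per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population

# Made-up example: 12 incidents in a community of 48,000 people.
example_rate = per_100k(12, 48_000)
print(example_rate)
```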

    Acknowledgements

    http://archive.ics.uci.edu/ml/datasets/Communities%20and%20Crime%20Unnormalized

    U. S. Department of Commerce, Bureau of the Census, Census Of Population And Housing 1990 United States: Summary Tape File 1a & 3a (Computer Files),

    U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)

    U.S. Department of Justice, Bureau of Justice Statistics, Law Enforcement Management And Administrative Statistics (Computer File) U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)

    U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States (Computer File) (1995)

    Inspiration

    Data available in the dataset may not act as a complete source of information for identifying factors that contribute to more violent and non-violent crimes as many relevant factors may still be missing.

    However, I would like to try to answer the following questions.

    1. Analyze if number of vacant and occupied houses and the period of time the houses were vacant had contributed to any significant change in violent and non-violent crime rates in communities

    2. How has unemployment changed the crime rate (violent and non-violent) in the communities?

    3. Were people from a particular age group more vulnerable to crime?

    4. Does ethnicity play a role in crime rate?

    5. Has education played a role in bringing down the crime rate?

  20. UCI Bike Sharing Data

    • kaggle.com
    Updated Jul 13, 2023
    Cite
    SriramM2010 (2023). UCI Bike Sharing Data [Dataset]. http://doi.org/10.34740/kaggle/ds/3515261
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SriramM2010
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the hourly and daily count of rental bikes between the years 2011 and 2012 in the Capital Bikeshare system, with the corresponding weather and seasonal information.
