97 datasets found
  1. P

    UCI Machine Learning Repository Dataset

    • paperswithcode.com
    Updated Mar 9, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jan N. van Rijn; Jonathan K. Vis, UCI Machine Learning Repository Dataset [Dataset]. https://paperswithcode.com/dataset/uci-machine-learning-repository
    Explore at:
    Dataset updated
    Mar 9, 2021
    Authors
    Jan N. van Rijn; Jonathan K. Vis
    Description

    UCI Machine Learning Repository is a collection of over 550 datasets.

  2. i

    UCI datasets

    • ieee-dataport.org
    Updated May 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yuan Sun (2025). UCI datasets [Dataset]. https://ieee-dataport.org/documents/uci-datasets
    Explore at:
    Dataset updated
    May 14, 2025
    Authors
    Yuan Sun
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    biology

  3. s

    UCI Machine Learning Repository

    • scicrunch.org
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UCI Machine Learning Repository [Dataset]. http://identifiers.org/RRID:SCR_026571
    Explore at:
    Description

    Collection of databases, domain theories, and data generators that are used by machine learning community for empirical analysis of machine learning algorithms. Datasets approved to be in the repository will be assigned Digital Object Identifier (DOI) if they do not already possess one. Datasets will be licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0) which allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given

  4. a

    UCI Machine Learning Datasets 12/2013

    • academictorrents.com
    bittorrent
    Updated Dec 20, 2013
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UCI (2013). UCI Machine Learning Datasets 12/2013 [Dataset]. https://academictorrents.com/details/7fafb101f9c7961f9b840daeb4af43039107ddef
    Explore at:
    bittorrent(16365432846)Available download formats
    Dataset updated
    Dec 20, 2013
    Dataset authored and provided by
    UCI
    License

    https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified

    Description

    The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged. Many people deserve thanks for making the repository a success. Foremost among them are the d

  5. UCI dataset

    • springernature.figshare.com
    bin
    Updated Mar 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen (2023). UCI dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20496258.v1
    Explore at:
    binAvailable download formats
    Dataset updated
    Mar 13, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Cuff-Less Blood Pressure Estimation Dataset [2] from the UCI Machine Learning Repository. It is a subset of the MIMIC-II Waveform Dataset that contains 12000 records of simultaneous PPG and ABP from 942 patients with a sampling rate of 125 Hz. The 12000 records were uniformly split into four parts with 3000 records each. However, as the subject information is lacking, the Hold-one-out strategy was utilized to generate training, validation, and test sets once the data was preprocessed. In the end, the UCI dataset had 291,078 segments, which was around 404 hours of recording, making it substantially the biggest data set with a considerably higher ratio of continuous segments per record (32.15).

    [2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.

  6. UCI and OpenML Data Sets for Ordinal Quantification

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

    With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

    We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

    Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.

    Usage

    You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

    Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

    Data Extraction: In your terminal, you can call either

    make

    (recommended), or

    julia --project="." --eval "using Pkg; Pkg.instantiate()"
    julia --project="." extract-oq.jl

    Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.

    Further Reading

    Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

  7. heart-disease-data

    • kaggle.com
    zip
    Updated Aug 5, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nagaveda Reddy (2020). heart-disease-data [Dataset]. https://www.kaggle.com/nagavedareddy/heartdiseasedata
    Explore at:
    zip(3494 bytes)Available download formats
    Dataset updated
    Aug 5, 2020
    Authors
    Nagaveda Reddy
    Description

    Dataset

    This dataset was created by Nagaveda Reddy

    Contents

  8. P

    Toxicity Dataset

    • paperswithcode.com
    Updated Apr 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Toxicity Dataset [Dataset]. https://paperswithcode.com/dataset/toxicity
    Explore at:
    Dataset updated
    Apr 26, 2025
    Description

    Classification dataset of 171 molecules according to its toxicity from the UCI Machine Learning Repository.

  9. Adult Census Income Dataset

    • kaggle.com
    Updated Jul 7, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Priyam Choksi (2024). Adult Census Income Dataset [Dataset]. https://www.kaggle.com/datasets/priyamchoksi/adult-census-income-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 7, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Priyam Choksi
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The following was retrieved from UCI machine learning repository.

    This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over $50K a year.

    Description of fnlwgt (final weight)

    The weights on the Current Population Survey (CPS) files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau. We use 3 sets of controls. These are:

    A single cell estimate of the population 16+ for each state. Controls for Hispanic Origin by age and sex. Controls by Race, age and sex. We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used. The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population. People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement. That is that since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within state.

  10. O

    ionosphere

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Aug 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nanjing University (2022). ionosphere [Dataset]. https://opendatalab.com/OpenDataLab/ionosphere
    Explore at:
    zip(58517 bytes)Available download formats
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    Monash University
    Nanjing University
    Description

    The original ionosphere dataset from UCI machine learning repository is a binary classification dataset with dimensionality 34. There is one attribute having values all zeros, which is discarded. So the total number of dimensions are 33. The ‘bad’ class is considered as outliers class and the ‘good’ class as inliers.

  11. Internet Advertisements Data Set

    • kaggle.com
    Updated Sep 1, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UCI Machine Learning (2017). Internet Advertisements Data Set [Dataset]. https://www.kaggle.com/uciml/internet-advertisements-data-set/kernels
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 1, 2017
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    UCI Machine Learning
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The task is to predict whether an image is an advertisement ("ad") or not ("nonad").

    Content

    There are 1559 columns in the data.Each row in the data represent one image which is tagged as ad or nonad in the last column.column 0 to 1557 represent the actual numerical attributes of the images

    Acknowledgements

    Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

    Here is a BiBTeX citation as well:

    @misc{Lichman:2013 , author = "M. Lichman", year = "2013", title = "{UCI} Machine Learning Repository", url = "http://archive.ics.uci.edu/ml", institution = "University of California, Irvine, School of Information and Computer Sciences" } https://archive.ics.uci.edu/ml/citation_policy.html

  12. UCI datasets

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi; Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi (2023). UCI datasets [Dataset]. http://doi.org/10.5281/zenodo.7681792
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 4, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi; Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Collection of two datasets from the UCI website that could be used for structure learning tasks. Includes datasets regarding

    • Air Quality
    • US census 1990

    Size: Two datasets of sizes 9471*17 and 2458285*68 correspondingly

    Number of features: 15-68

    Ground truth: No

    Type of Graph: No ground truth

    More information about the datasets is contained in the dataset_description.html files.

  13. d

    Replication Data for: Scalable Kernel Mean Matching

    • search.dataone.org
    Updated Nov 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chandra, Swarup (2023). Replication Data for: Scalable Kernel Mean Matching [Dataset]. http://doi.org/10.7910/DVN/ELFPEM
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Chandra, Swarup
    Description
  14. Garments'WorkerProductivity

    • kaggle.com
    Updated Oct 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    VargheseR (2024). Garments'WorkerProductivity [Dataset]. http://doi.org/10.24432/C51S6D
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    VargheseR
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction to the Dataset

    Productivity Prediction of Garment Employees [Dataset]. (2020). UCI Machine Learning Repository. https://doi.org/10.24432/C51S6D.

    https://dl.acm.org/doi/abs/10.1504/ijbidm.2021.118183

    The Garment Industry is one of the key examples of the industrial globalization of this modern era. It is a highly labour-intensive industry with lots of manual processes. Satisfying the huge global demand for garment products is mostly dependent on the production and delivery performance of the employees in the garment manufacturing companies. So, it is highly desirable among the decision makers in the garments industry to track, analyse and predict the productivity performance of the working teams in their factories.

    This dataset includes important attributes of the garment manufacturing process and the productivity of the employees which had been collected manually and also been validated by the industry experts.

  15. h

    uci-shopper

    • huggingface.co
    Updated Aug 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    John Henning (2023). uci-shopper [Dataset]. https://huggingface.co/datasets/jlh/uci-shopper
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 4, 2023
    Authors
    John Henning
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Online Shoppers Purchasing Intention Dataset

      Dataset Summary
    

    This dataset is a reupload of the Online Shoppers Purchasing Intention Dataset from the UCI Machine Learning Repository.

    NOTE: The information below is from the original dataset description from UCI's website.

      Overview
    

    Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1908) were positive class samples… See the full description on the dataset page: https://huggingface.co/datasets/jlh/uci-shopper.

  16. DodgerLoopGame UCR Archive Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    University of Southampton (2024). DodgerLoopGame UCR Archive Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11186627
    Explore at:
    Dataset updated
    May 14, 2024
    Dataset provided by
    University of Californiahttp://universityofcalifornia.edu/
    University of Southampton
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the UCR Archive maintained by University of Southampton researchers. Please cite a relevant or the latest full archive release if you use the datasets. See http://www.timeseriesclassification.com/.

    The traffic data are collected with the loop sensor installed on ramp for the 101 North freeway in Los Angeles. This location is close to Dodgers Stadium; therefore the traffic is affected by volume of visitors to the stadium. Missing values are represented with NaN. - Class 1: Normal Day - Class 2: Game Day There is nothing to infer from the order of examples in the train and test set. Missing values are represented with NaN in the text file. Data created by Ihler, Alexander, Jon Hutchins, and Padhraic Smyth (see [1][2][3]). Data edited by Chin-Chia Michael Yeh.

    [1] Ihler, Alexander, Jon Hutchins, and Padhraic Smyth. "Adaptive event detection with time-varying poisson processes." Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006.

    [2] “UCI Machine Learning Repository: Dodgers Loop Sensor Data Set.” UCI Machine Learning Repository, archive.ics.uci.edu/ml/datasets/dodgers+loop+sensor.

    [3] “Caltrans PeMS.” Caltrans, pems.dot.ca.gov/.

    Donator: C. Yeh

  17. P

    Period Changer Dataset

    • paperswithcode.com
    Updated Apr 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Period Changer Dataset [Dataset]. https://paperswithcode.com/dataset/period-changer
    Explore at:
    Dataset updated
    Apr 26, 2025
    Description

    Classification dataset of 90 molecules according to its effects in the circadian rhythm from the UCI Machine Learning Repository.

  18. h

    my_iris

    • huggingface.co
    Updated Sep 20, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ralf Beier (2024). my_iris [Dataset]. https://huggingface.co/datasets/beierr1/my_iris
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 20, 2024
    Authors
    Ralf Beier
    License

    https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/

    Description

    Iris Species Dataset

    The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository. It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other. The dataset is taken from UCI Machine Learning Repository's… See the full description on the dataset page: https://huggingface.co/datasets/beierr1/my_iris.

  19. UCI Machine Learning Online Retail Transactions

    • kaggle.com
    Updated Sep 2, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Denis Expósito (2024). UCI Machine Learning Online Retail Transactions [Dataset]. http://doi.org/10.34740/kaggle/dsv/9303150
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 2, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Denis Expósito
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers. Data was obtained from the UCI Machine Learning public repository

  20. t

    Bank Marketing Dataset (UCI) - Test Upload

    • invenio01-demo.tugraz.at
    zip
    Updated Apr 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    S. Moro; P. Rita; P. Cortez; S. Moro; P. Rita; P. Cortez (2025). Bank Marketing Dataset (UCI) - Test Upload [Dataset]. http://doi.org/10.24432/c5k306
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    UCI Machine Learning Repository
    Authors
    S. Moro; P. Rita; P. Cortez; S. Moro; P. Rita; P. Cortez
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is related to direct marketing campaigns conducted by a Portuguese banking institution, with campaigns relying on phone calls. Often multiple contacts with the same client were necessary to determine whether they would subscribe ('yes') or not ('no') to a bank term deposit. The dataset includes four files:

    1. bank-additional-full.csv: Contains all 41,188 examples with 20 input features, organized chronologically from May 2008 to November 2010, closely aligned with the data analyzed in [Moro et al., 2014].
    2. bank-additional.csv: A subset of 4,119 examples (10% of the full data), randomly selected, with 20 input features.
    3. bank-full.csv: The older version of the dataset, comprising all examples (41,188) with 17 input features, also organized chronologically.
    4. bank.csv: A 10% random subset of the older version, containing 4,119 examples and 17 input features.

    The smaller subsets are designed for testing computationally intensive machine learning algorithms (e.g., SVM). The primary classification objective is to predict whether a client will subscribe to a term deposit ('yes' or 'no'), based on the target variable y.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Jan N. van Rijn; Jonathan K. Vis, UCI Machine Learning Repository Dataset [Dataset]. https://paperswithcode.com/dataset/uci-machine-learning-repository

UCI Machine Learning Repository Dataset

Explore at:
Dataset updated
Mar 9, 2021
Authors
Jan N. van Rijn; Jonathan K. Vis
Description

UCI Machine Learning Repository is a collection of over 550 datasets.

Search
Clear search
Close search
Google apps
Main menu