100+ datasets found

i
UCI datasets
ieee-dataport.org
Updated May 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yuan Sun (2025). UCI datasets [Dataset]. https://ieee-dataport.org/documents/uci-datasets
Explore at:
Dataset updated
May 14, 2025
Authors
Yuan Sun
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
biology
a
UCI Machine Learning Datasets 12/2013
academictorrents.com
bittorrent
Updated Dec 20, 2013
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
UCI (2013). UCI Machine Learning Datasets 12/2013 [Dataset]. https://academictorrents.com/details/7fafb101f9c7961f9b840daeb4af43039107ddef
Explore at:
bittorrent(16365432846)Available download formats
Dataset updated
Dec 20, 2013
Dataset authored and provided by
UCI
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged. Many people deserve thanks for making the repository a success. Foremost among them are the d
UCI and OpenML Data Sets for Ordinal Quantification
zenodo.org
data.niaid.nih.gov
zip
Updated Jul 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8177302
Dataset updated
Jul 25, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Mirko Bunse; Mirko Bunse; Alejandro Moreo; Alejandro Moreo; Fabrizio Sebastiani; Fabrizio Sebastiani; Martin Senz; Martin Senz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.

Usage

You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

Data Extraction: In your terminal, you can call either

make

(recommended), or

julia --project="." --eval "using Pkg; Pkg.instantiate()" julia --project="." extract-oq.jl

Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.

Further Reading

Implementation of our experiments: https://github.com/mirkobunse/regularized-oq
s
UCI Machine Learning Repository
scicrunch.org
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
UCI Machine Learning Repository [Dataset]. http://identifiers.org/RRID:SCR_026571
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_026571
Description
Collection of databases, domain theories, and data generators that are used by machine learning community for empirical analysis of machine learning algorithms. Datasets approved to be in the repository will be assigned Digital Object Identifier (DOI) if they do not already possess one. Datasets will be licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0) which allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given
UCI dataset
springernature.figshare.com
bin
Updated Mar 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen (2023). UCI dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20496258.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.20496258.v1
Dataset updated
Mar 13, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Cuff-Less Blood Pressure Estimation Dataset [2] from the UCI Machine Learning Repository. It is a subset of the MIMIC-II Waveform Dataset that contains 12000 records of simultaneous PPG and ABP from 942 patients with a sampling rate of 125 Hz. The 12000 records were uniformly split into four parts with 3000 records each. However, as the subject information is lacking, the Hold-one-out strategy was utilized to generate training, validation, and test sets once the data was preprocessed. In the end, the UCI dataset had 291,078 segments, which was around 404 hours of recording, making it substantially the biggest data set with a considerably higher ratio of continuous segments per record (32.15).

[2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.
UCI Machine Learning Online Retail Transactions
kaggle.com
Updated Sep 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Denis Expósito (2024). UCI Machine Learning Online Retail Transactions [Dataset]. http://doi.org/10.34740/kaggle/dsv/9303150
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/9303150
Dataset updated
Sep 2, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Denis Expósito
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers. Data was obtained from the UCI Machine Learning public repository
UCI datasets
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi; Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi (2023). UCI datasets [Dataset]. http://doi.org/10.5281/zenodo.7681792
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7681792
Dataset updated
Apr 4, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi; Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Collection of two datasets from the UCI website that could be used for structure learning tasks. Includes datasets regarding

Air Quality

US census 1990

Size: Two datasets of sizes 9471*17 and 2458285*68 correspondingly

Number of features: 15-68

Ground truth: No

Type of Graph: No ground truth

More information about the datasets is contained in the dataset_description.html files.
Imbalanced dataset for benchmarking
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira; Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira (2020). Imbalanced dataset for benchmarking [Dataset]. http://doi.org/10.5281/zenodo.61452
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.61452
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira; Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Imbalanced dataset for benchmarking
=======================

The different algorithms of the `imbalanced-learn` toolbox are evaluated on a set of common dataset, which are more or less balanced. These benchmark have been proposed in [1]. The following section presents the main characteristics of this benchmark.

Characteristics
-------------------

|ID |Name |Repository & Target |Ratio |# samples| # features |
|:---:|:----------------------:|--------------------------------------|:------:|:-------------:|:--------------:|
|1 |Ecoli |UCI, target: imU |8.6:1 |336 |7 |
|2 |Optical Digits |UCI, target: 8 |9.1:1 |5,620 |64 |
|3 |SatImage |UCI, target: 4 |9.3:1 |6,435 |36 |
|4 |Pen Digits |UCI, target: 5 |9.4:1 |10,992 |16 |
|5 |Abalone |UCI, target: 7 |9.7:1 |4,177 |8 |
|6 |Sick Euthyroid |UCI, target: sick euthyroid |9.8:1 |3,163 |25 |
|7 |Spectrometer |UCI, target: >=44 |11:1 |531 |93 |
|8 |Car_Eval_34 |UCI, target: good, v good |12:1 |1,728 |6 |
|9 |ISOLET |UCI, target: A, B |12:1 |7,797 |617 |
|10 |US Crime |UCI, target: >0.65 |12:1 |1,994 |122 |
|11 |Yeast_ML8 |LIBSVM, target: 8 |13:1 |2,417 |103 |
|12 |Scene |LIBSVM, target: >one label |13:1 |2,407 |294 |
|13 |Libras Move |UCI, target: 1 |14:1 |360 |90 |
|14 |Thyroid Sick |UCI, target: sick |15:1 |3,772 |28 |
|15 |Coil_2000 |KDD, CoIL, target: minority |16:1 |9,822 |85 |
|16 |Arrhythmia |UCI, target: 06 |17:1 |452 |279 |
|17 |Solar Flare M0 |UCI, target: M->0 |19:1 |1,389 |10 |
|18 |OIL |UCI, target: minority |22:1 |937 |49 |
|19 |Car_Eval_4 |UCI, target: vgood |26:1 |1,728 |6 |
|20 |Wine Quality |UCI, wine, target: <=4 |26:1 |4,898 |11 |
|21 |Letter Img |UCI, target: Z |26:1 |20,000 |16 |
|22 |Yeast _ME2 |UCI, target: ME2 |28:1 |1,484 |8 |
|23 |Webpage |LIBSVM, w7a, target: minority|33:1 |49,749 |300 |
|24 |Ozone Level |UCI, ozone, data |34:1 |2,536 |72 |
|25 |Mammography |UCI, target: minority |42:1 |11,183 |6 |
|26 |Protein homo. |KDD CUP 2004, minority |111:1|145,751 |74 |
|27 |Abalone_19 |UCI, target: 19 |130:1|4,177 |8 |

References
----------
[1] Ding, Zejin, "Diversified Ensemble Classifiers for H
ighly Imbalanced Data Learning and their Application in Bioinformatics." Dissertation, Georgia State University, (2011).

[2] Blake, Catherine, and Christopher J. Merz. "UCI Repository of machine learning databases." (1998).

[3] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.

[4] Caruana, Rich, Thorsten Joachims, and Lars Backstrom. "KDD-Cup 2004: results and analysis." ACM SIGKDD Explorations Newsletter 6.2 (2004): 95-108.
d
Replication Data for: Scalable Kernel Mean Matching
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chandra, Swarup (2023). Replication Data for: Scalable Kernel Mean Matching [Dataset]. http://doi.org/10.7910/DVN/ELFPEM
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/ELFPEM
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Chandra, Swarup
Description
Datasets available at UCI Machine Learning Repository and other repositories. List of datasets used in the experiment with their sources. ForestCover dataset @ https://archive.ics.uci.edu/ml/datasets/Covertype KDD Cup99 dataset @ https://archive.ics.uci.edu/ml/datasets/KDD+Cup+1999+Data PAMAP dataset @ https://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring Powersupply @ http://www.cse.fau.edu/~xqzhu/stream.html SEA @ http://www.liaad.up.pt/kdus/products/datasets-for-concept-drift Syn002 & Syn003 (generated) @ http://moa.cms.waikato.ac.nz/details/classification/streams/ MNIST @ https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html News20 @ https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html
O
ionosphere
opendatalab.com
paperswithcode.com
zip
Updated Aug 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nanjing University (2022). ionosphere [Dataset]. https://opendatalab.com/OpenDataLab/ionosphere
Explore at:
zip(58517 bytes)Available download formats
Dataset updated
Aug 28, 2022
Dataset provided by
Monash University
Nanjing University
Description
The original ionosphere dataset from UCI machine learning repository is a binary classification dataset with dimensionality 34. There is one attribute having values all zeros, which is discarded. So the total number of dimensions are 33. The ‘bad’ class is considered as outliers class and the ‘good’ class as inliers.
P
Toxicity Dataset
paperswithcode.com
Updated Apr 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Toxicity Dataset [Dataset]. https://paperswithcode.com/dataset/toxicity
Explore at:
Dataset updated
Apr 26, 2025
Description
Classification dataset of 171 molecules according to its toxicity from the UCI Machine Learning Repository.
Internet Advertisements Data Set
kaggle.com
Updated Sep 1, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
UCI Machine Learning (2017). Internet Advertisements Data Set [Dataset]. https://www.kaggle.com/uciml/internet-advertisements-data-set/kernels
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 1, 2017
Dataset provided by
Kagglehttp://kaggle.com/
Authors
UCI Machine Learning
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

The task is to predict whether an image is an advertisement ("ad") or not ("nonad").

Content

There are 1559 columns in the data.Each row in the data represent one image which is tagged as ad or nonad in the last column.column 0 to 1557 represent the actual numerical attributes of the images

Acknowledgements

Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Here is a BiBTeX citation as well:

@misc{Lichman:2013 , author = "M. Lichman", year = "2013", title = "{UCI} Machine Learning Repository", url = "http://archive.ics.uci.edu/ml", institution = "University of California, Irvine, School of Information and Computer Sciences" } https://archive.ics.uci.edu/ml/citation_policy.html
heart-disease-data
kaggle.com
zip
Updated Aug 5, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nagaveda Reddy (2020). heart-disease-data [Dataset]. https://www.kaggle.com/nagavedareddy/heartdiseasedata
Explore at:
zip(3494 bytes)Available download formats
Dataset updated
Aug 5, 2020
Authors
Nagaveda Reddy
Description
Dataset

This dataset was created by Nagaveda Reddy

Contents
h
uci-shopper
huggingface.co
Updated Aug 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Henning (2023). uci-shopper [Dataset]. https://huggingface.co/datasets/jlh/uci-shopper
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 4, 2023
Authors
John Henning
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Online Shoppers Purchasing Intention Dataset

Dataset Summary

This dataset is a reupload of the Online Shoppers Purchasing Intention Dataset from the UCI Machine Learning Repository.

NOTE: The information below is from the original dataset description from UCI's website.

Overview

Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1908) were positive class samples… See the full description on the dataset page: https://huggingface.co/datasets/jlh/uci-shopper.
P
Period Changer Dataset
paperswithcode.com
Updated Apr 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Period Changer Dataset [Dataset]. https://paperswithcode.com/dataset/period-changer
Explore at:
Dataset updated
Apr 26, 2025
Description
Classification dataset of 90 molecules according to its effects in the circadian rhythm from the UCI Machine Learning Repository.
t
Bank Marketing Dataset (UCI) - Test Upload
invenio01-demo.tugraz.at
zip
Updated Apr 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
S. Moro; P. Rita; P. Cortez; S. Moro; P. Rita; P. Cortez (2025). Bank Marketing Dataset (UCI) - Test Upload [Dataset]. http://doi.org/10.24432/c5k306
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.24432/c5k306
Dataset updated
Apr 8, 2025
Dataset provided by
UCI Machine Learning Repository
Authors
S. Moro; P. Rita; P. Cortez; S. Moro; P. Rita; P. Cortez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is related to direct marketing campaigns conducted by a Portuguese banking institution, with campaigns relying on phone calls. Often multiple contacts with the same client were necessary to determine whether they would subscribe ('yes') or not ('no') to a bank term deposit. The dataset includes four files:

bank-additional-full.csv: Contains all 41,188 examples with 20 input features, organized chronologically from May 2008 to November 2010, closely aligned with the data analyzed in [Moro et al., 2014].

bank-additional.csv: A subset of 4,119 examples (10% of the full data), randomly selected, with 20 input features.

bank-full.csv: The older version of the dataset, comprising all examples (41,188) with 17 input features, also organized chronologically.

bank.csv: A 10% random subset of the older version, containing 4,119 examples and 17 input features.

The smaller subsets are designed for testing computationally intensive machine learning algorithms (e.g., SVM). The primary classification objective is to predict whether a client will subscribe to a term deposit ('yes' or 'no'), based on the target variable y.
CKD Datasets UCI Machine Learning Repo
kaggle.com
Updated Sep 20, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chirag Jain (2024). CKD Datasets UCI Machine Learning Repo [Dataset]. https://www.kaggle.com/datasets/chiragajain/ckd-datasets-uci-machine-learning-repo/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 20, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Chirag Jain
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Chirag Jain

Released under CC0: Public Domain

Contents
z
UCI Datasets: "Air quality" and "US Census (1990)"
zenodo.org
bin, csv, html
Updated Jan 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2025). UCI Datasets: "Air quality" and "US Census (1990)" [Dataset]. http://doi.org/10.5281/zenodo.8063512
Explore at:
bin, csv, htmlAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8063512
Dataset updated
Jan 27, 2025
Dataset provided by
Zenodo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Two preprocessed datasets collected from the UCI repository that can be used for the purpose of structure learning from multivariate data of different types.

Air Quality

This dataset represents hourly averaged measurements of 5 metal oxide chemical sensors embedded in an air quality chemical multisensor device. The certified analyzer was located on the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of on-field deployed air quality chemical sensor device responses [1]. More information about the attributes and their type can be found in airqualitydataset_description.html.

Size of dataset: 9358
Number of Features: 16
Type of data: discrete and continuous
Ground Truth: No

Contains the responses of a gas multisensor device deployed on the field in an Italian city. Hourly responses averages are recorded along with gas concentrations references from a certified analyzer. There are 15 attributes. Date and Time as well as discrete and real covariates.

0 Date (DD/MM/YYYY)
1 Time (HH.MM.SS)
2 True hourly averaged concentration CO in mg/m^3 (reference analyzer)
3 PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
4 True hourly averaged overall Non Metanic HydroCarbons concentration in microg/m^3 (reference analyzer)
5 True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
6 PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
7 True hourly averaged NOx concentration in ppb (reference analyzer)
8 PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
9 True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
10 PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
11 PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
12 Temperature in Â°C
13 Relative Humidity (%)
14 AH Absolute Humidity

US Census (1990)

This dataset is a discretized version of the USCensus1990raw dataset. The data was collected as part of the 1990 census, and it describes one percent sample of the Public Use Microdata Samples (PUMS) person records drawn from the full 1990 census sample (all fifty states and the District of Columbia but not including "PUMA Cross State Lines One Percent Persons Records") [2]. More information about the attributes and their type can be found in census1990_description.html.

Size of dataset: 2458285
Number of features: 68
Ground truth: No

References:

[1] S. De Vito and E. Massera and M. Piga and L. Martinotto and G. Di Francia, On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: Chemical, Volume 129, Issue 2, 22 February 2008, Pages 750-757, ISSN 0925-4005 https://doi.org/10.1016/j.snb.2007.09.060

[2] Meek, Thiesson and Heckerman (2001), "The Learning Curve Method Applied to Clustering",The Journal of Machine Learning Research. (Also see MSR-TR-2001-34 available athttps://www.microsoft.com/en-us/research/wp-content/uploads/2001/01/lc-aistats.pdf)
h
my_iris
huggingface.co
Updated Sep 20, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ralf Beier (2024). my_iris [Dataset]. https://huggingface.co/datasets/beierr1/my_iris
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 20, 2024
Authors
Ralf Beier
License
https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
Description
Iris Species Dataset

The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository. It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other. The dataset is taken from UCI Machine Learning Repository's… See the full description on the dataset page: https://huggingface.co/datasets/beierr1/my_iris.
o
kr-vs-kp
openml.org
Updated Apr 6, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alen Shapiro (2014). kr-vs-kp [Dataset]. https://www.openml.org/search?type=data&sort=runs&status=active&qualities.NumberOfClasses=%3D_2&qualities.NumberOfInstances=gte_0&id=3
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 6, 2014
Authors
Alen Shapiro
Description
Author: Alen Shapiro Source: UCI Please cite: UCI citation policy

Title: Chess End-Game -- King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7). The pawn on a7 means it is one square away from queening. It is the King+Rook's side (white) to move.

Sources: (a) Database originally generated and described by Alen Shapiro. (b) Donor/Coder: Rob Holte (holte@uottawa.bitnet). The database was supplied to Holte by Peter Clark of the Turing Institute in Glasgow (pete@turing.ac.uk). (c) Date: 1 August 1989

Past Usage:

Alen D. Shapiro (1983,1987), "Structured Induction in Expert Systems", Addison-Wesley. This book is based on Shapiro's Ph.D. thesis (1983) at the University of Edinburgh entitled "The Role of Structured Induction in Expert Systems".

Stephen Muggleton (1987), "Structuring Knowledge by Asking Questions", pp.218-229 in "Progress in Machine Learning", edited by I. Bratko and Nada Lavrac, Sigma Press, Wilmslow, England SK9 5BB.

Robert C. Holte, Liane Acker, and Bruce W. Porter (1989), "Concept Learning and the Problem of Small Disjuncts", Proceedings of IJCAI. Also available as technical report AI89-106, Computer Sciences Department, University of Texas at Austin, Austin, Texas 78712.

Relevant Information: The dataset format is described below. Note: the format of this database was modified on 2/26/90 to conform with the format of all the other databases in the UCI repository of machine learning databases.

Number of Instances: 3196 total

Number of Attributes: 36

Attribute Summaries: Classes (2): -- White-can-win ("won") and White-cannot-win ("nowin"). I believe that White is deemed to be unable to win if the Black pawn can safely advance. Attributes: see Shapiro's book.

Missing Attributes: -- none

Class Distribution: In 1669 of the positions (52%), White can win. In 1527 of the positions (48%), White cannot win.

The format for instances in this database is a sequence of 37 attribute values. Each instance is a board-descriptions for this chess endgame. The first 36 attributes describe the board. The last (37th) attribute is the classification: "win" or "nowin". There are 0 missing values. A typical board-description is

f,f,f,f,f,f,f,f,f,f,f,f,l,f,n,f,f,t,f,f,f,f,f,f,f,t,f,f,f,f,f,f,f,t,t,n,won

The names of the features do not appear in the board-descriptions. Instead, each feature correponds to a particular position in the feature-value list. For example, the head of this list is the value for the feature "bkblk". The following is the list of features, in the order in which their values appear in the feature-value list:

[bkblk,bknwy,bkon8,bkona,bkspr,bkxbq,bkxcr,bkxwp,blxwp,bxqsq,cntxt,dsopp,dwipd, hdchk,katri,mulch,qxmsq,r2ar8,reskd,reskr,rimmx,rkxwp,rxmsq,simpl,skach,skewr, skrxp,spcop,stlmt,thrsk,wkcti,wkna8,wknck,wkovl,wkpos,wtoeg]

In the file, there is one instance (board position) per line.

Num Instances: 3196 Num Attributes: 37 Num Continuous: 0 (Int 0 / Real 0) Num Discrete: 37 Missing values: 0 / 0.0%