100+ datasets found

UCI Machine Learning Repository Dataset
paperswithcode.com
Jan N. van Rijn; Jonathan K. Vis, UCI Machine Learning Repository Dataset [Dataset]. https://paperswithcode.com/dataset/uci-machine-learning-repository
Authors
Jan N. van Rijn; Jonathan K. Vis
Description
UCI Machine Learning Repository is a collection of over 550 datasets.
UCI dataset
springernature.figshare.com
Updated Mar 13, 2023
Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen (2023). UCI dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20496258.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.20496258.v1
Dataset updated
Mar 13, 2023
Dataset provided by
Figshare: http://figshare.com/
Authors
Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is the Cuff-Less Blood Pressure Estimation Dataset [2] from the UCI Machine Learning Repository, a subset of the MIMIC-II Waveform Dataset that contains 12,000 records of simultaneous PPG and ABP signals from 942 patients, sampled at 125 Hz. The 12,000 records were split uniformly into four parts of 3,000 records each. Because subject information is lacking, a hold-one-out strategy was used to generate the training, validation, and test sets after the data were preprocessed. In the end, the UCI dataset had 291,078 segments, or about 404 hours of recording, making it by far the largest of the datasets considered, with a considerably higher ratio of continuous segments per record (32.15).

[2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.
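The description gives a 125 Hz sampling rate and 291,078 segments covering about 404 hours, which implies an average segment length of roughly 5 s (404 h × 3,600 s/h ÷ 291,078 ≈ 5.0 s). The sketch below cuts a signal into fixed, non-overlapping windows; the 5 s window is inferred from those two figures rather than stated by the source, the helper names `segment` and `segments_to_hours` are hypothetical, and no signal-quality filtering is attempted.

```python
import numpy as np

FS_HZ = 125  # sampling rate stated in the dataset description


def segment(signal, window_s, fs=FS_HZ):
    """Split a 1-D signal into non-overlapping windows of `window_s` seconds.

    Samples left over after the last full window are dropped. Any quality
    filtering used to build the published segments is NOT reproduced here.
    """
    win = int(round(window_s * fs))  # samples per window
    x = np.asarray(signal)
    n_windows = len(x) // win
    return x[: n_windows * win].reshape(n_windows, win)


def segments_to_hours(n_segments, window_s):
    """Total recording time, in hours, covered by n_segments windows of window_s seconds."""
    return n_segments * window_s / 3600.0


# Hypothetical example: a 1,300-sample PPG trace at 125 Hz, cut into 5 s windows.
ppg = np.zeros(1300)
print(segment(ppg, 5.0).shape)          # (2, 625)
print(segments_to_hours(291_078, 5.0))  # about 404.3 hours
```

With 5 s windows the segment count and the quoted 404 hours agree, which is the only reason that window length is used here.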
UCI datasets
ieee-dataport.org
Updated May 14, 2025
Yuan Sun (2025). UCI datasets [Dataset]. https://ieee-dataport.org/documents/uci-datasets
Dataset updated
May 14, 2025
Authors
Yuan Sun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
biology
UCI dataset
ieee-dataport.org
Updated Jun 12, 2024
Wutao Xiong (2024). UCI dataset [Dataset]. https://ieee-dataport.org/documents/uci-dataset
Dataset updated
Jun 12, 2024
Authors
Wutao Xiong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
and different customers have different starting times
UCI datasets
data.niaid.nih.gov
zenodo.org
Updated Apr 4, 2023
Drton, Mathias (2023). UCI datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7681647
Dataset updated
Apr 4, 2023
Dataset provided by
Zadorozhnyi, Oleksandr
Drton, Mathias
Reifferscheidt, David
Haug, Stephan
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
A collection of two datasets from the UCI website that can be used for structure-learning tasks. It includes the following datasets:

Air Quality

US census 1990

Size: Two datasets of sizes 9471 × 17 and 2458285 × 68, respectively

Number of features: 15-68

Ground truth: No

Type of Graph: No ground truth

More information about the datasets is contained in the dataset_description.html files.
Bike Rental Data Set - UCI
kaggle.com
Updated Nov 30, 2022
Víctor Aguado (2022). Bike Rental Data Set - UCI [Dataset]. https://www.kaggle.com/datasets/aguado/bike-rental-data-set-uci
Format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Nov 30, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Víctor Aguado
Description

Bicycle rental systems in large cities automate the collection and return of bikes through a network of stations distributed throughout the metropolis. With these systems, people can rent a bike at one location and return it at another, depending on their needs. The data generated by these systems are attractive to researchers because they include variables such as trip duration, departure and destination points, and travel time. Bike-sharing systems therefore work as a network of sensors that is useful for mobility studies. To improve management, one of these companies needs to anticipate the demand in a given time range based on factors such as the time slot, the type of day (weekday or holiday), the weather, etc.

The objective of this data set is to predict demand in a series of specific time slots, using the historical data as the basis for building a linear model.

Data Description

Two data sets are provided, containing the number of bicycles rented in different time slots:

Training data: contains the response variable (number of bicycles rented in that time slot).

Test data: does not contain the response variable, which must be predicted from the historical data in the training set.

The variables present in the two data sets are:

id: time slot identifier (not related to time order)

year: year (2011 or 2012)

hour: hour of the day (0 to 23)

season: 1 = winter, 2 = spring, 3 = summer, 4 = autumn

holiday: whether the day was a holiday

workingday: whether the day was a working day (neither a holiday nor a weekend)

weather: four categories (1 to 4) ranging from best to worst weather

temp: temperature in degrees Celsius

atemp: apparent ("feels like") temperature in degrees Celsius

humidity: relative humidity

windspeed: wind speed (km/h)

count (only in the training set): total number of rentals in that time slot
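The stated goal is a linear model of `count`. The following is a minimal sketch using ordinary least squares in NumPy, assuming each data set has been read into a list of dicts keyed by the column names above and that `holiday` and `workingday` are coded 0/1. One-hot encoding `year`, `season`, `weather`, and `hour` is our own choice (treating the hour as a number would force demand to change linearly through the day), and the helper names are hypothetical.

```python
import numpy as np

# Categorical columns and their levels, taken from the variable list above.
CATEGORICAL = {
    "year": [2011, 2012],
    "season": [1, 2, 3, 4],
    "weather": [1, 2, 3, 4],
    "hour": list(range(24)),
}
# Columns used as-is (holiday/workingday assumed to be 0/1 indicators).
NUMERIC = ["holiday", "workingday", "temp", "atemp", "humidity", "windspeed"]


def design_matrix(rows):
    """Intercept + numeric columns + one-hot categorical columns (first level dropped)."""
    out = []
    for r in rows:
        feats = [1.0]  # intercept
        feats += [float(r[c]) for c in NUMERIC]
        for col, levels in CATEGORICAL.items():
            # Dropping the first level avoids perfect collinearity with the intercept.
            feats += [1.0 if r[col] == lvl else 0.0 for lvl in levels[1:]]
        out.append(feats)
    return np.array(out)


def fit(train_rows):
    """Least-squares coefficients for the training set's `count` column."""
    X = design_matrix(train_rows)
    y = np.array([float(r["count"]) for r in train_rows])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(coef, rows):
    """Predicted rentals for rows without a `count` column (e.g. the test set)."""
    return design_matrix(rows) @ coef
```

Usage, assuming `train_rows` and `test_rows` hold the two delivered data sets: `coef = fit(train_rows)` followed by `predict(coef, test_rows)`. A plain linear model can return negative counts, so predictions may need clipping at zero; the source itself only specifies a linear model.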
Diabetes UCI Dataset
cubig.ai
Updated Jun 23, 2025
CUBIG (2025). Diabetes UCI Dataset [Dataset]. https://cubig.ai/store/products/494/diabetes-uci-dataset
Dataset updated
Jun 23, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Diabetes UCI Dataset is a structured dataset for early-stage diabetes risk prediction, collected through questionnaire responses from patients at the Sylhet Diabetes Hospital in Bangladesh.

2) Data Utilization
(1) Characteristics of the Diabetes UCI Dataset:
• The dataset includes 16 attributes, among them age, gender, and diabetes symptoms such as sudden weight loss, polyuria, polyphagia, and visual blurring; the symptom attributes are recorded as binary indicators (Yes/No). The Class column is a binary label indicating whether the individual has diabetes (Positive/Negative).
• All features are discrete or binary, which makes the dataset highly interpretable and well suited to medical-domain applications.

(2) Applications of the Diabetes UCI Dataset:
• Training early diabetes prediction models: the dataset can be used to train binary classification models that predict the likelihood of diabetes onset from symptom-related features.
• Risk factor analysis and clinical decision support: it can be applied to statistical analysis of how symptoms influence diabetes diagnosis, or to support the development of clinical decision support systems in healthcare settings.

arrhythmia

openml.org

Updated Apr 6, 2014

H. Altay Guvenir; Burak Acar; Haldun Muderrisoglu (2014). arrhythmia [Dataset]. https://www.openml.org/d/5

Format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)

Dataset updated

Apr 6, 2014

Authors

H. Altay Guvenir; Burak Acar; Haldun Muderrisoglu

Description

Author: H. Altay Guvenir, Burak Acar, Haldun Muderrisoglu
Source: UCI
Please cite: UCI

Cardiac Arrhythmia Database
The aim is to determine the type of arrhythmia from the ECG recordings. This database contains 279 attributes, 206 of which are linear-valued and the rest nominal.

Concerning the study of H. Altay Guvenir: "The aim is to distinguish between the presence and absence of cardiac arrhythmia and to classify it into one of 16 groups. Class 01 refers to a 'normal' ECG, classes 02 to 15 refer to different classes of arrhythmia, and class 16 refers to the remaining unclassified cases. For the time being, there exists a computer program that makes such a classification. However, there are differences between the cardiologist's and the program's classifications. Taking the cardiologist's classification as the gold standard, we aim to minimize this difference by means of machine learning tools."

The names and id numbers of the patients were recently removed from the database.

Attribute Information

  1 Age: Age in years, linear
  2 Sex: Sex (0 = male; 1 = female), nominal
  3 Height: Height in centimeters, linear
  4 Weight: Weight in kilograms, linear
  5 QRS duration: Average of QRS duration in msec., linear
  6 P-R interval: Average duration between onset of P and Q waves
   in msec., linear
  7 Q-T interval: Average duration between onset of Q and offset
   of T waves in msec., linear
  8 T interval: Average duration of T wave in msec., linear
  9 P interval: Average duration of P wave in msec., linear
 Vector angles in degrees on the frontal plane of (linear):
 10 QRS
 11 T
 12 P
 13 QRST
 14 J
 15 Heart rate: Number of heart beats per minute, linear
 Of channel DI:
  Average width, in msec., of (linear):
  16 Q wave
  17 R wave
  18 S wave
  19 R' wave, small peak just after R
  20 S' wave
  21 Number of intrinsic deflections, linear
  22 Existence of ragged R wave, nominal
  23 Existence of diphasic derivation of R wave, nominal
  24 Existence of ragged P wave, nominal
  25 Existence of diphasic derivation of P wave, nominal
  26 Existence of ragged T wave, nominal
  27 Existence of diphasic derivation of T wave, nominal
 Of channel DII: 
  28 .. 39 (similar to 16 .. 27 of channel DI)
 Of channel DIII:
  40 .. 51
 Of channel AVR:
  52 .. 63
 Of channel AVL:
  64 .. 75
 Of channel AVF:
  76 .. 87
 Of channel V1:
  88 .. 99
 Of channel V2:
  100 .. 111
 Of channel V3:
  112 .. 123
 Of channel V4:
  124 .. 135
 Of channel V5:
  136 .. 147
 Of channel V6:
  148 .. 159
 Of channel DI:
  Amplitude, in units of 0.1 millivolt, of:
  160 JJ wave, linear
  161 Q wave, linear
  162 R wave, linear
  163 S wave, linear
  164 R' wave, linear
  165 S' wave, linear
  166 P wave, linear
  167 T wave, linear
  168 QRSA: sum of areas of all segments divided by 10
    (area = width * height / 2), linear
  169 QRSTA = QRSA + 0.5 * width of T wave * 0.1 * height of T
    wave (if T is diphasic, the bigger segment is considered), linear
 Of channel DII:
  170 .. 179
 Of channel DIII:
  180 .. 189
 Of channel AVR:
  190 .. 199
 Of channel AVL:
  200 .. 209
 Of channel AVF:
  210 .. 219
 Of channel V1:
  220 .. 229
 Of channel V2:
  230 .. 239
 Of channel V3:
  240 .. 249
 Of channel V4:
  250 .. 259
 Of channel V5:
  260 .. 269
 Of channel V6:
  270 .. 279
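The attribute numbering above follows a fixed pattern: attributes 1 to 15 are global measurements, 16 to 159 are twelve blocks of 12 width/shape attributes (one block per channel, DI through V6), and 160 to 279 are twelve blocks of 10 amplitude attributes. A small lookup, sketched here with hypothetical names and abbreviated feature labels, makes that pattern explicit and also encodes the QRSTA formula as written in the list. Attribute numbers are 1-based, so subtract 1 for a 0-based column index.

```python
# Channel order and per-channel feature labels, following the attribute list above.
CHANNELS = ["DI", "DII", "DIII", "AVR", "AVL", "AVF",
            "V1", "V2", "V3", "V4", "V5", "V6"]
GLOBAL = ["Age", "Sex", "Height", "Weight", "QRS duration", "P-R interval",
          "Q-T interval", "T interval", "P interval", "QRS angle", "T angle",
          "P angle", "QRST angle", "J angle", "Heart rate"]
# Attributes 16..159: 12 features per channel (widths and nominal flags).
WIDTH_BLOCK = ["Q wave width", "R wave width", "S wave width",
               "R' wave width", "S' wave width", "intrinsic deflections",
               "ragged R wave", "diphasic R wave", "ragged P wave",
               "diphasic P wave", "ragged T wave", "diphasic T wave"]
# Attributes 160..279: 10 amplitude features per channel.
AMPLITUDE_BLOCK = ["JJ wave", "Q wave", "R wave", "S wave", "R' wave",
                   "S' wave", "P wave", "T wave", "QRSA", "QRSTA"]


def describe_attribute(i):
    """Return (channel, feature) for 1-based attribute number i (1..279)."""
    if not 1 <= i <= 279:
        raise ValueError("attribute numbers run from 1 to 279")
    if i <= 15:
        return ("global", GLOBAL[i - 1])
    if i <= 159:
        ch, k = divmod(i - 16, len(WIDTH_BLOCK))
        return (CHANNELS[ch], WIDTH_BLOCK[k])
    ch, k = divmod(i - 160, len(AMPLITUDE_BLOCK))
    return (CHANNELS[ch], AMPLITUDE_BLOCK[k])


def qrsta(qrsa, t_width, t_height):
    """QRSTA = QRSA + 0.5 * width of T wave * 0.1 * height of T wave, as listed above."""
    return qrsa + 0.5 * t_width * 0.1 * t_height


print(describe_attribute(28))   # ('DII', 'Q wave width')
print(describe_attribute(279))  # ('V6', 'QRSTA')
```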

Class code - class - number of instances:

  Code  Class                                       Instances
  01    Normal                                      245
  02    Ischemic changes (Coronary Artery Disease)  44
  03    Old Anterior Myocardial Infarction          15
  04    Old Inferior Myocardial Infarction          15
  05    Sinus tachycardia                           13
  06    Sinus bradycardia                           25
  07    Ventricular Premature Contraction (PVC)     3
  08    Supraventricular Premature Contraction      2
  09    Left bundle branch block                    9
  10    Right bundle branch block                   50
  11    1st degree atrioventricular (AV) block      0
  12    2nd degree AV block                         0
  13    3rd degree AV block                         0
  14    Left ventricular hypertrophy                4
  15    Atrial Fibrillation or Flutter              5
  16    Others                                      22
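The counts in the class table sum to 452 instances, and classes 11 to 13 are empty, so only 13 of the 16 classes can actually be learned from this data. A quick tally, with the counts transcribed from the table, also gives the accuracy of always predicting the majority class (Normal), a useful floor when evaluating a classifier on this dataset:

```python
# Class code -> number of instances, transcribed from the table above.
CLASS_COUNTS = {
    1: 245, 2: 44, 3: 15, 4: 15, 5: 13, 6: 25, 7: 3, 8: 2,
    9: 9, 10: 50, 11: 0, 12: 0, 13: 0, 14: 4, 15: 5, 16: 22,
}

total = sum(CLASS_COUNTS.values())
empty = [code for code, n in CLASS_COUNTS.items() if n == 0]
majority_code, majority_n = max(CLASS_COUNTS.items(), key=lambda kv: kv[1])
baseline_accuracy = majority_n / total  # accuracy of always predicting class 01

print(total)                        # 452
print(empty)                        # [11, 12, 13]
print(majority_code)                # 1
print(round(baseline_accuracy, 3))  # 0.542
```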

UCI Machine Learning Repository
scicrunch.org
UCI Machine Learning Repository [Dataset]. http://identifiers.org/RRID:SCR_026571
Unique identifier
https://identifiers.org/RRID:SCR_026571
Description
A collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms. Datasets approved for the repository are assigned a Digital Object Identifier (DOI) if they do not already have one, and are licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows sharing and adaptation of the datasets for any purpose, provided that appropriate credit is given.
uci-shopper
huggingface.co
Updated Aug 4, 2023
John Henning (2023). uci-shopper [Dataset]. https://huggingface.co/datasets/jlh/uci-shopper
Format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Aug 4, 2023
Authors
John Henning
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Online Shoppers Purchasing Intention Dataset

Dataset Summary

This dataset is a reupload of the Online Shoppers Purchasing Intention Dataset from the UCI Machine Learning Repository.

NOTE: The information below is from the original dataset description from UCI's website.

Overview

Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1908) were positive class samples… See the full description on the dataset page: https://huggingface.co/datasets/jlh/uci-shopper.
https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html Dataset
paperswithcode.com
Updated Oct 28, 2024
(2024). https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html Dataset [Dataset]. https://paperswithcode.com/dataset/https-kdd-ics-uci-edu-databases-kddcup99
Dataset updated
Oct 28, 2024
UCI Machine Learning Datasets 12/2013
academictorrents.com
Updated Dec 20, 2013
UCI (2013). UCI Machine Learning Datasets 12/2013 [Dataset]. https://academictorrents.com/details/7fafb101f9c7961f9b840daeb4af43039107ddef
Available download formats: bittorrent (16365432846)
Dataset updated
Dec 20, 2013
Dataset authored and provided by
UCI
License
No license specified: https://academictorrents.com/nolicensespecified
Description
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged. Many people deserve thanks for making the repository a success. Foremost among them are the d
dataset-uci
zenodo.org
Updated Apr 12, 2020
David López de la Fuente; Alberto Lendínez Gutiérrez (2020). dataset-uci [Dataset]. http://doi.org/10.5281/zenodo.3748994
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.3748994
Dataset updated
Apr 12, 2020
Dataset provided by
Zenodo: http://zenodo.org/
Authors
David López de la Fuente; Alberto Lendínez Gutiérrez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains the list of databases that can be found in the UCI web repository.
UCI Folio Leaf Dataset
academictorrents.com
Updated Oct 12, 2015
Trishen Munisami; Mahess Ramsurn; Somveer Kishnah; Sameerchand Pudaruth (2015). UCI Folio Leaf Dataset [Dataset]. https://academictorrents.com/details/a6c64db1e42721f5d7e7aa2b118e293a0d0d335b
Available download formats: bittorrent (972471245)
Dataset updated
Oct 12, 2015
Dataset authored and provided by
Trishen Munisami; Mahess Ramsurn; Somveer Kishnah; Sameerchand Pudaruth
License
No license specified: https://academictorrents.com/nolicensespecified
Description
Source: The leaves were taken from plants on the farm of the University of Mauritius and nearby locations.
Donors: Trishen Munisami (trishen.munisami @ gmail.com), Mahess Ramsurn (ramsurn.mahess @ umail.uom.ac.mu), Somveer Kishnah (s.kishnah @ uom.ac.mu), Sameerchand Pudaruth (sameerchand.pudaruth @ gmail.com)
Data Set Information:
- The leaves were placed on a white background and then photographed.
- The pictures were taken in broad daylight to ensure optimum light intensity.
Attribute Information: List of plant species: 1. Beaumier du perou 2. Eggplant 3. Fruitcitere 4. Guava 5. Hibiscus 6. Betel 7. Rose 8. Chrysanthemum 9. Ficus 10. Duranta gold 11. Ashanti blood 12. Bitter Orange 13. Coeur Demoiselle 14. Jackfruit 15. Mulberry Leaf 16. Pimento 17. Pomme Jacquot 18. Star Apple 19. Barbados Cherry 20. Sweet Olive 21. Croton 22. Thevetia 23. Vieux Garcon 24. Chocolate tree 25. Carricature plant 26. Coffee 27. Ketembilla 28. Chinese guava 29. Lychee 30. Geranium 31. Sweet potato 32. Papa
Daily Demand Forecasting Orders from UCI ML
kaggle.com
Updated Jan 7, 2025
Pham Huyen (2025). Daily Demand Forecasting Orders from UCI ML [Dataset]. https://www.kaggle.com/datasets/phamhuyen286/daily-demand-forecasting-orders-from-uci-ml/versions/1
Format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jan 7, 2025
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Pham Huyen
Description
The dataset was collected over 60 days and is a real database from a Brazilian logistics company. It has twelve predictive attributes and a target, which is the total number of orders for daily treatment. The database was used in academic research at the Universidade Nove de Julho.
Comparison of decision tree dimensions on 40 UCI datasets including the...
figshare.com
plos.figshare.com
Updated Jun 1, 2023
Gregor Stiglic; Simon Kocbek; Igor Pernek; Peter Kokol (2023). Comparison of decision tree dimensions on 40 UCI datasets including the number of leaves. [Dataset]. http://doi.org/10.1371/journal.pone.0033812.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0033812.t003
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Gregor Stiglic; Simon Kocbek; Igor Pernek; Peter Kokol
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of decision tree dimensions on 40 UCI datasets including the number of leaves.
UCI Heart Disease Data
gts.ai
Updated Jan 26, 2025
GTS (2025). UCI Heart Disease Data [Dataset]. https://gts.ai/dataset-download/uci-heart-disease-data/
Available download formats: json
Dataset updated
Jan 26, 2025
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
Description
The UCI Heart Disease Dataset, with 14 key attributes, for machine learning and research; suitable for predictive modeling.
UCI SMS spam dataset Dataset
paperswithcode.com
Updated Apr 7, 2024
(2024). UCI SMS spam dataset Dataset [Dataset]. https://paperswithcode.com/dataset/uci-sms-spam-dataset
Dataset updated
Apr 7, 2024
Description
The SMS Spam Collection is a public set of SMS labeled messages that have been collected for mobile phone spam research.
Obesity DataSet UCI ML
kaggle.com
Updated Feb 23, 2022
Tathagat Banerjee (2022). Obesity DataSet UCI ML [Dataset]. https://www.kaggle.com/datasets/tathagatbanerjee/obesity-dataset-uci-ml
Format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Feb 23, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Tathagat Banerjee
Description
Estimation of obesity levels based on eating habits and physical condition Data Set

Abstract: This dataset includes data for the estimation of obesity levels in individuals from Mexico, Peru, and Colombia, based on their eating habits and physical condition.

Data Set Characteristics: Multivariate
Number of Instances: 2111
Area: Life
Attribute Characteristics: Integer
Number of Attributes: 17
Date Donated: 2019-08-27
Associated Tasks: Classification, Regression, Clustering
Missing Values? N/A
Number of Web Hits: 70843

Source:

Fabio Mendoza Palechor, email: fmendoza1 '@' cuc.edu.co, cell phone: +573182929611; Alexis de la Hoz Manotas, email: akdelahoz '@' gmail.com, cell phone: +573017756983

Data Set Information:

This dataset includes data for the estimation of obesity levels in individuals from Mexico, Peru, and Colombia, based on their eating habits and physical condition. The data contain 17 attributes and 2,111 records. The records are labeled with the class variable NObesity (obesity level), which allows the data to be classified using the values Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II, and Obesity Type III. 77% of the data was generated synthetically using the Weka tool and the SMOTE filter; 23% was collected directly from users through a web platform.
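Most records were generated with Weka's SMOTE filter. As an illustration of the idea only (this is not Weka's implementation, and the function and parameter names are hypothetical), basic SMOTE creates a synthetic minority sample by picking a minority point, picking one of its k nearest minority neighbours, and interpolating at a random position between them, x_new = x + λ(x_nn - x) with λ drawn uniformly from [0, 1). This basic form assumes numeric features.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from minority-class rows X_min (numeric features)."""
    X = np.asarray(X_min, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances; a point is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest minority neighbours per row

    synthetic = np.empty((n_new, X.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                     # random minority sample
        nn = X[neighbours[i, rng.integers(k)]]  # one of its k neighbours
        lam = rng.random()                      # position along the segment, in [0, 1)
        synthetic[j] = X[i] + lam * (nn - X[i])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority points, SMOTE fills in the region the minority class already occupies rather than extrapolating beyond it.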

Attribute Information:

Read the article ([Web Link]) to see the description of the attributes.

Relevant Papers:

[1] Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.
[2] De-La-Hoz-Correa, E., Mendoza Palechor, F., De-La-Hoz-Manotas, A., Morales Ortega, R., & Sánchez Hernández, A. B. (2019). Obesity level estimation software based on decision trees.

Citation Request:

[1] Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.

Data from: Imbalanced dataset for benchmarking

data.niaid.nih.gov
zenodo.org

Updated Jan 24, 2020

Lemaitre, Guillaume (2020). Imbalanced dataset for benchmarking [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_61452

Dataset updated

Jan 24, 2020

Dataset provided by

Aridas, Christos K.
Oliveira, Dayvid V. R.
Nogueira, Fernando
Lemaitre, Guillaume

License

Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically

Description

Imbalanced dataset for benchmarking

The algorithms of the imbalanced-learn toolbox are evaluated on a set of common datasets with varying degrees of class imbalance. This benchmark was proposed in [1]. The following section presents its main characteristics.

Characteristics

ID	Name	Repository & Target	Ratio	# samples	# features
1	Ecoli	UCI, target: imU	8.6:1	336	7
2	Optical Digits	UCI, target: 8	9.1:1	5,620	64
3	SatImage	UCI, target: 4	9.3:1	6,435	36
4	Pen Digits	UCI, target: 5	9.4:1	10,992	16
5	Abalone	UCI, target: 7	9.7:1	4,177	8
6	Sick Euthyroid	UCI, target: sick euthyroid	9.8:1	3,163	25
7	Spectrometer	UCI, target: >=44	11:1	531	93
8	Car_Eval_34	UCI, target: good, v good	12:1	1,728	6
9	ISOLET	UCI, target: A, B	12:1	7,797	617
10	US Crime	UCI, target: >0.65	12:1	1,994	122
11	Yeast_ML8	LIBSVM, target: 8	13:1	2,417	103
12	Scene	LIBSVM, target: >one label	13:1	2,407	294
13	Libras Move	UCI, target: 1	14:1	360	90
14	Thyroid Sick	UCI, target: sick	15:1	3,772	28
15	Coil_2000	KDD, CoIL, target: minority	16:1	9,822	85
16	Arrhythmia	UCI, target: 06	17:1	452	279
17	Solar Flare M0	UCI, target: M->0	19:1	1,389	10
18	OIL	UCI, target: minority	22:1	937	49
19	Car_Eval_4	UCI, target: vgood	26:1	1,728	6
20	Wine Quality	UCI, wine, target: <=4	26:1	4,898	11
21	Letter Img	UCI, target: Z	26:1	20,000	16
22	Yeast_ME2	UCI, target: ME2	28:1	1,484	8
23	Webpage	LIBSVM, w7a, target: minority	33:1	49,749	300
24	Ozone Level	UCI, ozone, data	34:1	2,536	72
25	Mammography	UCI, target: minority	42:1	11,183	6
26	Protein homo.	KDD CUP 2004, minority	111:1	145,751	74
27	Abalone_19	UCI, target: 19	130:1	4,177	8
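Each Ratio in the table is the number of majority (negative) samples per minority (positive) sample once the listed target is treated as the positive class and everything else as negative. imbalanced-learn itself exposes this benchmark through `imblearn.datasets.fetch_datasets`, which downloads the data; the dependency-free sketch below, with a hypothetical helper name, only reproduces the ratio. For example, with the class counts from the arrhythmia table earlier in this document, target class 06 gives 427 / 25 ≈ 17.1, consistent with the 17:1 listed for Arrhythmia.

```python
from collections import Counter


def imbalance_ratio(labels, positive):
    """Majority-to-minority ratio after mapping labels in `positive` to 1, all others to 0."""
    positive = set(positive)
    binary = Counter(1 if y in positive else 0 for y in labels)
    n_pos, n_neg = binary[1], binary[0]
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return max(n_pos, n_neg) / min(n_pos, n_neg)


# Arrhythmia: expand the non-empty per-class counts into labels, then use class 6 as the target.
arrhythmia_counts = {1: 245, 2: 44, 3: 15, 4: 15, 5: 13, 6: 25, 7: 3, 8: 2,
                     9: 9, 10: 50, 14: 4, 15: 5, 16: 22}
labels = [code for code, n in arrhythmia_counts.items() for _ in range(n)]
print(round(imbalance_ratio(labels, {6}), 2))  # 17.08
```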

References

[1] Ding, Zejin, "Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics." Dissertation, Georgia State University, (2011).

[2] Blake, Catherine, and Christopher J. Merz. "UCI Repository of machine learning databases." (1998).

[3] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.

[4] Caruana, Rich, Thorsten Joachims, and Lars Backstrom. "KDD-Cup 2004: results and analysis." ACM SIGKDD Explorations Newsletter 6.2 (2004): 95-108.