100+ datasets found
  1. UCI Machine Learning Repository Dataset

    • paperswithcode.com
    Cite
    Jan N. van Rijn; Jonathan K. Vis, UCI Machine Learning Repository Dataset [Dataset]. https://paperswithcode.com/dataset/uci-machine-learning-repository
    Explore at:
    Authors
    Jan N. van Rijn; Jonathan K. Vis
    Description

    UCI Machine Learning Repository is a collection of over 550 datasets.

  2. UCI dataset

    • springernature.figshare.com
    bin
    Updated Mar 13, 2023
    Cite
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen (2023). UCI dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20496258.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 13, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The Cuff-Less Blood Pressure Estimation Dataset [2] from the UCI Machine Learning Repository is a subset of the MIMIC-II Waveform Dataset. It contains 12,000 records of simultaneous PPG and ABP signals from 942 patients, sampled at 125 Hz. The 12,000 records were split uniformly into four parts of 3,000 records each. Because subject information is lacking, however, the Hold-one-out strategy was used to generate the training, validation, and test sets once the data had been preprocessed. In the end, the UCI dataset had 291,078 segments, or around 404 hours of recording, making it by far the largest dataset, with a considerably higher ratio of continuous segments per record (32.15).

    [2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.
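
The description quotes a segment count and a total duration but not a segment length. As a rough consistency check, the sketch below derives the implied length per segment. It assumes every segment has the same duration, which the source does not state.

```python
# Figures quoted in the description above.
SEGMENTS = 291_078
TOTAL_HOURS = 404          # "around 404 hours", so the result is approximate
SAMPLING_RATE_HZ = 125

# Assumption (not confirmed by the source): all segments are equally long.
seconds_per_segment = TOTAL_HOURS * 3600 / SEGMENTS
samples_per_segment = seconds_per_segment * SAMPLING_RATE_HZ

print(f"{seconds_per_segment:.2f} s per segment")
print(f"{samples_per_segment:.0f} samples per segment")
```

Under that assumption, each segment comes out at roughly 5 seconds, or about 625 samples at 125 Hz.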

  3. UCI datasets

    • ieee-dataport.org
    Updated May 14, 2025
    Cite
    Yuan Sun (2025). UCI datasets [Dataset]. https://ieee-dataport.org/documents/uci-datasets
    Explore at:
    Dataset updated
    May 14, 2025
    Authors
    Yuan Sun
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    biology

  4. UCI dataset

    • ieee-dataport.org
    Updated Jun 12, 2024
    Cite
    Wutao Xiong (2024). UCI dataset [Dataset]. https://ieee-dataport.org/documents/uci-dataset
    Explore at:
    Dataset updated
    Jun 12, 2024
    Authors
    Wutao Xiong
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    and different customers have different starting times

  5. UCI datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 4, 2023
    Cite
    Drton, Mathias (2023). UCI datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7681647
    Explore at:
    Dataset updated
    Apr 4, 2023
    Dataset provided by
    Zadorozhnyi, Oleksandr
    Drton, Mathias
    Reifferscheidt, David
    Haug, Stephan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    A collection of two datasets from the UCI website that can be used for structure learning tasks. It includes datasets on:

    • Air Quality
    • US Census 1990

    Size: two datasets of 9471 × 17 and 2458285 × 68, respectively

    Number of features: 15-68

    Ground truth: No

    Type of Graph: No ground truth

    More information about the datasets is contained in the dataset_description.html files.

  6. Bike Rental Data Set - UCI

    • kaggle.com
    Updated Nov 30, 2022
    Cite
    Víctor Aguado (2022). Bike Rental Data Set - UCI [Dataset]. https://www.kaggle.com/datasets/aguado/bike-rental-data-set-uci
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Víctor Aguado
    Description

    Bicycle rental systems in large cities automate the collection and return of vehicles through a network of stations distributed across the metropolis. With these systems, people can rent a bike at one location and return it at another, depending on their needs. The data these systems generate are attractive to researchers because they include variables such as trip duration, departure and destination points, and travel time. Bike-sharing systems therefore work as a network of sensors that is useful for mobility studies. To improve management, one of these companies needs to anticipate the demand in a given time range depending on factors such as the time slot, the type of day (weekday or holiday), the weather, etc.

    The objective of this data set is to predict the demand in a series of specific time slots, using the historical data set as the basis to build a linear model.

    Data Description

    Two data sets will be delivered containing the number of rented bicycles in different time slots:

    1. Training data. These contain the response variable (the number of bicycles rented in that time slot).
    2. Test data. These do not contain the response variable, which must be predicted from the historical data of the training set.

    The variables present in the 2 data sets are:

    • id: time slot identifier (not related to time order)
    • year: year (2011 or 2012)
    • hour: hour of the day (0 to 23)
    • season: 1 = winter, 2 = spring, 3 = summer, 4 = autumn
    • holiday: if the day was a holiday
    • workingday: if the day was a working day (neither a holiday nor a weekend)
    • weather: four categories (1 to 4) ranging from best to worst weather
    • temp: temperature in degrees Celsius
    • atemp: sensation of temperature in degrees Celsius
    • humidity: relative humidity
    • windspeed: wind speed (km/h)
    • count (only in the training set): total number of rentals in that time slot
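
Since the stated goal is a linear model, the categorical variables listed above (season, weather) would normally be one-hot encoded rather than fed in as raw codes. The following is a minimal sketch of that encoding step only. The example row is made up, the 0/1 coding of holiday and workingday is an assumption, and hour is left out for brevity.

```python
# Category codes follow the variable list above.
SEASONS = (1, 2, 3, 4)      # 1 = winter, 2 = spring, 3 = summer, 4 = autumn
WEATHERS = (1, 2, 3, 4)     # best to worst weather

def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unexpected category: {value!r}")
    return [1 if value == c else 0 for c in categories]

def encode_row(row):
    """Turn one time-slot record into a numeric feature vector."""
    features = [
        float(row["temp"]),
        float(row["atemp"]),
        float(row["humidity"]),
        float(row["windspeed"]),
        int(bool(row["holiday"])),      # assumed to be stored as 0/1
        int(bool(row["workingday"])),   # assumed to be stored as 0/1
    ]
    features += one_hot(row["season"], SEASONS)
    features += one_hot(row["weather"], WEATHERS)
    return features

# Hypothetical record, for illustration only.
example = {
    "id": 17, "year": 2011, "hour": 8, "season": 3, "holiday": 0,
    "workingday": 1, "weather": 2, "temp": 24.6, "atemp": 26.1,
    "humidity": 61.0, "windspeed": 12.0,
}
vector = encode_row(example)
print(len(vector), vector)
```

The resulting vectors could then be passed to any linear regression routine, with count as the target.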
  7. Diabetes UCI Dataset

    • cubig.ai
    Updated Jun 23, 2025
    Cite
    CUBIG (2025). Diabetes UCI Dataset [Dataset]. https://cubig.ai/store/products/494/diabetes-uci-dataset
    Explore at:
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Diabetes UCI Dataset is a structured dataset designed for early-stage diabetes risk prediction, collected through questionnaire-based responses from patients at the Sylhet Diabetes Hospital in Bangladesh.

    2) Data Utilization
    (1) Characteristics of the Diabetes UCI Dataset:
    • This dataset includes 16 key symptoms of diabetes such as age, gender, sudden weight loss, polyuria, polyphagia, and visual blurring, each recorded as binary indicators (Yes/No). The Class column serves as a binary classification label indicating whether the individual has diabetes (Positive/Negative).
    • All features are discrete or binary variables, making the dataset highly interpretable and well-structured for medical domain applications.

    (2) Applications of the Diabetes UCI Dataset:
    • Training Early Diabetes Prediction Models: The dataset can be used to train machine learning binary classification models that predict the likelihood of diabetes onset based on various symptom-related features.
    • Risk Factor Analysis and Clinical Decision Support: It can be applied to statistical analysis of symptom influence on diabetes diagnosis, or to support the development of clinical decision support systems in healthcare environments.
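
Because the symptoms are recorded as Yes/No and the label as Positive/Negative, a typical first step before training a binary classifier is mapping both to 0/1. A minimal sketch follows. The column names and the record are hypothetical, and non-binary fields such as age are left out.

```python
# Hypothetical column names and values, for illustration only.
YES_NO = {"Yes": 1, "No": 0}
LABEL = {"Positive": 1, "Negative": 0}

def encode(record):
    """Map Yes/No symptom answers and the Class label to 0/1 integers."""
    symptoms = {k: YES_NO[v] for k, v in record.items() if k != "Class"}
    return symptoms, LABEL[record["Class"]]

record = {
    "Polyuria": "Yes",
    "Polyphagia": "No",
    "Visual blurring": "Yes",
    "Class": "Positive",
}
features, label = encode(record)
print(features, label)
```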

  8. arrhythmia

    • openml.org
    Updated Apr 6, 2014
    Cite
    H. Altay Guvenir; Burak Acar; Haldun Muderrisoglu (2014). arrhythmia [Dataset]. https://www.openml.org/d/5
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2014
    Authors
    H. Altay Guvenir; Burak Acar; Haldun Muderrisoglu
    Description

    Author: H. Altay Guvenir, Burak Acar, Haldun Muderrisoglu
    Source: UCI
    Please cite: UCI

    Cardiac Arrhythmia Database
    The aim is to determine the type of arrhythmia from the ECG recordings. This database contains 279 attributes, 206 of which are linear valued and the rest are nominal.

    Concerning the study of H. Altay Guvenir: "The aim is to distinguish between the presence and absence of cardiac arrhythmia and to classify it in one of the 16 groups. Class 01 refers to 'normal' ECG classes, 02 to 15 refers to different classes of arrhythmia and class 16 refers to the rest of unclassified ones. For the time being, there exists a computer program that makes such a classification. However, there are differences between the cardiologist's and the program's classification. Taking the cardiologist's as a gold standard we aim to minimize this difference by means of machine learning tools."

    The names and id numbers of the patients were recently removed from the database.

    Attribute Information

      1 Age: Age in years , linear
      2 Sex: Sex (0 = male; 1 = female) , nominal
      3 Height: Height in centimeters , linear
      4 Weight: Weight in kilograms , linear
      5 QRS duration: Average of QRS duration in msec., linear
      6 P-R interval: Average duration between onset of P and Q waves
       in msec., linear
      7 Q-T interval: Average duration between onset of Q and offset
       of T waves in msec., linear
      8 T interval: Average duration of T wave in msec., linear
      9 P interval: Average duration of P wave in msec., linear
     Vector angles in degrees on front plane of:, linear
     10 QRS
     11 T
     12 P
     13 QRST
     14 J
     15 Heart rate: Number of heart beats per minute ,linear
     Of channel DI:
      Average width, in msec., of: linear
      16 Q wave
      17 R wave
      18 S wave
      19 R' wave, small peak just after R
      20 S' wave
      21 Number of intrinsic deflections, linear
      22 Existence of ragged R wave, nominal
      23 Existence of diphasic derivation of R wave, nominal
      24 Existence of ragged P wave, nominal
      25 Existence of diphasic derivation of P wave, nominal
      26 Existence of ragged T wave, nominal
      27 Existence of diphasic derivation of T wave, nominal
     Of channel DII: 
      28 .. 39 (similar to 16 .. 27 of channel DI)
     Of channel DIII:
      40 .. 51
     Of channel AVR:
      52 .. 63
     Of channel AVL:
      64 .. 75
     Of channel AVF:
      76 .. 87
     Of channel V1:
      88 .. 99
     Of channel V2:
      100 .. 111
     Of channel V3:
      112 .. 123
     Of channel V4:
      124 .. 135
     Of channel V5:
      136 .. 147
     Of channel V6:
      148 .. 159
     Of channel DI:
      Amplitude, * 0.1 millivolt, of
      160 JJ wave, linear
      161 Q wave, linear
      162 R wave, linear
      163 S wave, linear
      164 R' wave, linear
      165 S' wave, linear
      166 P wave, linear
      167 T wave, linear
      168 QRSA , Sum of areas of all segments divided by 10,
        ( Area= width * height / 2 ), linear
      169 QRSTA = QRSA + 0.5 * width of T wave * 0.1 * height of T
        wave. (If T is diphasic then the bigger segment is
        considered), linear
     Of channel DII:
      170 .. 179
     Of channel DIII:
      180 .. 189
     Of channel AVR:
      190 .. 199
     Of channel AVL:
      200 .. 209
     Of channel AVF:
      210 .. 219
     Of channel V1:
      220 .. 229
     Of channel V2:
      230 .. 239
     Of channel V3:
      240 .. 249
     Of channel V4:
      250 .. 259
     Of channel V5:
      260 .. 269
     Of channel V6:
      270 .. 279
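
The per-channel attributes follow a regular layout: twelve width-related attributes per channel (16 to 159), then ten amplitude-related attributes per channel (160 to 279), with the channels always in the same order. The sketch below turns that layout into a lookup. The short field labels are paraphrased from the list above and the function name is hypothetical.

```python
CHANNELS = ["DI", "DII", "DIII", "AVR", "AVL", "AVF",
            "V1", "V2", "V3", "V4", "V5", "V6"]

# Per-channel measurements, in the order listed for channel DI above.
WIDTH_FIELDS = [
    "Q wave width", "R wave width", "S wave width",
    "R' wave width", "S' wave width",
    "number of intrinsic deflections",
    "ragged R wave", "diphasic derivation of R wave",
    "ragged P wave", "diphasic derivation of P wave",
    "ragged T wave", "diphasic derivation of T wave",
]
AMPLITUDE_FIELDS = [
    "JJ wave amplitude", "Q wave amplitude", "R wave amplitude",
    "S wave amplitude", "R' wave amplitude", "S' wave amplitude",
    "P wave amplitude", "T wave amplitude", "QRSA", "QRSTA",
]

def locate(attribute):
    """Map a 1-based attribute number (16..279) to (channel, measurement)."""
    if 16 <= attribute <= 159:
        block, offset = divmod(attribute - 16, len(WIDTH_FIELDS))
        return CHANNELS[block], WIDTH_FIELDS[offset]
    if 160 <= attribute <= 279:
        block, offset = divmod(attribute - 160, len(AMPLITUDE_FIELDS))
        return CHANNELS[block], AMPLITUDE_FIELDS[offset]
    raise ValueError("expected a per-channel attribute in 16..279")

print(locate(28))    # first attribute of channel DII
print(locate(279))   # last attribute of channel V6
```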
    

    Class code - class - number of instances:

      01  Normal                                       245
      02  Ischemic changes (Coronary Artery Disease)    44
      03  Old Anterior Myocardial Infarction            15
      04  Old Inferior Myocardial Infarction            15
      05  Sinus tachycardia                             13
      06  Sinus bradycardia                             25
      07  Ventricular Premature Contraction (PVC)        3
      08  Supraventricular Premature Contraction         2
      09  Left bundle branch block                       9
      10  Right bundle branch block                     50
      11  1. degree AtrioVentricular block               0
      12  2. degree AV block                             0
      13  3. degree AV block                             0
      14  Left ventricular hypertrophy                   4
      15  Atrial Fibrillation or Flutter                 5
      16  Others                                        22
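
The class counts above are heavily skewed, and three classes have no instances at all. A short sketch that totals the table and reports the share of the normal class, using only the numbers listed:

```python
# Instance counts per class code, copied from the table above.
CLASS_COUNTS = {
    1: 245, 2: 44, 3: 15, 4: 15, 5: 13, 6: 25, 7: 3, 8: 2,
    9: 9, 10: 50, 11: 0, 12: 0, 13: 0, 14: 4, 15: 5, 16: 22,
}

total = sum(CLASS_COUNTS.values())
empty_classes = [code for code, n in CLASS_COUNTS.items() if n == 0]
normal_share = CLASS_COUNTS[1] / total

print(f"total instances: {total}")
print(f"share of class 01 (normal): {normal_share:.1%}")
print(f"classes with no instances: {empty_classes}")
```

The total is 452 instances, of which a little over half are class 01.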
    
  9. UCI Machine Learning Repository

    • scicrunch.org
    Cite
    UCI Machine Learning Repository [Dataset]. http://identifiers.org/RRID:SCR_026571
    Explore at:
    Description

    A collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. Datasets approved for the repository are assigned a Digital Object Identifier (DOI) if they do not already have one. Datasets are licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows sharing and adaptation of the datasets for any purpose, provided that appropriate credit is given.

  10. uci-shopper

    • huggingface.co
    Updated Aug 4, 2023
    Cite
    John Henning (2023). uci-shopper [Dataset]. https://huggingface.co/datasets/jlh/uci-shopper
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 4, 2023
    Authors
    John Henning
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset Card for Online Shoppers Purchasing Intention Dataset

      Dataset Summary
    

    This dataset is a reupload of the Online Shoppers Purchasing Intention Dataset from the UCI Machine Learning Repository.

    NOTE: The information below is from the original dataset description from UCI's website.

      Overview
    

    Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1908) were positive class samples… See the full description on the dataset page: https://huggingface.co/datasets/jlh/uci-shopper.
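
The class split quoted above can be checked with a few lines of arithmetic. The sketch also derives per-class weights using the common "balanced" heuristic, n_samples / (n_classes * n_class_samples). That heuristic is an assumption brought in for illustration, not something the dataset card prescribes.

```python
# Session counts quoted in the overview above.
TOTAL = 12_330
NEGATIVE = 10_422
POSITIVE = 1_908

assert NEGATIVE + POSITIVE == TOTAL  # the two classes cover every session

negative_share = NEGATIVE / TOTAL

# "Balanced" weighting heuristic: n_samples / (n_classes * n_class_samples).
weights = {
    "negative": TOTAL / (2 * NEGATIVE),
    "positive": TOTAL / (2 * POSITIVE),
}

print(f"negative share: {negative_share:.1%}")
print({label: round(w, 3) for label, w in weights.items()})
```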

  11. https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html Dataset

    • paperswithcode.com
    Updated Oct 28, 2024
    Cite
    (2024). https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html Dataset [Dataset]. https://paperswithcode.com/dataset/https-kdd-ics-uci-edu-databases-kddcup99
    Explore at:
    Dataset updated
    Oct 28, 2024

  12. UCI Machine Learning Datasets 12/2013

    • academictorrents.com
    bittorrent
    Updated Dec 20, 2013
    Cite
    UCI (2013). UCI Machine Learning Datasets 12/2013 [Dataset]. https://academictorrents.com/details/7fafb101f9c7961f9b840daeb4af43039107ddef
    Explore at:
    Available download formats: bittorrent (16365432846)
    Dataset updated
    Dec 20, 2013
    Dataset authored and provided by
    UCI
    License

    https://academictorrents.com/nolicensespecified

    Description

    The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged. Many people deserve thanks for making the repository a success. Foremost among them are the d

  13. dataset-uci

    • zenodo.org
    csv
    Updated Apr 12, 2020
    Cite
    David López de la Fuente; Alberto Lendínez Gutiérrez (2020). dataset-uci [Dataset]. http://doi.org/10.5281/zenodo.3748994
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 12, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David López de la Fuente; Alberto Lendínez Gutiérrez
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains the list of databases that can be found in the UCI web repository.

  14. UCI Folio Leaf Dataset

    • academictorrents.com
    bittorrent
    Updated Oct 12, 2015
    Cite
    Trishen Munisami and Mahess Ramsurn and Somveer Kishnah and Sameerchand Pudaruth (2015). UCI Folio Leaf Dataset [Dataset]. https://academictorrents.com/details/a6c64db1e42721f5d7e7aa2b118e293a0d0d335b
    Explore at:
    Available download formats: bittorrent (972471245)
    Dataset updated
    Oct 12, 2015
    Dataset authored and provided by
    Trishen Munisami and Mahess Ramsurn and Somveer Kishnah and Sameerchand Pudaruth
    License

    https://academictorrents.com/nolicensespecified

    Description

    Source: The leaves were taken from plants in the farm of the University of Mauritius and nearby locations.

    Donors:
    • Trishen Munisami, trishen.munisami @ gmail.com
    • Mahess Ramsurn, ramsurn.mahess @ umail.uom.ac.mu
    • Somveer Kishnah, s.kishnah @ uom.ac.mu
    • Sameerchand Pudaruth, sameerchand.pudaruth @ gmail.com

    Data Set Information:
    • The leaves were placed on a white background and then photographed.
    • The pictures were taken in broad daylight to ensure optimum light intensity.

    Attribute Information: List of plant species:
    1. Beaumier du perou
    2. Eggplant
    3. Fruitcitere
    4. Guava
    5. Hibiscus
    6. Betel
    7. Rose
    8. Chrysanthemum
    9. Ficus
    10. Duranta gold
    11. Ashanti blood
    12. Bitter Orange
    13. Coeur Demoiselle
    14. Jackfruit
    15. Mulberry Leaf
    16. Pimento
    17. Pomme Jacquot
    18. Star Apple
    19. Barbados Cherry
    20. Sweet Olive
    21. Croton
    22. Thevetia
    23. Vieux Garcon
    24. Chocolate tree
    25. Carricature plant
    26. Coffee
    27. Ketembilla
    28. Chinese guava
    29. Lychee
    30. Geranium
    31. Sweet potato
    32. Papa

  15. Daily Demand Forecasting Orders from UCI ML

    • kaggle.com
    Updated Jan 7, 2025
    Cite
    Pham Huyen (2025). Daily Demand Forecasting Orders from UCI ML [Dataset]. https://www.kaggle.com/datasets/phamhuyen286/daily-demand-forecasting-orders-from-uci-ml/versions/1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Pham Huyen
    Description

    The dataset was collected over 60 days and is a real database from a Brazilian logistics company. It has twelve predictive attributes and a target, the total number of orders for daily treatment. The database was used in academic research at the Universidade Nove de Julho.

  16. Comparison of decision tree dimensions on 40 UCI datasets including the...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Gregor Stiglic; Simon Kocbek; Igor Pernek; Peter Kokol (2023). Comparison of decision tree dimensions on 40 UCI datasets including the number of leaves. [Dataset]. http://doi.org/10.1371/journal.pone.0033812.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Gregor Stiglic; Simon Kocbek; Igor Pernek; Peter Kokol
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Comparison of decision tree dimensions on 40 UCI datasets including the number of leaves.

  17. UCI Heart Disease Data

    • gts.ai
    json
    Updated Jan 26, 2025
    Cite
    GTS (2025). UCI Heart Disease Data [Dataset]. https://gts.ai/dataset-download/uci-heart-disease-data/
    Explore at:
    Available download formats: json
    Dataset updated
    Jan 26, 2025
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    Description

    The UCI Heart Disease Dataset with 14 key attributes for machine learning & research. Ideal for predictive modeling.

  18. UCI SMS spam dataset Dataset

    • paperswithcode.com
    Updated Apr 7, 2024
    Cite
    (2024). UCI SMS spam dataset Dataset [Dataset]. https://paperswithcode.com/dataset/uci-sms-spam-dataset
    Explore at:
    Dataset updated
    Apr 7, 2024
    Description

    The SMS Spam Collection is a public set of SMS labeled messages that have been collected for mobile phone spam research.

  19. Obesity DataSet UCI ML

    • kaggle.com
    Updated Feb 23, 2022
    Cite
    Tathagat Banerjee (2022). Obesity DataSet UCI ML [Dataset]. https://www.kaggle.com/datasets/tathagatbanerjee/obesity-dataset-uci-ml
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 23, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tathagat Banerjee
    Description

    Estimation of obesity levels based on eating habits and physical condition Data Set

    Abstract: This dataset includes data for the estimation of obesity levels in individuals from the countries of Mexico, Peru and Colombia, based on their eating habits and physical condition.

    Data Set Characteristics: Multivariate
    Number of Instances: 2111
    Area: Life
    Attribute Characteristics: Integer
    Number of Attributes: 17
    Date Donated: 2019-08-27
    Associated Tasks: Classification, Regression, Clustering
    Missing Values? N/A
    Number of Web Hits: 70843

    Source:

    Fabio Mendoza Palechor, Email: fmendoza1 '@' cuc.edu.co, Celphone: +573182929611 Alexis de la Hoz Manotas, Email: akdelahoz '@' gmail.com, Celphone: +573017756983

    Data Set Information:

    This dataset includes data for the estimation of obesity levels in individuals from the countries of Mexico, Peru and Colombia, based on their eating habits and physical condition. The data contain 17 attributes and 2111 records. The records are labeled with the class variable NObesity (Obesity Level), which allows classification of the data using the values Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II and Obesity Type III. 77% of the data was generated synthetically using the Weka tool and the SMOTE filter; 23% was collected directly from users through a web platform.

    Attribute Information:

    Read the article ([Web Link]) to see the description of the attributes.

    Relevant Papers:

    [1] Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.
    [2] De-La-Hoz-Correa, E., Mendoza Palechor, F., De-La-Hoz-Manotas, A., Morales Ortega, R., & Sánchez Hernández, A. B. (2019). Obesity level estimation software based on decision trees.

    Citation Request:

    [1] Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.

  20. Data from: Imbalanced dataset for benchmarking

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Lemaitre, Guillaume (2020). Imbalanced dataset for benchmarking [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_61452
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Aridas, Christos K.
    Oliveira, Dayvid V. R.
    Nogueira, Fernando
    Lemaitre, Guillaume
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Imbalanced dataset for benchmarking

    The algorithms of the imbalanced-learn toolbox are evaluated on a set of common datasets, which are more or less balanced. This benchmark was proposed in [1]. The following section presents its main characteristics.

    Characteristics

    ID  Name            Repository & Target            Ratio   # samples  # features
    1   Ecoli           UCI, target: imU               8.6:1   336        7
    2   Optical Digits  UCI, target: 8                 9.1:1   5,620      64
    3   SatImage        UCI, target: 4                 9.3:1   6,435      36
    4   Pen Digits      UCI, target: 5                 9.4:1   10,992     16
    5   Abalone         UCI, target: 7                 9.7:1   4,177      8
    6   Sick Euthyroid  UCI, target: sick euthyroid    9.8:1   3,163      25
    7   Spectrometer    UCI, target: >=44              11:1    531        93
    8   Car_Eval_34     UCI, target: good, v good      12:1    1,728      6
    9   ISOLET          UCI, target: A, B              12:1    7,797      617
    10  US Crime        UCI, target: >0.65             12:1    1,994      122
    11  Yeast_ML8       LIBSVM, target: 8              13:1    2,417      103
    12  Scene           LIBSVM, target: >one label     13:1    2,407      294
    13  Libras Move     UCI, target: 1                 14:1    360        90
    14  Thyroid Sick    UCI, target: sick              15:1    3,772      28
    15  Coil_2000       KDD, CoIL, target: minority    16:1    9,822      85
    16  Arrhythmia      UCI, target: 06                17:1    452        279
    17  Solar Flare M0  UCI, target: M->0              19:1    1,389      10
    18  OIL             UCI, target: minority          22:1    937        49
    19  Car_Eval_4      UCI, target: vgood             26:1    1,728      6
    20  Wine Quality    UCI, wine, target: <=4         26:1    4,898      11
    21  Letter Img      UCI, target: Z                 26:1    20,000     16
    22  Yeast _ME2      UCI, target: ME2               28:1    1,484      8
    23  Webpage         LIBSVM, w7a, target: minority  33:1    49,749     300
    24  Ozone Level     UCI, ozone, data               34:1    2,536      72
    25  Mammography     UCI, target: minority          42:1    11,183     6
    26  Protein homo.   KDD CUP 2004, minority         111:1   145,751    74
    27  Abalone_19      UCI, target: 19                130:1   4,177      8
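
The Ratio and # samples columns together imply the approximate size of the minority class. The sketch below works this out for three rows. It assumes the ratio is majority:minority, and because the published ratios are rounded, the results are estimates only.

```python
# (name, majority:minority ratio, number of samples) for a few table rows.
ROWS = [
    ("Ecoli", 8.6, 336),
    ("Arrhythmia", 17, 452),
    ("Abalone_19", 130, 4177),
]

def minority_count(ratio, n_samples):
    """Approximate minority-class size, given a ratio of `ratio`:1."""
    return round(n_samples / (ratio + 1))

for name, ratio, n in ROWS:
    minority = minority_count(ratio, n)
    print(f"{name}: ~{minority} minority, ~{n - minority} majority")
```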

    References

    [1] Ding, Zejin, "Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics." Dissertation, Georgia State University, (2011).

    [2] Blake, Catherine, and Christopher J. Merz. "UCI Repository of machine learning databases." (1998).

    [3] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.

    [4] Caruana, Rich, Thorsten Joachims, and Lars Backstrom. "KDD-Cup 2004: results and analysis." ACM SIGKDD Explorations Newsletter 6.2 (2004): 95-108.
