100+ datasets found
  1. UCI datasets

    • ieee-dataport.org
    Updated May 14, 2025
    Cite
    Yuan Sun (2025). UCI datasets [Dataset]. https://ieee-dataport.org/documents/uci-datasets
    Explore at:
    Dataset updated
    May 14, 2025
    Authors
    Yuan Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    biology

  2. UCI Machine Learning Repository Dataset

    • paperswithcode.com
    Cite
    Jan N. van Rijn; Jonathan K. Vis, UCI Machine Learning Repository Dataset [Dataset]. https://paperswithcode.com/dataset/uci-machine-learning-repository
    Explore at:
    Authors
    Jan N. van Rijn; Jonathan K. Vis
    Description

    UCI Machine Learning Repository is a collection of over 550 datasets.

  3. UCI dataset

    • springernature.figshare.com
    bin
    Updated Mar 13, 2023
    Cite
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen (2023). UCI dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20496258.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 13, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Cuff-Less Blood Pressure Estimation Dataset [2] from the UCI Machine Learning Repository. It is a subset of the MIMIC-II Waveform Dataset that contains 12000 records of simultaneous PPG and ABP from 942 patients with a sampling rate of 125 Hz. The 12000 records were uniformly split into four parts with 3000 records each. However, as the subject information is lacking, the Hold-one-out strategy was utilized to generate training, validation, and test sets once the data was preprocessed. In the end, the UCI dataset had 291,078 segments, which was around 404 hours of recording, making it substantially the biggest data set with a considerably higher ratio of continuous segments per record (32.15).
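
    As a quick sanity check on the figures quoted above, the average segment length can be derived from the segment count and the total duration. This is only a sketch: it takes "around 404 hours" at face value, so the result is approximate.

```python
# Figures quoted in the description above.
n_segments = 291_078
total_hours = 404          # stated as "around 404 hours", so treat as approximate
sampling_rate_hz = 125

total_seconds = total_hours * 3600
avg_segment_seconds = total_seconds / n_segments
avg_segment_samples = avg_segment_seconds * sampling_rate_hz

print(f"average segment length: {avg_segment_seconds:.2f} s")
print(f"samples per segment at {sampling_rate_hz} Hz: {avg_segment_samples:.0f}")
```

    Under these assumptions the segments come out at roughly 5 seconds each, or about 625 samples at 125 Hz.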

    [2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.

  4. UCI Machine Learning Repository

    • scicrunch.org
    Cite
    UCI Machine Learning Repository [Dataset]. http://identifiers.org/RRID:SCR_026571
    Explore at:
    Description

    Collection of databases, domain theories, and data generators that are used by the machine learning community for empirical analysis of machine learning algorithms. Datasets approved for the repository are assigned a Digital Object Identifier (DOI) if they do not already have one. Datasets are licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows sharing and adaptation of the datasets for any purpose, provided that appropriate credit is given.

  5. UCI and OpenML Data Sets for Ordinal Quantification

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 25, 2023
    Cite
    Bunse, Mirko (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8177301
    Explore at:
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    Moreo, Alejandro
    Bunse, Mirko
    Sebastiani, Fabrizio
    Senz, Martin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

    With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.
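
    The binning of a continuous regression label into ordinal classes can be sketched as follows. The source does not say how the bins were chosen, so the equal-frequency (quantile) edges, the function name, and the sample values here are all assumptions made for illustration.

```python
import numpy as np

def bin_regression_target(y, n_classes):
    """Map continuous targets to ordinal labels 0..n_classes-1 via quantile bins."""
    y = np.asarray(y, dtype=float)
    # Interior quantiles serve as bin edges (an assumption; the source does not
    # describe the actual binning scheme).
    edges = np.quantile(y, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(y, edges)

y = np.array([0.1, 0.4, 0.35, 0.8, 0.95, 0.6, 0.2, 0.7])  # made-up values
labels = bin_regression_target(y, n_classes=4)
print(labels)
```

    Because the labels come from ordered bins, a larger regression value never receives a smaller class label, which is what makes the resulting classes ordinal.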

    We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

    Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
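
    A minimal sketch of the two protocols, operating on label distributions only. Drawing from a flat Dirichlet distribution gives every label distribution equal probability, as APP requires. The roughness measure used to pick the "smoothest" samples (sum of squared second-order differences between neighboring classes) is an assumption, as are all function names; the source only states that the smoothest 20% are kept.

```python
import numpy as np

def sample_app_prevalences(n_samples, n_classes, rng):
    """Draw class-prevalence vectors uniformly from the probability simplex."""
    # A flat Dirichlet distribution is uniform over the simplex.
    return rng.dirichlet(np.ones(n_classes), size=n_samples)

def roughness(prevalences):
    """Sum of squared second-order differences between neighboring classes."""
    return np.sum(np.diff(prevalences, n=2, axis=1) ** 2, axis=1)

def keep_smoothest(prevalences, fraction=0.2):
    """Keep the given fraction of prevalence vectors with the lowest roughness."""
    n_keep = int(round(fraction * len(prevalences)))
    order = np.argsort(roughness(prevalences))
    return prevalences[order[:n_keep]]

rng = np.random.default_rng(0)
app = sample_app_prevalences(n_samples=1000, n_classes=5, rng=rng)
app_oq = keep_smoothest(app, fraction=0.2)
print(app.shape, app_oq.shape)
```

    Turning a prevalence vector into a concrete sample still requires drawing data items per class; the index files described above record the outcome of that step.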

    Usage

    You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

    Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

    Data Extraction: In your terminal, you can call either

    make

    (recommended), or

    julia --project="." --eval "using Pkg; Pkg.instantiate()"
    julia --project="." extract-oq.jl

    Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.
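
    Once extracted, a file with this layout can be summarized as a label distribution, which is the quantity that quantification aims to predict. The sketch below uses made-up file contents; only the "class_label" column name comes from the description, and integer-valued labels are an assumption.

```python
import csv
import io
from collections import Counter

# Hypothetical file contents; only the "class_label" column name is from the text.
csv_text = """class_label,feature_a,feature_b
1,0.5,2.0
3,0.1,1.5
1,0.9,0.3
2,0.4,0.8
"""

def label_prevalences(handle):
    """Return the relative frequency of each ordinal class in a CSV file."""
    reader = csv.DictReader(handle)  # consumes the header row
    counts = Counter(int(row["class_label"]) for row in reader)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}

prevalences = label_prevalences(io.StringIO(csv_text))
print(prevalences)
```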

    Further Reading

    Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

  6. UCI Heart Disease Data

    • gts.ai
    json
    Updated Jan 26, 2025
    Cite
    GTS (2025). UCI Heart Disease Data [Dataset]. https://gts.ai/dataset-download/uci-heart-disease-data/
    Explore at:
    Available download formats: json
    Dataset updated
    Jan 26, 2025
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    Description

    The UCI Heart Disease Dataset with 14 key attributes for machine learning & research. Ideal for predictive modeling.

  7. UCI Machine Learning Datasets 12/2013

    • academictorrents.com
    bittorrent
    Updated Dec 20, 2013
    Cite
    UCI (2013). UCI Machine Learning Datasets 12/2013 [Dataset]. https://academictorrents.com/details/7fafb101f9c7961f9b840daeb4af43039107ddef
    Explore at:
    Available download formats: bittorrent (16365432846)
    Dataset updated
    Dec 20, 2013
    Dataset authored and provided by
    UCI
    License

    No license specified: https://academictorrents.com/nolicensespecified

    Description

    The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged. Many people deserve thanks for making the repository a success. Foremost among them are the d

  8. UCI dataset

    • ieee-dataport.org
    Updated Jun 12, 2024
    Cite
    Wutao Xiong (2024). UCI dataset [Dataset]. https://ieee-dataport.org/documents/uci-dataset
    Explore at:
    Dataset updated
    Jun 12, 2024
    Authors
    Wutao Xiong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    and different customers have different starting times

  9. Bike Rental Data Set - UCI

    • kaggle.com
    Updated Nov 30, 2022
    Cite
    Víctor Aguado (2022). Bike Rental Data Set - UCI [Dataset]. https://www.kaggle.com/datasets/aguado/bike-rental-data-set-uci
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Víctor Aguado
    Description


    The existing bicycle rental systems in large cities automate the collection and return of vehicles through a network of stations distributed throughout the metropolis. With these systems, people can rent a bike in one location and return it in a different one, depending on their needs. The data generated by these systems are attractive to researchers because of variables such as trip duration, departure and destination points, and travel time. Bicycle-sharing systems therefore work as a network of sensors that is useful for mobility studies. To improve management, one of these companies needs to anticipate the demand in a given time range depending on factors such as the time slot, the type of day (weekday or holiday), the weather, etc.

    The objective of this data set is to predict the demand in a series of specific time slots, using the historical data set as the basis to build a linear model.

    Data Description

    Two data sets will be delivered containing the number of rented bicycles in different time slots:

    1. Training data. These contain the response variable (number of bicycles rented in that time slot).
    2. Test data. These do not contain the response variable, which must be predicted based on the historical data of the training set.

    The variables present in the 2 data sets are:

    • id: time slot identifier (not related to time order)
    • year: year (2011 or 2012)
    • hour: hour of the day (0 to 23)
    • season: 1 = winter, 2 = spring, 3 = summer, 4 = autumn
    • holiday: if the day was a holiday
    • workingday: if the day was a working day (neither a holiday nor a weekend)
    • weather: four categories (1 to 4) ranging from best to worst weather
    • temp: temperature in degrees Celsius
    • atemp: sensation of temperature in degrees Celsius
    • humidity: relative humidity
    • windspeed: wind speed (km/h)
    • count (only in the training set): total number of rentals in that time slot
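
    The stated objective, a linear model that predicts demand per time slot, could look roughly like the sketch below. It fits ordinary least squares with NumPy on synthetic rows that merely reuse three of the variable names listed above; the data, the coefficients used to generate it, and the choice of features are assumptions for illustration, not properties of the real data set.

```python
import numpy as np

def design_matrix(hour, temp, workingday):
    """Intercept, temperature, working-day flag, and one-hot hour of day."""
    hour_onehot = np.eye(24)[hour][:, 1:]  # drop hour 0 to avoid collinearity
    return np.column_stack([np.ones(len(hour)), temp, workingday, hour_onehot])

# Synthetic stand-in for the training set (not real rental data).
rng = np.random.default_rng(0)
n = 480
hour = np.arange(n) % 24
temp = rng.uniform(0.0, 35.0, size=n)
workingday = rng.integers(0, 2, size=n)
count = 20.0 + 3.0 * temp + 15.0 * workingday + 40.0 * (hour == 8)

# Ordinary least squares fit of count on the design matrix.
X = design_matrix(hour, temp, workingday)
coef, *_ = np.linalg.lstsq(X, count, rcond=None)

# Predict demand for a new time slot: 8 am, 18 degrees Celsius, working day.
x_new = design_matrix(np.array([8]), np.array([18.0]), np.array([1]))
prediction = x_new @ coef
print(prediction)
```

    Treating hour as a categorical variable lets a linear model capture rush-hour peaks that a single numeric hour coefficient could not represent.
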
  10. kr-vs-kp

    • openml.org
    Updated Apr 6, 2014
    Cite
    Alen Shapiro (2014). kr-vs-kp [Dataset]. https://www.openml.org/search?type=data&sort=runs&status=active&qualities.NumberOfClasses=%3D_2&qualities.NumberOfInstances=gte_0&id=3
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2014
    Authors
    Alen Shapiro
    Description

    Author: Alen Shapiro
    Source: UCI
    Please cite: UCI citation policy

    1. Title: Chess End-Game -- King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7). The pawn on a7 means it is one square away from queening. It is the King+Rook's side (white) to move.

    2. Sources: (a) Database originally generated and described by Alen Shapiro. (b) Donor/Coder: Rob Holte (holte@uottawa.bitnet). The database was supplied to Holte by Peter Clark of the Turing Institute in Glasgow (pete@turing.ac.uk). (c) Date: 1 August 1989

    3. Past Usage:

       1. Alen D. Shapiro (1983,1987), "Structured Induction in Expert Systems", Addison-Wesley. This book is based on Shapiro's Ph.D. thesis (1983) at the University of Edinburgh entitled "The Role of Structured Induction in Expert Systems".

       2. Stephen Muggleton (1987), "Structuring Knowledge by Asking Questions", pp.218-229 in "Progress in Machine Learning", edited by I. Bratko and Nada Lavrac, Sigma Press, Wilmslow, England SK9 5BB.

       3. Robert C. Holte, Liane Acker, and Bruce W. Porter (1989), "Concept Learning and the Problem of Small Disjuncts", Proceedings of IJCAI. Also available as technical report AI89-106, Computer Sciences Department, University of Texas at Austin, Austin, Texas 78712.

    4. Relevant Information: The dataset format is described below. Note: the format of this database was modified on 2/26/90 to conform with the format of all the other databases in the UCI repository of machine learning databases.

    5. Number of Instances: 3196 total

    6. Number of Attributes: 36

    7. Attribute Summaries: Classes (2): -- White-can-win ("won") and White-cannot-win ("nowin"). I believe that White is deemed to be unable to win if the Black pawn can safely advance. Attributes: see Shapiro's book.

    8. Missing Attributes: -- none

    9. Class Distribution: In 1669 of the positions (52%), White can win. In 1527 of the positions (48%), White cannot win.

    The format for instances in this database is a sequence of 37 attribute values. Each instance is a board description for this chess endgame. The first 36 attributes describe the board. The last (37th) attribute is the classification: "won" or "nowin". There are 0 missing values. A typical board description is

    f,f,f,f,f,f,f,f,f,f,f,f,l,f,n,f,f,t,f,f,f,f,f,f,f,t,f,f,f,f,f,f,f,t,t,n,won

    The names of the features do not appear in the board descriptions. Instead, each feature corresponds to a particular position in the feature-value list. For example, the head of this list is the value for the feature "bkblk". The following is the list of features, in the order in which their values appear in the feature-value list:

    [bkblk,bknwy,bkon8,bkona,bkspr,bkxbq,bkxcr,bkxwp,blxwp,bxqsq,cntxt,dsopp,dwipd, hdchk,katri,mulch,qxmsq,r2ar8,reskd,reskr,rimmx,rkxwp,rxmsq,simpl,skach,skewr, skrxp,spcop,stlmt,thrsk,wkcti,wkna8,wknck,wkovl,wkpos,wtoeg]

    In the file, there is one instance (board position) per line.

    Num Instances: 3196
    Num Attributes: 37
    Num Continuous: 0 (Int 0 / Real 0)
    Num Discrete: 37
    Missing values: 0 / 0.0%
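
    The positional format described above can be decoded by pairing each value with its feature name. The sketch below uses the feature list and the sample board description quoted in this entry; the function name is hypothetical.

```python
# Feature names in file order, as listed in the description above.
FEATURES = (
    "bkblk,bknwy,bkon8,bkona,bkspr,bkxbq,bkxcr,bkxwp,blxwp,bxqsq,cntxt,dsopp,dwipd,"
    "hdchk,katri,mulch,qxmsq,r2ar8,reskd,reskr,rimmx,rkxwp,rxmsq,simpl,skach,skewr,"
    "skrxp,spcop,stlmt,thrsk,wkcti,wkna8,wknck,wkovl,wkpos,wtoeg"
).split(",")

def parse_instance(line):
    """Split one line into a {feature: value} dict and the class label."""
    values = line.strip().split(",")
    if len(values) != len(FEATURES) + 1:
        raise ValueError(f"expected {len(FEATURES) + 1} values, got {len(values)}")
    *board, label = values
    return dict(zip(FEATURES, board)), label

# The typical board description quoted above.
sample = "f,f,f,f,f,f,f,f,f,f,f,f,l,f,n,f,f,t,f,f,f,f,f,f,f,t,f,f,f,f,f,f,f,t,t,n,won"
board, label = parse_instance(sample)
print(board["bkblk"], board["dwipd"], label)
```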

  11. https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html Dataset

    • paperswithcode.com
    Updated Oct 28, 2024
    Cite
    (2024). https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html Dataset [Dataset]. https://paperswithcode.com/dataset/https-kdd-ics-uci-edu-databases-kddcup99
    Explore at:
    Dataset updated
    Oct 28, 2024
    Description


  12. Obesity DataSet UCI ML

    • kaggle.com
    Updated Feb 23, 2022
    Cite
    Tathagat Banerjee (2022). Obesity DataSet UCI ML [Dataset]. https://www.kaggle.com/datasets/tathagatbanerjee/obesity-dataset-uci-ml
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 23, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tathagat Banerjee
    Description

    Estimation of obesity levels based on eating habits and physical condition Data Set

    Abstract: This dataset includes data for the estimation of obesity levels in individuals from the countries of Mexico, Peru and Colombia, based on their eating habits and physical condition.

    Data Set Characteristics: Multivariate
    Number of Instances: 2111
    Area: Life
    Attribute Characteristics: Integer
    Number of Attributes: 17
    Date Donated: 2019-08-27
    Associated Tasks: Classification, Regression, Clustering
    Missing Values? N/A
    Number of Web Hits: 70843

    Source:

    Fabio Mendoza Palechor, Email: fmendoza1 '@' cuc.edu.co, Celphone: +573182929611 Alexis de la Hoz Manotas, Email: akdelahoz '@' gmail.com, Celphone: +573017756983

    Data Set Information:

    This dataset includes data for the estimation of obesity levels in individuals from Mexico, Peru and Colombia, based on their eating habits and physical condition. The data contain 17 attributes and 2111 records. The records are labeled with the class variable NObesity (Obesity Level), which allows classification of the data using the values Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II and Obesity Type III. 77% of the data was generated synthetically using the Weka tool and the SMOTE filter; 23% was collected directly from users through a web platform.

    Attribute Information:

    Read the article ([Web Link]) to see the description of the attributes.

    Relevant Papers:

    [1] Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.
    [2] De-La-Hoz-Correa, E., Mendoza Palechor, F., De-La-Hoz-Manotas, A., Morales Ortega, R., & Sánchez Hernández, A. B. (2019). Obesity level estimation software based on decision trees.

    Citation Request:

    [1] Palechor, F. M., & de la Hoz Manotas, A. (2019). Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico. Data in Brief, 104344.

  13. UCI Diabetes Data Set

    • kaggle.com
    Updated May 1, 2020
    Cite
    Ergin Altıntaş (2020). UCI Diabetes Data Set [Dataset]. https://www.kaggle.com/ealtintas/uci-machine-learning-repository-diabetes-data-set/tasks
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 1, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ergin Altıntaş
    Description

    About this Dataset

    This CSV contains a data set prepared for participants of the 1994 AAAI Spring Symposium on Artificial Intelligence in Medicine.

    Content

    Original files were obtained from: https://archive.ics.uci.edu/ml/datasets/diabetes

    The archive diabetes-data.tar.z, which contains 70 sets of data recorded on diabetes patients (several weeks' to months' worth of glucose, insulin, and lifestyle data per patient, plus a description of the problem domain), was extracted, processed, and merged into a single CSV file.

    The Code field of the CSV is deciphered as follows:

    33 = Regular insulin dose
    34 = NPH insulin dose
    35 = UltraLente insulin dose
    48 = Unspecified blood glucose measurement
    57 = Unspecified blood glucose measurement
    58 = Pre-breakfast blood glucose measurement
    59 = Post-breakfast blood glucose measurement
    60 = Pre-lunch blood glucose measurement
    61 = Post-lunch blood glucose measurement
    62 = Pre-supper blood glucose measurement
    63 = Post-supper blood glucose measurement
    64 = Pre-snack blood glucose measurement
    65 = Hypoglycemic symptoms
    66 = Typical meal ingestion
    67 = More-than-usual meal ingestion
    68 = Less-than-usual meal ingestion
    69 = Typical exercise activity
    70 = More-than-usual exercise activity
    71 = Less-than-usual exercise activity
    72 = Unspecified special event
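
    When working with the CSV, the code table above is convenient as a lookup. The sketch below copies the listed meanings into a dictionary; the helper name and the fallback text for unlisted codes are assumptions.

```python
# Meanings of the Code field, copied from the table above.
CODE_MEANINGS = {
    33: "Regular insulin dose",
    34: "NPH insulin dose",
    35: "UltraLente insulin dose",
    48: "Unspecified blood glucose measurement",
    57: "Unspecified blood glucose measurement",
    58: "Pre-breakfast blood glucose measurement",
    59: "Post-breakfast blood glucose measurement",
    60: "Pre-lunch blood glucose measurement",
    61: "Post-lunch blood glucose measurement",
    62: "Pre-supper blood glucose measurement",
    63: "Post-supper blood glucose measurement",
    64: "Pre-snack blood glucose measurement",
    65: "Hypoglycemic symptoms",
    66: "Typical meal ingestion",
    67: "More-than-usual meal ingestion",
    68: "Less-than-usual meal ingestion",
    69: "Typical exercise activity",
    70: "More-than-usual exercise activity",
    71: "Less-than-usual exercise activity",
    72: "Unspecified special event",
}

def describe_code(code):
    """Return the meaning of a Code value, or a placeholder if it is not listed."""
    return CODE_MEANINGS.get(int(code), "Unknown code")

def is_glucose_measurement(code):
    """True for codes whose listed meaning is a blood glucose measurement."""
    return "blood glucose" in describe_code(code)

print(describe_code(58), is_glucose_measurement(58))
```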

  14. Replication Data for: Scalable Kernel Mean Matching

    • search.dataone.org
    Updated Nov 21, 2023
    Cite
    Chandra, Swarup (2023). Replication Data for: Scalable Kernel Mean Matching [Dataset]. http://doi.org/10.7910/DVN/ELFPEM
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Chandra, Swarup
    Description
  15. arrhythmia

    • openml.org
    Updated Apr 6, 2014
    Cite
    H. Altay Guvenir; Burak Acar; Haldun Muderrisoglu (2014). arrhythmia [Dataset]. https://www.openml.org/d/5
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2014
    Authors
    H. Altay Guvenir; Burak Acar; Haldun Muderrisoglu
    Description

    Author: H. Altay Guvenir, Burak Acar, Haldun Muderrisoglu
    Source: UCI
    Please cite: UCI

    Cardiac Arrhythmia Database
    The aim is to determine the type of arrhythmia from the ECG recordings. This database contains 279 attributes, 206 of which are linear valued and the rest are nominal.

    Concerning the study of H. Altay Guvenir: "The aim is to distinguish between the presence and absence of cardiac arrhythmia and to classify it in one of the 16 groups. Class 01 refers to 'normal' ECG classes, 02 to 15 refers to different classes of arrhythmia and class 16 refers to the rest of unclassified ones. For the time being, there exists a computer program that makes such a classification. However, there are differences between the cardiologist's and the program's classification. Taking the cardiologist's as a gold standard we aim to minimize this difference by means of machine learning tools."

    The names and id numbers of the patients were recently removed from the database.

    Attribute Information

      1 Age: Age in years , linear
      2 Sex: Sex (0 = male; 1 = female) , nominal
      3 Height: Height in centimeters , linear
      4 Weight: Weight in kilograms , linear
      5 QRS duration: Average of QRS duration in msec., linear
      6 P-R interval: Average duration between onset of P and Q waves
       in msec., linear
      7 Q-T interval: Average duration between onset of Q and offset
       of T waves in msec., linear
      8 T interval: Average duration of T wave in msec., linear
      9 P interval: Average duration of P wave in msec., linear
     Vector angles in degrees on front plane of:, linear
     10 QRS
     11 T
     12 P
     13 QRST
     14 J
     15 Heart rate: Number of heart beats per minute ,linear
     Of channel DI:
      Average width, in msec., of: linear
      16 Q wave
      17 R wave
      18 S wave
      19 R' wave, small peak just after R
      20 S' wave
      21 Number of intrinsic deflections, linear
      22 Existence of ragged R wave, nominal
      23 Existence of diphasic derivation of R wave, nominal
      24 Existence of ragged P wave, nominal
      25 Existence of diphasic derivation of P wave, nominal
      26 Existence of ragged T wave, nominal
      27 Existence of diphasic derivation of T wave, nominal
     Of channel DII: 
      28 .. 39 (similar to 16 .. 27 of channel DI)
     Of channel DIII:
      40 .. 51
     Of channel AVR:
      52 .. 63
     Of channel AVL:
      64 .. 75
     Of channel AVF:
      76 .. 87
     Of channel V1:
      88 .. 99
     Of channel V2:
      100 .. 111
     Of channel V3:
      112 .. 123
     Of channel V4:
      124 .. 135
     Of channel V5:
      136 .. 147
     Of channel V6:
      148 .. 159
     Of channel DI:
      Amplitude, * 0.1 millivolt, of
      160 JJ wave, linear
      161 Q wave, linear
      162 R wave, linear
      163 S wave, linear
      164 R' wave, linear
      165 S' wave, linear
      166 P wave, linear
      167 T wave, linear
      168 QRSA , Sum of areas of all segments divided by 10,
        ( Area= width * height / 2 ), linear
      169 QRSTA = QRSA + 0.5 * width of T wave * 0.1 * height of T
        wave. (If T is diphasic then the bigger segment is
        considered), linear
     Of channel DII:
      170 .. 179
     Of channel DIII:
      180 .. 189
     Of channel AVR:
      190 .. 199
     Of channel AVL:
      200 .. 209
     Of channel AVF:
      210 .. 219
     Of channel V1:
      220 .. 229
     Of channel V2:
      230 .. 239
     Of channel V3:
      240 .. 249
     Of channel V4:
      250 .. 259
     Of channel V5:
      260 .. 269
     Of channel V6:
      270 .. 279
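
    The per-channel numbering above follows a regular pattern: the first block assigns 12 attributes per channel starting at 16, and the second assigns 10 per channel starting at 160. A small helper (hypothetical name) can compute the attribute ranges for any channel, using the channel order given in the listing.

```python
# Channel order as it appears in the attribute listing above.
CHANNELS = ["DI", "DII", "DIII", "AVR", "AVL", "AVF",
            "V1", "V2", "V3", "V4", "V5", "V6"]

def channel_attribute_ranges(channel):
    """Return the 1-based, inclusive attribute ranges for one ECG channel."""
    i = CHANNELS.index(channel)
    width_start = 16 + 12 * i        # attributes 16..27 belong to DI
    amplitude_start = 160 + 10 * i   # attributes 160..169 belong to DI
    width_block = (width_start, width_start + 11)
    amplitude_block = (amplitude_start, amplitude_start + 9)
    return width_block, amplitude_block

for name in ("DII", "AVR", "V6"):
    print(name, channel_attribute_ranges(name))
```

    Note that these are 1-based attribute numbers; subtract one when indexing columns of a zero-based array.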
    

    Class code - class - number of instances:

      01       Normal        245
      02       Ischemic changes (Coronary Artery Disease)  44
      03       Old Anterior Myocardial Infarction      15
      04       Old Inferior Myocardial Infarction      15
      05       Sinus tachycardy    13
      06       Sinus bradycardy    25
      07       Ventricular Premature Contraction (PVC)    3
      08       Supraventricular Premature Contraction    2
      09       Left bundle branch block     9 
      10       Right bundle branch block    50
      11       1. degree AtrioVentricular block    0 
      12       2. degree AV block        0
      13       3. degree AV block        0
      14       Left ventricule hypertrophy        4
      15       Atrial Fibrillation or Flutter        5
      16       Others         22
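
    The class table above can be summarized with a few lines of arithmetic, which makes the imbalance and the empty classes explicit. The counts are copied from the table; the variable names are illustrative.

```python
# Class code -> number of instances, copied from the table above.
class_counts = {
    1: 245, 2: 44, 3: 15, 4: 15, 5: 13, 6: 25, 7: 3, 8: 2,
    9: 9, 10: 50, 11: 0, 12: 0, 13: 0, 14: 4, 15: 5, 16: 22,
}

total = sum(class_counts.values())
empty_classes = [code for code, n in class_counts.items() if n == 0]
normal_share = class_counts[1] / total

print(f"total instances: {total}")
print(f"classes with no instances: {empty_classes}")
print(f"share of class 01 (normal): {normal_share:.1%}")
```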
    
  16. KOS bag of words data

    • kaggle.com
    Updated May 5, 2017
    Cite
    PhilipHarmuth (2017). KOS bag of words data [Dataset]. https://www.kaggle.com/datasets/harmuth/bagofwords/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 5, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PhilipHarmuth
    Description

    Data Set Information:

    Taken from https://archive.ics.uci.edu/ml/datasets/bag+of+words

    For each text collection, D is the number of documents, W is the number of words in the vocabulary, and N is the total number of words in the collection (below, NNZ is the number of nonzero counts in the bag-of-words). After tokenization and removal of stopwords, the vocabulary of unique words was truncated by only keeping words that occurred more than ten times. Individual document names (i.e. an identifier for each docID) are not provided for copyright reasons.

    These data sets have no class labels, and for copyright reasons no filenames or other document-level metadata. These data sets are ideal for clustering and topic modeling experiments.

    KOS blog entries: orig source: dailykos.com D=3430 W=6906 N=467714

    Attribute Information:

    The format of the docword.*.txt file is 3 header lines, followed by NNZ triples:

    D
    W
    NNZ
    docID wordID count
    docID wordID count
    docID wordID count
    ...
    docID wordID count
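
    A file in this layout can be read with a few lines of Python. This is a minimal sketch under the format described above (three header lines, then NNZ whitespace-separated triples); the function name and the tiny in-memory sample are made up for illustration.

```python
import io
from collections import defaultdict

def read_docword(handle):
    """Parse a docword file into (D, W, {docID: {wordID: count}})."""
    n_docs = int(handle.readline())
    n_words = int(handle.readline())
    nnz = int(handle.readline())
    counts = defaultdict(dict)
    for _ in range(nnz):
        doc_id, word_id, count = map(int, handle.readline().split())
        counts[doc_id][word_id] = count
    return n_docs, n_words, dict(counts)

# Made-up sample: 2 documents, 3 vocabulary words, 4 nonzero counts.
sample = io.StringIO("2\n3\n4\n1 1 2\n1 3 1\n2 2 5\n2 3 1\n")
n_docs, n_words, counts = read_docword(sample)
total_words = sum(c for doc in counts.values() for c in doc.values())
print(n_docs, n_words, total_words)
```

    Summing all counts gives N, the total number of words in the collection.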

  17. UCI datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 4, 2023
    Cite
    Drton, Mathias (2023). UCI datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7681647
    Explore at:
    Dataset updated
    Apr 4, 2023
    Dataset provided by
    Zadorozhnyi, Oleksandr
    Drton, Mathias
    Reifferscheidt, David
    Haug, Stephan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Collection of two datasets from the UCI website that could be used for structure learning tasks. Includes datasets regarding

    Air Quality

    US census 1990

    Size: Two datasets of sizes 9471*17 and 2458285*68, respectively

    Number of features: 15-68

    Ground truth: No

    Type of Graph: No ground truth

    More information about the datasets is contained in the dataset_description.html files.

  18. UCI Datasets: "Air quality" and "US Census (1990)"

    • zenodo.org
    bin, csv, html
    Updated Jan 27, 2025
    Cite
    Zenodo (2025). UCI Datasets: "Air quality" and "US Census (1990)" [Dataset]. http://doi.org/10.5281/zenodo.8063512
    Explore at:
    Available download formats: bin, csv, html
    Dataset updated
    Jan 27, 2025
    Dataset provided by
    Zenodo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Two preprocessed datasets collected from the UCI repository that can be used for the purpose of structure learning from multivariate data of different types.

    Air Quality

    This dataset represents hourly averaged measurements of 5 metal oxide chemical sensors embedded in an air quality chemical multisensor device. The certified analyzer was located on the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of on-field deployed air quality chemical sensor device responses [1]. More information about the attributes and their type can be found in airqualitydataset_description.html.

    Size of dataset: 9358
    Number of Features: 16
    Type of data: discrete and continuous
    Ground Truth: No

    Contains the responses of a gas multisensor device deployed in the field in an Italian city. Hourly response averages are recorded along with reference gas concentrations from a certified analyzer. There are 15 attributes: date and time, plus discrete and real-valued covariates.

    0 Date (DD/MM/YYYY)
    1 Time (HH.MM.SS)
    2 True hourly averaged concentration CO in mg/m^3 (reference analyzer)
    3 PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
    4 True hourly averaged overall Non Metanic HydroCarbons concentration in microg/m^3 (reference analyzer)
    5 True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
    6 PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
    7 True hourly averaged NOx concentration in ppb (reference analyzer)
    8 PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
    9 True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
    10 PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
    11 PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
    12 Temperature in °C
    13 Relative Humidity (%)
    14 Absolute Humidity (AH)
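
The Date (DD/MM/YYYY) and Time (HH.MM.SS) attributes listed above can be combined into a single timestamp. The following is a minimal sketch using only the Python standard library; the function name `parse_timestamp` and the sample values are illustrative assumptions, not part of the dataset or its documentation.

```python
from datetime import datetime


def parse_timestamp(date_str: str, time_str: str) -> datetime:
    """Combine a DD/MM/YYYY date and an HH.MM.SS time into one datetime.

    The format strings mirror the attribute descriptions in the listing;
    the helper itself is a hypothetical convenience, not an official loader.
    """
    return datetime.strptime(f"{date_str} {time_str}", "%d/%m/%Y %H.%M.%S")


if __name__ == "__main__":
    # Hypothetical sample values, chosen to fall inside the stated
    # recording period (March 2004 to February 2005).
    ts = parse_timestamp("10/03/2004", "18.00.00")
    print(ts.isoformat())
```

Parsing both fields in one `strptime` call keeps the day-first date order explicit, which avoids the month/day swap that can happen when a parser guesses the format.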

    US Census (1990)

    This dataset is a discretized version of the USCensus1990raw dataset. The data was collected as part of the 1990 census, and it describes a one percent sample of the Public Use Microdata Samples (PUMS) person records drawn from the full 1990 census sample (all fifty states and the District of Columbia but not including "PUMA Cross State Lines One Percent Persons Records") [2]. More information about the attributes and their type can be found in census1990_description.html.

    Size of dataset: 2458285
    Number of features: 68
    Ground truth: No

    References:

    [1] S. De Vito and E. Massera and M. Piga and L. Martinotto and G. Di Francia, On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: Chemical, Volume 129, Issue 2, 22 February 2008, Pages 750-757, ISSN 0925-4005 https://doi.org/10.1016/j.snb.2007.09.060

    [2] Meek, Thiesson and Heckerman (2001), "The Learning Curve Method Applied to Clustering", The Journal of Machine Learning Research. (Also see MSR-TR-2001-34, available at https://www.microsoft.com/en-us/research/wp-content/uploads/2001/01/lc-aistats.pdf)

  19. Open-source data sets for classification task from UCI repository and...

    • figshare.com
    txt
    Updated Aug 31, 2024
    Cite
    xh niu (2024). Open-source data sets for classification task from UCI repository and Scikit-learn in section 4 [Dataset]. http://doi.org/10.6084/m9.figshare.26886055.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    xh niu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets from Scikit-learn are: ‘Iris’, ‘Wine’, ‘Breast Cancer Wisconsin (Diagnostic)’. Datasets from the UCI repository are: ‘Seeds’, ‘Banknote Authentication’ (‘Banknotes’), ‘Heart disease’, ‘Parkinsons’, ‘Ecoli’, ‘Thyroid (Thyroid gland data)’.

  20. UCI Heart Disease Data Set

    • kaggle.com
    Updated Jan 1, 2021
    Cite
    Lourens Walters (2021). UCI Heart Disease Data Set [Dataset]. https://www.kaggle.com/lourenswalters/uci-heart-disease-data-set/metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 1, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lourens Walters
    Description

    Context

    The dataset used can be found on the UCI Machine Learning Repository at the following location:

    Heart Disease Dataset

    There are several copies of this dataset to be found on Kaggle, with people focusing on different types of analyses of the data. This specific copy can be analysed by anyone interested, but is primarily used by a study group from the Udacity Bertelsmann Technology Scholarship to practice analysis of association between variables as well as implementation and comparison of various Machine Learning models.

    Content

    According to the paper by Detrano et al. (1989), as found on the UCI dataset webpage, the data were collected from 303 patients referred for coronary angiography at the Cleveland Clinic between May 1981 and September 1984. The 13 independent (feature) variables can be divided into 3 groups as follows:

    Routine evaluation (based on historical data):

    • ECG at rest
    • Serum Cholesterol
    • Fasting blood sugar

    Non-invasive test data (informed consent obtained for data as part of research protocol):

    • Exercise ECG
      • ST-segment peak slope (upsloping, flat or downsloping)
      • ST-segment depression
    • Exercise Thallium scintigraphy (fixed, reversible or none)
    • Cardiac fluoroscopy (number of vessels that appeared to contain calcium)

    Other demographic and clinical variables (based on routine data):

    • Age
    • Sex
    • Chest pain type
    • Systolic blood pressure
    • ST-T-wave abnormality (T-wave abnormality)
    • Probable or definite ventricular hypertrophy (Estes' criteria)

    The dependent (response) variable was the angiographic test result indicating a >50% diameter narrowing.

    Data Dictionary

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3632459%2Fa01747fb0158dc51c12bc0824c9c4ae4%2Fdata_dictionary2.png?generation=1609522473018549&alt=media

    Acknowledgements

    UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. Donor:

    David W. Aha (aha '@' ics.uci.edu) (714) 856-8779

    Inspiration

    The objective of the analysis is to use statistical learning to identify factors associated with coronary artery disease, as indicated by coronary angiography interpreted by a cardiologist (per the paper by Detrano et al. cited above).
