89 datasets found
  1. Cancer Multiple Dataset UCI MLR

    • kaggle.com
    zip
    Updated Aug 5, 2025
    Cite
    Medi Hunter - 4004 (2025). Cancer Multiple Dataset UCI MLR [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasakbd/cancer-multiple-dataset-uci-mlr/suggestions
    Explore at:
    Available download formats: zip (74213598 bytes)
    Dataset updated
    Aug 5, 2025
    Authors
    Medi Hunter - 4004
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Source More Info : https://archive.ics.uci.edu/datasets

    The **UCI Machine Learning Repository** is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.

    The datasets collected in this project represent a diverse and comprehensive set of cancer-related data sourced from the UCI Machine Learning Repository. They cover a wide spectrum of cancer types and research perspectives, including breast cancer datasets such as the original, diagnostic, prognostic, and Coimbra variants, which focus on tumor features, recurrence, and biochemical markers. Cervical cancer is represented through datasets focusing on behavioral risks and general risk factors. The lung cancer dataset provides categorical diagnostic attributes, while the primary tumor dataset offers insights into tumor locations based on metastasis data. Additionally, specialized datasets like differentiated thyroid cancer recurrence, glioma grading with clinical and mutation features, and gene expression RNA-Seq data expand the scope into genetic and molecular-level cancer analysis. Together, these datasets support a wide range of machine learning applications including classification, prediction, survival analysis, and feature correlation across various types of cancer.

    RRA_Think Differently, Create history’s next line.

    Hello Data Hunters! Hope you're doing well. https://www.kaggle.com/shuvokumarbasak4004 (More Dataset) https://www.kaggle.com/shuvokumarbasak2030

  2. kr-vs-kp

    • openml.org
    Updated Apr 6, 2014
    Cite
    Alen Shapiro (2014). kr-vs-kp [Dataset]. https://www.openml.org/d/3
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2014
    Authors
    Alen Shapiro
    Description

    Author: Alen Shapiro Source: UCI Please cite: UCI citation policy

    1. Title: Chess End-Game -- King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7). The pawn on a7 means it is one square away from queening. It is the King+Rook's side (white) to move.

    2. Sources: (a) Database originally generated and described by Alen Shapiro. (b) Donor/Coder: Rob Holte (holte@uottawa.bitnet). The database was supplied to Holte by Peter Clark of the Turing Institute in Glasgow (pete@turing.ac.uk). (c) Date: 1 August 1989

    3. Past Usage:

    • Alen D. Shapiro (1983,1987), "Structured Induction in Expert Systems", Addison-Wesley. This book is based on Shapiro's Ph.D. thesis (1983) at the University of Edinburgh entitled "The Role of Structured Induction in Expert Systems".

    • Stephen Muggleton (1987), "Structuring Knowledge by Asking Questions", pp.218-229 in "Progress in Machine Learning", edited by I. Bratko and Nada Lavrac, Sigma Press, Wilmslow, England SK9 5BB.

    • Robert C. Holte, Liane Acker, and Bruce W. Porter (1989), "Concept Learning and the Problem of Small Disjuncts", Proceedings of IJCAI. Also available as technical report AI89-106, Computer Sciences Department, University of Texas at Austin, Austin, Texas 78712.

    4. Relevant Information: The dataset format is described below. Note: the format of this database was modified on 2/26/90 to conform with the format of all the other databases in the UCI repository of machine learning databases.

    5. Number of Instances: 3196 total

    6. Number of Attributes: 36

    7. Attribute Summaries: Classes (2): -- White-can-win ("won") and White-cannot-win ("nowin"). I believe that White is deemed to be unable to win if the Black pawn can safely advance. Attributes: see Shapiro's book.

    8. Missing Attributes: -- none

    9. Class Distribution: In 1669 of the positions (52%), White can win. In 1527 of the positions (48%), White cannot win.

    The format for instances in this database is a sequence of 37 attribute values. Each instance is a board-description for this chess endgame. The first 36 attributes describe the board. The last (37th) attribute is the classification: "won" or "nowin". There are 0 missing values. A typical board-description is

    f,f,f,f,f,f,f,f,f,f,f,f,l,f,n,f,f,t,f,f,f,f,f,f,f,t,f,f,f,f,f,f,f,t,t,n,won

    The names of the features do not appear in the board-descriptions. Instead, each feature corresponds to a particular position in the feature-value list. For example, the head of this list is the value for the feature "bkblk". The following is the list of features, in the order in which their values appear in the feature-value list:

    [bkblk,bknwy,bkon8,bkona,bkspr,bkxbq,bkxcr,bkxwp,blxwp,bxqsq,cntxt,dsopp,dwipd, hdchk,katri,mulch,qxmsq,r2ar8,reskd,reskr,rimmx,rkxwp,rxmsq,simpl,skach,skewr, skrxp,spcop,stlmt,thrsk,wkcti,wkna8,wknck,wkovl,wkpos,wtoeg]

    In the file, there is one instance (board position) per line.
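
As a rough illustration of the positional format described above, the sketch below pairs each value of the sample board-description with its feature name. The helper name `parse_instance` is hypothetical, and only the sample line quoted above is used; this is not part of the original database tooling.

```python
# Feature names, in the order given in the dataset description.
FEATURE_NAMES = (
    "bkblk,bknwy,bkon8,bkona,bkspr,bkxbq,bkxcr,bkxwp,blxwp,bxqsq,cntxt,dsopp,dwipd,"
    "hdchk,katri,mulch,qxmsq,r2ar8,reskd,reskr,rimmx,rkxwp,rxmsq,simpl,skach,skewr,"
    "skrxp,spcop,stlmt,thrsk,wkcti,wkna8,wknck,wkovl,wkpos,wtoeg"
).split(",")


def parse_instance(line):
    """Split one board-description into (feature dict, class label).

    Hypothetical helper: assumes 36 comma-separated feature values
    followed by the class label as the 37th value.
    """
    values = line.strip().split(",")
    if len(values) != len(FEATURE_NAMES) + 1:
        raise ValueError(f"expected {len(FEATURE_NAMES) + 1} values, got {len(values)}")
    features = dict(zip(FEATURE_NAMES, values[:-1]))
    return features, values[-1]


# The sample board-description quoted in the dataset description.
SAMPLE = "f,f,f,f,f,f,f,f,f,f,f,f,l,f,n,f,f,t,f,f,f,f,f,f,f,t,f,f,f,f,f,f,f,t,t,n,won"

features, label = parse_instance(SAMPLE)
print(features["bkblk"], features["dwipd"], label)
```

Keeping the names in a single ordered list mirrors the description: the position in the line, not a header row, decides which feature a value belongs to.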

    Num Instances: 3196 Num Attributes: 37 Num Continuous: 0 (Int 0 / Real 0) Num Discrete: 37 Missing values: 0 / 0.0%

  3. Integrated Heart Disease Dataset

    • kaggle.com
    zip
    Updated Apr 2, 2019
    Cite
    Rahul Gyawali (2019). Integrated Heart Disease Dataset [Dataset]. https://www.kaggle.com/unikpoet/heartdisease
    Explore at:
    Available download formats: zip (37479 bytes)
    Dataset updated
    Apr 2, 2019
    Authors
    Rahul Gyawali
    Description

    Context

    This dataset integrates all the databases in the Heart Disease dataset available at the UCI Machine Learning Repository. The original contains 4 databases: Cleveland, Hungarian, Long Beach, and Switzerland. Most of the work has been done using the Cleveland dataset only.

    Content

    Originally there are 76 attributes in the dataset; which attributes to select depends on one's needs. Here I've taken 10 attributes for the prediction.


  4. UCI Machine Learning Repository

    • dknet.org
    • rrid.site
    Cite
    UCI Machine Learning Repository [Dataset]. http://identifiers.org/RRID:SCR_026571
    Explore at:
    Description

    Collection of databases, domain theories, and data generators that are used by the machine learning community for empirical analysis of machine learning algorithms. Datasets approved to be in the repository will be assigned a Digital Object Identifier (DOI) if they do not already possess one. Datasets will be licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.

  5. Breast Cancer Wisconsin (Prognostic) Data Set

    • kaggle.com
    zip
    Updated Mar 31, 2017
    Cite
    Sarah VCH (2017). Breast Cancer Wisconsin (Prognostic) Data Set [Dataset]. https://www.kaggle.com/sarahvch/breast-cancer-wisconsin-prognostic-data-set
    Explore at:
    Available download formats: zip (49800 bytes)
    Dataset updated
    Mar 31, 2017
    Authors
    Sarah VCH
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    Data From: UCI Machine Learning Repository http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wpbc.names

    Content

    "Each record represents follow-up data for one breast cancer case. These are consecutive patients seen by Dr. Wolberg since 1984, and include only those cases exhibiting invasive breast cancer and no evidence of distant metastases at the time of diagnosis.

    The first 30 features are computed from a digitized image of a
    fine needle aspirate (FNA) of a breast mass. They describe
    characteristics of the cell nuclei present in the image.
    A few of the images can be found at
    http://www.cs.wisc.edu/~street/images/
    
    The separation described above was obtained using
    Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree
    Construction Via Linear Programming." Proceedings of the 4th
    Midwest Artificial Intelligence and Cognitive Science Society,
    pp. 97-101, 1992], a classification method which uses linear
    programming to construct a decision tree. Relevant features
    were selected using an exhaustive search in the space of 1-4
    features and 1-3 separating planes.
    
    The actual linear program used to obtain the separating plane
    in the 3-dimensional space is that described in:
    [K. P. Bennett and O. L. Mangasarian: "Robust Linear
    Programming Discrimination of Two Linearly Inseparable Sets",
    Optimization Methods and Software 1, 1992, 23-34].
    
    The Recurrence Surface Approximation (RSA) method is a linear
    programming model which predicts Time To Recur using both
    recurrent and nonrecurrent cases. See references (i) and (ii)
    above for details of the RSA method. 
    
    This database is also available through the UW CS ftp server:
    
    ftp ftp.cs.wisc.edu
    cd math-prog/cpo-dataset/machine-learn/WPBC/
    

    1) ID number
    2) Outcome (R = recur, N = nonrecur)
    3) Time (recurrence time if field 2 = R, disease-free time if field 2 = N)
    4-33) Ten real-valued features are computed for each cell nucleus:

    a) radius (mean of distances from center to points on the perimeter)
    b) texture (standard deviation of gray-scale values)
    c) perimeter
    d) area
    e) smoothness (local variation in radius lengths)
    f) compactness (perimeter^2 / area - 1.0)
    g) concavity (severity of concave portions of the contour)
    h) concave points (number of concave portions of the contour)
    i) symmetry 
    j) fractal dimension ("coastline approximation" - 1)"
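
The compactness definition in item (f) can be checked on a simple shape. The sketch below is illustrative only: the function name `compactness` and the circle example are assumptions, not part of the original WPBC feature-extraction code.

```python
import math


def compactness(perimeter, area):
    """Compactness as defined in item (f): perimeter^2 / area - 1.0."""
    if area <= 0:
        raise ValueError("area must be positive")
    return perimeter ** 2 / area - 1.0


# Illustrative input: a circle of arbitrary radius.
radius = 5.0
circle_value = compactness(2 * math.pi * radius, math.pi * radius ** 2)

# For any circle, perimeter^2 / area equals 4*pi, independent of the radius.
print(circle_value)
```

Because a circle has the smallest perimeter for a given area, it gives the lowest value this formula can take for a plane shape; less regular contours score higher.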
    

    Acknowledgements

    Creators:

    Dr. William H. Wolberg, General Surgery Dept., University of
    Wisconsin, Clinical Sciences Center, Madison, WI 53792
    wolberg@eagle.surgery.wisc.edu
    
    W. Nick Street, Computer Sciences Dept., University of
    Wisconsin, 1210 West Dayton St., Madison, WI 53706
    street@cs.wisc.edu 608-262-6619
    
    Olvi L. Mangasarian, Computer Sciences Dept., University of
    Wisconsin, 1210 West Dayton St., Madison, WI 53706
    olvi@cs.wisc.edu 
    

    Inspiration

    I'm really interested in trying out various machine learning algorithms on some real life science data.

  6. UCI Datasets for Metaheuristic Feature Selection

    • kaggle.com
    zip
    Updated Nov 7, 2025
    Cite
    Deku (2025). UCI Datasets for Metaheuristic Feature Selection [Dataset]. https://www.kaggle.com/datasets/piyushsharma5654/uci-datasets-for-metaheuristic-feature-selection
    Explore at:
    Available download formats: zip (671346 bytes)
    Dataset updated
    Nov 7, 2025
    Authors
    Deku
    Description

    This dataset is a curated collection of 5 classic, publicly available datasets from the UCI Machine Learning Repository.

    Context: This collection was compiled for the purpose of benchmarking and evaluating machine learning algorithms, particularly metaheuristic-based feature selection (FS) algorithms. The datasets were used in the research paper: "Hybrid-FS: A Novel Feature Selection Algorithm Integrating Sine-Cosine Optimization and Genetic Operators for High-Dimensional Data Classification."

    Content: The dataset contains 5 separate files, all originating from the UCI ML Repository:

    Ionosphere: 351 instances, 34 features, 2 classes

    Sonar: 208 instances, 60 features, 2 classes

    Waveform (v2): 5000 instances, 40 features, 3 classes

    Wine: 178 instances, 13 features, 3 classes

    Zoo: 101 instances, 16 features, 7 classes

    Original Sources & Citation: All datasets are provided as-is from the UCI Machine Learning Repository. Please cite the original creators of each dataset as specified on their respective UCI pages.

    Ionosphere: https://archive.ics.uci.edu/dataset/52/ionosphere

    Sonar: https://archive.ics.uci.edu/dataset/151/connectionist-bench-sonar-mines-vs-rocks

    Waveform: https://archive.ics.uci.edu/dataset/108/waveform+database+generator+version+2

    Wine: https://archive.ics.uci.edu/dataset/109/wine

    Zoo: https://archive.ics.uci.edu/dataset/111/zoo

  7. arrhythmia

    • openml.org
    Updated Apr 6, 2014
    Cite
    H. Altay Guvenir; Burak Acar; Haldun Muderrisoglu (2014). arrhythmia [Dataset]. https://www.openml.org/d/5
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2014
    Authors
    H. Altay Guvenir; Burak Acar; Haldun Muderrisoglu
    Description

    Author: H. Altay Guvenir, Burak Acar, Haldun Muderrisoglu
    Source: UCI
    Please cite: UCI

    Cardiac Arrhythmia Database
    The aim is to determine the type of arrhythmia from the ECG recordings. This database contains 279 attributes, 206 of which are linear valued and the rest are nominal.

    Concerning the study of H. Altay Guvenir: "The aim is to distinguish between the presence and absence of cardiac arrhythmia and to classify it in one of the 16 groups. Class 01 refers to 'normal' ECG classes, 02 to 15 refers to different classes of arrhythmia and class 16 refers to the rest of unclassified ones. For the time being, there exists a computer program that makes such a classification. However, there are differences between the cardiologist's and the program's classification. Taking the cardiologist's as a gold standard we aim to minimize this difference by means of machine learning tools."

    The names and id numbers of the patients were recently removed from the database.

    Attribute Information

      1 Age: Age in years , linear
      2 Sex: Sex (0 = male; 1 = female) , nominal
      3 Height: Height in centimeters , linear
      4 Weight: Weight in kilograms , linear
      5 QRS duration: Average of QRS duration in msec., linear
      6 P-R interval: Average duration between onset of P and Q waves
       in msec., linear
      7 Q-T interval: Average duration between onset of Q and offset
       of T waves in msec., linear
      8 T interval: Average duration of T wave in msec., linear
      9 P interval: Average duration of P wave in msec., linear
     Vector angles in degrees on front plane of:, linear
     10 QRS
     11 T
     12 P
     13 QRST
     14 J
     15 Heart rate: Number of heart beats per minute ,linear
     Of channel DI:
      Average width, in msec., of: linear
      16 Q wave
      17 R wave
      18 S wave
      19 R' wave, small peak just after R
      20 S' wave
      21 Number of intrinsic deflections, linear
      22 Existence of ragged R wave, nominal
      23 Existence of diphasic derivation of R wave, nominal
      24 Existence of ragged P wave, nominal
      25 Existence of diphasic derivation of P wave, nominal
      26 Existence of ragged T wave, nominal
      27 Existence of diphasic derivation of T wave, nominal
     Of channel DII: 
      28 .. 39 (similar to 16 .. 27 of channel DI)
     Of channels DIII:
      40 .. 51
     Of channel AVR:
      52 .. 63
     Of channel AVL:
      64 .. 75
     Of channel AVF:
      76 .. 87
     Of channel V1:
      88 .. 99
     Of channel V2:
      100 .. 111
     Of channel V3:
      112 .. 123
     Of channel V4:
      124 .. 135
     Of channel V5:
      136 .. 147
     Of channel V6:
      148 .. 159
     Of channel DI:
      Amplitude , * 0.1 millivolt, of
      160 JJ wave, linear
      161 Q wave, linear
      162 R wave, linear
      163 S wave, linear
      164 R' wave, linear
      165 S' wave, linear
      166 P wave, linear
      167 T wave, linear
      168 QRSA , Sum of areas of all segments divided by 10,
        ( Area= width * height / 2 ), linear
      169 QRSTA = QRSA + 0.5 * width of T wave * 0.1 * height of T
        wave. (If T is diphasic then the bigger segment is
        considered), linear
     Of channel DII:
      170 .. 179
     Of channel DIII:
      180 .. 189
     Of channel AVR:
      190 .. 199
     Of channel AVL:
      200 .. 209
     Of channel AVF:
      210 .. 219
     Of channel V1:
      220 .. 229
     Of channel V2:
      230 .. 239
     Of channel V3:
      240 .. 249
     Of channel V4:
      250 .. 259
     Of channel V5:
      260 .. 269
     Of channel V6:
      270 .. 279
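
The per-channel numbering above follows a regular pattern: twelve width-related attributes per channel starting at 16, then ten amplitude-related attributes per channel starting at 160. The sketch below derives those ranges; the function names are hypothetical, and the numbers are the 1-based attribute numbers used in the listing, not zero-based column indices.

```python
# Channel order as it appears in the attribute listing.
CHANNELS = ["DI", "DII", "DIII", "AVR", "AVL", "AVF",
            "V1", "V2", "V3", "V4", "V5", "V6"]


def width_block(channel):
    """Attribute numbers (first, last) of the 12 width-related attributes."""
    k = CHANNELS.index(channel)
    first = 16 + 12 * k
    return first, first + 11


def amplitude_block(channel):
    """Attribute numbers (first, last) of the 10 amplitude-related attributes."""
    k = CHANNELS.index(channel)
    first = 160 + 10 * k
    return first, first + 9


for name in CHANNELS:
    print(name, width_block(name), amplitude_block(name))
```

When loading the data into a zero-indexed array, subtract 1 from each attribute number before slicing.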
    

    Class code - class - number of instances:

      01  Normal                                       245
      02  Ischemic changes (Coronary Artery Disease)    44
      03  Old Anterior Myocardial Infarction            15
      04  Old Inferior Myocardial Infarction            15
      05  Sinus tachycardia                             13
      06  Sinus bradycardia                             25
      07  Ventricular Premature Contraction (PVC)        3
      08  Supraventricular Premature Contraction         2
      09  Left bundle branch block                       9
      10  Right bundle branch block                     50
      11  1. degree AtrioVentricular block               0
      12  2. degree AV block                             0
      13  3. degree AV block                             0
      14  Left ventricular hypertrophy                   4
      15  Atrial Fibrillation or Flutter                 5
      16  Others                                        22
    
  8. tic-tac-toe

    • openml.org
    Updated Apr 6, 2014
    Cite
    David W. Aha (2014). tic-tac-toe [Dataset]. https://www.openml.org/d/50
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2014
    Authors
    David W. Aha
    Description

    Author: David W. Aha
    Source: UCI - 1991
    Please cite: UCI

    Tic-Tac-Toe Endgame database
    This database encodes the complete set of possible board configurations at the end of tic-tac-toe games, where "x" is assumed to have played first. The target concept is "win for x" (i.e., true when "x" has one of 8 possible ways to create a "three-in-a-row").

    Attribute Information

     (x=player x has taken, o=player o has taken, b=blank)
     1. top-left-square: {x,o,b}
     2. top-middle-square: {x,o,b}
     3. top-right-square: {x,o,b}
     4. middle-left-square: {x,o,b}
     5. middle-middle-square: {x,o,b}
     6. middle-right-square: {x,o,b}
     7. bottom-left-square: {x,o,b}
     8. bottom-middle-square: {x,o,b}
     9. bottom-right-square: {x,o,b}
    10. Class: {positive,negative}
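
The target concept described above ("win for x" via one of 8 possible three-in-a-row lines) can be written out directly. This is a minimal sketch: the function name `x_wins` is hypothetical, and the board is assumed to be a list of nine values in the attribute order listed above.

```python
# The 8 three-in-a-row lines, as indices into the 9 squares
# (0 = top-left ... 8 = bottom-right, matching attributes 1-9).
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def x_wins(board):
    """Return True when 'x' occupies all three squares of any line."""
    if len(board) != 9:
        raise ValueError("a board needs exactly 9 squares")
    return any(all(board[i] == "x" for i in line) for line in LINES)


# Illustrative boards (x = player x, o = player o, b = blank).
top_row_win = list("xxxoobbbb")
drawn_game = list("xoxxoooxx")

print("positive" if x_wins(top_row_win) else "negative")
print("positive" if x_wins(drawn_game) else "negative")
```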
    
  9. PhishingWebsites

    • openml.org
    Updated Feb 16, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rami Mustafa A Mohammad ( University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk ) Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae) (2016). PhishingWebsites [Dataset]. https://www.openml.org/d/4534
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 16, 2016
    Authors
    Rami Mustafa A Mohammad ( University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk ) Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae)
    Description

    Author: Rami Mustafa A Mohammad (University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com), Lee McCluskey (University of Huddersfield, t.l.mccluskey '@' hud.ac.uk), Fadi Thabtah (Canadian University of Dubai, fadi '@' cud.ac.ae)
    Source: UCI
    Please cite: Please refer to the Machine Learning Repository's citation policy

    Source:

    Rami Mustafa A Mohammad ( University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield,t.l.mccluskey '@' hud.ac.uk ) Fadi Thabtah (Canadian University of Dubai,fadi '@' cud.ac.ae)

    Data Set Information:

    One of the challenges faced by our research was the unavailability of reliable training datasets. In fact, this challenge faces any researcher in the field. Although plenty of articles about predicting phishing websites have been disseminated these days, no reliable training dataset has been published publicly, maybe because there is no agreement in the literature on the definitive features that characterize phishing webpages; hence it is difficult to shape a dataset that covers all possible features. In this dataset, we shed light on the important features that have proved to be sound and effective in predicting phishing websites. In addition, we propose some new features.

    Attribute Information:

    For Further information about the features see the features file in the data folder of UCI.

    Relevant Papers:

    Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi (2012) An Assessment of Features Related to Phishing Websites using an Automated Technique. In: International Conference For Internet Technology And Secured Transactions. ICITST 2012 . IEEE, London, UK, pp. 492-497. ISBN 978-1-4673-5325-0

    Mohammad, Rami, Thabtah, Fadi Abdeljaber and McCluskey, T.L. (2014) Predicting phishing websites based on self-structuring neural network. Neural Computing and Applications, 25 (2). pp. 443-458. ISSN 0941-0643

    Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi Abdeljaber (2014) Intelligent Rule based Phishing Websites Classification. IET Information Security, 8 (3). pp. 153-160. ISSN 1751-8709

    Citation Request:

    Please refer to the Machine Learning Repository's citation policy

  10. Description of UCI HAR and UniMiB SHAR.

    • plos.figshare.com
    xls
    Updated Aug 13, 2024
    Cite
    Sarmela Raja Sekaran; Ying Han Pang; Lim Zheng You; Ooi Shih Yin (2024). Description of UCI HAR and UniMiB SHAR. [Dataset]. http://doi.org/10.1371/journal.pone.0304655.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Aug 13, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sarmela Raja Sekaran; Ying Han Pang; Lim Zheng You; Ooi Shih Yin
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Recognising human activities using smart devices has led to countless inventions in various domains like healthcare, security, sports, etc. Sensor-based human activity recognition (HAR), especially smartphone-based HAR, has become popular among the research community due to lightweight computation and user privacy protection. Deep learning models are the most preferred solutions in developing smartphone-based HAR as they can automatically capture salient and distinctive features from input signals and classify them into respective activity classes. However, in most cases, the architecture of these models needs to be deep and complex for better classification performance. Furthermore, training these models requires extensive computational resources. Hence, this research proposes a hybrid lightweight model that integrates an enhanced Temporal Convolutional Network (TCN) with Gated Recurrent Unit (GRU) layers for salient spatiotemporal feature extraction without tedious manual feature extraction. Essentially, dilations are incorporated into each convolutional kernel in the TCN-GRU model to extend the kernel’s field of view without imposing additional model parameters. Moreover, fewer short filters are applied for each convolutional layer to alleviate excess parameters. Despite reducing computational cost, the proposed model utilises dilations, residual connections, and GRU layers for longer-term time dependency modelling by retaining longer implicit features of the input inertial sequences throughout training to provide sufficient information for future prediction. The performance of the TCN-GRU model is verified on two benchmark smartphone-based HAR databases, i.e., UCI HAR and UniMiB SHAR. The model attains promising accuracy in recognising human activities with 97.25% on UCI HAR and 93.51% on UniMiB SHAR. 
Since the current study exclusively works on the inertial signals captured by smartphones, future studies will explore the generalisation of the proposed TCN-GRU across diverse datasets, including various sensor types, to ensure its adaptability across different applications.
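
The claim that dilation extends a kernel's field of view without adding parameters can be illustrated with the standard receptive-field formula for stacked stride-1 dilated 1-D convolutions. This is a sketch under stated assumptions: the kernel size of 3, the dilation schedules, and the channel count are hypothetical and are not taken from the paper's TCN-GRU configuration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in time steps, of stacked stride-1 dilated 1-D
    convolutions with one layer per entry in `dilations`."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def weights_per_filter(kernel_size, in_channels):
    """Weights in one convolutional filter; dilation does not appear here."""
    return kernel_size * in_channels


# Hypothetical three-layer stacks with kernel size 3.
plain = receptive_field(3, [1, 1, 1])      # no dilation
dilated = receptive_field(3, [1, 2, 4])    # doubling dilation per layer

print(plain, dilated, weights_per_filter(3, in_channels=9))
```

Both stacks use the same number of weights per filter, yet the dilated stack covers roughly twice as many time steps in this example.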

  11. Classifying wine varieties

    • kaggle.com
    zip
    Updated Jun 20, 2017
    Cite
    brynja (2017). Classifying wine varieties [Dataset]. https://www.kaggle.com/brynja/wineuci
    Explore at:
    Available download formats: zip (4305 bytes)
    Dataset updated
    Jun 20, 2017
    Authors
    brynja
    Description

    Context

    Wine recognition dataset from UC Irvine. Great for testing out different classifiers

    Labels: "name" - Number denoting a specific wine class

    Number of instances of each wine class

    • Class 1 - 59
    • Class 2 - 71
    • Class 3 - 48

    Features:

    1. Alcohol
    2. Malic acid
    3. Ash
    4. Alcalinity of ash
    5. Magnesium
    6. Total phenols
    7. Flavanoids
    8. Nonflavanoid phenols
    9. Proanthocyanins
    10. Color intensity
    11. Hue
    12. OD280/OD315 of diluted wines
    13. Proline

    Content

    "This data set is the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines"

    Acknowledgements

    Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

    @misc{Lichman:2013 , author = "M. Lichman", year = "2013", title = "{UCI} Machine Learning Repository", url = "http://archive.ics.uci.edu/ml", institution = "University of California, Irvine, School of Information and Computer Sciences" }

    UC Irvine data base: "https://archive.ics.uci.edu/ml/machine-learning-databases/wine"

    Sources:
    (a) Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
    (b) Stefan Aeberhard, email: stefan@coral.cs.jcu.edu.au
    (c) July 1991

    Past Usage:
    (1) S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Technometrics).

    The data was used with many others for comparing various classifiers. The classes are separable, though only RDA has achieved 100% correct classification. (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) (All results using the leave-one-out technique)
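
To make the evaluation protocol concrete, the sketch below runs a leave-one-out 1NN classifier on z-transformed data. It is only an illustration of the procedure named above: the six two-feature rows are made-up toy values, not the wine data, the function names are hypothetical, and the code makes no attempt to reproduce the reported accuracies or the RDA, QDA, or LDA classifiers.

```python
from statistics import mean, pstdev


def z_transform(rows):
    """Standardise each column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def loo_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    using squared Euclidean distance."""
    correct = 0
    for i, query in enumerate(rows):
        best_label, best_dist = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue  # the held-out sample may not vote for itself
            dist = sum((a - b) ** 2 for a, b in zip(query, other))
            if dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


# Toy data (hypothetical): two features on very different scales, three classes.
toy_rows = [[13.0, 1000.0], [13.2, 1050.0], [12.0, 500.0],
            [12.2, 520.0], [13.5, 700.0], [13.6, 720.0]]
toy_labels = [1, 1, 2, 2, 3, 3]

accuracy = loo_1nn_accuracy(z_transform(toy_rows), toy_labels)
print(accuracy)
```

Standardising first matters for distance-based classifiers when features sit on very different scales, as the 13 wine constituents do.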

    (2) S. Aeberhard, D. Coomans and O. de Vel, "THE CLASSIFICATION PERFORMANCE OF RDA" Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Journal of Chemometrics).

    Inspiration

    This data set is great for drawing comparisons between algorithms and testing out classifications models when learning new techniques

  12. census-income

    • huggingface.co
    Updated Jul 21, 2025
    Cite
    WC (2025). census-income [Dataset]. https://huggingface.co/datasets/cestwc/census-income
    Explore at:
    Dataset updated
    Jul 21, 2025
    Authors
    WC
    Description

    Dataset Card for Census Income (Adult)

    This dataset is a precise version of Adult or Census Income. This dataset from UCI somehow happens to occupy two links, but we checked and confirmed that they are identical. We used the following Python script to create this Hugging Face dataset:

    import pandas as pd
    from datasets import Dataset, DatasetDict, Features, Value, ClassLabel

    URLs

    url1 = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data" url2 =… See the full description on the dataset page: https://huggingface.co/datasets/cestwc/census-income.

  13. mfeat-factors

    • openml.org
    Updated Apr 6, 2014
    Cite
    Robert P.W. Duin (2014). mfeat-factors [Dataset]. https://openml.org/search?type=data&sort=runs&id=12
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2014
    Authors
    Robert P.W. Duin
    Description

    Author: Robert P.W. Duin, Department of Applied Physics, Delft University of Technology
    Source: UCI - 1998
    Please cite: UCI

    Multiple Features Dataset: Factors
    One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same original character. 200 instances per class (for a total of 2,000 instances) have been digitized in binary images.

    Attribute Information

    The attributes represent 216 profile correlations. No more information is known.

    Relevant Papers

    A slightly different version of the database is used in
    M. van Breukelen, R.P.W. Duin, D.M.J. Tax, and J.E. den Hartog, Handwritten digit recognition by combined classifiers, Kybernetika, vol. 34, no. 4, 1998, 381-386.

    The database as is is used in:
    A.K. Jain, R.P.W. Duin, J. Mao, Statistical Pattern Recognition: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence archive, Volume 22 Issue 1, January 2000

  14. Daily Demand Forecasting Orders from UCI ML

    • kaggle.com
    zip
    Updated Jan 7, 2025
    Cite
    Pham Huyen (2025). Daily Demand Forecasting Orders from UCI ML [Dataset]. https://www.kaggle.com/datasets/phamhuyen286/daily-demand-forecasting-orders-from-uci-ml
    Available download formats: zip (2870 bytes)
    Dataset updated
    Jan 7, 2025
    Authors
    Pham Huyen
    Description

    The dataset was collected over 60 days from a real database of a Brazilian logistics company. It has twelve predictive attributes and a target, which is the total number of orders for daily treatment. The database was used in academic research at the Universidade Nove de Julho.

  15. Annual Income - PCS5031 Project

    • datadryad.org
    • search.dataone.org
    zip
    Updated Sep 28, 2016
    Emerson Cruz; Mauro Ohara (2016). Annual Income - PCS5031 Project [Dataset]. http://doi.org/10.15146/R3T88S
    Available download formats: zip
    Dataset updated
    Sep 28, 2016
    Dataset provided by
    Dryad
    Authors
    Emerson Cruz; Mauro Ohara
    Time period covered
    Sep 28, 2016
    Description

    The data used in this project is a sample of 1994 census data from the US census database. The generated data will contain income prediction models for the selected sample. When new data on a specific person is entered, the model will indicate whether that person will reach a target income level. A machine learning process will use the data to perform inference through Bayesian networks.

  16. Data from: Imbalanced dataset for benchmarking

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Lemaitre, Guillaume; Nogueira, Fernando; Aridas, Christos K.; Oliveira, Dayvid V. R. (2020). Imbalanced dataset for benchmarking [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_61452
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Universite de Bourgogne, Universitat de Girona
    University of Patras
    Universidade Federal de Pernambuco
    ShoppeAI
    Authors
    Lemaitre, Guillaume; Nogueira, Fernando; Aridas, Christos K.; Oliveira, Dayvid V. R.
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Imbalanced dataset for benchmarking

    The different algorithms of the imbalanced-learn toolbox are evaluated on a set of common datasets, which are more or less balanced. These benchmarks were proposed in [1]. The following section presents the main characteristics of this benchmark.

    Characteristics

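    | ID | Name           | Repository & Target                  | Ratio | # samples | # features |
    |----|----------------|--------------------------------------|-------|-----------|------------|
    | 1  | Ecoli          | UCI, target: imU                     | 8.6:1 | 336       | 7          |
    | 2  | Optical Digits | UCI, target: 8                       | 9.1:1 | 5,620     | 64         |
    | 3  | SatImage       | UCI, target: 4                       | 9.3:1 | 6,435     | 36         |
    | 4  | Pen Digits     | UCI, target: 5                       | 9.4:1 | 10,992    | 16         |
    | 5  | Abalone        | UCI, target: 7                       | 9.7:1 | 4,177     | 8          |
    | 6  | Sick Euthyroid | UCI, target: sick euthyroid          | 9.8:1 | 3,163     | 25         |
    | 7  | Spectrometer   | UCI, target: >=44                    | 11:1  | 531       | 93         |
    | 8  | Car_Eval_34    | UCI, target: good, v good            | 12:1  | 1,728     | 6          |
    | 9  | ISOLET         | UCI, target: A, B                    | 12:1  | 7,797     | 617        |
    | 10 | US Crime       | UCI, target: >0.65                   | 12:1  | 1,994     | 122        |
    | 11 | Yeast_ML8      | LIBSVM, target: 8                    | 13:1  | 2,417     | 103        |
    | 12 | Scene          | LIBSVM, target: >one label           | 13:1  | 2,407     | 294        |
    | 13 | Libras Move    | UCI, target: 1                       | 14:1  | 360       | 90         |
    | 14 | Thyroid Sick   | UCI, target: sick                    | 15:1  | 3,772     | 28         |
    | 15 | Coil_2000      | KDD, CoIL, target: minority          | 16:1  | 9,822     | 85         |
    | 16 | Arrhythmia     | UCI, target: 06                      | 17:1  | 452       | 279        |
    | 17 | Solar Flare M0 | UCI, target: M->0                    | 19:1  | 1,389     | 10         |
    | 18 | OIL            | UCI, target: minority                | 22:1  | 937       | 49         |
    | 19 | Car_Eval_4     | UCI, target: vgood                   | 26:1  | 1,728     | 6          |
    | 20 | Wine Quality   | UCI, wine, target: <=4               | 26:1  | 4,898     | 11         |
    | 21 | Letter Img     | UCI, target: Z                       | 26:1  | 20,000    | 16         |
    | 22 | Yeast _ME2     | UCI, target: ME2                     | 28:1  | 1,484     | 8          |
    | 23 | Webpage        | LIBSVM, w7a, target: minority        | 33:1  | 49,749    | 300        |
    | 24 | Ozone Level    | UCI, ozone, data                     | 34:1  | 2,536     | 72         |
    | 25 | Mammography    | UCI, target: minority                | 42:1  | 11,183    | 6          |
    | 26 | Protein homo.  | KDD CUP 2004, minority               | 111:1 | 145,751   | 74         |
    | 27 | Abalone_19     | UCI, target: 19                      | 130:1 | 4,177     | 8          |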
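
The Ratio column is most naturally read as the number of majority-class samples per minority-class sample. The sketch below shows one way to compute such a ratio from a list of class labels. It is an illustration only: the function name `imbalance_ratio` and the synthetic label list are assumptions, not part of the benchmark or of the imbalanced-learn API.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


# Synthetic labels (hypothetical): 336 samples split 301 / 35,
# chosen only so the result resembles the first table row.
labels = ["majority"] * 301 + ["minority"] * 35
print(f"{imbalance_ratio(labels):.1f}:1")
```

With more than two classes this helper compares only the largest and smallest class, which may differ from how a one-vs-rest target in the table was defined.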

    References

    [1] Ding, Zejin, "Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics." Dissertation, Georgia State University, (2011).

    [2] Blake, Catherine, and Christopher J. Merz. "UCI Repository of machine learning databases." (1998).

    [3] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.

    [4] Caruana, Rich, Thorsten Joachims, and Lars Backstrom. "KDD-Cup 2004: results and analysis." ACM SIGKDD Explorations Newsletter 6.2 (2004): 95-108.

  17. Annotated Benchmark of Real-World Data for Approximate Functional Dependency...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 1, 2023
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren (2023). Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery [Dataset]. http://doi.org/10.5281/zenodo.8098909
    Available download formats: csv
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery

    This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.

    The file ground_truth.csv is a comma-separated file containing approximate functional dependencies. The table column names the relation we refer to; lhs and rhs name two columns of that relation for which we found that, semantically, lhs implies rhs.

    The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.
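
To make the idea of an approximate functional dependency concrete, the sketch below computes the classic g3 error of a candidate lhs -> rhs: the smallest fraction of rows that would have to be removed for the dependency to hold exactly. This is an assumption-laden illustration. The g3_prime measure mentioned above may be defined differently, and the function name `g3_error`, the sample rows, and their values are hypothetical rather than taken from the benchmark files.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Classic g3 error of lhs -> rhs over rows given as dicts.

    Rows where either attribute is missing or empty are skipped.
    Returns None if no row has both values.
    """
    groups = defaultdict(Counter)
    total = 0
    for row in rows:
        left, right = row.get(lhs), row.get(rhs)
        if left in (None, "") or right in (None, ""):
            continue
        groups[left][right] += 1
        total += 1
    if total == 0:
        return None
    # For each lhs value keep the most frequent rhs value; the rest violate the FD.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1 - kept / total


# Hypothetical rows using the column names from the claims.csv example.
rows = [
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy International"},
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy International"},
    {"AirportCode": "JFK", "AirportName": "JFK Intl"},
    {"AirportCode": "LAX", "AirportName": "Los Angeles International"},
    {"AirportCode": "SFO", "AirportName": ""},
]
print(g3_error(rows, "AirportCode", "AirportName"))
```

Returning None when no row has both values loosely mirrors the first exclusion criterion described above; a threshold on the error would be needed to mirror the second.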

    Dataset References

  18. solar-flare

    • openml.org
    Updated Apr 6, 2017
    (2017). solar-flare [Dataset]. https://www.openml.org/d/40686
    Dataset updated
    Apr 6, 2017
    Description

    Author: Gary Bradshaw
    Source: UCI
    Please cite:

    Solar Flare database

    Relevant Information:
    -- The database contains 3 potential classes, one for the number of times a certain type of solar flare occurred in a 24 hour period.
    -- Each instance represents captured features for 1 active region on the sun.
    -- The data are divided into two sections. The second section (flare.data2) has had much more error correction applied to it, and has consequently been treated as more reliable.

    Number of Instances: flare.data1: 323, flare.data2: 1066

    Number of attributes: 13 (includes 3 class attributes)

    Attribute Information

    1. Code for class (modified Zurich class) (A,B,C,D,E,F,H)
    2. Code for largest spot size       (X,R,S,A,H,K)
    3. Code for spot distribution       (X,O,I,C)
    4. Activity                (1 = reduced, 2 = unchanged)
    5. Evolution                (1 = decay, 2 = no growth, 
                          3 = growth)
    6. Previous 24 hour flare activity code  (1 = nothing as big as an M1,
                          2 = one M1,
                          3 = more activity than one M1)
    7. Historically-complex          (1 = Yes, 2 = No)
    8. Did region become historically complex (1 = yes, 2 = no) 
      on this pass across the sun's disk
    9. Area                  (1 = small, 2 = large)
    
    10. Area of the largest spot (1 = <=5, 2 = >5)

    From all these predictors three classes of flares are predicted, which are represented in the last three columns.

    11. C-class flares production by this region in the following 24 hours (common flares): Number
    12. M-class flares production by this region in the following 24 hours (moderate flares): Number
    13. X-class flares production by this region in the following 24 hours (severe flares): Number

      CLASSTYPE: nominal CLASSINDEX: first
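
The attribute list above implies a simple record layout: ten predictors followed by three flare counts. The sketch below splits one record accordingly. It assumes whitespace-separated fields in the order listed, which may not match every distribution of the data (the OpenML version, for instance, declares its class index as first). The function name `split_flare_record` and the sample record are made up for illustration.

```python
def split_flare_record(line):
    """Split one record into 10 predictor codes and 3 integer flare counts."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, got {len(fields)}")
    predictors = fields[:10]                     # attributes 1-10
    targets = [int(value) for value in fields[10:]]  # C-, M-, X-class counts
    return predictors, targets


# Hypothetical record, not copied from flare.data1 or flare.data2.
sample = "C S O 1 2 1 1 2 1 2 0 0 0"
predictors, targets = split_flare_record(sample)
print(predictors)
print(targets)
```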

  19. DAGHAR: A Benchmark for Domain Adaptation and Generalization in...

    • zenodo.org
    zip
    Updated Sep 7, 2024
    Otávio Oliveira Napoli; Dami Duarte; Patrick Alves; Darlinne Hubert Palo Soto; Henrique Evangelista de Oliveira; Anderson Rocha; Levy Boccato; Edson Borin (2024). DAGHAR: A Benchmark for Domain Adaptation and Generalization in Smartphone-Based Human Activity Recognition [Dataset]. http://doi.org/10.5281/zenodo.11992126
    Available download formats: zip
    Dataset updated
    Sep 7, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Otávio Oliveira Napoli; Dami Duarte; Patrick Alves; Darlinne Hubert Palo Soto; Henrique Evangelista de Oliveira; Anderson Rocha; Levy Boccato; Edson Borin
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The DAGHAR benchmark is a curated dataset collection designed for domain adaptation and domain generalization studies in human activity recognition (HAR) tasks that use inertial sensors such as accelerometers and gyroscopes. It was introduced in the work "A benchmark for domain adaptation and generalization in smartphone-based human activity recognition". It features raw inertial sensor data sourced exclusively from smartphones. Six public datasets were selected and standardized in terms of accelerometer units of measurement, sampling rate, gravity component, activity labels, user partitioning, and time window size. This standardization process allows for creating a comprehensive benchmark for evaluating the generalization capabilities of HAR models in cross-dataset scenarios.

    The benchmark is based on the following datasets:

    • Ku-HAR, from "Sikder, N. and Nahid, A.A., 2021. KU-HAR: An open dataset for heterogeneous human activity recognition. Pattern Recognition Letters, 146, pp.46-54", available at Mendeley. Distributed under CC BY 4.0.
    • MotionSense, from "Malekzadeh, M., Clegg, R.G., Cavallaro, A. and Haddadi, H., 2019, April. Mobile sensor data anonymization. In Proceedings of the international conference on internet of things design and implementation (pp. 49-58)", available at Kaggle. Distributed under Open Data Commons Open Database License (ODbL) v1.0.
    • RealWorld, from "Sztyler, T. and Stuckenschmidt, H., 2016, March. On-body localization of wearable devices: An investigation of position-aware activity recognition. In 2016 IEEE international conference on pervasive computing and communications (PerCom) (pp. 1-9). IEEE", available at this link. We obtained explicit permission from the original authors to distribute a copy of the preprocessed data.
    • UCI-HAR, from "Reyes-Ortiz, J.L., Oneto, L., Samà, A., Parra, X. and Anguita, D., 2016. Transition-aware human activity recognition using smartphones. Neurocomputing, 171, pp.754-767", available at UCI Repository. Distributed under CC BY 4.0.
    • WISDM, from "Weiss, G.M., Yoneda, K. and Hayajneh, T., 2019. Smartphone and smartwatch-based biometrics using activities of daily living. IEEE Access, 7, pp.133190-133202", available at UCI Repository. Distributed under CC BY 4.0.
  20. UCI drug name dataset

    • kaggle.com
    zip
    Updated Jan 23, 2024
    Ahmed Alghali (2024). UCI drug name dataset [Dataset]. https://www.kaggle.com/datasets/ahmedalghali/uci-drug-name-dataset
    Available download formats: zip (76366968 bytes)
    Dataset updated
    Jan 23, 2024
    Authors
    Ahmed Alghali
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Ahmed Alghali

    Released under Database: Open Database, Contents: Database Contents

    Contents

Medi Hunter - 4004 (2025). Cancer Multiple Dataset UCI MLR [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasakbd/cancer-multiple-dataset-uci-mlr/suggestions

Cancer Multiple Dataset UCI MLR

UCI Machine Learning Repository is a collection of databases

Available download formats: zip (74213598 bytes)
Dataset updated
Aug 5, 2025
Authors
Medi Hunter - 4004
License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

Source More Info : https://archive.ics.uci.edu/datasets

The **UCI Machine Learning Repository** is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.

The datasets collected in this project represent a diverse and comprehensive set of cancer-related data sourced from the UCI Machine Learning Repository. They cover a wide spectrum of cancer types and research perspectives, including breast cancer datasets such as the original, diagnostic, prognostic, and Coimbra variants, which focus on tumor features, recurrence, and biochemical markers. Cervical cancer is represented through datasets focusing on behavioral risks and general risk factors. The lung cancer dataset provides categorical diagnostic attributes, while the primary tumor dataset offers insights into tumor locations based on metastasis data. Additionally, specialized datasets like differentiated thyroid cancer recurrence, glioma grading with clinical and mutation features, and gene expression RNA-Seq data expand the scope into genetic and molecular-level cancer analysis. Together, these datasets support a wide range of machine learning applications including classification, prediction, survival analysis, and feature correlation across various types of cancer.

RRA_Think Differently, Create history’s next line.

Hello Data Hunters! Hope you're doing well. https://www.kaggle.com/shuvokumarbasak4004 (More Dataset) https://www.kaggle.com/shuvokumarbasak2030
