64 datasets found
  1. UCI-dataset

    • kaggle.com
    zip
    Updated Aug 17, 2022
    Cite
    Md Waquar Azam (2022). UCI-dataset [Dataset]. https://www.kaggle.com/datasets/mdwaquarazam/ucidatasetlist
    Explore at:
    Available download formats: zip (20774 bytes)
    Dataset updated
    Aug 17, 2022
    Authors
    Md Waquar Azam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset lists the datasets provided by UCI ML. If you are a learner and want data by year, category, profession, or some other criterion, you can search for it here.

    There are 8 columns in the dataset, in which all details are given: link, Data-Name, data type, default task, attribute-type, instances, attributes, year.

    Some missing values are also present.

    You can analyse the data as per your requirements.

    EDA

  2. Phishing Dataset UCI ML CSV

    • kaggle.com
    zip
    Updated Sep 27, 2020
    Cite
    Satish Yadav (2020). Phishing Dataset UCI ML CSV [Dataset]. https://www.kaggle.com/datasets/isatish/phishing-dataset-uci-ml-csv
    Explore at:
    Available download formats: zip (112567 bytes)
    Dataset updated
    Sep 27, 2020
    Authors
    Satish Yadav
    Description

    Context

    This dataset is taken from the UCI Phishing Dataset, originally in ARFF format and converted into CSV. It can be used to train and validate phishing detection machine learning projects.
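The ARFF-to-CSV conversion mentioned above can be sketched roughly as follows. This is a minimal illustration, not the uploader's actual procedure: it assumes a simple ARFF file with unquoted, comma-separated values, and the relation and attribute names in the sample are hypothetical.

```python
import csv
import io


def arff_to_csv(arff_text):
    """Convert a simple ARFF document to CSV text.

    Assumes unquoted, comma-separated data rows and attribute names
    without spaces; sparse ARFF and quoted values are not handled.
    """
    names = []
    rows = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])  # second token is the name
        elif line.lower().startswith("@data"):
            in_data = True
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample; the real file has its own attribute names.
SAMPLE_ARFF = """@relation demo
@attribute feature_a {-1,1}
@attribute feature_b {1,0,-1}
@attribute label {-1,1}
@data
-1,1,-1
1,0,1
"""

csv_text = arff_to_csv(SAMPLE_ARFF)
```

For real files with quoted strings or sparse rows, a dedicated ARFF reader would be a safer choice than this line-based approach.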

  3. Abalone Dataset from UCI

    • kaggle.com
    zip
    Updated Apr 19, 2024
    Cite
    Elza (2024). Abalone Dataset from UCI [Dataset]. https://www.kaggle.com/datasets/nayanack/abalone-dataset-from-uci
    Explore at:
    Available download formats: zip (61442 bytes)
    Dataset updated
    Apr 19, 2024
    Authors
    Elza
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem.

  4. UCI HAR Dataset Processed

    • researchdata.ntu.edu.sg
    Updated May 27, 2022
    Cite
    Mohamed Ragab; Emadeldeen Eldele (2022). UCI HAR Dataset Processed [Dataset]. http://doi.org/10.21979/N9/0SYHTZ
    Explore at:
    Available download formats: bin(3686095), bin(3545878), application/x-ipynb+json(7803), bin(1731800), bin(3350967), bin(3484869), bin(1585859), bin(3597503), bin(3095904), bin(3570740), bin(3218754), bin(2884241), bin(3746839), bin(3166405), bin(1283093), bin(1518360), bin(3207671), bin(3835115), bin(1458811), bin(1315154), bin(3369439), bin(2974646), bin(1509915), text/x-python(4587), application/x-ipynb+json(16123), bin(1580644), bin(1390458), bin(1630135), bin(3213684), bin(1285362), bin(3661680), bin(3112375), bin(1545871), bin(3422740), bin(1379532), bin(1354626), bin(3781860), bin(3787291), bin(2979793), bin(1731929), bin(3214992), bin(1230467), bin(4024538), bin(1359387), application/x-ipynb+json(18215), bin(2848602), text/x-python(5714), bin(4021695), bin(1357798), bin(1652277), bin(1397906), bin(1485800), bin(1331975), bin(1562213), bin(3617766), bin(3044305), bin(3147079), bin(1622003), bin(1196894), bin(1544345), bin(1252355), bin(1451586), bin(2759950), bin(1388870), bin(1619341)
    Dataset updated
    May 27, 2022
    Dataset provided by
    DR-NTU (Data)
    Authors
    Mohamed Ragab; Emadeldeen Eldele
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    UCIHAR is one of the most widely used datasets for evaluating performance on time series data. It contains data from three different sensors, namely accelerometer, gyroscope, and body sensors. These sensors were used to collect data from 30 different persons. In our experiments, we treat each subject as a separate domain. Due to the large number of cross-domain combinations, we randomly selected five cross-domain scenarios,

  5. kr-vs-kp

    • openml.org
    Updated Apr 6, 2014
    Cite
    Alen Shapiro (2014). kr-vs-kp [Dataset]. https://www.openml.org/d/3
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2014
    Authors
    Alen Shapiro
    Description

    Author: Alen Shapiro
    Source: UCI
    Please cite: UCI citation policy

    1. Title: Chess End-Game -- King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7). The pawn on a7 means it is one square away from queening. It is the King+Rook's side (white) to move.

    2. Sources: (a) Database originally generated and described by Alen Shapiro. (b) Donor/Coder: Rob Holte (holte@uottawa.bitnet). The database was supplied to Holte by Peter Clark of the Turing Institute in Glasgow (pete@turing.ac.uk). (c) Date: 1 August 1989

    3. Past Usage:

       • Alen D. Shapiro (1983, 1987), "Structured Induction in Expert Systems", Addison-Wesley. This book is based on Shapiro's Ph.D. thesis (1983) at the University of Edinburgh entitled "The Role of Structured Induction in Expert Systems".

       • Stephen Muggleton (1987), "Structuring Knowledge by Asking Questions", pp.218-229 in "Progress in Machine Learning", edited by I. Bratko and Nada Lavrac, Sigma Press, Wilmslow, England SK9 5BB.

       • Robert C. Holte, Liane Acker, and Bruce W. Porter (1989), "Concept Learning and the Problem of Small Disjuncts", Proceedings of IJCAI. Also available as technical report AI89-106, Computer Sciences Department, University of Texas at Austin, Austin, Texas 78712.

    4. Relevant Information: The dataset format is described below. Note: the format of this database was modified on 2/26/90 to conform with the format of all the other databases in the UCI repository of machine learning databases.

    5. Number of Instances: 3196 total

    6. Number of Attributes: 36

    7. Attribute Summaries: Classes (2): -- White-can-win ("won") and White-cannot-win ("nowin"). I believe that White is deemed to be unable to win if the Black pawn can safely advance. Attributes: see Shapiro's book.

    8. Missing Attributes: -- none

    9. Class Distribution: In 1669 of the positions (52%), White can win. In 1527 of the positions (48%), White cannot win.

    The format for instances in this database is a sequence of 37 attribute values. Each instance is a board description for this chess endgame. The first 36 attributes describe the board. The last (37th) attribute is the classification: "won" or "nowin". There are 0 missing values. A typical board description is

    f,f,f,f,f,f,f,f,f,f,f,f,l,f,n,f,f,t,f,f,f,f,f,f,f,t,f,f,f,f,f,f,f,t,t,n,won

    The names of the features do not appear in the board descriptions. Instead, each feature corresponds to a particular position in the feature-value list. For example, the head of this list is the value for the feature "bkblk". The following is the list of features, in the order in which their values appear in the feature-value list:

    [bkblk,bknwy,bkon8,bkona,bkspr,bkxbq,bkxcr,bkxwp,blxwp,bxqsq,cntxt,dsopp,dwipd, hdchk,katri,mulch,qxmsq,r2ar8,reskd,reskr,rimmx,rkxwp,rxmsq,simpl,skach,skewr, skrxp,spcop,stlmt,thrsk,wkcti,wkna8,wknck,wkovl,wkpos,wtoeg]

    In the file, there is one instance (board position) per line.

    Num Instances: 3196; Num Attributes: 37; Num Continuous: 0 (Int 0 / Real 0); Num Discrete: 37; Missing values: 0 / 0.0%
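As a rough illustration of the positional format described above, the following sketch maps one board description onto the 36 feature names and separates the class label. The function name `parse_board` is hypothetical; the feature list and the sample line are copied from the description.

```python
# Feature names in file order, as listed in the description.
FEATURES = (
    "bkblk,bknwy,bkon8,bkona,bkspr,bkxbq,bkxcr,bkxwp,blxwp,bxqsq,"
    "cntxt,dsopp,dwipd,hdchk,katri,mulch,qxmsq,r2ar8,reskd,reskr,"
    "rimmx,rkxwp,rxmsq,simpl,skach,skewr,skrxp,spcop,stlmt,thrsk,"
    "wkcti,wkna8,wknck,wkovl,wkpos,wtoeg"
).split(",")


def parse_board(line):
    """Split one line into a {feature: value} dict and a class label."""
    values = line.strip().split(",")
    if len(values) != len(FEATURES) + 1:
        raise ValueError(f"expected {len(FEATURES) + 1} values, got {len(values)}")
    *attributes, label = values
    return dict(zip(FEATURES, attributes)), label


# The typical board description quoted in the text.
SAMPLE = "f,f,f,f,f,f,f,f,f,f,f,f,l,f,n,f,f,t,f,f,f,f,f,f,f,t,f,f,f,f,f,f,f,t,t,n,won"
board, label = parse_board(SAMPLE)
```

Here `board["bkblk"]` holds the first value of the line and `label` holds the 37th.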

  6. mushroom

    • openml.org
    Updated Apr 6, 2014
    Cite
    Jeff Schlimmer (2014). mushroom [Dataset]. https://www.openml.org/d/24
    Explore at:
    Croissant
    Dataset updated
    Apr 6, 2014
    Authors
    Jeff Schlimmer
    Description

    Author: Jeff Schlimmer
    Source: UCI - 1981
    Please cite: The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf

    Description

    This dataset describes mushrooms in terms of their physical characteristics. They are classified into: poisonous or edible.

    Source

    (a) Origin: 
    Mushroom records are drawn from The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf 
    
    (b) Donor: 
    Jeff Schlimmer (Jeffrey.Schlimmer '@' a.gp.cs.cmu.edu)
    

    Dataset description

    This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for Poisonous Oak and Ivy.

    Attributes Information

    1. cap-shape: bell=b,conical=c,convex=x,flat=f, knobbed=k,sunken=s 
    2. cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s 
    3. cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r, pink=p,purple=u,red=e,white=w,yellow=y 
    4. bruises?: bruises=t,no=f 
    5. odor: almond=a,anise=l,creosote=c,fishy=y,foul=f, musty=m,none=n,pungent=p,spicy=s 
    6. gill-attachment: attached=a,descending=d,free=f,notched=n 
    7. gill-spacing: close=c,crowded=w,distant=d 
    8. gill-size: broad=b,narrow=n 
    9. gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g, green=r,orange=o,pink=p,purple=u,red=e, white=w,yellow=y 
    10. stalk-shape: enlarging=e,tapering=t 
    11. stalk-root: bulbous=b,club=c,cup=u,equal=e, rhizomorphs=z,rooted=r,missing=? 
    12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s 
    13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s 
    14. stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y 
    15. stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y 
    16. veil-type: partial=p,universal=u 
    17. veil-color: brown=n,orange=o,white=w,yellow=y 
    18. ring-number: none=n,one=o,two=t 
    19. ring-type: cobwebby=c,evanescent=e,flaring=f,large=l, none=n,pendant=p,sheathing=s,zone=z 
    20. spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r, orange=o,purple=u,white=w,yellow=y 
    21. population: abundant=a,clustered=c,numerous=n, scattered=s,several=v,solitary=y 
    22. habitat: grasses=g,leaves=l,meadows=m,paths=p, urban=u,waste=w,woods=d
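Because every attribute is stored as a single-letter code, a small lookup table makes records easier to read. The sketch below covers only four of the 22 attributes, with codes copied from the list above; the `decode` helper and the example record are hypothetical.

```python
# Code tables for a subset of attributes, copied from the list above.
ATTRIBUTE_CODES = {
    "cap-shape": {"b": "bell", "c": "conical", "x": "convex",
                  "f": "flat", "k": "knobbed", "s": "sunken"},
    "odor": {"a": "almond", "l": "anise", "c": "creosote", "y": "fishy",
             "f": "foul", "m": "musty", "n": "none", "p": "pungent",
             "s": "spicy"},
    "gill-size": {"b": "broad", "n": "narrow"},
    "ring-number": {"n": "none", "o": "one", "t": "two"},
}


def decode(record):
    """Replace known single-letter codes with readable names.

    Attributes or codes without a table entry are passed through unchanged.
    """
    return {
        name: ATTRIBUTE_CODES.get(name, {}).get(code, code)
        for name, code in record.items()
    }


# Hypothetical record for illustration only.
example = {"cap-shape": "x", "odor": "p", "gill-size": "n",
           "ring-number": "o", "stalk-root": "?"}
decoded = decode(example)
```

Note that the same letter can mean different things in different attributes (for example `n` is "none" for odor but "narrow" for gill-size), which is why the tables are keyed by attribute name.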
    

    Relevant papers

    Schlimmer, J.S. (1987). Concept Acquisition Through Representational Adjustment (Technical Report 87-19). Doctoral dissertation, Department of Information and Computer Science, University of California, Irvine.

    Iba, W., Wogulis, J., & Langley, P. (1988). Trading off Simplicity and Coverage in Incremental Concept Learning. In Proceedings of the 5th International Conference on Machine Learning, 73-79. Ann Arbor, Michigan: Morgan Kaufmann.

    Duch W, Adamczak R, Grabczewski K (1996) Extraction of logical rules from training data using backpropagation networks, in: Proc. of the 1st Online Workshop on Soft Computing, 19-30.Aug.1996, pp. 25-30.

    Duch W, Adamczak R, Grabczewski K, Ishikawa M, Ueda H, Extraction of crisp logical rules using constrained backpropagation networks - comparison of two new approaches, in: Proc. of the European Symposium on Artificial Neural Networks (ESANN'97), Bruges, Belgium 16-18.4.1997.

  7. Data from: breast-cancer-wisconsin

    • huggingface.co
    Updated May 26, 2025
    Cite
    scikit-learn (2025). breast-cancer-wisconsin [Dataset]. https://huggingface.co/datasets/scikit-learn/breast-cancer-wisconsin
    Explore at:
    Croissant
    Dataset updated
    May 26, 2025
    Dataset authored and provided by
    scikit-learn
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Breast Cancer Wisconsin Diagnostic Dataset

    The following description was retrieved from the breast cancer dataset on the UCI Machine Learning Repository. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A few of the images can be found here. The separating plane described above was obtained using Multisurface Method-Tree (MSM-T), a classification method which uses linear… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/breast-cancer-wisconsin.

  8. madelon

    • openml.org
    Updated May 22, 2015
    Cite
    (2015). madelon [Dataset]. https://www.openml.org/d/1485
    Explore at:
    Croissant
    Dataset updated
    May 22, 2015
    Description

    Author: Isabelle Guyon
    Source: UCI
    Please cite: Isabelle Guyon, Steve R. Gunn, Asa Ben-Hur, Gideon Dror, 2004. Result analysis of the NIPS 2003 feature selection challenge.

    Abstract:

    MADELON is an artificial dataset, which was part of the NIPS 2003 feature selection challenge. This is a two-class classification problem with continuous input variables. The difficulty is that the problem is multivariate and highly non-linear.

    Source:

    Isabelle Guyon, Clopinet, 955 Creston Road, Berkeley, CA 90708, isabelle '@' clopinet.com

    Data Set Information:

    MADELON is an artificial dataset containing data points grouped in 32 clusters placed on the vertices of a five-dimensional hypercube and randomly labeled +1 or -1. The five dimensions constitute 5 informative features. 15 linear combinations of those features were added to form a set of 20 (redundant) informative features. Based on those 20 features, one must separate the examples into the 2 classes (corresponding to the ±1 labels). A number of distractor features called 'probes', which have no predictive power, were added. The order of the features and patterns was randomized.
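The construction described above can be sketched in a few lines. This is a toy generator written under stated assumptions, not the original challenge code: the function name, the cluster size, the noise level, the number of probes, and the distributions used for the weights and probes are all hypothetical choices.

```python
import itertools
import random


def make_madelon_like(n_per_cluster=10, n_probes=5, noise=0.1, seed=0):
    """Generate a small MADELON-style dataset (illustrative only)."""
    rng = random.Random(seed)
    # The 2**5 = 32 vertices of a five-dimensional hypercube.
    vertices = list(itertools.product((-1.0, 1.0), repeat=5))
    # Each cluster gets a random label of +1 or -1.
    cluster_labels = [rng.choice((-1, 1)) for _ in vertices]
    # 15 random linear combinations of the 5 informative features.
    weights = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(15)]

    rows, labels = [], []
    for vertex, label in zip(vertices, cluster_labels):
        for _ in range(n_per_cluster):
            informative = [v + rng.gauss(0, noise) for v in vertex]
            redundant = [sum(w * x for w, x in zip(ws, informative))
                         for ws in weights]
            probes = [rng.gauss(0, 1) for _ in range(n_probes)]  # no signal
            rows.append(informative + redundant + probes)
            labels.append(label)

    # Randomize the order of the features and of the patterns.
    column_order = list(range(20 + n_probes))
    rng.shuffle(column_order)
    row_order = list(range(len(rows)))
    rng.shuffle(row_order)
    X = [[rows[i][j] for j in column_order] for i in row_order]
    y = [labels[i] for i in row_order]
    return X, y


X, y = make_madelon_like(n_per_cluster=2, n_probes=3)
```

With these arguments the result has 64 rows (32 clusters of 2 points) and 23 columns (5 informative, 15 redundant, 3 probes).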

    This dataset is one of five datasets used in the NIPS 2003 feature selection challenge. The original data was split into training, validation, and test sets. Target values are provided only for the first two sets (not for the test set). So, this dataset version contains all the examples from the training and validation partitions.

    There is no attribute information provided to avoid biasing the feature selection process.

    Relevant Papers:

    The best challenge entrants wrote papers collected in the book: Isabelle Guyon, Steve Gunn, Masoud Nikravesh, Lotfi Zadeh (Eds.), Feature Extraction, Foundations and Applications. Studies in Fuzziness and Soft Computing. Physica-Verlag, Springer.

    Isabelle Guyon, et al, 2007. Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark. Pattern Recognition Letters 28 (2007) 1438–1444.

    Isabelle Guyon, et al. 2006. Feature selection with the CLOP package. Technical Report.

  9. PhishingWebsites

    • openml.org
    Updated Feb 16, 2016
    Cite
    Rami Mustafa A Mohammad (University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com); Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk); Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae) (2016). PhishingWebsites [Dataset]. https://www.openml.org/d/4534
    Explore at:
    Croissant
    Dataset updated
    Feb 16, 2016
    Authors
    Rami Mustafa A Mohammad (University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com); Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk); Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae)
    Description

    Author: Rami Mustafa A Mohammad (University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com); Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk); Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae)
    Source: UCI
    Please cite: Please refer to the Machine Learning Repository's citation policy

    Source:

    Rami Mustafa A Mohammad (University of Huddersfield; rami.mohammad '@' hud.ac.uk; rami.mustafa.a '@' gmail.com); Lee McCluskey (University of Huddersfield; t.l.mccluskey '@' hud.ac.uk); Fadi Thabtah (Canadian University of Dubai; fadi '@' cud.ac.ae)

    Data Set Information:

    One of the challenges faced by our research was the unavailability of reliable training datasets. In fact, this challenge faces any researcher in the field. Although plenty of articles about predicting phishing websites have been disseminated these days, no reliable training dataset has been published publicly, maybe because there is no agreement in the literature on the definitive features that characterize phishing webpages; hence it is difficult to shape a dataset that covers all possible features. In this dataset, we shed light on the important features that have proved to be sound and effective in predicting phishing websites. In addition, we propose some new features.

    Attribute Information:

    For further information about the features, see the features file in the data folder of UCI.

    Relevant Papers:

    Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi (2012) An Assessment of Features Related to Phishing Websites using an Automated Technique. In: International Conference For Internet Technology And Secured Transactions. ICITST 2012. IEEE, London, UK, pp. 492-497. ISBN 978-1-4673-5325-0

    Mohammad, Rami, Thabtah, Fadi Abdeljaber and McCluskey, T.L. (2014) Predicting phishing websites based on self-structuring neural network. Neural Computing and Applications, 25 (2). pp. 443-458. ISSN 0941-0643

    Mohammad, Rami, McCluskey, T.L. and Thabtah, Fadi Abdeljaber (2014) Intelligent Rule based Phishing Websites Classification. IET Information Security, 8 (3). pp. 153-160. ISSN 1751-8709

    Citation Request:

    Please refer to the Machine Learning Repository's citation policy

  10. UCI Mushroom Dataset

    • kaggle.com
    zip
    Updated Mar 25, 2022
    + more versions
    Cite
    Rini Christy (2022). UCI Mushroom Dataset [Dataset]. https://www.kaggle.com/datasets/rinichristy/uci-mushroom-dataset
    Explore at:
    Available download formats: zip (35083 bytes)
    Dataset updated
    Mar 25, 2022
    Authors
    Rini Christy
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Data Set Information:

    This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one.

  11. Online Shoppers Intention UCI Machine Learning

    • kaggle.com
    zip
    Updated Jan 15, 2020
    Cite
    Henry Sue (2020). Online Shoppers Intention UCI Machine Learning [Dataset]. https://www.kaggle.com/datasets/henrysue/online-shoppers-intention
    Explore at:
    Available download formats: zip (258305 bytes)
    Dataset updated
    Jan 15, 2020
    Authors
    Henry Sue
    Description

    Data Set Information:

    The dataset consists of feature vectors belonging to 12,330 sessions. The dataset was formed so that each session would belong to a different user in a 1-year period to avoid any tendency to a specific campaign, special day, user profile, or period.

    Dataset Origin:

    https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset

    Source:

    1. C. Okan Sakar Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Bahcesehir University, 34349 Besiktas, Istanbul, Turkey

    2. Yomi Kastro Inveon Information Technologies Consultancy and Trade, 34335 Istanbul, Turkey

    Relevant Papers:

    Sakar, C.O., Polat, S.O., Katircioglu, M. et al. Neural Comput & Applic (2018). [Web Link]

    Citation Request:

    If you use this dataset, please cite: Sakar, C.O., Polat, S.O., Katircioglu, M. et al. Neural Comput & Applic (2018). [Web Link]

    Cover Photo:

    Photo by Bruno Kelzer on Unsplash

    Dataset downloaded from UCI Machine Learning Repository.

    Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

  12. Annotated Benchmark of Real-World Data for Approximate Functional Dependency...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 1, 2023
    Cite
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren (2023). Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery [Dataset]. http://doi.org/10.5281/zenodo.8098909
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery

    This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.

    The file ground_truth.csv is a comma-separated file containing approximate functional dependencies. The table column names the relation we refer to; lhs and rhs reference two columns of that relation where, semantically, we found that lhs implies rhs.
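A minimal sketch of how such a file might be read and used follows. It assumes that ground_truth.csv has a header row with the columns table, lhs, and rhs; the second sample row, the airport rows, and both helper names are hypothetical. The check shown is an exact functional dependency test, which is stricter than the approximate notion the benchmark targets.

```python
import csv
import io
from collections import defaultdict

# First data row is the example from the description; the second is made up.
SAMPLE_GROUND_TRUTH = """table,lhs,rhs
claims,AirportCode,AirportName
demo,zip_code,city
"""


def load_dependencies(handle):
    """Group (lhs, rhs) pairs by the relation they belong to."""
    deps = defaultdict(list)
    for row in csv.DictReader(handle):
        deps[row["table"]].append((row["lhs"], row["rhs"]))
    return dict(deps)


def holds_exactly(rows, lhs, rhs):
    """Return True if every lhs value maps to a single rhs value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


deps = load_dependencies(io.StringIO(SAMPLE_GROUND_TRUTH))

# Hypothetical tuples standing in for rows of claims.csv.
clean_rows = [
    {"AirportCode": "AAA", "AirportName": "Alpha Field"},
    {"AirportCode": "AAA", "AirportName": "Alpha Field"},
    {"AirportCode": "BBB", "AirportName": "Beta Field"},
]
dirty_rows = clean_rows + [{"AirportCode": "AAA", "AirportName": "Alpha Fld"}]
```

On real-world data a dependency like this typically holds only for most tuples, which is why the collection speaks of approximate functional dependencies.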

    The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.

    Dataset References

  13. iris

    • huggingface.co
    Updated Sep 23, 2022
    + more versions
    Cite
    scikit-learn (2022). iris [Dataset]. https://huggingface.co/datasets/scikit-learn/iris
    Explore at:
    Croissant
    Dataset updated
    Sep 23, 2022
    Dataset authored and provided by
    scikit-learn
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Iris Species Dataset

    The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository. It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other. The dataset is taken from UCI Machine Learning Repository's… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/iris.

  14. sms_spam

    • huggingface.co
    Updated Aug 28, 2023
    Cite
    UC Irvine (2023). sms_spam [Dataset]. https://huggingface.co/datasets/ucirvine/sms_spam
    Explore at:
    Croissant
    Dataset updated
    Aug 28, 2023
    Dataset authored and provided by
    UC Irvine
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for [Dataset Name]

      Dataset Summary
    

    The SMS Spam Collection v.1 is a public set of labeled SMS messages that have been collected for mobile phone spam research. It has one collection composed of 5,574 real, non-encoded English messages, tagged as either legitimate (ham) or spam.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    English

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    [More Information… See the full description on the dataset page: https://huggingface.co/datasets/ucirvine/sms_spam.

  15. processed.cleveland.data.csv

    • figshare.com
    txt
    Updated Aug 1, 2022
    Cite
    Ramkumar R P; Sanjeeva Polepaka; Karuna G; Ch Mallikarjuna Rao (2022). processed.cleveland.data.csv [Dataset]. http://doi.org/10.6084/m9.figshare.20410665.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ramkumar R P; Sanjeeva Polepaka; Karuna G; Ch Mallikarjuna Rao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Cleveland
    Description

    Heart Disease Dataset from UCI Repository

  16. Image_fusion Dataset

    • universe.roboflow.com
    zip
    Updated Nov 28, 2023
    Cite
    UCI (2023). Image_fusion Dataset [Dataset]. https://universe.roboflow.com/uci-1rr6x/image_fusion-vfqr2/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 28, 2023
    Dataset authored and provided by
    UCI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Bubble Dh5F Polygons
    Description

    Image_Fusion

    ## Overview
    
    Image_Fusion is a dataset for instance segmentation tasks - it contains Bubble Dh5F annotations for 647 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
    
  17. isolet

    • openml.org
    Updated Aug 20, 2014
    Cite
    Ron Cole; Mark Fanty (2014). isolet [Dataset]. https://www.openml.org/d/300
    Explore at:
    Croissant
    Dataset updated
    Aug 20, 2014
    Authors
    Ron Cole; Mark Fanty
    Description

    Author: Ron Cole and Mark Fanty (cole@cse.ogi.edu, fanty@cse.ogi.edu)
    Donor: Tom Dietterich (tgd@cs.orst.edu)
    Source: UCI
    Please cite: UCI

    Description

    The ISOLET (Isolated Letter Speech Recognition) dataset was generated as follows: 150 subjects spoke the name of each letter of the alphabet twice. Hence, there are 52 training examples from each speaker. The speakers are grouped into sets of 30 speakers each; 4 groups can serve as the training set and the last group as the test set. A total of 3 examples are missing; the authors dropped them due to difficulties in recording.
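The counts in the description above can be checked with a short calculation. The 26-letter alphabet is an assumption consistent with the stated 52 examples per speaker; the description does not say which groups the 3 missing examples come from, so the split sizes below are nominal.

```python
SUBJECTS = 150
LETTERS = 26        # assumed: English alphabet, matching 52 = 26 * 2
REPETITIONS = 2
GROUP_SIZE = 30
TRAIN_GROUPS = 4
MISSING = 3

examples_per_speaker = LETTERS * REPETITIONS            # 52
n_groups = SUBJECTS // GROUP_SIZE                       # 5
nominal_total = SUBJECTS * examples_per_speaker         # 7800
actual_total = nominal_total - MISSING                  # 7797

# Nominal split sizes before the missing examples are removed.
nominal_train = TRAIN_GROUPS * GROUP_SIZE * examples_per_speaker
nominal_test = (n_groups - TRAIN_GROUPS) * GROUP_SIZE * examples_per_speaker

print(examples_per_speaker, n_groups, actual_total, nominal_train, nominal_test)
```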

    This is a good domain for a noisy, perceptual task. It is also a very good domain for testing the scaling abilities of algorithms. For example, C4.5 on this domain is slower than backpropagation!

    Source

    • Creators: Ron Cole and Mark Fanty Department of Computer Science and Engineering, Oregon Graduate Institute, Beaverton, OR 97006. cole '@' cse.ogi.edu, fanty '@' cse.ogi.edu

    • Donor: Tom Dietterich Department of Computer Science Oregon State University, Corvallis, OR 97331 tgd '@' cs.orst.edu

    Attributes Information

    All attributes are continuous, real-valued attributes scaled into the range -1.0 to 1.0. The features are described in the paper by Cole and Fanty cited below. The features include spectral coefficients; contour features, sonorant features, pre-sonorant features, and post-sonorant features. The exact order of appearance of the features is not known.

    Relevant papers

    Fanty, M., Cole, R. (1991). Spoken letter recognition.
    In Lippman, R. P., Moody, J., and Touretzky, D. S. (Eds). Advances in Neural Information Processing Systems 3. San Mateo, CA: Morgan Kaufmann.

    Dietterich, T. G., Bakiri, G. (1991) Error-correcting output codes: A general method for improving multiclass inductive learning programs.
    Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), Anaheim, CA: AAAI Press.

    Dietterich, T. G., Bakiri, G. (1994) Solving Multiclass Learning Problems via Error-Correcting Output Codes.

  18. phishing.arff

    • figshare.com
    txt
    Updated Jul 10, 2024
    Cite
    Ambroise Odonnat (2024). phishing.arff [Dataset]. http://doi.org/10.6084/m9.figshare.26232710.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Jul 10, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ambroise Odonnat
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains the data from the Phishing Website dataset provided in [1]. All features are categorical and have been encoded as integer values. The data can be downloaded from https://archive.ics.uci.edu/dataset/327/phishing+websites. There are 11055 samples with 30 features. The websites fall into 2 domains: those that use an IP address in place of the domain name in the URL, and those that use the domain name in the URL.

    Reference:
    [1] R. Mohammad, F. Thabtah, L. McCluskey. An assessment of features related to phishing websites using an automated technique. In International Conference for Internet Technology and Secured Transactions, 2012.
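
    As a rough illustration of how integer-coded ARFF data could be read and split into the two domains, here is a minimal sketch using only the Python standard library. The three-row sample, the attribute names (`having_IP_Address`, `URL_Length`, `Result`) and the parser itself are assumptions made for this example, not the contents of the actual file, and which coded value corresponds to which domain should be checked against the dataset documentation.

```python
from collections import Counter

# Tiny invented sample in ARFF syntax; the real file has 11055 rows.
# Attribute names are assumptions for illustration only.
SAMPLE_ARFF = """\
@relation phishing_sample
@attribute having_IP_Address {-1,1}
@attribute URL_Length {1,0,-1}
@attribute Result {-1,1}
@data
-1,1,-1
1,1,-1
1,0,1
"""


def parse_arff(text):
    """Parse a dense ARFF file whose values are all integer codes.

    This does not handle quoted names, sparse rows, or missing values.
    """
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([int(v) for v in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])      # second token is the name
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


names, rows = parse_arff(SAMPLE_ARFF)

# Count samples per domain using the (assumed) IP-address indicator column.
ip_col = names.index("having_IP_Address")
domain_counts = Counter(row[ip_col] for row in rows)
print(names)
print(dict(domain_counts))
```

    With the real file, the same split would give the two domain subsets, provided the indicator column carries that name.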

  19. Allegheny County Parcel Boundaries

    • catalog.data.gov
    • data.wprdc.org
    • +1 more
    Updated May 14, 2023
    + more versions
    Cite
    Allegheny County (2023). Allegheny County Parcel Boundaries [Dataset]. https://catalog.data.gov/dataset/allegheny-county-parcel-boundaries
    Explore at:
    Dataset updated
    May 14, 2023
    Dataset provided by
    Allegheny County
    Area covered
    Allegheny County
    Description

    This dataset contains boundaries of individual parcels in Allegheny County, including the county block and lot number. As this is a very large dataset, you may wish to use our property information extractor (http://tools.wprdc.org/parcels-n-at/) to download filtered versions of this parcel dataset. The most authoritative source for this data is now the PASDA page (https://www.pasda.psu.edu/uci/DataSummary.aspx?dataset=1214), which includes links to historical versions of the shapefile representations of this data.

  20. Eyetoethnicity Dataset

    • universe.roboflow.com
    zip
    Updated Apr 1, 2024
    Cite
    UCI (2024). Eyetoethnicity Dataset [Dataset]. https://universe.roboflow.com/uci-zh2rr/eyetoethnicity/model/1
    Explore at:
    zip (available download formats)
    Dataset updated
    Apr 1, 2024
    Dataset authored and provided by
    UCI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Eye
    Description

    EyeToEthnicity

    ## Overview
    
    EyeToEthnicity is a dataset for classification tasks - it contains Eye annotations for 9,976 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
Cite
Md Waquar Azam (2022). UCI-dataset [Dataset]. https://www.kaggle.com/datasets/mdwaquarazam/ucidatasetlist

UCI-dataset

UCI dataset name, link, and year of publishing through 2021

Explore at:
zip(20774 bytes)Available download formats
Dataset updated
Aug 17, 2022
Authors
Md Waquar Azam
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

This dataset lists the datasets provided by the UCI Machine Learning Repository. If you are a learner looking for data by year, category, profession, or some other criterion, you can search for it here.

The dataset has 8 columns that give the details of each entry: link, Data-Name, data type, default task, attribute-type, instances, attributes, and year.

Some values are missing.

You can analyse the data as your requirements dictate.
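
A search over the eight fields listed above could look something like the sketch below. The three sample rows, the `example.org` links, and the helper names `to_int` and `find_datasets` are invented for illustration. Only the column names come from the description, and their exact spelling in the real file is an assumption.

```python
import csv
import io

# Invented sample rows; only the header follows the description above.
SAMPLE_CSV = """\
link,Data-Name,data type,default task,attribute-type,instances,attributes,year
https://example.org/a,Example A,Multivariate,Classification,Real,150,4,1988
https://example.org/b,Example B,Multivariate,Regression,,4177,8,1995
https://example.org/c,Example C,Text,Classification,Integer,,,
"""


def to_int(value):
    """Return an int for a purely numeric field, or None if missing or malformed."""
    value = (value or "").strip()
    return int(value) if value.isdigit() else None


def find_datasets(rows, task=None, min_year=None):
    """Return the names of the datasets that match the given criteria."""
    matches = []
    for row in rows:
        year = to_int(row["year"])
        if task is not None and row["default task"].strip() != task:
            continue
        # Rows with a missing year cannot satisfy a year filter.
        if min_year is not None and (year is None or year < min_year):
            continue
        matches.append(row["Data-Name"])
    return matches


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
classification = find_datasets(rows, task="Classification")
recent = find_datasets(rows, min_year=1990)
print(classification)
print(recent)
```

Missing values are treated as non-matching rather than raising an error, since the description notes that some values are absent.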

EDA
