24 datasets found
  1. UCI datasets

    • ieee-dataport.org
    Updated May 14, 2025
    Cite
    Yuan Sun (2025). UCI datasets [Dataset]. https://ieee-dataport.org/documents/uci-datasets
    Dataset updated
    May 14, 2025
    Authors
    Yuan Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    biology

  2. UCI Machine Learning Repository Dataset

    • paperswithcode.com
    Cite
    Jan N. van Rijn; Jonathan K. Vis, UCI Machine Learning Repository Dataset [Dataset]. https://paperswithcode.com/dataset/uci-machine-learning-repository
    Authors
    Jan N. van Rijn; Jonathan K. Vis
    Description

    UCI Machine Learning Repository is a collection of over 550 datasets.

  3. UCI Machine Learning Datasets 12/2013

    • academictorrents.com
    bittorrent
    Updated Dec 20, 2013
    Cite
    UCI (2013). UCI Machine Learning Datasets 12/2013 [Dataset]. https://academictorrents.com/details/7fafb101f9c7961f9b840daeb4af43039107ddef
    Available download formats: bittorrent (16365432846)
    Dataset updated
    Dec 20, 2013
    Dataset authored and provided by
    UCI
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged. Many people deserve thanks for making the repository a success. Foremost among them are the d

  4. UCI dataset

    • springernature.figshare.com
    bin
    Updated Mar 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen (2023). UCI dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20496258.v1
    Available download formats: bin
    Dataset updated
    Mar 13, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wan-Ting Hsieh; Sergio González Vázquez; Trista Chen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is the Cuff-Less Blood Pressure Estimation Dataset [2] from the UCI Machine Learning Repository. It is a subset of the MIMIC-II Waveform Dataset and contains 12,000 records of simultaneous PPG and ABP signals from 942 patients, sampled at 125 Hz. The 12,000 records were split uniformly into four parts of 3,000 records each. Because subject information is lacking, a hold-one-out strategy was used to generate the training, validation, and test sets after preprocessing. In the end, the UCI dataset yielded 291,078 segments, about 404 hours of recording, making it by far the largest of the datasets compared, with a considerably higher ratio of continuous segments per record (32.15).

    [2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.
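    As a rough consistency check on the figures above: if each segment is a fixed 5-second window (an assumption inferred from 291,078 segments being about 404 hours, not stated in the source), a 125 Hz record splits into 625-sample windows. The helper names below are hypothetical; this is a sketch, not the preprocessing used by the dataset authors.

```python
FS_HZ = 125          # sampling rate stated in the description
SEGMENT_SECONDS = 5  # ASSUMPTION: inferred from 291,078 segments ~ 404 h; not stated in the source


def segment_record(samples, fs_hz=FS_HZ, seconds=SEGMENT_SECONDS):
    """Split one record into non-overlapping fixed-length windows, dropping any incomplete tail."""
    win = fs_hz * seconds  # 625 samples per 5-second window at 125 Hz
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, win)]


def total_hours(n_segments, seconds=SEGMENT_SECONDS):
    """Total recording time covered by n_segments windows, in hours."""
    return n_segments * seconds / 3600


print(round(total_hours(291_078), 1))  # 404.3
```

    Non-overlapping windows are the simplest choice; overlapping windows would yield more segments per record and break the hours arithmetic above.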

  5. UCI Heart Disease Data Set

    • kaggle.com
    Updated Jan 1, 2021
    Cite
    Lourens Walters (2021). UCI Heart Disease Data Set [Dataset]. https://www.kaggle.com/lourenswalters/uci-heart-disease-data-set/metadata
    Dataset updated
    Jan 1, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lourens Walters
    Description

    Context

    The dataset used can be found on the UCI Machine Learning Repository at the following location:

    Heart Disease Dataset

    There are several copies of this dataset to be found on Kaggle, with people focusing on different types of analyses of the data. This specific copy can be analysed by anyone interested, but is primarily used by a study group from the Udacity Bertelsmann Technology Scholarship to practice analysis of association between variables as well as implementation and comparison of various Machine Learning models.

    Content

    According to Detrano et al. (1989), as cited on the UCI dataset webpage, the data were collected from 303 patients referred for coronary angiography at the Cleveland Clinic between May 1981 and September 1984. The 13 independent (feature) variables can be divided into three groups:

    Routine evaluation (based on historical data):

    • ECG at rest
    • Serum Cholesterol
    • Fasting blood sugar

    Non-invasive test data (informed consent obtained for data as part of research protocol):

    • Exercise ECG
      • ST-segment peak slope (upsloping, flat or downsloping)
      • ST-segment depression
    • Exercise thallium scintigraphy (fixed, reversible or none)
    • Cardiac fluoroscopy (number of vessels appeared to contain calcium)

    Other demographic and clinical variables (based on routine data):

    • Age
    • Sex
    • Chest pain type
    • Systolic blood pressure
    • ST-T-wave abnormality (T-wave abnormality)
    • Probable or definite left ventricular hypertrophy (Estes' criteria)

    The dependent (response) variable is the angiographic test result indicating >50% diameter narrowing.
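    The UCI repository distributes the Cleveland data as a 14-column "processed.cleveland.data" file. The sketch below assumes that file's column order and its use of "?" for missing values; the column names and the three sample rows are illustrative assumptions, not taken from this Kaggle copy. It binarizes the target so that any value above 0 counts as >50% narrowing.

```python
import csv
import io

# ASSUMPTION: column order of the UCI "processed.cleveland.data" file (13 features + target "num").
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def parse_cleveland(text):
    """Parse comma-separated rows; '?' marks a missing value and becomes None."""
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        if not raw:
            continue  # skip blank lines
        values = [None if v.strip() == "?" else float(v) for v in raw]
        record = dict(zip(COLUMNS, values))
        # Binary target: num > 0 is treated as >50% diameter narrowing (disease present).
        record["disease"] = int(record["num"] > 0)
        rows.append(record)
    return rows


# Illustrative rows in the assumed format (the last one has a missing "thal" value).
SAMPLE = """63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0
67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2
57.0,1.0,4.0,130.0,131.0,0.0,0.0,115.0,1.0,1.2,2.0,1.0,?,3
"""

rows = parse_cleveland(SAMPLE)
print([r["disease"] for r in rows])  # [0, 1, 1]
```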

    Data Dictionary

    Data dictionary (image): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3632459%2Fa01747fb0158dc51c12bc0824c9c4ae4%2Fdata_dictionary2.png?generation=1609522473018549&alt=media

    Acknowledgements

    UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. Donor:

    David W. Aha (aha '@' ics.uci.edu) (714) 856-8779

    Inspiration

    The objective of the analysis is to use statistical learning to identify factors associated with coronary artery disease, as indicated by coronary angiography interpreted by a cardiologist (as in Detrano et al., cited above).

  6. Occupancy Detection Data Set UCI

    • kaggle.com
    Updated Aug 21, 2019
    Cite
    robmarkcole (2019). Occupancy Detection Data Set UCI [Dataset]. https://www.kaggle.com/robmarkcole/occupancy-detection-data-set-uci/kernels
    Dataset updated
    Aug 21, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    robmarkcole
    Description

    Context

    This is the UCI Occupancy Detection Data Set, as used in the article "how-to-predict-room-occupancy-based-on-environmental-factors".

    Content

    "no","date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"

    Acknowledgements

    UC Irvine Machine Learning Repository
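    A minimal standard-library loader for a CSV with the header listed under Content. The timestamp format and the sample row are assumptions for illustration, not values taken from the dataset.

```python
import csv
import io
from datetime import datetime

NUMERIC = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # ASSUMPTION: timestamp layout of the "date" column


def load_occupancy(text):
    """Parse rows into dicts with typed values; Occupancy becomes an int label (0 or 1)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        rec = {"no": row["no"], "date": datetime.strptime(row["date"], DATE_FORMAT)}
        for col in NUMERIC:
            rec[col] = float(row[col])
        rec["Occupancy"] = int(row["Occupancy"])
        records.append(rec)
    return records


# Hypothetical sample row using the header from the description.
SAMPLE = '''"no","date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
"1","2015-02-04 17:51:00",23.18,27.272,426,721.25,0.00479298817650529,1
'''

recs = load_occupancy(SAMPLE)
print(recs[0]["Occupancy"], recs[0]["Light"])  # 1 426.0
```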

  7. UCI datasets

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 4, 2023
    Cite
    Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi (2023). UCI datasets [Dataset]. http://doi.org/10.5281/zenodo.7681792
    Available download formats: zip
    Dataset updated
    Apr 4, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mathias Drton; Stephan Haug; David Reifferscheidt; Oleksandr Zadorozhnyi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    A collection of two datasets from the UCI website that can be used for structure learning tasks. It includes datasets on:

    • Air Quality
    • US census 1990

    Size: two datasets of 9471 × 17 and 2458285 × 68 (rows × columns), respectively

    Number of features: 15-68

    Ground truth: No

    Type of Graph: No ground truth

    More information about the datasets is contained in the dataset_description.html files.

  8. Combined Cycle Power Plant Data Set

    • ieee-dataport.org
    Updated Dec 17, 2020
    Cite
    Arijit Goswami (2020). Combined Cycle Power Plant Data Set [Dataset]. https://ieee-dataport.org/documents/combined-cycle-power-plant-data-set-university-california-irvine
    Dataset updated
    Dec 17, 2020
    Authors
    Arijit Goswami
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Irvine
    Description

    Irvine.

  9. ‘Classifying wine varieties’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 1, 2001
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2001). ‘Classifying wine varieties’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-classifying-wine-varieties-72c7/latest
    Dataset updated
    Feb 1, 2001
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Classifying wine varieties’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/brynja/wineuci on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Wine recognition dataset from UC Irvine. Great for testing out different classifiers.

    Labels: "name" - Number denoting a specific wine class

    Number of instances of each wine class:

    • Class 1 - 59
    • Class 2 - 71
    • Class 3 - 48

    Features:

    1. Alcohol
    2. Malic acid
    3. Ash
    4. Alcalinity of ash
    5. Magnesium
    6. Total phenols
    7. Flavanoids
    8. Nonflavanoid phenols
    9. Proanthocyanins
    10. Color intensity
    11. Hue
    12. OD280/OD315 of diluted wines
    13. Proline

    Content

    "This data set is the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines"

    Acknowledgements

    Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

    @misc{Lichman:2013,
      author = "M. Lichman",
      year = "2013",
      title = "{UCI} Machine Learning Repository",
      url = "http://archive.ics.uci.edu/ml",
      institution = "University of California, Irvine, School of Information and Computer Sciences"
    }

    UC Irvine data base: "https://archive.ics.uci.edu/ml/machine-learning-databases/wine"

    Sources: (a) Forina, M. et al., PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy. (b) Stefan Aeberhard, email: stefan@coral.cs.jcu.edu.au (c) July 1991

    Past Usage: (1) S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Technometrics).

    The data were used, along with many other datasets, to compare various classifiers. The classes are separable, though only RDA achieved 100% correct classification (RDA: 100%, QDA: 99.4%, LDA: 98.9%, 1NN: 96.1% on z-transformed data; all results use the leave-one-out technique).
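    The leave-one-out 1NN evaluation above can be sketched in a few lines: z-transform every feature, then classify each sample by its nearest neighbour among the remaining samples. This is a generic illustration on toy data, not the original authors' code; for simplicity the z-transform is computed on all samples, including the held-out one.

```python
import math


def zscore_columns(X):
    """Standardize each feature column to mean 0 and (population) standard deviation 1."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Guard against constant columns: a zero std would divide by zero, so use 1.0 instead.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on z-scored features."""
    Z = zscore_columns(X)
    correct = 0
    for i, zi in enumerate(Z):
        # Nearest neighbour among all *other* samples (Euclidean distance).
        nearest = min((j for j in range(len(Z)) if j != i), key=lambda j: math.dist(zi, Z[j]))
        correct += y[nearest] == y[i]
    return correct / len(Z)


# Toy data with two features on very different scales (hypothetical values).
X = [[1.0, 100.0], [1.2, 110.0], [5.0, 500.0], [5.1, 520.0]]
y = [1, 1, 2, 2]
print(loo_1nn_accuracy(X, y))  # 1.0
```

    Without the z-transform, the feature with the largest numeric range (in the wine data, Proline) would dominate the Euclidean distance.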

    (2) S. Aeberhard, D. Coomans and O. de Vel, "THE CLASSIFICATION PERFORMANCE OF RDA" Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Journal of Chemometrics).

    Inspiration

    This data set is great for drawing comparisons between algorithms and for testing classification models when learning new techniques.

    --- Original source retains full ownership of the source dataset ---

  10. Predict student's level

    • kaggle.com
    Updated Jun 27, 2022
    Cite
    Farkhod Khojikurbonov (2022). Predict student's level [Dataset]. https://www.kaggle.com/datasets/farkhod77/predict-students-level/discussion
    Dataset updated
    Jun 27, 2022
    Dataset provided by
    Kaggle
    Authors
    Farkhod Khojikurbonov
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    User Knowledge Modeling Data Set

    Predict student's knowledge level

    • Level: Beginner
    • Recommended Use: Classification/Clustering
    • Domain: Education/Web


    This beginner-level data set has 403 rows and 6 columns. It is a real dataset about students' knowledge status on the subject of Electrical DC Machines. It is recommended for learning and practicing exploratory data analysis, data visualization, and classification and clustering techniques. Feel free to explore it with multiple supervised and unsupervised learning techniques. The following data dictionary gives more details on this data set:

    | Column Position | Attribute Name | Definition | Data Type | Example |
    | --- | --- | --- | --- | --- |
    | 1 | STG | The degree of study time for goal object materials | Quantitative | 0.060, 0.100, 0.080 |
    | 2 | SCG | The degree of repetition number of user for goal object materials | Quantitative | 0.000, 0.100, 0.250 |
    | 3 | STR | The degree of study time of user for related objects with goal object | Quantitative | 0.10, 0.15, 0.05 |
    | 4 | LPR | The exam performance of user for related objects with goal object | Quantitative | 0.98, 0.10, 0.01 |
    | 5 | PEG | The exam performance of user for goal objects | Quantitative | 0.66, 0.56, 0.33 |
    | 6 | UNS | The knowledge level of user (Very Low, Low, Middle, High) | Categorical | "High", "Middle", "Low" |
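    Since UNS is an ordered label (Very Low < Low < Middle < High), one option is to encode it as an integer for ordinal-aware models. The sketch below assumes rows arrive as dictionaries keyed by the attribute names in the table; the label-normalization rule is an assumption about how the raw labels may be spelled.

```python
# Ordered knowledge levels of the UNS target, as listed in the data dictionary.
UNS_LEVELS = ["Very Low", "Low", "Middle", "High"]
_UNS_INDEX = {label.lower(): i for i, label in enumerate(UNS_LEVELS)}

FEATURES = ["STG", "SCG", "STR", "LPR", "PEG"]


def encode_row(row):
    """Return (feature_vector, ordinal_label) for one dict-shaped row."""
    x = [float(row[name]) for name in FEATURES]
    # ASSUMPTION: labels may be spelled "very_low" or "Very Low"; normalize before lookup.
    key = str(row["UNS"]).strip().replace("_", " ").lower()
    return x, _UNS_INDEX[key]


row = {"STG": "0.06", "SCG": "0.1", "STR": "0.1", "LPR": "0.98", "PEG": "0.66", "UNS": "High"}
print(encode_row(row))  # ([0.06, 0.1, 0.1, 0.98, 0.66], 3)
```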

    Acknowledgement

    This data set has been sourced from the UC Irvine Machine Learning Repository (User Knowledge Modeling Data Set). The UCI page cites the following publication as the original source of the data: H. T. Kahraman, S. Sagiroglu, I. Colak, Developing intuitive knowledge classifier and modeling of users' domain dependent data in web, Knowledge Based Systems, vol. 37, pp. 283-295, 2013.

  11. Data from: A new hybrid ensemble model with voting-based outlier detection...

    • figshare.com
    txt
    Updated Aug 11, 2020
    Cite
    Wenyu Zhang; Dongqi Yang; Shuai Zhang (2020). A new hybrid ensemble model with voting-based outlier detection and balanced sampling for credit scoring [Dataset]. http://doi.org/10.6084/m9.figshare.12782552.v2
    Available download formats: txt
    Dataset updated
    Aug 11, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wenyu Zhang; Dongqi Yang; Shuai Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Three datasets from the UC Irvine (UCI) machine learning repository, namely the Australian, German, and Japanese credit datasets, were adopted for the current study. The Australian credit dataset contains 690 samples, of which 307 are positive and 383 are negative; its input feature dimension is 15. The German credit dataset contains 1000 samples, of which 700 are positive and 300 are negative; its input feature dimension is 21. The Japanese credit dataset contains 690 samples, of which 383 are positive and 307 are negative; its input feature dimension is 16.
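    The class counts above can be checked against the stated sample sizes, and the same numbers give each dataset's degree of imbalance. A small sketch using only the figures reported in the description:

```python
# (positives, negatives) as reported in the description above.
CREDIT_COUNTS = {
    "Australian": (307, 383),
    "German": (700, 300),
    "Japanese": (383, 307),
}


def summarize(counts):
    """Total size, positive rate, and majority-to-minority ratio for each dataset."""
    out = {}
    for name, (pos, neg) in counts.items():
        total = pos + neg
        out[name] = {
            "total": total,
            "positive_rate": pos / total,
            "majority_to_minority": max(pos, neg) / min(pos, neg),
        }
    return out


for name, stats in summarize(CREDIT_COUNTS).items():
    print(name, stats["total"], round(stats["majority_to_minority"], 2))
# Australian 690 1.25
# German 1000 2.33
# Japanese 690 1.25
```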

  12. drug-reviews

    • huggingface.co
    Cite
    Mouwiya S. A. Al-Qaisieh, drug-reviews [Dataset]. https://huggingface.co/datasets/Mouwiya/drug-reviews
    Authors
    Mouwiya S. A. Al-Qaisieh
    License

    Open Database License (ODbL): https://choosealicense.com/licenses/odbl/

    Description

    Dataset Details

      1. Dataset Loading:

    Initially, we load the Drug Review Dataset from the UC Irvine Machine Learning Repository. This dataset contains patient reviews of different drugs, along with the medical condition being treated and the patients' satisfaction ratings.

      2. Data Preprocessing:

    The dataset is preprocessed to ensure data integrity and consistency. We handle missing values and ensure that each patient ID is unique across the dataset.

      3. Text… See the full description on the dataset page: https://huggingface.co/datasets/Mouwiya/drug-reviews.
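    The preprocessing step described above (handling missing values and keeping patient IDs unique) could look like the following sketch. The field names are hypothetical, and keeping the first row per ID is just one possible de-duplication policy; the dataset page does not say which one was used.

```python
# HYPOTHETICAL field names; the actual dataset columns may differ.
REQUIRED = ("patient_id", "drug", "condition", "review", "rating")


def preprocess(records):
    """Drop rows with a missing required field, then keep the first row per patient ID."""
    seen = set()
    clean = []
    for rec in records:
        if any(rec.get(k) in (None, "") for k in REQUIRED):
            continue  # missing value: drop the row
        if rec["patient_id"] in seen:
            continue  # duplicate ID: keep only the first occurrence
        seen.add(rec["patient_id"])
        clean.append(rec)
    return clean


SAMPLE = [
    {"patient_id": 1, "drug": "A", "condition": "X", "review": "ok", "rating": 7},
    {"patient_id": 2, "drug": "B", "condition": None, "review": "meh", "rating": 5},
    {"patient_id": 1, "drug": "A", "condition": "X", "review": "dup", "rating": 7},
    {"patient_id": 3, "drug": "C", "condition": "Y", "review": "good", "rating": 9},
]
print([r["patient_id"] for r in preprocess(SAMPLE)])  # [1, 3]
```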
    
  13. ‘Travel Review Rating Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Travel Review Rating Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-travel-review-rating-dataset-d315/6c6ad6b1/?iid=003-929&v=presentation
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Travel Review Rating Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/wirachleelakiatiwong/travel-review-rating-dataset on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    This data set has been sourced from the Machine Learning Repository of the University of California, Irvine (UC Irvine): Travel Review Ratings Data Set. It is populated by capturing user ratings from Google reviews. Reviews on attractions from 24 categories across Europe are considered. Google user ratings range from 1 to 5, and the average user rating per category is calculated.

    Content

    • Attribute 1: Unique user id
    • Attribute 2: Average ratings on churches
    • Attribute 3: Average ratings on resorts
    • Attribute 4: Average ratings on beaches
    • Attribute 5: Average ratings on parks
    • Attribute 6: Average ratings on theatres
    • Attribute 7: Average ratings on museums
    • Attribute 8: Average ratings on malls
    • Attribute 9: Average ratings on zoo
    • Attribute 10: Average ratings on restaurants
    • Attribute 11: Average ratings on pubs/bars
    • Attribute 12: Average ratings on local services
    • Attribute 13: Average ratings on burger/pizza shops
    • Attribute 14: Average ratings on hotels/other lodgings
    • Attribute 15: Average ratings on juice bars
    • Attribute 16: Average ratings on art galleries
    • Attribute 17: Average ratings on dance clubs
    • Attribute 18: Average ratings on swimming pools
    • Attribute 19: Average ratings on gyms
    • Attribute 20: Average ratings on bakeries
    • Attribute 21: Average ratings on beauty & spas
    • Attribute 22: Average ratings on cafes
    • Attribute 23: Average ratings on view points
    • Attribute 24: Average ratings on monuments
    • Attribute 25: Average ratings on gardens
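    Each attribute is a per-category average of 1-to-5 Google ratings. Assuming raw (user, category, rating) triples were available, which this dataset does not itself provide, the averaging could be sketched as follows; the function name and sample triples are hypothetical.

```python
from collections import defaultdict


def average_by_category(ratings):
    """ratings: iterable of (user_id, category, stars) with stars in 1..5.
    Returns {user_id: {category: mean stars}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user, category, stars in ratings:
        if not 1 <= stars <= 5:
            raise ValueError(f"rating out of range: {stars}")
        sums[user][category] += stars
        counts[user][category] += 1
    return {u: {c: sums[u][c] / counts[u][c] for c in sums[u]} for u in sums}


raw = [("User 1", "churches", 4), ("User 1", "churches", 5), ("User 1", "parks", 3)]
print(average_by_category(raw))  # {'User 1': {'churches': 4.5, 'parks': 3.0}}
```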

    Acknowledgements

    This data set has been sourced from the Machine Learning Repository of the University of California, Irvine (UC Irvine): Travel Review Ratings Data Set

    The UCI page mentions the following publication as the original source of the data set: Renjith, Shini, A. Sreekumar, and M. Jathavedan. 2018. Evaluation of Partitioning Clustering Algorithms for Processing Social Media Data in Tourism Domain. In 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 12731. IEEE

    Inspiration

    I'm the kind of person who loves traveling, but sometimes I have problems: Where should I visit? Are there interesting places that match my lifestyle? I often spend hours searching for interesting places to go. Such a waste of time.

    What if we could build a recommender system that recommends several interesting venues based on your preferences? Using information from Google reviews, I'll try to divide Google review users into clusters of similar interests, as a first step toward building a recommender system based on their preferences.
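    The clustering idea above can be sketched with k-means (Lloyd's algorithm), a standard partitioning clustering algorithm. This is an illustrative implementation on hypothetical two-category rating vectors, not the author's code; a real analysis would use all 24 category columns and a less naive initialization than taking the first k points.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic start (first k points as initial centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical users described by [average rating on churches, average rating on beaches].
USERS = [[4.5, 1.0], [4.0, 1.5], [1.0, 4.8], [1.2, 4.5]]
labels, centroids = kmeans(USERS, k=2)
print(labels)  # [0, 0, 1, 1]
```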

    --- Original source retains full ownership of the source dataset ---

  14. Identifying Interesting Web Pages

    • kaggle.com
    Updated Sep 14, 2017
    Cite
    UCI Machine Learning (2017). Identifying Interesting Web Pages [Dataset]. https://www.kaggle.com/uciml/identifying-interesting-web-pages/discussion
    Dataset updated
    Sep 14, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    UCI Machine Learning
    Description

    Context

    The problem is to predict user ratings for web pages (within a subject category). The HTML source of each web page is given. Users looked at each web page and rated it on a 3-point scale (hot, medium, cold); 50-100 pages were rated per domain.

    Content

    This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages cover four separate subjects (Bands: recording artists; Goats; Sheep; and BioMedical).
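    Because the inputs are raw HTML pages, a typical first step is to strip markup and tokenize the visible text. The sketch below uses Python's standard-library html.parser; the tokenization rule and the integer encoding of the hot/medium/cold scale are illustrative choices, not part of the dataset.

```python
import re
from html.parser import HTMLParser

# Illustrative ordinal encoding of the 3-point rating scale.
RATING_SCALE = {"cold": 0, "medium": 1, "hot": 2}


class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_tokens(html):
    """Return lowercase alphabetic tokens from the visible text of an HTML page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.chunks).lower())


page = ("<html><head><style>p{}</style></head><body><h1>Goats</h1>"
        "<p>Dairy goat breeds</p><script>var x=1;</script></body></html>")
print(html_to_tokens(page))  # ['goats', 'dairy', 'goat', 'breeds']
```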

    Acknowledgement

    Data originally from the UCI ML Repository. Donated by:

    Michael Pazzani Department of Information and Computer Science, University of California, Irvine Irvine, CA 92697-3425 pazzani@ics.uci.edu

    Concept based Information Access with Google for Personalized Information Retrieval

  15. Data from: A novel multi-stage ensemble model with fuzzy-clustering and...

    • figshare.com
    txt
    Updated Sep 2, 2020
    Cite
    Dongqi Yang; Wenyu Zhang; Xin Wu; Jose H.Ablanedo; Wangzhi Yu (2020). A novel multi-stage ensemble model with fuzzy-clustering and optimized classifier composition for corporate bankruptcy prediction [Dataset]. http://doi.org/10.6084/m9.figshare.12103773.v2
    Available download formats: txt
    Dataset updated
    Sep 2, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Dongqi Yang; Wenyu Zhang; Xin Wu; Jose H.Ablanedo; Wangzhi Yu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this experiment, the datasets are from the UC Irvine (UCI) machine learning repository (Zięba et al., 2016), which contains real-world financial indicators of Polish manufacturing corporations from 2007 to 2011. The data are separated into five parts, one per fiscal year, covering the period from the 1st year (2007 fiscal year) to the 5th year (2011 fiscal year), which corresponds to five different bankruptcy cycles. The class labels (“0” is operating and “1” is bankruptcy) are determined by the bankruptcy status of the enterprise in 2012. Furthermore, the real-world Creator dataset, published in 2019 by the Chinese intelligent government services provider Creator Information Technology Co., Ltd, was also adopted. The Creator dataset includes company management information for 35960 Chinese companies.

  16. Data from: A novel multi-stage ensemble model with enhanced outlier...

    • figshare.com
    txt
    Updated Jun 19, 2020
    Cite
    Wenyu Zhang; Dongqi Yang; Shuai Zhang; Jose H.Ablanedo; Yu Lou (2020). A novel multi-stage ensemble model with enhanced outlier adaptation for credit scoring [Dataset]. http://doi.org/10.6084/m9.figshare.12512360.v1
    Available download formats: txt
    Dataset updated
    Jun 19, 2020
    Dataset provided by
    figshare
    Authors
    Wenyu Zhang; Dongqi Yang; Shuai Zhang; Jose H.Ablanedo; Yu Lou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nine datasets from the UC Irvine (UCI) machine learning repository, i.e., the Australian, Japanese, German (Asuncion & Newman, 2007), Taiwan (Yeh & Lien, 2009) and Polish credit datasets (Zięba et al., 2016), were adopted for the current study. The Polish credit data comprise five datasets that distinguish five classification cases depending on the forecasting period (Polish 1, Polish 2, Polish 3, Polish 4 and Polish 5). The AER credit dataset (Greene, 2003) is a credit card dataset for econometric analysis. The Creator dataset was published in 2019 by a Chinese digital government services provider named Creator Information Technology Co., Ltd[1]; it contains the property rights, financial statements, and basic company information of 35960 Chinese companies.

    [1] http://www.chinacreator.com/cn/

  17. Annotated Benchmark of Real-World Data for Approximate Functional Dependency...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 1, 2023
    Cite
    Vansummeren, Stijn (2023). Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8098908
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Parciak, Marcel
    Peeters, Liesbet M.
    Weytjens, Sebastiaan
    Vansummeren, Stijn
    Hens, Niel
    Neven, Frank
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery

    This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.

    The file ground_truth.csv is a comma-separated file containing approximate functional dependencies. Its column table names the relation, and lhs and rhs name two columns of that relation such that, semantically, lhs implies rhs.

    The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded from or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.
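    The collection relies on a g3_prime score without defining it. For orientation, the sketch below computes the classic g3 error of a dependency lhs → rhs: the smallest fraction of tuples that must be removed so that each lhs value maps to a single rhs value. It is not necessarily the g3_prime used here. Following the exclusion rule above, it ignores tuples where either attribute is missing and returns None when no tuple has both. The AirportCode/AirportName rows are hypothetical.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Classic g3 error of the FD lhs -> rhs over dict-shaped rows.

    Only rows where both attributes have a value are considered; returns None
    if there is no such row (mirroring the exclusion rule described above)."""
    present = [r for r in rows
               if r.get(lhs) not in (None, "") and r.get(rhs) not in (None, "")]
    if not present:
        return None
    # For each lhs value, count how often each rhs value occurs.
    groups = defaultdict(Counter)
    for r in present:
        groups[r[lhs]][r[rhs]] += 1
    # Keeping the most frequent rhs per lhs group is the largest FD-satisfying subset.
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return 1 - kept / len(present)


rows = [
    {"AirportCode": "AAA", "AirportName": "Alpha Intl"},
    {"AirportCode": "AAA", "AirportName": "Alpha Intl"},
    {"AirportCode": "AAA", "AirportName": "Alpha International"},
    {"AirportCode": "BBB", "AirportName": "Bravo Field"},
    {"AirportCode": "CCC", "AirportName": None},
]
print(g3_error(rows, "AirportCode", "AirportName"))  # 0.25
```

    A g3 error of 0 means the dependency holds exactly on the considered tuples; here one of four tuples (the "Alpha International" spelling) would have to be removed.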

    Dataset References

    adult.csv: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

    claims.csv: TSA Claims Data 2002 to 2006, published by the U.S. Department of Homeland Security.

    dblp10k.csv: Frequency-aware Similarity Measures. Lange, Dustin; Naumann, Felix (2011). 243–248. Made available as DBLP Dataset 2.

    hospital.csv: Hospital dataset used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection to that paper.

    t_biocase_... files: t_bioc_... files used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection to that paper.

    tax.csv: Tax dataset used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection to that paper.

  18. Fault diagnosis of the train communication network based on multi hop edge...

    • figshare.com
    rar
    Updated Dec 6, 2019
    Cite
    Zhaozhao Li (2019). Fault diagnosis of the train communication network based on multi hop edge approaching method and weighted support vector machine [Dataset]. http://doi.org/10.6084/m9.figshare.11330402.v1
    Available download formats: rar
    Dataset updated
    Dec 6, 2019
    Dataset provided by
    figshare
    Authors
    Zhaozhao Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Included datasets: MVB waveform feature datasets; 15 UCI datasets from the University of California, Irvine (UCI) machine learning repository; and 2 synthetic datasets.

  19. Data from: Feature enhanced ensemble modeling with voting optimization for...

    • figshare.com
    application/csv
    Updated May 7, 2024
    Cite
    Dongqi Yang; Binqing Xiao (2024). Feature enhanced ensemble modeling with voting optimization for credit risk assessment [Dataset]. http://doi.org/10.6084/m9.figshare.25764189.v1
    Available download formats: application/csv
    Dataset updated
    May 7, 2024
    Dataset provided by
    figshare
    Authors
    Dongqi Yang; Binqing Xiao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ChinaZJB dataset consists of 1,329 valid samples of SMEs, obtained by merging the non-financial behavioral information and soft information on credit rating with the financial information, loan information, and non-financial basic information found in the annual loan ledger data. Among them, 108 SMEs have default records and 1,221 have none, an imbalance ratio of approximately 1:11. Five datasets from the UC Irvine (UCI) machine learning repository, namely the Polish 1, Polish 2, Polish 3, Australian, and Taiwan credit datasets, were used for robustness checks.
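    The stated ratio follows from the counts: 1,221 / 108 ≈ 11.3, i.e. roughly 1:11. One common way to account for such imbalance, shown here only as an illustration and not necessarily what the authors did, is inverse-frequency ("balanced") class weighting, where each class gets weight total / (n_classes × class_count). The function name is hypothetical.

```python
def imbalance_summary(n_default, n_non_default):
    """Total size, non-default:default ratio, and inverse-frequency class weights."""
    total = n_default + n_non_default
    ratio = n_non_default / n_default
    # "Balanced" weights: total / (n_classes * class_count), with n_classes = 2.
    weights = {
        "default": total / (2 * n_default),
        "non_default": total / (2 * n_non_default),
    }
    return total, ratio, weights


total, ratio, weights = imbalance_summary(108, 1221)
print(total, round(ratio, 2))  # 1329 11.31
```

    With these weights each class contributes the same total weight (half of 1,329) to a weighted loss, which is what makes them "balanced".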

  20. Special Survey of Orange County 2001

    • datamed.org
    • dataverse.harvard.edu
    Cite
    Special Survey of Orange County 2001 [Dataset]. https://datamed.org/display-item.php?repository=0012&id=56d4b888e4b0e644d31351be&query=PPIC
    Area covered
    Orange County
    Description

    The Orange County Survey - a collaborative effort of the Public Policy Institute of California and the School of Social Ecology at the University of California, Irvine - is a special edition of the PPIC Statewide Survey. This is the first of an annual series of PPIC surveys of Orange County. The purpose of this study is to inform policymakers by providing timely, accurate, and objective information about policy preferences and economic, social, and political trends. The sample size is 2,004 Orange County adult residents.
