A collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms. Datasets approved for the repository are assigned a Digital Object Identifier (DOI) if they do not already have one. Datasets are licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows sharing and adaptation of the datasets for any purpose, provided that appropriate credit is given.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a list of the datasets provided by the UCI Machine Learning Repository. If you are a learner looking for data by year, category, profession, or some other criterion, you can search for it here.
The dataset has 8 columns that give the details of each listed dataset: --link --Data-Name --data type --default task --attribute-type --instances --attributes --year
Some values are missing.
You can analyse the data as your needs require.
https://networkrepository.com/policy.php
Facebook social network - A social friendship network extracted from Facebook consisting of people (nodes) with edges representing friendship ties.
Data Description
This data comes from the UC Irvine Machine Learning Repository. It has been preprocessed to contain only reviews that are at least 13 words long. The raw data for this specific dataset can be found here. The base UCI ML url can be found here.
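The length filter described above can be sketched as follows. This is a minimal illustration, assuming that words are separated by whitespace; the function name `has_min_words` and the sample reviews are hypothetical and are not taken from the dataset.

```python
def has_min_words(text: str, min_words: int = 13) -> bool:
    """Return True if the review has at least `min_words` whitespace-separated words."""
    return len(text.split()) >= min_words


# Hypothetical reviews, for illustration only.
reviews = [
    "Too short to keep.",
    "This review is long enough to pass the filter because it has many words in it.",
]

# Keep only the reviews that meet the length threshold.
kept = [review for review in reviews if has_min_words(review)]
```

The original preprocessing may have tokenized differently (for example, by handling punctuation), so counts near the threshold could differ.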
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Cuff-Less Blood Pressure Estimation Dataset [2] comes from the UCI Machine Learning Repository. It is a subset of the MIMIC-II Waveform Dataset and contains 12000 records of simultaneous PPG and ABP from 942 patients, sampled at 125 Hz. The 12000 records were uniformly split into four parts of 3000 records each. However, because subject information is lacking, a hold-one-out strategy was used to generate the training, validation, and test sets once the data had been preprocessed. In the end, the UCI dataset had 291,078 segments, or around 404 hours of recording, making it by far the largest dataset, with a considerably higher ratio of continuous segments per record (32.15).
[2] Kachuee, M., Kiani, M. M., Mohammadzade, H. & Shabany, M. Cuff-less blood pressure estimation data set (2015). UCI repository https://archive.ics.uci.edu/ml/datasets/Cuff-Less+Blood+Pressure+Estimation.
This data set has been sourced from the Machine Learning Repository of the University of California, Irvine (UC Irvine): Travel Review Ratings Data Set. It is populated by capturing user ratings from Google reviews. Reviews of attractions in 24 categories across Europe are considered. Google user ratings range from 1 to 5, and the average user rating per category is calculated.
Attribute 1 : Unique user id
Attribute 2 : Average ratings on churches
Attribute 3 : Average ratings on resorts
Attribute 4 : Average ratings on beaches
Attribute 5 : Average ratings on parks
Attribute 6 : Average ratings on theatres
Attribute 7 : Average ratings on museums
Attribute 8 : Average ratings on malls
Attribute 9 : Average ratings on zoo
Attribute 10 : Average ratings on restaurants
Attribute 11 : Average ratings on pubs/bars
Attribute 12 : Average ratings on local services
Attribute 13 : Average ratings on burger/pizza shops
Attribute 14 : Average ratings on hotels/other lodgings
Attribute 15 : Average ratings on juice bars
Attribute 16 : Average ratings on art galleries
Attribute 17 : Average ratings on dance clubs
Attribute 18 : Average ratings on swimming pools
Attribute 19 : Average ratings on gyms
Attribute 20 : Average ratings on bakeries
Attribute 21 : Average ratings on beauty & spas
Attribute 22 : Average ratings on cafes
Attribute 23 : Average ratings on view points
Attribute 24 : Average ratings on monuments
Attribute 25 : Average ratings on gardens
The UCI page mentions the following publication as the original source of the data set: Renjith, Shini, A. Sreekumar, and M. Jathavedan. 2018. Evaluation of Partitioning Clustering Algorithms for Processing Social Media Data in Tourism Domain. In 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 127-31. IEEE.
I'm the kind of person who loves traveling. But sometimes I run into problems: where should I visit? Are there interesting places that match my lifestyle? I often spend hours searching for an interesting place to go. Such a waste of time.
What if we could build a recommender system that recommends several interesting venues based on your preferences? Using information from Google reviews, I'll try to divide Google review users into clusters of similar interests, as groundwork for building a recommender system based on their preferences.
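As a rough sketch of the clustering idea, the snippet below groups users by their average ratings with plain k-means (Lloyd's algorithm). The choice of k-means, the three rating columns, the four sample users, and the starting centres are all assumptions made for illustration; the text above does not say which clustering method is used.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means; `centroids` gives the starting centres."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical average ratings per user: (churches, beaches, museums).
ratings = [
    [4.5, 1.0, 4.0],
    [4.0, 1.5, 4.5],
    [1.0, 4.5, 1.5],
    [1.5, 5.0, 1.0],
]

labels, centres = kmeans(ratings, centroids=[ratings[0], ratings[2]])
```

With this toy data the first two users (who rate churches and museums highly) land in one cluster and the last two (who prefer beaches) in the other. On the real data, all 24 rating columns would be used and the number of clusters would need to be chosen, for example by comparing several values.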
The dataset used is an extract of the 1994 US Census data that was donated to UC Irvine's Machine Learning Repository. The data contains approximately 32,000 observations with over 15 variables. The dataset was downloaded from: http://archive.ics.uci.edu/ml/datasets/Adult. The dependent variable in our analysis is income level: we examine who earns above $50,000 a year using SQL queries, proportion analysis with bar charts, and a simple decision tree to understand the important variables and their influence on the prediction.
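The proportion analysis mentioned above can be sketched in a few lines. This is only an illustration: the column names `education` and `income`, the label `>50K`, and the four sample rows are assumptions, not values read from the downloaded file.

```python
from collections import defaultdict


def high_income_share(rows, group_key):
    """Return, per group, the share of rows whose income label is '>50K'."""
    totals = defaultdict(int)
    high = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        if row["income"] == ">50K":
            high[row[group_key]] += 1
    return {group: high[group] / totals[group] for group in totals}


# Hypothetical rows, for illustration only.
rows = [
    {"education": "Bachelors", "income": ">50K"},
    {"education": "Bachelors", "income": "<=50K"},
    {"education": "HS-grad", "income": "<=50K"},
    {"education": "HS-grad", "income": "<=50K"},
]

shares = high_income_share(rows, "education")
```

The resulting dictionary maps each group to a fraction between 0 and 1, which is the quantity a bar chart of proportions would plot.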
https://choosealicense.com/licenses/odbl/
Dataset Details
1. Dataset Loading:
Initially, we load the Drug Review Dataset from the UC Irvine Machine Learning Repository. This dataset contains patient reviews of different drugs, along with the medical condition being treated and the patients' satisfaction ratings.
2. Data Preprocessing:
The dataset is preprocessed to ensure data integrity and consistency. We handle missing values and ensure that each patient ID is unique across the dataset.
3. Text… See the full description on the dataset page: https://huggingface.co/datasets/Mouwiya/drug-reviews.
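One possible reading of the preprocessing step is sketched below: records with a missing required field are dropped, and only the first record per patient ID is kept. The field names (`patient_id`, `review`, `rating`) and both cleaning rules are assumptions; the description does not say how missing values or duplicate IDs are actually handled.

```python
def preprocess(records, required=("patient_id", "review", "rating")):
    """Drop records with missing required fields and keep one record per patient ID."""
    seen_ids = set()
    cleaned = []
    for record in records:
        # Skip records where any required field is absent or empty.
        if any(record.get(field) in (None, "") for field in required):
            continue
        # Skip repeated patient IDs so that each ID appears once.
        if record["patient_id"] in seen_ids:
            continue
        seen_ids.add(record["patient_id"])
        cleaned.append(record)
    return cleaned


# Hypothetical records, for illustration only.
records = [
    {"patient_id": 1, "review": "Worked well.", "rating": 9},
    {"patient_id": 1, "review": "Duplicate entry.", "rating": 8},
    {"patient_id": 2, "review": "", "rating": 5},
    {"patient_id": 3, "review": "Mild side effects.", "rating": 7},
]

cleaned = preprocess(records)
```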
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Oxford Parkinson's Disease Detection Dataset, UCI Machine Learning Repository
Original dataset link: https://archive.ics.uci.edu/dataset/174/parkinsons
Dataset Characteristics: Multivariate
Subject Area: Health and Medicine
Associated Tasks: Classification
Feature Type: Real
Instances: 197
Features: 22
Dataset Information
Additional Information
This dataset is composed of a range of biomedical voice measurements from 31 people, 23 with Parkinson's disease (PD). Each column in the table is a particular voice measure, and each row corresponds to one of 195 voice recordings from these individuals ("name" column). The main aim of the data is to discriminate healthy people from those with PD, according to the "status" column, which is set to 0 for healthy and 1 for PD.
The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around six recordings per patient; the name of the patient is identified in the first column. For further information or to pass on comments, please contact Max Little (littlem '@' robots.ox.ac.uk).
Further details are contained in the following reference -- if you use this dataset, please cite: Max A. Little, Patrick E. McSharry, Eric J. Hunter, Lorraine O. Ramig (2008), 'Suitability of dysphonia measurements for telemonitoring of Parkinson's disease', IEEE Transactions on Biomedical Engineering (to appear).
Has Missing Values? No
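As a small sketch of how the "status" column described above might be read, the snippet below counts healthy and PD recordings in CSV text. The column names `name` and `status` come from the description; the three sample rows are hypothetical, and the real file has many more columns.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the same CSV layout, for illustration only.
sample_csv = """name,status
rec_a_1,1
rec_a_2,1
rec_b_1,0
"""


def status_counts(csv_text):
    """Count recordings per class: status 1 means PD, status 0 means healthy."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter("PD" if row["status"] == "1" else "healthy" for row in reader)


counts = status_counts(sample_csv)
```

Because each patient contributes around six recordings, any train/test split should probably keep all recordings of one patient on the same side; that is a general caution rather than something stated in the dataset description.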
The columns included in this dataset are:
ID : the numeric ID of the article
TITLE : the headline of the article
URL : the URL of the article
PUBLISHER : the publisher of the article
CATEGORY : the category of the news item; one of: -- b : business -- t : science and technology -- e : entertainment -- m : health
STORY : alphanumeric ID of the news story that the article discusses
HOSTNAME : hostname where the article was posted
TIMESTAMP : approximate timestamp of the article's publication, given in Unix time (seconds since midnight on Jan 1, 1970)
MAIN_CONTENT: article's content
MAIN_CONTENT_LEN: length of main_content
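The CATEGORY codes and the Unix-time TIMESTAMP column listed above could be decoded roughly as follows. The helper name `describe_article` is hypothetical, and the code follows the description in treating the timestamp as seconds; if the values in a given copy of the file look like milliseconds (13 digits), they would need to be divided by 1000 first.

```python
from datetime import datetime, timezone

# Category codes as listed in the column description.
CATEGORY_NAMES = {
    "b": "business",
    "t": "science and technology",
    "e": "entertainment",
    "m": "health",
}


def describe_article(category_code, timestamp_seconds):
    """Return the category name and the publication time as an ISO 8601 UTC string."""
    published = datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)
    return CATEGORY_NAMES[category_code], published.isoformat()


# The Unix epoch itself, used here only as an easy-to-check example.
category, published_at = describe_article("m", 0)
```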
This dataset comes from the UCI Machine Learning Repository. Any publications that use this data should cite the repository as follows:
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
This specific dataset can be found in the UCI ML Repository at this URL
ANTswers is an experimental chatbot that can answer questions about the UC Irvine Libraries. ANTswers is a web-based application that runs on a remote library server and is accessed through a web interface page. ANTswers' personality and persona are based on the UCI mascot, Peter the Anteater. ANTswers responds to simple and short questions. The first link in a response opens in a preview window; all other links open in a new window. Each transaction is reviewed and a data form is filled out to track usage, such as date, time, answer rate, etc.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0).
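The presence/absence distinction described above amounts to collapsing the 0 to 4 "goal" value into a binary label. A minimal sketch, with a hypothetical function name:

```python
def has_heart_disease(goal: int) -> int:
    """Map the 0-4 'goal' value to 0 (absence) or 1 (presence)."""
    if goal not in range(5):
        raise ValueError(f"goal must be an integer from 0 to 4, got {goal!r}")
    return int(goal > 0)


# Every valid goal value and its binary label.
binary_labels = [has_heart_disease(goal) for goal in range(5)]
```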
The US Adult income dataset was extracted by Barry Becker from the 1994 US Census Database. The data set consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. I encountered it during my course, and I wish to share it here because it is a good starter example for data pre-processing and machine learning practices.
Fields
The dataset contains 16 columns
Target field: Income -- The income is divided into two classes: <=50K and >50K
Number of attributes: 14 -- These are the demographics and other features that describe a person
We can explore the possibility of predicting income level based on the individual's personal information.
Acknowledgements
This dataset, named "adult", is found in the UCI machine learning repository.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a cleaned version of the credit card approvals dataset from the UCI machine learning repository.
Missing values have been filled, and feature names and category names have been inferred, which adds context and makes the data easier to use.
Your task is to predict which people in the dataset are successful in applying for a credit card.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Three datasets from the UC Irvine (UCI) machine learning repository, that is, the Australian, German, and Japanese datasets, were adopted for the current study. The Australian credit dataset contains 690 samples, of which 307 are positive and 383 are negative. The dimensions of its input features are 15. The German credit dataset contains 1000 samples, 700 of which are positive and 300 are negative. The dimensions of its input features are 21. The Japanese credit dataset contains 690 samples, of which 383 are positive and 307 are negative. The dimensions of its input features are 16.
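The class balance implied by the sample counts above can be checked with a short calculation. The counts are taken from the text; the function and variable names are only illustrative.

```python
# (positive, negative) sample counts as stated in the text.
credit_datasets = {
    "Australian": (307, 383),
    "German": (700, 300),
    "Japanese": (383, 307),
}


def positive_rate(positive, negative):
    """Return the fraction of samples that are positive."""
    return positive / (positive + negative)


# Positive rate per dataset, rounded to three decimals.
rates = {
    name: round(positive_rate(pos, neg), 3)
    for name, (pos, neg) in credit_datasets.items()
}
```

The Australian and Japanese counts mirror each other (307/383 and 383/307), so their positive rates are complementary, while the German dataset is noticeably more imbalanced.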
https://choosealicense.com/licenses/unknown/
Dataset Card for [Dataset Name]
Dataset Summary
The SMS Spam Collection v.1 is a public set of labeled SMS messages that have been collected for mobile phone spam research. It has one collection composed of 5,574 real, non-encoded English messages, each tagged as being legitimate (ham) or spam.
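A labeled message could be parsed along these lines. The sketch assumes a plain-text layout with one message per line, where the label (`ham` or `spam`) is followed by a tab and then the message text; that layout and the sample lines are assumptions, and the Hugging Face version of the dataset may expose the fields differently.

```python
def parse_line(line):
    """Split one 'label<TAB>message' line into (label, text)."""
    label, _, text = line.rstrip("\n").partition("\t")
    if label not in ("ham", "spam"):
        raise ValueError(f"unexpected label: {label!r}")
    return label, text


# Hypothetical lines, for illustration only.
lines = [
    "ham\tSee you at lunch?\n",
    "spam\tYou have won a prize, reply now!\n",
]

parsed = [parse_line(line) for line in lines]
spam_count = sum(1 for label, _ in parsed if label == "spam")
```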
Supported Tasks and Leaderboards
[More Information Needed]
Languages
English
Dataset Structure
Data Instances
[More Information… See the full description on the dataset page: https://huggingface.co/datasets/ucirvine/sms_spam.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this experiment, the datasets are from the UC Irvine (UCI) machine learning repository (Zięba et al., 2016) and contain real-world financial indicators of Polish manufacturing companies from 2007 to 2011. The datasets were separated into five parts, one per fiscal year, covering the period from the 1st year (2007 fiscal year) to the 5th year (2011 fiscal year), which corresponds to five different bankruptcy cycles. The class labels ("0" is operating and "1" is bankruptcy) are determined by the bankruptcy status of the enterprise in 2012. Furthermore, the real-world Creator dataset, published in 2019 by a Chinese intelligent government services provider called Creator Information Technology Co., Ltd, was also adopted. The Creator dataset includes company management information of 35960 Chinese companies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery
This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.
The file ground_truth.csv is a comma separated file containing approximate functional dependencies. table describes the relation we refer to, lhs and rhs reference two columns of those relations where semantically we found that lhs implies rhs.
The file excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.
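To make the idea of an approximate functional dependency concrete, the sketch below computes the standard g3 error for a candidate lhs -> rhs: the smallest fraction of tuples that would have to be removed for the dependency to hold exactly. This is the plain g3 measure, not necessarily the g3_prime variant mentioned above, whose definition is not given here. The column names AirportCode and AirportName come from the example in the text; the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Fraction of tuples to remove so that lhs -> rhs holds exactly.

    Tuples where either attribute is missing are ignored. Returns None when
    no tuple has a value for both attributes.
    """
    groups = defaultdict(Counter)
    total = 0
    for row in rows:
        left, right = row.get(lhs), row.get(rhs)
        if left in (None, "") or right in (None, ""):
            continue
        groups[left][right] += 1
        total += 1
    if total == 0:
        return None
    # For each lhs value, keep the tuples that carry its most frequent rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / total


# Hypothetical rows, for illustration only.
rows = [
    {"AirportCode": "JFK", "AirportName": "John F Kennedy"},
    {"AirportCode": "JFK", "AirportName": "John F Kennedy"},
    {"AirportCode": "JFK", "AirportName": "JFK Intl"},
    {"AirportCode": "LAX", "AirportName": "Los Angeles Intl"},
    {"AirportCode": "LAX", "AirportName": ""},
]

error = g3_error(rows, "AirportCode", "AirportName")
```

Here one of the four usable tuples spells the airport name differently, so the error is 0.25; an exact dependency would give 0. Returning `None` when no tuple has both values mirrors the first exclusion rule described above.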
Dataset References
The dataset used can be found on the UCI Machine Learning Repository at the following location:
There are several copies of this dataset to be found on Kaggle, with people focusing on different types of analyses of the data. This specific copy can be analysed by anyone interested, but is primarily used by a study group from the Udacity Bertelsmann Technology Scholarship to practice analysis of association between variables as well as implementation and comparison of various Machine Learning models.
According to the paper by Detrano et al. (1989), as found on the UCI dataset webpage, the data was collected for 303 patients referred for coronary angiography at the Cleveland Clinic between May 1981 and September 1984. The 13 independent (feature) variables can be divided into 3 groups as follows:
Routine evaluation (based on historical data):
Non-invasive test data (informed consent obtained for data as part of research protocol):
Other demographic and clinical variables (based on routine data):
[Image: data dictionary]
UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779
The objective of the analysis is to use statistical learning to identify factors associated with coronary artery disease, as indicated by a coronary angiography interpreted by a cardiologist (as per the paper by Detrano et al. cited above).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nine datasets from the UC Irvine (UCI) machine learning repository, i.e., the Australian, Japanese, German (Asuncion & Newman, 2007), Taiwan (Yeh & Lien, 2009) and Polish credit datasets (Zięba et al., 2016), were adopted for the current study. The Polish credit datasets comprise five datasets that represent five classification cases depending on the forecasting period (i.e., the Polish 1, the Polish 2, the Polish 3, the Polish 4 and the Polish 5). In addition, the study uses the AER credit dataset (Greene, 2003), a credit card dataset for econometric analysis, and the Creator dataset, published in 2019 by a Chinese digital government services provider named Creator Information Technology Co., Ltd[1]. The Creator dataset contains the property rights, financial statements, and basic company information of 35960 Chinese companies.