13 datasets found

‘Swiss banknote conterfeit detection’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 17, 2004
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2004). ‘Swiss banknote conterfeit detection’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-swiss-banknote-conterfeit-detection-8fe6/051a31c5/?iid=001-267&v=presentation
Dataset updated
Jan 17, 2004
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Swiss banknote conterfeit detection’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/chrizzles/swiss-banknote-conterfeit-detection on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

Will you be able to identify genuine and counterfeit banknotes, even if half of the data is counterfeit? Perfect for testing different outlier detection algorithms.

Content

The dataset includes information about the shape of the bill, as well as the label. It is made up of 200 banknotes in total, 100 genuine and 100 counterfeit.

Attributes:
- conterfeit: Whether a banknote is counterfeit (1) or genuine (0)
- Length: Length of bill (mm)
- Left: Width of left edge (mm)
- Right: Width of right edge (mm)
- Bottom: Bottom margin width (mm)
- Top: Top margin width (mm)
- Diagonal: Length of diagonal (mm)

Original Data Source

Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A practical approach. London: Chapman & Hall, Tables 1.1 and 1.2, pp. 5-8.

Applications

While it might be pretty easy for a classifier to decide whether the banknotes are counterfeit or not, what about methods using outlier detection? Classical outlier detection methods won't work, since half of the data consists of outliers (counterfeit bills), so more robust methods will be needed.
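The claim above can be illustrated with a minimal sketch. It assumes synthetic, made-up diagonal lengths (two tight groups of 100 values each, not the Flury and Riedwyl measurements) and shows that a mean/standard-deviation z-score rule flags nothing when half the data are "outliers". As a stand-in for a more robust approach, it then splits the values with a plain one-dimensional 2-means loop; this is a swapped-in clustering technique chosen for illustration, not a method named by the dataset author.

```python
from statistics import fmean, pstdev

# Synthetic, hypothetical diagonal lengths (mm): two tight groups of 100.
# Illustrative values only, not taken from the banknote dataset.
offsets = [i / 99 - 0.5 for i in range(100)]   # evenly spread in [-0.5, 0.5]
genuine = [141.5 + d for d in offsets]
counterfeit = [139.5 + d for d in offsets]
values = genuine + counterfeit

# Classical rule: flag |z| > 3 using the overall mean and standard deviation.
mean, std = fmean(values), pstdev(values)
z_flagged = [v for v in values if abs(v - mean) / std > 3]


def two_means_1d(xs, iterations=20):
    """Split 1-D data into two groups with a plain 2-means loop."""
    lo, hi = min(xs), max(xs)                  # initial centres at the extremes
    for _ in range(iterations):
        low_group = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        high_group = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = fmean(low_group), fmean(high_group)
    return low_group, high_group


low_group, high_group = two_means_1d(values)
print(len(z_flagged))                    # points flagged by the z-score rule
print(len(low_group), len(high_group))   # sizes of the two recovered groups
```

With 50% contamination the mean sits between the two groups and the standard deviation is inflated, so no point lies more than three standard deviations away. The 2-means split works here only because the sketch assumes exactly two well-separated groups.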

--- Original source retains full ownership of the source dataset ---
Data from: Instances and computational results of: Mathematical Programming...
dataverse.unimi.it
text/x-fixed-field +2
Updated Jan 31, 2024
Cite
Michele Barbato; Alberto Ceselli (2024). Instances and computational results of: Mathematical Programming for Simultaneous Feature Selection and Outlier Detection under l1 Norm [Dataset]. http://doi.org/10.13130/RD_UNIMI/LZA4F8
Available download formats: text/x-fixed-field, tsv, txt
Unique identifier
https://doi.org/10.13130/RD_UNIMI/LZA4F8
Dataset updated
Jan 31, 2024
Dataset provided by
UNIMI Dataverse
Authors
Michele Barbato; Alberto Ceselli
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains instances and computational results of "Mathematical Programming for Simultaneous Feature Selection and Outlier Detection under l1 Norm". See the included readme files for a description of the dataset content.
Cardiotocogrpahy dataset
kaggle.com
Updated Feb 6, 2020
Cite
Phan Viet Hoang (2020). Cardiotocogrpahy dataset [Dataset]. https://www.kaggle.com/warkingleo2000/cardiotocogrpahy-dataset/activity
Dataset updated
Feb 6, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Phan Viet Hoang
Description
Context

This is the dataset from my first assignment at Cinnamon AI Bootcamp 20. I implemented a Gaussian Mixture Model from scratch for anomaly detection; the original page links to the implementation and to a GitHub repo.

Content

A data set containing measurements of fetal heart rate and uterine contraction from cardiotocograms. This data set was obtained from the UCI Machine Learning Repository.

Acknowledgements

The original Cardiotocography (Cardio) dataset from the UCI Machine Learning Repository consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians. This is a classification dataset, where the classes are normal, suspect, and pathologic. For outlier detection, the normal class forms the inliers, while the pathologic (outlier) class is downsampled to 176 points. The suspect class is discarded.
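The construction described above can be sketched as follows. This is a minimal illustration, assuming the class labels are available as the strings "normal", "suspect" and "pathologic"; the function name, the seed, and the class counts in the usage example are hypothetical and only exercise the logic.

```python
import random


def build_outlier_benchmark(labels, n_outliers=176, seed=0):
    """Return (inlier_indices, outlier_indices) for an outlier-detection split.

    Normal rows become inliers, pathologic rows are downsampled to at most
    n_outliers, and suspect rows are dropped by never being selected.
    """
    inliers = [i for i, lab in enumerate(labels) if lab == "normal"]
    pathologic = [i for i, lab in enumerate(labels) if lab == "pathologic"]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(n_outliers, len(pathologic))
    outliers = sorted(rng.sample(pathologic, k))
    return inliers, outliers


# Hypothetical class counts, chosen only to exercise the function.
labels = ["normal"] * 1000 + ["suspect"] * 300 + ["pathologic"] * 400
inliers, outliers = build_outlier_benchmark(labels)
print(len(inliers), len(outliers))
```

Returning indices rather than copied rows keeps the sketch independent of how the feature table is stored.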

‘Red and White Wine Quality Analysis’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Red and White Wine Quality Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-red-and-white-wine-quality-analysis-0938/d129fe93/?iid=005-803&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Red and White Wine Quality Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/saigeethac/red-and-white-wine-quality-datasets on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Wine Quality Data Set

This data set is available in UCI at https://archive.ics.uci.edu/ml/datasets/Wine+Quality.

Abstract: Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.

Data Set Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

Attribute Information:

Input variables (based on physicochemical tests):

fixed acidity

volatile acidity

citric acid

residual sugar

chlorides

free sulfur dioxide

total sulfur dioxide

density

pH

sulphates

alcohol

Output variable (based on sensory data):

quality (score between 0 and 10)

These columns have been described in the Kaggle Data Explorer.

Context

The authors state "we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods." We have briefly explored this aspect and see that Red wine quality prediction on the test and training datasets is almost the same (~88%) with just three features. Likewise White wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors.

Content

Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. Both datasets are analyzed, and linear regression models are developed in Python 3. The GitHub link provided for the source code also includes a Flask web application for deployment on the local machine or on Heroku.

Acknowledgements

Datasets: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

Banner Image: Photo by Roberta Sorge on Unsplash

Github Link

Complete code has been uploaded to GitHub at https://github.com/saigeethachandrashekar/wine_quality.

Please clone the repo. It contains both datasets and the code required for building and saving the model on your local system. Code for a Flask app is provided for deploying the models on your local machine. The app can also be deployed on Heroku; the requirements.txt and Procfile are provided for this.

Next Steps

White wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors (e.g. there is no data about grape types, wine brand, wine selling price, etc.) or it may be due to other factors that are not clear. This is an area that might be worth exploring further.

Other ML techniques may be applied to improve the accuracy.

--- Original source retains full ownership of the source dataset ---
Summary of findings.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated May 22, 2024
Cite
Ray, Joel G.; Walker, Mark C.; Clifford, Tammy; Giffen, Randy; Janoudi, Ghayath; Fell, Deshayne B.; Foster, Angel M.; Uzun, Mara (2024). Summary of findings. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001303534
Dataset updated
May 22, 2024
Authors
Ray, Joel G.; Walker, Mark C.; Clifford, Tammy; Giffen, Randy; Janoudi, Ghayath; Fell, Deshayne B.; Foster, Angel M.; Uzun, Mara
Description
Clinical discoveries largely depend on dedicated clinicians and scientists to identify and pursue unique and unusual clinical encounters with patients and communicate these through case reports and case series. This process has remained essentially unchanged throughout the history of modern medicine. However, these traditional methods are inefficient, especially considering the modern-day availability of health-related data and the sophistication of computer processing. Outlier analysis has been used in various fields to uncover unique observations, including fraud detection in finance and quality control in manufacturing. We propose that clinical discovery can be formulated as an outlier problem within an augmented intelligence framework to be implemented on any health-related data. Such an augmented intelligence approach would accelerate the identification and pursuit of clinical discoveries, advancing our medical knowledge and uncovering new therapies and management approaches. We define clinical discoveries as contextual outliers measured through an information-based approach and with a novelty-based root cause. Our augmented intelligence framework has five steps: define a patient population with a desired clinical outcome, build a predictive model, identify outliers through appropriate measures, investigate outliers through domain content experts, and generate scientific hypotheses. Recognizing that the field of obstetrics can particularly benefit from this approach, as it is traditionally neglected in commercial research, we conducted a systematic review to explore how outlier analysis is implemented in obstetric research. We identified two obstetrics-related studies that assessed outliers at an aggregate level for purposes outside of clinical discovery. Our findings indicate that using outlier analysis in clinical research in obstetrics and clinical research, in general, requires further development.
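The "identify outliers" step of the framework above could be sketched as below. This assumes one possible reading of an information-based measure: the surprisal, in bits, of the observed outcome under the predictive model. The function names, the 3-bit threshold, and the patient records are hypothetical and are not taken from the study.

```python
import math


def surprisal_bits(predicted_prob, observed):
    """Information content (bits) of an observed binary outcome under a model."""
    p = predicted_prob if observed == 1 else 1.0 - predicted_prob
    p = max(p, 1e-12)  # guard against log(0) for overconfident predictions
    return -math.log2(p)


def flag_outliers(records, threshold_bits=3.0):
    """Return ids of records whose outcome is more surprising than the threshold."""
    return [rid for rid, prob, outcome in records
            if surprisal_bits(prob, outcome) > threshold_bits]


# Hypothetical (patient id, predicted probability of outcome, observed outcome).
records = [
    ("p1", 0.90, 1),  # expected outcome occurred
    ("p2", 0.05, 1),  # outcome occurred despite a 5% prediction
    ("p3", 0.50, 0),  # uninformative prediction
    ("p4", 0.98, 0),  # outcome absent despite a 98% prediction
]
print(flag_outliers(records))
```

Flagged cases would then go to domain experts for review, in line with the framework's fourth step; the threshold is a tuning choice, not a value from the source.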
K2-18 HARPS time-series - Dataset - B2FIND
b2find.eudat.eu
Updated Apr 25, 2023
Cite
(2023). K2-18 HARPS time-series - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/45f01dd6-e144-5e90-8b4c-4e7bc357e9d7
Dataset updated
Apr 25, 2023
Description
The cross-correlation function and template matching techniques have dominated the world of precision radial velocities for many years. Recently, a new technique, named line-by-line, has been developed as an outlier resistant way to efficiently extract radial velocity content from high resolution spectra. We apply this new method to archival HARPS and CARMENES datasets of the K2-18 system. After reprocessing the HARPS dataset with the line-by-line framework, we are able to replicate the findings of previous studies. Furthermore, by splitting the full wavelength range into sub-domains, we were able to identify a systematic chromatic correlation of the radial velocities in the reprocessed CARMENES dataset. After post-processing the radial velocities to remove this correlation, as well as rejecting some outlier nights, we robustly uncover the signal of both K2-18b and K2-18c, with masses that agree with those found from our analysis of the HARPS dataset. We then combine both the HARPS and CARMENES velocities to refine the parameters of both planets, notably resulting in a revised mass and period for K2-18c of 6.99+/-0.97 Earth masses and 9.2072+/-0.0065d, respectively. Our work thoroughly demonstrates the power of the line-by-line technique for the extraction of precision radial velocity information.
Eligibility Criteria for the Systematic Review.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated May 22, 2024
Cite
Ghayath Janoudi; Mara Uzun (Rada); Deshayne B. Fell; Joel G. Ray; Angel M. Foster; Randy Giffen; Tammy Clifford; Mark C. Walker (2024). Eligibility Criteria for the Systematic Review. [Dataset]. http://doi.org/10.1371/journal.pdig.0000515.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pdig.0000515.t002
Dataset updated
May 22, 2024
Dataset provided by
PLOS Digital Health
Authors
Ghayath Janoudi; Mara Uzun (Rada); Deshayne B. Fell; Joel G. Ray; Angel M. Foster; Randy Giffen; Tammy Clifford; Mark C. Walker
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clinical discoveries largely depend on dedicated clinicians and scientists to identify and pursue unique and unusual clinical encounters with patients and communicate these through case reports and case series. This process has remained essentially unchanged throughout the history of modern medicine. However, these traditional methods are inefficient, especially considering the modern-day availability of health-related data and the sophistication of computer processing. Outlier analysis has been used in various fields to uncover unique observations, including fraud detection in finance and quality control in manufacturing. We propose that clinical discovery can be formulated as an outlier problem within an augmented intelligence framework to be implemented on any health-related data. Such an augmented intelligence approach would accelerate the identification and pursuit of clinical discoveries, advancing our medical knowledge and uncovering new therapies and management approaches. We define clinical discoveries as contextual outliers measured through an information-based approach and with a novelty-based root cause. Our augmented intelligence framework has five steps: define a patient population with a desired clinical outcome, build a predictive model, identify outliers through appropriate measures, investigate outliers through domain content experts, and generate scientific hypotheses. Recognizing that the field of obstetrics can particularly benefit from this approach, as it is traditionally neglected in commercial research, we conducted a systematic review to explore how outlier analysis is implemented in obstetric research. We identified two obstetrics-related studies that assessed outliers at an aggregate level for purposes outside of clinical discovery. Our findings indicate that using outlier analysis in clinical research in obstetrics and clinical research, in general, requires further development.
Data to accompany the outlier-waveform-detection Github repository (internal...
zenodo.org
bin, txt
Updated May 23, 2024
Cite
Karin Cox; Daisuke Kase; Robert Turner (2024). Data to accompany the outlier-waveform-detection Github repository (internal globus pallidus, GPi) [Dataset]. http://doi.org/10.5281/zenodo.11077189
Available download formats: bin, txt
Unique identifier
https://doi.org/10.5281/zenodo.11077189
Dataset updated
May 23, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Karin Cox; Daisuke Kase; Robert Turner
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains data based on neuronal recordings from two monkeys (G and I, in the pre- and post-MPTP states) that serve as input to the code provided at https://github.com/turner-lab-pitt/outlier-waveform-detection. Text files located within that Github repository provide detailed instructions on how these data may be used with that code. As described in those text files, extra data are provided for Monkey G, in the pre-MPTP state.

The data-description.txt file provides detailed information regarding the contents of each zipped tar archive. Briefly, the most important components of the files are the "snips" (individual spike waveforms) from the two monkeys and MPTP states, as extracted for each of a series of single sorted units from the internal globus pallidus (GPi). The additional G-Pre data provides examples of the high-pass filtered voltage signals from which these snips were extracted. All data are stored in the Matlab .mat format.

All zipped files can be decompressed with 7-zip: https://www.7-zip.org/

These data and the associated Github code were used for analyses reported in an in-preparation manuscript (Kase et al., "Movement-related activity in the internal globus pallidus of the parkinsonian macaque"), and also with a preprint that is currently under review:

Detecting rhythmic spiking through the power spectra of point process model residuals

Karin M. Cox, Daisuke Kase, Taieb Znati, Robert S. Turner

bioRxiv 2023.09.08.556120; doi: https://doi.org/10.1101/2023.09.08.556120

This research was funded in part by Aligning Science Across Parkinson's [ASAP-020519] through the Michael J. Fox Foundation for Parkinson's Research (MJFF). For the purpose of open access, the authors have applied a Creative Commons Attribution 4.0 International (CC BY) public copyright license to this dataset.
Gila Trout neutral and outlier SNP genotype matrices
datadryad.org
search.dataone.org
zip
Cite
David Camak; Megan Osborne; Thomas Turner, Gila Trout neutral and outlier SNP genotype matrices [Dataset]. http://doi.org/10.5061/dryad.g79cnp5pk
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.g79cnp5pk
Dataset provided by
Dryad
Authors
David Camak; Megan Osborne; Thomas Turner
Time period covered
Apr 8, 2021
Description
Genotypes are coded as 0, 1, or 2 depending on the number of alternate alleles each individual carries (0 = homozygous reference allele, 1 = heterozygous, 2 = homozygous alternate allele). CHROM IDs refer to RefSeq sequence IDs of Oncorhynchus mykiss (Rainbow Trout) found at https://www.ncbi.nlm.nih.gov/assembly/GCF_002163495.1
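The 0/1/2 coding described above can be sketched as follows. This assumes the raw calls are VCF-style diploid genotype strings such as "0/1"; that input format and the function name are assumptions for illustration, since the description does not say how the matrices were produced.

```python
def alt_allele_count(genotype):
    """Count alternate alleles in a diploid genotype string such as '0/1'.

    Returns 0 (homozygous reference), 1 (heterozygous), 2 (homozygous
    alternate), or None for a missing or non-diploid call.
    """
    alleles = genotype.replace("|", "/").split("/")  # accept phased calls too
    if len(alleles) != 2 or "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


calls = ["0/0", "0/1", "1|0", "1/1", "./."]
print([alt_allele_count(g) for g in calls])
```

Missing calls are returned as None rather than 0 so that they are not mistaken for homozygous reference genotypes.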
Abnormal High-density Crowds
kaggle.com
Updated Dec 3, 2019
Cite
Samar Mahmoud (2019). Abnormal High-density Crowds [Dataset]. https://www.kaggle.com/angelchi56/abnormal-highdensity-crowds/code
Dataset updated
Dec 3, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Samar Mahmoud
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

Benchmark datasets of high-density crowd footage are scarce. Furthermore, no available dataset of high-density crowd footage includes anomalous behaviours such as stampedes, overcrowding, violence, and panic. To address this, public footage that meets these criteria has been collected.

Content

Footage of four events has been collected: Times Square Chaos, Las Vegas Mass Shooting, Love Parade disaster and Juventus fan panic. All footage was obtained from YouTube.

Acknowledgements

I would like to thank my supervisors, for the patient guidance, encouragement and advice they constantly provide.

Inspiration

With this data, what are the best algorithms to detect anomalous behaviour within a high-density crowd? Are low/medium density crowd anomaly detection algorithms applicable to more dense crowds?
Outliers - Catalogue, Press release and private view invite for exhibition
data.bathspa.ac.uk
pdf
Updated Jun 1, 2023
Cite
Rosemary Snell (2023). Outliers - Catalogue, Press release and private view invite for exhibition [Dataset]. http://doi.org/10.17870/bathspa.11537910.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.17870/bathspa.11537910.v1
Dataset updated
Jun 1, 2023
Dataset provided by
BathSPAdata
Authors
Rosemary Snell
License
http://rightsstatements.org/vocab/InC/1.0/
Description
Outliers is a research project articulated through a solo exhibition held at No 20 Arts in London. It contained a body of 26 works including paintings, drawings and photographs that were the culmination of a research trip to Greenland. This body of work aimed to explore how the medium of paint could be manipulated to not only represent the dramatic and transient nature of the icescapes of Greenland but to also emulate and explore the properties of snow and ice themselves. This item contains the catalogue, press release and invite from the exhibition. All content included on this item courtesy and copyright of No. 20 Arts, London. Used with permission. The work is under copyright and may not be used without permission. Use of this repository acknowledges cooperation with its policies and relevant copyright law.
Wealth and Assets Survey, Waves 1-5 and Rounds 5-7, 2006-2020: Secure Access...
b2find.eudat.eu
Updated Oct 28, 2023
Cite
(2023). Wealth and Assets Survey, Waves 1-5 and Rounds 5-7, 2006-2020: Secure Access - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/60139fe5-3862-5bae-853e-764863b1e3ff
Dataset updated
Oct 28, 2023
Description
Abstract copyright UK Data Service and data collection copyright owner.

The Wealth and Assets Survey (WAS) is a longitudinal survey, which aims to address gaps identified in data about the economic well-being of households by gathering information on level of assets, savings and debt; saving for retirement; how wealth is distributed among households or individuals; and factors that affect financial planning. Private households in Great Britain were sampled for the survey (meaning that people in residential institutions, such as retirement homes, nursing homes, prisons, barracks or university halls of residence, and also homeless people were not included).

The WAS commenced in July 2006, with a first wave of interviews carried out over two years, to June 2008. Interviews were achieved with 30,595 households at Wave 1. Those households were approached again for a Wave 2 interview between July 2008 and June 2010, and 20,170 households took part. Wave 3 covered July 2010 - June 2012, Wave 4 covered July 2012 - June 2014 and Wave 5 covered July 2014 - June 2016. Revisions to previous waves' data mean that small differences may occur between originally published estimates and estimates from the datasets held by the UK Data Service. These revisions are due to improvements in the imputation methodology.

Note from the WAS team - November 2023:

"The Office for National Statistics has identified a very small number of outlier cases present in the seventh round of the Wealth and Assets Survey covering the period April 2018 to March 2020. Our current approach is to treat cases where we have reasonable evidence to suggest the values provided for specific variables are outliers. This approach did not occur for two individuals for several variables involved in the estimation of their pension wealth. While we estimate any impacts are very small overall and median pension wealth and median total wealth estimates are unaffected, this will affect the accuracy of the breakdowns of the pension wealth within the wealthiest decile, and data derived from them. We are urging caution in the interpretation of more detailed estimates."

Survey Periodicity - "Waves" to "Rounds"

Due to the survey periodicity moving from "Waves" (July, ending in June two years later) to “Rounds” (April, ending in March two years later), interviews using the ‘Wave 6’ questionnaire started in July 2016 and were conducted for 21 months, finishing in March 2018. Data for round 6 covers the period April 2016 to March 2018. This comprises the last three months of Wave 5 (April to June 2016) and 21 months of Wave 6 (July 2016 to March 2018). Round 5 and Round 6 datasets are based on a mixture of original wave-based datasets. Each wave of the survey has a unique questionnaire and therefore each of these round-based datasets are based on two questionnaires. While there may be some changes in the questionnaires, the derived variables for the key wealth estimates have not changed over this period. The aim is to collect the same data, though in some cases the exact questions asked may differ slightly. Detailed information on Moving the Wealth and Assets Survey onto a financial years’ basis was published on the ONS website in July 2019.

Further information and documentation may be found on the ONS Wealth and Assets Survey webpage. Users are advised to check the page for updates before commencing analysis.

Users should note that issues with linking have been reported and the WAS team are currently investigating.

Secure Access WAS data

The Secure Access version of the WAS includes additional, detailed geographical variables not included in the End User Licence (EUL) version (SN 7215). These include:
- Wards
- Parliamentary Constituency Areas for Wave 1 only
- Census Output Areas
- Lower Layer Super Output Areas
- Local Authorities
- Local Education Authorities

Prospective users of the Secure Access version of the WAS will need to fulfil additional requirements, including completion of face-to-face training, and agreement to the Secure Access User Agreement and Licence Compliance Policy, in order to obtain permission to use that version (see 'Access' section below). Users are therefore strongly encouraged to download the EUL version (SN 7215) to see if it contains sufficient detail for their needs, before considering making an application for the Secure Access version.

Latest Edition Information

For the ninth edition (October 2022), the Round 7 person and household data have been updated. The Round 7 Wave 1 Variable Catalogue Excel file has also been updated.

Main Topics: The WAS questionnaire was divided into two parts with all adults aged 16 years and over (excluding those aged 16 to 18 currently in full-time education) being interviewed in each responding household.

Household schedule: This was completed by one person in the household (usually the head of household or their partner) and predominantly collected household level information such as the number, demographics and relationship of individuals to each other, as well as information about the ownership, value and mortgages on the residence and other household assets.

Individual schedule: This was given to each adult in the household and asked questions about economic status, education and employment, business assets, benefits and tax credits, saving attitudes and behaviour, attitudes to debt, insolvency, major items of expenditure, retirement, attitudes to saving for retirement, pensions, financial assets, non-mortgage debt, investments and other income.
Multi-stage stratified random sample
Face-to-face interview
2006-2020
ADOPTION PAY AGE AIRCRAFT ALIMONY ASSETS ATTITUDES TO SAVING BANK ACCOUNTS BEDROOMS BICYCLES BOATS BONDS BUSINESS OWNERSHIP BUSINESS RECORDS BUSINESSES CARAVANS CARE OF DEPENDANTS CARERS BENEFITS CARS CHILD BENEFITS CHILD SUPPORT PAYMENTS CHILD TRUST FUNDS COHABITING COMMERCIAL BUILDINGS COST OF LIVING COSTS CREDIT CARD USE DEBILITATIVE ILLNESS DEBTS DISABILITIES EARLY RETIREMENT ECONOMIC ACTIVITY EDUCATIONAL BACKGROUND EDUCATIONAL COURSES EDUCATIONAL FEES EDUCATIONAL GRANTS EDUCATIONAL STATUS EMPLOYEES EMPLOYMENT EMPLOYMENT HISTORY EMPLOYMENT PROGRAMMES ENDOWMENT ASSURANCE ESTATES ETHNIC GROUPS EXPENDITURE FAMILY BENEFITS FAMILY INCOME FAMILY MEMBERS FINANCIAL ADVICE FINANCIAL COMPENSATION FINANCIAL DIFFICULTIES FINANCIAL SERVICES FREQUENCY OF PAY FRINGE BENEFITS FULL TIME EMPLOYMENT FURNISHED ACCOMMODA... GENDER GIFTS Great Britain HEALTH HEALTH STATUS HIRE PURCHASE HOME BUILDINGS INSU... HOME BUYING HOME CONTENTS INSUR... HOME OWNERSHIP HOUSE PRICES HOUSEHOLD BUDGETS HOUSEHOLD HEAD S EC... HOUSEHOLD HEAD S SO... HOUSEHOLD INCOME HOUSEHOLDERS HOUSEHOLDS HOUSING HOUSING AGE HOUSING ECONOMICS HOUSING FINANCE HOUSING TENURE ILL HEALTH INCOME INCOME TAX INCONTINENCE INFORMAL CARE INHERITANCE INSOLVENCIES INSURANCE CLAIMS INTELLECTUAL IMPAIR... INTEREST FINANCE INVESTMENT Income JOB HUNTING JOB SEEKER S ALLOWANCE LAND OWNERSHIP LAND VALUE LANDLORDS LIFE INSURANCE LOANS Labour and employment MAIL ORDER SERVICES MARITAL STATUS MATERNITY BENEFITS MATERNITY PAY MATHEMATICS MOBILE HOMES MORTGAGE ARREARS MORTGAGE PROTECTION... MORTGAGES MOTOR VEHICLE VALUE MOTOR VEHICLES MOTORCYCLES OCCUPATIONAL PENSIONS OCCUPATIONAL QUALIF... OCCUPATIONS OLD AGE BENEFITS ONE PARENT FAMILIES OVERDRAFTS PART TIME EMPLOYMENT PARTNERSHIPS BUSINESS PATERNITY BENEFITS PATERNITY PAY PENSION BENEFITS PENSION CONTRIBUTIONS PENSIONS PERSONAL DEBT REPAY... PERSONAL FINANCE MA... PHYSICAL MOBILITY PLACE OF BIRTH PRIVATE PENSIONS PRIVATE PERSONAL PE... PROFIT SHARING PROFITS QUALIFICATIONS REDUNDANCY PAY RELIGIOUS AFFILIATION RELIGIOUS ATTENDANCE RENTED ACCOMMODATION RENTS RESIDENTIAL BUILDINGS RETIREMENT RETIREMENT AGE ROYALTIES SAVINGS SAVINGS ACCOUNTS AN... SECOND HOMES SELF EMPLOYED SELLING SHARED HOME OWNERSHIP SHARES SICK PAY SICKNESS AND DISABI... SOCIAL HOUSING SOCIAL SECURITY SOCIAL SECURITY BEN... SOCIO ECONOMIC STATUS SPOUSES STAKEHOLDER PENSIONS STATE RETIREMENT PE... STATUS IN EMPLOYMENT STUDENT LOANS SUBSIDIARY EMPLOYMENT SUPERVISORY STATUS SURVIVORS BENEFITS TAX RELIEF TAXATION TENANTS HOME PURCHA... TIED HOUSING TOP MANAGEMENT TRANSPORT FARES TRUSTS UNEARNED INCOME UNEMPLOYED UNFURNISHED ACCOMMO... UNWAGED WORKERS WAGES WAR VETERANS BENEFITS WEALTH WILLS WINNINGS WORKPLACE property and invest...
combined wine data
kaggle.com
Updated Nov 25, 2017
Siyuan H (2017). combined wine data [Dataset]. https://www.kaggle.com/datasets/siyuanh/combined-wine-data/code
Dataset updated
Nov 25, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Siyuan H
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

Title: Wine Quality

Sources

Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009

Past Usage:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).

Relevant Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant, so it could be interesting to test feature selection methods.

Number of Instances: red wine - 1599; white wine - 4898.

Number of Attributes: 11 + output attribute

Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.

Attribute information:

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)

Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Missing Attribute Values: None

Description of attributes:

1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)

2 - volatile acidity: the amount of acetic acid in wine, which at too high levels can lead to an unpleasant, vinegar taste

3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines

4 - residual sugar: the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet

5 - chlorides: the amount of salt in the wine

6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine

7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine

8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content

9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale

10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant

11 - alcohol: the percent alcohol content of the wine

Output variable (based on sensory data): 12 - quality (score between 0 and 10)
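
The description above mentions scoring regression models with MAD and with a fixed error tolerance (T). The following is a minimal Python sketch of those two ideas. It assumes MAD means the mean absolute deviation between true and predicted quality, and that a prediction counts as correct when its absolute error is at most T; the exact definitions in [Cortez et al., 2009] may differ. The function names and the sample scores are made up for illustration and are not taken from the dataset.

```python
from typing import Sequence


def mean_absolute_deviation(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between true and predicted quality scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_within_tolerance(
    y_true: Sequence[float], y_pred: Sequence[float], tolerance: float
) -> float:
    """Share of predictions whose absolute error is at most `tolerance`.

    Assumption: this is one plausible reading of "fixed error tolerance (T)".
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Hypothetical quality scores, for illustration only.
true_quality = [5, 6, 7, 5, 8]
predicted = [5.5, 6.0, 6.0, 4.0, 7.5]

mad = mean_absolute_deviation(true_quality, predicted)
acc_half = accuracy_within_tolerance(true_quality, predicted, tolerance=0.5)
acc_one = accuracy_within_tolerance(true_quality, predicted, tolerance=1.0)
print(mad, acc_half, acc_one)
```

Because the quality grades are ordered, a tolerance-based score lets a regression output be judged like a classifier: widening T from 0.5 to 1.0 treats a prediction one grade off as acceptable.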