13 datasets found
  1. ‘Swiss banknote conterfeit detection’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 17, 2004
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2004). ‘Swiss banknote conterfeit detection’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-swiss-banknote-conterfeit-detection-8fe6/051a31c5/?iid=001-267&v=presentation
    Dataset updated
    Jan 17, 2004
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Swiss banknote conterfeit detection’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/chrizzles/swiss-banknote-conterfeit-detection on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Will you be able to identify genuine and counterfeit banknotes, even if half of the data is counterfeit? Perfect for testing different outlier detection algorithms.

    Content

    The dataset includes information about the shape of the bill, as well as the label. It is made up of 200 banknotes in total: 100 genuine and 100 counterfeit.

    Attributes:
    - conterfeit: Whether a banknote is counterfeit (1) or genuine (0)
    - Length: Length of bill (mm)
    - Left: Width of left edge (mm)
    - Right: Width of right edge (mm)
    - Bottom: Bottom margin width (mm)
    - Top: Top margin width (mm)
    - Diagonal: Length of diagonal (mm)

    Original Data Source

    Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A practical approach. London: Chapman & Hall, Tables 1.1 and 1.2, pp. 5-8.

    Applications

    While it might be pretty easy for a classifier to decide whether the banknotes are counterfeit or not, what about methods using outlier detection? Classical methods of outlier detection won't work, since half of the data consists of outliers (counterfeit bills), so more robust methods will be needed.
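
    The masking effect described above can be illustrated with a minimal sketch. The diagonal lengths are synthetic placeholders, not values from the Flury and Riedwyl tables, and the |z| > 3 cutoff is an assumed convention for a "classical" outlier rule.

```python
import statistics

def classical_zscores(values):
    """Return (x - mean) / std for every value, using the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

# Synthetic diagonal lengths in mm (assumed for illustration only).
genuine = [141.0 + 0.01 * i for i in range(100)]
counterfeit = [139.0 + 0.01 * i for i in range(100)]
diagonals = genuine + counterfeit

z = classical_zscores(diagonals)

# With 50% contamination the mean sits between the two groups and the
# standard deviation is inflated, so no bill exceeds the cutoff.
flagged = [x for x, score in zip(diagonals, z) if abs(score) > 3]
print(len(flagged))  # prints 0
```

    Because the counterfeit half pulls the mean toward itself and inflates the standard deviation, every bill ends up well inside the cutoff, which is why the description asks for more robust methods.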

    --- Original source retains full ownership of the source dataset ---

  2. Data from: Instances and computational results of: Mathematical Programming...

    • dataverse.unimi.it
    text/x-fixed-field +2
    Updated Jan 31, 2024
    Cite
    Michele Barbato; Alberto Ceselli (2024). Instances and computational results of: Mathematical Programming for Simultaneous Feature Selection and Outlier Detection under l1 Norm [Dataset]. http://doi.org/10.13130/RD_UNIMI/LZA4F8
    Available download formats: text/x-fixed-field, tsv, txt
    Dataset updated
    Jan 31, 2024
    Dataset provided by
    UNIMI Dataverse
    Authors
    Michele Barbato; Alberto Ceselli
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains instances and computational results of "Mathematical Programming for Simultaneous Feature Selection and Outlier Detection under l1 Norm". See the included readme files for a description of the dataset content.

  3. Cardiotocogrpahy dataset

    • kaggle.com
    Updated Feb 6, 2020
    Cite
    Phan Viet Hoang (2020). Cardiotocogrpahy dataset [Dataset]. https://www.kaggle.com/warkingleo2000/cardiotocogrpahy-dataset/activity
    Dataset updated
    Feb 6, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Phan Viet Hoang
    Description

    Context

    This is the dataset from my first assignment at Cinnamon AI Bootcamp 20. I implemented a Gaussian Mixture Model from scratch for anomaly detection, available here or in the GitHub repo.
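
    For readers unfamiliar with the approach, the following is a minimal one-dimensional sketch of a Gaussian mixture fitted by EM and used to score anomalies by log-density. It is not the author's implementation; the function names, the initialisation, and the toy data are all assumptions made for illustration.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k=2, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    means = rng.sample(data, k)      # start the means at random data points
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue             # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2
                        for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)   # floor avoids collapse
            weights[j] = nj / n
    return weights, means, variances

def log_density(x, model):
    """Log of the mixture density at x; low values suggest anomalies."""
    weights, means, variances = model
    p = sum(w * gaussian_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances))
    return math.log(max(p, 1e-300))

# Toy data: two clusters around 0 and 10 (assumed, not the Cardio data).
data = [i * 0.1 for i in range(-10, 11)] + [10 + i * 0.1 for i in range(-10, 11)]
model = fit_gmm_1d(data)
print(log_density(50.0, model) < log_density(0.1, model))
```

    Points whose log-density falls below a chosen threshold would be flagged as anomalies; how to pick that threshold is left open here.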

    Content

    The data set contains measurements of fetal heart rate and uterine contraction from cardiotocograms. It was obtained from the UCI Machine Learning Repository.

    Acknowledgements

    The original Cardiotocography (Cardio) dataset from the UCI Machine Learning Repository consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians. This is a classification dataset, where the classes are normal, suspect, and pathologic. For outlier detection, the normal class forms the inliers, while the pathologic (outlier) class is downsampled to 176 points. The suspect class is discarded.
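
    The construction described above (keep normal, downsample pathologic, drop suspect) can be sketched as follows. The record layout, the label strings, and the random seed are assumptions; the source does not say how the 176 points were chosen.

```python
import random

def build_outlier_benchmark(records, n_outliers=176, seed=0):
    """Turn a three-class dataset into an inlier/outlier benchmark.

    records: list of (features, label) pairs, where label is one of
    "normal", "suspect" or "pathologic" (assumed label strings).
    Returns a list of (features, is_outlier) pairs.
    """
    inliers = [(f, 0) for f, label in records if label == "normal"]
    pathologic = [(f, 1) for f, label in records if label == "pathologic"]
    # "suspect" records match neither filter and are discarded.
    rng = random.Random(seed)
    outliers = rng.sample(pathologic, min(n_outliers, len(pathologic)))
    return inliers + outliers

# Toy usage with made-up feature vectors.
toy = ([([120.0], "normal")] * 10
       + [([135.0], "suspect")] * 5
       + [([160.0], "pathologic")] * 8)
benchmark = build_outlier_benchmark(toy, n_outliers=3)
print(len(benchmark), sum(flag for _, flag in benchmark))  # prints 13 3
```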


  4. ‘Red and White Wine Quality Analysis’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Red and White Wine Quality Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-red-and-white-wine-quality-analysis-0938/d129fe93/?iid=005-803&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Red and White Wine Quality Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/saigeethac/red-and-white-wine-quality-datasets on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Wine Quality Data Set

    This data set is available in UCI at https://archive.ics.uci.edu/ml/datasets/Wine+Quality.

    Abstract: Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.

    Data Set Information:

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

    Attribute Information:

    Input variables (based on physicochemical tests):

    1. fixed acidity
    2. volatile acidity
    3. citric acid
    4. residual sugar
    5. chlorides
    6. free sulfur dioxide
    7. total sulfur dioxide
    8. density
    9. pH
    10. sulphates
    11. alcohol

    Output variable (based on sensory data):

    1. quality (score between 0 and 10)

    These columns have been described in the Kaggle Data Explorer.

    Context

    The authors state "we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods." We briefly explored this aspect and found that red wine quality prediction performs almost the same on the test and training datasets (~88%) with just three features. Likewise, white wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors.
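
    One simple way to probe which inputs matter is to rank them by absolute Pearson correlation with the quality score. The sketch below is an assumed stand-in, not the method used in the linked repository, and the sample values are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def rank_features(columns, target, top_k):
    """Return the top_k column names by |correlation| with the target."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:top_k]

# Invented sample values; column names follow the attribute list above.
quality = [5, 5, 6, 6, 7, 7, 8]
columns = {
    "alcohol": [9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.5],
    "pH": [3.3, 3.2, 3.3, 3.2, 3.3, 3.2, 3.3],
    "volatile acidity": [0.70, 0.65, 0.60, 0.50, 0.45, 0.40, 0.30],
}
selected = rank_features(columns, quality, top_k=2)
print(sorted(selected))  # prints ['alcohol', 'volatile acidity']
```

    Correlation only captures linear, one-variable-at-a-time relationships, so a ranking like this is a starting point and not a substitute for model-based selection.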

    Content

    Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. Both these datasets are analyzed and linear regression models are developed in Python 3. The github link provided for the source code also includes a Flask web application for deployment on the local machine or on Heroku.

    Acknowledgements

    Datasets: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

    Banner Image: Photo by Roberta Sorge on Unsplash

    GitHub Link

    Complete code has been uploaded to GitHub at https://github.com/saigeethachandrashekar/wine_quality.

    Please clone the repo: it contains both datasets and the code required for building and saving the model on your local system. Code for a Flask app is provided for deploying the models on your local machine. The app can also be deployed on Heroku; the requirements.txt and Procfile are provided for this.

    Next Steps

    1. White wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors (e.g. there is no data about grape types, wine brand, wine selling price, etc.) or it may be due to other factors that are not clear. This is an area that might be worth exploring further.

    2. Other ML techniques may be applied to improve the accuracy.

    --- Original source retains full ownership of the source dataset ---

  5. Summary of findings.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 22, 2024
    Cite
    Ray, Joel G.; Walker, Mark C.; Clifford, Tammy; Giffen, Randy; Janoudi, Ghayath; Fell, Deshayne B.; Foster, Angel M.; Uzun, Mara (2024). Summary of findings. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001303534
    Dataset updated
    May 22, 2024
    Authors
    Ray, Joel G.; Walker, Mark C.; Clifford, Tammy; Giffen, Randy; Janoudi, Ghayath; Fell, Deshayne B.; Foster, Angel M.; Uzun, Mara
    Description

    Clinical discoveries largely depend on dedicated clinicians and scientists to identify and pursue unique and unusual clinical encounters with patients and communicate these through case reports and case series. This process has remained essentially unchanged throughout the history of modern medicine. However, these traditional methods are inefficient, especially considering the modern-day availability of health-related data and the sophistication of computer processing. Outlier analysis has been used in various fields to uncover unique observations, including fraud detection in finance and quality control in manufacturing. We propose that clinical discovery can be formulated as an outlier problem within an augmented intelligence framework to be implemented on any health-related data. Such an augmented intelligence approach would accelerate the identification and pursuit of clinical discoveries, advancing our medical knowledge and uncovering new therapies and management approaches. We define clinical discoveries as contextual outliers measured through an information-based approach and with a novelty-based root cause. Our augmented intelligence framework has five steps: define a patient population with a desired clinical outcome, build a predictive model, identify outliers through appropriate measures, investigate outliers through domain content experts, and generate scientific hypotheses. Recognizing that the field of obstetrics can particularly benefit from this approach, as it is traditionally neglected in commercial research, we conducted a systematic review to explore how outlier analysis is implemented in obstetric research. We identified two obstetrics-related studies that assessed outliers at an aggregate level for purposes outside of clinical discovery. Our findings indicate that using outlier analysis in clinical research in obstetrics and clinical research, in general, requires further development.
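
    The abstract does not define its information-based measure. One possible reading, shown here purely as an assumption, is to score each case by the surprisal of its observed outcome under the predictive model and to pass high-surprisal cases to domain experts. The threshold and case identifiers below are hypothetical.

```python
import math

def surprisal_bits(p_observed):
    """Information content, in bits, of an outcome predicted with probability p."""
    return -math.log2(p_observed)

def flag_contextual_outliers(cases, threshold_bits=3.0):
    """cases: list of (case_id, model probability of the observed outcome)."""
    return [case_id for case_id, p in cases
            if surprisal_bits(p) > threshold_bits]

# Hypothetical model outputs for three patients.
cases = [("patient-a", 0.90), ("patient-b", 0.50), ("patient-c", 0.05)]
print(flag_contextual_outliers(cases))  # prints ['patient-c']
```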

  6. K2-18 HARPS time-series - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Apr 25, 2023
    Cite
    (2023). K2-18 HARPS time-series - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/45f01dd6-e144-5e90-8b4c-4e7bc357e9d7
    Dataset updated
    Apr 25, 2023
    Description

    The cross-correlation function and template matching techniques have dominated the world of precision radial velocities for many years. Recently, a new technique, named line-by-line, has been developed as an outlier resistant way to efficiently extract radial velocity content from high resolution spectra. We apply this new method to archival HARPS and CARMENES datasets of the K2-18 system. After reprocessing the HARPS dataset with the line-by-line framework, we are able to replicate the findings of previous studies. Furthermore, by splitting the full wavelength range into sub-domains, we were able to identify a systematic chromatic correlation of the radial velocities in the reprocessed CARMENES dataset. After post-processing the radial velocities to remove this correlation, as well as rejecting some outlier nights, we robustly uncover the signal of both K2-18b and K2-18c, with masses that agree with those found from our analysis of the HARPS dataset. We then combine both the HARPS and CARMENES velocities to refine the parameters of both planets, notably resulting in a revised mass and period for K2-18c of 6.99+/-0.97 Earth masses and 9.2072+/-0.0065d, respectively. Our work thoroughly demonstrates the power of the line-by-line technique for the extraction of precision radial velocity information.

  7. Eligibility Criteria for the Systematic Review.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 22, 2024
    Cite
    Ghayath Janoudi; Mara Uzun (Rada); Deshayne B. Fell; Joel G. Ray; Angel M. Foster; Randy Giffen; Tammy Clifford; Mark C. Walker (2024). Eligibility Criteria for the Systematic Review. [Dataset]. http://doi.org/10.1371/journal.pdig.0000515.t002
    Available download formats: xls
    Dataset updated
    May 22, 2024
    Dataset provided by
    PLOS Digital Health
    Authors
    Ghayath Janoudi; Mara Uzun (Rada); Deshayne B. Fell; Joel G. Ray; Angel M. Foster; Randy Giffen; Tammy Clifford; Mark C. Walker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Clinical discoveries largely depend on dedicated clinicians and scientists to identify and pursue unique and unusual clinical encounters with patients and communicate these through case reports and case series. This process has remained essentially unchanged throughout the history of modern medicine. However, these traditional methods are inefficient, especially considering the modern-day availability of health-related data and the sophistication of computer processing. Outlier analysis has been used in various fields to uncover unique observations, including fraud detection in finance and quality control in manufacturing. We propose that clinical discovery can be formulated as an outlier problem within an augmented intelligence framework to be implemented on any health-related data. Such an augmented intelligence approach would accelerate the identification and pursuit of clinical discoveries, advancing our medical knowledge and uncovering new therapies and management approaches. We define clinical discoveries as contextual outliers measured through an information-based approach and with a novelty-based root cause. Our augmented intelligence framework has five steps: define a patient population with a desired clinical outcome, build a predictive model, identify outliers through appropriate measures, investigate outliers through domain content experts, and generate scientific hypotheses. Recognizing that the field of obstetrics can particularly benefit from this approach, as it is traditionally neglected in commercial research, we conducted a systematic review to explore how outlier analysis is implemented in obstetric research. We identified two obstetrics-related studies that assessed outliers at an aggregate level for purposes outside of clinical discovery. Our findings indicate that using outlier analysis in clinical research in obstetrics and clinical research, in general, requires further development.

  8. Data to accompany the outlier-waveform-detection Github repository (internal...

    • zenodo.org
    bin, txt
    Updated May 23, 2024
    Cite
    Karin Cox; Daisuke Kase; Robert Turner (2024). Data to accompany the outlier-waveform-detection Github repository (internal globus pallidus, GPi) [Dataset]. http://doi.org/10.5281/zenodo.11077189
    Available download formats: bin, txt
    Dataset updated
    May 23, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Karin Cox; Daisuke Kase; Robert Turner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains data based on neuronal recordings from two monkeys (G and I, in the pre- and post-MPTP states) that serve as input to the code provided at https://github.com/turner-lab-pitt/outlier-waveform-detection. Text files located within that Github repository provide detailed instructions on how these data may be used with that code. As described in those text files, extra data are provided for Monkey G, in the pre-MPTP state.

    The data-description.txt file provides detailed information regarding the contents of each zipped tar archive. Briefly, the most important components of the files are the "snips" (individual spike waveforms) from the two monkeys and MPTP states, as extracted for each of a series of single sorted units from the internal globus pallidus (GPi). The additional G-Pre data provides examples of the high-pass filtered voltage signals from which these snips were extracted. All data are stored in the Matlab .mat format.

    All zipped files can be decompressed with 7-zip: https://www.7-zip.org/

    These data and the associated Github code were used for analyses reported in an in-preparation manuscript (Kase et al., "Movement-related activity in the internal globus pallidus of the parkinsonian macaque"), and also with a preprint that is currently under review:

    Detecting rhythmic spiking through the power spectra of point process model residuals
    Karin M. Cox, Daisuke Kase, Taieb Znati, Robert S. Turner
    bioRxiv 2023.09.08.556120; doi: https://doi.org/10.1101/2023.09.08.556120

    This research was funded in part by Aligning Science Across Parkinson's [ASAP-020519] through the Michael J. Fox Foundation for Parkinson's Research (MJFF). For the purpose of open access, the authors have applied a Creative Commons Attribution 4.0 International (CC BY) public copyright license to this dataset.

  9. Gila Trout neutral and outlier SNP genotype matrices

    • datadryad.org
    • search.dataone.org
    • +2 more
    zip
    Cite
    David Camak; Megan Osborne; Thomas Turner, Gila Trout neutral and outlier SNP genotype matrices [Dataset]. http://doi.org/10.5061/dryad.g79cnp5pk
    Available download formats: zip
    Dataset provided by
    Dryad
    Authors
    David Camak; Megan Osborne; Thomas Turner
    Time period covered
    Apr 8, 2021
    Description

    Genotypes are coded as 0, 1, or 2 depending on the number of alternate alleles each individual carries (0 = homozygous reference allele, 1 = heterozygous, 2 = homozygous alternate allele). CHROM IDs refer to RefSeq sequence IDs of Oncorhynchus mykiss (Rainbow Trout) found at https://www.ncbi.nlm.nih.gov/assembly/GCF_002163495.1
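
    The 0/1/2 coding can be sketched as a count of alternate alleles per diploid call. The VCF-style input strings ("0/1", "1|1") are an assumption; the source does not state which format the matrices were derived from.

```python
def encode_genotype(call):
    """Count alternate alleles in a biallelic diploid call such as "0/1" or "1|1"."""
    alleles = call.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"unsupported genotype call: {call!r}")
    return sum(a == "1" for a in alleles)

# One row per individual, one column per SNP (made-up calls).
calls = [["0/0", "0/1"], ["1|1", "1/0"]]
matrix = [[encode_genotype(c) for c in row] for row in calls]
print(matrix)  # prints [[0, 1], [2, 1]]
```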

  10. Abnormal High-density Crowds

    • kaggle.com
    Updated Dec 3, 2019
    Cite
    Samar Mahmoud (2019). Abnormal High-density Crowds [Dataset]. https://www.kaggle.com/angelchi56/abnormal-highdensity-crowds/code
    Dataset updated
    Dec 3, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Samar Mahmoud
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The availability of a benchmark dataset of high-density crowd footage is very limited. Furthermore, a dataset of high-density crowd footage that includes anomalous behaviours such as stampedes, overcrowding, violence, and panic, is not available. To solve this, public footage, which adheres to these constraints, has been collected.

    Content

    Footage of four events has been collected: Times Square Chaos, Las Vegas Mass Shooting, Love Parade disaster and Juventus fan panic. All footage was obtained from YouTube.

    Acknowledgements

    I would like to thank my supervisors, for the patient guidance, encouragement and advice they constantly provide.

    Inspiration

    With this data, what are the best algorithms to detect anomalous behaviour within a high-density crowd? Are low/medium density crowd anomaly detection algorithms applicable to more dense crowds?

  11. Outliers - Catalogue, Press release and private view invite for exhibition

    • data.bathspa.ac.uk
    pdf
    Updated Jun 1, 2023
    + more versions
    Cite
    Rosemary Snell (2023). Outliers - Catalogue, Press release and private view invite for exhibition [Dataset]. http://doi.org/10.17870/bathspa.11537910.v1
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    BathSPAdata
    Authors
    Rosemary Snell
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    Outliers is a research project articulated through a solo exhibition held at No 20 Arts in London. It contained a body of 26 works including paintings, drawings and photographs that were the culmination of a research trip to Greenland. This body of work aimed to explore how the medium of paint could be manipulated to not only represent the dramatic and transient nature of the icescapes of Greenland but to also emulate and explore the properties of snow and ice themselves. This item contains the catalogue, press release and invite from the exhibition. All content included on this item courtesy and copyright of No. 20 Arts, London. Used with permission. The work is under copyright and may not be used without permission. Use of this repository acknowledges cooperation with its policies and relevant copyright law.

  12. Wealth and Assets Survey, Waves 1-5 and Rounds 5-7, 2006-2020: Secure Access...

    • b2find.eudat.eu
    Updated Oct 28, 2023
    + more versions
    Cite
    (2023). Wealth and Assets Survey, Waves 1-5 and Rounds 5-7, 2006-2020: Secure Access - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/60139fe5-3862-5bae-853e-764863b1e3ff
    Dataset updated
    Oct 28, 2023
    Description

    Abstract copyright UK Data Service and data collection copyright owner.

    The Wealth and Assets Survey (WAS) is a longitudinal survey, which aims to address gaps identified in data about the economic well-being of households by gathering information on level of assets, savings and debt; saving for retirement; how wealth is distributed among households or individuals; and factors that affect financial planning. Private households in Great Britain were sampled for the survey (meaning that people in residential institutions, such as retirement homes, nursing homes, prisons, barracks or university halls of residence, and also homeless people were not included).

    The WAS commenced in July 2006, with a first wave of interviews carried out over two years, to June 2008. Interviews were achieved with 30,595 households at Wave 1. Those households were approached again for a Wave 2 interview between July 2008 and June 2010, and 20,170 households took part. Wave 3 covered July 2010 - June 2012, Wave 4 covered July 2012 - June 2014 and Wave 5 covered July 2014 - June 2016. Revisions to previous waves' data mean that small differences may occur between originally published estimates and estimates from the datasets held by the UK Data Service. These revisions are due to improvements in the imputation methodology.

    Note from the WAS team - November 2023: "The Office for National Statistics has identified a very small number of outlier cases present in the seventh round of the Wealth and Assets Survey covering the period April 2018 to March 2020. Our current approach is to treat cases where we have reasonable evidence to suggest the values provided for specific variables are outliers. This approach did not occur for two individuals for several variables involved in the estimation of their pension wealth. While we estimate any impacts are very small overall and median pension wealth and median total wealth estimates are unaffected, this will affect the accuracy of the breakdowns of the pension wealth within the wealthiest decile, and data derived from them. We are urging caution in the interpretation of more detailed estimates."

    Survey Periodicity - "Waves" to "Rounds"

    Due to the survey periodicity moving from "Waves" (July, ending in June two years later) to “Rounds” (April, ending in March two years later), interviews using the ‘Wave 6’ questionnaire started in July 2016 and were conducted for 21 months, finishing in March 2018. Data for round 6 covers the period April 2016 to March 2018. This comprises the last three months of Wave 5 (April to June 2016) and 21 months of Wave 6 (July 2016 to March 2018). Round 5 and Round 6 datasets are based on a mixture of original wave-based datasets. Each wave of the survey has a unique questionnaire and therefore each of these round-based datasets is based on two questionnaires. While there may be some changes in the questionnaires, the derived variables for the key wealth estimates have not changed over this period. The aim is to collect the same data, though in some cases the exact questions asked may differ slightly. Detailed information on Moving the Wealth and Assets Survey onto a financial years’ basis was published on the ONS website in July 2019.

    Further information and documentation may be found on the ONS Wealth and Assets Survey webpage. Users are advised to check the page for updates before commencing analysis.

    Users should note that issues with linking have been reported and the WAS team are currently investigating.

    Secure Access WAS data

    The Secure Access version of the WAS includes additional, detailed geographical variables not included in the End User Licence (EUL) version (SN 7215). These include:
    - Wards
    - Parliamentary Constituency Areas for Wave 1 only
    - Census Output Areas
    - Lower Layer Super Output Areas
    - Local Authorities
    - Local Education Authorities

    Prospective users of the Secure Access version of the WAS will need to fulfil additional requirements, including completion of face-to-face training, and agreement to the Secure Access User Agreement and Licence Compliance Policy, in order to obtain permission to use that version (see 'Access' section below). Users are therefore strongly encouraged to download the EUL version (SN 7215) to see if it contains sufficient detail for their needs, before considering making an application for the Secure Access version.

    Latest Edition Information

    For the ninth edition (October 2022), the Round 7 person and household data have been updated. The Round 7 Wave 1 Variable Catalogue Excel file has also been updated.

    Main Topics: The WAS questionnaire was divided into two parts with all adults aged 16 years and over (excluding those aged 16 to 18 currently in full-time education) being interviewed in each responding household.

    Household schedule: This was completed by one person in the household (usually the head of household or their partner) and predominantly collected household level information such as the number, demographics and relationship of individuals to each other, as well as information about the ownership, value and mortgages on the residence and other household assets.

    Individual schedule: This was given to each adult in the household and asked questions about economic status, education and employment, business assets, benefits and tax credits, saving attitudes and behaviour, attitudes to debt, insolvency, major items of expenditure, retirement, attitudes to saving for retirement, pensions, financial assets, non-mortgage debt, investments and other income.

Multi-stage stratified random sample Face-to-face interview 2006 2020 ADOPTION PAY AGE AIRCRAFT ALIMONY ASSETS ATTITUDES TO SAVING BANK ACCOUNTS BEDROOMS BICYCLES BOATS BONDS BUSINESS OWNERSHIP BUSINESS RECORDS BUSINESSES CARAVANS CARE OF DEPENDANTS CARERS BENEFITS CARS CHILD BENEFITS CHILD SUPPORT PAYMENTS CHILD TRUST FUNDS COHABITING COMMERCIAL BUILDINGS COST OF LIVING COSTS CREDIT CARD USE DEBILITATIVE ILLNESS DEBTS DISABILITIES EARLY RETIREMENT ECONOMIC ACTIVITY EDUCATIONAL BACKGROUND EDUCATIONAL COURSES EDUCATIONAL FEES EDUCATIONAL GRANTS EDUCATIONAL STATUS EMPLOYEES EMPLOYMENT EMPLOYMENT HISTORY EMPLOYMENT PROGRAMMES ENDOWMENT ASSURANCE ESTATES ETHNIC GROUPS EXPENDITURE FAMILY BENEFITS FAMILY INCOME FAMILY MEMBERS FINANCIAL ADVICE FINANCIAL COMPENSATION FINANCIAL DIFFICULTIES FINANCIAL SERVICES FREQUENCY OF PAY FRINGE BENEFITS FULL TIME EMPLOYMENT FURNISHED ACCOMMODA... GENDER GIFTS Great Britain HEALTH HEALTH STATUS HIRE PURCHASE HOME BUILDINGS INSU... HOME BUYING HOME CONTENTS INSUR... HOME OWNERSHIP HOUSE PRICES HOUSEHOLD BUDGETS HOUSEHOLD HEAD S EC... HOUSEHOLD HEAD S SO... HOUSEHOLD INCOME HOUSEHOLDERS HOUSEHOLDS HOUSING HOUSING AGE HOUSING ECONOMICS HOUSING FINANCE HOUSING TENURE ILL HEALTH INCOME INCOME TAX INCONTINENCE INFORMAL CARE INHERITANCE INSOLVENCIES INSURANCE CLAIMS INTELLECTUAL IMPAIR... INTEREST FINANCE INVESTMENT Income JOB HUNTING JOB SEEKER S ALLOWANCE LAND OWNERSHIP LAND VALUE LANDLORDS LIFE INSURANCE LOANS Labour and employment MAIL ORDER SERVICES MARITAL STATUS MATERNITY BENEFITS MATERNITY PAY MATHEMATICS MOBILE HOMES MORTGAGE ARREARS MORTGAGE PROTECTION... MORTGAGES MOTOR VEHICLE VALUE MOTOR VEHICLES MOTORCYCLES OCCUPATIONAL PENSIONS OCCUPATIONAL QUALIF... OCCUPATIONS OLD AGE BENEFITS ONE PARENT FAMILIES OVERDRAFTS PART TIME EMPLOYMENT PARTNERSHIPS BUSINESS PATERNITY BENEFITS PATERNITY PAY PENSION BENEFITS PENSION CONTRIBUTIONS PENSIONS PERSONAL DEBT REPAY... PERSONAL FINANCE MA... 
PHYSICAL MOBILITY PLACE OF BIRTH PRIVATE PENSIONS PRIVATE PERSONAL PE... PROFIT SHARING PROFITS QUALIFICATIONS REDUNDANCY PAY RELIGIOUS AFFILIATION RELIGIOUS ATTENDANCE RENTED ACCOMMODATION RENTS RESIDENTIAL BUILDINGS RETIREMENT RETIREMENT AGE ROYALTIES SAVINGS SAVINGS ACCOUNTS AN... SECOND HOMES SELF EMPLOYED SELLING SHARED HOME OWNERSHIP SHARES SICK PAY SICKNESS AND DISABI... SOCIAL HOUSING SOCIAL SECURITY SOCIAL SECURITY BEN... SOCIO ECONOMIC STATUS SPOUSES STAKEHOLDER PENSIONS STATE RETIREMENT PE... STATUS IN EMPLOYMENT STUDENT LOANS SUBSIDIARY EMPLOYMENT SUPERVISORY STATUS SURVIVORS BENEFITS TAX RELIEF TAXATION TENANTS HOME PURCHA... TIED HOUSING TOP MANAGEMENT TRANSPORT FARES TRUSTS UNEARNED INCOME UNEMPLOYED UNFURNISHED ACCOMMO... UNWAGED WORKERS WAGES WAR VETERANS BENEFITS WEALTH WILLS WINNINGS WORKPLACE property and invest...

  13. combined wine data

    • kaggle.com
    Updated Nov 25, 2017
    Cite
    Siyuan H (2017). combined wine data [Dataset]. https://www.kaggle.com/datasets/siyuanh/combined-wine-data/code
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Siyuan H
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

    Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

    Title: Wine Quality

    Sources

    Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009

    Past Usage:

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

    In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).

    Relevant Information:

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
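The MAD and fixed-tolerance metrics mentioned under Past Usage can be illustrated with a minimal sketch. It assumes MAD means the mean absolute deviation between actual and predicted quality scores, and that a prediction counts as correct when it falls within the tolerance T of the actual score. The function names and the sample scores are made up for illustration and are not taken from the dataset or from the paper's code.

```python
def mean_absolute_deviation(actual, predicted):
    """Average absolute gap between actual and predicted quality scores."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy_within_tolerance(actual, predicted, tolerance):
    """Share of predictions lying within `tolerance` of the actual score."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Hypothetical scores, not real rows from the wine data.
sample_actual = [5, 6, 7, 5, 4]
sample_predicted = [5.4, 5.2, 6.9, 6.1, 4.5]

print(round(mean_absolute_deviation(sample_actual, sample_predicted), 2))
print(accuracy_within_tolerance(sample_actual, sample_predicted, 0.5))
print(accuracy_within_tolerance(sample_actual, sample_predicted, 1.0))
```

A wider tolerance can only keep or raise the accuracy, which is why results of this kind are usually reported for more than one value of T.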

    These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

    Number of Instances: red wine - 1599; white wine - 4898.

    Number of Attributes: 11 + output attribute

    Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.

    Attribute information:

    For more information, read [Cortez et al., 2009].

    Input variables (based on physicochemical tests):
    1 - fixed acidity (tartaric acid - g / dm^3)
    2 - volatile acidity (acetic acid - g / dm^3)
    3 - citric acid (g / dm^3)
    4 - residual sugar (g / dm^3)
    5 - chlorides (sodium chloride - g / dm^3)
    6 - free sulfur dioxide (mg / dm^3)
    7 - total sulfur dioxide (mg / dm^3)
    8 - density (g / cm^3)
    9 - pH
    10 - sulphates (potassium sulphate - g / dm^3)
    11 - alcohol (% by volume)

    Output variable (based on sensory data):
    12 - quality (score between 0 and 10)

    Missing Attribute Values: None

    Description of attributes:

    1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)

    2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste

    3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines

    4 - residual sugar: the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet

    5 - chlorides: the amount of salt in the wine

    6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine

    7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine

    8 - density: the density of wine is close to that of water depending on the percent alcohol and sugar content

    9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale

    10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant

    11 - alcohol: the percent alcohol content of the wine

    Output variable (based on sensory data): 12 - quality (score between 0 and 10)
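The note above that several attributes may be correlated can be checked before any feature selection. The following is a minimal sketch using only the Python standard library: it computes the Pearson correlation for each pair of columns and lists pairs above a chosen threshold. The column values are made up for illustration (they are not rows from the wine data), and the helper names and the 0.9 threshold are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical values chosen so that two columns move together.
toy_columns = {
    "free sulfur dioxide": [10, 20, 30, 40],
    "total sulfur dioxide": [30, 50, 70, 90],
    "pH": [3.3, 3.1, 3.4, 3.2],
}

for a, b, r in correlated_pairs(toy_columns, threshold=0.9):
    print(f"{a} ~ {b}: r = {r:.2f}")
```

On real data, a strongly correlated pair suggests that one of the two attributes may add little information, though Pearson correlation only captures linear relationships.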
