34 datasets found
  1. Data from: Outlier classification using autoencoders: application for...

    • osti.gov
    • dataverse.harvard.edu
    Updated Jun 2, 2021
    + more versions
    Cite
    Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center (2021). Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas [Dataset]. http://doi.org/10.7910/DVN/SKEHRJ
    Dataset updated
    Jun 2, 2021
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center
    Description

    Understanding the statistics of fluctuation driven flows in the boundary layer of magnetically confined plasmas is desired to accurately model the lifetime of the vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allow us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs for cases in which the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid and invalid, we find that such a classification may be ambiguous for up to 40% of data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that mostly characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases in the analysis. By removing the outliers that are identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are between approximately 5% and 15% lower than when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.

  2. The 12 outliers identified in the Tonga dataset.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Anderson B. Mayfield; Chii-Shiarng Chen; Alexandra C. Dempsey (2023). The 12 outliers identified in the Tonga dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0185857.t004
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Anderson B. Mayfield; Chii-Shiarng Chen; Alexandra C. Dempsey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Tonga
    Description

    Gene expression data have been presented as non-normalized (2^-Ct × 10^9) in all but the last six rows; this allows for the back-calculation of the raw threshold cycle (Ct) values so that interested individuals can readily estimate the typical range of expression of each gene. Values representing aberrant levels for a particular parameter (z-score>2.5) have been highlighted in bold. When there was a statistically significant difference (student’s t-test, p0.05). SA = surface area. GCP = genome copy proportion. Ma Dis = Mahalanobis distance. “.” = missing data.
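
    As a rough illustration of the two calculations this description mentions, the Python sketch below recovers Ct from an expression value, assuming the values were computed as 2^-Ct × 10^9, and flags values whose z-score exceeds 2.5. The function names, the use of the sample standard deviation, and the example numbers are assumptions made for illustration; none of them come from the dataset itself.

```python
import math
from statistics import mean, stdev


def ct_from_expression(value, scale=1e9):
    """Invert value = 2**(-Ct) * scale to recover the threshold cycle Ct."""
    return -math.log2(value / scale)


def flag_aberrant(values, cutoff=2.5):
    """Mark values whose z-score is above the cutoff (one-sided, as written in the text)."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (an assumption)
    if sd == 0:
        return [False] * len(values)
    return [(v - mu) / sd > cutoff for v in values]


# Hypothetical example: eleven similar values and one much larger value.
example = [1.0] * 11 + [10.0]
flags = flag_aberrant(example)
```

    Whether the original analysis used a one-sided or two-sided cutoff is not stated here; replacing the comparison with `abs(v - mu) / sd > cutoff` would give the two-sided variant.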

  3. LOF calculation time (seconds) comparison.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Jihwan Lee; Nam-Wook Cho (2023). LOF calculation time (seconds) comparison. [Dataset]. http://doi.org/10.1371/journal.pone.0165972.t003
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Jihwan Lee; Nam-Wook Cho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LOF calculation time (seconds) comparison.

  4. Effect sizes calculated using MD and MC, excluding outliers

    • dro.deakin.edu.au
    • researchdata.edu.au
    txt
    Updated Nov 7, 2024
    Cite
    Don Driscoll (2024). Effect sizes calculated using MD and MC, excluding outliers [Dataset]. http://doi.org/10.26187/deakin.26264351.v1
    Available download formats: txt
    Dataset updated
    Nov 7, 2024
    Dataset provided by
    Deakin University (http://www.deakin.edu.au/)
    Authors
    Don Driscoll
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Effect sizes calculated using mean difference for burnt-unburnt study designs and mean change for before-after designs. Outliers, as defined in the methods section of the paper, were excluded prior to calculating effect sizes.

  5. DataSheet_1_Research on outlier detection in CTD conductivity data based on...

    • frontiersin.figshare.com
    docx
    Updated Jun 21, 2023
    Cite
    Long Yu; Jia Sun; Yanliang Guo; Baohua Zhang; Guangbing Yang; Liang Chen; Xia Ju; Fanlin Yang; Xuejun Xiong; Xianqing Lv (2023). DataSheet_1_Research on outlier detection in CTD conductivity data based on cubic spline fitting.docx [Dataset]. http://doi.org/10.3389/fmars.2022.1030980.s001
    Available download formats: docx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Long Yu; Jia Sun; Yanliang Guo; Baohua Zhang; Guangbing Yang; Liang Chen; Xia Ju; Fanlin Yang; Xuejun Xiong; Xianqing Lv
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Outlier detection is the key to the quality control of marine survey data. For the detection of outliers in Conductivity-Temperature-Depth (CTD) data, previous methods, such as the Wild Edit method and the Median Filter Combined with Maximum Deviation method, mostly set a threshold based on statistics. Values greater than the threshold are treated as outliers, but there is no clear rule for selecting the threshold, so multiple attempts are required. The process is time-consuming and inefficient, and the results have high false negative and false positive rates. In response to this problem, we proposed an outlier detection method for CTD conductivity data based on a physical constraint, the continuity of seawater. The method constructs a cubic spline fitting function, based on the independent points scheme and cubic spline interpolation, to fit the conductivity data. The points with the maximum fitting residual are flagged as outliers. The fitting stops when the optimal number of iterations is reached, which is obtained automatically from the minimum value of the sequence of maximum fitting residuals. Verification of the accuracy and stability of the method by means of examples shows that it has a lower false negative rate (17.88%) and false positive rate (0.24%) than the other methods: the rates for the Wild Edit method are 56.96% and 2.19%, and those for the Median Filter Combined with Maximum Deviation method are 23.28% and 0.31%. The Cubic Spline Fitting method is simple to operate, gives clear and definite results, and better solves the problem of detecting conductivity outliers.

  6. Water quality test data

    • scidb.cn
    Updated Oct 26, 2022
    Cite
    HuiyunFeng; JingangJiang (2022). Water quality test data [Dataset]. http://doi.org/10.57760/sciencedb.05375
    Dataset updated
    Oct 26, 2022
    Dataset provided by
    Science Data Bank
    Authors
    HuiyunFeng; JingangJiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Outliers are often present in large datasets of water quality monitoring time series data. A method of combining the sliding window technique with Dixon detection criterion for the automatic detection of outliers in time series data is limited by the empirical determination of sliding window sizes. The scientific determination of the optimal sliding window size is very meaningful research work. This paper presents a new Monte Carlo Search Method (MCSM) based on random sampling to optimize the size of the sliding window, which fully takes advantage of computers and statistics. The MCSM was applied in a case study to automatic monitoring data of water quality factors in order to test its validity and usefulness. The results of comparing the accuracy and efficiency of the MCSM show that the new method in this paper is scientific and effective. The experimental results show that, at different sample sizes, the average accuracy is between 58.70% and 75.75%, and the average computation time increase is between 17.09% and 45.53%. In the era of big data in environmental monitoring, the proposed new methods can meet the required accuracy of outlier detection and improve the efficiency of calculation.
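
    The sliding-window Dixon step described above can be sketched in Python as follows. This is only a minimal illustration: it uses the basic Dixon Q ratio (gap divided by range), the function names and the sample series are hypothetical, and the critical value is passed in by the caller because it depends on the window size and confidence level. The Monte Carlo search over window sizes is not reproduced here.

```python
def dixon_q(window):
    """Return (Q, suspect) for the most extreme value in a window.

    Q is the gap between the suspect value and its nearest neighbour,
    divided by the range of the window.
    """
    s = sorted(window)
    spread = s[-1] - s[0]
    if spread == 0:
        return 0.0, s[0]
    q_low = (s[1] - s[0]) / spread
    q_high = (s[-1] - s[-2]) / spread
    return (q_high, s[-1]) if q_high >= q_low else (q_low, s[0])


def sliding_dixon(series, window_size, q_crit):
    """Return sorted indices flagged by Dixon's Q test in at least one window."""
    flagged = set()
    for start in range(len(series) - window_size + 1):
        window = series[start:start + window_size]
        q, suspect = dixon_q(window)
        if q > q_crit:
            flagged.add(start + window.index(suspect))
    return sorted(flagged)


# Hypothetical series with one spike; 0.710 is a commonly tabulated
# 95% critical value for n = 5 (check a table for other settings).
readings = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8, 10.3, 10.1, 9.9]
spikes = sliding_dixon(readings, window_size=5, q_crit=0.710)
```

    Changing `window_size` changes both which neighbours a point is compared with and which critical value applies, which is why the choice of window size matters for this approach.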

  7. 11: Streamwater sample constituent concentration outliers from 15 watersheds...

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). 11: Streamwater sample constituent concentration outliers from 15 watersheds in Gwinnett County, Georgia for water years 2003-2020 [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/11-streamwater-sample-constituent-concentration-outliers-from-15-watersheds-in-gwinne-2003
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Gwinnett County, Georgia
    Description

    This dataset contains a list of outlier sample concentrations identified for 17 water quality constituents from streamwater samples collected at 15 study watersheds in Gwinnett County, Georgia for water years 2003 to 2020. The 17 water quality constituents are: biochemical oxygen demand (BOD), chemical oxygen demand (COD), total suspended solids (TSS), suspended sediment concentration (SSC), total nitrogen (TN), total nitrate plus nitrite (NO3NO2), total ammonia plus organic nitrogen (TKN), dissolved ammonia (NH3), total phosphorus (TP), dissolved phosphorus (DP), total organic carbon (TOC), total calcium (Ca), total magnesium (Mg), total copper (TCu), total lead (TPb), total zinc (TZn), and total dissolved solids (TDS). 885 outlier concentrations were identified. Outliers were excluded from model calibration datasets used to estimate streamwater constituent loads for 12 of these constituents. Outlier concentrations were removed because they had a high influence on the model fits of the concentration relations, which could substantially affect model predictions. Identified outliers were also excluded from loads that were calculated using the Beale ratio estimator. Notes on reason(s) for considering a concentration as an outlier are included.

  8. Capital Ratios For Acute Care Hospitals

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    John Snow Labs (2021). Capital Ratios For Acute Care Hospitals [Dataset]. https://www.johnsnowlabs.com/marketplace/capital-ratios-for-acute-care-hospitals/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    United States
    Description

    This dataset is used to determine whether a case qualifies for outlier payments under the hospital inpatient prospective payment system (IPPS). Hospital-specific cost-to-charge ratios are applied to the total covered charges for the case. Operating and capital costs for the case are calculated separately by applying separate operating and capital cost-to-charge ratios; these costs are then combined and compared with the fixed-loss outlier threshold.
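
    The arithmetic described above can be sketched in Python as below. The function names, parameter names, and dollar amounts are hypothetical, and the sketch covers only the comparison stated in the description; it is not a full implementation of the IPPS payment rules.

```python
def estimated_case_cost(covered_charges, operating_ccr, capital_ccr):
    """Estimate case cost by applying each cost-to-charge ratio to covered charges."""
    operating_cost = covered_charges * operating_ccr
    capital_cost = covered_charges * capital_ccr
    return operating_cost + capital_cost


def exceeds_outlier_threshold(covered_charges, operating_ccr, capital_ccr, threshold):
    """True when the combined estimated cost is above the fixed-loss outlier threshold."""
    cost = estimated_case_cost(covered_charges, operating_ccr, capital_ccr)
    return cost > threshold


# Hypothetical case: 200,000 in covered charges, ratios of 0.25 and 0.02.
cost = estimated_case_cost(200_000, operating_ccr=0.25, capital_ccr=0.02)
```

    With these made-up inputs the estimated cost is about 54,000, so the case would exceed a threshold of 50,000 but not one of 60,000.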

  9. R code

    • figshare.com
    txt
    Updated Jun 5, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Christine Dodge (2017). R code [Dataset]. http://doi.org/10.6084/m9.figshare.5021297.v1
    Available download formats: txt
    Dataset updated
    Jun 5, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christine Dodge
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code used for each data set to perform negative binomial regression, calculate the overdispersion statistic, generate summary statistics, and remove outliers.

  10. HGW: Lead, Average total content (surface)

    • data.europa.eu
    Updated Aug 28, 2024
    Cite
    (2024). HGW: Lead, Average total content (surface) [Dataset]. https://data.europa.eu/88u/dataset/29e63b6a-6db6-ea92-29f8-e5260aaf3001
    Dataset updated
    Aug 28, 2024
    Description

    The median (synonym: 50th percentile, central value) is used as the mean value. It is the value above or below which 50% of all cases in a data group lie. The calculation is carried out on outlier-adjusted data sets. The total content is determined from the aqua regia extract (according to DIN ISO 11466 (1997)). The concentration is given in mg/kg. The content classes take into account, among other things, the precautionary values of the BBodSchV (1999). These are 40 mg/kg for sand, 70 mg/kg for loam, silt and very silty sand, and 100 mg/kg for clay. According to LABO (2003), a sample count of >=20 is required for the calculation of background values. However, the map also shows groups with a sample count >= 10. This information is then only informal and not representative.
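
    A minimal Python sketch of this kind of calculation is shown below. The description does not say how outliers are adjusted, so the 1.5 × IQR fence used here is purely an assumption, as are the function name, the status labels, the choice to count samples after outlier removal, and the example concentrations.

```python
from statistics import median, quantiles


def background_value(samples, min_representative=20, min_shown=10):
    """Return (median, sample count, status) after a simple outlier adjustment.

    Outlier rule (assumed, not from the source): drop values outside
    the 1.5 * IQR fences around the quartiles.
    """
    q1, _, q3 = quantiles(samples, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [x for x in samples if lower <= x <= upper]
    n = len(kept)
    if n >= min_representative:
        status = "representative"
    elif n >= min_shown:
        status = "informal"
    else:
        status = "not shown"
    return median(kept), n, status


# Hypothetical concentrations in mg/kg: twelve ordinary values and one extreme value.
concentrations = list(range(1, 13)) + [500]
value, count, status = background_value(concentrations)
```

    In this made-up example the extreme value is dropped, leaving 12 samples, so the median is reported but only with the "informal" label because the count is below 20.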

  11. Extended 1.0 Dataset of "Concentration and Geospatial Modelling of Health...

    • zenodo.org
    bin, csv, pdf
    Updated Sep 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Peter Domjan; Viola Angyal; Istvan Vingender (2024). Extended 1.0 Dataset of "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary" [Dataset]. http://doi.org/10.5281/zenodo.13826993
    Available download formats: bin, pdf, csv
    Dataset updated
    Sep 23, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Domjan; Viola Angyal; Istvan Vingender
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 23, 2024
    Area covered
    Hungary
    Description

    Introduction

    We are enclosing the database used in our research titled "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary", along with our statistical calculations. For the sake of reproducibility, further information can be found in the file Short_Description_of_Data_Analysis.pdf and Statistical_formulas.pdf

    The sharing of data is part of our aim to strengthen the base of our scientific research. As of March 7, 2024, the detailed submission and analysis of our research findings to a scientific journal has not yet been completed.

    The dataset was expanded on 23rd September 2024 to include SPSS statistical analysis data, a heatmap, and buffer zone analysis around the Health Development Offices (HDOs) created in QGIS software.

    Short Description of Data Analysis and Attached Files (datasets):

    Our research utilised data from 2022, serving as the basis for statistical standardisation. The 2022 Hungarian census provided an objective basis for our analysis, with age group data available at the county level from the Hungarian Central Statistical Office (KSH) website. The 2022 demographic data provided an accurate picture compared to the data available from the 2023 microcensus. The used calculation is based on our standardisation of the 2022 data. For xlsx files, we used MS Excel 2019 (version: 1808, build: 10406.20006) with the SOLVER add-in.

    Hungarian Central Statistical Office served as the data source for population by age group, county, and regions: https://www.ksh.hu/stadat_files/nep/hu/nep0035.html, (accessed 04 Jan. 2024.) with data recorded in MS Excel in the Data_of_demography.xlsx file.

    In 2022, 108 Health Development Offices (HDOs) were operational, and it's noteworthy that no developments have occurred in this area since 2022. The availability of these offices and the demographic data from the Central Statistical Office in Hungary are considered public interest data, freely usable for research purposes without requiring permission.

    The contact details for the Health Development Offices were sourced from the following page (Hungarian National Population Centre (NNK)): https://www.nnk.gov.hu/index.php/efi (n=107). The Semmelweis University Health Development Centre was not listed by NNK, hence it was separately recorded as the 108th HDO. More information about the office can be found here: https://semmelweis.hu/egeszsegfejlesztes/en/ (n=1). (accessed 05 Dec. 2023.)

    Geocoordinates were determined using Google Maps (N=108): https://www.google.com/maps. (accessed 02 Jan. 2024.) Recording of geocoordinates (latitude and longitude according to WGS 84 standard), address data (postal code, town name, street, and house number), and the name of each HDO was carried out in the: Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.

    The foundational software for geospatial modelling and display (QGIS 3.34), an open-source software, can be downloaded from: https://qgis.org/en/site/forusers/download.html. (accessed 04 Jan. 2024.)

    The HDOs_GeoCoordinates.gpkg QGIS project file contains Hungary's administrative map and the recorded addresses of the HDOs from the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file, imported via .csv file.

    The OpenStreetMap tileset is directly accessible from www.openstreetmap.org in QGIS. (accessed 04 Jan. 2024.)

    The Hungarian county administrative boundaries were downloaded from the following website: https://data2.openstreetmap.hu/hatarok/index.php?admin=6 (accessed 04 Jan. 2024.)

    HDO_Buffers.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding buffer zones with a radius of 7.5 km.

    Heatmap.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding heatmap (Kernel Density Estimation).

    A brief description of the statistical formulas applied is included in the Statistical_formulas.pdf.

    Recording of our base data for statistical concentration and diversification measurement was done using MS Excel 2019 (version: 1808, build: 10406.20006) in .xlsx format.

    • Aggregated number of HDOs by county: Number_of_HDOs.xlsx
    • Standardised data (Number of HDOs per 100,000 residents): Standardized_data.xlsx
    • Calculation of the Lorenz curve: Lorenz_curve.xlsx
    • Calculation of the Gini index: Gini_Index.xlsx
    • Calculation of the LQ index: LQ_Index.xlsx
    • Calculation of the Herfindahl-Hirschman Index: Herfindahl_Hirschman_Index.xlsx
    • Calculation of the Entropy index: Entropy_Index.xlsx
    • Regression and correlation analysis calculation: Regression_correlation.xlsx
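
    For orientation, three of the concentration measures listed above can be sketched in Python using their standard textbook definitions. The formulas actually applied by the authors are in Statistical_formulas.pdf and may differ in detail; the function names and the example counts below are hypothetical.

```python
import math


def gini_index(values):
    """Gini index of non-negative values (0 means a perfectly even distribution)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def herfindahl_hirschman(values):
    """Sum of squared shares; ranges from 1/n (even) to 1 (fully concentrated)."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)


def shannon_entropy(values):
    """Entropy of the share distribution in nats; zero shares contribute nothing."""
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


# Hypothetical office counts for four regions.
even = [2, 2, 2, 2]
concentrated = [0, 0, 0, 4]
```

    For the even example the Gini index is 0 and the Herfindahl-Hirschman value is 0.25, while the concentrated example gives a Gini index of 0.75 and a Herfindahl-Hirschman value of 1.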

    Using the SPSS 29.0.1.0 program, we performed the following statistical calculations with the databases Data_HDOs_population_without_outliers.sav and Data_HDOs_population.sav:

    • Regression curve estimation with elderly population and number of HDOs, excluding outlier values (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_elderly_without_outlier.spv
    • Pearson correlation table between the total population, elderly population, and number of HDOs per county, excluding outlier values such as Budapest and Pest County: Pearson_Correlation_populations_HDOs_number_without_outliers.spv.
    • Dot diagram including total population and number of HDOs per county, excluding outlier values such as Budapest and Pest Counties: Dot_HDO_total_population_without_outliers.spv.
    • Dot diagram including elderly (64<) population and number of HDOs per county, excluding outlier values such as Budapest and Pest Counties: Dot_HDO_elderly_population_without_outliers.spv
    • Regression curve estimation with total population and number of HDOs, excluding outlier values (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_without_outlier.spv
    • Dot diagram including elderly (64<) population and number of HDOs per county: Dot_HDO_elderly_population.spv
    • Dot diagram including total population and number of HDOs per county: Dot_HDO_total_population.spv
    • Pearson correlation table between the total population, elderly population, and number of HDOs per county: Pearson_Correlation_populations_HDOs_number.spv
    • Regression curve estimation with total population and number of HDOs, (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_total_population.spv

    For easier readability, the files have been provided in both SPV and PDF formats.

    The translation of these supplementary files into English was completed on 23rd Sept. 2024.

    If you have any further questions regarding the dataset, please contact the corresponding author: domjan.peter@phd.semmelweis.hu

  12. Probability-Density-Ranking (PDR) outliers and Most Probable Range (MPR) of...

    • springernature.figshare.com
    application/gzip
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chenghua Shao; Huanwang Yang; Sijiang Wang; Zonghong Liu; Stephen K. Burley (2023). Probability-Density-Ranking (PDR) outliers and Most Probable Range (MPR) of PDB data [Dataset]. http://doi.org/10.6084/m9.figshare.7150124.v1
    Available download formats: application/gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Chenghua Shao; Huanwang Yang; Sijiang Wang; Zonghong Liu; Stephen K. Burley
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data and code to calculate Probability-Density-Ranking (PDR) outliers and Most Probable Range (MPR)

  13. Arsenic, mean total content (topsoil)

    • ckan.mobidatalab.eu
    html, map +2
    Updated Jun 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Landesamt für Geologie und Bergbau (2023). Arsenic, mean total content (topsoil) [Dataset]. https://ckan.mobidatalab.eu/dataset/arsen-mittlerer-gesamtgehalt-oberboden
    Available download formats: web application, html, wms, map
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Landesamt für Geologie und Bergbau
    License

    Data licence Germany – Attribution – Version 2.0: https://www.govdata.de/dl-de/by-2-0
    License information was derived automatically

    Description

    The median (synonym: 50th percentile, central value) is used as the mean value. It is the value above or below which 50% of all cases in a data group lie. The calculation is carried out on outlier-free data sets. The total content is determined from the aqua regia extract (according to DIN ISO 11466 (1997)). The concentration is given in mg/kg. The BBodSchV (1999) does not set any precautionary values for arsenic. According to LABO (2003), a sample number of >=20 is required for the calculation of background values. However, groups with a number of samples >= 10 are also shown on the map. This information is then only informal and not representative. Further information on definitions of terms, horizon grouping and statistical evaluation: (http://mapserver.lgb-rlp.de/php_hgw_bod/meta/Background values_Hinweise.pdf). Terms of use: see http://www.lgb-rlp.de/karten-und- products/online-maps/terms-of-use-for-online-maps.html

  14. HGW: Nickel, Average total content (surface) | gimi9.com

    • gimi9.com
    Updated Aug 28, 2024
    Cite
    (2024). HGW: Nickel, Average total content (surface) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-www-lgb-rlp-de-registry-spatial-dataset-d2f431c4-b2b1-9168-14f3-7e374c40d2f0
    Dataset updated
    Aug 28, 2024
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The median (synonym: 50th percentile, central value) is used as the mean value. It is the value above or below which 50% of all cases in a data group lie. The calculation is carried out on outlier-adjusted data sets. The total content is determined from the aqua regia extract (according to DIN ISO 11466 (1997)). The concentration is given in mg/kg. The content classes take into account, among other things, the precautionary values of the BBodSchV (1999). These are 15 mg/kg for the soil type sand, 50 mg/kg for loam, silt and heavily silty sand, and 70 mg/kg for clay. According to LABO (2003), a sample count of >=20 is required for the calculation of background values. However, the map also shows groups with a sample count >= 10. This information is then only informal and not representative.

  15. Data from: OUTLIERS DETECTION BY RANSAC ALGORITHM IN THE TRANSFORMATION OF...

    • scielo.figshare.com
    jpeg
    Updated Jun 11, 2023
    Cite
    Joanna Janicka; Jacek Rapinski (2023). OUTLIERS DETECTION BY RANSAC ALGORITHM IN THE TRANSFORMATION OF 2D COORDINATE FRAMES [Dataset]. http://doi.org/10.6084/m9.figshare.14327643.v1
    Available download formats: jpeg
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    SciELO journals
    Authors
    Joanna Janicka; Jacek Rapinski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Over the years, a number of different computational methods have been developed that allow for the identification of outliers. Methods for robust estimation are known in the set of M-estimation methods (derived from the method of Maximum Likelihood Estimation) or in the set of R-estimation methods (robust estimation based on the application of some rank test). There are also algorithms that are not classified in any of these groups but are also resistant to gross errors, for example M-split estimation. Another proposal, which can be used to detect outliers in the process of transforming coordinates, where the coordinates of some points may be affected by gross errors, is the RANSAC (Random Sample Consensus) algorithm. The authors present a study of 2D transformation parameter estimation in which the RANSAC algorithm is used to detect points whose coordinates contain outliers. The calculations were performed in three scenarios on a real geodetic network. Selected coordinates were burdened with simulated errors to confirm the efficiency of the proposed method.
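
    To make the idea concrete, here is a small Python sketch of RANSAC applied to a 2D coordinate transformation. It assumes a four-parameter similarity transformation written with complex numbers (target = a × source + b); the description does not state which transformation model the authors used, and the function names, tolerance, iteration count, and test points are all hypothetical.

```python
import random


def fit_similarity(s1, s2, d1, d2):
    """Solve d = a * s + b for complex a (scale and rotation) and b (shift) from two point pairs."""
    a = (d2 - d1) / (s2 - s1)
    b = d1 - a * s1
    return a, b


def ransac_similarity(src, dst, tol, iterations=200, seed=0):
    """Return (a, b, outlier_indices) for the model with the largest consensus set."""
    rng = random.Random(seed)
    indices = range(len(src))
    best_a, best_b, best_outliers = None, None, []
    best_count = -1
    for _ in range(iterations):
        i, j = rng.sample(indices, 2)  # minimal sample: two point pairs
        if src[i] == src[j]:
            continue  # degenerate sample, cannot define a transformation
        a, b = fit_similarity(src[i], src[j], dst[i], dst[j])
        inliers = {k for k in indices if abs(a * src[k] + b - dst[k]) <= tol}
        if len(inliers) > best_count:
            best_count = len(inliers)
            best_a, best_b = a, b
            best_outliers = [k for k in indices if k not in inliers]
    return best_a, best_b, best_outliers


# Hypothetical network: eight points, two of them burdened with gross errors.
true_a, true_b = 1.0 + 0.5j, 100 + 200j
source = [0 + 0j, 10 + 0j, 10 + 10j, 0 + 10j, 5 + 5j, 20 + 5j, 15 + 20j, 3 + 17j]
target = [true_a * s + true_b for s in source]
target[2] += 3.0
target[5] -= 4.0j
a, b, outliers = ransac_similarity(source, target, tol=0.01)
```

    A practical workflow would usually re-estimate the transformation parameters from all inliers by least squares after the outliers are identified; that refinement step is left out of this sketch.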

  16. Lead, mean total content (topsoil)

    • ckan.mobidatalab.eu
    html, map +2
    Updated Jun 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Landesamt für Geologie und Bergbau (2023). Lead, mean total content (topsoil) [Dataset]. https://ckan.mobidatalab.eu/dataset/blei-mittlerer-gesamtgehalt-oberboden
    Available download formats: map, web application, wms, html
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Landesamt für Geologie und Bergbau
    License

    Data licence Germany – Attribution – Version 2.0: https://www.govdata.de/dl-de/by-2-0
    License information was derived automatically

    Description

    The median (synonym: 50th percentile, central value) is used as the mean value. It is the value above or below which 50% of all cases in a data group lie. The calculation is carried out on outlier-free data sets. The total content is determined from the aqua regia extract (according to DIN ISO 11466 (1997)). The concentration is given in mg/kg. The content classes take into account, among other things, the precautionary values of the BBodSchV (1999). These are 40 mg/kg for the soil type sand, 70 mg/kg for loam, silt and very silty sand, and 100 mg/kg for clay. According to LABO (2003), a sample number of >=20 is required for the calculation of background values. However, groups with a number of samples >= 10 are also shown on the map. This information is then only informal and not representative.

  17. Data from: Simulation as a new tool to establish benchmark outcome measures...

    • datadryad.org
    zip
    Updated Jun 17, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Matt M. Kurrek; Pamela Morgan; Steven Howard; Peter Kranke; Aaron Calhoun; Joshua Hui; Alex Kiss (2016). Simulation as a new tool to establish benchmark outcome measures in obstetrics [Dataset]. http://doi.org/10.5061/dryad.8s511
    Available download formats: zip
    Dataset updated
    Jun 17, 2016
    Dataset provided by
    Dryad
    Authors
    Matt M. Kurrek; Pamela Morgan; Steven Howard; Peter Kranke; Aaron Calhoun; Joshua Hui; Alex Kiss
    Time period covered
    Dec 20, 2014
    Description

    20141220 Database for PLoS ONE Manuscript

  18. Sensitivity analysis on pooled HR of mortality after excluding the outliers...

    • datasetcatalog.nlm.nih.gov
    Updated Mar 14, 2014
    Cite
    Qin, Xue; Liu, Zhengtao; Peng, Tao; Wang, Linlin; Ning, Huaijun; Que, Shuping (2014). Sensitivity analysis on pooled HR of mortality after excluding the outliers classified by age. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001207275
    Explore at:
    Dataset updated
    Mar 14, 2014
    Authors
    Qin, Xue; Liu, Zhengtao; Peng, Tao; Wang, Linlin; Ning, Huaijun; Que, Shuping
    Description

    aHR calculated in the younger subgroup was the dose-response evaluation assessed per 5 U/l of ALT increment; HR calculated in the older subgroup was the evaluation compared between higher and lower ALT categories. Abbreviations: CI: confidence interval; CV: cardiovascular; HR: hazard ratio.

  19. Data_Sheet_1_Computational Surveillance of Microbial Water Quality With...

    • frontiersin.figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Marie C. Sadler; Jérémy Senouillet; Simon Kuenzi; Luigino Grasso; Douglas C. Watson (2023). Data_Sheet_1_Computational Surveillance of Microbial Water Quality With Online Flow Cytometry.PDF [Dataset]. http://doi.org/10.3389/frwa.2020.586969.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Marie C. Sadler; Jérémy Senouillet; Simon Kuenzi; Luigino Grasso; Douglas C. Watson
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Automated flow cytometry (FCM) adapted to real-time quality surveillance provides high-temporal-resolution data about the microbial communities in a water system. The cell concentration calculated from FCM measurements indicates sudden increases in the number of bacteria, but can fluctuate significantly due to man-made and natural dynamics; it can thus obscure the presence of microbial anomalies. Cytometric fingerprinting tools enable a detailed analysis of the aquatic microbial communities, and could distinguish between normal and abnormal community changes. However, the vast majority of current cytometric fingerprinting tools use offline statistical computations which cannot detect anomalies immediately. Here, we present a computational model, entitled Microbial Community Change Detection (MCCD), which transforms microbial community characteristics into an online process control signal (herein called outlier score) that remains close to zero if the microbial community remains stable and increases with fluctuations in the community. The model is based on fingerprints and distance-based outlier calculations. We tested it in silico and in vitro by simulating acute contaminations to real-world water systems with large inherent microbial fluctuations. We showed that the outlier score was robust against these dynamic variations, while reliably detecting intentional contaminations. This model can be used with automated FCM to quickly detect potential microbiological contamination, especially when the time between treatment and distribution is very short.
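To make the idea of a distance-based outlier score concrete, here is a minimal sketch. It is not the MCCD model: the description does not give the fingerprint definition or the distance measure, so the Euclidean distance to the mean of a set of baseline fingerprints, and the name `outlier_score`, are assumptions chosen only to show a signal that stays near zero for a stable community and grows as it changes.

```python
import math


def outlier_score(fingerprint, baseline):
    """Euclidean distance from a fingerprint to the mean baseline fingerprint.

    Illustrative assumption, not the published MCCD calculation.
    `baseline` is a non-empty list of fingerprints of equal length.
    """
    n = len(baseline)
    # Component-wise mean of the baseline fingerprints.
    center = [sum(component) / n for component in zip(*baseline)]
    return math.dist(fingerprint, center)
```

A fingerprint identical to the baseline mean scores 0.0, and the score rises as the community composition drifts away from it.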

  20. Data from: PCP-SAFT Parameters of Pure Substances Using Large Experimental...

    • acs.figshare.com
    zip
    Updated Sep 6, 2023
    Cite
    Timm Esper; Gernot Bauer; Philipp Rehner; Joachim Gross (2023). PCP-SAFT Parameters of Pure Substances Using Large Experimental Databases [Dataset]. http://doi.org/10.1021/acs.iecr.3c02255.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    ACS Publications
    Authors
    Timm Esper; Gernot Bauer; Philipp Rehner; Joachim Gross
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This work reports pure component parameters for the PCP-SAFT equation of state for 1842 substances using a total of approximately 551 172 experimental data points for vapor pressure and liquid density. We utilize data from commercial and public databases in combination with an automated workflow to assign chemical identifiers to all substances, remove duplicate data sets, and filter unsuitable data. The use of raw experimental data, as opposed to pseudoexperimental data from empirical correlations, requires means to identify and remove outliers, especially for vapor pressure data. We apply robust regression using a Huber loss function. For identifying and removing outliers, the empirical Wagner equation for vapor pressure is adjusted to experimental data, because the Wagner equation is mathematically rather flexible and is thus not subject to a systematic model bias. For adjusting model parameters of the PCP-SAFT model, nonpolar, dipolar and associating substances are distinguished. The resulting substance-specific parameters of the PCP-SAFT equation of state yield a mean absolute relative deviation of 2.73% for vapor pressure and 0.52% for liquid densities (2.56% and 0.47% for nonpolar substances, 2.67% and 0.61% for dipolar substances, and 3.24% and 0.54% for associating substances) when evaluated against outlier-removed data. All parameters are provided as JSON and CSV files.
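The Huber loss mentioned in this description can be written down in a few lines. The sketch below shows the standard textbook form only; the threshold `delta=1.0` is an assumed default, since the description does not state the value or the residual scaling used by the authors.

```python
def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear for large residuals.

    `delta` is an assumed threshold, not a value taken from the dataset.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

With `delta=1.0`, a residual of 3.0 costs 2.5 instead of the 4.5 that half the squared error would give, which is why a single outlying vapor pressure point pulls a Huber fit less than a least-squares fit.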
