34 datasets found

Data from: Outlier classification using autoencoders: application for...
osti.gov
dataverse.harvard.edu
Updated Jun 2, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center (2021). Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas [Dataset]. http://doi.org/10.7910/DVN/SKEHRJ
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SKEHRJ
Dataset updated
Jun 2, 2021
Dataset provided by
Office of Sciencehttp://www.er.doe.gov/
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center
Description
Understanding the statistics of fluctuation driven flows in the boundary layer of magnetically confined plasmas is desired to accurately model the lifetime of the vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allow us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs for cases in which the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid and invalid, we find that such a classification may be ambiguous for up to 40% of data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that mostly characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases in the analysis. By removing the outliers that are identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are between approximately 5% and 15% lower as when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.
f
The 12 outliers identified in the Tonga dataset.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anderson B. Mayfield; Chii-Shiarng Chen; Alexandra C. Dempsey (2023). The 12 outliers identified in the Tonga dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0185857.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0185857.t004
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Anderson B. Mayfield; Chii-Shiarng Chen; Alexandra C. Dempsey
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Tonga
Description
Gene expression data have been presented as non-normalized (2-Ct*109) in all but the last six rows; this allows for the back-calculation of the raw threshold cycle (Ct) values so that interested individuals can readily estimate the typical range of expression of each gene. Values representing aberrant levels for a particular parameter (z-score>2.5) have been highlighted in bold. When there was a statistically significant difference (student’s t-test, p0.05). SA = surface area. GCP = genome copy proportion. Ma Dis = Mahalanobis distance. “.” = missing data.
f
LOF calculation time (seconds) comparison.
plos.figshare.com
xls
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jihwan Lee; Nam-Wook Cho (2023). LOF calculation time (seconds) comparison. [Dataset]. http://doi.org/10.1371/journal.pone.0165972.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0165972.t003
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Jihwan Lee; Nam-Wook Cho
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LOF calculation time (seconds) comparison.
Effect sizes calculated using MD and MC, excluding outliers
dro.deakin.edu.au
researchdata.edu.au
txt
Updated Nov 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Don Driscoll (2024). Effect sizes calculated using MD and MC, excluding outliers [Dataset]. http://doi.org/10.26187/deakin.26264351.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.26187/deakin.26264351.v1
Dataset updated
Nov 7, 2024
Dataset provided by
Deakin Universityhttp://www.deakin.edu.au/
Authors
Don Driscoll
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Effect sizes calculated using mean difference for burnt-unburnt study designs and mean change for before-after desings. Outliers, as defined in the methods section of the paper, were excluded prior to calculating effect sizes.
f
DataSheet_1_Research on outlier detection in CTD conductivity data based on...
frontiersin.figshare.com
docx
Updated Jun 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Long Yu; Jia Sun; Yanliang Guo; Baohua Zhang; Guangbing Yang; Liang Chen; Xia Ju; Fanlin Yang; Xuejun Xiong; Xianqing Lv (2023). DataSheet_1_Research on outlier detection in CTD conductivity data based on cubic spline fitting.docx [Dataset]. http://doi.org/10.3389/fmars.2022.1030980.s001
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/fmars.2022.1030980.s001
Dataset updated
Jun 21, 2023
Dataset provided by
Frontiers
Authors
Long Yu; Jia Sun; Yanliang Guo; Baohua Zhang; Guangbing Yang; Liang Chen; Xia Ju; Fanlin Yang; Xuejun Xiong; Xianqing Lv
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Outlier detection is the key to the quality control of marine survey data. For the detection of outliers in Conductivity-Temperature-Depth (CTD) data, previous methods, such as the Wild Edit method and the Median Filter Combined with Maximum Deviation method, mostly set a threshold based on statistics. Values greater than the threshold are treated as outliers, but there is no clear specification for the selection of threshold, thus multiple attempts are required. The process is time-consuming and inefficient, and the results have high false negative and positive rates. In response to this problem, we proposed an outlier detection method in CTD conductivity data, based on a physical constraint, the continuity of seawater. The method constructs a cubic spline fitting function based on the independent points scheme and the cubic spline interpolation to fit the conductivity data. The maximum fitting residual points will be flagged as outliers. The fitting stops when the optimal number of iterations is reached, which is automatically obtained by the minimum value of the sequence of maximum fitting residuals. Verification of the accuracy and stability of the method by means of examples proves that it has a lower false negative rate (17.88%) and false positive rate (0.24%) than other methods. Indeed, rates for the Wild Edit method are 56.96% and 2.19%, while for the Median Filter Combined with Maximum Deviation method rates are 23.28% and 0.31%. The Cubic Spline Fitting method is simple to operate, the result is clear and definite, better solved the problem of conductivity outliers detection.
S
Water quality test data
scidb.cn
Updated Oct 26, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HuiyunFeng; JingangJiang (2022). Water quality test data [Dataset]. http://doi.org/10.57760/sciencedb.05375
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.05375
Dataset updated
Oct 26, 2022
Dataset provided by
Science Data Bank
Authors
HuiyunFeng; JingangJiang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Outliers are often present in large datasets of water quality monitoring time series data. A method of combining the sliding window technique with Dixon detection criterion for the automatic detection of outliers in time series data is limited by the empirical determination of sliding window sizes. The scientific determination of the optimal sliding window size is very meaningful research work. This paper presents a new Monte Carlo Search Method (MCSM) based on random sampling to optimize the size of the sliding window, which fully takes advantage of computers and statistics. The MCSM was applied in a case study to automatic monitoring data of water quality factors in order to test its validity and usefulness. The results of comparing the accuracy and efficiency of the MCSM show that the new method in this paper is scientific and effective. The experimental results show that, at different sample sizes, the average accuracy is between 58.70% and 75.75%, and the average computation time increase is between 17.09% and 45.53%. In the era of big data in environmental monitoring, the proposed new methods can meet the required accuracy of outlier detection and improve the efficiency of calculation.
v
11: Streamwater sample constituent concentration outliers from 15 watersheds...
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
data.usgs.gov
+1more
Updated Jul 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). 11: Streamwater sample constituent concentration outliers from 15 watersheds in Gwinnett County, Georgia for water years 2003-2020 [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/11-streamwater-sample-constituent-concentration-outliers-from-15-watersheds-in-gwinne-2003
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Gwinnett County, Georgia
Description
This dataset contains a list of outlier sample concentrations identified for 17 water quality constituents from streamwater sample collected at 15 study watersheds in Gwinnett County, Georgia for water years 2003 to 2020. The 17 water quality constituents are: biochemical oxygen demand (BOD), chemical oxygen demand (COD), total suspended solids (TSS), suspended sediment concentration (SSC), total nitrogen (TN), total nitrate plus nitrite (NO3NO2), total ammonia plus organic nitrogen (TKN), dissolved ammonia (NH3), total phosphorus (TP), dissolved phosphorus (DP), total organic carbon (TOC), total calcium (Ca), total magnesium (Mg), total copper (TCu), total lead (TPb), total zinc (TZn), and total dissolved solids (TDS). 885 outlier concentrations were identified. Outliers were excluded from model calibration datasets used to estimate streamwater constituent loads for 12 of these constituents. Outlier concentrations were removed because they had a high influence on the model fits of the concentration relations, which could substantially affect model predictions. Identified outliers were also excluded from loads that were calculated using the Beale ratio estimator. Notes on reason(s) for considering a concentration as an outlier are included.
Capital Ratios For Acute Care Hospitals
johnsnowlabs.com
csv
Updated Jan 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Snow Labs (2021). Capital Ratios For Acute Care Hospitals [Dataset]. https://www.johnsnowlabs.com/marketplace/capital-ratios-for-acute-care-hospitals/
Explore at:
csvAvailable download formats
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Area covered
United States
Description
This dataset is used to determine whether a case qualifies for outlier payments under the hospital inpatient prospective payment system (IPPS), hospital-specific cost-to-charge ratios are applied to the total covered charges for the case. Operating and capital costs for the case are calculated separately by applying separate operating and capital cost-to-charge ratios and combining these costs to compare them with the fixed-loss outlier threshold.
R code
figshare.com
txt
Updated Jun 5, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christine Dodge (2017). R code [Dataset]. http://doi.org/10.6084/m9.figshare.5021297.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5021297.v1
Dataset updated
Jun 5, 2017
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Christine Dodge
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
R code used for each data set to perform negative binomial regression, calculate overdispersion statistic, generate summary statistics, remove outliers
e
HGW: Lead, Average total content (surface)
data.europa.eu
Updated Aug 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). HGW: Lead, Average total content (surface) [Dataset]. https://data.europa.eu/88u/dataset/29e63b6a-6db6-ea92-29f8-e5260aaf3001
Explore at:
Dataset updated
Aug 28, 2024
Description
The mean is the median (synonym: 50. percentile, central value). It is the value above or below which 50% of all cases of a data group are located. The calculation is carried out on outlier-adjusted data collectives. The total content is determined from the aqua regia extract (according to DIN ISO 11466 (1997)). The concentration is given in mg/kg. The salary classes take into account, among other things, the pension values of the BBodSchV (1999). These are 40 mg/kg for sand, 70 mg/kg for clay, silt and very silty sand and 100 mg/kg for clay. According to LABO (2003) a sample count of >=20 is required for the calculation of background values. However, the map also shows groups with a sample count >= 10. This information is then only informal and not representative.
Extended 1.0 Dataset of "Concentration and Geospatial Modelling of Health...
zenodo.org
bin, csv, pdf
Updated Sep 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Peter Domjan; Peter Domjan; Viola Angyal; Viola Angyal; Istvan Vingender; Istvan Vingender (2024). Extended 1.0 Dataset of "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary" [Dataset]. http://doi.org/10.5281/zenodo.13826993
Explore at:
bin, pdf, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.13826993
Dataset updated
Sep 23, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Peter Domjan; Peter Domjan; Viola Angyal; Viola Angyal; Istvan Vingender; Istvan Vingender
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Sep 23, 2024
Area covered
Hungary
Description
Introduction

We are enclosing the database used in our research titled "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary", along with our statistical calculations. For the sake of reproducibility, further information can be found in the file Short_Description_of_Data_Analysis.pdf and Statistical_formulas.pdf

The sharing of data is part of our aim to strengthen the base of our scientific research. As of March 7, 2024, the detailed submission and analysis of our research findings to a scientific journal has not yet been completed.

The dataset was expanded on 23rd September 2024 to include SPSS statistical analysis data, a heatmap, and buffer zone analysis around the Health Development Offices (HDOs) created in QGIS software.

Short Description of Data Analysis and Attached Files (datasets):

Our research utilised data from 2022, serving as the basis for statistical standardisation. The 2022 Hungarian census provided an objective basis for our analysis, with age group data available at the county level from the Hungarian Central Statistical Office (KSH) website. The 2022 demographic data provided an accurate picture compared to the data available from the 2023 microcensus. The used calculation is based on our standardisation of the 2022 data. For xlsx files, we used MS Excel 2019 (version: 1808, build: 10406.20006) with the SOLVER add-in.

Hungarian Central Statistical Office served as the data source for population by age group, county, and regions: https://www.ksh.hu/stadat_files/nep/hu/nep0035.html, (accessed 04 Jan. 2024.) with data recorded in MS Excel in the Data_of_demography.xlsx file.

In 2022, 108 Health Development Offices (HDOs) were operational, and it's noteworthy that no developments have occurred in this area since 2022. The availability of these offices and the demographic data from the Central Statistical Office in Hungary are considered public interest data, freely usable for research purposes without requiring permission.

The contact details for the Health Development Offices were sourced from the following page (Hungarian National Population Centre (NNK)): https://www.nnk.gov.hu/index.php/efi (n=107). The Semmelweis University Health Development Centre was not listed by NNK, hence it was separately recorded as the 108th HDO. More information about the office can be found here: https://semmelweis.hu/egeszsegfejlesztes/en/ (n=1). (accessed 05 Dec. 2023.)

Geocoordinates were determined using Google Maps (N=108): https://www.google.com/maps. (accessed 02 Jan. 2024.) Recording of geocoordinates (latitude and longitude according to WGS 84 standard), address data (postal code, town name, street, and house number), and the name of each HDO was carried out in the: Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.

The foundational software for geospatial modelling and display (QGIS 3.34), an open-source software, can be downloaded from:

https://qgis.org/en/site/forusers/download.html. (accessed 04 Jan. 2024.)

The HDOs_GeoCoordinates.gpkg QGIS project file contains Hungary's administrative map and the recorded addresses of the HDOs from the

Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file,

imported via .csv file.

The OpenStreetMap tileset is directly accessible from www.openstreetmap.org in QGIS. (accessed 04 Jan. 2024.)

The Hungarian county administrative boundaries were downloaded from the following website: https://data2.openstreetmap.hu/hatarok/index.php?admin=6 (accessed 04 Jan. 2024.)

HDO_Buffers.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding buffer zones with a radius of 7.5 km.

Heatmap.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding heatmap (Kernel Density Estimation).

A brief description of the statistical formulas applied is included in the Statistical_formulas.pdf.

Recording of our base data for statistical concentration and diversification measurement was done using MS Excel 2019 (version: 1808, build: 10406.20006) in .xlsx format.

Aggregated number of HDOs by county: Number_of_HDOs.xlsx

Standardised data (Number of HDOs per 100,000 residents): Standardized_data.xlsx

Calculation of the Lorenz curve: Lorenz_curve.xlsx

Calculation of the Gini index: Gini_Index.xlsx

Calculation of the LQ index: LQ_Index.xlsx

Calculation of the Herfindahl-Hirschman Index: Herfindahl_Hirschman_Index.xlsx

Calculation of the Entropy index: Entropy_Index.xlsx

Regression and correlation analysis calculation: Regression_correlation.xlsx

Using the SPSS 29.0.1.0 program, we performed the following statistical calculations with the databases Data_HDOs_population_without_outliers.sav and Data_HDOs_population.sav:

Regression curve estimation with elderly population and number of HDOs, excluding outlier values (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_elderly_without_outlier.spv

Pearson correlation table between the total population, elderly population, and number of HDOs per county, excluding outlier values such as Budapest and Pest County: Pearson_Correlation_populations_HDOs_number_without_outliers.spv.

Dot diagram including total population and number of HDOs per county, excluding outlier values such as Budapest and Pest Counties: Dot_HDO_total_population_without_outliers.spv.

Dot diagram including elderly (64<) population and number of HDOs per county, excluding outlier values such as Budapest and Pest Counties: Dot_HDO_elderly_population_without_outliers.spv

Regression curve estimation with total population and number of HDOs, excluding outlier values (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_without_outlier.spv

Dot diagram including elderly (64<) population and number of HDOs per county: Dot_HDO_elderly_population.spv

Dot diagram including total population and number of HDOs per county: Dot_HDO_total_population.spv

Pearson correlation table between the total population, elderly population, and number of HDOs per county: Pearson_Correlation_populations_HDOs_number.spv

Regression curve estimation with total population and number of HDOs, (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_total_population.spv

For easier readability, the files have been provided in both SPV and PDF formats.

The translation of these supplementary files into English was completed on 23rd Sept. 2024.

If you have any further questions regarding the dataset, please contact the corresponding author: domjan.peter@phd.semmelweis.hu
Probability-Density-Ranking (PDR) outliers and Most Probable Range (MPR) of...
springernature.figshare.com
application/gzip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chenghua Shao; Huanwang Yang; Sijiang Wang; Zonghong Liu; Stephen K. Burley (2023). Probability-Density-Ranking (PDR) outliers and Most Probable Range (MPR) of PDB data [Dataset]. http://doi.org/10.6084/m9.figshare.7150124.v1
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7150124.v1
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Chenghua Shao; Huanwang Yang; Sijiang Wang; Zonghong Liu; Stephen K. Burley
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data and code to calculate Probability-Density-Ranking (PDR) outliers and Most Probable Range (MPR)
C
Arsenic, mean total content (topsoil)
ckan.mobidatalab.eu
html, karte +2
Updated Jun 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Landesamt für Geologie und Bergbau (2023). Arsenic, mean total content (topsoil) [Dataset]. https://ckan.mobidatalab.eu/dataset/arsen-mittlerer-gesamtgehalt-oberboden
Explore at:
webanwendung, html, wms, karteAvailable download formats
Dataset updated
Jun 13, 2023
Dataset provided by
Landesamt für Geologie und Bergbau
License
Data licence Germany – Attribution – Version 2.0https://www.govdata.de/dl-de/by-2-0
License information was derived automatically
Description
The median (synonym: 50th percentile, central value) is used as the mean value. It is the value above or below which 50% of all cases in a data group are. The calculation is carried out on outlier-free data collectives. The total content is determined from the aqua regia extract (according to DIN ISO 11466 (1997)). The concentration is given in mg/kg. The BBodSchV (1999) does not set any precautionary values for arsenic. According to LABO (2003), a sample number of >=20 is required for the calculation of background values. However, groups with a number of samples >= 10 are also shown on the map. This information is then only informal and not representative. Further information on definitions of terms, horizon grouping and statistical evaluation: (http://mapserver.lgb-rlp.de/php_hgw_bod/meta/Background values_Hinweise.pdf) Terms of use see: http://www.lgb-rlp.de/karten-und- products/online-maps/terms-of-use-for-online-maps.html
g
HGW: Nickel, Average total content (surface) | gimi9.com
gimi9.com
Updated Aug 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). HGW: Nickel, Average total content (surface) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-www-lgb-rlp-de-registry-spatial-dataset-d2f431c4-b2b1-9168-14f3-7e374c40d2f0
Explore at:
Dataset updated
Aug 28, 2024
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The mean is the median (synonym: 50. percentile, central value). It is the value above or below which 50% of all cases of a data group are located. The calculation is carried out on outlier-adjusted data collectives. The total content is determined from the aqua regia extract (according to DIN ISO 11466 (1997)). The concentration is given in mg/kg. The salary classes take into account, among other things, the pension values of the BBodSchV (1999). These are 15 mg/kg for the soil type sand, 50 mg/kg for clay, silt and heavily silty sand and 70 mg/kg for clay. According to LABO (2003) a sample count of >=20 is required for the calculation of background values. However, the map also shows groups with a sample count >= 10. This information is then only informal and not representative.
f
Data from: OUTLIERS DETECTION BY RANSAC ALGORITHM IN THE TRANSFORMATION OF...
scielo.figshare.com
jpeg
Updated Jun 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joanna Janicka; Jacek Rapinski (2023). OUTLIERS DETECTION BY RANSAC ALGORITHM IN THE TRANSFORMATION OF 2D COORDINATE FRAMES [Dataset]. http://doi.org/10.6084/m9.figshare.14327643.v1
Explore at:
jpegAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14327643.v1
Dataset updated
Jun 11, 2023
Dataset provided by
SciELO journals
Authors
Joanna Janicka; Jacek Rapinski
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Over the years there have been a number of different computational methods that allow for the identification of outliers. Methods for robust estimation are known in the set of M-estimates methods (derived from the method of Maximum Likelihood Estimation) or in the set of R-estimation methods (robust estimation based on the application of some rank test). There are also algorithms that are not classified in any of these groups but these methods are also resistant to gross errors, for example, in M-split estimation. Another proposal, which can be used to detect outliers in the process of transformation of coordinates, where the coordinates of some points may be affected by gross errors, can be a method called RANSAC algorithm (Random Sample and Consensus). The authors present a study that was performed in the process of 2D transformation parameter estimation using RANSAC algorithm to detect points that have coordinates with outliers. The calculations were performed in three scenarios on the real geodetic network. Selected coordinates were burdened with simulated values of errors to confirm the efficiency of the proposed method.
C
Lead, mean total content (topsoil)
ckan.mobidatalab.eu
html, karte +2
Updated Jun 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Landesamt für Geologie und Bergbau (2023). Lead, mean total content (topsoil) [Dataset]. https://ckan.mobidatalab.eu/dataset/blei-mittlerer-gesamtgehalt-oberboden
Explore at:
karte, webanwendung, wms, htmlAvailable download formats
Dataset updated
Jun 13, 2023
Dataset provided by
Landesamt für Geologie und Bergbau
License
Data licence Germany – Attribution – Version 2.0https://www.govdata.de/dl-de/by-2-0
License information was derived automatically
Description
The median (synonym: 50th percentile, central value) is used as the mean value. It is the value above or below which 50% of all cases in a data group are. The calculation is carried out on outlier-free data collectives. The total content is determined from the aqua regia extract (according to DIN ISO 11466 (1997)). The concentration is given in mg/kg. The salary classes take into account, among other things, the precautionary values of the BBodSchV (1999). These are 40 mg/kg for the soil type sand, 70 mg/kg for loam, silt and very silty sand and 100 mg/kg for clay. According to LABO (2003), a sample number of >=20 is required for the calculation of background values. However, groups with a number of samples >= 10 are also shown on the map. This information is then only informal and not representative.
d
Data from: Simulation as a new tool to establish benchmark outcome measures...
datadryad.org
zip
Updated Jun 17, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matt M. Kurrek; Pamela Morgan; Steven Howard; Peter Kranke; Aaron Calhoun; Joshua Hui; Alex Kiss (2016). Simulation as a new tool to establish benchmark outcome measures in obstetrics [Dataset]. http://doi.org/10.5061/dryad.8s511
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.8s511
Dataset updated
Jun 17, 2016
Dataset provided by
Dryad
Authors
Matt M. Kurrek; Pamela Morgan; Steven Howard; Peter Kranke; Aaron Calhoun; Joshua Hui; Alex Kiss
Time period covered
Dec 20, 2014
Description
20141220 Database for PLoS ONE Manuscript
f
Sensitivity analysis on pooled HR of mortality after excluding the outliers...
datasetcatalog.nlm.nih.gov
Updated Mar 14, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Qin, Xue; Liu, Zhengtao; Peng, Tao; Wang, Linlin; Ning, Huaijun; Que, Shuping (2014). Sensitivity analysis on pooled HR of mortality after excluding the outliers classified by age. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001207275
Explore at:
Dataset updated
Mar 14, 2014
Authors
Qin, Xue; Liu, Zhengtao; Peng, Tao; Wang, Linlin; Ning, Huaijun; Que, Shuping
Description
aHR calculated in younger subgroup was the dose-response evaluation assessed per 5 U/l of ALT increment;HR calculated in older subgroup was the evaluation compared between higher and lower ALT categories.Abbreviations: CI: confidence interval; CV: cardiovascular; HR: hazard ratio.
f
Data_Sheet_1_Computational Surveillance of Microbial Water Quality With...
frontiersin.figshare.com
pdf
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Marie C. Sadler; Jérémy Senouillet; Simon Kuenzi; Luigino Grasso; Douglas C. Watson (2023). Data_Sheet_1_Computational Surveillance of Microbial Water Quality With Online Flow Cytometry.PDF [Dataset]. http://doi.org/10.3389/frwa.2020.586969.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/frwa.2020.586969.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Marie C. Sadler; Jérémy Senouillet; Simon Kuenzi; Luigino Grasso; Douglas C. Watson
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Automated flow cytometry (FCM) adapted to real-time quality surveillance provides high-temporal-resolution data about the microbial communities in a water system. The cell concentration calculated from FCM measurements indicates sudden increases in the number of bacteria, but can fluctuate significantly due to man-made and natural dynamics; it can thus obscure the presence of microbial anomalies. Cytometric fingerprinting tools enable a detailed analysis of the aquatic microbial communities, and could distinguish between normal and abnormal community changes. However, the vast majority of current cytometric fingerprinting tools use offline statistical computations which cannot detect anomalies immediately. Here, we present a computational model, entitled Microbial Community Change Detection (MCCD), which transforms microbial community characteristics into an online process control signal (herein called outlier score) that remains close to zero if the microbial community remains stable and increases with fluctuations in the community. The model is based on fingerprints and distance-based outlier calculations. We tested it in silico and in vitro by simulating acute contaminations to real-world water systems with large inherent microbial fluctuations. We showed that the outlier score was robust against these dynamic variations, while reliably detecting intentional contaminations. This model can be used with automated FCM to quickly detect potential microbiological contamination, and this especially when the time between treatment and distribution is very short.
f
Data from: PCP-SAFT Parameters of Pure Substances Using Large Experimental...
acs.figshare.com
zip
Updated Sep 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Timm Esper; Gernot Bauer; Philipp Rehner; Joachim Gross (2023). PCP-SAFT Parameters of Pure Substances Using Large Experimental Databases [Dataset]. http://doi.org/10.1021/acs.iecr.3c02255.s001
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1021/acs.iecr.3c02255.s001
Dataset updated
Sep 6, 2023
Dataset provided by
ACS Publications
Authors
Timm Esper; Gernot Bauer; Philipp Rehner; Joachim Gross
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This work reports pure component parameters for the PCP-SAFT equation of state for 1842 substances using a total of approximately 551 172 experimental data points for vapor pressure and liquid density. We utilize data from commercial and public databases in combination with an automated workflow to assign chemical identifiers to all substances, remove duplicate data sets, and filter unsuited data. The use of raw experimental data, as opposed to pseudoexperimental data from empirical correlations, requires means to identify and remove outliers, especially for vapor pressure data. We apply robust regression using a Huber loss function. For identifying and removing outliers, the empirical Wagner equation for vapor pressure is adjusted to experimental data, because the Wagner equation is mathematically rather flexible and is thus not subject to a systematic model bias. For adjusting model parameters of the PCP-SAFT model, nonpolar, dipolar and associating substances are distinguished. The resulting substance-specific parameters of the PCP-SAFT equation of state yield in a mean absolute relative deviation of the of 2.73% for vapor pressure and 0.52% for liquid densities (2.56% and 0.47% for nonpolar substances, 2.67% and 0.61% for dipolar substances, and 3.24% and 0.54% for associating substances) when evaluated against outlier-removed data. All parameters are provided as JSON and CSV files.

Facebook

Twitter

Click to copy link

Link copied

Cite

Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center (2021). Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas [Dataset]. http://doi.org/10.7910/DVN/SKEHRJ

Data from: Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas

Explore at:

Unique identifier

https://doi.org/10.7910/DVN/SKEHRJ

Dataset updated

Jun 2, 2021

Dataset provided by

Office of Sciencehttp://www.er.doe.gov/
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center

Description

Understanding the statistics of fluctuation driven flows in the boundary layer of magnetically confined plasmas is desired to accurately model the lifetime of the vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allow us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs for cases in which the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid and invalid, we find that such a classification may be ambiguous for up to 40% of data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that mostly characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases in the analysis. By removing the outliers that are identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are between approximately 5% and 15% lower as when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.

Clear search

Close search

Google apps

Main menu

Data from: Outlier classification using autoencoders: application for...

The 12 outliers identified in the Tonga dataset.

LOF calculation time (seconds) comparison.

Effect sizes calculated using MD and MC, excluding outliers

DataSheet_1_Research on outlier detection in CTD conductivity data based on...

Water quality test data

11: Streamwater sample constituent concentration outliers from 15 watersheds...

Capital Ratios For Acute Care Hospitals

R code

HGW: Lead, Average total content (surface)

Extended 1.0 Dataset of "Concentration and Geospatial Modelling of Health...

Probability-Density-Ranking (PDR) outliers and Most Probable Range (MPR) of...

Arsenic, mean total content (topsoil)

HGW: Nickel, Average total content (surface) | gimi9.com

Data from: OUTLIERS DETECTION BY RANSAC ALGORITHM IN THE TRANSFORMATION OF...

Lead, mean total content (topsoil)

Data from: Simulation as a new tool to establish benchmark outcome measures...

Sensitivity analysis on pooled HR of mortality after excluding the outliers...

Data_Sheet_1_Computational Surveillance of Microbial Water Quality With...

Data from: PCP-SAFT Parameters of Pure Substances Using Large Experimental...

Data from: Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmasSee More Versions

Data from: Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas