37 datasets found

Data from: Methodology to filter out outliers in high spatial density data...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Cite
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken (2023). Methodology to filter out outliers in high spatial density data to improve maps reliability [Dataset]. http://doi.org/10.6084/m9.figshare.14305658.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.14305658.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO journals
Authors
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT The considerable volume of data generated by sensors in the field presents systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets, determine whether the developed filter process could help decrease the nugget effect and improve the spatial variability characterization of high sampling data. We created a filter composed of a global, anisotropic, and an anisotropic local analysis of data, which considered the respective neighborhood values. For that purpose, we used the median to classify a given spatial point into the data set as the main statistical parameter and took into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and the sensor vegetation index (SVI) in sugarcane. The results showed an improvement in accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85 %, 97 %, and 79 % in corn yield, soil ECa, and SVI respectively, compared to interpolation errors of raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects, reducing estimation error of the interpolated data. The methodology proposed in this work had a better performance in removing outlier data when compared to two other methodologies from the literature.
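The local step described in the abstract (classifying each point against the median of its neighbors within a radius) can be sketched roughly as follows. This is a minimal illustration, not the authors' filter: the function name `local_median_filter`, the `max_rel_dev` threshold, and the sample readings are all assumptions, and the global and anisotropic stages are not reproduced.

```python
import numpy as np

def local_median_filter(xy, values, radius, max_rel_dev=0.25):
    """Return a boolean mask that is True where a point is kept.

    A point is flagged as a local outlier when it deviates from the median
    of its neighbours (within `radius`, excluding itself) by more than
    `max_rel_dev` times that median. Both parameters are illustrative.
    """
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    # Brute-force pairwise distances; a KD-tree would suit dense field data.
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    index = np.arange(len(values))
    keep = np.ones(len(values), dtype=bool)
    for i in index:
        neighbours = (dist[i] <= radius) & (index != i)
        if not neighbours.any():
            continue  # isolated point: nothing to compare against
        med = np.median(values[neighbours])
        if abs(values[i] - med) > max_rel_dev * abs(med):
            keep[i] = False
    return keep

# Seven hypothetical readings along a pass, one inflated value at index 3.
xy = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]
values = [10.0, 10.5, 9.9, 30.0, 9.8, 10.2, 10.1]
mask = local_median_filter(xy, values, radius=3.5)
print(mask)
```

The median is used rather than the mean so that the inflated reading does not drag its neighbors' reference value upward; with very small neighborhoods this protection weakens, which is one reason the radius matters.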
Data from: Outlier classification using autoencoders: application for...
osti.gov
Updated Jun 2, 2021
Cite
Bianchi, F. M.; Brunner, D.; Kube, R.; LaBombard, B. (2021). Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/1882649-outlier-classification-using-autoencoders-application-fluctuation-driven-flows-fusion-plasmas
Dataset updated
Jun 2, 2021
Dataset provided by
United States Department of Energy (http://energy.gov/)
Office of Science (http://www.er.doe.gov/)
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center
Authors
Bianchi, F. M.; Brunner, D.; Kube, R.; LaBombard, B.
Description
Understanding the statistics of fluctuation driven flows in the boundary layer of magnetically confined plasmas is desired to accurately model the lifetime of the vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allow us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs for cases in which the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid or invalid, we find that such a classification may be ambiguous for up to 40% of data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that mostly characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases in the analysis. By removing the outliers that are identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are between approximately 5% and 15% lower than when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.
Find Outliers Percent of households with income below the Federal Poverty...
uscssi.hub.arcgis.com
Updated Dec 5, 2021
Cite
Spatial Sciences Institute (2021). Find Outliers Percent of households with income below the Federal Poverty Level [Dataset]. https://uscssi.hub.arcgis.com/maps/USCSSI::find-outliers-percent-of-households-with-income-below-the-federal-poverty-level
Dataset updated
Dec 5, 2021
Dataset authored and provided by
Spatial Sciences Institute
Description
The following report outlines the workflow used to optimize your Find Outliers result:

Initial Data Assessment
There were 1684 valid input features.
POVERTY Properties: Min 0.0000, Max 91.8000, Mean 18.9902, Std. Dev. 12.7152
There were 22 outlier locations; these will not be used to compute the optimal fixed distance band.

Scale of Analysis
The optimal fixed distance band was based on the average distance to 30 nearest neighbors: 3709.0000 Meters.

Outlier Analysis
Creating the random reference distribution with 499 permutations.
There are 1155 output features statistically significant based on a FDR correction for multiple testing and spatial dependence.
There are 68 statistically significant high outlier features.
There are 84 statistically significant low outlier features.
There are 557 features part of statistically significant low clusters.
There are 446 features part of statistically significant high clusters.

Output
Pink output features are part of a cluster of high POVERTY values.
Light Blue output features are part of a cluster of low POVERTY values.
Red output features represent high outliers within a cluster of low POVERTY values.
Blue output features represent low outliers within a cluster of high POVERTY values.
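The four output categories in the report correspond to the quadrants of a Local Moran's I style analysis. The sketch below assumes that is the underlying statistic and uses the common textbook form; the tool's exact normalisation, its permutation test, and its FDR correction are not reproduced. The function names, the chain-shaped weights matrix, and the values are hypothetical.

```python
import numpy as np

def local_morans_i(values, weights):
    """Textbook Local Moran's I with row-standardised weights.

    weights[i, j] > 0 when j is a neighbour of i; the diagonal must be zero.
    Returns (I, z, lag), where lag is the weighted mean deviation of the
    neighbours of each feature.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    m2 = (z ** 2).mean()
    row_sums = w.sum(axis=1, keepdims=True)
    w_std = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    lag = w_std @ z
    return z * lag / m2, z, lag

def quadrant(z_i, lag_i):
    """Name the category by the sign of the feature and of its neighbours."""
    if z_i > 0:
        return "high cluster" if lag_i > 0 else "high outlier"
    return "low cluster" if lag_i < 0 else "low outlier"

# Six hypothetical features in a row; each neighbours the adjacent ones.
values = [5, 6, 4, 60, 5, 6]
weights = np.eye(6, k=1) + np.eye(6, k=-1)
I, z, lag = local_morans_i(values, weights)
labels = [quadrant(zi, li) for zi, li in zip(z, lag)]
print(labels)
```

A negative local I marks a feature that differs from its surroundings (an outlier), a positive one marks a feature that resembles them (a cluster). Whether a label is reported at all depends on the significance test, which this sketch leaves out.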
Fifth Generation Wireless Channels Outlier Detection and Clustering
ieee-dataport.org
Updated May 27, 2024
Cite
Jojo Blanza (2024). Fifth Generation Wireless Channels Outlier Detection and Clustering [Dataset]. https://ieee-dataport.org/documents/fifth-generation-wireless-channels-outlier-detection-and-clustering
Dataset updated
May 27, 2024
Authors
Jojo Blanza
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
lower latency
Observed to expected or logistic regression to identify hospitals with high...
figshare.com
7z
Updated Jun 1, 2023
Cite
Doris Tove Kristoffersen; Jon Helgeland; Jocelyne Clench-Aas; Petter Laake; Marit B. Veierød (2023). Observed to expected or logistic regression to identify hospitals with high or low 30-day mortality? [Dataset]. http://doi.org/10.1371/journal.pone.0195248
Available download formats: 7z
Unique identifier
https://doi.org/10.1371/journal.pone.0195248
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Doris Tove Kristoffersen; Jon Helgeland; Jocelyne Clench-Aas; Petter Laake; Marit B. Veierød
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction
A common quality indicator for monitoring and comparing hospitals is based on death within 30 days of admission. An important use is to determine whether a hospital has higher or lower mortality than other hospitals. Thus, the ability to identify such outliers correctly is essential. Two approaches for detection are: 1) calculating the ratio of observed to expected number of deaths (OE) per hospital and 2) including all hospitals in a logistic regression (LR) comparing each hospital to a form of average over all hospitals. The aim of this study was to compare OE and LR with respect to correctly identifying 30-day mortality outliers. Modifications of the methods, i.e., variance corrected approach of OE (OE-Faris), bias corrected LR (LR-Firth), and trimmed mean variants of LR and LR-Firth were also studied.

Materials and methods
To study the properties of OE and LR and their variants, we performed a simulation study by generating patient data from hospitals with known outlier status (low mortality, high mortality, non-outlier). Data from simulated scenarios with varying number of hospitals, hospital volume, and mortality outlier status were analysed by the different methods and compared by level of significance (ability to falsely claim an outlier) and power (ability to reveal an outlier). Moreover, administrative data for patients with acute myocardial infarction (AMI), stroke, and hip fracture from Norwegian hospitals for 2012–2014 were analysed.

Results
None of the methods achieved the nominal (test) level of significance for both low and high mortality outliers. For low mortality outliers, the levels of significance were increased four- to fivefold for OE and OE-Faris. For high mortality outliers, OE and OE-Faris, LR 25% trimmed and LR-Firth 10% and 25% trimmed maintained approximately the nominal level. The methods agreed with respect to outlier status for 94.1% of the AMI hospitals, 98.0% of the stroke, and 97.8% of the hip fracture hospitals.

Conclusion
We recommend, on balance, LR-Firth 10% or 25% trimmed for detection of both low and high mortality outliers.
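The first approach named in the introduction, the observed-to-expected (OE) ratio per hospital, reduces to a short calculation once each patient has a predicted risk of death. The sketch below is only an illustration under stated assumptions: the helper names, the normal-approximation z-score, and the patient data are hypothetical, and the study's own variance-corrected variant (OE-Faris) is not reproduced.

```python
from math import sqrt

def oe_ratio(deaths, risks):
    """Observed deaths divided by the sum of predicted patient risks."""
    return sum(deaths) / sum(risks)

def oe_z_score(deaths, risks):
    """Normal-approximation z-score for observed minus expected deaths.

    Treats each patient as an independent Bernoulli trial with the given
    predicted risk; this is an illustrative test, not the paper's method.
    """
    observed = sum(deaths)
    expected = sum(risks)
    variance = sum(p * (1 - p) for p in risks)
    return (observed - expected) / sqrt(variance)

# Hypothetical hospital: ten patients at 10% predicted risk, three deaths.
deaths = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
risks = [0.1] * 10
ratio = oe_ratio(deaths, risks)
z = oe_z_score(deaths, risks)
print(f"OE = {ratio:.2f}, z = {z:.2f}")
```

An OE ratio above 1 means more deaths than the risk model predicts. With hospital volumes this small the normal approximation is crude, which is part of why the choice of outlier test matters in the comparison above.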
The 12 outliers identified in the Tonga dataset.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Nov 1, 2017
Cite
Mayfield, Anderson B.; Dempsey, Alexandra C.; Chen, Chii-Shiarng (2017). The 12 outliers identified in the Tonga dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001760878
Dataset updated
Nov 1, 2017
Authors
Mayfield, Anderson B.; Dempsey, Alexandra C.; Chen, Chii-Shiarng
Description
Gene expression data have been presented as non-normalized (2^-Ct × 10^9) in all but the last six rows; this allows for the back-calculation of the raw threshold cycle (Ct) values so that interested individuals can readily estimate the typical range of expression of each gene. Values representing aberrant levels for a particular parameter (z-score > 2.5) have been highlighted in bold. When there was a statistically significant difference (Student’s t-test, p < 0.05) between the outlier and non-outlier averages for a parameter (instead using normalized gene expression data), the lower of the two values has been underlined. All samples hosted Symbiodinium of clade C only unless noted otherwise. The mean Mahalanobis distance did not differ between Pocillopora damicornis and P. acuta (Student’s t-test, p > 0.05). SA = surface area. GCP = genome copy proportion. Ma Dis = Mahalanobis distance. “.” = missing data.
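The two outlier measures named in this caption, a per-parameter z-score cutoff of 2.5 and the Mahalanobis distance, can be illustrated with a small sketch. The data are made up, and the function names and the use of sample statistics (`ddof=1`, pseudo-inverse of the covariance) are assumptions, not the authors' procedure.

```python
import numpy as np

def zscore_flags(data, threshold=2.5):
    """Flag entries whose absolute column-wise z-score exceeds the threshold."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    return np.abs(z) > threshold

def mahalanobis_distances(data):
    """Distance of each row from the multivariate mean of all rows."""
    data = np.asarray(data, dtype=float)
    centred = data - data.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(data, rowvar=False))
    return np.sqrt(np.einsum("ij,jk,ik->i", centred, cov_inv, centred))

# Twelve hypothetical samples, two parameters; the last sample is aberrant
# in the first parameter only.
samples = np.array([
    [1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.2],
    [1.2, 1.8], [0.8, 2.0], [1.0, 2.1], [1.1, 1.9],
    [0.9, 2.0], [1.0, 2.2], [1.0, 1.8], [5.0, 2.0],
])
flags = zscore_flags(samples)
distances = mahalanobis_distances(samples)
print(np.argwhere(flags), int(np.argmax(distances)))
```

The z-score works one parameter at a time, whereas the Mahalanobis distance summarises how unusual a sample is across all parameters at once, taking their correlations into account.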
Citation Trends for "Sparse online low-rank projection and outlier rejection...
shibatadb.com
Updated Sep 23, 2011
Cite
Yubetsu (2011). Citation Trends for "Sparse online low-rank projection and outlier rejection (SOLO) for 3-D rigid-body motion registration" [Dataset]. https://www.shibatadb.com/article/SKbAgbBG
Dataset updated
Sep 23, 2011
Dataset authored and provided by
Yubetsu
License
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Time period covered
2014 - 2016
Variables measured
New Citations per Year
Description
Yearly citation counts for the publication titled "Sparse online low-rank projection and outlier rejection (SOLO) for 3-D rigid-body motion registration".
outlier detection text reducing
kaggle.com
Updated Aug 7, 2025
Cite
Ali Mortezaie (2025). outlier detection text reducing [Dataset]. https://www.kaggle.com/datasets/alimortezaie/outlier-detection-text-reducing
Dataset updated
Aug 7, 2025
Dataset provided by
Kaggle
Authors
Ali Mortezaie
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Ali Mortezaie

Released under Apache 2.0

Contents
Association analysis of high-low outlier road intersection crashes involving...
zivahub.uct.ac.za
xlsx
Updated Jun 7, 2024
Cite
Simone Vieira; Simon Hull; Roger Behrens (2024). Association analysis of high-low outlier road intersection crashes involving public transport within the CoCT in 2017, 2018, 2019 and 2021 [Dataset]. http://doi.org/10.25375/uct.25976179.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.25375/uct.25976179.v1
Dataset updated
Jun 7, 2024
Dataset provided by
University of Cape Town
Authors
Simone Vieira; Simon Hull; Roger Behrens
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
City of Cape Town
Description
This dataset provides comprehensive information on road intersection crashes involving public transport (Bus, Bus-train, Combi/minibusses, midibusses) recognised as "high-low" outliers within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in at least 10% of the total "high-low" outlier public transport road intersection crashes for the years 2017, 2018, 2019, and 2021.

The dataset is meticulously organised according to support metric values, ranging from 0,10 to 0,17, with entries presented in descending order.

Data Specifics
Data Type: Geospatial-temporal categorical data
File Format: Excel document (.xlsx)
Size: 65,9 KB
Number of Files: The dataset contains a total of 1280 association rules
Date Created: 23rd May 2024

Methodology
Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
Software: ArcGIS Pro, Python
Processing Steps: Following the spatio-temporal analyses and the derivation of "high-low" outlier fishnet grid cells from a cluster and outlier analysis, all the road intersection crashes involving public transport that occurred within the "high-low" outlier fishnet grid cells were extracted to be processed by association analysis. The association analysis of these crashes was processed using Python software and involved the use of a 0,10 support metric value. Consequently, commonly occurring crash attributes among at least 10% of the "high-low" outlier road intersection public transport crashes were extracted for inclusion in this dataset.

Geospatial Information
Spatial Coverage:
West Bounding Coordinate: 18°20'E
East Bounding Coordinate: 19°05'E
North Bounding Coordinate: 33°25'S
South Bounding Coordinate: 34°25'S
Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

Temporal Information
Temporal Coverage:
Start Date: 01/01/2017
End Date: 31/12/2021 (2020 data omitted)
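The support threshold described under Processing Steps can be illustrated with a small sketch: support is the share of crash records that contain a given combination of attributes, and only combinations at or above 0,10 are kept. The description does not say which Python library was used, so this uses only the standard library; the function name, the attribute labels, and the records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(records, min_support=0.10, max_size=2):
    """Return {attribute combination: support} for combinations whose
    support (share of records containing them) is at least min_support."""
    n = len(records)
    counts = Counter()
    for record in records:
        items = sorted(set(record))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {c: k / n for c, k in counts.items() if k / n >= min_support}

# Ten hypothetical crash records described by two attributes each.
records = (
    [{"weather=wet", "light=night"}] * 2
    + [{"weather=wet", "light=day"}] * 1
    + [{"weather=dry", "light=day"}] * 7
)
supports = frequent_itemsets(records)
for combo, support in sorted(supports.items(), key=lambda kv: -kv[1]):
    print(combo, support)
```

Sorting by support in descending order mirrors how the dataset says its entries are organised. Enumerating every combination is only practical for short attribute lists; dedicated association-rule libraries prune the search instead.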
Citation Trends for "Non-convex low-rank matrix recovery with arbitrary...
shibatadb.com
Updated May 7, 2019
Cite
Yubetsu (2019). Citation Trends for "Non-convex low-rank matrix recovery with arbitrary outliers via median-truncated gradient descent" [Dataset]. https://www.shibatadb.com/article/yUqSNnAw
Dataset updated
May 7, 2019
Dataset authored and provided by
Yubetsu
License
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Time period covered
2020 - 2024
Variables measured
New Citations per Year
Description
Yearly citation counts for the publication titled "Non-convex low-rank matrix recovery with arbitrary outliers via median-truncated gradient descent".
Pairwise correlations between stability estimates excluding the outliers.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jan 28, 2013
Cite
Dediu, Dan; Cysouw, Michael (2013). Pairwise correlations between stability estimates excluding the outliers. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001721675
Dataset updated
Jan 28, 2013
Authors
Dediu, Dan; Cysouw, Michael
Description
Upper diagonal: Pearson’s r; lower diagonal: Spearman’s ρ; within cells, the upper line is the correlation estimate (* stands for a significant correlation at α-level = 0.05, ** for a correlation significant at α-level = 0.01; all significant correlations are in bold) and the lower line is the p-value.
Anolis carolinensis character displacement SNP
data.niaid.nih.gov
search.dataone.org
zip
Updated Jan 27, 2023
Cite
Douglas Crawford (2023). Anolis carolinensis character displacement SNP [Dataset]. http://doi.org/10.5061/dryad.qbzkh18ks
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.qbzkh18ks
Dataset updated
Jan 27, 2023
Dataset provided by
University of Miami
Authors
Douglas Crawford
License
https://spdx.org/licenses/CC0-1.0.html
Description
Here are six files that provide details for all 44,120 identified single nucleotide polymorphisms (SNPs) or the 215 outlier SNPs associated with the evolution of rapid character displacement among replicate islands with (2Spp) and without competition (1Spp) between two Anolis species. On 2Spp islands, A. carolinensis occurs higher in trees and has evolved larger toe pads. Among 1Spp and 2Spp island populations, we identify 44,120 SNPs, of which 215 are outlier SNPs with improbably large FST values, low nucleotide variation, and greater linkage than expected; these SNPs are enriched for animal walking behavior. Thus, we conclude that these 215 outliers are evolving by natural selection in response to the phenotypic convergent evolution of character displacement. There are two non-mutually exclusive perspectives on these nucleotide variants. First, character displacement is convergent: all 215 outlier SNPs are shared among 3 out of 5 2Spp islands and 24% of outlier SNPs are shared among all five 2Spp islands. Second, character displacement is genetically redundant because the allele frequencies in one or more 2Spp islands are similar to 1Spp islands: among one or more 2Spp islands 33% of outlier SNPs are within the range of 1Spp MiAF and 76% of outliers are more similar to 1Spp islands than to the mean MiAF of 2Spp islands. Focusing on convergent SNPs is scientifically more robust, yet it distracts from the perspective of multiple genetic solutions that enhance the rate and stability of adaptive change. The six files include: a description of eight islands, details of 94 individuals, and four files on SNPs. The four SNP files include the VCF files for 94 individuals with 44K SNPs and two files (Excel sheet/tab-delimited file) with FST, p-values and outlier status for all 44,120 identified single nucleotide polymorphisms (SNPs) associated with the evolution of rapid character displacement. The sixth file is a detailed file on the 215 outlier SNPs.
Complete sequence data are available at BioProject PRJNA833453, which includes samples not included in this study. The 94 individuals used in this study are described in “Supplemental_Sample_description.txt”.

Methods

Anoles and genomic DNA: Tissue or DNA for 160 Anolis carolinensis and 20 A. sagrei samples were provided by the Museum of Comparative Zoology at Harvard University (Table S2). Samples were previously used to examine evolution of character displacement in native A. carolinensis following invasion by A. sagrei onto man-made spoil islands in Mosquito Lagoon Florida (Stuart et al. 2014). One hundred samples were genomic DNAs, and 80 samples were tissues (terminal tail clip, Table S2). Genomic DNA was isolated from 80 of 160 A. carolinensis individuals (MCZ, Table S2) using a custom SPRI magnetic bead protocol (Psifidi et al. 2015). Briefly, after removing ethanol, tissues were placed in 200 ul of GH buffer (25 mM Tris-HCl pH 7.5, 25 mM EDTA, 2 M GuHCl [guanidine hydrochloride, G3272 SIGMA], 5 mM CaCl2, 0.5% v/v Triton X-100, 1% N-Lauroyl-Sarcosine) with 5% per volume of 20 mg/ml proteinase K (10 ul/200 ul GH) and digested at 55 °C for at least 2 hours. After proteinase K digestion, 100 ul of 0.1% carboxyl-modified Sera-Mag Magnetic beads (Fisher Scientific) resuspended in 2.5 M NaCl, 20% PEG were added and allowed to bind the DNA. Beads were subsequently magnetized and washed twice with 200 ul 70% EtOH, and then DNA was eluted in 100 ul 0.1x TE (10 mM Tris, 0.1 mM EDTA). All DNA samples were gel electrophoresed to ensure high molecular mass and quantified by spectrophotometry and fluorescence using Biotium AccuBlue™ High Sensitivity dsDNA Quantitative Solution according to manufacturer’s instructions. Genotyping-by-sequencing (GBS) libraries were prepared using a modified protocol after Elshire et al. (Elshire et al. 2011). Briefly, high-molecular-weight genomic DNA was aliquoted and digested using ApeKI restriction enzyme.
Digests from each individual sample were uniquely barcoded, pooled, and size selected to yield insert sizes between 300 and 700 bp (Borgstrom et al. 2011). Pooled libraries were PCR amplified (15 cycles) using custom primers that extend into the genomic DNA insert by 3 bases (CTG). Adding 3 extra base pairs systematically reduces the number of sequenced GBS tags, ensuring sufficient sequencing depth. The final library had a mean size of 424 bp ranging from 188 to 700 bp.

Anolis SNPs: Pooled libraries were sequenced on one lane on the Illumina HiSeq 4000 in 2x150 bp paired-end configuration, yielding approximately 459 million paired-end reads (~138 Gb). The median Q-Score was 42 with the lower 10% Q-Scores exceeding 32 for all 150 bp. The initial library contained 180 individuals with 8,561,493 polymorphic sites. Twenty individuals were Anolis sagrei, and two individuals (Yan 1610 & Yin 1411) clustered with A. sagrei and were not used to define A. carolinensis SNPs. Anolis carolinensis reads were aligned to the Anolis carolinensis genome (NCBI RefSeq accession number: GCF_000090745.1_AnoCar2.0). Single nucleotide polymorphisms (SNPs) for A. carolinensis were called using the GBeaSy analysis pipeline (Wickland et al. 2017) with the following filter settings: minimum read length of 100 bp after barcode and adapter trimming, minimum phred-scaled variant quality of 30 and minimum read depth of 5. SNPs were further filtered by requiring SNPs to occur in > 50% of individuals, and 66 individuals were removed because they had less than 70% of called SNPs. These filtering steps resulted in 51,155 SNPs among 94 individuals. Final filtering among 94 individuals required all sites to be polymorphic (with fewer individuals, some sites were no longer polymorphic) with a maximum of 2 alleles (all are bi-allelic), a minimum allele frequency of 0.05, and He that does not exceed HWE (FDR < 0.01). SNPs with large He were removed (2,280 SNPs).
These SNPs with large significant heterozygosity may result from aligning paralogues (different loci), and thus may not represent polymorphisms. No SNPs were removed with low He (due to possible demography or other exceptions to HWE). After filtering, 94 individuals yielded 44,120 SNPs. Thus, the final filtered SNP data set was 44K SNPs from 94 individuals.

Statistical Analyses: Eight A. carolinensis populations were analyzed: three populations from islands with native species only (1Spp islands) and 5 populations from islands where A. carolinensis co-exists with A. sagrei (2Spp islands, Table 1, Table S1). Most analyses pooled the three 1Spp islands and contrasted these with the pooled five 2Spp islands. Two approaches were used to define SNPs with unusually large allele frequency differences between 1Spp and 2Spp islands: 1) comparison of FST values to random permutations and 2) a modified FDIST approach to identify outlier SNPs with large and statistically unlikely FST values.

Random Permutations: FST values were calculated in VCFTools (version 4.2; Danecek et al. 2011), and the p-value per SNP was defined by comparing FST values to 1,000 random permutations using a custom script (below). Basically, individuals and all their SNPs were randomly assigned to one of eight islands or to 1Spp versus 2Spp groups. The sample sizes (55 for 2Spp and 39 for 1Spp islands) were maintained. FST values were re-calculated for each of the 1,000 randomizations using VCFTools.

Modified FDIST: To identify outlier SNPs with statistically large FST values, a modified FDIST (Beaumont and Nichols 1996) was implemented in Arlequin (Excoffier et al. 2005). This modified approach applies 50,000 coalescent simulations using hierarchical population structure, in which demes are arranged into k groups of d demes and in which migration rates between demes are different within and between groups.
Unlike the finite island models, which have led to large frequencies of false positives because populations share different histories (Lotterhos and Whitlock 2014), the hierarchical island model avoids these false positives by avoiding the assumption of similar ancestry (Excoffier et al. 2009).

References
Beaumont, M. A. and R. A. Nichols. 1996. Evaluating loci for use in the genetic analysis of population structure. P Roy Soc B-Biol Sci 263:1619-1626.
Borgstrom, E., S. Lundin, and J. Lundeberg. 2011. Large scale library generation for high throughput sequencing. PLoS One 6:e19119.
Bradbury, P. J., Z. Zhang, D. E. Kroon, T. M. Casstevens, Y. Ramdoss, and E. S. Buckler. 2007. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
Cingolani, P., A. Platts, L. Wang le, M. Coon, T. Nguyen, L. Wang, S. J. Land, X. Lu, and D. M. Ruden. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6:80-92.
Danecek, P., A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin, and G. Genomes Project Analysis. 2011. The variant call format and VCFtools. Bioinformatics 27:2156-2158.
Earl, D. A. and B. M. vonHoldt. 2011. Structure Harvester: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genet Resour 4:359-361.
Elshire, R. J., J. C. Glaubitz, Q. Sun, J. A. Poland, K. Kawamoto, E. S. Buckler, and S. E. Mitchell. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:e19379.
Evanno, G., S. Regnaut, and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611-2620.
Excoffier, L., T. Hofer, and M. Foll. 2009. Detecting loci under selection in a hierarchically structured population. Heredity 103:285-298.
Excoffier, L., G. Laval, and S. Schneider. 2005. Arlequin (version 3.0): An integrated software package for population genetics data analysis.
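The random-permutation test described in the Methods (shuffle individuals between the 1Spp and 2Spp groups while keeping the group sizes at 39 and 55, then recompute FST) can be sketched as follows. This is not the authors' custom script: it swaps in a simple Wright-style FST computed from allele frequencies in place of the VCFTools estimator, and the function names and the toy genotypes are assumptions.

```python
import numpy as np

def fst(genotypes, groups):
    """Wright-style F_ST for one biallelic SNP and two groups.

    genotypes: alternate-allele counts per individual (0, 1 or 2)
    groups:    boolean array, True for members of the first group
    """
    g = np.asarray(genotypes, dtype=float)
    groups = np.asarray(groups, dtype=bool)
    p_total = g.mean() / 2
    h_total = 2 * p_total * (1 - p_total)
    if h_total == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    h_within = 0.0
    for mask in (groups, ~groups):
        p = g[mask].mean() / 2
        h_within += mask.mean() * 2 * p * (1 - p)
    return (h_total - h_within) / h_total

def permutation_p_value(genotypes, groups, n_perm=1000, seed=0):
    """Share of label shuffles with F_ST at least as large as observed."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups, dtype=bool)
    observed = fst(genotypes, groups)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(groups)  # group sizes stay fixed
        if fst(genotypes, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy SNP fixed for different alleles in the two groups (39 vs 55 people).
groups = np.array([True] * 39 + [False] * 55)
genotypes = np.where(groups, 0, 2)
observed, p_value = permutation_p_value(genotypes, groups)
print(observed, p_value)
```

Adding one to both the numerator and the denominator keeps the p-value from ever being exactly zero, so with 1,000 shuffles the smallest reportable value is 1/1001.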
Data from: Spatial detection of outlier loci with Moran eigenvector maps...
zenodo.org
data.niaid.nih.gov
zip
Updated May 31, 2022
Cite
Helene H. Wagner; Mariana Chávez-Pesqueira; Brenna R. Forester (2022). Data from: Spatial detection of outlier loci with Moran eigenvector maps (MEM) [Dataset]. http://doi.org/10.5061/dryad.b12kk
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.b12kk
Dataset updated
May 31, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Helene H. Wagner; Mariana Chávez-Pesqueira; Brenna R. Forester
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The spatial signature of microevolutionary processes structuring genetic variation may play an important role in the detection of loci under selection. However, the spatial location of samples has not yet been used to quantify this. Here, we present a new two-step method of spatial outlier detection at the individual and deme levels using the power spectrum of Moran eigenvector maps (MEM). The MEM power spectrum quantifies how the variation in a variable, such as the frequency of an allele at a SNP locus, is distributed across a range of spatial scales defined by MEM spatial eigenvectors. The first step (Moran spectral outlier detection: MSOD) uses genetic and spatial information to identify outlier loci by their unusual power spectrum. The second step uses Moran spectral randomization (MSR) to test the association between outlier loci and environmental predictors, accounting for spatial autocorrelation. Using simulated data from two published papers, we tested this two-step method in different scenarios of landscape configuration, selection strength, dispersal capacity and sampling design. Under scenarios that included spatial structure, MSOD alone was sufficient to detect outlier loci at the individual and deme levels without the need for incorporating environmental predictors. Follow-up with MSR generally reduced (already low) false-positive rates, though in some cases led to a reduction in power. The results were surprisingly robust to differences in sample size and sampling design. Our method represents a new tool for detecting potential loci under selection with individual-based and population-based sampling by leveraging spatial information that has hitherto been neglected.
AI Histology QC Outlier Detection Tool Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Jul 5, 2025
Cite
Growth Market Reports (2025). AI Histology QC Outlier Detection Tool Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ai-histology-qc-outlier-detection-tool-market
Available download formats: pdf, pptx, csv
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
AI Histology QC Outlier Detection Tool Market Outlook

According to our latest research, the global AI Histology QC Outlier Detection Tool market size reached USD 412 million in 2024, with a robust compound annual growth rate (CAGR) of 18.7% observed over the past year. The market’s expansion is primarily driven by the increasing adoption of artificial intelligence in digital pathology and the rising demand for high-precision quality control in histological workflows. By 2033, the market is forecasted to reach USD 1.97 billion, reflecting the accelerating integration of AI-powered QC outlier detection tools across clinical and research environments worldwide.
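As a quick check on the figures in the paragraph above, the growth rate implied by the two stated market sizes (USD 412 million in 2024 and USD 1.97 billion in 2033) can be worked out directly. The helper name `cagr` is illustrative, and the nine-year compounding period is an assumption based on the two years quoted.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures quoted in the report summary; the period length is assumed.
implied = cagr(412e6, 1.97e9, 2033 - 2024)
print(f"Implied CAGR 2024-2033: {implied:.1%}")
```

Under these assumptions the implied rate comes out close to 19%, in the same range as the 18.7% quoted in the summary; the small gap may reflect rounding or a different base year in the report.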

The surge in demand for AI Histology QC Outlier Detection Tools is primarily attributed to the pressing need for accuracy and consistency in histopathological diagnostics. Traditional quality control processes in histology are labor-intensive and prone to human error, which can result in diagnostic discrepancies and impact patient outcomes. The deployment of advanced AI-driven QC outlier detection tools addresses these challenges by automating the identification of anomalies and artifacts in histological slides, ensuring standardized results and significantly reducing turnaround times. Moreover, the integration of machine learning algorithms enables these systems to continuously improve their detection capabilities, further enhancing diagnostic reliability and supporting the growing trend towards digitization in pathology laboratories.

Another significant growth driver for the AI Histology QC Outlier Detection Tool market is the increasing prevalence of cancer and other chronic diseases that require histopathological examination for diagnosis and treatment planning. The rising global cancer burden, coupled with the shortage of skilled pathologists, is pushing healthcare providers to adopt AI-powered solutions that can streamline workflow efficiency and mitigate diagnostic bottlenecks. These tools not only facilitate faster and more accurate detection of outliers in tissue samples but also support pathologists in prioritizing cases that require immediate attention. As a result, healthcare institutions are investing heavily in AI-based QC solutions to optimize resource utilization, improve patient care, and comply with stringent regulatory standards for laboratory quality assurance.

Technological advancements and strategic collaborations between AI developers, pathology labs, and healthcare providers are further accelerating market growth. The ongoing development of sophisticated image analysis algorithms, cloud-based platforms, and interoperability standards is enabling seamless integration of AI QC tools into existing laboratory information systems. Additionally, government initiatives aimed at promoting digital health transformation and funding for AI research in medical diagnostics are creating a favorable environment for market expansion. The proliferation of digital pathology infrastructure, particularly in developed regions, is expected to drive the adoption of AI QC outlier detection tools, while emerging markets are witnessing growing interest as healthcare systems modernize and invest in advanced diagnostic technologies.

From a regional perspective, North America currently dominates the AI Histology QC Outlier Detection Tool market, accounting for a significant share of global revenues in 2024. The region’s leadership is underpinned by a well-established healthcare infrastructure, high adoption rates of digital pathology, and strong presence of leading AI technology providers. Europe follows closely, supported by robust investments in healthcare innovation and a proactive regulatory landscape. Meanwhile, the Asia Pacific region is poised for the fastest growth over the forecast period, driven by increasing healthcare expenditure, expanding cancer screening programs, and rising awareness of the benefits of AI-powered diagnostic solutions. Latin America and the Middle East & Africa are also expected to witness steady growth as digital transformation initiatives gain momentum in these regions.
DataSheet_1_Research on outlier detection in CTD conductivity data based on...
frontiersin.figshare.com
docx
Updated Jun 21, 2023
Cite
Long Yu; Jia Sun; Yanliang Guo; Baohua Zhang; Guangbing Yang; Liang Chen; Xia Ju; Fanlin Yang; Xuejun Xiong; Xianqing Lv (2023). DataSheet_1_Research on outlier detection in CTD conductivity data based on cubic spline fitting.docx [Dataset]. http://doi.org/10.3389/fmars.2022.1030980.s001
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fmars.2022.1030980.s001
Dataset updated
Jun 21, 2023
Dataset provided by
Frontiers
Authors
Long Yu; Jia Sun; Yanliang Guo; Baohua Zhang; Guangbing Yang; Liang Chen; Xia Ju; Fanlin Yang; Xuejun Xiong; Xianqing Lv
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Outlier detection is the key to the quality control of marine survey data. For the detection of outliers in Conductivity-Temperature-Depth (CTD) data, previous methods, such as the Wild Edit method and the Median Filter Combined with Maximum Deviation method, mostly set a threshold based on statistics. Values greater than the threshold are treated as outliers, but there is no clear specification for selecting the threshold, so multiple attempts are required. The process is time-consuming and inefficient, and the results have high false negative and false positive rates. In response to this problem, we proposed an outlier detection method for CTD conductivity data based on a physical constraint, the continuity of seawater. The method constructs a cubic spline fitting function, based on the independent points scheme and cubic spline interpolation, to fit the conductivity data. The points with the maximum fitting residual are flagged as outliers. The fitting stops when the optimal number of iterations is reached, which is obtained automatically from the minimum value of the sequence of maximum fitting residuals. Verification of the accuracy and stability of the method by means of examples shows that it has a lower false negative rate (17.88%) and false positive rate (0.24%) than other methods: the rates for the Wild Edit method are 56.96% and 2.19%, while for the Median Filter Combined with Maximum Deviation method they are 23.28% and 0.31%. The Cubic Spline Fitting method is simple to operate, gives clear and definite results, and better solves the problem of detecting conductivity outliers.
High-high cluster and high-low outlier road intersections for road traffic...
zivahub.uct.ac.za
docx
Updated Jun 6, 2024
Cite
Simone Vieira; Simon Hull; Roger Behrens (2024). High-high cluster and high-low outlier road intersections for road traffic crashes involving pedestrians within the CoCT in 2017, 2018, 2019 and 2021 [Dataset]. http://doi.org/10.25375/uct.25968379.v1
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.25375/uct.25968379.v1
Dataset updated
Jun 6, 2024
Dataset provided by
University of Cape Town
Authors
Simone Vieira; Simon Hull; Roger Behrens
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
City of Cape Town
Description
This dataset offers a detailed inventory of road intersections and their corresponding suburbs within Cape Town, meticulously curated to highlight instances of high pedestrian crash counts observed in "high-high" cluster and "high-low" outlier fishnet grid cells across the years 2017, 2018, 2019, and 2021. To enhance its utility, the dataset meticulously colour-codes each month associated with elevated crash occurrences, providing a nuanced perspective. Furthermore, the dataset categorises road intersections based on their placement within "high-high" clusters (marked with pink tabs) or "high-low" outlier cells (indicated by red tabs). For ease of navigation, the intersections are further organised alphabetically by suburb name, ensuring accessibility and clarity.

Data Specifics
Data Type: Geospatial-temporal categorical data with numeric attributes
File Format: Word document (.docx)
Size: 255 KB
Number of Files: The dataset contains a total of 264 road intersection records (68 "high-high" clusters and 196 "high-low" outliers)
Date Created: 21st May 2024

Methodology
Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
Software: ArcGIS Pro, Open Refine, Python, SQL
Processing Steps: The raw road traffic crash data underwent a comprehensive refining process using Python software to ensure its accuracy and consistency. Following this, duplicates were eliminated to retain only one entry per crash incident. Subsequently, the data underwent further refinement with Open Refine software, focusing specifically on isolating unique crash descriptions for subsequent geocoding in ArcGIS Pro. Notably, during this process, only the road intersection crashes were retained, as they were the only incidents with spatial definitions. Once geocoded, road intersection crashes that involved a pedestrian were extracted so that subsequent spatio-temporal analyses would focus on these crashes only. The spatio-temporal analysis methods by which the pedestrian crashes were analysed included spatial autocorrelation, hotspot analysis, and cluster and outlier analysis. Leveraging these methods, road intersections involving pedestrian crashes identified as either "high-high" clusters or "high-low" outliers were extracted for inclusion in the dataset.

Geospatial Information
Spatial Coverage:
West Bounding Coordinate: 18°20'E
East Bounding Coordinate: 19°05'E
North Bounding Coordinate: 33°25'S
South Bounding Coordinate: 34°25'S
Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

Temporal Information
Temporal Coverage:
Start Date: 01/01/2017
End Date: 31/12/2021 (2020 data omitted)
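
The "high-high" and "high-low" labels mentioned in the processing steps can be illustrated with a small sketch. It assumes a local Moran's I style statistic with row-standardised neighbour weights, which is a common basis for cluster and outlier analysis; the source does not state which statistic or settings were used, and the significance testing that a GIS tool would normally apply is omitted here. The function name, the crash counts, and the neighbour lists are all hypothetical.

```python
def local_moran_quadrants(values, neighbors):
    """Return (local_I, label) per cell from its deviation and its neighbours' mean deviation."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # population variance, used for scaling
    results = []
    for i, nbrs in enumerate(neighbors):
        # Spatial lag: average deviation of the neighbouring cells.
        lag = sum(dev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local_i = dev[i] * lag / m2
        own = "high" if dev[i] > 0 else "low"
        around = "high" if lag > 0 else "low"
        results.append((local_i, f"{own}-{around}"))
    return results


# Hypothetical pedestrian crash counts for six grid cells arranged in a row.
counts = [10, 9, 1, 0, 8, 1]
# Each cell's neighbours are the cells directly to its left and right.
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

results = local_moran_quadrants(counts, adjacency)
for cell, (local_i, label) in enumerate(results):
    print(cell, label, round(local_i, 2))
```

In this toy example, cell 0 has a high count next to another high count and is labelled "high-high", while cell 4 has a high count surrounded by low counts and is labelled "high-low" with a negative local statistic.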
Sample of 45 H{alpha}EW outliers - Dataset - B2FIND
b2find.eudat.eu
Updated Oct 23, 2023
Cite
(2023). Sample of 45 H{alpha}EW outliers - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/7782063a-207c-571b-bad5-80eedba236cf
Explore at:
Dataset updated
Oct 23, 2023
Description
In this work, we calibrate the relationship between H{alpha} emission and M-dwarf ages. We compile a sample of 892 M-dwarfs with H{alpha} equivalent width (H{alpha}EW) measurements from the literature that are either comoving with a white dwarf of known age (21 stars) or in a known young association (871 stars). In this sample we identify 7 M-dwarfs that are new candidate members of known associations. By dividing the stars into active and inactive categories according to their H{alpha}EW and spectral type (SpT), we find that the fraction of active dwarfs decreases with increasing age, and the form of the decline depends on SpT. Using the compiled sample of age calibrators, we find that H{alpha} EW and fractional H{alpha} luminosity (L_H{alpha}/L_bol) decrease with increasing age. H{alpha}EW for SpT<~M7 decreases gradually up until ~1Gyr. For older ages, we found only two early M dwarfs that are both inactive and seem to continue the gradual decrease. We also found 14 mid-type M-dwarfs, out of which 11 are inactive and present a significant decrease in H{alpha}EW, suggesting that the magnetic activity decreases rapidly after ~1Gyr. We fit L_H{alpha}/L_bol versus age with a broken power law and find an index of -0.11_-0.01_^+0.02^ for ages >1Gyr) leaves this part of the relation far less constrained. Finally, from repeated independent measurements for the same stars, we find that 94% of them have a level of H{alpha}EW variability <~5{AA} at young ages (<1Gyr).
AirDataInfo - air pressure at sea level
ckan.mobidatalab.eu
Updated Aug 15, 2023
Cite
Metropolregion Rhein-Neckar (2023). AirDataInfo - air pressure at sea level [Dataset]. https://ckan.mobidatalab.eu/dataset/air-data-info-air-pressure-at-sea-level
Explore at:
Available download formats: http://publications.europa.eu/resource/authority/file-type/csv, http://publications.europa.eu/resource/authority/file-type/gpkg, http://publications.europa.eu/resource/authority/file-type/geojson, http://publications.europa.eu/resource/authority/file-type/wms_srvc
Dataset updated
Aug 15, 2023
Dataset provided by
Metropolregion Rhein-Neckar
License
http://dcat-ap.de/def/licenses/other-opensource
Description
The average of all measured values of a sensor over the last 5 minutes is displayed. The displayed measured values were filtered for high and low outliers.
- High outliers are anything above the 3rd quartile + 1.5 * inter-quartile range (IQR)
- Low outliers are anything below the 1st quartile - 1.5 * IQR
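
The quartile rule above can be sketched in a few lines. This is a minimal illustration and not the provider's implementation: it assumes outliers are removed first and the remaining values are then averaged, it uses the "inclusive" quartile method from Python's standard library (the source does not say how quartiles are computed), and the pressure readings are made-up values in hPa.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (low, high) cut-offs: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def filtered_mean(values, k=1.5):
    """Average the values that lie within the IQR fences."""
    low, high = iqr_fences(values, k)
    kept = [v for v in values if low <= v <= high]
    return sum(kept) / len(kept)


# Hypothetical sea-level pressure readings from one sensor over 5 minutes (hPa).
readings = [1013.2, 1013.4, 1013.1, 1013.3, 1050.0, 1013.5, 1013.2]

low, high = iqr_fences(readings)
print(f"fences: {low:.3f} to {high:.3f}")
print(f"filtered mean: {filtered_mean(readings):.2f}")
```

With these sample values the 1050.0 reading falls above the upper fence and is excluded before averaging, so the displayed value stays close to the other six readings.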
The Social Cost of Carbon: Trends, Outliers and Catastrophes [Dataset]
data.niaid.nih.gov
xls, zip
Updated Nov 25, 2009
Cite
Richard S.J. Tol (2009). The Social Cost of Carbon: Trends, Outliers and Catastrophes [Dataset] [Dataset]. http://doi.org/10.7910/DVN/LGIF0V
Explore at:
Available download formats: xls, zip
Unique identifier
https://doi.org/10.7910/DVN/LGIF0V
Dataset updated
Nov 25, 2009
Dataset provided by
Economic and Social Research Institute, Dublin
Authors
Richard S.J. Tol
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Global
Description
211 estimates of the social cost of carbon are included in a meta-analysis. The results confirm that a lower discount rate implies a higher estimate, and that higher estimates are found in the gray literature. It is also found that there is a downward trend in the economic impact estimates of the climate; that the Stern Review’s estimate of the social cost of carbon is an outlier; and that the right tail of the distribution is fat. There is a fair chance that the annual climate liability exceeds the annual income of many people.
Sampling probabilities and input regression estimates for simulation...
plos.figshare.com
xls
Updated Jun 10, 2023
Cite
Doris Tove Kristoffersen; Jon Helgeland; Jocelyne Clench-Aas; Petter Laake; Marit B. Veierød (2023). Sampling probabilities and input regression estimates for simulation scenarios, logistic scale. [Dataset]. http://doi.org/10.1371/journal.pone.0195248.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0195248.t002
Dataset updated
Jun 10, 2023
Dataset provided by
PLOS ONE
Authors
Doris Tove Kristoffersen; Jon Helgeland; Jocelyne Clench-Aas; Petter Laake; Marit B. Veierød
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
μ_low, μ_non-outlier, and μ_high are the hospital-specific mortality effects for low mortality outliers, non-outliers, and high mortality outliers. γ_sex and γ_age are the regression coefficients for sex and age, respectively.



