27 datasets found
  1. Data from: Valid Inference Corrected for Outlier Removal

    • tandf.figshare.com
    pdf
    Updated Jun 4, 2023
    Shuxiao Chen; Jacob Bien (2023). Valid Inference Corrected for Outlier Removal [Dataset]. http://doi.org/10.6084/m9.figshare.9762731.v4
    Available download formats: pdf
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Shuxiao Chen; Jacob Bien
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ordinary least squares (OLS) estimation of a linear regression model is well known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. This standard “detect-and-forget” approach has been shown to be problematic, and in this article we highlight the fact that it can lead to invalid inference and show how recently developed tools in selective inference can be used to properly account for outlier detection and removal. Our inferential procedures apply to a general class of outlier removal procedures that includes several of the most commonly used approaches. We conduct simulations to corroborate the theoretical results, and we apply our method to three real datasets to illustrate how our inferential results can differ from the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R. Supplementary materials for this article are available online.
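    To make the criticized workflow concrete, here is a minimal Python/NumPy sketch of the naive detect-and-forget pipeline. It is not the outference method: the cutoff of 2.5 on internally studentized residuals and the function names `ols` and `detect_and_forget` are hypothetical choices for illustration. The standard errors it returns are exactly the kind the abstract warns about, because they ignore that the retained rows were selected by looking at the data.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares fit: coefficients, residuals, and residual variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)
    return beta, resid, sigma2


def detect_and_forget(X, y, cutoff=2.5):
    """Naive pipeline: drop large studentized residuals, then refit as if nothing happened.

    The cutoff is a hypothetical illustration value.
    """
    beta, resid, sigma2 = ols(X, y)
    # Leverages h_ii = x_i^T (X^T X)^{-1} x_i (diagonal of the hat matrix)
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    studentized = resid / np.sqrt(sigma2 * (1.0 - h))
    keep = np.abs(studentized) <= cutoff
    beta_k, _, sigma2_k = ols(X[keep], y[keep])
    # Textbook OLS standard errors on the kept rows; these ignore the selection step
    se = np.sqrt(np.diag(sigma2_k * np.linalg.inv(X[keep].T @ X[keep])))
    return beta_k, se, np.flatnonzero(~keep)


# Usage: simulated simple regression with one planted outlier at index 7
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[7] += 10.0
beta, se, removed = detect_and_forget(X, y)
print("removed rows:", removed, "slope:", beta[1], "naive SE:", se[1])
```

    Because the same data both choose the retained rows and supply the standard errors, intervals built from `se` need not achieve their nominal coverage; the selective-inference correction described above addresses this and is not reproduced here.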

  2. Data from: Error and anomaly detection for intra-participant time-series...

    • tandf.figshare.com
    xlsx
    Updated Jun 1, 2023
    David R. Mullineaux; Gareth Irwin (2023). Error and anomaly detection for intra-participant time-series data [Dataset]. http://doi.org/10.6084/m9.figshare.5189002
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    David R. Mullineaux; Gareth Irwin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identifying errors or anomalous values, collectively considered outliers, assists in exploring data, and removing outliers can improve statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of entire cycles, although examining fewer points using a ‘moving window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected in two stages for strides (mean 38 cycles) from treadmill running. In stage 1, cycles were removed for one-dimensional (spatial) outliers at each time point using the median absolute deviation; in stage 2, they were removed for two-dimensional (spatial–temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, and more stringent scaling was required at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers should be assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
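    The stage 1 idea (a median absolute deviation check at each time point across cycles) can be sketched as follows. This is an illustration under stated assumptions, not the supplied Matlab code: cycles are rows of a NumPy array, the cutoff `k` is a hypothetical fixed multiple of the scaled MAD rather than the t-statistic-based scaling the authors describe, and stage 2 (the moving-window standard deviation) is omitted.

```python
import numpy as np


def mad_flag_cycles(cycles, k=3.5):
    """Return indices of cycles with any point far from the per-time-point median.

    cycles: array of shape (n_cycles, n_timepoints), one row per stride cycle.
    k: hypothetical cutoff in units of the scaled MAD.
    """
    cycles = np.asarray(cycles, dtype=float)
    med = np.median(cycles, axis=0)                   # median across cycles at each time point
    mad = np.median(np.abs(cycles - med), axis=0)     # MAD across cycles at each time point
    scale = 1.4826 * mad                              # consistent with the SD for normal data
    scale = np.where(scale > 0, scale, np.finfo(float).eps)  # guard against zero spread
    z = np.abs(cycles - med) / scale
    return np.flatnonzero((z > k).any(axis=1))


# Usage: 38 noisy copies of a sine-shaped cycle, with an anomaly injected into cycle 5
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 101)
cycles = np.sin(t) + rng.normal(scale=0.05, size=(38, t.size))
cycles[5, 40:45] += 1.0
print(mad_flag_cycles(cycles, k=5.0))
```

    The median and MAD are used instead of the mean and SD so that the anomalous cycle itself barely shifts the reference it is compared against.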

  3. R code

    • figshare.com
    txt
    Updated Jun 5, 2017
    Christine Dodge (2017). R code [Dataset]. http://doi.org/10.6084/m9.figshare.5021297.v1
    Available download formats: txt
    Dataset updated
    Jun 5, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christine Dodge
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code used for each data set to perform negative binomial regression, calculate the overdispersion statistic, generate summary statistics, and remove outliers.
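    The R code itself is not reproduced in this listing. As a hedged illustration of the kind of overdispersion statistic mentioned, the sketch below computes the widely used Pearson chi-square statistic divided by the residual degrees of freedom; whether the original scripts use exactly this statistic is an assumption, and the function name `pearson_dispersion` is hypothetical. Values well above 1 under a Poisson variance suggest overdispersion, which motivates a negative binomial model with variance mu + mu^2/theta.

```python
import numpy as np


def pearson_dispersion(y, mu, n_params, theta=None):
    """Pearson chi-square divided by residual degrees of freedom for a fitted count model.

    y: observed counts; mu: fitted means; n_params: number of estimated coefficients.
    theta: negative binomial size parameter (variance mu + mu^2 / theta);
           None means a Poisson variance (variance = mu).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    var = mu if theta is None else mu + mu**2 / theta
    chi2 = np.sum((y - mu) ** 2 / var)
    return chi2 / (y.size - n_params)


# Usage: an intercept-only fit, where the fitted mean is the sample mean
counts = np.array([0, 1, 2, 3, 4])
mu_hat = np.full(counts.size, counts.mean())
print(pearson_dispersion(counts, mu_hat, n_params=1))            # Poisson variance
print(pearson_dispersion(counts, mu_hat, n_params=1, theta=4.0))  # NB variance, theta = 4
```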

  4. Data from: Male responses to sperm competition risk when rivals vary in...

    • researchdata.edu.au
    • search.dataone.org
    • +1 more
    Updated 2019
    Leigh W. Simmons; Joseph L. Tomkins; Samuel J. Lymbery; School of Biological Sciences (2019). Data from: Male responses to sperm competition risk when rivals vary in their number and familiarity [Dataset]. http://doi.org/10.5061/DRYAD.M097580
    Dataset updated
    2019
    Dataset provided by
    The University of Western Australia
    DRYAD
    Authors
    Leigh W. Simmons; Joseph L. Tomkins; Samuel J. Lymbery; School of Biological Sciences
    Description

    Males of many species adjust their reproductive investment to the number of rivals present simultaneously. However, few studies have investigated whether males sum previous encounters with rivals, and the total level of competition has never been explicitly separated from social familiarity. Social familiarity can be an important component of kin recognition and has been suggested as a cue that males use to avoid harming females when competing with relatives. Previous work has succeeded in independently manipulating social familiarity and relatedness among rivals, but experimental manipulations of familiarity are confounded with manipulations of the total number of rivals that males encounter. Using the seed beetle Callosobruchus maculatus we manipulated three factors: familiarity among rival males, the number of rivals encountered simultaneously, and the total number of rivals encountered over a 48-hour period. Males produced smaller ejaculates when exposed to more rivals in total, regardless of the maximum number of rivals they encountered simultaneously. Males did not respond to familiarity. Our results demonstrate that males of this species can sum the number of rivals encountered over separate days, and therefore the confounding of familiarity with the total level of competition in previous studies should not be ignored.

    • Lymbery et al 2018 Full dataset (Lymbery et al Full Dataset.xlsx): contains all the data used in the statistical analyses for the associated manuscript. The file contains two spreadsheets: one containing the data and one containing a legend relating to column titles.

    • Lymbery et al 2018 Reduced dataset 1 (Lymbery et al Reduced Dataset After 1st Round of Outlier Removal.xlsx): contains data used in the attached manuscript following the removal of three outliers for the purposes of data distribution, as described in the associated R code. The file contains two spreadsheets: one containing the data and one containing a legend relating to column titles.

    • Lymbery et al 2018 Reduced dataset 2 (Lymbery et al Reduced Dataset After Final Outlier Removal.xlsx): contains the data used in the statistical analyses for the associated manuscript, after the removal of all outliers stated in the manuscript and associated R code. The file contains two spreadsheets: one containing the data and one containing a legend relating to column titles.

    • Lymbery et al 2018 R Script: contains all the R code used for statistical analysis in this manuscript, with annotations to aid interpretation.

  5. Data from: Spatial detection of outlier loci with Moran eigenvector maps...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated May 31, 2022
    Helene H. Wagner; Mariana Chávez-Pesqueira; Brenna R. Forester (2022). Data from: Spatial detection of outlier loci with Moran eigenvector maps (MEM) [Dataset]. http://doi.org/10.5061/dryad.b12kk
    Available download formats: zip
    Dataset updated
    May 31, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Helene H. Wagner; Mariana Chávez-Pesqueira; Brenna R. Forester
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The spatial signature of microevolutionary processes structuring genetic variation may play an important role in the detection of loci under selection. However, the spatial location of samples has not yet been used to quantify this. Here, we present a new two-step method of spatial outlier detection at the individual and deme levels using the power spectrum of Moran eigenvector maps (MEM). The MEM power spectrum quantifies how the variation in a variable, such as the frequency of an allele at a SNP locus, is distributed across a range of spatial scales defined by MEM spatial eigenvectors. The first step (Moran spectral outlier detection: MSOD) uses genetic and spatial information to identify outlier loci by their unusual power spectrum. The second step uses Moran spectral randomization (MSR) to test the association between outlier loci and environmental predictors, accounting for spatial autocorrelation. Using simulated data from two published papers, we tested this two-step method in different scenarios of landscape configuration, selection strength, dispersal capacity and sampling design. Under scenarios that included spatial structure, MSOD alone was sufficient to detect outlier loci at the individual and deme levels without the need for incorporating environmental predictors. Follow-up with MSR generally reduced (already low) false-positive rates, though in some cases it led to a reduction in power. The results were surprisingly robust to differences in sample size and sampling design. Our method represents a new tool for detecting potential loci under selection with individual-based and population-based sampling by leveraging spatial information that has hitherto been neglected.

  6. Replication data for: Robust Estimation and Outlier Detection for...

    • dataverse.harvard.edu
    Updated Nov 28, 2007
    Walter R. Mebane; Jasjeet S. Sekhon (2007). Replication data for: Robust Estimation and Outlier Detection for Overdispersed Multinomial Models of Count Data [Dataset]. http://doi.org/10.7910/DVN/RDXADE
    Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 28, 2007
    Dataset provided by
    Harvard Dataverse
    Authors
    Walter R. Mebane; Jasjeet S. Sekhon
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    1993 - 2000
    Description

    We develop a robust estimator, the hyperbolic tangent (tanh) estimator, for overdispersed multinomial regression models of count data. The tanh estimator provides accurate estimates and reliable inferences even when the specified model fits poorly for as much as half of the data. Seriously ill-fitted counts (outliers) are identified as part of the estimation. A Monte Carlo sampling experiment shows that the tanh estimator produces good results at practical sample sizes even when ten percent of the data are generated by a significantly different process. The experiment shows that, with contaminated data, estimation fails using four other estimators: the non-robust maximum likelihood estimator, the additive logistic model, and two SUR models. When the tanh estimator is used to analyze data from Florida for the 2000 presidential election, its results match well-known features of the election that the other four estimators fail to capture. In an analysis of data from the 1993 Polish parliamentary election, the tanh estimator gives sharper inferences than does a previously proposed heteroskedastic SUR model.

  7. Guidelines for benchmarking and outlier detection in clinical quality...

    • bridges.monash.edu
    bin
    Updated Mar 26, 2025
    Jessy Hansen; Arul Earnest; Ahmad Reza Pourghaderi; Susannah Ahern (2025). Guidelines for benchmarking and outlier detection in clinical quality registries - simulation and model build code [Dataset]. http://doi.org/10.26180/28665671.v1
    Available download formats: bin
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Monash University
    Authors
    Jessy Hansen; Arul Earnest; Ahmad Reza Pourghaderi; Susannah Ahern
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains the summary dataset, simulation Stata code, and model-build R code for the study titled "Benchmarking methods for detection of underperforming healthcare providers in clinical quality registries – implementation guidelines". Contents:

    • guidelines_data_preparation.do: Stata code for running the simulations (using the user-written hiersim command available at https://doi.org/10.26180/24480889) and preparing the summary performance dataset.

    • sim_extra_sum.dta: summary performance dataset containing the average accuracy of outlier detection methods for simulations of clinical quality registry data with varied data parameters.

    • guidelines_model_build.R: R code for developing generalised linear models that predict the accuracy of outlier detection from registry data parameters.

  8. Data from: Snow depth estimation from Geoprecision-Maxbotic ultrasonic...

    • zenodo.org
    • produccioncientifica.uca.es
    bin, zip
    Updated Jun 20, 2025
    Miguel Ángel de Pablo; Belén Rosado Moscoso (2025). Snow depth estimation from Geoprecision-Maxbotic ultrasonic devices: R processing code and example datasets from Antarctica [Dataset]. http://doi.org/10.5281/zenodo.15703929
    Available download formats: zip, bin
    Dataset updated
    Jun 20, 2025
    Dataset provided by
    ACMA-Universidad de Alcalá
    Authors
    Miguel Ángel de Pablo; Belén Rosado Moscoso
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Antarctica
    Description

    This dataset provides the R script and example data used to estimate snow depth from ultrasonic distance measurements collected by low-cost Geoprecision-Maxbotic devices, designed for autonomous operation in polar conditions. The dataset includes:

    • The full R script used for data preprocessing, filtering, and snow depth calculation, with all parameters fully documented.

    • Example raw and clean data files, ready to use, acquired from a sensor installed in the South Shetland Islands (Antarctica) between 2023 and 2024.

    The processing pipeline includes outlier removal (Hampel filter), gap interpolation, moving average smoothing, reference level estimation, and snow depth conversion in millimetres and centimetres. Derived snow depths are exported alongside summary statistics.

    This code was developed as part of a research project evaluating the performance and limitations of low-cost ultrasonic snow depth measurement systems in Antarctic permafrost monitoring networks. Although the script was designed for the specific configuration of Geoprecision dataloggers and Maxbotic MB7574-SCXL-Maxsonar-WRST7 sensors, it can be easily adapted to other distance-measuring devices providing similar output formats.

    All files are provided in open formats (CSV and R) to facilitate reuse and reproducibility. Users are encouraged to modify the script to fit their own instrumentation and field conditions.
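    The processing steps described for this dataset (Hampel filtering, gap interpolation, moving-average smoothing, and conversion to snow depth) can be sketched in Python as below. This is not the dataset's R script: the window sizes, the 3-MAD threshold, the choice to mark Hampel outliers as gaps (rather than replacing them with the rolling median), the example distances, and the fixed reference distance are all hypothetical. Snow depth is taken here as the snow-free reference distance minus the measured sensor-to-surface distance, clipped at zero.

```python
import numpy as np


def hampel_to_nan(x, half_window=3, n_sigmas=3.0):
    """Mark points far from the rolling median as NaN so they can be interpolated."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for i in range(x.size):
        lo, hi = max(0, i - half_window), min(x.size, i + half_window + 1)
        window = x[lo:hi]
        med = np.nanmedian(window)
        scale = 1.4826 * np.nanmedian(np.abs(window - med))  # MAD scaled to an SD
        if scale > 0 and abs(x[i] - med) > n_sigmas * scale:
            out[i] = np.nan
    return out


def fill_gaps(x):
    """Linearly interpolate NaN gaps from the neighbouring valid samples."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(x.size)
    ok = ~np.isnan(x)
    x[~ok] = np.interp(idx[~ok], idx[ok], x[ok])
    return x


def moving_average(x, width=3):
    """Centered moving average of odd width; edges are padded by repeating end values."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")


def snow_depth_mm(distance_mm, reference_mm):
    """Snow depth as the reduction in sensor-to-surface distance, clipped at zero."""
    return np.clip(reference_mm - np.asarray(distance_mm, dtype=float), 0.0, None)


# Usage: distances (mm) with one spurious echo at index 4
distances = np.array([1500, 1501, 1499, 1500, 300, 1500, 1501, 1499, 1500])
clean = fill_gaps(hampel_to_nan(distances))
smooth = moving_average(clean)
depth_cm = snow_depth_mm(smooth, reference_mm=1600.0) / 10.0
print(depth_cm)
```

    Edge padding is used instead of zero padding so that the first and last smoothed values are not pulled toward zero, which would otherwise appear as spurious snow depth.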

  9. Data from: Dimension Reduction for Outlier Detection Using DOBIN

    • tandf.figshare.com
    • figshare.com
    zip
    Updated Jun 1, 2023
    Sevvandi Kandanaarachchi; Rob J. Hyndman (2023). Dimension Reduction for Outlier Detection Using DOBIN [Dataset]. http://doi.org/10.6084/m9.figshare.12844487.v3
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Sevvandi Kandanaarachchi; Rob J. Hyndman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article introduces DOBIN, a new approach to select a set of basis vectors tailored for outlier detection. DOBIN has a simple mathematical foundation and can be used as a dimension reduction tool for outlier detection tasks. We demonstrate the effectiveness of DOBIN on an extensive data repository by comparing the performance of outlier detection methods using DOBIN and other bases. We further illustrate the utility of DOBIN as an outlier visualization tool. The R package dobin implements this basis construction. Supplementary materials for this article are available online.

  10. ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...

    • data.niaid.nih.gov
    • elki-project.github.io
    • +1 more
    Updated May 2, 2024
    Schubert, Erich; Zimek, Arthur (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6355683
    Dataset updated
    May 2, 2024
    Dataset provided by
    Ludwig-Maximilians-Universität München
    Authors
    Schubert, Erich; Zimek, Arthur
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data sets were originally created for the following publications:

    M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek. Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

    H.-P. Kriegel, E. Schubert, A. Zimek. Evaluation of Multiple Clustering Solutions. In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings, held in conjunction with ECML PKDD 2011, Athens, Greece, 2011.

    The outlier data set versions were introduced in:

    E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel. On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

    They are derived from the original image data available at https://aloi.science.uva.nl/

    The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, A. W. M. Smeulders. The Amsterdam Library of Object Images. Int. J. Comput. Vision, 61(1), 103-112, January 2005.

    Additional information is available at: https://elki-project.github.io/datasets/multi_view

    The following views are currently available:

    • Object number: sparse 1000-dimensional vectors that give the true object assignment. Files: objs.arff.gz
    • RGB color histograms: standard RGB color histograms (uniform binning). Files: aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz
    • HSV color histograms: standard HSV/HSB color histograms in various binnings. Files: aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz
    • Color similarity: average similarity to 77 reference colors (not histograms): 18 colors × 2 saturation levels × 2 brightness levels + 5 grey values (incl. white and black). Files: aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
    • Haralick features: first 13 Haralick features (radius 1 pixel). Files: aloi-haralick-1.csv.gz
    • Front to back: vectors representing front face vs. back faces of individual objects. Files: front.arff.gz
    • Basic light: vectors indicating basic light situations. Files: light.arff.gz
    • Manual annotations: manually annotated groups of semantically related objects, such as cups. Files: manual1.arff.gz
    

    Outlier Detection Versions

    Additionally, we generated a number of subsets for outlier detection, all based on the RGB histograms:

    • Downsampled to 100000 objects (553 outliers). Files: aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
    • Downsampled to 75000 objects (717 outliers). Files: aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
    • Downsampled to 50000 objects (1508 outliers). Files: aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz
    
  11. Data from: Leave-One-Out Kernel Density Estimates for Outlier Detection

    • tandf.figshare.com
    txt
    Updated May 31, 2023
    Sevvandi Kandanaarachchi; Rob J Hyndman (2023). Leave-One-Out Kernel Density Estimates for Outlier Detection [Dataset]. http://doi.org/10.6084/m9.figshare.16942936.v2
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Sevvandi Kandanaarachchi; Rob J Hyndman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article introduces lookout, a new approach to detect outliers using leave-one-out kernel density estimates and extreme value theory. Outlier detection methods that use kernel density estimates generally employ a user-defined parameter to determine the bandwidth. Lookout uses persistent homology to construct a bandwidth suitable for outlier detection without any user input. We demonstrate the effectiveness of lookout on an extensive data repository by comparing its performance with other outlier detection methods based on extreme value theory. Furthermore, we introduce outlier persistence, a useful concept that explores the birth and the cessation of outliers with changing bandwidth and significance levels. The R package lookout implements this algorithm. Supplementary files for this article are available online.
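    The core quantity named in this abstract, a leave-one-out kernel density estimate at each observation, can be sketched with a Gaussian kernel as below. This covers only the first ingredient of lookout: here the bandwidth is a hypothetical user-supplied value, whereas lookout chooses it via persistent homology, and the extreme value theory step that turns low densities into outlier probabilities is omitted. The function name `loo_kde` is illustrative.

```python
import numpy as np


def loo_kde(X, bandwidth):
    """Leave-one-out Gaussian KDE evaluated at each observation.

    X: array of shape (n, d) or (n,); bandwidth: scalar, assumed fixed here.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, d = X.shape
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    norm = (2.0 * np.pi) ** (d / 2) * bandwidth**d
    K = np.exp(-sq_dist / (2.0 * bandwidth**2)) / norm
    np.fill_diagonal(K, 0.0)          # exclude each point from its own density estimate
    return K.sum(axis=1) / (n - 1)


# Usage: a tight cluster plus one distant point; the distant point gets the lowest density
x = np.array([0.0, 0.1, -0.1, 0.2, -0.2, 5.0])
dens = loo_kde(x, bandwidth=0.5)
print(np.argmin(dens))  # 5
```

    Leaving each point out matters: if the diagonal were kept, every observation would receive at least the kernel's peak height from itself, which masks isolated points.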

  12. Data from: Objective Bayesian Survival Analysis Using Shape Mixtures of...

    • tandf.figshare.com
    • figshare.com
    pdf
    Updated Jun 1, 2023
    Catalina A. Vallejos; Mark F. J. Steel (2023). Objective Bayesian Survival Analysis Using Shape Mixtures of Log-Normal Distributions [Dataset]. http://doi.org/10.6084/m9.figshare.1473746.v3
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Catalina A. Vallejos; Mark F. J. Steel
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Survival models such as the Weibull or log-normal lead to inference that is not robust to the presence of outliers. They also assume that all heterogeneity between individuals can be modeled through covariates. This article considers the use of infinite mixtures of lifetime distributions as a solution for these two issues. This can be interpreted as the introduction of a random effect in the survival distribution. We introduce the family of shape mixtures of log-normal distributions, which covers a wide range of density and hazard functions. Bayesian inference under nonsubjective priors based on the Jeffreys’ rule is examined and conditions for posterior propriety are established. The existence of the posterior distribution on the basis of a sample of point observations is not always guaranteed and a solution through set observations is implemented. In addition, we propose a method for outlier detection based on the mixture structure. A simulation study illustrates the performance of our methods under different scenarios and an application to a real dataset is provided. Supplementary materials for the article, which include R code, are available online.

  13. Numbers of putative directional and balancing Fst outlier loci discovered.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Monal M. Lal; Paul C. Southgate; Dean R. Jerry; Cyprien Bosserelle; Kyall R. Zenger (2023). Numbers of putative directional and balancing Fst outlier loci discovered. [Dataset]. http://doi.org/10.1371/journal.pone.0161390.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Monal M. Lal; Paul C. Southgate; Dean R. Jerry; Cyprien Bosserelle; Kyall R. Zenger
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tests were carried out at three False Discovery Rate (FDR) thresholds using BayeScan 2.1 [70] and LOSITAN [72]. Jointly identified loci are those detected by both outlier detection platforms.

  14. Causal effect estimates using Radial MVMR with and without outlier removal...

    • plos.figshare.com
    xls
    Updated Dec 30, 2024
    + more versions
    Wes Spiller; Jack Bowden; Eleanor Sanderson (2024). Causal effect estimates using Radial MVMR with and without outlier removal with varying levels of balanced pleiotropy. [Dataset]. http://doi.org/10.1371/journal.pgen.1011506.t002
    Available download formats: xls
    Dataset updated
    Dec 30, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Wes Spiller; Jack Bowden; Eleanor Sanderson
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Causal effect estimates using Radial MVMR with and without outlier removal with varying levels of balanced pleiotropy.

  15. Data from: Modeling of the Sintered Density in Cu-Al Alloy Using Machine...

    • acs.figshare.com
    xlsx
    Updated Jul 25, 2023
    Saleh Asnaashari; Mohammadhadi Shateri; Abdolhossein Hemmati-Sarapardeh; Shahab S. Band (2023). Modeling of the Sintered Density in Cu-Al Alloy Using Machine Learning Approaches [Dataset]. http://doi.org/10.1021/acsomega.2c07278.s001
    Available download formats: xlsx
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    ACS Publications
    Authors
    Saleh Asnaashari; Mohammadhadi Shateri; Abdolhossein Hemmati-Sarapardeh; Shahab S. Band
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In powder metallurgy materials, the sintered density of Cu-Al alloy plays a critical role in determining mechanical properties. Experimental measurement of this property is costly and time-consuming. In this study, adaptive boosting decision tree, support vector regression, k-nearest neighbors, extreme gradient boosting, and four multilayer perceptron (MLP) models trained with resilient backpropagation, Levenberg–Marquardt (LM), scaled conjugate gradient, and Bayesian regularization were employed for predicting powder densification through sintering. Yield strength, Young’s modulus, volume variation caused by the phase transformation, hardness, liquid volume, liquidus temperature, the solubility ratio between the liquid phase and the solid phase, sintered temperature, solidus temperature, sintered atmosphere, holding time, compaction pressure, particle size, and specific shape factor were regarded as the input parameters of the suggested models. The cross plot, error distribution curve, and cumulative frequency diagram as graphical tools, and average percent relative error (APRE), average absolute percent relative error (AAPRE), root mean square error (RMSE), standard deviation (SD), and coefficient of correlation (R) as statistical measures, were used to evaluate the models’ accuracy. All of the developed models were compared with preexisting approaches, and the results showed that the models developed in the present work are more precise and valid than the existing ones. The designed MLP-LM model was found to be the most precise approach, with AAPRE = 1.292%, APRE = −0.032%, SD = 0.020, RMSE = 0.016, and R = 0.989. Lastly, outlier detection was performed using the leverage technique to identify suspect data points; it showed that only a few points lie outside the applicability domain of the proposed MLP-LM model.
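    The leverage technique for outlier detection is commonly implemented through the quantities of a Williams plot: the hat-matrix diagonal h_ii is compared with a warning leverage h* = 3(p + 1)/n, where p is the number of inputs and n the number of data points, and standardized residuals are compared with a ±3 band. The sketch below assumes that common convention, including an intercept column in the design matrix; whether the authors used exactly these choices is an assumption, and the function name, data, and residual standardization are hypothetical.

```python
import numpy as np


def williams_outliers(X, y_true, y_pred, resid_cut=3.0):
    """Leverage-based applicability-domain check (Williams plot quantities).

    X: (n, p) matrix of model inputs; y_true, y_pred: observed and predicted targets.
    Returns leverages, the warning leverage h*, standardized residuals, and suspect indices.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])  # assumption: intercept column included
    # Diagonal of the hat matrix H = Xd (Xd^T Xd)^-1 Xd^T
    h = np.einsum("ij,jk,ik->i", Xd, np.linalg.pinv(Xd.T @ Xd), Xd)
    h_star = 3.0 * (p + 1) / n
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    std_resid = (resid - resid.mean()) / resid.std(ddof=1)
    suspect = (h > h_star) | (np.abs(std_resid) > resid_cut)
    return h, h_star, std_resid, np.flatnonzero(suspect)


# Usage: 40 points with two inputs; row 0 is placed far outside the input range
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
X[0] = [10.0, 10.0]
y = X.sum(axis=1)
y_pred = y + rng.normal(scale=0.1, size=40)
h, h_star, std_resid, suspect = williams_outliers(X, y, y_pred)
print(h_star, suspect)
```

    Points with h > h* lie outside the input region covered by the training data, so their predictions are extrapolations; points with large standardized residuals are suspect responses even when their inputs are typical.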

  16. Reduction in model Λ after sequential removal of major outlier populations.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Keith Hunley; Michael Dunn; Eva Lindström; Ger Reesink; Angela Terrill; Meghan E. Healy; George Koki; Françoise R. Friedlaender; Jonathan S. Friedlaender (2023). Reduction in model Λ after sequential removal of major outlier populations. [Dataset]. http://doi.org/10.1371/journal.pgen.1000239.t004
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Keith Hunley; Michael Dunn; Eva Lindström; Ger Reesink; Angela Terrill; Meghan E. Healy; George Koki; Françoise R. Friedlaender; Jonathan S. Friedlaender
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    See Text S1.

  17. Data cleaning EVI2

    • figshare.com
    txt
    Updated May 13, 2019
    Geraldine Klarenberg (2019). Data cleaning EVI2 [Dataset]. http://doi.org/10.6084/m9.figshare.5327527.v1
    Available download formats: txt
    Dataset updated
    May 13, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Geraldine Klarenberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scripts to clean EVI2 data obtained from the VIP lab (University of Arizona) website (https://vip.arizona.edu/about.php and https://vip.arizona.edu/viplab_data_explorer.php). Data were obtained in 2012. The scripts cover:
    - outlier detection and removal/replacement
    - alignment of 2 periods
    The manuscript detailing the methods and resulting data sets has been accepted for publication in Nature Scientific Data (05/11/2019). Instructions: see the R Markdown HTML file. Code last modified and tested in R 3.4.3 ("Kite-Eating Tree").

  18. Additional file 2 of Detection of suspicious interactions of spiking...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Miriam Sieg; Gesa Richter; Arne Schaefer; Jochen Kruppa (2023). Additional file 2 of Detection of suspicious interactions of spiking covariates in methylation data [Dataset]. http://doi.org/10.6084/m9.figshare.11776278.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Miriam Sieg; Gesa Richter; Arne Schaefer; Jochen Kruppa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2: R code and an example of Algorithms 1 and 2 for the detection of suspicious spike interactions.

  19. Causal effect estimates obtained using radial MR and radial MVMR models,...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Dec 30, 2024
    Wes Spiller; Jack Bowden; Eleanor Sanderson (2024). Causal effect estimates obtained using radial MR and radial MVMR models, estimating the effect of lipid fractions (HDL, LDL, and triglycerides) on CHD. [Dataset]. http://doi.org/10.1371/journal.pgen.1011506.t004
    Available download formats: xls
    Dataset updated
    Dec 30, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Wes Spiller; Jack Bowden; Eleanor Sanderson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Causal effect estimates obtained using radial MR and radial MVMR models, estimating the effect of lipid fractions (HDL, LDL, and triglycerides) on CHD.

  20. Data from: mzQuality: An Open-Source Software Tool for Quality Monitoring...

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated Jul 25, 2025
    Marielle van der Peet; Pascal Maas; Agnieszka Wegrzyn; Lieke Lamont; Ronan Fleming; Constance Bordes; Stéphanie Debette; Amy Harms; Thomas Hankemeier; Alida Kindt (2025). mzQuality: An Open-Source Software Tool for Quality Monitoring and Reporting of Targeted Mass Spectrometry Measurements [Dataset]. http://doi.org/10.1021/jasms.5c00073.s002
    Available download formats: xlsx
    Dataset updated
    Jul 25, 2025
    Dataset provided by
    ACS Publications
    Authors
    Marielle van der Peet; Pascal Maas; Agnieszka Wegrzyn; Lieke Lamont; Ronan Fleming; Constance Bordes; Stéphanie Debette; Amy Harms; Thomas Hankemeier; Alida Kindt
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Analyzing metabolites using mass spectrometry provides valuable insight into an individual’s health or disease status. However, various sources of experimental variation can be introduced during sample handling, preparation, and measurement, which can negatively affect the data. Quality assurance and quality control practices are essential to ensuring accurate and reproducible metabolomics data. These practices include measuring reference samples to monitor instrument stability, blank samples to evaluate the background signal, and strategies to correct for changes in instrumental performance. In this context, we introduce mzQuality, a user-friendly, open-source R-Shiny app designed to assess and correct technical variation in mass spectrometry-based metabolomics data. It processes peak-integrated data independently of vendor software and provides essential quality control features, including batch correction, outlier detection, and background signal assessment, and it visualizes trends in signal or retention time. We demonstrate its functionality using a data set of 419 samples measured across six batches, including quality control samples. mzQuality visualizes data through sample plots, PCA plots, and violin plots, which illustrate its ability to reduce the effects of experimental variation. Compound quality is further assessed by evaluating the relative standard deviation of quality control samples and the background signal from blank samples. Based on these quality metrics, compounds are classified into confidence levels. mzQuality provides an accessible solution to improve data quality without requiring prior programming skills. Its customizable settings integrate seamlessly into research workflows, enhancing the accuracy and reproducibility of metabolomics data. Additionally, with an R-compatible output, the data are ready for statistical analysis and biological interpretation.
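
    The compound-quality assessment described above combines two per-compound metrics. As an illustration only, the Python sketch below computes a compound's relative standard deviation (%RSD) across QC injections and its mean blank signal as a percentage of the mean study-sample signal, then maps the two to a confidence label. The cut-offs (15% RSD, 40% blank), the three labels and the function names are hypothetical placeholders, not mzQuality's settings or API.

```python
import statistics

# Hypothetical cut-offs for illustration only; not mzQuality's defaults.
RSD_LIMIT = 15.0    # maximum %RSD of a compound across QC injections
BLANK_LIMIT = 40.0  # maximum blank signal as % of mean study-sample signal


def qc_rsd(qc_areas):
    """Relative standard deviation (%) of one compound across QC injections."""
    return 100.0 * statistics.stdev(qc_areas) / statistics.mean(qc_areas)


def blank_pct(blank_areas, sample_areas):
    """Mean blank signal as a percentage of the mean study-sample signal."""
    return 100.0 * statistics.mean(blank_areas) / statistics.mean(sample_areas)


def confidence_level(qc_areas, blank_areas, sample_areas):
    """Assign a hypothetical confidence label from the two quality metrics."""
    rsd_ok = qc_rsd(qc_areas) <= RSD_LIMIT
    blank_ok = blank_pct(blank_areas, sample_areas) <= BLANK_LIMIT
    if rsd_ok and blank_ok:
        return "high"
    if rsd_ok or blank_ok:
        return "medium"
    return "low"
```

    For example, confidence_level([98.0, 102.0, 100.0], [5.0, 7.0], [100.0, 110.0]) returns "high": the QC RSD is 2% and the blank signal is about 5.7% of the sample mean. The statistics module's stdev needs at least two QC values per compound.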

