10 datasets found
  1. Data from: Valid Inference Corrected for Outlier Removal

    • figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Shuxiao Chen; Jacob Bien (2023). Valid Inference Corrected for Outlier Removal [Dataset]. http://doi.org/10.6084/m9.figshare.9762731.v1
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Shuxiao Chen; Jacob Bien
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ordinary least squares (OLS) estimation of a linear regression model is well known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. This standard “detect-and-forget” approach has been shown to be problematic, and in this paper we highlight the fact that it can lead to invalid inference and show how recently developed tools in selective inference can be used to properly account for outlier detection and removal. Our inferential procedures apply to a general class of outlier removal procedures that includes several of the most commonly used approaches. We conduct simulations to corroborate the theoretical results, and we apply our method to three real data sets to illustrate how our inferential results can differ from the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R.
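
    For orientation, here is a minimal base-R sketch of the “detect-and-forget” workflow the abstract warns against; the simulated data, the Cook's-distance rule, and the 4/n cutoff are illustrative assumptions, and the point is that the second confint() call ignores the selection step that the outference package is designed to correct.

        ## Naive "detect-and-forget" workflow (illustrative sketch, not the paper's corrected method)
        set.seed(1)
        n <- 100
        x <- rnorm(n)
        y <- 1 + 2 * x + rnorm(n)             # hypothetical data
        fit_full <- lm(y ~ x)

        ## Step 1: flag outliers by looking at the fitted model
        ## (Cook's distance > 4/n is one common ad hoc rule).
        keep <- cooks.distance(fit_full) <= 4 / n

        ## Step 2: refit on the remaining rows and report intervals
        ## as if this were the originally collected data.
        fit_trim <- lm(y ~ x, subset = keep)
        confint(fit_trim)                     # naive intervals that ignore the removal step

    The abstract's point is that intervals produced this way can be invalid; outference exposes lm-style functions that account for the removal step, but its exact interface is not reproduced here.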

  2. R code

    • figshare.com
    txt
    Updated Jun 5, 2017
    Cite
    Christine Dodge (2017). R code [Dataset]. http://doi.org/10.6084/m9.figshare.5021297.v1
    Available download formats: txt
    Dataset updated
    Jun 5, 2017
    Dataset provided by
    figshare, http://figshare.com/
    Authors
    Christine Dodge
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code used for each data set to perform negative binomial regression, calculate the overdispersion statistic, generate summary statistics, and remove outliers.
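
    The deposited script itself is not reproduced here, but a common R pattern for those four steps, with hypothetical variable names and a Pearson-based dispersion statistic, looks roughly like this:

        library(MASS)                              # glm.nb() for negative binomial regression

        ## Hypothetical data: a count response and one grouping factor.
        dat <- data.frame(count = rnbinom(200, mu = 5, size = 1.2),
                          group = gl(2, 100))

        ## Negative binomial regression and summary statistics
        fit <- glm.nb(count ~ group, data = dat)
        summary(fit)

        ## Overdispersion statistic: Pearson chi-square over residual df
        ## (values well above 1 suggest remaining overdispersion).
        sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

        ## One ad hoc outlier rule: drop |standardized residual| > 3, then refit.
        keep <- abs(rstandard(fit)) <= 3
        fit_trim <- glm.nb(count ~ group, data = dat[keep, ])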

  3. ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI)

    • data.niaid.nih.gov
    • elki-project.github.io
    • +1more
    Updated May 2, 2024
    Cite
    Schubert, Erich (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6355683
    Dataset updated
    May 2, 2024
    Dataset provided by
    Schubert, Erich
    Zimek, Arthur
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data sets were originally created for the following publications:

    M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek: Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

    H.-P. Kriegel, E. Schubert, A. Zimek: Evaluation of Multiple Clustering Solutions. In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings, held in conjunction with ECML PKDD 2011, Athens, Greece, 2011.

    The outlier data set versions were introduced in:

    E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel: On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

    They are derived from the original image data available at https://aloi.science.uva.nl/

    The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, The Amsterdam library of object images, Int. J. Comput. Vision, 61(1), 103-112, January, 2005

    Additional information is available at: https://elki-project.github.io/datasets/multi_view

    The following views are currently available:

        Object number: Sparse 1000-dimensional vectors that give the true object assignment. Files: objs.arff.gz
        RGB color histograms: Standard RGB color histograms (uniform binning). Files: aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz
        HSV color histograms: Standard HSV/HSB color histograms in various binnings. Files: aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz
        Color similarity: Average similarity to 77 reference colors (not histograms): 18 colors x 2 saturations x 2 brightnesses + 5 grey values (incl. white, black). Files: aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
        Haralick features: First 13 Haralick features (radius 1 pixel). Files: aloi-haralick-1.csv.gz
        Front to back: Vectors representing front faces vs. back faces of individual objects. Files: front.arff.gz
        Basic light: Vectors indicating basic light situations. Files: light.arff.gz
        Manual annotations: Manually annotated groups of semantically related objects, such as cups. Files: manual1.arff.gz

    Outlier Detection Versions

    Additionally, we generated a number of subsets for outlier detection:

        RGB color histograms, downsampled to 100000 objects (553 outliers). Files: aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
        RGB color histograms, downsampled to 75000 objects (717 outliers). Files: aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
        RGB color histograms, downsampled to 50000 objects (1508 outliers). Files: aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz
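
    As a sketch of how one of these downsampled subsets might be scored (the internal layout of the .csv.gz files is an assumption here; inspect a file first and drop any label column before computing scores), a simple distance-based outlier ranking in R:

        library(dbscan)                      # kNNdist() for k-nearest-neighbor distances

        ## read.csv() reads gzip-compressed files directly; the header/label layout is assumed.
        aloi  <- read.csv("aloi-27d-50000-max5-tot1508.csv.gz", header = FALSE)
        feats <- as.matrix(aloi[, sapply(aloi, is.numeric)])   # keep numeric columns only

        ## Unsupervised outlier score: distance to the k-th nearest neighbor.
        score <- kNNdist(feats, k = 10)

        ## With 1508 labelled outliers in this subset, rankings like this one are what
        ## outlier-evaluation measures are computed against.
        head(order(score, decreasing = TRUE), 20)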
    
  4. Causal effect estimates using Radial MVMR with and without outlier removal with varying levels of unbalanced pleiotropy

    • plos.figshare.com
    xls
    Updated Dec 30, 2024
    Cite
    Wes Spiller; Jack Bowden; Eleanor Sanderson (2024). Causal effect estimates using Radial MVMR with and without outlier removal with varying levels of unbalanced pleiotropy. [Dataset]. http://doi.org/10.1371/journal.pgen.1011506.t003
    Available download formats: xls
    Dataset updated
    Dec 30, 2024
    Dataset provided by
    PLOS, http://plos.org/
    Authors
    Wes Spiller; Jack Bowden; Eleanor Sanderson
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Causal effect estimates using Radial MVMR with and without outlier removal with varying levels of unbalanced pleiotropy.
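
    For orientation, a univariable base-R sketch of the radial IVW idea behind these estimates (the simulated summary statistics, first-order weights, and chi-square cutoff on the per-variant Q contribution are all illustrative assumptions; the table itself comes from the authors' multivariable Radial MVMR analysis):

        ## Radial IVW with and without Q-based outlier removal (illustrative sketch).
        set.seed(42)
        J      <- 50
        b_exp  <- rnorm(J, 0.08, 0.02)                # SNP-exposure estimates (hypothetical)
        b_out  <- 0.4 * b_exp + rnorm(J, 0, 0.01)     # SNP-outcome estimates (hypothetical)
        se_out <- rep(0.01, J)

        ratio <- b_out / b_exp                        # per-variant Wald ratios
        w     <- b_exp^2 / se_out^2                   # first-order radial weights

        radial_ivw <- function(ratio, w) {
          y    <- ratio * sqrt(w)
          x    <- sqrt(w)
          fit  <- lm(y ~ x - 1)                       # radial regression through the origin
          beta <- unname(coef(fit)[1])
          Qj   <- w * (ratio - beta)^2                # per-variant Cochran's Q contribution
          list(beta = beta, Qj = Qj)
        }

        full <- radial_ivw(ratio, w)
        keep <- full$Qj < qchisq(0.95, df = 1)        # flag variants with outlying Q contributions
        trim <- radial_ivw(ratio[keep], w[keep])

        c(with_outliers = full$beta, outliers_removed = trim$beta)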

  5. RRegrs study for Growth Yield

    • figshare.com
    txt
    Updated Jun 5, 2016
    Cite
    Cristian Robert Munteanu (2016). RRegrs study for Growth Yield [Dataset]. http://doi.org/10.6084/m9.figshare.3409804.v2
    Available download formats: txt
    Dataset updated
    Jun 5, 2016
    Dataset provided by
    figshare
    Authors
    Cristian Robert Munteanu
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    RRegrs study for Growth Yield for the original and corrected/filtered datasets: input training and test files, R scripts to split the datasets, and a plot for outlier removal.
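
    The deposited scripts are not reproduced here, but a generic base-R pattern for the two tasks named above, splitting a dataset and plotting an outlier-removal diagnostic, might look like this (the column names, 75/25 split, and Cook's-distance cutoff are assumptions):

        ## Hypothetical data frame with a response `yield` and two numeric predictors.
        dat <- data.frame(yield = rnorm(120), x1 = rnorm(120), x2 = rnorm(120))

        ## Split the dataset into training and test files (ratio assumed).
        set.seed(2016)
        idx   <- sample(seq_len(nrow(dat)), size = floor(0.75 * nrow(dat)))
        train <- dat[idx, ]
        test  <- dat[-idx, ]
        write.csv(train, "growth_yield_train.csv", row.names = FALSE)
        write.csv(test,  "growth_yield_test.csv",  row.names = FALSE)

        ## Plot for outlier removal: Cook's distance from a linear fit, with a 4/n cutoff.
        fit <- lm(yield ~ ., data = train)
        plot(cooks.distance(fit), type = "h",
             ylab = "Cook's distance", main = "Outlier screening (training set)")
        abline(h = 4 / nrow(train), lty = 2)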

  6. Causal effect estimates obtained using radial MR and radial MVMR models, estimating the effect of lipid fractions (HDL, LDL, and triglycerides) on CHD

    • figshare.com
    • plos.figshare.com
    xls
    Updated Dec 30, 2024
    Cite
    Wes Spiller; Jack Bowden; Eleanor Sanderson (2024). Causal effect estimates obtained using radial MR and radial MVMR models, estimating the effect of lipid fractions (HDL, LDL, and triglycerides) on CHD. [Dataset]. http://doi.org/10.1371/journal.pgen.1011506.t004
    Available download formats: xls
    Dataset updated
    Dec 30, 2024
    Dataset provided by
    PLOS Genetics
    Authors
    Wes Spiller; Jack Bowden; Eleanor Sanderson
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Causal effect estimates obtained using radial MR and radial MVMR models, estimating the effect of lipid fractions (HDL, LDL, and triglycerides) on CHD.

  7. Data from: Modeling of the Sintered Density in Cu-Al Alloy Using Machine Learning Approaches

    • acs.figshare.com
    xlsx
    Updated Jul 25, 2023
    Cite
    Saleh Asnaashari; Mohammadhadi Shateri; Abdolhossein Hemmati-Sarapardeh; Shahab S. Band (2023). Modeling of the Sintered Density in Cu-Al Alloy Using Machine Learning Approaches [Dataset]. http://doi.org/10.1021/acsomega.2c07278.s001
    Available download formats: xlsx
    Dataset updated
    Jul 25, 2023
    Dataset provided by
    ACS Publications
    Authors
    Saleh Asnaashari; Mohammadhadi Shateri; Abdolhossein Hemmati-Sarapardeh; Shahab S. Band
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In powder metallurgy materials, sintered density in Cu-Al alloy plays a critical role in determining mechanical properties. Experimental measurement of this property is costly and time-consuming. In this study, adaptive boosting decision tree, support vector regression, k-nearest neighbors, extreme gradient boosting, and four multilayer perceptron (MLP) models tuned by resilient backpropagation, Levenberg–Marquardt (LM), scaled conjugate gradient, and Bayesian regularization were employed for predicting powder densification through sintering. Yield strength, Young’s modulus, volume variation caused by the phase transformation, hardness, liquid volume, liquidus temperature, the solubility ratio between the liquid phase and the solid phase, sintered temperature, solidus temperature, sintered atmosphere, holding time, compaction pressure, particle size, and specific shape factor were regarded as the input parameters of the suggested models. The cross plot, error distribution curve, and cumulative frequency diagram were used as graphical tools, and average percent relative error (APRE), average absolute percent relative error (AAPRE), root mean square error (RMSE), standard deviation (SD), and coefficient of correlation (R) were used as statistical measures to evaluate the models’ accuracy. All of the developed models were compared with preexisting approaches, and the results showed that the models developed in the present work are more precise and valid than the existing ones. The designed MLP-LM model was found to be the most precise approach, with AAPRE = 1.292%, APRE = −0.032%, SD = 0.020, RMSE = 0.016, and R = 0.989. Finally, outlier detection was performed using the leverage technique to identify suspected data points; it showed that only a few points lie outside the applicability domain of the proposed MLP-LM model.
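
    A compact sketch of the leverage screening mentioned in the last sentence, using the conventional warning leverage h* = 3(p + 1)/n and a ±3 band on standardized residuals (these cutoffs and the linear surrogate model are standard conventions assumed here, not the authors' exact implementation):

        ## Leverage-based applicability-domain check (Williams plot), illustrative only.
        set.seed(7)
        n <- 150; p <- 4
        X <- matrix(rnorm(n * p), n, p)                 # hypothetical input features
        y <- drop(X %*% runif(p)) + rnorm(n, sd = 0.3)  # hypothetical target (sintered density)

        fit    <- lm(y ~ X)                             # surrogate model for the screening
        h      <- hatvalues(fit)                        # leverage of each data point
        h_star <- 3 * (p + 1) / n                       # warning leverage
        r_std  <- rstandard(fit)                        # standardized residuals

        suspect <- h > h_star | abs(r_std) > 3          # points outside the applicability domain
        which(suspect)

        plot(h, r_std, xlab = "Leverage", ylab = "Standardized residual")
        abline(v = h_star, lty = 2); abline(h = c(-3, 3), lty = 2)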

  8. Pearson correlations (r) between siblings for Eyes scores and Eyes scores adjusted by removing the low-scoring outliers (Eyes Adj >17)

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Gillian Ragsdale; Robert A. Foley (2023). Pearson correlations (r) between siblings for Eyes scores and Eyes scores adjusted by removing the low-scoring outliers (Eyes Adj >17). [Dataset]. http://doi.org/10.1371/journal.pone.0023236.t003
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Gillian Ragsdale; Robert A. Foley
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ** Correlation is significant at the 0.01 level (2-tailed). * Correlation is significant at the 0.05 level (2-tailed). ' Correlation is significant at the 0.1 level (2-tailed). For each model, the two categories of sibling pairs are derived from Table 2. In each case, a possible fit (in bold) is indicated by the second correlation being less than the first.

  9. MLR models of age at onset of T1D after removing outliers (N = 354).

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Ahood Alazwari; Mali Abdollahian; Laleh Tafakori; Alice Johnstone; Rahma A. Alshumrani; Manal T. Alhelal; Abdulhameed Y. Alsaheel; Eman S. Almoosa; Aseel R. Alkhaldi (2023). MLR models of age at onset of T1D after removing outliers (N = 354). [Dataset]. http://doi.org/10.1371/journal.pone.0264118.t006
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS, http://plos.org/
    Authors
    Ahood Alazwari; Mali Abdollahian; Laleh Tafakori; Alice Johnstone; Rahma A. Alshumrani; Manal T. Alhelal; Abdulhameed Y. Alsaheel; Eman S. Almoosa; Aseel R. Alkhaldi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MLR models of age at onset of T1D after removing outliers (N = 354).

  10. Numbers of putative directional and balancing Fst outlier loci discovered

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Monal M. Lal; Paul C. Southgate; Dean R. Jerry; Cyprien Bosserelle; Kyall R. Zenger (2023). Numbers of putative directional and balancing Fst outlier loci discovered. [Dataset]. http://doi.org/10.1371/journal.pone.0161390.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Monal M. Lal; Paul C. Southgate; Dean R. Jerry; Cyprien Bosserelle; Kyall R. Zenger
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tests were carried out at three False Discovery Rate (FDR) thresholds using BayeScan 2.1 [70] and LOSITAN [72]. Loci flagged by both outlier detection platforms were considered jointly identified.
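
    To make "jointly identified" concrete, here is a generic R illustration of thresholding two sets of per-locus results at an FDR level and intersecting them (the uniform p-values and BH adjustment below are stand-ins; BayeScan and LOSITAN each report their own per-locus statistics and FDR machinery):

        ## Generic illustration of FDR-thresholded outlier loci and their intersection.
        set.seed(3)
        loci       <- paste0("locus_", 1:500)
        p_bayescan <- runif(500)                 # stand-in for BayeScan per-locus output
        p_lositan  <- runif(500)                 # stand-in for LOSITAN per-locus output

        fdr <- 0.05                              # one of the three FDR thresholds
        out_bayescan <- loci[p.adjust(p_bayescan, method = "BH") < fdr]
        out_lositan  <- loci[p.adjust(p_lositan,  method = "BH") < fdr]

        ## Jointly identified loci: flagged by both outlier detection platforms.
        jointly_identified <- intersect(out_bayescan, out_lositan)
        length(jointly_identified)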
