86 datasets found

Data from: Valid Inference Corrected for Outlier Removal
tandf.figshare.com
pdf
Updated Jun 4, 2023
Cite
Shuxiao Chen; Jacob Bien (2023). Valid Inference Corrected for Outlier Removal [Dataset]. http://doi.org/10.6084/m9.figshare.9762731.v4
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.9762731.v4
Dataset updated
Jun 4, 2023
Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
Authors
Shuxiao Chen; Jacob Bien
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ordinary least squares (OLS) estimation of a linear regression model is well known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. This standard “detect-and-forget” approach has been shown to be problematic, and in this article we highlight the fact that it can lead to invalid inference and show how recently developed tools in selective inference can be used to properly account for outlier detection and removal. Our inferential procedures apply to a general class of outlier removal procedures that includes several of the most commonly used approaches. We conduct simulations to corroborate the theoretical results, and we apply our method to three real datasets to illustrate how our inferential results can differ from the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R. Supplementary materials for this article are available online.
kubernetes-reformatted-remove-outliers
huggingface.co
Updated Aug 12, 2024
Cite
Ishaan Sehgal (2024). kubernetes-reformatted-remove-outliers [Dataset]. https://huggingface.co/datasets/ishaansehgal99/kubernetes-reformatted-remove-outliers
Explore at:
Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
Dataset updated
Aug 12, 2024
Authors
Ishaan Sehgal
Description
ishaansehgal99/kubernetes-reformatted-remove-outliers dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: Methodology to filter out outliers in high spatial density data...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Cite
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken (2023). Methodology to filter out outliers in high spatial density data to improve maps reliability [Dataset]. http://doi.org/10.6084/m9.figshare.14305658.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.14305658.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO journals
Authors
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT The considerable volume of data generated by sensors in the field presents systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets, determine whether the developed filter process could help decrease the nugget effect and improve the spatial variability characterization of high sampling data. We created a filter composed of a global, anisotropic, and an anisotropic local analysis of data, which considered the respective neighborhood values. For that purpose, we used the median to classify a given spatial point into the data set as the main statistical parameter and took into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and the sensor vegetation index (SVI) in sugarcane. The results showed an improvement in accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85 %, 97 %, and 79 % in corn yield, soil ECa, and SVI respectively, compared to interpolation errors of raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects, reducing estimation error of the interpolated data. The methodology proposed in this work had a better performance in removing outlier data when compared to two other methodologies from the literature.
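As a rough illustration of the local step described above (comparing each point with the median of its neighbors within a radius), here is a minimal Python sketch. The function name `local_median_filter`, the `(x, y, value)` tuple layout, and the relative `tolerance` rule are assumptions made for illustration. The abstract does not state the authors' actual decision rule, and the global and anisotropic stages are not shown.

```python
from statistics import median


def local_median_filter(points, radius, tolerance):
    """Keep points whose value is close to the median of nearby points.

    points    -- list of (x, y, value) tuples (hypothetical layout)
    radius    -- neighborhood radius, in the same units as x and y
    tolerance -- allowed relative deviation from the local median
                 (an assumed rule, not taken from the abstract)
    """
    kept = []
    for i, (x, y, value) in enumerate(points):
        # Collect the values of all other points inside the search radius.
        neighbours = [
            v
            for j, (nx, ny, v) in enumerate(points)
            if j != i and (nx - x) ** 2 + (ny - y) ** 2 <= radius ** 2
        ]
        if not neighbours:
            # An isolated point cannot be judged locally; keep it.
            kept.append((x, y, value))
            continue
        local = median(neighbours)
        if abs(value - local) <= tolerance * abs(local):
            kept.append((x, y, value))
    return kept


# Hypothetical yield readings: one spike at (1, 1) and one isolated point.
readings = [(0, 0, 10.0), (1, 0, 11.0), (0, 1, 9.0), (1, 1, 50.0), (10, 10, 30.0)]
filtered = local_median_filter(readings, radius=2.0, tolerance=0.25)
```

This brute-force neighbor search is quadratic in the number of points. For dense sensor data a spatial index would be the more practical choice.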
Outliers Practice model
kaggle.com
zip
Updated Jun 24, 2022
Cite
Riaz Ansari (2022). Outliers Practice model [Dataset]. https://www.kaggle.com/datasets/muhammadriazansari/outliers-practice-model/code
Explore at:
Available download formats: zip (179619 bytes)
Dataset updated
Jun 24, 2022
Authors
Riaz Ansari
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Riaz Ansari

Released under CC0: Public Domain

Ex. Normal Distribution & ZScore - Outlier Removal
kaggle.com
zip
Updated Nov 11, 2023
Cite
Panagiotis Prassas (2023). Ex. Normal Distribution & ZScore - Outlier Removal [Dataset]. https://www.kaggle.com/datasets/panagiotisprassas/ex-normal-distribution-and-zscore-outlier-removal
Explore at:
Available download formats: zip (147098 bytes)
Dataset updated
Nov 11, 2023
Authors
Panagiotis Prassas
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Panagiotis Prassas

Released under Apache 2.0

Data for Filtering Organized 3D Point Clouds for Bin Picking Applications
datasets.ai
catalog.data.gov
0, 34, 47
Updated Apr 10, 2024
+ more versions
Cite
National Institute of Standards and Technology (2024). Data for Filtering Organized 3D Point Clouds for Bin Picking Applications [Dataset]. https://datasets.ai/datasets/data-for-filtering-organized-3d-point-clouds-for-bin-picking-applications
Explore at:
Available download formats: 0, 34, 47
Dataset updated
Apr 10, 2024
Dataset authored and provided by
National Institute of Standards and Technology
Description
Contains scans of a bin filled with different parts (screws, nuts, rods, spheres, sprockets). For each part type, an RGB image and an organized 3D point cloud obtained with a structured light sensor are provided. In addition, an unorganized 3D point cloud representing an empty bin and a small Matlab script to read the files are provided. The 3D data contain many outliers, and the data were used to demonstrate a new filtering technique.
Outlier Detection and Removal
kaggle.com
zip
Updated Jul 28, 2022
Cite
Prashant S Parhad (2022). Outlier Detection and Removal [Dataset]. https://www.kaggle.com/datasets/prashantsparhad/outlier-detection-and-removal
Explore at:
Available download formats: zip (3741 bytes)
Dataset updated
Jul 28, 2022
Authors
Prashant S Parhad
Description
Dataset

This dataset was created by Prashant S Parhad

Timings and statistical data of point model by our method.
plos.figshare.com
xls
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xiaojuan Ning; Fan Li; Ge Tian; Yinghui Wang (2023). Timings and statistical data of point model by our method. [Dataset]. http://doi.org/10.1371/journal.pone.0201280.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0201280.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Xiaojuan Ning; Fan Li; Ge Tian; Yinghui Wang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Timings and statistical data of point model by our method.
Outlier classification using autoencoders: application for fluctuation...
osti.gov
dataverse.harvard.edu
Updated Jun 2, 2021
Cite
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center (2021). Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas [Dataset]. http://doi.org/10.7910/DVN/SKEHRJ
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SKEHRJ
Dataset updated
Jun 2, 2021
Dataset provided by
Office of Science (http://www.er.doe.gov/)
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center
Description
Understanding the statistics of fluctuation driven flows in the boundary layer of magnetically confined plasmas is desired to accurately model the lifetime of the vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allow us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs for cases in which the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid or invalid, we find that such a classification may be ambiguous for up to 40% of data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that mostly characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases in the analysis. By removing the outliers that are identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are between approximately 5% and 15% lower than when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.
Sample(s) removed as outliers in each iteration of MFMW-outlier for all the...
plos.figshare.com
xls
Updated May 30, 2023
Cite
Yuk Yee Leung; Chun Qi Chang; Yeung Sam Hung (2023). Sample(s) removed as outliers in each iteration of MFMW-outlier for all the six microarray datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0046700.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0046700.t003
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Yuk Yee Leung; Chun Qi Chang; Yeung Sam Hung
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sample(s) removed as outliers in each iteration of MFMW-outlier for all the six microarray datasets.
Numbers of detected cgSNP differences between German isolates before removal...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Apr 1, 2025
Cite
Weber, Michael; Semmler, Torsten; Marcordes, Sandra; Brangsch, Hanka; Calvelage, Sten; Linde, Jörg; Höper, Dirk; Barth, Stefanie A.; Busch, Anne; Wolf, Silver A. (2025). Numbers of detected cgSNP differences between German isolates before removal of outlier SAMEA5164947 (original), after outlier removal (outlier removed) and after filtering for recombination sites (recombination-adjusted). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002085632
Explore at:
Dataset updated
Apr 1, 2025
Authors
Weber, Michael; Semmler, Torsten; Marcordes, Sandra; Brangsch, Hanka; Calvelage, Sten; Linde, Jörg; Höper, Dirk; Barth, Stefanie A.; Busch, Anne; Wolf, Silver A.
Description
Numbers of detected cgSNP differences between German isolates before removal of outlier SAMEA5164947 (original), after outlier removal (outlier removed) and after filtering for recombination sites (recombination-adjusted).
Pressure and processed water levels from the Time Series Station Spiekeroog,...
search.dataone.org
doi.pangaea.de
Updated Jan 6, 2018
+ more versions
Cite
Holinde, Lars; Badewien, Thomas H; Freund, Jan A; Stanev, Emil V; Zielinski, Oliver (2018). Pressure and processed water levels from the Time Series Station Spiekeroog, 2005-2011 [Dataset]. http://doi.org/10.1594/PANGAEA.843740
Explore at:
Unique identifier
https://doi.org/10.1594/PANGAEA.843740
Dataset updated
Jan 6, 2018
Dataset provided by
PANGAEA Data Publisher for Earth and Environmental Science
Authors
Holinde, Lars; Badewien, Thomas H; Freund, Jan A; Stanev, Emil V; Zielinski, Oliver
Time period covered
Jan 1, 2005 - Dec 31, 2011
Description
The quality of water level time series data varies strongly, with periods of high- and low-quality sensor data. In this paper we present the processing steps that were used to generate high-quality water level data from water pressure measured at the Time Series Station (TSS) Spiekeroog. The TSS is positioned in a tidal inlet between the islands of Spiekeroog and Langeoog in the East Frisian Wadden Sea (southern North Sea). The processing steps cover sensor drift, outlier identification, interpolation of data gaps and quality control. A central step is the removal of outliers. For this process an absolute threshold of 0.25 m/10 min was selected, which still keeps the water level increase and decrease during extreme events, as shown during the quality control process. A second important feature of data processing is the interpolation of gappy data, which is accomplished with a high certainty of generating trustworthy data. Applying these methods, a 10-year dataset (December 2002-December 2012) of water level information at the TSS was processed, resulting in a seven-year time series (2005-2011).
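The absolute threshold mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: samples are taken to be evenly spaced at 10-minute intervals, each sample is compared with the last accepted one, and rejected samples are replaced by NaN so that a later interpolation step could fill them. The function name `remove_spikes` and these design choices are hypothetical. The description does not say how the authors applied the threshold.

```python
import math


def remove_spikes(levels, max_step=0.25):
    """Mask samples that jump by more than max_step from the last good one.

    levels   -- water levels in metres, assumed to be 10 minutes apart
    max_step -- largest accepted change per step (0.25 m, per the description)
    """
    cleaned = []
    last_good = None
    for value in levels:
        if last_good is None or abs(value - last_good) <= max_step:
            cleaned.append(value)
            last_good = value
        else:
            # Mark the sample as missing; gap filling is a separate step.
            cleaned.append(math.nan)
    return cleaned


# Hypothetical series with a single spike at the third sample.
series = [1.00, 1.10, 2.50, 1.25, 1.30]
cleaned_series = remove_spikes(series)
```

One limitation of this sketch is that it trusts the first sample. If the series starts on a bad value, every later sample is measured against it. After a long gap, the allowed change would also need to scale with the elapsed time.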
Supporting data for "A Standard Operating Procedure for Outlier Removal in...
search.dataone.org
dataverse.no
+1 more
Updated Jul 29, 2024
Cite
Holsbø, Einar (2024). Supporting data for "A Standard Operating Procedure for Outlier Removal in Large-Sample Epidemiological Transcriptomics Datasets" [Dataset]. http://doi.org/10.18710/FGVLKS
Explore at:
Unique identifier
https://doi.org/10.18710/FGVLKS
Dataset updated
Jul 29, 2024
Dataset provided by
DataverseNO
Authors
Holsbø, Einar
Description
This dataset is example data from the Norwegian Women and Cancer study. It is supporting information to our article "A Standard Operating Procedure for Outlier Removal in Large-Sample Epidemiological Transcriptomics Datasets" (in submission). The bulk of the data comes from measuring gene expression in blood samples from the Norwegian Women and Cancer study (NOWAC) on Illumina Whole-Genome Gene Expression Bead Chips, HumanHT-12 v4. Please see README.txt for details.
Heart Diseases Dataset
kaggle.com
zip
Updated Feb 6, 2023
+ more versions
Cite
Sally Ahmed (2023). Heart Diseases Dataset [Dataset]. https://www.kaggle.com/datasets/sallyahmed/heart-diseases-dataset/discussion
Explore at:
Available download formats: zip (3277670 bytes)
Dataset updated
Feb 6, 2023
Authors
Sally Ahmed
Description
Dataset

This dataset was created by Sally Ahmed

Performance analysis of our algorithm on 3D models.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Xiaojuan Ning; Fan Li; Ge Tian; Yinghui Wang (2023). Performance analysis of our algorithm on 3D models. [Dataset]. http://doi.org/10.1371/journal.pone.0201280.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0201280.t002
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Xiaojuan Ning; Fan Li; Ge Tian; Yinghui Wang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance analysis of our algorithm on 3D models.
Data from: Monthly OpenET Image Collections (v2.0) Summarized by 12-Digit...
catalog.data.gov
data.usgs.gov
+1 more
Updated Nov 21, 2025
+ more versions
Cite
U.S. Geological Survey (2025). Monthly OpenET Image Collections (v2.0) Summarized by 12-Digit Hydrologic Unit Codes, 2008-2023 [Dataset]. https://catalog.data.gov/dataset/monthly-openet-image-collections-v2-0-summarized-by-12-digit-hydrologic-unit-codes-2008-20
Explore at:
Dataset updated
Nov 21, 2025
Dataset provided by
U.S. Geological Survey
Description
This dataset provides monthly summaries of evapotranspiration (ET) data from OpenET v2.0 image collections for the period 2008-2023 for all National Watershed Boundary Dataset subwatersheds (12-digit hydrologic unit codes [HUC12s]) in the US that overlap the spatial extent of OpenET datasets. For each HUC12, this dataset contains spatial aggregation statistics (minimum, mean, median, and maximum) for each of the ET variables from each of the publicly available image collections from OpenET for the six available models (DisALEXI, eeMETRIC, geeSEBAL, PT-JPL, SIMS, SSEBop) and the Ensemble image collection, which is a pixel-wise ensemble of all 6 individual models after filtering and removal of outliers according to the median absolute deviation approach (Melton and others, 2022).

Data are available in this data release in two different formats: comma-separated values (CSV) and parquet, a high-performance format that is optimized for storage and processing of columnar data. CSV files containing data for each 4-digit HUC are grouped by 2-digit HUCs for easier access of regional data, and the single parquet file provides convenient access to the entire dataset.

For each of the ET models (DisALEXI, eeMETRIC, geeSEBAL, PT-JPL, SIMS, SSEBop), variables in the model-specific CSV data files include:
- huc12: The 12-digit hydrologic unit code
- ET: Actual evapotranspiration (in millimeters) over the HUC12 area in the month calculated as the sum of daily ET interpolated between Landsat overpasses
- statistic: Max, mean, median, or min. Statistic used in the spatial aggregation within each HUC12. For example, maximum ET is the maximum monthly pixel ET value occurring within the HUC12 boundary after summing daily ET in the month
- year: 4-digit year
- month: 2-digit month
- count: Number of Landsat overpasses included in the ET calculation in the month
- et_coverage_pct: Integer percentage of the HUC12 with ET data, which can be used to determine how representative the ET statistic is of the entire HUC12
- count_coverage_pct: Integer percentage of the HUC12 with count data, which can be different than the et_coverage_pct value because the “count” band in the source image collection extends beyond the “et” band in the eastern portion of the image collection extent

For the Ensemble data, these additional variables are included in the CSV files:
- et_mad: Ensemble ET value, computed as the mean of the ensemble after filtering outliers using the median absolute deviation (MAD)
- et_mad_count: The number of models used to compute the ensemble ET value after filtering for outliers using the MAD
- et_mad_max: The maximum value in the ensemble range, after filtering for outliers using the MAD
- et_mad_min: The minimum value in the ensemble range, after filtering for outliers using the MAD
- et_sam: A simple arithmetic mean (across the 6 models) of actual ET average without outlier removal

Below are the locations of each OpenET image collection used in this summary:
DisALEXI: https://developers.google.com/earth-engine/datasets/catalog/OpenET_DISALEXI_CONUS_GRIDMET_MONTHLY_v2_0
eeMETRIC: https://developers.google.com/earth-engine/datasets/catalog/OpenET_EEMETRIC_CONUS_GRIDMET_MONTHLY_v2_0
geeSEBAL: https://developers.google.com/earth-engine/datasets/catalog/OpenET_GEESEBAL_CONUS_GRIDMET_MONTHLY_v2_0
PT-JPL: https://developers.google.com/earth-engine/datasets/catalog/OpenET_PTJPL_CONUS_GRIDMET_MONTHLY_v2_0
SIMS: https://developers.google.com/earth-engine/datasets/catalog/OpenET_SIMS_CONUS_GRIDMET_MONTHLY_v2_0
SSEBop: https://developers.google.com/earth-engine/datasets/catalog/OpenET_SSEBOP_CONUS_GRIDMET_MONTHLY_v2_0
Ensemble: https://developers.google.com/earth-engine/datasets/catalog/OpenET_ENSEMBLE_CONUS_GRIDMET_MONTHLY_v2_0
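To make the ensemble variables described above more concrete, the following Python sketch computes MAD-filtered statistics for one set of six model values. It is an illustration only: the function name `mad_ensemble`, the cutoff multiplier `k`, and the sample numbers are assumptions, and the exact rule used by OpenET is not given in this description. The dictionary keys reuse the column names from the CSV files.

```python
from statistics import mean, median


def mad_ensemble(values, k=2.0):
    """Summarize model values after dropping MAD-based outliers.

    values -- ET estimates from the individual models for one HUC12/month
    k      -- cutoff in units of MAD (hypothetical; not from the source)
    """
    med = median(values)
    # Median absolute deviation: median distance of the values from the median.
    mad = median([abs(v - med) for v in values])
    kept = [v for v in values if abs(v - med) <= k * mad]
    return {
        "et_mad": mean(kept),
        "et_mad_count": len(kept),
        "et_mad_min": min(kept),
        "et_mad_max": max(kept),
        "et_sam": mean(values),  # simple mean without outlier removal
    }


# Hypothetical monthly ET values (mm) from six models, one clearly too high.
summary = mad_ensemble([100, 102, 98, 101, 99, 160])
```

With these made-up inputs the median is 100.5 and the MAD is 1.5, so the value 160 falls outside the cutoff and five models remain. The filtered mean is 100, while the simple mean over all six is 110.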
A new fast filtering algorithm for a 3D point cloud based on RGB-D...
figshare.com
tiff
Updated Jun 4, 2023
Cite
Chaochuan Jia; Ting Yang; Chuanjiang Wang; Binghui Fan; Fugui He (2023). A new fast filtering algorithm for a 3D point cloud based on RGB-D information [Dataset]. http://doi.org/10.1371/journal.pone.0220253
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0220253
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Chaochuan Jia; Ting Yang; Chuanjiang Wang; Binghui Fan; Fugui He
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A point cloud that is obtained by an RGB-D camera will inevitably be affected by outliers that do not belong to the surface of the object, which is due to the different viewing angles, light intensities, and reflective characteristics of the object surface and the limitations of the sensors. An effective and fast outlier removal method based on RGB-D information is proposed in this paper. This method aligns the color image to the depth image, and the color mapping image is converted to an HSV image. Then, the optimal segmentation threshold of the V image, calculated by using the Otsu algorithm, is applied to segment the color mapping image into a binary image, which is used to extract the valid point cloud from the original point cloud with outliers. The robustness of the proposed method to the noise types, light intensity and contrast is evaluated by using several experiments; additionally, the method is compared with other filtering methods and applied to independently developed foot scanning equipment. The experimental results show that the proposed method can remove all types of outliers quickly and effectively.
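The thresholding step described above can be sketched with a plain-Python version of Otsu's method applied to 8-bit V (brightness) values. This is a minimal sketch, not the paper's implementation: the function name `otsu_threshold`, the flat list of V values, and the choice to treat the brighter class as valid are assumptions. The image alignment and HSV conversion are not shown.

```python
def otsu_threshold(values):
    """Return the threshold t (0..255) that maximizes between-class variance.

    values -- iterable of integer brightness values in 0..255
    Class 0 holds values <= t, class 1 holds values > t.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of samples in class 0 so far
    sum0 = 0  # sum of the values in class 0 so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Hypothetical V values: a dark background cluster and a bright object cluster.
v_channel = [10] * 50 + [12] * 50 + [200] * 50 + [205] * 50
t = otsu_threshold(v_channel)
# Assumed convention: pixels brighter than the threshold are kept as valid.
valid_mask = [v > t for v in v_channel]
```

Each entry of `valid_mask` would then select the 3D point that corresponds to that pixel. That step relies on the organized point cloud sharing the image's pixel layout.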
SKY Vaults Stable Coin Backing Per Most Active Vault (Outliers removed)
dune.com
Updated Oct 1, 2025
Cite
dhr (2025). SKY Vaults Stable Coin Backing Per Most Active Vault (Outliers removed) [Dataset]. https://dune.com/discover/content/relevant?q=author:dhr&resource-type=queries
Explore at:
Dataset updated
Oct 1, 2025
Authors
dhr
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Blockchain data query: SKY Vaults Stable Coin Backing Per Most Active Vault (Outliers removed)
Data from: A Multi-Objective Genetic Algorithm for Outlier Removal
acs.figshare.com
xlsx
Updated Jun 11, 2023
Cite
Oren E. Nahum; Abraham Yosipof; Hanoch Senderowitz (2023). A Multi-Objective Genetic Algorithm for Outlier Removal [Dataset]. http://doi.org/10.1021/acs.jcim.5b00515.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jcim.5b00515.s001
Dataset updated
Jun 11, 2023
Dataset provided by
ACS Publications
Authors
Oren E. Nahum; Abraham Yosipof; Hanoch Senderowitz
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Quantitative structure activity relationship (QSAR) or quantitative structure property relationship (QSPR) models are developed to correlate activities for sets of compounds with their structure-derived descriptors by means of mathematical models. The presence of outliers, namely, compounds that differ in some respect from the rest of the data set, compromises the ability of statistical methods to derive QSAR models with good prediction statistics. Hence, outliers should be removed from data sets prior to model derivation. Here we present a new multi-objective genetic algorithm for the identification and removal of outliers based on the k nearest neighbors (kNN) method. The algorithm was used to remove outliers from three different data sets of pharmaceutical interest (logBBB, factor 7 inhibitors, and dihydrofolate reductase inhibitors), and its performance was compared with that of five other methods for outlier removal. The results suggest that the new algorithm provides filtered data sets that (1) better maintain the internal diversity of the parent data sets and (2) give rise to QSAR models with much better prediction statistics. Equally good filtered data sets in terms of these metrics were obtained when another objective function was added to the algorithm (termed “preservation”), forcing it to remove certain compounds with low probability only. This option is highly useful when specific compounds should be preferably kept in the final data set either because they have favorable activities or because they represent interesting molecular scaffolds. We expect this new algorithm to be useful in future QSAR applications.
Data from: Male responses to sperm competition risk when rivals vary in...
researchdata.edu.au
datadryad.org
Updated 2019
Cite
Leigh W. Simmons; Joseph L. Tomkins; Samuel J. Lymbery; School of Biological Sciences (2019). Data from: Male responses to sperm competition risk when rivals vary in their number and familiarity [Dataset]. http://doi.org/10.5061/DRYAD.M097580
Explore at:
Unique identifier
https://doi.org/10.5061/DRYAD.M097580
Dataset updated
2019
Dataset provided by
The University of Western Australia
DRYAD
Authors
Leigh W. Simmons; Joseph L. Tomkins; Samuel J. Lymbery; School of Biological Sciences
Description
Males of many species adjust their reproductive investment to the number of rivals present simultaneously. However, few studies have investigated whether males sum previous encounters with rivals, and the total level of competition has never been explicitly separated from social familiarity. Social familiarity can be an important component of kin recognition and has been suggested as a cue that males use to avoid harming females when competing with relatives. Previous work has succeeded in independently manipulating social familiarity and relatedness among rivals, but experimental manipulations of familiarity are confounded with manipulations of the total number of rivals that males encounter. Using the seed beetle Callosobruchus maculatus we manipulated three factors: familiarity among rival males, the number of rivals encountered simultaneously, and the total number of rivals encountered over a 48-hour period. Males produced smaller ejaculates when exposed to more rivals in total, regardless of the maximum number of rivals they encountered simultaneously. Males did not respond to familiarity. Our results demonstrate that males of this species can sum the number of rivals encountered over separate days, and therefore the confounding of familiarity with the total level of competition in previous studies should not be ignored.

Lymbery et al 2018 Full dataset: Contains all the data used in the statistical analyses for the associated manuscript. The file contains two spreadsheets: one containing the data and one containing a legend relating to column titles.
Lymbery et al Full Dataset.xlsx

Lymbery et al 2018 Reduced dataset 1: Contains data used in the attached manuscript following the removal of three outliers for the purposes of data distribution, as described in the associated R code. The file contains two spreadsheets: one containing the data and one containing a legend relating to column titles.
Lymbery et al Reduced Dataset After 1st Round of Outlier Removal.xlsx

Lymbery et al 2018 Reduced dataset 2: Contains the data used in the statistical analyses for the associated manuscript, after the removal of all outliers stated in the manuscript and associated R code. The file contains two spreadsheets: one containing the data and one containing a legend relating to column titles.
Lymbery et al Reduced Dataset After Final Outlier Removal.xlsx

Lymbery et al 2018 R Script: Contains all the R code used for statistical analysis in this manuscript, with annotations to aid interpretation.
