Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table identifies all state-level causes of death that were at least five times the national rate in at least one of the periods 1999-2003, 2004-2008, and 2009-2013. Data follow the 113 Cause of Death list and are drawn from the CDC's Underlying Cause of Death file, accessible at: http://wonder.cdc.gov/ucd-icd10.html.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Selection was simulated with α = 600 and Nm = 10. Mean localization is given as the distance (kb) from the true position of the selected site.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation whose k-nearest neighbor distance with the maximum gap differs significantly from what we would expect if the distribution of k-nearest neighbor distances with the maximum gap were in the maximum domain of attraction of the Gumbel distribution. An approach based on extreme value theory is used to calculate the anomalous threshold. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our algorithm, which we call the stray algorithm. We also demonstrate how this algorithm can assist in detecting anomalies in other data structures using feature engineering. We show the situations in which the stray algorithm outperforms the HDoutliers algorithm in both accuracy and computational time. This framework is implemented in the open-source R package stray. Supplementary materials for this article are available online.
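A hedged sketch of this workflow, running the stray package's find_HDoutliers() on synthetic data with a few injected anomalies; the argument defaults and the returned outliers element follow the package documentation and may differ across versions.

# install.packages("stray")
library(stray)

set.seed(1)
X <- rbind(matrix(rnorm(1000), ncol = 10),          # 100 background points
           matrix(rnorm(50, mean = 8), ncol = 10))  # 5 injected anomalies

# Flag observations whose k-NN distance with the maximum gap exceeds
# the EVT-based threshold (alpha is the significance level)
out <- find_HDoutliers(X, alpha = 0.01, k = 10)
out$outliers  # indices of the flagged observations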
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article introduces lookout, a new approach to detecting outliers using leave-one-out kernel density estimates and extreme value theory. Outlier detection methods that use kernel density estimates generally employ a user-defined parameter to determine the bandwidth. Lookout uses persistent homology to construct a bandwidth suitable for outlier detection without any user input. We demonstrate the effectiveness of lookout on an extensive data repository by comparing its performance with other outlier detection methods based on extreme value theory. Furthermore, we introduce outlier persistence, a useful concept that explores the birth and cessation of outliers with changing bandwidth and significance levels. The R package lookout implements this algorithm. Supplementary files for this article are available online.
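A minimal usage sketch under the same caveat: lookout() is taken to be the package's main entry point per its documentation, with alpha as the significance level.

# install.packages("lookout")
library(lookout)

set.seed(1)
X <- rbind(matrix(rnorm(400), ncol = 2),                 # 200 background points
           matrix(runif(8, min = 5, max = 6), ncol = 2)) # 4 outliers
res <- lookout(X, alpha = 0.05)
res  # flagged outliers with their leave-one-out probabilities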
In this work we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the dataset, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the dataset for objects allows us to find objects that are impossible to find using their best-fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the dataset, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon-rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data. Cone search capability for table J/MNRAS/476/2117/apogeenn (Nearest neighbors APOGEE IDs)
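The generic unsupervised-random-forest similarity trick (contrast real data against a permuted synthetic copy, then read similarities off the proximity matrix) can be sketched as below; this illustrates the idea only and is not the authors' APOGEE pipeline.

library(randomForest)

set.seed(1)
real  <- matrix(rnorm(200 * 5), ncol = 5)  # stand-in for spectra
synth <- apply(real, 2, sample)            # permuting columns destroys joint structure
X <- rbind(real, synth)
y <- factor(rep(c("real", "synthetic"), each = nrow(real)))

rf <- randomForest(X, y, ntree = 500, proximity = TRUE)
# Proximity between real observations serves as the similarity measure;
# low mean similarity to everything else marks an outlier.
prox <- rf$proximity[1:nrow(real), 1:nrow(real)]
outlier_score <- 1 - rowMeans(prox)
head(order(outlier_score, decreasing = TRUE))  # most isolated objects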
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Dynamic Apparel Sales with Anomalies Dataset is based on 100,000 sales transaction records from the fashion industry, including extreme outliers, missing values, and sales_categories, reflecting the varied data characteristics of real retail environments.
2) Data Utilization
(1) The Dynamic Apparel Sales with Anomalies Dataset has the following characteristics:
• The dataset consists of nine categorical and ten numerical variables, including product name, brand, gender, clothing category, price, discount rate, inventory level, and customer behavior, making it suitable for analyzing product and customer characteristics.
(2) The Dynamic Apparel Sales with Anomalies Dataset can be used for:
• Sales anomaly detection and quality control: transaction data with outliers and missing values can be used to detect outliers, manage quality, refine data, and develop outlier-processing techniques (a minimal sketch follows below).
• Sales forecasting and customer analysis modeling: the variety of product and customer characteristics can support data-driven decision-making, such as machine-learning-based sales forecasting, customer segmentation, and customized marketing strategies.
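A minimal sketch of the quality-control step mentioned above, flagging IQR outliers in a hypothetical price column; the column name and values are illustrative, not the dataset's actual schema.

# IQR screen: values beyond k * IQR from the quartiles are flagged
flag_iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

set.seed(1)
sales <- data.frame(price = c(rlnorm(1000, 3, 0.4), 5000, 9999))  # two extremes
sales$price[sample(nrow(sales), 20)] <- NA   # simulate missing values
table(flag_iqr_outliers(sales$price), useNA = "ifany")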
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This study provides a first nationwide estimate of extreme water levels along the coastline of metropolitan France. It is intended to be refined locally with all available data and knowledge. The method is based on a statistical analysis of the tide-gauge records available at ports; it does not take wave observations into account. Results between ports are obtained by interpolation. The study produces statistical estimates at reference ports: extreme values of open-sea storm surges in the Channel and the Atlantic, and extreme water levels for the whole of metropolitan France, together with a set of maps of statistical estimates of extreme water levels along the coastline. The estimates provided extend to the 1000-year return period. Given the lengths of the observation records at the ports, users should check whether estimates for return periods beyond 50 or 100 years remain meaningful.
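As a hedged illustration of what a T-year return level means here, the sketch below computes GEV return levels with made-up parameters (not values from the study), using evd::qgev().

library(evd)

mu <- 4.2; sigma <- 0.25; xi <- 0.05  # illustrative GEV parameters (metres)
T <- c(10, 50, 100, 1000)             # return periods in years
data.frame(T, return_level = qgev(1 - 1/T, loc = mu, scale = sigma, shape = xi))
# Levels far beyond the record length (e.g. 1000 years from ~50 years of
# observations) are extrapolations, echoing the caution above.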
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset simulates CPU performance logs collected from multiple systems; a minimal cleaning sketch follows the column list below.
Columns:
CPU Usage (%) - CPU utilization (with missing values, outliers, and anomalies).
CPU Temperature (°C) - Temperature readings (with random noise and extreme values).
Clock Speed (GHz) - CPU clock speed (with inconsistent formatting and missing values).
Cache Miss Rate (%) - Percentage of cache misses (skewed distribution and corrupted values).
Power Consumption (W) - Power usage in watts (with extreme outliers and inconsistent scaling).
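A minimal cleaning sketch for such a log; the file name and column order are assumptions, so adjust them to the actual dataset.

logs <- read.csv("cpu_performance_logs.csv")  # hypothetical file name
names(logs) <- c("usage", "temp", "clock", "cache_miss", "power")  # assumed order

# Coerce inconsistently formatted clock speeds ("3.2 GHz" -> 3.2)
logs$clock <- as.numeric(gsub("[^0-9.]", "", logs$clock))

# Winsorize extreme power readings at the 1st/99th percentiles
p <- quantile(logs$power, c(0.01, 0.99), na.rm = TRUE)
logs$power <- pmin(pmax(logs$power, p[1]), p[2])

# Impute missing CPU usage with the column median
logs$usage[is.na(logs$usage)] <- median(logs$usage, na.rm = TRUE)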
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sweden Consumer Survey: KI: Perceived Inflation Now: excl Extreme Values data was reported at 2.210 % in Jul 2018. This records a decrease from the previous figure of 2.550 % for Jun 2018. The series is updated monthly, with a median of 1.720 % over the period from Dec 2001 to Jul 2018, across 200 observations. The data reached an all-time high of 4.770 % in Jul 2008 and a record low of 0.000 % in Mar 2016. The series remains in active status in CEIC and is reported by the National Institute of Economic Research. The data is categorized under Global Database's Sweden – Table SE.H009: Consumer Survey: National Institute of Economic Research.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set provides experimental results for the analysis of extreme values of trilateration localization error in wireless communication systems. The analysis is based upon the analytical model of trilateration localization error described and discussed in the manuscript titled "An Analytical Model of Trilateration Localization Error".
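Since the dataset concerns extremes of trilateration error, a generic least-squares trilateration sketch is given below to show how such an error sample arises; this is an illustration only, not the manuscript's analytical model.

trilaterate <- function(anchors, d) {
  # Linearize the range equations by subtracting the first anchor's equation
  A <- 2 * sweep(anchors[-1, , drop = FALSE], 2, anchors[1, ])
  b <- rowSums(anchors[-1, , drop = FALSE]^2) - sum(anchors[1, ]^2) + d[1]^2 - d[-1]^2
  qr.solve(A, b)
}

set.seed(1)
anchors <- rbind(c(0, 0), c(10, 0), c(0, 10))
target  <- c(3, 4)
errors <- replicate(10000, {
  d <- sqrt(colSums((t(anchors) - target)^2)) + rnorm(3, sd = 0.1)  # noisy ranges
  sqrt(sum((trilaterate(anchors, d) - target)^2))                   # position error
})
max(errors)  # one realization of the extreme localization error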
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is about: Tab. 4 Absolute air temperature extreme values during the observational period 1951-1965. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.745935 for more information.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Simulations of systems with random interactions, for different underlying distributions and realizations of the interaction.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of the number of outliers in selected MNCH data items.
In the analysis of DNA sequences on related individuals, most methods strive to incorporate as much information as possible, with little or no attention paid to the issue of statistical significance. For example, a modern workstation can easily handle the computations needed to perform a large-scale genome-wide inheritance-by-descent (IBD) scan, but accurate assessment of the significance of that scan is often hindered by inaccurate approximations and computationally intensive simulation. To address these issues, we developed gLOD, a test of co-segregation that, for large samples, models chromosome-specific IBD statistics as a collection of stationary Gaussian processes. With this simple model, the parametric bootstrap yields an accurate and rapid assessment of significance: the genome-wide corrected P-value. Furthermore, we show that (i) under the null hypothesis, the limiting distribution of the gLOD is the standard Gumbel distribution; (ii) our parametric bootstrap simulator is approxi...
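A toy version of the bootstrap scheme the abstract sketches: model each chromosome's statistic as a stationary Gaussian AR(1) process, take the genome-wide maximum, and read the corrected P-value off the simulated null distribution. Every number below is hypothetical; this is not the authors' gLOD implementation.

set.seed(1)
genome_max <- function(n_chr = 22, len = 500, rho = 0.95) {
  # sqrt(1 - rho^2) rescales the AR(1) series to unit marginal variance
  max(replicate(n_chr,
    max(arima.sim(list(ar = rho), n = len) * sqrt(1 - rho^2))))
}

null_max <- replicate(2000, genome_max())  # parametric bootstrap
observed <- 4.6                            # a hypothetical scan maximum
mean(null_max >= observed)                 # genome-wide corrected P-value
# EVT predicts these null maxima are approximately Gumbel-distributed.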
This dataset is synthetically generated to mimic weather data for classification tasks. It includes various weather-related features and categorizes the weather into four types: Rainy, Sunny, Cloudy, and Snowy. This dataset is designed for practicing classification algorithms, data preprocessing, and outlier detection methods.
This dataset is useful for data scientists, students (especially beginners), and practitioners to investigate classification algorithms' performance, practice data preprocessing, feature engineering, and model evaluation, and test outlier detection methods. It provides opportunities for learning and experimenting with weather data analysis and machine learning techniques.
This dataset is synthetically produced and does not represent real-world weather data. It includes intentional outliers to provide opportunities for practicing outlier detection and handling. The values, ranges, and distributions may not accurately represent real-world conditions, and the data should primarily be used for educational and experimental purposes.
Anyone is free to share and use the data
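A minimal classification sketch on stand-in data of the kind described above; the feature names below are illustrative, not the dataset's actual columns.

library(rpart)

set.seed(1)
n <- 400
toy <- data.frame(
  temperature  = rnorm(n, 15, 10),
  humidity     = runif(n, 20, 100),
  weather_type = factor(sample(c("Rainy", "Sunny", "Cloudy", "Snowy"),
                               n, replace = TRUE))
)
fit <- rpart(weather_type ~ temperature + humidity, data = toy)
table(predicted = predict(fit, type = "class"), actual = toy$weather_type)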
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Captions:
Figure 1: Phase diagram depicting the shape parameter ξ of the accepted GEV distribution for the ER-ER multiplex network as a function of the IC inclusion probabilities (p_in) in both layers. Region B corresponds to the Weibull distribution; region A stands for undefined distributions. Network size N = 100 in each layer.
Figure 2: (Color online) Distribution of R_max for SF networks with average degree ⟨k⟩ = 4 for various IC inclusion probabilities (p_in). The histogram is fitted with normal (blue dotted line) and GEV (red solid line) distributions. Network size N = 500.
Figure 3: (Color online) Distribution of R_max for SF networks with average degree ⟨k⟩ = 6 for various IC inclusion probabilities (p_in). The histogram is fitted with normal (blue dotted line) and GEV (red solid line) distributions. Network size N = 500.
Table 1: Estimated parameters of the KS test for fitting GEV and normal distributions to R_max for different network sizes of the SF network, averaged over 5000 random realizations. Other parameters: inhibitory inclusion probability p_in = 0.5 and average degree ⟨k⟩ = 6.
Table 2: Estimated parameters of the KS test for fitting GEV and normal distributions to R_max for different inhibitory inclusion probabilities (p_in) of the SF-SF network over 5000 realizations. Other parameters: network size N = 100 in each layer and average degree ⟨k⟩ = 6.
Table 3: Estimated parameters of the KS test for fitting GEV and normal distributions to R_max for different inhibitory inclusion probabilities (p_in) of the ER-SF network over 5000 realizations. Other parameters: network size N = 100 in each layer and average degree ⟨k⟩ = 6.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A detailed introduction to the outlier data analysis dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Student absences have a substantial impact on students' future physical and mental health, as well as their academic progress. Numerous personal, familial, and social issues are among the causes of student absences. Any kind of absence from school should be minimized. Extremely high rates of student absences may indicate the abrupt commencement of a serious school health crisis or public health crisis, such as the spread of tuberculosis or COVID-19, which provides school health professionals with an early warning. We focus on the extreme values in absence data and apply extreme value theory (EVT) to describe their distribution. This study aims to predict extreme instances of student absences. Based on the predicted results, school health professionals can take preventative measures to reduce future excessive absences. Five statistical distributions were applied to individually characterize the extreme values. Our findings suggest that EVT is a useful tool for predicting extreme student absences, thereby aiding preventative measures in public health.
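A hedged peaks-over-threshold sketch on simulated daily absence counts; the paper compares five distributions, while this shows only a generalized Pareto fit via evd::fpot().

library(evd)

set.seed(1)
absences <- rnbinom(1000, size = 5, mu = 20)  # simulated daily absence counts
u <- quantile(absences, 0.95)                 # high threshold
fit <- fpot(absences, threshold = u)          # GPD fit to the exceedances
fit$estimate                                  # scale and shape
# Upper quantiles of the fitted GPD forecast extreme-absence days.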
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The detection of water quality indicators such as temperature, pH, turbidity, conductivity, and TDS involves five national standard methods. Chemically based measurement techniques may generate liquid residue, causing secondary pollution. A water quality monitoring and data analysis system can effectively address the issue that conventional methods require multiple pieces of equipment and repeated measurements. This paper analyzes the distribution characteristics of historical data from five sensors at a given time, displays them graphically in real time, and provides an early warning when standards are exceeded. Four water samples from different sections of the Li River were selected; relative to the national standard method, the average measurement errors for temperature, pH, TDS, conductivity, and turbidity are 0.98%, 2.23%, 2.92%, 3.05%, and 3.98%, respectively. The quartile method was further used to analyze outliers in over 100,000 records across five selected historical periods. Experimental results show the system is relatively stable when measuring temperature, pH, and TDS, with outlier proportions of 0.42%, 0.84%, and 1.24%; for turbidity and conductivity, the proportions are 3.11% and 2.92%. In an experiment comparing seven methods for filling outliers, the k-nearest-neighbor algorithm performed best. The analysis of data trends, outliers, means, and extreme values assists decision-making, such as updating and maintaining equipment, responding to extreme water quality situations, and enhancing regional water quality oversight.
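A minimal sketch of the quartile screen plus KNN filling described above; VIM::kNN() stands in for whichever KNN filler the system uses, and the columns are illustrative.

library(VIM)  # provides kNN() imputation

set.seed(1)
wq <- data.frame(temp = rnorm(1000, 20, 1), ph = rnorm(1000, 7, 0.2))
wq$ph[c(5, 99)] <- c(2.1, 13.5)  # inject two outliers

q <- quantile(wq$ph, c(0.25, 0.75)); iqr <- diff(q)
bad <- wq$ph < q[1] - 1.5 * iqr | wq$ph > q[2] + 1.5 * iqr
mean(bad)               # proportion of outliers, as reported above
wq$ph[bad] <- NA        # blank the outliers, then fill
wq <- kNN(wq, variable = "ph", k = 7, imp_var = FALSE)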
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List: rextreme.txt - R source code for fitting extreme value distributions
Description: This is a text file containing R-language source code for fitting extreme value distributions. These functions were originally written in S by Stuart Coles and converted to R by Alec Stephenson. For an explanation of how to use these functions, see the Appendix in: Stuart Coles, 2001. An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London, UK.
These functions are included to document how the results in the paper were obtained. For other uses, it is recommended that the entire suite of functions for extreme value analysis be downloaded. The Extremes Toolkit includes this suite of functions, as well as a graphical user interface (currently available at: www.esig.ucar.edu/extremevalues/evtk.html).
gev.fit is an R function that estimates the parameters of the generalized extreme value distribution by the method of maximum likelihood.
gpd.fit is an R function that estimates the parameters of the generalized Pareto distribution by the method of maximum likelihood.
pp.fit is an R function that estimates the parameters of the generalized extreme value distribution, via the point process representation, by the method of maximum likelihood.
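A hedged usage sketch: the same gev.fit/gpd.fit/pp.fit interfaces also ship in the ismev package, which descends from the Coles/Stephenson code, and its example datasets come from Coles (2001).

# source("rextreme.txt")  # or, equivalently:
library(ismev)

data(portpirie)                # annual maximum sea levels (Coles, 2001)
gev.fit(portpirie[, 2])        # GEV parameters by maximum likelihood

data(rain)                     # daily rainfall series (Coles, 2001)
gpd.fit(rain, threshold = 30)  # GPD fit above a high threshold
pp.fit(rain, threshold = 30)   # point-process representation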