56 datasets found
  1. Extreme outliers among state-level causes of death, 1999-2013

    • figshare.com
    txt
    Updated Jan 19, 2016
    Cite
    Francis P. Boscoe; Eva Pradhan (2016). Extreme outliers among state-level causes of death, 1999-2013 [Dataset]. http://doi.org/10.6084/m9.figshare.1422036.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Francis P. Boscoe; Eva Pradhan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table identifies all state-level causes of death that were at least five times the national rate in at least one of the periods 1999-2003, 2004-2008, and 2009-2013. Data use the 113 Cause of Death list and come from the CDC's Underlying Cause of Death file, accessible at: http://wonder.cdc.gov/ucd-icd10.html.
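
    As a rough illustration of the selection rule described above (a state-level rate at least five times the national rate), the following Python sketch flags such causes. The function name, the data layout, and the sample rates are hypothetical and are not taken from the dataset.

```python
def flag_extreme_outliers(state_rates, national_rates, threshold=5.0):
    """Return (state, cause, ratio) tuples where the state rate is at
    least `threshold` times the national rate for the same cause."""
    flagged = []
    for (state, cause), rate in state_rates.items():
        national = national_rates.get(cause)
        if not national:  # skip causes with a missing or zero national rate
            continue
        ratio = rate / national
        if ratio >= threshold:
            flagged.append((state, cause, ratio))
    return flagged


# Hypothetical rates per 100,000 population (illustrative values only).
national = {"cause A": 2.0, "cause B": 10.0}
states = {("State X", "cause A"): 12.0, ("State Y", "cause B"): 15.0}
result = flag_extreme_outliers(states, national)
```

    In this sketch only the State X entry is flagged, because its rate is six times the national rate, while State Y is only 1.5 times the national rate.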

  2. Performance of the extreme outlier test (EOS) with n = 1 selective site...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Antonio Carvajal-Rodríguez (2023). Performance of the extreme outlier test (EOS) with n = 1 selective site located at the center of the chromosome or n = 5. [Dataset]. http://doi.org/10.1371/journal.pone.0175944.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Antonio Carvajal-Rodríguez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Selection was α = 600 and Nm = 10. Mean localization is given in distance (kb) from the real selective position.

  3. Anomaly Detection in High-Dimensional Data

    • tandf.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Priyanga Dilini Talagala; Rob J. Hyndman; Kate Smith-Miles (2023). Anomaly Detection in High-Dimensional Data [Dataset]. http://doi.org/10.6084/m9.figshare.12844508.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Priyanga Dilini Talagala; Rob J. Hyndman; Kate Smith-Miles
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance level, under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation where its k-nearest neighbor distance with the maximum gap is significantly different from what we would expect if the distribution of k-nearest neighbors with the maximum gap is in the maximum domain of attraction of the Gumbel distribution. An approach based on extreme value theory is used for the anomalous threshold calculation. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our algorithm, which we call the stray algorithm. We also demonstrate how this algorithm can assist in detecting anomalies present in other data structures using feature engineering. We show the situations where the stray algorithm outperforms the HDoutliers algorithm both in accuracy and computational time. This framework is implemented in the open source R package stray. Supplementary materials for this article are available online.

  4. Data from: Leave-One-Out Kernel Density Estimates for Outlier Detection

    • tandf.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Sevvandi Kandanaarachchi; Rob J Hyndman (2023). Leave-One-Out Kernel Density Estimates for Outlier Detection [Dataset]. http://doi.org/10.6084/m9.figshare.16942936.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Sevvandi Kandanaarachchi; Rob J Hyndman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article introduces lookout, a new approach to detect outliers using leave-one-out kernel density estimates and extreme value theory. Outlier detection methods that use kernel density estimates generally employ a user defined parameter to determine the bandwidth. Lookout uses persistent homology to construct a bandwidth suitable for outlier detection without any user input. We demonstrate the effectiveness of lookout on an extensive data repository by comparing its performance with other outlier detection methods based on extreme value theory. Furthermore, we introduce outlier persistence, a useful concept that explores the birth and the cessation of outliers with changing bandwidth and significance levels. The R package lookout implements this algorithm. Supplementary files for this article are available online.

  5. Outliers and similarity in APOGEE - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Nov 2, 2017
    Cite
    (2017). Outliers and similarity in APOGEE - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/b624b506-541b-5a09-b615-14b8e202c468
    Explore at:
    Dataset updated
    Nov 2, 2017
    Description

    In this work we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the dataset, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the dataset for objects allows us to find objects that are impossible to find using their best fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the dataset, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data. Cone search capability for table J/MNRAS/476/2117/apogeenn (Nearest neighbors APOGEE IDs)

  6. Dynamic Apparel Sales with Anomalies Dataset

    • cubig.ai
    Updated Jun 5, 2025
    Cite
    CUBIG (2025). Dynamic Apparel Sales with Anomalies Dataset [Dataset]. https://cubig.ai/store/products/423/dynamic-apparel-sales-with-anomalies-dataset
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Dynamic Apparel Sales with Anomalies Dataset is based on 100,000 sales transaction records from the fashion industry, including extreme outliers, missing values, and sales_categories, reflecting the varied data characteristics of real retail environments.

    2) Data Utilization
    (1) Characteristics of the Dynamic Apparel Sales with Anomalies Dataset:
    • The dataset consists of 9 categorical variables and 10 numerical variables, including product name, brand, gender clothing, price, discount rate, inventory level, and customer behavior, making it suitable for analyzing product and customer characteristics.
    (2) The Dynamic Apparel Sales with Anomalies Dataset can be used for:
    • Sales anomaly detection and quality control: Transaction data with outliers and missing values can be used to detect outliers, manage quality, refine data, and develop outlier processing techniques.
    • Sales forecasting and customer analysis modeling: Based on a variety of product and customer characteristics, it can support data-driven decision-making, such as machine learning-based sales forecasting, customer segmentation, and customized marketing strategies.

  7. Estimation of extreme water level values – Metropolitan coast, 2022 edition...

    • gimi9.com
    Updated Dec 17, 2024
    Cite
    (2024). Estimation of extreme water level values – Metropolitan coast, 2022 edition | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_http-www-shom-fr-maree_courants-niv_extr
    Explore at:
    Dataset updated
    Dec 17, 2024
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This study is a first overall estimate of extreme water level values along the coastline of metropolitan France. It is to be refined locally with all available data and knowledge. The method is based on a statistical analysis of the tide gauge records available in ports. It does not take wave observations into account. Results between ports are obtained by interpolation. The study produces statistical estimates at reference ports (extreme values of high-tide surges in the Channel and the Atlantic, and extreme water level values for the whole of metropolitan France), together with a set of statistical estimation maps of extreme water levels along the coastline. The estimates go up to a return period of 1000 years. Given the length of the observation records at ports, the user must check whether estimates for return periods of more than 50 or 100 years still make sense.
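
    For context on return periods: a return period of T years corresponds to an annual exceedance probability of 1/T, so the chance of at least one exceedance in n years is 1 - (1 - 1/T)^n if years are assumed independent. The helper below is a sketch of this standard relation only; the function name is hypothetical and the calculation is not part of the study.

```python
def prob_at_least_one_exceedance(return_period_years, horizon_years):
    """Probability that a level with the given return period is exceeded
    at least once within the horizon, assuming independent years."""
    annual_p = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_p) ** horizon_years


# Chance of seeing a 100-year level at least once in a 50-year record.
p_100_in_50 = prob_at_least_one_exceedance(100, 50)
```

    Under this assumption, a 50-year record has well under an even chance of containing a single 100-year event, which is one way to see why estimates beyond the length of the observation record deserve caution.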

  8. CPU performance metrics (unclean)

    • kaggle.com
    Updated Feb 7, 2025
    Cite
    Mohammed Arfath R (2025). CPU performance metrics (unclean) [Dataset]. https://www.kaggle.com/datasets/mohammedarfathr/cpu-performance-metrics-unclean
    Explore at:
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohammed Arfath R
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset simulates CPU performance logs collected from multiple systems.

    Columns:

    1. Disk write speed
    2. Disk read speed
    3. CPU Usage (%) - CPU utilization (with missing values, outliers, and anomalies).
    4. CPU Temperature (°C) - Temperature readings (with random noise and extreme values).
    5. Clock Speed (GHz) - CPU clock speed (with inconsistent formatting and missing values).
    6. Cache Miss Rate (%) - Percentage of cache misses (skewed distribution and corrupted values).
    7. Power Consumption (W) - Power usage in watts (with extreme outliers and inconsistent scaling).

  9. Sweden Consumer Survey: KI: Perceived Inflation Now: excl Extreme Values

    • ceicdata.com
    Updated Jan 15, 2025
    Cite
    CEICdata.com (2025). Sweden Consumer Survey: KI: Perceived Inflation Now: excl Extreme Values [Dataset]. https://www.ceicdata.com/en/sweden/consumer-survey-national-institute-of-economic-research/consumer-survey-ki-perceived-inflation-now-excl-extreme-values
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 1, 2017 - Jul 1, 2018
    Area covered
    Sweden
    Variables measured
    Consumer Survey
    Description

    Sweden Consumer Survey: KI: Perceived Inflation Now: excl Extreme Values data was reported at 2.210 % in Jul 2018. This records a decrease from the previous number of 2.550 % for Jun 2018. Sweden Consumer Survey: KI: Perceived Inflation Now: excl Extreme Values data is updated monthly, averaging 1.720 % from Dec 2001 (Median) to Jul 2018, with 200 observations. The data reached an all-time high of 4.770 % in Jul 2008 and a record low of 0.000 % in Mar 2016. Sweden Consumer Survey: KI: Perceived Inflation Now: excl Extreme Values data remains in active status in CEIC and is reported by the National Institute of Economic Research. The data is categorized under Global Database’s Sweden – Table SE.H009: Consumer Survey: National Institute of Economic Research.

  10. Experimental results on extreme values of trilateration localization error...

    • ieee-dataport.org
    Updated Oct 28, 2019
    Cite
    Muhammad Farooq-i-Azam (2019). Experimental results on extreme values of trilateration localization error in wireless communication systems [Dataset]. https://ieee-dataport.org/documents/experimental-results-extreme-values-trilateration-localization-error-wireless
    Explore at:
    Dataset updated
    Oct 28, 2019
    Authors
    Muhammad Farooq-i-Azam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set provides experimental results for the analysis of extreme values of trilateration localization error in wireless communication systems. The analysis is based upon the analytical model of trilateration localization error described and discussed in the manuscript titled "An Analytical Model of Trilateration Localization Error".

  11. Tab. 4 Absolute air temperature extreme values during the observational...

    • doi.pangaea.de
    html, tsv
    Updated Oct 2, 2010
    Cite
    Max Diem (2010). Tab. 4 Absolute air temperature extreme values during the observational period 1951-1965 [Dataset]. http://doi.org/10.1594/PANGAEA.745849
    Explore at:
    Available download formats: tsv, html
    Dataset updated
    Oct 2, 2010
    Dataset provided by
    PANGAEA
    Authors
    Max Diem
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1960 - Dec 1, 1960
    Area covered
    Variables measured
    DATE/TIME, Event label, Temperature, air, maximum, Temperature, air, minimum
    Description

    This dataset is about: Tab. 4 Absolute air temperature extreme values during the observational period 1951-1965. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.745935 for more information.

  12. Extreme values in systems with random interactions

    • edmond.mpg.de
    bin
    Updated Nov 24, 2022
    Cite
    Martin Girard; Martin Girard (2022). Extreme values in systems with random interactions [Dataset]. http://doi.org/10.17617/3.2TPJQX
    Explore at:
    Available download formats: bin(29778414), bin(29778418), bin(29680114), bin(6446965222), bin(407265787), bin(105275884), bin(1615225325)
    Dataset updated
    Nov 24, 2022
    Dataset provided by
    Edmond
    Authors
    Martin Girard; Martin Girard
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Simulations of systems with random interactions, for different underlying distributions and realizations of the interaction.

  13. Summary result of number of outliers in selected MNCH data items.

    • plos.figshare.com
    xls
    Updated Apr 1, 2024
    Cite
    Keshab Sanjel; Shiv Lal Sharma; Swadesh Gurung; Man Bahadur Oli; Samikshya Singh; Tuk Prasad Pokhrel (2024). Summary result of number of outliers in selected MNCH data items. [Dataset]. http://doi.org/10.1371/journal.pone.0298101.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 1, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Keshab Sanjel; Shiv Lal Sharma; Swadesh Gurung; Man Bahadur Oli; Samikshya Singh; Tuk Prasad Pokhrel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary result of number of outliers in selected MNCH data items.

  14. Data from: A powerful test of independent assortment that determines...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Jun 14, 2025
    Cite
    William C. L. Stewart; Valerie R. Hager (2025). A powerful test of independent assortment that determines genome-wide significance quickly and accurately [Dataset]. http://doi.org/10.5061/dryad.7tb57
    Explore at:
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    William C. L. Stewart; Valerie R. Hager
    Time period covered
    Jan 1, 2016
    Description

    In the analysis of DNA sequences on related individuals, most methods strive to incorporate as much information as possible, with little or no attention paid to the issue of statistical significance. For example, a modern workstation can easily handle the computations needed to perform a large-scale genome-wide inheritance-by-descent (IBD) scan, but accurate assessment of the significance of that scan is often hindered by inaccurate approximations and computationally intensive simulation. To address these issues, we developed gLOD, a test of co-segregation that, for large samples, models chromosome-specific IBD statistics as a collection of stationary Gaussian processes. With this simple model, the parametric bootstrap yields an accurate and rapid assessment of significance: the genome-wide corrected P-value. Furthermore, we show that (i) under the null hypothesis, the limiting distribution of the gLOD is the standard Gumbel distribution; (ii) our parametric bootstrap simulator is approxi...

  15. Weather Type Classification

    • kaggle.com
    Updated Jun 23, 2024
    Cite
    Nikhil Narayan (2024). Weather Type Classification [Dataset]. https://www.kaggle.com/datasets/nikhil7280/weather-type-classification/suggestions?status=pending
    Explore at:
    Dataset updated
    Jun 23, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nikhil Narayan
    Description

    This dataset is synthetically generated to mimic weather data for classification tasks. It includes various weather-related features and categorizes the weather into four types: Rainy, Sunny, Cloudy, and Snowy. This dataset is designed for practicing classification algorithms, data preprocessing, and outlier detection methods.

    Variables

    • Temperature (numeric): The temperature in degrees Celsius, ranging from extreme cold to extreme heat.
    • Humidity (numeric): The humidity percentage, including values above 100% to introduce outliers.
    • Wind Speed (numeric): The wind speed in kilometers per hour, with a range including unrealistically high values.
    • Precipitation (%) (numeric): The precipitation percentage, including outlier values.
    • Cloud Cover (categorical): The cloud cover description.
    • Atmospheric Pressure (numeric): The atmospheric pressure in hPa, covering a wide range.
    • UV Index (numeric): The UV index, indicating the strength of ultraviolet radiation.
    • Season (categorical): The season during which the data was recorded.
    • Visibility (km) (numeric): The visibility in kilometers, including very low or very high values.
    • Location (categorical): The type of location where the data was recorded.
    • Weather Type (categorical): The target variable for classification, indicating the weather type.

    Purpose and Utility

    This dataset is useful for data scientists, students (especially beginners), and practitioners who want to investigate the performance of classification algorithms, practice data preprocessing, feature engineering, and model evaluation, and test outlier detection methods. It provides opportunities for learning and experimenting with weather data analysis and machine learning techniques.

    Important Note

    This dataset is synthetically produced and does not convey real-world weather data. It includes intentional outliers to provide opportunities for practicing outlier detection and handling. The values, ranges, and distributions may not accurately represent real-world conditions, and the data should primarily be used for educational and experimental purposes.

    License

    Anyone is free to share and use the data

  16. Data from: Interplay of Inhibition and Multiplexing : Largest Eigenvalue...

    • figshare.com
    pdf
    Updated Jun 21, 2016
    Cite
    SAPTARSHI GHOSH; Sanjiv K Dwivedi; Sarika Jalan (2016). Interplay of Inhibition and Multiplexing : Largest Eigenvalue Statistics [Dataset]. http://doi.org/10.6084/m9.figshare.3452825.v2
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 21, 2016
    Dataset provided by
    figshare
    Authors
    SAPTARSHI GHOSH; Sanjiv K Dwivedi; Sarika Jalan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Captions:

    Figure 1: Phase diagram depicting shape parameter ξ for accepted GEV distribution for ER-ER multiplex network as a function of IC inclusion probabilities (pin) in both the layers. Region B corresponds to the Weibull. Region A stands for undefined distributions. Size of the network N=100 in each layer.

    Figure 2: (Color online) Distribution of Rmax of SF networks with average degree ⟨k⟩ = 4 for various IC inclusion probabilities (pin). Histogram is fitted with normal (blue dotted line) and GEV (red solid line) distributions. Network size N=500.

    Figure 3: (Color online) Distribution of Rmax of SF networks with average degree ⟨k⟩ = 6 for various IC inclusion probabilities (pin). Histogram is fitted with normal (blue dotted line) and GEV (red solid line) distributions. Network size N=500.

    Table 1: Estimated parameters of KS test for fitting GEV and normal distributions of Rmax for different network sizes of SF network over an average of 5000 random realizations. Other parameters are inhibition inclusion probability pin = 0.5 and average degree k = 6.

    Table 2: Estimated parameters of KS test for fitting of GEV and normal distributions of Rmax for different inhibitory inclusion probability (pin) of SF-SF network over 5000 population. Other parameters are network size N = 100 in each layer and average degree k = 6.

    Table 3: Estimated parameters of KS test for fitting of GEV and normal distributions of Rmax for different inhibitory inclusion probability (pin) of ER-SF network over 5000 population. Other parameters are network size N = 100 in each layer and average degree k = 6.

  17. The detailed introduction of the outlier data analysis dataset.

    • plos.figshare.com
    xls
    Updated Mar 18, 2024
    Cite
    Qirong Lu; Jian Zou; Yingya Ye; Zexin Wang (2024). The detailed introduction of the outlier data analysis dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0299435.t009
    Explore at:
    Available download formats: xls
    Dataset updated
    Mar 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Qirong Lu; Jian Zou; Yingya Ye; Zexin Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The detailed introduction of the outlier data analysis dataset.

  18. Statistical description of the observed data.

    • plos.figshare.com
    xls
    Updated May 20, 2024
    Cite
    Mao Liu; Wenyi Yang; Ting Tian; Jie Yang; Zhen Ding (2024). Statistical description of the observed data. [Dataset]. http://doi.org/10.1371/journal.pone.0302360.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 20, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Mao Liu; Wenyi Yang; Ting Tian; Jie Yang; Zhen Ding
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    School absences have a substantial impact on students' future physical and mental health as well as academic progress. Numerous personal, familial, and social issues are among the causes of student absences. Any kind of absence from school should be minimized. Extremely high rates of student absences may indicate the abrupt onset of a serious school health crisis or public health crisis, such as the spread of tuberculosis or COVID-19, and so can give school health professionals an early warning. We take the extreme values in absence data as the object of study and apply extreme value theory (EVT) to describe the distribution of extreme values. This study aims to predict extreme instances of student absences. Based on the predicted results, school health professionals can take preventative measures to reduce future excessive absences. Five statistical distributions were applied to individually characterize the extreme values. Our findings suggest that EVT is a useful tool for predicting extreme student absences, thereby aiding preventative measures in public health.

  19. Partial outlier data table.

    • plos.figshare.com
    xls
    Updated Mar 18, 2024
    + more versions
    Cite
    Qirong Lu; Jian Zou; Yingya Ye; Zexin Wang (2024). Partial outlier data table. [Dataset]. http://doi.org/10.1371/journal.pone.0299435.t013
    Explore at:
    Available download formats: xls
    Dataset updated
    Mar 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Qirong Lu; Jian Zou; Yingya Ye; Zexin Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The detection of water quality indicators such as Temperature, pH, Turbidity, Conductivity, and TDS involves five national standard methods. Chemically based measurement techniques may generate liquid residue, causing secondary pollution. The water quality monitoring and data analysis system can effectively address the problem that conventional methods require multiple pieces of equipment and repeated measurements. This paper analyzes the distribution characteristics of the historical data from five sensors at a specific time, displays them graphically in real time, and provides an early warning when the standard is exceeded. It selects four water samples from different sections of the Li River; compared with the national standard method, the average measurement errors of Temperature, pH, TDS, Conductivity, and Turbidity are 0.98%, 2.23%, 2.92%, 3.05%, and 3.98%. It further uses the quartile method to analyze the outlier data over 100,000 records, with five historical periods selected. Experimental results show that the system is relatively stable in measuring Temperature, pH, and TDS, with outlier proportions of 0.42%, 0.84%, and 1.24%. When Turbidity and Conductivity are measured, the proportions are 3.11% and 2.92%. In the experiment using 7 methods to fill outliers, the K nearest neighbor algorithm performs better than the others. The analysis of data trends, outliers, means, and extreme values assists in making decisions, such as updating and maintaining equipment, addressing extreme water quality situations, and enhancing regional water quality oversight.
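
    The description does not spell out the "quartile method", so the sketch below assumes the common Tukey rule: values further than 1.5 times the interquartile range outside the first or third quartile count as outliers. The function name, the multiplier, and the sample pH readings are hypothetical and are not taken from the paper.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumes the Tukey fence rule; the paper's exact rule is not stated.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical pH sensor readings with one implausible spike.
readings = [7.1, 7.2, 7.0, 7.3, 7.2, 7.1, 12.5, 7.0]
outliers = iqr_outliers(readings)
outlier_share = len(outliers) / len(readings)
```

    Here `outlier_share` plays the role of the outlier proportions quoted in the description, computed on toy data rather than on the river measurements.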

  20. Supplement 1. The R source code for fitting extreme value distributions.

    • wiley.figshare.com
    html
    Updated Jun 1, 2023
    Cite
    Richard W. Katz; Grace S. Brush; Marc B. Parlange (2023). Supplement 1. The R source code for fitting extreme value distributions. [Dataset]. http://doi.org/10.6084/m9.figshare.3524510.v1
    Explore at:
    Available download formats: html
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Wiley
    Authors
    Richard W. Katz; Grace S. Brush; Marc B. Parlange
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    File List

    rextreme.txt - R source code for fitting extreme value distributions

    Description

    This is a text file containing R-language source code for fitting extreme value distributions. These functions were originally written in S by Stuart Coles and converted to R by Alec Stephenson. For an explanation of how to use these functions, see the Appendix in: Stuart Coles, 2001. An introduction to the statistical modeling of extreme values. Springer-Verlag, London, UK.

    These functions are included to document how the results in the paper were obtained. For other use, it is recommended that the entire suite of functions for extreme value analysis be downloaded. The Extremes Toolkit includes this suite of functions, as well as a graphical user interface (currently available at: www.esig.ucar.edu/extremevalues/evtk.html).

    gev.fit is an R function that estimates the parameters of the generalized extreme value distribution by the method of maximum likelihood.

    gpd.fit is an R function that estimates the parameters of the generalized Pareto distribution by the method of maximum likelihood.

    pp.fit is an R function that estimates the parameters of the generalized extreme value distribution, via the point process representation, by the method of maximum likelihood.
