87 datasets found
  1. Data from: Anomalous values and missing data in clinical and experimental...

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Hélio Amante Miot (2023). Anomalous values and missing data in clinical and experimental studies [Dataset]. http://doi.org/10.6084/m9.figshare.8227163.v1
    Available download formats
    jpeg
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Hélio Amante Miot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: During analysis of scientific research data, it is customary to encounter anomalous values or missing data. Anomalous values can be the result of errors of recording, typing, measurement by instruments, or may be true outliers. This review discusses concepts, examples and methods for identifying and dealing with such contingencies. In the case of missing data, techniques for imputation of the values are discussed, in order to avoid exclusion of the research subject if it is not possible to retrieve information from registration forms or to re-address the participant.

  2. Algorithms for Speeding up Distance-Based Outlier Detection

    • catalog.data.gov
    • s.cnmilf.com
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Algorithms for Speeding up Distance-Based Outlier Detection [Dataset]. https://catalog.data.gov/dataset/algorithms-for-speeding-up-distance-based-outlier-detection
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    The problem of distance-based outlier detection is difficult to solve efficiently in very large datasets because of potential quadratic time complexity. We address this problem and develop sequential and distributed algorithms that are significantly more efficient than state-of-the-art methods while still guaranteeing the same outliers. By combining simple but effective indexing and disk block accessing techniques, we have developed a sequential algorithm iOrca that is up to an order-of-magnitude faster than the state-of-the-art. The indexing scheme is based on sorting the data points in order of increasing distance from a fixed reference point and then accessing those points based on this sorted order. To speed up the basic outlier detection technique, we develop two distributed algorithms (DOoR and iDOoR) for modern distributed multi-core clusters of machines, connected on a ring topology. The first algorithm passes data blocks from each machine around the ring, incrementally updating the nearest neighbors of the points passed. By maintaining a cutoff threshold, it is able to prune a large number of points in a distributed fashion. The second distributed algorithm extends this basic idea with the indexing scheme discussed earlier. In our experiments, both distributed algorithms exhibit significant improvements compared to the state-of-the-art distributed methods.
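
    The ideas in this description (ordering points by distance from a fixed reference point, and pruning with a cutoff threshold) can be illustrated with a minimal sketch. This is not the iOrca, DOoR, or iDOoR implementation: the function name `top_knn_outliers`, the choice of the k-th nearest-neighbor distance as the outlier score, and the decision to examine the points farthest from the reference first are all assumptions made for illustration.

```python
import heapq
import math


def top_knn_outliers(points, k, top_n, reference=None):
    """Return up to top_n (score, index) pairs, highest score first.

    A point's score is the distance to its k-th nearest neighbor
    (an assumed, commonly used distance-based outlier score).
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    if reference is None:
        reference = points[0]  # assumption: any fixed point can act as the reference

    # Index: point positions sorted by increasing distance from the reference.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], reference))

    top = []      # min-heap with the best top_n (score, index) pairs found so far
    cutoff = 0.0  # smallest score in `top` once it holds top_n entries

    # Assumption of this sketch: examine points farthest from the reference first.
    for i in reversed(order):
        nearest = []  # negated distances, i.e. a max-heap of the k closest so far
        pruned = False
        for j in order:
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # The running k-th distance only shrinks, so it bounds the final score.
            if len(top) == top_n and len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        score = -nearest[0]
        if len(top) < top_n:
            heapq.heappush(top, (score, i))
        else:
            heapq.heappushpop(top, (score, i))
        if len(top) == top_n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)
```

    For example, `top_knn_outliers([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2, top_n=1)` reports the isolated point at index 4, and the remaining candidates are pruned after only a couple of distance computations each.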

  3. Data from: Methodology to filter out outliers in high spatial density data...

    • scielo.figshare.com
    jpeg
    Updated Jun 4, 2023
    Cite
    Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken (2023). Methodology to filter out outliers in high spatial density data to improve maps reliability [Dataset]. http://doi.org/10.6084/m9.figshare.14305658.v1
    Available download formats
    jpeg
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELO journals
    Authors
    Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT: The considerable volume of data generated by sensors in the field presents systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets, and to determine whether the developed filter process could help decrease the nugget effect and improve the spatial variability characterization of high sampling data. We created a filter composed of a global, an anisotropic, and an anisotropic local analysis of data, which considered the respective neighborhood values. For that purpose, we used the median as the main statistical parameter to classify a given spatial point in the data set and took into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and the sensor vegetation index (SVI) in sugarcane. The results showed an improvement in accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85%, 97%, and 79% in corn yield, soil ECa, and SVI respectively, compared to interpolation errors of raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects, reducing estimation error of the interpolated data. The methodology proposed in this work had a better performance in removing outlier data when compared to two other methodologies from the literature.
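
    The local step described in this abstract (classifying a point against the median of its neighbors within a radius) can be sketched as follows. This is a simplified, isotropic illustration only, not the authors' filter: the function name `local_median_filter`, the relative-deviation rule, and the `max_rel_dev` parameter are assumptions, and the global and anisotropic stages are not modeled.

```python
import math
from statistics import median


def local_median_filter(samples, radius, max_rel_dev):
    """Keep samples whose value stays close to the local median.

    samples: list of (x, y, value) tuples.
    radius: neighborhood radius, in the same units as x and y.
    max_rel_dev: assumed tolerance, as a fraction of the local median.
    """
    kept = []
    for x, y, value in samples:
        # Values of all samples within `radius`, including the sample itself.
        neighborhood = [
            v for (nx, ny, v) in samples
            if math.hypot(nx - x, ny - y) <= radius
        ]
        local = median(neighborhood)
        # Discard the sample when it deviates too far from its local median.
        if abs(value - local) <= max_rel_dev * abs(local):
            kept.append((x, y, value))
    return kept
```

    With `radius=1.5` and `max_rel_dev=0.5`, a reading of 50.0 surrounded by readings near 10.0 is dropped, while an isolated reading with no neighbors is kept because it equals its own median.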

  4. Privacy Preserving Outlier Detection through Random Nonlinear Data...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Privacy Preserving Outlier Detection through Random Nonlinear Data Distortion - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/privacy-preserving-outlier-detection-through-random-nonlinear-data-distortion
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Consider a scenario in which the data owner has some private/sensitive data and wants a data miner to access it for studying important patterns without revealing the sensitive information. Privacy preserving data mining aims to solve this problem by randomly transforming the data prior to its release to data miners. Previous work only considered the case of linear data perturbations — additive, multiplicative or a combination of both for studying the usefulness of the perturbed output. In this paper, we discuss nonlinear data distortion using potentially nonlinear random data transformation and show how it can be useful for privacy preserving anomaly detection from sensitive datasets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy by using standard definitions. The highlight of this approach is to allow a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that for specific cases it is distance preserving. A main contribution of this paper is the discussion between the invertibility of a transformation and privacy preservation and the application of these techniques to outlier detection. Experiments conducted on real-life datasets demonstrate the effectiveness of the approach.

  5. Data from: Privacy Preserving Outlier Detection through Random Nonlinear...

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Privacy Preserving Outlier Detection through Random Nonlinear Data Distortion [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-outlier-detection-through-random-nonlinear-data-distortion
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Consider a scenario in which the data owner has some private/sensitive data and wants a data miner to access it for studying important patterns without revealing the sensitive information. Privacy preserving data mining aims to solve this problem by randomly transforming the data prior to its release to data miners. Previous work only considered the case of linear data perturbations — additive, multiplicative or a combination of both for studying the usefulness of the perturbed output. In this paper, we discuss nonlinear data distortion using potentially nonlinear random data transformation and show how it can be useful for privacy preserving anomaly detection from sensitive datasets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy by using standard definitions. The highlight of this approach is to allow a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that for specific cases it is distance preserving. A main contribution of this paper is the discussion between the invertibility of a transformation and privacy preservation and the application of these techniques to outlier detection. Experiments conducted on real-life datasets demonstrate the effectiveness of the approach.

  6. MacroPCA: An All-in-One PCA Method Allowing for Missing Values as Well as...

    • tandf.figshare.com
    pdf
    Updated Jun 2, 2023
    Cite
    Mia Hubert; Peter J. Rousseeuw; Wannes Van den Bossche (2023). MacroPCA: An All-in-One PCA Method Allowing for Missing Values as Well as Cellwise and Rowwise Outliers [Dataset]. http://doi.org/10.6084/m9.figshare.7624424.v2
    Available download formats
    pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Mia Hubert; Peter J. Rousseeuw; Wannes Van den Bossche
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multivariate data are typically represented by a rectangular matrix (table) in which the rows are the objects (cases) and the columns are the variables (measurements). When there are many variables one often reduces the dimension by principal component analysis (PCA), which in its basic form is not robust to outliers. Much research has focused on handling rowwise outliers, that is, rows that deviate from the majority of the rows in the data (e.g., they might belong to a different population). In recent years also cellwise outliers are receiving attention. These are suspicious cells (entries) that can occur anywhere in the table. Even a relatively small proportion of outlying cells can contaminate over half the rows, which causes rowwise robust methods to break down. In this article, a new PCA method is constructed which combines the strengths of two existing robust methods to be robust against both cellwise and rowwise outliers. At the same time, the algorithm can cope with missing values. As of yet it is the only PCA method that can deal with all three problems simultaneously. Its name MacroPCA stands for PCA allowing for Missingness And Cellwise & Rowwise Outliers. Several simulations and real datasets illustrate its robustness. New residual maps are introduced, which help to determine which variables are responsible for the outlying behavior. The method is well-suited for online process control.

  7. Multi-Domain Outlier Detection Dataset

    • data.niaid.nih.gov
    Updated Mar 31, 2022
    Cite
    Kerner, Hannah; Rebbapragada, Umaa; Wagstaff, Kiri; Lu, Steven; Dubayah, Bryce; Huff, Eric; Francis, Raymond; Lee, Jake; Raman, Vinay; Kulshrestha, Sakshum (2022). Multi-Domain Outlier Detection Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5941338
    Dataset updated
    Mar 31, 2022
    Dataset provided by
    Jet Propulsion Laboratory, California Institute of Technology
    University of Maryland College Park
    Authors
    Kerner, Hannah; Rebbapragada, Umaa; Wagstaff, Kiri; Lu, Steven; Dubayah, Bryce; Huff, Eric; Francis, Raymond; Lee, Jake; Raman, Vinay; Kulshrestha, Sakshum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Multi-Domain Outlier Detection Dataset contains datasets for conducting outlier detection experiments for four different application domains:

    • Astrophysics: detecting anomalous observations in the Dark Energy Survey (DES) catalog (data type: feature vectors)
    • Planetary science: selecting novel geologic targets for follow-up observation onboard the Mars Science Laboratory (MSL) rover (data type: grayscale images)
    • Earth science: detecting anomalous samples in satellite time series corresponding to ground-truth observations of maize crops (data type: time series/feature vectors)
    • Fashion-MNIST/MNIST: benchmark task to detect anomalous MNIST images among Fashion-MNIST images (data type: grayscale images)

    Each dataset contains a "fit" dataset (used for fitting or training outlier detection models), a "score" dataset (used for scoring samples used to evaluate model performance, analogous to test set), and a label dataset (indicates whether samples in the score dataset are considered outliers or not in the domain of each dataset).

    To read more about the datasets and how they are used for outlier detection, or to cite this dataset in your own work, please see the following citation:

    Kerner, H. R., Rebbapragada, U., Wagstaff, K. L., Lu, S., Dubayah, B., Huff, E., Lee, J., Raman, V., and Kulshrestha, S. (2022). Domain-agnostic Outlier Ranking Algorithms (DORA)-A Configurable Pipeline for Facilitating Outlier Detection in Scientific Datasets. Under review for Frontiers in Astronomy and Space Sciences.

  8. Data of experiment 1 (outliers removed), split across conditions.

    • plos.figshare.com
    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Tom A. de Graaf; Joachim Gross; Gavin Paterson; Tessa Rusch; Alexander T. Sack; Gregor Thut (2023). Data of experiment 1 (outliers removed), split across conditions. [Dataset]. http://doi.org/10.1371/journal.pone.0060035.t001
    Available download formats
    xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tom A. de Graaf; Joachim Gross; Gavin Paterson; Tessa Rusch; Alexander T. Sack; Gregor Thut
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Displayed are average proportion correct and [standard error of the mean].

  9. KMASH Data Repository for outlier detection

    • research-repository.rmit.edu.au
    • researchdata.edu.au
    zip
    Updated May 30, 2023
    Cite
    Sevvandi Kandanaarachchi; Mario Andres Munoz Acosta; Kate Smith-Miles; Rob J Hyndman (2023). KMASH Data Repository for outlier detection [Dataset]. http://doi.org/10.26180/5c6253c0b3323
    Available download formats
    zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    RMIT University
    Authors
    Sevvandi Kandanaarachchi; Mario Andres Munoz Acosta; Kate Smith-Miles; Rob J Hyndman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The zip files contain 12338 datasets for outlier detection investigated in the following papers:

    (1) Instance space analysis for unsupervised outlier detection. Authors: Sevvandi Kandanaarachchi, Mario A. Munoz, Kate Smith-Miles
    (2) On normalization and algorithm selection for unsupervised outlier detection. Authors: Sevvandi Kandanaarachchi, Mario A. Munoz, Rob J. Hyndman, Kate Smith-Miles

    Some of these datasets were originally discussed in the paper: On the evaluation of unsupervised outlier detection: measures, datasets and an empirical study. Authors: G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenkova, E. Schubert, I. Assent, M. E. Houle.

  10. Registration failure rates for registering a point-cloud target shape with...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 6, 2015
    Cite
    Taylor, Russell H.; Boctor, Emad M.; Billings, Seth D. (2015). Registration failure rates for registering a point-cloud target shape with outliers. (Experiment 5). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001857691
    Dataset updated
    Mar 6, 2015
    Authors
    Taylor, Russell H.; Boctor, Emad M.; Billings, Seth D.
    Description

    Source shapes were randomly generated from a mesh model of a human hip (Fig. 1A), misaligned by [15, 30] mm / degrees in (Experiment 5A) and [30, 60] mm / degrees in (Experiment 5B), and registered back to a point-cloud representation of the mesh. The test cases represent the different noise models used to generate noise on the source shape (Table 4). Outliers were added to the source shape constituting 5% (-i), 10% (-ii), 20% (-iii), and 30% (-iv) of the source points. For each test case, 300 randomized trials were conducted with the percent of unsuccessful registrations (TRE > 10 mm) being shown in the table. The proposed IMLP algorithm was evaluated relative to standard ICP [1], GICP [11], a robust variant of ICP [4], and CPD [20].

  11. An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data

    • plos.figshare.com
    doc
    Updated Jun 1, 2023
    Cite
    Nysia I. George; John F. Bowyer; Nathaniel M. Crabtree; Ching-Wei Chang (2023). An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data [Dataset]. http://doi.org/10.1371/journal.pone.0125224
    Available download formats
    doc
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nysia I. George; John F. Bowyer; Nathaniel M. Crabtree; Ching-Wei Chang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The discrete data structure and large sequencing depth of RNA sequencing (RNA-seq) experiments can often generate outlier read counts in one or more RNA samples within a homogeneous group. Thus, how to identify and manage outlier observations in RNA-seq data is an emerging topic of interest. One of the main objectives in these research efforts is to develop statistical methodology that effectively balances the impact of outlier observations and achieves maximal power for statistical testing. To reach that goal, strengthening the accuracy of outlier detection is an important precursor. Current outlier detection algorithms for RNA-seq data are executed within a testing framework and may be sensitive to sparse data and heavy-tailed distributions. Therefore, we propose a univariate algorithm that utilizes a probabilistic approach to measure the deviation between an observation and the distribution generating the remaining data, and implement it within an iterative leave-one-out design strategy. Analyses of real and simulated RNA-seq data show that the proposed methodology has higher outlier detection rates for both non-normalized and normalized negative binomial distributed data.

  12. Model Access Outlier Detection Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Model Access Outlier Detection Market Research Report 2033 [Dataset]. https://dataintelo.com/report/model-access-outlier-detection-market
    Available download formats
    csv, pptx, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Model Access Outlier Detection Market Outlook



    According to our latest research, the global Model Access Outlier Detection market size reached USD 1.32 billion in 2024, driven by the increasing need for advanced anomaly detection in digital infrastructure. The market is projected to grow at a CAGR of 14.8% from 2025 to 2033, reaching an estimated USD 4.15 billion by 2033. This robust growth is fueled by the rising adoption of AI-based security solutions, the proliferation of complex data environments, and the urgent demand for real-time threat detection across critical industries.
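
    Figures like these are related by the compound annual growth rate formula, end = start × (1 + rate)^years. The sketch below shows the arithmetic only; the function names are hypothetical, and treating 2024 as the base year with nine compounding periods to 2033 is an assumption, so the results need not reproduce the report's rounded figures if it compounds from a different base.

```python
def project(start_value, annual_rate, years):
    """Value after compounding `annual_rate` for `years` periods."""
    return start_value * (1 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Assumed inputs: USD 1.32 billion in 2024, nine periods to 2033.
projected_2033 = project(1.32, 0.148, 9)
rate_2024_to_2033 = implied_cagr(1.32, 4.15, 9)
print(f"Projected 2033 value at 14.8%: USD {projected_2033:.2f} billion")
print(f"Implied CAGR from 1.32 to 4.15: {rate_2024_to_2033:.1%}")
```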




    The primary growth factor for the Model Access Outlier Detection market is the exponential increase in cyber threats and sophisticated attacks targeting enterprise data and networks. As organizations digitize operations, they generate vast volumes of data, making traditional rule-based security approaches inadequate. Outlier detection solutions leverage machine learning and artificial intelligence to identify unusual patterns and potential threats in real time, significantly reducing response times and minimizing the risk of data breaches. The integration of these technologies into existing security frameworks is becoming a necessity, especially in highly regulated sectors such as banking, healthcare, and government, where data integrity and privacy are paramount.




    Another significant driver propelling the market is the rapid adoption of cloud computing and the proliferation of IoT devices. As businesses migrate workloads to the cloud and deploy interconnected devices, the attack surface expands, necessitating advanced outlier detection mechanisms. Cloud-based solutions offer scalability, flexibility, and centralized monitoring, making them particularly attractive for organizations with distributed operations. Furthermore, the shift towards remote work and digital collaboration has increased the demand for real-time monitoring and anomaly detection to safeguard sensitive data and ensure business continuity. The continuous evolution of AI algorithms and the availability of big data analytics further enhance the accuracy and efficiency of outlier detection systems, contributing to sustained market growth.




    The growing emphasis on regulatory compliance and data protection standards worldwide is also catalyzing the adoption of Model Access Outlier Detection solutions. Stringent regulations such as GDPR, HIPAA, and PCI DSS require organizations to implement robust security measures and continuously monitor access to critical systems. Outlier detection tools play a vital role in meeting these compliance requirements by providing automated alerts, detailed audit trails, and actionable insights into suspicious activities. As regulatory landscapes become more complex, organizations are investing in advanced detection technologies not only to avoid penalties but also to build trust with customers and stakeholders.




    From a regional perspective, North America currently dominates the Model Access Outlier Detection market, accounting for the largest revenue share in 2024, followed by Europe and Asia Pacific. The presence of leading technology vendors, high cybersecurity awareness, and significant investments in digital infrastructure contribute to North America’s leadership. Europe is experiencing steady growth due to stringent data protection regulations and the increasing adoption of cloud-based security solutions. Meanwhile, the Asia Pacific region is poised for the fastest growth, driven by rapid digital transformation, expanding IT ecosystems, and rising incidences of cyber threats in emerging economies. The market’s global expansion is further supported by ongoing technological advancements and the increasing integration of AI and machine learning in security operations.



    Component Analysis



    The Component segment of the Model Access Outlier Detection market is broadly categorized into Software and Services. Software solutions are at the core of this market, comprising advanced analytics platforms, AI-driven detection engines, and customizable dashboards. These software offerings are designed to seamlessly integrate with existing IT infrastructure, providing organizations with the capability to monitor access patterns, identify anomalies, and generate real-time alerts. The sophistication of these tools lies in their ability to adapt to evolving threat landscapes, utilizing machine learning algorithms to

  13. Weather Anomalies in the United States

    • kaggle.com
    zip
    Updated Nov 22, 2022
    Cite
    The Devastator (2022). Weather Anomalies in the United States [Dataset]. https://www.kaggle.com/datasets/thedevastator/weather-anomalies-in-the-united-states
    Available download formats
    zip (98365651 bytes)
    Dataset updated
    Nov 22, 2022
    Authors
    The Devastator
    Area covered
    United States
    Description

    Weather Anomalies in the United States

    Outliers from 1964-2013

    By Carl V. Lewis [source]

    About this dataset

    Historical Weather Outliers in the United States, 1964-2013: This dataset contains historical weather outliers in the United States from 1964 to 2013. The data includes the reporting station ID, name, min/max temperature, as well as degree coordinates of the recorded weather. The original weather data was collected from NOAA.

    Each entry in this dataset represents a report from a weather station with high or low temperatures that were historical outliers within that month, averaged over time. This table's columns contain data that was collected from NOAA as well as data that was calculated using Enigma's assortment of weather data. The direct source of the information is identified in the description of the column.

    Columns: date_str, degrees_from_mean, longitude, latitude, max_temp, min_temp, station_name, type

    How to use the dataset

    This dataset contains historical weather outliers in the United States from 1964 to 2013. The data includes the station ID, name, minimum and maximum temperatures, as well as degree coordinates of the recorded weather.

    To use this dataset, simply download it and open it in a text editor or spreadsheet program. The data is organized by columns, with each column representing a different piece of information. Here is a brief explanation of each column:

    • date_str: The date of the weather report.
    • degrees_from_mean: The number of degrees that the temperature was above or below the historical mean for that month.
    • longitude: The longitude of the weather station.
    • latitude: The latitude of the weather station.
    • max_temp: The maximum temperature reported by the weather station.
    • min_temp: The minimum temperature reported by the weather station.
    • station_name: The name of the weather station.
    • type: The type of outlier, either high or low

    Research Ideas

    • Plotting the locations of outliers on a map of the US
    • Identifying weather patterns associated with outliers
    • Determining which areas of the US are most vulnerable to extreme weather events

    Acknowledgements

    This dataset was originally published by Enigma.io Analysis.


    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: weather-anomalies-1964-2013.csv

    | Column name | Description |
    |:------------------|:----------------------------------------------------------------------------------------------------|
    | date_str | The date of the weather anomaly. (Date) |
    | degrees_from_mean | The number of degrees that the temperature was above or below the monthly mean temperature. (Float) |
    | longitude | The longitude of the weather station where the anomaly was recorded. (Float) |
    | latitude | The latitude of the weather station where the anomaly was recorded. (Float) |
    | max_temp | The maximum temperature recorded at the weather station on the date of the anomaly. (Float) |
    | min_temp | The minimum temperature recorded at the weather station on the date of the anomaly. (Float) |
    | station_name | The name of the weather station where the anomaly was recorded. (String) |
    | type | The type of anomaly, either high or low temperature. (String) |
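
    As a rough illustration of working with these columns, the sketch below parses rows with Python's standard `csv` module and converts the columns documented as Float. The sample rows, the date format, the `type` values, and the helper names `load_anomalies` and `filter_by_deviation` are made up for illustration; only the column names come from the listing above.

```python
import csv
import io

# Columns documented as Float in the listing above.
FLOAT_COLUMNS = ("degrees_from_mean", "longitude", "latitude", "max_temp", "min_temp")

# Made-up rows for illustration only; they are not taken from the real file.
SAMPLE_CSV = """date_str,degrees_from_mean,longitude,latitude,max_temp,min_temp,station_name,type
1970-01-15,-18.2,-93.2,44.9,-20.0,-31.0,EXAMPLE STATION A,low
1988-07-04,16.5,-112.0,33.4,47.0,30.0,EXAMPLE STATION B,high
2001-03-10,9.1,-84.4,33.7,28.0,15.0,EXAMPLE STATION C,high
"""


def load_anomalies(handle):
    """Read rows from an open CSV handle, converting the float columns."""
    rows = []
    for row in csv.DictReader(handle):
        for column in FLOAT_COLUMNS:
            row[column] = float(row[column])
        rows.append(row)
    return rows


def filter_by_deviation(rows, min_abs_degrees):
    """Keep rows at least `min_abs_degrees` away from the monthly mean."""
    return [r for r in rows if abs(r["degrees_from_mean"]) >= min_abs_degrees]


rows = load_anomalies(io.StringIO(SAMPLE_CSV))
for row in filter_by_deviation(rows, 15.0):
    print(row["date_str"], row["station_name"], row["degrees_from_mean"])
```

    To run this against the downloaded file instead of the sample, pass `open("weather-anomalies-1964-2013.csv", newline="")` to `load_anomalies`, assuming the file uses the header shown in the table.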

    Acknowledgements

    If you use this dataset in your research, please credit Carl V. Lewis.

  14. mumpcepy: A Python implementation of the Method of Uncertainty Minimization...

    • datasets.ai
    • catalog.data.gov
    Updated Mar 11, 2021
    Cite
    National Institute of Standards and Technology (2021). mumpcepy: A Python implementation of the Method of Uncertainty Minimization using Polynomial Chaos Expansions [Dataset]. https://datasets.ai/datasets/mumpcepy-a-python-implementation-of-the-method-of-uncertainty-minimization-using-polynomia-c2fc3
    Dataset updated
    Mar 11, 2021
    Dataset authored and provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    The Method of Uncertainty Minimization using Polynomial Chaos Expansions (MUM-PCE) was developed as a software tool to constrain physical models against experimental measurements. These models contain parameters that cannot be easily determined from first principles and so must be measured, and some which cannot even be easily measured. In such cases, the models are validated and tuned against a set of global experiments which may depend on the underlying physical parameters in a complex way. The measurement uncertainty will affect the uncertainty in the parameter values.

  15. Metrology Outlier Detection AI Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Metrology Outlier Detection AI Market Research Report 2033 [Dataset]. https://dataintelo.com/report/metrology-outlier-detection-ai-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Metrology Outlier Detection AI Market Outlook



    According to our latest research, the global Metrology Outlier Detection AI market size reached USD 1.18 billion in 2024, reflecting rapid adoption across high-precision industries. The market is expanding at a robust CAGR of 18.4% and is projected to attain a value of USD 5.53 billion by 2033. This impressive growth is primarily driven by the increasing demand for automated quality assurance and defect detection across manufacturing and high-tech sectors, as organizations strive to optimize processes and reduce costs while maintaining stringent accuracy standards.
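
For readers who want to relate a start value, a growth rate, and an end value, here is a minimal sketch of the compound annual growth rate (CAGR) arithmetic. The function names are hypothetical, and the nine-year horizon (2024 to 2033) is an assumption: the report does not state its base year or compounding convention, so the result of this calculation need not coincide with the report's own projection.

```python
def project_value(start_value, cagr, years):
    """Compound start_value forward at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Start value and rate as quoted in the description (USD billion, 18.4%).
# The 9-year horizon is an assumption, not something the report states.
projected_2033 = project_value(1.18, 0.184, 9)
```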




    One of the primary growth factors propelling the Metrology Outlier Detection AI market is the surge in demand for advanced quality control solutions in semiconductor manufacturing and electronics industries. As these sectors face mounting pressure to deliver flawless products with microscopic tolerances, traditional metrology tools are often insufficient for detecting subtle anomalies. The integration of AI-based outlier detection into metrology systems enables real-time identification of defects and process deviations, significantly improving yield rates and reducing waste. Furthermore, the proliferation of smart factories and Industry 4.0 initiatives is compelling manufacturers to adopt intelligent metrology solutions that leverage machine learning algorithms, computer vision, and big data analytics to drive continuous process improvements and predictive maintenance.




    Another crucial driver is the increasing complexity of products in automotive, aerospace, and healthcare sectors. Modern vehicles, aircraft, and medical devices involve intricate assemblies and rely on components manufactured to exacting specifications. Even minor deviations can result in significant safety, performance, or regulatory issues. AI-powered metrology outlier detection systems provide a scalable and adaptive approach to monitoring production quality, detecting anomalies that might escape conventional inspection techniques. This capability not only ensures compliance with international standards but also enhances brand reputation and customer trust. The rising adoption of digital twins and simulation-driven design further amplifies the need for robust AI-driven metrology, as organizations seek to bridge the gap between virtual models and physical outcomes.




    The market is also benefiting from advancements in sensor technologies, edge computing, and cloud-based analytics platforms. These innovations enable seamless integration of AI-driven outlier detection into existing manufacturing and quality control workflows, facilitating real-time data acquisition, processing, and visualization. The availability of scalable cloud infrastructure allows enterprises of all sizes to leverage sophisticated AI models without incurring prohibitive upfront costs. Additionally, partnerships between AI solution providers and metrology equipment manufacturers are accelerating the development of turnkey systems tailored to specific industry requirements. As a result, the barrier to entry for implementing AI in metrology is rapidly diminishing, fueling widespread adoption across both established players and emerging entrants in the market.




    From a regional perspective, Asia Pacific remains the dominant force in the Metrology Outlier Detection AI market, accounting for the largest share in 2024. This is attributed to the region's strong presence in semiconductor manufacturing, electronics, and automotive industries, particularly in countries such as China, Japan, South Korea, and Taiwan. North America and Europe are also witnessing significant growth, driven by technological advancements, robust R&D ecosystems, and stringent quality regulations in aerospace and healthcare. Meanwhile, the Middle East & Africa and Latin America are gradually emerging as promising markets, supported by increasing investments in industrial automation and quality infrastructure. The interplay of regional dynamics, industry-specific challenges, and evolving regulatory landscapes will continue to shape the trajectory of the global market over the coming years.



    Component Analysis



    The Metrology Outlier Detection AI market by component is segmented into Software, Hardware, and Services, each playing a vital role in the overall ecosystem. The software segment dominates the market, accounting for the largest share in 2024. This is primarily due to the rapid advancemen

  16. Replication data for: Robust Estimation and Outlier Detection for...

    • dataverse.harvard.edu
    Updated Nov 28, 2007
    Cite
    Walter R. Mebane; Jasjeet S. Sekhon (2007). Replication data for: Robust Estimation and Outlier Detection for Overdispersed Multinomial Models of Count Data [Dataset]. http://doi.org/10.7910/DVN/RDXADE
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 28, 2007
    Dataset provided by
    Harvard Dataverse
    Authors
    Walter R. Mebane; Jasjeet S. Sekhon
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    1993 - 2000
    Description

    We develop a robust estimator, the hyperbolic tangent (tanh) estimator, for overdispersed multinomial regression models of count data. The tanh estimator provides accurate estimates and reliable inferences even when the specified model is not good for as much as half of the data. Seriously ill-fitted counts (outliers) are identified as part of the estimation. A Monte Carlo sampling experiment shows that the tanh estimator produces good results at practical sample sizes even when ten percent of the data are generated by a significantly different process. The experiment shows that, with contaminated data, estimation fails using four other estimators: the non-robust maximum likelihood estimator, the additive logistic model and two SUR models. Using the tanh estimator to analyze data from Florida for the 2000 presidential election matches well-known features of the election that the other four estimators fail to capture. In an analysis of data from the 1993 Polish parliamentary election, the tanh estimator gives sharper inferences than does a previously proposed heteroskedastic SUR model.

  17. Controlled Anomalies Time Series (CATS) Dataset

    • zenodo.org
    bin
    Updated Jul 12, 2024
    + more versions
    Cite
    Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. http://doi.org/10.5281/zenodo.7646897
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Solenix Engineering GmbH
    Authors
    Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

    The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

    • Multivariate (17 variables), including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system including:
      • 4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.
      • 3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.
      • 10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.
    • 5 million timestamps. Sensor readings are sampled at 1 Hz.
      • 1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour.
      • 4 million observations that include both nominal and anomalous segments. This is suitable to evaluate both semi-supervised approaches (novelty detection) as well as unsupervised approaches (outlier detection).
    • 200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.
    • Different types of anomalies to understand what anomaly types can be detected by different approaches.
    • Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end time of the anomalous behaviour is known very precisely. In contrast to real world datasets, there is no risk that the ground truth contains mislabelled segments which is often the case for real data.
    • Obvious anomalies. The simulated anomalies have been designed to be "easy" for the human eye to detect (i.e., there are very large spikes or oscillations), and hence also detectable by most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting those obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable also for regular benchmark studies.
    • Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example consists of a light and switch pair. The light being either on or off is nominal, the same goes for the switch, but having the switch on and the light off shall be considered anomalous. In the CATS dataset, users can choose (or not) to use the available context, and external stimuli, to test the usefulness of the context for detecting anomalies in this simulation.
    • Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage since users of the dataset can decide to add on top of the provided series any type of noise and choose an amplitude. This makes it well suited to test how sensitive and robust detection algorithms are against various levels of noise.
    • No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
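
Several of the properties listed above (the nominal prefix, the noise-free signals, the absence of missing values) suggest simple preprocessing experiments. The following is only a sketch under stated assumptions: the helper names are hypothetical, a single variable is represented as a plain Python list, and the tiny stand-in signal is invented for demonstration rather than taken from CATS. Only the 1 million nominal prefix length comes from the description.

```python
import random

N_NOMINAL = 1_000_000  # per the description: the first 1 million points are nominal


def split_nominal(series, n_nominal=N_NOMINAL):
    """Split a series into (nominal training part, mixed evaluation part)."""
    return series[:n_nominal], series[n_nominal:]


def add_gaussian_noise(series, sigma, seed=0):
    """Return a copy with zero-mean Gaussian noise of std `sigma` added."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]


def drop_values(series, fraction, seed=0):
    """Return a copy where roughly `fraction` of the values become None."""
    rng = random.Random(seed)
    return [None if rng.random() < fraction else x for x in series]


# Tiny stand-in signal (not CATS data), with a shorter nominal prefix.
signal = [float(i % 10) for i in range(100)]
train, evaluation = split_nominal(signal, n_nominal=20)
noisy = add_gaussian_noise(evaluation, sigma=0.5)
gappy = drop_values(evaluation, fraction=0.1)
```

Fixing the seed keeps the perturbation reproducible, so a detector's results on the clean, noisy, and gappy variants can be compared against the same baseline.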

    [1] Example Benchmark of Anomaly Detection in Time Series: “Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602”

    About Solenix

    Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.

  18. Deep one-class learning: a deep learning approach to anomaly detection

    • resodate.org
    Updated Oct 8, 2021
    Cite
    Lukas Ruff (2021). Deep one-class learning: a deep learning approach to anomaly detection [Dataset]. http://doi.org/10.14279/depositonce-12250
    Explore at:
    Dataset updated
    Oct 8, 2021
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Lukas Ruff
    Description

    Anomaly detection is the problem of identifying unusual patterns in data. This problem is relevant for a wide variety of applications in various domains such as fault and damage detection in manufacturing, fraud detection in finance and insurance, intrusion detection in cybersecurity, disease detection in medical diagnosis, or scientific discovery. Many of these applications involve increasingly complex data at large scale, for instance, large collections of images or text. The lack of effective solutions in such settings has sparked an interest in developing anomaly detection methods based on deep learning, which has enabled breakthroughs in other machine learning problems that involve large amounts of complex data. This thesis proposes Deep One-Class Learning, a deep learning approach to anomaly detection that is based on the one-class classification paradigm. One-class classification views anomaly detection from a classification perspective, aiming to learn a discriminative decision boundary that separates the normal from the anomalous data. In contrast to previous methods that rely on fixed (usually manually engineered) features, deep one-class learning expands the one-class classification approach with methods that learn (or transfer) data representations via suitable one-class learning objectives. The key idea underlying deep one-class learning is to learn a transformation (e.g., a deep neural network) in such a way that the normal data points are concentrated in feature space, causing anomalies to deviate from the concentrated region, thereby making them detectable. We introduce several deep one-class learning methods in this thesis that follow the above idea while integrating different assumptions about the data or a specific domain. These include semi-supervised variants that can incorporate labeled anomalies, for example, or specific methods for images and text that enable model interpretability and an explanation of anomalies. 

    Moreover, we present a unifying view of anomaly detection methods that, in addition to one-class classification, also covers reconstruction methods as well as methods based on density estimation and generative modeling. For each of these main approaches, we identify connections between respective deep and "shallow" methods based on common underlying principles. Through multiple experiments and analyses, we demonstrate that deep one-class learning is useful for anomaly detection, especially on semantic detection tasks. Finally, we conclude this thesis by discussing limits of the proposed approach and outlining specific paths for future research.
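
The key idea described above, normal points concentrated in feature space with anomalies lying far from that region, can be illustrated with a very small scoring sketch. This is not the thesis's training procedure: the learned transformation is assumed to have been applied already, the 2-D "features" are invented, and the function names are hypothetical. It only shows how a distance to the center of the normal data can serve as an anomaly score.

```python
def center_of(features):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def anomaly_score(feature, center):
    """Squared Euclidean distance from the center; larger means more anomalous."""
    return sum((f - c) ** 2 for f, c in zip(feature, center))


# Made-up 2-D features standing in for the output of a learned transformation.
normal_features = [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2], [0.0, 0.0]]
center = center_of(normal_features)

# The first query lies near the normal region, the second far from it.
scores = [anomaly_score(x, center) for x in [[0.1, 0.0], [3.0, 4.0]]]
```

A point would then be flagged when its score exceeds a threshold chosen on held-out normal data; how that threshold is set is left open here.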

  19. Identification of Performance Changes at Code Level (Measurement...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Aug 8, 2022
    Cite
    Anonymous for Reviewing (2022). Identification of Performance Changes at Code Level (Measurement Configuration Dataset) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_6300863
    Explore at:
    Dataset updated
    Aug 8, 2022
    Authors
    Anonymous for Reviewing
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Measurement Configuration Dataset

    This is the anonymous reviewing version; the source code repository will be added after the review.

    This dataset provides reproduction data for performance measurement configuration at source code level in Java. You can regenerate the measurement data yourself using the precision-experiments repository https://anonymous.4open.science/r/precision-experiments-C613/ (Examining Different Repetition Counts). The data contained here were obtained from execution on an i7-4770 CPU @ 3.40GHz.

    The analysis was tested on Ubuntu 20.04 and gnuplot 5.2.8. It will not work with older gnuplot versions.

    To execute the analysis, extract the data by

    tar -xvf basic-parameter-comparison.tar
    tar -xvf parallel-sequential-comparison.tar

    and afterwards build the precision-experiments repo and execute the analysis by

    cd precision-experiments/precision-analysis/
    ../gradlew fatJar
    cd scripts/configuration-analysis/
    ./executeCompleteAnalysis.sh ../../../../basic-parameter-comparison ../../../../parallel-sequential-comparison

    Afterwards, the following files will be present:

    precision-experiments/precision-analysis/scripts/configuration-analysis/repetitionHeatmaps/heatmap_all_en.pdf (Heatmaps for different repetition counts)

    precision-experiments/precision-analysis/scripts/configuration-analysis/repetitionHeatmaps/heatmap_outlierRemoval_en.pdf (Heatmap with and without outlier removal for 1000 repetitions)

    precision-experiments/precision-analysis/scripts/configuration-analysis/histogram_outliers_en.pdf (Histogram of the outliers)

    precision-experiments/precision-analysis/scripts/configuration-analysis/heatmap_parallel_en.pdf (Heatmap with sequential and parallel execution)

  20. Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    pdf
    Updated Jun 12, 2025
    Cite
    Technavio (2025). Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Spain, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/anomaly-detection-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Canada, United States
    Description

    Anomaly Detection Market Size 2025-2029

    The anomaly detection market size is valued to increase by USD 4.44 billion, at a CAGR of 14.4% from 2024 to 2029. Anomaly detection tools gaining traction in BFSI will drive the anomaly detection market.

    Major Market Trends & Insights

    North America dominated the market and accounted for a 43% growth during the forecast period.
    By Deployment - Cloud segment was valued at USD 1.75 billion in 2023
    By Component - Solution segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 173.26 million
    Market Future Opportunities: USD 4441.70 million
    CAGR from 2024 to 2029: 14.4%
    

    Market Summary

    Anomaly detection, a critical component of advanced analytics, is witnessing significant adoption across various industries, with the financial services sector leading the charge. The increasing incidence of internal threats and cybersecurity frauds necessitates the need for robust anomaly detection solutions. These tools help organizations identify unusual patterns and deviations from normal behavior, enabling proactive response to potential threats and ensuring operational efficiency. For instance, in a supply chain context, anomaly detection can help identify discrepancies in inventory levels or delivery schedules, leading to cost savings and improved customer satisfaction. In the realm of compliance, anomaly detection can assist in maintaining regulatory adherence by flagging unusual transactions or activities, thereby reducing the risk of penalties and reputational damage.
    According to recent research, organizations that implement anomaly detection solutions experience a reduction in error rates by up to 25%. This improvement not only enhances operational efficiency but also contributes to increased customer trust and satisfaction. Despite these benefits, challenges persist, including data quality and the need for real-time processing capabilities. As the market continues to evolve, advancements in machine learning and artificial intelligence are expected to address these challenges and drive further growth.
    


    How is the Anomaly Detection Market Segmented?

    The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
      Cloud
      On-premises
    Component
      Solution
      Services
    End-user
      BFSI
      IT and telecom
      Retail and e-commerce
      Manufacturing
      Others
    Technology
      Big data analytics
      AI and ML
      Data mining and business intelligence
    Geography
      North America
        US
        Canada
        Mexico
      Europe
        France
        Germany
        Spain
        UK
      APAC
        China
        India
        Japan
      Rest of World (ROW)

    By Deployment Insights

    The cloud segment is estimated to witness significant growth during the forecast period.

    The market is witnessing significant growth, driven by the increasing adoption of advanced technologies such as machine learning algorithms, predictive modeling tools, and real-time monitoring systems. Businesses are increasingly relying on anomaly detection solutions to enhance their root cause analysis, improve system health indicators, and reduce false positives. This is particularly true in sectors where data is generated in real-time, such as cybersecurity threat detection, network intrusion detection, and fraud detection systems. Cloud-based anomaly detection solutions are gaining popularity due to their flexibility, scalability, and cost-effectiveness.

    This growth is attributed to cloud-based solutions' quick deployment, real-time data visibility, and customization capabilities, which are offered at flexible payment options like monthly subscriptions and pay-as-you-go models. Companies like Anodot, Ltd, Cisco Systems Inc, IBM Corp, and SAS Institute Inc provide both cloud-based and on-premise anomaly detection solutions. Anomaly detection methods include outlier detection, change point detection, and statistical process control. Data preprocessing steps, such as data mining techniques and feature engineering processes, are crucial in ensuring accurate anomaly detection. Data visualization dashboards and alert fatigue mitigation techniques help in managing and interpreting the vast amounts of data generated.
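
Of the methods named above, statistical process control is perhaps the simplest to illustrate. The sketch below is a generic three-sigma control-limit check, not anything described in the report: the function names are hypothetical and the readings are invented for demonstration. Limits are estimated from a baseline period assumed to be in control, and later values falling outside them are flagged.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits as mean -/+ k standard deviations of baseline."""
    mean = statistics.fmean(baseline)
    std = statistics.stdev(baseline)
    return mean - k * std, mean + k * std


def out_of_control(values, limits):
    """Indices of values falling outside the control limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Made-up readings for demonstration only.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
limits = control_limits(baseline)
flagged = out_of_control([10.1, 9.9, 14.0, 10.0], limits)
```

Estimating the limits from a separate baseline, rather than from the values being checked, keeps a large anomaly from inflating the standard deviation and hiding itself.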

    Network traffic analysis, log file analysis, and sensor data integration are essential components of anomaly detection systems. Additionally, risk management frameworks, drift detection algorithms, time series forecasting, and performance degradation detection are vital in maintaining system performance and capacity planning.
