Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract During the analysis of scientific research data, it is common to encounter anomalous values or missing data. Anomalous values can result from recording, typing, or instrument measurement errors, or may be true outliers. This review discusses concepts, examples, and methods for identifying and dealing with such contingencies. In the case of missing data, techniques for imputing the values are discussed in order to avoid excluding the research subject when it is not possible to retrieve the information from registration forms or to re-contact the participant.
The problem of distance-based outlier detection is difficult to solve efficiently in very large datasets because of potential quadratic time complexity. We address this problem and develop sequential and distributed algorithms that are significantly more efficient than state-of-the-art methods while still guaranteeing the same outliers. By combining simple but effective indexing and disk-block accessing techniques, we have developed a sequential algorithm, iOrca, that is up to an order of magnitude faster than the state of the art. The indexing scheme is based on sorting the data points in order of increasing distance from a fixed reference point and then accessing those points in this sorted order. To speed up the basic outlier detection technique, we develop two distributed algorithms (DOoR and iDOoR) for modern distributed multi-core clusters of machines connected in a ring topology. The first algorithm passes data blocks from each machine around the ring, incrementally updating the nearest neighbors of the points passed. By maintaining a cutoff threshold, it is able to prune a large number of points in a distributed fashion. The second distributed algorithm extends this basic idea with the indexing scheme discussed earlier. In our experiments, both distributed algorithms exhibit significant improvements compared to state-of-the-art distributed methods.
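The pruning logic at the heart of such methods is compact enough to sketch. Below is an illustrative single-machine Python sketch (not the authors' code) of distance-based top-n outlier detection with a reference-point scan order and a running cutoff threshold; the full iOrca index additionally exploits the sorted order for tighter distance bounds and disk-block access.

```python
import heapq
import numpy as np

def top_n_outliers(X, k=5, n=10):
    """Distance-based outliers: the n points with the largest distance to
    their k-th nearest neighbor. A running cutoff (the weakest score in
    the current top-n) prunes candidates early: once a candidate has k
    neighbors closer than the cutoff, it cannot be a top-n outlier."""
    ref = X.mean(axis=0)                                  # fixed reference point
    order = np.argsort(np.linalg.norm(X - ref, axis=1))   # index: sort by d(ref, .)
    top, cutoff = [], 0.0                                 # top holds (score, index)
    for i in order:
        knn = []                      # max-heap (negated) of the k smallest distances
        pruned = False
        for j in order:               # scan candidates in the same sorted order
            if i == j:
                continue
            d = float(np.linalg.norm(X[i] - X[j]))
            if len(knn) < k:
                heapq.heappush(knn, -d)
            elif d < -knn[0]:
                heapq.heapreplace(knn, -d)
            if len(knn) == k and -knn[0] <= cutoff:
                pruned = True         # k-NN distance already below the cutoff
                break
        if not pruned:
            top.append((-knn[0], i))  # score = current k-th nearest distance
            top.sort(reverse=True)
            top = top[:n]
            if len(top) == n:
                cutoff = top[-1][0]   # raise the pruning threshold
    return [i for _, i in top]

# Example: 5 far-away points hidden among 500 inliers
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(500, 2)), rng.normal(scale=8, size=(5, 2))])
print(top_n_outliers(X, k=5, n=5))
```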
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT The considerable volume of data generated by sensors in the field contains systematic errors; it is therefore extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets and to determine whether the developed filter process could help decrease the nugget effect and improve the characterization of spatial variability in high-density sampling data. We created a filter that combines a global analysis with an anisotropic local analysis of the data, taking the respective neighborhood values into account. For that purpose, we used the median as the main statistical parameter to classify a given spatial point in the data set, taking into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil apparent electrical conductivity (ECa), and a sensor vegetation index (SVI) in sugarcane. The results showed an improvement in the accuracy of spatial variability within the data sets. The methodology reduced the RMSE by 85%, 97%, and 79% for corn yield, soil ECa, and SVI, respectively, compared to the interpolation errors of the raw data sets. The filter excluded local outliers, which considerably reduced the nugget effects and thus the estimation error of the interpolated data. The methodology proposed in this work performed better at removing outlier data than two other methodologies from the literature.
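To make the local step concrete, here is a minimal Python sketch of a median-based neighborhood filter. It illustrates only an isotropic local pass, whereas the paper's filter chains global and anisotropic analyses; the radius and threshold values below are placeholders to be tuned per dataset.

```python
import numpy as np

def local_median_filter(x, y, z, radius=20.0, n_mads=3.0, min_neighbors=4):
    """Keep a point only if its value z[i] is within n_mads robust standard
    deviations of the median of its neighbors inside `radius` (same map
    units as x and y)."""
    pts = np.column_stack([x, y])
    keep = np.ones(len(z), dtype=bool)
    for i in range(len(z)):
        d = np.linalg.norm(pts - pts[i], axis=1)
        nbrs = z[(d > 0) & (d <= radius)]
        if len(nbrs) < min_neighbors:
            continue                                  # too sparse to judge
        med = np.median(nbrs)
        mad = 1.4826 * np.median(np.abs(nbrs - med))  # robust sigma estimate
        if mad > 0 and abs(z[i] - med) > n_mads * mad:
            keep[i] = False                           # flag as local outlier
    return keep
```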
Consider a scenario in which the data owner has some private/sensitive data and wants a data miner to access it for studying important patterns without revealing the sensitive information. Privacy-preserving data mining aims to solve this problem by randomly transforming the data prior to its release to data miners. Previous work only considered the case of linear data perturbations (additive, multiplicative, or a combination of both) when studying the usefulness of the perturbed output. In this paper, we discuss nonlinear data distortion using potentially nonlinear random data transformations and show how it can be useful for privacy-preserving anomaly detection from sensitive datasets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy using standard definitions. The highlight of this approach is that it allows a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that, for specific cases, it is distance preserving. A main contribution of this paper is the discussion of the relationship between the invertibility of a transformation and privacy preservation, and the application of these techniques to outlier detection. Experiments conducted on real-life datasets demonstrate the effectiveness of the approach.
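As a concrete illustration of the sigmoid instance, the sketch below (generic Python, not the paper's exact transformation) applies a secret random rotation followed by an elementwise sigmoid; the scale parameter plays the role of the nonlinearity knob that trades utility for privacy.

```python
import numpy as np

def sigmoid_distort(X, scale=1.0, seed=0):
    """Random orthogonal rotation followed by an elementwise sigmoid.
    Small `scale` keeps the sigmoid near its linear, distance-preserving
    regime (more utility); large `scale` pushes it into saturation
    (harder to invert, hence more privacy)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # secret random rotation
    return 1.0 / (1.0 + np.exp(-scale * (X @ Q)))

# The data owner releases sigmoid_distort(X); the miner then runs any
# distance-based outlier detector on the released data.
```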
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Multivariate data are typically represented by a rectangular matrix (table) in which the rows are the objects (cases) and the columns are the variables (measurements). When there are many variables one often reduces the dimension by principal component analysis (PCA), which in its basic form is not robust to outliers. Much research has focused on handling rowwise outliers, that is, rows that deviate from the majority of the rows in the data (e.g., they might belong to a different population). In recent years, cellwise outliers have also been receiving attention. These are suspicious cells (entries) that can occur anywhere in the table. Even a relatively small proportion of outlying cells can contaminate over half the rows, which causes rowwise robust methods to break down. In this article, a new PCA method is constructed which combines the strengths of two existing robust methods in order to be robust against both cellwise and rowwise outliers. At the same time, the algorithm can cope with missing values. To date, it is the only PCA method that can deal with all three problems simultaneously. Its name, MacroPCA, stands for PCA allowing for Missingness And Cellwise & Rowwise Outliers. Several simulations and real datasets illustrate its robustness. New residual maps are introduced, which help to determine which variables are responsible for the outlying behavior. The method is well suited for online process control.
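The claim that a small proportion of outlying cells can contaminate most rows is simple arithmetic: if each cell is independently outlying with probability eps, a row with p cells is fully clean with probability (1 - eps)^p. A two-line check:

```python
# Probability that a row with p cells contains at least one outlying cell,
# assuming each cell is independently contaminated with probability eps.
eps = 0.02
for p in (10, 50, 200):
    print(p, round(1 - (1 - eps) ** p, 2))
# -> 10 0.18, 50 0.64, 200 0.98: at 2% cellwise contamination, most rows
#    are already "outlying" once there are enough variables.
```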
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multi-Domain Outlier Detection Dataset contains datasets for conducting outlier detection experiments for four different application domains:
Astrophysics: detecting anomalous observations in the Dark Energy Survey (DES) catalog (data type: feature vectors)
Planetary science: selecting novel geologic targets for follow-up observation onboard the Mars Science Laboratory (MSL) rover (data type: grayscale images)
Earth science: detecting anomalous samples in satellite time series corresponding to ground-truth observations of maize crops (data type: time series/feature vectors)
Fashion-MNIST/MNIST: benchmark task to detect anomalous MNIST images among Fashion-MNIST images (data type: grayscale images)
Each dataset contains a "fit" dataset (used for fitting or training outlier detection models), a "score" dataset (used to score samples for evaluating model performance, analogous to a test set), and a label dataset (indicating whether each sample in the score dataset is considered an outlier in that domain).
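A typical evaluation loop over one of these domains might look as follows; the file names and the choice of IsolationForest are illustrative assumptions, not part of the dataset or the DORA pipeline.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

fit_data = np.load("des_fit.npy")      # hypothetical: used to fit the detector
score_data = np.load("des_score.npy")  # hypothetical: samples to be ranked
labels = np.load("des_labels.npy")     # hypothetical: 1 = outlier, 0 = inlier

model = IsolationForest(random_state=0).fit(fit_data)
scores = -model.score_samples(score_data)  # negate: higher = more anomalous
print("ROC AUC:", roc_auc_score(labels, scores))
```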
To read more about the datasets and how they are used for outlier detection, or to cite this dataset in your own work, please see the following citation:
Kerner, H. R., Rebbapragada, U., Wagstaff, K. L., Lu, S., Dubayah, B., Huff, E., Lee, J., Raman, V., and Kulshrestha, S. (2022). Domain-agnostic Outlier Ranking Algorithms (DORA): A Configurable Pipeline for Facilitating Outlier Detection in Scientific Datasets. Under review for Frontiers in Astronomy and Space Sciences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Displayed are average proportion correct and [standard error of the mean].
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The zip files contain 12,338 datasets for outlier detection investigated in the following papers:
(1) Instance space analysis for unsupervised outlier detection. Authors: Sevvandi Kandanaarachchi, Mario A. Munoz, Kate Smith-Miles.
(2) On normalization and algorithm selection for unsupervised outlier detection. Authors: Sevvandi Kandanaarachchi, Mario A. Munoz, Rob J. Hyndman, Kate Smith-Miles.
Some of these datasets were originally discussed in the paper: On the evaluation of unsupervised outlier detection: measures, datasets and an empirical study. Authors: G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenkova, E. Schubert, I. Assent, M. E. Houle.
Source shapes were randomly generated from a mesh model of a human hip (Fig. 1A), misaligned by [15, 30] mm/degrees (Experiment 5A) or [30, 60] mm/degrees (Experiment 5B), and registered back to a point-cloud representation of the mesh. The test cases represent the different noise models used to generate noise on the source shape (Table 4). Outliers were added to the source shape constituting 5% (-i), 10% (-ii), 20% (-iii), and 30% (-iv) of the source points. For each test case, 300 randomized trials were conducted, with the percentage of unsuccessful registrations (TRE > 10 mm) shown in the table. The proposed IMLP algorithm was evaluated relative to standard ICP [1], GICP [11], a robust variant of ICP [4], and CPD [20].
Registration failure rates for registering a point-cloud target shape with outliers (Experiment 5).
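For reference, the failure criterion used in the table reduces to a short computation. The sketch below assumes the usual definition of target registration error (TRE) for a rigid transform, evaluated at chosen target points; the paper's exact target selection may differ.

```python
import numpy as np

def tre(targets, R_est, t_est, R_true, t_true):
    """Target registration error: distance between target points mapped by
    the estimated rigid transform and by the ground-truth transform."""
    return np.linalg.norm((targets @ R_est.T + t_est)
                          - (targets @ R_true.T + t_true), axis=1)

def failure_rate(tre_per_trial, threshold_mm=10.0):
    """Percent of trials whose maximum TRE exceeds the success threshold."""
    return 100.0 * np.mean([np.max(t) > threshold_mm for t in tre_per_trial])
```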
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The discrete data structure and large sequencing depth of RNA sequencing (RNA-seq) experiments can often generate outlier read counts in one or more RNA samples within a homogeneous group. Thus, how to identify and manage outlier observations in RNA-seq data is an emerging topic of interest. One of the main objectives in these research efforts is to develop statistical methodology that effectively balances the impact of outlier observations and achieves maximal power for statistical testing. To reach that goal, strengthening the accuracy of outlier detection is an important precursor. Current outlier detection algorithms for RNA-seq data are executed within a testing framework and may be sensitive to sparse data and heavy-tailed distributions. Therefore, we propose a univariate algorithm that utilizes a probabilistic approach to measure the deviation between an observation and the distribution generating the remaining data, and implement it within an iterative leave-one-out design strategy. Analyses of real and simulated RNA-seq data show that the proposed methodology has higher outlier detection rates for both non-normalized and normalized negative binomial distributed data.
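The following Python sketch illustrates the general leave-one-out idea with a method-of-moments negative binomial fit and a two-sided tail probability. The paper's probabilistic deviation measure differs in its details, so treat this as a generic baseline rather than the proposed algorithm.

```python
import numpy as np
from scipy import stats

def loo_nb_outliers(counts, alpha=0.001, max_iter=10):
    """Iterative leave-one-out screen for one gene's count vector: fit a
    negative binomial to all-but-one observation (method of moments) and
    flag the held-out count if its two-sided tail probability falls below
    `alpha`. Repeats until no new outlier is found."""
    counts = np.asarray(counts, dtype=float)
    idx = np.arange(len(counts))
    flagged = np.zeros(len(counts), dtype=bool)
    for _ in range(max_iter):
        new = False
        for i in idx[~flagged]:
            rest = counts[~flagged & (idx != i)]
            if len(rest) < 3:
                break
            m, v = rest.mean(), rest.var(ddof=1)
            if v <= m:                 # not overdispersed: Poisson fallback
                p_lo = stats.poisson.cdf(counts[i], m)
                p_hi = stats.poisson.sf(counts[i] - 1, m)
            else:
                r = m * m / (v - m)    # NB size so that mean=m, var=v
                p = r / (r + m)
                p_lo = stats.nbinom.cdf(counts[i], r, p)
                p_hi = stats.nbinom.sf(counts[i] - 1, r, p)
            if 2 * min(p_lo, p_hi) < alpha:
                flagged[i] = True
                new = True
        if not new:
            break
    return flagged

print(loo_nb_outliers([12, 9, 15, 11, 10, 14, 180]))  # flags the 180
```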
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Model Access Outlier Detection market size reached USD 1.32 billion in 2024, driven by the increasing need for advanced anomaly detection in digital infrastructure. The market is projected to grow at a CAGR of 14.8% from 2025 to 2033, reaching an estimated USD 4.15 billion by 2033. This robust growth is fueled by the rising adoption of AI-based security solutions, the proliferation of complex data environments, and the urgent demand for real-time threat detection across critical industries.
The primary growth factor for the Model Access Outlier Detection market is the exponential increase in cyber threats and sophisticated attacks targeting enterprise data and networks. As organizations digitize operations, they generate vast volumes of data, making traditional rule-based security approaches inadequate. Outlier detection solutions leverage machine learning and artificial intelligence to identify unusual patterns and potential threats in real time, significantly reducing response times and minimizing the risk of data breaches. The integration of these technologies into existing security frameworks is becoming a necessity, especially in highly regulated sectors such as banking, healthcare, and government, where data integrity and privacy are paramount.
Another significant driver propelling the market is the rapid adoption of cloud computing and the proliferation of IoT devices. As businesses migrate workloads to the cloud and deploy interconnected devices, the attack surface expands, necessitating advanced outlier detection mechanisms. Cloud-based solutions offer scalability, flexibility, and centralized monitoring, making them particularly attractive for organizations with distributed operations. Furthermore, the shift towards remote work and digital collaboration has increased the demand for real-time monitoring and anomaly detection to safeguard sensitive data and ensure business continuity. The continuous evolution of AI algorithms and the availability of big data analytics further enhance the accuracy and efficiency of outlier detection systems, contributing to sustained market growth.
The growing emphasis on regulatory compliance and data protection standards worldwide is also catalyzing the adoption of Model Access Outlier Detection solutions. Stringent regulations such as GDPR, HIPAA, and PCI DSS require organizations to implement robust security measures and continuously monitor access to critical systems. Outlier detection tools play a vital role in meeting these compliance requirements by providing automated alerts, detailed audit trails, and actionable insights into suspicious activities. As regulatory landscapes become more complex, organizations are investing in advanced detection technologies not only to avoid penalties but also to build trust with customers and stakeholders.
From a regional perspective, North America currently dominates the Model Access Outlier Detection market, accounting for the largest revenue share in 2024, followed by Europe and Asia Pacific. The presence of leading technology vendors, high cybersecurity awareness, and significant investments in digital infrastructure contribute to North America’s leadership. Europe is experiencing steady growth due to stringent data protection regulations and the increasing adoption of cloud-based security solutions. Meanwhile, the Asia Pacific region is poised for the fastest growth, driven by rapid digital transformation, expanding IT ecosystems, and rising incidences of cyber threats in emerging economies. The market’s global expansion is further supported by ongoing technological advancements and the increasing integration of AI and machine learning in security operations.
The Component segment of the Model Access Outlier Detection market is broadly categorized into Software and Services. Software solutions are at the core of this market, comprising advanced analytics platforms, AI-driven detection engines, and customizable dashboards. These software offerings are designed to seamlessly integrate with existing IT infrastructure, providing organizations with the capability to monitor access patterns, identify anomalies, and generate real-time alerts. The sophistication of these tools lies in their ability to adapt to evolving threat landscapes, utilizing machine learning algorithms to
By Carl V. Lewis [source]
Historical Weather Outliers in the United States, 1964-2013: This dataset contains historical weather outliers in the United States from 1964 to 2013. The data includes the reporting station ID, name, minimum/maximum temperature, and the geographic coordinates of the recorded weather. The original weather data was collected from NOAA.
Each entry in this dataset represents a report from a weather station with high or low temperatures that were historical outliers within that month, averaged over time. This table's columns contain data that was collected from NOAA as well as data that was calculated using Enigma's assortment of weather data. The direct source of the information is identified in the description of the column.
Columns: date_str, degrees_from_mean, longitude, latitude, max_temp, min_temp, station_name, type
This dataset contains historical weather outliers in the United States from 1964 to 2013. The data includes the station ID, name, minimum and maximum temperatures, as well as degree coordinates of the recorded weather.
To use this dataset, simply download it and open it in a text editor or spreadsheet program. The data is organized by columns, with each column representing a different piece of information. Here is a brief explanation of each column:
- date_str: The date of the weather report.
- degrees_from_mean: The number of degrees that the temperature was above or below the historical mean for that month.
- longitude: The longitude of the weather station.
- latitude: The latitude of the weather station.
- max_temp: The maximum temperature reported by the weather station.
- min_temp: The minimum temperature reported by the weather station.
- station_name: The name of the weather station.
- type: The type of outlier, either high or low.
- Plotting the locations of outliers on a map of the US (see the sketch after this list)
- Identifying weather patterns associated with outliers
- Determining which areas of the US are most vulnerable to extreme weather events
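A minimal plotting sketch for the first idea, using the column names documented below; the outlier-type categories are read from the file itself rather than assumed.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("weather-anomalies-1964-2013.csv")

# One scatter layer per outlier type, at the stations' coordinates
fig, ax = plt.subplots(figsize=(9, 6))
for kind, sub in df.groupby("type"):
    ax.scatter(sub["longitude"], sub["latitude"], s=2, alpha=0.3, label=kind)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.legend(markerscale=5)
ax.set_title("Historical weather outliers, 1964-2013")
plt.show()
```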
This dataset was originally published by Enigma.io Analysis.
License
Unknown License - Please check the dataset description for more information.
File: weather-anomalies-1964-2013.csv

| Column name | Description |
|:---|:---|
| date_str | The date of the weather anomaly. (Date) |
| degrees_from_mean | The number of degrees that the temperature was above or below the monthly mean temperature. (Float) |
| longitude | The longitude of the weather station where the anomaly was recorded. (Float) |
| latitude | The latitude of the weather station where the anomaly was recorded. (Float) |
| max_temp | The maximum temperature recorded at the weather station on the date of the anomaly. (Float) |
| min_temp | The minimum temperature recorded at the weather station on the date of the anomaly. (Float) |
| station_name | The name of the weather station where the anomaly was recorded. (String) |
| type | The type of anomaly, either high or low temperature. (String) |
If you use this dataset in your research, please credit Carl V. Lewis.
The Method of Uncertainty Minimization using Polynomial Chaos Expansions (MUM-PCE) was developed as a software tool to constrain physical models against experimental measurements. These models contain parameters that cannot be easily determined from first principles and so must be measured, some of which cannot even be easily measured. In such cases, the models are validated and tuned against a set of global experiments that may depend on the underlying physical parameters in a complex way. The measurement uncertainty then propagates into the uncertainty of the parameter values.
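In the linearized Gaussian setting, the propagation of measurement uncertainty into parameter uncertainty is standard weighted least squares. The sketch below uses illustrative numbers and is not MUM-PCE's actual interface.

```python
import numpy as np

# Constrain model parameters x against measurements y with a linearized
# model y ~ J x: weighted least squares, where the measurement covariance
# propagates into the parameter covariance. All values are illustrative.
J = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [1.0, 1.0]])            # sensitivities dy_i/dx_j
y = np.array([1.1, 0.9, 2.05])        # measured responses
sigma = np.array([0.05, 0.10, 0.08])  # measurement uncertainties (1-sigma)

W = np.diag(1.0 / sigma**2)
cov_x = np.linalg.inv(J.T @ W @ J)    # parameter covariance from noise
x_hat = cov_x @ J.T @ W @ y           # constrained parameter estimates
print(x_hat, np.sqrt(np.diag(cov_x))) # estimates with 1-sigma uncertainties
```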
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Metrology Outlier Detection AI market size reached USD 1.18 billion in 2024, reflecting rapid adoption across high-precision industries. The market is expanding at a robust CAGR of 18.4% and is projected to attain a value of USD 5.53 billion by 2033. This impressive growth is primarily driven by the increasing demand for automated quality assurance and defect detection across manufacturing and high-tech sectors, as organizations strive to optimize processes and reduce costs while maintaining stringent accuracy standards.
One of the primary growth factors propelling the Metrology Outlier Detection AI market is the surge in demand for advanced quality control solutions in semiconductor manufacturing and electronics industries. As these sectors face mounting pressure to deliver flawless products with microscopic tolerances, traditional metrology tools are often insufficient for detecting subtle anomalies. The integration of AI-based outlier detection into metrology systems enables real-time identification of defects and process deviations, significantly improving yield rates and reducing waste. Furthermore, the proliferation of smart factories and Industry 4.0 initiatives is compelling manufacturers to adopt intelligent metrology solutions that leverage machine learning algorithms, computer vision, and big data analytics to drive continuous process improvements and predictive maintenance.
Another crucial driver is the increasing complexity of products in automotive, aerospace, and healthcare sectors. Modern vehicles, aircraft, and medical devices involve intricate assemblies and rely on components manufactured to exacting specifications. Even minor deviations can result in significant safety, performance, or regulatory issues. AI-powered metrology outlier detection systems provide a scalable and adaptive approach to monitoring production quality, detecting anomalies that might escape conventional inspection techniques. This capability not only ensures compliance with international standards but also enhances brand reputation and customer trust. The rising adoption of digital twins and simulation-driven design further amplifies the need for robust AI-driven metrology, as organizations seek to bridge the gap between virtual models and physical outcomes.
The market is also benefiting from advancements in sensor technologies, edge computing, and cloud-based analytics platforms. These innovations enable seamless integration of AI-driven outlier detection into existing manufacturing and quality control workflows, facilitating real-time data acquisition, processing, and visualization. The availability of scalable cloud infrastructure allows enterprises of all sizes to leverage sophisticated AI models without incurring prohibitive upfront costs. Additionally, partnerships between AI solution providers and metrology equipment manufacturers are accelerating the development of turnkey systems tailored to specific industry requirements. As a result, the barrier to entry for implementing AI in metrology is rapidly diminishing, fueling widespread adoption across both established players and emerging entrants in the market.
From a regional perspective, Asia Pacific remains the dominant force in the Metrology Outlier Detection AI market, accounting for the largest share in 2024. This is attributed to the region's strong presence in semiconductor manufacturing, electronics, and automotive industries, particularly in countries such as China, Japan, South Korea, and Taiwan. North America and Europe are also witnessing significant growth, driven by technological advancements, robust R&D ecosystems, and stringent quality regulations in aerospace and healthcare. Meanwhile, the Middle East & Africa and Latin America are gradually emerging as promising markets, supported by increasing investments in industrial automation and quality infrastructure. The interplay of regional dynamics, industry-specific challenges, and evolving regulatory landscapes will continue to shape the trajectory of the global market over the coming years.
The Metrology Outlier Detection AI market by component is segmented into Software, Hardware, and Services, each playing a vital role in the overall ecosystem. The software segment dominates the market, accounting for the largest share in 2024. This is primarily due to the rapid advancement
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We develop a robust estimator—the hyperbolic tangent (tanh) estimator—for over dispersed multinomial regression models of count data. The tanh estimator provides accurate estimates and reliable inferences even when the specified model is not good for as much as half of the data. Seriously ill-fitted counts—outliers—are identified as part of the estimation. A Monte Carlo sampling experiment shows that the tanh estimator produces good results at practical sample sizes even when ten percent of the data are generated by a significantly different process. The experiment shows that, with contaminated data, estimation fails using four other estimators: the non-robust maximum likelihood estimator, the additive logistic model and two SUR models. Using the tanh estimator to analyze data from Florida for the 2000 presidential election matches well-known features of the election that the other four estimators fail to capture. In an analysis of data from the 1993 Polish parliamentary election, the tanh estimator gives sharper inferences than does a previously proposed hetero-skedastic SUR model.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.
The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking anomaly detection algorithms in multivariate time series [1].
[1] Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779-1797, 2022. doi:10.14778/3538598.3538602
About Solenix
Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.
Anomaly detection is the problem of identifying unusual patterns in data. This problem is relevant for a wide variety of applications in various domains such as fault and damage detection in manufacturing, fraud detection in finance and insurance, intrusion detection in cybersecurity, disease detection in medical diagnosis, or scientific discovery. Many of these applications involve increasingly complex data at large scale, for instance, large collections of images or text. The lack of effective solutions in such settings has sparked an interest in developing anomaly detection methods based on deep learning, which has enabled breakthroughs in other machine learning problems that involve large amounts of complex data. This thesis proposes Deep One-Class Learning, a deep learning approach to anomaly detection that is based on the one-class classification paradigm. One-class classification views anomaly detection from a classification perspective, aiming to learn a discriminative decision boundary that separates the normal from the anomalous data. In contrast to previous methods that rely on fixed (usually manually engineered) features, deep one-class learning expands the one-class classification approach with methods that learn (or transfer) data representations via suitable one-class learning objectives. The key idea underlying deep one-class learning is to learn a transformation (e.g., a deep neural network) in such a way that the normal data points are concentrated in feature space, causing anomalies to deviate from the concentrated region, thereby making them detectable. We introduce several deep one-class learning methods in this thesis that follow the above idea while integrating different assumptions about the data or a specific domain. These include semi-supervised variants that can incorporate labeled anomalies, for example, or specific methods for images and text that enable model interpretability and an explanation of anomalies. Moreover, we present a unifying view of anomaly detection methods that, in addition to one-class classification, also covers reconstruction methods as well as methods based on density estimation and generative modeling. For each of these main approaches, we identify connections between respective deep and "shallow" methods based on common underlying principles. Through multiple experiments and analyses, we demonstrate that deep one-class learning is useful for anomaly detection, especially on semantic detection tasks. Finally, we conclude this thesis by discussing limits of the proposed approach and outlining specific paths for future research.
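A minimal sketch of the core one-class objective described above (in the spirit of Deep SVDD, not the thesis code): learn a network that concentrates normal data around a fixed center, then score by distance to that center. The bias-free output layer and the fixed, data-initialized center are common guards against the trivial constant-feature solution.

```python
import torch
import torch.nn as nn

# Toy setup: 1000 "normal" training points in 20 dimensions.
torch.manual_seed(0)
X = torch.randn(1000, 20)

# Feature map; the final layer has no bias to help avoid collapse.
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 8, bias=False))
with torch.no_grad():
    c = net(X).mean(dim=0)                       # fixed center from initial features

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for epoch in range(100):
    loss = ((net(X) - c) ** 2).sum(dim=1).mean() # pull normal data toward c
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    scores = ((net(X) - c) ** 2).sum(dim=1)      # distance to c = anomaly score
```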
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Measurement Configuration Dataset
This is the anonymous reviewing version; the source code repository will be added after the review.
This dataset provides reproduction data for performance measurement configuration at the source code level in Java. The measurement data can be reproduced using the precision-experiments repository at https://anonymous.4open.science/r/precision-experiments-C613/ (Examining Different Repetition Counts). The data contained here are the data we obtained from execution on an i7-4770 CPU @ 3.40GHz.
The analysis was tested on Ubuntu 20.04 and gnuplot 5.2.8. It will not work with older gnuplot versions.
To execute the analysis, extract the data by
tar -xvf basic-parameter-comparison.tar
tar -xvf parallel-sequential-comparison.tar
and afterwards build the precision-experiments repo and execute the analysis by
cd precision-experiments/precision-analysis/
../gradlew fatJar
cd scripts/configuration-analysis/
./executeCompleteAnalysis.sh ../../../../basic-parameter-comparison ../../../../parallel-sequential-comparison
Afterwards, the following files will be present:
precision-experiments/precision-analysis/scripts/configuration-analysis/repetitionHeatmaps/heatmap_all_en.pdf (Heatmaps for different repetition counts)
precision-experiments/precision-analysis/scripts/configuration-analysis/repetitionHeatmaps/heatmap_outlierRemoval_en.pdf (Heatmap with and without outlier removal for 1000 repetitions)
precision-experiments/precision-analysis/scripts/configuration-analysis/histogram_outliers_en.pdf (Histogram of the outliers)
precision-experiments/precision-analysis/scripts/configuration-analysis/heatmap_parallel_en.pdf (Heatmap with sequential and parallel execution)
https://www.technavio.com/content/privacy-notice
Anomaly Detection Market Size 2025-2029
The anomaly detection market is forecast to grow by USD 4.44 billion at a CAGR of 14.4% from 2024 to 2029. Anomaly detection tools gaining traction in BFSI will drive the market.
Major Market Trends & Insights
North America dominated the market and accounted for 43% of its growth during the forecast period.
By Deployment - Cloud segment was valued at USD 1.75 billion in 2023
By Component - Solution segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 173.26 million
Market Future Opportunities: USD 4,441.70 million
CAGR from 2024 to 2029: 14.4%
Market Summary
Anomaly detection, a critical component of advanced analytics, is witnessing significant adoption across various industries, with the financial services sector leading the charge. The increasing incidence of internal threats and cybersecurity frauds necessitates the need for robust anomaly detection solutions. These tools help organizations identify unusual patterns and deviations from normal behavior, enabling proactive response to potential threats and ensuring operational efficiency. For instance, in a supply chain context, anomaly detection can help identify discrepancies in inventory levels or delivery schedules, leading to cost savings and improved customer satisfaction. In the realm of compliance, anomaly detection can assist in maintaining regulatory adherence by flagging unusual transactions or activities, thereby reducing the risk of penalties and reputational damage.
According to recent research, organizations that implement anomaly detection solutions experience a reduction in error rates by up to 25%. This improvement not only enhances operational efficiency but also contributes to increased customer trust and satisfaction. Despite these benefits, challenges persist, including data quality and the need for real-time processing capabilities. As the market continues to evolve, advancements in machine learning and artificial intelligence are expected to address these challenges and drive further growth.
What will be the Size of the Anomaly Detection Market during the forecast period?
Get Key Insights on Market Forecast (PDF) Request Free Sample
How is the Anomaly Detection Market Segmented?
The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment
Cloud
On-premises
Component
Solution
Services
End-user
BFSI
IT and telecom
Retail and e-commerce
Manufacturing
Others
Technology
Big data analytics
AI and ML
Data mining and business intelligence
Geography
North America
US
Canada
Mexico
Europe
France
Germany
Spain
UK
APAC
China
India
Japan
Rest of World (ROW)
By Deployment Insights
The cloud segment is estimated to witness significant growth during the forecast period.
The market is witnessing significant growth, driven by the increasing adoption of advanced technologies such as machine learning algorithms, predictive modeling tools, and real-time monitoring systems. Businesses are increasingly relying on anomaly detection solutions to enhance their root cause analysis, improve system health indicators, and reduce false positives. This is particularly true in sectors where data is generated in real-time, such as cybersecurity threat detection, network intrusion detection, and fraud detection systems. Cloud-based anomaly detection solutions are gaining popularity due to their flexibility, scalability, and cost-effectiveness.
This growth is attributed to cloud-based solutions' quick deployment, real-time data visibility, and customization capabilities, which are offered at flexible payment options like monthly subscriptions and pay-as-you-go models. Companies like Anodot, Ltd, Cisco Systems Inc, IBM Corp, and SAS Institute Inc provide both cloud-based and on-premise anomaly detection solutions. Anomaly detection methods include outlier detection, change point detection, and statistical process control. Data preprocessing steps, such as data mining techniques and feature engineering processes, are crucial in ensuring accurate anomaly detection. Data visualization dashboards and alert fatigue mitigation techniques help in managing and interpreting the vast amounts of data generated.
Network traffic analysis, log file analysis, and sensor data integration are essential components of anomaly detection systems. Additionally, risk management frameworks, drift detection algorithms, time series forecasting, and performance degradation detection are vital in maintaining system performance and capacity planning.