85 datasets found

Data from: Error and anomaly detection for intra-participant time-series...
tandf.figshare.com
xlsx
Updated Jun 1, 2023
Cite
David R. Mullineaux; Gareth Irwin (2023). Error and anomaly detection for intra-participant time-series data [Dataset]. http://doi.org/10.6084/m9.figshare.5189002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.5189002
Dataset updated
Jun 1, 2023
Dataset provided by
Taylor & Francis
Authors
David R. Mullineaux; Gareth Irwin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Identification of errors or anomalous values, collectively considered outliers, assists in exploring data or, through removing outliers, improves statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of entire cycles, although exploring fewer points using a ‘moving-window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected through two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial–temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
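As a rough illustration of the stage 1 idea described above, the following Python sketch flags cycles using the median absolute deviation at each time point. It is not the authors' supplied Matlab code: the function name `mad_outlier_cycles`, the fixed cut-off `k`, and the sample data are assumptions made for this example, whereas the description says the scaling comes from significance levels of the t-statistic.

```python
from statistics import median

def mad_outlier_cycles(cycles, k=3.0):
    """Return indices of cycles flagged at one or more time points.

    cycles: list of equal-length lists, one list per cycle.
    k: hypothetical cut-off in robust standard deviations (an assumption).
    """
    n_points = len(cycles[0])
    flagged = set()
    for t in range(n_points):
        column = [cycle[t] for cycle in cycles]
        med = median(column)
        mad = median(abs(v - med) for v in column)
        if mad == 0:
            continue  # no spread at this time point, so nothing to scale by
        # 1.4826 makes the MAD comparable to a standard deviation for normal data
        limit = k * 1.4826 * mad
        for i, v in enumerate(column):
            if abs(v - med) > limit:
                flagged.add(i)
    return sorted(flagged)

# Made-up data: five cycles of four time points, the last one with a spike
cycles = [
    [0.0, 1.0, 2.0, 1.0],
    [0.1, 1.1, 2.1, 0.9],
    [-0.1, 0.9, 1.9, 1.1],
    [0.05, 1.05, 2.05, 1.0],
    [0.0, 5.0, 2.0, 1.0],
]
print(mad_outlier_cycles(cycles))
```

A flagged cycle would still need to be inspected before removal, in line with the description's advice to assess outliers before deciding whether to retain them.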
Outlier Set Two-step Method (OSTI)
orda.shef.ac.uk
application/x-rar
Updated Jul 1, 2025
Cite
Amal Sarfraz; Abigail Birnbaum; Flannery Dolan; Jonathan Lamontagne; Lyudmila Mihaylova; Charles Rouge (2025). Outlier Set Two-step Method (OSTI) [Dataset]. http://doi.org/10.15131/shef.data.28227974.v3
Explore at:
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.15131/shef.data.28227974.v3
Dataset updated
Jul 1, 2025
Dataset provided by
The University of Sheffield
Authors
Amal Sarfraz; Abigail Birnbaum; Flannery Dolan; Jonathan Lamontagne; Lyudmila Mihaylova; Charles Rouge
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These files are supplements to the paper titled 'A Robust Two-step Method for Detection of Outlier Sets'. This paper identifies and addresses the need for a robust method that identifies sets of points that collectively deviate from typical patterns in a dataset, which it calls "outlier sets", while excluding individual points from detection. This new methodology, Outlier Set Two-step Identification (OSTI), employs a two-step approach to detect and label these outlier sets. First, it uses Gaussian Mixture Models for probabilistic clustering, identifying candidate outlier sets based on cluster weights below a predetermined threshold. Second, OSTI measures the inter-cluster Mahalanobis distance between each candidate outlier set's centroid and the overall dataset mean. OSTI then tests the null hypothesis that this distance does not significantly differ from its theoretical chi-square distribution, enabling the formal detection of outlier sets. We test OSTI systematically on 8,000 synthetic 2D datasets across various inlier configurations and thousands of possible outlier set characteristics. Results show OSTI robustly and consistently detects outlier sets with an average F1 score of 0.92 and an average purity (the degree to which outlier sets identified correspond to those generated synthetically, i.e., our ground truth) of 98.58%. We also compare OSTI with state-of-the-art outlier detection methods to illuminate how OSTI fills a gap as a tool for the exclusive detection of outlier sets.
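The second step described above can be sketched in a simplified form. The Python below computes the squared Mahalanobis distance of a candidate centroid from the mean of a 2D dataset and compares it with a chi-square distribution with 2 degrees of freedom, whose survival function has the closed form exp(-x/2). This is an illustration under assumptions, not the OSTI implementation: the Gaussian Mixture Model step is omitted, and the function names, the sample data, and the choice of reference distribution for a centroid are all hypothetical.

```python
import math

def mahalanobis_sq_2d(point, data):
    """Squared Mahalanobis distance of a 2D point from the mean of `data`."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix entries
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^T S^-1 d using the closed-form inverse of a 2x2 matrix
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def chi2_sf_2df(x):
    """Survival function of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2)

# Made-up inliers on a 5 x 5 grid centred on the origin
data = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]

for centroid in [(6, 6), (1, 0)]:
    d2 = mahalanobis_sq_2d(centroid, data)
    print(centroid, round(d2, 2), chi2_sf_2df(d2) < 0.01)
```

The 0.01 cut-off in the loop is an arbitrary choice for the example; the listing does not state the significance level used in the paper.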
Outlier Datasets - original
kaggle.com
zip
Updated Feb 5, 2021
Cite
Hai Vo (2021). Outlier Datasets - original [Dataset]. https://www.kaggle.com/hariwh0/outlier-detection-datasets
Explore at:
Available download formats: zip (1534928268 bytes)
Dataset updated
Feb 5, 2021
Authors
Hai Vo
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset

This dataset was created by Hai Vo

Released under Database: Open Database, Contents: Database Contents

Contents
outlier detection text reducing
kaggle.com
Updated Aug 7, 2025
Cite
Ali Mortezaie (2025). outlier detection text reducing [Dataset]. https://www.kaggle.com/datasets/alimortezaie/outlier-detection-text-reducing
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 7, 2025
Dataset provided by
Kaggle
Authors
Ali Mortezaie
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Ali Mortezaie

Released under Apache 2.0

Contents
Vision Based Building Energy Data Outlier Detection Dataset
universe.roboflow.com
zip
Updated Apr 3, 2024
Cite
energy data outlier detection (2024). Vision Based Building Energy Data Outlier Detection Dataset [Dataset]. https://universe.roboflow.com/energy-data-outlier-detection/vision-based-building-energy-data-outlier-detection/model/5
Explore at:
Available download formats: zip
Dataset updated
Apr 3, 2024
Dataset authored and provided by
energy data outlier detection
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
11785 Bounding Boxes
Description
Vision Based Building Energy Data Outlier Detection

## Overview
Vision Based Building Energy Data Outlier Detection is a dataset for object detection tasks. It contains 11785 annotations for 2,159 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Additional file 2 of Outlier identification and monitoring of institutional...
springernature.figshare.com
txt
Updated Jun 21, 2023
Cite
Menelaos Pavlou; Gareth Ambler; Rumana Z. Omar; Andrew T. Goodwin; Uday Trivedi; Peter Ludman; Mark de Belder (2023). Additional file 2 of Outlier identification and monitoring of institutional or clinician performance: an overview of statistical methods and application to national audit data [Dataset]. http://doi.org/10.6084/m9.figshare.22612465.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22612465.v1
Dataset updated
Jun 21, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Menelaos Pavlou; Gareth Ambler; Rumana Z. Omar; Andrew T. Goodwin; Uday Trivedi; Peter Ludman; Mark de Belder
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2.
Data from: Simultaneous Outlier Detection and Prediction for Kriging with...
tandf.figshare.com
zip
Updated May 30, 2025
Cite
Youjie Zeng; Zhanfeng Wang; Youngjo Lee; Niansheng Tang (2025). Simultaneous Outlier Detection and Prediction for Kriging with True Identification [Dataset]. http://doi.org/10.6084/m9.figshare.28715504.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.28715504.v1
Dataset updated
May 30, 2025
Dataset provided by
Taylor & Francis
Authors
Youjie Zeng; Zhanfeng Wang; Youngjo Lee; Niansheng Tang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Kriging with interpolation is widely used in various noise-free areas, such as computer experiments. However, owing to its Gaussian assumption, it is susceptible to outliers, which affects statistical inference, and the resulting conclusions could be misleading. Little work has explored outlier detection for kriging. Therefore, we propose a novel kriging method for simultaneous outlier detection and prediction by introducing a normal-gamma prior, which results in an unbounded penalty on the biases to distinguish outliers from normal data points. We develop a simple and efficient method, avoiding the expensive computation of the Markov chain Monte Carlo algorithm, to simultaneously detect outliers and make a prediction. We establish the true identification property for outlier detection and the consistency of the estimated hyperparameters in kriging under the increasing domain framework as if the number and locations of the outliers were known in advance. Under appropriate regularity conditions, we demonstrate information consistency for prediction in the presence of outliers. Numerical studies and real data examples show that the proposed method generally provides robust analyses in the presence of outliers. Supplementary materials for this article are available online.
Multi-Domain Outlier Detection Dataset
zenodo.org
data.niaid.nih.gov
zip
Updated Mar 31, 2022
Cite
Hannah Kerner; Umaa Rebbapragada; Kiri Wagstaff; Steven Lu; Bryce Dubayah; Eric Huff; Raymond Francis; Jake Lee; Vinay Raman; Sakshum Kulshrestha (2022). Multi-Domain Outlier Detection Dataset [Dataset]. http://doi.org/10.5281/zenodo.6400786
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6400786
Dataset updated
Mar 31, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Hannah Kerner; Umaa Rebbapragada; Kiri Wagstaff; Steven Lu; Bryce Dubayah; Eric Huff; Raymond Francis; Jake Lee; Vinay Raman; Sakshum Kulshrestha
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Multi-Domain Outlier Detection Dataset contains datasets for conducting outlier detection experiments for four different application domains:

Astrophysics - detecting anomalous observations in the Dark Energy Survey (DES) catalog (data type: feature vectors)

Planetary science - selecting novel geologic targets for follow-up observation onboard the Mars Science Laboratory (MSL) rover (data type: grayscale images)

Earth science - detecting anomalous samples in satellite time series corresponding to ground-truth observations of maize crops (data type: time series/feature vectors)

Fashion-MNIST/MNIST - benchmark task to detect anomalous MNIST images among Fashion-MNIST images (data type: grayscale images)

Each dataset contains a "fit" dataset (used for fitting or training outlier detection models), a "score" dataset (used for scoring samples used to evaluate model performance, analogous to test set), and a label dataset (indicates whether samples in the score dataset are considered outliers or not in the domain of each dataset).

To read more about the datasets and how they are used for outlier detection, or to cite this dataset in your own work, please see the following citation:

Kerner, H. R., Rebbapragada, U., Wagstaff, K. L., Lu, S., Dubayah, B., Huff, E., Lee, J., Raman, V., and Kulshrestha, S. (2022). Domain-agnostic Outlier Ranking Algorithms (DORA)-A Configurable Pipeline for Facilitating Outlier Detection in Scientific Datasets. Under review for Frontiers in Astronomy and Space Sciences.
Data from: Multivariate Outliers and the O3 Plot
tandf.figshare.com
txt
Updated May 31, 2023
Cite
Antony Unwin (2023). Multivariate Outliers and the O3 Plot [Dataset]. http://doi.org/10.6084/m9.figshare.7792115.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.7792115.v1
Dataset updated
May 31, 2023
Dataset provided by
Taylor & Francis
Authors
Antony Unwin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Identifying and dealing with outliers is an important part of data analysis. A new visualization, the O3 plot, is introduced to aid in the display and understanding of patterns of multivariate outliers. It uses the results of identifying outliers for every possible combination of dataset variables to provide insight into why particular cases are outliers. The O3 plot can be used to compare the results from up to six different outlier identification methods. There is an R package, OutliersO3, implementing the plot. The article is illustrated with outlier analyses of German demographic and economic data. Supplementary materials for this article are available online.
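The idea of identifying outliers for every possible combination of variables can be sketched in a few lines of Python. This is only an illustration of the enumeration, not the OutliersO3 package: the function names, the sample table, and the flagging rule (Euclidean norm of robust z-scores above a threshold) are stand-in assumptions chosen for brevity.

```python
from itertools import combinations
from statistics import median

def robust_z(values):
    """Median/MAD z-scores; falls back to a unit scale when the MAD is zero (an assumption)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(v - med) / scale for v in values]

def outliers_by_combination(table, threshold=3.0):
    """Map each non-empty combination of column names to the row indices flagged in it."""
    names = sorted(table)
    z = {name: robust_z(table[name]) for name in names}
    n_rows = len(z[names[0]])
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            result[combo] = [
                i for i in range(n_rows)
                if sum(z[name][i] ** 2 for name in combo) ** 0.5 > threshold
            ]
    return result

# Made-up table with two variables and six cases
table = {"a": [1, 2, 3, 2, 2, 10], "b": [5, 5, 6, 4, 5, 5]}
result = outliers_by_combination(table)
for combo, rows in result.items():
    print(combo, rows)
```

With p variables there are 2^p - 1 non-empty combinations, so this kind of exhaustive enumeration is only practical for a modest number of variables.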
Introduction to Outlier
kaggle.com
zip
Updated Jul 10, 2021
Cite
Omsingh Bais (2021). Introduction to Outlier [Dataset]. https://www.kaggle.com/datasets/ombais/introduction-to-outlier
Explore at:
Available download formats: zip (5672 bytes)
Dataset updated
Jul 10, 2021
Authors
Omsingh Bais
Description
Dataset

This dataset was created by Omsingh Bais

Contents
AI Histology QC Outlier Detection Tool Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Aug 4, 2025
Cite
The citation is currently not available for this dataset.
Explore at:
Available download formats: pptx, csv, pdf
Dataset updated
Aug 4, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
AI Histology QC Outlier Detection Tool Market Outlook

According to our latest research, the global AI Histology QC Outlier Detection Tool market size reached USD 412 million in 2024, with a robust compound annual growth rate (CAGR) of 18.7% observed over the past year. The market’s expansion is primarily driven by the increasing adoption of artificial intelligence in digital pathology and the rising demand for high-precision quality control in histological workflows. By 2033, the market is forecasted to reach USD 1.97 billion, reflecting the accelerating integration of AI-powered QC outlier detection tools across clinical and research environments worldwide.
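The growth figures quoted above can be checked with the standard compound-growth formula. The Python sketch below takes the 2024 and 2033 values from the text and computes the constant annual rate that would connect them, then projects the 2024 value forward at the quoted 18.7%. The function names are illustrative, and it is an assumption that the quoted rate is meant to apply across the 2024 to 2033 horizon; the text itself attributes it to the past year.

```python
def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1

def project(start_value, rate, years):
    """Value after compounding start_value at `rate` for `years` years."""
    return start_value * (1 + rate) ** years

start_musd = 412.0    # 2024 market size from the text, in USD million
end_musd = 1970.0     # 2033 forecast from the text, in USD million
years = 2033 - 2024

rate = implied_cagr(start_musd, end_musd, years)
print(f"implied CAGR 2024-2033: {rate:.1%}")
print(f"2033 value at 18.7%: {project(start_musd, 0.187, years):.0f} USD million")
```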

The surge in demand for AI Histology QC Outlier Detection Tools is primarily attributed to the pressing need for accuracy and consistency in histopathological diagnostics. Traditional quality control processes in histology are labor-intensive and prone to human error, which can result in diagnostic discrepancies and impact patient outcomes. The deployment of advanced AI-driven QC outlier detection tools addresses these challenges by automating the identification of anomalies and artifacts in histological slides, ensuring standardized results and significantly reducing turnaround times. Moreover, the integration of machine learning algorithms enables these systems to continuously improve their detection capabilities, further enhancing diagnostic reliability and supporting the growing trend towards digitization in pathology laboratories.

Another significant growth driver for the AI Histology QC Outlier Detection Tool market is the increasing prevalence of cancer and other chronic diseases that require histopathological examination for diagnosis and treatment planning. The rising global cancer burden, coupled with the shortage of skilled pathologists, is pushing healthcare providers to adopt AI-powered solutions that can streamline workflow efficiency and mitigate diagnostic bottlenecks. These tools not only facilitate faster and more accurate detection of outliers in tissue samples but also support pathologists in prioritizing cases that require immediate attention. As a result, healthcare institutions are investing heavily in AI-based QC solutions to optimize resource utilization, improve patient care, and comply with stringent regulatory standards for laboratory quality assurance.

Technological advancements and strategic collaborations between AI developers, pathology labs, and healthcare providers are further accelerating market growth. The ongoing development of sophisticated image analysis algorithms, cloud-based platforms, and interoperability standards is enabling seamless integration of AI QC tools into existing laboratory information systems. Additionally, government initiatives aimed at promoting digital health transformation and funding for AI research in medical diagnostics are creating a favorable environment for market expansion. The proliferation of digital pathology infrastructure, particularly in developed regions, is expected to drive the adoption of AI QC outlier detection tools, while emerging markets are witnessing growing interest as healthcare systems modernize and invest in advanced diagnostic technologies.

From a regional perspective, North America currently dominates the AI Histology QC Outlier Detection Tool market, accounting for a significant share of global revenues in 2024. The region’s leadership is underpinned by a well-established healthcare infrastructure, high adoption rates of digital pathology, and strong presence of leading AI technology providers. Europe follows closely, supported by robust investments in healthcare innovation and a proactive regulatory landscape. Meanwhile, the Asia Pacific region is poised for the fastest growth over the forecast period, driven by increasing healthcare expenditure, expanding cancer screening programs, and rising awareness of the benefits of AI-powered diagnostic solutions. Latin America and the Middle East & Africa are also expected to witness steady growth as digital transformation initiatives gain momentum in these regions.
Outliers and similarity in APOGEE - Dataset - B2FIND
b2find.eudat.eu
Updated Nov 2, 2017
Cite
(2017). Outliers and similarity in APOGEE - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/b624b506-541b-5a09-b615-14b8e202c468
Explore at:
Dataset updated
Nov 2, 2017
Description
In this work we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the dataset, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the dataset for objects allows us to find objects that are impossible to find using their best fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the dataset, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data. Cone search capability for table J/MNRAS/476/2117/apogeenn (Nearest neighbors APOGEE IDs)
Image outlier dataset
kaggle.com
Updated Jul 6, 2021
Cite
imadkhan9691 (2021). Image outlier dataset [Dataset]. https://www.kaggle.com/datasets/imadkhan9691/image-outlier-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 6, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
imadkhan9691
Description
Dataset

This dataset was created by imadkhan9691

Contents
Additional file 1 of Unsupervised outlier detection applied to SARS-CoV-2...
figshare.com
zip
Updated Aug 13, 2024
Cite
Georg Hahn; Sanghun Lee; Dmitry Prokopenko; Jonathan Abraham; Tanya Novak; Julian Hecker; Michael Cho; Surender Khurana; Lindsey R. Baden; Adrienne G. Randolph; Scott T. Weiss; Christoph Lange (2024). Additional file 1 of Unsupervised outlier detection applied to SARS-CoV-2 nucleotide sequences can identify sequences of common variants and other variants of interest [Dataset]. http://doi.org/10.6084/m9.figshare.26555624.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.26555624.v1
Dataset updated
Aug 13, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Georg Hahn; Sanghun Lee; Dmitry Prokopenko; Jonathan Abraham; Tanya Novak; Julian Hecker; Michael Cho; Surender Khurana; Lindsey R. Baden; Adrienne G. Randolph; Scott T. Weiss; Christoph Lange
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 1. Lists of GISAID IDs for the two reference datasets (simulating the time before the emergence of a new variant and the onset of a new variant) for each variant under consideration in the article (alpha, beta, delta, gamma, GH, lambda, mu, omicron).
ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...
data.niaid.nih.gov
elki-project.github.io
Updated May 2, 2024
Cite
Zimek, Arthur (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6355683
Explore at:
Dataset updated
May 2, 2024
Dataset provided by
Zimek, Arthur
Schubert, Erich
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data sets were originally created for the following publications:

M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek. Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

H.-P. Kriegel, E. Schubert, A. Zimek. Evaluation of Multiple Clustering Solutions. In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.

The outlier data set versions were introduced in:

E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel. On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

They are derived from the original image data available at https://aloi.science.uva.nl/

The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, The Amsterdam library of object images, Int. J. Comput. Vision, 61(1), 103-112, January, 2005

Additional information is available at: https://elki-project.github.io/datasets/multi_view

The following views are currently available:

Object number
  Description: Sparse 1000 dimensional vectors that give the true object assignment
  Files: objs.arff.gz

RGB color histograms
  Description: Standard RGB color histograms (uniform binning)
  Files: aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz

HSV color histograms
  Description: Standard HSV/HSB color histograms in various binnings
  Files: aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz

Color similarity
  Description: Average similarity to 77 reference colors (not histograms): 18 colors x 2 sat x 2 bri + 5 grey values (incl. white, black)
  Files: aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)

Haralick features
  Description: First 13 Haralick features (radius 1 pixel)
  Files: aloi-haralick-1.csv.gz

Front to back
  Description: Vectors representing front face vs. back faces of individual objects
  Files: front.arff.gz

Basic light
  Description: Vectors indicating basic light situations
  Files: light.arff.gz

Manual annotations
  Description: Manually annotated object groups of semantically related objects such as cups
  Files: manual1.arff.gz

Outlier Detection Versions

Additionally, we generated a number of subsets for outlier detection:

RGB Histograms
  Downsampled to 100000 objects (553 outliers)
    Files: aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
  Downsampled to 75000 objects (717 outliers)
    Files: aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
  Downsampled to 50000 objects (1508 outliers)
    Files: aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz
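The outlier subset file names listed above appear to encode the histogram dimensionality, the number of objects, and the total number of outliers. A small Python sketch can pull those numbers out of a name. The regular expression is inferred from the six names shown here and is not an official ELKI specification; the meaning of the `max` field is not stated in this listing, so it is returned as a raw number without interpretation.

```python
import re

# Hypothetical pattern inferred from the listed file names (an assumption)
SUBSET_NAME = re.compile(
    r"^aloi-(?P<dims>\d+)d-(?P<objects>\d+)-max(?P<max>\d+)-tot(?P<outliers>\d+)\.csv\.gz$"
)

def parse_subset_name(filename):
    """Return the numeric fields of an outlier subset file name, or None if it does not match."""
    match = SUBSET_NAME.match(filename)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}

info = parse_subset_name("aloi-27d-100000-max10-tot553.csv.gz")
print(info)
print(f"outlier share: {info['outliers'] / info['objects']:.3%}")
```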

Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

technavio.com

pdf

Updated Jun 12, 2025

Cite

Technavio (2025). Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Spain, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/anomaly-detection-market-industry-analysis

Explore at:

Available download formats: pdf

Dataset updated

Jun 12, 2025

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2025 - 2029

Area covered

United Kingdom, United States, Germany, Canada

Description

Anomaly Detection Market Size 2025-2029

The anomaly detection market size is forecast to increase by USD 4.44 billion at a CAGR of 14.4% between 2024 and 2029.

The market is experiencing significant growth, particularly in the BFSI sector, as organizations increasingly prioritize identifying and addressing unusual patterns or deviations from normal business operations. The rising incidence of internal threats and cyber frauds necessitates the implementation of advanced anomaly detection tools to mitigate potential risks and maintain security. However, implementing these solutions comes with challenges, primarily infrastructural requirements. Ensuring compatibility with existing systems, integrating new technologies, and training staff to effectively utilize these tools pose significant hurdles for organizations.
Despite these challenges, the potential benefits of anomaly detection, such as improved risk management, enhanced operational efficiency, and increased security, make it an essential investment for businesses seeking to stay competitive and agile in today's complex and evolving threat landscape. Companies looking to capitalize on this market opportunity must carefully consider these challenges and develop strategies to address them effectively. Cloud computing is a key trend in the market, as cloud-based solutions offer quick deployment, flexibility, and scalability.

What will be the Size of the Anomaly Detection Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
Request Free Sample

In the dynamic and evolving market, advanced technologies such as resource allocation, linear regression, pattern recognition, and support vector machines are increasingly being adopted for automated decision making. Businesses are leveraging these techniques to enhance customer experience through behavioral analytics, object detection, and sentiment analysis. Machine learning algorithms, including random forests, naive Bayes, decision trees, clustering algorithms, and k-nearest neighbors, are essential tools for risk management and compliance monitoring. AI-powered analytics, time series forecasting, and predictive modeling are revolutionizing business intelligence, while process optimization is achieved through the application of decision support systems, natural language processing, and predictive analytics.
Computer vision, image recognition, logistic regression, and operational efficiency are key areas where principal component analysis and artificial neural networks contribute significantly. Speech recognition and operational efficiency are also benefiting from these advanced technologies, enabling businesses to streamline processes and improve overall performance.

How is this Anomaly Detection Industry segmented?

The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Deployment

  Cloud
  On-premises


Component

  Solution
  Services


End-user

  BFSI
  IT and telecom
  Retail and e-commerce
  Manufacturing
  Others


Technology

  Big data analytics
  AI and ML
  Data mining and business intelligence


Geography

  North America

    US
    Canada
    Mexico


  Europe

    France
    Germany
    Spain
    UK


  APAC

    China
    India
    Japan


  Rest of World (ROW)

By Deployment Insights

The cloud segment is estimated to witness significant growth during the forecast period. The market is witnessing significant growth due to the increasing adoption of advanced technologies such as machine learning models, statistical methods, and real-time monitoring. These technologies enable the identification of anomalous behavior in real-time, thereby enhancing network security and data privacy. Anomaly detection algorithms, including unsupervised learning, reinforcement learning, and deep learning networks, are used to identify outliers and intrusions in large datasets. Data security is a major concern, leading to the adoption of data masking, data pseudonymization, data de-identification, and differential privacy.

Data leakage prevention and incident response are critical components of an effective anomaly detection system. False positive and false negative rates are essential metrics to evaluate the performance of these systems. Time series analysis and concept drift are important techniques used in anomaly detection. Data obfuscation, data suppression, and data aggregation are other strategies employed to maintain data privacy. Companies such as Anodot, Cisco Systems Inc, IBM Corp, and SAS Institute Inc offer both cloud-based and on-premises anomaly detection solutions. These soluti

GTI: A Novel Algorithm for Identifying Outlier Gene Expression Profiles from...
plos.figshare.com
pdf
Updated Jun 2, 2023
Cite
John Patrick Mpindi; Henri Sara; Saija Haapa-Paananen; Sami Kilpinen; Tommi Pisto; Elmar Bucher; Kalle Ojala; Kristiina Iljin; Paula Vainio; Mari Björkman; Santosh Gupta; Pekka Kohonen; Matthias Nees; Olli Kallioniemi (2023). GTI: A Novel Algorithm for Identifying Outlier Gene Expression Profiles from Integrated Microarray Datasets [Dataset]. http://doi.org/10.1371/journal.pone.0017259
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0017259
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
John Patrick Mpindi; Henri Sara; Saija Haapa-Paananen; Sami Kilpinen; Tommi Pisto; Elmar Bucher; Kalle Ojala; Kristiina Iljin; Paula Vainio; Mari Björkman; Santosh Gupta; Pekka Kohonen; Matthias Nees; Olli Kallioniemi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type (‘outlier genes’), a hallmark of potential oncogenes. Methodology: A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods, COPA, the OS statistic, the t-test and ORT, using simulated data. We demonstrated that the GTI performed equally well to existing methods in a single study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target. Conclusions/Significance: Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets.
The algorithm is implemented in an R package (Text S1).
Summary of each combination of outlier detection methods and robust...
plos.figshare.com
xls
Updated Dec 2, 2024
Nora F. Fino; Lesley A. Inker; Tom Greene; Ogechi M. Adingwupu; Josef Coresh; Jesse Seegmiller; Michael G. Shlipak; Tazeen H. Jafar; Roberto Kalil; Veronica T. Costa e Silva; Vilmundur Gudnason; Andrew S. Levey; Ben Haaland (2024). Summary of each combination of outlier detection methods and robust estimation approaches. [Dataset]. http://doi.org/10.1371/journal.pone.0313154.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0313154.t001
Dataset updated
Dec 2, 2024
Dataset provided by
PLOS ONE
Authors
Nora F. Fino; Lesley A. Inker; Tom Greene; Ogechi M. Adingwupu; Josef Coresh; Jesse Seegmiller; Michael G. Shlipak; Tazeen H. Jafar; Roberto Kalil; Veronica T. Costa e Silva; Vilmundur Gudnason; Andrew S. Levey; Ben Haaland
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We combined each outlier detection method with each estimation approach, giving nine different approaches for robust GFR estimation in new application data.
BOREALIS Power Analysis Code and Data
data.niaid.nih.gov
zenodo.org
Updated Nov 22, 2022
Klee, Eric W (2022). BOREALIS Power Analysis Code and Data [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7343135
Dataset updated
Nov 22, 2022
Dataset provided by
Oliver, Gavin R
Jenkinson, W Garrett
Klee, Eric W
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This contains the code and data necessary to rerun the power analysis used in testing BOREALIS.

Borealis is an R library performing outlier analysis for count-based bisulfite sequencing data. It detects outlier methylated CpG sites from bisulfite sequencing (BS-seq). The core of Borealis is modeling Beta-Binomial distributions. This can be useful for rare disease diagnoses.
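To make the Beta-Binomial idea above concrete, here is a minimal Python sketch of how an unusually methylated site could be scored. It is not Borealis's API or its fitting procedure: the function names, and the assumption that the shape parameters `alpha` and `beta` have already been estimated from a reference cohort, are hypothetical and introduced only for illustration.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k, n, alpha, beta):
    """P(K = k) for a Beta-Binomial with n trials and shapes alpha, beta."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(
        log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    )


def upper_tail_pvalue(k, n, alpha, beta):
    """P(K >= k): chance of seeing at least k methylated reads out of n.

    alpha and beta are assumed to describe the site's methylation in a
    reference cohort (hypothetical inputs for this sketch).
    """
    tail = sum(betabinom_pmf(i, n, alpha, beta) for i in range(k, n + 1))
    return min(1.0, tail)  # guard against tiny floating-point overshoot


if __name__ == "__main__":
    # Hypothetical site: the cohort is mostly unmethylated (alpha=1, beta=20),
    # but this sample shows 18 methylated reads out of 20.
    print(upper_tail_pvalue(18, 20, alpha=1.0, beta=20.0))
```

A small tail probability suggests the sample's read counts are hard to explain by the cohort's distribution at that site; in practice such scores would need correction for multiple testing across sites.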
Student Performances | Data set cleared of outlier
kaggle.com
Updated Oct 30, 2024
fehu.zone (2024). Student Performances | Data set cleared of outlier [Dataset]. https://www.kaggle.com/datasets/fehu94/student-performances-data-set-cleared-of-outlier/data
Dataset updated
Oct 30, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
fehu.zone
Description
Dataset

This dataset was created by fehu.zone

Released under Other (specified in description)

Contents

