85 datasets found
  1. Data from: Error and anomaly detection for intra-participant time-series...

    • tandf.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    David R. Mullineaux; Gareth Irwin (2023). Error and anomaly detection for intra-participant time-series data [Dataset]. http://doi.org/10.6084/m9.figshare.5189002
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    David R. Mullineaux; Gareth Irwin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identification of errors or anomalous values, collectively considered outliers, assists in exploring data and, when outliers are removed, improves statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of entire cycles, although exploring fewer points using a ‘moving window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected in two stages for strides (mean 38 cycles) from treadmill running. In stage 1, cycles were removed for one-dimensional (spatial) outliers at each time point using the median absolute deviation; in stage 2, cycles were removed for two-dimensional (spatial–temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers should be assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
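    The two stages described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' supplied Matlab code: the threshold multiplier `k`, the conventional 1.4826 MAD-to-SD factor, and the `half_width` reading of "window size" are assumptions, and the paper's t-statistic-based scaling is not reproduced. Cycles are assumed to be time-normalised to the same number of points. In the described method stage 2 runs only on cycles that survive stage 1; the usage lines call each stage on the same toy data just to show what each one flags.

```python
import statistics

MAD_TO_SD = 1.4826  # makes the MAD comparable to an SD for normally distributed data


def stage1_mad_outliers(cycles, k):
    """Flag cycles with any time point whose robust z-score exceeds k.

    cycles: list of equal-length sequences, one per cycle.
    k: hypothetical threshold multiplier (the paper derives its scaling from
       t-statistic significance levels, which this sketch does not reproduce).
    """
    flagged = set()
    for t in range(len(cycles[0])):
        column = [c[t] for c in cycles]  # every cycle at one time point
        med = statistics.median(column)
        mad = statistics.median(abs(x - med) for x in column)
        if mad == 0:
            continue  # no spread at this time point; skip to avoid dividing by zero
        for i, x in enumerate(column):
            if abs(x - med) / (MAD_TO_SD * mad) > k:
                flagged.add(i)
    return sorted(flagged)


def stage2_moving_window_outliers(cycles, k, half_width):
    """Flag cycles whose value at time t lies more than k SDs from the mean of a
    window pooled over time points t-half_width..t+half_width and all cycles.
    half_width is a hypothetical parameterisation of the window size."""
    n_points = len(cycles[0])
    flagged = set()
    for t in range(n_points):
        lo, hi = max(0, t - half_width), min(n_points, t + half_width + 1)
        pooled = [c[j] for c in cycles for j in range(lo, hi)]
        mean = statistics.fmean(pooled)
        sd = statistics.stdev(pooled)
        if sd == 0:
            continue
        for i, c in enumerate(cycles):
            if abs(c[t] - mean) > k * sd:
                flagged.add(i)
    return sorted(flagged)


# Toy data: ten similar cycles plus one (index 10) with a spike at t = 2
base = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
cycles = [[v + 0.1 * i for v in base] for i in range(10)]
cycles.append([0.0, 1.0, 9.0, 3.0, 2.0, 1.0])

print(stage1_mad_outliers(cycles, k=3.5))                          # → [10]
print(stage2_moving_window_outliers(cycles, k=2.5, half_width=0))  # → [10]
```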

  2. Outlier Set Two-step Method (OSTI)

    • orda.shef.ac.uk
    application/x-rar
    Updated Jul 1, 2025
    Cite
    Amal Sarfraz; Abigail Birnbaum; Flannery Dolan; Jonathan Lamontagne; Lyudmila Mihaylova; Charles Rouge (2025). Outlier Set Two-step Method (OSTI) [Dataset]. http://doi.org/10.15131/shef.data.28227974.v3
    Available download formats: application/x-rar
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    The University of Sheffield
    Authors
    Amal Sarfraz; Abigail Birnbaum; Flannery Dolan; Jonathan Lamontagne; Lyudmila Mihaylova; Charles Rouge
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These files are supplements to the paper titled 'A Robust Two-step Method for Detection of Outlier Sets'. This paper identifies and addresses the need for a robust method to detect sets of points that collectively deviate from typical patterns in a dataset, which it calls "outlier sets", while excluding individual points from detection. This new methodology, Outlier Set Two-step Identification (OSTI), employs a two-step approach to detect and label these outlier sets. First, it uses Gaussian Mixture Models for probabilistic clustering, identifying candidate outlier sets based on cluster weights below a predetermined threshold. Second, OSTI measures the inter-cluster Mahalanobis distance between each candidate outlier set's centroid and the overall dataset mean. OSTI then tests the null hypothesis that this distance does not significantly differ from its theoretical chi-square distribution, enabling the formal detection of outlier sets. We test OSTI systematically on 8,000 synthetic 2D datasets across various inlier configurations and thousands of possible outlier set characteristics. Results show that OSTI robustly and consistently detects outlier sets, with an average F1 score of 0.92 and an average purity (the degree to which the identified outlier sets correspond to those generated synthetically, i.e., our ground truth) of 98.58%. We also compare OSTI with state-of-the-art outlier detection methods to illuminate how OSTI fills a gap as a tool for the exclusive detection of outlier sets.
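    OSTI's second step can be illustrated for the 2D case. This is a minimal sketch, assuming the candidate set has already come out of the Gaussian-mixture step (not shown) and that the covariance is the sample covariance of the full dataset; the paper's exact covariance estimate and test statistic may differ, and the function names are hypothetical. For 2 degrees of freedom the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math
from statistics import fmean


def mean_and_cov_2d(points):
    """Sample mean and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = fmean(p[0] for p in points)
    my = fmean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq_2d(v, mean, cov):
    """Squared Mahalanobis distance of point v from mean under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = v[0] - mean[0], v[1] - mean[1]
    # Quadratic form with the closed-form inverse (1/det) * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def chi2_sf_2dof(x):
    """Survival function P(X > x) of a chi-square distribution with 2 dof."""
    return math.exp(-x / 2)


def check_candidate_set(candidate, data, alpha=0.05):
    """Return (is_outlier_set, d2, p_value) for one hypothetical candidate set."""
    centroid = (fmean(p[0] for p in candidate), fmean(p[1] for p in candidate))
    mean, cov = mean_and_cov_2d(data)
    d2 = mahalanobis_sq_2d(centroid, mean, cov)
    p = chi2_sf_2dof(d2)
    return p < alpha, d2, p


# Toy data: a 9x9 grid of inliers plus a tight three-point cluster far away
inliers = [(x / 2, y / 2) for x in range(-4, 5) for y in range(-4, 5)]
cluster = [(10.0, 10.0), (10.2, 10.0), (10.0, 10.2)]
data = inliers + cluster

print(check_candidate_set(cluster, data)[0])                                # → True
print(check_candidate_set([(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)], data)[0])  # → False
```

    Because the outlier set itself inflates the overall covariance, a large candidate set can partly mask itself, which is one reason the choice of covariance estimate matters.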

  3. Outlier Datasets - original

    • kaggle.com
    zip
    Updated Feb 5, 2021
    Cite
    Hai Vo (2021). Outlier Datasets - original [Dataset]. https://www.kaggle.com/hariwh0/outlier-detection-datasets
    Available download formats: zip (1,534,928,268 bytes)
    Dataset updated
    Feb 5, 2021
    Authors
    Hai Vo
    License

    Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Hai Vo

    Released under Database: Open Database, Contents: Database Contents

  4. outlier detection text reducing

    • kaggle.com
    Updated Aug 7, 2025
    Cite
    Ali Mortezaie (2025). outlier detection text reducing [Dataset]. https://www.kaggle.com/datasets/alimortezaie/outlier-detection-text-reducing
    Format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 7, 2025
    Dataset provided by
    Kaggle
    Authors
    Ali Mortezaie
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Ali Mortezaie

    Released under Apache 2.0

  5. Vision Based Building Energy Data Outlier Detection Dataset

    • universe.roboflow.com
    zip
    Updated Apr 3, 2024
    Cite
    energy data outlier detection (2024). Vision Based Building Energy Data Outlier Detection Dataset [Dataset]. https://universe.roboflow.com/energy-data-outlier-detection/vision-based-building-energy-data-outlier-detection/model/5
    Available download formats: zip
    Dataset updated
    Apr 3, 2024
    Dataset authored and provided by
    energy data outlier detection
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    11785 Bounding Boxes
    Description

    Vision Based Building Energy Data Outlier Detection

    ## Overview
    
    Vision Based Building Energy Data Outlier Detection is a dataset for object detection tasks - it contains 11785 annotations for 2,159 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. Additional file 2 of Outlier identification and monitoring of institutional...

    • springernature.figshare.com
    txt
    Updated Jun 21, 2023
    Cite
    Menelaos Pavlou; Gareth Ambler; Rumana Z. Omar; Andrew T. Goodwin; Uday Trivedi; Peter Ludman; Mark de Belder (2023). Additional file 2 of Outlier identification and monitoring of institutional or clinician performance: an overview of statistical methods and application to national audit data [Dataset]. http://doi.org/10.6084/m9.figshare.22612465.v1
    Available download formats: txt
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Menelaos Pavlou; Gareth Ambler; Rumana Z. Omar; Andrew T. Goodwin; Uday Trivedi; Peter Ludman; Mark de Belder
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2.

  7. Data from: Simultaneous Outlier Detection and Prediction for Kriging with...

    • tandf.figshare.com
    zip
    Updated May 30, 2025
    Cite
    Youjie Zeng; Zhanfeng Wang; Youngjo Lee; Niansheng Tang (2025). Simultaneous Outlier Detection and Prediction for Kriging with True Identification [Dataset]. http://doi.org/10.6084/m9.figshare.28715504.v1
    Available download formats: zip
    Dataset updated
    May 30, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Youjie Zeng; Zhanfeng Wang; Youngjo Lee; Niansheng Tang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Kriging with interpolation is widely used in various noise-free areas, such as computer experiments. However, owing to its Gaussian assumption, it is susceptible to outliers, which affects statistical inference, and the resulting conclusions could be misleading. Little work has explored outlier detection for kriging. Therefore, we propose a novel kriging method for simultaneous outlier detection and prediction by introducing a normal-gamma prior, which results in an unbounded penalty on the biases to distinguish outliers from normal data points. We develop a simple and efficient method, avoiding the expensive computation of the Markov chain Monte Carlo algorithm, to simultaneously detect outliers and make a prediction. We establish the true identification property for outlier detection and the consistency of the estimated hyperparameters in kriging under the increasing domain framework as if the number and locations of the outliers were known in advance. Under appropriate regularity conditions, we demonstrate information consistency for prediction in the presence of outliers. Numerical studies and real data examples show that the proposed method generally provides robust analyses in the presence of outliers. Supplementary materials for this article are available online.
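    The susceptibility described above can be seen with a plain zero-mean (simple) kriging interpolator. This sketch does not implement the authors' normal-gamma method; it assumes a squared-exponential covariance with a fixed length scale and a tiny diagonal jitter for numerical stability, and only shows that an interpolating Gaussian predictor passes through a gross outlier and shifts nearby predictions. All names are hypothetical.

```python
import math


def gaussian_cov(a, b, length_scale=1.0):
    """Squared-exponential (Gaussian) covariance between two 1-D locations."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def simple_kriging(xs, ys, x_new, length_scale=1.0, jitter=1e-10):
    """Zero-mean kriging predictor k(x_new)^T K^{-1} y.

    jitter is a tiny diagonal term for numerical stability, so the
    predictor still (almost exactly) interpolates the observations."""
    K = [[gaussian_cov(a, b, length_scale) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    weights = solve(K, ys)
    return sum(w * gaussian_cov(x_new, a, length_scale) for w, a in zip(weights, xs))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean = [0.0, 1.0, 0.0, 1.0, 0.0]
contaminated = [0.0, 1.0, 10.0, 1.0, 0.0]  # gross outlier at x = 2

print(round(simple_kriging(xs, contaminated, 2.0), 6))  # → 10.0 (passes through the outlier)
print(simple_kriging(xs, clean, 1.5), simple_kriging(xs, contaminated, 1.5))
```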

  8. Multi-Domain Outlier Detection Dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Mar 31, 2022
    Cite
    Hannah Kerner; Umaa Rebbapragada; Kiri Wagstaff; Steven Lu; Bryce Dubayah; Eric Huff; Raymond Francis; Jake Lee; Vinay Raman; Sakshum Kulshrestha (2022). Multi-Domain Outlier Detection Dataset [Dataset]. http://doi.org/10.5281/zenodo.6400786
    Available download formats: zip
    Dataset updated
    Mar 31, 2022
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Hannah Kerner; Umaa Rebbapragada; Kiri Wagstaff; Steven Lu; Bryce Dubayah; Eric Huff; Raymond Francis; Jake Lee; Vinay Raman; Sakshum Kulshrestha
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Multi-Domain Outlier Detection Dataset contains datasets for conducting outlier detection experiments for four different application domains:

    1. Astrophysics - detecting anomalous observations in the Dark Energy Survey (DES) catalog (data type: feature vectors)
    2. Planetary science - selecting novel geologic targets for follow-up observation onboard the Mars Science Laboratory (MSL) rover (data type: grayscale images)
    3. Earth science - detecting anomalous samples in satellite time series corresponding to ground-truth observations of maize crops (data type: time series/feature vectors)
    4. Fashion-MNIST/MNIST: benchmark task to detect anomalous MNIST images among Fashion-MNIST images (data type: grayscale images)

    Each dataset contains a "fit" dataset (used for fitting or training outlier detection models), a "score" dataset (samples that are scored to evaluate model performance, analogous to a test set), and a label dataset (indicating whether each sample in the score dataset is considered an outlier in that domain).

    To read more about the datasets and how they are used for outlier detection, or to cite this dataset in your own work, please see the following citation:

    Kerner, H. R., Rebbapragada, U., Wagstaff, K. L., Lu, S., Dubayah, B., Huff, E., Lee, J., Raman, V., and Kulshrestha, S. (2022). Domain-agnostic Outlier Ranking Algorithms (DORA)-A Configurable Pipeline for Facilitating Outlier Detection in Scientific Datasets. Under review for Frontiers in Astronomy and Space Sciences.
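    The fit/score/label layout suggests a simple evaluation loop: fit a detector on the "fit" data, score the "score" data, and compare the scores with the labels. This is a minimal sketch, assuming the files have already been loaded into lists of numeric feature vectors and 0/1 labels (1 = outlier); loading is not shown, and the k-nearest-neighbour detector is a stand-in chosen for brevity, not necessarily one of the DORA pipeline's algorithms.

```python
import math


def knn_outlier_scores(fit_set, score_set, k=3):
    """Score each sample in score_set by its mean Euclidean distance to its k
    nearest neighbours in fit_set (larger = more anomalous)."""
    scores = []
    for s in score_set:
        dists = sorted(math.dist(s, f) for f in fit_set)
        scores.append(sum(dists[:k]) / k)
    return scores


def roc_auc(scores, labels):
    """Probability that a random outlier (label 1) outscores a random inlier
    (label 0), counting ties as half; equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy stand-ins for the fit, score and label files
fit = [(x, y) for x in range(3) for y in range(3)]
score = [(1, 1), (1.5, 0.5), (8, 8), (-6, 4)]
labels = [0, 0, 1, 1]

print(roc_auc(knn_outlier_scores(fit, score), labels))  # → 1.0
```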

  9. Data from: Multivariate Outliers and the O3 Plot

    • tandf.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Antony Unwin (2023). Multivariate Outliers and the O3 Plot [Dataset]. http://doi.org/10.6084/m9.figshare.7792115.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Antony Unwin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identifying and dealing with outliers is an important part of data analysis. A new visualization, the O3 plot, is introduced to aid in the display and understanding of patterns of multivariate outliers. It uses the results of identifying outliers for every possible combination of dataset variables to provide insight into why particular cases are outliers. The O3 plot can be used to compare the results from up to six different outlier identification methods. There is an R package, OutliersO3, implementing the plot. The article is illustrated with outlier analyses of German demographic and economic data. Supplementary materials for this article are available online.
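    The core idea behind the O3 plot, identifying outliers separately for every combination of variables, can be sketched as below. The flagging rule (mean squared z-score over the chosen variables above a threshold) is a simple stand-in chosen for illustration, not one of the identification methods used by the OutliersO3 package, and the function name is hypothetical. The toy data show why the combination matters: case 9 is extreme only in `a`, so it is flagged when `a` dominates the subset but diluted away when `a` is combined with both `b` and `c`.

```python
from itertools import combinations
from statistics import fmean, pstdev


def outliers_by_variable_subset(rows, names, threshold):
    """For every non-empty combination of variables, flag cases whose mean
    squared z-score over that combination exceeds threshold.

    rows: list of numeric tuples (one per case); names: variable names in
    column order. Assumes every variable has non-zero spread."""
    columns = list(zip(*rows))
    stats = [(fmean(col), pstdev(col)) for col in columns]  # per-variable mean, SD
    result = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), size):
            flagged = [
                case for case, row in enumerate(rows)
                if fmean(((row[j] - stats[j][0]) / stats[j][1]) ** 2 for j in subset) > threshold
            ]
            result[tuple(names[j] for j in subset)] = flagged
    return result


# Case 9 is extreme in `a` only; `b` and `c` alternate between 1 and 2
rows = [(1 if i < 9 else 10, 1 + i % 2, 2 - i % 2) for i in range(10)]
res = outliers_by_variable_subset(rows, ("a", "b", "c"), threshold=4)

print(res[("a",)], res[("a", "b")], res[("a", "b", "c")])  # → [9] [9] []
```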

  10. Introduction to Outlier

    • kaggle.com
    zip
    Updated Jul 10, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Omsingh Bais (2021). Introduction to Outlier [Dataset]. https://www.kaggle.com/datasets/ombais/introduction-to-outlier
    Available download formats: zip (5,672 bytes)
    Dataset updated
    Jul 10, 2021
    Authors
    Omsingh Bais
    Description

    Dataset

    This dataset was created by Omsingh Bais

  11. AI Histology QC Outlier Detection Tool Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 4, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The citation is currently not available for this dataset.
    Available download formats: pptx, csv, pdf
    Dataset updated
    Aug 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI Histology QC Outlier Detection Tool Market Outlook

    According to our latest research, the global AI Histology QC Outlier Detection Tool market size reached USD 412 million in 2024, with a robust compound annual growth rate (CAGR) of 18.7% observed over the past year. The market’s expansion is primarily driven by the increasing adoption of artificial intelligence in digital pathology and the rising demand for high-precision quality control in histological workflows. By 2033, the market is forecasted to reach USD 1.97 billion, reflecting the accelerating integration of AI-powered QC outlier detection tools across clinical and research environments worldwide.
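    The growth figures above can be checked with standard compound-growth arithmetic (2024 to 2033 is 9 years). This is a consistency check, not the report's forecasting method, and the report does not state explicitly which rate applies to the forecast period.

```python
def compound(value, rate, years):
    """Grow value at a constant annual rate for a number of years."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Constant annual growth rate that takes start to end in the given years."""
    return (end / start) ** (1 / years) - 1


# Figures from the paragraph above, in USD million
projected = compound(412, 0.187, 9)
rate = implied_cagr(412, 1970, 9)
print(f"{projected:.0f}")  # → 1927
print(f"{rate:.3f}")       # → 0.190
```

    Compounding USD 412 million at 18.7% a year gives about USD 1.93 billion by 2033, slightly below the stated USD 1.97 billion, which corresponds to an average rate of about 19.0% a year.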

    The surge in demand for AI Histology QC Outlier Detection Tools is primarily attributed to the pressing need for accuracy and consistency in histopathological diagnostics. Traditional quality control processes in histology are labor-intensive and prone to human error, which can result in diagnostic discrepancies and impact patient outcomes. The deployment of advanced AI-driven QC outlier detection tools addresses these challenges by automating the identification of anomalies and artifacts in histological slides, ensuring standardized results and significantly reducing turnaround times. Moreover, the integration of machine learning algorithms enables these systems to continuously improve their detection capabilities, further enhancing diagnostic reliability and supporting the growing trend towards digitization in pathology laboratories.

    Another significant growth driver for the AI Histology QC Outlier Detection Tool market is the increasing prevalence of cancer and other chronic diseases that require histopathological examination for diagnosis and treatment planning. The rising global cancer burden, coupled with the shortage of skilled pathologists, is pushing healthcare providers to adopt AI-powered solutions that can streamline workflow efficiency and mitigate diagnostic bottlenecks. These tools not only facilitate faster and more accurate detection of outliers in tissue samples but also support pathologists in prioritizing cases that require immediate attention. As a result, healthcare institutions are investing heavily in AI-based QC solutions to optimize resource utilization, improve patient care, and comply with stringent regulatory standards for laboratory quality assurance.

    Technological advancements and strategic collaborations between AI developers, pathology labs, and healthcare providers are further accelerating market growth. The ongoing development of sophisticated image analysis algorithms, cloud-based platforms, and interoperability standards is enabling seamless integration of AI QC tools into existing laboratory information systems. Additionally, government initiatives aimed at promoting digital health transformation and funding for AI research in medical diagnostics are creating a favorable environment for market expansion. The proliferation of digital pathology infrastructure, particularly in developed regions, is expected to drive the adoption of AI QC outlier detection tools, while emerging markets are witnessing growing interest as healthcare systems modernize and invest in advanced diagnostic technologies.

    From a regional perspective, North America currently dominates the AI Histology QC Outlier Detection Tool market, accounting for a significant share of global revenues in 2024. The region’s leadership is underpinned by a well-established healthcare infrastructure, high adoption rates of digital pathology, and strong presence of leading AI technology providers. Europe follows closely, supported by robust investments in healthcare innovation and a proactive regulatory landscape. Meanwhile, the Asia Pacific region is poised for the fastest growth over the forecast period, driven by increasing healthcare expenditure, expanding cancer screening programs, and rising awareness of the benefits of AI-powered diagnostic solutions. Latin America and the Middle East & Africa are also expected to witness steady growth as digital transformation initiatives gain momentum in these regions.

  12. Outliers and similarity in APOGEE - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Nov 2, 2017
    Cite
    (2017). Outliers and similarity in APOGEE - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/b624b506-541b-5a09-b615-14b8e202c468
    Dataset updated
    Nov 2, 2017
    Description

    In this work we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the dataset, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the dataset for objects allows us to find objects that are impossible to find using their best fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the dataset, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data. Cone search capability for table J/MNRAS/476/2117/apogeenn (Nearest neighbors APOGEE IDs)
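    A common way to turn an unsupervised random forest into a similarity measure is sketched below: build a synthetic class by sampling each feature independently from its own column, train a forest to separate real from synthetic objects (training not shown), then score similarity as the fraction of trees in which two objects share a leaf. This is the generic random-forest proximity and may differ from the weighting used in the APOGEE work; the leaf indices are assumed to come from a trained forest (for example scikit-learn's `RandomForestClassifier.apply`), and the function names are hypothetical.

```python
import random


def synthetic_marginal_sample(data, rng):
    """Resample each feature independently from its own column: marginals are
    kept, correlations between features are destroyed (the 'fake' class)."""
    columns = list(zip(*data))
    return [tuple(rng.choice(col) for col in columns) for _ in data]


def proximity(leaves_a, leaves_b):
    """Fraction of trees in which two objects end up in the same leaf."""
    return sum(a == b for a, b in zip(leaves_a, leaves_b)) / len(leaves_a)


def outlier_scores(leaf_ids):
    """leaf_ids[i][t] is the leaf reached by object i in tree t.
    Score = 1 - mean proximity to every other object (higher = more unusual)."""
    n = len(leaf_ids)
    return [1 - sum(proximity(leaf_ids[i], leaf_ids[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


# Hypothetical leaf assignments for 4 objects in a 3-tree forest
leaf_ids = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [2, 2, 2]]
scores = outlier_scores(leaf_ids)
print(max(range(len(scores)), key=scores.__getitem__))  # → 3
```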

  13. Image outlier dataset

    • kaggle.com
    Updated Jul 6, 2021
    Cite
    imadkhan9691 (2021). Image outlier dataset [Dataset]. https://www.kaggle.com/datasets/imadkhan9691/image-outlier-dataset
    Format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2021
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    imadkhan9691
    Description

    Dataset

    This dataset was created by imadkhan9691

  14. Additional file 1 of Unsupervised outlier detection applied to SARS-CoV-2...

    • figshare.com
    zip
    Updated Aug 13, 2024
    Cite
    Georg Hahn; Sanghun Lee; Dmitry Prokopenko; Jonathan Abraham; Tanya Novak; Julian Hecker; Michael Cho; Surender Khurana; Lindsey R. Baden; Adrienne G. Randolph; Scott T. Weiss; Christoph Lange (2024). Additional file 1 of Unsupervised outlier detection applied to SARS-CoV-2 nucleotide sequences can identify sequences of common variants and other variants of interest [Dataset]. http://doi.org/10.6084/m9.figshare.26555624.v1
    Available download formats: zip
    Dataset updated
    Aug 13, 2024
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Georg Hahn; Sanghun Lee; Dmitry Prokopenko; Jonathan Abraham; Tanya Novak; Julian Hecker; Michael Cho; Surender Khurana; Lindsey R. Baden; Adrienne G. Randolph; Scott T. Weiss; Christoph Lange
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1. Lists of GISAID IDs for the two reference datasets (simulating the time before the emergence of a new variant and the onset of a new variant) for each variant under consideration in the article (alpha, beta, delta, gamma, GH, lambda, mu, omicron).

  15. ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...

    • data.niaid.nih.gov
    • elki-project.github.io
    • +2 more
    Updated May 2, 2024
    Cite
    Zimek, Arthur (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6355683
    Dataset updated
    May 2, 2024
    Dataset provided by
    Zimek, Arthur
    Schubert, Erich
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data sets were originally created for the following publications:

    M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek. Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

    H.-P. Kriegel, E. Schubert, A. Zimek. Evaluation of Multiple Clustering Solutions. In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings, held in conjunction with ECML PKDD 2011, Athens, Greece, 2011.

    The outlier data set versions were introduced in:

    E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel. On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

    They are derived from the original image data available at https://aloi.science.uva.nl/

    The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, A. W. M. Smeulders. The Amsterdam Library of Object Images. Int. J. Comput. Vision, 61(1), 103-112, January 2005.

    Additional information is available at: https://elki-project.github.io/datasets/multi_view

    The following views are currently available:

    | Feature type | Description | Files |
    |---|---|---|
    | Object number | Sparse 1000-dimensional vectors that give the true object assignment | objs.arff.gz |
    | RGB color histograms | Standard RGB color histograms (uniform binning) | aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz |
    | HSV color histograms | Standard HSV/HSB color histograms in various binnings | aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz |
    | Color similarity | Average similarity to 77 reference colors (not histograms): 18 colors x 2 saturation levels x 2 brightness levels + 5 grey values (incl. white and black) | aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other) |
    | Haralick features | First 13 Haralick features (radius 1 pixel) | aloi-haralick-1.csv.gz |
    | Front to back | Vectors representing front face vs. back faces of individual objects | front.arff.gz |
    | Basic light | Vectors indicating basic light situations | light.arff.gz |
    | Manual annotations | Manually annotated object groups of semantically related objects such as cups | manual1.arff.gz |
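    The RGB histogram dimensionalities listed in the table (8, 27, 64, ..., 1000) are cubes of the number of bins per channel (2 through 10). A minimal sketch of a uniformly binned, normalised RGB histogram follows; it is not ELKI's or ALOI's extraction code, and the bin ordering, normalisation, and treatment of background pixels are assumptions that may not match the distributed files.

```python
def rgb_histogram(pixels, bins_per_channel):
    """Uniformly binned RGB colour histogram, normalised to sum to 1.

    pixels: iterable of (r, g, b) tuples with 8-bit channel values 0..255.
    Returns bins_per_channel ** 3 values: 2 -> 8d, 3 -> 27d, ..., 10 -> 1000d.
    """
    b = bins_per_channel
    counts = [0] * (b ** 3)
    total = 0
    for r, g, bl in pixels:
        # Map each 0..255 channel value to one of b equal-width bins
        ri, gi, bi = (min(v * b // 256, b - 1) for v in (r, g, bl))
        counts[(ri * b + gi) * b + bi] += 1  # assumed bin order: red, green, blue
        total += 1
    return [c / total for c in counts]


print(rgb_histogram([(0, 0, 0), (255, 255, 255)], 2))
# → [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]
```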
    

    Outlier Detection Versions

    Additionally, we generated a number of subsets for outlier detection:

    | Feature type | Description | Files |
    |---|---|---|
    | RGB histograms | Downsampled to 100000 objects (553 outliers) | aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz |
    | RGB histograms | Downsampled to 75000 objects (717 outliers) | aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz |
    | RGB histograms | Downsampled to 50000 objects (1508 outliers) | aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz |
    
  16. Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    pdf
    Updated Jun 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Spain, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/anomaly-detection-market-industry-analysis
    Available download formats: pdf
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2025 - 2029
    Area covered
    United Kingdom, United States, Germany, Canada
    Description

    Anomaly Detection Market Size 2025-2029

    The anomaly detection market size is forecast to increase by USD 4.44 billion at a CAGR of 14.4% between 2024 and 2029.

    The market is experiencing significant growth, particularly in the BFSI sector, as organizations increasingly prioritize identifying and addressing unusual patterns or deviations from normal business operations. The rising incidence of internal threats and cyber frauds necessitates the implementation of advanced anomaly detection tools to mitigate potential risks and maintain security. However, implementing these solutions comes with challenges, primarily infrastructural requirements. Ensuring compatibility with existing systems, integrating new technologies, and training staff to effectively utilize these tools pose significant hurdles for organizations.
    Despite these challenges, the potential benefits of anomaly detection, such as improved risk management, enhanced operational efficiency, and increased security, make it an essential investment for businesses seeking to stay competitive and agile in today's complex and evolving threat landscape. Companies looking to capitalize on this market opportunity must carefully consider these challenges and develop strategies to address them effectively. Cloud computing is a key trend in the market, as cloud-based solutions offer quick deployment, flexibility, and scalability.
    

    What will be the Size of the Anomaly Detection Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

    In the dynamic and evolving market, advanced technologies such as resource allocation, linear regression, pattern recognition, and support vector machines are increasingly being adopted for automated decision making. Businesses are leveraging these techniques to enhance customer experience through behavioral analytics, object detection, and sentiment analysis. Machine learning algorithms, including random forests, naive Bayes, decision trees, clustering algorithms, and k-nearest neighbors, are essential tools for risk management and compliance monitoring. AI-powered analytics, time series forecasting, and predictive modeling are revolutionizing business intelligence, while process optimization is achieved through the application of decision support systems, natural language processing, and predictive analytics.
    Computer vision, image recognition, logistic regression, and operational efficiency are key areas where principal component analysis and artificial neural networks contribute significantly. Speech recognition is also benefiting from these advanced technologies, enabling businesses to streamline processes and improve overall performance.
    

    How is this Anomaly Detection Industry segmented?

    The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
    
      Cloud
      On-premises
    
    
    Component
    
      Solution
      Services
    
    
    End-user
    
      BFSI
      IT and telecom
      Retail and e-commerce
      Manufacturing
      Others
    
    
    Technology
    
      Big data analytics
      AI and ML
      Data mining and business intelligence
    
    
    Geography
    
      North America
    
        US
        Canada
        Mexico
    
    
      Europe
    
        France
        Germany
        Spain
        UK
    
    
      APAC
    
        China
        India
        Japan
    
    
      Rest of World (ROW)
    

    By Deployment Insights

    The cloud segment is estimated to witness significant growth during the forecast period. Across the market, growth is driven by the increasing adoption of technologies such as machine learning models, statistical methods, and real-time monitoring, which identify anomalous behavior as it occurs and thereby strengthen network security and data privacy. Anomaly detection algorithms based on unsupervised learning, reinforcement learning, and deep learning are used to identify outliers and intrusions in large datasets. Because data security is a major concern, organizations are also adopting data masking, data pseudonymization, data de-identification, and differential privacy.
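    Differential privacy, listed above among the data-protection techniques, is often implemented with the Laplace mechanism: a numeric query result is released with added noise drawn from a Laplace distribution with scale Δf/ε, where Δf is the query's sensitivity and ε is the privacy budget. The minimal Python sketch below assumes a counting query (sensitivity 1). The function names `laplace_noise` and `private_count` are hypothetical and are not taken from any vendor's product.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean `scale`
    is Laplace-distributed with location 0 and that scale.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # expovariate takes a rate, so rate = 1 / mean
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float,
                  sensitivity: float = 1.0,
                  rng: Optional[random.Random] = None) -> float:
    """Release `true_count` under epsilon-differential privacy (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    # Noise scale grows with sensitivity and shrinks as the privacy budget grows.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Usage: a hypothetical count of flagged transactions, released with epsilon = 0.5
noisy = private_count(1240, epsilon=0.5, rng=random.Random(42))
print(f"noisy count: {noisy:.1f}")
```

    A smaller ε gives stronger privacy but noisier releases; with ε = 0.5 and sensitivity 1, the noise scale is 2.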

    Data leakage prevention and incident response are critical components of an effective anomaly detection system. False positive and false negative rates are essential metrics for evaluating the performance of these systems. Time series analysis and the handling of concept drift (changes in the underlying data distribution over time) are important in anomaly detection. Data obfuscation, data suppression, and data aggregation are other strategies employed to maintain data privacy. Companies such as Anodot, Cisco Systems Inc, IBM Corp, and SAS Institute Inc offer both cloud-based and on-premises anomaly detection solutions. These soluti

  17.

    GTI: A Novel Algorithm for Identifying Outlier Gene Expression Profiles from...

    • plos.figshare.com
    pdf
    Updated Jun 2, 2023
    Cite
    John Patrick Mpindi; Henri Sara; Saija Haapa-Paananen; Sami Kilpinen; Tommi Pisto; Elmar Bucher; Kalle Ojala; Kristiina Iljin; Paula Vainio; Mari Björkman; Santosh Gupta; Pekka Kohonen; Matthias Nees; Olli Kallioniemi (2023). GTI: A Novel Algorithm for Identifying Outlier Gene Expression Profiles from Integrated Microarray Datasets [Dataset]. http://doi.org/10.1371/journal.pone.0017259
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    John Patrick Mpindi; Henri Sara; Saija Haapa-Paananen; Sami Kilpinen; Tommi Pisto; Elmar Bucher; Kalle Ojala; Kristiina Iljin; Paula Vainio; Mari Björkman; Santosh Gupta; Pekka Kohonen; Matthias Nees; Olli Kallioniemi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for identifying genes upregulated in subsets of samples of a given tumour type (‘outlier genes’), a hallmark of potential oncogenes.

    Methodology: A new statistical method, the gene tissue index (GTI), was developed by modifying and adapting algorithms originally developed for statistical problems in economics. Using simulated data, we compared the ability of the GTI to detect outlier genes in meta-datasets with that of four previously defined statistical methods: COPA, the OS statistic, the t-test, and ORT. The GTI performed as well as existing methods in a single-study simulation. Next, we evaluated the GTI on combined Affymetrix gene expression data from several published studies covering 392 normal tissue samples from the central nervous system, 74 astrocytomas, and 353 glioblastomas. The GTI identified known oncogenic outlier genes better than most of the previous methods. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. Over-expression of these genes was validated in vivo with immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific TYMS inhibitor used to treat tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, highlighting this outlier gene candidate as a potential therapeutic target.

    Conclusions/Significance: Taken together, these results support the GTI as a novel approach for identifying potential oncogene outliers and drug targets.
The algorithm is implemented in an R package (Text S1).
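    The GTI formula itself is not given here, but one of the comparison methods, COPA (cancer outlier profile analysis), is simple to sketch. In a common formulation, each gene's expression values are median-centred and divided by the median absolute deviation (MAD) across samples, and the gene is scored by a high percentile (for example the 90th) of the scaled values, so genes strongly up-regulated in only a subset of samples rank highly. The Python sketch below follows that formulation; the function name `copa_score` and the default percentile are assumptions, and it is neither the GTI algorithm nor the R package from Text S1.

```python
import statistics
from typing import Sequence


def copa_score(values: Sequence[float], percentile: int = 90) -> float:
    """COPA-style outlier score for one gene across samples (sketch)."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    med = statistics.median(values)
    # Unscaled median absolute deviation around the median
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; COPA scaling is undefined for this gene")
    scaled = [(v - med) / mad for v in values]
    # quantiles(n=100) returns the 1st..99th percentiles (99 cut points)
    cut_points = statistics.quantiles(scaled, n=100, method="inclusive")
    return cut_points[percentile - 1]


# Usage: a gene up-regulated in one sample scores higher than a steady one
steady = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(copa_score(steady), copa_score(outlier))
```

    Because both the centre and the spread are medians, a single extreme sample barely moves the scaling but pushes the upper percentile up, which is what makes this family of statistics sensitive to subset-specific over-expression.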

  18.

    Summary of each combination of outlier detection methods and robust...

    • plos.figshare.com
    xls
    Updated Dec 2, 2024
    Cite
    Nora F. Fino; Lesley A. Inker; Tom Greene; Ogechi M. Adingwupu; Josef Coresh; Jesse Seegmiller; Michael G. Shlipak; Tazeen H. Jafar; Roberto Kalil; Veronica T. Costa e Silva; Vilmundur Gudnason; Andrew S. Levey; Ben Haaland (2024). Summary of each combination of outlier detection methods and robust estimation approaches. [Dataset]. http://doi.org/10.1371/journal.pone.0313154.t001
    Available download formats: xls
    Dataset updated
    Dec 2, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Nora F. Fino; Lesley A. Inker; Tom Greene; Ogechi M. Adingwupu; Josef Coresh; Jesse Seegmiller; Michael G. Shlipak; Tazeen H. Jafar; Roberto Kalil; Veronica T. Costa e Silva; Vilmundur Gudnason; Andrew S. Levey; Ben Haaland
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We combined each outlier detection method with each estimation approach, yielding nine different approaches for robust GFR estimation in new application data.

  19.

    BOREALIS Power Analysis Code and Data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 22, 2022
    Cite
    Klee, Eric W (2022). BOREALIS Power Analysis Code and Data [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7343135
    Dataset updated
    Nov 22, 2022
    Dataset provided by
    Oliver, Gavin R
    Jenkinson, W Garrett
    Klee, Eric W
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This contains the code and data necessary to rerun the power analysis used in testing BOREALIS.

    Borealis is an R library that performs outlier analysis on count-based bisulfite sequencing data: it detects outlier methylated CpG sites from bisulfite sequencing (BS-seq) data. At its core, Borealis relies on beta-binomial modeling. This can be useful for rare-disease diagnosis.
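    The beta-binomial idea can be illustrated with a short sketch. Given a CpG site's coverage n and methylated read count k, and beta-binomial parameters α and β assumed to have been estimated beforehand (for example from a reference cohort at that site), the probability of each count is C(n, k)·B(k + α, n − k + β)/B(α, β), and a two-sided tail probability measures how extreme the observed count is. The function names, the two-sided p-value convention, and the way α and β are obtained are assumptions for illustration; this is not Borealis's actual code.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(X = k) for X ~ BetaBinomial(n, alpha, beta)."""
    if not 0 <= k <= n:
        return 0.0
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def outlier_pvalue(k: int, n: int, alpha: float, beta: float) -> float:
    """Two-sided tail probability of observing k methylated reads out of n."""
    lower = sum(betabinom_pmf(i, n, alpha, beta) for i in range(0, k + 1))
    upper = sum(betabinom_pmf(i, n, alpha, beta) for i in range(k, n + 1))
    return min(1.0, 2.0 * min(lower, upper))


# Usage: hypothetical site where the cohort is mostly unmethylated (alpha=1, beta=9)
# and the sample of interest shows 18 of 20 reads methylated.
print(outlier_pvalue(18, 20, alpha=1.0, beta=9.0))
```

    Working in log space with `math.lgamma` keeps the computation stable at high coverage, where the binomial coefficients and Beta functions would otherwise overflow or underflow.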

  20. Student Performances | Data set cleared of outlier

    • kaggle.com
    Updated Oct 30, 2024
    Cite
    fehu.zone (2024). Student Performances | Data set cleared of outlier [Dataset]. https://www.kaggle.com/datasets/fehu94/student-performances-data-set-cleared-of-outlier/data
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 30, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    fehu.zone
    Description

    Dataset

    This dataset was created by fehu.zone

    Released under Other (specified in description)

    Contents

Cite
David R. Mullineaux; Gareth Irwin (2023). Error and anomaly detection for intra-participant time-series data [Dataset]. http://doi.org/10.6084/m9.figshare.5189002

Data from: Error and anomaly detection for intra-participant time-series data

Related Article
Available download formats: xlsx
Dataset updated
Jun 1, 2023
Dataset provided by
Taylor & Francis
Authors
David R. Mullineaux; Gareth Irwin
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Identifying errors or anomalous values, collectively considered outliers, aids data exploration, and removing outliers can improve statistical analysis. In biomechanics, outlier detection methods have examined the ‘shape’ of entire cycles, although examining fewer points with a ‘moving window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected in two stages in strides from treadmill running (mean 38 cycles). In stage 1, cycles were removed for one-dimensional (spatial) outliers at each time point using the median absolute deviation; in stage 2, cycles were removed for two-dimensional (spatial–temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window sizes, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers should be assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
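The two-stage procedure can be sketched in Python as follows. Stage 1 follows the description directly: at each time point, cycles lying more than k median absolute deviations from the across-cycle median are flagged. Stage 2 is only one possible reading of the moving-window standard deviation step, since the description does not give its exact formula: each remaining cycle's residual from the mean curve is compared with the standard deviation of all residuals pooled over a window of neighbouring time points. The multipliers `k1` and `k2` stand in for the authors' t-statistic-based scaling (a significance level could be converted to a multiplier with, for example, SciPy's `scipy.stats.t.ppf`), the 1.4826 MAD consistency factor and the `half_width` definition of window size are assumptions, and all function names are hypothetical; this is not the supplied Matlab code.

```python
import statistics
from typing import List, Sequence

Cycle = Sequence[float]


def stage1_mad(cycles: List[Cycle], k: float = 3.0) -> List[int]:
    """Flag cycles that are 1-D (spatial) outliers at any time point."""
    flagged = set()
    for t in range(len(cycles[0])):
        column = [c[t] for c in cycles]
        med = statistics.median(column)
        # 1.4826 scales the MAD to estimate the SD under normality (assumption)
        mad = 1.4826 * statistics.median([abs(v - med) for v in column])
        if mad == 0:
            continue  # no spread at this time point, so nothing can be flagged
        flagged.update(i for i, v in enumerate(column) if abs(v - med) > k * mad)
    return sorted(flagged)


def stage2_moving_window(cycles: List[Cycle], half_width: int = 1,
                         k: float = 3.0) -> List[int]:
    """Hypothetical stage 2: pooled-residual SD over a moving window of time points."""
    if len(cycles) < 2:
        return []
    n_points = len(cycles[0])
    mean_curve = [statistics.fmean([c[t] for c in cycles]) for t in range(n_points)]
    residuals = [[c[t] - mean_curve[t] for t in range(n_points)] for c in cycles]
    flagged = set()
    for t in range(n_points):
        lo, hi = max(0, t - half_width), min(n_points, t + half_width + 1)
        window = [r[j] for r in residuals for j in range(lo, hi)]
        centre, sd = statistics.fmean(window), statistics.stdev(window)
        if sd == 0:
            continue
        flagged.update(i for i, r in enumerate(residuals) if abs(r[t] - centre) > k * sd)
    return sorted(flagged)


def detect_outlier_cycles(cycles: List[Cycle], k1: float = 3.0, k2: float = 3.0,
                          half_width: int = 1) -> List[int]:
    """Run stage 1, drop its outliers, run stage 2; return original cycle indices."""
    stage1 = set(stage1_mad(cycles, k1))
    kept = [i for i in range(len(cycles)) if i not in stage1]
    stage2 = stage2_moving_window([cycles[i] for i in kept], half_width, k2)
    return sorted(stage1 | {kept[j] for j in stage2})


# Usage: seven short hypothetical cycles; the last one has a spike at the middle point
cycles = [
    [0.0, 1.0, 2.0], [0.2, 1.2, 2.2], [-0.2, 0.8, 1.8], [0.1, 1.1, 2.1],
    [-0.1, 0.9, 1.9], [0.3, 1.3, 2.3], [-0.3, 5.0, 1.7],
]
print(detect_outlier_cycles(cycles))
```

A larger `k1` or `k2` (corresponding to a smaller significance level) flags fewer cycles, consistent with the trade-off described above; as the authors note, flagged cycles should still be inspected before deciding whether to remove them.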
