100+ datasets found
  1. Data from: Anomalous values and missing data in clinical and experimental...

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Hélio Amante Miot (2023). Anomalous values and missing data in clinical and experimental studies [Dataset]. http://doi.org/10.6084/m9.figshare.8227163.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Hélio Amante Miot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: During analysis of scientific research data, it is customary to encounter anomalous values or missing data. Anomalous values can result from errors in recording, typing, or instrument measurement, or they may be true outliers. This review discusses concepts, examples and methods for identifying and dealing with such contingencies. In the case of missing data, techniques for imputing the values are discussed, in order to avoid excluding the research subject when it is not possible to retrieve the information from registration forms or to re-address the participant.

  2. Data from: Multivariate Outliers and the O3 Plot

    • figshare.com
    • tandf.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Antony Unwin (2023). Multivariate Outliers and the O3 Plot [Dataset]. http://doi.org/10.6084/m9.figshare.7792115.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Antony Unwin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identifying and dealing with outliers is an important part of data analysis. A new visualization, the O3 plot, is introduced to aid in the display and understanding of patterns of multivariate outliers. It uses the results of identifying outliers for every possible combination of dataset variables to provide insight into why particular cases are outliers. The O3 plot can be used to compare the results from up to six different outlier identification methods. There is an R package, OutliersO3, implementing the plot. The article is illustrated with outlier analyses of German demographic and economic data. Supplementary materials for this article are available online.

  3. Weight and Height data outlier detection

    • kaggle.com
    zip
    Updated Jun 7, 2023
    Cite
    Krishnaraj_DataScience (2023). Weight and Height data outlier detection [Dataset]. https://www.kaggle.com/datasets/krishnaraj30/weight-and-height-data-outlier-detection
    Explore at:
    Available download formats: zip (170686 bytes)
    Dataset updated
    Jun 7, 2023
    Authors
    Krishnaraj_DataScience
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Height and weight are typical features for predicting a person's sex.

    This dataset is intended for beginners who have recently studied machine learning algorithms and want to apply one to a simple dataset.

    There are just 2 features (Height, Weight) & 1 label (Sex)

    Height (inches) Weight Sex (male | female)

  4. Outlier Free Advertising Data Set

    • kaggle.com
    zip
    Updated Jul 28, 2020
    Cite
    Pranjal Pandey (2020). Outlier Free Advertising Data Set [Dataset]. https://www.kaggle.com/pranjalpandey12/outlier-free-advertising-data-set
    Explore at:
    Available download formats: zip (1887 bytes)
    Dataset updated
    Jul 28, 2020
    Authors
    Pranjal Pandey
    Description

    This is an outlier-free data set for regression modelling.

  5. Data from: Methodology to filter out outliers in high spatial density data...

    • scielo.figshare.com
    jpeg
    Updated Jun 4, 2023
    Cite
    Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken (2023). Methodology to filter out outliers in high spatial density data to improve maps reliability [Dataset]. http://doi.org/10.6084/m9.figshare.14305658.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELO journals
    Authors
    Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT The considerable volume of data generated by sensors in the field presents systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets, determine whether the developed filter process could help decrease the nugget effect and improve the spatial variability characterization of high sampling data. We created a filter composed of a global, anisotropic, and an anisotropic local analysis of data, which considered the respective neighborhood values. For that purpose, we used the median to classify a given spatial point into the data set as the main statistical parameter and took into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and the sensor vegetation index (SVI) in sugarcane. The results showed an improvement in accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85 %, 97 %, and 79 % in corn yield, soil ECa, and SVI respectively, compared to interpolation errors of raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects, reducing estimation error of the interpolated data. The methodology proposed in this work had a better performance in removing outlier data when compared to two other methodologies from the literature.
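
The neighbourhood-median idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' filter: it covers only an isotropic local step, and the function name `local_median_outliers`, the `radius` and `tolerance` parameters, and the relative-deviation rule are all assumptions made for the example.

```python
import math
from statistics import median


def local_median_outliers(points, radius, tolerance):
    """Return indices of points whose value deviates from the local median.

    points    -- list of (x, y, value) tuples
    radius    -- neighbourhood radius, in the same units as x and y (assumed)
    tolerance -- allowed deviation as a fraction of the local median (assumed)
    """
    flagged = []
    for i, (xi, yi, vi) in enumerate(points):
        # Collect the values of all other points within the radius.
        neighbours = [
            v
            for j, (xj, yj, v) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        ]
        if not neighbours:
            continue  # no neighbourhood to compare against
        local_median = median(neighbours)
        if abs(vi - local_median) > tolerance * abs(local_median):
            flagged.append(i)
    return flagged
```

The median is used rather than the mean so that a single extreme neighbour does not shift the reference value. The radius should be large enough that each point has several neighbours, otherwise one extreme neighbour can still dominate a very small neighbourhood.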

  6. Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    pdf
    Updated Jun 12, 2025
    Cite
    Technavio (2025). Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Spain, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/anomaly-detection-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Canada, United States
    Description

    Anomaly Detection Market Size 2025-2029

    The anomaly detection market size is valued to increase by USD 4.44 billion, at a CAGR of 14.4% from 2024 to 2029. Anomaly detection tools gaining traction in BFSI will drive the anomaly detection market.

    Major Market Trends & Insights

    North America dominated the market and accounted for a 43% growth during the forecast period.
    By Deployment - Cloud segment was valued at USD 1.75 billion in 2023
    By Component - Solution segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 173.26 million
    Market Future Opportunities: USD 4441.70 million
    CAGR from 2024 to 2029: 14.4%
    

    Market Summary

    Anomaly detection, a critical component of advanced analytics, is witnessing significant adoption across various industries, with the financial services sector leading the charge. The increasing incidence of internal threats and cybersecurity frauds necessitates the need for robust anomaly detection solutions. These tools help organizations identify unusual patterns and deviations from normal behavior, enabling proactive response to potential threats and ensuring operational efficiency. For instance, in a supply chain context, anomaly detection can help identify discrepancies in inventory levels or delivery schedules, leading to cost savings and improved customer satisfaction. In the realm of compliance, anomaly detection can assist in maintaining regulatory adherence by flagging unusual transactions or activities, thereby reducing the risk of penalties and reputational damage.
    According to recent research, organizations that implement anomaly detection solutions experience a reduction in error rates by up to 25%. This improvement not only enhances operational efficiency but also contributes to increased customer trust and satisfaction. Despite these benefits, challenges persist, including data quality and the need for real-time processing capabilities. As the market continues to evolve, advancements in machine learning and artificial intelligence are expected to address these challenges and drive further growth.
    

    How is the Anomaly Detection Market Segmented?

    The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
      Cloud
      On-premises
    Component
      Solution
      Services
    End-user
      BFSI
      IT and telecom
      Retail and e-commerce
      Manufacturing
      Others
    Technology
      Big data analytics
      AI and ML
      Data mining and business intelligence
    Geography
      North America
        US
        Canada
        Mexico
      Europe
        France
        Germany
        Spain
        UK
      APAC
        China
        India
        Japan
      Rest of World (ROW)

    By Deployment Insights

    The cloud segment is estimated to witness significant growth during the forecast period.

    The market is witnessing significant growth, driven by the increasing adoption of advanced technologies such as machine learning algorithms, predictive modeling tools, and real-time monitoring systems. Businesses are increasingly relying on anomaly detection solutions to enhance their root cause analysis, improve system health indicators, and reduce false positives. This is particularly true in sectors where data is generated in real-time, such as cybersecurity threat detection, network intrusion detection, and fraud detection systems. Cloud-based anomaly detection solutions are gaining popularity due to their flexibility, scalability, and cost-effectiveness.

    This growth is attributed to cloud-based solutions' quick deployment, real-time data visibility, and customization capabilities, which are offered at flexible payment options like monthly subscriptions and pay-as-you-go models. Companies like Anodot, Ltd, Cisco Systems Inc, IBM Corp, and SAS Institute Inc provide both cloud-based and on-premise anomaly detection solutions. Anomaly detection methods include outlier detection, change point detection, and statistical process control. Data preprocessing steps, such as data mining techniques and feature engineering processes, are crucial in ensuring accurate anomaly detection. Data visualization dashboards and alert fatigue mitigation techniques help in managing and interpreting the vast amounts of data generated.

    Network traffic analysis, log file analysis, and sensor data integration are essential components of anomaly detection systems. Additionally, risk management frameworks, drift detection algorithms, time series forecasting, and performance degradation detection are vital in maintaining system performance and capacity planning.

  7. Outlier classification using autoencoders: application for fluctuation...

    • osti.gov
    • dataverse.harvard.edu
    Updated Jun 2, 2021
    Cite
    Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center (2021). Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas [Dataset]. http://doi.org/10.7910/DVN/SKEHRJ
    Explore at:
    Dataset updated
    Jun 2, 2021
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center
    Description

    Understanding the statistics of fluctuation driven flows in the boundary layer of magnetically confined plasmas is desired to accurately model the lifetime of the vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allow us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs for cases in which the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid or invalid, we find that such a classification may be ambiguous for up to 40% of data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that mostly characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases in the analysis. By removing the outliers that are identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are between approximately 5% and 15% lower than when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.

  8. Outlier packages in R

    • kaggle.com
    zip
    Updated Aug 6, 2022
    Cite
    Burak Dilber (2022). Outlier packages in R [Dataset]. https://www.kaggle.com/datasets/burakdilber/outliers-packages
    Explore at:
    Available download formats: zip (14940 bytes)
    Dataset updated
    Aug 6, 2022
    Authors
    Burak Dilber
    Description

    In the R programming language, many packages are often available for a given topic, and outlier detection is one such topic. This dataset lists R packages related to outliers, with the following fields:

    Package_Name: Package name
    Update_Date: The last update date of the package
    Version: Package version
    Depend: Package dependencies
    License: Package license
    Needs Compilation: Whether the package needs compilation
    URL: The package's website
    Encoding: UTF-8 or not
    Maintainer: Package maintainer
    Vignette_builder: Vignette builder
    Title: The title of the package
    Downloads1month: Number of downloads in the last 1 month
    Downloads6month: Number of downloads in the last 6 months
    Downloads12month: Number of downloads in the last 12 months

  9. KMASH Data Repository for outlier detection

    • researchdata.edu.au
    • research-repository.rmit.edu.au
    • +1more
    Updated Aug 11, 2021
    Cite
    Sevvandi Kandanaarachchi; Rob Hyndman; Mario Andres Munoz Acosta; Kate Smith-Miles (2021). KMASH Data Repository for outlier detection [Dataset]. http://doi.org/10.26180/5C6253C0B3323
    Explore at:
    Dataset updated
    Aug 11, 2021
    Dataset provided by
    RMIT University, Australia
    Authors
    Sevvandi Kandanaarachchi; Rob Hyndman; Mario Andres Munoz Acosta; Kate Smith-Miles
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The zip files contain 12338 datasets for outlier detection investigated in the following papers:


    (1) Instance space analysis for unsupervised outlier detection
    Authors : Sevvandi Kandanaarachchi, Mario A. Munoz, Kate Smith-Miles

    (2) On normalization and algorithm selection for unsupervised outlier detection
    Authors : Sevvandi Kandanaarachchi, Mario A. Munoz, Rob J. Hyndman, Kate Smith-Miles

    Some of these datasets were originally discussed in the paper:

    On the evaluation of unsupervised outlier detection: measures, datasets and an empirical study
    Authors : G. O. Campos, A. Zimek, J. Sander, R. J.G.B. Campello, B. Micenkova, E. Schubert, I. Assent, M.E. Houle.




  10. Mapping Clusters: Hot Spot and Cluster and Outlier Analysis

    • hub.arcgis.com
    Updated Nov 8, 2019
    Cite
    State of Delaware (2019). Mapping Clusters: Hot Spot and Cluster and Outlier Analysis [Dataset]. https://hub.arcgis.com/documents/delaware::mapping-clusters-hot-spot-and-cluster-and-outlier-analysis/about
    Explore at:
    Dataset updated
    Nov 8, 2019
    Dataset authored and provided by
    State of Delaware
    Description

    This course will introduce you to two of these tools: the Hot Spot Analysis (Getis-Ord Gi*) tool and the Cluster and Outlier Analysis (Anselin Local Moran's I) tool. These tools provide you with more control over your analysis. You can also use these tools to refine your analysis so that it better meets your needs.

    Goals
    Analyze data using the Hot Spot Analysis (Getis-Ord Gi*) tool.
    Analyze data using the Cluster and Outlier Analysis (Anselin Local Moran's I) tool.

  11. BOREALIS Power Analysis Code and Data

    • data.niaid.nih.gov
    Updated Nov 22, 2022
    Cite
    Oliver, Gavin R; Jenkinson, W Garrett; Klee, Eric W (2022). BOREALIS Power Analysis Code and Data [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7343135
    Explore at:
    Dataset updated
    Nov 22, 2022
    Dataset provided by
    Mayo Clinic
    Authors
    Oliver, Gavin R; Jenkinson, W Garrett; Klee, Eric W
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This contains the code and data necessary to rerun the power analysis used in testing BOREALIS.

    Borealis is an R library performing outlier analysis for count-based bisulfite sequencing data. It detects outlier methylated CpG sites from bisulfite sequencing (BS-seq). The core of Borealis is modeling Beta-Binomial distributions. This can be useful for rare disease diagnoses.
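
To illustrate what modeling counts with a Beta-Binomial distribution involves, here is a minimal sketch in Python using only the standard library. It is not Borealis code and does not reflect its API; the function names and the `alpha`/`beta` shape parameters are assumptions chosen for the example. It computes the probability of seeing `k` methylated reads out of `n`, and the upper-tail probability that could serve as an outlier score for one site.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(X = k) for X ~ BetaBinomial(n, alpha, beta)."""
    log_choose = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    )
    return math.exp(
        log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    )


def upper_tail(k, n, alpha, beta):
    """P(X >= k): small values suggest an unusually high methylated count."""
    return sum(beta_binomial_pmf(i, n, alpha, beta) for i in range(k, n + 1))
```

In a sketch like this, `alpha` and `beta` would be estimated per site from a reference cohort, and a sample with a very small tail probability at that site would be flagged for review.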

  12. Replication Data for Outlier analysis: Natural resources and immigration...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 12, 2023
    Cite
    Choi, Seung Whan (2023). Replication Data for Outlier analysis: Natural resources and immigration policy [Dataset]. http://doi.org/10.7910/DVN/MALOCW
    Explore at:
    Dataset updated
    Nov 12, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Choi, Seung Whan
    Description

    There are three files containing Stata data, do-files, and log files. These are associated with the empirical models reported in the replication study, “Outlier Analysis: Natural Resources and Immigration Policy,” PLOS ONE. Questions or comments regarding these materials should be directed to Seung-Whan Choi, Department of Political Science, University of Illinois at Chicago. His email address is whanchoi@uic.edu and his homepage address is https://whanchoi.people.uic.edu/.

  13. Data from: Distributed Anomaly Detection using 1-class SVM for Vertically...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data [Dataset]. https://catalog.data.gov/dataset/distributed-anomaly-detection-using-1-class-svm-for-vertically-partitioned-data
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amount of flight operational data is downloaded for different commercial airlines. These different types of datasets need to be analyzed for finding outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).

  14. outlier.nyc Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Sep 1, 2025
    Cite
    (2025). outlier.nyc Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/outlier.nyc
    Explore at:
    Dataset updated
    Sep 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank, Apparel & Fashion Category Rank
    Description

    Traffic analytics, rankings, and competitive metrics for outlier.nyc as of September 2025

  15. Student Performances | Data set cleared of outlier

    • kaggle.com
    zip
    Updated Oct 30, 2024
    Cite
    fehu.zone (2024). Student Performances | Data set cleared of outlier [Dataset]. https://www.kaggle.com/datasets/fehu94/student-performances-data-set-cleared-of-outlier/code
    Explore at:
    Available download formats: zip (48730 bytes)
    Dataset updated
    Oct 30, 2024
    Authors
    fehu.zone
    Description

    Dataset

    This dataset was created by fehu.zone

    Released under Other (specified in description)

  16. Model Access Outlier Detection Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Model Access Outlier Detection Market Research Report 2033 [Dataset]. https://dataintelo.com/report/model-access-outlier-detection-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Model Access Outlier Detection Market Outlook



    According to our latest research, the global Model Access Outlier Detection market size reached USD 1.32 billion in 2024, driven by the increasing need for advanced anomaly detection in digital infrastructure. The market is projected to grow at a CAGR of 14.8% from 2025 to 2033, reaching an estimated USD 4.15 billion by 2033. This robust growth is fueled by the rising adoption of AI-based security solutions, the proliferation of complex data environments, and the urgent demand for real-time threat detection across critical industries.
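
The figures quoted above can be checked against the standard compound-growth formula. The following is a minimal sketch assuming annual compounding; the function names `project` and `implied_cagr` are illustrative, and trying both 8 and 9 compounding periods is an assumption, since the report does not say whether growth is counted from the 2024 or the 2025 value.

```python
def project(value, rate, years):
    """Value after compounding `rate` annually for `years` periods."""
    return value * (1.0 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    base_2024 = 1.32    # USD billion, as quoted in the report
    target_2033 = 4.15  # USD billion, as quoted in the report
    stated_rate = 0.148  # 14.8% CAGR, as quoted in the report

    # The number of compounding periods is an assumption; try both conventions.
    for periods in (8, 9):
        projected = project(base_2024, stated_rate, periods)
        rate = implied_cagr(base_2024, target_2033, periods)
        print(f"{periods} periods: projected {projected:.2f} bn, "
              f"implied CAGR {rate:.1%}")
```

Running the script shows how sensitive the 2033 figure is to the number of compounding periods assumed.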




    The primary growth factor for the Model Access Outlier Detection market is the exponential increase in cyber threats and sophisticated attacks targeting enterprise data and networks. As organizations digitize operations, they generate vast volumes of data, making traditional rule-based security approaches inadequate. Outlier detection solutions leverage machine learning and artificial intelligence to identify unusual patterns and potential threats in real time, significantly reducing response times and minimizing the risk of data breaches. The integration of these technologies into existing security frameworks is becoming a necessity, especially in highly regulated sectors such as banking, healthcare, and government, where data integrity and privacy are paramount.




    Another significant driver propelling the market is the rapid adoption of cloud computing and the proliferation of IoT devices. As businesses migrate workloads to the cloud and deploy interconnected devices, the attack surface expands, necessitating advanced outlier detection mechanisms. Cloud-based solutions offer scalability, flexibility, and centralized monitoring, making them particularly attractive for organizations with distributed operations. Furthermore, the shift towards remote work and digital collaboration has increased the demand for real-time monitoring and anomaly detection to safeguard sensitive data and ensure business continuity. The continuous evolution of AI algorithms and the availability of big data analytics further enhance the accuracy and efficiency of outlier detection systems, contributing to sustained market growth.




    The growing emphasis on regulatory compliance and data protection standards worldwide is also catalyzing the adoption of Model Access Outlier Detection solutions. Stringent regulations such as GDPR, HIPAA, and PCI DSS require organizations to implement robust security measures and continuously monitor access to critical systems. Outlier detection tools play a vital role in meeting these compliance requirements by providing automated alerts, detailed audit trails, and actionable insights into suspicious activities. As regulatory landscapes become more complex, organizations are investing in advanced detection technologies not only to avoid penalties but also to build trust with customers and stakeholders.




    From a regional perspective, North America currently dominates the Model Access Outlier Detection market, accounting for the largest revenue share in 2024, followed by Europe and Asia Pacific. The presence of leading technology vendors, high cybersecurity awareness, and significant investments in digital infrastructure contribute to North America’s leadership. Europe is experiencing steady growth due to stringent data protection regulations and the increasing adoption of cloud-based security solutions. Meanwhile, the Asia Pacific region is poised for the fastest growth, driven by rapid digital transformation, expanding IT ecosystems, and rising incidences of cyber threats in emerging economies. The market’s global expansion is further supported by ongoing technological advancements and the increasing integration of AI and machine learning in security operations.



    Component Analysis



    The Component segment of the Model Access Outlier Detection market is broadly categorized into Software and Services. Software solutions are at the core of this market, comprising advanced analytics platforms, AI-driven detection engines, and customizable dashboards. These software offerings are designed to seamlessly integrate with existing IT infrastructure, providing organizations with the capability to monitor access patterns, identify anomalies, and generate real-time alerts. The sophistication of these tools lies in their ability to adapt to evolving threat landscapes, utilizing machine learning algorithms to

  17. Lipidomics LC-MS analysis support tools for outlier detection

    • data.niaid.nih.gov
    Updated Mar 28, 2024
    Cite
    Spick, Matt (2024). Lipidomics LC-MS analysis support tools for outlier detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10889320
    Explore at:
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    University of Surrey
    Authors
    Spick, Matt
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Identification of features with high levels of confidence in liquid chromatography-mass spectrometry (LC-MS) lipidomics research is an essential part of biomarker discovery, but existing software platforms can give inconsistent results, even from identical spectral data. This poses a clear challenge for reproducibility in bioinformatics work, and highlights the importance of data-driven outlier detection in assessing spectral outputs (here demonstrated using a machine learning approach based on support vector machine regression combined with leave-one-out cross validation), as well as manual curation, in order to identify software-driven errors caused by closely related lipids and by co-elution issues.

    The lipidomics case study dataset used in this work analysed a lipid extraction of a human pancreatic adenocarcinoma cell line (PANC-1, Merck, UK, cat no. 87092802) analysed using an Acquity M-Class UPLC system (Waters, UK) coupled to a ZenoToF 7600 mass spectrometer (Sciex, UK). Raw output files are included alongside processed data using MS DIAL (v4.9.221218) and Lipostar (v2.1.4) and a Jupyter notebook with Python code to analyse the outputs for outlier detection.

  18. Exploratory data analysis of a clinical study group: Development of a...

    • plos.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Bogumil M. Konopka; Felicja Lwow; Magdalena Owczarz; Łukasz Łaczmański (2023). Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data [Dataset]. http://doi.org/10.1371/journal.pone.0201950
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Bogumil M. Konopka; Felicja Lwow; Magdalena Owczarz; Łukasz Łaczmański
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Thorough knowledge of the structure of analyzed data makes it possible to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Due to the multitude of available methods, selecting those that work well together and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis (MD) and robust Mahalanobis distances (rMD), 3) hierarchical clustering with Ward’s algorithm, 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients who participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex hormone attributes. Further analysis was carried out separately for male and female patients. The optimal partitioning in the male set resulted in five subgroups. Two of them were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male dataset. No evidence of pathological patient subgroups was found. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also assess the heterogeneity of a dataset. The case study proved that our procedure is well suited for identification and visualization of biologically meaningful patient subgroups.
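
    As a rough illustration of step 2, the sketch below computes classical squared Mahalanobis distances for two-dimensional points and compares them with a chi-square cutoff. It is not the authors' procedure: the points, the function names, and the 0.975 quantile are assumptions, and the robust variant (rMD), which would replace the sample mean and covariance with robust estimates, is not shown.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^-1 d, using the closed-form inverse of a 2x2 matrix
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def chi2_cutoff_2df(p=0.975):
    """Chi-square quantile for 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - p)


# Hypothetical data: eleven clustered points and one distant point (last).
points = [
    (1.0, 2.1), (1.2, 1.9), (0.9, 2.0), (1.1, 2.2), (1.0, 1.8), (0.8, 2.1),
    (1.3, 2.0), (1.1, 1.9), (0.9, 2.2), (1.0, 2.0), (1.2, 2.1), (4.0, 0.5),
]

d2 = mahalanobis_sq_2d(points)
cutoff = chi2_cutoff_2df()
flagged = [i for i, d in enumerate(d2) if d > cutoff]
print(flagged)
```

    With the classical estimates, no squared distance can exceed (n - 1)^2 / n, so in small samples an outlier can mask itself by inflating the covariance. That limitation is one reason to compare MD with a robust version.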

  19. ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...

    • zenodo.org
    • elki-project.github.io
    application/gzip
    Updated May 2, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Erich Schubert; Arthur Zimek (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. http://doi.org/10.5281/zenodo.6355684
    Available download formats: application/gzip
    Dataset updated
    May 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Erich Schubert; Arthur Zimek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2022
    Description

    These data sets were originally created for the following publications:

    M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek
    Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
    In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

    H.-P. Kriegel, E. Schubert, A. Zimek
    Evaluation of Multiple Clustering Solutions
    In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.

    The outlier data set versions were introduced in:

    E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel
    On Evaluation of Outlier Rankings and Outlier Scores
    In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

    They are derived from the original image data available at https://aloi.science.uva.nl/

    The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, The Amsterdam library of object images, Int. J. Comput. Vision, 61(1), 103-112, January, 2005

    Additional information is available at: https://elki-project.github.io/datasets/multi_view

    The following views are currently available:

    Feature type | Description | Files
    Object number | Sparse 1000-dimensional vectors that give the true object assignment | objs.arff.gz
    RGB color histograms | Standard RGB color histograms (uniform binning) | aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz
    HSV color histograms | Standard HSV/HSB color histograms in various binnings | aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz
    Color similarity | Average similarity to 77 reference colors (not histograms): 18 colors x 2 sat x 2 bri + 5 grey values (incl. white, black) | aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
    Haralick features | First 13 Haralick features (radius 1 pixel) | aloi-haralick-1.csv.gz
    Front to back | Vectors representing front face vs. back faces of individual objects | front.arff.gz
    Basic light | Vectors indicating basic light situations | light.arff.gz
    Manual annotations | Manually annotated object groups of semantically related objects such as cups | manual1.arff.gz

    Outlier Detection Versions

    Additionally, we generated a number of subsets for outlier detection:

    Feature type | Description | Files
    RGB Histograms | Downsampled to 100000 objects (553 outliers) | aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
    RGB Histograms | Downsampled to 75000 objects (717 outliers) | aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
    RGB Histograms | Downsampled to 50000 objects (1508 outliers) | aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz

  20. BostonHousing

    • kaggle.com
    zip
    Updated Feb 15, 2025
    Cite
    Bunyamin Yavuz (2025). BostonHousing [Dataset]. https://www.kaggle.com/datasets/bunyaminyavuz/bostonhousing
    Available download formats: zip (4713 bytes)
    Dataset updated
    Feb 15, 2025
    Authors
    Bunyamin Yavuz
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Boston Housing Dataset

    The Boston Housing dataset is a well-known dataset in the field of predictive modeling and statistics. It contains information collected by the U.S. Census Service concerning housing in the area of Boston, Mass.

    Dataset Overview

    • Number of Instances: 506
    • Number of Attributes: 14 (including the target variable)

    Attributes

    The dataset includes the following features:

    1. CRIM - Per capita crime rate by town.
    2. ZN - Proportion of residential land zoned for lots over 25,000 sq. ft.
    3. INDUS - Proportion of non-retail business acres per town.
    4. CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise).
    5. NOX - Nitric oxides concentration (parts per 10 million).
    6. RM - Average number of rooms per dwelling.
    7. AGE - Proportion of owner-occupied units built prior to 1940.
    8. DIS - Weighted distances to five Boston employment centers.
    9. RAD - Index of accessibility to radial highways.
    10. TAX - Full-value property tax rate per $10,000.
    11. PTRATIO - Pupil-teacher ratio by town.
    12. B - 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
    13. LSTAT - Percentage of lower status of the population.
    14. MEDV - Median value of owner-occupied homes in $1000s (target variable).
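
    The definition of attribute 12 is easier to read as code. The sketch below evaluates B = 1000(Bk - 0.63)^2 and shows that the transform is not one-to-one, so a stored B value generally cannot be mapped back to a single Bk. The function names and the sample values are hypothetical.

```python
import math


def b_feature(bk):
    """B = 1000 * (Bk - 0.63)^2 for a proportion Bk between 0 and 1."""
    return 1000.0 * (bk - 0.63) ** 2


def candidate_bk(b):
    """All proportions Bk in [0, 1] that map to the given B value."""
    root = math.sqrt(b / 1000.0)
    return [v for v in (0.63 - root, 0.63 + root) if 0.0 <= v <= 1.0]


b_at_pivot = b_feature(0.63)   # the squared term vanishes at Bk = 0.63
b_at_zero = b_feature(0.0)     # the largest value reachable from Bk <= 0.63
two_sources = candidate_bk(90.0)
print(b_at_pivot, b_at_zero, two_sources)
```

    In this example, B = 90 is produced by two different proportions, one on each side of 0.63.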

    Use Cases

    This dataset can be used for:

    • Regression Analysis: To predict the value of homes based on the features provided.
    • Exploratory Data Analysis: To analyze the relationships between different variables.
    • Machine Learning: As a benchmark dataset for testing regression models.

    Citation

    Details about the dataset and its original source can be found in the following reference:

    • Harrison, D. and Rubinfeld, D. L. (1978). "Hedonic housing prices and the demand for clean air." J. Environ. Economics and Management, 5, 81-102.