100+ datasets found

MNIST dataset for Outliers Detection - [ MNIST4OD ]
figshare.com
application/gzip
Updated May 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Giovanni Stilo; Bardh Prenkaj (2024). MNIST dataset for Outliers Detection - [ MNIST4OD ] [Dataset]. http://doi.org/10.6084/m9.figshare.9954986.v2
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.9954986.v2
Dataset updated
May 17, 2024
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Giovanni Stilo; Bardh Prenkaj
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Here we present a dataset, MNIST4OD, of large size (number of dimensions and number of instances) suitable for Outliers Detection task.The dataset is based on the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/).We build MNIST4OD in the following way:To distinguish between outliers and inliers, we choose the images belonging to a digit as inliers (e.g. digit 1) and we sample with uniform probability on the remaining images as outliers such as their number is equal to 10% of that of inliers. We repeat this dataset generation process for all digits. For implementation simplicity we then flatten the images (28 X 28) into vectors.Each file MNIST_x.csv.gz contains the corresponding dataset where the inlier class is equal to x.The data contains one instance (vector) in each line where the last column represents the outlier label (yes/no) of the data point. The data contains also a column which indicates the original image class (0-9).See the following numbers for a complete list of the statistics of each datasets ( Name | Instances | Dimensions | Number of Outliers in % ):MNIST_0 | 7594 | 784 | 10MNIST_1 | 8665 | 784 | 10MNIST_2 | 7689 | 784 | 10MNIST_3 | 7856 | 784 | 10MNIST_4 | 7507 | 784 | 10MNIST_5 | 6945 | 784 | 10MNIST_6 | 7564 | 784 | 10MNIST_7 | 8023 | 784 | 10MNIST_8 | 7508 | 784 | 10MNIST_9 | 7654 | 784 | 10
d
Algorithms for Speeding up Distance-Based Outlier Detection
catalog.data.gov
data.nasa.gov
+1more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Algorithms for Speeding up Distance-Based Outlier Detection [Dataset]. https://catalog.data.gov/dataset/algorithms-for-speeding-up-distance-based-outlier-detection
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
The problem of distance-based outlier detection is difficult to solve efficiently in very large datasets because of potential quadratic time complexity. We address this problem and develop sequential and distributed algorithms that are significantly more efficient than state-of-the-art methods while still guaranteeing the same outliers. By combining simple but effective indexing and disk block accessing techniques, we have developed a sequential algorithm iOrca that is up to an order-of-magnitude faster than the state-of-the-art. The indexing scheme is based on sorting the data points in order of increasing distance from a fixed reference point and then accessing those points based on this sorted order. To speed up the basic outlier detection technique, we develop two distributed algorithms (DOoR and iDOoR) for modern distributed multi-core clusters of machines, connected on a ring topology. The first algorithm passes data blocks from each machine around the ring, incrementally updating the nearest neighbors of the points passed. By maintaining a cutoff threshold, it is able to prune a large number of points in a distributed fashion. The second distributed algorithm extends this basic idea with the indexing scheme discussed earlier. In our experiments, both distributed algorithms exhibit significant improvements compared to the state-of-the-art distributed methods.
f
Data from: A Diagnostic Procedure for Detecting Outliers in Linear...
tandf.figshare.com
figshare.com
txt
Updated Feb 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dongjun You; Michael Hunter; Meng Chen; Sy-Miin Chow (2024). A Diagnostic Procedure for Detecting Outliers in Linear State–Space Models [Dataset]. http://doi.org/10.6084/m9.figshare.12162075.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12162075.v1
Dataset updated
Feb 9, 2024
Dataset provided by
Taylor & Francis
Authors
Dongjun You; Michael Hunter; Meng Chen; Sy-Miin Chow
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Outliers can be more problematic in longitudinal data than in independent observations due to the correlated nature of such data. It is common practice to discard outliers as they are typically regarded as a nuisance or an aberration in the data. However, outliers can also convey meaningful information concerning potential model misspecification, and ways to modify and improve the model. Moreover, outliers that occur among the latent variables (innovative outliers) have distinct characteristics compared to those impacting the observed variables (additive outliers), and are best evaluated with different test statistics and detection procedures. We demonstrate and evaluate the performance of an outlier detection approach for multi-subject state-space models in a Monte Carlo simulation study, with corresponding adaptations to improve power and reduce false detection rates. Furthermore, we demonstrate the empirical utility of the proposed approach using data from an ecological momentary assessment study of emotion regulation together with an open-source software implementation of the procedures.
Data for Filtering Organized 3D Point Clouds for Bin Picking Applications
datasets.ai
catalog.data.gov
0, 34, 47
Updated Aug 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institute of Standards and Technology (2024). Data for Filtering Organized 3D Point Clouds for Bin Picking Applications [Dataset]. https://datasets.ai/datasets/data-for-filtering-organized-3d-point-clouds-for-bin-picking-applications
Explore at:
0, 34, 47Available download formats
Dataset updated
Aug 6, 2024
Dataset authored and provided by
National Institute of Standards and Technologyhttp://www.nist.gov/
Description
Contains scans of a bin filled with different parts ( screws, nuts, rods, spheres, sprockets). For each part type, RGB image and organized 3D point cloud obtained with structured light sensor are provided. In addition, unorganized 3D point cloud representing an empty bin and a small Matlab script to read the files is also provided. 3D data contain a lot of outliers and the data were used to demonstrate a new filtering technique.
d
Data from: Privacy Preserving Outlier Detection through Random Nonlinear...
catalog.data.gov
data.nasa.gov
+2more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Privacy Preserving Outlier Detection through Random Nonlinear Data Distortion [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-outlier-detection-through-random-nonlinear-data-distortion
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
Consider a scenario in which the data owner has some private/sensitive data and wants a data miner to access it for studying important patterns without revealing the sensitive information. Privacy preserving data mining aims to solve this problem by randomly transforming the data prior to its release to data miners. Previous work only considered the case of linear data perturbations — additive, multiplicative or a combination of both for studying the usefulness of the perturbed output. In this paper, we discuss nonlinear data distortion using potentially nonlinear random data transformation and show how it can be useful for privacy preserving anomaly detection from sensitive datasets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy by using standard definitions. The highlight of this approach is to allow a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that for specific cases it is distance preserving. A main contribution of this paper is the discussion between the invertibility of a transformation and privacy preservation and the application of these techniques to outlier detection. Experiments conducted on real-life datasets demonstrate the effectiveness of the approach.
f
Timings and statistical data of point model by our method.
plos.figshare.com
xls
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xiaojuan Ning; Fan Li; Ge Tian; Yinghui Wang (2023). Timings and statistical data of point model by our method. [Dataset]. http://doi.org/10.1371/journal.pone.0201280.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0201280.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Xiaojuan Ning; Fan Li; Ge Tian; Yinghui Wang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Timings and statistical data of point model by our method.
d
Data from: Distributed Anomaly Detection using 1-class SVM for Vertically...
catalog.data.gov
data.nasa.gov
+2more
Updated Apr 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data [Dataset]. https://catalog.data.gov/dataset/distributed-anomaly-detection-using-1-class-svm-for-vertically-partitioned-data
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amount of flight operational data is downloaded for different commercial airlines. These different types of datasets need to be analyzed for finding outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).
s
Outlier Set Two-step Method (OSTI)
orda.shef.ac.uk
application/x-rar
Updated Mar 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Amal Sarfraz; Abigail Birnbaum; Flannery Dolan; Jonathan Lamontagne; Lyudmila Mihaylova; Charles Rouge (2025). Outlier Set Two-step Method (OSTI) [Dataset]. http://doi.org/10.15131/shef.data.28227974.v2
Explore at:
application/x-rarAvailable download formats
Unique identifier
https://doi.org/10.15131/shef.data.28227974.v2
Dataset updated
Mar 25, 2025
Dataset provided by
The University of Sheffield
Authors
Amal Sarfraz; Abigail Birnbaum; Flannery Dolan; Jonathan Lamontagne; Lyudmila Mihaylova; Charles Rouge
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These files are supplements to the paper titled 'A Robust Two-step Method for Detection of Outlier Sets'.
t
Outlier Detection on Sensor Data - Dataset - LDM
service.tib.eu
Updated Dec 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Outlier Detection on Sensor Data - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/outlier-detection-on-sensor-data
Explore at:
Dataset updated
Dec 2, 2024
Description
The dataset used for outlier detection on sensor data from temperature and humidity sensors deployed in sensorized farms and manufacturing units on Purdue University's campus.
f
Data from: Outlier detection in cylindrical data based on Mahalanobis...
tandf.figshare.com
text/x-tex
Updated Jan 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Prashant S. Dhamale; Akanksha S. Kashikar (2025). Outlier detection in cylindrical data based on Mahalanobis distance [Dataset]. http://doi.org/10.6084/m9.figshare.24092089.v1
Explore at:
text/x-texAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24092089.v1
Dataset updated
Jan 2, 2025
Dataset provided by
Taylor & Francis
Authors
Prashant S. Dhamale; Akanksha S. Kashikar
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cylindrical data are bivariate data formed from the combination of circular and linear variables. Identifying outliers is a crucial step in any data analysis work. This paper proposes a new distribution-free procedure to detect outliers in cylindrical data using the Mahalanobis distance concept. The use of Mahalanobis distance incorporates the correlation between the components of the cylindrical distribution, which had not been accounted for in the earlier papers on outlier detection in cylindrical data. The threshold for declaring an observation to be an outlier can be obtained via parametric or non-parametric bootstrap, depending on whether the underlying distribution is known or unknown. The performance of the proposed method is examined via extensive simulations from the Johnson-Wehrly distribution. The proposed method is applied to two real datasets, and the outliers are identified in those datasets.
d
Supporting data for \"A Standard Operating Procedure for Outlier Removal in...
search.dataone.org
dataverse.no
+1more
Updated Jul 29, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Holsbø, Einar (2024). Supporting data for \"A Standard Operating Procedure for Outlier Removal in Large-Sample Epidemiological Transcriptomics Datasets\" [Dataset]. http://doi.org/10.18710/FGVLKS
Explore at:
Unique identifier
https://doi.org/10.18710/FGVLKS
Dataset updated
Jul 29, 2024
Dataset provided by
DataverseNO
Authors
Holsbø, Einar
Description
This dataset is example data from the Norwegian Women and Cancer study. It is supporting information to our article "A Standard Operating Procedure for Outlier Removal in Large-Sample Epidemiological Transcriptomics Datasets." (In submission) The bulk of the data comes from measuring gene expression in blood samples from the Norwegian Women and Cancer study (NOWAC) on Illumina Whole-Genome Gene Expression Bead Chips, HumanHT-12 v4. Please see README.txt for details
Multi-Domain Outlier Detection Dataset
zenodo.org
explore.openaire.eu
+1more
zip
Updated Mar 31, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hannah Kerner; Hannah Kerner; Umaa Rebbapragada; Umaa Rebbapragada; Kiri Wagstaff; Kiri Wagstaff; Steven Lu; Bryce Dubayah; Eric Huff; Raymond Francis; Jake Lee; Vinay Raman; Sakshum Kulshrestha; Steven Lu; Bryce Dubayah; Eric Huff; Raymond Francis; Jake Lee; Vinay Raman; Sakshum Kulshrestha (2022). Multi-Domain Outlier Detection Dataset [Dataset]. http://doi.org/10.5281/zenodo.5941339
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.5941339
Dataset updated
Mar 31, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Hannah Kerner; Hannah Kerner; Umaa Rebbapragada; Umaa Rebbapragada; Kiri Wagstaff; Kiri Wagstaff; Steven Lu; Bryce Dubayah; Eric Huff; Raymond Francis; Jake Lee; Vinay Raman; Sakshum Kulshrestha; Steven Lu; Bryce Dubayah; Eric Huff; Raymond Francis; Jake Lee; Vinay Raman; Sakshum Kulshrestha
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Multi-Domain Outlier Detection Dataset contains datasets for conducting outlier detection experiments for four different application domains:

Astrophysics - detecting anomalous observations in the Dark Energy Survey (DES) catalog (data type: feature vectors)

Planetary science - selecting novel geologic targets for follow-up observation onboard the Mars Science Laboratory (MSL) rover (data type: grayscale images)

Earth science: detecting anomalous samples in satellite time series corresponding to ground-truth observations of maize crops (data type: time series/feature vectors)

Fashion-MNIST/MNIST: benchmark task to detect anomalous MNIST images among Fashion-MNIST images (data type: grayscale images)

Each dataset contains a "fit" dataset (used for fitting or training outlier detection models), a "score" dataset (used for scoring samples used to evaluate model performance, analogous to test set), and a label dataset (indicates whether samples in the score dataset are considered outliers or not in the domain of each dataset).

To read more about the datasets and how they are used for outlier detection, or to cite this dataset in your own work, please see the following citation:

Kerner, H. R., Rebbapragada, U., Wagstaff, K. L., Lu, S., Dubayah, B., Huff, E., Lee, J., Raman, V., and Kulshrestha, S. (2022). Domain-agnostic Outlier Ranking Algorithms (DORA)-A Configurable Pipeline for Facilitating Outlier Detection in Scientific Datasets. Under review for Frontiers in Astronomy and Space Sciences.
d
Anomaly Detection in Sequences
catalog.data.gov
datasets.ai
+3more
Updated Apr 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Anomaly Detection in Sequences [Dataset]. https://catalog.data.gov/dataset/anomaly-detection-in-sequences
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms we present are general and domain-independent, we focus on a specific problem that is critical to determining system-wide health of a fleet of aircraft. The approach taken uses unsupervised clustering of sequences using the normalized length of he longest common subsequence (nLCS) as a similarity measure, followed by a detailed analysis of outliers to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from a cluster. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithm provides a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. The final section of the paper demonstrates the effectiveness of sequenceMiner for anomaly detection on a real set of discrete sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard HiddenMarkov Models, and show that our methods are superior
R
Vision Based Building Energy Data Outlier Detection Dataset
universe.roboflow.com
zip
Updated Apr 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
energy data outlier detection (2024). Vision Based Building Energy Data Outlier Detection Dataset [Dataset]. https://universe.roboflow.com/energy-data-outlier-detection/vision-based-building-energy-data-outlier-detection/model/5
Explore at:
zipAvailable download formats
Dataset updated
Apr 3, 2024
Dataset authored and provided by
energy data outlier detection
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
11785 Bounding Boxes
Description
Vision Based Building Energy Data Outlier Detection

## Overview Vision Based Building Energy Data Outlier Detection is a dataset for object detection tasks - it contains 11785 annotations for 2,159 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).

Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

technavio.com

Updated Oct 11, 2022

Facebook

Twitter

Click to copy link

Link copied

Cite

Technavio (2022). Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Spain, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/anomaly-detection-market-industry-analysis

Explore at:

Dataset updated

Oct 11, 2022

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

Canada, Mexico, United States, Germany, Global

Description

Snapshot img

Anomaly Detection Market Size 2025-2029

The anomaly detection market size is forecast to increase by USD 4.44 billion at a CAGR of 14.4% between 2024 and 2029.

The market is experiencing significant growth, particularly in the BFSI sector, as organizations increasingly prioritize identifying and addressing unusual patterns or deviations from normal business operations. The rising incidence of internal threats and cyber frauds necessitates the implementation of advanced anomaly detection tools to mitigate potential risks and maintain security. However, implementing these solutions comes with challenges, primarily infrastructural requirements. Ensuring compatibility with existing systems, integrating new technologies, and training staff to effectively utilize these tools pose significant hurdles for organizations.
Despite these challenges, the potential benefits of anomaly detection, such as improved risk management, enhanced operational efficiency, and increased security, make it an essential investment for businesses seeking to stay competitive and agile in today's complex and evolving threat landscape. Companies looking to capitalize on this market opportunity must carefully consider these challenges and develop strategies to address them effectively. Cloud computing is a key trend in the market, as cloud-based solutions offer quick deployment, flexibility, and scalability.

What will be the Size of the Anomaly Detection Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
Request Free Sample

In the dynamic and evolving market, advanced technologies such as resource allocation, linear regression, pattern recognition, and support vector machines are increasingly being adopted for automated decision making. Businesses are leveraging these techniques to enhance customer experience through behavioral analytics, object detection, and sentiment analysis. Machine learning algorithms, including random forests, naive Bayes, decision trees, clustering algorithms, and k-nearest neighbors, are essential tools for risk management and compliance monitoring. AI-powered analytics, time series forecasting, and predictive modeling are revolutionizing business intelligence, while process optimization is achieved through the application of decision support systems, natural language processing, and predictive analytics.
Computer vision, image recognition, logistic regression, and operational efficiency are key areas where principal component analysis and artificial neural networks contribute significantly. Speech recognition and operational efficiency are also benefiting from these advanced technologies, enabling businesses to streamline processes and improve overall performance.

How is this Anomaly Detection Industry segmented?

The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Deployment

  Cloud
  On-premises


Component

  Solution
  Services


End-user

  BFSI
  IT and telecom
  Retail and e-commerce
  Manufacturing
  Others


Technology

  Big data analytics
  AI and ML
  Data mining and business intelligence


Geography

  North America

    US
    Canada
    Mexico


  Europe

    France
    Germany
    Spain
    UK


  APAC

    China
    India
    Japan


  Rest of World (ROW)

By Deployment Insights

The cloud segment is estimated to witness significant growth during the forecast period. The market is witnessing significant growth due to the increasing adoption of advanced technologies such as machine learning models, statistical methods, and real-time monitoring. These technologies enable the identification of anomalous behavior in real-time, thereby enhancing network security and data privacy. Anomaly detection algorithms, including unsupervised learning, reinforcement learning, and deep learning networks, are used to identify outliers and intrusions in large datasets. Data security is a major concern, leading to the adoption of data masking, data pseudonymization, data de-identification, and differential privacy.

Data leakage prevention and incident response are critical components of an effective anomaly detection system. False positive and false negative rates are essential metrics to evaluate the performance of these systems. Time series analysis and concept drift are important techniques used in anomaly detection. Data obfuscation, data suppression, and data aggregation are other strategies employed to maintain data privacy. Companies such as Anodot, Cisco Systems Inc, IBM Corp, and SAS Institute Inc offer both cloud-based and on-premises anomaly detection solutions. These solutions use v

i
Fifth Generation Wireless Channels Outlier Detection and Clustering
ieee-dataport.org
Updated May 27, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jojo Blanza (2024). Fifth Generation Wireless Channels Outlier Detection and Clustering [Dataset]. https://ieee-dataport.org/documents/fifth-generation-wireless-channels-outlier-detection-and-clustering
Explore at:
Dataset updated
May 27, 2024
Authors
Jojo Blanza
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
lower latency
d
Data from: Detection of outlier loci and their utility for fisheries...
search.dataone.org
borealisdata.ca
+1more
Updated Mar 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Russello, Michael A; Kirk, Stephanie L; Frazer, Karen K; Askey, Paul J (2024). Data from: Detection of outlier loci and their utility for fisheries management [Dataset]. http://doi.org/10.5683/SP2/C3ICAM
Explore at:
Unique identifier
https://doi.org/10.5683/SP2/C3ICAM
Dataset updated
Mar 16, 2024
Dataset provided by
Borealis
Authors
Russello, Michael A; Kirk, Stephanie L; Frazer, Karen K; Askey, Paul J
Description
AbstractGenetics-based approaches have informed fisheries management for decades, yet remain challenging to implement within systems involving recently diverged stocks or where gene flow persists. In such cases, genetic markers exhibiting locus-specific (“outlier”) effects associated with divergent selection may provide promising alternatives to loci that reflect genome-wide (“neutral”) effects for guiding fisheries management. Okanagan Lake kokanee (Oncorhynchus nerka), a fishery of conservation concern, exhibits two sympatric ecotypes adapted to different reproductive environments, however, previous research demonstrated the limited utility of neutral microsatellites for assigning individuals. Here, we investigated the efficacy of an outlier-based approach to fisheries management by screening >11,000 expressed sequence tags for linked microsatellites and conducting genomic scans for kokanee sampled across seven spawning sites. We identified eight outliers among 52 polymorphic loci that detected ecotype-level divergence, whereas there was no evidence of divergence at neutral loci. Outlier loci exhibited the highest self-assignment accuracy to ecotype (92.1%), substantially outperforming 44 neutral loci (71.8%). Results were robust among-sampling years, with assignment and mixed composition estimates for individuals sampled in 2010 mirroring baseline results. Overall, outlier loci constitute promising alternatives for informing fisheries management involving recently diverged stocks, with potential applications for designating management units across a broad range of taxa., Usage notesOkanagan_Lake_kokanee_microsatellite_dataLength, in base-pairs, of alleles at up to 52 EST-linked and non-EST-linked microsatellite loci in 164 individual kokanee (Oncorhynchus nerka) sampled at seven spawning sites across Okanagan Lake, British Columbia over two sampling years (2007 and 2010). File in GenAlEx format with missing data coded as 0. Data collected with funds from NSERC, Habitat Conservation Trust Fund and Northwest Scientific Association.
f
Data from: Methodology to filter out outliers in high spatial density data...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken (2023). Methodology to filter out outliers in high spatial density data to improve maps reliability [Dataset]. http://doi.org/10.6084/m9.figshare.14305658.v1
Explore at:
jpegAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14305658.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO journals
Authors
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT The considerable volume of data generated by sensors in the field presents systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets, determine whether the developed filter process could help decrease the nugget effect and improve the spatial variability characterization of high sampling data. We created a filter composed of a global, anisotropic, and an anisotropic local analysis of data, which considered the respective neighborhood values. For that purpose, we used the median to classify a given spatial point into the data set as the main statistical parameter and took into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and the sensor vegetation index (SVI) in sugarcane. The results showed an improvement in accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85 %, 97 %, and 79 % in corn yield, soil ECa, and SVI respectively, compared to interpolation errors of raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects, reducing estimation error of the interpolated data. The methodology proposed in this work had a better performance in removing outlier data when compared to two other methodologies from the literature.
Data from: Outlier classification using autoencoders: application for...
osti.gov
Updated Jun 2, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center (2021). Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas [Dataset]. http://doi.org/10.7910/DVN/SKEHRJ
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SKEHRJ
Dataset updated
Jun 2, 2021
Dataset provided by
Office of Sciencehttp://www.er.doe.gov/
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center
Description
Understanding the statistics of fluctuation driven flows in the boundary layer of magnetically confined plasmas is desired to accurately model the lifetime of the vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allow us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs for cases in which the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid and invalid, we find that such a classification may be ambiguous for up to 40% of data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that mostly characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases in the analysis. By removing the outliers that are identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are between approximately 5% and 15% lower as when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.
g
ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...
elki-project.github.io
data.niaid.nih.gov
+1more
Updated Sep 2, 2011
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Erich Schubert; Arthur Zimek (2011). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. http://doi.org/10.5281/zenodo.6355684
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.6355684
Dataset updated
Sep 2, 2011
Dataset provided by
University of Southern Denmark, Denmark
TU Dortmund University
Authors
Erich Schubert; Arthur Zimek
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The "Amsterdam Library of Object Images" is a collection of 110250 images of 1000 small objects, taken under various light conditions and rotation angles. All objects were placed on a black background. Thus the images are taken under rather uniform conditions, which means there is little uncontrolled bias in the data set (unless mixed with other sources). They do however not resemble a "typical" image collection. The data set has a rather unique property for its size: there are around 100 different images of each object, so it is well suited for clustering. By downsampling some objects it can also be used for outlier detection. For multi-view research, we offer a number of different feature vector sets for evaluating this data set.

Facebook

Twitter

Click to copy link

Link copied

Cite

Giovanni Stilo; Bardh Prenkaj (2024). MNIST dataset for Outliers Detection - [ MNIST4OD ] [Dataset]. http://doi.org/10.6084/m9.figshare.9954986.v2

MNIST dataset for Outliers Detection - [ MNIST4OD ]

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

application/gzipAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.9954986.v2

Dataset updated

May 17, 2024

Dataset provided by

figshare
Figsharehttp://figshare.com/

Authors

Giovanni Stilo; Bardh Prenkaj

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Here we present a dataset, MNIST4OD, of large size (number of dimensions and number of instances) suitable for Outliers Detection task.The dataset is based on the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/).We build MNIST4OD in the following way:To distinguish between outliers and inliers, we choose the images belonging to a digit as inliers (e.g. digit 1) and we sample with uniform probability on the remaining images as outliers such as their number is equal to 10% of that of inliers. We repeat this dataset generation process for all digits. For implementation simplicity we then flatten the images (28 X 28) into vectors.Each file MNIST_x.csv.gz contains the corresponding dataset where the inlier class is equal to x.The data contains one instance (vector) in each line where the last column represents the outlier label (yes/no) of the data point. The data contains also a column which indicates the original image class (0-9).See the following numbers for a complete list of the statistics of each datasets ( Name | Instances | Dimensions | Number of Outliers in % ):MNIST_0 | 7594 | 784 | 10MNIST_1 | 8665 | 784 | 10MNIST_2 | 7689 | 784 | 10MNIST_3 | 7856 | 784 | 10MNIST_4 | 7507 | 784 | 10MNIST_5 | 6945 | 784 | 10MNIST_6 | 7564 | 784 | 10MNIST_7 | 8023 | 784 | 10MNIST_8 | 7508 | 784 | 10MNIST_9 | 7654 | 784 | 10

Clear search

Close search

Google apps

Main menu

MNIST dataset for Outliers Detection - [ MNIST4OD ]

Algorithms for Speeding up Distance-Based Outlier Detection

Data from: A Diagnostic Procedure for Detecting Outliers in Linear...

Data for Filtering Organized 3D Point Clouds for Bin Picking Applications

Data from: Privacy Preserving Outlier Detection through Random Nonlinear...

Timings and statistical data of point model by our method.

Data from: Distributed Anomaly Detection using 1-class SVM for Vertically...

Outlier Set Two-step Method (OSTI)

Outlier Detection on Sensor Data - Dataset - LDM

Data from: Outlier detection in cylindrical data based on Mahalanobis...

Supporting data for \"A Standard Operating Procedure for Outlier Removal in...

Multi-Domain Outlier Detection Dataset

Anomaly Detection in Sequences

Vision Based Building Energy Data Outlier Detection Dataset

Vision Based Building Energy Data Outlier Detection

Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

Snapshot img

Fifth Generation Wireless Channels Outlier Detection and Clustering

Data from: Detection of outlier loci and their utility for fisheries...

Data from: Methodology to filter out outliers in high spatial density data...

Data from: Outlier classification using autoencoders: application for...

ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...

MNIST dataset for Outliers Detection - [ MNIST4OD ]