17 datasets found

The pseudocode of the isolated Forests.
plos.figshare.com
xls
Updated Mar 10, 2025
Cite
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). The pseudocode of the isolated Forests. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315322.t002
Dataset updated
Mar 10, 2025
Dataset provided by
PLOS ONE
Authors
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Anomaly detection is widely used in cold chain logistics (CCL). However, because of high costs and technical problems, anomaly detection performance is poor and anomalies cannot be detected in time, which affects the quality of goods. To address these problems, the paper presents a new anomaly detection scheme for CCL. First, the characteristics of the data collected in CCL are analyzed, a mathematical model of the data flow is established, and the sliding window and correlation coefficient are defined. Then the abnormal events in CCL are summarized, and three types of abnormality conditions based on the correlation coefficient ρjk are derived. A measurement anomaly detection algorithm based on an improved isolation forest algorithm is proposed, in which subsampling and a cross factor are designed to overcome the shortcomings of the isolation forest algorithm (iForest). Experiments show that as the dimensionality of the data increases, the new scheme's performance indicators, namely P (precision), R (recall), F1 score, and AUC (area under the curve), become increasingly superior to those of commonly used support vector machines (SVM), local outlier factor (LOF), and iForest. Its average P is 0.8784, average R is 0.8731, average F1 score is 0.8639, and average AUC is 0.9064. However, the execution time of the improved algorithm is slightly longer than that of iForest.
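As a rough illustration of the kind of procedure an isolation-forest pseudocode table describes, the following is a minimal sketch of the standard iForest of Liu, Ting and Zhou (2008). It is not the authors' improved variant: the subsampling and cross-factor refinements from the abstract are not modeled. Each tree isolates points with random axis-aligned splits, and points isolated after fewer splits get higher anomaly scores. The function names, the defaults of 100 trees and a subsample of 256, and the synthetic data are assumptions chosen for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points,
    used to normalise iForest path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_itree(points, depth, max_depth, rng):
    """Split on a random attribute at a random value until each point is
    isolated or the height limit is reached."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # a constant attribute cannot separate these points
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_itree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_itree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Edges traversed to reach a leaf, plus c(size) for unresolved leaves."""
    if "dim" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_iforest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees on random subsamples (needs >= 2 points)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_itree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """s(x, psi) = 2 ** (-E[h(x)] / c(psi)); values near 1 suggest an anomaly."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c(psi))


# Usage on synthetic 2-D data: 200 Gaussian points plus one far-away point.
gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(6.0, 6.0)]
trees, psi = fit_iforest(data)
outlier_score = anomaly_score((6.0, 6.0), trees, psi)
center_score = anomaly_score((0.0, 0.0), trees, psi)
print(f"outlier: {outlier_score:.3f}, center: {center_score:.3f}")
```

For production use, scikit-learn's `sklearn.ensemble.IsolationForest` provides a tested implementation of the standard algorithm.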
Feature importance calculated by Random Forest classifier considering the 80...
figshare.com
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Feature importance calculated by Random Forest classifier considering the 80 features previously selected by Select K Best. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t010
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286311.t010
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Feature importance calculated by Random Forest classifier considering the 80 features previously selected by Select K Best.
Comparative analysis with unsupervised anomaly detection algorithms.
plos.figshare.com
xls
Updated Jun 8, 2023
Cite
Kenichiro Nagata; Toshikazu Tsuji; Kimitaka Suetsugu; Kayoko Muraoka; Hiroyuki Watanabe; Akiko Kanaya; Nobuaki Egashira; Ichiro Ieiri (2023). Comparative analysis with unsupervised anomaly detection algorithms. [Dataset]. http://doi.org/10.1371/journal.pone.0260315.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0260315.t005
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS ONE
Authors
Kenichiro Nagata; Toshikazu Tsuji; Kimitaka Suetsugu; Kayoko Muraoka; Hiroyuki Watanabe; Akiko Kanaya; Nobuaki Egashira; Ichiro Ieiri
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparative analysis with unsupervised anomaly detection algorithms.
Network Intrusion Detection
kaggle.com
Updated Apr 3, 2025
Cite
Şahide ŞEKER (2025). Network Intrusion Detection [Dataset]. https://www.kaggle.com/datasets/sahideseker/network-intrusion-detection
Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 3, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Şahide ŞEKER
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
🇺🇸 English:

This dataset simulates network traffic to help build intrusion detection models. It includes source/destination IPs, protocols, connection durations, and labels for different types of attacks.

Use this dataset to:

Train anomaly detection or classification models

Experiment with imbalanced cybersecurity data

Build intrusion detection systems with ML algorithms like XGBoost or Isolation Forest

Features:

src_ip: Source IP address

dst_ip: Destination IP address

protocol: Network protocol (TCP, UDP, ICMP)

duration: Duration of the connection

attack: Attack type label (e.g., normal, dos, probe, etc.)
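The description suggests experimenting with imbalanced attack labels. One simple, common remedy is to weight each class inversely to its frequency. The minimal sketch below computes n_samples / (n_classes × count) for each value of the `attack` column, the same heuristic scikit-learn applies for `class_weight="balanced"`. The rows are hypothetical placeholders that use the documented column names; they are not records from the dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical rows with the dataset's documented columns (values are made up).
labels = ["normal"] * 6 + ["dos"] * 3 + ["probe"]
rows = [
    {"src_ip": f"10.0.0.{i}", "dst_ip": "10.0.0.254", "protocol": "TCP",
     "duration": 0.5 * i, "attack": label}
    for i, label in enumerate(labels, start=1)
]

weights = balanced_class_weights([row["attack"] for row in rows])
print(weights)  # rare classes such as "probe" receive the largest weight
```

With these weights, every class contributes the same total weight to the training loss, regardless of how many rows it has.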

🇹🇷 Turkish (translated to English):

This dataset was created to enable attack detection on network traffic in the field of cybersecurity. It contains source/destination IPs, protocol, connection duration, and attack-type labels.

With this dataset you can:

Perform anomaly detection on imbalanced data

Develop attack classification algorithms

Test algorithms such as XGBoost and Isolation Forest

Features:

src_ip: Source IP address

dst_ip: Destination IP address

protocol: Network protocol (TCP, UDP, ICMP)

duration: Connection duration

attack: Attack type label (e.g., normal, dos, probe, etc.)

Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

technavio.com

Updated Oct 11, 2022

Cite

Technavio (2022). Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Spain, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/anomaly-detection-market-industry-analysis

Dataset updated

Oct 11, 2022

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

Canada, Mexico, Germany, United States, Global

Description

Anomaly Detection Market Size 2025-2029

The anomaly detection market size is forecast to increase by USD 4.44 billion at a CAGR of 14.4% between 2024 and 2029.

The market is experiencing significant growth, particularly in the BFSI sector, as organizations increasingly prioritize identifying and addressing unusual patterns or deviations from normal business operations. The rising incidence of internal threats and cyber frauds necessitates the implementation of advanced anomaly detection tools to mitigate potential risks and maintain security. However, implementing these solutions comes with challenges, primarily infrastructural requirements. Ensuring compatibility with existing systems, integrating new technologies, and training staff to effectively utilize these tools pose significant hurdles for organizations.
Despite these challenges, the potential benefits of anomaly detection, such as improved risk management, enhanced operational efficiency, and increased security, make it an essential investment for businesses seeking to stay competitive and agile in today's complex and evolving threat landscape. Companies looking to capitalize on this market opportunity must carefully consider these challenges and develop strategies to address them effectively. Cloud computing is a key trend in the market, as cloud-based solutions offer quick deployment, flexibility, and scalability.

What will be the Size of the Anomaly Detection Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

In this dynamic and evolving market, techniques such as linear regression, pattern recognition, and support vector machines are increasingly being adopted for automated decision-making and resource allocation. Businesses are leveraging these techniques to enhance customer experience through behavioral analytics, object detection, and sentiment analysis. Machine learning algorithms, including random forests, naive Bayes, decision trees, clustering algorithms, and k-nearest neighbors, are essential tools for risk management and compliance monitoring. AI-powered analytics, time series forecasting, and predictive modeling are revolutionizing business intelligence, while process optimization is achieved through decision support systems, natural language processing, and predictive analytics.
Principal component analysis and artificial neural networks contribute significantly to computer vision and image recognition, alongside classical models such as logistic regression. Speech recognition also benefits from these technologies, enabling businesses to streamline processes and improve operational efficiency and overall performance.

How is this Anomaly Detection Industry segmented?

The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Deployment

  Cloud
  On-premises


Component

  Solution
  Services


End-user

  BFSI
  IT and telecom
  Retail and e-commerce
  Manufacturing
  Others


Technology

  Big data analytics
  AI and ML
  Data mining and business intelligence


Geography

  North America

    US
    Canada
    Mexico


  Europe

    France
    Germany
    Spain
    UK


  APAC

    China
    India
    Japan


  Rest of World (ROW)

By Deployment Insights

The cloud segment is estimated to witness significant growth during the forecast period. Overall, the market is growing because of the increasing adoption of technologies such as machine learning models, statistical methods, and real-time monitoring, which enable anomalous behavior to be identified in real time and thereby strengthen network security and data privacy. Anomaly detection approaches, including unsupervised learning, reinforcement learning, and deep learning, are used to identify outliers and intrusions in large datasets. Because data security is a major concern, organizations are also adopting data masking, data pseudonymization, data de-identification, and differential privacy.

Data leakage prevention and incident response are critical components of an effective anomaly detection system. False positive and false negative rates are essential metrics for evaluating the performance of these systems. Time series analysis and the handling of concept drift are important considerations in anomaly detection. Data obfuscation, data suppression, and data aggregation are other strategies employed to maintain data privacy. Companies such as Anodot, Cisco Systems Inc, IBM Corp, and SAS Institute Inc offer both cloud-based and on-premises anomaly detection solutions. These solutions use v

coldChainDataA.
figshare.com
bin
Updated Mar 10, 2025
Cite
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). coldChainDataA. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.s001
Available download formats: bin
Unique identifier
https://doi.org/10.1371/journal.pone.0315322.s001
Dataset updated
Mar 10, 2025
Dataset provided by
PLOS ONE
Authors
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Same description as the first entry, "The pseudocode of the isolated Forests" (Xie et al., 2025).
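The CCL abstract defines a sliding window over the sensor data streams and a correlation coefficient ρjk between streams j and k, but this listing does not give the exact formulas. The sketch below assumes ρjk is the ordinary Pearson correlation over the last w samples of two aligned streams; the function names and sensor readings are hypothetical.

```python
import math
from collections import deque


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def sliding_correlation(stream_j, stream_k, w):
    """Return rho_jk for every full window of length w over two aligned streams."""
    win_j, win_k = deque(maxlen=w), deque(maxlen=w)
    result = []
    for a, b in zip(stream_j, stream_k):
        win_j.append(a)
        win_k.append(b)
        if len(win_j) == w:
            result.append(pearson(list(win_j), list(win_k)))
    return result


# Hypothetical readings from two temperature sensors in the same container:
# they track each other until sensor k drifts away in the last samples.
sensor_j = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4]
sensor_k = [2.1, 2.3, 2.5, 2.7, 2.9, 2.0, 1.5, 1.0]
rhos = sliding_correlation(sensor_j, sensor_k, w=4)
for rho in rhos:
    print(round(rho, 3))
```

A sharp fall in ρjk between sensors that normally move together is the kind of signal a correlation-based abnormality condition could test for.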
Using Decision Trees to Detect and Isolate Leaks in the J-2X
catalog.data.gov
s.cnmilf.com
+2 more
Updated Apr 11, 2025
Cite
Dashlink (2025). Using Decision Trees to Detect and Isolate Leaks in the J-2X [Dataset]. https://catalog.data.gov/dataset/using-decision-trees-to-detect-and-isolate-leaks-in-the-j-2x
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine
Authors: Mark Schwabacher, NASA Ames Research Center; Robert Aguilar, Pratt & Whitney Rocketdyne; Fernando Figueroa, NASA Stennis Space Center
Abstract
The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. Decision trees were chosen because they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically “learns” a decision tree by searching the space of possible decision trees for one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne, known as the Detailed Real-Time Model (DRTM), was used to “train” and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, leak locations, and leak onset times. To make the simulations as realistic as possible, they included simulated sensor noise and a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained on 11 of these simulations and tested on the remaining 45. In the training phase, the C4.5 algorithm was given labeled examples of data from nominal operation and of data including leaks at each leak location. From these data, it “learned” a decision tree that can classify unseen data as having no leak or having a leak at one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other.
Introduction
The J-2X rocket engine will be tested on Test Stand A-1 at NASA Stennis Space Center (SSC) in Mississippi. A team including people from SSC, NASA Ames Research Center (ARC), and Pratt & Whitney Rocketdyne (PWR) is developing a prototype end-to-end integrated systems health management (ISHM) system that will be used to monitor the test stand and the engine while the engine is on the test stand [1]. The prototype will use several different methods for detecting and diagnosing faults in the test stand and the engine, including rule-based, model-based, and data-driven approaches. SSC is currently using the G2 tool (http://www.gensym.com) to develop rule-based and model-based fault detection and diagnosis capabilities for the A-1 test stand. This paper describes preliminary results in applying the data-driven approach to detecting and diagnosing faults in the J-2X engine. The conventional approach to detecting and diagnosing faults in complex engineered systems such as rocket engines and test stands is to use large numbers of human experts. Test controllers watch the data in near-real time during each engine test. Engineers study the data after each test. These experts are aided by limit checks that signal when a particular variable goes outside of a predetermined range. The conventional approach is very labor-intensive. Also, humans may not be able to recognize faults that involve the relationships among large numbers of variables. Further, some potential faults could happen too quickly for humans to detect them and react before they become catastrophic. Automated fault detection and diagnosis is therefore needed. One approach to automation is to encode human knowledge into rules or models. Another approach is to use data-driven methods to automatically learn models from historical data or simulated data. Our prototype will combine the data-driven approach with the model-based and rule-based appro
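C4.5, the algorithm named in the abstract, ranks candidate splits by gain ratio: the information gain of a split divided by the split's own entropy (its split information), which penalizes splits that fragment the data. The following is a simplified sketch for a binary threshold split on one continuous sensor channel. It omits the rest of C4.5 (tree growth, pruning, missing values), tries midpoints between distinct values as thresholds for simplicity, and uses hypothetical readings and labels rather than DRTM data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Information gain of the split value <= threshold, divided by split info."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    gain = entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; keep the best gain ratio."""
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(midpoints, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical pressure readings labelled nominal / leak.
pressure = [10.0, 11.0, 12.0, 30.0, 31.0, 32.0]
labels = ["nominal"] * 3 + ["leak"] * 3
threshold = best_threshold(pressure, labels)
print(threshold, gain_ratio(pressure, labels, threshold))  # 21.0 1.0
```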
Confusion matrix for calculating the abnormal detection.
figshare.com
plos.figshare.com
xls
Updated Mar 10, 2025
Cite
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). Confusion matrix for calculating the abnormal detection. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315322.t005
Dataset updated
Mar 10, 2025
Dataset provided by
PLOS ONE
Authors
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Confusion matrix for calculating the abnormal detection.
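Precision (P), recall (R), and F1 score, the indicators reported in the CCL abstract, all follow from confusion-matrix counts: P = TP / (TP + FP), R = TP / (TP + FN), and F1 = 2PR / (P + R). A minimal sketch, assuming label 1 marks an anomaly; the example labels are hypothetical.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus precision, recall and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and detector output (1 = anomaly, 0 = normal).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```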
The comparison of performance indicators of four algorithms with 5 nodes.
figshare.com
plos.figshare.com
xls
Updated Mar 10, 2025
Cite
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). The comparison of performance indicators of four algorithms with 5 nodes. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t007
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315322.t007
Dataset updated
Mar 10, 2025
Dataset provided by
PLOS ONE
Authors
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The comparison of performance indicators of four algorithms with 5 nodes.
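AUC, one of the indicators compared across the four algorithms, can be computed without drawing a ROC curve: it equals the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point, with ties counted as one half. A minimal quadratic-time sketch with hypothetical scores:

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores from one detector (1 = anomaly, 0 = normal).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For large datasets, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` avoids the quadratic pair loop.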
The pseudocode of the length calculation.
plos.figshare.com
xls
Updated Mar 10, 2025
Cite
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). The pseudocode of the length calculation. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315322.t003
Dataset updated
Mar 10, 2025
Dataset provided by
PLOS ONE
Authors
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Same description as the first entry, "The pseudocode of the isolated Forests" (Xie et al., 2025).
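For reference, the path-length calculation and normalization of the standard iForest (Liu, Ting and Zhou, 2008) are shown below; whether the authors' improved algorithm changes them is not stated in this listing. Here e is the number of edges from the root to the external node where x ends, T.size is the number of training points held in that node, n is the subsample size, and H(i) is the i-th harmonic number.

```latex
h(x) = e + c(T.\mathit{size})

c(n) = \begin{cases}
  2H(n-1) - \dfrac{2(n-1)}{n}, & n > 2 \\
  1, & n = 2 \\
  0, & n \le 1
\end{cases}
\qquad H(i) \approx \ln i + 0.5772156649

s(x, n) = 2^{-\mathbb{E}[h(x)] / c(n)}
```

As a worked example, for a subsample of n = 256, H(255) ≈ 5.5413 + 0.5772 = 6.1185, so c(256) ≈ 12.2370 − 1.9922 ≈ 10.2448. A point whose average path length equals c(n) scores s = 0.5; as E[h(x)] approaches 0, s approaches 1 (a likely anomaly), and as E[h(x)] grows toward n − 1, s approaches 0 (a likely normal point).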
Variance of data stream of different sliding window length.
plos.figshare.com
xls
Updated Mar 10, 2025
Cite
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). Variance of data stream of different sliding window length. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315322.t006
Dataset updated
Mar 10, 2025
Dataset provided by
PLOS ONE
Authors
Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Variance of data stream of different sliding window length.
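Window length trades responsiveness against smoothing. As a rough sketch, assuming population variance over each full window (the paper's exact definition is not given in this listing), the example shows a single spike being diluted as the window grows. The stream values are hypothetical.

```python
from collections import deque
from statistics import pvariance


def sliding_variance(stream, w):
    """Population variance of every full window of length w."""
    window = deque(maxlen=w)
    result = []
    for x in stream:
        window.append(x)
        if len(window) == w:
            result.append(pvariance(list(window)))
    return result


# Hypothetical constant temperature stream with a single 5-degree spike.
stream = [4.0] * 20
stream[10] = 9.0
for w in (3, 5, 10):
    # Peak variance shrinks as w grows: about 5.56, 4.0 and 2.25.
    print(w, max(sliding_variance(stream, w)))
```

For a single deviation of size d in an otherwise constant window, the peak is d²(w − 1)/w², so longer windows hide short spikes but are less sensitive to noise.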
Feature scores calculated by f-classification in Select K Best by IMU...
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Feature scores calculated by f-classification in Select K Best by IMU position and sensor. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t009
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286311.t009
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Feature scores calculated by f-classification in Select K Best by IMU position and sensor.
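In scikit-learn, `SelectKBest` with `f_classif` scores each feature by the one-way ANOVA F-statistic across classes and keeps the k highest-scoring features. The following stdlib-only sketch reproduces that scoring idea on a toy example; the feature names (`acc_z`, `gyro_x`), posture labels, and values are hypothetical and are not taken from the study's IMU data.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across class labels."""
    groups = defaultdict(list)
    for v, label in zip(values, labels):
        groups[label].append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf")  # classes separated with zero within-class spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(features, labels, k):
    """Rank named feature columns by F-score and keep the top k."""
    scores = {name: anova_f(col, labels) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical IMU features for two postures.
labels = ["sit", "sit", "stand", "stand"]
features = {
    "acc_z": [1.0, 1.2, 3.0, 3.2],   # differs strongly between postures
    "gyro_x": [1.0, 3.0, 1.0, 3.0],  # same distribution in both postures
}
selected, scores = select_k_best(features, labels, k=1)
print(selected, scores)
```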
Number of observations after feature extraction per dataset per posture.
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Number of observations after feature extraction per dataset per posture. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286311.t005
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Number of observations after feature extraction per dataset per posture.
Classification metrics per posture achieved using the best models selected...
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Classification metrics per posture achieved using the best models selected by grid search in Classifier 3 on the test and golden sets combined. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t018
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286311.t018
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Classification metrics per posture achieved using the best models selected by grid search in Classifier 3 on the test and golden sets combined.
Comparison between Classifier 3 performance and previous studies reporting...
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Comparison between Classifier 3 performance and previous studies reporting inter-subject classification performance metrics per posture. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t019
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286311.t019
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The best geometric means of TPR and TNR (g-mean) are shown in bold.
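The g-mean is the geometric mean of the true positive rate (TPR, sensitivity) and the true negative rate (TNR, specificity), so it rewards classifiers that do well on both classes at once. A minimal sketch with hypothetical confusion-matrix counts:

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr)


# Hypothetical counts for one posture treated as the positive class.
print(round(g_mean(tp=8, fn=2, tn=45, fp=5), 4))  # 0.8485
# A classifier that never predicts the positive class scores 0, however high its TNR.
print(g_mean(tp=0, fn=10, tn=50, fp=0))  # 0.0
```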
Grid search hyper-parameter set for the classifiers.
figshare.com
xls
Updated Jun 21, 2023
Cite
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Grid search hyper-parameter set for the classifiers. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286311.t006
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Grid search hyper-parameter set for the classifiers.
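Grid search evaluates every combination of the listed hyper-parameter values and keeps the best-scoring one. A minimal sketch: the grid values and the `toy_score` function are hypothetical stand-ins for the table's actual settings and for a real validation metric such as a cross-validated g-mean.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively try every combination in param_grid; return the best."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a tree ensemble.
grid = {"max_depth": [3, 5, 7], "n_estimators": [100, 200, 400]}


def toy_score(params):
    """Hypothetical objective peaking at max_depth=5, n_estimators=200."""
    return -(params["max_depth"] - 5) ** 2 - abs(params["n_estimators"] - 200) / 100


best, score = grid_search(grid, toy_score)
print(best, score)  # {'max_depth': 5, 'n_estimators': 200} 0.0
```

The number of evaluations is the product of the list lengths (9 here), which is why grids are usually kept small or replaced by randomized search.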
Data from: Unraveling C-to-U RNA editing events from direct RNA sequencing
tandf.figshare.com
pdf
Updated May 12, 2025
Cite
Adriano Fonzino; Caterina Manzari; Paola Spadavecchia; Uday Munagala; Serena Torrini; Silvestro Conticello; Graziano Pesole; Ernesto Picardi (2025). Unraveling C-to-U RNA editing events from direct RNA sequencing [Dataset]. http://doi.org/10.6084/m9.figshare.24800474.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.24800474.v1
Dataset updated
May 12, 2025
Dataset provided by
Taylor & Francis
Authors
Adriano Fonzino; Caterina Manzari; Paola Spadavecchia; Uday Munagala; Serena Torrini; Silvestro Conticello; Graziano Pesole; Ernesto Picardi
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In mammals, RNA editing events involve the conversion of adenosine (A) to inosine (I) by ADAR enzymes or the hydrolytic deamination of cytosine (C) to uracil (U) by the APOBEC family of enzymes, mostly APOBEC1. RNA editing has a plethora of biological functions, and its deregulation has been associated with various human disorders. While large-scale detection of A-to-I events is fairly straightforward with Illumina RNA-seq technology, identifying C-to-U events is a non-trivial task. This difficulty arises from the rarity of such events in eukaryotic genomes and the challenge of distinguishing them from background noise. Direct RNA sequencing by Oxford Nanopore Technologies (ONT) permits the direct detection of Us on sequenced RNA reads. Surprisingly, using ONT reads from wild-type (WT) and APOBEC1-knockout (KO) murine cell lines, as well as in vitro synthesized RNA without any modification, we identified a systematic error affecting the accuracy of C calls, leading to incorrect identification of C-to-U events. To overcome this issue in direct RNA reads, we introduce a novel machine learning strategy based on the isolation forest (iForest) algorithm, in which C-to-U editing events are treated as sequencing anomalies. Using in vitro synthesized and human ONT reads, our model optimizes the signal-to-noise ratio, improving the detection of C-to-U editing sites with high accuracy (over 90% in all samples tested). Our results suggest that iForest, known for its rapid implementation and minimal memory requirements, is a promising tool to denoise ONT reads and reliably identify RNA modifications.