17 datasets found
  1. The pseudocode of the isolated Forests.

    • plos.figshare.com
    xls
    Updated Mar 10, 2025
    Cite
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). The pseudocode of the isolated Forests. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t002
    Available download formats: xls
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Anomaly detection is widely used in cold chain logistics (CCL). However, high cost and technical limitations lead to poor detection performance, and anomalies cannot be detected in time, which affects the quality of goods. To address these problems, the paper presents a new anomaly detection scheme for CCL. First, the characteristics of the collected CCL data are analyzed, a mathematical model of the data flow is established, and the sliding window and correlation coefficient are defined. Next, the abnormal events in CCL are summarized, and three types of abnormality judgment conditions based on the correlation coefficient ρjk are derived. A measurement anomaly detection algorithm based on an improved isolation forest (iForest) algorithm is then proposed, in which subsampling and a cross factor are designed to overcome the shortcomings of the standard iForest. Experiments show that, as the dimensionality of the data increases, the performance indicators of the new scheme, namely P (precision), R (recall), F1 score, and AUC (area under the curve), become increasingly superior to those of the commonly used support vector machine (SVM), local outlier factor (LOF), and iForest. Its average P is 0.8784, average R is 0.8731, average F1 score is 0.8639, and average AUC is 0.9064. However, the execution time of the improved algorithm is slightly longer than that of iForest.
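
The abstract above names the isolation forest (iForest) as the baseline that the paper improves. As a rough illustration only, the following Python sketch implements the standard iForest scoring rule: random axis-aligned splits, average path length, and the score s = 2^(-E[h]/c(ψ)). It is not the authors' improved algorithm and does not include their cross factor. The function names, the toy data, and the seed are assumptions made for this example; 100 trees and a subsample of 256 are the conventional defaults.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus c(size) for the unbuilt subtree."""
    if "size" in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)

def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two points")
    max_depth = math.ceil(math.log2(psi))
    forest = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
              for _ in range(n_trees)]
    return forest, psi

def anomaly_score(forest, psi, x):
    """Score in (0, 1]; values closer to 1 indicate easier isolation (more anomalous)."""
    mean_depth = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-mean_depth / c(psi))

# Toy data (made up for illustration): a Gaussian cluster plus one far-away point.
rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
data.append((8.0, 8.0))
forest, psi = fit_forest(data)
outlier_score = anomaly_score(forest, psi, (8.0, 8.0))
inlier_score = anomaly_score(forest, psi, (0.0, 0.0))
```

In this sketch the isolated point should receive a noticeably higher score than a point at the centre of the cluster, because fewer random splits are needed to separate it.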

  2. Feature importance calculated by Random Forest classifier considering the 80...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    + more versions
    Cite
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Feature importance calculated by Random Forest classifier considering the 80 features previously selected by Select K Best. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t010
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Feature importance calculated by Random Forest classifier considering the 80 features previously selected by Select K Best.

  3. Comparative analysis with unsupervised anomaly detection algorithms.

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Kenichiro Nagata; Toshikazu Tsuji; Kimitaka Suetsugu; Kayoko Muraoka; Hiroyuki Watanabe; Akiko Kanaya; Nobuaki Egashira; Ichiro Ieiri (2023). Comparative analysis with unsupervised anomaly detection algorithms. [Dataset]. http://doi.org/10.1371/journal.pone.0260315.t005
    Available download formats: xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kenichiro Nagata; Toshikazu Tsuji; Kimitaka Suetsugu; Kayoko Muraoka; Hiroyuki Watanabe; Akiko Kanaya; Nobuaki Egashira; Ichiro Ieiri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparative analysis with unsupervised anomaly detection algorithms.

  4. Network Intrusion Detection

    • kaggle.com
    Updated Apr 3, 2025
    Cite
    Şahide ŞEKER (2025). Network Intrusion Detection [Dataset]. https://www.kaggle.com/datasets/sahideseker/network-intrusion-detection
    Available download formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Şahide ŞEKER
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    🇺🇸 English:

    This dataset simulates network traffic to help build intrusion detection models. It includes source/destination IPs, protocols, connection durations, and labels for different types of attacks.

    Use this dataset to:

    • Train anomaly detection or classification models
    • Experiment with imbalanced cybersecurity data
    • Build intrusion detection systems with ML algorithms like XGBoost or Isolation Forest

    Features:

    • src_ip: Source IP address
    • dst_ip: Destination IP address
    • protocol: Network protocol (TCP, UDP, ICMP)
    • duration: Duration of the connection
    • attack: Attack type label (e.g., normal, dos, probe)
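
To make the feature list concrete, here is a minimal Python sketch of how one row with these five fields might be represented, turned into a numeric vector, and checked for class imbalance. Only the field names come from the dataset description; the `Connection` class, the encoding choices (IP addresses as integers, one-hot protocol), and the sample rows are assumptions made for illustration and do not reflect the actual file contents.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass

PROTOCOLS = ("TCP", "UDP", "ICMP")

@dataclass
class Connection:
    src_ip: str
    dst_ip: str
    protocol: str
    duration: float
    attack: str

def to_features(conn):
    """Encode one connection as [src_ip, dst_ip, tcp, udp, icmp, duration]."""
    one_hot = [1.0 if conn.protocol == p else 0.0 for p in PROTOCOLS]
    return [
        float(int(ipaddress.ip_address(conn.src_ip))),
        float(int(ipaddress.ip_address(conn.dst_ip))),
        *one_hot,
        conn.duration,
    ]

def class_balance(conns):
    """Fraction of rows per attack label, to see how imbalanced the data is."""
    counts = Counter(conn.attack for conn in conns)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical rows, not taken from the dataset.
rows = [
    Connection("10.0.0.1", "10.0.0.9", "TCP", 1.2, "normal"),
    Connection("10.0.0.2", "10.0.0.9", "UDP", 0.3, "normal"),
    Connection("10.0.0.3", "10.0.0.9", "TCP", 2.0, "normal"),
    Connection("10.0.0.4", "10.0.0.9", "ICMP", 0.1, "dos"),
]
balance = class_balance(rows)
features = to_features(rows[0])
```

Vectors of this shape could then be passed to a classifier or an anomaly detector; treating IP addresses as plain integers is a simplification that a real pipeline would likely replace.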

    🇹🇷 Türkçe (English translation):

    This dataset was created to enable attack detection from network traffic in the field of cybersecurity. It contains source/destination IPs, protocol, connection duration, and attack type labels.

    With this dataset, you can:

    • Perform anomaly detection on imbalanced data
    • Develop attack classification algorithms
    • Test algorithms such as XGBoost and Isolation Forest

    Features:

    • src_ip: Source IP address
    • dst_ip: Destination IP address
    • protocol: Network protocol (TCP, UDP, ICMP)
    • duration: Duration of the connection
    • attack: Attack type label (for example normal, dos, probe)
  5. Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North...

    • technavio.com
    Updated Oct 11, 2022
    Cite
    Technavio (2022). Anomaly Detection Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Spain, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/anomaly-detection-market-industry-analysis
    Dataset updated
    Oct 11, 2022
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Canada, Mexico, Germany, United States, Global
    Description


    Anomaly Detection Market Size 2025-2029

    The anomaly detection market size is forecast to increase by USD 4.44 billion at a CAGR of 14.4% between 2024 and 2029.

    The market is experiencing significant growth, particularly in the BFSI sector, as organizations increasingly prioritize identifying and addressing unusual patterns or deviations from normal business operations. The rising incidence of internal threats and cyber frauds necessitates the implementation of advanced anomaly detection tools to mitigate potential risks and maintain security. However, implementing these solutions comes with challenges, primarily infrastructural requirements. Ensuring compatibility with existing systems, integrating new technologies, and training staff to effectively utilize these tools pose significant hurdles for organizations.
    Despite these challenges, the potential benefits of anomaly detection, such as improved risk management, enhanced operational efficiency, and increased security, make it an essential investment for businesses seeking to stay competitive and agile in today's complex and evolving threat landscape. Companies looking to capitalize on this market opportunity must carefully consider these challenges and develop strategies to address them effectively. Cloud computing is a key trend in the market, as cloud-based solutions offer quick deployment, flexibility, and scalability.
    

    What will be the Size of the Anomaly Detection Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

    In the dynamic and evolving market, advanced technologies such as resource allocation, linear regression, pattern recognition, and support vector machines are increasingly being adopted for automated decision making. Businesses are leveraging these techniques to enhance customer experience through behavioral analytics, object detection, and sentiment analysis. Machine learning algorithms, including random forests, naive Bayes, decision trees, clustering algorithms, and k-nearest neighbors, are essential tools for risk management and compliance monitoring. AI-powered analytics, time series forecasting, and predictive modeling are revolutionizing business intelligence, while process optimization is achieved through the application of decision support systems, natural language processing, and predictive analytics.
    Computer vision, image recognition, logistic regression, and operational efficiency are key areas where principal component analysis and artificial neural networks contribute significantly. Speech recognition and operational efficiency are also benefiting from these advanced technologies, enabling businesses to streamline processes and improve overall performance.
    

    How is this Anomaly Detection Industry segmented?

    The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Deployment
      Cloud
      On-premises
    Component
      Solution
      Services
    End-user
      BFSI
      IT and telecom
      Retail and e-commerce
      Manufacturing
      Others
    Technology
      Big data analytics
      AI and ML
      Data mining and business intelligence
    Geography
      North America
        US
        Canada
        Mexico
      Europe
        France
        Germany
        Spain
        UK
      APAC
        China
        India
        Japan
      Rest of World (ROW)

    By Deployment Insights

    The cloud segment is estimated to witness significant growth during the forecast period. The market is witnessing significant growth due to the increasing adoption of advanced technologies such as machine learning models, statistical methods, and real-time monitoring. These technologies enable the identification of anomalous behavior in real-time, thereby enhancing network security and data privacy. Anomaly detection algorithms, including unsupervised learning, reinforcement learning, and deep learning networks, are used to identify outliers and intrusions in large datasets. Data security is a major concern, leading to the adoption of data masking, data pseudonymization, data de-identification, and differential privacy.

    Data leakage prevention and incident response are critical components of an effective anomaly detection system. False positive and false negative rates are essential metrics to evaluate the performance of these systems. Time series analysis and concept drift are important techniques used in anomaly detection. Data obfuscation, data suppression, and data aggregation are other strategies employed to maintain data privacy. Companies such as Anodot, Cisco Systems Inc, IBM Corp, and SAS Institute Inc offer both cloud-based and on-premises anomaly detection solutions. These solutions use v

  6. coldChainDataA.

    • figshare.com
    bin
    Updated Mar 10, 2025
    + more versions
    Cite
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). coldChainDataA. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.s001
    Available download formats: bin
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Anomaly detection is widely used in cold chain logistics (CCL). However, high cost and technical limitations lead to poor detection performance, and anomalies cannot be detected in time, which affects the quality of goods. To address these problems, the paper presents a new anomaly detection scheme for CCL. First, the characteristics of the collected CCL data are analyzed, a mathematical model of the data flow is established, and the sliding window and correlation coefficient are defined. Next, the abnormal events in CCL are summarized, and three types of abnormality judgment conditions based on the correlation coefficient ρjk are derived. A measurement anomaly detection algorithm based on an improved isolation forest (iForest) algorithm is then proposed, in which subsampling and a cross factor are designed to overcome the shortcomings of the standard iForest. Experiments show that, as the dimensionality of the data increases, the performance indicators of the new scheme, namely P (precision), R (recall), F1 score, and AUC (area under the curve), become increasingly superior to those of the commonly used support vector machine (SVM), local outlier factor (LOF), and iForest. Its average P is 0.8784, average R is 0.8731, average F1 score is 0.8639, and average AUC is 0.9064. However, the execution time of the improved algorithm is slightly longer than that of iForest.

  7. Using Decision Trees to Detect and Isolate Leaks in the J-2X

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated Apr 11, 2025
    + more versions
    Cite
    Dashlink (2025). Using Decision Trees to Detect and Isolate Leaks in the J-2X [Dataset]. https://catalog.data.gov/dataset/using-decision-trees-to-detect-and-isolate-leaks-in-the-j-2x
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Full title: Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine

    Mark Schwabacher, NASA Ames Research Center
    Robert Aguilar, Pratt & Whitney Rocketdyne
    Fernando Figueroa, NASA Stennis Space Center

    Abstract

    The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically “learns” a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to “train” and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise and a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it “learned” a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.

    Introduction

    The J-2X rocket engine will be tested on Test Stand A-1 at NASA Stennis Space Center (SSC) in Mississippi. A team including people from SSC, NASA Ames Research Center (ARC), and Pratt & Whitney Rocketdyne (PWR) is developing a prototype end-to-end integrated systems health management (ISHM) system that will be used to monitor the test stand and the engine while the engine is on the test stand[1]. The prototype will use several different methods for detecting and diagnosing faults in the test stand and the engine, including rule-based, model-based, and data-driven approaches. SSC is currently using the G2 tool (http://www.gensym.com) to develop rule-based and model-based fault detection and diagnosis capabilities for the A-1 test stand. This paper describes preliminary results in applying the data-driven approach to detecting and diagnosing faults in the J-2X engine.

    The conventional approach to detecting and diagnosing faults in complex engineered systems such as rocket engines and test stands is to use large numbers of human experts. Test controllers watch the data in near-real time during each engine test. Engineers study the data after each test. These experts are aided by limit checks that signal when a particular variable goes outside of a predetermined range. The conventional approach is very labor intensive. Also, humans may not be able to recognize faults that involve the relationships among large numbers of variables. Further, some potential faults could happen too quickly for humans to detect them and react before they become catastrophic. Automated fault detection and diagnosis is therefore needed. One approach to automation is to encode human knowledge into rules or models. Another approach is to use data-driven methods to automatically learn models from historical data or simulated data. Our prototype will combine the data-driven approach with the model-based and rule-based appro
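
The description says the tree is learned by searching for splits that fit labeled training data and names C4.5 as the algorithm. As a hedged illustration of the kind of entropy-based test such a learner evaluates, the Python sketch below scores candidate thresholds on one numeric sensor with the gain ratio, the criterion C4.5 is known for. It is a simplified single-split example, not the authors' code or a full C4.5 implementation, and the sensor readings and labels are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info

def best_threshold(values, labels):
    """Try each distinct value except the largest as a threshold; keep the best."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))

# Invented readings from a hypothetical pressure sensor.
pressure = [101, 100, 99, 102, 80, 78, 75, 82]
state = ["nominal"] * 4 + ["leak"] * 4
threshold = best_threshold(pressure, state)
```

A full tree learner would repeat this search over every sensor, pick the best attribute, and recurse on each side of the split.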

  8. Confusion matrix for calculating the abnormal detection.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Mar 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). Confusion matrix for calculating the abnormal detection. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t005
    Available download formats: xls
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Confusion matrix for calculating the abnormal detection.
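
For readers unfamiliar with how a confusion matrix yields the precision (P), recall (R), and F1 score reported for this paper, a minimal Python sketch follows. The formulas are the standard definitions; the function name and the example counts are assumptions and are not values from the dataset.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard P, R, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up counts: 8 anomalies caught, 2 false alarms, 2 anomalies missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```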

  9. The comparison of performance indicators of four algorithms with 5 nodes.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Mar 10, 2025
    + more versions
    Cite
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). The comparison of performance indicators of four algorithms with 5 nodes. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t007
    Available download formats: xls
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The comparison of performance indicators of four algorithms with 5 nodes.

  10. The pseudocode of the length calculation.

    • plos.figshare.com
    xls
    Updated Mar 10, 2025
    Cite
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). The pseudocode of the length calculation. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t003
    Available download formats: xls
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Anomaly detection is widely used in cold chain logistics (CCL). However, high cost and technical limitations lead to poor detection performance, and anomalies cannot be detected in time, which affects the quality of goods. To address these problems, the paper presents a new anomaly detection scheme for CCL. First, the characteristics of the collected CCL data are analyzed, a mathematical model of the data flow is established, and the sliding window and correlation coefficient are defined. Next, the abnormal events in CCL are summarized, and three types of abnormality judgment conditions based on the correlation coefficient ρjk are derived. A measurement anomaly detection algorithm based on an improved isolation forest (iForest) algorithm is then proposed, in which subsampling and a cross factor are designed to overcome the shortcomings of the standard iForest. Experiments show that, as the dimensionality of the data increases, the performance indicators of the new scheme, namely P (precision), R (recall), F1 score, and AUC (area under the curve), become increasingly superior to those of the commonly used support vector machine (SVM), local outlier factor (LOF), and iForest. Its average P is 0.8784, average R is 0.8731, average F1 score is 0.8639, and average AUC is 0.9064. However, the execution time of the improved algorithm is slightly longer than that of iForest.

  11. Variance of data stream of different sliding window length.

    • plos.figshare.com
    xls
    Updated Mar 10, 2025
    Cite
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo (2025). Variance of data stream of different sliding window length. [Dataset]. http://doi.org/10.1371/journal.pone.0315322.t006
    Available download formats: xls
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zhibo Xie; Heng Long; Chengyi Ling; Yingjun Zhou; Yan Luo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Variance of data stream of different sliding window length.
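
As a rough sketch of what computing the variance of a data stream over a sliding window can look like, the Python snippet below keeps the most recent `window` samples and reports their variance once the window is full. The use of population variance, the function name, and the sample stream are assumptions; the paper's exact definition may differ.

```python
from collections import deque
from statistics import pvariance

def sliding_variance(stream, window):
    """Population variance of each full window of `window` consecutive samples."""
    buf = deque(maxlen=window)  # the oldest sample drops out automatically
    out = []
    for x in stream:
        buf.append(x)
        if len(buf) == window:
            out.append(pvariance(buf))
    return out

# Made-up temperature-like readings with one jump at the end.
readings = [1.0, 1.0, 1.0, 5.0]
short_window = sliding_variance(readings, 2)
long_window = sliding_variance(readings, 4)
```

Comparing the outputs for several window lengths shows the trade-off: short windows react quickly to a jump, while long windows smooth it out.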

  12. Feature scores calculated by f-classification in Select K Best by IMU...

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Feature scores calculated by f-classification in Select K Best by IMU position and sensor. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t009
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Feature scores calculated by f-classification in Select K Best by IMU position and sensor.
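
The title presumably refers to ranking features by a one-way ANOVA F-statistic and keeping the top k, as scikit-learn's `SelectKBest` does with `f_classif`; that reading is an assumption, since the description does not name the library. The plain-Python sketch below computes that F-statistic per feature and selects the best k. The feature names and values are hypothetical.

```python
from collections import defaultdict

def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2
                     for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    # Assumes some within-class spread; ss_within == 0 would divide by zero.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_k_best(columns, labels, k):
    """Names of the k features with the highest F-statistic."""
    scores = {name: anova_f(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical IMU-style features for two postures.
posture = ["sit", "sit", "sit", "stand", "stand", "stand"]
columns = {
    "accel_x": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # separates the classes well
    "gyro_z": [1.0, 9.0, 2.0, 8.0, 3.0, 7.0],   # mostly noise
}
top = select_k_best(columns, posture, k=1)
```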

  13. Number of observations after feature extraction per dataset per posture.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Number of observations after feature extraction per dataset per posture. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t005
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of observations after feature extraction per dataset per posture.

  14. Classification metrics per posture achieved using the best models selected...

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    + more versions
    Cite
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Classification metrics per posture achieved using the best models selected by grid search in Classifier 3 on the test and golden sets combined. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t018
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification metrics per posture achieved using the best models selected by grid search in Classifier 3 on the test and golden sets combined.

  15. Comparison between Classifier 3 performance and previous studies reporting...

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Comparison between Classifier 3 performance and previous studies reporting inter-subject classification performance metrics per posture. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t019
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The best geometric means between TPR and TNR (g-mean) are in bold.
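
The g-mean mentioned here is the geometric mean of the true positive rate (TPR) and true negative rate (TNR). A minimal Python sketch of the standard formula follows; the function name and the example counts are made up for illustration.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of TPR = tp / (tp + fn) and TNR = tn / (tn + fp)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return math.sqrt(tpr * tnr)

# Made-up counts: TPR = 0.9, TNR = 0.4.
score = g_mean(tp=9, fn=1, tn=4, fp=6)
```

Because it multiplies the two rates, the g-mean stays low when either class is classified poorly, which makes it a common choice for imbalanced data.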

  16. Grid search hyper-parameter set for the classifiers.

    • figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin (2023). Grid search hyper-parameter set for the classifiers. [Dataset]. http://doi.org/10.1371/journal.pone.0286311.t006
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Marinara Marcato; Salvatore Tedesco; Conor O’Mahony; Brendan O’Flynn; Paul Galvin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Grid search hyper-parameter set for the classifiers.

  17. Data from: Unraveling C-to-U RNA editing events from direct RNA sequencing

    • tandf.figshare.com
    pdf
    Updated May 12, 2025
    Cite
    Adriano Fonzino; Caterina Manzari; Paola Spadavecchia; Uday Munagala; Serena Torrini; Silvestro Conticello; Graziano Pesole; Ernesto Picardi (2025). Unraveling C-to-U RNA editing events from direct RNA sequencing [Dataset]. http://doi.org/10.6084/m9.figshare.24800474.v1
    Available download formats: pdf
    Dataset updated
    May 12, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Adriano Fonzino; Caterina Manzari; Paola Spadavecchia; Uday Munagala; Serena Torrini; Silvestro Conticello; Graziano Pesole; Ernesto Picardi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In mammals, RNA editing events involve the conversion of adenosine (A) to inosine (I) by ADAR enzymes or the hydrolytic deamination of cytosine (C) to uracil (U) by the APOBEC family of enzymes, mostly APOBEC1. RNA editing has a plethora of biological functions, and its deregulation has been associated with various human disorders. While the large-scale detection of A-to-I events is quite straightforward using Illumina RNAseq technology, the identification of C-to-U events is a non-trivial task. This difficulty arises from the rarity of such events in eukaryotic genomes and the challenge of distinguishing them from background noise. Direct RNA sequencing by Oxford Nanopore Technology (ONT) permits the direct detection of Us on sequenced RNA reads. Surprisingly, using ONT reads from wild-type (WT) and APOBEC1-knock-out (KO) murine cell lines, as well as in vitro synthesized RNA without any modification, we identified a systematic error affecting the accuracy of C calls, thereby leading to incorrect identification of C-to-U events. To overcome this issue in direct RNA reads, here we introduce a novel machine learning strategy based on the Isolation Forest (iForest) algorithm, in which C-to-U editing events are considered sequencing anomalies. Using in vitro synthesized and human ONT reads, our model optimizes the signal-to-noise ratio, improving the detection of C-to-U editing sites with high accuracy, over 90% in all samples tested. Our results suggest that iForest, known for its rapid implementation and minimal memory requirements, is a promising tool to denoise ONT reads and reliably identify RNA modifications.
