This chapter presents theoretical and practical aspects associated with the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current estimate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of estimating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: a prediction step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows the probability of failure at future time instants (the RUL PDF) to be estimated in real time, providing information about time-to-failure (TTF) expectations, statistical confidence intervals, and long-term predictions, using for this purpose empirical knowledge about critical conditions for the system (also referred to as hazard zones). This information is of paramount significance for improving system reliability and the cost-effective operation of critical assets, as has been shown in a case study where feedback correction strategies (based on uncertainty measures) were implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feedback loop is implemented using simple linear relationships, it is helpful for providing quick insight into the manner in which the system reacts to changes in its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian PDFs, since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation.
Real data from a fault-seeded test showed that the proposed framework was able to anticipate the modifications to the system input needed to lengthen its RUL. Results of this test indicate that the method successfully suggested the correction that the system required. In this sense, future work will focus on the development and testing of similar strategies using different input-output uncertainty metrics.
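The prediction/update cycle described above can be sketched with a minimal bootstrap particle filter. The scalar growth model, noise levels, and measurement sequence below are hypothetical placeholders for illustration only, not the chapter's actual crack-growth model:

```python
import math
import random

random.seed(0)

def gaussian_pdf(x, mu, sigma):
    """Gaussian likelihood used to weight particles against a measurement."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def particle_filter_step(particles, measurement, growth_rate=0.1,
                         process_sigma=0.05, meas_sigma=0.2):
    # Prediction step: propagate each particle through the (assumed) process model.
    predicted = [p + growth_rate + random.gauss(0, process_sigma) for p in particles]
    # Update step: weight particles by the likelihood of the new measurement.
    weights = [gaussian_pdf(measurement, p, meas_sigma) for p in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample to obtain an equally weighted posterior particle cloud.
    return random.choices(predicted, weights=weights, k=len(predicted))

# Usage: track a hypothetical fault indicator over a few measurements.
particles = [random.gauss(1.0, 0.1) for _ in range(500)]
for y in [1.1, 1.25, 1.32, 1.45]:
    particles = particle_filter_step(particles, y)
estimate = sum(particles) / len(particles)
```

In a prognostics setting, the resampled cloud would then be propagated forward without measurements until it crosses a hazard zone, which yields the RUL PDF.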
Data Science Platform Market Size 2025-2029
The data science platform market size is projected to increase by USD 763.9 million, at a CAGR of 40.2% from 2024 to 2029. Integration of AI and ML technologies with data science platforms will drive this growth.
Major Market Trends & Insights
North America dominated the market and is expected to account for 48% of growth during the forecast period.
By Deployment - On-premises segment was valued at USD 38.70 million in 2023
By Component - Platform segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 1.00 million
Market Future Opportunities: USD 763.90 million
CAGR: 40.2%
North America: Largest market in 2023
Market Summary
The market represents a dynamic and continually evolving landscape, underpinned by advancements in core technologies and applications. Key technologies, such as machine learning and artificial intelligence, are increasingly integrated into data science platforms to enhance predictive analytics and automate data processing. Additionally, the emergence of containerization and microservices in data science platforms enables greater flexibility and scalability. However, the market also faces challenges, including data privacy and security risks, which necessitate robust compliance with regulations.
According to recent estimates, the market is expected to account for over 30% of the overall big data analytics market by 2025, underscoring its growing importance in the data-driven business landscape.
What will be the Size of the Data Science Platform Market during the forecast period?
How is the Data Science Platform Market Segmented and what are the key trends of market segmentation?
The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment
On-premises
Cloud
Component
Platform
Services
End-user
BFSI
Retail and e-commerce
Manufacturing
Media and entertainment
Others
Sector
Large enterprises
SMEs
Application
Data Preparation
Data Visualization
Machine Learning
Predictive Analytics
Data Governance
Others
Geography
North America
US
Canada
Europe
France
Germany
UK
Middle East and Africa
UAE
APAC
China
India
Japan
South America
Brazil
Rest of World (ROW)
By Deployment Insights
The on-premises segment is estimated to witness significant growth during the forecast period.
In this dynamic and evolving market, big data processing is a key focus, enabling advanced model accuracy metrics through various data mining methods. Distributed computing and algorithm optimization are integral components, ensuring efficient handling of large datasets. Data governance policies are crucial for managing data security protocols and ensuring data lineage tracking. Software development kits, model versioning, and anomaly detection systems facilitate seamless development, deployment, and monitoring of predictive modeling techniques, including machine learning algorithms, regression analysis, and statistical modeling. Real-time data streaming and parallelized algorithms enable real-time insights, while predictive modeling techniques and machine learning algorithms drive business intelligence and decision-making.
Cloud computing infrastructure, data visualization tools, high-performance computing, and database management systems support scalable data solutions and efficient data warehousing. ETL processes and data integration pipelines ensure data quality assessment and feature engineering techniques. Clustering techniques and natural language processing are essential for advanced data analysis. The market is witnessing significant growth, with adoption increasing by 18.7% in the past year, and industry experts anticipate a further expansion of 21.6% in the upcoming period. Companies across various sectors are recognizing the potential of data science platforms, leading to a surge in demand for scalable, secure, and efficient solutions.
API integration services and deep learning frameworks are gaining traction, offering advanced capabilities and seamless integration with existing systems. Data security protocols and model explainability methods are becoming increasingly important, ensuring transparency and trust in data-driven decision-making. The market is expected to continue unfolding, with ongoing advancements in technology and evolving business needs shaping its future trajectory.
The On-premises segment was valued at USD 38.70 million in 2019.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As an important technique for data pre-processing, outlier detection plays a crucial role in various real applications and has gained substantial attention, especially in medical fields. Despite the importance of outlier detection, many existing methods are vulnerable to the distribution of outliers and require prior knowledge, such as the outlier proportion. To address this problem to some extent, this article proposes an adaptive mini-minimum spanning tree-based outlier detection (MMOD) method, which utilizes a novel distance measure by scaling the Euclidean distance. For datasets containing different densities and taking on different shapes, our method can identify outliers without prior knowledge of outlier percentages. The results on both real-world medical data corpora and intuitive synthetic datasets demonstrate the effectiveness of the proposed method compared to state-of-the-art methods.
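The core minimum-spanning-tree intuition behind such methods can be illustrated in a few lines. This sketch uses plain Euclidean distances and a fixed outlier count, unlike the adaptive scaled distance measure the article proposes:

```python
import math

def mst_edges(points):
    """Prim's algorithm: return the edges of a Euclidean minimum spanning tree."""
    n = len(points)
    dist = lambda a, b: math.dist(points[a], points[b])
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the cheapest edge connecting the tree to a point outside it.
        u, v = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist(*e))
        edges.append((u, v, dist(u, v)))
        in_tree.add(v)
    return edges

def mst_outliers(points, n_out=1):
    """Flag the points attached to the tree by the longest edges --
    the basic intuition behind MST-based outlier detection."""
    edges = sorted(mst_edges(points), key=lambda e: -e[2])
    return [v for _, v, _ in edges[:n_out]]

# Usage: a tight cluster plus one far-away point.
pts = [(0, 0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (5.0, 5.0)]
print(mst_outliers(pts))  # → [4]
```

The article's contribution lies precisely in replacing the raw Euclidean edge lengths with a scaled distance so that no outlier proportion needs to be supplied in advance.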
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cotton fiber development remains an intriguing question for understanding fiber commitment and development. At different fiber developmental stages, many genes change their expression pattern and play a pivotal role in fiber quality and yield. Recently, numerous studies have been conducted on the transcriptional regulation of fiber, and raw data were deposited in public repositories for comprehensive integrative analysis. Here, we remapped > 380 cotton RNA-seq datasets with a uniform mapping strategy, spanning ∼400-fold coverage of the genome. We identified stage-specific features related to fiber cell commitment, initiation, elongation, and Secondary Cell Wall (SCW) synthesis, together with their putative cis-regulatory elements for specific regulation in fiber development. We also mined Exclusively Expressed Transcripts (EETs) that were positively selected during cotton fiber evolution and domestication. Furthermore, the expression of EETs was validated in 100 cotton genotypes through the nCounter assay and correlated with different fiber-related traits. Thus, our data mining study reveals several important features related to cotton fiber development and improvement, which were consolidated in the “CottonExpress-omics” database.
The tumor microenvironment (TME) plays a crucial role in the initiation and progression of lung adenocarcinoma (LUAD); however, understanding the dynamic modulation of the immune and stromal components in the TME remains a challenge. In the present study, we applied the CIBERSORT and ESTIMATE computational methods to calculate the proportion of tumor-infiltrating immune cells (TICs) and the amount of immune and stromal components in 551 LUAD cases from The Cancer Genome Atlas (TCGA) database. The differentially expressed genes (DEGs) were analyzed by Cox regression analysis and protein–protein interaction (PPI) network construction. Then, Bruton tyrosine kinase (BTK) was determined to be a predictive factor by the intersection analysis of univariate Cox and PPI results. Further analysis revealed that BTK expression was negatively correlated with clinical pathologic characteristics (clinical stage, distant metastasis) and positively correlated with the survival of LUAD patients. Gene Set Enrichment Analysis (GSEA) showed that the genes in the high-BTK-expression group were mainly enriched in immune-related activities, whereas the genes in the low-BTK-expression group were enriched in metabolic pathways. CIBERSORT analysis of the proportion of TICs revealed that memory B cells and CD8+ T cells were positively correlated with BTK expression, suggesting that BTK might be responsible for preserving the immune-dominant status of the TME. Thus, BTK levels might be useful for outlining the prognosis of LUAD patients and, in particular, might signal the transition of the TME from an immune-dominant to a metabolically active status, offering extra insight for LUAD therapeutics.
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib
Title: Wine Quality
Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009
Past Usage:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g., pH values) and the output is based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importance of the input variables (as measured by a sensitivity analysis procedure).
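The MAD metric mentioned above is simply the mean absolute error of the regression, and the fixed error tolerance T turns regression output into correct/incorrect counts for a confusion matrix. A minimal sketch with hypothetical quality scores (not the paper's results):

```python
def mean_absolute_deviation(y_true, y_pred):
    """MAD: average absolute regression error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy_within_tolerance(y_true, y_pred, T=0.5):
    """Fraction of predictions counted as correct when absolute errors
    up to T are tolerated (the fixed-tolerance view behind the confusion matrix)."""
    return sum(abs(t - p) <= T for t, p in zip(y_true, y_pred)) / len(y_true)

# Usage with made-up quality scores on the 0-10 scale.
true_q = [5, 6, 6, 7, 4]
pred_q = [5.2, 5.8, 6.5, 6.4, 4.1]
print(round(mean_absolute_deviation(true_q, pred_q), 2))  # → 0.32
print(accuracy_within_tolerance(true_q, pred_q, T=0.5))   # → 0.8
```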
Relevant Information:
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g., there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure whether all input variables are relevant, so it could be interesting to test feature selection methods.
Number of Instances: red wine - 1599; white wine - 4898.
Number of Attributes: 11 + output attribute
Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.
Attribute information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Missing Attribute Values: None
Description of attributes:
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste
3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines
4 - residual sugar: the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with more than 45 grams/liter are considered sweet
5 - chlorides: the amount of salt in the wine
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11 - alcohol: the percent alcohol content of the wine
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Sub-Saharan Africa faces high neonatal and maternal mortality rates due to limited access to skilled healthcare during delivery. This study aims to improve the classification of health facility and home deliveries using advanced machine learning techniques and to explore factors influencing women's choices of delivery locations in East Africa. Method: The study focused on 86,009 childbearing women in East Africa. A comparative analysis of 12 advanced machine learning algorithms was conducted, utilizing various data balancing techniques and hyperparameter optimization methods to enhance model performance. Result: The prevalence of health facility delivery in East Africa was found to be 83.71%. The findings showed that the support vector machine (SVM) and CatBoost algorithms performed best in predicting the place of delivery: both achieved an accuracy of 95% and an AUC of 0.98 after Bayesian optimization tuning, with no significant difference between them across a comprehensive analysis of performance metrics. Factors associated with facility-based deliveries were identified using association rule mining, including parental education levels, timing of initial antenatal care (ANC) check-ups, wealth status, marital status, mobile phone ownership, religious affiliation, media accessibility, and birth order. Conclusion: This study underscores the vital role of machine learning algorithms in predicting health facility deliveries. A slight decline in facility deliveries from previous reports highlights the urgent need for targeted interventions to meet the Sustainable Development Goals (SDGs), particularly in maternal health. The study recommends promoting facility-based deliveries.
Recommended measures include raising awareness about skilled birth attendance, encouraging early ANC check-ups, addressing financial barriers through targeted support programs, implementing culturally sensitive interventions, and utilizing media campaigns and mobile health initiatives. Specific interventions should be tailored to the birth order of the child, recognizing that mothers may have different informational needs depending on whether it is their first or a subsequent delivery. Furthermore, we recommend that researchers explore a variety of techniques and validate the findings using more recent data.
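The AUC of 0.98 reported above measures ranking quality and can be computed directly from its probabilistic definition. The labels and scores below are illustrative placeholders, not the study's data:

```python
def auc_score(labels, scores):
    """AUC via its probabilistic definition: the chance that a randomly chosen
    positive case is scored above a randomly chosen negative case (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Usage with hypothetical facility-delivery predictions
# (1 = facility delivery, score = predicted probability).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(round(auc_score(labels, scores), 3))  # → 0.833
```

Unlike accuracy, this metric is insensitive to the class imbalance the study addresses with data balancing techniques, which is why both are reported.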
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bacterial small RNAs (sRNAs) play a vital role in pathogenesis by enabling rapid, efficient networks of gene attenuation during infection. In recent decades, there has been a surge in the number of proposed and biochemically-confirmed sRNAs in both Gram-positive and Gram-negative pathogens. However, limited homology, network complexity, and condition specificity of sRNA has stunted complete characterization of the activity and regulation of these RNA regulators. To streamline the discovery of the expression of sRNAs, and their post-transcriptional activities, we propose an integrative in vivo data-mining approach that couples DNA protein occupancy, RNA-seq, and RNA accessibility data with motif identification and target prediction algorithms. We benchmark the approach against a subset of well-characterized E. coli sRNAs for which a degree of in vivo transcriptional regulation and post-transcriptional activity has been previously reported, finding support for known regulation in a large proportion of this sRNA set. We showcase the abilities of our method to expand understanding of sRNA RseX, a known envelope stress-linked sRNA for which a cellular role has been elusive due to a lack of native expression detection. Using the presented approach, we identify a small set of putative RseX regulators and targets for experimental investigation. These findings have allowed us to confirm native RseX expression under conditions that eliminate H-NS repression as well as uncover a post-transcriptional role of RseX in fimbrial regulation. Beyond RseX, we uncover 163 putative regulatory DNA-binding protein sites, corresponding to regulation of 62 sRNAs, that could lead to new understanding of sRNA transcription regulation. For 32 sRNAs, we also propose a subset of top targets filtered by engagement of regions that exhibit binding site accessibility behavior in vivo. 
We broadly anticipate that the proposed approach will be useful for sRNA-reliant network characterization in bacteria. Such investigations under pathogenesis-relevant environmental conditions will enable us to deduce complex rapid-regulation schemes that support infection.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over $50K a year.
Description of fnlwgt (final weight): The weights on the Current Population Survey (CPS) files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by the Population Division here at the Census Bureau. We use 3 sets of controls. These are:
A single cell estimate of the population 16+ for each state.
Controls for Hispanic Origin by age and sex.
Controls by Race, age and sex.
We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used. The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population. People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement. That is that since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within state.
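The repeated "raking" through the control sets described above is an instance of iterative proportional fitting: rows and columns of the weight table are alternately rescaled to match the control totals. A toy sketch with hypothetical controls (the real CPS program rakes three control sets, not two):

```python
def rake(table, row_targets, col_targets, passes=6):
    """Iterative proportional fitting ('raking'): alternately scale rows and
    columns so the table's margins approach the control totals."""
    for _ in range(passes):
        # Scale each row to its control total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            table[i] = [w * target / s for w in table[i]]
        # Scale each column to its control total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] = row[j] * target / s
    return table

# Usage: a made-up 2x2 weight table raked to hypothetical age-by-sex controls.
weights = [[10.0, 20.0], [30.0, 40.0]]
raked = rake(weights, row_targets=[60.0, 140.0], col_targets=[90.0, 110.0])
row_sums = [round(sum(r), 1) for r in raked]
col_sums = [round(sum(r[j] for r in raked), 1) for j in range(2)]
```

After a handful of passes both margins agree with the controls to high precision, which is the sense in which the CPS procedure "comes back to all the controls" after six passes.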
Relevant papers Ron Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid", Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996. (PDF)
NEW: CTGAN was used to generate more data.
Automatic Identification And Data Capture Market Size 2024-2028
The automatic identification and data capture market size is projected to increase by USD 21.52 billion, at a CAGR of 8.1% from 2023 to 2028. Increasing applications of RFID will drive the automatic identification and data capture market.
Market Insights
North America dominated the market and is expected to account for 47% of growth during the 2024-2028 forecast period.
By Product - RFID products segment was valued at USD 18.41 billion in 2022
Market Size & Forecast
Market Opportunities: USD 79.34 million
Market Future Opportunities 2023: USD 21.52 billion
CAGR from 2023 to 2028: 8.1%
Market Summary
The Automatic Identification and Data Capture (AIDC) market encompasses technologies and solutions that enable businesses to capture and process data in real time. This market is driven by the increasing adoption of RFID technology, which offers benefits such as improved supply chain visibility, inventory management, and operational efficiency. The growing popularity of smart factories, where automation and data-driven processes are integral, further fuels the demand for AIDC solutions. However, the market also faces challenges, including security concerns. With the increasing use of AIDC technologies, there is a growing need to ensure data privacy and security. This has led to the development of advanced encryption techniques and access control mechanisms to mitigate potential risks. A real-world business scenario illustrating the importance of AIDC is in the retail industry. Retailers use AIDC technologies such as RFID tags and barcode scanners to manage inventory levels, track stock movements, and optimize supply chain operations. By automating data capture processes, retailers can reduce manual errors, improve order fulfillment accuracy, and enhance the overall customer experience. Despite the challenges, the AIDC market continues to grow, driven by the need for real-time data processing and automation across various industries.
What will be the size of the Automatic Identification And Data Capture Market during the forecast period?
The Automatic Identification and Data Capture (AIDC) market continues to evolve, driven by advancements in technology and increasing business demands. AIDC solutions, including barcode scanners, RFID systems, and OCR technology, enable organizations to streamline processes, enhance data accuracy, and improve operational efficiency. According to recent research, the use of RFID technology in the retail sector has surged by 25% over the past five years, underpinning its significance in inventory management and supply chain optimization. Moreover, the integration of AIDC technologies with cloud computing services and data visualization dashboards offers real-time data access and analysis, empowering businesses to make informed decisions. For instance, a manufacturing firm can leverage RFID data to monitor production lines, optimize workflows, and ensure compliance with industry regulations. AIDC systems are also instrumental in enhancing data security and privacy, with advanced encryption protocols and access control features ensuring data integrity and confidentiality. By adopting AIDC technologies, organizations can not only improve their operational efficiency but also gain a competitive edge in their respective industries.
Unpacking the Automatic Identification And Data Capture Market Landscape
The market encompasses technologies such as RFID tag identification, data stream management, and data mining techniques. These solutions enable businesses to efficiently process and analyze vast amounts of data from various sources, leading to significant improvements in data quality metrics and workflow optimization strategies. For instance, RFID implementation can result in a 30% increase in inventory accuracy, while data mining techniques can uncover hidden patterns and trends, driving ROI improvement and compliance alignment. Real-time data processing, facilitated by technologies like document understanding AI and image recognition algorithms, ensures swift decision-making and error reduction. Data capture pipelines and database management systems provide a solid foundation for data aggregation and analysis, while semantic web technologies and natural language processing enhance information retrieval and understanding. By integrating sensor data and applying machine vision systems, businesses can achieve high-throughput imaging and object detection, further enhancing their data processing capabilities.
Key Market Drivers Fueling Growth
The significant expansion of RFID (Radio-Frequency Identification) technology applications is the primary market growth catalyst.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Wine Quality Dataset downloaded from https://archive.ics.uci.edu/dataset/186/wine+quality
keep the original three CSV files
change the separator from ";" to ",", so you don't have to specify sep when calling pd.read_csv
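The separator change described above can be seen directly with pandas; the sample rows below are illustrative, not actual values from the dataset.

```python
import pandas as pd
from io import StringIO

# Illustrative rows in both formats (values are made up, not from the dataset).
semicolon_csv = "fixed acidity;pH;quality\n7.4;3.51;5\n7.8;3.20;5\n"
comma_csv = "fixed acidity,pH,quality\n7.4,3.51,5\n7.8,3.20,5\n"

# The original UCI files use ";" and need an explicit separator:
df_orig = pd.read_csv(StringIO(semicolon_csv), sep=";")

# The repackaged files use ",", so pandas' default separator works:
df_new = pd.read_csv(StringIO(comma_csv))

assert df_orig.equals(df_new)
```

With the comma-separated files, a plain `pd.read_csv(path)` is all that is needed.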
---------------------------- Message from original author ---------------------------- Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib
Title: Wine Quality
Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009
Past Usage:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g., pH values) and the output is based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, the confusion matrix for a fixed error tolerance (T), etc. We also plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).
Relevant Information:
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g., there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant, so it could be interesting to test feature selection methods.
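The class imbalance mentioned above can be made concrete with a small frequency check; the quality scores and the 10% rarity threshold below are illustrative assumptions, not derived from the actual dataset.

```python
from collections import Counter

# Illustrative quality scores: most wines are "normal" (5-6), few are
# very poor (3) or excellent (8). Not real dataset values.
scores = [5, 6, 5, 6, 7, 5, 6, 3, 6, 5, 8, 6, 5, 6, 5]

counts = Counter(scores)
total = len(scores)

# Flag classes holding under 10% of samples as candidate "poor"/"excellent"
# outlier classes worth special treatment.
rare = sorted(s for s, c in counts.items() if c / total < 0.10)
print(rare)
```

A real analysis would run this on the `quality` column of the loaded DataFrame before choosing between a regression, classification, or outlier-detection framing.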
Number of Instances: red wine - 1599; white wine - 4898.
Number of Attributes: 11 + output attribute
Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.
Attribute information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Missing Attribute Values: None
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Endometriosis is a common benign disease in women of reproductive age. It has been defined as a disorder characterized by inflammation, compromised immunity, hormone dependence, and neuroangiogenesis. Unfortunately, the mechanisms of endometriosis have not yet been fully elucidated, and available treatment methods are currently limited. The discovery of new therapeutic drugs and improvements in existing treatment schemes remain the focus of research initiatives. Chinese medicine can improve the symptoms associated with endometriosis. Many Chinese herbal medicines could exert antiendometriosis effects via comprehensive interactions with multiple targets. However, these interactions have not been defined. This study used association rule mining and systems pharmacology to discover a method by which potential antiendometriosis herbs can be investigated. We analyzed various combinations and mechanisms of action of medicinal herbs to establish molecular networks showing interactions with multiple targets. The results showed that endometriosis treatment in Chinese medicine is mainly based on methods of supplementation with blood-activating herbs and strengthening qi. Furthermore, we used network pharmacology to analyze the main herbs that facilitate the decoding of multiscale mechanisms of the herbal compounds. We found that Chinese medicine could affect the development of endometriosis by regulating inflammation, immunity, angiogenesis, and other clusters of processes identified by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. The antiendometriosis effect of Chinese medicine occurs mainly through nervous system–associated pathways, such as the serotonergic synapse, the neurotrophin signaling pathway, and dopaminergic synapse, among others, to reduce pain. 
Chinese medicine could also regulate VEGF signaling, Toll-like receptor signaling, NF-κB signaling, MAPK signaling, PI3K-Akt signaling, and the HIF-1 signaling pathway, among others. Synergies often exist in herb pairs and herbal prescriptions. In conclusion, we identified some important targets, target pairs, and regulatory networks using bioinformatics and data mining. The combination of data mining and network pharmacology may offer an efficient method for drug discovery and development from herbal medicines.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This database describes the main data and results of the dynamic unavailability/risk model for two application cases. In both application cases, the results of the Dynamic Model (identified as the Detailed Assessment, DA) are compared with the Standard PSA, based on components' mean unavailabilities, and also with the Conventional Assessment (CA), which uses the same mean-unavailability models corrected by instantaneous unavailability values at the specific time intervals of component outages due to maintenance and testing. The Detailed Assessment (DA) was evaluated considering different scenarios of component ageing, test degradation, test efficiency, and maintenance effectiveness not included in the Conventional Assessment (CA) approach.
The database is composed of PDF files with the main data and results of the two application cases:
The information of each application case is structured as follows:
Standard PSA: Basic events and reliability data; Fault tree; Base risk and most important Minimal Cut Sets; Importance analysis (Fussell-Vesely and RAW)
Operational history data for CA and DA
Basic events and Reliability data for CA and DA
System Design - Main single-line diagrams for the ventilation system.
Conventional Assessment (CA): Time-dependent risk assessment (table and chart); Importance analysis (Fussell-Vesely and RAW); Time-dependent importance analysis for selected components (tables and charts)
Detailed Assessment (DA): Time-dependent risk assessment for different scenarios of component ageing, test degradation, test efficiency, and maintenance effectiveness (tables and charts); Importance analysis for the same scenarios (Fussell-Vesely and RAW); Time-dependent importance analysis for selected components under the same scenarios (tables and charts)
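The distinction between the mean unavailability used by the Standard PSA and the instantaneous values used in the CA/DA corrections can be sketched for a periodically tested standby component; the failure rate and test interval below are illustrative assumptions, not values from this database.

```python
import math

lam = 1e-4   # assumed constant failure rate (per hour)
T = 720.0    # assumed test interval (hours)

def instantaneous_unavailability(t):
    """q(t) = 1 - exp(-lambda * t), with t the time since the last test."""
    return 1.0 - math.exp(-lam * t)

# Mean unavailability over one test interval, as used by a Standard PSA:
#   q_mean = 1 - (1 - exp(-lambda*T)) / (lambda*T)   (~ lambda*T/2 for small lambda*T)
mean_q = 1.0 - (1.0 - math.exp(-lam * T)) / (lam * T)

print(round(mean_q, 5), round(instantaneous_unavailability(T), 5))
```

Just before a test the instantaneous unavailability is roughly twice the mean value, which is why time-dependent assessments can differ markedly from a mean-value PSA during component outage and test windows.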
https://www.technavio.com/content/privacy-notice
Anomaly Detection Market Size 2025-2029
The anomaly detection market size is projected to increase by USD 4.44 billion, at a CAGR of 14.4%, from 2024 to 2029. Anomaly detection tools gaining traction in BFSI will drive the market.
Major Market Trends & Insights
North America dominated the market and is expected to account for 43% of the market's growth during the forecast period.
By Deployment - Cloud segment was valued at USD 1.75 billion in 2023
By Component - Solution segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 173.26 million
Market Future Opportunities: USD 4441.70 million
CAGR from 2024 to 2029: 14.4%
Market Summary
Anomaly detection, a critical component of advanced analytics, is witnessing significant adoption across various industries, with the financial services sector leading the charge. The increasing incidence of internal threats and cybersecurity frauds necessitates the need for robust anomaly detection solutions. These tools help organizations identify unusual patterns and deviations from normal behavior, enabling proactive response to potential threats and ensuring operational efficiency. For instance, in a supply chain context, anomaly detection can help identify discrepancies in inventory levels or delivery schedules, leading to cost savings and improved customer satisfaction. In the realm of compliance, anomaly detection can assist in maintaining regulatory adherence by flagging unusual transactions or activities, thereby reducing the risk of penalties and reputational damage.
According to recent research, organizations that implement anomaly detection solutions experience a reduction in error rates by up to 25%. This improvement not only enhances operational efficiency but also contributes to increased customer trust and satisfaction. Despite these benefits, challenges persist, including data quality and the need for real-time processing capabilities. As the market continues to evolve, advancements in machine learning and artificial intelligence are expected to address these challenges and drive further growth.
What will be the Size of the Anomaly Detection Market during the forecast period?
How is the Anomaly Detection Market Segmented?
The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment
Cloud
On-premises
Component
Solution
Services
End-user
BFSI
IT and telecom
Retail and e-commerce
Manufacturing
Others
Technology
Big data analytics
AI and ML
Data mining and business intelligence
Geography
North America
US
Canada
Mexico
Europe
France
Germany
Spain
UK
APAC
China
India
Japan
Rest of World (ROW)
By Deployment Insights
The cloud segment is estimated to witness significant growth during the forecast period.
The market is witnessing significant growth, driven by the increasing adoption of advanced technologies such as machine learning algorithms, predictive modeling tools, and real-time monitoring systems. Businesses are increasingly relying on anomaly detection solutions to enhance their root cause analysis, improve system health indicators, and reduce false positives. This is particularly true in sectors where data is generated in real-time, such as cybersecurity threat detection, network intrusion detection, and fraud detection systems. Cloud-based anomaly detection solutions are gaining popularity due to their flexibility, scalability, and cost-effectiveness.
This growth is attributed to cloud-based solutions' quick deployment, real-time data visibility, and customization capabilities, which are offered at flexible payment options like monthly subscriptions and pay-as-you-go models. Companies like Anodot, Ltd, Cisco Systems Inc, IBM Corp, and SAS Institute Inc provide both cloud-based and on-premise anomaly detection solutions. Anomaly detection methods include outlier detection, change point detection, and statistical process control. Data preprocessing steps, such as data mining techniques and feature engineering processes, are crucial in ensuring accurate anomaly detection. Data visualization dashboards and alert fatigue mitigation techniques help in managing and interpreting the vast amounts of data generated.
Network traffic analysis, log file analysis, and sensor data integration are essential components of anomaly detection systems. Additionally, risk management frameworks, drift detection algorithms, time series forecasting, and performance degradation detection are vital in maintaining system performance and capacity planning.
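Of the detection methods named above, the simplest is statistical outlier detection on a univariate stream; the sensor readings and the 2-sigma threshold below are illustrative assumptions, not taken from any of the vendors' products.

```python
import statistics

# Illustrative sensor readings with one injected fault value (25.0).
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.7, 10.3]

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)

# Flag points more than 2 standard deviations from the mean.
# (A single large outlier inflates the stdev, so a 3-sigma rule can
# mask it; robust statistics such as the median are used in practice.)
anomalies = [x for x in readings if abs(x - mean) > 2 * stdev]
print(anomalies)
```

Production systems replace this with change-point detection, forecasting residuals, or learned models, but the core idea of scoring deviation from expected behavior is the same.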
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.
Background: Amiodarone and dronedarone are both class III antiarrhythmic medications used to treat arrhythmias. The objective of this study was to enhance the current understanding of adverse drug reactions (ADRs) associated with amiodarone and dronedarone by applying data mining methods to the U.S. Food and Drug Administration Adverse Event Reporting System (FAERS), and to provide a reference for safe and reasonable clinical use.
Methods: ADR records were selected by searching the FAERS database from 2011 Q3 to 2023 Q3. Disproportionality analysis algorithms, including the Reporting Odds Ratio (ROR), Proportional Reporting Ratio (PRR), Bayesian Confidence Propagation Neural Network (BCPNN), and Empirical Bayesian Geometric Mean (EBGM), were used to detect signals of amiodarone-related and dronedarone-related ADRs. The ADR profiles of amiodarone and dronedarone, categorized by organ toxicity, were compared through the Z-test and Fisher's exact test.
Results: 9,295 reports specifically mentioned the use of amiodarone and 2,485 reports mentioned the use of dronedarone among 9,972,109 reports, with the majority of ADRs occurring in males over 60 years old. The United States accounted for the highest proportion of reported ADRs. Significant system organ classes (SOCs) for both drugs included cardiac disorders; respiratory, thoracic and mediastinal disorders; and investigations, among others. At the preferred-term (PT) level, the most frequent ADR signals for amiodarone were drug interaction (n = 856), hyperthyroidism (n = 758), and dyspnoea (n = 607), while for dronedarone they were atrial fibrillation (n = 371), dyspnoea (n = 204), and blood creatinine increased (n = 123).
Notably, unexpected ADRs not listed in the drug instructions were uncovered, including electrocardiogram T wave alternans (n = 16; EBGM05 = 231.27), accessory cardiac pathway (n = 11; EBGM05 = 140), and thyroiditis (n = 178; EBGM05 = 125.91) for amiodarone, and cardiac ablation (n = 11; EBGM05 = 31.86), cardioversion (n = 7; EBGM05 = 22.69), and dysphagia (n = 47; EBGM05 = 3.6) for dronedarone. The analysis also revealed significant differences in the ADR profiles of amiodarone and dronedarone, with dronedarone showing a higher proportion of cardiac toxicity but lower thyroid toxicity compared to amiodarone.
Conclusion: These findings underscore the significance of vigilantly monitoring and understanding the potential risks linked to the use of amiodarone and dronedarone. The newly discovered ADRs and the clear ADR profiles of amiodarone and dronedarone support a thorough understanding of these drugs, which is essential for clinicians to ensure their safe use.
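As a sketch of the disproportionality approach used in such studies, the Reporting Odds Ratio for a single drug-event pair is computed from a 2x2 contingency table of report counts; the counts below are illustrative assumptions, not FAERS values.

```python
import math

# Illustrative 2x2 contingency table of spontaneous reports.
a = 120    # reports with the drug AND the event of interest
b = 880    # reports with the drug, without the event
c = 400    # reports without the drug, with the event
d = 9600   # reports with neither

# ROR = (a/b) / (c/d); its log has an approximate standard error of
# sqrt(1/a + 1/b + 1/c + 1/d).
ror = (a / b) / (c / d)
se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(ror) - 1.96 * se_log)  # lower 95% CI bound

# A signal is conventionally flagged when the lower CI bound exceeds 1.
print(round(ror, 2), round(ci_low, 2))
```

PRR, BCPNN, and EBGM apply different statistics to the same table; studies like the one above typically require agreement across several of them before declaring a signal.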
This chapter presents theoretical and practical aspects associated with the implementation of a combined model-based/data-driven approach for failure prognostics based on particle-filtering algorithms, in which the current estimate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of estimating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: a prediction step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows the probability of failure at future time instants (the RUL PDF) to be estimated in real time, providing information about time-to-failure (TTF) expectations, statistical confidence intervals, and long-term predictions, using for this purpose empirical knowledge about critical conditions for the system (also referred to as hazard zones). This information is of paramount significance for improving system reliability and the cost-effective operation of critical assets, as shown in a case study where feedback correction strategies (based on uncertainty measures) were implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feedback loop is implemented using simple linear relationships, it provides quick insight into how the system reacts, in terms of its predicted RUL, to changes in its input signals. The method can handle non-Gaussian PDFs since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation.
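The predict/update cycle described above can be sketched as a minimal bootstrap particle filter; the linear fault-growth model, noise levels, and measurement value below are illustrative assumptions, not the chapter's actual rotorcraft model.

```python
import math
import random

N = 1000
growth_rate = 0.1        # assumed mean fault growth per time step
process_noise = 0.05     # assumed process-model uncertainty
measurement_noise = 0.2  # assumed sensor noise (Gaussian likelihood)

random.seed(42)
# Particles approximate the state PDF of the fault indicator.
particles = [random.gauss(1.0, 0.1) for _ in range(N)]

def predict(particles):
    """Prediction step: propagate each particle through the state model."""
    return [x + growth_rate + random.gauss(0.0, process_noise) for x in particles]

def update(particles, z):
    """Update step: weight by measurement likelihood, then resample."""
    weights = [math.exp(-0.5 * ((z - x) / measurement_noise) ** 2) for x in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    return random.choices(particles, weights=weights, k=len(particles))

particles = predict(particles)             # a priori state estimate
particles = update(particles, z=1.15)      # incorporate the new measurement
estimate = sum(particles) / len(particles) # posterior mean of the fault indicator
```

Long-term RUL prediction then amounts to iterating the prediction step (without updates) until each particle crosses the hazard-zone threshold, yielding an empirical RUL PDF rather than a single point estimate.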
Real data from a fault-seeded test showed that the proposed framework was able to anticipate modifications to the system input to lengthen its RUL. The results of this test indicate that the method successfully suggested the correction that the system required. Future work will focus on the development and testing of similar strategies using different input-output uncertainty metrics.