Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The AUC values were calculated using 10-fold cross validation. OR: odds ratios; AUC: area under the receiver operating characteristic curve; LR: logistic regression; NB: naïve Bayes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. It comprises descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) (Xia & Gong, 2014). An EDA is a set of statistical and data mining procedures used to describe data. We ran an EDA to provide statistical facts and inform conclusions; the mined facts supply arguments that inform the Systematic Literature Review (SLR) of DL4SE.
The SLR of DL4SE requires formal statistical modeling to refine the answers to the proposed research questions and to formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships in the Deep Learning literature reported in Software Engineering. These hidden relationships are collected and analyzed to illustrate the state of the art of DL techniques employed in the software engineering context.
Our DL4SE-DA is a simplified version of classical Knowledge Discovery in Databases, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from the DL4SE structured database, which was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD involves five stages:
1. Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into the 35 features (or attributes) that you can find in the repository. In fact, we manually engineered features from the DL4SE papers; among them are venue, year published, type of paper, metrics, data scale, type of tuning, learning algorithm, and SE data.
2. Preprocessing. The preprocessing consisted of transforming the features into the correct type (nominal), removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to recover information lost during normalization. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”, where “Other Metrics” refers to unconventional metrics found during extraction. The same normalization was applied to other features such as “SE Data” and “Reproducibility Types”. This separation into more detailed classes supports a better understanding and classification of the papers by the data mining tasks.
3. Transformation. In this stage, we did not apply any data transformation method except in the clustering analysis, where we performed a Principal Component Analysis (PCA) to reduce the 35 features to 2 components for visualization purposes. PCA also allowed us to identify the number of clusters that exhibits the maximum reduction in variance, that is, the number of clusters to use when tuning the explainable models.
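As an illustrative sketch of this transformation step (the actual pipeline was built in RapidMiner; the feature matrix below is a synthetic stand-in for the 35 extracted attributes), PCA via SVD projects the papers onto their first two principal components:

```python
import numpy as np

def pca_2d(X):
    """Project a (papers x features) matrix onto its first two
    principal components via SVD. A sketch of the PCA step only;
    the original analysis was performed with RapidMiner operators."""
    Xc = X - X.mean(axis=0)          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:2]              # top-2 principal directions
    projected = Xc @ components.T    # papers mapped to 2-D
    explained = (S ** 2) / (S ** 2).sum()
    return projected, explained[:2]

# Hypothetical stand-in for 128 papers with 35 one-hot-encoded features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(128, 35)).astype(float)
proj, var = pca_2d(X)
print(proj.shape)  # (128, 2)
```

The two explained-variance ratios indicate how much structure survives the projection, which is what makes the 2-D scatter plots interpretable.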
4. Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented toward uncovering hidden relationships among the extracted features (correlations and association rules) and toward categorizing the DL4SE papers for a better segmentation of the state of the art (clustering). A detailed explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.
5. Interpretation/Evaluation. We used the Knowledge Discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by a reasoning process over the data mining outcomes, which produces an argument support analysis (see this link).
We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.
Overview of the most meaningful association rules. Rectangles represent both premises and conclusions; an arrow connecting a premise to a conclusion indicates that, given the premise, the conclusion follows with a certain support and confidence. E.g., given that an author used Supervised Learning, we can conclude that their approach is irreproducible, with a certain support and confidence.
Support = (number of occurrences in which the statement is true) / (total number of statements)
Confidence = (support of the statement) / (number of occurrences of the premise)
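These two measures can be computed directly from the per-paper features. A minimal sketch in Python, using a hypothetical set of feature "transactions" and the Supervised Learning / irreproducible rule from the example above:

```python
# Sketch: support and confidence of one association rule over
# hypothetical per-paper feature sets ("transactions").
transactions = [
    {"supervised-learning", "irreproducible"},
    {"supervised-learning", "irreproducible"},
    {"supervised-learning", "reproducible"},
    {"unsupervised-learning", "irreproducible"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item of the itemset.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(premise, conclusion, transactions):
    # Support of the full rule divided by support of the premise alone.
    return support(premise | conclusion, transactions) / support(premise, transactions)

sup = support({"supervised-learning", "irreproducible"}, transactions)
conf = confidence({"supervised-learning"}, {"irreproducible"}, transactions)
print(sup, conf)  # 0.5 0.6666666666666666
```

Here the rule holds in 2 of 4 papers (support 0.5), and in 2 of the 3 papers whose premise holds (confidence 2/3).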
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The purpose of data mining analysis is to find patterns in the data using techniques such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset: before any modeling, the data has to be pre-processed, which normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. In our project, using clustering prior to classification did not improve performance much; a likely reason is that the features we selected for clustering are not well suited to it. Because of the nature of the data, classification tasks provide more information to work with in terms of improving knowledge and overall performance metrics.

From the dimensionality reduction perspective: clustering differs from Principal Component Analysis, which finds the best linear transformation that reduces the number of dimensions with minimum loss of information. Using clusters to reduce the data dimension can lose a lot of information, since clustering techniques rest on a metric of 'distance', and at high dimensions Euclidean distance loses much of its meaning. Therefore, "reducing" dimensionality by mapping data points to cluster numbers is not always advisable, since you may lose almost all of the information.

From the feature creation perspective: clustering analysis creates labels based on patterns in the data, which brings uncertainty into the data. When clustering precedes classification, the choice of the number of clusters strongly affects the clustering performance, and in turn the classification performance. If the subset of features we cluster on is well suited to it, clustering might increase the overall classification performance.
For example, if the features we run k-means on are numerical and low-dimensional, the overall classification performance may be better. We deliberately did not fix the clustering outputs with a random_state, in order to see whether they were stable. Our assumption was that if the results vary strongly from run to run, which they did, the data simply may not cluster well with the methods selected. In practice, applying clustering in our preprocessing produced results not much better than random. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data, in the same format, from which the models were created. This feedback loop can be used to measure the models' real-world effectiveness and to revise the models from time to time as things change.
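The run-to-run stability check described above can be sketched with a minimal k-means implementation; the uniform 2-D data here is a hypothetical stand-in (the actual analysis used off-the-shelf clustering operators), and the idea is that inertia varying noticeably across seeds suggests the data does not cluster well:

```python
import random

def kmeans(points, k, seed, iters=50):
    """Minimal Lloyd's k-means; a sketch for the seed-stability check,
    not the implementation used in the project."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Recompute centroids; keep the old center if a cluster emptied.
        centers = [tuple(sum(d) / len(c) for d in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    inertia = sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
                  for p in points)
    return centers, inertia

# Hypothetical structureless data: rerun with several seeds and compare
# the within-cluster sum of squares (inertia) for stability.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
inertias = [kmeans(points, k=4, seed=s)[1] for s in range(5)]
print(inertias)
```

If the inertias (or the induced labelings) differ substantially between seeds, the cluster-number feature fed into the classifier is itself noisy, which matches the behavior we observed.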
This chapter presents theoretical and practical aspects associated with the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current estimate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of estimating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the prediction step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows estimating the probability of failure at future time instants (the RUL PDF) in real time, providing information about time-to-failure (TTF) expectations, statistical confidence intervals, and long-term predictions, using for this purpose empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for improving system reliability and the cost-effective operation of critical assets, as has been shown in a case study where feedback correction strategies (based on uncertainty measures) were implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feedback loop is implemented using simple linear relationships, it is helpful for providing quick insight into the manner in which the system reacts to changes in its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian PDFs, since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation.
Real data from a fault seeded test showed that the proposed framework was able to anticipate modifications on the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will be focused on the development and testing of similar strategies using different input-output uncertainty metrics.
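The prediction/update cycle described in this framework can be sketched as a minimal bootstrap (SIR) particle filter; the scalar fault-growth model, noise levels, and measurement values below are hypothetical stand-ins rather than the chapter's actual rotorcraft model:

```python
import math
import random

rng = random.Random(42)

def predict(particles, dt=1.0):
    # Prediction step: propagate each particle through a hypothetical
    # fault-growth model x' = x + 0.1*dt plus process noise.
    return [x + 0.1 * dt + rng.gauss(0.0, 0.05) for x in particles]

def update(particles, z, noise=0.2):
    # Update step: weight particles by the likelihood of measurement z,
    # then resample to obtain the a posteriori particle cloud.
    weights = [math.exp(-0.5 * ((z - x) / noise) ** 2) for x in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    return rng.choices(particles, weights=weights, k=len(particles))

particles = [rng.gauss(0.0, 0.1) for _ in range(500)]
for z in [0.12, 0.19, 0.33, 0.41]:   # hypothetical fault-indicator readings
    particles = predict(particles)
    particles = update(particles, z)

estimate = sum(particles) / len(particles)
print(round(estimate, 2))
```

The particle cloud approximates the state PDF without any Gaussian assumption, which is why such filters can propagate non-Gaussian RUL distributions forward in time.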
We discuss a statistical framework that underlies envelope detection schemes as well as dynamical models based on Hidden Markov Models (HMM) that can encompass both discrete and continuous sensor measurements for use in Integrated System Health Management (ISHM) applications. The HMM allows for the rapid assimilation, analysis, and discovery of system anomalies. We motivate our work with a discussion of an aviation problem where the identification of anomalous sequences is essential for safety reasons. The data in this application are discrete and continuous sensor measurements and can be dealt with seamlessly using the methods described here to discover anomalous flights. We specifically treat the problem of discovering anomalous features in the time series that may be hidden from the sensor suite and compare those methods to standard envelope detection methods on test data designed to accentuate the differences between the two methods. Identification of these hidden anomalies is crucial to building stable, reusable, and cost-efficient systems. We also discuss a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an ISHM system. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.
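As a toy illustration of scoring sequences under an HMM (a sketch only; the two-state model, emission probabilities, and binary sensor symbols are hypothetical, not those of the ISHM study), the scaled forward algorithm yields a log-likelihood that can flag anomalous sequences:

```python
import math

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm; unusually low values
    flag candidate anomalous sequences."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [B[i][obs[t]] * sum(alpha[j] * A[j][i] for j in range(n))
                     for i in range(n)]
        scale = sum(alpha)          # P(obs[t] | obs[:t])
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Hypothetical 2-state model ("nominal" vs "degraded") over binary sensor symbols.
pi = [0.9, 0.1]
A = [[0.95, 0.05], [0.10, 0.90]]
B = [[0.8, 0.2], [0.2, 0.8]]

normal = forward_loglik([0, 0, 0, 1, 0, 0], pi, A, B)
odd = forward_loglik([1, 0, 1, 0, 1, 0], pi, A, B)
print(normal > odd)  # the alternating sequence scores lower here
```

Ranking sequences by this score, and thresholding on the low tail, is one simple way an HMM-based monitor can surface anomalous flights.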
This first webinar discusses strategies for mining administrative data to assess the characteristics and needs of at-risk child welfare populations. Using examples from a federal Permanency Innovations Initiative (PII) grantee in Illinois, Dr. Dana Weiner identifies the key requirements of productive data mining, steps in the data mining process, and useful statistical techniques for analyzing and making sense of administrative data. This second webinar discusses propensity score matching (PSM) as a methodologically rigorous alternative to randomized controlled trials (RCTs). Using examples of grantees funded through the federal Permanency Innovations Initiative (PII), Mr. Andrew Barclay discusses the theory underlying PSM, techniques for implementing PSM and validating the results, and caveats and limitations of this statistical technique. This third webinar reviews strategies for using evaluation findings to help sustain program and evaluation activities following the end of federal funding. The sustainability planning and activities of two grantees funded through the federal Permanency Innovations Initiative (PII) – North Carolina Department of Social Services (funded in 2011 for five years) and Western Michigan University (funded in 2012 for five years) – are reviewed and discussed in detail. Metadata-only record linking to the original dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Super-resolution fluorescence microscopy has become a powerful tool to resolve structural information that is not accessible to traditional diffraction-limited imaging techniques such as confocal microscopy. Stochastic optical reconstruction microscopy (STORM) and photoactivation localization microscopy (PALM) are promising super-resolution techniques due to their relative ease of implementation and instrumentation on standard microscopes. However, the application of STORM is critically limited by its long sampling time. Several recent works have been focused on improving the STORM imaging speed by making use of the information from emitters with overlapping point spread functions (PSF). In this work, we present a fast and efficient algorithm that takes into account the blinking statistics of independent fluorescence emitters. We achieve sub-diffraction lateral resolution of 100 nm from 5 to 7 seconds of imaging. Our method is insensitive to background and can be applied to different types of fluorescence sources, including but not limited to the organic dyes and quantum dots that we demonstrate in this work.
https://www.wiseguyreports.com/pages/privacy-policy
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 5.92 (USD Billion) |
| MARKET SIZE 2025 | 6.34 (USD Billion) |
| MARKET SIZE 2035 | 12.5 (USD Billion) |
| SEGMENTS COVERED | Application, Deployment Type, End User, Functionality, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Increasing data complexity, Growing demand for analytics, Rising need for regulatory compliance, Advancements in AI technologies, Enhanced data visualization techniques |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | RapidMiner, Elsevier, IBM, BioStat, Palantir Technologies, Oracle, Tableau, Altair Engineering, Biovia, Microsoft, Wolfram Research, Minitab, Cytel, TIBCO Software, SAS Institute, Qlik |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Growing demand for personalized medicine, Advancements in big data analytics, Increasing use of AI and ML technologies, Rising adoption of cloud-based solutions, Expanding regulatory compliance requirements |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 7.1% (2025 - 2035) |
The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.
What is big data? Big data is a term that refers to data sets that are too large or too complex for traditional data processing applications. It is defined as having one or more of the following characteristics: high volume, high velocity, or high variety. Fast-growing mobile data traffic, cloud computing traffic, and the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets.
Big data analytics. Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Improving the accuracy of predictions of future values based on past and current observations has been pursued by enhancing prediction methods, combining those methods, or performing data pre-processing. In this paper, another approach is taken: increasing the number of inputs in the dataset. This approach is useful especially for shorter time series. By filling in the in-between values of the time series, the size of the training set can be increased, thus increasing the generalization capability of the predictor. The algorithm used to make predictions is a Neural Network, as it is widely used in the literature for time series tasks; for comparison, Support Vector Regression is also employed. The datasets used in the experiment are the frequency of USPTO patents and PubMed scientific publications in the field of health, namely on apnea, arrhythmia, and sleep stages. Another time series dataset, designated for the NN3 Competition in the field of transportation, is also used for benchmarking. The experimental results show that prediction performance can be significantly increased by filling in in-between data in the time series. Furthermore, detrending and deseasonalization, which separate the data into trend, seasonal, and stationary components, also improve prediction performance on both the original and the filled dataset. The optimal enlargement of the dataset in this experiment is about five times the length of the original dataset.
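The filling of in-between values can be sketched as simple linear interpolation; the upsampling factor and the short series below are hypothetical, and the abstract does not specify that this exact scheme was the one used:

```python
def fill_in_between(series, factor):
    """Insert (factor - 1) linearly interpolated points between each
    pair of consecutive observations, enlarging the training set."""
    out = []
    for a, b in zip(series, series[1:]):
        out.extend(a + (b - a) * i / factor for i in range(factor))
    out.append(series[-1])
    return out

series = [2.0, 4.0, 3.0]           # hypothetical short time series
print(fill_in_between(series, 2))  # [2.0, 3.0, 4.0, 3.5, 3.0]
```

A factor of about 5 would correspond to the "five times the length" enlargement the experiments found optimal.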
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Software systems are composed of one or more software architectural styles. These styles define the usage patterns a programmer follows in order to develop a complex project, and they need to be analyzed for pattern similarity in the structure of multiple groups of projects. Researchers can apply different types of data mining algorithms to analyze software projects through the architectural styles used. The dataset was obtained from an online questionnaire delivered to respondents from academia and the software industry worldwide.
The content of this dataset is the set of architectural styles utilized by each system. The attributes are Repository, Client Server, Abstract Machine, Object Oriented, Function Oriented, Event Driven, Layered, Pipes & Filters, Data Centric, Blackboard, Rule Based, Publish Subscribe, Asynchronous Messaging, Plug-ins, Microkernel, Peer-to-Peer, Domain Driven, and Shared Nothing.
Thanks to my honorable teacher Prof. Dr. Usman Qamar for guiding me to accomplish this wonderful task.
The dataset is open to updates and refinement. Any researcher who wants to contribute is welcome to ask.
https://www.wiseguyreports.com/pages/privacy-policy
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 6.83 (USD Billion) |
| MARKET SIZE 2025 | 7.52 (USD Billion) |
| MARKET SIZE 2035 | 20.0 (USD Billion) |
| SEGMENTS COVERED | Deployment Mode, Application, End Use Industry, Technology, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Increasing data volume, Demand for real-time insights, Adoption of AI technologies, Growing need for predictive maintenance, Rising focus on customer experience |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | RapidMiner, IBM, Domo, Oracle, Infor, Salesforce, Tableau, MathWorks, Apache Software Foundation, SAP, Microsoft, StatSoft, TIBCO Software, SAS Institute, Alteryx, Qlik |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Real-time data processing capabilities, Enhanced machine learning integration, Growing demand for data-driven decisions, Increased adoption in SMEs, Cloud-based analytics implementation |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 10.2% (2025 - 2035) |
https://www.technavio.com/content/privacy-notice
Anomaly Detection Market Size 2025-2029
The anomaly detection market size is valued to increase by USD 4.44 billion, at a CAGR of 14.4% from 2024 to 2029. Anomaly detection tools gaining traction in BFSI will drive the anomaly detection market.
Major Market Trends & Insights
North America dominated the market and accounted for a 43% growth during the forecast period.
By Deployment - Cloud segment was valued at USD 1.75 billion in 2023
By Component - Solution segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 173.26 million
Market Future Opportunities: USD 4441.70 million
CAGR from 2024 to 2029: 14.4%
Market Summary
Anomaly detection, a critical component of advanced analytics, is witnessing significant adoption across various industries, with the financial services sector leading the charge. The increasing incidence of internal threats and cybersecurity frauds necessitates the need for robust anomaly detection solutions. These tools help organizations identify unusual patterns and deviations from normal behavior, enabling proactive response to potential threats and ensuring operational efficiency. For instance, in a supply chain context, anomaly detection can help identify discrepancies in inventory levels or delivery schedules, leading to cost savings and improved customer satisfaction. In the realm of compliance, anomaly detection can assist in maintaining regulatory adherence by flagging unusual transactions or activities, thereby reducing the risk of penalties and reputational damage.
According to recent research, organizations that implement anomaly detection solutions experience a reduction in error rates by up to 25%. This improvement not only enhances operational efficiency but also contributes to increased customer trust and satisfaction. Despite these benefits, challenges persist, including data quality and the need for real-time processing capabilities. As the market continues to evolve, advancements in machine learning and artificial intelligence are expected to address these challenges and drive further growth.
What will be the Size of the Anomaly Detection Market during the forecast period?
Get Key Insights on Market Forecast (PDF) Request Free Sample
How is the Anomaly Detection Market Segmented?
The anomaly detection industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment
Cloud
On-premises
Component
Solution
Services
End-user
BFSI
IT and telecom
Retail and e-commerce
Manufacturing
Others
Technology
Big data analytics
AI and ML
Data mining and business intelligence
Geography
North America
US
Canada
Mexico
Europe
France
Germany
Spain
UK
APAC
China
India
Japan
Rest of World (ROW)
By Deployment Insights
The cloud segment is estimated to witness significant growth during the forecast period.
The market is witnessing significant growth, driven by the increasing adoption of advanced technologies such as machine learning algorithms, predictive modeling tools, and real-time monitoring systems. Businesses are increasingly relying on anomaly detection solutions to enhance their root cause analysis, improve system health indicators, and reduce false positives. This is particularly true in sectors where data is generated in real-time, such as cybersecurity threat detection, network intrusion detection, and fraud detection systems. Cloud-based anomaly detection solutions are gaining popularity due to their flexibility, scalability, and cost-effectiveness.
This growth is attributed to cloud-based solutions' quick deployment, real-time data visibility, and customization capabilities, which are offered at flexible payment options like monthly subscriptions and pay-as-you-go models. Companies like Anodot, Ltd, Cisco Systems Inc, IBM Corp, and SAS Institute Inc provide both cloud-based and on-premise anomaly detection solutions. Anomaly detection methods include outlier detection, change point detection, and statistical process control. Data preprocessing steps, such as data mining techniques and feature engineering processes, are crucial in ensuring accurate anomaly detection. Data visualization dashboards and alert fatigue mitigation techniques help in managing and interpreting the vast amounts of data generated.
Network traffic analysis, log file analysis, and sensor data integration are essential components of anomaly detection systems. Additionally, risk management frameworks, drift detection algorithms, time series forecasting, and performance degradation detection are vital in maintaining system performance and capacity planning.
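As a toy illustration of the statistical process control flavor of anomaly detection mentioned above (a sketch only; the threshold, rule, and latency readings are hypothetical):

```python
import statistics

def control_chart_anomalies(values, k=2.0):
    """Flag points more than k standard deviations from the mean,
    a simple Shewhart-style control-chart rule."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Hypothetical sensor readings with one injected spike at index 6.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1, 9.7, 10.0]
print(control_chart_anomalies(readings))  # [6]
```

Production systems layer techniques on top of such rules (robust statistics, drift detection, change-point tests) precisely because a single global mean and standard deviation are easily distorted by the anomalies themselves.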
During a 2023 survey conducted in a variety of countries across the globe, it was found that 50 percent of respondents considered artificial intelligence (AI) to be a technology of strategic importance and would prioritize it in the coming year. 5G came in hot on the heels of AI, with 46 percent of respondents saying they would prioritize it.
Artificial intelligence
Artificial intelligence refers to the development of computer and machine skills to mimic human mind capabilities, such as problem-solving and decision-making. Particularly, AI learns from previous experiences to understand and respond to language, decisions, and problems. In recent years, more and more industries have adopted AI, from automotive to retail to healthcare, deployed to perform a variety of different tasks, including service operations and supply chain management. However, given its fast development, AI is not only affecting industries and job markets but is also impacting our everyday life.
Big data analytics
The expression “big data” indicates extremely large data sets that are difficult to process using traditional data-processing application software. In recent years, the size of the big data analytics market has increased and is forecast to amount to over 308 billion U.S. dollars in 2023. The growth of the big data analytics market has been fueled by the exponential growth in the volume of data exchanged online via a variety of sources, ranging from healthcare to social media. Tech giants like Oracle, Microsoft, and IBM form part of the market, providing big data analytics software tools for predictive analytics, forecasting, data mining, and optimization.
https://www.wiseguyreports.com/pages/privacy-policy
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 3.75 (USD Billion) |
| MARKET SIZE 2025 | 4.25 (USD Billion) |
| MARKET SIZE 2035 | 15.0 (USD Billion) |
| SEGMENTS COVERED | Application, Deployment Type, End User, Technology, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Rapid technological advancements, Increasing demand for data-driven insights, Growing adoption of cloud computing, Rise in automation and efficiency, Expanding regulatory compliance requirements |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | NVIDIA, MicroStrategy, Microsoft, Google, Alteryx, Oracle, Domo, SAP, SAS Institute, DataRobot, Amazon, Qlik, Siemens, TIBCO Software, Palantir Technologies, Salesforce, IBM |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Increased demand for real-time analytics, Growth of big data applications, Rising cloud adoption for data solutions, Expanding AI technology integration, Focus on predictive analytics capabilities |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 13.4% (2025 - 2035) |
https://www.technavio.com/content/privacy-notice
Online Data Science Training Programs Market Size 2025-2029
The online data science training programs market size is forecast to increase by USD 8.67 billion, at a CAGR of 35.8% between 2024 and 2029.
The market is experiencing significant growth due to the increasing demand for data science professionals in various industries. The job market offers lucrative opportunities for individuals with data science skills, making online training programs an attractive option for those seeking to upskill or reskill.
Another key driver in the market is the adoption of microlearning and gamification techniques in data science training. These approaches make learning more engaging and accessible, allowing individuals to acquire new skills at their own pace. Furthermore, the availability of open-source learning materials has democratized access to data science education, enabling a larger pool of learners to enter the field.
However, the market also faces challenges, including the need for continuous updates to keep up with the rapidly evolving data science landscape and the lack of standardization in online training programs, which can make it difficult for employers to assess the quality of graduates. Companies seeking to capitalize on market opportunities should focus on offering up-to-date, high-quality training programs that incorporate microlearning and gamification techniques, while also addressing the challenges of continuous updates and standardization. By doing so, they can differentiate themselves in a competitive market and meet the evolving needs of learners and employers alike.
What will be the Size of the Online Data Science Training Programs Market during the forecast period?
The online data science training market continues to evolve, driven by the increasing demand for data-driven insights and innovations across various sectors. Data science applications, from computer vision and deep learning to natural language processing and predictive analytics, are revolutionizing industries and transforming business operations. Industry case studies showcase the impact of data science in action, with big data and machine learning driving advancements in healthcare, finance, and retail. Virtual labs enable learners to gain hands-on experience, while data scientist salaries remain competitive and attractive. Cloud computing and data science platforms facilitate interactive learning and collaborative research, fostering a vibrant data science community. Data privacy and security concerns are addressed through advanced data governance and ethical frameworks. Data science libraries, such as TensorFlow and Scikit-Learn, streamline the development process, while data storytelling tools help communicate complex insights effectively. Data mining and predictive analytics enable organizations to uncover hidden trends and patterns, driving innovation and growth. The future of data science is bright, with ongoing research and development in areas like data ethics, data governance, and artificial intelligence. Data science conferences and education programs provide opportunities for professionals to expand their knowledge and expertise, ensuring they remain at the forefront of this dynamic field.
How is this Online Data Science Training Programs Industry segmented?
The online data science training programs industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

- Type: Professional degree courses; Certification courses
- Application: Students; Working professionals
- Language: R programming; Python; Big ML; SAS; Others
- Method: Live streaming; Recorded
- Program Type: Bootcamps; Certificates; Degree Programs
- Geography: North America (US, Mexico); Europe (France, Germany, Italy, UK); Middle East and Africa (UAE); APAC (Australia, China, India, Japan, South Korea); South America (Brazil); Rest of World (ROW)
By Type Insights
The professional degree courses segment is estimated to witness significant growth during the forecast period. The market encompasses various segments catering to diverse learning needs. The professional degree course segment holds a significant position, offering comprehensive and in-depth training in data science. This segment's curriculum covers essential aspects such as statistical analysis, machine learning, data visualization, and data engineering. Delivered by industry professionals and academic experts, these courses ensure a high-quality education experience. Interactive learning environments, including live lectures, webinars, and group discussions, foster a collaborative and engaging experience. Data science applications, including deep learning, computer vision, and natural language processing, are integral to the market's growth. Data analysis, a crucial application, is gaining traction due to the increasing demand for data-driven decision-making.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic depends on complex epidemiological models that must be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset, obtained from official data sources with corrections for systematic measurement errors, together with a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the standard attributes of official data sources, such as daily mortality and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), the World Health Organization (WHO), and the European Centre for Disease Prevention and Control (ECDC). The data were collected using text mining techniques and by reviewing PDF reports, metadata, and reference data. The combined dataset includes complete spatial data, such as country area, international country number, Alpha-2 code, Alpha-3 code, latitude, and longitude, as well as additional attributes such as population. The improved dataset benefits from major corrections to the referenced datasets and official reports: adjustments to reporting dates, which suffered from a one-to-two-day lag; removal of negative values; detection of unreasonable changes to historical data in new reports; and corrections of systematic measurement errors, which had been growing as the outbreak spread and more countries contributed data to the official repositories. Additionally, the root mean square error of attributes in paired comparisons of the datasets was used to identify the main data problems. The data for China are presented separately and in more detail, extracted from the reports available on the main page of the Chinese CDC website.
This dataset is a comprehensive and reliable source of worldwide COVID-19 data for epidemiological models that assess the magnitude and timeline of confirmed cases, produce long-term predictions of deaths or hospital utilization, estimate the effects of quarantine, stay-at-home orders, and other social distancing measures, or identify the pandemic's turning point. It can also support economic and social impact analyses, helping national and local authorities decide how to implement an adaptive response approach to re-opening the economy, re-opening schools, alleviating business and social distancing restrictions, designing economic programs, or allowing sports events to resume.
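The paired-dataset RMSE check described above can be sketched as follows. This is a minimal illustration of the general technique, not the authors' actual pipeline; the source names and the daily case figures are hypothetical.

```python
import math

def rmse(series_a, series_b):
    """Root mean square error between two equal-length attribute series."""
    if len(series_a) != len(series_b):
        raise ValueError("paired series must have equal length")
    squared = [(a - b) ** 2 for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical daily new-case counts for one country, reported by two sources.
# A large RMSE between sources flags an attribute worth investigating.
who_cases = [120, 135, 150, 160, 180]
ecdc_cases = [118, 137, 149, 163, 179]

print(round(rmse(who_cases, ecdc_cases), 2))  # prints 1.95
```

Computing this per attribute and per source pair gives a simple ranking of where the official collections disagree most.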
The Data Analytics market was valued at USD 57.76 billion in 2023 and is projected to reach USD 302.74 billion by 2032, with an expected CAGR of 26.7% during the forecast period. The data analytics market encompasses tools and technologies that analyze and interpret complex data sets to derive actionable insights. It involves techniques such as data mining, predictive analytics, and statistical analysis, enabling organizations to make informed decisions. Key uses include improving operational efficiency, enhancing customer experiences, and driving strategic planning across industries like healthcare, finance, and retail. Applications range from fraud detection and risk management to marketing optimization and supply chain management. Current trends highlight the growing adoption of artificial intelligence and machine learning for advanced analytics, the rise of real-time data processing, and an increasing focus on data privacy and security. As businesses seek to leverage data for competitive advantage, the demand for analytics solutions continues to grow.
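As a quick sanity check on the figures above, the implied compound annual growth rate can be recomputed. The quoted 26.7% is consistent with a compounding span of about seven years (roughly 2025 to 2032); that span is an assumption here, since the passage does not state the forecast window explicitly.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value, and span in years."""
    return (end_value / start_value) ** (1 / years) - 1

# USD 57.76 billion growing to USD 302.74 billion over an assumed 7-year
# span reproduces the quoted 26.7% CAGR.
print(f"{cagr(57.76, 302.74, 7):.1%}")  # prints 26.7%
```

Note that compounding from the 2023 base over nine years would instead imply roughly 20% per year, which is why the span assumption matters when reading such headline figures.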
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
The data are expressed as the mean (standard deviation).