100+ datasets found

d
Data Mining in Systems Health Management
catalog.data.gov
s.cnmilf.com
+1more
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Data Mining in Systems Health Management [Dataset]. https://catalog.data.gov/dataset/data-mining-in-systems-health-management
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
This chapter presents theoretical and practical aspects associated to the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current esti- mate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of es- timating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the predic- tion step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows to estimate of the probability of failure at future time instants (RUL PDF) in real-time, providing information about time-to- failure (TTF) expectations, statistical confidence intervals, long-term predic- tions; using for this purpose empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for the improvement of the system reliability and cost-effective operation of critical assets, as it has been shown in a case study where feedback correction strategies (based on uncertainty measures) have been implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feed- back loop is implemented using simple linear relationships, it is helpful to provide a quick insight into the manner that the system reacts to changes on its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian pdf’s since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault seeded test showed that the proposed framework was able to anticipate modifications on the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will be focused on the development and testing of similar strategies using different input-output uncertainty metrics.
Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf
frontiersin.figshare.com
pdf
Updated Jun 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xin Qiao; Hong Jiao (2023). Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2018.02231.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fpsyg.2018.02231.s001
Dataset updated
Jun 7, 2023
Dataset provided by
Frontiers Mediahttp://www.frontiersin.org/
Authors
Xin Qiao; Hong Jiao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Due to increasing use of technology-enhanced educational assessment, data mining methods have been explored to analyse process data in log files from such assessment. However, most studies were limited to one data mining technique under one specific scenario. The current study demonstrates the usage of four frequently used supervised techniques, including Classification and Regression Trees (CART), gradient boosting, random forest, support vector machine (SVM), and two unsupervised methods, Self-organizing Map (SOM) and k-means, fitted to one assessment data. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA) responding to problem-solving items is extracted to demonstrate the methods. After concrete feature generation and feature selection, classifier development procedures are implemented using the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for the selection of classifiers are presented based on the research questions, the interpretability and the simplicity of the classifiers. Interpretations for the results from both supervised and unsupervised learning methods are provided.
Data from: Enriching time series datasets using Nonparametric kernel...
figshare.com
pdf
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mohamad Ivan Fanany (2023). Enriching time series datasets using Nonparametric kernel regression to improve forecasting accuracy [Dataset]. http://doi.org/10.6084/m9.figshare.1609661.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.1609661.v1
Dataset updated
May 31, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Mohamad Ivan Fanany
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Improving the accuracy of prediction on future values based on the past and current observations has been pursued by enhancing the prediction's methods, combining those methods or performing data pre-processing. In this paper, another approach is taken, namely by increasing the number of input in the dataset. This approach would be useful especially for a shorter time series data. By filling the in-between values in the time series, the number of training set can be increased, thus increasing the generalization capability of the predictor. The algorithm used to make prediction is Neural Network as it is widely used in literature for time series tasks. For comparison, Support Vector Regression is also employed. The dataset used in the experiment is the frequency of USPTO's patents and PubMed's scientific publications on the field of health, namely on Apnea, Arrhythmia, and Sleep Stages. Another time series data designated for NN3 Competition in the field of transportation is also used for benchmarking. The experimental result shows that the prediction performance can be significantly increased by filling in-between data in the time series. Furthermore, the use of detrend and deseasonalization which separates the data into trend, seasonal and stationary time series also improve the prediction performance both on original and filled dataset. The optimal number of increase on the dataset in this experiment is about five times of the length of original dataset.
Z
Data Analysis for the Systematic Literature Review of DL4SE
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jul 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4768586
Explore at:
Dataset updated
Jul 19, 2024
Dataset provided by
Washington and Lee University
College of William and Mary
Authors
Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong; 2014). An Exploratory Data Analysis (EDA) comprises a set of statistical and data mining procedures to describe data. We ran EDA to provide statistical facts and inform conclusions. The mined facts allow attaining arguments that would influence the Systematic Literature Review of DL4SE.

The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers for the proposed research questions and formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships among Deep Learning reported literature in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.

Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad, et al; 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD involves five stages:

Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organize the data into 35 features or attributes that you find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.

Preprocessing. The preprocessing applied was transforming the features into the correct type (nominal), removing outliers (papers that do not belong to the DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalize the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. Similarly, the same normalization was applied to other features like “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the paper by the data mining tasks or methods.

Transformation. In this stage, we omitted to use any data transformation method except for the clustering analysis. We performed a Principal Component Analysis to reduce 35 features into 2 components for visualization purposes. Furthermore, PCA also allowed us to identify the number of clusters that exhibit the maximum reduction in variance. In other words, it helped us to identify the number of clusters to be used when tuning the explainable models.

Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented to uncover hidden relationships on the extracted features (Correlations and Association Rules) and to categorize the DL4SE papers for a better segmentation of the state-of-the-art (Clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR od DL4SE”. 5.Interpretation/Evaluation. We used the Knowledge Discover to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes. This reasoning process produces an argument support analysis (see this link).

We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.

Support = Number of occurrences this statement is true divided by the amount of statements Confidence = The support of the statement divided by the number of occurrences of the premise
e
U.S. Data Analysis Storage Management Market Research Report By Product Type...
exactitudeconsultancy.com
Updated Mar 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Exactitude Consultancy (2025). U.S. Data Analysis Storage Management Market Research Report By Product Type (On-Premises, Cloud-Based), By Application (Data Warehousing, Data Mining, Big Data Analytics), By End User (Healthcare, BFSI, Retail, IT and Telecom), By Technology (Hadoop, SQL Databases, NoSQL Databases), By Distribution Channel (Direct Sales, Online Sales) – Forecast to 2034. [Dataset]. https://exactitudeconsultancy.com/reports/50774/u-s-data-analysis-storage-management-market
Explore at:
Dataset updated
Mar 2025
Dataset authored and provided by
Exactitude Consultancy
License
https://exactitudeconsultancy.com/privacy-policyhttps://exactitudeconsultancy.com/privacy-policy
Description
The U.S. Data Analysis Storage Management market is projected to be valued at $10 billion in 2024, driven by factors such as increasing consumer awareness and the rising prevalence of industry-specific trends. The market is expected to grow at a CAGR of 12%, reaching approximately $31 billion by 2034.

Global Big Data Tool Market Research Report: By Deployment Model...

wiseguyreports.com

Updated Sep 15, 2025

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

(2025). Global Big Data Tool Market Research Report: By Deployment Model (On-Premises, Cloud-Based, Hybrid), By Tool Type (Data Processing Tools, Data Visualization Tools, Data Mining Tools, Data Storage Tools), By End User (BFSI, Healthcare, Retail, Manufacturing, IT and Telecommunications), By Application (Fraud Detection, Customer Analytics, Predictive Maintenance, Supply Chain Management) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/big-data-tool-market

Explore at:

Dataset updated

Sep 15, 2025

License

https://www.wiseguyreports.com/pages/privacy-policyhttps://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Sep 25, 2025

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2023
REGIONS COVERED	North America, Europe, APAC, South America, MEA
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2024	39.4(USD Billion)
MARKET SIZE 2025	43.3(USD Billion)
MARKET SIZE 2035	110.5(USD Billion)
SEGMENTS COVERED	Deployment Model, Tool Type, End User, Application, Regional
COUNTRIES COVERED	US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
KEY MARKET DYNAMICS	Increasing data volume, Demand for real-time analytics, Growing cloud adoption, Rising need for data security, Advancements in AI technologies
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Informatica, IBM, Amazon Web Services, Snowflake, DataRobot, Domo, Oracle, SAP, Microsoft, MongoDB, Cloudera, Google, SAS Institute, Teradata, Qlik, Hortonworks
MARKET FORECAST PERIOD	2025 - 2035
KEY MARKET OPPORTUNITIES	Increased demand for analytics solutions, Growth in cloud-based big data tools, Rise of AI and machine learning integration, Expansion in real-time data processing, Enhanced data privacy and security needs
COMPOUND ANNUAL GROWTH RATE (CAGR)	9.8% (2025 - 2035)

Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North...
technavio.com
pdf
Updated Feb 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Technavio (2025). Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, UK), APAC (China, India, Japan), South America (Brazil), and Middle East and Africa (UAE) [Dataset]. https://www.technavio.com/report/data-science-platform-market-industry-analysis
Explore at:
pdfAvailable download formats
Dataset updated
Feb 8, 2025
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice
Time period covered
2025 - 2029
Area covered
United States
Description
Snapshot img

Data Science Platform Market Size 2025-2029

The data science platform market size is valued to increase USD 763.9 million, at a CAGR of 40.2% from 2024 to 2029. Integration of AI and ML technologies with data science platforms will drive the data science platform market.

Major Market Trends & Insights

North America dominated the market and accounted for a 48% growth during the forecast period. By Deployment - On-premises segment was valued at USD 38.70 million in 2023 By Component - Platform segment accounted for the largest market revenue share in 2023

Market Size & Forecast

Market Opportunities: USD 1.00 million Market Future Opportunities: USD 763.90 million CAGR : 40.2% North America: Largest market in 2023

Market Summary

The market represents a dynamic and continually evolving landscape, underpinned by advancements in core technologies and applications. Key technologies, such as machine learning and artificial intelligence, are increasingly integrated into data science platforms to enhance predictive analytics and automate data processing. Additionally, the emergence of containerization and microservices in data science platforms enables greater flexibility and scalability. However, the market also faces challenges, including data privacy and security risks, which necessitate robust compliance with regulations. According to recent estimates, the market is expected to account for over 30% of the overall big data analytics market by 2025, underscoring its growing importance in the data-driven business landscape.

What will be the Size of the Data Science Platform Market during the forecast period?

Get Key Insights on Market Forecast (PDF) Request Free Sample

How is the Data Science Platform Market Segmented and what are the key trends of market segmentation?

The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Deployment On-premises Cloud Component Platform Services End-user BFSI Retail and e-commerce Manufacturing Media and entertainment Others Sector Large enterprises SMEs Application Data Preparation Data Visualization Machine Learning Predictive Analytics Data Governance Others Geography North America US Canada Europe France Germany UK Middle East and Africa UAE APAC China India Japan South America Brazil Rest of World (ROW)

By Deployment Insights

The on-premises segment is estimated to witness significant growth during the forecast period.

In the dynamic and evolving the market, big data processing is a key focus, enabling advanced model accuracy metrics through various data mining methods. Distributed computing and algorithm optimization are integral components, ensuring efficient handling of large datasets. Data governance policies are crucial for managing data security protocols and ensuring data lineage tracking. Software development kits, model versioning, and anomaly detection systems facilitate seamless development, deployment, and monitoring of predictive modeling techniques, including machine learning algorithms, regression analysis, and statistical modeling. Real-time data streaming and parallelized algorithms enable real-time insights, while predictive modeling techniques and machine learning algorithms drive business intelligence and decision-making.

Cloud computing infrastructure, data visualization tools, high-performance computing, and database management systems support scalable data solutions and efficient data warehousing. ETL processes and data integration pipelines ensure data quality assessment and feature engineering techniques. Clustering techniques and natural language processing are essential for advanced data analysis. The market is witnessing significant growth, with adoption increasing by 18.7% in the past year, and industry experts anticipate a further expansion of 21.6% in the upcoming period. Companies across various sectors are recognizing the potential of data science platforms, leading to a surge in demand for scalable, secure, and efficient solutions.

API integration services and deep learning frameworks are gaining traction, offering advanced capabilities and seamless integration with existing systems. Data security protocols and model explainability methods are becoming increasingly important, ensuring transparency and trust in data-driven decision-making. The market is expected to continue unfolding, with ongoing advancements in technology and evolving business needs shaping its future trajectory.

Request Free Sample

The On-premises segment was valued at USD 38.70 million in 2019 and showed
D
Data Processing and Hosting Services Industry Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Report Analytics (2025). Data Processing and Hosting Services Industry Report [Dataset]. https://www.marketreportanalytics.com/reports/data-processing-and-hosting-services-industry-89228
Explore at:
pdf, doc, pptAvailable download formats
Dataset updated
Apr 26, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policyhttps://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Processing and Hosting Services market, exhibiting a Compound Annual Growth Rate (CAGR) of 4.20%, presents a significant opportunity for growth. While the exact market size in millions is not specified, considering the substantial involvement of major players like Amazon Web Services, IBM, and Salesforce, coupled with the pervasive adoption of cloud computing and big data analytics across diverse sectors, a 2025 market size exceeding $500 billion is a reasonable estimate. This robust growth is driven by several key factors. The increasing reliance on cloud-based solutions by both large enterprises and SMEs reflects a shift towards greater scalability, flexibility, and cost-effectiveness. Furthermore, the exponential growth of data necessitates advanced data processing capabilities, fueling demand for data mining, cleansing, and management services. The burgeoning adoption of AI and machine learning further enhances this need, as these technologies require robust data infrastructure and sophisticated processing techniques. Specific industry segments like IT & Telecommunications, BFSI (Banking, Financial Services, and Insurance), and Retail are major consumers, demanding reliable and secure hosting solutions and data processing capabilities to manage their critical operations and customer data. However, challenges remain, including the ongoing threat of cyberattacks and data breaches, necessitating robust security measures and compliance with evolving data privacy regulations. Competition among existing players is intense, driving innovation and price wars, which can impact profitability for some market participants. The forecast period of 2025-2033 indicates a continued upward trajectory for the market, largely fueled by expanding digitalization efforts globally. The Asia Pacific region is projected to be a significant contributor to this growth, driven by increasing internet penetration and a burgeoning technological landscape. While North America and Europe maintain substantial market share, the faster growth rate anticipated in Asia Pacific and other emerging markets signifies an evolving global market dynamic. Continued advancements in technologies such as edge computing, serverless architecture, and improved data analytics techniques will further drive market expansion and shape the competitive landscape. The segmentation within the market (by organization size, service offering, and end-user industry) presents diverse investment opportunities for businesses catering to specific needs and technological advancements within these niches. Recent developments include: December 2022 - TetraScience, the Scientific Data Cloud company, announced that Gubbs, a lab optimization, and validation software leader, joined the Tetra Partner Network to increase and enhance data processing throughput with the Tetra Scientific Data Cloud., November 2022 - Kinsta, a hosting provider that provides managed WordPress hosting powered by Google Cloud Platform, announced the launch of Application Hosting and Database Hosting. It is adding these two hosting services to its Managed WordPress product ushers in a new era for Kinsta as a Cloud Platform, enabling developers and businesses to run powerful applications, databases, websites, and services more flexibly than ever.. Key drivers for this market are: Growing Adoption of Cloud Computing to Accomplish Economies of Scale, Rising Demand for Outsourcing Data Processing Services. Potential restraints include: Growing Adoption of Cloud Computing to Accomplish Economies of Scale, Rising Demand for Outsourcing Data Processing Services. Notable trends are: Web Hosting is Gaining Traction Due to Emergence of Cloud-based Platform.
Video-to-Model Data Set
figshare.com
commons.datacite.org
xml
Updated Mar 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz (2020). Video-to-Model Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.12026850.v1
Explore at:
xmlAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12026850.v1
Dataset updated
Mar 24, 2020
Dataset provided by
Figsharehttp://figshare.com/
Authors
Sönke Knoch; Shreeraman Ponpathirkoottam; Tim Schwartz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set belongs to the paper "Video-to-Model: Unsupervised Trace Extraction from Videos for Process Discovery and Conformance Checking in Manual Assembly", submitted on March 24, 2020, to the 18th International Conference on Business Process Management (BPM).Abstract: Manual activities are often hidden deep down in discrete manufacturing processes. For the elicitation and optimization of process behavior, complete information about the execution of Manual activities are required. Thus, an approach is presented on how execution level information can be extracted from videos in manual assembly. The goal is the generation of a log that can be used in state-of-the-art process mining tools. The test bed for the system was lightweight and scalable consisting of an assembly workstation equipped with a single RGB camera recording only the hand movements of the worker from top. A neural network based real-time object classifier was trained to detect the worker’s hands. The hand detector delivers the input for an algorithm, which generates trajectories reflecting the movement paths of the hands. Those trajectories are automatically assigned to work steps using the position of material boxes on the assembly shelf as reference points and hierarchical clustering of similar behaviors with dynamic time warping. The system has been evaluated in a task-based study with ten participants in a laboratory, but under realistic conditions. The generated logs have been loaded into the process mining toolkit ProM to discover the underlying process model and to detect deviations from both, instructions and ground truth, using conformance checking. The results show that process mining delivers insights about the assembly process and the system’s precision.The data set contains the generated and the annotated logs based on the video material gathered during the user study. In addition, the petri nets from the process discovery and conformance checking conducted with ProM (http://www.promtools.org) and the reference nets modeled with Yasper (http://www.yasper.org/) are provided.
d
Privacy Preserving Distributed Data Mining
catalog.data.gov
s.cnmilf.com
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Privacy Preserving Distributed Data Mining [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-distributed-data-mining
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an airline manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircrafts, during their operation, generate huge amount of data. Mining this data can reveal useful information regarding the health and operability of the aircraft which can be useful for disaster management and prediction of efficient operating regimes. Now if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircrafts of model-type [tex]$A$[/tex] belonging to different airlines then central collection of data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:
Data Mining Tools Market - A Global and Regional Analysis
bisresearch.com
csv, pdf
Updated Nov 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bisresearch (2025). Data Mining Tools Market - A Global and Regional Analysis [Dataset]. https://bisresearch.com/industry-report/global-data-mining-tools-market.html
Explore at:
csv, pdfAvailable download formats
Dataset updated
Nov 30, 2025
Dataset authored and provided by
Bisresearch
License
https://bisresearch.com/privacy-policy-cookie-restriction-modehttps://bisresearch.com/privacy-policy-cookie-restriction-mode
Time period covered
2023 - 2033
Area covered
Worldwide
Description
The Data Mining Tools Market is expected to be valued at $1.24 billion in 2024, with an anticipated expansion at a CAGR of 11.63% to reach $3.73 billion by 2034.
e
Data pre-processing and clean-up
paper.erudition.co.in
html
Updated Dec 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Einetic (2025). Data pre-processing and clean-up [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering-artificial-intelligence-and-machine-learning/6/data-mining
Explore at:
htmlAvailable download formats
Dataset updated
Dec 3, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/termshttps://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Data pre-processing and clean-up of Data Mining, 6th Semester , B.Tech in Computer Science & Engineering (Artificial Intelligence and Machine Learning)
l
LSC (Leicester Scientific Corpus)
figshare.le.ac.uk
Updated Apr 15, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neslihan Suzen (2020). LSC (Leicester Scientific Corpus) [Dataset]. http://doi.org/10.25392/leicester.data.9449639.v2
Explore at:
Unique identifier
https://doi.org/10.25392/leicester.data.9449639.v2
Dataset updated
Apr 15, 2020
Dataset provided by
University of Leicester
Authors
Neslihan Suzen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Leicester
Description
The LSC (Leicester Scientific Corpus)

April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk) Supervised by Prof Alexander Gorban and Dr Evgeny MirkesThe data are extracted from the Web of Science [1]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.[Version 2] A further cleaning is applied in Data Processing for LSC Abstracts in Version 1*. Details of cleaning procedure are explained in Step 6.* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v1.Getting StartedThis text provides the information on the LSC (Leicester Scientific Corpus) and pre-processing steps on abstracts, and describes the structure of files to organise the corpus. This corpus is created to be used in future work on the quantification of the meaning of research texts and make it available for use in Natural Language Processing projects.LSC is a collection of abstracts of articles and proceeding papers published in 2014, and indexed by the Web of Science (WoS) database [1]. The corpus contains only documents in English. Each document in the corpus contains the following parts:1. Authors: The list of authors of the paper2. Title: The title of the paper 3. Abstract: The abstract of the paper 4. Categories: One or more category from the list of categories [2]. Full list of categories is presented in file ‘List_of _Categories.txt’. 5. Research Areas: One or more research area from the list of research areas [3]. Full list of research areas is presented in file ‘List_of_Research_Areas.txt’. 6. Total Times cited: The number of times the paper was cited by other items from all databases within Web of Science platform [4] 7. Times cited in Core Collection: The total number of times the paper was cited by other papers within the WoS Core Collection [4]The corpus was collected in July 2018 online and contains the number of citations from publication date to July 2018. We describe a document as the collection of information (about a paper) listed above. The total number of documents in LSC is 1,673,350.Data ProcessingStep 1: Downloading of the Data Online

The dataset is collected manually by exporting documents as Tab-delimitated files online. All documents are available online.Step 2: Importing the Dataset to R

The LSC was collected as TXT files. All documents are extracted to R.Step 3: Cleaning the Data from Documents with Empty Abstract or without CategoryAs our research is based on the analysis of abstracts and categories, all documents with empty abstracts and documents without categories are removed.Step 4: Identification and Correction of Concatenate Words in AbstractsEspecially medicine-related publications use ‘structured abstracts’. Such type of abstracts are divided into sections with distinct headings such as introduction, aim, objective, method, result, conclusion etc. Used tool for extracting abstracts leads concatenate words of section headings with the first word of the section. For instance, we observe words such as ConclusionHigher and ConclusionsRT etc. The detection and identification of such words is done by sampling of medicine-related publications with human intervention. Detected concatenate words are split into two words. For instance, the word ‘ConclusionHigher’ is split into ‘Conclusion’ and ‘Higher’.The section headings in such abstracts are listed below:

Background Method(s) Design Theoretical Measurement(s) Location Aim(s) Methodology Process Abstract Population Approach Objective(s) Purpose(s) Subject(s) Introduction Implication(s) Patient(s) Procedure(s) Hypothesis Measure(s) Setting(s) Limitation(s) Discussion Conclusion(s) Result(s) Finding(s) Material (s) Rationale(s) Implications for health and nursing policyStep 5: Extracting (Sub-setting) the Data Based on Lengths of AbstractsAfter correction, the lengths of abstracts are calculated. ‘Length’ indicates the total number of words in the text, calculated by the same rule as for Microsoft Word ‘word count’ [5].According to APA style manual [6], an abstract should contain between 150 to 250 words. In LSC, we decided to limit length of abstracts from 30 to 500 words in order to study documents with abstracts of typical length ranges and to avoid the effect of the length to the analysis.

Step 6: [Version 2] Cleaning Copyright Notices, Permission polices, Journal Names and Conference Names from LSC Abstracts in Version 1Publications can include a footer of copyright notice, permission policy, journal name, licence, author’s right or conference name below the text of abstract by conferences and journals. Used tool for extracting and processing abstracts in WoS database leads to attached such footers to the text. For example, our casual observation yields that copyright notices such as ‘Published by Elsevier ltd.’ is placed in many texts. To avoid abnormal appearances of words in further analysis of words such as bias in frequency calculation, we performed a cleaning procedure on such sentences and phrases in abstracts of LSC version 1. We removed copyright notices, names of conferences, names of journals, authors’ rights, licenses and permission policies identiﬁed by sampling of abstracts.Step 7: [Version 2] Re-extracting (Sub-setting) the Data Based on Lengths of AbstractsThe cleaning procedure described in previous step leaded to some abstracts having less than our minimum length criteria (30 words). 474 texts were removed.Step 8: Saving the Dataset into CSV FormatDocuments are saved into 34 CSV files. In CSV files, the information is organised with one record on each line and parts of abstract, title, list of authors, list of categories, list of research areas, and times cited is recorded in fields.To access the LSC for research purposes, please email to ns433@le.ac.uk.References[1]Web of Science. (15 July). Available: https://apps.webofknowledge.com/ [2]WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html [3]Research Areas in WoS. Available: https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html [4]Times Cited in WoS Core Collection. (15 July). Available: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Times-Cited-accessibility-and-variation?language=en_US [5]Word Count. Available: https://support.office.com/en-us/article/show-word-count-3c9e6a11-a04d-43b4-977c-563a0e0d5da3 [6]A. P. Association, Publication manual. American Psychological Association Washington, DC, 1983.
Designing a more efficient, effective and safe Medical Emergency Team (MET)...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christoph Bergmeir; Irma Bilgrami; Christopher Bain; Geoffrey I. Webb; Judit Orosz; David Pilcher (2023). Designing a more efficient, effective and safe Medical Emergency Team (MET) service using data analysis [Dataset]. http://doi.org/10.1371/journal.pone.0188688
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0188688
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Christoph Bergmeir; Irma Bilgrami; Christopher Bain; Geoffrey I. Webb; Judit Orosz; David Pilcher
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
IntroductionHospitals have seen a rise in Medical Emergency Team (MET) reviews. We hypothesised that the commonest MET calls result in similar treatments. Our aim was to design a pre-emptive management algorithm that allowed direct institution of treatment to patients without having to wait for attendance of the MET team and to model its potential impact on MET call incidence and patient outcomes.MethodsData was extracted for all MET calls from the hospital database. Association rule data mining techniques were used to identify the most common combinations of MET call causes, outcomes and therapies.ResultsThere were 13,656 MET calls during the 34-month study period in 7936 patients. The most common MET call was for hypotension [31%, (2459/7936)]. These MET calls were strongly associated with the immediate administration of intra-venous fluid (70% [1714/2459] v 13% [739/5477] p
f
Experimental data for "Software Data Analytics: Architectural Model...
figshare.com
zip
Updated Jun 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cong Liu (2023). Experimental data for "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection" [Dataset]. http://doi.org/10.4121/uuid:ca1b0690-d9c5-4626-a067-525ec9d5881b
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.4121/uuid:ca1b0690-d9c5-4626-a067-525ec9d5881b
Dataset updated
Jun 6, 2023
Dataset provided by
4TU.ResearchData
Authors
Cong Liu
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset includes all experimental data used for the PhD thesis of Cong Liu, entitled "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection". These data are generated by instrumenting both synthetic and real-life software systems, and are formated according to the IEEE XES format. See http://www.xes-standard.org/ and https://www.win.tue.nl/ieeetfpm/lib/exe/fetch.php?media=shared:downloads:2017-06-22-xes-software-event-v5-2.pdf for more explanations.
f
DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf
frontiersin.figshare.com
pdf
Updated May 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi (2023). DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf [Dataset]. http://doi.org/10.3389/fgene.2019.00934.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2019.00934.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Biological data analysis is the key to new discoveries in disease biology and drug discovery. The rapid proliferation of high-throughput ‘omics’ data has necessitated a need for tools and platforms that allow the researchers to combine and analyse different types of biological data and obtain biologically relevant knowledge. We had previously developed TargetMine, an integrative data analysis platform for target prioritisation and broad-based biological knowledge discovery. Here, we describe the newly modelled biological data types and the enhanced visual and analytical features of TargetMine. These enhancements have included: an enhanced coverage of gene–gene relations, small molecule metabolite to pathway mappings, an improved literature survey feature, and in silico prediction of gene functional associations such as protein–protein interactions and global gene co-expression. We have also described two usage examples on trans-omics data analysis and extraction of gene-disease associations using MeSH term descriptors. These examples have demonstrated how the newer enhancements in TargetMine have contributed to a more expansive coverage of the biological data space and can help interpret genotype–phenotype relations. TargetMine with its auxiliary toolkit is available at https://targetmine.mizuguchilab.org. The TargetMine source code is available at https://github.com/chenyian-nibio/targetmine-gradle.
e
Statistical Analysis and Data Mining - if-computation
exaly.com
csv, json
Updated Nov 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Statistical Analysis and Data Mining - if-computation [Dataset]. https://exaly.com/journal/28577/statistical-analysis-and-data-mining/impact-factor
Explore at:
json, csvAvailable download formats
Dataset updated
Nov 1, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This graph shows how the impact factor of ^ is computed. The left axis depicts the number of papers published in years X-1 and X-2, and the right axis displays their citations in year X.
e
Data Mining Tools Market Size, Share, Trend Analysis by 2033
emergenresearch.com
pdf,excel,csv,ppt
Updated Dec 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Emergen Research (2024). Data Mining Tools Market Size, Share, Trend Analysis by 2033 [Dataset]. https://www.emergenresearch.com/industry-report/data-mining-tools-market
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset updated
Dec 19, 2024
Dataset authored and provided by
Emergen Research
License
https://www.emergenresearch.com/privacy-policyhttps://www.emergenresearch.com/privacy-policy
Area covered
Global
Variables measured
Base Year, No. of Pages, Growth Drivers, Forecast Period, Segments covered, Historical Data for, Pitfalls Challenges, 2033 Value Projection, Tables, Charts, and Figures, Forecast Period 2024 - 2033 CAGR, and 1 more
Description
The Data Mining Tools Market size is expected to reach a valuation of USD 3.33 billion in 2033 growing at a CAGR of 12.50%. The Data Mining Tools market research report classifies market by share, trend, demand, forecast and based on segmentation.
D
Data Mining Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 19, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Data Mining Software Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-software-41235
Explore at:
ppt, pdf, docAvailable download formats
Dataset updated
Mar 19, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Data Mining Software market is experiencing robust growth, driven by the increasing need for businesses to extract valuable insights from massive datasets. The market, estimated at $15 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033, reaching an estimated $45 billion by 2033. This expansion is fueled by several key factors. The burgeoning adoption of cloud-based solutions offers scalability and cost-effectiveness, attracting both large enterprises and SMEs. Furthermore, advancements in machine learning and artificial intelligence algorithms are enhancing the accuracy and efficiency of data mining processes, leading to better decision-making across various sectors like finance, healthcare, and marketing. The rise of big data analytics and the increasing availability of affordable, high-powered computing resources are also significant contributors to market growth. However, the market faces certain challenges. Data security and privacy concerns remain paramount, especially with the increasing volume of sensitive information being processed. The complexity of data mining software and the need for skilled professionals to operate and interpret the results present a barrier to entry for some businesses. The high initial investment cost associated with implementing sophisticated data mining solutions can also deter smaller organizations. Nevertheless, the ongoing technological advancements and the growing recognition of the strategic value of data-driven decision-making are expected to overcome these restraints and propel the market toward continued expansion. The market segmentation reveals a strong preference for cloud-based solutions, reflecting the industry's trend toward flexible and scalable IT infrastructure. Large enterprises currently dominate the market share, but SMEs are rapidly adopting data mining software, indicating promising future growth in this segment. Geographic analysis shows that North America and Europe are currently leading the market, but the Asia-Pacific region is poised for significant growth due to increasing digitalization and economic expansion in countries like China and India.
d
Data from: Data Mining at NASA: From Theory to Applications
catalog.data.gov
s.cnmilf.com
+1more
Updated Aug 23, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Data Mining at NASA: From Theory to Applications [Dataset]. https://catalog.data.gov/dataset/data-mining-at-nasa-from-theory-to-applications
Explore at:
Dataset updated
Aug 23, 2025
Dataset provided by
Dashlink
Description
NASA has some of the largest and most complex data sources in the world, with data sources ranging from the earth sciences, space sciences, and massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.

Facebook

Twitter

Click to copy link

Link copied

Cite

Dashlink (2025). Data Mining in Systems Health Management [Dataset]. https://catalog.data.gov/dataset/data-mining-in-systems-health-management

Data Mining in Systems Health Management

Explore at:

12 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Apr 10, 2025

Dataset provided by

Dashlink

Description

This chapter presents theoretical and practical aspects associated to the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current esti- mate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of es- timating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the predic- tion step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows to estimate of the probability of failure at future time instants (RUL PDF) in real-time, providing information about time-to- failure (TTF) expectations, statistical confidence intervals, long-term predic- tions; using for this purpose empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for the improvement of the system reliability and cost-effective operation of critical assets, as it has been shown in a case study where feedback correction strategies (based on uncertainty measures) have been implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feed- back loop is implemented using simple linear relationships, it is helpful to provide a quick insight into the manner that the system reacts to changes on its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian pdf’s since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault seeded test showed that the proposed framework was able to anticipate modifications on the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will be focused on the development and testing of similar strategies using different input-output uncertainty metrics.

Clear search

Close search

Google apps

Main menu

Data Mining in Systems Health Management

Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf

Data from: Enriching time series datasets using Nonparametric kernel...

Data Analysis for the Systematic Literature Review of DL4SE

U.S. Data Analysis Storage Management Market Research Report By Product Type...

Global Big Data Tool Market Research Report: By Deployment Model...

Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North...

Snapshot img

Data Processing and Hosting Services Industry Report

Video-to-Model Data Set

Privacy Preserving Distributed Data Mining

Data Mining Tools Market - A Global and Regional Analysis

Data pre-processing and clean-up

LSC (Leicester Scientific Corpus)

Designing a more efficient, effective and safe Medical Emergency Team (MET)...

Experimental data for "Software Data Analytics: Architectural Model...

DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf

Statistical Analysis and Data Mining - if-computation

Data Mining Tools Market Size, Share, Trend Analysis by 2033

Data Mining Software Report

Data from: Data Mining at NASA: From Theory to Applications

Data Mining in Systems Health Management