100+ datasets found

c
Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...
s.cnmilf.com
cloud.csiss.gmu.edu
+7more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/discovering-anomalous-aviation-safety-events-using-scalable-data-mining-algorithms
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent _domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.
s
Online Feature Selection and Its Applications
researchdata.smu.edu.sg
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN (2023). Online Feature Selection and Its Applications [Dataset]. http://doi.org/10.25440/smu.12062733.v1
Explore at:
Unique identifier
https://doi.org/10.25440/smu.12062733.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN
License
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
Description
Feature selection is an important technique for data mining before a machine learning algorithm is applied. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS) in which an online learner is only allowed to maintain a classifier involved only a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate prediction using a small and fixed number of active features. This is in contrast to the classical setup of online learning where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: (1) learning with full input where an learner is allowed to access all the features to decide the subset of active features, and (2) learning with partial input where only a limited number of features is allowed to be accessed for each instance by the learner. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public datasets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.Related Publication: Hoi, S. C., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 93-100). ACM. http://dx.doi.org/10.1145/2351316.2351329 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2402/ Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2014). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698-710. http://dx.doi.org/10.1109/TKDE.2013.32 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2277/
D
Data Mining Tools Market Report
marketresearchforecast.com
doc, pdf, ppt
Updated Feb 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Data Mining Tools Market Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-tools-market-1722
Explore at:
pdf, ppt, docAvailable download formats
Dataset updated
Feb 3, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Mining Tools Market size was valued at USD 1.01 USD billion in 2023 and is projected to reach USD 1.99 USD billion by 2032, exhibiting a CAGR of 10.2 % during the forecast period. The growing adoption of data-driven decision-making and the increasing need for business intelligence are major factors driving market growth. Data mining refers to filtering, sorting, and classifying data from larger datasets to reveal subtle patterns and relationships, which helps enterprises identify and solve complex business problems through data analysis. Data mining software tools and techniques allow organizations to foresee future market trends and make business-critical decisions at crucial times. Data mining is an essential component of data science that employs advanced data analytics to derive insightful information from large volumes of data. Businesses rely heavily on data mining to undertake analytics initiatives in the organizational setup. The analyzed data sourced from data mining is used for varied analytics and business intelligence (BI) applications, which consider real-time data analysis along with some historical pieces of information. Recent developments include: May 2023 – WiMi Hologram Cloud Inc. introduced a new data interaction system developed by combining neural network technology and data mining. Using real-time interaction, the system can offer reliable and safe information transmission., May 2023 – U.S. Data Mining Group, Inc., operating in bitcoin mining site, announced a hosting contract to deploy 150,000 bitcoins in partnership with major companies such as TeslaWatt, Sphere 3D, Marathon Digital, and more. The company is offering industry turn-key solutions for curtailment, accounting, and customer relations., April 2023 – Artificial intelligence and single-cell biotech analytics firm, One Biosciences, launched a single cell data mining algorithm called ‘MAYA’. The algorithm is for cancer patients to detect therapeutic vulnerabilities., May 2022 – Europe-based Solarisbank, a banking-as-a-service provider, announced its partnership with Snowflake to boost its cloud data strategy. Using the advanced cloud infrastructure, the company can enhance data mining efficiency and strengthen its banking position.. Key drivers for this market are: Increasing Focus on Customer Satisfaction to Drive Market Growth. Potential restraints include: Requirement of Skilled Technical Resources Likely to Hamper Market Growth. Notable trends are: Incorporation of Data Mining and Machine Learning Solutions to Propel Market Growth.
v
Data from: Multi-objective optimization based privacy preserving distributed...
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
cloud.csiss.gmu.edu
+3more
Updated Apr 9, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Multi-objective optimization based privacy preserving distributed data mining in Peer-to-Peer networks [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/multi-objective-optimization-based-privacy-preserving-distributed-data-mining-in-peer-to-p
Explore at:
Dataset updated
Apr 9, 2025
Dataset provided by
Dashlink
Description
This paper proposes a scalable, local privacy preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner through local interactions and it is highly scalable. It particularly deals with the distributed computation of the sum of a set of numbers stored at different peers in a P2P network in the context of a P2P web mining application. The proposed optimization based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have significant impact on many data mining tasks such as multi-party privacy-preserving clustering, frequent itemset mining, and statistical aggregate computation.
w
Dataset of book subjects that contain Data mining algorithms in C++ : data...
workwithdata.com
Updated Nov 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2024). Dataset of book subjects that contain Data mining algorithms in C++ : data patterns and algorithms for modern applications [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Data+mining+algorithms+in+C%2B%2B+:+data+patterns+and+algorithms+for+modern+applications&j=1&j0=books
Explore at:
Dataset updated
Nov 7, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book subjects. It has 6 rows and is filtered where the books is Data mining algorithms in C++ : data patterns and algorithms for modern applications. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
m
Educational Attainment in North Carolina Public Schools: Use of statistical...
data.mendeley.com
Updated Nov 14, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
Explore at:
Unique identifier
https://doi.org/10.17632/6cm9wyd5g5.1
Dataset updated
Nov 14, 2018
Authors
Scott Herford
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The purpose of data mining analysis is always to find patterns of the data using certain kind of techiques such as classification or regression. It is not always feasible to apply classification algorithms directly to dataset. Before doing any work on the data, the data has to be pre-processed and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. Based on our project, after using clustering prior to classification, the performance has not improved much. The reason why it has not improved could be the features we selected to perform clustering are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics. From the dimensionality reduction perspective: It is different from Principle Component Analysis which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters as a technique of reducing the data dimension will lose a lot of information since clustering techniques are based a metric of 'distance'. At high dimensions euclidean distance loses pretty much all meaning. Therefore using clustering as a "Reducing" dimensionality by mapping data points to cluster numbers is not always good since you may lose almost all the information. From the creating new features perspective: Clustering analysis creates labels based on the patterns of the data, it brings uncertainties into the data. By using clustering prior to classification, the decision on the number of clusters will highly affect the performance of the clustering, then affect the performance of classification. If the part of features we use clustering techniques on is very suited for it, it might increase the overall performance on classification. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We did not lock in the clustering outputs using a random_state in the effort to see if they were stable. Our assumption was that if the results vary highly from run to run which they definitely did, maybe the data just does not cluster well with the methods selected at all. Basically, the ramification we saw was that our results are not much better than random when applying clustering to the data preprocessing. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the model real world effectiveness and also to continue to revise the models from time to time as things change.
Data Mining Tools Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Aug 4, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Growth Market Reports (2025). Data Mining Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-mining-tools-market
Explore at:
pdf, csv, pptxAvailable download formats
Dataset updated
Aug 4, 2025
Dataset provided by
Authors
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining Tools Market Outlook

According to our latest research, the global Data Mining Tools market size reached USD 1.93 billion in 2024, reflecting robust industry momentum. The market is expected to grow at a CAGR of 12.7% from 2025 to 2033, reaching a projected value of USD 5.69 billion by 2033. This growth is primarily driven by the increasing adoption of advanced analytics across diverse industries, rapid digital transformation, and the necessity for actionable insights from massive data volumes.

One of the pivotal growth factors propelling the Data Mining Tools market is the exponential rise in data generation, particularly through digital channels, IoT devices, and enterprise applications. Organizations across sectors are leveraging data mining tools to extract meaningful patterns, trends, and correlations from structured and unstructured data. The need for improved decision-making, operational efficiency, and competitive advantage has made data mining an essential component of modern business strategies. Furthermore, advancements in artificial intelligence and machine learning are enhancing the capabilities of these tools, enabling predictive analytics, anomaly detection, and automation of complex analytical tasks, which further fuels market expansion.

Another significant driver is the growing demand for customer-centric solutions in industries such as retail, BFSI, and healthcare. Data mining tools are increasingly being used for customer relationship management, targeted marketing, fraud detection, and risk management. By analyzing customer behavior and preferences, organizations can personalize their offerings, optimize marketing campaigns, and mitigate risks. The integration of data mining tools with cloud platforms and big data technologies has also simplified deployment and scalability, making these solutions accessible to small and medium-sized enterprises (SMEs) as well as large organizations. This democratization of advanced analytics is creating new growth avenues for vendors and service providers.

The regulatory landscape and the increasing emphasis on data privacy and security are also shaping the development and adoption of Data Mining Tools. Compliance with frameworks such as GDPR, HIPAA, and CCPA necessitates robust data governance and transparent analytics processes. Vendors are responding by incorporating features like data masking, encryption, and audit trails into their solutions, thereby enhancing trust and adoption among regulated industries. Additionally, the emergence of industry-specific data mining applications, such as fraud detection in BFSI and predictive diagnostics in healthcare, is expanding the addressable market and fostering innovation.

From a regional perspective, North America currently dominates the Data Mining Tools market owing to the early adoption of advanced analytics, strong presence of leading technology vendors, and high investments in digital transformation. However, the Asia Pacific region is emerging as a lucrative market, driven by rapid industrialization, expansion of IT infrastructure, and growing awareness of data-driven decision-making in countries like China, India, and Japan. Europe, with its focus on data privacy and digital innovation, also represents a significant market share, while Latin America and the Middle East & Africa are witnessing steady growth as organizations in these regions modernize their operations and adopt cloud-based analytics solutions.

Component Analysis

The Component segment of the Data Mining Tools market is bifurcated into Software and Services. Software remains the dominant segment, accounting for the majority of the market share in 2024. This dominance is attributed to the continuous evolution of data mining algorithms, the proliferation of user-friendly graphical interfaces, and the integration of advanced analytics capabilities such as machine learning, artificial intelligence, and natural language pro
f
The performance of data mining algorithm given worst case/best case...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shang-Ming Zhou; Fabiola Fernandez-Gutierrez; Jonathan Kennedy; Roxanne Cooksey; Mark Atkinson; Spiros Denaxas; Stefan Siebert; William G. Dixon; Terence W. O’Neill; Ernest Choy; Cathie Sudlow; Sinead Brophy (2023). The performance of data mining algorithm given worst case/best case assumptions. [Dataset]. http://doi.org/10.1371/journal.pone.0154515.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0154515.t001
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Shang-Ming Zhou; Fabiola Fernandez-Gutierrez; Jonathan Kennedy; Roxanne Cooksey; Mark Atkinson; Spiros Denaxas; Stefan Siebert; William G. Dixon; Terence W. O’Neill; Ernest Choy; Cathie Sudlow; Sinead Brophy
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The performance of data mining algorithm given worst case/best case assumptions.
d
Distributed Data Mining in Peer-to-Peer Networks
catalog.data.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Apr 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Distributed Data Mining in Peer-to-Peer Networks [Dataset]. https://catalog.data.gov/dataset/distributed-data-mining-in-peer-to-peer-networks
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact,well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data,computing nodes,and users. This article offers an overview of DDM applications and algorithms for P2P environments,focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.
D
Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2024). Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-mining-and-modeling-market
Explore at:
csv, pdf, pptxAvailable download formats
Dataset updated
Sep 23, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining and Modeling Market Outlook

The global data mining and modeling market size was valued at approximately $28.5 billion in 2023 and is projected to reach $70.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 10.5% during the forecast period. This remarkable growth can be attributed to the increasing complexity and volume of data generated across various industries, necessitating robust tools and techniques for effective data analysis and decision-making processes.

One of the primary growth factors driving the data mining and modeling market is the exponential increase in data generation owing to advancements in digital technology. Modern enterprises generate extensive data from numerous sources such as social media platforms, IoT devices, and transactional databases. The need to make sense of this vast information trove has led to a surge in the adoption of data mining and modeling tools. These tools help organizations uncover hidden patterns, correlations, and insights, thereby enabling more informed decision-making and strategic planning.

Another significant growth driver is the increasing adoption of artificial intelligence (AI) and machine learning (ML) technologies. Data mining and modeling are critical components of AI and ML algorithms, which rely on large datasets to learn and make predictions. As businesses strive to stay competitive, they are increasingly investing in AI-driven analytics solutions. This trend is particularly prevalent in sectors such as healthcare, finance, and retail, where predictive analytics can provide a substantial competitive edge. Moreover, advancements in big data technologies are further bolstering the capabilities of data mining and modeling solutions, making them more effective and efficient.

The burgeoning demand for business intelligence (BI) and analytics solutions is also a major factor propelling the market. Organizations are increasingly recognizing the value of data-driven insights in identifying market trends, customer preferences, and operational inefficiencies. Data mining and modeling tools form the backbone of sophisticated BI platforms, enabling companies to transform raw data into actionable intelligence. This demand is further amplified by the growing importance of regulatory compliance and risk management, particularly in highly regulated industries such as banking, financial services, and healthcare.

From a regional perspective, North America currently dominates the data mining and modeling market, owing to the early adoption of advanced technologies and the presence of major market players. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid digital transformation initiatives and increasing investments in AI and big data technologies. Europe also holds a significant market share, supported by stringent data protection regulations and a strong focus on innovation.

Component Analysis

The data mining and modeling market by component is broadly segmented into software and services. The software segment encompasses various tools and platforms that facilitate data mining and modeling processes. These software solutions range from basic data analysis tools to advanced platforms integrated with AI and ML capabilities. The increasing complexity of data and the need for real-time analytics are driving the demand for sophisticated software solutions. Companies are investing in custom and off-the-shelf software to enhance their data handling and analytical capabilities, thereby gaining a competitive edge.

The services segment includes consulting, implementation, training, and support services. As organizations strive to leverage data mining and modeling tools effectively, the demand for professional services is on the rise. Consulting services help businesses identify the right tools and strategies for their specific needs, while implementation services ensure the seamless integration of these tools into existing systems. Training services are crucial for building in-house expertise, enabling teams to maximize the benefits of data mining and modeling solutions. Support services ensure the ongoing maintenance and optimization of these tools, addressing any technical issues that may arise.

The software segment is expected to dominate the market throughout the forecast period, driven by continuous advancements in te
Artificial dataset for clustering algorithms
figshare.com
zip
Updated Jun 4, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mayra Zegarra Rodriguez; Cesar H. Comin; Dalcimar Casanova; Odemir M; Diego R. Amancio; Francisco A. Rodrigues; Luciano da F. Costa (2023). Artificial dataset for clustering algorithms [Dataset]. http://doi.org/10.6084/m9.figshare.5412091.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5412091.v1
Dataset updated
Jun 4, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Mayra Zegarra Rodriguez; Cesar H. Comin; Dalcimar Casanova; Odemir M; Diego R. Amancio; Francisco A. Rodrigues; Luciano da F. Costa
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This file contains a number of randomly generated datasets. The properties of each dataset are indicated in the name of each respective file: 'C' indicates the number of classes, 'F' indicates the number of features, 'Ne' indicates the number of objects contained in each class, 'A' is related to the average separation between classes and 'R' is an index used to differentiate distinct random trials. So, for instance, the file C2F10N2Ne5A1.2R0 is a dataset containing 2 classes, 10 features, 5 objects for each class and having a typical separation between classes of 1.2. The methodology used for generating the datasets is described in the accompanying reference.
D
Lifesciences Data Mining and Visualization Market Report | Global Forecast...
dataintelo.com
csv, pdf, pptx
Updated Sep 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2024). Lifesciences Data Mining and Visualization Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-lifesciences-data-mining-and-visualization-market
Explore at:
pptx, pdf, csvAvailable download formats
Dataset updated
Sep 5, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Lifesciences Data Mining and Visualization Market Outlook

The global market size for Lifesciences Data Mining and Visualization was valued at approximately USD 1.5 billion in 2023 and is projected to reach around USD 4.3 billion by 2032, growing at a compound annual growth rate (CAGR) of 12.5% during the forecast period. The growth of this market is driven by the increasing demand for sophisticated data analysis tools in the life sciences sector, advancements in analytical technologies, and the rising volume of complex biological data generated from research and clinical trials.

One of the primary growth factors for the Lifesciences Data Mining and Visualization market is the burgeoning amount of data generated from various life sciences applications, such as genomics, proteomics, and clinical trials. With the advent of high-throughput technologies, researchers and healthcare professionals are now capable of generating vast amounts of data, which necessitates the use of advanced data mining and visualization tools to derive actionable insights. These tools not only help in managing and interpreting large datasets but also in uncovering hidden patterns and relationships, thereby accelerating research and development processes.

Another significant driver is the increasing adoption of artificial intelligence (AI) and machine learning (ML) algorithms in the life sciences domain. These technologies have proven to be invaluable in enhancing data analysis capabilities, enabling more precise and predictive modeling of biological systems. By integrating AI and ML with data mining and visualization platforms, researchers can achieve higher accuracy in identifying potential drug targets, understanding disease mechanisms, and personalizing treatment plans. This trend is expected to continue, further propelling the market's growth.

Moreover, the rising emphasis on personalized medicine and the need for precision in healthcare is fueling the demand for data mining and visualization tools. Personalized medicine relies heavily on the analysis of individual genetic, proteomic, and metabolomic profiles to tailor treatments specifically to patients' unique characteristics. The ability to visualize these complex datasets in an understandable and actionable manner is critical for the successful implementation of personalized medicine strategies, thereby boosting the demand for advanced data analysis tools.

From a regional perspective, North America is anticipated to dominate the Lifesciences Data Mining and Visualization market, owing to the presence of a robust healthcare infrastructure, significant investments in research and development, and a high adoption rate of advanced technologies. The European market is also expected to witness substantial growth, driven by increasing government initiatives to support life sciences research and the presence of leading biopharmaceutical companies. The Asia Pacific region is projected to experience the fastest growth, attributed to the expanding healthcare sector, rising investments in biotechnology research, and the increasing adoption of data analytics solutions.

Component Analysis

The Lifesciences Data Mining and Visualization market is segmented by component into software and services. The software segment is expected to hold a significant share of the market, driven by the continuous advancements in data mining algorithms and visualization techniques. Software solutions are critical in processing large volumes of complex biological data, facilitating real-time analysis, and providing intuitive visual representations that aid in decision-making. The increasing integration of AI and ML into these software solutions is further enhancing their capabilities, making them indispensable tools in life sciences research.

The services segment, on the other hand, is projected to grow at a considerable rate, as organizations seek specialized expertise to manage and interpret their data. Services include consulting, implementation, and maintenance, as well as training and support. The demand for these services is driven by the need to ensure optimal utilization of data mining software and to keep up with the rapid pace of technological advancements. Moreover, many life sciences organizations lack the in-house expertise required to handle large-scale data analytics projects, thereby turning to external service providers for assistance.

Within the software segment, there is a growing trend towards the development of integrated platforms that combine multiple functionalities, such as data collection, pre
d
Data from: A Generic Local Algorithm for Mining Data Streams in Large...
catalog.data.gov
datasets.ai
+3more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems [Dataset]. https://catalog.data.gov/dataset/a-generic-local-algorithm-for-mining-data-streams-in-large-distributed-systems
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system's functionality such as message routing, information retrieval and load sharing relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the emph{model} of the system. Since the state of the system is constantly changing, it is necessary to keep the models up-to-date. Computing global data mining models e.g. decision trees, k-means clustering in large distributed systems may be very costly due to the scale of the system and due to communication cost, which may be high. The cost further increases in a dynamic scenario when the data changes rapidly. In this paper we describe a two step approach for dealing with these costs. First, we describe a highly efficient emph{local} algorithm which can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.
e
Key Characteristics of Algorithms' Dynamics Beyond Accuracy - Evaluation...
b2find.eudat.eu
Updated Aug 17, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Key Characteristics of Algorithms' Dynamics Beyond Accuracy - Evaluation Tests (v2) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/3524622d-2099-554c-826a-f2155c3f4bb4
Explore at:
Dataset updated
Aug 17, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Key Characteristics of Algorithms' Dynamics Beyond Accuracy - Evaluation Tests (v2) conducted for the paper: What do anomaly scores actually mean? Key characteristics of algorithms' dynamics beyond accuracy by F. Iglesias, H. O. Marques, A. Zimek, T. Zseby Context and methodology Anomaly detection is intrinsic to a large number of data analysis applications today. Most of the algorithms used assign an outlierness score to each instance prior to establishing anomalies in a binary form. The experiments in this repository study how different algorithms generate different dynamics in the outlierness scores and react in very different ways to possible model perturbations that affect data. The study elaborated in the referred paper presents new indices and coefficients to assess the dynamics and explores the responses of the algorithms as a function of variations in these indices, revealing key aspects of the interdependence between algorithms, data geometries and the ability to discriminate anomalies. Therefeore, this repository reproduces the conducted experiments, which study eight algorithms (ABOD, HBOS, iForest, K-NN, LOF, OCSVM, SDO and GLOSH), submitted to seven perturbations related to: cardinality, dimensionality, outlier proportion, inlier-outlier density ratio, density layers, clusters and local outliers, and collects behavioural profiles with eleven measurements (Adjusted Average Precission, ROC-AUC, Perini's Confidence [1], Perini's Stability [2], S-curves, Discriminant Power, Robust Coefficients of Variations for Inliers and Outliers, Coherence, Bias and Robustness) under two types of normalization: linear and Gaussian, the latter aiming to standardize the outlierness scores issued by different algorithms [3]. This repository is framed within the research on the following domains: algorithm evaluation, outlier detection, anomaly detection, unsupervised learning, machine learning, data mining, data analysis. Datasets and algorithms can be used for experiment replication and for further evaluation and comparison. References [1] Perini, L., Vercruyssen, V., Davis, J.: Quantifying the confidence of anomaly detectors in their example-wise predictions. In: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Springer Verlag (2020). [2] Perini, L., Galvin, C., Vercruyssen, V.: A Ranking Stability Measure for Quantifying the Robustness of Anomaly Detection Methods. In: 2nd Workshop on Evaluation and Experimental Design in Data Mining and Machine Learning @ ECML/PKDD (2020). [3] Kriegel, H.-P., Kröger, P., Schubert, E., Zimek, A.: Interpreting and unifying outlier scores. In: Proceedings of the 2011 SIAM International Conference on Data Mining (SDM), pp. 13–24 (2011) Technical details Experiments are tested Python 3.9.6. Provided scripts generate all synthetic data and results. We keep them in the repo for the sake of comparability and replicability ("outputs.zip" file). The file and folder structure is as follows: "compare_scores_group.py" is a Python script to extract new dynamic indices proposed in the paper. "generate_data.py" is a Python script to generate datasets used for evaluation. "latex_table.py" is a Python script to show results in a latex-table format. "merge_indices.py" is a Python script to merge accuracy and dynamic indices in the same table-structured summary. "metric_corr.py" is a Python script to calculate correlation estimations between indices. "outdet.py" is a Python script that runs outlier detection with different algorithms on diverse datasets. "perini_tests.py" is a Python script to run Perini's confidence and stability on all datasets and algorithms' performances. "scatterplots.py" is a Python script that generates scatter plots for comparing accuracy and dynamic performances. "README.md" provides explanations and step by step instructions for replication. "requirements.txt" contains references to required Python libraries and versions. "outputs.zip" contains all result tables, plots and synthetic data generated with the scripts. [data/real_data] contain CSV versions of the Wilt, Shuttle, Waveform and Cardiotocography datasets (inherited and adapted from the LMU repository) License The CC-BY license applies to all data generated with the "generated_data.py" script. All distributed code is under the GNU GPL license. For the "ExCeeD.py" and "stability.py" scripts, please consult and refer to the original sources provided above.
f
Which Is a More Accurate Predictor in Colorectal Survival Analysis? Nine...
figshare.com
datasetcatalog.nlm.nih.gov
doc
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Peng Gao; Xin Zhou; Zhen-ning Wang; Yong-xi Song; Lin-lin Tong; Ying-ying Xu; Zhen-yu Yue; Hui-mian Xu (2023). Which Is a More Accurate Predictor in Colorectal Survival Analysis? Nine Data Mining Algorithms vs. the TNM Staging System [Dataset]. http://doi.org/10.1371/journal.pone.0042015
Explore at:
docAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0042015
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Peng Gao; Xin Zhou; Zhen-ning Wang; Yong-xi Song; Lin-lin Tong; Ying-ying Xu; Zhen-yu Yue; Hui-mian Xu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ObjectiveOver the past decades, many studies have used data mining technology to predict the 5-year survival rate of colorectal cancer, but there have been few reports that compared multiple data mining algorithms to the TNM classification of malignant tumors (TNM) staging system using a dataset in which the training and testing data were from different sources. Here we compared nine data mining algorithms to the TNM staging system for colorectal survival analysis. MethodsTwo different datasets were used: 1) the National Cancer Institute's Surveillance, Epidemiology, and End Results dataset; and 2) the dataset from a single Chinese institution. An optimization and prediction system based on nine data mining algorithms as well as two variable selection methods was implemented. The TNM staging system was based on the 7th edition of the American Joint Committee on Cancer TNM staging system. ResultsWhen the training and testing data were from the same sources, all algorithms had slight advantages over the TNM staging system in predictive accuracy. When the data were from different sources, only four algorithms (logistic regression, general regression neural network, Bayesian networks, and Naïve Bayes) had slight advantages over the TNM staging system. Also, there was no significant differences among all the algorithms (p>0.05). ConclusionsThe TNM staging system is simple and practical at present, and data mining methods are not accurate enough to replace the TNM staging system for colorectal cancer survival prediction. Furthermore, there were no significant differences in the predictive accuracy of all the algorithms when the data were from different sources. Building a larger dataset that includes more variables may be important for furthering predictive accuracy.
A Local Distributed Peer-to-Peer Algorithm Using Multi-Party Optimization...
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Mar 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). A Local Distributed Peer-to-Peer Algorithm Using Multi-Party Optimization Based Privacy Preservation for Data Mining Primitive Computation - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/a-local-distributed-peer-to-peer-algorithm-using-multi-party-optimization-based-privacy-pr
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
This paper proposes a scalable, local privacy-preserving algorithm for distributed peer-to-peer (P2P) data aggregation useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner through local interactions and therefore, is highly scalable. It particularly deals with the distributed computation of the sum of a set of numbers stored at different peers in a P2P network in the context of a P2P web mining application. The proposed optimization-based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have significant impact on many data mining tasks such as multi-party privacypreserving clustering, frequent itemset mining, and statistical aggregate computation.
Grocery Store dataset for data mining
kaggle.com
zip
Updated Mar 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Honey Patel (2021). Grocery Store dataset for data mining [Dataset]. https://www.kaggle.com/honeypatel2158/grocery-store-dataset-for-data-mining
Explore at:
zip(7990 bytes)Available download formats
Dataset updated
Mar 9, 2021
Authors
Honey Patel
Description
Dataset

This dataset was created by Honey Patel

Contents
f
Additional file 1 of Novel methods of qualitative analysis for health policy...
springernature.figshare.com
txt
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mireya Martínez-García; Maite Vallejo; Enrique Hernández-Lemus; Jorge Alberto Álvarez-Díaz (2023). Additional file 1 of Novel methods of qualitative analysis for health policy research [Dataset]. http://doi.org/10.6084/m9.figshare.7587416.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7587416.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Mireya Martínez-García; Maite Vallejo; Enrique Hernández-Lemus; Jorge Alberto Álvarez-Díaz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Interactive network files. Interactive network files with all statistical and topological analyses. This is a Cytoscape.cys session. In order to open/view/modify this file please use the freely available Cytoscape software platform, available at http://www.cytoscape.org/download.php . (SIF 3413 kb)
s
Citation Trends for "Development of Data Mining Algorithms for Identifying...
shibatadb.com
Updated Oct 23, 2013
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yubetsu (2013). Citation Trends for "Development of Data Mining Algorithms for Identifying the Best Anthropometric Predictors for Cardiovascular Disease: MASHAD Cohort Study" [Dataset]. https://www.shibatadb.com/article/KMmMTbvt
Explore at:
Dataset updated
Oct 23, 2013
Dataset authored and provided by
Yubetsu
License
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txthttps://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Time period covered
2023 - 2025
Area covered
Mashhad
Variables measured
New Citations per Year
Description
Yearly citation counts for the publication titled "Development of Data Mining Algorithms for Identifying the Best Anthropometric Predictors for Cardiovascular Disease: MASHAD Cohort Study".
m
Data for: A PSO-based algorithm for mining association rules using a guided...
data.mendeley.com
Updated Jul 4, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rosa María Valdovinos (2020). Data for: A PSO-based algorithm for mining association rules using a guided exploration strategy [Dataset]. http://doi.org/10.17632/nt73rwvcpd.1
Explore at:
Unique identifier
https://doi.org/10.17632/nt73rwvcpd.1
Dataset updated
Jul 4, 2020
Authors
Rosa María Valdovinos
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Datasets used in the paper

Facebook

Twitter

Click to copy link

Link copied

Cite

Dashlink (2025). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/discovering-anomalous-aviation-safety-events-using-scalable-data-mining-algorithms

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms

Explore at:

Dataset updated

Apr 10, 2025

Dataset provided by

Dashlink

Description

The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent _domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.

Clear search

Close search

Google apps

Main menu

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...

Online Feature Selection and Its Applications

Data Mining Tools Market Report

Data from: Multi-objective optimization based privacy preserving distributed...

Dataset of book subjects that contain Data mining algorithms in C++ : data...

Educational Attainment in North Carolina Public Schools: Use of statistical...

Data Mining Tools Market Research Report 2033

Data Mining Tools Market Outlook

Component Analysis

The performance of data mining algorithm given worst case/best case...

Distributed Data Mining in Peer-to-Peer Networks

Data Mining and Modeling Market Report | Global Forecast From 2025 To 2033

Data Mining and Modeling Market Outlook

Component Analysis

Artificial dataset for clustering algorithms

Lifesciences Data Mining and Visualization Market Report | Global Forecast...

Lifesciences Data Mining and Visualization Market Outlook

Component Analysis

Data from: A Generic Local Algorithm for Mining Data Streams in Large...

Key Characteristics of Algorithms' Dynamics Beyond Accuracy - Evaluation...

Which Is a More Accurate Predictor in Colorectal Survival Analysis? Nine...

A Local Distributed Peer-to-Peer Algorithm Using Multi-Party Optimization...

Grocery Store dataset for data mining

Dataset

Contents

Additional file 1 of Novel methods of qualitative analysis for health policy...

Citation Trends for "Development of Data Mining Algorithms for Identifying...

Data for: A PSO-based algorithm for mining association rules using a guided...

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining AlgorithmsSee More Versions

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms