100+ datasets found

d
Data from: Data Mining at NASA: From Theory to Applications
catalog.data.gov
s.cnmilf.com
+1more
Updated Aug 23, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Data Mining at NASA: From Theory to Applications [Dataset]. https://catalog.data.gov/dataset/data-mining-at-nasa-from-theory-to-applications
Explore at:
Dataset updated
Aug 23, 2025
Dataset provided by
Dashlink
Description
NASA has some of the largest and most complex data sources in the world, with data sources ranging from the earth sciences, space sciences, and massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.
e
Statistical Analysis and Data Mining - impact-factor
exaly.com
csv, json
Updated Nov 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Statistical Analysis and Data Mining - impact-factor [Dataset]. https://exaly.com/journal/28577/statistical-analysis-and-data-mining
Explore at:
csv, jsonAvailable download formats
Dataset updated
Nov 1, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The graph shows the changes in the impact factor of ^ and its corresponding percentile for the sake of comparison with the entire literature. Impact Factor is the most common scientometric index, which is defined by the number of citations of papers in two preceding years divided by the number of papers published in those years.
e
Data Mining Tools Market Size, Share, Trend Analysis by 2033
emergenresearch.com
pdf,excel,csv,ppt
Updated Dec 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Emergen Research (2024). Data Mining Tools Market Size, Share, Trend Analysis by 2033 [Dataset]. https://www.emergenresearch.com/industry-report/data-mining-tools-market
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset updated
Dec 19, 2024
Dataset authored and provided by
Emergen Research
License
https://www.emergenresearch.com/privacy-policyhttps://www.emergenresearch.com/privacy-policy
Area covered
Global
Variables measured
Base Year, No. of Pages, Growth Drivers, Forecast Period, Segments covered, Historical Data for, Pitfalls Challenges, 2033 Value Projection, Tables, Charts, and Figures, Forecast Period 2024 - 2033 CAGR, and 1 more
Description
The Data Mining Tools Market size is expected to reach a valuation of USD 3.33 billion in 2033 growing at a CAGR of 12.50%. The Data Mining Tools market research report classifies market by share, trend, demand, forecast and based on segmentation.
d
Privacy Preserving Distributed Data Mining
catalog.data.gov
s.cnmilf.com
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Privacy Preserving Distributed Data Mining [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-distributed-data-mining
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an airline manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircrafts, during their operation, generate huge amount of data. Mining this data can reveal useful information regarding the health and operability of the aircraft which can be useful for disaster management and prediction of efficient operating regimes. Now if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircrafts of model-type [tex]$A$[/tex] belonging to different airlines then central collection of data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:
Data Mining Market Size, Competitive Landscape, Growth Trends 2030
mordorintelligence.com
pdf,excel,csv,ppt
Updated Jun 23, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mordor Intelligence (2025). Data Mining Market Size, Competitive Landscape, Growth Trends 2030 [Dataset]. https://www.mordorintelligence.com/industry-reports/data-mining-market
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Mordor Intelligence
License
https://www.mordorintelligence.com/privacy-policyhttps://www.mordorintelligence.com/privacy-policy
Time period covered
2019 - 2030
Area covered
Global
Description
The Data Mining Market is Segmented by Component (Tools [ETL and Data Preparation, Data-Mining Workbench, and More], Services [Professional Services, and More]), End-User Enterprise Size (Small and Medium Enterprises, Large Enterprises), Deployment (Cloud, On-Premise), End-User Industry (BFSI, IT and Telecom, Government and Defence, and More), and Geography. The Market Forecasts are Provided in Terms of Value (USD).
Market Basket Analysis
kaggle.com
zip
Updated Dec 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Explore at:
zip(23875170 bytes)Available download formats
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: . xlsx

Number of Row: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

Libraries in R

First, we need to load required libraries. Shortly I describe all libraries.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

Data Pre-processing

Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

After we will clear our data frame, will remove missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...
Data Mining Tools Market - A Global and Regional Analysis
bisresearch.com
csv, pdf
Updated Nov 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bisresearch (2025). Data Mining Tools Market - A Global and Regional Analysis [Dataset]. https://bisresearch.com/industry-report/global-data-mining-tools-market.html
Explore at:
csv, pdfAvailable download formats
Dataset updated
Nov 30, 2025
Dataset authored and provided by
Bisresearch
License
https://bisresearch.com/privacy-policy-cookie-restriction-modehttps://bisresearch.com/privacy-policy-cookie-restriction-mode
Time period covered
2023 - 2033
Area covered
Worldwide
Description
The Data Mining Tools Market is expected to be valued at $1.24 billion in 2024, with an anticipated expansion at a CAGR of 11.63% to reach $3.73 billion by 2034.
m
Educational Attainment in North Carolina Public Schools: Use of statistical...
data.mendeley.com
Updated Nov 14, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
Explore at:
Unique identifier
https://doi.org/10.17632/6cm9wyd5g5.1
Dataset updated
Nov 14, 2018
Authors
Scott Herford
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The purpose of data mining analysis is always to find patterns of the data using certain kind of techiques such as classification or regression. It is not always feasible to apply classification algorithms directly to dataset. Before doing any work on the data, the data has to be pre-processed and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. Based on our project, after using clustering prior to classification, the performance has not improved much. The reason why it has not improved could be the features we selected to perform clustering are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics. From the dimensionality reduction perspective: It is different from Principle Component Analysis which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters as a technique of reducing the data dimension will lose a lot of information since clustering techniques are based a metric of 'distance'. At high dimensions euclidean distance loses pretty much all meaning. Therefore using clustering as a "Reducing" dimensionality by mapping data points to cluster numbers is not always good since you may lose almost all the information. From the creating new features perspective: Clustering analysis creates labels based on the patterns of the data, it brings uncertainties into the data. By using clustering prior to classification, the decision on the number of clusters will highly affect the performance of the clustering, then affect the performance of classification. If the part of features we use clustering techniques on is very suited for it, it might increase the overall performance on classification. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We did not lock in the clustering outputs using a random_state in the effort to see if they were stable. Our assumption was that if the results vary highly from run to run which they definitely did, maybe the data just does not cluster well with the methods selected at all. Basically, the ramification we saw was that our results are not much better than random when applying clustering to the data preprocessing. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the model real world effectiveness and also to continue to revise the models from time to time as things change.
Wiley.Data.Mining.for.Business.Analytics.in.R.Full
kaggle.com
zip
Updated Oct 4, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
liyanonline (2021). Wiley.Data.Mining.for.Business.Analytics.in.R.Full [Dataset]. https://www.kaggle.com/datasets/liyanonline/wileydataminingforbusinessanalyticsinrfull
Explore at:
zip(6519556 bytes)Available download formats
Dataset updated
Oct 4, 2021
Authors
liyanonline
Description
Dataset

This dataset was created by liyanonline

Contents
Survey Data - Entrepreneurs Data Mining
kaggle.com
zip
Updated Nov 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lay Christian (2024). Survey Data - Entrepreneurs Data Mining [Dataset]. https://www.kaggle.com/datasets/laychristian/survey-data-entrepreneurs-data-mining
Explore at:
zip(38815 bytes)Available download formats
Dataset updated
Nov 21, 2024
Authors
Lay Christian
Description
Title: Identifying Factors that Affect Entrepreneurs’ Use of Data Mining for Analytics Authors: Edward Matthew Dominica, Feylin Wijaya, Andrew Giovanni Winoto, Christian Conference: The 4th International Conference on Electrical, Computer, Communications, and Mechatronics Engineering https://www.iceccme.com/home

This dataset was created to support research focused on understanding the factors influencing entrepreneurs’ adoption of data mining techniques for business analytics. The dataset contains carefully curated data points that reflect entrepreneurial behaviors, decision-making criteria, and the role of data mining in enhancing business insights.

Researchers and practitioners can leverage this dataset to explore patterns, conduct statistical analyses, and build predictive models to gain a deeper understanding of entrepreneurial adoption of data mining.

Intended Use: This dataset is designed for research and academic purposes, especially in the fields of business analytics, entrepreneurship, and data mining. It is suitable for conducting exploratory data analysis, hypothesis testing, and model development.

Citation: If you use this dataset in your research or publication, please cite the paper presented at the ICECCME 2024 conference using the following format: Edward Matthew Dominica, Feylin Wijaya, Andrew Giovanni Winoto, Christian. Identifying Factors that Affect Entrepreneurs’ Use of Data Mining for Analytics. The 4th International Conference on Electrical, Computer, Communications, and Mechatronics Engineering (2024).
Orange dataset table
figshare.com
xlsx
Updated Mar 4, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.19146410.v1
Dataset updated
Mar 4, 2022
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Rui Simões
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
G
Data Mining Tools Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Aug 4, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Growth Market Reports (2025). Data Mining Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-mining-tools-market
Explore at:
pdf, csv, pptxAvailable download formats
Dataset updated
Aug 4, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining Tools Market Outlook

According to our latest research, the global Data Mining Tools market size reached USD 1.93 billion in 2024, reflecting robust industry momentum. The market is expected to grow at a CAGR of 12.7% from 2025 to 2033, reaching a projected value of USD 5.69 billion by 2033. This growth is primarily driven by the increasing adoption of advanced analytics across diverse industries, rapid digital transformation, and the necessity for actionable insights from massive data volumes.

One of the pivotal growth factors propelling the Data Mining Tools market is the exponential rise in data generation, particularly through digital channels, IoT devices, and enterprise applications. Organizations across sectors are leveraging data mining tools to extract meaningful patterns, trends, and correlations from structured and unstructured data. The need for improved decision-making, operational efficiency, and competitive advantage has made data mining an essential component of modern business strategies. Furthermore, advancements in artificial intelligence and machine learning are enhancing the capabilities of these tools, enabling predictive analytics, anomaly detection, and automation of complex analytical tasks, which further fuels market expansion.

Another significant driver is the growing demand for customer-centric solutions in industries such as retail, BFSI, and healthcare. Data mining tools are increasingly being used for customer relationship management, targeted marketing, fraud detection, and risk management. By analyzing customer behavior and preferences, organizations can personalize their offerings, optimize marketing campaigns, and mitigate risks. The integration of data mining tools with cloud platforms and big data technologies has also simplified deployment and scalability, making these solutions accessible to small and medium-sized enterprises (SMEs) as well as large organizations. This democratization of advanced analytics is creating new growth avenues for vendors and service providers.

The regulatory landscape and the increasing emphasis on data privacy and security are also shaping the development and adoption of Data Mining Tools. Compliance with frameworks such as GDPR, HIPAA, and CCPA necessitates robust data governance and transparent analytics processes. Vendors are responding by incorporating features like data masking, encryption, and audit trails into their solutions, thereby enhancing trust and adoption among regulated industries. Additionally, the emergence of industry-specific data mining applications, such as fraud detection in BFSI and predictive diagnostics in healthcare, is expanding the addressable market and fostering innovation.

From a regional perspective, North America currently dominates the Data Mining Tools market owing to the early adoption of advanced analytics, strong presence of leading technology vendors, and high investments in digital transformation. However, the Asia Pacific region is emerging as a lucrative market, driven by rapid industrialization, expansion of IT infrastructure, and growing awareness of data-driven decision-making in countries like China, India, and Japan. Europe, with its focus on data privacy and digital innovation, also represents a significant market share, while Latin America and the Middle East & Africa are witnessing steady growth as organizations in these regions modernize their operations and adopt cloud-based analytics solutions.

Component Analysis

The Component segment of the Data Mining Tools market is bifurcated into Software and Services. Software remains the dominant segment, accounting for the majority of the market share in 2024. This dominance is attributed to the continuous evolution of data mining algorithms, the proliferation of user-friendly graphical interfaces, and the integration of advanced analytics capabilities such as machine learning, artificial intelligence, and natural language pro
e
Statistical Analysis and Data Mining - if-computation
exaly.com
csv, json
Updated Nov 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Statistical Analysis and Data Mining - if-computation [Dataset]. https://exaly.com/journal/28577/statistical-analysis-and-data-mining/impact-factor
Explore at:
json, csvAvailable download formats
Dataset updated
Nov 1, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This graph shows how the impact factor of ^ is computed. The left axis depicts the number of papers published in years X-1 and X-2, and the right axis displays their citations in year X.
Hidden Room Educational Data Mining Analysis
figshare.com
png
Updated Sep 27, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Manuel Palomo-duarte; Anke Berns (2016). Hidden Room Educational Data Mining Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.3084319.v11
Explore at:
pngAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.3084319.v11
Dataset updated
Sep 27, 2016
Dataset provided by
Figsharehttp://figshare.com/
Authors
Manuel Palomo-duarte; Anke Berns
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

Histograms and results of k-means and Ward's clustering for Hidden Room gameThe fileset contains information from three sources:1. Histograms files:* Lexical_histogram.png (histogram of lexical error ratios)* Grammatical_histogram.png (histogram of grammatical error ratios)2. K-means clustering files:* elbow-lex kmeans.png (clustering by lexical aspects: error curves obtained for applying elbow method to determinate the optimal number of clusters)* cube-lex kmeans.png (clustering by lexical aspects: a three-dimensional representation of clusters obtained after applying k-means method)* Lexical_clusters (table) kmeans.xls (clustering by lexical aspects: centroids, standard deviations and number of instances assigned to each cluster)* elbow-gram kmeans.png (clustering by grammatical aspects: error curves obtained for applying elbow method to determinate the optimal number of clusters)* cube-gramm kmeans.png (clustering by grammatical aspects: a three-dimensional representation of clusters obtained after applying k-means method)* Grammatical_clusters (table) kmeans.xls (clustering by grammatical aspects: centroids, standard deviations and number of instances assigned to each cluster)* elbow-lexgram kmeans.png (clustering by lexical and grammatical aspects: error curves obtained for applying elbow method to determinate the optimal number of clusters)* Lexical_Grammatical_clusters (table) kmeans.xls (clustering by lexical and grammatical aspects: centroids, standard deviations and number of instances assigned to each cluster)* Grammatical_clusters_number_of_words (table) kmeans.xls

number of words (from column 2 to 4) and sizes (last column) obtained per each cluster by applying k-means clustering to grammatical error ratios.* Lexical_clusters_number_of_words (table) kmeans.xls

number of words (from column 2 to 4) and sizes (last column) obtained per each cluster by applying k-means clustering to lexical error ratios.* Lexical_Grammatical_clusters_number_of_words (table) kmeans.xls

number of words (from column 2 to 4) and sizes (last column) obtained per each cluster by applying k-means clustering to lexical and grammatical error ratios.3. Ward’s Agglomerative Hierarchical Clustering files:* Lexical_Cluster_Dendrogram_ward.png (clustering by lexical aspects: dendrogram obtained after applying Ward's clustering method).* Grammatical_Cluster_Dendrogram_ward.png (clustering by grammatical aspects: dendrogram obtained after applying Ward's clustering method)* Lexical_Grammatical_Cluster_Dendrogram_ward.png (clustering by lexical and grammatical aspects: dendrogram obtained after applying Ward's clustering method)* Lexical_Grammatical_clusters (table) ward.xls: Centroids (from column 2 to 7) and cluster sizes (last column) obtained by applying Ward's agglomerative hierarchical clustering to lexical and grammatical error ratios.* Grammatical_clusters (table) ward.xls: Centroids (from column 2 to 4) and cluster sizes (last column) obtained by applying Ward's agglomerative hierarchical clustering to grammatical error ratios.* Lexical_clusters (table) ward.xls: Centroids (from column 2 to 4) and cluster sizes (last column) obtained by applying Ward's agglomerative hierarchical clustering to lexical error ratios.* Lexical_clusters_number_of_words (table) ward.xls: number of words (from column 2 to 4) and sizes (last column) obtained per each cluster by applying Ward's agglomerative hierarchical clustering to lexical error ratios.* Grammatical_clusters_number_of_words (table) ward.xls: number of words (from column 2 to 4) and sizes (last column) obtained per each cluster by applying Ward's agglomerative hierarchical clustering to grammatical error ratios.* Lexical_Grammatical_clusters_number_of_words (table) ward.xls: number of words (from column 2 to 4) and sizes (last column) obtained per each cluster by applying Ward's agglomerative hierarchical clustering to lexical and grammatical error ratios.
Privacy Preserving Distributed Data Mining - Dataset - NASA Open Data Portal...
data.nasa.gov
Updated Mar 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). Privacy Preserving Distributed Data Mining - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/privacy-preserving-distributed-data-mining
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an airline manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircrafts, during their operation, generate huge amount of data. Mining this data can reveal useful information regarding the health and operability of the aircraft which can be useful for disaster management and prediction of efficient operating regimes. Now if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircrafts of model-type [tex]$A$[/tex] belonging to different airlines then central collection of data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:
Designing a more efficient, effective and safe Medical Emergency Team (MET)...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christoph Bergmeir; Irma Bilgrami; Christopher Bain; Geoffrey I. Webb; Judit Orosz; David Pilcher (2023). Designing a more efficient, effective and safe Medical Emergency Team (MET) service using data analysis [Dataset]. http://doi.org/10.1371/journal.pone.0188688
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0188688
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Christoph Bergmeir; Irma Bilgrami; Christopher Bain; Geoffrey I. Webb; Judit Orosz; David Pilcher
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
IntroductionHospitals have seen a rise in Medical Emergency Team (MET) reviews. We hypothesised that the commonest MET calls result in similar treatments. Our aim was to design a pre-emptive management algorithm that allowed direct institution of treatment to patients without having to wait for attendance of the MET team and to model its potential impact on MET call incidence and patient outcomes.MethodsData was extracted for all MET calls from the hospital database. Association rule data mining techniques were used to identify the most common combinations of MET call causes, outcomes and therapies.ResultsThere were 13,656 MET calls during the 34-month study period in 7936 patients. The most common MET call was for hypotension [31%, (2459/7936)]. These MET calls were strongly associated with the immediate administration of intra-venous fluid (70% [1714/2459] v 13% [739/5477] p
Data from: Enriching time series datasets using Nonparametric kernel...
figshare.com
pdf
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mohamad Ivan Fanany (2023). Enriching time series datasets using Nonparametric kernel regression to improve forecasting accuracy [Dataset]. http://doi.org/10.6084/m9.figshare.1609661.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.1609661.v1
Dataset updated
May 31, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Mohamad Ivan Fanany
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Improving the accuracy of prediction on future values based on the past and current observations has been pursued by enhancing the prediction's methods, combining those methods or performing data pre-processing. In this paper, another approach is taken, namely by increasing the number of input in the dataset. This approach would be useful especially for a shorter time series data. By filling the in-between values in the time series, the number of training set can be increased, thus increasing the generalization capability of the predictor. The algorithm used to make prediction is Neural Network as it is widely used in literature for time series tasks. For comparison, Support Vector Regression is also employed. The dataset used in the experiment is the frequency of USPTO's patents and PubMed's scientific publications on the field of health, namely on Apnea, Arrhythmia, and Sleep Stages. Another time series data designated for NN3 Competition in the field of transportation is also used for benchmarking. The experimental result shows that the prediction performance can be significantly increased by filling in-between data in the time series. Furthermore, the use of detrend and deseasonalization which separates the data into trend, seasonal and stationary time series also improve the prediction performance both on original and filled dataset. The optimal number of increase on the dataset in this experiment is about five times of the length of original dataset.
f
DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf
frontiersin.figshare.com
pdf
Updated May 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi (2023). DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf [Dataset]. http://doi.org/10.3389/fgene.2019.00934.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2019.00934.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Biological data analysis is the key to new discoveries in disease biology and drug discovery. The rapid proliferation of high-throughput ‘omics’ data has necessitated a need for tools and platforms that allow the researchers to combine and analyse different types of biological data and obtain biologically relevant knowledge. We had previously developed TargetMine, an integrative data analysis platform for target prioritisation and broad-based biological knowledge discovery. Here, we describe the newly modelled biological data types and the enhanced visual and analytical features of TargetMine. These enhancements have included: an enhanced coverage of gene–gene relations, small molecule metabolite to pathway mappings, an improved literature survey feature, and in silico prediction of gene functional associations such as protein–protein interactions and global gene co-expression. We have also described two usage examples on trans-omics data analysis and extraction of gene-disease associations using MeSH term descriptors. These examples have demonstrated how the newer enhancements in TargetMine have contributed to a more expansive coverage of the biological data space and can help interpret genotype–phenotype relations. TargetMine with its auxiliary toolkit is available at https://targetmine.mizuguchilab.org. The TargetMine source code is available at https://github.com/chenyian-nibio/targetmine-gradle.
D
Data Mining Tools Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 16, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Archive Market Research (2025). Data Mining Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/data-mining-tools-34258
Explore at:
pdf, ppt, docAvailable download formats
Dataset updated
Feb 16, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The size of the Data Mining Tools market was valued at USD 571.4 million in 2024 and is projected to reach USD 882.13 million by 2033, with an expected CAGR of 6.4 % during the forecast period.
D
Data Mining Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 19, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Data Mining Software Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-software-41235
Explore at:
ppt, pdf, docAvailable download formats
Dataset updated
Mar 19, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Data Mining Software market is experiencing robust growth, driven by the increasing need for businesses to extract valuable insights from massive datasets. The market, estimated at $15 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033, reaching an estimated $45 billion by 2033. This expansion is fueled by several key factors. The burgeoning adoption of cloud-based solutions offers scalability and cost-effectiveness, attracting both large enterprises and SMEs. Furthermore, advancements in machine learning and artificial intelligence algorithms are enhancing the accuracy and efficiency of data mining processes, leading to better decision-making across various sectors like finance, healthcare, and marketing. The rise of big data analytics and the increasing availability of affordable, high-powered computing resources are also significant contributors to market growth. However, the market faces certain challenges. Data security and privacy concerns remain paramount, especially with the increasing volume of sensitive information being processed. The complexity of data mining software and the need for skilled professionals to operate and interpret the results present a barrier to entry for some businesses. The high initial investment cost associated with implementing sophisticated data mining solutions can also deter smaller organizations. Nevertheless, the ongoing technological advancements and the growing recognition of the strategic value of data-driven decision-making are expected to overcome these restraints and propel the market toward continued expansion. The market segmentation reveals a strong preference for cloud-based solutions, reflecting the industry's trend toward flexible and scalable IT infrastructure. Large enterprises currently dominate the market share, but SMEs are rapidly adopting data mining software, indicating promising future growth in this segment. Geographic analysis shows that North America and Europe are currently leading the market, but the Asia-Pacific region is poised for significant growth due to increasing digitalization and economic expansion in countries like China and India.

Facebook

Twitter

Click to copy link

Link copied

Cite

Dashlink (2025). Data Mining at NASA: From Theory to Applications [Dataset]. https://catalog.data.gov/dataset/data-mining-at-nasa-from-theory-to-applications

Data from: Data Mining at NASA: From Theory to Applications

Explore at:

Dataset updated

Aug 23, 2025

Dataset provided by

Dashlink

Description

NASA has some of the largest and most complex data sources in the world, with data sources ranging from the earth sciences, space sciences, and massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.

Clear search

Close search

Google apps

Main menu

Data from: Data Mining at NASA: From Theory to Applications

Statistical Analysis and Data Mining - impact-factor

Data Mining Tools Market Size, Share, Trend Analysis by 2033

Privacy Preserving Distributed Data Mining

Data Mining Market Size, Competitive Landscape, Growth Trends 2030

Market Basket Analysis

Market Basket Analysis

Introduction

An Example of Association Rules

Strategy

Dataset Description

Libraries in R

Data Pre-processing

Data Mining Tools Market - A Global and Regional Analysis

Educational Attainment in North Carolina Public Schools: Use of statistical...

Wiley.Data.Mining.for.Business.Analytics.in.R.Full

Dataset

Contents

Survey Data - Entrepreneurs Data Mining

Orange dataset table

Data Mining Tools Market Research Report 2033

Data Mining Tools Market Outlook

Component Analysis

Statistical Analysis and Data Mining - if-computation

Hidden Room Educational Data Mining Analysis

Privacy Preserving Distributed Data Mining - Dataset - NASA Open Data Portal...

Designing a more efficient, effective and safe Medical Emergency Team (MET)...

Data from: Enriching time series datasets using Nonparametric kernel...

DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf

Data Mining Tools Report

Data Mining Software Report

Data from: Data Mining at NASA: From Theory to ApplicationsSee More Versions

Data from: Data Mining at NASA: From Theory to Applications