88 datasets found

d
Data from: Data Mining at NASA: From Theory to Applications
catalog.data.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
+2more
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Data Mining at NASA: From Theory to Applications [Dataset]. https://catalog.data.gov/dataset/data-mining-at-nasa-from-theory-to-applications
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
NASA has some of the largest and most complex data sources in the world, with data sources ranging from the earth sciences, space sciences, and massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.
Lifesciences Data Mining and Visualization Market Report | Global Forecast...
dataintelo.com
csv, pdf, pptx
Updated Sep 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2024). Lifesciences Data Mining and Visualization Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-lifesciences-data-mining-and-visualization-market
Explore at:
pptx, pdf, csvAvailable download formats
Dataset updated
Sep 5, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Lifesciences Data Mining and Visualization Market Outlook

The global market size for Lifesciences Data Mining and Visualization was valued at approximately USD 1.5 billion in 2023 and is projected to reach around USD 4.3 billion by 2032, growing at a compound annual growth rate (CAGR) of 12.5% during the forecast period. The growth of this market is driven by the increasing demand for sophisticated data analysis tools in the life sciences sector, advancements in analytical technologies, and the rising volume of complex biological data generated from research and clinical trials.

One of the primary growth factors for the Lifesciences Data Mining and Visualization market is the burgeoning amount of data generated from various life sciences applications, such as genomics, proteomics, and clinical trials. With the advent of high-throughput technologies, researchers and healthcare professionals are now capable of generating vast amounts of data, which necessitates the use of advanced data mining and visualization tools to derive actionable insights. These tools not only help in managing and interpreting large datasets but also in uncovering hidden patterns and relationships, thereby accelerating research and development processes.

Another significant driver is the increasing adoption of artificial intelligence (AI) and machine learning (ML) algorithms in the life sciences domain. These technologies have proven to be invaluable in enhancing data analysis capabilities, enabling more precise and predictive modeling of biological systems. By integrating AI and ML with data mining and visualization platforms, researchers can achieve higher accuracy in identifying potential drug targets, understanding disease mechanisms, and personalizing treatment plans. This trend is expected to continue, further propelling the market's growth.

Moreover, the rising emphasis on personalized medicine and the need for precision in healthcare is fueling the demand for data mining and visualization tools. Personalized medicine relies heavily on the analysis of individual genetic, proteomic, and metabolomic profiles to tailor treatments specifically to patients' unique characteristics. The ability to visualize these complex datasets in an understandable and actionable manner is critical for the successful implementation of personalized medicine strategies, thereby boosting the demand for advanced data analysis tools.

From a regional perspective, North America is anticipated to dominate the Lifesciences Data Mining and Visualization market, owing to the presence of a robust healthcare infrastructure, significant investments in research and development, and a high adoption rate of advanced technologies. The European market is also expected to witness substantial growth, driven by increasing government initiatives to support life sciences research and the presence of leading biopharmaceutical companies. The Asia Pacific region is projected to experience the fastest growth, attributed to the expanding healthcare sector, rising investments in biotechnology research, and the increasing adoption of data analytics solutions.

Component Analysis

The Lifesciences Data Mining and Visualization market is segmented by component into software and services. The software segment is expected to hold a significant share of the market, driven by the continuous advancements in data mining algorithms and visualization techniques. Software solutions are critical in processing large volumes of complex biological data, facilitating real-time analysis, and providing intuitive visual representations that aid in decision-making. The increasing integration of AI and ML into these software solutions is further enhancing their capabilities, making them indispensable tools in life sciences research.

The services segment, on the other hand, is projected to grow at a considerable rate, as organizations seek specialized expertise to manage and interpret their data. Services include consulting, implementation, and maintenance, as well as training and support. The demand for these services is driven by the need to ensure optimal utilization of data mining software and to keep up with the rapid pace of technological advancements. Moreover, many life sciences organizations lack the in-house expertise required to handle large-scale data analytics projects, thereby turning to external service providers for assistance.

Within the software segment, there is a growing trend towards the development of integrated platforms that combine multiple functionalities, such as data collection, pre
Data Mining Software Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2025). Data Mining Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-mining-software-market
Explore at:
pdf, pptx, csvAvailable download formats
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Mining Software Market Outlook

The global data mining software market size was valued at USD 7.2 billion in 2023 and is projected to reach USD 15.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 8.7% during the forecast period. This growth is driven primarily by the increasing adoption of big data analytics and the rising demand for business intelligence across various industries. As businesses increasingly recognize the value of data-driven decision-making, the market is expected to witness substantial growth.

One of the significant growth factors for the data mining software market is the exponential increase in data generation. With the proliferation of internet-enabled devices and the rapid advancement of technologies such as the Internet of Things (IoT), there is a massive influx of data. Organizations are now more focused than ever on harnessing this data to gain insights, improve operations, and create a competitive advantage. This has led to a surge in demand for advanced data mining tools that can process and analyze large datasets efficiently.

Another driving force is the growing need for personalized customer experiences. In industries such as retail, healthcare, and BFSI, understanding customer behavior and preferences is crucial. Data mining software enables organizations to analyze customer data, segment their audience, and deliver personalized offerings, ultimately enhancing customer satisfaction and loyalty. This drive towards personalization is further fueling the adoption of data mining solutions, contributing significantly to market growth.

The integration of artificial intelligence (AI) and machine learning (ML) technologies with data mining software is also a key growth factor. These advanced technologies enhance the capabilities of data mining tools by enabling them to learn from data patterns and make more accurate predictions. The convergence of AI and data mining is opening new avenues for businesses, allowing them to automate complex tasks, predict market trends, and make informed decisions more swiftly. The continuous advancements in AI and ML are expected to propel the data mining software market over the forecast period.

Regionally, North America holds a significant share of the data mining software market, driven by the presence of major technology companies and the early adoption of advanced analytics solutions. The Asia Pacific region is also expected to witness substantial growth due to the rapid digital transformation across various industries and the increasing investments in data infrastructure. Additionally, the growing awareness and implementation of data-driven strategies in emerging economies are contributing to the market expansion in this region.

Text Mining Software is becoming an integral part of the data mining landscape, offering unique capabilities to analyze unstructured data. As organizations generate vast amounts of textual data from various sources such as social media, emails, and customer feedback, the need for specialized tools to extract meaningful insights is growing. Text Mining Software enables businesses to process and analyze this data, uncovering patterns and trends that were previously hidden. This capability is particularly valuable in industries like marketing, customer service, and research, where understanding the nuances of language can lead to more informed decision-making. The integration of text mining with traditional data mining processes is enhancing the overall analytical capabilities of organizations, allowing them to derive comprehensive insights from both structured and unstructured data.

Component Analysis

The data mining software market is segmented by components, which primarily include software and services. The software segment encompasses various types of data mining tools that are used for analyzing and extracting valuable insights from raw data. These tools are designed to handle large volumes of data and provide advanced functionalities such as predictive analytics, data visualization, and pattern recognition. The increasing demand for sophisticated data analysis tools is driving the growth of the software segment. Enterprises are investing in these tools to enhance their data processing capabilities and derive actionable insights.

Within the software segment, the emergence of cloud-based data mining solutions is a notable trend. Cloud-based solutions offer several advantages, including s
e
Overview and Concepts of Data Warehousing
paper.erudition.co.in
html
Updated Jul 9, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Einetic (2025). Overview and Concepts of Data Warehousing [Dataset]. https://paper.erudition.co.in/makaut/btech-in-information-technology/7/data-warehousing-and-data-mining
Explore at:
htmlAvailable download formats
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/termshttps://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Overview and Concepts of Data Warehousing of Data Warehousing & Data Mining, 7th Semester , Information Technology
r
A predictive model for opal exploration in Australia from a data mining...
researchdata.edu.au
Updated May 1, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thomas Landgrebe; Thomas Landgrebe; Adriana Dutkiewicz; Dietmar Muller (2015). A predictive model for opal exploration in Australia from a data mining approach [Dataset]. http://doi.org/10.4227/11/5587A86C0FDF1
Explore at:
Unique identifier
https://doi.org/10.4227/11/5587A86C0FDF1
Dataset updated
May 1, 2015
Dataset provided by
The University of Sydney
Authors
Thomas Landgrebe; Thomas Landgrebe; Adriana Dutkiewicz; Dietmar Muller
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Area covered

Dataset funded by
Australian Research Council
Description
This data collection is associated with the publications: Merdith, A. S., Landgrebe, T. C. W., Dutkiewicz, A., & Müller, R. D. (2013). Towards a predictive model for opal exploration using a spatio-temporal data mining approach. Australian Journal of Earth Sciences, 60(2), 217-229. doi: 10.1080/08120099.2012.754793
and
Landgrebe, T. C. W., Merdith, A., Dutkiewicz, A., & Müller, R. D. (2013). Relationships between palaeogeography and opal occurrence in Australia: A data-mining approach. Computers & Geosciences, 56(0), 76-82. doi: 10.1016/j.cageo.2013.02.002
Publication Abstract - Merdith et al. (2013)
Opal is Australia's national gemstone, however most significant opal discoveries were made in the early 1900's - more than 100 years ago - until recently. Currently there is no formal exploration model for opal, meaning there are no widely accepted concepts or methodologies available to suggest where new opal fields may be found. As a consequence opal mining in Australia is a cottage industry with the majority of opal exploration focused around old opal fields. The EarthByte Group has developed a new opal exploration methodology for the Great Artesian Basin. The work is based on the concept of applying “big data mining” approaches to data sets relevant for identifying regions that are prospective for opal. The group combined a multitude of geological and geophysical data sets that were jointly analysed to establish associations between particular features in the data with known opal mining sites. A “training set” of known opal localities (1036 opal mines) was assembled, using those localities, which were featured in published reports and on maps. The data used include rock types, soil type, regolith type, topography, radiometric data and a stack of digital palaeogeographic maps. The different data layers were analysed via spatio-temporal data mining combining the GPlates PaleoGIS software (www.gplates.org) with the Orange data mining software (orange.biolab.si) to produce the first opal prospectivity map for the Great Artesian Basin. One of the main results of the study is that the geological conditions favourable for opal were found to be related to a particular sequence of surface environments over geological time. These conditions involved alternating shallow seas and river systems followed by uplift and erosion. The approach reduces the entire area of the Great Artesian Basin to a mere 6% that is deemed to be prospective for opal exploration. The work is described in two companion papers in the Australian Journal of Earth Sciences and Computers and Geosciences.
Publication Abstract - Landgrebe et al. (2013)
Age-coded multi-layered geological datasets are becoming increasingly prevalent with the surge in open-access geodata, yet there are few methodologies for extracting geological information and knowledge from these data. We present a novel methodology, based on the open-source GPlates software in which age-coded digital palaeogeographic maps are used to “data-mine” spatio-temporal patterns related to the occurrence of Australian opal. Our aim is to test the concept that only a particular sequence of depositional/erosional environments may lead to conditions suitable for the formation of gem quality sedimentary opal. Time-varying geographic environment properties are extracted from a digital palaeogeographic dataset of the eastern Australian Great Artesian Basin (GAB) at 1036 opal localities. We obtain a total of 52 independent ordinal sequences sampling 19 time slices from the Early Cretaceous to the present-day. We find that 95% of the known opal deposits are tied to only 27 sequences all comprising fluvial and shallow marine depositional sequences followed by a prolonged phase of erosion. We then map the total area of the GAB that matches these 27 opal-specific sequences, resulting in an opal-prospective region of only about 10% of the total area of the basin. The key patterns underlying this association involve only a small number of key environmental transitions. We demonstrate that these key associations are generally absent at arbitrary locations in the basin. This new methodology allows for the simplification of a complex time-varying geological dataset into a single map view, enabling straightforward application for opal exploration and for future co-assessment with other datasets/geological criteria. This approach may help unravel the poorly understood opal formation process using an empirical spatio-temporal data-mining methodology and readily available datasets to aid hypothesis testing.
Authors and Institutions
Andrew Merdith - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-7564-8149
Thomas Landgrebe - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia
Adriana Dutkiewicz - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia
R. Dietmar Müller - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-3334-5764
Overview of Resources Contained
This collection contains geological data from Australia used for data mining in the publications Merdith et al. (2013) and Landgrebe et al. (2013). The resulting maps of opal prospectivity are also included.
List of Resources
Note: For details on the files included in this data collection, see “Description_of_Resources.txt”.
Note: For information on file formats and what programs to use to interact with various file formats, see “File_Formats_and_Recommended_Programs.txt”.
Map of Barfield region, Australia (.jpg, 270 KB)
Map overviewing the Great Artesian basins and main opal mining camps (.png, 82 KB)
Maps showing opal prospectivity data mining results for different geological datasets (.tif, 23.1 MB)
Map of opal prospectivity from palaeogeography data mining (.pdf, 2.6 MB)
Raster of palaeogeography target regions for viewing in Google Earth (.jpg, 418 KB)
Opal mine locations (.gpml, .txt, .kmz, .shp, total 15.6 MB)
Map of opal prospectivity from all data mining results as a Google Earth overlay (.kmz, 12 KB)
Map of probability of opal occurrence in prospective regions from all data mining results (.tif, 5.9 MB)
Paleogeography of Australia (.gpml, .txt, .shp, total 114.2 MB)
Radiometric data showing potassium concentration contrasts (.tif, .kmz, total 311.3 MB)
Regolith data (.gpml, .txt, .kml, .shp, total 7.1 MB)
Soil type data (.gpml, .txt, .kml, .shp, total 7.1 MB)
For more information on this data collection, and links to other datasets from the EarthByte Research Group please visit EarthByte
For more information about using GPlates, including tutorials and a user manual please visit GPlates or EarthByte
f
Data from: Improving the semantic quality of conceptual models through text...
figshare.com
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tom Willaert (2023). Improving the semantic quality of conceptual models through text mining. A proof of concept [Dataset]. http://doi.org/10.6084/m9.figshare.6951608.v1
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.6951608.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Tom Willaert
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Python code generated in the context of the dissertation 'Improving the semantic quality of conceptual models through text mining. A proof of concept' (Postgraduate studies Big Data & Analytics for Business and Management, KU Leuven Faculty of Economics and Business, 2018)
t
Data Mining Tools Global Market Report 2025
thebusinessresearchcompany.com
pdf,excel,csv,ppt
Updated Jan 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Business Research Company (2025). Data Mining Tools Global Market Report 2025 [Dataset]. https://www.thebusinessresearchcompany.com/report/data-mining-tools-global-market-report
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset updated
Jan 15, 2025
Dataset authored and provided by
The Business Research Company
License
https://www.thebusinessresearchcompany.com/privacy-policyhttps://www.thebusinessresearchcompany.com/privacy-policy
Description
Global Data Mining Tools market size is expected to reach $2.13 billion by 2029 at 12.9%, segmented as by tools, data mining software, data visualization tools, data preparation tools, predictive analytics tools, reporting tools
m
SPHERE: Students' performance dataset of conceptual understanding,...
data.mendeley.com
Updated Jan 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Purwoko Haryadi Santoso (2025). SPHERE: Students' performance dataset of conceptual understanding, scientific ability, and learning attitude in physics education research (PER) [Dataset]. http://doi.org/10.17632/88d7m2fv7p.2
Explore at:
Unique identifier
https://doi.org/10.17632/88d7m2fv7p.2
Dataset updated
Jan 15, 2025
Authors
Purwoko Haryadi Santoso
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The SPHERE is students' performance in physics education research dataset. It is presented as a multi-domain learning dataset of students’ performance on physics that has been collected through several research-based assessments (RBAs) established by the physics education research (PER) community. A total of 497 eleventh-grade students were involved from three large and a small public high school located in a suburban district of a high-populated province in Indonesia. Some variables related to demographics, accessibility to literature resources, and students’ physics identity are also investigated. Some RBAs utilized in this data were selected based on concepts learned by the students in the Indonesian physics curriculum. We commenced the survey of students’ understanding on Newtonian mechanics at the end of the first semester using Force Concept Inventory (FCI) and Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students’ scientific abilities and learning attitude through Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS) respectively. The conceptual assessments were continued at the second semester measured through Rotational and Rolling Motion Conceptual Survey (RRMCS), Fluid Mechanics Concept Inventory (FMCI), Mechanical Waves Conceptual Survey (MWCS), Thermal Concept Evaluation (TCE), and Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE could be a valuable dataset for supporting the advancement of the PER field particularly in quantitative studies. For example, there is a need to help advance research on using machine learning and data mining techniques in PER that might face challenges due to the unavailable dataset for the specific purpose of PER studies. SPHERE can be reused as a students’ performance dataset on physics specifically dedicated for PER scholars which might be willing to implement machine learning techniques in physics education.
Data and Model Checkpoints for "Weakly Supervised Concept Map Generation...
figshare.com
application/x-gzip
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jiaying Lu (2023). Data and Model Checkpoints for "Weakly Supervised Concept Map Generation through Task-Guided Graph Translation" [Dataset]. http://doi.org/10.6084/m9.figshare.16415802.v2
Explore at:
application/x-gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.16415802.v2
Dataset updated
May 31, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Jiaying Lu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data and model checkpoints for paper "Weakly Supervised Concept Map Generation through Task-Guided Graph Translation" by Jiaying Lu, Xiangjue Dong, and Carl Yang. The paper has been accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE).

GT-D2G-*.tar.gz are model checkpoints for GT-D2G variants. These models are trained by seed=27. nyt/dblp/yelp.*.win5.pickle.gz are initial graphs generated by NLP pipelines. glove.840B.restaurant.400d.vec.gz is the pre-trained embedding for the Yelp dataset.

For more instructions, please refer to our GitHub repo.
f
Expanding the Kendrick Mass Plot Toolbox in MZmine 2 to Enable Rapid Polymer...
acs.figshare.com
zip
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ansgar Korf; Thierry Fouquet; Robin Schmid; Heiko Hayen; Sebastian Hagenhoff (2023). Expanding the Kendrick Mass Plot Toolbox in MZmine 2 to Enable Rapid Polymer Characterization in Liquid Chromatography−Mass Spectrometry Data Sets [Dataset]. http://doi.org/10.1021/acs.analchem.9b03863.s002
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1021/acs.analchem.9b03863.s002
Dataset updated
May 31, 2023
Dataset provided by
ACS Publications
Authors
Ansgar Korf; Thierry Fouquet; Robin Schmid; Heiko Hayen; Sebastian Hagenhoff
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Technological advances in mass spectrometry (MS) toward more accurate and faster data acquisition result in highly informative but also more complex data sets. Especially the hyphenation of liquid chromatography (LC) and MS yields large data files containing a high amount of compound specific information. Using electrospray-ionization for compounds such as polymers enables highly sensitive detection, yet results in very complex spectra, containing multiply charged ions and adducts. Recent years have seen the development of novel or updated data mining strategies to reduce the MS spectra complexity and to ultimately simplify the data analysis workflow. Among other techniques, the Kendrick mass defect analysis, which graphically highlights compounds containing a given repeating unit, has been revitalized with applications in multiple fields of study, such as lipids and polymers. Especially for the latter, various data mining concepts have been developed, which extend regular Kendrick mass defect analysis to multiply charged ion series. The aim of this work is to collect and subsequently implement these concepts in one of the most popular open-source MS data mining software, i.e., MZmine 2, to make them rapidly available for different MS based measurement techniques and various vendor formats, with a special focus on hyphenated techniques such as LC–MS. In combination with already existing data mining modules, an example data set was processed and simplified, enabling an ever faster evaluation and polymer characterization.
o
Data from: Data-Mining of In-Situ TEM Experiments: Towards Understanding...
explore.openaire.eu
data.niaid.nih.gov
+1more
Updated Jun 22, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dominik Steinberger; Inas Issa; Rachel Strobl; Peter J Imrich; Daniel Kiener; Stefan Sandfeld (2022). Data-Mining of In-Situ TEM Experiments: Towards Understanding Nanoscale Fracture [Dataset]. http://doi.org/10.5281/zenodo.7251414
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7251414
Dataset updated
Jun 22, 2022
Authors
Dominik Steinberger; Inas Issa; Rachel Strobl; Peter J Imrich; Daniel Kiener; Stefan Sandfeld
Description
Datasets for the publication in the "Computational Materials Science". This is essentially a snapshot of the gitlab repository https://gitlab.com/computational-materials-science/public/publication-data-and-code/2022-data-mining-of-in-situ-tem-experiments that might contain additional updates and scripts. A version of the manuscript can also be found at https://arxiv.org/abs/2206.11355
d
Data mining approaches to quantifying the formation of secondary organic...
datasets.ai
catalog.data.gov
53
Updated Sep 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Environmental Protection Agency (2024). Data mining approaches to quantifying the formation of secondary organic aerosol [Dataset]. https://datasets.ai/datasets/data-mining-approaches-to-quantifying-the-formation-of-secondary-organic-aerosol
Explore at:
53Available download formats
Dataset updated
Sep 18, 2024
Dataset authored and provided by
U.S. Environmental Protection Agency
Description
This research used data mining approaches to better understand factors affecting the formation of secondary organic aerosol (SOA). Although numerous laboratory and computational studies have been completed on SOA formation, it is still challenging to determine factors that most influence SOA formation. Experimental data were based on previous work described by Offenberg et al. (2017), where volume concentrations of SOA were measured in 139 laboratory experiments involving the oxidation of single hydrocarbons under different operating conditions. Three different data mining methods were used, including nearest neighbor, decision tree, and pattern mining. Both decision tree and pattern mining approaches identified similar chemical and experimental conditions that were important to SOA formation. Among these important factors included the number of methyl groups, the number of rings and the presence of dinitrogen pentoxide (N2O5).

This dataset is associated with the following publication: Olson, D., J. Offenberg, M. Lewandowski, T. Kleindienst, K. Docherty, M. Jaoui, J.D. Krug, and T. Riedel. Data mining approaches to understanding the formation of secondary organic aerosol. ATMOSPHERIC ENVIRONMENT. Elsevier Science Ltd, New York, NY, USA, 252: 118345, (2021).
Data from: CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2...
zenodo.org
bin, png, zip
Updated Jul 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Berta Ibáñez-Beroiz; Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ignacio Oscoz-Villanueva; Ibai Tamayo; Ibai Tamayo; Julián Librero; Julián Librero; Mónica Enguita-Germán; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Francisco Estupiñán-Romero; Enrique Bernal-Delgado; Enrique Bernal-Delgado (2024). CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2 DIABETES [Dataset]. http://doi.org/10.5281/zenodo.7778291
Explore at:
bin, png, zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7778291
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Berta Ibáñez-Beroiz; Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ignacio Oscoz-Villanueva; Ibai Tamayo; Ibai Tamayo; Julián Librero; Julián Librero; Mónica Enguita-Germán; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Francisco Estupiñán-Romero; Enrique Bernal-Delgado; Enrique Bernal-Delgado
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Technical notes and documentation on the common data model of the project CONCEPT-DM2.

This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.

Aims of the CONCEPT-DM2 project:

General aim: To analyse chronic care effectiveness and efficiency of care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes using data from real life world (RWD) from five Spanish Regional Health Systems.

Main specific aims:

To characterize the care pathways in patients with diabetes through the whole care system in terms of process indicators and pharmacologic recommendations

To compare these observed care pathways with the theoretical clinical pathways derived from the clinical practice guidelines

To assess if the adherence to clinical guidelines influence on important health outcomes, such as cardiovascular hospitalizations.

To compare the traditional analytical methods with process mining methods in terms of modeling quality, prediction performance and information provided.

Study Design: It is a population-based retrospective observational study centered on all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all the contacts of these patients with the health services using the electronic medical record systems including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data, Pharmacy Claims, and also other registers such as the mortality and the population register.

Cohort definition: All patients with code of Type 2 Diabetes in the clinical health records

Inclusion criteria: patients that, at 01/01/2017 or during the follow-up from 01/01/2017 to 31/12/2022 had active health card (active TIS - tarjeta sanitaria activa) and code of type 2 diabetes (T2D, DM2 in spanish) in the clinical records of primary care (CIAP2 T90 in case of using CIAP code system)

Exclusion criteria:

patients with no contact with the health system from 01/01/2017 to 31/12/2022

patients that had a T1D (DM1) code opened after the T2D code during the follow-up.

Study period. From 01/01/2017 to 31/12/2022

Files included in this publication:

Datamodel_CONCEPT_DM2_diagram.png

Common data model specification (Datamodel_CONCEPT_DM2_v.0.1.0.xlsx)

Synthetic datasets (Datamodel_CONCEPT_DM2_sample_data)

sample_data1_dm_patient.csv

sample_data2_dm_param.csv

sample_data3_dm_patient.csv

sample_data4_dm_param.csv

sample_data5_dm_patient.csv

sample_data6_dm_param.csv

sample_data7_dm_param.csv

sample_data8_dm_param.csv

Datamodel_CONCEPT_DM2_explanation.pptx
f
The detailed datum of the Experiment C.
plos.figshare.com
xls
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jyh-Jian Sheu; Ko-Tsung Chu; Nien-Feng Li; Cheng-Chi Lee (2023). The detailed datum of the Experiment C. [Dataset]. http://doi.org/10.1371/journal.pone.0171518.t005
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0171518.t005
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Jyh-Jian Sheu; Ko-Tsung Chu; Nien-Feng Li; Cheng-Chi Lee
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The detailed datum of the Experiment C.
f
Data from: Mining significant crisp-fuzzy spatial association rules
tandf.figshare.com
pdf
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wenzhong Shi; Anshu Zhang; Geoffrey I. Webb (2023). Mining significant crisp-fuzzy spatial association rules [Dataset]. http://doi.org/10.6084/m9.figshare.5873139.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5873139.v1
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Wenzhong Shi; Anshu Zhang; Geoffrey I. Webb
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Spatial association rule mining (SARM) is an important data mining task for understanding implicit and sophisticated interactions in spatial data. The usefulness of SARM results, represented as sets of rules, depends on their reliability: the abundance of rules, control over the risk of spurious rules, and accuracy of rule interestingness measure (RIM) values. This study presents crisp-fuzzy SARM, a novel SARM method that can enhance the reliability of resultant rules. The method firstly prunes dubious rules using statistically sound tests and crisp supports for the patterns involved, and then evaluates RIMs of accepted rules using fuzzy supports. For the RIM evaluation stage, the study also proposes a Gaussian-curve-based fuzzy data discretization model for SARM with improved design for spatial semantics. The proposed techniques were evaluated by both synthetic and real-world data. The synthetic data was generated with predesigned rules and RIM values, thus the reliability of SARM results could be confidently and quantitatively evaluated. The proposed techniques showed high efficacy in enhancing the reliability of SARM results in all three aspects. The abundance of resultant rules was improved by 50% or more compared with using conventional fuzzy SARM. Minimal risk of spurious rules was guaranteed by statistically sound tests. The probability that the entire result contained any spurious rules was below 1%. The RIM values also avoided large positive errors committed by crisp SARM, which typically exceeded 50% for representative RIMs. The real-world case study on New York City points of interest reconfirms the improved reliability of crisp-fuzzy SARM results, and demonstrates that such improvement is critical for practical spatial data analytics and decision support.
w
Dataset of book subjects that contain Spatial data mining : theory and...
workwithdata.com
Updated Nov 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2024). Dataset of book subjects that contain Spatial data mining : theory and application [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Spatial+data+mining+%3A+theory+and+application&j=1&j0=books
Explore at:
Dataset updated
Nov 7, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book subjects. It has 4 rows and is filtered where the books is Spatial data mining : theory and application. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
d
Replication Data for: \"Unraveling spatial, structural, and social...
search.dataone.org
dataverse.harvard.edu
Updated Nov 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo (2023). Replication Data for: \"Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS\" [Dataset]. http://doi.org/10.7910/DVN/SFT3RT
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SFT3RT
Dataset updated
Nov 9, 2023
Dataset provided by
Harvard Dataverse
Authors
PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo
Description
Data from the article "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS", by Agustin Pájaro, Ignacio J. Duran and Pablo Rodrigo, published in Revista DADOS, v. 65, n. 3, 2022.
Market Basket Analysis
kaggle.com
Updated Dec 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: . xlsx

Number of Row: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

Libraries in R

First, we need to load required libraries. Shortly I describe all libraries.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

Data Pre-processing

Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

After we will clear our data frame, will remove missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...
d
Python and R Basics for Environmental Data Sciences
search.dataone.org
hydroshare.org
Updated Dec 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tao Wen (2021). Python and R Basics for Environmental Data Sciences [Dataset]. https://search.dataone.org/view/sha256%3Aa4a66e6665773400ae76151d376607edf33cfead15ffad958fe5795436ff48ff
Explore at:
Dataset updated
Dec 5, 2021
Dataset provided by
Hydroshare
Authors
Tao Wen
Area covered

Description
This resource collects teaching materials that are originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and then refined/revised by Tao Wen to be used in the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.

This resource includes both R Notebooks and Python Jupyter Notebooks to teach the basics of R and Python coding, data analysis and data visualization, as well as building machine learning models in both programming languages by using authentic research data and questions. All of these R/Python scripts can be executed either on the CUAHSI JupyterHub or on your local machine.

This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.
f
Top five keyword counts by month.
figshare.com
xls
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jyh-Jian Sheu; Ko-Tsung Chu; Nien-Feng Li; Cheng-Chi Lee (2023). Top five keyword counts by month. [Dataset]. http://doi.org/10.1371/journal.pone.0171518.t006
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0171518.t006
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Jyh-Jian Sheu; Ko-Tsung Chu; Nien-Feng Li; Cheng-Chi Lee
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Top five keyword counts by month.

Facebook

Twitter

Click to copy link

Link copied

Cite

Dashlink (2025). Data Mining at NASA: From Theory to Applications [Dataset]. https://catalog.data.gov/dataset/data-mining-at-nasa-from-theory-to-applications

Data from: Data Mining at NASA: From Theory to Applications

Explore at:

Dataset updated

Apr 10, 2025

Dataset provided by

Dashlink

Description

NASA has some of the largest and most complex data sources in the world, with data sources ranging from the earth sciences, space sciences, and massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.

Clear search

Close search

Google apps

Main menu

Data from: Data Mining at NASA: From Theory to Applications

Lifesciences Data Mining and Visualization Market Report | Global Forecast...

Lifesciences Data Mining and Visualization Market Outlook

Component Analysis

Data Mining Software Market Report | Global Forecast From 2025 To 2033

Data Mining Software Market Outlook

Component Analysis

Overview and Concepts of Data Warehousing

A predictive model for opal exploration in Australia from a data mining...

and

Landgrebe, T. C. W., Merdith, A., Dutkiewicz, A., & Müller, R. D. (2013). Relationships between palaeogeography and opal occurrence in Australia: A data-mining approach. Computers & Geosciences, 56(0), 76-82. doi: 10.1016/j.cageo.2013.02.002

Publication Abstract - Merdith et al. (2013)

Publication Abstract - Landgrebe et al. (2013)

Authors and Institutions

Overview of Resources Contained

List of Resources

Data from: Improving the semantic quality of conceptual models through text...

Data Mining Tools Global Market Report 2025

SPHERE: Students' performance dataset of conceptual understanding,...

Data and Model Checkpoints for "Weakly Supervised Concept Map Generation...

Expanding the Kendrick Mass Plot Toolbox in MZmine 2 to Enable Rapid Polymer...

Data from: Data-Mining of In-Situ TEM Experiments: Towards Understanding...

Data mining approaches to quantifying the formation of secondary organic...

Data from: CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2...

The detailed datum of the Experiment C.

Data from: Mining significant crisp-fuzzy spatial association rules

Dataset of book subjects that contain Spatial data mining : theory and...

Replication Data for: \"Unraveling spatial, structural, and social...

Market Basket Analysis

Market Basket Analysis

Introduction

An Example of Association Rules

Strategy

Dataset Description

Libraries in R

Data Pre-processing

Python and R Basics for Environmental Data Sciences

Top five keyword counts by month.

Data from: Data Mining at NASA: From Theory to Applications