100+ datasets found

Data from: Results obtained in a data mining process applied to a database...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Cite
E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.20011798.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
E.M. Ruiz Lobaina; C. P. Romero Suárez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: The objective of this work is to improve the quality of the information in the CubaCiencia database of the Institute of Scientific and Technological Information. This database holds bibliographic information on four segments of science and is the main database of the Library Management System. The methodology was based on data mining techniques for studying large volumes of information, including decision trees, the correlation matrix, and the 3D scatter plot. The results not only made it possible to improve the information in the database, but also provided truly useful patterns in the solution of the proposed objectives.
List of Top Disciplines of Advances in Data Mining and Database Management...
exaly.com
csv, json
Updated Nov 1, 2025
Cite
(2025). List of Top Disciplines of Advances in Data Mining and Database Management Book Series sorted by citations [Dataset]. https://exaly.com/journal/61621/advances-in-data-mining-and-database-management-book-series/citing-disciplines
Explore at:
Available download formats: csv, json
Dataset updated
Nov 1, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
List of Top Disciplines of Advances in Data Mining and Database Management Book Series sorted by citations.
Dataset of books in the Advances in data mining and database management...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books in the Advances in data mining and database management (ADMDM) book series series [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_series&fop0=%3D&fval0=Advances+in+data+mining+and+database+management+%28ADMDM%29+book+series&j=1&j0=book_series
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 2 rows and is filtered where the book series is Advances in data mining and database management (ADMDM) book series. It features 9 columns including author, publication date, language, and book publisher.
List of Top Schools of Advances in Data Mining and Database Management Book...
exaly.com
csv, json
Updated Oct 14, 2025
Cite
(2025). List of Top Schools of Advances in Data Mining and Database Management Book Series sorted by citations [Dataset]. https://exaly.com/journal/61621/advances-in-data-mining-and-database-management-book-series/top-schools
Explore at:
Available download formats: csv, json
Dataset updated
Oct 14, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
List of Top Schools of Advances in Data Mining and Database Management Book Series sorted by citations.
Designing a more efficient, effective and safe Medical Emergency Team (MET)...
plos.figshare.com
pdf
Updated Jun 1, 2023
Cite
Christoph Bergmeir; Irma Bilgrami; Christopher Bain; Geoffrey I. Webb; Judit Orosz; David Pilcher (2023). Designing a more efficient, effective and safe Medical Emergency Team (MET) service using data analysis [Dataset]. http://doi.org/10.1371/journal.pone.0188688
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0188688
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Christoph Bergmeir; Irma Bilgrami; Christopher Bain; Geoffrey I. Webb; Judit Orosz; David Pilcher
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Hospitals have seen a rise in Medical Emergency Team (MET) reviews. We hypothesised that the commonest MET calls result in similar treatments. Our aim was to design a pre-emptive management algorithm that allowed direct institution of treatment to patients without having to wait for attendance of the MET team and to model its potential impact on MET call incidence and patient outcomes.
Methods: Data was extracted for all MET calls from the hospital database. Association rule data mining techniques were used to identify the most common combinations of MET call causes, outcomes and therapies.
Results: There were 13,656 MET calls during the 34-month study period in 7936 patients. The most common MET call was for hypotension [31%, (2459/7936)]. These MET calls were strongly associated with the immediate administration of intra-venous fluid (70% [1714/2459] v 13% [739/5477] p
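The association quoted in the Results can be restated with the usual association-rule measures. The sketch below is only an illustration in Python, not the authors' analysis: it takes the four counts given in the abstract and computes support, confidence and lift for the rule "hypotension call -> intravenous fluid". The function name `rule_metrics` and the choice of 7936 as the common denominator (2459 + 5477) are assumptions made for this example.

```python
def rule_metrics(n_total, n_antecedent, n_both, n_consequent_only):
    """Support, confidence and lift for the rule antecedent -> consequent.

    n_both: cases with antecedent and consequent
    n_consequent_only: cases with the consequent but without the antecedent
    """
    n_consequent = n_both + n_consequent_only
    support = n_both / n_total                    # P(antecedent and consequent)
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence relative to the base rate
    return support, confidence, lift


# Counts quoted in the abstract (2459 + 5477 = 7936).
TOTAL = 7936
HYPOTENSION = 2459
FLUID_WITH_HYPOTENSION = 1714
FLUID_WITHOUT_HYPOTENSION = 739

support, confidence, lift = rule_metrics(
    TOTAL, HYPOTENSION, FLUID_WITH_HYPOTENSION, FLUID_WITHOUT_HYPOTENSION
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.2f}")
```

A lift above 1 means fluid administration is more frequent after hypotension calls than across all cases, which is what the 70% versus 13% comparison in the abstract expresses.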
List of Top Institutions of Advances in Data Mining and Database Management...
exaly.com
csv, json
Updated Nov 1, 2025
Cite
(2025). List of Top Institutions of Advances in Data Mining and Database Management Book Series sorted by citations [Dataset]. https://exaly.com/journal/61621/advances-in-data-mining-and-database-management-/top-institutions
Explore at:
Available download formats: json, csv
Dataset updated
Nov 1, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
List of Top Institutions of Advances in Data Mining and Database Management Book Series sorted by citations.
T10I4D100K transactional database
data.mendeley.com
Updated Oct 23, 2019
+ more versions
Cite
Uday kiran RAGE (2019). T10I4D100K transactional database [Dataset]. http://doi.org/10.17632/4hz2vcvxhp.1
Explore at:
Unique identifier
https://doi.org/10.17632/4hz2vcvxhp.1
Dataset updated
Oct 23, 2019
Authors
Uday kiran RAGE
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
T10I4D100K is a renowned synthetic database generated using the IBM Quest generator. This database is widely used to evaluate various frequent and correlated pattern mining algorithms.
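To make "frequent pattern mining" concrete, here is a minimal Python sketch that counts itemsets in a transactional database and keeps those that meet a minimum support count. It is brute-force counting, not Apriori or FP-growth, and it only suits toy inputs. The file layout (one transaction per line, whitespace-separated item IDs) is an assumption and is not stated in the dataset description; the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def parse_transactions(lines):
    """Assumed format: one transaction per line, whitespace-separated item IDs."""
    return [frozenset(line.split()) for line in lines if line.strip()]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset of size 1..max_size and keep those that occur
    in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Toy data standing in for the real file.
toy = ["1 2 3", "1 2", "2 3", "1 2 4"]
result = frequent_itemsets(parse_transactions(toy), min_support=3)
print(result)
```

On the toy data, items 1 and 2 and the pair (1, 2) reach the threshold of three transactions; everything else is dropped.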
Data from: Towards open data blockchain analytics: a Bitcoin perspective
search.dataone.org
data.niaid.nih.gov
+1 more
Updated Jun 12, 2025
Cite
Dan McGinn; Douglas McIlwraith; Yike Guo (2025). Towards open data blockchain analytics: a Bitcoin perspective [Dataset]. http://doi.org/10.5061/dryad.h9r0p65
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.h9r0p65
Dataset updated
Jun 12, 2025
Dataset provided by
Dryad Digital Repository
Authors
Dan McGinn; Douglas McIlwraith; Yike Guo
Time period covered
Jul 9, 2018
Description
Bitcoin is the first implementation of a technology that has become known as a 'public permissionless' blockchain. Such systems allow public read/write access to an append-only blockchain database without the need for any mediating central authority. Instead they guarantee access, security and protocol conformity through an elegant combination of cryptographic assurances and game theoretic economic incentives. Not until the advent of the Bitcoin blockchain has such a trusted, transparent, comprehensive and granular data set of digital economic behaviours been available for public network analysis. In this article, by translating the cumbersome binary data structure of the Bitcoin blockchain into a high fidelity graph model, we demonstrate through various analyses the often overlooked social and econometric benefits of employing such a novel open data architecture. Specifically we show (a) how repeated patterns of transaction behaviours can be revealed to link user activity across t...
U.S. Data Analysis Storage Management Market Research Report By Product Type...
exactitudeconsultancy.com
Updated Mar 2025
Cite
Exactitude Consultancy (2025). U.S. Data Analysis Storage Management Market Research Report By Product Type (On-Premises, Cloud-Based), By Application (Data Warehousing, Data Mining, Big Data Analytics), By End User (Healthcare, BFSI, Retail, IT and Telecom), By Technology (Hadoop, SQL Databases, NoSQL Databases), By Distribution Channel (Direct Sales, Online Sales) – Forecast to 2034. [Dataset]. https://exactitudeconsultancy.com/reports/50774/u-s-data-analysis-storage-management-market
Explore at:
Dataset updated
Mar 2025
Dataset authored and provided by
Exactitude Consultancy
License
https://exactitudeconsultancy.com/privacy-policy
Description
The U.S. Data Analysis Storage Management market is projected to be valued at $10 billion in 2024, driven by factors such as increasing consumer awareness and the rising prevalence of industry-specific trends. The market is expected to grow at a CAGR of 12%, reaching approximately $31 billion by 2034.
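The projection can be checked with the compound-growth formula, end = start × (1 + rate)^years. The short Python sketch below uses the figures from the description; treating 2024 to 2034 as ten compounding periods is an assumption, and the helper name `project` is hypothetical.

```python
def project(value, rate, years):
    """Compound a starting value at a constant annual growth rate."""
    return value * (1 + rate) ** years


start_billion = 10.0   # 2024 valuation from the description
cagr = 0.12            # 12% compound annual growth rate
years = 2034 - 2024    # assumed: ten compounding periods

projected = project(start_billion, cagr, years)
print(f"{projected:.1f}")  # 31.1
```

Ten years at 12% multiplies the starting value by about 3.1, which agrees with the stated figure of approximately $31 billion.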
List of Top Authors of Advances in Data Mining and Database Management Book...
exaly.com
csv, json
Updated Nov 1, 2025
Cite
(2025). List of Top Authors of Advances in Data Mining and Database Management Book Series sorted by citations [Dataset]. https://exaly.com/journal/61621/advances-in-data-mining-and-database-management-book-series/top-authors
Explore at:
Available download formats: csv, json
Dataset updated
Nov 1, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
List of Top Authors of Advances in Data Mining and Database Management Book Series sorted by citations.
Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated Oct 22, 2020
+ more versions
Cite
Triant, Deborah A.; Andorf, Carson M.; Gardiner, Jack M.; Unni, Deepak R.; Elsik, Christine G.; Nguyen, Hung N.; Le Tourneau, Justin J.; Tayal, Aditi; Walsh, Amy T.; Portwood, John L.; Cannon, Ethalinda K. S.; Shamimuzzaman, (2020). Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and Genomics Database.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000484626
Explore at:
Dataset updated
Oct 22, 2020
Authors
Triant, Deborah A.; Andorf, Carson M.; Gardiner, Jack M.; Unni, Deepak R.; Elsik, Christine G.; Nguyen, Hung N.; Le Tourneau, Justin J.; Tayal, Aditi; Walsh, Amy T.; Portwood, John L.; Cannon, Ethalinda K. S.; Shamimuzzaman,
Description
MaizeMine is the data mining resource of the Maize Genetics and Genome Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates, and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. With the ability to save query outputs as lists that can be input to new queries, MaizeMine provides limitless possibilities for data integration and meta-analysis.
Journal of Computational Design and Engineering Impact Factor 2024-2025 -...
researchhelpdesk.org
Updated Feb 23, 2022
Cite
Research Help Desk (2022). Journal of Computational Design and Engineering Impact Factor 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/impact-factor-if/293/journal-of-computational-design-and-engineering
Explore at:
Dataset updated
Feb 23, 2022
Dataset authored and provided by
Research Help Desk
Description
Journal of Computational Design and Engineering Impact Factor 2024-2025 - ResearchHelpDesk. Journal of Computational Design and Engineering is an international journal that aims to provide academia and industry with a venue for rapid publication of research papers reporting innovative computational methods and applications to achieve a major breakthrough, practical improvements, and bold new research directions within a wide range of design and engineering:

Theory and its progress in computational advancement for design and engineering

Development of computational framework to support large scale design and engineering

Interaction issues among human, designed artifacts, and systems

Knowledge-intensive technologies for intelligent and sustainable systems

Emerging technology and convergence of technology fields presented with convincing design examples

Educational issues for academia, practitioners, and future generation

Proposal on new research directions as well as survey and retrospectives on mature field

Examples of relevant topics include traditional and emerging issues in design and engineering but are not limited to:

Field specific issues in mechanical, aerospace, shipbuilding, industrial, architectural, plant, and civil engineering as well as industrial design

Geometric modeling and processing, solid and heterogeneous modeling, computational geometry, features, and virtual prototyping

Computer graphics, virtual and augmented reality, and scientific visualization

Human modeling and engineering, user interaction and experience, HCI, HMI, human-vehicle interaction (HVI), cognitive engineering, and human factors and ergonomics with computers

Knowledge-based engineering, intelligent CAD, AI and machine learning in design, and ontology

Product data exchange and management, PDM/PLM/CPC, PDX/PDQ, interoperability, data mining, and database issues

Design theory and methodology, sustainable design and engineering, concurrent engineering, and collaborative engineering

Digital/virtual manufacturing, rapid prototyping and tooling, and CNC machining

Computer aided inspection, geometric and engineering tolerancing, and reverse engineering

Finite element analysis, optimization, meshes and discretization, and virtual engineering

Bio-CAD, Nano-CAD, and medical applications

Industrial design, aesthetic design, new media, and design education

Survey and benchmark reports
LSC (Leicester Scientific Corpus)
figshare.le.ac.uk
Updated Apr 15, 2020
+ more versions
Cite
Neslihan Suzen (2020). LSC (Leicester Scientific Corpus) [Dataset]. http://doi.org/10.25392/leicester.data.9449639.v2
Explore at:
Unique identifier
https://doi.org/10.25392/leicester.data.9449639.v2
Dataset updated
Apr 15, 2020
Dataset provided by
University of Leicester
Authors
Neslihan Suzen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Leicester
Description
The LSC (Leicester Scientific Corpus)

April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes

The data are extracted from the Web of Science [1]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

[Version 2] A further cleaning is applied in Data Processing for LSC Abstracts in Version 1*. Details of the cleaning procedure are explained in Step 6.

* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v1.

Getting Started

This text provides information on the LSC (Leicester Scientific Corpus) and the pre-processing steps applied to abstracts, and describes the structure of the files that organise the corpus. The corpus was created for use in future work on the quantification of the meaning of research texts and is made available for use in Natural Language Processing projects.

LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [1]. The corpus contains only documents in English. Each document in the corpus contains the following parts:

1. Authors: The list of authors of the paper
2. Title: The title of the paper
3. Abstract: The abstract of the paper
4. Categories: One or more categories from the list of categories [2]. The full list of categories is presented in the file ‘List_of _Categories.txt’.
5. Research Areas: One or more research areas from the list of research areas [3]. The full list of research areas is presented in the file ‘List_of_Research_Areas.txt’.
6. Total Times cited: The number of times the paper was cited by other items from all databases within the Web of Science platform [4]
7. Times cited in Core Collection: The total number of times the paper was cited by other papers within the WoS Core Collection [4]

The corpus was collected online in July 2018 and contains the number of citations from publication date to July 2018. We describe a document as the collection of information (about a paper) listed above. The total number of documents in LSC is 1,673,350.

Data Processing

Step 1: Downloading of the Data Online

The dataset is collected manually by exporting documents online as tab-delimited files. All documents are available online.

Step 2: Importing the Dataset to R

The LSC was collected as TXT files. All documents are extracted to R.

Step 3: Cleaning the Data from Documents with Empty Abstract or without Category

As our research is based on the analysis of abstracts and categories, all documents with empty abstracts and documents without categories are removed.

Step 4: Identification and Correction of Concatenated Words in Abstracts

Medicine-related publications in particular use ‘structured abstracts’. Such abstracts are divided into sections with distinct headings such as introduction, aim, objective, method, result, conclusion etc. The tool used for extracting abstracts concatenates section headings with the first word of the section. For instance, we observe words such as ConclusionHigher and ConclusionsRT. The detection and identification of such words is done by sampling medicine-related publications with human intervention. Detected concatenated words are split into two words. For instance, the word ‘ConclusionHigher’ is split into ‘Conclusion’ and ‘Higher’.

The section headings in such abstracts are listed below:

Background, Method(s), Design, Theoretical, Measurement(s), Location, Aim(s), Methodology, Process, Abstract, Population, Approach, Objective(s), Purpose(s), Subject(s), Introduction, Implication(s), Patient(s), Procedure(s), Hypothesis, Measure(s), Setting(s), Limitation(s), Discussion, Conclusion(s), Result(s), Finding(s), Material(s), Rationale(s), Implications for health and nursing policy

Step 5: Extracting (Sub-setting) the Data Based on Lengths of Abstracts

After correction, the lengths of abstracts are calculated. ‘Length’ indicates the total number of words in the text, calculated by the same rule as the Microsoft Word ‘word count’ [5].

According to the APA style manual [6], an abstract should contain between 150 and 250 words. In LSC, we decided to limit the length of abstracts to between 30 and 500 words in order to study documents with abstracts of typical length ranges and to avoid the effect of length on the analysis.
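Steps 4 and 5 can be sketched in Python as follows. This is an illustration only: the corpus was processed in R, and detection was done by sampling with human intervention rather than by a single pattern. The heading list is a subset of the one above, whitespace splitting only approximates the Microsoft Word word count, and the names `split_heading` and `within_length` are hypothetical.

```python
import re

# Subset of the section headings listed above; plural forms are spelled out.
HEADINGS = [
    "Background", "Methods", "Method", "Aims", "Aim", "Objectives", "Objective",
    "Introduction", "Results", "Result", "Conclusions", "Conclusion", "Discussion",
]

# A heading at a word boundary that is immediately followed by an uppercase letter,
# as in "ConclusionHigher". Longer headings are tried first.
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(HEADINGS, key=len, reverse=True)) + r")(?=[A-Z])"
)


def split_heading(text: str) -> str:
    """Insert a space between a fused section heading and the following word."""
    return PATTERN.sub(r"\1 ", text)


def within_length(text: str, lo: int = 30, hi: int = 500) -> bool:
    """Keep abstracts whose whitespace-delimited word count lies in [lo, hi]."""
    return lo <= len(text.split()) <= hi


print(split_heading("ConclusionHigher doses were tolerated."))
# Conclusion Higher doses were tolerated.
```

A pattern like this would also split legitimate camel-case tokens that happen to start with a heading word, which is one reason a manual review step is sensible.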

Step 6: [Version 2] Cleaning Copyright Notices, Permission Policies, Journal Names and Conference Names from LSC Abstracts in Version 1

Publications can include a footer with a copyright notice, permission policy, journal name, licence, author’s rights or conference name below the text of the abstract, added by conferences and journals. The tool used for extracting and processing abstracts in the WoS database attaches such footers to the text. For example, our casual observation shows that copyright notices such as ‘Published by Elsevier ltd.’ are placed in many texts. To avoid abnormal appearances of words in further analysis, such as bias in frequency calculation, we performed a cleaning procedure on such sentences and phrases in abstracts of LSC version 1. We removed copyright notices, names of conferences, names of journals, authors’ rights, licenses and permission policies identified by sampling of abstracts.

Step 7: [Version 2] Re-extracting (Sub-setting) the Data Based on Lengths of Abstracts

The cleaning procedure described in the previous step led to some abstracts falling below our minimum length criterion (30 words). 474 texts were removed.

Step 8: Saving the Dataset into CSV Format

Documents are saved into 34 CSV files. In the CSV files, the information is organised with one record on each line, and the abstract, title, list of authors, list of categories, list of research areas, and times cited are recorded in fields.

To access the LSC for research purposes, please email ns433@le.ac.uk.

References

[1] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[2] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[3] Research Areas in WoS. Available: https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html
[4] Times Cited in WoS Core Collection. (15 July). Available: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Times-Cited-accessibility-and-variation?language=en_US
[5] Word Count. Available: https://support.office.com/en-us/article/show-word-count-3c9e6a11-a04d-43b4-977c-563a0e0d5da3
[6] A. P. Association, Publication manual. American Psychological Association Washington, DC, 1983.
Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining
zenodo.org
data.niaid.nih.gov
+1 more
bin, zip
Updated Jun 7, 2021
Cite
Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
Explore at:
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.4904639
Dataset updated
Jun 7, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mariana O. Silva; Laís Mota; Mirella M. Moro
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019. The data were then updated and enhanced in June 2019.

The attractive features of MusicOSet include:

Integration and centralization of different musical data sources

Calculation of popularity scores and classification of musical elements into hits and non-hits, covering 1962 to 2018

Enriched metadata for music, artists, and albums from the US popular music industry

Availability of acoustic and lyrical resources

Unrestricted access in two formats: SQL database and compressed .csv files

| Data | # Records |
|:-----------------:|:---------:|
| Songs | 20,405 |
| Artists | 11,518 |
| Albums | 26,522 |
| Lyrics | 19,664 |
| Acoustic Features | 20,405 |
| Genres | 1,561 |
Data from: DATA MINING THE GALAXY ZOO MERGERS
data.nasa.gov
gimi9.com
+3 more
Updated Mar 31, 2025
Cite
nasa.gov (2025). DATA MINING THE GALAXY ZOO MERGERS [Dataset]. https://data.nasa.gov/dataset/data-mining-the-galaxy-zoo-mergers
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
DATA MINING THE GALAXY ZOO MERGERS
STEVEN BAEHR, ARUN VEDACHALAM, KIRK BORNE, AND DANIEL SPONSELLER
Abstract. Collisions between pairs of galaxies usually end in the coalescence (merger) of the two galaxies. Collisions and mergers are rare phenomena, yet they may signal the ultimate fate of most galaxies, including our own Milky Way. With the onset of massive collection of astronomical data, a computerized and automated method will be necessary for identifying those colliding galaxies worthy of more detailed study. This project researches methods to accomplish that goal. Astronomical data from the Sloan Digital Sky Survey (SDSS) and human-provided classifications on merger status from the Galaxy Zoo project are combined and processed with machine learning algorithms. The goal is to determine indicators of merger status based solely on discovering those automated pipeline-generated attributes in the astronomical database that correlate most strongly with the patterns identified through visual inspection by the Galaxy Zoo volunteers. In the end, we aim to provide a new and improved automated procedure for classification of collisions and mergers in future petascale astronomical sky surveys. Both information gain analysis (via the C4.5 decision tree algorithm) and cluster analysis (via the Davies-Bouldin Index) are explored as techniques for finding the strongest correlations between human-identified patterns and existing database attributes. Galaxy attributes measured in the SDSS green waveband images are found to represent the most influential of the attributes for correct classification of collisions and mergers. Only a nominal information gain is noted in this research; however, there is a clear indication of which attributes contribute, so a direction for further study is apparent.
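For readers unfamiliar with information gain, the Python sketch below shows how one categorical attribute is scored against a class label. It computes plain information gain only; C4.5 itself normalises this quantity into a gain ratio, which is not shown. The attribute values and labels are invented toy data, not SDSS or Galaxy Zoo measurements, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after partitioning by a categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy example: one attribute separates the classes perfectly, the other not at all.
labels = ["merger", "merger", "non-merger", "non-merger"]
informative = ["bright", "bright", "faint", "faint"]
uninformative = ["a", "b", "a", "b"]

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
print(round(gain_informative, 3), round(gain_uninformative, 3))
```

Ranking attributes by a score of this kind is one way to see which measurements track the volunteers' classifications most closely.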
iHEARu-EAT Database
figshare.com
txt
Updated May 4, 2016
Cite
Simone Hantke (2016). iHEARu-EAT Database [Dataset]. http://doi.org/10.6084/m9.figshare.1619801.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1619801.v1
Dataset updated
May 4, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Simone Hantke
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ARFF Files
Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open...
plos.figshare.com
docx
Updated May 31, 2023
Cite
Sven Van Poucke; Zhongheng Zhang; Martin Schmitz; Milan Vukicevic; Margot Vander Laenen; Leo Anthony Celi; Cathy De Deyne (2023). Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open Data Analysis Platform [Dataset]. http://doi.org/10.1371/journal.pone.0145791
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0145791
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Sven Van Poucke; Zhongheng Zhang; Martin Schmitz; Milan Vukicevic; Margot Vander Laenen; Leo Anthony Celi; Cathy De Deyne
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the accumulation of large amounts of health related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized (PPPM) Medicine, ultimately affecting both cost and quality of care. However, the high dimensionality and high complexity of the data involved prevent data-driven methods from easy translation into clinically relevant models. Additionally, the application of cutting edge predictive methods and data manipulation requires substantial programming skills, limiting its direct exploitation by medical domain experts. This leaves a gap between potential and actual data usage. In this study, the authors address this problem by focusing on open, visual environments, suited to be applied by the medical community. Moreover, we review code free applications of big data technologies. As a showcase, a framework was developed for the meaningful use of data from critical care patients by integrating the MIMIC-II database in a data mining environment (RapidMiner) supporting scalable predictive analytics using visual tools (RapidMiner’s Radoop extension). Guided by the CRoss-Industry Standard Process for Data Mining (CRISP-DM), the ETL process (Extract, Transform, Load) was initiated by retrieving data from the MIMIC-II tables of interest. As a use case, the correlation of platelet count and ICU survival was quantitatively assessed. Using visual tools for ETL on Hadoop and predictive modeling in RapidMiner, we developed robust processes for automatic building, parameter optimization and evaluation of various predictive models, under different feature selection schemes. Because these processes can be easily adopted in other projects, this environment is attractive for scalable predictive analytics in health research.
Open database on global coal and metal mine production
zenodo.org
data.niaid.nih.gov
+1 more
zip
Updated Feb 14, 2023
+ more versions
Cite
Simon Jasansky; Mirko Lieber; Stefan Giljum; Victor Maus (2023). Open database on global coal and metal mine production [Dataset]. http://doi.org/10.5281/zenodo.6325109
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6325109
Dataset updated
Feb 14, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Simon Jasansky; Mirko Lieber; Stefan Giljum; Victor Maus
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set covers global extraction and production of coal and metal ores on an individual mine level. It covers 1171 individual mines, reporting mine-level production for 80 different materials in the period 2000-2021. Data on mining coordinates, ownership, mineral reserves, mining waste, transportation of mining products, as well as mineral processing capacities (smelters and mineral refineries) and production are also included. The data was gathered manually from more than 1900 openly available sources, such as annual or sustainability reports of mining companies. All datapoints are linked to their respective sources. After manual screening and entry of the data, automatic cleaning, harmonization and data checking was conducted. Geoinformation was obtained either from coordinates available in company reports, or by retrieving the coordinates via Google Maps API and subsequent manual checking. For mines where no coordinates could be found, other geospatial attributes such as province, region, district or municipality were recorded, and linked to the GADM data set, available at www.gadm.org.

The data set consists of 12 tables. The table “facilities” contains descriptive and spatial information on mines and processing facilities, and is available as a GeoPackage (GPKG) file. All other tables are available in comma-separated values (CSV) format. A schematic depiction of the database is provided in PNG format in the file database_model.png.
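As a rough illustration of how the two file formats described above might be opened, the sketch below uses only the Python standard library. It relies on the fact that a GeoPackage file is an SQLite database whose layers are registered in the `gpkg_contents` table. The function names, the file name `facilities.gpkg`, and any column names are assumptions made for illustration; the description does not give the actual file or column names.

```python
import csv
import io
import sqlite3


def list_gpkg_layers(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (table_name, data_type) for each layer registered in a GeoPackage."""
    # gpkg_contents is the catalogue table defined by the GeoPackage standard.
    rows = conn.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return [(name, dtype) for name, dtype in rows]


def read_csv_table(handle) -> list[dict[str, str]]:
    """Read one CSV table into a list of row dictionaries keyed by header."""
    return list(csv.DictReader(handle))


# Hypothetical usage (file names are assumptions, not taken from the data set):
#   conn = sqlite3.connect("facilities.gpkg")
#   print(list_gpkg_layers(conn))
#   with open("some_table.csv", newline="", encoding="utf-8") as fh:
#       rows = read_csv_table(fh)
```

Geometry columns in a GeoPackage are stored as binary blobs, so a dedicated geospatial library would be needed to work with the coordinates themselves; this sketch only lists layers and reads attribute tables.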
Data from: Data Mining of the Nephrops Survey Database to Support the...
dtechtive.com
find.data.gov.scot
Updated Jan 7, 2020
Marine Scotland (2020). Data Mining of the Nephrops Survey Database to Support the Scottish MPA Project [Dataset]. https://dtechtive.com/datasets/19719
Explore at:
Dataset updated
Jan 7, 2020
Dataset provided by
Marine Directorate (https://www.gov.scot/about/how-government-is-run/directorates/marine-scotland/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Area covered
Scotland
Description
Scottish Marine and Freshwater Science, Volume 3, Number 9. Marine Scotland Science conducts annual underwater television surveys to estimate the abundance of Nephrops norvegicus on muddy sediments in seas around Scotland. Underwater footage is recorded to DVD and reviewed by two independent observers. Nephrops burrows are counted and burrow densities over each survey tow are estimated from the average counts and viewed area. Additional data are also collected during the surveys, including sediment samples and observations on sea pen abundance, presence of fish and other benthic species and evidence of anthropogenic activities (trawl marks). All survey data are held in a purpose-designed database, the 'Nephrops survey database'. In 2010, following discussions with Scottish Natural Heritage and the Joint Nature Conservation Committee, it was agreed that data within the Nephrops survey database would be used to assist with the Scottish Marine Protected Area project, specifically the mapping of burrowed mud and offshore deep mud habitats (biotopes). This report documents work carried out, including summaries for each area surveyed and maps based on Geographic Information System layers.
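The density estimate mentioned above (average of the observers' counts divided by the viewed area) can be sketched as follows. This is a minimal reading of that one sentence, not the survey's actual procedure: the function name, the area unit (square metres is assumed here) and the example numbers are all hypothetical.

```python
from statistics import mean


def burrow_density(observer_counts, viewed_area):
    """Estimate burrow density for one tow as mean observer count per unit area.

    observer_counts: burrow counts from the independent observers (assumed input).
    viewed_area: area of seabed viewed during the tow (unit assumed, e.g. m^2).
    """
    if viewed_area <= 0:
        raise ValueError("viewed_area must be positive")
    return mean(observer_counts) / viewed_area


# Hypothetical tow: two observers counted 10 and 14 burrows over 24 area units.
density = burrow_density([10, 14], 24.0)
```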
Journal of Big Data Impact Factor 2024-2025 - ResearchHelpDesk
researchhelpdesk.org
Updated Feb 23, 2022
Research Help Desk (2022). Journal of Big Data Impact Factor 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/impact-factor-if/289/journal-of-big-data
Explore at:
Dataset updated
Feb 23, 2022
Dataset authored and provided by
Research Help Desk
Description
Journal of Big Data Impact Factor 2024-2025 - ResearchHelpDesk - The Journal of Big Data publishes high-quality, scholarly research papers, methodologies and case studies covering a broad range of topics, from big data analytics to data-intensive computing and all applications of big data research. The journal examines the challenges facing big data today and going forward including, but not limited to: data capture and storage; search, sharing, and analytics; big data technologies; data visualization; architectures for massively parallel processing; data mining tools and techniques; machine learning algorithms for big data; cloud computing platforms; distributed file systems and databases; and scalable storage systems. Academic researchers and practitioners will find the Journal of Big Data to be a seminal source of innovative material. All articles published by the Journal of Big Data are made freely and permanently accessible online immediately upon publication, without subscription charges or registration barriers. As authors of articles published in the Journal of Big Data you are the copyright holders of your article and have granted to any third party, in advance and in perpetuity, the right to use, reproduce or disseminate your article, according to the SpringerOpen copyright and license agreement. For those of you who are US government employees or are prevented from being copyright holders for similar reasons, SpringerOpen can accommodate non-standard copyright lines.



Data from: Results obtained in a data mining process applied to a database...

List of Top Disciplines of Advances in Data Mining and Database Management...

Dataset of books in the Advances in data mining and database management...

List of Top Schools of Advances in Data Mining and Database Management Book...

Designing a more efficient, effective and safe Medical Emergency Team (MET)...

List of Top Institutions of Advances in Data Mining and Database Management...

T10I4D100K transactional database

Data from: Towards open data blockchain analytics: a Bitcoin perspective

U.S. Data Analysis Storage Management Market Research Report By Product Type...

List of Top Authors of Advances in Data Mining and Database Management Book...

Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and...

Journal of Computational Design and Engineering Impact Factor 2024-2025 -...

LSC (Leicester Scientific Corpus)

Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining

Data from: DATA MINING THE GALAXY ZOO MERGERS

iHEARu-EAT Database

Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open...

Open database on global coal and metal mine production

Data from: Data Mining of the Nephrops Survey Database to Support the...

Journal of Big Data Impact Factor 2024-2025 - ResearchHelpDesk

Data from: Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science.