Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: The objective of this work is to improve the quality of the information in the CubaCiencia database of the Institute of Scientific and Technological Information. This database holds bibliographic information on four segments of science and is the main database of the Library Management System. The methodology applied data mining techniques for studying large volumes of information, including decision trees, the correlation matrix, and the 3D scatter plot. The results not only made it possible to improve the information in the database, but also provided useful patterns for meeting the proposed objectives.
This dataset includes locations and associated information about mines and mining activity in the contiguous United States. The database was developed by combining publicly available national datasets of mineral mines, uranium mines, and minor and major coal mine activities. It was developed in 2013, but the temporal range of the mine data varies by source. Uranium mine information came from the TENORM Uranium Location Database produced by the US Environmental Protection Agency (U.S. EPA) in 2003. Major and minor coal mine information came from the 2012 USTRAT (Stratigraphic data related to coal) database, and the mineral mine data came from the USGS Mineral Resource Program.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set covers global extraction and production of coal and metal ores at the individual mine level. It covers 1171 individual mines, reporting mine-level production for 80 different materials in the period 2000-2021. It also includes data on mine coordinates, ownership, mineral reserves, mining waste, and transportation of mining products, as well as mineral processing capacities (smelters and mineral refineries) and production. The data was gathered manually from more than 1900 openly available sources, such as annual or sustainability reports of mining companies. All datapoints are linked to their respective sources. After manual screening and entry of the data, automatic cleaning, harmonization and data checking were conducted. Geoinformation was obtained either from coordinates available in company reports, or by retrieving the coordinates via the Google Maps API with subsequent manual checking. For mines where no coordinates could be found, other geospatial attributes such as province, region, district or municipality were recorded and linked to the GADM data set, available at www.gadm.org.
The data set consists of 12 tables. The table “facilities” contains descriptive and spatial information on mines and processing facilities, and is available as a GeoPackage (GPKG) file. All other tables are available in comma-separated values (CSV) format. A schematic depiction of the database is provided in PNG format in the file database_model.png.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this research, data mining techniques, decision trees, and rule induction were analyzed in order to integrate their algorithms into the PostgreSQL database management system (DBMS), given the deficiencies of the freely available tools. A mechanism to optimize the performance of the implemented algorithms was proposed in order to take advantage of PostgreSQL. An experiment showed that response time and results improve when the algorithms are integrated into the DBMS.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a synthetic database widely used for evaluating the scalability of pattern mining algorithms. It was generated using the IBM Data Quest generator.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019, and were then updated and enhanced in June 2019.
The attractive features of MusicOSet include:
| Data | # Records |
|:-----------------:|:---------:|
| Songs | 20,405 |
| Artists | 11,518 |
| Albums | 26,522 |
| Lyrics | 19,664 |
| Acoustic Features | 20,405 |
| Genres | 1,561 |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Slides of the talk "Using data mining to identify new research avenues", given by Adam Stevenson at the NSF Ceramics Workshop on September 12, 2016.
Data Mining the Galaxy Zoo Mergers. Steven Baehr, Arun Vedachalam, Kirk Borne, and Daniel Sponseller. Abstract: Collisions between pairs of galaxies usually end in the coalescence (merger) of the two galaxies. Collisions and mergers are rare phenomena, yet they may signal the ultimate fate of most galaxies, including our own Milky Way. With the onset of massive collection of astronomical data, a computerized and automated method will be necessary for identifying those colliding galaxies worthy of more detailed study. This project researches methods to accomplish that goal. Astronomical data from the Sloan Digital Sky Survey (SDSS) and human-provided classifications of merger status from the Galaxy Zoo project are combined and processed with machine learning algorithms. The goal is to determine indicators of merger status by discovering which automated pipeline-generated attributes in the astronomical database correlate most strongly with the patterns identified through visual inspection by the Galaxy Zoo volunteers. In the end, we aim to provide a new and improved automated procedure for classification of collisions and mergers in future petascale astronomical sky surveys. Both information gain analysis (via the C4.5 decision tree algorithm) and cluster analysis (via the Davies-Bouldin Index) are explored as techniques for finding the strongest correlations between human-identified patterns and existing database attributes. Galaxy attributes measured in the SDSS green waveband images are found to be the most influential attributes for correct classification of collisions and mergers. Only a nominal information gain is noted in this research; however, there is a clear indication of which attributes contribute, so a direction for further study is apparent.
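As a rough illustration of the information gain analysis mentioned in the abstract above, the following minimal sketch computes the entropy reduction obtained by splitting class labels on one categorical attribute. It is not the authors' pipeline: the function names, the attribute values, and the toy labels are all hypothetical, and C4.5 itself goes further by normalizing this quantity into a gain ratio and by handling continuous attributes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting `labels` by a categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the label subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four galaxies with a human-provided class label
# and two made-up binned attributes.
labels = ["merger", "merger", "non-merger", "non-merger"]
informative = ["high", "high", "low", "low"]    # separates the classes perfectly
uninformative = ["high", "low", "high", "low"]  # tells us nothing about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

Ranking attributes by a score of this kind is one way to see which database attributes track the human classifications most closely.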
Cobalt, designated a critical mineral by the European Union and the United States, is a crucial component of the lithium-ion batteries found in cell phones, electric vehicles, and personal computing devices. Over half of the world’s cobalt supply is produced in the Democratic Republic of the Congo (DRC), where cobalt is mined in both large-scale and artisanal or small-scale operations. This dataset focuses on Africa’s mineral-rich Copperbelt region, an area mined for both copper and cobalt, that extends south across the DRC boundary into neighboring Zambia. Existing geoscientific data and remote sensing analysis were investigated to build a comprehensive dataset describing cobalt mining extent and technique (large- or artisanal/small-scale). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Bitcoin is the first implementation of a technology that has become known as a 'public permissionless' blockchain. Such systems allow public read/write access to an append-only blockchain database without the need for any mediating central authority. Instead they guarantee access, security and protocol conformity through an elegant combination of cryptographic assurances and game theoretic economic incentives. Not until the advent of the Bitcoin blockchain has such a trusted, transparent, comprehensive and granular data set of digital economic behaviours been available for public network analysis. In this article, by translating the cumbersome binary data structure of the Bitcoin blockchain into a high fidelity graph model, we demonstrate through various analyses the often overlooked social and econometric benefits of employing such a novel open data architecture. Specifically we show (a) how repeated patterns of transaction behaviours can be revealed to link user activity across t...
Data for (i) active mine sites and (ii) inactive mine sites are stored as Excel spreadsheets. NB the number of active/inactive mines shown in the spreadsheets is less than that reported in Table S1, because proprietary data sources have not been included (i.e. MRDS, BRITPITS and S&P). Each spreadsheet lists mine names (column A), mine status i.e. active or inactive (column B), the principal commodity mined (column C), and lat/long coordinates (columns D & E). Data for (iii) TSFs and (iv) TDFs are stored as zipped Shapefiles. Data should be uncompressed and then imported into any GIS programme that can read Shapefiles. Modelling was implemented procedurally in MATLAB v9.9.0 (R2020b) with the open source TopoToolbox MATLAB program for the analysis of digital elevation models (https://topotoolbox.wordpress.com). Modelling workflow is presented in SI Figure S8 with example code available in the WAPHA database (Macklin et al code.pdf). Citations to software sources are giv...
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This data was recorded on site in the Toraja area as video, using a digital camera at a minimum shooting distance of 3 m. The resulting footage was then divided into frames.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The longitudinal nature of the data motivated temporal trend identification in the pediatric EHR datatypes. Over the past three decades (1980-2018), we identified and quantified the temporal trend of 16,460 EHR concepts across measurement, visit, diagnosis, drug, and procedure datatypes.
Feature layer showing the mining districts, county general files, and state of Nevada files for the mining districts databases. Each polygon has related records with detailed information and links to PDF documents, if applicable.
Please credit the Nevada Bureau of Mines and Geology, University of Nevada, Reno when using any of our products. We request that you observe any copyright or disclaimer notices that may accompany these data in addition to the Creative Commons license. For specific publications, please use the suggested citation listed on the publication when available. For general datasets, please credit the Nevada Bureau of Mines and Geology, University of Nevada, Reno.
DISCLAIMER The data on this website are supplied as-is and the Nevada Bureau of Mines and Geology (NBMG), the University of Nevada, Reno (UNR), and the Nevada System of Higher Education (NSHE) make no warranties of any kind. This includes, without limitation, warranties of title, suitability for a particular use, non-infringement, absence of defects, accuracy, or the presence or absence of errors, whether or not they are known. NBMG will not be liable in any legal capacity (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of the use of the data on this website.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Scottish Marine and Freshwater Science Volume 3 Number 9. Marine Scotland Science conducts annual underwater television surveys to estimate the abundance of Nephrops norvegicus on muddy sediments in seas around Scotland. Underwater footage is recorded to DVD and reviewed by two independent observers. Nephrops burrows are counted and burrow densities over each survey tow are estimated from the average counts and viewed area. Additional data are also collected during the surveys, including sediment samples and observations on sea pen abundance, presence of fish and other benthic species and evidence of anthropogenic activities (trawl marks). All survey data are held in a purpose-designed database, the 'Nephrops survey database'. In 2010, following discussions with Scottish Natural Heritage and the Joint Nature Conservation Committee, it was agreed that data within the Nephrops survey database would be used to assist with the Scottish Marine Protected Area project, specifically the mapping of burrowed mud and offshore deep mud habitats (biotopes). This report documents work carried out, including summaries for each area surveyed and maps based on Geographic Information System layers.
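The burrow density estimate described above (the average of the observers' counts divided by the viewed area) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the units (burrows per square metre), and the example counts and area are hypothetical and do not come from the report.

```python
from statistics import mean


def burrow_density(observer_counts, viewed_area_m2):
    """Burrows per square metre: mean of the observers' counts over the viewed area."""
    if viewed_area_m2 <= 0:
        raise ValueError("viewed area must be positive")
    return mean(observer_counts) / viewed_area_m2


# Hypothetical tow: two independent observers counted 48 and 52 burrows
# over an assumed viewed area of 100 square metres.
density = burrow_density([48, 52], 100.0)
```

Averaging the two counts before dividing reflects the use of two independent observers; how the viewed area itself is derived from the tow is outside the scope of this sketch.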
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Neo4j graph-based data set of cycling routes in Slovenia, generated from OpenStreetMap geographical data (version: 23rd September 2022) and EU-DEM elevation data. It consists of 152659 nodes representing individual road intersections and 410922 edges representing the roads between them. The data is enriched with individual node locations in the form of a MongoDB collection.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Northern Territory Mines is a subset of the Mineral Occurrences Database (MODAT). MODAT provides information on metalliferous and non-metalliferous mineral occurrences and deposits within the Northern Territory. The data is captured by geologists from company reports and/or field work, and is stored in a Microsoft Access 2010 database. Information includes deposit name, location, size, shape, origin, geological setting, host lithology, metamorphism, structure, mineralisation, wall rock alteration, past production and references.
This dataset is published as Open Data. Abstract: The Development Low Risk Area is the part of the coal mining reporting area which contains no recorded coal mining legacy risks to the surface. New development within this defined area is subject to general Standing Advice from the Coal Authority. Purpose: The development low risk areas are used by Planning Authorities to determine that a planning application is subject to general Standing Advice from the Coal Authority. Supplementary Information: The National Coal Mining Database, which is based on the records held at The Coal Authority offices in Mansfield, Nottinghamshire, is updated on a regular basis. This dataset has been extracted from this dynamic database on the date stated below and therefore represents a snapshot in time. Status of the data: Extract of data from the National Coal Mining Database. Data update frequency: As needed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mexico Production: Silver: Sinaloa data was reported at 2,401.000 kg in Jan 2019. This records a decrease from the previous figure of 2,407.000 kg for Dec 2018. Mexico Production: Silver: Sinaloa data is updated monthly, with a median of 4,092.000 kg from Jan 1995 to Jan 2019 across 289 observations. The data reached an all-time high of 8,544.000 kg in Jun 1996 and a record low of 43.000 kg in May 2004. Mexico Production: Silver: Sinaloa data remains in active status in CEIC and is reported by the National Institute of Statistics and Geography. The data is categorized under Global Database’s Mexico – Table MX.B024: Mining Production: by Region.
This dataset is published as Open Data. Abstract: The Development High Risk Area is the part of the coal mining reporting area which contains one or more recorded coal mining related features which have the potential for instability or a degree of risk to the surface from the legacy of coal mining operations. The combination of features included in this composite area includes mine entries; shallow coal workings (recorded and probable); recorded coal mining related hazards; recorded mine gas sites; fissures and breaklines; and previous surface mining sites. New development in this defined area needs to demonstrate that the development will be safe and stable, taking full account of former coal mining activities. This area was formerly known as the Development Referral Area. Purpose: The development high risk areas have been defined to enable developers and planners to understand and consider the potential for instability or degree of risk from the legacy of coal mining operations. This information is also provided to asset managers for the management of the land assets of public bodies and major landowners. Supplementary Information: The National Coal Mining Database, which is based on the records held at The Coal Authority offices in Mansfield, Nottinghamshire, is updated on a regular basis. This dataset has been extracted from this dynamic database on the date stated below and therefore represents a snapshot in time. Status of the data: Extract of data from the National Coal Mining Database. Data update frequency: As needed.