Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: The objective of this work is to improve the quality of the information in the CubaCiencia database of the Institute of Scientific and Technological Information. This database holds bibliographic information on four segments of science and is the main database of the Library Management System. The methodology was based on decision trees, the correlation matrix, the 3D scatter plot, and other data mining techniques used to study large volumes of information. The results not only made it possible to improve the information in the database, but also provided useful patterns for meeting the proposed objectives.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
List of Top Authors of Advances in Data Mining and Database Management Book Series sorted by citations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT: The hatchery is one of the most important segments of the poultry chain and generates an abundance of data which, when analyzed, allow critical points of the process to be identified. The aim of this study was to evaluate the applicability of the data mining technique to databases of egg incubation of broiler breeders and laying hen breeders. The study uses a database recording egg incubation from broiler breeders housed in pens with shavings used as litter under natural mating, as well as laying hen breeders housed in cages under an artificial insemination mating system. The data mining (DM) technique was applied in a classification task, using the type of breeder and housing system to delineate classes. The database was analyzed in three different ways: original database, attribute selection, and expert analysis. Models were selected on the basis of model precision and class accuracy. The data mining technique allowed for the classification of hatchery fertile eggs from different genetic groups, as well as hatching rates and the percentage of fertile eggs (the attributes with the greatest classification power). Broiler breeders showed higher fertility (> 95 %), but higher embryonic mortality between the third and seventh day post-hatching (> 0.5 %) when compared to laying hen breeders' eggs. In conclusion, when applying data mining to the hatchery process, selection of attributes and strategies based on the experience of experts can improve model performance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set covers global extraction and production of coal and metal ores at the level of individual mines. It covers 1171 individual mines, reporting mine-level production for 80 different materials in the period 2000-2021. It also includes data on mining coordinates, ownership, mineral reserves, mining waste, and transportation of mining products, as well as mineral processing capacities (smelters and mineral refineries) and production. The data was gathered manually from more than 1900 openly available sources, such as annual or sustainability reports of mining companies. All datapoints are linked to their respective sources. After manual screening and entry of the data, automatic cleaning, harmonization and data checking were conducted. Geoinformation was obtained either from coordinates available in company reports, or by retrieving the coordinates via the Google Maps API with subsequent manual checking. For mines where no coordinates could be found, other geospatial attributes such as province, region, district or municipality were recorded and linked to the GADM data set, available at www.gadm.org.
The data set consists of 12 tables. The table "facilities" contains descriptive and spatial information on mines and processing facilities, and is available as a GeoPackage (GPKG) file. All other tables are available in comma-separated values (CSV) format. A schematic depiction of the database is provided in PNG format in the file database_model.png.
The original contributions presented in the study are included in the article and online through the TAME Toolkit, available at: https://uncsrp.github.io/Data-Analysis-Training-Modules/, with underlying code and datasets available in the parent UNC-SRP GitHub website (https://github.com/UNCSRP). This dataset is associated with the following publication: Roell, K., L. Koval, R. Boyles, G. Patlewicz, C. Ring, C. Rider, C. Ward-Caviness, D. Reif, I. Jaspers, R. Fry, and J. Rager. Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research. Frontiers in Toxicology. Frontiers, Lausanne, SWITZERLAND, 4: 893924, (2022).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a synthetic database widely used for evaluating the scalability of pattern mining algorithms. The database was generated using the IBM Data Quest generator.
Cobalt, designated a critical mineral by the European Union and the United States, is a crucial component of the lithium-ion batteries found in cell phones, electric vehicles, and personal computing devices. Over half of the world’s cobalt supply is produced in the Democratic Republic of the Congo (DRC), where cobalt is mined in both large-scale and artisanal or small-scale operations. This dataset focuses on Africa’s mineral-rich Copperbelt region, an area mined for both copper and cobalt, that extends south across the DRC boundary into neighboring Zambia. Existing geoscientific data and remote sensing analysis were investigated to build a comprehensive dataset describing cobalt mining extent and technique (large- or artisanal/small-scale). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Please see the individual layers below to access the detailed metadata. This feature layer contains three datasets:
- The Mining Boreholes dataset contains GIS points depicting mining boreholes digitized from the U.S. Bureau of Mines (USBM) Illinois Mineral Development Atlas (IMDA) for Jo Daviess County, Illinois. Each point includes a link to a corresponding log (if available).
- The named mines dataset contains GIS polygons depicting surveyed outlines of known (named) mine diggings from the USBM IMDA for Jo Daviess County, Illinois.
- The unnamed mines dataset contains GIS polygons depicting unsurveyed, inferred outlines of unknown (unnamed) mine diggings from the USBM IMDA for Jo Daviess County, Illinois.
Each is one of several datasets compiled for the Karst Feature Database of Jo Daviess County, IL and hosted by the U.S. Fish and Wildlife Service.
MaizeMine is the data mining resource of the Maize Genetics and Genome Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates, and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. With the ability to save query outputs as lists that can be input to new queries, MaizeMine provides limitless possibilities for data integration and meta-analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"The Africa Power–Mining Database 2014 shows ongoing and forthcoming mining projects in Africa categorized by the type of mineral, ore grade, size of the project. The database draws on basic mining data from Infomine surveys, the United States Geological Survey, annual reports, technical reports, feasibility studies, investor presentations, sustainability reports on property-owner websites or filed in public domains, and mining websites (Mining Weekly, Mining Journal, Mbendi, Mining-technology, and Miningmx). Comprising 455 projects in 28 SSA countries with each project’s ore reserve value assessed at more than $250 million, the database collates publicly available and proprietary information. It also provides a panoramic view of projects operating in 2000–12 and anticipated demand in 2020. The analysis is presented over three timeframes: pre-2000, 2001–12, and 2020 (each containing the projects from the previous period except for those closing during that previous period)."
This dataset includes locations and associated information about mines and mining activity in the contiguous United States. The database was developed by combining publicly available national datasets of mineral mines, uranium mines, and minor and major coal mine activities. This database was developed in 2013, but the temporal range of the mine data varies by source. Uranium mine information came from the TENORM Uranium Location Database produced by the U.S. Environmental Protection Agency (U.S. EPA) in 2003. Major and minor coal mine information came from the 2012 USTRAT (stratigraphic data related to coal) database, and the mineral mine data came from the USGS Mineral Resource Program.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The design of a relational database for ichnological data is presented to illustrate and address deficiencies in present-day palaeontological databases. Currently, palaeontology databases apply concepts and terminology derived from the study of body fossils to trace fossil records. We suggest that fundamental differences between body and trace fossils make this practice inappropriate. These differences stem from the fact that trace fossils represent the behaviour of the tracemaker, and not the phylogenetic affinities of an organism. This database, referred to as IchnoDB, has been tested by the authors throughout the design process to ensure that recommended alterations to current palaeontology databases made herein are functional. In describing the design and logic that underpins an ichnology database, it is our desire to see established palaeontological databases incorporate ichnology specific fields into their structure. This would support and encourage future research, involving the use of large ichnological datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes replication data for the paper: Sann, R. and Lai, P.-C. (2021), "Do expectations towards Thai hospitality differ? The views of English vs Chinese speaking travelers", International Journal of Culture, Tourism and Hospitality Research, Vol. 15 No. 1, pp. 43-58. https://doi.org/10.1108/IJCTHR-01-2020-0010.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purposive sampling was the method we chose to collect the data. We obtained information from two after-school coaching programs that voluntarily provided their online learning data to us in 2020 during the pandemic. Batches of 45 and 75 students each were used to organize the data, which were then combined to create a single dataset with 399 entries. Two phases of collection took place: on January 17, 2023, and on February 12, 2023. The initial data recording was done using Google Learning Management System's Google Classroom. The data was then exported to local storage by the classroom faculties and then passed onto the researchers. Excel was used to organize the data, with rows representing individual students and columns representing different topics. The dataset, which consists of four mock tests and sixteen physics topics, was gathered from grade 10 physics instructors and students. Every pupil was given a unique ID to protect their privacy, resulting in 399 distinct entries overall. The coaching institution standardized the dataset to score it out of 100 for consistency. It is important to note that for students who did not take the majority of the exams, the institutions did not gather or transmit missing data. The dataset displays a spread with a standard deviation of 20.5 and an average score of 69.547.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Legal Cultures of the Subsoil Database is an open-access digital and bilingual (English/Spanish) research resource which maps out relevant legal and legal-like actions employed by a range of actors who have sought to assert fundamental rights in the context of socio-environmental conflicts over industrial mining.
The database contains information on a selection of eight paradigmatic mining projects in Central America and Mexico: El Dorado (El Salvador), Cerro Blanco, Escobal and Marlin (Guatemala), San Martín and ASP & ASP2 (Honduras), La Libertad (Nicaragua), and Reducción Norte & Corazón de Tinieblas (Mexico).
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Scottish Marine and Freshwater Science Volume 3 Number 9
Marine Scotland Science conducts annual underwater television surveys to estimate the abundance of Nephrops norvegicus on muddy sediments in seas around Scotland. Underwater footage is recorded to DVD and reviewed by two independent observers. Nephrops burrows are counted and burrow densities over each survey tow are estimated from the average counts and viewed area. Additional data are also collected during the surveys, including sediment samples and observations on sea pen abundance, presence of fish and other benthic species and evidence of anthropogenic activities (trawl marks). All survey data are held in a purpose-designed database, the 'Nephrops survey database'. In 2010, following discussions with Scottish Natural Heritage and the Joint Nature Conservation Committee, it was agreed that data within the Nephrops survey database would be used to assist with the Scottish Marine Protected Area project, specifically the mapping of burrowed mud and offshore deep mud habitats (biotopes). This report documents work carried out, including summaries for each area surveyed and maps based on Geographic Information System layers.
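The density estimate described above (average of the observers' burrow counts divided by the viewed area) can be illustrated with a minimal Python sketch. The function name, the example counts, and the use of square metres are assumptions made for illustration only; the report's actual procedure and units may differ.

```python
def burrow_density(observer_counts, viewed_area_m2):
    """Estimate burrow density for one survey tow.

    observer_counts: burrow counts from the independent observers
                     (hypothetical values, not taken from the report).
    viewed_area_m2:  seabed area viewed during the tow, assumed here
                     to be expressed in square metres.
    """
    if not observer_counts:
        raise ValueError("at least one observer count is required")
    if viewed_area_m2 <= 0:
        raise ValueError("viewed area must be positive")
    # Average the independent counts, then normalise by the viewed area.
    mean_count = sum(observer_counts) / len(observer_counts)
    return mean_count / viewed_area_m2


# Hypothetical tow: two observers counted 42 and 46 burrows over 110 m^2.
density = burrow_density([42, 46], 110.0)
print(f"{density:.2f} burrows per square metre")
```

Averaging before dividing keeps the estimate symmetric in the two observers, which matches the description of counts being reviewed independently and then combined.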
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions of itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow in the industry and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.
Association rules are most often used when you want to find associations between different objects in a set, for example frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between the items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support/P(mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(mat) = 0.80/0.09 = 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
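The three measures can be checked with a short Python sketch. It uses the standard definitions for a rule A => B (support = P(A and B), confidence = support / P(A), lift = confidence / P(B)); the function and argument names are made up for this illustration, and Python is used here only for the sketch, whereas the walkthrough itself works in R.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent.

    All arguments are transaction counts; the names are illustrative.
    """
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B | A) = support / P(A)
    lift = confidence / (n_consequent / n_total)  # confidence / P(B)
    return support, confidence, lift


# 100 customers: 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 indicates that the two items are bought together far more often than would be expected if the purchases were independent.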
Number of Attributes: 7
First, we load the required libraries; each one is described briefly.
Next, we load Assignment-1_Data.xlsx into R to read the dataset. We can now inspect the data in R.
Next, we clean the data frame by removing missing values.
To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
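As a rough illustration of the cleaning and conversion steps, the following Python sketch groups item rows by invoice into transactions and then counts how often item pairs occur together. The (invoice, item) row layout, the example values, and the function names are assumptions; the original walkthrough uses R, and this sketch only counts pairs rather than running the full Apriori algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations


def to_transactions(rows):
    """Group (invoice, item) rows into one set of items per invoice.

    Rows with a missing invoice or item are dropped, mirroring the
    missing-value cleaning step. The row layout is hypothetical.
    """
    baskets = defaultdict(set)
    for invoice, item in rows:
        if invoice is None or item is None:
            continue
        baskets[invoice].add(item)
    return dict(baskets)


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support is at least min_support."""
    n = len(transactions)
    counts = Counter()
    for items in transactions.values():
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Made-up example rows, including two with missing values.
rows = [
    ("A001", "mouse"), ("A001", "mouse mat"),
    ("A002", "mouse"), ("A002", "mouse mat"), ("A002", "keyboard"),
    ("A003", "keyboard"),
    (None, "mouse"), ("A004", None),
]
transactions = to_transactions(rows)
pairs = frequent_pairs(transactions, min_support=0.5)
print(transactions)
print(pairs)
```

Using a set per invoice means an item bought several times on one invoice is counted once, which is the usual convention for market basket transactions.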
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The database contains information on international articles about mining tailings.
Data for (i) active mine sites and (ii) inactive mine sites are stored as Excel spreadsheets. Note that the number of active/inactive mines shown in the spreadsheets is less than that reported in Table S1, because proprietary data sources have not been included (i.e. MRDS, BRITPITS and S&P). Each spreadsheet lists mine names (column A), mine status, i.e. active or inactive (column B), the principal commodity mined (column C), and lat/long coordinates (columns D & E). Data for (iii) TSFs and (iv) TDFs are stored as zipped Shapefiles. Data should be uncompressed and then imported into any GIS programme that can read Shapefiles. Modelling was implemented procedurally in MATLAB v9.9.0 (R2020b) with the open source TopoToolbox MATLAB program for the analysis of digital elevation models (https://topotoolbox.wordpress.com). The modelling workflow is presented in SI Figure S8, with example code available in the WAPHA database (Macklin et al code.pdf). Citations to software sources are giv...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Indonesia Mining Production: Usage: End Stock: Nickel Ore data was reported at 5,968,339.000 Ton in 2015. This records an increase from the previous number of 974,456.000 Ton for 2014. Indonesia Mining Production: Usage: End Stock: Nickel Ore data is updated yearly, averaging 1,303,135.000 Ton from Dec 1998 (Median) to 2015, with 18 observations. The data reached an all-time high of 5,968,339.000 Ton in 2015 and a record low of 144,087.000 Ton in 2012. Indonesia Mining Production: Usage: End Stock: Nickel Ore data remains active status in CEIC and is reported by Central Bureau of Statistics. The data is categorized under Indonesia Premium Database’s Mining and Manufacturing Sector – Table ID.BAE004: Mining Production: Usage.