100+ datasets found

Data from: Results obtained in a data mining process applied to a database...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.20011798.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO: http://www.scielo.org/
Authors
E.M. Ruiz Lobaina; C. P. Romero Suárez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: The objective of this work is to improve the quality of the information in CubaCiencia, the database of the Institute of Scientific and Technological Information. This database holds bibliographic information on four segments of science and is the main database of the Library Management System. The methodology relied on data mining techniques for studying large volumes of information, such as decision trees, correlation matrices, and 3D scatter plots. The results not only made it possible to improve the information in the database but also revealed useful patterns for meeting the proposed objectives.
List of Top Authors of Advances in Data Mining and Database Management Book...
exaly.com
csv, json
Updated Nov 1, 2025
(2025). List of Top Authors of Advances in Data Mining and Database Management Book Series sorted by citations [Dataset]. https://exaly.com/journal/61621/advances-in-data-mining-and-database-management-book-series/top-authors
Available download formats: csv, json
Dataset updated
Nov 1, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
List of Top Authors of Advances in Data Mining and Database Management Book Series sorted by citations.
Data mining as a hatchery process evaluation tool
scielo.figshare.com
jpeg
Updated Jun 3, 2023
Daniela Regina Klein; Marcos Martinez do Vale; Mariana Fernandes Ribas da Silva; Micheli Faccin Kuhn; Tatiane Branco; Mauricio Portella dos Santos (2023). Data mining as a hatchery process evaluation tool [Dataset]. http://doi.org/10.6084/m9.figshare.10258280.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.10258280.v1
Dataset updated
Jun 3, 2023
Dataset provided by
SciELO: http://www.scielo.org/
Authors
Daniela Regina Klein; Marcos Martinez do Vale; Mariana Fernandes Ribas da Silva; Micheli Faccin Kuhn; Tatiane Branco; Mauricio Portella dos Santos
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT The hatchery is one of the most important segments of the poultry chain and generates an abundance of data which, when analyzed, make it possible to identify critical points of the process. The aim of this study was to evaluate the applicability of data mining to egg-incubation databases of broiler breeders and laying hen breeders. The study uses a database recording egg incubation from broiler breeders housed in pens with shavings as litter under natural mating, as well as from laying hen breeders housed in cages under an artificial insemination mating system. Data mining (DM) was applied to a classification task, using breeder type and housing system to define the classes. The database was analyzed in three different ways: original database, attribute selection, and expert analysis. Models were selected on the basis of model precision and class accuracy. Data mining made it possible to classify hatchery fertile eggs from different genetic groups, with hatching rate and percentage of fertile eggs being the attributes with the greatest classification power. Broiler breeders showed higher fertility (> 95 %) but also higher embryonic mortality between the third and seventh day post-hatching (> 0.5 %) than laying hen breeders’ eggs. In conclusion, data mining can be applied to the hatchery process, and attribute selection and strategies based on expert experience can improve model performance.
Open database on global coal and metal mine production
zenodo.org
data.niaid.nih.gov
zip
Updated Feb 14, 2023
Simon Jasansky; Mirko Lieber; Stefan Giljum; Victor Maus (2023). Open database on global coal and metal mine production [Dataset]. http://doi.org/10.5281/zenodo.6325109
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6325109
Dataset updated
Feb 14, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Simon Jasansky; Mirko Lieber; Stefan Giljum; Victor Maus
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set covers global extraction and production of coal and metal ores at the level of individual mines. It covers 1171 individual mines, reporting mine-level production for 80 different materials in the period 2000-2021. It also includes data on mine coordinates, ownership, mineral reserves, mining waste, transportation of mining products, and mineral processing capacities (smelters and mineral refineries) and production. The data were gathered manually from more than 1900 openly available sources, such as annual or sustainability reports of mining companies. All data points are linked to their respective sources. After manual screening and entry, the data were automatically cleaned, harmonized, and checked. Geoinformation was obtained either from coordinates available in company reports or by retrieving coordinates via the Google Maps API, followed by manual checking. For mines where no coordinates could be found, other geospatial attributes such as province, region, district, or municipality were recorded and linked to the GADM data set, available at www.gadm.org.

The data set consists of 12 tables. The table “facilities” contains descriptive and spatial information on mines and processing facilities and is available as a GeoPackage (GPKG) file. All other tables are available in comma-separated values (CSV) format. A schematic depiction of the database is provided in PNG format in the file database_model.png.
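As a rough sketch of working with this layout, the snippet below uses Python's standard `csv` module to load every CSV table in an unpacked copy of the archive into memory, keyed by file name. The folder name `mine_database` and the assumption that each CSV has a header row are hypothetical. The `facilities` GeoPackage needs a GIS-aware reader (for example geopandas) and is deliberately skipped here.

```python
import csv
from pathlib import Path


def load_csv_tables(folder):
    """Load every *.csv file in `folder` into a dict of row lists.

    Assumes (hypothetically) that each CSV has a header row. The table
    name is the file name without its extension. GeoPackage files such
    as the 'facilities' table are not CSV and are therefore skipped.
    """
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.stem] = list(csv.DictReader(handle))
    return tables


if __name__ == "__main__":
    folder = Path("mine_database")  # hypothetical location of the unpacked archive
    if folder.is_dir():
        for name, rows in load_csv_tables(folder).items():
            print(f"{name}: {len(rows)} rows")
```

Keeping each table as a list of dictionaries avoids third-party dependencies. For joins across the 12 tables, a data-frame library is the more practical choice.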
Data from: Development of the InTelligence And Machine LEarning (TAME)...
catalog.data.gov
Updated Oct 31, 2022
U.S. EPA Office of Research and Development (ORD) (2022). Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research [Dataset]. https://catalog.data.gov/dataset/development-of-the-intelligence-and-machine-learning-tame-toolkit-for-introductory-data-sc
Dataset updated
Oct 31, 2022
Dataset provided by
United States Environmental Protection Agency: http://www.epa.gov/
Description
The original contributions presented in the study are included in the article and online through the TAME Toolkit, available at: https://uncsrp.github.io/Data-Analysis-Training-Modules/, with underlying code and datasets available in the parent UNC-SRP GitHub website (https://github.com/UNCSRP). This dataset is associated with the following publication: Roell, K., L. Koval, R. Boyles, G. Patlewicz, C. Ring, C. Rider, C. Ward-Caviness, D. Reif, I. Jaspers, R. Fry, and J. Rager. Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research. Frontiers in Toxicology. Frontiers, Lausanne, SWITZERLAND, 4: 893924, (2022).
T10I4D1000K transactional database
data.mendeley.com
Updated Oct 23, 2019
Uday kiran RAGE (2019). T10I4D1000K transactional database [Dataset]. http://doi.org/10.17632/tykb96s325.1
Unique identifier
https://doi.org/10.17632/tykb96s325.1
Dataset updated
Oct 23, 2019
Authors
Uday kiran RAGE
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a synthetic database widely used for evaluating the scalability of pattern mining algorithms. It was generated using the IBM Quest synthetic data generator.
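In the usual IBM Quest naming convention, T10 means an average transaction length of 10 items, I4 an average size of 4 for the potentially frequent itemsets, and D1000K one million transactions. A typical first pass over such a file counts how many transactions contain each item. The minimal sketch below does this and keeps the items that meet a minimum support count. It assumes one transaction per line, with items separated by a single delimiter. The tab default is an assumption about the file layout and should be checked against the downloaded file.

```python
from collections import Counter


def frequent_items(lines, min_support, sep="\t"):
    """Return {item: count} for items occurring in >= min_support transactions.

    Assumes one transaction per line with items separated by `sep`
    (the tab default is an assumption about the file layout).
    """
    counts = Counter()
    for line in lines:
        # A set ensures each item is counted at most once per transaction.
        items = {tok for tok in line.strip().split(sep) if tok}
        counts.update(items)
    return {item: n for item, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    sample = ["1\t2\t3", "2\t3", "3\t4", "2\t3\t5"]
    print(frequent_items(sample, min_support=3))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly. The full million-transaction file is then streamed line by line rather than loaded into memory.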
Data from: A database of artisanal, small-scale, and large-scale mining in...
catalog.data.gov
data.usgs.gov
Updated Nov 20, 2025
U.S. Geological Survey (2025). A database of artisanal, small-scale, and large-scale mining in the Copperbelt region of the Democratic Republic of Congo and Zambia [Dataset]. https://catalog.data.gov/dataset/a-database-of-artisanal-small-scale-and-large-scale-mining-in-the-copperbelt-region-of-the
Dataset updated
Nov 20, 2025
Dataset provided by
U.S. Geological Survey
Area covered
Copperbelt Province, Zambia, Democratic Republic of the Congo
Description
Cobalt, designated a critical mineral by the European Union and the United States, is a crucial component of the lithium-ion batteries found in cell phones, electric vehicles, and personal computing devices. Over half of the world’s cobalt supply is produced in the Democratic Republic of the Congo (DRC), where cobalt is mined in both large-scale and artisanal or small-scale operations. This dataset focuses on Africa’s mineral-rich Copperbelt region, an area mined for both copper and cobalt, that extends south across the DRC boundary into neighboring Zambia. Existing geoscientific data and remote sensing analysis were investigated to build a comprehensive dataset describing cobalt mining extent and technique (large- or artisanal/small-scale). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Jo Daviess County Mining Database
hub.arcgis.com
Updated Aug 25, 2021
U.S. Fish & Wildlife Service (2021). Jo Daviess County Mining Database [Dataset]. https://hub.arcgis.com/maps/738451798b2c467eae73edfcf4abc4b9
Dataset updated
Aug 25, 2021
Dataset authored and provided by
U.S. Fish & Wildlife Service
Description
Please see the individual layers below to access the detailed metadata. This feature layer contains three datasets:

The Mining Boreholes dataset contains GIS points depicting mining boreholes digitized from the U.S. Bureau of Mines (USBM) Illinois Mineral Development Atlas (IMDA) for Jo Daviess County, Illinois. Each point includes a link to a corresponding log (if available).

The named mines dataset contains GIS polygons depicting surveyed outlines of known (named) mine diggings from the USBM Illinois Mineral Development Atlas (IMDA) for Jo Daviess County, Illinois.

The unnamed mines dataset contains GIS polygons depicting unsurveyed, inferred outlines of unknown (unnamed) mine diggings from the USBM Illinois Mineral Development Atlas (IMDA) for Jo Daviess County, Illinois.

Each of these is one of several datasets compiled for the Karst Feature Database of Jo Daviess County, IL, and hosted by the U.S. Fish and Wildlife Service.
Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated Oct 22, 2020
Triant, Deborah A.; Andorf, Carson M.; Gardiner, Jack M.; Unni, Deepak R.; Elsik, Christine G.; Nguyen, Hung N.; Le Tourneau, Justin J.; Tayal, Aditi; Walsh, Amy T.; Portwood, John L.; Cannon, Ethalinda K. S.; Shamimuzzaman, (2020). Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and Genomics Database.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000484626
Dataset updated
Oct 22, 2020
Authors
Triant, Deborah A.; Andorf, Carson M.; Gardiner, Jack M.; Unni, Deepak R.; Elsik, Christine G.; Nguyen, Hung N.; Le Tourneau, Justin J.; Tayal, Aditi; Walsh, Amy T.; Portwood, John L.; Cannon, Ethalinda K. S.; Shamimuzzaman,
Description
MaizeMine is the data mining resource of the Maize Genetics and Genomics Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross-references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine offers several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) terms and pathways, with a choice of statistical parameters and background gene sets. Because query outputs can be saved as lists that serve as input to new queries, MaizeMine offers broad possibilities for data integration and meta-analysis.
Africa - PowerMining Projects Database
energydata.info
cloud.csiss.gmu.edu
Updated Jul 23, 2024
(2024). Africa - PowerMining Projects Database [Dataset]. https://energydata.info/dataset/africa-powermining-projects-database-2014
Dataset updated
Jul 23, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
"The Africa Power–Mining Database 2014 shows ongoing and forthcoming mining projects in Africa, categorized by mineral type, ore grade, and project size. The database draws on basic mining data from Infomine surveys, the United States Geological Survey, annual reports, technical reports, feasibility studies, investor presentations, sustainability reports on property-owner websites or filed in public domains, and mining websites (Mining Weekly, Mining Journal, Mbendi, Mining-technology, and Miningmx). Comprising 455 projects in 28 Sub-Saharan African (SSA) countries, each with an ore reserve value assessed at more than $250 million, the database collates publicly available and proprietary information. It also provides a panoramic view of projects operating in 2000–12 and of anticipated demand in 2020. The analysis is presented over three timeframes: pre-2000, 2001–12, and 2020 (each containing the projects from the previous period except for those closing during that previous period)."
Data from: Locations of mines and mining activity in the contiguous United...
catalog.data.gov
Updated Nov 19, 2025
U.S. Geological Survey (2025). Locations of mines and mining activity in the contiguous United States 2013 [Dataset]. https://catalog.data.gov/dataset/locations-of-mines-and-mining-activity-in-the-contiguous-united-states-2013
Dataset updated
Nov 19, 2025
Dataset provided by
United States Geological Survey: http://www.usgs.gov/
Area covered
Contiguous United States, United States
Description
This dataset includes locations and associated information about mines and mining activity in the contiguous United States. The database was developed by combining publicly available national datasets of mineral mines, uranium mines, and minor and major coal mine activities. It was developed in 2013, but the temporal range of the mine data varies by source. Uranium mine information came from the TENORM Uranium Location Database produced by the US Environmental Protection Agency (U.S. EPA) in 2003. Major and minor coal mine information came from the USTRAT (Stratigraphic data related to coal) database 2012, and the mineral mine data came from the USGS Mineral Resource Program.
Data from: IchnoDB: structure and importance of an ichnology database
tandf.figshare.com
mdb
Updated Jun 5, 2023
Dean M. Meek; Bruce M. Eglington; Luis A. Buatois; M. Gabriela Mángano (2023). IchnoDB: structure and importance of an ichnology database [Dataset]. http://doi.org/10.6084/m9.figshare.12848993.v1
Available download formats: mdb
Unique identifier
https://doi.org/10.6084/m9.figshare.12848993.v1
Dataset updated
Jun 5, 2023
Dataset provided by
Taylor & Francis: https://taylorandfrancis.com/
Authors
Dean M. Meek; Bruce M. Eglington; Luis A. Buatois; M. Gabriela Mángano
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The design of a relational database for ichnological data is presented to illustrate and address deficiencies in present-day palaeontological databases. Currently, palaeontology databases apply concepts and terminology derived from the study of body fossils to trace fossil records. We suggest that fundamental differences between body and trace fossils make this practice inappropriate. These differences stem from the fact that trace fossils represent the behaviour of the tracemaker, not the phylogenetic affinities of an organism. The authors tested this database, referred to as IchnoDB, throughout the design process to ensure that the alterations to current palaeontology databases recommended herein are functional. In describing the design and logic that underpin an ichnology database, we hope to see established palaeontological databases incorporate ichnology-specific fields into their structure. This would support and encourage future research involving large ichnological datasets.
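To make the body-fossil versus trace-fossil distinction concrete, here is a deliberately simplified relational sketch using Python's built-in `sqlite3`. The table and column names are hypothetical and are not IchnoDB's actual schema. The point is that a trace-fossil occurrence links to an ichnotaxon and to an ethological (behavioural) category rather than to a biological taxonomic hierarchy, and the tracemaker is optional.

```python
import sqlite3

# Hypothetical schema for illustration only; not the published IchnoDB design.
SCHEMA = """
CREATE TABLE ethology (
    ethology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,  -- behavioural category, e.g. 'domichnia'
    description TEXT
);
CREATE TABLE ichnotaxon (
    ichnotaxon_id INTEGER PRIMARY KEY,
    ichnogenus    TEXT NOT NULL,
    ichnospecies  TEXT,
    ethology_id   INTEGER REFERENCES ethology(ethology_id)
);
CREATE TABLE occurrence (
    occurrence_id INTEGER PRIMARY KEY,
    ichnotaxon_id INTEGER NOT NULL REFERENCES ichnotaxon(ichnotaxon_id),
    formation     TEXT,
    tracemaker    TEXT  -- optional: the producing organism is often unknown
);
"""


def build_demo_db():
    """Create an in-memory database with the schema above and one record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    cur = conn.execute("INSERT INTO ethology (name, description) VALUES (?, ?)",
                       ("domichnia", "dwelling structures"))
    cur = conn.execute("INSERT INTO ichnotaxon (ichnogenus, ethology_id) VALUES (?, ?)",
                       ("Skolithos", cur.lastrowid))
    conn.execute("INSERT INTO occurrence (ichnotaxon_id, formation) VALUES (?, ?)",
                 (cur.lastrowid, "example formation"))  # placeholder value
    conn.commit()
    return conn


def list_occurrences(conn):
    """Return (ichnogenus, ethological category) for every occurrence."""
    query = """
        SELECT i.ichnogenus, e.name
        FROM occurrence AS o
        JOIN ichnotaxon AS i ON i.ichnotaxon_id = o.ichnotaxon_id
        JOIN ethology AS e ON e.ethology_id = i.ethology_id
        ORDER BY o.occurrence_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(list_occurrences(build_demo_db()))  # [('Skolithos', 'domichnia')]
```

Queries can then group or filter occurrences by behaviour without forcing a trace fossil into a phylogenetic classification.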
Replication Data for: Do expectations towards Thai hospitality differ? The...
data.mendeley.com
Updated Feb 21, 2023
RAKSMEY SANN (2023). Replication Data for: Do expectations towards Thai hospitality differ? The views of English vs Chinese speaking travelers [Dataset]. http://doi.org/10.17632/v75j8yhpgy.1
Unique identifier
https://doi.org/10.17632/v75j8yhpgy.1
Dataset updated
Feb 21, 2023
Authors
RAKSMEY SANN
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes replication data for the paper: Sann, R. and Lai, P.-C. (2021), “Do expectations towards Thai hospitality differ? The views of English vs Chinese speaking travelers”, International Journal of Culture, Tourism and Hospitality Research, Vol. 15 No. 1, pp. 43-58. https://doi.org/10.1108/IJCTHR-01-2020-0010.
A brief dataset highlighting online learning test scores of Bangladeshi...
data.mendeley.com
Updated Feb 6, 2024
Shabab Rahman (2024). A brief dataset highlighting online learning test scores of Bangladeshi high-school students [Dataset]. http://doi.org/10.17632/g88h8vz9kg.2
Unique identifier
https://doi.org/10.17632/g88h8vz9kg.2
Dataset updated
Feb 6, 2024
Authors
Shabab Rahman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh
Description
We collected the data through purposive sampling. We obtained information from two after-school coaching programs that voluntarily provided their online learning data to us in 2020, during the pandemic. The data were organized in batches of 45 and 75 students, which were then combined into a single dataset of 399 entries. Collection took place in two phases: on January 17, 2023, and on February 12, 2023. The data were initially recorded in Google Classroom, Google's learning management system, then exported to local storage by the classroom instructors and passed on to the researchers. Excel was used to organize the data, with rows representing individual students and columns representing different topics. The dataset, which covers four mock tests and sixteen physics topics, was gathered from grade 10 physics instructors and students. Every student was given a unique ID to protect their privacy, giving 399 distinct entries overall. The coaching institution standardized the scores to a scale of 100 for consistency. Note that the institutions did not collect or transmit data for students who missed most of the exams. Scores have a mean of 69.547 and a standard deviation of 20.5.
The Legal Cultures of the Subsoil Database
zenodo.org
data.niaid.nih.gov
pdf
Updated Aug 9, 2024
Ainhoa Montoya (2024). The Legal Cultures of the Subsoil Database [Dataset]. http://doi.org/10.14296/slwu8713
Available download formats: pdf
Unique identifier
https://doi.org/10.14296/slwu8713
Dataset updated
Aug 9, 2024
Dataset provided by
School of Advanced Study
Authors
Ainhoa Montoya
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2020
Description
The Legal Cultures of the Subsoil Database is an open-access digital and bilingual (English/Spanish) research resource which maps out relevant legal and legal-like actions employed by a range of actors who have sought to assert fundamental rights in the context of socio-environmental conflicts over industrial mining.

The database contains information on a selection of eight paradigmatic mining projects in Central America and Mexico: El Dorado (El Salvador), Cerro Blanco, Escobal and Marlin (Guatemala), San Martín and ASP & ASP2 (Honduras), La Libertad (Nicaragua), and Reducción Norte & Corazón de Tinieblas (Mexico).
Data from: Data Mining of the Nephrops Survey Database to Support the...
find.data.gov.scot
dtechtive.com
Updated Jan 7, 2020
Marine Scotland (2020). Data Mining of the Nephrops Survey Database to Support the Scottish MPA Project [Dataset]. https://find.data.gov.scot/datasets/19719
Dataset updated
Jan 7, 2020
Dataset provided by
Marine Directorate: https://www.gov.scot/about/how-government-is-run/directorates/marine-scotland/
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Area covered
Scotland
Description
Scottish Marine and Freshwater Science Volume 3 Number 9

Marine Scotland Science conducts annual underwater television surveys to estimate the abundance of Nephrops norvegicus on muddy sediments in seas around Scotland. Underwater footage is recorded to DVD and reviewed by two independent observers. Nephrops burrows are counted, and burrow densities over each survey tow are estimated from the average counts and the viewed area. Additional data are also collected during the surveys, including sediment samples and observations on sea pen abundance, the presence of fish and other benthic species, and evidence of anthropogenic activities (trawl marks). All survey data are held in a purpose-designed database, the 'Nephrops survey database'. In 2010, following discussions with Scottish Natural Heritage and the Joint Nature Conservation Committee, it was agreed that data within the Nephrops survey database would be used to assist the Scottish Marine Protected Area project, specifically the mapping of burrowed mud and offshore deep mud habitats (biotopes). This report documents the work carried out, including summaries for each area surveyed and maps based on Geographic Information System layers.
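The density estimate described above amounts to dividing the mean burrow count across observers by the seabed area viewed. A minimal sketch follows. It assumes the viewed area is in square metres, and the function name and example numbers are hypothetical. It also ignores any correction factors the actual survey analysis may apply.

```python
from statistics import mean


def burrow_density(observer_counts, viewed_area_m2):
    """Estimate burrows per square metre for one survey tow.

    observer_counts: burrow counts from the independent observers
    viewed_area_m2:  seabed area covered by the footage (assumed in m^2)
    """
    if viewed_area_m2 <= 0:
        raise ValueError("viewed area must be positive")
    return mean(observer_counts) / viewed_area_m2


if __name__ == "__main__":
    # Hypothetical tow: the two observers counted 42 and 46 burrows over 200 m^2.
    print(burrow_density([42, 46], 200.0))  # 0.22
```

Averaging the two observers' counts before dividing treats the observers symmetrically. A large gap between their counts is worth flagging before the average is used.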
Market Basket Analysis
kaggle.com
zip
Updated Dec 9, 2021
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Available download formats: zip (23875170 bytes)
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and offer customers itemset suggestions, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

Introduction

Association rules are most useful when you want to find associations between different objects in a set, that is, frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
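The analysis here is done in R with the arules package. The Python code below is only an illustration of the Apriori idea named in the title, not the package's implementation. It is a small, unoptimized sketch that grows candidate itemsets one item at a time and keeps only those whose support meets a minimum threshold. The example baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    transactions: iterable of item collections, one per basket
    min_support:  minimum fraction of baskets that must contain the itemset
    """
    baskets = [frozenset(t) for t in transactions]
    if not baskets:
        return {}
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for b in baskets for item in b}
    current = {s for s in singles if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


if __name__ == "__main__":
    # Hypothetical baskets, not taken from the retail dataset.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for itemset, s in sorted(apriori(baskets, 0.5).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), s)
```

With a 0.5 threshold, this keeps bread, milk, and butter plus the pairs {bread, milk} and {bread, butter}. {milk, butter} appears in only one of the four baskets, so it is dropped, and the prune step then rules out the three-item candidate without counting it.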

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mouse Mat, and 8 bought both. For the rule bought Computer Mouse => bought Mouse Mat:

- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mouse Mat) = 0.80/0.09 ≈ 8.89

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
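The metrics for the rule "bought computer mouse => bought mouse mat" can be checked from the raw counts in the example. Confidence divides the joint count by the antecedent's count (the 10 mouse buyers). Lift then divides confidence by the consequent's overall frequency (9 out of 100). The function name is illustrative.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total                     # P(antecedent & consequent)
    confidence = n_both / n_antecedent             # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)   # confidence / P(consequent)
    return support, confidence, lift


if __name__ == "__main__":
    # 100 customers: 10 bought a mouse, 9 bought a mouse mat, 8 bought both.
    s, c, l = rule_metrics(100, 10, 9, 8)
    print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
    # support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1, as here, means mouse buyers are far more likely than average to also buy a mouse mat.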

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data so that it is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of the rules

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: .xlsx

Number of rows: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we load the required libraries, each briefly described below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

Data Pre-processing

Next, we read Assignment-1_Data.xlsx into R so that we can see the data.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

Next, we clean the data frame by removing missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together on one invoice will be in ...
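The grouping step described here can be sketched in Python with the standard library. The original workflow does this in R, so this is only an illustration. Only the BillNo and Itemname column names come from the dataset description above; the rows themselves are hypothetical. Each invoice becomes one basket containing the distinct items on it, and rows with a missing or blank item name are dropped.

```python
from collections import defaultdict


def rows_to_baskets(rows):
    """Group line items into one basket (a set of item names) per invoice.

    rows: iterable of dicts with at least 'BillNo' and 'Itemname' keys,
          one dict per purchased line item.
    """
    baskets = defaultdict(set)
    for row in rows:
        item = (row.get("Itemname") or "").strip()
        if item:  # skip rows whose item name is missing or blank
            baskets[row["BillNo"]].add(item)
    return dict(baskets)


if __name__ == "__main__":
    # Hypothetical rows; real data would come from Assignment-1_Data.xlsx.
    rows = [
        {"BillNo": "100001", "Itemname": "MUG"},
        {"BillNo": "100001", "Itemname": "TEA TOWEL"},
        {"BillNo": "100002", "Itemname": "CANDLE"},
        {"BillNo": "100002", "Itemname": None},
    ]
    for bill, items in rows_to_baskets(rows).items():
        print(bill, sorted(items))
```

Using a set per invoice means a product bought in several quantities or on several lines of the same bill counts once. This matches how support is defined for association rules.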
Database of International Research about Mine Tailings
zenodo.org
Updated Feb 9, 2025
Ojeda-Pereira; Campos-Medina (2025). Database of International Research about Mine Tailings [Dataset]. http://doi.org/10.5281/zenodo.8106170
Unique identifier
https://doi.org/10.5281/zenodo.8106170
Dataset updated
Feb 9, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Ojeda-Pereira; Campos-Medina
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The database contains information on international articles about mine tailings.
Water and Planetary Health Analytics (WAPHA) global metal mines database
datadryad.org
search.dataone.org
zip
Updated Sep 7, 2023
Karen Hudson-Edwards; John Owen; Deanna Kemp; Paolo Scussolini; Alex Lechner; Mark Macklin; Paul Brewer; Christopher Thomas; John Lewin; Dirk Eilander; Graham Bird; KR Mangalaa; Amogh Mudbhatkal (2023). Water and Planetary Health Analytics (WAPHA) global metal mines database [Dataset]. http://doi.org/10.5061/dryad.j3tx95xmg
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.j3tx95xmg
Dataset updated
Sep 7, 2023
Dataset provided by
Dryad
Authors
Karen Hudson-Edwards; John Owen; Deanna Kemp; Paolo Scussolini; Alex Lechner; Mark Macklin; Paul Brewer; Christopher Thomas; John Lewin; Dirk Eilander; Graham Bird; KR Mangalaa; Amogh Mudbhatkal
Time period covered
Jul 25, 2023
Description
Data for (i) active mine sites and (ii) inactive mine sites are stored as Excel spreadsheets. NB: the number of active/inactive mines shown in the spreadsheets is less than that reported in Table S1, because proprietary data sources (MRDS, BRITPITS and S&P) have not been included. Each spreadsheet lists mine names (column A), mine status, i.e. active or inactive (column B), the principal commodity mined (column C), and lat/long coordinates (columns D and E). Data for (iii) TSFs and (iv) TDFs are stored as zipped Shapefiles; these should be uncompressed and then imported into any GIS programme that can read Shapefiles. Modelling was implemented procedurally in MATLAB v9.9.0 (R2020b) with the open-source TopoToolbox MATLAB program for the analysis of digital elevation models (https://topotoolbox.wordpress.com). The modelling workflow is presented in SI Figure S8, with example code available in the WAPHA database (Macklin et al code.pdf). Citations to software sources are giv...
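The column layout described above (A: name, B: status, C: principal commodity, D and E: latitude and longitude) can be read with a short script. This is a minimal sketch assuming the spreadsheet has first been exported to CSV with a single header row; the sample rows and mine names are hypothetical placeholders, not entries from the WAPHA database.

```python
import csv
import io
from collections import Counter


def load_mines(text: str):
    """Parse CSV rows laid out as columns A-E: name, status, commodity, lat, long."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row (assumed to be present in the export)
    mines = []
    for name, status, commodity, lat, lon in reader:
        mines.append({
            "name": name,
            "status": status.strip().lower(),  # normalise "Active"/"active"
            "commodity": commodity,
            "lat": float(lat),
            "lon": float(lon),
        })
    return mines


# Hypothetical CSV export; real data would come from the WAPHA spreadsheets.
sample = """name,status,commodity,lat,long
Mine A,Active,Copper,-23.5,133.2
Mine B,Inactive,Gold,51.1,-1.3
Mine C,active,Copper,-12.0,-77.0
"""
mines = load_mines(sample)
active = [m for m in mines if m["status"] == "active"]
print(Counter(m["commodity"] for m in active))
```

Comparing the normalised status for equality, rather than searching for the substring "active", keeps inactive sites from being counted as active.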
Indonesia Mining Production: Usage: End Stock: Nickel Ore
ceicdata.com
Cite
CEICdata.com, Indonesia Mining Production: Usage: End Stock: Nickel Ore [Dataset]. https://www.ceicdata.com/en/indonesia/mining-production-usage/mining-production-usage-end-stock-nickel-ore
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 2004 - Dec 1, 2015
Area covered
Indonesia
Variables measured
Industrial Production
Description
Indonesia Mining Production: Usage: End Stock: Nickel Ore data was reported at 5,968,339.000 Ton in 2015. This records an increase from the previous figure of 974,456.000 Ton for 2014. Indonesia Mining Production: Usage: End Stock: Nickel Ore data is updated yearly, averaging 1,303,135.000 Ton (median) from Dec 1998 to 2015, with 18 observations. The data reached an all-time high of 5,968,339.000 Ton in 2015 and a record low of 144,087.000 Ton in 2012. The series remains active in CEIC and is reported by the Central Bureau of Statistics. The data is categorized under Indonesia Premium Database's Mining and Manufacturing Sector, Table ID.BAE004: Mining Production: Usage.
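To put the 2014 to 2015 jump in the description into proportion, the arithmetic below uses only the two figures quoted there. It is a simple illustrative check, not part of CEIC's own reporting.

```python
# Year-end nickel ore stock figures quoted in the description (tons).
end_2014 = 974_456
end_2015 = 5_968_339

change = end_2015 - end_2014       # absolute increase in tons
pct = 100 * change / end_2014      # increase relative to the 2014 level
ratio = end_2015 / end_2014        # 2015 stock as a multiple of 2014

print(f"{change:,} t increase ({pct:.1f}%), {ratio:.2f}x the 2014 level")
```

The stock rose by 4,993,883 tons, roughly a 512% increase, leaving the 2015 figure at a little over six times the 2014 level.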


Data from: Results obtained in a data mining process applied to a database...

List of Top Authors of Advances in Data Mining and Database Management Book...

Data mining as a hatchery process evaluation tool

Open database on global coal and metal mine production

Data from: Development of the InTelligence And Machine LEarning (TAME)...

T10I4D1000K transactional database

Data from: A database of artisanal, small-scale, and large-scale mining in...

Jo Daviess County Mining Database

Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and...

Africa - PowerMining Projects Database

Data from: Locations of mines and mining activity in the contiguous United...

Data from: IchnoDB: structure and importance of an ichnology database

Replication Data for: Do expectations towards Thai hospitality differ? The...

A brief dataset highlighting online learning test scores of Bangladeshi...

The Legal Cultures of the Subsoil Database

Data from: Data Mining of the Nephrops Survey Database to Support the...

Market Basket Analysis

Introduction

An Example of Association Rules

Strategy

Dataset Description

Libraries in R

Data Pre-processing

Database of International Research about Mine Tailings

Water and Planetary Health Analytics (WAPHA) global metal mines database

Indonesia Mining Production: Usage: End Stock: Nickel Ore

Data from: Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science.