100+ datasets found
  1. Data from: Results obtained in a data mining process applied to a database...

    • scielo.figshare.com
    jpeg
    Updated Jun 4, 2023
    Cite
    E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
    Explore at:
    jpegAvailable download formats
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELOhttp://www.scielo.org/
    Authors
    E.M. Ruiz Lobaina; C. P. Romero Suárez
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract The objective of this work is to improve the quality of the information in the CubaCiencia database of the Institute of Scientific and Technological Information. This database holds bibliographic information on four segments of science and is the main database of the Library Management System. The methodology applied was based on decision trees, the correlation matrix, the 3D scatter plot, and other techniques used in data mining to study large volumes of information. The results not only made it possible to improve the information in the database, but also provided truly useful patterns for meeting the proposed objectives.

  2. List of Top Authors of Advances in Data Mining and Database Management Book...

    • exaly.com
    csv, json
    Updated Nov 1, 2025
    Cite
    (2025). List of Top Authors of Advances in Data Mining and Database Management Book Series sorted by citations [Dataset]. https://exaly.com/journal/61621/advances-in-data-mining-and-database-management-book-series/top-authors
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Nov 1, 2025
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    List of Top Authors of Advances in Data Mining and Database Management Book Series sorted by citations.

  3. Data mining as a hatchery process evaluation tool

    • scielo.figshare.com
    jpeg
    Updated Jun 3, 2023
    Cite
    Daniela Regina Klein; Marcos Martinez do Vale; Mariana Fernandes Ribas da Silva; Micheli Faccin Kuhn; Tatiane Branco; Mauricio Portella dos Santos (2023). Data mining as a hatchery process evaluation tool [Dataset]. http://doi.org/10.6084/m9.figshare.10258280.v1
    Explore at:
    jpegAvailable download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    SciELOhttp://www.scielo.org/
    Authors
    Daniela Regina Klein; Marcos Martinez do Vale; Mariana Fernandes Ribas da Silva; Micheli Faccin Kuhn; Tatiane Branco; Mauricio Portella dos Santos
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT The hatchery is one of the most important segments of the poultry chain and generates an abundance of data which, when analyzed, allows critical points of the process to be identified. The aim of this study was to evaluate the applicability of the data mining technique to databases of egg incubation of broiler breeders and laying hen breeders. The study uses a database recording egg incubation from broiler breeders housed in pens with shavings used for litter in natural mating, as well as laying hen breeders housed in cages using an artificial insemination mating system. The data mining (DM) technique was applied to analyses in a classification task, using the type of breeder and housing system to delineate classes. The database was analyzed in three different ways: original database, attribute selection, and expert analysis. Models were selected on the basis of model precision and class accuracy. The data mining technique allowed for the classification of hatchery fertile eggs from different genetic groups, as well as hatching rates and the percentage of fertile eggs (the attributes with the greatest classification power). Broiler breeders showed higher fertility (> 95 %), but higher embryonic mortality between the third and seventh day post-hatching (> 0.5 %) when compared to laying hen breeders’ eggs. In conclusion, when applying data mining to the hatchery process, attribute selection and strategies based on the experience of experts can improve model performance.

  4. Open database on global coal and metal mine production

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Feb 14, 2023
    + more versions
    Cite
    Simon Jasansky; Mirko Lieber; Stefan Giljum; Victor Maus (2023). Open database on global coal and metal mine production [Dataset]. http://doi.org/10.5281/zenodo.6325109
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 14, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Simon Jasansky; Mirko Lieber; Stefan Giljum; Victor Maus
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set covers global extraction and production of coal and metal ores at the individual mine level. It covers 1171 individual mines, reporting mine-level production for 80 different materials in the period 2000-2021. It also includes data on mining coordinates, ownership, mineral reserves, mining waste, and transportation of mining products, as well as mineral processing capacities (smelters and mineral refineries) and production. The data was gathered manually from more than 1900 openly available sources, such as annual or sustainability reports of mining companies. All datapoints are linked to their respective sources. After manual screening and entry of the data, automatic cleaning, harmonization and data checking was conducted. Geoinformation was obtained either from coordinates available in company reports, or by retrieving the coordinates via Google Maps API and subsequent manual checking. For mines where no coordinates could be found, other geospatial attributes such as province, region, district or municipality were recorded, and linked to the GADM data set, available at www.gadm.org.

    The data set consists of 12 tables. The table “facilities” contains descriptive and spatial information on mines and processing facilities, and is available as a GeoPackage (GPKG) file. All other tables are available in comma-separated values (CSV) format. A schematic depiction of the database is provided in PNG format in the file database_model.png.

  5. Data from: Development of the InTelligence And Machine LEarning (TAME)...

    • catalog.data.gov
    Updated Oct 31, 2022
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research [Dataset]. https://catalog.data.gov/dataset/development-of-the-intelligence-and-machine-learning-tame-toolkit-for-introductory-data-sc
    Explore at:
    Dataset updated
    Oct 31, 2022
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    The original contributions presented in the study are included in the article and online through the TAME Toolkit, available at: https://uncsrp.github.io/Data-Analysis-Training-Modules/, with underlying code and datasets available in the parent UNC-SRP GitHub website (https://github.com/UNCSRP). This dataset is associated with the following publication: Roell, K., L. Koval, R. Boyles, G. Patlewicz, C. Ring, C. Rider, C. Ward-Caviness, D. Reif, I. Jaspers, R. Fry, and J. Rager. Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research. Frontiers in Toxicology. Frontiers, Lausanne, SWITZERLAND, 4: 893924, (2022).

  6. T10I4D1000K transactional database

    • data.mendeley.com
    Updated Oct 23, 2019
    + more versions
    Cite
    Uday kiran RAGE (2019). T10I4D1000K transactional database [Dataset]. http://doi.org/10.17632/tykb96s325.1
    Explore at:
    Dataset updated
    Oct 23, 2019
    Authors
    Uday kiran RAGE
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a synthetic database widely used for evaluating the scalability of pattern mining algorithms. The database was generated using the IBM Data Quest generator.
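
    To make concrete what a pattern mining benchmark of this kind exercises, here is a minimal Python sketch that counts frequent itemsets in a handful of transactions. It is a brute-force illustration, not a scalable miner and not the method of any particular tool. The input layout (one transaction per line, space-separated item IDs) and the sample lines are assumptions for illustration, not taken from the dataset, and `frequent_itemsets` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=2):
    """Count every itemset up to max_size and keep those seen at least min_count times.

    Brute-force enumeration: fine for a toy example, far too slow for a
    million-transaction benchmark, which is why scalable algorithms exist.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore repeated items within a transaction
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical sample in the assumed layout: one transaction per line.
lines = ["1 2 3", "1 2", "2 3", "1 2 4"]
transactions = [line.split() for line in lines]

result = frequent_itemsets(transactions, min_count=3)
for itemset, n in sorted(result.items()):
    print(itemset, n)
```

    Raising `max_size` makes the number of candidate itemsets grow combinatorially, which is the cost that scalability benchmarks on large synthetic databases are meant to expose.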

  7. Data from: A database of artisanal, small-scale, and large-scale mining in...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 20, 2025
    Cite
    U.S. Geological Survey (2025). A database of artisanal, small-scale, and large-scale mining in the Copperbelt region of the Democratic Republic of Congo and Zambia [Dataset]. https://catalog.data.gov/dataset/a-database-of-artisanal-small-scale-and-large-scale-mining-in-the-copperbelt-region-of-the
    Explore at:
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Copperbelt Province, Zambia, Democratic Republic of the Congo
    Description

    Cobalt, designated a critical mineral by the European Union and the United States, is a crucial component of the lithium-ion batteries found in cell phones, electric vehicles, and personal computing devices. Over half of the world’s cobalt supply is produced in the Democratic Republic of the Congo (DRC), where cobalt is mined in both large-scale and artisanal or small-scale operations. This dataset focuses on Africa’s mineral-rich Copperbelt region, an area mined for both copper and cobalt, that extends south across the DRC boundary into neighboring Zambia. Existing geoscientific data and remote sensing analysis were investigated to build a comprehensive dataset describing cobalt mining extent and technique (large- or artisanal/small-scale). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

  8. Jo Daviess County Mining Database

    • hub.arcgis.com
    Updated Aug 25, 2021
    Cite
    U.S. Fish & Wildlife Service (2021). Jo Daviess County Mining Database [Dataset]. https://hub.arcgis.com/maps/738451798b2c467eae73edfcf4abc4b9
    Explore at:
    Dataset updated
    Aug 25, 2021
    Dataset authored and provided by
    U.S. Fish & Wildlife Service
    Area covered
    Description

    Please see the individual layers below to access the detailed metadata. This feature layer contains three datasets:

    The Mining Boreholes dataset contains GIS points depicting mining boreholes digitized from the U.S. Bureau of Mines (USBM) Illinois Mineral Development Atlas (IMDA) for Jo Daviess County, Illinois. Each point includes a link to a corresponding log (if available). This is one of several datasets compiled for the Karst Feature Database of Jo Daviess County, IL and hosted by the U.S. Fish and Wildlife Service.

    The named mines dataset contains GIS polygons depicting surveyed outlines of known (named) mine diggings from the U.S. Bureau of Mines (USBM) Illinois Mineral Development Atlas (IMDA) for Jo Daviess County, Illinois. This is one of several datasets compiled for the Karst Feature Database of Jo Daviess County, IL and hosted by the U.S. Fish and Wildlife Service.

    The unnamed mines dataset contains GIS polygons depicting unsurveyed inferred outlines of unknown (unnamed) mine diggings from the U.S. Bureau of Mines (USBM) Illinois Mineral Development Atlas (IMDA) for Jo Daviess County, Illinois. This is one of several datasets compiled for the Karst Feature Database of Jo Daviess County, IL and hosted by the U.S. Fish and Wildlife Service.

  9. Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Oct 22, 2020
    + more versions
    Cite
    Triant, Deborah A.; Andorf, Carson M.; Gardiner, Jack M.; Unni, Deepak R.; Elsik, Christine G.; Nguyen, Hung N.; Le Tourneau, Justin J.; Tayal, Aditi; Walsh, Amy T.; Portwood, John L.; Cannon, Ethalinda K. S.; Shamimuzzaman, (2020). Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and Genomics Database.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000484626
    Explore at:
    Dataset updated
    Oct 22, 2020
    Authors
    Triant, Deborah A.; Andorf, Carson M.; Gardiner, Jack M.; Unni, Deepak R.; Elsik, Christine G.; Nguyen, Hung N.; Le Tourneau, Justin J.; Tayal, Aditi; Walsh, Amy T.; Portwood, John L.; Cannon, Ethalinda K. S.; Shamimuzzaman,
    Description

    MaizeMine is the data mining resource of the Maize Genetics and Genome Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates, and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. With the ability to save query outputs as lists that can be input to new queries, MaizeMine provides limitless possibilities for data integration and meta-analysis.

  10. Africa - PowerMining Projects Database

    • energydata.info
    • cloud.csiss.gmu.edu
    Updated Jul 23, 2024
    + more versions
    Cite
    (2024). Africa - PowerMining Projects Database [Dataset]. https://energydata.info/dataset/africa-powermining-projects-database-2014
    Explore at:
    Dataset updated
    Jul 23, 2024
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    "The Africa Power–Mining Database 2014 shows ongoing and forthcoming mining projects in Africa categorized by the type of mineral, ore grade, size of the project. The database draws on basic mining data from Infomine surveys, the United States Geological Survey, annual reports, technical reports, feasibility studies, investor presentations, sustainability reports on property-owner websites or filed in public domains, and mining websites (Mining Weekly, Mining Journal, Mbendi, Mining-technology, and Miningmx). Comprising 455 projects in 28 SSA countries with each project’s ore reserve value assessed at more than $250 million, the database collates publicly available and proprietary information. It also provides a panoramic view of projects operating in 2000–12 and anticipated demand in 2020. The analysis is presented over three timeframes: pre-2000, 2001–12, and 2020 (each containing the projects from the previous period except for those closing during that previous period)."

  11. Data from: Locations of mines and mining activity in the contiguous United...

    • catalog.data.gov
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Locations of mines and mining activity in the contiguous United States 2013 [Dataset]. https://catalog.data.gov/dataset/locations-of-mines-and-mining-activity-in-the-contiguous-united-states-2013
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Area covered
    Contiguous United States, United States
    Description

    This dataset includes locations and associated information about mines and mining activity in the contiguous United States. The database was developed by combining publicly available national datasets of mineral mines, uranium mines, and minor and major coal mine activities. The database was developed in 2013, but the temporal range of the mine data varies depending on the source. Uranium mine information came from the TENORM Uranium Location Database produced by the US Environmental Protection Agency (U.S. EPA) in 2003. Major and minor coal mine information came from the USTRAT (Stratigraphic data related to coal) database 2012, and the mineral mine data came from the USGS Mineral Resource Program.

  12. Data from: IchnoDB: structure and importance of an ichnology database

    • tandf.figshare.com
    mdb
    Updated Jun 5, 2023
    Cite
    Dean M. Meek; Bruce M. Eglington; Luis A. Buatois; M. Gabriela Mángano (2023). IchnoDB: structure and importance of an ichnology database [Dataset]. http://doi.org/10.6084/m9.figshare.12848993.v1
    Explore at:
    mdbAvailable download formats
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Taylor & Francishttps://taylorandfrancis.com/
    Authors
    Dean M. Meek; Bruce M. Eglington; Luis A. Buatois; M. Gabriela Mángano
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The design of a relational database for ichnological data is presented to illustrate and address deficiencies in present-day palaeontological databases. Currently, palaeontology databases apply concepts and terminology derived from the study of body fossils to trace fossil records. We suggest that fundamental differences between body and trace fossils make this practice inappropriate. These differences stem from the fact that trace fossils represent the behaviour of the tracemaker, and not the phylogenetic affinities of an organism. This database, referred to as IchnoDB, has been tested by the authors throughout the design process to ensure that recommended alterations to current palaeontology databases made herein are functional. In describing the design and logic that underpins an ichnology database, it is our desire to see established palaeontological databases incorporate ichnology specific fields into their structure. This would support and encourage future research, involving the use of large ichnological datasets.

  13. Replication Data for: Do expectations towards Thai hospitality differ? The...

    • data.mendeley.com
    Updated Feb 21, 2023
    Cite
    RAKSMEY SANN (2023). Replication Data for: Do expectations towards Thai hospitality differ? The views of English vs Chinese speaking travelers [Dataset]. http://doi.org/10.17632/v75j8yhpgy.1
    Explore at:
    Dataset updated
    Feb 21, 2023
    Authors
    RAKSMEY SANN
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes replication data for the paper: " Sann, R. and Lai, P.-C. (2021), "Do expectations towards Thai hospitality differ? The views of English vs Chinese speaking travelers", International Journal of Culture, Tourism and Hospitality Research, Vol. 15 No. 1, pp. 43-58. https://doi.org/10.1108/IJCTHR-01-2020-0010".

  14. A brief dataset highlighting online learning test scores of Bangladeshi...

    • data.mendeley.com
    Updated Feb 6, 2024
    + more versions
    Cite
    Shabab Rahman (2024). A brief dataset highlighting online learning test scores of Bangladeshi high-school students [Dataset]. http://doi.org/10.17632/g88h8vz9kg.2
    Explore at:
    Dataset updated
    Feb 6, 2024
    Authors
    Shabab Rahman
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    Purposive sampling was the method we chose to collect the data. We obtained information from two after-school coaching programs that voluntarily provided their online learning data to us in 2020 during the pandemic. Batches of 45 and 75 students each were used to organize the data, which were then combined to create a single dataset with 399 entries. Two phases of collection took place: on January 17, 2023, and on February 12, 2023. The initial data recording was done using Google Learning Management System's Google Classroom. The data was then exported to local storage by the classroom faculties and then passed onto the researchers. Excel was used to organize the data, with rows representing individual students and columns representing different topics. The dataset, which consists of four mock tests and sixteen physics topics, was gathered from grade 10 physics instructors and students. Every pupil was given a unique ID to protect their privacy, resulting in 399 distinct entries overall. The coaching institution standardized the dataset to score it out of 100 for consistency. It is important to note that for students who did not take the majority of the exams, the institutions did not gather or transmit missing data. The dataset displays a spread with a standard deviation of 20.5 and an average score of 69.547.

  15. The Legal Cultures of the Subsoil Database

    • zenodo.org
    • data.niaid.nih.gov
    pdf
    Updated Aug 9, 2024
    Cite
    Ainhoa Montoya (2024). The Legal Cultures of the Subsoil Database [Dataset]. http://doi.org/10.14296/slwu8713
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Aug 9, 2024
    Dataset provided by
    School of Advanced Study
    Authors
    Ainhoa Montoya
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2020
    Description

    The Legal Cultures of the Subsoil Database is an open-access digital and bilingual (English/Spanish) research resource which maps out relevant legal and legal-like actions employed by a range of actors who have sought to assert fundamental rights in the context of socio-environmental conflicts over industrial mining.

    The database contains information on a selection of eight paradigmatic mining projects in Central America and Mexico: El Dorado (El Salvador), Cerro Blanco, Escobal and Marlin (Guatemala), San Martín and ASP & ASP2 (Honduras), La Libertad (Nicaragua), and Reducción Norte & Corazón de Tinieblas (Mexico).

  16. Data from: Data Mining of the Nephrops Survey Database to Support the...

    • find.data.gov.scot
    • dtechtive.com
    Updated Jan 7, 2020
    Cite
    Marine Scotland (2020). Data Mining of the Nephrops Survey Database to Support the Scottish MPA Project [Dataset]. https://find.data.gov.scot/datasets/19719
    Explore at:
    Dataset updated
    Jan 7, 2020
    Dataset provided by
    Marine Directoratehttps://www.gov.scot/about/how-government-is-run/directorates/marine-scotland/
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Scotland
    Description

    Scottish Marine and Freshwater Science Volume 3 Number 9. Marine Scotland Science conducts annual underwater television surveys to estimate the abundance of Nephrops norvegicus on muddy sediments in seas around Scotland. Underwater footage is recorded to DVD and reviewed by two independent observers. Nephrops burrows are counted and burrow densities over each survey tow are estimated from the average counts and viewed area. Additional data are also collected during the surveys, including sediment samples and observations on sea pen abundance, presence of fish and other benthic species and evidence of anthropogenic activities (trawl marks). All survey data are held in a purpose-designed database, the 'Nephrops survey database'. In 2010, following discussions with Scottish Natural Heritage and the Joint Nature Conservation Committee, it was agreed that data within the Nephrops survey database would be used to assist with the Scottish Marine Protected Area project, specifically the mapping of burrowed mud and offshore deep mud habitats (biotopes). This report documents work carried out, including summaries for each area surveyed and maps based on Geographic Information System layers.

  17. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    zip(23875170 bytes)Available download formats
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to offer customers itemset suggestions, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rule mining is most often used to find associations between different objects in a set, that is, to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":

    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(mouse) = 0.08/0.10 = 0.80
    • lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
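
    The arithmetic for the mouse and mat rule can be checked with a short Python sketch. It uses the standard definitions for a rule A => B: support = P(A and B), confidence = P(A and B) / P(A), and lift = confidence / P(B). The function name `rule_metrics` is a hypothetical helper introduced here for illustration. It is not part of the original notebook, which works in R.

```python
def rule_metrics(n_total, n_a, n_b, n_both):
    """Return (support, confidence, lift) for the rule A => B from raw counts."""
    support = n_both / n_total           # P(A and B)
    confidence = n_both / n_a            # P(B | A) = P(A and B) / P(A)
    lift = confidence / (n_b / n_total)  # confidence relative to the baseline P(B)
    return support, confidence, lift


# 100 customers: 10 bought a mouse (A), 9 bought a mat (B), 8 bought both.
support, confidence, lift = rule_metrics(n_total=100, n_a=10, n_b=9, n_both=8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    A lift well above 1 indicates that the two items are bought together far more often than independent purchases would predict.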

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    After that, we clean the data frame by removing missing values.
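The original code for this step was shown only as an image, so the following is a minimal sketch of one common way to drop rows with missing values in base R, not the author's own code. The data frame `retail` and its `Invoice` and `Item` columns are hypothetical stand-ins; `Quantity` and `CustomerID` follow the attribute list above.

```r
# Hypothetical miniature of the retail data; the real file has more rows and columns.
retail <- data.frame(
  Invoice    = c("A1", "A1", "A2", "A3"),        # assumed column name
  Item       = c("lantern", NA, "mug", "candle"), # assumed column name
  Quantity   = c(6, 8, 6, 32),
  CustomerID = c(17850, 17850, NA, 13047),
  stringsAsFactors = FALSE
)

# complete.cases() returns TRUE for rows that contain no NA in any column.
retail_clean <- retail[complete.cases(retail), ]

# Report how many rows were dropped.
cat("Rows before:", nrow(retail), "after:", nrow(retail_clean), "\n")
```

Filtering on every column is the strictest choice. If only some columns matter for the analysis, `complete.cases()` can be applied to that subset of columns instead.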

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
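The description is truncated here, so the following is only a sketch of the general idea under stated assumptions: grouping item rows by invoice so that each invoice becomes one basket. The data frame `purchases` and its `Invoice` and `Item` columns are hypothetical. The final step uses the 'arules' coercion from a list to the "transactions" class and runs only if that package is installed.

```r
# Hypothetical long-format purchases: one row per item per invoice.
purchases <- data.frame(
  Invoice = c("A1", "A1", "A1", "A2", "A2", "A3"),
  Item    = c("bread", "milk", "bread", "milk", "eggs", "bread"),
  stringsAsFactors = FALSE
)

# split() groups the items by invoice; unique() drops repeated items,
# because a basket records whether an item was bought, not how many times.
baskets <- lapply(split(purchases$Item, purchases$Invoice), unique)
str(baskets)

# Optional: coerce the list of baskets to an arules "transactions" object.
if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))
  trans <- as(baskets, "transactions")
  print(trans)
}
```

Building the baskets in memory avoids writing an intermediate file. For data that is already stored as a basket-format file, `read.transactions()` from 'arules' is another route.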

  18. Database of International Research about Mine Tailings

    • zenodo.org
    Updated Feb 9, 2025
    Cite
    Ojeda-Pereira; Ojeda-Pereira; Campos-Medina; Campos-Medina (2025). Database of International Research about Mine Tailings [Dataset]. http://doi.org/10.5281/zenodo.8106170
    Dataset updated
    Feb 9, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ojeda-Pereira; Ojeda-Pereira; Campos-Medina; Campos-Medina
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The database contains information on international articles about mine tailings.

  19. Water and Planetary Health Analytics (WAPHA) global metal mines database

    • datadryad.org
    • search.dataone.org
    • +1 more
    zip
    Updated Sep 7, 2023
    Cite
    Karen Hudson-Edwards; John Owen; Deanna Kemp; Paolo Scussolini; Alex Lechner; Mark Macklin; Paul Brewer; Christopher Thomas; John Lewin; Dirk Eilander; Graham Bird; KR Mangalaa; Amogh Mudbhatkal (2023). Water and Planetary Health Analytics (WAPHA) global metal mines database [Dataset]. http://doi.org/10.5061/dryad.j3tx95xmg
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 7, 2023
    Dataset provided by
    Dryad
    Authors
    Karen Hudson-Edwards; John Owen; Deanna Kemp; Paolo Scussolini; Alex Lechner; Mark Macklin; Paul Brewer; Christopher Thomas; John Lewin; Dirk Eilander; Graham Bird; KR Mangalaa; Amogh Mudbhatkal
    Time period covered
    Jul 25, 2023
    Description

    Data for (i) active mine sites and (ii) inactive mine sites are stored as Excel spreadsheets. NB: the number of active/inactive mines shown in the spreadsheets is less than that reported in Table S1, because proprietary data sources (i.e. MRDS, BRITPITS and S&P) have not been included. Each spreadsheet lists mine names (column A), mine status, i.e. active or inactive (column B), the principal commodity mined (column C), and lat/long coordinates (columns D & E). Data for (iii) TSFs and (iv) TDFs are stored as zipped Shapefiles. Data should be uncompressed and then imported into any GIS programme that can read Shapefiles. Modelling was implemented procedurally in MATLAB v9.9.0 (R2020b) with the open source TopoToolbox MATLAB program for the analysis of digital elevation models (https://topotoolbox.wordpress.com). The modelling workflow is presented in SI Figure S8, with example code available in the WAPHA database (Macklin et al code.pdf). Citations to software sources are giv...

  20. Indonesia Mining Production: Usage: End Stock: Nickel Ore

    • ceicdata.com
    + more versions
    Cite
    CEICdata.com, Indonesia Mining Production: Usage: End Stock: Nickel Ore [Dataset]. https://www.ceicdata.com/en/indonesia/mining-production-usage/mining-production-usage-end-stock-nickel-ore
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2004 - Dec 1, 2015
    Area covered
    Indonesia
    Variables measured
    Industrial Production
    Description

    Indonesia Mining Production: Usage: End Stock: Nickel Ore data was reported at 5,968,339.000 Ton in 2015. This records an increase from the previous figure of 974,456.000 Ton for 2014. Indonesia Mining Production: Usage: End Stock: Nickel Ore data is updated yearly, averaging 1,303,135.000 Ton from Dec 1998 (Median) to 2015, with 18 observations. The data reached an all-time high of 5,968,339.000 Ton in 2015 and a record low of 144,087.000 Ton in 2012. Indonesia Mining Production: Usage: End Stock: Nickel Ore data remains in active status in CEIC and is reported by the Central Bureau of Statistics. The data is categorized under Indonesia Premium Database’s Mining and Manufacturing Sector – Table ID.BAE004: Mining Production: Usage.
