3 datasets found
  1. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: . xlsx
    • Number of Row: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    imagehttps://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

    Libraries in R

    First, we need to load required libraries. Shortly I describe all libraries.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    imagehttps://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

    Data Pre-processing

    Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

    imagehttps://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> imagehttps://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

    After we will clear our data frame, will remove missing values.

    imagehttps://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

    To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...

  2. Geoscape Geocoded National Address File (G-NAF)

    • data.gov.au
    • researchdata.edu.au
    • +1more
    pdf, zip
    Updated May 19, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department of Industry, Science and Resources (DISR) (2025). Geoscape Geocoded National Address File (G-NAF) [Dataset]. https://data.gov.au/data/dataset/geocoded-national-address-file-g-naf
    Explore at:
    pdf, zip(1689613051), zip(1685801192), pdf(398940)Available download formats
    Dataset updated
    May 19, 2025
    Dataset provided by
    Department of Industry and Sciencehttp://www.industry.gov.au/
    Authors
    Department of Industry, Science and Resources (DISR)
    Description

    Geoscape G-NAF is the geocoded address database for Australian businesses and governments. It’s the trusted source of geocoded address data for Australia with over 50 million contributed addresses distilled into 15.4 million G-NAF addresses. It is built and maintained by Geoscape Australia using independently examined and validated government data.

    From 22 August 2022, Geoscape Australia is making G-NAF available in an additional simplified table format. G-NAF Core makes accessing geocoded addresses easier by utilising less technical effort.

    G-NAF Core will be updated on a quarterly basis along with G-NAF.

    Further information about contributors to G-NAF is available here.

    With more than 15 million Australian physical address record, G-NAF is one of the most ubiquitous and powerful spatial datasets. The records include geocodes, which are latitude and longitude map coordinates. G-NAF does not contain personal information or details relating to individuals.

    Updated versions of G-NAF are published on a quarterly basis. Previous versions are available here

    Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.

    Changes in the May 2025 release

    • Nationally, the May 2025 update of G-NAF shows an overall increase of 47,194 addresses (0.30%). The total number of addresses in G-NAF now stands at 15,753,927 of which 14,909,770 or 94.64% are principal.

    • At some locations, there are unit-numbered addresses that appear to be duplicate addresses. Geoscape is working to identify these locations and include these addresses as separate addresses in G-NAF. To facilitate this process, some secondary addresses have had the word RETAIL added to their building names. In the first instance, this process is being progressively rolled out to identified locations, but it is expected that the requirement for this will become ongoing.

    • There is one new locality in G-NAF: Keswick Island, QLD.

    • The source data used for generating G-NAF STREET_LOCALITY_POINT data in New South Wales has an updated datum and changed from GDA94 to GDA2020. This has resulted in updates to the STREET_LOCALITY_POINT geometry for approximately 91,000 records, however, more than 95% of these have moved less than a metre.

    • Geoscape has moved product descriptions, guides and reports online to https://docs.geoscape.com.au.

    Further information on G-NAF, including FAQs on the data, is available here or through Geoscape Australia’s network of partners. They provide a range of commercial products based on G-NAF, including software solutions, consultancy and support.

    Additional information: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.

    License Information

    Use of the G-NAF downloaded from data.gov.au is subject to the End User Licence Agreement (EULA)

    The EULA terms are based on the Creative Commons Attribution 4.0 International license (CC BY 4.0). However, an important restriction relating to the use of the open G-NAF for the sending of mail has been added.

    The open G-NAF data must not be used for the generation of an address or the compilation of an address for the sending of mail unless the user has verified that each address to be used for the sending of mail is capable of receiving mail by reference to a secondary source of information. Further information on this use restriction is available here.

    End users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).

    Users must also note the following attribution requirements:

    Preferred attribution for the Licensed Material:

    _G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the _Open Geo-coded National Address File (G-NAF) End User Licence Agreement.

    Preferred attribution for Adapted Material:

    Incorporates or developed using G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.

    What to Expect When You Download G-NAF

    G-NAF is a complex and large dataset (approximately 5GB unpacked), consisting of multiple tables that will need to be joined prior to use. The dataset is primarily designed for application developers and large-scale spatial integration. Users are advised to read the technical documentation, including product change notices and the individual product descriptions before downloading and using the product. A quick reference guide on unpacking the G-NAF is also available.

  3. g

    Coronavirus (Covid-19) Data in the United States

    • github.com
    • openicpsr.org
    • +2more
    csv
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
    Explore at:
    csvAvailable download formats
    Dataset provided by
    New York Times
    License

    https://github.com/nytimes/covid-19-data/blob/master/LICENSEhttps://github.com/nytimes/covid-19-data/blob/master/LICENSE

    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

  4. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Organization logo

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Aslan Ahmedov
Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Strategy

  • Data Import
  • Data Understanding and Exploration
  • Transformation of the data – so that is ready to be consumed by the association rules algorithm
  • Running association rules
  • Exploring the rules generated
  • Filtering the generated rules
  • Visualization of Rule

Dataset Description

  • File name: Assignment-1_Data
  • List name: retaildata
  • File format: . xlsx
  • Number of Row: 522065
  • Number of Attributes: 7

    • BillNo: 6-digit number assigned to each transaction. Nominal.
    • Itemname: Product name. Nominal.
    • Quantity: The quantities of each product per transaction. Numeric.
    • Date: The day and time when each transaction was generated. Numeric.
    • Price: Product price. Numeric.
    • CustomerID: 5-digit number assigned to each customer. Nominal.
    • Country: Name of the country where each customer resides. Nominal.

imagehttps://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

Libraries in R

First, we need to load required libraries. Shortly I describe all libraries.

  • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
  • arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
  • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
  • readxl - Read Excel Files in R.
  • plyr - Tools for Splitting, Applying and Combining Data.
  • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  • knitr - Dynamic Report generation in R.
  • magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
  • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
  • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

imagehttps://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

Data Pre-processing

Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

imagehttps://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> imagehttps://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

After we will clear our data frame, will remove missing values.

imagehttps://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...

Search
Clear search
Close search
Google apps
Main menu