82 datasets found
  1. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to offer customers itemset suggestions, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rule mining is most often used to find associations between different objects in a set, that is, to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":

    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(mouse) = 0.08/0.10 = 0.80
    • lift = confidence / P(mat) = 0.80/0.09 ≈ 8.89

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
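
    As a quick check of the arithmetic, the three measures can be computed directly from the counts in the example. This is a minimal sketch in plain Python rather than the R workflow described later; the function name rule_metrics is hypothetical, and it assumes the standard definitions confidence = support / P(antecedent) and lift = confidence / P(consequent).

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total              # P(antecedent & consequent)
    p_antecedent = n_antecedent / n_total   # P(antecedent)
    p_consequent = n_consequent / n_total   # P(consequent)
    confidence = support / p_antecedent     # P(consequent | antecedent)
    lift = confidence / p_consequent        # > 1 means the items co-occur more than by chance
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought the mouse,
# 9 bought the mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    A lift well above 1, as here, suggests the two items are bought together far more often than independent purchases would predict.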

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each one is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel files in R.
    • plyr - Tools for splitting, applying and combining data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    Next, we clean the data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
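
    The conversion step can be illustrated with a small sketch. This is plain Python, not the R code used in the original notebook: it groups line items by BillNo so that each invoice becomes one basket of item names, and skips rows with a missing bill number or item name. The function name to_transactions and the sample rows are hypothetical; only the column names BillNo and Itemname come from the dataset description.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group line items into one set of item names per invoice (BillNo)."""
    baskets = defaultdict(set)
    for row in rows:
        bill, item = row.get("BillNo"), row.get("Itemname")
        if bill is None or not item:
            continue  # drop rows with missing values
        baskets[bill].add(item)
    return dict(baskets)


# Made-up sample rows for illustration only (not taken from the dataset).
rows = [
    {"BillNo": 536365, "Itemname": "MUG"},
    {"BillNo": 536365, "Itemname": "TEAPOT"},
    {"BillNo": 536366, "Itemname": "MUG"},
    {"BillNo": 536366, "Itemname": None},   # missing item name, skipped
]
transactions = to_transactions(rows)
print(transactions)
```

    Each resulting set plays the role of one transaction that an association rule algorithm such as Apriori would consume.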

  2. Data from: Mining significant crisp-fuzzy spatial association rules

    • tandf.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Wenzhong Shi; Anshu Zhang; Geoffrey I. Webb (2023). Mining significant crisp-fuzzy spatial association rules [Dataset]. http://doi.org/10.6084/m9.figshare.5873139.v1
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Wenzhong Shi; Anshu Zhang; Geoffrey I. Webb
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spatial association rule mining (SARM) is an important data mining task for understanding implicit and sophisticated interactions in spatial data. The usefulness of SARM results, represented as sets of rules, depends on their reliability: the abundance of rules, control over the risk of spurious rules, and accuracy of rule interestingness measure (RIM) values. This study presents crisp-fuzzy SARM, a novel SARM method that can enhance the reliability of resultant rules. The method firstly prunes dubious rules using statistically sound tests and crisp supports for the patterns involved, and then evaluates RIMs of accepted rules using fuzzy supports. For the RIM evaluation stage, the study also proposes a Gaussian-curve-based fuzzy data discretization model for SARM with improved design for spatial semantics. The proposed techniques were evaluated by both synthetic and real-world data. The synthetic data was generated with predesigned rules and RIM values, thus the reliability of SARM results could be confidently and quantitatively evaluated. The proposed techniques showed high efficacy in enhancing the reliability of SARM results in all three aspects. The abundance of resultant rules was improved by 50% or more compared with using conventional fuzzy SARM. Minimal risk of spurious rules was guaranteed by statistically sound tests. The probability that the entire result contained any spurious rules was below 1%. The RIM values also avoided large positive errors committed by crisp SARM, which typically exceeded 50% for representative RIMs. The real-world case study on New York City points of interest reconfirms the improved reliability of crisp-fuzzy SARM results, and demonstrates that such improvement is critical for practical spatial data analytics and decision support.

  3. PATData.xlsx

    • figshare.com
    xlsx
    Updated Oct 21, 2024
    Cite
    Steven Smith (2024). PATData.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.27229614.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Steven Smith
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Two sets of data (one coded by participants and one by problems) are included for Experiment 1, and two for Experiment 2. For data coded by participant, each row represents a different participant, and shows that person's counterbalancing condition, and their correct (1) and incorrect (0) responses for each of the 60 problems (20 problems in each of 3 sets). Problems, shown in the top 3 rows, depending on the counterbalancing condition, are shown in their sequential presentation order, one problem per column. Also shown are that participant's subjective reports (NONE, SOME, or MOST) for the 3 sets. Problem Data for Experiment 1 shows each problem in a single row with its mean solution rate at 7-sec, 15-sec, and 30-sec times, and a mean solution rate across all 3 times. Experiment 2 Participant Data shows each participant as a single row, with Solution Rate and Aha rates for all problems, for CRA problems, for PAT problems, and the mean numbers of problems that were both solved and reported as Aha for CRA problems and for PAT problems. Experiment 2 Problem Data shows each problem as a row, and columns include each problem's average solution rate, average aha rate, mean proportion of solutions reported as aha, as well as the average forward association strength from test words to solution words, and average backward association strength from solution words to test words (from the South Florida association norms).

  4. MeshSLAM: Robust Localization and Large-Scale Mapping in Barren Terrain,...

    • cloud.csiss.gmu.edu
    • data.wu.ac.at
    html
    Updated Jan 29, 2020
    Cite
    United States (2020). MeshSLAM: Robust Localization and Large-Scale Mapping in Barren Terrain, Phase II [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/meshslam-robust-localization-and-large-scale-mapping-in-barren-terrain-phase-ii-1b24d
    Explore at:
    htmlAvailable download formats
    Dataset updated
    Jan 29, 2020
    Dataset provided by
    United States
    Description

    Robots need to know their location to map their surroundings, but without global positioning data they need a map to identify their surroundings and estimate their location. Simultaneous localization and mapping (SLAM) solves these dual problems at once. SLAM does not depend on any kind of infrastructure and is thus a promising localization technology for NASA planetary missions and for many terrestrial applications as well. However, state-of-the-art SLAM depends on easily recognizable landmarks in the robot's environment, which are lacking on barren planetary surfaces. Our work will develop a technology we call MeshSLAM, which constructs robust landmarks from associations of weak features extracted from terrain. Our test results will also show that MeshSLAM applies to all environments in which NASA's rovers could someday operate: dunes, rocky plains, overhangs, cliff faces, and underground structures such as lava tubes. Another limitation of SLAM for planetary missions is its significant data-association problems. As a robot travels it must infer its motion from the sensor data it collects, which invariably suffers from drift due to random error. To correct drift, SLAM must recognize when the robot has returned to a previously visited place, which requires searching over a great deal of previously sensed data. Computation on such a large amount of memory may be infeasible on space-relevant hardware. MeshSLAM eases these requirements. It employs topology-based map segmentation, which limits the scope of a search. Furthermore, a faster, multi-resolution search is performed over the topological graph of observations. Mesh Robotics LLC and Carnegie Mellon University have formed a partnership to commercially develop MeshSLAM. MeshSLAM technology will be available via open source, to ease its adoption by NASA. In Phase 1 of our project we will show the feasibility of MeshSLAM for NASA and commercial applications through a series of focused technical demonstrations.

  5. ParaMAWPS Dataset

    • paperswithcode.com
    • library.toponeai.link
    + more versions
    Cite
    Syed Rifat Raiyan; Md. Nafis Faiyaz; Shah Md. Jawad Kabir; Mohsinul Kabir; Hasan Mahmud; Md Kamrul Hasan, ParaMAWPS Dataset [Dataset]. https://paperswithcode.com/dataset/paramawps
    Explore at:
    Authors
    Syed Rifat Raiyan; Md. Nafis Faiyaz; Shah Md. Jawad Kabir; Mohsinul Kabir; Hasan Mahmud; Md Kamrul Hasan
    Description

    This repository contains the code, data, and models of the paper titled "Math Word Problem Solving by Generating Linguistic Variants of Problem Statements" published in the Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop).

    The work is outlined in a more detailed and expository manner in our Bachelor of Science (B.Sc.) thesis titled "Variational Mathematical Reasoning: Enhancing Math Word Problem Solvers with Linguistic Variants and Disentangled Attention" which can be accessed from the Islamic University of Technology (IUT) Institutional Repository.

    License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

    Dataset

    In order to download our dataset PᴀʀᴀMAWPS, please navigate to the ParaMAWPS folder. We use an $80:10:10$ train-validation-test split for our PᴀʀᴀMAWPS dataset. The splits are available in .json format in the aforementioned folder.
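
    To make the 80:10:10 ratio concrete, here is a minimal sketch of how such a split could be produced. It only illustrates the ratio; the authors' actual splits are the .json files in the ParaMAWPS folder and should be used for any comparison. The function name split_80_10_10, the shuffling, and the seed are assumptions, not the authors' procedure.

```python
import random


def split_80_10_10(samples, seed=0):
    """Shuffle samples and cut them into train/validation/test at 80:10:10."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # seeded so the split is reproducible
    n_train = len(items) * 8 // 10       # 80% for training
    n_val = len(items) // 10             # 10% for validation
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # the remainder (about 10%) for testing
    return train, val, test


train, val, test = split_80_10_10(range(1000))
print(len(train), len(val), len(test))
```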

    Data Format

    Each row consists of a Math Word Problem (MWP). The list below describes what each column signifies.

    • id: The unique identification number of the sample. Seed problems have id size of $\leq 4$, whereas variant problems have id size of $> 4$. The last variant of a seed problem (generally with the id "$16000i$", where $i$ is the id of the seed problem) is the inverse variant of the seed problem.
    • original_text: The problem statement of the MWP. The seed problems have the same problem statement as present in the Mᴀᴡᴘs dataset.
    • equation: The equation with a variable (x) which solves the MWP.
    • quantity_tagged_text: The problem statement of the MWP, where each quantity is replaced with a unique tag ([Q_i]).
    • quantity_tagged_equation: The equation with a variable (x) which solves the MWP, but each quantity is replaced with its unique tag ([Q_i]) in the problem statement.
    • have_constant: Whether the use of a constant value is required to solve the MWP.

    For an MWP sample (i) with have_constant label (C_i), the boolean label is,
    $C_i =\begin{cases} \text{FALSE}, & \text{if $i$ requires $0$ constant values}\\ \text{TRUE}, & \text{if $i$ requires $\geq 1$ constant values}\end{cases}$
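
    The two rules above (seed versus variant by id size, and the have_constant label) can be sketched as small predicates. This is an illustration only: both function names are hypothetical, and "id size" is assumed here to mean the number of characters in the id, which the description does not state explicitly.

```python
def is_seed_problem(sample_id):
    """Seed problems have an id of size <= 4; variants have size > 4.

    Assumption: "size" means the number of characters in the id.
    """
    return len(str(sample_id)) <= 4


def have_constant(num_constants_required):
    """C_i is FALSE when no constant values are needed, TRUE when >= 1 are."""
    return num_constants_required >= 1


print(is_seed_problem("123"), is_seed_problem("160001"))
print(have_constant(0), have_constant(2))
```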

    Types of Variations

    Image: https://github.com/Starscream-11813/Variational-Mathematical-Reasoning/raw/main/images/ACLMWP_variationtypes.png

    Dataset Statistics

    Image: https://github.com/Starscream-11813/Variational-Mathematical-Reasoning/raw/main/images/ACLMWP_datasetcomparisontable.png
    Image: https://github.com/Starscream-11813/Variational-Mathematical-Reasoning/raw/main/images/ACLMWP_datasetcomparisongraph.png

    Methodology

    Image: https://github.com/Starscream-11813/Variational-Mathematical-Reasoning/raw/main/images/ACLMWP_architecture2.png

    Results

    To reproduce the results, please refer to the documentation of MWPToolkit created by Yihuai Lan et al.

    Image: https://github.com/Starscream-11813/Variational-Mathematical-Reasoning/raw/main/images/ACLMWP_results.png
    Image: https://github.com/Starscream-11813/Variational-Mathematical-Reasoning/raw/main/images/ACLMWP_ablation.png

    Citation

    If you find this work useful, please cite our paper:

    @inproceedings{raiyan-etal-2023-math,
        title = "Math Word Problem Solving by Generating Linguistic Variants of Problem Statements",
        author = "Raiyan, Syed Rifat and Faiyaz, Md Nafis and Kabir, Shah Md. Jawad and Kabir, Mohsinul and Mahmud, Hasan and Hasan, Md Kamrul",
        booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)",
        month = jul,
        year = "2023",
        address = "Toronto, Canada",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2023.acl-srw.49",
        doi = "10.18653/v1/2023.acl-srw.49",
        pages = "362--378",
        abstract = "The art of mathematical reasoning stands as a fundamental pillar of intellectual progress and is a central catalyst in cultivating human ingenuity. Researchers have recently published a plethora of works centered around the task of solving Math Word Problems (MWP) {---} a crucial stride towards general AI. These existing models are susceptible to dependency on shallow heuristics and spurious correlations to derive the solution expressions. In order to ameliorate this issue, in this paper, we propose a framework for MWP solvers based on the generation of linguistic variants of the problem text. The approach involves solving each of the variant problems and electing the predicted expression with the majority of the votes. We use DeBERTa (Decoding-enhanced BERT with disentangled attention) as the encoder to leverage its rich textual representations and enhanced mask decoder to construct the solution expressions. Furthermore, we introduce a challenging dataset, ParaMAWPS, consisting of paraphrased, adversarial, and inverse variants of selectively sampled MWPs from the benchmark Mawps dataset. We extensively experiment on this dataset along with other benchmark datasets using some baseline MWP solver models. We show that training on linguistic variants of problem statements and voting on candidate predictions improve the mathematical reasoning and robustness of the model. We make our code and data publicly available.",
    }

    You can also cite our thesis:

    @phdthesis{raiyan2023variational,
        type={Bachelor's Thesis},
        title={Variational Mathematical Reasoning: Enhancing Math Word Problem Solvers with Linguistic Variants and Disentangled Attention},
        author={Raiyan, Syed Rifat and Faiyaz, Md Nafis and Kabir, Shah Md Jawad},
        year={2023},
        school={Department of Computer Science and Engineering (CSE), Islamic University of Technology},
        address={Board Bazar, Gazipur-1704, Dhaka, Bangladesh},
        note={Available at \url{http://103.82.172.44:8080/xmlui/handle/123456789/2092}}
    }

  6. Gene Function Prediction from Functional Association Networks Using Kernel...

    • plos.figshare.com
    tiff
    Updated May 30, 2023
    Cite
    Sonja Lehtinen; Jon Lees; Jürg Bähler; John Shawe-Taylor; Christine Orengo (2023). Gene Function Prediction from Functional Association Networks Using Kernel Partial Least Squares Regression [Dataset]. http://doi.org/10.1371/journal.pone.0134668
    Explore at:
    tiffAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sonja Lehtinen; Jon Lees; Jürg Bähler; John Shawe-Taylor; Christine Orengo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the growing availability of large-scale biological datasets, automated methods of extracting functionally meaningful information from this data are becoming increasingly important. Data relating to functional association between genes or proteins, such as co-expression or functional association, is often represented in terms of gene or protein networks. Several methods of predicting gene function from these networks have been proposed. However, evaluating the relative performance of these algorithms may not be trivial: concerns have been raised over biases in different benchmarking methods and datasets, particularly relating to non-independence of functional association data and test data. In this paper we propose a new network-based gene function prediction algorithm using a commute-time kernel and partial least squares regression (Compass). We compare Compass to GeneMANIA, a leading network-based prediction algorithm, using a number of different benchmarks, and find that Compass outperforms GeneMANIA on these benchmarks. We also explicitly explore problems associated with the non-independence of functional association data and test data. We find that a benchmark based on the Gene Ontology database, which, directly or indirectly, incorporates information from other databases, may considerably overestimate the performance of algorithms exploiting functional association data for prediction.

  7. Data from: Data and Code from: Association studies of salinity tolerance in...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data and Code from: Association studies of salinity tolerance in sunflower provide robust breeding and selection strategies under climate change [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-association-studies-of-salinity-tolerance-in-sunflower-provide-robust-b
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Servicehttps://www.ars.usda.gov/
    Description

    Abstract Phytotoxic soil salinity is a global problem, and in the northern Great Plains and western Canada, salt accumulates on the surface of marine sediment soils with high water tables under annual crop cover, particularly near wetlands. Crop production can overcome saline-affected soils using crop species and cultivars with salinity tolerance along with changes in management practices. This research seeks to improve our understanding of sunflower (Helianthus annuus) genetic tolerance to high salinity soils. Genome-wide association was conducted using the Sunflower Association Mapping panel grown for two years in naturally occurring saline soils (2016 and 2017, near Indian Head, Saskatchewan, Canada), and six phenotypes were measured: days to bloom, height, leaf area, leaf mass, oil percentage, and yield. Plot level soil salinity was determined by grid sampling of soil followed by kriging. Three estimates of sunflower performance were calculated: 1) under low soil salinity ( 4 dS/m), and 3) plasticity (regression coefficient between phenotype and soil salinity). Fourteen loci were significant, with one instance of co-localization between a leaf area and a leaf mass locus. Some genomic regions identified as significant in this study were also significant in a recent greenhouse salinity experiment using the same panel. Also, some candidate genes underlying significant QTL have been identified in other plant species as having a role in salinity response. This research identifies alleles for cultivar improvement and for genetic studies to further elucidate salinity tolerance pathways. Contents This link to GitHub contains the data and analysis scripts used in this research, including R analysis scripts, and data analyzed.

  8. Data from: Optimum design of family structure and allocation of resources in...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    bin, txt
    Updated May 28, 2022
    Cite
    Wenxin Liu; Hans Peter Maurer; Jochen C. Reif; Albrecht E. Melchinger; H. F. Utz; Nicolas Ranc; Giovanni Della Porta; Matthew R. Tucker; Tobias Würschum (2022). Data from: Optimum design of family structure and allocation of resources in association mapping with lines from multiple crosses [Dataset]. http://doi.org/10.5061/dryad.m6079
    Explore at:
    bin, txtAvailable download formats
    Dataset updated
    May 28, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Wenxin Liu; Hans Peter Maurer; Jochen C. Reif; Albrecht E. Melchinger; H. F. Utz; Nicolas Ranc; Giovanni Della Porta; Matthew R. Tucker; Tobias Würschum
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Family mapping is based on multiple segregating families and is becoming increasingly popular due to advantages over population mapping. Though much progress has been made recently, the optimum design and allocation of resources for family mapping remains unclear. Here, we addressed these issues using a simulation study, resample model averaging and cross-validation approaches. Our results show that in family mapping, the predictive power and the accuracy of QTL detection depend greatly on the population size and phenotyping intensity. With small population sizes or few test environments, QTL results become unreliable and are hampered by a large bias in the estimation of the proportion of genotypic variance explained by the detected QTL. In addition, we observed that even though quality results can be achieved with low marker densities, no plateau is reached with our full marker complement. This suggests that higher quality results could be achieved with greater marker densities or sequence data, which will be available in the near future for many species.

  9. Data from: Neighborhood Association

    • data-roseville.opendata.arcgis.com
    • hub.arcgis.com
    Updated Jul 11, 2016
    Cite
    CityofRoseville (2016). Neighborhood Association [Dataset]. https://data-roseville.opendata.arcgis.com/items/c056077a6f714f688ee7b635389c0822
    Explore at:
    Dataset updated
    Jul 11, 2016
    Dataset authored and provided by
    CityofRoseville
    Area covered
    Description

    Identifies existing Neighborhood Association boundaries. In 1993, the City of Roseville Police Department began the formation of Neighborhood Associations. Volunteers participating in their Neighborhood Association work to improve their neighborhoods and maintain a high quality of life. Citizens and staff work together on a variety of projects such as crime prevention, park development, resolution of development related issues, neighborhood team building and much more. Neighborhood Association Representatives make up the Roseville Coalition of Neighborhood Associations (RCONA).

  10. South Korea SME: Sales: Associations & Orgs, Repair & Other Personal...

    • ceicdata.com
    Updated Feb 15, 2025
    + more versions
    Cite
    CEICdata.com (2025). South Korea SME: Sales: Associations & Orgs, Repair & Other Personal Services [Dataset]. https://www.ceicdata.com/en/korea/small-and-medium-enterprise-sales/sme-sales-associations--orgs-repair--other-personal-services
    Explore at:
    Dataset updated
    Feb 15, 2025
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2015 - Dec 1, 2020
    Area covered
    South Korea
    Variables measured
    Domestic Trade
    Description

    South Korea SME: Sales: Associations & Orgs, Repair & Other Personal Services data was reported at 329,791.000 KRW hm in 2019. This records an increase from the previous number of 315,567.000 KRW hm for 2018. South Korea SME: Sales: Associations & Orgs, Repair & Other Personal Services data is updated yearly, averaging 329,791.000 KRW hm from Dec 2015 (Median) to 2019, with 5 observations. The data reached an all-time high of 343,820.000 KRW hm in 2015 and a record low of 297,980.000 KRW hm in 2017. South Korea SME: Sales: Associations & Orgs, Repair & Other Personal Services data remains active status in CEIC and is reported by Korea Federation of SMEs. The data is categorized under Global Database’s South Korea – Table KR.H035: 2015-2019 Small and Medium Enterprise Sales.

  11. Data from: Neighborhood Association

    • geoportal-lawrenceks.hub.arcgis.com
    Updated Aug 14, 2024
    Cite
    City of Lawrence, Kansas (2024). Neighborhood Association [Dataset]. https://geoportal-lawrenceks.hub.arcgis.com/maps/lawrenceks::neighborhood-association
    Explore at:
    Dataset updated
    Aug 14, 2024
    Dataset authored and provided by
    City of Lawrence, Kansas
    Area covered
    Description

    Neighborhood association boundaries in and around the City of Lawrence, Kansas. Neighborhood associations are volunteer groups of residents who work together to discuss common issues and promote an area's overall health and well being.

  12. tracking_analysis: processed per trial tracking data first three sessions

    • rdr.ucl.ac.uk
    7z
    Updated May 6, 2025
    + more versions
    Cite
    Francesca Greenstreet (2025). tracking_analysis: processed per trial tracking data first three sessions [Dataset]. http://doi.org/10.5522/04/28778065.v1
    Explore at:
    7zAvailable download formats
    Dataset updated
    May 6, 2025
    Dataset provided by
    University College London
    Authors
    Francesca Greenstreet
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dopaminergic action prediction errors serve as a value-free teaching signal

    Animals’ choice behavior is characterized by two main tendencies: taking actions that led to rewards and repeating past actions. Theory suggests these strategies may be reinforced by different types of dopaminergic teaching signals: reward prediction error to reinforce value-based associations and movement-based action prediction errors to reinforce value-free repetitive associations. Here we use an auditory-discrimination task in mice to show that movement-related dopamine activity in the tail of the striatum encodes the hypothesized action prediction error signal. Causal manipulations reveal that this prediction error serves as a value-free teaching signal that supports learning by reinforcing repeated associations. Computational modelling and experiments demonstrate that action prediction errors alone cannot support reward-guided learning but when paired with the reward prediction error circuitry they serve to consolidate stable sound-action associations in a value-free manner. Together we show that there are two types of dopaminergic prediction errors that work in tandem to support learning, each reinforcing different types of association in different striatal areas.

    This is processed tracking data for the first three sessions for tail and VS dopamine recordings. Used for ED figure 5 k, l, m, n.

  13. Data for: Representation of rewards differing in their hedonic valence in...

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jun 19, 2023
    Cite
    Laura V. Cuaya; Raúl Hernández-Pérez; Attila Andics; Rita Báji; Márta Gácsi; Marion Guilloux; Alice Roche; Laurence Callejon; Ádám Miklósi; Dorottya Júlia Ujfalussy (2023). Data for: Representation of rewards differing in their hedonic valence in the caudate nucleus correlates with the performance in a problem-solving task in dogs (Canis familiaris) [Dataset]. http://doi.org/10.5281/zenodo.7626042
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Laura V. Cuaya; Raúl Hernández-Pérez; Attila Andics; Rita Báji; Márta Gácsi; Marion Guilloux; Alice Roche; Laurence Callejon; Ádám Miklósi; Dorottya Júlia Ujfalussy
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Abstract

    We have investigated dogs’ (Canis familiaris) abilities in associating different sounds with food rewards of different incentive value. The establishment of the association was tested in a problem-solving behavioural paradigm, as well as in an fMRI study on the same subjects (N=20). The aim was to show behavioural, as well as parallel neural effects of the association formation between the two sounds and two different associated food rewards.
    The latency of solving the problem was considered as an indicator of motivational state. In our behaviour study we found that dogs were quicker in solving a problem upon hearing the sound associated with food higher in reward value, suggesting that they have successfully associated the sounds with the corresponding food value. In the fMRI study, the cerebral response to the two sounds was compared both before and after the associative training. Two bilateral regions of interest were explored: the caudate nucleus and the amygdala. After the associative training the response in the caudate nucleus was higher to the sound related to a higher reward value food than to the sound related to a lower reward value food, which difference was not present before the associative training. We found an increase in the amygdala response to both sounds after the training. In a whole-brain representational similarity analysis, we found that cerebral patterns in the caudate nucleus to the two sounds were different only after the training. Moreover, we found a positive correlation between the dissimilarity index in the caudate nucleus for activation responses to the two sounds and the difference in latencies to solve the behavioural task: the quicker the dog solved the behavioural task the greater the difference in the neural representation of the two sounds was. In summary, family dogs’ brain activation patterns reflected their expectations based on what they learned about the relationship between two sounds and their associated rewards.

    This dataset contains

    • Raw data (four functional runs n = 20)
    • Dog brain template
    • ROI results (caudate nucleus and amygdala percentage of BOLD signal change, and caudate nucleus response to sounds during the two post-training runs, with or without the inter-scan interval as a covariate in the GLM model; n = 20)
    • RSA results (dissimilarity change between sounds (post-training > pre-training) and dissimilarity index per participant; n = 20)

  14. National Directory of Associations — Vienna | gimi9.com

    • gimi9.com
    Updated Dec 17, 2024
    Cite
    (2024). National Directory of Associations — Vienna | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_622ac9d029f0d3e611927462
    Explore at:
    Dataset updated
    Dec 17, 2024
    Description

    What does the RNA contain?

    The RNA lists all the associations covered by the Law of 1 July 1901 on the contract of association whose headquarters are in France (metropolitan and overseas), with the exception of the departments of Moselle (57), Bas-Rhin (67) and Haut-Rhin (68), which fall under a specific regime. The RNA also contains, under the same conditions, associations recognised as being of public benefit. The RNA is updated after the Association Registry processes a file when an association is created, modified or dissolved. The declarations of creation are then published in the Official Journal of the Associations and Foundations of Enterprise (JOAFE) (https://www.journal-officiel.gouv.fr/associations).

    How to disseminate RNA data

    In accordance with the provisions of the Law for a Digital Republic of 7 October 2016, the RNA data are made available by the Ministry of the Interior and are part of the data accessible for consultation and downloadable as open data. The content of the downloadable data is split into two extractions:

    • RNA_waldec: list of associations with an RNA number. All associations that were created or that declared a change of situation since 2009 have an RNA number.
    • RNA_import: list of associations created since 1901 that have not made a change of status declaration since 2009.

    A renovated version of the RNA is being developed by the Ministry of the Interior; it will eventually enable the monitoring of registries to be harmonised throughout the national territory.

    Content of extractions:

    • where applicable, RNA
    • the name of the association and its acronym
    • the object of the association and its social object
    • the address of the seat
    • where applicable, the management address
    • where applicable, the website of the association

    Setting up a FAQ for RNA

    In the event of an error found in the RNA database, complaints must be sent to the territorially competent registry, where the head office is located (prefecture, sub-prefecture or departmental directorate for social cohesion).

    Enrichment

    • geocoding of the headquarters of the association
    • addition of departments/regions/epci/code iso 3166-3 of the zones

  15. Data from: Association of work-related stress with mental health problems in...

    • omicsdi.org
    Updated Jul 11, 2023
    + more versions
    Cite
    (2023). Association of work-related stress with mental health problems in a special police force unit. [Dataset]. https://www.omicsdi.org/dataset/biostudies/S-EPMC3717472
    Explore at:
    Dataset updated
    Jul 11, 2023
    Variables measured
    Unknown
    Description

    Objectives: Law and order enforcement tasks may expose special force police officers to significant psychosocial risk factors. The aim of this work is to investigate the relationship between job stress and the presence of mental health symptoms while controlling sociodemographical, occupational and personality variables in special force police officers.

    Method: At different time points, 292 of 294 members of the 'VI Reparto Mobile', a special police force engaged exclusively in the enforcement of law and order, responded to our invitation to complete questionnaires for the assessment of personality traits, work-related stress (using the Demand-Control-Support (DCS) and the Effort-Reward-Imbalance (ERI) models) and mental health problems such as depression, anxiety and burnout.

    Results: Regression analyses showed that lower levels of support and reward and higher levels of effort and overcommitment were associated with higher levels of mental health symptoms. Psychological screening revealed 21 (7.3%) likely cases of mild depression (Beck Depression Inventory, BDI≥10). Officers who had experienced a discrepancy between work effort and rewards showed a marked increase in the risk of depression (OR 7.89, 95% CI 2.32 to 26.82) when compared with their counterparts who did not perceive themselves to be in a condition of distress.

    Conclusions: The findings of this study suggest that work-related stress may play a role in the development of mental health problems in police officers. The prevalence of mental health symptoms in the cohort investigated here was low, but not negligible in the case of depression. Since special forces police officers have to perform sensitive tasks for which a healthy psychological functioning is needed, the results of this study suggest that steps should be taken to prevent distress and improve the mental well-being of these workers.

  16. Data from: Associations between environmental quality and mortality in the...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Associations between environmental quality and mortality in the contiguous United States 2000-2005 [Dataset]. https://catalog.data.gov/dataset/associations-between-environmental-quality-and-mortality-in-the-contiguous-united-sta-2000
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Contiguous United States, United States
    Description

    Age-adjusted mortality rates for the contiguous United States in 2000–2005 were obtained from the Wide-ranging Online Data for Epidemiologic Research system of the U.S. Centers for Disease Control and Prevention (CDC) (2015). Age-adjusted mortality rates were weighted averages of the age-specific death rates, and they were used to account for different age structures among populations (Curtin and Klein 1995). The mortality rates for counties with < 10 deaths were suppressed by the CDC to protect privacy and to ensure data reliability; only counties with ≥ 10 deaths were included in the analyses. The underlying cause of mortality was specified using the World Health Organization’s International Statistical Classification of Diseases and Related Health Problems (10th revision; ICD-10). In this study, we focused on the all-cause mortality rate (A00-R99) and on mortality rates from the three leading causes: heart disease (I00-I09, I11, I13, and I20-I51), cancer (C00-C97), and stroke (I60- I69) (Heron 2013). We excluded mortality due to external causes for all-cause mortality, as has been done in many previous studies (e.g., Pearce et al. 2010, 2011; Zanobetti and Schwartz 2009), because external causes of mortality are less likely to be related to environmental quality. We also focused on the contiguous United States because the numbers of counties with available cause-specific mortality rates were small in Hawaii and Alaska. County-level rates were available for 3,101 of the 3,109 counties in the contiguous United States (99.7%) for all-cause mortality; for 3,067 (98.6%) counties for heart disease mortality; for 3,057 (98.3%) counties for cancer mortality; and for 2,847 (91.6%) counties for stroke mortality. The EQI includes variables representing five environmental domains: air, water, land, built, and sociodemographic (2). The domain-specific indices include both beneficial and detrimental environmental factors. 
The air domain includes 87 variables representing criteria and hazardous air pollutants. The water domain includes 80 variables representing overall water quality, general water contamination, recreational water quality, drinking water quality, atmospheric deposition, drought, and chemical contamination. The land domain includes 26 variables representing agriculture, pesticides, contaminants, facilities, and radon. The built domain includes 14 variables representing roads, highway/road safety, public transit behavior, business environment, and subsidized housing environment. The sociodemographic environment includes 12 variables representing socioeconomics and crime. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jian, Y., L. Messer, J. Jagai, K. Rappazzo, C. Gray, S. Grabich, and D. Lobdell. Associations between environmental quality and mortality in the contiguous United States 2000-2005. ENVIRONMENTAL HEALTH PERSPECTIVES. National Institute of Environmental Health Sciences (NIEHS), Research Triangle Park, NC, USA, 125(3): 355-362, (2017).

  17. Thailand No. of Issues (NOI): Corporate Bond (CB)

    • ceicdata.com
    Cite
    CEICdata.com, Thailand No. of Issues (NOI): Corporate Bond (CB) [Dataset]. https://www.ceicdata.com/en/thailand/thai-bond-market-association-bond-market/no-of-issues-noi-corporate-bond-cb
    Explore at:
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2017 - Nov 1, 2018
    Area covered
    Thailand
    Description

    Thailand No. of Issues (NOI): Corporate Bond (CB) data was reported at 2,225.000 THB in Nov 2018. This records a decrease from the previous number of 2,249.000 THB for Oct 2018. Thailand No. of Issues (NOI): Corporate Bond (CB) data is updated monthly, averaging 2,280.500 THB (median) from Apr 2017 to Nov 2018, with 20 observations. The data reached an all-time high of 2,392.000 THB in Jun 2017 and a record low of 2,225.000 THB in Nov 2018. Thailand No. of Issues (NOI): Corporate Bond (CB) data remains in active status in CEIC and is reported by The Thai Bond Market Association. The data is categorized under Global Database’s Thailand – Table TH.Z015: Thai Bond Market Association: Bond Market.

  18. Data from: Penalized Multi-Marker versus Single-Marker Regression methods...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated May 28, 2025
    Cite
    Hui Yi; Patrick Breheny; Netsanet Iman; Yongmei Liu; Ina Hoeschele; N. Imam (2025). Penalized Multi-Marker versus Single-Marker Regression methods for genome-wide association studies of quantitative traits [Dataset]. http://doi.org/10.5061/dryad.hc445
    Explore at:
    Dataset updated
    May 28, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Hui Yi; Patrick Breheny; Netsanet Iman; Yongmei Liu; Ina Hoeschele; N. Imam
    Time period covered
    Jan 1, 2015
    Description

    The data from genome-wide association studies (GWAS) in humans are still predominantly analyzed using single marker association methods. As an alternative to Single Marker Analysis (SMA), all or subsets of markers can be tested simultaneously. This approach requires a form of Penalized Regression (PR) as the number of SNPs is much larger than the sample size. Here we review PR methods in the context of GWAS, extend them to perform penalty parameter and SNP selection by False Discovery Rate (FDR) control, and assess their performance in comparison with SMA. PR methods were compared with SMA using realistically simulated GWAS data with a continuous phenotype and real data. Based on these comparisons our analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS. We found that PR with FDR control provides substantially more power than SMA with genome-wide type-I error control but somewhat less power than SMA with Benjamini-Hochberg FDR control (SMA-BH). PR...

  19. Data from: CAG repeat not polyglutamine length determines timing of...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jul 7, 2020
    Cite
    Jong-Min Lee (2020). CAG repeat not polyglutamine length determines timing of Huntington’s disease onset [Dataset]. http://doi.org/10.5061/dryad.5d4s2r8
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 7, 2020
    Dataset provided by
    Massachusetts General Hospital
    Authors
    Jong-Min Lee
    Description

    Variable, glutamine-encoding, CAA interruptions indicate that a property of the uninterrupted HTT CAG repeat sequence, distinct from the length of huntingtin’s polyglutamine segment, dictates the rate at which Huntington’s disease (HD) develops. The timing of onset shows no significant association with HTT cis-eQTLs but is influenced, sometimes in a sex-specific manner, by polymorphic variation at multiple DNA maintenance genes, suggesting that the special onset-determining property of the uninterrupted CAG repeat is a propensity for length instability that leads to its somatic expansion. Additional naturally occurring genetic modifier loci, defined by GWAS, may influence HD pathogenesis through other mechanisms. These findings have profound implications for the pathogenesis of HD and other repeat diseases and question the fundamental premise that polyglutamine length determines the rate of pathogenesis in the “polyglutamine disorders.”

  20. S1 Data -

    • plos.figshare.com
    xlsx
    Updated Nov 7, 2024
    Cite
    Tang Ruipeng; Yang Jianbu; Tang Jianrui; Narendra Kumar Aridas; Mohamad Sofian Abu Talip (2024). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0308845.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 7, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Tang Ruipeng; Yang Jianbu; Tang Jianrui; Narendra Kumar Aridas; Mohamad Sofian Abu Talip
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The agricultural WSN (wireless sensor network) has the characteristics of a long operation cycle and a wide coverage area. In order to cover as much area as possible, farms usually deploy multiple monitoring devices in different locations of the same area. Because the types of equipment differ, monitoring data vary greatly, and too many monitoring nodes also reduce the efficiency of the network. Although there have been some studies on data fusion algorithms, they have problems such as ignoring the dynamic changes of time series, weak anti-interference ability, and poor processing of data fluctuations. In this study, a data fusion algorithm for optimal node tracking in agricultural wireless sensor networks is therefore designed. The fuzzy association algorithm is improved by introducing the dynamic bending distance from the dynamic time warping algorithm to replace the absolute distance, and by combining the sensor’s own reliability and association degree as the weighted fusion weight. Finally, the improved algorithm and three other algorithms were tested on multi-temperature sensor data fusion. Compared with the Kalman filter, arithmetic mean and fuzzy association algorithm, the average value of the improved data fusion algorithm is 29.5703, which is close to the average value of the other three algorithms, indicating that the data distribution is more even. Its extremely bad value is 8.9767, which is 10.04%, 1.14% and 9.85% smaller than the other three algorithms, indicating that it is more robust when dealing with outliers. Its variance is 2.6438, which is 2.82%, 0.65% and 0.27% smaller than the other three algorithms, indicating that it is more stable and has less data volatility. The results show that the algorithm proposed in this study has higher fusion accuracy and better robustness, and can obtain a fusion value that truly reflects the agricultural environment conditions. It reduces production costs by reducing redundant monitoring devices and energy consumption, and it improves the data collection efficiency in wireless sensor networks.

Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aslan Ahmedov
Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions of itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association rules are most often used to find associations between different objects in a set, in particular frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.

  • Rule: bought computer mouse => bought mouse mat
  • support = P(mouse & mat) = 8/100 = 0.08
  • confidence = support / P(mouse) = 0.08/0.10 = 0.80
  • lift = confidence / P(mat) = 0.80/0.09 ≈ 8.89

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
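
The three measures for the rule "mouse => mat" can be checked directly from the counts in the example. The following is a minimal sketch in Python, not the author's R code; the variable names are illustrative. It uses the standard definitions, in which confidence divides the support by the probability of the antecedent (the mouse) and lift divides the confidence by the probability of the consequent (the mat).

```python
# Counts taken from the example above; variable names are illustrative.
n_customers = 100
n_mouse = 10  # customers who bought a computer mouse (antecedent)
n_mat = 9     # customers who bought a mouse mat (consequent)
n_both = 8    # customers who bought both

support = n_both / n_customers             # P(mouse & mat)
confidence = n_both / n_mouse              # P(mat | mouse) = support / P(mouse)
lift = confidence / (n_mat / n_customers)  # confidence / P(mat)

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 suggests that the two items are bought together far more often than would be expected if the purchases were independent.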

Strategy

  • Data Import
  • Data Understanding and Exploration
  • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
  • Running association rules
  • Exploring the rules generated
  • Filtering the generated rules
  • Visualization of rules

Dataset Description

  • File name: Assignment-1_Data
  • List name: retaildata
  • File format: .xlsx
  • Number of rows: 522065
  • Number of Attributes: 7

    • BillNo: 6-digit number assigned to each transaction. Nominal.
    • Itemname: Product name. Nominal.
    • Quantity: The quantities of each product per transaction. Numeric.
    • Date: The day and time when each transaction was generated. Numeric.
    • Price: Product price. Numeric.
    • CustomerID: 5-digit number assigned to each customer. Nominal.
    • Country: Name of the country where each customer resides. Nominal.

Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we need to load the required libraries. Each library is described briefly below.

  • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
  • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
  • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
  • readxl - Read Excel Files in R.
  • plyr - Tools for Splitting, Applying and Combining Data.
  • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  • knitr - Dynamic Report generation in R.
  • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
  • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
  • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

Next, we clean the data frame by removing missing values.

Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
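
The cleaning and conversion steps can be illustrated with a small sketch. It is written in Python rather than the R used in the original walkthrough, and it is not the author's code: the rows below are invented, the function name to_transactions is hypothetical, and only the column names BillNo and Itemname come from the dataset description. The idea is to drop rows with missing values and then group item names by bill number, giving one basket per invoice.

```python
from collections import defaultdict

# Invented sample rows; only the column names come from the dataset description.
rows = [
    {"BillNo": "100001", "Itemname": "mouse"},
    {"BillNo": "100001", "Itemname": "mat"},
    {"BillNo": "100002", "Itemname": "mouse"},
    {"BillNo": "100002", "Itemname": None},  # missing value, will be dropped
    {"BillNo": "100003", "Itemname": "keyboard"},
]


def to_transactions(rows):
    """Group item names by bill number, skipping rows with missing values."""
    baskets = defaultdict(set)
    for row in rows:
        bill = row.get("BillNo")
        item = row.get("Itemname")
        if bill is None or item is None:
            continue  # cleaning step: ignore incomplete rows
        baskets[bill].add(item)
    # Sort the items so that each basket has a stable, readable order.
    return {bill: sorted(items) for bill, items in baskets.items()}


transactions = to_transactions(rows)
for bill, items in transactions.items():
    print(bill, items)
```

Each resulting basket is the set of distinct items on one bill, which is the kind of input an association rule algorithm such as Apriori works on.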
