80 datasets found
  1. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to offer customers itemset suggestions, which should increase customer engagement, improve customer experience and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rule mining is most useful when you want to find associations between different objects in a set, that is, to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, which allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule {bought computer mouse} => {bought mouse mat}:

      • support = P(Mouse & Mat) = 8/100 = 0.08
      • confidence = support/P(Mouse) = 0.08/0.10 = 0.80
      • lift = confidence/P(Mat) = 0.80/0.09 ≈ 8.89

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
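    The arithmetic in this example can be checked with a short Python sketch (the original analysis itself is done in R). Confidence is taken here as P(mat | mouse), the share of mouse buyers who also bought a mat. The function name `rule_metrics` and its arguments are illustrative assumptions, not part of the original work.

```python
def rule_metrics(n_total, n_x, n_y, n_xy):
    """Support, confidence and lift of the rule X => Y from raw counts.

    n_total: number of transactions (customers in this example)
    n_x:     transactions containing the antecedent X
    n_y:     transactions containing the consequent Y
    n_xy:    transactions containing both X and Y
    """
    support = n_xy / n_total                # P(X and Y)
    confidence = support / (n_x / n_total)  # P(Y | X) = P(X and Y) / P(X)
    lift = confidence / (n_y / n_total)     # P(Y | X) / P(Y)
    return support, confidence, lift


# Computer mouse => mouse mat: 100 customers, 10 mouse, 9 mat, 8 both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

    Because lift equals P(X and Y) / (P(X) P(Y)), it is the same for X => Y and Y => X, whereas confidence depends on which side is the antecedent.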

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science. The tidyverse package itself is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel files in R.
    • plyr - Tools for splitting, applying and combining data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    Next, we clean the data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
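    As a rough illustration of this transformation (the original work performs it in R, shown only as images), the Python sketch below groups line-item rows into one basket per invoice using the BillNo and Itemname attributes described above, skipping rows whose item name is missing. The function name `to_baskets` and the sample rows are hypothetical, not taken from Assignment-1_Data.

```python
from collections import defaultdict


def to_baskets(rows):
    """Group line-item rows into one basket (a set of item names) per invoice.

    Each row is a dict with at least 'BillNo' and 'Itemname'. Rows with a
    missing or blank item name are skipped, mirroring the missing-value
    cleanup step. Using a set means an item listed twice on the same
    invoice counts once, which is what association rule mining expects.
    """
    baskets = defaultdict(set)
    for row in rows:
        item = row.get("Itemname")
        if item is None or not str(item).strip():
            continue  # drop rows without a usable item name
        baskets[row["BillNo"]].add(str(item).strip())
    return dict(baskets)


# Hypothetical sample rows (invented, not from the real file).
rows = [
    {"BillNo": "100001", "Itemname": "Candle holder", "Quantity": 6},
    {"BillNo": "100001", "Itemname": "Coat hanger", "Quantity": 8},
    {"BillNo": "100001", "Itemname": "Candle holder", "Quantity": 2},
    {"BillNo": "100002", "Itemname": "Hand warmer", "Quantity": 6},
    {"BillNo": "100002", "Itemname": None, "Quantity": 1},
]
baskets = to_baskets(rows)
print(baskets)
```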

  2. Basket Analysis (Association Rule Mining)

    • kaggle.com
    zip
    Updated Apr 25, 2023
    Cite
    vikram amin (2023). Basket Analysis (Association Rule Mining) [Dataset]. https://www.kaggle.com/datasets/vikramamin/basket-analysis-association-rule-mining
    Available download formats: zip (345413 bytes)
    Dataset updated
    Apr 25, 2023
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The basket dataset contains a list of items available for purchase by customers. These items can also be found in sets, for example milk and sugar.

    The analysis aims to determine, for retailers, which items or sets of items are purchased. Sometimes the purchase of one item leads the customer to purchase another item as well; this is a kind of association between items. Finding such associations is called "Association Rule Mining".

    It shows which items appear together in a transaction or relation. It is mainly used by retailers, grocery stores and online marketplaces that have large transactional databases.

    We wouldn’t want to calculate all associations between every possible combination of products. Instead, we would want to select only potentially “relevant” rules from the set of all possible rules. Therefore, we use the measures support, confidence and lift to reduce the number of relationships we need to analyze.

    Support says how popular an itemset is, as measured by the proportion of transactions in which the itemset appears.

    Confidence says how likely item Y is to be purchased when item X is purchased. It is measured by the proportion of transactions containing item X in which item Y also appears (the support of {X, Y} divided by the support of the antecedent, the LHS).

    Lift says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is (the confidence divided by the support of the consequent, the RHS).
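    To make the three measures concrete, here is a minimal Python sketch that computes them directly from a list of baskets, each a set of item names. The function names and the toy baskets are illustrative assumptions; packages such as R's arules compute these measures for you.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(LHS and RHS together) / support(LHS)."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """confidence(LHS => RHS) / support(RHS)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


# Toy baskets, invented for illustration.
transactions = [
    {"milk", "sugar", "bread"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"sugar", "tea"},
]
print(support({"milk", "sugar"}, transactions))  # 0.4
print(confidence({"milk"}, {"sugar"}, transactions))
print(lift({"milk"}, {"sugar"}, transactions))
```

    Here 2 of 3 milk baskets also contain sugar (confidence about 0.67), and since sugar appears in 3 of 5 baskets overall, the lift is slightly above 1.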

  3. Groceries dataset

    • kaggle.com
    zip
    Updated Sep 17, 2020
    Cite
    Heeral Dedhia (2020). Groceries dataset [Dataset]. https://www.kaggle.com/heeraldedhia/groceries-dataset
    Available download formats: zip (263057 bytes)
    Dataset updated
    Sep 17, 2020
    Authors
    Heeral Dedhia
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Association Rule Mining

    Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

    Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

    Details of the dataset

    The dataset has 38765 rows of purchase orders from grocery stores. These orders can be analysed, and association rules can be generated, using Market Basket Analysis with algorithms such as the Apriori algorithm.

    Apriori Algorithm

    Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
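    The level-wise procedure described above can be sketched in Python as follows. This is a plain, unoptimized illustration written for readability (it rescans every transaction at each level); the function name, the `min_support` parameter and the sample baskets are assumptions, and real analyses would normally use an existing implementation such as the `apriori` function in R's arules package.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count_frequent(candidates):
        """Keep candidates whose support (share of transactions) >= min_support."""
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in counts:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: frequent individual items.
    current = count_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    size = 1
    while current:
        prev = set(current)
        # Join step: combine frequent itemsets into candidates one item larger.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size + 1}
        # Prune step (Apriori property): every subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, size))
        }
        current = count_frequent(candidates)
        frequent.update(current)
        size += 1
    return frequent


baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
result = apriori(baskets, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

    With a 0.6 threshold, all three single items and all three pairs survive, but the triple {milk, bread, butter} appears in only 2 of 5 baskets and is dropped.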

    An example of Association Rules

    Assume there are 100 customers: 10 of them bought milk, 8 bought butter, and 6 bought both. For the rule {bought milk} => {bought butter}:

      • support = P(Milk & Butter) = 6/100 = 0.06
      • confidence = support/P(Milk) = 0.06/0.10 = 0.60
      • lift = confidence/P(Butter) = 0.60/0.08 = 7.5

    Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Some important terms:

    • Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

    • Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

    • Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.
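    Written as formulas over the set of transactions T, with X = {milk} and Y = {butter} from the 100-customer example, the three terms work out as follows (supp is shorthand introduced here for the share of transactions that contain an itemset):

```latex
\begin{aligned}
\operatorname{supp}(Z) &= \frac{|\{\, t \in T : Z \subseteq t \,\}|}{|T|} \\
\operatorname{support}(X \Rightarrow Y) &= \operatorname{supp}(X \cup Y) = \frac{6}{100} = 0.06 \\
\operatorname{confidence}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} = \frac{0.06}{0.10} = 0.60 \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{confidence}(X \Rightarrow Y)}{\operatorname{supp}(Y)} = \frac{0.60}{0.08} = 7.5
\end{aligned}
```

    A lift well above 1, as here, means milk buyers are far more likely to buy butter than customers in general.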

  4. The collected raw Tara data set.

    • plos.figshare.com
    zip
    Updated Aug 22, 2025
    + more versions
    Cite
    Zhibo Chen; Zi-Tong Lu; Xue-Ting Song; Yu-Fan Gao; Jian Xiao (2025). The collected raw Tara data set. [Dataset]. http://doi.org/10.1371/journal.pone.0300490.s002
    Available download formats: zip
    Dataset updated
    Aug 22, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Zhibo Chen; Zi-Tong Lu; Xue-Ting Song; Yu-Fan Gao; Jian Xiao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Omics-wide association analysis is a very important tool for medicine and human health studies. However, modern omics data sets often exhibit high dimensionality, responses with unknown distributions, features with unknown distributions, and unknown complex association relationships between the response and its explanatory features. Reliable association analysis results depend on accurate modeling of such data sets. Most existing association analysis methods rely on specific model assumptions and lack effective false discovery rate (FDR) control. To address these limitations, the paper first applies a single index model to omics data. The model performs robustly, allowing the relationship between the response variable and a linear combination of covariates to be connected by any unknown monotonic link function, while both the random error and the covariates can follow any unknown distribution. Based on this model, the paper then combines a rank-based approach and a symmetrized data aggregation approach to develop a novel, robust feature selection method that achieves fine-mapping of risk features while controlling the false positive rate of selection. The theoretical results support the proposed method, and the analysis of simulated data shows that the new method performs effectively and robustly in all scenarios. The new method is also used to analyze two real datasets and identifies some risk features not reported in existing findings.

  5. Retail Market Basket Transactions Dataset

    • kaggle.com
    Updated Aug 25, 2025
    Cite
    Wasiq Ali (2025). Retail Market Basket Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/wasiqaliyasir/retail-market-basket-transactions-dataset
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 25, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Wasiq Ali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview

    The Market_Basket_Optimisation dataset is a classic transactional dataset often used in association rule mining and market basket analysis.
    It consists of multiple transactions where each transaction represents the collection of items purchased together by a customer in a single shopping trip.

    • File Name: Market_Basket_Optimisation.csv
    • Format: CSV (Comma-Separated Values)
    • Structure: Each row corresponds to one shopping basket. Each column in that row contains an item purchased in that basket.
    • Nature of Data: Transactional, categorical, sparse.
    • Primary Use Case: Discovering frequent itemsets and association rules to understand shopping patterns, product affinities, and to build recommender systems.
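    Because each row is a variable-length list of items padded with empty cells, loading the file mostly means dropping the blanks. A minimal Python sketch using only the standard `csv` module is shown below; the function name `read_baskets` is hypothetical, the inline sample reuses the example rows from this description, and the absence of a header row is an assumption about the file (skip the first row if your copy has one).

```python
import csv
import io


def read_baskets(lines):
    """Parse basket CSV rows (one basket per row, one item per column).

    Empty cells are padding, so they are dropped; items are stripped of
    surrounding whitespace. Returns a list of sets of item names.
    """
    baskets = []
    for row in csv.reader(lines):
        items = {cell.strip() for cell in row if cell.strip()}
        if items:  # ignore completely empty rows
            baskets.append(items)
    return baskets


# Inline sample in the same layout as the file.
sample = io.StringIO(
    "Bread,Butter,Jam,,\n"
    "Mineral water,Chocolate,Eggs,Milk,\n"
    "Spaghetti,Tomato sauce,Parmesan,,\n"
)
baskets = read_baskets(sample)
print(len(baskets))  # 3
```

    For the real file, the same function can be fed an open file handle, e.g. `open("Market_Basket_Optimisation.csv", newline="", encoding="utf-8")`; the encoding is an assumption.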

    Detailed Information

    📊 Dataset Composition

    • Transactions: 7,501 (each row = one basket).
    • Items (unique): Around 120 distinct products (e.g., bread, mineral water, chocolate, etc.).
    • Columns per row: Up to 20 items (not fixed; many rows have fewer).
    • Data Type: Purely categorical (no numerical or continuous features).
    • Missing Values: Present in the form of empty cells (since not every basket has all 20 columns).
    • Duplicates: Some baskets may appear more than once — this is acceptable in transactional data as multiple customers can buy the same set of items.

    🛒 Nature of Transactions

    • Basket Definition: Each row captures items bought together during a single visit to the store.
    • Variability: Basket size varies from 1 to 20 items. Some customers buy only one product, while others purchase a full set of groceries.
    • Sparsity: Since there are ~120 unique items but only a handful appear in each basket, the dataset is sparse. Most entries in the one-hot encoded representation are zeros.
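    To see why the one-hot representation is mostly zeros, the Python sketch below builds the basket-by-item 0/1 matrix for a few toy baskets and measures the share of zero cells. Function names and data are illustrative assumptions; with about 120 items and small baskets the real matrix would be far sparser, which is why tools such as R's arules store transactions in a sparse matrix format.

```python
def one_hot(baskets):
    """Return (sorted item vocabulary, 0/1 matrix with one row per basket)."""
    vocab = sorted(set().union(*baskets))
    index = {item: j for j, item in enumerate(vocab)}
    matrix = []
    for basket in baskets:
        row = [0] * len(vocab)
        for item in basket:
            row[index[item]] = 1  # 1 = item present in this basket
        matrix.append(row)
    return vocab, matrix


def sparsity(matrix):
    """Fraction of zero entries in a 0/1 matrix."""
    cells = sum(len(row) for row in matrix)
    ones = sum(sum(row) for row in matrix)
    return 1 - ones / cells


baskets = [{"bread", "butter", "jam"}, {"mineral water", "eggs"}, {"spaghetti"}]
vocab, matrix = one_hot(baskets)
print(vocab)   # ['bread', 'butter', 'eggs', 'jam', 'mineral water', 'spaghetti']
print(matrix)  # [[1, 1, 0, 1, 0, 0], [0, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 1]]
print(f"sparsity = {sparsity(matrix):.2f}")  # sparsity = 0.67
```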

    🔎 Examples of Data

    Example transaction rows (simplified):

    Item 1          Item 2          Item 3          Item 4    ...
    Bread           Butter          Jam
    Mineral water   Chocolate       Eggs            Milk
    Spaghetti       Tomato sauce    Parmesan

    Here, empty cells mean no item was purchased in that slot.

    📈 Applications of This Dataset

    This dataset is frequently used in data mining, analytics, and recommendation systems. Common applications include:

    1. Association Rule Mining (Apriori, FP-Growth):

      • Discover rules like {Bread, Butter} ⇒ {Jam} with high support and confidence.
      • Identify cross-selling opportunities.
    2. Product Affinity Analysis:

      • Understand which items tend to be purchased together.
      • Helps with store layout decisions (placing related items near each other).
    3. Recommendation Engines:

      • Build systems that suggest "You may also like" products.
      • Example: If a customer buys pasta and tomato sauce, recommend cheese.
    4. Marketing Campaigns:

      • Bundle promotions and discounts on frequently co-purchased products.
      • Personalized offers based on buying history.
    5. Inventory Management:

      • Anticipate demand for certain product combinations.
      • Prevent stockouts of items that drive the purchase of others.
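    Applications 1 and 3 meet in a simple rule-based recommender: once rules have been mined, a basket is matched against rule antecedents and the consequents become suggestions. The Python sketch below illustrates this with hand-written rules; the rule list, the confidence values and the `recommend` function are hypothetical, not mined from this dataset.

```python
def recommend(basket, rules, top_n=3):
    """Suggest consequent items from rules whose antecedent is in the basket.

    `rules` is a list of (antecedent, consequent, confidence) tuples.
    Items already in the basket are not suggested; each candidate keeps the
    highest confidence among the rules that propose it.
    """
    basket = set(basket)
    scores = {}
    for antecedent, consequent, conf in rules:
        if set(antecedent) <= basket:
            for item in set(consequent) - basket:
                scores[item] = max(scores.get(item, 0.0), conf)
    # Highest confidence first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical rules with made-up confidence values.
rules = [
    ({"pasta", "tomato sauce"}, {"cheese"}, 0.62),
    ({"bread", "butter"}, {"jam"}, 0.55),
    ({"pasta"}, {"tomato sauce"}, 0.48),
]
print(recommend({"pasta", "tomato sauce"}, rules))  # ['cheese']
```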

    📌 Key Insights Potentially Hidden in the Dataset

    • Popular Items: Some items (like mineral water, eggs, spaghetti) occur far more frequently than others.
    • Product Pairs: Frequent pairs and triplets (e.g., pasta + sauce + cheese) reflect natural meal-prep combinations.
    • Basket Size Distribution: Most customers buy fewer than 5 items, but a small fraction buy 10+ items, showing long-tail behavior.
    • Seasonality (if extended with timestamps): Certain items might show peaks in demand during weekends or holidays (though timestamps are not included in this dataset).

    📂 Dataset Limitations

    1. No Customer Identifiers:

      • We cannot track repeated purchases by the same customer.
      • Analysis is limited to basket-level insights.
    2. No Timestamps:

      • No temporal analysis (trends over time, seasonality) is possible.
    3. No Quantities or Prices:

      • We only know whether an item was purchased, not how many units or its cost.
    4. Sparse & Noisy:

      • Many baskets are small (1–2 items), which may produce weak or trivial rules.

    🔮 Potential Extensions

    • Synthetic Timestamps: Assign simulated timestamps to study temporal buying patterns.
    • Add Customer IDs: If merged with external data, one can perform personalized recommendations.
    • Price Data: Adding cost allows for profit-driven association rules (not just frequency-based).
    • Deep Learning Models: Sequence models (RNNs, Transformers) could be applied if temporal ordering of items is introduced.

    ...

  6. Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation...

    • catalog.data.gov
    • data.bts.gov
    • +3 more
    Updated Dec 7, 2023
    Cite
    Federal Highway Administration (2023). Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs: Dallas Testbed Analysis Plan [supporting datasets] [Dataset]. https://catalog.data.gov/dataset/analysis-modeling-and-simulation-ams-testbed-development-and-evaluation-to-support-dynamic-d4e77
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    Federal Highway Administration (https://highways.dot.gov/)
    Description

    The datasets in this zip file are in support of Intelligent Transportation Systems Joint Program Office (ITS JPO) report FHWA-JPO-16-385, "Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs - Evaluation Report for ATDM Program," https://rosap.ntl.bts.gov/view/dot/32520, and FHWA-JPO-16-373, "Analysis, modeling, and simulation (AMS) testbed development and evaluation to support dynamic mobility applications (DMA) and active transportation and demand management (ATDM) programs : Dallas testbed analysis plan," https://rosap.ntl.bts.gov/view/dot/32106. The files in this zip file are specifically related to the Dallas Testbed. The compressed zip files total 2.2 GB in size. The files have been uploaded as-is; no further documentation was supplied by NTL. All located .docx files were converted to .pdf document files, which are an open, archival format. These PDFs were then added to the zip file alongside the original .docx files. These files can be unzipped using any zip compression/decompression software. This zip file contains files in the following formats: .pdf document files, which can be read using any PDF reader; .csv text files, which can be read using any text editor; .txt text files, which can be read using any text editor; .docx document files, which can be read in Microsoft Word and some other word processing programs; .xlsx spreadsheet files, which can be read in Microsoft Excel and some other spreadsheet programs; .dat data files, which may be text or multimedia; as well as GIS or mapping files in the following formats: .mxd, .dbf, .prj, .sbn, .shp, .shp.xml, which may be opened in ArcGIS or other GIS software. [software requirements] These files were last accessed in 2017.

  7. Data_Sheet_1_Genome-wide association analysis and admixture mapping in a...


    • datasetcatalog.nlm.nih.gov
    Updated Sep 4, 2024
    Cite
    Rivero, Joe; Rajabli, Farid; Beecham, Gary W.; McInerney, Katalina F.; Dalgard, Clifton L.; Scott, Kyle; Valladares, Glenies S.; Cuccaro, Michael L.; Akgun, Bilcag; Vance, Jeffery M.; Pericak-Vance, Margaret A.; Bussies, Parker L.; Hamilton-Nelson, Kara L.; Griswold, Anthony J.; Tejada, Sergio; Feliciano-Astacio, Briseida E.; Sanchez, Jose J.; Adams, Larry D. (2024). Data_Sheet_1_Genome-wide association analysis and admixture mapping in a Puerto Rican cohort supports an Alzheimer disease risk locus on chromosome 12.DOCX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001301710
    Dataset updated
    Sep 4, 2024
    Authors
    Rivero, Joe; Rajabli, Farid; Beecham, Gary W.; McInerney, Katalina F.; Dalgard, Clifton L.; Scott, Kyle; Valladares, Glenies S.; Cuccaro, Michael L.; Akgun, Bilcag; Vance, Jeffery M.; Pericak-Vance, Margaret A.; Bussies, Parker L.; Hamilton-Nelson, Kara L.; Griswold, Anthony J.; Tejada, Sergio; Feliciano-Astacio, Briseida E.; Sanchez, Jose J.; Adams, Larry D.
    Description

    Introduction: Hispanic/Latino populations are underrepresented in Alzheimer Disease (AD) genetic studies. Puerto Ricans (PR), a three-way admixed (European, African, and Amerindian) population, are the second-largest Hispanic group in the continental US. We aimed to conduct a genome-wide association study (GWAS) and comprehensive analyses to identify novel AD susceptibility loci and characterize known AD genetic risk loci in the PR population.

    Materials and methods: Our study included Whole Genome Sequencing (WGS) and phenotype data from 648 PR individuals (345 AD, 303 cognitively unimpaired). We used a generalized linear mixed model adjusting for sex, age, population substructure, and the genetic relationship matrix. To infer local ancestry, we merged the dataset with the HGDP/1000G reference panel. Subsequently, we conducted univariate admixture mapping (AM) analysis.

    Results: We identified suggestive signals within the SLC38A1 and SCN8A genes on chromosome 12q13. This region overlaps with an area of AD linkage (12q13) reported in previous studies of independent data sets, further supporting the finding. Univariate African AM analysis identified one suggestive ancestral block (p = 7.2×10⁻⁶) located in the same region. The ancestry-aware approach showed that this region has both European and African ancestral backgrounds, both contributing to the risk in this region. We also replicated 11 known AD loci, including APOE, identified in mostly European studies, which is likely due to the high European background of the PR population.

    Conclusion: PR GWAS and AM analysis identified a suggestive AD risk locus on chromosome 12, which includes the SLC38A1 and SCN8A genes. Our findings demonstrate the importance of designing GWAS and ancestry-aware approaches and including underrepresented populations in genetic studies of AD.

  8. Table_3_Applying machine-learning to rapidly analyze large qualitative text...


    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Oct 31, 2023
    + more versions
    Cite
    Amlôt, Richard; Bondaronek, Paulina; Towler, Lauren; Papakonstantinou, Trisevgeni; Chadborn, Tim; Ainsworth, Ben; Yardley, Lucy (2023). Table_3_Applying machine-learning to rapidly analyze large qualitative text datasets to inform the COVID-19 pandemic response: comparing human and machine-assisted topic analysis techniques.DOCX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001083700
    Dataset updated
    Oct 31, 2023
    Authors
    Amlôt, Richard; Bondaronek, Paulina; Towler, Lauren; Papakonstantinou, Trisevgeni; Chadborn, Tim; Ainsworth, Ben; Yardley, Lucy
    Description

    Introduction: Machine-assisted topic analysis (MATA) uses artificial intelligence methods to help qualitative researchers analyze large datasets. This helps researchers rapidly update healthcare interventions in changing healthcare contexts, such as a pandemic. We examined the potential to support healthcare interventions by comparing MATA with “human-only” thematic analysis techniques on the same dataset (1,472 user responses from a COVID-19 behavioral intervention).

    Methods: In MATA, an unsupervised topic-modeling approach identified latent topics in the text, from which researchers identified broad themes. In human-only codebook analysis, researchers developed an initial codebook based on previous research, which the team applied to the dataset, meeting regularly to discuss and refine the codes. Formal triangulation using a “convergence coding matrix” compared findings between methods, categorizing them as “agreement”, “complementary”, “dissonant”, or “silent”.

    Results: Human analysis took much longer than MATA (147.5 vs. 40 h). Both methods identified key themes about what users found helpful and unhelpful. Formal triangulation showed that both sets of findings were highly similar. All MATA codes were classified as in agreement or complementary to the human themes. Where findings differed slightly, this was due to human researcher interpretations or nuance from human-only analysis.

    Discussion: Results produced by MATA were similar to those of human-only thematic analysis, with substantial time savings. For simple analyses that do not require an in-depth or subtle understanding of the data, MATA is a useful tool that can support qualitative researchers to interpret and analyze large datasets quickly. This approach can support intervention development and implementation, such as enabling rapid optimization during public health emergencies.

  9. Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation...


    • catalog.data.gov
    • data.virginia.gov
    • +2 more
    Updated Dec 7, 2023
    Cite
    Federal Aviation Administration (2023). Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs: calibration Report for Phoenix Testbed [supporting datasets] [Dataset]. https://catalog.data.gov/dataset/analysis-modeling-and-simulation-ams-testbed-development-and-evaluation-to-support-dynamic
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    Federal Aviation Administration
    Area covered
    Phoenix
    Description

    The datasets in this zip file are in support of FHWA-JPO-16-379, Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs - calibration Report for Phoenix Testbed : Final Report. The compressed zip file totals 1.1 GB in size. The zip file has been uploaded as-is; no further documentation was supplied by NTL, except as noted: All located .docx files were converted to .pdf document files, which are an archival format. These PDFs were then added to the zip file alongside the original .docx files. The initial zip file presented to NTL contained uncompressed datasets and duplicate zip files of the same files. To make the overall size of this zip file more manageable, duplicate files were deleted. The zip file can be unzipped using any zip compression/decompression software. This zip file contains files in the following formats: .pdf document files, which can be read using any PDF reader; .csv text files, which can be read using any text editor; .docx document files, which can be read in Microsoft Word and some other word processing programs; .txt text files, which can be opened with any text editor; .xlsx spreadsheet files, which can be read in Microsoft Excel and some other spreadsheet programs; .cfg computer configuration files; .db database files, which can be opened with many database programs; and .rif raster image files, which may have been created by the Corel Painter image editing application, a proprietary software program, although other image programs may open the files [software requirements]. These files were last accessed in 2017.

  10. 1.35 Student Support Satisfaction (summary)


    • catalog.data.gov
    • data-academy.tempe.gov
    • +4 more
    Updated Nov 15, 2025
    Cite
    City of Tempe (2025). 1.35 Student Support Satisfaction (summary) [Dataset]. https://catalog.data.gov/dataset/1-35-student-support-satisfaction-summary
    Dataset updated
    Nov 15, 2025
    Dataset provided by
    City of Tempe
    Description

    This dataset provides the annual results, by school year, from the student surveys. The survey questions assess satisfaction with overall service for individuals who receive assistance from CARE 7 Youth Support Specialists. Students who receive services from Youth Support Specialists are given the opportunity to complete a survey regarding their satisfaction with the services provided. A student can complete a survey every time they meet with a Youth Support Specialist. The survey is voluntary.

    Data Dictionary

    Additional Information
    • Source: Department generated survey
    • Contact: Maria Gonzalez
    • Contact Email: Maria_Gonzalez@tempe.gov
    • Data Source Type: Excel spreadsheet
    • Preparation Method: Responses of "Very Satisfied" and "Satisfied" from two school districts are combined and summarized.
    • Publish Frequency: Annual
    • Publish Method: Manual
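    The preparation method (combining "Very Satisfied" and "Satisfied" responses from two school districts) amounts to a top-two-box share. A minimal Python sketch follows; the response lists are invented, other response labels such as "Neutral" are assumptions, and the city's actual summary procedure may differ.

```python
from collections import Counter


def satisfaction_rate(responses):
    """Share of responses that are 'Very Satisfied' or 'Satisfied'."""
    counts = Counter(responses)
    satisfied = counts["Very Satisfied"] + counts["Satisfied"]
    return satisfied / len(responses)


# Hypothetical responses, pooled from two districts before summarizing.
district_a = ["Very Satisfied", "Satisfied", "Neutral", "Very Satisfied"]
district_b = ["Satisfied", "Dissatisfied", "Very Satisfied"]
rate = satisfaction_rate(district_a + district_b)
print(f"{rate:.1%}")  # 71.4%
```

    Pooling the raw responses before dividing weights each district by its number of responses; averaging two district percentages instead would weight them equally.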

  11. UCDP External Support Dataset


    • researchdata.se
    • gimi9.com
    Updated Aug 7, 2024
    + more versions
    Cite
    Stina Högbladh; Therése Pettersson; Lotta Themnér (2024). UCDP External Support Dataset [Dataset]. https://researchdata.se/en/catalogue/dataset/ext0034-1
    Dataset updated
    Aug 7, 2024
    Dataset provided by
    Uppsala University
    Authors
    Stina Högbladh; Therése Pettersson; Lotta Themnér
    Time period covered
    1975 - 2010
    Description

    The UCDP, Uppsala Conflict Data Program, contains a large amount of data on organised violence, armed violence, and peacemaking. There is information from 1946 up to today, and the datasets are updated continuously. The data can be downloaded for free and are available in several different versions.

    The UCDP External Support Data contains information on external support in intrastate conflicts, 1975-2010. It provides information on the kind of support, the external actor and the specific year. The data is divided into two separate datasets which are analogous, i.e. they contain identical data structured in different ways to simplify various types of research, such as different types of statistical analyses:

    1. One dataset provides data where the unit of analysis is the warring party-year, providing information on the existence, type, and provider of external support for all warring parties (actors) coded as active in UCDP data, on an annual basis. The dataset covers the period 1975–2010. It contains 29 variables and 3606 individuals/objects.
    2. The other dataset provides data where the unit of analysis is the warring party-supporter-year, i.e. each row contains information on the type of support that a warring party receives from a specific external party in a given year, using dummy variables for each category of support. The dataset covers the period 1975–2010. It contains 30 variables and 6519 individuals/objects.
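    The relationship between the two structures can be illustrated with a minimal Python sketch that collapses party-supporter-year rows into party-year rows. The field names (party, year, supporter) and the three support categories are hypothetical placeholders rather than the variables in the UCDP codebook, and the example rows are made up.

```python
# Hypothetical support categories; the real UCDP codebook uses its own variable names.
SUPPORT_TYPES = ("troops", "weapons", "funding")


def to_party_year(supporter_rows):
    """Collapse warring party-supporter-year rows into warring party-year rows.

    Each input row is a dict with 'party', 'year', 'supporter' and one 0/1
    dummy per support type. A party-year gets 1 for a support type if any of
    its supporters provided that type in that year.
    """
    result = {}
    for row in supporter_rows:
        key = (row["party"], row["year"])
        if key not in result:
            entry = {t: 0 for t in SUPPORT_TYPES}
            entry["supporters"] = []
            result[key] = entry
        entry = result[key]
        entry["supporters"].append(row["supporter"])
        for t in SUPPORT_TYPES:
            # Logical OR across supporters: one provider is enough to set the flag.
            entry[t] = max(entry[t], row[t])
    return result


# Made-up rows: one warring party supported by two external states in one year.
rows = [
    {"party": "Gov A", "year": 1990, "supporter": "State X", "troops": 1, "weapons": 0, "funding": 0},
    {"party": "Gov A", "year": 1990, "supporter": "State Y", "troops": 0, "weapons": 1, "funding": 0},
]
print(to_party_year(rows))
```

    If the supporter-level file contains only actual supporter relationships, warring party-years without any external support would not appear in this output and would have to be added from the list of active warring parties.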
  12. The preprocessed HNSCC dataset, which contains 2,000 gene expression values,...

    • plos.figshare.com
    zip
    Updated Aug 22, 2025
    Zhibo Chen; Zi-Tong Lu; Xue-Ting Song; Yu-Fan Gao; Jian Xiao (2025). The preprocessed HNSCC dataset, which contains 2,000 gene expression values, the logarithm of survival time, and a censoring indicator, can also be available. [Dataset]. http://doi.org/10.1371/journal.pone.0300490.s004
    Available download formats: zip
    Dataset updated
    Aug 22, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Zhibo Chen; Zi-Tong Lu; Xue-Ting Song; Yu-Fan Gao; Jian Xiao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The preprocessed HNSCC dataset, which contains 2,000 gene expression values, the logarithm of survival time, and a censoring indicator, is also available.

  13. Association analysis of high-low outlier road intersection crashes within...

    • zivahub.uct.ac.za
    xlsx
    Updated Jun 7, 2024
    Simone Vieira; Simon Hull; Roger Behrens (2024). Association analysis of high-low outlier road intersection crashes within the CoCT in 2017, 2018, 2019 and 2021 [Dataset]. http://doi.org/10.25375/uct.25975741.v2
    Available download formats: xlsx
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    University of Cape Town
    Authors
    Simone Vieira; Simon Hull; Roger Behrens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    City of Cape Town
    Description

    This dataset provides comprehensive information on road intersection crashes recognised as "high-low" outliers within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in at least 5% of the total "high-low" outlier road intersection crashes for the years 2017, 2018, 2019, and 2021. The dataset is meticulously organised according to support metric values, ranging from 0,05 to 0,0278, with entries presented in descending order.

    Data Specifics
    Data Type: Geospatial-temporal categorical data
    File Format: Excel document (.xlsx)
    Size: 675 KB
    Number of Files: The dataset contains a total of 10212 association rules
    Date Created: 23rd May 2024

    Methodology
    Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
    Software: ArcGIS Pro, Python
    Processing Steps: Following the spatio-temporal analyses and the derivation of "high-low" outlier fishnet grid cells from a cluster and outlier analysis, all the road intersection crashes that occurred within the "high-low" outlier fishnet grid cells were extracted to be processed by association analysis. The association analysis of these crashes was processed using Python software and involved the use of a 0,05 support metric value. Consequently, commonly occurring crash attributes among at least 5% of the "high-low" outlier road intersection crashes were extracted for inclusion in this dataset.

    Geospatial Information
    Spatial Coverage:
    West Bounding Coordinate: 18°20'E
    East Bounding Coordinate: 19°05'E
    North Bounding Coordinate: 33°25'S
    South Bounding Coordinate: 34°25'S
    Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

    Temporal Information
    Temporal Coverage:
    Start Date: 01/01/2017
    End Date: 31/12/2021 (2020 data omitted)
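    The minimum-support filter described in this entry (keep attribute combinations present in at least 5% of crashes, sorted by descending support) can be sketched in Python. This is a brute-force illustration of the support step only. The authors' actual Python code and rule-generation step are not given in the description, and the attribute labels below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_combinations(crashes, min_support=0.05, max_size=3):
    """Return (combination, support) pairs with support >= min_support.

    crashes: list of sets of categorical crash attributes, one set per crash.
    Support is the share of crashes that contain every attribute in the
    combination. All combinations up to max_size are enumerated by brute
    force, which is only practical when each crash has few attributes.
    """
    counts = Counter()
    for attrs in crashes:
        items = sorted(attrs)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(crashes)
    frequent = [(combo, c / n) for combo, c in counts.items() if c / n >= min_support]
    # Highest support first (stable sort keeps first-seen order for ties).
    frequent.sort(key=lambda pair: pair[1], reverse=True)
    return frequent


# Made-up attributes for four crashes, with a 50% threshold for illustration.
crashes = [{"night", "wet"}, {"night"}, {"day", "wet"}, {"night", "wet"}]
for combo, support in frequent_combinations(crashes, min_support=0.5):
    print(combo, support)
```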

  14. Variantscape datasets

    • zenodo.org
    csv
    Updated Apr 23, 2025
    Marie Wosny (2025). Variantscape datasets [Dataset]. http://doi.org/10.5281/zenodo.15268056
    Available download formats: csv
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marie Wosny
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 23, 2025
    Description

    Variantscape dataset
    LLM-based extraction of genetic variants and biomedical entities from titles and abstracts of biomedical publications. These datasets support the analysis of literature-derived co-associations between genetic variants, cancer types, and treatments, enabling downstream network analysis, hypothesis generation, and discovery in precision oncology.

    1. Dataset: Cleaned literature dataset for biomedical entity extraction (2014–2024)
    "cleaned_OpenAlex.csv"
    A pre-processed, cleaned, and structured dataset of cancer-related biomedical publications (2014–2024) retrieved from OpenAlex, containing titles, abstracts, and metadata curated for downstream NLP and LLM-based biomedical entity extraction.

    2. Dataset: Binary entity matrix for co-association and network analysis
    "dataset_for_analysis.csv"
    Final binary matrix dataset derived from NLP- and LLM-based entity extraction on cancer-related literature. Entities include genetic variants, cancer types, and treatments, enabling co-occurrence and network analysis, and the investigation of literature-derived co-associations.
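    A binary publication-by-entity matrix like this can be turned into pairwise co-occurrence counts, a common input for co-occurrence networks. The sketch assumes one row per publication and one 0/1 column per entity. The column names are hypothetical, and in the real file any metadata columns (which metadata_mapping_transposed.csv identifies) would have to be excluded first.

```python
from itertools import combinations


def co_occurrence(matrix, columns):
    """Count how often each pair of entities is flagged in the same publication.

    matrix:  list of rows, one per publication, each a list of 0/1 flags.
    columns: entity names in the same order as the flags.
    Returns {(entity_a, entity_b): number_of_publications_with_both},
    with each pair ordered as the columns are ordered.
    """
    counts = {}
    for row in matrix:
        present = [name for name, flag in zip(columns, row) if flag]
        for pair in combinations(present, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


# Hypothetical column names and flags, not taken from dataset_for_analysis.csv.
columns = ["variant_1", "cancer_1", "treatment_1"]
matrix = [[1, 1, 1], [1, 1, 0], [0, 1, 0]]
print(co_occurrence(matrix, columns))
```

    Each count can serve as an edge weight between two entity nodes in a network analysis.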

    3. Dataset: LLM-based classification of variant-treatment co-associations
    "variant_treatment_relationship_consensus.csv"
    Dataset capturing LLM-based classification and consensus on co-associations between genetic variants and treatments.

    4. Dataset: Metadata mapping for entity extraction and analysis
    "metadata_mapping_transposed.csv"
    Transposed, row-indexed metadata mapping file used for identification of each column as a variant, cancer type, treatment, study design element, or publication-derived metadata.

  15. Data from: Multi-Source Distributed System Data for AI-powered Analytics

    • zenodo.org
    zip
    Updated Nov 10, 2022
    Sasho Nedelkoski; Jasmin Bogatinovski; Ajay Kumar Mandapati; Soeren Becker; Jorge Cardoso; Odej Kao (2022). Multi-Source Distributed System Data for AI-powered Analytics [Dataset]. http://doi.org/10.5281/zenodo.3549604
    Available download formats: zip
    Dataset updated
    Nov 10, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sasho Nedelkoski; Jasmin Bogatinovski; Ajay Kumar Mandapati; Soeren Becker; Jorge Cardoso; Odej Kao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    In recent years there has been an increased interest in Artificial Intelligence for IT Operations (AIOps). This field utilizes monitoring data from IT systems, big data platforms, and machine learning to automate various operations and maintenance (O&M) tasks for distributed systems.
    The major contributions have been materialized in the form of novel algorithms.
    Typically, researchers have taken on the challenge of exploring one specific type of observability data source, such as application logs, metrics, or distributed traces, to create new algorithms.
    Nonetheless, due to the low signal-to-noise ratio of monitoring data, there is a consensus that only the analysis of multi-source monitoring data will enable the development of useful algorithms that have better performance.
    Unfortunately, existing datasets usually contain only a single source of data, often logs or metrics. This limits the possibilities for greater advances in AIOps research.
    Thus, we generated high-quality multi-source data composed of distributed traces, application logs, and metrics from a complex distributed system. This paper provides detailed descriptions of the experiment, statistics of the data, and identifies how such data can be analyzed to support O&M tasks such as anomaly detection, root cause analysis, and remediation.

    General Information:

    This repository contains simple scripts for computing data statistics and a link to the multi-source distributed system dataset.

    You may find details of this dataset from the original paper:

    Sasho Nedelkoski, Jasmin Bogatinovski, Ajay Kumar Mandapati, Soeren Becker, Jorge Cardoso, Odej Kao, "Multi-Source Distributed System Data for AI-powered Analytics".

    If you use the data, implementation, or any details of the paper, please cite!

    BIBTEX:


    @inproceedings{nedelkoski2020multi,
     title={Multi-source Distributed System Data for AI-Powered Analytics},
     author={Nedelkoski, Sasho and Bogatinovski, Jasmin and Mandapati, Ajay Kumar and Becker, Soeren and Cardoso, Jorge and Kao, Odej},
     booktitle={European Conference on Service-Oriented and Cloud Computing},
     pages={161--176},
     year={2020},
     organization={Springer}
    }
    


    The multi-source/multimodal dataset is composed of distributed traces, application logs, and metrics produced by running a complex distributed system (OpenStack). In addition, we provide the workload and fault scripts together with the Rally report, which can serve as ground truth. We provide two datasets, which differ in how the workload is executed: sequential_data is generated by executing a workload of sequential user requests, and concurrent_data is generated by executing a workload of concurrent user requests.

    The raw logs in both datasets contain the same files. Users who want the logs filtered by time for each of the two datasets should refer to the timestamps in the metrics, which provide the time window. We also suggest using the provided aggregated, time-ranged logs for both datasets in CSV format.

    Important: The logs and the metrics are synchronized in time, and both are recorded in CEST (Central European Summer Time, UTC+2). The traces are recorded in UTC (Coordinated Universal Time), which is two hours behind CEST. If you develop multimodal methods, the traces must be synchronized with the logs and metrics. Please read the IMPORTANT_experiment_start_end.txt file before working with the data.
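    Aligning the traces with the logs and metrics and then cutting a time window could look roughly like the Python sketch below. It assumes trace timestamps have already been parsed into naive datetime objects in UTC and that a fixed two-hour offset applies, as stated above. Parsing of the actual files and the real window boundaries (see IMPORTANT_experiment_start_end.txt) are left out, and the timestamps in the example are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is modelled as a fixed UTC+2 offset, as stated in the text.
CEST = timezone(timedelta(hours=2), "CEST")


def trace_time_to_cest(naive_utc):
    """Interpret a naive trace timestamp as UTC and express it in CEST."""
    return naive_utc.replace(tzinfo=timezone.utc).astimezone(CEST)


def in_window(ts, start, end):
    """True if ts lies in [start, end], e.g. a window read from the metric timestamps."""
    return start <= ts <= end


# Made-up timestamps for illustration.
trace_ts = trace_time_to_cest(datetime(2019, 7, 1, 10, 0))
window_start = datetime(2019, 7, 1, 11, 30, tzinfo=CEST)
window_end = datetime(2019, 7, 1, 12, 30, tzinfo=CEST)
print(trace_ts.isoformat(), in_window(trace_ts, window_start, window_end))
```

    Working with timezone-aware datetimes, rather than adding two hours to naive values, makes accidental comparisons between UTC and CEST values fail loudly instead of silently.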

    Our GitHub repository with the code for the workloads and scripts for basic analysis can be found at: https://github.com/SashoNedelkoski/multi-source-observability-dataset/

  16. Data from: Do intrapersonal factors mediate the association of social...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 16, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abbott, Gavin; Ball, Kylie; Brug, Johannes; Timperio, Anna; Velde, Saskia J. te; Middelweerd, Anouk (2017). Do intrapersonal factors mediate the association of social support with physical activity in young women living in socioeconomically disadvantaged neighbourhoods? A longitudinal mediation analysis [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001767289
    Explore at:
    Dataset updated
    Mar 16, 2017
    Authors
    Abbott, Gavin; Ball, Kylie; Brug, Johannes; Timperio, Anna; Velde, Saskia J. te; Middelweerd, Anouk
    Description

    Background
    Levels of physical activity (PA) decrease when transitioning from adolescence into young adulthood. Evidence suggests that social support and intrapersonal factors (self-efficacy, outcome expectations, PA enjoyment) are associated with PA. The aim of the present study was to explore whether cross-sectional and longitudinal associations of social support from family and friends with leisure-time PA (LTPA) among young women living in disadvantaged areas were mediated by intrapersonal factors (PA enjoyment, outcome expectations, self-efficacy).

    Methods
    Survey data were collected from 18–30 year-old women living in disadvantaged suburbs of Victoria, Australia as part of the READI study in 2007–2008 (T0, N = 1197), with follow-up data collected in 2010–2011 (T1, N = 357) and 2012–2013 (T2, N = 271). A series of single-mediator models were tested using baseline (T0) and longitudinal data from all three time points with residual change scores for changes between measurements.

    Results
    Cross-sectional analyses showed that social support was associated with LTPA both directly and indirectly, mediated by intrapersonal factors. Each intrapersonal factor explained between 5.9–37.5% of the associations. None of the intrapersonal factors were significant mediators in the longitudinal analyses.

    Conclusions
    Results from the cross-sectional analyses suggest that the associations of social support from family and from friends with LTPA are mediated by intrapersonal factors (PA enjoyment, outcome expectations and self-efficacy). However, longitudinal analyses did not confirm these findings.
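    Residual change scores are commonly computed by regressing the later measurement on the earlier one and keeping the residuals, so that change is expressed relative to what the baseline value predicts. A minimal Python sketch of that construction for a single variable follows. It is a generic illustration: the study's actual models (covariates, software, handling of attrition between T0, T1 and T2) are not described here, and the scores are made up.

```python
def residual_change_scores(baseline, follow_up):
    """Residuals from an OLS regression of follow-up scores on baseline scores.

    Fits follow_up = a + b * baseline and returns follow_up - (a + b * baseline)
    for each participant: the part of the follow-up value that the baseline
    value does not predict.
    """
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(follow_up) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, follow_up))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [y - (a + b * x) for x, y in zip(baseline, follow_up)]


# Made-up scores for three participants at T0 and T1.
print(residual_change_scores([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```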

  17. Data from: Gene-Based Association Analysis Identified Novel Genes Associated...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 26, 2015
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mo, Xing-Bo; Lei, Shu-Feng; Zhang, Yong-Hong; Deng, Fei-Yan; Lu, Xin; Zhang, Zeng-Li (2015). Gene-Based Association Analysis Identified Novel Genes Associated with Bone Mineral Density [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001915185
    Explore at:
    Dataset updated
    Mar 26, 2015
    Authors
    Mo, Xing-Bo; Lei, Shu-Feng; Zhang, Yong-Hong; Deng, Fei-Yan; Lu, Xin; Zhang, Zeng-Li
    Description

    Genetic factors contribute to the variation of bone mineral density (BMD), which is a major risk factor for osteoporosis. The aim of this study was to identify more “novel” genes for BMD. Based on the publicly available SNP-based P values, we performed an initial gene-based analysis in a total of 32,961 individuals. Furthermore, we performed differential expression, pathway and protein-protein interaction analyses to find supplementary evidence supporting the significance of the identified genes. About 21,695 genes for femoral neck (FN)-BMD and 21,683 genes for lumbar spine (LS)-BMD were analyzed using gene-based association analysis. A total of 35 FN-BMD associated genes and 53 LS-BMD associated genes were identified (P < 2.3 × 10⁻⁶) after Bonferroni correction. Among them, 64 genes have not been reported in previous SNP-based genome-wide association studies. Differential expression analysis further supported the significant associations of 14 genes with FN-BMD and 19 genes with LS-BMD. In particular, WNT3 and WNT9B in the Wnt signaling pathway for FN-BMD were further supported by pathway analysis and protein-protein interaction analysis. The present study took advantage of the gene-based association method to perform a supplementary analysis of the GWAS dataset and found some BMD-associated genes. Taken together, the evidence supported the importance of Wnt signaling pathway genes in determining osteoporosis. Our findings provide more insights into the genetic basis of osteoporosis.
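    The reported threshold is consistent with a Bonferroni correction over the number of genes tested, assuming a family-wise significance level of α = 0.05 (the level itself is not stated in the description), where m is the number of genes analysed:

```latex
\alpha_{\text{gene}} = \frac{\alpha}{m}, \qquad
\frac{0.05}{21\,695} \approx 2.30 \times 10^{-6} \ (\text{FN-BMD}), \qquad
\frac{0.05}{21\,683} \approx 2.31 \times 10^{-6} \ (\text{LS-BMD})
```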

  18. Table1_Genetic association-based functional analysis detects HOGA1 as a...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Aug 12, 2022
    Cho, Yoon Shin; Kim, Myungsuk; Kwak, Soo Heon; Park, Kyong Soo; Randy, Ahmad; Song, No Joon; Nho, Chu Won; Lim, Eun Bi; Park, Kye Won; Ahn, Yeongseon (2022). Table1_Genetic association-based functional analysis detects HOGA1 as a potential gene involved in fat accumulation.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000449482
    Dataset updated
    Aug 12, 2022
    Authors
    Cho, Yoon Shin; Kim, Myungsuk; Kwak, Soo Heon; Park, Kyong Soo; Randy, Ahmad; Song, No Joon; Nho, Chu Won; Lim, Eun Bi; Park, Kye Won; Ahn, Yeongseon
    Description

    Although there have been a number of discoveries from genome-wide association studies (GWAS) for obesity, linking GWAS results to biology has not been successful. We sought to discover causal genes for obesity by conducting functional studies on genes detected from genetic association analysis. Gene-based association analysis of 917 individual exome sequences showed that HOGA1 attains exome-wide significance (p-value < 2.7 × 10⁻⁶) for body mass index (BMI). The mRNA expression of HOGA1 is significantly increased in human adipose tissues from obese individuals in the Genotype-Tissue Expression (GTEx) dataset, which supports the genetic association of HOGA1 with BMI. Functional analyses employing cell- and animal model-based approaches were performed to gain insights into the functional relevance of Hoga1 in obesity. Adipogenesis was retarded when Hoga1 was knocked down by siRNA treatment in a mouse 3T3-L1 cell line, and a similar inhibitory effect was confirmed in mice with down-regulated Hoga1. Hoga1 antisense oligonucleotide (ASO) treatment reduced body weight, blood lipid level, blood glucose, and adipocyte size in high-fat diet-induced mice. In addition, several lipogenic genes including Srebf1, Scd1, Lp1, and Acaca were down-regulated, while lipolytic genes Cpt1l, Ppara, and Ucp1 were up-regulated. Taken together, HOGA1 is a potential causal gene for obesity as it plays a role in excess body fat development.

  19. Dataset for social support paper in Stata format.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 30, 2024
    Govia, Ishtar; Wilks, Rainford J.; Francis, Damian K.; Blake, Alphanso L.; Younger-Coleman, Novie O.; Ferguson, Trevor S.; McFarlane, Shelly R.; McKenzie, Joette A.; Tulloch-Reid, Marshall K.; Williams, David R.; Walters, Renee; Bennett, Nadia R. (2024). Dataset for social support paper in Stata format. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001386084
    Dataset updated
    Jul 30, 2024
    Authors
    Govia, Ishtar; Wilks, Rainford J.; Francis, Damian K.; Blake, Alphanso L.; Younger-Coleman, Novie O.; Ferguson, Trevor S.; McFarlane, Shelly R.; McKenzie, Joette A.; Tulloch-Reid, Marshall K.; Williams, David R.; Walters, Renee; Bennett, Nadia R.
    Description

    Recent studies have suggested that high levels of social support can encourage better health behaviours and result in improved cardiovascular health. In this study we evaluated the association between social support and ideal cardiovascular health among urban Jamaicans. We conducted a cross-sectional study among urban residents in Jamaica’s south-east health region. Socio-demographic data and information on cigarette smoking, physical activity, dietary practices, blood pressure, body size, cholesterol, and glucose were collected by trained personnel. The outcome variable, ideal cardiovascular health, was defined as having optimal levels of ≥5 of these characteristics (ICH-5) according to the American Heart Association definitions. Social support exposure variables included number of friends (network size), number of friends willing to provide loans (instrumental support) and number of friends providing advice (informational support). Principal component analysis was used to create a social support score using these three variables. Survey-weighted logistic regression models were used to evaluate the association between ICH-5 and social support score. Analyses included 841 participants (279 males, 562 females) with mean age of 47.6 ± 18.42 years. ICH-5 prevalence was 26.6% (95%CI 22.3, 31.0) with no significant sex difference (male 27.5%, female 25.7%). In sex-specific, multivariable logistic regression models, the social support score was inversely associated with ICH-5 among males (OR 0.67 [95%CI 0.51, 0.89], p = 0.006) but directly associated among females (OR 1.26 [95%CI 1.04, 1.53], p = 0.020) after adjusting for age and community SES. Living in poorer communities was also significantly associated with higher odds of ICH-5 among males, while living in communities with high property value was associated with higher odds of ICH among females.
    In this study, higher levels of social support were associated with better cardiovascular health among women, but poorer cardiovascular health among men in urban Jamaica. Further research should explore these associations and identify appropriate interventions to promote cardiovascular health.

  20. Michigan Public Policy Survey Restricted Use Datasets

    • datasearch.gesis.org
    Updated Aug 27, 2016
    Center for Local, State, and Urban Policy (2016). Michigan Public Policy Survey Restricted Use Datasets [Dataset]. http://doi.org/10.3886/E55175V2
    Dataset updated
    Aug 27, 2016
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    Center for Local, State, and Urban Policy
    Area covered
    Michigan
    Description

    The Michigan Public Policy Survey (MPPS) is a program of state-wide surveys of local government leaders in Michigan. The MPPS is designed to fill an important information gap in the policymaking process. While there are ongoing surveys of the business community and of the citizens of Michigan, before the MPPS there were no ongoing surveys of local government officials that were representative of all general purpose local governments in the state. Therefore, while we knew the policy priorities and views of the state's businesses and citizens, we knew very little about the views of the local officials who are so important to the economies and community life throughout Michigan. The MPPS was launched in 2009 by the Center for Local, State, and Urban Policy (CLOSUP) at the University of Michigan and is conducted in partnership with the Michigan Association of Counties, Michigan Municipal League, and Michigan Townships Association. The associations provide CLOSUP with contact information for the survey's respondents, and consult on survey topics. CLOSUP makes all decisions on survey design, data analysis, and reporting, and receives no funding support from the associations. The surveys investigate local officials' opinions and perspectives on a variety of important public policy issues and solicit factual information about their localities relevant to policymaking. Over time, the program has covered issues such as fiscal, budgetary and operational policy, fiscal health, public sector compensation, workforce development, local-state governmental relations, intergovernmental collaboration, economic development strategies and initiatives such as placemaking and economic gardening, the role of local government in environmental sustainability, energy topics such as hydraulic fracturing ("fracking") and wind power, trust in government, views on state policymaker performance, opinions on the impacts of the Federal Stimulus Program (ARRA), and more. 
    The program will investigate many other issues relevant to local and state policy in the future. A searchable database of every question the MPPS has asked is available on CLOSUP's website. Results of MPPS surveys are currently available as reports, and via online data tables. The MPPS datasets are being released in two forms: public-use datasets and restricted-use datasets. Unlike the public-use datasets, the restricted-use datasets represent full MPPS survey waves, and include all of the survey questions from a wave. Restricted-use datasets also allow for multiple waves to be linked together for longitudinal analysis. The MPPS staff do still modify these restricted-use datasets to remove jurisdiction and respondent identifiers and to recode other variables in order to protect confidentiality. However, it is theoretically possible that a researcher might be able, in some rare cases, to use enough variables from a full dataset to identify a unique jurisdiction, so access to these datasets is restricted and approved on a case-by-case basis. CLOSUP encourages researchers interested in the MPPS to review the codebooks included in this data collection to see the full list of variables including those not found in the public-use datasets, and to explore the MPPS data using the public-use datasets. On 2016-08-20, the openICPSR web site was moved to new software. In the migration process, some projects were not published in the new system because the decisions made in the old site did not map easily to the new setup. This project is temporarily available as restricted data while ICPSR verifies that all files were migrated correctly.

Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

2 scholarly articles cite this dataset
Available download formats: zip (23875170 bytes)
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions of itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience, and help identify customer behaviour. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association rule mining is most useful when you want to find associations between different objects in a set, that is, frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule bought Computer Mouse => bought Mouse Mat:

  • support = P(Mouse & Mat) = 8/100 = 0.08
  • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
  • lift = confidence/P(Mouse Mat) = 0.80/0.09 ≈ 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
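The three measures can be computed from raw counts with a short Python sketch, taking confidence as support divided by the probability of the antecedent (left-hand side) and lift as confidence divided by the probability of the consequent (right-hand side). The project itself works in R with arules; the function name and arguments here are made up for illustration. Lift is symmetric (it equals P(A and B) / (P(A) P(B))), so it comes out at about 8.9 in either rule direction, while confidence does not.

```python
def rule_metrics(n_total, n_lhs, n_rhs, n_both):
    """Support, confidence and lift of the rule lhs => rhs from raw counts."""
    support = n_both / n_total               # P(lhs and rhs)
    confidence = n_both / n_lhs              # P(rhs | lhs) = support / P(lhs)
    lift = confidence / (n_rhs / n_total)    # confidence / P(rhs)
    return support, confidence, lift


# 100 customers: 10 bought a computer mouse, 9 a mouse mat, 8 bought both.
support, confidence, lift = rule_metrics(100, n_lhs=10, n_rhs=9, n_both=8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# support=0.08 confidence=0.80 lift=8.89
```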

Strategy

  • Data Import
  • Data Understanding and Exploration
  • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
  • Running association rules
  • Exploring the rules generated
  • Filtering the generated rules
  • Visualization of the rules
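The core of the "Running association rules" step (Apriori frequent-itemset mining followed by rule generation filtered on confidence) can be sketched in plain Python. This is a simplified re-implementation for illustration only, not the arules code used in this project. The baskets and thresholds in the example are made up.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support} for frequent itemsets.

    transactions: list of sets of items, one set per invoice.
    """
    n = len(transactions)

    def support(itemset):
        # Share of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    singles = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s for s, sup in singles.items() if sup >= min_support}
    frequent = {s: singles[s] for s in current}

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        supports = {c: support(c) for c in candidates}
        current = {c for c, sup in supports.items() if sup >= min_support}
        frequent.update({c: supports[c] for c in current})
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Rules lhs => rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                # Subsets of a frequent itemset are frequent, so both lookups succeed.
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    lift = confidence / frequent[rhs]
                    rules.append((set(lhs), set(rhs), sup, confidence, lift))
    return rules


# Made-up baskets and thresholds, for illustration only.
baskets = [{"mouse", "mat"}, {"mouse", "mat", "cable"}, {"mouse"}, {"cable"}]
itemsets = apriori(baskets, min_support=0.5)
for lhs, rhs, sup, conf, lift in generate_rules(itemsets, min_confidence=0.6):
    print(sorted(lhs), "=>", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The prune step relies on the Apriori property (every subset of a frequent itemset is frequent), which keeps the candidate sets small on real data.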

Dataset Description

  • File name: Assignment-1_Data
  • List name: retaildata
  • File format: .xlsx
  • Number of rows: 522,065
  • Number of attributes: 7

    • BillNo: 6-digit number assigned to each transaction. Nominal.
    • Itemname: Product name. Nominal.
    • Quantity: The quantities of each product per transaction. Numeric.
    • Date: The day and time when each transaction was generated. Numeric.
    • Price: Product price. Numeric.
    • CustomerID: 5-digit number assigned to each customer. Nominal.
    • Country: Name of the country where each customer resides. Nominal.

[Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]

Libraries in R

First, we need to load the required libraries, each of which is briefly described below.

  • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
  • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
  • tidyverse - An opinionated collection of R packages designed for data science; the tidyverse package makes it easy to install and load multiple 'tidyverse' packages in a single step.
  • readxl - Read Excel files in R.
  • plyr - Tools for splitting, applying and combining data.
  • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
  • knitr - Dynamic report generation in R.
  • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
  • dplyr - A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.

[Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]

Data Pre-processing

Next, we read Assignment-1_Data.xlsx into R so that the dataset can be inspected.

[Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
[Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]

Next, we clean the data frame by removing rows with missing values.
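The project does this step in R (in base R, na.omit() or complete.cases() serve this purpose). As an illustration, the same filter over rows with the seven attributes from the dataset description can be sketched in Python. Representing rows as dicts and treating None, blank strings and float NaN as missing are assumptions, and the rows in the example are made up.

```python
import math

# Column names from the dataset description.
COLUMNS = ("BillNo", "Itemname", "Quantity", "Date", "Price", "CustomerID", "Country")


def is_missing(value):
    """Treat None, empty/blank strings and float NaN as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return isinstance(value, float) and math.isnan(value)


def drop_incomplete(rows):
    """Keep only rows (dicts) in which every column has a value."""
    return [row for row in rows if not any(is_missing(row.get(col)) for col in COLUMNS)]


# Made-up rows; the second has no CustomerID and is dropped.
rows = [
    {"BillNo": "100001", "Itemname": "ITEM A", "Quantity": 2, "Date": "2011-01-01 10:00",
     "Price": 1.5, "CustomerID": "12345", "Country": "Country X"},
    {"BillNo": "100002", "Itemname": "ITEM B", "Quantity": 1, "Date": "2011-01-01 11:00",
     "Price": 2.0, "CustomerID": float("nan"), "Country": "Country X"},
]
print(len(drop_incomplete(rows)))
```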

[Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]

To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
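Conceptually, this conversion groups the invoice lines by BillNo so that each invoice becomes one basket of item names. A minimal Python sketch of that grouping follows. It assumes rows are dicts keyed by the column names from the dataset description, and the invoice lines are made up. In R, the arules package offers read.transactions() with format = "single" for data in this invoice/item layout, though the exact call used in this project is not shown here.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group item names by invoice: returns {BillNo: set of Itemname}.

    Each set is one transaction (basket); repeated lines for the same item
    on one invoice collapse into a single entry.
    """
    baskets = defaultdict(set)
    for row in rows:
        baskets[row["BillNo"]].add(row["Itemname"].strip())
    return dict(baskets)


# Made-up invoice lines.
rows = [
    {"BillNo": "100001", "Itemname": "ITEM A"},
    {"BillNo": "100001", "Itemname": "ITEM B "},
    {"BillNo": "100002", "Itemname": "ITEM A"},
]
baskets = to_transactions(rows)
print(sorted(baskets["100001"]))
```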
