10 datasets found
  1. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: . xlsx
    • Number of Row: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    imagehttps://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

    Libraries in R

    First, we need to load required libraries. Shortly I describe all libraries.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    imagehttps://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

    Data Pre-processing

    Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

    imagehttps://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> imagehttps://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

    After we will clear our data frame, will remove missing values.

    imagehttps://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

    To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...

  2. d

    Replication Data for: \"A Topic-based Segmentation Model for Identifying...

    • search.dataone.org
    Updated Sep 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert (2024). Replication Data for: \"A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews\" [Dataset]. http://doi.org/10.7910/DVN/EE3DE2
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert
    Description

    We provide instructions, codes and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package for any researchers or practitioners to apply A Topic-based Segmentation Model with Unstructured Texts (latent class regression with group variable selection) to their datasets. First, we provide a R code to replicate the illustrative simulation study: see file 1. Second, we provide the user-friendly R package with a very simple example code to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R. Third, we provide a set of codes and instructions to replicate the empirical studies of customer-level segmentation and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note, due to the dataset terms of use by Yelp and the restriction of data size, we provide the link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6). Fourth, we provided a set of codes and datasets to replicate the empirical study with professor ratings reviews data: see file 5. Please see more details in the description text and comments of each file. [A guide on how to use the code to reproduce each study in the paper] 1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: This is R source code to replicate the illustrative simulation study. Please run from the beginning to the end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships, you will get dendrograms of selected groups of variables in Figure 2. Computing time is approximately 20 to 30 minutes 3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing DV and IVs matrix for customer-level segmentation study. 3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 3 to 4 hours. 4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing DV and IVs matrix for restaurant-level segmentation study. 4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating restaurant-level segmentation study with Yelp. you will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 10 to 12 hours. [Guidelines for running Benchmark models in Table 6] Unsupervised Topic model: 'topicmodels' package in R -- after determining the number of topics(e.g., with 'ldatuning' R package), run 'LDA' function in the 'topicmodels'package. Then, compute topic probabilities per restaurant (with 'posterior' function in the package) which can be used as predictors. Then, conduct prediction with regression Hierarchical topic model (HDP): 'gensimr' R package -- 'model_hdp' function for identifying topics in the package (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/). Supervised topic model: 'lda' R package -- 'slda.em' function for training and 'slda.predict' for prediction. Aggregate regression: 'lm' default function in R. Latent class regression without variable selection: 'flexmix' function in 'flexmix' R package. Run flexmix with a certain number of segments (e.g., 3 segments in this study). Then, with estimated coefficients and memberships, conduct prediction of dependent variable per each segment. Latent class regression with variable selection: 'Unconstraind_Bayes_Mixture' function in Kim, Fong and DeSarbo(2012)'s package. Run the Kim et al's model (2012) with a certain number of segments (e.g., 3 segments in this study). Then, with estimated coefficients and memberships, we can do prediction of dependent variables per each segment. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home 5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the Professor ratings reviews study. Computing time is approximately 10 hours. [A list of the versions of R, packages, and computer...

  3. HR Dataset.csv

    • kaggle.com
    Updated Mar 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fahad Rehman (2024). HR Dataset.csv [Dataset]. https://www.kaggle.com/datasets/fahadrehman07/hr-comma-sep-csv
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 8, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Fahad Rehman
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🟡Please Upvote my dataset If you like It.✨

    This dataset contains valuable employee information over time that can be analyzed to help optimize key HR functions. Some potential use cases include:

    Attrition analysis: Identify factors correlated with attrition like department, role, salary, etc. Segment high-risk employees. Predict future attrition.

    Performance management: Analyze the relationship between metrics like ratings, and salary increments. recommend performance improvement programs.

    Workforce planning: Forecast staffing needs based on historical hiring/turnover trends. Determine optimal recruitment strategies.

    Compensation analysis: Benchmark salaries vs performance, and experience. Identify pay inequities. Inform compensation policies.

    Diversity monitoring: Assess diversity metrics like gender ratio over roles, and departments. Identify underrepresented groups.

    Succession planning: Identify high-potential candidates and critical roles. Predict internal promotions/replacements in advance.

    Given its longitudinal employee data and multiple variables, this dataset provides rich opportunities for exploration, predictive modeling, and actionable insights. With a large sample size, it can uncover subtle patterns. Cleaning, joining with other contextual data sources can yield even deeper insights. This makes it a valuable starting point for many organizational studies and evidence-based decision-making.

    .............................................................................................................................................................................................................................................

    This dataset contains information about different attributes of employees from a company. It includes 1000 employee records and 12 feature columns.

    The columns are:

    satisfaction_level: Employee satisfaction score (1-5 scale) last_evaluation: Score on last evaluation (1-5 scale) number_project: Number of projects employee worked on average_monthly_hours: Average hours worked in a month time_spend_company: Number of years spent with the company work_accident: If an employee had a workplace accident (yes/no) left: If an employee has left the company (yes/no) promotion_last_5years: Number of promotions in last 5 years Department: Department of the employee Salary: Annual salary of employee satisfaction_level: Employee satisfaction level (1-5 scale) last_evaluation: Score on last evaluation (1-5 scale)

  4. f

    Full dataset used for IRT analysis on L-POST

    • figshare.com
    txt
    Updated May 11, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kathleen Vancleef; Nele Demeyere; Luning Sun (2021). Full dataset used for IRT analysis on L-POST [Dataset]. http://doi.org/10.6084/m9.figshare.13522187.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 11, 2021
    Dataset provided by
    figshare
    Authors
    Kathleen Vancleef; Nele Demeyere; Luning Sun
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SUBTEST LABELS IN DATA (SUBTEST_ID)

    0 Fine shape discrimination

    1 Dot lattices

    2 RFP contour integration

    3 Shape ratio discrimination (Efron)

    4 Figure ground segmentation

    5 Kinetic object segmentation

    6 RFP texture surfaces

    7 RFP Fragmented outline

    8 Biological motion

    9 Recognition of missing part

    10 Global Motion detection

    11 Dot counting

    12 Recognition of objects in isolation

    13 Embedded figure detection

    14 Recognition of objects in a scene

  5. D

    Background data for: Latent-variable modeling of ordinal outcomes in...

    • dataverse.no
    • dataone.org
    pdf, text/tsv, txt
    Updated Feb 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Manfred Krug; Manfred Krug; Fabian Vetter; Fabian Vetter; Lukas Sönning; Lukas Sönning (2024). Background data for: Latent-variable modeling of ordinal outcomes in language data analysis [Dataset]. http://doi.org/10.18710/WI9TEH
    Explore at:
    text/tsv(4475), text/tsv(1079156), txt(8660), pdf(160867), pdf(287207)Available download formats
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    DataverseNO
    Authors
    Manfred Krug; Manfred Krug; Fabian Vetter; Fabian Vetter; Lukas Sönning; Lukas Sönning
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2008 - Dec 31, 2018
    Area covered
    Malta
    Dataset funded by
    German Humboldt Foundation
    Bavarian Ministry for Science, Research and the Arts
    Spanish Ministry of Education and Science with European Regional Development Fund
    Description

    This dataset contains tabular files with information about the usage preferences of speakers of Maltese English with regard to 63 pairs of lexical expressions. These pairs (e.g. truck-lorry or realization-realisation) are known to differ in usage between BrE and AmE (cf. Algeo 2006). The data were elicited with a questionnaire that asks informants to indicate whether they always use one of the two variants, prefer one over the other, have no preference, or do not use either expression (see Krug and Sell 2013 for methodological details). Usage preferences were therefore measured on a symmetric 5-point ordinal scale. Data were collected between 2008 to 2018, as part of a larger research project on lexical and grammatical variation in settings where English is spoken as a native, second, or foreign language. The current dataset, which we use for our methodological study on ordinal data modeling strategies, consists of a subset of 500 speakers that is roughly balanced on year of birth. Abstract: Related publication In empirical work, ordinal variables are typically analyzed using means based on numeric scores assigned to categories. While this strategy has met with justified criticism in the methodological literature, it also generates simple and informative data summaries, a standard often not met by statistically more adequate procedures. Motivated by a survey of how ordered variables are dealt with in language research, we draw attention to an un(der)used latent-variable approach to ordinal data modeling, which constitutes an alternative perspective on the most widely used form of ordered regression, the cumulative model. Since the latent-variable approach does not feature in any of the studies in our survey, we believe it is worthwhile to promote its benefits. To this end, we draw on questionnaire-based preference ratings by speakers of Maltese English, who indicated on a 5-point scale which of two synonymous expressions (e.g. package-parcel) they (tend to) use. We demonstrate that a latent-variable formulation of the cumulative model affords nuanced and interpretable data summaries that can be visualized effectively, while at the same time avoiding limitations inherent in mean response models (e.g. distortions induced by floor and ceiling effects). The online supplementary materials include a tutorial for its implementation in R.

  6. Figure 5

    • figshare.com
    xlsx
    Updated Apr 7, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aishwarya Iyer (2025). Figure 5 [Dataset]. http://doi.org/10.6084/m9.figshare.28601363.v2
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Apr 7, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Aishwarya Iyer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw dataset

  7. N

    Median Household Income Variation by Family Size in Camp Point, IL:...

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Median Household Income Variation by Family Size in Camp Point, IL: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1abdb4f1-73fd-11ee-949f-3860777c1fe6/
    Explore at:
    json, csvAvailable download formats
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Camp Point, Illinois
    Variables measured
    Household size, Median Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median household incomes for various household sizes in Camp Point, IL, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

    Key observations

    • Of the 7 household sizes (1 person to 7-or-more person households) reported by the census bureau, Camp Point did not include 5, or 7-person households. Across the different household sizes in Camp Point the mean income is $67,562, and the standard deviation is $27,282. The coefficient of variation (CV) is 40.38%. This high CV indicates high relative variability, suggesting that the incomes vary significantly across different sizes of households.
    • In the most recent year, 2021, The smallest household size for which the bureau reported a median household income was 1-person households, with an income of $31,329. It then further increased to $101,486 for 6-person households, the largest household size for which the bureau reported a median household income.

    https://i.neilsberg.com/ch/camp-point-il-median-household-income-by-household-size.jpeg" alt="Camp Point, IL median household income, by household size (in 2022 inflation-adjusted dollars)">

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Household Sizes:

    • 1-person households
    • 2-person households
    • 3-person households
    • 4-person households
    • 5-person households
    • 6-person households
    • 7-or-more-person households

    Variables / Data Columns

    • Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).
    • Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Camp Point median household income. You can refer the same here

  8. N

    Median Household Income Variation by Family Size in Hunts Point, WA:...

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Median Household Income Variation by Family Size in Hunts Point, WA: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1b08bf9b-73fd-11ee-949f-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Hunts Point, Washington
    Variables measured
    Household size, Median Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median household incomes for various household sizes in Hunts Point, WA, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

    Key observations

    • Of the 7 household sizes (1 person to 7-or-more person households) reported by the census bureau, Hunts Point did not include 6, or 7-person households. Across the different household sizes in Hunts Point the mean income is $268,607, and the standard deviation is $3,626. The coefficient of variation (CV) is 1.35%. This low CV indicates low relative variability, suggesting that the incomes are relatively consistent across different sizes of households. Please note that the U.S. Census Bureau uses $250,001 as a JAM value to report incomes of $250,000 or more. In the case of Hunts Point, there were 4 household sizes where the JAM values were used. Thus, the numbers for the mean and standard deviation may not be entirely accurate and have a higher possibility of errors. However, to obtain an approximate estimate, we have used a value of $250,001 as the income for calculations, as reported in the datasets by the U.S. Census Bureau.
    • In the most recent year, 2021, The smallest household size for which the bureau reported a median household income was 1-person households, with an income of $270,229. It then further decreased to 262,121 for 5-person households, the largest household size for which the bureau reported a median household income.

    https://i.neilsberg.com/ch/hunts-point-wa-median-household-income-by-household-size.jpeg" alt="Hunts Point, WA median household income, by household size (in 2022 inflation-adjusted dollars)">

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Household Sizes:

    • 1-person households
    • 2-person households
    • 3-person households
    • 4-person households
    • 5-person households
    • 6-person households
    • 7-or-more-person households

    Variables / Data Columns

    • Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).
    • Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Hunts Point median household income. You can refer the same here

  9. N

    Median Household Income Variation by Family Size in St. Marys Point, MN:...

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Median Household Income Variation by Family Size in St. Marys Point, MN: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1b78b055-73fd-11ee-949f-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Minnesota, Saint Marys Point
    Variables measured
    Household size, Median Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median household incomes for various household sizes in St. Marys Point, MN, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

    Key observations

    • Of the 7 household sizes (1 person to 7-or-more person households) reported by the census bureau, St. Marys Point did not include 6, or 7-person households. Across the different household sizes in St. Marys Point the mean income is $129,439, and the standard deviation is $87,273. The coefficient of variation (CV) is 67.42%. This high CV indicates high relative variability, suggesting that the incomes vary significantly across different sizes of households. Please note that the U.S. Census Bureau uses $250,001 as a JAM value to report incomes of $250,000 or more. In the case of St. Marys Point, there were 1 household sizes where the JAM values were used. Thus, the numbers for the mean and standard deviation may not be entirely accurate and have a higher possibility of errors. However, to obtain an approximate estimate, we have used a value of $250,001 as the income for calculations, as reported in the datasets by the U.S. Census Bureau.
    • In the most recent year, 2021, The smallest household size for which the bureau reported a median household income was 1-person households, with an income of $47,290. It then further increased to $270,229 for 5-person households, the largest household size for which the bureau reported a median household income.

    https://i.neilsberg.com/ch/st-marys-point-mn-median-household-income-by-household-size.jpeg" alt="St. Marys Point, MN median household income, by household size (in 2022 inflation-adjusted dollars)">

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Household Sizes:

    • 1-person households
    • 2-person households
    • 3-person households
    • 4-person households
    • 5-person households
    • 6-person households
    • 7-or-more-person households

    Variables / Data Columns

    • Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).
    • Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for St. Marys Point median household income. You can refer the same here

  10. N

    Median Household Income Variation by Family Size in Point Blank, TX:...

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Median Household Income Variation by Family Size in Point Blank, TX: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1b56d40a-73fd-11ee-949f-3860777c1fe6/
    Explore at:
    json, csvAvailable download formats
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Texas, Point Blank
    Variables measured
    Household size, Median Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median household incomes for various household sizes in Point Blank, TX, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

    Key observations

    • Of the 7 household sizes (1 person to 7-or-more person households) reported by the census bureau, Point Blank did not include 4, 5, 6, or 7-person households. Across the different household sizes in Point Blank the mean income is $79,416, and the standard deviation is $58,916. The coefficient of variation (CV) is 74.19%. This high CV indicates high relative variability, suggesting that the incomes vary significantly across different sizes of households.
    • In the most recent year, 2021, The smallest household size for which the bureau reported a median household income was 1-person households, with an income of $20,662. It then further increased to $138,492 for 3-person households, the largest household size for which the bureau reported a median household income.

    https://i.neilsberg.com/ch/point-blank-tx-median-household-income-by-household-size.jpeg" alt="Point Blank, TX median household income, by household size (in 2022 inflation-adjusted dollars)">

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Household Sizes:

    • 1-person households
    • 2-person households
    • 3-person households
    • 4-person households
    • 5-person households
    • 6-person households
    • 7-or-more-person households

    Variables / Data Columns

    • Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).
    • Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Point Blank median household income. You can refer the same here

  11. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Organization logo

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Aslan Ahmedov
Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Strategy

  • Data Import
  • Data Understanding and Exploration
  • Transformation of the data – so that is ready to be consumed by the association rules algorithm
  • Running association rules
  • Exploring the rules generated
  • Filtering the generated rules
  • Visualization of Rule

Dataset Description

  • File name: Assignment-1_Data
  • List name: retaildata
  • File format: . xlsx
  • Number of Row: 522065
  • Number of Attributes: 7

    • BillNo: 6-digit number assigned to each transaction. Nominal.
    • Itemname: Product name. Nominal.
    • Quantity: The quantities of each product per transaction. Numeric.
    • Date: The day and time when each transaction was generated. Numeric.
    • Price: Product price. Numeric.
    • CustomerID: 5-digit number assigned to each customer. Nominal.
    • Country: Name of the country where each customer resides. Nominal.

imagehttps://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

Libraries in R

First, we need to load required libraries. Shortly I describe all libraries.

  • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
  • arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
  • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
  • readxl - Read Excel Files in R.
  • plyr - Tools for Splitting, Applying and Combining Data.
  • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  • knitr - Dynamic Report generation in R.
  • magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
  • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
  • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

imagehttps://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

Data Pre-processing

Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

imagehttps://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> imagehttps://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

After we will clear our data frame, will remove missing values.

imagehttps://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...

Search
Clear search
Close search
Google apps
Main menu