XYZ Credit Card company regularly helps its merchants understand their data better and make key business decisions accurately by providing machine learning and analytics consulting. ABC is an established brick-and-mortar retailer that frequently conducts marketing campaigns for its diverse product range. As a merchant of XYZ, ABC has asked XYZ to assist with its discount marketing process using machine learning.
Discount marketing and coupon usage are very widely used promotional techniques to attract new customers and to retain & reinforce loyalty of existing customers. The measurement of a consumer’s propensity towards coupon usage and the prediction of the redemption behaviour are crucial parameters in assessing the effectiveness of a marketing campaign.
ABC's promotions are shared across various channels, including email and notifications. A number of these campaigns include coupon discounts offered for a specific product or range of products. The retailer would like to predict whether customers will redeem the coupons received across channels, which will enable its marketing team to design coupon constructs accurately and develop more precise and targeted marketing strategies.
The data available in this problem contains the following information, including the details of a sample of campaigns and coupons used in previous campaigns:
User Demographic Details
Campaign and coupon Details
Product details
Previous transactions
Based on previous transaction and performance data from the last 18 campaigns, predict, for each coupon and customer combination in the test set covering the next 10 campaigns, the probability that the customer will redeem the coupon.
Here is the schema for the different data tables available. The detailed data dictionary is provided next.
You are provided with the following files:
train.csv: Train data containing the coupons offered to the given customers under the 18 campaigns
Variable | Definition |
---|---|
id | Unique id for coupon customer impression |
campaign_id | Unique id for a discount campaign |
coupon_id | Unique id for a discount coupon |
customer_id | Unique id for a customer |
redemption_status | (target) (0 - Coupon not redeemed, 1 - Coupon redeemed) |
campaign_data.csv: Campaign information for each of the 28 campaigns
Variable | Definition |
---|---|
campaign_id | Unique id for a discount campaign |
campaign_type | Anonymised Campaign Type (X/Y) |
start_date | Campaign Start Date |
end_date | Campaign End Date |
coupon_item_mapping.csv: Mapping of coupon and items valid for discount under that coupon
Variable | Definition |
---|---|
coupon_id | Unique id for a discount coupon (no order) |
item_id | Unique id for items for which given coupon is valid (no order) |
customer_demographics.csv: Customer demographic information for some customers
Variable | Definition |
---|---|
customer_id | Unique id for a customer |
age_range | Age range of customer family in years |
marital_status | Married/Single |
rented | 0 - not rented accommodation, 1 - rented accommodation |
family_size | Number of family members |
no_of_children | Number of children in the family |
income_bracket | Label Encoded Income Bracket (Higher income corresponds to higher number) |
customer_transaction_data.csv: Transaction data for all customers for duration of campaigns in the train data
Variable | Definition |
---|---|
date | Date of Transaction |
customer_id | Unique id for a customer |
item_id | Unique id for item |
quantity | quantity of item bought |
selling_price | Sales value of the transaction |
other_discount | Discount from other sources such as manufacturer coupon/loyalty card |
coupon_discount | Discount availed from retailer coupon |
item_data.csv: Item information for each item sold by the retailer
Variable | Definition |
---|---|
item_id | Unique id for item |
brand | Unique id for item brand |
brand_type | Brand Type (local/Established) |
category | Item Category |
test.csv: Contains the coupon customer combination for which redemption status is to be predicted
Variable | Definition |
---|---|
id | Unique id for coupon customer impression |
campaign_id | Unique id for a discount campaign |
coupon_id | Unique id for a discount coupon |
customer_id | Unique id for a customer |
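The tables above share the keys coupon_id, customer_id, and item_id, so a first modelling step is usually to join them onto the impression rows. The following is a minimal sketch under stated assumptions, not the organisers' pipeline: the function name `join_impressions`, the derived field `n_coupon_items`, and all toy values are hypothetical, and rows are represented as plain dictionaries rather than being read from the CSV files.

```python
from collections import defaultdict


def join_impressions(impressions, coupon_item_mapping, customer_demographics):
    """Attach a per-coupon item count and optional demographics to each impression."""
    # Collect the distinct items each coupon is valid for.
    items_by_coupon = defaultdict(set)
    for row in coupon_item_mapping:
        items_by_coupon[row["coupon_id"]].add(row["item_id"])

    # Demographics exist only for some customers, so index them for lookup.
    demo_by_customer = {row["customer_id"]: row for row in customer_demographics}

    joined = []
    for row in impressions:
        demo = demo_by_customer.get(row["customer_id"], {})
        joined.append({
            **row,
            "n_coupon_items": len(items_by_coupon.get(row["coupon_id"], ())),
            "family_size": demo.get("family_size"),  # None when demographics are missing
        })
    return joined


# Toy stand-ins for train.csv, coupon_item_mapping.csv and customer_demographics.csv
# (all values are made up for illustration).
train = [
    {"id": 1, "campaign_id": 13, "coupon_id": 27, "customer_id": 1053, "redemption_status": 0},
    {"id": 2, "campaign_id": 13, "coupon_id": 116, "customer_id": 48, "redemption_status": 1},
]
mapping = [
    {"coupon_id": 27, "item_id": 10},
    {"coupon_id": 27, "item_id": 11},
    {"coupon_id": 116, "item_id": 12},
]
demographics = [{"customer_id": 48, "family_size": 2}]

joined = join_impressions(train, mapping, demographics)
```

Keeping every impression row even when demographics are absent mirrors the note that customer_demographics.csv covers only some customers.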
This dataset and map service provides information on the U.S. Department of Housing and Urban Development's (HUD) low to moderate income areas. The term Low to Moderate Income, often referred to as low-mod, has a specific programmatic context within the Community Development Block Grant (CDBG) program. Over a 1, 2, or 3-year period, as selected by the grantee, not less than 70 percent of CDBG funds must be used for activities that benefit low- and moderate-income persons. HUD uses special tabulations of Census data to determine areas where at least 51% of households have incomes at or below 80% of the area median income (AMI). This dataset and map service contains the following layer.
This database contains food demand elasticity estimates collected from a literature review carried out in 2015 as part of a contract funded by the International Food Policy Research Institute (IFPRI) (contract n° 2015X144.FEM). It served as a basis for the meta-analysis of price and income elasticities of food demand presented in Femenia (2019).
Data collection: Two reports providing food demand elasticities published by the United States Department of Agriculture (USDA) (Seale et al. (2003) and Muhammad et al. (2011)) are frequently used to calibrate demand functions in global economic models. In these reports, price and income elasticities are estimated for eight broad food categories and for a large number of countries. This broad level of country coverage renders these elasticity data well-suited for calibrating large simulation models. Economists might, however, wish to use other sources of elasticities for different reasons, for instance when they consider food products at a higher disaggregation level or when they wish to compare results obtained with a calibration of demand parameters based on USDA estimates to those obtained with a calibration based on other estimates given in the literature. The USDA provides a literature review database (USDA, 2005), which contains this type of information. This database collects own price, cross price, expenditure and income demand elasticity estimates from papers that have been published and/or presented in the United States (US) between 1979 and 2005. While the database covers a large variety of products at various aggregation levels, few countries are included. These two sources of data, namely the USDA's estimates given in Seale et al. (2003) and Muhammad et al. (2011) and the USDA's literature review database, were used as a basis to build the database presented here.
We started with the structure of the USDA literature review database, which includes useful information on each elasticity estimate, such as the references of the papers from which the estimates have been collected; the countries, products and time periods concerned; the types of data used to conduct estimations; and the demand models estimated. The elasticities estimated by Seale et al. (2003) and Muhammad et al. (2011) were also included. We then reviewed the primary studies to check the information included in the USDA database and to ensure the consistency of the data. Of the 74 references present in these data, five PhD dissertations were not available to us, thus restricting our ability to verify the data and to collect new information, and we decided to exclude these references. In a second step, we searched for new references providing food demand elasticity estimates in the economic literature with a focus on pre-2005 studies dealing with countries other than the US and China and with a focus on post-2005 studies regardless of the country. The search was performed with Google Scholar in March 2015 using the following combinations of keywords: “price, elasticities, food, demand” and “income, elasticities, food, demand”. We did not limit our search to published papers; working papers, reports, and papers presented at conferences were also included. A total of 72 references were collected in this way. All price and income elasticity estimates of food demand reported in these references were collected. Among own price elasticities we distinguished uncompensated (Marshallian) price elasticities from compensated (Hicksian) elasticities. The final database contains 25,117 food demand elasticity estimates collected from 148 studies published between 1973 and 2014.
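For readers who want to convert between the two kinds of price elasticity distinguished above, the standard textbook link is the Slutsky equation in elasticity form. It is stated here as background and is not taken from the database documentation; the symbols (budget share $w_j$, income or expenditure elasticity $\eta_i$) are notation introduced for this note.

```latex
% Slutsky equation in elasticity form (general case and own-price case)
\varepsilon^{H}_{ij} = \varepsilon^{M}_{ij} + w_j \, \eta_i ,
\qquad
\varepsilon^{H}_{ii} = \varepsilon^{M}_{ii} + w_i \, \eta_i ,
\qquad
w_j = \frac{p_j x_j}{m}
```

Here $\varepsilon^{M}$ is the uncompensated (Marshallian) elasticity, $\varepsilon^{H}$ the compensated (Hicksian) elasticity, $p_j x_j$ the spending on good $j$, and $m$ total expenditure. For a normal good ($\eta_i > 0$) the compensated own-price elasticity is therefore smaller in absolute value than the uncompensated one.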
Information included and data coding: In addition to the values of elasticity estimates and the references of the primary studies from which they have been collected, our database incorporates several variables aimed at providing detailed information on the estimated values. These descriptive variables contain information related to the type of data used to estimate the elasticities (time series, panel or cross section), to whether these data have been collected at the micro (household) or macro (country) level, to the decade in which they have been collected, which ranges from 1950 to 2010, and to the countries and products to which these data refer. To homogenize the information on food products, product names as they appear in the primary studies are mapped to the following eight product categories: beverages and tobacco, cereals, dairy products, fruits and vegetables, oils and fats, meat and fish, other food products and non-food products. Given that these categories are in some cases much broader than the product levels considered in primary studies, a variable representing the aggregation level of the primary data is also associated with each observation. The following four aggregation levels are considered: “global food aggregate”; “product category aggregate”, which corresponds to the aforementioned categories; “product level”, which refers to single products, for instance bananas and apples for fruits, beef and poultry for meat, wheat and corn for cereals, etc.; “differentiated product level”, which refers to products differentiated by specific characteristics, for instance, organic or conventional for fruits and vegetables or cereals and types of cut for meat. Country names are converted into standard ISO-alpha-3 country codes (International Organization for Standardization) and are mapped to 11 world regions.
Where applicable, we also report in our data information concerning the types (urban, rural or any type) of households from which the primary data have been collected. Finally, information related to the functional form of the demand system from which the elasticities have been estimated is also reported in the database. A description of these variables and the coding of their modalities is provided in the "Elasticities_Review_datacoding" file associated with the database file.
References
Femenia, F. (2019). A meta-analysis of the price and income elasticities of food demand. German Journal of Agricultural Economics, 68(2), 77-98.
Muhammad, A., Seale, J.L., Meade, B., Regmi, A. (2011). International Evidence on Food Consumption Patterns: An Update Using 2005 International Comparison Program Data. USDA-ERS Technical Bulletin, No. 1929, 59 p.
Seale, J.L., Regmi, A., Bernstein, J. (2003). International evidence on food consumption patterns. USDA Technical Bulletin, No. TB-1904, 70 p.
USDA. (2005). Commodity and food elasticities. Accessed May 2015.
The Community Development Block Grant (CDBG) program requires that each CDBG-funded activity must either principally benefit low- and moderate-income (LMI) persons, aid in the prevention or elimination of slums or blight, or meet a community development need having a particular urgency. Most activities funded by the CDBG program are designed to benefit LMI persons. That benefit may take the form of housing, jobs, and services. Additionally, activities may qualify for CDBG assistance if the activity will benefit all the residents of a primarily residential area where at least 51 percent of the residents are low- and moderate-income persons, i.e. area-benefit (LMA). [Certain exception grantees may qualify activities as area-benefit with fewer LMI persons than 51 percent.]
The Office of Community Planning and Development (CPD) provides estimates of the number of persons that can be considered Low-, Low- to Moderate-, and Low-, Moderate-, and Medium-income persons based on special tabulations of data from the 2016-2020 ACS 5-Year Estimates and the 2020 Island Areas Census. The Low- and Moderate-Income Summary Data (LMISD) may be used by CDBG grantees to determine whether or not a CDBG-funded activity qualifies as an LMA activity. The LMI percentages are calculated at various principal geographies provided by the U.S. Census Bureau. CPD provides the following datasets:
Geographic Summary Level "150": Census Tract-Block Group. The block groups are associated with the HUD Unit-of-Government-Identification-Code for the CDBG grantee jurisdiction by fiscal year that is associated with each block group.
Local government jurisdictions include: Summary Level 160: Incorporated Cities and Census-Designated Places, i.e. "Places"; Summary Level 170: Consolidated Cities; Summary Level 050: County; and Summary Level 060: County Subdivision geographies.
In the data files, these geographies are identified by their Federal Information Processing Standards (FIPS) codes and names for the place, consolidated city, or block group, county subdivision, county, and state.
The statistical information used in the calculation of estimates identified in the data sets comes from the 2016-2020 ACS, 2020 Island Areas Census, and the Income Limits for Metropolitan Areas and for Non Metropolitan Counties. The data necessary to determine an LMI percentage for an area is not published in the publicly-available ACS data tables. Therefore, the Bureau of Census matches family size, income, and the income limits in a special tabulation to produce the estimates.
Estimates are provided at three income levels: Low Income (up to 50 percent of the Area Median Income (AMI)); Moderate Income (greater than 50 percent AMI and up to 80 percent AMI); and Medium Income (greater than 80 percent AMI and up to 120 percent AMI). HUD is publishing the margin of error (MOE) data for all block groups and all places in the 2020 ACS LMISD. These data are provided within the LMISD tables.
The MOE does not provide an expanded range for compliance. For example, a service area of 50 percent LMI with a 2 percent MOE would still be just 50 percent LMI for compliance purposes. However, the 2 percent MOE would inform the grantee about the accuracy of the ACS data before undergoing the effort and cost of conducting a local income survey, which is the alternative to using the HUD-provided data.
CPD Notice 24-04 announced the publication of LMISD based on the 2020 ACS, updated CPD Notice 19-02, and explains policy about the accuracy of surveys conducted pursuant to CPD Notice 14-013.
Questions about the calculation of the estimates may be directed to the Formula Help Desk. Questions about the use of the data should be directed to the staff of the CPD Field Office.
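The area-benefit test described above reduces to a small calculation. The sketch below is illustrative only: the function and parameter names are hypothetical and are not LMISD column names, and it assumes the low-income and moderate-income counts are disjoint brackets as defined in the text (up to 50 percent AMI, and above 50 up to 80 percent AMI).

```python
def lmi_percent(low: int, moderate: int, universe: int) -> float:
    """Percentage of persons at or below 80 percent of AMI.

    low      -- persons up to 50 percent AMI (hypothetical input name)
    moderate -- persons above 50 and up to 80 percent AMI
    universe -- all persons for whom income status was determined
    """
    if universe <= 0:
        raise ValueError("universe must be positive")
    return 100.0 * (low + moderate) / universe


def qualifies_as_lma(lmi_pct: float, threshold: float = 51.0) -> bool:
    """Compare the point estimate with the threshold.

    The margin of error is deliberately not an input: per the text it does
    not widen the range for compliance. Exception grantees would pass a
    lower threshold.
    """
    return lmi_pct >= threshold


# Made-up service area: 300 low-income and 210 moderate-income persons out of 1,000.
area_pct = lmi_percent(300, 210, 1000)
area_ok = qualifies_as_lma(area_pct)

# The example from the text: 50 percent LMI with a 2 percent MOE is still 50 percent.
moe_example_ok = qualifies_as_lma(50.0)
```

Passing the threshold as a parameter keeps the exception-grantee case in view without hard-coding any specific exception value.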
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Madison, MS, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
[Figure: Madison, MS median household income, by household size (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Madison median household income.
This dataset is a listing of all active City of Chicago employees, complete with full names, departments, positions, employment status (part-time or full-time), frequency of hourly employees (where applicable), and annual salaries or hourly rates. Please note that "active" has a specific meaning for Human Resources purposes and will sometimes exclude employees on certain types of temporary leave. For hourly employees, the City is providing the hourly rate and frequency of hourly employees (40, 35, 20 and 10) to allow dataset users to estimate annual wages for hourly employees. Please note that annual wages will vary by employee, depending on number of hours worked and seasonal status. For information on the positions and related salaries detailed in the annual budgets, see https://www.cityofchicago.org/city/en/depts/obm.html
Data Disclosure Exemptions: Information disclosed in this dataset is subject to FOIA Exemption Act, 5 ILCS 140/7 (Link: https://www.ilga.gov/legislation/ilcs/documents/000501400K7.htm)
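The annual-wage estimate mentioned above for hourly employees can be sketched as follows. This is an assumption-laden illustration, not the City's method: it treats the frequency value (40, 35, 20 or 10) as typical hours per week and assumes 52 paid weeks, which overstates pay for seasonal staff; the function and constant names are hypothetical.

```python
WEEKS_PER_YEAR = 52  # assumption: year-round employment, no seasonal gaps


def estimated_annual_wage(hourly_rate: float, frequency: int,
                          weeks: int = WEEKS_PER_YEAR) -> float:
    """Rough annual wage: hourly rate x hours per week x weeks worked."""
    if frequency not in (40, 35, 20, 10):
        raise ValueError("frequency is expected to be one of 40, 35, 20, 10")
    return hourly_rate * frequency * weeks


# Made-up example: 20.00 per hour at a frequency of 40.
full_time_estimate = estimated_annual_wage(20.0, 40)

# A seasonal worker can be approximated by passing fewer weeks.
seasonal_estimate = estimated_annual_wage(20.0, 20, weeks=12)
```

As the description notes, actual annual wages vary with hours worked and seasonal status, so the result is only an upper-level estimate.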
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in West Linn, OR, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for West Linn median household income.
Problem Statement 1: Design an ML-based Income Range Predictor with Bias-Aware Integration:
Develop a machine learning solution to predict job applicants' income range based on their demographic and professional information. The solution should include a user-friendly interface, either as a website or SDK. Address potential biases to ensure fair and unbiased predictions. Conduct exploratory data analysis to understand the data, identify biases, and analyze variable relationships. Mitigate bias during model training and evaluation.
Dataset info: This dataset relates to the income ranges of adults. It is intended for cases where companies may need to use an algorithm to classify job applicants into income ranges. In this dataset, the income column (>50k or <=50k) provides the y-labels for prediction. (Note that the data is raw and rife with issues, just as it is in the real world. Try your best to work through it!)
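One simple way to start the bias analysis asked for above is to compare the share of >50k labels across demographic groups. The sketch below is a minimal illustration and not a prescribed method: the column names `sex` and `income`, the group values, and the toy rows are hypothetical, and the gap it reports (the difference between the highest and lowest group rate, often called a demographic parity gap) is only one of several fairness measures.

```python
from collections import defaultdict


def positive_rate_by_group(records, group_key, label_key, positive=">50k"):
    """Fraction of records carrying the positive label within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in records:
        counts[row[group_key]][1] += 1
        if row[label_key] == positive:
            counts[row[group_key]][0] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def parity_gap(rates):
    """Difference between the highest and lowest group rate."""
    return max(rates.values()) - min(rates.values())


# Toy rows (made up): group A has 1 of 2 positives, group B has 1 of 4.
rows = [
    {"sex": "A", "income": ">50k"},
    {"sex": "A", "income": "<=50k"},
    {"sex": "B", "income": ">50k"},
    {"sex": "B", "income": "<=50k"},
    {"sex": "B", "income": "<=50k"},
    {"sex": "B", "income": "<=50k"},
]

rates = positive_rate_by_group(rows, "sex", "income")
gap = parity_gap(rates)
```

The same two functions can be run on model predictions instead of labels to check whether training reduced or amplified the gap.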
Problem Statement 2: Analyzing Bias in Criminal Recidivism Scores and Building a Performance Comparison Solution:
Recidivism: Whether a convicted criminal reoffends. Currently, an algorithm is used to predict whether a criminal will reoffend. Respondents answer a set of questions and, based on the answers, are given a decile score from 1 to 10, rating them from low to high.
Perform an in-depth EDA on the given dataset of criminal details and "recidivism scores". Identify biases and analyze their impact on predictions. Use a second dataset with true recidivism data (the same criminal data) to develop a solution that predicts the likelihood of reoffending. Integrate the solution into a website or SDK, taking ethical considerations for responsible use into account. Write a collective report (a PDF or other format) explaining your findings and solution.
Dataset info: crimealgo. Criminal data was put through an algorithm and processed into a 1-10 decile score for how likely each person was to commit crime again (recidivism). Use all the decile scores and criminal data at your disposal to find correlations and biases for the first part of the problem statement. (Note that the data is raw and rife with issues, just as it is in the real world. Try your best to work through it!)
Dataset info: truerecid. Same dataset as before, except that instead of the calculated recidivism score from the test, the true values are given in the "did_they_commit_crime_again" column. This column is 0 for criminals who did not commit crime again and 1 for those who did. This is your y-label. (Note that the data is raw and rife with issues, just as it is in the real world. Try your best to work through it!)
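Once the decile scores and the true outcomes are matched per person, one way to quantify bias is to compare false positive rates across groups: among people who did not reoffend, how often were they scored as high risk? The sketch below assumes the two datasets have already been joined row by row; the column name `decile_score`, the grouping column `group`, and the cut-off of 5 are hypothetical choices, while `did_they_commit_crime_again` is the column named above.

```python
def false_positive_rate_by_group(rows, group_key, threshold=5):
    """Share of non-reoffenders scored at or above the threshold, per group."""
    false_positives = {}
    negatives = {}
    for row in rows:
        if row["did_they_commit_crime_again"] != 0:
            continue  # only people who did not reoffend can be false positives
        group = row[group_key]
        negatives[group] = negatives.get(group, 0) + 1
        if row["decile_score"] >= threshold:
            false_positives[group] = false_positives.get(group, 0) + 1
    return {g: false_positives.get(g, 0) / n for g, n in negatives.items()}


# Toy rows (made up): one reoffender in group X is ignored by the metric.
scored = [
    {"group": "X", "decile_score": 7, "did_they_commit_crime_again": 0},
    {"group": "X", "decile_score": 3, "did_they_commit_crime_again": 0},
    {"group": "X", "decile_score": 9, "did_they_commit_crime_again": 1},
    {"group": "Y", "decile_score": 2, "did_they_commit_crime_again": 0},
    {"group": "Y", "decile_score": 4, "did_they_commit_crime_again": 0},
]

fpr = false_positive_rate_by_group(scored, "group")
```

A large difference between groups at the same threshold would suggest the score penalises one group more often for crimes its members did not go on to commit.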
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in University Park, TX, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
[Figure: University Park, TX median household income, by household size (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for University Park median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Weston, Vermont, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Weston town median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Worth County, IA, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
[Figure: Worth County, IA median household income, by household size (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Worth County median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Zionsville, IN, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is part of the main dataset for Zionsville median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Whitpain Township, Pennsylvania, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is part of the main dataset for Whitpain township median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Rye Brook, NY, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is part of the main dataset for Rye Brook median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Valley, NE, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Valley, NE median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is part of the main dataset for Valley median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Western, New York, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Western, New York median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is part of the main dataset for Western town median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Reading, Massachusetts, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Reading, Massachusetts median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is part of the main dataset for Reading town median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Westport, Connecticut, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Westport, Connecticut median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is part of the main dataset for Westport town median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Wenonah, NJ, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is part of the main dataset for Wenonah median household income.
XYZ Credit Card company regularly helps its merchants understand their data better and make key business decisions accurately by providing machine learning and analytics consulting. ABC is an established brick-and-mortar retailer that frequently conducts marketing campaigns for its diverse product range. As a merchant of XYZ, ABC has asked XYZ to assist with its discount marketing process using machine learning.
Discount marketing and coupon usage are very widely used promotional techniques to attract new customers and to retain & reinforce loyalty of existing customers. The measurement of a consumer’s propensity towards coupon usage and the prediction of the redemption behaviour are crucial parameters in assessing the effectiveness of a marketing campaign.
ABC promotions are shared across various channels including email, notifications, etc. A number of these campaigns include coupon discounts that are offered for a specific product or range of products. The retailer would like the ability to predict whether customers will redeem the coupons they receive across channels, which will enable the retailer's marketing team to design coupon constructs accurately and to develop more precise and targeted marketing strategies.
The data available in this problem contains the following information, including the details of a sample of campaigns and coupons used in previous campaigns:
User Demographic Details
Campaign and coupon Details
Product details
Previous transactions
Based on previous transaction and performance data from the last 18 campaigns, predict, for each coupon and customer combination in the test set covering the next 10 campaigns, the probability that the customer will redeem the coupon.
Here is the schema for the different data tables available. The detailed data dictionary is provided next.
You are provided with the following files:
train.csv: Train data containing the coupons offered to the given customers under the 18 campaigns
Variable | Definition |
---|---|
id | Unique id for coupon customer impression |
campaign_id | Unique id for a discount campaign |
coupon_id | Unique id for a discount coupon |
customer_id | Unique id for a customer |
redemption_status | (target) (0 - Coupon not redeemed, 1 - Coupon redeemed) |
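
Because `redemption_status` is a binary target, a quick first check is the share of impressions that were redeemed. The sketch below is only an illustration: it assumes rows are dicts keyed by the column names above (for example, as yielded by `csv.DictReader`), and the sample rows are hypothetical, not taken from the real file.

```python
def redemption_rate(rows):
    """Return the fraction of impressions whose coupon was redeemed.

    `rows` is an iterable of dicts keyed by the train.csv column names;
    values may be strings, as csv.DictReader would produce.
    """
    total = 0
    redeemed = 0
    for row in rows:
        total += 1
        redeemed += int(row["redemption_status"])
    if total == 0:
        raise ValueError("no rows supplied")
    return redeemed / total


# Hypothetical sample rows for illustration only.
sample_train = [
    {"id": "1", "campaign_id": "13", "coupon_id": "27", "customer_id": "1053", "redemption_status": "0"},
    {"id": "2", "campaign_id": "13", "coupon_id": "116", "customer_id": "48", "redemption_status": "1"},
    {"id": "3", "campaign_id": "9", "coupon_id": "635", "customer_id": "205", "redemption_status": "0"},
    {"id": "4", "campaign_id": "13", "coupon_id": "644", "customer_id": "1050", "redemption_status": "0"},
]

print(redemption_rate(sample_train))
```

If the rate turns out to be very low on the real data, that would suggest the classes are imbalanced, which is worth keeping in mind when choosing a model and an evaluation approach.
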
campaign_data.csv: Campaign information for each of the 28 campaigns
Variable | Definition |
---|---|
campaign_id | Unique id for a discount campaign |
campaign_type | Anonymised Campaign Type (X/Y) |
start_date | Campaign Start Date |
end_date | Campaign End Date |
coupon_item_mapping.csv: Mapping of coupon and items valid for discount under that coupon
Variable | Definition |
---|---|
coupon_id | Unique id for a discount coupon (no order) |
item_id | Unique id for items for which given coupon is valid (no order) |
customer_demographics.csv: Customer demographic information for some customers
Variable | Definition |
---|---|
customer_id | Unique id for a customer |
age_range | Age range of customer family in years |
marital_status | Married/Single |
rented | 0 - not rented accommodation, 1 - rented accommodation |
family_size | Number of family members |
no_of_children | Number of children in the family |
income_bracket | Label Encoded Income Bracket (Higher income corresponds to higher number) |
customer_transaction_data.csv: Transaction data for all customers for duration of campaigns in the train data
Variable | Definition |
---|---|
date | Date of Transaction |
customer_id | Unique id for a customer |
item_id | Unique id for item |
quantity | Quantity of item bought |
selling_price | Sales value of the transaction |
other_discount | Discount from other sources such as manufacturer coupon/loyalty card |
coupon_discount | Discount availed from retailer coupon |
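
One way to turn the transaction table into per-customer features is to aggregate each customer's purchases up to a cutoff date, such as a campaign's `start_date`, so that no later activity leaks into the features. The following is a minimal sketch under stated assumptions: rows are dicts keyed by the column names above, dates are written as `YYYY-MM-DD` (the real file may use another format, hence the `date_format` parameter), and any non-zero `coupon_discount` is treated as a coupon use. The function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def customer_history(transactions, cutoff, date_format="%Y-%m-%d"):
    """Summarise each customer's purchases made strictly before `cutoff`.

    Returns {customer_id: {"n_rows", "quantity", "coupon_rows"}}.
    The date format is an assumption; adjust it to match the real file.
    """
    summary = defaultdict(lambda: {"n_rows": 0, "quantity": 0.0, "coupon_rows": 0})
    for row in transactions:
        when = datetime.strptime(row["date"], date_format).date()
        if when >= cutoff:
            continue  # skip activity on or after the cutoff to avoid leakage
        stats = summary[row["customer_id"]]
        stats["n_rows"] += 1
        stats["quantity"] += float(row["quantity"])
        # Assumption: a non-zero coupon_discount means a retailer coupon was used.
        if float(row["coupon_discount"]) != 0:
            stats["coupon_rows"] += 1
    return dict(summary)


# Hypothetical sample rows for illustration only.
sample_tx = [
    {"date": "2012-05-01", "customer_id": "7", "item_id": "100", "quantity": "2",
     "selling_price": "50.0", "other_discount": "0", "coupon_discount": "-5.0"},
    {"date": "2012-06-10", "customer_id": "7", "item_id": "101", "quantity": "1",
     "selling_price": "20.0", "other_discount": "-2.0", "coupon_discount": "0"},
    {"date": "2013-02-01", "customer_id": "7", "item_id": "100", "quantity": "4",
     "selling_price": "100.0", "other_discount": "0", "coupon_discount": "-10.0"},
    {"date": "2012-07-01", "customer_id": "9", "item_id": "102", "quantity": "3",
     "selling_price": "30.0", "other_discount": "0", "coupon_discount": "0"},
]

history = customer_history(sample_tx, cutoff=date(2013, 1, 1))
```
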
item_data.csv: Item information for each item sold by the retailer
Variable | Definition |
---|---|
item_id | Unique id for item |
brand | Unique id for item brand |
brand_type | Brand Type (local/Established) |
category | Item Category |
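
Since coupon_item_mapping.csv and item_data.csv share `item_id`, the two tables can be joined to describe what each coupon covers. The sketch below counts the items per category for each coupon; it assumes rows are dicts keyed by the column names above, and the function name, ids, and category labels in the sample are hypothetical.

```python
from collections import Counter, defaultdict


def coupon_category_counts(coupon_item_rows, item_rows):
    """Count, for each coupon, how many of its valid items fall in each category."""
    # Look-up table from item_id to category, built from item_data rows.
    item_category = {row["item_id"]: row["category"] for row in item_rows}
    counts = defaultdict(Counter)
    for row in coupon_item_rows:
        category = item_category.get(row["item_id"])
        if category is not None:  # ignore items missing from item_data
            counts[row["coupon_id"]][category] += 1
    return {coupon: dict(per_category) for coupon, per_category in counts.items()}


# Hypothetical sample rows for illustration only.
sample_items = [
    {"item_id": "100", "brand": "1", "brand_type": "Established", "category": "Grocery"},
    {"item_id": "101", "brand": "2", "brand_type": "local", "category": "Grocery"},
    {"item_id": "102", "brand": "3", "brand_type": "Established", "category": "Bakery"},
]
sample_mapping = [
    {"coupon_id": "27", "item_id": "100"},
    {"coupon_id": "27", "item_id": "101"},
    {"coupon_id": "27", "item_id": "102"},
    {"coupon_id": "116", "item_id": "102"},
    {"coupon_id": "116", "item_id": "999"},  # item not present in sample_items
]

category_counts = coupon_category_counts(sample_mapping, sample_items)
```

The same pattern could be applied to `brand` or `brand_type` by swapping the looked-up column.
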
test.csv: Contains the coupon customer combination for which redemption status is to be predicted
Variable | Definition |
---|---|
id | Unique id for coupon customer impression |
campaign_id | Unique id for a discount campaign |
coupon_id | Unique id for a discount coupon |
customer_id | Unique id for a customer |
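
To make the prediction target concrete, here is a deliberately naive baseline, not a method prescribed by the problem statement: each test impression receives the customer's historical redemption rate from train.csv, falling back to the overall rate for customers not seen in training. Rows are assumed to be dicts keyed by the column names above; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def customer_rate_baseline(train_rows, test_rows):
    """Return {test id: predicted redemption probability} from per-customer rates."""
    global_total = 0
    global_redeemed = 0
    per_customer = defaultdict(lambda: [0, 0])  # customer_id -> [redeemed, seen]
    for row in train_rows:
        status = int(row["redemption_status"])
        global_total += 1
        global_redeemed += status
        per_customer[row["customer_id"]][0] += status
        per_customer[row["customer_id"]][1] += 1
    if global_total == 0:
        raise ValueError("no training rows supplied")
    global_rate = global_redeemed / global_total

    predictions = {}
    for row in test_rows:
        redeemed, seen = per_customer.get(row["customer_id"], (0, 0))
        # Fall back to the overall rate for customers absent from training.
        predictions[row["id"]] = redeemed / seen if seen else global_rate
    return predictions


# Hypothetical sample rows for illustration only.
baseline_train = [
    {"id": "1", "campaign_id": "13", "coupon_id": "27", "customer_id": "1", "redemption_status": "1"},
    {"id": "2", "campaign_id": "13", "coupon_id": "116", "customer_id": "1", "redemption_status": "0"},
    {"id": "3", "campaign_id": "9", "coupon_id": "635", "customer_id": "2", "redemption_status": "0"},
]
baseline_test = [
    {"id": "10", "campaign_id": "22", "coupon_id": "27", "customer_id": "1"},
    {"id": "11", "campaign_id": "22", "coupon_id": "116", "customer_id": "3"},
]

baseline_predictions = customer_rate_baseline(baseline_train, baseline_test)
```

A baseline like this ignores the coupon, campaign, demographic, and transaction tables entirely, so it mainly serves as a reference point for more informed models.
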
To summarise the entire process: