Individuals; Tax filers and dependants by total income, sex and age groups (final T1 Family File; T1FF).
Background
Tayko is a software catalog firm that sells games and educational software. It started out as a software manufacturer and later added third-party titles to its offerings. It has recently put together a revised collection of items in a new catalog, which it is preparing to roll out in a mailing. In addition to its own software titles, Tayko’s customer list is a key asset. In an attempt to expand its customer base, it has recently joined a consortium of catalog firms that specialize in computer and software products. The consortium affords members the opportunity to mail catalogs to names drawn from a pooled list of customers. Members supply their own customer lists to the pool, and can “withdraw” an equivalent number of names each quarter. Members are allowed to do predictive modeling on the records in the pool so they can do a better job of selecting names from the pool.
The Mailing Experiment
Tayko has supplied its customer list of 200,000 names to the pool, which totals over 5,000,000 names, so it is now entitled to draw 200,000 names for a mailing. Tayko would like to select the names that have the best chance of performing well, so it conducts a test: it draws 20,000 names from the pool and does a test mailing of the new catalog. This mailing yielded 1065 purchasers, a response rate of 0.053. To optimize the performance of the data mining techniques, it was decided to work with a stratified sample that contained equal numbers of purchasers and nonpurchasers. For ease of presentation, the dataset for this case includes just 1000 purchasers and 1000 nonpurchasers, an apparent response rate of 0.5. Therefore, after using the dataset to predict who will be a purchaser, we must adjust the purchase rate back down by multiplying each case’s “probability of purchase” by 0.053/0.5, or about 0.107 (0.1065 when the unrounded rate 1065/20,000 = 0.05325 is used).
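The adjustment described above is simple arithmetic, and a minimal sketch may make it concrete. The counts come from the text; the function name `adjust_probability` and the sample probabilities are illustrative assumptions, not part of the dataset.

```python
# Figures taken from the case description.
test_mailing_size = 20_000
purchasers = 1_065

true_rate = purchasers / test_mailing_size  # response rate in the test mailing
sample_rate = 1_000 / 2_000                 # apparent rate in the balanced dataset

# Multiplicative factor that scales balanced-sample probabilities
# back down to the response rate observed in the test mailing.
adjustment = true_rate / sample_rate


def adjust_probability(p_balanced: float) -> float:
    """Scale a probability estimated on the balanced sample by the adjustment factor."""
    return p_balanced * adjustment


# Hypothetical model outputs on the balanced sample (assumed values).
for p in (0.2, 0.5, 0.9):
    print(f"balanced p = {p:.2f} -> adjusted p = {adjust_probability(p):.4f}")
```

A case scored at 0.5 on the balanced sample maps back to roughly the 0.053 response rate of the test mailing, which is a quick sanity check on the factor.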
Data
There are two outcome variables in this case. Purchase indicates whether or not a prospect responded to the test mailing and purchased something. Spending indicates, for those who made a purchase, how much they spent. The overall procedure in this case will be to develop two models. The first will classify records as purchase or no purchase. The second will be applied only to cases classified as purchase and will predict the amount they will spend. Table 21.6 shows the first few rows of data. Table 21.7 provides a description of the variables available in this case.
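The two-model procedure can be sketched as a small pipeline. This is only an outline under stated assumptions: `two_stage_predict`, the toy models, and the field names `score` and `past_orders` are hypothetical stand-ins, not variables from the Tayko dataset or fitted models.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]


def two_stage_predict(
    records: List[Record],
    classify: Callable[[Record], bool],
    predict_spending: Callable[[Record], float],
) -> List[Optional[float]]:
    """Predict spending only for records the classifier labels as purchasers.

    Records classified as non-purchasers get None instead of an amount.
    """
    results: List[Optional[float]] = []
    for record in records:
        if classify(record):
            results.append(predict_spending(record))
        else:
            results.append(None)
    return results


# Hypothetical stand-ins for the two fitted models (assumed for illustration).
def toy_classifier(record: Record) -> bool:
    return record["score"] >= 0.5


def toy_regressor(record: Record) -> float:
    return 50.0 + 10.0 * record["past_orders"]


prospects = [
    {"score": 0.8, "past_orders": 2.0},  # classified as purchaser
    {"score": 0.3, "past_orders": 5.0},  # classified as non-purchaser
]
predictions = two_stage_predict(prospects, toy_classifier, toy_regressor)
print(predictions)
```

Passing the models in as callables keeps the sketch independent of any particular library; in practice each stand-in would be replaced by a model fitted on the corresponding outcome variable.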
DESCRIPTION OF VARIABLES FOR TAYKO DATASET
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10470699%2Fe5f0d825c7dc8e5edd97d576c3a4ea48%2F1%20-%20.png?generation=1683879198956353&alt=media
This table presents income shares, thresholds, tax shares, and total counts of individual Canadian tax filers, with a focus on high-income individuals (95% income threshold, 99% threshold, etc.). Income thresholds are based on national threshold values, regardless of selected geography; for example, the number of Nova Scotians in the top 1% is calculated as the number of tax-filing Nova Scotians whose total income exceeded the 99% national income threshold. Different definitions of income are available in the table, namely market, total, and after-tax income, each with and without capital gains.
https://www.incomebyzipcode.com/terms#TERMS
A dataset listing the richest zip codes in New Jersey per the most current US Census data, including information on rank and average income.
https://www.spotzi.com/en/about/terms-of-service/
This dataset offers a granular view of disposable income trends within Canada and is available at the Dissemination Area level, enabling marketers to zoom in on micro-level trends within Canada's diverse regions. This level of precision allows for targeted campaigns that resonate with local audiences. Key features of this dataset include income segmentation and shelter cost insights.
https://www.incomebyzipcode.com/terms#TERMS
A dataset listing the richest zip codes in Missouri per the most current US Census data, including information on rank and average income.
https://www.incomebyzipcode.com/terms#TERMS
A dataset listing the richest zip codes in North Carolina per the most current US Census data, including information on rank and average income.