https://creativecommons.org/publicdomain/zero/1.0/
Dataset Description
- Customer Demographics: Includes FullName, Gender, Age, CreditScore, and MonthlyIncome. These variables provide a demographic snapshot of the customer base, allowing for segmentation and targeted marketing analysis.
- Geographical Data: Comprising Country, State, and City, this section facilitates location-based analytics, market penetration studies, and regional sales performance.
- Product Information: Details like Category, Product, Cost, and Price enable product trend analysis, profitability assessment, and inventory optimization.
- Transactional Data: Captures the customer journey through SessionStart, CartAdditionTime, OrderConfirmation, OrderConfirmationTime, PaymentMethod, and SessionEnd. This rich temporal data can be used for funnel analysis, conversion rate optimization, and customer behavior modeling.
- Post-Purchase Details: With OrderReturn and ReturnReason, analysts can delve into return rate calculations, post-purchase satisfaction, and quality control.
Types of Analysis
- Descriptive Analytics: Understand basic metrics like average monthly income, most common product categories, and typical credit scores.
- Predictive Analytics: Use machine learning to predict credit risk or the likelihood of a purchase based on demographics and session activity.
- Customer Segmentation: Group customers by demographics or purchasing behavior to tailor marketing strategies.
- Geospatial Analysis: Examine sales distribution across different regions and optimize logistics.
- Time Series Analysis: Study the seasonality of purchases and session activities over time.
- Funnel Analysis: Evaluate the customer journey from session start to order confirmation and identify drop-off points.
- Cohort Analysis: Track customer cohorts over time to understand retention and repeat purchase patterns.
- Market Basket Analysis: Discover product affinities and develop cross-selling strategies.
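As a rough illustration of the funnel analysis mentioned in the list above, the sketch below counts how many sessions reach each stage and the step-to-step conversion rate. The column names come from the dataset description; the sample rows and the encoding (None for a missing CartAdditionTime, a boolean OrderConfirmation) are assumptions, not the dataset's documented format.

```python
# Hypothetical sessions; None marks a stage the customer never reached.
sessions = [
    {"SessionStart": "2023-01-05 10:00", "CartAdditionTime": "2023-01-05 10:04", "OrderConfirmation": True},
    {"SessionStart": "2023-01-05 11:30", "CartAdditionTime": "2023-01-05 11:41", "OrderConfirmation": False},
    {"SessionStart": "2023-01-06 09:15", "CartAdditionTime": None, "OrderConfirmation": False},
    {"SessionStart": "2023-01-06 14:20", "CartAdditionTime": "2023-01-06 14:22", "OrderConfirmation": True},
]


def funnel_counts(rows):
    """Count how many sessions reach each stage of the funnel."""
    started = len(rows)
    carted = sum(1 for r in rows if r["CartAdditionTime"] is not None)
    ordered = sum(
        1 for r in rows
        if r["CartAdditionTime"] is not None and r["OrderConfirmation"]
    )
    return [
        ("session_start", started),
        ("cart_addition", carted),
        ("order_confirmed", ordered),
    ]


def step_conversion(counts):
    """Conversion rate of each stage relative to the stage before it."""
    rates = {}
    for (_, prev_n), (name, n) in zip(counts, counts[1:]):
        rates[name] = n / prev_n if prev_n else 0.0
    return rates


counts = funnel_counts(sessions)
rates = step_conversion(counts)
```

The stage with the lowest rate in `rates` is the largest drop-off point in this toy sample.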
📊🔍 Good Luck and Happy Analysing 🔍📊
This dataset was created by Max Duong
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Facebook Ad Campaign’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/madislemsalu/facebook-ad-campaign on 12 November 2021.
--- Dataset description provided by original source is as follows ---
Simple Dataset from different marketing campaigns.
The total conversion number shows the total number of signups or installs, for instance, while approved conversions tell how many became actual active users.
Courtesy of Bunq.
--- Original source retains full ownership of the source dataset ---
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset describes the results of Portuguese bank marketing campaigns. The campaigns were based mostly on direct phone calls offering bank clients a term deposit. If, after all marketing efforts, the client agreed to place a deposit, the target variable is marked 'yes', otherwise 'no'.
Source of the data: https://archive.ics.uci.edu/ml/datasets/bank+marketing
Citation Request:
This dataset is publicly available for research. The details are described in S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014
Title: Bank Marketing (with social/economic context)
Sources Created by: Sérgio Moro (ISCTE-IUL), Paulo Cortez (Univ. Minho) and Paulo Rita (ISCTE-IUL) @ 2014
Past Usage:
The full dataset (bank-additional-full.csv) was described and analyzed in:
S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems (2014), doi:10.1016/j.dss.2014.03.001.
Relevant Information:
This dataset is based on the "Bank Marketing" UCI dataset (please check the description at: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing). The data is enriched by the addition of five new social and economic features/attributes (nationwide indicators from a ~10M population country), published by the Banco de Portugal and publicly available at: https://www.bportugal.pt/estatisticasweb. This dataset is almost identical to the one used in Moro et al., 2014. Using the rminer package and R tool (http://cran.r-project.org/web/packages/rminer/), we found that the addition of the five new social and economic attributes (made available here) led to a substantial improvement in the prediction of a success, even when the duration of the call is not included. Note: the file can be read in R using: d=read.table("bank-additional-full.csv",header=TRUE,sep=";")
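For readers not working in R, a comparable way to read the semicolon-separated file is sketched below in Python with the standard csv module. The in-memory sample is a hypothetical stand-in with only a few of the columns and made-up values; only the separator and the file name come from the note above.

```python
import csv
import io


def read_semicolon_csv(fileobj):
    """Parse a ';'-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(fileobj, delimiter=";"))


# Tiny in-memory stand-in for bank-additional-full.csv
# (hypothetical values, subset of the real columns).
sample = io.StringIO(
    '"age";"job";"marital";"duration";"pdays";"y"\n'
    '41;"technician";"married";180;999;"no"\n'
    '29;"student";"single";305;6;"yes"\n'
)
rows = read_semicolon_csv(sample)

# With the real file one would write, for example:
# with open("bank-additional-full.csv", newline="") as f:
#     rows = read_semicolon_csv(f)
```

Every value comes back as a string, so numeric columns such as age or pdays still need an explicit conversion before modeling.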
The binary classification goal is to predict if the client will subscribe a bank term deposit (variable y).
Number of Instances: 41188 for bank-additional-full.csv
Number of Attributes: 20 + output attribute.
Attribute information:
For more information, read [Moro et al., 2014].
Input variables:
1 - age (numeric)
2 - job: type of job (categorical: "admin.","blue-collar","entrepreneur","housemaid","management","retired","self-employed","services","student","technician","unemployed","unknown")
3 - marital: marital status (categorical: "divorced","married","single","unknown"; note: "divorced" means divorced or widowed)
4 - education (categorical: "basic.4y","basic.6y","basic.9y","high.school","illiterate","professional.course","university.degree","unknown")
5 - default: has credit in default? (categorical: "no","yes","unknown")
6 - housing: has housing loan? (categorical: "no","yes","unknown")
7 - loan: has personal loan? (categorical: "no","yes","unknown")
8 - contact: contact communication type (categorical: "cellular","telephone")
9 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
10 - day_of_week: last contact day of the week (categorical: "mon","tue","wed","thu","fri")
11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y="no"). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client (numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical: "failure","nonexistent","success")
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)
Output variable (desired target): 21 - y - h...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Sales Conversion Optimization’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/loveall/clicks-conversion-tracking on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Cluster Analysis for Ad Conversions Data
The data used in this project is from an anonymous organisation’s social media ad campaign. The data file can be downloaded from here. The file conversion_data.csv contains 1143 observations in 11 variables. Below are the descriptions of the variables.
1.) ad_id: a unique ID for each ad.
2.) xyz_campaign_id: an ID associated with each ad campaign of XYZ company.
3.) fb_campaign_id: an ID associated with how Facebook tracks each campaign.
4.) age: age of the person to whom the ad is shown.
5.) gender: gender of the person to whom the ad is shown.
6.) interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile).
7.) Impressions: the number of times the ad was shown.
8.) Clicks: number of clicks on that ad.
9.) Spent: amount paid by company XYZ to Facebook to show that ad.
10.) Total conversion: total number of people who enquired about the product after seeing the ad.
11.) Approved conversion: total number of people who bought the product after seeing the ad.
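The variables above are enough to derive the usual per-ad efficiency ratios. The sketch below is one possible set of definitions, not something the dataset prescribes: the metric names are illustrative, measuring the enquiry rate against clicks is an assumption, and the example numbers are made up.

```python
def ad_metrics(impressions, clicks, spent, total_conversion, approved_conversion):
    """Derive simple efficiency ratios for one ad; None when a denominator is zero."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "ctr": ratio(clicks, impressions),                      # clicks per impression
        "cpc": ratio(spent, clicks),                            # cost per click
        "enquiry_rate": ratio(total_conversion, clicks),        # assumed: enquiries per click
        "approval_rate": ratio(approved_conversion, total_conversion),
        "cost_per_approved": ratio(spent, approved_conversion),
    }


# Made-up values for a single ad, for illustration only.
example = ad_metrics(
    impressions=10000,
    clicks=50,
    spent=75.0,
    total_conversion=5,
    approved_conversion=2,
)
```

Returning None for a zero denominator keeps ads with no clicks or no conversions in the table without raising a division error.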
Thanks to the Anonymous data depositor
Social media ad campaign marketing is a leading source of sales conversion, and I have made this data available for the benefit of businesses using Google AdWords to track conversions.
--- Original source retains full ownership of the source dataset ---
Data Description: The data is related to the direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').
Domain: Banking
Context: Leveraging customer information is paramount for most businesses. In the case of a bank, attributes of customers like the ones mentioned below can be crucial in strategizing a marketing campaign when launching a new product.
Learning Outcomes:
- Exploratory Data Analysis
- Preparing the data to train a model
- Training and making predictions using an Ensemble Model
- Comparing model performances
Objective: The classification goal is to predict if the client will subscribe (yes/no) a term deposit (variable y).
Steps and tasks:
1. Import the necessary libraries
2. Read the data as a data frame
3. Perform basic EDA, which should include the following, and print out your insights at every step:
a. Shape of the data
b. Data type of each attribute
c. Checking the presence of missing values
d. 5 point summary of numerical attributes
e. Checking the presence of outliers
4. Prepare the data to train a model: check if data types are appropriate, get rid of the missing values, etc.
5. Train a few standard classification algorithms; note and comment on their performances along different metrics.
6. Build the ensemble models and compare the results with the base models. Note: Random forest can be used only with Decision trees.
7. Compare performances of all the models
References: ● Data analytics use cases in Banking ● Machine Learning for Financial Marketing
Context
Problem Statement
Customer Personality Analysis is a detailed analysis of a company’s ideal customers. It helps a business to better understand its customers. It makes it easier for them to modify products according to the specific needs, behaviors, and concerns of different types of customers.
Customer personality analysis helps a business to modify its product based on its target customers from different types of customer segments. For example, instead of spending money to market a new product to every customer in the company’s database, a company can analyze which customer segment is most likely to buy the product and then market the product only to that particular segment.
Content
Attributes
People
- ID: Customer's unique identifier
- Year_Birth: Customer's birth year
- Education: Customer's education level
- Marital_Status: Customer's marital status
- Income: Customer's yearly household income
- Kidhome: Number of children in customer's household
- Teenhome: Number of teenagers in customer's household
- Dt_Customer: Date of customer's enrollment with the company
- Recency: Number of days since customer's last purchase
- Complain: 1 if the customer complained in the last 2 years, 0 otherwise
Products
- MntWines: Amount spent on wine in last 2 years
- MntFruits: Amount spent on fruits in last 2 years
- MntMeatProducts: Amount spent on meat in last 2 years
- MntFishProducts: Amount spent on fish in last 2 years
- MntSweetProducts: Amount spent on sweets in last 2 years
- MntGoldProds: Amount spent on gold in last 2 years
Promotion
- NumDealsPurchases: Number of purchases made with a discount
- AcceptedCmp1: 1 if the customer accepted the offer in the 1st campaign, 0 otherwise
- AcceptedCmp2: 1 if the customer accepted the offer in the 2nd campaign, 0 otherwise
- AcceptedCmp3: 1 if the customer accepted the offer in the 3rd campaign, 0 otherwise
- AcceptedCmp4: 1 if the customer accepted the offer in the 4th campaign, 0 otherwise
- AcceptedCmp5: 1 if the customer accepted the offer in the 5th campaign, 0 otherwise
- Response: 1 if the customer accepted the offer in the last campaign, 0 otherwise
Place
- NumWebPurchases: Number of purchases made through the company’s website
- NumCatalogPurchases: Number of purchases made using a catalog
- NumStorePurchases: Number of purchases made directly in stores
- NumWebVisitsMonth: Number of visits to the company’s website in the last month
Target
Need to perform clustering to summarize customer segments.
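The clustering target stated above can be illustrated with a minimal k-means sketch. This is only one of many possible clustering methods and is not stated to be the intended one. The two features (income and total spend across the Mnt* columns), their units, and the sample customers are all hypothetical; the first k points seed the centroids so that runs are repeatable, whereas real work would normally use scaled features and a library implementation.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids for repeatability."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers: (Income in thousands, total Mnt* spend in hundreds).
customers = [(20, 1), (85, 12), (25, 2), (90, 14), (22, 1.5), (80, 11)]
labels, centroids = kmeans(customers, k=2)
```

In this toy sample the low-income, low-spend customers end up in one cluster and the high-income, high-spend customers in the other.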
Inspiration
Happy learning.
I hope you like this dataset. Please don't forget to like it.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Customer Personality Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/imakash3011/customer-personality-analysis on 21 November 2021.
--- Dataset description provided by original source is as follows ---
Problem Statement
Customer Personality Analysis is a detailed analysis of a company’s ideal customers. It helps a business to better understand its customers and makes it easier for them to modify products according to the specific needs, behaviors and concerns of different types of customers.
Customer personality analysis helps a business to modify its product based on its target customers from different types of customer segments. For example, instead of spending money to market a new product to every customer in the company’s database, a company can analyze which customer segment is most likely to buy the product and then market the product only on that particular segment.
Attributes
People
Products
Promotion
Place
Need to perform clustering to summarize customer segments.
happy learning....
Hope you like this dataset please don't forget to like this dataset
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
Uplift modeling is an important yet novel area of research in machine learning which aims to explain and estimate the causal impact of a treatment at the individual level. In the digital advertising industry, the treatment is exposure to different ads, and uplift modeling is used to direct marketing efforts towards the users for whom it is most efficient. The data is a collection of 13 million samples from a randomized control trial, scaling up previously available datasets by a healthy 590x factor.
The dataset was created by the Criteo AI Lab. It consists of 13M rows, each one representing a user with 12 features, a treatment indicator and 2 binary labels (visits and conversions). Positive labels mean the user visited/converted on the advertiser website during the test period (2 weeks). The global treatment ratio is 84.6%. It is usual for advertisers to keep only a small control population, as it costs them potential revenue.
Following is a detailed description of the features:
The data was provided for the paper "A Large Scale Benchmark for Uplift Modeling":
https://s3.us-east-2.amazonaws.com/criteo-uplift-dataset/large-scale-benchmark.pdf
For privacy reasons the data has been sub-sampled non-uniformly so that the original incrementality level cannot be deduced from the dataset while preserving a realistic, challenging benchmark. Feature names have been anonymized and their values randomly projected so as to keep predictive power while making it practically impossible to recover the original features or user context.
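Because the data comes from a randomized control trial, the simplest quantity to compute is the average uplift: the conversion rate of the treated group minus that of the control group. The sketch below shows this on made-up rows. The field names `treatment` and `conversion` are assumptions based on the description (a treatment indicator and binary labels), and this is only the population-level baseline, not an individual-level uplift model.

```python
def conversion_rate(rows, treated):
    """Share of users in one arm of the trial who converted."""
    group = [r for r in rows if r["treatment"] == treated]
    return sum(r["conversion"] for r in group) / len(group)


def average_uplift(rows):
    """Difference in conversion rate between the treated and control arms."""
    return conversion_rate(rows, 1) - conversion_rate(rows, 0)


# Made-up rows: 10 treated users (3 convert), 5 control users (1 converts).
rows = (
    [{"treatment": 1, "conversion": 1}] * 3
    + [{"treatment": 1, "conversion": 0}] * 7
    + [{"treatment": 0, "conversion": 1}] * 1
    + [{"treatment": 0, "conversion": 0}] * 4
)
uplift = average_uplift(rows)  # 0.3 - 0.2
```

As noted above, the published data was sub-sampled non-uniformly, so a figure computed this way on the real file should not be read as the original incrementality level.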
We can foresee related usages such as but not limited to:
Most small to medium business owners make effective use of Gmail-based email marketing strategies for offline targeting, converting prospective customers into leads so that they stay with the business.
The dataset captures different aspects of each email to characterize it and to track whether it was ignored, read, or acknowledged by the reader.
corefactors.in
If the amount of advertising dollars spent on a product determines the amount of its sales, regression analysis can be used to quantify the relationship between advertising and sales. Here we want everyone to experiment with this fun data and see what value can be derived from email as a tool for campaign marketing in the multi-channel marketing strategy of a small to medium business.
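The regression idea can be sketched with ordinary least squares on a single predictor. The numbers below are made up purely to show the mechanics and do not come from the dataset; `ad_spend` and `sales` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up advertising spend and sales figures (same arbitrary currency unit).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]
slope, intercept = fit_line(ad_spend, sales)
```

Here the slope is the estimated change in sales per extra unit of advertising spend. A fitted slope describes an association in the data and does not by itself show that spend causes sales.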
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Personal Loan Modeling’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/teertha/personal-loan-modeling on 30 September 2021.
--- Dataset description provided by original source is as follows ---
This case is about a bank (Thera Bank) whose management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors). A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9% success. This has encouraged the retail marketing department to devise campaigns with better target marketing to increase the success ratio with minimal budget.
The file Bank.xls contains data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.
There are no empty or NaN values in the dataset. The dataset has a mix of numerical and categorical attributes, but all categorical data are represented with numbers. Moreover, some of the predictor variables are heavily skewed (long-tailed), making the data pre-processing an interesting yet not too challenging aspect of the data.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Bank_Personal_Loan_Modelling’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/krantiswalke/bank-personal-loan-modelling on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Data Description: The file Bank.xls contains data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.
Domain: Banking
Context: This case is about a bank (Thera Bank) whose management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors). A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9% success. This has encouraged the retail marketing department to devise campaigns with better target marketing to increase the success ratio with minimal budget.
Learning Outcomes:
1. Exploratory Data Analysis
2. Preparing the data to train a model
3. Training and making predictions using a classification model
4. Model evaluation
Objective: The classification goal is to predict the likelihood of a liability customer buying personal loans.
Steps and tasks:
1. Read the column description and ensure you understand each attribute well
2. Study the data distribution in each attribute and share your findings
3. Get the target column distribution
4. Split the data into training and test sets in the ratio of 70:30 respectively
5. Use different classification models (Logistic, K-NN and Naïve Bayes) to predict the likelihood of a liability customer buying personal loans
6. Print the confusion matrix for all the above models
7. Give your reasoning on which model is best in this case and why it performs better
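Two of the steps above, the 70:30 split and the confusion matrix, can be sketched without any modeling library. The helper names, the fixed seed, and the labels below are illustrative assumptions; the predictions are made up and no classifier is trained here.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(actual, predicted):
    """Count outcomes as {(actual, predicted): n} for binary labels 0/1."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts


# Ten placeholder customer ids split 70:30.
customer_ids = list(range(10))
train, test = train_test_split(customer_ids)

# Made-up labels: 1 = accepted the personal loan, 0 = did not.
actual = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
predicted = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
cm = confusion_matrix(actual, predicted)
```

Since only about 9.6% of customers accepted the loan, the (1, 0) and (0, 1) cells are usually more informative than overall accuracy when comparing models.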
References: 1. Data analytics use cases in Banking 2. Machine Learning for Financial Marketing
--- Original source retains full ownership of the source dataset ---
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
There are two types of cyclists: those who purchase casual tickets and those who purchase annual memberships. The marketing team believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, there is a very good chance to convert casual riders into members.
https://creativecommons.org/publicdomain/zero/1.0/
The airline implemented a promotional campaign, aimed at enhancing program enrollment. The dataset encompasses information regarding the signups, enrollment, cancellations within the loyalty program, and supplementary customer demographics.
Field | Description |
---|---|
Loyalty Number | Customer's unique loyalty number |
Year | Year of the period |
Month | Month of the period |
Flights Booked | Number of flights booked for member only in the period |
Flights with Companions | Number of flights booked with additional passengers in the period |
Total Flights | Sum of Flights Booked and Flights with Companions |
Distance | Flight distance traveled in the period (km) |
Points Accumulated | Loyalty points accumulated in the period |
Points Redeemed | Loyalty points redeemed in the period |
Dollar Cost Points Redeemed | Dollar equivalent for points redeemed in the period in CDN |
Field | Description |
---|---|
Loyalty Number | Customer's unique loyalty number |
Country | Country of residence |
Province | Province of residence |
City | City of residence |
Postal Code | Postal code of residence |
Gender | Gender |
Education | Highest education level (High school or lower > College > Bachelor > Master > Doctor) |
Salary | Annual income |
Marital Status | Marital status (Single, Married, Divorced) |
Loyalty Card | Loyalty card status (Star > Nova > Aurora) |
CLV | Customer lifetime value - total invoice value for all flights ever booked by member |
Enrollment Type | Enrollment type (Standard / 2018 Promotion) |
Enrollment Year | Year Member enrolled in membership program |
Enrollment Month | Month Member enrolled in membership program |
Cancellation Year | Year Member cancelled their membership |
Cancellation Month | Month Member cancelled their membership |
Reference:
www.ibm.com. (n.d.). IBM Documentation. [online] Available at: https://www.ibm.com/docs/en/cognos-analytics/ [Accessed 6 Dec. 2023].
Attribute details
Customer id: the ID of the customer contacted
age: the age of each customer
salary: the monthly salary of the customer
balance: the cash balance in the customer's bank account
marital: the marital status of each customer
jobedu: the job and education of each customer
default: a categorical variable with the values ‘yes’ and ‘no’, where
yes - the customer has defaulted on a loan
no - the customer has not defaulted on any loan
housing: a categorical variable with the values ‘yes’ and ‘no’, where
yes - the customer has taken a housing loan
no - the customer has not taken a housing loan
loan: a categorical variable with the values ‘yes’ and ‘no’, where
yes - the customer has taken a personal loan
no - the customer has not taken a personal loan
contact: the means through which the customer was contacted, either ‘cellular’ or ‘telephone’; ‘unknown’ means no information
day: the day of the month on which the customer was contacted
month: the month in which the customer was contacted during the campaign
duration: the total call duration for each customer
campaign: the number of campaigns in which the customer was contacted
pdays: the number of days since the customer was last reached by the bank for any of the other products (not term deposit); the value ‘-1’ means the customer has never been reached for any product
previous: the number of times the customer has been reached in previous campaigns or for any of the other products (not term deposit)
poutcome: the outcome of the previous reach-outs for any of the products (other than term deposits) provided by the bank
Unknown - the customer has not been reached so far
Success - the previous call was a successful conversion of the customer
Failure - the customer was not interested in the last product
Other - during the previous call, the customer did not give a definite answer
response: whether the customer has opened the term deposit account or not
The bank.csv dataset describes phone calls between customers and the customer care staff of a Portuguese banking institution. It records whether the customer took up a scheme or product such as a bank term deposit. Most of the fields hold ‘yes’ or ‘no’ values.
The main goal is to predict if clients will subscribe to a term deposit or not.
Bank Client Data:
1 - age: (numeric)
2 - job: type of job (categorical: admin., blue-collar, entrepreneur, housemaid, management, retired, self-employed, services, student, technician, unemployed, unknown)
3 - marital: marital status (categorical: divorced, married, single, unknown; note: divorced means either divorced or widowed)
4 - education: (categorical: basic.4y, basic.6y, basic.9y, high.school, illiterate, professional.course, university.degree, unknown)
5 - default: has credit in default? (categorical: no, yes, unknown)
6 - housing: has housing loan? (categorical: no, yes, unknown)
7 - loan: has personal loan? (categorical: no, yes, unknown)
Related to the Last Contact of the Current Campaign:
8 - contact: contact communication type (categorical: cellular, telephone)
9 - month: last contact month of year (categorical: jan, feb, mar, ..., nov, dec)
10 - day_of_week: last contact day of the week (categorical: mon, tue, wed, thu, fri)
11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
Other Attributes:
12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client (numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical: failure, nonexistent, success)
Social and Economic Context Attributes:
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)
Output Variable (Desired Target):
21 - y (deposit): has the client subscribed to a term deposit? (binary: yes, no). The column title was changed from 'y' to 'deposit'.
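Two notes in the attribute list affect how a realistic model should be prepared: duration should be dropped because it is only known after the call, and 999 in pdays is a sentinel rather than a real number of days. The sketch below is one possible way to handle both. The helper name, the extra `previously_contacted` flag, and the sample record are assumptions for illustration.

```python
PDAYS_NOT_CONTACTED = 999  # sentinel value given in the attribute description


def prepare_row(row, realistic=True):
    """Return (features, target) from one raw record given as a dict of strings."""
    features = dict(row)
    # Target column is named 'deposit' in this version of the data.
    target = 1 if features.pop("deposit") == "yes" else 0
    if realistic:
        # Call duration is unknown before the call, so it would leak the outcome.
        features.pop("duration", None)
    pdays = int(features.pop("pdays"))
    # Keep the "never contacted" information as a flag instead of a fake day count.
    features["previously_contacted"] = pdays != PDAYS_NOT_CONTACTED
    features["pdays"] = None if pdays == PDAYS_NOT_CONTACTED else pdays
    return features, target


# Hypothetical record with a subset of the columns.
raw = {
    "age": "41",
    "duration": "180",
    "pdays": "999",
    "poutcome": "nonexistent",
    "deposit": "no",
}
features, target = prepare_row(raw)
```

Passing `realistic=False` keeps duration, which matches the benchmark-only use the description allows.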
https://creativecommons.org/publicdomain/zero/1.0/
For this project I took a limited data set, then cleaned, organized, and transformed it into a presentation for the fictitious company. The aim was to showcase my ability to analyze, clean, and visualize datasets to find insights and present them in a way that could represent a real-life scenario.
This case study was the capstone project at the end of my online course from Google in Data Analytics. It uses a relatively large but limited data spreadsheet from a fictitious bicycle share service. In the scenario, you were required to answer the first of three questions posed to the data analyst with a limited data set: “How do annual members differ from casual users?” This was to help the marketing lead build a new campaign to convert casual users of the service into annual members, based on findings from the financial analysts.
Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path. By the end of this lesson, you will have a portfolio-ready case study.
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
The datasets contain the previous 12 months of Cyclistic trip data. The datasets have a different name because Cyclistic is a fictional company. For the purposes of this case study, the datasets are appropriate and will enable you to answer business questions.
This data has been made available by Motivate International Inc. under this license. This is public data that you can use to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit you from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.
Research question: How do annual members and casual riders use Cyclistic bikes differently?
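One way to start on this research question is to compare ride counts and average ride length per rider type. The sketch below is only an illustration: the column names `member_casual`, `started_at`, and `ended_at` are assumptions about how the trip files are laid out (they are not given in this description), and the three rows are invented toy values, not real Cyclistic data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def ride_summary(trips):
    """Return {rider_type: (ride_count, mean_minutes)} for a list of trip dicts."""
    durations = defaultdict(list)
    for trip in trips:
        start = datetime.fromisoformat(trip["started_at"])
        end = datetime.fromisoformat(trip["ended_at"])
        minutes = (end - start).total_seconds() / 60
        if minutes > 0:  # skip zero or negative durations (likely bad records)
            durations[trip["member_casual"]].append(minutes)
    return {kind: (len(vals), mean(vals)) for kind, vals in durations.items()}


# Toy rows, invented purely to show the input shape (assumed column names).
sample_trips = [
    {"member_casual": "member", "started_at": "2021-06-01 08:00:00", "ended_at": "2021-06-01 08:12:00"},
    {"member_casual": "member", "started_at": "2021-06-01 17:30:00", "ended_at": "2021-06-01 17:40:00"},
    {"member_casual": "casual", "started_at": "2021-06-05 13:00:00", "ended_at": "2021-06-05 13:45:00"},
]

summary = ride_summary(sample_trips)
for rider_type, (count, avg_minutes) in summary.items():
    print(f"{rider_type}: {count} rides, {avg_minutes:.1f} min on average")
```

With real trip files, the same function could be fed rows read through `csv.DictReader`, provided the columns carry the assumed names.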
PowerCo is a BCG client that supplies gas and electricity to SME (small and medium enterprise) and residential customers. Following the liberalization of the energy market in Europe, PowerCo wants to detect customers who are likely to churn, particularly in the SME segment. One hypothesis under consideration by the client is that customers' price sensitivity affects the likelihood of churn. PowerCo is therefore considering a marketing strategy of offering a 20% discount to customers with a high propensity to churn. We will predict the probability of customer churn and deliver actionable insights based on the data provided by PowerCo. This project is part of the Data Science & Advanced Analytics Virtual Experience Program by Boston Consulting Group. You can hone your data science skills through this program.
- forecast_meter_rent_12m: forecasted bill of meter rental for the next 12 months
- forecast_price_energy_p1: forecasted energy price for the 1st period
- forecast_price_energy_p2: forecasted energy price for the 2nd period
- forecast_price_pow_p1: forecasted power price for the 1st period
- has_gas: indicates whether the client is also a gas client
- imp_cons: current paid consumption
- margin_gross_pow_ele: gross margin on power subscription
- margin_net_pow_ele: net margin on power subscription
- nb_prod_act: number of active products and services
- net_margin: total net margin
- num_years_antig: antiquity (tenure) of the client, in years
- origin_up: code of the electricity campaign the customer first subscribed to
- pow_max: subscribed power
- price_date: reference date
- price_p1_var: price of energy for the 1st period
- price_p2_var: price of energy for the 2nd period
- price_p3_var: price of energy for the 3rd period
- price_p1_fix: price of power for the 1st period
- price_p2_fix: price of power for the 2nd period
- price_p3_fix: price of power for the 3rd period
- churned: whether the client churned over the next 3 months
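The price-sensitivity hypothesis can be given a first rough check by measuring how much each customer's energy price moved across the `price_date` rows and comparing that spread between churned and retained customers. This is a minimal sketch under stated assumptions: the `id` key linking price rows to the churn label is hypothetical (no such field appears in the list above), `churned` is assumed to be coded 0/1, and the numbers are toy values made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def price_range_by_customer(price_rows):
    """Map each customer id to (max - min) of price_p1_var over its price_date rows."""
    prices = defaultdict(list)
    for row in price_rows:
        prices[row["id"]].append(row["price_p1_var"])
    return {cid: max(vals) - min(vals) for cid, vals in prices.items()}


def mean_range_by_churn(price_rows, churn_by_id):
    """Average price range for churned (1) and retained (0) customers."""
    groups = defaultdict(list)
    for cid, spread in price_range_by_customer(price_rows).items():
        if cid in churn_by_id:  # ignore customers without a churn label
            groups[churn_by_id[cid]].append(spread)
    return {label: mean(vals) for label, vals in groups.items()}


# Toy rows; "id" is an assumed join key and all values are invented.
price_rows = [
    {"id": "a", "price_date": "2015-01-01", "price_p1_var": 0.25},
    {"id": "a", "price_date": "2015-02-01", "price_p1_var": 0.5},
    {"id": "b", "price_date": "2015-01-01", "price_p1_var": 0.5},
    {"id": "b", "price_date": "2015-02-01", "price_p1_var": 0.5},
    {"id": "c", "price_date": "2015-01-01", "price_p1_var": 0.25},
    {"id": "c", "price_date": "2015-02-01", "price_p1_var": 0.375},
]
churn_by_id = {"a": 1, "b": 0, "c": 0}

spread_by_churn = mean_range_by_churn(price_rows, churn_by_id)
print(spread_by_churn)
```

A larger average spread among churned customers would be consistent with the hypothesis, but it is only a descriptive comparison, not a churn probability model.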
https://creativecommons.org/publicdomain/zero/1.0/
Dataset Description
- Customer Demographics: Includes FullName, Gender, Age, CreditScore, and MonthlyIncome. These variables provide a demographic snapshot of the customer base, allowing for segmentation and targeted marketing analysis.
- Geographical Data: Comprising Country, State, and City, this section facilitates location-based analytics, market penetration studies, and regional sales performance.
- Product Information: Details like Category, Product, Cost, and Price enable product trend analysis, profitability assessment, and inventory optimization.
- Transactional Data: Captures the customer journey through SessionStart, CartAdditionTime, OrderConfirmation, OrderConfirmationTime, PaymentMethod, and SessionEnd. This rich temporal data can be used for funnel analysis, conversion rate optimization, and customer behavior modeling.
- Post-Purchase Details: With OrderReturn and ReturnReason, analysts can delve into return rate calculations, post-purchase satisfaction, and quality control.
Types of Analysis
- Descriptive Analytics: Understand basic metrics like average monthly income, most common product categories, and typical credit scores.
- Predictive Analytics: Use machine learning to predict credit risk or the likelihood of a purchase based on demographics and session activity.
- Customer Segmentation: Group customers by demographics or purchasing behavior to tailor marketing strategies.
- Geospatial Analysis: Examine sales distribution across different regions and optimize logistics.
- Time Series Analysis: Study the seasonality of purchases and session activities over time.
- Funnel Analysis: Evaluate the customer journey from session start to order confirmation and identify drop-off points.
- Cohort Analysis: Track customer cohorts over time to understand retention and repeat purchase patterns.
- Market Basket Analysis: Discover product affinities and develop cross-selling strategies.
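As an illustration of the funnel analysis mentioned in the list above, the sketch below counts how many sessions reach each stage and the conversion rate between consecutive stages. It assumes that a session without a cart addition has an empty `CartAdditionTime` and that `OrderConfirmation` is a true/false flag; the description does not state either encoding, and the four rows are invented toy values.

```python
def funnel_counts(sessions):
    """Count sessions reaching each stage: session start, cart addition, order."""
    carted = [s for s in sessions if s.get("CartAdditionTime")]
    ordered = [s for s in carted if s.get("OrderConfirmation")]
    return {
        "sessions": len(sessions),
        "cart_additions": len(carted),
        "orders": len(ordered),
    }


def step_conversion(counts):
    """Conversion rate from each funnel stage to the next one."""
    steps = list(counts.items())
    rates = {}
    for (prev_name, prev_n), (name, n) in zip(steps, steps[1:]):
        rates[f"{prev_name} -> {name}"] = n / prev_n if prev_n else 0.0
    return rates


# Toy sessions; the encodings of the two fields are assumptions.
sample_sessions = [
    {"SessionStart": "2024-01-01 09:00:00", "CartAdditionTime": None, "OrderConfirmation": False},
    {"SessionStart": "2024-01-01 09:30:00", "CartAdditionTime": None, "OrderConfirmation": False},
    {"SessionStart": "2024-01-01 10:00:00", "CartAdditionTime": "2024-01-01 10:05:00", "OrderConfirmation": False},
    {"SessionStart": "2024-01-01 11:00:00", "CartAdditionTime": "2024-01-01 11:02:00", "OrderConfirmation": True},
]

counts = funnel_counts(sample_sessions)
rates = step_conversion(counts)
print(counts)
print(rates)
```

The stage with the lowest conversion rate is the main drop-off point and a natural place to look first.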
Curious about how I created the data? Feel free to click here and take a peek! 😉
📊🔍 Good Luck and Happy Analysing 🔍📊