
Online_Retail_II

kaggle.com

Updated Jul 2, 2025

Cite

Shah Nawaj (2025). Online_Retail_II [Dataset]. https://www.kaggle.com/datasets/shahnawaj9/online-retail/data

Dataset updated

Jul 2, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Shah Nawaj

License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Cleaned & Merged UCI Online Retail Dataset (Dec 2009 – Dec 2011)

This dataset is a cleaned and merged version of the original UCI Online Retail and Online Retail II datasets. It contains transaction data from a UK-based online retailer, covering a period from December 2009 to December 2011.

The original UCI Online Retail II dataset contains two separate sheets:
- Year 2009–2010
- Year 2010–2011

These have been merged with the original UCI Online Retail dataset to create a unified and continuous dataset.

Cleaning and Preprocessing Performed

Merged all sheets into a single dataset
Removed:
- Rows with negative or zero quantity
- Rows with negative or zero price
- Rows with missing customer_id
Created:
- total_price column (quantity × price)
- is_cancelled column based on invoice format or return flag
Standardized:
- invoicedate formatting
- Column names and data types
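
The cleaning steps listed above can be sketched in pandas. This is a minimal illustration, not the author's actual script: the raw column names (`Invoice`, `Quantity`, `Price`, `Customer ID`, `InvoiceDate`) are assumed to follow the UCI Online Retail II layout, the function name `clean_transactions` is hypothetical, and `is_cancelled` is assumed to be derived only from the leading 'C' in the invoice number.

```python
import pandas as pd


def clean_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps to an already-merged transactions frame."""
    df = raw.copy()
    # Standardize column names: lower-case, spaces replaced by underscores.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Assumption: a cancellation is an invoice number starting with 'C'.
    df["is_cancelled"] = df["invoice"].astype(str).str.startswith("C")
    # Remove rows with non-positive quantity or price, or a missing customer_id.
    keep = (df["quantity"] > 0) & (df["price"] > 0) & df["customer_id"].notna()
    df = df.loc[keep].copy()
    # Standardize the timestamp and compute the per-line total.
    df["invoicedate"] = pd.to_datetime(df["invoicedate"])
    df["total_price"] = df["quantity"] * df["price"]
    return df.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny made-up sample, for illustration only.
    raw = pd.DataFrame({
        "Invoice": ["489434", "C489449"],
        "Quantity": [12, -1],
        "InvoiceDate": ["2009-12-01 07:45:00", "2009-12-01 10:33:00"],
        "Price": [2.5, 2.5],
        "Customer ID": [13085.0, 16321.0],
    })
    print(clean_transactions(raw))
```

Under these assumptions, the quantity filter drops any 'C' invoice with a negative quantity, so `is_cancelled` would be False for most or all remaining rows; the published file may have been built differently.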

Column Definitions

Column	Description
`invoice`	Invoice number (returns start with 'C')
`stockcode`	Product code
`description`	Description of product
`quantity`	Number of items purchased
`invoicedate`	Date and time of invoice
`price`	Unit price in GBP
`customer_id`	Unique identifier for each customer
`country`	Customer’s country
`is_cancelled`	Boolean flag for cancelled transactions
`total_price`	Computed total (`quantity × price`) for each line item

Included Files and Descriptions

File	Type	Description
`online_retail_cleaned.csv`	Data	Cleaned and merged retail transactions from 2009–2011
`rfm_final_score.csv`	Output	Final RFM scores for each customer with segment labels
`Retail_Data_Analysis_Dashboard.xlsx`	Excel	Interactive Excel dashboard with KPIs, CLV, monthly trends
`Retail_Data_Analysis_Dashboard.png`	Image	Visual preview of the Excel dashboard
`RFM_Segmentation.sql`	SQL	SQL logic to calculate RFM scores and assign segments
`Cohort_Analysis_on_Customer.sql`	SQL	Cohort analysis based on acquisition month
`Cohort_Analysis_on_Revenue.sql`	SQL	Cohort revenue tracking over time

Dataset Summary

Time range: December 2009 – December 2011
Data combined from all three sheets (the two Online Retail II sheets and the original Online Retail sheet)
Most customers are from the United Kingdom
Fully cleaned and ready for use in analysis or modeling

Applications

Market basket analysis
RFM segmentation
Cohort and retention analysis
Customer lifetime value modeling
Time series forecasting

Included Analysis & Dashboards

In addition to the cleaned data file, the dataset includes the following analysis artifacts:

1. Excel Dashboard

Summary metrics: Total Revenue, Orders, Customers, AOV
Turnover by year
Customer Lifetime Value segmentation (High, Medium, Low)
Monthly customer acquisition and churn trend
Country-wise revenue
Key business recommendations
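
The summary metrics named above can be reproduced from `online_retail_cleaned.csv` along these lines. The definitions are assumptions, since the dashboard's formulas are not stated here: an order is taken to be a distinct `invoice`, AOV is total revenue divided by the number of orders, and the function names are hypothetical.

```python
import pandas as pd


def summary_kpis(df: pd.DataFrame) -> dict:
    """Total revenue, order count, customer count, and average order value."""
    revenue = float(df["total_price"].sum())
    orders = int(df["invoice"].nunique())        # assumption: one order per invoice
    customers = int(df["customer_id"].nunique())
    return {
        "total_revenue": revenue,
        "orders": orders,
        "customers": customers,
        "aov": revenue / orders if orders else 0.0,
    }


def turnover_by_year(df: pd.DataFrame) -> pd.Series:
    """Sum of total_price per calendar year of invoicedate (datetime column)."""
    return df.groupby(df["invoicedate"].dt.year)["total_price"].sum()
```

Both functions expect `invoicedate` to be parsed as a datetime, for example by passing `parse_dates=["invoicedate"]` to `pd.read_csv`.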

2. SQL-Based RFM Segmentation

RFM scores (1–5 scale)
Segment grouping (e.g., Champions, At Risk, Loyal Customers)
Monetary value distributions
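
As a rough pandas counterpart to the SQL logic, RFM scores on a 1–5 scale can be computed by quintile. This sketch is not a translation of `RFM_Segmentation.sql`: the quintile method, the snapshot date argument, and the segment thresholds in `label_segment` are illustrative assumptions, and it needs at least five customers.

```python
import pandas as pd


def quintile_score(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Map values to scores 1-5 by quintile; ties are broken by row order."""
    ranks = values.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, 5, labels=[1, 2, 3, 4, 5]).astype(int)


def label_segment(row: pd.Series) -> str:
    """Hypothetical thresholds; the dataset's real segment rules may differ."""
    r, f = row["r_score"], row["f_score"]
    if r >= 4 and f >= 4:
        return "Champions"
    if r <= 2 and f >= 3:
        return "At Risk"
    if f >= 3:
        return "Loyal Customers"
    return "Other"


def rfm_table(df: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """One row per customer with recency, frequency, monetary, and scores."""
    rfm = df.groupby("customer_id").agg(
        last_purchase=("invoicedate", "max"),
        frequency=("invoice", "nunique"),
        monetary=("total_price", "sum"),
    )
    # Recency in whole days between the snapshot date and the last purchase.
    rfm["recency"] = (snapshot - rfm["last_purchase"]).dt.days
    # A smaller recency is better, so its ranking is reversed.
    rfm["r_score"] = quintile_score(rfm["recency"], higher_is_better=False)
    rfm["f_score"] = quintile_score(rfm["frequency"])
    rfm["m_score"] = quintile_score(rfm["monetary"])
    rfm["segment"] = rfm[["r_score", "f_score"]].apply(label_segment, axis=1)
    return rfm
```

Ranking before binning keeps `pd.qcut` from failing on duplicate bin edges, which is common for frequency because many customers have only one or two invoices.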

3. SQL-Based Cohort Analysis

Monthly cohorts based on acquisition date
Retention matrix for month-over-month analysis
Supports churn and lifecycle evaluation
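
A retention matrix of the kind described can be sketched in pandas as follows. It assumes that a customer's acquisition month is the month of their first invoice and that a customer counts as retained in a month if they have at least one invoice in it; the function name is hypothetical and the logic is not taken from `Cohort_Analysis_on_Customer.sql`.

```python
import pandas as pd


def retention_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Rows: acquisition month. Columns: months since acquisition. Values: share retained."""
    d = df[["customer_id", "invoicedate"]].copy()
    # Month index as an integer so month differences are plain subtraction.
    order_idx = d["invoicedate"].dt.year * 12 + d["invoicedate"].dt.month
    cohort_idx = order_idx.groupby(d["customer_id"]).transform("min")
    d["period"] = order_idx - cohort_idx
    # Assumption: acquisition month is the month of the first invoice.
    first_purchase = d.groupby("customer_id")["invoicedate"].transform("min")
    d["cohort_month"] = first_purchase.dt.to_period("M")
    # Distinct active customers per cohort and month offset.
    counts = (
        d.groupby(["cohort_month", "period"])["customer_id"]
        .nunique()
        .unstack(fill_value=0)
    )
    # Divide each row by its cohort size (the period-0 column).
    return counts.div(counts[0], axis=0)
```

Column 0 is always 1.0 by construction. Zeros in the later columns of recent cohorts reflect the end of the data, not churn.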

These files are provided in .xlsx and .sql formats and can be used for further business analysis or modeling.

Source

Original datasets:
- UCI Online Retail II: https://archive.ics.uci.edu/ml/datasets/Online+Retail+II

This version was cleaned and merged by: Md Shah Nawaj

Online_Retail_II

Merged and cleaned transaction data from UCI Online Retail II (2009–2011)

4 scholarly articles cite this dataset.

