73 datasets found
  1. Ecommerce Dataset for Data Analysis

    • kaggle.com
    zip
    Updated Sep 19, 2024
    Cite
    Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
    Available download formats: zip (2028853 bytes)
    Dataset updated
    Sep 19, 2024
    Authors
    Shrishti Manja
    Description

    This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

    About the Dataset:
    - CID (Customer ID): A unique identifier for each customer.
    - TID (Transaction ID): A unique identifier for each transaction.
    - Gender: The gender of the customer, categorized as Male or Female.
    - Age Group: Age group of the customer, divided into several ranges.
    - Purchase Date: The timestamp of when the transaction took place.
    - Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
    - Discount Availed: Indicates whether the customer availed any discount (Yes/No).
    - Discount Name: Name of the discount applied (e.g., FESTIVE50).
    - Discount Amount (INR): The amount of discount availed by the customer.
    - Gross Amount: The total amount before applying any discount.
    - Net Amount: The final amount after applying the discount.
    - Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
    - Location: The city where the purchase took place.
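    Since Net Amount is described as the amount after applying the discount, one quick sanity check is whether Net Amount equals Gross Amount minus Discount Amount (INR). The sketch below uses only the standard library and assumes the CSV headers are spelled exactly like the column names above; the sample rows are invented for illustration (T3 is deliberately inconsistent), and `find_inconsistent_rows` is a hypothetical helper name.

```python
import csv
import io

# Invented sample rows; real header spelling in the file is an assumption.
SAMPLE = """CID,TID,Discount Availed,Discount Amount (INR),Gross Amount,Net Amount
101,T1,Yes,150.0,1200.0,1050.0
102,T2,No,0.0,800.0,800.0
103,T3,Yes,50.0,500.0,460.0
"""

def find_inconsistent_rows(text, tol=0.01):
    """Return TIDs whose Net Amount differs from Gross - Discount by more than tol."""
    bad = []
    for row in csv.DictReader(io.StringIO(text)):
        gross = float(row["Gross Amount"])
        discount = float(row["Discount Amount (INR)"])
        net = float(row["Net Amount"])
        if abs((gross - discount) - net) > tol:
            bad.append(row["TID"])
    return bad

print(find_inconsistent_rows(SAMPLE))  # ['T3']
```

    Any TIDs returned are rows where the assumed relation does not hold, which is worth investigating before modeling purchase amounts.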

    Use Cases:
    1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
    2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
    3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
    4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
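    As a small EDA starting point, and a first look at the "will this customer avail a discount?" target, one can compute the share of transactions with a discount per product category. This is a standard-library sketch over a few made-up records that reuse the documented column names; the helper name is hypothetical.

```python
from collections import defaultdict

# Made-up records using the documented column names.
records = [
    {"Product Category": "Electronics", "Discount Availed": "Yes"},
    {"Product Category": "Electronics", "Discount Availed": "No"},
    {"Product Category": "Apparel", "Discount Availed": "Yes"},
    {"Product Category": "Apparel", "Discount Availed": "Yes"},
]

def discount_rate_by_category(rows):
    """Fraction of transactions per category where a discount was availed."""
    totals = defaultdict(int)
    availed = defaultdict(int)
    for r in rows:
        category = r["Product Category"]
        totals[category] += 1
        if r["Discount Availed"].strip().lower() == "yes":
            availed[category] += 1
    return {c: availed[c] / totals[c] for c in totals}

print(discount_rate_by_category(records))  # {'Electronics': 0.5, 'Apparel': 1.0}
```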

    This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

    This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.

  2. Automobile Dataset For EDA Python And R

    • kaggle.com
    zip
    Updated Nov 15, 2023
    Cite
    Anubhav Kumar Gupta (2023). Automobile Dataset For EDA Python And R [Dataset]. https://www.kaggle.com/datasets/anubhavkumargupta/automobile-dataset-for-eda-python-and-r
    Available download formats: zip (4923 bytes)
    Dataset updated
    Nov 15, 2023
    Authors
    Anubhav Kumar Gupta
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Anubhav Kumar Gupta

    Released under Apache 2.0

    Contents

  3. Project Python- Data Cleaning - EDA- Visualization

    • kaggle.com
    zip
    Updated Dec 10, 2023
    Cite
    Hussein Al Chami (2023). Project Python- Data Cleaning - EDA- Visualization [Dataset]. https://www.kaggle.com/datasets/husseinalchami/project-python-data-cleaning-eda-visualization
    Available download formats: zip (322085 bytes)
    Dataset updated
    Dec 10, 2023
    Authors
    Hussein Al Chami
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Hussein Al Chami

    Released under MIT

    Contents

  4. EDA Using Python

    • kaggle.com
    zip
    Updated Dec 6, 2024
    Cite
    Rashmi Tiwari (2024). EDA Using Python [Dataset]. https://www.kaggle.com/datasets/rosetiwari/eda-using-python
    Available download formats: zip (192125 bytes)
    Dataset updated
    Dec 6, 2024
    Authors
    Rashmi Tiwari
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by Rashmi Tiwari

    Released under CC0: Public Domain

    Contents

  5. Capstone Project TikTok - EDA

    • kaggle.com
    zip
    Updated Nov 15, 2023
    Cite
    Sohail K. Nikouzad (2023). Capstone Project TikTok - EDA [Dataset]. https://www.kaggle.com/datasets/sohailnikouzad/capstone-pr0ject-tiktok-eda
    Available download formats: zip (52324 bytes)
    Dataset updated
    Nov 15, 2023
    Authors
    Sohail K. Nikouzad
    Description

    Dataset

    This dataset was created by Sohail K. Nikouzad

    Contents

  6. Chocolate Sales Analysis using Python/EDA

    • kaggle.com
    zip
    Updated May 3, 2023
    Cite
    Deepak Kalvankar (2023). Chocolate Sales Analysis using Python/EDA [Dataset]. https://www.kaggle.com/datasets/deepakkalvankar/chocolate-sales-analysis-using-python
    Available download formats: zip (191259 bytes)
    Dataset updated
    May 3, 2023
    Authors
    Deepak Kalvankar
    Description

    Dataset

    This dataset was created by Deepak Kalvankar

    Contents

  7. EDA on Car Sales Dataset in Ukraine

    • kaggle.com
    zip
    Updated Jan 13, 2023
    Cite
    Swati Khedekar (2023). EDA on Car Sales Dataset in Ukraine [Dataset]. https://www.kaggle.com/datasets/swatikhedekar/eda-on-car-sales-dataset-in-ukraine
    Available download formats: zip (508971 bytes)
    Dataset updated
    Jan 13, 2023
    Authors
    Swati Khedekar
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    Ukraine
    Description

    1. Problem statement:

    This dataset contains data on more than 9.5k car sales in Ukraine. Most of them are used cars, which opens the possibility of analyzing features related to car operation. It is a subset of all car data in Ukraine. Using it, we analyze various parameters of used car sales in Ukraine.

    1.1 Introduction: This exploratory data analysis practices the Python skills learned so far on a structured dataset, including loading, inspecting, wrangling, exploring, and drawing conclusions from the data. The notebook records observations at each step to explain thoroughly how to approach the dataset. Based on these observations, some questions are also answered in the notebook for reference, though not all of them are explored in the analysis.

    1.2 Data Source and Dataset:

    a. How was it collected?

    Name: Car Sales
    Sponsoring Organization: Don't know
    Year: 2019
    Description: This is a case study of more than 9.5k car sales in Ukraine.

    b. Is it a sample? If yes, is it properly sampled?

    Yes, it is a sample. We don't have official information about the data collection method, but it appears not to be a random sample, so we can assume that it is not representative.

  8. Cleaned Netflix Dataset for EDA

    • kaggle.com
    zip
    Updated Jul 7, 2025
    Cite
    Nikhil raman K (2025). Cleaned Netflix Dataset for EDA [Dataset]. https://www.kaggle.com/datasets/nikhilramank/cleaned-netflix-dataset-for-eda
    Available download formats: zip (750797 bytes)
    Dataset updated
    Jul 7, 2025
    Authors
    Nikhil raman K
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This is a cleaned version of a Netflix movies dataset prepared for exploratory data analysis (EDA). Missing values have been handled, invalid rows removed, and numerical and categorical columns cleaned for analysis using Python and Pandas.

  9. Electronics Store Sales Dataset for EDA

    • kaggle.com
    zip
    Updated Feb 13, 2021
    Cite
    Sinjoy Saha (2021). Electronics Store Sales Dataset for EDA [Dataset]. https://www.kaggle.com/sinjoysaha/sales-analysis-dataset
    Available download formats: zip (2505035 bytes)
    Dataset updated
    Feb 13, 2021
    Authors
    Sinjoy Saha
    Description

    Content

    This is transaction data from an electronics store chain in the US. The data consists of 12 CSV files, one for each month of 2019, following the naming convention Sales_[MONTH_NAME]_2019. Each file contains roughly 9,000 to 26,000 rows and 6 columns: Order ID, Product, Quantity Ordered, Price Each, Order Date, and Purchase Address. Combined, the 12 monthly files contain around 186,851 data points. Some rows may contain null values.
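    Because the data is split into monthly files and may contain nulls, a typical first step is to load all 12 files into one table and set aside incomplete rows. The sketch below assumes English month names and a .csv extension (e.g. Sales_January_2019.csv); the helper `load_year` is hypothetical, and dropping rows is only one of several reasonable ways to handle nulls.

```python
import calendar
import csv
from pathlib import Path

# Column names as listed in the dataset description.
COLUMNS = ["Order ID", "Product", "Quantity Ordered", "Price Each",
           "Order Date", "Purchase Address"]

def load_year(folder):
    """Read all 12 monthly files and skip rows with any empty field.

    Assumes files are named like Sales_January_2019.csv.
    Returns (kept_rows, number_of_skipped_rows).
    """
    kept, skipped = [], 0
    for month in calendar.month_name[1:]:  # 'January' ... 'December'
        path = Path(folder) / f"Sales_{month}_2019.csv"
        with path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                # Missing trailing fields come back as None, so treat None as empty.
                if any(not (row.get(col) or "").strip() for col in COLUMNS):
                    skipped += 1
                    continue
                kept.append(row)
    return kept, skipped
```

    Usage would look like `rows, skipped = load_year("path/to/sales_data")`, where the folder path is a placeholder. Comparing `len(rows)` against the roughly 186,851 data points stated above gives a quick check on how much was discarded.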

    Inspiration

    Keith Galli

    Acknowledgements

  10. Udemy Dataset - EDA using Python

    • kaggle.com
    zip
    Updated Nov 20, 2022
    Cite
    Bhagya sree (2022). Udemy Dataset - EDA using Python [Dataset]. https://www.kaggle.com/datasets/bhagya20/udemy-dataset-eda-using-python
    Available download formats: zip (3478 bytes)
    Dataset updated
    Nov 20, 2022
    Authors
    Bhagya sree
    Description

    Dataset

    This dataset was created by Bhagya sree

    Contents

  11. BI intro to data cleaning eda and machine learning

    • kaggle.com
    zip
    Updated Nov 17, 2025
    Cite
    Walekhwa Tambiti Leo Philip (2025). BI intro to data cleaning eda and machine learning [Dataset]. https://www.kaggle.com/datasets/walekhwatlphilip/intro-to-data-cleaning-eda-and-machine-learning/suggestions
    Available download formats: zip (9961 bytes)
    Dataset updated
    Nov 17, 2025
    Authors
    Walekhwa Tambiti Leo Philip
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Real-World Data Science Challenge

    Business Intelligence Program Strategy — Student Success Optimization

    Hosted by: Walsoft Computer Institute

    Background

    Walsoft Computer Institute runs a Business Intelligence (BI) training program for students from diverse educational, geographical, and demographic backgrounds. The institute has collected detailed data on student attributes, entry exams, study effort, and final performance in two technical subjects: Python Programming and Database Systems.

    As part of an internal review, the leadership team has hired you — a Data Science Consultant — to analyze this dataset and provide clear, evidence-based recommendations on how to improve:

    • Admissions decision-making
    • Academic support strategies
    • Overall program impact and ROI

    Your Mission

    Answer this central question:

    “Using the BI program dataset, how can Walsoft strategically improve student success, optimize resources, and increase the effectiveness of its training program?”

    Key Strategic Areas

    You are required to analyze and provide actionable insights for the following three areas:

    1. Admissions Optimization

    Should entry exams remain the primary admissions filter?

    Your task is to evaluate the predictive power of entry exam scores compared to other features such as prior education, age, gender, and study hours.

    ✅ Deliverables:

    • Feature importance ranking for predicting Python and DB scores
    • Admission policy recommendation (e.g., retain exams, add screening tools, adjust thresholds)
    • Business rationale and risk analysis

    2. Curriculum Support Strategy

    Are there at-risk student groups who need extra support?

    Your task is to uncover whether certain backgrounds (e.g., prior education level, country, residence type) correlate with poor performance and recommend targeted interventions.

    ✅ Deliverables:

    • At-risk segment identification
    • Support program design (e.g., prep course, mentoring)
    • Expected outcomes, costs, and KPIs
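    One simple way to surface candidate at-risk segments is to compare each group's mean score with the overall mean and flag groups that fall clearly below it. The sketch below assumes a margin of 5 points, which is an arbitrary illustrative threshold; the helper name and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def at_risk_segments(rows, group_key, score_key, margin=5.0):
    """Return groups whose mean score is more than `margin` below the overall mean."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r[score_key])
    overall = mean(r[score_key] for r in rows)
    return {
        group: round(mean(scores), 1)
        for group, scores in groups.items()
        if mean(scores) < overall - margin
    }

# Invented toy rows using the dataset's column names.
students = [
    {"prevEducation": "High School", "Python": 55},
    {"prevEducation": "High School", "Python": 60},
    {"prevEducation": "Bachelor", "Python": 80},
    {"prevEducation": "Diploma", "Python": 85},
]
print(at_risk_segments(students, "prevEducation", "Python"))  # {'High School': 57.5}
```

    The same call works for other background columns such as country or residence; with real data, small groups should be checked for sample size before drawing conclusions.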

    3. Resource Allocation & Program ROI

    How can we allocate resources for maximum student success?

    Your task is to segment students by success profiles and suggest differentiated teaching/facility strategies.

    ✅ Deliverables:

    • Performance drivers
    • Student segmentation
    • Resource allocation plan and ROI projection

    🛠️ Dataset Overview

    Column          Description
    fNAME, lNAME    Student first and last name
    Age             Student age (21–71 years)
    gender          Gender (standardized as "Male"/"Female")
    country         Student’s country of origin
    residence       Student housing/residence type
    entryEXAM       Entry test score (28–98)
    prevEducation   Prior education (High School, Diploma, etc.)
    studyHOURS      Total study hours logged
    Python          Final Python exam score
    DB              Final Database exam score

    📊 Dataset

    You are provided with a real-world messy dataset that reflects the types of issues data scientists face every day — from inconsistent formatting to missing values.

    Raw Dataset (Recommended for Full Project)

    Download: bi.csv

    This dataset includes common data quality challenges:

    • Country name inconsistencies
      e.g. Norge → Norway, RSA → South Africa, UK → United Kingdom

    • Residence type variations
      e.g. BI-Residence, BIResidence, BI_Residence → unify to BI Residence

    • Education level typos and casing issues
      e.g. Barrrchelors → Bachelor, DIPLOMA, Diplomaaa → Diploma

    • Gender value noise
      e.g. M, F, female → standardize to Male / Female

    • Missing scores in Python subject
      Fill NaN values using the column mean or another suitable imputation strategy
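    The string issues listed above can be handled with small normalization functions. This sketch hard-codes only the example mappings given in the challenge text (Norge, RSA, UK; the BI Residence variants; Barrrchelors, DIPLOMA, Diplomaaa; M/F/female), so it is not a general typo fixer; the function names are hypothetical.

```python
import re

# Mappings taken from the examples in the challenge description.
COUNTRY_MAP = {"norge": "Norway", "rsa": "South Africa", "uk": "United Kingdom"}
GENDER_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

def clean_country(value):
    v = value.strip()
    return COUNTRY_MAP.get(v.lower(), v)

def clean_residence(value):
    # BI-Residence, BIResidence, BI_Residence -> BI Residence
    return re.sub(r"^BI[\s_-]*Residence$", "BI Residence",
                  value.strip(), flags=re.IGNORECASE)

def clean_education(value):
    v = value.strip().lower()
    # Hand-written prefix rules covering the listed typos only.
    if v.startswith(("barrrchelor", "bachelor")):
        return "Bachelor"
    if v.startswith("diploma"):
        return "Diploma"
    return value.strip()

def clean_gender(value):
    return GENDER_MAP.get(value.strip().lower(), value.strip())

print(clean_country("Norge"), clean_residence("BI_Residence"),
      clean_education("Diplomaaa"), clean_gender("f"))
```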

    Participants using this dataset are expected to apply data cleaning techniques such as:
    • String standardization
    • Null value imputation
    • Type correction (e.g., scores as float)
    • Validation and visual verification
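    Type correction, null imputation, and validation can be combined in a few lines. The sketch below converts raw score strings to floats, fills missing Python scores with the column mean (one of the strategies the brief mentions), and flags values outside a given range such as the 28–98 entryEXAM range from the Dataset Overview. The raw values and helper names are hypothetical, and `statistics.mean` raises an error if every score is missing.

```python
import math
from statistics import mean

def to_float(value):
    """Convert a raw score to float; blanks and unparsable values become NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan

def impute_mean(scores):
    """Replace NaN scores with the mean of the observed scores."""
    observed = [s for s in scores if not math.isnan(s)]
    fill = mean(observed)
    return [fill if math.isnan(s) else s for s in scores]

def out_of_range(values, low, high):
    """Indices of values outside [low, high]."""
    return [i for i, v in enumerate(values) if not low <= v <= high]

raw_python = ["78", "", "nan", "90", "84.0"]
python_scores = impute_mean([to_float(v) for v in raw_python])
print(python_scores)                              # [78.0, 84.0, 84.0, 90.0, 84.0]
print(out_of_range([28, 55, 98, 101], 28, 98))    # [3]
```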

    Bonus: Submissions that use and clean this dataset will earn additional Technical Competency points.

    Cleaned Dataset (Optional Shortcut)

    Download: cleaned_bi.csv

    This version has been fully standardized and preprocessed:
    • All fields cleaned and renamed consistently
    • Missing Python scores filled with th...

  12. Amazon Prime EDA & Python dashboard

    • kaggle.com
    Updated Aug 7, 2025
    Cite
    Nikhil raman K (2025). Amazon Prime EDA & Python dashboard [Dataset]. https://www.kaggle.com/datasets/nikhilramank/amazon-prime-eda-and-python-dashboard
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 7, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nikhil raman K
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by Nikhil raman K

    Released under CC0: Public Domain

    Contents

  13. Chicago_Crimes_2005_to_2007

    • kaggle.com
    zip
    Updated Jul 17, 2020
    Cite
    Shivam Chaurasia (2020). Chicago_Crimes_2005_to_2007 [Dataset]. https://www.kaggle.com/datasets/shivamchaurasia/chicago-crimes-2005-to-2007
    Available download formats: zip (373130398 bytes)
    Dataset updated
    Jul 17, 2020
    Authors
    Shivam Chaurasia
    Area covered
    Chicago
    Description

    Dataset

    This dataset was created by Shivam Chaurasia

    Contents

  14. Spotify tracks

    • kaggle.com
    zip
    Updated Aug 29, 2024
    Cite
    BharathiD8 (2024). Spotify tracks [Dataset]. https://www.kaggle.com/datasets/bharathid8/spotify-tracks/code
    Available download formats: zip (16221953 bytes)
    Dataset updated
    Aug 29, 2024
    Authors
    BharathiD8
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by BharathiD8

    Released under Apache 2.0

    Contents

  15. Cleaned Auto Dataset 1985

    • kaggle.com
    zip
    Updated Oct 3, 2021
    Cite
    Faisal Moiz Hussain (2021). Cleaned Auto Dataset 1985 [Dataset]. https://www.kaggle.com/faisalmoizhussain/cleaned-auto-dataset-1985
    Available download formats: zip (10027 bytes)
    Dataset updated
    Oct 3, 2021
    Authors
    Faisal Moiz Hussain
    Description

    Context

    Tailor-made data for applying machine learning models, on which newcomers can easily perform their EDA.

    The data consists of the features of the four-wheelers available on the market in 1985. The task is to predict the price of the car using Linear Regression, PCA, SVM-R, etc.

  16. Shopping Mall Customer Data Segmentation Analysis

    • kaggle.com
    zip
    Updated Aug 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DataZng (2024). Shopping Mall Customer Data Segmentation Analysis [Dataset]. https://www.kaggle.com/datasets/datazng/shopping-mall-customer-data-segmentation-analysis
    Available download formats: zip (5890828 bytes)
    Dataset updated
    Aug 4, 2024
    Authors
    DataZng
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Demographic Analysis of Shopping Behavior: Insights and Recommendations

    Dataset Information: The Shopping Mall Customer Segmentation Dataset comprises 15,079 unique entries, featuring Customer ID, age, gender, annual income, and spending score. This dataset assists in understanding customer behavior for strategic marketing planning.

    Cleaned Data Details: The data has been cleaned and standardized, yielding 15,079 unique entries with the attributes Customer ID, age, gender, annual income, and spending score. Marketing analysts can use it to build better mall-specific marketing strategies.
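    A very simple way to start segmenting on annual income and spending score is a median split, which yields four quadrants (e.g. "high income / low spend"). This is a deliberately basic rule-based sketch, not k-means or any method the dataset author used; the dictionary keys and helper name are hypothetical stand-ins for the real column names.

```python
from statistics import median

def segment(customers):
    """Label each customer by whether income and spending score exceed the median."""
    income_median = median(c["annual_income"] for c in customers)
    spend_median = median(c["spending_score"] for c in customers)
    labels = {}
    for c in customers:
        income = "high income" if c["annual_income"] > income_median else "low income"
        spend = "high spend" if c["spending_score"] > spend_median else "low spend"
        labels[c["customer_id"]] = f"{income} / {spend}"
    return labels
```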

    Challenges Faced:
    1. Data Cleaning: Overcoming inconsistencies and missing values required meticulous attention.
    2. Statistical Analysis: Interpreting demographic data accurately demanded collaborative effort.
    3. Visualization: Crafting informative visuals to convey insights effectively posed design challenges.

    Research Topics:
    1. Consumer Behavior Analysis: Exploring psychological factors driving purchasing decisions.
    2. Market Segmentation Strategies: Investigating effective targeting based on demographic characteristics.

    Suggestions for Project Expansion:
    1. Incorporate External Data: Integrate social media analytics or geographic data to enrich customer insights.
    2. Advanced Analytics Techniques: Explore advanced statistical methods and machine learning algorithms for deeper analysis.
    3. Real-Time Monitoring: Develop tools for agile decision-making through continuous customer behavior tracking.

    This summary outlines the demographic analysis of shopping behavior, highlighting key insights, dataset characteristics, team contributions, challenges, research topics, and suggestions for project expansion. Leveraging these insights can enhance marketing strategies and drive business growth in the retail sector.

    References
    - OpenAI. (2022). ChatGPT [Computer software]. Retrieved from https://openai.com/chatgpt.
    - Mustafa, Z. (2022). Shopping Mall Customer Segmentation Data [Data set]. Kaggle. Retrieved from https://www.kaggle.com/datasets/zubairmustafa/shopping-mall-customer-segmentation-data
    - Donkeys. (n.d.). Kaggle Python API [Jupyter Notebook]. Kaggle. Retrieved from https://www.kaggle.com/code/donkeys/kaggle-python-api/notebook
    - Pandas-Datareader. (n.d.). Retrieved from https://pypi.org/project/pandas-datareader/

  17. Customer Sale Dataset for Data Visualization

    • kaggle.com
    Updated Jun 6, 2025
    Cite
    Atul (2025). Customer Sale Dataset for Data Visualization [Dataset]. https://www.kaggle.com/datasets/atulkgoyl/customer-sale-dataset-for-visualization
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Atul
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This synthetic dataset is designed specifically for practicing data visualization and exploratory data analysis (EDA) using popular Python libraries like Seaborn, Matplotlib, and Pandas.

    Unlike most public datasets, this one includes a diverse mix of column types:

    📅 Date columns (for time series and trend plots)
    🔢 Numerical columns (for histograms, boxplots, scatter plots)
    🏷️ Categorical columns (for bar charts, group analysis)

    Whether you are a beginner learning how to visualize data or an intermediate user testing new charting techniques, this dataset offers a versatile playground.

    Feel free to:

    • Create EDA notebooks
    • Practice plotting techniques
    • Experiment with filtering, grouping, and aggregations

    🛠️ No missing values and no data cleaning needed: just download and start exploring!

    Hope you find this helpful. Looking forward to hearing from you all.
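    For the date columns, a common first aggregation is a per-month total, which feeds directly into a trend plot. This standard-library sketch assumes ISO-formatted dates (YYYY-MM-DD); the column names `order_date` and `sales` and the helper name are hypothetical, since the dataset's actual columns are not listed here.

```python
from collections import defaultdict
from datetime import date

def monthly_totals(rows, date_key="order_date", value_key="sales"):
    """Sum a numeric column per calendar month, keyed as 'YYYY-MM'."""
    totals = defaultdict(float)
    for r in rows:
        d = date.fromisoformat(r[date_key])
        totals[f"{d.year}-{d.month:02d}"] += float(r[value_key])
    return dict(sorted(totals.items()))
```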

  18. ZOMATO BANGALORE EDA

    • kaggle.com
    zip
    Updated Sep 15, 2025
    Cite
    Anshika Srivastava (2025). ZOMATO BANGALORE EDA [Dataset]. https://www.kaggle.com/datasets/anshikasri62/zomato-banglore-eda
    Available download formats: zip (1246927 bytes)
    Dataset updated
    Sep 15, 2025
    Authors
    Anshika Srivastava
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    Bengaluru
    Description

    Exploratory Data Analysis (EDA) of the ZOMATO BANGALORE DATASET using Python and its libraries (Pandas, Matplotlib, and Seaborn). The analysis covers restaurant distribution, top cuisines, rating distribution, cost for two, and other insights.

    Included files:
    - NOTEBOOK: ZOMATO_EDA.ipynb
    - IMAGES: Visualizations of key insights
    - requirement.txt: Python dependencies
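    A "top cuisines" ranking usually means splitting each restaurant's cuisine string and counting individual cuisines. The sketch below assumes cuisines are stored as comma-separated strings such as "North Indian, Chinese", with some entries missing; that format and the helper name are assumptions, not details confirmed by this dataset's description.

```python
from collections import Counter

def top_cuisines(cuisine_strings, n=3):
    """Count individual cuisines across comma-separated strings; skip blanks."""
    counts = Counter()
    for s in cuisine_strings:
        if not s:
            continue
        counts.update(c.strip() for c in s.split(",") if c.strip())
    return counts.most_common(n)

print(top_cuisines(["North Indian, Chinese", "Chinese, Thai",
                    "North Indian, Chinese", None, ""], n=2))
# [('Chinese', 3), ('North Indian', 2)]
```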

  19. Google Advanced Data Analytics Capstone

    • kaggle.com
    zip
    Updated Aug 26, 2023
    Cite
    Tim Goosey (2023). Google Advanced Data Analytics Capstone [Dataset]. https://www.kaggle.com/datasets/timgoosey/google-advanced-data-analytics-capstone
    Available download formats: zip (113112 bytes)
    Dataset updated
    Aug 26, 2023
    Authors
    Tim Goosey
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Capstone project for Google Advanced Data Analytics. The dataset is used to build predictive models that provide insights to the HR department of a large consulting firm, which wants to improve employee satisfaction at the company. Data cleaning, EDA, visualization, and modeling were completed in Python.

  20. Hotel Reservation Booking - EDA in Python

    • kaggle.com
    zip
    Updated May 14, 2024
    Cite
    Radhika Padsala (2024). Hotel Reservation Booking - EDA in Python [Dataset]. https://www.kaggle.com/radhikapadsala/hotel-reservation-booking-eda-in-python
    Available download formats: zip (589595 bytes)
    Dataset updated
    May 14, 2024
    Authors
    Radhika Padsala
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Radhika Padsala

    Released under MIT

    Contents
