74 datasets found
  1. Ecommerce Dataset for Data Analysis

    • kaggle.com
    zip
    Updated Sep 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
    Explore at:
    zip(2028853 bytes)Available download formats
    Dataset updated
    Sep 19, 2024
    Authors
    Shrishti Manja
    Description

    This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

    About the Dataset: - CID (Customer ID): A unique identifier for each customer. - TID (Transaction ID): A unique identifier for each transaction. - Gender: The gender of the customer, categorized as Male or Female. - Age Group: Age group of the customer, divided into several ranges. - Purchase Date: The timestamp of when the transaction took place. - Product Category: The category of the product purchased, such as Electronics, Apparel, etc. - Discount Availed: Indicates whether the customer availed any discount (Yes/No). - Discount Name: Name of the discount applied (e.g., FESTIVE50). - Discount Amount (INR): The amount of discount availed by the customer. - Gross Amount: The total amount before applying any discount. - Net Amount: The final amount after applying the discount. - Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.). - Location: The city where the purchase took place.

    Use Cases: 1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data. 2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis. 3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts. 4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.

    This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

    This is not a real dataset. This dataset was generated using Python's Faker library for the sole purpose of learning

  2. Automobile Dataset For EDA Python And R

    • kaggle.com
    zip
    Updated Nov 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anubhav Kumar Gupta (2023). Automobile Dataset For EDA Python And R [Dataset]. https://www.kaggle.com/datasets/anubhavkumargupta/automobile-dataset-for-eda-python-and-r
    Explore at:
    zip(4923 bytes)Available download formats
    Dataset updated
    Nov 15, 2023
    Authors
    Anubhav Kumar Gupta
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Anubhav Kumar Gupta

    Released under Apache 2.0

    Contents

  3. Project Python- Data Cleaning - EDA- Visualization

    • kaggle.com
    zip
    Updated Dec 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hussein Al Chami (2023). Project Python- Data Cleaning - EDA- Visualization [Dataset]. https://www.kaggle.com/datasets/husseinalchami/project-python-data-cleaning-eda-visualization
    Explore at:
    zip(322085 bytes)Available download formats
    Dataset updated
    Dec 10, 2023
    Authors
    Hussein Al Chami
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Hussein Al Chami

    Released under MIT

    Contents

  4. EDA Using Python

    • kaggle.com
    zip
    Updated Dec 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rashmi Tiwari (2024). EDA Using Python [Dataset]. https://www.kaggle.com/datasets/rosetiwari/eda-using-python
    Explore at:
    zip(192125 bytes)Available download formats
    Dataset updated
    Dec 6, 2024
    Authors
    Rashmi Tiwari
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Rashmi Tiwari

    Released under CC0: Public Domain

    Contents

  5. Capstone Project TikTok - EDA

    • kaggle.com
    zip
    Updated Nov 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sohail K. Nikouzad (2023). Capstone Project TikTok - EDA [Dataset]. https://www.kaggle.com/datasets/sohailnikouzad/capstone-pr0ject-tiktok-eda
    Explore at:
    zip(52324 bytes)Available download formats
    Dataset updated
    Nov 15, 2023
    Authors
    Sohail K. Nikouzad
    Description

    Dataset

    This dataset was created by Sohail K. Nikouzad

    Contents

  6. EDA on Car Sales Dataset in Ukraine

    • kaggle.com
    zip
    Updated Jan 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Swati Khedekar (2023). EDA on Car Sales Dataset in Ukraine [Dataset]. https://www.kaggle.com/datasets/swatikhedekar/eda-on-car-sales-dataset-in-ukraine
    Explore at:
    zip(508971 bytes)Available download formats
    Dataset updated
    Jan 13, 2023
    Authors
    Swati Khedekar
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Ukraine
    Description

    1. Problem statemont:

    This dataset contains data more than 9.5k car sales in Ukraine.Most of then are used car so it open the possibility to analyze featurs related to car operation. This is subset of all car data in Ukraine. Using this we will analyze the various parameter of used car sales in Ukraine.

    1.1 Introduction: This Exploratory Data Analysis is to practice python skills till now on a structured dataset including loading, inspecting,wrangling,Exploring and drawing conclusions from the data.The notebook has the obeservations with each step in order to explain throughtly how to approach the dataset. Based on the obseravation some quetions also are answered in the notebook for the reference though not all them are explored in the analysis.

    1.2 Data Source and Dataset: a. How was it collected?

    Name: Car Sales Sponsering Organization: Dont Know! Year :2019 Description: This is case study of more than 9.5k car sales in Ukraine. b. it is sample? If yes ,What is properly sampled?

    Yes .It is sample .We dont have official information about the data collection method, but its appears not to be random sample, so we can assume that it is not representative.

  7. Chocolate Sales Analysis using Python/EDA

    • kaggle.com
    zip
    Updated May 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Deepak Kalvankar (2023). Chocolate Sales Analysis using Python/EDA [Dataset]. https://www.kaggle.com/datasets/deepakkalvankar/chocolate-sales-analysis-using-python
    Explore at:
    zip(191259 bytes)Available download formats
    Dataset updated
    May 3, 2023
    Authors
    Deepak Kalvankar
    Description

    Dataset

    This dataset was created by Deepak Kalvankar

    Contents

  8. Cleaned Netflix Dataset for EDA

    • kaggle.com
    zip
    Updated Jul 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nikhil raman K (2025). Cleaned Netflix Dataset for EDA [Dataset]. https://www.kaggle.com/datasets/nikhilramank/cleaned-netflix-dataset-for-eda
    Explore at:
    zip(750797 bytes)Available download formats
    Dataset updated
    Jul 7, 2025
    Authors
    Nikhil raman K
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a cleaned version of a Netflix movies dataset prepared for exploratory data analysis (EDA). Missing values have been handled, invalid rows removed, and numerical + categorical columns cleaned for analysis using Python and Pandas.

  9. Electronics Store Sales Dataset for EDA

    • kaggle.com
    zip
    Updated Feb 13, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sinjoy Saha (2021). Electronics Store Sales Dataset for EDA [Dataset]. https://www.kaggle.com/sinjoysaha/sales-analysis-dataset
    Explore at:
    zip(2505035 bytes)Available download formats
    Dataset updated
    Feb 13, 2021
    Authors
    Sinjoy Saha
    Description

    Content

    This is a transactions data from an Electronics store chain in the US. The data contains 12 CSV files for each month of 2019. The naming convention is as follows: Sales_[MONTH_NAME]_2019 Each file contains anywhere from around 9000 to 26000 rows and 6 columns. The columns are as follows: Order ID, Product, Quantity Ordered, Price Each, Order Date, Purchase Address There are around 186851 data points combining all the 12-month files. There may be null values in some rows.

    Inspiration

    Keith Galli

    Acknowledgements

  10. Udemy Dataset - EDA using Python

    • kaggle.com
    zip
    Updated Nov 20, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bhagya sree (2022). Udemy Dataset - EDA using Python [Dataset]. https://www.kaggle.com/datasets/bhagya20/udemy-dataset-eda-using-python
    Explore at:
    zip(3478 bytes)Available download formats
    Dataset updated
    Nov 20, 2022
    Authors
    Bhagya sree
    Description

    Dataset

    This dataset was created by Bhagya sree

    Contents

  11. BI intro to data cleaning eda and machine learning

    • kaggle.com
    zip
    Updated Nov 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Walekhwa Tambiti Leo Philip (2025). BI intro to data cleaning eda and machine learning [Dataset]. https://www.kaggle.com/datasets/walekhwatlphilip/intro-to-data-cleaning-eda-and-machine-learning/suggestions
    Explore at:
    zip(9961 bytes)Available download formats
    Dataset updated
    Nov 17, 2025
    Authors
    Walekhwa Tambiti Leo Philip
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Real-World Data Science Challenge

    Business Intelligence Program Strategy — Student Success Optimization

    Hosted by: Walsoft Computer Institute 📁 Download dataset 👤 Kaggle profile

    Background

    Walsoft Computer Institute runs a Business Intelligence (BI) training program for students from diverse educational, geographical, and demographic backgrounds. The institute has collected detailed data on student attributes, entry exams, study effort, and final performance in two technical subjects: Python Programming and Database Systems.

    As part of an internal review, the leadership team has hired you — a Data Science Consultant — to analyze this dataset and provide clear, evidence-based recommendations on how to improve:

    • Admissions decision-making
    • Academic support strategies
    • Overall program impact and ROI

    Your Mission

    Answer this central question:

    “Using the BI program dataset, how can Walsoft strategically improve student success, optimize resources, and increase the effectiveness of its training program?”

    Key Strategic Areas

    You are required to analyze and provide actionable insights for the following three areas:

    1. Admissions Optimization

    Should entry exams remain the primary admissions filter?

    Your task is to evaluate the predictive power of entry exam scores compared to other features such as prior education, age, gender, and study hours.

    ✅ Deliverables:

    • Feature importance ranking for predicting Python and DB scores
    • Admission policy recommendation (e.g., retain exams, add screening tools, adjust thresholds)
    • Business rationale and risk analysis

    2. Curriculum Support Strategy

    Are there at-risk student groups who need extra support?

    Your task is to uncover whether certain backgrounds (e.g., prior education level, country, residence type) correlate with poor performance and recommend targeted interventions.

    ✅ Deliverables:

    • At-risk segment identification
    • Support program design (e.g., prep course, mentoring)
    • Expected outcomes, costs, and KPIs

    3. Resource Allocation & Program ROI

    How can we allocate resources for maximum student success?

    Your task is to segment students by success profiles and suggest differentiated teaching/facility strategies.

    ✅ Deliverables:

    • Performance drivers
    • Student segmentation
    • Resource allocation plan and ROI projection

    🛠️ Dataset Overview

    ColumnDescription
    fNAME, lNAMEStudent first and last name
    AgeStudent age (21–71 years)
    genderGender (standardized as "Male"/"Female")
    countryStudent’s country of origin
    residenceStudent housing/residence type
    entryEXAMEntry test score (28–98)
    prevEducationPrior education (High School, Diploma, etc.)
    studyHOURSTotal study hours logged
    PythonFinal Python exam score
    DBFinal Database exam score

    📊 Dataset

    You are provided with a real-world messy dataset that reflects the types of issues data scientists face every day — from inconsistent formatting to missing values.

    Raw Dataset (Recommended for Full Project)

    Download: bi.csv

    This dataset includes common data quality challenges:

    • Country name inconsistencies
      e.g. Norge → Norway, RSA → South Africa, UK → United Kingdom

    • Residence type variations
      e.g. BI-Residence, BIResidence, BI_Residence → unify to BI Residence

    • Education level typos and casing issues
      e.g. Barrrchelors → Bachelor, DIPLOMA, DiplomaaaDiploma

    • Gender value noise
      e.g. M, F, female → standardize to Male / Female

    • Missing scores in Python subject
      Fill NaN values using column mean or suitable imputation strategy

    Participants using this dataset are expected to apply data cleaning techniques such as: - String standardization - Null value imputation - Type correction (e.g., scores as float) - Validation and visual verification

    Bonus: Submissions that use and clean this dataset will earn additional Technical Competency points.

    Cleaned Dataset (Optional Shortcut)

    Download: cleaned_bi.csv

    This version has been fully standardized and preprocessed: - All fields cleaned and renamed consistently - Missing Python scores filled with th...

  12. Chicago_Crimes_2005_to_2007

    • kaggle.com
    zip
    Updated Jul 17, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shivam Chaurasia (2020). Chicago_Crimes_2005_to_2007 [Dataset]. https://www.kaggle.com/datasets/shivamchaurasia/chicago-crimes-2005-to-2007
    Explore at:
    zip(373130398 bytes)Available download formats
    Dataset updated
    Jul 17, 2020
    Authors
    Shivam Chaurasia
    Area covered
    Chicago
    Description

    Dataset

    This dataset was created by Shivam Chaurasia

    Contents

  13. Amazon Prime EDA & Python dashboard

    • kaggle.com
    Updated Aug 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nikhil raman K (2025). Amazon Prime EDA & Python dashboard [Dataset]. https://www.kaggle.com/datasets/nikhilramank/amazon-prime-eda-and-python-dashboard
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 7, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Nikhil raman K
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Nikhil raman K

    Released under CC0: Public Domain

    Contents

  14. Spotify tracks

    • kaggle.com
    zip
    Updated Aug 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    BharathiD8 (2024). Spotify tracks [Dataset]. https://www.kaggle.com/datasets/bharathid8/spotify-tracks/code
    Explore at:
    zip(16221953 bytes)Available download formats
    Dataset updated
    Aug 29, 2024
    Authors
    BharathiD8
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by BharathiD8

    Released under Apache 2.0

    Contents

  15. Cleaned Auto Dataset 1985

    • kaggle.com
    zip
    Updated Oct 3, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Faisal Moiz Hussain (2021). Cleaned Auto Dataset 1985 [Dataset]. https://www.kaggle.com/faisalmoizhussain/cleaned-auto-dataset-1985
    Explore at:
    zip(10027 bytes)Available download formats
    Dataset updated
    Oct 3, 2021
    Authors
    Faisal Moiz Hussain
    Description

    Context

    Tailor made data to apply the machine learning models on the dataset. Where the newcomers can easily perform their EDA.

    The data consists of all the features of the four wheelers available in the market in 1985. We need to predict the **price of the car ** using Linear Regression or PCA or SVM-R etc.,

  16. Customer Sale Dataset for Data Visualization

    • kaggle.com
    Updated Jun 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Atul (2025). Customer Sale Dataset for Data Visualization [Dataset]. https://www.kaggle.com/datasets/atulkgoyl/customer-sale-dataset-for-visualization
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Atul
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This synthetic dataset is designed specifically for practicing data visualization and exploratory data analysis (EDA) using popular Python libraries like Seaborn, Matplotlib, and Pandas.

    Unlike most public datasets, this one includes a diverse mix of column types:

    📅 Date columns (for time series and trend plots) 🔢 Numerical columns (for histograms, boxplots, scatter plots) 🏷️ Categorical columns (for bar charts, group analysis)

    Whether you are a beginner learning how to visualize data or an intermediate user testing new charting techniques, this dataset offers a versatile playground.

    Feel free to:

    Create EDA notebooks Practice plotting techniques Experiment with filtering, grouping, and aggregations 🛠️ No missing values, no data cleaning needed — just download and start exploring!

    Hope you find this helpful. Looking forward to hearing from you all.

  17. Shopping Mall Customer Data Segmentation Analysis

    • kaggle.com
    zip
    Updated Aug 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DataZng (2024). Shopping Mall Customer Data Segmentation Analysis [Dataset]. https://www.kaggle.com/datasets/datazng/shopping-mall-customer-data-segmentation-analysis
    Explore at:
    zip(5890828 bytes)Available download formats
    Dataset updated
    Aug 4, 2024
    Authors
    DataZng
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Demographic Analysis of Shopping Behavior: Insights and Recommendations

    Dataset Information: The Shopping Mall Customer Segmentation Dataset comprises 15,079 unique entries, featuring Customer ID, age, gender, annual income, and spending score. This dataset assists in understanding customer behavior for strategic marketing planning.

    Cleaned Data Details: Data cleaned and standardized, 15,079 unique entries with attributes including - Customer ID, age, gender, annual income, and spending score. Can be used by marketing analysts to produce a better strategy for mall specific marketing.

    Challenges Faced: 1. Data Cleaning: Overcoming inconsistencies and missing values required meticulous attention. 2. Statistical Analysis: Interpreting demographic data accurately demanded collaborative effort. 3. Visualization: Crafting informative visuals to convey insights effectively posed design challenges.

    Research Topics: 1. Consumer Behavior Analysis: Exploring psychological factors driving purchasing decisions. 2. Market Segmentation Strategies: Investigating effective targeting based on demographic characteristics.

    Suggestions for Project Expansion: 1. Incorporate External Data: Integrate social media analytics or geographic data to enrich customer insights. 2. Advanced Analytics Techniques: Explore advanced statistical methods and machine learning algorithms for deeper analysis. 3. Real-Time Monitoring: Develop tools for agile decision-making through continuous customer behavior tracking. This summary outlines the demographic analysis of shopping behavior, highlighting key insights, dataset characteristics, team contributions, challenges, research topics, and suggestions for project expansion. Leveraging these insights can enhance marketing strategies and drive business growth in the retail sector.

    References OpenAI. (2022). ChatGPT [Computer software]. Retrieved from https://openai.com/chatgpt. Mustafa, Z. (2022). Shopping Mall Customer Segmentation Data [Data set]. Kaggle. Retrieved from https://www.kaggle.com/datasets/zubairmustafa/shopping-mall-customer-segmentation-data Donkeys. (n.d.). Kaggle Python API [Jupyter Notebook]. Kaggle. Retrieved from https://www.kaggle.com/code/donkeys/kaggle-python-api/notebook Pandas-Datareader. (n.d.). Retrieved from https://pypi.org/project/pandas-datareader/

  18. Google Advanced Data Analytics Capstone

    • kaggle.com
    zip
    Updated Aug 26, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tim Goosey (2023). Google Advanced Data Analytics Capstone [Dataset]. https://www.kaggle.com/datasets/timgoosey/google-advanced-data-analytics-capstone
    Explore at:
    zip(113112 bytes)Available download formats
    Dataset updated
    Aug 26, 2023
    Authors
    Tim Goosey
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Capstone project for Google Advanced Data Analytics. Dataset to build predictive models to provide insights to the HR department, of a large consulting firm. The HR department wants to improve employee satisfaction at the company. Data cleaning, EDA, visualization, and modeling was completed in Python.

  19. Bank Loan Case Study Dataset

    • kaggle.com
    zip
    Updated May 4, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shreshth Vashisht (2023). Bank Loan Case Study Dataset [Dataset]. https://www.kaggle.com/datasets/shreshthvashisht/bank-loan-case-study-dataset/discussion
    Explore at:
    zip(117814223 bytes)Available download formats
    Dataset updated
    May 4, 2023
    Authors
    Shreshth Vashisht
    Description

    This case study aims to give you an idea of applying EDA in a real business scenario. In this case study, apart from applying the techniques that you have learnt in the EDA module, you will also develop a basic understanding of risk analytics in banking and financial services and understand how data is used to minimize the risk of losing money while lending to customers.

    Business Understanding: The loan providing companies find it hard to give loans to the people due to their insufficient or non-existent credit history. Because of that, some consumers use it as their advantage by becoming a defaulter. Suppose you work for a consumer finance company which specialises in lending various types of loans to urban customers. You have to use EDA to analyse the patterns present in the data. This will ensure that the applicants capable of repaying the loan are not rejected.

    When the company receives a loan application, the company has to decide for loan approval based on the applicant’s profile. Two types of risks are associated with the bank’s decision:

    If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company. If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company. The data given below contains the information about the loan application at the time of applying for the loan. It contains two types of scenarios:

    The client with payment difficulties: he/she had late payment more than X days on at least one of the first Y instalments of the loan in our sample All other cases: All other cases when the payment is paid on time. When a client applies for a loan, there are four types of decisions that could be taken by the client/company:

    Approved: The company has approved loan application Cancelled: The client cancelled the application sometime during approval. Either the client changed her/his mind about the loan or in some cases due to a higher risk of the client he received worse pricing which he did not want. Refused: The company had rejected the loan (because the client does not meet their requirements etc.). Unused Offer: Loan has been cancelled by the client but on different stages of the process. In this case study, you will use EDA to understand how consumer attributes and loan attributes influence the tendency of default.

    Business Objectives: It aims to identify patterns which indicate if a client has difficulty paying their installments which may be used for taking actions such as denying the loan, reducing the amount of loan, lending (to risky applicants) at a higher interest rate, etc. This will ensure that the consumers capable of repaying the loan are not rejected. Identification of such applicants using EDA is the aim of this case study.

    In other words, the company wants to understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default. The company can utilize this knowledge for its portfolio and risk assessment.

    To develop your understanding of the domain, you are advised to independently research a little about risk analytics – understanding the types of variables and their significance should be enough).

    Data Understanding: Download the Dataset using the link given under dataset section on the right.

    application_data.csv contains all the information of the client at the time of application. The data is about wheather a client has payment difficulties. previous_application.csv contains information about the client’s previous loan data. It contains the data whether the previous application had been Approved, Cancelled, Refused or Unused offer. columns_descrption.csv is data dictionary which describes the meaning of the variables. You are required to provide a detailed report for the below data record mentioning the answer to the questions that follows:

    Present the overall approach of the analysis. Mention the problem statement and the analysis approach briefly Indentify the missing data and use appropriate method to deal with it. (Remove columns/or replace it with an appropriate value) Hint: Note that in EDA, since it is not necessary to replace the missing value, but if you have to replace the missing value, what should be the approach. Clearly mention the approach. Identify if there are outliers in the dataset. Also, mention why do you think it is an outlier. Again, remember that for this exercise, it is not necessary to remove any data points. Identify if there is data imbalance in the data. Find the ratio of data imbalance. Hint: Since there are a lot of columns, you can run your analysis in loops for the appropriate columns and find the insights. Explain the results of univariate, segmented univariate, bivariate analysis, etc. in business terms. Find the top 10 c...

  20. Linear Regression EDA Python

    • kaggle.com
    zip
    Updated Sep 10, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rishi Sharma (2018). Linear Regression EDA Python [Dataset]. https://www.kaggle.com/datasets/rishisharma/linear-regression-eda-python
    Explore at:
    zip(999 bytes)Available download formats
    Dataset updated
    Sep 10, 2018
    Authors
    Rishi Sharma
    Description

    Dataset

    This dataset was created by Rishi Sharma

    Contents

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
Organization logo

Ecommerce Dataset for Data Analysis

Exploratory Data Analysis, Data Visualisation and Machine Learning

Explore at:
zip(2028853 bytes)Available download formats
Dataset updated
Sep 19, 2024
Authors
Shrishti Manja
Description

This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

About the Dataset: - CID (Customer ID): A unique identifier for each customer. - TID (Transaction ID): A unique identifier for each transaction. - Gender: The gender of the customer, categorized as Male or Female. - Age Group: Age group of the customer, divided into several ranges. - Purchase Date: The timestamp of when the transaction took place. - Product Category: The category of the product purchased, such as Electronics, Apparel, etc. - Discount Availed: Indicates whether the customer availed any discount (Yes/No). - Discount Name: Name of the discount applied (e.g., FESTIVE50). - Discount Amount (INR): The amount of discount availed by the customer. - Gross Amount: The total amount before applying any discount. - Net Amount: The final amount after applying the discount. - Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.). - Location: The city where the purchase took place.

Use Cases: 1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data. 2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis. 3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts. 4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.

This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

This is not a real dataset. This dataset was generated using Python's Faker library for the sole purpose of learning

Search
Clear search
Close search
Google apps
Main menu