This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.
About the Dataset:
- CID (Customer ID): A unique identifier for each customer.
- TID (Transaction ID): A unique identifier for each transaction.
- Gender: The gender of the customer, categorized as Male or Female.
- Age Group: Age group of the customer, divided into several ranges.
- Purchase Date: The timestamp of when the transaction took place.
- Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
- Discount Availed: Indicates whether the customer availed any discount (Yes/No).
- Discount Name: Name of the discount applied (e.g., FESTIVE50).
- Discount Amount (INR): The amount of discount availed by the customer.
- Gross Amount: The total amount before applying any discount.
- Net Amount: The final amount after applying the discount.
- Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
- Location: The city where the purchase took place.
Use Cases:
1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
3. Data Visualization: Use tools like Python's Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
4. Machine Learning Applications: After feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.
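As a minimal sketch of use case 1, assuming the column names listed above, an EDA pass in pandas might look like this (a tiny inline frame stands in for the real 55,000-row file):

```python
import pandas as pd

# Hypothetical stand-in for the real CSV; column names follow the description above.
df = pd.DataFrame({
    "CID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Product Category": ["Electronics", "Apparel", "Electronics", "Apparel"],
    "Discount Availed": ["Yes", "No", "Yes", "No"],
    "Gross Amount": [1000.0, 500.0, 1200.0, 800.0],
    "Net Amount": [900.0, 500.0, 1100.0, 800.0],
})

# Summary statistics for the numeric columns.
stats = df[["Gross Amount", "Net Amount"]].describe()

# One pattern worth checking: average discount given per product category.
df["Discount"] = df["Gross Amount"] - df["Net Amount"]
avg_discount = df.groupby("Product Category")["Discount"].mean()
print(avg_discount)
```

The same `groupby` pattern extends to gender, age group, or payment method once the real file is loaded with `pd.read_csv`.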
This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.
This dataset was created by 64 Aashish Chaudhary
Released under Other (specified in description)
This is transaction data from an electronics store chain in the US. The data contains 12 CSV files, one for each month of 2019.
The naming convention is as follows: Sales_[MONTH_NAME]_2019
Each file contains anywhere from around 9000 to 26000 rows and 6 columns. The columns are as follows:
Order ID, Product, Quantity Ordered, Price Each, Order Date, Purchase Address
There are around 186,851 data points combining all 12 monthly files. There may be null values in some rows.
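A common first step with these files is concatenating all twelve months into one frame and dropping the null rows mentioned above. A sketch, assuming file names follow the Sales_[MONTH_NAME]_2019 convention (tiny stand-in files are written to a temporary directory here purely for illustration):

```python
import glob
import os
import tempfile

import pandas as pd

# Write two tiny stand-in monthly files; the real data has twelve.
tmp = tempfile.mkdtemp()
cols = "Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address"
with open(os.path.join(tmp, "Sales_January_2019.csv"), "w") as f:
    f.write(cols + "\n1,USB-C Cable,2,11.95,01/01/19 08:46,1 Main St\n")
with open(os.path.join(tmp, "Sales_February_2019.csv"), "w") as f:
    f.write(cols + "\n2,Monitor,1,149.99,02/03/19 10:12,2 Elm St\n,,,,,\n")

# Concatenate every monthly file into one frame, then drop fully-null rows.
paths = sorted(glob.glob(os.path.join(tmp, "Sales_*_2019.csv")))
frames = [pd.read_csv(p) for p in paths]
sales = pd.concat(frames, ignore_index=True).dropna(how="all")
print(len(sales))  # one row per non-null order
```

With the real files, pointing `glob` at the data directory yields the combined ~186,851-row frame in one pass.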
https://creativecommons.org/publicdomain/zero/1.0/
This is a cleaned version of a Netflix movies dataset originally used for exploratory data analysis (EDA).
Missing values have been handled using appropriate methods (mean, median, or 'unknown'), and new features such as rating_level and popular have been added for deeper analysis.
The dataset is ready for:
- EDA
- Data visualization
- Machine learning tasks
- Dashboard building
Used in the accompanying notebook
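The cleaning described above can be sketched as follows. The column names (`rating`, `duration`, `country`) are hypothetical since the exact schema is not listed here; the pattern is mean/median imputation for numerics, 'unknown' for categoricals, and a derived `rating_level` feature:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the real column names may differ.
df = pd.DataFrame({
    "rating": [7.0, np.nan, 9.0, 6.0],
    "duration": [90, 120, np.nan, 100],
    "country": ["US", None, "UK", "US"],
})

# Numeric gaps: mean for rating, median for duration; categorical gaps: 'unknown'.
df["rating"] = df["rating"].fillna(df["rating"].mean())
df["duration"] = df["duration"].fillna(df["duration"].median())
df["country"] = df["country"].fillna("unknown")

# Derived feature, as in the description: bucket ratings into levels.
df["rating_level"] = pd.cut(df["rating"], bins=[0, 6.5, 8, 10],
                            labels=["low", "medium", "high"])
print(df.isna().sum().sum())  # 0 remaining nulls
```

The bin edges here are illustrative; the actual thresholds used for `rating_level` are defined in the accompanying notebook.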
This dataset was created by syamalakumar
- ID: Unique identifier for each individual in the dataset.
- Name: Name of the individual.
- Date_of_Birth: Birth date of the individual.
- Avg_Salary: Average salary of the individual.
- Nationality: Nationality of the individual.
- Address: Address of the individual.
- Zip_Code: Zip code of the individual's address.
- Monthly_Spending(USD): Monthly spending in USD by the individual.
- Health_Condition: Health condition of the individual (e.g., Diabetes, Asthma, Depression, Cancer).
- Marital_Status: Marital status of the individual (e.g., Married, Single, Divorced).
- Education: Highest education level attained by the individual (e.g., Bachelor's, Master's, Ph.D., High School).
- Gender: Gender of the individual.
- Occupation: Occupation of the individual (e.g., Engineer, Doctor, Teacher, Businessman, Nurse, IT Professional, Chef, Scientist).
- Number_of_Child: Number of children the individual has.
- Email_Address: Email address of the individual.
- Blood_Type: Blood type of the individual.
- Property_Value: Value of the individual's property.
- Credit_Score: Credit score of the individual.
- Smoking_Habit: Smoking habit of the individual (Yes/No).
- Preferred_Social_Network: Preferred social network of the individual (e.g., Facebook, Instagram, WhatsApp, Snapchat).
Conducted an in-depth analysis of Cyclistic bike-share data to uncover customer usage patterns and trends. Cleaned and processed raw data using Python libraries such as pandas and NumPy to ensure data quality. Performed exploratory data analysis (EDA) to identify insights, including peak usage times, customer demographics, and trip duration patterns. Created visualizations using Matplotlib and Seaborn to effectively communicate findings. Delivered actionable recommendations to enhance customer engagement and optimize operational efficiency.
https://creativecommons.org/publicdomain/zero/1.0/
This data is publicly available on GitHub. It can be utilized for EDA, statistical analysis, and visualizations.
The dataset ifood_df.csv consists of 2,206 customers of XYZ company with data on:
- Customer profiles
- Product preferences
- Campaign successes/failures
- Channel performance
I do not own this dataset. I am simply making it accessible on this platform via the public GitHub link.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Chaitu Devi
Released under MIT
https://creativecommons.org/publicdomain/zero/1.0/
Steps
1. Load Data
2. Check Nulls and Update Data if required
3. Perform Descriptive Statistics
4. Data Visualization
Univariate - Single Column Visualization
categorical - countplot
continuous - histogram
Bivariate - 2 Columns Visualization
continuous vs continuous - scatterplot, regplot
categorical vs continuous - boxplot
categorical vs categorical - crosstab, heatmap
Multivariate - Multi Columns Visualization
correlation plot
pairplot
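The steps above can be sketched in a single script. This is a minimal illustration using pandas plotting on matplotlib's non-interactive Agg backend so it runs headless; seaborn's countplot, regplot, boxplot, heatmap, and pairplot are drop-in upgrades for each panel:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# 1. Load data (a tiny inline frame stands in for a real CSV).
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A"],
    "price": [10.0, 20.0, 12.0, 22.0, 11.0],
    "rating": [4.0, 3.5, 4.2, 3.8, 4.1],
})

# 2. Check nulls.
nulls = df.isna().sum()

# 3. Descriptive statistics.
desc = df.describe()

# 4. Visualization: one panel per pattern named above.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
df["category"].value_counts().plot.bar(ax=axes[0, 0])    # univariate categorical (countplot)
df["price"].plot.hist(ax=axes[0, 1])                     # univariate continuous (histogram)
axes[1, 0].scatter(df["price"], df["rating"])            # continuous vs continuous (scatterplot)
df.boxplot(column="price", by="category", ax=axes[1, 1]) # categorical vs continuous (boxplot)
corr = df[["price", "rating"]].corr()                    # multivariate: correlation matrix
fig.savefig("eda_overview.png")
```

A `pd.crosstab` of two categorical columns rendered with `sns.heatmap`, and `sns.pairplot(df)` for the multivariate view, complete the checklist.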
https://creativecommons.org/publicdomain/zero/1.0/
Context
EDA using NumPy and pandas.
In this task, I have to predict which factors make an app perform well: whether it is size, price, category, or multiple factors together, and what makes an app rank at the top of the Google Play Store.
Column description:
- App: name of the application
- Category: category of the application
- Rating: rating of the application
- Reviews: reviews of the application
- Size: size of the application
- Installs: how many users installed the application
- Type: type of the application
- Price: price of the application
- Content Rating: rating of the content of the application
https://creativecommons.org/publicdomain/zero/1.0/
About Dataset
Description
The dataset was downloaded from Kaggle.
Context
Google PlayStore Android App Data. (2.3 Million+ App Data)
Backup repo: https://github.com/gauthamp10/Google-Playstore-Dataset
Content
I collected the data with the help of a Python script (Scrapy) running on a cloud VM instance.
The data was collected in June 2025.
Also check out:
Apple AppStore Apps dataset: https://www.kaggle.com/gauthamp10/apple-appstore-apps
Android App Permission dataset: https://www.kaggle.com/gauthamp10/app-permissions-android
Acknowledgements
I couldn't have built this dataset without the help of GitHub Education, and I switched to facundoolano/google-play-scraper for sanity reasons.
Inspiration
Took inspiration from: https://www.kaggle.com/lava18/google-play-store-apps to build a big database for students and researchers.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by ADITYA MISHRA
Released under CC0: Public Domain
This notebook presents a comprehensive exploratory data analysis on a smartphone dataset, covering the distribution of prices, feature prevalence, and relationships between specs and price. All code, plots, and insights are included. Feedback and suggestions welcome!
https://creativecommons.org/publicdomain/zero/1.0/
A retail store with multiple outlets across the country is facing issues in managing its inventory: matching demand with supply. As a data scientist, you have to come up with useful insights from the data and build prediction models to forecast sales for the next X months or years.
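As a minimal baseline for the forecasting task described, here is a simple moving-average model on hypothetical monthly sales figures; real work would validate against held-out months and compare richer models (ARIMA, gradient boosting, etc.):

```python
# Hypothetical monthly sales for one outlet; units are arbitrary.
sales = [120, 130, 125, 140, 150, 145, 160, 170, 165, 180, 190, 185]

def moving_average_forecast(history, window=3, horizon=3):
    """Forecast `horizon` future periods by rolling a simple moving average,
    feeding each forecast back in as the newest observation."""
    extended = list(history)
    for _ in range(horizon):
        extended.append(sum(extended[-window:]) / window)
    return extended[len(history):]

forecast = moving_average_forecast(sales, window=3, horizon=3)
print([round(f, 1) for f in forecast])
```

Any real model should beat this baseline's error before it earns a place in the inventory pipeline; that is the usual role of a naive forecast.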
This dataset was created by Akalya Subramanian
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains more than 180M consumer reviews of different Amazon products. It is also available on other dataset sites, but I wrangled it and shared it here.
This dataset contains the following attributes:
- Total Records: 180M+
- Total Columns: 6
- Domain Name: amazon.com
- File Extension: CSV
Available fields:
- rating: overall rating out of 5
- verified: verification status of the review (a term used on Amazon)
- reviewerID: reviewer ID
- product_id: product ID
- date: date of the review (timestamp)
- vote: helpful votes for the review
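With 180M+ rows, the file will not fit comfortably in memory on most machines. One common approach is pandas' chunked reader, aggregating per chunk; a tiny stand-in file is written here purely for illustration:

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the real 180M-row file, using the fields listed above.
path = os.path.join(tempfile.mkdtemp(), "reviews.csv")
with open(path, "w") as f:
    f.write("rating,verified,reviewerID,product_id,date,vote\n")
    f.write("5,True,R1,P1,2019-01-01,3\n")
    f.write("4,True,R2,P1,2019-01-02,0\n")
    f.write("3,False,R3,P2,2019-01-03,1\n")

# Stream the file in chunks, accumulating a running sum and count of ratings.
total, count = 0.0, 0
for chunk in pd.read_csv(path, chunksize=2):
    total += chunk["rating"].sum()
    count += len(chunk)

mean_rating = total / count
print(mean_rating)  # 4.0
```

For the real file, a chunksize in the hundreds of thousands keeps memory bounded; the same loop shape works for per-product or per-reviewer aggregates.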
We wouldn't be here without the help of teams who gathered this dataset and made it public.
Applying exploratory data analysis and machine learning to this kind of dataset was one of the biggest inspirations for this contribution.
https://www.usa.gov/government-works/
I was reading Every Nose Counts: Using Metrics in Animal Shelters when I got inspired to conduct an EDA on animal shelter data. I looked online for data and found this dataset, which is curated by the Austin Animal Center. The data can be found at https://data.austintexas.gov.
This data can be utilized for EDA practice. So go ahead and help animal shelters with your EDA powers by completing this task!
The data set contains three CSVs:
1. Austin_Animal_Center_Intakes.csv
2. Austin_Animal_Center_Outcomes.csv
3. Austin_Animal_Center_Stray_Map.csv
More TBD!
Thank you Austin Animal Center for all the animal protection you provide to stray & owned animals. Also, thank you for making your data accessible to the public.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset supports a study examining how students perceive the usefulness of artificial intelligence (AI) in educational settings. The project involved analyzing an open-access survey dataset that captures a wide range of student responses on AI tools in learning.
The data underwent cleaning and preprocessing, followed by an exploratory data analysis (EDA) to identify key trends and insights. Visualizations were created to support interpretation, and the results were summarized in a digital poster format to communicate findings effectively.
This resource may be useful for researchers, educators, and technologists interested in the evolving role of AI in education.
Keywords: Artificial Intelligence, Education, Student Perception, Survey, Data Analysis, EDA
Subject: Computer and Information Science
License: CC0 1.0 Universal Public Domain Dedication
DOI: https://doi.org/10.18738/T8/RXUCHK
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This is Sergey's Home Credit public notebook code repo.
volume_stats.csv was obtained from the EDA code. The calculation was produced by the snippet below: