This dataset was created by Monis Ahmad

This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.
About the Dataset:
- CID (Customer ID): A unique identifier for each customer.
- TID (Transaction ID): A unique identifier for each transaction.
- Gender: The gender of the customer, categorized as Male or Female.
- Age Group: Age group of the customer, divided into several ranges.
- Purchase Date: The timestamp of when the transaction took place.
- Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
- Discount Availed: Indicates whether the customer availed any discount (Yes/No).
- Discount Name: Name of the discount applied (e.g., FESTIVE50).
- Discount Amount (INR): The amount of discount availed by the customer.
- Gross Amount: The total amount before applying any discount.
- Net Amount: The final amount after applying the discount.
- Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
- Location: The city where the purchase took place.
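The column descriptions above imply that Net Amount should equal Gross Amount minus Discount Amount (INR). The following is a minimal sketch of that sanity check using only the Python standard library. The sample rows and the `find_inconsistent_rows` helper are made up for illustration, and the exact CSV header names are assumed to match the description.

```python
import csv
import io

# Hypothetical sample rows; header names are assumed to match the description.
SAMPLE = """CID,TID,Discount Availed,Discount Amount (INR),Gross Amount,Net Amount
1001,9001,Yes,50.0,500.0,450.0
1002,9002,No,0.0,120.0,120.0
1003,9003,Yes,30.0,300.0,280.0
"""


def find_inconsistent_rows(rows, tolerance=0.01):
    """Return the TIDs where Net Amount != Gross Amount - Discount Amount."""
    bad = []
    for row in rows:
        gross = float(row["Gross Amount"])
        discount = float(row["Discount Amount (INR)"])
        net = float(row["Net Amount"])
        if abs((gross - discount) - net) > tolerance:
            bad.append(row["TID"])
    return bad


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
inconsistent = find_inconsistent_rows(rows)
print(inconsistent)  # the third sample row is deliberately inconsistent
```

Because the data is synthetic, a check like this may or may not find violations in the real file; it is mainly a quick way to confirm that the columns relate as described.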
Use Cases:
1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
3. Data Visualization: Use tools like Python's Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
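As an illustration of the preprocessing step mentioned above, here is a small sketch of min-max normalization and one-hot encoding in plain Python. The helper names and the sample values are hypothetical; in practice a library such as pandas or scikit-learn would usually handle this.

```python
def min_max_normalize(values):
    """Scale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """Return the sorted category list and one 0/1 vector per input value."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Made-up values standing in for the Net Amount and Purchase Method columns.
net_amounts = [450.0, 120.0, 270.0]
methods = ["Credit Card", "Debit Card", "Credit Card"]

normalized = min_max_normalize(net_amounts)
categories, encoded = one_hot_encode(methods)
print(normalized)
print(categories, encoded)
```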
This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.
This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Saubhagya Mishra
Released under MIT

Conducted an in-depth analysis of Cyclistic bike-share data to uncover customer usage patterns and trends. Cleaned and processed raw data using Python libraries such as pandas and NumPy to ensure data quality. Performed exploratory data analysis (EDA) to identify insights, including peak usage times, customer demographics, and trip duration patterns. Created visualizations using Matplotlib and Seaborn to effectively communicate findings. Delivered actionable recommendations to enhance customer engagement and optimize operational efficiency.

This dataset was created by Sohail K. Nikouzad

This is a weather dataset analysis project that uses basic Python libraries, statistics, and functions for data analysis. Exploratory data analysis is also included to gain better insights.

This is transaction data from an electronics store chain in the US. The data consists of 12 CSV files, one for each month of 2019.
The naming convention is as follows: Sales_[MONTH_NAME]_2019
Each file contains anywhere from around 9000 to 26000 rows and 6 columns. The columns are as follows:
Order ID, Product, Quantity Ordered, Price Each, Order Date, Purchase Address
Combining all 12 monthly files gives around 186,851 data points. Some rows may contain null values.
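The description above suggests a common first step: build the 12 file names from the naming convention, read each file, and drop rows with missing values. Below is a hedged sketch using only the standard library. The `.csv` extension, the helper names, and the sample rows are assumptions for illustration; in-memory text streams stand in for the real files.

```python
import calendar
import csv
import io


def monthly_file_names(year=2019):
    """Build names following Sales_[MONTH_NAME]_2019 (the .csv suffix is assumed)."""
    return [f"Sales_{calendar.month_name[m]}_{year}.csv" for m in range(1, 13)]


def read_clean_rows(stream):
    """Yield rows from one CSV stream, skipping rows with any missing field."""
    for row in csv.DictReader(stream):
        if all(isinstance(v, str) and v.strip() for v in row.values()):
            yield row


def combine(streams):
    """Concatenate the cleaned rows of several CSV streams into one list."""
    combined = []
    for stream in streams:
        combined.extend(read_clean_rows(stream))
    return combined


# Made-up stand-ins for two monthly files; the second January row is all nulls.
JANUARY = """Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1,USB-C Charging Cable,2,11.95,01/05/19 10:00,"1 Main St, Boston, MA 02215"
,,,,,
"""
FEBRUARY = """Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2,Wired Headphones,1,11.99,02/14/19 18:30,"2 Oak Ave, Austin, TX 73301"
"""

combined = combine([io.StringIO(JANUARY), io.StringIO(FEBRUARY)])
print(monthly_file_names()[:2])
print(len(combined))
```

With the real files on disk, each stream would instead come from `open(name, newline="")` for every name returned by `monthly_file_names()`.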

https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains data on more than 9.5k car sales in Ukraine. Most of them are used cars, which opens the possibility of analyzing features related to car operation. This is a subset of all car data in Ukraine. Using it, we will analyze the various parameters of used car sales in Ukraine.
1.1 Introduction: This Exploratory Data Analysis is meant to practice the Python skills learned so far on a structured dataset, including loading, inspecting, wrangling, exploring, and drawing conclusions from the data. The notebook records observations at each step in order to explain thoroughly how to approach the dataset. Based on these observations, some questions are also answered in the notebook for reference, though not all of them are explored in the analysis.
1.2 Data Source and Dataset:
a. How was it collected?
Name: Car Sales
Sponsoring Organization: Unknown
Year: 2019
Description: This is a case study of more than 9.5k car sales in Ukraine.
b. Is it a sample? If yes, was it properly sampled?
Yes, it is a sample. We don't have official information about the data collection method, but it appears not to be a random sample, so we can assume that it is not representative.

Description: Dive into the world of exceptional cinema with our meticulously curated dataset, "IMDb's Gems Unveiled." This dataset is a result of an extensive data collection effort based on two critical criteria: IMDb ratings exceeding 7 and a substantial number of votes, surpassing 10,000. The outcome? A treasure trove of 4070 movies meticulously selected from IMDb's vast repository.
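The two selection criteria described above can be expressed as a simple filter. This is only an illustrative sketch: the field names `rating` and `votes`, the `is_gem` helper, and the sample records are hypothetical, and the dataset's actual collection code is not shown in the description.

```python
def is_gem(movie, min_rating=7.0, min_votes=10_000):
    """True when the rating exceeds min_rating and the votes exceed min_votes."""
    return movie["rating"] > min_rating and movie["votes"] > min_votes


# Made-up records for illustration only.
movies = [
    {"title": "Movie A", "rating": 8.1, "votes": 250_000},
    {"title": "Movie B", "rating": 7.0, "votes": 90_000},   # rating not above 7
    {"title": "Movie C", "rating": 7.9, "votes": 9_500},    # too few votes
]

gems = [m["title"] for m in movies if is_gem(m)]
print(gems)
```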
What sets this dataset apart is its richness and diversity. With more than 20 data points meticulously gathered for each movie, this collection offers a comprehensive insight into each cinematic masterpiece. Our data collection process leveraged the power of Selenium and Pandas modules, ensuring accuracy and reliability.
Cleaning this vast dataset was a meticulous task, combining both Excel and Python for optimum precision. Analysis is powered by Pandas, Matplotlib, and NLTK, making it possible to uncover hidden patterns, trends, and themes within the realm of cinema.
Note: The data was collected as of April 2023. Future versions of this analysis will include a movie recommendation system. Please do connect for any queries. All Love, No Hate.

https://creativecommons.org/publicdomain/zero/1.0/
This is the dataset required for Keith Galli's 'Solving real world data science tasks with Python Pandas!' video, in which he analyzes and answers business questions for 12 months' worth of business data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.
I decided to upload the data here so that I can carry out the exercise straight on Kaggle Notebooks, making it ready for viewing as a portfolio project.
12 .csv files containing sales data for each month of 2019.
Of course, all thanks go to Keith Galli and the great work he does with his tutorials. He has several other amazing tutorials that you can follow by subscribing to his channel.

https://creativecommons.org/publicdomain/zero/1.0/
Exploratory Data Analysis (EDA) of the ZOMATO BANGALORE DATASET using Python and its libraries (Pandas, Matplotlib, and Seaborn). Analyzed restaurant distribution, top cuisines, rating distribution, cost for two, and other interesting insights.
Included files:
- NOTEBOOK: ZOMATO_EDA.ipynb
- IMAGES: Visualizations of key insights
- requirement.txt: Python dependencies

https://creativecommons.org/publicdomain/zero/1.0/
This is a cleaned version of a Netflix movies dataset prepared for exploratory data analysis (EDA). Missing values have been handled, invalid rows removed, and numerical + categorical columns cleaned for analysis using Python and Pandas.

To explore and learn more about Multiple Linear Regression.
The dataset consists of house prices across the USA. It has the following columns:
- Avg. Area Income: Numerical data about the average income of the area where the house is located.
- House Age: Age of the house in years.
- Number of Rooms
- Number of Bedrooms
- Area Population: Population of the area where the house is located.
- Price
- Address: The only textual data in the dataset, consisting of the address of the house.
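To make the multiple linear regression idea concrete, here is a minimal sketch that fits Price from several numeric columns by solving the normal equations (X^T X) b = X^T y with Gauss-Jordan elimination. It uses only plain Python; the helper names and the toy numbers are made up, and a library such as scikit-learn or NumPy would normally be used on the real data.

```python
def fit_linear_regression(features, targets):
    """Least-squares fit; returns [intercept, coef_1, coef_2, ...]."""
    X = [[1.0] + [float(v) for v in row] for row in features]  # add intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefficients, row):
    """Apply the fitted intercept and coefficients to one feature row."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], row))


# Toy data standing in for (House Age, Number of Rooms) -> Price,
# generated exactly from price = 10 + 2 * age + 3 * rooms.
features = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
prices = [18, 17, 28, 27, 35]

coefficients = fit_linear_regression(features, prices)
print([round(c, 6) for c in coefficients])
print(round(predict(coefficients, (6, 2)), 6))
```

Because the toy prices were generated from a known linear rule, the fit should recover that rule up to floating-point error, which makes the sketch easy to check by hand.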

This dataset was created by Damini Tiwari

This dataset was created by Rahul Sharma

https://creativecommons.org/publicdomain/zero/1.0/
Startup India - Exploratory Data Analysis
1. The dataset contains an updated record of all startups from 1963 to 2021.
2. An exploratory data analysis is performed on the records with different types of data visualizations.
Technologies Used: Python, NumPy, Pandas, Matplotlib, Seaborn

This notebook presents an exploratory data analysis (EDA) and visualisations of the 911 emergency calls dataset. The analysis aims to provide insights into the patterns and trends of emergency calls, including their distribution across various categories and over time. The following key aspects are covered:
The visualisations in this notebook are intended to provide a clear understanding of the patterns and dynamics of emergency calls, offering valuable insights for relevant stakeholders in the emergency response and public safety domains.
The analysis was performed using Python, with data manipulation and visualisation libraries such as Pandas, Matplotlib, and Seaborn.
The dataset used for this analysis is the '911 Calls' dataset, which includes information about emergency calls received by the emergency response systems.
Feel free to provide feedback and suggestions for further analysis or improvements.

This dataset gives the total population of Singapore by ethnicity and gender. It is raw data with mixed entities in its columns. Population data is given for the years 1957 to 2018. The main aim in uploading this data is to build skill in Python pandas for exploratory data analysis.

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset "World's Air Quality and Water Pollution" was obtained from Jack Jae Hwan Kim's Kaggle page. It comprises 5 columns: "City", "Region", "Country", "Air Quality", and "Water Pollution". The last two columns hold values ranging from 0 to 100:
- Air Quality: ranges from 0 (bad quality) to 100 (top quality).
- Water Pollution: ranges from 0 (no pollution) to 100 (extreme pollution).
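Note that the two scales point in opposite directions: a high Air Quality value is good, while a high Water Pollution value is bad. One way to compare them, sketched below, is to invert the pollution value first. The `water_quality` and `environment_score` helpers and the simple average are assumptions made for illustration, not part of the dataset.

```python
def water_quality(water_pollution):
    """Invert the 0-100 pollution scale so that higher means cleaner water."""
    if not 0 <= water_pollution <= 100:
        raise ValueError("water_pollution must be between 0 and 100")
    return 100.0 - water_pollution


def environment_score(air_quality, water_pollution):
    """Hypothetical combined score: mean of air quality and inverted pollution."""
    if not 0 <= air_quality <= 100:
        raise ValueError("air_quality must be between 0 and 100")
    return (air_quality + water_quality(water_pollution)) / 2.0


# Made-up values for two cities.
print(environment_score(80, 20))   # good air, little pollution
print(environment_score(30, 90))   # poor air, heavy pollution
```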

https://www.kaggle.com/code/mithilesh9/amazon-sales-data-analysis-using-python
Dataset Description: This dataset contains 100 rows of sales data for Amazon, including the region, country, item type, sales channel, order priority, order date, order ID, ship date, units sold, unit price, unit cost, total revenue, total cost, and total profit.
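The three total columns look like they can be derived from the unit-level columns. The sketch below assumes the usual relationships (revenue = units sold × unit price, cost = units sold × unit cost, profit = revenue − cost); the description does not state these formulas, so treat them as an assumption to verify against the data. The `derive_totals` helper and the sample numbers are hypothetical.

```python
def derive_totals(units_sold, unit_price, unit_cost):
    """Compute totals under the assumed relationships between the columns."""
    revenue = units_sold * unit_price
    cost = units_sold * unit_cost
    return {
        "total_revenue": round(revenue, 2),
        "total_cost": round(cost, 2),
        "total_profit": round(revenue - cost, 2),
    }


# Made-up order: 100 units at a price of 9.50 and a cost of 6.00 each.
totals = derive_totals(100, 9.5, 6.0)
print(totals)
```

Comparing these derived values with the stored total columns is a quick consistency check before any further analysis.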

This dataset was created by Monis Ahmad