This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.
About the Dataset:
- CID (Customer ID): A unique identifier for each customer.
- TID (Transaction ID): A unique identifier for each transaction.
- Gender: The gender of the customer, categorized as Male or Female.
- Age Group: Age group of the customer, divided into several ranges.
- Purchase Date: The timestamp of when the transaction took place.
- Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
- Discount Availed: Indicates whether the customer availed any discount (Yes/No).
- Discount Name: Name of the discount applied (e.g., FESTIVE50).
- Discount Amount (INR): The amount of discount availed by the customer.
- Gross Amount: The total amount before applying any discount.
- Net Amount: The final amount after applying the discount.
- Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
- Location: The city where the purchase took place.
Use Cases:
1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
3. Data Visualization: Use tools like Python's Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.
This is not a real dataset; it was generated using Python's Faker library for the sole purpose of learning.
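For learners who want to build a similar resource, here is a minimal sketch of how records like these could be generated with Faker. The keys follow the column list above, but the locale, value pools, and ranges are illustrative assumptions, not the actual generator behind this dataset.

```python
import random
from faker import Faker

fake = Faker("en_IN")  # Indian locale assumed, since discounts are in INR

def make_transaction():
    # Value pools and ranges below are made up for illustration.
    gross = round(random.uniform(100, 50_000), 2)
    discounted = random.random() < 0.5
    discount = round(gross * random.uniform(0.05, 0.5), 2) if discounted else 0.0
    return {
        "CID": fake.uuid4(),
        "TID": fake.uuid4(),
        "Gender": random.choice(["Male", "Female"]),
        "Age Group": random.choice(["18-25", "26-35", "36-45", "46-60", "60+"]),
        "Purchase Date": fake.date_time_between(start_date="-2y", end_date="now"),
        "Product Category": random.choice(["Electronics", "Apparel", "Groceries"]),
        "Discount Availed": "Yes" if discounted else "No",
        "Discount Name": "FESTIVE50" if discounted else None,
        "Discount Amount (INR)": discount,
        "Gross Amount": gross,
        "Net Amount": round(gross - discount, 2),
        "Purchase Method": random.choice(["Credit Card", "Debit Card", "UPI"]),
        "Location": fake.city(),
    }

transactions = [make_transaction() for _ in range(55_000)]
```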
This dataset was created by Monis Ahmad
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Saubhagya Mishra
Released under MIT
This is transaction data from an electronics store chain in the US. The data contains 12 CSV files, one for each month of 2019.
The naming convention is as follows: Sales_[MONTH_NAME]_2019
Each file contains anywhere from around 9000 to 26000 rows and 6 columns. The columns are as follows:
Order ID, Product, Quantity Ordered, Price Each, Order Date, Purchase Address
There are around 186,851 data points across all 12 monthly files. Some rows may contain null values.
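As an illustration, a minimal sketch of loading and combining the monthly files with pandas, assuming they sit in the working directory and follow the naming convention above:

```python
import calendar
import pandas as pd

# Read Sales_January_2019.csv ... Sales_December_2019.csv and stack them.
monthly = [
    pd.read_csv(f"Sales_{calendar.month_name[m]}_2019.csv")
    for m in range(1, 13)
]
sales = pd.concat(monthly, ignore_index=True)

# Drop the null rows mentioned above before analysis.
sales = sales.dropna(how="all")
print(sales.shape)  # roughly 186,851 rows and 6 columns before cleaning
```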
This is a weather dataset analysis project in which basic Python libraries, statistics, and various functions are used for data analysis. Exploratory data analysis has also been implemented to gain better insights.
This dataset was created by Sohail K. Nikouzad
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This is a cleaned version of a Netflix movies dataset prepared for exploratory data analysis (EDA). Missing values have been handled, invalid rows removed, and the numerical and categorical columns cleaned for analysis using Python and Pandas.
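For flavor, a minimal sketch of the kind of pandas cleaning steps described here; the file name and column names (release_year, type) are assumptions for illustration, not the dataset's actual schema.

```python
import pandas as pd

df = pd.read_csv("netflix_movies.csv")  # hypothetical file name

# Remove fully empty rows and duplicates.
df = df.dropna(how="all").drop_duplicates()

# Numerical cleanup: coerce a year-like column and drop invalid rows.
df["release_year"] = pd.to_numeric(df["release_year"], errors="coerce")
df = df.dropna(subset=["release_year"])

# Categorical cleanup: normalize whitespace and casing.
df["type"] = df["type"].str.strip().str.title()
```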
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains data on more than 9.5k car sales in Ukraine. Most of them are used cars, which opens up the possibility of analyzing features related to car operation. This is a subset of all car data in Ukraine; using it, we will analyze various parameters of used car sales in Ukraine.
1.1 Introduction: This exploratory data analysis is a chance to practice the Python skills covered so far on a structured dataset, including loading, inspecting, wrangling, exploring, and drawing conclusions from the data. The notebook records observations at each step in order to explain thoroughly how to approach the dataset. Based on those observations, some questions are also answered in the notebook for reference, though not all of them are explored in the analysis.
1.2 Data Source and Dataset: a. How was it collected?
Name: Car Sales. Sponsoring Organization: unknown. Year: 2019. Description: This is a case study of more than 9.5k car sales in Ukraine. b. Is it a sample? If yes, was it properly sampled?
Yes, it is a sample. We don't have official information about the data collection method, but it appears not to be a random sample, so we can assume that it is not representative.
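To make the loading-and-inspecting step concrete, a minimal first-pass sketch, assuming the data ships as a single CSV (the file name is hypothetical):

```python
import pandas as pd

cars = pd.read_csv("car_sales_ukraine.csv")  # hypothetical file name

# First-pass inspection before any wrangling.
print(cars.shape)         # expect roughly 9.5k rows
print(cars.dtypes)        # column types, to plan cleaning
print(cars.isna().sum())  # missing values per column
print(cars.head())
```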
Conducted an in-depth analysis of Cyclistic bike-share data to uncover customer usage patterns and trends. Cleaned and processed raw data using Python libraries such as pandas and NumPy to ensure data quality. Performed exploratory data analysis (EDA) to identify insights, including peak usage times, customer demographics, and trip duration patterns. Created visualizations using Matplotlib and Seaborn to effectively communicate findings. Delivered actionable recommendations to enhance customer engagement and optimize operational efficiency.
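A minimal sketch of the trip-duration and peak-usage computations described above; the file name and column names (started_at, ended_at, member_casual) are assumptions based on common bike-share exports, not the project's actual schema.

```python
import pandas as pd

trips = pd.read_csv("cyclistic_trips.csv", parse_dates=["started_at", "ended_at"])

# Trip duration in minutes.
trips["duration_min"] = (trips["ended_at"] - trips["started_at"]).dt.total_seconds() / 60

# Peak usage: ride counts by hour of day and rider type.
by_hour = (
    trips.assign(hour=trips["started_at"].dt.hour)
         .groupby(["hour", "member_casual"])
         .size()
         .unstack()
)
print(by_hour.idxmax())  # busiest hour for each rider type
```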
Description: Dive into the world of exceptional cinema with our meticulously curated dataset, "IMDb's Gems Unveiled." This dataset is a result of an extensive data collection effort based on two critical criteria: IMDb ratings exceeding 7 and a substantial number of votes, surpassing 10,000. The outcome? A treasure trove of 4070 movies meticulously selected from IMDb's vast repository.
What sets this dataset apart is its richness and diversity. With more than 20 data points meticulously gathered for each movie, this collection offers a comprehensive insight into each cinematic masterpiece. Our data collection process leveraged the power of Selenium and Pandas modules, ensuring accuracy and reliability.
Cleaning this vast dataset was a meticulous task, combining both Excel and Python for optimum precision. The analysis is powered by Pandas, Matplotlib, and NLTK, enabling us to uncover hidden patterns, trends, and themes within the realm of cinema.
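As a rough illustration of the Selenium-plus-Pandas collection step, a hedged sketch; the search URL and CSS selectors are assumptions for demonstration, not the exact ones used to build this dataset.

```python
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Illustrative IMDb advanced-search URL matching the stated criteria
# (rating above 7, more than 10,000 votes); the real collection may have differed.
driver.get("https://www.imdb.com/search/title/?title_type=feature&user_rating=7.0,&num_votes=10000,")

rows = []
for card in driver.find_elements(By.CSS_SELECTOR, ".lister-item-content"):  # selector assumed
    rows.append({
        "title": card.find_element(By.CSS_SELECTOR, "h3 a").text,
        "rating": card.find_element(By.CSS_SELECTOR, ".ratings-imdb-rating strong").text,
    })
driver.quit()

movies = pd.DataFrame(rows)
```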
Note: The data was collected as of April 2023. Future versions of this analysis will include a movie recommendation system. Please do connect for any queries. All love, no hate.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.
Why Use Pandas?
Pandas allows us to analyze big data and make conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
What Can Pandas Do?
Pandas gives you answers about the data, such as (see the example after this list):
Is there a correlation between two or more columns?
What is the average value?
What is the max value?
What is the min value?
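To make this concrete, a small self-contained example that answers exactly those questions on a toy DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 71, 80],
})

print(df.corr())                # correlation between the two columns
print(df["exam_score"].mean())  # average value
print(df["exam_score"].max())   # max value
print(df["exam_score"].min())   # min value
```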
In this project, I have done exploratory data analysis on the UCI Automobile dataset available at https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
This dataset consists of data from the 1985 Ward's Automotive Yearbook. Here are the sources:
1) 1985 Model Import Car and Truck Specifications, 1985 Ward's Automotive Yearbook.
2) Personal Auto Manuals, Insurance Services Office, 160 Water Street, New York, NY 10038
3) Insurance Collision Report, Insurance Institute for Highway Safety, Watergate 600, Washington, DC 20037
Number of Instances: 205 Number of Attributes: 26 including the symboling (class) attribute

Attribute Information:

symboling: -3 to +3 (multi-valued discrete)
normalized-losses: continuous
make, fuel-type, aspiration, num-of-doors, body-style, drive-wheels, engine-location, engine-type, num-of-cylinders, fuel-system: categorical
wheel-base, length, width, height, curb-weight, engine-size, bore, stroke, compression-ratio, horsepower, peak-rpm, city-mpg, highway-mpg, price: continuous
This data set consists of three types of entities:
I - The specification of an auto in terms of various characteristics
II - Its assigned insurance risk rating. This corresponds to the degree to which the auto is riskier than its price indicates. Cars are initially assigned a risk-factor symbol associated with their price; then, if a car is more (or less) risky, the symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling".
III - Its normalized losses in use as compared to other cars. This is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc...), and represents the average loss per car per year.
The analysis is divided into two parts (a short code sketch of these steps follows the list):
Data Wrangling
Exploratory Data Analysis
Descriptive statistics
Groupby
Analysis of variance
Correlation
Correlation stats
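A minimal sketch of the groupby, analysis-of-variance, and correlation steps on this dataset; the column names come from the UCI documentation, and scipy is assumed to be available:

```python
import pandas as pd
from scipy import stats

cols = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration",
        "num-of-doors", "body-style", "drive-wheels", "engine-location",
        "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
        "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
        "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
        "price"]
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
autos = pd.read_csv(url, names=cols, na_values="?")

# Groupby: average price per drive-wheels category.
print(autos.groupby("drive-wheels")["price"].mean())

# Analysis of variance: does mean price differ across drive-wheels groups?
groups = [g["price"].dropna() for _, g in autos.groupby("drive-wheels")]
print(stats.f_oneway(*groups))

# Correlation stats: Pearson correlation between horsepower and price.
hp_price = autos[["horsepower", "price"]].dropna()
print(stats.pearsonr(hp_price["horsepower"], hp_price["price"]))
```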
Acknowledgment: Dataset from the UCI Machine Learning Repository. Data link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data
This dataset was created by Damini Tiwari
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Exploratory Data Analysis (EDA) of the Zomato Bangalore dataset using Python and its libraries (Pandas, Matplotlib, and Seaborn). Analyzed restaurant distribution, top cuisines, rating distribution, cost for two, and other interesting insights; a short illustrative sketch follows the file list below.
Included files:
- NOTEBOOK: ZOMATO_EDA.ipynb
- IMAGES: Visualizations of key insights
- requirement.txt: Python dependencies
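For example, the top-cuisines view could be produced like this; the file and column names are assumptions based on common versions of the Zomato Bangalore data:

```python
import pandas as pd
import matplotlib.pyplot as plt

zomato = pd.read_csv("zomato.csv")  # hypothetical file name

# Top cuisines by restaurant count (column name assumed).
zomato["cuisines"].value_counts().head(10).plot(kind="barh", title="Top 10 Cuisines")
plt.tight_layout()
plt.show()
```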
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset "World's Air Quality and Water Pollution" was obtained from Jack Jae Hwan Kim's Kaggle page. It comprises five columns: "City", "Region", "Country", "Air Quality", and "Water Pollution". The last two columns hold values ranging from 0 to 100. Air Quality varies from 0 (bad quality) to 100 (top good quality); Water Pollution varies from 0 (no pollution) to 100 (extreme pollution).
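Given those two scales, a small sketch of flagging the worst-off cities; the column names come from the description above, while the file name and thresholds are illustrative:

```python
import pandas as pd

aq = pd.read_csv("air_quality_water_pollution.csv")  # hypothetical file name

# Cities with both poor air (low score) and heavy water pollution (high score).
worst = aq[(aq["Air Quality"] < 25) & (aq["Water Pollution"] > 75)]
print(worst[["City", "Country", "Air Quality", "Water Pollution"]]
      .sort_values("Air Quality")
      .head(10))
```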
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Startup India - Exploratory Data Analysis
1- The dataset contains an updated record of all startups from 1963 to 2021.
2- An exploratory data analysis is performed on the records with different types of data visualizations.
Technologies Used: Python, NumPy, Pandas, Matplotlib, Seaborn
This dataset has the total population of Singapore based on ethnicity and gender. It is raw data with mixed entities in its columns. Population data is given for the years 1957 to 2018. The main aim in uploading this data is to get skilled in Python pandas for exploratory data analysis.
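As an illustration of untangling such mixed entities before EDA, a heavily hedged sketch; the file name, column names, and category labels are all assumptions:

```python
import pandas as pd

pop = pd.read_csv("singapore_population.csv")  # hypothetical file name

# The category column mixes totals, ethnic groups, and genders (labels assumed);
# split them apart before analysis.
ethnic = pop[pop["category"].isin(["Chinese", "Malays", "Indians", "Others"])]
gender = pop[pop["category"].isin(["Male Residents", "Female Residents"])]

# Population trend per ethnic group across 1957-2018.
trend = ethnic.pivot_table(index="year", columns="category", values="population")
```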
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This is the dataset required for Keith Galli's 'Solving real world data science tasks with Python Pandas!' video, in which he analyzes and answers business questions for 12 months' worth of business data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.
I decided to upload the data here so that I can carry out the exercise straight on Kaggle Notebooks, making it ready for viewing as a portfolio project.
12 .csv files containing sales data for each month of 2019.
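As a taste of the business questions answered in the video, a minimal sketch of finding the best month for sales, assuming the 12 files have been combined into one DataFrame with the standard columns (Order ID, Product, Quantity Ordered, Price Each, Order Date, Purchase Address); the combined file name is hypothetical:

```python
import pandas as pd

sales = pd.read_csv("all_sales_2019.csv")  # hypothetical combined file

# Revenue per order row, coercing occasional header/junk rows to NaN.
sales["Quantity Ordered"] = pd.to_numeric(sales["Quantity Ordered"], errors="coerce")
sales["Price Each"] = pd.to_numeric(sales["Price Each"], errors="coerce")
sales["Revenue"] = sales["Quantity Ordered"] * sales["Price Each"]

# Best month for sales.
sales["Month"] = pd.to_datetime(sales["Order Date"], errors="coerce").dt.month
print(sales.groupby("Month")["Revenue"].sum().sort_values(ascending=False))
```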
Of course, all thanks go to Keith Galli and the great work he does with his tutorials. He has several other amazing tutorials that you can follow, and you can subscribe to his channel.
This dataset was created by Rahul Sharma
This notebook presents an exploratory data analysis (EDA) and visualisations of the 911 emergency calls dataset. The analysis aims to provide insights into the patterns and trends of emergency calls, including their distribution across various categories and over time.
The visualisations in this notebook are intended to provide a clear understanding of the patterns and dynamics of emergency calls, offering valuable insights for relevant stakeholders in the emergency response and public safety domains.
The analysis was performed using Python, with data manipulation and visualisation libraries such as Pandas, Matplotlib, and Seaborn.
The dataset used for this analysis is the '911 Calls' dataset, which includes information about emergency calls received by the emergency response systems.
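A brief sketch of the category-and-time breakdown described above; the column names (title, timeStamp) are assumptions based on common versions of this dataset:

```python
import pandas as pd

calls = pd.read_csv("911.csv", parse_dates=["timeStamp"])  # hypothetical file/columns

# Category: the part of the title before the colon (e.g. "EMS: BACK PAINS" -> "EMS").
calls["reason"] = calls["title"].str.split(":").str[0]
print(calls["reason"].value_counts())

# Distribution over time: monthly call counts per category.
monthly = calls.groupby([calls["timeStamp"].dt.to_period("M"), "reason"]).size()
print(monthly.head())
```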
Feel free to provide feedback and suggestions for further analysis or improvements.