This dataset contains information about housing sales in Nashville, TN, including property, owner, sales, and tax information. The SQL queries I created for data cleaning can be found here.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Najir 0123
Released under MIT
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
About the Datasets:
Domain: Finance
Project: Bank loan of customers
Datasets: Finance_1.xlsx & Finance_2.xlsx
Dataset Type: Excel Data
Dataset Size: Each Excel file has 39k+ records

KPIs:
- Year-wise loan amount stats
- Grade- and sub-grade-wise revol_bal
- Total payment for verified status vs. total payment for non-verified status
- State-wise loan status
- Month-wise loan status
- Further insights based on your understanding of the data

Process:
- Understanding the problem
- Data collection
- Data cleaning
- Exploring and analyzing the data
- Interpreting the results
The SQL used covers CREATE DATABASE, SELECT COUNT(*) FROM, SELECT * FROM, LIMIT, SELECT ... AS, GROUP BY, ORDER BY, INNER JOIN ... ON, CONCAT, ROUND, SUM, FORMAT, and DESC.
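As a sketch of how these constructs combine, the query below computes year-wise loan amount stats with GROUP BY, SUM, ROUND, and ORDER BY ... DESC. The table and column names (loans, issue_year, grade, loan_amount) and the rows are illustrative assumptions, not the actual Finance_1.xlsx schema.

```python
import sqlite3

# In-memory stand-in for the loan data; schema and values are invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE loans (issue_year INTEGER, grade TEXT, loan_amount REAL)")
cur.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [(2020, "A", 1000.0), (2020, "B", 2500.0), (2021, "A", 3000.0)],
)

# Year-wise loan amount stats: GROUP BY + SUM + ROUND + ORDER BY ... DESC
cur.execute(
    """
    SELECT issue_year, ROUND(SUM(loan_amount), 2) AS total_amount
    FROM loans
    GROUP BY issue_year
    ORDER BY total_amount DESC
    """
)
rows = cur.fetchall()
```

The same GROUP BY pattern extends to the grade/sub-grade and state-wise KPIs by swapping the grouping column.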
I completed a PostgreSQL project to hone my SQL skills. Following a tutorial video, I worked through a music store data analysis, using SQL to answer several questions about the music shop company.
Introduction
After completing my Google Data Analytics Professional Certificate on Coursera, I completed the Capstone Project recommended by Google to strengthen and highlight technical data analysis skills such as R programming, SQL, and Tableau. In the Cyclistic Case Study, I performed many real-world tasks of a junior data analyst. To answer the critical business questions, I followed the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

**Scenario**
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

Characters and teams
Cyclistic: A bike-share program that has grown to a fleet of 5,824 bicycles that are tracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.

Stakeholders
Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
Cyclistic marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals and how you, as a junior data analyst, can help Cyclistic achieve them.
Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains detailed information about the COVID-19 pandemic. The inspiration behind this dataset is to analyze trends, identify patterns, and understand the global impact of COVID-19 through SQL queries. It is designed for anyone interested in data exploration and real-world analytics.
Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted program, Excel, for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access; therefore, completing a capstone project using web-based programs such as RStudio, SQL Workbench, or Google Sheets was not feasible. I was further limited in which option to choose because the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights to assist in Bellabeat's marketing strategies for future growth. My task is to provide data-driven insights for the business tasks provided by the Bellabeats, Inc. executive and data analysis team. To accomplish this, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task, using an asterisk (*) as an identifier.
Section 1 - Ask:
A. Guiding Questions:
1. Who are the key stakeholders and what are their goals for the data analysis project?
2. What is the business task that this data analysis project is attempting to solve?
B. Key Tasks:
1. Identify key stakeholders and their goals for the data analysis project
*The key stakeholders for this project are as follows:
-Urška Sršen and Sando Mur, co-founders of Bellabeats, Inc.
-The Bellabeats marketing analytics team, of which I am a member.
Section 2 - Prepare:
A. Guiding Questions:
1. Where is the data stored and organized?
2. Are there any problems with the data?
3. How does the data help answer the business question?
B. Key Tasks:
1. Research and communicate to stakeholders the source of the data and how it is stored/organized.
*The data source used for our case study is the FitBit Fitness Tracker Data. This dataset is hosted on Kaggle and was made available by user Mobius in an open-source format; the data is therefore public and may be copied, modified, and distributed without asking the user for permission. These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk, reportedly (see the credibility section directly below) between 03/12/2016 and 05/12/2016.
*Reportedly (see the credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute-, hour-, and day-level totals and is stored in 18 CSV documents. I downloaded all 18 documents to my local laptop and decided to use 2 of them for this project, as they merged the activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were:
-sleepDay_merged.csv
-dailyActivity_merged.csv
2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias.
*As will be presented more specifically in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data covered roughly 2 months of FitBit tracking; however, upon my initial data processing, I found that only 1 month of data was reported.
*As will be presented more specifically in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
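The user-count discrepancy above comes down to counting distinct Ids in the CSV. A minimal sketch of that check, using an invented miniature of dailyActivity_merged.csv (only the Id column matters here; the values are made up):

```python
import csv
import io

# Hypothetical miniature of dailyActivity_merged.csv; real file has more columns.
sample = """Id,ActivityDate,TotalSteps
1503960366,4/12/2016,13162
1503960366,4/13/2016,10735
1624580081,4/12/2016,8163
1844505072,4/12/2016,6697
"""

# Count distinct users: one Id can appear once per tracked day,
# so row count alone overstates the number of participants.
reader = csv.DictReader(io.StringIO(sample))
distinct_users = {row["Id"] for row in reader}
```

Running the same set-based count over the full file is how the 33-vs-30 discrepancy surfaces.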
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This project focuses on analyzing the S&P 500 companies using data analysis tools like Python (Pandas), SQL, and Power BI. The goal is to extract insights related to sectors, industries, locations, and more, and visualize them using dashboards.
Included Files:
sp500_cleaned.csv – Cleaned dataset used for analysis
sp500_analysis.ipynb – Jupyter Notebook (Python + SQL code)
dashboard_screenshot.png – Screenshot of Power BI dashboard
README.md – Summary of the project and key takeaways
This project demonstrates practical data cleaning, querying, and visualization skills.
🎟️ BookMyShow SQL Data Analysis

🎯 Objective
This project leverages SQL-based analysis to gain actionable insights into user engagement, movie performance, theater efficiency, payment systems, and customer satisfaction on the BookMyShow platform. The goal is to enhance platform performance, boost revenue, and optimize user experience through data-driven strategies.
📊 Key Analysis Areas

1. 👥 User Behavior & Engagement
- Identify most active users and repeat customers
- Track unique monthly users
- Analyze peak booking times and average tickets per user
- Drive engagement strategies and boost customer retention

2. 🎬 Movie Performance Analysis
- Highlight top-rated and most booked movies
- Analyze popular languages and high-revenue genres
- Study average occupancy rates
- Focus marketing on high-performing genres and content

3. 🏢 Theater & Show Performance
- Pinpoint theaters with highest/lowest bookings
- Evaluate popular show timings
- Measure theater-wise revenue contribution and occupancy
- Improve theater scheduling and resource allocation

4. 💵 Booking & Revenue Insights
- Track total revenue, top spenders, and monthly booking patterns
- Discover most used payment methods
- Calculate average price per booking and bookings per user
- Optimize revenue generation and spending strategies

5. 🪑 Seat Utilization & Pricing Strategy
- Identify most booked seat types and their revenue impact
- Analyze seat pricing variations and price elasticity
- Align pricing strategy with demand patterns for higher revenue

6. ✅❌ Payment & Transaction Analysis
- Distinguish successful vs. failed transactions
- Track refund frequency and payment delays
- Evaluate revenue lost due to failures
- Enhance payment processing systems

7. ⭐ User Reviews & Sentiment Analysis
- Measure average ratings per movie
- Identify top and lowest-rated content
- Analyze review volume and sentiment trends
- Leverage feedback to refine content offerings

🧰 Tech Stack
- Query Language: SQL (MySQL/PostgreSQL)
- Database Tools: DBeaver, pgAdmin, or any SQL IDE
- Visualization (Optional): Power BI / Tableau for presenting insights
- Version Control: Git & GitHub
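The "unique monthly users" metric from area 1 can be sketched as a single GROUP BY over a bookings table. The schema (user_id, booking_time) and the rows below are assumptions for illustration, not the actual BookMyShow dataset; SQLite is used here in place of MySQL/PostgreSQL for a self-contained demo.

```python
import sqlite3

# Invented miniature of a bookings table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE bookings (user_id INTEGER, booking_time TEXT)")
cur.executemany(
    "INSERT INTO bookings VALUES (?, ?)",
    [
        (1, "2024-01-05 19:00"),
        (1, "2024-01-20 21:00"),
        (2, "2024-01-11 18:30"),
        (2, "2024-02-02 20:00"),
    ],
)

# Unique monthly users: bucket bookings by month, count distinct user ids.
cur.execute(
    """
    SELECT strftime('%Y-%m', booking_time) AS month,
           COUNT(DISTINCT user_id) AS unique_users
    FROM bookings
    GROUP BY month
    ORDER BY month
    """
)
monthly = cur.fetchall()
```

In MySQL the bucketing would use DATE_FORMAT, in PostgreSQL date_trunc, but the COUNT(DISTINCT ...) shape is the same.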
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is a self-guided project.
PROBLEM STATEMENT: What underlying trends could the company be missing in its Pizza Sales data that could aid in a gap analysis of its business sales?
OBJECTIVES: 1. Generate Key Performance Indicators (KPIs) from the Pizza Sales data to gain insight into underlying business performance. 2. Visualize important aspects of the Pizza Sales data to gain insight and understand key trends.
I dived into the CSV dataset to uncover patterns within the Pizza Sales data, which spanned a calendar year.
I used Microsoft SQL Server Management Studio (SSMS) to perform EDA (Exploratory Data Analysis), identifying trends and sales patterns.
Having completed that, I used Microsoft Power BI to create visualizations that represent my analytical findings to technical and non-technical viewers.
STEPS COMPLETED:
- Data Importation
- SQL Data Analysis Query Writing
- Data Cleaning
- Data Processing
- Data Visualization
- Report/Dashboard Development
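Two of the standard pizza-sales KPIs (total revenue and average order value) reduce to one aggregate query. The pizza_sales schema and rows below are invented for illustration, and SQLite stands in for SQL Server here:

```python
import sqlite3

# Invented miniature of a pizza_sales table (order_id, quantity, unit_price).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE pizza_sales (order_id INTEGER, quantity INTEGER, unit_price REAL)"
)
cur.executemany(
    "INSERT INTO pizza_sales VALUES (?, ?, ?)",
    [(1, 2, 12.50), (1, 1, 16.00), (2, 1, 12.50)],
)

# Total revenue = SUM(quantity * unit_price);
# average order value = total revenue / number of distinct orders.
cur.execute(
    """
    SELECT ROUND(SUM(quantity * unit_price), 2) AS total_revenue,
           ROUND(SUM(quantity * unit_price) * 1.0
                 / COUNT(DISTINCT order_id), 2) AS avg_order_value
    FROM pizza_sales
    """
)
total_revenue, avg_order_value = cur.fetchone()
```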
https://creativecommons.org/publicdomain/zero/1.0/
Project Name: Divvy Bikeshare Trip Data_Year2020
Date Range: April 2020 to December 2020
Analyst: Ajith
Software: R, Microsoft Excel
IDE: RStudio
The following are the basic system requirements for the project:
- Processor: Intel i3 or AMD Ryzen 3 or higher
- RAM: 8 GB or higher
- Operating System: Windows 7 or above, macOS
**Data Usage License:** https://ride.divvybikes.com/data-license-agreement

Introduction:
In this case study, we aim to utilize different data analysis techniques and tools to understand the rental patterns of the Divvy bike-sharing company and identify key business improvement suggestions. This case study is a mandatory project to be submitted for the Google Data Analytics Certification. The data utilized in this case study was licensed under the provided data usage license. Trips between April 2020 and December 2020 are used to analyse the data.
Scenario: The marketing team needs to design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ.
Objective: The main objective of this case study is to understand customer usage patterns and the breakdown of customers based on their subscription status and the average duration of rental bike usage.
Introduction to Data: The data provided for this project adheres to the data usage license laid down by the source company. The source data was provided as CSV files with month and quarter breakdowns. Each CSV file contains a total of 13 columns.
The following are the columns, which were initially observed across the datasets.
Ride_id, Ride_type, Start_station_name, Start_station_id, End_station_name, End_station_id, Usertype, Start_time, End_time, Start_lat, Start_lng, End_lat, End_lng
Documentation, Cleaning and Preparing Data for Analysis: The total size of the datasets for the year 2020 is approximately 450 MB, which makes uploading them to a SQL database and visualizing them with BI tools a tiring job. I wanted to improve my skills in the R environment, and this was the best and most practical opportunity to use R for the data analysis.
For installation procedures for R and RStudio, please refer to the following URLs.
- R Projects Documentation: https://www.r-project.org/other-docs.html
- RStudio Download: https://www.rstudio.com/products/rstudio/
- Installation Guide: https://www.youtube.com/watch?v=TFGYlKvQEQ4
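The case study itself uses R, but the core metric (average ride duration per user type, computed from the Start_time and End_time columns above) can be sketched in a few lines of Python. The rows below are invented examples shaped like the 13 Divvy columns, not real trip data:

```python
from datetime import datetime

# Invented rows shaped like the Divvy columns; only three fields are needed here.
rides = [
    {"Usertype": "member", "Start_time": "2020-04-01 08:00:00", "End_time": "2020-04-01 08:15:00"},
    {"Usertype": "casual", "Start_time": "2020-04-01 10:00:00", "End_time": "2020-04-01 10:45:00"},
    {"Usertype": "member", "Start_time": "2020-04-02 09:00:00", "End_time": "2020-04-02 09:25:00"},
]

fmt = "%Y-%m-%d %H:%M:%S"
totals = {}  # user type -> [total minutes, ride count]
for r in rides:
    minutes = (datetime.strptime(r["End_time"], fmt)
               - datetime.strptime(r["Start_time"], fmt)).total_seconds() / 60
    bucket = totals.setdefault(r["Usertype"], [0.0, 0])
    bucket[0] += minutes
    bucket[1] += 1

# Average ride duration in minutes per user type.
avg_duration = {u: t / n for u, (t, n) in totals.items()}
```

In the R workflow the equivalent is a difftime on the parsed timestamps followed by a grouped mean.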
https://creativecommons.org/publicdomain/zero/1.0/
Context
Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Airbnb Inside initiative, this dataset describes the listing activity of homestays in Seattle, WA.

Content
The following Airbnb activity is included in this Seattle dataset:

Inspiration
- Can you describe the vibe of each Seattle neighborhood using listing descriptions?
- What are the busiest times of the year to visit Seattle? By how much do prices spike?
- Is there a general upward trend of both new Airbnb listings and total Airbnb visitors to Seattle?
https://creativecommons.org/publicdomain/zero/1.0/
This data analytics project utilized SQL and Tableau to analyze and model real estate prices in Georgia. The data was cleaned and transformed in SQL, and visualizations were created in Tableau to identify key trends and patterns. A linear regression model was developed to predict property prices based on given features, and the model was validated using statistical metrics. The results were presented in an interactive dashboard, enabling users to explore the data and make informed decisions related to real estate investments in Georgia.
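The linear regression step can be sketched as an ordinary least-squares fit on a single feature. This is a plain-Python illustration under invented numbers (the project's actual features and prices are not reproduced here), showing the mechanics behind the price model:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y over variance of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    # Intercept: the line passes through the mean point.
    a = mean_y - b * mean_x
    return a, b

# Square footage vs. price in thousands: illustrative values only.
sqft = [1000, 1500, 2000, 2500]
price = [150, 200, 250, 300]
a, b = fit_line(sqft, price)
predicted = a + b * 1800  # predicted price (thousands) for an 1800 sqft home
```

Validation against statistical metrics, as described above, would then compare such predictions to held-out prices.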
Thanks to the original authors of this dataset, which was co-produced by Guenter Roehrich and Jordan, who produced a dataset of real estate listings for Georgia for the first 6 months of 2021.
For visualizations related to this project, click the tableau link in my bio or visit tableau public.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset includes information on the number of confirmed deaths from COVID-19, showing the total impact of the pandemic on mortality globally. The Our World in Data COVID-19 dataset is open-source, updated daily, and can be found here.
SQL Queries for Data Exploration can be found on this Github Repository.
Covid Dashboard created can be found on this Tableau Public Page.
SQL Case Study Project: Employee Database Analysis 📊
I recently completed a comprehensive SQL project involving a simulated employee database with multiple tables:
In this project, I practiced and applied a wide range of SQL concepts:
✅ Simple Queries ✅ Filtering with WHERE conditions ✅ Sorting with ORDER BY ✅ Aggregation using GROUP BY and HAVING ✅ Multi-table JOINs ✅ Conditional Logic using CASE ✅ Subqueries and Set Operators
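A compact example combining two of the concepts above, a multi-table JOIN with CASE-based conditional logic. The employees/departments schema and rows are invented stand-ins, not the project's actual tables, and SQLite substitutes for Azure Data Studio's engine:

```python
import sqlite3

# Invented miniature of an employee database with two related tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript(
    """
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employees (emp_id INTEGER, name TEXT, salary REAL, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employees VALUES
        (10, 'Ada',   95000, 1),
        (11, 'Grace', 88000, 1),
        (12, 'Linus', 61000, 2);
    """
)

# JOIN to attach department names; CASE to band salaries; ORDER BY to sort.
cur.execute(
    """
    SELECT e.name,
           d.dept_name,
           CASE WHEN e.salary >= 90000 THEN 'senior' ELSE 'standard' END AS band
    FROM employees e
    JOIN departments d ON d.dept_id = e.dept_id
    ORDER BY e.salary DESC
    """
)
rows = cur.fetchall()
```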
💡 Key Highlights:
🛠️ Tools Used: Azure Data Studio
📂 You can find the entire project and scripts here:
👉 https://github.com/RiddhiNDivecha/Employee-Database-Analysis
This project helped me sharpen my SQL skills and understand business logic more deeply in a practical context.
💬 I’m open to feedback and happy to connect with fellow data enthusiasts!
RSVP Movies is an Indian film production company that has produced many super-hit movies. They have usually released movies for the Indian audience, but for their next project they are planning to release a movie for the global audience in 2022.
The production company wants to plan its every move analytically, based on data. We have taken the last three years of IMDb movie data and carried out the analysis using SQL. We analysed the dataset and drew meaningful insights that could help them start their new project.
For our convenience, the entire analytics process has been divided into four segments, where each segment leads to significant insights from different combinations of tables. The questions in each segment with business objectives are written in the script given below. We have written the solution code below every question.
This dataset is a comprehensive collection of healthcare facility ratings across multiple countries. It includes detailed information on various attributes such as facility name, location, type, total beds, accreditation status, and annual visits of hospitals throughout the world. This cleaned dataset is ideal for conducting trend analysis, comparative studies between countries, or developing predictive models for facility ratings based on various factors. It offers a foundation for exploratory data analysis, machine learning modelling, and data visualization projects aimed at uncovering insights in the healthcare industry. The project consists of the original dataset, a data cleaning script, and an EDA script in the data explorer tab for further analysis.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This project involves analyzing and transforming data from a bike warehouse database using SQL. The goal is to clean, transform, and query the data to generate insights about products, employees, customers, sales, and trends.
The SAP Bikes Sales database contains various tables that represent business data for a bike warehouse, such as information on products, sales, employees, business partners, and more. This project focuses on cleaning and transforming data, optimizing database schema, and generating SQL queries to gain business insights.
1. **Data Cleaning & Transformation**:
- Remove duplicate records from key tables.
- Drop unnecessary columns and handle null values.
- Populate new columns based on existing data.
- Merge related tables to create new insights.
2. **Business Insights Queries**:
- Top-selling Products: Identify products with the highest sales quantities and total revenue.
- Sales Performance by Product Category: Analyze revenue and order counts by product category.
- Employee Sales Performance: Track employees' contribution to sales volumes and revenue.
- Customer Segmentation: Examine the number of orders placed by business partners and their total sales value.
- Sales Trends: Analyze sales trends over time and calculate average order values.
- **Addresses Table**:
  - Checked for duplicate ADDRESSID values.
- **BusinessPartners Table**:
  - Handled duplicates and missing or incorrect data.
  - Dropped the unnecessary FAXNUMBER column because it was empty.
- **Employee Table**:
  - Dropped unnecessary columns.
  - Populated NAME_INITIALS based on employees' first, middle, and last name initials.
  - Fixed column type issues.
- **Product Categories and Product Texts**:
  - Merged the ProductCategories and ProductCategoryText tables into a new CombinedProductCategories table for easier analysis.
- **Products Table**:
  - Dropped irrelevant columns such as WIDTH, DEPTH, HEIGHT, etc.
- **Sales Order Items Table**:
  - Fixed null values in GROSSAMOUNT and created a TOTALGROSSAMOUNT column to track sales volume.
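The ADDRESSID duplicate check and removal mentioned above can be sketched as two queries: a GROUP BY ... HAVING to find duplicates, then a DELETE that keeps one row per id. SQLite is used here for a self-contained demo of the pattern, with an invented surrogate key column, rather than the actual SAP Bikes Sales schema:

```python
import sqlite3

# Invented miniature of the Addresses table with a deliberate duplicate.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Addresses (row_id INTEGER PRIMARY KEY, ADDRESSID INTEGER, CITY TEXT)"
)
cur.executemany(
    "INSERT INTO Addresses (ADDRESSID, CITY) VALUES (?, ?)",
    [(100, "Berlin"), (100, "Berlin"), (101, "Hamburg")],
)

# Step 1: find ADDRESSIDs that occur more than once.
cur.execute(
    "SELECT ADDRESSID, COUNT(*) FROM Addresses GROUP BY ADDRESSID HAVING COUNT(*) > 1"
)
dupes = cur.fetchall()

# Step 2: keep only the first physical row per ADDRESSID.
cur.execute(
    """
    DELETE FROM Addresses
    WHERE row_id NOT IN (SELECT MIN(row_id) FROM Addresses GROUP BY ADDRESSID)
    """
)
cur.execute("SELECT COUNT(*) FROM Addresses")
remaining = cur.fetchone()[0]
```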
### 2. Database Diagram and Relationships
In addition to the data cleaning and analysis, a database diagram has been create...
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a collection of SQL scripts and techniques developed by a business data analyst to assist with data optimization and cleaning tasks. The scripts cover a range of data management operations, including:
1) Data cleansing: Identifying and addressing issues such as missing values, duplicate records, formatting inconsistencies, and outliers.
2) Data normalization: Designing optimized database schemas and normalizing data structures to minimize redundancy and improve data integrity.
3) Data transformation and ETL: Developing efficient Extract, Transform, and Load (ETL) pipelines to integrate data from multiple sources and perform complex data transformations.
4) Reporting and dashboarding: Creating visually appealing and insightful reports, dashboards, and data visualizations to support informed decision-making.
The scripts and techniques in this dataset are tailored to the needs of business data analysts and can be used to enhance the quality, efficiency, and value of data-driven insights.
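A minimal sketch of the cleansing operations in point 1 above, trimming whitespace, normalizing case, and replacing NULLs with a default in a single UPDATE. The customers table and its rows are assumptions for illustration, and SQLite hosts the demo:

```python
import sqlite3

# Invented table with typical dirt: stray whitespace, mixed case, a NULL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (name TEXT, state TEXT)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("  alice ", "tn"), ("Bob", None)],
)

# One cleansing pass: TRIM whitespace, COALESCE NULLs, normalize case.
cur.execute(
    """
    UPDATE customers
    SET name  = TRIM(name),
        state = UPPER(COALESCE(state, 'UNKNOWN'))
    """
)
cur.execute("SELECT name, state FROM customers ORDER BY name")
cleaned = cur.fetchall()
```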
https://cdla.io/sharing-1-0/
*Data analysis project done using MS SQL.
*Data analysis of a Music Store using various queries to return specific data according to what each question requires.
*3 sets of queries, organized by difficulty and complexity.