100+ datasets found
  1. Housing - SQL Project

    • kaggle.com
    Updated Jun 13, 2023
    Cite
    Ann Truong (2023). Housing - SQL Project [Dataset]. https://www.kaggle.com/datasets/bvanntruong/housing-sql-project
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ann Truong
    Description

    This dataset contains information about housing sales in Nashville, TN such as property, owner, sales, and tax information. The SQL queries I created for Data Cleaning can be found here.

  2. Monday Coffee SQL Data Analysis Project

    • kaggle.com
    zip
    Updated Nov 15, 2024
    Cite
    Najir 0123 (2024). Monday Coffee SQL Data Analysis Project [Dataset]. https://www.kaggle.com/datasets/najir0123/monday-coffee-sql-data-analysis-project
    Available download formats: zip (2735826 bytes)
    Dataset updated
    Nov 15, 2024
    Authors
    Najir 0123
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Najir 0123

    Released under MIT

    Contents

  3. Bank Loan Analysis Project in Mysql

    • kaggle.com
    zip
    Updated Jul 3, 2024
    Cite
    Sanjana Murthy (2024). Bank Loan Analysis Project in Mysql [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/bank-loan-analysis-project-in-mysql
    Available download formats: zip (739 bytes)
    Dataset updated
    Jul 3, 2024
    Authors
    Sanjana Murthy
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    About Datasets:

    Domain: Finance
    Project: Bank loan of customers
    Datasets: Finance_1.xlsx & Finance_2.xlsx
    Dataset Type: Excel Data
    Dataset Size: Each Excel file has 39k+ records

    KPIs:

    • Year-wise loan amount stats
    • Grade- and sub-grade-wise revol_bal
    • Total Payment for Verified Status vs. Total Payment for Non-Verified Status
    • State-wise loan status
    • Month-wise loan status
    • Get more insights based on your understanding of the data

    Process:

    • Understanding the problem
    • Data collection
    • Data cleaning
    • Exploring and analyzing the data
    • Interpreting the results

    The SQL in this project uses: create database, select count(*) from, select * from, limit, select year as, group by, order by, inner join on, concat, round, sum, format, desc.
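    As a rough illustration of the "year-wise loan amount" KPI, the sketch below runs a GROUP BY / ORDER BY / ROUND / SUM query through Python's built-in sqlite3 module. The table name `loans` and the columns `issue_year` and `loan_amnt` are assumptions made for this example; the description does not give the real column names, and the original project targets MySQL rather than SQLite.

```python
import sqlite3

# Hypothetical miniature of the loan data; table and column names are
# assumptions, since the dataset description does not list them.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (id INTEGER PRIMARY KEY, issue_year INTEGER, loan_amnt REAL)"
)
conn.executemany(
    "INSERT INTO loans (issue_year, loan_amnt) VALUES (?, ?)",
    [(2010, 5000.0), (2010, 7500.0), (2011, 12000.0), (2011, 3000.0), (2011, 1000.0)],
)


def loan_amount_by_year(connection):
    """Return (year, loan_count, total_amount) rows, newest year first."""
    query = """
        SELECT issue_year AS year,
               COUNT(*) AS loan_count,
               ROUND(SUM(loan_amnt), 2) AS total_amount
        FROM loans
        GROUP BY issue_year
        ORDER BY issue_year DESC
    """
    return connection.execute(query).fetchall()


yearly = loan_amount_by_year(conn)
for year, loan_count, total_amount in yearly:
    print(year, loan_count, total_amount)
```

    The same query text should carry over to MySQL largely unchanged, although that has not been checked against the project's actual schema.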

  4. Music Store Data Analysis Project using SQL

    • kaggle.com
    zip
    Updated Jun 30, 2023
    Cite
    Aetik (2023). Music Store Data Analysis Project using SQL [Dataset]. https://www.kaggle.com/datasets/adimadapalageetika/music-store-data-analysis-project-using-sql/discussion
    Available download formats: zip (1748 bytes)
    Dataset updated
    Jun 30, 2023
    Authors
    Aetik
    Description

    I completed a PostgreSQL project to hone my SQL abilities. Following a tutorial video, I worked on a music store data analysis. In the project, I used SQL to answer several queries about the music shop company.

  5. Google Data Analytics Capstone Project

    • kaggle.com
    zip
    Updated Jul 14, 2023
    Cite
    Ponomarliliia (2023). Google Data Analytics Capstone Project [Dataset]. https://www.kaggle.com/datasets/ponomarlili/google-data-analytics-capstone-project
    Available download formats: zip (214473433 bytes)
    Dataset updated
    Jul 14, 2023
    Authors
    Ponomarliliia
    Description

    Introduction

    After completing my Google Data Analytics Professional Certificate on Coursera, I completed a Capstone Project, recommended by Google, to improve and highlight technical data analysis skills such as R programming, SQL, and Tableau. In the Cyclistic Case Study, I performed many real-world tasks of a junior data analyst. To answer the critical business questions, I followed the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

    Scenario

    You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

    Characters and teams

    • Cyclistic: A bike-share program that has grown to a fleet of 5,824 bicycles that are tracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.

    Stakeholders

    • Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
    • Cyclistic marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals and how you, as a junior data analyst, can help Cyclistic achieve them.
    • Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

  6. COVID-19 data analysis project using MySQL.

    • kaggle.com
    zip
    Updated Dec 1, 2024
    Cite
    Shourya Negi (2024). COVID-19 data analysis project using MySQL. [Dataset]. https://www.kaggle.com/datasets/shouryanegi/covid-19-data-analysis-project-using-mysql
    Available download formats: zip (2253676 bytes)
    Dataset updated
    Dec 1, 2024
    Authors
    Shourya Negi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains detailed information about the COVID-19 pandemic. The inspiration behind this dataset is to analyze trends, identify patterns, and understand the global impact of COVID-19 through SQL queries. It is designed for anyone interested in data exploration and real-world analytics.

  7. Google Certificate BellaBeats Capstone Project

    • kaggle.com
    zip
    Updated Jan 5, 2023
    Cite
    Jason Porzelius (2023). Google Certificate BellaBeats Capstone Project [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-certificate-bellabeats-capstone-project
    Available download formats: zip (169161 bytes)
    Dataset updated
    Jan 5, 2023
    Authors
    Jason Porzelius
    Description

    Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose as the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeat's marketing strategies for future growth. My task is to provide data driven insights to business tasks provided by the Bellabeats, Inc.'s executive and data analysis team. In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.

    Section 1 - Ask:

    A. Guiding Questions:
    1. Who are the key stakeholders and what are their goals for the data analysis project?
    2. What is the business task that this data analysis project is attempting to solve?

    B. Key Tasks:

    1. Identify key stakeholders and their goals for the data analysis project.
      *The key stakeholders for this project are as follows:
      -Urška Sršen and Sando Mur - co-founders of Bellabeats, Inc.
      -Bellabeats marketing analytics team. I am a member of this team.

    2. Identify the business task.
      *The business task is:
      -As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to 1 BellaBeats product and presenting those insights to BellaBeats stakeholders.

    Section 2 - Prepare:

    A. Guiding Questions: 1. Where is the data stored and organized? 2. Are there any problems with the data? 3. How does the data help answer the business question?

    B. Key Tasks:

    1. Research and communicate the source of the data, and how it is stored/organized to stakeholders. *The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored in Kaggle and was made available through user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk reportedly (see credibility section directly below) between 03/12/2016 thru 05/12/2016.
      *Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents into my local laptop and decided to use 2 documents for the purposes of this project as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were: -sleepDay_merged.csv -dailyActivity_merged.csv

    2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias. *As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported. *As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
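    The two credibility checks described above (how many distinct users reported data, and what date range the records actually cover) could be sketched as follows. This is not the author's method, since the project was carried out in Excel; it is a Python illustration over a made-up sample. The column names `Id` and `ActivityDate` and the month/day/year date format are assumptions about dailyActivity_merged.csv.

```python
import csv
import io
from datetime import datetime

# Made-up sample shaped like dailyActivity_merged.csv. The column names
# and the M/D/YYYY date format are assumptions for this sketch.
SAMPLE = """Id,ActivityDate,TotalSteps
1503960366,4/12/2016,13162
1503960366,4/13/2016,10735
1624580081,4/12/2016,8163
1844505072,5/12/2016,0
"""


def summarize(csv_text):
    """Return (distinct_user_count, first_date, last_date) for the rows."""
    users = set()
    dates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        users.add(row["Id"])
        dates.append(datetime.strptime(row["ActivityDate"], "%m/%d/%Y").date())
    return len(users), min(dates), max(dates)


user_count, first_day, last_day = summarize(SAMPLE)
print(user_count, first_day, last_day)
```

    Comparing `user_count` and the span between `first_day` and `last_day` with what the metadata claims is the kind of check that surfaces the discrepancies noted above.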

  8. S&P 500 Companies Analysis Project

    • kaggle.com
    zip
    Updated Apr 6, 2025
    Cite
    anshadkaggle (2025). S&P 500 Companies Analysis Project [Dataset]. https://www.kaggle.com/datasets/anshadkaggle/s-and-p-500-companies-analysis-project
    Available download formats: zip (9721576 bytes)
    Dataset updated
    Apr 6, 2025
    Authors
    anshadkaggle
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This project focuses on analyzing the S&P 500 companies using data analysis tools like Python (Pandas), SQL, and Power BI. The goal is to extract insights related to sectors, industries, locations, and more, and visualize them using dashboards.

    Included Files:

    sp500_cleaned.csv – Cleaned dataset used for analysis

    sp500_analysis.ipynb – Jupyter Notebook (Python + SQL code)

    dashboard_screenshot.png – Screenshot of Power BI dashboard

    README.md – Summary of the project and key takeaways

    This project demonstrates practical data cleaning, querying, and visualization skills.

  9. BookMyShow-SQL-Data-Analysis

    • kaggle.com
    Updated May 6, 2025
    Cite
    Soumendu Ray (2025). BookMyShow-SQL-Data-Analysis [Dataset]. https://www.kaggle.com/datasets/soumenduray99/bookmyshow-sql-data-analysis
    Dataset updated
    May 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Soumendu Ray
    Description

    🎟️ BookMyShow SQL Data Analysis

    🎯 Objective

    This project leverages SQL-based analysis to gain actionable insights into user engagement, movie performance, theater efficiency, payment systems, and customer satisfaction on the BookMyShow platform. The goal is to enhance platform performance, boost revenue, and optimize user experience through data-driven strategies.

    📊 Key Analysis Areas

    1. 👥 User Behavior & Engagement
    • Identify most active users and repeat customers
    • Track unique monthly users
    • Analyze peak booking times and average tickets per user
    • Drive engagement strategies and boost customer retention

    2. 🎬 Movie Performance Analysis
    • Highlight top-rated and most booked movies
    • Analyze popular languages and high-revenue genres
    • Study average occupancy rates
    • Focus marketing on high-performing genres and content

    3. 🏢 Theater & Show Performance
    • Pinpoint theaters with highest/lowest bookings
    • Evaluate popular show timings
    • Measure theater-wise revenue contribution and occupancy
    • Improve theater scheduling and resource allocation

    4. 💵 Booking & Revenue Insights
    • Track total revenue, top spenders, and monthly booking patterns
    • Discover most used payment methods
    • Calculate average price per booking and bookings per user
    • Optimize revenue generation and spending strategies

    5. 🪑 Seat Utilization & Pricing Strategy
    • Identify most booked seat types and their revenue impact
    • Analyze seat pricing variations and price elasticity
    • Align pricing strategy with demand patterns for higher revenue

    6. ✅❌ Payment & Transaction Analysis
    • Distinguish successful vs. failed transactions
    • Track refund frequency and payment delays
    • Evaluate revenue lost due to failures
    • Enhance payment processing systems

    7. ⭐ User Reviews & Sentiment Analysis
    • Measure average ratings per movie
    • Identify top and lowest-rated content
    • Analyze review volume and sentiment trends
    • Leverage feedback to refine content offerings

    🧰 Tech Stack

    • Query Language: SQL (MySQL/PostgreSQL)
    • Database Tools: DBeaver, pgAdmin, or any SQL IDE
    • Visualization (Optional): Power BI / Tableau for presenting insights
    • Version Control: Git & GitHub

  10. Company Product Sales Analysis & BI Report

    • kaggle.com
    zip
    Updated Oct 25, 2023
    Cite
    Oluwabori Abiodun-Johnson (2023). Company Product Sales Analysis & BI Report [Dataset]. https://www.kaggle.com/datasets/oluwaboriaj/pizza-company-sales-bi-report
    Available download formats: zip (15967889 bytes)
    Dataset updated
    Oct 25, 2023
    Authors
    Oluwabori Abiodun-Johnson
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This is a self-guided project.

    PROBLEM STATEMENT: What underlying trends could the company be missing in its Pizza Sales data that could aid a gap analysis of its business sales?

    OBJECTIVES:
    1. Generate Key Performance Indicators (KPIs) from the Pizza Sales data to gain insight into underlying business performance.
    2. Visualize important aspects of the Pizza Sales data to gain insight and understand key trends.

    I dived into the csv dataset to uncover patterns within the Pizza Sales data which spanned across a calendar.

    I used Microsoft SQL Server Management Studio (SSMS) to perform exploratory data analysis (EDA), identifying trends and sales patterns.

    Having completed that, I used Microsoft Power BI to create a visualization that presents my analytical findings to technical and non-technical viewers.

    STEPS COMPLETED:
    • Data importation
    • SQL data analysis query writing
    • Data cleaning
    • Data processing
    • Data visualization
    • Report/dashboard development

  11. Google Data Analytics Capstone Project

    • kaggle.com
    Updated Oct 1, 2022
    Cite
    Data Rookie (2022). Google Data Analytics Capstone Project [Dataset]. https://www.kaggle.com/datasets/rookieaj1234/google-data-analytics-capstone-project
    Dataset updated
    Oct 1, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data Rookie
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Project Name: Divvy Bikeshare Trip Data_Year2020
    Date Range: April 2020 to December 2020
    Analyst: Ajith
    Software: R Program, Microsoft Excel
    IDE: RStudio

    The following are the basic system requirements for the project:
    Processor: Intel i3 or AMD Ryzen 3 and higher
    Internal RAM: 8 GB or higher
    Operating System: Windows 7 or above, MacOS

    Data Usage License: https://ride.divvybikes.com/data-license-agreement

    Introduction:

    In this case study, we aim to use different data analysis techniques and tools to understand the rental patterns of the Divvy bike-sharing company and to identify key business improvement suggestions. This case study is a mandatory project to be submitted to achieve the Google Data Analytics Certification. The data used in this case study is licensed under the data usage license listed above. Trips between April 2020 and December 2020 are analysed.

    Scenario: The marketing team needs to design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ.

    Objective: The main objective of this case study is to understand customer usage patterns and the breakdown of customers, based on their subscription status and the average duration of rental bike usage.

    Introduction to Data: The data provided for this project adheres to the data usage license laid down by the source company. The source data was provided as CSV files broken down by month and quarter. Each CSV file contains a total of 13 columns.

    The following are the columns, which were initially observed across the datasets.

    Ride_id Ride_type Start_station_name Start_station_id End_station_name End_station_id Usertype Start_time End_time Start_lat Start_lng End_lat End_lng
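    To make the stated objective concrete (average rental duration by subscription status), here is a small sketch over a few of the columns listed above. The original analysis was done in R, so this Python version is a substitute for illustration only. The sample rows, the `member`/`casual` values, and the timestamp format are all assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows using a subset of the listed columns; the values and
# the timestamp format are assumptions for this sketch.
RIDES = [
    {"Ride_id": "A1", "Usertype": "member",
     "Start_time": "2020-04-01 08:00:00", "End_time": "2020-04-01 08:10:00"},
    {"Ride_id": "A2", "Usertype": "member",
     "Start_time": "2020-04-01 09:00:00", "End_time": "2020-04-01 09:20:00"},
    {"Ride_id": "A3", "Usertype": "casual",
     "Start_time": "2020-04-02 14:00:00", "End_time": "2020-04-02 14:45:00"},
    # Invalid record: the ride ends before it starts, so it is skipped.
    {"Ride_id": "A4", "Usertype": "casual",
     "Start_time": "2020-04-02 15:00:00", "End_time": "2020-04-02 14:50:00"},
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def average_minutes_by_usertype(rides):
    """Average ride length in minutes per Usertype, ignoring non-positive durations."""
    totals = defaultdict(lambda: [0.0, 0])  # usertype -> [minutes_sum, ride_count]
    for ride in rides:
        start = datetime.strptime(ride["Start_time"], TIME_FORMAT)
        end = datetime.strptime(ride["End_time"], TIME_FORMAT)
        minutes = (end - start).total_seconds() / 60
        if minutes <= 0:
            continue
        totals[ride["Usertype"]][0] += minutes
        totals[ride["Usertype"]][1] += 1
    return {usertype: total / count for usertype, (total, count) in totals.items()}


averages = average_minutes_by_usertype(RIDES)
print(averages)
```

    Dropping rides with non-positive durations is one possible cleaning rule; the description does not say which rules the author applied.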

    Documentation, Cleaning and Preparing Data for Analysis: The total size of the datasets for the year 2020 is approximately 450 MB, which makes it tiring to upload them to a SQL database and visualize them using BI tools. I wanted to improve my skills in the R environment, and this was a good opportunity to use R for the data analysis.

    For more information, including installation procedures for R and RStudio, please refer to the following URLs.

    R Projects Document: https://www.r-project.org/other-docs.html
    RStudio Download: https://www.rstudio.com/products/rstudio/
    Installation Guide: https://www.youtube.com/watch?v=TFGYlKvQEQ4

  12. Seattle Airbnb Open Data - SQL Project

    • kaggle.com
    zip
    Updated Jul 31, 2024
    Cite
    Sharmaine Wong (2024). Seattle Airbnb Open Data - SQL Project [Dataset]. https://www.kaggle.com/datasets/swsw1717/seatle-airbnb-open-data-sql-project
    Available download formats: zip (60054635 bytes)
    Dataset updated
    Jul 31, 2024
    Authors
    Sharmaine Wong
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    Seattle
    Description

    Context

    Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Airbnb Inside initiative, this dataset describes the listing activity of homestays in Seattle, WA.

    Content

    The following Airbnb activity is included in this Seattle dataset:

    • Listings, including full descriptions and average review score
    • Reviews, including unique id for each reviewer and detailed comments
    • Calendar, including listing id and the price and availability for that day

    Inspiration

    • Can you describe the vibe of each Seattle neighborhood using listing descriptions?
    • What are the busiest times of the year to visit Seattle? By how much do prices spike?
    • Is there a general upward trend of both new Airbnb listings and total Airbnb visitors to Seattle?

  13. The_Real_Estate_Project

    • kaggle.com
    zip
    Updated Mar 8, 2023
    Cite
    CraigAS (2023). The_Real_Estate_Project [Dataset]. https://www.kaggle.com/datasets/craigas/project-cleaning-script
    Available download formats: zip (8515515 bytes)
    Dataset updated
    Mar 8, 2023
    Authors
    CraigAS
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This data analytics project utilized SQL and Tableau to analyze and model real estate prices in Georgia. The data was cleaned and transformed in SQL, and visualizations were created in Tableau to identify key trends and patterns. A linear regression model was developed to predict property prices based on given features, and the model was validated using statistical metrics. The results were presented in an interactive dashboard, enabling users to explore the data and make informed decisions related to real estate investments in Georgia.

    Thanks to the original authors of this dataset, which was co-produced by Guenter Roehrich and Jordan, who produced a dataset of real estate listings for Georgia for the first 6 months of 2021.

    For visualizations related to this project, click the tableau link in my bio or visit tableau public.

  14. COVID-19 - SQL Project

    • kaggle.com
    zip
    Updated Jul 30, 2024
    Cite
    Sharmaine Wong (2024). COVID-19 - SQL Project [Dataset]. https://www.kaggle.com/datasets/swsw1717/covid-19-sql-project
    Available download formats: zip (13220606 bytes)
    Dataset updated
    Jul 30, 2024
    Authors
    Sharmaine Wong
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    The dataset includes information on the number of confirmed deaths from COVID-19, showing the total impact of the pandemic on mortality globally. The Our World in Data COVID-19 dataset is open-source, updated daily, and can be found here.

    SQL Queries for Data Exploration can be found on this Github Repository.

    Covid Dashboard created can be found on this Tableau Public Page.

  15. Employee Database for SQL Case Study

    • kaggle.com
    zip
    Updated Jun 21, 2025
    Cite
    Riddhi N Divecha (2025). Employee Database for SQL Case Study [Dataset]. https://www.kaggle.com/datasets/riddhindivecha/employee-database-for-sql-case-study/code
    Available download formats: zip (890 bytes)
    Dataset updated
    Jun 21, 2025
    Authors
    Riddhi N Divecha
    Description

    SQL Case Study Project: Employee Database Analysis 📊

    I recently completed a comprehensive SQL project involving a simulated employee database with multiple tables:

    • 🏢 DEPARTMENT
    • 👨‍💼 EMPLOYEE
    • 💼 JOB
    • 🌍 LOCATION

    In this project, I practiced and applied a wide range of SQL concepts:

    
    ✅ Simple Queries
    ✅ Filtering with WHERE conditions
    ✅ Sorting with ORDER BY
    ✅ Aggregation using GROUP BY and HAVING
    ✅ Multi-table JOINs
    ✅ Conditional Logic using CASE
    ✅ Subqueries and Set Operators

    💡 Key Highlights:

    • Salary grade classifications
    • Department-level insights
    • Employee trends based on hire dates
    • Advanced queries like Nth highest salary
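    One common way to write an "Nth highest salary" query is sketched below, run through Python's built-in sqlite3 module so it is self-contained. The `employee` table and its `name` and `salary` columns are assumptions for this example, and the author's own scripts may well use a different approach, such as a correlated subquery or a ranking function.

```python
import sqlite3

# Hypothetical employee table; the names and salaries are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("A", 5000), ("B", 3000), ("C", 5000), ("D", 2850), ("E", 1600)],
)


def nth_highest_salary(connection, n):
    """Return the nth highest distinct salary, or None if fewer than n exist."""
    row = connection.execute(
        "SELECT DISTINCT salary FROM employee "
        "ORDER BY salary DESC LIMIT 1 OFFSET ?",
        (n - 1,),
    ).fetchone()
    return row[0] if row else None


print(nth_highest_salary(conn, 2))
```

    Using DISTINCT means tied salaries count once, so the second-highest value here is 3000 even though two employees earn 5000.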

    🛠️ Tools Used: Azure Data Studio

    📂 You can find the entire project and scripts here:
    👉 https://github.com/RiddhiNDivecha/Employee-Database-Analysis

    This project helped me sharpen my SQL skills and understand business logic more deeply in a practical context.

    💬 I’m open to feedback and happy to connect with fellow data enthusiasts!

    #SQL #DataAnalytics #PortfolioProject #CaseStudy #LearningByDoing #DataScience #SQLProject

  16. IMDB Movies Analysis - SQL

    • kaggle.com
    zip
    Updated Feb 21, 2023
    Cite
    Gaurav B R (2023). IMDB Movies Analysis - SQL [Dataset]. https://www.kaggle.com/datasets/gauravbr/imdb-movies-data-erd
    Available download formats: zip (3818401 bytes)
    Dataset updated
    Feb 21, 2023
    Authors
    Gaurav B R
    Description

    SQL IMDB Movies Analysis for RSVP (Film Production Company)

    RSVP Movies is an Indian film production company which has produced many super-hit movies. They have usually released movies for the Indian audience but for their next project, they are planning to release a movie for the global audience in 2022.

    The production company wants to plan every move analytically, based on data. We took the last three years of IMDB movie data and carried out the analysis using SQL. We analysed the data set and drew meaningful insights that could help them start their new project.

    For our convenience, the entire analytics process has been divided into four segments, where each segment leads to significant insights from different combinations of tables. The questions in each segment with business objectives are written in the script given below. We have written the solution code below every question.

  17. SQL Data Cleaning & EDA Project

    • kaggle.com
    zip
    Updated Oct 15, 2024
    Cite
    Bilal424 (2024). SQL Data Cleaning & EDA Project [Dataset]. https://www.kaggle.com/datasets/bilal424/sql-data-cleaning-and-eda-project/code
    Available download formats: zip (5352 bytes)
    Dataset updated
    Oct 15, 2024
    Authors
    Bilal424
    Description

    This dataset is a comprehensive collection of healthcare facility ratings across multiple countries. It includes detailed information on various attributes such as facility name, location, type, total beds, accreditation status, and annual visits of hospitals throughout the world. This cleaned dataset is ideal for conducting trend analysis, comparative studies between countries, or developing predictive models for facility ratings based on various factors. It offers a foundation for exploratory data analysis, machine learning modelling, and data visualization projects aimed at uncovering insights in the healthcare industry. The Project consists of the Original dataset, Data Cleaning Script and an EDA script in the data explorer tab for further analysis.

  18. Bike Warehouse SQL Project

    • kaggle.com
    Updated Jan 13, 2025
    Cite
    Safae Ahb (2025). Bike Warehouse SQL Project [Dataset]. https://www.kaggle.com/datasets/safaeahb/bike-warehouse-sql-project
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Safae Ahb
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    SAP Bikes Sales : SQL Project

    This project involves analyzing and transforming data from a bike warehouse database using SQL. The goal is to clean, transform, and query the data to generate insights about products, employees, customers, sales, and trends.

    Overview

    The SAP Bikes Sales database contains various tables that represent business data for a bike warehouse, such as information on products, sales, employees, business partners, and more. This project focuses on cleaning and transforming data, optimizing database schema, and generating SQL queries to gain business insights.

    Key SQL Operations:

    1. **Data Cleaning & Transformation**:
       - Remove duplicate records from key tables.
       - Drop unnecessary columns and handle null values.
       - Populate new columns based on existing data.
       - Merge related tables to create new insights.
    2. **Business Insights Queries**:
       - Top-selling Products: Identify products with the highest sales quantities and total revenue.
       - Sales Performance by Product Category: Analyze revenue and order counts by product category.
       - Employee Sales Performance: Track employees' contribution to sales volumes and revenue.
       - Customer Segmentation: Examine the number of orders placed by business partners and their total sales value.
       - Sales Trends: Analyze sales trends over time and calculate average order values.
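    As a rough illustration of the "Top-selling Products" query described above, here is a minimal sketch that runs SQL through Python's built-in sqlite3 module. It is not the author's query: the SalesOrderItems table and the GROSSAMOUNT column are named in the description, but the PRODUCTID and QUANTITY columns, the sample rows, and the choice of SQLite are assumptions made only so the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few made-up order items.

    PRODUCTID and QUANTITY are hypothetical column names; only the table
    name and GROSSAMOUNT appear in the dataset description.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SalesOrderItems ("
        "SALESORDERID INTEGER, PRODUCTID TEXT, "
        "QUANTITY INTEGER, GROSSAMOUNT REAL)"
    )
    conn.executemany(
        "INSERT INTO SalesOrderItems VALUES (?, ?, ?, ?)",
        [
            (1, "BIKE-A", 2, 1000.0),
            (1, "HELMET", 1, 50.0),
            (2, "BIKE-A", 1, 500.0),
            (3, "BIKE-B", 2, 1800.0),
        ],
    )
    return conn


def top_selling_products(conn, limit=3):
    """Return (product, total quantity, total revenue), best sellers first."""
    query = """
        SELECT PRODUCTID,
               SUM(QUANTITY)    AS total_quantity,
               SUM(GROSSAMOUNT) AS total_revenue
        FROM SalesOrderItems
        GROUP BY PRODUCTID
        ORDER BY total_quantity DESC, total_revenue DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    for row in top_selling_products(build_demo_db()):
        print(row)
```

    Ranking by quantity first and revenue second is one possible reading of "highest sales quantities and total revenue"; swapping the ORDER BY terms would rank by revenue instead.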

    Tables Involved

    • Addresses: Contains information about addresses: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22744129%2F1a5b39b4f402dfce31ea25d6d53c2f38%2FAdresses%20Table.PNG?generation=1736780543250265&alt=media
    • BusinessPartners: Contains details about business partners: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22744129%2F416a9be40526650a4167dfdc565dfbe6%2FBusinessPartners%20Table.PNG?generation=1736780656503685&alt=media
    • Employees: Contains employee information: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22744129%2F5b99849bde2bc052cc1d6cc7d52fb67d%2FEmployees%20Table.PNG?generation=1736780677194831&alt=media
    • ProductCategories & ProductCategoryText: Describe product categories and their descriptions: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22744129%2F928f9aeb937c2fdc8d8860cc8d23f9d7%2FProductCategories%20Table.PNG?generation=1736780784495223&alt=media https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22744129%2Fe148078e53777ca1180c5adf6cec7dda%2FProductCategory%20Text%20Table.PNG?generation=1736780831995071&alt=media
    • Products & ProductTexts: Contain product details and product descriptions: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22744129%2Fdd4eb334332ec5d9248ccb8b737dd2df%2FProducts%20Table.PNG?generation=1736780894684724&alt=media https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22744129%2Faceb93b69951b1fde1f46bac146a9aa0%2FProductTexts%20Table.PNG?generation=1736782044055973&alt=media
    • SalesOrderItems: Contains details of individual items within a sales order: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22744129%2Fe0ba42ddc00634ce1728e013dbeb231c%2FSalesOrderItemsTable.PNG?generation=1736781074515668&alt=media
    • SalesOrders: Contains information about sales orders: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F22744129%2F0a67d6ba6ded143676db9f0b4f4dfcb0%2FSalesOrders%20Table.PNG?generation=1736781089531236&alt=media

    Key SQL Queries

    1. Data Cleaning and Transformation:

    - **Addresses Table**:
      - Checked for duplicate ADDRESSID values.
    - **BusinessPartners Table**:
      - Handled duplicates and missing or incorrect data.
      - Dropped the unnecessary FAXNUMBER column because it was empty.
    - **Employee Table**:
      - Dropped unnecessary columns.
      - Populated NAME_INITIALS based on employee's first, middle, and last name initials.
      - Fixed column type issues.
    - **Product Categories and Product Texts**:
      - Merged ProductCategories and ProductCategoryText tables into a new CombinedProductCategories table for easy analysis.
    - **Products Table**:
      - Dropped irrelevant columns such as WIDTH, DEPTH, HEIGHT, etc.
    - **Sales Order Items Table**:
      - Fixed null values in GROSSAMOUNT and created a TOTALGROSSAMOUNT column to track sales volume.
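    Two of the cleaning steps above, the duplicate ADDRESSID check and the NAME_INITIALS population, could look roughly like the following sketch. It again uses Python's built-in sqlite3 module so it runs on its own. ADDRESSID and NAME_INITIALS come from the description; the NAME_FIRST, NAME_MIDDLE, and NAME_LAST column names, the other columns, and the sample rows are hypothetical, and the project's own SQL may differ.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Addresses (ADDRESSID INTEGER, CITY TEXT)")
    conn.executemany(
        "INSERT INTO Addresses VALUES (?, ?)",
        [(1, "Berlin"), (2, "Paris"), (2, "Paris"), (3, "Rome")],
    )
    # NAME_FIRST / NAME_MIDDLE / NAME_LAST are assumed column names.
    conn.execute(
        "CREATE TABLE Employees ("
        "EMPLOYEEID INTEGER, NAME_FIRST TEXT, NAME_MIDDLE TEXT, "
        "NAME_LAST TEXT, NAME_INITIALS TEXT)"
    )
    conn.executemany(
        "INSERT INTO Employees VALUES (?, ?, ?, ?, NULL)",
        [(1, "Ann", "Marie", "Lopez"), (2, "john", None, "doe")],
    )
    return conn


def duplicate_address_ids(conn):
    """Return every ADDRESSID that appears more than once."""
    rows = conn.execute(
        """
        SELECT ADDRESSID
        FROM Addresses
        GROUP BY ADDRESSID
        HAVING COUNT(*) > 1
        ORDER BY ADDRESSID
        """
    ).fetchall()
    return [row[0] for row in rows]


def populate_name_initials(conn):
    """Set NAME_INITIALS to the upper-cased first letters of each name part.

    COALESCE turns a missing name part into an empty string so a NULL
    middle name does not make the whole concatenation NULL.
    """
    conn.execute(
        """
        UPDATE Employees
        SET NAME_INITIALS = UPPER(
            SUBSTR(COALESCE(NAME_FIRST, ''), 1, 1) ||
            SUBSTR(COALESCE(NAME_MIDDLE, ''), 1, 1) ||
            SUBSTR(COALESCE(NAME_LAST, ''), 1, 1)
        )
        """
    )
    conn.commit()


if __name__ == "__main__":
    demo = build_demo_db()
    print(duplicate_address_ids(demo))
    populate_name_initials(demo)
    print(demo.execute(
        "SELECT EMPLOYEEID, NAME_INITIALS FROM Employees ORDER BY EMPLOYEEID"
    ).fetchall())
```

    Checking for duplicates with GROUP BY and HAVING only reports them; deciding which of the duplicate rows to keep is a separate step that depends on the data.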

    2. Database Diagram and Relationships

    In addition to the data cleaning and analysis, a database diagram has been create...

  19. SQL PROJECT

    • kaggle.com
    zip
    Updated Jul 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SHAW RICK (2024). SQL PROJECT [Dataset]. https://www.kaggle.com/datasets/shawrick/sql-project
    Explore at:
    zip (69397 bytes)
    Dataset updated
    Jul 27, 2024
    Authors
    SHAW RICK
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a collection of SQL scripts and techniques developed by a business data analyst to assist with data optimization and cleaning tasks. The scripts cover a range of data management operations, including:

    1) Data cleansing: Identifying and addressing issues such as missing values, duplicate records, formatting inconsistencies, and outliers.
    2) Data normalization: Designing optimized database schemas and normalizing data structures to minimize redundancy and improve data integrity.
    3) Data transformation and ETL: Developing efficient Extract, Transform, and Load (ETL) pipelines to integrate data from multiple sources and perform complex data transformations.
    4) Reporting and dashboarding: Creating visually appealing and insightful reports, dashboards, and data visualizations to support informed decision-making.

    The scripts and techniques in this dataset are tailored to the needs of business data analysts and can be used to enhance the quality, efficiency, and value of data-driven insights.

  20. Music Store Analysis

    • kaggle.com
    zip
    Updated May 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Apar Negi (2024). Music Store Analysis [Dataset]. https://www.kaggle.com/datasets/aparnegi/music-store-analysis/discussion
    Explore at:
    zip (963345 bytes)
    Dataset updated
    May 10, 2024
    Authors
    Apar Negi
    License

    https://cdla.io/sharing-1-0/

    Description

    • Data analysis project done using MS SQL.
    • The music store data was analyzed with a range of queries, each returning the specific data its question asks for.
    • The queries are grouped into 3 sets according to difficulty and complexity.
