55 datasets found

SQL Data Cleaning & EDA Project
kaggle.com
zip
Updated Oct 15, 2024
Cite
Bilal424 (2024). SQL Data Cleaning & EDA Project [Dataset]. https://www.kaggle.com/datasets/bilal424/sql-data-cleaning-and-eda-project/code
Explore at:
zip(5352 bytes)Available download formats
Dataset updated
Oct 15, 2024
Authors
Bilal424
Description
This dataset is a comprehensive collection of healthcare facility ratings across multiple countries. It includes detailed information on various attributes such as facility name, location, type, total beds, accreditation status, and annual visits of hospitals throughout the world. This cleaned dataset is ideal for conducting trend analysis, comparative studies between countries, or developing predictive models for facility ratings based on various factors. It offers a foundation for exploratory data analysis, machine learning modelling, and data visualization projects aimed at uncovering insights in the healthcare industry. The Project consists of the Original dataset, Data Cleaning Script and an EDA script in the data explorer tab for further analysis.
cyclistic_data_analysis_sql_script
kaggle.com
zip
Updated Sep 24, 2023
Cite
Michael Usen (2023). cyclistic_data_analysis_sql_script [Dataset]. https://www.kaggle.com/mikeusen/cyclistic-data-analysis-sql-script
Explore at:
zip(2001 bytes)Available download formats
Dataset updated
Sep 24, 2023
Authors
Michael Usen
Description
This dataset was created by Michael Usen.
SQL Data Exploration COVID Portfolio V1
kaggle.com
zip
Updated Jun 16, 2023
Cite
Mohammad Hurairah (2023). SQL Data Exploration COVID Portfolio V1 [Dataset]. https://www.kaggle.com/datasets/mohammadhurairah/covid-portfolio-project-sql-v1
Explore at:
zip(61483158 bytes)Available download formats
Dataset updated
Jun 16, 2023
Authors
Mohammad Hurairah
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Data exploration, cleaning, and arrangement of the Covid Deaths and Covid Vaccinations data, covering:

Selecting the data to be used

Showing the likelihood of dying if you contract Covid in your country

Showing what percentage of the population got Covid

Looking at countries with the highest infection rate compared to population

Showing the countries with the highest death count per population

Breaking things down by continent

Showing the continents with the highest death count per population

Looking at total population vs. vaccinations

Using a CTE and a temp table

Creating a view to store data for later visualizations
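The first two steps above (likelihood of dying after infection, and share of the population infected) can be sketched as a single CTE query. This is a minimal illustration using Python's built-in `sqlite3`; the table name `covid_deaths`, its columns, and the rows are hypothetical and are not taken from the dataset.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid_deaths ("
    "location TEXT, population INTEGER, total_cases INTEGER, total_deaths INTEGER)"
)
conn.executemany(
    "INSERT INTO covid_deaths VALUES (?, ?, ?, ?)",
    [
        ("Aland", 1000, 200, 10),
        ("Borduria", 5000, 250, 5),
    ],
)

# The CTE computes both ratios once so the outer query can sort on them.
query = """
WITH rates AS (
    SELECT
        location,
        100.0 * total_deaths / total_cases AS death_pct,
        100.0 * total_cases / population AS infected_pct
    FROM covid_deaths
    WHERE total_cases > 0
)
SELECT location, death_pct, infected_pct
FROM rates
ORDER BY infected_pct DESC
"""
rows = conn.execute(query).fetchall()
for location, death_pct, infected_pct in rows:
    print(f"{location}: {death_pct:.1f}% of cases died, {infected_pct:.1f}% infected")
```

Multiplying by `100.0` forces floating-point division; with integer columns, a plain `total_deaths / total_cases` would truncate to zero in SQLite.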
Seattle Airbnb Open Data - SQL Project
kaggle.com
zip
Updated Jul 31, 2024
Cite
Sharmaine Wong (2024). Seattle Airbnb Open Data - SQL Project [Dataset]. https://www.kaggle.com/datasets/swsw1717/seatle-airbnb-open-data-sql-project
Explore at:
zip(60054635 bytes)Available download formats
Dataset updated
Jul 31, 2024
Authors
Sharmaine Wong
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Seattle
Description
Context

Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Airbnb Inside initiative, this dataset describes the listing activity of homestays in Seattle, WA.

Content

The following Airbnb activity is included in this Seattle dataset:

Listings, including full descriptions and average review score

Reviews, including unique id for each reviewer and detailed comments

Calendar, including listing id and the price and availability for that day

Inspiration

- Can you describe the vibe of each Seattle neighborhood using listing descriptions?
- What are the busiest times of the year to visit Seattle? By how much do prices spike?
- Is there a general upward trend of both new Airbnb listings and total Airbnb visitors to Seattle?
Superstore Sales EDA - Nawaf Alzzeer
kaggle.com
zip
Updated Nov 29, 2025
Cite
Nawaf Alzeer (2025). Superstore Sales EDA - Nawaf Alzzeer [Dataset]. https://www.kaggle.com/datasets/nawafalzeer/superstore-sales-eda-nawaf-alzzeer
Explore at:
zip(809072 bytes)Available download formats
Dataset updated
Nov 29, 2025
Authors
Nawaf Alzeer
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Complete data engineering project on 4 years (2014-2017) of retail sales transactions.

DATASET CONTENTS:
- Original denormalized data (9,994 rows)
- Normalized database: 4 tables (customers, orders, products, sales)
- 9 SQL analysis files organized by phase
- Complete EDA from data cleaning to business insights

DATABASE TABLES:
- customers: 793 records
- orders: 4,931 records
- products: 1,812 records
- sales: 9,686 transactions

KEY FINDINGS:
- Low profitability: 12.44% margin (below industry standard)
- Discount problem: 50%+ transactions have 20%+ discounts
- Loss-making: 18.66% of transactions lose money
- Furniture crisis: Only 2.31% margin
- Small baskets: Only 1.96 items per order

SQL SKILLS DEMONSTRATED:
✓ Window functions (ROW_NUMBER, PARTITION BY)
✓ Database normalization (3NF)
✓ Complex JOINs (3-4 tables)
✓ Data deduplication with CTEs
✓ Business analytics queries
✓ CASE statements and aggregations

PERFECT FOR:
- SQL practice (beginner to advanced)
- Database normalization learning
- EDA methodology study
- Business analytics projects
- Data engineering portfolios

FILES INCLUDED:
- 5 CSV files (original + 4 normalized tables)
- 9 SQL query files (cleaning, migration, analysis)

Author: Nawaf Alzzeer
License: CC BY-SA 4.0
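The combination of a CTE with `ROW_NUMBER` and `PARTITION BY` listed among the skills is a common deduplication pattern. The sketch below shows that general pattern, not the author's actual scripts; the `raw_sales` table, its columns, and its rows are hypothetical. It runs on Python's built-in `sqlite3` and assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3

# Hypothetical table and rows; the real dataset's columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (order_id TEXT, product_id TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [
        ("O-1", "P-1", 2),
        ("O-1", "P-1", 2),  # exact duplicate of the row above
        ("O-1", "P-2", 1),
        ("O-2", "P-1", 3),
    ],
)

# Number the rows inside each group of identical values, then keep only the first.
query = """
WITH ranked AS (
    SELECT
        order_id, product_id, quantity,
        ROW_NUMBER() OVER (
            PARTITION BY order_id, product_id, quantity
            ORDER BY order_id
        ) AS rn
    FROM raw_sales
)
SELECT order_id, product_id, quantity
FROM ranked
WHERE rn = 1
ORDER BY order_id, product_id
"""
deduped = conn.execute(query).fetchall()
print(deduped)
```

Partitioning on every column treats only fully identical rows as duplicates; partitioning on a narrower key would instead keep one row per key, which is a different (and lossier) rule.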
Bellabeat Case Study Supplement
kaggle.com
zip
Updated Oct 28, 2022
Cite
Britta Smith (2022). Bellabeat Case Study Supplement [Dataset]. https://www.kaggle.com/datasets/brittasmith/bellabeat-casestudy-sql-tableau-excel
Explore at:
zip(65670 bytes)Available download formats
Dataset updated
Oct 28, 2022
Authors
Britta Smith
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw data, clean data, and SQL query output tables as spreadsheets to support Tableau story and github repository available at https://github.com/brittabeta/Bellabeat-Case-Study-SQL-Excel-Tableau
The_Real_Estate_Project
kaggle.com
zip
Updated Mar 8, 2023
Cite
CraigAS (2023). The_Real_Estate_Project [Dataset]. https://www.kaggle.com/datasets/craigas/project-cleaning-script
Explore at:
zip(8515515 bytes)Available download formats
Dataset updated
Mar 8, 2023
Authors
CraigAS
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This data analytics project utilized SQL and Tableau to analyze and model real estate prices in Georgia. The data was cleaned and transformed in SQL, and visualizations were created in Tableau to identify key trends and patterns. A linear regression model was developed to predict property prices based on given features, and the model was validated using statistical metrics. The results were presented in an interactive dashboard, enabling users to explore the data and make informed decisions related to real estate investments in Georgia.

Thanks to the original authors of this dataset, Guenter Roehrich and Jordan, who co-produced a dataset of real estate listings for Georgia covering the first 6 months of 2021.

For visualizations related to this project, click the Tableau link in my bio or visit Tableau Public.
COVID-19 - SQL Project
kaggle.com
zip
Updated Jul 30, 2024
Cite
Sharmaine Wong (2024). COVID-19 - SQL Project [Dataset]. https://www.kaggle.com/datasets/swsw1717/covid-19-sql-project
Explore at:
zip(13220606 bytes)Available download formats
Dataset updated
Jul 30, 2024
Authors
Sharmaine Wong
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset includes information on the number of confirmed deaths from COVID-19, showing the total impact of the pandemic on mortality globally. The Our World in Data COVID-19 dataset is open-source, updated daily, and can be found here.

SQL Queries for Data Exploration can be found on this Github Repository.

Covid Dashboard created can be found on this Tableau Public Page.
(Sunset)📒 Meta Kaggle ported to MS SQL SERVER
kaggle.com
zip
Updated Mar 20, 2024
Cite
BwandoWando (2024). (Sunset)📒 Meta Kaggle ported to MS SQL SERVER [Dataset]. https://www.kaggle.com/datasets/bwandowando/meta-kaggle-ported-to-sql-server-2022-database
Explore at:
zip(8635902534 bytes)Available download formats
Dataset updated
Mar 20, 2024
Authors
BwandoWando
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

I've always wanted to explore Kaggle's Meta Kaggle dataset, but I am more comfortable using T-SQL when it comes to writing (very) complex queries. I also tend to write queries faster in SQL Server Management Studio, like 100x faster. So I ported Kaggle's Meta Kaggle dataset into MS SQL Server 2022 database format, created a backup file, then uploaded it here.

MSSQL VERSION: SQL Server 2022

Collation: SQL_Latin1_General_CP1_CI_AS

Recovery model: simple

Requirements

Download and install the SQL SERVER 2022 Developer edition here

Download the backup file

Restore the backup file to your local SQL Server instance. If you haven't done this before, it's easy and straightforward. Here is a guide.

(QUOTED FROM THE ORIGINAL DATASET)

Meta Kaggle

Explore Kaggle's public data on competitions, datasets, kernels (code/notebooks) and more.

Meta Kaggle may not be the Rosetta Stone of data science, but they think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.


Notes

I repeat, I just ported the dataset. All credits to Kaggle for the amazing source dataset.

Cover image from https://picryl.com/media/space-earth-bug-ce3ca6
🖼️ Famous Paintings
kaggle.com
zip
Updated Oct 5, 2023
Cite
mexwell (2023). 🖼️ Famous Paintings [Dataset]. https://www.kaggle.com/datasets/mexwell/famous-paintings
Explore at:
zip(6681482 bytes)Available download formats
Dataset updated
Oct 5, 2023
Authors
mexwell
Description
Famous paintings and their artists. This data set is published to help students have interesting data to practice SQL

Original Data

Acknowledgement

Photo by Steve Johnson on Unsplash
Company Product Sales Analysis & BI Report
kaggle.com
zip
Updated Oct 25, 2023
Cite
Oluwabori Abiodun-Johnson (2023). Company Product Sales Analysis & BI Report [Dataset]. https://www.kaggle.com/datasets/oluwaboriaj/pizza-company-sales-bi-report
Explore at:
zip(15967889 bytes)Available download formats
Dataset updated
Oct 25, 2023
Authors
Oluwabori Abiodun-Johnson
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This is a self-guided project.

PROBLEM STATEMENT: What underlying trends in the Pizza Sales data could the company be missing that would aid a gap analysis of its sales?

OBJECTIVES:
1. Generate Key Performance Indicators (KPIs) from the Pizza Sales data to gain insight into underlying business performance.
2. Visualize important aspects of the Pizza Sales data to gain insight and understand key trends.

I dived into the CSV dataset to uncover patterns within the Pizza Sales data, which spanned a calendar year.

I used Microsoft SQL Server Management Studio (SSMS) to perform exploratory data analysis (EDA), identifying trends and sales patterns.

Having completed that, I used Microsoft Power BI to create visualizations that present my analytical findings to technical and non-technical viewers.

STEPS COMPLETED:
- Data Importation
- SQL Data analysis query writing
- Data Cleaning
- Data Processing
- Data Visualization
- Report/Dashboard Development
Bank Transaction Analytics Dashboard – SQL + Excel
kaggle.com
zip
Updated Aug 18, 2025
Cite
Prachi Singh (2025). Bank Transaction Analytics Dashboard – SQL + Excel [Dataset]. https://www.kaggle.com/datasets/prachisingh29ds/bank-transaction-analytics-dashboard-sql-excel
Explore at:
zip(2856220 bytes)Available download formats
Dataset updated
Aug 18, 2025
Authors
Prachi Singh
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
📊 Bank Transaction Analytics Dashboard – SQL + Excel

🔹 Overview

This project focuses on Bank Transaction Analysis using a combination of SQL scripts and Excel dashboards. The goal is to provide insights into customer spending patterns, payment modes, suspicious transactions, and overall financial trends.

The dataset and analysis files can help learners and professionals understand how SQL and Excel can be used together for business decision-making, customer behavior tracking, and data-driven insights.

🔹 Contents

The dataset includes the following resources:

📂 SQL Scripts:

Create & Insert tables

15 Basic Queries

15 Advanced Queries

📂 CSV File:

Bank Transaction Analytics.csv (main dataset)

📂 Excel Charts:

Pie, Bar, Column, Line, Doughnut charts

Final Interactive Dashboard

📂 Screenshots:

Query outputs, Charts, and Final Dashboard visualization

📂 PDF Reports:

Project Report

Dashboard Report

📄 README.md:

Complete documentation and step-by-step explanation

🔹 Key Insights

26–35 age group spent the most across categories.

Amazon identified as the top merchant.

NetBanking showed the highest share compared to POS/UPI.

Travel & Shopping emerged as dominant categories.

🔹 Applications

Detecting suspicious transactions.

Understanding customer behavior.

Identifying top merchants and categories.

Building business intelligence dashboards.

🔹 How to Use

Download the dataset and SQL scripts.

Run Bank_Transaction_Analytics.SQL to create and insert data.

Execute the queries (Basic + Advanced) for insights.

Open Excel files to explore interactive charts and dashboards.

Refer to Project Report PDF for documentation.

🔹 Author

👩‍💻 Created by: Prachi Singh

GitHub: Bank Transaction Analytics Dashboard (https://github.com/prachi-singh-ds/Bank-Transaction-Analytics-Dashboard)

⚡This project is a complete SQL + Excel integration case study and is suitable for Data Science, Business Analytics, and Data Engineering portfolios.
Zomato - AI Generated Data
kaggle.com
zip
Updated Apr 27, 2025
Cite
Satyavathi Gunturi (2025). Zomato - AI Generated Data [Dataset]. https://www.kaggle.com/datasets/satya918g/zomato-ai-generated-data/code
Explore at:
zip(135823 bytes)Available download formats
Dataset updated
Apr 27, 2025
Authors
Satyavathi Gunturi
Description
This AI-generated dataset simulates food delivery platform data, featuring users, restaurants, orders, delivery times, visits, and referrals. Ideal for practicing advanced SQL analytics, anomaly detection, customer behavior analysis, and business insights
Adventure Works DW 2008
kaggle.com
zip
Updated Oct 5, 2024
Cite
James Vasanth (2024). Adventure Works DW 2008 [Dataset]. https://www.kaggle.com/datasets/jamesvasanth/adventure-works-dw-2008
Explore at:
zip(9400055 bytes)Available download formats
Dataset updated
Oct 5, 2024
Authors
James Vasanth
Description
The AdventureWorks DW 2008 dataset, originally provided by Microsoft, has been converted into CSV files for easier use, making it accessible for data exploration on platforms like Kaggle. The dataset is licensed under the Microsoft Public License (MS-PL), which is a permissive open-source license. This means you are free to use, modify, and share the dataset, whether for personal or commercial purposes, provided that you include the original license terms. However, it's important to note that the dataset is provided "as-is" without any warranty or guarantee from Microsoft.

I really enjoy working with the AdventureWorks DW 2008 dataset. It offers a rich and well-structured environment that's perfect for writing and learning SQL queries. The data warehouse includes a variety of tables, such as facts and dimensions, making it an excellent resource for both beginners and experienced SQL users to practice querying and exploring relational databases.

Now, with the dataset available in CSV format, it can be easily used with Python for exploratory data analysis (EDA), and it’s also well-suited for applying machine learning techniques such as regression, classification, and clustering.

If you’re planning to dive into the data, all the best! It's a fantastic resource to learn from and experiment with. Cheers!
Indian students data for data analysis Practice
kaggle.com
zip
Updated Jan 9, 2024
Cite
Satish Dhawale (2024). Indian students data for data analysis Practice [Dataset]. https://www.kaggle.com/datasets/satishdhawle/indian-students-data-for-data-analysis-practice
Explore at:
zip(849129 bytes)Available download formats
Dataset updated
Jan 9, 2024
Authors
Satish Dhawale
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The data appears to be related to student order and payment information, including details like student names, order IDs, courses enrolled, payment status, and more. It is intended for practicing SQL queries and for data analysis students who want to build visualisations. The data is provided by Skill Course, the e-learning platform by Satish Dhawale.

WWW.SKILLCOURSE.IN

The file "Indian_Students_Data.csv" contains the following information:

Columns:

srno, order_id, student_name, payment_date, course_name, price, payment_status, payment_id, email, state
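As a quick illustration of the kind of SQL practice these columns support, the sketch below totals revenue per course for successful payments. It uses Python's built-in `sqlite3` with the column names listed above; the table name `students`, the column types, the `'Success'` and `'Failed'` status values, and all rows are assumptions made for the example, not values from the file.

```python
import sqlite3

# Column names follow the list above; types, status values, and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "srno INTEGER, order_id TEXT, student_name TEXT, payment_date TEXT, "
    "course_name TEXT, price REAL, payment_status TEXT, payment_id TEXT, "
    "email TEXT, state TEXT)"
)
sample = [
    (1, "ORD1", "Student A", "2023-01-05", "Excel", 500.0, "Success", "PAY1", "a@example.com", "Maharashtra"),
    (2, "ORD2", "Student B", "2023-01-06", "SQL", 800.0, "Failed", "PAY2", "b@example.com", "Gujarat"),
    (3, "ORD3", "Student C", "2023-01-07", "SQL", 800.0, "Success", "PAY3", "c@example.com", "Maharashtra"),
    (4, "ORD4", "Student D", "2023-01-08", "SQL", 800.0, "Success", "PAY4", "d@example.com", "Kerala"),
]
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", sample)

# Count paid orders and sum revenue per course, ignoring unsuccessful payments.
revenue_by_course = conn.execute(
    """
    SELECT course_name, COUNT(*) AS paid_orders, SUM(price) AS revenue
    FROM students
    WHERE payment_status = 'Success'
    GROUP BY course_name
    ORDER BY revenue DESC
    """
).fetchall()
print(revenue_by_course)
```

Before running a query like this on the real file, it is worth checking the distinct values of `payment_status`, since the filter depends on the exact spelling used there.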
eCommerce Transactions
kaggle.com
zip
Updated Jan 3, 2025
Cite
Chad Wambles (2025). eCommerce Transactions [Dataset]. https://www.kaggle.com/datasets/chadwambles/ecommerce-transactions
Explore at:
zip(245430 bytes)Available download formats
Dataset updated
Jan 3, 2025
Authors
Chad Wambles
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This data set is well suited to practicing analytical skills in Power BI, Tableau, or Excel; you can also convert it to CSV to practice SQL.

This use case mimics transactions for a fictional eCommerce website named EverMart Online. The 3 tables in this data set are all logically connected together with IDs.

My Power BI Use Case Explanation

Using Microsoft Power BI, I made dynamic data visualizations for revenue reporting and customer behavior reporting.

Revenue Reporting Visuals
- Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total Sales, Product Sales, or Categorical Sales.
- Line Graph Visual that shows Total Revenue by Month of the entire year. This graph also changes to calculate Total Revenue by Month for the Total Sales by Product and Total Sales by Category if selected.
- Bar Graph Visual showcasing Total Sales by Product.
- Donut Chart Visual showcasing Total Sales by Category of Product.

Customer Behavior Reporting Visuals
- Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total or by continent selected on the map.
- Interactive Map Visual showing key statistics for the continent selected.
- The key statistics are presented on the tool tip when you select a continent, and the following statistics show for that continent:
  - Continent Name
  - Customer Total
  - Percentage of Products Sold
  - Percentage of Total Customers
  - Percentage of Total Transactions
  - Percentage of Total Revenue
Netflix Analysis
kaggle.com
zip
Updated Jan 14, 2025
Cite
sahibjotchandla (2025). Netflix Analysis [Dataset]. https://www.kaggle.com/datasets/sahibjotchandla/netflixdata
Explore at:
zip(1401547 bytes)Available download formats
Dataset updated
Jan 14, 2025
Authors
sahibjotchandla
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
This project provides SQL scripts and a cleaned dataset for exploring Netflix's extensive catalog of over 8,000 movies and TV shows.

SQL queries were crafted to uncover a range of insights, including:

The distribution of Movies vs. TV Shows

Top genres and categories across different content types

Trends in content by country, release year, and rating

Contributions of prominent directors and actors to Netflix's library

This project highlights essential skills in data cleaning, SQL querying, and exploratory data analysis. The results provide valuable insights into Netflix’s content trends, diversity, and evolution, making it a great resource for anyone interested in data-driven storytelling.
Healthcare Fraud Detection Dataset
kaggle.com
zip
Updated Mar 6, 2025
Cite
Vishal Jaiswal (2025). Healthcare Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/jaiswalmagic1/healthcare-fraud-detection-dataset
Explore at:
zip(10427537 bytes)Available download formats
Dataset updated
Mar 6, 2025
Authors
Vishal Jaiswal
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains comprehensive synthetic healthcare data designed for fraud detection analysis. It includes information on patients, healthcare providers, insurance claims, and payments. The dataset is structured to mimic real-world healthcare transactions, where fraudulent activities such as false claims, overbilling, and duplicate charges can be identified through advanced analytics.

The dataset is suitable for practicing SQL queries, exploratory data analysis (EDA), machine learning for fraud detection, and visualization techniques. It is designed to help data analysts and data scientists develop and refine their analytical skills in the healthcare insurance domain.

Dataset Overview

The dataset consists of four CSV files:

Patients Data (patients.csv)

Contains demographic details of patients, such as age, gender, insurance type, and location. Can be used to analyze patient demographics and healthcare usage patterns.

Providers Data (providers.csv)

Contains information about healthcare providers, including provider ID, specialty, location, and associated hospital.

Useful for identifying fraudulent claims linked to specific providers or hospitals.

Claims Data (claims.csv)

Contains records of insurance claims made by patients, including diagnosis codes, treatment details, provider ID, and claim amount.

Can be analyzed for suspicious patterns, such as excessive claims from a single provider or duplicate claims for the same patient.

Payments Data (payments.csv)

Contains details of claim payments made by insurance companies, including payment amount, claim ID, and reimbursement status.

Helps in detecting discrepancies between claims and actual reimbursements.

Possible Analysis Ideas

This dataset allows for multiple analysis approaches, including but not limited to:

🔹 Fraud Detection: Identify patterns in claims data to detect fraudulent activities (e.g., excessive billing, duplicate claims).
🔹 Provider Behavior Analysis: Analyze providers who have an unusually high claim volume or high rejection rates.
🔹 Payment Trends: Compare claims vs. payments to find irregularities in reimbursement patterns.
🔹 Patient Demographics & Utilization: Explore which patient groups are more likely to file claims and receive reimbursements.
🔹 SQL Query Practice: Perform advanced SQL queries, including joins, aggregations, window functions, and subqueries, to extract insights from the data.
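One of the ideas above, finding duplicate claims for the same patient, can be sketched with `GROUP BY` and `HAVING`. The example uses Python's built-in `sqlite3`; the column names (`claim_id`, `patient_id`, `provider_id`, `diagnosis_code`, `claim_amount`) and the rows are hypothetical and may not match the headers in claims.csv.

```python
import sqlite3

# Hypothetical column names and rows; check claims.csv for the real headers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims ("
    "claim_id TEXT, patient_id TEXT, provider_id TEXT, "
    "diagnosis_code TEXT, claim_amount REAL)"
)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?, ?, ?)",
    [
        ("C1", "PT1", "PR1", "D10", 120.0),
        ("C2", "PT1", "PR1", "D10", 120.0),  # same patient, provider, diagnosis, amount
        ("C3", "PT2", "PR1", "D20", 300.0),
        ("C4", "PT3", "PR2", "D10", 120.0),
    ],
)

# Groups with more than one claim sharing all four attributes are flagged for review.
suspects = conn.execute(
    """
    SELECT patient_id, provider_id, diagnosis_code, claim_amount, COUNT(*) AS n_claims
    FROM claims
    GROUP BY patient_id, provider_id, diagnosis_code, claim_amount
    HAVING COUNT(*) > 1
    """
).fetchall()
print(suspects)
```

A match here is only a candidate for review: legitimate repeat visits can share the same diagnosis and amount, so a claim date column, if present, would usually be added to the grouping key.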

Use Cases

Practicing SQL queries for job interviews and real-world projects.

Learning data cleaning, data wrangling, and feature engineering for healthcare analytics.

Applying machine learning techniques for fraud detection.

Gaining insights into the healthcare insurance domain and its challenges.

License & Usage

License: CC0 Public Domain (Free to use for any purpose).

Attribution: Not required but appreciated.

Intended Use: This dataset is for educational and research purposes only.

This dataset is an excellent resource for aspiring data analysts, data scientists, and SQL learners who want to gain hands-on experience in healthcare fraud detection.
Indeed - Data Science
kaggle.com
zip
Updated Aug 16, 2024
Cite
Cormac42 (2024). Indeed - Data Science [Dataset]. https://www.kaggle.com/datasets/cormac42/indeed-data-science
Explore at:
zip(6243501 bytes)Available download formats
Dataset updated
Aug 16, 2024
Authors
Cormac42
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset was scraped from Indeed during the summer of 2024, focusing on the search term 'data scientist.' The data encompasses job listings from every state in the USA, including remote positions, providing a comprehensive snapshot of the data science job market during this period.

Working with this dataset involves a variety of skills that can help students gain valuable experience in data analysis, visualization, and interpretation. Some skills that could be practiced using this data:

Data Cleaning and Preprocessing

Exploratory Data Analysis (EDA)

Data Visualization

Text Analysis and Natural Language Processing (NLP)

SQL and Database Management

Geospatial Analysis

Machine Learning
99 Little Orange, Technical Business Case
kaggle.com
zip
Updated Jun 13, 2022
Cite
IVAN CHAVEZ (2022). 99 Little Orange, Technical Business Case [Dataset]. https://www.kaggle.com/datasets/ivanchvez/99littleorange
Explore at:
zip(91998345 bytes)Available download formats
Dataset updated
Jun 13, 2022
Authors
IVAN CHAVEZ
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
99 Little Orange, Technical Business Case

Dear candidate, we are so excited with your interest in working with us! This challenge is an opportunity for us to know a bit of the great talent we know you have. It was built to simulate real-case scenarios that you would face while working at [Organization] and is organized in 2 parts:

A technical part of close-ended questions with specific answers that are meant to assess your ability to analyze large amounts of data with SQL to answer key questions.

An analytical part of open-ended questions to assess your ability to build data-backed recommendations to support decision-making. Expect further questions and discussions on top of your answers in the next phase of our hiring process.

Part I - Technical

Provide both the answer and the SQL code used.

1. What is the average trip cost on holidays? How does it compare to non-holidays?
2. Find the average call time of the first time passengers make a trip.
3. Find the average number of trips per driver for every week day.
4. On which day of the week do drivers usually drive the most distance on average?
5. What was the growth percentage of rides month over month?
6. Optional. List the top 5 drivers per number of trips in the top 5 largest cities.
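As an illustration of question 5, month-over-month growth can be computed by counting rides per month in a CTE and comparing each month with the previous one using `LAG`. This is a sketch under stated assumptions, not the expected answer to the challenge: the `trips` table, its `call_time` column, and the rows are hypothetical. It runs on Python's built-in `sqlite3` and assumes SQLite 3.25 or newer for window functions.

```python
import sqlite3

# Hypothetical table and rows; the challenge data has its own schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, call_time TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        (1, "2019-07-03 08:00:00"),
        (2, "2019-07-15 09:30:00"),
        (3, "2019-08-01 10:00:00"),
        (4, "2019-08-09 11:00:00"),
        (5, "2019-08-20 12:00:00"),
    ],
)

# Count rides per month, then compare each month with the one before it.
query = """
WITH monthly AS (
    SELECT strftime('%Y-%m', call_time) AS month, COUNT(*) AS rides
    FROM trips
    GROUP BY month
)
SELECT
    month,
    rides,
    100.0 * (rides - LAG(rides) OVER (ORDER BY month))
        / LAG(rides) OVER (ORDER BY month) AS growth_pct
FROM monthly
ORDER BY month
"""
growth = conn.execute(query).fetchall()
for month, rides, growth_pct in growth:
    print(month, rides, growth_pct)
```

The first month has no predecessor, so `LAG` returns NULL and its growth comes back as `None` in Python rather than zero.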

Part II - Analytical

99 is a marketplace, where drivers are the supply and passengers, the demand. One of our main challenges is to keep this marketplace balanced. If there's too much demand, prices would increase due to surge and passengers would prefer not to ride. If there's too much supply, drivers would spend more time idle, impacting their revenue.

1. Let's say it's 2019-09-23 and a new Operations manager for The Shire was just hired. She has 5 minutes during the Ops weekly meeting to present an overview of the business in the city, and since she's just arrived, she asked for your help to do it. What would you prepare for this 5-minute presentation? Please provide 1-2 slides with your idea.
2. She also mentioned she has a budget to invest in promoting the business. What kind of metrics and performance indicators would you use to help her decide whether she should invest it in the passenger side or the driver side? Extra point if you provide data-backed recommendations.
3. One month later, she comes back, super grateful for all the helpful insights you have given her. She says she is anticipating a driver supply shortage due to a major concert that is going to take place the next day and also a 3-day city holiday that is coming the next month. What would you do to help her analyze the best course of action to either prevent or minimize the problem in each case?
4. Optional. We want to build a model to predict “Possible Churn Users” (e.g., no trips in the past 4 weeks). List all the features you can think of and the data mining or machine learning model or other methods you may use for this case.
