This dataset is a comprehensive collection of healthcare facility ratings across multiple countries. It includes detailed information on attributes such as facility name, location, type, total beds, accreditation status, and annual visits for hospitals throughout the world. This cleaned dataset is suited to trend analysis, comparative studies between countries, or developing predictive models for facility ratings based on various factors. It offers a foundation for exploratory data analysis, machine learning modelling, and data visualization projects aimed at uncovering insights in the healthcare industry. The project consists of the original dataset, a data cleaning script, and an EDA script in the data explorer tab for further analysis.
This dataset was created by Michael Usen
https://creativecommons.org/publicdomain/zero/1.0/
Data exploration, cleaning, and arrangement of the Covid Deaths and Covid Vaccinations data, covering the following steps:
Select the data that will be used
Show the likelihood of dying if you contract Covid in your country
Show what percentage of the population got Covid
Look at the countries with the highest infection rate compared to population
Show the countries with the highest death count per population
Break things down by continent
Show the continents with the highest death count per population
Look at total population vs. vaccinations
Use a CTE and a temp table
Create a view to store data for later visualizations
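As an illustration of the death-likelihood and infection-rate steps listed above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `covid_deaths`, its columns, and the sample rows are assumptions made up for this example; the project's actual schema and SQL dialect may differ.

```python
import sqlite3

# Hypothetical table, columns, and rows; not taken from the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid_deaths ("
    "location TEXT, date TEXT, population INTEGER, "
    "total_cases INTEGER, total_deaths INTEGER)"
)
conn.executemany(
    "INSERT INTO covid_deaths VALUES (?, ?, ?, ?, ?)",
    [
        ("Aland", "2021-01-01", 1000, 100, 2),
        ("Aland", "2021-02-01", 1000, 200, 5),
        ("Borduria", "2021-01-01", 5000, 50, 1),
        ("Borduria", "2021-02-01", 5000, 250, 10),
    ],
)

# For each location, take the highest cumulative counts and compute
# deaths as a percentage of cases and cases as a percentage of population.
query = """
SELECT location,
       MAX(total_deaths) * 100.0 / MAX(total_cases) AS death_pct,
       MAX(total_cases) * 100.0 / MAX(population) AS infected_pct
FROM covid_deaths
GROUP BY location
ORDER BY infected_pct DESC
"""
rows = conn.execute(query).fetchall()
for location, death_pct, infected_pct in rows:
    print(f"{location}: {death_pct:.1f}% of cases died, {infected_pct:.1f}% infected")
```

Multiplying by `100.0` forces floating-point division, which matters when the count columns are stored as integers.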
https://creativecommons.org/publicdomain/zero/1.0/
Context Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Airbnb Inside initiative, this dataset describes the listing activity of homestays in Seattle, WA.
Content The following Airbnb activity is included in this Seattle dataset:
Inspiration
- Can you describe the vibe of each Seattle neighborhood using listing descriptions?
- What are the busiest times of the year to visit Seattle? By how much do prices spike?
- Is there a general upward trend of both new Airbnb listings and total Airbnb visitors to Seattle?
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Complete data engineering project on 4 years (2014-2017) of retail sales transactions.
DATASET CONTENTS:
- Original denormalized data (9,994 rows)
- Normalized database: 4 tables (customers, orders, products, sales)
- 9 SQL analysis files organized by phase
- Complete EDA from data cleaning to business insights
DATABASE TABLES:
- customers: 793 records
- orders: 4,931 records
- products: 1,812 records
- sales: 9,686 transactions
KEY FINDINGS:
- Low profitability: 12.44% margin (below industry standard)
- Discount problem: 50%+ of transactions have 20%+ discounts
- Loss-making: 18.66% of transactions lose money
- Furniture crisis: Only 2.31% margin
- Small baskets: Only 1.96 items per order
SQL SKILLS DEMONSTRATED:
✓ Window functions (ROW_NUMBER, PARTITION BY)
✓ Database normalization (3NF)
✓ Complex JOINs (3-4 tables)
✓ Data deduplication with CTEs
✓ Business analytics queries
✓ CASE statements and aggregations
PERFECT FOR:
- SQL practice (beginner to advanced)
- Database normalization learning
- EDA methodology study
- Business analytics projects
- Data engineering portfolios
FILES INCLUDED:
- 5 CSV files (original + 4 normalized tables)
- 9 SQL query files (cleaning, migration, analysis)
Author: Nawaf Alzzeer License: CC BY-SA 4.0
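The deduplication with CTEs and `ROW_NUMBER` mentioned among the skills for this project could look roughly like the sketch below. It is not the author's script: the `raw_sales` table, its columns, and the sample rows are hypothetical, and it runs on Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical denormalized table with one exact duplicate row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_sales (order_id TEXT, product_id TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [
        ("O-1", "P-1", 2),
        ("O-1", "P-1", 2),  # exact duplicate of the row above
        ("O-1", "P-2", 1),
        ("O-2", "P-1", 3),
    ],
)

# Number the rows inside each group of identical values, then keep only
# the first row of every group.
dedup_query = """
WITH numbered AS (
    SELECT order_id, product_id, quantity,
           ROW_NUMBER() OVER (
               PARTITION BY order_id, product_id, quantity
               ORDER BY order_id
           ) AS rn
    FROM raw_sales
)
SELECT order_id, product_id, quantity
FROM numbered
WHERE rn = 1
ORDER BY order_id, product_id
"""
deduped = conn.execute(dedup_query).fetchall()
print(deduped)
```

Partitioning by every column treats only fully identical rows as duplicates; a real cleaning step might partition by a narrower business key instead.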
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data, clean data, and SQL query output tables, provided as spreadsheets to support the Tableau story and the GitHub repository available at https://github.com/brittabeta/Bellabeat-Case-Study-SQL-Excel-Tableau
https://creativecommons.org/publicdomain/zero/1.0/
This data analytics project utilized SQL and Tableau to analyze and model real estate prices in Georgia. The data was cleaned and transformed in SQL, and visualizations were created in Tableau to identify key trends and patterns. A linear regression model was developed to predict property prices based on given features, and the model was validated using statistical metrics. The results were presented in an interactive dashboard, enabling users to explore the data and make informed decisions related to real estate investments in Georgia.
Thanks to the original authors of this dataset, Guenter Roehrich and Jordan, who co-produced a dataset of real estate listings for Georgia covering the first 6 months of 2021.
For visualizations related to this project, click the Tableau link in my bio or visit Tableau Public.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset includes information on the number of confirmed deaths from COVID-19, showing the total impact of the pandemic on mortality globally. The Our World in Data COVID-19 dataset is open-source, updated daily, and can be found here.
The SQL queries for data exploration can be found in this GitHub repository.
The Covid dashboard can be found on this Tableau Public page.
https://creativecommons.org/publicdomain/zero/1.0/
I've always wanted to explore Kaggle's Meta Kaggle dataset, but I am more comfortable using T-SQL when it comes to writing (very) complex queries. Also, I tend to write queries faster, like 100x faster, when using SQL Server Management Studio. So I ported Kaggle's Meta Kaggle dataset into MS SQL Server 2022 database format, created a backup file, and uploaded it here.
Explore Kaggle's public data on competitions, datasets, kernels (code/notebooks), and more. Meta Kaggle may not be the Rosetta Stone of data science, but they think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.
Famous paintings and their artists. This data set is published to give students interesting data for practicing SQL.
Photo by Steve Johnson on Unsplash
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is a self-guided project.
PROBLEM STATEMENT: What underlying trends in our Pizza Sales data could the company be missing that would aid a gap analysis of its sales?
OBJECTIVES: 1. Generate Key Performance Indicators (KPIs) from the Pizza Sales data to gain insight into underlying business performance. 2. Visualize important aspects of the Pizza Sales data to understand key trends.
I dived into the CSV dataset to uncover patterns within the Pizza Sales data, which spanned a calendar year.
I used Microsoft SQL Server Management Studio (SSMS) to perform EDA (Exploratory Data Analysis), identifying trends and sales patterns.
Having completed that, I used Microsoft Power BI to create visualizations that present my analytical findings to technical and non-technical viewers.
STEPS COMPLETED: data importation, SQL data analysis query writing, data cleaning, data processing, data visualization, and report/dashboard development.
https://creativecommons.org/publicdomain/zero/1.0/
📊 Bank Transaction Analytics Dashboard – SQL + Excel
🔹 Overview
This project focuses on Bank Transaction Analysis using a combination of SQL scripts and Excel dashboards. The goal is to provide insights into customer spending patterns, payment modes, suspicious transactions, and overall financial trends.
The dataset and analysis files can help learners and professionals understand how SQL and Excel can be used together for business decision-making, customer behavior tracking, and data-driven insights.
🔹 Contents
The dataset includes the following resources:
📂 SQL Scripts:
Create & Insert tables
15 Basic Queries
15 Advanced Queries
📂 CSV File:
Bank Transaction Analytics.csv (main dataset)
📂 Excel Charts:
Pie, Bar, Column, Line, Doughnut charts
Final Interactive Dashboard
📂 Screenshots:
Query outputs, Charts, and Final Dashboard visualization
📂 PDF Reports:
Project Report
Dashboard Report
📄 README.md:
Complete documentation and step-by-step explanation
🔹 Key Insights
26–35 age group spent the most across categories.
Amazon identified as the top merchant.
NetBanking showed the highest share compared to POS/UPI.
Travel & Shopping emerged as dominant categories.
🔹 Applications
Detecting suspicious transactions.
Understanding customer behavior.
Identifying top merchants and categories.
Building business intelligence dashboards.
🔹 How to Use
Download the dataset and SQL scripts.
Run Bank_Transaction_Analytics.SQL to create and insert data.
Execute the queries (Basic + Advanced) for insights.
Open Excel files to explore interactive charts and dashboards.
Refer to Project Report PDF for documentation.
🔹 Author
👩💻 Created by: Prachi Singh
GitHub: Bank Transaction Analytics Dashboard (https://github.com/prachi-singh-ds/Bank-Transaction-Analytics-Dashboard)
⚡This project is a complete SQL + Excel integration case study and is suitable for Data Science, Business Analytics, and Data Engineering portfolios.
This AI-generated dataset simulates food delivery platform data, featuring users, restaurants, orders, delivery times, visits, and referrals. It is ideal for practicing advanced SQL analytics, anomaly detection, customer behavior analysis, and business insights.
The AdventureWorks DW 2008 dataset, originally provided by Microsoft, has been converted into CSV files for easier use, making it accessible for data exploration on platforms like Kaggle. The dataset is licensed under the Microsoft Public License (MS-PL), which is a permissive open-source license. This means you are free to use, modify, and share the dataset, whether for personal or commercial purposes, provided that you include the original license terms. However, it's important to note that the dataset is provided "as-is" without any warranty or guarantee from Microsoft.
I really enjoy working with the AdventureWorks DW 2008 dataset. It offers a rich and well-structured environment that's perfect for writing and learning SQL queries. The data warehouse includes a variety of tables, such as facts and dimensions, making it an excellent resource for both beginners and experienced SQL users to practice querying and exploring relational databases.
Now, with the dataset available in CSV format, it can be easily used with Python for exploratory data analysis (EDA), and it’s also well-suited for applying machine learning techniques such as regression, classification, and clustering.
If you’re planning to dive into the data, all the best! It's a fantastic resource to learn from and experiment with. Cheers!
https://creativecommons.org/publicdomain/zero/1.0/
The data relates to student order and payment information, including student names, order IDs, courses enrolled, payment status, and more. It is intended for practicing SQL queries and helps data analysis students build visualizations. The data is provided by Skill course, the e-learning platform by Satish Dhawale.
The file "Indian_Students_Data.csv" contains the following information:
Columns:
srno
order_id
student_name
payment_date
course_name
price
payment_status
payment_id
email
state
https://creativecommons.org/publicdomain/zero/1.0/
This data set is perfect for practicing your analytical skills in Power BI, Tableau, or Excel, or you can transform it into a CSV to practice SQL.
This use case mimics transactions for a fictional eCommerce website named EverMart Online. The 3 tables in this data set are all logically connected together with IDs.
My Power BI Use Case Explanation
Using Microsoft Power BI, I made dynamic data visualizations for revenue reporting and customer behavior reporting.
Revenue Reporting Visuals
- Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total Sales, Product Sales, or Categorical Sales.
- Line Graph Visual that shows Total Revenue by Month of the entire year. This graph also changes to calculate Total Revenue by Month for the Total Sales by Product and Total Sales by Category if selected.
- Bar Graph Visual showcasing Total Sales by Product.
- Donut Chart Visual showcasing Total Sales by Category of Product.
Customer Behavior Reporting Visuals
- Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total or by continent selected on the map.
- Interactive Map Visual showing key statistics for the continent selected.
- The key statistics are presented on the tooltip when you select a continent, and the following statistics show for that continent:
  - Continent Name
  - Customer Total
  - Percentage of Products Sold
  - Percentage of Total Customers
  - Percentage of Total Transactions
  - Percentage of Total Revenue
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This project provides SQL scripts and a cleaned dataset for exploring Netflix's extensive catalog of over 8,000 movies and TV shows.
SQL queries were crafted to uncover a range of insights, including:
This project highlights essential skills in data cleaning, SQL querying, and exploratory data analysis. The results provide valuable insights into Netflix’s content trends, diversity, and evolution, making it a great resource for anyone interested in data-driven storytelling.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains comprehensive synthetic healthcare data designed for fraud detection analysis. It includes information on patients, healthcare providers, insurance claims, and payments. The dataset is structured to mimic real-world healthcare transactions, where fraudulent activities such as false claims, overbilling, and duplicate charges can be identified through advanced analytics.
The dataset is suitable for practicing SQL queries, exploratory data analysis (EDA), machine learning for fraud detection, and visualization techniques. It is designed to help data analysts and data scientists develop and refine their analytical skills in the healthcare insurance domain.
Dataset Overview
The dataset consists of four CSV files:
Patients Data (patients.csv)
Contains demographic details of patients, such as age, gender, insurance type, and location.
Can be used to analyze patient demographics and healthcare usage patterns.
Providers Data (providers.csv)
Contains information about healthcare providers, including provider ID, specialty, location, and associated hospital.
Useful for identifying fraudulent claims linked to specific providers or hospitals.
Claims Data (claims.csv)
Contains records of insurance claims made by patients, including diagnosis codes, treatment details, provider ID, and claim amount.
Can be analyzed for suspicious patterns, such as excessive claims from a single provider or duplicate claims for the same patient.
Payments Data (payments.csv)
Contains details of claim payments made by insurance companies, including payment amount, claim ID, and reimbursement status.
Helps in detecting discrepancies between claims and actual reimbursements.
Possible Analysis Ideas
This dataset allows for multiple analysis approaches, including but not limited to:
🔹 Fraud Detection: Identify patterns in claims data to detect fraudulent activities (e.g., excessive billing, duplicate claims).
🔹 Provider Behavior Analysis: Analyze providers who have an unusually high claim volume or high rejection rates.
🔹 Payment Trends: Compare claims vs. payments to find irregularities in reimbursement patterns.
🔹 Patient Demographics & Utilization: Explore which patient groups are more likely to file claims and receive reimbursements.
🔹 SQL Query Practice: Perform advanced SQL queries, including joins, aggregations, window functions, and subqueries, to extract insights from the data.
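One way to approach the duplicate-claims idea above is a simple grouping query, sketched here with Python's built-in sqlite3 module. The column names (`claim_id`, `patient_id`, `provider_id`, `diagnosis_code`, `claim_amount`) and the sample rows are assumptions for illustration; check them against the actual headers in claims.csv.

```python
import sqlite3

# Hypothetical columns and rows; the real claims.csv may be laid out differently.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (claim_id TEXT, patient_id TEXT, provider_id TEXT, "
    "diagnosis_code TEXT, claim_amount REAL)"
)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?, ?, ?)",
    [
        ("C1", "PT1", "PR1", "D10", 500.0),
        ("C2", "PT1", "PR1", "D10", 500.0),  # same patient, provider, code, amount
        ("C3", "PT2", "PR1", "D20", 120.0),
        ("C4", "PT3", "PR2", "D10", 500.0),
    ],
)

# Flag combinations of patient, provider, diagnosis, and amount that were
# claimed more than once.
query = """
SELECT patient_id, provider_id, diagnosis_code, claim_amount,
       COUNT(*) AS n_claims
FROM claims
GROUP BY patient_id, provider_id, diagnosis_code, claim_amount
HAVING COUNT(*) > 1
"""
suspects = conn.execute(query).fetchall()
print(suspects)
```

A repeated combination is only a candidate for review, not proof of fraud; legitimate repeat treatments can produce identical rows.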
Use Cases
Practicing SQL queries for job interviews and real-world projects.
Learning data cleaning, data wrangling, and feature engineering for healthcare analytics.
Applying machine learning techniques for fraud detection.
Gaining insights into the healthcare insurance domain and its challenges.
License & Usage
License: CC0 Public Domain (Free to use for any purpose).
Attribution: Not required but appreciated.
Intended Use: This dataset is for educational and research purposes only.
This dataset is an excellent resource for aspiring data analysts, data scientists, and SQL learners who want to gain hands-on experience in healthcare fraud detection.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was scraped from Indeed during the summer of 2024, focusing on the search term 'data scientist.' The data encompasses job listings from every state in the USA, including remote positions, providing a comprehensive snapshot of the data science job market during this period.
Working with this dataset involves a variety of skills that can help students gain valuable experience in data analysis, visualization, and interpretation. Some skills that could be practiced using this data:
https://creativecommons.org/publicdomain/zero/1.0/
Dear candidate, we are so excited with your interest in working with us! This challenge is an opportunity for us to know a bit of the great talent we know you have. It was built to simulate real-case scenarios that you would face while working at [Organization] and is organized in 2 parts:
Part I - Technical
Provide both the answer and the SQL code used.
1. What is the average trip cost on holidays? How does it compare to non-holidays?
2. Find the average call time of the first trip each passenger makes.
3. Find the average number of trips per driver for every weekday.
4. On which day of the week do drivers usually drive the most distance on average?
5. What was the growth percentage of rides month over month?
6. Optional. List the top 5 drivers by number of trips in the top 5 largest cities.
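For the month-over-month growth question in Part I, one possible approach is a monthly count combined with the `LAG` window function. The sketch below is only an illustration: the `trips` table, the `call_time` column, and the sample rows are assumptions, since the challenge's real schema is not shown here. It uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical trips table: 2 rides in June, 3 in July, 3 in August 2019.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, call_time TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        (1, "2019-06-03 08:00:00"),
        (2, "2019-06-17 09:30:00"),
        (3, "2019-07-01 10:00:00"),
        (4, "2019-07-09 18:45:00"),
        (5, "2019-07-21 07:15:00"),
        (6, "2019-08-02 12:00:00"),
        (7, "2019-08-05 12:00:00"),
        (8, "2019-08-19 12:00:00"),
    ],
)

# Count rides per month, then compare each month with the previous one.
query = """
WITH monthly AS (
    SELECT strftime('%Y-%m', call_time) AS month, COUNT(*) AS rides
    FROM trips
    GROUP BY month
)
SELECT month,
       rides,
       (rides - LAG(rides) OVER (ORDER BY month)) * 100.0
           / LAG(rides) OVER (ORDER BY month) AS growth_pct
FROM monthly
ORDER BY month
"""
growth = conn.execute(query).fetchall()
for month, rides, growth_pct in growth:
    print(month, rides, growth_pct)
```

The first month has no predecessor, so its growth value is NULL (`None` in Python) rather than zero.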
Part II - Analytical
99 is a marketplace, where drivers are the supply and passengers the demand. One of our main challenges is to keep this marketplace balanced. If there's too much demand, prices would increase due to surge and passengers would prefer not to ride. If there's too much supply, drivers would spend more time idle, impacting their revenue.
1. Let's say it's 2019-09-23 and a new Operations manager for The Shire was just hired. She has 5 minutes during the Ops weekly meeting to present an overview of the business in the city, and since she's just arrived, she asked for your help. What would you prepare for this 5-minute presentation? Please provide 1-2 slides with your idea.
2. She also mentioned she has a budget to invest in promoting the business. What kind of metrics and performance indicators would you use to help her decide whether she should invest it in the passenger side or the driver side? Extra point if you provide data-backed recommendations.
3. One month later, she comes back, super grateful for all the helpful insights you have given her. She says she is anticipating a driver supply shortage due to a major concert that is going to take place the next day and also a 3-day city holiday that is coming the next month. What would you do to help her analyze the best course of action to either prevent or minimize the problem in each case?
4. Optional. We want to build a model to predict “Possible Churn Users” (e.g., no trips in the past 4 weeks). List all the features you can think of and the data mining or machine learning model or other methods you may use for this case.
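The example churn definition in the optional question (no trips in the past 4 weeks) can be turned into a label before any modelling. The sketch below assumes a hypothetical mapping from user ID to last trip date; the function name, the user IDs, and the dates are made up for illustration.

```python
from datetime import date, timedelta


def possible_churn_users(last_trip_by_user, as_of, weeks=4):
    """Return user IDs whose last trip is older than `weeks` weeks before `as_of`."""
    cutoff = as_of - timedelta(weeks=weeks)
    # A trip exactly on the cutoff date still counts as recent activity.
    return sorted(
        user for user, last_trip in last_trip_by_user.items() if last_trip < cutoff
    )


# Hypothetical last-trip dates per user.
last_trip = {
    "u1": date(2019, 9, 20),
    "u2": date(2019, 8, 1),
    "u3": date(2019, 8, 26),
}
churned = possible_churn_users(last_trip, as_of=date(2019, 9, 23))
print(churned)
```

A label like this could serve as the target variable, with features such as trip frequency or average trip cost computed only from data before the cutoff to avoid leakage.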