55 datasets found
  1. SQL Data Cleaning & EDA Project

    • kaggle.com
    zip
    Updated Oct 15, 2024
    Cite
    Bilal424 (2024). SQL Data Cleaning & EDA Project [Dataset]. https://www.kaggle.com/datasets/bilal424/sql-data-cleaning-and-eda-project/code
    Explore at:
    Available download formats: zip (5352 bytes)
    Dataset updated
    Oct 15, 2024
    Authors
    Bilal424
    Description

    This dataset is a comprehensive collection of healthcare facility ratings across multiple countries. It includes detailed information on various attributes such as facility name, location, type, total beds, accreditation status, and annual visits of hospitals throughout the world. This cleaned dataset is ideal for conducting trend analysis, comparative studies between countries, or developing predictive models for facility ratings based on various factors. It offers a foundation for exploratory data analysis, machine learning modelling, and data visualization projects aimed at uncovering insights in the healthcare industry. The Project consists of the Original dataset, Data Cleaning Script and an EDA script in the data explorer tab for further analysis.

  2. cyclistic_data_analysis_sql_script

    • kaggle.com
    zip
    Updated Sep 24, 2023
    Cite
    Michael Usen (2023). cyclistic_data_analysis_sql_script [Dataset]. https://www.kaggle.com/mikeusen/cyclistic-data-analysis-sql-script
    Explore at:
    Available download formats: zip (2001 bytes)
    Dataset updated
    Sep 24, 2023
    Authors
    Michael Usen
    Description

    Dataset

    This dataset was created by Michael Usen

    Contents

  3. SQL Data Exploration COVID Portfolio V1

    • kaggle.com
    zip
    Updated Jun 16, 2023
    Cite
    Mohammad Hurairah (2023). SQL Data Exploration COVID Portfolio V1 [Dataset]. https://www.kaggle.com/datasets/mohammadhurairah/covid-portfolio-project-sql-v1
    Explore at:
    Available download formats: zip (61483158 bytes)
    Dataset updated
    Jun 16, 2023
    Authors
    Mohammad Hurairah
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data exploration, cleaning, and arrangement of the Covid Deaths and Covid Vaccinations data, covering:

    1. Data that is going to be used

    2. Shows the likelihood of dying if you contract covid in your country

    3. Show what percentage of the population got Covid

    4. Looking at Countries with the Highest Infection Rate compared to the Population

    5. Showing the Country with the Highest Death Count per Population

    6. Break things down by continent

    7. Continents with the Highest death count per population

    8. Looking at Total Population vs Vaccinations

    9. Used CTE and Temp Table

    10. Creating View to store data for later visualizations
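
    The steps listed above can be sketched roughly as follows. This is a minimal illustration and not the author's script: it uses Python's built-in sqlite3 module, and the table name covid_deaths, its columns, and every figure in it are hypothetical.

```python
import sqlite3

# Hypothetical table and made-up numbers, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE covid_deaths (
    location TEXT, continent TEXT, population INTEGER,
    total_cases INTEGER, total_deaths INTEGER
);
INSERT INTO covid_deaths VALUES
    ('Alphaland', 'Europe', 1000000, 50000, 1000),
    ('Betaland',  'Europe',  500000, 10000,  100),
    ('Gammaland', 'Asia',   2000000, 20000,  400);
""")

# Items 2-4: death likelihood among cases and share of population infected.
rates = conn.execute("""
    SELECT location,
           100.0 * total_deaths / total_cases AS death_pct,
           100.0 * total_cases / population   AS infected_pct
    FROM covid_deaths
    ORDER BY infected_pct DESC
""").fetchall()

# Items 6, 7 and 9: a CTE that aggregates by continent before ranking.
by_continent = conn.execute("""
    WITH continent_totals AS (
        SELECT continent,
               SUM(total_deaths) AS deaths,
               SUM(population)   AS population
        FROM covid_deaths
        GROUP BY continent
    )
    SELECT continent, 100.0 * deaths / population AS death_pct_of_population
    FROM continent_totals
    ORDER BY death_pct_of_population DESC
""").fetchall()

# Item 10: a view that keeps the query available for later visualization.
conn.execute("""
    CREATE VIEW infection_rates AS
    SELECT location, 100.0 * total_cases / population AS infected_pct
    FROM covid_deaths
""")
view_rows = conn.execute(
    "SELECT location, infected_pct FROM infection_rates ORDER BY location"
).fetchall()
```

    Multiplying by 100.0 forces floating-point division; with integer columns, plain division would truncate to zero in SQLite.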

  4. Seattle Airbnb Open Data - SQL Project

    • kaggle.com
    zip
    Updated Jul 31, 2024
    Cite
    Sharmaine Wong (2024). Seattle Airbnb Open Data - SQL Project [Dataset]. https://www.kaggle.com/datasets/swsw1717/seatle-airbnb-open-data-sql-project
    Explore at:
    Available download formats: zip (60054635 bytes)
    Dataset updated
    Jul 31, 2024
    Authors
    Sharmaine Wong
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Seattle
    Description

    Context
    Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Airbnb Inside initiative, this dataset describes the listing activity of homestays in Seattle, WA.

    Content
    The following Airbnb activity is included in this Seattle dataset:

    • Listings, including full descriptions and average review score
    • Reviews, including unique id for each reviewer and detailed comments
    • Calendar, including listing id and the price and availability for that day

    Inspiration
    • Can you describe the vibe of each Seattle neighborhood using listing descriptions?
    • What are the busiest times of the year to visit Seattle? By how much do prices spike?
    • Is there a general upward trend of both new Airbnb listings and total Airbnb visitors to Seattle?

  5. Superstore Sales EDA - Nawaf Alzzeer

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Nawaf Alzeer (2025). Superstore Sales EDA - Nawaf Alzzeer [Dataset]. https://www.kaggle.com/datasets/nawafalzeer/superstore-sales-eda-nawaf-alzzeer
    Explore at:
    Available download formats: zip (809072 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Nawaf Alzeer
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Complete data engineering project on 4 years (2014-2017) of retail sales transactions.

    DATASET CONTENTS:
    • Original denormalized data (9,994 rows)
    • Normalized database: 4 tables (customers, orders, products, sales)
    • 9 SQL analysis files organized by phase
    • Complete EDA from data cleaning to business insights

    DATABASE TABLES:
    • customers: 793 records
    • orders: 4,931 records
    • products: 1,812 records
    • sales: 9,686 transactions

    KEY FINDINGS:
    • Low profitability: 12.44% margin (below industry standard)
    • Discount problem: 50%+ transactions have 20%+ discounts
    • Loss-making: 18.66% of transactions lose money
    • Furniture crisis: Only 2.31% margin
    • Small baskets: Only 1.96 items per order

    SQL SKILLS DEMONSTRATED:
    ✓ Window functions (ROW_NUMBER, PARTITION BY)
    ✓ Database normalization (3NF)
    ✓ Complex JOINs (3-4 tables)
    ✓ Data deduplication with CTEs
    ✓ Business analytics queries
    ✓ CASE statements and aggregations

    PERFECT FOR:
    • SQL practice (beginner to advanced)
    • Database normalization learning
    • EDA methodology study
    • Business analytics projects
    • Data engineering portfolios

    FILES INCLUDED:
    • 5 CSV files (original + 4 normalized tables)
    • 9 SQL query files (cleaning, migration, analysis)

    Author: Nawaf Alzzeer
    License: CC BY-SA 4.0
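
    To make "data deduplication with CTEs" and ROW_NUMBER with PARTITION BY concrete, here is a small sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The raw_sales table, its columns, and its rows are hypothetical and far smaller than the real files; the margin and loss-share calculations mirror the kind of figures quoted under KEY FINDINGS but are computed on toy data only.

```python
import sqlite3

# Hypothetical denormalized table with one exact duplicate row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_sales (order_id TEXT, product_id TEXT, sales REAL, profit REAL);
INSERT INTO raw_sales VALUES
    ('O-1', 'P-1', 100.0, 20.0),
    ('O-1', 'P-1', 100.0, 20.0),
    ('O-2', 'P-2',  50.0, -5.0),
    ('O-3', 'P-1', 200.0, 10.0);
""")

# Number the rows inside each group of identical values, then keep row 1.
deduped = conn.execute("""
    WITH ranked AS (
        SELECT order_id, product_id, sales, profit,
               ROW_NUMBER() OVER (
                   PARTITION BY order_id, product_id, sales, profit
                   ORDER BY order_id
               ) AS rn
        FROM raw_sales
    )
    SELECT order_id, product_id, sales, profit
    FROM ranked
    WHERE rn = 1
    ORDER BY order_id
""").fetchall()

# Simple business metrics on the deduplicated rows.
total_sales = sum(row[2] for row in deduped)
total_profit = sum(row[3] for row in deduped)
margin_pct = 100.0 * total_profit / total_sales
loss_share_pct = 100.0 * sum(1 for row in deduped if row[3] < 0) / len(deduped)
```

    Partitioning by every column treats only fully identical rows as duplicates; partitioning by a business key instead would also collapse rows that differ in other fields, which may or may not be what a given cleaning step wants.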

  6. Bellabeat Case Study Supplement

    • kaggle.com
    zip
    Updated Oct 28, 2022
    Cite
    Britta Smith (2022). Bellabeat Case Study Supplement [Dataset]. https://www.kaggle.com/datasets/brittasmith/bellabeat-casestudy-sql-tableau-excel
    Explore at:
    Available download formats: zip (65670 bytes)
    Dataset updated
    Oct 28, 2022
    Authors
    Britta Smith
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data, clean data, and SQL query output tables as spreadsheets, supporting the Tableau story and the GitHub repository available at https://github.com/brittabeta/Bellabeat-Case-Study-SQL-Excel-Tableau

  7. The_Real_Estate_Project

    • kaggle.com
    zip
    Updated Mar 8, 2023
    Cite
    CraigAS (2023). The_Real_Estate_Project [Dataset]. https://www.kaggle.com/datasets/craigas/project-cleaning-script
    Explore at:
    Available download formats: zip (8515515 bytes)
    Dataset updated
    Mar 8, 2023
    Authors
    CraigAS
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This data analytics project utilized SQL and Tableau to analyze and model real estate prices in Georgia. The data was cleaned and transformed in SQL, and visualizations were created in Tableau to identify key trends and patterns. A linear regression model was developed to predict property prices based on given features, and the model was validated using statistical metrics. The results were presented in an interactive dashboard, enabling users to explore the data and make informed decisions related to real estate investments in Georgia.

    Thanks to the original authors of this dataset, Guenter Roehrich and Jordan, who co-produced a dataset of real estate listings for Georgia covering the first 6 months of 2021.

    For visualizations related to this project, click the tableau link in my bio or visit tableau public.

  8. COVID-19 - SQL Project

    • kaggle.com
    zip
    Updated Jul 30, 2024
    Cite
    Sharmaine Wong (2024). COVID-19 - SQL Project [Dataset]. https://www.kaggle.com/datasets/swsw1717/covid-19-sql-project
    Explore at:
    Available download formats: zip (13220606 bytes)
    Dataset updated
    Jul 30, 2024
    Authors
    Sharmaine Wong
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset includes information on the number of confirmed deaths from COVID-19, showing the total impact of the pandemic on mortality globally. The Our World in Data COVID-19 dataset is open-source, updated daily, and can be found here.

    SQL Queries for Data Exploration can be found on this Github Repository.

    Covid Dashboard created can be found on this Tableau Public Page.

  9. (Sunset)📒 Meta Kaggle ported to MS SQL SERVER

    • kaggle.com
    zip
    Updated Mar 20, 2024
    Cite
    BwandoWando (2024). (Sunset)📒 Meta Kaggle ported to MS SQL SERVER [Dataset]. https://www.kaggle.com/datasets/bwandowando/meta-kaggle-ported-to-sql-server-2022-database
    Explore at:
    Available download formats: zip (8635902534 bytes)
    Dataset updated
    Mar 20, 2024
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I've always wanted to explore Kaggle's Meta Kaggle dataset, but I am more comfortable using T-SQL when it comes to writing (very) complex queries. Also, I tend to write queries faster when using SQL Server Management Studio, like 100x faster. So, I ported Kaggle's Meta Kaggle dataset into MS SQL Server 2022 database format, created a backup file, then uploaded it here.

    • MSSQL VERSION: SQL Server 2022
    • Collation: SQL_Latin1_General_CP1_CI_AS
    • Recovery model: simple

    Requirements

    • Download and install the SQL SERVER 2022 Developer edition here
    • Download the backup file
    • Restore the backup file into your local instance. If you haven't done this before, it's easy and straightforward. Here is a guide.

    (QUOTED FROM THE ORIGINAL DATASET)

    Meta Kaggle

    Explore Kaggle's public data on competitions, datasets, kernels (code/notebooks) and more. Meta Kaggle may not be the Rosetta Stone of data science, but they think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

    Notes

  10. 🖼️ Famous Paintings

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Cite
    mexwell (2023). 🖼️ Famous Paintings [Dataset]. https://www.kaggle.com/datasets/mexwell/famous-paintings
    Explore at:
    Available download formats: zip (6681482 bytes)
    Dataset updated
    Oct 5, 2023
    Authors
    mexwell
    Description

    Famous paintings and their artists. This data set is published to give students interesting data for practicing SQL.

    Original Data

    Acknowledgement

    Photo by Steve Johnson on Unsplash

  11. Company Product Sales Analysis & BI Report

    • kaggle.com
    zip
    Updated Oct 25, 2023
    Cite
    Oluwabori Abiodun-Johnson (2023). Company Product Sales Analysis & BI Report [Dataset]. https://www.kaggle.com/datasets/oluwaboriaj/pizza-company-sales-bi-report
    Explore at:
    Available download formats: zip (15967889 bytes)
    Dataset updated
    Oct 25, 2023
    Authors
    Oluwabori Abiodun-Johnson
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is a self-guided project.

    PROBLEM STATEMENT: What underlying trends in the Pizza Sales data could the company be missing that would aid a gap analysis of its sales?

    OBJECTIVES:
    1. Generate Key Performance Indicators (KPIs) from the Pizza Sales data to gain insight into underlying business performance.
    2. Visualize important aspects of the Pizza Sales data to gain insight and understand key trends.

    I dived into the CSV dataset to uncover patterns within the Pizza Sales data, which spanned a calendar year.

    Used Microsoft SQL Server Management Studio (SSMS) to perform EDA (Exploratory Data Analysis), identifying trends and sales patterns.

    Having completed that, I used Microsoft Power BI to create visualizations that present my analytical findings to technical and non-technical viewers.

    STEPS COMPLETED: Data Importation, SQL Data Analysis Query Writing, Data Cleaning, Data Processing, Data Visualization, Report/Dashboard Development

  12. Bank Transaction Analytics Dashboard – SQL + Excel

    • kaggle.com
    zip
    Updated Aug 18, 2025
    Cite
    Prachi Singh (2025). Bank Transaction Analytics Dashboard – SQL + Excel [Dataset]. https://www.kaggle.com/datasets/prachisingh29ds/bank-transaction-analytics-dashboard-sql-excel
    Explore at:
    Available download formats: zip (2856220 bytes)
    Dataset updated
    Aug 18, 2025
    Authors
    Prachi Singh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📊 Bank Transaction Analytics Dashboard – SQL + Excel

    🔹 Overview

    This project focuses on Bank Transaction Analysis using a combination of SQL scripts and Excel dashboards. The goal is to provide insights into customer spending patterns, payment modes, suspicious transactions, and overall financial trends.

    The dataset and analysis files can help learners and professionals understand how SQL and Excel can be used together for business decision-making, customer behavior tracking, and data-driven insights.

    🔹 Contents

    The dataset includes the following resources:

    📂 SQL Scripts:

    Create & Insert tables

    15 Basic Queries

    15 Advanced Queries

    📂 CSV File:

    Bank Transaction Analytics.csv (main dataset)

    📂 Excel Charts:

    Pie, Bar, Column, Line, Doughnut charts

    Final Interactive Dashboard

    📂 Screenshots:

    Query outputs, Charts, and Final Dashboard visualization

    📂 PDF Reports:

    Project Report

    Dashboard Report

    📄 README.md:

    Complete documentation and step-by-step explanation

    🔹 Key Insights

    26–35 age group spent the most across categories.

    Amazon identified as the top merchant.

    NetBanking showed the highest share compared to POS/UPI.

    Travel & Shopping emerged as dominant categories.

    🔹 Applications

    Detecting suspicious transactions.

    Understanding customer behavior.

    Identifying top merchants and categories.

    Building business intelligence dashboards.

    🔹 How to Use

    Download the dataset and SQL scripts.

    Run Bank_Transaction_Analytics.SQL to create and insert data.

    Execute the queries (Basic + Advanced) for insights.

    Open Excel files to explore interactive charts and dashboards.

    Refer to Project Report PDF for documentation.

    🔹 Author

    👩‍💻 Created by: Prachi Singh

    GitHub: Bank Transaction Analytics Dashboard (https://github.com/prachi-singh-ds/Bank-Transaction-Analytics-Dashboard)

    ⚡This project is a complete SQL + Excel integration case study and is suitable for Data Science, Business Analytics, and Data Engineering portfolios.

  13. Zomato - AI Generated Data

    • kaggle.com
    zip
    Updated Apr 27, 2025
    Cite
    Satyavathi Gunturi (2025). Zomato - AI Generated Data [Dataset]. https://www.kaggle.com/datasets/satya918g/zomato-ai-generated-data/code
    Explore at:
    Available download formats: zip (135823 bytes)
    Dataset updated
    Apr 27, 2025
    Authors
    Satyavathi Gunturi
    Description

    This AI-generated dataset simulates food delivery platform data, featuring users, restaurants, orders, delivery times, visits, and referrals. Ideal for practicing advanced SQL analytics, anomaly detection, customer behavior analysis, and business insights

  14. Adventure Works DW 2008

    • kaggle.com
    zip
    Updated Oct 5, 2024
    Cite
    James Vasanth (2024). Adventure Works DW 2008 [Dataset]. https://www.kaggle.com/datasets/jamesvasanth/adventure-works-dw-2008
    Explore at:
    Available download formats: zip (9400055 bytes)
    Dataset updated
    Oct 5, 2024
    Authors
    James Vasanth
    Description

    The AdventureWorks DW 2008 dataset, originally provided by Microsoft, has been converted into CSV files for easier use, making it accessible for data exploration on platforms like Kaggle. The dataset is licensed under the Microsoft Public License (MS-PL), which is a permissive open-source license. This means you are free to use, modify, and share the dataset, whether for personal or commercial purposes, provided that you include the original license terms. However, it's important to note that the dataset is provided "as-is" without any warranty or guarantee from Microsoft.

    I really enjoy working with the AdventureWorks DW 2008 dataset. It offers a rich and well-structured environment that's perfect for writing and learning SQL queries. The data warehouse includes a variety of tables, such as facts and dimensions, making it an excellent resource for both beginners and experienced SQL users to practice querying and exploring relational databases.

    Now, with the dataset available in CSV format, it can be easily used with Python for exploratory data analysis (EDA), and it’s also well-suited for applying machine learning techniques such as regression, classification, and clustering.

    If you’re planning to dive into the data, all the best! It's a fantastic resource to learn from and experiment with. Cheers!

  15. Indian students data for data analysis Practice

    • kaggle.com
    zip
    Updated Jan 9, 2024
    Cite
    Satish Dhawale (2024). Indian students data for data analysis Practice [Dataset]. https://www.kaggle.com/datasets/satishdhawle/indian-students-data-for-data-analysis-practice
    Explore at:
    Available download formats: zip (849129 bytes)
    Dataset updated
    Jan 9, 2024
    Authors
    Satish Dhawale
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The data relates to student orders and payments, including details such as student names, order IDs, courses enrolled, payment status, and more. It is intended for practicing SQL queries and helps data analysis students build visualisations. The data is provided by Skill Course, the e-learning platform by Satish Dhawale.

    WWW.SKILLCOURSE.IN

    The file "Indian_Students_Data.csv" contains the following information:

    Columns are:

    srno, order_id, student_name, payment_date, course_name, price, payment_status, payment_id, email, state
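
    As a rough idea of the SQL practice this file supports, the sketch below loads a few rows into an in-memory SQLite table and summarizes orders and paid revenue per course. The column names come from the list above; the table name students, the sample rows, and the payment_status values 'paid' and 'failed' are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        srno INTEGER, order_id TEXT, student_name TEXT, payment_date TEXT,
        course_name TEXT, price REAL, payment_status TEXT, payment_id TEXT,
        email TEXT, state TEXT
    )
""")

# Made-up rows; the real CSV would be imported instead.
sample_rows = [
    (1, "ORD1", "Student A", "2023-01-05", "Excel", 499.0, "paid", "PAY1", "a@example.com", "Maharashtra"),
    (2, "ORD2", "Student B", "2023-01-06", "SQL", 999.0, "failed", "PAY2", "b@example.com", "Gujarat"),
    (3, "ORD3", "Student C", "2023-01-07", "SQL", 999.0, "paid", "PAY3", "c@example.com", "Maharashtra"),
]
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", sample_rows)

# Orders per course, counting revenue only for successful payments.
revenue_by_course = conn.execute("""
    SELECT course_name,
           COUNT(*) AS orders,
           SUM(CASE WHEN payment_status = 'paid' THEN price ELSE 0 END) AS paid_revenue
    FROM students
    GROUP BY course_name
    ORDER BY course_name
""").fetchall()
```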

  16. eCommerce Transactions

    • kaggle.com
    zip
    Updated Jan 3, 2025
    Cite
    Chad Wambles (2025). eCommerce Transactions [Dataset]. https://www.kaggle.com/datasets/chadwambles/ecommerce-transactions
    Explore at:
    Available download formats: zip (245430 bytes)
    Dataset updated
    Jan 3, 2025
    Authors
    Chad Wambles
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This data set is perfect for practicing your analytical skills for Power BI, Tableau, Excel, or transform it into a CSV to practice SQL.

    This use case mimics transactions for a fictional eCommerce website named EverMart Online. The 3 tables in this data set are all logically connected together with IDs.

    My Power BI Use Case Explanation - Using Microsoft Power BI, I made dynamic data visualizations for revenue reporting and customer behavior reporting.

    Revenue Reporting Visuals
    • Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total Sales, Product Sales, or Categorical Sales.
    • Line Graph Visual that shows Total Revenue by Month of the entire year. This graph also changes to calculate Total Revenue by Month for the Total Sales by Product and Total Sales by Category if selected.
    • Bar Graph Visual showcasing Total Sales by Product.
    • Donut Chart Visual showcasing Total Sales by Category of Product.

    Customer Behavior Reporting Visuals
    • Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total or by continent selected on the map.
    • Interactive Map Visual showing key statistics for the continent selected.
    • The key statistics are presented on the tool tip when you select a continent, and the following statistics show for that continent: Continent Name, Customer Total, Percentage of Products Sold, Percentage of Total Customers, Percentage of Total Transactions, Percentage of Total Revenue.

  17. Netflix Analysis

    • kaggle.com
    zip
    Updated Jan 14, 2025
    Cite
    sahibjotchandla (2025). Netflix Analysis [Dataset]. https://www.kaggle.com/datasets/sahibjotchandla/netflixdata
    Explore at:
    Available download formats: zip (1401547 bytes)
    Dataset updated
    Jan 14, 2025
    Authors
    sahibjotchandla
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This project provides SQL scripts and a cleaned dataset for exploring Netflix's extensive catalog of over 8,000 movies and TV shows.

    SQL queries were crafted to uncover a range of insights, including:

    • The distribution of Movies vs. TV Shows
    • Top genres and categories across different content types
    • Trends in content by country, release year, and rating
    • Contributions of prominent directors and actors to Netflix's library

    This project highlights essential skills in data cleaning, SQL querying, and exploratory data analysis. The results provide valuable insights into Netflix’s content trends, diversity, and evolution, making it a great resource for anyone interested in data-driven storytelling.

  18. Healthcare Fraud Detection Dataset

    • kaggle.com
    zip
    Updated Mar 6, 2025
    Cite
    Vishal Jaiswal (2025). Healthcare Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/jaiswalmagic1/healthcare-fraud-detection-dataset
    Explore at:
    Available download formats: zip (10427537 bytes)
    Dataset updated
    Mar 6, 2025
    Authors
    Vishal Jaiswal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains comprehensive synthetic healthcare data designed for fraud detection analysis. It includes information on patients, healthcare providers, insurance claims, and payments. The dataset is structured to mimic real-world healthcare transactions, where fraudulent activities such as false claims, overbilling, and duplicate charges can be identified through advanced analytics.

    The dataset is suitable for practicing SQL queries, exploratory data analysis (EDA), machine learning for fraud detection, and visualization techniques. It is designed to help data analysts and data scientists develop and refine their analytical skills in the healthcare insurance domain.

    Dataset Overview
    The dataset consists of four CSV files:

    Patients Data (patients.csv)

    Contains demographic details of patients, such as age, gender, insurance type, and location. Can be used to analyze patient demographics and healthcare usage patterns.

    Providers Data (providers.csv)

    Contains information about healthcare providers, including provider ID, specialty, location, and associated hospital. Useful for identifying fraudulent claims linked to specific providers or hospitals.

    Claims Data (claims.csv)

    Contains records of insurance claims made by patients, including diagnosis codes, treatment details, provider ID, and claim amount. Can be analyzed for suspicious patterns, such as excessive claims from a single provider or duplicate claims for the same patient.

    Payments Data (payments.csv)

    Contains details of claim payments made by insurance companies, including payment amount, claim ID, and reimbursement status. Helps in detecting discrepancies between claims and actual reimbursements.

    Possible Analysis Ideas

    This dataset allows for multiple analysis approaches, including but not limited to:

    🔹 Fraud Detection: Identify patterns in claims data to detect fraudulent activities (e.g., excessive billing, duplicate claims).
    🔹 Provider Behavior Analysis: Analyze providers who have an unusually high claim volume or high rejection rates.
    🔹 Payment Trends: Compare claims vs. payments to find irregularities in reimbursement patterns.
    🔹 Patient Demographics & Utilization: Explore which patient groups are more likely to file claims and receive reimbursements.
    🔹 SQL Query Practice: Perform advanced SQL queries, including joins, aggregations, window functions, and subqueries, to extract insights from the data.
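
    Two of these ideas, duplicate claims and claims vs. payments, can be sketched with plain SQL run through Python's sqlite3 module. The description names the kinds of fields each file holds but not the exact column headers, so the column names and every row below are hypothetical.

```python
import sqlite3

# Hypothetical columns and made-up rows standing in for claims.csv and payments.csv.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (
    claim_id INTEGER, patient_id INTEGER, provider_id INTEGER,
    diagnosis_code TEXT, claim_amount REAL
);
CREATE TABLE payments (payment_id INTEGER, claim_id INTEGER, payment_amount REAL);
INSERT INTO claims VALUES
    (1, 10, 100, 'D1', 500.0),
    (2, 10, 100, 'D1', 500.0),
    (3, 11, 101, 'D2', 300.0);
INSERT INTO payments VALUES
    (1, 1, 500.0),
    (2, 2, 500.0),
    (3, 3, 250.0);
""")

# Duplicate claims: same patient, provider, diagnosis and amount billed more than once.
duplicate_claims = conn.execute("""
    SELECT patient_id, provider_id, diagnosis_code, claim_amount, COUNT(*) AS n
    FROM claims
    GROUP BY patient_id, provider_id, diagnosis_code, claim_amount
    HAVING COUNT(*) > 1
""").fetchall()

# Payment gaps: claims whose reimbursement differs from the amount claimed.
payment_gaps = conn.execute("""
    SELECT c.claim_id, c.claim_amount - p.payment_amount AS gap
    FROM claims AS c
    JOIN payments AS p ON p.claim_id = c.claim_id
    WHERE c.claim_amount <> p.payment_amount
    ORDER BY c.claim_id
""").fetchall()
```

    A repeated claim or a payment gap is only a signal worth reviewing, not proof of fraud; a real analysis would also need dates and claim status to separate legitimate repeat visits from double billing.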

    Use Cases
    • Practicing SQL queries for job interviews and real-world projects.
    • Learning data cleaning, data wrangling, and feature engineering for healthcare analytics.
    • Applying machine learning techniques for fraud detection.
    • Gaining insights into the healthcare insurance domain and its challenges.

    License & Usage
    License: CC0 Public Domain (Free to use for any purpose).
    Attribution: Not required but appreciated.
    Intended Use: This dataset is for educational and research purposes only.

    This dataset is an excellent resource for aspiring data analysts, data scientists, and SQL learners who want to gain hands-on experience in healthcare fraud detection.

  19. Indeed - Data Science

    • kaggle.com
    zip
    Updated Aug 16, 2024
    Cite
    Cormac42 (2024). Indeed - Data Science [Dataset]. https://www.kaggle.com/datasets/cormac42/indeed-data-science
    Explore at:
    Available download formats: zip (6243501 bytes)
    Dataset updated
    Aug 16, 2024
    Authors
    Cormac42
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset was scraped from Indeed during the summer of 2024, focusing on the search term 'data scientist.' The data encompasses job listings from every state in the USA, including remote positions, providing a comprehensive snapshot of the data science job market during this period.

    Working with this dataset involves a variety of skills that can help students gain valuable experience in data analysis, visualization, and interpretation. Some skills that could be practiced using this data:

    1. Data Cleaning and Preprocessing
    2. Exploratory Data Analysis (EDA)
    3. Data Visualization
    4. Text Analysis and Natural Language Processing (NLP)
    5. SQL and Database Management
    6. Geospatial Analysis
    7. Machine Learning
  20. 99 Little Orange, Technical Business Case

    • kaggle.com
    zip
    Updated Jun 13, 2022
    Cite
    IVAN CHAVEZ (2022). 99 Little Orange, Technical Business Case [Dataset]. https://www.kaggle.com/datasets/ivanchvez/99littleorange
    Explore at:
    Available download formats: zip (91998345 bytes)
    Dataset updated
    Jun 13, 2022
    Authors
    IVAN CHAVEZ
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    99 Little Orange, Technical Business Case

    Dear candidate, we are so excited with your interest in working with us! This challenge is an opportunity for us to know a bit of the great talent we know you have. It was built to simulate real-case scenarios that you would face while working at [Organization] and is organized in 2 parts:

      1. A technical part of close-ended questions with specific answers that are meant to assess your ability to analyze large amounts of data with SQL to answer key questions.
      2. An analytical part of open-ended questions to assess your ability to build data-backed recommendations to support decision-making. Expect further questions and discussions on top of your answers in the next phase of our hiring process.

    Part I - Technical
    Provide both the answer and the SQL code used.
      1. What is the average trip cost of holidays? How does it compare to non-holidays?
      2. Find the average call time of the first time passengers make a trip.
      3. Find the average number of trips per driver for every week day.
      4. Which day of the week drivers usually drive the most distance on average?
      5. What was the growth percentage of rides month over month?
      6. Optional. List the top 5 drivers per number of trips in the top 5 largest cities.
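
    For the month-over-month growth question in Part I, one possible shape of the query is sketched below with Python's sqlite3 module (the LAG window function needs SQLite 3.25 or newer). It is not the reference answer to the challenge: the trips table, its call_time column, and the ride counts are all hypothetical stand-ins for the real files.

```python
import sqlite3

# Made-up ride counts per month: 2, then 3, then 6.
monthly_counts = {"2019-07": 2, "2019-08": 3, "2019-09": 6}
rows = [
    (f"{month}-{day:02d} 08:00:00",)
    for month, count in monthly_counts.items()
    for day in range(1, count + 1)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (call_time TEXT)")
conn.executemany("INSERT INTO trips VALUES (?)", rows)

# Count rides per month, then compare each month with the previous one.
growth = conn.execute("""
    WITH monthly AS (
        SELECT strftime('%Y-%m', call_time) AS month, COUNT(*) AS rides
        FROM trips
        GROUP BY month
    )
    SELECT month,
           rides,
           100.0 * (rides - LAG(rides) OVER (ORDER BY month))
                 / LAG(rides) OVER (ORDER BY month) AS growth_pct
    FROM monthly
    ORDER BY month
""").fetchall()
```

    The first month has no predecessor, so LAG returns NULL and its growth comes back as None in Python.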

    Part II - Analytical
    99 is a marketplace, where drivers are the supply and passengers, the demand. One of our main challenges is to keep this marketplace balanced. If there's too much demand, prices would increase due to surge and passengers would prefer not to run. If there's too much supply, drivers would spend more time idle impacting their revenue.
      1. Let's say it's 2019-09-23 and a new Operations manager for The Shire was just hired. She has 5 minutes during the Ops weekly meeting to present an overview of the business in the city, and since she's just arrived, she asked your help to do it. What would you prepare for this 5 minutes presentation? Please provide 1-2 slides with your idea.
      2. She also mentioned she has a budget to invest in promoting the business. What kind of metrics and performance indicators would you use in order to help her decide if she should invest it into the passenger side or the driver side? Extra point if you provide data-backed recommendations.
      3. One month later, she comes back, super grateful for all the helpful insights you have given her. And says she is anticipating a driver supply shortage due to a major concert that is going to take place the next day and also a 3 day city holiday that is coming the next month. What would you do to help her analyze the best course of action to either prevent or minimize the problem in each case?
      4. Optional. We want to build up a model to predict “Possible Churn Users” (e.g.: no trips in the past 4 weeks). List all features that you can think about and the data mining or machine learning model or other methods you may use for this case.
