20 datasets found
  1. SQL Project

    • data.cityofnewyork.us
    Updated May 29, 2025
    Cite
    Department of Finance (DOF) (2025). SQL Project [Dataset]. https://data.cityofnewyork.us/City-Government/SQL-Project/hek5-e7qj
    Explore at:
    Available download formats: json, csv, application/rdfxml, xml, application/rssxml, tsv
    Dataset updated
    May 29, 2025
    Authors
    Department of Finance (DOF)
    Description

    Check out our data lens page for additional data filtering and sorting options: https://data.cityofnewyork.us/view/i4p3-pe6a

    This dataset contains Open Parking and Camera Violations issued by the City of New York. Updates will be applied to this data set on the following schedule:

    New or open tickets will be updated weekly (Sunday). Tickets satisfied will be updated daily (Tuesday through Sunday). NOTE: Summonses that have been written-off are indicated by blank financials.

    Summons images will not be available during scheduled downtime on Sunday - Monday from 1:00 am to 2:30 am and on Sundays from 5:00 am to 10:00 am.

    • Initial dataset loaded 05/14/2016.
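
    The note that written-off summonses show up with blank financials suggests a simple filter when working with a CSV export. The sketch below is illustrative only: the column names (`summons_number`, `fine_amount`, `amount_due`) and the sample rows are assumptions, not the dataset's documented schema.

```python
import csv
import io

# Hypothetical sample export; column names are assumptions for illustration.
SAMPLE = """summons_number,fine_amount,amount_due
1001,65,65
1002,,
1003,115,0
"""

def split_written_off(rows, financial_cols=("fine_amount", "amount_due")):
    """Separate rows whose financial fields are all blank from the rest."""
    written_off, active = [], []
    for row in rows:
        # A row counts as written-off only if every financial field is empty.
        if all((row.get(col) or "").strip() == "" for col in financial_cols):
            written_off.append(row)
        else:
            active.append(row)
    return written_off, active

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
written_off, active = split_written_off(rows)
print([r["summons_number"] for r in written_off])
```

    Note that a value of 0 is treated as a real financial figure here, not as blank; only empty fields mark a write-off in this sketch.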
  2. Library Management System SQL Project

    • kaggle.com
    Updated Aug 22, 2024
    Cite
    Najir 0123 (2024). Library Management System SQL Project [Dataset]. https://www.kaggle.com/datasets/najir0123/library-management-system-sql-project
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 22, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Najir 0123
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description
  3. MY SQL DATA CLEANING PROJECT

    • kaggle.com
    Updated Jun 20, 2024
    Cite
    George M122 (2024). MY SQL DATA CLEANING PROJECT [Dataset]. https://www.kaggle.com/datasets/georgem122/my-sql-data-cleaning-project/data
    Explore at:
    Croissant
    Dataset updated
    Jun 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    George M122
    Description

    Dataset

    This dataset was created by George M122

    Contents

  4. sql-project-img

    • kaggle.com
    Updated Sep 26, 2023
    Cite
    Luis Lira (2023). sql-project-img [Dataset]. https://www.kaggle.com/datasets/luisliraportfolio/sql-project-img/data
    Explore at:
    Croissant
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Luis Lira
    Description

    Dataset

    This dataset was created by Luis Lira

    Contents

  5. Student's mental health

    • kaggle.com
    Updated Apr 7, 2024
    Cite
    Abdallah Nasser (2024). Student's mental health [Dataset]. https://www.kaggle.com/abdallahprogrammer/students-mental-health/discussion
    Explore at:
    Croissant
    Dataset updated
    Apr 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abdallah Nasser
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Abdallah Nasser

    Released under Apache 2.0

    Contents

  6. Hospital Database Management System SQL Project

    • kaggle.com
    Updated May 9, 2024
    Cite
    Andrew Dolcimascolo-Garrett (2024). Hospital Database Management System SQL Project [Dataset]. https://www.kaggle.com/datasets/andrewdolcigarrett/hospital-database-management-system-sql-project/versions/1
    Explore at:
    Croissant
    Dataset updated
    May 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Andrew Dolcimascolo-Garrett
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Andrew Dolcimascolo-Garrett

    Released under MIT

    Contents

  7. SQL PROJECT-1 BY JITENDRA KUMAR

    • kaggle.com
    Updated Nov 19, 2023
    Cite
    Jitendra Kumar (2023). SQL PROJECT-1 BY JITENDRA KUMAR [Dataset]. https://www.kaggle.com/datasets/jktdatascientist/sql-project-1-by-jitendra-kumar/code
    Explore at:
    Croissant
    Dataset updated
    Nov 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jitendra Kumar
    Description

    Dataset

    This dataset was created by Jitendra Kumar

    Released under Other (specified in description)

    Contents

  8. Nashville Housing Data : SQL project

    • kaggle.com
    Updated Oct 9, 2023
    Cite
    Paragi Jain11 (2023). Nashville Housing Data : SQL project [Dataset]. https://www.kaggle.com/paragijain11/nashville-housing-data-sql-project/discussion
    Explore at:
    Croissant
    Dataset updated
    Oct 9, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Paragi Jain11
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    Nashville
    Description

    Dataset

    This dataset was created by Paragi Jain11

    Released under CC0: Public Domain

    Contents

  9. Covid-SQL-project

    • kaggle.com
    Updated Jul 16, 2023
    Cite
    Emmanuel Chude (2023). Covid-SQL-project [Dataset]. https://www.kaggle.com/datasets/emmanuelchude/sql-project
    Explore at:
    Croissant
    Dataset updated
    Jul 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Emmanuel Chude
    Description

    Dataset

    This dataset was created by Emmanuel Chude

    Contents

  10. SQL-Project: Dataset - Delitos BA 2021

    • kaggle.com
    Updated Sep 26, 2023
    Cite
    Luis Lira (2023). SQL-Project: Dataset - Delitos BA 2021 [Dataset]. https://www.kaggle.com/datasets/luisliraportfolio/delitos-ba-2021
    Explore at:
    Croissant
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Luis Lira
    Description

    Dataset

    This dataset was created by Luis Lira

    Contents

  11. S&P 500 Companies Analysis Project

    • kaggle.com
    Updated Apr 6, 2025
    Cite
    anshadkaggle (2025). S&P 500 Companies Analysis Project [Dataset]. https://www.kaggle.com/datasets/anshadkaggle/s-and-p-500-companies-analysis-project
    Explore at:
    Croissant
    Dataset updated
    Apr 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    anshadkaggle
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This project focuses on analyzing the S&P 500 companies using data analysis tools like Python (Pandas), SQL, and Power BI. The goal is to extract insights related to sectors, industries, locations, and more, and visualize them using dashboards.

    Included Files:

    • sp500_cleaned.csv: Cleaned dataset used for analysis
    • sp500_analysis.ipynb: Jupyter Notebook (Python + SQL code)
    • dashboard_screenshot.png: Screenshot of Power BI dashboard
    • README.md: Summary of the project and key takeaways

    This project demonstrates practical data cleaning, querying, and visualization skills.

  12. Cupcake Business - Sales Data Analysis

    • kaggle.com
    Updated Mar 23, 2024
    Cite
    MITHRA CHANDRAN (2024). Cupcake Business - Sales Data Analysis [Dataset]. http://doi.org/10.34740/kaggle/dsv/7922498
    Explore at:
    Croissant
    Dataset updated
    Mar 23, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    MITHRA CHANDRAN
    Description

    This project answers business questions for a cupcake company by analyzing its sales data with SQL. The business wants to know:

    1. What are the unique flavors?
    2. What is the revenue per flavor?
    3. What is the total revenue for the year 2023?
    4. Which month has the highest sales?
    5. Which flavor sells the most during that month?
    6. Which is the most popular flavor?
    7. Which flavor has the highest rating?
    8. Is there any relation between a rating of 5 and revenue?
    9. Who are the top 3 loyal customers?
    10. Which city do the most orders come from?

    The database used is PostgreSQL.
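
    Two of the questions above (revenue per flavor, and the month with the highest sales) can be sketched as aggregate queries. This is a minimal illustration, not the author's queries: the `sales` table, its columns, and the sample rows are assumptions, and SQLite (through Python's `sqlite3` module) stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real dataset's columns may differ.
conn.execute(
    "CREATE TABLE sales (order_date TEXT, flavor TEXT, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2023-01-15", "Vanilla", 10, 3.0),
        ("2023-02-03", "Chocolate", 20, 3.5),
        ("2023-02-20", "Vanilla", 5, 3.0),
        ("2023-03-11", "Red Velvet", 8, 4.0),
    ],
)

# Question 2: revenue per flavor, highest first.
revenue_per_flavor = conn.execute(
    """
    SELECT flavor, SUM(quantity * unit_price) AS revenue
    FROM sales
    GROUP BY flavor
    ORDER BY revenue DESC
    """
).fetchall()

# Question 4: the 2023 month with the highest total sales.
best_month = conn.execute(
    """
    SELECT strftime('%m', order_date) AS month,
           SUM(quantity * unit_price) AS revenue
    FROM sales
    WHERE strftime('%Y', order_date) = '2023'
    GROUP BY month
    ORDER BY revenue DESC
    LIMIT 1
    """
).fetchone()

print(revenue_per_flavor)
print(best_month)
```

    In PostgreSQL the month would more likely be extracted with `EXTRACT(MONTH FROM order_date)` or `date_trunc`; `strftime` is the SQLite equivalent used here.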

  13. Sales Data Analysis Using MySQL, Excel & Power BI

    • kaggle.com
    Updated Mar 15, 2025
    Cite
    pooja Career (2025). Sales Data Analysis Using MySQL, Excel & Power BI [Dataset]. https://www.kaggle.com/datasets/poojacareer/sales-data-analysis-using-mysql-excel-and-power-bi
    Explore at:
    Croissant
    Dataset updated
    Mar 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    pooja Career
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Sales Data Analysis Using MySQL, Excel & Power BI

    Project Overview
    This project focuses on analyzing sales data to extract valuable insights, identify trends, and support business decision-making. Using MySQL for querying, Excel for data manipulation, and Power BI for visualization, we explore key sales performance metrics.

    Tools Used
    • MySQL: Data storage, cleaning, and analysis using SQL queries.
    • Excel: Data preprocessing, pivot tables, and basic visualization.
    • Power BI: Interactive dashboards for advanced data visualization.

    Dataset Information
    • Source: Kaggle Superstore Sales Dataset
    • Data Size: 10,000+ records
    • Key Features: Sales, Customer Details, Ship Mode, Product Category, Region

    Key Business Questions Answered

    1. What are the top-performing sales regions?
    • Used Power BI Map Visualization to analyze sales distribution by region.
    • Key Insight: The highest sales were recorded in the West & East regions, while some regions showed potential for improvement.

    2. Which product categories drive the highest revenue?
    • Used Excel Pivot Tables to aggregate Sales by Category.
    • Observation: "Technology" products had the highest sales, followed by "Furniture" and "Office Supplies."

    3. Who are the top 10 customers by sales volume?
    • Extracted top customers using SQL Queries & Power BI Ranking Functions.
    • Business Insight: Retaining these customers can significantly boost revenue.

    4. Which are the top 5 best-selling products?
    • Aggregated product sales using MySQL SUM() function.
    • Result: High-demand products identified, helping in inventory planning.

    5. How does shipping mode affect sales?
    • Created Power BI Slicer & Bar Chart for Ship Mode Analysis.
    • Finding: Standard Class was the most used, while Same-Day shipping had lower but high-value orders.

    Power BI Dashboard Overview
    • Sales by Region: Geographical performance map
    • Top 10 Customers: Key customers contributing to revenue
    • Category & Sales: Identifying best-performing categories
    • Top 5 Products: Sales contribution by product
    • Shipping Mode Impact: Analyzing customer shipping preferences

    Business Insights & Recommendations
    • Optimize Marketing Efforts: Focus more on high-performing regions.
    • Inventory Management: Maintain high stock levels for top-selling products.
    • Customer Retention Strategies: Prioritize personalized marketing for top customers.
    • Improve Shipping Efficiency: Explore cost-effective shipping options for increased profitability.

    Why This Project?
    This project helped me strengthen my SQL querying skills, enhance Excel data manipulation, and build Power BI dashboards for professional data storytelling.

    Next Steps: Expanding analysis with predictive analytics & machine learning.

    Project Files & Resources
    • Dataset: Available on Kaggle
    • Power BI Dashboard: Shared in project files
    • SQL Queries & Excel Reports: Available for reference

    Let's Connect!
    • LinkedIn: www.linkedin.com/in/pooja-akash-lohkare-62a6a5b6
    • Contact: poojacareer789@gmail.com

    If you found this useful, upvote & comment with your feedback!

  14. moved_project_sql_result_01

    • kaggle.com
    Updated Oct 18, 2023
    Cite
    José Francisco Lara Cárdemas (2023). moved_project_sql_result_01 [Dataset]. https://www.kaggle.com/datasets/josephfaster/moved-project-sql-result-01-csv
    Explore at:
    Croissant
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    José Francisco Lara Cárdemas
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by José Francisco Lara Cardenas

    Released under CC0: Public Domain

    Contents

  15. Bellabeat Case Study using SQL and Tableau

    • kaggle.com
    Updated Oct 22, 2023
    Cite
    Ragini1 (2023). Bellabeat Case Study using SQL and Tableau [Dataset]. https://www.kaggle.com/ragini1/bellabeat-case-study-using-sql-and-tableau/discussion
    Explore at:
    Croissant
    Dataset updated
    Oct 22, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ragini1
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by Ragini1

    Released under CC0: Public Domain

    Contents

  16. AW2019 Sales Overview

    • kaggle.com
    Updated Feb 5, 2025
    Cite
    Xavier Berge (2025). AW2019 Sales Overview [Dataset]. https://www.kaggle.com/datasets/xavierberge/aw2019-sales-overview
    Explore at:
    Croissant
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Xavier Berge
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset extracted from the 2019 Adventure Works database. It contains 4 files:

    • Dimension Calendar
    • Dimension Customer
    • Dimension Product
    • Fact Internet Sales

    All tables are used in the SQL project attached to the dataset.

  17. Healthcare Management System

    • kaggle.com
    Updated Dec 23, 2023
    Cite
    Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system
    Explore at:
    Croissant
    Dataset updated
    Dec 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anouska Abhisikta
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Patients Table:

    • PatientID: Unique identifier for each patient.
    • firstname: First name of the patient.
    • lastname: Last name of the patient.
    • email: Email address of the patient.

    This table stores information about individual patients, including their names and contact details.

    Doctors Table:

    • DoctorID: Unique identifier for each doctor.
    • DoctorName: Full name of the doctor.
    • Specialization: Area of medical specialization.
    • DoctorContact: Contact details of the doctor.

    This table contains details about healthcare providers, including their names, specializations, and contact information.

    Appointments Table:

    • AppointmentID: Unique identifier for each appointment.
    • Date: Date of the appointment.
    • Time: Time of the appointment.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.
    • DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

    This table records scheduled appointments, linking patients to doctors.

    MedicalProcedure Table:

    • ProcedureID: Unique identifier for each medical procedure.
    • ProcedureName: Name or description of the medical procedure.
    • AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

    This table stores details about medical procedures associated with specific appointments.

    Billing Table:

    • InvoiceID: Unique identifier for each billing transaction.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.
    • Items: Description of items or services billed.
    • Amount: Amount charged for the billing transaction.

    This table maintains records of billing transactions, associating them with specific patients.

    demo Table:

    • ID: Primary key, serves as a unique identifier for each record.
    • Name: Name of the entity.
    • Hint: Additional information or hint about the entity.

    This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

    This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
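
    The primary-key and foreign-key relationships described above can be written down as table definitions. The following is a minimal sketch, not the dataset author's DDL: the column types, the NOT NULL constraints, and the sample rows are assumptions, and SQLite (via Python's `sqlite3`) is used so the example runs without a database server. The `demo` table is left out because the description treats it as unrelated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY,
    firstname TEXT, lastname TEXT, email TEXT
);
CREATE TABLE Doctors (
    DoctorID INTEGER PRIMARY KEY,
    DoctorName TEXT, Specialization TEXT, DoctorContact TEXT
);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY,
    "Date" TEXT, "Time" TEXT,
    PatientID INTEGER NOT NULL REFERENCES Patients(PatientID),
    DoctorID  INTEGER NOT NULL REFERENCES Doctors(DoctorID)
);
CREATE TABLE MedicalProcedure (
    ProcedureID INTEGER PRIMARY KEY,
    ProcedureName TEXT,
    AppointmentID INTEGER NOT NULL REFERENCES Appointments(AppointmentID)
);
CREATE TABLE Billing (
    InvoiceID INTEGER PRIMARY KEY,
    PatientID INTEGER NOT NULL REFERENCES Patients(PatientID),
    Items TEXT, Amount REAL
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Patients VALUES (1, 'Ada', 'Lovelace', 'ada@example.com')")
conn.execute("INSERT INTO Doctors VALUES (1, 'Dr. Grey', 'Cardiology', '555-0100')")
conn.execute("INSERT INTO Appointments VALUES (1, '2024-01-05', '09:30', 1, 1)")

# An appointment for a patient that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO Appointments VALUES (2, '2024-01-06', '10:00', 99, 1)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Appointments link patients to doctors, so a join recovers both names.
rows = conn.execute(
    """
    SELECT p.firstname, d.DoctorName, a."Date"
    FROM Appointments a
    JOIN Patients p ON p.PatientID = a.PatientID
    JOIN Doctors  d ON d.DoctorID  = a.DoctorID
    """
).fetchall()
print(fk_rejected, rows)
```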

  18. Super store

    • kaggle.com
    Updated Feb 13, 2024
    Cite
    Somayeh Sahebi (2024). Super store [Dataset]. https://www.kaggle.com/datasets/somayehsahebi/super-store/code
    Explore at:
    Croissant
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Somayeh Sahebi
    Description

    Super Store Analytics with SQL and Looker Studio

    I am excited to share a project that I recently completed, focusing on comprehensive analytics for a superstore using SQL and visualizations crafted in Looker Studio. This project aimed to enhance decision-making processes by leveraging robust data analysis and interactive visualizations. All the data presented is expressed in thousands.

    Key Components:

    Query Optimization: Leveraging the power of SQL (DBeaver, Postgres), I implemented optimized queries to extract meaningful insights from the vast dataset. This involved employing aggregate functions, joins, and subqueries to retrieve specific information such as sales trends, customer behaviors, and inventory management.

    Looker Studio Visualizations: To provide a user-friendly interface for data exploration, Looker Studio was employed to create interactive and insightful visualizations. Dashboards were crafted to offer a holistic view of superstore performance, enabling stakeholders to identify patterns, trends, and areas for improvement.

    Project Achievements:

    • Enhanced data-driven decision-making processes for the superstore management team.
    • Improved operational efficiency through insights into product performance, customer preferences, and inventory management.
    • Provided a scalable and adaptable solution for ongoing analytics and reporting needs.

    Data Description:

    Order_ID (integer): Unique identifier for each order.
    Order_Date (date): Date when the order was placed.
    Ship_Date (date): Date when the order was shipped.
    Interval (day) (integer): Number of days between order placement and shipment.
    Ship_Mode (string): Shipping method chosen for the order.
    Customer_ID (integer): Unique identifier for each customer.
    Customer_Name (string): Name of the customer.
    Segment (string): Customer segmentation.
    Country (string): Country where the order was placed.
    City (string): City where the order was placed.
    State (string): State where the order was placed.
    Postal_Code (string): Postal code of the order location.
    Region (string): Geographical region of the order.
    Product_ID (integer): Unique identifier for each product.
    Category (string): Product category.
    Sub_Category (string): Product sub-category.
    Product_Name (string): Name of the product.
    Sales (float): Sales amount for the order.
    Quantity (integer): Quantity of products in the order.
    Discount (float): Discount applied to the order.
    Profit (float): Profit generated from the order.
    Returned (string): Indicates whether the order was returned (Yes/No).
    Person (string): Customer categorization.
    Region (string): Geographic region associated with the customer.
    

    SELECT
        o."Order_ID", o."Order_Date", o."Ship_Date", o."Ship_Mode", o."Segment",
        o."Country", o."City", o."State", o."Postal_Code", o."Region",
        o."Product_ID", o."Category", o."Sub_Category", o."Product_Name",
        o."Sales", o."Quantity", o."Discount", o."Profit",
        p."Person",
        CASE WHEN r."Returned" = 'yes' THEN 'yes' ELSE 'No' END AS "Returned",
        min("Sales" - "Profit") AS Total_cost,
        min("Sales" / ("Quantity" * (1 - "Discount"))) AS price_per_unit
    FROM orders o
    JOIN "Return" r ON r."Order_ID" = o."Order_ID"
    JOIN people p ON p."Region" = o."Region"
    GROUP BY
        o."Order_ID", o."Order_Date", o."Ship_Date", o."Ship_Mode", o."Segment",
        o."Country", o."City", o."State", o."Postal_Code", o."Region",
        o."Product_ID", o."Category", o."Sub_Category", o."Product_Name",
        o."Sales", o."Quantity", o."Discount", "Returned", o."Profit", p."Person"
    ORDER BY Total_cost, price_per_unit DESC;
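
    The two derived columns in the query above are plain arithmetic: Total_cost is Sales minus Profit, and price_per_unit is Sales divided by Quantity times (1 - Discount). A small worked example follows; the input numbers are made up, and reading price_per_unit as the pre-discount unit price assumes that Sales holds the discounted line total, which the description does not state.

```python
def total_cost(sales, profit):
    """Cost implied by a sale: whatever part of Sales is not Profit."""
    return sales - profit

def price_per_unit(sales, quantity, discount):
    """Sales / (Quantity * (1 - Discount)), as in the query's expression."""
    denominator = quantity * (1 - discount)
    if denominator == 0:
        # A 100% discount or zero quantity would divide by zero,
        # which would also raise an error in the SQL version.
        raise ValueError("quantity * (1 - discount) must be non-zero")
    return sales / denominator

# Hypothetical order line: 4 units, 25% discount, 90 in sales, 18 in profit.
cost = total_cost(sales=90.0, profit=18.0)
unit_price = price_per_unit(sales=90.0, quantity=4, discount=0.25)
print(cost, unit_price)
```

    Here the cost is 90 - 18 = 72 and the unit price is 90 / (4 × 0.75) = 30.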

    Questions:

    • What specific factors contributed to the fluctuations in average total cost between 2014 and 2017?
    • Can you identify any outliers or anomalies in the total cost data during this period?
    • What were the key drivers behind the increase in average profit from 2014 to 2017?
    • Are there any specific categories or sub-categories that experienced a significant change in profit?
    • Which categories or products contributed the most to the growth in average sales from 2014 to 2017?
    • Did any specific periods within this timeframe witness a notable spike or decline in sales?
    • Can you provide more detailed insights into the technology category's profitability, especially regarding its near-zero profit in 2016?
    • What other categories exhibited distinct patterns in the histogram analysis?
    • What factors contributed to the cost discrepancy between average sales and average total cost for furniture...
  19. Healthcare Fraud Detection Dataset

    • kaggle.com
    Updated Mar 6, 2025
    Cite
    Vishal Jaiswal (2025). Healthcare Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/jaiswalmagic1/healthcare-fraud-detection-dataset/code
    Explore at:
    Croissant
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vishal Jaiswal
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset contains comprehensive synthetic healthcare data designed for fraud detection analysis. It includes information on patients, healthcare providers, insurance claims, and payments. The dataset is structured to mimic real-world healthcare transactions, where fraudulent activities such as false claims, overbilling, and duplicate charges can be identified through advanced analytics.

    The dataset is suitable for practicing SQL queries, exploratory data analysis (EDA), machine learning for fraud detection, and visualization techniques. It is designed to help data analysts and data scientists develop and refine their analytical skills in the healthcare insurance domain.

    Dataset Overview

    The dataset consists of four CSV files:

    Patients Data (patients.csv)
    • Contains demographic details of patients, such as age, gender, insurance type, and location.
    • Can be used to analyze patient demographics and healthcare usage patterns.

    Providers Data (providers.csv)
    • Contains information about healthcare providers, including provider ID, specialty, location, and associated hospital.
    • Useful for identifying fraudulent claims linked to specific providers or hospitals.

    Claims Data (claims.csv)
    • Contains records of insurance claims made by patients, including diagnosis codes, treatment details, provider ID, and claim amount.
    • Can be analyzed for suspicious patterns, such as excessive claims from a single provider or duplicate claims for the same patient.

    Payments Data (payments.csv)
    • Contains details of claim payments made by insurance companies, including payment amount, claim ID, and reimbursement status.
    • Helps in detecting discrepancies between claims and actual reimbursements.

    Possible Analysis Ideas

    This dataset allows for multiple analysis approaches, including but not limited to:

    • Fraud Detection: Identify patterns in claims data to detect fraudulent activities (e.g., excessive billing, duplicate claims).
    • Provider Behavior Analysis: Analyze providers who have an unusually high claim volume or high rejection rates.
    • Payment Trends: Compare claims vs. payments to find irregularities in reimbursement patterns.
    • Patient Demographics & Utilization: Explore which patient groups are more likely to file claims and receive reimbursements.
    • SQL Query Practice: Perform advanced SQL queries, including joins, aggregations, window functions, and subqueries, to extract insights from the data.
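
    Two of the ideas above, spotting duplicate claims and comparing claims against payments, can be sketched with a GROUP BY ... HAVING query and a LEFT JOIN. This is an illustration under assumptions: the column names (`claim_id`, `patient_id`, `provider_id`, `diagnosis_code`, `claim_amount`, `payment_amount`) and the sample rows are hypothetical and may not match the actual CSV headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified versions of claims.csv and payments.csv.
conn.executescript("""
CREATE TABLE claims (
    claim_id INTEGER PRIMARY KEY, patient_id INTEGER, provider_id INTEGER,
    diagnosis_code TEXT, claim_amount REAL
);
CREATE TABLE payments (claim_id INTEGER, payment_amount REAL);
""")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 100, "A01", 500.0),
        (2, 10, 100, "A01", 500.0),  # same patient, provider, code, amount as claim 1
        (3, 11, 101, "B02", 300.0),
        (4, 12, 100, "C03", 200.0),
    ],
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 500.0), (3, 250.0), (4, 200.0)],
)

# Possible duplicates: identical patient/provider/diagnosis/amount filed more than once.
duplicates = conn.execute(
    """
    SELECT patient_id, provider_id, diagnosis_code, claim_amount, COUNT(*) AS n
    FROM claims
    GROUP BY patient_id, provider_id, diagnosis_code, claim_amount
    HAVING COUNT(*) > 1
    """
).fetchall()

# Claims with no payment at all, or paid a different amount than claimed.
mismatches = conn.execute(
    """
    SELECT c.claim_id, c.claim_amount, p.payment_amount
    FROM claims c
    LEFT JOIN payments p ON p.claim_id = c.claim_id
    WHERE p.payment_amount IS NULL OR p.payment_amount <> c.claim_amount
    ORDER BY c.claim_id
    """
).fetchall()

print(duplicates)
print(mismatches)
```

    A match on these four columns is only a heuristic for flagging rows to review; it does not by itself show that a claim is fraudulent.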

    Use Cases
    • Practicing SQL queries for job interviews and real-world projects.
    • Learning data cleaning, data wrangling, and feature engineering for healthcare analytics.
    • Applying machine learning techniques for fraud detection.
    • Gaining insights into the healthcare insurance domain and its challenges.

    License & Usage
    • License: CC0 Public Domain (free to use for any purpose).
    • Attribution: Not required but appreciated.
    • Intended Use: This dataset is for educational and research purposes only.

    This dataset is an excellent resource for aspiring data analysts, data scientists, and SQL learners who want to gain hands-on experience in healthcare fraud detection.

  20. Computer Science Students Career Prediction

    • kaggle.com
    Updated Jul 16, 2024
    Cite
    Rugved Patil (2024). Computer Science Students Career Prediction [Dataset]. https://www.kaggle.com/datasets/devildyno/computer-science-students-career-prediction/discussion
    Explore at:
    Croissant
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rugved Patil
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Computer Science Students Dataset

    This dataset contains information about computer science students from a fictional university. It includes attributes such as Student ID, Name, Gender, Age, GPA, Major, Interested Domain, Projects undertaken, and skills in Python, SQL, and Java. The dataset aims to provide insights into the academic performance, career aspirations, and technical skills of students in the field of computer science.

    Columns:

    • Student ID: Unique identifier for each student.
    • Name: Name of the student.
    • Gender: Gender of the student.
    • Age: Age of the student.
    • GPA: Grade Point Average of the student.
    • Major: Field of study within computer science.
    • Interested Domain: Area of interest within the field of computer science.
    • Projects: Noteworthy projects completed by the student.
    • Python: Proficiency level in Python programming.
    • SQL: Proficiency level in SQL querying.
    • Java: Proficiency level in Java programming.
    • Future Career: Intended career path or job aspiration (target variable).

    Purpose: This dataset is suitable for tasks such as predictive modeling to understand factors influencing career choices in computer science students. The "Future Career" column serves as the target variable for classification tasks. Researchers, educators, and data enthusiasts can utilize this dataset for various educational and analytical purposes in the realm of computer science education and career planning.
