Check out our data lens page for additional data filtering and sorting options: https://data.cityofnewyork.us/view/i4p3-pe6a
This dataset contains Open Parking and Camera Violations issued by the City of New York. Updates will be applied to this data set on the following schedule:
New or open tickets will be updated weekly (Sunday). Tickets satisfied will be updated daily (Tuesday through Sunday). NOTE: Summonses that have been written-off are indicated by blank financials.
Summons images will not be available during scheduled downtime on Sunday - Monday from 1:00 am to 2:30 am and on Sundays from 5:00 am to 10:00 am.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Library Dataset for SQL Project
Watch Full Video -- https://www.youtube.com/watch?v=6X2-P9fNVvw
Project Files -- https://github.com/najirh/Library-System-Management---P2?tab=readme-ov-file
This dataset was created by George M122
This dataset was created by Luis Lira
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Abdallah Nasser
Released under Apache 2.0
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Andrew Dolcimascolo-Garrett
Released under MIT
This dataset was created by Jitendra Kumar
Released under Other (specified in description)
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Paragi Jain11
Released under CC0: Public Domain
This dataset was created by Emmanuel Chude
This dataset was created by Luis Lira
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This project focuses on analyzing the S&P 500 companies using data analysis tools like Python (Pandas), SQL, and Power BI. The goal is to extract insights related to sectors, industries, locations, and more, and visualize them using dashboards.
Included Files:
sp500_cleaned.csv: Cleaned dataset used for analysis
sp500_analysis.ipynb: Jupyter Notebook (Python + SQL code)
dashboard_screenshot.png: Screenshot of Power BI dashboard
README.md: Summary of the project and key takeaways
This project demonstrates practical data cleaning, querying, and visualization skills.
This project answers business questions for a cupcake company by analyzing its sales data with SQL. The business wants to know
The database used is PostgreSQL.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Sales Data Analysis Using MySQL, Excel & Power BI

Project Overview:
This project analyzes sales data to extract insights, identify trends, and support business decision-making. Using MySQL for querying, Excel for data manipulation, and Power BI for visualization, it explores key sales performance metrics.

Tools Used:
- MySQL: Data storage, cleaning, and analysis using SQL queries.
- Excel: Data preprocessing, pivot tables, and basic visualization.
- Power BI: Interactive dashboards for advanced data visualization.

Dataset Information:
- Source: Kaggle Superstore Sales Dataset
- Data Size: 10,000+ records
- Key Features: Sales, Customer Details, Ship Mode, Product Category, Region

Key Business Questions Answered:
1. What are the top-performing sales regions?
- Used Power BI Map Visualization to analyze sales distribution by region.
- Key Insight: The highest sales were recorded in the West and East regions, while some regions showed potential for improvement.
2. Which product categories drive the highest revenue?
- Used Excel Pivot Tables to aggregate Sales by Category.
- Observation: "Technology" products had the highest sales, followed by "Furniture" and "Office Supplies."
3. Who are the top 10 customers by sales volume?
- Extracted top customers using SQL queries and Power BI ranking functions.
- Business Insight: Retaining these customers can significantly boost revenue.
4. Which are the top 5 best-selling products?
- Aggregated product sales using the MySQL SUM() function.
- Result: High-demand products identified, helping in inventory planning.
5. How does shipping mode affect sales?
- Created a Power BI slicer and bar chart for Ship Mode analysis.
- Finding: Standard Class was the most used, while Same-Day shipping had fewer but higher-value orders.
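The "top 5 best-selling products" question above comes down to a grouped SUM() with an ordered LIMIT. The following is a minimal sketch of that pattern, not the project's actual query: it uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server, and the table name `sales`, the columns `product_name` and `sales`, and the sample rows are all hypothetical.

```python
import sqlite3


def top_products(rows, limit=5):
    """Return the `limit` products with the highest summed sales."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout; the real Superstore columns may differ.
    conn.execute("CREATE TABLE sales (product_name TEXT, sales REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT product_name, SUM(sales) AS total_sales
        FROM sales
        GROUP BY product_name
        ORDER BY total_sales DESC, product_name
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result


# Made-up sample rows, for illustration only.
sample_rows = [
    ("Desk", 200.0), ("Phone", 350.0), ("Desk", 150.0),
    ("Binder", 20.0), ("Phone", 100.0), ("Chair", 300.0),
]
top_two = top_products(sample_rows, limit=2)
```

The same SELECT should carry over to MySQL largely unchanged. The secondary sort on `product_name` only makes ties deterministic.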
Power BI Dashboard Overview:
- Sales by Region: Geographical performance map
- Top 10 Customers: Key customers contributing to revenue
- Category & Sales: Identifying best-performing categories
- Top 5 Products: Sales contribution by product
- Shipping Mode Impact: Analyzing customer shipping preferences

Business Insights & Recommendations:
- Optimize Marketing Efforts: Focus more on high-performing regions.
- Inventory Management: Maintain high stock levels for top-selling products.
- Customer Retention Strategies: Prioritize personalized marketing for top customers.
- Improve Shipping Efficiency: Explore cost-effective shipping options for increased profitability.

Why This Project?
This project helped me strengthen my SQL querying skills, enhance Excel data manipulation, and build Power BI dashboards for professional data storytelling.

Next Steps: Expanding analysis with predictive analytics & machine learning.

Project Files & Resources:
- Dataset: Available on Kaggle
- Power BI Dashboard: Shared in project files
- SQL Queries & Excel Reports: Available for reference

Let's Connect!
LinkedIn: www.linkedin.com/in/pooja-akash-lohkare-62a6a5b6
Contact: poojacareer789@gmail.com
If you found this useful, upvote & comment with your feedback!
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by José Francisco Lara Cardenas
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Ragini1
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
Dataset extracted from the 2019 Adventure Works database. 4 files:
- Dimension Calendar
- Dimension Customer
- Dimension Product
- Fact Internet Sales
All tables are used in the SQL project attached to the dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Patients Table:
This table stores information about individual patients, including their names and contact details.
Doctors Table:
This table contains details about healthcare providers, including their names, specializations, and contact information.
Appointments Table:
This table records scheduled appointments, linking patients to doctors.
MedicalProcedure Table:
This table stores details about medical procedures associated with specific appointments.
Billing Table:
This table maintains records of billing transactions, associating them with specific patients.
demo Table:
This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.
This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
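The relationships described above (appointments link patients to doctors, procedures attach to appointments, billing attaches to patients) can be sketched as a small relational schema. This is only an illustration under assumptions: every column name below is hypothetical, since the listing gives table descriptions but no column definitions, and SQLite (via Python's sqlite3) stands in for whatever database the dataset targets. The demo table is left out.

```python
import sqlite3

# Hypothetical columns; only the table roles come from the description above.
SCHEMA = """
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE doctors (doctor_id INTEGER PRIMARY KEY, name TEXT, specialization TEXT);
CREATE TABLE appointments (
    appointment_id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patients(patient_id),
    doctor_id INTEGER REFERENCES doctors(doctor_id),
    appointment_date TEXT
);
CREATE TABLE medical_procedure (
    procedure_id INTEGER PRIMARY KEY,
    appointment_id INTEGER REFERENCES appointments(appointment_id),
    procedure_name TEXT
);
CREATE TABLE billing (
    invoice_id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patients(patient_id),
    amount REAL
);
"""


def build_demo_db():
    """Create the sketch schema in memory and load one made-up row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO patients VALUES (1, 'Ann Lee', '555-0100')")
    conn.execute("INSERT INTO doctors VALUES (10, 'Dr. Roy', 'Cardiology')")
    conn.execute("INSERT INTO appointments VALUES (100, 1, 10, '2024-01-15')")
    conn.execute("INSERT INTO medical_procedure VALUES (1000, 100, 'ECG')")
    conn.execute("INSERT INTO billing VALUES (5000, 1, 120.0)")
    return conn


def appointment_report(conn):
    """List each appointment with its patient, doctor, and any procedure."""
    return conn.execute(
        """
        SELECT p.name, d.name, a.appointment_date, m.procedure_name
        FROM appointments a
        JOIN patients p ON p.patient_id = a.patient_id
        JOIN doctors d ON d.doctor_id = a.doctor_id
        LEFT JOIN medical_procedure m ON m.appointment_id = a.appointment_id
        ORDER BY a.appointment_id
        """
    ).fetchall()


report = appointment_report(build_demo_db())
```

The LEFT JOIN keeps appointments that have no recorded procedure, which seems the safer default for a scheduling report.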
Super Store Analytics with SQL and Looker Studio
I am excited to share a project that I recently completed, focusing on comprehensive analytics for a superstore using SQL and visualizations built in Looker Studio. The project aimed to support decision-making through data analysis and interactive visualizations. All the data presented is expressed in thousands.
Key Components:
Query Optimization: Using SQL (DBeaver, Postgres), I wrote optimized queries to extract insights from the dataset. These queries use aggregate functions, joins, and subqueries to retrieve information such as sales trends, customer behaviors, and inventory management.
Looker Studio Visualizations: Looker Studio was used to create interactive visualizations for data exploration. The dashboards give a holistic view of superstore performance, enabling stakeholders to identify patterns, trends, and areas for improvement.
Project Achievements:
Data Description:
Order_ID (integer): Unique identifier for each order.
Order_Date (date): Date when the order was placed.
Ship_Date (date): Date when the order was shipped.
Interval (day) (integer): Number of days between order placement and shipment.
Ship_Mode (string): Shipping method chosen for the order.
Customer_ID (integer): Unique identifier for each customer.
Customer_Name (string): Name of the customer.
Segment (string): Customer segmentation.
Country (string): Country where the order was placed.
City (string): City where the order was placed.
State (string): State where the order was placed.
Postal_Code (string): Postal code of the order location.
Region (string): Geographical region of the order.
Product_ID (integer): Unique identifier for each product.
Category (string): Product category.
Sub_Category (string): Product sub-category.
Product_Name (string): Name of the product.
Sales (float): Sales amount for the order.
Quantity (integer): Quantity of products in the order.
Discount (float): Discount applied to the order.
Profit (float): Profit generated from the order.
Returned (string): Indicates whether the order was returned (Yes/No).
Person (string): Customer categorization.
Region (string): Geographic region associated with the customer.
-- One row per order line, with return status, the person assigned to the
-- region, and two derived columns (total cost and price per unit).
-- DISTINCT replaces the original GROUP BY over every selected column.
SELECT DISTINCT
o."Order_ID",
o."Order_Date",
o."Ship_Date",
o."Ship_Mode",
o."Segment",
o."Country",
o."City",
o."State",
o."Postal_Code",
o."Region",
o."Product_ID",
o."Category",
o."Sub_Category",
o."Product_Name",
o."Sales",
o."Quantity",
o."Discount",
o."Profit",
p."Person",
-- Assumption: orders with no row in "Return" were not returned, so they are
-- kept through the LEFT JOIN below and reported as 'No'.
CASE WHEN LOWER(r."Returned") = 'yes' THEN 'Yes' ELSE 'No' END AS "Returned",
o."Sales" - o."Profit" AS total_cost,
-- NULLIF avoids division by zero when Quantity is 0 or Discount is 1.
o."Sales" / NULLIF(o."Quantity" * (1 - o."Discount"), 0) AS price_per_unit
FROM
orders o
LEFT JOIN
"Return" r ON r."Order_ID" = o."Order_ID"
JOIN
people p ON p."Region" = o."Region"
ORDER BY
total_cost,
price_per_unit DESC;
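The two derived columns in the query above are plain arithmetic: total cost is Sales minus Profit, and price per unit is Sales divided by Quantity times (1 - Discount). The sketch below works through both with made-up numbers. It assumes Sales is the post-discount line total, so the division recovers a pre-discount unit price; that reading is an interpretation, not something the dataset description states. Returning None for a zero denominator is likewise a choice made for this example.

```python
def total_cost(sales, profit):
    """Cost of an order line: what was sold minus what was earned."""
    return sales - profit


def price_per_unit(sales, quantity, discount):
    """Pre-discount unit price, assuming `sales` is the discounted line total.

    Returns None when quantity is 0 or discount is 1, where the formula
    would otherwise divide by zero.
    """
    denominator = quantity * (1 - discount)
    if denominator == 0:
        return None
    return sales / denominator


# Made-up order line: 4 units sold at a 25% discount for 150 in total,
# with a profit of 30.
example_cost = total_cost(150.0, 30.0)
example_unit_price = price_per_unit(150.0, 4, 0.25)
```

With these numbers the cost comes to 120 and the unit price to 50, because 4 units at 75% of list price add up to 150.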
Questions:
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains comprehensive synthetic healthcare data designed for fraud detection analysis. It includes information on patients, healthcare providers, insurance claims, and payments. The dataset is structured to mimic real-world healthcare transactions, where fraudulent activities such as false claims, overbilling, and duplicate charges can be identified through advanced analytics.
The dataset is suitable for practicing SQL queries, exploratory data analysis (EDA), machine learning for fraud detection, and visualization techniques. It is designed to help data analysts and data scientists develop and refine their analytical skills in the healthcare insurance domain.
Dataset Overview:
The dataset consists of four CSV files:
Patients Data (patients.csv):
Contains demographic details of patients, such as age, gender, insurance type, and location. Can be used to analyze patient demographics and healthcare usage patterns.
Providers Data (providers.csv):
Contains information about healthcare providers, including provider ID, specialty, location, and associated hospital.
Useful for identifying fraudulent claims linked to specific providers or hospitals.
Claims Data (claims.csv):
Contains records of insurance claims made by patients, including diagnosis codes, treatment details, provider ID, and claim amount.
Can be analyzed for suspicious patterns, such as excessive claims from a single provider or duplicate claims for the same patient.
Payments Data (payments.csv):
Contains details of claim payments made by insurance companies, including payment amount, claim ID, and reimbursement status.
Helps in detecting discrepancies between claims and actual reimbursements.
Possible Analysis Ideas:
This dataset allows for multiple analysis approaches, including but not limited to:
- Fraud Detection: Identify patterns in claims data to detect fraudulent activities (e.g., excessive billing, duplicate claims).
- Provider Behavior Analysis: Analyze providers who have an unusually high claim volume or high rejection rates.
- Payment Trends: Compare claims vs. payments to find irregularities in reimbursement patterns.
- Patient Demographics & Utilization: Explore which patient groups are more likely to file claims and receive reimbursements.
- SQL Query Practice: Perform advanced SQL queries, including joins, aggregations, window functions, and subqueries, to extract insights from the data.
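Two of the ideas above, duplicate claims and claims-versus-payments discrepancies, map onto short SQL queries. The following is a minimal sketch run through Python's sqlite3 module. The column names (`claim_id`, `patient_id`, `provider_id`, `diagnosis_code`, `claim_amount`, `payment_amount`) are assumptions loosely based on the file descriptions, and the rows are invented. Treating "same patient, provider, diagnosis, and amount" as a duplicate is one possible definition among several.

```python
import sqlite3


def build_claims_db():
    """Load a tiny, made-up claims/payments sample into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, patient_id INTEGER,
                             provider_id INTEGER, diagnosis_code TEXT,
                             claim_amount REAL);
        CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, claim_id INTEGER,
                               payment_amount REAL);
        INSERT INTO claims VALUES
            (1, 101, 7, 'J10', 200.0),
            (2, 101, 7, 'J10', 200.0),
            (3, 102, 8, 'E11', 500.0);
        INSERT INTO payments VALUES
            (1, 1, 200.0),
            (2, 3, 350.0);
        """
    )
    return conn


def duplicate_claims(conn):
    """Groups of claims that match on patient, provider, diagnosis, and amount."""
    return conn.execute(
        """
        SELECT patient_id, provider_id, diagnosis_code, claim_amount, COUNT(*)
        FROM claims
        GROUP BY patient_id, provider_id, diagnosis_code, claim_amount
        HAVING COUNT(*) > 1
        """
    ).fetchall()


def payment_gaps(conn):
    """Claims whose total payments differ from the claimed amount."""
    return conn.execute(
        """
        SELECT c.claim_id, c.claim_amount,
               COALESCE(SUM(p.payment_amount), 0.0) AS paid
        FROM claims c
        LEFT JOIN payments p ON p.claim_id = c.claim_id
        GROUP BY c.claim_id, c.claim_amount
        HAVING COALESCE(SUM(p.payment_amount), 0.0) <> c.claim_amount
        ORDER BY c.claim_id
        """
    ).fetchall()


conn = build_claims_db()
duplicates = duplicate_claims(conn)
gaps = payment_gaps(conn)
```

In the sample, claims 1 and 2 form a duplicate group, claim 2 has no payment at all, and claim 3 is only partly reimbursed. A gap by itself is not evidence of fraud; it only flags rows worth a closer look.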
Use Cases:
- Practicing SQL queries for job interviews and real-world projects.
- Learning data cleaning, data wrangling, and feature engineering for healthcare analytics.
- Applying machine learning techniques for fraud detection.
- Gaining insights into the healthcare insurance domain and its challenges.
License & Usage:
- License: CC0 Public Domain (Free to use for any purpose).
- Attribution: Not required but appreciated.
- Intended Use: This dataset is for educational and research purposes only.
This dataset is an excellent resource for aspiring data analysts, data scientists, and SQL learners who want to gain hands-on experience in healthcare fraud detection.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Computer Science Students Dataset
This dataset contains information about computer science students from a fictional university. It includes attributes such as Student ID, Name, Gender, Age, GPA, Major, Interested Domain, Projects undertaken, and skills in Python, SQL, and Java. The dataset aims to provide insights into the academic performance, career aspirations, and technical skills of students in the field of computer science.
Columns:
- Student ID: Unique identifier for each student.
- Name: Name of the student.
- Gender: Gender of the student.
- Age: Age of the student.
- GPA: Grade Point Average of the student.
- Major: Field of study within computer science.
- Interested Domain: Area of interest within the field of computer science.
- Projects: Noteworthy projects completed by the student.
- Python: Proficiency level in Python programming.
- SQL: Proficiency level in SQL querying.
- Java: Proficiency level in Java programming.
Purpose: This dataset is suitable for tasks such as predictive modeling to understand factors influencing career choices in computer science students. The "Future Career" column serves as the target variable for classification tasks. Researchers, educators, and data enthusiasts can utilize this dataset for various educational and analytical purposes in the realm of computer science education and career planning.