MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Andrew Dolcimascolo-Garrett
Released under MIT
SQL Case Study Project: Employee Database Analysis 📊
I recently completed a comprehensive SQL project involving a simulated employee database with multiple tables.
In this project, I practiced and applied a wide range of SQL concepts:
✅ Simple queries
✅ Filtering with WHERE conditions
✅ Sorting with ORDER BY
✅ Aggregation using GROUP BY and HAVING
✅ Multi-table JOINs
✅ Conditional logic using CASE
✅ Subqueries and set operators
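To give a flavour of the queries involved, here is a minimal sketch that combines JOINs, aggregation, HAVING, and CASE. The employees and departments tables and their columns are assumptions for illustration; the actual schema lives in the repo linked below:

```sql
-- Hypothetical schema: employees(emp_id, name, salary, dept_id),
--                      departments(dept_id, dept_name)
SELECT d.dept_name,
       COUNT(*)      AS headcount,
       AVG(e.salary) AS avg_salary,
       CASE WHEN AVG(e.salary) > 75000 THEN 'High' ELSE 'Standard' END AS pay_band
FROM employees e
JOIN departments d ON d.dept_id = e.dept_id
GROUP BY d.dept_name
HAVING COUNT(*) >= 5
ORDER BY avg_salary DESC;
```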
💡 Key Highlights:
🛠️ Tools Used: Azure Data Studio
📂 You can find the entire project and scripts here:
👉 https://github.com/RiddhiNDivecha/Employee-Database-Analysis
This project helped me sharpen my SQL skills and understand business logic more deeply in a practical context.
💬 I’m open to feedback and happy to connect with fellow data enthusiasts!
RSVP Movies is an Indian film production company that has produced many super-hit movies. It has usually released movies for the Indian audience, but for its next project it is planning to release a movie for the global audience in 2022.
The production company wants to plan its every move analytically, based on data. We have taken the last three years of IMDb movie data and carried out the analysis using SQL, analysing the dataset and drawing meaningful insights that could help them start their new project.
For convenience, the entire analytics process has been divided into four segments, where each segment leads to significant insights from different combinations of tables. The questions in each segment, along with their business objectives, are written in the script given below, with the solution code under every question.
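For illustration, a segment-style question such as "How many movies were released each year, and how well were they rated?" maps to a short query like the sketch below. Table and column names are assumptions, not necessarily the project's actual schema:

```sql
-- Assumed tables: movie(id, title, year, country), ratings(movie_id, avg_rating, total_votes)
SELECT m.year,
       COUNT(*)          AS movies_released,
       AVG(r.avg_rating) AS mean_rating
FROM movie m
JOIN ratings r ON r.movie_id = m.id
GROUP BY m.year
ORDER BY m.year;
```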
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
patCit: A Comprehensive Dataset of Patent Citations [Newsletter, GitHub]
Patents are at the crossroads of many innovation nodes: science, industry, products, competition, etc. Such interactions can be identified through citations in a broad sense. It is now common to use front-page patent citations to study some aspects of the innovation system. However, there is much more buried in the Non Patent Literature (NPL) citations and in the patent text itself. patCit extracts and structures these citations. Want to know more? Read the patCit academic presentation or dive into the usage and technical guides on the patCit documentation website.
IN PRACTICE
At patCit, we are building a comprehensive dataset of patent citations to help the community explore this terra incognita. patCit has the following features:
- global coverage
- front-page and in-text citations
- all categories of NPL documents
Front-page - patCit builds on DOCDB, the largest database of Non Patent Literature (NPL) citations. First, we deduplicate this corpus and organize it into 10 categories (bibliographical reference, database, norm & standard, etc.). Then, we design and apply category-specific information extraction models using spaCy. Eventually, when possible, we enrich the data using external domain-specific high-quality databases (e.g. Crossref for bibliographical references).
In-text - patCit builds on the Google Patents corpus of USPTO full-text patents. First, we extract patent and bibliographical reference citations. Then, we parse detected in-text citations into a series of category-dependent attributes using grobid. Patent citations are matched with a standard publication number using the Google Patents matching API, and bibliographical references are matched with a DOI using biblio-glutton. Eventually, when possible, we enrich the data using external domain-specific high-quality databases (e.g. Crossref for bibliographical references).
FAIR
Find - The patCit dataset is available on BigQuery in an interactive environment. For those who have a smattering of SQL, this is the perfect place to explore the data. It can also be downloaded on Zenodo.
Interoperate - Interoperability is at the core of patCit's ambition. We take care to extract unique identifiers whenever possible to enable data enrichment from domain-specific high-quality databases. This includes the DOI, PMID and PMCID for bibliographical references, the Technical Doc Number for standards, the Accession Number for genetic databases, and the publication number for PATSTAT and Claims. See the specific tables for more details.
Reproduce - Our GitHub repository is the project factory. You can learn more about data recipes and models on the patCit documentation website.
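For those exploring on BigQuery, a first query can be as simple as counting citations per NPL category. The table path and column name below are illustrative only; the real dataset identifiers are given in the patCit documentation:

```sql
-- Illustrative table path and column; replace with the actual patCit BigQuery names
SELECT npl_cat,                 -- hypothetical category column
       COUNT(*) AS n_citations
FROM `patcit.frontpage.npl`     -- hypothetical table path
GROUP BY npl_cat
ORDER BY n_citations DESC
LIMIT 10;
```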
This is the sample database from sqlservertutorial.net. It is a great dataset for learning SQL and practicing querying relational databases.
Database Diagram:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4146319%2Fc5838eb006bab3938ad94de02f58c6c1%2FSQL-Server-Sample-Database.png?generation=1692609884383007&alt=media
The sample database is copyrighted and cannot be used for commercial purposes. Prohibited uses include, but are not limited to:
- Selling it
- Including it in paid courses
Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found under the title "Online Retail".
"This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."
Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.
Image from stocksnap.io.
Analyses for this dataset could include time series, clustering, classification and more.
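Loaded into a relational database, the dataset also lends itself to classic SQL practice. A minimal sketch of a monthly-revenue query, using the columns from the UCI listing (InvoiceDate, Quantity, UnitPrice) and assuming the CSV has been imported into a table named online_retail:

```sql
-- Assumes the CSV has been loaded into a table named online_retail
SELECT strftime('%Y-%m', InvoiceDate) AS month,   -- SQLite; use DATE_FORMAT/TO_CHAR elsewhere
       ROUND(SUM(Quantity * UnitPrice), 2) AS revenue
FROM online_retail
WHERE Quantity > 0                                -- exclude returns/cancellations
GROUP BY month
ORDER BY month;
```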
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
📖 Dataset Description
This dataset provides an end-to-end view of vendor performance across multiple dimensions — purchases, sales, inventory, pricing, and invoices. It is designed for data analytics, visualization, and business intelligence projects, making it ideal for learners and professionals exploring procurement, vendor management, and supply chain optimization.
🔗 GitHub Project (Code + Power BI Dashboard): Vendor Performance Analysis (https://github.com/HARSH-MADHAVAN/Vendor-Performance-Analysis)
The dataset includes:
- purchases.csv → Detailed vendor purchase transactions
- sales.csv → Sales performance data linked to vendors
- inventory.csv (begin & end) → Stock levels at different periods
- purchase_prices.csv → Historical vendor pricing
- vendor_invoice.csv → Invoice details for reconciliation
- vendor_sales_summary.csv → Aggregated vendor-wise sales insights
Use this dataset to practice:
- SQL querying & data modeling
- Python analytics & preprocessing
- Power BI dashboarding & reporting
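For example, a first modeling exercise could join purchases to the vendor sales summary to estimate gross margin per vendor. The column names below are assumptions; verify them against the CSV headers in the repo:

```sql
-- Hypothetical column names; check the actual CSV headers before use
SELECT p.vendor_id,
       SUM(p.purchase_amount)                  AS total_purchases,
       s.total_sales,
       s.total_sales - SUM(p.purchase_amount)  AS gross_margin
FROM purchases p
JOIN vendor_sales_summary s ON s.vendor_id = p.vendor_id
GROUP BY p.vendor_id, s.total_sales
ORDER BY gross_margin DESC;
```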
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing, and interpreting data related to the movement of products and services from suppliers to customers.
This dataset has been created for educational purposes, specifically to help learners practice SQL-like operations using Python’s pandas library. It is ideal for beginners who want to improve their data manipulation, querying, and transformation skills in a notebook environment such as Kaggle.
The dataset simulates a simple personnel and department system. It includes two tables:
- personel: Contains employee data such as names, ages, salaries, and department IDs.
- departman: Contains department IDs and corresponding department names.
Throughout this project, key SQL operations have been demonstrated with their pandas equivalents. These include:
- Basic commands: SELECT, INSERT, UPDATE, DELETE
- Table structure operations: ALTER, DROP, TRUNCATE, COPY
- Filtering and logical expressions: WHERE, AND, OR, IN, IS NULL, BETWEEN, LIKE
- Aggregations and sorting: COUNT(), ORDER BY, LIMIT, DISTINCT
- String functions: LOWER, TRIM, REPLACE, SPLIT, LENGTH
- Joins: INNER JOIN, LEFT JOIN
- Comparison operators: =, !=, <, >
The goal is to provide a hands-on, interactive environment for practicing SQL logic using real Python code. This dataset does not represent real individuals or businesses; it is entirely fictional and meant for training, teaching, and experimentation purposes only.
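As a taste of the exercises, the SQL below joins the two tables and filters by salary; the notebook demonstrates the pandas equivalent (a merge on the department ID plus a boolean mask). Column names are assumptions based on the description above:

```sql
-- Assumed columns; in pandas: personel.merge(departman, on='dept_id') with a boolean mask
SELECT p.name, p.salary, d.dept_name
FROM personel p
INNER JOIN departman d ON d.dept_id = p.dept_id
WHERE p.salary > 50000
ORDER BY p.salary DESC;
```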
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
This dataset represents a Snowflake Schema model built from the popular Tableau Superstore dataset which exists primarily in a denormalized (flat) format.
This version is fully structured into fact and dimension tables, making it ready for data warehouse design, SQL analytics, and BI visualization projects.
The dataset was modeled to demonstrate dimensional modeling best practices, showing how the original flat Superstore data can be normalized into related dimensions and a central fact table.
Use this dataset to:
- Practice SQL joins and schema design
- Build ETL pipelines or dbt models
- Design Power BI dashboards
- Learn data warehouse normalization (3NF → Snowflake) concepts
- Simulate enterprise data warehouse reporting environments
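Here is a minimal sketch of the kind of fact-to-dimension join this model enables; the table and column names are assumptions about the modeled schema, so check the dbt project mentioned below for the real ones:

```sql
-- Hypothetical fact/dimension names; verify against the dbt models
SELECT c.segment,
       d.year,
       SUM(f.sales)  AS total_sales,
       SUM(f.profit) AS total_profit
FROM fact_sales f
JOIN dim_customer c ON c.customer_key = f.customer_key
JOIN dim_date d     ON d.date_key     = f.order_date_key
GROUP BY c.segment, d.year
ORDER BY d.year, total_sales DESC;
```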
I'm open to suggestions or improvements from the community; feel free to share ideas on additional dimensions, measures, or transformations that could make this dataset even more useful for learning and analysis.
Transformation was done using dbt; check out the models and the entire project.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This is a cleaned and structured dataset for a real-world data analytics project designed around ML Dental Clinic, a fictional but highly realistic dental clinic based in Tilak Nagar, West Delhi.
🦷 Dataset Highlights:
- Covers 896 patient records from Jan 2023 to Dec 2024
- Includes demographics, visit dates, treatments, doctors, billing, discounts, and due amounts
- Treatments handled by 2 doctors: Dr. Kajal (Implantologist) and Dr. Karan (Oral Surgeon)
- Realistic pricing and billing logic (OPD-only charges, waived fees on treatment, free camps, etc.)
- Built for data cleaning, SQL querying, Python analysis, and Power BI dashboard creation
✅ Use cases:
- Healthcare analytics practice
- MySQL or Power BI dashboard creation
- End-to-end data analyst portfolio projects
- Freelance healthcare reporting automation
🛠 Tech Stack Used in Project:
- Python (Pandas, Matplotlib, Seaborn)
- MySQL Workbench
- Power BI
- Excel
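As an example of the MySQL practice this dataset supports, a revenue-per-doctor query might look like the sketch below. The table and column names are assumptions, not the dataset's actual headers:

```sql
-- Hypothetical table/column names; check the schema in the repo
SELECT doctor,
       COUNT(*)                      AS visits,
       SUM(billed_amount - discount) AS net_revenue,
       SUM(due_amount)               AS outstanding_dues
FROM patient_visits
GROUP BY doctor
ORDER BY net_revenue DESC;
```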
📌 GitHub Project Link:
https://github.com/kumararjunjha/ML-Dental-Clinic-Data-Analysis
👨💻 Created by: Arjun Jha
🔍 Aspiring Freelance Data Analyst | Healthcare Data Projects | Portfolio-ready work
📬 Reach out on LinkedIn: https://linkedin.com/in/kumararjunjha
Let me know what insights you discover with this data!
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This portfolio highlights both practice projects and contributions I've made on the job, with a focus on practical, results-driven analysis. Each project reflects my ability to solve business problems using tools like Excel for data visualization, SQL for querying and structuring data, and the skills I've built in Python.
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
The Hotel Room Booking & Customer Orders Dataset
This is a rich, synthetic dataset meticulously designed for data analysts, data scientists, and machine learning practitioners to practice their skills on realistic e-commerce data. It models a hotel booking platform, providing a comprehensive and interconnected environment to analyze booking trends, customer behavior, and operational patterns. It is an ideal resource for building a professional portfolio project, from initial exploratory data analysis to advanced predictive modeling.
The dataset is structured as a relational database, consisting of three core tables that can be easily joined:
rooms.csv: This table serves as the hotel's inventory, containing a catalog of unique rooms with essential attributes such as room_id, type, capacity, and price_per_night.
customers.csv: This file provides a list of unique customers, offering demographic insights with columns like customer_id, name, country, and age. This data can be used to segment customers and personalize marketing strategies.
orders.csv: As the central transactional table, it links rooms and customers, capturing the details of each booking. Key columns include order_id, customer_id, room_id, booking_date, and the order_total, which can be derived from the room price and the duration of the stay.
This dataset is valuable because its structure enables a wide range of analytical projects. The relationships between tables are clearly defined, allowing you to practice complex SQL joins and data manipulation with Pandas. The presence of both categorical data (room_type, country) and numerical data (age, price) makes it versatile for different analytical approaches.
Use Cases for Data Exploration & Modeling
This dataset is a versatile tool for a wide range of analytical projects:
Data Visualization: Create dashboards to analyze booking trends over time, identify the most popular room types, or visualize the geographical distribution of your customer base.
Machine Learning: Build a regression model to predict the order_total based on room type and customer characteristics. Alternatively, you could develop a model to recommend room types to customers based on their past orders.
SQL & Database Skills: Practice complex queries to find the average order value per country, or identify the most profitable room types by month.
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
This dataset looks at the number of movies produced in the United States of America that fall into the "crime" genre between 1985 and 2017 and compares it to violent crime rates over the same period. The time frame was chosen based on accessible data (The Movies Dataset ends with 2017 and the FBI's CDE tool starts at 1985).
The data for the movies and genres was pulled from "The Movies Dataset" on Kaggle, where columns were adjusted and the first two genres were kept. The data was then filtered to only include films released in the United States of America from 1985 to 2017. Violent crime data and population data for the USA were then joined.
movies-to-crime_data_by_population_1985-2017_2023-03-06.csv: This file contains the filtered and sorted data joining together the rest of the included data.
movies_data_cleaned_V2.csv: This includes a large movie dataset that was pulled from the aforementioned "The Movies Dataset" and adjusted for usability for this project; the original dataset is available on Kaggle.
population_data_1985-2017.csv: This data was pulled from the World Bank, Population, Total for United States [POPTOTUSA647NWDB], retrieved from FRED, Federal Reserve Bank of St. Louis.
violent_crime_rates_USA_1985-2017_2024-03-06.csv: This data was pulled from the Federal Bureau of Investigation's "Crime Data Explorer" tool. Data pulled includes all violent crime 1985-2017. More information concerning how violent crimes are categorized can be found on the Crime Data Explorer's website linked above.
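A sketch of the year-level join behind the combined file; the staging table names below are assumptions for illustration, not the project's actual tables:

```sql
-- Hypothetical staging tables, each keyed by year
SELECT m.year,
       m.crime_movie_count,
       v.violent_crime_count,
       p.population,
       1.0 * v.violent_crime_count / p.population * 100000 AS crimes_per_100k
FROM movies_by_year m
JOIN violent_crimes v ON v.year = m.year
JOIN population p     ON p.year = m.year
WHERE m.year BETWEEN 1985 AND 2017
ORDER BY m.year;
```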
All data was sourced via publicly available datasets and linked to above. Special thanks to Kaggle user Rounak Banik for their work creating "The Movies Dataset" which was incredibly helpful.
This project was a side project to gain further practice with tools such as SQL, R, Tableau, and spreadsheets. It began with a focus on authors of crime novels versus the number of actual criminals, and soon morphed into this after a struggle to find usable datasets.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Sreelakshmi Sivan
Released under MIT
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive collection of YouTube video and channel metadata curated for data analysis, visualization, and storytelling projects. It contains rich information on trending videos across multiple countries, including video performance statistics, engagement metrics, and channel-level details.
The dataset is designed to help learners and researchers explore real-world YouTube dynamics, such as:
• What type of content gains the highest views and engagement?
• How do categories perform across different countries?
• What role do publishing time, video duration, or tags play in driving popularity?
• Which channels dominate in terms of subscribers, views, and content consistency?
Features
The dataset includes detailed video-level fields such as:
• Video ID, title, description, and publish time
• Trending date and country
• Tags, categories, duration, resolution, and licensed-content status
• Views, likes, and comment counts
Alongside these are channel-level fields, including:
• Channel ID, title, and description
• Channel country, publish date, and custom URL (if available)
• Subscriber count, total views, video count, and hidden-subscriber flag
With this structured dataset, students and professionals can perform data cleaning, transformation, SQL querying, trend analysis, and dashboarding in tools such as Excel, SQL, Power BI, Tableau, and Python. It is also suitable for advanced machine learning tasks like predicting video performance, engagement modeling, and natural language processing on video titles and descriptions.
Use Cases
1. Descriptive Analytics: Identify top categories, channels, and countries leading the YouTube trending space.
2. Comparative Analysis: Compare engagement rates across different regions and content types.
3. Visualization Projects: Create dashboards showing performance KPIs, category trends, and time-based patterns.
4. Storytelling: Derive business insights and best practices for creators, marketers, and educators on YouTube.
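As a concrete starting point for the SQL side, a query like the sketch below surfaces categories by engagement rate. The table and column names are assumptions about how the data is loaded:

```sql
-- Hypothetical table/column names after loading the dataset
SELECT category,
       country,
       SUM(views)                 AS total_views,
       SUM(likes + comment_count) AS total_engagement,
       1.0 * SUM(likes + comment_count) / NULLIF(SUM(views), 0) AS engagement_rate
FROM trending_videos
GROUP BY category, country
ORDER BY engagement_rate DESC
LIMIT 20;
```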
Educational Value
This dataset is structured specifically for student projects and group assignments. It ensures every learner can take a role—whether as a data engineer, analyst, visualization specialist, or business storyteller—mirroring the structure of real-world consulting projects.
Credits
This dataset is published as part of the YouTube Data Analytics Project initiated by Analytics Circle, an institute dedicated to empowering learners with practical data analytics, data science, and AI skills through hands-on projects and real-world applications.
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
In the case study titled "Blinkit: Grocery Product Analysis," a dataset called 'Grocery Sales' contains 12 columns of information on sales of grocery items across different outlets. Using Tableau, a data analyst can uncover customer behavior insights, track sales trends, and gather feedback. These insights can drive operational improvements, enhance customer satisfaction, and optimize product offerings and store layout, enabling data-driven decision-making at Blinkit.
The Grocery Sales table is a .CSV file with the following columns:
• Item_Identifier: A unique ID for each product in the dataset.
• Item_Weight: The weight of the product.
• Item_Fat_Content: Indicates whether the product is low fat or not.
• Item_Visibility: The percentage of the store's total display area allocated to the specific product.
• Item_Type: The category or type of product.
• Item_MRP: The maximum retail price (list price) of the product.
• Outlet_Identifier: A unique ID for each store in the dataset.
• Outlet_Establishment_Year: The year in which the store was established.
• Outlet_Size: The size of the store in terms of ground area covered.
• Outlet_Location_Type: The type of city or region in which the store is located.
• Outlet_Type: Indicates whether the store is a grocery store or a supermarket.
• Item_Outlet_Sales: The sales of the product in the particular store. This is the outcome variable that we want to predict.
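Although the case study is Tableau-centric, the same questions translate directly to SQL. A sketch using the columns listed above (only the table name is assumed):

```sql
-- Table name assumed; columns are as documented above
SELECT Outlet_Type,
       Outlet_Location_Type,
       ROUND(SUM(Item_Outlet_Sales), 2) AS total_sales,
       ROUND(AVG(Item_MRP), 2)          AS avg_mrp
FROM grocery_sales
GROUP BY Outlet_Type, Outlet_Location_Type
ORDER BY total_sales DESC;
```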
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
https://raw.githubusercontent.com/Masterx-AI/Project_Retail_Analysis_with_Walmart/main/Wallmart1.jpg
One of the leading retail chains in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays impact sales on each day, and sales data are available for 45 Walmart stores. The business faces a challenge from unforeseen demand and sometimes runs out of stock because the current machine learning algorithm is inadequate. An ideal ML algorithm would predict demand accurately while ingesting factors such as economic conditions, including the CPI and the Unemployment Index.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for the 45 Walmart stores, located in different regions, are available.
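Before any modeling, a simple comparison of holiday and non-holiday weeks is a useful sanity check. The sketch below assumes the column names from the common Kaggle release of this data (Weekly_Sales, IsHoliday); verify against your copy:

```sql
-- Column names assumed from the common Kaggle release of the Walmart data
SELECT IsHoliday,
       COUNT(*)                    AS weeks,
       ROUND(AVG(Weekly_Sales), 2) AS avg_weekly_sales
FROM sales
GROUP BY IsHoliday;
```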
The dataset is taken from Kaggle.
Any aspiring data scientist will look at everything through the lens of data, even when chilling with friends, watching live cricket, and cheering for their favorite team.
It includes ODI, Test, and T20 statistics for all players in all three categories (batting, bowling, and fielding).
We wouldn't be here without cricket. Thank you to all the great cricketers for their wonderful contributions.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Hello,
In this post, we have compiled Turkish-language SQL queries that are frequently encountered in daily use. There are 1303 queries in total: 966 of them are SELECT queries, and 337 are DELETE and UPDATE queries. This resource can be a useful reference for those who want to learn SQL or reinforce existing knowledge.
The content is presented in an understandable manner, enriched with real-life examples, making it an ideal resource for those who want to understand and practice the basics of database operations. There are also explanations and usage examples alongside each query.
This post is open to everyone and has an extensible structure so that anyone who wants to contribute can make additions. You can add your own SQL queries or edit existing ones.
Project Features: - Total Number of Queries: 1303 - Number of SELECT Queries: 966 - Number of DELETE and UPDATE Queries: 337
Example SQL Query Categories: - Select Data (SELECT) Queries - Data Delete (DELETE) Queries - Data Update (UPDATE) Queries
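To illustrate the format, entries pair each query with a short explanation. A representative sketch (not taken verbatim from the collection; table and column names are invented for illustration):

```sql
-- Explanation: list customers from Istanbul, newest registrations first
SELECT name, city, registration_date
FROM customers
WHERE city = 'Istanbul'
ORDER BY registration_date DESC;

-- Explanation: raise the price of out-of-stock products by 10%
UPDATE products
SET price = price * 1.10
WHERE stock_quantity = 0;
```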
I hope this resource supports and helps you in your SQL learning process. We welcome your contributions!
Kind regards, Sahil Rzayev