20 datasets found
  1. Hospital Database Management System SQL Project

    • kaggle.com
    zip
    Updated May 9, 2024
    Andrew Dolcimascolo-Garrett (2024). Hospital Database Management System SQL Project [Dataset]. https://www.kaggle.com/datasets/andrewdolcigarrett/hospital-database-management-system-sql-project
    Available download formats
    zip (1,487,278 bytes)
    Authors
    Andrew Dolcimascolo-Garrett
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Andrew Dolcimascolo-Garrett

    Released under MIT

    Contents

  2. Employee Database for SQL Case Study

    • kaggle.com
    zip
    Updated Jun 21, 2025
    Riddhi N Divecha (2025). Employee Database for SQL Case Study [Dataset]. https://www.kaggle.com/datasets/riddhindivecha/employee-database-for-sql-case-study/code
    Available download formats
    zip (890 bytes)
    Authors
    Riddhi N Divecha
    Description

    SQL Case Study Project: Employee Database Analysis 📊

    I recently completed a comprehensive SQL project involving a simulated employee database with multiple tables:

    • 🏢 DEPARTMENT
    • 👨‍💼 EMPLOYEE
    • 💼 JOB
    • 🌍 LOCATION

    In this project, I practiced and applied a wide range of SQL concepts:

    
    ✅ Simple Queries
    ✅ Filtering with WHERE conditions
    ✅ Sorting with ORDER BY
    ✅ Aggregation using GROUP BY and HAVING
    ✅ Multi-table JOINs
    ✅ Conditional Logic using CASE
    ✅ Subqueries and Set Operators

    💡 Key Highlights:

    • Salary grade classifications
    • Department-level insights
    • Employee trends based on hire dates
    • Advanced queries like Nth highest salary
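    The salary-grade and Nth-highest-salary highlights above can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the author's scripts (those are in the linked GitHub repository). The employee table's columns (emp_id, name, salary), the grade thresholds, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory employee table with hypothetical columns and rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee (emp_id, name, salary) VALUES (?, ?, ?)",
        [(1, "Asha", 3000), (2, "Ben", 5200), (3, "Chen", 5200), (4, "Dina", 8100)],
    )
    return conn


def salary_grades(conn: sqlite3.Connection) -> list:
    """Classify each employee into a grade band with CASE (thresholds are hypothetical)."""
    sql = """
        SELECT name,
               CASE
                   WHEN salary >= 8000 THEN 'A'
                   WHEN salary >= 5000 THEN 'B'
                   ELSE 'C'
               END AS grade
        FROM employee
        ORDER BY emp_id
    """
    return conn.execute(sql).fetchall()


def nth_highest_salary(conn: sqlite3.Connection, n: int):
    """Return the n-th highest distinct salary, or None if there are fewer than n."""
    # DISTINCT makes tied salaries count once; OFFSET n - 1 skips the higher values.
    sql = "SELECT DISTINCT salary FROM employee ORDER BY salary DESC LIMIT 1 OFFSET ?"
    row = conn.execute(sql, (n - 1,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo_db()
    print(salary_grades(conn))          # [('Asha', 'C'), ('Ben', 'B'), ('Chen', 'B'), ('Dina', 'A')]
    print(nth_highest_salary(conn, 2))  # 5200
```

    Combining DISTINCT with LIMIT/OFFSET counts tied salaries only once. A window function such as DENSE_RANK() is a common alternative in engines that support it.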

    🛠️ Tools Used: Azure Data Studio

    📂 You can find the entire project and scripts here:

    👉 https://github.com/RiddhiNDivecha/Employee-Database-Analysis

    This project helped me sharpen my SQL skills and understand business logic more deeply in a practical context.

    💬 I’m open to feedback and happy to connect with fellow data enthusiasts!

    #SQL #DataAnalytics #PortfolioProject #CaseStudy #LearningByDoing #DataScience #SQLProject

  3. IMDB Movies Analysis - SQL

    • kaggle.com
    zip
    Updated Feb 21, 2023
    Gaurav B R (2023). IMDB Movies Analysis - SQL [Dataset]. https://www.kaggle.com/datasets/gauravbr/imdb-movies-data-erd
    Available download formats
    zip (3,818,401 bytes)
    Authors
    Gaurav B R
    Description

    SQL IMDB Movies Analysis for RSVP (Film Production Company)

    RSVP Movies is an Indian film production company that has produced many super-hit movies. It has usually released movies for Indian audiences, but for its next project it plans to release a movie for a global audience in 2022.

    The production company wants to plan every move analytically, based on data. We took the last three years of IMDB movie data, analysed it using SQL, and drew meaningful insights that could help the company start its new project.

    For convenience, the analytics process has been divided into four segments, each drawing insights from a different combination of tables. The questions in each segment, along with their business objectives, are written in the accompanying script, with the solution code below every question.

  4. Data from: PatCit: A Comprehensive Dataset of Patent Citations

    • search.datacite.org
    Updated Dec 23, 2020
    Cyril Verluise; Gabriele Cristelli; Kyle Higham; Lucas Violon; Gaétan De Rassenfosse (2020). PatCit: A Comprehensive Dataset of Patent Citations [Dataset]. http://doi.org/10.5281/zenodo.4391095
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Zenodo (http://zenodo.org/)
    Authors
    Cyril Verluise; Gabriele Cristelli; Kyle Higham; Lucas Violon; Gaétan De Rassenfosse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    patCit: A Comprehensive Dataset of Patent Citations [Newsletter, GitHub]

    Patents are at the crossroads of many innovation nodes: science, industry, products, competition, etc. Such interactions can be identified through citations in a broad sense. It is now common to use front-page patent citations to study some aspects of the innovation system. However, there is much more buried in the Non Patent Literature (NPL) citations and in the patent text itself. patCit extracts and structures these citations. Want to know more? Read the patCit academic presentation or dive into the usage and technical guides on the patCit documentation website.

    In practice

    At patCit, we are building a comprehensive dataset of patent citations to help the community explore this terra incognita. patCit has the following features:

    • global coverage
    • front-page and in-text citations
    • all categories of NPL documents

    Front-page

    patCit builds on DOCDB, the largest database of Non Patent Literature (NPL) citations. First, we deduplicate this corpus and organize it into 10 categories (bibliographical reference, database, norm & standard, etc.). Then, we design and apply category-specific information extraction models using spaCy. Finally, when possible, we enrich the data using external, domain-specific, high-quality databases (e.g. Crossref for bibliographical references).

    In-text

    patCit builds on the Google Patents corpus of USPTO full-text patents. First, we extract patent and bibliographical reference citations. Then, we parse the detected in-text citations into a series of category-dependent attributes using grobid. Patent citations are matched with a standard publication number using the Google Patents matching API, and bibliographical references are matched with a DOI using biblio-glutton. Finally, when possible, we enrich the data using external, domain-specific, high-quality databases (e.g. Crossref for bibliographical references).

    FAIR

    • Find: The patCit dataset is available on BigQuery in an interactive environment. For those who have a smattering of SQL, this is the perfect place to explore the data. It can also be downloaded from Zenodo.
    • Interoperate: Interoperability is at the core of patCit's ambition. We take care to extract unique identifiers whenever possible to enable data enrichment from domain-specific, high-quality databases. This includes the DOI, PMID, and PMCID for bibliographical references, the Technical Doc Number for standards, the Accession Number for genetic databases, the publication number for PATSTAT and Claims, etc. See the specific tables for more details.
    • Reproduce: Our GitHub repository is the project factory. You can learn more about data recipes and models on the patCit documentation website.

  5. Bike Store Relational Database | SQL

    • kaggle.com
    zip
    Updated Aug 21, 2023
    Dillon Myrick (2023). Bike Store Relational Database | SQL [Dataset]. https://www.kaggle.com/datasets/dillonmyrick/bike-store-sample-database
    Available download formats
    zip (94,412 bytes)
    Authors
    Dillon Myrick
    Description

    This is the sample database from sqlservertutorial.net. This is a great dataset for learning SQL and practicing querying relational databases.

    Database Diagram:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4146319%2Fc5838eb006bab3938ad94de02f58c6c1%2FSQL-Server-Sample-Database.png?generation=1692609884383007&alt=media

    Terms of Use

    The sample database is copyrighted and cannot be used for commercial purposes, including but not limited to:
    • Selling
    • Inclusion in paid courses

  6. E-Commerce Data

    • kaggle.com
    zip
    Updated Aug 17, 2017
    Carrie (2017). E-Commerce Data [Dataset]. https://www.kaggle.com/datasets/carrie1/ecommerce-data
    Available download formats
    zip (7,548,686 bytes)
    Authors
    Carrie
    Description

    Context

    Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset of actual transactions from 2010 and 2011. The dataset is maintained on their site under the title "Online Retail".

    Content

    "This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers." (The dates are in DD/MM/YYYY format, i.e. 1 December 2010 to 9 December 2011.)

    Acknowledgements

    Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.

    Image from stocksnap.io.

    Inspiration

    Analyses for this dataset could include time series, clustering, classification and more.

  7. Data from: Vendor Performance Analysis

    • kaggle.com
    Updated Sep 6, 2025
    Harsh Madhavan (2025). Vendor Performance Analysis [Dataset]. https://www.kaggle.com/datasets/harshmadhavan/vendor-performance-analysis
    Available formats
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Kaggle
    Authors
    Harsh Madhavan
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    📖 Dataset Description

    This dataset provides an end-to-end view of vendor performance across multiple dimensions — purchases, sales, inventory, pricing, and invoices. It is designed for data analytics, visualization, and business intelligence projects, making it ideal for learners and professionals exploring procurement, vendor management, and supply chain optimization.

    🔗 GitHub Project (Code + Power BI Dashboard): Vendor Performance Analysis (https://github.com/HARSH-MADHAVAN/Vendor-Performance-Analysis)

    The dataset includes:

    • purchases.csv → Detailed vendor purchase transactions
    • sales.csv → Sales performance data linked to vendors
    • inventory.csv (begin & end) → Stock levels at different periods
    • purchase_prices.csv → Historical vendor pricing
    • vendor_invoice.csv → Invoice details for reconciliation
    • vendor_sales_summary.csv → Aggregated vendor-wise sales insights

    Use this dataset to practice:

    • SQL querying & data modeling
    • Python analytics & preprocessing
    • Power BI dashboarding & reporting

  8. Supply Chain DataSet

    • kaggle.com
    zip
    Updated Jun 1, 2023
    Amir Motefaker (2023). Supply Chain DataSet [Dataset]. https://www.kaggle.com/datasets/amirmotefaker/supply-chain-dataset
    Available download formats
    zip (9,340 bytes)
    Authors
    Amir Motefaker
    Description

    Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.

  9. Avokado gelişim (Turkish: "Avocado Development")

    • kaggle.com
    zip
    Updated May 22, 2025
    ABDÜLKADİR UYĞUR (2025). Avokado gelişim [Dataset]. https://www.kaggle.com/datasets/abdlkadruyur/avokado-geliim
    Available download formats
    zip (5,439 bytes)
    Authors
    ABDÜLKADİR UYĞUR
    Description

    This dataset has been created for educational purposes, specifically to help learners practice SQL-like operations using Python’s pandas library. It is ideal for beginners who want to improve their data manipulation, querying, and transformation skills in a notebook environment such as Kaggle.

    The dataset simulates a simple personnel and department system. It includes two tables:

    • personel: Contains employee data such as names, ages, salaries, and department IDs.
    • departman: Contains department IDs and the corresponding department names.

    Throughout this project, key SQL operations have been demonstrated with their pandas equivalents. These include:

    • Basic commands: SELECT, INSERT, UPDATE, DELETE
    • Table structure operations: ALTER, DROP, TRUNCATE, COPY
    • Filtering and logical expressions: WHERE, AND, OR, IN, IS NULL, BETWEEN, LIKE
    • Aggregation and sorting: COUNT(), ORDER BY, LIMIT, DISTINCT
    • String functions: LOWER, TRIM, REPLACE, SPLIT, LENGTH
    • Joins: INNER JOIN, LEFT JOIN
    • Comparison operators: =, !=, <, >

    The goal is to provide a hands-on, interactive environment for practicing SQL logic using real Python code. This dataset does not represent real individuals or businesses; it is entirely fictional and meant for training, teaching, and experimentation purposes only.
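    To show what the two join types listed above return, here is a small sketch that uses Python's built-in sqlite3 module rather than pandas. The column names (name, age, salary, dept_id, dept_name) and the rows are hypothetical stand-ins for the personel and departman tables.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory personel/departman tables; column names and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE departman (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
    conn.execute(
        "CREATE TABLE personel (name TEXT, age INTEGER, salary INTEGER, dept_id INTEGER)"
    )
    conn.executemany("INSERT INTO departman VALUES (?, ?)", [(1, "IT"), (2, "HR")])
    conn.executemany(
        "INSERT INTO personel VALUES (?, ?, ?, ?)",
        [("Ali", 30, 5000, 1), ("Ayse", 28, 4500, 2), ("Mehmet", 35, 4000, None)],
    )
    return conn


def joined(conn: sqlite3.Connection, how: str) -> list:
    """Run an INNER or LEFT join of personel to departman on dept_id."""
    if how not in ("INNER", "LEFT"):  # whitelist before formatting into SQL
        raise ValueError("how must be 'INNER' or 'LEFT'")
    sql = (
        "SELECT p.name, d.dept_name FROM personel AS p "
        f"{how} JOIN departman AS d ON p.dept_id = d.dept_id ORDER BY p.name"
    )
    return conn.execute(sql).fetchall()


def filtered(conn: sqlite3.Connection) -> list:
    """WHERE with BETWEEN and LIKE: salaries 4000-4800 for names starting with 'A'."""
    sql = (
        "SELECT name, salary FROM personel "
        "WHERE salary BETWEEN 4000 AND 4800 AND name LIKE 'A%' ORDER BY name"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(joined(conn, "INNER"))  # [('Ali', 'IT'), ('Ayse', 'HR')]
    print(joined(conn, "LEFT"))   # [('Ali', 'IT'), ('Ayse', 'HR'), ('Mehmet', None)]
    print(filtered(conn))         # [('Ayse', 4500)]
```

    In pandas, the same two joins would typically be written with DataFrame.merge, passing how="inner" or how="left". The employee with no department appears only in the LEFT JOIN result, and there the department name is missing.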

  10. Superstore Snowflake Schema Modeling Dataset

    • kaggle.com
    zip
    Updated Oct 30, 2025
    Chik0di (2025). Superstore Snowflake Schema Modeling Dataset [Dataset]. https://www.kaggle.com/datasets/chik0di/superstore-snowflake-schema-modeling-dataset
    Available download formats
    zip (474,167 bytes)
    Authors
    Chik0di
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset represents a Snowflake Schema model built from the popular Tableau Superstore dataset, which exists primarily in a denormalized (flat) format.

    This version is fully structured into fact and dimension tables, making it ready for data warehouse design, SQL analytics, and BI visualization projects.

    The dataset was modeled to demonstrate dimensional modeling best practices, showing how the original flat Superstore data can be normalized into related dimensions and a central fact table.
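    As a rough illustration of the normalization described above, the sketch below snowflakes a product dimension into product, subcategory, and category tables around a central sales fact table, then joins outward to aggregate. All table names, columns, and rows are hypothetical and do not reproduce the dataset's actual dbt models.

```python
import sqlite3

# Hypothetical snowflake layout: the product dimension is normalized into
# product -> subcategory -> category, all reached from one fact table.
SCHEMA = """
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_subcategory (
    subcategory_id   INTEGER PRIMARY KEY,
    subcategory_name TEXT,
    category_id      INTEGER REFERENCES dim_category (category_id)
);
CREATE TABLE dim_product (
    product_id     INTEGER PRIMARY KEY,
    product_name   TEXT,
    subcategory_id INTEGER REFERENCES dim_subcategory (subcategory_id)
);
CREATE TABLE fact_sales (
    order_id   TEXT,
    product_id INTEGER REFERENCES dim_product (product_id),
    sales      REAL
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_category VALUES (?, ?)",
                     [(1, "Furniture"), (2, "Technology")])
    conn.executemany("INSERT INTO dim_subcategory VALUES (?, ?, ?)",
                     [(10, "Chairs", 1), (20, "Phones", 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(100, "Desk Chair", 10), (200, "Smartphone", 20)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [("O1", 100, 150.0), ("O2", 200, 400.0), ("O3", 100, 300.0)])
    return conn


def sales_by_category(conn: sqlite3.Connection) -> list:
    # Walk outward from the fact table: fact -> product -> subcategory -> category.
    sql = """
        SELECT c.category_name, SUM(f.sales) AS total_sales
        FROM fact_sales AS f
        JOIN dim_product     AS p ON f.product_id = p.product_id
        JOIN dim_subcategory AS s ON p.subcategory_id = s.subcategory_id
        JOIN dim_category    AS c ON s.category_id = c.category_id
        GROUP BY c.category_name
        ORDER BY c.category_name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(sales_by_category(build_demo_db()))  # [('Furniture', 450.0), ('Technology', 400.0)]
```

    In a star schema, the category and subcategory names would sit directly in dim_product. Snowflaking removes that redundancy at the cost of extra joins per query.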

    Use this dataset to:
    • Practice SQL joins and schema design
    • Build ETL pipelines or dbt models
    • Design Power BI dashboards
    • Learn data warehouse normalization (3NF → Snowflake) concepts
    • Simulate enterprise data warehouse reporting environments

    I’m open to suggestions or improvements from the community. Feel free to share ideas for additional dimensions, measures, or transformations that could make this dataset even more useful for learning and analysis.

    The transformation was done using dbt; check out the models and the entire project.

  11. Dental Clinic Patient Data (2023-2024)

    • kaggle.com
    zip
    Updated Aug 4, 2025
    Arjun Kumar Jha (2025). Dental Clinic Patient Data (2023-2024) [Dataset]. https://www.kaggle.com/datasets/arjunkumarjha1/dental-clinic
    Available download formats
    zip (17,317 bytes)
    Authors
    Arjun Kumar Jha
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This is a cleaned and structured dataset for a real-world data analytics project designed around ML Dental Clinic, a fictional but highly realistic dental clinic based in Tilak Nagar, West Delhi.

    🦷 Dataset Highlights:
    • Covers 896 patient records from Jan 2023 to Dec 2024
    • Includes demographics, visit dates, treatments, doctors, billing, discounts, and due amounts
    • Treatments handled by two doctors: Dr. Kajal (Implantologist) and Dr. Karan (Oral Surgeon)
    • Realistic pricing and billing logic (OPD-only charges, waived fees on treatment, free camps, etc.)
    • Built for data cleaning, SQL querying, Python analysis, and Power BI dashboard creation

    ✅ Use cases:
    • Healthcare analytics practice
    • MySQL or Power BI dashboard creation
    • End-to-end data analyst portfolio projects
    • Freelance healthcare reporting automation

    🛠 Tech Stack Used in Project:
    • Python (Pandas, Matplotlib, Seaborn)
    • MySQL Workbench
    • Power BI
    • Excel

    📌 GitHub Project Link:
    https://github.com/kumararjunjha/ML-Dental-Clinic-Data-Analysis

    👨‍💻 Created by: Arjun Jha
    🔍 Aspiring Freelance Data Analyst | Healthcare Data Projects | Portfolio-ready work
    📬 Reach out on LinkedIn: https://linkedin.com/in/kumararjunjha

    Let me know what insights you discover with this data!

  12. Waddle Portfolio

    • kaggle.com
    zip
    Updated Jul 31, 2025
    Colin Waddle (2025). Waddle Portfolio [Dataset]. https://www.kaggle.com/datasets/colindwaddle/waddle-portfolio
    Available download formats
    zip (4,330,358 bytes)
    Authors
    Colin Waddle
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This portfolio highlights both practice projects and contributions I've made on the job, with a focus on practical, results-driven analysis. Each project reflects my ability to solve business problems using Excel for data visualization, SQL for querying and structuring data, and the Python skills I've built.

  13. Boutique Hotel Dataset in Turkey

    • kaggle.com
    zip
    Updated Aug 8, 2025
    Alperen Atik (2025). Boutique Hotel Dataset in Turkey [Dataset]. https://www.kaggle.com/datasets/alperenmyung/boutique-hotel-dataset-in-turkey/code
    Available download formats
    zip (299,786 bytes)
    Authors
    Alperen Atik
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    Türkiye
    Description

    The Hotel Room Booking & Customer Orders Dataset

    This is a rich, synthetic dataset meticulously designed for data analysts, data scientists, and machine learning practitioners to practice their skills on realistic e-commerce data. It models a hotel booking platform, providing a comprehensive and interconnected environment to analyze booking trends, customer behavior, and operational patterns. It is an ideal resource for building a professional portfolio project, from initial exploratory data analysis to advanced predictive modeling.

    The dataset is structured as a relational database, consisting of three core tables that can be easily joined:

    rooms.csv: This table serves as the hotel's inventory, containing a catalog of unique rooms with essential attributes such as room_id, type, capacity, and price_per_night.

    customers.csv: This file provides a list of unique customers, offering demographic insights with columns like customer_id, name, country, and age. This data can be used to segment customers and personalize marketing strategies.

    orders.csv: As the central transactional table, it links rooms and customers, capturing the details of each booking. Key columns include order_id, customer_id, room_id, booking_date, and the order_total, which can be derived from the room price and the duration of the stay.
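    Because order_total can be derived from the room price and the length of stay, a consistency check can recompute it with a join. This sketch uses Python's sqlite3 module and assumes a hypothetical nights column on orders. The description above does not list a stay-duration column, so in practice the duration may have to be derived from other fields.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Rooms and orders with a hypothetical `nights` column (not listed in orders.csv)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rooms (room_id INTEGER PRIMARY KEY, type TEXT,
                            capacity INTEGER, price_per_night REAL);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             room_id INTEGER, booking_date TEXT,
                             nights INTEGER, order_total REAL);
    """)
    conn.executemany("INSERT INTO rooms VALUES (?, ?, ?, ?)",
                     [(1, "Single", 1, 80.0), (2, "Suite", 4, 250.0)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 10, 1, "2025-06-01", 3, 240.0),
                      (2, 11, 2, "2025-06-03", 2, 450.0)])
    return conn


def mismatched_totals(conn: sqlite3.Connection) -> list:
    """Return (order_id, stored total, price_per_night * nights) where the two disagree."""
    sql = """
        SELECT o.order_id, o.order_total, r.price_per_night * o.nights AS expected_total
        FROM orders AS o
        JOIN rooms AS r ON o.room_id = r.room_id
        WHERE ABS(o.order_total - r.price_per_night * o.nights) > 0.005
        ORDER BY o.order_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(mismatched_totals(build_demo_db()))  # [(2, 450.0, 500.0)]
```

    A non-empty result flags orders whose stored total differs from price_per_night * nights. In this sample that is order 2. The 0.005 tolerance guards against floating-point rounding.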

    This dataset is valuable because its structure enables a wide range of analytical projects. The relationships between tables are clearly defined, allowing you to practice complex SQL joins and data manipulation with Pandas. The presence of both categorical data (room_type, country) and numerical data (age, price) makes it versatile for different analytical approaches.

    Use Cases for Data Exploration & Modeling

    This dataset is a versatile tool for a wide range of analytical projects:

    Data Visualization: Create dashboards to analyze booking trends over time, identify the most popular room types, or visualize the geographical distribution of your customer base.

    Machine Learning: Build a regression model to predict the order_total based on room type and customer characteristics. Alternatively, you could develop a model to recommend room types to customers based on their past orders.

    SQL & Database Skills: Practice complex queries to find the average order value per country, or identify the most profitable room types by month.
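    The two example queries above (average order value per country, and the top room type by month) might look like the following, again as a sqlite3 sketch over made-up rows. The description mentions profitability, but no cost columns are listed. Revenue (summed order_total) therefore stands in for profit here, and that substitution is an assumption.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Three-table hotel schema with made-up rows (columns follow the description above)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rooms (room_id INTEGER PRIMARY KEY, type TEXT, price_per_night REAL);
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT,
                                country TEXT, age INTEGER);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             room_id INTEGER, booking_date TEXT, order_total REAL);
    """)
    conn.executemany("INSERT INTO rooms VALUES (?, ?, ?)",
                     [(1, "Single", 80.0), (2, "Suite", 250.0)])
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)",
                     [(10, "Ada", "Turkey", 30), (11, "Bo", "Germany", 45),
                      (12, "Cem", "Turkey", 52)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
                     [(1, 10, 1, "2025-06-01", 240.0), (2, 11, 2, "2025-06-03", 750.0),
                      (3, 12, 2, "2025-07-10", 500.0), (4, 10, 1, "2025-07-15", 160.0)])
    return conn


def avg_order_value_by_country(conn: sqlite3.Connection) -> list:
    sql = """
        SELECT c.country, AVG(o.order_total) AS avg_order_value
        FROM orders AS o
        JOIN customers AS c ON o.customer_id = c.customer_id
        GROUP BY c.country
        ORDER BY c.country
    """
    return conn.execute(sql).fetchall()


def top_room_type_by_month(conn: sqlite3.Connection) -> list:
    # Revenue per (month, room type), then keep the row(s) with the monthly maximum.
    sql = """
        WITH monthly AS (
            SELECT strftime('%Y-%m', o.booking_date) AS month,
                   r.type AS room_type,
                   SUM(o.order_total) AS revenue
            FROM orders AS o
            JOIN rooms AS r ON o.room_id = r.room_id
            GROUP BY month, room_type
        )
        SELECT m.month, m.room_type, m.revenue
        FROM monthly AS m
        WHERE m.revenue = (SELECT MAX(revenue) FROM monthly WHERE month = m.month)
        ORDER BY m.month
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(avg_order_value_by_country(conn))  # [('Germany', 750.0), ('Turkey', 300.0)]
    print(top_room_type_by_month(conn))      # [('2025-06', 'Suite', 750.0), ('2025-07', 'Suite', 500.0)]
```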

  14. Crime Movies VS Violent Crimes in the USA

    • kaggle.com
    zip
    Updated Mar 13, 2024
    Antonio101 (2024). Crime Movies VS Violent Crimes in the USA [Dataset]. https://www.kaggle.com/datasets/antonio101/violent-crime-vs-crime-movies-in-the-usa
    Available download formats
    zip (1,098,074 bytes)
    Authors
    Antonio101
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    United States
    Description

    Context

    This dataset looks at the number of crime-genre movies produced in the United States of America between 1985 and 2017 and compares it with violent crime rates over the same period. The time frame was chosen based on data availability (The Movies Dataset ends in 2017, and the FBI's CDE tool starts in 1985).

    The movie and genre data were pulled from "The Movies Dataset" on Kaggle, where columns were adjusted and only the first two genres of each film were kept. The data were then filtered to include only films released in the United States of America from 1985 to 2017, and US violent crime and population data were joined to them.
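    The year-keyed join described above can be sketched in plain Python. The sketch also adds one possible per-capita normalization (crime films per million people) and computes a Pearson correlation against the violent crime rate. The function names, the normalization, and every number below are hypothetical placeholders, not values or methods taken from the project.

```python
import math


def join_by_year(films: dict, crime_rate: dict, population: dict) -> list:
    """Inner-join three {year: value} series on year.

    Returns (year, crime films per million people, violent crime rate) rows;
    years missing from any input are dropped, as in an SQL inner join.
    """
    years = sorted(films.keys() & crime_rate.keys() & population.keys())
    return [(y, films[y] * 1_000_000 / population[y], crime_rate[y]) for y in years]


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder numbers only; they are NOT taken from the dataset.
    films = {2000: 40, 2001: 50, 2002: 60, 2003: 70}
    crime_rate = {2000: 500.0, 2001: 490.0, 2002: 480.0}
    population = {y: 100_000_000 for y in range(2000, 2004)}
    rows = join_by_year(films, crime_rate, population)
    print(rows)  # 2003 is dropped because it has no crime-rate value
    print(pearson([r[1] for r in rows], [r[2] for r in rows]))
```

    Because this is an inner join, a year that is missing from any of the three sources drops out of the analysis entirely.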

    Tableau Public Visualization

    Content

    movies-to-crime_data_by_population_1985-2017_2023-03-06.csv: This file contains the filtered and sorted data joining together the rest of the included data.

    movies_data_cleaned_V2.csv: A large movie dataset pulled from the aforementioned "The Movies Dataset" and adjusted for use in this project; find the original dataset here.

    population_data_1985-2017.csv: This data was pulled from the World Bank, Population, Total for United States [POPTOTUSA647NWDB], retrieved from FRED, Federal Reserve Bank of St. Louis.

    violent_crime_rates_USA_1985-2017_2024-03-06.csv: This data was pulled from the Federal Bureau of Investigation's "Crime Data Explorer" tool. Data pulled includes all violent crime 1985-2017. More information concerning how violent crimes are categorized can be found on the Crime Data Explorer's website linked above.

    Acknowledgements

    All data was sourced via publicly available datasets and linked to above. Special thanks to Kaggle user Rounak Banik for their work creating "The Movies Dataset" which was incredibly helpful.

    Inspiration

    This was a side project to gain further practice with tools such as SQL, R, Tableau, and spreadsheets. It began with a focus on crime-novel authors versus the number of actual criminals, but evolved into its current form after usable datasets proved hard to find.

  15. Supply Chain Management SQL Case Study

    • kaggle.com
    zip
    Updated Jan 21, 2024
    Sreelakshmi Sivan (2024). Supply Chain Management SQL Case Study [Dataset]. https://www.kaggle.com/datasets/sreelakshmisivan/supply-chain-management-sql-case-study
    Available download formats
    zip (2,830 bytes)
    Authors
    Sreelakshmi Sivan
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Sreelakshmi Sivan

    Released under MIT

    Contents

  16. Youtube Trending Videos Dataset

    • kaggle.com
    zip
    Updated Sep 28, 2025
    Keshav Bansal95 (2025). Youtube Trending Videos Dataset [Dataset]. https://www.kaggle.com/datasets/keshavbansal95/youtube-trending-videos-dataset
    Available download formats
    zip (274,004,927 bytes)
    Authors
    Keshav Bansal95
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    YouTube
    Description

    This dataset provides a comprehensive collection of YouTube video and channel metadata curated for data analysis, visualization, and storytelling projects. It contains rich information on trending videos across multiple countries, including video performance statistics, engagement metrics, and channel-level details.

    The dataset is designed to help learners and researchers explore real-world YouTube dynamics, such as:

    • What type of content gains the highest views and engagement?
    • How do categories perform across different countries?
    • What role do publishing time, video duration, or tags play in driving popularity?
    • Which channels dominate in terms of subscribers, views, and content consistency?

    Features

    The dataset includes detailed video-level fields such as:

    • Video ID, title, description, and publish time
    • Trending date and country
    • Tags, categories, duration, resolution, and licensed content status
    • Views, likes, and comment counts

    It also includes channel-level information such as:

    • Channel ID, title, and description
    • Channel country, publish date, and custom URL (if available)
    • Subscriber count, total views, video count, and hidden subscriber flag

    With this structured dataset, students and professionals can perform data cleaning, transformation, SQL querying, trend analysis, and dashboarding in tools such as Excel, SQL, Power BI, Tableau, and Python. It is also suitable for advanced machine learning tasks like predicting video performance, engagement modeling, and natural language processing on video titles and descriptions.

    Use Cases

    1. Descriptive Analytics: Identify top categories, channels, and countries leading the YouTube trending space.
    2. Comparative Analysis: Compare engagement rates across different regions and content types.
    3. Visualization Projects: Create dashboards showing performance KPIs, category trends, and time-based patterns.
    4. Storytelling: Derive business insights and best practices for creators, marketers, and educators on YouTube.
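    The dataset description does not give a formula for the engagement rate used in the comparative analysis. One common choice, assumed here, is (likes + comments) / views. The sketch below applies it per video and averages the result by trending country. The field names and records are hypothetical.

```python
from collections import defaultdict


def engagement_rate(likes: int, comments: int, views: int) -> float:
    """Assumed definition: (likes + comments) / views, or 0.0 for zero views."""
    return (likes + comments) / views if views else 0.0


def mean_engagement_by_country(videos: list) -> dict:
    """Average the per-video engagement rate within each trending country."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v in videos:
        totals[v["country"]] += engagement_rate(v["likes"], v["comments"], v["views"])
        counts[v["country"]] += 1
    return {country: totals[country] / counts[country] for country in sorted(totals)}


if __name__ == "__main__":
    videos = [  # hypothetical records with assumed field names
        {"country": "IN", "views": 1000, "likes": 80, "comments": 20},
        {"country": "IN", "views": 2000, "likes": 100, "comments": 100},
        {"country": "US", "views": 500, "likes": 20, "comments": 5},
    ]
    print(mean_engagement_by_country(videos))  # {'IN': 0.1, 'US': 0.05}
```

    Averaging per-video rates gives every video equal weight. Dividing summed likes and comments by summed views would instead weight high-view videos more heavily.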

    Educational Value

    This dataset is structured specifically for student projects and group assignments. It lets every learner take a role, whether as a data engineer, analyst, visualization specialist, or business storyteller, mirroring the structure of real-world consulting projects.

    Credits

    This dataset is published as part of the YouTube Data Analytics Project initiated by Analytics Circle, an institute dedicated to empowering learners with practical data analytics, data science, and AI skills through hands-on projects and real-world applications.

  17. Blinkit dataset

    • kaggle.com
    zip
    Updated Jul 18, 2024
    mukesh gadri (2024). Blinkit dataset [Dataset]. https://www.kaggle.com/datasets/mukeshgadri/blinkit-dataset
    Available download formats
    zip (695,160 bytes)
    Authors
    mukesh gadri
    License

    Open Data Commons Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    In the case study titled "Blinkit: Grocery Product Analysis," a dataset called 'Grocery Sales' contains 12 columns with information on sales of grocery items across different outlets. Using Tableau, you as a data analyst can uncover customer behavior insights, track sales trends, and gather feedback. These insights will drive operational improvements, enhance customer satisfaction, and optimize product offerings and store layout. Tableau enables data-driven decision-making for positive outcomes at Blinkit.

    The Grocery Sales table is a CSV file with the following columns:

    • Item_Identifier: A unique ID for each product in the dataset.
    • Item_Weight: The weight of the product.
    • Item_Fat_Content: Indicates whether the product is low fat or not.
    • Item_Visibility: The percentage of the total display area in the store that is allocated to the specific product.
    • Item_Type: The category or type of product.
    • Item_MRP: The maximum retail price (list price) of the product.
    • Outlet_Identifier: A unique ID for each store in the dataset.
    • Outlet_Establishment_Year: The year in which the store was established.
    • Outlet_Size: The size of the store in terms of ground area covered.
    • Outlet_Location_Type: The type of city or region in which the store is located.
    • Outlet_Type: Indicates whether the store is a grocery store or a supermarket.
    • Item_Outlet_Sales: The sales of the product in the particular store. This is the outcome variable that we want to predict.
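    As a small example of querying this table, the sketch below loads a few rows into an in-memory SQLite table and totals Item_Outlet_Sales by Outlet_Type. The column names come from the list above, and only a subset of them is used. The table name grocery_sales and all row values are hypothetical.

```python
import sqlite3


def sales_by_outlet_type(rows: list) -> list:
    """Load rows into an in-memory table and total Item_Outlet_Sales per Outlet_Type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE grocery_sales (Item_Identifier TEXT, Item_Type TEXT, Item_MRP REAL, "
        "Outlet_Identifier TEXT, Outlet_Type TEXT, Item_Outlet_Sales REAL)"
    )
    conn.executemany("INSERT INTO grocery_sales VALUES (?, ?, ?, ?, ?, ?)", rows)
    sql = """
        SELECT Outlet_Type,
               COUNT(DISTINCT Outlet_Identifier) AS outlets,
               SUM(Item_Outlet_Sales) AS total_sales
        FROM grocery_sales
        GROUP BY Outlet_Type
        ORDER BY total_sales DESC
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    rows = [  # made-up rows using a subset of the documented columns
        ("ITEM001", "Dairy", 249.8, "OUT_A", "Supermarket", 3735.0),
        ("ITEM002", "Soft Drinks", 48.3, "OUT_B", "Supermarket", 443.0),
        ("ITEM003", "Meat", 141.6, "OUT_C", "Grocery Store", 732.0),
    ]
    print(sales_by_outlet_type(rows))
    # [('Supermarket', 2, 4178.0), ('Grocery Store', 1, 732.0)]
```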

  18. Walmart Dataset

    • kaggle.com
    zip
    Updated Dec 26, 2021
    M Yasser H (2021). Walmart Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/walmart-dataset
    Available download formats
    zip (125,095 bytes)
    Authors
    M Yasser H
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description


    Walmart, one of the leading retailers in the US, would like to predict sales and demand accurately. Certain events and holidays affect sales on each day. Sales data are available for 45 Walmart stores. The business is challenged by unforeseen demand and sometimes runs out of stock because of an inappropriate machine learning algorithm. An ideal ML algorithm would predict demand accurately while taking into account economic conditions such as the CPI and the unemployment index.

    Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest being the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks that include these holidays are weighted five times more heavily in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks without complete historical data. Historical sales data for 45 Walmart stores located in different regions are available.
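    The five-times weighting of holiday weeks can be written as a weighted mean of per-week errors. The Python sketch below assumes the underlying per-week error is the absolute error (a weighted mean absolute error); the description only states the weighting, so treat that choice, the function name `weighted_mae` and the sample numbers as illustrative assumptions.

```python
from typing import Sequence


def weighted_mae(y_true: Sequence[float], y_pred: Sequence[float],
                 is_holiday: Sequence[bool], holiday_weight: float = 5.0) -> float:
    """Mean absolute error in which holiday weeks count holiday_weight times as much."""
    if not (len(y_true) == len(y_pred) == len(is_holiday)) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed weighting: holiday weeks get holiday_weight, all other weeks get 1.
    weights = [holiday_weight if h else 1.0 for h in is_holiday]
    weighted_errors = sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))
    return weighted_errors / sum(weights)


# Two hypothetical weeks: a regular week off by 4 and a holiday week off by 10.
print(weighted_mae([100, 200], [104, 190], [False, True]))  # 9.0
```

    Unweighted, the same two errors would average to 7.0; the holiday weighting pulls the score toward the holiday-week error, which is why markdown effects on those weeks matter so much.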

    Acknowledgements

    The dataset is taken from Kaggle.

    Objective:

    • Understand the dataset and clean it up (if required).
    • Build regression models to predict sales from single and multiple features.
    • Evaluate the models and compare their scores, such as R² and RMSE.
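    As a sketch of the single-feature case and the two named scores, the Python below fits ordinary least squares by hand and computes R² and RMSE with the standard library only. The function names and the toy data are hypothetical; a multiple-feature model would more typically use a library such as scikit-learn's `LinearRegression`, which this sketch does not attempt to replicate.

```python
import math
from typing import Sequence


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Ordinary least squares for y ≈ slope * x + intercept (single feature)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # raises ZeroDivisionError if all x values are equal
    return slope, mean_y - slope * mean_x


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical single-feature data, e.g. one economic indicator vs. weekly sales:
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear(x, y)
pred = [slope * xi + intercept for xi in x]
print(slope, intercept, r2_score(y, pred), rmse(y, pred))  # 2.0 1.0 1.0 0.0
```

    The toy data lie exactly on a line, so R² is 1 and RMSE is 0; on real sales data both scores would be compared across models, with higher R² and lower RMSE indicating a better fit.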
  19. Cricket data

    • kaggle.com
    zip
    Updated Jan 20, 2020
    mahendran narayanan (2020). Cricket data [Dataset]. https://www.kaggle.com/datasets/mahendran1/icc-cricket
    Available download formats: zip (383,854 bytes)
    Dataset updated
    Jan 20, 2020
    Authors
    mahendran narayanan
    Description

    Context

    An aspiring data scientist sees everything through the lens of data, even when relaxing with friends, watching cricket live, and cheering for a favorite team.

    Content

    It includes ODI, Test, and T20 statistics for all players in all three categories (batting, bowling, and fielding).

    Acknowledgements

    We wouldn't be here without cricket. Thanks to all the great cricketers for their wonderful contributions.

  20. Turkish Query and SQL - Türkçe Sorular ve SQL

    • kaggle.com
    zip
    Updated Jun 9, 2023
    Sahil Rzayev (2023). Turkish Query and SQL - Türkçe Sorular ve SQL [Dataset]. https://www.kaggle.com/datasets/sahilrzayev/turkish-query-answer-sql
    Available download formats: zip (28,124 bytes)
    Dataset updated
    Jun 9, 2023
    Authors
    Sahil Rzayev
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    SQL Queries for Daily Usage in Turkish

    Hello,

    In this post, we have compiled Turkish SQL queries that are frequently encountered in daily use. There are 1,303 queries in total: 966 SELECT queries and 337 DELETE and UPDATE queries. This resource can be a useful reference for those who want to learn SQL or reinforce their existing knowledge.

    The content is presented clearly and enriched with real-life examples. It can be an ideal resource for those who want to understand and practice the basics of database operations. Each query is accompanied by explanations and usage examples.

    This post is open to everyone, and its structure is flexible so that anyone who wants to contribute can make additions. You can also add your own SQL queries or edit existing ones.

    Project features:
    - Total number of queries: 1303
    - Number of SELECT queries: 966
    - Number of DELETE and UPDATE queries: 337

    Example SQL query categories:
    - Data selection (SELECT) queries
    - Data deletion (DELETE) queries
    - Data update (UPDATE) queries
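    To illustrate the three query categories, the sketch below runs one SELECT, one UPDATE and one DELETE with Python's built-in `sqlite3` module against an in-memory database. The `customers` table, its columns and its rows are hypothetical and are not taken from the dataset, whose queries are written in Turkish and whose target SQL dialect is not stated.

```python
import sqlite3


def run_examples():
    """Run one SELECT, one UPDATE and one DELETE against a throwaway in-memory table."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table and rows, used only for this illustration.
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
        conn.executemany("INSERT INTO customers (name, city) VALUES (?, ?)",
                         [("Ali", "Istanbul"), ("Mehmet", "Ankara"), ("Elif", "Izmir")])

        # SELECT: read rows that match a condition.
        selected = [row[0] for row in conn.execute(
            "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Istanbul",))]

        # UPDATE: change existing rows; rowcount reports how many were modified.
        updated = conn.execute(
            "UPDATE customers SET city = ? WHERE name = ?", ("Bursa", "Mehmet")).rowcount

        # DELETE: remove rows that match a condition.
        deleted = conn.execute(
            "DELETE FROM customers WHERE city = ?", ("Izmir",)).rowcount
        conn.commit()

        remaining = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
        return selected, updated, deleted, remaining
    finally:
        conn.close()


print(run_examples())
# (['Ali'], 1, 1, [('Ali', 'Istanbul'), ('Mehmet', 'Bursa')])
```

    Parameter placeholders (`?`) are used instead of string formatting so that values are passed to SQLite separately from the query text.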

    I hope this resource supports and helps you in your SQL learning process. We welcome your contributions!

    Kind regards, Sahil Rzayev

Andrew Dolcimascolo-Garrett (2024). Hospital Database Management System SQL Project [Dataset]. https://www.kaggle.com/datasets/andrewdolcigarrett/hospital-database-management-system-sql-project
Hospital Database Management System SQL Project

A Data Mining Exercise in SQL

Available download formats: zip (1,487,278 bytes)
Dataset updated
May 9, 2024
Authors
Andrew Dolcimascolo-Garrett
License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Dataset

This dataset was created by Andrew Dolcimascolo-Garrett

Released under MIT

Contents
