19 datasets found
  1. IMDB Movies Analysis - SQL

    • kaggle.com
    zip
    Updated Feb 21, 2023
    Cite
    Gaurav B R (2023). IMDB Movies Analysis - SQL [Dataset]. https://www.kaggle.com/datasets/gauravbr/imdb-movies-data-erd
    Available download formats: zip (3818401 bytes)
    Dataset updated
    Feb 21, 2023
    Authors
    Gaurav B R
    Description

    SQL IMDB Movies Analysis for RSVP (Film Production Company)

    RSVP Movies is an Indian film production company that has produced many super-hit movies. It has usually released movies for the Indian audience, but for its next project it is planning to release a movie for a global audience in 2022.

    The production company wants to plan its every move analytically, based on data. We have taken the last three years' IMDB movie data, carried out the analysis using SQL, and drawn meaningful insights that could help them start their new project.

    For convenience, the entire analytics process has been divided into four segments, where each segment leads to significant insights from different combinations of tables. The questions in each segment, along with their business objectives, are written in the script given below, and the solution code is written below every question.

  2. Nvidia Database

    • kaggle.com
    zip
    Updated Jan 30, 2025
    Cite
    Ajay Tom (2025). Nvidia Database [Dataset]. https://www.kaggle.com/datasets/ajayt0m/nvidia-database
    Available download formats: zip (8712 bytes)
    Dataset updated
    Jan 30, 2025
    Authors
    Ajay Tom
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a beginner-friendly SQLite database designed to help users practice SQL and relational database concepts. The dataset represents a basic business model inspired by NVIDIA and includes interconnected tables covering essential aspects like products, customers, sales, suppliers, employees, and projects. It's perfect for anyone new to SQL or data analytics who wants to learn and experiment with structured data.

    Tables and Their Contents:

    Products:

    Includes details of 15 products (e.g., GPUs, AI accelerators). Attributes: product_id, product_name, category, release_date, price.

    Customers:

    Lists 20 fictional customers with their industry and contact information. Attributes: customer_id, customer_name, industry, contact_email, contact_phone.

    Sales:

    Contains 100 sales records tied to products and customers. Attributes: sale_id, product_id, customer_id, sale_date, region, quantity_sold, revenue.

    Suppliers:

    Features 50 suppliers and the materials they provide. Attributes: supplier_id, supplier_name, material_supplied, contact_email.

    Supply Chain:

    Tracks materials supplied to produce products, proportional to sales. Attributes: supply_chain_id, supplier_id, product_id, supply_date, quantity_supplied.

    Departments:

    Lists 5 departments within the business. Attributes: department_id, department_name, location.

    Employees:

    Contains data on 30 employees and their roles in different departments. Attributes: employee_id, first_name, last_name, department_id, hire_date, salary.

    Projects:

    Describes 10 projects handled by different departments. Attributes: project_id, project_name, department_id, start_date, end_date, budget.

    Why Use This Dataset?

    • Perfect for Beginners: The dataset is simple and easy to understand.
    • Interconnected Tables: Provides a basic introduction to relational database concepts like joins and foreign keys.
    • SQL Practice: Run basic queries, filter data, and perform simple aggregations or calculations.
    • Learning Tool: Great for small projects and understanding business datasets.

    Potential Use Cases:

    • Practice SQL queries (SELECT, INSERT, UPDATE, DELETE, JOIN).
    • Understand how to design and query relational databases.
    • Analyze basic sales and supply chain data for patterns and trends.
    • Learn how to use databases in analytics tools like Excel, Power BI, or Tableau.

    Data Size:

    Number of Tables: 8. Total Rows: around 230 across all tables, ensuring quick queries and easy exploration.
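
    As a rough sketch of the kind of join these tables support (table names are assumed to be lowercase versions of the names above; verify against the actual SQLite schema), total revenue per product category could be computed like this:

    ```
    -- Total units and revenue per product category,
    -- joining the sales table to the products table.
    SELECT p.category,
           SUM(s.quantity_sold) AS total_units,
           SUM(s.revenue)       AS total_revenue
    FROM sales AS s
    JOIN products AS p
      ON p.product_id = s.product_id
    GROUP BY p.category
    ORDER BY total_revenue DESC;
    ```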

  3. SQL Integrity Journey: Unleashing Data Constraints

    • kaggle.com
    zip
    Updated Oct 9, 2023
    Cite
    Radha Gandhi (2023). SQL Integrity Journey: Unleashing Data Constraints [Dataset]. https://www.kaggle.com/datasets/radhagandhi/sql-integrity-journey-unleashing-data-constraints
    Available download formats: zip (13817 bytes)
    Dataset updated
    Oct 9, 2023
    Authors
    Radha Gandhi
    Description

    Title: Practical Exploration of SQL Constraints: Building a Foundation in Data Integrity

    Introduction: Welcome to my Data Analysis project, where I focus on mastering SQL constraints—a pivotal aspect of database management. This project centers on hands-on experience with SQL's Data Definition Language (DDL) commands, emphasizing constraints such as PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and DEFAULT. In this project, I aim to demonstrate my foundational understanding of enforcing data integrity and maintaining a structured database environment.

    Purpose: The primary purpose of this project is to showcase my proficiency in implementing and managing SQL constraints for robust data governance. By delving into the realm of constraints, you'll gain insights into my SQL skills and how I utilize constraints to ensure data accuracy, consistency, and reliability within relational databases.

    What to Expect: Within this project, you will find a series of exercises that focus on the implementation and utilization of SQL constraints. These exercises highlight my command over the following key constraint types:

    • NOT NULL: Crucial for ensuring the presence of essential data in a column.
    • PRIMARY KEY: Ensuring unique identification of records for data integrity.
    • FOREIGN KEY: Establishing relationships between tables to maintain referential integrity.
    • UNIQUE: Guaranteeing the uniqueness of values within specified columns.
    • CHECK: Implementing custom conditions to validate data entries.
    • DEFAULT: Setting default values for columns to enhance data reliability.

    Each exercise is accompanied by clear and concise SQL scripts, explanations of the intended outcomes, and practical insights into the application of these constraints. My goal is to showcase how SQL constraints serve as crucial tools for creating a structured and dependable database foundation. I invite you to explore these exercises in detail, where I provide hands-on examples that highlight the importance and utility of SQL constraints. Together, they underscore my commitment to upholding data quality, ensuring data accuracy, and harnessing the power of SQL constraints for informed decision-making in data analysis.

    3.1 CONSTRAINT - ENFORCING NOT NULL CONSTRAINT WHILE CREATING NEW TABLE
    3.2 CONSTRAINT - ENFORCE NOT NULL CONSTRAINT ON EXISTING COLUMN
    3.3 CONSTRAINT - ENFORCING PRIMARY KEY CONSTRAINT WHILE CREATING A NEW TABLE
    3.4 CONSTRAINT - ENFORCE PRIMARY KEY CONSTRAINT ON EXISTING COLUMN
    3.5 CONSTRAINT - ENFORCING FOREIGN KEY CONSTRAINT WHILE CREATING NEW TABLE
    3.6 CONSTRAINT - ENFORCE FOREIGN KEY CONSTRAINT ON EXISTING COLUMN
    3.7 CONSTRAINT - ENFORCING UNIQUE CONSTRAINTS WHILE CREATING A NEW TABLE
    3.8 CONSTRAINT - ENFORCING UNIQUE CONSTRAINT IN EXISTING TABLE
    3.9 CONSTRAINT - ENFORCING CHECK CONSTRAINT IN NEW TABLE
    3.10 CONSTRAINT - ENFORCING CHECK CONSTRAINT IN THE EXISTING TABLE
    3.11 CONSTRAINT - ENFORCING DEFAULT CONSTRAINT IN THE NEW TABLE
    3.12 CONSTRAINT - ENFORCING DEFAULT CONSTRAINT IN THE EXISTING TABLE
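
    For context, a minimal DDL sketch combining the constraint types listed above might look like the following (table and column names are illustrative only, not taken from the project scripts):

    ```
    -- Illustrative tables showing NOT NULL, PRIMARY KEY, FOREIGN KEY,
    -- UNIQUE, CHECK, and DEFAULT constraints together.
    CREATE TABLE departments (
        department_id   INT          PRIMARY KEY,
        department_name VARCHAR(100) NOT NULL UNIQUE
    );

    CREATE TABLE employees (
        employee_id   INT            PRIMARY KEY,
        full_name     VARCHAR(100)   NOT NULL,
        email         VARCHAR(255)   UNIQUE,
        salary        DECIMAL(10, 2) CHECK (salary > 0),
        hire_date     DATE           DEFAULT CURRENT_DATE,  -- DEFAULT syntax varies by dialect
        department_id INT,
        FOREIGN KEY (department_id) REFERENCES departments (department_id)
    );
    ```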

  4. BookMyShow-SQL-Data-Analysis

    • kaggle.com
    Updated May 6, 2025
    Cite
    Soumendu Ray (2025). BookMyShow-SQL-Data-Analysis [Dataset]. https://www.kaggle.com/datasets/soumenduray99/bookmyshow-sql-data-analysis
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Soumendu Ray
    Description

    🎟️ BookMyShow SQL Data Analysis

    🎯 Objective

    This project leverages SQL-based analysis to gain actionable insights into user engagement, movie performance, theater efficiency, payment systems, and customer satisfaction on the BookMyShow platform. The goal is to enhance platform performance, boost revenue, and optimize user experience through data-driven strategies.

    📊 Key Analysis Areas

    1. 👥 User Behavior & Engagement
       • Identify most active users and repeat customers
       • Track unique monthly users
       • Analyze peak booking times and average tickets per user
       • Drive engagement strategies and boost customer retention

    2. 🎬 Movie Performance Analysis
       • Highlight top-rated and most booked movies
       • Analyze popular languages and high-revenue genres
       • Study average occupancy rates
       • Focus marketing on high-performing genres and content

    3. 🏢 Theater & Show Performance
       • Pinpoint theaters with highest/lowest bookings
       • Evaluate popular show timings
       • Measure theater-wise revenue contribution and occupancy
       • Improve theater scheduling and resource allocation

    4. 💵 Booking & Revenue Insights
       • Track total revenue, top spenders, and monthly booking patterns
       • Discover most used payment methods
       • Calculate average price per booking and bookings per user
       • Optimize revenue generation and spending strategies

    5. 🪑 Seat Utilization & Pricing Strategy
       • Identify most booked seat types and their revenue impact
       • Analyze seat pricing variations and price elasticity
       • Align pricing strategy with demand patterns for higher revenue

    6. ✅❌ Payment & Transaction Analysis
       • Distinguish successful vs. failed transactions
       • Track refund frequency and payment delays
       • Evaluate revenue lost due to failures
       • Enhance payment processing systems

    7. ⭐ User Reviews & Sentiment Analysis
       • Measure average ratings per movie
       • Identify top and lowest-rated content
       • Analyze review volume and sentiment trends
       • Leverage feedback to refine content offerings

    🧰 Tech Stack

    • Query Language: SQL (MySQL/PostgreSQL)
    • Database Tools: DBeaver, pgAdmin, or any SQL IDE
    • Visualization (Optional): Power BI / Tableau for presenting insights
    • Version Control: Git & GitHub
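
    As one hedged example of the user-engagement analysis described above, a "most active users" query might look like the sketch below, assuming hypothetical users and bookings tables (the actual table and column names in this dataset may differ):

    ```
    -- Most active users by number of bookings (illustrative schema).
    SELECT u.user_id,
           u.user_name,
           COUNT(b.booking_id) AS total_bookings
    FROM users AS u
    JOIN bookings AS b
      ON b.user_id = u.user_id
    GROUP BY u.user_id, u.user_name
    ORDER BY total_bookings DESC
    LIMIT 10;
    ```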

  5. Wikipedia SQLITE Portable DB, Huge 5M+ Rows

    • kaggle.com
    zip
    Updated Jun 29, 2024
    Cite
    christernyc (2024). Wikipedia SQLITE Portable DB, Huge 5M+ Rows [Dataset]. https://www.kaggle.com/datasets/christernyc/wikipedia-sqlite-portable-db-huge-5m-rows/code
    Available download formats: zip (6064169983 bytes)
    Dataset updated
    Jun 29, 2024
    Authors
    christernyc
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The "Wikipedia SQLite Portable DB" is a compact and efficient database derived from the Kensho Derived Wikimedia Dataset (KDWD). This dataset provides a condensed subset of raw Wikimedia data in a format optimized for natural language processing (NLP) research and applications.

    I am not affiliated or partnered with Kensho in any way; I just really like this dataset because it gives my agents something they can query easily.

    Key Features:

    • Contains over 5 million rows of data from English Wikipedia and Wikidata
    • Stored in a portable SQLite database format for easy integration and querying
    • Includes a link-annotated corpus of English Wikipedia pages and a compact sample of the Wikidata knowledge base
    • Ideal for NLP tasks, machine learning, data analysis, and research projects

    The database consists of four main tables:

    • items: Contains information about Wikipedia items, including labels and descriptions
    • properties: Stores details about Wikidata properties, such as labels and descriptions
    • pages: Provides metadata for Wikipedia pages, including page IDs, item IDs, titles, and view counts
    • link_annotated_text: Contains the link-annotated text of Wikipedia pages, divided into sections

    This dataset is derived from the Kensho Derived Wikimedia Dataset (KDWD), which is built from the English Wikipedia snapshot from December 1, 2019, and the Wikidata snapshot from December 2, 2019. The KDWD is a condensed subset of the raw Wikimedia data in a form that is helpful for NLP work, and it is released under the CC BY-SA 3.0 license.

    Credits: The "Wikipedia SQLite Portable DB" is derived from the Kensho Derived Wikimedia Dataset (KDWD), created by the Kensho R&D group. The KDWD is based on data from Wikipedia and Wikidata, which are crowd-sourced projects supported by the Wikimedia Foundation. We would like to acknowledge and thank the Kensho R&D group for their efforts in creating the KDWD and making it available for research and development purposes.

    By providing this portable SQLite database, we aim to make Wikipedia data more accessible and easier to use for researchers, data scientists, and developers working on NLP tasks, machine learning projects, and other data-driven applications. We hope that this dataset will contribute to the advancement of NLP research and the development of innovative applications utilizing Wikipedia data.

    https://www.kaggle.com/datasets/kenshoresearch/kensho-derived-wikimedia-data/data

    Tags: encyclopedia, wikipedia, sqlite, database, reference, knowledge-base, articles, information-retrieval, natural-language-processing, nlp, text-data, large-dataset, multi-table, data-science, machine-learning, research, data-analysis, data-mining, content-analysis, information-extraction, text-mining, text-classification, topic-modeling, language-modeling, question-answering, fact-checking, entity-recognition, named-entity-recognition, link-prediction, graph-analysis, network-analysis, knowledge-graph, ontology, semantic-web, structured-data, unstructured-data, data-integration, data-processing, data-cleaning, data-wrangling, data-visualization, exploratory-data-analysis, eda, corpus, document-collection, open-source, crowdsourced, collaborative, online-encyclopedia, web-data, hyperlinks, categories, page-views, page-links, embeddings

    Usage with LIKE queries:

    ```
    import asyncio

    import aiosqlite


    class KenshoDatasetQuery:
        """Async context manager for LIKE-based searches over the SQLite database."""

        def __init__(self, db_file):
            self.db_file = db_file

        async def __aenter__(self):
            # Open the SQLite connection when entering the `async with` block.
            self.conn = await aiosqlite.connect(self.db_file)
            return self

        async def __aexit__(self, exc_type, exc_val, exc_tb):
            await self.conn.close()

        async def search_pages_by_title(self, title):
            # Join pages with their Wikidata item and link-annotated text.
            query = """
            SELECT pages.page_id, pages.item_id, pages.title, pages.views,
                   items.labels AS item_labels, items.description AS item_description,
                   link_annotated_text.sections
            FROM pages
            JOIN items ON pages.item_id = items.id
            JOIN link_annotated_text ON pages.page_id = link_annotated_text.page_id
            WHERE pages.title LIKE ?
            """
            async with self.conn.execute(query, (f"%{title}%",)) as cursor:
                return await cursor.fetchall()

        async def search_items_by_label_or_description(self, keyword):
            query = """
            SELECT id, labels, description
            FROM items
            WHERE labels LIKE ? OR description LIKE ?
            """
            async with self.conn.execute(query, (f"%{keyword}%", f"%{keyword}%")) as cursor:
                return await cursor.fetchall()

        async def search_items_by_label(self, label):
            query = """
            SELECT id, labels, description
            FROM items
            WHERE labels LIKE ?
            """
            async with self.conn.execute(query, (f"%{label}%",)) as cursor:
                return await cursor.fetchall()

        async def search_properties_by_label_or_desc...
    ```
    
  6. Healthcare Fraud Detection Dataset

    • kaggle.com
    zip
    Updated Mar 6, 2025
    Cite
    Vishal Jaiswal (2025). Healthcare Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/jaiswalmagic1/healthcare-fraud-detection-dataset
    Available download formats: zip (10427537 bytes)
    Dataset updated
    Mar 6, 2025
    Authors
    Vishal Jaiswal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains comprehensive synthetic healthcare data designed for fraud detection analysis. It includes information on patients, healthcare providers, insurance claims, and payments. The dataset is structured to mimic real-world healthcare transactions, where fraudulent activities such as false claims, overbilling, and duplicate charges can be identified through advanced analytics.

    The dataset is suitable for practicing SQL queries, exploratory data analysis (EDA), machine learning for fraud detection, and visualization techniques. It is designed to help data analysts and data scientists develop and refine their analytical skills in the healthcare insurance domain.

    Dataset Overview

    The dataset consists of four CSV files:

    Patients Data (patients.csv)

    Contains demographic details of patients, such as age, gender, insurance type, and location. Can be used to analyze patient demographics and healthcare usage patterns.

    Providers Data (providers.csv)

    Contains information about healthcare providers, including provider ID, specialty, location, and associated hospital. Useful for identifying fraudulent claims linked to specific providers or hospitals.

    Claims Data (claims.csv)

    Contains records of insurance claims made by patients, including diagnosis codes, treatment details, provider ID, and claim amount. Can be analyzed for suspicious patterns, such as excessive claims from a single provider or duplicate claims for the same patient.

    Payments Data (payments.csv)

    Contains details of claim payments made by insurance companies, including payment amount, claim ID, and reimbursement status. Helps in detecting discrepancies between claims and actual reimbursements.

    Possible Analysis Ideas

    This dataset allows for multiple analysis approaches, including but not limited to:

    🔹 Fraud Detection: Identify patterns in claims data to detect fraudulent activities (e.g., excessive billing, duplicate claims).
    🔹 Provider Behavior Analysis: Analyze providers who have an unusually high claim volume or high rejection rates.
    🔹 Payment Trends: Compare claims vs. payments to find irregularities in reimbursement patterns.
    🔹 Patient Demographics & Utilization: Explore which patient groups are more likely to file claims and receive reimbursements.
    🔹 SQL Query Practice: Perform advanced SQL queries, including joins, aggregations, window functions, and subqueries, to extract insights from the data.
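
    As a sketch of the duplicate-claims idea above, assuming claims.csv is loaded into a claims table with hypothetical column names (claim_id, patient_id, provider_id, diagnosis_code, claim_amount):

    ```
    -- Flag potential duplicate claims: the same patient, provider,
    -- diagnosis, and amount appearing more than once.
    SELECT patient_id,
           provider_id,
           diagnosis_code,
           claim_amount,
           COUNT(*) AS claim_count
    FROM claims
    GROUP BY patient_id, provider_id, diagnosis_code, claim_amount
    HAVING COUNT(*) > 1
    ORDER BY claim_count DESC;
    ```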

    Use Cases

    • Practicing SQL queries for job interviews and real-world projects.
    • Learning data cleaning, data wrangling, and feature engineering for healthcare analytics.
    • Applying machine learning techniques for fraud detection.
    • Gaining insights into the healthcare insurance domain and its challenges.

    License & Usage

    License: CC0 Public Domain (free to use for any purpose).

    Attribution: Not required but appreciated. Intended Use: This dataset is for educational and research purposes only.

    This dataset is an excellent resource for aspiring data analysts, data scientists, and SQL learners who want to gain hands-on experience in healthcare fraud detection.

  7. SQLite Sakila Sample Database

    • kaggle.com
    zip
    Updated Mar 14, 2021
    Cite
    Atanas Kanev (2021). SQLite Sakila Sample Database [Dataset]. https://www.kaggle.com/datasets/atanaskanev/sqlite-sakila-sample-database/code
    Available download formats: zip (4495190 bytes)
    Dataset updated
    Mar 14, 2021
    Authors
    Atanas Kanev
    Description

    SQLite Sakila Sample Database

    Database Description

    The Sakila sample database is a fictitious database designed to represent a DVD rental store. The tables of the database include film, film_category, actor, customer, rental, payment and inventory among others. The Sakila sample database is intended to provide a standard schema that can be used for examples in books, tutorials, articles, samples, and so forth. Detailed information about the database can be found on the MySQL website: https://dev.mysql.com/doc/sakila/en/

    Sakila for SQLite is a part of the sakila-sample-database-ports project intended to provide ported versions of the original MySQL database for other database systems, including:

    • Oracle
    • SQL Server
    • SQLite
    • Interbase/Firebird
    • Microsoft Access

    Sakila for SQLite is a port of the Sakila example database available for MySQL, which was originally developed by Mike Hillyer of the MySQL AB documentation team. This project is designed to help database administrators decide which database to use for the development of new products. The user can run the same SQL against different kinds of databases and compare their performance.

    License: BSD Copyright DB Software Laboratory http://www.etl-tools.com

    Note: Part of the insert scripts were generated by Advanced ETL Processor http://www.etl-tools.com/etl-tools/advanced-etl-processor-enterprise/overview.html

    Information about the project and the downloadable files can be found at: https://code.google.com/archive/p/sakila-sample-database-ports/

    Other versions and developments of the project can be found at: https://github.com/ivanceras/sakila/tree/master/sqlite-sakila-db

    https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples/Sakila

    Direct access to the MySQL Sakila database, which does not require installation of MySQL (queries can be typed directly in the browser), is provided on the phpMyAdmin demo version website: https://demo.phpmyadmin.net/master-config/

    Files Description

    The files in the sqlite-sakila-db folder are the script files which can be used to generate the SQLite version of the database. For convenience, the script files have already been run in cmd to generate the sqlite-sakila.db file, as follows:

    ```
    sqlite> .open sqlite-sakila.db                 # creates the .db file
    sqlite> .read sqlite-sakila-schema.sql         # creates the database schema
    sqlite> .read sqlite-sakila-insert-data.sql    # inserts the data
    ```

    Therefore, the sqlite-sakila.db file can be directly loaded into SQLite3 and queries can be directly executed. You can refer to my notebook for an overview of the database and a demonstration of SQL queries. Note: Data about the film_text table is not provided in the script files, thus the film_text table is empty. Instead the film_id, title and description fields are included in the film table. Moreover, the Sakila Sample Database has many versions, so an Entity Relationship Diagram (ERD) is provided to describe this specific version. You are advised to refer to the ERD to familiarise yourself with the structure of the database.
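
    For example, once sqlite-sakila.db is open, a typical query walks the rental chain. The sketch below assumes the standard Sakila column names (film_id, inventory_id, rental_id, amount); this port may differ slightly, so check the ERD first:

    ```
    -- Top 10 films by total payment amount.
    SELECT f.title,
           COUNT(r.rental_id) AS rentals,
           SUM(p.amount)      AS total_revenue
    FROM film      AS f
    JOIN inventory AS i ON i.film_id      = f.film_id
    JOIN rental    AS r ON r.inventory_id = i.inventory_id
    JOIN payment   AS p ON p.rental_id    = r.rental_id
    GROUP BY f.title
    ORDER BY total_revenue DESC
    LIMIT 10;
    ```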

  8. Superstore Snowflake Schema Modeling Dataset

    • kaggle.com
    zip
    Updated Oct 30, 2025
    Cite
    Chik0di (2025). Superstore Snowflake Schema Modeling Dataset [Dataset]. https://www.kaggle.com/datasets/chik0di/superstore-snowflake-schema-modeling-dataset
    Available download formats: zip (474167 bytes)
    Dataset updated
    Oct 30, 2025
    Authors
    Chik0di
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset represents a Snowflake Schema model built from the popular Tableau Superstore dataset which exists primarily in a denormalized (flat) format.

    This version is fully structured into fact and dimension tables, making it ready for data warehouse design, SQL analytics, and BI visualization projects.

    The dataset was modeled to demonstrate dimensional modeling best practices, showing how the original flat Superstore data can be normalized into related dimensions and a central fact table.

    Use this dataset to:

    • Practice SQL joins and schema design (see the sample join sketched below)
    • Build ETL pipelines or dbt models
    • Design Power BI dashboards
    • Learn data warehouse normalization (3NF → Snowflake) concepts
    • Simulate enterprise data warehouse reporting environments
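
    As a hedged sketch of the kind of join this model enables (fact_sales, dim_product, dim_category, and their key and measure columns are illustrative names, not necessarily the ones used in this dataset):

    ```
    -- Sales and profit by category across a snowflaked product dimension.
    SELECT c.category_name,
           SUM(f.sales)  AS total_sales,
           SUM(f.profit) AS total_profit
    FROM fact_sales   AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY total_sales DESC;
    ```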

    I’m open to suggestions or improvements from the community — feel free to share ideas on additional dimensions, measures, or transformations that could improve and make this dataset even more useful for learning and analysis.

    Transformation was done using dbt; check out the models and the entire project.

  9. Hospital Management Dataset

    • kaggle.com
    zip
    Updated May 30, 2025
    Cite
    Kanak Baghel (2025). Hospital Management Dataset [Dataset]. https://www.kaggle.com/datasets/kanakbaghel/hospital-management-dataset/data
    Available download formats: zip (11375 bytes)
    Dataset updated
    May 30, 2025
    Authors
    Kanak Baghel
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is a structured, multi-table dataset designed to simulate a hospital management system. It is ideal for practicing data analysis, SQL, machine learning, and healthcare analytics.

    Dataset Overview

    This dataset includes five CSV files:

    1. patients.csv – Patient demographics, contact details, registration info, and insurance data

    2. doctors.csv – Doctor profiles with specializations, experience, and contact information

    3. appointments.csv – Appointment dates, times, visit reasons, and statuses

    4. treatments.csv – Treatment types, descriptions, dates, and associated costs

    5. billing.csv – Billing amounts, payment methods, and status linked to treatments

    📁 Files & Column Descriptions

    patients.csv

    Contains patient demographic and registration details.

    • patient_id -> Unique ID for each patient
    • first_name -> Patient's first name
    • last_name -> Patient's last name
    • gender -> Gender (M/F)
    • date_of_birth -> Date of birth
    • contact_number -> Phone number
    • address -> Address of the patient
    • registration_date -> Date of first registration at the hospital
    • insurance_provider -> Insurance company name
    • insurance_number -> Policy number
    • email -> Email address

    doctors.csv

    Details about the doctors working in the hospital.

    • doctor_id -> Unique ID for each doctor
    • first_name -> Doctor's first name
    • last_name -> Doctor's last name
    • specialization -> Medical field of expertise
    • phone_number -> Contact number
    • years_experience -> Total years of experience
    • hospital_branch -> Branch of hospital where doctor is based
    • email -> Official email address

    appointments.csv

    Records of scheduled and completed patient appointments.

    • appointment_id -> Unique appointment ID
    • patient_id -> ID of the patient
    • doctor_id -> ID of the attending doctor
    • appointment_date -> Date of the appointment
    • appointment_time -> Time of the appointment
    • reason_for_visit -> Purpose of visit (e.g., checkup)
    • status -> Status (Scheduled, Completed, Cancelled)

    treatments.csv

    Information about the treatments given during appointments.

    • treatment_id -> Unique ID for each treatment
    • appointment_id -> Associated appointment ID
    • treatment_type -> Type of treatment (e.g., MRI, X-ray)
    • description -> Notes or procedure details
    • cost -> Cost of treatment
    • treatment_date -> Date when treatment was given

    billing.csv

    Billing and payment details for treatments.

    • bill_id -> Unique billing ID
    • patient_id -> ID of the billed patient
    • treatment_id -> ID of the related treatment
    • bill_date -> Date of billing
    • amount -> Total amount billed
    • payment_method -> Mode of payment (Cash, Card, Insurance)
    • payment_status -> Status of payment (Paid, Pending, Failed)
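
    Using the documented columns, a typical multi-table query might look like this minimal sketch (assuming each CSV is loaded into a table named after the file):

    ```
    -- Total billed and paid amounts per doctor,
    -- following billing -> treatments -> appointments -> doctors.
    SELECT d.doctor_id,
           d.first_name,
           d.last_name,
           SUM(b.amount) AS total_billed,
           SUM(CASE WHEN b.payment_status = 'Paid' THEN b.amount ELSE 0 END) AS total_paid
    FROM billing      AS b
    JOIN treatments   AS t ON t.treatment_id   = b.treatment_id
    JOIN appointments AS a ON a.appointment_id = t.appointment_id
    JOIN doctors      AS d ON d.doctor_id      = a.doctor_id
    GROUP BY d.doctor_id, d.first_name, d.last_name
    ORDER BY total_billed DESC;
    ```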

    Possible Use Cases

    SQL queries and relational database design

    Exploratory data analysis (EDA) and dashboarding

    Machine learning projects (e.g., cost prediction, no-show analysis)

    Feature engineering and data cleaning practice

    End-to-end healthcare analytics workflows

    Recommended Tools & Resources

    SQL (joins, filters, window functions)

    Pandas and Matplotlib/Seaborn for EDA

    Scikit-learn for ML models

    Pandas Profiling for automated EDA

    Plotly for interactive visualizations

    Please note that:

    All data is synthetically generated for educational and project use. No real patient information is included.

    If you find this dataset helpful, consider upvoting or sharing your insights by creating a Kaggle notebook.

  10. Logistics Operations Database

    • kaggle.com
    zip
    Updated Nov 23, 2025
    Cite
    Yogape Rodriguez (2025). Logistics Operations Database [Dataset]. https://www.kaggle.com/datasets/yogape/logistics-operations-database
    Available download formats: zip (15059576 bytes)
    Dataset updated
    Nov 23, 2025
    Authors
    Yogape Rodriguez
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Kaggle Dataset: Synthetic Logistics Operations Database (2022-2024)

    About this Dataset

    What's Inside

    A complete operational database from a fictional Class 8 trucking company spanning three years. This isn't scraped web data or simplified tutorial content—it's a realistic simulation built from 12 years of real-world logistics experience, designed specifically for analysts transitioning into supply chain and transportation domains.

    The dataset contains 85,000+ records across 14 interconnected tables covering everything from driver assignments and fuel purchases to maintenance schedules and delivery performance. Each table maintains proper foreign key relationships, making this ideal for practicing complex SQL queries, building data pipelines, or developing operational dashboards.

    Who This Is For

    SQL Learners: Master window functions, CTEs, and multi-table JOINs using realistic business scenarios rather than contrived examples.

    Data Analysts: Build portfolio projects that demonstrate understanding of operational metrics: cost-per-mile analysis, fleet utilization optimization, driver performance scorecards.

    Aspiring Supply Chain Analysts: Work with authentic logistics data patterns—seasonal freight volumes, equipment utilization rates, route profitability calculations—without NDA restrictions.

    Data Science Students: Develop predictive models for maintenance scheduling, driver retention, or route optimization using time-series data with actual business context.

    Career Changers: If you're moving from operations into analytics (like the dataset creator), this provides a bridge—your domain knowledge becomes a competitive advantage rather than a gap to explain.

    Why This Dataset Exists

    Most logistics datasets are either proprietary (unavailable) or overly simplified (unrealistic). This fills the gap: operational complexity without confidentiality concerns. The data reflects real industry patterns:

    • Fuel prices track the 2022 diesel spike and 2023-2024 decline
    • Driver turnover sits at 15% annually (industry standard)
    • Equipment utilization averages 65% (typical for dry van operations)
    • On-time delivery performance ranges 85-95% (realistic service levels)
    • Maintenance intervals follow Class 8 PM schedules

    Dataset Structure

    Core Entities (Reference Tables):

    • Drivers (150 records) - Demographics, employment history, CDL info
    • Trucks (120 records) - Fleet specs, acquisition dates, status
    • Trailers (180 records) - Equipment types, current assignments
    • Customers (200 records) - Shipper accounts, contract terms, revenue potential
    • Facilities (50 records) - Terminals and warehouses with geocoordinates
    • Routes (60+ records) - City pairs with distances and rate structures

    Operational Transactions:

    • Loads (57,000+ records) - Shipment details, revenue, booking type
    • Trips (57,000+ records) - Driver-truck assignments, actual performance
    • Fuel Purchases (131,000+ records) - Transaction-level data with pricing
    • Maintenance Records (6,500+ records) - Service history, costs, downtime
    • Delivery Events (114,000+ records) - Pickup/delivery timestamps, detention
    • Safety Incidents (114 records) - Accidents, violations, claims

    Aggregated Analytics:

    • Driver Monthly Metrics (5,400+ records) - Performance summaries
    • Truck Utilization Metrics (3,800+ records) - Equipment efficiency

    Key Features

    Temporal Coverage: January 2022 through December 2024 (3 years)

    Geographic Scope: National operations across 25+ major US cities

    Realistic Patterns:

    • Seasonal freight fluctuations (Q4 peaks)
    • Historical fuel price accuracy
    • Equipment lifecycle modeling
    • Driver retention dynamics
    • Service level variations

    Data Quality:

    • Complete foreign key integrity
    • No orphaned records
    • Intentional 2% null rate in driver/truck assignments (reflects reality)
    • All timestamps properly sequenced
    • Financial calculations verified

    Use Case Examples

    Business Intelligence: Create executive dashboards showing revenue per truck, cost per mile, driver efficiency rankings, maintenance spend by equipment age, customer concentration risk.

    Predictive Analytics: Build models forecasting equipment failures based on maintenance history, predict driver turnover using performance metrics, estimate route profitability for new lanes.

    Operations Optimization: Analyze route efficiency, identify underutilized assets, optimize maintenance scheduling, calculate ideal fleet size, evaluate driver-to-truck ratios.

    SQL Mastery: Practice window functions for running totals and rankings, write complex JOINs across 6+ tables, implement CTEs for hierarchical queries, perform cohort analysis on driver retention.
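
    As one hedged example of that practice, a PostgreSQL-style sketch over the Trips table (column names such as driver_id, trip_date, revenue, and miles are assumptions; check the actual schema):

    ```
    -- Monthly revenue per driver with a running total and a per-month rank,
    -- illustrating a CTE plus window functions (illustrative column names).
    WITH monthly AS (
        SELECT driver_id,
               DATE_TRUNC('month', trip_date) AS month,
               SUM(revenue) AS revenue,
               SUM(miles)   AS miles
        FROM trips
        GROUP BY driver_id, DATE_TRUNC('month', trip_date)
    )
    SELECT driver_id,
           month,
           revenue,
           revenue / NULLIF(miles, 0)                                AS revenue_per_mile,
           SUM(revenue) OVER (PARTITION BY driver_id ORDER BY month) AS running_revenue,
           RANK() OVER (PARTITION BY month ORDER BY revenue DESC)    AS month_rank
    FROM monthly
    ORDER BY month, month_rank;
    ```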

    Sample Questions to Explore

    1. Which routes generate the highest profit margin after fuel costs?
    2. How does driver tenure correlate with fuel ef...
  11. BigQuery Fintech Dataset

    • kaggle.com
    Updated Aug 17, 2024
    Cite
    Mustafa Keser (2024). BigQuery Fintech Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/bigquery-fintech-dataset
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mustafa Keser
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset: cloud-training-demos.fintech

    This dataset, hosted on BigQuery, is designed for financial technology (fintech) training and analysis. It comprises six interconnected tables, each providing detailed insights into various aspects of customer loans, loan purposes, and regional distributions. The dataset is ideal for practicing SQL queries, building data models, and conducting financial analytics.

    Tables:

    1. customer:
      Contains records of individual customers, including demographic details and unique customer IDs. This table serves as a primary reference for analyzing customer behavior and loan distribution.

    2. loan:
      Includes detailed information about each loan issued, such as the loan amount, interest rate, and tenure. The table is crucial for analyzing lending patterns and financial outcomes.

    3. loan_count_by_year:
      Provides aggregated loan data by year, offering insights into yearly lending trends. This table helps in understanding the temporal dynamics of loan issuance.

    4. loan_purposes:
      Lists various reasons or purposes for which loans were issued, along with corresponding loan counts. This data can be used to analyze customer needs and market demands.

    5. loan_with_region:
      Combines loan data with regional information, allowing for geographical analysis of lending activities. This table is key for regional market analysis and understanding how loan distribution varies across different areas.

    6. state_region:
      Maps state names to their respective regions, enabling a more granular geographical analysis when combined with other tables in the dataset.

    Use Cases:

    • Customer Segmentation: Analyze customer data to identify distinct segments based on demographics and loan behaviors.
    • Loan Analysis: Explore loan issuance patterns, interest rates, and purposes to uncover trends and insights.
    • Regional Analysis: Combine loan and region data to understand how loan distributions vary by geography.
    • Temporal Trends: Utilize the loan_count_by_year table to observe how lending patterns evolve over time.

    This dataset is ideal for those looking to enhance their skills in SQL, financial data analysis, and BigQuery, providing a comprehensive foundation for fintech-related projects and case studies.
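
    For instance, a regional breakdown could use the pre-joined loan_with_region table roughly as sketched below (the region and loan_amount column names are assumptions; check the actual schema in BigQuery):

    ```
    -- Loan volume and average amount by region (illustrative column names).
    SELECT region,
           COUNT(*)         AS loan_count,
           AVG(loan_amount) AS avg_loan_amount
    FROM loan_with_region
    GROUP BY region
    ORDER BY loan_count DESC;
    ```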

  12. Model Car - Mint Classics

    • kaggle.com
    zip
    Updated Apr 29, 2024
    Cite
    Gaston Saracusti (2024). Model Car - Mint Classics [Dataset]. https://www.kaggle.com/datasets/gastonsaracusti/model-car-mint-classics
    Available download formats: zip (26650 bytes)
    Dataset updated
    Apr 29, 2024
    Authors
    Gaston Saracusti
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Mint Classics Company, a retailer of classic model cars and other vehicles, is looking at closing one of their storage facilities.

    To support a data-based business decision, they are looking for suggestions and recommendations for reorganizing or reducing inventory, while still maintaining timely service to their customers. For example, they would like to be able to ship a product to a customer within 24 hours of the order being placed.

    As a data analyst, you have been asked to use MySQL Workbench to familiarize yourself with the general business by examining the current data. You will be provided with a data model and sample data tables to review. You will then need to isolate and identify those parts of the data that could be useful in deciding how to reduce inventory. You will write queries to answer questions like these:

    1) Where are items stored and if they were rearranged, could a warehouse be eliminated?

    2) How are inventory numbers related to sales figures? Do the inventory counts seem appropriate for each item?

    3) Are we storing items that are not moving? Are any items candidates for being dropped from the product line?

    The answers to questions like those should help you to formulate suggestions and recommendations for reducing inventory with the goal of closing one of the storage facilities.
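
    As a starting point for question 1, a warehouse-level inventory summary might look like the sketch below (products and warehouses are assumed table names in the classicmodels-style data model; verify the column names against the provided schema):

    ```
    -- Inventory concentration by warehouse (illustrative schema).
    SELECT w.warehouseCode,
           w.warehouseName,
           COUNT(p.productCode)   AS distinct_products,
           SUM(p.quantityInStock) AS total_units_in_stock
    FROM warehouses AS w
    JOIN products   AS p ON p.warehouseCode = w.warehouseCode
    GROUP BY w.warehouseCode, w.warehouseName
    ORDER BY total_units_in_stock DESC;
    ```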

    Project Objectives

    1. Explore products currently in inventory.

    2. Determine important factors that may influence inventory reorganization/reduction.

    3. Provide analytic insights and data-driven recommendations.

    Your Challenge

    Your challenge will be to conduct an exploratory data analysis to investigate if there are any patterns or themes that may influence the reduction or reorganization of inventory in the Mint Classics storage facilities. To do this, you will import the database and then analyze data. You will also pose questions, and seek to answer them meaningfully using SQL queries to retrieve data from the database provided.

    In this project, we'll use the fictional Mint Classics relational database and a relational data model. Both will be provided.

    After you perform your analysis, you will share your findings.

  13. Kimia Farma: Performance Analysis 2020-2023

    • kaggle.com
    zip
    Updated Feb 27, 2025
    Cite
    Anggun Dwi Lestari (2025). Kimia Farma: Performance Analysis 2020-2023 [Dataset]. https://www.kaggle.com/datasets/anggundwilestari/kimia-farma-performance-analysis-2020-2023
    Available download formats: zip (30284703 bytes)
    Dataset updated
    Feb 27, 2025
    Authors
    Anggun Dwi Lestari
    Description

    This project analyzes Kimia Farma's performance from 2020 to 2023 using Google Looker Studio. The analysis is based on a pre-processed dataset stored in BigQuery, which serves as the data source for the dashboard.

    Project Scope

    The dashboard is designed to provide insights into branch performance, sales trends, customer ratings, and profitability. The development is ongoing, with multiple pages planned for a more in-depth analysis.

    Current Progress

    ✅ The first page of the dashboard is completed
    ✅ A sample dashboard file is available on Kaggle
    🔄 Development will continue with additional pages

    Dataset Overview

    The dataset consists of transaction records from Kimia Farma branches across different cities and provinces. Below are the key columns used in the analysis:

    • transaction_id: Transaction ID code
    • date: Transaction date
    • branch_id: Kimia Farma branch ID code
    • branch_name: Kimia Farma branch name
    • kota: City of the Kimia Farma branch
    • provinsi: Province of the Kimia Farma branch
    • rating_cabang: Customer rating of the Kimia Farma branch
    • customer_name: Name of the customer who made the transaction
    • product_id: Product ID code
    • product_name: Name of the medicine
    • actual_price: Price of the medicine
    • discount_percentage: Discount percentage applied to the medicine
    • persentase_gross_laba: Gross profit percentage based on the following conditions:
      - Price ≤ Rp 50,000 → 10% profit
      - Price > Rp 50,000 - 100,000 → 15% profit
      - Price > Rp 100,000 - 300,000 → 20% profit
      - Price > Rp 300,000 - 500,000 → 25% profit
      - Price > Rp 500,000 → 30% profit
    • nett_sales: Price after discount
    • nett_profit: Profit earned by Kimia Farma
    • rating_transaksi: Customer rating of the transaction
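
    The gross profit tiers above translate directly into a CASE expression; here is a minimal sketch using the documented columns (the table name kimia_farma_analysis_table is assumed from the provided CSV):

    ```
    -- Derive persentase_gross_laba from actual_price using the tiers above.
    SELECT transaction_id,
           actual_price,
           CASE
               WHEN actual_price <= 50000  THEN 0.10
               WHEN actual_price <= 100000 THEN 0.15
               WHEN actual_price <= 300000 THEN 0.20
               WHEN actual_price <= 500000 THEN 0.25
               ELSE 0.30
           END AS persentase_gross_laba
    FROM kimia_farma_analysis_table;
    ```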

    Files Provided

    📌 kimia farma_query.txt – Contains SQL queries used for data analysis in Looker Studio
    📌 kimia farma_analysis_table.csv – Preprocessed dataset ready for import and analysis

    📢 Published on: My LinkedIn

  14. Supply Chain DataSet

    • kaggle.com
    zip
    Updated Jun 1, 2023
    Cite
    Amir Motefaker (2023). Supply Chain DataSet [Dataset]. https://www.kaggle.com/datasets/amirmotefaker/supply-chain-dataset
    Available download formats: zip (9340 bytes)
    Dataset updated
    Jun 1, 2023
    Authors
    Amir Motefaker
    Description

    Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.

  15. Cloud Carbon Emissions Dataset

    • kaggle.com
    zip
    Updated Sep 23, 2025
    Cite
    Nidhi Suryavanshi (2025). Cloud Carbon Emissions Dataset [Dataset]. https://www.kaggle.com/datasets/nidhis4444/cloud-carbon-emissions-dataset
    Available download formats: zip (36611 bytes)
    Dataset updated
    Sep 23, 2025
    Authors
    Nidhi Suryavanshi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a synthetic simulation of cloud resource usage and carbon emissions, designed for experimentation, analysis, and forecasting in sustainability and data engineering projects.

    Included Tables:

    • projects → Metadata about projects/teams.
    • services → Metadata about cloud services (Compute, Storage, AI, etc.).
    • emission_factors → Regional grid carbon intensity (gCO₂ per kWh).
    • service_energy_coefficients → Conversion rates from usage units to kWh.
    • daily_usage → Raw service usage (per project × service × region × day).
    • daily_emissions → Carbon emissions derived from usage × regional emission factors.
    • service_cost_coefficients → Conversion rates from usage units to cost (USD per unit).
    • daily_cost_emissions → Integrated fact table combining usage, energy, cost, and emissions for analysis.

    Features:

    • Simulated seasonality (weekend dips/spikes, holiday surges, quarter-end growth).
    • Regional variations in carbon intensity (e.g., coal-heavy vs renewable grids).
    • Multiple projects and services for multi-dimensional analysis.
    • Directly importable into BigQuery for analytics & forecasting.

    Use Cases:

    • Explore sustainability analytics at scale.
    • Build carbon footprint dashboards.
    • Run AI/ML forecasting on emissions data.
    • Practice SQL, data modeling, and visualization.
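
    As a hedged sketch of how daily emissions can be derived from the tables above (column names such as usage_amount, kwh_per_unit, and gco2_per_kwh are assumptions, not taken from the files):

    ```
    -- Estimate daily emissions: usage -> kWh -> gCO2 (illustrative column names).
    SELECT u.usage_date,
           u.project_id,
           u.service_id,
           u.region,
           u.usage_amount * sec.kwh_per_unit                   AS energy_kwh,
           u.usage_amount * sec.kwh_per_unit * ef.gco2_per_kwh AS emissions_gco2
    FROM daily_usage AS u
    JOIN service_energy_coefficients AS sec
      ON sec.service_id = u.service_id
    JOIN emission_factors AS ef
      ON ef.region = u.region;
    ```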

    ⚠️ Note: All data is synthetic and created for educational/demo purposes. It does not represent actual cloud provider emissions.

  16. Blinkit dataset

    • kaggle.com
    zip
    Updated Jul 18, 2024
    Cite
    mukesh gadri (2024). Blinkit dataset [Dataset]. https://www.kaggle.com/datasets/mukeshgadri/blinkit-dataset
    Available download formats: zip (695160 bytes)
    Dataset updated
    Jul 18, 2024
    Authors
    mukesh gadri
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    In the case study titled "Blinkit: Grocery Product Analysis," a dataset called 'Grocery Sales' contains 12 columns with information on sales of grocery items across different outlets. Using Tableau, you as a data analyst can uncover customer behavior insights, track sales trends, and gather feedback. These insights will drive operational improvements, enhance customer satisfaction, and optimize product offerings and store layout. Tableau enables data-driven decision-making for positive outcomes at Blinkit.

    The Grocery Sales table is a .CSV file with the following columns:

    • Item_Identifier: A unique ID for each product in the dataset.
    • Item_Weight: The weight of the product.
    • Item_Fat_Content: Indicates whether the product is low fat or not.
    • Item_Visibility: The percentage of the total display area in the store that is allocated to the specific product.
    • Item_Type: The category or type of product.
    • Item_MRP: The maximum retail price (list price) of the product.
    • Outlet_Identifier: A unique ID for each store in the dataset.
    • Outlet_Establishment_Year: The year in which the store was established.
    • Outlet_Size: The size of the store in terms of ground area covered.
    • Outlet_Location_Type: The type of city or region in which the store is located.
    • Outlet_Type: Indicates whether the store is a grocery store or a supermarket.
    • Item_Outlet_Sales: The sales of the product in the particular store. This is the outcome variable that we want to predict.
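
    If the Grocery Sales CSV is loaded into a SQL table (here called grocery_sales, an assumed name), the documented columns support aggregations like this sketch:

    ```
    -- Average MRP and total sales by item type and outlet type.
    SELECT Item_Type,
           Outlet_Type,
           AVG(Item_MRP)          AS avg_item_mrp,
           SUM(Item_Outlet_Sales) AS total_outlet_sales
    FROM grocery_sales
    GROUP BY Item_Type, Outlet_Type
    ORDER BY total_outlet_sales DESC;
    ```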

  17. ascension

    • kaggle.com
    zip
    Updated Nov 17, 2025
    Cite
    cmumf93 (2025). ascension [Dataset]. https://www.kaggle.com/datasets/cmumford1993/ascension
    Available download formats: zip (78996 bytes)
    Dataset updated
    Nov 17, 2025
    Authors
    cmumf93
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a synthetic dataset inspired by the merchandise and supply-chain operations of a Christian publishing company. It was created to practice:

    Product & channel performance analysis

    Supply-chain and vendor risk assessment

    Inventory and backorder monitoring

    Basic forecasting and scenario planning

    The data spans 2025-01-01 to 2025-06-30 and includes 10 products (studies, devotionals, rosaries, journals, and digital bundles) sold across four channels (Website, Parish Bulk, Amazon, and Events) in four US regions (Northeast, Midwest, South, West).

    Dataset summary

    Rows: 5,435 daily product–channel–region records

    Products: 10

    Channels: Website, Parish Bulk, Amazon, Event

    Regions: Northeast, Midwest, South, West

    Vendors: Multiple printers and vendors with different lead times and risk profiles

    Each row describes the performance of a single product on a given date in a given channel, along with inventory and vendor information that can be used for operational risk analysis.

    Columns

    date – Calendar date for the record (YYYY-MM-DD).

    product_id – Short ID for the product (e.g., BIBLE-STUDY-101).

    product_name – Human-readable product name (e.g., Foundations Bible Study).

    product_category – High-level category (Adult Study, Seasonal, Sacrament Prep, etc.).

    format – Physical or Digital format.

    channel – Sales channel (Website, Parish Bulk, Amazon, Event).

    region – US region where the sale occurred (Northeast, Midwest, South, West).

    vendor – Primary printer or vendor responsible for fulfilling that product.

    units_sold – Number of units sold for that product/date/channel/region.

    unit_price – Selling price per unit (USD).

    revenue – Total revenue = units_sold * unit_price.

    cogs_per_unit – Cost of goods sold per unit (approximate production/fulfillment cost).

    gross_margin – Revenue minus total COGS for that row.

    inventory_start – On-hand inventory at the start of the day.

    inventory_end – On-hand inventory at the end of the day after sales.

    backorder_flag – True if demand exceeded inventory and created a backorder, otherwise False.

    lead_time_days – Typical replenishment lead time in days for that product/vendor combination.

    What you can do with this dataset

    This dataset is designed for:

    Product & channel profitability

    Rank products by total profit or margin.

    Compare profitability across channels and regions.

    Supply-chain & vendor risk

    Identify products with long lead times and frequent backorders.

    Flag higher-risk vendors (e.g., long lead times, tight inventory).

    Inventory analytics

    Track when inventory gets tight.

    Explore safety stock ideas using inventory_start, inventory_end, and backorder_flag.

    Forecasting & scenario planning

    Build time-series forecasts of units sold or revenue.

    Simulate what happens if one vendor fails or lead times increase.

    Learning & practice

    Practice SQL, Python, or R data analysis.

    Build dashboards (Tableau, Power BI, etc.) or case-study style projects for a product or data-analytics portfolio.
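
    For example, the profitability analysis above can start from a query like this minimal sketch over the documented columns (assuming the CSV is loaded into a single table, here called ascension_sales):

    ```
    -- Rank products by total gross margin across all channels and regions.
    SELECT product_name,
           SUM(revenue)      AS total_revenue,
           SUM(gross_margin) AS total_gross_margin
    FROM ascension_sales
    GROUP BY product_name
    ORDER BY total_gross_margin DESC;
    ```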

    Important notes

    This is not real Ascension data; it is fully synthetic and safe to use publicly.

    The structure was designed to resemble realistic publishing/merchandise operations, but the exact numbers and patterns were generated programmatically.

    If you use this dataset in a notebook, blog post, or portfolio project, feel free to link back here so others can see how you approached the analysis.

  18. E-Commerce Data

    • kaggle.com
    zip
    Updated Aug 17, 2017
    Cite
    Carrie (2017). E-Commerce Data [Dataset]. https://www.kaggle.com/datasets/carrie1/ecommerce-data
    Available download formats: zip (7548686 bytes)
    Dataset updated
    Aug 17, 2017
    Authors
    Carrie
    Description

    Context

    Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found under the title "Online Retail".

    Content

    "This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail.The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."

    Acknowledgements

    Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.

    Image from stocksnap.io.

    Inspiration

    Analyses for this dataset could include time series, clustering, classification and more.

  19. Railway Management System

    • kaggle.com
    zip
    Updated Oct 19, 2023
    Cite
    MANISH SHARMA 95 (2023). Railway Management System [Dataset]. https://www.kaggle.com/datasets/manish9569/railway-management-system
    Available download formats: zip (3026 bytes)
    Dataset updated
    Oct 19, 2023
    Authors
    MANISH SHARMA 95
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    I shared this dataset for a basic SQL project on a railway management system, focused on the reservation area; the data covers the basic details of a reservation ticket. I collected the data from Wikipedia, GitHub, Kaggle, and other sources and built the project for basic understanding. It is also helpful for practicing some moderate SQL queries.

