14 datasets found
  1. Student's mental health

    • kaggle.com
    zip
    Updated Apr 7, 2024
    Cite
    Abdallah Nasser (2024). Student's mental health [Dataset]. https://www.kaggle.com/datasets/abdallahprogrammer/students-mental-health
    Available download formats: zip (8102 bytes)
    Dataset updated
    Apr 7, 2024
    Authors
    Abdallah Nasser
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset was created by Abdallah Nasser.

    Released under Apache 2.0.

  2. Tech Layofss EDA Project

    • kaggle.com
    zip
    Updated Jun 6, 2024
    Cite
    Aditi Dash (2024). Tech Layofss EDA Project [Dataset]. https://www.kaggle.com/aditidash30/tech-layofss-eda-project
    Available download formats: zip (71836 bytes)
    Dataset updated
    Jun 6, 2024
    Authors
    Aditi Dash
    Description

    • Conducted an in-depth Exploratory Data Analysis (EDA) using MySQL on a comprehensive dataset of tech layoffs from March 2020 to present, sourced from Kaggle.

    • Utilized advanced SQL queries to extract, clean, and analyze large datasets, uncovering significant insights into the timing, frequency, and scale of layoffs across various tech companies and regions.

  3. Employee Database for SQL Case Study

    • kaggle.com
    zip
    Updated Jun 21, 2025
    Cite
    Riddhi N Divecha (2025). Employee Database for SQL Case Study [Dataset]. https://www.kaggle.com/datasets/riddhindivecha/employee-database-for-sql-case-study/code
    Available download formats: zip (890 bytes)
    Dataset updated
    Jun 21, 2025
    Authors
    Riddhi N Divecha
    Description

    SQL Case Study Project: Employee Database Analysis 📊

    I recently completed a comprehensive SQL project involving a simulated employee database with multiple tables:

    • 🏢 DEPARTMENT
    • 👨‍💼 EMPLOYEE
    • 💼 JOB
    • 🌍 LOCATION

    In this project, I practiced and applied a wide range of SQL concepts:

    
    ✅ Simple Queries
    ✅ Filtering with WHERE conditions
    ✅ Sorting with ORDER BY
    ✅ Aggregation using GROUP BY and HAVING
    ✅ Multi-table JOINs
    ✅ Conditional Logic using CASE
    ✅ Subqueries and Set Operators

    💡 Key Highlights:

    • Salary grade classifications
    • Department-level insights
    • Employee trends based on hire dates
    • Advanced queries like Nth highest salary
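The Nth-highest-salary query highlighted above can be sketched with Python's built-in sqlite3 module; the tiny employee table and its columns here are illustrative stand-ins, not the dataset's actual schema:

```python
import sqlite3

# Illustrative stand-in for an EMPLOYEE table (assumed columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("A", 90000), ("B", 120000), ("C", 105000), ("D", 120000)],
)

# Nth highest distinct salary (here N = 2): skip the top N-1 distinct values.
n = 2
row = conn.execute(
    "SELECT DISTINCT salary FROM employee "
    "ORDER BY salary DESC LIMIT 1 OFFSET ?",
    (n - 1,),
).fetchone()
print(row[0])  # 105000.0
```

The OFFSET form is one of several common solutions; a correlated subquery or DENSE_RANK() window query achieves the same result.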

    🛠️ Tools Used: Azure Data Studio

    📂 You can find the entire project and scripts here:


    👉 https://github.com/RiddhiNDivecha/Employee-Database-Analysis

    This project helped me sharpen my SQL skills and understand business logic more deeply in a practical context.

    💬 I’m open to feedback and happy to connect with fellow data enthusiasts!

    #SQL #DataAnalytics #PortfolioProject #CaseStudy #LearningByDoing #DataScience #SQLProject

  4. Healthcare Fraud Detection Dataset

    • kaggle.com
    zip
    Updated Mar 6, 2025
    Cite
    Vishal Jaiswal (2025). Healthcare Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/jaiswalmagic1/healthcare-fraud-detection-dataset
    Available download formats: zip (10427537 bytes)
    Dataset updated
    Mar 6, 2025
    Authors
    Vishal Jaiswal
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset contains comprehensive synthetic healthcare data designed for fraud detection analysis. It includes information on patients, healthcare providers, insurance claims, and payments. The dataset is structured to mimic real-world healthcare transactions, where fraudulent activities such as false claims, overbilling, and duplicate charges can be identified through advanced analytics.

    The dataset is suitable for practicing SQL queries, exploratory data analysis (EDA), machine learning for fraud detection, and visualization techniques. It is designed to help data analysts and data scientists develop and refine their analytical skills in the healthcare insurance domain.

    Dataset Overview

    The dataset consists of four CSV files:

    Patients Data (patients.csv)

    Contains demographic details of patients, such as age, gender, insurance type, and location. Can be used to analyze patient demographics and healthcare usage patterns.

    Providers Data (providers.csv)

    Contains information about healthcare providers, including provider ID, specialty, location, and associated hospital. Useful for identifying fraudulent claims linked to specific providers or hospitals.

    Claims Data (claims.csv)

    Contains records of insurance claims made by patients, including diagnosis codes, treatment details, provider ID, and claim amount. Can be analyzed for suspicious patterns, such as excessive claims from a single provider or duplicate claims for the same patient.

    Payments Data (payments.csv)

    Contains details of claim payments made by insurance companies, including payment amount, claim ID, and reimbursement status. Helps in detecting discrepancies between claims and actual reimbursements.

    Possible Analysis Ideas

    This dataset allows for multiple analysis approaches, including but not limited to:

    🔹 Fraud Detection: Identify patterns in claims data to detect fraudulent activities (e.g., excessive billing, duplicate claims).
    🔹 Provider Behavior Analysis: Analyze providers who have an unusually high claim volume or high rejection rates.
    🔹 Payment Trends: Compare claims vs. payments to find irregularities in reimbursement patterns.
    🔹 Patient Demographics & Utilization: Explore which patient groups are more likely to file claims and receive reimbursements.
    🔹 SQL Query Practice: Perform advanced SQL queries, including joins, aggregations, window functions, and subqueries, to extract insights from the data.
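One of these analysis ideas, duplicate-claim detection, can be sketched with sqlite3; the table and column names below are assumptions for illustration, not the dataset's documented schema:

```python
import sqlite3

# Toy claims table; column names are assumptions, not the dataset's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (claim_id INTEGER, patient_id INTEGER, "
    "diagnosis_code TEXT, claim_amount REAL)"
)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?, ?)",
    [
        (1, 10, "E11", 500.0),
        (2, 10, "E11", 500.0),  # duplicate of claim 1
        (3, 11, "I10", 250.0),
    ],
)

# Group identical (patient, diagnosis, amount) triples and keep the repeats.
dupes = conn.execute(
    "SELECT patient_id, diagnosis_code, claim_amount, COUNT(*) AS n "
    "FROM claims "
    "GROUP BY patient_id, diagnosis_code, claim_amount "
    "HAVING COUNT(*) > 1"
).fetchall()
print(dupes)  # [(10, 'E11', 500.0, 2)]
```

Real analyses would typically also compare claim dates and provider IDs before labeling a repeat as fraudulent.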

    Use Cases

    • Practicing SQL queries for job interviews and real-world projects.
    • Learning data cleaning, data wrangling, and feature engineering for healthcare analytics.
    • Applying machine learning techniques for fraud detection.
    • Gaining insights into the healthcare insurance domain and its challenges.

    License & Usage

    License: CC0 Public Domain (free to use for any purpose).
    Attribution: Not required but appreciated.
    Intended Use: This dataset is for educational and research purposes only.

    This dataset is an excellent resource for aspiring data analysts, data scientists, and SQL learners who want to gain hands-on experience in healthcare fraud detection.

  5. SQLite Sakila Sample Database

    • kaggle.com
    zip
    Updated Mar 14, 2021
    Cite
    Atanas Kanev (2021). SQLite Sakila Sample Database [Dataset]. https://www.kaggle.com/datasets/atanaskanev/sqlite-sakila-sample-database/code
    Available download formats: zip (4495190 bytes)
    Dataset updated
    Mar 14, 2021
    Authors
    Atanas Kanev
    Description

    SQLite Sakila Sample Database

    Database Description

    The Sakila sample database is a fictitious database designed to represent a DVD rental store. The tables of the database include film, film_category, actor, customer, rental, payment and inventory among others. The Sakila sample database is intended to provide a standard schema that can be used for examples in books, tutorials, articles, samples, and so forth. Detailed information about the database can be found on the MySQL website: https://dev.mysql.com/doc/sakila/en/

    Sakila for SQLite is a part of the sakila-sample-database-ports project intended to provide ported versions of the original MySQL database for other database systems, including:

    • Oracle
    • SQL Server
    • SQLite
    • Interbase/Firebird
    • Microsoft Access

    Sakila for SQLite is a port of the Sakila example database available for MySQL, originally developed by Mike Hillyer of the MySQL AB documentation team. This project is designed to help database administrators decide which database to use for the development of new products: the user can run the same SQL against different kinds of databases and compare performance.

    License: BSD. Copyright DB Software Laboratory (http://www.etl-tools.com)

    Note: Part of the insert scripts were generated by Advanced ETL Processor http://www.etl-tools.com/etl-tools/advanced-etl-processor-enterprise/overview.html

    Information about the project and the downloadable files can be found at: https://code.google.com/archive/p/sakila-sample-database-ports/

    Other versions and developments of the project can be found at: https://github.com/ivanceras/sakila/tree/master/sqlite-sakila-db

    https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples/Sakila

    Direct access to the MySQL Sakila database, which does not require installation of MySQL (queries can be typed directly in the browser), is provided on the phpMyAdmin demo version website: https://demo.phpmyadmin.net/master-config/

    Files Description

    The files in the sqlite-sakila-db folder are the script files which can be used to generate the SQLite version of the database. For convenience, the script files have already been run in cmd to generate the sqlite-sakila.db file, as follows:

    sqlite> .open sqlite-sakila.db               # creates the .db file
    sqlite> .read sqlite-sakila-schema.sql       # creates the database schema
    sqlite> .read sqlite-sakila-insert-data.sql  # inserts the data

    Therefore, the sqlite-sakila.db file can be directly loaded into SQLite3 and queries can be directly executed. You can refer to my notebook for an overview of the database and a demonstration of SQL queries. Note: Data about the film_text table is not provided in the script files, thus the film_text table is empty. Instead the film_id, title and description fields are included in the film table. Moreover, the Sakila Sample Database has many versions, so an Entity Relationship Diagram (ERD) is provided to describe this specific version. You are advised to refer to the ERD to familiarise yourself with the structure of the database.
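The same load-and-query workflow can be driven from Python's sqlite3 module. This is a minimal sketch: the two-line schema below is a stand-in, whereas with the real file you would simply connect to sqlite-sakila.db and start querying:

```python
import sqlite3

# Mirrors the shell workflow (.open / .read) from Python. With the dataset's
# file, use sqlite3.connect("sqlite-sakila.db") instead of the toy schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO film (title) VALUES ('ACADEMY DINOSAUR'), ('ACE GOLDFINGER');
""")
count = conn.execute("SELECT COUNT(*) FROM film").fetchone()[0]
print(count)  # 2
```

executescript() is the programmatic equivalent of the shell's .read, so the schema and insert scripts shipped with the dataset could be loaded the same way.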

  6. Bank Transaction Analytics Dashboard – SQL + Excel

    • kaggle.com
    zip
    Updated Aug 18, 2025
    Cite
    Prachi Singh (2025). Bank Transaction Analytics Dashboard – SQL + Excel [Dataset]. https://www.kaggle.com/datasets/prachisingh29ds/bank-transaction-analytics-dashboard-sql-excel
    Available download formats: zip (2856220 bytes)
    Dataset updated
    Aug 18, 2025
    Authors
    Prachi Singh
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    📊 Bank Transaction Analytics Dashboard – SQL + Excel

    🔹 Overview

    This project focuses on Bank Transaction Analysis using a combination of SQL scripts and Excel dashboards. The goal is to provide insights into customer spending patterns, payment modes, suspicious transactions, and overall financial trends.

    The dataset and analysis files can help learners and professionals understand how SQL and Excel can be used together for business decision-making, customer behavior tracking, and data-driven insights.

    🔹 Contents

    The dataset includes the following resources:

    📂 SQL Scripts:

    Create & Insert tables

    15 Basic Queries

    15 Advanced Queries

    📂 CSV File:

    Bank Transaction Analytics.csv (main dataset)

    📂 Excel Charts:

    Pie, Bar, Column, Line, Doughnut charts

    Final Interactive Dashboard

    📂 Screenshots:

    Query outputs, Charts, and Final Dashboard visualization

    📂 PDF Reports:

    Project Report

    Dashboard Report

    📄 README.md:

    Complete documentation and step-by-step explanation

    🔹 Key Insights

    26–35 age group spent the most across categories.

    Amazon identified as the top merchant.

    NetBanking showed the highest share compared to POS/UPI.

    Travel & Shopping emerged as dominant categories.

    🔹 Applications

    Detecting suspicious transactions.

    Understanding customer behavior.

    Identifying top merchants and categories.

    Building business intelligence dashboards.

    🔹 How to Use

    Download the dataset and SQL scripts.

    Run Bank_Transaction_Analytics.SQL to create and insert data.

    Execute the queries (Basic + Advanced) for insights.

    Open Excel files to explore interactive charts and dashboards.

    Refer to Project Report PDF for documentation.
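One plausible shape for the "suspicious transactions" analysis mentioned above, sketched with sqlite3. The table, columns, and the 3x-of-average rule are assumptions for illustration, not the actual script contents:

```python
import sqlite3

# Toy transactions table; schema and threshold rule are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (txn_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 1, 120.0), (3, 1, 110.0), (4, 1, 130.0),
     (5, 1, 5000.0),  # outlier for customer 1
     (6, 2, 300.0), (7, 2, 310.0)],
)

# Flag transactions more than 3x the customer's own average amount.
suspicious = conn.execute("""
    SELECT t.txn_id, t.amount
    FROM transactions t
    JOIN (SELECT customer_id, AVG(amount) AS avg_amt
          FROM transactions GROUP BY customer_id) a
      ON a.customer_id = t.customer_id
    WHERE t.amount > 3 * a.avg_amt
""").fetchall()
print(suspicious)  # [(5, 5000.0)]
```

A per-customer baseline like this is a common first pass; production fraud rules usually add time windows, merchant categories, and velocity checks.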

    🔹 Author

    👩‍💻 Created by: Prachi Singh

    GitHub: Bank Transaction Analytics Dashboard (https://github.com/prachi-singh-ds/Bank-Transaction-Analytics-Dashboard)

    ⚡ This project is a complete SQL + Excel integration case study and is suitable for Data Science, Business Analytics, and Data Engineering portfolios.

  7. Database for EagleEye

    • data.europa.eu
    unknown
    Cite
    Zenodo, Database for EagleEye [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-14615241?locale=cs
    Available download formats: unknown (2075)
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database was compiled during the EagleEye project (https://cordis.europa.eu/project/id/101059253), which focused on developing a novel 3D printer for high-resolution, large-area printing using digital light projection and two-photon polymerization. It contains essential printing parameters and their relationships to other key factors, such as wavelengths, laser specifications, and photosensitive materials. The data is stored in .csv and .sql formats, making it suitable for both basic and advanced tasks.

  8. Superstore Sales EDA - Nawaf Alzzeer

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Nawaf Alzeer (2025). Superstore Sales EDA - Nawaf Alzzeer [Dataset]. https://www.kaggle.com/datasets/nawafalzeer/superstore-sales-eda-nawaf-alzzeer
    Available download formats: zip (809072 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Nawaf Alzeer
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Complete data engineering project on 4 years (2014-2017) of retail sales transactions.

    DATASET CONTENTS:
    - Original denormalized data (9,994 rows)
    - Normalized database: 4 tables (customers, orders, products, sales)
    - 9 SQL analysis files organized by phase
    - Complete EDA from data cleaning to business insights

    DATABASE TABLES:
    - customers: 793 records
    - orders: 4,931 records
    - products: 1,812 records
    - sales: 9,686 transactions

    KEY FINDINGS:
    - Low profitability: 12.44% margin (below industry standard)
    - Discount problem: 50%+ of transactions carry 20%+ discounts
    - Loss-making: 18.66% of transactions lose money
    - Furniture crisis: only 2.31% margin
    - Small baskets: only 1.96 items per order

    SQL SKILLS DEMONSTRATED:
    ✓ Window functions (ROW_NUMBER, PARTITION BY)
    ✓ Database normalization (3NF)
    ✓ Complex JOINs (3-4 tables)
    ✓ Data deduplication with CTEs
    ✓ Business analytics queries
    ✓ CASE statements and aggregations
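The CTE-plus-ROW_NUMBER deduplication pattern named among those skills can be sketched as follows; the simplified sales table and its columns are illustrative stand-ins, not the dataset's normalized schema:

```python
import sqlite3  # window functions require SQLite >= 3.25

# Toy sales table with one duplicated row; columns are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id TEXT, product_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("O1", "P1", 10.0), ("O1", "P1", 10.0), ("O2", "P2", 25.0)],
)

# CTE ranks rows within each duplicate group; keeping rn = 1 deduplicates.
rows = conn.execute("""
    WITH ranked AS (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY order_id, product_id, amount
            ORDER BY order_id
        ) AS rn
        FROM sales
    )
    SELECT order_id, product_id, amount FROM ranked WHERE rn = 1
""").fetchall()
print(sorted(rows))  # [('O1', 'P1', 10.0), ('O2', 'P2', 25.0)]
```

The PARTITION BY columns define what counts as "the same row"; widening or narrowing that list changes how aggressive the deduplication is.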

    PERFECT FOR:
    - SQL practice (beginner to advanced)
    - Database normalization learning
    - EDA methodology study
    - Business analytics projects
    - Data engineering portfolios

    FILES INCLUDED:
    - 5 CSV files (original + 4 normalized tables)
    - 9 SQL query files (cleaning, migration, analysis)

    Author: Nawaf Alzzeer License: CC BY-SA 4.0

  9. US_Congressional_Tweets_Dataset

    • kaggle.com
    zip
    Updated Jan 4, 2024
    Cite
    Oscar Yáñez Feijóo (2024). US_Congressional_Tweets_Dataset [Dataset]. https://www.kaggle.com/datasets/oscaryezfeijo/us-congressional-tweets-dataset
    Available download formats: zip (243754786 bytes)
    Dataset updated
    Jan 4, 2024
    Authors
    Oscar Yáñez Feijóo
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    United States
    Description

    The "US Congressional Tweets Dataset" is a comprehensive collection of tweets from US Congressional members spanning from 2008 to 2017. This dataset is valuable for organizations like Lobbyists4America, which aims to gain insights into legislative trends and influences for effective lobbying strategies. The dataset is structured into two primary components: users_df and tweets_df.

    Dataset Structure:

    1. users_df: This DataFrame provides detailed information about the Twitter accounts of various congressional members. It includes a range of attributes such as:

      • Account creation date (created_at), follower and friend counts (followers_count, friends_count).
      • Profile-related information like description, location, and verification status.
      • Various Twitter-specific features like contributors_enabled, default_profile, is_translator, etc.
    2. tweets_df: This DataFrame contains the actual tweet data from these congressional accounts. Key columns include:

      • created_at: The timestamp of the tweet.
      • favorite_count and retweet_count: Indicators of the tweet's popularity.
      • text: The text content of the tweet.
      • Metadata such as user_id, lang (language), and source (device/app used for tweeting).
      • Other attributes like possibly_sensitive, quoted_status_id, and engagement-related fields.

    Analysis Performed:

    The dataset is utilized for various analyses, including:

    1. Network Analysis: Exploring the connections and interactions between different congressional members on Twitter, potentially revealing influential figures or groups within Congress.

    2. Sentiment Analysis: Using libraries like TextBlob and NLTK, this analysis assesses the sentiment (positive, negative, neutral) of tweets to understand the general tone and stance of congressional members on various issues.

    3. Correlation Analysis: Investigating relationships between different numerical features in the dataset, such as whether higher tweet frequencies correlate with more followers.

    4. Word Clustering/Topic Modeling: Utilizing NMF (Non-Negative Matrix Factorization) from scikit-learn to cluster words and identify major themes or topics discussed in the tweets.

    5. Time Series Analysis: Observing trends and patterns in tweeting behavior over time, such as increased activity around elections or significant political events.
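The correlation analysis described above (tweet frequency vs. followers) might look like this with pandas; the frames are tiny invented stand-ins for users_df and tweets_df, with column names taken from the listing:

```python
import pandas as pd

# Toy stand-ins for users_df and tweets_df; the data is invented.
users_df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "followers_count": [1000, 5000, 200],
})
tweets_df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3],
    "retweet_count": [5, 7, 50, 60, 40, 1],
})

# Aggregate per-user tweet activity, then correlate with follower counts.
tweet_stats = (
    tweets_df.groupby("user_id")
    .agg(n_tweets=("user_id", "size"), avg_retweets=("retweet_count", "mean"))
    .reset_index()
)
merged = users_df.merge(tweet_stats, on="user_id")
corr = merged["followers_count"].corr(merged["n_tweets"])
print(round(corr, 3))  # 0.933
```

On three invented users the Pearson coefficient is meaningless; with the full dataset the same groupby-merge-corr pipeline applies directly.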

    Python Libraries Used:

    • Pandas: For data manipulation and analysis.
    • Matplotlib: For visualizing the data.
    • TextBlob and NLTK: For processing textual data and performing sentiment analysis.
    • scikit-learn (sklearn): For machine learning tasks like NMF for topic modeling.
    • spaCy: An advanced natural language processing library.
    • NetworkX: For conducting network analysis.
    • ipywidgets and pytz: For creating interactive elements and handling time zones in the data, respectively.

    Conclusion:

    The "US Congressional Tweets Dataset" is a rich source for analyzing the digital footprint of US Congressional members. Through the application of various data science techniques, Lobbyists4America can extract meaningful insights about political sentiments, networking patterns, and topical trends among lawmakers. This information is crucial for tailoring lobbying efforts and understanding the legislative landscape.

  10. CVD_Vital_Signs

    • kaggle.com
    zip
    Updated Mar 15, 2024
    Cite
    Chidozie Uzoegwu (2024). CVD_Vital_Signs [Dataset]. https://www.kaggle.com/datasets/chidozieuzoegwu/cvd-vital-signs/suggestions?status=pending
    Available download formats: zip (402097 bytes)
    Dataset updated
    Mar 15, 2024
    Authors
    Chidozie Uzoegwu
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Motivation

    The motivation behind this research stems from the pressing need to improve the prediction and management of cardiovascular disease (CVD), a leading cause of mortality worldwide. Despite advancements in medical science, accurately predicting and detecting CVD in its early stages remains a significant challenge. This study addresses that challenge by leveraging machine learning (ML) and deep learning (DL) models to analyze physiological signs associated with CVD, including respiratory rate, blood pressure, body temperature, heart rate, and oxygen saturation. By comparing the performance of various ML and DL models with a previous study conducted by Ashfaq et al., we aim to identify the most effective prediction model. The high accuracy achieved by the MLP model in our research offers promising prospects for enhancing CVD prediction and management strategies. These findings hold implications not only for medical researchers and practitioners but also for individuals, academies, analysts, and AI enthusiasts interested in advancing healthcare technology.

    Furthermore, integrating these predictive models into monitoring systems using body sensors could revolutionize how CVD patients are managed. Real-time monitoring facilitated by advanced ML and DL algorithms could enable prompt emergency intervention, potentially saving lives and improving patient outcomes. Overall, this research contributes to the growing body of knowledge in cardiovascular disease prediction and underscores the transformative potential of AI-driven approaches in healthcare.

    About the dataset

    In this project, we utilized the MIMIC-III clinical database, renowned for its vast collection of deidentified clinical data from over 50,000 critically ill patients treated at Beth Israel Deaconess Medical Center between 2001 and 2012 (Johnson, Pollard, and Mark, 2016). The database encompasses demographic information, vital signs, lab test results, treatments, medications, written notes, imaging reports, and post-hospital outcomes. Leveraging Google's BigQuery cloud and Amazon's AWS cloud, we employed Amazon S3 to extract the data needed for our cardiovascular disease forecasting analysis.

    Data Processing

    Data Pre-Processing: We cleaned and prepared the raw dataset following established procedures outlined by Chaki and Ucar (2023) and Mishra et al. (2020), removing duplicates, correcting anomalies, and addressing missing values to ensure dataset accuracy. We used SQL, a standard language for relational databases, along with Amazon Web Services (AWS) Athena, a SQL-based query tool, to retrieve the essential data from the MIMIC-III clinical database: pulse rate, blood pressure, blood oxygen saturation, respiration rate, and body temperature. Through the Athena interface, we executed SQL queries against data stored on AWS and saved the extracted dataset to a CSV file for further analysis. This approach significantly streamlined obtaining the relevant data from MIMIC-III, highlighting the efficiency of SQL and AWS Athena for data retrieval.

    Dealing with Outliers: We carefully evaluated our dataset for outliers and retained them, as they did not significantly deviate from the mean or standard deviation, thereby maintaining the integrity of our analysis.

    Data Transformation: Our team successfully transformed the dataset into a usable format with properly labeled variables, as outlined by Lachlan (2017). We opted not to scale variables to ensure accurate interpretation.

    Exploratory Data Analysis (EDA): Through comprehensive EDA, we identified trends and patterns in the data, facilitating hypothesis testing and informing model development.

    Model Building: Utilizing methodologies outlined by Janiesch, Zschech, & Heinrich (2021), we developed machine learning (ML) and deep learning (DL) models tailored to our research goals and dataset characteristics.

    Model Selection: We carefully selected appropriate algorithms, considering factors such as data nature, complexity, and available resources, as suggested by Ghosh and Dasgupta (2022).

    Human Biophysical Parameters

    The project is built upon the foundation of human biophysical parameters, serving as crucial indicators for monitoring and intervening in cardiovascular disease (CVD) patients, facilitating both long-term and near-term risk assessment. These parameters, including heart rate, respiration rate, blood pressure, and oxygen saturation, play a pivotal role in ass...

  11. Youtube Trending Videos Dataset

    • kaggle.com
    zip
    Updated Sep 28, 2025
    Cite
    Keshav Bansal95 (2025). Youtube Trending Videos Dataset [Dataset]. https://www.kaggle.com/datasets/keshavbansal95/youtube-trending-videos-dataset
    Available download formats: zip (274004927 bytes)
    Dataset updated
    Sep 28, 2025
    Authors
    Keshav Bansal95
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    YouTube
    Description

    This dataset provides a comprehensive collection of YouTube video and channel metadata curated for data analysis, visualization, and storytelling projects. It contains rich information on trending videos across multiple countries, including video performance statistics, engagement metrics, and channel-level details.

    The dataset is designed to help learners and researchers explore real-world YouTube dynamics, such as:

    • What type of content gains the highest views and engagement?
    • How do categories perform across different countries?
    • What role do publishing time, video duration, or tags play in driving popularity?
    • Which channels dominate in terms of subscribers, views, and content consistency?

    Features

    The dataset includes detailed video-level fields such as:

    • Video ID, title, description, and publish time
    • Trending date and country
    • Tags, categories, duration, resolution, and licensed content status
    • Views, likes, and comment counts

    Alongside channel-level information including:

    • Channel ID, title, and description
    • Channel country, publish date, and custom URL (if available)
    • Subscriber count, total views, video count, and hidden subscriber flag

    With this structured dataset, students and professionals can perform data cleaning, transformation, SQL querying, trend analysis, and dashboarding in tools such as Excel, SQL, Power BI, Tableau, and Python. It is also suitable for advanced machine learning tasks like predicting video performance, engagement modeling, and natural language processing on video titles and descriptions.

    Use Cases

    1. Descriptive Analytics: Identify top categories, channels, and countries leading the YouTube trending space.
    2. Comparative Analysis: Compare engagement rates across different regions and content types.
    3. Visualization Projects: Create dashboards showing performance KPIs, category trends, and time-based patterns.
    4. Storytelling: Derive business insights and best practices for creators, marketers, and educators on YouTube.

    Educational Value

    This dataset is structured specifically for student projects and group assignments. It ensures every learner can take a role—whether as a data engineer, analyst, visualization specialist, or business storyteller—mirroring the structure of real-world consulting projects.

    Credits

    This dataset is published as part of the YouTube Data Analytics Project initiated by Analytics Circle, an institute dedicated to empowering learners with practical data analytics, data science, and AI skills through hands-on projects and real-world applications.

  12. Health Care Data Set ( 20+ Tables )

    • kaggle.com
    zip
    Updated Nov 1, 2025
    Cite
    Moid Ahmed (2025). Health Care Data Set ( 20+ Tables ) [Dataset]. https://www.kaggle.com/datasets/moid1234/health-care-data-set-20-tables
    Available download formats: zip (2540688774 bytes)
    Dataset updated
    Nov 1, 2025
    Authors
    Moid Ahmed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NOTE: Please Read Text File named "ERD Relationship Text" for Detailed Information.

    This dataset represents a complete healthcare management system modeled as a relational database containing over 20 interlinked tables. It captures the entire lifecycle of healthcare operations from patient registration to diagnosis, treatment, billing, inventory, and vendor management. The data structure is designed to simulate a real-world hospital information system (HIS), enabling advanced analytics, data modeling, and visualization. You can easily visualize and explore the schema using tools like dbdiagram.io by pasting the provided table definitions.

    The dataset covers multiple operational areas of a hospital including patient information, clinical operations, financial transactions, human resources, and logistics.

    Patient Information includes personal, contact, and emergency details, along with identification and insurance. Clinical Operations include visits, appointments, diagnoses, treatments, and medications. Financial Transactions cover bills, payments, and vendor settlements. Human Resources include staff details, departments, and medical teams. Logistics and Inventory include equipment, medicines, supplies, and vendor relationships.

    • Patients (STG_EHP_PATN) are linked to Appointments, Visits, Diagnoses, Treatments, Bills, and Insurance Policies.
    • Medical Teams (STG_EHP_MEDT) connect Staff with Visits and Treatments.
    • Allergies and Patient Allergies tables track patient-specific allergy information.
    • Financial tables (Bills, Payments, Vendor Payments) are interconnected through reference numbers for consistent transaction tracing.
    • Inventory tables record medicine and equipment stock movements, supply receipts, and vendor sourcing.

    This dataset can be used for data modeling and SQL practice for complex joins and normalization, healthcare analytics projects involving cost analysis, treatment efficiency, and patient demographics, visualization projects in Power BI, Tableau, or Domo for operational insights, building ETL pipelines and data warehouse models for healthcare systems, and machine learning applications such as predicting patient readmission, billing anomalies, or treatment outcomes.
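The multi-table joins this dataset supports might be practiced like this; the tiny tables below are simplified placeholders for the real STG_EHP_* schema described in the ERD text file:

```python
import sqlite3

# Simplified stand-ins for the patient / visit / billing tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits   (visit_id INTEGER PRIMARY KEY, patient_id INTEGER);
    CREATE TABLE bills    (bill_id INTEGER PRIMARY KEY, visit_id INTEGER, amount REAL);
    INSERT INTO patients VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO visits   VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO bills    VALUES (100, 10, 200.0), (101, 11, 150.0), (102, 12, 80.0);
""")

# Total billed per patient across visits: a typical cost-analysis join.
totals = conn.execute("""
    SELECT p.name, SUM(b.amount) AS total_billed
    FROM patients p
    JOIN visits v ON v.patient_id = p.patient_id
    JOIN bills  b ON b.visit_id   = v.visit_id
    GROUP BY p.name
    ORDER BY total_billed DESC
""").fetchall()
print(totals)  # [('Ann', 350.0), ('Ben', 80.0)]
```

With 20+ real tables, the same patient → visit → bill chain extends to diagnoses, treatments, and vendor payments via the reference numbers noted above.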

    To explore the data relationships visually, go to dbdiagram.io, paste the entire provided schema code, and press 2 then 1 (or 2 and Enter) to auto-align the diagram. You’ll see an interactive Entity Relationship Diagram (ERD) representing the entire healthcare ecosystem.

    Total Tables: 20+
    Total Columns: 200+
    Primary Focus: Patient Management, Clinical Operations, Billing, and Supply Chain

  13. Data from: Hotel Revenue

    • kaggle.com
    zip
    Updated Oct 5, 2022
    Cite
    Govind Krishnadas (2022). Hotel Revenue [Dataset]. https://www.kaggle.com/govindkrishnadas/hotel-revenue
    Explore at:
    zip (17007464 bytes)
    Available download formats
    Dataset updated
    Oct 5, 2022
    Authors
    Govind Krishnadas
    Description

    Revenue management is more crucial than ever to running a successful and profitable hotel. With all the information now readily accessible, and the many ways to track and analyze it, your business has a wealth of new opportunities. Successful hoteliers continuously learn and improve their methods to stay one step ahead of the competition. Yet only a small percentage of independent hoteliers use revenue management strategies, limiting their revenue-generating potential.

    In this Hotel-revenue project, I will address a few questions a hotel management team faces.

    The questions are outlined below:
    1) What is the hotel revenue growth per year?
    2) Is there any relation between guests and their personal cars?
    3) Are there any trends or patterns observed in the data?

    These questions are answered with a data-driven approach: in this project, I will use Python and write SQL queries to solve them.
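
    A minimal sketch of the first question (yearly revenue growth) in pandas. The yearly totals below are hypothetical placeholders; with the real dataset they would be aggregated from booking-level records via SQL or pandas before this step.

```python
import pandas as pd

# Hypothetical yearly revenue totals -- with the real dataset these would
# be aggregated from booking-level columns first.
revenue = pd.DataFrame({
    "year": [2018, 2019, 2020],
    "revenue": [1_200_000.0, 1_500_000.0, 900_000.0],
})

# Year-over-year growth in percent (question 1); the first year has no
# prior period, so its growth is NaN.
revenue["growth_pct"] = revenue["revenue"].pct_change() * 100
print(revenue)
```

    The same `groupby`/`pct_change` pattern works once revenue is rolled up by year from the raw bookings.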

    This dataset can be used for learning purposes. Future work would be to apply advanced machine learning algorithms and forecasting techniques to generate insights that could help the hotel management company outline different strategies and business plans.

  14. Blinkit dataset

    • kaggle.com
    zip
    Updated Jul 18, 2024
    Cite
    mukesh gadri (2024). Blinkit dataset [Dataset]. https://www.kaggle.com/datasets/mukeshgadri/blinkit-dataset
    Explore at:
    zip (695160 bytes)
    Available download formats
    Dataset updated
    Jul 18, 2024
    Authors
    mukesh gadri
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    In the case study titled "Blinkit: Grocery Product Analysis," a dataset called 'Grocery Sales' contains 12 columns with information on sales of grocery items across different outlets. Using Tableau, a data analyst can uncover customer behavior insights, track sales trends, and gather feedback. These insights can drive operational improvements, enhance customer satisfaction, and optimize product offerings and store layout. Tableau enables data-driven decision-making for positive outcomes at Blinkit.

    The Grocery Sales table is a .CSV file with the following columns:

    • Item_Identifier: A unique ID for each product in the dataset.
    • Item_Weight: The weight of the product.
    • Item_Fat_Content: Indicates whether the product is low fat or not.
    • Item_Visibility: The percentage of the total display area in the store that is allocated to the specific product.
    • Item_Type: The category or type of product.
    • Item_MRP: The maximum retail price (list price) of the product.
    • Outlet_Identifier: A unique ID for each store in the dataset.
    • Outlet_Establishment_Year: The year in which the store was established.
    • Outlet_Size: The size of the store in terms of ground area covered.
    • Outlet_Location_Type: The type of city or region in which the store is located.
    • Outlet_Type: Indicates whether the store is a grocery store or a supermarket.
    • Item_Outlet_Sales: The sales of the product in the particular store. This is the outcome variable that we want to predict.
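
    A minimal sketch of the kind of aggregate a Tableau view of this data would surface, using the documented column names. The rows below are illustrative stand-ins; in practice you would load the full file with pd.read_csv("grocery_sales.csv") (filename assumed).

```python
import pandas as pd

# Small stand-in for the Grocery Sales CSV, using the documented columns;
# the row values are illustrative, not from the actual dataset.
df = pd.DataFrame({
    "Item_Identifier": ["FDA15", "DRC01", "FDN15", "FDX07"],
    "Outlet_Type": ["Supermarket Type1", "Grocery Store",
                    "Supermarket Type1", "Grocery Store"],
    "Item_MRP": [249.81, 48.27, 141.62, 182.10],
    "Item_Outlet_Sales": [3735.14, 443.42, 2097.27, 732.38],
})

# Average sales per outlet type, the outcome variable sliced by a
# categorical dimension.
avg_sales = df.groupby("Outlet_Type")["Item_Outlet_Sales"].mean().round(2)
print(avg_sales)
```

    The same groupby pattern applies to Outlet_Size, Outlet_Location_Type, or Item_Type when exploring which dimensions drive Item_Outlet_Sales.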
