7 datasets found
  1. MedSynora DW - Medical Data Warehouse

    • kaggle.com
    zip
    Updated Mar 14, 2025
    Cite
    BenMebrar (2025). MedSynora DW - Medical Data Warehouse [Dataset]. https://www.kaggle.com/datasets/mebrar21/medsynora-dw
    Available download formats: zip (89253728 bytes)
    Dataset updated: Mar 14, 2025
    Authors: BenMebrar
    License: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    MedSynora DW – A Comprehensive Synthetic Hospital Patient Data Warehouse

    Overview

    MedSynora DW is a large synthetic dataset that simulates the operational flow of a large hospital from a patient-centered perspective. It covers patient encounters, treatments, lab tests, vital signs, cost details, and more over the full year 2024. It was developed to support data science, machine learning, and business intelligence projects in the healthcare domain.

    Project Highlights

    • Realistic Simulation: Generated using advanced Python scripts and statistical models, the dataset reflects realistic hospital operations and patient flows without using any real patient data.
    • Comprehensive Schema: The data warehouse includes multiple fact and dimension tables:
      • Fact Tables: Encounter, Treatment, Lab Tests, Special Tests, Vitals, and Cost.
      • Dimension Tables: Patient, Doctor, Disease, Insurance, Room, Date, Chronic Diseases, Allergies, and Additional Services.
      • Bridge Tables: For managing many-to-many relationships (e.g., doctors per encounter) and some other…
    • Synthetic & Scalable: The dataset is entirely synthetic, ensuring privacy and compliance. It is designed to be scalable – the current version simulates around 145,000 encounter records.

    Data Generation

    • Data Sources & Methods: Data is generated using a number of Python libraries. Highly customized algorithms simulate realistic patient demographics, doctor assignments, treatment choices, lab test results, cost breakdowns, etc.
    • Diverse Scenarios: With over 300 diseases and thousands of treatment variations, along with dozens of lab and special tests, the dataset offers rich variability to support complex analytical projects.

    How to Use This Dataset

    • For Data Modeling & ETL Testing: Import the CSV files into your favorite database system (e.g., PostgreSQL, MySQL, or directly into a BI tool like Power BI) and set up relationships as described in the accompanying documentation.
    • For Machine Learning Projects: Use the dataset to build predictive models related to patient outcomes, cost analysis, or treatment efficacy.
    • For Educational Purposes: Ideal for learning about data warehousing, star schema design, and advanced analytics in healthcare.
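
    The import-and-relate workflow described above can be sketched as follows. This is a minimal illustration only: it uses SQLite from the Python standard library instead of PostgreSQL or MySQL, and the table names (dim_patient, fact_encounter), the column names, and the sample rows are hypothetical stand-ins rather than the dataset's actual schema, which is defined in its accompanying documentation.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for two of the CSV files; the real file and column
# names are defined in the dataset's documentation and may differ.
DIM_PATIENT_CSV = """patient_id,gender
1,F
2,M
"""
FACT_ENCOUNTER_CSV = """encounter_id,patient_id,total_cost
10,1,120.50
11,1,80.00
12,2,300.25
"""


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header and insert every row as text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


def cost_per_patient(conn):
    """Join the fact table to the patient dimension and total the cost."""
    query = """
        SELECT p.patient_id, p.gender, SUM(CAST(e.total_cost AS REAL))
        FROM fact_encounter AS e
        JOIN dim_patient AS p ON p.patient_id = e.patient_id
        GROUP BY p.patient_id, p.gender
        ORDER BY p.patient_id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load_csv(conn, "dim_patient", DIM_PATIENT_CSV)
load_csv(conn, "fact_encounter", FACT_ENCOUNTER_CSV)
print(cost_per_patient(conn))
```

    With the real files, the in-memory strings would be replaced by the contents of the downloaded CSVs, and the join keys would follow the relationships given in the dataset's documentation.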

    Final Note

    MedSynora DW offers a unique opportunity to experiment with a comprehensive, realistic hospital data warehouse without compromising real patient information. Enjoy exploring, analyzing, and building with this dataset – and feel free to reach out if you have any questions or suggestions. In particular, reports of inconsistencies or deficiencies and suggestions from experts in the field will help improve future versions.

  2. Global Country Information Dataset 2023

    • kaggle.com
    zip
    Updated Jul 8, 2023
    + more versions
    Cite
    Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
    Available download formats: zip (24063 bytes)
    Dataset updated: Jul 8, 2023
    Authors: Nidula Elgiriyewithana ⚡
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

    Key Features

    • Country: Name of the country.
    • Density (P/Km2): Population density measured in persons per square kilometer.
    • Abbreviation: Abbreviation or code representing the country.
    • Agricultural Land (%): Percentage of land area used for agricultural purposes.
    • Land Area (Km2): Total land area of the country in square kilometers.
    • Armed Forces Size: Size of the armed forces in the country.
    • Birth Rate: Number of births per 1,000 population per year.
    • Calling Code: International calling code for the country.
    • Capital/Major City: Name of the capital or major city.
    • CO2 Emissions: Carbon dioxide emissions in tons.
    • CPI: Consumer Price Index, a measure of inflation and purchasing power.
    • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
    • Currency_Code: Currency code used in the country.
    • Fertility Rate: Average number of children born to a woman during her lifetime.
    • Forested Area (%): Percentage of land area covered by forests.
    • Gasoline_Price: Price of gasoline per liter in local currency.
    • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
    • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
    • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
    • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
    • Largest City: Name of the country's largest city.
    • Life Expectancy: Average number of years a newborn is expected to live.
    • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
    • Minimum Wage: Minimum wage level in local currency.
    • Official Language: Official language(s) spoken in the country.
    • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
    • Physicians per Thousand: Number of physicians per thousand people.
    • Population: Total population of the country.
    • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
    • Tax Revenue (%): Tax revenue as a percentage of GDP.
    • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
    • Unemployment Rate: Percentage of the labor force that is unemployed.
    • Urban Population: Percentage of the population living in urban areas.
    • Latitude: Latitude coordinate of the country's location.
    • Longitude: Longitude coordinate of the country's location.

    Potential Use Cases

    • Analyze population density and land area to study spatial distribution patterns.
    • Investigate the relationship between agricultural land and food security.
    • Examine carbon dioxide emissions and their impact on climate change.
    • Explore correlations between economic indicators such as GDP and various socio-economic factors.
    • Investigate educational enrollment rates and their implications for human capital development.
    • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
    • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
    • Investigate the role of taxation and its impact on economic development.
    • Explore urbanization trends and their social and environmental consequences.
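
    As a rough sketch of the correlation-style analyses listed above, the following computes a Pearson coefficient between two of the listed indicators. It assumes that numeric fields may arrive as text carrying symbols such as "$", "%", or thousands separators, which is an assumption about the file rather than something stated here. The helper names and the sample rows are made up for illustration and are not values from the dataset.

```python
import math


def to_number(text):
    """Parse a value such as '$1,234', '45.6%' or '78.9' into a float."""
    cleaned = text.replace("$", "").replace("%", "").replace(",", "").strip()
    return float(cleaned)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative rows, not values from the dataset.
rows = [
    {"Physicians per Thousand": "0.5", "Infant Mortality": "40.0"},
    {"Physicians per Thousand": "1.5", "Infant Mortality": "30.0"},
    {"Physicians per Thousand": "2.5", "Infant Mortality": "20.0"},
    {"Physicians per Thousand": "3.5", "Infant Mortality": "10.0"},
]
physicians = [to_number(row["Physicians per Thousand"]) for row in rows]
mortality = [to_number(row["Infant Mortality"]) for row in rows]
coefficient = pearson(physicians, mortality)
print(round(coefficient, 3))
```

    On real data, rows with missing values would need to be skipped before calling to_number, and a correlation on its own says nothing about causation.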

    Data Source: This dataset was compiled from multiple data sources

    If this was helpful, a vote is appreciated ❤️ Thank you 🙂

  3. Health Care Data Set ( 20+ Tables )

    • kaggle.com
    zip
    Updated Nov 1, 2025
    Cite
    Moid Ahmed (2025). Health Care Data Set ( 20+ Tables ) [Dataset]. https://www.kaggle.com/datasets/moid1234/health-care-data-set-20-tables
    Available download formats: zip (2540688774 bytes)
    Dataset updated: Nov 1, 2025
    Authors: Moid Ahmed
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NOTE: Please read the text file named "ERD Relationship Text" for detailed information.

    This dataset represents a complete healthcare management system modeled as a relational database containing over 20 interlinked tables. It captures the entire lifecycle of healthcare operations from patient registration to diagnosis, treatment, billing, inventory, and vendor management. The data structure is designed to simulate a real-world hospital information system (HIS), enabling advanced analytics, data modeling, and visualization. You can easily visualize and explore the schema using tools like dbdiagram.io by pasting the provided table definitions.

    The dataset covers multiple operational areas of a hospital including patient information, clinical operations, financial transactions, human resources, and logistics.

    Patient Information includes personal, contact, and emergency details, along with identification and insurance. Clinical Operations include visits, appointments, diagnoses, treatments, and medications. Financial Transactions cover bills, payments, and vendor settlements. Human Resources include staff details, departments, and medical teams. Logistics and Inventory include equipment, medicines, supplies, and vendor relationships.

    • Patients (STG_EHP_PATN) are linked to Appointments, Visits, Diagnoses, Treatments, Bills, and Insurance Policies.
    • Medical Teams (STG_EHP_MEDT) connect Staff with Visits and Treatments.
    • Allergies and Patient Allergies tables track patient-specific allergy information.
    • Financial tables (Bills, Payments, Vendor Payments) are interconnected through reference numbers for consistent transaction tracing.
    • Inventory tables record medicine and equipment stock movements, supply receipts, and vendor sourcing.

    This dataset can be used for:

    • Data modeling and SQL practice for complex joins and normalization.
    • Healthcare analytics projects involving cost analysis, treatment efficiency, and patient demographics.
    • Visualization projects in Power BI, Tableau, or Domo for operational insights.
    • Building ETL pipelines and data warehouse models for healthcare systems.
    • Machine learning applications such as predicting patient readmission, billing anomalies, or treatment outcomes.
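
    For the SQL join practice mentioned above, a minimal sketch of a many-to-many lookup through a junction table might look like this. The description says that Allergies and Patient Allergies tables exist, but the table names, column names, and rows below are hypothetical placeholders, not the dataset's actual STG_EHP_* definitions. SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: two entity tables and one junction table between them.
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE allergies (allergy_id INTEGER PRIMARY KEY, allergen TEXT);
    CREATE TABLE patient_allergies (
        patient_id INTEGER REFERENCES patients(patient_id),
        allergy_id INTEGER REFERENCES allergies(allergy_id),
        PRIMARY KEY (patient_id, allergy_id)
    );
    INSERT INTO patients VALUES (1, 'Patient A'), (2, 'Patient B');
    INSERT INTO allergies VALUES (1, 'Penicillin'), (2, 'Latex');
    INSERT INTO patient_allergies VALUES (1, 1), (1, 2), (2, 2);
""")


def allergens_by_patient(conn):
    """Resolve the junction table into (patient name, allergen) pairs."""
    query = """
        SELECT p.name, a.allergen
        FROM patients AS p
        JOIN patient_allergies AS pa ON pa.patient_id = p.patient_id
        JOIN allergies AS a ON a.allergy_id = pa.allergy_id
        ORDER BY p.name, a.allergen
    """
    return conn.execute(query).fetchall()


for name, allergen in allergens_by_patient(conn):
    print(name, allergen)
```

    The same two-step join pattern applies to any pair of tables linked through a junction table, once the real key columns are taken from the provided schema.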

    To explore the data relationships visually, go to dbdiagram.io, paste the entire provided schema code, and press 2 then 1 (or 2 and Enter) to auto-align the diagram. You’ll see an interactive Entity Relationship Diagram (ERD) representing the entire healthcare ecosystem.

    Total Tables: 20+
    Total Columns: 200+
    Primary Focus: Patient Management, Clinical Operations, Billing, and Supply Chain

  4. Adventure Works 2022 CSVs

    • kaggle.com
    zip
    Updated Nov 2, 2022
    Cite
    Algorismus (2022). Adventure Works 2022 CSVs [Dataset]. https://www.kaggle.com/datasets/algorismus/adventure-works-in-excel-tables
    Available download formats: zip (567646 bytes)
    Dataset updated: Nov 2, 2022
    Authors: Algorismus
    License: http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Adventure Works 2022 dataset

    How was this dataset created?

    On the official website, the dataset is available through SQL Server (localhost) and as CSVs to be used via Power BI Desktop running in the virtual lab (virtual machine). The first two steps of importing data were executed in the virtual lab, and the resulting Power BI tables were then copied into CSVs. Records up to the year 2022 were added as required.

    How may this dataset help you?

    This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop to carry out the lab instructions in the training material on the official website. It is also useful if you want to work through the Power BI Desktop Sales Analysis example from the Microsoft PL-300 learning material.

    How to use this dataset

    Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created in the first two steps of importing data, as described in the PL-300 Microsoft Power BI Data Analyst exam lab.

  5. HR Analytics Dataset

    • kaggle.com
    zip
    Updated Oct 8, 2023
    Cite
    Saad Haroon (2023). HR Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/saadharoon27/hr-analytics-dataset/code
    Available download formats: zip (56327 bytes)
    Dataset updated: Oct 8, 2023
    Authors: Saad Haroon
    License: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    HR Analytics Power BI Project

    Overview

    This Power BI project focuses on HR analytics and aims to provide insights into various aspects of human resources within an organization. The dataset used for analysis contains several columns related to employee information as stated below.

    Dataset Columns

    The dataset used in this Power BI project contains the following columns:

    • EmpID: Employee ID
    • Age: Age of the employee
    • AgeGroup: Age group to which the employee belongs
    • Attrition: Employee attrition status (whether the employee has left the organization or is still active)
    • BusinessTravel: Frequency of business travel for the employee
    • DailyRate: Daily rate of pay for the employee
    • Department: Department in which the employee works
    • DistanceFromHome: Distance in miles from the employee's home to the workplace
    • Education: Level of education attained by the employee
    • EducationField: Field of education of the employee
    • EmployeeCount: Number of employees
    • EmployeeNumber: Unique identifier for each employee
    • EnvironmentSatisfaction: Employee's satisfaction level with the work environment
    • Gender: Gender of the employee
    • HourlyRate: Hourly rate of pay for the employee
    • JobInvolvement: Employee's level of job involvement
    • JobLevel: Level of the employee's job position
    • JobRole: Role of the employee within the organization
    • JobSatisfaction: Employee's satisfaction level with their job
    • MaritalStatus: Marital status of the employee
    • MonthlyIncome: Monthly income of the employee
    • SalarySlab: Categorization of monthly income into salary slabs
    • MonthlyRate: Monthly rate of pay for the employee
    • NumCompaniesWorked: Number of companies the employee has worked for in the past
    • Over18: Whether the employee is over 18 years old
    • OverTime: Whether the employee works overtime or not
    • PercentSalaryHike: Percentage increase in salary for the employee
    • PerformanceRating: Performance rating of the employee
    • RelationshipSatisfaction: Employee's satisfaction level with work relationships
    • StandardHours: Standard working hours for the employee
    • StockOptionLevel: Level of stock options granted to the employee
    • TotalWorkingYears: Total number of years the employee has worked
    • TrainingTimesLastYear: Number of training sessions attended by the employee in the last year
    • WorkLifeBalance: Employee's work-life balance satisfaction level
    • YearsAtCompany: Number of years the employee has worked at the current company
    • YearsInCurrentRole: Number of years the employee has been in the current role
    • YearsSinceLastPromotion: Number of years since the employee's last promotion
    • YearsWithCurrManager: Number of years the employee has been working with the current manager
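
    To show how the Department and Attrition columns could be combined outside Power BI, here is a small Python sketch that computes an attrition rate per department. It assumes that Attrition is stored as "Yes"/"No" text, which the description does not state. The function name and the sample rows are invented for illustration only.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the shape of the listed columns; not rows from the dataset.
SAMPLE_CSV = """EmpID,Department,Attrition
E1,Sales,Yes
E2,Sales,No
E3,Research,No
E4,Research,No
"""


def attrition_rate_by_department(rows):
    """Return {department: share of employees whose Attrition is 'Yes'}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in rows:
        department = row["Department"]
        totals[department] += 1
        # Assumption: attrition is encoded as the text "Yes" or "No".
        if row["Attrition"].strip().lower() == "yes":
            leavers[department] += 1
    return {dept: leavers[dept] / totals[dept] for dept in totals}


rates = attrition_rate_by_department(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(rates)
```

    To run this against the downloaded file, the in-memory sample would be replaced by a file opened with csv.DictReader, and the "yes" check adjusted to the encoding the file actually uses.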

  6. Football Manager 2023: 90k+ Player Stats

    • kaggle.com
    zip
    Updated Oct 1, 2025
    Cite
    Siddhraj Thakor (2025). Football Manager 2023: 90k+ Player Stats [Dataset]. https://www.kaggle.com/datasets/siddhrajthakor/football-manager-2023-dataset
    Available download formats: zip (9373378 bytes)
    Dataset updated: Oct 1, 2025
    Authors: Siddhraj Thakor
    License: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Football Manager Players Dataset

    Overview

    Dive into the ultimate treasure trove for football enthusiasts, data analysts, and gaming aficionados! The Football Manager Players Dataset is a comprehensive collection of player data extracted from a popular football management simulation game, offering an unparalleled look into the virtual world of football talent. This dataset includes detailed attributes for thousands of players across multiple leagues worldwide, making it a goldmine for analyzing player profiles, scouting virtual stars, and building predictive models for football strategies.

    Whether you're a data scientist exploring sports analytics, a football fan curious about your favorite virtual players, or a game developer seeking inspiration, this dataset is your ticket to unlocking endless possibilities!

    Dataset Description

    This dataset is a meticulously curated compilation of player statistics from five CSV files, merged into a single, unified dataset (merged_players.csv). It captures a diverse range of attributes for players from various clubs, nations, and leagues, including top-tier competitions like the English Premier Division, Argentina's Premier Division, and lower divisions across the globe.

    Key Features

    • Rich Player Attributes: Over 70 columns covering essential metrics such as:
      • Basic Info: UID, Name, Date of Birth (DOB), Nationality, Height, Weight, Age
      • Club & Position: Club, Position (e.g., AM, DM, GK), Based (league/division)
      • Performance Stats: Caps, Appearances (AT Apps), Goals (AT Gls), League Appearances, League Goals
      • Technical Skills: Acceleration, Passing, Dribbling, Finishing, Tackling, and more
      • Mental Attributes: Work Rate, Vision, Leadership, Determination
      • Physical Attributes: Pace, Strength, Stamina, Agility
      • Market Value: Transfer Value (e.g., $0 to millions)
      • Miscellaneous: Preferred Foot, Media Handling, Injury Proneness
    • Global Coverage: Players from diverse regions, including Europe (England, Spain, Italy), South America (Argentina, Brazil), Asia (South Korea, China), Africa (Ivory Coast, Burkina Faso), and North America (USA, Mexico).
    • Varied Player Types: From young prospects (15–18 years old) to veteran stars (up to 45 years old), including amateurs, youth players, and professionals.
    • Realistic Insights: Includes attributes like Media Description (e.g., "Young winger," "Veteran striker") and injury status, mirroring real-world football dynamics.

    Dataset Size

    • Rows: Thousands of player records (exact count depends on deduplication).
    • Columns: 70+ attributes per player.
    • File: merged_players.csv (UTF-8 encoded for compatibility with special characters).

    Potential Use Cases

    • Sports Analytics:
      • Analyze player attributes to identify key traits for success by position (e.g., what makes a top goalkeeper?).
      • Predict transfer values based on skills, age, and performance stats.
      • Cluster players by playing style or potential using machine learning.
    • Scouting & Strategy:
      • Build a dream team by filtering players based on specific attributes (e.g., high Pace and Dribbling for wingers).
      • Compare young talents vs. experienced veterans for team-building strategies.
    • Gaming & Modding:
      • Create custom Football Manager databases or mods.
      • Analyze game balance by studying attribute distributions.
    • Visualization:
      • Develop interactive dashboards to explore player stats by league, nationality, or position.
      • Map player origins to visualize global football talent distribution.
    • Education & Research:
      • Use as a teaching tool for data science, exploring data cleaning, merging, and analysis.
      • Study correlations between mental/physical attributes and in-game performance.

    Why This Dataset Stands Out

    • Comprehensive: Covers every aspect of a player's profile, from technical skills to personality traits.
    • Diverse: Includes players from top-tier to lower divisions, offering a broad spectrum of talent.
    • Engaging: Perfect for football fans and data enthusiasts alike, blending gaming with real-world analytics.
    • Ready-to-Use: Merged and cleaned for immediate analysis, with consistent column structure across all records.

    Getting Started

    1. Download: Grab merged_players.csv and load it into your favorite tool (Python/pandas, R, Excel, etc.).
    2. Explore: Check out columns like Transfer Value, Position, and Media Description to start your analysis.
    3. Analyze: Use Python (e.g., pandas, scikit-learn) or visualization tools (e.g., Tableau, Power BI) to uncover insights.
    4. Share: Build models, visualizations, or scouting reports and share your findings with the Kaggle community!
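
    The load-and-explore steps above could start with something like the sketch below, which filters players on two attributes in the spirit of the "high Pace and Dribbling for wingers" use case. It uses only the Python standard library. The exact header names in merged_players.csv are assumed to match the attribute names listed earlier (Name, Position, Pace, Dribbling), and the sample rows and the fast_dribblers helper are invented for illustration.

```python
import csv
import io

# Made-up sample rows; the header names are assumed, not confirmed.
SAMPLE_CSV = """Name,Position,Pace,Dribbling
Player A,AM,16,15
Player B,GK,9,5
Player C,AM,12,17
"""


def fast_dribblers(rows, min_pace, min_dribbling):
    """Return names of players meeting both attribute thresholds."""
    picked = []
    for row in rows:
        pace = int(row["Pace"])
        dribbling = int(row["Dribbling"])
        if pace >= min_pace and dribbling >= min_dribbling:
            picked.append(row["Name"])
    return picked


names = fast_dribblers(csv.DictReader(io.StringIO(SAMPLE_CSV)), 15, 15)
print(names)
```

    For the real file, the sample would be replaced by open("merged_players.csv", newline="", encoding="utf-8"), in line with the UTF-8 note above. Rows with blank attribute values would need handling before the int conversion.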

    Example Questions to Explore

    • Which young players (<18 years) have the highest poten...
  7. Supply Chain DataSet

    • kaggle.com
    zip
    Updated Jun 1, 2023
    Cite
    Amir Motefaker (2023). Supply Chain DataSet [Dataset]. https://www.kaggle.com/datasets/amirmotefaker/supply-chain-dataset
    Available download formats: zip (9340 bytes)
    Dataset updated: Jun 1, 2023
    Authors: Amir Motefaker
    Description

    Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.


MedSynora DW - Medical Data Warehouse

Elevate Healthcare Analytics with a Comprehensive Gigantic Data Warehouse
