7 datasets found

MedSynora DW - Medical Data Warehouse
Elevate Healthcare Analytics with a Comprehensive Gigantic Data Warehouse
kaggle.com
zip
Updated Mar 14, 2025
Cite
BenMebrar (2025). MedSynora DW - Medical Data Warehouse [Dataset]. https://www.kaggle.com/datasets/mebrar21/medsynora-dw
Available download formats: zip (89253728 bytes)
Dataset updated
Mar 14, 2025
Authors
BenMebrar
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
MedSynora DW – A Comprehensive Synthetic Hospital Patient Data Warehouse

Overview

MedSynora DW is a large synthetic dataset that simulates the operational flow of a large hospital from a patient-centered perspective. It covers patient encounters, treatments, lab tests, vital signs, cost details, and more over the full year 2024. It was developed to support data science, machine learning, and business intelligence projects in the healthcare domain.

Project Highlights

• Realistic Simulation: Generated using advanced Python scripts and statistical models, the dataset reflects realistic hospital operations and patient flows without using any real patient data.
• Comprehensive Schema: The data warehouse includes multiple fact and dimension tables:
  o Fact Tables: Encounter, Treatment, Lab Tests, Special Tests, Vitals, and Cost.
  o Dimension Tables: Patient, Doctor, Disease, Insurance, Room, Date, Chronic Diseases, Allergies, and Additional Services.
  o Bridge Tables: For managing many-to-many relationships (e.g., doctors per encounter) and some other…
• Synthetic & Scalable: The dataset is entirely synthetic, ensuring privacy and compliance. It is designed to be scalable; the current version simulates around 145,000 encounter records.

Data Generation

• Data Sources & Methods: Data is generated using a number of Python libraries. Highly customized algorithms simulate realistic patient demographics, doctor assignments, treatment choices, lab test results, cost breakdowns, and more.
• Diverse Scenarios: With over 300 diseases and thousands of treatment variations, along with dozens of lab and special tests, the dataset offers rich variability to support complex analytical projects.

How to Use This Dataset

• For Data Modeling & ETL Testing: Import the CSV files into a database system (e.g., PostgreSQL or MySQL) or directly into a BI tool such as Power BI, and set up relationships as described in the accompanying documentation.
• For Machine Learning Projects: Use the dataset to build predictive models related to patient outcomes, cost analysis, or treatment efficacy.
• For Educational Purposes: Ideal for learning about data warehousing, star schema design, and advanced analytics in healthcare.
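The star-schema usage described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (dim_patient, fact_encounter), the column names, and the rows are all hypothetical and are not taken from the dataset's actual schema or documentation.

```python
import sqlite3

# Hypothetical, simplified star schema: one fact table and one dimension table.
# Table and column names are assumptions, not the dataset's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_patient (
        patient_id INTEGER PRIMARY KEY,
        gender     TEXT
    );
    CREATE TABLE fact_encounter (
        encounter_id INTEGER PRIMARY KEY,
        patient_id   INTEGER REFERENCES dim_patient(patient_id),
        total_cost   REAL
    );
""")

# Made-up rows standing in for the imported CSV contents.
conn.executemany("INSERT INTO dim_patient VALUES (?, ?)", [(1, "F"), (2, "M")])
conn.executemany(
    "INSERT INTO fact_encounter VALUES (?, ?, ?)",
    [(10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0)],
)

# Typical star-schema query: aggregate the fact table,
# grouped by an attribute of a joined dimension table.
cost_by_gender = conn.execute("""
    SELECT p.gender, COUNT(*) AS encounters, SUM(f.total_cost) AS total_cost
    FROM fact_encounter AS f
    JOIN dim_patient AS p ON p.patient_id = f.patient_id
    GROUP BY p.gender
    ORDER BY p.gender
""").fetchall()

print(cost_by_gender)
```

With the real CSV files, the INSERT statements would be replaced by a loader (for example csv.reader feeding executemany), and the keys would follow the relationships given in the dataset's own documentation.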

Final Note

MedSynora DW offers a unique opportunity to experiment with a comprehensive, realistic hospital data warehouse without compromising real patient information. Enjoy exploring, analyzing, and building with this dataset, and feel free to reach out if you have any questions or suggestions. In particular, reports of inconsistencies or deficiencies and suggestions from experts in the field will help improve future versions.
Global Country Information Dataset 2023
kaggle.com
zip
Updated Jul 8, 2023
Cite
Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
Available download formats: zip (24063 bytes)
Dataset updated
Jul 8, 2023
Authors
Nidula Elgiriyewithana ⚡
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

Key Features

Country: Name of the country.

Density (P/Km2): Population density measured in persons per square kilometer.

Abbreviation: Abbreviation or code representing the country.

Agricultural Land (%): Percentage of land area used for agricultural purposes.

Land Area (Km2): Total land area of the country in square kilometers.

Armed Forces Size: Size of the armed forces in the country.

Birth Rate: Number of births per 1,000 population per year.

Calling Code: International calling code for the country.

Capital/Major City: Name of the capital or major city.

CO2 Emissions: Carbon dioxide emissions in tons.

CPI: Consumer Price Index, a measure of inflation and purchasing power.

CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.

Currency_Code: Currency code used in the country.

Fertility Rate: Average number of children born to a woman during her lifetime.

Forested Area (%): Percentage of land area covered by forests.

Gasoline_Price: Price of gasoline per liter in local currency.

GDP: Gross Domestic Product, the total value of goods and services produced in the country.

Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.

Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.

Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.

Largest City: Name of the country's largest city.

Life Expectancy: Average number of years a newborn is expected to live.

Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.

Minimum Wage: Minimum wage level in local currency.

Official Language: Official language(s) spoken in the country.

Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.

Physicians per Thousand: Number of physicians per thousand people.

Population: Total population of the country.

Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.

Tax Revenue (%): Tax revenue as a percentage of GDP.

Total Tax Rate: Overall tax burden as a percentage of commercial profits.

Unemployment Rate: Percentage of the labor force that is unemployed.

Urban Population: Percentage of the population living in urban areas.

Latitude: Latitude coordinate of the country's location.

Longitude: Longitude coordinate of the country's location.

Potential Use Cases

Analyze population density and land area to study spatial distribution patterns.

Investigate the relationship between agricultural land and food security.

Examine carbon dioxide emissions and their impact on climate change.

Explore correlations between economic indicators such as GDP and various socio-economic factors.

Investigate educational enrollment rates and their implications for human capital development.

Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.

Study labor market dynamics through indicators such as labor force participation and unemployment rates.

Investigate the role of taxation and its impact on economic development.

Explore urbanization trends and their social and environmental consequences.
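As a small worked example of the first use case, population density can be recomputed from the Population and Land Area fields and compared with the Density column. The sketch below is hedged: the row is made up for illustration, and the assumption that numeric fields may arrive as strings with thousands separators or a trailing percent sign has not been checked against the actual file.

```python
def to_number(text: str) -> float:
    """Parse a numeric string, tolerating thousands separators and a trailing %."""
    return float(text.replace(",", "").replace("%", "").strip())


def density(population: str, land_area_km2: str) -> float:
    """Persons per square kilometer, computed from raw string fields."""
    return to_number(population) / to_number(land_area_km2)


# Made-up row for illustration only; not taken from the dataset.
row = {
    "Country": "Exampleland",
    "Population": "1,000,000",
    "Land Area (Km2)": "50,000",
}

print(density(row["Population"], row["Land Area (Km2)"]))
```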

Data Source: This dataset was compiled from multiple data sources.

If this was helpful, a vote is appreciated ❤️ Thank you 🙂
Health Care Data Set ( 20+ Tables )
kaggle.com
zip
Updated Nov 1, 2025
Cite
Moid Ahmed (2025). Health Care Data Set ( 20+ Tables ) [Dataset]. https://www.kaggle.com/datasets/moid1234/health-care-data-set-20-tables
Available download formats: zip (2540688774 bytes)
Dataset updated
Nov 1, 2025
Authors
Moid Ahmed
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
NOTE: Please Read Text File named "ERD Relationship Text" for Detailed Information.

This dataset represents a complete healthcare management system modeled as a relational database containing over 20 interlinked tables. It captures the entire lifecycle of healthcare operations from patient registration to diagnosis, treatment, billing, inventory, and vendor management. The data structure is designed to simulate a real-world hospital information system (HIS), enabling advanced analytics, data modeling, and visualization. You can easily visualize and explore the schema using tools like dbdiagram.io by pasting the provided table definitions.

The dataset covers multiple operational areas of a hospital including patient information, clinical operations, financial transactions, human resources, and logistics.

Patient Information includes personal, contact, and emergency details, along with identification and insurance. Clinical Operations include visits, appointments, diagnoses, treatments, and medications. Financial Transactions cover bills, payments, and vendor settlements. Human Resources include staff details, departments, and medical teams. Logistics and Inventory include equipment, medicines, supplies, and vendor relationships.

Patients (STG_EHP_PATN) are linked to Appointments, Visits, Diagnoses, Treatments, Bills, and Insurance Policies.

Medical Teams (STG_EHP_MEDT) connect Staff with Visits and Treatments.

Allergies and Patient Allergies tables track patient-specific allergy information.

Financial tables (Bills, Payments, Vendor Payments) are interconnected through reference numbers for consistent transaction tracing.

Inventory tables record medicine and equipment stock movements, supply receipts, and vendor sourcing.

This dataset can be used for data modeling and SQL practice for complex joins and normalization, healthcare analytics projects involving cost analysis, treatment efficiency, and patient demographics, visualization projects in Power BI, Tableau, or Domo for operational insights, building ETL pipelines and data warehouse models for healthcare systems, and machine learning applications such as predicting patient readmission, billing anomalies, or treatment outcomes.

To explore the data relationships visually, go to dbdiagram.io, paste the entire provided schema code, and press 2 then 1 (or 2 and Enter) to auto-align the diagram. You’ll see an interactive Entity Relationship Diagram (ERD) representing the entire healthcare ecosystem.

Total Tables: 20+
Total Columns: 200+
Primary Focus: Patient Management, Clinical Operations, Billing, and Supply Chain
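The description says the financial tables are connected through reference numbers for transaction tracing. A minimal sketch of that idea follows, assuming each payment row carries the reference of the bill it settles. The field names (bill_ref, amount, paid) and all rows are hypothetical; the real column names are in the dataset's "ERD Relationship Text" file.

```python
from collections import defaultdict

# Hypothetical rows; field names are assumptions, not the dataset's column names.
bills = [
    {"bill_ref": "B-001", "patient_id": 1, "amount": 300.0},
    {"bill_ref": "B-002", "patient_id": 2, "amount": 150.0},
]
payments = [
    {"bill_ref": "B-001", "paid": 100.0},
    {"bill_ref": "B-001", "paid": 200.0},
    {"bill_ref": "B-002", "paid": 50.0},
]

# Sum payments per reference number, then compare against each bill.
paid_by_ref = defaultdict(float)
for payment in payments:
    paid_by_ref[payment["bill_ref"]] += payment["paid"]

outstanding = {
    bill["bill_ref"]: bill["amount"] - paid_by_ref[bill["bill_ref"]]
    for bill in bills
}
print(outstanding)
```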
Adventure Works 2022 CSVs
kaggle.com
zip
Updated Nov 2, 2022
Cite
Algorismus (2022). Adventure Works 2022 CSVs [Dataset]. https://www.kaggle.com/datasets/algorismus/adventure-works-in-excel-tables
Available download formats: zip (567646 bytes)
Dataset updated
Nov 2, 2022
Authors
Algorismus
License
http://www.gnu.org/licenses/lgpl-3.0.html
Description
Adventure Works 2022 dataset

How was this dataset created?

On the official website, the dataset is available through SQL Server (localhost) and as CSVs to be used via Power BI Desktop running in a virtual lab (virtual machine). The first two steps of importing data were executed in the virtual lab, and the resulting Power BI tables were then copied to CSVs. Records were added up to the year 2022 as required.

How may this dataset help you?

This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop in order to carry out the lab instructions in the training material on the official website. It is also useful if you want to work on the Power BI Desktop Sales Analysis example from the Microsoft PL-300 learning material.

How to use this dataset?

Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created by the first two steps of importing data, as described in the PL-300 Microsoft Power BI Data Analyst exam lab.
HR Analytics Dataset
kaggle.com
zip
Updated Oct 8, 2023
Cite
Saad Haroon (2023). HR Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/saadharoon27/hr-analytics-dataset/code
Available download formats: zip (56327 bytes)
Dataset updated
Oct 8, 2023
Authors
Saad Haroon
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
HR Analytics Power BI Project

Overview

This Power BI project focuses on HR analytics and aims to provide insights into various aspects of human resources within an organization. The dataset used for analysis contains several columns related to employee information as stated below.

Dataset Columns

The dataset used in this Power BI project contains the following columns:

EmpID: Employee ID
Age: Age of the employee
AgeGroup: Age group to which the employee belongs
Attrition: Employee attrition status (whether the employee has left the organization or is still active)
BusinessTravel: Frequency of business travel for the employee
DailyRate: Daily rate of pay for the employee
Department: Department in which the employee works
DistanceFromHome: Distance in miles from the employee's home to the workplace
Education: Level of education attained by the employee
EducationField: Field of education of the employee
EmployeeCount: Number of employees
EmployeeNumber: Unique identifier for each employee
EnvironmentSatisfaction: Employee's satisfaction level with the work environment
Gender: Gender of the employee
HourlyRate: Hourly rate of pay for the employee
JobInvolvement: Employee's level of job involvement
JobLevel: Level of the employee's job position
JobRole: Role of the employee within the organization
JobSatisfaction: Employee's satisfaction level with their job
MaritalStatus: Marital status of the employee
MonthlyIncome: Monthly income of the employee
SalarySlab: Categorization of monthly income into salary slabs
MonthlyRate: Monthly rate of pay for the employee
NumCompaniesWorked: Number of companies the employee has worked for in the past
Over18: Whether the employee is over 18 years old
OverTime: Whether the employee works overtime or not
PercentSalaryHike: Percentage increase in salary for the employee
PerformanceRating: Performance rating of the employee
RelationshipSatisfaction: Employee's satisfaction level with work relationships
StandardHours: Standard working hours for the employee
StockOptionLevel: Level of stock options granted to the employee
TotalWorkingYears: Total number of years the employee has worked
TrainingTimesLastYear: Number of training sessions attended by the employee in the last year
WorkLifeBalance: Employee's work-life balance satisfaction level
YearsAtCompany: Number of years the employee has worked at the current company
YearsInCurrentRole: Number of years the employee has been in the current role
YearsSinceLastPromotion: Number of years since the employee's last promotion
YearsWithCurrManager: Number of years the employee has been working with the current manager
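One common analysis on columns like these is the attrition rate per department. The sketch below uses only the standard library and an in-memory sample. The rows are made up, and the "Yes"/"No" encoding of the Attrition column is an assumption that should be checked against the actual file.

```python
import csv
import io
from collections import Counter

# Made-up sample using column names from the description; the "Yes"/"No"
# encoding of Attrition is an assumption.
sample = io.StringIO(
    "EmpID,Department,Attrition\n"
    "E1,Sales,Yes\n"
    "E2,Sales,No\n"
    "E3,HR,No\n"
    "E4,Sales,No\n"
)

total = Counter()  # employees per department
left = Counter()   # employees per department who left

for row in csv.DictReader(sample):
    total[row["Department"]] += 1
    if row["Attrition"] == "Yes":
        left[row["Department"]] += 1

# Share of employees in each department who left.
attrition_rate = {dept: left[dept] / total[dept] for dept in total}
print(attrition_rate)
```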
Football Manager 2023: 90k+ Player Stats
kaggle.com
zip
Updated Oct 1, 2025
Cite
Siddhraj Thakor (2025). Football Manager 2023: 90k+ Player Stats [Dataset]. https://www.kaggle.com/datasets/siddhrajthakor/football-manager-2023-dataset
Available download formats: zip (9373378 bytes)
Dataset updated
Oct 1, 2025
Authors
Siddhraj Thakor
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Football Manager Players Dataset

Overview

Dive into the ultimate treasure trove for football enthusiasts, data analysts, and gaming aficionados! The Football Manager Players Dataset is a comprehensive collection of player data extracted from a popular football management simulation game, offering an unparalleled look into the virtual world of football talent. This dataset includes detailed attributes for thousands of players across multiple leagues worldwide, making it a goldmine for analyzing player profiles, scouting virtual stars, and building predictive models for football strategies.

Whether you're a data scientist exploring sports analytics, a football fan curious about your favorite virtual players, or a game developer seeking inspiration, this dataset is your ticket to unlocking endless possibilities!

Dataset Description

This dataset is a meticulously curated compilation of player statistics from five CSV files, merged into a single, unified dataset (merged_players.csv). It captures a diverse range of attributes for players from various clubs, nations, and leagues, including top-tier competitions like the English Premier Division, Argentina's Premier Division, and lower divisions across the globe.

Key Features

Rich Player Attributes: Over 70 columns covering essential metrics such as:

Basic Info: UID, Name, Date of Birth (DOB), Nationality, Height, Weight, Age

Club & Position: Club, Position (e.g., AM, DM, GK), Based (league/division)

Performance Stats: Caps, Appearances (AT Apps), Goals (AT Gls), League Appearances, League Goals

Technical Skills: Acceleration, Passing, Dribbling, Finishing, Tackling, and more

Mental Attributes: Work Rate, Vision, Leadership, Determination

Physical Attributes: Pace, Strength, Stamina, Agility

Market Value: Transfer Value (e.g., $0 to millions)

Miscellaneous: Preferred Foot, Media Handling, Injury Proneness

Global Coverage: Players from diverse regions, including Europe (England, Spain, Italy), South America (Argentina, Brazil), Asia (South Korea, China), Africa (Ivory Coast, Burkina Faso), and North America (USA, Mexico).

Varied Player Types: From young prospects (15–18 years old) to veteran stars (up to 45 years old), including amateurs, youth players, and professionals.

Realistic Insights: Includes attributes like Media Description (e.g., "Young winger," "Veteran striker") and injury status, mirroring real-world football dynamics.

Dataset Size

Rows: Thousands of player records (exact count depends on deduplication).

Columns: 70+ attributes per player.

File: merged_players.csv (UTF-8 encoded for compatibility with special characters).

Potential Use Cases

Sports Analytics:

Analyze player attributes to identify key traits for success by position (e.g., what makes a top goalkeeper?).

Predict transfer values based on skills, age, and performance stats.

Cluster players by playing style or potential using machine learning.

Scouting & Strategy:

Build a dream team by filtering players based on specific attributes (e.g., high Pace and Dribbling for wingers).

Compare young talents vs. experienced veterans for team-building strategies.

Gaming & Modding:

Create custom Football Manager databases or mods.

Analyze game balance by studying attribute distributions.

Visualization:

Develop interactive dashboards to explore player stats by league, nationality, or position.

Map player origins to visualize global football talent distribution.

Education & Research:

Use as a teaching tool for data science, exploring data cleaning, merging, and analysis.

Study correlations between mental/physical attributes and in-game performance.

Why This Dataset Stands Out

Comprehensive: Covers every aspect of a player's profile, from technical skills to personality traits.

Diverse: Includes players from top-tier to lower divisions, offering a broad spectrum of talent.

Engaging: Perfect for football fans and data enthusiasts alike, blending gaming with real-world analytics.

Ready-to-Use: Merged and cleaned for immediate analysis, with consistent column structure across all records.

Getting Started

Download: Grab merged_players.csv and load it into your favorite tool (Python/pandas, R, Excel, etc.).

Explore: Check out columns like Transfer Value, Position, and Media Description to start your analysis.

Analyze: Use Python (e.g., pandas, scikit-learn) or visualization tools (e.g., Tableau, Power BI) to uncover insights.

Share: Build models, visualizations, or scouting reports and share your findings with the Kaggle community!
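The load-and-filter steps above can be sketched with the standard library alone. The column names Name, Position, Pace, and Dribbling come from the description, but the rows, the integer rating scale, and the thresholds are assumptions made for illustration; fast_dribblers is a hypothetical helper, not part of the dataset.

```python
import csv
import io

# Made-up rows; column names follow the description, but the values and the
# integer rating scale are assumptions.
sample = io.StringIO(
    "Name,Position,Pace,Dribbling\n"
    "Player A,AM,16,15\n"
    "Player B,GK,10,5\n"
    "Player C,AM,12,17\n"
)


def fast_dribblers(rows, min_pace=15, min_dribbling=15):
    """Return names of players meeting both attribute thresholds."""
    return [
        row["Name"]
        for row in rows
        if int(row["Pace"]) >= min_pace and int(row["Dribbling"]) >= min_dribbling
    ]


wingers = fast_dribblers(csv.DictReader(sample))
print(wingers)
```

To run this against the real file, the in-memory sample would be replaced with open("merged_players.csv", newline="", encoding="utf-8"), assuming the attribute columns parse as integers.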

Example Questions to Explore

Which young players (<18 years) have the highest poten...
Supply Chain DataSet
kaggle.com
zip
Updated Jun 1, 2023
Cite
Amir Motefaker (2023). Supply Chain DataSet [Dataset]. https://www.kaggle.com/datasets/amirmotefaker/supply-chain-dataset
Available download formats: zip (9340 bytes)
Dataset updated
Jun 1, 2023
Authors
Amir Motefaker
Description
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.
