100+ datasets found

Synthetic Healthcare Database for Research (SyH-DR)
catalog.data.gov
healthdata.gov
+2more
Updated Sep 16, 2023
Cite
Agency for Healthcare Research and Quality (2023). Synthetic Healthcare Database for Research (SyH-DR) [Dataset]. https://catalog.data.gov/dataset/synthetic-healthcare-database-for-research-syh-dr
Explore at:
Dataset updated
Sep 16, 2023
Dataset provided by
Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
Description
The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.
Minimum Hospital Data Set
healthinformationportal.eu
html
Updated Mar 4, 2022
Cite
Federal Public Service (FPS) Health, Food Chain Safety, and Environment (2022). Minimum Hospital Data Set [Dataset]. https://www.healthinformationportal.eu/health-information-sources/minimum-hospital-data-set
Explore at:
Available download formats: html
Dataset updated
Mar 4, 2022
Dataset authored and provided by
Federal Public Service (FPS) Health, Food Chain Safety, and Environment
License
https://fair.healthdata.be/dataset/12d69eca-4449-47d2-943d-e4448a467292
Variables measured
sex, title, topics, acronym, country, language, data_owners, description, contact_name, geo_coverage, and 14 more
Measurement technique
Hospital resources & Healthcare administrative area resources
Description
The MZG is a registration with which all non-psychiatric hospitals in Belgium must make their (anonymised) administrative, medical and nursing data available to the Federal Public Service (FPS) Public Health. The aim of the MZG is to support the government's health policy by

Determining the needs for hospital facilities;

Describing the qualitative and quantitative accreditation standards of hospitals and their services;

Organising the financing of hospitals;

Determining policy for the practice of medicine;

Outlining epidemiological policy.

The MZG also aims to support the health policy of hospitals by providing national and individual feedback so that a hospital can compare itself with other hospitals and adapt its internal policy.

All reports can be found here (in French/Dutch).
Data from: Generating Heterogeneous Big Data Set for Healthcare and...
data.mendeley.com
Updated Jan 23, 2023
Cite
Omar Al-Obidi (2023). Generating Heterogeneous Big Data Set for Healthcare and Telemedicine Research Based on ECG, Spo2, Blood Pressure Sensors, and Text Inputs: Data set classified, Analyzed, Organized, And Presented in Excel File Format. [Dataset]. http://doi.org/10.17632/gsmjh55sfy.1
Explore at:
Unique identifier
https://doi.org/10.17632/gsmjh55sfy.1
Dataset updated
Jan 23, 2023
Authors
Omar Al-Obidi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This work presents a heterogeneous big dataset comprising electrocardiogram (ECG) signals, blood pressure signals, oxygen saturation (SpO2) signals, and text input. It extends the dataset formulation presented in [1]; the signals were acquired from PhysioNet [2], a trustworthy and relevant medical dataset library. The dataset includes medical features from heterogeneous sources (sensory and non-sensory data). First, the ECG sensor signals contain QRS width, ST elevation, peak numbers, and cycle interval. Second, the SpO2 sensor signals give the SpO2 level. Third, the blood pressure sensor signals contain high (systolic) and low (diastolic) values. Finally, the text input is considered non-sensory data. The text inputs were formulated based on doctors' diagnostic procedures for chronic heart diseases. A Python software environment was used, and the simulated big data is presented along with analyses.
CompanyData.com (BoldData) - Healthcare Company Data (2.5M Companies)
datarade.ai
Updated Nov 13, 2020
Cite
CompanyData.com (BoldData) (2020). CompanyData.com (BoldData) - Healthcare Company Data (2.5M Companies) [Dataset]. https://datarade.ai/data-products/healthcare-data-bolddata
Explore at:
Available download formats: .json, .csv, .xls, .txt
Dataset updated
Nov 13, 2020
Dataset authored and provided by
CompanyData.com (BoldData)
Area covered
Niger, Trinidad and Tobago, Micronesia (Federated States of), Chile, Burundi, Finland, Svalbard and Jan Mayen, Tokelau, Nigeria, Kenya
Description
CompanyData.com (BoldData) is your gateway to verified global business intelligence. Our Healthcare Company Database provides in-depth, accurate data on 2.5 million organizations across the healthcare industry, from hospitals and clinics to pharmaceutical companies, biotech firms, and medical equipment suppliers. Every record is sourced from official trade registers and healthcare authorities, ensuring regulatory compliance and unmatched data quality.

We deliver comprehensive company profiles enriched with key firmographics, industry classifications, ownership structures, executive contact details, emails, direct phone numbers, and mobile data. Updated regularly and quality-checked against official sources, our healthcare data empowers organizations to make informed decisions across critical functions—from KYC verification and compliance to targeted sales campaigns, healthcare market analysis, CRM enrichment, and AI model development.

To suit every workflow, we offer flexible delivery solutions including custom bulk files, self-service platform access, real-time API integrations, and on-demand enrichment services. Whether you're scaling a B2B marketing strategy or building healthcare analytics tools, our datasets are ready to plug into your operations.

With coverage of over 380 million verified companies across all industries and regions, CompanyData.com (BoldData) offers the global reach and industry precision that modern organizations demand. Tap into our healthcare data solutions to discover new opportunities, reduce risk, and power smarter business growth across the global health economy.
Health Insurance Marketplace
kaggle.com
zip
Updated May 1, 2017
Cite
US Department of Health and Human Services (2017). Health Insurance Marketplace [Dataset]. https://www.kaggle.com/datasets/hhs/health-insurance-marketplace
Explore at:
Available download formats: zip (868821924 bytes)
Dataset updated
May 1, 2017
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Authors
US Department of Health and Human Services
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.

Exploration Ideas

To help get you started, here are some data exploration ideas:

How do plan rates and benefits vary across states?

How do plan benefits relate to plan rates?

How do plan rates vary by age?

How do plans vary across insurance network providers?

See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!

Data Description

This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.

Here, we've processed the data to facilitate analytics. This processed version has three components:

1. Original versions of the data

The original versions of the 2014, 2015, 2016 data are available in the "raw" directory of the download and "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.

2. Combined CSV files that contain

In the top level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:

BenefitsCostSharing.csv

BusinessRules.csv

Network.csv

PlanAttributes.csv

Rate.csv

ServiceArea.csv

Additionally, there are two CSV files that facilitate joining data across years:

Crosswalk2015.csv - joining 2014 and 2015 data

Crosswalk2016.csv - joining 2015 and 2016 data

3. SQLite database

The "database.sqlite" file contains tables corresponding to each of the processed CSV files.

The code to create the processed version of this data is available on GitHub.
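
As a minimal sketch of how the SQLite component described above might be inspected, the following uses Python's standard `sqlite3` module to list tables and count rows. The helper names `list_tables` and `count_rows` are illustrative, and the demo builds a small in-memory stand-in whose column names are hypothetical, not taken from the dataset; in practice one would connect to the downloaded "database.sqlite" file instead.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def count_rows(conn, table):
    """Count rows in a table, refusing names that are not real tables."""
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Stand-in database so the sketch runs without the download.
# The column names below are hypothetical placeholders.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Rate (StateCode TEXT, Age TEXT, IndividualRate REAL)")
demo.executemany(
    "INSERT INTO Rate VALUES (?, ?, ?)",
    [("AK", "21", 300.0), ("AK", "40", 380.5)],
)

for table in list_tables(demo):
    print(table, count_rows(demo, table))
```

With the real download, replacing `":memory:"` with the path to "database.sqlite" should list one table per processed CSV file, according to the description above.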
Open Database of Healthcare Facilities
open.canada.ca
catalogue.arctic-sdi.org
csv, esri rest +4
Updated Mar 2, 2022
+ more versions
Cite
Statistics Canada (2022). Open Database of Healthcare Facilities [Dataset]. https://open.canada.ca/data/en/dataset/a1bcd4ee-8e57-499b-9c6f-94f6902fdf32
Explore at:
Available download formats: fgdb/gdb, esri rest, csv, html, pdf, wms
Dataset updated
Mar 2, 2022
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
The Open Database of Healthcare Facilities (ODHF) is a collection of open data containing the names, types, and locations of health facilities across Canada. It is released under the Open Government License - Canada. The ODHF compiles open, publicly available, and directly-provided data on health facilities across Canada. Data sources include regional health authorities, provincial, territorial and municipal governments, and public health and professional healthcare bodies. This database aims to provide enhanced access to a harmonized listing of health facilities across Canada by making them available as open data. This database is a component of the Linkable Open Data Environment (LODE).
Public Health Portfolio Dataset
nihr.opendatasoft.com
nihr.aws-ec2-eu-central-1.opendatasoft.com
csv, excel, json
Updated Sep 26, 2025
Cite
(2025). Public Health Portfolio Dataset [Dataset]. https://nihr.opendatasoft.com/explore/dataset/phof-datase/
Explore at:
Available download formats: excel, json, csv
Dataset updated
Sep 26, 2025
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
The NIHR is one of the main funders of public health research in the UK. Public health research falls within the remit of a range of NIHR Research Programmes, NIHR Centres of Excellence and Facilities, plus the NIHR Academy. NIHR awards from all NIHR Research Programmes and the NIHR Academy that were funded between January 2006 and the present extraction date are eligible for inclusion in this dataset. Agreed inclusion/exclusion criteria are used to categorise awards as public health awards (see below). Following inclusion in the dataset, public health awards are second level coded to one of the four Public Health Outcomes Framework domains. These domains are: (1) wider determinants (2) health improvement (3) health protection (4) healthcare and premature mortality.

More information on the Public Health Outcomes Framework domains can be found here.

This dataset is updated quarterly to include new NIHR awards categorised as public health awards. Please note that for those Public Health Research Programme projects showing an Award Budget of £0.00, the project is undertaken by an on-call team, for example PHIRST, Public Health Review Team, or Knowledge Mobilisation Team, as part of an ongoing programme of work.

Inclusion criteria

The NIHR Public Health Overview project team worked with colleagues across NIHR public health research to define the inclusion criteria for NIHR public health research awards. NIHR awards are categorised as public health awards if they are determined to be ‘investigations of interventions in, or studies of, populations that are anticipated to have an effect on health or on health inequity at a population level.’ This definition of public health is intentionally broad to capture the wide range of NIHR public health awards across prevention, health improvement, health protection, and healthcare services (both within and outside of NHS settings). This dataset does not reflect the NIHR’s total investment in public health research. The intention is to showcase a subset of the wider NIHR public health portfolio. This dataset includes NIHR awards categorised as public health awards from NIHR Research Programmes and the NIHR Academy. This dataset does not currently include public health awards or projects funded by any of the three NIHR Research Schools or any of the NIHR Centres of Excellence and Facilities. Therefore, awards from the NIHR Schools for Public Health, Primary Care and Social Care, NIHR Public Health Policy Research Unit and the NIHR Health Protection Research Units do not feature in this curated portfolio.

Disclaimers

Users of this dataset should acknowledge the broad definition of public health that has been used to develop the inclusion criteria for this dataset. This caveat applies to all data within the dataset irrespective of the funding NIHR Research Programme or NIHR Academy award.

Please note that this dataset is currently subject to a limited data quality review. We are working to improve our data collection methodologies. Please also note that some awards may also appear in other NIHR curated datasets.

Further information

Further information on the individual awards shown in the dataset can be found on the NIHR’s Funding & Awards website here. Further information on individual NIHR Research Programmes’ decision-making processes for funding health and social care research can be found here.

Further information on NIHR’s investment in public health research can be found as follows: NIHR School for Public Health here. NIHR Public Health Policy Research Unit here. NIHR Health Protection Research Units here. NIHR Public Health Research Programme Health Determinants Research Collaborations (HDRC) here. NIHR Public Health Research Programme Public Health Intervention Responsive Studies Teams (PHIRST) here.
Hospital patient data
kaggle.com
Updated Apr 14, 2023
Cite
Abdulqader_Asiirii (2023). Hospital patient data [Dataset]. https://www.kaggle.com/datasets/abdulqaderasiirii/hospital-patient-data
Explore at:
Dataset updated
Apr 14, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abdulqader_Asiirii
Description
CONTEXT: The dataset follows hospital patients from the time they enter the hospital until they leave.

CONTENT:

Date: The day the patient visited

Medication Revenue: The revenue of the medication

Lab Cost: Lab cost paid by the patient

Consultation Revenue: Revenue of the consultation

Doctor Type: The type of doctor who treats the patient

Financial Class: Patient financial class

Patient Type: (OUTPATIENT)

Entry Time: Entered the (OUTPATIENT) & Hospital

Post-Consultation Time: When the doctor tells the patient to enter the clinic room

Completion Time: When the patient exits the clinic room or the building

Patient ID: The unique identity document

REQUIREMENTS: Does the patient type affect the waiting time? Is there a specific type of patient waiting a long time? Are we too busy? Do we have staffing issues? How long do patients wait before the doctor can see them? What type of staff do we need, and where do we need them? What days of the week are affected? How can we fix it?
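
The waiting-time questions above could be approached along the following lines. This is only a sketch: it treats the wait as the gap between Entry Time and Post-Consultation Time, assumes both are stored as `HH:MM:SS` strings on the same day (the description does not state the format), and uses made-up rows with placeholder Doctor Type values. The helper names `wait_minutes` and `mean_wait_by` are illustrative.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"  # assumed format; adjust to match the actual file


def wait_minutes(entry_time, post_consultation_time):
    """Minutes between entering and being called into the clinic room."""
    start = datetime.strptime(entry_time, TIME_FORMAT)
    end = datetime.strptime(post_consultation_time, TIME_FORMAT)
    return (end - start).total_seconds() / 60


def mean_wait_by(rows, key):
    """Average waiting time in minutes, grouped by the given column."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of minutes, count]
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += wait_minutes(row["Entry Time"], row["Post-Consultation Time"])
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up rows for illustration only; "A" and "B" are placeholder values.
sample = [
    {"Doctor Type": "A", "Entry Time": "08:00:00", "Post-Consultation Time": "08:30:00"},
    {"Doctor Type": "A", "Entry Time": "09:00:00", "Post-Consultation Time": "09:10:00"},
    {"Doctor Type": "B", "Entry Time": "10:00:00", "Post-Consultation Time": "10:45:00"},
]

print(mean_wait_by(sample, "Doctor Type"))
```

Grouping by Financial Class or by weekday of the Date column would follow the same pattern. Visits that cross midnight are not handled in this sketch.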

Please up-vote if you find this dataset helpful!🖤!
Dataset of health insurance portfolio
data.mendeley.com
Updated Dec 6, 2024
Cite
Josep Lledó (2024). Dataset of health insurance portfolio [Dataset]. http://doi.org/10.17632/386vmj2tbk.1
Explore at:
Unique identifier
https://doi.org/10.17632/386vmj2tbk.1
Dataset updated
Dec 6, 2024
Authors
Josep Lledó
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data is formatted as a spreadsheet covering the primary activities over three full years (2017, 2018 and 2019) of a non-life health insurance portfolio. The dataset comprises 228,711 rows and 42 columns. Each row represents an insured (individual) policy, and each column represents a distinct variable.
Health Care Provider Credential Data
data.wu.ac.at
data.wa.gov
+5more
csv, json, rdf, xml
Updated May 8, 2018
+ more versions
Cite
State of Washington (2018). Health Care Provider Credential Data [Dataset]. https://data.wu.ac.at/schema/data_gov/ZjgyODFjNWYtNzMyOC00MWUyLTk3ZTctMzQ1NGNhMGI0NmQy
Explore at:
Available download formats: rdf, csv, json, xml
Dataset updated
May 8, 2018
Dataset provided by
State of Washington
Description
The Washington State Department of Health presents this information as a service to the public. True and correct copies of legal disciplinary actions taken after July 1998 are available on our Provider Credential Search site. These records are considered certified by the Department of Health.

This includes information on health care providers.

Please contact our Customer Service Center at 360-236-4700 for information about actions before July 1998. The information on this site comes directly from our database and is updated daily at 10:00 a.m. This data is a primary source for verification of credentials and is extracted from the primary database at 2:00 a.m. daily.

News releases about disciplinary actions taken against Washington State healthcare providers, agencies or facilities are on the agency's Newsroom webpage.

Disclaimer: The absence of information in the Provider Credential Search system doesn't imply any recommendation, endorsement or guarantee of competence of any healthcare professional. The presence of information in this system doesn't imply a provider isn't competent or qualified to practice. The reader is encouraged to carefully evaluate any information found in this data set.
AI Training Dataset In Healthcare Market Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jun 20, 2025
Cite
Archive Market Research (2025). AI Training Dataset In Healthcare Market Report [Dataset]. https://www.archivemarketresearch.com/reports/ai-training-dataset-in-healthcare-market-5352
Explore at:
Available download formats: pdf, ppt, doc
Dataset updated
Jun 20, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
global
Variables measured
Market Size
Description
The AI Training Dataset In Healthcare Market was valued at USD 341.8 million in 2023 and is projected to reach USD 1464.13 million by 2032, exhibiting a CAGR of 23.1% during the forecast period. The growth is attributed to the rising adoption of AI in healthcare, increasing demand for accurate and reliable training datasets, government initiatives to promote AI in healthcare, and technological advancements in data collection and annotation. Healthcare AI training datasets are vital for building effective algorithms and for enhancing patient care and diagnosis in the industry. These datasets include large volumes of thoroughly labeled electronic health records, images such as X-ray and MRI scans, and genomics data. They help AI systems identify trends, make forecasts, and even develop unique approaches to treating disease. However, patient privacy and the ethical use of patient information are of the utmost importance, requiring high levels of anonymization and compliance with laws such as HIPAA. Ongoing expansion and variety of datasets are crucial to address existing bias and improve the efficiency of AI for different populations and diseases, providing safer solutions for global health.
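
The growth figures quoted above are related by the compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below only illustrates that arithmetic; the function name `cagr` is illustrative, and the number of compounding years is an assumption, because the description does not state which span the 23.1% figure covers (the listing gives 2025 - 2033 as the time period covered).

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate over the given number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


start_musd = 341.8    # USD million, 2023 valuation from the description
end_musd = 1464.13    # USD million, 2032 projection from the description

# Try a few assumed compounding spans and compare with the quoted rate.
for years in (7, 8, 9):
    print(f"{years} years: {cagr(start_musd, end_musd, years):.1%}")
```

Under these inputs, a 7-year span reproduces roughly the quoted 23.1%, while compounding over the full 9 years from 2023 to 2032 gives roughly 17.5%, so readers may want to confirm the forecast window with the publisher before reusing the rate.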
COVID-19 Reported Patient Impact and Hospital Capacity by Facility -- RAW
catalog.data.gov
healthdata.gov
+4more
Updated Jul 4, 2025
+ more versions
Cite
U.S. Department of Health and Human Services (2025). COVID-19 Reported Patient Impact and Hospital Capacity by Facility -- RAW [Dataset]. https://catalog.data.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-by-facility-raw
Explore at:
Dataset updated
Jul 4, 2025
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Description
After May 3, 2024, this dataset and webpage will no longer be updated because hospitals are no longer required to report data on COVID-19 hospital admissions, and hospital capacity and occupancy data, to HHS through CDC’s National Healthcare Safety Network. Data voluntarily reported to NHSN after May 1, 2024, will be available starting May 10, 2024, at COVID Data Tracker Hospitalizations. The following dataset provides facility-level data for hospital utilization aggregated on a weekly basis (Sunday to Saturday). These are derived from reports with facility-level granularity across two main sources: (1) HHS TeleTracking, and (2) reporting provided directly to HHS Protect by state/territorial health departments on behalf of their healthcare facilities. The hospital population includes all hospitals registered with Centers for Medicare & Medicaid Services (CMS) as of June 1, 2020. It includes non-CMS hospitals that have reported since July 15, 2020. It does not include psychiatric, rehabilitation, Indian Health Service (IHS) facilities, U.S. Department of Veterans Affairs (VA) facilities, Defense Health Agency (DHA) facilities, and religious non-medical facilities. For a given entry, the term “collection_week” signifies the start of the period that is aggregated. For example, a “collection_week” of 2020-11-15 means the average/sum/coverage of the elements captured from that given facility starting and including Sunday, November 15, 2020, and ending and including reports for Saturday, November 21, 2020. Reported elements include an append of either “_coverage”, “_sum”, or “_avg”. A “_coverage” append denotes how many times the facility reported that element during that collection week. A “_sum” append denotes the sum of the reports provided for that facility for that element during that collection week. A “_avg” append is the average of the reports provided for that facility for that element during that collection week. The file will be updated weekly. 
No statistical analysis is applied to impute non-response. For averages, calculations are based on the number of values collected for a given hospital in that collection week. Suppression is applied to the file for sums and averages less than four (4). In these cases, the field will be replaced with “-999,999”.

A story page was created to display both corrected and raw datasets and can be accessed at this link: https://healthdata.gov/stories/s/nhgk-5gpv

This data is preliminary and subject to change as more data become available. Data is available starting on July 31, 2020.

Sometimes, reports for a given facility will be provided to both HHS TeleTracking and HHS Protect. When this occurs, to ensure that there are not duplicate reports, deduplication is applied according to prioritization rules within HHS Protect.

For influenza fields listed in the file, the current HHS guidance marks these fields as optional. As a result, coverage of these elements varies.

For recent updates to the dataset, scroll to the bottom of the dataset description.

On May 3, 2021, the following fields were added to this data set:

hhs_ids

previous_day_admission_adult_covid_confirmed_7_day_coverage

previous_day_admission_pediatric_covid_confirmed_7_day_coverage

previous_day_admission_adult_covid_suspected_7_day_coverage

previous_day_admission_pediatric_covid_suspected_7_day_coverage

previous_week_personnel_covid_vaccinated_doses_administered_7_day_sum

total_personnel_covid_vaccinated_doses_none_7_day_sum

total_personnel_covid_vaccinated_doses_one_7_day_sum

total_personnel_covid_vaccinated_doses_all_7_day_sum

previous_week_patients_covid_vaccinated_doses_one_7_day_sum

previous_week_patients_covid_vaccinated_doses_all_
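
The conventions described for this file (the “-999,999” suppression placeholder, the `_coverage` / `_sum` / `_avg` suffixes, and the Sunday-to-Saturday collection week) might be handled as in the sketch below. The helper names are illustrative, and the assumption that the placeholder may appear with or without the thousands separator is mine, not the publisher's.

```python
from datetime import date, timedelta

SUPPRESSED = -999999  # written as "-999,999" in the dataset description


def clean_value(raw):
    """Parse a raw cell, returning None for empty or suppressed values."""
    text = raw.replace(",", "").strip()
    if text == "":
        return None
    value = float(text)
    return None if value == SUPPRESSED else value


def week_range(collection_week):
    """Return the (first day, last day) covered by a collection_week value."""
    start = date.fromisoformat(collection_week)
    return start, start + timedelta(days=6)


def split_suffix(field):
    """Split a field name into its base name and aggregation kind."""
    for suffix in ("_coverage", "_sum", "_avg"):
        if field.endswith(suffix):
            return field[: -len(suffix)], suffix[1:]
    return field, None


print(clean_value("-999,999"))
print(week_range("2020-11-15"))
print(split_suffix("total_personnel_covid_vaccinated_doses_one_7_day_sum"))
```

Mapping suppressed cells to `None` rather than zero keeps them out of any later sums or averages, since a suppressed value is known only to be less than four.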
Electronic Health Records (EHR) Datasets
shaip.com
json
Updated Apr 8, 2022
Cite
Shaip (2022). Electronic Health Records (EHR) Datasets [Dataset]. https://www.shaip.com/offerings/electronic-health-records-ehr-medical-data-catalog/
Explore at:
Available download formats: json
Dataset updated
Apr 8, 2022
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Get premium quality off-the-shelf EHR dataset to develop better performing machine learning models. Speak to our experts for Electronic Health Records data needs.
Data from: UK Health Accounts
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Apr 30, 2025
Cite
Office for National Statistics (2025). UK Health Accounts [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthcaresystem/datasets/healthaccountsreferencetables
Explore at:
Available download formats: xlsx
Dataset updated
Apr 30, 2025
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Area covered
United Kingdom
Description
UK healthcare expenditure data by financing scheme, function and provider, and additional analyses produced to internationally standardised definitions.
Med_Dataset
huggingface.co
Updated Jan 28, 2025
Cite
MedDataHub (2025). Med_Dataset [Dataset]. http://doi.org/10.57967/hf/4497
Explore at:
Unique identifier
https://doi.org/10.57967/hf/4497
Dataset updated
Jan 28, 2025
Dataset authored and provided by
MedDataHub
Description
Complete Dataset

The data shown below is the complete medical dataset. Access the complete dataset using the link below: Download Dataset

Support Us on Product Hunt and X!

Connect with Me on Happenstance

Join me on Happenstance! Click here to add me as a friend
Looking forward to connecting! For more information or assistance, feel free to contact us at harryjosh242@gmail.com.

short_description: Medical datasets for healthcare model training.… See the full description on the dataset page: https://huggingface.co/datasets/Med-dataset/Med_Dataset.
AHD: Arabic Healthcare Dataset
data.mendeley.com
Updated Sep 4, 2024
+ more versions
Cite
Hezam Gawbah (2024). AHD: Arabic Healthcare Dataset [Dataset]. http://doi.org/10.17632/mgj29ndgrk.6
Explore at:
Unique identifier
https://doi.org/10.17632/mgj29ndgrk.6
Dataset updated
Sep 4, 2024
Authors
Hezam Gawbah
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A growing body of language-centric research on healthcare is conducted every day. To address shortcomings of Arabic natural language generation models, we introduce a large Arabic Healthcare Dataset of textual data, which we named ‘AHD’.

To our knowledge, AHD is the largest Arabic healthcare dataset. It was collected from the altibbi website.

The AHD consists of more than 808k question-and-answer pairs in 90 categories. The AHD contains one file of actual data, which is in Arabic; the file descriptions follow.

The AHD.xlsx file contains the dataset in Excel format, with the question, answer, and category in Arabic.

The AHD_english.xlsx file contains the dataset in Excel format, with the question, answer, and category translated to English.

Distribution of Question and Answer per category.xlsex shows the distribution of the data set by category.
MULTI-SITE EVALUATION OF A DATA QUALITY TOOL FOR BIG DATA IN HEALTHCARE
figshare.com
xlsx
Updated Jan 20, 2016
Cite
Vojtech Huser (2016). MULTI-SITE EVALUATION OF A DATA QUALITY TOOL FOR BIG DATA IN HEALTHCARE [Dataset]. http://doi.org/10.6084/m9.figshare.1497942.v4
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.1497942.v4
Dataset updated
Jan 20, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Vojtech Huser
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Evaluation of data quality in large healthcare datasets.

Abstract: Data quality and fitness for analysis are crucial if outputs of big data analyses are to be trusted by the public and the research community. Here we analyze the output from a data quality tool called Achilles Heel as it was applied to 24 datasets across seven different organizations. We highlight 12 data quality rules that identified issues in at least 10 of the 24 datasets and provide a full set of 71 rules identified in at least one dataset. Achilles Heel is developed by the Observational Health Data Sciences and Informatics (OHDSI) community and is freely available software that provides a useful starter set of data quality rules. Our analysis represents the first data quality comparison of multiple datasets across several countries in America, Europe and Asia.
Equity in Healthcare Clean DataSets
kaggle.com
Updated Feb 21, 2024
Cite
Anopsy (2024). Equity in Healthcare Clean DataSets [Dataset]. https://www.kaggle.com/datasets/anopsy/equity-in-healthcare-clean-datasets
Explore at:
Dataset updated
Feb 21, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Anopsy
Description
This dataset is based on train and test dataset from this competition: https://www.kaggle.com/competitions/widsdatathon2024-challenge1 .

What did I change?

1. I dropped 2 columns that contained too little data.

2. Using machine learning, I imputed "payer_type", "patient_race" and "bmi".

3. Using "patient_zip3", I filled missing values in "patient_state", "Region" and "Division".

4. Using SimpleImputer, I imputed a few missing numeric values in "Ozone", "PM2.5" and other columns.

5. I created some new features, based on demographic features, that may be a bit more informative.

6. I tokenized the 'breast_cancer_diagnosis_desc' column.

If you're interested in how I did that, check this notebook: https://www.kaggle.com/code/anopsy/ml-for-missing-values and, for "bmi" and the new features, this one: https://www.kaggle.com/code/anopsy/fe-and-xgb-on-clean-data
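
As a rough, dependency-free illustration of two of the cleaning steps listed above (filling "patient_state" from "patient_zip3" and imputing missing numeric values), here is a minimal sketch. It is not the code from the linked notebooks: the helper names and sample rows are made up, the majority-vote lookup is an assumed way to map zip3 to state, and median filling is an assumed strategy standing in for whatever SimpleImputer configuration the author used.

```python
from collections import Counter
from statistics import median

def fill_state_from_zip3(rows):
    """Fill missing patient_state with the most common state for the same zip3."""
    votes = {}
    for r in rows:
        if r["patient_state"] is not None:
            votes.setdefault(r["patient_zip3"], Counter())[r["patient_state"]] += 1
    lookup = {z: c.most_common(1)[0][0] for z, c in votes.items()}
    filled = []
    for r in rows:
        state = r["patient_state"]
        if state is None:
            # Stays None when no other row shares this zip3.
            state = lookup.get(r["patient_zip3"])
        filled.append({**r, "patient_state": state})
    return filled

def impute_median(rows, column):
    """Replace missing values in `column` with the median of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

# Hypothetical sample rows using column names from the dataset description.
rows = [
    {"patient_zip3": "100", "patient_state": "NY", "Ozone": 40.0},
    {"patient_zip3": "100", "patient_state": None, "Ozone": None},
    {"patient_zip3": "941", "patient_state": "CA", "Ozone": 36.0},
    {"patient_zip3": "941", "patient_state": "CA", "Ozone": 38.0},
]

cleaned = impute_median(fill_state_from_zip3(rows), "Ozone")
```

Both helpers return new row dictionaries rather than mutating the input, so the original rows remain available for comparison.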

According to the description of the original dataset, it's a "39k record dataset (split into training and test sets) representing patients and their characteristics (age, race, BMI, zip code), their diagnosis and treatment information (breast cancer diagnosis code, metastatic cancer diagnosis code, metastatic cancer treatments, … etc.), their geo (zip-code level) demographic data (income, education, rent, race, poverty, …etc), as well as toxic air quality data (Ozone, PM25 and NO2)."
Hospital Service Area
catalog.data.gov
healthdata.gov
Updated Sep 14, 2025
Cite
Centers for Medicare & Medicaid Services (2025). Hospital Service Area [Dataset]. https://catalog.data.gov/dataset/hospital-service-area-af6dc
Explore at:
Dataset updated
Sep 14, 2025
Dataset provided by
Centers for Medicare & Medicaid Services
Description
The Hospital Service Area data is a summary of calendar year Medicare inpatient hospital fee-for-service and Medicare Advantage claims data. It contains number of discharges, total days of care, and total charges summarized by hospital provider number and the ZIP code of the Medicare beneficiary. Note: This full dataset contains more records than most spreadsheet programs can handle, which will result in an incomplete load of data. Use of a database or statistical software is required.
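
Because the full file is too large for most spreadsheet programs, one option is to stream it row by row and aggregate as you go. The sketch below shows that pattern with Python's standard csv module. The column names ("provider_id", "zip_code", "total_charges") and the sample rows are assumptions made for illustration; check the actual header of the downloaded file before using it.

```python
import csv
import io
from collections import defaultdict

def total_by_provider(lines, key="provider_id", value="total_charges"):
    """Sum `value` per `key` while reading one row at a time."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key]] += float(row[value])
    return dict(totals)

# Hypothetical stand-in for the real file; column names are assumed.
sample = io.StringIO(
    "provider_id,zip_code,total_charges\n"
    "010001,36301,1500.0\n"
    "010001,36303,2500.0\n"
    "010005,35957,800.0\n"
)

totals = total_by_provider(sample)
```

For a real download, passing an open file handle (opened with newline="") in place of the in-memory sample keeps memory use proportional to the number of providers rather than the number of rows.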
EMRBots: a 100-patient database
figshare.com
data.mendeley.com
zip
Updated Sep 3, 2018
Cite
Uri Kartoun (2018). EMRBots: a 100-patient database [Dataset]. http://doi.org/10.6084/m9.figshare.7040039.v3
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.7040039.v3
Dataset updated
Sep 3, 2018
Dataset provided by
Figshare (http://figshare.com/)
Authors
Uri Kartoun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A database of 100 virtual patients, with 372 admissions and 111,483 lab observations in total.

Synthetic Healthcare Database for Research (SyH-DR)

9 scholarly articles cite this dataset (View in Google Scholar)
