91 datasets found

CMS Synthetic Patient Data OMOP
redivis.com
Updated Feb 24, 2025
Cite
(2025). CMS Synthetic Patient Data OMOP [Dataset]. https://redivis.com/workflows/6e6p-cfgn5hgz1
Explore at:
Dataset updated
Feb 24, 2025
Description
This is a synthetic patient dataset in the OMOP Common Data Model v5.2, originally released by the CMS and accessed via BigQuery. The dataset includes 24 tables and records for 2 million synthetic patients from 2008 to 2010.
Synthea synthetic patient generator data in OMOP Common Data Model
registry.opendata.aws
Updated Jan 4, 2023
Cite
Amazon Web Services (2023). Synthea synthetic patient generator data in OMOP Common Data Model [Dataset]. https://registry.opendata.aws/synthea-omop/
Explore at:
Dataset updated
Jan 4, 2023
Dataset provided by
Amazon.com (http://amazon.com/)
Description
The Synthea-generated data is provided here as 1,000-person (1k), 100,000-person (100k), and 2,800,000-person (2.8m) data sets in the OMOP Common Data Model format. Synthea™ is a synthetic patient generator that models the medical history of synthetic patients. Our mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government (although a citation would be appreciated). You can read our first academic paper here: https://doi.org/10.1093/jamia/ocx079
Synthetic Patient Data in OMOP
console.cloud.google.com
Updated Jun 25, 2020
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Department%20of%20Health%20%26%20Human%20Services&inv=1&invt=Ab2v-Q (2020). Synthetic Patient Data in OMOP [Dataset]. https://console.cloud.google.com/marketplace/product/hhs/synpuf
Explore at:
Dataset updated
Jun 25, 2020
Dataset provided by
Google (http://google.com/)
Description
The Synthetic Patient Data in OMOP dataset is a synthetic database released by the Centers for Medicare and Medicaid Services (CMS) as the Medicare Claims Synthetic Public Use Files (SynPUF). It is synthetic data containing 2008-2010 Medicare insurance claims for development and demonstration purposes. It has been converted to the Observational Medical Outcomes Partnership (OMOP) common data model from its original form, CSV, by the open source community, as released on GitHub. Please refer to the CMS Linkable 2008–2010 Medicare Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) User Manual for details regarding how DE-SynPUF was created. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
Domain
redivis.com
Updated Sep 6, 2020
Cite
Redivis Demo Organization (2020). Domain [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Dataset updated
Sep 6, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables.
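As a rough illustration of how a Domain constrains which Concepts a standardized field may hold, the sketch below builds a tiny in-memory SQLite database. The table and column names (domain, concept, domain_id, concept_id) follow the OMOP CDM in simplified form; the rows, the concept IDs and the helper `concept_in_domain` are invented for this example.

```python
import sqlite3

# Toy in-memory database; every row below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domain (domain_id TEXT PRIMARY KEY, domain_name TEXT);
CREATE TABLE concept (
    concept_id   INTEGER PRIMARY KEY,
    concept_name TEXT,
    domain_id    TEXT REFERENCES domain(domain_id)
);
INSERT INTO domain VALUES ('Condition', 'Condition'), ('Drug', 'Drug');
INSERT INTO concept VALUES
    (1, 'Example condition concept', 'Condition'),
    (2, 'Example drug concept', 'Drug');
""")


def concept_in_domain(concept_id: int, domain_id: str) -> bool:
    """Return True if the concept exists and belongs to the given domain."""
    row = conn.execute(
        "SELECT 1 FROM concept WHERE concept_id = ? AND domain_id = ?",
        (concept_id, domain_id),
    ).fetchone()
    return row is not None


# A condition field should only accept concepts from the Condition domain.
print(concept_in_domain(1, "Condition"))  # True
print(concept_in_domain(2, "Condition"))  # False
```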
Cost
redivis.com
Updated Sep 7, 2020
Cite
Cost [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Dataset updated
Sep 7, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The COST table captures records containing the cost of any medical event recorded in one of the OMOP clinical event tables such as DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, VISIT_OCCURRENCE, VISIT_DETAIL, DEVICE_OCCURRENCE, OBSERVATION or MEASUREMENT.
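To make the link between COST and the clinical event tables concrete, here is a minimal sketch using an in-memory SQLite database. It assumes the CDM convention that a cost row points at its event through cost_event_id, with cost_domain_id saying which event table that ID belongs to; the tables are cut down to a few columns and all values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit_occurrence (visit_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER);
CREATE TABLE drug_exposure   (drug_exposure_id   INTEGER PRIMARY KEY, person_id INTEGER);
CREATE TABLE cost (
    cost_id        INTEGER PRIMARY KEY,
    cost_event_id  INTEGER,   -- ID of the row in the event table
    cost_domain_id TEXT,      -- which event table the ID refers to
    total_cost     REAL
);
-- The same numeric ID (10) exists in two event tables, so the domain matters.
INSERT INTO visit_occurrence VALUES (10, 1);
INSERT INTO drug_exposure   VALUES (10, 2);
INSERT INTO cost VALUES (100, 10, 'Visit', 250.0), (101, 10, 'Drug', 12.5);
""")


def visit_cost_per_person():
    """Sum visit costs per person, ignoring costs from other domains."""
    return conn.execute("""
        SELECT v.person_id, SUM(c.total_cost)
        FROM cost AS c
        JOIN visit_occurrence AS v
          ON c.cost_event_id = v.visit_occurrence_id
        WHERE c.cost_domain_id = 'Visit'
        GROUP BY v.person_id
    """).fetchall()


print(visit_cost_per_person())  # [(1, 250.0)]
```

Filtering on cost_domain_id is what keeps the drug cost for event 10 from being attributed to visit 10.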
CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) in...
registry.opendata.aws
Updated Jan 18, 2023
Cite
CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) in OMOP Common Data Model [Dataset]. https://registry.opendata.aws/cmsdesynpuf-omop/
Explore at:
Dataset updated
Jan 18, 2023
Dataset provided by
Amazon.com (http://amazon.com/)
Description
DE-SynPUF is provided here as 1,000-person (1k), 100,000-person (100k), and 2,300,000-person (2.3m) data sets in the OMOP Common Data Model format. The DE-SynPUF was created with the goal of providing a realistic set of claims data in the public domain while providing the very highest degree of protection to the Medicare beneficiaries’ protected health information. The purposes of the DE-SynPUF are to:

allow data entrepreneurs to develop and create software and applications that may eventually be applied to actual CMS claims data;

train researchers on the use and complexity of conducting analyses with CMS claims data prior to initiating the process to obtain access to actual CMS data; and

support safe data mining innovations that may reveal unanticipated knowledge gains while preserving beneficiary privacy.

The files have been designed so that programs and procedures created on the DE-SynPUF will function on CMS Limited Data Sets. The data structure of the Medicare DE-SynPUF is very similar to the CMS Limited Data Sets, but with a smaller number of variables. The DE-SynPUF also provides a robust set of metadata on the CMS claims data that have not been previously available in the public domain. Although the DE-SynPUF has very limited inferential research value to draw conclusions about Medicare beneficiaries due to the synthetic processes used to create the file, the Medicare DE-SynPUF does increase access to a realistic Medicare claims data file in a timely and less expensive manner to spur the innovation necessary to achieve the goals of better care for beneficiaries and improve the health of the population.
Example (synthetic) electronic health record data
rdr.ucl.ac.uk
application/csv
Updated Apr 24, 2024
Cite
Steve Harris; Wai Shing Lai (2024). Example (synthetic) electronic health record data [Dataset]. http://doi.org/10.5522/04/25676298.v1
Explore at:
Available download formats: application/csv
Unique identifier
https://doi.org/10.5522/04/25676298.v1
Dataset updated
Apr 24, 2024
Dataset provided by
University College London
Authors
Steve Harris; Wai Shing Lai
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
These data are modelled using the OMOP Common Data Model v5.3.

Correlated Data Source
NG tube vocabularies

Generation Rules
The patient’s age should be between 18 and 100 at the moment of the visit.
Ethnicity data is using 2021 census data in England and Wales (Census in England and Wales 2021).
Gender is equally distributed between Male and Female (50% each).
Every person in the record has a link in procedure_occurrence with the concept “Checking the position of nasogastric tube using X-ray”.
2% of person records have a link in procedure_occurrence with the concept of “Plain chest X-ray”.
60% of visit_occurrence has visit concept “Inpatient Visit”, while 40% have “Emergency Room Visit”.

Notes
Version 0
Generated by man-made rule/story generator
Structural correct, all tables linked with the relationship
We used national ethnicity data to generate a realistic distribution (see below)

2011 Race Census figure in England and Wales
Ethnic Group: Population (%)
Asian or Asian British: Bangladeshi - 1.1
Asian or Asian British: Chinese - 0.7
Asian or Asian British: Indian - 3.1
Asian or Asian British: Pakistani - 2.7
Asian or Asian British: any other Asian background - 1.6
Black or African or Caribbean or Black British: African - 2.5
Black or African or Caribbean or Black British: Caribbean - 1
Black or African or Caribbean or Black British: other Black or African or Caribbean background - 0.5
Mixed multiple ethnic groups: White and Asian - 0.8
Mixed multiple ethnic groups: White and Black African - 0.4
Mixed multiple ethnic groups: White and Black Caribbean - 0.9
Mixed multiple ethnic groups: any other Mixed or multiple ethnic background - 0.8
White: English or Welsh or Scottish or Northern Irish or British - 74.4
White: Irish - 0.9
White: Gypsy or Irish Traveller - 0.1
White: any other White background - 6.4
Other ethnic group: any other ethnic group - 1.6
Other ethnic group: Arab - 0.6
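The generation rules in this description can be sketched as a small rule-based sampler. This is not the authors' generator: the function `generate_person`, the record layout and the seed are assumptions made for illustration, and the ethnicity distribution is left out to keep the example short.

```python
import random

NG_TUBE_CHECK = "Checking the position of nasogastric tube using X-ray"
CHEST_XRAY = "Plain chest X-ray"


def generate_person(rng: random.Random) -> dict:
    """Generate one synthetic person record following the listed rules."""
    procedures = [NG_TUBE_CHECK]   # every person has this procedure
    if rng.random() < 0.02:        # about 2% also have a plain chest X-ray
        procedures.append(CHEST_XRAY)
    return {
        "age_at_visit": rng.randint(18, 100),      # inclusive bounds
        "gender": rng.choice(["Male", "Female"]),  # 50% each
        "visit_concept": rng.choices(
            ["Inpatient Visit", "Emergency Room Visit"], weights=[60, 40]
        )[0],
        "procedures": procedures,
    }


rng = random.Random(42)  # fixed seed so repeated runs give the same records
people = [generate_person(rng) for _ in range(10_000)]
inpatient_share = sum(
    p["visit_concept"] == "Inpatient Visit" for p in people
) / len(people)
print(f"Inpatient share: {inpatient_share:.2f}")
```

Because the rules are probabilistic, the observed shares only approximate the 60/40 and 2% targets; they get closer as the number of generated records grows.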
OMOP dataset: Hospital COVID patients: severity, acuity, therapies, outcomes...
healthdatagateway.org
unknown
Cite
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158), OMOP dataset: Hospital COVID patients: severity, acuity, therapies, outcomes [Dataset]. https://healthdatagateway.org/dataset/139
Explore at:
Available download formats: unknown
Dataset authored and provided by
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
License
https://www.pioneerdatahub.co.uk/data/data-request-process/
Description
OMOP dataset: Hospital COVID patients: severity, acuity, therapies, outcomes Dataset number 2.0

Coronavirus disease 2019 (COVID-19) was identified in January 2020. Currently, there have been more than 6 million cases & more than 1.5 million deaths worldwide. Some individuals experience severe manifestations of infection, including viral pneumonia, adult respiratory distress syndrome (ARDS) & death. There is a pressing need for tools to stratify patients, to identify those at greatest risk. Acuity scores are composite scores which help identify patients who are more unwell to support & prioritise clinical care. There are no validated acuity scores for COVID-19 & it is unclear whether standard tools are accurate enough to provide this support. This secondary care COVID OMOP dataset contains granular demographic, morbidity, serial acuity and outcome data to inform risk prediction tools in COVID-19.

PIONEER geography The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix. There is a higher than average percentage of minority ethnic groups. WM has a large number of elderly residents but is the youngest population in the UK. Each day >100,000 people are treated in hospital, see their GP or are cared for by the NHS. The West Midlands was one of the hardest hit regions for COVID admissions in both wave 1 & 2.

EHR. University Hospitals Birmingham NHS Foundation Trust (UHB) is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & 100 ITU beds. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”. UHB has cared for >5000 COVID admissions to date. This is a subset of data in OMOP format.

Scope: All COVID swab confirmed hospitalised patients to UHB from January – August 2020. The dataset includes highly granular patient demographics & co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to care process (timings, staff grades, specialty review, wards), presenting complaint, acuity, all physiology readings (pulse, blood pressure, respiratory rate, oxygen saturations), all blood results, microbiology, all prescribed & administered treatments (fluids, antibiotics, inotropes, vasopressors, organ support), all outcomes.

Available supplementary data: Health data preceding & following admission event. Matched “non-COVID” controls; ambulance, 111, 999 data, synthetic data. Further OMOP data available as an additional service.

Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
CPRD Primary Care and Linked Data OMOP Common Data Model
healthdatagateway.org
unknown
Updated Dec 15, 2024
Cite
CPRD NHS England (2024). CPRD Primary Care and Linked Data OMOP Common Data Model [Dataset]. http://doi.org/10.48329/cyhc-9068
Explore at:
Available download formats: unknown
Unique identifier
https://doi.org/10.48329/cyhc-9068
Dataset updated
Dec 15, 2024
Dataset provided by
National Health Service (https://www.nhs.uk/)
Authors
CPRD NHS England
License
HTTPS://CPRD.COM/DATA-ACCESS
Description
The CPRD Primary Care and Linked Data OMOP CDM database contains longitudinal routinely-collected health records (EHR data) from UK primary care practices, and hospital episode data provided by NHS England. The data has been transformed into a common format (data model) using an open community data standard and structure from the OHDSI standardised vocabularies. The approach allows organisation, standardisation and common representation of medical terms and variables that have been obtained from various clinical data sources. Access to anonymised data from CPRD is subject to a full licence agreement containing detailed terms and conditions of use. Anonymised patient datasets can be extracted for researchers against specific study specifications, following protocol approval.
Austin Health OMOP Dataset
researchdata.edu.au
figshare.unimelb.edu.au
Updated Dec 21, 2023
Cite
ROGER WARD; GRAEME HART (2023). Austin Health OMOP Dataset [Dataset]. http://doi.org/10.26188/24562789.V2
Explore at:
Unique identifier
https://doi.org/10.26188/24562789.V2
Dataset updated
Dec 21, 2023
Dataset provided by
The University of Melbourne
Authors
ROGER WARD; GRAEME HART
Description
The Austin Health Dataset is an OMOP dataset based on records held at Austin Health.

The data is derived from an Electronic Medical Records System held in Cerner.

While the data is not open access, researchers can enquire about access subject to ethics and governance approvals.

The dataset is based on Version 5.4 of the OMOP Common data model.

Specifications of data model:

https://ohdsi.github.io/CommonDataModel/cdm54.html

Citation guidance here: https://ohdsi-australia.org/EMR2OMOP.html
Data custodian: Dr Graeme Hart, Melbourne University
Provider
redivis.com
Updated Sep 7, 2020
Cite
Redivis Demo Organization (2020). Provider [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Dataset updated
Sep 7, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The PROVIDER table contains a list of uniquely identified healthcare providers. These are individuals providing hands-on healthcare to patients, such as physicians, nurses, midwives, physical therapists etc.
OMOP primary database assessment of risk.
figshare.com
xls
Updated Apr 18, 2024
Cite
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). OMOP primary database assessment of risk. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301557.t002
Dataset updated
Apr 18, 2024
Dataset provided by
PLOS ONE
Authors
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Background
The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

Methods
We used standard structured query language (SQL) to construct, extract, transform, and load scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.

Results
Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ‘FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.

Conclusion
The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.
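The pass/fail rule described in the results (a check fails when the percentage of non-compliant rows exceeds its threshold) can be sketched as follows. This is an illustration of the rule only, not the Data Quality Dashboard's own code; the function names, the handling of an empty table and the sample numbers are assumptions.

```python
def check_status(violated_rows: int, total_rows: int, threshold_pct: float) -> str:
    """Return 'FAIL' when the share of non-compliant rows exceeds the threshold."""
    if total_rows == 0:
        return "PASS"  # assumption: a check with no rows to inspect passes
    pct_violated = 100.0 * violated_rows / total_rows
    return "FAIL" if pct_violated > threshold_pct else "PASS"


def pass_rate(statuses: list[str]) -> float:
    """Percentage of checks that passed."""
    return 100.0 * statuses.count("PASS") / len(statuses)


# Invented checks: (non-compliant rows, total rows, threshold in percent).
checks = [
    (0, 1000, 0.0),   # 0% violated, threshold 0%  -> PASS (not exceeded)
    (30, 1000, 5.0),  # 3% violated, threshold 5%  -> PASS
    (80, 1000, 5.0),  # 8% violated, threshold 5%  -> FAIL
    (5, 1000, 1.0),   # 0.5% violated, threshold 1% -> PASS
]
statuses = [check_status(*c) for c in checks]
print(statuses, pass_rate(statuses))
```

Note the strict comparison: a check whose violation rate equals its threshold still passes, which matches the wording "exceeded the specified threshold value".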
Vocabulary
redivis.com
Updated Sep 6, 2020
Cite
Redivis Demo Organization (2020). Vocabulary [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Dataset updated
Sep 6, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The VOCABULARY table includes a list of the Vocabularies collected from various sources or created de novo by the OMOP community. This reference table is populated with a single record for each Vocabulary source and includes a descriptive name and other associated attributes for the Vocabulary.
Connected Bradford - Secondary Care BRI OMOP database
healthdatagateway.org
unknown
Updated Jan 31, 2025
Cite
Connected Bradford. Yorkshire & Humber Secure Data Environment. (2025). Connected Bradford - Secondary Care BRI OMOP database [Dataset]. https://healthdatagateway.org/en/dataset/1101
Explore at:
Available download formats: unknown
Dataset updated
Jan 31, 2025
Dataset authored and provided by
Connected Bradford. Yorkshire & Humber Secure Data Environment.
License
https://bradfordresearch.nhs.uk/connected-bradford/
Description
This dataset is an extract from the Bradford Royal Infirmary EPR system. It contains current and some historical data, produced by extracting the relevant tables from the EPR, mapping them to the OMOP schema, and outputting them in OMOP CDM 5.3 format.
Western Health OMOP Dataset
researchdata.edu.au
figshare.unimelb.edu.au
Updated Dec 21, 2023
Cite
Western Health OMOP Dataset [Dataset]. https://researchdata.edu.au/western-health-omop-dataset/2837361
Explore at:
Unique identifier
https://doi.org/10.26188/24597273.V2
Dataset updated
Dec 21, 2023
Dataset provided by
The University of Melbourne
Authors
ROGER WARD; BILL KARANATSIOS
Description
The Western Health Dataset is an OMOP dataset based on records held at Western Health.

The data is derived from an Electronic Medical Records System held in Cerner.

While the data is not open access, researchers can enquire about access subject to ethics and governance approvals.

The dataset is based on Version 5.4 of the OMOP Common data model.

Specifications of data model:

https://ohdsi.github.io/CommonDataModel/cdm54.html

Data custodian: Bill Karanatsios, Western Health

Citation guidance here: https://ohdsi-australia.org/EMR2OMOP.html
Payer plan period
redivis.com
Updated Sep 6, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Redivis Demo Organization (2020). Payer plan period [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Dataset updated
Sep 6, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The PAYER_PLAN_PERIOD table captures details of the period of time that a Person is continuously enrolled under a specific health Plan benefit structure from a given Payer.
EMR tables and related tables in the OMOP CDM.
figshare.com
xls
Updated Apr 18, 2024
Cite
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). EMR tables and related tables in the OMOP CDM. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301557.t004
Dataset updated
Apr 18, 2024
Dataset provided by
PLOS ONE
Authors
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Background
The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

Methods
We used standard structured query language (SQL) to construct, extract, transform, and load scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.

Results
Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ‘FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.

Conclusion
The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.

Addressing the Challenges of Health Data Standard Adoption and Usage: A...

zenodo.org

bin

Updated May 12, 2025

Cite

Alberto Marfoglia; Valerio Antonio Arcobelli; SERENA MOSCATO; Antonino Amedeo La Mattina; Sabato Mellone; ANTONELLA CARBONARO (2025). Addressing the Challenges of Health Data Standard Adoption and Usage: A Systematic Review - Data Extraction [Dataset]. http://doi.org/10.5281/zenodo.15358180

Explore at:

Available download formats: bin

Unique identifier

https://doi.org/10.5281/zenodo.15358180

Dataset updated

May 12, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Time period covered

May 7, 2025

Description

This table presents the data extraction from the 99 studies included according to the criteria outlined in the main manuscript. It is provided as supplementary material to enhance the readability of the paper while ensuring that all relevant information is preserved and accessible without loss of detail.

The names of the variables and their descriptions are provided in the attached file, along with the following details:

Variable		Description
Ref.		The citation in the format: First author et al. [Year] (e.g., AuthorA et al. [2022]). This identifies the study's primary citation for easy reference.
Title		The title of the paper
Standard		The healthcare data standard used in the study. Possible values are: OMOP, OpenEHR, FHIR.
Study Location		The country where the study was conducted.
Objective for using the standard	Detailed	The comprehensive explanation of the specific objective of using the standard in the study, describing how it supports the study’s goals.
	Short	The primary purpose for applying the healthcare standard. Possible values are: Secondary data reuse, Data exchange, Clinical decision support, Vocabulary definition, EHR system design,
Application domain	Type	The type of application domain in which the healthcare standard is used. Possible values are: Clinical (studies with a direct impact on clinical practice, applying established tools or methods in healthcare settings, e.g., predicting in-hospital mortality for heart attack patients) and Research (studies proposing innovative tools, methodologies, or frameworks still in the design/testing phase, not yet clinically implemented).
	Healthcare Area	The relevant healthcare domain for the study, such as Cardiovascular, Intensive Care Unit, Emergency Department, Oncology, Biology, etc.
	Cluster	The healthcare domain, clustered for easier readability. Possible values include: Clinical Medicine, Clinical Services and Diagnostics, Public Health, Health Information Management and Biomedical Sciences.
	Use	Whether the results of the paper serve a Primary use (direct care) or a Secondary use (repurposing existing data or tools for new objectives).
Scale		The scale of the study. Possible values are: Single center (one hospital/clinic), Multi-center (multiple institutions), Regional (specific region), National level (countrywide).
Dataset magnitude in patients		The magnitude of the dataset, expressed as a letter code. Possible values are: A (<10 to 99), B (100 to 9,999), C (10,000 to 999,999) and D (1,000,000 and above).
N° Elements		The number of input variables in the standardization process.
Percentage of mapped variables		The percentage of variables that were successfully standardised.
Coverage of the standard		The standardisation methodology, i.e. whether the standard was adapted or not.
ETL Tools	Data cleaning & extraction	The tools adopted for supporting data cleaning and extraction.
	Mapping	The tools adopted for the mapping of the variables.
	Validation	The tools adopted for the validation of the standardization process.
	Database	The database adopted for storing the result of the healthcare data standardization.
Process efficiency and Economic assessment		The information about the economic impact if the consequences are concrete and measured by the authors (e.g., actual cost savings, resource usage reductions). If the authors did not measure the economic impact, this field remains blank.
Comments by authors	Limitations	The significant limitations or challenges faced during the study about the standard adopted, such as issues with data compatibility, scalability, or the need for customization.
	Advantages	The benefits of applying the standard model, such as improved data consistency, enhanced clinical outcomes, better interoperability, or more efficient workflows.
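The letter codes used for the "Dataset magnitude in patients" variable can be expressed as a simple binning function. This is a sketch under one assumption: category A is read as covering every count below 100, since the table gives its range as "<10 to 99". The function name is hypothetical.

```python
def dataset_magnitude(n_patients: int) -> str:
    """Map a patient count to the A-D magnitude code from the table."""
    if n_patients < 0:
        raise ValueError("patient count cannot be negative")
    if n_patients < 100:        # assumption: A covers everything below 100
        return "A"
    if n_patients < 10_000:     # 100 to 9,999
        return "B"
    if n_patients < 1_000_000:  # 10,000 to 999,999
        return "C"
    return "D"                  # 1,000,000 and above


for n in (50, 100, 9_999, 10_000, 2_030_000):
    print(n, dataset_magnitude(n))
```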

University College London Hospitals NHS OMOP dataset
healthdatagateway.org
unknown
Updated May 28, 2025
Cite
(2025). University College London Hospitals NHS OMOP dataset [Dataset]. https://healthdatagateway.org/dataset/1336
Explore at:
Available download formats: unknown
Dataset updated
May 28, 2025
License
https://safehr-data.org/
Description
UCLH has an OMOP extraction system (omop_es) that connects our Electronic Health Record (EHR) to an architecture that delivers high quality, standardised extracts meeting the OMOP CDM standards. Our EHR contains records for 6 million patients, 13 million diagnoses and 50 million medication events. These derive from the UCLH patient population which includes national referrals for tertiary and quaternary services (cancer, neurology etc.) and general medical admissions from an inner city teaching hospital that treats >1m outpatients per year, and has >100k inpatient admissions.

UCLH has invested efforts and expertise to align international terminology systems e.g. SNOMED CT, LOINC, UCUM with NHS data standards, during EHR system build and post implementation. Our standardisation work has covered clinical domains i.e. Diagnosis and past medical history, Surgical and Ambulatory procedures, Diagnostic Imaging, Cardiac Echo, Lab Medicine including Biochemistry, Haematology, Microbiology, Immunology, Virology, Allergens, Medications (including route of administration); and Demographic information like Religion, Ethnicity. For some domains (e.g. diagnosis and surgical procedures) we have achieved 100% standardisation, others are an ongoing task.

Our data pipeline, the OMOP-Extraction System (OMOP-ES) is a modular, re-usable architecture written in over 20,000 lines of R. Extractions proceed through four stages.

Standardisation - translates source data to OMOP concepts at full fidelity

Projection - applies rules to redact, filter, transform & link

Post-processing - allows linking of de-identified non-OMOP data

Output - multiple formats & destinations incl. CSV, Parquet or SQLite for direct use or import in a TRE

The system:
● is configurable to a variety of OMOP projects via a settings file
● is reproducible and automated
● queries EPIC EHR and other sources
● automates filtering of sensitive data, with safe defaults and the ability for Information Governance teams to inspect settings before and after running
● tests and reports the quality of standardisation
● is being extended both by the 'core' team and by other trusts in an inner source fashion
● has a small mock database for system development and testing
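
The four extraction stages described above (standardisation, projection, post-processing, output) can be pictured as a chain of functions. The Python sketch below is only an illustration under stated assumptions: it is not the UCLH OMOP-ES code (which is written in R), and every name in it (`CONCEPT_MAP`, `SENSITIVE_FIELDS`, `LINKED_DATA`, `standardise`, `project`, `post_process`, `to_csv`, `run_extraction`) and every value is hypothetical. It relies on the OMOP convention that concept_id 0 marks an unmapped source value.

```python
import csv
import io
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical lookup from source codes to OMOP concept ids (made-up values).
CONCEPT_MAP = {"DX-001": 1001, "DX-002": 1002}
# Hypothetical projection rule: fields to redact before release.
SENSITIVE_FIELDS = {"name"}
# Hypothetical de-identified non-OMOP data, keyed by person_id.
LINKED_DATA = {1: {"cohort": "demo"}}


def standardise(records: List[Record]) -> List[Record]:
    """Stage 1: attach an OMOP concept id (0 when no mapping exists)."""
    return [{**r, "concept_id": CONCEPT_MAP.get(r["source_code"], 0)}
            for r in records]


def project(records: List[Record]) -> List[Record]:
    """Stage 2: drop unmapped rows and redact sensitive fields."""
    return [{k: v for k, v in r.items() if k not in SENSITIVE_FIELDS}
            for r in records if r["concept_id"] != 0]


def post_process(records: List[Record]) -> List[Record]:
    """Stage 3: link extra de-identified data by person_id."""
    return [{**r, **LINKED_DATA.get(r["person_id"], {})} for r in records]


def to_csv(records: List[Record]) -> str:
    """Stage 4: serialise to CSV text (one of several possible outputs)."""
    if not records:
        return ""
    fieldnames = sorted({key for r in records for key in r})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def run_extraction(records: List[Record]) -> str:
    """Run the four stages in order and return the CSV text."""
    return to_csv(post_process(project(standardise(records))))
```

Keeping each stage as a separate function mirrors the modular design the description mentions: a project could swap in different projection rules or a different output format without touching the other stages.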
ARCH ontologies and terminologies vs OMOP.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Jeffrey G. Klann; Matthew A. H. Joss; Kevin Embree; Shawn N. Murphy (2023). ARCH ontologies and terminologies vs OMOP. [Dataset]. http://doi.org/10.1371/journal.pone.0212463.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0212463.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Jeffrey G. Klann; Matthew A. H. Joss; Kevin Embree; Shawn N. Murphy
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ARCH ontologies and terminologies vs OMOP.

CMS Synthetic Patient Data OMOP

Synthea synthetic patient generator data in OMOP Common Data Model

Synthetic Patient Data in OMOP

Domain

Cost

CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) in...

Example (synthetic) electronic health record data

OMOP dataset: Hospital COVID patients: severity, acuity, therapies, outcomes...

CPRD Primary Care and Linked Data OMOP Common Data Model

Austin Health OMOP Dataset

Provider

OMOP primary database assessment of risk.

Vocabulary

Connected Bradford - Secondary Care BRI OMOP database

Western Health OMOP Dataset

Payer plan period

EMR tables and related tables in the OMOP CDM.

Addressing the Challenges of Health Data Standard Adoption and Usage: A...

University College London Hospitals NHS OMOP dataset

ARCH ontologies and terminologies vs OMOP.
