53 datasets found

CMS Synthetic Patient Data OMOP
redivis.com
application/jsonl +7
Updated Aug 19, 2020
Cite
Redivis Demo Organization (2020). CMS Synthetic Patient Data OMOP [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Available download formats: sas, avro, parquet, stata, application/jsonl, arrow, csv, spss
Dataset updated
Aug 19, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
Jan 1, 2008 - Dec 31, 2010
Description
Abstract

This is a synthetic patient dataset in the OMOP Common Data Model v5.2, originally released by the CMS and accessed via BigQuery. The dataset includes 24 tables and records for 2 million synthetic patients from 2008 to 2010.

Methodology

This dataset follows the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). The purpose of the Common Data Model is to convert various distinctly formatted datasets into a well-known, universal format with a set of standardized vocabularies. See the diagram below from the Observational Health Data Sciences and Informatics (OHDSI) webpage.

[Figure: Why-CDM.png (https://redivis.com/fileUploads/d1a95a4e-074a-44d1-92e5-9adfd2f4068a)]

Such universal data models ultimately enable researchers to streamline the analysis of observational medical data. For more information regarding the OMOP CDM, refer to the OHDSI OMOP site.

Usage

For documentation regarding the source data format from the Centers for Medicare & Medicaid Services (CMS), refer to the CMS Synthetic Public Use File (https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF).

For information regarding the conversion of the CMS data file to the OMOP CDM v5.2, refer to this OHDSI GitHub page (https://github.com/OHDSI/ETL-CMS).

For information regarding each of the 24 tables in this dataset, including more detailed variable metadata, see the OHDSI CDM GitHub Wiki page (https://github.com/OHDSI/CommonDataModel/wiki). All variable labels and descriptions as well as table descriptions come from this Wiki page. Note that this GitHub page primarily covers version 6.0 of the CDM, whereas this dataset uses version 5.2.
CPRD Primary Care OMOP Common Data Model
healthdatagateway.org
unknown
Updated Dec 15, 2024
Cite
CPRD (2024). CPRD Primary Care OMOP Common Data Model [Dataset]. http://doi.org/10.48329/6xtz-7b42
Explore at:
Available download formats: unknown
Unique identifier
https://doi.org/10.48329/6xtz-7b42
Dataset updated
Dec 15, 2024
Dataset authored and provided by
CPRD
License
HTTPS://CPRD.COM/DATA-ACCESS
Description
The CPRD Primary Care OMOP CDM database contains longitudinal routinely-collected health records (EHR data) from UK primary care practices. The data has been transformed into a common format (data model) using an open community data standard and structure from the OHDSI standardised vocabularies.
Synthea synthetic patient generator data in OMOP Common Data Model
registry.opendata.aws
Updated Jan 4, 2023
Cite
Amazon Web Services (2023). Synthea synthetic patient generator data in OMOP Common Data Model [Dataset]. https://registry.opendata.aws/synthea-omop/
Explore at:
Dataset updated
Jan 4, 2023
Dataset provided by
Amazon.com (http://amazon.com/)
Description
The Synthea-generated data is provided here as 1,000-person (1k), 100,000-person (100k), and 2,800,000-person (2.8m) data sets in the OMOP Common Data Model format. Synthea™ is a synthetic patient generator that models the medical history of synthetic patients. Our mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government (although a citation would be appreciated). You can read our first academic paper here: https://doi.org/10.1093/jamia/ocx079
Domain
redivis.com
Updated Sep 6, 2020
Cite
Redivis Demo Organization (2020). Domain [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Dataset updated
Sep 6, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables.
Data_Sheet_1_Enabling data sharing and utilization for African population...
figshare.com
docx
Updated Jun 9, 2023
Cite
Sylvia Kiwuwa-Muyingo; Jim Todd; Tathagata Bhattacharjee; Amelia Taylor; Jay Greenfield (2023). Data_Sheet_1_Enabling data sharing and utilization for African population health data using OHDSI tools with an OMOP-common data model.docx [Dataset]. http://doi.org/10.3389/fpubh.2023.1116682.s001
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpubh.2023.1116682.s001
Dataset updated
Jun 9, 2023
Dataset provided by
Frontiers
Authors
Sylvia Kiwuwa-Muyingo; Jim Todd; Tathagata Bhattacharjee; Amelia Taylor; Jay Greenfield
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The COVID-19 pandemic has spurred the use of AI and DS innovations in data collection and aggregation. Extensive data on many aspects of COVID-19 has been collected and used to optimize the public health response to the pandemic and to manage the recovery of patients in Sub-Saharan Africa. However, there is no standard mechanism for collecting, documenting and disseminating COVID-19 related data or metadata, which makes use and reuse a challenge. INSPIRE utilizes the Observational Medical Outcomes Partnership (OMOP) as the Common Data Model (CDM) implemented in the cloud as a Platform as a Service (PaaS) for COVID-19 data. The INSPIRE PaaS for COVID-19 data leverages the cloud gateway for both individual research organizations and for data networks. Individual research institutions may choose to use the PaaS to access the FAIR data management, data analysis and data sharing capabilities which come with the OMOP CDM. Network data hubs may be interested in harmonizing data across localities using the CDM conditioned by the data ownership and data sharing agreements available under OMOP's federated model. The INSPIRE platform for evaluation of COVID-19 Harmonized data (PEACH) harmonizes data from Kenya and Malawi. In an era of information overload from the internet, it is vital that data sharing platforms remain trusted digital spaces that protect human rights and foster citizens' participation. The channel for sharing data between localities is included in the PaaS and is based on data sharing agreements provided by the data producer. This allows the data producers to retain control over how their data are used, which can be further protected through the use of the federated CDM. Federated regional OMOP-CDM are based on the PaaS instances and analysis workbenches in INSPIRE-PEACH with harmonized analysis powered by the AI technologies in OMOP. These AI technologies can be used to discover and evaluate pathways that COVID-19 cohorts take through public health interventions and treatments. By using both the data mapping and terminology mapping, we construct ETLs that populate the data and/or metadata elements of the CDM, making the hub both a central model and a distributed model.
Relationship
redivis.com
Updated Sep 6, 2020
Cite
Redivis Demo Organization (2020). Relationship [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Dataset updated
Sep 6, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The RELATIONSHIP table provides a reference list of all types of relationships that can be used to associate any two concepts in the CONCEPT_RELATIONSHIP table.
Synthea OMOP (CDM) - North East and North Cumbria
healthdatagateway.org
unknown
Updated Jun 17, 2025
Cite
(2025). Synthea OMOP (CDM) - North East and North Cumbria [Dataset]. https://healthdatagateway.org/en/dataset/1351
Explore at:
Available download formats: unknown
Dataset updated
Jun 17, 2025
License
https://northeastnorthcumbria.nhs.uk/our-work/secure-data-environment/
Description
Synthetic Primary Care Data (Synthea) transformed into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)

Data is sourced from https://synthea.mitre.org/downloads using the 100 sample patient CSV variant of available downloads. Data has been transformed using the ETL methods described by https://github.com/OHDSI/ETL-Synthea

This is a patient level dataset of Primary Care data covering 100 synthetic patients
Data from: Drug exposure
redivis.com
Updated Sep 6, 2020
Cite
Redivis Demo Organization (2020). Drug exposure [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Dataset updated
Sep 6, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The 'Drug' domain captures records about the utilization of a Drug when ingested or otherwise introduced into the body.
Additional file 3 of Empirical assessment of alternative methods for...
springernature.figshare.com
zip
Updated Jun 5, 2023
Cite
Anthony Molinaro; Frank DeFalco (2023). Additional file 3 of Empirical assessment of alternative methods for identifying seasonality in observational healthcare data [Dataset]. http://doi.org/10.6084/m9.figshare.20222597.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.20222597.v1
Dataset updated
Jun 5, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Anthony Molinaro; Frank DeFalco
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 3: upsetRplots.zip. All 30 UpsetR plots.
OMOP results as of 20/10/22.
plos.figshare.com
xls
Updated Apr 18, 2024
Cite
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). OMOP results as of 20/10/22. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t006
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301557.t006
Dataset updated
Apr 18, 2024
Dataset provided by
PLOS ONE
Authors
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

Methods

We used standard structured query language (SQL) to construct, extract, transform, and load scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.

Results

Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ‘FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.

Conclusion

The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.
Example (synthetic) electronic health record data
rdr.ucl.ac.uk
application/csv
Updated Apr 24, 2024
Cite
Steve Harris; Wai Shing Lai (2024). Example (synthetic) electronic health record data [Dataset]. http://doi.org/10.5522/04/25676298.v1
Explore at:
Available download formats: application/csv
Unique identifier
https://doi.org/10.5522/04/25676298.v1
Dataset updated
Apr 24, 2024
Dataset provided by
University College London
Authors
Steve Harris; Wai Shing Lai
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
These data are modelled using the OMOP Common Data Model v5.3.

Correlated Data Source

NG tube vocabularies

Generation Rules

The patient’s age should be between 18 and 100 at the moment of the visit.
Ethnicity data is using 2021 census data in England and Wales (Census in England and Wales 2021).
Gender is equally distributed between Male and Female (50% each).
Every person in the record has a link in procedure_occurrence with the concept “Checking the position of nasogastric tube using X-ray”.
2% of person records have a link in procedure_occurrence with the concept of “Plain chest X-ray”.
60% of visit_occurrence has visit concept “Inpatient Visit”, while 40% have “Emergency Room Visit”.

Notes

Version 0
Generated by man-made rule/story generator
Structural correct, all tables linked with the relationship
We used national ethnicity data to generate a realistic distribution (see below)

2011 Race Census figure in England and Wales

Ethnic Group : Population(%)
Asian or Asian British: Bangladeshi - 1.1
Asian or Asian British: Chinese - 0.7
Asian or Asian British: Indian - 3.1
Asian or Asian British: Pakistani - 2.7
Asian or Asian British: any other Asian background - 1.6
Black or African or Caribbean or Black British: African - 2.5
Black or African or Caribbean or Black British: Caribbean - 1
Black or African or Caribbean or Black British: other Black or African or Caribbean background - 0.5
Mixed multiple ethnic groups: White and Asian - 0.8
Mixed multiple ethnic groups: White and Black African - 0.4
Mixed multiple ethnic groups: White and Black Caribbean - 0.9
Mixed multiple ethnic groups: any other Mixed or multiple ethnic background - 0.8
White: English or Welsh or Scottish or Northern Irish or British - 74.4
White: Irish - 0.9
White: Gypsy or Irish Traveller - 0.1
White: any other White background - 6.4
Other ethnic group: any other ethnic group - 1.6
Other ethnic group: Arab - 0.6
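The generation rules in this description could be sketched roughly as below. This is only an illustration, not the generator the dataset authors used: the function names, the dictionary layout, the uniform age distribution, and the choice to treat each percentage as an independent per-person probability are all assumptions. Ethnicity sampling is left out for brevity.

```python
import random

# Concept names quoted from the dataset description.
NG_TUBE_CHECK = "Checking the position of nasogastric tube using X-ray"
CHEST_XRAY = "Plain chest X-ray"


def generate_person(person_id, rng):
    """Build one synthetic person record (hypothetical layout)."""
    procedures = [NG_TUBE_CHECK]          # every person has this procedure
    if rng.random() < 0.02:               # about 2% also get a chest X-ray
        procedures.append(CHEST_XRAY)
    return {
        "person_id": person_id,
        "age_at_visit": rng.randint(18, 100),        # uniform age is an assumption
        "gender": rng.choice(["Male", "Female"]),    # 50% each
        "visit_concept": (
            "Inpatient Visit" if rng.random() < 0.60 else "Emergency Room Visit"
        ),
        "procedures": procedures,
    }


def generate_cohort(n, seed=0):
    """Generate n persons reproducibly from a seeded random generator."""
    rng = random.Random(seed)
    return [generate_person(i, rng) for i in range(1, n + 1)]
```

Because the percentages are applied as probabilities, the realised shares in a small cohort will only approximate 2% and 60/40.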
Person
redivis.com
Updated Sep 6, 2020
Cite
Redivis Demo Organization (2020). Person [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Explore at:
Dataset updated
Sep 6, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The Person Domain contains records that uniquely identify each patient in the source data who is time at-risk to have clinical observations recorded within the source systems.
Optum ZIP5 OMOP
redivis.com
application/jsonl +7
Updated Mar 3, 2021
Cite
Stanford Center for Population Health Sciences (2021). Optum ZIP5 OMOP [Dataset]. http://doi.org/10.57761/e54r-bg69
Explore at:
Available download formats: sas, csv, arrow, application/jsonl, stata, spss, avro, parquet
Unique identifier
https://doi.org/10.57761/e54r-bg69
Dataset updated
Mar 3, 2021
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Description
Abstract

Optum ZIP5 v8.0 database in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/). This dataset covers 2003-Q1 to 2020-Q2.

Section 10

A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:

It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.

It allows aggregation of multiple, closely timed doctor visits for the same Condition to avoid double-counting the Condition Occurrences.


For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP's original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.

Conventions

Condition Era records will be derived from the records in the CONDITION_OCCURRENCE table using a standardized algorithm.

Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval.

Condition Eras are built with a Persistence Window of 30 days: if no further occurrence of the same condition_concept_id happens within 30 days of a given occurrence, that occurrence marks the condition_era_end_date.


The text above is taken from the OMOP CDM v5.3 Specification document.
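As a rough illustration of the Condition Era conventions above, the sketch below merges occurrences of the same condition for the same person when they fall within the 30-day Persistence Window. It is not the standardized OHDSI algorithm: the function name `build_condition_eras`, the tuple layout, and the fallback to the start date when an occurrence has no end date are assumptions made for this example.

```python
from datetime import date, timedelta
from itertools import groupby

PERSISTENCE_WINDOW = timedelta(days=30)


def build_condition_eras(occurrences):
    """Merge condition occurrences into eras.

    occurrences: iterable of (person_id, condition_concept_id, start_date, end_date),
    where end_date may be None (the start date is then used; an assumption).
    Returns a list of (person_id, condition_concept_id, era_start, era_end, count).
    """
    eras = []
    rows = sorted(occurrences, key=lambda o: (o[0], o[1], o[2]))
    for (person_id, concept_id), group in groupby(rows, key=lambda o: (o[0], o[1])):
        era_start = era_end = None
        count = 0
        for _, _, start, end in group:
            end = end or start
            if era_start is not None and start - era_end <= PERSISTENCE_WINDOW:
                # Within the window: extend the current era.
                era_end = max(era_end, end)
                count += 1
            else:
                # Gap too large (or first row): close the previous era, open a new one.
                if era_start is not None:
                    eras.append((person_id, concept_id, era_start, era_end, count))
                era_start, era_end, count = start, end, 1
        eras.append((person_id, concept_id, era_start, era_end, count))
    return eras
```

With the PCP and specialist example, two visits 15 days apart for the same condition collapse into one era with an occurrence count of 2, while a visit several months later starts a new era.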

Section 8

The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables. For example, the "Condition" Domain contains Concepts that describe a condition of a patient, and these Concepts can only be stored in the condition_concept_id field of the CONDITION_OCCURRENCE and CONDITION_ERA tables. This reference table is populated with a single record for each Domain and includes a descriptive name for the Domain.

Conventions

There is one record for each Domain. The domains are defined by the tables and fields in the OMOP CDM that can contain Concepts describing all the various aspects of the healthcare experience of a patient.

The domain_id field contains an alphanumerical identifier that can also be used as the abbreviation of the Domain.

The domain_name field contains the unabbreviated name of the Domain.

Each Domain also has an entry in the Concept table, which is recorded in the domain_concept_id field. This is for purposes of creating a closed Information Model, where all entities in the OMOP CDM are covered by a unique Concept.


The text above is taken from the OMOP CDM v5.3 Specification document.
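To make the idea of a Domain constraining a field concrete, here is a small sketch of a domain check. The `Concept` class, the `ALLOWED_DOMAIN` lookup, and the example concept IDs are hypothetical; only the Condition rule for CONDITION_OCCURRENCE and CONDITION_ERA is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: int
    concept_name: str
    domain_id: str


# (table, field) -> domain_id required for Concepts stored in that field.
# Only the two Condition entries are stated in the text; extend as needed.
ALLOWED_DOMAIN = {
    ("CONDITION_OCCURRENCE", "condition_concept_id"): "Condition",
    ("CONDITION_ERA", "condition_concept_id"): "Condition",
}


def is_allowed(concept, table, field):
    """Return True if the concept's Domain matches the one required by the field."""
    required = ALLOWED_DOMAIN.get((table, field))
    if required is None:
        raise KeyError(f"No domain rule registered for {table}.{field}")
    return concept.domain_id == required


# Hypothetical concepts used only for illustration.
fracture = Concept(1, "Example fracture", "Condition")
tablet = Concept(2, "Example tablet", "Drug")
```

Here `is_allowed(fracture, "CONDITION_OCCURRENCE", "condition_concept_id")` is true, while the same call with `tablet` is false because its domain_id is "Drug".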

Section 12

A Drug Era is defined as a span of time when the Person is assumed to be exposed to a particular active ingredient. A Drug Era is not the same as a Drug Exposure: Exposures are individual records corresponding to the source when Drug was delivered to the Person, while successive periods of Drug Exposures are combined under certain rules to produce continuous Drug Eras.

Conventions

Drug Eras are derived from records in the DRUG_EXPOSURE table using a standardized algorithm.

Each Drug Era corresponds to one or many Drug Exposures that form a continuous interval and contain the same Drug Ingredient (active compound).

The drug_concept_id field only contains Concepts that have the concept_class 'Ingredient'. The Ingredient is derived from the Drug Concepts in the DRUG_EXPOSURE table that are aggregated into the Drug Era record.

The Drug Era Start Date is the start date of the first Drug Exposure.

The Drug Era End Date is the end date of the last Drug Exposure. The End Date of each Drug Exposure is either taken from the field drug_exposure_end_date or, as it is typically not available, inferred using the following rules:

The Gap Days determine how many total drug-free days are observed between all Drug Exposure events that contribute to a DRUG_ERA record. It is assumed that the drugs are "not stockpiled" by the patient, i.e. that if a new drug prescription or refill is observed (a new DRUG_EXPOSURE record is written), the remaining supply from the previous events is abandoned.

The difference between Persistence Window and Gap Days is that the former is the maximum drug-free time allowed between two subsequent DRUG_EXPOSURE records, while the latter is the sum of actual drug-free days for the given Drug Era under the abo
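
The distinction between Persistence Window and Gap Days can be sketched as follows for one Person and one Ingredient. This is an illustrative reading, not the standardized algorithm: the function name, the default 30-day window, the inclusive treatment of end dates, and the way "no stockpiling" is modelled (coverage always ends at the latest exposure's end date) are assumptions.

```python
from datetime import date


def build_drug_eras(exposures, persistence_window_days=30):
    """Combine drug exposures into eras and total their drug-free days.

    exposures: list of (start_date, end_date) for one person and one ingredient.
    Returns a list of dicts with start, end, exposure_count and gap_days.
    """
    eras = []
    current = None
    for start, end in sorted(exposures):
        if current is not None:
            # Drug-free days strictly between the previous end and the new start.
            drug_free = max(0, (start - current["end"]).days - 1)
            if drug_free <= persistence_window_days:
                # Same era: accumulate the gap; earlier leftover supply is dropped.
                current["gap_days"] += drug_free
                current["end"] = end
                current["exposure_count"] += 1
                continue
            eras.append(current)
        current = {"start": start, "end": end, "exposure_count": 1, "gap_days": 0}
    if current is not None:
        eras.append(current)
    return eras
```

In this sketch the window is a per-gap limit that decides whether two exposures belong to the same era, whereas `gap_days` is the running total of the gaps that were accepted.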
AFC OMOP DID
redivis.com
application/jsonl +7
Updated Aug 27, 2024
Cite
Stanford Center for Population Health Sciences (2024). AFC OMOP DID [Dataset]. http://doi.org/10.57761/88ka-5r20
Explore at:
Available download formats: csv, avro, stata, arrow, application/jsonl, spss, sas, parquet
Unique identifier
https://doi.org/10.57761/88ka-5r20
Dataset updated
Aug 27, 2024
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Description
Abstract

This dataset is the American Family Cohort (AFC) Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) dataset.

This dataset is a medium risk (confidential) de-identified dataset (OMOP DID).

Note: A few updates have been made to the dataset in the August 2024 release. Please check the "Update Notes" section for more details.

Usage

For more details please go to:

https://ohdsi.github.io/CommonDataModel/cdm54.html

AFC OMOP Specifications

Metadata access is required to view this section.

Update Notes

Metadata access is required to view this section.

Addressing the Challenges of Health Data Standard Adoption and Usage: A...

zenodo.org

bin

Updated May 12, 2025

Cite

Alberto Marfoglia; Valerio Antonio Arcobelli; SERENA MOSCATO; Antonino Amedeo La Mattina; Sabato Mellone; ANTONELLA CARBONARO (2025). Addressing the Challenges of Health Data Standard Adoption and Usage: A Systematic Review - Data Extraction [Dataset]. http://doi.org/10.5281/zenodo.15358180

Explore at:

Available download formats: bin

Unique identifier

https://doi.org/10.5281/zenodo.15358180

Dataset updated

May 12, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

May 7, 2025

Description

This table presents the data extraction from the 99 studies included according to the criteria outlined in the main manuscript. It is provided as supplementary material to enhance the readability of the paper while ensuring that all relevant information is preserved and accessible without loss of detail.

The names of the variables and their descriptions are provided in the attached file, along with the following details:

Variable		Description
Ref.		The citation in the format: First author et al. [Year] (e.g., AuthorA et al. [2022]). This identifies the study's primary citation for easy reference.
Title		The title of the paper.
Standard		The healthcare data standard used in the study. Possible values are: OMOP, OpenEHR, FHIR.
Study Location		The country where the study was conducted.
Objective for using the standard	Detailed	A comprehensive explanation of the specific objective of using the standard in the study, describing how it supports the study’s goals.
	Short	The primary purpose for applying the healthcare standard. Possible values are: Secondary data reuse, Data exchange, Clinical decision support, Vocabulary definition, EHR system design,
Application domain	Type	The type of application domain in which the healthcare standard is used. Possible values are: Clinical (studies with a direct impact on clinical practice, applying established tools or methods in healthcare settings, e.g., predicting in-hospital mortality for heart attack patients) and Research (studies proposing innovative tools, methodologies, or frameworks still in the design/testing phase, not yet clinically implemented).
	Healthcare Area	The relevant healthcare domain for the study, such as Cardiovascular, Intensive Care Unit, Emergency Department, Oncology, Biology, etc.
	Cluster	The healthcare domain, clustered for easier readability. Possible values include: Clinical Medicine, Clinical Services and Diagnostics, Public Health, Health Information Management and Biomedical Sciences.
	Use	Whether the results of the paper serve a Primary use (direct care) or a Secondary use (repurposing existing data or tools for new objectives).
Scale		The scale of the study. Possible values are: Single center (one hospital/clinic), Multi-center (multiple institutions), Regional (specific region), National level (countrywide).
Dataset magnitude in patients		The magnitude of the dataset expressed as a letter code. Possible values are: A (<10 to 99), B (100 to 9,999), C (10,000 to 999,999) and D (1,000,000 and above).
N° Elements		The number of input variables in the standardization process.
Percentage of mapped variables		The percentage of variables successfully standardised.
Coverage of the standard		The standardisation methodology, i.e. whether the standard was adapted or not.
ETL Tools	Data cleaning & extraction	The tools adopted for supporting data cleaning and extraction.
	Mapping	The tools adopted for the mapping of the variables.
	Validation	The tools adopted for the validation of the standardization process.
	Database	The database adopted for storing the result of the healthcare data standardization.
Process efficiency and Economic assessment		Information about the economic impact, recorded only if the consequences are concrete and measured by the authors (e.g., actual cost savings, resource usage reductions). If the authors did not measure the economic impact, this field remains blank.
Comments by authors	Limitations	The significant limitations or challenges faced during the study in relation to the standard adopted, such as issues with data compatibility, scalability, or the need for customization.
	Advantages	The benefits of applying the standard model, such as improved data consistency, enhanced clinical outcomes, better interoperability, or more efficient workflows.

Genomics England - OMOP CDM
healthdatagateway.org
unknown
Updated Oct 8, 2024
Cite
Genomics England Ltd (2024). Genomics England - OMOP CDM [Dataset]. https://healthdatagateway.org/en/dataset/373
Explore at:
Available download formats: unknown
Dataset updated
Oct 8, 2024
Dataset provided by
Genomics England
Authors
Genomics England Ltd
License
https://www.genomicsengland.co.uk/research/academic
Description
Genomics England 100k data in OMOP CDM v5.4 format. Includes 100k data and PHE NCRAS data.
EMR tables and related tables in the OMOP CDM.
figshare.com
xls
Updated Apr 18, 2024
Cite
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). EMR tables and related tables in the OMOP CDM. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301557.t004
Dataset updated
Apr 18, 2024
Dataset provided by
PLOS ONE
Authors
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background
The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.
Methods
We used standard structured query language (SQL) to construct, extract, transform, and load scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.
Results
Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ’FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.
Conclusion
The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.
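The frequency-threshold strategy described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the example terms, and the counts are all hypothetical, and it assumes that a term is sent for manual mapping once it occurs at least a cutoff number of times, with the cutoff chosen as the highest value that still covers more than 95% of records.

```python
def frequency_threshold(term_counts, target_coverage=0.95):
    """Return (threshold, coverage): the highest per-term frequency cutoff
    at which terms occurring at least `threshold` times account for more
    than `target_coverage` of all records."""
    total = sum(term_counts.values())
    if total == 0:
        raise ValueError("no records to cover")
    covered = 0
    # Walk the distinct frequencies from most to least common.
    for freq in sorted(set(term_counts.values()), reverse=True):
        covered += sum(c for c in term_counts.values() if c == freq)
        if covered / total > target_coverage:
            return freq, covered / total
    # Reached only if target_coverage >= 1.0: every term must be mapped.
    return min(term_counts.values()), 1.0


# Hypothetical free-text terms for one domain, with their record counts.
condition_terms = {
    "hypertension": 600,
    "htn": 250,
    "high blood pressure": 120,
    "hypertensoin": 20,
    "bp high ?": 10,
}
threshold, coverage = frequency_threshold(condition_terms)
print(f"map terms seen >= {threshold} times; coverage {coverage:.1%}")
```

In this toy domain the rare misspellings fall below the cutoff, so mappers would only need to handle the three common terms.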

OMOP2OBO Condition Occurrence Mappings

zenodo.org
data.niaid.nih.gov

bin

Updated Mar 29, 2023

Tiffany J Callahan; Jordan M Wyrwa; Nicole A Vasilevsky; Tellen D Bennett; Blake Martin; James A Feinstein; William A Baumgartner; Lawrence D Hunter; Michael G Kahn (2023). OMOP2OBO Condition Occurrence Mappings [Dataset]. http://doi.org/10.5281/zenodo.7250177

Explore at:

Available download formats: bin

Unique identifier

https://doi.org/10.5281/zenodo.7250177

Dataset updated

Mar 29, 2023

Dataset provided by

Zenodo (http://zenodo.org/)

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

OMOP2OBO Condition Occurrence Mappings V1.0

These mappings were created by the OMOP2OBO mapping algorithm (see links below). OMOP2OBO provides the first health system-wide, disease-agnostic mappings between standardized clinical terminologies and eight Open Biomedical Ontology (OBO) Foundry ontologies spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, vaccines, and proteins. These mappings are also the first to be explicitly created using standard terminologies in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), ensuring both semantic and clinical interoperability across a space of N conditions (and N relationships curated in these ontologies).

The mappings in this repository were created between OMOP standard condition occurrence concepts (i.e., SNOMED CT) and two ontologies: the Human Phenotype Ontology (HPO) and the Mondo Disease Ontology (Mondo). The National Library of Medicine's Unified Medical Language System (UMLS) Semantic Types were first used to filter out all concepts that did not have a biological origin (accidents, injuries, external complications, and findings without clear interpretations). Then, the Semantic Types were used to prioritize the mapping of HPO concepts to findings and symptoms and of Mondo concepts to Semantic Types indicative of disease. For these OMOP domains, owl:intersectionOf (“and”) and owl:unionOf (“or”) constructors were used to construct semantically expressive mappings.

Mapping Details
Mappings included in this set were generated automatically, either by OMOP2OBO's exact-match steps or by a bag-of-words embedding model using TF-IDF. Cosine similarity is used to compute similarity scores between all pairwise combinations of OMOP and OBO concepts and ancestor concepts. To improve the efficiency of this process, the algorithm searches only the top n most similar results and keeps the top 75th percentile among all pairs with scores >= 0.25. Manually created mappings are also included.
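The similarity step can be sketched with a small, dependency-free TF-IDF and cosine-similarity example. This is an illustration under stated assumptions, not the OMOP2OBO code: the function names, the whitespace tokenizer, the smoothed IDF formula, and the example labels are all hypothetical. It applies the top-n cut and the 0.25 score floor from the text; the 75th-percentile filter is left out because the description does not define it precisely.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors (as dicts) with a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vectors.append({
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_mappings(omop_labels, obo_labels, top_n=3, min_score=0.25):
    """For each OMOP label, keep the top_n OBO labels scoring >= min_score."""
    vectors = tfidf_vectors(omop_labels + obo_labels)
    omop_vecs = vectors[:len(omop_labels)]
    obo_vecs = vectors[len(omop_labels):]
    results = {}
    for label, vec in zip(omop_labels, omop_vecs):
        scored = sorted(
            ((cosine(vec, obo_vec), obo) for obo_vec, obo in zip(obo_vecs, obo_labels)),
            reverse=True,
        )
        results[label] = [(obo, round(score, 3))
                          for score, obo in scored[:top_n] if score >= min_score]
    return results


# Hypothetical concept labels, for illustration only.
omop = ["type 2 diabetes mellitus", "acute myocardial infarction"]
obo = ["type 2 diabetes mellitus", "myocardial infarction", "abnormal heart morphology"]
for source, matches in candidate_mappings(omop, obo).items():
    print(source, "->", matches)
```

Restricting each source concept to its top n candidates keeps the number of scored pairs that need review small, which is the efficiency argument the description makes.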

Mapping Categories

Automatic One-to-One Concept: Exact label or synonym, dbXRef, or expert validated mapping @ concept-level; 1:1
Automatic One-to-One Ancestor: Exact label or synonym, dbXRef, or expert validated mapping @ concept ancestor-level; 1:1
Automatic One-to-Many Concept: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many
Automatic One-to-Many Ancestor: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept ancestor-level; 1:Many
Manual One-to-One: Hand mapping created using expert suggested resources; 1:1
Manual One-to-Many: Hand mapping created using expert suggested resources; 1:Many
Cosine Similarity: score suggested mapping -- manually verified
UnMapped: No suitable mapping or not mapped type

Mapping Statistics
Additional statistics have been provided for the mappings and are shown in the table below. This table presents the counts of OMOP concepts by mapping category and ontology:

Mapping Category	HPO	Mondo
Automatic One-to-One Concept	4767	9097
Automatic One-to-Many Concept	150	885
Cosine Similarity	1375	667
Automatic One-to-One Ancestor	13595	8911
Automatic One-to-Many Ancestor	38080	40224
Manual	5131	755
Manual One-to-Many	10326	2835
Unmapped	36301	46345
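As a quick reading aid, the counts in the table can be totalled per ontology to see what share of OMOP concepts received any mapping. The sketch below assumes that each concept is counted in exactly one category per ontology, which the description does not state; the variable and function names are illustrative.

```python
# Counts copied from the table above: category -> (HPO, Mondo).
mapping_counts = {
    "Automatic One-to-One Concept": (4767, 9097),
    "Automatic One-to-Many Concept": (150, 885),
    "Cosine Similarity": (1375, 667),
    "Automatic One-to-One Ancestor": (13595, 8911),
    "Automatic One-to-Many Ancestor": (38080, 40224),
    "Manual": (5131, 755),
    "Manual One-to-Many": (10326, 2835),
    "Unmapped": (36301, 46345),
}


def summarize(counts, column):
    """Return (total concepts, percentage mapped) for one ontology column."""
    total = sum(row[column] for row in counts.values())
    unmapped = counts["Unmapped"][column]
    return total, 100.0 * (total - unmapped) / total


for name, column in (("HPO", 0), ("Mondo", 1)):
    total, mapped_pct = summarize(mapping_counts, column)
    print(f"{name}: {total} concepts, {mapped_pct:.1f}% mapped")
```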

Provenance and Versioning: The V1.0 deposited mappings were created by OMOP2OBO v1.0.0 in October 2022 using the OMOP Common Data Model V5.0 and OBO Foundry ontologies downloaded on September 14, 2020.

Caveats: The deposited files only contain the mappings that were generated automatically by the algorithm. The manually generated mappings will be deposited with the official preprint manuscript. Please note that these are the original mappings that were created for the preprint. They have not been updated to current versions of the ontologies. In our experience, this should result in very few errors, but we do suggest that you check the ontology concepts used against current versions of each ontology before using them.

Important Resources and Documentation

GitHub: OMOP2OBO
Project Wiki: OMOP2OBO - wiki
Zenodo Community: OMOP2OBO
Preprint Manuscript: 10.5281/zenodo.5716421

Medication table mappings.
plos.figshare.com
xls
Updated Apr 18, 2024
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). Medication table mappings. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301557.t005
Dataset updated
Apr 18, 2024
Dataset provided by
PLOS ONE
Authors
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background
The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.
Methods
We used standard structured query language (SQL) to construct, extract, transform, and load scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.
Results
Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ’FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.
Conclusion
The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.
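The pass/fail rule described in the Results can be written out as a short sketch. This does not use the OHDSI Data Quality Dashboard itself: the class, the check labels, the row counts, and the thresholds are hypothetical, and only the rule "FAIL when the percentage of non-compliant rows exceeds the threshold" comes from the text.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str             # hypothetical check label
    violated_rows: int    # rows that do not comply with the check
    total_rows: int       # rows the check was evaluated on
    threshold_pct: float  # maximum tolerated percentage of violations

    @property
    def pct_violated(self) -> float:
        if self.total_rows == 0:
            return 0.0
        return 100.0 * self.violated_rows / self.total_rows

    @property
    def status(self) -> str:
        # FAIL only when the violation percentage exceeds the threshold.
        return "FAIL" if self.pct_violated > self.threshold_pct else "PASS"


def pass_rate(results):
    """Percentage of checks whose status is PASS."""
    if not results:
        raise ValueError("no checks were run")
    passed = sum(r.status == "PASS" for r in results)
    return 100.0 * passed / len(results)


# Hypothetical checks, for illustration only.
checks = [
    CheckResult("person.year_of_birth is plausible", 12, 10_000, 1.0),
    CheckResult("condition_concept_id is standard", 300, 10_000, 5.0),
    CheckResult("visit_start_date not after visit_end_date", 0, 10_000, 0.0),
    CheckResult("drug_concept_id is mapped", 900, 10_000, 5.0),
]
for check in checks:
    print(f"{check.status}  {check.pct_violated:5.2f}%  {check.name}")
print(f"overall pass rate: {pass_rate(checks):.0f}%")
```

The comparison is strict, so a check whose violation percentage equals its threshold still passes; a threshold of 0 therefore means any violating row causes a failure.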
Connected Bradford - Secondary Care BRI OMOP database
healthdatagateway.org
unknown
Updated Jan 31, 2025
Connected Bradford. Yorkshire & Humber Secure Data Environment. (2025). Connected Bradford - Secondary Care BRI OMOP database [Dataset]. https://healthdatagateway.org/en/dataset/1101
Explore at:
Available download formats: unknown
Dataset updated
Jan 31, 2025
Dataset authored and provided by
Connected Bradford. Yorkshire & Humber Secure Data Environment.
License
https://bradfordresearch.nhs.uk/connected-bradford/
Description
This dataset is an extract from the Bradford Royal Infirmary EPR system. It contains current and some historical data, produced by extracting the relevant tables from the EPR, mapping them to the OMOP schema, and outputting them in OMOP CDM 5.3 format.

Redivis Demo Organization (2020). CMS Synthetic Patient Data OMOP [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7

CMS Synthetic Patient Data OMOP

Explore at:

Available download formats: sas, avro, parquet, stata, application/jsonl, arrow, csv, spss

Dataset updated

Aug 19, 2020

Dataset provided by

Redivis Inc.

Authors

Redivis Demo Organization

Time period covered

Jan 1, 2008 - Dec 31, 2010

Description

Abstract

This is a synthetic patient dataset in the OMOP Common Data Model v5.2, originally released by the CMS and accessed via BigQuery. The dataset includes 24 tables and records for 2 million synthetic patients from 2008 to 2010.

Methodology

This dataset takes on the format of the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). The purpose of the Common Data Model is to convert various distinctly-formatted datasets into a well-known, universal format with a set of standardized vocabularies, as shown in the diagram below from the Observational Health Data Sciences and Informatics (OHDSI) webpage.

[Figure: Why-CDM.png (https://redivis.com/fileUploads/d1a95a4e-074a-44d1-92e5-9adfd2f4068a)]

Such universal data models ultimately enable researchers to streamline the analysis of observational medical data. For more information regarding the OMOP CDM, refer to the OHDSI OMOP site.

Usage

For documentation regarding the source data format from the Centers for Medicare and Medicaid Services (CMS), refer to the CMS Synthetic Public Use File (https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF).

For information regarding the conversion of the CMS data file to the OMOP CDM v5.2, refer to this OHDSI GitHub page (https://github.com/OHDSI/ETL-CMS).

For information regarding each of the 24 tables in this dataset, including more detailed variable metadata, see the OHDSI CDM GitHub Wiki page (https://github.com/OHDSI/CommonDataModel/wiki). All variable labels and descriptions as well as table descriptions come from this Wiki page. Note that this GitHub page primarily covers version 6.0 of the CDM, whereas this dataset uses version 5.2.
