35 datasets found

OMOP results as of 20/10/22.
plos.figshare.com
xls
Updated Apr 18, 2024
Cite
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). OMOP results as of 20/10/22. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301557.t006
Dataset updated
Apr 18, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

Methods

We used standard structured query language (SQL) to construct, extract, transform, and load scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.

Results

Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ‘FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.

Conclusion

The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.
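
The two numeric rules in the abstract above, the per-domain frequency threshold and the DQD pass/fail test, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' SQL or the Data Quality Dashboard's own code: the function names `frequency_cutoff_for_coverage` and `check_status`, the sample term counts, and the 5% threshold are all hypothetical.

```python
def frequency_cutoff_for_coverage(term_counts, target=0.95):
    """Return the highest frequency cut-off f such that mapping every term
    seen at least f times links more than `target` of all records.

    term_counts: dict of free-text term -> number of records using it
    (hypothetical input shape, one dict per domain).
    """
    total = sum(term_counts.values())
    if total == 0:
        raise ValueError("no records to cover")
    covered = 0
    # Walk the distinct frequencies from most to least common.
    for freq in sorted(set(term_counts.values()), reverse=True):
        covered += sum(c for c in term_counts.values() if c == freq)
        if covered / total > target:
            return freq
    # Target not reachable with a strict '>' test: every term must be mapped.
    return min(term_counts.values())


def check_status(violated_rows, total_rows, threshold_pct):
    """Rule as stated in the abstract: FAIL when the percentage of
    non-compliant rows exceeds the threshold, otherwise PASS."""
    pct = 100.0 * violated_rows / total_rows if total_rows else 0.0
    return "FAIL" if pct > threshold_pct else "PASS"


if __name__ == "__main__":
    # Made-up counts of free-text terms in one domain.
    counts = {"hypertension": 500, "htn": 300, "high bp": 140,
              "hbp": 40, "hypertnsion": 15, "bp high??": 5}
    print(frequency_cutoff_for_coverage(counts))
    print(check_status(violated_rows=6, total_rows=100, threshold_pct=5.0))
```

With a cut-off chosen this way, clinical mappers only need to review terms at or above the returned frequency; the rare tail of misspellings stays unmapped while coverage still exceeds the target.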
IBM MarketScan OMOP
redivis.com
stanford.redivis.com
application/jsonl +7
Updated Jan 17, 2020
Cite
Stanford Center for Population Health Sciences (2020). IBM MarketScan OMOP [Dataset]. http://doi.org/10.57761/zthm-yj89
Available download formats: stata, spss, sas, parquet, application/jsonl, avro, arrow, csv
Unique identifier
https://doi.org/10.57761/zthm-yj89
Dataset updated
Jan 17, 2020
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Description
Abstract

MarketScan databases in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/)
EMR tables and related tables in the OMOP CDM.
datasetcatalog.nlm.nih.gov
figshare.com
Updated Apr 18, 2024
Cite
Hallinan, Christine Mary; Boyle, Dougie; Chidgey, Christine; Ward, Roger; Ormiston-Smith, David (2024). EMR tables and related tables in the OMOP CDM. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001371235
Dataset updated
Apr 18, 2024
Authors
Hallinan, Christine Mary; Boyle, Dougie; Chidgey, Christine; Ward, Roger; Ormiston-Smith, David
CMS Synthetic Patient Data OMOP
redivis.com
application/jsonl +7
Updated Aug 19, 2020
Cite
Redivis Demo Organization (2020). CMS Synthetic Patient Data OMOP [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Available download formats: sas, avro, parquet, stata, application/jsonl, arrow, csv, spss
Dataset updated
Aug 19, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
Jan 1, 2008 - Dec 31, 2010
Description
Abstract

This is a synthetic patient dataset in the OMOP Common Data Model v5.2, originally released by the CMS and accessed via BigQuery. The dataset includes 24 tables and records for 2 million synthetic patients from 2008 to 2010.

Methodology

This dataset takes on the format of the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). The purpose of the Common Data Model is to convert various distinctly-formatted datasets into a well-known, universal format with a set of standardized vocabularies, as illustrated by the following diagram from the Observational Health Data Sciences and Informatics (OHDSI) webpage.

[Figure: Why-CDM.png, https://redivis.com/fileUploads/d1a95a4e-074a-44d1-92e5-9adfd2f4068a]

Such universal data models ultimately enable researchers to streamline the analysis of observational medical data. For more information regarding the OMOP CDM, refer to the OHDSI OMOP site.

Usage

• For documentation regarding the source data format from the Centers for Medicare and Medicaid Services (CMS), refer to the CMS Synthetic Public Use File (https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF).

• For information regarding the conversion of the CMS data file to the OMOP CDM v5.2, refer to this OHDSI GitHub page (https://github.com/OHDSI/ETL-CMS).

• For information regarding each of the 24 tables in this dataset, including more detailed variable metadata, see the OHDSI CDM GitHub Wiki page (https://github.com/OHDSI/CommonDataModel/wiki). All variable labels and descriptions as well as table descriptions come from this Wiki page. Note that this GitHub page primarily covers version 6.0 of the CDM, whereas this dataset uses version 5.2.
Connected Bradford - Secondary Care BRI OMOP database
healthdatagateway.org
unknown
Cite
Connected Bradford. Yorkshire & Humber Secure Data Environment., Connected Bradford - Secondary Care BRI OMOP database [Dataset]. https://healthdatagateway.org/en/dataset/1101
Available download formats: unknown
Dataset authored and provided by
Connected Bradford. Yorkshire & Humber Secure Data Environment.
License
https://bradfordresearch.nhs.uk/connected-bradford/
Description
This dataset is an extract from the Bradford Royal Infirmary EPR system. It contains current and some historical data, and is produced by extracting the relevant tables from the EPR, mapping them to the OMOP schema, and outputting the result in OMOP CDM v5.3 format.
Leeds Teaching Hospitals OMOP Database
healthdatagateway.org
unknown
Updated May 16, 2025
Cite
Leeds Teaching Hospitals NHS Trust (LTHT) (2025). Leeds Teaching Hospitals OMOP Database [Dataset]. https://healthdatagateway.org/dataset/1320
Available download formats: unknown
Dataset updated
May 16, 2025
Dataset provided by
Leeds Teaching Hospitals NHS Trust
Authors
Leeds Teaching Hospitals NHS Trust (LTHT)
License
https://www.leedsth.nhs.uk/research/our-research/research-data-strategy/
Description
The Leeds Teaching Hospitals NHS Trust (LTHT) OMOP database is a robust, longitudinal dataset constructed using data from the electronic health records (EHR) of patients treated and diagnosed at Leeds Teaching Hospitals NHS Trust since 2003. This comprehensive resource is mapped to the OMOP CDM, ensuring interoperability with other OMOP databases, and enabling privacy-preserving, large-scale, multi-centre studies.

Encompassing a wide array of clinical data, the database includes information on demographics, diagnoses, procedures, medications and laboratory results. A particular strength lies in its detailed cancer-specific data, which supports in-depth analyses of treatment outcomes, survival rates, and disease progression. This makes it an invaluable resource for researchers focusing on oncology, as well as those interested in broader secondary care settings.

Researchers can draw insights from the LTHT OMOP database through federated analytics approaches as well as through the use of standardised OHDSI tools, which enable secure, privacy-preserving analyses across multiple institutions, eliminating the need to access individual-level patient data.

Notably, the LTHT OMOP database has been instrumental in several high-profile studies:

• HERON Network: LTHT is a member of the HERON network, funded by HDR UK, which focuses on enhancing the quality and impact of cancer research through federated analytics. LTHT participated in a study examining the use of antibiotics which are in the WHO watchlist for high risk of antimicrobial resistance.
• DigiONE Pilot Studies: These studies analyse harmonised routine care data from OMOP databases in 6 digitally mature European hospitals. Three studies have been conducted to date, focusing on the impact of the COVID-19 pandemic on cancer care, on metastatic non-small cell lung cancer, and on HER2-/HR+ metastatic breast cancer.
• FALCON-Lung Study: This study focused on the uptake of immune checkpoint inhibitors for metastatic non-small cell lung cancer across the world, and implemented a clinically validated line of therapy algorithm using systemic anti-cancer therapy data in the OMOP databases of 17 international institutions.

In summary, the LTHT OMOP database stands as a robust resource for secondary care research, particularly in oncology. Its comprehensive, high-quality data, combined with a commitment to national and international collaboration, positions it as a cornerstone for advancing healthcare research and improving patient outcomes.

The LTHT OMOP database consists of the following tables and data:

• Visit occurrence: includes inpatient and outpatient admissions for all patients who are or have been part of the cancer pathway, as well as all inpatient admissions for all other patients. The visit_detail table has not been populated.
• Condition occurrence: populated with all diagnoses in the Trust since 2003.
• Drug exposure: populated. Includes all anti-cancer drugs (chemotherapy and immunotherapy) and selected antibiotic medications (all antibiotics that are in the WHO watchlist for antimicrobial resistance, as well as access antibiotics). There are plans to extend this to all prescribed medication.
• Procedure occurrence: populated. Includes surgical and radiotherapy procedures delivered to patients with cancer, as well as all surgical procedures delivered to all other patients.
• Measurement: populated with weight, height, TNM staging, performance status, and metastasis location data.
• Observation: populated with ethnicity, IMD quintile, clinical trial participation (cancer only) and cancer histology data.
• Device exposure: not populated.
• Death: populated from ONS.
Synthetic Patient Data in OMOP
console.cloud.google.com
Updated Jul 26, 2023
Cite
U.S. Department of Health & Human Services (2023). Synthetic Patient Data in OMOP [Dataset]. https://console.cloud.google.com/marketplace/product/hhs/synpuf?hl=ja
Dataset updated
Jul 26, 2023
Dataset provided by
Google (http://google.com/)
Description
The Synthetic Patient Data in OMOP dataset is a synthetic database based on the Medicare Claims Synthetic Public Use Files (SynPUF) released by the Centers for Medicare and Medicaid Services (CMS). It contains synthetic 2008-2010 Medicare insurance claims for development and demonstration purposes. It has been converted from its original CSV form to the Observational Medical Outcomes Partnership (OMOP) common data model by the open source community, as released on GitHub. Please refer to the CMS Linkable 2008–2010 Medicare Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) User Manual for details regarding how DE-SynPUF was created. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
Optum ZIP5 OMOP
redivis.com
application/jsonl +7
Updated Mar 3, 2021
Cite
Stanford Center for Population Health Sciences (2021). Optum ZIP5 OMOP [Dataset]. http://doi.org/10.57761/e54r-bg69
Available download formats: csv, avro, sas, spss, arrow, parquet, application/jsonl, stata
Unique identifier
https://doi.org/10.57761/e54r-bg69
Dataset updated
Mar 3, 2021
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Description
Abstract

Optum ZIP5 v8.0 database in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/). This dataset covers 2003-Q1 to 2020-Q2

Section 10

A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:

It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.

It allows aggregation of multiple, closely timed doctor visits for the same Condition to avoid double-counting the Condition Occurrences.

For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP's original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.

Conventions

Condition Era records will be derived from the records in the CONDITION_OCCURRENCE table using a standardized algorithm.

Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval.

Condition Eras are built with a Persistence Window of 30 days, meaning, if no occurrence of the same condition_concept_id happens within 30 days of any one occurrence, it will be considered the condition_era_end_date.

The text above is taken from the OMOP CDM v5.3 Specification document.
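
As an illustration of the Condition Era conventions quoted above, the following Python sketch groups occurrences of the same condition into eras using the 30-day Persistence Window. It is a simplified reading of the rule, not the standardized OHDSI algorithm itself: the function name `build_condition_eras`, the tuple layout, the concept ID `1001`, and the choice to treat a missing end date as equal to the start date are all assumptions made for the example.

```python
from datetime import date, timedelta

# Persistence Window stated in the quoted conventions.
PERSISTENCE_WINDOW = timedelta(days=30)


def build_condition_eras(occurrences):
    """Collapse condition occurrences into eras.

    occurrences: iterable of (condition_concept_id, start_date, end_date),
    where end_date may be None (assumed equal to start_date here).
    Returns a list of (condition_concept_id, era_start, era_end, count).
    """
    by_concept = {}
    for concept_id, start, end in occurrences:
        by_concept.setdefault(concept_id, []).append((start, end or start))

    eras = []
    for concept_id, spans in sorted(by_concept.items()):
        spans.sort()
        era_start, era_end = spans[0]
        count = 1
        for start, end in spans[1:]:
            if start - era_end <= PERSISTENCE_WINDOW:
                # Same condition seen again within 30 days: extend the era.
                era_end = max(era_end, end)
                count += 1
            else:
                # Gap longer than the window: close the era, start a new one.
                eras.append((concept_id, era_start, era_end, count))
                era_start, era_end, count = start, end, 1
        eras.append((concept_id, era_start, era_end, count))
    return eras


if __name__ == "__main__":
    CONCEPT = 1001  # hypothetical condition_concept_id
    visits = [
        (CONCEPT, date(2020, 1, 5), None),   # PCP visit
        (CONCEPT, date(2020, 1, 25), None),  # specialist visit, 20 days later
        (CONCEPT, date(2020, 6, 1), None),   # unrelated later episode
    ]
    for era in build_condition_eras(visits):
        print(era)
```

In the example data, the PCP and specialist visits fall within 30 days of each other and are merged into one era, while the June occurrence starts a second era.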

Section 8

The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables. For example, the "Condition" Domain contains Concepts that describe a condition of a patient, and these Concepts can only be stored in the condition_concept_id field of the CONDITION_OCCURRENCE and CONDITION_ERA tables. This reference table is populated with a single record for each Domain and includes a descriptive name for the Domain.

Conventions

There is one record for each Domain. The domains are defined by the tables and fields in the OMOP CDM that can contain Concepts describing all the various aspects of the healthcare experience of a patient.

The domain_id field contains an alphanumerical identifier, that can also be used as the abbreviation of the Domain.

The domain_name field contains the unabbreviated names of the Domain.

Each Domain also has an entry in the Concept table, which is recorded in the domain_concept_id field. This is for purposes of creating a closed Information Model, where all entities in the OMOP CDM are covered by unique Concept.

The text above is taken from the OMOP CDM v5.3 Specification document.
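
The Domain rule quoted above (a standardized field only accepts Concepts from its Domain) lends itself to a simple validation check. The sketch below is illustrative only: the concept IDs and the tiny in-memory lookup tables are made up, and `domain_violations` is a hypothetical helper, not part of any OHDSI tool. A real check would read domain_id from the CONCEPT table of the Standardized Vocabularies.

```python
# Hypothetical concept_id -> domain_id lookup (stand-in for the CONCEPT table).
CONCEPT_DOMAIN = {101: "Condition", 102: "Condition", 201: "Drug"}

# Domain accepted by each standardized field; only the two fields named in
# the quoted text are listed.
FIELD_DOMAIN = {
    ("CONDITION_OCCURRENCE", "condition_concept_id"): "Condition",
    ("CONDITION_ERA", "condition_concept_id"): "Condition",
}


def domain_violations(table, field, concept_ids):
    """Return the concept IDs whose Domain does not match the field's Domain.

    Unknown concept IDs are also reported, since their Domain cannot be
    confirmed.
    """
    expected = FIELD_DOMAIN[(table, field)]
    return [cid for cid in concept_ids if CONCEPT_DOMAIN.get(cid) != expected]


if __name__ == "__main__":
    bad = domain_violations(
        "CONDITION_OCCURRENCE", "condition_concept_id", [101, 201, 102]
    )
    print(bad)
```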

Section 12

A Drug Era is defined as a span of time when the Person is assumed to be exposed to a particular active ingredient. A Drug Era is not the same as a Drug Exposure: Exposures are individual records corresponding to the source when Drug was delivered to the Person, while successive periods of Drug Exposures are combined under certain rules to produce continuous Drug Eras.

Conventions

Drug Eras are derived from records in the DRUG_EXPOSURE table using a standardized algorithm.

Each Drug Era corresponds to one or many Drug Exposures that form a continuous interval and contain the same Drug Ingredient (active compound).

The drug_concept_id field only contains Concepts that have the concept_class 'Ingredient'. The Ingredient is derived from the Drug Concepts in the DRUG_EXPOSURE table that are aggregated into the Drug Era record.

The Drug Era Start Date is the start date of the first Drug Exposure.

The Drug Era End Date is the end date of the last Drug Exposure. The End Date of each Drug Exposure is either taken from the field drug_exposure_end_date or, as it is typically not available, inferred using the following rules:

The Gap Days determine how many total drug-free days are observed between all Drug Exposure events that contribute to a DRUG_ERA record. It is assumed that the drugs are "not stockpiled" by the patient, i.e. that if a new drug prescription or refill is observed (a new DRUG_EXPOSURE record is written), the remaining supply from the previous events is abandoned.

The difference between Persistence Window and Gap Days is that the former is the maximum drug-free time allowed between two subsequent DRUG_EXPOSURE records, while the latter is the sum of actual drug-free days for the given Drug Era under the abo
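
The distinction between Persistence Window and Gap Days described above can be made concrete with a short Python sketch. It is an interpretation of the quoted conventions rather than the standardized OHDSI script: the function name `build_drug_eras`, the dictionary layout, the default 30-day window (the quoted text does not state a value for Drug Eras), and the convention that start and end dates both count as exposed days are assumptions.

```python
from datetime import date


def build_drug_eras(exposures, persistence_window_days=30):
    """Build drug eras for ONE person and ONE ingredient.

    exposures: list of (start_date, end_date) tuples.
    persistence_window_days: maximum drug-free days allowed between two
    consecutive exposures (assumed default, not taken from the source).
    Returns a list of dicts with start, end, exposure_count and gap_days.
    """
    eras = []
    current = None
    for start, end in sorted(exposures):
        if current is not None:
            # Drug-free days strictly between the previous end and this start.
            gap = max(0, (start - current["end"]).days - 1)
            if gap <= persistence_window_days:
                # No stockpiling: the newest exposure's end date replaces the
                # previous one, and the actual drug-free days are accumulated.
                current["end"] = end
                current["exposure_count"] += 1
                current["gap_days"] += gap
                continue
            eras.append(current)
        current = {"start": start, "end": end,
                   "exposure_count": 1, "gap_days": 0}
    if current is not None:
        eras.append(current)
    return eras


if __name__ == "__main__":
    refills = [
        (date(2021, 1, 1), date(2021, 1, 30)),
        (date(2021, 2, 5), date(2021, 3, 5)),   # 5 drug-free days before it
        (date(2021, 6, 1), date(2021, 6, 30)),  # long gap: starts a new era
    ]
    for era in build_drug_eras(refills):
        print(era)
```

Here the window acts as a limit on each individual gap, while `gap_days` records the sum of the drug-free days that actually occurred inside an era.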
Types of EMR systems studied.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Apr 18, 2024
Cite
Ward, Roger; Hallinan, Christine Mary; Boyle, Dougie; Chidgey, Christine; Ormiston-Smith, David (2024). Types of EMR systems studied. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001371203
Dataset updated
Apr 18, 2024
Authors
Ward, Roger; Hallinan, Christine Mary; Boyle, Dougie; Chidgey, Christine; Ormiston-Smith, David
Person
redivis.com
Updated Sep 7, 2020
Cite
Redivis Demo Organization (2020). Person [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Dataset updated
Sep 7, 2020
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The Person Domain contains records that uniquely identify each patient in the source data who is time at-risk to have clinical observations recorded within the source systems.
CPRD Primary Care OMOP Common Data Model
healthdatagateway.org
unknown
Updated Dec 15, 2024
Cite
CPRD (2024). CPRD Primary Care OMOP Common Data Model [Dataset]. http://doi.org/10.48329/6xtz-7b42
Available download formats: unknown
Unique identifier
https://doi.org/10.48329/6xtz-7b42
Dataset updated
Dec 15, 2024
Dataset authored and provided by
CPRD
License
https://cprd.com/data-access
Description
The CPRD Primary Care OMOP CDM database contains longitudinal routinely-collected health records (EHR data) from UK primary care practices. The data has been transformed into a common format (data model) using an open community data standard and structure from the OHDSI standardised vocabularies.
KETOS: Clinical decision support and machine learning as a service – A...
plos.figshare.com
zip
Updated Jun 1, 2023
Cite
Julian Gruendner; Thorsten Schwachhofer; Phillip Sippl; Nicolas Wolf; Marcel Erpenbeck; Christian Gulden; Lorenz A. Kapsner; Jakob Zierk; Sebastian Mate; Michael Stürzl; Roland Croner; Hans-Ulrich Prokosch; Dennis Toddenroth (2023). KETOS: Clinical decision support and machine learning as a service – A training and deployment platform based on Docker, OMOP-CDM, and FHIR Web Services [Dataset]. http://doi.org/10.1371/journal.pone.0223010
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0223010
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Julian Gruendner; Thorsten Schwachhofer; Phillip Sippl; Nicolas Wolf; Marcel Erpenbeck; Christian Gulden; Lorenz A. Kapsner; Jakob Zierk; Sebastian Mate; Michael Stürzl; Roland Croner; Hans-Ulrich Prokosch; Dennis Toddenroth
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background and objective

To take full advantage of decision support, machine learning, and patient-level prediction models, it is important that models are not only created, but also deployed in a clinical setting. The KETOS platform demonstrated in this work implements a tool for researchers allowing them to perform statistical analyses and deploy resulting models in a secure environment.

Methods

The proposed system uses Docker virtualization to provide researchers with reproducible data analysis and development environments, accessible via Jupyter Notebook, to perform statistical analysis and develop, train and deploy models based on standardized input data. The platform is built in a modular fashion and interfaces with web services using the Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) standard to access patient data. In our prototypical implementation we use an OMOP common data model (OMOP-CDM) database. The architecture supports the entire research lifecycle from creating a data analysis environment, retrieving data, and training to final deployment in a hospital setting.

Results

We evaluated the platform by establishing and deploying an analysis and end user application for hemoglobin reference intervals within the University Hospital Erlangen. To demonstrate the potential of the system to deploy arbitrary models, we loaded a colorectal cancer dataset into an OMOP database and built machine learning models to predict patient outcomes and made them available via a web service. We demonstrated both the integration with FHIR as well as an example end user application. Finally, we integrated the platform with the open source DataSHIELD architecture to allow for distributed privacy preserving data analysis and training across networks of hospitals.

Conclusion

The KETOS platform takes a novel approach to data analysis, training and deploying decision support models in a hospital or healthcare setting. It does so in a secure and privacy-preserving manner, combining the flexibility of Docker virtualization with the advantages of standardized vocabularies, a widely applied database schema (OMOP-CDM), and a standardized way to exchange medical data (FHIR).
Observational Medical Outcomes Partnership
bioregistry.io
Updated Apr 22, 2021
Cite
(2021). Observational Medical Outcomes Partnership [Dataset]. https://bioregistry.io/omop
Dataset updated
Apr 22, 2021
Description
The OMOP Common Data Model allows for the systematic analysis of disparate observational databases. The concept behind this approach is to transform data contained within those databases into a common format (data model) as well as a common representation (terminologies, vocabularies, coding schemes), and then perform systematic analyses using a library of standard analytic routines that have been written based on the common format.
Medication table mappings.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Apr 18, 2024
Boyle, Dougie; Ward, Roger; Hallinan, Christine Mary; Ormiston-Smith, David; Chidgey, Christine (2024). Medication table mappings. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001371238
Authors
Boyle, Dougie; Ward, Roger; Hallinan, Christine Mary; Ormiston-Smith, David; Chidgey, Christine
Description
Background: The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and 'validation' analyses at multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.
Methods: We used standard structured query language (SQL) to construct extract, transform, and load (ETL) scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.
Results: Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A 'FAIL' occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.
Conclusion: The OMOP CDM's widespread international use, support, and training provide a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitate adoption and integration into existing data processes.
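The pass/fail rule described in the Results can be illustrated with a minimal Python sketch. This is not the OHDSI Data Quality Dashboard's own code; the `CheckResult` class, its field names, and the example numbers are hypothetical and exist only to show how a per-check threshold turns a count of non-compliant rows into a status and an overall pass rate.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One data quality evaluation (hypothetical structure, not the DQD schema)."""
    name: str             # illustrative label for the check
    violated_rows: int    # rows that do not comply with the check
    total_rows: int       # rows the check was applied to
    threshold_pct: float  # allowed percentage of non-compliant rows


def check_status(check: CheckResult) -> str:
    """Return 'FAIL' when the non-compliant percentage exceeds the threshold."""
    if check.total_rows == 0:
        return "PASS"  # assumption: a check with no applicable rows passes
    pct = 100.0 * check.violated_rows / check.total_rows
    return "FAIL" if pct > check.threshold_pct else "PASS"


def pass_rate(checks: list[CheckResult]) -> float:
    """Percentage of checks whose status is 'PASS'."""
    passed = sum(check_status(c) == "PASS" for c in checks)
    return 100.0 * passed / len(checks)


# Made-up example values, not taken from the PATRON assessment.
example_checks = [
    CheckResult("no_violations", 0, 100, 5.0),
    CheckResult("above_threshold", 6, 100, 5.0),
    CheckResult("at_threshold", 5, 100, 5.0),
]
statuses = [check_status(c) for c in example_checks]
overall = pass_rate(example_checks)
```

Under this rule a check sitting exactly at its threshold still passes, because a 'FAIL' requires the percentage to exceed the threshold.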
Domain
redivis.com
Updated Sep 7, 2020
Redivis Demo Organization (2020). Domain [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables.
Data from: Drug exposure
redivis.com
Updated Sep 7, 2020
Redivis Demo Organization (2020). Drug exposure [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The 'Drug' domain captures records about the utilization of a Drug when ingested or otherwise introduced into the body.
DataSheet_1_The Effect of Statins on Mortality of Patients With Chronic...
frontiersin.figshare.com
docx
Updated Jun 1, 2023
Ji Eun Kim; Yun Jin Choi; Se Won Oh; Myung Gyu Kim; Sang Kyung Jo; Won Yong Cho; Shin Young Ahn; Young Joo Kwon; Gang-Jee Ko (2023). DataSheet_1_The Effect of Statins on Mortality of Patients With Chronic Kidney Disease Based on Data of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) and Korea National Health Insurance Claims Database.docx [Dataset]. http://doi.org/10.3389/fneph.2021.821585.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fneph.2021.821585.s001
Dataset provided by
Frontiers
Authors
Ji Eun Kim; Yun Jin Choi; Se Won Oh; Myung Gyu Kim; Sang Kyung Jo; Won Yong Cho; Shin Young Ahn; Young Joo Kwon; Gang-Jee Ko
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The role of statins in chronic kidney disease (CKD) has been extensively evaluated, but it remains controversial in specific populations such as patients with dialysis-dependent CKD. This study examined the effect of statins on mortality in CKD patients using two large databases. In data from the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) from two hospitals, CKD was defined as an estimated glomerular filtration rate < 60 mL/min/1.73 m²; we compared survival between patients with and without statin treatment. As a sensitivity analysis, the results were validated with the Korea National Health Insurance (KNHI) claims database. In the analysis of CDM datasets, statin users showed significantly lower risks of all-cause and cardiovascular mortality in both hospitals, compared to non-users. Similar results were observed in CKD patients from the KNHI claims database. Lower mortality in the statin group was consistently evident in all subgroup analyses, including patients on dialysis and low-risk young patients. In conclusion, we found that statins were associated with lower mortality in CKD patients, regardless of dialysis status or other risk factors.
Optum DOD OMOP
redivis.com
application/jsonl +7
Updated Aug 18, 2020
Stanford Center for Population Health Sciences (2020). Optum DOD OMOP [Dataset]. http://doi.org/10.57761/dbqm-8c86
Available download formats: sas, csv, stata, application/jsonl, parquet, arrow, spss, avro
Unique identifier
https://doi.org/10.57761/dbqm-8c86
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Description
Abstract

Optum DOD (Date of Death) v8.0 database in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/)

Section 10

A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:

It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.

It allows aggregation of multiple, closely timed doctor visits for the same Condition to avoid double-counting the Condition Occurrences.

For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP's original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.

Conventions

Condition Era records will be derived from the records in the CONDITION_OCCURRENCE table using a standardized algorithm.

Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval.

Condition Eras are built with a Persistence Window of 30 days, meaning, if no occurrence of the same condition_concept_id happens within 30 days of any one occurrence, it will be considered the condition_era_end_date.

The text above is taken from the OMOP CDM v5.3 Specification document.
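The Persistence Window convention can be illustrated with a short Python sketch. It is a simplified stand-in for the standardized algorithm mentioned above, not a reproduction of it: the tuple layout, the function name `build_condition_eras`, and the choice to treat a missing end date as equal to the start date are all assumptions made for this example.

```python
from datetime import date, timedelta

# 30-day Persistence Window, as stated in the conventions above.
PERSISTENCE_WINDOW = timedelta(days=30)


def build_condition_eras(occurrences):
    """Merge condition occurrences into eras.

    occurrences: iterable of
        (person_id, condition_concept_id, start_date, end_date_or_None)
    Returns a list of
        (person_id, condition_concept_id, era_start, era_end, occurrence_count).
    """
    eras = []
    # Sort by person, concept, and start date so each group is contiguous.
    for person_id, concept_id, start, end in sorted(occurrences, key=lambda o: o[:3]):
        end = end if end is not None else start  # assumption: no end date means a one-day record
        last = eras[-1] if eras else None
        same_group = (
            last is not None and last[0] == person_id and last[1] == concept_id
        )
        if same_group and start <= last[3] + PERSISTENCE_WINDOW:
            # Within the window: extend the current era.
            last[3] = max(last[3], end)
            last[4] += 1
        else:
            # Gap longer than the window (or a new person/concept): start a new era.
            eras.append([person_id, concept_id, start, end, 1])
    return [tuple(e) for e in eras]


# Hypothetical records: two visits 19 days apart, then one more than 30 days later.
example = [
    (1, 100, date(2020, 1, 1), None),
    (1, 100, date(2020, 1, 20), None),
    (1, 100, date(2020, 4, 1), None),
]
eras = build_condition_eras(example)
```

In this example the first two records collapse into one era covering 1 to 20 January 2020, matching the PCP-then-specialist scenario described earlier, while the April record opens a second era.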

Section 5

The CONCEPT_ANCESTOR table is designed to simplify observational analysis by providing the complete hierarchical relationships between Concepts. Only direct parent-child relationships between Concepts are stored in the CONCEPT_RELATIONSHIP table. To determine higher level ancestry connections, all individual direct relationships would have to be navigated at analysis time. The CONCEPT_ANCESTOR table includes records for all parent-child relationships, as well as grandparent-grandchild relationships and those of any other level of lineage.

Using the CONCEPT_ANCESTOR table allows for querying for all descendants of a hierarchical concept. For example, drug ingredients and drug products are all descendants of a drug class ancestor.
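As a rough illustration of why precomputing ancestry helps, the following Python sketch derives ancestor-descendant rows from direct parent-child pairs. It is an assumption-laden toy, not the vocabulary build process: the function names are hypothetical, concepts are represented by plain strings instead of concept IDs, and only the shortest number of levels between two concepts is recorded.

```python
from collections import defaultdict, deque


def build_concept_ancestor(parent_child_pairs):
    """Expand direct (parent, child) pairs into (ancestor, descendant, levels) rows."""
    children = defaultdict(set)
    for parent, child in parent_child_pairs:
        children[parent].add(child)

    rows = set()
    for ancestor in list(children):
        # Breadth-first walk so each descendant gets its shortest distance.
        seen = {ancestor: 0}
        queue = deque([ancestor])
        while queue:
            node = queue.popleft()
            for child in children.get(node, ()):
                if child not in seen:
                    seen[child] = seen[node] + 1
                    queue.append(child)
        for descendant, levels in seen.items():
            if descendant != ancestor:
                rows.add((ancestor, descendant, levels))
    return rows


def descendants_of(rows, ancestor):
    """All descendants of one concept, answered with a single pass over the rows."""
    return {d for a, d, _ in rows if a == ancestor}


# Hypothetical hierarchy mirroring the drug example above.
direct = [("drug_class", "ingredient"), ("ingredient", "drug_product")]
ancestor_rows = build_concept_ancestor(direct)
class_descendants = descendants_of(ancestor_rows, "drug_class")
```

Once the rows exist, finding every descendant of a drug class is a simple filter, with no need to walk direct relationships at analysis time.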

Conventions

The concept_name field contains a valid Synonym of a concept, including the description in the concept_name itself. I.e. each Concept has at least one Synonym in the CONCEPT_SYNONYM table. As an example, for a SNOMED-CT Concept, if the fully specified name is stored as the concept_name of the CONCEPT table, then the Preferred Term and Synonyms associated with the Concept are stored in the CONCEPT_SYNONYM table.

Only Synonyms that are active and current are stored in the CONCEPT_SYNONYM table. Tracking synonym/description history and mapping of obsolete synonyms to current Concepts/Synonyms is out of scope for the Standard Vocabularies.

Currently, only English Synonyms are included.

The text above is taken from the OMOP CDM v5.3 Specification document.

Section 4

The COST table captures records containing the cost of any medical entity recorded in one of the DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, VISIT_OCCURRENCE or DEVICE_OCCURRENCE tables.

The information about the cost is defined by the amount of money paid by the Person and Payer, or as the charged cost by the healthcare provider. So, the COST table can be used to represent both cost and revenue perspectives. The cost_type_concept_id field will use concepts in the Standardized Vocabularies to designate the source of the cost data. A reference to the health plan information in the PAYER_PLAN_PERIOD table is stored in the record that is responsible for the determination of the cost as well as some of the payments.

Convention

The COST table will store information reporting money or currency amounts. There are three types of cost data, defined in the cost_type_concept_id: 1) paid or reimbursed amounts, 2) charges or list prices (such as Average Wholesale Prices), and 3) costs or expenses incurred by the provider. The defined fields are variables found in almost all U.S.-based claims data sources, which is the most common data source for researchers. Non-U.S.-based data holders are encouraged to engage with OHDSI to adjust these tables to their needs.

One cost record is generated for each response by a payer. In a claims database, the payment and payment terms reported by the payer for the goods or services billed will generate one cost record. If the source data has payment information f
Cost
redivis.com
Updated Sep 7, 2020
Redivis Demo Organization (2020). Cost [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The COST table captures records containing the cost of any medical event recorded in one of the OMOP clinical event tables such as DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, VISIT_OCCURRENCE, VISIT_DETAIL, DEVICE_OCCURRENCE, OBSERVATION or MEASUREMENT.
Provider
redivis.com
Updated Sep 7, 2020
Redivis Demo Organization (2020). Provider [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
2008 - 2010
Description
The PROVIDER table contains a list of uniquely identified healthcare providers. These are individuals providing hands-on healthcare to patients, such as physicians, nurses, midwives, physical therapists etc.
