53 datasets found
  1. CMS Synthetic Patient Data OMOP

    • redivis.com
    application/jsonl +7
    Updated Aug 19, 2020
    + more versions
    Cite
    Redivis Demo Organization (2020). CMS Synthetic Patient Data OMOP [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
    Explore at:
    Available download formats: sas, avro, parquet, stata, application/jsonl, arrow, csv, spss
    Dataset updated
    Aug 19, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Jan 1, 2008 - Dec 31, 2010
    Description

    Abstract

    This is a synthetic patient dataset in the OMOP Common Data Model v5.2, originally released by the CMS and accessed via BigQuery. The dataset includes 24 tables and records for 2 million synthetic patients from 2008 to 2010.

    Methodology

    This dataset follows the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). The purpose of the Common Data Model is to convert differently formatted datasets into a well-known, universal format with a set of standardized vocabularies, as illustrated in the diagram below from the Observational Health Data Sciences and Informatics (OHDSI) webpage.

    [Image: Why-CDM.png] https://redivis.com/fileUploads/d1a95a4e-074a-44d1-92e5-9adfd2f4068a

    Such universal data models ultimately enable researchers to streamline the analysis of observational medical data. For more information regarding the OMOP CDM, refer to the OHDSI OMOP site.

    Usage

    • For documentation regarding the source data format from the Centers for Medicare & Medicaid Services (CMS), refer to the CMS Synthetic Public Use File (https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF).

    • For information regarding the conversion of the CMS data file to the OMOP CDM v5.2, refer to this OHDSI GitHub page (https://github.com/OHDSI/ETL-CMS).

    • For information regarding each of the 24 tables in this dataset, including more detailed variable metadata, see the OHDSI CDM GitHub Wiki page (https://github.com/OHDSI/CommonDataModel/wiki). All variable labels and descriptions as well as table descriptions come from this Wiki page. Note that this GitHub page primarily covers version 6.0 of the CDM, whereas this dataset uses version 5.2.

  2. CPRD Primary Care OMOP Common Data Model

    • healthdatagateway.org
    unknown
    Updated Dec 15, 2024
    + more versions
    Cite
    CPRD (2024). CPRD Primary Care OMOP Common Data Model [Dataset]. http://doi.org/10.48329/6xtz-7b42
    Explore at:
    Available download formats: unknown
    Dataset updated
    Dec 15, 2024
    Dataset authored and provided by
    CPRD
    License

    HTTPS://CPRD.COM/DATA-ACCESS

    Description

    The CPRD Primary Care OMOP CDM database contains longitudinal routinely-collected health records (EHR data) from UK primary care practices. The data has been transformed into a common format (data model) using an open community data standard and structure from the OHDSI standardised vocabularies.

  3. Synthea synthetic patient generator data in OMOP Common Data Model

    • registry.opendata.aws
    Updated Jan 4, 2023
    Cite
    Amazon Web Services (2023). Synthea synthetic patient generator data in OMOP Common Data Model [Dataset]. https://registry.opendata.aws/synthea-omop/
    Explore at:
    Dataset updated
    Jan 4, 2023
    Dataset provided by
    Amazon.com (http://amazon.com/)
    Description

    The Synthea-generated data is provided here as 1,000-person (1k), 100,000-person (100k), and 2,800,000-person (2.8m) data sets in the OMOP Common Data Model format. Synthea™ is a synthetic patient generator that models the medical history of synthetic patients. Our mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government (although a citation would be appreciated). You can read our first academic paper here: https://doi.org/10.1093/jamia/ocx079

  4. Domain

    • redivis.com
    Updated Sep 6, 2020
    Cite
    Redivis Demo Organization (2020). Domain [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
    Explore at:
    Dataset updated
    Sep 6, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    2008 - 2010
    Description

    The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables.

  5. Data_Sheet_1_Enabling data sharing and utilization for African population...

    • figshare.com
    docx
    Updated Jun 9, 2023
    Cite
    Sylvia Kiwuwa-Muyingo; Jim Todd; Tathagata Bhattacharjee; Amelia Taylor; Jay Greenfield (2023). Data_Sheet_1_Enabling data sharing and utilization for African population health data using OHDSI tools with an OMOP-common data model.docx [Dataset]. http://doi.org/10.3389/fpubh.2023.1116682.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Frontiers
    Authors
    Sylvia Kiwuwa-Muyingo; Jim Todd; Tathagata Bhattacharjee; Amelia Taylor; Jay Greenfield
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The COVID-19 pandemic has spurred the use of AI and DS innovations in data collection and aggregation. Extensive data on many aspects of COVID-19 has been collected and used to optimize the public health response to the pandemic and to manage the recovery of patients in Sub-Saharan Africa. However, there is no standard mechanism for collecting, documenting and disseminating COVID-19 related data or metadata, which makes use and reuse a challenge. INSPIRE utilizes the Observational Medical Outcomes Partnership (OMOP) as the Common Data Model (CDM) implemented in the cloud as a Platform as a Service (PaaS) for COVID-19 data. The INSPIRE PaaS for COVID-19 data leverages the cloud gateway for both individual research organizations and for data networks. Individual research institutions may choose to use the PaaS to access the FAIR data management, data analysis and data sharing capabilities which come with the OMOP CDM. Network data hubs may be interested in harmonizing data across localities using the CDM conditioned by the data ownership and data sharing agreements available under OMOP's federated model. The INSPIRE platform for evaluation of COVID-19 Harmonized data (PEACH) harmonizes data from Kenya and Malawi. Data sharing platforms must remain trusted digital spaces that protect human rights and foster citizens' participation, which is vital in an era of information overload from the internet. The channel for sharing data between localities is included in the PaaS and is based on data sharing agreements provided by the data producer. This allows the data producers to retain control over how their data are used, which can be further protected through the use of the federated CDM. Federated regional OMOP-CDM are based on the PaaS instances and analysis workbenches in INSPIRE-PEACH with harmonized analysis powered by the AI technologies in OMOP. These AI technologies can be used to discover and evaluate pathways that COVID-19 cohorts take through public health interventions and treatments. By using both the data mapping and terminology mapping, we construct ETLs that populate the data and/or metadata elements of the CDM, making the hub both a central model and a distributed model.

  6. Relationship

    • redivis.com
    Updated Sep 6, 2020
    Cite
    Redivis Demo Organization (2020). Relationship [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
    Explore at:
    Dataset updated
    Sep 6, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    2008 - 2010
    Description

    The RELATIONSHIP table provides a reference list of all types of relationships that can be used to associate any two concepts in the CONCEPT_RELATIONSHIP table.

  7. Synthea OMOP (CDM) - North East and North Cumbria

    • healthdatagateway.org
    unknown
    Updated Jun 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Synthea OMOP (CDM) - North East and North Cumbria [Dataset]. https://healthdatagateway.org/en/dataset/1351
    Explore at:
    Available download formats: unknown
    Dataset updated
    Jun 17, 2025
    License

    https://northeastnorthcumbria.nhs.uk/our-work/secure-data-environment/

    Description

    Synthetic Primary Care Data (Synthea) transformed into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)

    Data is sourced from https://synthea.mitre.org/downloads using the 100 sample patient CSV variant of available downloads. Data has been transformed using the ETL methods described by https://github.com/OHDSI/ETL-Synthea

    This is a patient-level dataset of primary care data covering 100 synthetic patients.

  8. Data from: Drug exposure

    • redivis.com
    Updated Sep 6, 2020
    Cite
    Redivis Demo Organization (2020). Drug exposure [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
    Explore at:
    Dataset updated
    Sep 6, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    2008 - 2010
    Description

    The 'Drug' domain captures records about the utilization of a Drug when ingested or otherwise introduced into the body.

  9. Additional file 3 of Empirical assessment of alternative methods for...

    • springernature.figshare.com
    zip
    Updated Jun 5, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anthony Molinaro; Frank DeFalco (2023). Additional file 3 of Empirical assessment of alternative methods for identifying seasonality in observational healthcare data [Dataset]. http://doi.org/10.6084/m9.figshare.20222597.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anthony Molinaro; Frank DeFalco
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 3: upsetRplots.zip. All 30 UpsetR plots.

  10. OMOP results as of 20/10/22.

    • plos.figshare.com
    xls
    Updated Apr 18, 2024
    + more versions
    Cite
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). OMOP results as of 20/10/22. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

    Methods

    We used standard structured query language (SQL) to construct, extract, transform, and load scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.

    Results

    Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ‘FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.

    Conclusion

    The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.
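
    The pass/fail rule described in the Results can be illustrated with a minimal Python sketch. It is not the Data Quality Dashboard's own code: the function names, the treatment of empty tables, and the sample numbers are all assumptions made for illustration only.

```python
def check_status(violated_rows, total_rows, threshold_pct):
    """Return 'FAIL' when the share of non-compliant rows exceeds the threshold."""
    if total_rows == 0:
        # Assumption: a check with no rows to evaluate is treated as passing.
        return "PASS"
    pct_violated = 100.0 * violated_rows / total_rows
    return "FAIL" if pct_violated > threshold_pct else "PASS"


def pass_rate(statuses):
    """Percentage of checks whose status is 'PASS'."""
    return 100.0 * sum(s == "PASS" for s in statuses) / len(statuses)


# Hypothetical checks: (violated rows, total rows, threshold in percent).
checks = [(0, 1000, 5), (30, 1000, 5), (80, 1000, 5), (10, 1000, 0)]
statuses = [check_status(*c) for c in checks]
print(statuses, pass_rate(statuses))
```

    In this sketch a check at exactly the threshold still passes, matching the wording "exceeded the specified threshold value".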

  11. Example (synthetic) electronic health record data

    • rdr.ucl.ac.uk
    application/csv
    Updated Apr 24, 2024
    Cite
    Steve Harris; Wai Shing Lai (2024). Example (synthetic) electronic health record data [Dataset]. http://doi.org/10.5522/04/25676298.v1
    Explore at:
    Available download formats: application/csv
    Dataset updated
    Apr 24, 2024
    Dataset provided by
    University College London
    Authors
    Steve Harris; Wai Shing Lai
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These data are modelled using the OMOP Common Data Model v5.3.

    Correlated Data Source

    NG tube vocabularies

    Generation Rules

    • The patient’s age should be between 18 and 100 at the moment of the visit.
    • Ethnicity data is using 2021 census data in England and Wales (Census in England and Wales 2021).
    • Gender is equally distributed between Male and Female (50% each).
    • Every person in the record has a link in procedure_occurrence with the concept “Checking the position of nasogastric tube using X-ray”.
    • 2% of person records have a link in procedure_occurrence with the concept of “Plain chest X-ray”.
    • 60% of visit_occurrence has visit concept “Inpatient Visit”, while 40% have “Emergency Room Visit”.

    Notes

    • Version 0
    • Generated by man-made rule/story generator
    • Structural correct, all tables linked with the relationship
    • We used national ethnicity data to generate a realistic distribution (see below)

    2011 Race Census figure in England and Wales

    Ethnic Group : Population (%)
    • Asian or Asian British: Bangladeshi - 1.1
    • Asian or Asian British: Chinese - 0.7
    • Asian or Asian British: Indian - 3.1
    • Asian or Asian British: Pakistani - 2.7
    • Asian or Asian British: any other Asian background - 1.6
    • Black or African or Caribbean or Black British: African - 2.5
    • Black or African or Caribbean or Black British: Caribbean - 1
    • Black or African or Caribbean or Black British: other Black or African or Caribbean background - 0.5
    • Mixed multiple ethnic groups: White and Asian - 0.8
    • Mixed multiple ethnic groups: White and Black African - 0.4
    • Mixed multiple ethnic groups: White and Black Caribbean - 0.9
    • Mixed multiple ethnic groups: any other Mixed or multiple ethnic background - 0.8
    • White: English or Welsh or Scottish or Northern Irish or British - 74.4
    • White: Irish - 0.9
    • White: Gypsy or Irish Traveller - 0.1
    • White: any other White background - 6.4
    • Other ethnic group: any other ethnic group - 1.6
    • Other ethnic group: Arab - 0.6

  12. Person

    • redivis.com
    Updated Sep 6, 2020
    Cite
    Redivis Demo Organization (2020). Person [Dataset]. https://redivis.com/datasets/ye2v-6skh7wdr7
    Explore at:
    Dataset updated
    Sep 6, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    2008 - 2010
    Description

    The Person Domain contains records that uniquely identify each patient in the source data who is time at-risk to have clinical observations recorded within the source systems.

  13. Optum ZIP5 OMOP

    • redivis.com
    application/jsonl +7
    Updated Mar 3, 2021
    Cite
    Stanford Center for Population Health Sciences (2021). Optum ZIP5 OMOP [Dataset]. http://doi.org/10.57761/e54r-bg69
    Explore at:
    Available download formats: sas, csv, arrow, application/jsonl, stata, spss, avro, parquet
    Dataset updated
    Mar 3, 2021
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Description

    Abstract

    Optum ZIP5 v8.0 database in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/). This dataset covers 2003-Q1 to 2020-Q2

    Section 10

    A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:

    • It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.
    • It allows aggregation of multiple, closely timed doctor visits for the same Condition to avoid double-counting the Condition Occurrences.

    For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP's original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.

    Conventions

    • Condition Era records will be derived from the records in the CONDITION_OCCURRENCE table using a standardized algorithm.
    • Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval.
    • Condition Eras are built with a Persistence Window of 30 days, meaning, if no occurrence of the same condition_concept_id happens within 30 days of any one occurrence, it will be considered the condition_era_end_date.
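
    The conventions above can be sketched in Python as follows. This is a simplified illustration, not the standardized OHDSI algorithm itself: the function name, the tuple layout, the sample dates, and the choice to treat a gap of exactly 30 days as still inside the window are all assumptions.

```python
from datetime import date, timedelta


def build_condition_eras(occurrences, persistence_days=30):
    """Merge (start_date, end_date) occurrences of one condition for one person into eras."""
    window = timedelta(days=persistence_days)
    eras = []
    for start, end in sorted(occurrences):
        if eras and start <= eras[-1][1] + window:
            # Within the persistence window of the running era: extend it.
            era_start, era_end = eras[-1]
            eras[-1] = (era_start, max(era_end, end))
        else:
            # Gap is longer than the persistence window: start a new era.
            eras.append((start, end))
    return eras


# Hypothetical occurrences for a single person and condition_concept_id.
visits = [
    (date(2010, 1, 5), date(2010, 1, 5)),    # PCP visit
    (date(2010, 1, 25), date(2010, 1, 25)),  # specialist visit 20 days later
    (date(2010, 6, 1), date(2010, 6, 1)),    # much later occurrence
]
eras = build_condition_eras(visits)
print(eras)
```

    Here the PCP and specialist visits collapse into one era, while the June occurrence starts a second one, which mirrors the referral example given earlier in this section.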

    The text above is taken from the OMOP CDM v5.3 Specification document.

    Section 8

    The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables. For example, the "Condition" Domain contains Concepts that describe a condition of a patient, and these Concepts can only be stored in the condition_concept_id field of the CONDITION_OCCURRENCE and CONDITION_ERA tables. This reference table is populated with a single record for each Domain and includes a descriptive name for the Domain.

    Conventions

    • There is one record for each Domain. The domains are defined by the tables and fields in the OMOP CDM that can contain Concepts describing all the various aspects of the healthcare experience of a patient.
    • The domain_id field contains an alphanumerical identifier, that can also be used as the abbreviation of the Domain.
    • The domain_name field contains the unabbreviated names of the Domain.
    • Each Domain also has an entry in the Concept table, which is recorded in the domain_concept_id field. This is for purposes of creating a closed Information Model, where all entities in the OMOP CDM are covered by unique Concept.

    The text above is taken from the OMOP CDM v5.3 Specification document.

    Section 12

    A Drug Era is defined as a span of time when the Person is assumed to be exposed to a particular active ingredient. A Drug Era is not the same as a Drug Exposure: Exposures are individual records corresponding to the source when Drug was delivered to the Person, while successive periods of Drug Exposures are combined under certain rules to produce continuous Drug Eras.

    Conventions

    • Drug Eras are derived from records in the DRUG_EXPOSURE table using a standardized algorithm.
    • Each Drug Era corresponds to one or many Drug Exposures that form a continuous interval and contain the same Drug Ingredient (active compound).
    • The drug_concept_id field only contains Concepts that have the concept_class 'Ingredient'. The Ingredient is derived from the Drug Concepts in the DRUG_EXPOSURE table that are aggregated into the Drug Era record.
    • The Drug Era Start Date is the start date of the first Drug Exposure.
    • The Drug Era End Date is the end date of the last Drug Exposure. The End Date of each Drug Exposure is either taken from the field drug_exposure_end_date or, as it is typically not available, inferred using the following rules:
    • The Gap Days determine how many total drug-free days are observed between all Drug Exposure events that contribute to a DRUG_ERA record. It is assumed that the drugs are "not stockpiled" by the patient, i.e. that if a new drug prescription or refill is observed (a new DRUG_EXPOSURE record is written), the remaining supply from the previous events is abandoned.
    • The difference between Persistence Window and Gap Days is that the former is the maximum drug-free time allowed between two subsequent DRUG_EXPOSURE records, while the latter is the sum of actual drug-free days for the given Drug Era under the abo
  14. AFC OMOP DID

    • redivis.com
    application/jsonl +7
    Updated Aug 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stanford Center for Population Health Sciences (2024). AFC OMOP DID [Dataset]. http://doi.org/10.57761/88ka-5r20
    Explore at:
    Available download formats: csv, avro, stata, arrow, application/jsonl, spss, sas, parquet
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Description

    Abstract

    This dataset is the American Family Cohort (AFC) Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) dataset.

    This dataset is a medium risk (confidential) de-identified dataset (OMOP DID).

    Note: A few updates have been made to the dataset in the August 2024 release. Please check the "Update Notes" section for more details.

    Usage

    For more details please go to:

    https://ohdsi.github.io/CommonDataModel/cdm54.html

    AFC OMOP Specifications

    Metadata access is required to view this section.

    Update Notes

    Metadata access is required to view this section.

  15. Addressing the Challenges of Health Data Standard Adoption and Usage: A...

    • zenodo.org
    bin
    Updated May 12, 2025
    Cite
    Alberto Marfoglia; Valerio Antonio Arcobelli; Serena Moscato; Antonino Amedeo La Mattina; Sabato Mellone; Antonella Carbonaro (2025). Addressing the Challenges of Health Data Standard Adoption and Usage: A Systematic Review - Data Extraction [Dataset]. http://doi.org/10.5281/zenodo.15358180
    Explore at:
    Available download formats: bin
    Dataset updated
    May 12, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alberto Marfoglia; Valerio Antonio Arcobelli; Serena Moscato; Antonino Amedeo La Mattina; Sabato Mellone; Antonella Carbonaro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 7, 2025
    Description

    This table presents the data extraction from the 99 studies included according to the criteria outlined in the main manuscript. It is provided as supplementary material to enhance the readability of the paper while ensuring that all relevant information is preserved and accessible without loss of detail.

    The names of the variables and their descriptions are provided in the attached file, along with the following details:

    Variable | Description
    Ref. | The citation in the format: First author et al. [Year] (e.g., AuthorA et al. [2022]). This identifies the study's primary citation for easy reference.
    Title | The title of the paper.
    Standard | The healthcare data standard used in the study. Possible values are: OMOP, OpenEHR, FHIR.
    Study Location | The country where the study was conducted.
    Objective for using the standard / Detailed | The comprehensive explanation of the specific objective of using the standard in the study, describing how it supports the study’s goals.
    Short | The primary purpose for applying the healthcare standard. Possible values are: Secondary data reuse, Data exchange, Clinical decision support, Vocabulary definition, EHR system design,
    Application domain / Type | The application domain type that represents the healthcare standard. Possible values are: Clinical (studies with a direct impact on clinical practice, applying established tools or methods in healthcare settings, e.g., predicting in-hospital mortality for heart attack patients) and Research (studies proposing innovative tools, methodologies, or frameworks still in the design/testing phase, not yet clinically implemented).
    Healthcare Area | The relevant healthcare domain for the study, such as Cardiovascular, Intensive Care Unit, Emergency Department, Oncology, Biology, etc.
    Cluster | The healthcare domain clustered for easier readability. Possible values include: Clinical Medicine, Clinical Services and Diagnostics, Public Health, Health Information Management and Biomedical Sciences.
    Use | Reports whether the results of the paper serve a Primary use (direct care) or a Secondary use (repurposing existing data or tools for new objectives).
    Scale | The scale of the study. Possible values are: Single center (one hospital/clinic), Multi-center (multiple institutions), Regional (specific region), National level (countrywide).
    Dataset magnitude in patients | The magnitude of the dataset expressed as a letter category. Possible values are: A (<10 to 99), B (100 to 9,999), C (10,000 to 999,999) and D (1,000,000 and above).
    N° Elements | The number of input variables in the standardization process.
    Percentage of mapped variables | The percentage of successful data standardisation.
    Coverage of the standard | The methodology of standardisation, whether it was adapted or not.
    ETL Tools / Data cleaning & extraction | The tools adopted for supporting data cleaning and extraction.
    Mapping | The tools adopted for the mapping of the variables.
    Validation | The tools adopted for the validation of the standardization process.
    Database | The database adopted for storing the result of the healthcare data standardization.
    Process efficiency and Economic assessment | The information about the economic impact if the consequences are concrete and measured by the authors (e.g., actual cost savings, resource usage reductions). If the authors did not measure the economic impact, this field remains blank.
    Comments by authors / Limitations | The significant limitations or challenges faced during the study about the standard adopted, such as issues with data compatibility, scalability, or the need for customization.
    Advantages | The benefits of applying the standard model, such as improved data consistency, enhanced clinical outcomes, better interoperability, or more efficient workflows.
  16. Genomics England - OMOP CDM

    • healthdatagateway.org
    unknown
    Updated Oct 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Genomics England Ltd (2024). Genomics England - OMOP CDM [Dataset]. https://healthdatagateway.org/en/dataset/373
    Explore at:
    Available download formats: unknown
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    Genomics England
    Authors
    Genomics England Ltd
    License

    https://www.genomicsengland.co.uk/research/academic

    Description

    Genomics England 100k data in OMOP CDM v5.4 format. Includes 100k data and PHE NCRAS data.

  17. EMR tables and related tables in the OMOP CDM.

    • figshare.com
    xls
    Updated Apr 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). EMR tables and related tables in the OMOP CDM. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

    Methods: We used standard structured query language (SQL) to construct, extract, transform, and load scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.

    Results: Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ’FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.

    Conclusion: The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.
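
The frequency-based mapping strategy described in the methods can be illustrated with a short sketch. This is not the authors' code: the function name `terms_to_map`, the example terms, and the counts are hypothetical, and the only figure taken from the description is the target of more than 95% of records. The sketch walks the distinct terms of one domain from most to least frequent and stops once the selected terms cover more than the target share of records; the count of the last selected term then acts as the frequency threshold for that domain.

```python
from collections import Counter


def terms_to_map(term_counts, target_coverage=0.95):
    """Pick the most frequent terms whose records exceed target_coverage.

    term_counts maps a free-text term to the number of records that use it.
    Returns (selected_terms, min_frequency, achieved_coverage).
    """
    total = sum(term_counts.values())
    selected, covered = [], 0
    # Most frequent first; ties are broken alphabetically for determinism.
    ordered = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for term, count in ordered:
        if total == 0 or covered / total > target_coverage:
            break
        selected.append(term)
        covered += count
    min_frequency = term_counts[selected[-1]] if selected else 0
    coverage = covered / total if total else 0.0
    return selected, min_frequency, coverage


# Hypothetical counts for one domain (illustrative values only).
counts = Counter({
    "hypertension": 600,
    "asthma": 250,
    "t2dm": 120,
    "htn nos": 20,
    "astma": 10,
})
selected, min_frequency, coverage = terms_to_map(counts)
print(selected, min_frequency, round(coverage, 2))
```

In this toy input the three most frequent terms already account for 97% of records, so the two rare spellings would be left unmapped under such a rule.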

  18. OMOP2OBO Condition Occurrence Mappings

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Mar 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tiffany J Callahan; Jordan M Wyrwa; Nicole A Vasilevsky; Tellen D Bennett; Blake Martin; James A Feinstein; William A Baumgartner; Lawrence D Hunter; Michael G Kahn (2023). OMOP2OBO Condition Occurrence Mappings [Dataset]. http://doi.org/10.5281/zenodo.7250177
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 29, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tiffany J Callahan; Jordan M Wyrwa; Nicole A Vasilevsky; Tellen D Bennett; Blake Martin; James A Feinstein; William A Baumgartner; Lawrence D Hunter; Michael G Kahn
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    OMOP2OBO Condition Occurrence Mappings V1.0

    These mappings were created by the OMOP2OBO mapping algorithm (see links below). OMOP2OBO provides the first health system-wide, disease-agnostic mappings between standardized clinical terminologies and eight Open Biomedical Ontology (OBO) Foundry ontologies spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, vaccines, and proteins. These mappings are also the first to be explicitly created using standard terminologies in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), ensuring both semantic and clinical interoperability across a space of N conditions (and N relationships curated in these ontologies).

    The mappings in this repository were created between OMOP standard condition occurrence concepts (i.e., SNOMED CT) and the Human Phenotype Ontology (HPO) and the Mondo Disease Ontology (Mondo). The National Library of Medicine's Unified Medical Language System (UMLS) Semantic Types are first used to filter out all concepts that do not have a biological origin (accidents, injuries, external complications, and findings without clear interpretations). Then, the Semantic Type is used to prioritize the mapping of HPO concepts to findings and symptoms and Mondo to Semantic Types indicative of disease. For these OMOP domains, owl:intersectionOf (“and”), and owl:unionOf (“or”) constructors were used to construct semantically expressive mappings.


    Mapping Details
    Mappings included in this set were generated automatically using OMOP2OBO or through the use of a Bag-of-words embedding model using TF-IDF. Cosine similarity is used to compute similarity scores between all pairwise combinations of OMOP and OBO concepts and ancestor concepts. To improve the efficiency of this process, the algorithm searches only the top 𝑛 most similar results and keeps the top 75th percentile among all pairs with scores >= 0.25. Manually created mappings are also included.
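
The similarity step described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the OMOP2OBO implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the example labels are all hypothetical. Only the score cut-off of 0.25 comes from the description; the top-n search and the 75th-percentile filter are omitted for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(labels):
    """Build bag-of-words TF-IDF vectors (as dicts) for a list of labels."""
    docs = [Counter(label.lower().split()) for label in labels]
    n = len(docs)
    # Document frequency: number of labels in which each token appears.
    df = Counter(token for doc in docs for token in doc)
    # Smoothed IDF (an assumption), so common tokens keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_mappings(omop_labels, obo_labels, min_score=0.25):
    """Score every OMOP/OBO label pair and keep those with score >= min_score."""
    vectors = tfidf_vectors(omop_labels + obo_labels)
    omop_vecs = vectors[:len(omop_labels)]
    obo_vecs = vectors[len(omop_labels):]
    pairs = []
    for i, u in enumerate(omop_vecs):
        for j, v in enumerate(obo_vecs):
            score = cosine(u, v)
            if score >= min_score:
                pairs.append((omop_labels[i], obo_labels[j], score))
    # Highest-scoring suggestions first.
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical labels, for illustration only.
omop = ["type 2 diabetes mellitus", "asthma"]
obo = ["diabetes mellitus", "asthma", "fracture of femur"]
for source, target, score in candidate_mappings(omop, obo):
    print(f"{source} -> {target}: {score:.2f}")
```

Pairs that share no tokens score zero and are dropped, while partially overlapping labels survive as suggestions that, per the categories listed below, would still need manual verification.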

    Mapping Categories

    • Automatic One-to-One Concept: Exact label or synonym, dbXRef, or expert validated mapping @ concept-level; 1:1
    • Automatic One-to-One Ancestor: Exact label or synonym, dbXRef, or expert validated mapping @ concept ancestor-level; 1:1
    • Automatic One-to-Many Concept: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many
    • Automatic One-to-Many Ancestor: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept ancestor-level; 1:Many
    • Manual One-to-One: Hand mapping created using expert suggested resources; 1:1
    • Manual One-to-Many: Hand mapping created using expert suggested resources; 1:Many
    • Cosine Similarity: score suggested mapping -- manually verified
    • Unmapped: No suitable mapping or not mapped type


    Mapping Statistics
    Additional statistics have been provided for the mappings and are shown in the table below. This table presents the counts of OMOP concepts by mapping category and ontology:

    Mapping CategoryHPOMondo
    Automatic One-to-One Concept47679097
    Automatic One-to-Many Concept150885
    Cosine Similarity1375667
    Automatic One-to-One Ancestor135958911
    Automatic One-to-Many Ancestor 3808040224
    Manual5131755
    Manual One-to-Many103262835
    Unmapped3630146345


    Provenance and Versioning: The V1.0 deposited mappings were created by OMOP2OBO v1.0.0 in October 2022 using the OMOP Common Data Model V5.0 and OBO Foundry ontologies downloaded on September 14, 2020.

    Caveats: The deposited files only contain the mappings that were generated automatically by the algorithm. The manually generated mappings will be deposited with the official preprint manuscript. Please note that these are the original mappings that were created for the preprint. They have not been updated to current versions of the ontologies. In our experience, this should result in very few errors, but we do suggest that you check the ontology concepts used against current versions of each ontology before using them.

    Important Resources and Documentation

  19. Medication table mappings.

    • plos.figshare.com
    xls
    Updated Apr 18, 2024
    Cite
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). Medication table mappings. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

    Methods: We used standard structured query language (SQL) to construct, extract, transform, and load scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.

    Results: Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ’FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.

    Conclusion: The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.
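
The pass/fail rule stated in the results (a check fails when the percentage of non-compliant rows exceeds its threshold) is simple enough to write down. The sketch below is a minimal illustration and does not use the Data Quality Dashboard's own API; the function names and the example check tuples are hypothetical.

```python
def check_status(violated_rows, total_rows, threshold_pct):
    """Return 'FAIL' when the share of non-compliant rows exceeds the threshold."""
    pct_violated = 100.0 * violated_rows / total_rows if total_rows else 0.0
    return "FAIL" if pct_violated > threshold_pct else "PASS"


def pass_rate(checks):
    """Percentage of checks that pass; each check is (violated, total, threshold_pct)."""
    statuses = [check_status(*check) for check in checks]
    return 100.0 * statuses.count("PASS") / len(statuses)


# Hypothetical checks: (non-compliant rows, rows evaluated, threshold in percent).
checks = [
    (0, 100, 5),   # no violations
    (6, 100, 5),   # 6% violated, above the 5% threshold
    (1, 100, 5),   # 1% violated, within the threshold
    (0, 50, 0),    # zero-tolerance check with no violations
]
print(pass_rate(checks))
```

A check whose violation rate equals its threshold passes under this reading, because the description says a failure requires the threshold to be exceeded.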

  20. Connected Bradford - Secondary Care BRI OMOP database

    • healthdatagateway.org
    unknown
    Updated Jan 31, 2025
    Cite
    Connected Bradford. Yorkshire & Humber Secure Data Environment. (2025). Connected Bradford - Secondary Care BRI OMOP database [Dataset]. https://healthdatagateway.org/en/dataset/1101
    Explore at:
    Available download formats: unknown
    Dataset updated
    Jan 31, 2025
    Dataset authored and provided by
    Connected Bradford. Yorkshire & Humber Secure Data Environment.
    License

    https://bradfordresearch.nhs.uk/connected-bradford/

    Description

    This dataset is an extract from the Bradford Royal Infirmary EPR system. It contains current and some historical data, and is produced by extracting the relevant tables from the EPR, mapping them to the OMOP schema, and outputting them in OMOP CDM 5.3 format.
