17 datasets found
  1. Synthea synthetic patient generator data in OMOP Common Data Model

    • registry.opendata.aws
    Updated Jan 4, 2023
    Cite
    Amazon Web Services (2023). Synthea synthetic patient generator data in OMOP Common Data Model [Dataset]. https://registry.opendata.aws/synthea-omop/
    Dataset updated
    Jan 4, 2023
    Dataset provided by
    Amazon.com (http://amazon.com/)
    Description

    The Synthea-generated data is provided here as 1,000-person (1k), 100,000-person (100k), and 2,800,000-person (2.8m) data sets in the OMOP Common Data Model format. Synthea™ is a synthetic patient generator that models the medical history of synthetic patients. Our mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government (although a citation would be appreciated). You can read our first academic paper here: https://doi.org/10.1093/jamia/ocx079

  2. An example of ER visit event logs.

    • plos.figshare.com
    xls
    Updated Jun 19, 2023
    Cite
    Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim (2023). An example of ER visit event logs. [Dataset]. http://doi.org/10.1371/journal.pone.0279641.t004
    Available download formats: xls
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example of ER visit event logs.

  3. OMOP primary database assessment of risk.

    • figshare.com
    xls
    Updated Apr 18, 2024
    Cite
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). OMOP primary database assessment of risk. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t002
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions in Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

    Methods: We used standard structured query language (SQL) to construct extract, transform, and load (ETL) scripts that convert the data to the OMOP-CDM. Mapping the distinct free-text terms extracted from the various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison and therefore required manual assignment. To address this, our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary such as SNOMED once appropriate mapping was completed. To assess the quality of the resulting OMOP dataset, we used the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.

    Results: Across three primary care EMR systems, we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations, each comparing its outcome against a predefined threshold. A 'FAIL' occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database, we achieved an overall pass rate of 97%.

    Conclusion: The OMOP CDM’s widespread international use, support, and training provide a well-established pathway for data standardisation in collaborative research. Its compatibility allows analysis packages to be shared across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitate adoption and integration into existing data processes.
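
    The two thresholds described above (a frequency cut-off that decides which terms get manual mapping, and the DQD rule that a check fails when the share of non-compliant rows exceeds its threshold) can be illustrated with a short Python sketch. It is a hypothetical illustration, not the PATRON ETL, which was written in SQL: the greedy "map the most frequent terms until more than 95% of records are covered" strategy, the function names `terms_to_map`, `dqd_status` and `pass_rate`, and the toy counts are all assumptions.

```python
def terms_to_map(term_counts, coverage=0.95, already_mapped=frozenset()):
    """Greedily pick unmapped source terms, most frequent first, until the
    share of records carrying a mapped term exceeds `coverage` (assumed strategy)."""
    total = sum(term_counts.values())
    covered = sum(n for term, n in term_counts.items() if term in already_mapped)
    selected = []
    # Sort by descending frequency; ties broken alphabetically for determinism.
    for term, n in sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if total == 0 or covered / total > coverage:
            break  # target reached: the remaining, rarer terms stay unmapped
        if term in already_mapped:
            continue
        selected.append(term)
        covered += n
    return selected


def dqd_status(failing_rows, total_rows, threshold_pct):
    """A check FAILs when the percentage of non-compliant rows exceeds its threshold."""
    pct = 100.0 * failing_rows / total_rows if total_rows else 0.0
    return "FAIL" if pct > threshold_pct else "PASS"


def pass_rate(statuses):
    """Fraction of checks that passed."""
    return sum(s == "PASS" for s in statuses) / len(statuses)


counts = {"a": 50, "b": 30, "c": 15, "d": 4, "e": 1}  # toy term frequencies
print(terms_to_map(counts))  # ['a', 'b', 'c', 'd']
print(dqd_status(3, 100, 5), dqd_status(6, 100, 5))  # PASS FAIL
print(pass_rate(["PASS", "PASS", "FAIL", "PASS"]))  # 0.75
```

    With these toy counts, mapping a, b and c covers exactly 95% of records, so d is also selected to push coverage above the 95% target, while the single record for e is left unmapped.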

  4. Semantic Triples from "A Collaborative, Realism-Based, Electronic Healthcare...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Semantic Triples from "A Collaborative, Realism-Based, Electronic Healthcare Graph: Public Data, Common Data Models, and Practical Instantiation" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2641232
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Miller, Mark Andrew
    Stoeckert, Christian
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These RDF triples (synthea_graph_exportable.nq.zip) are the result of modeling electronic health records (synthea_csv_output_turbo_cannonical.zip) that were synthesized with the Synthea software (https://github.com/synthetichealth/synthea). Anyone who loads them into a triplestore database is encouraged to provide feedback at https://github.com/PennTURBO/EhrGraphCollab/issues. The following abstract comes from a paper describing the semantic instantiation process, presented at the ICBO 2019 conference (https://drive.google.com/file/d/1eYXTBl75Wx3XPMmCIOZba-8Cv0DIhlRq/view).

    ABSTRACT: There is ample literature on the semantic modeling of biomedical data in general, but less has been published on realism-based, semantic instantiation of electronic health records (EHR). Reasons include difficult design decisions and issues of data governance. A collaborative approach can address design and technology utilization issues, but is especially constrained by limited access to the data at hand: protected health information.

    Effective collaboration can be facilitated by public EHR-like data sets, which would ideally include a large variety of datatypes mirroring actual EHRs and enough records to drive a performance assessment. Investing in reading public EHR-like data from a popular common data model (CDM) is preferable to reading each public data set’s native format.

    In addition to identifying suitable public EHR-like data sets and CDMs, this paper addresses instantiation via relational-to-RDF mapping. The completed instantiation is available for download, and a competency question demonstrates fidelity across all discussed formats.
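
    Relational-to-RDF mapping turns each table row into subject-predicate-object triples. The Python sketch below is loosely in the spirit of the W3C Direct Mapping and is not the realism-based TURBO instantiation described in the paper, which maps EHR data onto ontology classes; the base IRI `http://example.org/ehr/`, the IRI patterns and the function names are hypothetical. It emits N-Triples rather than the N-Quads used in synthea_graph_exportable.nq.zip.

```python
# Hypothetical base IRI; not the IRIs used by the TURBO project.
BASE = "http://example.org/ehr/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def literal(value):
    """Serialize a value as an N-Triples plain literal, escaping special characters."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def row_to_triples(table, key_column, row):
    """Map one relational row to triples: the primary key forms the subject IRI,
    the table name the class, and each non-null column a predicate."""
    subject = f"<{BASE}{table}/{row[key_column]}>"
    triples = [(subject, RDF_TYPE, f"<{BASE}{table}>")]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject; NULLs produce no triple
        triples.append((subject, f"<{BASE}{table}#{column}>", literal(value)))
    return triples


def to_ntriples(triples):
    """One 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


row = {"person_id": 1, "gender": "F", "year_of_birth": 1980, "ethnicity": None}
print(to_ntriples(row_to_triples("person", "person_id", row)))
```

    Skipping NULL columns rather than emitting empty literals keeps the graph free of assertions the source record never made.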

  5. EHRSHOT

    • redivis.com
    Updated Feb 13, 2025
    Cite
    Shah Lab (2025). EHRSHOT [Dataset]. http://doi.org/10.57761/0gv9-nd83
    Available download formats: avro, sas, parquet, spss, csv, stata, arrow, application/jsonl
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Shah Lab
    Description

    Abstract

    👂💉 EHRSHOT is a dataset for benchmarking the few-shot performance of foundation models for clinical prediction tasks. EHRSHOT contains de-identified structured data (e.g., diagnosis and procedure codes, medications, lab values) from the electronic health records (EHRs) of 6,739 Stanford Medicine patients and includes 15 prediction tasks. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and includes data beyond ICU and emergency department patients.

    ⚡️Quickstart
    1. To recreate the original EHRSHOT paper, download the EHRSHOT_ASSETS.zip file from the "Files" tab.
    2. To work with OMOP CDM formatted data, download all the tables in the "Tables" tab.

    ⚙️ Please see the "Methodology" section below for details on the dataset and downloadable files.

    Methodology

    1. 📖 Overview

    EHRSHOT is a benchmark for evaluating models on few-shot learning for patient classification tasks. The dataset contains:

    • 6,739 patients
    • 41.6 million clinical events
    • 921,499 visits
    • 15 prediction tasks


    2. 💽 Dataset

    EHRSHOT is sourced from Stanford’s STARR-OMOP database.

    • Data follows the OMOP CDM and is fully de-identified.
    • Unlike most other EHR research datasets, EHRSHOT is not restricted to ED/ICU visits and instead includes longitudinal patient data for all hospital encounter types.
    • EHRSHOT does not contain clinical notes or images.


    We provide two versions of the dataset:

    • EHRSHOT-Original is exactly the same dataset used in the original EHRSHOT paper.
    • EHRSHOT-OMOP is a more complete version of the EHRSHOT dataset which includes all OMOP CDM tables and additional OMOP metadata.


    To access the raw data, please see the "Tables" and "Files" tabs above.

    3. 💽 Data Files and Formats

    We provide EHRSHOT in two file formats:

    • OMOP CDM v5.4
    • Medical Event Data Standard (MEDS)


    Within the "Tables" tab...

    1. EHRSHOT-OMOP

    * Dataset Version: EHRSHOT-OMOP

    * Notes: Contains all OMOP CDM tables for the EHRSHOT patients. Note that this dataset is slightly different from the original EHRSHOT dataset, as these tables contain the full OMOP schema rather than a filtered subset.

    Within the "Files" tab...

    1. EHRSHOT_ASSETS.zip

    * Dataset Version: EHRSHOT-Original

    * Data Format: FEMR 0.1.16

    * Notes: The original EHRSHOT dataset as detailed in the paper. Also includes model weights.

    2. EHRSHOT_MEDS.zip

    * Dataset Version: EHRSHOT-Original

    * Data Format: MEDS 0.3.3

    * Notes: The original EHRSHOT dataset as detailed in the paper. It does not include any models.

    3. EHRSHOT_OMOP_MEDS.zip

    * Dataset Version: EHRSHOT-OMOP

    * Data Format: MEDS 0.3.3 + MEDS-ETL 0.3.8

    * Notes: Converts the dataset from EHRSHOT-OMOP into MEDS format via the `meds_etl_omop` command from MEDS-ETL.

    4. EHRSHOT_OMOP_MEDS_Reader.zip

    * Dataset Version: EHRSHOT-OMOP

    * Data Format: MEDS Reader 0.1.9 + MEDS 0.3.3 + MEDS-ETL 0.3.8

    * Notes: Same data as EHRSHOT_OMOP_MEDS.zip, but converted into a MEDS-Reader database for faster reads.

    4. 🤖 Model

    We also release the full weights of CLMBR-T-base, a 141M-parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. Please download it from https://huggingface.co/StanfordShahLab/clmbr-t-base

    5. 🧑‍💻 Code

    Please see our Github repo to obtain code for loading the dataset and running a set of pretrained baseline models: https://github.com/som-shahlab/ehrshot-benchmark/

    Usage

    NOTE: You must authenticate to Redivis using your formal affiliation's email address. If you use Gmail or another personal email address, you will not be granted access.

    Access to the EHRSHOT dataset requires the following:

    • Verified Affiliation with an Academic, Government, o

  6. An example of inpatient visit event logs.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim (2023). An example of inpatient visit event logs. [Dataset]. http://doi.org/10.1371/journal.pone.0279641.t001
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example of inpatient visit event logs.

  7. Table_2_Streamlining intersectoral provision of real-world health data: a...

    • frontiersin.figshare.com
    • figshare.com
    application/csv
    Updated Jun 5, 2024
    Cite
    Katja Hoffmann; Igor Nesterow; Yuan Peng; Elisa Henke; Daniela Barnett; Cigdem Klengel; Mirko Gruhl; Martin Bartos; Frank Nüßler; Richard Gebler; Sophia Grummt; Anne Seim; Franziska Bathelt; Ines Reinecke; Markus Wolfien; Jens Weidner; Martin Sedlmayr (2024). Table_2_Streamlining intersectoral provision of real-world health data: a service platform for improved clinical research and patient care.CSV [Dataset]. http://doi.org/10.3389/fmed.2024.1377209.s002
    Available download formats: application/csv
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Frontiers
    Authors
    Katja Hoffmann; Igor Nesterow; Yuan Peng; Elisa Henke; Daniela Barnett; Cigdem Klengel; Mirko Gruhl; Martin Bartos; Frank Nüßler; Richard Gebler; Sophia Grummt; Anne Seim; Franziska Bathelt; Ines Reinecke; Markus Wolfien; Jens Weidner; Martin Sedlmayr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Obtaining real-world data from routine clinical care is of growing interest for scientific research and personalized medicine. Despite the abundance of medical data across various facilities, including hospitals, outpatient clinics, and physician practices, the intersectoral exchange of information remains largely hindered by differences in data structure, content, and adherence to data protection regulations. In response to this challenge, the Medical Informatics Initiative (MII) was launched in Germany, focusing initially on university hospitals to foster the exchange and utilization of real-world data through the development of standardized methods and tools, including the creation of a common core dataset. Our aim, as part of the Medical Informatics Research Hub in Saxony (MiHUBx), is to extend the MII concepts to non-university healthcare providers more seamlessly, enabling the exchange of real-world data among intersectoral medical sites.

    Methods: We investigated which services are needed to facilitate the provision of harmonized real-world data for cross-site research. On this basis, we designed a Service Platform Prototype that hosts services for data harmonization, adhering to the internationally recognized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) communication standard and the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). Leveraging these standards, we implemented additional services facilitating data utilization, exchange, and analysis. Throughout the development phase, we collaborated with an interdisciplinary team of experts in system administration, software engineering, and technology acceptance to ensure that the solution is sustainable and reusable in the long term.

    Results: We have developed the pre-built packages “ResearchData-to-FHIR,” “FHIR-to-OMOP,” and “Addons,” which provide services for data harmonization, for the provision of project-related real-world data in both the FHIR MII Core dataset (CDS) format and the OMOP CDM format, and for data utilization, as well as a Service Platform Prototype to streamline data management and use.

    Conclusion: Our development shows a possible approach to extending the MII concepts to non-university healthcare providers to enable cross-site research on real-world data. Our Service Platform Prototype can thus pave the way for intersectoral data sharing, federated analysis, and the provision of SMART-on-FHIR applications to support clinical decision making.

  8. Optum DOD OMOP

    • redivis.com
    Updated Aug 18, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Optum DOD OMOP [Dataset]. http://doi.org/10.57761/dbqm-8c86
    Available download formats: csv, avro, sas, spss, parquet, stata, arrow, application/jsonl
    Dataset updated
    Aug 18, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Description

    Abstract

    Optum DOD (Date of Death) v8.0 database in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/)

    Section 10

    A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:

    • It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.
    • It allows aggregation of multiple, closely timed doctor visits for the same Condition to avoid double-counting the Condition Occurrences.


    For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP's original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.

    Conventions

    • Condition Era records will be derived from the records in the CONDITION_OCCURRENCE table using a standardized algorithm.
    • Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval.
    • Condition Eras are built with a Persistence Window of 30 days, meaning, if no occurrence of the same condition_concept_id happens within 30 days of any one occurrence, it will be considered the condition_era_end_date.


    The text above is taken from the OMOP CDM v5.3 Specification document.
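
    The Condition Era convention above can be sketched in Python. This is an illustrative re-implementation, not the OHDSI reference script: it assumes occurrences arrive as (person_id, condition_concept_id, start_date, end_date) tuples, treats a missing end date as start date plus one day, and merges an occurrence into the current era when it starts no more than 30 days after the era's latest end date. The function name `build_condition_eras` and the concept ID 1234 are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

PERSISTENCE_WINDOW = timedelta(days=30)


def build_condition_eras(occurrences):
    """occurrences: iterable of (person_id, condition_concept_id, start, end_or_None).
    Returns (person_id, condition_concept_id, era_start, era_end, occurrence_count) tuples."""
    spans_by_key = defaultdict(list)
    for person_id, concept_id, start, end in occurrences:
        # Assumption of this sketch: a missing end date means start + 1 day.
        spans_by_key[(person_id, concept_id)].append((start, end or start + timedelta(days=1)))

    eras = []
    for (person_id, concept_id), spans in sorted(spans_by_key.items()):
        spans.sort()
        era_start, era_end = spans[0]
        count = 1
        for start, end in spans[1:]:
            if start <= era_end + PERSISTENCE_WINDOW:
                # Within 30 days of the era's latest end date: same era.
                era_end = max(era_end, end)
                count += 1
            else:
                # Gap longer than the persistence window: close this era, open a new one.
                eras.append((person_id, concept_id, era_start, era_end, count))
                era_start, era_end, count = start, end, 1
        eras.append((person_id, concept_id, era_start, era_end, count))
    return eras


# PCP visit, specialist visit 26 days later, then a separate episode in April.
visits = [
    (1, 1234, date(2020, 1, 1), date(2020, 1, 10)),
    (1, 1234, date(2020, 2, 5), date(2020, 2, 6)),
    (1, 1234, date(2020, 4, 1), None),
]
for era in build_condition_eras(visits):
    print(era)
```

    As in the PCP and specialist example, the first two visits fall within the persistence window and collapse into one era with a condition_occurrence_count of 2; the April occurrence starts more than 30 days after February 6 and opens a second era.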

    Section 5

    The CONCEPT_ANCESTOR table is designed to simplify observational analysis by providing the complete hierarchical relationships between Concepts. Only direct parent-child relationships between Concepts are stored in the CONCEPT_RELATIONSHIP table. To determine higher level ancestry connections, all individual direct relationships would have to be navigated at analysis time. The CONCEPT_ANCESTOR table includes records for all parent-child relationships, as well as grandparent-grandchild relationships and those of any other level of lineage.

    Using the CONCEPT_ANCESTOR table allows for querying for all descendants of a hierarchical concept. For example, drug ingredients and drug products are all descendants of a drug class ancestor.

    Conventions

    • The concept_name field contains a valid Synonym of a concept, including the description in the concept_name itself. I.e. each Concept has at least one Synonym in the CONCEPT_SYNONYM table. As an example, for a SNOMED-CT Concept, if the fully specified name is stored as the concept_name of the CONCEPT table, then the Preferred Term and Synonyms associated with the Concept are stored in the CONCEPT_SYNONYM table.
    • Only Synonyms that are active and current are stored in the CONCEPT_SYNONYM table. Tracking synonym/description history and mapping of obsolete synonyms to current Concepts/Synonyms is out of scope for the Standard Vocabularies.
    • Currently, only English Synonyms are included.


    The text above is taken from the OMOP CDM v5.3 Specification document.
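
    The closure stored in CONCEPT_ANCESTOR can be derived from direct parent-child edges. The Python sketch below assumes a cycle-free hierarchy given as (parent, child) pairs and returns, for each ancestor-descendant pair, the shortest and longest path lengths, mirroring the min_levels_of_separation and max_levels_of_separation columns of CONCEPT_ANCESTOR. It is not the OHDSI vocabulary-build procedure; the function name and the string concept identifiers are hypothetical, and including each concept as its own ancestor at level 0 is a design choice of this sketch.

```python
from collections import defaultdict, deque


def build_concept_ancestor(edges):
    """edges: iterable of direct (parent, child) relationships forming a DAG.
    Returns {(ancestor, descendant): (min_levels, max_levels)}."""
    children = defaultdict(set)
    indegree = defaultdict(int)
    concepts = set()
    for parent, child in edges:
        if child not in children[parent]:
            children[parent].add(child)
            indegree[child] += 1
        concepts.update((parent, child))

    # Kahn's algorithm: a topological order in which every parent precedes its children.
    queue = deque(sorted(c for c in concepts if indegree[c] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in sorted(children[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(concepts):
        raise ValueError("hierarchy contains a cycle")

    rows = {}
    for ancestor in concepts:
        # Shortest and longest path lengths from `ancestor`, relaxed in topological order.
        shortest, longest = {ancestor: 0}, {ancestor: 0}
        for node in order:
            if node not in shortest:
                continue  # not reachable from this ancestor
            for child in children[node]:
                shortest[child] = min(shortest.get(child, float("inf")), shortest[node] + 1)
                longest[child] = max(longest.get(child, -1), longest[node] + 1)
        for descendant in shortest:
            rows[(ancestor, descendant)] = (shortest[descendant], longest[descendant])
    return rows


# Hypothetical hierarchy: a drug class reaches a product directly and via an ingredient.
edges = [("drug_class", "ingredient"), ("ingredient", "product"), ("drug_class", "product")]
table = build_concept_ancestor(edges)
print(table[("drug_class", "product")])  # (1, 2)
```

    Querying "all descendants of a drug class" then becomes a single lookup over rows whose ancestor is that class, instead of walking the hierarchy at analysis time.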

    Section 4

    The COST table captures records containing the cost of any medical entity recorded in one of the DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, VISIT_OCCURRENCE or DEVICE_OCCURRENCE tables.

    The information about the cost is defined by the amount of money paid by the Person and Payer, or as the charged cost by the healthcare provider. So, the COST table can be used to represent both cost and revenue perspectives. The cost_type_concept_id field will use concepts in the Standardized Vocabularies to designate the source of the cost data. A reference to the health plan information in the PAYER_PLAN_PERIOD table is stored in the record that is responsible for the determination of the cost as well as some of the payments.

    Convention

    The COST table will store information reporting money or currency amounts. There are three types of cost data, defined in the cost_type_concept_id: 1) paid or reimbursed amounts, 2) charges or list prices (such as Average Wholesale Prices), and 3) costs or expenses incurred by the provider. The defined fields are variables found in almost all U.S.-based claims data sources, which is the most common data source for researchers. Non-U.S.-based data holders are encouraged to engage with OHDSI to adjust these tables to their needs.

    One cost record is generated for each response by a payer. In a claims database, the payment and payment terms reported by the payer for the goods or services billed will generate one cost record. If the source data has payment information f

  9. Data from: The COVID-19 trial finder

    • datadryad.org
    • zenodo.org
    zip
    Updated Nov 16, 2021
    Cite
    Yingcheng Sun; Alex Butler; Fengyang Lin; Hao Liu; Latoya Stewart; Jae Hyun Kim; Betina Ross Idnay; Qingyin Ge; Xinyi Wei; Cong Liu; Chi Yuan; Chunhua Weng (2021). The COVID-19 trial finder [Dataset]. http://doi.org/10.5061/dryad.7h44j0zs9
    Available download formats: zip
    Dataset updated
    Nov 16, 2021
    Dataset provided by
    Dryad
    Authors
    Yingcheng Sun; Alex Butler; Fengyang Lin; Hao Liu; Latoya Stewart; Jae Hyun Kim; Betina Ross Idnay; Qingyin Ge; Xinyi Wei; Cong Liu; Chi Yuan; Chunhua Weng
    Time period covered
    2020
    Description

    The dataset contains 581 structured COVID-19 clinical trials. Entities in each trial are extracted and mapped to standard concepts in five domains (condition, device, drug, measurement, and procedure) following Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The mapped concepts are used as semantic tags for trial indexing. Definitions of the dataset columns are described in the README file.
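
    Using mapped concepts as semantic tags amounts to an inverted index from (domain, concept ID) to trial identifiers. The Python sketch below assumes each trial has already been reduced to a dictionary of domain to concept IDs; the actual column layout is described in the dataset's README, and the function names, trial IDs and concept IDs here are hypothetical.

```python
from collections import defaultdict


def build_index(trials):
    """trials: {trial_id: {domain: [concept_id, ...]}}.
    Returns an inverted index {(domain, concept_id): {trial_id, ...}}."""
    index = defaultdict(set)
    for trial_id, tags in trials.items():
        for domain, concept_ids in tags.items():
            for concept_id in concept_ids:
                index[(domain, concept_id)].add(trial_id)
    return index


def find_trials(index, *criteria):
    """Trials tagged with every (domain, concept_id) criterion (AND semantics)."""
    if not criteria:
        return set()
    # .get() avoids inserting empty entries into the defaultdict.
    return set.intersection(*(index.get(c, set()) for c in criteria))


# Hypothetical trial IDs and concept IDs.
trials = {
    "TRIAL-A": {"condition": [1001], "drug": [2001]},
    "TRIAL-B": {"condition": [1001]},
    "TRIAL-C": {"drug": [2001], "measurement": [3001]},
}
index = build_index(trials)
print(sorted(find_trials(index, ("condition", 1001), ("drug", 2001))))  # ['TRIAL-A']
```

    Because tags are standard concept IDs rather than free text, a search for a condition matches trials regardless of how their eligibility criteria phrased it.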

  10. OMOP2OBO Drug Exposure Ingredient Mappings

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Mar 29, 2023
    Cite
    Tiffany J Callahan; William A Baumgartner; Lawrence D Hunter; Michael G Kahn (2023). OMOP2OBO Drug Exposure Ingredient Mappings [Dataset]. http://doi.org/10.5281/zenodo.6774402
    Available download formats: bin
    Dataset updated
    Mar 29, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tiffany J Callahan; William A Baumgartner; Lawrence D Hunter; Michael G Kahn
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    OMOP2OBO Drug Exposure Ingredient Mappings V1.0

    These mappings were created by the OMOP2OBO mapping algorithm (see links below). OMOP2OBO provides the first health system-wide, disease-agnostic mappings between standardized clinical terminologies and eight Open Biomedical Ontology (OBO) Foundry ontologies spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, vaccines, and proteins. These mappings are also the first to be explicitly created using standard terminologies in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), ensuring both semantic and clinical interoperability across a space of N conditions [and N relationships curated in these ontologies].

    The mappings in this repository were created between OMOP standard drug exposure concepts at the ingredient-level (i.e., RxNorm) to the Chemical Entities of Biological Interest (ChEBI), the National Center for Biotechnology Information Taxon Ontology (NCBITaxon), the Protein Ontology (PRO), and the Vaccine Ontology (VO). All concepts were aligned to at least one ChEBI concept and the remaining ontologies (NCBITaxon, PR, and VO) were mapped by their drug class and/or type (e.g., biologics versus vaccines). For these OMOP domains, owl:intersectionOf (“and”), and owl:unionOf (“or”) constructors were used to construct semantically expressive mappings.


    Mapping Details
    Mappings included in this set were generated automatically using OMOP2OBO or through the use of a Bag-of-words embedding model using TF-IDF. Cosine similarity is used to compute similarity scores between all pairwise combinations of OMOP and OBO concepts and ancestor concepts. To improve the efficiency of this process, the algorithm searches only the top 𝑛 most similar results and keeps the top 75th percentile among all pairs with scores >= 0.25.
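
    The similarity step can be sketched in plain Python. This is an approximation under stated assumptions, not the OMOP2OBO code: labels are tokenized into lowercase words, TF-IDF uses a smoothed logarithmic IDF, each OMOP concept keeps its `top_n` best OBO candidates, and "keeps the top 75th percentile among all pairs with scores >= 0.25" is read as keeping pairs that score at least 0.25 and at or above the nearest-rank 75th-percentile score of those pairs. Ancestor-concept matching is omitted, and all function names, identifiers and labels are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(label):
    """Lowercase bag-of-words tokens."""
    return re.findall(r"[a-z0-9]+", label.lower())


def tfidf_vectors(docs):
    """docs: {doc_id: label} -> {doc_id: {term: tf-idf weight}} with smoothed idf."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = Counter(term for toks in tokens.values() for term in set(toks))
    n = len(docs)
    vectors = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        vectors[doc_id] = {
            term: (count / len(toks)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_mappings(omop_labels, obo_labels, top_n=5, min_score=0.25, percentile=0.75):
    """Return (omop_id, obo_id, score) pairs surviving the top-n, minimum-score
    and percentile filters."""
    docs = {("omop", k): v for k, v in omop_labels.items()}
    docs.update({("obo", k): v for k, v in obo_labels.items()})
    vectors = tfidf_vectors(docs)

    pairs = []
    for omop_id in omop_labels:
        scored = sorted(
            ((cosine(vectors[("omop", omop_id)], vectors[("obo", obo_id)]), obo_id)
             for obo_id in obo_labels),
            reverse=True,
        )[:top_n]
        pairs.extend((omop_id, obo_id, s) for s, obo_id in scored if s >= min_score)
    if not pairs:
        return []

    # Nearest-rank percentile over the surviving scores.
    scores = sorted(s for _, _, s in pairs)
    cutoff = scores[max(0, math.ceil(percentile * len(scores)) - 1)]
    return [p for p in pairs if p[2] >= cutoff]


omop = {"1": "acetaminophen", "2": "influenza vaccine"}
obo = {"CHEBI:a": "acetaminophen", "VO:b": "influenza virus vaccine", "CHEBI:c": "ibuprofen"}
print([(o, b) for o, b, _ in candidate_mappings(omop, obo)])  # [('1', 'CHEBI:a')]
```

    With the default 75th-percentile cut, only the exact acetaminophen match survives; passing `percentile=0.0` also keeps the partial influenza vaccine match, which in the workflow above would fall to the manually verified "Concept Similarity" category.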

    Mapping Categories

    • Automatic Exact - Concept: Exact label or synonym, dbXRef, or expert validated mapping @ concept-level; 1:1
    • Automatic Exact - Ancestor: Exact label or synonym, dbXRef, or expert validated mapping @ concept ancestor-level; 1:1
    • Automatic Constructor - Concept: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many
    • Automatic Constructor - Ancestor: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many
    • Manual: Hand mapping created using expert suggested resources; 1:1
    • Manual Constructor: Hand mapping created using expert suggested resources; 1:Many
    • Concept Similarity: score suggested mapping -- manually verified
    • UnMapped: No suitable mapping or not mapped type

    Mapping Statistics
    Additional statistics have been provided for the mappings and are shown in the table below. This table presents the counts of OMOP concepts by mapping category and ontology:

    Mapping categoryChEBINCBITaxonPROVO
    Automatic Exact - Concept315115543108
    Automatic Constructor - Constructor404110
    Automatic Exact - Ancestor14717204
    Automatic Constructor - Ancestor210322
    Concept Similarity10942411817
    Manual32223015721
    Manual Constructor721482
    UnMapped739271461155811653


    Provenance and Versioning: The V1.0 deposited mappings were created by OMOP2OBO v1.0.0 in October 2022 using the OMOP Common Data Model V5.0 and OBO Foundry ontologies downloaded on September 14, 2020.

    Caveats: Please note that these are the original mappings that were created for the preprint. They have not been updated to current versions of the ontologies. In our experience, this should result in very few errors, but we do suggest that you check the ontology concepts used against current versions of each ontology before using them.

    Important Resources and Documentation

  11. An example of outpatient visit event logs.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    The citation is currently not available for this dataset.
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example of outpatient visit event logs.

  12. Hospital Chargemasters

    • data.chhs.ca.gov
    • data.ca.gov
    zip
    Updated Oct 7, 2024
    Cite
    Department of Health Care Access and Information (2024). Hospital Chargemasters [Dataset]. https://data.chhs.ca.gov/dataset/chargemasters
    Available download formats: zip (271130648), zip (271072163), zip (242190556), zip (883069869), zip (689244251), zip (256914973), zip (243189626), zip (264486994), zip (564467341), zip (263064822), zip (367638205), zip (261492388), zip (237780723), zip (226308410)
    Dataset updated
    Oct 7, 2024
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    This dataset contains Hospital Chargemasters with prices in effect as of June 1 of their reporting year. Chargemasters consist of a list of average charges for 25 common outpatient procedures and the estimated percentage change in gross revenue due to price changes each July 1.

    For more on HCAI Chargemaster Data.

  13. Model performance across OHDSI data network.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Qiong Wang; Jenna M. Reps; Kristin Feeney Kostka; Patrick B. Ryan; Yuhui Zou; Erica A. Voss; Peter R. Rijnbeek; RuiJun Chen; Gowtham A. Rao; Henry Morgan Stewart; Andrew E. Williams; Ross D. Williams; Mui Van Zandt; Thomas Falconer; Margarita Fernandez-Chas; Rohit Vashisht; Stephen R. Pfohl; Nigam H. Shah; Suranga N. Kasthurirathne; Seng Chan You; Qing Jiang; Christian Reich; Yi Zhou (2023). Model performance across OHDSI data network. [Dataset]. http://doi.org/10.1371/journal.pone.0226718.t002
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Qiong Wang; Jenna M. Reps; Kristin Feeney Kostka; Patrick B. Ryan; Yuhui Zou; Erica A. Voss; Peter R. Rijnbeek; RuiJun Chen; Gowtham A. Rao; Henry Morgan Stewart; Andrew E. Williams; Ross D. Williams; Mui Van Zandt; Thomas Falconer; Margarita Fernandez-Chas; Rohit Vashisht; Stephen R. Pfohl; Nigam H. Shah; Suranga N. Kasthurirathne; Seng Chan You; Qing Jiang; Christian Reich; Yi Zhou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Model performance across OHDSI data network.

  14. EMR tables and related tables in the OMOP CDM.

    • figshare.com
    xls
    Updated Apr 18, 2024
    Cite
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). EMR tables and related tables in the OMOP CDM. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t004
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and 'validation' analyses across multiple institutions in Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

    Methods: We used standard structured query language (SQL) to construct extract, transform, and load (ETL) scripts that convert the data to the OMOP-CDM. Mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison, leaving a number of terms that required manual assignment. To address this issue, we instructed our clinical mappers to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary such as SNOMED once appropriate mapping was completed. To assess the quality of the resulting OMOP dataset, we used the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformance, and completeness of the data in the PATRON repository according to the Kahn framework.

    Results: Across three primary care EMR systems, we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations, each comparing its outcome against a predefined threshold. A 'FAIL' occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database, we achieved an overall pass rate of 97%.

    Conclusion: The OMOP CDM's widespread international use, support, and training provide a well-established pathway for data standardisation in collaborative research. Its compatibility allows analysis packages to be shared across local and international research groups, which facilitates rapid and reproducible data comparisons. The model is supported by a suite of open-source tools, including the OHDSI Data Quality Dashboard (version 1.4.1). Its simplicity and standards-based approach facilitate adoption and integration into existing data processes.
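
    The abstract states two numeric rules: terms were prioritised for manual mapping by frequency until more than 95% of records in a domain were linked to a standard vocabulary, and a DQD check fails when its percentage of non-compliant rows exceeds its threshold. The Python sketch below models both under simplifying assumptions. The function names, the greedy most-frequent-first selection, the example terms, and the two-state PASS/FAIL outcome are illustrative choices, not the PATRON ETL code or the DQD's own implementation (the DQD is an R package and reports additional statuses).

```python
from collections import Counter


def terms_for_manual_mapping(term_counts: Counter, auto_mapped: set[str],
                             target_coverage: float = 0.95) -> list[str]:
    """Hypothetical helper: pick unmapped terms, most frequent first, until the
    share of records whose term is mapped exceeds target_coverage."""
    total = sum(term_counts.values())
    if total == 0:
        return []
    covered = sum(n for term, n in term_counts.items() if term in auto_mapped)
    selected = []
    for term, n in term_counts.most_common():
        if covered / total > target_coverage:
            break  # coverage target reached; remaining rare terms stay unmapped
        if term in auto_mapped:
            continue  # already matched automatically, no manual work needed
        selected.append(term)
        covered += n  # assume the mapper assigns a standard concept to this term
    return selected


def check_status(violated_rows: int, total_rows: int, threshold_pct: float) -> str:
    """Simplified DQD-style rule: FAIL when the percentage of non-compliant
    rows exceeds the check's threshold, otherwise PASS."""
    pct_violated = 100.0 * violated_rows / total_rows if total_rows else 0.0
    return "FAIL" if pct_violated > threshold_pct else "PASS"


def overall_pass_rate(statuses: list[str]) -> float:
    """Percentage of evaluated checks that passed."""
    return 100.0 * statuses.count("PASS") / len(statuses)


if __name__ == "__main__":
    # Hypothetical free-text term frequencies for one domain.
    counts = Counter({"asthma": 50, "hypertension": 30, "htn": 15,
                      "bp high": 4, "misc note": 1})
    print(terms_for_manual_mapping(counts, {"asthma", "hypertension"}))  # ['htn', 'bp high']
    statuses = [check_status(3, 100, 5.0), check_status(10, 100, 5.0)]
    print(statuses, overall_pass_rate(statuses))  # ['PASS', 'FAIL'] 50.0
```

    Taking the coverage target as a parameter mirrors the per-domain thresholds the authors describe; the rarest terms are the ones left unmapped.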

  15. Global Chronic Disease Management Market Industry Best Practices 2025-2032

    • statsndata.org
    excel, pdf
    Updated Feb 2025
    Cite
    Stats N Data (2025). Global Chronic Disease Management Market Industry Best Practices 2025-2032 [Dataset]. https://www.statsndata.org/report/chronic-disease-management-market-53656
    Available download formats: excel, pdf
    Dataset updated
    Feb 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Chronic Disease Management (CDM) market plays a critical role in the healthcare landscape, addressing the growing prevalence of chronic conditions such as diabetes, cardiovascular diseases, COPD, and mental health disorders. With the increasing aging population and rising lifestyle-related diseases, the market h

  16. Data in the data repository and the resultant OMOP CDM after conversion.

    • plos.figshare.com
    xls
    Updated Apr 18, 2024
    Cite
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). Data in the data repository and the resultant OMOP CDM after conversion. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t003
    Available download formats: xls
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data in the data repository and the resultant OMOP CDM after conversion.

  17. Supplementary Material for: The Direct Medical Cost of Essential Tremor

    • karger.figshare.com
    docx
    Updated Oct 11, 2024
    Cite
    Kapinos K.A.; Louis E.D. (2024). Supplementary Material for: The Direct Medical Cost of Essential Tremor [Dataset]. http://doi.org/10.6084/m9.figshare.27209181.v1
    Available download formats: docx
    Dataset updated
    Oct 11, 2024
    Dataset provided by
    Karger Publishers (http://www.karger.com/)
    Authors
    Kapinos K.A.; Louis E.D.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objectives: To determine the direct medical cost of illness from essential tremor (ET) from a patient perspective. Methods: Secondary data from Optum’s de-identified Clinformatics® Data Mart Database (CDM) from 2018-2019 were used to assess medical resource utilization and costs. Propensity score matching was used to match patients with ET aged 40+ to statistically similar controls. Generalized linear models were used to estimate average adjusted total costs of care per year, by health care setting, and by provider specialty. Results: The final sample included 41,200 patients with at least one ET claim and 36,871 matched patients. Overall, ET patients aged 40+ had about $28,217 in direct medical costs per year, about $1,601 more than matched comparisons (p < 0.001). This difference was driven by a greater number of outpatient visits, both overall and with specialists. Extrapolating the estimates from our study and pairing them with published age-specific disease prevalence statistics for ET, we calculated the annual cost of direct medical care for ET patients aged 40+ to be about $9.4 billion. Conclusion: The estimated direct medical costs among adults aged 40+ with an ET diagnosis, aggregated to the population level, are non-trivial.
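
    The abstract does not say which matching algorithm was used. As a generic illustration only, the Python sketch below performs greedy 1:1 nearest-neighbor matching without replacement on propensity scores that are assumed to have been estimated already (for example, by a logistic regression of ET status on covariates). The function name, patient IDs, scores, caliper value, and the descending-score processing order are all hypothetical choices.

```python
def greedy_nn_match(treated: dict[str, float], controls: dict[str, float],
                    caliper: float = 0.05) -> list[tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement (hypothetical helper).

    treated and controls map a patient ID to an already-estimated propensity
    score. A treated patient is left unmatched when no remaining control lies
    within `caliper` of its score.
    """
    available = dict(controls)  # copy so the caller's dict is not mutated
    pairs = []
    # Match treated patients in descending score order, one common convention.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement: each control used once
    return pairs


if __name__ == "__main__":
    treated = {"t1": 0.80, "t2": 0.30, "t3": 0.55, "t4": 0.95}
    controls = {"c1": 0.78, "c2": 0.33, "c3": 0.10, "c4": 0.56}
    # t4 has no control within the caliper, so it stays unmatched.
    print(greedy_nn_match(treated, controls))
```

    Greedy matching is simple and fast but order-dependent; optimal matching or matching with replacement are common alternatives that can yield different matched samples.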
