17 datasets found

Synthea synthetic patient generator data in OMOP Common Data Model
registry.opendata.aws
Updated Jan 4, 2023
Cite
Amazon Web Services (2023). Synthea synthetic patient generator data in OMOP Common Data Model [Dataset]. https://registry.opendata.aws/synthea-omop/
Dataset updated
Jan 4, 2023
Dataset provided by
Amazon.com (http://amazon.com/)
Description
The Synthea-generated data is provided here as 1,000-person (1k), 100,000-person (100k), and 2,800,000-person (2.8m) data sets in the OMOP Common Data Model format. Synthea™ is a synthetic patient generator that models the medical history of synthetic patients. Our mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government (although a citation would be appreciated). You can read our first academic paper here: https://doi.org/10.1093/jamia/ocx079
An example of ER visit event logs.
plos.figshare.com
xls
Updated Jun 19, 2023
Cite
Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim (2023). An example of ER visit event logs. [Dataset]. http://doi.org/10.1371/journal.pone.0279641.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0279641.t004
Dataset updated
Jun 19, 2023
Dataset provided by
PLOS ONE
Authors
Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An example of ER visit event logs.
OMOP primary database assessment of risk.
figshare.com
xls
Updated Apr 18, 2024
+ more versions
Cite
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). OMOP primary database assessment of risk. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301557.t002
Dataset updated
Apr 18, 2024
Dataset provided by
PLOS ONE
Authors
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.
Methods: We used standard structured query language (SQL) to construct extract, transform, and load (ETL) scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.
Results: Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ‘FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.
Conclusion: The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitates adoption and integration into existing data processes.
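The pass/fail rule described in the Results (a check fails when the percentage of non-compliant rows exceeds its threshold) can be illustrated with a minimal Python sketch. This is not the Data Quality Dashboard's own code: the `CheckResult` class, the check names, and every number below are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one data quality check (all names here are illustrative)."""
    name: str
    violated_rows: int    # rows that did not comply with the check
    total_rows: int       # rows the check was evaluated on
    threshold_pct: float  # maximum tolerated percentage of violating rows

    @property
    def pct_violated(self) -> float:
        # Assumption: a check with no applicable rows counts as no violations.
        if self.total_rows == 0:
            return 0.0
        return 100.0 * self.violated_rows / self.total_rows

    @property
    def failed(self) -> bool:
        # FAIL only when the violating share exceeds the threshold.
        return self.pct_violated > self.threshold_pct


def pass_rate(results: list[CheckResult]) -> float:
    """Percentage of checks that did not fail."""
    if not results:
        raise ValueError("no checks supplied")
    passed = sum(1 for r in results if not r.failed)
    return 100.0 * passed / len(results)


# Hypothetical checks, not taken from the paper.
checks = [
    CheckResult("hypothetical_plausible_birth_year", 2, 1000, 1.0),
    CheckResult("hypothetical_standard_concept", 80, 1000, 5.0),
    CheckResult("hypothetical_empty_table", 0, 0, 0.0),
    CheckResult("hypothetical_at_threshold", 50, 1000, 5.0),
]

for check in checks:
    status = "FAIL" if check.failed else "PASS"
    print(f"{check.name}: {check.pct_violated:.1f}% violated -> {status}")
print(f"overall pass rate: {pass_rate(checks):.0f}%")
```

Note that a check sitting exactly at its threshold passes in this sketch, because the rule as stated requires the percentage to exceed the threshold.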
Semantic Triples from "A Collaborative, Realism-Based, Electronic Healthcare...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Semantic Triples from "A Collaborative, Realism-Based, Electronic Healthcare Graph: Public Data, Common Data Models, and Practical Instantiation" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2641232
Dataset updated
Jan 24, 2020
Dataset provided by
Miller, Mark Andrew
Stoeckert, Christian
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
These RDF triples (synthea_graph_exportable.nq.zip) are the result of modeling electronic health records (synthea_csv_output_turbo_cannonical.zip), that were synthesized with the Synthea software (https://github.com/synthetichealth/synthea). Anyone who loads them into a triplestore database is encouraged to provide feedback at https://github.com/PennTURBO/EhrGraphCollab/issues. The following abstract comes from a paper, describing the semantic instantiation process, and presented to the ICBO 2019 conference (https://drive.google.com/file/d/1eYXTBl75Wx3XPMmCIOZba-8Cv0DIhlRq/view).

ABSTRACT: There is ample literature on the semantic modeling of biomedical data in general, but less has been published on realism-based, semantic instantiation of electronic health records (EHR). Reasons include difficult design decisions and issues of data governance. A collaborative approach can address design and technology utilization issues, but is especially constrained by limited access to the data at hand: protected health information.

Effective collaboration can be facilitated by public EHR-like data sets, which would ideally include a large variety of datatypes mirroring actual EHRs and enough records to drive a performance assessment. An investment into reading public EHR-like data from a popular common data model (CDM) is preferable over reading each public data set’s native format.

In addition to identifying suitable public EHR-like data sets and CDMs, this paper addresses instantiation via relational-to-RDF mapping. The completed instantiation is available for download, and a competency question demonstrates fidelity across all discussed formats.
EHRSHOT
redivis.com
application/jsonl +7
Updated Feb 13, 2025
Cite
Shah Lab (2025). EHRSHOT [Dataset]. http://doi.org/10.57761/0gv9-nd83
Available download formats: avro, sas, parquet, spss, csv, stata, arrow, application/jsonl
Unique identifier
https://doi.org/10.57761/0gv9-nd83
Dataset updated
Feb 13, 2025
Dataset provided by
Redivis Inc.
Authors
Shah Lab
Description
Abstract

👂💉 EHRSHOT is a dataset for benchmarking the few-shot performance of foundation models for clinical prediction tasks. EHRSHOT contains de-identified structured data (e.g., diagnosis and procedure codes, medications, lab values) from the electronic health records (EHRs) of 6,739 Stanford Medicine patients and includes 15 prediction tasks. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and includes data beyond ICU and emergency department patients.

⚡️ Quickstart

1. To recreate the original EHRSHOT paper, download the EHRSHOT_ASSETS.zip file from the "Files" tab.
2. To work with OMOP CDM formatted data, download all the tables in the "Tables" tab.

⚙️ Please see the "Methodology" section below for details on the dataset and downloadable files.

Methodology

1. 📖 Overview

EHRSHOT is a benchmark for evaluating models on few-shot learning for patient classification tasks. The dataset contains:

* **6,739** patients
* 41.6 million clinical events
* 921,499 visits
* 15 prediction tasks

2. 💽 Dataset

* EHRSHOT is sourced from Stanford’s STARR-OMOP database.
* Data follows the OMOP CDM and is fully de-identified.
* Unlike most other EHR research datasets, EHRSHOT is not restricted to ED/ICU visits and instead includes longitudinal patient data for all hospital encounter types.
* EHRSHOT does not contain clinical notes or images.

We provide two versions of the dataset:

* EHRSHOT-Original is the exact dataset used in the original EHRSHOT paper.
* EHRSHOT-OMOP is a more complete version of the EHRSHOT dataset which includes all OMOP CDM tables and additional OMOP metadata.

To access the raw data, please see the "Tables" and "Files" tabs above:

3. 💽 Data Files and Formats

We provide EHRSHOT in two file formats:

* OMOP CDM v5.4
* Medical Event Data Standard (MEDS)

Within the "Tables" tab...

1. EHRSHOT-OMOP

* Dataset Version: EHRSHOT-OMOP

* Notes: Contains all OMOP CDM tables for the EHRSHOT patients. Note that this dataset is slightly different than the original EHRSHOT dataset, as these tables contain the full OMOP schema rather than a filtered subset.

Within the "Files" tab...

1. EHRSHOT_ASSETS.zip

* Dataset Version: EHRSHOT-Original

* Data Format: FEMR 0.1.16

* Notes: The original EHRSHOT dataset as detailed in the paper. Also includes model weights.

2. EHRSHOT_MEDS.zip

* Dataset Version: EHRSHOT-Original

* Data Format: MEDS 0.3.3

* Notes: The original EHRSHOT dataset as detailed in the paper. It does not include any models.

3. EHRSHOT_OMOP_MEDS.zip

* Dataset Version: EHRSHOT-OMOP

* Data Format: MEDS 0.3.3 + MEDS-ETL 0.3.8

* Notes: Converts the dataset from EHRSHOT-OMOP into MEDS format via the `meds_etl_omop` command from MEDS-ETL.

4. EHRSHOT_OMOP_MEDS_Reader.zip

* Dataset Version: EHRSHOT-OMOP

* Data Format: MEDS Reader 0.1.9 + MEDS 0.3.3 + MEDS-ETL 0.3.8

* Notes: Same data as EHRSHOT_OMOP_MEDS.zip, but converted into a MEDS-Reader database for faster reads.

4. 🤖 Model

We also release the full weights of **CLMBR-T-base**, a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. Please download from https://huggingface.co/StanfordShahLab/clmbr-t-base

5. 🧑‍💻 Code

Please see our GitHub repo to obtain code for loading the dataset and running a set of pretrained baseline models: https://github.com/som-shahlab/ehrshot-benchmark/

Usage

**NOTE: You must authenticate to Redivis using your formal affiliation's email address. If you use Gmail or other personal email addresses, you will not be granted access.**

Access to the EHRSHOT dataset requires the following:

Verified Affiliation with an **Academic, Government, **o
An example of inpatient visit event logs.
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim (2023). An example of inpatient visit event logs. [Dataset]. http://doi.org/10.1371/journal.pone.0279641.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0279641.t001
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An example of inpatient visit event logs.
Table_2_Streamlining intersectoral provision of real-world health data: a...
frontiersin.figshare.com
figshare.com
application/csv
Updated Jun 5, 2024
+ more versions
Cite
Katja Hoffmann; Igor Nesterow; Yuan Peng; Elisa Henke; Daniela Barnett; Cigdem Klengel; Mirko Gruhl; Martin Bartos; Frank Nüßler; Richard Gebler; Sophia Grummt; Anne Seim; Franziska Bathelt; Ines Reinecke; Markus Wolfien; Jens Weidner; Martin Sedlmayr (2024). Table_2_Streamlining intersectoral provision of real-world health data: a service platform for improved clinical research and patient care.CSV [Dataset]. http://doi.org/10.3389/fmed.2024.1377209.s002
Available download formats: application/csv
Unique identifier
https://doi.org/10.3389/fmed.2024.1377209.s002
Dataset updated
Jun 5, 2024
Dataset provided by
Frontiers
Authors
Katja Hoffmann; Igor Nesterow; Yuan Peng; Elisa Henke; Daniela Barnett; Cigdem Klengel; Mirko Gruhl; Martin Bartos; Frank Nüßler; Richard Gebler; Sophia Grummt; Anne Seim; Franziska Bathelt; Ines Reinecke; Markus Wolfien; Jens Weidner; Martin Sedlmayr
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Obtaining real-world data from routine clinical care is of growing interest for scientific research and personalized medicine. Despite the abundance of medical data across various facilities, including hospitals, outpatient clinics, and physician practices, the intersectoral exchange of information remains largely hindered due to differences in data structure, content, and adherence to data protection regulations. In response to this challenge, the Medical Informatics Initiative (MII) was launched in Germany, focusing initially on university hospitals to foster the exchange and utilization of real-world data through the development of standardized methods and tools, including the creation of a common core dataset. Our aim, as part of the Medical Informatics Research Hub in Saxony (MiHUBx), is to extend the MII concepts to non-university healthcare providers in a more seamless manner to enable the exchange of real-world data among intersectoral medical sites.
Methods: We investigated what services are needed to facilitate the provision of harmonized real-world data for cross-site research. On this basis, we designed a Service Platform Prototype that hosts services for data harmonization, adhering to the globally recognized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) international standard communication format and the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). Leveraging these standards, we implemented additional services facilitating data utilization, exchange and analysis. Throughout the development phase, we collaborated with an interdisciplinary team of experts from the fields of system administration, software engineering and technology acceptance to ensure that the solution is sustainable and reusable in the long term.
Results: We have developed the pre-built packages “ResearchData-to-FHIR,” “FHIR-to-OMOP,” and “Addons,” which provide the services for data harmonization and provision of project-related real-world data in both the FHIR MII Core dataset format (CDS) and the OMOP CDM format as well as utilization and a Service Platform Prototype to streamline data management and use.
Conclusion: Our development shows a possible approach to extend the MII concepts to non-university healthcare providers to enable cross-site research on real-world data. Our Service Platform Prototype can thus pave the way for intersectoral data sharing, federated analysis, and provision of SMART-on-FHIR applications to support clinical decision making.
Optum DOD OMOP
redivis.com
application/jsonl +7
Updated Aug 18, 2020
Cite
Stanford Center for Population Health Sciences (2020). Optum DOD OMOP [Dataset]. http://doi.org/10.57761/dbqm-8c86
Available download formats: csv, avro, sas, spss, parquet, stata, arrow, application/jsonl
Unique identifier
https://doi.org/10.57761/dbqm-8c86
Dataset updated
Aug 18, 2020
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Description
Abstract

Optum DOD (Date of Death) v8.0 database in the OMOP data model (https://www.ohdsi.org/data-standardization/the-common-data-model/)

Section 10

A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:

* It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.
* It allows aggregation of multiple, closely timed doctor visits for the same Condition to avoid double-counting the Condition Occurrences.

For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP's original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.

Conventions

* Condition Era records will be derived from the records in the CONDITION_OCCURRENCE table using a standardized algorithm.
* Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval.
* Condition Eras are built with a Persistence Window of 30 days, meaning, if no occurrence of the same condition_concept_id happens within 30 days of any one occurrence, it will be considered the condition_era_end_date.

The text above is taken from the OMOP CDM v5.3 Specification document.
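The 30-day Persistence Window can be illustrated with a minimal Python sketch. It is not the standardized OHDSI algorithm: the function name, the tuple layout, the concept id, and the choice to fall back to the start date when an occurrence has no end date are all assumptions made for illustration.

```python
from datetime import date, timedelta
from itertools import groupby

PERSISTENCE_WINDOW = timedelta(days=30)


def build_condition_eras(occurrences):
    """Merge condition occurrences into eras.

    occurrences: iterable of
        (person_id, condition_concept_id, start_date, end_date_or_None)
    Returns a list of
        (person_id, condition_concept_id, era_start, era_end, occurrence_count)
    """
    # Assumption: an occurrence without an end date ends on its start date.
    rows = sorted(
        ((p, c, s, e or s) for p, c, s, e in occurrences),
        key=lambda r: (r[0], r[1], r[2]),
    )
    eras = []
    for (person, concept), group in groupby(rows, key=lambda r: (r[0], r[1])):
        era_start = era_end = None
        count = 0
        for _, _, start, end in group:
            if era_start is None:
                era_start, era_end, count = start, end, 1
            elif start <= era_end + PERSISTENCE_WINDOW:
                # Within the window: extend the current era.
                era_end = max(era_end, end)
                count += 1
            else:
                # Gap longer than the window: close the era, start a new one.
                eras.append((person, concept, era_start, era_end, count))
                era_start, era_end, count = start, end, 1
        eras.append((person, concept, era_start, era_end, count))
    return eras


# Hypothetical person 1 with hypothetical concept id 100: a PCP visit, a
# specialist visit 20 days later, and an unrelated recurrence months later.
example = [
    (1, 100, date(2020, 1, 5), None),
    (1, 100, date(2020, 1, 25), None),
    (1, 100, date(2020, 6, 1), date(2020, 6, 3)),
]
eras = build_condition_eras(example)
for era in eras:
    print(era)
```

In this example the PCP and specialist visits fall within 30 days of each other and collapse into one era, while the June occurrence starts a second era.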

Section 5

The CONCEPT_ANCESTOR table is designed to simplify observational analysis by providing the complete hierarchical relationships between Concepts. Only direct parent-child relationships between Concepts are stored in the CONCEPT_RELATIONSHIP table. To determine higher level ancestry connections, all individual direct relationships would have to be navigated at analysis time. The CONCEPT_ANCESTOR table includes records for all parent-child relationships, as well as grandparent-grandchild relationships and those of any other level of lineage.

Using the CONCEPT_ANCESTOR table allows for querying for all descendants of a hierarchical concept. For example, drug ingredients and drug products are all descendants of a drug class ancestor.

Conventions

* The concept_synonym_name field contains a valid Synonym of a concept, including the description in the concept_name itself. That is, each Concept has at least one Synonym in the CONCEPT_SYNONYM table. As an example, for a SNOMED-CT Concept, if the fully specified name is stored as the concept_name of the CONCEPT table, then the Preferred Term and Synonyms associated with the Concept are stored in the CONCEPT_SYNONYM table.
* Only Synonyms that are active and current are stored in the CONCEPT_SYNONYM table. Tracking synonym/description history and mapping of obsolete synonyms to current Concepts/Synonyms is out of scope for the Standard Vocabularies.
* Currently, only English Synonyms are included.

The text above is taken from the OMOP CDM v5.3 Specification document.
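The idea behind CONCEPT_ANCESTOR, precomputing every ancestor-descendant pair so that direct relationships need not be walked at analysis time, can be sketched in Python. This is an illustrative sketch only: the function name, the concept ids, and the returned dictionary layout are hypothetical, and the real table may hold rows or columns that this sketch omits.

```python
from collections import defaultdict


def build_concept_ancestor(direct_edges):
    """Expand direct parent-child pairs into all ancestor-descendant pairs.

    direct_edges: iterable of (parent_concept_id, child_concept_id)
    Returns {(ancestor_id, descendant_id): minimum levels of separation}.
    """
    children = defaultdict(set)
    for parent, child in direct_edges:
        children[parent].add(child)

    closure = {}
    for ancestor in list(children):
        # Breadth-first search, so the first time a descendant is reached
        # is also the shortest path to it.
        frontier = [ancestor]
        seen = {ancestor}
        level = 0
        while frontier:
            level += 1
            next_frontier = []
            for node in frontier:
                for child in children.get(node, ()):
                    if child not in seen:
                        seen.add(child)
                        closure[(ancestor, child)] = level
                        next_frontier.append(child)
            frontier = next_frontier
    return closure


# Hypothetical ids: 1 = drug class, 2 and 4 = ingredients, 3 = drug product.
edges = [(1, 2), (2, 3), (1, 4)]
closure = build_concept_ancestor(edges)

# All descendants of the drug class, found with a single lookup pass.
descendants_of_class = sorted(d for (a, d) in closure if a == 1)
print(descendants_of_class)
```

With the closure in hand, "all descendants of a drug class" becomes a filter on the ancestor id instead of a recursive traversal.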

Section 4

The COST table captures records containing the cost of any medical entity recorded in one of the DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, VISIT_OCCURRENCE or DEVICE_OCCURRENCE tables.

The information about the cost is defined by the amount of money paid by the Person and Payer, or as the charged cost by the healthcare provider. So, the COST table can be used to represent both cost and revenue perspectives. The cost_type_concept_id field will use concepts in the Standardized Vocabularies to designate the source of the cost data. A reference to the health plan information in the PAYER_PLAN_PERIOD table is stored in the record that is responsible for the determination of the cost as well as some of the payments.

Convention

The COST table will store information reporting money or currency amounts. There are three types of cost data, defined in the cost_type_concept_id: 1) paid or reimbursed amounts, 2) charges or list prices (such as Average Wholesale Prices), and 3) costs or expenses incurred by the provider. The defined fields are variables found in almost all U.S.-based claims data sources, which is the most common data source for researchers. Non-U.S.-based data holders are encouraged to engage with OHDSI to adjust these tables to their needs.

One cost record is generated for each response by a payer. In a claims database, the payment and payment terms reported by the payer for the goods or services billed will generate one cost record. If the source data has payment information f
Data from: The COVID-19 trial finder
datadryad.org
zenodo.org
zip
Updated Nov 16, 2021
Cite
Yingcheng Sun; Alex Butler; Fengyang Lin; Hao Liu; Latoya Stewart; Jae Hyun Kim; Betina Ross Idnay; Qingyin Ge; Xinyi Wei; Cong Liu; Chi Yuan; Chunhua Weng (2021). The COVID-19 trial finder [Dataset]. http://doi.org/10.5061/dryad.7h44j0zs9
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.7h44j0zs9
Dataset updated
Nov 16, 2021
Dataset provided by
Dryad
Authors
Yingcheng Sun; Alex Butler; Fengyang Lin; Hao Liu; Latoya Stewart; Jae Hyun Kim; Betina Ross Idnay; Qingyin Ge; Xinyi Wei; Cong Liu; Chi Yuan; Chunhua Weng
Time period covered
2020
Description
The dataset contains 581 structured COVID-19 clinical trials. Entities in each trial are extracted and mapped to standard concepts in five domains (condition, device, drug, measurement, and procedure) following Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The mapped concepts are used as semantic tags for trial indexing. Definitions of the dataset columns are described in the README file.

OMOP2OBO Drug Exposure Ingredient Mappings

zenodo.org
explore.openaire.eu
+1 more

bin

Updated Mar 29, 2023

Cite

Tiffany J Callahan; William A Baumgartner; Lawrence D Hunter; Michael G Kahn (2023). OMOP2OBO Drug Exposure Ingredient Mappings [Dataset]. http://doi.org/10.5281/zenodo.6774402

Available download formats: bin

Unique identifier

https://doi.org/10.5281/zenodo.6774402

Dataset updated

Mar 29, 2023

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Tiffany J Callahan; William A Baumgartner; Lawrence D Hunter; Michael G Kahn

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

OMOP2OBO Drug Exposure Ingredient Mappings V1.0

These mappings were created by the OMOP2OBO mapping algorithm (see links below). OMOP2OBO provides the first health system-wide, disease-agnostic mappings between standardized clinical terminologies and eight Open Biomedical Ontology (OBO) Foundry ontologies spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, vaccines, and proteins. These mappings are also the first to be explicitly created using standard terminologies in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), ensuring both semantic and clinical interoperability across a space of N conditions [and N relationships curated in these ontologies].

The mappings in this repository were created from OMOP standard drug exposure concepts at the ingredient level (i.e., RxNorm) to the Chemical Entities of Biological Interest (ChEBI), the National Center for Biotechnology Information Taxon Ontology (NCBITaxon), the Protein Ontology (PRO), and the Vaccine Ontology (VO). All concepts were aligned to at least one ChEBI concept, and the remaining ontologies (NCBITaxon, PRO, and VO) were mapped by their drug class and/or type (e.g., biologics versus vaccines). For these OMOP domains, owl:intersectionOf (“and”) and owl:unionOf (“or”) constructors were used to construct semantically expressive mappings.

Mapping Details
Mappings included in this set were generated automatically using OMOP2OBO or through the use of a bag-of-words embedding model using TF-IDF. Cosine similarity is used to compute similarity scores between all pairwise combinations of OMOP and OBO concepts and ancestor concepts. To improve the efficiency of this process, the algorithm searches only the top n most similar results and keeps the top 75th percentile among all pairs with scores >= 0.25.
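A rough Python sketch of scoring label pairs with TF-IDF bag-of-words vectors and cosine similarity follows. It is not the OMOP2OBO implementation: the tokenizer, the smoothed IDF formula, the function names, and the example labels are assumptions, and the top-n search and 75th-percentile steps are left out; only the score >= 0.25 cutoff is applied.

```python
import math
from collections import Counter


def tokenize(label):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return label.lower().split()


def tfidf_vectors(labels):
    """Return one sparse {token: weight} TF-IDF vector per label."""
    docs = [Counter(tokenize(label)) for label in labels]
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in doc)
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_mappings(omop_labels, obo_labels, min_score=0.25):
    """Score every OMOP/OBO label pair and keep those at or above min_score."""
    vectors = tfidf_vectors(list(omop_labels) + list(obo_labels))
    omop_vecs = vectors[: len(omop_labels)]
    obo_vecs = vectors[len(omop_labels):]
    scored = [
        (omop, obo, cosine(ov, bv))
        for omop, ov in zip(omop_labels, omop_vecs)
        for obo, bv in zip(obo_labels, obo_vecs)
    ]
    kept = [pair for pair in scored if pair[2] >= min_score]
    return sorted(kept, key=lambda pair: pair[2], reverse=True)


# Hypothetical labels, not taken from the deposited mappings.
omop = ["aspirin", "acetaminophen tablet"]
obo = ["aspirin", "acetaminophen", "water"]
matches = candidate_mappings(omop, obo)
for omop_label, obo_label, score in matches:
    print(f"{omop_label} -> {obo_label}: {score:.3f}")
```

Pairs that share no tokens score zero and drop out at the cutoff, which is why only overlapping labels survive in this toy example.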

Mapping Categories

Automatic Exact - Concept: Exact label or synonym, dbXRef, or expert validated mapping @ concept-level; 1:1
Automatic Exact - Ancestor: Exact label or synonym, dbXRef, or expert validated mapping @ concept ancestor-level; 1:1
Automatic Constructor - Concept: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many
Automatic Constructor - Ancestor: Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many
Manual: Hand mapping created using expert suggested resources; 1:1
Manual Constructor: Hand mapping created using expert suggested resources; 1:Many
Concept Similarity: score suggested mapping -- manually verified
UnMapped: No suitable mapping or not mapped type

Mapping Statistics
Additional statistics have been provided for the mappings and are shown in the table below. This table presents the counts of OMOP concepts by mapping category and ontology:

Mapping category	ChEBI	NCBITaxon	PRO	VO
Automatic Exact - Concept	3151	155	43	108
Automatic Constructor - Concept	404	1	1	0
Automatic Exact - Ancestor	147	17	20	4
Automatic Constructor - Ancestor	210	3	2	2
Concept Similarity	109	4241	18	17
Manual	322	230	157	21
Manual Constructor	72	14	8	2
UnMapped	7392	7146	11558	11653

Provenance and Versioning: The V1.0 deposited mappings were created by OMOP2OBO v1.0.0 in October 2022 using the OMOP Common Data Model V5.0 and OBO Foundry ontologies downloaded on September 14, 2020.

Caveats: Please note that these are the original mappings that were created for the preprint. They have not been updated to current versions of the ontologies. In our experience, this should result in very few errors, but we do suggest that you check the ontology concepts used against current versions of each ontology before using them.

Important Resources and Documentation

GitHub: OMOP2OBO
Project Wiki: OMOP2OBO - wiki
Zenodo Community: OMOP2OBO
Preprint Manuscript: 10.5281/zenodo.5716421

An example of outpatient visit event logs.
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
The citation is currently not available for this dataset.
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0279641.t003
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Kangah Park; Minsu Cho; Minseok Song; Sooyoung Yoo; Hyunyoung Baek; Seok Kim; Kidong Kim
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An example of outpatient visit event logs.
Hospital Chargemasters
data.chhs.ca.gov
data.ca.gov
+1 more
zip
Updated Oct 7, 2024
Cite
Department of Health Care Access and Information (2024). Hospital Chargemasters [Dataset]. https://data.chhs.ca.gov/dataset/chargemasters
Available download formats: zip(271130648), zip(271072163), zip(242190556), zip(883069869), zip(689244251), zip(256914973), zip(243189626), zip(264486994), zip(564467341), zip(263064822), zip(367638205), zip(261492388), zip(237780723), zip(226308410)
Dataset updated
Oct 7, 2024
Dataset authored and provided by
Department of Health Care Access and Information
Description
This dataset contains Hospital Chargemasters with prices in effect as of June 1 of their reporting year. Chargemasters consist of a list of average charges for 25 common outpatient procedures, and the estimated percentage change in gross revenue due to price changes each July 1.

Model performance across OHDSI data network.
plos.figshare.com
figshare.com
xls
Updated Jun 4, 2023
Cite
Qiong Wang; Jenna M. Reps; Kristin Feeney Kostka; Patrick B. Ryan; Yuhui Zou; Erica A. Voss; Peter R. Rijnbeek; RuiJun Chen; Gowtham A. Rao; Henry Morgan Stewart; Andrew E. Williams; Ross D. Williams; Mui Van Zandt; Thomas Falconer; Margarita Fernandez-Chas; Rohit Vashisht; Stephen R. Pfohl; Nigam H. Shah; Suranga N. Kasthurirathne; Seng Chan You; Qing Jiang; Christian Reich; Yi Zhou (2023). Model performance across OHDSI data network. [Dataset]. http://doi.org/10.1371/journal.pone.0226718.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0226718.t002
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Qiong Wang; Jenna M. Reps; Kristin Feeney Kostka; Patrick B. Ryan; Yuhui Zou; Erica A. Voss; Peter R. Rijnbeek; RuiJun Chen; Gowtham A. Rao; Henry Morgan Stewart; Andrew E. Williams; Ross D. Williams; Mui Van Zandt; Thomas Falconer; Margarita Fernandez-Chas; Rohit Vashisht; Stephen R. Pfohl; Nigam H. Shah; Suranga N. Kasthurirathne; Seng Chan You; Qing Jiang; Christian Reich; Yi Zhou
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Model performance across OHDSI data network.
EMR tables and related tables in the OMOP CDM.
figshare.com
xls
Updated Apr 18, 2024
Cite
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). EMR tables and related tables in the OMOP CDM. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301557.t004
Dataset updated
Apr 18, 2024
Dataset provided by
PLOS ONE
Authors
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and ‘validation’ analyses across multiple institutions across Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.
Methods: We used standard structured query language (SQL) to construct extract, transform, and load (ETL) scripts to convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. This resulted in a number of terms that required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.
Results: Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A ‘FAIL’ occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database described here, we achieved an overall pass rate of 97%.
Conclusion: The OMOP CDM’s widespread international use, support, and training provides a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitate adoption and integration into existing data processes.
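The two threshold rules in the description above can be illustrated with a minimal Python sketch. It is not the authors' SQL implementation: the function names `frequency_cutoff` and `check_status`, and the term counts in the example, are hypothetical and chosen only for illustration. The first function picks a per-domain frequency cutoff so that mapping every term at or above it links more than 95% of records; the second applies the stated rule that a check fails when the percentage of non-compliant rows exceeds its threshold.

```python
def frequency_cutoff(term_counts, target_coverage=0.95):
    """Return (cutoff, coverage) for one domain.

    `cutoff` is the highest term frequency such that mapping every term
    seen at least `cutoff` times links MORE than `target_coverage` of all
    records. `term_counts` maps a free-text term to its record count.
    """
    total = sum(term_counts.values())
    if total == 0:
        raise ValueError("term_counts contains no records")

    covered = 0
    # Visit distinct frequency values from most to least common.
    for cutoff in sorted(set(term_counts.values()), reverse=True):
        covered += sum(c for c in term_counts.values() if c == cutoff)
        if covered / total > target_coverage:
            return cutoff, covered / total
    # Only reached if target_coverage >= 1: every term must be mapped.
    return min(term_counts.values()), covered / total


def check_status(violating_rows, total_rows, threshold_pct):
    """Return 'FAIL' when the share of non-compliant rows exceeds the threshold."""
    pct = 100.0 * violating_rows / total_rows if total_rows else 0.0
    return "FAIL" if pct > threshold_pct else "PASS"


# Hypothetical counts of distinct free-text terms in one domain.
counts = {"hypertension": 500, "htn": 300, "high bp": 150,
          "hi blood pres": 40, "hbp??": 10}
cutoff, coverage = frequency_cutoff(counts)
print(cutoff, round(coverage, 2))
print(check_status(violating_rows=6, total_rows=100, threshold_pct=5))
```

With these made-up counts, terms seen fewer than 40 times would be left unmapped, which keeps the manual mapping workload small while still meeting the coverage target.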
Global Chronic Disease Management Market Industry Best Practices 2025-2032
statsndata.org
excel, pdf
Updated Feb 2025
Cite
Stats N Data (2025). Global Chronic Disease Management Market Industry Best Practices 2025-2032 [Dataset]. https://www.statsndata.org/report/chronic-disease-management-market-53656
Available download formats: excel, pdf
Dataset updated
Feb 2025
Dataset authored and provided by
Stats N Data
License
https://www.statsndata.org/how-to-order
Area covered
Global
Description
The Chronic Disease Management (CDM) market plays a critical role in the healthcare landscape, addressing the growing prevalence of chronic conditions such as diabetes, cardiovascular diseases, COPD, and mental health disorders. With the increasing aging population and rising lifestyle-related diseases, the market h
Data in the data repository and the resultant OMOP CDM after conversion.
plos.figshare.com
xls
Updated Apr 18, 2024
Cite
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle (2024). Data in the data repository and the resultant OMOP CDM after conversion. [Dataset]. http://doi.org/10.1371/journal.pone.0301557.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301557.t003
Dataset updated
Apr 18, 2024
Dataset provided by
PLOS ONE
Authors
Roger Ward; Christine Mary Hallinan; David Ormiston-Smith; Christine Chidgey; Dougie Boyle
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data in the data repository and the resultant OMOP CDM after conversion.
Supplementary Material for: The Direct Medical Cost of Essential Tremor
karger.figshare.com
docx
Updated Oct 11, 2024
Cite
Kapinos K.A.; Louis E.D. (2024). Supplementary Material for: The Direct Medical Cost of Essential Tremor [Dataset]. http://doi.org/10.6084/m9.figshare.27209181.v1
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.27209181.v1
Dataset updated
Oct 11, 2024
Dataset provided by
Karger Publishers: http://www.karger.com/
Authors
Kapinos K.A.; Louis E.D.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Objectives: To determine the direct medical cost of illness from essential tremor (ET) from a patient perspective. Methods: Secondary data from Optum’s de-identified Clinformatics® Data Mart Database (CDM) from 2018-2019 was used to assess medical resource utilization and costs. Propensity score matching was used to match patients age 40+ to statistically similar controls. Generalized linear models were used to estimate average, adjusted total costs of care per year, by health care setting, and provider specialty. Results: The final sample included 41,200 patients with at least one ET claim and 36,871 matched patients. Overall, ET patients ages 40+ had about $28,217 in direct medical costs per year, which was about $1,601 more than matched comparisons (p < 0.001). This was driven by a greater number of outpatient visits overall and with specialists. Extrapolating the estimates from our study and pairing them with published age-specific disease prevalence statistics for ET, we calculated an annual cost for direct medical care of ET patients ages 40+ to be about $9.4 billion. Conclusion: The estimated direct medical costs among adults age 40+ with an ET diagnosis aggregated to the population-level are non-trivial.