70 datasets found
  1. RxNorm Data

    • kaggle.com
    • bioregistry.io
    zip
    Updated Mar 20, 2019
    Cite
    National Library of Medicine (2019). RxNorm Data [Dataset]. https://www.kaggle.com/datasets/nlm-nih/nlm-rxnorm
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    National Library of Medicine
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    RxNorm is the name of a US-specific medical terminology that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm

    RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/

    Content

    RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.

    This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.

    The following tables are included in the RxNorm dataset:

    • RXNCONSO contains concept and source information

    • RXNREL contains information regarding relationships between entities

    • RXNSAT contains attribute information

    • RXNSTY contains semantic information

    • RXNSAB contains source info

    • RXNCUI contains retired rxcui codes

    • RXNATOMARCHIVE contains archived data

    • RXNCUICHANGES contains concept changes
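
    As a rough illustration of working with the RXNCONSO table outside BigQuery, the sketch below parses pipe-delimited Rich Release Format rows and looks up RXCUI codes for ingredient names. The column order and the use of TTY "IN" for ingredients are assumptions to check against the NLM documentation for the release you download; `sample_line` and the rows it produces are hypothetical placeholders, not real RxNorm data.

```python
import csv
import io

# Assumed column order of RXNCONSO.RRF; verify against the NLM release notes.
RXNCONSO_COLUMNS = [
    "RXCUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "RXAUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def read_rrf(handle, columns):
    """Yield one dict per RRF row.

    Every field is terminated by '|', so splitting produces one extra empty
    value at the end of each row; zip() silently drops it.
    """
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            yield dict(zip(columns, row))


def ingredient_rxcuis(rows, names):
    """Map lower-cased ingredient names to RXCUI codes (RXNORM source, TTY 'IN')."""
    wanted = {name.lower() for name in names}
    found = {}
    for row in rows:
        name = row["STR"].lower()
        if row["SAB"] == "RXNORM" and row["TTY"] == "IN" and name in wanted:
            found[name] = row["RXCUI"]
    return found


def sample_line(rxcui, tty, name):
    """Build a hypothetical RRF line for illustration only."""
    values = dict.fromkeys(RXNCONSO_COLUMNS, "")
    values.update(RXCUI=rxcui, LAT="ENG", SAB="RXNORM", TTY=tty, STR=name)
    return "|".join(values[c] for c in RXNCONSO_COLUMNS) + "|"


# Placeholder rows: one ingredient concept and one clinical drug concept.
sample = io.StringIO("\n".join([
    sample_line("9000001", "IN", "exampledrug"),
    sample_line("9000002", "SCD", "exampledrug 10 MG Oral Tablet"),
]))
result = ingredient_rxcuis(read_rrf(sample, RXNCONSO_COLUMNS), ["ExampleDrug"])
print(result)
```

    With a real release, the `io.StringIO` object would be replaced by an open file handle on RXNCONSO.RRF.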

    Update Frequency: Monthly

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://www.nlm.nih.gov/research/umls/rxnorm/

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm

    https://cloud.google.com/bigquery/public-data/rxnorm

    Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.

    Banner Photo by @freestocks from Unsplash.

    Inspiration

    What are the RXCUI codes for the ingredients of a list of drugs?

    Which ingredients have the most variety of dose forms?

    In what dose forms is the drug phenylephrine found?

    What are the ingredients of the drug labeled with the generic code number 072718?

  2. Health Statistic and Research Database

    • www-acc.healthinformationportal.eu
    • healthinformationportal.eu
    html
    Updated Feb 23, 2023
    Cite
    Estonian National Institute for Health Development (2023). Health Statistic and Research Database [Dataset]. https://www-acc.healthinformationportal.eu/services/find-data?page=6
    Explore at:
    Available download formats: html
    Dataset updated
    Feb 23, 2023
    Dataset authored and provided by
    Estonian National Institute for Health Development
    Variables measured
    sex, title, topics, country, language, data_owners, description, contact_name, geo_coverage, contact_email, and 10 more
    Measurement technique
    Multiple sources
    Description

    The Health Statistics and Health Research Database is Estonia's largest set of health-related statistics and survey results, administered by the National Institute for Health Development. Use of the database is free of charge.

    The database consists of eight main areas divided into sub-areas. The data tables included in the sub-areas are assigned unique codes. The data tables in the database can be viewed online or downloaded in several file formats (.px, .xlsx, .csv, .json). A detailed database user manual is available for download (.pdf).

    The database is constantly updated with new data. Dates of updating the existing data tables and adding new data are provided in the release calendar. The date of the last update to each table is provided after the title of the table in the list of data tables.

    A contact person for each sub-area is listed under the "Definitions and Methodology" link of that sub-area. Contact this person with any further questions or data requests about the data published in the database.

    Read more about the publication of health statistics by the National Institute for Health Development in Health Statistics Dissemination Principles.

  3. Medicine data: European public assessment reports (EPAR) for human medicines...

    • data.europa.eu
    excel xls, html
    Updated Dec 15, 2015
    Cite
    European Medicines Agency (2015). Medicine data: European public assessment reports (EPAR) for human medicines [Dataset]. https://data.europa.eu/data/datasets/epar-human-medicines?locale=en
    Explore at:
    Available download formats: excel xls, html
    Dataset updated
    Dec 15, 2015
    Dataset authored and provided by
    European Medicines Agency (http://ema.europa.eu/)
    License

    http://data.europa.eu/eli/dec/2011/833/oj

    Area covered
    Europe
    Description

    The EMA publishes an EPAR for every medicine granted a central marketing authorisation by the European Commission following an assessment by the EMA's Committee for Medicinal Products for Human Use (CHMP). EPARs are full scientific assessment reports of medicines authorised at a European Union level.

    You can find information including a public-friendly summary in question-and-answer format and the package leaflet. You can also find information on medicines that have been refused a marketing authorisation or that have been suspended or withdrawn after being approved.

    Different filter options on the website allow for browsing the data by the therapeutic area or type (orphan, generic, biosimilar etc.). Search results can be exported in Excel format.

    The Agency does not evaluate all medicines currently in use in Europe. If you cannot find the medicine you need through this search, please visit the website of your national health authority.

  4. Drug Product Database - All Files

    • open.canada.ca
    • data.amerigeoss.org
    • +1 more
    html, json, xml, zip
    Updated May 28, 2025
    + more versions
    Cite
    Health Canada (2025). Drug Product Database - All Files [Dataset]. https://open.canada.ca/data/en/dataset/bf55e42a-63cb-4556-bfd8-44f26e5a36fe
    Explore at:
    Available download formats: json, xml, html, zip
    Dataset updated
    May 28, 2025
    Dataset provided by
    Health Canada (http://www.hc-sc.gc.ca/)
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    The Drug Product Database (DPD) system captures information on Canadian human, veterinary and disinfectant products approved for use by Health Canada. To facilitate the use of the drug product data, multiple Drug Product files are available. Users can access the complete data set through the “Drug Product” file. Subsets of the data can be accessed in the “Drug Product By …” files. The data in these files are filtered based on the current drug product status. For example, only drug product data for Approved products will be found in the “Drug Product By Approved Status” file.

  5. DrugBank

    • data.wu.ac.at
    Updated Oct 10, 2013
    + more versions
    Cite
    Global (2013). DrugBank [Dataset]. https://data.wu.ac.at/odso/datahub_io/Y2VhMjFmYWItZjRhOC00MWRiLWIzMGEtOGU3NDM3ZmQ4MWE3
    Dataset updated
    Oct 10, 2013
    Dataset provided by
    Global
    Description

    About

    The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4800 drug entries including >1,480 FDA-approved small molecule drugs, 128 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,200 experimental drugs. Additionally, more than 2,500 non-redundant protein (i.e. drug target) sequences are linked to these FDA approved drug entries. Each DrugCard entry contains more than 100 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data.

    Openness

    Not open due to noncommercial conditions of re-use (from about page):

    DrugBank is offered to the public as a freely available resource. Use and re-distribution of the data, in whole or in part, for commercial purposes requires explicit permission of the authors and explicit acknowledgment of the source material (DrugBank) and the original publication (see below). We ask that users who download significant portions of the database cite the DrugBank paper in any resulting publications.

  6. DrugBank Database Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). DrugBank Database Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/drugbank-database-data-package/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    DrugBank Vocabulary contains information on DrugBank identifiers, names, and synonyms to permit easy linking and integration into any type of project. DrugBank is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. DrugBank is widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education.

  7. EU Veterinary Medicinal Product Database

    • data.europa.eu
    html
    Updated Nov 21, 2016
    Cite
    European Medicines Agency (2016). EU Veterinary Medicinal Product Database [Dataset]. https://data.europa.eu/data/datasets/eu-veterinary-medicinal-product-database?locale=en
    Explore at:
    Available download formats: html
    Dataset updated
    Nov 21, 2016
    Dataset authored and provided by
    European Medicines Agency (http://ema.europa.eu/)
    License

    http://data.europa.eu/eli/dec/2011/833/oj

    Area covered
    European Union
    Description

    The EU Veterinary Medicinal Product Database is intended to be a source of information on all medicinal products for veterinary use that have been authorised in the European Union and the European Economic Area. The database is hosted by the European Medicines Agency.

  8. GUDID Download

    • healthdata.gov
    • data.virginia.gov
    • +4 more
    application/rdfxml +5
    Updated Feb 25, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    open.fda.gov (2021). GUDID Download [Dataset]. https://healthdata.gov/dataset/GUDID-Download/vkqw-3twk
    Explore at:
    Available download formats: tsv, xml, csv, json, application/rssxml, application/rdfxml
    Dataset updated
    Feb 25, 2021
    Dataset provided by
    Food and Drug Administration (http://www.fda.gov/)
    Description

    The Global Unique Device Identification Database (GUDID) contains key device identification information submitted to the FDA about medical devices that have Unique Device Identifiers (UDI). Unique device identification is a system being established by the

  9. MIMIC-IV

    • physionet.org
    Updated Oct 11, 2024
    Cite
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
    Dataset updated
    Oct 11, 2024
    Authors
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.

  10. Agomelatine database of registered studies in humans

    • figshare.com
    zip
    Updated Jan 19, 2016
    Cite
    Jorge H Ramirez (2016). Agomelatine database of registered studies in humans [Dataset]. http://doi.org/10.6084/m9.figshare.1126327.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare
    Authors
    Jorge H Ramirez
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The database complies with the terms and conditions of ClinicalTrials.gov (http://clinicaltrials.gov/ct2/about-site/terms-conditions) and WHO ICTRP (http://www.who.int/ictrp/search/download/en/).

  11. Healthcare Management System

    • kaggle.com
    Updated Dec 23, 2023
    Cite
    Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anouska Abhisikta
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Patients Table:

    • PatientID: Unique identifier for each patient.
    • firstname: First name of the patient.
    • lastname: Last name of the patient.
    • email: Email address of the patient.

    This table stores information about individual patients, including their names and contact details.

    Doctors Table:

    • DoctorID: Unique identifier for each doctor.
    • DoctorName: Full name of the doctor.
    • Specialization: Area of medical specialization.
    • DoctorContact: Contact details of the doctor.

    This table contains details about healthcare providers, including their names, specializations, and contact information.

    Appointments Table:

    • AppointmentID: Unique identifier for each appointment.
    • Date: Date of the appointment.
    • Time: Time of the appointment.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.
    • DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

    This table records scheduled appointments, linking patients to doctors.

    MedicalProcedure Table:

    • ProcedureID: Unique identifier for each medical procedure.
    • ProcedureName: Name or description of the medical procedure.
    • AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

    This table stores details about medical procedures associated with specific appointments.

    Billing Table:

    • InvoiceID: Unique identifier for each billing transaction.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.
    • Items: Description of items or services billed.
    • Amount: Amount charged for the billing transaction.

    This table maintains records of billing transactions, associating them with specific patients.

    demo Table:

    • ID: Primary key, serves as a unique identifier for each record.
    • Name: Name of the entity.
    • Hint: Additional information or hint about the entity.

    This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

    This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
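
    The table descriptions above can be sketched as an in-memory SQLite schema. This is a minimal illustration, not the dataset author's DDL: the column types, the NOT NULL constraints, and the placeholder rows are assumptions, and the demo table is left out.

```python
import sqlite3

# Column names follow the listing above; types and constraints are assumed.
SCHEMA = """
CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT);
CREATE TABLE Doctors (
    DoctorID INTEGER PRIMARY KEY, DoctorName TEXT, Specialization TEXT,
    DoctorContact TEXT);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY, "Date" TEXT, "Time" TEXT,
    PatientID INTEGER NOT NULL REFERENCES Patients(PatientID),
    DoctorID INTEGER NOT NULL REFERENCES Doctors(DoctorID));
CREATE TABLE MedicalProcedure (
    ProcedureID INTEGER PRIMARY KEY, ProcedureName TEXT,
    AppointmentID INTEGER NOT NULL REFERENCES Appointments(AppointmentID));
CREATE TABLE Billing (
    InvoiceID INTEGER PRIMARY KEY,
    PatientID INTEGER NOT NULL REFERENCES Patients(PatientID),
    Items TEXT, Amount REAL);
"""


def build_database():
    """Create an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def procedures_by_doctor(conn):
    """Follow MedicalProcedure -> Appointments -> Doctors via the foreign keys."""
    query = """
        SELECT d.DoctorName, p.ProcedureName
        FROM MedicalProcedure AS p
        JOIN Appointments AS a ON a.AppointmentID = p.AppointmentID
        JOIN Doctors AS d ON d.DoctorID = a.DoctorID
        ORDER BY d.DoctorName, p.ProcedureName
    """
    return conn.execute(query).fetchall()


conn = build_database()
# Placeholder rows for illustration only.
conn.execute("INSERT INTO Patients VALUES (1, 'Test', 'Patient', 'test@example.org')")
conn.execute("INSERT INTO Doctors VALUES (1, 'Dr. Placeholder', 'General', 'n/a')")
conn.execute("INSERT INTO Appointments VALUES (1, '2024-01-01', '09:00', 1, 1)")
conn.execute("INSERT INTO MedicalProcedure VALUES (1, 'Example procedure', 1)")
print(procedures_by_doctor(conn))
```

    With foreign keys enforced, inserting an appointment that points at a missing patient or doctor raises `sqlite3.IntegrityError`, which is the linking behaviour the table descriptions imply.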

  12. Multimodal ground truth datasets for abdominal medical image registration...

    • heidata.uni-heidelberg.de
    zip
    Updated Feb 23, 2023
    Cite
    Frank Zöllner; Frank Zöllner (2023). Multimodal ground truth datasets for abdominal medical image registration [data] [Dataset]. http://doi.org/10.11588/DATA/ICSFUS
    Explore at:
    Available download formats: zip (3796777237), zip (27228993659), zip (2968034134)
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    heiDATA
    Authors
    Frank Zöllner; Frank Zöllner
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Dataset funded by
    BMBF
    Description

    Sparsity of annotated data is a major limitation in medical image processing tasks such as registration. Registered multimodal image data are essential for the diagnosis of medical conditions and the success of interventional medical procedures. To overcome the shortage of data, we present a method that allows the generation of annotated multimodal 4D datasets. We use a CycleGAN network architecture to generate multimodal synthetic data from the 4D extended cardiac–torso (XCAT) phantom and real patient data. Organ masks are provided by the XCAT phantom; therefore, the generated dataset can serve as ground truth for image segmentation and registration. Compared to real patient data, the synthetic data showed good agreement regarding the image voxel intensity distribution and the noise characteristics. The generated T1-weighted magnetic resonance imaging, computed tomography (CT), and cone beam CT images are inherently co-registered.

  13. EHRSHOT

    • redivis.com
    application/jsonl +7
    Updated Feb 13, 2025
    Cite
    Shah Lab (2025). EHRSHOT [Dataset]. http://doi.org/10.57761/0gv9-nd83
    Explore at:
    Available download formats: avro, sas, parquet, spss, csv, stata, arrow, application/jsonl
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Shah Lab
    Description

    Abstract

    👂💉 EHRSHOT is a dataset for benchmarking the few-shot performance of foundation models for clinical prediction tasks. EHRSHOT contains de-identified structured data (e.g., diagnosis and procedure codes, medications, lab values) from the electronic health records (EHRs) of 6,739 Stanford Medicine patients and includes 15 prediction tasks. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and includes data beyond ICU and emergency department patients.

    ⚡️ Quickstart

    1. To recreate the original EHRSHOT paper, download the EHRSHOT_ASSETS.zip file from the "Files" tab.
    2. To work with OMOP CDM formatted data, download all the tables in the "Tables" tab.

    ⚙️ Please see the "Methodology" section below for details on the dataset and downloadable files.

    Methodology

    1. 📖 Overview

    EHRSHOT is a benchmark for evaluating models on few-shot learning for patient classification tasks. The dataset contains:

    • 6,739 patients
    • 41.6 million clinical events
    • 921,499 visits
    • 15 prediction tasks
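
    To make "few-shot" concrete: one common way to set up such an evaluation is to draw k positive and k negative labeled patients per task as the training set. The sketch below shows that balanced k-shot sampling under stated assumptions; the function name, the label dictionary, and the patient IDs are hypothetical, and this is not taken from the EHRSHOT benchmark code.

```python
import random


def sample_k_shot(labels, k, seed=0):
    """Pick k positive and k negative patient IDs for a binary task.

    labels: dict mapping patient ID -> bool label (hypothetical structure).
    Returns (positive_ids, negative_ids), each of length k.
    """
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    positives = sorted(pid for pid, label in labels.items() if label)
    negatives = sorted(pid for pid, label in labels.items() if not label)
    if len(positives) < k or len(negatives) < k:
        raise ValueError("not enough labeled patients for a balanced k-shot sample")
    return rng.sample(positives, k), rng.sample(negatives, k)


# Made-up labels: every third patient is a positive example.
toy_labels = {patient_id: patient_id % 3 == 0 for patient_id in range(30)}
shot_pos, shot_neg = sample_k_shot(toy_labels, k=4, seed=1)
print(shot_pos, shot_neg)
```

    Fixing the seed keeps the sampled shots reproducible, so different models can be compared on the same few training examples.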


    2. 💽 Dataset

    EHRSHOT is sourced from Stanford’s STARR-OMOP database.

    • Data follows the OMOP CDM and is fully de-identified.
    • Unlike most other EHR research datasets, EHRSHOT is not restricted to ED/ICU visits and instead includes longitudinal patient data for all hospital encounter types.
    • EHRSHOT does not contain clinical notes or images.


    We provide two versions of the dataset:

    • EHRSHOT-Original is the same exact dataset used in the original EHRSHOT paper.
    • EHRSHOT-OMOP is a more complete version of the EHRSHOT dataset which includes all OMOP CDM tables and additional OMOP metadata.


    To access the raw data, please see the "Tables" and "Files" tabs above:

    3. 💽 Data Files and Formats

    We provide EHRSHOT in two file formats:

    • OMOP CDM v5.4
    • Medical Event Data Standard (MEDS)


    Within the "Tables" tab...

    1. EHRSHOT-OMOP

    * Dataset Version: EHRSHOT-OMOP

    * Notes: Contains all OMOP CDM tables for the EHRSHOT patients. Note that this dataset is slightly different than the original EHRSHOT dataset, as these tables contain the full OMOP schema rather than a filtered subset.

    Within the "Files" tab...

    1. EHRSHOT_ASSETS.zip

    * Dataset Version: EHRSHOT-Original

    * Data Format: FEMR 0.1.16

    * Notes: The original EHRSHOT dataset as detailed in the paper. Also includes model weights.

    2. EHRSHOT_MEDS.zip

    * Dataset Version: EHRSHOT-Original

    * Data Format: MEDS 0.3.3

    * Notes: The original EHRSHOT dataset as detailed in the paper. It does not include any models.

    3. EHRSHOT_OMOP_MEDS.zip

    * Dataset Version: EHRSHOT-OMOP

    * Data Format: MEDS 0.3.3 + MEDS-ETL 0.3.8

    * Notes: Converts the dataset from EHRSHOT-OMOP into MEDS format via the `meds_etl_omop` command from MEDS-ETL.

    4. EHRSHOT_OMOP_MEDS_Reader.zip

    * Dataset Version: EHRSHOT-OMOP

    * Data Format: MEDS Reader 0.1.9 + MEDS 0.3.3 + MEDS-ETL 0.3.8

    * Notes: Same data as EHRSHOT_OMOP_MEDS.zip, but converted into a MEDS-Reader database for faster reads.

    4. 🤖 Model

    We also release the full weights of CLMBR-T-base, a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. Please download from https://huggingface.co/StanfordShahLab/clmbr-t-base

    5. 🧑‍💻 Code

    Please see our Github repo to obtain code for loading the dataset and running a set of pretrained baseline models: https://github.com/som-shahlab/ehrshot-benchmark/

    Usage

    NOTE: You must authenticate to Redivis using your formal affiliation's email address. If you use gmail or other personal email addresses, you will not be granted access.

    Access to the EHRSHOT dataset requires the following:

    • Verified Affiliation with an Academic, Government, o
  14. MIMIC-III Clinical Database Demo

    • physionet.org
    Updated Apr 24, 2019
    + more versions
    Cite
    Alistair Johnson; Tom Pollard; Roger Mark (2019). MIMIC-III Clinical Database Demo [Dataset]. http://doi.org/10.13026/C2HM2Q
    Dataset updated
    Apr 24, 2019
    Authors
    Alistair Johnson; Tom Pollard; Roger Mark
    License

    Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012 [1]. The MIMIC-III Clinical Database is available on PhysioNet (doi: 10.13026/C2XW26). Though deidentified, MIMIC-III contains detailed information regarding the care of real patients, and as such requires credentialing before access. To allow researchers to ascertain whether the database is suitable for their work, we have manually curated a demo subset, which contains information for 100 patients also present in the MIMIC-III Clinical Database. Notably, the demo dataset does not include free-text notes.

  15. Data from: DailyMed

    • healthdata.gov
    • data.virginia.gov
    • +6 more
    application/rdfxml +5
    Updated Mar 31, 2021
    + more versions
    Cite
    datadiscovery.nlm.nih.gov (2021). DailyMed [Dataset]. https://healthdata.gov/dataset/DailyMed/j3hv-i8vg
    Explore at:
    Available download formats: application/rssxml, csv, tsv, json, application/rdfxml, xml
    Dataset updated
    Mar 31, 2021
    Dataset provided by
    datadiscovery.nlm.nih.gov
    Description

    DailyMed provides health information providers and the public with a standard, comprehensive, up-to-date, look-up and download resource of medication content and labeling as found in medication package inserts, also known as Structured Product Labeling (SPL).

  16. MIMIC-IV Dataset

    • paperswithcode.com
    • physionet.org
    + more versions
    Cite
    MIMIC-IV Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv
    Description

    Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy.

    The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified, and patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has been integral in driving large amounts of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III, which incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.

  17. Data from: MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark...

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Apr 19, 2023
    Cite
    Rui Shi (2023). MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4269851
    Dataset updated
    Apr 19, 2023
    Dataset provided by
    Bingbing Ni
    Rui Shi
    Jiancheng Yang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data repository for MedMNIST v1 is out of date! Please check the latest version of MedMNIST v2.

    Abstract

    We present MedMNIST, a collection of 10 pre-processed medical open datasets. MedMNIST is standardized to perform classification tasks on lightweight 28x28 images, which requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST could be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source or commercial AutoML tools. The datasets, evaluation code and baseline methods for MedMNIST are publicly available at https://medmnist.github.io/.

    Please note that this dataset is NOT intended for clinical use.

    We recommend our official code to download, parse and use the MedMNIST dataset:

    pip install medmnist

    Citation and Licenses

    If you find this project useful, please cite our ISBI'21 paper as: Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis," arXiv preprint arXiv:2010.14925, 2020.

    or using bibtex:

    @article{medmnist,
      title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
      author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
      journal={arXiv preprint arXiv:2010.14925},
      year={2020}
    }

    Besides, please cite the corresponding paper if you use any subset of MedMNIST. Each subset uses the same license as that of the source dataset.

    PathMNIST

    Jakob Nikolas Kather, Johannes Krisam, et al., "Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study," PLOS Medicine, vol. 16, no. 1, pp. 1–22, 01 2019.

    License: CC BY 4.0

    ChestMNIST

    Xiaosong Wang, Yifan Peng, et al., "Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in CVPR, 2017, pp. 3462–3471.

    License: CC0 1.0

    DermaMNIST

    Philipp Tschandl, Cliff Rosendahl, and Harald Kittler, "The ham10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions," Scientific data, vol. 5, pp. 180161, 2018.

    Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, and Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; arXiv:1902.03368.

    License: CC BY-NC 4.0

    OCTMNIST/PneumoniaMNIST

    Daniel S. Kermany, Michael Goldbaum, et al., "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell, vol. 172, no. 5, pp. 1122 – 1131.e9, 2018.

    License: CC BY 4.0

    RetinaMNIST

    DeepDR Diabetic Retinopathy Image Dataset (DeepDRiD), "The 2nd diabetic retinopathy – grading and image quality estimation challenge," https://isbi.deepdr.org/data.html, 2020.

    License: CC BY 4.0

    BreastMNIST

    Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly Fahmy, "Dataset of breast ultrasound images," Data in Brief, vol. 28, pp. 104863, 2020.

    License: CC BY 4.0

    OrganMNIST_{Axial,Coronal,Sagittal}

    Patrick Bilic, Patrick Ferdinand Christ, et al., "The liver tumor segmentation benchmark (lits)," arXiv preprint arXiv:1901.04056, 2019.

    Xuanang Xu, Fugen Zhou, et al., "Efficient multiple organ localization in ct image using 3d region proposal network," IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1885–1898, 2019.

    License: CC BY 4.0

  18. Data from: RxNorm

    • catalog.data.gov
    • datadiscovery.nlm.nih.gov
    Updated May 31, 2025
    Cite
    National Library of Medicine (2025). RxNorm [Dataset]. https://catalog.data.gov/dataset/rxnorm-3180d
    Dataset updated
    May 31, 2025
    Dataset provided by
    National Library of Medicine
    Description

    RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Technical documentation at http://www.nlm.nih.gov/research/umls/rxnorm/docs/index.html

  19. Search for Patient Engagement in Research

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Boruff, Jill (2023). Search for Patient Engagement in Research [Dataset]. http://doi.org/10.5683/SP3/NLQR1E
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Boruff, Jill
    Description

    The notes field contains the full MEDLINE (Ovid) search strategy for patient engagement in research. The search file in this dataset contains the full MEDLINE (Ovid), Embase (Ovid), CINAHL (EBSCO), and Cochrane Central search strategies for patient engagement in research. It is a comprehensive but not exhaustive search. The RIS files contain the complete database downloads. Search date: 2023-03-30

  20. Data from: MIT-BIH Arrhythmia Database

    • physionet.org
    • opendatalab.com
    Updated Feb 24, 2005
    Cite
    George Moody; Roger Mark (2005). MIT-BIH Arrhythmia Database [Dataset]. http://doi.org/10.13026/C2F305
    Dataset updated
    Feb 24, 2005
    Authors
    George Moody; Roger Mark
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.

Cite
National Library of Medicine (2019). RxNorm Data [Dataset]. https://www.kaggle.com/datasets/nlm-nih/nlm-rxnorm

RxNorm Data

National Library of Medicine RxNorm Data (BigQuery Dataset)

153 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
National Library of Medicine
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

RxNorm is the name of a US-specific medical terminology that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm

RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/

Content

RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.

This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.

The following tables are included in the RxNorm dataset:

  • RXNCONSO contains concept and source information

  • RXNREL contains information regarding relationships between entities

  • RXNSAT contains attribute information

  • RXNSTY contains semantic information

  • RXNSAB contains source information

  • RXNCUI contains retired RXCUI codes

  • RXNATOMARCHIVE contains archived data

  • RXNCUICHANGES contains concept changes

Update Frequency: Monthly

Fork this kernel to get started with this dataset.

Acknowledgements

https://www.nlm.nih.gov/research/umls/rxnorm/

https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm

https://cloud.google.com/bigquery/public-data/rxnorm

Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.

Banner Photo by @freestocks from Unsplash.

Inspiration

What are the RXCUI codes for the ingredients of a list of drugs?

Which ingredients have the most variety of dose forms?

In what dose forms is the drug phenylephrine found?

What are the ingredients of the drug labeled with the generic code number 072718?
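One of the questions above ("In what dose forms is the drug phenylephrine found?") can be sketched against a tiny in-memory stand-in for RXNCONSO and RXNREL. Everything in the example is an assumption made for illustration: the rows and identifiers (`T1`, `T2`, ...) are invented toy values, only three columns per table are kept, and each relationship row is assumed to read as "RXCUI2 RELA RXCUI1". The actual BigQuery table names, columns, and relationship direction should be checked against NLM's technical documentation before adapting this.

```python
import sqlite3

# Toy stand-in for RXNCONSO: (rxcui, term type, name). All rows are invented.
TOY_RXNCONSO = [
    ("T1", "IN", "phenylephrine"),
    ("T2", "DF", "Oral Tablet"),
    ("T3", "DF", "Nasal Spray"),
    ("T4", "SCDF", "phenylephrine Oral Tablet"),
    ("T5", "SCDF", "phenylephrine Nasal Spray"),
    ("T6", "IN", "other ingredient"),
    ("T7", "SCDF", "other ingredient Oral Tablet"),
]

# Toy stand-in for RXNREL: (rxcui1, rela, rxcui2).
# Assumed reading direction: rxcui2 <rela> rxcui1.
TOY_RXNREL = [
    ("T1", "has_ingredient", "T4"),
    ("T2", "has_dose_form", "T4"),
    ("T1", "has_ingredient", "T5"),
    ("T3", "has_dose_form", "T5"),
    ("T6", "has_ingredient", "T7"),
    ("T2", "has_dose_form", "T7"),
]

# Ingredient -> drug forms containing it -> dose form of each drug form.
QUERY = """
SELECT DISTINCT df.str
FROM rxnconso AS ing
JOIN rxnrel AS r_ing
  ON r_ing.rxcui1 = ing.rxcui AND r_ing.rela = 'has_ingredient'
JOIN rxnrel AS r_df
  ON r_df.rxcui2 = r_ing.rxcui2 AND r_df.rela = 'has_dose_form'
JOIN rxnconso AS df
  ON df.rxcui = r_df.rxcui1 AND df.tty = 'DF'
WHERE ing.tty = 'IN' AND lower(ing.str) = lower(?)
ORDER BY df.str
"""


def build_toy_db():
    """Create an in-memory database holding the toy rows above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rxnconso (rxcui TEXT, tty TEXT, str TEXT)")
    conn.execute("CREATE TABLE rxnrel (rxcui1 TEXT, rela TEXT, rxcui2 TEXT)")
    conn.executemany("INSERT INTO rxnconso VALUES (?, ?, ?)", TOY_RXNCONSO)
    conn.executemany("INSERT INTO rxnrel VALUES (?, ?, ?)", TOY_RXNREL)
    return conn


def dose_forms_for(conn, ingredient_name):
    """Return the sorted dose-form names linked to an ingredient name."""
    rows = conn.execute(QUERY, (ingredient_name,)).fetchall()
    return [name for (name,) in rows]
```

With the toy rows, `dose_forms_for(build_toy_db(), "phenylephrine")` walks from the ingredient to the two drug forms that contain it and returns their dose forms. The same two-hop join pattern is what a query against the full tables would need, with the real identifiers and column names substituted.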
