https://creativecommons.org/publicdomain/zero/1.0/
RxNorm is the name of a US-specific terminology in medicine that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm
RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/
RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.
This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.
The following tables are included in the RxNorm dataset:
RXNCONSO contains concept and source information
RXNREL contains information regarding relationships between entities
RXNSAT contains attribute information
RXNSTY contains semantic information
RXNSAB contains source info
RXNCUI contains retired rxcui codes
RXNATOMARCHIVE contains archived data
RXNCUICHANGES contains concept changes
Update Frequency: Monthly
https://www.nlm.nih.gov/research/umls/rxnorm/
https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm
https://cloud.google.com/bigquery/public-data/rxnorm
Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.
What are the RXCUI codes for the ingredients of a list of drugs?
Which ingredients have the most variety of dose forms?
In what dose forms is the drug phenylephrine found?
What are the ingredients of the drug labeled with the generic code number 072718?
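As a rough illustration of the first question above, the sketch below resolves a list of ingredient names to RXCUI codes. It is a minimal sketch under stated assumptions, not the dataset's own query: it assumes an RXNCONSO-style table with columns `rxcui`, `str`, `tty` and `sab`, that ingredient-level names carry the term type `IN`, and that the input list already consists of ingredient names. The table is rebuilt in an in-memory SQLite database, and the rows and `FAKE-` codes are invented placeholders, not real RxNorm content.

```python
import sqlite3


def ingredient_rxcuis(conn, names):
    """Return {lowercased name: [rxcui, ...]} for ingredient-level names (tty = 'IN')."""
    if not names:
        return {}
    placeholders = ", ".join("?" for _ in names)
    query = (
        "SELECT LOWER(str), rxcui FROM rxnconso "
        "WHERE sab = 'RXNORM' AND tty = 'IN' "
        f"AND LOWER(str) IN ({placeholders}) "
        "ORDER BY rxcui"
    )
    result = {name.lower(): [] for name in names}
    for name, rxcui in conn.execute(query, [n.lower() for n in names]):
        result[name].append(rxcui)
    return result


# Toy stand-in for RXNCONSO; every row and code here is a made-up placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rxnconso (rxcui TEXT, str TEXT, tty TEXT, sab TEXT)")
conn.executemany(
    "INSERT INTO rxnconso VALUES (?, ?, ?, ?)",
    [
        ("FAKE-1", "phenylephrine", "IN", "RXNORM"),
        ("FAKE-2", "phenylephrine 10 MG Oral Tablet", "SCD", "RXNORM"),
    ],
)

result = ingredient_rxcuis(conn, ["Phenylephrine", "aspirin"])
print(result)
```

Names with no matching ingredient row come back with an empty list. Resolving brand names or clinical drugs to their ingredients would additionally need the relationship table (RXNREL), which this sketch does not cover.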
The Influenza Research Database (IRD) serves as a public repository and analysis platform for flu sequence, experiment, surveillance and related data.
National Database for Autism Research (NDAR) is an extensible, scalable informatics platform for autism spectrum disorder-relevant data at all levels of biological and behavioral organization (molecules, genes, neural tissue, behavioral, social and environmental interactions) and for all data types (text, numeric, image, time series, etc.). NDAR was developed to share data across the entire ASD field and to facilitate collaboration across laboratories, as well as interconnectivity with other informatics platforms. NDAR Homepage: http://ndar.nih.gov/
The Clinical Outcomes Research Initiative (CORI) was established in 1995 under the American Society for Gastrointestinal Endoscopy (ASGE) to study outcomes of gastrointestinal (GI) endoscopic procedures in clinical settings. Physicians participating in the CORI consortium produce GI endoscopy reports using an electronic health record developed specially for the project. CORI practice sites include hospitals, ambulatory care centers, private practices, universities, and Veteran's hospitals. The practice data are stripped of most patient and physician identifiers before transmitting to a central data repository, where they are tested for completeness and accuracy. Data from all participating practices are merged and stored in the National Endoscopic Database (NED).
Data from the NED has been analyzed to examine endoscopic practice patterns, including endoscopic utilization, frequency and severity of endoscopic findings, and endoscopic treatment and medical management. The data also serve as a resource to develop research hypotheses and to support quality measure reporting. CORI data has already been utilized to support many research initiatives, many of which have resulted in publications in medical journals and presentations at GI conferences.
In addition to being available through the CORI consortium, the data collected in the NED since 2000 have been contributed to the NIDDK Repository and are available on request through the NIDDK Repository site.
The current data package contains data from the v3 warehouse from 2000 through 2010.
ZFIN serves as the zebrafish model organism database. It aims to: a) be the community database resource for the laboratory use of zebrafish, b) develop and support integrated zebrafish genetic, genomic and developmental information, c) maintain the definitive reference data sets of zebrafish research information, d) link this information extensively to corresponding data in other model organism and human databases, e) facilitate the use of zebrafish as a model for human biology, and f) serve the needs of the research community.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present database supports the study "A global database of seahorse research and innovation, from the beginning to 2022".
Scientific knowledge on seahorses is rapidly expanding in response to declines in wild populations due to habitat loss and fishing. A plethora of information has accumulated, but until now no global, publicly available, curated database had been produced in a transparent and systematic way. Here, we present the largest open-access repository of scientific publications addressing seahorses and, for the first time, of theses and patents. Compilation followed the “Preferred Reporting Items for Systematic reviews and Meta-Analyses” (PRISMA) Statement for systematic reviews and meta-analyses, with modifications. The current repository doubles the number of scientific publication records found in previous bibliometric/literature/review studies, using three extra repositories of source publications and a lifetime window, i.e. from the beginning to March 2022. A total of 977 scientific publications, 101 theses and 533 patents are gathered in the dataset, covering 41 of the 48 currently recognized seahorse species. In addition, the current work presents for the first time new metrics on authors, institutions, and research subject/field/discipline/theme, as well as the organism’s stage of development (embryo, newborn, juvenile, subadult and adult). To expand metadata usage, the database was also made available in the Dublin Core™ Metadata Initiative format. This contribution can be used as a core reference for scientists, aquaculturists and conservationists, and is useful to rapidly identify relevant literature and knowledge gaps, better understand seahorse research and discover new trends in seahorse research and innovation.
The database is available in two formats:
1) SeahorseBibliometricDatabase.xlsx
and
2) SeahorseBibliometricDatabase_DublinCore.xlsx, which is a vocabulary standardized (Dublin Core™ Metadata Initiative) version of the previous one, for metadata reuse.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Abstract MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012 [1]. The MIMIC-III Clinical Database is available on PhysioNet (doi: 10.13026/C2XW26). Though deidentified, MIMIC-III contains detailed information regarding the care of real patients, and as such requires credentialing before access. To allow researchers to ascertain whether the database is suitable for their work, we have manually curated a demo subset, which contains information for 100 patients also present in the MIMIC-III Clinical Database. Notably, the demo dataset does not include free-text notes.
Background In recent years there has been a concerted move towards the adoption of digital health record systems in hospitals. Despite this advance, interoperability of digital systems remains an open issue, leading to challenges in data integration. As a result, the potential that hospital data offers in terms of understanding and improving care is yet to be fully realized.
MIMIC-III integrates deidentified, comprehensive clinical data of patients admitted to the Beth Israel Deaconess Medical Center in Boston, Massachusetts, and makes it widely accessible to researchers internationally under a data use agreement. The open nature of the data allows clinical studies to be reproduced and improved in ways that would not otherwise be possible.
The MIMIC-III database was populated with data that had been acquired during routine hospital care, so there was no associated burden on caregivers and no interference with their workflow. For more information on the collection of the data, see the MIMIC-III Clinical Database page.
Methods The demo dataset contains all intensive care unit (ICU) stays for 100 patients. These patients were selected randomly from the subset of patients in the dataset who eventually die. Consequently, all patients will have a date of death (DOD). However, patients do not necessarily die during an individual hospital admission or ICU stay.
This project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.
Data Description MIMIC-III is a relational database consisting of 26 tables. For a detailed description of the database structure, see the MIMIC-III Clinical Database page. The demo shares an identical schema, except all rows in the NOTEEVENTS table have been removed.
The data files are distributed in comma separated value (CSV) format following the RFC 4180 standard. Notably, string fields which contain commas, newlines, and/or double quotes are encapsulated by double quotes ("). Actual double quotes in the data are escaped using an additional double quote. For example, the string she said "the patient was notified at 6pm" would be stored in the CSV as "she said ""the patient was notified at 6pm""". More detail is provided on the RFC 4180 description page: https://tools.ietf.org/html/rfc4180
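The quoting rule can be checked with a short round trip. This is a minimal sketch using Python's standard `csv` module, whose default dialect quotes fields containing commas, newlines or double quotes and doubles embedded quotes; the helper names `to_csv_line` and `from_csv_line` are made up for this illustration.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row with the csv module's default dialect and drop the line ending."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue().rstrip("\r\n")


def from_csv_line(line):
    """Parse a single CSV line back into a list of field values."""
    return next(csv.reader([line]))


original = 'she said "the patient was notified at 6pm"'
encoded = to_csv_line([original])
decoded = from_csv_line(encoded)

print(encoded)  # "she said ""the patient was notified at 6pm"""
print(decoded == [original])  # True
```

Reading the files with a CSV-aware parser rather than splitting lines on commas keeps such fields intact.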
Usage Notes The MIMIC-III demo provides researchers with an opportunity to review the structure and content of MIMIC-III before deciding whether or not to carry out an analysis on the full dataset.
CSV files can be opened natively using any text editor or spreadsheet program. However, some tables are large, and it may be preferable to navigate the data stored in a relational database. One alternative is to create an SQLite database using the CSV files. SQLite is a lightweight database format which stores all constituent tables in a single file, and SQLite databases interoperate well with a number of software tools.
DB Browser for SQLite is a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite. We have found this tool to be useful for navigating SQLite files. Information regarding installation of the software and creation of the database can be found online: https://sqlitebrowser.org/
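One possible way to build such an SQLite file from a folder of CSV files is sketched below with Python's standard library. It is an illustration under assumptions rather than the project's own loader: every column is stored as TEXT, each table is named after its file, and the `EXAMPLE.csv` file with its columns is an invented stand-in for the real tables.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv_into_sqlite(conn, csv_path, table_name):
    """Create table_name from the CSV header row and insert every row as text."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
        )
    conn.commit()


def build_database(csv_dir, db_path):
    """Load every *.csv file in csv_dir into one SQLite database, one table per file."""
    conn = sqlite3.connect(db_path)
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        load_csv_into_sqlite(conn, csv_path, csv_path.stem)
    return conn


# Demonstration with an invented two-column file (not a real MIMIC-III table).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "EXAMPLE.csv"
    sample.write_text('"row_id","note"\n1,"she said ""ok"""\n', encoding="utf-8")
    conn = build_database(tmp, ":memory:")
    rows = conn.execute('SELECT row_id, note FROM "EXAMPLE"').fetchall()
    conn.close()

print(rows)
```

Passing a file path instead of `":memory:"` writes a single database file that tools such as DB Browser for SQLite can open. Storing everything as TEXT keeps the loader simple; numeric or date columns would need explicit types for efficient filtering.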
Release Notes Release notes for the demo follow the release notes for the MIMIC-III database.
Acknowledgements This research and development was supported by grants NIH-R01-EB017205, NIH-R01-EB001659, and NIH-R01-GM104987 from the National Institutes of Health. The authors would also like to thank Philips Healthcare and staff at the Beth Israel Deaconess Medical Center, Boston, for supporting database development, and Ken Pierce for providing ongoing support for the MIMIC research community.
Conflicts of Interest The authors declare no competing financial interests.
References Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Mo...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Dietary Supplements Label Database (DSLD) - Product Information’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/2a76d253-e2f4-49c5-90e3-d08701608b28 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
(https://dsld.nlm.nih.gov) The Dietary Supplement Label Database (DSLD) includes full label-derived information from dietary supplement products marketed in the U.S. with a Web-based user interface that provides ready access to label information. It was developed to serve the research community and as a resource for health care providers and the public. It can be an educational and research tool for students, academics, and other professionals.
The Product Information dataset contains the full listing of product labels, LanguaL codes, and other product information.
--- Original source retains full ownership of the source dataset ---
This resource was retired on June 1, 2021 and is no longer updated. These data remain available to support research and development efforts.
Disaster Lit®: Database for Disaster Medicine and Public Health is a dataset of links to disaster medicine and public health documents available on the Internet at no cost. Documents include expert guidelines, research reports, conference proceedings, training classes, factsheets, websites, databases, and similar materials selected from over 700 organizations for a professional audience. Materials were selected from non-commercial publishing sources and supplement disaster-related resources from PubMed (biomedical journal literature) and MedlinePlus (health information for the public).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction Diverse neurological conditions have been reported in association with the SARS-CoV-2 virus; neurological symptoms are the most common conditions to persist after resolution of acute infection, affecting 20% of patients six months after acute illness. The COVID-19 Neuro Databank (NeuroCOVID) was created as a scientific resource to overcome the limitations of siloed, small local cohorts by collecting detailed, curated, and harmonized de-identified data from a large, diverse cohort of adults with new or worsened neurological conditions associated with COVID-19 illness.
Methods A Steering Committee including U.S. and international experts meets quarterly to provide guidance. Initial study sites were recruited to include a wide U.S. geographic distribution, academic and non-academic sites, urban and non-urban locations, and patients of different ages, disease severity, and comorbidities seen by a variety of clinical specialists. The NeuroCOVID REDCap database was developed, incorporating input from professional guidelines, existing common data elements, and subject matter experts. A cohort of eligible adults is identified at each site; inclusion criteria are: a new or worsened neurological condition associated with a COVID-19 infection confirmed by testing. De-identified data are abstracted from patients’ medical records, using standardized common data elements and five case report forms. The database was carefully enhanced in response to feedback from site investigators and evolving scientific interest in post-acute conditions and their timing. Additional U.S. and international sites were added, focusing on diversity and populations not already described in published literature. By early 2024, NeuroCOVID included over 2700 patient records, including data from 16 U.S. and 5 international sites. Data are being shared with the scientific community in compliance with NIH requirements. The program has been invited to share case report forms with the National Library of Medicine as an ongoing resource for the scientific community.
Conclusion The NeuroCOVID database is a unique and valuable source of comprehensive de-identified data on a wide variety of neurological conditions associated with COVID-19 illness, including a diverse patient population. Initiated early in the pandemic, data collection has been responsive to evolving scientific interests. NeuroCOVID will continue to contribute to scientific efforts to characterize and treat this challenging illness and its consequences.
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 11, 2023. The Specimen Resource Locator is a database to help researchers locate human specimens (tissue, serum, DNA/RNA, other specimens) for cancer research. It includes tissue banks and tissue procurement systems with access to normal, benign, precancerous and cancerous human tissue from a variety of organs. Researchers specify the types of specimens, number of cases, preservation methods and associated data they require. The Locator then searches the database and returns a list of tissue resources most likely to meet their requirements. When no match is obtained, the researcher is referred to the NCI Tissue Expediter (tissexp@mail.nih.gov). The Tissue Expediter is a scientist who can help researchers identify appropriate resources and/or appropriate collaborators.
Health Services Research Projects in Progress (HSRProj) is a database of health services research and public health projects in progress, related to research in quality, cost, and access to health care. Includes behavioral health research and public health research. Currently includes over 38,000 projects with information back to the 1990s.
This resource was retired on September 14, 2021 and is no longer updated.
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on March 12, 2025. The AIDSinfo Drug Database provides fact sheets on HIV/AIDS related drugs. The fact sheets describe the drug's use, pharmacology, side effects, and other information. The database includes:
- Approved and investigational HIV/AIDS related drugs
- Three versions of each fact sheet: patient, health professional, and Spanish
AIDSinfo is a 100% federally funded U.S. Department of Health and Human Services (DHHS) project that offers the latest federally approved information on HIV/AIDS clinical research, treatment and prevention, and medical practice guidelines for people living with HIV/AIDS, their families and friends, health care providers, scientists, and researchers. Sponsors:
- National Institutes of Health (NIH): Office of AIDS Research, National Institute of Allergy and Infectious Diseases (NIAID), National Library of Medicine (NLM)
- Health Resources and Services Administration (HRSA)
- Centers for Disease Control and Prevention (CDC)
- Centers for Medicare and Medicaid Services (CMS)
The NIMH Repository and Genomics Resource (RGR) stores biosamples, genetic, pedigree and clinical data collected in designated NIMH-funded human subject studies. The RGR database likewise links to other repositories holding data from the same subjects, including dbGAP, GEO and NDAR. The NIMH RGR allows the broader research community to access these data and biospecimens (e.g., lymphoblastoid cell lines, induced pluripotent cell lines, fibroblasts) and further expand the genetic and molecular characterization of patient populations with severe mental illness.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Disaster Lit® (retired June 1, 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/19f684aa-3a28-4577-bc67-27c4395b66f8 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This resource was retired on June 1, 2021 and is no longer updated. These data remain available to support research and development efforts. More information is available at https://www.nlm.nih.gov/dimrc/disasterinfo.html.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
WikiDBs is an open-source corpus of 100,000 relational databases. We aim to support research on tabular representation learning on multi-table data. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases.
WikiDBs was published as a spotlight paper at the Dataset & Benchmarks track at NeurIPS 2024.
WikiDBs contains the database schemas, as well as table contents. The database tables are provided as CSV files, and each database schema as JSON. The 100,000 databases are available in five splits, containing 20k databases each. In total, around 165 GB of disk space are needed for the full corpus. We also provide a script to convert the databases into SQLite.
The National Longitudinal Study of Adolescent to Adult Health (Add Health, https://addhealth.cpc.unc.edu/) is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States during the 1994-95 school year. The Add Health cohort has been followed into young adulthood with four in-home interviews, the most recent in 2008, when the sample was aged 24-32*. Add Health combines longitudinal survey data on respondents' social, economic, psychological and physical well-being with contextual data on the family, neighborhood, community, school, friendships, peer groups, and romantic relationships, providing unique opportunities to study how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood. The fourth wave of interviews expanded the collection of biological data in Add Health to understand the social, behavioral, and biological linkages in health trajectories as the Add Health cohort ages through adulthood. The fifth wave of data collection is planned to begin in 2016.
Initiated in 1994 and supported by three program project grants from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD, https://www.nichd.nih.gov/) with co-funding from 23 other federal agencies and foundations, Add Health is the largest, most comprehensive longitudinal survey of adolescents ever undertaken. Beginning with an in-school questionnaire administered to a nationally representative sample of students in grades 7-12, the study followed up with a series of in-home interviews conducted in 1995, 1996, 2001-02, and 2008. Other sources of data include questionnaires for parents, siblings, fellow students, and school administrators and interviews with romantic partners. Preexisting databases provide information about neighborhoods and communities.
Add Health was developed in response to a mandate from the U.S. Congress to fund a study of adolescent health, and Waves I and II focus on the forces that may influence adolescents' health and risk behaviors, including personal traits, families, friendships, romantic relationships, peer groups, schools, neighborhoods, and communities. As participants have aged into adulthood, however, the scientific goals of the study have expanded and evolved. Wave III, conducted when respondents were between 18 and 26** years old, focuses on how adolescent experiences and behaviors are related to decisions, behavior, and health outcomes in the transition to adulthood. At Wave IV, respondents were ages 24-32* and assuming adult roles and responsibilities. Follow up at Wave IV has enabled researchers to study developmental and health trajectories across the life course of adolescence into adulthood using an integrative approach that combines the social, behavioral, and biomedical sciences in its research objectives, design, data collection, and analysis.
* 52 respondents were 33-34 years old at the time of the Wave IV interview.
** 24 respondents were 27-28 years old at the time of the Wave III interview.
To provide an array of community characteristics by which researchers may investigate the nature of such contextual influences for a wide range of adolescent health behaviors, selected contextual variables have been calculated and compiled. These are provided in this Contextual Database, already linked to the Add Health respondent IDs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MeSpEn consists of a resource of heterogeneous health related documents in Spanish and English useful to build parallel corpora for training and evaluating Spanish <-> English medical machine translation systems, to generate multilingual automatic term extraction tools, and develop other Spanish medical NLP components. MeSpEn provides the combination and harmonization of various bibliographic datasets of biomedical and clinical literature from Spain and Latin America or web-content with trusted information sources about diseases, conditions, and wellness issues for patients.
MeSpEn was used to automatically generate bilingual health-related glossaries through automatic term detection and named entity recognition in English and target candidate term extraction in Spanish through sentence alignment approaches, potentially enabling the generation of Silver Standard annotated health texts in Spanish (see Villegas, et al. "The MeSpEN resource for English-Spanish medical machine translation and terminologies: census of parallel corpora, glossaries and term translations." Proc. LREC 2018 Workshop MultilingualBIO: Multilingual Biomedical Text Processing).
The MeSpEn resource aggregates several datasets, mainly from four principal sources: IBECS, SciELO, PubMed and MedlinePlus:
IBECS (Spanish Bibliographical Index in Health Sciences) is a bibliographical database that collects scientific journals covering multiple fields in health sciences. It is maintained by the Spanish National Health Sciences Library (BNCS), at the Carlos III Health Institute.
This corpus contains titles and abstracts from 168,198 records in English and Spanish. Users can find the metadata of each record written in Dublin Core format. The original XML file of the record provided by IBECS is provided as well.
For more information about IBECS parallel corpora, see IBECS_README file.
SciELO (Scientific Electronic Library Online) gathers electronic publications of complete full-text articles from scientific journals of Latin America, South Africa and Spain. It is currently present in 15 countries and supported by the Sao Paulo Research Foundation (FAPESP) and the Brazilian National Council for Scientific and Technological Development (BIREME).
This corpus contains titles and abstracts from 161,710 records in English and Spanish. Users can find the metadata of each record written in Dublin Core format.
For more information about SciELO parallel corpora, see Scielo_README file.
PubMed is a free search engine used to access the MEDLINE database, which is maintained by the U.S. National Library of Medicine (NLM).
This corpus contains titles and abstracts from 127,619 records. The metadata of each record is written in Dublin Core format. The original XML file supplied by PubMed for each record is included as well.
For more information about Pubmed parallel corpora, see Pubmed_README file.
Users can access all Spanish articles in PubMed by clicking here. Follow these steps to download the metadata of all articles in XML format:
MedlinePlus is an online information service provided by the U.S. National Library of Medicine (NLM), and gives free information about health in both English and Spanish. MedlinePlus provides the following information: Health topics, Drugs and supplements, Laboratory test information, Medical encyclopedia.
There are 2 corpora available for download:
These corpora are also available at http://temu.bsc.es/mespen/
In addition, forty-six bilingual medical glossaries for various language pairs are available at https://zenodo.org/record/2205690#.XefkzdEo9hF
Copyright (c) 2019 Secretaría de Estado para el Avance Digital
The 3PFL database links information on patented inventions and scientific publications related to a public procurement contract or a research grant awarded by the U.S. Federal Government to detailed contract-level/grant-level information (e.g., awarding agency, recipient organization, award size). We have combined data from multiple sources, including (but not limited to) the United States Patent and Trademark Office bulk database, the Federal Procurement Database System, the Award Submission Portal (ASP), and the European Patent Office's PATSTAT database. We also provide a link to the scientific publications associated with these patents. The 3PFL database provides rich and original information that opens the door to novel empirical research in the economics of innovation and science. The tables 07_grantee_information and 09_paper_information.csv (for NIH research grant related publications) are in a preliminary version and will be updated in future releases.
https://creativecommons.org/publicdomain/zero/1.0/
RxNorm is a US-specific medical terminology that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm
RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/
RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.
This public dataset includes multiple data files, originally released in RxNorm Rich Release Format (RXNRRF), that have been loaded into BigQuery tables. The data is updated and archived on a monthly basis.
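For readers who work with the original release files rather than the BigQuery tables, here is a minimal sketch of reading Rich Release Format text. It assumes the files are pipe-delimited with a trailing pipe on every line, and it assumes the RXNCONSO column order given in NLM's RxNorm technical documentation; verify both against the release you download. The sample row is made up for illustration and is not a real RxNorm concept.

```python
# Column order assumed from NLM's RxNorm technical documentation for
# RXNCONSO.RRF; check it against the release you are using.
RXNCONSO_COLUMNS = [
    "RXCUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "RXAUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf(text, columns):
    """Turn pipe-delimited RRF text into a list of dicts keyed by column name."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue
        fields = line.split("|")
        # Assumption: each line ends with "|", which leaves one empty
        # element at the end after splitting.
        if fields[-1] == "":
            fields.pop()
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(columns, fields)))
    return rows


# Made-up row (not a real RxNorm concept), built from the column list so
# that the field count always matches.
_values = {
    "RXCUI": "X0001", "LAT": "ENG", "RXAUI": "A0001", "SAB": "RXNORM",
    "TTY": "IN", "CODE": "X0001", "STR": "example ingredient", "SUPPRESS": "N",
}
sample_line = "|".join(_values.get(c, "") for c in RXNCONSO_COLUMNS) + "|"
rows = parse_rrf(sample_line, RXNCONSO_COLUMNS)

if __name__ == "__main__":
    print(rows[0]["TTY"], rows[0]["STR"])
```

Raising on a field-count mismatch, instead of silently padding, makes a changed or misread column layout visible early.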
The following tables are included in the RxNorm dataset:
RXNCONSO contains concept and source information
RXNREL contains information regarding relationships between entities
RXNSAT contains attribute information
RXNSTY contains semantic information
RXNSAB contains source information
RXNCUI contains retired RXCUI codes
RXNATOMARCHIVE contains archived data
RXNCUICHANGES contains concept changes
Update Frequency: Monthly
Fork this kernel to get started with this dataset.
https://www.nlm.nih.gov/research/umls/rxnorm/
https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm
https://cloud.google.com/bigquery/public-data/rxnorm
Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.
Banner Photo by @freestocks from Unsplash.
What are the RXCUI codes for the ingredients of a list of drugs?
Which ingredients have the most variety of dose forms?
In what dose forms is the drug phenylephrine found?
What are the ingredients of the drug labeled with the generic code number 072718?
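Questions like these usually come down to joining concepts to their relationships. The sketch below shows that join pattern on an in-memory SQLite database. Everything in it is a simplification for illustration: the `concept` and `link` tables, their columns, the identifiers and the rows are all hypothetical toy data, and they do not reproduce the actual RXNCONSO or RXNREL layouts or their relationship direction. Adapt the idea to the real table schemas before running anything against BigQuery.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE concept (id TEXT PRIMARY KEY, tty TEXT, name TEXT);
        CREATE TABLE link (drug_id TEXT, relation TEXT, target_id TEXT);
        """
    )
    # Toy rows with made-up identifiers (not real RXCUI codes).
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?)",
        [
            ("I1", "IN", "phenylephrine"),
            ("I2", "IN", "ibuprofen"),
            ("F1", "DF", "Oral Tablet"),
            ("F2", "DF", "Nasal Spray"),
            ("D1", "SCD", "phenylephrine tablet (toy)"),
            ("D2", "SCD", "phenylephrine nasal spray (toy)"),
            ("D3", "SCD", "ibuprofen tablet (toy)"),
        ],
    )
    conn.executemany(
        "INSERT INTO link VALUES (?, ?, ?)",
        [
            ("D1", "has_ingredient", "I1"), ("D1", "has_dose_form", "F1"),
            ("D2", "has_ingredient", "I1"), ("D2", "has_dose_form", "F2"),
            ("D3", "has_ingredient", "I2"), ("D3", "has_dose_form", "F1"),
        ],
    )
    return conn


def dose_forms_for(conn, ingredient_name):
    """List the distinct dose forms of drugs containing the ingredient."""
    rows = conn.execute(
        """
        SELECT DISTINCT form.name
        FROM concept AS ing
        JOIN link AS has_ing
          ON has_ing.target_id = ing.id AND has_ing.relation = 'has_ingredient'
        JOIN link AS has_form
          ON has_form.drug_id = has_ing.drug_id AND has_form.relation = 'has_dose_form'
        JOIN concept AS form ON form.id = has_form.target_id
        WHERE ing.name = ?
        ORDER BY form.name
        """,
        (ingredient_name,),
    ).fetchall()
    return [row[0] for row in rows]


def dose_form_variety(conn):
    """Count distinct dose forms per ingredient, most varied first."""
    return conn.execute(
        """
        SELECT ing.name, COUNT(DISTINCT has_form.target_id) AS n_forms
        FROM link AS has_ing
        JOIN link AS has_form
          ON has_form.drug_id = has_ing.drug_id AND has_form.relation = 'has_dose_form'
        JOIN concept AS ing ON ing.id = has_ing.target_id
        WHERE has_ing.relation = 'has_ingredient'
        GROUP BY ing.name
        ORDER BY n_forms DESC, ing.name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_toy_db()
    print(dose_forms_for(conn, "phenylephrine"))
    print(dose_form_variety(conn))
```

The same two-step join (ingredient to drug, drug to dose form) covers both the phenylephrine question and the dose-form variety question; only the final aggregation differs.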