The "https://addhealth.cpc.unc.edu/" Target="_blank">National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States. The Add Health cohort has been followed into young adulthood with four in-home interviews, the most recent in 2008, when the sample was aged 24-32*. Add Health combines longitudinal survey data on respondents' social, economic, psychological and physical well-being with contextual data on the family, neighborhood, community, school, friendships, peer groups, and romantic relationships, providing unique opportunities to study how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood. The fourth wave of interviews expanded the collection of biological data in Add Health to understand the social, behavioral, and biological linkages in health trajectories as the Add Health cohort ages through adulthood. The fifth wave of data collection is planned to begin in 2016.
Initiated in 1994 and supported by three program project grants from the "https://www.nichd.nih.gov/" Target="_blank">Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) with co-funding from 23 other federal agencies and foundations, Add Health is the largest, most comprehensive longitudinal survey of adolescents ever undertaken. Beginning with an in-school questionnaire administered to a nationally representative sample of students in grades 7-12, the study followed up with a series of in-home interviews conducted in 1995, 1996, 2001-02, and 2008. Other sources of data include questionnaires for parents, siblings, fellow students, and school administrators and interviews with romantic partners. Preexisting databases provide information about neighborhoods and communities.
Add Health was developed in response to a mandate from the U.S. Congress to fund a study of adolescent health, and Waves I and II focus on the forces that may influence adolescents' health and risk behaviors, including personal traits, families, friendships, romantic relationships, peer groups, schools, neighborhoods, and communities. As participants have aged into adulthood, however, the scientific goals of the study have expanded and evolved. Wave III, conducted when respondents were between 18 and 26** years old, focuses on how adolescent experiences and behaviors are related to decisions, behavior, and health outcomes in the transition to adulthood. At Wave IV, respondents were ages 24-32* and assuming adult roles and responsibilities. Follow up at Wave IV has enabled researchers to study developmental and health trajectories across the life course of adolescence into adulthood using an integrative approach that combines the social, behavioral, and biomedical sciences in its research objectives, design, data collection, and analysis.
* 52 respondents were 33-34 years old at the time of the Wave IV interview.
** 24 respondents were 27-28 years old at the time of the Wave III interview.
The Wave III public-use data are helpful in analyzing the transition between adolescence and young adulthood. Included in this dataset are data on pregnancy.
Open-i service provides search and retrieval of abstracts and images (including charts, graphs, clinical images, etc.) from the open source literature, and biomedical image collections. Searching may be done by text queries as well as by query images.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
RxNorm is a name of a US-specific terminology in medicine that contains all medications available on US market. Source: https://en.wikipedia.org/wiki/RxNorm
RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/
RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.
This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into Bigquery tables. The data is updated and archived on a monthly basis.
The following tables are included in the RxNorm dataset:
RXNCONSO contains concept and source information
RXNREL contains information regarding relationships between entities
RXNSAT contains attribute information
RXNSTY contains semantic information
RXNSAB contains source info
RXNCUI contains retired rxcui codes
RXNATOMARCHIVE contains archived data
RXNCUICHANGES contains concept changes
Update Frequency: Monthly
Fork this kernel to get started with this dataset.
https://www.nlm.nih.gov/research/umls/rxnorm/
https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm
https://cloud.google.com/bigquery/public-data/rxnorm
Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.
Banner Photo by @freestocks from Unsplash.
What are the RXCUI codes for the ingredients of a list of drugs?
Which ingredients have the most variety of dose forms?
In what dose forms is the drug phenylephrine found?
What are the ingredients of the drug labeled with the generic code number 072718?
Microarray study attributes and data sharing status397 rows, one row for each study that created gene expression microarray data as identified by Ochsner et al. (doi:10.1038/nmeth1208-991). Attributes of each study are included in 23 columns. Dependent variable is called is_data_shared.Piwowar_Metrics2009_rawdata.csvStatistical analysis R scriptStatistical R script for analysis and graphics as presented in the paper.Piwowar_Metrics2009_statistics.R
The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) is a national genetics data repository facilitating access to genotypic and phenotypic data for Alzheimer's disease (AD). Data include GWAS, whole genome (WGS) and whole exome (WES), expression, RNA Seq, and CHIP Seq analyses. Data for the Alzheimer’s Disease Sequencing Project (ADSP) are available through a partnership with dbGaP (ADSP at dbGaP). Results are integrated and annotated in the searchable genomics database that also provides access to a variety of software packages, analytic pipelines, online resources, and web-based tools to facilitate analysis and interpretation of large-scale genomic data. Data are available as defined by the NIA Genomics of Alzheimer’s Disease Sharing Policy and the NIH Genomics Data Sharing Policy. Investigators return secondary analysis data to the database in keeping with the NIAGADS Data Distribution Agreement.
NAHDAP acquires, preserves and disseminates data relevant to drug addiction and HIV research. By preserving and making available an easily accessible library of electronic data on drug addiction and HIV infection in the United States, NAHDAP offers scholars the opportunity to conduct secondary analysis on major issues of social and behavioral sciences and public policy.
"The NICHD Data and Specimen Hub (DASH) is a centralized resource that allows researchers to share and access de-identified data from studies funded by NICHD. DASH also serves as a portal for requesting biospecimens from selected DASH studies.". This dataset is associated with the following publication: Deluca, N., K. Thomas, A. Mullikin, R. Slover, L. Stanek, D. Pilant, and E. Hubal. Geographic and demographic variability in serum PFAS concentrations for pregnant women in the United States. Journal of Exposure Science and Environmental Epidemiology. Nature Publishing Group, London, UK, 33(1): 710-724, (2023).
The Dataset Catalog (beta)is a catalog of biomedical datasets from various repositories for users to search, discover, retrieve, and connect with datasets to accelerate scientific research. This beta version aims to collect user feedback to inform future product development.
BioGRID is an online interaction repository with data on raw protein and genetic interactions from major model organism species. All interaction data are freely provided through our search index and available via download in a wide variety of standardized formats.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb) It being made public both to act as supplementary data for "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (pre-print is available in Open Access here -> https://arxiv.org/abs/2305.10234) and in order for other researchers to use these data in their own work.
The protocol is intended for the Systematic Literature review on the topic of High-value Datasets with the aim to gather information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected in the result of the SLR over Scopus, Web of Science, and Digital Government Research library (DGRL) in 2023.
Methodology
To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out to by searching digital libraries covered by Scopus, Web of Science (WoS), Digital Government Research library (DGRL).
These databases were queried for keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those, where these objects were primary research objects rather than mentioned in the body, e.g., as a future work. After deduplication, 11 articles were found unique and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.
To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.
Test procedure Each study was independently examined by at least two authors, where after the in-depth examination of the full-text of the article, the structured protocol has been filled for each study. The structure of the survey is available in the supplementary file available (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx) The data collected for each study by two researchers were then synthesized in one final version by the third researcher.
Description of the data in this data set
Protocol_HVD_SLR provides the structure of the protocol Spreadsheets #1 provides the filled protocol for relevant studies. Spreadsheet#2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information
Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper -{journal article, conference paper, book chapter}
5) DOI / Website- a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of an article in the Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}
Approach- and research design-related information 10) Objective / RQ - the research objective / aim, established research questions 11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analy-sis (country, organisation, specific unit that has been ana-lysed, e.g., the number of use-cases, scope of the SLR etc.) 12) Contributions - the contributions of the study 13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach? 14) Availability of the underlying research data- whether there is a reference to the publicly available underly-ing research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared? 15) Period under investigation - period (or moment) in which the study was conducted 16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?
Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited infor-mation about the research methods used)?
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, sec-ondary - mentioned but not studied (e.g., as part of discus-sion, future work etc.))
HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the rela-tionships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination in-volve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
Format of the file .xls, .csv (for the first spreadsheet only), .odt, .docx
Licenses or restrictions CC-BY
For more info, see README.txt
BRADS is a repository for data and biospecimens from population health research initiatives and clinical or interventional trials designed and implemented by NICHD’s Division of Intramural Population Health Research (DIPHR). Topics include human reproduction and development, pregnancy, child health and development, and women’s health. The website is maintained by DIPHR.
The Influenza Research Database (IRD) serves as a public repository and analysis platform for flu sequence, experiment, surveillance and related data.
National Database for Autism Research (NDAR) is an extensible, scalable informatics platform for autism spectrum disorder-relevant data at all levels of biological and behavioral organization (molecules, genes, neural tissue, behavioral, social and environmental interactions) and for all data types (text, numeric, image, time series, etc.). NDAR was developed to share data across the entire ASD field and to facilitate collaboration across laboratories, as well as interconnectivity with other informatics platforms. NDAR Homepage: http://ndar.nih.gov/
Introduction: Through funding agency and publisher policies, an increasing proportion of the health sciences literature is being made open access. Such an increase in access raises questions about the awareness and potential utilization of this literature by those working in health fields. Methods: A sample of physicians (N=336) and public health non-governmental organization (NGO) staff (N=92) were provided with relatively complete access to the research literature indexed in PubMed, as well as access to the point-of-care service UpToDate, for up to one year, with their usage monitored through the tracking of web-log data. The physicians also participated in a one-month trial of relatively complete or limited access. Results: The study found that participants' research interests were not satisfied by article abstracts alone nor, in the case of the physicians, by a clinical summary service such as UpToDate. On average, a third of the physicians viewed research a little more frequently t...
The NLM Visible Human Project® has created publicly-available complete, anatomically detailed, three-dimensional representations of a human male body and a human female body. Specifically, the VHP provides a public-domain library of cross-sectional cryosection, CT, and MRI images obtained from one male cadaver and one female cadaver. The Visible Man data set was publicly released in 1994 and the Visible Woman in 1995. https://www.nlm.nih.gov/research/visible/visible_human.html
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study of music is highly interdisciplinary, and thus requires the combination of datasets from multiple musical domains, such as catalog metadata (authors, song titles, dates), industrial records (labels, producers, sales), and music notation (scores). While today an abundance of music metadata exists on the Linked Open Data cloud, datasets containing interoperable symbolic descriptions of music itself, i.e. music notation with note and instrument level information, are scarce. This is the MIDI Linked Data Cloud, a dataset that represents multiple collections of digital music in the MIDI standard format as Linked Data. At the time of writing, the dataset comprises 10,215,557,355 triples of 308,443 interconnected MIDI scores, and provides Web-compatible descriptions of their MIDI events.
https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
The PMC Open Access Subset includes more than 3.4 million journal articles and preprints that are made available under license terms that allow reuse.
Not all articles in PMC are available for text mining and other reuse, many have copyright protection, however articles in the PMC Open Access Subset are made available under Creative Commons or similar licenses that generally allow more liberal redistribution and reuse than a traditional copyrighted work.
The PMC Open Access Subset is one part of the PMC Article Datasets
The VSAC is a repository and authoring tool for public value sets created by external programs. Value sets are lists of codes and corresponding terms, from NLM-hosted standard clinical vocabularies (such as SNOMED CT®, RxNorm, LOINC® and others), that define clinical concepts to support effective and interoperable health information exchange. The VSAC does not create value set content. The VSAC also provides downloadable access to all official versions of value sets specified by the Centers for Medicare & Medicaid Services (CMS) electronic Clinical Quality Measures (eCQMs). For information on CMS eCQMs, visit the eCQI Resource Center. The VSAC is provided by the National Library of Medicine (NLM), in collaboration with the Office of the National Coordinator for Health Information Technology (ONC) and CMS.
The Cell Centered Database (CCDB) is a web accessible database for high resolution 2D, 3D and 4D data from light and electron microscopy, including correlated imaging.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository provides data from the FDA's Orange Book on patents associated with marketed drugs, with additional information on whether each patent is a "public-sector" patent. The dataset accompanies the following article: - Lisa Larrimore Ouellette and Bhaven N. Sampat. 2024. "The Feasibility of Using Bayh-Dole March-In Rights to Lower Drug Prices: An Update." A grant from the National Institute of Healthcare Management (NIHCM) (PI-Sampat) helped support the data collection and cleaning effort.
The "https://addhealth.cpc.unc.edu/" Target="_blank">National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States. The Add Health cohort has been followed into young adulthood with four in-home interviews, the most recent in 2008, when the sample was aged 24-32*. Add Health combines longitudinal survey data on respondents' social, economic, psychological and physical well-being with contextual data on the family, neighborhood, community, school, friendships, peer groups, and romantic relationships, providing unique opportunities to study how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood. The fourth wave of interviews expanded the collection of biological data in Add Health to understand the social, behavioral, and biological linkages in health trajectories as the Add Health cohort ages through adulthood. The fifth wave of data collection is planned to begin in 2016.
Initiated in 1994 and supported by three program project grants from the "https://www.nichd.nih.gov/" Target="_blank">Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) with co-funding from 23 other federal agencies and foundations, Add Health is the largest, most comprehensive longitudinal survey of adolescents ever undertaken. Beginning with an in-school questionnaire administered to a nationally representative sample of students in grades 7-12, the study followed up with a series of in-home interviews conducted in 1995, 1996, 2001-02, and 2008. Other sources of data include questionnaires for parents, siblings, fellow students, and school administrators and interviews with romantic partners. Preexisting databases provide information about neighborhoods and communities.
Add Health was developed in response to a mandate from the U.S. Congress to fund a study of adolescent health, and Waves I and II focus on the forces that may influence adolescents' health and risk behaviors, including personal traits, families, friendships, romantic relationships, peer groups, schools, neighborhoods, and communities. As participants have aged into adulthood, however, the scientific goals of the study have expanded and evolved. Wave III, conducted when respondents were between 18 and 26** years old, focuses on how adolescent experiences and behaviors are related to decisions, behavior, and health outcomes in the transition to adulthood. At Wave IV, respondents were ages 24-32* and assuming adult roles and responsibilities. Follow up at Wave IV has enabled researchers to study developmental and health trajectories across the life course of adolescence into adulthood using an integrative approach that combines the social, behavioral, and biomedical sciences in its research objectives, design, data collection, and analysis.
* 52 respondents were 33-34 years old at the time of the Wave IV interview.
** 24 respondents were 27-28 years old at the time of the Wave III interview.
The Wave III public-use data are helpful in analyzing the transition between adolescence and young adulthood. Included in this dataset are data on pregnancy.