A list of NIH-supported repositories that accept submissions of appropriate scientific research data from biomedical researchers. It includes resources that aggregate information about biomedical data and information sharing systems. Links are provided to information about submitting data to and accessing data from the listed repositories. Additional information about the repositories and points of contact for further information or inquiries can be found on the websites of the individual repositories.
Research projects funded by the National Institutes of Health (NIH), other DHHS Operating Divisions (ACF, AHRQ, CDC, FDA, HRSA), and the Department of Veterans Affairs. The ExPORTER files provide weekly and/or yearly snapshots of the data publicly accessible through the NIH Research Portfolio Online Reporting Tools, Expenditures and Results (RePORTER) system at https://reporter.nih.gov. The RePORTER database can also be queried using the user interface or the API. The RePORTER database contains information such as project title, abstract, principal investigator, funded organization, total awarded costs, categorization by area of research (NIH only), and project keywords. Also available is information on research publications and patents that have cited support from each project.
The NIH Common Data Elements (CDE) Repository has been designed to provide access to structured human and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and for other purposes. Visit the NIH CDE Resource Portal for contextual information about the repository.
A listing of NIH-supported data sharing repositories that make data accessible for reuse. Most accept submissions of appropriate data from NIH-funded investigators (and others), but some restrict data submission to researchers involved in a specific research network. Also included are resources that aggregate information about biomedical data and information sharing systems. The table can be sorted by name and by NIH Institute or Center, and can be searched by keyword so that you can find repositories more relevant to your data. Links are provided to information about submitting data to and accessing data from the listed repositories. Additional information about the repositories and points of contact for further information or inquiries can be found on the websites of the individual repositories.
The ImmPort system serves as a long-term, sustainable archive of immunology research data generated by investigators mainly funded through the NIAID/DAIT. The core component of the ImmPort system is an extensive data warehouse containing an integration of experimental data and clinical trial data. The ImmPort system also provides data analysis tools and an immunology-focused ontology. The analytical tools created and integrated as part of the ImmPort system are available to any researcher within ImmPort after registration and approval by DAIT. Additionally, the data provided mainly by NIAID/DAIT funded researchers in ImmPort will be available to all registered users after the appropriate embargo time.
ZFIN serves as the zebrafish model organism database. It aims to: a) be the community database resource for the laboratory use of zebrafish, b) develop and support integrated zebrafish genetic, genomic and developmental information, c) maintain the definitive reference data sets of zebrafish research information, d) link this information extensively to corresponding data in other model organism and human databases, e) facilitate the use of zebrafish as a model for human biology, and f) serve the needs of the research community.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The Pile -- NIHExPorter (refined by Data-Juicer)
A refined version of the NIHExPorter dataset in The Pile, produced by Data-Juicer. Some "bad" samples have been removed from the original dataset to improve its quality. This dataset is typically used to pretrain a large language model. Note: this is a small subset for previewing; the whole dataset (about 2.0 GB) is available from the dataset page.
Dataset Information
Number of samples: 858,492 (keeps ~91.36% of the original dataset)
Refining… See the full description on the dataset page: https://huggingface.co/datasets/datajuicer/the-pile-nih-refined-by-data-juicer.
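The two figures quoted above (858,492 retained samples, about 91.36% of the original) are enough to back out a rough size for the unrefined dataset. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and because the percentage is rounded in the source, the results are estimates rather than exact counts.

```python
# Figures quoted in the dataset description.
KEPT_SAMPLES = 858_492
KEPT_FRACTION = 0.9136  # "~91.36%" is a rounded value, so results are approximate


def estimate_original(kept: int, fraction: float) -> int:
    """Estimate the pre-filtering sample count from the kept count and kept fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return round(kept / fraction)


original_estimate = estimate_original(KEPT_SAMPLES, KEPT_FRACTION)
removed_estimate = original_estimate - KEPT_SAMPLES

print(f"estimated original samples: {original_estimate:,}")
print(f"estimated removed samples:  {removed_estimate:,}")
```

Under these assumptions the original dataset works out to roughly 940 thousand samples, of which a little over 80 thousand were filtered out.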
The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. The data stored within the DIP database are curated both manually by expert curators and automatically using computational approaches that utilize knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data.
The Cell Centered Database (CCDB) is a web accessible database for high resolution 2D, 3D and 4D data from light and electron microscopy, including correlated imaging.
The Learning Resources Database is a catalog of interactive tutorials, videos, online classes, finding aids, and other instructional resources on National Library of Medicine (NLM) products and services. Resources may be available for immediate use via a browser or downloadable for use in course management systems.
The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) is a national genetics data repository facilitating access to genotypic and phenotypic data for Alzheimer's disease (AD). Data include GWAS, whole genome (WGS) and whole exome (WES), expression, RNA-Seq, and ChIP-Seq analyses. Data for the Alzheimer's Disease Sequencing Project (ADSP) are available through a partnership with dbGaP (ADSP at dbGaP). Results are integrated and annotated in the searchable genomics database, which also provides access to a variety of software packages, analytic pipelines, online resources, and web-based tools to facilitate analysis and interpretation of large-scale genomic data. Data are available as defined by the NIA Genomics of Alzheimer's Disease Sharing Policy and the NIH Genomics Data Sharing Policy. Investigators return secondary analysis data to the database in keeping with the NIAGADS Data Distribution Agreement.
EuPathDB Bioinformatics Resource Center for Biodefense and Emerging/Re-emerging Infectious Diseases is a portal for accessing genomic-scale datasets associated with the eukaryotic pathogens.
Nonalcoholic fatty liver disease (NAFLD) affects 10%-30% of the general U.S. population and can progress to significant fibrosis and cirrhosis. When nonalcoholic steatohepatitis (NASH) is present, the 5-year and 10-year survivals are estimated at 67% and 59%, respectively. The presence of NASH and early fibrosis is currently established only by liver biopsy; noninvasively determining who has NASH and who is at risk for progressing to cirrhosis remains challenging.
The Nonalcoholic Steatohepatitis Clinical Research Network (NASH CRN) was initiated by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) in 2002 to conduct multicenter, collaborative studies on the etiology, contributing factors, natural history, complications, and treatment of NASH. To meet these goals, patients with the full spectrum of NAFLD or cryptogenic cirrhosis were enrolled in an observational Database study.
Comprehensive data, including demographics, medical history, symptoms, medication use, diet and exercise habits, and routine laboratory studies, were collected on all patients at entry and at annual visits for up to 4 years after enrollment. Study questionnaires administered at enrollment and at selected follow-up visits included AUDIT; Block Food Questionnaire; Skinner Lifetime Drinking History; Physical Activity Questionnaire; Modifiable Activity Questionnaire; and the MOS 36-Item Short-Form Health Survey. Specimens were collected at selected time points during follow-up. If liver biopsies were obtained as part of routine patient care, they were scored using the NASH CRN NAFLD Activity Score (NAS) and fibrosis score.
Database of Short Genetic Variations (dbSNP) contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.
A database of federally funded biomedical research projects conducted at universities, hospitals, and other research institutions that provides a central point of access to reports, data, and analyses of NIH research. The RePORTER has replaced the CRISP database. The database, maintained by the Office of Extramural Research at the National Institutes of Health, includes projects funded by the National Institutes of Health (NIH), Substance Abuse and Mental Health Services (SAMHSA), Health Resources and Services Administration (HRSA), Food and Drug Administration (FDA), Centers for Disease Control and Prevention (CDCP), Agency for Health Care Research and Quality (AHRQ), and Office of Assistant Secretary of Health (OASH).
The Influenza Research Database (IRD) serves as a public repository and analysis platform for flu sequence, experiment, surveillance and related data.
The Mouse Phenome Database (MPD) has characterizations of hundreds of strains of laboratory mice to facilitate translational discoveries and to assist in selection of strains for experimental studies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COInr is a non-redundant, comprehensive database of COI sequences extracted from NCBI-nt and BOLD. It is not limited to a taxon, a gene region, or a taxonomic resolution. Sequences are dereplicated between databases and within taxa.
Each taxon has a unique taxonomic identifier (taxID), which is essential for avoiding ambiguous associations of homonyms and synonyms in the source database. TaxIDs form a coherent hierarchical system that is fully compatible with the NCBI taxIDs, which makes it possible to build full or ranked lineages.
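To illustrate how a hierarchical taxID system yields full and ranked lineages, here is a minimal sketch. It assumes a toy in-memory mapping from each taxID to its parent taxID, rank, and name; the IDs, the placeholder species name, the dictionary layout, and the function names are all hypothetical and are not COInr's file format or the mkCOInr implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy taxonomy: taxID -> (parent taxID, rank, name).
# IDs and layout are illustrative only.
TAXONOMY: Dict[int, Tuple[Optional[int], str, str]] = {
    1: (None, "root", "root"),
    10: (1, "phylum", "Arthropoda"),
    20: (10, "class", "Insecta"),
    25: (20, "no rank", "Pterygota"),
    30: (25, "order", "Diptera"),
    40: (30, "species", "Diptera sp. example"),
}

MAJOR_RANKS = ("phylum", "class", "order", "family", "genus", "species")


def full_lineage(tax_id: int) -> List[Tuple[int, str, str]]:
    """Walk parent links up to the root; return (taxID, rank, name) root-first."""
    lineage: List[Tuple[int, str, str]] = []
    seen = set()
    current: Optional[int] = tax_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at taxID {current}")
        seen.add(current)
        parent, rank, name = TAXONOMY[current]
        lineage.append((current, rank, name))
        current = parent
    lineage.reverse()
    return lineage


def ranked_lineage(
    tax_id: int, ranks: Sequence[str] = MAJOR_RANKS
) -> Dict[str, Optional[str]]:
    """Keep only the requested ranks; ranks missing from the lineage map to None."""
    by_rank = {rank: name for _, rank, name in full_lineage(tax_id)}
    return {rank: by_rank.get(rank) for rank in ranks}


print([name for _, _, name in full_lineage(40)])
print(ranked_lineage(40))
```

The full lineage keeps every node on the path to the root, including unranked ones, while the ranked lineage projects that path onto a fixed set of ranks and leaves gaps explicit.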
COInr is a good starting point to create custom databases according to the users’ needs using mkCOInr scripts available at https://github.com/meglecz/mkCOInr
It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for a minimum taxonomic resolution, add new custom sequences, and format the database for the BLAST, QIIME, and RDP classifiers.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The zip file contains the data and programs for replicating the statistical analyses in Packalen M and J Bhattacharya (2020) “NIH Funding and the Pursuit of Edge Science”. An earlier version of the paper was circulated as NBER working paper No. 24860, titled “Does the NIH Fund Edge Science?” The file Readme_NIHEdgeScience_StatisticalAnalysis.pdf contains documentation for the data files and programs.
Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.