THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of Life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways. The help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project, called e-Protein. The project brings together similar databases at three sites: Imperial College London, University College London and the European Bioinformatics Institute. e-Protein's mission statement is "To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies." The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG. The following eukaryotic genomes are incorporated: Anopheles gambiae, protein sequences from the mosquito genome; Arabidopsis thaliana, protein sequences from the Arabidopsis genome; Caenorhabditis briggsae, protein sequences from the C.briggsae genome; Caenorhabditis elegans, protein sequences from the worm genome; Ciona intestinalis, protein sequences from the sea squirt genome; Danio rerio, protein sequences from the zebrafish genome; Drosophila melanogaster, protein sequences from the fruitfly genome; Encephalitozoon cuniculi, protein sequences from the E.cuniculi genome; Fugu rubripes, protein sequences from the pufferfish genome; Guillardia theta, protein sequences from the G.theta genome; Homo sapiens, protein sequences from the human genome; Mus musculus, protein sequences from the mouse genome; Neurospora crassa, protein sequences from the N.crassa genome; Oryza sativa, protein sequences from the rice genome; Plasmodium falciparum, protein sequences from the P.falciparum genome; Rattus norvegicus, protein sequences from the rat genome; Saccharomyces cerevisiae, protein sequences from the yeast genome; Schizosaccharomyces pombe, protein sequences from the yeast genome.
Manually curated database of all conditions with known genetic causes, focusing on medically significant genetic data with available interventions. Includes gene symbol, conditions, allelic conditions, inheritance, age at which interventions are indicated, clinical categorization, and general description of interventions/rationale. Contents are intended to describe types of interventions that might be considered. Includes only single gene alterations and does not include genetic associations or susceptibility factors related to more complex diseases.
Collection of curated structural variation in the human genome. Catalogue of human genomic structural variation identified in healthy control samples, intended for studies that aim to correlate genomic variation with phenotypic data. It is continuously updated with new data from peer-reviewed research studies. DGV is now part of a collaboration with two archival CNV databases, DGVa at EBI and dbVar at NCBI. As part of this collaborative effort, DGV no longer accepts direct submissions of data and instead obtains datasets from DGVa (short for DGV archive). This ensures that the three databases are synchronized and allows for official accessioning of variants.
A portal to georeferenced databases and tools for the analysis of marine bacterial, archaeal, and phage genomes and metagenomes. The megx.net database, MegDB (microbial ecological genomics DataBase), is a collection of publicly available georeferenced marine bacterial and archaeal genomes and metagenomes, including the Global Ocean Sampling (GOS) reads. Marine microbial genomics and metagenomics is an emerging field in environmental research. Since the completion of the first marine bacterial genome in 2003, the number of fully sequenced marine bacteria has grown rapidly. Concurrently, marine metagenomics studies are performed on a regular basis, and the resulting number of sequences is growing exponentially. To address environmentally relevant questions like organismal adaptations to oceanic provinces and regional differences in the microbial cycling of nutrients, it is necessary to couple sequence data with geographical information and supplement them with contextual information like physical, chemical and biological data. Therefore, new specialized databases are needed to organize and standardize data storage as well as centralize data access and interpretation. Megx.net is a set of databases and tools that handle genomic and metagenomic sequences in their environmental contexts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The proportion of sites that is polymorphic in each category, using SNPs where at least one derived allele is reported (Derived Allele Count (DAC) >= 1) and those SNPs where at least six derived alleles are reported (DAC >= 6).
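The two thresholds can be illustrated with a minimal Python sketch. It assumes each site is recorded as a (category, derived allele count) pair; the category names and the toy counts below are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def polymorphic_proportions(sites, min_dac):
    """Return {category: proportion of sites with DAC >= min_dac}.

    `sites` is an iterable of (category, derived_allele_count) pairs.
    """
    totals = defaultdict(int)
    polymorphic = defaultdict(int)
    for category, dac in sites:
        totals[category] += 1
        if dac >= min_dac:
            polymorphic[category] += 1
    return {cat: polymorphic[cat] / totals[cat] for cat in totals}


# Hypothetical toy data: four sites per category.
sites = [
    ("synonymous", 0), ("synonymous", 1), ("synonymous", 7), ("synonymous", 0),
    ("nonsynonymous", 0), ("nonsynonymous", 2), ("nonsynonymous", 0), ("nonsynonymous", 0),
]

prop_dac1 = polymorphic_proportions(sites, min_dac=1)  # DAC >= 1
prop_dac6 = polymorphic_proportions(sites, min_dac=6)  # DAC >= 6
print(prop_dac1)
print(prop_dac6)
```

Raising the threshold from 1 to 6 can only lower or preserve each proportion, since every site counted at DAC >= 6 is also counted at DAC >= 1.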
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
The primary mission of the Alliance of Genome Resources (the Alliance) is to develop and maintain sustainable genome information resources that facilitate the use of diverse model organisms in understanding the genetic and genomic basis of human biology, health and disease. This understanding is fundamental for advancing genome biology research and for translating human genome data into clinical utility. The unified Alliance information system will represent the union of the data and information represented in the current individual MODs rather than the intersection, and thus provide the best of each in one place while maintaining community integrity and preserving the unique aspects of each model organism. By working together we can be more comprehensive and efficient, and hence more sustainable. Through the implementation of a shared, modular information system architecture, the Alliance seeks to serve diverse user communities including (i) human geneticists who want access to all model organism data for orthologous human genes; (ii) basic science researchers who use specific model organisms to understand fundamental biology; (iii) computational biologists and data scientists who need access to standardized, well-structured data, both big and small; and (iv) educators and students. Community genome resources such as the Model Organism Databases and the Gene Ontology Consortium have developed high quality resources enabling cost and time effective information retrieval and aggregation that would otherwise require countless hours to achieve. Regardless of their success and utility, there remain challenges to using and sustaining MODs. Searching across multiple model organism database resources remains a barrier to realizing the full impact of these resources in advancing genome biology and genomic medicine. 
In addition, despite a growing need for MODs by the biomedical research community as well as the increasing volumes of data and publications, the financial resources available to sustain MODs and related information resources are being reduced. We believe that one contribution to solving these challenges while continuing to serve our diverse user communities is to unify our efforts. To this end, six MODs (Saccharomyces Genome Database, WormBase, FlyBase, Zebrafish Information Network, Mouse Genome Database, Rat Genome Database) and the Gene Ontology (GO) project joined together in the fall of 2016 to form the Alliance of Genome Resources (the Alliance) consortium. Resources in this dataset:Resource Title: Alliance of Genome Resources. File Name: Web Page, url: https://www.alliancegenome.org/
MBGD is a database for comparative analysis of completely sequenced microbial genomes, the number of which is now growing rapidly. The aim of MBGD is to facilitate comparative genomics from various points of view such as ortholog identification, paralog clustering, motif analysis and gene order comparison. The core function of MBGD is to create an orthologous or homologous gene cluster table. For this purpose, similarities between all genes are precomputed and stored in the database, in addition to gene annotations such as the function categories assigned by the original authors and the motifs found in the translated sequences. Using these homology data, MBGD dynamically creates an orthologous gene cluster table. Users can change the set of organisms or the cutoff parameters to create their own orthologous grouping. Based on this cluster table, users can further analyze multiple genomes from various points of view with functions such as global map comparison, local map comparison, multiple sequence alignment and phylogenetic tree construction.
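The idea of building a cluster table on demand from precomputed similarities can be sketched as follows. This is not MBGD's actual algorithm: it is a plain single-linkage grouping under stated assumptions, with hypothetical gene identifiers of the form `organism:gene`, hypothetical similarity scores, and a hypothetical function name `cluster_genes`. It groups homologs and makes no attempt to separate orthologs from paralogs.

```python
def cluster_genes(similarities, organisms, cutoff):
    """Single-linkage clusters from precomputed (gene_a, gene_b, score) triples.

    Only genes from the selected `organisms` are used, and two genes are
    linked when their score is at least `cutoff`.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def organism_of(gene):
        return gene.split(":", 1)[0]

    for a, b, score in similarities:
        if organism_of(a) not in organisms or organism_of(b) not in organisms:
            continue
        root_a, root_b = find(a), find(b)  # registers both genes
        if score >= cutoff:
            parent[root_a] = root_b

    clusters = {}
    for gene in list(parent):
        clusters.setdefault(find(gene), []).append(gene)
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda members: members[0])


# Hypothetical precomputed similarities (scores are illustrative only).
sims = [
    ("eco:geneA", "bsu:geneA", 90),
    ("bsu:geneA", "mge:geneA", 75),
    ("eco:geneB", "bsu:geneB", 40),
    ("eco:geneA", "eco:geneC", 85),
]

all_three = cluster_genes(sims, {"eco", "bsu", "mge"}, cutoff=60)
two_strict = cluster_genes(sims, {"eco", "bsu"}, cutoff=80)
print(all_three)
print(two_strict)
```

Because the similarities are stored once, changing the organism set or the cutoff only re-runs the cheap grouping step, which is the property the description attributes to the dynamic cluster table.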
https://spdx.org/licenses/CC0-1.0.html
Species detection using eDNA is revolutionizing the global capacity to monitor biodiversity. However, the lack of regional, vouchered, genomic sequence information—especially sequence information that includes intraspecific variation—creates a bottleneck for management agencies wanting to harness the complete power of eDNA to monitor taxa and implement eDNA analyses. eDNA studies depend upon regional databases of complete mitogenomic sequence information to evaluate the effectiveness of such data to differentiate, identify and detect taxa. We created the Oregon Biodiversity Genome Project working group to utilize recent advances in sequencing technology to create a database of complete, near error-free mitogenomic sequences for all of Oregon's resident freshwater fishes. So far, we have successfully assembled the complete mitogenomes of 313 specimens of freshwater fish representing 7 families, 55 genera, and 129 (88%) of the 146 resident species and lineages. Our comparative analyses of these sequences illustrate that the short (~150 bp) mitochondrial “barcode” regions typically used for eDNA assays are not consistently diagnostic for species-level identification and that no single region is best for metabarcoding Oregon’s fishes. However, often-overlooked intergenic regions of the mitogenome such as the D-loop have the potential to reliably diagnose and differentiate species. This project provides a blueprint for other researchers to follow as they build regional databases. It also illustrates the taxonomic value and limits of complete mitogenomic sequences, and how current eDNA assays and the “PCR-free” environmental genomics methods of the future can best leverage this information. 
Methods Voucher Specimen and Tissue Collection This effort was motivated by the Oregon Biodiversity Genome Project (OBGP; www.obgp.org), a multi-institution collaboration between scientists and wildlife managers at Oregon State University, the Oregon Department of Fish and Wildlife (ODFW), and the United States Forest Service. The primary objective of the OBGP is to develop a regional genetic reference database to facilitate statewide eDNA monitoring programs for Oregon’s resident freshwater fishes. The specific goals of the OBGP (Fig 2a) are to: (1) use sterile laboratory methods to collect 10 georeferenced full-bodied vouchers of each freshwater fish species from dispersed watersheds in Oregon; (2) archive and link voucher specimens, tissues, and metadata for taxonomic verification and revision; (3) sequence full mitogenomes from multiple specimens per species; and (4) make all curated data publicly available via a client-server database accessed via a web browser.
The study area encompassed the State of Oregon—the region of interest for our eDNA monitoring program. We collected fishes in Oregon and expanded to a few sites in northern California and Washington State (Fig 2b). We examined historical location records in existing collections such as Oregon State Ichthyology Collection and conferred with local biologists to identify resident fishes and occupied locations. For cases where we knew or suspected that deeply divergent evolutionary lineages existed within the present concept of a species, we aimed to include representatives of all lineages. Biologists from ODFW ultimately identified 146 native and nonnative freshwater fish species and lineages that currently reside in Oregon and strategized collections to span watersheds throughout the state (Appendix S1). Each sampling kit (Appendix S2 Box S1) contained a 500-mL Nalgene bottle filled with 10% formalin, a 2.0 mL cryotube filled with 95% EtOH, a sterile scalpel, scissors and tweezers, a bleach wipe, latex gloves, a detailed sampling protocol to ensure consistent tissue sampling and data collection (Appendix S2 Box S2), and a field notes sheet (Appendix S2 Box S3) for metadata collection. Collectors anaesthetized and euthanized all fish specimens prior to tissue collection by immersion in an aqueous solution of Tricaine mesylate (MS-222). For collections in 2017, we worked with partners (Appendix S3 collecting_entity) who followed accepted procedures under Oregon State University and USFS IACUC protocols, but an IACUC was not required by all partner institutions. Specimen collection by ODFW was conducted under the agency’s statutory management authority and in 2018, 2019, and 2020 ODFW collected specimens for ESA-listed species under National Oceanic and Atmospheric Administration Permit numbers 21780, 22639, and 23527 respectively. Fish under USFWS jurisdiction (i.e. 
fish that are neither marine nor anadromous) were covered under ODFW’s ESA Section 6 Cooperative Agreement with USFWS. Details regarding partner collection permits and authority are listed in Appendix S3. We instructed all partners to collect a minimum of ~0.5 cm3 of tissue from each specimen, which was then placed in 95% EtOH for DNA extraction and sequencing. Euthanized fish were placed in 10% Formalin to ensure preservation of diagnostic features. When we failed to collect species or redundant examples of species, we augmented in-field collection with tissue samples loaned or gifted from North American ichthyology collections (OS14271, OS18056, OS18057, OS19982, OS19351, OS18993, OS20085, OS20084, OS20081, OS20080, OS20094, OS20088, OS20108, OS14271, OS22282, UW155929, UW158361, UAM:Fish:10376:401245, UAM:Fish:10464:374966, UAM:Fish:10464:374967). The goal of collecting 10 individuals per species was amended to collect three individuals and add specimens only if intraspecific genetic variation was detected in downstream mitogenome identity analyses (See below).
Taxonomic Verification, Accession, and Cataloging ODFW biologists and partners identified specimens provisionally in the field and Oregon State Ichthyology Collection taxonomists verified and refined those identifications prior to cataloging the specimens by morphological examination and reference to published keys (Markle and Tomelleri 2016, Wydoski and Whitney 2003). The Oregon State Ichthyology Collection has arranged to accession all vouchers and tissues, with full-bodied voucher specimens being transferred from formalin to isopropyl alcohol for permanent storage. Tissues were stored in 2.0 mL cryotubes at -70°C in 95% EtOH. Accessioning and cataloging were ongoing at the time of writing.
After generating sequence data (see below), we performed distance-based cluster analyses in Geneious 10.2.6 to verify morphological identification, using default settings (Global alignment with free end gaps, Cost Matrix of 65% similarity, Tamura-Nei Genetic Distance Model, Neighbor-Joining (NJ) Tree build Method, Gap open penalty of 12, Gap extension penalty of 3). We used the NAD2 gene for the Catostomidae, Centrarchidae, Cottidae, Cyprinidae, Ictaluridae, and Salmonidae NJ trees. Because the species of lampreys (Petromyzontidae) in Oregon possess very similar mitogenomes, we concatenated the NAD4, NAD5, and NAD6 genes to increase the length of sequence examined in the search for genetic clusters. In cases of incongruence between morphological and genetic clustering, we revisited the anatomical identifications of the vouchers, investigated the possibility of swapped or contaminated molecular samples, and corrected identifications as needed.
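The reason for concatenating genes when mitogenomes are very similar can be shown with a small Python sketch. It is not the Geneious workflow described above: it assumes the gene sequences are already aligned, and it substitutes a simple uncorrected p-distance for the Tamura-Nei model. The sequences and specimen names are hypothetical toy data.

```python
def concatenate(genes, order):
    """Join aligned gene sequences in a fixed order into one sequence."""
    return "".join(genes[name] for name in order)


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping any column that contains a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical, pre-aligned toy sequences for two specimens.
specimen_1 = {"NAD4": "ATGCCA", "NAD5": "GGTTAA", "NAD6": "CCGT"}
specimen_2 = {"NAD4": "ATGCCA", "NAD5": "GGTCAA", "NAD6": "CC-T"}
order = ["NAD4", "NAD5", "NAD6"]

single_gene = p_distance(specimen_1["NAD4"], specimen_2["NAD4"])
combined = p_distance(concatenate(specimen_1, order), concatenate(specimen_2, order))
print(single_gene, combined)
```

In this toy case the single gene shows no difference, while the concatenated sequence exposes one substitution across 15 comparable sites. A pairwise distance matrix of this kind is the input a neighbor-joining method would cluster.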
DNA Extraction and Sequencing We subsampled tissues into ~1.0 mm3 volumes and extracted DNA from these subsamples using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) spin-column protocol for animal tissues. To further optimize the lysing process, we crushed tissues in-tube with a micropestle after incubation. We used the Invitrogen dsDNA Broad-Range assay Kit and a Qubit fluorometer (Invitrogen, Carlsbad, CA) to measure DNA concentrations and yield. For each extracted specimen, 100 µL of extract containing 100-2000 ng/µL of DNA was transferred to a 0.65 mL Bioruptor microtube and sonicated (30 s on, 90 s off; 6 cycles) to ~300 bp in length following the manufacturer's protocol on a Bioruptor Pico sonication system (Diagenode, Denville, NJ). We prepared libraries for next generation sequencing for the first two sequencing runs according to manufacturers’ instructions using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA) (Appendix S3 library_prep). Oregon State University’s Center for Quantitative Life Sciences performed library preparation for the final two runs using the plexWell 96 Kit (SeqWell, Beverly, MA) (Appendix S3 library_prep). Paired-end (2 x 150 bp) sequencing was performed on all samples at multiplexing levels of 50 to 71 samples/lane (Appendix S3 spl) using an Illumina HiSeq 3000 at the Center for Quantitative Life Sciences.
Mitogenome Assembly To capture geographic genetic variation of each resident species across its range within Oregon, we sequenced the first collected representative of each species and subsequently sequenced specimens collected from separate watersheds. We stored gzipped fastq sequencing files on 2 x 1TB enterprise NL-SAS hard drives, and performed mitogenome assemblies on 4 x 2.30 GHz 16-core processors using 512GB ECC RAM. We targeted the first collected representative of each species for sequencing and maximized geographic distance among subsequent sequenced specimens to capture geographic genetic variation of all species throughout Oregon. Mitochondrial genomes were assembled de novo from raw paired reads using SPAdes assembler (versions 3.12.0-3.15.3) (Bankevich et al. 2012) or getOrganelle 1.6.2 or 1.7.5 (Jin et al. 2020). Three mitogenomes were recovered by performing reference-guided filtering with BLAT (Kent 2002) using the complete mitogenome sequences of identical or closely related species prior to SPAdes assembly. We resolved one mitogenome by first mapping reads in Geneious 10.2.6 to the noncircular mitochondrial contig produced from SPAdes de novo assembly and then
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Human mitochondrial DNA (mtDNA) encodes a set of 37 genes which are essential structural and functional components of the electron transport chain. Variations in these genes have been implicated in a broad spectrum of diseases and are extensively reported in literature and various databases. In this study, we describe MitoLSDB, an integrated platform to catalogue disease association studies on mtDNA (http://mitolsdb.igib.res.in). The main goal of MitoLSDB is to provide a central platform for direct submissions of novel variants that can be curated by the Mitochondrial Research Community. MitoLSDB provides access to standardized and annotated data from literature and databases encompassing information from 5231 individuals, 675 populations and 27 phenotypes. This platform is developed using the Leiden Open (source) Variation Database (LOVD) software. MitoLSDB houses information on all 37 genes in each population amounting to 132397 variants, 5147 unique variants. For each variant its genomic location as per the Revised Cambridge Reference Sequence, codon and amino acid change for variations in protein-coding regions, frequency, disease/phenotype, population, reference and remarks are also listed. MitoLSDB curators have also reported errors documented in literature which includes 94 phantom mutations, 10 NUMTs, six documentation errors and one artefactual recombination. MitoLSDB is the largest repository of mtDNA variants systematically standardized and presented using the LOVD platform. We believe that this is a good starting resource to curate mtDNA variants and will facilitate direct submissions enhancing data coverage, annotation in context of pathogenesis and quality control by ensuring non-redundancy in reporting novel disease associated variants.
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 22, 2016. A database of information on bacterial phages. It contains multiple phage genomes, which users can search with BLAST and MegaBLAST, and also hosts a Phage Forum in which users can discuss phage data. Interactive browsing of completed phage genomes is available using the program. The browser allows users to scan the genome for particular features and to download sequence information plus analyses of those features. Views of the genome are generated showing named genes, BLAST similarities to other phages, predicted tRNAs and other sequence features.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Legend: a IL19 and IL17 – mutations included in the INNOLiPA tests (see below); CFMDB – non-INNOLiPA mutations present in the CFTR mutation database; novel – mutations first reported in this study; b in three chromosomes R668C with G576A in trans; c F508del, c.1585-1G>A, G542X, N1303K or c.579+3A>G; d F508del, G542X, R553X or N1303K; e not pathogenic if not in cis with c.3067-72del6 (l.n.3199del6); f not pathogenic – see explanation in the text; g not pathogenic if not in cis with G1244V. a Mutations detected by two INNOLiPA_CFTR tests (legacy names): IL19 (INNOLiPA_CFTR19): F508del; G542X; N1303K; W1282X; G551D; 1717-1G>A; R553X; CFTRdele2,3(21kb); I507del; 711+1G>T; 3272-26A>G; 3905insT; R560T; 1898+1G>A; S1251N; I148T; 3199del6; 3120+1G>A; Q552X. IL17 (INNOLiPA_CFTR17_TnUpdate): 621+1G>T; 3849+10kbC>T; 2183AA>G; 394delTT; 2789+5G>A; R1162X; 3659delC; R117H; R334W; R347P; G85E; 1078delT; A455E; 2143delT; E60X; 2184delA; 711+5G>A; polymorphism 5T/7T/9T.
The Sol Genomics Network (SGN) is a clade-oriented database dedicated to the biology of the Solanaceae family, which includes a large number of closely related and many agronomically important species such as tomato, potato, tobacco, eggplant, pepper, and the ornamental Petunia hybrida. SGN is part of the International Solanaceae Initiative (SOL), which has the long-term goal of creating a network of resources and information to address key questions in plant adaptation and diversification. A key problem of the post-genomic era is the linking of the phenome to the genome, and SGN makes it possible to track and help discover new linkages of this kind. Data. Solanaceae and other Genomes: SGN is a home for Solanaceae and closely related genomes, such as selected Rubiaceae genomes (e.g., Coffea). The tomato, potato, pepper, and eggplant genomes are examples of genomes that are currently available. If you would like to include a Solanaceae genome that you sequenced in SGN, please contact us. ESTs: SGN houses EST collections for tomato, potato, pepper, eggplant and petunia and corresponding unigene builds. EST sequence data and cDNA clone resources greatly facilitate cloning strategies based on sequence similarity and the study of syntenic relationships between species in comparative mapping projects, and are essential for microarray technology. Unigenes: SGN assembles and publishes unigene builds from these EST sequences. For more information, see Unigene Methods. Maps and Markers: SGN has genetic maps and a searchable catalog of markers for tomato, potato, pepper, and eggplant. Tools: SGN makes available a wide range of web-based bioinformatics tools for use by anyone. Some of our most popular tools include BLAST searches, the SolCyc biochemical pathways database, a CAPS experiment designer, an Alignment Analyzer and browser for phylogenetic trees. The VIGS tool can help predict the properties of VIGS (Viral Induced Gene Silencing) constructs. 
The data in SGN have been submitted by many different research groups around the world. A web form is available to submit data for display on SGN. SGN community-driven gene and phenotype database: Simple web interfaces have been developed for the SGN user-community to submit, annotate, and curate the Solanaceae locus and phenotype databases. The goal is to share biological information, and have the experts in their field review existing data and submit information about their favorite genes and phenotypes. Resources in this dataset:Resource Title: Website Pointer to Sol Genomics Network. File Name: Web Page, url: https://solgenomics.net/ Specialized Search interfaces are provided for: Organisms/Taxon; Genes and Loci; Genomic sequences and annotations; QTLs, Mutants & Accessions, Traits; Transcripts: Unigenes, ESTs, & Libraries; Unigene families; Markers; Genomic clones; Images; Expression: Templates, Experiments, Platforms; Traits.
https://www.genomicsengland.co.uk/about-gecip/joining-research-community/
Contains tables related to long-reads sequencing data for 100,000 Genomes Project participants.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Initiated in 2003, the Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database providing centralized access to Rosaceae genomics, genetics and breeding data and analysis tools to facilitate basic, translational and applied Rosaceae research. GDR is supported by grants from the National Science Foundation Plant Genome Program (2003-2008), USDA National Institute of Food and Agriculture (NIFA) Specialty Crop Research Program (2009-2019), USDA NIFA National Research Support Project 10 (2014-2019), and the Washington Tree Fruit Research Commission (2008-2016), Clemson University, University of Florida and Washington State University. Resources in this dataset:Resource Title: Genome Database for Rosaceae - Download Data. File Name: Web Page, url: https://www.rosaceae.org/data/download This is the download page for the Genome Database for Rosaceae; datasets can be downloaded directly from this location.
THIS RESOURCE IS NO LONGER IN SERVICE, documented on August 20, 2019. The COG database has become a powerful tool in the field of comparative genomics. The construction of this database is based on sequence homologies of proteins from different completely sequenced genomes. Highly homologous proteins are assigned to clusters of orthologous groups. The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies. The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. Here is a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. 
The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.
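The coverage percentages quoted above follow directly from the protein counts in the description. A short Python check, using only those counts (the function name `coverage_percent` is illustrative):

```python
def coverage_percent(clustered, total):
    """Percentage of analyzed proteins that fall into a cluster."""
    return 100.0 * clustered / total


# Counts taken from the description above.
cog_coverage = coverage_percent(138_458, 185_505)  # proteins in COGs / all predicted proteins
kog_coverage = coverage_percent(59_838, 110_655)   # proteins in KOGs / all eukaryotic gene products

print(f"COG coverage: {cog_coverage:.1f}%")
print(f"KOG coverage: {kog_coverage:.1f}%")
```

Both values round to the stated figures of 75% and approximately 54%, which makes the roughly 20-point gap between prokaryotic and eukaryotic coverage explicit.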
The Stanley Online Genomics Database uses samples from the Stanley Medical Research Institute (SMRI) Brain Bank. These samples were processed and run on gene expression arrays by a variety of researchers in collaboration with the SMRI. These researchers have performed analyses on their respective studies using a range of analytic approaches. All of the genomic data have been aggregated in this online database, and a consistent set of analyses has been applied to each study. Additionally, a comprehensive set of cross-study analyses has been performed. A thorough collection of gene expression summaries is provided, inclusive of patient demographics, disease subclasses, regulated biological pathways, and functional classifications. Raw data are also available to download. The database is derived from two sets of brain samples, the Stanley Array collection and the Stanley Consortium collection. The Stanley Array collection contains 105 patients, and the Stanley Consortium collection contains 60 patients. Multiple genomic studies have been conducted using these brain samples. From these studies, twelve were selected for inclusion in the database on the basis of number of patients studied, genomic platform used, and data quality. The Consortium collection studies have fewer patients but more diversity in brain regions and array platforms, while the Array collection studies are more homogeneous. There are tradeoffs: the Consortium results will be more variable, but findings may be more broadly representative. 
The collections contain brain samples from subjects in four main groups: Bipolar, Schizophrenia, Depression, and Controls. Brain regions used in the studies include: Brodmann Area 6, Brodmann Area 8/9, Brodmann Area 10, Brodmann Area 46, Cerebellum. The 12 studies encompass a range of microarray platforms: Affymetrix HG-U95Av2, Affymetrix HG-U133A, Affymetrix HG-U133 2.0+, Codelink Human 20K, Agilent Human I, Custom cDNA. Publications based on any of the clinical or genomic data should credit the Stanley Medical Research Institute, as well as any individual SMRI collaborators whose data are being used. Publications which make use of analytic results/methods in the database should additionally cite Dr. Michael Elashoff. Registration is required to access the data.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
SoyBase is a repository for genetics, genomics and related data resources for soybean. It contains current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The SoyBase database was established in the 1990s as the USDA Soybean Genetics Database. Originally, it contained only genetic information about soybeans, such as genetic maps and information about the Mendelian genetics of soybean. In time, SoyBase was expanded to include molecular data regarding soybean genes and sequences as they became available. In 2010, the soybean genome sequence was published, and it and its supporting gene sequences have been integrated into the SoyBase sequence browser. SoyBase genetic maps were used in the assembly of both the Williams 82 2010 assembly (Wm82.a1.v1) and the newest genome assembly (Wm82.a2.v1). SoyBase also incorporates information about mutant and other soybean genetic stocks and serves as a contact point for ordering strains from those populations. As association analyses continue due to various re-sequencing efforts, SoyBase will also incorporate those data into the soybean genome browser as they become available. Gene expression patterns are also available at SoyBase through the SoyBase expression pages and the Soybean Gene Atlas. Other expression, transcriptome and methylomic data sets have been and continue to be incorporated into the SoyBase genome browser. Project No: 3625-21000-062-00D Accession No: 0425040 Resources in this dataset: Resource Title: SoyBase, the USDA-ARS soybean genetics and genomics database web site. File Name: Web Page, url: https://soybase.org
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset comprises microbial metagenomics sequencing reads of seawater collected at 48 reef sites across the Great Barrier Reef. Samples were collected during four Long Term Monitoring Program (LTMP) field trips between November 2019 and July 2020, combining water chemistry data, LTMP field surveys and microbial metagenomics data. This data collection was a major part of the QRCIF IMOS GBR microbial genomic database project, which aims to generate a comprehensive open access repository of microbial genomic data from across the region. Seawater was collected in quadruplicate at each reef site, either by SCUBA or using Niskin bottles; 5 L of seawater was pre-filtered using a 5 µm filter and applied to a 0.22 µm Sterivex filter, then snap frozen and stored at -20°C in preparation for DNA extraction. DNA was extracted from the Sterivex filters using phenol:chloroform:isoamyl alcohol extraction, ethanol precipitation and cleanup using the Zymo Clean and Concentrator® kit before submission for sequencing at the Australian Centre for Ecogenomics sequencing facility, Illumina. The data are presented as Illumina paired-end shotgun metagenomics sequencing runs, in fastq format, generated by Microba Life Sciences, Brisbane, QLD, Australia. Each downloadable archive contains forward and reverse reads for all replicate samples collected at that particular site. Water quality particulate and dissolved nutrient data were generated as previously described (https://doi.org/10.25845/5c09b551f315b) from water samples collected simultaneously at each reef site.
Zip files are available through the spatial layer under each site's 'illumina.seawater.zip'. Please note that these are large downloads (between 6 and 14 GB).
The Oomycete Genomics Database (OGD) is a publicly accessible resource that includes functional assays and expression data, combined with transcript and genomic analysis and annotation. OGD builds upon data available from the Phytophthora Genome Consortium, the Syngenta Phytophthora Consortium and the Phytophthora Functional Genomics Database. Data are analyzed and annotated using NCGR's XGI System. The knowledge gained from these studies provides significant insight into key molecular processes regulating an economically important pathosystem and will provide novel tools for improvement of disease resistance in crop plants.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
MaizeGDB is a community-oriented, long-term, federally funded informatics service to researchers focused on the crop plant and model organism Zea mays. Genomic, genetic, sequence, germplasm, gene product, metabolic pathway, functional characterization, literature reference, diversity, and expression data are among the datatypes stored at MaizeGDB. The project's website provides custom interfaces that let researchers browse data and search for specific information matching explicit criteria. First released in 1991 with the name MaizeDB, the Maize Genetics and Genomics Database, now MaizeGDB (since 2003), is funded, developed, and hosted by the USDA-ARS located at Ames, Iowa. Resources in this dataset: Resource Title: MaizeGDB, the community database for maize genetics and genomics. File Name: Web Page, url: https://maizegdb.org/ Established as a USDA-ARS resource in 2003, MaizeGDB supplies data and resources related to maize.