THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of Life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways. The help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project, called e-Protein. The project brings together similar databases at three sites: Imperial College London, University College London and the European Bioinformatics Institute. e-Protein's mission statement is: To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies. The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG. The following eukaryotic genomes are incorporated: Anopheles gambiae, protein sequences from the mosquito genome; Arabidopsis thaliana, protein sequences from the Arabidopsis genome; Caenorhabditis briggsae, protein sequences from the C. briggsae genome; Caenorhabditis elegans, protein sequences from the worm genome; Ciona intestinalis, protein sequences from the sea squirt genome; Danio rerio, protein sequences from the zebrafish genome; Drosophila melanogaster, protein sequences from the fruitfly genome; Encephalitozoon cuniculi, protein sequences from the E. cuniculi genome; Fugu rubripes, protein sequences from the pufferfish genome; Guillardia theta, protein sequences from the G. theta genome; Homo sapiens, protein sequences from the human genome; Mus musculus, protein sequences from the mouse genome; Neurospora crassa, protein sequences from the N. crassa genome; Oryza sativa, protein sequences from the rice genome; Plasmodium falciparum, protein sequences from the P. falciparum genome; Rattus norvegicus, protein sequences from the rat genome; Saccharomyces cerevisiae, protein sequences from the yeast genome; Schizosaccharomyces pombe, protein sequences from the yeast genome.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of all genomes and accession numbers used for this analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Since the genomics age began in the 1990s with the sequencing of a couple of model bacterial and eukaryotic genomes, researchers have sought to sequence many species across ecosystems to find the commonalities and differences in sequence that help explain phenotypes. This gave rise to the field of functional genomics and to the capability to annotate a genome automatically, using sequence homology as a probe. This work sought to provide a database of all genes in the human genome at a granular level by categorizing the human genetic repertoire chromosome by chromosome. Specifically, in-house MATLAB genome analysis software was used to parse the annotated genome sequence file of each chromosome of the human genome. The variables output for each gene include gene name, gene function, promoter sequence and gene sequence. Such information, when aggregated at the level of chromosomes and of the entire genome, should inform further studies seeking to unravel the links between gene sequence, gene expression, cell differentiation, and organismal developmental trajectories and phenotypes.
A database of oncogenes and tumor suppressor genes. Users can search by genes, chromosomes, and keywords. The completion of the human genome sequence allows genes of interest to be rapidly identified and analyzed through computational approaches. The available annotations of known tumor-related genes, including physical characterization and functional domains, can thus be used to study the role of genes involved in carcinogenesis. The tumor-associated gene (TAG) database was designed to utilize information from well-characterized oncogenes and tumor suppressor genes to facilitate cancer research. All target genes were identified through a text-mining approach from the PubMed database. A semi-automatic information retrieval engine was built to collect specific information on these target genes from various resources and store it in the TAG database. At the current stage, 519 TAGs have been collected, including 198 oncogenes, 170 tumor suppressor genes, and 151 genes related to oncogenesis. Information in the TAG database can be browsed through web interfaces that support searching for genes by chromosome or by keyword. The “consensus domain analysis” tool identifies conserved protein domains and GO terms among selected TAG genes. In addition, the “oncogenic domain analysis” tool can analyze the oncogenic potential of any user-provided protein based on a weighted term frequency table calculated from the TAG proteins. This study was supported by a grant from the National Research Program for Genomic Medicine (NRPGM) and by personnel from the Bioinformatics Center of the Center for Biotechnology and Biosciences at National Cheng Kung University, Taiwan.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
inphared.pl (INfrastructure for a PHAge REference Database) is a Perl script that downloads and filters phage genomes from GenBank to provide the most complete phage genome database possible. Useful information, including viral taxonomy and bacterial host data, is extracted from the GenBank files and provided in a summary table. Genes are called on the genomes using Prokka, and this output is used to gather metrics that are summarised in the output files, as well as to produce input files for vConTACT2. The data provided cover all genomes up to Jan 2021. They can be downloaded so that users do not have to repeat the process of consistent gene calling on existing genomes. The folder GenomesDB contains subfolders, each named after the accession number of a phage. Each subfolder contains the re-called genes (.ffn and .faa), the complete genome (*fna) and the GenBank file without any annotation (*gbf). See https://github.com/RyanCook94/
THIS RESOURCE IS NO LONGER IN SERVICE, documented on 8/12/13. An expanded version of the Alternative Splicing Annotation Project (ASAP) database with a new interface and integration of comparative features using UCSC BLASTZ multiple alignments. It supports 9 vertebrate species, 4 insects, and nematodes, and provides extensive alternative splicing analysis together with the corresponding splice variants. For human alternative splicing data, newly added EST libraries were classified and merged into the previous tissue and cancer classification, and the lists of tissue-specific and cancer-specific (versus normal) alternatively spliced genes were recalculated and updated. The developers have created novel databases of orthologous exons and introns, and of their splice variants, based on multiple alignment among several species. These orthologous exon and intron databases can give more comprehensive homologous gene information than methods based on protein similarity. Furthermore, splice junction and exon identity among species can be valuable resources for elucidating species-specific genes. The ASAP II database can be easily integrated with pygr (unpublished, the Python Graph Database Framework for Bioinformatics) and its features such as graph query and multi-genome alignment query. ASAP II can be searched by several different criteria such as gene symbol, gene name and ID (UniGene, GenBank etc.). The web interface provides 7 different kinds of views: (I) user query, UniGene annotation, orthologous genes and genome browsers; (II) genome alignment; (III) exons and orthologous exons; (IV) introns and orthologous introns; (V) alternative splicing; (VI) isoform and protein sequences; (VII) tissue and cancer vs. normal specificity. ASAP II shows genome alignments of isoforms, exons, and introns in a UCSC-like genome browser. All alternative splicing relationships, with supporting evidence, types of alternative splicing patterns, and inclusion rates for skipped exons, are listed in separate tables.
Users can also search human data for tissue- and cancer-specific splice forms at the bottom of the gene summary page. The p-values for tissue specificity are reported as log-odds (LOD) scores, and results with LOD >= 3 and at least 3 EST sequences are highlighted.
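The highlighting rule described above (LOD >= 3 and at least 3 EST sequences) can be illustrated with a minimal sketch. The record layout, the field names `lod` and `est_count`, and the function name are assumptions made for illustration only; they are not taken from the ASAP II schema or from pygr.

```python
def highlighted_splice_forms(records, min_lod=3.0, min_ests=3):
    """Return the records that meet both highlighting thresholds.

    Each record is a dict with hypothetical keys:
      "lod"       - tissue-specificity log-odds score
      "est_count" - number of supporting EST sequences
    """
    return [
        record
        for record in records
        if record["lod"] >= min_lod and record["est_count"] >= min_ests
    ]


# Hypothetical example data, not drawn from the database.
example = [
    {"gene": "geneA", "tissue": "brain", "lod": 4.2, "est_count": 5},
    {"gene": "geneB", "tissue": "liver", "lod": 3.5, "est_count": 2},
    {"gene": "geneC", "tissue": "lung", "lod": 2.9, "est_count": 8},
]
selected = highlighted_splice_forms(example)
```

Both conditions must hold, so a high LOD score supported by only two ESTs is not highlighted.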
The Daphnia Genomics Consortium (DGC) is an international network of investigators committed to establishing the freshwater crustacean Daphnia as a model system for ecology, evolution and the environmental sciences. Along with research activities, the DGC is: (1) coordinating efforts towards developing the Daphnia genomic toolbox, which will then be available for use by the general community; (2) facilitating collaborative cross-disciplinary investigations; (3) developing bioinformatic strategies for organizing the rapidly growing genome database; and (4) exploring emerging technologies to improve high throughput analyses of molecular and ecological samples. If we are to succeed in creating a new model system for modern life-sciences research, it will need to be a community-wide effort. Research activities of the DGC are primarily focused on creating genomic tools and information. When completed, the current projects will offer a first view of the Daphnia genome's topography, including regions of high and low recombination, the distribution of transposable, repetitive and regulatory elements, and the size and structure of genes and of their neighborhoods. This information is crucial in formulating testable hypotheses relating genetics and demographics to the evolutionary potential or constraints of natural populations. Projects aiming to catalogue identifiable genes together with their functions are also underway, along with robust methods to verify these findings. Finally, these tools are being tested by exploring their uses in key ecological and toxicological investigations. Each project benefits from the leadership and expertise of many individuals. For further details, begin by contacting the project directors. The DGC consists of biologists from a broad spectrum of subdisciplines, including limnology, ecotoxicology, quantitative and population genetics, systematics, molecular biology and evolution, developmental biology, genomics and bioinformatics.
In many regards, the rapid early success of the consortium results from its grass-roots origin, which promotes an international composition, a cooperative model, and significant scientific breadth. We hold to this approach in building this network and encourage more people to participate. At the same time, the DGC is structured to reach specific goals effectively. The consortium includes an advisory board (composed of experts from the various subdisciplines), whose responsibility is to act as the research community's agent in guiding the development of Daphnia genomic resources. The advisors communicate directly with DGC members, who are either contributing genomic tools or actively seeking funds for this function. The consortium's main body (given the widespread interest in applying genomic tools in environmental studies) is made up of the affiliates, who make use of these tools for their research and who are soliciting support.
Database of curated links to molecular resources, tools and databases selected on the basis of recommendations from bioinformatics experts in the field. This resource relies on input from its community of bioinformatics users for suggestions. Since 2003, it has also listed all links contained in the NAR Webserver issue. The different types of information available in this portal:
* Computer Related: This category contains links to resources relating to programming languages often used in bioinformatics. Other tools of the trade, such as web development and database resources, are also included here.
* Sequence Comparison: Tools and resources for the comparison of sequences including sequence similarity searching, alignment tools, and general comparative genomics resources.
* DNA: This category contains links to useful resources for DNA sequence analyses such as tools for comparative sequence analysis and sequence assembly. Links to programs for sequence manipulation, primer design, and sequence retrieval and submission are also listed here.
* Education: Links to information about the techniques, materials, people, places, and events of the greater bioinformatics community. Included are current news headlines, literature sources, educational material and links to bioinformatics courses and workshops.
* Expression: Links to tools for predicting the expression, alternative splicing, and regulation of a gene sequence are found here. This section also contains links to databases, methods, and analysis tools for protein expression, SAGE, EST, and microarray data.
* Human Genome: This section contains links to draft annotations of the human genome in addition to resources for sequence polymorphisms and genomics. Also included are links related to ethical discussions surrounding the study of the human genome.
* Literature: Links to resources related to published literature, including tools to search for articles and through literature abstracts. Additional text mining resources, open access resources, and literature goldmines are also listed.
* Model Organisms: Included in this category are links to resources for various model organisms ranging from mammals to microbes. These include databases and tools for genome scale analyses.
* Other Molecules: Bioinformatics tools related to molecules other than DNA, RNA, and protein. This category will include resources for the bioinformatics of small molecules as well as for other biopolymers including carbohydrates and metabolites.
* Protein: This category contains links to useful resources for protein sequence and structure analyses. Resources for phylogenetic analyses, prediction of protein features, and analyses of interactions are also found here.
* RNA: Resources include links to sequence retrieval programs, structure prediction and visualization tools, motif search programs, and information on various functional RNAs.
This is the HQSNP DB (high-quality SNP database) developed by the CHG bioinformatics group. A high-quality SNP is defined as a SNP that has allele frequency or genotyping data. The majority of the HQSNPs come from HapMap; others come from JSNP (Japanese SNP database), TSC (The SNP Consortium), Affymetrix 120K SNP, and Perlegen SNP. There are four kinds of SNP search you can do:
* Get SNPs by dbSNP rs#: Choose this search if you have already selected a list of SNPs and you just want to get the SNP information. The program will generate an Excel file containing the SNP flanking sequence, variation, quality, function, etc. In the Excel file, there are 10 highlighted fields. You can send only those highlighted fields to Illumina to get a SNP pre-score. (The same fields are presented in other types of searches as well.)
* Get gene SNPs by gene names: Choose this search if you have a list of gene names and you want to get the SNP information in these genes. The gene name can be an official gene symbol, Ensembl gene ID, RefSeq accession ID, LocusLink number, etc.
* Get gene SNPs by genome regions: Choose this search if you have a list of genome regions and you want to get all gene SNP information in these regions. The software will find all the Ensembl genes in the regions and find the SNPs associated with each Ensembl gene.
* Get genome scan SNPs by genome regions: Choose this search if you have a list of genome regions and you want to get evenly spaced SNPs in these regions.
A SNP selection tool (SNPselector) was built upon HQSNP. It took a SNP ID list, gene name list, or genome region list as input and searched for SNPs for a genome scan or gene association study. It could take an optional ABI SNP file (exported from the ABI SNP search web page) as input for checking whether a candidate SNP was available from ABI. It could also take an optional Illumina SNP pre-score file as input to select SNPs for an Illumina SNP assay. It generated results sorted by tag SNP in LD block, SNP quality, SNP function, SNP regulatory potential, and SNP mutation risk. SNPselector is now retired from public use (as of September 30, 2010).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Agricultural crop breeding programs, particularly at the national level, typically consist of a core panel of elite breeding cultivars alongside a number of local landrace varieties (or other endemic cultivars) that provide additional sources of phenotypic and genomic variation or contribute as experimental materials (e.g., in GWAS studies). Three issues commonly arise. First, focusing primarily on core development accessions may mean that the potential contributions of landraces or other secondary accessions may be overlooked. Second, elite cultivars may accumulate deleterious alleles away from nontarget loci due to the strong effects of artificial selection. Finally, a tendency to focus solely on SNP-based methods may cause incomplete or erroneous identification of functional variants. In practice, integration of local breeding programs with findings from global database projects may be challenging. First, local GWAS experiments may only indicate useful functional variants according to the diversity of the experimental panel, while other potentially useful loci—identifiable at a global level—may remain undiscovered. Second, large-scale experiments such as GWAS may prove prohibitively costly or logistically challenging for some agencies. Here, we present a fully automated bioinformatics pipeline (riceExplorer) that can easily integrate local breeding program sequence data with international database resources, without relying on any phenotypic experimental procedure. It identifies associated functional haplotypes that may prove more robust in determining the genotypic determinants of desirable crop phenotypes. In brief, riceExplorer evaluates a global crop database (IRRI 3000 Rice Genomes) to identify haplotypes that are associated with extreme phenotypic variation at the global level and recorded in the database. 
It then examines which potentially useful variants are present in the local crop panel, before distinguishing between those that are already incorporated into the elite breeding accessions and those only found among secondary varieties (e.g., landraces). Results highlight the effectiveness of our pipeline, identifying potentially useful functional haplotypes across the genome that are absent from elite cultivars and found among landraces and other secondary varieties in our breeding program. riceExplorer can automatically conduct a full genome analysis and produces annotated graphical output of chromosomal maps, potential global diversity sources, and summary tables.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is an expertly curated database of literature-derived functional information for the model organism budding yeast, Saccharomyces cerevisiae. SGD constantly strives to synergize new types of experimental data and bioinformatics predictions with existing data, and to organize them into a comprehensive and up-to-date information resource. The primary mission of SGD is to facilitate research into the biology of yeast and to provide this wealth of information to advance, in many ways, research on other organisms, even those as evolutionarily distant as humans. To build such a bridge between biological kingdoms, SGD is curating data regarding yeast-human complementation, in which a human gene can successfully replace the function of a yeast gene, and/or vice versa. These data are manually curated from published literature, made available for download, and incorporated into a variety of analysis tools provided by SGD.
The Sol Genomics Network (SGN) is a clade-oriented database dedicated to the biology of the Solanaceae family, which includes a large number of closely related species, many of them agronomically important, such as tomato, potato, tobacco, eggplant, pepper, and the ornamental Petunia hybrida. SGN is part of the International Solanaceae Initiative (SOL), which has the long-term goal of creating a network of resources and information to address key questions in plant adaptation and diversification. A key problem of the post-genomic era is the linking of the phenome to the genome, and SGN helps track and discover new linkages of this kind. Data:
Solanaceae and other Genomes: SGN is a home for Solanaceae and closely related genomes, such as selected Rubiaceae genomes (e.g., Coffea). The tomato, potato, pepper, and eggplant genomes are examples of genomes that are currently available. If you would like to include a Solanaceae genome that you sequenced in SGN, please contact us.
ESTs: SGN houses EST collections for tomato, potato, pepper, eggplant and petunia and corresponding unigene builds. EST sequence data and cDNA clone resources greatly facilitate cloning strategies based on sequence similarity and the study of syntenic relationships between species in comparative mapping projects, and are essential for microarray technology.
Unigenes: SGN assembles and publishes unigene builds from these EST sequences. For more information, see Unigene Methods.
Maps and Markers: SGN has genetic maps and a searchable catalog of markers for tomato, potato, pepper, and eggplant.
Tools: SGN makes available a wide range of web-based bioinformatics tools for use by anyone, listed here. Some of our most popular tools include BLAST searches, the SolCyc biochemical pathways database, a CAPS experiment designer, an Alignment Analyzer and a browser for phylogenetic trees. The VIGS tool can help predict the properties of VIGS (Viral Induced Gene Silencing) constructs.
The data in SGN have been submitted by many different research groups around the world. A web form is available to submit data for display on SGN. SGN community-driven gene and phenotype database: Simple web interfaces have been developed for the SGN user community to submit, annotate, and curate the Solanaceae locus and phenotype databases. The goal is to share biological information, and to have the experts in each field review existing data and submit information about their favorite genes and phenotypes. Resources in this dataset: Resource Title: Website Pointer to Sol Genomics Network. File Name: Web Page, url: https://solgenomics.net/ Specialized Search interfaces are provided for: Organisms/Taxon; Genes and Loci; Genomic sequences and annotations; QTLs, Mutants & Accessions, Traits; Transcripts: Unigenes, ESTs, & Libraries; Unigene families; Markers; Genomic clones; Images; Expression: Templates, Experiments, Platforms; Traits.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents the catalogue of genes in the human genome obtained by parsing the annotated human genome sequence available on GenBank. Using in-house MATLAB genome analysis software, the contents of the annotated genome sequence file of Homo sapiens are parsed into the gene identifier, gene function description, and gene sequence of each gene in the human genome. Overall, the resource should be useful for informing human genetics research.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This zipped tarball (.tar.gz) contains a pre-built database for Bacannot (https://github.com/fmalmeida/bacannot).
Files are in the naming convention YEAR_MONTH_DAY.
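Given the YEAR_MONTH_DAY naming convention, the most recent archive can be picked programmatically. The following is only a sketch: the example filenames, the `.tar.gz` suffix placement, and the function names are hypothetical, and the pattern assumes a four-digit year followed by one- or two-digit month and day.

```python
import re
from datetime import date

# Assumed pattern: four-digit year, then month and day, separated by underscores.
DATE_PATTERN = re.compile(r"(\d{4})_(\d{1,2})_(\d{1,2})")


def release_date(filename):
    """Return the date encoded in a filename, or None if there is no valid date."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None


def latest_release(filenames):
    """Return the filename carrying the most recent date, or None if none is dated."""
    dated = [(release_date(name), name) for name in filenames]
    dated = [(when, name) for when, name in dated if when is not None]
    return max(dated)[1] if dated else None


# Hypothetical filenames used purely for illustration.
candidates = ["2022_12_01.tar.gz", "2023_2_15.tar.gz", "notes.txt"]
newest = latest_release(candidates)
```

Comparing parsed dates instead of raw strings avoids mis-ordering names whose month or day is not zero-padded.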
NOTE: This dataset is no longer publicly available. BBGD (http://bioinformatics.towson.edu/BBGD/) was developed as a database for blueberry genomics. BBGD is both a sequence and a gene expression database. It stores both EST and microarray data and allows scientists to correlate expression profiles with gene function. BBGD is a public online database. "Presently, the main focus of the database is the identification of genes in blueberry that are significantly induced or suppressed after low temperature exposure." To gain a better understanding of changes in gene expression associated with cold acclimation in blueberry, the Rowland laboratory (USDA-ARS, Beltsville, MD) has undertaken a genomics approach based on the analysis of Expressed Sequence Tags (ESTs). Initially, two standard cDNA libraries were constructed using RNA from cold-acclimated and non-acclimated floral buds of the blueberry cultivar ‘Bluecrop’ (Vaccinium corymbosum L.), and about 1200 5’-end ESTs were generated from each of the libraries. About 100 3’-end ESTs were generated from the cold-acclimated library as well. The Blueberry EST database contains EST sequences from a number of blueberry libraries, including cold-acclimated and non-acclimated libraries. It also includes forward and reverse subtractive libraries. You can query the sequence database by clone ID, accession number or gene (clone) name. Alternatively, you can get a list (in tabular format) of all the clones in a particular library by clicking on the library name in the left-side navigation bar. Attribution for photo: D2601-1 - Blueberry plant: Copyright free, public domain photo by Mark Ehlenfeldt
Community model organism database for the laboratory mouse and authoritative source for phenotype and functional annotations of mouse genes. MGD includes a complete catalog of mouse genes and genome features with integrated access to genetic, genomic and phenotypic information, all serving to further the use of the mouse as a model system for studying human biology and disease. MGD is a major component of Mouse Genome Informatics. It contains standardized descriptions of mouse phenotypes, associations between mouse models and human genetic diseases, extensive integration of DNA and protein sequence data, and a normalized representation of genome and genome variant information. Data are obtained and integrated via manual curation of the biomedical literature, direct contributions from individual investigators and downloads from major informatics resource centers. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A commensal of the gut, the Gram-negative, rod-shaped Escherichia coli is perhaps the most familiar of the common bacteria. Indeed, from a pathogen involved in food poisoning to a biotechnology workhorse producing useful products, the facultatively anaerobic bacterium E. coli has received intense research scrutiny into varied aspects of its metabolism and genetic repertoire. Used as a model organism for understanding fundamental biological questions and as a microbial chassis for biotechnology purposes, E. coli continues to withhold secrets important to furthering our understanding of biology or developing better biotechnology platforms. This work reports the gene database of E. coli K-12 obtained by parsing the annotated genome sequence of the bacterium from GenBank. Comprising gene name, gene function descriptor, and gene sequence, the gene database of E. coli K-12 should find use in a variety of molecular cloning and genome editing workflows seeking to understand functional genomics or to exploit the rich genetic repertoire of the organism for biotechnology purposes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains 13,850 microbial genomes assembled from various fermented foods and associated curated metadata.
We have also clustered the database at 95% identity for creating a species-representative database, and at 99% identity for creating a "strain"-representative database, since we hypothesize that many bioactivities and phenotypes for fermented food microbes are important at the strain-level.
This GitHub repository documents how publicly available genomes and metagenome-assembled genomes were sourced and curated. This GitHub repository documents how the associated metadata was curated.
This database largely pulls from existing genome resources, and we curated this database specifically for fermented foods. If you use this database, please cite the following genome databases/resources:
We are incredibly grateful for these groups and countless others taking the time to make their data publicly available. Included in the metadata is the original DOI and study link from which the genome was generated, in addition to if they were collated into one of the above two larger databases. If you specifically use/analyze a subset of genomes, please cite those studies to credit those that generate data and make it publicly available.
We have provided the entire set of 13,850 microbial genomes in a single tar archive for download. We have also provided tar archives of the genomes clustered at 95% and 99% identity. If you download the entire database but only want to use a subset of it, such as the species-representative (clustered at 95% ANI) or "strain"-representative (clustered at 99% ANI) genomes, you can use our helper script to subset either the representative genomes or a custom list of genomes that you provide.
usage: subset_genomes.py [-h] [--rep-column {rep_95id,rep_99id}]
                         [--id-column ID_COLUMN] [--dry-run]
                         [--genome-list GENOME_LIST]
                         metadata_tsv all_genomes_dir output_dir
Subset representative genomes (species/strain) from a genome set using
metadata.
positional arguments:
  metadata_tsv          Path to metadata TSV file
  all_genomes_dir       Directory containing all .fa genome files
  output_dir            Directory to copy representative genomes to
optional arguments:
  -h, --help            show this help message and exit
  --rep-column {rep_95id,rep_99id}
                        Column in metadata to use for representatives (e.g.,
                        rep_95id or rep_99id)
  --id-column ID_COLUMN
                        Column in metadata with genome file IDs (default:
                        mag_id)
  --dry-run             Only print what would be copied, don't actually copy
  --genome-list GENOME_LIST
                        Optional: Path to file with list of genome IDs or
                        filenames to subset (one per line)
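For readers who want to see the idea behind such a subsetting step, here is a minimal sketch. It is not the actual subset_genomes.py. It assumes, without confirmation from the source, that each value in the representative column (for example rep_95id) is the ID of the representative genome for that row's cluster, and that genome files are named `<id>.fa`. The function names are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def select_genome_ids(metadata_tsv, rep_column=None, id_column="mag_id", genome_ids=None):
    """Return the genome IDs to keep, in metadata order.

    ASSUMPTION: the rep column stores the ID of each cluster's representative,
    so the distinct values of that column are the representative genomes.
    """
    with open(metadata_tsv, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    known = [row[id_column] for row in rows]
    if genome_ids is not None:
        # Accept bare IDs or filenames ending in ".fa".
        wanted = {g[:-3] if g.endswith(".fa") else g for g in genome_ids}
        return [g for g in known if g in wanted]
    if rep_column is None:
        raise ValueError("provide rep_column or genome_ids")
    representatives = {row[rep_column] for row in rows if row[rep_column]}
    return [g for g in known if g in representatives]


def copy_genomes(ids, all_genomes_dir, output_dir, dry_run=False):
    """Copy <id>.fa files to output_dir; return the names copied (or that would be)."""
    source_dir, target_dir = Path(all_genomes_dir), Path(output_dir)
    if not dry_run:
        target_dir.mkdir(parents=True, exist_ok=True)
    handled = []
    for genome_id in ids:
        source = source_dir / f"{genome_id}.fa"
        if not source.is_file():
            continue  # skip IDs with no matching genome file
        if dry_run:
            print(f"would copy {source} -> {target_dir / source.name}")
        else:
            shutil.copy2(source, target_dir / source.name)
        handled.append(source.name)
    return handled
```

A dry run returns the same list as a real run but leaves the output directory untouched, which mirrors the intent of the `--dry-run` flag in the usage text.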
We have uploaded the "strain-representative" set of ~4300 genomes to KBase as a public narrative.
This exercise is an adaptation of the Annotation Lesson by Rosenwald et al. It introduces the use of bioinformatics tools to extract information from genome databases. It is a basic lesson on genome annotation databases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A fasta file containing all available genomes from the JGI genome database that are used for the default PICRUSt2 reference database.