THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of Life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways. The help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project, called e-Protein. The project brings together similar databases at three sites: Imperial College London, University College London and the European Bioinformatics Institute. e-Protein's mission statement is "To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies." The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG. The following eukaryotic genomes are incorporated: Anopheles gambiae, protein sequences from the mosquito genome; Arabidopsis thaliana, protein sequences from the Arabidopsis genome; Caenorhabditis briggsae, protein sequences from the C. briggsae genome; Caenorhabditis elegans, protein sequences from the worm genome; Ciona intestinalis, protein sequences from the sea squirt genome; Danio rerio, protein sequences from the zebrafish genome; Drosophila melanogaster, protein sequences from the fruitfly genome; Encephalitozoon cuniculi, protein sequences from the E. cuniculi genome; Fugu rubripes, protein sequences from the pufferfish genome; Guillardia theta, protein sequences from the G. theta genome; Homo sapiens, protein sequences from the human genome; Mus musculus, protein sequences from the mouse genome; Neurospora crassa, protein sequences from the N. crassa genome; Oryza sativa, protein sequences from the rice genome; Plasmodium falciparum, protein sequences from the P. falciparum genome; Rattus norvegicus, protein sequences from the rat genome; Saccharomyces cerevisiae, protein sequences from the yeast genome; Schizosaccharomyces pombe, protein sequences from the yeast genome.
The Oomycete Genomics Database (OGD) is a publicly accessible resource that includes functional assays and expression data, combined with transcript and genomic analysis and annotation. OGD builds upon data available from the Phytophthora Genome Consortium, Syngenta Phytophthora Consortium and the Phytophthora Functional Genomics Database. Data are analyzed and annotated using NCGR's XGI System. The knowledge gained from these studies provides significant insight into key molecular processes regulating an economically important pathosystem and will provide novel tools for improvement of disease resistance in crop plants.
SoyBase is a repository for genetics, genomics and related data resources for soybean. It contains current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The SoyBase database was established in the 1990s as the USDA Soybean Genetics Database. Originally, it contained only genetic information about soybeans, such as genetic maps and information about the Mendelian genetics of soybean. In time, SoyBase was expanded to include molecular data regarding soybean genes and sequences as they became available. In 2010, the soybean genome sequence was published, and it and supporting gene sequences have been integrated into the SoyBase sequence browser. SoyBase genetic maps were used in the assembly of both the Williams 82 2010 assembly (Wm82.a1.v1) and the newest genome assembly (Wm82.a2.v1). SoyBase also incorporates information about mutant and other soybean genetic stocks and serves as a contact point for ordering strains from those populations. As association analyses continue due to various re-sequencing efforts, SoyBase will also incorporate those data into the soybean genome browser as they become available. Gene expression patterns are also available at SoyBase through the SoyBase expression pages and the Soybean Gene Atlas. Other expression, transcriptome and methylomic data sets also have been and continue to be incorporated into the SoyBase genome browser. Project No: 3625-21000-062-00D. Accession No: 0425040. Resources in this dataset: Resource Title: SoyBase, the USDA-ARS soybean genetics and genomics database web site. File Name: Web Page, url: https://soybase.org
Manually curated database of all conditions with known genetic causes, focusing on medically significant genetic data with available interventions. Includes gene symbol, conditions, allelic conditions, inheritance, age at which interventions are indicated, clinical categorization, and general description of interventions and their rationale. Contents are intended to describe types of interventions that might be considered. Includes only single gene alterations and does not include genetic associations or susceptibility factors related to more complex diseases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MaizeMine is the data mining resource of the Maize Genetics and Genome Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates, and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. With the ability to save query outputs as lists that can be input to new queries, MaizeMine provides limitless possibilities for data integration and meta-analysis.
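The set operations offered by the List tool can be illustrated with a minimal sketch. It uses plain Python sets rather than MaizeMine itself; the function name `combine_lists` and the gene identifiers are hypothetical placeholders, not values taken from the database.

```python
def combine_lists(list_a, list_b):
    """Return the union, intersection, and differences of two identifier lists."""
    a, b = set(list_a), set(list_b)
    return {
        "union": sorted(a | b),          # identifiers in either list
        "intersection": sorted(a & b),   # identifiers in both lists
        "only_a": sorted(a - b),         # identifiers unique to the first list
        "only_b": sorted(b - a),         # identifiers unique to the second list
    }


# Hypothetical identifiers, for illustration only
first_list = ["gene_001", "gene_002", "gene_003"]
second_list = ["gene_002", "gene_003", "gene_004"]

result = combine_lists(first_list, second_list)
print(result["intersection"])
```

In this sketch the intersection could then be saved as a new list and fed into a further query, which mirrors the workflow described above of using query outputs as inputs to new queries.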
The Stanley Online Genomics Database uses samples from the Stanley Medical Research Institute (SMRI) Brain Bank. These samples were processed and run on gene expression arrays by a variety of researchers in collaboration with the SMRI. These researchers have performed analyses on their respective studies using a range of analytic approaches. All of the genomic data have been aggregated in this online database, and a consistent set of analyses has been applied to each study. Additionally, a comprehensive set of cross-study analyses has been performed. A thorough collection of gene expression summaries is provided, inclusive of patient demographics, disease subclasses, regulated biological pathways, and functional classifications. Raw data are also available to download. The database is derived from two sets of brain samples, the Stanley Array collection and the Stanley Consortium collection. The Stanley Array collection contains 105 patients, and the Stanley Consortium collection contains 60 patients. Multiple genomic studies have been conducted using these brain samples. From these studies, twelve were selected for inclusion in the database on the basis of number of patients studied, genomic platform used, and data quality. The Consortium collection studies have fewer patients but more diversity in brain regions and array platforms, while the Array collection studies are more homogeneous. This involves a tradeoff: the Consortium results will be more variable, but findings may be more broadly representative.
The collections contain brain samples from subjects in four main groups: Bipolar, Schizophrenia, Depression, and Controls. Brain regions used in the studies include Brodmann Area 6, Brodmann Area 8/9, Brodmann Area 10, Brodmann Area 46, and Cerebellum. The 12 studies encompass a range of microarray platforms: Affymetrix HG-U95Av2, Affymetrix HG-U133A, Affymetrix HG-U133 2.0+, Codelink Human 20K, Agilent Human I, and Custom cDNA. Publications based on any of the clinical or genomic data should credit the Stanley Medical Research Institute, as well as any individual SMRI collaborators whose data are being used. Publications which make use of analytic results or methods in the database should additionally cite Dr. Michael Elashoff. Registration is required to access the data.
MaizeGDB is a community-oriented, long-term, federally funded informatics service to researchers focused on the crop plant and model organism Zea mays. Genomic, genetic, sequence, germplasm, gene product, metabolic pathway, functional characterization, literature reference, diversity, and expression data are among the datatypes stored at MaizeGDB. The project's website offers custom interfaces enabling researchers to browse data and to seek out specific information matching explicit search criteria. First released in 1991 with the name MaizeDB, the Maize Genetics and Genomics Database, now MaizeGDB (since 2003), is funded, developed, and hosted by the USDA-ARS located at Ames, Iowa. Established as a USDA-ARS resource in 2003, MaizeGDB supplies data and resources related to maize. Resources in this dataset: Resource Title: MaizeGDB, the community database for maize genetics and genomics. File Name: Web Page, url: https://maizegdb.org/
Collection of curated structural variation in the human genome. Catalogue of human genomic structural variation identified in healthy control samples, for studies aiming to correlate genomic variation with phenotypic data. It is continuously updated with new data from peer-reviewed research studies. The database no longer accepts direct submissions of data because it is now part of a collaboration with two archival CNV databases, DGVa at EBI and dbVar at NCBI. Instead, DGV obtains its datasets from DGVa (short for DGV archive). This ensures that the three databases are synchronized and allows for official accessioning of variants.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Not only is cacao the basic ingredient in the world’s favorite confection, chocolate, but it provides a livelihood for over 6.5 million farmers in Africa, South America and Asia and ranks as one of the top ten agricultural commodities in the world. Historically, cocoa production has been plagued by serious losses due to pests and diseases. The release of the cacao genome sequence will provide researchers with access to the latest genomic tools, enabling more efficient research and accelerating the breeding process, thereby expediting the release of superior cacao cultivars. The sequenced genotype, Matina 1-6, is representative of the genetic background most commonly found in the cacao-producing countries, enabling results to be applied immediately and broadly to current commercial cultivars. Matina 1-6 is highly homozygous, which greatly reduces the complexity of the sequence assembly process. While the sequence provided is a preliminary release, it already covers 92% of the genome, with approximately 35,000 genes. We will continue to refine the assembly and annotation, working toward a complete finished sequence. Updates will be made available via the main project website. Resources in this dataset: Resource Title: Cacao Genome Database. File Name: Web Page, url: http://www.cacaogenomedb.org/
Database and integrated tools to improve annotation of the bovine genome and to integrate the genome sequence with other genomics data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a public resource highlighting efforts at ARS in developing genome information for the Citrus Carrizo Genome. Updates and progress are reported here. Resources in this dataset: Resource Title: Web Page. File Name: Web Page, url: https://citrus.pw.usda.gov
Database developed to assist the phylogeneticist user in retrieving individual gene sequence alignments for genes in complete mammalian mitochondrial genomes. Data retrieval in MamMiBase requires three stages. At the first stage, the user must select the mammalian species or group that (s)he wishes to study. In the second stage, the user selects the outgroup from a list that includes all species selected in the first stage plus Xenopus laevis and Gallus gallus. Finally, at the third stage, the user selects the individual mitochondrial gene alignments or the phylogenetic tree that (s)he wishes to download.
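The three retrieval stages can be sketched as a small piece of selection logic. This is only an illustration of the workflow described above, not MamMiBase's actual interface: the function names, the request layout, and the example gene names are assumptions.

```python
# The two species that the description says are always offered as outgroups
EXTRA_OUTGROUPS = ["Xenopus laevis", "Gallus gallus"]


def outgroup_choices(selected_species):
    """Stage 2: the outgroup list is the stage-1 selection plus two fixed species."""
    choices = list(selected_species)
    for species in EXTRA_OUTGROUPS:
        if species not in choices:
            choices.append(species)
    return choices


def build_request(selected_species, outgroup, genes):
    """Stage 3: bundle the three choices into one (hypothetical) request record."""
    if outgroup not in outgroup_choices(selected_species):
        raise ValueError(f"{outgroup!r} is not an allowed outgroup")
    return {
        "species": list(selected_species),
        "outgroup": outgroup,
        "genes": list(genes),
    }


# Stage 1: species of interest (example values chosen for illustration)
species = ["Homo sapiens", "Mus musculus"]
choices = outgroup_choices(species)
request = build_request(species, "Gallus gallus", ["COX1", "CYTB"])
print(choices)
```

Validating the outgroup against the stage-2 list keeps the three stages consistent with one another, which is the only constraint the description states.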
https://www.genomicsengland.co.uk/about-gecip/joining-research-community/
To identify and enrol participants for the 100,000 Genomes Project we have created NHS Genomic Medicine Centres (GMCs). Each centre includes several NHS Trusts and hospitals. GMCs recruit and consent patients. They then provide DNA samples and clinical information for analysis.
Illumina, a biotechnology company, have been commissioned to sequence the DNA of participants. They return the whole genome sequences to Genomics England. We have created a secure, monitored infrastructure to store the genome sequences and clinical data. The data are analysed within this infrastructure and any important findings, like a diagnosis, are passed back to the patient’s doctor.
To help make sure that the project brings benefits for people who take part, we have created the Genomics England Clinical Interpretation Partnership (GeCIP). GeCIP brings together funders, researchers, NHS teams and trainees. They will analyse the data – to help ensure benefits for patients and an increased understanding of genomics. The data will also be used for medical and scientific research. This could be research into diagnosing, understanding or treating disease.
To learn more about how we work you can read the 100,000 Genomes Project protocol. It has details of the development, delivery and operation of the project. It also sets out the patient and clinical benefit, scientific and transformational objectives, the implementation strategy and the ethical and governance frameworks.
Database of peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome, expanded to include all Pseudomonas species in order to facilitate cross-strain and cross-species genome comparisons with high-quality comparative genomics. The database contains robust assessment of orthologs, a novel ortholog clustering method, and five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean-searchable log file of updates for the reference strain PAO1. The current annotation is updated using recent research literature and peer-reviewed submissions by a worldwide community of PseudoCAP (Pseudomonas aeruginosa Community Annotation Project) participating researchers. If you are interested in participating, you are invited to get involved. Many annotations, DNA sequences, orthologs, intergenic DNA, and protein sequences are available for download.
The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) is a national genetics data repository facilitating access to genotypic and phenotypic data for Alzheimer's disease (AD). Data include GWAS, whole genome sequencing (WGS), whole exome sequencing (WES), expression, RNA-Seq, and ChIP-Seq analyses. Data for the Alzheimer’s Disease Sequencing Project (ADSP) are available through a partnership with dbGaP (ADSP at dbGaP). Results are integrated and annotated in the searchable genomics database, which also provides access to a variety of software packages, analytic pipelines, online resources, and web-based tools to facilitate analysis and interpretation of large-scale genomic data. Data are available as defined by the NIA Genomics of Alzheimer’s Disease Sharing Policy and the NIH Genomics Data Sharing Policy. Investigators return secondary analysis data to the database in keeping with the NIAGADS Data Distribution Agreement.
Asteraceae, the largest family of angiosperms, has attracted widespread attention for its exceptional medicinal, horticultural, and ornamental value. However, research on Asteraceae plants faces challenges due to their intricate genetic background. With the continuous advancement of sequencing technology, a vast number of genomes and genetic resources from Asteraceae species have accumulated. This has spurred a demand for comprehensive genomic analysis within this diverse plant group. To meet this need, we developed the Asteraceae Genomics Database (AGD; http://cbcb.cdutcm.edu.cn/AGD/). The AGD serves as a centralized and systematic resource, supporting researchers in fields such as gene annotation, gene family analysis, evolutionary biology, and genetic breeding. AGD not only encompasses high-quality genomic sequences and organelle genome data, but also provides a wide range of analytical tools, including BLAST, JBrowse, SSR Finder, HmmSearch, Heatmap, Primer3, PlantiSMASH, and CRISPRCasFinder. These tools enable users to conveniently query, analyze, and compare genomic information across various Asteraceae species. The establishment of AGD holds great significance in advancing Asteraceae genomics, promoting genetic breeding, and safeguarding biodiversity by providing researchers with a comprehensive and user-friendly genomics resource platform.
The Daphnia Genomics Consortium (DGC) is an international network of investigators committed to establishing the freshwater crustacean Daphnia as a model system for ecology, evolution and the environmental sciences. Along with research activities, the DGC is: (1) coordinating efforts towards developing the Daphnia genomic toolbox, which will then be available for use by the general community; (2) facilitating collaborative cross-disciplinary investigations; (3) developing bioinformatic strategies for organizing the rapidly growing genome database; and (4) exploring emerging technologies to improve high-throughput analyses of molecular and ecological samples. If we are to succeed in creating a new model system for modern life-sciences research, it will need to be a community-wide effort. Research activities of the DGC are primarily focused on creating genomic tools and information. When completed, the current projects will offer a first view of the Daphnia genome's topography, including regions of high and low recombination, the distribution of transposable, repetitive and regulatory elements, and the size and structure of genes and of their neighborhoods. This information is crucial in formulating testable hypotheses relating genetics and demographics to the evolutionary potential or constraints of natural populations. Projects aiming to compile identifiable genes with their function are also underway, together with robust methods to verify these findings. Finally, these tools are being tested by exploring their uses in key ecological and toxicological investigations. Each project benefits from the leadership and expertise of many individuals. For further details, begin by contacting the project directors. The DGC consists of biologists from a broad spectrum of subdisciplines, including limnology, ecotoxicology, quantitative and population genetics, systematics, molecular biology and evolution, developmental biology, genomics and bioinformatics.
In many regards, the rapid early success of the consortium results from its grass-roots origin promoting an international composition, under a cooperative model, with significant scientific breadth. We hold to this approach in building this network and encourage more people to participate. All the while, the DGC is structured to effectively reach specific goals. The consortium includes an advisory board (composed of experts from the various subdisciplines), whose responsibility is to act as the research community's agent in guiding the development of Daphnia genomic resources. The advisors communicate directly with DGC members, who are either contributing genomic tools or actively seeking funds for this function. The consortium's main body (given the widespread interest in applying genomic tools in environmental studies) is made up of the affiliates, who make use of these tools for their research and who are soliciting support.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Initiated in 2003, the Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database providing centralized access to Rosaceae genomics, genetics and breeding data and analysis tools to facilitate basic, translational and applied Rosaceae research. GDR is supported by grants from the National Science Foundation Plant Genome Program (2003-2008), USDA National Institute of Food and Agriculture (NIFA) Specialty Crop Research Program (2009-2019), USDA NIFA National Research Support Project 10 (2014-2019), the Washington Tree Fruit Research Commission (2008-2016), Clemson University, University of Florida and Washington State University. Resources in this dataset: Resource Title: Genome Database for Rosaceae - Download Data. File Name: Web Page, url: https://www.rosaceae.org/data/download This is the download page for the Genome Database for Rosaceae; datasets can be downloaded directly from this location.
Comprehensive collection of high quality microbial genomics reference data for bacteria, viruses, and fungi in holdings of American Type Culture Collection.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises images representing animal genotypes and offers an opportunity to explore image processing techniques applied to genomic analysis. The original genomic data were sourced from Daniela Lourenco's GitHub repository https://github.com/danielall/Data_ssGBLUP, which contains data used as examples in the paper entitled "Single-step genomic evaluations from theory to practice: using SNP chips and sequence data in blupf90" by Lourenco et al. (2020). According to the data description, these data were simulated using QMSim (Sargolzaei & Schenkel, 2009). All the genetic variance was explained by 500 QTL. Animals were genotyped for 45,000 SNP and the average LD was 0.18. A total of 2,024 animals have genotypes and phenotypes. Each SNP genotype is coded as the number of copies of the alternative allele (0, 1, 2).
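To make the 0/1/2 coding concrete, the following minimal sketch shows one way a genotype vector could be laid out as a grayscale image. The description does not say how the images in this dataset were actually produced, so the gray-level mapping, the row-by-row layout, the padding rule, and the function name `genotypes_to_image` are all assumptions made for illustration.

```python
def genotypes_to_image(genotypes, width):
    """Arrange 0/1/2 allele counts row by row into a grid of 8-bit gray levels."""
    # Assumed mapping from allele count to gray level (not from the dataset)
    levels = {0: 0, 1: 128, 2: 255}
    pixels = [levels[g] for g in genotypes]

    # Pad the last row with 0 so that every row has the same width.
    # Note that padding pixels then look the same as genotype 0.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([0] * (width - remainder))

    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Seven SNP genotypes for one hypothetical animal, laid out three per row
image = genotypes_to_image([0, 1, 2, 2, 1, 0, 1], width=3)
for row in image:
    print(row)
```

A grid built this way could be written to an image file with any imaging library; that step is left out here to keep the sketch free of external dependencies.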
Simulation details
Data were simulated using the software QMSim (Sargolzaei and Schenkel, 2009). In the first simulation step, 200 generations of the historical population were simulated to create mutation and drift equilibrium and linkage disequilibrium (LD). This historical population started from 50,000 individuals and decreased to 2,100 in the last generation, with an equal proportion of males and females. The second step generated an expanded population, which started with 10 males and 2,000 females from the last historical generation. Each one of the 2,000 females was randomly mated and produced 1 offspring per generation. Sires and dams were randomly replaced over 20 generations, with replacement rates of 50% and 20%, respectively. The third step was used to generate the recent population, which had the same parameters as the expansion population. Five generations were simulated, and all animals were genotyped. Only data from the recent population were used, which comprised pedigree information and phenotypes for 10,000 animals, and genotypes for 1,020 parents from generations 1-4 and 1,004 individuals in generation 5. For the genome, 29 chromosomes with a total of 2,319 cM were simulated. Each chromosome had a similar number of SNP as the BovineSNP50k BeadChip (Illumina Inc., San Diego, CA). Although the number of simulated SNP was 54,000, nearly 45,000 passed the quality control and remained in the analyses. Along with SNP, 500 biallelic QTL were randomly placed on chromosomes. The QTL effects were sampled from a gamma distribution. The QTL and SNP had recurrent mutations with a probability of 2.5 × 10^-5.
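The QTL step described above can be sketched with Python's standard library. This is not QMSim and does not reproduce its algorithm; it only illustrates random placement of 500 QTL on 29 chromosomes with gamma-distributed effects. The gamma shape and scale values and the function name `simulate_qtl` are assumptions, since the description does not state the distribution's parameters.

```python
import random

N_CHROMOSOMES = 29   # from the description
N_QTL = 500          # from the description
GAMMA_SHAPE = 0.4    # assumed; the description gives no shape parameter
GAMMA_SCALE = 1.0    # assumed; the description gives no scale parameter


def simulate_qtl(seed=1):
    """Place QTL on random chromosomes and draw each effect from a gamma distribution."""
    rng = random.Random(seed)
    qtl = []
    for _ in range(N_QTL):
        chromosome = rng.randint(1, N_CHROMOSOMES)            # inclusive bounds
        effect = rng.gammavariate(GAMMA_SHAPE, GAMMA_SCALE)   # non-negative effect size
        qtl.append((chromosome, effect))
    return qtl


qtl = simulate_qtl()

# Consistency check on the genotyped animals reported in the text:
# parents from generations 1-4 plus individuals in generation 5
genotyped = 1020 + 1004
print(len(qtl), genotyped)
```

The genotyped total computed at the end matches the 2,024 animals with genotypes and phenotypes mentioned in the dataset summary.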