Annotation data were generated from existing genome assemblies of Calonectria henricotiae JAC13-131 (aka P-10-5865) and C. pseudonaviculata JAC13-27 (aka CT1). Gene prediction and annotations were conducted using the Funannotate v1.8.1 pipeline (https://funannotate.readthedocs.io/en/latest/).
THIS RESOURCE IS NO LONGER IN SERVICE, documented on 8/12/13. An expanded version of the Alternative Splicing Annotation Project (ASAP) database with a new interface and integration of comparative features using UCSC BLASTZ multiple alignments. It supports 9 vertebrate species, 4 insects, and nematodes, and provides extensive alternative splicing analysis along with the resulting splice variants. For human alternative splicing data, newly added EST libraries were classified and included in the previous tissue and cancer classification, and the lists of tissue-specific and cancer (vs. normal) specific alternatively spliced genes were recalculated and updated. The authors have created novel orthologous exon and intron databases, with their splice variants, based on multiple alignment among several species. These orthologous exon and intron databases can give more comprehensive homologous gene information than methods based on protein similarity. Furthermore, splice junction and exon identity among species can be valuable resources for elucidating species-specific genes. The ASAP II database can be easily integrated with pygr (unpublished, the Python Graph Database Framework for Bioinformatics) and its powerful features such as graph query and multi-genome alignment query. ASAP II can be searched by several different criteria such as gene symbol, gene name and ID (UniGene, GenBank etc.). The web interface provides 7 different kinds of views: (I) user query, UniGene annotation, orthologous genes and genome browsers; (II) genome alignment; (III) exons and orthologous exons; (IV) introns and orthologous introns; (V) alternative splicing; (VI) isoform and protein sequences; (VII) tissue and cancer vs. normal specificity. ASAP II shows genome alignments of isoforms, exons, and introns in a UCSC-like genome browser. All alternative splicing relationships with supporting evidence information, types of alternative splicing patterns, and inclusion rates for skipped exons are listed in separate tables. Users can also search human data for tissue- and cancer-specific splice forms at the bottom of the gene summary page. The p-values for tissue specificity are reported as log-odds (LOD) scores, and results with LOD >= 3 and at least 3 EST sequences are highlighted.
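The highlighting rule mentioned above (LOD >= 3 together with at least 3 supporting EST sequences) can be sketched as a simple filter. This is only an illustration, not ASAP II code: the record fields (`gene`, `tissue`, `lod`, `est_count`) and the example values are hypothetical and do not reflect the actual ASAP II schema.

```python
from typing import NamedTuple


class SpliceResult(NamedTuple):
    """One tissue-specificity result; field names are hypothetical."""
    gene: str
    tissue: str
    lod: float       # log-odds score for tissue specificity
    est_count: int   # number of supporting EST sequences


def is_highlighted(result: SpliceResult, min_lod: float = 3.0, min_ests: int = 3) -> bool:
    """Return True when both thresholds from the description are met."""
    return result.lod >= min_lod and result.est_count >= min_ests


# Made-up example records, for illustration only.
results = [
    SpliceResult("GENE_A", "brain", 4.2, 5),
    SpliceResult("GENE_B", "liver", 3.5, 2),   # enough LOD, too few ESTs
    SpliceResult("GENE_C", "testis", 1.1, 9),  # enough ESTs, LOD too low
]

highlighted = [r for r in results if is_highlighted(r)]
for r in highlighted:
    print(r.gene, r.tissue, r.lod, r.est_count)
```

Both conditions must hold at once, which is why `GENE_B` and `GENE_C` are filtered out in this made-up sample.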
Open source, open access database and literature curation system for community based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. Automatically cross referenced against PubMED, Entrez Gene, EnsEMBL, dbSNP, eVOC: Cell type ontology, and Taxonomy database. Community driven resource for curated regulatory annotation.
A repository for high-quality gene models produced by the manual annotation of vertebrate genomes.
A database that contains sequences built from the existing primary sequence data in GenBank. The sequences and corresponding annotations are experimentally supported and have been published in a peer-reviewed scientific journal.
https://creativecommons.org/publicdomain/zero/1.0/
Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes and includes tools such as BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.
I downloaded this annotation from the Ensembl database. The columns of the dataframe include: "Gene stable ID", "Gene stable ID version", "Transcript stable ID", "Transcript stable ID version", "Gene type", "Gene name". "Gene type" denotes the type of the gene. "Gene stable ID" and "Gene stable ID version" are lists of Ensembl gene IDs, which start with the prefix "ENSG" (for example ENSG00000281806), while "Gene name" holds gene symbols assigned by the HUGO Gene Nomenclature Committee (HGNC). "Transcript stable ID" and "Transcript stable ID version" are lists of Ensembl transcript IDs. "Karyotype band" holds karyotype band information. "Chromosome/scaffold name" specifies the chromosome the gene is located on. "Gene start" indicates the start position of the gene.
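As a rough illustration of how the columns described above might be used, the following sketch builds a gene ID to gene symbol lookup with the standard library. It assumes the table is available as comma-separated text with exactly the column names listed above; the sample rows are made up (only the gene ID ENSG00000281806 comes from the description), and the transcript IDs and gene name are placeholders.

```python
import csv
import io

# Hypothetical sample using the column names from the description.
# Transcript IDs and the gene name are placeholders, not real annotation.
SAMPLE = """Gene stable ID,Gene stable ID version,Transcript stable ID,Transcript stable ID version,Gene type,Gene name
ENSG00000281806,ENSG00000281806.1,ENST00000000001,ENST00000000001.1,protein_coding,EXAMPLE1
ENSG00000281806,ENSG00000281806.1,ENST00000000002,ENST00000000002.1,protein_coding,EXAMPLE1
"""


def gene_id_to_name(handle):
    """Map each gene stable ID to its gene name, checking the ENSG prefix."""
    mapping = {}
    for row in csv.DictReader(handle):
        gene_id = row["Gene stable ID"]
        if not gene_id.startswith("ENSG"):
            raise ValueError(f"unexpected gene ID: {gene_id!r}")
        # Several transcripts share one gene, so repeated rows overwrite
        # the same key with the same value.
        mapping[gene_id] = row["Gene name"]
    return mapping


mapping = gene_id_to_name(io.StringIO(SAMPLE))
print(mapping)
```

Because the table has one row per transcript, the mapping collapses repeated gene rows into a single entry per gene.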
Ensembl database (http://useast.ensembl.org/index.html)
Provide accessible transcriptome annotation data for the Kaggle community.
The GOA (Gene Ontology Annotation) project provides high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB) and International Protein Index (IPI). This involves electronic annotation and the integration of high-quality manual GO annotation from all GO Consortium model organism groups and specialist groups.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract
Cannabis sativa is a globally significant seed-oil, fiber, and drug-producing plant species. However, a century of prohibition has severely restricted legal breeding and germplasm resource development, leaving potential hemp-based nutritional and fiber applications unrealized. Existing cultivars are highly heterozygous and lack competitiveness in the overall fiber and grain markets, relegating hemp to less than 200,000 hectares globally [1]. The relaxation of drug laws in recent decades has generated widespread interest in expanding and reincorporating cannabis into agricultural systems, but progress has been impeded by the limited understanding of genomics and breeding potential. No studies to date have examined the genomic diversity and evolution of cannabis populations using haplotype-resolved, chromosome-scale assemblies from publicly available germplasm. Here we present a cannabis pangenome, constructed with 181 new and 12 previously released genomes from a total of 156 biological samples from both male (XY) and female (XX) plants, including 42 trio phased and 36 haplotype-resolved, chromosome-scale assemblies. We discovered widespread regions of the cannabis pangenome that are surprisingly diverse for a single species, with high levels of genetic and structural variation, and propose a novel population structure and hybridization history. Conversely, the cannabinoid synthase genes contain very low levels of diversity, despite being embedded within a variable region containing multiple pseudogenized paralogs and distinct transposable element arrangements. Additionally, we identified variants of acyl-lipid thioesterase (ALT) genes [2] that are associated with fatty acid chain length variation and the production of the rare cannabinoids, tetrahydrocannabinol varin (THCV) and cannabidiol varin (CBDV). We conclude the Cannabis sativa gene pool has only been partially characterized, and that the existence of wild relatives in Asia remains likely, while its potential as a crop species remains largely unrealized.
1. United Nations. Commodities at a glance: Special issue on industrial hemp. Commod Glance (2023) doi:10.18356/9789210019958.
2. Pulsifer, I. P. et al. Acyl-lipid thioesterase1-4 from Arabidopsis thaliana form a novel family of fatty acyl-acyl carrier protein thioesterases with divergent expression patterns and substrate specificities. Plant Mol. Biol. 84, 549–563 (2014).
Transposable element analysis
To identify transposable elements, we used the EDTA pipeline with default settings.
EDTAOutput.tar.gz includes EDTA transposon annotations for 78 scaffolded, chromosome-level cannabis genomes.
Structural variation analysis
The 78 fully scaffolded assembly haplotypes were each aligned to the EH23a assembly using minimap2 (Li 2018). SyRI was then used to call structural variations on each alignment (Goel et al. 2019) and plotsr was used to visualize alignments and SVs (Goel and Schneeberger 2022).
DUP_query_coord.bed.tar.gz includes duplications for 78 assemblies with EH23a as reference
INVTR_query_coord.bed.tar.gz includes inverted translocations for 78 assemblies with EH23a as reference
INVs_query_coord.bed.tar.gz includes inversions for 78 assemblies with EH23a as reference
TRANS_query_coord.bed.tar.gz includes translocations for 78 assemblies with EH23a as reference
csat_orientations.tsv is a scaffold orientation file for 78 assemblies with EH23a as reference
Bioinformatics resource system including a web server and web service for functional annotation and enrichment analyses of gene lists. Consists of a comprehensive knowledgebase and a set of functional analysis tools. Includes a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
Rice Annotation Project Database (RAP-DB) is a primary rice (Oryza sativa) annotation database established in 2004 upon the completion of the Oryza sativa ssp. japonica cv. Nipponbare genome sequencing by the International Rice Genome Sequencing Project. RAP-DB provides comprehensive resources (e.g. genome annotation, gene expression, DNA markers, genetic diversity, etc.) for biological and agricultural research communities. This collection provides transcript information in RAP-DB.
THIS RESOURCE IS NO LONGER IN SERVICE. Documented September 15, 2017. A virtual database of annotations between databases.
Highly reproducible interaction data in the Yeast Interacting Proteins Database with an "IST hit" value (described in the table below) of 3 or more. Annotation (gene name and description) is updated from the SGD (Saccharomyces Genome Database; http://www.yeastgenome.org/, August 15, 2009). The dataset contains 841 entries. The data are given as a CSV-format text file.
Pathways identified by the Database for Annotation, Visualization and Integrated Discovery (DAVID version 6.7) in the Kyoto Encyclopedia of Genes and Genomes (KEGG).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reformatted PHROGs database
Reformatting of the PHROGs database (https://phrogs.lmge.uca.fr/index.php) to allow it to be used directly within Prokka and to write the PHROGs annotations to the “product” description in GenBank files. All annotations from the PHROGs database are retained.
The file is called all_PHROGs.tar.gz.
Place it in prokka-1.13-2/db/hmm/ to use it with Prokka.
This repository also contains worked examples for the assembly and annotation of a bacteriophage genome.
Full detailed instructions are provided as part of the manuscript “Phage Genome annotation: where to begin and end”.
File: SRR13108336_Illumina.tar.gz contains all the files produced by the intermediate steps in the assembly and annotation of a phage genome from single-end Illumina reads. The data were originally deposited by Milhaven et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8407768/) and are used here as an exemplar.
File: nanopore_example.tar.gz contains all the files produced by the intermediate steps in the assembly and annotation of a phage genome from nanopore data. The data were originally deposited by D'Souza et al. (https://journals.asm.org/doi/10.1128/MRA.00730-20).
Central repository for high-quality, frequently updated manual annotation of finished vertebrate genome sequence. Human, mouse and zebrafish are in the process of being completely annotated, whereas for other species the annotation covers only specific genomic regions of particular biological interest. The majority of the annotation is from the HAVANA group at the Wellcome Trust Sanger Institute. Users can BLAST, search for specific text, export, and download data. Genomes and details of the projects for each species are available through the homepages for human, mouse and zebrafish. The website is built upon code from the EnsEMBL (http://www.ensembl.org) project. Some Ensembl features are not available in Vega. From the user's point of view, perhaps the most significant of these is MartView. However, because Vega human and mouse data are included in Ensembl, they can be queried using Ensembl MartView. Vega contains annotation of the human MHC region in eight haplotypes, and the LRC region in three haplotypes. Vega also contains annotation of the Insulin Dependent Diabetes (IDD) regions on non-reference assemblies for mouse.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Functional Annotation of Variants - Online Resource (FAVOR, https://favor.genohub.org) is a comprehensive whole-genome variant annotation database and a variant browser, providing hundreds of functional annotation scores from a variety of aspects of variant biological function. This FAVOR Essential Database is comprised of a collection of essential annotation scores for all possible SNVs (8,812,917,339) and observed indels (79,997,898) in Build GRCh38/hg38, including variant info, chromosome, position, reference allele, alternative allele, aPC-Conservation, aPC-Epigenetics, aPC-Epigenetics-Active, aPC-Epigenetics-Repressed, aPC-Epigenetics-Transcription, aPC-Local-Nucleotide-Diversity, aPC-Mappability, aPC-Mutation-Density, aPC-Protein-Function, aPC-Proximity-To-TSSTES, aPC-Transcription-Factor, CAGE promoter, CAGE, MetaSVM, rsID, FATHMM-XF, Gencode Comprehensive Category, Gencode Comprehensive Info, Gencode Comprehensive Exonic Category, Gencode Comprehensive Exonic Info, GeneHancer, LINSIGHT, CADD, rDHS. These annotation scores can be integrated into FAVORannotator (https://github.com/zhouhufeng/FAVORannotator) to create an annotated GDS (aGDS) file by storing the genotype data and their functional annotation data in an all-in-one file. The aGDS file can then facilitate a wide range of functionally-informed downstream analyses.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data repository contains the mandatory DB for Bakta (db.tar.gz).
Bakta is a tool for the rapid & standardized local annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis: https://github.com/oschwengers/bakta
This db provides protein sequence hash digests and lengths of UniProt's UniRef100/UniRef90 clusters for ultra-fast identification & lookups. It has been pre-annotated with several specialized databases and enriched with Dbxrefs. All conducted pre-annotations are logged and provided in the db.log.gz file.
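The lookup idea described above (identifying a protein by a hash digest of its sequence together with its length) can be sketched as follows. This is an illustration only, not Bakta's code or database schema: the hash function (SHA-256 here) is a stand-in because the description does not say which digest is used, and the in-memory dictionary, the cluster identifier and the sequence are hypothetical.

```python
import hashlib


def seq_key(protein_seq):
    """Return (hex digest, length) for a normalized protein sequence."""
    seq = protein_seq.strip().upper()
    digest = hashlib.sha256(seq.encode("ascii")).hexdigest()
    return digest, len(seq)


def build_index(clusters):
    """Build a digest -> (length, cluster ID) index from {cluster ID: sequence}."""
    index = {}
    for cluster_id, seq in clusters.items():
        digest, length = seq_key(seq)
        index[digest] = (length, cluster_id)
    return index


def lookup(index, protein_seq):
    """Return the cluster ID for an exact sequence match, else None."""
    digest, length = seq_key(protein_seq)
    hit = index.get(digest)
    # The stored length acts as a cheap extra check on top of the digest.
    if hit is not None and hit[0] == length:
        return hit[1]
    return None


# Hypothetical cluster and sequence, for illustration only.
index = build_index({"cluster_A": "MKTAYIAK"})
print(lookup(index, "mktayiak"))   # exact match after normalization
print(lookup(index, "MKTAYIAKQ"))  # different sequence, no hit
```

An exact-match lookup of this kind needs no alignment, which is what makes it fast; it only finds sequences identical to an indexed one.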
External DB versions:
A virtual database of annotations made by 50 database providers (April 2014) - and growing (see below), that map data to publication information. All NIF Data Federation sources can be part of this virtual database as long as they indicate the publications that correspond to data records. The format that NIF accepts is the PubMed Identifier, category or type of data that is being linked to, and a data record identifier. A subset of this data is passed to NCBI, as LinkOuts (links at the bottom of PubMed abstracts), however due to NCBI policies the full data records are not currently associated with PubMed records. Database providers can use this mechanism to link to other NCBI databases including gene and protein, however these are not included in the current data set at this time. (To view databases available for linking see, http://www.ncbi.nlm.nih.gov/books/NBK3807/#files.Databases_Available_for_Linking )
The categories that NIF uses have been standardized to the following types:
* Resource: Registry
* Resource: Software
* Reagent: Plasmid
* Reagent: Antibodies
* Data: Clinical Trials
* Data: Gene Expression
* Data: Drugs
* Data: Taxonomy
* Data: Images
* Data: Animal Model
* Data: Microarray
* Data: Brain connectivity
* Data: Volumetric observation
* Data: Value observation
* Data: Activation Foci
* Data: Neuronal properties
* Data: Neuronal reconstruction
* Data: Chemosensory receptor
* Data: Electrophysiology
* Data: Computational model
* Data: Brain anatomy
* Data: Gene annotation
* Data: Disease annotation
* Data: Cell Model
* Data: Chemical
* Data: Pathways
For more information refer to Create a LinkOut file, http://neuinfo.org/nif_components/disco/interoperation.shtm
Participating resources ( http://disco.neuinfo.org/webportal/discoLinkoutServiceSummary.do?id=4 ):
* Addgene http://www.addgene.org/pgvec1
* Animal Imaging Database http://aidb.crbs.ucsd.edu
* Antibody Registry http://www.neuinfo.org/products/antibodyregistry/
* Avian Brain Circuitry Database http://www.behav.org/abcd/abcd.php
* BAMS Connectivity http://brancusi.usc.edu/
* Beta Cell Biology Consortium http://www.betacell.org/
* bioDBcore http://biodbcore.org/
* BioGRID http://thebiogrid.org/
* BioNumbers http://bionumbers.hms.harvard.edu/
* Brain Architecture Management System http://brancusi.usc.edu/bkms/
* Brede Database http://hendrix.imm.dtu.dk/services/jerne/brede/
* Cell Centered Database http://ccdb.ucsd.edu
* CellML Model Repository http://www.cellml.org/models
* CHEBI http://www.ebi.ac.uk/chebi/
* Clinical Trials Network (CTN) Data Share http://www.ctndatashare.org/
* Comparative Toxicogenomics Database http://ctdbase.org/
* Coriell Cell Repositories http://ccr.coriell.org/
* CRCNS - Collaborative Research in Computational Neuroscience - Data sharing http://crcns.org
* Drug Related Gene Database https://confluence.crbs.ucsd.edu/display/NIF/DRG
* DrugBank http://www.drugbank.ca/
* FLYBASE http://flybase.org/
* Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/
* Gene Ontology Tools http://www.geneontology.org/GO.tools.shtml
* Gene Weaver http://www.GeneWeaver.org
* GeneDB http://www.genedb.org/Homepage
* Glomerular Activity Response Archive http://gara.bio.uci.edu
* GO http://www.geneontology.org/
* Internet Brain Volume Database http://www.cma.mgh.harvard.edu/ibvd/
* ModelDB http://senselab.med.yale.edu/modeldb/
* Mouse Genome Informatics Transgenes ftp://ftp.informatics.jax.org/pub/reports/MGI_PhenotypicAllele.rpt
* NCBI Taxonomy Browser http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html
* NeuroMorpho.Org http://neuromorpho.org/neuroMorpho
* NeuronDB http://senselab.med.yale.edu/neurondb
* SciCrunch Registry http://neuinfo.org/nif/nifgwt.html?tab=registry
* NIF Registry Automated Crawl Data http://lucene1.neuinfo.org/nif_resource/current/
* NITRC http://www.nitrc.org/
* Nuclear Receptor Signaling Atlas http://www.nursa.org
* Olfactory Receptor DataBase http://senselab.med.yale.edu/ordb/
* OMIM http://omim.org
* OpenfMRI http://openfmri.org
* PeptideAtlas http://www.peptideatlas.org
* RGD http://rgd.mcw.edu
* SFARI Gene: AutDB https://gene.sfari.org/autdb/Welcome.do
* SumsDB http://sumsdb.wustl.edu/sums/
* Temporal-Lobe: Hippocampal - Parahippocampal Neuroanatomy of the Rat http://www.temporal-lobe.com/
* The Cell: An Image Library http://www.cellimagelibrary.org/
* Visiome Platform http://platform.visiome.neuroinf.jp/
* WormBase http://www.wormbase.org
* YPED http://medicine.yale.edu/keck/nida/yped.aspx
* ZFIN http://zfin.org
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the gene annotation data for three species of Blastobotrys yeasts: B. mokoenaii, B. illinoisensis, and B. malaysiensis.
The genome assemblies for B. mokoenaii (NRRL Y-27120) and B. malaysiensis (NRRL Y-6417) were publicly available on the National Center for Biotechnology Information (NCBI) under accessions GCA_003705765.3 and GCA_030558815.1, respectively.
The genome assembly for B. illinoisensis (NRRL YB-1343) was generated by SciLifeLab's National Genomics Infrastructure (NGI) using PacBio long-read data and deposited in the European Nucleotide Archive (ENA) under accession GCA_965113335.1.
File description
bmokoenaii_annotation.gff
This file contains the gene models predicted for B. mokoenaii (GCA_003705765.3).
billinoisensis_annotation.gff
This file contains the gene models predicted for B. illinoisensis (GCA_965113335.1).
bmalaysiensis_annotation.gff
This file contains the gene models predicted for B. malaysiensis (GCA_030558815.1).
Gene annotation methods
Repeat Masking
Prior to annotation, a repeat library was built for each species using RepeatModeler2 v2.0.2 and the genomes were soft-masked using RepeatMasker v4.1.5.
$ RepeatModeler -database ${DB} -engine ncbi -pa 16
$ RepeatMasker -dir . -gff -u -no_is -xsmall -e ncbi -lib ${LIBRARY} -pa 16 genome.fasta
Structural Annotation
Structural annotation was performed on the soft-masked genomes using Braker3 v3.0.3, incorporating external evidence in the form of all fungal proteins from OrthoDB v11 (available at https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11).
$ braker.pl --genome="$genome" --prot_seq=${protein} --workingdir=${PWD} --gff3 --threads=16 --verbosity=3 --nocleanup --species=${i}
Functional Annotation
The predicted genes were functionally annotated using the National Bioinformatics Infrastructure Sweden (NBIS) functional_annotation nextflow pipeline v2.0.0 (https://github.com/NBISweden/pipelines-nextflow). Briefly, this pipeline performs similarity searches between the annotated proteins and the UniProtKB/Swiss-Prot database (downloaded on 2023-12) using the Basic Local Alignment Search Tool (BLAST). Then it uses InterProScan to query the proteins against InterPro v59-91 databases, and merges results using AGAT v1.2.0.
tRNAs and rRNAs
Transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were annotated using tRNAscan-SE v2.0.12 and barrnap v0.9, respectively. Other ncRNAs, such as SRP RNA, RNase P RNA, spliceosomal ncRNAs etc., have not been predicted.
$ tRNAscan-SE -E --gff ${output}_trnas.gff --thread 16 ${genome}.fasta
$ barrnap --kingdom euk --threads 6 ${genome}.fasta > ${output}_rrna.gff
Annotation integration
Finally, the functionally annotated protein-coding genes, tRNAs, and rRNAs were combined into a single GFF file using AGAT v1.2.0.
$ agat_sp_complement_annotations.pl --ref ${protein_coding} --add ${trna} --add ${rrna} --out full_annotation.gff
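One quick way to sanity-check a combined annotation file like the ones described above is to count its feature types. The sketch below relies only on the general GFF3 layout (nine tab-separated columns, with the feature type in the third column and `#` lines as comments). The sample records, including the contig name, source labels, coordinates and IDs, are made up for illustration and are not taken from this dataset.

```python
from collections import Counter
import io


def count_feature_types(handle):
    """Count GFF3 feature types (column 3), skipping comments and odd lines."""
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            # Not a feature line (for example, embedded FASTA sequence).
            continue
        counts[fields[2]] += 1
    return counts


# Made-up records for illustration only.
SAMPLE_GFF = (
    "##gff-version 3\n"
    "contig1\texample\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "contig1\texample\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n"
    "contig1\texample\ttRNA\t1500\t1572\t.\t-\t.\tID=trna1\n"
    "contig1\texample\trRNA\t3000\t4800\t.\t+\t.\tID=rrna1\n"
)

counts = count_feature_types(io.StringIO(SAMPLE_GFF))
for feature_type, n in sorted(counts.items()):
    print(feature_type, n)
```

To run it on a real file, pass an open file handle (for example `open("bmokoenaii_annotation.gff")`) instead of the in-memory sample.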
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This zipped tarball (.tar.gz) contains a pre-built database for Bacannot (https://github.com/fmalmeida/bacannot).
Files are in the naming convention YEAR_MONTH_DAY.
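Given the YEAR_MONTH_DAY naming convention, the most recent database file can be picked programmatically. The sketch below is a minimal illustration: the file names are hypothetical, and the assumption that the date appears in the name as digits separated by underscores (for example `2022_07_15`) is an interpretation of the convention stated above.

```python
import re
from datetime import date

# Assumed layout: four-digit year, then month and day, separated by underscores.
DATE_PATTERN = re.compile(r"(\d{4})_(\d{1,2})_(\d{1,2})")


def release_date(filename):
    """Extract the date from a file name, or return None if absent or invalid."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None


def latest_release(filenames):
    """Return the file name with the most recent date, or None if none parse."""
    dated = [(release_date(name), name) for name in filenames]
    dated = [(d, name) for d, name in dated if d is not None]
    if not dated:
        return None
    return max(dated)[1]


# Hypothetical file names, for illustration only.
files = ["db_2021_12_01.tar.gz", "db_2022_07_15.tar.gz", "README.txt"]
print(latest_release(files))
```

Parsing into `date` objects rather than comparing strings keeps the ordering correct even if months or days are not zero-padded.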