100+ datasets found

Data from: Genome annotation data from Calonectria species
catalog.data.gov
agdatacommons.nal.usda.gov
Updated May 8, 2025
Cite
Agricultural Research Service (2025). Genome annotation data from Calonectria species [Dataset]. https://catalog.data.gov/dataset/genome-annotation-data-from-calonectria-species-6746d
Dataset updated
May 8, 2025
Dataset provided by
Agricultural Research Service
Description
Annotation data were generated from existing genome assemblies of Calonectria henricotiae JAC13-131 (aka P-10-5865) and C. pseudonaviculata JAC13-27 (aka CT1). Gene prediction and annotations were conducted using the Funannotate v1.8.1 pipeline (https://funannotate.readthedocs.io/en/latest/).
Alternative Splicing Annotation Project II Database
neuinfo.org
dknet.org
Updated Oct 30, 2024
Cite
(2024). Alternative Splicing Annotation Project II Database [Dataset]. http://identifiers.org/RRID:SCR_000322
Unique identifier
https://identifiers.org/RRID:SCR_000322
Dataset updated
Oct 30, 2024
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented on 8/12/13. An expanded version of the Alternative Splicing Annotation Project (ASAP) database with a new interface and integration of comparative features using UCSC BLASTZ multiple alignments. It supports 9 vertebrate species, 4 insects, and nematodes, and provides extensive analysis of alternative splicing and splicing variants. For human alternative splicing data, newly added EST libraries were classified and incorporated into the previous tissue and cancer classification, and the lists of tissue- and cancer (vs. normal)-specific alternatively spliced genes were recalculated and updated. The authors have created novel databases of orthologous exons and introns and their splice variants, based on multiple alignments among several species. These orthologous exon and intron databases can give more comprehensive homologous gene information than protein-similarity-based methods. Furthermore, splice junction and exon identity among species can be valuable resources for elucidating species-specific genes. The ASAP II database can be easily integrated with pygr (unpublished, the Python Graph Database Framework for Bioinformatics) and its powerful features, such as graph query and multi-genome alignment query. ASAP II can be searched by several different criteria, such as gene symbol, gene name and ID (UniGene, GenBank, etc.). The web interface provides 7 different kinds of views: (I) user query, UniGene annotation, orthologous genes and genome browsers; (II) genome alignment; (III) exons and orthologous exons; (IV) introns and orthologous introns; (V) alternative splicing; (VI) isoform and protein sequences; (VII) tissue and cancer vs. normal specificity. ASAP II shows genome alignments of isoforms, exons, and introns in a UCSC-like genome browser. All alternative splicing relationships with supporting evidence information, types of alternative splicing patterns, and inclusion rates for skipped exons are listed in separate tables.
Users can also search human data for tissue- and cancer-specific splice forms at the bottom of the gene summary page. The p-values for tissue specificity are reported as log-odds (LOD) scores, and results with LOD >= 3 and at least 3 EST sequences are highlighted.
Open Regulatory Annotation Database
dknet.org
scicrunch.org
Updated Nov 30, 2025
Cite
(2025). Open Regulatory Annotation Database [Dataset]. http://identifiers.org/RRID:SCR_007835
Unique identifier
https://identifiers.org/RRID:SCR_007835
Dataset updated
Nov 30, 2025
Description
Open-source, open-access database and literature curation system for community-based annotation of experimentally identified DNA regulatory regions, transcription factor binding sites and regulatory variants. Automatically cross-referenced against PubMed, Entrez Gene, EnsEMBL, dbSNP, eVOC: Cell type ontology, and Taxonomy database. Community-driven resource for curated regulatory annotation.
Vertebrate Genome Annotation Database
bioregistry.io
Updated Apr 26, 2024
Cite
(2024). Vertebrate Genome Annotation Database [Dataset]. http://identifiers.org/re3data:r3d100012575
Unique identifier
https://identifiers.org/re3data:r3d100012575
Dataset updated
Apr 26, 2024
Description
A repository for high-quality gene models produced by the manual annotation of vertebrate genomes.
Third Party Annotation (TPA) Database
catalog.data.gov
datadiscovery.nlm.nih.gov
Updated Jun 19, 2025
Cite
National Library of Medicine (2025). Third Party Annotation (TPA) Database [Dataset]. https://catalog.data.gov/dataset/third-party-annotation-tpa-database
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
A database that contains sequences built from the existing primary sequence data in GenBank. The sequences and corresponding annotations are experimentally supported and have been published in a peer-reviewed scientific journal.
Ensembl BioMart Annotation
kaggle.com
zip
Updated May 20, 2021
Cite
Randy Williams (2021). Ensembl BioMart Annotation [Dataset]. https://www.kaggle.com/datasets/rwilliams7653/ensembl-biomart-annotation
Available download formats: zip (3469507 bytes)
Dataset updated
May 20, 2021
Authors
Randy Williams
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes and includes tools such as BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.

Content

I downloaded this annotation from the Ensembl database. The columns of the dataframe are: "Gene stable ID", "Gene stable ID version", "Transcript stable ID", "Transcript stable ID version", "Gene type", "Gene name". "Gene type" denotes the type of the gene. "Gene stable ID" and "Gene stable ID version" are lists of Ensembl gene IDs, which start with the prefix "ENSG" (for example ENSG00000281806), while "Gene name" contains gene symbols determined by the HUGO Gene Nomenclature Committee (HGNC). "Transcript stable ID" and "Transcript stable ID version" are lists of Ensembl transcript IDs. "Karyotype band" has karyotype band information. "Chromosome/scaffold name" specifies the chromosome the gene is located on. "Gene start" indicates the start position of the gene.
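
As a rough illustration of how a table with these columns could be used, the sketch below groups transcript IDs by gene. It assumes a comma-separated file with a header row; the delimiter, the sample rows, the placeholder gene names, and the helper name transcripts_by_gene are all hypothetical and not taken from the dataset itself.

```python
import csv
import io

# Hypothetical sample using the column names listed in the description.
# The IDs and gene names below are placeholders, not real annotations.
SAMPLE = """Gene stable ID,Gene stable ID version,Transcript stable ID,Transcript stable ID version,Gene type,Gene name
ENSG00000000001,ENSG00000000001.1,ENST00000000001,ENST00000000001.1,protein_coding,GENEA
ENSG00000000001,ENSG00000000001.1,ENST00000000002,ENST00000000002.1,protein_coding,GENEA
ENSG00000000002,ENSG00000000002.3,ENST00000000003,ENST00000000003.2,lncRNA,GENEB
"""


def transcripts_by_gene(handle):
    """Map each gene stable ID to its name, type, and list of transcript IDs."""
    genes = {}
    for row in csv.DictReader(handle):
        entry = genes.setdefault(
            row["Gene stable ID"],
            {"name": row["Gene name"], "type": row["Gene type"], "transcripts": []},
        )
        entry["transcripts"].append(row["Transcript stable ID"])
    return genes


genes = transcripts_by_gene(io.StringIO(SAMPLE))
for gene_id, info in genes.items():
    print(gene_id, info["name"], info["type"], len(info["transcripts"]))
```

For the real download, the same function could be given an open file handle instead of the in-memory sample, provided the header matches.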

Acknowledgements

Ensembl database (http://useast.ensembl.org/index.html)

Inspiration

Provide accessible transcriptome annotation data for the Kaggle community.
Gene Ontology Annotation Database
bioregistry.io
Updated Apr 24, 2021
Cite
(2021). Gene Ontology Annotation Database [Dataset]. https://bioregistry.io/goa
Dataset updated
Apr 24, 2021
Description
The GOA (Gene Ontology Annotation) project provides high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB) and International Protein Index (IPI). This involves electronic annotation and the integration of high-quality manual GO annotation from all GO Consortium model organism groups and specialist groups.
Cannabis Pangenome Annotation Data
plus.figshare.com
application/x-gzip
Updated May 30, 2024
Cite
Ryan Lynch; Lillian Padgitt-Cobb; Andrea R. Garfinkel; Brian Knaus; Nolan Hartwick; Nicholas Allsing; Anthony Aylward; Allen Mamerto; Justine Kipruto Kitony; Kelly Colt; Emily Murray; Tiffany Duong; Aaron Trippe; Seth Crawford; Kelly Vining; Todd Michael (2024). Cannabis Pangenome Annotation Data [Dataset]. http://doi.org/10.25452/figshare.plus.25909024.v1
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.25452/figshare.plus.25909024.v1
Dataset updated
May 30, 2024
Dataset provided by
Figshare+
Authors
Ryan Lynch; Lillian Padgitt-Cobb; Andrea R. Garfinkel; Brian Knaus; Nolan Hartwick; Nicholas Allsing; Anthony Aylward; Allen Mamerto; Justine Kipruto Kitony; Kelly Colt; Emily Murray; Tiffany Duong; Aaron Trippe; Seth Crawford; Kelly Vining; Todd Michael
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Abstract

Cannabis sativa is a globally significant seed-oil, fiber, and drug-producing plant species. However, a century of prohibition has severely restricted legal breeding and germplasm resource development, leaving potential hemp-based nutritional and fiber applications unrealized. Existing cultivars are highly heterozygous and lack competitiveness in the overall fiber and grain markets, relegating hemp to less than 200,000 hectares globally [1]. The relaxation of drug laws in recent decades has generated widespread interest in expanding and reincorporating cannabis into agricultural systems, but progress has been impeded by the limited understanding of genomics and breeding potential. No studies to date have examined the genomic diversity and evolution of cannabis populations using haplotype-resolved, chromosome-scale assemblies from publicly available germplasm. Here we present a cannabis pangenome, constructed with 181 new and 12 previously released genomes from a total of 156 biological samples from both male (XY) and female (XX) plants, including 42 trio phased and 36 haplotype-resolved, chromosome-scale assemblies. We discovered widespread regions of the cannabis pangenome that are surprisingly diverse for a single species, with high levels of genetic and structural variation, and propose a novel population structure and hybridization history. Conversely, the cannabinoid synthase genes contain very low levels of diversity, despite being embedded within a variable region containing multiple pseudogenized paralogs and distinct transposable element arrangements. Additionally, we identified variants of acyl-lipid thioesterase (ALT) genes [2] that are associated with fatty acid chain length variation and the production of the rare cannabinoids, tetrahydrocannabinol varin (THCV) and cannabidiol varin (CBDV). We conclude the Cannabis sativa gene pool has only been partially characterized, and that the existence of wild relatives in Asia remains likely, while its potential as a crop species remains largely unrealized.

1. Nions, U. Commodities at a glance: Special issue on industrial hemp. Commod Glance (2023) doi:10.18356/9789210019958.
2. Pulsifer, I. P. et al. Acyl-lipid thioesterase1-4 from Arabidopsis thaliana form a novel family of fatty acyl-acyl carrier protein thioesterases with divergent expression patterns and substrate specificities. Plant Mol. Biol. 84, 549–563 (2014).

Transposable element analysis

To identify transposable elements, we used the EDTA pipeline with default settings. EDTAOutput.tar.gz includes EDTA transposon annotations for 78 scaffolded, chromosome-level cannabis genomes.

Structural Variation analysis

The 78 fully scaffolded assembly haplotypes were each aligned to the EH23a assembly using minimap2 (Heng Li 2018). Syri was then used to call structural variations on each alignment (Goel et al. 2019) and plotsr was used to visualize alignments and SVs (Goel and Schneeberger 2022).

DUP_query_coord.bed.tar.gz includes duplications for 78 assemblies with EH23a as reference

INVTR_query_coord.bed.tar.gz includes inverted translocations for 78 assemblies with EH23a as reference

INVs_query_coord.bed.tar.gz includes inversions for 78 assemblies with EH23a as reference

TRANS_query_coord.bed.tar.gz includes translocations for 78 assemblies with EH23a as reference

csat_orientations.tsv is a scaffold orientation file for 78 assemblies with EH23a as reference
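
The structural-variation archives named in this description are BED files, so one plausible first step after unpacking them is to total the interval lengths per chromosome. The sketch below assumes the common three-column BED layout (chromosome, 0-based start, end, tab-separated); the actual column layout of these files is not stated in the description, and the sample intervals and the function name sv_span_by_chrom are hypothetical.

```python
from collections import Counter


def sv_span_by_chrom(lines):
    """Sum interval lengths (end - start) per chromosome for BED-style lines."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        totals[chrom] += int(end) - int(start)
    return totals


# Made-up intervals, only to show the expected input shape.
SAMPLE = ["chr1\t100\t250", "chr1\t1000\t1600", "chr2\t50\t80"]
totals = sv_span_by_chrom(SAMPLE)
for chrom, span in sorted(totals.items()):
    print(chrom, span)
```

Any extra columns after the first three are ignored, so the same function should tolerate BED files that carry additional annotation fields.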
DAVID
neuinfo.org
dknet.org
Updated Aug 17, 2024
Cite
(2024). DAVID [Dataset]. http://identifiers.org/RRID:SCR_001881
Unique identifier
https://identifiers.org/RRID:SCR_001881
Dataset updated
Aug 17, 2024
Description
Bioinformatics resource system including web server and web service for functional annotation and enrichment analyses of gene lists. Consists of a comprehensive knowledgebase and a set of functional analysis tools. Includes a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
Rice annotation Project database
bioregistry.io
Updated May 3, 2021
Cite
(2021). Rice annotation Project database [Dataset]. https://bioregistry.io/rapdb.transcript
Dataset updated
May 3, 2021
Description
Rice Annotation Project Database (RAP-DB) is a primary rice (Oryza sativa) annotation database established in 2004 upon the completion of the Oryza sativa ssp. japonica cv. Nipponbare genome sequencing by the International Rice Genome Sequencing Project. RAP-DB provides comprehensive resources (e.g. genome annotation, gene expression, DNA markers, genetic diversity, etc.) for biological and agricultural research communities. This collection provides transcript information in RAP-DB.
Integrated Data Annotation
dknet.org
scicrunch.org
Updated Jan 29, 2022
Cite
(2022). Integrated Data Annotation [Dataset]. http://identifiers.org/RRID:SCR_010499
Unique identifier
https://identifiers.org/RRID:SCR_010499
Dataset updated
Jan 29, 2022
Description
THIS RESOURCE IS NO LONGER IN SERVICE. Documented September 15, 2017. A virtual database of annotations between databases.
Core Data of Yeast Interacting Proteins Database (Annotation Updated...
dbarchive.biosciencedbc.jp
Updated Jan 31, 2013
Cite
(2013). Core Data of Yeast Interacting Proteins Database (Annotation Updated Version) [Dataset]. http://doi.org/10.18908/lsdba.nbdc00742-001
Unique identifier
https://doi.org/10.18908/lsdba.nbdc00742-001
Dataset updated
Jan 31, 2013
Description
Highly reproducible interaction data in the Yeast Interacting Proteins Database with an "IST hit" value (to be described in the table below) of 3 or more. Annotation (gene name and description) was updated from the SGD (Saccharomyces Genome Database; http://www.yeastgenome.org/, August 15, 2009). The dataset contains 841 entries. The data are given in a CSV format text file.
Pathways identified by the Database for Annotation, Visualization and...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Feb 6, 2015
Cite
Parker-Gaddis, Kristen L.; Maltecca, Christian; Cole, John B.; Tiezzi, Francesco; Clay, John S. (2015). Pathways identified by the Database for Annotation, Visualization and Integrated Discovery (DAVID version 6.7) in the Kyoto Encyclopedia of Genes and Genomes (KEGG). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001923133
Dataset updated
Feb 6, 2015
Authors
Parker-Gaddis, Kristen L.; Maltecca, Christian; Cole, John B.; Tiezzi, Francesco; Clay, John S.
Description
Pathways identified by the Database for Annotation, Visualization and Integrated Discovery (DAVID version 6.7) in the Kyoto Encyclopedia of Genes and Genomes (KEGG).
Bacteriophage Genome Annotation
figshare.le.ac.uk
figshare.com
application/gzip
Updated Nov 5, 2021
Cite
Andrew Millard; Anastasiya Shen (2021). Bacteriophage Genome Annotation [Dataset]. http://doi.org/10.25392/leicester.data.16896277.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.25392/leicester.data.16896277.v1
Dataset updated
Nov 5, 2021
Dataset provided by
University of Leicester
Authors
Andrew Millard; Anastasiya Shen
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Reformatted PHROGs database

Reformatting of the PHROGs database (https://phrogs.lmge.uca.fr/index.php) to allow it to be used within Prokka directly and to output the PHROGs annotations to the “product” description in GenBank files. All annotations are maintained from the PHROGs database set.

File is called all_PHROGs.tar.gz

Place in prokka-1.13-2/db/hmm/ to use with Prokka

Contained within this repository are worked examples for the assembly and annotation of a bacteriophage genome.

Full detailed instructions are provided as part of the manuscript “Phage Genome annotation: where to begin and end”

File: SRR13108336_Illumina.tar.gz contains all the files produced from the intermediary steps in the assembly and annotation of a phage genome from single-end Illumina reads. Originally deposited by Milhaven et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8407768/) and used as an exemplar.

File: nanopore_example.tar.gz contains all the files produced from the intermediary steps in the assembly and annotation of a phage genome from nanopore data. Originally deposited by D'Souza et al. (https://journals.asm.org/doi/10.1128/MRA.00730-20)
VEGA
dknet.org
rrid.site
Updated Jan 29, 2022
Cite
(2022). VEGA [Dataset]. http://identifiers.org/RRID:SCR_007907
Unique identifier
https://identifiers.org/RRID:SCR_007907 https://identifiers.org/RRID:SCR_007907/resolver
Dataset updated
Jan 29, 2022
Description
Central repository for high-quality, frequently updated manual annotation of vertebrate finished genome sequence. Human, mouse and zebrafish are in the process of being completely annotated, whereas for other species the annotation is only of specific genomic regions of particular biological interest. The majority of the annotation is from the HAVANA group at the Wellcome Trust Sanger Institute. Users can BLAST, search for specific text, export, and download data. Genomes and details of the projects for each species are available through the homepages for human, mouse and zebrafish. The website is built upon code from the EnsEMBL (http://www.ensembl.org) project. Some Ensembl features are not available in Vega. From the user's point of view, perhaps the most significant of these is MartView. However, due to their inclusion in Ensembl, Vega human and mouse data can be queried using Ensembl MartView. Vega contains annotation of the human MHC region in eight haplotypes, and the LRC region in three haplotypes. Vega also contains annotation on the Insulin Dependent Diabetes (IDD) regions on non-reference assemblies for mouse.
FAVOR Essential Database
dataverse.harvard.edu
search.dataone.org
Updated Apr 12, 2022
Cite
Hufeng Zhou; Theodore Arapoglou; Xihao Li; Zilin Li; Xihong Lin (2022). FAVOR Essential Database [Dataset]. http://doi.org/10.7910/DVN/1VGTJI
Unique identifier
https://doi.org/10.7910/DVN/1VGTJI
Dataset updated
Apr 12, 2022
Dataset provided by
Harvard Dataverse
Authors
Hufeng Zhou; Theodore Arapoglou; Xihao Li; Zilin Li; Xihong Lin
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Functional Annotation of Variants - Online Resource (FAVOR, https://favor.genohub.org) is a comprehensive whole-genome variant annotation database and a variant browser, providing hundreds of functional annotation scores from a variety of aspects of variant biological function. This FAVOR Essential Database is comprised of a collection of essential annotation scores for all possible SNVs (8,812,917,339) and observed indels (79,997,898) in Build GRCh38/hg38, including variant info, chromosome, position, reference allele, alternative allele, aPC-Conservation, aPC-Epigenetics, aPC-Epigenetics-Active, aPC-Epigenetics-Repressed, aPC-Epigenetics-Transcription, aPC-Local-Nucleotide-Diversity, aPC-Mappability, aPC-Mutation-Density, aPC-Protein-Function, aPC-Proximity-To-TSSTES, aPC-Transcription-Factor, CAGE promoter, CAGE, MetaSVM, rsID, FATHMM-XF, Gencode Comprehensive Category, Gencode Comprehensive Info, Gencode Comprehensive Exonic Category, Gencode Comprehensive Exonic Info, GeneHancer, LINSIGHT, CADD, rDHS. These annotation scores can be integrated into FAVORannotator (https://github.com/zhouhufeng/FAVORannotator) to create an annotated GDS (aGDS) file by storing the genotype data and their functional annotation data in an all-in-one file. The aGDS file can then facilitate a wide range of functionally-informed downstream analyses.
Bakta database
zenodo.org
application/gzip +1
Updated Feb 23, 2023
Cite
Oliver Schwengers (2023). Bakta database [Dataset]. http://doi.org/10.5281/zenodo.4662588
Available download formats: application/gzip, json
Unique identifier
https://doi.org/10.5281/zenodo.4662588
Dataset updated
Feb 23, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Oliver Schwengers
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data repository contains the mandatory DB for Bakta (db.tar.gz).

Bakta is a tool for the rapid & standardized local annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis: https://github.com/oschwengers/bakta

This db provides protein sequence hash digests and lengths of UniProt's UniRef100/UniRef90 clusters for ultra-fast identification & lookups. It has been pre-annotated with several specialized db and enriched with Dbxrefs. All conducted pre-annotations are logged and provided in the db.log.gz file.
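
To make the idea of digest-based identification concrete, here is a minimal sketch of an exact-match lookup keyed on a sequence hash plus its length. It is only an illustration of the general technique: the use of MD5, the in-memory dictionary, the cluster labels, and the names seq_key and lookup are assumptions for this example and do not describe Bakta's actual database layout or API.

```python
import hashlib


def seq_key(protein_seq):
    """Return (hex digest, length) for an amino acid sequence."""
    seq = protein_seq.strip().upper()
    return hashlib.md5(seq.encode("ascii")).hexdigest(), len(seq)


# Hypothetical reference table: (digest, length) -> made-up cluster label.
reference = {
    seq_key("MKTAYIAKQR"): "cluster_A",
    seq_key("MSLNNPT"): "cluster_B",
}


def lookup(protein_seq):
    """Exact-match lookup: no alignment, only a hash and a dictionary access."""
    return reference.get(seq_key(protein_seq))


hit = lookup("mktayiakqr")    # same sequence, different case
miss = lookup("MKTAYIAKQQ")   # one residue differs, so the digest differs
print(hit, miss)
```

The trade-off is that a digest only matches identical sequences; anything that differs by even one residue falls through and would need a similarity search instead.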

External DB versions:

NCBI AMRFinderPlus: 2021-03-01

COG: 2020

DoriC: 10

ISFinder: 2019-09-25

Mob-suite: 2.0

Pfam: 34

RefSeq: r205

Rfam: 14.5

UniProtKB/Swiss-Prot: 2021_01

VFDB: 2021-04-05
Integrated Manually Extracted Annotation
neuinfo.org
scicrunch.org
Updated Apr 15, 2014
Cite
(2014). Integrated Manually Extracted Annotation [Dataset]. http://identifiers.org/RRID:SCR_008876
Unique identifier
https://identifiers.org/RRID:SCR_008876
Dataset updated
Apr 15, 2014
Description
A virtual database of annotations made by 50 database providers (April 2014) - and growing (see below), that map data to publication information. All NIF Data Federation sources can be part of this virtual database as long as they indicate the publications that correspond to data records. The format that NIF accepts is the PubMed Identifier, category or type of data that is being linked to, and a data record identifier. A subset of this data is passed to NCBI, as LinkOuts (links at the bottom of PubMed abstracts), however due to NCBI policies the full data records are not currently associated with PubMed records. Database providers can use this mechanism to link to other NCBI databases including gene and protein, however these are not included in the current data set at this time. (To view databases available for linking see, http://www.ncbi.nlm.nih.gov/books/NBK3807/#files.Databases_Available_for_Linking )

The categories that NIF uses have been standardized to the following types:
* Resource: Registry
* Resource: Software
* Reagent: Plasmid
* Reagent: Antibodies
* Data: Clinical Trials
* Data: Gene Expression
* Data: Drugs
* Data: Taxonomy
* Data: Images
* Data: Animal Model
* Data: Microarray
* Data: Brain connectivity
* Data: Volumetric observation
* Data: Value observation
* Data: Activation Foci
* Data: Neuronal properties
* Data: Neuronal reconstruction
* Data: Chemosensory receptor
* Data: Electrophysiology
* Data: Computational model
* Data: Brain anatomy
* Data: Gene annotation
* Data: Disease annotation
* Data: Cell Model
* Data: Chemical
* Data: Pathways

For more information refer to Create a LinkOut file, http://neuinfo.org/nif_components/disco/interoperation.shtm

Participating resources ( http://disco.neuinfo.org/webportal/discoLinkoutServiceSummary.do?id=4 ):
* Addgene http://www.addgene.org/pgvec1
* Animal Imaging Database http://aidb.crbs.ucsd.edu
* Antibody Registry http://www.neuinfo.org/products/antibodyregistry/
* Avian Brain Circuitry Database http://www.behav.org/abcd/abcd.php
* BAMS Connectivity http://brancusi.usc.edu/
* Beta Cell Biology Consortium http://www.betacell.org/
* bioDBcore http://biodbcore.org/
* BioGRID http://thebiogrid.org/
* BioNumbers http://bionumbers.hms.harvard.edu/
* Brain Architecture Management System http://brancusi.usc.edu/bkms/
* Brede Database http://hendrix.imm.dtu.dk/services/jerne/brede/
* Cell Centered Database http://ccdb.ucsd.edu
* CellML Model Repository http://www.cellml.org/models
* CHEBI http://www.ebi.ac.uk/chebi/
* Clinical Trials Network (CTN) Data Share http://www.ctndatashare.org/
* Comparative Toxicogenomics Database http://ctdbase.org/
* Coriell Cell Repositories http://ccr.coriell.org/
* CRCNS - Collaborative Research in Computational Neuroscience - Data sharing http://crcns.org
* Drug Related Gene Database https://confluence.crbs.ucsd.edu/display/NIF/DRG
* DrugBank http://www.drugbank.ca/
* FLYBASE http://flybase.org/
* Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/
* Gene Ontology Tools http://www.geneontology.org/GO.tools.shtml
* Gene Weaver http://www.GeneWeaver.org
* GeneDB http://www.genedb.org/Homepage
* Glomerular Activity Response Archive http://gara.bio.uci.edu
* GO http://www.geneontology.org/
* Internet Brain Volume Database http://www.cma.mgh.harvard.edu/ibvd/
* ModelDB http://senselab.med.yale.edu/modeldb/
* Mouse Genome Informatics Transgenes ftp://ftp.informatics.jax.org/pub/reports/MGI_PhenotypicAllele.rpt
* NCBI Taxonomy Browser http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html
* NeuroMorpho.Org http://neuromorpho.org/neuroMorpho
* NeuronDB http://senselab.med.yale.edu/neurondb
* SciCrunch Registry http://neuinfo.org/nif/nifgwt.html?tab=registry
* NIF Registry Automated Crawl Data http://lucene1.neuinfo.org/nif_resource/current/
* NITRC http://www.nitrc.org/
* Nuclear Receptor Signaling Atlas http://www.nursa.org
* Olfactory Receptor DataBase http://senselab.med.yale.edu/ordb/
* OMIM http://omim.org
* OpenfMRI http://openfmri.org
* PeptideAtlas http://www.peptideatlas.org
* RGD http://rgd.mcw.edu
* SFARI Gene: AutDB https://gene.sfari.org/autdb/Welcome.do
* SumsDB http://sumsdb.wustl.edu/sums/
* Temporal-Lobe: Hippocampal - Parahippocampal Neuroanatomy of the Rat http://www.temporal-lobe.com/
* The Cell: An Image Library http://www.cellimagelibrary.org/
* Visiome Platform http://platform.visiome.neuroinf.jp/
* WormBase http://www.wormbase.org
* YPED http://medicine.yale.edu/keck/nida/yped.aspx
* ZFIN http://zfin.org
Gene annotation of Blastobotrys mokoenaii, Blastobotrys illinoisensis, and...
figshare.scilifelab.se
researchdata.se
txt
Updated Mar 21, 2025
Cite
Jonas Ravn; Amanda Sörensen Ristinmaa; Scott Mazurkewich; Guilherme Borges Dias; Johan Larsbrink; Cecilia Geijer (2025). Gene annotation of Blastobotrys mokoenaii, Blastobotrys illinoisensis, and Blastobotrys malaysiensis [Dataset]. http://doi.org/10.17044/scilifelab.28606814.v1
Available download formats: txt
Unique identifier
https://doi.org/10.17044/scilifelab.28606814.v1
Dataset updated
Mar 21, 2025
Dataset provided by
Chalmers University of Technology
Authors
Jonas Ravn; Amanda Sörensen Ristinmaa; Scott Mazurkewich; Guilherme Borges Dias; Johan Larsbrink; Cecilia Geijer
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the gene annotation data for three species of Blastobotrys yeasts: B. mokoenaii, B. illinoisensis, and B. malaysiensis.

The genome assemblies for B. mokoenaii (NRRL Y-27120) and B. malaysiensis (NRRL Y-6417) were publicly available on the National Center for Biotechnology Information (NCBI) under accessions GCA_003705765.3 and GCA_030558815.1, respectively.

The genome assembly for B. illinoisensis (NRRL YB-1343) was generated by SciLifeLab's National Genomics Infrastructure (NGI) using PacBio long-read data and deposited in the European Nucleotide Archive (ENA) under accession GCA_965113335.1.

File description

bmokoenaii_annotation.gff
This file contains the gene models predicted for B. mokoenaii (GCA_003705765.3).

billinoisensis_annotation.gff
This file contains the gene models predicted for B. illinoisensis (GCA_965113335.1).

bmalaysiensis_annotation.gff
This file contains the gene models predicted for B. malaysiensis (GCA_030558815.1).

Gene annotation methods

Repeat Masking

Prior to annotation, a repeat library was built for each species using RepeatModeler2 v2.0.2 and the genomes were soft-masked using RepeatMasker v4.1.5.

$ RepeatModeler -database ${DB} -engine ncbi -pa 16
$ RepeatMasker -dir . -gff -u -no_is -xsmall -e ncbi -lib ${LIBRARY} -pa 16 genome.fasta

Structural Annotation

Structural annotation was performed on the soft-masked genomes using Braker3 v3.0.3 incorporating external evidence in the form of all fungal proteins from OrthoDB v11 (available at https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11).

$ braker.pl --genome="$genome" --prot_seq=${protein} --workingdir=${PWD} --gff3 --threads=16 --verbosity=3 --nocleanup --species=${i}

Functional Annotation

The predicted genes were functionally annotated using the National Bioinformatics Infrastructure Sweden (NBIS) functional_annotation nextflow pipeline v2.0.0 (https://github.com/NBISweden/pipelines-nextflow). Briefly, this pipeline performs similarity searches between the annotated proteins and the UniProtKB/Swiss-Prot database (downloaded on 2023-12) using the Basic Local Alignment Search Tool (BLAST). Then it uses InterProScan to query the proteins against InterPro v59-91 databases, and merges results using AGAT v1.2.0.

tRNAs and rRNAs

Transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were annotated using tRNAscan-SE v2.0.12 and barrnap v0.9, respectively. Other ncRNAs, such as SRP RNA, RNase P RNA, spliceosomal ncRNAs, etc., have not been predicted.

$ tRNAscan-SE -E --gff ${output}_trnas.gff --thread 16 ${genome}.fasta
$ barrnap --kingdom euk --threads 6 ${genome}.fasta > ${output}_rrna.gff

Annotation integration

Finally, the functionally annotated protein-coding genes, tRNAs, and rRNAs were combined into a single GFF file using AGAT v1.2.0.

$ agat_sp_complement_annotations.pl --ref ${protein_coding} --add ${trna} --add ${rrna} --out full_annotation.gff
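
A quick sanity check on a combined GFF file of this kind is to count how many features of each type it contains. The sketch below assumes standard nine-column, tab-separated GFF3 records; the sample records, their source and ID values, and the function name count_feature_types are hypothetical and are not taken from the deposited files.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 records by feature type (third column)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines, directives (##...) and comments (#...).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 record
        counts[fields[2]] += 1
    return counts


# Made-up records, only to show the expected input shape.
SAMPLE = [
    "##gff-version 3",
    "scaffold_1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tID=g1",
    "scaffold_1\tAUGUSTUS\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
    "scaffold_1\ttRNAscan-SE\ttRNA\t1500\t1572\t.\t-\t.\tID=trna1",
    "scaffold_2\tbarrnap\trRNA\t10\t1800\t.\t+\t.\tID=rrna1",
]
counts = count_feature_types(SAMPLE)
for feature_type, n in sorted(counts.items()):
    print(feature_type, n)
```

Passing an open file handle for one of the .gff files instead of SAMPLE would give the per-type totals for that species.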
Bacannot database
zenodo.org
data-staging.niaid.nih.gov
application/gzip
Updated Oct 16, 2023
Cite
Felipe Marques de Almeida (2023). Bacannot database [Dataset]. http://doi.org/10.5281/zenodo.7615812
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.7615812
Dataset updated
Oct 16, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Felipe Marques de Almeida
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This zipped tarball (.tar.gz) contains a pre-built database for Bacannot (https://github.com/fmalmeida/bacannot).
Files are in the naming convention YEAR_MONTH_DAY.
