100+ datasets found

RefSeq: NCBI Reference Sequence Database
data.virginia.gov
html
Updated Jun 18, 2025
National Library of Medicine (2025). RefSeq: NCBI Reference Sequence Database [Dataset]. https://data.virginia.gov/dataset/refseq-ncbi-reference-sequence-database
Available download formats: html
Dataset updated
Jun 18, 2025
Dataset provided by
National Library of Medicine
Description
A comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein.
Indexed NCBI nt database - original
figshare.unimelb.edu.au
bin
Updated Feb 28, 2024
VANESSA ROSSETTO MARCELINO (2024). Indexed NCBI nt database - original [Dataset]. http://doi.org/10.26188/25222610.v1
Available download formats: bin
Unique identifier
https://doi.org/10.26188/25222610.v1
Dataset updated
Feb 28, 2024
Dataset provided by
The University of Melbourne
Authors
VANESSA ROSSETTO MARCELINO
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Indexed NCBI nucleotide database, used to benchmark CCMetagen in its original publication. To download from the command line, use:
curl "https://mediaflux.researchsoftware.unimelb.edu.au:443/mflux/share.mfjp?_token=i8yedNiYfdjrBfGJ8Y5z1128247857&browser=true&filename=ncbi_nt_kma.zip" -d browser=false -o ncbi_nt_kma.zip
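A download of this kind can fail silently, for example when an error page is saved under the `.zip` name. The following is a minimal sketch, not part of the dataset record, of how the saved file might be checked before unpacking. The function name `check_zip` is hypothetical, and the file name is the one passed to `-o` in the command above.

```python
import zipfile
from pathlib import Path


def check_zip(path):
    """Return (ok, detail) for a downloaded zip archive.

    ok is False if the file is missing, is not a zip archive, or
    contains a member whose CRC check fails.
    """
    path = Path(path)
    if not path.is_file():
        return False, "file not found"
    if not zipfile.is_zipfile(path):
        # Typical for a saved error page or a truncated download.
        return False, "not a zip archive"
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the first bad name, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            return False, f"corrupt member: {bad_member}"
        return True, f"{len(archive.namelist())} members"


if __name__ == "__main__":
    # Assumes the archive was saved in the current directory.
    print(check_zip("ncbi_nt_kma.zip"))
```

Because `testzip()` reads the whole archive, the check may take a while on a database of this size.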
Data from: NCBI Taxonomy
gbif.org
Updated Feb 19, 2015
GBIF (2015). NCBI Taxonomy [Dataset]. http://doi.org/10.15468/rhydar
Unique identifier
https://doi.org/10.15468/rhydar
Dataset updated
Feb 19, 2015
Dataset provided by
Global Biodiversity Information Facility https://www.gbif.org/
National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/
Description
The NCBI taxonomy database is not a primary source for taxonomic or phylogenetic information. Furthermore, the database does not follow a single taxonomic treatise but rather attempts to incorporate phylogenetic and taxonomic knowledge from a variety of sources, including the published literature, web-based databases, and the advice of sequence submitters and outside taxonomy experts. Consequently, the NCBI taxonomy database is not a phylogenetic or taxonomic authority and should not be cited as such.
NCBI Genome Survey Sequences Database
neuinfo.org
Updated Sep 15, 2024
(2024). NCBI Genome Survey Sequences Database [Dataset]. http://identifiers.org/RRID:SCR_002146
Unique identifier
https://identifiers.org/RRID:SCR_002146
Dataset updated
Sep 15, 2024
Description
Database of unannotated, short, single-read, primarily genomic sequences from GenBank, including random survey sequences, clone-end sequences, and exon-trapped sequences. The GSS division of GenBank is similar to the EST division, with the exception that most of the sequences are genomic in origin, rather than cDNA (mRNA). It should be noted that two classes (exon trapped products and gene trapped products) may be derived via a cDNA intermediate. Care should be taken when analyzing sequences from either of these classes, as a splicing event could have occurred and the sequence represented in the record may be interrupted when compared to genomic sequence. The GSS division contains (but is not limited to) the following types of data:
* random single pass read genome survey sequences
* cosmid/BAC/YAC end sequences
* exon trapped genomic sequences
* Alu PCR sequences
* transposon-tagged sequences
Although dbGSS sequences are incorporated into the GSS Division of GenBank, annotation in dbGSS is more comprehensive and includes detailed information about the contributors, experimental conditions, and genetic map locations.
NCBI Virus
catalog.data.gov
Updated Jun 19, 2025
National Library of Medicine (2025). NCBI Virus [Dataset]. https://catalog.data.gov/dataset/ncbi-virus
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
NCBI Virus is an integrative, value-added resource designed to support retrieval, display and analysis of a curated collection of virus sequences and large sequence datasets. Its goal is to increase the usability of viral sequence data archived in GenBank and other NCBI repositories. It incorporates resources previously included in HIV-1, Human Protein Interaction Database, Influenza Virus Resource, and Virus Variation.
NCBI Protein Database
neuinfo.org
Updated Feb 1, 2001
(2001). NCBI Protein Database [Dataset]. http://identifiers.org/RRID:SCR_003257
Unique identifier
https://identifiers.org/RRID:SCR_003257
Dataset updated
Feb 1, 2001
Description
Databases of protein sequences and 3D structures of proteins. Collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
Data from: CottonGen: Cotton Database Resources
agdatacommons.nal.usda.gov
bin
Updated Nov 21, 2025
Jing Yu; Sook Jung; Chun-Huai Cheng; Stephen P. Ficklin; Taein Lee; Ping Zheng; Don Jones; Richard G. Percy; Dorrie Main (2025). CottonGen: Cotton Database Resources [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/CottonGen_Cotton_Database_Resources/24853203
Available download formats: bin
Dataset updated
Nov 21, 2025
Dataset provided by
MainLab, Washington State University
Authors
Jing Yu; Sook Jung; Chun-Huai Cheng; Stephen P. Ficklin; Taein Lee; Ping Zheng; Don Jones; Richard G. Percy; Dorrie Main
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Description
CottonGen (https://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data to enable basic, translational and applied research in cotton. Built using the open-source Tripal database infrastructure, CottonGen supersedes CottonDB and the Cotton Marker Database. It includes sequences, genetic and physical maps, genotypic and phenotypic markers and polymorphisms, quantitative trait loci (QTLs), pathogens, germplasm collections and trait evaluations, pedigrees, and relevant bibliographic citations, with enhanced tools for easier data sharing, mining, visualization, and data retrieval of cotton research data.

CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST.

This project is funded/supported by Cotton Incorporated, the USDA-ARS Crop Germplasm Research Unit at College Station, TX, the Southern Association of Agricultural Experiment Station Directors, Bayer CropScience, Corteva/Agriscience, Dow/Phytogen, Monsanto, Washington State University, and NRSP10.

Resources in this dataset:
Resource Title: Website Pointer for CottonGen. File Name: Web Page, url: https://www.cottongen.org/
Genomic, Genetic and Breeding Resources for Cotton Research Discovery and Crop Improvement organized by:

Species (Gossypium arboreum, barbadense, herbaceum, hirsutum, raimondii, others), Data (Contributors, Download, Submission, Community Projects, Archives, Cotton Trait Ontology, Nomenclatures, and links to Variety Testing Data and NCBI SRA Datasets), Search options (Colleague, Genes and Transcripts, Genotype, Germplasm, Map, Markers, Publications, QTLs, Sequences, Trait Evaluation, MegaSearch), Tools (BIMS, BLAST+, CottonCyc, JBrowse, Map Viewer, Primer3, Sequence Retrieval, Synteny Viewer), International Cotton Genome Initiative (ICGI), and Help sources (User manual, FAQs).

Also provides Quick Start links for Major Species and Tools.
Data from: NCBI Taxonomy
data.niaid.nih.gov
Updated Mar 29, 2021
TAXON (2021). NCBI Taxonomy [Dataset]. https://data.niaid.nih.gov/resources?id=ds_385ea4f5f9
Dataset updated
Mar 29, 2021
Dataset provided by
National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/
Authors
TAXON
Description
The NCBI Taxonomy database is a curated set of names and classifications for all organisms that are represented in the Entrez databases. The Taxonomy database attempts to incorporate phylogenetic and taxonomic knowledge from a variety of sources, including the published literature, web-based databases, and the advice of sequence submitters and outside taxonomy experts.
NCBI Gene
integbio.jp
Updated Jun 9, 2019
National Center for Biotechnology Information (2019). NCBI Gene [Dataset]. https://integbio.jp/dbcatalog/en/record/nbdc00073?jtpl=56
Dataset updated
Jun 9, 2019
Dataset provided by
National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/
License
http://www.ncbi.nlm.nih.gov/About/disclaimer.html
Description
The gene database provides information on gene sequence, structure, location, and function for annotated genes from the NCBI database. Users can search by accession ID or keyword, compare and identify sequences using BLAST, or submit references into function (RIFs) based on experimental results. Bulk download and an update mailing list are available.
Library LinkOut
catalog.data.gov
Updated Jun 19, 2025
National Library of Medicine (2025). Library LinkOut [Dataset]. https://catalog.data.gov/dataset/library-linkout
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
LinkOut is a service that allows you to link directly from PubMed and other NCBI databases to a wide range of information and services beyond the NCBI systems. LinkOut aims to facilitate access to relevant online resources in order to extend, clarify, and supplement information found in NCBI databases. Third parties can link directly from PubMed and other Entrez database records to relevant Web-accessible resources beyond the Entrez system. Includes full-text publications, biological databases, consumer health information and research tools.
NCBI BioSystems Database
dknet.org
Updated Jan 29, 2022
(2022). NCBI BioSystems Database [Dataset]. http://identifiers.org/RRID:SCR_004690
Unique identifier
https://identifiers.org/RRID:SCR_004690
Dataset updated
Jan 29, 2022
Description
Database that provides access to biological systems and their component genes, proteins, and small molecules, as well as literature describing those biosystems and other related data throughout Entrez. A biosystem, or biological system, is a group of molecules that interact directly or indirectly, where the grouping is relevant to the characterization of living matter. BioSystem records list and categorize components, such as the genes, proteins, and small molecules involved in a biological system. The companion FLink tool, in turn, allows you to input a list of proteins, genes, or small molecules and retrieve a ranked list of biosystems. A number of databases provide diagrams showing the components and products of biological pathways along with corresponding annotations and links to literature. This database was developed as a complementary project to (1) serve as a centralized repository of data; (2) connect the biosystem records with associated literature, molecular, and chemical data throughout the Entrez system; and (3) facilitate computation on biosystems data. The NCBI BioSystems Database currently contains records from several source databases: KEGG, BioCyc (including its Tier 1 EcoCyc and MetaCyc databases, and its Tier 2 databases), Reactome, the National Cancer Institute's Pathway Interaction Database, WikiPathways, and Gene Ontology (GO). It includes several types of records such as pathways, structural complexes, and functional sets, and is designed to accommodate other record types, such as diseases, as data become available. Through these collaborations, the BioSystems database facilitates access to, and provides the ability to compute on, a wide range of biosystems data. If you are interested in depositing data into the BioSystems database, please contact them.
Data from: CottonGen CottonCyc Pathways Database
agdatacommons.nal.usda.gov
bin
Updated Dec 18, 2023
Taein Lee; Sook Jung; Ksenija Gasic; Todd Campbell; Jing Yu; Jodi Humann; Heidi Hough; Dorrie Main (2023). CottonGen CottonCyc Pathways Database [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/CottonGen_CottonCyc_Pathways_Database/24853212
Available download formats: bin
Dataset updated
Dec 18, 2023
Dataset provided by
MainLab, Washington State University
Authors
Taein Lee; Sook Jung; Ksenija Gasic; Todd Campbell; Jing Yu; Jodi Humann; Heidi Hough; Dorrie Main
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Description
The CottonGen CottonCyc Pathways Database, part of CottonGen, supports searching and browsing the following CottonCyc databases:

Cyc pathways for JGI v2.0 G. raimondii D5 genome assembly

This Cyc database was constructed using PathwayTools version 20.0 using the gene models from the JGI v2.0 D5 genome assembly of Gossypium raimondii. There has been no manual curation of this Cyc database. Pathway predictions were made using PathwayTools and in-silico v2.1 annotations as provided by JGI.

Cyc pathways for CGP-BGI v1.0 G. hirsutum AD1 genome assembly

This Cyc database was constructed using PathwayTools version 20.0 using the gene models from the CGP-BGI v1.0 AD1 genome assembly of Gossypium hirsutum. There has been no manual curation of this Cyc database. Pathway predictions were made using PathwayTools and in-silico v1.0 annotations as provided by CGP-BGI.

Search parameters include genes, proteins, RNAs, compounds, reactions, pathways, growth media, and BLAST search.

Resources in this dataset:
Resource Title: Website Pointer to CottonGen CottonCyc Pathways Database. File Name: Web Page, url: http://ptools.cottongen.org/
Data from: COInr a comprehensive, non-redundant COI database from NCBI-nt...
zenodo.org
application/gzip
Updated May 5, 2023
Emese Meglecz (2023). COInr a comprehensive, non-redundant COI database from NCBI-nt and BOLD [Dataset]. http://doi.org/10.5281/zenodo.6555985
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.6555985
Dataset updated
May 5, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Emese Meglecz
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
COInr is a non-redundant, comprehensive database of COI sequences extracted from NCBI-nt and BOLD. It is not limited to a taxon, a gene region, or a taxonomic resolution. Sequences are dereplicated between databases and within taxa.

Each taxon has a unique taxonomic identifier (taxID), which is fundamental for avoiding ambiguous associations of homonyms and synonyms in the source database. TaxIDs form a coherent hierarchical system fully compatible with the NCBI taxIDs, which makes it possible to create full or ranked lineages.

COInr is a good starting point to create custom databases according to the users’ needs using mkCOInr scripts available at https://github.com/meglecz/mkCOInr.
It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for BLAST, QIIME, or RDP classifiers.
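The idea of deriving full or ranked lineages from hierarchical taxIDs can be sketched as a walk up a child-to-parent table. This is only an illustration under stated assumptions: the table, the taxIDs, the names, and the functions `full_lineage` and `ranked_lineage` are all made up here and are not COInr data or mkCOInr code.

```python
# Hypothetical taxonomy table: taxID -> (parent taxID, rank, name).
# These IDs and names are invented for illustration only.
TAXA = {
    1: (None, "root", "root"),
    10: (1, "kingdom", "Kingdom A"),
    20: (10, "phylum", "Phylum B"),
    25: (20, "no rank", "Clade X"),
    30: (25, "class", "Class C"),
    40: (30, "species", "Species d"),
}

# Ranks reported in a ranked lineage (an assumed, simplified set).
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def full_lineage(taxid, taxa=TAXA):
    """Return [(taxID, rank, name), ...] from the root down to taxid."""
    lineage = []
    seen = set()
    while taxid is not None:
        if taxid in seen:
            raise ValueError(f"cycle detected at taxID {taxid}")
        seen.add(taxid)
        parent, rank, name = taxa[taxid]
        lineage.append((taxid, rank, name))
        taxid = parent
    lineage.reverse()
    return lineage


def ranked_lineage(taxid, taxa=TAXA, ranks=RANKS):
    """Return one name per rank in `ranks`, or None where a rank is absent."""
    by_rank = {rank: name for _, rank, name in full_lineage(taxid, taxa)}
    return [by_rank.get(rank) for rank in ranks]


if __name__ == "__main__":
    print(full_lineage(40))
    print(ranked_lineage(40))
```

The full lineage keeps every node, including unranked clades, while the ranked lineage keeps only the chosen ranks and leaves gaps where the hierarchy has no node at that rank.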
NCBI.fungiDBselect.genomeonly.tar.gz
figshare.com
application/gzip
Updated Jul 12, 2016
Jeremy Cox (2016). NCBI.fungiDBselect.genomeonly.tar.gz [Dataset]. http://doi.org/10.6084/m9.figshare.3482825.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.3482825.v1
Dataset updated
Jul 12, 2016
Dataset provided by
Figshare http://figshare.com/
Authors
Jeremy Cox
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Custom genome-only database used with IMSA+A (https://github.com/JeremyCoxBMI/IMSA-A). It is built from the NCBI Genomes database and select FungiDB.org genomes.
Indexed NCBI nt database - without unclassified environmental sequences
figshare.unimelb.edu.au
bin
Updated Feb 28, 2024
VANESSA ROSSETTO MARCELINO (2024). Indexed NCBI nt database - without unclassified environmental sequences [Dataset]. http://doi.org/10.26188/25222598.v1
Available download formats: bin
Unique identifier
https://doi.org/10.26188/25222598.v1
Dataset updated
Feb 28, 2024
Dataset provided by
The University of Melbourne
Authors
VANESSA ROSSETTO MARCELINO
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Indexed NCBI nucleotide database that excludes environmental (unclassified) sequences, ready to use with KMA and CCMetagen. The database can be downloaded directly from the command line with:
curl "https://mediaflux.researchsoftware.unimelb.edu.au:443/mflux/share.mfjp?_token=ko6MbZXl7FWjAS3jsItV1128247851&browser=true&filename=ncbi_nt_no_env_11jun2019.zip" -d browser=false -o ncbi_nt_no_env_11jun2019.zip
Data from: CottonGen BLAST
agdatacommons.nal.usda.gov
bin
Updated Feb 13, 2024
Taein Lee; Sook Jung; Ksenija Gasic; Todd Campbell; Jing Yu; Jodi Humann; Heidi Hough; Dorrie Main (2024). CottonGen BLAST [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/CottonGen_BLAST/24853260
Available download formats: bin
Dataset updated
Feb 13, 2024
Dataset provided by
MainLab, Washington State University
Authors
Taein Lee; Sook Jung; Ksenija Gasic; Todd Campbell; Jing Yu; Jodi Humann; Heidi Hough; Dorrie Main
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Description
CottonGen offers BLAST with genome, transcriptome, peptide and marker sequence databases from Gossypium species. This can be done using nucleotide sequences or peptide sequences. BLAST functionality is similar to that on NCBI. Enter or upload FASTA sequence(s) to query and select BLAST database.

Resources in this dataset:
Resource Title: Website Pointer for CottonGen BLAST Search. File Name: Web Page, url: https://www.cottongen.org/blast

BLAST Programs:

blastn: Search a nucleotide database using a nucleotide query.
blastx: Search protein database using a translated nucleotide query.
tblastn: Search translated nucleotide database using a protein query.
blastp: Search protein database using a protein query.
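The four program descriptions amount to a lookup on the query type and the database type. A minimal sketch of that lookup follows. The function name `choose_blast_program` and the type labels are assumptions made for illustration. This is not CottonGen or NCBI code and it does not run a search.

```python
# (query type, database type) -> BLAST program, following the list above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",   # query is translated
    ("protein", "nucleotide"): "tblastn",  # database is translated
    ("protein", "protein"): "blastp",
}


def choose_blast_program(query_type, database_type):
    """Return the program name for a query/database type pair.

    Both arguments must be "nucleotide" or "protein" (case-insensitive).
    """
    key = (query_type.strip().lower(), database_type.strip().lower())
    try:
        return BLAST_PROGRAMS[key]
    except KeyError:
        raise ValueError(
            f"unsupported combination: query={query_type!r}, database={database_type!r}"
        ) from None


if __name__ == "__main__":
    # A nucleotide query against a protein database needs translation of the query.
    print(choose_blast_program("nucleotide", "protein"))
```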
Data_Sheet_1_Contamination in Reference Sequence Databases: Time for...
frontiersin.figshare.com
pdf
Updated Jun 8, 2023
Valérian Lupo; Mick Van Vlierberghe; Hervé Vanderschuren; Frédéric Kerff; Denis Baurain; Luc Cornet (2023). Data_Sheet_1_Contamination in Reference Sequence Databases: Time for Divide-and-Rule Tactics.pdf [Dataset]. http://doi.org/10.3389/fmicb.2021.755101.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fmicb.2021.755101.s001
Dataset updated
Jun 8, 2023
Dataset provided by
Frontiers Media http://www.frontiersin.org/
Authors
Valérian Lupo; Mick Van Vlierberghe; Hervé Vanderschuren; Frédéric Kerff; Denis Baurain; Luc Cornet
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Contaminating sequences in public genome databases is a pervasive issue with potentially far-reaching consequences. This problem has attracted much attention in the recent literature and many different tools are now available to detect contaminants. Although these methods are based on diverse algorithms that can sometimes produce widely different estimates of the contamination level, the majority of genomic studies rely on a single method of detection, which represents a risk of systematic error. In this work, we used two orthogonal methods to assess the level of contamination among National Center for Biotechnological Information Reference Sequence Database (RefSeq) bacterial genomes. First, we applied the most popular solution, CheckM, which is based on gene markers. We then complemented this approach by a genome-wide method, termed Physeter, which now implements a k-folds algorithm to avoid inaccurate detection due to potential contamination of the reference database. We demonstrate that CheckM cannot currently be applied to all available genomes and bacterial groups. While it performed well on the majority of RefSeq genomes, it produced dubious results for 12,326 organisms. Among those, Physeter identified 239 contaminated genomes that had been missed by CheckM. In conclusion, we emphasize the importance of using multiple methods of detection while providing an upgrade of our own detection tool, Physeter, which minimizes incorrect contamination estimates in the context of unavoidably contaminated reference databases.
NCBI BioProject
scicrunch.org
Updated Dec 4, 2023
(2023). NCBI BioProject [Dataset]. http://identifiers.org/RRID:SCR_004801
Unique identifier
https://identifiers.org/RRID:SCR_004801
Dataset updated
Dec 4, 2023
Description
Database of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. It is a searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms. Submissions are supported by a web-based Submission Portal. The database facilitates organization and classification of project data submitted to NCBI, EBI and DDBJ databases. It captures descriptive information about research projects that result in high volume submissions to archival databases, ties together related data across multiple archives, and serves as a central portal by which to inform users of data availability. BioProject records link to corresponding data stored in archival repositories. The BioProject resource is a redesigned, expanded replacement of the NCBI Genome Project resource. The redesign adds tracking of several data elements including more precise information about a project's scope, material, and objectives. Genome Project identifiers are retained in the BioProject as the ID value for a record, and an Accession number has been added. Database content is exchanged with other members of the International Nucleotide Sequence Database Collaboration (INSDC). BioProject is accessible via FTP.
kraken2 database of marine animal genomes, for host decontamination
zenodo.org
application/gzip
Updated Dec 11, 2025
Angelina Angelova (2025). kraken2 database of marine animal genomes, for host decontamination [Dataset]. http://doi.org/10.5281/zenodo.17873185
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.17873185
Dataset updated
Dec 11, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Angelina Angelova
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
kraken2 database of common marine animal hosts in marine metagenomic dataset. Used in Nephele pipelines for decontamination of metagenomic datasets from common marine animal host reads (database inclusive of human genome).

Content of assemblies:

Homo sapiens (GRCh38.p13),
Conus ventricosus (ASM1839881v1),
Crassostrea virginica (C_virginica-3.0),
Crassostrea gigas (cgigas_uk_roslin_v1),
Mytilus galloprovincialis (MGAL_10),
Octopus sinensis (ASM634580v1),
Paraescarpia echinospica (HKBU_Pec_v1),
Streblospio benedicti (ASM1909598v1),
Hyalella azteca (Hazt_2.0.2),
Amphibalanus amphitrite (NRLGWU_Aamphi_draft),
Paramacrobiotus sp. TYO (Prichtersi_v1.0),
Hypsibius dujardini (ASM157998v1),
Lytechinus pictus (UCSD_Lpic_2.0),
Strongylocentrotus purpuratus (Spur_5.0),
Apostichopus parvimensis (Ppar_1.0),
Hydra vulgaris (Hydra_105_v3),
Hydra viridissima (ASM1470644v1),
Pocillopora damicornis (ASM370409v1),
Amphimedon queenslandica (assembly v1.0)
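Host decontamination with a database like this usually means discarding reads that classify against the host genomes and keeping the rest. The sketch below shows one way that split could be made from kraken2's standard per-read output, a tab-separated file whose first column is `C` (classified) or `U` (unclassified) and whose second column is the read ID. The function name `split_read_ids` and the sample lines are hypothetical. This is not the Nephele pipeline's implementation.

```python
def split_read_ids(kraken_lines):
    """Split read IDs into (host, non_host) lists.

    Reads classified against a host-only database ("C") are treated as
    host reads. Unclassified reads ("U") are the ones to keep.
    """
    host, non_host = [], []
    for line in kraken_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"unexpected line: {line!r}")
        status, read_id = fields[0], fields[1]
        if status == "C":
            host.append(read_id)
        elif status == "U":
            non_host.append(read_id)
        else:
            raise ValueError(f"unexpected status {status!r} for read {read_id!r}")
    return host, non_host


if __name__ == "__main__":
    # Made-up example lines: status, read ID, taxID, length, k-mer mapping.
    sample = [
        "C\tread1\t9606\t150\t9606:116",
        "U\tread2\t0\t150\t0:116",
    ]
    host_ids, kept_ids = split_read_ids(sample)
    print("host reads:", host_ids)
    print("reads to keep:", kept_ids)
```

The kept IDs could then be used to subset the original FASTQ files with a tool of choice.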
NCBI Nucleotide
neuinfo.org
Updated Feb 1, 2001
(2001). NCBI Nucleotide [Dataset]. http://identifiers.org/RRID:SCR_004860
Unique identifier
https://identifiers.org/RRID:SCR_004860
Dataset updated
Feb 1, 2001
Description
Database of nucleotide sequences from several sources, including GenBank, RefSeq, TPA and PDB. Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.
