68 datasets found

NCBI Structure
rrid.site
dknet.org
+2more
Updated Nov 30, 2025
Cite
(2025). NCBI Structure [Dataset]. http://identifiers.org/RRID:SCR_004218
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_004218
Dataset updated
Nov 30, 2025
Description
Database of three-dimensional structures of macromolecules that allows the user to retrieve structures for specific molecule types as well as structures for genes and proteins of interest. Three main databases comprise Structure: the Molecular Modeling Database; Conserved Domains and Protein Classification; and the BioSystems Database. Structure also links to the PubChem databases to connect biological activity data to the macromolecular structures. Users can locate structural templates for proteins and interactively view structures and sequence data to closely examine sequence-structure relationships.
* Macromolecular structures: The three-dimensional structures of biomolecules provide a wealth of information on their biological function and evolutionary relationships. The Molecular Modeling Database (MMDB), as part of the Entrez system, facilitates access to structure data by connecting them with associated literature, protein and nucleic acid sequences, chemicals, biomolecular interactions, and more. It is possible, for example, to find 3D structures for homologs of a protein of interest by following the Related Structure link in an Entrez Protein sequence record.
* Conserved domains and protein classification: Conserved domains are functional units within a protein that act as building blocks in molecular evolution and recombine in various arrangements to make proteins with different functions. The Conserved Domain Database (CDD) brings together several collections of multiple sequence alignments representing conserved domains, in addition to NCBI-curated domains that use 3D-structure information explicitly to define domain boundaries and provide insights into sequence/structure/function relationships.
* Small molecules and their biological activity: The PubChem project provides information on the biological activities of small molecules and is a component of NIH's Molecular Libraries Roadmap Initiative. PubChem includes three databases: PCSubstance, PCBioAssay, and PCCompound. The PubChem data are linked to other data types in the Entrez system, making it possible, for example, to retrieve information about a compound and then link to its biological activity data, retrieve 3D protein structures bound to the compound and interactively view their active sites, and find biosystems that include the compound as a component.
* Biological Systems: A biosystem, or biological system, is a group of molecules that interact directly or indirectly, where the grouping is relevant to the characterization of living matter. The NCBI BioSystems Database provides centralized access to biological pathways from several source databases and connects the biosystem records with associated literature, molecular, and chemical data throughout the Entrez system. BioSystem records list and categorize components, such as the genes, proteins, and small molecules involved in a biological system. The companion FLink tool, in turn, allows you to input a list of proteins, genes, or small molecules and retrieve a ranked list of biosystems.
Molecular Modeling DataBase
neuinfo.org
dknet.org
+2more
Updated Jan 29, 2022
+ more versions
Cite
(2022). Molecular Modeling DataBase [Dataset]. http://identifiers.org/RRID:SCR_010623
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_010623
Dataset updated
Jan 29, 2022
Description
The Molecular Modeling DataBase (MMDB), also known as Entrez Structure, is a database of experimentally determined structures obtained from the RCSB Protein Data Bank (PDB). MMDB is developed by the Structure Group of the NCBI Computational Biology Branch. The data processing procedure at NCBI results in the addition of a number of useful features that facilitate computation on the data and link them to many other data types in the Entrez system. The structure database is considerably smaller than Entrez's Protein or Nucleotide databases, but a large fraction of all known protein sequences have homologs in this set, and one may often learn more about a protein by examining 3-D structures of its homologs. These are accessible as Related Structures in the Links menu of Entrez Protein sequence records. It is then possible to align the query protein to the structure-based sequence. Additional resources can be used along with MMDB to interactively view the structures, find similar 3D structures, learn about the types of interactions and bound chemicals that have been found to exist among the similar 3D structures, and more.
Conserved Domain Database at NCBI
bioregistry.io
Updated Feb 12, 2023
Cite
(2023). Conserved Domain Database at NCBI [Dataset]. http://identifiers.org/re3data:r3d100012041
Explore at:
Unique identifier
https://identifiers.org/re3data:r3d100012041
Dataset updated
Feb 12, 2023
Description
The Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution.
NCBI Protein Database
rrid.site
neuinfo.org
+2more
Updated Jan 29, 2022
Cite
(2001). NCBI Protein Database [Dataset]. http://identifiers.org/RRID:SCR_003257
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_003257
Dataset updated
Jan 29, 2022
Description
Databases of protein sequences and 3D structures of proteins. Collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
NCBI dbRBC
scicrunch.org
neuinfo.org
+1more
Updated Jul 1, 2002
+ more versions
Cite
(2002). NCBI dbRBC [Dataset]. http://identifiers.org/RRID:SCR_005959
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_005959
Dataset updated
Jul 1, 2002
Description
The dbRBC database provides an open, publicly accessible platform for DNA and clinical data related to human red blood cells (RBCs). A new bioinformatics resource, dbRBC, has been installed at the National Center for Biotechnology Information (NCBI). This resource combines the well-established Blood Group Antigen Gene Mutation Database (BGMUT) with tools and interlinked resources developed at the NCBI. The main task of dbRBC is to provide access to publicly available genomic, protein and structural information linked to the red blood cell antigens. The site offers a number of resources:
* BGMUT Database
* Alignment Viewer
* SBT Tool
* Probe/Primer Resource
* Typing Kit Interface
* Obstacle
NCBI Gene
integbio.jp
bioregistry.io
Updated Jun 9, 2019
Cite
National Center for Biotechnology Information (2019). NCBI Gene [Dataset]. https://integbio.jp/dbcatalog/en/record/nbdc00073?jtpl=56
Explore at:
Dataset updated
Jun 9, 2019
Dataset provided by
National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
License
http://www.ncbi.nlm.nih.gov/About/disclaimer.html
Description
The gene database provides information on gene sequence, structure, location, and function for annotated genes from the NCBI database. Users can search by accession ID or keyword, compare and identify sequences using BLAST, or submit Gene References into Function (GeneRIFs) based on experimental results. Bulk download and an update mailing list are available.
Data from: PROSITE
prosite.expasy.org
identifiers.org
+7more
Updated Oct 15, 2025
Cite
(2025). PROSITE [Dataset]. https://prosite.expasy.org/
Explore at:
Dataset updated
Oct 15, 2025
Description
PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
PubChem BioAssay
dknet.org
scicrunch.org
+1more
Updated Oct 11, 2024
+ more versions
Cite
(2024). PubChem BioAssay [Dataset]. http://identifiers.org/RRID:SCR_010734
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_010734
Dataset updated
Oct 11, 2024
Description
Data and information collection and repository for biological activities of small molecules and small interfering RNAs (siRNAs) hosted by the US National Institutes of Health (NIH). Used to select and summarize the bioactivities of tested substances.
Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific...
figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated May 30, 2023
Cite
Andrew F. Neuwald; Stephen F. Altschul (2023). Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties [Dataset]. http://doi.org/10.1371/journal.pcbi.1004936
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pcbi.1004936
Dataset updated
May 30, 2023
Dataset provided by
PLOS Computational Biology
Authors
Andrew F. Neuwald; Stephen F. Altschul
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We describe a Bayesian Markov chain Monte Carlo (MCMC) sampler for protein multiple sequence alignment (MSA) that, as implemented in the program GISMO and applied to large numbers of diverse sequences, is more accurate than the popular MSA programs MUSCLE, MAFFT, Clustal-Ω and Kalign. Features of GISMO central to its performance are: (i) It employs a “top-down” strategy with a favorable asymptotic time complexity that first identifies regions generally shared by all the input sequences, and then realigns closely related subgroups in tandem. (ii) It infers position-specific gap penalties that favor insertions or deletions (indels) within each sequence at alignment positions in which indels are invoked in other sequences. This favors the placement of insertions between conserved blocks, which can be understood as making up the proteins’ structural core. (iii) It uses a Bayesian statistical measure of alignment quality based on the minimum description length principle and on Dirichlet mixture priors. Consequently, GISMO aligns sequence regions only when statistically justified. This is unlike methods based on the ad hoc, but widely used, sum-of-the-pairs scoring system, which will align random sequences. (iv) It defines a system for exploring alignment space that provides natural avenues for further experimentation through the development of new sampling strategies for more efficiently escaping from suboptimal traps. GISMO’s superior performance is illustrated using 408 protein sets containing, on average, 235 sequences. These sets correspond to NCBI Conserved Domain Database alignments, which have been manually curated in the light of available crystal structures, and thus provide a means to assess alignment accuracy. GISMO fills a different niche than other MSA programs, namely identifying and aligning a conserved domain present within a large, diverse set of full length sequences. The GISMO program is available at http://gismo.igs.umaryland.edu/.
MIPModDB
neuinfo.org
scicrunch.org
Updated Nov 14, 2011
+ more versions
Cite
(2011). MIPModDB [Dataset]. http://identifiers.org/RRID:SCR_006058
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_006058
Dataset updated
Nov 14, 2011
Description
This is a database of comparative protein structure models of the MIP (Major Intrinsic Protein) family of proteins. Nearly complete sets of MIPs have been identified from the completed genome sequences of organisms available at NCBI. The structural models of MIP proteins were created using a defined protocol. The database aims to provide key information on MIPs, based in particular on sequence as well as structure. This will further help to decipher the function of uncharacterized MIPs. For each MIP entry, this database contains information about the source, gene structure, sequence features, substitutions in the conserved NPA motifs, structural model, the residues forming the selectivity filter and channel radius profile. For a selected set of MIPs, it is possible to derive structure-based sequence alignments and evolutionary relationships. Sequences and structures of selected MIPs can be downloaded from the MIPModDB database.
Links to published DMSP-dependent protein structures for the apoenzyme DmdA...
bco-dmo.org
dataone.org
csv
Updated Nov 19, 2012
Cite
Mary Ann Moran; Ronald P. Kiene; William Whitman (2012). Links to published DMSP-dependent protein structures for the apoenzyme DmdA from Pelagibacter ubique at NCBI's MMDB (En-Gen DMSP Cycling project) [Dataset]. https://www.bco-dmo.org/dataset/3784
Explore at:
Available download formats: csv (542 bytes)
Dataset updated
Nov 19, 2012
Dataset provided by
Biological and Chemical Data Management Office
Authors
Mary Ann Moran; Ronald P. Kiene; William Whitman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
taxon, PDB_ID, strain, MMDB_ID, protein_name
Description
Links are provided to published protein structures for the apoenzyme DmdA from Pelagibacter ubique, as well as for DmdA co-crystals soaked with substrate DMSP or the cofactor tetrahydrofolate (THF) accessible via NCBI's Molecular Modeling Database (MMDB).

Experimental design, methods, and results are further described in:
D. J. Schuller, C. R. Reisch, M. A. Moran, W. B. Whitman, and W. N. Lanzilotta (2012). Structures of dimethylsulfoniopropionate-dependent demethylase from the marine organism Pelegabacter ubique. Protein Science, vol. 21, p. 289. doi: 10.1002/pro.2015
NCBI Assembly Archive Viewer
rrid.site
Updated Jan 29, 2022
Cite
(2022). NCBI Assembly Archive Viewer [Dataset]. http://identifiers.org/RRID:SCR_012917
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_012917
Dataset updated
Jan 29, 2022
Description
Database providing information on structure of assembled genomes, assembly names and other meta-data, statistical reports, and links to genomic sequence data. The Archive links the raw sequence information found in the Trace Archive with assembly information found in publicly available sequence repositories (GenBank/EMBL/DDBJ).
NCBI hexapod transcriptomes.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 1, 2023
Cite
Hollister W. Herhold; Steven R. Davis; David A. Grimaldi (2023). NCBI hexapod transcriptomes. [Dataset]. http://doi.org/10.1371/journal.pone.0234272.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0234272.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Hollister W. Herhold; Steven R. Davis; David A. Grimaldi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NCBI hexapod transcriptomes.
SDAP: Structural Database of Allergenic Proteins
scicrunch.org
Updated Oct 17, 2019
Cite
(2019). SDAP: Structural Database of Allergenic Proteins [Dataset]. http://identifiers.org/RRID:SCR_012806
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_012806
Dataset updated
Oct 17, 2019
Description
A database of allergenic proteins. It contains various computational tools that can assist structural biology studies related to allergens. SDAP is an important tool in the investigation of the cross-reactivity between known allergens, in testing the FAO/WHO allergenicity rules for new proteins, and in predicting the IgE-binding potential of genetically modified food proteins. Using this Internet service through a browser, it is possible to retrieve information related to an allergen from the most common protein sequence and structure databases (SwissProt, PIR, NCBI, PDB), to find sequence and structural neighbors for an allergen, and to search for the presence of an epitope over the whole collection of allergens.
Identification of high-risk missense SNPs of the human PC.
plos.figshare.com
xlsx
Updated Nov 28, 2023
Cite
Mahvash Farajzadeh-Dehkordi; Ladan Mafakher; Abbas Harifi; Fatemeh Samiee-Rad; Babak Rahmani (2023). Identification of high-risk missense SNPs of the human PC. [Dataset]. http://doi.org/10.1371/journal.pone.0294417.s006
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0294417.s006
Dataset updated
Nov 28, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Mahvash Farajzadeh-Dehkordi; Ladan Mafakher; Abbas Harifi; Fatemeh Samiee-Rad; Babak Rahmani
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Identification of high-risk missense SNPs of the human PC.
Data from: dbVar
rrid.site
neuinfo.org
+1more
Updated Jul 4, 2014
+ more versions
Cite
(2014). dbVar [Dataset]. http://identifiers.org/RRID:SCR_003219
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_003219
Dataset updated
Jul 4, 2014
Description
Structural variation database designed to store data on variant DNA ≥ 1 bp in size from all organisms. Associations of defined variants with phenotype information are also provided. Users can browse data containing the number of variant calls from each study, and filter studies by organism, study type, method and genomic variant. Organisms include human, mouse, cattle and several additional animals.
Human Intermediate Filament Database
dknet.org
rrid.site
+2more
Updated Jan 29, 2022
+ more versions
Cite
(2022). Human Intermediate Filament Database [Dataset]. http://identifiers.org/RRID:SCR_007744
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007744
Dataset updated
Jan 29, 2022
Description
The Human Intermediate Filament Database is a continuously updated review of the intermediate filament field. It is hoped that users will contribute to the development and expansion of the database on a regular basis. Contributions may include novel variants and new patients with previously discovered sequence and allelic variants. Suggestions on ways to improve the database are also welcome. The entire database can be searched through the Browse and Search options. A number of different parameters can be used to search the database including unique identifier, intermediate filament, disease DNA variations, amino acid variations, domain, date accepted, author and abstract. Output from the search is returned in a table containing all the pertinent cross-referenced information. Multiple sequence alignment can also be performed via the CLUSTALW program to determine cDNA or protein sequence conservation. The database is linked to multiple other resources including NCBI RefSeq, PDB, OMIM, UCSC genome browser, NCBI Gene, HomoloGene, PubMed and HGNC. In the case of HGNC, reciprocal links from HGNC back to the Human Intermediate Filament Database are also available. Because the Human Intermediate Filament Database is protein-centric and HGNC is gene-centric, an HGNC record will potentially link to multiple records in this database when alternative splicing is present. In such an event, the Human Intermediate Filament Database will present to the user a list of all the protein records resulting from the HGNC gene record. The database uses Jalview and Jmol applets for the visualization of multiple sequence alignments and structures, respectively. The database contains information on disease phenotypes of a variety of different intermediate filament related diseases.
List of "26 high-risk missense SNPs of human PC" identified by six in...
plos.figshare.com
xls
Updated Nov 28, 2023
Cite
Mahvash Farajzadeh-Dehkordi; Ladan Mafakher; Abbas Harifi; Fatemeh Samiee-Rad; Babak Rahmani (2023). List of "26 high-risk missense SNPs of human PC" identified by six in silico programs. [Dataset]. http://doi.org/10.1371/journal.pone.0294417.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0294417.t001
Dataset updated
Nov 28, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Mahvash Farajzadeh-Dehkordi; Ladan Mafakher; Abbas Harifi; Fatemeh Samiee-Rad; Babak Rahmani
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
List of "26 high-risk missense SNPs of human PC" identified by six in silico programs.
Data from: Unlocking natural history collections to improve eDNA reference...
search.dataone.org
datadryad.org
Updated Oct 17, 2025
Cite
Sarah Schmid; Nicolas Straube; Camille Albouy; Bo Delling; James Maclaine; Michael Matschiner; Peter Rask Møller; Annamaria Nocita; Anja Palandačić; Lukas Rüber; Moritz Sonnewald; Nadir Alvarez; Stéphanie Manel; Loïc Pellissier (2025). Unlocking natural history collections to improve eDNA reference databases and biodiversity monitoring [Dataset]. http://doi.org/10.5061/dryad.0zpc8677g
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.0zpc8677g
Dataset updated
Oct 17, 2025
Dataset provided by
Dryad Digital Repository
Authors
Sarah Schmid; Nicolas Straube; Camille Albouy; Bo Delling; James Maclaine; Michael Matschiner; Peter Rask Møller; Annamaria Nocita; Anja Palandačić; Lukas Rüber; Moritz Sonnewald; Nadir Alvarez; Stéphanie Manel; Loïc Pellissier
Description
Biodiversity changes due to human activities highlight the need for efficient biodiversity monitoring approaches. Environmental DNA (eDNA) metabarcoding offers a non-invasive method used for biodiversity monitoring and ecosystem assessment, but its accuracy depends on comprehensive DNA reference databases. Natural history collections often contain rare or difficult-to-obtain samples that can serve as a valuable resource to fill gaps in eDNA reference databases. Here, we discuss the utility of specimens from natural history collections in supporting future eDNA applications. Museomics, the application of -omics techniques to museum specimens, offers a promising avenue for improving eDNA reference databases by increasing species coverage. Furthermore, museomics can provide transferable methodological advancements for extracting genetic material from samples with low and degraded DNA. The integration of natural history collections, museomics, and eDNA approaches has the potential to signific...

Dataset for analyzing the potential of museum specimens to improve the DNA reference database

To examine the cumulative number of species sequenced for a given DNA barcode/mitochondrial genome (also referred to as mitogenome) over the years, we retrieved all data available from NCBI using the R package rentrez v1.2.3 (Winter 2017). We searched the nucleotide database for the rRNA 12S, rRNA 16S, rRNA 18S, cytochrome B (cytB), cytochrome oxidase I (COI) barcodes, as well as for the complete mitogenomes for all fish orders. In addition, we also retrieved all the fish species with available data on the sequence read archive (SRA) using the Entrez Direct (Kans 2024), which provides access to the NCBI databases from a Unix terminal window.

To highlight the potential of museum specimens for increasing the number of species with an available barcode/mitogenome sequence, we first downloaded all available datasets on the Global Biodiversity Information Facility (GBIF) listing fish specimens store...

Unlocking natural history collections to improve eDNA reference databases and biodiversity monitoring

Description of the data and file structure

The dataset consists of a main folder, data.zip.

Various

kit_custom_prices.xlsx - price estimate for DNA extraction and ssDNA library prep using a commercial kit or the custom protocol from Nicolas Straube.

barcodes_data

output from the cumul_barcodes_plot.R script.

species_with_barcodes.csv - list of all fishes (marine + freshwater) with a given barcode available, according to NCBI. (1) species name, (2) NCBI taxon ID, (3) date when the species sequence was first uploaded on NCBI, (4) marker of interest, (5) year the species sequence was first uploaded on NCBI.

occurence_data

contains a different type of list of species (museum, 12S availability, etc.)

combined_gbif_species.csv - output from the script museum_potential/1_process_gbif_datasets.R. Contains all the species of fish found in the main natural ...,
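
The five fields listed for species_with_barcodes.csv are enough to rebuild a cumulative count of species per marker over the years. The sketch below is a Python illustration only; the dataset's own plot comes from the R script cumul_barcodes_plot.R, which is not reproduced here. The function name, the tuple layout, and the sample rows (including their dates) are assumptions made for the example, and the real file may order or label its columns differently.

```python
from collections import defaultdict


def cumulative_species_by_year(rows):
    """Return {marker: [(year, cumulative species count), ...]}.

    Each row is assumed to be a 5-tuple in the order described for
    species_with_barcodes.csv: (species name, NCBI taxon ID,
    first upload date, marker, first upload year).
    """
    # Keep only the earliest year per (marker, species) pair so that a
    # species is counted once per marker.
    first_year = {}
    for species, _taxon_id, _date, marker, year in rows:
        key = (marker, species)
        year = int(year)
        if key not in first_year or year < first_year[key]:
            first_year[key] = year

    # Count how many species first appeared in each year, per marker.
    per_marker = defaultdict(lambda: defaultdict(int))
    for (marker, _species), year in first_year.items():
        per_marker[marker][year] += 1

    # Turn yearly counts into running totals.
    result = {}
    for marker, counts in per_marker.items():
        total = 0
        series = []
        for year in sorted(counts):
            total += counts[year]
            series.append((year, total))
        result[marker] = series
    return result


# Made-up rows for illustration; these are not taken from the dataset.
rows = [
    ("Salmo trutta", "8032", "2001-05-01", "12S", "2001"),
    ("Esox lucius", "8010", "2003-02-11", "12S", "2003"),
    ("Perca fluviatilis", "8168", "2003-09-30", "12S", "2003"),
    ("Salmo trutta", "8032", "2005-07-20", "COI", "2005"),
]
print(cumulative_species_by_year(rows))
```

Rows read from the CSV with Python's csv module could be passed in directly once the header row, if any, is skipped.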
Datasets - Unveiling Host-Parasite Relationships through Conserved MITEs in...
zenodo.org
observatorio-cientifico.ua.es
Updated Aug 29, 2024
Cite
ANA BELEN MARTIN CUADRADO (2024). Datasets - Unveiling Host-Parasite Relationships through Conserved MITEs in Prokaryote and Viral Genomes [Dataset]. http://doi.org/10.5281/zenodo.12572003
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.12572003
Dataset updated
Aug 29, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
ANA BELEN MARTIN CUADRADO
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Title:

Unveiling Host-Parasite Relationships through Conserved MITEs in Prokaryote and Viral Genomes

Authors:

Francisco Nadal-Molero⁽¹⁾, Riccardo Roselli⁽¹⁾, Silvia Garcia-Juan⁽¹⁾, Alicia Campos-Lopez⁽¹⁾, Ana-Belen Martin-Cuadrado⁽¹*⁾

SUPPLEMENTARY FILES

Supplementary File S1. Sequences of cMITEs detected in Bacteria genomes (fasta format). The hosting microbial species and inferred NCBI-taxonomy are indicated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage”.

Supplementary File S2. Sequences of cMITEs detected in the Archaea genomes (fasta format). The hosting microbial species and inferred NCBI-taxonomy are indicated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage”.

Supplementary File S3. Sequences of vMITEs detected in the virus sequences from the NCBI and IMG/VR v.4.1 database (fasta format). Virus, microbial host (if known) and inferred NCBI-taxonomy is stated in the name of each sequence. The structure of the MITE name is:

“Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Virus|Name|Host”.

Supplementary File S4. Sequences of si-vMITEs detected in the virus sequences from the NCBI and IMG/VR v.4.1 database (fasta format). Virus, microbial host (if known) and inferred NCBI-taxonomy are stated in the name of each sequence. The structure of the MITE name is: “Accession|Genome|start|end|Ident.Method.by.DB|Host”.

Supplementary Files S5. Cytoscape networks. (A) Figure 1A, (B) Figure 1B.

Supplementary File S6. Sequences of cMITEs obtained from 5837 genomes of Neisseriales. The structure of the MITE name is:

“Accession|NucleotideID|start|end|TSD|TIRlength|MITETracker_group|Genome|Lineage”.

Supplementary File S7. Sequences of si-vMITEs obtained from 5837 genomes of Neisseriales. The structure of the MITE name is: “Accession|Genome|start|end|Host”.

Supplementary File S8. Sequences of cMITEs obtained from 46051 genomes of Bacteroidota. The structure of the MITE name is:

“Accession|NucleotideID|start|end|TSD|TIRlength|MITETracker_group|Genome|Lineage”.

Supplementary File S9. Sequences of si-vMITEs obtained from 46051 genomes of Bacteroidota. The structure of the MITE name is: “Accession|Genome|start|end|Host”.
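
The pipe-delimited MITE names described for Supplementary Files S1 and S2 can be split into named fields. The following is a minimal Python sketch, assuming that start, end and TIRlength are integers and that only the final Lineage field could contain further "|" characters; the function name, the field keys, and the sample header are hypothetical and not taken from the files.

```python
# Field order follows the name structure given for Supplementary Files S1/S2:
# "Accession|Genome|start|end|TSD|TIRlength|MITETracker_group|Lineage"
FIELDS = (
    "accession",
    "genome",
    "start",
    "end",
    "tsd",
    "tir_length",
    "mitetracker_group",
    "lineage",
)


def parse_mite_name(name):
    """Split a cMITE sequence name into a dict of named fields."""
    # Drop a leading FASTA ">" and surrounding whitespace, then split on the
    # first seven pipes so any extra pipes stay inside the lineage field.
    parts = name.lstrip(">").strip().split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # Assumption: coordinates and TIR length are whole numbers.
    for key in ("start", "end", "tir_length"):
        record[key] = int(record[key])
    return record


# Hypothetical header, invented for illustration only.
example = ">ACC0001.1|Example genome|100|350|TA|25|7|Bacteria;Example"
record = parse_mite_name(example)
print(record["accession"], record["end"] - record["start"])
```

The other files (S3, S4, S6 to S9) use different field lists, so the FIELDS tuple would need to be adapted to the structure stated for each file.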

NCBI Structure alternate identifiers: RRID:SCR_004218, nlx_23947, r3d100010927