100+ datasets found

Amino Acid Index Database
scicrunch.org
Updated Jan 29, 2022
(2022). Amino Acid Index Database [Dataset]. http://identifiers.org/RRID:SCR_007044
Unique identifier
https://identifiers.org/RRID:SCR_007044
Dataset updated
Jan 29, 2022
Description
AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. It consists of three sections: AAindex1 for amino acid indices of 20 numerical values, AAindex2 for amino acid mutation matrices, and AAindex3 for statistical protein contact potentials. All data are derived from published literature. An amino acid index is a set of 20 numerical values representing any of the different physicochemical and biological properties of amino acids. The AAindex1 section is a collection of published indices together with the result of cluster analysis using the correlation coefficient as the distance between two indices. This section currently contains 544 indices. Another important feature of amino acids that can be represented numerically is the similarity between amino acids. A similarity matrix, also called a mutation matrix, is a set of 210 numerical values (20 diagonal and 20×19/2 = 190 off-diagonal elements) used for sequence alignments and similarity searches. The AAindex2 section is a collection of published amino acid mutation matrices together with the result of cluster analysis. This section currently contains 94 matrices. Release 9.0 added a collection of published protein pairwise contact potentials as AAindex3. This section currently contains 47 contact potential matrices. Sponsors: This work was supported by grants and resources from the Ministry of Education, Culture, Sports, Science and Technology; the Japan Science and Technology Agency; the Bioinformatics Center, Institute for Chemical Research, Kyoto University; and the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo.
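The element count for a mutation matrix and the idea of a correlation-based distance between two indices can be illustrated with a short Python sketch. This is not AAindex code: the function names and the two toy 20-value vectors are hypothetical, and using 1 - |r| as the distance is an assumption, since the description only states that the correlation coefficient serves as the distance.

```python
import math


def symmetric_matrix_size(n: int = 20) -> int:
    """Count the distinct entries of a symmetric n x n matrix."""
    diagonal = n
    off_diagonal = n * (n - 1) // 2
    return diagonal + off_diagonal


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Assumed distance: 0 for perfectly (anti)correlated indices."""
    return 1.0 - abs(pearson(x, y))


# Toy vectors of 20 values each (hypothetical, not real AAindex entries).
index_a = [float(i) for i in range(20)]
index_b = [2.0 * v + 1.0 for v in index_a]  # linear function of index_a

print(symmetric_matrix_size(20))  # 20 + 190 = 210
print(correlation_distance(index_a, index_b))
```

A distance defined this way treats strongly anticorrelated indices as close to each other; a clustering that should keep them apart would use 1 - r instead.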
PROSITE profiles
ebi.ac.uk
Updated Feb 5, 2025
(2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
Dataset updated
Feb 5, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
PepBank Peptide Database
scicrunch.org
Updated Dec 4, 2023
(2023). PepBank Peptide Database [Dataset]. http://identifiers.org/RRID:SCR_002086
Unique identifier
https://identifiers.org/RRID:SCR_002086
Dataset updated
Dec 4, 2023
Description
A database of peptides based on sequence text mining and public peptide data sources. Only peptides that are 20 amino acids or shorter are stored. Only peptides with available sequences are stored. After submitting a query you can further refine the results using the new heat map retrieval tool to quickly find the entries that are most relevant to you. Text classification helps you find candidate peptides that are related to cancer, cardiovascular diseases, diabetes, apoptosis, angiogenesis and molecular imaging or peptides for which binding data exist.
SFLD
ebi.ac.uk
Updated Sep 7, 2018
(2018). SFLD [Dataset]. https://www.ebi.ac.uk/interpro/
Dataset updated
Sep 7, 2018
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.
Peptide Sequence Database
dknet.org
Updated Jan 29, 2022
(2022). Peptide Sequence Database [Dataset]. http://identifiers.org/RRID:SCR_005764
Unique identifier
https://identifiers.org/RRID:SCR_005764
Dataset updated
Jan 29, 2022
Description
The Peptide Sequence Database contains putative peptide sequences from human, mouse, rat, and zebrafish. Compressed to eliminate redundancy, these are about 40-fold smaller than a brute-force enumeration. Current and old releases are available for download. Each species' peptide sequence database comprises peptide sequence data from relevant species-specific UniGene and IPI clusters, plus all sequences from their constituent EST, mRNA and protein sequence databases, namely RefSeq proteins and mRNAs, UniProt's SwissProt and TrEMBL, GenBank mRNA, ESTs, and high-throughput cDNAs, HInv-DB, VEGA, EMBL, IPI protein sequences, plus the enumeration of all combinations of UniProt sequence variants, Met loss PTM, and signal peptide cleavages. The README file contains some information about the non-amino-acid symbols O (digest site corresponding to a protein N- or C-terminus) and J (no digest sequence join) used in these peptide sequence databases, and about how to configure various search engines to use them. Some search engines handle (very) long sequences badly and in some cases must be patched to use these peptide sequence databases. All search engines supported by the PepArML meta-search engine can search these peptide sequence databases, in some cases after being patched.
Aminoacyl-tRNA synthetase database
neuinfo.org
Updated Aug 23, 2003
(2003). Aminoacyl-tRNA synthetase database [Dataset]. http://identifiers.org/RRID:SCR_013498
Unique identifier
https://identifiers.org/RRID:SCR_013498
Dataset updated
Aug 23, 2003
Description
The AARSs database is a collection of amino acid sequences of all published aminoacyl-tRNA synthetases (AARSs). Currently it contains 1047 primary structures of cytoplasmic and organellar AARSs from various organisms. The entries are grouped according to AARS amino acid specificity and are based on the EMBL/SWISS-PROT format. Each entry includes the AARS amino acid sequence, its SWISS-PROT name and accession number, a short description of the sequence, its source (organism name with taxonomic classification) and bibliographic information. For enzymes whose sequences were determined at the nucleotide level, the appropriate EMBL/GenBank or TIGR entries are included, and for those with a known 3D structure, cross-references to the Brookhaven Protein Data Bank are indicated. Partial sequences of AARSs are also included in the database. Following the original SWISS-PROT descriptions, some of the entries are marked as putative or probable.
NCBI Protein Database
neuinfo.org
Updated Feb 1, 2001
(2001). NCBI Protein Database [Dataset]. http://identifiers.org/RRID:SCR_003257
Unique identifier
https://identifiers.org/RRID:SCR_003257
Dataset updated
Feb 1, 2001
Description
Databases of protein sequences and 3D structures of proteins. Collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
CADB - Conformational Angles DataBase of Proteins
neuinfo.org
Updated Jan 29, 2022
(2022). CADB - Conformational Angles DataBase of Proteins [Dataset]. http://identifiers.org/RRID:SCR_007573
Unique identifier
https://identifiers.org/RRID:SCR_007573
Dataset updated
Jan 29, 2022
Description
Conformation Angles DataBase is a comprehensive, authoritative and timely knowledge base developed to facilitate retrieval of information related to the conformational angles (main-chain and side-chain) of the amino acid residues present in the non-redundant (both 25% and 90%) data set. The database includes the options of determining the dependency of the conformation angles of a particular residue upon the flanking residues in main-chain, doublet analysis, triplet analysis and analysis of a particular protein structure. It is worth mentioning that for all the options, a user-friendly and convenient Java Graphical User Interface (GUI) has been provided to display the output on the client machine.
NCBIFAM
ebi.ac.uk
Updated Aug 6, 2025
(2025). NCBIFAM [Dataset]. https://www.ebi.ac.uk/interpro/
Dataset updated
Aug 6, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).
Conserved Domain Database
dknet.org
Updated Aug 31, 2024
(2024). Conserved Domain Database [Dataset]. http://identifiers.org/RRID:SCR_002077
Unique identifier
https://identifiers.org/RRID:SCR_002077
Dataset updated
Aug 31, 2024
Description
Database of annotations of functional units in proteins, including multiple sequence alignment models for ancient domains and full-length proteins. This collection of models includes 3D structures that display the sequence/structure/function relationships in proteins. It also includes alignments of the domains to known three-dimensional protein structures in the MMDB database. The source databases are Pfam, SMART, and COG. Users can identify amino acids in protein sequences with the resources available, as well as view single sequences embedded within multiple sequence alignments.
Data from: PROSITE
prosite.expasy.org
Updated Oct 15, 2025
(2025). PROSITE [Dataset]. https://prosite.expasy.org/
Dataset updated
Oct 15, 2025
Description
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
Ribosomal protein database of Listeria floridensis
figshare.com
xlsx
Updated Feb 10, 2021
Wenfa Ng (2021). Ribosomal protein database of Listeria floridensis [Dataset]. http://doi.org/10.6084/m9.figshare.13834478.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.13834478.v1
Dataset updated
Feb 10, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Wenfa Ng
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This work presents the ribosomal protein database of Listeria floridensis. Original data for the work came from the annotated proteome data of the bacterium downloaded from UniProt. Using an in-house MATLAB ribosomal protein database analysis software, the original proteome data file was parsed to extract protein name and amino acid sequence of all ribosomal proteins in the species. The database also includes calculated variables such as number of residues, molecular weight, and nucleotide sequence. Overall, the presented database could serve as a ribosomal protein mass fingerprint for use in microbial identification, or it could be used in fundamental studies seeking to uncover new insights into ribosomal protein biology.
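The extraction step described here can be sketched in Python. This is a stand-in under stated assumptions, not the author's in-house MATLAB software: it assumes the proteome is a UniProt-style FASTA file, selects ribosomal proteins by a crude keyword match on the protein name, and computes molecular weight from standard average residue masses plus one water. The function names and the example record (accession `X00001`) are hypothetical. The nucleotide sequence mentioned in the description is not covered.

```python
import re

# Standard average residue masses in daltons (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528

# Assumption: ribosomal proteins are recognisable from their names.
RIBOSOMAL_NAME = re.compile(r"ribosomal (subunit )?protein", re.IGNORECASE)


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def protein_name(header):
    """Take the text between the identifier and the ' OS=' field."""
    match = re.match(r"\S+\s+(.*?)(?:\s+OS=|$)", header)
    return match.group(1) if match else header


def molecular_weight(sequence):
    """Average mass in Da; raises KeyError on non-standard residue codes."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def ribosomal_table(fasta_text):
    """Build one row per ribosomal protein found in the proteome."""
    rows = []
    for header, sequence in parse_fasta(fasta_text):
        name = protein_name(header)
        if RIBOSOMAL_NAME.search(name):
            rows.append({
                "name": name,
                "sequence": sequence,
                "residues": len(sequence),
                "mw_da": round(molecular_weight(sequence), 2),
            })
    return rows


# Hypothetical two-record proteome used only to exercise the functions.
EXAMPLE_FASTA = """>tr|X00001|X00001_TEST 50S ribosomal protein L1 OS=Example OX=0
MGA
>tr|X00002|X00002_TEST DNA gyrase subunit A OS=Example OX=0
MKT
"""

for row in ribosomal_table(EXAMPLE_FASTA):
    print(row["name"], row["residues"], row["mw_da"])
```

A keyword match will miss proteins annotated under other names and may pick up unrelated entries, so a real analysis would want a curated list or a review step.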
Human Gene and Protein Database (HGPD)
scicrunch.org
Updated Nov 23, 2008
(2008). Human Gene and Protein Database (HGPD) [Dataset]. http://identifiers.org/RRID:SCR_002889
Unique identifier
https://identifiers.org/RRID:SCR_002889
Dataset updated
Nov 23, 2008
Description
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 4, 2023. The Human Gene and Protein Database presents SDS-PAGE patterns and other information on human genes and proteins. The HGPD was constructed from full-length cDNAs. For conversion to Gateway entry clones, we first determined an open reading frame (ORF) region in each cDNA meeting the criteria. Those ORF regions were PCR-amplified utilizing selected resource cDNAs as templates. All the details of the construction and utilization of entry clones will be published elsewhere. Amino acid and nucleotide sequences of an ORF for each cDNA and sequence differences of Gateway entry clones from source cDNAs are presented in the GW: Gateway Summary window. Utilizing those clones with a very efficient cell-free protein synthesis system featuring wheat germ, we have produced a large number of human proteins in vitro. Expressed proteins were detected in almost all cases. Proteins in both total and supernatant fractions are shown in the PE: Protein Expression window. In addition, we have also successfully expressed proteins in HeLa cells and determined subcellular localizations of human proteins. These biological data are presented on the frame of cDNA clusters in the Human Gene and Protein Database. To build the basic frame of HGPD, sequences of FLJ full-length cDNAs and others deposited in public databases (Human ESTs, RefSeq, Ensembl, MGC, etc.) are assembled onto the genome sequences (NCBI Build 35 (UCSC hg17)). The majority of analysis data for cDNA sequences in HGPD are shared with the FLJ Human cDNA Database (http://flj.hinv.jp/), constructed as a human cDNA sequence analysis database focusing on mRNA varieties caused by variations in transcription start site (TSS) and splicing.
Ribosomal protein database of Acinetobacter baumannii
search.dataone.org
Updated Nov 19, 2023
Ng, Wenfa (2023). Ribosomal protein database of Acinetobacter baumannii [Dataset]. http://doi.org/10.7910/DVN/ZFXEAV
Unique identifier
https://doi.org/10.7910/DVN/ZFXEAV
Dataset updated
Nov 19, 2023
Dataset provided by
Harvard Dataverse
Authors
Ng, Wenfa
Description
This work presents the ribosomal protein database of Acinetobacter baumannii. Original data for the work came from the annotated proteome data of the bacterium downloaded from UniProt. Using an in-house MATLAB ribosomal protein database analysis software, the original proteome data file was parsed to extract protein name and amino acid sequence of all ribosomal proteins in the species. The database also includes calculated variables such as number of residues, molecular weight, and nucleotide sequence. Overall, the presented database could serve as a ribosomal protein mass fingerprint for use in microbial identification, or it could be used in fundamental studies seeking to uncover new insights into ribosomal protein biology.
HAMAP
ebi.ac.uk
Updated Feb 5, 2025
(2025). HAMAP [Dataset]. https://www.ebi.ac.uk/interpro/
Dataset updated
Feb 5, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.
ProOpDB
neuinfo.org
Updated Oct 8, 2011
(2011). ProOpDB [Dataset]. http://identifiers.org/RRID:SCR_006111
Unique identifier
https://identifiers.org/RRID:SCR_006111
Dataset updated
Oct 8, 2011
Description
The Prokaryotic Operon DataBase (ProOpDB) is one of the most precise and complete repositories of operon predictions currently available. Using our novel and highly accurate operon algorithm, we have predicted the operon structures of more than 1,200 prokaryotic genomes. ProOpDB offers several ways to retrieve a set of operon predictions, including: i) organism name, ii) metabolic pathways, as defined by the KEGG database, iii) gene orthology, as defined by the COG database, iv) conserved protein motifs, as defined by the Pfam database, v) reference gene, vi) reference operon, among others. To limit the operon output to non-redundant organisms, ProOpDB offers an efficient protocol for selecting the most representative organisms based on a precompiled phylogenetic distance matrix. In addition, the ProOpDB operon predictions are used directly as the input data of our Gene Context Tool (GeConT) to visualize their genomic context and retrieve the sequence of their corresponding 5′ regulatory regions, as well as the nucleotide or amino acid sequences of their genes.
The prediction algorithm: The algorithm is a multilayer perceptron (MLP) neural network classifier that uses as input the intergenic distances of contiguous genes and the functional relationship scores of the STRING database between the different groups of orthologous proteins, as defined in the COG database. Nevertheless, the operon prediction of our method is not restricted to genes with a COG assignment, since we successfully defined new groups of orthologous genes and obtained, by extrapolation, a set of equivalent STRING-like scores based on conserved gene pairs in different genomes. Since the STRING functional relationship scores are determined in an unbiased manner and efficiently integrate a large amount of information coming from different sources and kinds of evidence, the predictions made by our MLP are considerably less influenced by the bias imposed by training on one specific organism.
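How per-pair predictions turn into operons can be sketched as follows. This is only an illustration under stated assumptions, not the ProOpDB algorithm: `pair_score` is a hypothetical logistic stand-in for the trained MLP with made-up weights, the 0.5 threshold is arbitrary, and the same-strand requirement is added here as a basic property of operons rather than taken from the description.

```python
import math
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def pair_score(intergenic_distance, functional_score):
    """Hypothetical stand-in for the trained MLP classifier.

    Returns a value in (0, 1); the weights below are invented for
    illustration and carry no biological meaning.
    """
    z = 2.0 * functional_score - intergenic_distance / 100.0
    # Logistic function written with tanh to avoid overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(z / 2.0))


def group_operons(genes, functional_scores, threshold=0.5):
    """Chain contiguous genes into operons.

    genes must be sorted by start position; functional_scores[i] is the
    STRING-like score for the pair (genes[i], genes[i + 1]).
    """
    if not genes:
        return []
    operons = [[genes[0].name]]
    for i in range(len(genes) - 1):
        left, right = genes[i], genes[i + 1]
        distance = right.start - left.end
        same_operon = (
            left.strand == right.strand
            and pair_score(distance, functional_scores[i]) >= threshold
        )
        if same_operon:
            operons[-1].append(right.name)
        else:
            operons.append([right.name])
    return operons


# Hypothetical genes and scores.
genes = [
    Gene("a", 1, 900, "+"),
    Gene("b", 920, 1800, "+"),
    Gene("c", 2500, 3300, "+"),
    Gene("d", 3310, 4000, "-"),
]
scores = [0.9, 0.1, 0.9]

print(group_operons(genes, scores))
```

Here `a` and `b` are joined because they are close and functionally related, `c` is split off by the long gap, and `d` is split off because it lies on the opposite strand.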
Analysis of codon usage at integrase amino acid position 97 among...
datasetcatalog.nlm.nih.gov
Updated Feb 17, 2017
Callebaut, Christian; Abram, Michael E.; Barnes, Tiffany L.; Ram, Renee R.; White, Kirsten L.; Miller, Michael D.; Margot, Nicolas A. (2017). Analysis of codon usage at integrase amino acid position 97 among patient-derived integrase sequences queried from the Los Alamos HIV database. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001793613
Dataset updated
Feb 17, 2017
Authors
Callebaut, Christian; Abram, Michael E.; Barnes, Tiffany L.; Ram, Renee R.; White, Kirsten L.; Miller, Michael D.; Margot, Nicolas A.
Description
Analysis of codon usage at integrase amino acid position 97 among patient-derived integrase sequences queried from the Los Alamos HIV database.
Protein Mutant Database
neuinfo.org
Protein Mutant Database [Dataset]. http://identifiers.org/RRID:SCR_007878
Unique identifier
https://identifiers.org/RRID:SCR_007878
Description
The Protein Mutant Database (PMD) provides information on natural and artificial mutants, including random and site-directed ones, for all proteins except members of the globin and immunoglobulin families. The PMD is based on literature, and each entry in the database corresponds to one article, which may describe one or more protein mutants. Each database entry is identified by a serial number and is defined as either natural or artificial, depending on the type of the mutation. For each entry the following are recorded: JOURNAL, TITLE, CROSS-REFERENCE, PROTEIN, N-TERMINAL, CHANGE, FUNCTION, STRUCTURE, STABILITY, etc. CROSS-REFERENCE indicates the code names of the protein given in other databases such as Protein Identification Resources (2). N-TERMINAL shows the N-terminal sequence of five amino acids, which may help to make the numbering of the sequence unambiguous. CHANGE indicates the position and kind of mutations, such as amino acid substitution, insertion and deletion, denoted with a specific notation. Any functional or structural features (FUNCTION, STRUCTURE, STABILITY, etc.) observed in the mutant are described immediately after 'CHANGE'. Relative differences in activity and/or stability, in comparison with the wild-type protein, are indicated with the symbols (- -), (-), (=), (+) or (+ +). Complete loss of activity is denoted as (0). Data submission: A data submission system was newly prepared in the PMD. We welcome the authors of articles published in academic journals to submit their own mutant data to the PMD. After checking the contents, we will register the data with a unique accession number.
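The effect symbols listed above lend themselves to a small lookup when entries are processed programmatically. The Python sketch below is an illustration only: the function name and the integer scale from -2 to 2 are hypothetical conventions chosen here, not part of the PMD format.

```python
# Hypothetical ordinal encoding of the relative-difference symbols.
EFFECT_LEVELS = {"--": -2, "-": -1, "=": 0, "+": 1, "++": 2}


def decode_effect(symbol):
    """Decode a symbol such as '(+ +)' or '(0)'.

    Returns (level, complete_loss): level is an integer from -2 to 2,
    or None when the symbol denotes complete loss of activity.
    """
    inner = symbol.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    inner = inner.replace(" ", "")  # '(+ +)' and '(++)' are treated alike
    if inner == "0":
        return None, True
    if inner not in EFFECT_LEVELS:
        raise ValueError(f"unrecognised effect symbol: {symbol!r}")
    return EFFECT_LEVELS[inner], False


for s in ["(- -)", "(-)", "(=)", "(+)", "(+ +)", "(0)"]:
    print(s, decode_effect(s))
```

Keeping complete loss as a separate flag, rather than folding it into the scale, preserves the distinction the database draws between (0) and (- -).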
Biological Interaction database for Protein-nucleic Acid
dknet.org
Updated Oct 8, 2024
(2024). Biological Interaction database for Protein-nucleic Acid [Dataset]. http://identifiers.org/RRID:SCR_013371
Unique identifier
https://identifiers.org/RRID:SCR_013371 https://identifiers.org/RRID:SCR_013371/resolver
Dataset updated
Oct 8, 2024
Description
A database for protein-nucleic acid interaction that provides various features of protein-nucleic acid interfaces. There are 2333 protein-nucleic acid PDB complexes, 9547 SCOP domains, and 9633 domain-nucleic acid interfaces in BIPA. BIPA also provides a multiple structural alignment of representative structures at the SCOP family level using the program SALIGN, and the structural alignments were further annotated using the program JOY to detect local environments of amino acids.
Amino Acid Reference Chart
aatbio.com
Updated Jul 9, 2020
AAT Bioquest (2020). Amino Acid Reference Chart [Dataset]. https://www.aatbio.com/data-sets/amino-acid-reference-chart-table
Dataset updated
Jul 9, 2020
Dataset authored and provided by
AAT Bioquest
License
https://www.aatbio.com/tou.html
Description
Amino acids are the building blocks that make up all proteins, polypeptides and peptides. Each amino acid consists of a central carbon, known as the α-carbon, to which an amino group (-NH2), an acidic carboxyl group (-COOH) and an organic side chain
