100+ datasets found

e
PROSITE profiles
ebi.ac.uk
Updated Feb 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Feb 5, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
e
NCBIFAM
ebi.ac.uk
Updated Aug 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). NCBIFAM [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Aug 6, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).
Structural Protein Sequences
kaggle.com
zip
Updated Feb 3, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SHAHIR (2018). Structural Protein Sequences [Dataset]. https://www.kaggle.com/datasets/shahir/protein-data-set
Explore at:
zip(28782775 bytes)Available download formats
Dataset updated
Feb 3, 2018
Authors
SHAHIR
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

This is a protein data set retrieved from Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).

The PDB archive is a repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the location of each atom relative to each other in the molecule. They then deposit this information, which is then annotated and publicly released into the archive by the wwPDB.

The constantly-growing PDB is a reflection of the research that is happening in laboratories across the world. This can make it both exciting and challenging to use the database in research and education. Structures are available for many of the proteins and nucleic acids involved in the central processes of life, so you can go to the PDB archive to find structures for ribosomes, oncogenes, drug targets, and even whole viruses. However, it can be a challenge to find the information that you need, since the PDB archives so many different structures. You will often find multiple structures for a given molecule, or partial structures, or structures that have been modified or inactivated from their native form.

Content

There are two data files. Both are arranged on "structureId" of the protein:

pdb_data_no_dups.csv contains protein meta data which includes details on protein classification, extraction methods, etc.

data_seq.csv contains >400,000 protein structure sequences.

Acknowledgements

Original data set down loaded from http://www.rcsb.org/pdb/

Inspiration

Protein data base helped the life science community to study about different diseases and come with new drugs and solution that help the human survival.
e
PIRSF
ebi.ac.uk
Updated Apr 7, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2020). PIRSF [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Apr 7, 2020
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PIRSF protein classification system is a network with multiple levels of sequence diversity from superfamilies to subfamilies that reflects the evolutionary relationship of full-length proteins and domains. PIRSF is based at the Protein Information Resource, Georgetown University Medical Centre, Washington DC, US.
e
HAMAP
ebi.ac.uk
Updated Feb 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). HAMAP [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Feb 5, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.
e
Data from: PROSITE
prosite.expasy.org
toothandnail-mailorder.com
+7more
Updated Oct 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). PROSITE [Dataset]. https://prosite.expasy.org/
Explore at:
Dataset updated
Oct 15, 2025
Description
PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them [More... / References / Commercial users ]. PROSITE is complemented by ProRule , a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids [More...].
c
Protein Structural Domain Classification
cathdb.info
ec.i4cologne.com
+3more
Updated Sep 30, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
Explore at:
Unique identifier
https://identifiers.org/MIR:00100005
Dataset updated
Sep 30, 2024
Description
CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.
Custom protein databases
figshare.com
txt
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Edward Lau; Maggie Pui Yu Lam (2023). Custom protein databases [Dataset]. http://doi.org/10.6084/m9.figshare.7780940.v2
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7780940.v2
Dataset updated
Jun 4, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Edward Lau; Maggie Pui Yu Lam
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Custom databases in 12 human tissues.
Bioinformatics Protein Dataset - Simulated
kaggle.com
zip
Updated Dec 27, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
Explore at:
zip(12928905 bytes)Available download formats
Dataset updated
Dec 27, 2024
Authors
Rafael Gallo
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Subtitle

"Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

Description

Introduction

This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

Columns Included

ID_Protein: Unique identifier for each protein.

Sequence: String of amino acids.

Molecular_Weight: Molecular weight calculated from the sequence.

Isoelectric_Point: Estimated isoelectric point based on the sequence composition.

Hydrophobicity: Average hydrophobicity calculated from the sequence.

Total_Charge: Sum of the charges of the amino acids in the sequence.

Polar_Proportion: Percentage of polar amino acids in the sequence.

Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.

Sequence_Length: Total number of amino acids in the sequence.

Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

Inspiration and Sources

While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as: - UniProt: A comprehensive database of protein sequences and annotations. - Kyte-Doolittle Scale: Calculations of hydrophobicity. - Biopython: A tool for analyzing biological sequences.

Proposed Uses

This dataset is ideal for: - Training classification models for proteins. - Exploratory analysis of physicochemical properties of proteins. - Building machine learning pipelines in bioinformatics.

How This Dataset Was Created

Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.

Property Calculation: Physicochemical properties were calculated using the Biopython library.

Class Assignment: Classes were randomly assigned for classification purposes.

Limitations

The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.

The functional classes are simulated and do not correspond to actual biological characteristics.

Data Split

The dataset is divided into two subsets: - Training: 16,000 samples (proteinas_train.csv). - Testing: 4,000 samples (proteinas_test.csv).

Acknowledgment

This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.
List of nucleic acid databases.
figshare.com
xls
Updated Jun 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Florian Jacques; Paulina Bolivar; Kristian Pietras; Emma U. Hammarlund (2023). List of nucleic acid databases. [Dataset]. http://doi.org/10.1371/journal.pone.0279597.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0279597.t001
Dataset updated
Jun 21, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Florian Jacques; Paulina Bolivar; Kristian Pietras; Emma U. Hammarlund
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Developments in sequencing technologies and the sequencing of an ever-increasing number of genomes have revolutionised studies of biodiversity and organismal evolution. This accumulation of data has been paralleled by the creation of numerous public biological databases through which the scientific community can mine the sequences and annotations of genomes, transcriptomes, and proteomes of multiple species. However, to find the appropriate databases and bioinformatic tools for respective inquiries and aims can be challenging. Here, we present a compilation of DNA and protein databases, as well as bioinformatic tools for phylogenetic reconstruction and a wide range of studies on molecular evolution. We provide a protocol for information extraction from biological databases and simple phylogenetic reconstruction using probabilistic and distance methods, facilitating the study of biodiversity and evolution at the molecular level for the broad scientific community.
e
PRINTS
ebi.ac.uk
Updated Jun 14, 2012
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2012). PRINTS [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Jun 14, 2012
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family or domain. PRINTS is based at the University of Manchester, UK.
MMseqs2 virus protein database with ICTV taxonomy
zenodo.org
datasetcatalog.nlm.nih.gov
bin
Updated Jan 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Antonio Pedro Camargo; Antonio Pedro Camargo (2025). MMseqs2 virus protein database with ICTV taxonomy [Dataset]. http://doi.org/10.5281/zenodo.6574914
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6574914
Dataset updated
Jan 26, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Antonio Pedro Camargo; Antonio Pedro Camargo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MMseqs2 virus protein database decorated with ICTV taxonomy. Proteins originally retrieved from NCBI NR in 2022-05-19.

Steps for reproduction can be found at https://github.com/apcamargo/ictv-mmseqs2-protein-database
e
SFLD
ebi.ac.uk
Updated Sep 7, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2018). SFLD [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Sep 7, 2018
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.
n
SUPFAM
neuinfo.org
rrid.site
+2more
Updated Nov 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). SUPFAM [Dataset]. http://identifiers.org/RRID:SCR_005304
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_005304
Dataset updated
Nov 19, 2024
Description
SUPFAM is a database that consists of clusters of potentially related homologous protein domain families, with and without three-dimensional structural information, forming superfamilies. The present release (Release 3.0) of SUPFAM uses homologous families in Pfam (Version 23.0) and SCOP (Release 1.69) which are examples of sequence -alignment and structure classification databases respectively. The two steps involved in setting up of SUPFAM database are * Relating Pfam and SCOP families using a new profile-profile alignment algorithm AlignHUSH. This results in identifying many Pfam families which could be related to a family or superfamily of known structural information. * An all-against-all match among Pfam families with yet unknown structure resulting in identification of related Pfam families forming new potential superfamilies. The SUPFAM database can be used in either the Browse mode or Search mode. In Browse mode you can browse through the Superfamilies, Pfam families or SCOP families. In each of these modes you will be presented with a full list which can be easily browsed. In Search mode, you can search for Pfam families, SCOP families or Superfamilies based on keywords or SCOP/Pfam identifiers of families and superfamilies., THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16,2025.
Protein Structure Initiative - TargetTrack 2000-2017 - all data files
zenodo.org
explore.openaire.eu
application/gzip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Helen M. Berman, Margaret J. Gabanyi, Andrei Kouranov, David I. Micallef, John Westbrook; Protein Structure Initiative network of investigators; Helen M. Berman, Margaret J. Gabanyi, Andrei Kouranov, David I. Micallef, John Westbrook; Protein Structure Initiative network of investigators (2020). Protein Structure Initiative - TargetTrack 2000-2017 - all data files [Dataset]. http://doi.org/10.5281/zenodo.821654
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.821654
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Helen M. Berman, Margaret J. Gabanyi, Andrei Kouranov, David I. Micallef, John Westbrook; Protein Structure Initiative network of investigators; Helen M. Berman, Margaret J. Gabanyi, Andrei Kouranov, David I. Micallef, John Westbrook; Protein Structure Initiative network of investigators
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Protein Structure Initiative - TargetTrack protein target registration database (795 MB, gzipped tarball)

The Protein Structure Initiative was a high-throughput structural genomics effort from 2000-2015 focused on developing technologies to enable greater coverage of protein structure space. Over its 15-year tenure, over 100 investigators at 35 centers (see ContributingCenters.xls) declared over 350,000 protein sequences (targets) that they would study using state-of-the-art protein production and structure determination methods. Many of these targets were selected through bioinformatics-based methods to serve as representatives for sequence and structure clusters.

From 2003-2010, these selected sequences and some basic identifying metadata were kept in a database called TargetDB, created at the Research Collaboratory for Structural Bioinformatics at Rutgers University. In 2008, a second database named PepcDB was created to track detailed experimental trial history and the standard protocols used by the PSI centers. These two databases became the principal structural genomics target databases, and were rolled into the PSI Structural Biology Knowledgebase in 2008.

As part of the third phase of the PSI, TargetDB and PepcDB were merged into a single resource, TargetTrack, to facilitate one-stop access to the data as well as expanding the schema to include new required data items. Participating centers deposited the latest status on their active targets and the protocols that were used (along with any deviations) on a weekly or quarterly basis. TargetTrack provided a variety of pre-computed data downloads on a weekly basis as well.

In July 2017, the Structural Biology Knowledgebase ceased operations. The files provided in this tarball represent the final datafiles generated by TargetTrack (timestamp June 30, 2017). Please read the README included in this dataset for descriptions of each file.

The entire TargetTrack datafile in XML format can be found in /TargetTrack XML files/tt.xml.gz

Key documentation can be found in the /Documentation folder.
TargetTrack schema: targetTrack-v1.4.1.pdf
Spreadsheet with TargetTrack enumerations for relevant fields: targetTrackEnumeratedDataItems-v1.4.1-1.xls
Image depicted the XML data schema: targetTrack-v1.4.1.jpg

These files are 868 MB in total size, uncompressed.
To open the tarball, use the command 'tar -zxvf TargetTrack-1Jul2017.tar.gz'

-- created by the PSI Structural Biology Knowledgebase, July 5, 2017
Diamond2GO Database – Version 2025-08-12 (nr_clean_d2go)
zenodo.org
bin
Updated Aug 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rhys Farrer; Rhys Farrer (2025). Diamond2GO Database – Version 2025-08-12 (nr_clean_d2go) [Dataset]. http://doi.org/10.5281/zenodo.16818512
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.16818512
Dataset updated
Aug 13, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Rhys Farrer; Rhys Farrer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 28, 2025
Description
This is the updated Diamond2GO reference database built on 12th August 2025.

It is a DIAMOND-formatted protein database (`.dmnd`) consisting of over 27 million sequences derived from the NCBI `nr` dataset, filtered to include only those with Gene Ontology (GO) annotations, and redundancy reduction using MMseqs2 (95% similarity). This version improves sensitivity and annotation coverage compared to the original 2023 release used in the published D2GO manuscript, and the earlier 2025 release.

This database is intended for use with the Diamond2GO tool, which enables rapid GO-term annotation and enrichment analysis for high-throughput sequencing datasets.

For reproducibility of results published using the earlier version (699,409 sequences), please refer to the [v1.0.0 release] https://github.com/rhysf/Diamond2GO/releases/tag/6a035ce
s
iPTMnet
scicrunch.org
rrid.site
Updated Dec 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). iPTMnet [Dataset]. http://identifiers.org/RRID:SCR_014416
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_014416
Dataset updated
Dec 4, 2023
Description
A protein database which connects multiple disparate bioinformatics tools and systems text mining, data mining, analysis and visualization tools, and databases and ontologies.
n
Bioinformatics Links Directory
neuinfo.org
scicrunch.org
+3more
Updated Jan 29, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). Bioinformatics Links Directory [Dataset]. http://identifiers.org/RRID:SCR_008018
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008018
Dataset updated
Jan 29, 2022
Description
Database of curated links to molecular resources, tools and databases selected on the basis of recommendations from bioinformatics experts in the field. This resource relies on input from its community of bioinformatics users for suggestions. Starting in 2003, it has also started listing all links contained in the NAR Webserver issue. The different types of information available in this portal: * Computer Related: This category contains links to resources relating to programming languages often used in bioinformatics. Other tools of the trade, such as web development and database resources, are also included here. * Sequence Comparison: Tools and resources for the comparison of sequences including sequence similarity searching, alignment tools, and general comparative genomics resources. * DNA: This category contains links to useful resources for DNA sequence analyses such as tools for comparative sequence analysis and sequence assembly. Links to programs for sequence manipulation, primer design, and sequence retrieval and submission are also listed here. * Education: Links to information about the techniques, materials, people, places, and events of the greater bioinformatics community. Included are current news headlines, literature sources, educational material and links to bioinformatics courses and workshops. * Expression: Links to tools for predicting the expression, alternative splicing, and regulation of a gene sequence are found here. This section also contains links to databases, methods, and analysis tools for protein expression, SAGE, EST, and microarray data. * Human Genome: This section contains links to draft annotations of the human genome in addition to resources for sequence polymorphisms and genomics. Also included are links related to ethical discussions surrounding the study of the human genome. * Literature: Links to resources related to published literature, including tools to search for articles and through literature abstracts. Additional text mining resources, open access resources, and literature goldmines are also listed. * Model Organisms: Included in this category are links to resources for various model organisms ranging from mammals to microbes. These include databases and tools for genome scale analyses. * Other Molecules: Bioinformatics tools related to molecules other than DNA, RNA, and protein. This category will include resources for the bioinformatics of small molecules as well as for other biopolymers including carbohydrates and metabolites. * Protein: This category contains links to useful resources for protein sequence and structure analyses. Resources for phylogenetic analyses, prediction of protein features, and analyses of interactions are also found here. * RNA: Resources include links to sequence retrieval programs, structure prediction and visualization tools, motif search programs, and information on various functional RNAs.
Ribosomal protein database of Pseudomonas putida KT2440
figshare.com
xlsx
Updated Mar 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wenfa Ng (2021). Ribosomal protein database of Pseudomonas putida KT2440 [Dataset]. http://doi.org/10.6084/m9.figshare.14167274.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14167274.v1
Dataset updated
Mar 5, 2021
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Wenfa Ng
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This work presents the ribosomal protein database of Pseudomonas putida KT2440. Original data for the work came from the annotated proteome data of the bacterium downloaded from UniProt. Using an in-house MATLAB ribosomal protein database analysis software, the original proteome data file was parsed to extract protein name and amino acid sequence of all ribosomal proteins in the species. The database also includes calculated variables such as number of residues, molecular weight, and nucleotide sequence. Overall, the presented database could serve as a ribosomal protein mass fingerprint for use in microbial identification, or it could be used in fundamental studies seeking to uncover new insights into ribosomal protein biology.
e
PANTHER
ebi.ac.uk
Updated Jun 20, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). PANTHER [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Jun 20, 2024
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PANTHER is a large collection of protein families that have been subdivided into functionally related subfamilies, using human expertise. These subfamilies model the divergence of specific functions within protein families, allowing more accurate association with function, as well as inference of amino acids important for functional specificity. Hidden Markov models (HMMs) are built for each family and subfamily for classifying additional protein sequences. PANTHER is based at the University of Southern California, CA, US.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/

PROSITE profiles

Explore at:

Dataset updated

Feb 5, 2025

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

Clear search

Close search

Google apps

Main menu

PROSITE profiles

NCBIFAM

Structural Protein Sequences

Context

Content

Acknowledgements

Inspiration

PIRSF

HAMAP

Data from: PROSITE

Protein Structural Domain Classification

Custom protein databases

Bioinformatics Protein Dataset - Simulated

Subtitle

Description

Introduction

Columns Included

Inspiration and Sources

Proposed Uses

How This Dataset Was Created

Limitations

Data Split

Acknowledgment

List of nucleic acid databases.

PRINTS

MMseqs2 virus protein database with ICTV taxonomy

SFLD

SUPFAM

Protein Structure Initiative - TargetTrack 2000-2017 - all data files

Diamond2GO Database – Version 2025-08-12 (nr_clean_d2go)

iPTMnet

Bioinformatics Links Directory

Ribosomal protein database of Pseudomonas putida KT2440

PANTHER

PROSITE profiles