55 datasets found
  1. Data from: Expasy

    • expasy.org
    Updated Sep 15, 2023
    Cite
    SIB Swiss Institute of Bioinformatics (2023). Expasy [Dataset]. http://doi.org/10.25504/FAIRsharing.ceeffa
    Dataset provided by
    SIB Swiss Institute of Bioinformatics
    Description

    Expasy is the bioinformatics resource portal of the SIB Swiss Institute of Bioinformatics.

  2. PROSITE profiles

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

  3. Additional file 2: of BASILIScan: a tool for high-throughput analysis of...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Michal Barski (2023). Additional file 2: of BASILIScan: a tool for high-throughput analysis of intrinsic disorder patterns in homologous proteins [Dataset]. http://doi.org/10.6084/m9.figshare.7453838.v1
    Dataset provided by
    figshare
    Authors
    Michal Barski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BASILIScan search with human CDC7 kinase (UniProt ID: O00311) against all vertebrate sequences available from UniProtKB (both Swiss-Prot and TrEMBL). (CSV 263 kb)

  4. SWISS-2DPAGE

    • dknet.org
    • neuinfo.org
    Updated Jan 29, 2022
    Cite
    (2022). SWISS-2DPAGE [Dataset]. http://identifiers.org/RRID:SCR_006946
    Description

    A database of proteins identified on various 2-D PAGE and SDS-PAGE reference maps. Each SWISS-2DPAGE entry contains textual data on one protein, including mapping procedures, physiological and pathological information, experimental data (isoelectric point, molecular weight, amino acid composition, peptide masses) and bibliographical references. In addition to this textual data, SWISS-2DPAGE provides several 2-D PAGE and SDS-PAGE images showing the experimentally determined location of the protein, as well as a theoretical region computed from the protein sequence, indicating where the protein might be found in the gel. Using the database, users can locate these proteins on the 2-D PAGE maps or display the region of a 2-D PAGE map where a protein from UniProtKB/Swiss-Prot would be expected.
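
    The theoretical gel region described above rests on values computed from the sequence, such as molecular weight. As a rough illustration only, the Python sketch below computes an average molecular weight, assuming the Expasy average residue masses listed in the code. The function name `average_mw` is hypothetical, and the sketch ignores isoelectric point, modifications and processing, so it is not how SWISS-2DPAGE itself derives its regions.

```python
# Average residue masses in daltons (free amino acid minus one water),
# assumed here from the Expasy residue mass table.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_AVERAGE_MASS = 18.01524  # added once for the free N- and C-termini


def average_mw(sequence: str) -> float:
    """Return the theoretical average molecular weight (Da) of an unmodified chain."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_AVERAGE_MASS
```

    For example, `average_mw("GAVLK")` sums the five residue masses and adds one water molecule for the free termini.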

  5. Data from: PROSITE

    • prosite.expasy.org
    • the-mouth.com
    Updated Jun 18, 2025
    Cite
    (2025). PROSITE [Dataset]. https://prosite.expasy.org/
    Description

    PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as the associated patterns and profiles used to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
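
    PROSITE patterns use a compact syntax: x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, and < and > for the sequence termini. As a rough sketch, assuming only this basic syntax, a pattern can be translated into a Python regular expression. `prosite_to_regex` is a hypothetical helper, not part of any PROSITE tool; it does not handle > inside square brackets and ignores profiles and ProRule entirely.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # patterns end with a period
    parts = []
    for element in pattern.split("-"):  # elements are separated by '-'
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # excluded residues
        else:
            regex = core  # single residue or [..] set, same in both syntaxes
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)
```

    With this helper, the N-glycosylation site pattern `N-{P}-[ST]-{P}.` becomes `N[^P][ST][^P]`, which can be passed to `re.search` on a sequence string. Note that `re.finditer` reports only non-overlapping matches.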

  6. RESID

    • dknet.org
    • scicrunch.org
    Updated Oct 18, 2019
    Cite
    (2019). RESID [Dataset]. http://identifiers.org/RRID:SCR_003505
    Description

    A comprehensive collection of annotations and structures for protein modifications, including amino-terminal, carboxyl-terminal and peptide chain cross-link post-translational modifications. It provides systematic and alternate names, atomic formulas and masses, enzyme activities generating the modifications, keywords, literature citations, Gene Ontology cross-references, Protein Information Resource (PIR) and SWISS-PROT protein sequence database feature table annotations, structure diagrams and molecular models. Each RESID Database entry presents a chemically unique modification and shows how that modification is currently annotated in the protein sequence databases Swiss-Prot and the Protein Information Resource (PIR). The RESID Database provides a table of corresponding equivalent feature annotations that is used in the UniProt project, an international effort to combine the resources of Swiss-Prot, TrEMBL and PIR. As an annotation tool, the RESID Database is used to standardize and enhance modification descriptions in the feature tables of Swiss-Prot entries.
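
    Because each RESID entry records the atomic formula and mass of a modification, mass differences can be recomputed from formulas. A minimal Python sketch, assuming formulas written in a compact form such as `HPO3` (RESID's own formula notation may differ) and standard monoisotopic element masses. `monoisotopic_mass` is a hypothetical helper that handles only C, H, N, O, P and S and does not handle mass losses (negative counts).

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a formula such as 'HPO3' or 'C2H2O'."""
    total = 0.0
    consumed = 0
    for m in re.finditer(r"([A-Z][a-z]?)(\d*)", formula):
        element, count = m.group(1), int(m.group(2) or 1)
        if element not in MONOISOTOPIC_MASS:
            raise ValueError(f"no mass table entry for {element}")
        total += MONOISOTOPIC_MASS[element] * count
        consumed += len(m.group(0))
    # Any character not matched by the element pattern makes the formula invalid.
    if consumed != len(formula):
        raise ValueError(f"could not parse formula {formula!r}")
    return total
```

    For phosphorylation, `monoisotopic_mass("HPO3")` gives about 79.9663 Da, and for acetylation `monoisotopic_mass("C2H2O")` gives about 42.0106 Da.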

  7. NCBIFAM

    • ebi.ac.uk
    Updated Dec 16, 2024
    Cite
    (2024). NCBIFAM [Dataset]. https://www.ebi.ac.uk/interpro/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families, developed at The Institute for Genomic Research and later at the J. Craig Venter Institute (Rockville, MD, US).

  8. ExPASy ABCD database

    • neuinfo.org
    • dknet.org
    Updated Aug 5, 2024
    Cite
    (2024). ExPASy ABCD database [Dataset]. http://identifiers.org/RRID:SCR_017401
    Description

    Repository of sequenced antibodies that integrates curated information about each antibody and its antigen, with cross-links to standardized databases of chemical and protein entities. This manually curated repository was developed by the Geneva Antibody Facility at the University of Geneva, in collaboration with the CALIPHO and Swiss-Prot groups at the SIB Swiss Institute of Bioinformatics. The database provides a list of sequenced antibodies with their known targets. Each antibody is assigned a unique ID number that can be used in academic publications to increase the reproducibility of experiments.

  9. BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation...

    • plos.figshare.com
    ods
    Updated Jun 1, 2023
    Cite
    Christian-Alexander Dudek; Henning Dannheim; Dietmar Schomburg (2023). BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation [Dataset]. http://doi.org/10.1371/journal.pone.0182216
    Dataset provided by
    PLOS ONE
    Authors
    Christian-Alexander Dudek; Henning Dannheim; Dietmar Schomburg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The prediction of gene functions is crucial for many areas of the life sciences. Faster high-throughput sequencing techniques generate more and larger datasets, and manual annotation through classical wet-lab experiments cannot keep pace with these amounts of data. We showed earlier that the automatic, sequence pattern-based BrEPS protocol, based on manually curated sequences, can be used to predict the enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but are also a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation so that it can be applied to larger datasets on an acceptable timescale. The primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as a reliable source for the function prediction of enzymes observed at the protein level. The pool of sequences is extended with highly similar sequences from TrEMBL and Swiss-Prot. This allows us to restrict the selection of Swiss-Prot entries without losing the sequence diversity needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have increased complexity, which raises the chance of matching more sequences without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as a download. The database can be downloaded and used with the BrEPScmd command-line tool for large-scale sequence analysis. The BrEPS website and downloads for the database creation tool, command-line tool and database are freely accessible at http://breps.tu-bs.de.

  10. SMART

    • ebi.ac.uk
    Updated Feb 14, 2020
    Cite
    SMART [Dataset]. https://www.ebi.ac.uk/interpro/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. SMART is based at EMBL, Heidelberg, Germany.

  11. HAMAP

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). HAMAP [Dataset]. https://www.ebi.ac.uk/interpro/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.

  12. neXtProt Data release 2023-09-11

    • zenodo.org
    application/gzip
    Updated Dec 10, 2024
    Cite
    Amos Bairoch; Lydie Lane (2024). neXtProt Data release 2023-09-11 [Dataset]. http://doi.org/10.5281/zenodo.14163588
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amos Bairoch; Lydie Lane
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 26, 2024
    Description

    neXtProt is a comprehensive human-centric discovery platform, offering its users a seamless integration of and navigation through protein-related data.

    Developed between 2009 and 2023 by the CALIPHO group (Computer Analysis and Laboratory Investigation of Proteins of Human Origin) at the University of Geneva and the SIB Swiss Institute of Bioinformatics, neXtProt was designed to help researchers make sense of what all these human proteins do in our bodies by

    • Adding more information to the corpus of data on human proteins that is already in Swiss-Prot with data originating from a variety of high-throughput approaches (such as micro-array, antibodies screens, proteomics, interactomics, structural genomics).
    • Carefully selecting all of these data sets to provide high-quality data.
    • Organizing the data in such a way that it is possible to seamlessly build powerful queries in the most user-friendly way possible.
    • Developing software tools ranging from sequence analysis to text and data mining to be integrated in various research environments. These tools will meet the specific needs of both academic and industrial users.

  13. PIRSF

    • ebi.ac.uk
    Updated Apr 7, 2020
    Cite
    (2020). PIRSF [Dataset]. https://www.ebi.ac.uk/interpro/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PIRSF protein classification system is a network with multiple levels of sequence diversity from superfamilies to subfamilies that reflects the evolutionary relationship of full-length proteins and domains. PIRSF is based at the Protein Information Resource, Georgetown University Medical Centre, Washington DC, US.

  14. Scores of different structural assessment tools for the predicted models...

    • plos.figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Liza Teresa Rozario; Tanima Sharker; Tasnin Akter Nila (2023). Scores of different structural assessment tools for the predicted models from SWISS-MODEL homology-modeling server. [Dataset]. http://doi.org/10.1371/journal.pone.0252932.t005
    Dataset provided by
    PLOS ONE
    Authors
    Liza Teresa Rozario; Tanima Sharker; Tasnin Akter Nila
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scores of different structural assessment tools for the predicted models from SWISS-MODEL homology-modeling server.

  15. UniProt Chordata protein annotation program

    • neuinfo.org
    • scicrunch.org
    Updated Jul 12, 2013
    Cite
    (2013). UniProt Chordata protein annotation program [Dataset]. http://identifiers.org/RRID:SCR_007071
    Description

    Data set of manually annotated Chordata-specific proteins as well as those that are widely conserved. The program keeps existing human entries up to date and broadens the manual annotation to other vertebrate species, especially model organisms, including great apes, cow, mouse, rat, chicken, zebrafish, Xenopus laevis and Xenopus tropicalis. A draft of the complete human proteome is available in UniProtKB/Swiss-Prot, and one of the current priorities of the Chordata protein annotation program is to improve the quality of the human sequences provided. To this end, the curators update sequences that show discrepancies with those predicted from the genome sequence. Dubious isoforms, sequences based on experimental artifacts and protein products derived from erroneous gene model predictions are also revisited. This work is done partly in collaboration with the Hinxton Sequence Forum (HSF), which allows active exchange between the UniProt, HAVANA, Ensembl and HGNC groups, as well as with the RefSeq database. UniProt is a member of the Consensus CDS project, and the curators are reviewing their records to support convergence towards a standard set of protein annotations. They also continuously update human entries with functional annotation, including novel structural, post-translational modification, interaction and enzymatic activity data. To identify candidates for re-annotation, they use, among other resources, information extraction tools such as the STRING database. In addition, they regularly add new sequence variants and maintain disease information. The program also includes the Variation Annotation Program, whose goal is to annotate all known human genetic diseases and disease-linked protein variants, as well as neutral polymorphisms.

  16. Classifying protein kinase conformations with machine learning: data

    • zenodo.org
    application/gzip, csv, tsv
    Updated Jul 23, 2023
    Cite
    Ivan REVEGUK (2023). Classifying protein kinase conformations with machine learning: data [Dataset]. http://doi.org/10.5281/zenodo.8175370
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ivan REVEGUK
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data collection accompanies the manuscript "Classifying protein kinase conformations with machine learning".

    It was created using the kinactive v0.1 tool, written in pure Python 3.10. Note that the data are provided for reference and reproducibility purposes and are not compatible with later versions of `kinactive` built upon lXtractor > 0.1.1. Refer to the kinactive documentation for instructions on how to obtain an up-to-date version of the structural kinome collection.

    File descriptions:

    • db_v3.tar.gz -- a structural kinome collection archive. One can unpack it and inspect the contents or load it into the Python interpreter using `kinactive` or `lXtractor` tools.
    • db_af2.tar.gz -- an AlphaFold2 kinome collection for Swiss-Prot sequences.
    • default_*_vs.tsv -- structure/sequence variables calculated with lXtractor and used in an interpretable ML pipeline.
    • *_features.tsv -- lists of ranked features selected by the eBoruta tool for each classifier.
    • Supplement_labels.tsv -- ML model predictions for each PK domain structure found in db_v3.
    • predictions_af2.csv -- Active/Inactive and DFG labels predicted for domains in db_af2.
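
    Both `.tar.gz` archives are ordinary gzip-compressed tar files, so their contents can be inspected with Python's standard `tarfile` module before loading them with `kinactive` or `lXtractor`. A minimal sketch: the internal layout of the archives is not described here, so the hypothetical helpers below only list and extract members and make no assumptions about their contents.

```python
import tarfile
from pathlib import Path


def list_archive(path: str, limit: int = 10) -> list[str]:
    """Return up to `limit` member names from a gzip-compressed tar archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()[:limit]


def extract_archive(path: str, dest: str) -> Path:
    """Extract every member of the archive into `dest` and return that directory."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(path, mode="r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe paths and links.
            archive.extractall(out, filter="data")
        else:
            archive.extractall(out)
    return out
```

    For example, `list_archive("db_v3.tar.gz")` shows the first entries of the archive listing before committing disk space to a full extraction.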

  17. PRED-GPCR

    • dknet.org
    Updated Jan 29, 2022
    Cite
    (2022). PRED-GPCR [Dataset]. http://identifiers.org/RRID:SCR_006196
    Description

    A prediction tool for GPCR family classification from sequence alone. It is based on a probabilistic method that uses family-specific profile hidden Markov models (HMMs) to determine to which GPCR family a query sequence belongs or which family it resembles. The approach exploits the descriptive power of profile HMMs together with an exhaustive discrimination assessment that keeps only highly selective and sensitive profiles for each family. The collection of these profiles constitutes a signature library, which is scanned for significant matches with a given query sequence. The output report for a query sequence consists of two sections:

    * A ranked list of the profile HMM matches below the selected individual motif E-value cutoff, along with their corresponding family.
    * A ranked list of the combined P-values, E-values and the number of profiles matched for each family.

    To cross-evaluate your results, you can browse through Swiss-Prot, TrEMBL, Pfam and PROSITE family-related entries.
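
    The description does not state how the per-profile P-values are combined into a family-level value. As an assumed stand-in, not necessarily PRED-GPCR's own method, the sketch below uses Fisher's method: for k independent P-values, X = -2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom. The function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues: list[float]) -> float:
    """Combine independent P-values with Fisher's method.

    X = -2 * sum(ln p_i) ~ chi-squared with 2k degrees of freedom, and for
    even degrees of freedom P(chi2 >= X) = exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not pvalues:
        raise ValueError("need at least one P-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total
```

    Profiles from the same family are unlikely to be fully independent, so a value combined this way is better read as a ranking score than as an exact probability.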

  18. Steinegger Lab Datasets

    • registry.opendata.aws
    Updated May 3, 2025
    Cite
    Steinegger Lab, Seoul National University (2025). Steinegger Lab Datasets [Dataset]. https://registry.opendata.aws/steineggerlab/
    Dataset provided by
    Steinegger Lab, Seoul National University (https://steineggerlab.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Steinegger Lab Dataset comprises biological databases and resources critical for protein sequence and structure analysis, developed to support ColabFold, MMseqs2 and Foldseek/Foldcomp, three high-performance computational tools widely used in bioinformatics. The MMseqs2 dataset serves as the backbone for our fast structure prediction tool, ColabFold, and includes UniRef30, BFD and the ColabFold environmental databases. These datasets are specifically designed for the rapid generation of multiple sequence alignments (MSAs), which are essential for high-accuracy structure prediction. Beyond MSA generation, these resources enable fast taxonomic and functional annotation, supporting a wide range of bioinformatics applications.

    The Foldseek dataset includes preprocessed databases such as the AlphaFold Database (AFDB), PDB, Swiss-Prot and CATH, specifically designed for protein structure similarity searches. These datasets encompass the majority of both experimental and predicted structural resources, supporting analyses of monomers and multimers alike.

  19. Filters applied to UniProt protein entries to parse enzyme data from UniProt...

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Christian-Alexander Dudek; Henning Dannheim; Dietmar Schomburg (2023). Filters applied to UniProt protein entries to parse enzyme data from UniProt flatfiles. [Dataset]. http://doi.org/10.1371/journal.pone.0182216.t001
    Dataset provided by
    PLOS ONE
    Authors
    Christian-Alexander Dudek; Henning Dannheim; Dietmar Schomburg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Filters applied to UniProt protein entries to parse enzyme data from UniProt flatfiles.
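
    The filters themselves are given in the table. To illustrate only the parsing step, here is a minimal Python sketch that collects EC numbers from the `DE` lines of a UniProt flat file, assuming the standard line layout (a two-letter line code, `EC=` tokens in `DE` lines, `//` as the entry terminator). The function name is hypothetical and it applies none of the authors' filters.

```python
import re
from typing import Iterable, Iterator

# EC numbers appear in DE lines as "EC=x.x.x.x", optionally followed by evidence tags.
EC_PATTERN = re.compile(r"EC=([^\s;{]+)")


def ec_numbers_by_accession(lines: Iterable[str]) -> Iterator[tuple[str, list[str]]]:
    """Yield (primary accession, sorted unique EC numbers) for each flat-file entry."""
    accession = None
    ecs: list[str] = []
    for line in lines:
        code = line[:2]
        if code == "AC" and accession is None:
            # The first accession on the first AC line is the primary accession.
            accession = line[5:].split(";")[0].strip()
        elif code == "DE":
            ecs.extend(EC_PATTERN.findall(line))
        elif line.startswith("//"):
            # "//" terminates an entry.
            if accession is not None:
                yield accession, sorted(set(ecs))
            accession, ecs = None, []
```

    Passing an open file handle, for example one for `uniprot_sprot.dat`, streams entries one at a time. Any filtering, such as by evidence level or by `Contains:`/`Includes:` sections, would be added on top and is not shown.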

  20. SFLD

    • ebi.ac.uk
    Updated Sep 7, 2018
    Cite
    (2018). SFLD [Dataset]. https://www.ebi.ac.uk/interpro/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.
