100+ datasets found
  1. Swiss-Prot database

    • springernature.figshare.com
    application/cdfv2
    Updated Jun 1, 2023
    Citation: Shuqi Wang; Cuihong You; Hongyu Ma; Yin Zhang; Guidong Miao; Qingyang Wu; Fan Lin; Jude Juventus Aweya (2023). Swiss-Prot database [Dataset]. http://doi.org/10.6084/m9.figshare.6124457.v1
    Dataset provided by: figshare
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    All unigenes of Portunus sanguinolentus with hits in the Swiss-Prot database.

  2. PROSITE profiles

    • ebi.ac.uk
    Updated Feb 5, 2025
    Citation: (2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
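    PROSITE patterns are written in a compact syntax; the N-glycosylation site PS00001, for example, is N-{P}-[ST]-{P}. To illustrate how such a pattern identifies a site, the sketch below converts a simplified pattern into a Python regular expression and scans a sequence. It handles x, [...], {...}, repeat counts such as x(2,4), and the < / > terminal anchors; the function name prosite_to_regex is hypothetical, edge cases such as '>' inside brackets are not handled, and this is not how ScanProsite itself is implemented.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE pattern into a Python regex string.

    Handles: x (any residue), [ABC] (any listed residue), {ABC} (any residue
    except those listed), repeats such as x(3) or x(2,4), and the < / >
    N-/C-terminal anchors. Does NOT handle '>' inside brackets (e.g. [G>]).
    """
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split the residue spec from an optional (n) or (n,m) repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        spec, low, high = m.groups()
        if spec == "x":
            core = "."
        elif spec.startswith("{"):
            core = "[^" + spec[1:-1] + "]"  # forbidden residues -> negated class
        else:
            core = spec  # single residue or [ABC]; same meaning in a regex
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# Example: the N-glycosylation site pattern (PS00001).
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)  # N[^P][ST][^P]
# Note: re.finditer reports non-overlapping matches only.
for hit in re.finditer(regex, "MKNGSAANPTLNVTQ"):
    print(hit.start(), hit.group())  # 2 NGSA, then 11 NVTQ
```

    A regex translation keeps the sketch short; a production scanner would also need to report overlapping matches and handle the full pattern grammar.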

  3. UniProtKB

    • scicrunch.org
    • neuinfo.org
    Updated Oct 24, 2019
    More versions available
    Citation: (2019). UniProtKB [Dataset]. http://identifiers.org/RRID:SCR_004426

    Description

    Central repository for collection of functional information on proteins, with accurate and consistent annotation. In addition to capturing core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and experimental and computational data. The UniProt Knowledgebase consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. UniProtKB/Swiss-Prot (reviewed) is a high quality manually annotated and non-redundant protein sequence database which brings together experimental results, computed features, and scientific conclusions. UniProtKB/TrEMBL (unreviewed) contains protein sequences associated with computationally generated annotation and large-scale functional characterization that await full manual annotation. Users may browse by taxonomy, keyword, gene ontology, enzyme class or pathway.
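    In UniProtKB FASTA files, the two sections are distinguished by the first field of each header line: sp for reviewed Swiss-Prot entries and tr for unreviewed TrEMBL entries. The sketch below splits such a header into its parts; the names UniProtHeader and parse_uniprot_header are hypothetical, and the example header is illustrative rather than copied from a specific release.

```python
from typing import NamedTuple


class UniProtHeader(NamedTuple):
    reviewed: bool     # True for Swiss-Prot ("sp"), False for TrEMBL ("tr")
    accession: str     # primary accession, e.g. P69905
    entry_name: str    # entry name, e.g. HBA_HUMAN
    description: str   # protein name followed by OS=, OX=, GN=, PE=, SV= fields


def parse_uniprot_header(line: str) -> UniProtHeader:
    """Parse a UniProtKB FASTA header of the form '>db|Accession|EntryName rest'."""
    if not line.startswith(">"):
        raise ValueError("FASTA header lines start with '>'")
    identifier, _, description = line[1:].strip().partition(" ")
    db, accession, entry_name = identifier.split("|")
    if db not in ("sp", "tr"):
        raise ValueError(f"unexpected database tag {db!r}")
    return UniProtHeader(db == "sp", accession, entry_name, description)


# Illustrative header following the UniProtKB FASTA layout.
header = parse_uniprot_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
)
print(header.reviewed, header.accession, header.entry_name)  # True P69905 HBA_HUMAN
```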

  4. Matches Found in Swiss-Prot Database

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Citation: Kemal Sonmez; Naunihal T. Zaveri; Ilan A. Kerman; Sharon Burke; Charles R. Neal; Xinmin Xie; Stanley J. Watson; Lawrence Toll (2023). Matches Found in Swiss-Prot Database. [Dataset]. http://doi.org/10.1371/journal.pcbi.1000258.t002
    Dataset provided by: PLOS, http://plos.org/
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    False positives:
    • Other signaling molecules: FGF-3,5,7,10,17,18; GDNF; CD8,28; PDGF-2; TGF; VEGF (vascular endothelial growth factor); HBNF-1; MIP; NGF (nerve growth factor); Cytokine A21, IFN-α (interferon alpha); IGF binding protein 1B,2,3; IL7 (interleukin 7).
    • Other: MAGF (microfibril associated protein), MINK (K-channel), K-channel related peptide, L-type Ca2+ channel, gamma subunit, myelin Po protein, Dif-2, Eosinophil, Syntaxin 1B (vesicle docking), Syntaxin 2, TMP21 (vesicle trafficking protein), Coagulation factor III, PGD2 synthase, syndecans, FKBP12 (FK506 binding protein), Folate receptor, ERp29, COMT, Connexin 32, Cytostatin.

  5. Gene Ontology according to the Swiss-Prot database for the substrates of the...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Citation: Sander H. Diks; Kaushal Parikh; Marijke van der Sijde; Jos Joore; Tita Ritsema; Maikel P. Peppelenbosch (2023). Gene Ontology according to the Swiss-Prot database for the substrates of the minimal kinome, shown for humanized substrate set. [Dataset]. http://doi.org/10.1371/journal.pone.0000777.t004
    Dataset provided by: PLOS, http://plos.org/
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gene Ontology according to the Swiss-Prot database for the substrates of the minimal kinome, shown for humanized substrate set.

  6. The Therapeutic Drug Target Database Human SwissProt

    • johnsnowlabs.com
    csv
    Citation: John Snow Labs, The Therapeutic Drug Target Database Human SwissProt [Dataset]. https://www.johnsnowlabs.com/marketplace/the-therapeutic-drug-target-database-human-swissprot/
    Dataset authored and provided by: John Snow Labs
    Area covered: N/A

    Description

    This dataset is a selection of protein IDs for successful targets from the Therapeutic Target Database (release 4.3.02, 18 Oct 2013). The web page lists 388 successful targets, but these reduce to 345 human Swiss-Prot accessions.

  7. Proven Drug Targets Converted to Human SwissProt Accessions

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Citation: John Snow Labs (2021). Proven Drug Targets Converted to Human SwissProt Accessions [Dataset]. https://www.johnsnowlabs.com/marketplace/proven-drug-targets-converted-to-human-swissprot-accessions/
    Dataset authored and provided by: John Snow Labs
    Area covered: N/A

    Description

    This dataset is supplementary data from "Novelty in the target landscape of the pharmaceutical industry" (2013). The listing of proven drug targets was converted to 248 human Swiss-Prot accessions.

  8. HAMAP

    • ebi.ac.uk
    Updated Feb 5, 2025
    Citation: (2025). HAMAP [Dataset]. https://www.ebi.ac.uk/interpro/
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.

  9. Approved and Researched Drug Targets Human SwissProt Accessions

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Citation: John Snow Labs (2021). Approved and Researched Drug Targets Human SwissProt Accessions [Dataset]. https://www.johnsnowlabs.com/marketplace/approved-and-researched-drug-targets-human-swissprot-accessions/
    Dataset authored and provided by: John Snow Labs
    Area covered: N/A

    Description

    This dataset is supplementary data from "Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds" (2011). Here, the Entrez Gene IDs were mapped to 1651 human Swiss-Prot accessions; note that this set includes both approved and research targets.

  10. uniprot

    • huggingface.co
    Updated Apr 9, 2022
    Citation: Will Dampier (2022). uniprot [Dataset]. https://huggingface.co/datasets/damlab/uniprot
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)

    Description

    Dataset Summary

    This dataset is a mirror of the Uniprot/SwissProt database. It contains the names and sequences of >500K proteins. This dataset was parsed from the FASTA file at https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz.

    Supported Tasks and Leaderboards: None
    Languages: English

    Dataset Structure

    Data Fields: id, description, sequence

    Data… See the full description on the dataset page: https://huggingface.co/datasets/damlab/uniprot.
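    Parsing a FASTA file into records with the fields id, description and sequence can be done with a small streaming reader. The sketch below assumes id is the first whitespace-delimited token of the header and description is the rest of the line; the Hugging Face dataset may split headers differently, and the function names read_fasta and read_fasta_gz are hypothetical.

```python
import gzip
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[dict]:
    """Yield {'id', 'description', 'sequence'} dicts from FASTA text.

    Assumption: 'id' is the first whitespace-delimited token of the header
    and 'description' is the remainder of the header line.
    """
    record_id, description, chunks = None, "", []
    for raw in handle:
        line = raw.strip()
        if line.startswith(">"):
            if record_id is not None:
                yield {"id": record_id, "description": description, "sequence": "".join(chunks)}
            record_id, _, description = line[1:].partition(" ")
            chunks = []
        elif line:
            chunks.append(line)  # sequence lines are wrapped; join them
    if record_id is not None:
        yield {"id": record_id, "description": description, "sequence": "".join(chunks)}


def read_fasta_gz(path: str) -> Iterator[dict]:
    """Stream records from a gzip-compressed FASTA file such as uniprot_sprot.fasta.gz."""
    with gzip.open(path, "rt") as handle:
        yield from read_fasta(handle)


# Tiny in-memory example with made-up records.
example = ">sp|X00001|TEST1_HUMAN First test protein\nMKT\nAYI\n>sp|X00002|TEST2_HUMAN Second\nGG\n"
for record in read_fasta(io.StringIO(example)):
    print(record["id"], len(record["sequence"]))  # sp|X00001|TEST1_HUMAN 6, then sp|X00002|TEST2_HUMAN 2
```

    Streaming one record at a time keeps memory use flat, which matters for a file with hundreds of thousands of entries.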

  11. SWISS-MODEL Homology Protein Models for Proteome UP000000589 - (Mus...

    • swissmodel.expasy.org
    gz
    Updated Sep 16, 2016
    More versions available
    Citation: (2016). SWISS-MODEL Homology Protein Models for Proteome UP000000589 - (Mus musculus) [Dataset]. https://swissmodel.expasy.org/repository/species/10090

    Description

    SWISS-MODEL homology protein models mapping to UniProtKB Proteome UP000000589 (Mus musculus)

  12. Data from: PROSITE

    • prosite.expasy.org
    • the-mouth.com
    • +7 more
    Updated Jun 18, 2025
    Citation: (2025). PROSITE [Dataset]. https://prosite.expasy.org/

    Description

    PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.

  13. Number of human protein variations collected from the UniProt/Swiss-Prot...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Citation: Yongwook Choi; Gregory E. Sims; Sean Murphy; Jason R. Miller; Agnes P. Chan (2023). Number of human protein variations collected from the UniProt/Swiss-Prot database. [Dataset]. http://doi.org/10.1371/journal.pone.0046688.t001
    Dataset provided by: PLOS ONE
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of human protein variations collected from the UniProt/Swiss-Prot database.

  14. CATH-Gene3D

    • ebi.ac.uk
    Updated Oct 21, 2020
    Citation: (2020). CATH-Gene3D [Dataset]. https://www.ebi.ac.uk/interpro/
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CATH-Gene3D database describes protein families and domain architectures in complete genomes. Protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Mapping of predicted structure and sequence domains is undertaken using hidden Markov model libraries representing CATH and Pfam domains. CATH-Gene3D is based at University College London, UK.

  15. Repository URL

    • cinergi.sdsc.edu
    resource url
    Citation: Repository URL [Dataset]. http://cinergi.sdsc.edu/geoportal/rest/metadata/item/323ebc5365ec476ebdcb92329cf10b57/html

    Description

    Link Function: information

  16. SwissProt-EC-leaf

    • huggingface.co
    Updated Jun 30, 2022
    More versions available
    Citation: LightOn AI (2022). SwissProt-EC-leaf [Dataset]. https://huggingface.co/datasets/lightonai/SwissProt-EC-leaf
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset authored and provided by: LightOn AI

    Description

    Dataset

    Swiss-Prot is a high-quality, manually annotated protein database. The dataset contains annotations with the functional properties of the proteins. Here we extract proteins with Enzyme Commission (EC) labels. The dataset is ported from ProteInfer: https://github.com/google-research/proteinfer. The leaf-level EC labels are extracted and indexed; the mapping is provided in idx_mapping.json. Proteins without leaf-level EC tags are removed.

    Example

    The protein Q87BZ2 have… See the full description on the dataset page: https://huggingface.co/datasets/lightonai/SwissProt-EC-leaf.
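    EC numbers have four levels (class, subclass, sub-subclass, serial number), and partially specified numbers use '-' for the missing levels, as in 3.4.-.-. Assuming "leaf-level" means a fully specified four-level EC number, the filtering and indexing described above could look like the sketch below. The input format, the function names, and the {label: index} layout for idx_mapping.json are assumptions; the actual file may be organized differently.

```python
import json


def is_leaf_ec(ec: str) -> bool:
    """True for a fully specified EC number such as '2.7.11.1' (no '-' levels)."""
    parts = ec.split(".")
    return len(parts) == 4 and all(p and p != "-" for p in parts)


def build_leaf_ec_dataset(annotations: dict[str, list[str]]):
    """Keep leaf-level EC labels, index them, and drop proteins left without any.

    annotations: protein accession -> list of EC numbers (hypothetical input format).
    Returns (dataset, idx_mapping), where dataset maps accession -> sorted label indices.
    """
    leaf_labels = sorted({ec for ecs in annotations.values() for ec in ecs if is_leaf_ec(ec)})
    idx_mapping = {ec: i for i, ec in enumerate(leaf_labels)}
    dataset = {}
    for accession, ecs in annotations.items():
        indices = sorted({idx_mapping[ec] for ec in ecs if ec in idx_mapping})
        if indices:  # proteins without leaf-level EC tags are removed
            dataset[accession] = indices
    return dataset, idx_mapping


# Made-up annotations: P2 only has a partial EC number, so it is dropped.
dataset, idx_mapping = build_leaf_ec_dataset(
    {"P1": ["2.7.11.1", "2.7.-.-"], "P2": ["3.4.-.-"], "P3": ["1.1.1.1"]}
)
print(dataset)                  # {'P1': [1], 'P3': [0]}
print(json.dumps(idx_mapping))  # {"1.1.1.1": 0, "2.7.11.1": 1}
```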

  17. PIRSF

    • ebi.ac.uk
    Updated Apr 7, 2020
    Citation: (2020). PIRSF [Dataset]. https://www.ebi.ac.uk/interpro/
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PIRSF protein classification system is a network with multiple levels of sequence diversity, from superfamilies to subfamilies, that reflects the evolutionary relationships of full-length proteins and domains. PIRSF is based at the Protein Information Resource, Georgetown University Medical Centre, Washington DC, US.

  18. Proteome UP000000625 - (Escherichia coli) SWISS-MODEL dataset

    • swissmodel.expasy.org
    gz
    Updated Jul 15, 2025
    More versions available
    Citation: (2016). Proteome UP000000625 - (Escherichia coli) SWISS-MODEL dataset [Dataset]. https://swissmodel.expasy.org/repository

    Description

    SWISS-MODEL homology models mapping to UniProtKB Proteome UP000000625 (Escherichia coli)

  19. SUPERFAMILY

    • ebi.ac.uk
    Updated Nov 8, 2010
    Citation: (2010). SUPERFAMILY [Dataset]. https://www.ebi.ac.uk/interpro/
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent the entire SCOP superfamily that the domain belongs to. SUPERFAMILY is based at the University of Bristol, UK.

  20. PSSH2 - database of protein sequence-to-structure homologies (including...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 11, 2022
    Citation: Sandeep Kaur (2022). PSSH2 - database of protein sequence-to-structure homologies (including Sars-CoV-2 structures) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4279163
    Dataset provided by: Sean O'Donoghue; Neblina Sikta; Andrea Schafferhans; Sandeep Kaur
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protein sequence and structure data

    This data set contains data from Uniprot (in the files called protein_sequence, protein_synonyms, protein_names, organism_synonyms) and PDB (in the files called PDB and PDB_chain) as used by the Aquaria web resource at the time of download (2022-02-08).

    The PSSH2 data set

    PSSH2 is a database of protein sequence-to-structure homologies based on HHblits, an alignment method employing iterative comparisons of hidden Markov models (HMMs). To ensure the highest possible final alignment quality for matches in Aquaria using HHblits, we first calculate HMM profiles for each unique PDB sequence (PDB_full) and also for each unique Swiss-Prot sequence. We generated PSSH2 using HHblits to find similarities between HMMs from PDB and HMMs from UniProt sequences.

    Calculating PSSH2

    The Swiss-Prot and PDB data were downloaded in November 2021. Generating PSSH2: we used UniRef30_2021_03 (originally called UniRef30_2021_06) from HH-suite, a database of non-redundant UniProt sequence clusters in which the highest pairwise sequence identity between clusters was 30%. The HHblits code and the code for running the calculations were retrieved from Git (https://github.com/soedinglab/hh-suite.git and https://github.com/aschafu/PSSH2.git, respectively) at the respective time of calculation, up until December 2021.

    PDB based sequence-to-structure alignments

    In addition to the PSSH2 data, new PDB structures were retrieved based on the primary accession of the proteins, by querying for all chains in all PDB entries with exact matches using the sequence cross-reference records given in PDB. Sequence-to-structure alignments were then created, again based on information provided in each PDB entry. These are contained in the PDB_chain data.

    This data covers sequences and PDB structures in the timeframe until February 2022.

    Evaluating PSSH2

    The resulting alignment data was analysed using CATH domain assignments downloaded from /cath/releases/all-releases/v4_2_0/cath-classification-data/ to define correct hits and false hits:

    The set of query sequences is defined by the CATH non-redundant S40_overlap_60 dataset (ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/all-releases/v4_2_0/non-redundant-data-sets/)

    The set of expected hits comprises all PDB structures containing a domain with the same CATH code, either if contained in the set of processed sequences (-> all) or only if also contained in the set of non-redundant sequences (-> nr40).

    The set of true positives is defined by sharing the same CATH code up to the level of homology ("CATH") or up to the level of topology ("CAT").

    The data was evaluated with respect to false discovery rate (FDR) and recall (true positive rate TPR) by cumulatively considering all hits with an E-value below the threshold ("C") or in bins with an E-value between the threshold and one tenth of the threshold ("B"). This evaluation was carried out for the data obtained in November 2021 (202111) as well as previous data from October 2020 (202010), February 2020 (202002) and September 2017 (201709). The results are collected in PSSH CATH validation.csv.
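    The evaluation above reduces to counting true and false positives among hits selected by E-value. A minimal sketch follows, assuming each hit carries one CATH code for the query domain and one for the matched structure, and that the number of expected hits is known in advance. The Hit class, the function names, and the half-open bin boundaries are illustrative choices, not taken from the PSSH2 code.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    evalue: float
    query_cath: str  # CATH code of the query domain, e.g. "3.40.50.300"
    hit_cath: str    # CATH code of the matched PDB domain


def same_cath(a: str, b: str, level: str) -> bool:
    """Compare CATH codes up to topology ('CAT', 3 levels) or homology ('CATH', 4 levels)."""
    depth = {"CAT": 3, "CATH": 4}[level]
    return a.split(".")[:depth] == b.split(".")[:depth]


def evaluate(hits, n_expected: int, threshold: float, mode: str = "C", level: str = "CATH"):
    """Return (FDR, recall) for hits selected cumulatively ('C') or in one bin ('B').

    'C': all hits with E < threshold.
    'B': hits with threshold / 10 <= E < threshold.
    """
    if mode == "C":
        selected = [h for h in hits if h.evalue < threshold]
    elif mode == "B":
        selected = [h for h in hits if threshold / 10 <= h.evalue < threshold]
    else:
        raise ValueError("mode must be 'C' or 'B'")
    tp = sum(same_cath(h.query_cath, h.hit_cath, level) for h in selected)
    fp = len(selected) - tp
    fdr = fp / len(selected) if selected else 0.0   # FP / (TP + FP)
    recall = tp / n_expected if n_expected else 0.0  # TPR = TP / expected hits
    return fdr, recall


# Made-up hits; the second one matches only up to topology (CAT).
hits = [
    Hit(1e-30, "3.40.50.300", "3.40.50.300"),
    Hit(1e-5, "3.40.50.300", "3.40.50.1000"),
    Hit(5e-4, "1.10.10.10", "2.60.40.10"),
    Hit(0.5, "1.10.10.10", "1.10.10.10"),
]
print(evaluate(hits, n_expected=4, threshold=1e-3, mode="C", level="CATH"))
print(evaluate(hits, n_expected=4, threshold=1e-3, mode="C", level="CAT"))
```

    Sweeping the threshold over several orders of magnitude and collecting the (FDR, recall) pairs would give curves of the kind summarized in the validation file.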

    Known errors

    Due to a processing error, the profile of PDB structure 5fia A / B (sequence md5 052667679fc644184f40063c7602c9e1) is incomplete in the pdb_full HHblits database, which led to further errors in generating sequence-based alignments for 1vtm P (sequence md5 c844aff103449363cb8489c78c58ebf1) and 434t A / B (sequence md5 d67aa1c3a36492c719cb48b5e7ecc624).
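    To check whether a given sequence is one of the affected entries, its md5 checksum can be compared against the three values listed above. The sketch assumes the checksum is computed over the bare uppercase residue string encoded as ASCII, with no header or line breaks; the description does not state this convention, so verify it against a known entry before relying on it. KNOWN_BAD_MD5, sequence_md5 and known_error are hypothetical names.

```python
import hashlib
from typing import Optional

# Sequence md5 checksums listed under "Known errors".
KNOWN_BAD_MD5 = {
    "052667679fc644184f40063c7602c9e1": "5fia A / B (incomplete profile)",
    "c844aff103449363cb8489c78c58ebf1": "1vtm P",
    "d67aa1c3a36492c719cb48b5e7ecc624": "434t A / B",
}


def sequence_md5(sequence: str) -> str:
    """md5 hex digest of a sequence, ASSUMING the bare uppercase residue
    string (whitespace removed, ASCII-encoded) is what was hashed."""
    cleaned = "".join(sequence.split()).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def known_error(sequence: str) -> Optional[str]:
    """Return a label if the sequence's md5 is in the known-error list, else None."""
    return KNOWN_BAD_MD5.get(sequence_md5(sequence))
```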
