100+ datasets found
  1. UniProt SPROT

    • kaggle.com
    zip
    Updated Dec 10, 2022
    Cite
    Apollo (2022). UniProt SPROT [Dataset]. https://www.kaggle.com/datasets/luckyapollo/uniprot-sprot
    Explore at:
    Available download formats: zip (838310998 bytes)
    Dataset updated
    Dec 10, 2022
    Authors
    Apollo
    Description

    UniProtKB/Swiss-Prot is the expertly curated component of UniProtKB (produced by the UniProt consortium). It contains hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants.

    The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc).

    The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available, expertly and manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced within the framework of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High-level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotation of newly available proteins through automatic pipelines.

  2. UniProt Protein

    • bioregistry.io
    Updated Apr 26, 2021
    + more versions
    Cite
    (2021). UniProt Protein [Dataset]. http://identifiers.org/wikidata:P352
    Explore at:
    Dataset updated
    Apr 26, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. Besides amino acid sequence and a description, it also provides taxonomic data and citation information.

  3. UniProt

    • dknet.org
    • neuinfo.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). UniProt [Dataset]. http://identifiers.org/RRID:SCR_002380
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Collection of data of protein sequence and functional information. Resource for protein sequence and annotation data. Consortium for preservation of the UniProt databases: UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), and UniProt Archive (UniParc), UniProt Proteomes. Collaboration between European Bioinformatics Institute (EMBL-EBI), SIB Swiss Institute of Bioinformatics and Protein Information Resource. Swiss-Prot is a curated subset of UniProtKB.

  4. UniProt Proteins Reviewed (Swiss-Prot)

    • kaggle.com
    zip
    Updated Aug 6, 2022
    Cite
    Andrey Lovyagin (2022). UniProt Proteins Reviewed (Swiss-Prot) [Dataset]. https://www.kaggle.com/datasets/andreylovyagin/uniprot-proteins-reviewed-swissprot
    Explore at:
    Available download formats: zip (479163007 bytes)
    Dataset updated
    Aug 6, 2022
    Authors
    Andrey Lovyagin
    Description

    Uploaded UniProt reviewed proteins database with all columns, for easier use in Kaggle notebooks. All columns have a description, but if you have any questions, you can check UniProt Help, where every column has a full explanation.

    For UniProt Species Proteomes check this dataset.

    License: Creative Commons Attribution 4.0 International (CC BY 4.0) License

  5. UniProtKB

    • data.wu.ac.at
    api/sparql
    Updated Jul 30, 2016
    Cite
    Linking Open Data Cloud (2016). UniProtKB [Dataset]. https://data.wu.ac.at/odso/datahub_io/YWIwYTQ0ZjMtYzY0Mi00MmM5LWFiODItNDgxOWQ1ZTMzNDNm
    Explore at:
    Available download formats: api/sparql (20.0)
    Dataset updated
    Jul 30, 2016
    Dataset provided by
    Linking Open Data Cloud
    Description

    UniProtKB is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data.

  6. uniprot-database_(type_eggnog).27.09.2019.tab.rar

    • figshare.com
    application/x-rar
    Updated Jun 24, 2020
    Cite
    Daniel Kumazawa Morais (2020). uniprot-database_(type_eggnog).27.09.2019.tab.rar [Dataset]. http://doi.org/10.6084/m9.figshare.12555425.v1
    Explore at:
    Available download formats: application/x-rar
    Dataset updated
    Jun 24, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Daniel Kumazawa Morais
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The current database was downloaded on 27.09.2019 and has the data fields (columns) described below:

     1 Entry
     2 Entry name
     3 Status
     4 Protein names
     5 Gene names
     6 Organism
     7 Length
     8 Cross-reference (KO)
     9 Taxonomic lineage (PHYLUM)
    10 Taxonomic lineage (SPECIES) (this field carries current and old* taxonomic classifications)
    11 Taxonomic lineage (GENUS)
    12 Taxonomic lineage (KINGDOM)
    13 Taxonomic lineage (SUPERKINGDOM)
    14 Cross-reference (OrthoDB)
    15 Cross-reference (eggNOG)

    * Details about the classification used in UNIPROT can be found at the link: https://www.uniprot.org/help/taxonomy
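    A table with the columns listed above could be read roughly as follows. This is only a sketch: it assumes the unpacked file is tab-separated and has a header row using those column names, which the description does not confirm. The sample rows, accessions, and the helper name read_entries are hypothetical and for illustration only.

```python
import csv
import io
from collections import Counter

# Hypothetical two-row excerpt with made-up values. The real file is ASSUMED
# to be tab-separated with a header row matching the listed column names.
SAMPLE = (
    "Entry\tEntry name\tStatus\tOrganism\tLength\tCross-reference (eggNOG)\n"
    "X00001\tEXMP1_TEST\treviewed\tExample organism A\t120\tENOG0001;\n"
    "X00002\tEXMP2_TEST\tunreviewed\tExample organism B\t85\t\n"
)


def read_entries(handle):
    """Yield one dict per row, converting the Length column to int."""
    for row in csv.DictReader(handle, delimiter="\t"):
        row["Length"] = int(row["Length"])
        yield row


entries = list(read_entries(io.StringIO(SAMPLE)))

# Count rows per review status and collect entries that carry an eggNOG reference.
status_counts = Counter(row["Status"] for row in entries)
with_eggnog = [row["Entry"] for row in entries if row["Cross-reference (eggNOG)"]]

print(status_counts)
print(with_eggnog)
```

    For a real file, the io.StringIO object would be replaced by an open file handle; streaming rows through a generator keeps memory use low for large tables.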

  7. The Universal Protein Resource (UniProt)

    • catalog.data.gov
    • data.virginia.gov
    • +1more
    Updated Jul 26, 2023
    + more versions
    Cite
    National Institutes of Health (NIH) (2023). The Universal Protein Resource (UniProt) [Dataset]. https://catalog.data.gov/dataset/the-universal-protein-resource-uniprot
    Explore at:
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    National Institutes of Health (NIH)
    Description

    The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc).

  8. UniProtKB

    • dknet.org
    Updated Oct 24, 2019
    Cite
    (2019). UniProtKB [Dataset]. http://identifiers.org/RRID:SCR_004426
    Explore at:
    Dataset updated
    Oct 24, 2019
    Description

    Central repository for collection of functional information on proteins, with accurate and consistent annotation. In addition to capturing core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and experimental and computational data. The UniProt Knowledgebase consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. UniProtKB/Swiss-Prot (reviewed) is a high quality manually annotated and non-redundant protein sequence database which brings together experimental results, computed features, and scientific conclusions. UniProtKB/TrEMBL (unreviewed) contains protein sequences associated with computationally generated annotation and large-scale functional characterization that await full manual annotation. Users may browse by taxonomy, keyword, gene ontology, enzyme class or pathway.

  9. UniprotKB/SwissProt

    • resodate.org
    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    Boutet; Lieberherr; Tognolli; Schneider; Bansal; Bridge; Poux; Bougueleret; Xenarios (2024). UniprotKB/SwissProt [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdW5pcHJvdGtiLXN3aXNzcHJvdA==
    Explore at:
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Boutet; Lieberherr; Tognolli; Schneider; Bansal; Bridge; Poux; Bougueleret; Xenarios
    Description

    The UniprotKB/SwissProt database contains protein sequence information.

  10. UniProtKB/Swiss-Prot

    • dknet.org
    • neuinfo.org
    • +2more
    Updated Dec 25, 2023
    Cite
    (2023). UniProtKB/Swiss-Prot [Dataset]. http://identifiers.org/RRID:SCR_021164
    Explore at:
    Dataset updated
    Dec 25, 2023
    Description

    Curated component of UniProtKB (produced by the UniProt consortium). It contains hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants.

  11. UniProt Isoform

    • bioregistry.io
    Updated Dec 18, 2021
    Cite
    (2021). UniProt Isoform [Dataset]. http://identifiers.org/biolink:UNIPROT.ISOFORM
    Explore at:
    Dataset updated
    Dec 18, 2021
    Description

    The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. This collection is a subset of UniProtKB, and provides a means to reference isoform information.

  12. UniProtKB

    • ebi.ac.uk
    Updated Oct 14, 2020
    Cite
    (2020). UniProtKB [Dataset]. http://www.ebi.ac.uk/interpro/protein/unreviewed/entry/InterPro/
    Explore at:
    Dataset updated
    Oct 14, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of the type protein from the database UniProtKB - version 2021_04

  13. Isoelectric point for all UniProtKB/TrEMBL proteins April 2016

    • repod.icm.edu.pl
    • commons.datacite.org
    7z, bin
    Updated May 18, 2016
    Cite
    Kozlowski, Lukasz (2016). Isoelectric point for all UniProtKB/TrEMBL proteins April 2016 [Dataset]. http://doi.org/10.18150/repod.9948646
    Explore at:
    Available download formats: 7z (11492396457), bin (11492396457)
    Dataset updated
    May 18, 2016
    Dataset provided by
    RepOD
    Authors
    Kozlowski, Lukasz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predicted isoelectric point for all UniProtKB/TrEMBL proteins (April 2016), computed using 18 different algorithms. Over 63 million protein sequences. Compressed using 7zip.
    Primary reference: Kozlowski, LP (2016) Proteome-pI: proteome isoelectric point database. Nucleic Acids Research doi: 10.1093/nar/gkw978
    www: http://isoelectricpointdb.org

  14. UniProt Resource

    • bioregistry.io
    Updated Feb 26, 2022
    Cite
    (2022). UniProt Resource [Dataset]. https://bioregistry.io/uniprot.resource
    Explore at:
    Dataset updated
    Feb 26, 2022
    Description

    The cross-references section of UniProtKB entries displays explicit and implicit links to databases such as nucleotide sequence databases, model organism databases and genomics and proteomics resources.

  15. UniProtKB/Swiss-Prot Protein Embeddings

    • kaggle.com
    zip
    Updated Apr 23, 2023
    Cite
    Dan Ofer (2023). UniProtKB/Swiss-Prot Protein Embeddings [Dataset]. https://www.kaggle.com/datasets/danofer/uniprotkbswiss-prot-protein-embeddings/code
    Explore at:
    Available download formats: zip (2087271680 bytes)
    Dataset updated
    Apr 23, 2023
    Authors
    Dan Ofer
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The description that follows is from the official UniProt embeddings page, which is also the original host of this dataset.

    Protein embeddings are a way to encode functional and structural properties of a protein, mostly from its sequence only, in a machine-friendly format (vector representation). Generating such embeddings is computationally expensive, but once computed they can be leveraged for different tasks, such as sequence similarity search, sequence clustering, and sequence classification.

    UniProt provided raw embeddings (mean pooled, per-protein using the ProtT5 model) for UniProtKB/Swiss-Prot.

    Note: Protein sequences longer than 12k residues are excluded due to limitation of GPU memory (this concerns only a handful of proteins).

    Sample code

    The embeddings.h5 files store the embeddings as key-value pairs. The key is the protein accession number and the value is the embedding vector. The following code snippet shows how to read and iterate over an embeddings file in Python.

    import h5py
    import numpy as np

    # Open the HDF5 file read-only; each key is a protein accession.
    with h5py.File("path/to/embeddings.h5", "r") as file:
        print(f"number of entries: {len(file)}")
        for sequence_id, embedding in file.items():
            # Load the dataset into memory as a NumPy array before reducing it.
            vector = np.array(embedding)
            print(
                f" id: {sequence_id}, "
                f" embeddings shape: {vector.shape}, "
                f" embeddings mean: {vector.mean()}"
            )
    

    Sample output (SARS-CoV-2 embeddings from release 2022_04) per-protein file:

    number of entries: 17
    id: A0A663DJA2, embeddings shape: (1024,), embeddings mean: 0.0006136894226074219
    id: P0DTC1, embeddings shape: (1024,), embeddings mean: 0.0011968612670898438
    id: P0DTC2, embeddings shape: (1024,), embeddings mean: 0.001041412353515625

    SOURCE:
    https://www.uniprot.org/help/embeddings
    https://www.uniprot.org/help/downloads#embeddings
    Reviewed (Swiss-Prot) - per-protein: https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/embeddings/uniprot_sprot/per-protein.h5
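
    The description names sequence similarity search as one use of per-protein embeddings. A minimal sketch of that idea ranks proteins by cosine similarity to a query vector. Cosine similarity is an assumption here (the source does not say which metric to use), the function names are hypothetical, and the three-dimensional toy vectors stand in for real 1024-dimensional ProtT5 embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_id, embeddings):
    """Return (accession, similarity) pairs, most similar first, excluding the query."""
    query = embeddings[query_id]
    scores = [
        (accession, cosine_similarity(query, vector))
        for accession, vector in embeddings.items()
        if accession != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-in vectors with made-up values, NOT real embeddings.
toy = {
    "protein_a": [1.0, 0.0, 0.0],
    "protein_b": [0.9, 0.1, 0.0],
    "protein_c": [0.0, 1.0, 0.0],
}

ranking = rank_by_similarity("protein_a", toy)
print(ranking)
```

    With a real embeddings file, the dictionary could be filled from the HDF5 key-value pairs shown in the sample code, converting each stored vector to a list or array first.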

  16. NEWT

    • dknet.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). NEWT [Dataset]. http://identifiers.org/RRID:SCR_004477
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    NEWT is the taxonomy database maintained by the UniProt group. It integrates taxonomy data compiled in the NCBI database and data specific to the UniProt Knowledgebase. Browse by hierarchy, List all, or Complete proteomes. Organisms are classified in a hierarchical tree structure. Our taxonomy database contains every node (taxon) of the tree. UniProtKB taxonomy data is manually curated: next to manually verified organism names, we provide a selection of external links, organism strains and viral host information. Species with protein sequences stored in the UniProt Knowledgebase are named according to UniProt nomenclature. We endeavour to maintain a list of manually curated species names for which protein sequence data is available. In particular, we have adopted a systematic convention for naming viral and bacterial strains and isolates. Links to external sites are chosen by the UniProt taxonomy team and show pictures and various scientific data of interest (taxonomy, biology, physiology,...).

  17. Protein-centric rate of sequence evolution according to Rate4Site on...

    • figshare.com
    txt
    Updated Feb 9, 2021
    Cite
    emmanuel levy; Benjamin Dubreuil (2021). Protein-centric rate of sequence evolution according to Rate4Site on orthogroups of 14 fungal species [Dataset]. http://doi.org/10.6084/m9.figshare.13735537.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 9, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    emmanuel levy; Benjamin Dubreuil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overall, 25 descriptors (features) are calculated for 3797 unique proteins. The legend for each descriptor is given in the associated header file.

    Columns 1-5 provide protein identifiers:
    - ORF
    - SGD Gene Name
    - UniprotKB
    - Matching PDB structure?
    - PDB code of closest structure

    Columns 6-8 correspond to protein expression:
    - Integrated abundance in ppm
    - log10 abundance
    - bins of abundance (5 bins)

    Columns 9-16 contain evolutionary rates averaged over:
    - Full sequence
    - Disordered residues
    - Not Disordered residues
    - Domain residues
    - Not Domain residues
    - Residues with PDB coordinates
    - Surface residues (>25% relative ASA)
    - Buried residues (

  18. Data from: UniSave

    • dknet.org
    • scicrunch.org
    • +2more
    Updated Jan 4, 2026
    Cite
    (2026). UniSave [Dataset]. http://identifiers.org/RRID:SCR_004946
    Explore at:
    Dataset updated
    Jan 4, 2026
    Description

    The UniProtKB Sequence/Annotation Version Archive (UniSave) is a repository of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entry versions. Entries can be retrieved by entering a primary accession number or an entry name and pressing the Go! button. The result of the query is a list of entry versions with the UniProtKB database name, entry status, primary accession number, entry name, entry version, sequence version, release number and the release date, ordered by the release date, the latest version first. The entry version status can be 'incorporated', 'active', 'changed', 'replaced' or 'deleted'. An incorporated entry version is the first entry version added into UniProtKB, an active entry version is part of the latest public release, a changed entry version has been superseded by a newer entry version, a replaced entry has become secondary to another entry, and a deleted entry has been removed from the UniProtKB without becoming secondary to any other entry. For replaced entry versions, the status 'Replaced' can be clicked to return all entries, which have the given entry as a secondary entry. If a date is provided as part of the query then only the version of the entry that was current at that date is displayed. Entries can be viewed by clicking 'View' in the query results table. The '>' links can be used to access the earlier and later entry versions. The 'Back to List' link returns the user to the query results table. Selecting 'UniProtKB' or 'Fasta' and pressing 'Save' downloads the entry in flat file or fasta format. Comparison between entry versions is straightforward: selecting two entries and clicking the 'Compare Selected' button will show the differences between the two entries. Whenever comparisons are made a Smith-Waterman sequence alignment is computed using SSEARCH, and displayed at the bottom of the entry. The actual alignment is displayed only when the sequences are not identical.

  19. UniProtKB Subcellular Locations

    • dknet.org
    • rrid.site
    • +2more
    Updated Mar 28, 2025
    Cite
    (2025). UniProtKB Subcellular Locations [Dataset]. http://identifiers.org/RRID:SCR_004373
    Explore at:
    Dataset updated
    Mar 28, 2025
    Description

    The subcellular locations in which a protein is found are described in UniProtKB entries with a controlled vocabulary, which includes also membrane topology and orientation terms. You may search in subcellular locations or list them all along with their definitions (490). By default, searching the subcellular locations will look for matches in both name and definition.

  20. uniprot

    • huggingface.co
    Updated Dec 18, 2025
    Cite
    CleverThis (2025). uniprot [Dataset]. https://huggingface.co/datasets/CleverThis/uniprot
    Explore at:
    Dataset updated
    Dec 18, 2025
    Dataset authored and provided by
    CleverThis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    UniProt RDF

      Dataset Description
    

    Comprehensive protein knowledgebase with functional annotations Original Source: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/uniprotkb_reviewed_eukaryota_opisthokonta_metazoa_33208_0.rdf.xz

      Dataset Summary
    

    This dataset contains RDF triples from UniProt RDF converted to HuggingFace dataset format for easy use in machine learning pipelines.

    Format: Originally rdf, converted to HuggingFace Dataset Size: 0.392 GB… See the full description on the dataset page: https://huggingface.co/datasets/CleverThis/uniprot.

UniProt SPROT

Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot)

Explore at:
151 scholarly articles cite this dataset (View in Google Scholar)