100+ datasets found

UniProt SPROT
kaggle.com
zip
Updated Dec 10, 2022
Cite
Apollo (2022). UniProt SPROT [Dataset]. https://www.kaggle.com/datasets/luckyapollo/uniprot-sprot
Explore at:
Available download formats: zip (838310998 bytes)
Dataset updated
Dec 10, 2022
Authors
Apollo
Description
UniProtKB/Swiss-Prot is the expertly curated component of UniProtKB (produced by the UniProt consortium). It contains hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants.

The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc).

The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available, expertly and manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced within the framework of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High-level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotation of newly available proteins through automatic pipelines.
UniProt Protein
bioregistry.io
Updated Apr 26, 2021
+ more versions
Cite
(2021). UniProt Protein [Dataset]. http://identifiers.org/wikidata:P352
Explore at:
Unique identifier
https://identifiers.org/wikidata:P352, https://identifiers.org/biolink:UniProtKB, https://identifiers.org/re3data:r3d100011521
Dataset updated
Apr 26, 2021
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. Besides amino acid sequence and a description, it also provides taxonomic data and citation information.
UniProt
dknet.org
neuinfo.org
+2more
Updated Jan 29, 2022
Cite
(2022). UniProt [Dataset]. http://identifiers.org/RRID:SCR_002380
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002380
Dataset updated
Jan 29, 2022
Description
Collection of data of protein sequence and functional information. Resource for protein sequence and annotation data. Consortium for preservation of the UniProt databases: UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), and UniProt Archive (UniParc), UniProt Proteomes. Collaboration between European Bioinformatics Institute (EMBL-EBI), SIB Swiss Institute of Bioinformatics and Protein Information Resource. Swiss-Prot is a curated subset of UniProtKB.
UniProt Proteins Reviewed (Swiss-Prot)
kaggle.com
zip
Updated Aug 6, 2022
Cite
Andrey Lovyagin (2022). UniProt Proteins Reviewed (Swiss-Prot) [Dataset]. https://www.kaggle.com/datasets/andreylovyagin/uniprot-proteins-reviewed-swissprot
Explore at:
Available download formats: zip (479163007 bytes)
Dataset updated
Aug 6, 2022
Authors
Andrey Lovyagin
Description
The UniProt reviewed proteins database, uploaded with all columns for easier use in Kaggle notebooks. All columns have a description; if you have any questions, check UniProt Help, where every column has a full explanation.

For UniProt Species Proteomes check this dataset.

License: Creative Commons Attribution 4.0 International (CC BY 4.0) License
UniProtKB
data.wu.ac.at
api/sparql
Updated Jul 30, 2016
Cite
Linking Open Data Cloud (2016). UniProtKB [Dataset]. https://data.wu.ac.at/odso/datahub_io/YWIwYTQ0ZjMtYzY0Mi00MmM5LWFiODItNDgxOWQ1ZTMzNDNm
Explore at:
Available download formats: api/sparql (20.0)
Dataset updated
Jul 30, 2016
Dataset provided by
Linking Open Data Cloud
Description
UniProtKB is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data.
uniprot-database_(type_eggnog).27.09.2019.tab.rar
figshare.com
application/x-rar
Updated Jun 24, 2020
Cite
Daniel Kumazawa Morais (2020). uniprot-database_(type_eggnog).27.09.2019.tab.rar [Dataset]. http://doi.org/10.6084/m9.figshare.12555425.v1
Explore at:
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.12555425.v1
Dataset updated
Jun 24, 2020
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Daniel Kumazawa Morais
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The current database was downloaded on 27.09.2019 and has the data fields (columns) described below:
# 1 Entry
# 2 Entry name
# 3 Status
# 4 Protein names
# 5 Gene names
# 6 Organism
# 7 Length
# 8 Cross-reference (KO)
# 9 Taxonomic lineage (PHYLUM)
# 10 Taxonomic lineage (SPECIES) (this field carries current and old* taxonomic classifications)
# 11 Taxonomic lineage (GENUS)
# 12 Taxonomic lineage (KINGDOM)
# 13 Taxonomic lineage (SUPERKINGDOM)
# 14 Cross-reference (OrthoDB)
# 15 Cross-reference (eggNOG)
*Details about the classification used in UniProt can be found at: https://www.uniprot.org/help/taxonomy
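As a rough illustration of how the 15 columns listed above might be consumed, the sketch below reads a tab-separated file into dictionaries and counts entries per phylum. It assumes the extracted `.tab` file is tab-delimited with one header row, which the description does not state explicitly. The helper names (`read_rows`, `count_by_phylum`, `placeholder_row`) and the sample values (`ENTRY_A`, `PhylumX`, ...) are invented placeholders, not real UniProt data.

```python
import csv
import io
from collections import Counter

# Column names copied from the dataset description, in order.
COLUMNS = [
    "Entry", "Entry name", "Status", "Protein names", "Gene names",
    "Organism", "Length", "Cross-reference (KO)",
    "Taxonomic lineage (PHYLUM)", "Taxonomic lineage (SPECIES)",
    "Taxonomic lineage (GENUS)", "Taxonomic lineage (KINGDOM)",
    "Taxonomic lineage (SUPERKINGDOM)", "Cross-reference (OrthoDB)",
    "Cross-reference (eggNOG)",
]


def read_rows(handle):
    """Yield one dict per data row, keyed by the 15 column names."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # assumption: the first line is a header row
    for fields in reader:
        yield dict(zip(COLUMNS, fields))


def count_by_phylum(rows):
    """Count how many rows fall into each phylum."""
    return Counter(row["Taxonomic lineage (PHYLUM)"] for row in rows)


def placeholder_row(entry, phylum):
    """Build a tab-separated line with only two fields filled (illustration only)."""
    values = {name: "" for name in COLUMNS}
    values["Entry"] = entry
    values["Taxonomic lineage (PHYLUM)"] = phylum
    return "\t".join(values[name] for name in COLUMNS)


# Invented placeholder content standing in for the real file.
sample_text = "\n".join([
    "\t".join(COLUMNS),
    placeholder_row("ENTRY_A", "PhylumX"),
    placeholder_row("ENTRY_B", "PhylumX"),
    placeholder_row("ENTRY_C", "PhylumY"),
]) + "\n"

rows = list(read_rows(io.StringIO(sample_text)))
phylum_counts = count_by_phylum(rows)
```

With the real file, `io.StringIO(sample_text)` would be replaced by an open file handle; the archive is a `.rar`, so it has to be extracted first with a separate tool.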
The Universal Protein Resource (UniProt)
catalog.data.gov
data.virginia.gov
+1more
Updated Jul 26, 2023
+ more versions
Cite
National Institutes of Health (NIH) (2023). The Universal Protein Resource (UniProt) [Dataset]. https://catalog.data.gov/dataset/the-universal-protein-resource-uniprot
Explore at:
Dataset updated
Jul 26, 2023
Dataset provided by
National Institutes of Health (NIH)
Description
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc).
UniProtKB
dknet.org
Updated Oct 24, 2019
Cite
(2019). UniProtKB [Dataset]. http://identifiers.org/RRID:SCR_004426
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_004426
Dataset updated
Oct 24, 2019
Description
Central repository for collection of functional information on proteins, with accurate and consistent annotation. In addition to capturing core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and experimental and computational data. The UniProt Knowledgebase consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. UniProtKB/Swiss-Prot (reviewed) is a high quality manually annotated and non-redundant protein sequence database which brings together experimental results, computed features, and scientific conclusions. UniProtKB/TrEMBL (unreviewed) contains protein sequences associated with computationally generated annotation and large-scale functional characterization that await full manual annotation. Users may browse by taxonomy, keyword, gene ontology, enzyme class or pathway.
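The reviewed/unreviewed split described above is also visible in UniProt FASTA headers, which start with `>sp|` for Swiss-Prot entries and `>tr|` for TrEMBL entries. The sketch below parses such a header. The function name `parse_header` is hypothetical, and the `tr` example uses an invented placeholder accession rather than a real entry.

```python
# Map the FASTA database prefix to the UniProtKB section it denotes.
SECTIONS = {
    "sp": "Swiss-Prot (reviewed)",
    "tr": "TrEMBL (unreviewed)",
}


def parse_header(header):
    """Split '>db|accession|entry_name description' into its parts."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header line")
    identifier, _, description = header[1:].partition(" ")
    # Raises ValueError if the identifier does not have exactly three parts.
    db, accession, entry_name = identifier.split("|")
    return {
        "section": SECTIONS.get(db, "unknown"),
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


reviewed = parse_header(">sp|P0DTC2|SPIKE_SARS2 Spike glycoprotein")
# Invented placeholder header, for illustration only.
unreviewed = parse_header(">tr|X0X0X0|X0X0X0_EXAMPLE Placeholder protein")
```

Filtering a mixed FASTA file on the `section` field is one simple way to keep only the manually annotated entries.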
UniprotKB/SwissProt
resodate.org
service.tib.eu
Updated Dec 16, 2024
Cite
Boutet; Lieberherr; Tognolli; Schneider; Bansal; Bridge; Poux; Bougueleret; Xenarios (2024). UniprotKB/SwissProt [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdW5pcHJvdGtiLXN3aXNzcHJvdA==
Explore at:
Dataset updated
Dec 16, 2024
Dataset provided by
Leibniz Data Manager
Authors
Boutet; Lieberherr; Tognolli; Schneider; Bansal; Bridge; Poux; Bougueleret; Xenarios
Description
The UniprotKB/SwissProt database contains protein sequence information.
UniProtKB/Swiss-Prot
dknet.org
neuinfo.org
+2more
Updated Dec 25, 2023
Cite
(2023). UniProtKB/Swiss-Prot [Dataset]. http://identifiers.org/RRID:SCR_021164
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_021164
Dataset updated
Dec 25, 2023
Description
Curated component of UniProtKB (produced by the UniProt consortium). It contains hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants.
UniProt Isoform
bioregistry.io
Updated Dec 18, 2021
Cite
(2021). UniProt Isoform [Dataset]. http://identifiers.org/biolink:UNIPROT.ISOFORM
Explore at:
Unique identifier
https://identifiers.org/biolink:UNIPROT.ISOFORM
Dataset updated
Dec 18, 2021
Description
The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. This collection is a subset of UniProtKB, and provides a means to reference isoform information.
UniProtKB
ebi.ac.uk
Updated Oct 14, 2020
Cite
(2020). UniProtKB [Dataset]. http://www.ebi.ac.uk/interpro/protein/unreviewed/entry/InterPro/
Explore at:
Dataset updated
Oct 14, 2020
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset of the type protein from the database UniProtKB - version 2021_04
Isoelectric point for all UniProtKB/TrEMBL proteins April 2016
repod.icm.edu.pl
commons.datacite.org
7z, bin
Updated May 18, 2016
Cite
Kozlowski, Lukasz (2016). Isoelectric point for all UniProtKB/TrEMBL proteins April 2016 [Dataset]. http://doi.org/10.18150/repod.9948646
Explore at:
Available download formats: 7z (11492396457), bin (11492396457)
Unique identifier
https://doi.org/10.18150/repod.9948646
Dataset updated
May 18, 2016
Dataset provided by
RepOD
Authors
Kozlowski, Lukasz
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Predicted isoelectric point for all UniProtKB/TrEMBL proteins (April 2016), computed using 18 different algorithms. Over 63 million protein sequences. Compressed using 7zip.
Primary reference: Kozlowski, LP (2016) Proteome-pI: proteome isoelectric point database. Nucleic Acids Research doi: 10.1093/nar/gkw978
www: http://isoelectricpointdb.org
UniProt Resource
bioregistry.io
Updated Feb 26, 2022
Cite
(2022). UniProt Resource [Dataset]. https://bioregistry.io/uniprot.resource
Explore at:
Dataset updated
Feb 26, 2022
Description
The cross-references section of UniProtKB entries displays explicit and implicit links to databases such as nucleotide sequence databases, model organism databases and genomics and proteomics resources.
UniProtKB/Swiss-Prot Protein Embeddings
kaggle.com
zip
Updated Apr 23, 2023
Cite
Dan Ofer (2023). UniProtKB/Swiss-Prot Protein Embeddings [Dataset]. https://www.kaggle.com/datasets/danofer/uniprotkbswiss-prot-protein-embeddings/code
Explore at:
Available download formats: zip (2087271680 bytes)
Dataset updated
Apr 23, 2023
Authors
Dan Ofer
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The following description is from the official UniProt embeddings page, which is also the original host of this dataset.

Protein embeddings are a way to encode functional and structural properties of a protein, mostly from its sequence only, in a machine-friendly format (vector representation). Generating such embeddings is computationally expensive, but once computed they can be leveraged for different tasks, such as sequence similarity search, sequence clustering, and sequence classification.

UniProt provided raw embeddings (mean pooled, per-protein using the ProtT5 model) for UniProtKB/Swiss-Prot.

Note: Protein sequences longer than 12k residues are excluded due to limitation of GPU memory (this concerns only a handful of proteins).

Sample code

The embeddings.h5 files store the embeddings as key-value pairs. The key is the protein accession number and the value is the embeddings vector. The following code snippet shows how to read and iterate over an embeddings file in Python.

import numpy as np
import h5py

with h5py.File("path/to/embeddings.h5", "r") as file:
    print(f"number of entries: {len(file.items())}")
    for sequence_id, embedding in file.items():
        print(
            f" id: {sequence_id}, "
            f" embeddings shape: {embedding.shape}, "
            f" embeddings mean: {np.array(embedding).mean()}"
        )

Sample output (SARS-CoV-2 embeddings from release 2022_04) per-protein file:

number of entries: 17
 id: A0A663DJA2, embeddings shape: (1024,), embeddings mean: 0.0006136894226074219
 id: P0DTC1, embeddings shape: (1024,), embeddings mean: 0.0011968612670898438
 id: P0DTC2, embeddings shape: (1024,), embeddings mean: 0.001041412353515625

SOURCE:
https://www.uniprot.org/help/embeddings
https://www.uniprot.org/help/downloads#embeddings
Reviewed (Swiss-Prot) - per-protein: https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/embeddings/uniprot_sprot/per-protein.h5
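The description mentions sequence similarity search as one use of these embeddings. A minimal sketch of that idea follows, ranking proteins by cosine similarity to a query. Cosine similarity is an assumed choice here, since the source does not name a metric. The function names are hypothetical, and the 3-dimensional vectors are toy values standing in for the real 1024-dimensional embeddings; only the accession strings come from the sample output above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_id, embeddings):
    """Return (accession, similarity) pairs for all other entries, most similar first."""
    query = embeddings[query_id]
    scores = [
        (accession, cosine_similarity(query, vector))
        for accession, vector in embeddings.items()
        if accession != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for illustration; they are not real embedding values.
toy_embeddings = {
    "P0DTC1": [1.0, 0.0, 0.0],
    "P0DTC2": [0.9, 0.1, 0.0],
    "A0A663DJA2": [0.0, 1.0, 0.0],
}

ranking = rank_by_similarity("P0DTC1", toy_embeddings)
```

For the full Swiss-Prot file a vectorized approach (for example, a normalized NumPy matrix and a single matrix-vector product) would likely be much faster than this pure-Python loop.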
NEWT
dknet.org
scicrunch.org
+2more
Updated Jan 29, 2022
Cite
(2022). NEWT [Dataset]. http://identifiers.org/RRID:SCR_004477
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_004477
Dataset updated
Jan 29, 2022
Description
NEWT is the taxonomy database maintained by the UniProt group. It integrates taxonomy data compiled in the NCBI database and data specific to the UniProt Knowledgebase. Browse by hierarchy, List all, or Complete proteomes. Organisms are classified in a hierarchical tree structure. Our taxonomy database contains every node (taxon) of the tree. UniProtKB taxonomy data is manually curated: next to manually verified organism names, we provide a selection of external links, organism strains and viral host information. Species with protein sequences stored in the UniProt Knowledgebase are named according to UniProt nomenclature. We endeavour to maintain a list of manually curated species names for which protein sequence data is available. In particular, we have adopted a systematic convention for naming viral and bacterial strains and isolates. Links to external sites are chosen by the UniProt taxonomy team and show pictures and various scientific data of interest (taxonomy, biology, physiology,...).
Protein-centric rate of sequence evolution according to Rate4Site on...
figshare.com
txt
Updated Feb 9, 2021
Cite
emmanuel levy; Benjamin Dubreuil (2021). Protein-centric rate of sequence evolution according to Rate4Site on orthogroups of 14 fungal species [Dataset]. http://doi.org/10.6084/m9.figshare.13735537.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13735537.v2
Dataset updated
Feb 9, 2021
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
emmanuel levy; Benjamin Dubreuil
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overall, 25 descriptors (features) are calculated for 3797 unique proteins. The legend for each descriptor is given in the associated header file.
Columns 1-5 provide protein identifiers:
- ORF
- SGD Gene Name
- UniprotKB
- Matching PDB structure?
- PDB code of closest structure
Columns 6-8 correspond to protein expression:
- Integrated abundance in ppm
- log10 abundance
- bins of abundance (5 bins)
Columns 9-16 contain evolutionary rates averaged over:
- Full sequence
- Disordered residues
- Not Disordered residues
- Domain residues
- Not Domain residues
- Residues with PDB coordinates
- Surface residues (>25% relative ASA)
- Buried residues (
Data from: UniSave
dknet.org
scicrunch.org
+2more
Updated Jan 4, 2026
Cite
(2026). UniSave [Dataset]. http://identifiers.org/RRID:SCR_004946
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_004946
Dataset updated
Jan 4, 2026
Description
The UniProtKB Sequence/Annotation Version Archive (UniSave) is a repository of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entry versions. Entries can be retrieved by entering a primary accession number or an entry name and pressing the Go! button. The result of the query is a list of entry versions with the UniProtKB database name, entry status, primary accession number, entry name, entry version, sequence version, release number and the release date, ordered by the release date, the latest version first. The entry version status can be "incorporated", "active", "changed", "replaced" or "deleted". An incorporated entry version is the first entry version added into UniProtKB, an active entry version is part of the latest public release, a changed entry version has been superseded by a newer entry version, a replaced entry has become secondary to another entry, and a deleted entry has been removed from the UniProtKB without becoming secondary to any other entry. For replaced entry versions, the status "Replaced" can be clicked to return all entries which have the given entry as a secondary entry. If a date is provided as part of the query then only the version of the entry that was current at that date is displayed. Entries can be viewed by clicking "View" in the query results table. The ">" links can be used to access the earlier and later entry versions. The "Back to List" link returns the user to the query results table. Selecting "UniProtKB" or "Fasta" and pressing "Save" downloads the entry in flat file or fasta format. Comparison between entry versions is straightforward: selecting two entries and clicking the "Compare Selected" button will show the differences between the two entries. Whenever comparisons are made a Smith-Waterman sequence alignment is computed using SSEARCH, and displayed at the bottom of the entry.
The actual alignment is displayed only when the sequences are not identical.
UniProtKB Subcellular Locations
dknet.org
rrid.site
+2more
Updated Mar 28, 2025
Cite
(2025). UniProtKB Subcellular Locations [Dataset]. http://identifiers.org/RRID:SCR_004373
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_004373
Dataset updated
Mar 28, 2025
Description
The subcellular locations in which a protein is found are described in UniProtKB entries with a controlled vocabulary, which also includes membrane topology and orientation terms. You may search in subcellular locations or list them all along with their definitions (490). By default, searching the subcellular locations will look for matches in both name and definition.
uniprot
huggingface.co
Updated Dec 18, 2025
Cite
CleverThis (2025). uniprot [Dataset]. https://huggingface.co/datasets/CleverThis/uniprot
Explore at:
Dataset updated
Dec 18, 2025
Dataset authored and provided by
CleverThis
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
UniProt RDF

Dataset Description

Comprehensive protein knowledgebase with functional annotations.
Original Source: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/uniprotkb_reviewed_eukaryota_opisthokonta_metazoa_33208_0.rdf.xz

Dataset Summary

This dataset contains RDF triples from UniProt RDF converted to HuggingFace dataset format for easy use in machine learning pipelines.

Format: Originally rdf, converted to HuggingFace Dataset
Size: 0.392 GB…
See the full description on the dataset page: https://huggingface.co/datasets/CleverThis/uniprot.

UniProt SPROT

Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot)


151 scholarly articles cite this dataset (View in Google Scholar)
