100+ datasets found

UniProt
registry.opendata.aws
Updated Apr 6, 2021
+ more versions
Cite
SIB Swiss Institute of Bioinformatics on behalf of the UniProt Consortium (2021). UniProt [Dataset]. https://registry.opendata.aws/uniprot/
Explore at:
Dataset updated
Apr 6, 2021
Dataset provided by
UniProt (http://www.uniprot.org/)
Description
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt consortium and host institutions EMBL-EBI, SIB Swiss Institute of Bioinformatics and PIR are committed to the long-term preservation of the UniProt databases.
UniProt
neuinfo.org
dknet.org
+2 more
Updated Jan 29, 2022
Cite
(2022). UniProt [Dataset]. http://identifiers.org/RRID:SCR_002380
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002380
Dataset updated
Jan 29, 2022
Description
Collection of data of protein sequence and functional information. Resource for protein sequence and annotation data. Consortium for preservation of the UniProt databases: UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), and UniProt Archive (UniParc), UniProt Proteomes. Collaboration between European Bioinformatics Institute (EMBL-EBI), SIB Swiss Institute of Bioinformatics and Protein Information Resource. Swiss-Prot is a curated subset of UniProtKB.
uniprot-database_(type_eggnog).27.09.2019.tab.rar
figshare.com
application/x-rar
Updated Jun 24, 2020
+ more versions
Cite
Daniel Kumazawa Morais (2020). uniprot-database_(type_eggnog).27.09.2019.tab.rar [Dataset]. http://doi.org/10.6084/m9.figshare.12555425.v1
Explore at:
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.12555425.v1
Dataset updated
Jun 24, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Daniel Kumazawa Morais
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The current database was downloaded on 27.09.2019 and has the data fields (columns) described below:
# 1 Entry
# 2 Entry name
# 3 Status
# 4 Protein names
# 5 Gene names
# 6 Organism
# 7 Length
# 8 Cross-reference (KO)
# 9 Taxonomic lineage (PHYLUM)
# 10 Taxonomic lineage (SPECIES)  # This field carries current and old* taxonomic classifications.
# 11 Taxonomic lineage (GENUS)
# 12 Taxonomic lineage (KINGDOM)
# 13 Taxonomic lineage (SUPERKINGDOM)
# 14 Cross-reference (OrthoDB)
# 15 Cross-reference (eggNOG)
*Details about the classification used in UniProt can be found at: https://www.uniprot.org/help/taxonomy
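The 15 columns listed above could be read with Python's standard csv module. This is a minimal sketch: it assumes the .rar archive has already been extracted to a tab-separated text file and that the file starts with a header line, neither of which the description states. The function name read_rows and the sample row are hypothetical and made up for illustration.

```python
import csv
import io

# Column names in the order given in the dataset description.
COLUMNS = [
    "Entry", "Entry name", "Status", "Protein names", "Gene names",
    "Organism", "Length", "Cross-reference (KO)",
    "Taxonomic lineage (PHYLUM)", "Taxonomic lineage (SPECIES)",
    "Taxonomic lineage (GENUS)", "Taxonomic lineage (KINGDOM)",
    "Taxonomic lineage (SUPERKINGDOM)", "Cross-reference (OrthoDB)",
    "Cross-reference (eggNOG)",
]


def read_rows(handle, has_header=True):
    """Yield one dict per row of a tab-separated export with the 15 columns."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header line (assumed to be present)
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip malformed lines rather than guessing their layout
        yield dict(zip(COLUMNS, fields))


# Made-up sample row, for illustration only (not real UniProt data).
sample = "\t".join(COLUMNS) + "\n" + "\t".join([
    "X00001", "EXAMPLE_TEST", "reviewed", "Example protein", "exA",
    "Example organism", "123", "K00001;", "Examplephylum", "Example species",
    "Examplegenus", "", "Bacteria", "", "COG0001;",
]) + "\n"

rows = list(read_rows(io.StringIO(sample)))
print(rows[0]["Entry"], rows[0]["Length"])
```

For the real file, the same function can be given a handle from open(path, newline="", encoding="utf-8"), where path points at the extracted table. The encoding is also an assumption.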
UniProt Protein
bioregistry.io
Updated Apr 26, 2021
+ more versions
Cite
(2021). UniProt Protein [Dataset]. http://identifiers.org/wikidata:P352
Explore at:
Unique identifier
https://identifiers.org/wikidata:P352, https://identifiers.org/biolink:UniProtKB, https://identifiers.org/re3data:r3d100011521
Dataset updated
Apr 26, 2021
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. Besides amino acid sequence and a description, it also provides taxonomic data and citation information.
UniProt Proteins Reviewed (Swiss-Prot)
kaggle.com
zip
Updated Aug 6, 2022
Cite
Andrey Lovyagin (2022). UniProt Proteins Reviewed (Swiss-Prot) [Dataset]. https://www.kaggle.com/datasets/andreylovyagin/uniprot-proteins-reviewed-swissprot
Explore at:
Available download formats: zip (479163007 bytes)
Dataset updated
Aug 6, 2022
Authors
Andrey Lovyagin
Description
The UniProt reviewed proteins database, uploaded with all columns for easier use in Kaggle notebooks. All columns have a description, but if you have any questions, you can check UniProt Help, where every column has a full explanation.

For UniProt Species Proteomes check this dataset.

License: Creative Commons Attribution 4.0 International (CC BY 4.0) License
Data from: UniProt subset about proteins and annotations generated using...
data.niaid.nih.gov
observatorio-investigacion.unavarra.es
Updated Jun 28, 2023
Cite
Ángel Iglesias Préstamo; Jose Emilio Labra Gayo; Kiyoko F. Aoki-Kinoshita; Yasunori Yamamoto; Toshiaki Katayama; Alberto Labarga; Andra Waagmeester (2023). UniProt subset about proteins and annotations generated using Shape Expressions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8086937
Explore at:
Dataset updated
Jun 28, 2023
Dataset provided by
Barcelona Supercomputing Center
GaLSIC, Soka University
WESO Lab - University of Oviedo
Micelio
Research Organization of Information and Systems (ROIS)
Database Center for Life Sciences
Authors
Ángel Iglesias Préstamo; Jose Emilio Labra Gayo; Kiyoko F. Aoki-Kinoshita; Yasunori Yamamoto; Toshiaki Katayama; Alberto Labarga; Andra Waagmeester
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Subset of UniProt obtained using a Shape Expression.

Link to the Shape Expression: https://github.com/shex-consolidator/subsetting-examples/blob/master/protein/protein.shex

Dumps from UniProt downloaded on 26 June 2023.

Tool employed in the creation of the subset: pschema-rs (https://github.com/angelip2303/pschema-rs)
UniProt Chordata protein annotation program
neuinfo.org
scicrunch.org
+2 more
Updated Jul 12, 2013
Cite
(2013). UniProt Chordata protein annotation program [Dataset]. http://identifiers.org/RRID:SCR_007071
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007071
Dataset updated
Jul 12, 2013
Description
Data set of manually annotated chordata-specific proteins as well as those that are widely conserved. The program keeps existing human entries up-to-date and broadens the manual annotation to other vertebrate species, especially model organisms, including great apes, cow, mouse, rat, chicken, zebrafish, as well as Xenopus laevis and Xenopus tropicalis. A draft of the complete human proteome is available in UniProtKB/Swiss-Prot and one of the current priorities of the Chordata protein annotation program is to improve the quality of human sequences provided. To this aim, they are updating sequences which show discrepancies with those predicted from the genome sequence. Dubious isoforms, sequences based on experimental artifacts and protein products derived from erroneous gene model predictions are also revisited. This work is in part done in collaboration with the Hinxton Sequence Forum (HSF), which allows active exchange between UniProt, HAVANA, Ensembl and HGNC groups, as well as with the RefSeq database. UniProt is a member of the Consensus CDS project and they are in the process of reviewing their records to support convergence towards a standard set of protein annotation. They also continuously update human entries with functional annotation, including novel structural, post-translational modification, interaction and enzymatic activity data. In order to identify candidates for re-annotation, they use, among others, information extraction tools such as the STRING database. In addition, they regularly add new sequence variants and maintain disease information. Indeed, this annotation program includes the Variation Annotation Program, the goal of which is to annotate all known human genetic diseases and disease-linked protein variants, as well as neutral polymorphisms.
Integration of Proteomics and Transcriptomics Data Sets for the Analysis of...
acs.figshare.com
zip
Updated Jun 1, 2023
Cite
Paula Díez; Conrad Droste; Rosa M. Dégano; María González-Muñoz; Nieves Ibarrola; Martín Pérez-Andrés; Alba Garin-Muga; Víctor Segura; Gyorgy Marko-Varga; Joshua LaBaer; Alberto Orfao; Fernando J. Corrales; Javier De Las Rivas; Manuel Fuentes (2023). Integration of Proteomics and Transcriptomics Data Sets for the Analysis of a Lymphoma B‑Cell Line in the Context of the Chromosome-Centric Human Proteome Project [Dataset]. http://doi.org/10.1021/acs.jproteome.5b00474.s001
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.1021/acs.jproteome.5b00474.s001
Dataset updated
Jun 1, 2023
Dataset provided by
ACS Publications
Authors
Paula Díez; Conrad Droste; Rosa M. Dégano; María González-Muñoz; Nieves Ibarrola; Martín Pérez-Andrés; Alba Garin-Muga; Víctor Segura; Gyorgy Marko-Varga; Joshua LaBaer; Alberto Orfao; Fernando J. Corrales; Javier De Las Rivas; Manuel Fuentes
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
A comprehensive study of the molecular active landscape of human cells can be undertaken to integrate two different but complementary perspectives: transcriptomics, and proteomics. After the genome era, proteomics has emerged as a powerful tool to simultaneously identify and characterize the compendium of thousands of different proteins active in a cell. Thus, the Chromosome-centric Human Proteome Project (C-HPP) is promoting a full characterization of the human proteome combining high-throughput proteomics with the data derived from genome-wide expression profiling of protein-coding genes. Here we present a full proteomic profiling of a human lymphoma B-cell line (Ramos) performed using a nanoUPLC-LTQ-Orbitrap Velos proteomic platform, combined to an in-depth transcriptomic profiling of the same cell type. Data are available via ProteomeXchange with identifier PXD001933. Integration of the proteomic and transcriptomic data sets revealed a 94% overlap in the proteins identified by both -omics approaches. Moreover, functional enrichment analysis of the proteomic profiles showed an enrichment of several functions directly related to the biological and morphological characteristics of B-cells. In turn, about 30% of all protein-coding genes present in the whole human genome were identified as being expressed by the Ramos cells (stable average of 30% genes along all the chromosomes), revealing the size of the protein expression-set present in one specific human cell type. Additionally, the identification of missing proteins in our data sets has been reported, highlighting the power of the approach. Also, a comparison between neXtProt and UniProt database searches has been performed. In summary, our transcriptomic and proteomic experimental profiling provided a high coverage report of the expressed proteome from a human lymphoma B-cell type with a clear insight into the biological processes that characterized these cells. 
In this way, we demonstrated the usefulness of combining -omics for a comprehensive characterization of specific biological systems.
UniProtKB/Swiss-Prot Protein Embeddings
kaggle.com
zip
Updated Apr 23, 2023
Cite
Dan Ofer (2023). UniProtKB/Swiss-Prot Protein Embeddings [Dataset]. https://www.kaggle.com/datasets/danofer/uniprotkbswiss-prot-protein-embeddings/data
Explore at:
Available download formats: zip (2087271680 bytes)
Dataset updated
Apr 23, 2023
Authors
Dan Ofer
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The description that follows is from the official UniProt embeddings page, which is also the original host of this dataset.

Protein embeddings are a way to encode functional and structural properties of a protein, mostly from its sequence only, in a machine-friendly format (vector representation). Generating such embeddings is computationally expensive, but once computed they can be leveraged for different tasks, such as sequence similarity search, sequence clustering, and sequence classification.

UniProt provided raw embeddings (mean pooled, per-protein using the ProtT5 model) for UniProtKB/Swiss-Prot.

Note: Protein sequences longer than 12k residues are excluded due to limitation of GPU memory (this concerns only a handful of proteins).

Sample code

The embeddings.h5 files store the embeddings as key-value pairs. The key is the protein accession number and the value is the embeddings vector. The following code snippet shows how to read and iterate over an embeddings file in Python.

    import numpy as np
    import h5py

    with h5py.File("path/to/embeddings.h5", "r") as file:
        print(f"number of entries: {len(file.items())}")
        for sequence_id, embedding in file.items():
            print(
                f" id: {sequence_id}, "
                f" embeddings shape: {embedding.shape}, "
                f" embeddings mean: {np.array(embedding).mean()}"
            )

Sample output (SARS-CoV-2 embeddings from release 2022_04) per-protein file:

number of entries: 17
 id: A0A663DJA2, embeddings shape: (1024,), embeddings mean: 0.0006136894226074219
 id: P0DTC1, embeddings shape: (1024,), embeddings mean: 0.0011968612670898438
 id: P0DTC2, embeddings shape: (1024,), embeddings mean: 0.001041412353515625
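The description names sequence similarity search as one use of per-protein embeddings. The sketch below illustrates the idea by ranking proteins by cosine similarity to a query vector. Cosine similarity is an assumption made here, not a metric named by the source. The function names, the protein IDs, and the 3-dimensional vectors are all hypothetical; real ProtT5 per-protein vectors have 1024 dimensions, as the sample output shows.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_id, embeddings):
    """Return (id, score) pairs for every other entry, most similar first."""
    query = embeddings[query_id]
    scores = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors with hypothetical IDs, for illustration only.
toy = {
    "protA": [1.0, 0.0, 0.0],
    "protB": [0.9, 0.1, 0.0],
    "protC": [0.0, 1.0, 0.0],
}
ranking = rank_by_similarity("protA", toy)
print(ranking)
```

With real data, the dictionary could be filled from the HDF5 file by converting each stored dataset to a list of floats. For large collections a vectorized or indexed search would be a better fit than this pairwise loop.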

SOURCE:
https://www.uniprot.org/help/embeddings
https://www.uniprot.org/help/downloads#embeddings
Reviewed (Swiss-Prot) - per-protein: https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/embeddings/uniprot_sprot/per-protein.h5
UniProt-GOA Database
toxodb.org
+ more versions
Cite
UniProt-GOA Database [Dataset]. https://toxodb.org/toxo/app/record/dataset/DS_f87ae346fd
Explore at:
Description
The UniProt GO annotation program aims to provide high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB).
N-terminal COFRADIC on cytosolic proteins of HEK293T cells - UniProt search
ebi.ac.uk
data.niaid.nih.gov
+1 more
Cite
Annelies Bogaert, N-terminal COFRADIC on cytosolic proteins of HEK293T cells - UniProt search [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD039392
Explore at:
Authors
Annelies Bogaert
Variables measured
Proteomics
Description
N-terminal proteoforms stem from the same gene but differ at their N-terminus, and most of these are found to be truncated, though some are N-terminally extended caused by ribosomes starting translation from codons in the annotated 5’UTR, and/or carry modified N-termini different from those of the canonical protein. Biological functions of N-terminal proteoforms are emerging; however, it remains unknown to what extent N-terminal proteoforms further expand the functional complexity. To address this in a more global manner, we mapped the interactomes of several pairs of N-terminal proteoforms and their canonical counterparts. For this, we first generated an in-depth catalogue of N-terminal proteoforms in the cytosol of HEK293T cells. As the N-terminal region is the part that differs between the proteoforms, we performed N-terminal enrichment via COFRADIC on the cytosol of HEK293T cells. We combined three digestion enzymes to increase the depth of analysis. Data was searched twice: once with a regular UniProt database and a second time with a custom database (combining the sequences of UniProt proteins, UniProt isoforms and publicly available Ribo-seq data). Data was filtered and this resulted in a catalogue of 3,306 N-termini from which 20 pairs of canonical protein and N-terminal proteoform(s) were selected for interactome analysis. Our analysis of these pairs revealed that the overlap of the interactomes for both proteoforms is in general high, showing their functional relation. However, for all pairs tested we do report differences as well. We show that N-terminal proteoforms can be engaged in new/different interactions and can also lose several interactions compared to the canonical protein.
UniProtKB
data.wu.ac.at
api/sparql
Updated Jul 30, 2016
+ more versions
Cite
Linking Open Data Cloud (2016). UniProtKB [Dataset]. https://data.wu.ac.at/odso/datahub_io/YWIwYTQ0ZjMtYzY0Mi00MmM5LWFiODItNDgxOWQ1ZTMzNDNm
Explore at:
Available download formats: api/sparql (20.0)
Dataset updated
Jul 30, 2016
Dataset provided by
Linking Open Data Cloud
Description
UniProtKB is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data.
Prediction and Visualization of Human Transmembrane Proteins using AlphaFold...
data.niaid.nih.gov
Updated Jul 16, 2024
Cite
Marquet, Céline; Grekova, Anastasia; Houri, Leen; Heinzinger, Michael; Rost, Burkhard (2024). Prediction and Visualization of Human Transmembrane Proteins using AlphaFold and Protein Language Models [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6816082
Explore at:
Dataset updated
Jul 16, 2024
Dataset provided by
Technical University Munich
Authors
Marquet, Céline; Grekova, Anastasia; Houri, Leen; Heinzinger, Michael; Rost, Burkhard
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description: TMvis ("TMvis496.tar.gz") is a dataset containing 496 3D-structures of predicted human transmembrane proteins (TMPs) and their predicted membrane embedding. The method TMbed [1], based on the protein language model ProtT5 [2], predicted 4,967 TMPs for the human proteome (20,375 proteins, UniProt [3] version April 2022; excluding TITIN_HUMAN due to length). For these proteins, we obtained AlphaFold [4] structures from AlphaFoldDB [5] with an average per-residue confidence score (pLDDT) of more than 90%. This resulted in the 496 proteins of TMvis, as can be found in "TMvis496.fasta". The membrane embedding was predicted using the methods ANVIL [6], PPM3 [7], and per-residue TMbed predictions. As the three methods are based on different approaches, we decided to publish results for all. The figure “TMvis_project_overview.png” provides a graphical overview for each step described above.
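The pLDDT filter described above can be sketched as follows. This is not the authors' code. It relies on the fact that AlphaFold DB PDB files store the per-residue pLDDT in the B-factor field of each ATOM record (columns 61 to 66), and it reads one value per residue from the C-alpha atom. The function names and the default threshold argument are hypothetical.

```python
def mean_plddt(pdb_lines):
    """Average the per-residue pLDDT of an AlphaFold model given as PDB text lines.

    One value per residue is taken from the C-alpha atom; the value is read
    from the fixed-width B-factor field (columns 61-66 of an ATOM record).
    """
    scores = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)


def passes_threshold(pdb_lines, threshold=90.0):
    """True if the average per-residue pLDDT is strictly above the threshold."""
    return mean_plddt(pdb_lines) > threshold


# Usage with a downloaded model (the path is a placeholder):
# with open("path/to/model.pdb") as handle:
#     print(passes_threshold(handle.readlines()))
```

Reading only C-alpha atoms keeps the average per residue rather than per atom, which matters because residues differ in atom count.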

TMvis Folder Structure: TMvis is separated into “alpha”, containing predicted alpha-helical TMPs, and “beta”, containing predicted beta-barrel TMPs. Within these folders, each protein is assigned one folder, identifiable by the respective unique UniProt ID. Each protein folder consists of:
- “UniprotID.fasta” with UniProt ID, sequence, TMbed per-residue prediction
- “AF-UniprotID-F1-model_v2.pdb” with the AlphaFold structure
- “AF-UniprotID-F1-model_v2.cif” with the AlphaFold structure
- “AF-UniprotID-F1-model_v2_ANVIL.pdb” with predicted ANVIL membrane embedding
- “AF-UniprotID-F1-model_v2_ppm.pdb” with predicted PPM3 membrane embedding

TMvis
│
├── alpha
│   │
│   ├── A0A087X1C5
│   │   ├── A0A087X1C5.fasta
│   │   ├── AF-A0A087X1C5-F1-model_v2.pdb
│   │   ├── AF-A0A087X1C5-F1-model_v2.cif
│   │   ├── AF-A0A087X1C5-F1-model_v2_ANVIL.pdb
│   │   └── AF-A0A087X1C5-F1-model_v2_ppm.PDB
│   └── ...
└── beta
    └── P45880
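The folder layout above maps a UniProt ID to five file paths. The helper below builds those paths for one protein. It is a sketch with a hypothetical function name and dictionary keys, and it assumes the archive was unpacked into a directory called TMvis. It follows the lowercase “_ppm.pdb” spelling from the file list; the tree shows “.PDB” for that file, so the extension case may need adjusting on case-sensitive file systems.

```python
from pathlib import Path


def expected_files(root, topology, uniprot_id):
    """Return the five per-protein paths described for the TMvis layout."""
    if topology not in ("alpha", "beta"):
        raise ValueError("topology must be 'alpha' or 'beta'")
    folder = Path(root) / topology / uniprot_id
    stem = f"AF-{uniprot_id}-F1-model_v2"
    return {
        "fasta": folder / f"{uniprot_id}.fasta",          # sequence and TMbed prediction
        "alphafold_pdb": folder / f"{stem}.pdb",          # AlphaFold structure (PDB)
        "alphafold_cif": folder / f"{stem}.cif",          # AlphaFold structure (mmCIF)
        "anvil_pdb": folder / f"{stem}_ANVIL.pdb",        # ANVIL membrane embedding
        "ppm_pdb": folder / f"{stem}_ppm.pdb",            # PPM3 membrane embedding
    }


files = expected_files("TMvis", "alpha", "A0A087X1C5")
for label, path in files.items():
    print(label, path.as_posix())
```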

TMvis visualization: The 3D-visualization of every protein in the dataset TMvis can be easily accessed using the Jupyter Notebook “TMvis.ipynb”. It contains detailed descriptions of the different membrane prediction tools ANVIL, PPM3, and TMbed as well as the respective code. Additionally, it allows visualizing the per-residue confidence scores (pLDDT) of AlphaFold.

——————————————————————————————————————————————————————————————————————————

References:

[1] TMbed - Bernhofer, Michael, and Burkhard Rost. 2022. “TMbed – Transmembrane Proteins Predicted through Language Model Embeddings.” bioRxiv.

[2] ProtT5 - A. Elnaggar et al., "ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2021.3095381.

[3] UniProt - UniProt Consortium (2021). UniProt: the universal protein knowledgebase in 2021. Nucleic acids research, 49(D1), D480–D489.

[4] AlphaFold - Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.

[5] AlphaFold DB - Varadi, Mihaly, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, et al. 2022. “AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models.” Nucleic Acids Research 50 (D1): D439–44.

[6] ANVIL - Postic, Guillaume, Yassine Ghouzam, Vincent Guiraud, and Jean-Christophe Gelly. 2016. “Membrane Positioning for High- and Low-Resolution Protein Structures through a Binary Classification Approach.” Protein Engineering, Design & Selection: PEDS 29 (3): 87–91.

[7] PPM3 - Lomize, Mikhail A., Irina D. Pogozheva, Hyeon Joo, Henry I. Mosberg, and Andrei L. Lomize. 2012. “OPM Database and PPM Web Server: Resources for Positioning of Proteins in Membranes.” Nucleic Acids Research 40 (Database issue): D370–76.

——————————————————————————————————————————————————————————————————————————

License:

This work is licensed under a Creative Commons Attribution 4.0 International License (CC-BY 4.0).
UniRef
neuinfo.org
dknet.org
+2 more
Updated Nov 16, 2024
Cite
(2024). UniRef [Dataset]. http://identifiers.org/RRID:SCR_010646
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_010646
Dataset updated
Nov 16, 2024
Description
Databases which provide clustered sets of sequences from UniProt Knowledgebase and selected UniParc records, in order to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences from view. The UniRef100 database combines identical sequences and sub-fragments with 11 or more residues (from any organism) into a single UniRef entry. The sequence of a representative protein, the accession numbers of all the merged entries, and links to the corresponding UniProtKB and UniParc records are all displayed in the entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences with 11 or more residues such that each cluster is composed of sequences that have at least 90% (UniRef90) or 50% (UniRef50) sequence identity to the longest sequence (UniRef seed sequence). All the sequences in each cluster are ranked to facilitate the selection of a representative sequence for the cluster.
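The UniRef100 rule described above (identical sequences and sub-fragments of 11 or more residues merged into one entry) can be illustrated with a toy grouping. This is only a sketch of the idea, not UniProt's pipeline: it treats a sub-fragment as an exact contiguous substring, picks the longest sequence as the representative, and uses made-up sequences and IDs. The function name and the min_length parameter are hypothetical.

```python
def merge_identical_and_fragments(sequences, min_length=11):
    """Group sequences so that exact duplicates and contained fragments share an entry.

    sequences: dict mapping an ID to an amino-acid string.
    Returns a dict mapping each representative ID to the list of member IDs.
    Sequences shorter than min_length are left out.
    """
    # Longest first, so a fragment is always visited after any sequence that contains it.
    ordered = sorted(sequences.items(), key=lambda item: (-len(item[1]), item[0]))
    clusters = {}
    for seq_id, seq in ordered:
        if len(seq) < min_length:
            continue
        for rep_id in clusters:
            if seq in sequences[rep_id]:  # identical or contiguous sub-fragment
                clusters[rep_id].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]  # no container found: new representative
    return clusters


# Made-up sequences, for illustration only.
toy = {
    "s1": "MKTAYIAKQRQISFVKSHFSRQ",
    "s2": "MKTAYIAKQRQISFVKSHFSRQ",  # identical to s1
    "s3": "AYIAKQRQISFV",            # 12-residue fragment of s1
    "s4": "MKT",                     # shorter than 11 residues, skipped
    "s5": "GGGGGGGGGGGGGGG",         # unrelated
}
clusters = merge_identical_and_fragments(toy)
print(clusters)
```

UniRef90 and UniRef50 additionally require a sequence identity computation against the seed sequence, which needs an alignment tool and is outside the scope of this sketch.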
UniRef at the EBI
dknet.org
scicrunch.org
+1 more
Updated Jan 29, 2022
Cite
(2022). UniRef at the EBI [Dataset]. http://identifiers.org/RRID:SCR_004972
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_004972
Dataset updated
Jan 29, 2022
Description
Various non-redundant databases with different sequence identity cut-offs created by clustering closely similar sequences to yield a representative subset of sequences. In the UniRef90 and UniRef50 databases no pair of sequences in the representative set has >90% or >50% mutual sequence identity. The UniRef100 database presents identical sequences and sub-fragments as a single entry with protein IDs, sequences, bibliography, and links to protein databases. The two major objectives of UniRef are: (i) to facilitate sequence merging in UniProt, and (ii) to allow faster and more informative sequence similarity searches. Although the UniProt Knowledgebase is much less redundant than UniParc, it still contains a certain level of redundancy because it is not possible to use fully automatic merging without risking merging of similar sequences from different proteins. However, such automatic procedures are extremely useful in compiling the UniRef databases to obtain complete coverage of sequence space while hiding redundant sequences (but not their descriptions) from view. A high level of redundancy results in several problems, including slow database searches and long lists of similar or identical alignments that can obscure novel matches in the output. Thus, a more even sampling of sequence space is advantageous. You may access NREF via the FTP server.
Data from: A Critical Guide to the UniProtKB Flat-file Format
qubeshub.org
Updated Dec 5, 2020
Cite
Teresa Attwood; GOBLET Foundation (2020). A Critical Guide to the UniProtKB Flat-file Format [Dataset]. http://doi.org/10.25334/ZQRR-1577
Explore at:
Unique identifier
https://doi.org/10.25334/ZQRR-1577
Dataset updated
Dec 5, 2020
Dataset provided by
QUBES
Authors
Teresa Attwood; GOBLET Foundation
Description
This Critical Guide briefly presents the need for biological databases and for a standard format for storing and organising biological data.
UniProt Proteomes
dknet.org
neuinfo.org
+2 more
Updated Nov 30, 2025
Cite
(2025). UniProt Proteomes [Dataset]. http://identifiers.org/RRID:SCR_018666/resolver
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_018666 https://identifiers.org/RRID:SCR_018666/resolver
Dataset updated
Nov 30, 2025
Description
Protein sets from fully sequenced genomes. Proteomes portal offers protein sequence sets obtained from translation of completely sequenced genomes. Published genomes from NCBI Genome are brought into UniProt if genome is annotated and set of coding sequences is available. Number of predicted coding sequences falls within statistically significant range of published proteomes from neighbouring species.
Number of human protein variations collected from the UniProt/Swiss-Prot...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Yongwook Choi; Gregory E. Sims; Sean Murphy; Jason R. Miller; Agnes P. Chan (2023). Number of human protein variations collected from the UniProt/Swiss-Prot database. [Dataset]. http://doi.org/10.1371/journal.pone.0046688.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0046688.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Yongwook Choi; Gregory E. Sims; Sean Murphy; Jason R. Miller; Agnes P. Chan
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Number of human protein variations collected from the UniProt/Swiss-Prot database.
UniProt journal
bioregistry.io
Updated May 10, 2024
Cite
(2024). UniProt journal [Dataset]. http://identifiers.org/wikidata:P4616
Explore at:
Unique identifier
https://identifiers.org/wikidata:P4616
Dataset updated
May 10, 2024
Description
identifier for a scientific journal, in the UniProt database
Table S9_Homeobox Uniprot Screen in O-GlcNAc Database
datasetcatalog.nlm.nih.gov
figshare.com
Updated Mar 4, 2024
Cite
Wulff, Eugenia (2024). Table S9_Homeobox Uniprot Screen in O-GlcNAc Database [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001298184
Explore at:
Dataset updated
Mar 4, 2024
Authors
Wulff, Eugenia
Description
The list of human proteins in reviewed entries obtained from UniprotKB was searched in the O-GlcNAc database (oglcnac.com)
