100+ datasets found
  1. UniProt

    • registry.opendata.aws
    Updated Apr 6, 2021
    Cite
    SIB Swiss Institute of Bioinformatics on behalf of the UniProt Consortium (2021). UniProt [Dataset]. https://registry.opendata.aws/uniprot/
    Explore at:
    Dataset updated
    Apr 6, 2021
    Dataset provided by
    UniProt (http://www.uniprot.org/)
    Description

    The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt consortium and host institutions EMBL-EBI, SIB Swiss Institute of Bioinformatics and PIR are committed to the long-term preservation of the UniProt databases.

  2. UniProt

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). UniProt [Dataset]. http://identifiers.org/RRID:SCR_002380
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Collection of data of protein sequence and functional information. Resource for protein sequence and annotation data. Consortium for preservation of the UniProt databases: UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), and UniProt Archive (UniParc), UniProt Proteomes. Collaboration between European Bioinformatics Institute (EMBL-EBI), SIB Swiss Institute of Bioinformatics and Protein Information Resource. Swiss-Prot is a curated subset of UniProtKB.

  3. uniprot-database_(type_eggnog).27.09.2019.tab.rar

    • figshare.com
    application/x-rar
    Updated Jun 24, 2020
    Cite
    Daniel Kumazawa Morais (2020). uniprot-database_(type_eggnog).27.09.2019.tab.rar [Dataset]. http://doi.org/10.6084/m9.figshare.12555425.v1
    Explore at:
    Available download formats: application/x-rar
    Dataset updated
    Jun 24, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Daniel Kumazawa Morais
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The current database was downloaded on 27.09.2019 and has the data fields (columns) described below:

    # 1 Entry
    # 2 Entry name
    # 3 Status
    # 4 Protein names
    # 5 Gene names
    # 6 Organism
    # 7 Length
    # 8 Cross-reference (KO)
    # 9 Taxonomic lineage (PHYLUM)
    # 10 Taxonomic lineage (SPECIES) (this field carries current and old* taxonomic classifications)
    # 11 Taxonomic lineage (GENUS)
    # 12 Taxonomic lineage (KINGDOM)
    # 13 Taxonomic lineage (SUPERKINGDOM)
    # 14 Cross-reference (OrthoDB)
    # 15 Cross-reference (eggNOG)

    *Details about the classification used in UniProt can be found at: https://www.uniprot.org/help/taxonomy
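    The column layout above can be read with a short script once the archive is extracted. The following is a minimal sketch, assuming the extracted file is tab-separated with one header row and the 15 columns in the listed order; the function name `read_uniprot_tab` and the sample row values are hypothetical and only illustrate the shape of the data.

```python
import csv
import io

# Column names as listed in the dataset description (fields 1-15).
COLUMNS = [
    "Entry", "Entry name", "Status", "Protein names", "Gene names",
    "Organism", "Length", "Cross-reference (KO)",
    "Taxonomic lineage (PHYLUM)", "Taxonomic lineage (SPECIES)",
    "Taxonomic lineage (GENUS)", "Taxonomic lineage (KINGDOM)",
    "Taxonomic lineage (SUPERKINGDOM)", "Cross-reference (OrthoDB)",
    "Cross-reference (eggNOG)",
]


def read_uniprot_tab(handle, has_header=True):
    """Yield one dict per row of a tab-separated export with the 15 columns above."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Pad short rows so every column name maps to a value.
        row = row + [""] * (len(COLUMNS) - len(row))
        yield dict(zip(COLUMNS, row))


# Made-up sample standing in for the extracted .tab file (not real data).
sample = "\t".join(COLUMNS) + "\n" + "\t".join(
    ["ENTRY1", "ENTRY1_NAME", "reviewed", "Example protein", "exA",
     "Example organism", "430"] + [""] * 8
) + "\n"

rows = list(read_uniprot_tab(io.StringIO(sample)))
print(rows[0]["Entry"], rows[0]["Length"])
```

    For the real file, the same function could be given an open file handle instead of the in-memory sample; values such as Length are left as strings here and would need converting before numeric use.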

  4. UniProt Protein

    • bioregistry.io
    Updated Apr 26, 2021
    Cite
    (2021). UniProt Protein [Dataset]. http://identifiers.org/wikidata:P352
    Explore at:
    Dataset updated
    Apr 26, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. Besides amino acid sequence and a description, it also provides taxonomic data and citation information.

  5. UniProt Proteins Reviewed (Swiss-Prot)

    • kaggle.com
    zip
    Updated Aug 6, 2022
    Cite
    Andrey Lovyagin (2022). UniProt Proteins Reviewed (Swiss-Prot) [Dataset]. https://www.kaggle.com/datasets/andreylovyagin/uniprot-proteins-reviewed-swissprot
    Explore at:
    Available download formats: zip (479163007 bytes)
    Dataset updated
    Aug 6, 2022
    Authors
    Andrey Lovyagin
    Description

    The UniProt reviewed proteins database, uploaded with all columns for easier use in Kaggle notebooks. All columns have descriptions; if you have any questions, check UniProt Help, where every column has a full explanation.

    For UniProt Species Proteomes check this dataset.

    License: Creative Commons Attribution 4.0 International (CC BY 4.0) License

  6. Data from: UniProt subset about proteins and annotations generated using...

    • data.niaid.nih.gov
    • observatorio-investigacion.unavarra.es
    Updated Jun 28, 2023
    Cite
    Ángel Iglesias Préstamo; Jose Emilio Labra Gayo; Kiyoko F. Aoki-Kinoshita; Yasunori Yamamoto; Toshiaki Katayama; Alberto Labarga; Andra Waagmeester (2023). UniProt subset about proteins and annotations generated using Shape Expressions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8086937
    Explore at:
    Dataset updated
    Jun 28, 2023
    Dataset provided by
    Barcelona Supercomputing Center
    GaLSIC, Soka University
    WESO Lab - University of Oviedo
    Micelio
    Research Organization of Information and Systems (ROIS)
    Database Center for Life Sciences
    Authors
    Ángel Iglesias Préstamo; Jose Emilio Labra Gayo; Kiyoko F. Aoki-Kinoshita; Yasunori Yamamoto; Toshiaki Katayama; Alberto Labarga; Andra Waagmeester
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Subset of UniProt obtained using a Shape Expression.

    Link to the Shape Expression: https://github.com/shex-consolidator/subsetting-examples/blob/master/protein/protein.shex

    Dumps from UniProt downloaded on 26-June-2023.

    Tool employed in the creation of the subset: pschema-rs (https://github.com/angelip2303/pschema-rs)

  7. UniProt Chordata protein annotation program

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jul 12, 2013
    Cite
    (2013). UniProt Chordata protein annotation program [Dataset]. http://identifiers.org/RRID:SCR_007071
    Explore at:
    Dataset updated
    Jul 12, 2013
    Description

    Data set of manually annotated chordata-specific proteins as well as those that are widely conserved. The program keeps existing human entries up-to-date and broadens the manual annotation to other vertebrate species, especially model organisms, including great apes, cow, mouse, rat, chicken, zebrafish, as well as Xenopus laevis and Xenopus tropicalis. A draft of the complete human proteome is available in UniProtKB/Swiss-Prot and one of the current priorities of the Chordata protein annotation program is to improve the quality of human sequences provided. To this aim, they are updating sequences which show discrepancies with those predicted from the genome sequence. Dubious isoforms, sequences based on experimental artifacts and protein products derived from erroneous gene model predictions are also revisited. This work is in part done in collaboration with the Hinxton Sequence Forum (HSF), which allows active exchange between UniProt, HAVANA, Ensembl and HGNC groups, as well as with RefSeq database. UniProt is a member of the Consensus CDS project and they are in the process of reviewing their records to support convergence towards a standard set of protein annotation. They also continuously update human entries with functional annotation, including novel structural, post-translational modification, interaction and enzymatic activity data. In order to identify candidates for re-annotation, they use, among others, information extraction tools such as the STRING database. In addition, they regularly add new sequence variants and maintain disease information. Indeed, this annotation program includes the Variation Annotation Program, the goal of which is to annotate all known human genetic diseases and disease-linked protein variants, as well as neutral polymorphisms.

  8. Integration of Proteomics and Transcriptomics Data Sets for the Analysis of...

    • acs.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Paula Díez; Conrad Droste; Rosa M. Dégano; María González-Muñoz; Nieves Ibarrola; Martín Pérez-Andrés; Alba Garin-Muga; Víctor Segura; Gyorgy Marko-Varga; Joshua LaBaer; Alberto Orfao; Fernando J. Corrales; Javier De Las Rivas; Manuel Fuentes (2023). Integration of Proteomics and Transcriptomics Data Sets for the Analysis of a Lymphoma B‑Cell Line in the Context of the Chromosome-Centric Human Proteome Project [Dataset]. http://doi.org/10.1021/acs.jproteome.5b00474.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Paula Díez; Conrad Droste; Rosa M. Dégano; María González-Muñoz; Nieves Ibarrola; Martín Pérez-Andrés; Alba Garin-Muga; Víctor Segura; Gyorgy Marko-Varga; Joshua LaBaer; Alberto Orfao; Fernando J. Corrales; Javier De Las Rivas; Manuel Fuentes
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A comprehensive study of the molecular active landscape of human cells can be undertaken to integrate two different but complementary perspectives: transcriptomics, and proteomics. After the genome era, proteomics has emerged as a powerful tool to simultaneously identify and characterize the compendium of thousands of different proteins active in a cell. Thus, the Chromosome-centric Human Proteome Project (C-HPP) is promoting a full characterization of the human proteome combining high-throughput proteomics with the data derived from genome-wide expression profiling of protein-coding genes. Here we present a full proteomic profiling of a human lymphoma B-cell line (Ramos) performed using a nanoUPLC-LTQ-Orbitrap Velos proteomic platform, combined to an in-depth transcriptomic profiling of the same cell type. Data are available via ProteomeXchange with identifier PXD001933. Integration of the proteomic and transcriptomic data sets revealed a 94% overlap in the proteins identified by both -omics approaches. Moreover, functional enrichment analysis of the proteomic profiles showed an enrichment of several functions directly related to the biological and morphological characteristics of B-cells. In turn, about 30% of all protein-coding genes present in the whole human genome were identified as being expressed by the Ramos cells (stable average of 30% genes along all the chromosomes), revealing the size of the protein expression-set present in one specific human cell type. Additionally, the identification of missing proteins in our data sets has been reported, highlighting the power of the approach. Also, a comparison between neXtProt and UniProt database searches has been performed. In summary, our transcriptomic and proteomic experimental profiling provided a high coverage report of the expressed proteome from a human lymphoma B-cell type with a clear insight into the biological processes that characterized these cells. 
    In this way, we demonstrated the usefulness of combining -omics for a comprehensive characterization of specific biological systems.

  9. UniProtKB/Swiss-Prot Protein Embeddings

    • kaggle.com
    zip
    Updated Apr 23, 2023
    Cite
    Dan Ofer (2023). UniProtKB/Swiss-Prot Protein Embeddings [Dataset]. https://www.kaggle.com/datasets/danofer/uniprotkbswiss-prot-protein-embeddings/data
    Explore at:
    Available download formats: zip (2087271680 bytes)
    Dataset updated
    Apr 23, 2023
    Authors
    Dan Ofer
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The description that follows is from the official UniProt embeddings page, which is also the original host of this dataset.

    Protein embeddings are a way to encode functional and structural properties of a protein, mostly from its sequence only, in a machine-friendly format (vector representation). Generating such embeddings is computationally expensive, but once computed they can be leveraged for different tasks, such as sequence similarity search, sequence clustering, and sequence classification.

    UniProt provided raw embeddings (mean pooled, per-protein using the ProtT5 model) for UniProtKB/Swiss-Prot.

    Note: Protein sequences longer than 12k residues are excluded due to limitation of GPU memory (this concerns only a handful of proteins).

    Sample code

    The embeddings.h5 files store the embeddings as key-value pairs. The key is the protein accession number and the value is the embeddings vector. The following code snippet shows how to read and iterate over an embeddings file in Python.

    import numpy as np
    import h5py

    # Open the HDF5 file read-only; each key is a protein accession number.
    with h5py.File("path/to/embeddings.h5", "r") as file:
      print(f"number of entries: {len(file.items())}")
      # Each value is a dataset holding that protein's embedding vector.
      for sequence_id, embedding in file.items():
        print(
          f" id: {sequence_id}, "
          f" embeddings shape: {embedding.shape}, "
          f" embeddings mean: {np.array(embedding).mean()}"
        )
    

    Sample output (SARS-CoV-2 embeddings from release 2022_04) per-protein file:

    number of entries: 17
     id: A0A663DJA2, embeddings shape: (1024,), embeddings mean: 0.0006136894226074219
     id: P0DTC1, embeddings shape: (1024,), embeddings mean: 0.0011968612670898438
     id: P0DTC2, embeddings shape: (1024,), embeddings mean: 0.001041412353515625
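    The description names sequence similarity search as one use of these vectors. One common way to compare two embeddings is cosine similarity; the sketch below assumes that measure (the source does not prescribe one) and uses made-up IDs and toy 3-dimensional vectors in place of the real 1024-dimensional ones. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_id, embeddings):
    """Return (id, similarity) pairs for all other entries, most similar first."""
    query = embeddings[query_id]
    scores = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors with made-up IDs; real per-protein vectors have 1024 values.
toy = {
    "protA": [1.0, 0.0, 0.0],
    "protB": [0.9, 0.1, 0.0],
    "protC": [0.0, 1.0, 0.0],
}
ranking = rank_by_similarity("protA", toy)
print(ranking[0][0])
```

    With vectors loaded from the HDF5 file, the dictionary could be built from the accession keys and their arrays converted to lists; for a full Swiss-Prot file a vectorised implementation would be more practical than this pure-Python loop.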

    SOURCE:
    https://www.uniprot.org/help/embeddings
    https://www.uniprot.org/help/downloads#embeddings
    Reviewed (Swiss-Prot) - per-protein: https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/embeddings/uniprot_sprot/per-protein.h5

  10. UniProt-GOA Database

    • toxodb.org
    Cite
    UniProt-GOA Database [Dataset]. https://toxodb.org/toxo/app/record/dataset/DS_f87ae346fd
    Explore at:
    Description

    The UniProt GO annotation program aims to provide high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB).

  11. N-terminal COFRADIC on cytosolic proteins of HEK293T cells - UniProt search

    • ebi.ac.uk
    • data.niaid.nih.gov
    • +1more
    Cite
    Annelies Bogaert, N-terminal COFRADIC on cytosolic proteins of HEK293T cells - UniProt search [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD039392
    Explore at:
    Authors
    Annelies Bogaert
    Variables measured
    Proteomics
    Description

    N-terminal proteoforms stem from the same gene but differ at their N-terminus, and most of these are found to be truncated, though some are N-terminally extended caused by ribosomes starting translation from codons in the annotated 5’UTR, and/or carry modified N-termini different from those of the canonical protein. Biological functions of N-terminal proteoforms are emerging, however, it remains unknown to what extent N-terminal proteoforms further expand the functional complexity. To address this in a more global manner, we mapped the interactomes of several pairs of N-terminal proteoforms and their canonical counterparts. For this, we first generated an in-depth catalogue of N-terminal proteoforms in the cytosol of HEK293T cells. As the N-terminal region is the part that differs between the proteoforms, we performed N-terminal enrichment via COFRADIC on the cytosol of HEK293T cells. We combined three digestion enzymes to increase the depth of analysis. Data was searched twice: once with a regular UniProt database and a second time with a custom database (combining the sequences of UniProt proteins, UniProt isoforms and publicly available Ribo-seq data). Data was filtered and this resulted in a catalogue of 3,306 N-termini from which 20 pairs of canonical protein and N-terminal proteoform(s) were selected for interactome analysis. Our analysis of these pairs revealed that the overlap of the interactomes for both proteoforms is in general high, showing their functional relation. However, for all pairs tested we do report differences as well. We show that N-terminal proteoforms can be engaged in new/different interactions and as well can lose several interactions compared to the canonical protein.

  12. UniProtKB

    • data.wu.ac.at
    api/sparql
    Updated Jul 30, 2016
    Cite
    Linking Open Data Cloud (2016). UniProtKB [Dataset]. https://data.wu.ac.at/odso/datahub_io/YWIwYTQ0ZjMtYzY0Mi00MmM5LWFiODItNDgxOWQ1ZTMzNDNm
    Explore at:
    Available download formats: api/sparql (20.0)
    Dataset updated
    Jul 30, 2016
    Dataset provided by
    Linking Open Data Cloud
    Description

    UniProtKB is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data.

  13. Prediction and Visualization of Human Transmembrane Proteins using AlphaFold...

    • data.niaid.nih.gov
    Updated Jul 16, 2024
    Cite
    Marquet, Céline; Grekova, Anastasia; Houri, Leen; Heinzinger, Michael; Rost, Burkhard (2024). Prediction and Visualization of Human Transmembrane Proteins using AlphaFold and Protein Language Models [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6816082
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Technical University Munich
    Authors
    Marquet, Céline; Grekova, Anastasia; Houri, Leen; Heinzinger, Michael; Rost, Burkhard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description: TMvis ("TMvis496.tar.gz") is a dataset containing 496 3D-structures of predicted human transmembrane proteins (TMP) and their predicted membrane embedding. The method TMbed [1], based on the protein language model ProtT5 [2], predicted 4,967 TMPs for the human proteome (20,375 proteins, UniProt [3] version April 2022; excluding TITIN_HUMAN due to length). For these proteins, we obtained AlphaFold [4] structures from AlphaFoldDB [5] with an average per-residue confidence score (pLDDT) of more than 90%. This resulted in the 496 proteins of TMvis, as can be found in "TMvis496.fasta". The membrane embedding was predicted using the methods ANVIL [6], PPM3 [7], and per-residue TMbed predictions. As the three methods are based on different approaches, we decided to publish results for all. The figure “TMvis_project_overview.png” provides a graphical overview for each step described above.

    TMvis Folder Structure: TMvis is separated into “alpha” containing predicted alpha-helical TMPs, and “beta” containing predicted beta-barrel TMPs. Within these folders, each protein is assigned one folder, identifiable by the respective unique UniProt ID. Each protein folder consists of:

    - “UniprotID.fasta” with UniProt ID, sequence, TMbed per-residue prediction
    - “AF-UniprotID-F1-model_v2.pdb” with the AlphaFold structure
    - “AF-UniprotID-F1-model_v2.cif” with the AlphaFold structure
    - “AF-UniprotID-F1-model_v2_ANVIL.pdb” with predicted ANVIL membrane embedding
    - “AF-UniprotID-F1-model_v2_ppm.pdb” with predicted PPM3 membrane embedding

    TMvis
    |
    ├── alpha
    │   │
    │   ├── A0A087X1C5
    │   │   ├── A0A087X1C5.fasta
    │   │   ├── AF-A0A087X1C5-F1-model_v2.pdb
    │   │   ├── AF-A0A087X1C5-F1-model_v2.cif
    │   │   ├── AF-A0A087X1C5-F1-model_v2_ANVIL.pdb
    │   │   └── AF-A0A087X1C5-F1-model_v2_ppm.PDB
    │   └── ...
    └── beta
        └── P45880
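    The folder layout described above follows a fixed naming pattern, so the expected file paths for a protein can be derived from its UniProt ID. The sketch below assumes the archive has been extracted into a directory named "TMvis" and uses the lowercase ".pdb" extension given in the file list (the tree shows the PPM3 file with an uppercase extension, so the actual case on disk may differ). The function name `tmvis_files` is hypothetical.

```python
from pathlib import PurePosixPath


def tmvis_files(root, topology, uniprot_id):
    """Build the five expected file paths for one protein folder."""
    if topology not in ("alpha", "beta"):
        raise ValueError("topology must be 'alpha' or 'beta'")
    folder = PurePosixPath(root) / topology / uniprot_id
    model = f"AF-{uniprot_id}-F1-model_v2"  # AlphaFold model file stem
    return {
        "fasta": folder / f"{uniprot_id}.fasta",
        "alphafold_pdb": folder / f"{model}.pdb",
        "alphafold_cif": folder / f"{model}.cif",
        "anvil": folder / f"{model}_ANVIL.pdb",
        "ppm3": folder / f"{model}_ppm.pdb",
    }


paths = tmvis_files("TMvis", "alpha", "A0A087X1C5")
print(paths["anvil"])
```

    The paths are built without touching the file system, so existence checks (and any handling of the extension case) are left to the caller.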

    TMvis visualization: The 3D-visualization of every protein in the dataset TMvis can be easily accessed using the Jupyter Notebook “TMvis.ipynb”. It contains detailed descriptions of the different membrane prediction tools ANVIL, PPM3, and TMbed as well as the respective code. Additionally, it allows visualizing the per-residue confidence scores (pLDDT) of AlphaFold.

    ——————————————————————————————————————————————————————————————————————————

    References:

    [1] TMbed - Bernhofer, Michael, and Burkhard Rost. 2022. “TMbed – Transmembrane Proteins Predicted through Language Model Embeddings.” bioRxiv.

    [2] ProtT5 - A. Elnaggar et al., "ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2021.3095381.

    [3] UniProt - UniProt Consortium (2021). UniProt: the universal protein knowledgebase in 2021. Nucleic acids research, 49(D1), D480–D489.

    [4] AlphaFold - Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.

    [5] AlphaFold DB - Varadi, Mihaly, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, et al. 2022. “AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models.” Nucleic Acids Research 50 (D1): D439–44.

    [6] ANVIL - Postic, Guillaume, Yassine Ghouzam, Vincent Guiraud, and Jean-Christophe Gelly. 2016. “Membrane Positioning for High- and Low-Resolution Protein Structures through a Binary Classification Approach.” Protein Engineering, Design & Selection: PEDS 29 (3): 87–91.

    [7] PPM3 - Lomize, Mikhail A., Irina D. Pogozheva, Hyeon Joo, Henry I. Mosberg, and Andrei L. Lomize. 2012. “OPM Database and PPM Web Server: Resources for Positioning of Proteins in Membranes.” Nucleic Acids Research 40 (Database issue): D370–76.

    ——————————————————————————————————————————————————————————————————————————

    License:

    This work is licensed under a Creative Commons Attribution 4.0 International License (CC-BY 4.0).

  14. UniRef

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Nov 16, 2024
    Cite
    (2024). UniRef [Dataset]. http://identifiers.org/RRID:SCR_010646
    Explore at:
    Dataset updated
    Nov 16, 2024
    Description

    Databases which provide clustered sets of sequences from UniProt Knowledgebase and selected UniParc records, in order to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences from view. The UniRef100 database combines identical sequences and sub-fragments with 11 or more residues (from any organism) into a single UniRef entry. The sequence of a representative protein, the accession numbers of all the merged entries, and links to the corresponding UniProtKB and UniParc records are all displayed in the entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences with 11 or more residues such that each cluster is composed of sequences that have at least 90% (UniRef90) or 50% (UniRef50) sequence identity to the longest sequence (UniRef seed sequence). All the sequences in each cluster are ranked to facilitate the selection of a representative sequence for the cluster.
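    The seed-based grouping described above can be illustrated with a toy sketch. It is not UniRef's actual procedure: identity here is a naive position-by-position match without alignment, and the function names `naive_identity` and `greedy_cluster` and the example sequences are hypothetical. What it does take from the description is the 11-residue minimum, the identity threshold, and the use of the longest sequence as the cluster seed.

```python
def naive_identity(a, b):
    """Fraction of matching positions over the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(sequences, threshold, min_length=11):
    """Assign each sequence to the first seed it matches at >= threshold."""
    eligible = {k: s for k, s in sequences.items() if len(s) >= min_length}
    # Process longest first, so each seed is the longest member of its cluster.
    order = sorted(eligible, key=lambda k: (-len(eligible[k]), k))
    clusters = {}  # seed id -> list of member ids (seed included)
    for seq_id in order:
        for seed_id in clusters:
            if naive_identity(eligible[seed_id], eligible[seq_id]) >= threshold:
                clusters[seed_id].append(seq_id)
                break
        else:
            clusters[seq_id] = [seq_id]  # no seed matched: start a new cluster
    return clusters


# Made-up sequences for illustration only.
toy = {
    "s1": "MKTAYIAKQRQISFVK",  # longest, becomes a seed
    "s2": "MKTAYIAKQRQISF",    # identical prefix of s1
    "s3": "GGGGGGGGGGGGGG",    # unrelated
    "s4": "MKTA",              # shorter than 11 residues, skipped
}
clusters = greedy_cluster(toy, threshold=0.9)
print(clusters)
```

    Lowering the threshold to 0.5 in the same call would mimic the coarser resolution described for UniRef50, again only under the naive identity measure assumed here.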

  15. UniRef at the EBI

    • dknet.org
    • scicrunch.org
    • +1more
    Updated Jan 29, 2022
    Cite
    (2022). UniRef at the EBI [Dataset]. http://identifiers.org/RRID:SCR_004972
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Various non-redundant databases with different sequence identity cut-offs created by clustering closely similar sequences to yield a representative subset of sequences. In the UniRef90 and UniRef50 databases no pair of sequences in the representative set has >90% or >50% mutual sequence identity. The UniRef100 database presents identical sequences and sub-fragments as a single entry with protein IDs, sequences, bibliography, and links to protein databases. The two major objectives of UniRef are: (i) to facilitate sequence merging in UniProt, and (ii) to allow faster and more informative sequence similarity searches. Although the UniProt Knowledgebase is much less redundant than UniParc, it still contains a certain level of redundancy because it is not possible to use fully automatic merging without risking merging of similar sequences from different proteins. However, such automatic procedures are extremely useful in compiling the UniRef databases to obtain complete coverage of sequence space while hiding redundant sequences (but not their descriptions) from view. A high level of redundancy results in several problems, including slow database searches and long lists of similar or identical alignments that can obscure novel matches in the output. Thus, a more even sampling of sequence space is advantageous. You may access NREF via the FTP server.

  16. Data from: A Critical Guide to the UniProtKB Flat-file Format

    • qubeshub.org
    Updated Dec 5, 2020
    Cite
    Teresa Attwood; GOBLET Foundation (2020). A Critical Guide to the UniProtKB Flat-file Format [Dataset]. http://doi.org/10.25334/ZQRR-1577
    Explore at:
    Dataset updated
    Dec 5, 2020
    Dataset provided by
    QUBES
    Authors
    Teresa Attwood; GOBLET Foundation
    Description

    This Critical Guide briefly presents the need for biological databases and for a standard format for storing and organising biological data.

  17. UniProt Proteomes

    • dknet.org
    • neuinfo.org
    • +2more
    Updated Nov 30, 2025
    Cite
    (2025). UniProt Proteomes [Dataset]. http://identifiers.org/RRID:SCR_018666/resolver
    Explore at:
    Dataset updated
    Nov 30, 2025
    Description

    Protein sets from fully sequenced genomes. Proteomes portal offers protein sequence sets obtained from translation of completely sequenced genomes. Published genomes from NCBI Genome are brought into UniProt if genome is annotated and set of coding sequences is available. Number of predicted coding sequences falls within statistically significant range of published proteomes from neighbouring species.

  18. Number of human protein variations collected from the UniProt/Swiss-Prot...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Yongwook Choi; Gregory E. Sims; Sean Murphy; Jason R. Miller; Agnes P. Chan (2023). Number of human protein variations collected from the UniProt/Swiss-Prot database. [Dataset]. http://doi.org/10.1371/journal.pone.0046688.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yongwook Choi; Gregory E. Sims; Sean Murphy; Jason R. Miller; Agnes P. Chan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of human protein variations collected from the UniProt/Swiss-Prot database.

  19. UniProt journal

    • bioregistry.io
    Updated May 10, 2024
    Cite
    (2024). UniProt journal [Dataset]. http://identifiers.org/wikidata:P4616
    Explore at:
    Dataset updated
    May 10, 2024
    Description

    identifier for a scientific journal, in the UniProt database

  20. Table S9_Homeobox Uniprot Screen in O-GlcNAc Database

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Mar 4, 2024
    Cite
    Wulff, Eugenia (2024). Table S9_Homeobox Uniprot Screen in O-GlcNAc Database [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001298184
    Explore at:
    Dataset updated
    Mar 4, 2024
    Authors
    Wulff, Eugenia
    Description

    The list of human proteins in reviewed entries obtained from UniprotKB was searched in the O-GlcNAc database (oglcnac.com)
