100+ datasets found
  1. UniProt

    • registry.opendata.aws
    Updated Apr 6, 2021
    Cite
    SIB Swiss Institute of Bioinformatics on behalf of the UniProt Consortium (2021). UniProt [Dataset]. https://registry.opendata.aws/uniprot/
    Dataset updated
    Apr 6, 2021
    Dataset provided by
    UniProt (http://www.uniprot.org/)
    Description

    The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt consortium and host institutions EMBL-EBI, SIB Swiss Institute of Bioinformatics and PIR are committed to the long-term preservation of the UniProt databases.

  2. uniprot-database_(type_ko).27.09.2019.tab.rar

    • figshare.com
    application/x-rar
    Updated Jun 24, 2020
    Cite
    Daniel Kumazawa Morais (2020). uniprot-database_(type_ko).27.09.2019.tab.rar [Dataset]. http://doi.org/10.6084/m9.figshare.12555422.v1
    Available download formats: application/x-rar
    Dataset updated
    Jun 24, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Daniel Kumazawa Morais
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The current database was downloaded on 27.09.2019 and has the data fields (columns) described below:

    • Column 1: Entry
    • Column 2: Entry name
    • Column 3: Status
    • Column 4: Protein names
    • Column 5: Gene names
    • Column 6: Organism
    • Column 7: Length
    • Column 8: Cross-reference (KO)
    • Column 9: Taxonomic lineage (PHYLUM)
    • Column 10: Taxonomic lineage (SPECIES). This field carries current and old* taxonomic classifications.
    • Column 11: Taxonomic lineage (GENUS)
    • Column 12: Taxonomic lineage (KINGDOM)
    • Column 13: Taxonomic lineage (SUPERKINGDOM)
    • Column 14: Cross-reference (OrthoDB)
    • Column 15: Cross-reference (eggNOG)

    * Details about the classification used in UniProt can be found at https://www.uniprot.org/help/taxonomy
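As a rough illustration of the 15-column layout in this description, the sketch below reads tab-separated rows into dictionaries keyed by the field names. It assumes the extracted file is tab-delimited with no header row, which the listing does not state. The `COLUMNS` list, the `parse_rows` helper, and the sample row are hypothetical and exist only for illustration.

```python
import csv
import io

# Column names copied from the dataset description (order matters).
COLUMNS = [
    "Entry", "Entry name", "Status", "Protein names", "Gene names",
    "Organism", "Length", "Cross-reference (KO)",
    "Taxonomic lineage (PHYLUM)", "Taxonomic lineage (SPECIES)",
    "Taxonomic lineage (GENUS)", "Taxonomic lineage (KINGDOM)",
    "Taxonomic lineage (SUPERKINGDOM)", "Cross-reference (OrthoDB)",
    "Cross-reference (eggNOG)",
]


def parse_rows(handle):
    """Yield one dict per tab-separated line, keyed by COLUMNS.

    Assumption: no header row and exactly 15 fields per line.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip malformed lines rather than guess
        yield dict(zip(COLUMNS, fields))


# Hypothetical sample row; every value here is invented.
sample = "\t".join([
    "X00001", "EXAMPLE_ORG", "reviewed", "Example protein", "exA",
    "Example organism", "123", "K00001;", "Phylum", "Species", "Genus",
    "", "Bacteria", "", "",
])
rows = list(parse_rows(io.StringIO(sample + "\n")))
```

All values come back as strings, so a numeric field such as `Length` would need an explicit `int(...)` conversion before use.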

  3. UniProt

    • scicrunch.org
    • dknet.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). UniProt [Dataset]. http://identifiers.org/RRID:SCR_002380
    Dataset updated
    Jan 29, 2022
    Description

    A collection of protein sequence and functional information, and a resource for protein sequence and annotation data. The consortium preserves the UniProt databases: the UniProt Knowledgebase (UniProtKB), UniProt Reference Clusters (UniRef), UniProt Archive (UniParc), and UniProt Proteomes. It is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics, and the Protein Information Resource. Swiss-Prot is a curated subset of UniProtKB.

  4. uniprot

    • huggingface.co
    Updated Apr 9, 2022
    Cite
    Will Dampier (2022). uniprot [Dataset]. https://huggingface.co/datasets/damlab/uniprot
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 9, 2022
    Authors
    Will Dampier
    Description

    Dataset Description

    Dataset Summary

    This dataset is a mirror of the UniProt/Swiss-Prot database. It contains the names and sequences of more than 500K proteins. The dataset was parsed from the FASTA file at https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz.

    Supported Tasks and Leaderboards: None

    Languages: English

    Dataset Structure

    Data Fields: id, description, sequence

    Data… See the full description on the dataset page: https://huggingface.co/datasets/damlab/uniprot.
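The description says the records were parsed from a FASTA file into `id`, `description`, and `sequence` fields. The sketch below shows one plausible way to do that with the standard library only. It assumes that `id` is the first whitespace-delimited token of the header line and `description` is the remainder; the dataset's actual mapping is not given in the listing and may differ. The function names and the sample records are invented for illustration.

```python
def parse_fasta(text):
    """Return a list of {"id", "description", "sequence"} records.

    Assumption: "id" is the first whitespace-delimited token of the
    header line and "description" is everything after it.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def _make_record(header, chunks):
    ident, _, description = header.partition(" ")
    # Sequence lines are wrapped in FASTA files, so join them back together.
    return {"id": ident, "description": description, "sequence": "".join(chunks)}


# Invented two-record sample in the general FASTA layout.
sample = """>sp|X00001|EXA_EXAMPLE Example protein OS=Example organism
MKTAYIAK
QRQISFVK
>sp|X00002|EXB_EXAMPLE Another example protein
MSEQ
"""
records = parse_fasta(sample)
```

For the full compressed file, the same function could be fed text read through `gzip.open(path, "rt")`, although loading the whole file into memory at once may not suit larger releases.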

  5. UniProtKB

    • scicrunch.org
    • neuinfo.org
    Updated Oct 24, 2019
    Cite
    (2019). UniProtKB [Dataset]. http://identifiers.org/RRID:SCR_004426
    Dataset updated
    Oct 24, 2019
    Description

    Central repository for collection of functional information on proteins, with accurate and consistent annotation. In addition to capturing core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and experimental and computational data. The UniProt Knowledgebase consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. UniProtKB/Swiss-Prot (reviewed) is a high quality manually annotated and non-redundant protein sequence database which brings together experimental results, computed features, and scientific conclusions. UniProtKB/TrEMBL (unreviewed) contains protein sequences associated with computationally generated annotation and large-scale functional characterization that await full manual annotation. Users may browse by taxonomy, keyword, gene ontology, enzyme class or pathway.

  6. UniProt Protein

    • bioregistry.io
    Updated Apr 26, 2021
    Cite
    (2021). UniProt Protein [Dataset]. http://identifiers.org/wikidata:P352
    Dataset updated
    Apr 26, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. Besides amino acid sequence and a description, it also provides taxonomic data and citation information.

  7. Repository URL

    • cinergi.sdsc.edu
    resource url
    Cite
    Repository URL [Dataset]. http://cinergi.sdsc.edu/geoportal/rest/metadata/item/323ebc5365ec476ebdcb92329cf10b57/html
    Available download formats: resource url
    Description

    Link Function: information

  8. Protein-centric rate of sequence evolution according to Rate4Site on...

    • figshare.com
    txt
    Updated Feb 9, 2021
    Cite
    emmanuel levy; Benjamin Dubreuil (2021). Protein-centric rate of sequence evolution according to Rate4Site on orthogroups of 14 fungal species [Dataset]. http://doi.org/10.6084/m9.figshare.13735537.v2
    Available download formats: txt
    Dataset updated
    Feb 9, 2021
    Dataset provided by
    figshare
    Authors
    emmanuel levy; Benjamin Dubreuil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overall, 25 descriptors (features) are calculated for 3797 unique proteins. The legend for each descriptor is given in the associated header file.

    Columns 1-5 provide protein identifiers:
    • ORF
    • SGD Gene Name
    • UniprotKB
    • Matching PDB structure?
    • PDB code of closest structure

    Columns 6-8 correspond to protein expression:
    • Integrated abundance in ppm
    • log10 abundance
    • bins of abundance (5 bins)

    Columns 9-16 contain evolutionary rates averaged over:
    • Full sequence
    • Disordered residues
    • Not Disordered residues
    • Domain residues
    • Not Domain residues
    • Residues with PDB coordinates
    • Surface residues (>25% relative ASA)
    • Buried residues (

  9. UniProtKB/Swiss-Prot

    • rrid.site
    • scicrunch.org
    • +2more
    Updated Jun 28, 2025
    Cite
    (2025). UniProtKB/Swiss-Prot [Dataset]. http://identifiers.org/RRID:SCR_021164
    Dataset updated
    Jun 28, 2025
    Description

    Curated component of UniProtKB (produced by the UniProt consortium). It contains hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants.

  10. UniProtKB

    • ebi.ac.uk
    Updated May 12, 2020
    Cite
    (2020). UniProtKB [Dataset]. https://www.ebi.ac.uk/interpro/proteome/uniprot/entry/smart/
    Dataset updated
    May 12, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of the type protein from the database UniProtKB - version 2025_03

  11. Taxonomic groups and the identified number of sequences deposited to the...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Bhanupratap Chouhan; Alexander Denesyuk; Jyrki Heino; Mark S. Johnson; Konstantin Denessiouk (2023). Taxonomic groups and the identified number of sequences deposited to the UniProtKB database that contain domain architectures similar to (and including) the integrin α (β-propeller) superfamily. [Dataset]. http://doi.org/10.1371/journal.pone.0025069.t002
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bhanupratap Chouhan; Alexander Denesyuk; Jyrki Heino; Mark S. Johnson; Konstantin Denessiouk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Taxonomic groups and the identified number of sequences deposited to the UniProtKB database that contain domain architectures similar to (and including) the integrin α (β-propeller) superfamily.

  12. UniProtKB

    • ebi.ac.uk
    Updated Sep 6, 2024
    Cite
    (2024). UniProtKB [Dataset]. https://www.ebi.ac.uk/ebisearch/metadata.ebi?db=uniprot
    Dataset updated
    Sep 6, 2024
    Description

    UniProt Knowledge Base of protein sequences. The UniProt Knowledgebase is the central hub for the collection of functional information on proteins.

  13. Isoelectric point for all UniProtKB/TrEMBL proteins April 2016

    • repod.icm.edu.pl
    • commons.datacite.org
    7z, bin
    Updated May 18, 2016
    Cite
    Kozlowski, Lukasz (2016). Isoelectric point for all UniProtKB/TrEMBL proteins April 2016 [Dataset]. http://doi.org/10.18150/repod.9948646
    Available download formats: 7z (11492396457), bin (11492396457)
    Dataset updated
    May 18, 2016
    Dataset provided by
    RepOD
    Authors
    Kozlowski, Lukasz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predicted isoelectric points for all UniProtKB/TrEMBL proteins (April 2016), computed using 18 different algorithms. Over 63 million protein sequences. Compressed using 7zip. Primary reference: Kozlowski, LP (2016) Proteome-pI: proteome isoelectric point database. Nucleic Acids Research doi: 10.1093/nar/gkw978. Website: http://isoelectricpointdb.org

  14. NEWT

    • scicrunch.org
    • neuinfo.org
    • +1more
    Updated Dec 4, 2023
    Cite
    (2023). NEWT [Dataset]. http://identifiers.org/RRID:SCR_004477
    Dataset updated
    Dec 4, 2023
    Description

    NEWT is the taxonomy database maintained by the UniProt group. It integrates taxonomy data compiled in the NCBI database and data specific to the UniProt Knowledgebase. Browse by hierarchy, List all, or Complete proteomes. Organisms are classified in a hierarchical tree structure. Our taxonomy database contains every node (taxon) of the tree. UniProtKB taxonomy data is manually curated: next to manually verified organism names, we provide a selection of external links, organism strains and viral host information. Species with protein sequences stored in the UniProt Knowledgebase are named according to UniProt nomenclature. We endeavour to maintain a list of manually curated species names for which protein sequence data is available. In particular, we have adopted a systematic convention for naming viral and bacterial strains and isolates. Links to external sites are chosen by the UniProt taxonomy team and show pictures and various scientific data of interest (taxonomy, biology, physiology,...).

  15. Data from: UniSave

    • neuinfo.org
    • scicrunch.org
    • +1more
    Updated Jan 29, 2022
    Cite
    (2022). UniSave [Dataset]. http://identifiers.org/RRID:SCR_004946
    Dataset updated
    Jan 29, 2022
    Description

    The UniProtKB Sequence/Annotation Version Archive (UniSave) is a repository of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entry versions. Entries can be retrieved by entering a primary accession number or an entry name and pressing the Go! button. The result of the query is a list of entry versions with the UniProtKB database name, entry status, primary accession number, entry name, entry version, sequence version, release number and the release date, ordered by the release date, the latest version first. The entry version status can be "incorporated", "active", "changed", "replaced" or "deleted". An incorporated entry version is the first entry version added into UniProtKB, an active entry version is part of the latest public release, a changed entry version has been superseded by a newer entry version, a replaced entry has become secondary to another entry, and a deleted entry has been removed from the UniProtKB without becoming secondary to any other entry. For replaced entry versions, the status "Replaced" can be clicked to return all entries, which have the given entry as a secondary entry. If a date is provided as part of the query then only the version of the entry that was current at that date is displayed. Entries can be viewed by clicking "View" in the query results table. The ">" links can be used to access the earlier and later entry versions. The "Back to List" link returns the user to the query results table. Selecting "UniProtKB" or "Fasta" and pressing "Save" downloads the entry in flat file or fasta format. Comparison between entry versions is straightforward: selecting two entries and clicking the "Compare Selected" button will show the differences between the two entries. Whenever comparisons are made a Smith-Waterman sequence alignment is computed using SSEARCH, and displayed at the bottom of the entry. The actual alignment is displayed only when the sequences are not identical.

  16. UniProt Knowledgebase

    • registry.identifiers.org
    Updated Aug 16, 2019
    Cite
    (2019). UniProt Knowledgebase [Dataset]. https://registry.identifiers.org/registry/uniprot
    Dataset updated
    Aug 16, 2019
    Description

    The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. Besides amino acid sequence and a description, it also provides taxonomic data and citation information.

  17. UniProt Isoform

    • bioregistry.io
    • registry.identifiers.org
    Updated Dec 18, 2021
    Cite
    (2021). UniProt Isoform [Dataset]. http://identifiers.org/biolink:UNIPROT.ISOFORM
    Dataset updated
    Dec 18, 2021
    Description

    The UniProt Knowledgebase (UniProtKB) is a comprehensive resource for protein sequence and functional information with extensive cross-references to more than 120 external databases. This collection is a subset of UniProtKB, and provides a means to reference isoform information.

  18. GOA

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). GOA [Dataset]. http://identifiers.org/RRID:SCR_007691/resolver?q=&i=rrid
    Dataset updated
    Jan 29, 2022
    Description

    An annotation program which aims to provide high-quality Gene Ontology (GO) annotations to proteins in the UniProt Knowledgebase (UniProtKB) and International Protein Index (IPI). It is a central dataset for other major multi-species databases, such as Ensembl and NCBI. Because of the multi-species nature of the UniProtKB, UniProtKB-GOA assists in the curation of 200,000 species. This involves electronic annotation and the integration of high-quality manual GO annotation from all GO Consortium model organism groups and specialist groups. Gene Association Files can be accessed from the Downloads section of the website.

  19. Look Up table with Embeddings - UniProt - 2025 - Non Computational...

    • zenodo.org
    tar
    Updated Jun 20, 2025
    Cite
    Francisco M. Perez-Canales; Francisco M. Perez-Canales (2025). Look Up table with Embeddings - UniProt - 2025 - Non Computational annotations [Dataset]. http://doi.org/10.5281/zenodo.15704976
    Available download formats: tar
    Dataset updated
    Jun 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Francisco M. Perez-Canales; Francisco M. Perez-Canales
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This PostgreSQL database contains structured information extracted from the UniProt API (retrieved in April 2025). It includes:

    • 126,582 proteins

    • 123,518 sequences

    • 494,072 embeddings generated with ProtT5, ProSTT5, ESM2, Ankh

    • 623,134 GO term annotations (with evidence codes: EXP, IDA, IPI, IMP, IGI, IEP, TAS, IC)

    • Associated biological metadata

    The data was extracted using the protein-information-system repository and is used within the FANTASIA pipeline for automated functional annotation of protein sequences.

    The following UniProt API filter was applied to retrieve annotations:
    https://www.uniprot.org/uniprotkb?query=%28+go_exp%3A*+OR+go_ida%3A*+OR+go_ipi%3A*+OR+go_imp%3A*+OR+go_igi%3A*+OR+go_iep%3A*+OR+go_tas%3A*+OR+go_ic%3A*%29
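The percent-encoded filter above is hard to read, so the sketch below rebuilds it from the eight evidence codes listed in the description and decodes it back into a readable query expression. It only reproduces the URL string with Python's standard `urllib.parse`; it does not contact UniProt, and the helper name `build_filter_url` is hypothetical.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit

# Evidence codes listed in the dataset description.
EVIDENCE_CODES = ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "TAS", "IC"]


def build_filter_url(codes):
    """Rebuild the search URL from a list of GO evidence codes."""
    clauses = " OR ".join(f"go_{code.lower()}:*" for code in codes)
    query = f"( {clauses})"  # spacing mirrors the URL in the description
    # Keep "*" unescaped so the result matches the URL as published.
    return "https://www.uniprot.org/uniprotkb?query=" + quote_plus(query, safe="*")


url = build_filter_url(EVIDENCE_CODES)

# Decode the query parameter back into a readable expression.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Here `decoded` is the plain-text form of the filter, with one `go_<code>:*` clause per evidence code joined by `OR`.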

    To ensure embedding reproducibility, all sequences were processed with batch size = 1, avoiding discrepancies caused by padding artifacts common in PLMs like ProtT5.

    To initialize the database, either of the following methods can be used:

    Option 1: Using pg_restore

    pg_restore -U usuario -h localhost -p 5432 -d BioData ./BioData_backup_2025_hq.tar
    

    Option 2: Using the FANTASIA CLI

    fantasia initialize --embeddings_url https://zenodo.org/records/15704357/files/PIS_2025_ankh_exp.tar?download=1


    Notes:

  20. UniProt Chordata protein annotation program

    • rrid.site
    • scicrunch.org
    • +2more
    Updated Jun 16, 2025
    Cite
    (2025). UniProt Chordata protein annotation program [Dataset]. http://identifiers.org/RRID:SCR_007071
    Dataset updated
    Jun 16, 2025
    Description

    Data set of manually annotated chordata-specific proteins as well as those that are widely conserved. The program keeps existing human entries up-to-date and broadens the manual annotation to other vertebrate species, especially model organisms, including great apes, cow, mouse, rat, chicken, zebrafish, as well as Xenopus laevis and Xenopus tropicalis. A draft of the complete human proteome is available in UniProtKB/Swiss-Prot and one of the current priorities of the Chordata protein annotation program is to improve the quality of human sequences provided. To this aim, they are updating sequences which show discrepancies with those predicted from the genome sequence. Dubious isoforms, sequences based on experimental artifacts and protein products derived from erroneous gene model predictions are also revisited. This work is in part done in collaboration with the Hinxton Sequence Forum (HSF), which allows active exchange between UniProt, HAVANA, Ensembl and HGNC groups, as well as with RefSeq database. UniProt is a member of the Consensus CDS project and they are in the process of reviewing their records to support convergence towards a standard set of protein annotation. They also continuously update human entries with functional annotation, including novel structural, post-translational modification, interaction and enzymatic activity data. In order to identify candidates for re-annotation, they use, among others, information extraction tools such as the STRING database. In addition, they regularly add new sequence variants and maintain disease information. Indeed, this annotation program includes the Variation Annotation Program, the goal of which is to annotate all known human genetic diseases and disease-linked protein variants, as well as neutral polymorphisms.
