100+ datasets found
  1. The Therapeutic Drug Target Database Human SwissProt

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). The Therapeutic Drug Target Database Human SwissProt [Dataset]. https://www.johnsnowlabs.com/marketplace/the-therapeutic-drug-target-database-human-swissprot/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    N/A
    Description

    This dataset is a selection of The Therapeutic Target Database (release 4.3.02, 18th Oct 2013) protein IDs for successful targets. The web page states 388 targets, but these reduce to 345 human Swiss-Prot accessions.

  2. UniprotKB/SwissProt

    • resodate.org
    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    Boutet; Lieberherr; Tognolli; Schneider; Bansal; Bridge; Poux; Bougueleret; Xenarios (2024). UniprotKB/SwissProt [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdW5pcHJvdGtiLXN3aXNzcHJvdA==
    Explore at:
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Boutet; Lieberherr; Tognolli; Schneider; Bansal; Bridge; Poux; Bougueleret; Xenarios
    Description

    The UniprotKB/SwissProt database contains protein sequence information.

  3. Swiss-Prot database

    • springernature.figshare.com
    application/cdfv2
    Updated Jun 1, 2023
    Cite
    Shuqi Wang; Cuihong You; Hongyu Ma; Yin Zhang; Guidong Miao; Qingyang Wu; Fan Lin; Jude Juventus Aweya (2023). Swiss-Prot database [Dataset]. http://doi.org/10.6084/m9.figshare.6124457.v1
    Explore at:
    Available download formats: application/cdfv2
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Shuqi Wang; Cuihong You; Hongyu Ma; Yin Zhang; Guidong Miao; Qingyang Wu; Fan Lin; Jude Juventus Aweya
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    All unigenes of Portunus sanguinolentus hit to the Swiss-Prot database.

  4. BLASTP vs SwissProt

    • figshare.com
    txt
    Updated Feb 16, 2025
    Cite
    Franco Liberati (2025). BLASTP vs SwissProt [Dataset]. http://doi.org/10.6084/m9.figshare.28407980.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 16, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Franco Liberati
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BLASTP vs SwissProt

  5. Matches Found in Swiss-Prot Database.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Kemal Sonmez; Naunihal T. Zaveri; Ilan A. Kerman; Sharon Burke; Charles R. Neal; Xinmin Xie; Stanley J. Watson; Lawrence Toll (2023). Matches Found in Swiss-Prot Database. [Dataset]. http://doi.org/10.1371/journal.pcbi.1000258.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kemal Sonmez; Naunihal T. Zaveri; Ilan A. Kerman; Sharon Burke; Charles R. Neal; Xinmin Xie; Stanley J. Watson; Lawrence Toll
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    False Positives. Other signaling molecules: FGF-3,5,7,10,17,18; GDNF; CD8,28; PDGF-2; TGF; VEGF (vascular endothelial growth factor); HBNF-1; MIP; NGF (nerve growth factor); Cytokine A21, IFN-α (interferon alpha); IGF binding protein 1B,2,3; IL7 (interleukin 7). Other: MAGF (microfibril associated protein), MINK (K-channel), K-channel related peptide, L-type Ca2+ channel, gamma subunit, myelin Po protein, Dif-2, Eosinophil, Syntaxin 1B (vesicle docking), Syntaxin 2, TMP21 (vesicle trafficking protein), Coagulation factor III, PGD2 synthase, syndecans, FKBP12 (FK506 binding protein), Folate receptor, ERp29, COMT, Connexin 32, Cytostatin.

  6. Approved and Researched Drug Targets Human SwissProt Accessions

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Approved and Researched Drug Targets Human SwissProt Accessions [Dataset]. https://www.johnsnowlabs.com/marketplace/approved-and-researched-drug-targets-human-swissprot-accessions/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    N/A
    Description

    This dataset is supplementary data from "Analysis of in vitro bioactivity data extracted from drug discovery literature and patents: Ranking 1654 human protein targets by assayed compounds and molecular scaffolds" (2011). In this case the Entrez Gene IDs were mapped to 1651 human Swiss-Prot accessions, but this set includes both approved and research targets.

  7. SwissProt search result

    • dbarchive.biosciencedbc.jp
    Updated Jun 1, 2016
    Cite
    (2016). SwissProt search result [Dataset]. http://doi.org/10.18908/lsdba.nbdc00120-019
    Explore at:
    Dataset updated
    Jun 1, 2016
    Description

    Results of blastx searches against the Swiss-Prot database

  8. UniProt Proteins Reviewed (Swiss-Prot)

    • kaggle.com
    zip
    Updated Aug 6, 2022
    Cite
    Andrey Lovyagin (2022). UniProt Proteins Reviewed (Swiss-Prot) [Dataset]. https://www.kaggle.com/datasets/andreylovyagin/uniprot-proteins-reviewed-swissprot
    Explore at:
    Available download formats: zip (479163007 bytes)
    Dataset updated
    Aug 6, 2022
    Authors
    Andrey Lovyagin
    Description

    The UniProt reviewed proteins database, uploaded with all columns for easier use in Kaggle notebooks. All columns have a description, but if you have any questions, you can check UniProt Help, where every column has a full explanation.

    For UniProt Species Proteomes check this dataset.

    License: Creative Commons Attribution 4.0 International (CC BY 4.0) License

  9. Proven Drug Targets Converted to Human SwissProt Accessions

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Proven Drug Targets Converted to Human SwissProt Accessions [Dataset]. https://www.johnsnowlabs.com/marketplace/proven-drug-targets-converted-to-human-swissprot-accessions/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    N/A
    Description

    This dataset is supplementary data from "Novelty in the target landscape of the pharmaceutical industry" (2013). The listing of proven drug targets is converted to 248 human Swiss-Prot accessions.

  10. mESC shotgun and positional proteomics based on deep proteome sequence...

    • data.niaid.nih.gov
    • ebi.ac.uk
    xml
    Updated Feb 25, 2013
    Cite
    Gerben Menschaert; Gerben Menschaert (2013). mESC shotgun and positional proteomics based on deep proteome sequence database (derived from RIBOseq data) [Dataset]. https://data.niaid.nih.gov/resources?id=pxd000124
    Explore at:
    Available download formats: xml
    Dataset updated
    Feb 25, 2013
    Dataset provided by
    Faculty of Bioscience Engineering
    Authors
    Gerben Menschaert; Gerben Menschaert
    Variables measured
    Proteomics
    Description

    Shotgun and positional proteomics study of a mouse embryonic stem cell line. We devised a proteogenomic approach constructing a custom protein sequence search space, built from both SwissProt and RIBO-seq derived translation products, applicable for LC-MSMS spectrum identification. To record the impact of using the constructed deep proteome database we performed two alternative MS-based proteomic strategies: (I) a regular shotgun proteomic and (II) an N-terminal COFRADIC approach. The obtained fragmentation spectra were searched against the custom database (combination of UniProtKB-SwissProt and RIBO-seq derived translation sequences) using three different search engines: OMSSA (version 2.1.9), X!Tandem (TORNADO, version 2010.01.01.04) and Mascot (version 2.3). The first two were run from the SearchGUI graphical user interface (version 1.10.4). A combination of X!Tandem and Mascot was used for the N-terminal COFRADIC analysis, a combination of all three search engines for the shotgun proteome analysis. Note that OMSSA cannot cope with the protease setting semi-ArgC/P needed to analyze N-terminal COFRADIC data.

    For the shotgun proteome data, trypsin was set as cleavage enzyme allowing for one missed cleavage, and singly to triply charged precursors or singly to quadruply charged precursors were taken into account respectively for the Mascot or X!Tandem/OMSSA search engines, and the precursor and fragment mass tolerance were set to respectively 10 ppm and 0.5 Da. Methionine oxidation to methionine-sulfoxide, pyroglutamate formation of N-terminal glutamine and acetylation (protein N-terminus) were set as variable modifications. For the N-terminal COFRADIC analysis the protease setting semi-ArgC/P (Arg-C specificity with arginine-proline cleavage allowed) was used. No missed cleavages were allowed and the precursor and fragment mass tolerance were also set to respectively 10 ppm and 0.5 Da.

    Carbamidomethylation of cysteine and methionine oxidation to methionine-sulfoxide and 13C3D2-acetylation of lysines were set as fixed modifications. Peptide N-terminal acetylation or 13C3D2-acetylation and pyroglutamate formation of N-terminal glutamine were set as variable modifications and instrument setting was put on ESI-TRAP. Protein and peptide identification in addition to data interpretation was done using the PeptideShaker algorithm (http://code.google.com/p/peptide-shaker, version 0.18.3), setting the false discovery rate to 1% at all levels (protein, peptide, and peptide to spectrum matching). Aforementioned tools and algorithms (SearchGUI, X!Tandem, OMSSA, and PeptideShaker) are freely available as open source.
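    The two mass tolerances quoted above work differently: 10 ppm is relative to the mass being matched, while 0.5 Da is an absolute window. The sketch below only illustrates that arithmetic. The function and constant names are hypothetical, and this is not how OMSSA, X!Tandem or Mascot implement matching internally.

```python
# Illustrative only: names below are hypothetical, not search-engine APIs.
PRECURSOR_PPM = 10.0   # relative precursor tolerance quoted in the description
FRAGMENT_DA = 0.5      # absolute fragment tolerance quoted in the description


def within_ppm(observed, theoretical, ppm):
    """True if observed lies within +/- ppm parts-per-million of theoretical."""
    return abs(observed - theoretical) <= theoretical * ppm / 1_000_000


def within_da(observed, theoretical, tol_da):
    """True if observed lies within +/- tol_da daltons of theoretical."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    # At m/z 1000, a 10 ppm window is +/- 0.01, far tighter than 0.5 Da.
    print(within_ppm(1000.005, 1000.0, PRECURSOR_PPM))
    print(within_da(500.4, 500.0, FRAGMENT_DA))
```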

  11. UniProtKB/Swiss-Prot

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Oct 2, 2024
    Cite
    (2024). UniProtKB/Swiss-Prot [Dataset]. http://identifiers.org/RRID:SCR_021164
    Explore at:
    Dataset updated
    Oct 2, 2024
    Description

    Curated component of UniProtKB (produced by the UniProt consortium). It contains hundreds of thousands of protein descriptions, including function, domain structure, subcellular location, post-translational modifications and functionally characterized variants.

  12. Brain Gene Expression Database

    • rrid.site
    • scicrunch.org
    Updated Nov 12, 2025
    Cite
    (2025). Brain Gene Expression Database [Dataset]. http://identifiers.org/RRID:SCR_007299/resolver?q=&i=rrid
    Explore at:
    Dataset updated
    Nov 12, 2025
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented on June 08, 2011. The Brain Gene Expression Database (BGED) contains gene expression data for various physiological and pathological processes in mouse brain. All the data have been obtained by adaptor-tagged competitive PCR, an advanced version of quantitative PCR.

    Manual Download

    1. Data retrieval. Gene expression data can be retrieved either by ID numbers or by keywords representing functional annotations from this page. The ID numbers include GenBank, RefSeq, SwissProt, Gene Ontology, and BED (our own ID). The keyword search is based either on definition in GenBank, SwissProt and RefSeq, functional annotation of the SwissProt database, or Gene Ontology terms.
    2. Gene expression pattern display.
       * Display of multiple gene expression patterns. Expression patterns of multiple genes selected by the keyword search can be displayed from the result page of the keyword search.
       * Gene expression pattern similarity search. This function is available on the information page of each gene accessed through BED ID (in-house ID).

  13. PSSH2 - database of protein sequence-to-structure homologies (including...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 11, 2022
    Cite
    Andrea Schafferhans; Sean O'Donoghue; Neblina Sikta; Sandeep Kaur (2022). PSSH2 - database of protein sequence-to-structure homologies (including Sars-CoV-2 structures) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4279163
    Explore at:
    Dataset updated
    Feb 11, 2022
    Dataset provided by
    Garvan Institute of Medical Research
    HSWT
    Authors
    Andrea Schafferhans; Sean O'Donoghue; Neblina Sikta; Sandeep Kaur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Protein sequence and structure data

    This data set contains data from Uniprot (in the files called protein_sequence, protein_synonyms, protein_names, organism_synonyms) and PDB (in the files called PDB and PDB_chain) as used by the Aquaria web resource at the time of download (2022-02-08).

    The PSSH2 data set

    PSSH2 is a database of protein sequence-to-structure homologies based on HHblits, an alignment method employing iterative comparisons of hidden Markov models (HMMs). To ensure the highest possible final alignment quality for matches in Aquaria using HHblits, we first calculate HMM profiles for each unique PDB sequence (PDB_full) and also for each unique Swiss-Prot sequence. We generated PSSH2 using HHblits to find similarities between HMMs from PDB and HMMs from UniProt sequences.

    Calculating PSSH2

    The Swissprot and PDB data was downloaded in November 2021. Generating PSSH2: We used UniRef30_2021_03 (originally called UniRef30_2021_06) from HH-suite, a database of non-redundant UniProt sequence clusters in which the highest pairwise sequence identity between clusters was 30%. The HHblits code and the code for running the calculations was retrieved from git (https://github.com/soedinglab/hh-suite.git and https://github.com/aschafu/PSSH2.git respectively) at the respective time of calculation in the timeframe until December 2021.

    PDB based sequence-to-structure alignments

    In addition to the PSSH2 data, new PDB structures were retrieved based on the primary accession of the proteins, by querying for all chains in all PDB entries with exact matches using the sequence cross references records given in PDB. Sequence-to-structure alignments were then created, again based on information provided in each PDB entry. These are contained in the PDBchain data.

    This data covers sequences and PDB structures in the timeframe until February 2022.

    Evaluating PSSH2

    The resulting alignment data was analysed using CATH domain assignments downloaded from /cath/releases/all-releases/v4_2_0/cath-classification-data/ to define correct hits and false hits:

    The set of query sequences is defined by the CATH non-redundant S40_overlap_60 dataset (ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/all-releases/v4_2_0/non-redundant-data-sets/)

    The set of all expected hits are all pdb structures containing a domain with the same CATH code if contained in the set of processed sequences (-> all) or only if also contained in the set of non redundant sequences (-> nr40).

    The set of true positives is defined by sharing the same CATH code up to the level of homology ("CATH") or up to the level of topology ("CAT").
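    A minimal sketch of the two matching levels described above, assuming CATH codes are written as dot-separated strings in the order Class.Architecture.Topology.Homology (for example "1.10.8.10"). The function names are illustrative and are not taken from the PSSH2 code base.

```python
# Number of leading CATH levels compared in each evaluation mode.
LEVELS = {"CATH": 4, "CAT": 3}


def cath_prefix(code, levels):
    """Return the first `levels` components of a dot-separated CATH code."""
    parts = code.split(".")
    if len(parts) < levels:
        raise ValueError(f"CATH code {code!r} has fewer than {levels} levels")
    return tuple(parts[:levels])


def is_true_positive(query_code, hit_code, mode="CATH"):
    """True if query and hit agree up to homology ("CATH") or topology ("CAT")."""
    levels = LEVELS[mode]
    return cath_prefix(query_code, levels) == cath_prefix(hit_code, levels)
```

    Under this definition a hit that differs only in the homology level counts as a false hit in "CATH" mode but as a true positive in "CAT" mode.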

    The data was evaluated with respect to false discovery rate (FDR) and recall (true positive rate TPR) by cumulatively considering all hits with an E-value below the threshold ("C") or in bins with an E-value between the threshold and one tenth of the threshold ("B"). This evaluation was carried out for the data obtained in November 2021 (202111) as well as previous data from October 2020 (202010), February 2020 (202002) and September 2017 (201709). The results are collected in PSSH CATH validation.csv.
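    The cumulative ("C") and binned ("B") evaluations can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation script: hits are given as (E-value, is_correct) pairs, the bin is taken as half-open (threshold/10 <= E < threshold), and recall is computed against a supplied count of expected hits.

```python
def evaluate(hits, threshold, n_expected, mode="C"):
    """Return (FDR, TPR) for hits selected by E-value.

    hits       -- iterable of (evalue, is_correct) pairs
    threshold  -- E-value threshold
    n_expected -- total number of expected (correct) hits, used for recall
    mode       -- "C": all hits below the threshold (cumulative)
                  "B": hits between threshold/10 and threshold (binned)
    """
    if mode == "C":
        selected = [ok for e, ok in hits if e < threshold]
    elif mode == "B":
        selected = [ok for e, ok in hits if threshold / 10 <= e < threshold]
    else:
        raise ValueError("mode must be 'C' or 'B'")

    tp = sum(selected)              # correct hits among the selected ones
    fp = len(selected) - tp         # false hits among the selected ones
    fdr = fp / len(selected) if selected else 0.0
    tpr = tp / n_expected if n_expected else 0.0
    return fdr, tpr
```

    Running both modes over a series of thresholds (for example decreasing powers of ten) would give the kind of per-threshold FDR and recall values that the description says are collected in the validation file.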

    Known errors

    Due to a processing error, the profile of pdb structure 5fia A / B (sequence md5 052667679fc644184f40063c7602c9e1) is incomplete in the pdb_full hhblits database, which led to further errors in generating sequence-based alignments for 1vtm P (sequence md5 c844aff103449363cb8489c78c58ebf1) and 434t A / B (sequence md5 d67aa1c3a36492c719cb48b5e7ecc624).

  14. PSSH2 - database of protein sequence-to-structure homologies - Sars-CoV-2...

    • zenodo.org
    application/gzip, csv
    Updated Feb 10, 2022
    Cite
    Andrea Schafferhans; Andrea Schafferhans; Sean O'Donoghue; Sean O'Donoghue (2022). PSSH2 - database of protein sequence-to-structure homologies - Sars-CoV-2 subset [Dataset]. http://doi.org/10.5281/zenodo.4916895
    Explore at:
    Available download formats: application/gzip, csv
    Dataset updated
    Feb 10, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrea Schafferhans; Andrea Schafferhans; Sean O'Donoghue; Sean O'Donoghue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PSSH2 data set

    PSSH2 is a database of protein sequence-to-structure homologies based on HHblits, an alignment method employing iterative comparisons of hidden Markov models (HMMs). To ensure the highest possible final alignment quality for matches in Aquaria using HHblits, we first calculate HMM profiles for each unique PDB sequence (PDB_full) and also for each unique Swiss-Prot sequence. We generated PSSH2 using HHblits to find similarities between HMMs from PDB and HMMs from UniProt sequences.

    This dataset contains a subset of the usual PSSH2 database, including only the proteins relevant to visualise Sars-CoV-2 structures.
    It contains Swissprot and PDB data used for generating PSSH2 along with the PSSH2 data itself. This consists of the sequence-to-structure alignments used in Aquaria (aquaria.ws) and also for the Covid19 resource of Aquaria (http://aquaria.ws/covid).

    Calculating PSSH2

    The bulk of the Swissprot and PDB data was downloaded in October 2020, but incremental updates, especially those related to Covid19, were added until April 2021.
    Generating PSSH2: We used Uniclust30 from HH-suite, a database of non-redundant UniProt sequence clusters in which the highest pairwise sequence identity between clusters was 30% (http://gwdu111.gwdg.de/~compbiol/uniclust/2020_03/UniRef30_2020_03_hhsuite.tar.gz). The HHblits code and the code for running the calculations was retrieved from git (https://github.com/soedinglab/hh-suite.git and https://github.com/aschafu/PSSH2.git respectively) at the respective time of calculation in the timeframe until April 2021.

  15. Alternative Splicing Database

    • neuinfo.org
    • dknet.org
    • +2more
    Updated Feb 1, 2001
    Cite
    (2001). Alternative Splicing Database [Dataset]. http://identifiers.org/RRID:SCR_007555
    Explore at:
    Dataset updated
    Feb 1, 2001
    Description

    The Alternative Splicing Database (ASDB) has been established with the intention of assembling in a central, publicly accessible site information about alternatively spliced genes, their products and expression patterns. Version 2.1 of ASDB consists of two divisions, ASDB(proteins), which contains amino acid sequences, and ASDB(nucleotides) with genomic sequences. SWISS-PROT uses two formats for the description of alternative splicing. Thus the protein sequences were selected from SWISS-PROT using a full-text search for both the words alternative splicing (usually in the CC lines) and varsplic (in the FT lines). In order to group proteins that could arise by alternative splicing of the same gene, we developed the clustering procedure. Two proteins were linked if they had a common fragment of at least 20 amino acids, and clusters were initially defined as maximum connected groups of linked proteins. It turned out that some clusters were chimeric, in the sense that they contained members of multi-gene families, but not alternatively spliced variants of one gene. Therefore the multiple alignments were subject to additional analysis aimed at detection of chimeric clusters. Each cluster is represented by multiple alignment of its members constructed using CLUSTALW. The distribution of cluster size, representation of species and other relevant statistics of ASDB(proteins) can be accessed through the links below. This processing covers the cases when alternatively spliced variants are described in separate SWISS-PROT entries. The other kinds of ASDB records, originating from the SWISS-PROT entries with the varsplic field in the feature table, usually describe the proteins that are not part of any cluster. In these cases, the information on the variable fragments of the several proteins which result from the alternative splicing of a single gene is contained in the entry itself.
ASDB(proteins) entries are marked with different symbols to allow for easy differentiation among the three types: those proteins which are part of the ASDB clusters and the corresponding multialignments, those which have the information on different variants in the associated SWISS-PROT entries, and those for which the information on the variants is not available at the present time. ASDB contains internal links between entries and/or clusters, as well as external links to Medline, GenBank and SWISS-PROT entries. The ASDB(nucleotides) division was generated by collecting all GenBank entries containing the words alternative splicing and further selection of those entries that contain complete gene sequences (all CDS fields are complete, i.e. they do not have continuation signs). Sponsors: This work was supported by the Director, Office of Energy Research, Office of Biological and Environmental Research, of the US Department of Energy under Contract No. DE-ACO3-76SF00098. Additional support came from grants from the Russian Fund of Basic Research (99-04-48347), the Russian State Scientific Program Human Genome (65/99), and the Merck Genome Research Institute (244).
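    The initial clustering step described above (link two proteins that share a fragment of at least 20 amino acids, then take maximal connected groups) can be sketched as follows. This is a hedged illustration, not the ASDB implementation: it assumes "common fragment" means an identical contiguous substring, and the function names and the union-find approach are choices made for the example. The later detection of chimeric clusters is not covered.

```python
def shares_fragment(a, b, min_len=20):
    """True if sequences a and b share an identical substring of >= min_len residues."""
    if len(a) < min_len or len(b) < min_len:
        return False
    # Any shared substring of length >= min_len contains a shared min_len-mer.
    kmers = {a[i:i + min_len] for i in range(len(a) - min_len + 1)}
    return any(b[j:j + min_len] in kmers for j in range(len(b) - min_len + 1))


def cluster(seqs, min_len=20):
    """Group sequence names into maximal connected sets of linked proteins.

    seqs -- dict mapping a name to its amino acid sequence
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if shares_fragment(seqs[a], seqs[b], min_len):
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)
```

    Because clusters are connected components, two proteins can land in the same cluster without sharing a fragment directly, as long as a chain of linked proteins joins them. That transitivity is one way the chimeric clusters mentioned above can arise.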

  16. Data from: Method Development for Metaproteomic Analyses of Marine Biofilms

    • acs.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Dagmar Hajkova Leary; W. Judson Hervey; Robert W. Li; Jeffrey R. Deschamps; Anne W. Kusterbeck; Gary J. Vora (2023). Method Development for Metaproteomic Analyses of Marine Biofilms [Dataset]. http://doi.org/10.1021/ac203315n.s003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    ACS Publications
    Authors
    Dagmar Hajkova Leary; W. Judson Hervey; Robert W. Li; Jeffrey R. Deschamps; Anne W. Kusterbeck; Gary J. Vora
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The large-scale identification and quantitation of proteins via nanoliquid chromatography (LC)-tandem mass spectrometry (MS/MS) offers a unique opportunity to gain unprecedented insight into the microbial composition and biomolecular activity of true environmental samples. However, in order to realize this potential for marine biofilms, new methods of protein extraction must be developed as many compounds naturally present in biofilms are known to interfere with common proteomic manipulations and LC-MS/MS techniques. In this study, we used amino acid analyses (AAA) and LC-MS/MS to compare the efficacy of three sample preparation methods [6 M guanidine hydrochloride (GuHCl) protein extraction + in-solution digestion + 2D LC; sodium dodecyl sulfate (SDS) protein extraction + 1D gel LC; phenol protein extraction + 1D gel LC] for the metaproteomic analyses of an environmental marine biofilm. The AAA demonstrated that proteins constitute 1.24% of the biofilm wet weight and that the compared methods varied in their protein extraction efficiencies (0.85–15.15%). Subsequent LC-MS/MS analyses revealed that the GuHCl method resulted in the greatest number of proteins identified by one or more peptides whereas the phenol method provided the greatest sequence coverage of identified proteins. As expected, metagenomic sequencing of the same biofilm sample enabled the creation of a searchable database that increased the number of protein identifications by 48.7% (≥1 peptide) or 54.7% (≥2 peptides) when compared to SwissProt database identifications. Taken together, our results provide methods and evidence-based recommendations to consider for qualitative or quantitative biofilm metaproteome experimental design.

  17. UniProtKB

    • neuinfo.org
    • rrid.site
    • +2more
    Updated Oct 13, 2024
    Cite
    (2024). UniProtKB [Dataset]. http://identifiers.org/RRID:SCR_004426
    Explore at:
    Dataset updated
    Oct 13, 2024
    Description

    Central repository for collection of functional information on proteins, with accurate and consistent annotation. In addition to capturing core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and experimental and computational data. The UniProt Knowledgebase consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. UniProtKB/Swiss-Prot (reviewed) is a high quality manually annotated and non-redundant protein sequence database which brings together experimental results, computed features, and scientific conclusions. UniProtKB/TrEMBL (unreviewed) contains protein sequences associated with computationally generated annotation and large-scale functional characterization that await full manual annotation. Users may browse by taxonomy, keyword, gene ontology, enzyme class or pathway.

  18. Human Mitochondrial Protein Database

    • neuinfo.org
    Updated Jan 29, 2022
    Cite
    (2022). Human Mitochondrial Protein Database [Dataset]. http://identifiers.org/RRID:SCR_002913
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Database of mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. The mitochondrion plays a central role in cellular metabolism, and evidence of mitochondrial involvement in a number of different human diseases is increasing. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases. Mitochondrial DNA Sequence: A graphical tool was developed to visualize the human mitochondrial DNA sequences that highlight coding regions for RNAs and proteins. Disease susceptible mutations are also noted in the sequence. Mitochondrial DNA Polymorphism: Human mitochondrial sequences of different ethnic groups were obtained from the Human Mitochondrial Genome Database. A DNA sequence analysis tool was developed to compare polymorphisms of different human mitochondrial DNA sequences. This tool allows the user to select mitochondrial sequences from any two human populations and compare them for sequence variations. Mitochondrial proteins related diseases: Malfunction of mitochondrial proteins affects many cells from the brain, heart, liver, skeletal muscles, kidney, and the endocrine and the respiratory systems, which leads to many diseases. Relevant information for mitochondrial related diseases from OMIM, the Neuromuscular Disease Center and MITOMAP is gathered, and mitochondrion-associated diseases are grouped, categorized, and linked to OMIM. 3-D Structures of Mitochondrial proteins: The available 3D structures for mitochondrial proteins are presented through a custom-made interface.
A concise HTML page is generated for reporting the structural details and the associated information obtained from relevant web sites (PDBREPORT, Interatomic Contacts of Structural Units (CSU), PROCHECK, Ligand Protein Contacts (LPC), PROMOTIF and CastP). References are linked to the PubMed site. The 3-D structures are presented through the use of a Kinemage.

  19. UniProtKB/Swiss-Prot Protein Embeddings

    • kaggle.com
    zip
    Updated Apr 23, 2023
    Cite
    Dan Ofer (2023). UniProtKB/Swiss-Prot Protein Embeddings [Dataset]. https://www.kaggle.com/datasets/danofer/uniprotkbswiss-prot-protein-embeddings/data
    Explore at:
    Available download formats: zip (2087271680 bytes)
    Dataset updated
    Apr 23, 2023
    Authors
    Dan Ofer
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The description that follows is from the official UniProt embeddings page, which is also the original host of this dataset.

    Protein embeddings are a way to encode functional and structural properties of a protein, mostly from its sequence only, in a machine-friendly format (vector representation). Generating such embeddings is computationally expensive, but once computed they can be leveraged for different tasks, such as sequence similarity search, sequence clustering, and sequence classification.

    UniProt provided raw embeddings (mean pooled, per-protein using the ProtT5 model) for UniProtKB/Swiss-Prot.

    Note: Protein sequences longer than 12k residues are excluded due to limitation of GPU memory (this concerns only a handful of proteins).

    Sample code

    The embeddings.h5 files store the embeddings as key-value pairs. The key is the protein accession number and the value is the embeddings vector. The following code snippet shows how to read and iterate over an embeddings file in Python.

    import numpy as np
    import h5py

    # Open the embeddings file read-only; each key is a protein accession.
    with h5py.File("path/to/embeddings.h5", "r") as file:
      print(f"number of entries: {len(file)}")
      for sequence_id, embedding in file.items():
        # Each value is an HDF5 dataset; np.array() loads it into memory.
        print(
          f" id: {sequence_id}, "
          f" embeddings shape: {embedding.shape}, "
          f" embeddings mean: {np.array(embedding).mean()}"
        )
    

    Sample output (SARS-CoV-2 embeddings from release 2022_04) per-protein file:

    number of entries: 17
     id: A0A663DJA2, embeddings shape: (1024,), embeddings mean: 0.0006136894226074219
     id: P0DTC1, embeddings shape: (1024,), embeddings mean: 0.0011968612670898438
     id: P0DTC2, embeddings shape: (1024,), embeddings mean: 0.001041412353515625
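    As an illustration of the sequence similarity search use case mentioned above, the sketch below ranks proteins by cosine similarity between their per-protein vectors. It is a minimal example under stated assumptions: the embeddings are passed in as a plain dict of accession to list of floats (in practice they could be loaded from the .h5 file first), cosine similarity is one common choice of metric rather than something the UniProt page prescribes, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_id, embeddings, top_k=3):
    """Return up to top_k (accession, similarity) pairs, most similar first.

    embeddings -- dict mapping accession to its embedding vector
    """
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), acc)
        for acc, vec in embeddings.items()
        if acc != query_id  # skip the query itself
    ]
    scored.sort(reverse=True)
    return [(acc, score) for score, acc in scored[:top_k]]
```

    For the full Swiss-Prot file a vectorised approach (for example stacking the vectors into one NumPy matrix) would likely be much faster; the pure-Python version is kept here only for clarity.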

    SOURCE:
    https://www.uniprot.org/help/embeddings
    https://www.uniprot.org/help/downloads#embeddings
    Reviewed (Swiss-Prot) - per-protein: https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/embeddings/uniprot_sprot/per-protein.h5

  20. Number of human protein variations collected from the UniProt/Swiss-Prot...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Yongwook Choi; Gregory E. Sims; Sean Murphy; Jason R. Miller; Agnes P. Chan (2023). Number of human protein variations collected from the UniProt/Swiss-Prot database. [Dataset]. http://doi.org/10.1371/journal.pone.0046688.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yongwook Choi; Gregory E. Sims; Sean Murphy; Jason R. Miller; Agnes P. Chan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of human protein variations collected from the UniProt/Swiss-Prot database.
