Database containing several body fluid proteomes, including plasma, urine, and cerebrospinal fluid. Cell lines have been mapped to a depth of several thousand proteins and the red blood cell proteome has also been analyzed in depth. The liver proteome is represented with 3200 proteins. By employing high resolution MS and stringent validation criteria, false positive identification rates in MAPU are lower than 1:1000. Thus MAPU datasets can serve as reference proteomes in biomarker discovery. MAPU contains the peptides identifying each protein, together with measured masses, scores and intensities, and is browsed through a clickable interface of cell or body parts. Proteome data can be queried across proteomes by protein name, accession number, sequence similarity, peptide sequence and annotation information. More than 4500 mouse and 2500 human proteins have already been identified in at least one proteome. Basic annotation information and links to other public databases are provided in MAPU and we plan to add further analysis tools.
The Proteome 2D-PAGE Database system for microbial research is a curated database for storing and investigating proteomics data. Software tools are available; for data submission, please contact the Database Curator. Established at the Max Planck Institute for Infection Biology, this system contains four interconnected databases:
i.) 2D-PAGE Database: Two-dimensional electrophoresis (2-DE) and mass spectrometry of diverse microorganisms and other organisms. This database currently contains 4971 identified spots and 1228 mass peaklists in 44 reference maps representing experiments from 24 different organisms and strains. The data were submitted by 84 submitters from 24 institutes and 12 nations. It also contains various software tools for formatting and analyzing gels and mass peaks, including:
*TopSpot: Scanning the gel, editing the spots and saving the information
*Fragmentation: Fragmentation of the gel image into sections
*MS-Screener: Perl script to compare the similarity of MALDI-PMF peaklists
*MS-Screener update: MS-Screener can be used to compare mass spectra (MALDI-MS(/MS) as well as ESI-MS/MS spectra) on the basis of their peak lists (.dta, .pkm, .pkt, or .txt files), to recalibrate mass spectra, to determine and eliminate exogenous contaminant peaks, and to create matrices for cluster analyses
*GelCali: Online calibration of the Mr- and pI-axis of 2-DE gels with mathematical regression methods
ii.) Isotope Coded Affinity Tag (ICAT)-LC/MS database: ICAT-LC/MS data for Mycobacterium tuberculosis strain BCG versus H37Rv.
iii.) FUNC_CLASS database: Functional classification of diverse microorganisms. This database also integrates genomic, proteomic, and metabolic data.
iv.) DIFF database: Presentation of differently regulated proteins obtained by comparative proteomic experiments using computerized gel image analysis.
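To make the idea of comparing peak lists concrete, the following is a minimal sketch in Python. It is not MS-Screener's algorithm (which is a Perl script whose scoring is not described here); the function name `shared_peak_fraction`, the 0.2 Da tolerance, and the Jaccard-style score are all assumptions chosen for illustration.

```python
def shared_peak_fraction(peaks_a, peaks_b, tolerance=0.2):
    """Return matched / (total distinct peaks) for two lists of m/z values.

    Hypothetical similarity measure: two peaks match when their m/z values
    differ by at most `tolerance` (in Da). Each peak is matched at most once.
    """
    a, b = sorted(peaks_a), sorted(peaks_b)
    i = j = matched = 0
    # Walk both sorted lists in parallel, pairing peaks greedily.
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # peak in a is too small to match anything left in b
        else:
            j += 1  # peak in b is too small to match anything left in a
    total = len(a) + len(b) - matched
    return matched / total if total else 0.0


# Toy peak lists (made-up m/z values, not real spectra).
spectrum_1 = [842.51, 1045.56, 2211.10]
spectrum_2 = [842.50, 1045.60, 1500.00]
score = shared_peak_fraction(spectrum_1, spectrum_2)
```

A pairwise matrix of such scores over many peak lists is the kind of input a cluster analysis would take; a real tool would also need to handle intensities and recalibration.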
Database of lipid-related proteins representing human and mouse proteins involved in lipid metabolism. The collection of lipid-related genes and proteins contains data from Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, Caenorhabditis elegans, Escherichia coli, Macaca mulatta, Drosophila melanogaster, Arabidopsis thaliana and Danio rerio.
The Rice Proteome Database contains information on proteins identified from several organs and organelles on two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) reference maps.
A Plant Proteome DataBase (PPDB) for Arabidopsis thaliana and maize (Zea mays). The PPDB stores experimental data from in-house proteome and mass spectrometry analysis, together with curated information about protein function, protein properties and subcellular localization. Importantly, proteins are particularly curated for possible (intra) plastid location and their plastid function. Protein accessions identified in published Arabidopsis (and other Brassicaceae) proteomics papers are cross-referenced to rapidly determine previous experimental identification by mass spectrometry. All protein-encoding gene models in the Arabidopsis nuclear and organellar genomes, as assembled by TAIR, as well as all maize EST assemblies (ZmGI), as assembled by the DFCI Maize Gene Index project, are uploaded in PPDB and linked to each other via a BLAST alignment. Thus every predicted protein in both species can be searched for experimental and other information (even if not experimentally identified).
The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. 
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/.
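The two-step strategy described above (search against a simpler database, then check peptide uniqueness against a more complex one) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy sequences, and the use of exact substring matching are all assumptions.

```python
def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}, keyed by the first header token."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def proteins_containing(peptide, database):
    """Return sorted identifiers of all sequences containing the peptide exactly."""
    return sorted(ident for ident, seq in database.items() if peptide in seq)


def uniqueness_report(peptides, small_db, large_db):
    """For each peptide, list the proteins it maps to in each database."""
    return {
        pep: {
            "small": proteins_containing(pep, small_db),
            "large": proteins_containing(pep, large_db),
        }
        for pep in peptides
    }


# Toy databases (made-up sequences standing in for a small and a large tier).
TIER_SMALL = """>P1 primary isoform
MKTAYIAKQRQISFVK
>P2 primary isoform
MGSSHHHHHHSSGLVPR
"""
TIER_LARGE = TIER_SMALL + """>P1-2 alternative isoform
MKTAYIAKQRAAAFVK
"""

report = uniqueness_report(
    ["TAYIAK", "QISFVK"], read_fasta(TIER_SMALL), read_fasta(TIER_LARGE)
)
```

In this toy case the peptide TAYIAK looks unique in the small database but maps to two entries in the large one, which is the situation the uniqueness check is meant to reveal. A production workflow would usually also treat isoleucine and leucine as indistinguishable, which this sketch does not.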
The NOPdb archives data on more than 700 proteins that were identified by multiple mass spectrometry (MS) analyses from highly purified preparations of human nucleoli, the most prominent nuclear organelle. Each protein entry is annotated with information about its corresponding gene, its domain structures and relevant protein homologues across species, and documents its MS identification history, including all the peptides sequenced by tandem MS (MS/MS). Moreover, data showing the quantitative changes in the relative levels of 500 nucleolar proteins are compared at different timepoints upon transcriptional inhibition. Correlating changes in protein abundance at multiple timepoints, highlighted by visualization means in the NOPdb, provides clues regarding the potential interactions and relationships between nucleolar proteins and thereby suggests putative functions for factors within the 30% of the proteome that comprises novel or uncharacterized proteins. The NOPdb is searchable by gene names, protein sequences, Gene Ontology terms or motifs, or by limiting the range for isoelectric points and/or molecular weights, and it links to other databases (e.g. LocusLink, OMIM and PubMed).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The missing human proteome comprises predicted protein-coding genes with no credible protein-level evidence detected so far and constitutes ∼18% of the human protein-coding genes (neXtProt release 19/9/2014). The missing proteins may be of pharmacological interest, as many of these are membrane receptors, and thus require comprehensive characterization. In the present study, we explored various computational parameters, crucial during protein searches from tandem mass spectrometry (MS) data, for their impact on missing protein identification. Variables taken into consideration are differences in search database composition, shared peptides, semitryptic searches, post-translational modifications (PTMs), and transcriptome-guided proteogenomic searches. We used a multialgorithmic approach for protein detection from publicly available mass spectra from recent studies covering diverse human tissues and cell types. Using the aforementioned approaches, we successfully detected 24 missing proteins (22-PE2, 1-PE4, and 1-PE5). Most of these identifications could be attributed to differences in reference proteome databases, which argues for the use of a single standard database for human protein detection from MS data. Our results suggest that search strategies with modified parameters can be rewarding alternatives for extensive profiling of missing proteins. We conclude that using complementary spectral data searches incorporating different parameters like PTMs, against a comprehensive and compact search database, might lead to discoveries of the proteins attributed so far as the missing human proteome.
Programmed cell death is a ubiquitous process of utmost importance for the development and maintenance of multicellular organisms. More than 10 different forms of programmed cell death have been discovered. Several proteomics analyses have been performed to gain insight into the proteins involved in the different forms of programmed cell death. To consolidate these studies, we have developed the cell death proteomics (CDP) database, which comprises data from apoptosis, autophagy, cytotoxic granule-mediated cell death, excitotoxicity, mitotic catastrophe, paraptosis, pyroptosis, and Wallerian degeneration. The CDP database is available as a web-based database to compare protein identifications and quantitative information across different experimental setups. The proteomics data of 73 publications were integrated and unified with protein annotations from UniProt-KB and gene ontology (GO). Currently, more than 6,500 records of more than 3,700 proteins are included in the CDP. Comparing apoptosis and autophagy using overrepresentation analysis of GO terms, the majority of enriched processes were found in both, but some clear differences were also observed. Furthermore, the analysis revealed differences and similarities between the autophagosomal proteome and the overall autophagy proteome. The CDP database represents a useful tool to consolidate data from proteome analyses of programmed cell death and is available at http://celldeathproteomics.uio.no.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A comprehensive study of the molecularly active landscape of human cells can be undertaken by integrating two different but complementary perspectives: transcriptomics and proteomics. After the genome era, proteomics has emerged as a powerful tool to simultaneously identify and characterize the compendium of thousands of different proteins active in a cell. Thus, the Chromosome-centric Human Proteome Project (C-HPP) is promoting a full characterization of the human proteome, combining high-throughput proteomics with the data derived from genome-wide expression profiling of protein-coding genes. Here we present a full proteomic profiling of a human lymphoma B-cell line (Ramos) performed using a nanoUPLC-LTQ-Orbitrap Velos proteomic platform, combined with an in-depth transcriptomic profiling of the same cell type. Data are available via ProteomeXchange with identifier PXD001933. Integration of the proteomic and transcriptomic data sets revealed a 94% overlap in the proteins identified by both -omics approaches. Moreover, functional enrichment analysis of the proteomic profiles showed an enrichment of several functions directly related to the biological and morphological characteristics of B-cells. In turn, about 30% of all protein-coding genes present in the whole human genome were identified as being expressed by the Ramos cells (a stable average of 30% of genes across all the chromosomes), revealing the size of the protein expression set present in one specific human cell type. Additionally, the identification of missing proteins in our data sets has been reported, highlighting the power of the approach. Also, a comparison between neXtProt and UniProt database searches has been performed. In summary, our transcriptomic and proteomic experimental profiling provided a high-coverage report of the expressed proteome from a human lymphoma B-cell type with a clear insight into the biological processes that characterize these cells. 
In this way, we demonstrated the usefulness of combining -omics for a comprehensive characterization of specific biological systems.
THIS RESOURCE IS NO LONGER IN SERVICE, documented July 22, 2016. A database on the proteome of rice that contains reference maps based on two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) of proteins from rice tissues and subcellular compartments.
The Global Proteome Machine Database was constructed to utilize the information obtained by GPM servers to aid in the difficult process of validating peptide MS/MS spectra as well as protein coverage patterns.
A FASTA-formatted database of 36,866,870 predicted proteins representing 4,351 unique species from 117 phyla (see table below). The database was constructed using the UniProt Reference Proteome (RP) at the 35% co-membership threshold, including 4,295 Representative Proteome Groups (RPGs) (Chen et al. 2011), in addition to all taxonomically identifiable transcriptomes of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) (Keeling et al. 2014) that were processed through WinstonCleaner (https://github.com/kolecko007/WinstonCleaner). The database also included proteins inferred from the annotated and assembled genomes of Aurantiochytrium limacinum ATCC MYA-1381, Schizochytrium aggregatum ATCC 28209, and Aplanochytrium kerguelensis PBS07 from the U.S. Department of Energy’s Joint Genome Institute (JGI), all PFAM PF00494 Aurantiochytrium sp. KH105 proteome hits from the Okinawa Institute of Science and Technology Marine Genomics Unit genome browser, all of UniProt's annotated Hondaea fermentalgiana pr...
Shotgun and positional proteomics study of a mouse embryonic stem cell line. We devised a proteogenomic approach constructing a custom protein sequence search space, built from both SwissProt and RIBO-seq derived translation products, applicable for LC-MS/MS spectrum identification. To record the impact of using the constructed deep proteome database we performed two alternative MS-based proteomic strategies: (I) a regular shotgun proteomic and (II) an N-terminal COFRADIC approach. The obtained fragmentation spectra were searched against the custom database (combination of UniProtKB-SwissProt and RIBO-seq derived translation sequences) using three different search engines: OMSSA (version 2.1.9), X!Tandem (TORNADO, version 2010.01.01.04) and Mascot (version 2.3). The first two were run from the SearchGUI graphical user interface (version 1.10.4). A combination of X!Tandem and Mascot was used for the N-terminal COFRADIC analysis, and a combination of all three search engines for the shotgun proteome analysis. Note that OMSSA cannot cope with the protease setting semi-ArgC/P needed to analyze N-terminal COFRADIC data. For the shotgun proteome data, trypsin was set as cleavage enzyme allowing for one missed cleavage; singly to triply charged precursors were taken into account for Mascot and singly to quadruply charged precursors for X!Tandem/OMSSA; and the precursor and fragment mass tolerances were set to 10 ppm and 0.5 Da, respectively. Methionine oxidation to methionine-sulfoxide, pyroglutamate formation of N-terminal glutamine and acetylation (protein N-terminus) were set as variable modifications. For the N-terminal COFRADIC analysis the protease setting semi-ArgC/P (Arg-C specificity with arginine-proline cleavage allowed) was used. No missed cleavages were allowed and the precursor and fragment mass tolerances were also set to 10 ppm and 0.5 Da, respectively. 
Carbamidomethylation of cysteine, methionine oxidation to methionine-sulfoxide and 13C3D2-acetylation of lysines were set as fixed modifications. Peptide N-terminal acetylation or 13C3D2-acetylation and pyroglutamate formation of N-terminal glutamine were set as variable modifications, and the instrument setting was ESI-TRAP. Protein and peptide identification, in addition to data interpretation, was done using the PeptideShaker algorithm (http://code.google.com/p/peptide-shaker, version 0.18.3), setting the false discovery rate to 1% at all levels (protein, peptide, and peptide to spectrum matching). The aforementioned tools and algorithms (SearchGUI, X!Tandem, OMSSA, and PeptideShaker) are freely available as open source.
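Two of the shotgun search settings described above, tryptic cleavage with one missed cleavage and a 10 ppm precursor tolerance, can be illustrated with a short Python sketch. This is not how OMSSA, X!Tandem or Mascot implement them; the function names are hypothetical, and the rule that trypsin does not cleave before proline is a common convention assumed here rather than something the description states.

```python
def tryptic_digest(sequence, missed_cleavages=1):
    """Return peptides from cleaving after K or R (assumed: not before P),
    allowing up to `missed_cleavages` uncleaved sites per peptide."""
    # Collect cleavage boundaries as indices into the sequence.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        # Extend each peptide across 0..missed_cleavages internal sites.
        last = min(start + 2 + missed_cleavages, len(sites))
        for end in range(start + 1, last):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


def within_ppm(observed_mz, theoretical_mz, tolerance_ppm=10.0):
    """True if the relative mass error is within the tolerance in parts per million."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tolerance_ppm


# Toy sequence (made up): R followed by P is not cleaved under the assumed rule.
peptides = tryptic_digest("AKRPGKLR", missed_cleavages=1)
```

For a precursor at m/z 1000, a 10 ppm window corresponds to ±0.01, so `within_ppm(1000.005, 1000.0)` holds while `within_ppm(1000.02, 1000.0)` does not. The semi-ArgC/P setting used for the COFRADIC data would need a different cleavage rule and is not covered by this sketch.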
The Global Proteome Machine Organization was set up so that scientists involved in proteomics using tandem mass spectrometry could use that data to analyze proteomes. The projects supported by the GPMO have been selected to improve the quality of analysis, make the results portable and to provide a common platform for testing and validating proteomics results. The Global Proteome Machine Database was constructed to utilize the information obtained by GPM servers to aid in the difficult process of validating peptide MS/MS spectra as well as protein coverage patterns. This database has been integrated into GPM server pages, allowing users to quickly compare their experimental results with the best results that have been previously observed by other scientists.
A database of mitochondrial proteomics data. It includes two sets of proteins: the MitoMiner Reference Set, which has 10477 proteins from 12 species; and MitoCarta, which has 2909 mouse and human mitochondrial proteins. MitoMiner provides annotation from the Gene Ontology (GO) and UniProt databases. The reference set contains all proteins that are annotated by either of these resources as mitochondrial in any of the species included in MitoMiner. MitoMiner data are available via an Application Programming Interface (API). Client libraries are provided in Perl, Python, Ruby and Java.
Database for the identification of the human proteome and its use across the scientific community. Users can browse proteins and chromosomes and contribute to the data repository.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This work presents the proteome database of Serratia sp. FGI94 by parsing its annotated proteome downloaded from UniProt. Comprising protein name, amino acid sequence, number of residues, molecular weight, and nucleotide sequence of each protein in the proteome, the resource should be useful in gaining a deeper understanding of the metabolism of the organism.
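As an illustration of what parsing an annotated proteome from UniProt can look like, here is a minimal Python sketch that reads UniProt-style FASTA text and derives the protein name, sequence, number of residues and an approximate molecular weight. It is not the authors' code: the function names, the toy entry and the choice of average residue masses are assumptions, and the nucleotide sequence mentioned above is not covered because FASTA protein files do not contain it.

```python
# Average residue masses in Da (standard values); one water is added per chain.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def molecular_weight(sequence):
    """Approximate average molecular weight in Da; None if a residue is unknown."""
    if any(residue not in AVERAGE_RESIDUE_MASS for residue in sequence):
        return None
    return sum(AVERAGE_RESIDUE_MASS[r] for r in sequence) + WATER_MASS


def build_entry(header, sequence):
    """Turn a UniProt FASTA header ('db|ACCESSION|ENTRY Name OS=...') into a record."""
    identifier, _, description = header.partition(" ")
    parts = identifier.split("|")
    accession = parts[1] if len(parts) >= 3 else identifier
    return {
        "accession": accession,
        "name": description.split(" OS=")[0],
        "sequence": sequence,
        "length": len(sequence),
        "molecular_weight": molecular_weight(sequence),
    }


def parse_uniprot_fasta(text):
    """Parse FASTA text into a list of protein records."""
    entries, header, chunks = [], None, []
    for line in text.splitlines() + [">"]:  # trailing sentinel flushes the last record
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                entries.append(build_entry(header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    return entries


# Toy entry (made-up accession and sequence, not a real UniProt record).
EXAMPLE = """>tr|TOY001|TOY001_9GAMM Example protein OS=Serratia sp. OX=0 PE=4 SV=1
GA
"""
entries = parse_uniprot_fasta(EXAMPLE)
```

The resulting list of dictionaries can be written out as a table with one row per protein; sequences with ambiguous or non-standard residues get no molecular weight in this sketch rather than a guessed value.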
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pantoea agglomerans is a Gram-negative bacterium commonly found in soil, seeds, fruit, plant surfaces and in human and animal faeces. More interestingly, it could be found throughout a honeybee’s environment. This work presents the proteome database of P. agglomerans Eh318 by parsing the annotated proteome of the bacterium downloaded from UniProt. Comprising protein name, amino acid sequence, number of residues, molecular weight, and nucleotide sequence of each protein in the bacterium’s proteome, the resource should find use in a variety of biological workflows, particularly in functional genomics seeking to understand the mechanisms underlying specific cellular processes.
iProX is an integrated proteome resources center in China that aims to accelerate data sharing in proteomics. It is composed of a data submission system and a proteome database. The submission system is established under the guidance of the data-sharing policy of the ProteomeXchange consortium. Registered users can submit their proteomic datasets to iProX in public or private mode. Once the associated manuscript has been published, the dataset automatically becomes public.