Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We analyzed the proteomic data and DIDAR filtered/QC'ed proteomic data from a recent study of 270 single human cells divided between control and sotorasib treatments. The data included here are the results processed with Proteome Discoverer 2.4 using the same search parameters. These data support Figure 2 of Jenkins and Orsburn 2022.
The project contains raw and result files from a comparative proteomic analysis of malignant [primary breast tumor (PT) and axillary metastatic lymph nodes (LN)] and non-tumor [contralateral (NCT) and adjacent breast (ANT)] tissues of patients diagnosed with invasive ductal carcinoma. Label-free mass spectrometry was conducted using nano-liquid chromatography coupled to electrospray ionization–mass spectrometry (LC-ESI-MS/MS), followed by functional annotation to reveal differentially expressed proteins and their predicted impacts on pathways and cellular functions in breast cancer. A total of 462 proteins were observed as differentially expressed proteins (DEPs) among the groups of samples analyzed. Ingenuity Pathway Analysis (IPA) software version 2.3 (QIAGEN Inc.) was employed to identify the most relevant signaling and metabolic pathways, diseases, biological functions and interaction networks affected by the deregulated proteins. Upstream regulator and biomarker analyses were also performed with IPA’s tools. Altogether, our findings revealed differential proteomic profiles that affected the associated and interconnected cancer signaling processes.
The Proteome 2D-PAGE Database system for microbial research is a curated database for storing and investigating proteomics data. Software tools are available; for data submission, please contact the Database Curator. Established at the Max Planck Institute for Infection Biology, this system contains four interconnected databases: i) 2D-PAGE Database: two-dimensional electrophoresis (2-DE) and mass spectrometry of diverse microorganisms and other organisms. This database currently contains 4971 identified spots and 1228 mass peaklists in 44 reference maps representing experiments from 24 different organisms and strains. The data were submitted by 84 submitters from 24 institutes and 12 nations. It also contains software tools for formatting and analyzing gels and mass peaks, including: * TopSpot: scanning the gel, editing the spots, and saving the information * Fragmentation: fragmentation of the gel image into sections * MS-Screener: Perl script to compare the similarity of MALDI-PMF peaklists * MS-Screener update: MS-Screener can be used to compare mass spectra (MALDI-MS(/MS) as well as ESI-MS/MS spectra) on the basis of their peak lists (.dta, .pkm, .pkt, or .txt files), to recalibrate mass spectra, to determine and eliminate exogenous contaminant peaks, and to create matrices for cluster analyses * GelCali: online calibration of the Mr- and pI-axis of 2-DE gels with mathematical regression methods. ii) Isotope Coded Affinity Tag (ICAT)-LC/MS database: ICAT-LC/MS data for Mycobacterium tuberculosis strain BCG versus H37Rv. iii) FUNC_CLASS database: functional classification of diverse microorganisms. This database also integrates genomic, proteomic, and metabolic data. iv) DIFF database: presentation of differentially regulated proteins obtained by comparative proteomic experiments using computerized gel image analysis.
The dataset includes the following figures and tables: 1) Changes in protein expression of the 14 pathway regulators induced by Ni (II). 2) Hierarchical clustering of 12 differentially expressed or phosphorylated proteins in BEAS-2B cells treated with Ni (II). 3) Relative cell survival (X-axis) vs. protein expression or phosphorylation levels (Y-axis) in BEAS-2B control cells treated with Ni (II) at 4 different concentrations. 4) Four representative proteins (PDIA1, ACADM, RUVBL1, PRDX2) identified using 2-DE profiling that were either increased or decreased in a concentration-responsive manner. 5) Networks of proteins showing inter-relationships and pathways, obtained using IPA. 6) Schematic representation of the interplay of the core proteins and cytotoxicity pathways mediated by Ni (II). 7) Supplementary data. This dataset is associated with the following publication: Ge, Y., M. Bruno, N. Coates, K. Wallace, D. Andrews, A. Swank, W. Winnik, and J. Ross. Proteomic Assessment of Biochemical Pathways That Are Critical to Nickel-Induced Toxicity Responses in Human Epithelial Cells. PLoS ONE. Public Library of Science, San Francisco, CA, USA, 11(9): 1-20, (2016).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Proteomics data
OPD is a public database for storing and disseminating mass spectrometry based proteomics data. It covers Escherichia coli, Homo sapiens, Saccharomyces cerevisiae, Mycobacterium smegmatis, and Mus musculus. The database currently contains roughly 3,000,000 spectra representing experiments from these 5 different organisms. The mirror url is provided below as the OPD website is no longer functional (http://bioinformatics.icmb.utexas.edu/OPD/).
MGVB is a collection of tools for proteomics data analysis. It covers data processing from in silico digestion of protein sequences to comprehensive identification of post-translational modifications and solving the protein inference problem. The toolset is developed with efficiency in mind. It enables analysis at a fraction of the resource cost typically required by existing commercial and free tools. MGVB, as a native application, is much faster than existing proteomics tools such as MaxQuant and MSFragger and, at the same time, finds a very similar, and in some cases even larger, number of peptides at a chosen level of statistical significance. It implements a probabilistic scoring function to match spectra to sequences, a novel combinatorial search strategy for finding post-translational modifications, and a Bayesian approach to locate modification sites. This report describes the algorithms behind the tools, presents an analysis of benchmarking data sets comparing MGVB performance to MaxQuant/Andromeda, and provides step-by-step instructions for using it in typical analytical scenarios. The toolset is free to download and use for academic research and in software projects, but is not open source at present. The author intends to make it open source in the near future, following rigorous evaluations and feedback from the proteomics research community.
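As an illustration of what "in silico digestion of protein sequences" means in general, the following is a minimal sketch, not MGVB's implementation. It assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P); the function names and the example sequence are hypothetical.

```python
def cleavage_fragments(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing fragment that does not end in K or R (protein C-terminus)
        fragments.append(sequence[start:])
    return fragments


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return peptides allowing up to `missed_cleavages` skipped cleavage sites."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only to exercise the K/R-before-P rule
    print(tryptic_digest("AKRPGKM", missed_cleavages=1))
```

Real search engines typically add further constraints (peptide mass range, other enzymes, N-terminal methionine removal) that this sketch leaves out.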
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this study, discovery proteomics data were generated from five tissues (salivary gland, crop, digestive gland, style sac, and intestine) of the golden apple snail, Pomacea canaliculata. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) was applied to study the proteome of the five tissues. Proteins with homology to proteases were selected for relative quantitation. This collection includes IDA-MS data, ProteinPilot Reports from the IDA data, ProteinPilot reports filter, LC-MRM-MS data, and R scripts used for the analysis. Lineage: The data were acquired using a liquid chromatograph-mass spectrometer (Eksigent nanoLC 415 - SCIEX 6600 QqTOF). Studies of relative quantitation were performed on a Shimadzu Nexera UHPLC and analysed with a 6500 QTRAP mass spectrometer (SCIEX).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file lists the samples uploaded in PRIDE. The table “Table Sorted PP and Replicates” in the Excel file has all the relevant annotation.
There are more than the expected 168 samples in the PRIDE upload for the following reasons:
First, all of the measurements from the experiment had been uploaded, including files for measurements that were repeated because of problems during the MS run. These samples are not annotated in the table. Second, we had included 4 Gold Standard samples (2 replicates on each of the two large gels used to process all samples). These 4 gold standard samples in 7 fractions explain 28 extra samples. Third, we did not have 168 but 166 samples in the photoperiod set. Fractions 1 and 2 of sample 43 (Photoperiod 2, bio replicate 1, tech. replicate 2) were lost during sample preparation. While the remaining fractions were measured and are included in the PRIDE upload and the table, this sample was not used in the data analysis. Photoperiod 2 bio rep. 1 was only used with one technical replicate in the calculations.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Isobaric labeling-based proteomics is widely applied in deep proteome quantification. Among the platforms for isobaric labeled proteomic data analysis, the commercial software Proteome Discoverer (PD) is widely used, incorporating the search engine CHIMERYS, while FragPipe (FP) is relatively new, free for noncommercial purposes, and integrates the engine MSFragger. Here, we compared PD and FP over three public proteomic data sets labeled using 6plex, 10plex, and 16plex tandem mass tags. Our results showed that the protein abundances generated by the two software packages are highly correlated. PD quantified more proteins (10.02%, 15.44%, 8.19%) than FP with comparable NA ratios (0.00% vs. 0.00%, 0.85% vs. 0.38%, and 11.74% vs. 10.52%) in the three data sets. Using the 16plex data set, PD and FP outputs showed high consistency in quantifying technical replicates, batch effects, and functional enrichment in differentially expressed proteins. However, FP saved 93.93%, 96.65%, and 96.41% of processing time compared to PD for analyzing the three data sets, respectively. In conclusion, while PD is a well-maintained commercial package that integrates various additional functions and can quantify more proteins, FP is freely available and achieves similar output with a shorter computational time. Our results will guide users in choosing the most suitable quantification software for their needs.
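The percentages quoted above are relative differences. The sketch below shows one plausible way such figures could be computed; the choice of denominators (PD runtime for time saved, FP protein count for "more proteins") is an assumption, since the description does not state it, and the input numbers are hypothetical rather than taken from the study.

```python
def percent_time_saved(reference_time, new_time):
    """Percentage of the reference runtime saved by the faster tool."""
    if reference_time <= 0:
        raise ValueError("reference_time must be positive")
    return 100.0 * (reference_time - new_time) / reference_time


def percent_more(count_a, count_b):
    """How many more items tool A reports than tool B, relative to B (assumed baseline)."""
    if count_b <= 0:
        raise ValueError("count_b must be positive")
    return 100.0 * (count_a - count_b) / count_b


if __name__ == "__main__":
    # Hypothetical runtimes in minutes and hypothetical protein counts
    print(percent_time_saved(reference_time=200, new_time=10))
    print(percent_more(count_a=1100, count_b=1000))
```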
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome. Proteomic analysis for each CPTAC study is carried out independently by Proteomic Characterization Centers (PCCs) using a variety of protein fractionation techniques, instrumentation, and workflows. Mass spectrometry and related data files are organized into datasets by study, sub-proteome, and analysis site.
https://www.law.cornell.edu/uscode/text/17/106
Bottom-up proteomics (BUP) is a powerful analytical technique that involves digesting complex protein mixtures into peptides and analyzing them with liquid chromatography and tandem mass spectrometry to identify and quantify many proteins simultaneously. This produces massive multidimensional datasets that require informatics tools to analyze. The landscape of software tools for BUP analysis is vast and complex, and custom programs and scripts are often required to answer the biological questions of interest in any given experiment.
This dissertation introduces novel methods and tools for analyzing BUP experiments and applies those methods to new samples. First, PrIntMap-R, a custom application for intraprotein intensity mapping, is developed and validated. This application is the first open-source tool to allow for statistical comparisons of peptides within a protein sequence along with quantitative sequence coverage visualization. Next, innovative sample preparation techniques and informatics methods are applied to characterize MUC16, a key ovarian cancer biomarker. This includes the proteomic validation of a novel model of MUC16 differing from the dominant isoform reported in the literature. Shifting to bacterial studies, custom differential expression workflows are employed to investigate the role of virulence lipids in mycobacterial protein secretion by analyzing mutant strains of mycobacteria. This work links lipid presence and virulence factor secretion for the first time. Building on these efforts, OnePotN??TA, a labeling technique enabling quantification of N-terminal acetylation in mycobacterial samples, is introduced. This method is the first technique to simultaneously quantify protein and N-terminal acetylation abundance using bottom-up proteomics, advancing the field of post-translational modification quantification. This project resulted in the identification of 37 new putative substrates for an N-acetyltransferase, three of which have since been validated biochemically. These tools and methodologies are further applied to various biological research areas, including breast cancer drug characterization and insect saliva analysis, to perform the first proteomic studies of their kind with these respective treatments and samples. Additionally, a project focused on teaching programming skills relevant to analytical chemistry is presented.
Collectively, this work enhances the analytical capabilities of bottom-up proteomics, providing novel tools and methodologies that advance protein characterization, post-translational modification analysis, and biological discovery across diverse research areas.
This project contains raw data, intermediate files and results used to create the integrated map of protein expression in human cancer (including data from cell lines and tumours). The map is based on a joint reanalysis of 11 large-scale quantitative proteomics studies. The datasets were primarily retrieved from the PRIDE database, as well as the MassIVE database and the CPTAC data portal. The raw files were manually curated in order to capture mass spectrometry acquisition parameters, experimental design and sample characteristics. The raw files were jointly processed with the MaxQuant computational platform using standard settings (see Data Processing Protocol). Due to the size of the data, the processing was done in two batches denoted as “celllines” and “tumours” analysis. In total, using 1% peptide spectrum match and protein false discovery rates, the analysis allowed identification of 21,580 protein groups in the cell lines dataset (MQ search results available in ‘txt-celllines’ folder), and 13,441 protein groups in the tumours dataset (MQ search results available in ‘txt-tumours’ folder).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Proteomics data processed by Sanger: The metadata is downloaded from https://cellmodelpassports.sanger.ac.uk/downloads, Proteomics_20221214.zip, proteomics_all_20221214.csv
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article contains consolidated proteomic data obtained from xylem sap collected from tomato plants grown in Fe- and Mn-sufficient control, as well as Fe-deficient and Mn-deficient conditions. Data presented here cover proteins identified and quantified by shotgun proteomics and Progenesis LC-MS analyses: proteins identified with at least two peptides and showing changes that are statistically significant (ANOVA; p ≤ 0.05) and above a selected, biologically relevant threshold (fold ≥ 2) between treatments are listed. The comparison between Fe-deficient, Mn-deficient and control xylem sap samples using a multivariate statistical data analysis (Principal Component Analysis, PCA) is also included. Data included in this article are discussed in depth in "Effects of Fe and Mn deficiencies on the protein profiles of tomato (Solanum lycopersicum) xylem sap as revealed by shotgun analyses", Ceballos-Laita et al., J. Proteomics, 2018. This dataset is made available to support the cited study as well as to extend analyses at a later stage. Resources in this dataset: Resource Title: ProteomeExchange submission PXD007517. Xylem sap shotgun proteomics from Fe- and Mn-deficient and Mn-toxic tomato plants. File Name: Web Page, url: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD007517 The MS proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD007517. Also includes FTP location. Files available at https://www.ebi.ac.uk/pride/archive/projects/PXD007517 via HTML, FTP, or Fast (Aspera) download: 1 SEARCH.xml file, 1 Peak file, 24 RAW files, 1 Mascot information.xlsx file. Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2018.01.034
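The selection criteria stated above (at least two peptides, ANOVA p ≤ 0.05, fold ≥ 2) can be sketched as a simple filter. This is an illustrative sketch only: the record field names and the example values are hypothetical, and treating decreases symmetrically (a ratio of 0.5 counted as a 2-fold change) is an assumption not stated in the description.

```python
def passes_criteria(protein, min_peptides=2, max_p=0.05, min_fold=2.0):
    """Apply the three thresholds to one protein record (a dict)."""
    fold = protein["fold_change"]
    if fold <= 0:
        raise ValueError("fold_change must be a positive ratio")
    # Assumption: a decrease to 0.5x is treated as a 2-fold change
    magnitude = fold if fold >= 1 else 1.0 / fold
    return (
        protein["n_peptides"] >= min_peptides
        and protein["anova_p"] <= max_p
        and magnitude >= min_fold
    )


if __name__ == "__main__":
    # Hypothetical records, not taken from PXD007517
    proteins = [
        {"id": "protA", "n_peptides": 3, "anova_p": 0.01, "fold_change": 2.5},
        {"id": "protB", "n_peptides": 1, "anova_p": 0.01, "fold_change": 4.0},
        {"id": "protC", "n_peptides": 5, "anova_p": 0.04, "fold_change": 0.5},
        {"id": "protD", "n_peptides": 4, "anova_p": 0.20, "fold_change": 3.0},
    ]
    print([p["id"] for p in proteins if passes_criteria(p)])
```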
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The mass spectrometry proteomics data
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present an update of the MaxQuant software for isobaric labeling data and evaluate its performance on benchmark datasets. Impurity correction factors can be applied to labels mixing C- and N-type reporter ions, such as TMT Pro. Application to a single-cell species mixture benchmark shows high accuracy of the impurity-corrected results. TMT data recorded with FAIMS separation can be analyzed directly in MaxQuant without splitting the raw data into separate files per FAIMS voltage. Weighted median normalization is applied to several datasets, including large-scale human body atlas data. In the benchmark datasets, the weighted median normalization either removes or strongly reduces the batch effects between different TMT plexes and results in clustering by biology. In datasets including a reference channel, we find that weighted median normalization performs as well or better when the reference channel is ignored and only the sample channel intensities are used, suggesting that the measurement of a reference channel is unnecessary when using weighted median normalization in MaxQuant. We demonstrate that MaxQuant including the weighted median normalization performs well on multi-notch MS3 data, as well as on phosphorylation data.
Data Summary: Each folder contains MaxQuant output tables used for data analysis with their respective mqpar files. Please use the MaxQuant version specified in each dataset to open mqpar files. Perseus sessions are provided when Perseus was used for downstream analyses. Please use Perseus version 2.1.2 to load the sessions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Files for users of the workflow "A Bioconductor workflow for processing, evaluating and interpreting expression proteomics data". Files include Proteome Discoverer (v2.5) processing and consensus workflows for both TMT and LFQ expression proteomics data. Also provided are the output .txt files of a corresponding Proteome Discoverer identification search, as required for users to follow the workflow themselves. For raw data please refer to PRIDE. Appendix is provided as a PDF.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Label-free quantitative methods are advantageous in bottom-up (shotgun) proteomics because they are robust and can easily be applied to different workflows without additional cost. Both label-based and label-free approaches are routinely applied to discovery-based proteomics experiments and are widely accepted as semiquantitative. Label-free quantitation approaches are segregated into two distinct approaches: peak-abundance-based approaches and spectral counting (SpC). Peak abundance approaches like MaxLFQ, which is integrated into the MaxQuant environment, require precursor peak alignment that is computationally intensive and cannot be routinely applied to low-resolution data. Not limited by these constraints, SpC approaches simply use the number of peptide identifications corresponding to a given protein as a measurement of protein abundance. We show here that spectral counts from multidimensional proteomic data sets have a mean-dispersion relationship that can be modeled in edgeR. Furthermore, by simulating spectral counts, we show that this approach can routinely be applied to large-scale discovery proteomics data sets to determine differential protein expression.
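To make the spectral counting idea concrete, here is a minimal sketch that tallies peptide identifications per protein and sample. It only illustrates the counting step; the mean-dispersion modeling in edgeR (an R package) described above is not reproduced here. The tuple layout of the input and all example values are hypothetical.

```python
from collections import Counter


def spectral_counts(psms):
    """Count peptide-spectrum matches per (sample, protein) pair.

    `psms` is an iterable of (sample, peptide, protein) tuples (hypothetical layout).
    """
    counts = Counter()
    for sample, _peptide, protein in psms:
        counts[(sample, protein)] += 1
    return counts


def count_table(counts):
    """Arrange counts as {protein: [count per sample]} with samples in sorted order."""
    samples = sorted({sample for sample, _ in counts})
    proteins = sorted({protein for _, protein in counts})
    table = {p: [counts.get((s, p), 0) for s in samples] for p in proteins}
    return table, samples


if __name__ == "__main__":
    # Hypothetical identifications
    psms = [
        ("ctrl", "PEPTIDEK", "P1"),
        ("ctrl", "PEPTIDEK", "P1"),
        ("ctrl", "ANOTHERR", "P2"),
        ("treated", "PEPTIDEK", "P1"),
    ]
    table, samples = count_table(spectral_counts(psms))
    print(samples)
    print(table)
```

A protein-by-sample count table of this shape is the kind of input that count-based differential expression tools generally expect.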