Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The CETSA and Thermal Proteome Profiling (TPP) analytical methods are invaluable for the study of protein–ligand interactions and protein stability in a cellular context. These tools have increasingly been leveraged in work ranging from understanding signaling paradigms to drug discovery. Consequently, there is an important need to optimize the data analysis pipeline used to calculate protein melt temperatures (Tm) and relative melt shifts from proteomics abundance data. Here, we report a user-friendly analysis of the melt shift calculation workflow in which we describe the impact of each individual calculation step on the final output list of stabilized and destabilized proteins. This report also describes how key steps in the analysis workflow quantitatively impact the list of stabilized/destabilized proteins from an experiment. We applied our findings to develop an optimized analysis workflow that illustrates the dramatic sensitivity of the final list of reported proteins of interest to the chosen calculation steps, and we have made the R-based program Inflect available for research community use through the CRAN repository [McCracken, N. Inflect: Melt Curve Fitting and Melt Shift Analysis. R package version 1.0.3, 2021]. The Inflect outputs include melt curves for each protein that passes filtering criteria, in addition to a data matrix that is directly compatible with downstream packages such as UpSetR for replicate comparisons and identification of biologically relevant changes. Overall, this work provides an essential resource for scientists as they analyze data from TPP and CETSA experiments and implement their own analysis pipelines geared toward specific applications.
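As a concrete illustration of the quantity such a pipeline computes, the sketch below estimates a per-protein Tm as the temperature at half-maximal normalized abundance and takes the difference between conditions. This is a simplified stand-in, assuming a descending normalized melt curve; Inflect itself fits a sigmoid model rather than interpolating, and the temperatures and abundances here are hypothetical.

```python
def tm_half_max(temps, abund):
    """Temperature at which a descending, normalized melt curve crosses 0.5
    (linear interpolation) -- a simplified stand-in for a full sigmoid fit."""
    for i in range(len(temps) - 1):
        a1, a2 = abund[i], abund[i + 1]
        if a1 >= 0.5 > a2:  # curve descends through half-maximum in this interval
            return temps[i] + (a1 - 0.5) * (temps[i + 1] - temps[i]) / (a1 - a2)
    return None  # curve never crosses 0.5 -> protein fails filtering

# Hypothetical temperature gradient and normalized abundances for one protein
temps   = [37, 41, 45, 49, 53, 57, 61, 65]
vehicle = [1.00, 0.98, 0.90, 0.70, 0.40, 0.15, 0.05, 0.02]
treated = [1.00, 0.99, 0.95, 0.85, 0.60, 0.30, 0.10, 0.03]

tm_vehicle = tm_half_max(temps, vehicle)
tm_treated = tm_half_max(temps, treated)
melt_shift = tm_treated - tm_vehicle  # positive shift -> ligand-stabilized protein
```

Repeating this per protein across replicates yields the stabilized/destabilized lists whose sensitivity to calculation choices the study examines.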
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and scripts accompanying the paper "Standardised workflow for mass spectrometry-based single-cell proteomics data analysis using scp".
These file descriptions are also available in the README.txt file.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
UniSpec is a comprehensive deep learning (DL) spectrum predictor that can predict the intensities of the entire HCD MS/MS fragment ion series, going beyond existing tools limited to the b/y ion series.
All datasets developed for UniSpec model are shared on Zenodo as part of the UniSpec publication, "UniSpec: Deep Learning for Predicting Comprehensive Peptide Fragment Ion Series to Improve Peptide-Spectrum Matches from Shotgun Proteomics Experiments".
This includes UniSpec datasets, downstream evaluation and analysis, and application case studies.
1. Pre-processed training, evaluation and testing data for machine learning;
UniSpec-Datasets.7z, Readme_UniSpecDatasets.txt
2. Streamlined input datasets based on the fragmentation dictionary;
Streamlined_inputdatasets.7z, Readme_Streamlined_inputdatasets.txt
3. Predictions on the validation and test sets;
UniSpecPred_Validation-Test.7z, Readme_Predictons_ValidationTest.txt
4. Evaluation by comparison with Prosit;
a. Predictions: prosit_and_unispec_predictions.7z, Readme_prosit_and_unispec_predictions.txt
b. Cosine similarity scores: prosit_vs_unispec_CS.7z, Readme_prosit_vs_unispec_CS.txt
5. Cosine similarity (CS) scores for different HCD fragment ion series;
CS_for_ion_splits.tsv
6. Application 1: PSM rescoring;
PSM rescoring_zipfiles.7z, PSM rescoring_readme.txt
7. Application 2: In-silico spectral library search
in-silico_librarysearch.7z, in-silico_librarysearch_readme.txt
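The cosine similarity scores in items 4 and 5 above compare predicted and observed fragment-ion intensity vectors. A minimal sketch of that metric, with hypothetical intensity values; the publication's exact peak matching and intensity normalization may differ:

```python
import math

def cosine_similarity(pred, obs):
    """Cosine similarity between matched fragment-ion intensity vectors."""
    dot = sum(p * o for p, o in zip(pred, obs))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(o * o for o in obs))
    return dot / norm if norm else 0.0

# Hypothetical intensities for the same indexed fragment ions of one peptide
predicted = [0.10, 0.80, 0.30, 0.00, 0.55]
observed  = [0.12, 0.75, 0.25, 0.05, 0.60]
score = cosine_similarity(predicted, observed)  # near 1.0 for a good prediction
```

Because UniSpec predicts ion series beyond b/y, the same score can be split by ion type (as in CS_for_ion_splits.tsv) by restricting the vectors to one series at a time.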
Limited proteolysis coupled with mass spectrometry (LiP-MS) has emerged as a powerful technique for detecting protein structural changes and drug–protein interactions on a proteome-wide scale. However, there is no consensus on the best quantitative proteomics workflow for analyzing LiP-MS data. In this study, we comprehensively benchmarked two major quantification approaches—data-independent acquisition (DIA) and tandem mass tag (TMT) isobaric labeling—in combination with LiP-MS, using a drug-target deconvolution assay as a model system. Our results show that while TMT labeling enabled the quantification of more peptides and proteins with lower coefficients of variation (CVs), DIA-MS exhibited greater accuracy in identifying true drug targets and stronger dose–response correlations for target-protein peptides. Additionally, we evaluated the performance of freely available (FragPipe) versus commercial (Spectronaut) software tools for DIA-MS analysis, revealing that the choice between precision (FragPipe) and sensitivity (Spectronaut) largely depends on the specific experimental context. Our findings underscore the importance of selecting the appropriate LiP-MS quantification strategy based on the study objectives. This work provides valuable guidelines for researchers in structural proteomics and drug discovery, and highlights how advancements in mass spectrometry instrumentation, such as the Astral mass spectrometer, may further improve sensitivity and protein sequence coverage, potentially reducing the need for TMT labeling.
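The dose-response criterion mentioned above ranks peptides by how consistently their LiP signal tracks drug concentration. A minimal sketch using a Pearson correlation against log-dose; the dose series and intensities are hypothetical, and the study's actual scoring may differ:

```python
import math
from statistics import fmean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / var if var else 0.0

log_dose   = [0, 1, 2, 3, 4]                 # hypothetical log10 dose series
target_pep = [1.00, 1.15, 1.40, 1.70, 1.95]  # protected peptide: tracks dose
background = [1.00, 1.03, 0.96, 1.02, 0.99]  # unaffected peptide
r_target = pearson(log_dose, target_pep)     # high r -> candidate drug target
r_background = pearson(log_dose, background)
```

Peptides with high dose correlation across replicates point to the drug's protein targets, which is how a deconvolution assay separates true targets from background.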
The analysis of single-cell proteomes has recently become a viable complement to transcriptomic and genomic studies. Proteins are the main drivers of cellular functionality, and mRNA levels are often an unreliable proxy for them. Therefore, global analysis of the proteome is essential to study cellular identities. Both multiplexed and label-free mass spectrometry-based approaches with single-cell resolution have lately attributed surprising heterogeneity to cell populations previously believed to be homogeneous. Even though specialized experimental designs and instrumentation have demonstrated remarkable advances, efficient sample preparation of single cells still lags behind. Here, we introduce the proteoCHIP, a universal option for single-cell proteomics sample preparation at surprising sensitivity and throughput. Automated processing using the cellenONE®, a commercial system combining single-cell isolation and picoliter dispensing, allows final sample volumes to be reduced to low nanoliters submerged in a hexadecane layer, simultaneously eliminating error-prone manual sample handling and overcoming evaporation. With this specialized workflow we achieved around 1,000 protein groups per analytical run at remarkable reporter-ion signal-to-noise while reducing or eliminating the carrier proteome. We identified close to 2,000 protein groups across 158 multiplexed single cells from two highly similar human cell types and clustered them based on their proteome. In-depth investigation of regulated proteins readily identified one of the main drivers of tumorigenicity in this cell type. Our workflow is compatible with all labeling reagents, can be easily adapted to custom workflows, and is a viable option for label-free sample preparation. The specialized proteoCHIP design allows direct injection of label-free single cells via a standard autosampler, resulting in the recovery of 30% more protein groups compared to samples transferred to PEG-coated vials.
We are therefore confident that our versatile, sensitive, and automated sample preparation workflow will be easily adopted by non-specialized groups and will drive biological applications of single-cell proteomics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Source data of "Benchmarking commonly used software suites and analysis workflows for DIA proteomics and phosphoproteomics".
For reproduction of main and supplementary figures.
MS raw files, spectral libraries, and MS data search results are stored in iProX with identifier IPX0004576001.
Test data for the long-read proteogenomics workflow. The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio, Oxford Nanopore) provides full-length transcript sequencing, which can be used to predict full-length proteins. Here, we describe a long-read proteogenomics approach for integrating matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm that directly incorporates long-read transcriptome data, enabling detection of protein isoforms that are intractable to MS detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis.
Companion Repositories:
Companion Datasets
This repository contains the test data, specifically:
TEST Data for Long-Read-Proteogenomics Workflow GitHub Actions
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome. Proteomic analysis for each CPTAC study is carried out independently by Proteomic Characterization Centers (PCCs) using a variety of protein fractionation techniques, instrumentation, and workflows. Mass spectrometry and related data files are organized into datasets by study, sub-proteome, and analysis site.
Proteomic workflows generate vastly complex peptide mixtures that are analyzed by liquid chromatography–tandem mass spectrometry, creating thousands of spectra, most of which are chimeric and contain fragment ions from more than one peptide. Because of differences in data acquisition strategies such as data-dependent, data-independent or parallel reaction monitoring, separate software packages employing different analysis concepts are used for peptide identification and quantification, even though the underlying information is principally the same. Here, we introduce CHIMERYS, a spectrum-centric search algorithm designed for the deconvolution of chimeric spectra that unifies proteomic data analysis. Using accurate predictions of peptide retention time, fragment ion intensities and applying regularized linear regression, it explains as much fragment ion intensity as possible with as few peptides as possible. Together with rigorous false discovery rate control, CHIMERYS accurately identifies and quantifies multiple peptides per tandem mass spectrum in data-dependent, data-independent or parallel reaction monitoring experiments.
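The core idea above — explain as much fragment-ion intensity as possible with as few peptides as possible — can be cast as non-negative L1-regularized regression. The following is a toy sketch under that framing: the candidate spectra, observed spectrum, and solver are illustrative, and CHIMERYS additionally uses predicted retention times, predicted intensities, and rigorous FDR control.

```python
def nn_lasso(X, y, lam=0.05, iters=100):
    """Non-negative L1-regularized least squares via coordinate descent:
    explain observed intensities y with as few candidate spectra
    (columns of X) as possible."""
    n = len(X[0])
    w = [0.0] * n
    for _ in range(iters):
        for j in range(n):
            # residual with feature j held out
            r = [y[i] - sum(X[i][k] * w[k] for k in range(n) if k != j)
                 for i in range(len(y))]
            rho = sum(X[i][j] * r[i] for i in range(len(y)))
            denom = sum(X[i][j] ** 2 for i in range(len(y)))
            # soft-threshold and clip at zero (non-negativity)
            w[j] = max(0.0, (rho - lam) / denom) if denom else 0.0
    return w

# Toy chimeric spectrum: a true mixture of predicted spectra A and B; C is a decoy
A = [1.0, 0.5, 0.0, 0.0]
B = [0.0, 0.0, 1.0, 0.8]
C = [0.3, 0.3, 0.3, 0.3]
X = [list(row) for row in zip(A, B, C)]  # fragment ions x candidate peptides
y = [0.7, 0.35, 0.5, 0.4]                # observed = 0.7*A + 0.5*B
weights = nn_lasso(X, y)                 # A and B get weight; C shrinks to zero
```

The L1 penalty is what enforces "as few peptides as possible": candidates that only marginally improve the fit, like the decoy C, are driven to zero weight.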
Positional proteomics methodologies have transformed protease research and have brought mass spectrometry (MS)-based degradomics studies to the forefront of protease characterization and system-wide interrogation of protease signaling. Considerable advancements in sensitivity and throughput of liquid chromatography (LC)-MS/MS instrumentation enable generation of enormous positional proteomics datasets of protein termini and neo-termini of cleaved protease substrates. However, such progress has not been observed to the same extent in data analysis and post-processing steps, which arguably constitute the largest bottleneck in positional proteomics workflows. Here, we present a computational tool, CLIPPER 2.0, that builds on prior algorithms developed for MS-based protein termini analysis, facilitating peptide level annotation and data analysis. CLIPPER 2.0 can be used with several sample preparation workflows and proteomics search algorithms, and enables fast and automated database information retrieval, statistical and network analysis, as well as visualization of terminomic datasets. We demonstrate our tool by analyzing GluC and MMP9 cleavages in HeLa lysates. CLIPPER 2.0 is available at https://github.com/UadKLab/CLIPPER-2.0.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Precursor intensity-based label-free quantification software tools for proteomic and multi-omic analysis within the Galaxy Platform.
ABRF: Data were generated through the collaborative work of the ABRF Proteomics Research Group (https://abrf.org/research-group/proteomics-research-group-prg). See reference for details: Van Riper, S. et al. "An ABRF-PRG study: identification of low abundance proteins in a highly complex protein sample", presented at the 64th Annual Conference of the American Society of Mass Spectrometry and Allied Topics, San Antonio, TX.
UPS: MaxLFQ. Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics. 2014 Sep;13(9):2513-26. doi: 10.1074/mcp.M113.031591. Epub 2014 Jun 17. PMID: 24942700; PMCID: PMC4159666.
PRIDE #5412; ProteomeXchange repository PXD000279: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2014/09/PXD000279
Clinical BALF samples are rich in biomolecules, including proteins, and are useful for molecular studies of lung health and disease. However, MS-based proteomic analysis of BALF is impeded by the wide dynamic range of protein abundance and the potential for interfering contaminants. We have developed a workflow that overcomes these challenges. By combining high-abundance protein depletion, protein trapping, clean-up, and in-situ tryptic digestion, our workflow is compatible with both qualitative and quantitative MS-based proteomic analysis. The workflow includes collection of endogenous peptides for peptidomic analysis of BALF, if desired, as well as amenability to offline semi-preparative or microscale fractionation of peptide mixtures prior to LC-MS/MS analysis for increased depth of analysis. We show the effectiveness of this workflow on BALF samples from COPD patients. Overall, our workflow should allow MS-based proteomics to be applied to a wide variety of studies focused on BALF clinical samples.
The growing field of urinary proteomics has provided a promising opportunity to identify biomarkers useful for the diagnosis and prognostication of a number of diseases. Urine is abundant and readily collected in a non-invasive manner, which makes it eminently available for testing; however, sample preparation for proteomics analysis remains one of the biggest challenges in this field. As newer technologies to analyze tandem mass spectrometry (MS) data develop, the utility of urinary proteomics would be enhanced by better sample processing and separation workflows that generate fast, reproducible, and more in-depth proteomics data. In this study, we evaluated the performance of four sample preparation methods for non-depleted urine samples: MStern, PreOmics In-StageTip (iST), suspension trapping (S-Trap), and conventional urea in-solution trypsin hydrolysis. Data-dependent acquisition (DDA) mode on a QE-HF was used for single-shot label-free data acquisition. Our results demonstrate a high degree of reproducibility within each workflow. PreOmics iST yields the best digestion efficiency, with the lowest percentage of missed-cleavage peptides. The S-Trap workflow gave the greatest number of peptide and protein identifications. Using the S-Trap method with 0.5 mL of urine as starting material, we identify ~1,500 protein groups and ~17,700 peptides from DDA analysis with a single injection. The continued refinement of sample preparation (presented here), LC separation methods, and mass spectrometry data acquisition can allow the integration of information-rich urine proteomic records across large cohorts with other 'big data' initiatives becoming more popular in medical research.
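The missed-cleavage percentage used above to compare digestion efficiency can be computed directly from identified peptide sequences. A minimal sketch using the standard trypsin rule (cleavage after K/R, except before P); the peptide list is illustrative:

```python
def missed_cleavages(peptide):
    """Count internal tryptic sites (K/R not followed by P) left uncleaved."""
    return sum(1 for i, aa in enumerate(peptide[:-1])  # skip C-terminal residue
               if aa in "KR" and peptide[i + 1] != "P")

# Hypothetical identified peptides from one preparation workflow
peptides = ["LSSPATLNSR", "VGGHAAEYGAEALER", "MKWVTFISLLFLFSSAYSR", "AEFAEVSKLVTDLTK"]
rate = sum(missed_cleavages(p) > 0 for p in peptides) / len(peptides)
# fraction of peptides with >= 1 missed cleavage: the digestion-efficiency metric
```

A lower fraction indicates more complete digestion, which is the basis on which PreOmics iST outperformed the other three workflows.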
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Proteogenomics combines large-scale genomic and transcriptomic data with mass-spectrometry-based proteomic data to discover novel protein sequence variants and improve genome annotation. In contrast with conventional proteomic applications, proteogenomic analysis requires a number of additional data processing steps. Ideally, these required steps would be integrated and automated via a single software platform offering accessibility for wet-bench researchers as well as flexibility for user-specific customization and integration of new software tools as they emerge. Toward this end, we have extended the Galaxy bioinformatics framework to facilitate proteogenomic analysis. Using analysis of whole human saliva as an example, we demonstrate Galaxy’s flexibility through the creation of a modular workflow incorporating both established and customized software tools that improve depth and quality of proteogenomic results. Our customized Galaxy-based software includes automated, batch-mode BLASTP searching and a Peptide Sequence Match Evaluator tool, both useful for evaluating the veracity of putative novel peptide identifications. Our complex workflow (approximately 140 steps) can be easily shared using built-in Galaxy functions, enabling their use and customization by others. Our results provide a blueprint for the establishment of the Galaxy framework as an ideal solution for the emerging field of proteogenomics.
Quantitative proteomics generates large datasets with increasing depth and quantitative information. Even after data processing and statistical analysis, interpreting the results and relating their significance back to the system of study remains challenging. Often, this process is performed by scientists with expertise in their field, but limited experience in proteomic or phosphoproteomic analysis. We developed a set of tools for simple, interactive exploration of phosphoproteomics data that can be easily interpreted into biological knowledge. These tools are designed to expedite the processes of reviewing raw data from statistical output, identifying and verifying enriched sequence motifs, and viewing the data from the perspective of functional pathways. Here, we present the workflow and demonstrate its functionality by analyzing a phosphoproteomic data set from two lymphoma cell lines treated with kinase inhibitors.
Proteins in milk have been studied for decades and proteomics, peptidomics, and glycoproteomics are the main approaches previously deployed to decipher the proteome of human milk. In the present work, we aimed at implementing a highly automated pipeline for the proteomic analysis of human milk with liquid chromatography (LC) mass spectrometry (MS). Commercial human milk samples were used to evaluate and optimize workflows.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*Methods that allow the use of covariates.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Direct infusion shotgun proteome analysis (DISPA) is a new paradigm for expedited mass spectrometry-based proteomics, but the original data analysis workflow was onerous. Here, we introduce CsoDIAq, a user-friendly software package for the identification and quantification of peptides and proteins from DISPA data. In addition to establishing a complete and automated analysis workflow with a graphical user interface, CsoDIAq introduces algorithmic concepts to spectrum–spectrum matching to improve peptide identification speed and sensitivity. These include spectra pooling to reduce search time complexity and a new spectrum–spectrum match score called match count and cosine, which improves target discrimination in a target–decoy analysis. Fragment mass tolerance correction also increased the number of peptide identifications. Finally, we adapt CsoDIAq to standard LC–MS DIA and show that it outperforms other spectrum–spectrum matching software.
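The two ingredients of the match count and cosine score can be illustrated as follows: match library peaks to query peaks within an m/z tolerance, then count the matches and compute the cosine of the matched intensities. The peak values and tolerance here are hypothetical, and the exact peak matching and score combination are as described in the publication:

```python
import math

def match_and_cosine(lib_mz, lib_int, qry_mz, qry_int, tol=0.02):
    """Match library peaks to query peaks within an m/z tolerance; return
    (match count, cosine similarity of the matched intensity pairs)."""
    matched = []
    for lm, li in zip(lib_mz, lib_int):
        for qm, qi in zip(qry_mz, qry_int):
            if abs(lm - qm) <= tol:  # peaks agree within tolerance
                matched.append((li, qi))
                break
    if not matched:
        return 0, 0.0
    dot = sum(l * q for l, q in matched)
    norm = (math.sqrt(sum(l * l for l, _ in matched))
            * math.sqrt(sum(q * q for _, q in matched)))
    return len(matched), (dot / norm if norm else 0.0)

# Hypothetical library and query spectra (m/z, intensity)
lib_mz, lib_int = [200.10, 300.15, 400.20, 500.25], [1.0, 0.6, 0.3, 0.1]
qry_mz, qry_int = [200.11, 300.15, 400.21, 650.00], [0.9, 0.7, 0.25, 0.4]
count, cos_score = match_and_cosine(lib_mz, lib_int, qry_mz, qry_int)
```

Combining the count with the cosine rewards matches that explain many peaks well, which is what improves target-decoy discrimination relative to cosine alone.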
Plasma proteomics holds immense potential for clinical research and biomarker discovery, serving as a non-invasive "liquid biopsy" alternative to tissue sampling. Mass spectrometry (MS)-based proteomics, thanks to improvements in speed and robustness, emerges as an ideal technology for exploring the plasma proteome, owing to its unbiased and highly specific protein identification and quantification. Despite this potential, plasma proteomics remains challenging due to the vast dynamic range of protein abundance, which hinders the detection of less abundant proteins. Different approaches can help overcome this challenge. Conventional depletion methods face limitations in cost, throughput, accuracy, and off-target depletion. Nanoparticle-based enrichment shows promise in compressing the dynamic range, but cost remains a constraint. Enrichment strategies for extracellular vesicles (EVs) can dramatically enhance plasma proteome coverage, but current methods are still too laborious for large series. Neat plasma remains popular for its cost-effectiveness, time efficiency, and low volume requirement. We used a test set of 33 plasma samples for all evaluations. Samples were digested using S-Trap and analyzed on Evosep and nanoElute systems coupled to a timsTOF Pro using different elution gradients and ion mobility ranges. Data were mainly analyzed using library-free searches in DIA-NN. This study explores ways to improve proteome coverage in neat plasma in both MS data acquisition and MS data analysis. We demonstrate the value of sampling smaller hydrophilic peptides, increasing chromatographic separation, and using library-free searches. Additionally, we introduce the EV-boost approach, which leverages the extracellular vesicle fraction to enhance protein identification in neat plasma samples. Overall, our optimized analysis workflow allows the quantification of over 1,000 proteins in neat plasma at a throughput of 24 samples per day.
We believe that these considerations can be of help independently of the LC-MS platform used.