Although data-dependent acquisition (DDA) is the method of choice for mass spectrometry-based proteomics discovery experiments, data-independent acquisition (DIA) is steadily becoming more important. One of the most important requirements for a DIA analysis is the availability of spectral libraries for peptide identification and quantification. Several studies have examined the creation of spectral libraries from DDA analyses and their use for identification in DIA measurements, but so far only a few experiments have estimated the effect of these libraries on the quantitative level. In this work, we created a spike-in gold standard dataset with known contents and ratios of proteins in a complex sample matrix. With this dataset, we first created spectral libraries using different sample preparation approaches, with and without sample prefractionation on the peptide and protein level. Two different search engines were used for protein identification. In total, five different spike-in states were compared with DIA analyses, comparing eight different spectral libraries generated by varying approaches and one library-free method, as well as one default DDA analysis. We inspected not only the number of identifications on the peptide and protein level in the spectral libraries and the corresponding analyses, but also the number of expected and identified significant quantifications and their ratios. We found that while libraries from prefractionated samples are generally larger, the identifications actually obtained are not increased compared to repeated non-fractionated measurements. Furthermore, we show that the accuracy of the quantifications depends strongly on the applied spectral library and on whether the peptide or protein level is analysed. Overall, the reproducibility and accuracy of DIA are superior to DDA in all analysed approaches.
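As an illustration of how quantitative accuracy against a spike-in gold standard can be scored, the following minimal sketch compares an observed log2 ratio between two spike-in states with the known expected ratio. The function names and the example intensities are hypothetical and are not taken from the study; the actual evaluation in the work may differ.

```python
import math
from statistics import median


def log2_ratio_error(intensity_a, intensity_b, expected_ratio):
    """Deviation of the observed log2 ratio (state A / state B) from the
    expected log2 ratio of a spiked-in protein. Zero means a perfect match."""
    observed = math.log2(intensity_a / intensity_b)
    return observed - math.log2(expected_ratio)


def median_absolute_error(measurements):
    """Median absolute log2 ratio error over (intensity_a, intensity_b,
    expected_ratio) tuples, as one simple summary of accuracy."""
    return median(abs(log2_ratio_error(a, b, r)) for a, b, r in measurements)
```

A smaller median absolute error would indicate that a given library or workflow recovers the known ratios more accurately.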
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Data-independent acquisition (DIA) is a promising technique for the proteomic analysis of complex protein samples. A number of studies have claimed that DIA experiments are more reproducible than data-dependent acquisition (DDA), but these claims are unsubstantiated since the two methods rely on different data analysis approaches. Data analysis in most DIA workflows depends on spectral library searches, whereas DDA typically employs sequence database searches. In this study, we examined the reproducibility of the DIA and DDA results using both sequence database and spectral library searches. The comparison was first performed using a cell lysate and then extended to an interactome study. Protein overlap among the technical replicates in both DDA and DIA experiments was 30% higher with library-based identifications than with sequence database identifications. The reproducibility of quantification was also improved with library search compared to database search, with the mean coefficient of variation decreasing by more than 30% and the number of missing values reduced by more than 35%. Our results show that, regardless of the acquisition method, higher identification and quantification reproducibility is observed when library search is used.
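The three reproducibility measures named above (protein overlap among technical replicates, coefficient of variation, and missing values) can be sketched as follows. This is a minimal illustration under assumed definitions: overlap is taken as shared identifications divided by all identifications, and the CV uses the sample standard deviation. The study may define these quantities differently, and all function names are hypothetical.

```python
from statistics import mean, stdev


def replicate_overlap(id_sets):
    """Fraction of identifications found in every replicate, relative to
    all identifications seen in at least one replicate."""
    if not id_sets:
        return 0.0
    union = set().union(*id_sets)
    shared = set.intersection(*(set(s) for s in id_sets))
    return len(shared) / len(union) if union else 0.0


def coefficient_of_variation(values):
    """CV (sample standard deviation / mean) over observed values.
    Missing values are given as None and ignored; returns None when
    fewer than two values are observed or the mean is zero."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return None
    m = mean(observed)
    return stdev(observed) / m if m != 0 else None


def missing_fraction(table):
    """Share of missing (None) entries in a {protein: [values]} table."""
    total = sum(len(row) for row in table.values())
    missing = sum(v is None for row in table.values() for v in row)
    return missing / total if total else 0.0
```

Computing these per search strategy (library versus sequence database) on the same raw files keeps the acquisition method fixed, which is the comparison the abstract argues for.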
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Tandem mass spectrometry (MS/MS) is an invaluable experimental tool for providing analytical data supporting the identification of small molecules and peptides in mass-spectrometry-based “omics” experiments. Data-dependent MS/MS (DDA) is a real-time MS/MS-acquisition strategy that is responsive to the signals detected in a given sample. However, in analysis of even moderately complex samples with state-of-the-art instrumentation, the speed of MS/MS acquisition is insufficient to offer comprehensive MS/MS coverage of all detected molecules. Data-independent approaches (DIA) offer greater MS/MS coverage, typically at the expense of selectivity or sensitivity. This report describes data-set-dependent MS/MS (DsDA), a novel integration of MS1-data processing and target prioritization to enable comprehensive MS/MS sampling during the initial MS-level experiment. This approach is guided by the premise that in omics experiments, individual injections are typically made as part of a larger set of samples, and feedback between data processing and data acquisition can allow approximately real-time optimization of MS/MS-acquisition parameters and nearly complete MS/MS-sampling coverage. Using a combination of R, Proteowizard, XCMS, and WRENS software, this concept was implemented on a liquid-chromatograph-coupled quadrupole time-of-flight mass spectrometer. The results illustrate comprehensive MS/MS coverage for a set of complex small-molecule samples and demonstrate a strong improvement on traditional DDA.
We generated two comprehensive large-scale proteomics datasets with deliberate batch effects using the latest parallel accumulation-serial fragmentation in both Data-Dependent and Data-Independent Acquisition modes. This dataset contains a balanced two-class design (cell lines: A549 vs K562), allowing for the investigation of mixed effects from class, batch and acquisition method. Investigators can also compare and integrate DDA and DIA platforms, delve into the various patterns and mechanisms of missing values, benchmark batch effect correction algorithms and assess confounding between different technical issues.
This dataset consists of 44 raw MS files, comprising 27 DIA (SWATH) and 15 DDA runs on a TripleTOF 5600 and two raw mass spectrometry files acquired on a Q Exactive. The composition of the dataset is described in the manuscript by Tsou et al., titled "DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics", Nature Methods, in press. Raw files are deposited here in ProteomeXchange and are associated with the DIA-Umpire processed data. All DIA-Umpire processed results for each sample, together with the DDA results, are deposited in separate folders. Also see the "DataSampleID.xlsx" file associated with this Readme file. Internal reference from the Gingras lab ProHits implementation: Project 94, Export version VS2 (Tsou_DIA-Umpire).
Source code and example dataset for LipidMS v3.0.3: a commercially available pooled human serum sample was analyzed in positive and negative detection modes using MS1, DIA and DDA approaches. The obtained datasets were processed using LipidMS v3.0, MS-DIAL v4.80 or a combination of data pre-processing in XCMS v3.16 and lipid annotation in LipidMS v3.0. This repository contains:
- Raw data for positive and negative polarities using MS scan, DIA and DDA acquisition modes.
- R scripts for processing with LipidMS v3.0.3 and XCMS v3.16.1 and parameters used for processing with MS-DIAL v4.80.
- Source code for LipidMS v3.0.3.
- Results obtained for the 3 different software tools employed.
- Tutorials for the LipidMS R package and online application.
- Human pooled serum analysis:
  - Raw data for positive and negative polarities using MS scan, DIA and DDA acquisition modes for a human pooled serum sample with or without the addition of 68 lipid standards.
  - Results for the data processing and annotation of the lipid standards using LipidMS 3.0, XCMS 3.16 and MS-DIAL 4.80.
  - Results for the manual curation of the total lipid annotations provided by both LipidMS 3.0 and MS-DIAL 4.80.
In dendritic cells (DCs), the MHC II eluted immunopeptidome reflects the antigenic composition of the microenvironment. Proteins are transported and processed into peptides in endosomal MHC II compartments through autophagy or phagocytosis; extracellular peptides can also directly bind MHC II proteins at the cell surface. Altogether, these mechanisms allow DCs to sample both the intra- and extracellular environment. With an increase in mass spectrometry sensitivity and accuracy, we can now finally tackle important questions on the nature and plasticity of the MHC-II immunopeptidome in health and disease. Presented epitopes, neoepitopes, and PTM-modified epitopes can be quantitatively and qualitatively analyzed to provide a comprehensive picture of the role of DCs in immunosurveillance. To determine whether the redox metabolic conditions induce an altered spectrum of presented peptides, we eluted immunoaffinity-purified I-Ab from conventional dendritic cells isolated from control B6 or obese Ob/Ob mice, and analyzed MHC-II-associated peptides by LC/MS/MS using combined data-dependent (DDA) and data-independent acquisition (DIA) approaches. We analyzed the DIA data by employing a reference spectral library consisting of all peptides identified by database matching in the pool of spectra from the combined DDA dataset, thus allowing a direct label-free quantitation of relative abundances between the two sample categories. The quantitative analysis of the I-Ab-eluted immunopeptidomes pinpoints important differences in peptide presentation and epitope selection in obese mice.
The integration of proteomic datasets generated by non-cooperating laboratories using different LC-MS/MS setups can overcome limitations in statistically underpowered sample cohorts but has not been demonstrated to this day. In proteomics, differences in sample preservation and preparation strategies, chromatography and mass spectrometry approaches, and the quantification strategy used distort protein abundance distributions in integrated datasets. The removal of these technical batch effects requires setup-specific normalization and strategies that can deal with missing at random (MAR) and missing not at random (MNAR) type values at the same time. Algorithms for batch effect removal, such as the ComBat algorithm, commonly used for other omics types, disregard proteins with MNAR missing values and significantly reduce the informational yield and the effect size for combined datasets. Here, we present a strategy for data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification approaches. To enable batch effect removal without the need for data reduction or error-prone imputation, we developed an extension to the ComBat algorithm, ComBat HarmonizR, that performs data harmonization with appropriate handling of MAR and MNAR missing values by matrix dissection. The ComBat HarmonizR based strategy enables the combined analysis of independently generated proteomic datasets for the first time. Furthermore, we found ComBat HarmonizR to be superior for removing batch effects between different Tandem Mass Tag (TMT)-plexes, compared to the commonly used internal reference scaling (iRS). Because the matrix dissection approach requires no data imputation, the HarmonizR algorithm can be applied to any type of -omics data while ensuring minimal data loss.
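The idea of matrix dissection can be illustrated with a minimal sketch. It assumes that dissection means grouping proteins by the set of batches in which they have enough observed values, so that each group forms a sub-matrix free of fully missing batches that could be handed to a batch-correction routine. This is an interpretation of the description above, not the HarmonizR implementation; the function name, the data layout and the `min_values` threshold are hypothetical, and the correction step itself is omitted.

```python
from collections import defaultdict


def dissect_by_batch_presence(matrix, batch_of_sample, min_values=2):
    """Group proteins by the set of batches in which they are sufficiently observed.

    matrix: {protein: {sample: intensity or None}}
    batch_of_sample: {sample: batch label}
    Returns {frozenset_of_batches: [proteins]}. Proteins observed in fewer
    than two batches are left out of this sketch, since there is no
    between-batch effect to estimate for them.
    """
    groups = defaultdict(list)
    for protein, row in matrix.items():
        counts = defaultdict(int)
        for sample, value in row.items():
            if value is not None:
                counts[batch_of_sample[sample]] += 1
        # A batch counts as present only if it has enough observed values.
        present = frozenset(b for b, n in counts.items() if n >= min_values)
        if len(present) >= 2:
            groups[present].append(protein)
    return dict(groups)
```

Each returned group can then be corrected independently using only the batches in its key, so no protein is discarded for being missing in an unrelated batch and no value has to be imputed.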
We generated two comprehensive large-scale proteomics datasets with deliberate batch effects using the latest parallel accumulation-serial fragmentation in both Data-Dependent and Data-Independent Acquisition modes. This dataset contains a balanced two-class design (cell lines: HCC1806 vs HS578T), allowing for the investigation of mixed effects from class, batch and acquisition method. Investigators can also compare and integrate DDA and DIA platforms, delve into the various patterns and mechanisms of missing values, benchmark batch effect correction algorithms and assess confounding between different technical issues.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 3: Table S3. (A) An overview of the peptide and protein identifications and resulting differential quantitation for the 2D-LC DDA/IDA and SWATH run datasets. (B) The impact of the application of a DDA/IDA and SWATH intensity correlation filtering to the differential analysis.
Hair cells undergo postnatal development that leads to formation of their sensory organelles, synaptic machinery, and in the case of cochlear outer hair cells, their electromotile mechanism. To examine the proteome changes over development, we isolated pools of 5000 Pou4f3-Gfp positive or negative cells from the cochlea or utricles; these cell pools were analyzed by data-dependent and data-independent acquisition (DDA and DIA) mass spectrometry. DDA data were used to generate spectral libraries, which enabled identification and accurate quantitation of specific proteins using the DIA datasets. We also isolated and pooled individual inner and outer hair cells from adult cochlea and compared their proteomes to those of developing hair cells. The DDA and DIA datasets will be valuable for accurately quantifying proteins in hair cells and non-hair cells over this developmental window.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Analysis of DIA data from 2000 most abundant DDA-identified proteins.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Analysis of DDA data searched with MaxQuant default values except ≥2 unique peptides.
The Dynamic Organellar Maps (DOMs) approach combines cell fractionation and shotgun-proteomics for global profiling analysis of protein subcellular localization. Here, we have drastically enhanced the performance of DOMs through data-independent acquisition (DIA) mass spectrometry (MS). DIA-DOMs achieve twice the depth of our previous workflow in the same MS runtime, and substantially improve profiling precision and reproducibility. This repository contains all DDA-LFQ datasets used in our work: A reference dataset acquired in single shot on 100 min gradients and three fractionated datasets providing measured libraries for DIA searches on 100 min, 44 min and 21 min gradients.
The goal of this project is to compare label free quantification, chemical labeling with tandem mass tags, and data independent acquisition discovery proteomics approaches using lung squamous cell carcinomas and adjacent lung tissues. This additional single sample LC-MS/MS analysis with data dependent acquisition was performed to enable direct comparison to the PRIDE dataset, titled, "Comparison of Lung Cancer Proteome Profiles 3: DIA," where single samples were analyzed with LC-MS/MS using data independent acquisition.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary analyzed data from Manuscript: TITLE: Maturation kinetics of a multiprotein complex revealed by metabolic labeling. JOURNAL: CELL. Article Type: Research Article Authors: Evgeny Onischenko*, Elad Noor*, Jonas S. Fischer*, Ludovic Gillet, Matthias Wojtynek, Pascal Vallotton, Karsten Weis *Equally Contributing Authors Corresponding Authors: Evgeny Onischenko and Karsten Weis
Related to Figures 2 and S1; Table S4
Single-lysine precursor ion labelling in KARMA assays for a subset of proteins (Mlp1 bait). The target protein complex is isolated from cell lysates by affinity pulldowns at several time points following the onset of metabolic labeling, tryptically digested and analyzed by LC-MS on an Orbitrap mass spectrometer. The MS2 fragmentation spectra are acquired in DIA mode for all samples. Zero time point samples (containing only light lysine) are additionally analyzed in DDA mode to produce assay spectral libraries. Peptide intensities are extracted from the DIA datasets with Spectronaut software (Biognosys) using complementary assay spectral libraries.
Individual plots: Protein labeling H/(H+L) in each post-labeling sample (red track) is determined based on the summed intensities of light (L) and heavy (H) constituent high quality precursor ions (green tracks). Low-quality precursor ions are excluded from quantification (grey tracks). y-axis: fractional labeling H/(H+L); x-axis: individual samples (3 replicates and 5 timepoints)
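The fractional labeling described above can be written as a short calculation: heavy and light intensities are summed over the high-quality precursor ions only, and the labeling is H/(H+L). The sketch below assumes a simple list-of-dicts layout with hypothetical field names; it does not reproduce how quality is assessed in the original analysis.

```python
def fractional_labeling(precursors):
    """Protein labeling H/(H+L) from summed precursor ion intensities.

    precursors: list of dicts with keys 'light', 'heavy' (intensities)
    and 'high_quality' (bool). Low-quality precursors are excluded.
    Returns None when no usable signal remains.
    """
    usable = [p for p in precursors if p["high_quality"]]
    light = sum(p["light"] for p in usable)
    heavy = sum(p["heavy"] for p in usable)
    total = heavy + light
    return heavy / total if total > 0 else None
```

Summing intensities before taking the ratio weights each precursor by its signal, so weak precursors influence the protein-level value less than strong ones.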
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The promise of data-independent acquisition (DIA) strategies is a comprehensive and reproducible digital qualitative and quantitative record of the proteins present in a sample. We developed a fast and robust DIA method for comprehensive mapping of the urinary proteome that enables large-scale urine proteomics studies. Compared to data-dependent acquisition (DDA) experiments, our DIA assay doubled the number of identified peptides and proteins per sample at half the coefficient of variation observed for DDA data (DIA = ∼8%; DDA = ∼16%). We also tested different spectral libraries and their effects on overall protein and peptide identifications and their reproducibility, which provided clear evidence that sample type-specific spectral libraries are preferred for reliable data analysis. To show applicability for biomarker discovery experiments, we analyzed a set of 87 urine samples from children seen in the emergency department with abdominal pain. The whole set was analyzed with high proteome coverage (∼1300 proteins/sample) in less than 4 days. The data set revealed excellent biomarker candidates for ovarian cyst and urinary tract infection. The improved throughput and quantitative performance of our optimized DIA workflow allow for the efficient simultaneous discovery and verification of biomarker candidates without the requirement for an early bias toward selected proteins.
Extracting histones from cells is a routine operation for studies that aim to characterize histones and their post-translational modifications (hPTMs). However, label-free quantitative mass spectrometry (MS) approaches, both data-dependent (DDA) and data-independent (DIA), require streamlined protocols that are highly reproducible even at the peptide level, to enable simultaneous accurate quantification of dozens to hundreds of these hPTMs. We present a step-by-step comparison of different histone extraction protocols based on literature and evaluate their suitability for label-free MS purposes using a nanoESI label-free MS1 intensity-based DDA MS experiment. We evaluate the data both using a targeted and an untargeted (Progenesis QI) approach.