While data-dependent acquisition (DDA) is the method of choice for mass spectrometry-based proteomics discovery experiments, data-independent acquisition (DIA) is steadily becoming more important. One of the most important requirements for a DIA analysis is the availability of spectral libraries for peptide identification and quantification. Several studies have already addressed the creation of spectral libraries from DDA analyses and their use for obtaining identifications in DIA measurements. So far, however, only a few experiments have been conducted to estimate the effect of these libraries on the quantitative level. In this work we created a spike-in gold standard dataset with known contents and ratios of proteins in a complex sample matrix. With this dataset, we first created spectral libraries using different sample preparation approaches with and without sample prefractionation on peptide and protein level. Two different search engines were used for protein identification. In total, five different spike-in states were compared with DIA analyses, comparing eight different spectral libraries generated by varying approaches and one library-free method, as well as one default DDA analysis. We inspected not only the number of identifications on peptide and protein level in the spectral libraries and the corresponding analyses, but also thoroughly examined the number of expected and identified significant quantifications and their ratios. We found that, while libraries of prefractionated samples are generally larger, the identifications actually obtained are not increased compared to repeated non-fractionated measurements. Furthermore, we show that the accuracy of the quantifications is also highly dependent on the applied spectral library and on whether the peptide or protein level is analysed. Overall, the reproducibility and accuracy of DIA are superior to DDA in all analysed approaches.
Mass spectrometry (MS)-based proteomics aims to characterize comprehensive proteomes in a fast and reproducible manner. Here, we present an ultra-fast scanning data-independent acquisition (DIA) strategy consisting of 2-Th precursor isolation windows, dissolving the differences between data-dependent and data-independent methods. This is achieved by pairing a Quadrupole Orbitrap mass spectrometer with the asymmetric track lossless (Astral) analyzer, which provides >200 Hz MS/MS scanning speed, high resolving power and sensitivity, as well as low-ppm mass accuracy. Narrow-window DIA enables profiling of up to 100 full yeast proteomes per day, or ~10,000 human proteins in half an hour. Moreover, multi-shot acquisition of fractionated samples allows comprehensive coverage of human proteomes in ~3 h, showing comparable depth to next-generation RNA sequencing and with 10x higher throughput compared to current state-of-the-art MS. High quantitative precision and accuracy are demonstrated with high peptide coverage in a 3-species proteome mixture, quantifying 14,000+ proteins in a single run in half an hour.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Analysis of DDA data searched with MaxQuant default values except ≥2 unique peptides.
Quality control (QC) in mass spectrometry (MS)-based proteomics is mainly based on data-dependent acquisition (DDA) analysis of standard samples. Here, we collected 2638 files acquired by data-independent acquisition (DIA) and paired DDA files from mouse liver digests using 21 mass spectrometers across nine laboratories over 31 months. Our data showed that a DIA-based, LC-MS/MS-related consensus QC metric is more sensitive than DDA-based QC in detecting MS status changes. We then optimized 15 DIA-QC metrics and had the quality of the 2638 DIA files generated by the 21 mass spectrometers manually assessed based on each metric. Based on the annotation results, we developed an AI model for DIA-based QC in the training set of 2059 DIA files, and predicted the liquid chromatography (LC) performance with an AUC of 0.91 and the MS performance with an AUC of 0.97 in an independent validation dataset (n = 523). Finally, we developed an offline software called iDIA-QC for convenient adoption of this methodology for LC-MS QC.
State-of-the-art proteomics-grade mass spectrometers can measure peptide precursors and their fragments with ppm mass accuracy at sequencing speeds of tens of peptides per second with attomolar sensitivity. Here we describe a compact and robust quadrupole-orbitrap mass spectrometer equipped with a front-end High Field Asymmetric Waveform Ion Mobility Spectrometry (FAIMS) Interface. The performance of the Orbitrap Exploris 480 mass spectrometer is evaluated in data-dependent acquisition (DDA) and data-independent acquisition (DIA) modes in combination with FAIMS. We demonstrate that different compensation voltages (CVs) for FAIMS are optimal for DDA and DIA, respectively. Combining DIA with FAIMS using single CVs, the instrument surpasses 2500 peptides identified per minute. This enables quantification of >5000 proteins with short online LC gradients delivered by the Evosep One LC system allowing acquisition of 60 samples per day. The raw sensitivity of the instrument is evaluated by analyzing 5 ng of a HeLa digest from which >1000 proteins were reproducibly identified with 5 minute LC gradients using DIA-FAIMS. To demonstrate the versatility of the instrument we recorded an organ-wide map of proteome expression across 12 rat tissues quantified by tandem mass tags and label-free quantification using DIA with FAIMS to a depth of >10,000 proteins.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Analysis of DIA data from 2000 most abundant DDA-identified proteins.
This dataset consists of 44 raw MS files, comprising 27 DIA (SWATH) and 15 DDA runs on a TripleTOF 5600 and two raw mass spectrometry files acquired on a Q Exactive. The composition of the dataset is described in the manuscript by Tsou et al., titled "DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics", Nature Methods, in press. Raw files are deposited here in ProteomeXchange and are associated with the DIA-Umpire processed data. All DIA-Umpire processed results for each sample, together with the DDA results, are deposited in separate folders. Also see the "DataSampleID.xlsx" file associated with this Readme file. Internal reference from the Gingras lab ProHits implementation: Project 94, Export version VS2 (Tsou_DIA-Umpire)
These data cover all of the analyses described in the paper "Specter: linear deconvolution as a new paradigm for targeted analysis of data-independent acquisition mass spectrometry proteomics". Specifically, the data consist of:
- 20 DIA and DDA files from the HEK293T/synthetic phosphopeptides spike-in experiments
- 10 DDA files, one for each of ten fractions of an E. coli lysate digest
- 14 DIA files for the experiments involving mixtures of synthetic peptides
- 11 DDA files for the isolated runs of these synthetic peptides
- 84 DIA files for measurements of the phosphoproteome of perturbed PC3 cells
- 10 DDA files for spectral library construction for the phosphoproteomics data
- 3 DIA and 3 DDA files for analysis of an unfractionated E. coli lysate digest
See the spreadsheet "Specter Datasets Catalog.xlsx" for further descriptions and file metadata.
Full-scan, data-dependent acquisition (DDA), and data-independent acquisition (DIA) are the three common data acquisition modes in high-resolution mass spectrometry-based untargeted metabolomics. Which acquisition mode is more suitable for a given untargeted metabolomics application is an important yet underrated research question. In this work, we compared the three data acquisition techniques using a standard mixture of 134 endogenous metabolites and a human urine sample. Both hydrophilic interaction and reversed-phase liquid chromatographic separation, along with positive and negative ionization modes, were tested. Both the standard mixture and urine samples generated consistent results. Full-scan mode is able to capture the largest number of metabolic features, followed by DIA and DDA (53.7% and 64.8% fewer features, respectively, on average in urine than full-scan). Comparing the MS2 spectra in DIA and DDA, spectra quality is higher in DDA, with an average dot product score 83.1% higher than DIA in Urine(H), and the number of MS2 spectra (spectra quantity) is larger in DIA (on average 97.8% more than DDA in urine). Moreover, a comparison of relative standard deviation distribution between modes shows consistency in the quantitative precision, with the exception of DDA showing a minor disadvantage (on average 19.8% and 26.8% fewer features in urine with RSD < 5% than full-scan and DIA). In terms of data preprocessing convenience, full-scan and DDA data can be processed by well-established software. In contrast, several bioinformatic issues remain to be addressed in processing DIA data, and the development of more effective computational programs is in high demand.
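The "dot product score" used above to compare MS2 spectral quality can be illustrated with a minimal sketch. It assumes the score is a normalized dot product (cosine similarity) between two centroided spectra, with peaks matched greedily within an m/z tolerance. The function name, the tolerance value, and the example peaks are hypothetical, and the study's actual scoring details may differ.

```python
import math

def normalized_dot_product(spec_a, spec_b, tol=0.01):
    """Cosine similarity between two spectra given as lists of (mz, intensity).

    Peaks are matched greedily: each peak in spec_a is paired with the
    closest still-unused peak in spec_b within `tol` (an assumed m/z tolerance).
    """
    used = set()
    dot = 0.0
    for mz_a, int_a in spec_a:
        best_j, best_diff = None, tol
        for j, (mz_b, _) in enumerate(spec_b):
            diff = abs(mz_a - mz_b)
            if j not in used and diff <= best_diff:
                best_j, best_diff = j, diff
        if best_j is not None:
            used.add(best_j)
            dot += int_a * spec_b[best_j][1]
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical example spectra (m/z, intensity)
reference = [(100.0, 10.0), (200.0, 5.0)]
identical = [(100.0, 10.0), (200.0, 5.0)]
unrelated = [(150.0, 8.0), (250.0, 3.0)]

score_same = normalized_dot_product(reference, identical)
score_diff = normalized_dot_product(reference, unrelated)
```

A score near 1 indicates that the two spectra share the same peaks in similar proportions, while a score of 0 means no peaks matched within the tolerance.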
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Analysis of DDA data searched with MaxQuant default values.
Given the scarce amount of protein material derived from the bacterial pathogen in infection models, and despite the availability of contemporary, highly sensitive and fast-scanning mass spectrometers, the available analytical power still does not suffice to study the host and pathogen proteomes simultaneously. In the present work we aimed to establish a DIA mass spectrometry workflow for improving protein identification and quantification by LC-MS/MS, particularly for complex samples containing a fairly low amount of peptide material derived from Salmonella, thereby enabling simultaneous host and pathogen protein expression profiling that reflects actual infection conditions.
In dendritic cells (DCs), the MHC II-eluted immunopeptidome reflects the antigenic composition of the microenvironment. Proteins are transported and processed into peptides in endosomal MHC II compartments through autophagy or phagocytosis; extracellular peptides can also directly bind MHC II proteins at the cell surface. Altogether, these mechanisms allow DCs to sample both the intra- and extracellular environment. With an increase in mass spectrometry sensitivity and accuracy, we can now finally tackle important questions on the nature and plasticity of the MHC-II immunopeptidome in health and disease. Presented epitopes, neoepitopes, and PTM-modified epitopes can be quantitatively and qualitatively analyzed to provide a comprehensive picture of the role of DCs in immunosurveillance. To determine whether the redox metabolic conditions induce an altered spectrum of presented peptides, we eluted immunoaffinity-purified I-Ab from conventional dendritic cells isolated from control B6 or obese Ob/Ob mice, and analyzed MHC-II-associated peptides by LC/MS/MS using combined data-dependent (DDA) and data-independent acquisition (DIA) approaches. We analyzed the DIA data by employing a reference spectral library consisting of all peptides identified by database matching in the pool of spectra from the combined DDA dataset, thus allowing a direct label-free quantitation of relative abundances between the two sample categories. The quantitative analysis of the I-Ab-eluted immunopeptidomes pinpoints important differences in peptide presentation and epitope selection in obese mice.
The last decade has seen significant advances in the application of quantitative mass spectrometry-based proteomics technologies to tackle important questions in plant biology. The current standard for quantitative proteomics in plants is the use of data-dependent acquisition (DDA) analysis with or without the use of chemical labels. However, the DDA approach preferentially measures higher-abundance proteins, and often requires data imputation due to quantification inconsistency between samples. In this study we systematically benchmarked a recently developed library-free data-independent acquisition (directDIA) method against a state-of-the-art DDA label-free quantitative proteomics workflow for plants. We next developed a novel acquisition approach combining MS1-level BoxCar acquisition with MS2-level directDIA analysis that we call BoxCarDIA. DirectDIA achieves a 33% increase in protein quantification over traditional DDA, and BoxCarDIA a further 8%, without any changes in instrumentation, offline fractionation, or increases in mass-spectrometer run time. BoxCarDIA, especially, offers wholly reproducible quantification of proteins between replicate injections, thereby addressing the long-standing missing-value problem in label-free quantitative proteomics. Further, we find that the gains in dynamic range sampling by directDIA and BoxCarDIA translate to deeper quantification of key, low-abundance, functional protein classes (e.g., protein kinases and transcription factors) that are underrepresented in data acquired using DDA. We applied these methods to perform a quantitative proteomic comparison of dark- and light-grown Arabidopsis cell cultures, providing a critical resource for future plant interactome studies. Our results establish BoxCarDIA as the new method of choice in quantitative proteomics using Orbitrap-type mass spectrometers, particularly for proteomes with large dynamic range such as that of plants.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
mProphet features for model describing DIA peptide data for 2000 most abundant DDA-identified proteins.
The integration of proteomic datasets generated by non-cooperating laboratories using different LC-MS/MS setups can overcome limitations in statistically underpowered sample cohorts, but has not been demonstrated to this day. In proteomics, differences in sample preservation and preparation strategies, chromatography and mass spectrometry approaches, and the quantification strategy used distort protein abundance distributions in integrated datasets. The removal of these technical batch effects requires setup-specific normalization and strategies that can deal with missing at random (MAR) and missing not at random (MNAR) type values at the same time. Algorithms for batch effect removal, such as the ComBat algorithm commonly used for other omics types, disregard proteins with MNAR missing values and significantly reduce the informational yield and the effect size for combined datasets. Here, we present a strategy for data harmonization across different tissue preservation techniques, LC-MS/MS instrumentation setups and quantification approaches. To enable batch effect removal without the need for data reduction or error-prone imputation, we developed an extension to the ComBat algorithm, ComBat HarmonizR, that performs data harmonization with appropriate handling of MAR and MNAR missing values by matrix dissection. The ComBat HarmonizR-based strategy enables the combined analysis of independently generated proteomic datasets for the first time. Furthermore, we found ComBat HarmonizR to be superior for removing batch effects between different Tandem Mass Tag (TMT)-plexes, compared to commonly used internal reference scaling (iRS). Due to the matrix dissection approach without the need for data imputation, the HarmonizR algorithm can be applied to any type of -omics data while assuring minimal data loss.
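The idea of matrix dissection mentioned above can be sketched as follows. This is only an illustration of the general principle (grouping features by the set of batches in which they are sufficiently observed, so that each group forms a sub-matrix without batch-wide gaps), not the HarmonizR implementation. The function name, the `min_values` threshold of 2, and the example data are assumptions.

```python
from collections import defaultdict

def dissect_by_batch_presence(matrix, batches, min_values=2):
    """Group features by the set of batches in which they are sufficiently observed.

    matrix:     dict mapping feature name -> list of values (None = missing),
                one value per sample
    batches:    list of batch labels, one per sample (same order as the values)
    min_values: assumed minimum number of observed values a feature needs
                within a batch to count as present in that batch
    """
    groups = defaultdict(list)
    for feature, values in matrix.items():
        counts = defaultdict(int)
        for value, batch in zip(values, batches):
            if value is not None:
                counts[batch] += 1
        present = frozenset(b for b, n in counts.items() if n >= min_values)
        groups[present].append(feature)
    return dict(groups)

# Hypothetical example: three batches with two samples each
batches = ["A", "A", "B", "B", "C", "C"]
matrix = {
    "P1": [1.0, 1.2, 2.0, 2.1, 3.0, 3.1],      # observed in all batches
    "P2": [1.1, 1.0, None, None, 2.9, 3.2],    # missing in batch B
    "P3": [0.9, 1.3, None, None, 3.3, 2.8],    # missing in batch B
    "P4": [None, None, 2.2, 2.0, None, 3.0],   # sufficiently observed only in B
}
groups = dissect_by_batch_presence(matrix, batches)
```

Each group with at least two batches could then be passed to a batch-correction routine on its own and the corrected sub-matrices recombined, so that features absent from whole batches are neither discarded nor imputed. A feature observed in only one batch (such as P4 here) has nothing to be corrected against.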
Activity-based protein profiling (ABPP) uses a combination of activity-based chemical probes with mass spectrometry (MS) to selectively characterise a particular enzyme or enzyme class. ABPP has proven invaluable for profiling enzymatic inhibitors in drug discovery. When applied to cell extracts and cells, challenging the ABP-enzyme complex formation with a small molecule can simultaneously inform on potency, selectivity, reversibility/binding affinity, permeability, and stability. ABPP can also be applied to pharmacodynamic studies to inform on cellular target engagement within specific organs when applied to in vivo models. Recently, we established separate high-depth and high-throughput ABPP (ABPP-HT) protocols for the profiling of deubiquitylating enzymes (DUBs). However, the combination of the two, deep and fast, in one method has been elusive. To further increase the sensitivity of the current ABPP-HT workflow, we implemented state-of-the-art data-independent acquisition (DIA) and data-dependent acquisition (DDA) MS analysis tools. Here, we describe an improved methodology, ABPP-HT* (enhanced high-throughput-compatible activity-based protein profiling), that in combination with DIA MS methods allowed for the consistent profiling of 35-40 DUBs and provided a reduced number of missing values, whilst maintaining a throughput of 100 samples per day.
Hair cells undergo postnatal development that leads to formation of their sensory organelles, synaptic machinery, and in the case of cochlear outer hair cells, their electromotile mechanism. To examine the proteome changes over development, we isolated pools of 5000 Pou4f3-Gfp positive or negative cells from the cochlea or utricles; these cell pools were analyzed by data-dependent and data-independent acquisition (DDA and DIA) mass spectrometry. DDA data were used to generate spectral libraries, which enabled identification and accurate quantitation of specific proteins using the DIA datasets. We also isolated and pooled individual inner and outer hair cells from adult cochlea and compared their proteomes to those of developing hair cells. The DDA and DIA datasets will be valuable for accurately quantifying proteins in hair cells and non-hair cells over this developmental window.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Tandem mass spectrometry (MS/MS) is an invaluable experimental tool for providing analytical data supporting the identification of small molecules and peptides in mass-spectrometry-based “omics” experiments. Data-dependent MS/MS (DDA) is a real-time MS/MS-acquisition strategy that is responsive to the signals detected in a given sample. However, in analysis of even moderately complex samples with state-of-the-art instrumentation, the speed of MS/MS acquisition is insufficient to offer comprehensive MS/MS coverage of all detected molecules. Data-independent approaches (DIA) offer greater MS/MS coverage, typically at the expense of selectivity or sensitivity. This report describes data-set-dependent MS/MS (DsDA), a novel integration of MS1-data processing and target prioritization to enable comprehensive MS/MS sampling during the initial MS-level experiment. This approach is guided by the premise that in omics experiments, individual injections are typically made as part of a larger set of samples, and feedback between data processing and data acquisition can allow approximately real-time optimization of MS/MS-acquisition parameters and nearly complete MS/MS-sampling coverage. Using a combination of R, Proteowizard, XCMS, and WRENS software, this concept was implemented on a liquid-chromatograph-coupled quadrupole time-of-flight mass spectrometer. The results illustrate comprehensive MS/MS coverage for a set of complex small-molecule samples and demonstrate a strong improvement on traditional DDA.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Analysis of DIA data from myosin heavy and light chains.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Data-independent acquisition (DIA) strategies promise a comprehensive and reproducible digital qualitative and quantitative record of the proteins present in a sample. We developed a fast and robust DIA method for comprehensive mapping of the urinary proteome that enables large-scale urine proteomics studies. Compared to data-dependent acquisition (DDA) experiments, our DIA assay doubled the number of identified peptides and proteins per sample at half the coefficients of variation observed for DDA data (DIA = ∼8%; DDA = ∼16%). We also tested different spectral libraries and their effects on overall protein and peptide identifications and their reproducibility, which provided clear evidence that sample type-specific spectral libraries are preferred for reliable data analysis. To show applicability for biomarker discovery experiments, we analyzed a set of 87 urine samples from children seen in the emergency department with abdominal pain. The whole set was analyzed with high proteome coverage (∼1300 proteins/sample) in less than 4 days. The data set revealed excellent biomarker candidates for ovarian cyst and urinary tract infection. The improved throughput and quantitative performance of our optimized DIA workflow allow for the efficient simultaneous discovery and verification of biomarker candidates without the requirement for an early bias toward selected proteins.
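The coefficient of variation (CV) quoted above for DIA and DDA can be computed as in the following minimal sketch. It assumes the CV is the sample standard deviation of replicate intensities divided by their mean, expressed in percent. The function name and the replicate values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    if len(values) < 2:
        raise ValueError("need at least two replicate values")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(values) / m * 100.0

# Hypothetical replicate intensities for a single protein
replicates = [100.0, 110.0, 90.0]
cv = coefficient_of_variation(replicates)
```

In a full analysis this would be evaluated per protein or peptide across replicate injections and then summarized, for example as a median CV per acquisition mode.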