Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Accurate target-decoy-based false discovery rate (FDR) control of peptide identification from tandem mass-spectrometry data relies on an important but often neglected assumption that incorrect spectrum annotations are equally likely to receive either target or decoy peptides. Here we argue that this assumption is often violated in practice, even by popular methods. Preference can be given to target peptides by biased scoring functions, which result in liberal FDR estimations, or to decoy peptides by correlated spectra, which result in conservative estimations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The source data for reproducing the figures in the original paper.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In bottom-up discovery proteomics, target–decoy competition (TDC) is the most popular method for false discovery rate (FDR) control. Despite unquestionable statistical foundations, this method has drawbacks, including its hitherto unknown intrinsic lack of stability vis-à-vis practical conditions of application. Although some consequences of this instability have already been empirically described, they may have been misinterpreted. This article provides evidence that TDC has become less reliable as the accuracy of modern mass spectrometers improved. We therefore propose to replace TDC by a totally different method to control the FDR at the spectrum, peptide, and protein levels, while benefiting from the theoretical guarantees of the Benjamini–Hochberg framework. As this method is simpler to use, faster to compute, and more stable than TDC, we argue that it is better adapted to the standardization and throughput constraints of current proteomic platforms.
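The abstract above refers to the Benjamini–Hochberg framework without giving details. As a rough illustration only, the generic Benjamini–Hochberg step-up procedure can be sketched as follows. This is not the authors' method: it assumes one p-value per candidate identification is already available, and the function name `benjamini_hochberg` and its parameters are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Generic Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected
    (i.e. the identification is accepted) at FDR level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject the max_k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, for demonstration only.
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041], alpha=0.05))
```

Because the procedure is step-up, a hypothesis whose own p-value misses its threshold is still rejected when a larger p-value further down the sorted list meets its threshold.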
The extraction of meaningful biological knowledge from high-throughput mass spectrometry data relies on limiting false discoveries to a manageable amount. For targeted approaches in metabolomics, a main challenge is the detection and filtering of false positive metabolic features in the low signal-to-noise ranges of data-independent acquisition results. Another factor is that the creation of assay libraries for data-independent acquisition analysis and the processing of extracted ion chromatograms have not been automated in metabolomics. Here we present a fully automated open-source workflow for high-throughput metabolomics that combines data-dependent and data-independent acquisition for library generation, analysis, and statistical validation, with rigorous control of the false-discovery rate while matching manual analysis regarding quantification accuracy. Using an experimentally specific data-dependent acquisition library based on reference substances allows for accurate identification of compounds and markers from data-independent acquisition data in low concentrations, facilitating biomarker quantification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accompanying MaxQuant, Percolator and Picked Protein Group FDR files to reproduce results in the publication "Re-analysis of ProteomicsDB using an accurate, sensitive and scalable false discovery rate estimation approach for protein groups". The code for reproducing Protein Group FDRs is available on GitHub at https://github.com/kusterlab/picked_group_fdr
Use the unpack.sh bash script included here to unpack all archives, or adapt it to only unpack subsets of the data.
De novo peptide sequencing is a fundamental research area in mass spectrometry (MS) based proteomics. However, these methods have often been evaluated using a couple of simple metrics that do not fully reflect their overall performance. Moreover, there has not been an established method to estimate the false discovery rate (FDR) and the significance of de novo peptide-spectrum matches (PSMs). Here we propose NovoBoard, a comprehensive framework to evaluate the performance of de novo peptide sequencing methods. The framework consists of diverse benchmark datasets (including tryptic, nontryptic, immunopeptidomics, and different species), and a standard set of accuracy metrics to evaluate the fragment ions, amino acids, and peptides of the de novo results. More importantly, a new approach is designed to evaluate de novo peptide sequencing methods on target-decoy spectra and to estimate their FDRs. Our results thoroughly reveal the strengths and weaknesses of different de novo peptide sequencing methods, and how their performance depends on specific applications and the types of data. Our FDR estimation also shows that some tools may perform better than others in distinguishing between de novo PSMs and random matches, and can be used to assess the significance of de novo PSMs.
Proteomic workflows generate vastly complex peptide mixtures that are analyzed by liquid chromatography–tandem mass spectrometry, creating thousands of spectra, most of which are chimeric and contain fragment ions from more than one peptide. Because of differences in data acquisition strategies such as data-dependent, data-independent or parallel reaction monitoring, separate software packages employing different analysis concepts are used for peptide identification and quantification, even though the underlying information is principally the same. Here, we introduce CHIMERYS, a spectrum-centric search algorithm designed for the deconvolution of chimeric spectra that unifies proteomic data analysis. Using accurate predictions of peptide retention time, fragment ion intensities and applying regularized linear regression, it explains as much fragment ion intensity as possible with as few peptides as possible. Together with rigorous false discovery rate control, CHIMERYS accurately identifies and quantifies multiple peptides per tandem mass spectrum in data-dependent, data-independent or parallel reaction monitoring experiments.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metabolites used in prediction of incident T2D with average P-value, false discovery rate (FDR), odds ratio (OR) and 95% confidence interval (95% CI) from the pairwise comparisons of the NGT and IGT groups to the T2D group.
Spectral library searching (SLS) is an attractive alternative to sequence database searching (SDS) for peptide identification due to its speed, sensitivity, and ability to include any selected mass spectra. While decoy methods for SLS have been developed for low mass accuracy peptide spectral libraries, it is not clear that they are optimal or directly applicable to high mass accuracy spectra. Therefore, we report the development and validation of methods for high mass accuracy decoy libraries. Two types of decoy libraries were found to be suitable for this purpose. The first, referred to as Reverse, constructs spectra by reversing a library’s peptide sequences except for the C-terminal residue. The second, termed Random, randomly replaces all non-C-terminal residues and either retains the original C-terminal residue or replaces it based on the amino-acid frequency of the library’s C-terminus. In both cases the m/z values of fragment ions are shifted accordingly. Determination of FDR is performed in a manner equivalent to SDS, concatenating a library with its decoy prior to a search. The utility of Reverse and Random libraries for target-decoy SLS in estimating false-positives and FDRs was demonstrated using spectra derived from a recently published synthetic human proteome project (Zolg, D. P.; et al. Nat. Methods 2017, 14, 259–262). For data sets from two large-scale label-free and iTRAQ experiments, these decoy building methods yielded highly similar score thresholds and spectral identifications at 1% FDR. The results were also found to be equivalent to those of using the decoy-free PeptideProphet algorithm. Using these new methods for FDR estimation, MSPepSearch, which is freely available search software, led to 18% more identifications at 1% FDR and 23% more at 0.1% FDR when compared with other widely used SDS engines coupled to postprocessing approaches such as Percolator.
These methods for FDR estimation are also applied to the recently reported “hybrid” library search method (Burke, M. C.; et al. J. Proteome Res. 2017, 16, 1924–1935). The application of decoy methods for high mass accuracy SLS permits the merging of these results with those of SDS, thereby increasing the number of assigned peptides and leading to deeper proteome coverage.
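The Reverse and Random decoy types described above can be illustrated at the sequence level with a minimal sketch. It covers only the peptide sequences, not the accompanying shift of fragment-ion m/z values. The function names are hypothetical, the Random variant shown keeps the original C-terminal residue, and drawing replacement residues uniformly from the 20 standard amino acids is an assumption, since the abstract does not state the sampling distribution.

```python
import random

# Assumption: unmodified peptides over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def reverse_decoy(peptide):
    """Reverse every residue except the C-terminal one."""
    if len(peptide) < 2:
        return peptide
    # peptide[-2::-1] walks backwards from the second-to-last residue.
    return peptide[-2::-1] + peptide[-1]


def random_decoy(peptide, rng):
    """Replace all non-C-terminal residues at random, keeping the C-terminus.

    Uniform sampling is an illustrative assumption, not the published method.
    """
    if not peptide:
        return peptide
    body = "".join(rng.choice(AMINO_ACIDS) for _ in peptide[:-1])
    return body + peptide[-1]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    print(reverse_decoy("PEPTIDEK"))
    print(random_decoy("PEPTIDEK", rng))
```

Keeping the C-terminal residue fixed preserves the tryptic K/R ending, so decoy peptides retain the same C-terminal composition as the target library.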
Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. The state-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which convert mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approaches can aggregate all user-specified database search algorithms with guaranteed control on the false discovery rate (FDR) and guaranteed increase in the identified peptides. To fill in this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under a target FDR threshold, APIR is guaranteed to identify at least as many, if not more, peptides as individual database search algorithms do. Evaluation of APIR on a complex protein standard shows that APIR outpowers individual database search algorithms and guarantees the FDR control. Real-data studies show that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. Note that the APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analysis, e.g., differential gene expression analysis on RNA sequencing data.
Proteogenomic studies aiming at identification of variant peptides using customized database searches of mass spectrometry data are facing a dilemma of selecting the most efficient database search strategy: A choice has to be made between using combined or sequential searches against reference (wild-type) and mutant protein databases or directly against the mutant database without the wild-type one. Here we called these approaches “all-together”, “one-by-one”, and “direct”, respectively. We share the results of the comparison of these search strategies obtained for large data sets of publicly available proteogenomic data. On the basis of the results of this evaluation, we found that the “all-together” strategy provided, in general, more variant peptide identifications compared with the “one-by-one” approach, while showing similar performance for some specific cases. To validate further the results of this study, we performed a control comparison of the strategies in question using publicly available data for a mixture of the annotated human protein standard UPS1 and E. coli. For these data, both “all-together” and “one-by-one” approaches showed similar sensitivity and specificity of the searches, while the “direct” approach resulted in an increased number of false identifications.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Outputs are organised into folders:
helaqc_results: outputs from evaluation of a model trained and evaluated on the HeLa Single Shot dataset
general_results: outputs from evaluation of the general model
feature_importance: outputs from feature importance evaluation on the general model, specifically feature permutation and SHAP analysis
generalisation: outputs from hold-one-out calibrator generalisation assessment
Note: the columns mz_array and intensity_array have been removed from the result of the hold-one-out calibrator analysis (generalisation/calibrator_generalisation_results_no_arrays.csv) to save storage space.
In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid-chromatography and mass-spectrometry-based large proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four engrailed steps as follows. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. Finally, we compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral pattern from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
In the framework of the C-HPP, our Franco-Swiss consortium has adopted chromosomes 2 and 14, coding for a total of 382 missing proteins (proteins for which evidence is lacking at protein level). Over the last 4 years, the French proteomics infrastructure has collected high-quality data sets from 40 human samples, including a series of rarely studied cell lines, tissue types, and sample preparations. Here we described a step-by-step strategy based on the use of bioinformatics screening and subsequent mass spectrometry (MS)-based validation to identify what were up to now missing proteins in these data sets. Screening database search results (85 326 dat files) identified 58 of the missing proteins (36 on chromosome 2 and 22 on chromosome 14) by 83 unique peptides following the latest release of neXtProt (2014-09-19). PSMs corresponding to these peptides were thoroughly examined by applying two different MS-based criteria: peptide-level false discovery rate calculation and expert PSM quality assessment. Synthetic peptides were then produced and used to generate reference MS/MS spectra. A spectral similarity score was then calculated for each pair of reference-endogenous spectra and used as a third criterion for missing protein validation. Finally, LC–SRM assays were developed to target proteotypic peptides from four of the missing proteins detected in tissue/cell samples, which were still available and for which sample preparation could be reproduced. These LC–SRM assays unambiguously detected the endogenous unique peptide for three of the proteins. For two of these, identification was confirmed by additional proteotypic peptides. We concluded that of the initial set of 58 proteins detected by the bioinformatics screen, the consecutive MS-based validation criteria led to propose the identification of 13 of these proteins (8 on chromosome 2 and 5 on chromosome 14) that passed at least two of the three MS-based criteria.
Thus, a rigorous step-by-step approach combining bioinformatics screening and MS-based validation assays is particularly suitable to obtain protein-level evidence for proteins previously considered as missing. All MS/MS data have been deposited in ProteomeXchange under identifier PXD002131.
The purpose of this study was to generate a basis for deciding which protein quantities are reliable and to find a way to quantify proteins accurately and precisely. To investigate this we have used thousands of peptide measurements to estimate variance and bias for quantification by iTRAQ (isobaric tags for relative and absolute quantification) mass spectrometry in complex human samples. A549 cell lysate was mixed in the proportions 2:2:1:1:2:2:1:1, fractionated by high resolution isoelectric focusing and liquid chromatography and analyzed by three mass spectrometry platforms: LTQ Orbitrap Velos, 4800 MALDI-TOF/TOF and 6530 Q-TOF. We have investigated how variance and bias in the iTRAQ reporter ion data are affected by common experimental variables such as sample amount, sample fractionation, fragmentation energy, and instrument platform. Based on this, we have suggested a concept for experimental design and a methodology for protein quantification. By using duplicate samples in each run, each experiment is validated based on its internal experimental variation. The duplicates are used for calculating peptide weights, unique to the experiment, which are used in the protein quantification. By weighting the peptides depending on reporter ion intensity, we can decrease the relative error in quantification at the protein level and assign a total weight to each protein that reflects the protein quantitation confidence. We also demonstrate the usability of this methodology in a cancer cell line experiment as well as in a clinical data set of lung cancer tissue samples. In conclusion, we have in this study developed a methodology for improved protein quantification in shotgun proteomics and introduced a way to assess quantification for proteins with few peptides. The experimental design and developed algorithms decreased the relative protein quantification error in the analysis of complex biological samples.
Data analysis
LTQ Orbitrap Velos: Proteome Discoverer 1.1 with Mascot 2.2 (Matrix Science) was used for protein identification. Precursor mass tolerance was set to 10 ppm, and fragment tolerances of 0.8 Da and 0.015 Da were used for detection in the linear ion trap and the Orbitrap, respectively. Oxidized methionine was set as dynamic modification and carbamidomethylation, N-terminal 8plex iTRAQ, and lysyl 8plex iTRAQ as fixed modifications.
4800 MALDI-TOF/TOF: Peptide identification from the MALDI-TOF/TOF data was carried out using the Paragon algorithm in the ProteinPilot 2.0 software package (Applied Biosystems). Default settings for a 4800 instrument were used (i.e. no manual settings for mass tolerance were given). The following parameters were selected in the analysis method: iTRAQ 8plex peptide labeled as sample type, IAA as alkylating agent of cysteine, trypsin as digesting enzyme, 4800 as instrument, gel based ID and Urea denaturation as special factors, biological modifications as ID focus, and thorough ID as search effort.
6530 Q-TOF: Peptide identification from the Q-TOF data was carried out using the Spectrum Mill Protein Identification software (Agilent). Data was extracted between MH+ 600 and 4000 Da (Agilent’s definition). Trypsin was used as digesting enzyme, and parent and daughter ion tolerance was set to 25 and 50 ppm, respectively. IAA for cysteine and iTRAQ partial-mix (N-term, K) were set as fixed modifications while oxidized methionine was set as variable modification.
Database and peptide cut-off for all searches: Searches were performed against the IPI database (build 3.64) limited to human sequences allowing 2 missed cleavages. False discovery rate (FDR) was estimated by searching the data against a database consisting of both forward and reversed sequences and set to <1% at the protein level using MAYU. Peptides corresponding to a <1% protein FDR were used in the calculations.
Peptide and protein identification using Mascot for comparison between instruments: Peptide identifications were performed using Mascot Daemon 2.3.2 with Mascot 2.4 for fractions 32 to 36 from IPG-IEF with 400 µg loaded peptides. Carbamidomethylation (CAM) for cysteine was set as fixed modification, oxidized methionine as variable modification and iTRAQ 8plex was set as quantification for all searches. MALDI-TOF/TOF search settings: Parent and daughter ion tolerance was set to 150 ppm and 0.2 Da, respectively. LTQ Orbitrap search settings: Precursor mass tolerance was set to 10 ppm, and fragment tolerances of 0.8 Da and 0.015 Da were used for data generated in the linear ion trap and the Orbitrap, respectively. Q-TOF search settings: Parent and daughter ion tolerance was set to 25 and 50 ppm, respectively.
Here we present quantitative proteomics data used in the evaluation of quantitative accuracy. A human cell line, MCF7, was split into 9 aliquots that were spiked with a dilution series of 57 protein standards of known amounts spanning 5 orders of magnitude. The protein extracts were trypsinized, and the peptides were analysed by LC-MS using either a label-free or a label-based (TMT 6-plex and iTRAQ 8-plex) quantification approach. The iTRAQ- and TMT-labelled samples were co-analysed and separated by HiRIEF (high resolution isoelectric focusing) prior to LC-MS. Raw MS data was identified and quantified under the software platform Proteome Discoverer 1.3.0.339 (Thermo Fisher Scientific Inc.) or MaxQuant software (version 1.2.0.18) (label-free data). For both protein identification and quantification at least 1 unique peptide (i.e. a peptide that occurs in not more than one database entry) was required. The false discovery rate (FDR) for peptide identification was set to 5% in all analyses. For the iTRAQ and TMT labeled samples, all MS/MS spectra were searched by SEQUEST combined with the Percolator algorithm (version 2.0) for PSM search optimization. Searches were performed against a custom-made database consisting of SwissProt human sequences (uniprot.org 2012-01-17, 20242 entries) and the spiked-in protein standards (57 protein sequences). Peptide FDR was calculated by a target-decoy approach.
Top down identification of proteins detected by MALDI imaging. Childhood absence epilepsy is a prototypic form of generalized nonconvulsive epilepsy characterized by short impairments of consciousness concomitant with synchronous and bilateral spike-and-wave discharges in the electroencephalogram. For scientists in this field, the BS/Orl and BR/Orl mouse lines, derived from a genetic selection, constitute an original mouse model "in mirror" of absence epilepsy. The potential of MALDI imaging mass spectrometry (IMS) for the discovery of potential biomarkers is increasingly recognized. Interestingly, statistical analysis tools specifically adapted to IMS data sets and methods for the identification of detected proteins play an essential role. In this study, a new cross-classification comparative design using a combined discrete wavelet transformation-support vector machine classification was developed to discriminate spectra of brain sections of BS/Orl and BR/Orl mice. Nineteen m/z ratios were thus highlighted as potential markers with very high recognition rates (87-99%). Seven of these potential markers were identified using a top-down approach, in particular a fragment of Synapsin-I. This protein is already suspected to be involved in epilepsy. Immunohistochemistry and Western Blot experiments confirmed the differential expression of Synapsin-I observed by IMS, thus tending to validate our approach. Functional assays are being performed to confirm the involvement of Synapsin-I in the mechanisms underlying childhood absence epilepsy. Data processing and bioinformatics: The ProSight PC 2.0 software (Thermo Scientific) was used to create the peak list from .raw data. RAW files were processed using the Xtract algorithm, which interprets resolved isotopic distributions and outputs neutral mass values.
In this study, two search modes were used: “absolute mass” search for the identification of full-length proteins and “biomarker” search for the identification of protein fragments. For absolute mass search, ProSight PC restricts the protein database to proteins matching the mass of the precursor, whereas for biomarker search, the protein database is restricted to each protein subsequence matching the mass of the precursor. In order to determine the false discovery rate (FDR), the FASTA format Uniprot database (release 2011_08, 531473 sequences, 188463640 residues) filtered with the Mus musculus taxonomy (16401 sequences) was concatenated with a decoy database constituted by randomized protein sequences. The resulting database was imported in the ProSight PC 2.0 software and configured for top-down analysis. For each analyzed fraction, the FDR was calculated as follows: FDR = 2 × FP/(FP + TP), where FP and TP are the number of matches from the decoy and target database, respectively. All identifications reported here correspond to a FDR lower than 5%. MS/MS spectra were searched against the database using the absolute mass search mode with a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 15 ppm on monoisotopic masses. The identification score is based on the number of observed fragment ions matching the fragment ion tolerance. A second absolute mass search was performed with a loose precursor ion tolerance of 1000 Da to evidence post-translational modifications. The Sequence Gazer tool was used to manually check proteins identified with a significant score but with a difference between the theoretical precursor ion mass and the observed precursor ion mass. Post-translational modification(s) can thus be localized on the sequence of the protein due to mass shift(s) observed for b and/or y ions. 
MS/MS spectra were then searched using the biomarker search mode with a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 15 ppm on monoisotopic masses to identify fragments of intact proteins present in the database. A new database was then constructed from all protein sequences identified with the previous searches in order to perform a biomarker search with a precursor ion mass tolerance of 100 Da and a fragment ion mass tolerance of 15 ppm. Indeed, such precursor ion mass tolerance in the biomarker search mode is unusable for large databases such as the Uniprot database filtered with the Mus musculus taxonomy (16401 sequences) and requires the use of small databases. This search was performed to identify modified fragments of previously identified proteins that could not have been identified with the biomarker search performed with a precursor ion mass tolerance of 10 ppm. The Sequence Gazer tool was used to manually localize post-translational modification(s) on the sequence of the protein fragments due to mass shift(s) observed for b and/or y ions.
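The FDR formula quoted above for the concatenated target-decoy search, FDR = 2 × FP/(FP + TP), can be written as a small helper. This is only a sketch of the arithmetic; the function name and the example counts are hypothetical and are not taken from the study.

```python
def concatenated_fdr(n_decoy, n_target):
    """FDR = 2 * FP / (FP + TP) for a concatenated target-decoy search.

    n_decoy  -- number of matches to the decoy database (FP in the formula)
    n_target -- number of matches to the target database (TP in the formula)
    """
    total = n_decoy + n_target
    if total == 0:
        raise ValueError("no matches to evaluate")
    return 2 * n_decoy / total


if __name__ == "__main__":
    # Hypothetical counts: 2 decoy matches and 98 target matches.
    fdr = concatenated_fdr(2, 98)
    print(f"estimated FDR: {fdr:.2%}")  # 2 * 2 / 100 = 4%, below a 5% cut-off
```

The factor of 2 reflects the assumption that each decoy match stands for one additional, undetected false match among the target hits.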
Recently, we presented the DirectMS1 method of ultrafast proteome-wide analysis based on minute-long LC gradients and MS1-only mass spectra acquisition. Currently, the method provides the depth of human cell proteome coverage of 2500 proteins at a 1% false discovery rate (FDR) when using 5 min LC gradients and 7.3 min runtime in total. While the standard MS/MS approaches provide 4000–5000 protein identifications within a couple of hours of instrumentation time, we advocate here that the higher number of identified proteins does not always translate into better quantitation quality of the proteome analysis. To further elaborate on this issue, we performed a one-on-one comparison of quantitation results obtained using DirectMS1 with three popular MS/MS-based quantitation methods: label-free (LFQ) and tandem mass tag (TMT) quantitation, both based on data-dependent acquisition (DDA), and data-independent acquisition (DIA). For comparison, we performed a series of proteome-wide analyses of well-characterized (ground truth) and biologically relevant samples, including a mix of UPS1 proteins spiked at different concentrations into an Escherichia coli digest used as a background and a set of glioblastoma cell lines. MS1-only data was analyzed using a novel quantitation workflow called DirectMS1Quant developed in this work. The results obtained in this study demonstrated comparable quantitation efficiency of 5 min DirectMS1 with both TMT and DIA methods, yet the latter two utilized a 10–20-fold longer instrumentation time.
Capillary zone electrophoresis-electrospray ionization-tandem mass spectrometry (CZE-ESI-MS/MS) has been recognized as an invaluable platform for top-down proteomics. However, the scale of top-down proteomics from CZE-MS/MS is still limited due to the low loading capacity and narrow separation window of CZE. In this work, for the first time we systematically evaluated the dynamic pH junction method for focusing of intact proteins during CZE-MS. The optimized dynamic pH junction based CZE-MS/MS system approached 1-µL loading capacity, 90-min separation window and high peak capacity (~280) for separation of the Escherichia coli proteome. The results represent the largest loading capacity and the highest peak capacity of CZE for top-down characterization of complex proteomes. About 2,800 proteoform-spectrum matches, nearly 600 proteoforms, and 200 proteins were identified from an Escherichia coli lysate by single-shot CZE-MS/MS with spectrum-level false discovery rate (FDR) less than 1%. The number of proteoforms is over three times higher than that from previous single-shot CZE-MS/MS.
High-mass-resolution imaging mass spectrometry promises to localize hundreds of metabolites directly from tissues, cell cultures, and agar plates with cellular resolution, but is hampered by the lack of bioinformatics for automated metabolite identification. We developed the first bioinformatics framework for False Discovery Rate (FDR)-controlled metabolite annotation for high-mass-resolution imaging mass spectrometry (https://github.com/alexandrovteam/pySM), introducing a Metabolite-Signal Match (MSM) score and a target-decoy FDR-estimate for spatial metabolomics. MALDI-FTICR datasets acquired from wild type adult mouse brain were used for the development of spatial metabolomics annotation bioinformatics. This study, together with MTBLS317 and MTBLS378, provides the accompanying data for FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry.