100+ datasets found
  1. Data from: Bias in False Discovery Rate Estimation in...

    • acs.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Yulia Danilova; Anastasia Voronkova; Pavel Sulimov; Attila Kertész-Farkas (2023). Bias in False Discovery Rate Estimation in Mass-Spectrometry-Based Peptide Identification [Dataset]. http://doi.org/10.1021/acs.jproteome.8b00991.s002
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    ACS Publications
    Authors
    Yulia Danilova; Anastasia Voronkova; Pavel Sulimov; Attila Kertész-Farkas
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Accurate target-decoy-based false discovery rate (FDR) control of peptide identification from tandem mass-spectrometry data relies on an important but often neglected assumption that incorrect spectrum annotations are equally likely to receive either target or decoy peptides. Here we argue that this assumption is often violated in practice, even by popular methods. Preference can be given to target peptides by biased scoring functions, which result in liberal FDR estimations, or to decoy peptides by correlated spectra, which result in conservative estimations.
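
    The assumption described above is what lets decoy hits stand in for incorrect target hits. The sketch below is only an illustration of that idea, not the authors' code: the function name `tdc_fdr`, the `(score, is_decoy)` tuple layout, and the simple decoys/targets ratio are assumptions made for this example.

```python
from typing import List, Tuple

def tdc_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Each PSM is a (score, is_decoy) pair. If incorrect matches are equally
    likely to hit a target or a decoy, the number of accepted decoys
    approximates the number of incorrect accepted targets.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to estimate
    return min(1.0, decoys / targets)

# Hypothetical scores, for illustration only.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(tdc_fdr(example, threshold=7.0))
```

    If scoring favours targets, the decoy count understates the errors and the estimate becomes liberal; if it favours decoys, the estimate becomes conservative.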

  2. Data from: Assessment of false discovery rate control in tandem mass...

    • data-staging.niaid.nih.gov
    Updated Mar 24, 2025
    Cite
    Wen, Bo; Freestone, Jack; Riffle, Michael; MacCoss, Michael; Noble, William; Keich, Uri (2025). Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_15073579
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Department of Genome Sciences, University of Washington
    School of Mathematics and Statistics, University of Sydney
    Authors
    Wen, Bo; Freestone, Jack; Riffle, Michael; MacCoss, Michael; Noble, William; Keich, Uri
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The source data for reproducing the figures in the original paper.

  3. Data from: Beyond Target–Decoy Competition: Stable Validation of Peptide and...

    • acs.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Jun 4, 2023
    Cite
    Yohann Couté; Christophe Bruley; Thomas Burger (2023). Beyond Target–Decoy Competition: Stable Validation of Peptide and Protein Identifications in Mass Spectrometry-Based Discovery Proteomics [Dataset]. http://doi.org/10.1021/acs.analchem.0c00328.s002
    Available download formats: zip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    ACS Publications
    Authors
    Yohann Couté; Christophe Bruley; Thomas Burger
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In bottom-up discovery proteomics, target–decoy competition (TDC) is the most popular method for false discovery rate (FDR) control. Despite unquestionable statistical foundations, this method has drawbacks, including its hitherto unknown intrinsic lack of stability vis-à-vis practical conditions of application. Although some consequences of this instability have already been empirically described, they may have been misinterpreted. This article provides evidence that TDC has become less reliable as the accuracy of modern mass spectrometers improved. We therefore propose to replace TDC by a totally different method to control the FDR at the spectrum, peptide, and protein levels, while benefiting from the theoretical guarantees of the Benjamini–Hochberg framework. As this method is simpler to use, faster to compute, and more stable than TDC, we argue that it is better adapted to the standardization and throughput constraints of current proteomic platforms.
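
    The description refers to the guarantees of the Benjamini–Hochberg framework. The sketch below shows only the standard Benjamini–Hochberg step-up procedure on a list of p-values; how the authors obtain p-values for spectra, peptides, or proteins is not stated here, so the inputs and the function name `benjamini_hochberg` are assumptions made for this example.

```python
from typing import List

def benjamini_hochberg(p_values: List[float], alpha: float = 0.01) -> List[bool]:
    """Return, for each p-value, whether it is rejected at FDR level `alpha`."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values, for illustration only.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9], alpha=0.05))
```

    Unlike target–decoy competition, this procedure needs no decoy search, only p-values.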

  4. Data from: DIAMetAlyzer allows automated false-discovery rate-controlled...

    • data.niaid.nih.gov
    xml
    Updated Sep 5, 2022
    Cite
    Oliver Alka (2022). DIAMetAlyzer allows automated false-discovery rate-controlled analysis for data-independent acquisition in metabolomics [Dataset]. https://data.niaid.nih.gov/resources?id=mtbls1108
    Available download formats: xml
    Dataset updated
    Sep 5, 2022
    Dataset provided by
    Applied Bioinformatics / University of Tübingen
    Authors
    Oliver Alka
    Variables measured
    Metabolomics, Method development (DDA, DIA), Benchmarking, Dilution Series
    Description

    The extraction of meaningful biological knowledge from high-throughput mass spectrometry data relies on limiting false discoveries to a manageable amount. For targeted approaches in metabolomics a main challenge is the detection of false positive metabolic features in the low signal-to-noise ranges of data-independent acquisition results and their filtering. Another factor is that the creation of assay libraries for data-independent acquisition analysis and the processing of extracted ion chromatograms have not been automated in metabolomics. Here we present a fully automated open-source workflow for high-throughput metabolomics that combines data-dependent and data-independent acquisition for library generation, analysis, and statistical validation, with rigorous control of the false-discovery rate while matching manual analysis regarding quantification accuracy. Using an experimentally specific data-dependent acquisition library based on reference substances allows for accurate identification of compounds and markers from data-independent acquisition data in low concentrations, facilitating biomarker quantification.

  5. Picked Protein Group FDR

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    application/gzip, sh
    Updated Oct 17, 2022
    Cite
    Matthew The; Matthew The (2022). Picked Protein Group FDR [Dataset]. http://doi.org/10.5281/zenodo.7157677
    Available download formats: application/gzip, sh
    Dataset updated
    Oct 17, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matthew The; Matthew The
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accompanying MaxQuant, Percolator and Picked Protein Group FDR files to reproduce results in the publication "Re-analysis of ProteomicsDB using an accurate, sensitive and scalable false discovery rate estimation approach for protein groups". The code for reproducing Protein Group FDRs is available on GitHub at https://github.com/kusterlab/picked_group_fdr

    Use the unpack.sh bash script included here to unpack all archives, or adapt it to only unpack subsets of the data.

  6. Data from: NovoBoard: a comprehensive framework for evaluating the false...

    • ebi.ac.uk
    • data.niaid.nih.gov
    Updated Aug 28, 2024
    Cite
    Ngoc Hieu Tran (2024). NovoBoard: a comprehensive framework for evaluating the false discovery rate and accuracy of de novo peptide sequencing [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD055277
    Dataset updated
    Aug 28, 2024
    Authors
    Ngoc Hieu Tran
    Variables measured
    Proteomics
    Description

    De novo peptide sequencing is a fundamental research area in mass spectrometry (MS) based proteomics. However, those methods have often been evaluated using a couple of simple metrics that do not fully reflect their overall performance. Moreover, there has not been an established method to estimate the false discovery rate (FDR) and the significance of de novo peptide-spectrum matches (PSMs). Here we propose NovoBoard, a comprehensive framework to evaluate the performance of de novo peptide sequencing methods. The framework consists of diverse benchmark datasets (including tryptic, nontryptic, immunopeptidomics, and different species), and a standard set of accuracy metrics to evaluate the fragment ions, amino acids, and peptides of the de novo results. More importantly, a new approach is designed to evaluate de novo peptide sequencing methods on target-decoy spectra and to estimate their FDRs. Our results thoroughly reveal the strengths and weaknesses of different de novo peptide sequencing methods, and how their performances depend on specific applications and the types of data. Our FDR estimation also shows that some tools may perform better than the others in distinguishing between de novo PSMs and random matches, and can be used to assess the significance of de novo PSMs.

  7. Data from: Unifying the analysis of bottom-up proteomics data with CHIMERYS

    • ebi.ac.uk
    • data-staging.niaid.nih.gov
    Updated Apr 16, 2025
    Cite
    Martin Frejno (2025). Unifying the analysis of bottom-up proteomics data with CHIMERYS [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD053241
    Dataset updated
    Apr 16, 2025
    Authors
    Martin Frejno
    Variables measured
    Proteomics
    Description

    Proteomic workflows generate vastly complex peptide mixtures that are analyzed by liquid chromatography–tandem mass spectrometry, creating thousands of spectra, most of which are chimeric and contain fragment ions from more than one peptide. Because of differences in data acquisition strategies such as data-dependent, data-independent or parallel reaction monitoring, separate software packages employing different analysis concepts are used for peptide identification and quantification, even though the underlying information is principally the same. Here, we introduce CHIMERYS, a spectrum-centric search algorithm designed for the deconvolution of chimeric spectra that unifies proteomic data analysis. Using accurate predictions of peptide retention time, fragment ion intensities and applying regularized linear regression, it explains as much fragment ion intensity as possible with as few peptides as possible. Together with rigorous false discovery rate control, CHIMERYS accurately identifies and quantifies multiple peptides per tandem mass spectrum in data-dependent, data-independent or parallel reaction monitoring experiments.

  8. Metabolites used in prediction of incident T2D with average P-value, false...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Otto Savolainen; Björn Fagerberg; Mads Vendelbo Lind; Ann-Sofie Sandberg; Alastair B. Ross; Göran Bergström (2023). Metabolites used in prediction of incident T2D with average P-value, false discovery rate (FDR), odds ratio (OR) and 95% confidence interval (95% CI) from the pairwise comparisons of the NGT and IGT groups to the T2D group. [Dataset]. http://doi.org/10.1371/journal.pone.0177738.t003
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Otto Savolainen; Björn Fagerberg; Mads Vendelbo Lind; Ann-Sofie Sandberg; Alastair B. Ross; Göran Bergström
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metabolites used in prediction of incident T2D with average P-value, false discovery rate (FDR), odds ratio (OR) and 95% confidence interval (95% CI) from the pairwise comparisons of the NGT and IGT groups to the T2D group.

  9. Data from: Reverse and Random Decoy Methods for False Discovery Rate...

    • datasetcatalog.nlm.nih.gov
    • acs.figshare.com
    Updated Jan 10, 2018
    Cite
    Hess, Sonja; Markey, Sanford P.; Mirokhin, Yuri A.; Zhang, Zheng; Burke, Meghan; Stein, Stephen E.; Yu, Wen; Tchekhovskoi, Dmitrii V.; Chaerkady, Raghothama (2018). Reverse and Random Decoy Methods for False Discovery Rate Estimation in High Mass Accuracy Peptide Spectral Library Searches [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001800486
    Dataset updated
    Jan 10, 2018
    Authors
    Hess, Sonja; Markey, Sanford P.; Mirokhin, Yuri A.; Zhang, Zheng; Burke, Meghan; Stein, Stephen E.; Yu, Wen; Tchekhovskoi, Dmitrii V.; Chaerkady, Raghothama
    Description

    Spectral library searching (SLS) is an attractive alternative to sequence database searching (SDS) for peptide identification due to its speed, sensitivity, and ability to include any selected mass spectra. While decoy methods for SLS have been developed for low mass accuracy peptide spectral libraries, it is not clear that they are optimal or directly applicable to high mass accuracy spectra. Therefore, we report the development and validation of methods for high mass accuracy decoy libraries. Two types of decoy libraries were found to be suitable for this purpose. The first, referred to as Reverse, constructs spectra by reversing a library’s peptide sequences except for the C-terminal residue. The second, termed Random, randomly replaces all non-C-terminal residues and either retains the original C-terminal residue or replaces it based on the amino-acid frequency of the library’s C-terminus. In both cases the m/z values of fragment ions are shifted accordingly. Determination of FDR is performed in a manner equivalent to SDS, concatenating a library with its decoy prior to a search. The utility of Reverse and Random libraries for target-decoy SLS in estimating false-positives and FDRs was demonstrated using spectra derived from a recently published synthetic human proteome project (Zolg, D. P.; et al. Nat. Methods 2017, 14, 259–262). For data sets from two large-scale label-free and iTRAQ experiments, these decoy building methods yielded highly similar score thresholds and spectral identifications at 1% FDR. The results were also found to be equivalent to those of using the decoy-free PeptideProphet algorithm. Using these new methods for FDR estimation, MSPepSearch, which is freely available search software, led to 18% more identifications at 1% FDR and 23% more at 0.1% FDR when compared with other widely used SDS engines coupled to postprocessing approaches such as Percolator. 
An application of these methods for FDR estimation for the recently reported “hybrid” library search (Burke, M. C.; et al. J. Proteome Res. 2017, 16, 1924–1935) method is also made. The application of decoy methods for high mass accuracy SLS permits the merging of these results with those of SDS, thereby increasing the assignment of more peptides, leading to deeper proteome coverage.
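
    The sequence-level part of the Reverse and Random decoy constructions described above can be sketched as follows. This is an illustrative sketch only: the function names, the uniform draw over the 20 standard amino acids, and the choice to always retain the C-terminal residue in `random_decoy` are assumptions, and the shifting of fragment-ion m/z values mentioned in the description is not modelled.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def reverse_decoy(peptide: str) -> str:
    """Reverse every residue except the C-terminal one."""
    if len(peptide) < 2:
        return peptide
    return peptide[:-1][::-1] + peptide[-1]

def random_decoy(peptide: str, rng: random.Random) -> str:
    """Replace all non-C-terminal residues at random, keeping the C-terminus.

    Assumption: residues are drawn uniformly; the description does not say
    which distribution the authors use for non-C-terminal positions.
    """
    if not peptide:
        return peptide
    body = "".join(rng.choice(AMINO_ACIDS) for _ in peptide[:-1])
    return body + peptide[-1]

print(reverse_decoy("PEPTIDEK"))  # prints EDITPEPK
print(random_decoy("PEPTIDEK", random.Random(0)))
```

    Keeping the C-terminal residue fixed preserves the tryptic ending (K or R) of the original peptide in the decoy.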

  10. Data from: APIR: a universal FDR-control framework for boosting peptide...

    • nde-dev.biothings.io
    • ebi.ac.uk
    xml
    Updated Sep 21, 2021
    Cite
    Yiling Chen; Leo David Wang (2021). APIR: a universal FDR-control framework for boosting peptide identification power by aggregating multiple proteomics database search algorithms [Dataset]. https://nde-dev.biothings.io/resources?id=pxd028558
    Available download formats: xml
    Dataset updated
    Sep 21, 2021
    Dataset provided by
    Departments of Pediatrics and Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte CA 91010
    University of California, Los Angeles
    Authors
    Yiling Chen; Leo David Wang
    Variables measured
    Proteomics
    Description

    Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. The state-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which convert mass spectra to peptide sequences. Different database search algorithms use distinct search strategies and thus may identify unique PSMs. However, no existing approaches can aggregate all user-specified database search algorithms with guaranteed control on the false discovery rate (FDR) and guaranteed increase in the identified peptides. To fill in this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is universally compatible with all database search algorithms. Notably, under a target FDR threshold, APIR is guaranteed to identify at least as many, if not more, peptides as individual database search algorithms do. Evaluation of APIR on a complex protein standard shows that APIR outpowers individual database search algorithms and guarantees the FDR control. Real-data studies show that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. Note that the APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analysis, e.g., differential gene expression analysis on RNA sequencing data.

  11. Data from: Comparison of False Discovery Rate Control Strategies for Variant...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1 more
    Updated Mar 31, 2017
    Cite
    Moshkovskii, Sergei A.; Gorshkov, Mikhail V.; Karpov, Dmitry S.; Lobas, Anna A.; Ivanov, Mark V. (2017). Comparison of False Discovery Rate Control Strategies for Variant Peptide Identifications in Shotgun Proteogenomics [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001774284
    Dataset updated
    Mar 31, 2017
    Authors
    Moshkovskii, Sergei A.; Gorshkov, Mikhail V.; Karpov, Dmitry S.; Lobas, Anna A.; Ivanov, Mark V.
    Description

    Proteogenomic studies aiming at identification of variant peptides using customized database searches of mass spectrometry data are facing a dilemma of selecting the most efficient database search strategy: A choice has to be made between using combined or sequential searches against reference (wild-type) and mutant protein databases or directly against the mutant database without the wild-type one. Here we called these approaches “all-together”, “one-by-one”, and “direct”, respectively. We share the results of the comparison of these search strategies obtained for large data sets of publicly available proteogenomic data. On the basis of the results of this evaluation, we found that the “all-together” strategy provided, in general, more variant peptide identifications compared with the “one-by-one” approach, while showing similar performance for some specific cases. To validate further the results of this study, we performed a control comparison of the strategies in question using publicly available data for a mixture of the annotated human protein standard UPS1 and E. coli. For these data, both “all-together” and “one-by-one” approaches showed similar sensitivity and specificity of the searches, while the “direct” approach resulted in an increased number of false identifications.

  12. Analysis outputs

    • figshare.com
    csv
    Updated Sep 29, 2025
    Cite
    Jemma Daniel (2025). Analysis outputs [Dataset]. http://doi.org/10.6084/m9.figshare.30147601.v1
    Available download formats: csv
    Dataset updated
    Sep 29, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jemma Daniel
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Outputs are organised into folders:
    • helaqc_results: outputs from evaluation of a model trained and evaluated on the HeLa Single Shot dataset
    • general_results: outputs from evaluation of the general model
    • feature_importance: outputs from feature importance evaluation on the general model, specifically feature permutation and SHAP analysis
    • generalisation: outputs from hold-one-out calibrator generalisation assessment

    Note: the columns mz_array and intensity_array have been removed from the result of the hold-one-out calibrator analysis (generalisation/calibrator_generalisation_results_no_arrays.csv) to save storage space.

  13. Data from: Integrated Proteomic Pipeline Using Multiple Search Engines for a...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1 more
    Updated Aug 30, 2016
    Cite
    Lee, Hyun Kyoung; Lee, Hyoung-Joo; Hwang, Heeyoun; Park, Ji Yeong; Lee, Ju Yeon; Kim, Jin Young; Yates, John R.; Kim, Kwang Hoe; Yoo, Jong Shin; Paik, Young-Ki; Park, Sung-Kyu Robin; Kwon, Kyung-Hoon; Park, Young Mok; Park, Gun Wook; Ji, Eun Sun (2016). Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001520981
    Dataset updated
    Aug 30, 2016
    Authors
    Lee, Hyun Kyoung; Lee, Hyoung-Joo; Hwang, Heeyoun; Park, Ji Yeong; Lee, Ju Yeon; Kim, Jin Young; Yates, John R.; Kim, Kwang Hoe; Yoo, Jong Shin; Paik, Young-Ki; Park, Sung-Kyu Robin; Kwon, Kyung-Hoon; Park, Young Mok; Park, Gun Wook; Ji, Eun Sun
    Description

    In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid-chromatography and mass-spectrometry-based large proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four engrailed steps as follows. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. Finally, we compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral pattern from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).

  14. Computational and Mass-Spectrometry-Based Workflow for the Discovery and...

    • datasetcatalog.nlm.nih.gov
    • acs.figshare.com
    Updated Feb 13, 2016
    Cite
    Bouyssié, David; Van Dorsselaer, Alain; Vandenbrouck, Yves; Bruley, Christophe; Gateau, Alain; Benama, Mohamed; Burel, Alexandre; Jaquinod, Michel; Mouton-Barbosa, Emmanuelle; Burlet-Schiltz, Odile; de Peredo, Anne Gonzalez; Garin, Jérôme; Cianferani, Sarah; Opsomer, Alisson; Carapito, Christine; Garrigues, Luc; Lane, Lydie (2016). Computational and Mass-Spectrometry-Based Workflow for the Discovery and Validation of Missing Human Proteins: Application to Chromosomes 2 and 14 [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001883215
    Dataset updated
    Feb 13, 2016
    Authors
    Bouyssié, David; Van Dorsselaer, Alain; Vandenbrouck, Yves; Bruley, Christophe; Gateau, Alain; Benama, Mohamed; Burel, Alexandre; Jaquinod, Michel; Mouton-Barbosa, Emmanuelle; Burlet-Schiltz, Odile; de Peredo, Anne Gonzalez; Garin, Jérôme; Cianferani, Sarah; Opsomer, Alisson; Carapito, Christine; Garrigues, Luc; Lane, Lydie
    Description

    In the framework of the C-HPP, our Franco-Swiss consortium has adopted chromosomes 2 and 14, coding for a total of 382 missing proteins (proteins for which evidence is lacking at protein level). Over the last 4 years, the French proteomics infrastructure has collected high-quality data sets from 40 human samples, including a series of rarely studied cell lines, tissue types, and sample preparations. Here we described a step-by-step strategy based on the use of bioinformatics screening and subsequent mass spectrometry (MS)-based validation to identify what were up to now missing proteins in these data sets. Screening database search results (85 326 dat files) identified 58 of the missing proteins (36 on chromosome 2 and 22 on chromosome 14) by 83 unique peptides following the latest release of neXtProt (2014-09-19). PSMs corresponding to these peptides were thoroughly examined by applying two different MS-based criteria: peptide-level false discovery rate calculation and expert PSM quality assessment. Synthetic peptides were then produced and used to generate reference MS/MS spectra. A spectral similarity score was then calculated for each pair of reference-endogenous spectra and used as a third criterion for missing protein validation. Finally, LC–SRM assays were developed to target proteotypic peptides from four of the missing proteins detected in tissue/cell samples, which were still available and for which sample preparation could be reproduced. These LC–SRM assays unambiguously detected the endogenous unique peptide for three of the proteins. For two of these, identification was confirmed by additional proteotypic peptides. We concluded that of the initial set of 58 proteins detected by the bioinformatics screen, the consecutive MS-based validation criteria led to propose the identification of 13 of these proteins (8 on chromosome 2 and 5 on chromosome 14) that passed at least two of the three MS-based criteria. 
Thus, a rigorous step-by-step approach combining bioinformatics screening and MS-based validation assays is particularly suitable to obtain protein-level evidence for proteins previously considered as missing. All MS/MS data have been deposited in ProteomeXchange under identifier PXD002131.

  15. Data from: Defining, Comparing, and Improving iTRAQ Quantification in Mass...

    • ebi.ac.uk
    Updated Jul 31, 2017
    Cite
    Henrik J Johansson (2017). Defining, Comparing, and Improving iTRAQ Quantification in Mass Spectrometry Proteomics Data [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD000418
    Dataset updated
    Jul 31, 2017
    Authors
    Henrik J Johansson
    Variables measured
    Proteomics
    Description

    The purpose of this study was to generate a basis for the decision of what protein quantities are reliable and find a way for accurate and precise protein quantification. To investigate this we have used thousands of peptide measurements to estimate variance and bias for quantification by iTRAQ (isobaric tags for relative and absolute quantification) mass spectrometry in complex human samples. A549 cell lysate was mixed in the proportions 2:2:1:1:2:2:1:1, fractionated by high resolution isoelectric focusing and liquid chromatography and analyzed by three mass spectrometry platforms: LTQ Orbitrap Velos, 4800 MALDI-TOF/TOF and 6530 Q-TOF. We have investigated how variance and bias in the iTRAQ reporter ion data are affected by common experimental variables such as sample amount, sample fractionation, fragmentation energy, and instrument platform. Based on this, we have suggested a concept for experimental design and a methodology for protein quantification. By using duplicate samples in each run, each experiment is validated based on its internal experimental variation. The duplicates are used for calculating peptide weights, unique to the experiment, which are used in the protein quantification. By weighting the peptides depending on reporter ion intensity, we can decrease the relative error in quantification at the protein level and assign a total weight to each protein that reflects the protein quantitation confidence. We also demonstrate the usability of this methodology in a cancer cell line experiment as well as in a clinical data set of lung cancer tissue samples. In conclusion, we have in this study developed a methodology for improved protein quantification in shotgun proteomics and introduced a way to assess quantification for proteins with few peptides. The experimental design and developed algorithms decreased the relative protein quantification error in the analysis of complex biological samples.
Data analysis:
LTQ Orbitrap Velos: Proteome Discoverer 1.1 with Mascot 2.2 (Matrix Science) was used for protein identification. Precursor mass tolerance was set to 10 ppm; fragment tolerances of 0.8 Da and 0.015 Da were used for detection in the linear ion trap and the Orbitrap, respectively. Oxidized methionine was set as a dynamic modification, and carbamidomethylation, N-terminal 8plex iTRAQ, and lysyl 8plex iTRAQ as fixed modifications.
4800 MALDI-TOF/TOF: Peptide identification from the MALDI-TOF/TOF data was carried out using the Paragon algorithm in the ProteinPilot 2.0 software package (Applied Biosystems). Default settings for a 4800 instrument were used (i.e. no manual settings for mass tolerance were given). The following parameters were selected in the analysis method: iTRAQ 8plex peptide labeled as sample type, IAA as alkylating agent of cysteine, trypsin as digesting enzyme, 4800 as instrument, gel based ID and urea denaturation as special factors, biological modifications as ID focus, and thorough ID as search effort.
6530 Q-TOF: Peptide identification from the Q-TOF data was carried out using the Spectrum Mill Protein Identification software (Agilent). Data was extracted between MH+ 600 and 4000 Da (Agilent's definition). Trypsin was used as digesting enzyme, and parent and daughter ion tolerances were set to 25 and 50 ppm, respectively. IAA for cysteine and iTRAQ partial-mix (N-term, K) were set as fixed modifications, while oxidized methionine was set as a variable modification.
Database and peptide cut-off for all searches: Searches were performed against the IPI database (build 3.64) limited to human sequences, allowing 2 missed cleavages. False discovery rate (FDR) was estimated by searching the data against a database consisting of both forward and reversed sequences and set to <1% at the protein level using MAYU. Peptides corresponding to a <1% protein FDR were used in the calculations.
Peptide and protein identification using Mascot for comparison between instruments: Peptide identifications were performed using Mascot Daemon 2.3.2 with Mascot 2.4 for fractions 32 to 36 from IPG-IEF with 400 µg loaded peptides. Carbamidomethylation (CAM) of cysteine was set as a fixed modification, oxidized methionine as a variable modification, and iTRAQ 8plex as quantification for all searches. MALDI-TOF/TOF search settings: parent and daughter ion tolerances were set to 150 ppm and 0.2 Da, respectively. LTQ Orbitrap search settings: precursor mass tolerance was set to 10 ppm, and fragment tolerances of 0.8 Da and 0.015 Da were used for data generated in the linear ion trap and the Orbitrap, respectively. Q-TOF search settings: parent and daughter ion tolerances were set to 25 and 50 ppm, respectively.
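The search settings above mix relative (ppm) and absolute (Da) tolerances. As a rough illustration of how a ppm tolerance translates into an absolute window, the sketch below converts a tolerance in ppm into lower and upper m/z bounds. The function name and the example m/z value are hypothetical; only the 10 ppm figure comes from the description.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance.

    A tolerance of tol_ppm parts per million around mz corresponds to an
    absolute half-width of mz * tol_ppm / 1e6.
    """
    if mz <= 0 or tol_ppm < 0:
        raise ValueError("mz must be positive and tol_ppm non-negative")
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


if __name__ == "__main__":
    # 10 ppm around a hypothetical precursor at m/z 1000 is +/- 0.01.
    low, high = ppm_window(1000.0, 10)
    print(low, high)
```

Because the window scales with m/z, a fixed ppm tolerance is tighter in absolute terms for light ions than for heavy ones, whereas a tolerance in Da is the same everywhere.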

  16. e

    Mass spectrometry-based quantitative proteomics of whole cell lysate from...

    • ebi.ac.uk
    • data.niaid.nih.gov
    AnnSofi Sandberg, Mass spectrometry-based quantitative proteomics of whole cell lysate from breast cancer cell line MCF7 spiked with 57 protein standards [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD000578
    Explore at:
    Authors
    AnnSofi Sandberg
    Variables measured
    Proteomics
    Description

    Here we present quantitative proteomics data used in the evaluation of quantitative accuracy. A human cell line, MCF7, was split into 9 aliquots that were spiked with a dilution series of 57 protein standards of known amounts spanning 5 orders of magnitude. The protein extracts were trypsinized, and the peptides were analysed by LC-MS using either a label-free or a label-based (TMT 6-plex and iTRAQ 8-plex) quantification approach. The iTRAQ- and TMT-labelled samples were co-analysed and separated by HiRIEF (high resolution isoelectric focusing) prior to LC-MS. Raw MS data were identified and quantified using Proteome Discoverer 1.3.0.339 (Thermo Fisher Scientific Inc.) or, for the label-free data, MaxQuant (version 1.2.0.18). For both protein identification and quantification, at least 1 unique peptide (i.e. a peptide that occurs in no more than one database entry) was required. The false discovery rate (FDR) for peptide identification was set to 5% in all analyses. For the iTRAQ- and TMT-labelled samples, all MS/MS spectra were searched by SEQUEST combined with the Percolator algorithm (version 2.0) for PSM search optimization. Searches were performed against a custom-made database consisting of SwissProt human sequences (uniprot.org 2012-01-17, 20242 entries) and the spiked-in protein standards (57 protein sequences). Peptide FDR was calculated by a target-decoy approach.

  17. e

    Discovery and identification of potential markers of Childhood Absence...

    • ebi.ac.uk
    • nde-dev.biothings.io
    Updated Apr 18, 2013
    Melanie Lagarrigue (2013). Discovery and identification of potential markers of Childhood Absence Epilepsy by MALDI imaging mass spectrometry [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD000009
    Explore at:
    Dataset updated
    Apr 18, 2013
    Authors
    Melanie Lagarrigue
    Variables measured
    Proteomics
    Description

    Top-down identification of proteins detected by MALDI imaging. Childhood absence epilepsy is a prototypic form of generalized nonconvulsive epilepsy characterized by short impairments of consciousness concomitant with synchronous and bilateral spike-and-wave discharges in the electroencephalogram. For scientists in this field, the BS/Orl and BR/Orl mouse lines, derived from a genetic selection, constitute an original "mirror" mouse model of absence epilepsy. The potential of MALDI imaging mass spectrometry (IMS) for the discovery of potential biomarkers is increasingly recognized. Statistical analysis tools specifically adapted to IMS data sets and methods for the identification of detected proteins play an essential role. In this study, a new cross-classification comparative design using a combined discrete wavelet transformation-support vector machine classification was developed to discriminate spectra of brain sections of BS/Orl and BR/Orl mice. Nineteen m/z ratios were thus highlighted as potential markers with very high recognition rates (87-99%). Seven of these potential markers were identified using a top-down approach, in particular a fragment of Synapsin-I. This protein is already suspected to be involved in epilepsy. Immunohistochemistry and Western blot experiments confirmed the differential expression of Synapsin-I observed by IMS, thus tending to validate our approach. Functional assays are being performed to confirm the involvement of Synapsin-I in the mechanisms underlying childhood absence epilepsy. Data processing and bioinformatics: The ProSight PC 2.0 software (Thermo Scientific) was used to create the peak list from .raw data. RAW files were processed using the Xtract algorithm, which interprets resolved isotopic distributions and outputs neutral mass values.
In this study, two search modes were used: “absolute mass” search for the identification of full-length proteins and “biomarker” search for the identification of protein fragments. For absolute mass search, ProSight PC restricts the protein database to proteins matching the mass of the precursor, whereas for biomarker search, the protein database is restricted to each protein subsequence matching the mass of the precursor. In order to determine the false discovery rate (FDR), the FASTA format Uniprot database (release 2011_08, 531473 sequences, 188463640 residues) filtered with the Mus musculus taxonomy (16401 sequences) was concatenated with a decoy database consisting of randomized protein sequences. The resulting database was imported into the ProSight PC 2.0 software and configured for top-down analysis. For each analyzed fraction, the FDR was calculated as follows: FDR = 2 × FP/(FP + TP), where FP and TP are the number of matches from the decoy and target database, respectively. All identifications reported here correspond to an FDR lower than 5%. MS/MS spectra were searched against the database using the absolute mass search mode with a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 15 ppm on monoisotopic masses. The identification score is based on the number of observed fragment ions matching the fragment ion tolerance. A second absolute mass search was performed with a loose precursor ion tolerance of 1000 Da to reveal post-translational modifications. The Sequence Gazer tool was used to manually check proteins identified with a significant score but with a difference between the theoretical precursor ion mass and the observed precursor ion mass. Post-translational modification(s) can thus be localized on the sequence of the protein from the mass shift(s) observed for b and/or y ions.
MS/MS spectra were then searched using the biomarker search mode with a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 15 ppm on monoisotopic masses to identify fragments of intact proteins present in the database. A new database was then constructed from all protein sequences identified with the previous searches in order to perform a biomarker search with a precursor ion mass tolerance of 100 Da and a fragment ion mass tolerance of 15 ppm. Indeed, such precursor ion mass tolerance in the biomarker search mode is unusable for large databases such as the Uniprot database filtered with the Mus musculus taxonomy (16401 sequences) and requires the use of small databases. This search was performed to identify modified fragments of previously identified proteins that could not have been identified with the biomarker search performed with a precursor ion mass tolerance of 10 ppm. The Sequence Gazer tool was used to manually localize post-translational modification(s) on the sequence of the protein fragments due to mass shift(s) observed for b and/or y ions.
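The FDR formula quoted in this description, FDR = 2 × FP/(FP + TP), can be written as a small function. This is a minimal sketch that follows the description's own definitions (FP is the number of decoy matches, TP the number of target matches); the function name and the example counts are hypothetical and not taken from the dataset.

```python
def decoy_fdr(n_decoy_matches, n_target_matches):
    """FDR for a concatenated target-decoy search: 2 * FP / (FP + TP).

    FP is the number of matches to the decoy database and TP the number of
    matches to the target database, as defined in the description. The factor
    of 2 reflects the expectation that as many incorrect matches fall on
    target sequences as on decoy sequences.
    """
    total = n_decoy_matches + n_target_matches
    if total == 0:
        raise ValueError("no matches: FDR is undefined")
    return 2 * n_decoy_matches / total


if __name__ == "__main__":
    # Hypothetical counts: 2 decoy matches and 98 target matches.
    fdr = decoy_fdr(2, 98)
    print(f"{fdr:.2%}")                      # 4.00%
    print("passes 5% cut-off:", fdr < 0.05)  # True
```

With these counts the estimate falls below the 5% threshold the description reports for accepted identifications.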

  18. f

    Data from: DirectMS1Quant: Ultrafast Quantitative Proteomics with MS/MS-Free...

    • datasetcatalog.nlm.nih.gov
    • acs.figshare.com
    Updated Sep 15, 2022
    Gorshkov, Vladimir; Gorshkov, Mikhail V.; Solovyeva, Elizaveta M.; Tarasova, Irina A.; Bubis, Julia A.; Levitsky, Lev I.; Ivanov, Mark V.; Lipatova, Anastasiya V.; Kjeldsen, Frank (2022). DirectMS1Quant: Ultrafast Quantitative Proteomics with MS/MS-Free Mass Spectrometry [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000291525
    Explore at:
    Dataset updated
    Sep 15, 2022
    Authors
    Gorshkov, Vladimir; Gorshkov, Mikhail V.; Solovyeva, Elizaveta M.; Tarasova, Irina A.; Bubis, Julia A.; Levitsky, Lev I.; Ivanov, Mark V.; Lipatova, Anastasiya V.; Kjeldsen, Frank
    Description

    Recently, we presented the DirectMS1 method of ultrafast proteome-wide analysis based on minute-long LC gradients and MS1-only mass spectra acquisition. Currently, the method provides a depth of human cell proteome coverage of 2500 proteins at a 1% false discovery rate (FDR) when using 5 min LC gradients and 7.3 min runtime in total. While the standard MS/MS approaches provide 4000–5000 protein identifications within a couple of hours of instrumentation time, we advocate here that the higher number of identified proteins does not always translate into better quantitation quality of the proteome analysis. To further elaborate on this issue, we performed a one-on-one comparison of quantitation results obtained using DirectMS1 with three popular MS/MS-based quantitation methods: label-free (LFQ) and tandem mass tag (TMT) quantitation, both based on data-dependent acquisition (DDA), and data-independent acquisition (DIA). For comparison, we performed a series of proteome-wide analyses of well-characterized (ground truth) and biologically relevant samples, including a mix of UPS1 proteins spiked at different concentrations into an Escherichia coli digest used as a background and a set of glioblastoma cell lines. MS1-only data was analyzed using a novel quantitation workflow called DirectMS1Quant developed in this work. The results obtained in this study demonstrated comparable quantitation efficiency of 5 min DirectMS1 with both TMT and DIA methods, yet the latter two utilized a 10–20-fold longer instrumentation time.

  19. e

    Single-shot top-down proteomics with capillary zone...

    • ebi.ac.uk
    • data-staging.niaid.nih.gov
    Updated Oct 30, 2017
    Liangliang Sun (2017). Single-shot top-down proteomics with capillary zone electrophoresis-electrospray ionization-tandem mass spectrometry for identification of nearly 600 Escherichia coli proteoforms [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD007273
    Explore at:
    Dataset updated
    Oct 30, 2017
    Authors
    Liangliang Sun
    Variables measured
    Proteomics
    Description

    Capillary zone electrophoresis-electrospray ionization-tandem mass spectrometry (CZE-ESI-MS/MS) has been recognized as an invaluable platform for top-down proteomics. However, the scale of top-down proteomics from CZE-MS/MS is still limited due to the low loading capacity and narrow separation window of CZE. In this work, for the first time we systematically evaluated the dynamic pH junction method for focusing of intact proteins during CZE-MS. The optimized dynamic pH junction based CZE-MS/MS system approached a 1-µL loading capacity, a 90-min separation window and high peak capacity (~280) for separation of the Escherichia coli proteome. The results represent the largest loading capacity and the highest peak capacity of CZE for top-down characterization of complex proteomes. About 2,800 proteoform-spectrum matches, nearly 600 proteoforms, and 200 proteins were identified from an Escherichia coli lysate by single-shot CZE-MS/MS with a spectrum-level false discovery rate (FDR) of less than 1%. The number of proteoforms is over three times higher than that from previous single-shot CZE-MS/MS studies.

  20. FDR-controlled metabolite annotation for high-resolution imaging mass...

    • data.niaid.nih.gov
    xml
    Updated Apr 10, 2017
    Andrew Palmer (2017). FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry imaging MS (MALDI imaging assay) [Dataset]. https://data.niaid.nih.gov/resources?id=mtbls313
    Explore at:
    Available download formats: xml
    Dataset updated
    Apr 10, 2017
    Dataset provided by
    EMBL
    Authors
    Andrew Palmer
    Variables measured
    Metabolomics, mouse status
    Description

    High-mass-resolution imaging mass spectrometry promises to localize hundreds of metabolites directly from tissues, cell cultures, and agar plates with cellular resolution, but is hampered by the lack of bioinformatics for automated metabolite identification. We developed the first bioinformatics framework for False Discovery Rate (FDR)-controlled metabolite annotation for high-mass-resolution imaging mass spectrometry (https://github.com/alexandrovteam/pySM) introducing a Metabolite-Signal Match (MSM) score and a target-decoy FDR-estimate for spatial metabolomics. MALDI-FTICR datasets acquired from wild type adult mouse brain were used for the development of spatial metabolomics annotation bioinformatics. This study together with MTBLS317 and MTBLS378 provide the accompanying data for FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry.
