100+ datasets found
  1. Introduction to Computational Proteomics

    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Jacques Colinge; Keiryn L Bennett (2023). Introduction to Computational Proteomics [Dataset]. http://doi.org/10.1371/journal.pcbi.0030114
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jacques Colinge; Keiryn L Bennett
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction to Computational Proteomics

  2. Comparison of computational and proteomics datasets for the secretomes of T....

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    ZhongQiang Chen; Omar S. Harb; David S. Roos (2023). Comparison of computational and proteomics datasets for the secretomes of T. gondii and P. falciparum. [Dataset]. http://doi.org/10.1371/journal.pone.0003611.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    ZhongQiang Chen; Omar S. Harb; David S. Roos
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of computational and proteomics datasets for the secretomes of T. gondii and P. falciparum.

  3. Data from: SugarPy facilitates the universal, discovery-driven analysis of...

    • omicsdi.org
    • ebi.ac.uk
    xml
    Cite
    Stefan Schulze, SugarPy facilitates the universal, discovery-driven analysis of intact glycopeptides [Dataset]. https://www.omicsdi.org/dataset/pride/PXD017345
    Available download formats: xml
    Authors
    Stefan Schulze
    Variables measured
    Proteomics
    Description

    Protein glycosylation is a complex post-translational modification with crucial cellular functions in all domains of life. Currently, large-scale glycoproteomics approaches rely on glycan database dependent algorithms and are thus unsuitable for discovery-driven analyses of glycoproteomes. Therefore, we devised SugarPy, a glycan database independent Python module, and validated it on the glycoproteome of human breast milk. We further demonstrated its applicability by analyzing glycoproteomes with uncommon glycans stemming from the green alga Chlamydomonas reinhardtii and the archaeon Haloferax volcanii. Finally, SugarPy facilitated the novel characterization of glycoproteins from Cyanidioschyzon merolae.

  4. MePPi: A complete and flexible workflow for metaproteomics data analyses

    • zenodo.org
    • search.datacite.org
    bin
    Updated Mar 26, 2020
    Cite
    Henning Schiebenhoefer; Kay Schallert; Thilo Muth; Stephan Fuchs (2020). MePPi: A complete and flexible workflow for metaproteomics data analyses [Dataset]. http://doi.org/10.5281/zenodo.3551765
    Available download formats: bin
    Dataset updated
    Mar 26, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Henning Schiebenhoefer; Kay Schallert; Thilo Muth; Stephan Fuchs
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for an exemplary metaproteomics data analysis with the MetaProteomeAnalyzer (MPA) and Prophane software tools. Data is from the PRIDE dataset PXD010550.

    Files include:

  5. Additional file 1: of Geena 2, improved automated analysis of MALDI/TOF mass...

    • springernature.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Paolo Romano; Aldo Profumo; Mattia Rocco; Rosa Mangerini; Fabio Ferri; Angelo Facchiano (2023). Additional file 1: of Geena 2, improved automated analysis of MALDI/TOF mass spectra [Dataset]. http://doi.org/10.6084/m9.figshare.10038149.v1
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Paolo Romano; Aldo Profumo; Mattia Rocco; Rosa Mangerini; Fabio Ferri; Angelo Facchiano
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example input file for Geena 2. This example file can be used for testing purposes. It includes 12 MS spectra generated by MALDI/TOF from four biological samples in the context of a real experiment. Three spectra were generated for each sample. The format of the file is described in detail, with examples, in the manuscript and in the information file on Input/Output data formats on the web site. (TXT 26 kb)

  6. Data from: DIA-Umpire: comprehensive computational framework for data...

    • data.niaid.nih.gov
    • ebi.ac.uk
    xml
    Updated Jan 21, 2015
    Cite
    Chih-Chiang Tsou; Alexey I. Nesvizhskii (2015). DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics [Dataset]. https://data.niaid.nih.gov/resources?id=pxd001587
    Available download formats: xml
    Dataset updated
    Jan 21, 2015
    Dataset provided by
    University of Michigan
    Authors
    Chih-Chiang Tsou; Alexey I. Nesvizhskii
    Variables measured
    Multiomics, Proteomics
    Description

    This dataset consists of 44 raw MS files: 27 DIA (SWATH) and 15 DDA runs acquired on a TripleTOF 5600, and two raw mass spectrometry files acquired on a Q Exactive. The composition of the dataset is described in the manuscript by Tsou et al., "DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics", Nature Methods, in press. Raw files are deposited here in ProteomeXchange and are associated with the DIA-Umpire processed data. All DIA-Umpire processed results for each sample, together with DDA results, are deposited in separate folders. Also see the "DataSampleID.xlsx" associated with this Readme file. Internal reference from the Gingras lab ProHits implementation: Project 94, Export version VS2 (Tsou_DIA-Umpire)

  7. Data from: A Network Module for the Perseus Software for Computational...

    • figshare.com
    • acs.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Jan Daniel Rudolph; Jürgen Cox (2023). A Network Module for the Perseus Software for Computational Proteomics Facilitates Proteome Interaction Graph Analysis [Dataset]. http://doi.org/10.1021/acs.jproteome.8b00927.s002
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    ACS Publications
    Authors
    Jan Daniel Rudolph; Jürgen Cox
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Proteomics data analysis strongly benefits from not studying single proteins in isolation but taking their multivariate interdependence into account. We introduce PerseusNet, the new Perseus network module for the biological analysis of proteomics data. Proteomics is commonly used to generate networks, e.g., with affinity purification experiments, but networks are also used to explore proteomics data. PerseusNet supports the biomedical researcher for both modes of data analysis with a multitude of activities. For affinity purification, a volcano-plot-based statistical analysis method for network generation is featured which is scalable to large numbers of baits. For posttranslational modifications of proteins, such as phosphorylation, a collection of dedicated network analysis tools helps in elucidating cellular signaling events. Co-expression network analysis of proteomics data adopts established tools from transcriptome co-expression analysis. PerseusNet is extensible through a plugin architecture in a multi-lingual way, integrating analyses in C#, Python, and R, and is freely available at http://www.perseus-framework.org.

  8. multiFLEX-LF: A Computational Approach to Quantify the Modification...

    • data.niaid.nih.gov
    • acs.figshare.com
    xml
    Updated Feb 11, 2022
    Cite
    Christoph Schlaffner; Hanno Steen (2022). multiFLEX-LF: A Computational Approach to Quantify the Modification Stoichiometries in Label-free Proteomics Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=pxd027970
    Available download formats: xml
    Dataset updated
    Feb 11, 2022
    Dataset provided by
    Boston Children's Hospital; Harvard Medical School
    Department of Pathology, Boston Children's Hospital, USA; Department of Pathology, Harvard Medical School, USA
    Authors
    Christoph Schlaffner; Hanno Steen
    Variables measured
    Proteomics
    Description

    In high-throughput LC-MS/MS-based proteomics, information about the presence and stoichiometry of post-translational modifications is normally not readily available. To overcome this problem we developed multiFLEX-LF, a computational tool that builds upon FLEXIQuant and FLEXIQuant-LF, which detect modified peptides and quantify their modification extent by monitoring the differences between observed and expected intensities of the unmodified peptides. To this end, multiFLEX-LF relies on robust linear regression to calculate the modification extent of a given peptide relative to a within-study reference. multiFLEX-LF can analyze entire label-free discovery proteomics datasets. Furthermore, to analyze modification dynamics and co-regulated modifications, the peptides of all proteins are hierarchically clustered based on their computed relative modification scores. To demonstrate the versatility of multiFLEX-LF we applied it on a cell-cycle time series dataset acquired using data-independent acquisition. The clustering of the peptides highlighted several groups of peptides with different modification dynamics across the four analyzed time points providing evidence of the kinases involved in the cell-cycle. Overall, multiFLEX-LF enables fast identification of potentially differentially modified peptides and quantification of their differential modification extent in large datasets. Additionally, multiFLEX-LF can drive large-scale investigation of modification dynamics of peptides in time series and case-control studies. multiFLEX-LF is available at https://gitlab.com/SteenOmicsLab/multiflex-lf.
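
    The description above mentions robust linear regression of observed against expected peptide intensities. The following is only an illustrative sketch of that general idea, not the multiFLEX-LF implementation: it assumes a Theil-Sen estimator as the robust fit, and the function names, the toy intensities, and the score definition (observed divided by fitted intensity) are all hypothetical choices made for this example.

```python
import numpy as np


def theil_sen_fit(x, y):
    """Robust line fit: median of pairwise slopes, then median residual offset."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)   # all index pairs with i < j
    dx = x[j] - x[i]
    keep = dx != 0                         # skip pairs with identical x
    slope = np.median((y[j] - y[i])[keep] / dx[keep])
    intercept = np.median(y - slope * x)
    return slope, intercept


def relative_scores(reference, sample):
    """Observed / expected intensity per peptide (hypothetical score)."""
    reference = np.asarray(reference, dtype=float)
    sample = np.asarray(sample, dtype=float)
    slope, intercept = theil_sen_fit(reference, sample)
    expected = slope * reference + intercept
    return sample / expected


# Toy data (assumed): six unmodified-peptide intensities of one protein.
# The sample is twice the reference, except the fourth peptide, which is
# depleted as it would be if part of it carried a modification.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sample = [2.0, 4.0, 6.0, 4.0, 10.0, 12.0]
scores = relative_scores(reference, sample)
```

    In this toy case the single depleted peptide does not pull the fitted line, so it stands out with a score well below those of the other peptides, which is the behaviour a robust fit is chosen for.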

  9. Original TIMS-TOF pro BSA data sets for MaxLynx, a novel computational...

    • ebi.ac.uk
    Cite
    Sule Yilmaz, Original TIMS-TOF pro BSA data sets for MaxLynx, a novel computational proteomics workflow for XL-MS integrated into the MaxQuant environment. [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD027161
    Authors
    Sule Yilmaz
    Variables measured
    Proteomics
    Description

    Cross-linking combined with mass spectrometry (XL-MS) provides a wealth of information about the 3D structure of proteins and their interactions. We introduce MaxLynx, a novel computational proteomics workflow for XL-MS integrated into the MaxQuant environment and here we have tested the performance of MaxLynx on the data sets that were generated by using a Bruker timsTOF pro instrument.

  10. Computational Biology Industry Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated May 4, 2025
    Cite
    Market Report Analytics (2025). Computational Biology Industry Report [Dataset]. https://www.marketreportanalytics.com/reports/computational-biology-industry-96013
    Available download formats: pdf, doc, ppt
    Dataset updated
    May 4, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The computational biology market is experiencing robust growth, driven by the increasing adoption of advanced technologies like artificial intelligence (AI) and machine learning (ML) in drug discovery and development. The market's Compound Annual Growth Rate (CAGR) of 13.33% from 2019 to 2024 indicates a significant upward trajectory, projected to continue into the forecast period (2025-2033). Key drivers include the rising prevalence of chronic diseases necessitating faster and more efficient drug development processes, the decreasing cost of high-throughput sequencing and data storage, and the increasing availability of large biological datasets fueling advanced computational analyses. The market segmentation reveals strong demand across various applications, including cellular and biological simulations (particularly in genomics and proteomics), drug discovery and disease modeling (with target identification and validation being prominent areas), and preclinical drug development (focused on pharmacokinetics and pharmacodynamics). Clinical trial applications are also significant, spanning Phases I, II, and III. Software tools like databases, analysis software, and specialized infrastructure are critical components, further segmented by service type (in-house vs. contract) and end-user (academic institutions and commercial entities). North America currently holds a significant market share, but Asia-Pacific is projected to witness substantial growth owing to increasing investments in research and development and the rising adoption of computational biology techniques in emerging economies. The competitive landscape is dynamic, with several major players such as Dassault Systèmes SE, Certara, and Schrödinger contributing to innovation. However, the market also includes numerous smaller, specialized companies focusing on niche applications or specific technologies. 
This competitive landscape encourages continuous innovation, driving the development of more sophisticated software, improved algorithms, and enhanced analytical capabilities. While data limitations exist regarding precise market size figures, extrapolating from the provided CAGR and industry reports suggests a substantial market value currently, exceeding several billion dollars and poised for continued expansion. The focus on precision medicine and personalized therapies further strengthens the long-term growth potential of the computational biology market. Challenges include the complexity of biological systems, the need for robust data validation, and the ethical considerations associated with the use of AI and big data in healthcare. Recent developments include: February 2023: The Centre for Development of Advanced Computing (C-DAC) launched two software tools critical for research in life sciences. Integrated Computing Environment, one of the products, is an indigenous cloud-based genomics computational facility for bioinformatics that integrates ICE-cube, a hardware infrastructure, and ICE flakes. This software will help securely store and analyze petascale to exascale genomics data. January 2023: Insilico Medicine, a clinical-stage, end-to-end artificial intelligence (AI)-driven drug discovery company, launched the 6th generation Intelligent Robotics Lab to accelerate its AI-driven drug discovery. The fully automated AI-powered robotics laboratory performs target discovery, compound screening, precision medicine development, and translational research. Key drivers for this market are: Increase in Bioinformatics Research, Increasing Number of Clinical Studies in Pharmacogenomics and Pharmacokinetics; Growth of Drug Designing and Disease Modeling. Potential restraints include: Increase in Bioinformatics Research, Increasing Number of Clinical Studies in Pharmacogenomics and Pharmacokinetics; Growth of Drug Designing and Disease Modeling. 
Notable trends are: Industry and Commercials Sub-segment is Expected to hold its Highest Market Share in the End User Segment.
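
    The description quotes a Compound Annual Growth Rate without giving a market size. As a small worked illustration of what a CAGR figure means, the sketch below uses the standard definition, (end / start)^(1 / years) - 1. The function names and the base value of 1.0 (an index, not a dollar figure) are assumptions for this example, and the 13.33% rate is applied to the forecast period purely hypothetically.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value after compounding `rate` once per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# Hypothetical index: a market normalised to 1.0 in the base year,
# compounded at the quoted 13.33% per year over an assumed 8-year span.
index_after_8_years = project(1.0, 0.1333, 8)

# Recovering the rate from the two endpoints gives back 13.33%.
recovered_rate = cagr(1.0, index_after_8_years, 8)
```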

  11. Data from: decoupleR: Ensemble of computational methods to infer biological...

    • data.niaid.nih.gov
    Updated Nov 4, 2021
    Cite
    Daniel Dimitrov (2021). decoupleR: Ensemble of computational methods to infer biological activities from omics data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5644862
    Dataset updated
    Nov 4, 2021
    Dataset provided by
    Celina Geiss
    Jesús Vélez
    Daniel Dimitrov
    Ricardo O. Ramirez Flores
    Jana Braunger
    Sophia Müller-Dott
    Julio Saez-Rodriguez
    Petr Taus
    Christian H. Holland
    Pau Badia-i-Mompel
    Aurélien Dugourd
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used decoupleR to evaluate the performance of individual methods by recovering perturbed transcription factors (TFs) from a curation of single-gene perturbation experiments (Holland et al., 2020). As a resource we used DoRothEA, a gene regulatory network linking TFs to target genes by their mode of regulation (Garcia-Alonso et al., 2019). Perturbation experiments where the targeted regulator was not in DoRothEA were removed. After filtering, this dataset is composed of gene expression data from 92 knockdown and overexpression experiments of 40 unique TFs in human cells. Additionally, we tested the performance of decoupleR on phospho-proteomic data. For this, we filtered in a similar fashion a curated set of knockdown and overexpression single-kinase perturbation experiments, obtaining 63 experiments including 14 unique kinases, and applied a weighted resource from the same publication that links kinases to their target phosphosites (Hernandez-Armenta et al., 2017). For the transcriptomic dataset, differential expression analysis was performed with limma (Ritchie et al., 2015) and the resulting t-values were used as input. For the phospho-proteomics, the quantile-normalized log2-fold changes from different studies were used to make them comparable.
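
    The last sentence above refers to quantile-normalized log2-fold changes. As a hedged illustration, the sketch below shows the textbook form of quantile normalization (each column is mapped onto the mean of the sorted columns). It is not taken from the decoupleR code base; the function name and the toy matrix are assumptions, and ties are broken by position rather than averaged.

```python
import numpy as np


def quantile_normalize(matrix):
    """Give every column the same distribution: the row-wise mean of the sorted columns."""
    m = np.asarray(matrix, dtype=float)
    # Rank of each value within its column (0 = smallest); stable sort for ties.
    order = np.argsort(m, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean of the k-th smallest values across columns.
    mean_sorted = np.sort(m, axis=0).mean(axis=1)
    return mean_sorted[ranks]


# Toy log2-fold changes (assumed): rows are phosphosites, columns are studies.
lfc = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(lfc)
```

    After this step every column contains the same set of values, so differences in scale between studies no longer dominate downstream comparisons, while the within-study ordering of the sites is preserved.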

  12. Unbiased Proteomic Analysis for Real-Life Identification...

    • ebi.ac.uk
    Updated Mar 7, 2025
    Cite
    Didier Vertommen (2025). Unbiased Proteomic Analysis for Real-Life Identification of Podocyte Antigens and Disease Mechanisms in Membranous Nephropathy [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD054062
    Dataset updated
    Mar 7, 2025
    Authors
    Didier Vertommen
    Variables measured
    Proteomics
    Description

    Background: In recent years, an innovative strategy using laser microdissection and mass spectrometry markedly expanded the landscape of antigens associated with membranous nephropathy (MN). Specific associations with phenotypes, diseases and sometimes reversible triggers led to a novel antigen-based classification of MN, paving the way for precision medicine and stressing the need for more routine use of proteomics in MN. Methods: To explore the proteomic landscape of human glomeruli and identify podocyte antigens and disease mechanisms in MN, we expanded the original technique to an integrative approach combining laser capture microdissection, next-generation mass spectrometry and computational analysis. Next to conventional data-dependent acquisition (DDA), we used and assessed the diagnostic yield of the more comprehensive data-independent acquisition (DIA) mass spectrometry, which enables the detection and quantification of every peptide in a sample irrespective of its level of abundance or m/z value. Our proteomic pipeline was applied to residual material from kidney biopsies in 64 individuals, including 31 healthy controls; 5 disease controls; 5 PLA2R-associated MN; and 23 PLA2R-negative MN. Results: Unbiased analyses confirmed the significant enrichment in PLA2R, IgG4 and complement proteins in glomeruli from patients with PLA2R-MN compared with healthy and disease controls, while molecular characterization of complement fragments provided evidence for complement activation in PLA2R-MN. Compared to DDA, DIA mass spectrometry increased the number of glomerular proteins (~3800 vs. ~1200) identified in healthy glomeruli; allowed the detection of all known antigens except NELL1 in normal glomeruli; and increased the detection rate of podocyte antigens from 50% to >80% in PLA2R-negative MN. 
Conclusions: This proof-of-concept study suggests that an integrative approach combining laser microdissection, DIA mass spectrometry and computational biology is a powerful tool, with translational potential, to identify podocyte antigens and unravel disease mechanisms in MN.

  13. Morpheus Spectral Counter - Morpheus spectral counter: a computational tool...

    • ebi.ac.uk
    • data.niaid.nih.gov
    Cite
    David C Gemperline, Morpheus Spectral Counter - Morpheus spectral counter: a computational tool for label-free quantitative mass spectrometry using the morpheus search engine [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD003002
    Authors
    David C Gemperline
    Variables measured
    Proteomics
    Description

    Label-free quantitative mass spectrometry (MS) based on the Normalized Spectral Abundance Factor (NSAF) has emerged as a simple and reasonably robust method to determine the relative abundance of individual proteins within complex mixtures. Here, we describe Morpheus Spectral Counter (MSpC) as the first computational tool that directly calculates NSAF values from output obtained from Morpheus, a fast, open-source, peptide-MS/MS matching engine compatible with high-resolution mass instruments. NSAF has distinct advantages over other MS-based quantification methods, including a higher dynamic range as compared to isobaric tags, no requirement to align and re-extract MS1 peaks, and increased speed. MSpC features an easy to use graphic user interface that additionally calculates both distributed and unique NSAF values to permit analyses of both protein families and isoforms/proteoforms. MSpC determinations of protein concentration were linear over several orders of magnitude based on the analysis of several high-mass accuracy datasets either obtained from the Proteomics Identifications Repository or generated de novo with total cell extracts spiked with purified Arabidopsis 20S proteasomes. The MSpC software was developed in C# and is open sourced under a permissive license with the code made available at http://dcgemperline.github.io/Morpheus_SpC/.
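
    For readers unfamiliar with NSAF, the sketch below computes it from its usual definition: each protein's spectral count is divided by its length, and the result is divided by the sum of those ratios over all proteins in the run. This is not the MSpC code (which the description says is written in C#), and it does not implement MSpC's distributed or unique NSAF variants; the function name, protein identifiers, counts, and lengths are made up for illustration.

```python
def nsaf(spectral_counts, lengths):
    """Normalized Spectral Abundance Factor per protein.

    spectral_counts: dict mapping protein id -> number of matched spectra
    lengths:         dict mapping protein id -> protein length in residues
    """
    # Spectral abundance factor: counts corrected for protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were counted for any protein")
    # Normalize so that the values of one run sum to 1.
    return {p: value / total for p, value in saf.items()}


# Hypothetical run with two proteins.
counts = {"protA": 10, "protB": 20}
lengths = {"protA": 100, "protB": 400}
values = nsaf(counts, lengths)
```

    In this toy run protB has twice as many spectra as protA but is four times longer, so after length correction protA receives the larger NSAF value.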

  14. Visualization of graphical analysis results: Temporal dynamics of the...

    • explore.openaire.eu
    Updated Mar 5, 2023
    Cite
    Nicole R. Gay; David Amar; MoTrPAC Study Group (2023). Visualization of graphical analysis results: Temporal dynamics of the multi-omic response to endurance exercise training across tissues [Dataset]. http://doi.org/10.5281/zenodo.7703294
    Dataset updated
    Mar 5, 2023
    Authors
    Nicole R. Gay; David Amar; MoTrPAC Study Group
    Description

    These tissue-level multi-omic graphical analysis reports are provided as additional data for the manuscript “Temporal dynamics of the multi-omic response to endurance exercise training across tissues” (MoTrPAC Study Group, bioRxiv, 2022). Extensive background is included in each report. Briefly, we used a graphical clustering approach to define and visualize the temporal dynamics of molecular analytes regulated by endurance exercise training at multiple training time points in male and female rats across many data types ("omes") and tissues. The objective of these multi-omic reports is to share representations of >34,000 training-regulated molecular features in interactive HTML reports that allow researchers to extract meaningful biology from a complex dataset. Each report presents a summary of the significantly training-regulated features across omes in a specific tissue and the corresponding graphical analysis results, as well as features and pathway enrichment results corresponding to the largest graphical clusters (nodes, edges, and paths) for that tissue. A graphical cluster is a group of training-regulated features that share temporal behavior at some point during the training time course. These multi-omic reports are generated using data and functions available through the MotrpacRatTraining6mo R package. Install this R package to explore the data yourself!

  15. A PTM-centric Proteome Informatic Pipeline for Monitoring Post-Translational...

    • ebi.ac.uk
    Updated Jun 5, 2025
    Cite
    Amir Ata Saei (2025). A PTM-centric Proteome Informatic Pipeline for Monitoring Post-Translational Modifications by PEIMAN2 [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD037681
    Dataset updated
    Jun 5, 2025
    Authors
    Amir Ata Saei
    Variables measured
    Proteomics
    Description

    Post-translational modifications (PTMs) are under significant focus in molecular biomedicine due to their importance in signal transduction in most cellular and organismal processes. Identification of PTMs, determination of PTM location sites, discrimination between functional and inert PTMs, and quantification of their occupancies are demanding tasks, especially in the light of PTM crosstalk in each biosystem. On top of that, the study of each PTM necessitates a particular experimental design in the majority of cases. Computational approaches can identify the relevant PTMs in a biosystem and help to design follow-up experiments involving specific PTM enrichment. Here, we present a PTM-centric proteome informatic pipeline for prediction of the most probable and relevant PTMs in mass spectrometry-based proteomics data and for refining raw data search parameters based on the acquired knowledge. Using expression profiling, we identified cellular proteins that are differentially regulated in response to the multikinase inhibitors dasatinib and staurosporine at four different concentrations. Computational enrichment analysis was employed to determine the potential PTMs of protein targets for both drugs. Finally, we conducted an additional round of database search with these predicted chemical modifications. Our pipeline helped analyze the enriched PTMs and even detected proteins that were not picked up in the initial search. Our findings support the idea of PTM-oriented searching of MS data in proteomics based on computational enrichment analysis.

  16. Integrating Genomics and Proteomics Data to Predict Drug Effects Using...

    • plos.figshare.com
    docx
    Updated Jun 3, 2023
    Cite
    Zhiwei Ji; Jing Su; Chenglin Liu; Hongyan Wang; Deshuang Huang; Xiaobo Zhou (2023). Integrating Genomics and Proteomics Data to Predict Drug Effects Using Binary Linear Programming [Dataset]. http://doi.org/10.1371/journal.pone.0102798
    Available download formats: docx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Zhiwei Ji; Jing Su; Chenglin Liu; Hongyan Wang; Deshuang Huang; Xiaobo Zhou
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Library of Integrated Network-Based Cellular Signatures (LINCS) project aims to create a network-based understanding of biology by cataloging changes in gene expression and signal transduction that occur when cells are exposed to a variety of perturbations. It is helpful for understanding cell pathways and facilitating drug discovery. Here, we developed a novel approach to infer cell-specific pathways and identify a compound's effects using gene expression and phosphoproteomics data under treatments with different compounds. Gene expression data were employed to infer potential targets of compounds and create a generic pathway map. Binary linear programming (BLP) was then developed to optimize the generic pathway topology based on the mid-stage signaling response of phosphorylation. To demonstrate effectiveness of this approach, we built a generic pathway map for the MCF7 breast cancer cell line and inferred the cell-specific pathways by BLP. The first group of 11 compounds was utilized to optimize the generic pathways, and then 4 compounds were used to identify effects based on the inferred cell-specific pathways. Cross-validation indicated that the cell-specific pathways reliably predicted a compound's effects. Finally, we applied BLP to re-optimize the cell-specific pathways to predict the effects of 4 compounds (trichostatin A, MS-275, staurosporine, and digoxigenin) according to compound-induced topological alterations. Trichostatin A and MS-275 (both HDAC inhibitors) inhibited the downstream pathway of HDAC1 and caused cell growth arrest via activation of p53 and p21; the effects of digoxigenin were totally opposite. Staurosporine blocked the cell cycle via p53 and p21, but also promoted cell growth via activated HDAC1 and its downstream pathway. Our approach was also applied to the PC3 prostate cancer cell line, and the cross-validation analysis showed very good accuracy in predicting effects of 4 compounds. 
    In summary, our computational model can be used to elucidate potential mechanisms of a compound's efficacy.

  17. Summary of the dataset and subsequent analyses.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    + more versions
    Cite
    Yan Zhang; Hye Kyong Kweon; Christian Shively; Anuj Kumar; Philip C. Andrews (2023). Summary of the dataset and subsequent analyses. [Dataset]. http://doi.org/10.1371/journal.pcbi.1003077.t001
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Yan Zhang; Hye Kyong Kweon; Christian Shively; Anuj Kumar; Philip C. Andrews
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of the dataset and subsequent analyses.

  18. PSEA-Quant: A Protein Set Enrichment Analysis on Label-Free and Label-Based...

    • acs.figshare.com
    • figshare.com
    txt
    Updated Jun 5, 2023
    Cite
    Mathieu Lavallée-Adam; Navin Rauniyar; Daniel B. McClatchy; John R. Yates (2023). PSEA-Quant: A Protein Set Enrichment Analysis on Label-Free and Label-Based Protein Quantification Data [Dataset]. http://doi.org/10.1021/pr500473n.s006
    Available download formats: txt
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    ACS Publications
    Authors
    Mathieu Lavallée-Adam; Navin Rauniyar; Daniel B. McClatchy; John R. Yates
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The majority of large-scale proteomics quantification methods yield long lists of quantified proteins that are often difficult to interpret and poorly reproduced. Computational approaches are required to analyze such intricate quantitative proteomics data sets. We propose a statistical approach to computationally identify protein sets (e.g., Gene Ontology (GO) terms) that are significantly enriched with abundant proteins with reproducible quantification measurements across a set of replicates. To this end, we developed PSEA-Quant, a protein set enrichment analysis algorithm for label-free and label-based protein quantification data sets. It offers an alternative approach to classic GO analyses, models protein annotation biases, and allows the analysis of samples originating from a single condition, unlike analogous approaches such as GSEA and PSEA. We demonstrate that PSEA-Quant produces results complementary to GO analyses. We also show that PSEA-Quant provides valuable information about the biological processes involved in cystic fibrosis using label-free protein quantification of a cell line expressing a CFTR mutant. Finally, PSEA-Quant highlights the differences in the mechanisms taking place in the human, rat, and mouse brain frontal cortices based on tandem mass tag quantification. Our approach, which is available online, will thus improve the analysis of proteomics quantification data sets by providing meaningful biological insights.

  19. High Performance Computational Analysis of Large-scale Proteome Data Sets to...

    • acs.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Nadin Neuhauser; Nagarjuna Nagaraj; Peter McHardy; Sara Zanivan; Richard Scheltema; Jürgen Cox; Matthias Mann (2023). High Performance Computational Analysis of Large-scale Proteome Data Sets to Assess Incremental Contribution to Coverage of the Human Genome [Dataset]. http://doi.org/10.1021/pr400181q.s001
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Nadin Neuhauser; Nagarjuna Nagaraj; Peter McHardy; Sara Zanivan; Richard Scheltema; Jürgen Cox; Matthias Mann
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Computational analysis of shotgun proteomics data can now be performed in a completely automated and statistically rigorous way, as exemplified by the freely available MaxQuant environment. The sophisticated algorithms involved and the sheer amount of data translate into very high computational demands. Here we describe parallelization and memory optimization of the MaxQuant software with the aim of executing it on a large computer cluster. We analyze and mitigate bottlenecks in overall performance and find that the most time-consuming algorithms are those detecting peptide features in the MS1 data as well as the fragment spectrum search. These tasks scale with the number of raw files and can readily be distributed over many CPUs as long as memory access is properly managed. Here we compared the performance of a parallelized version of MaxQuant running on a standard desktop, an I/O performance optimized desktop computer (“game computer”), and a cluster environment. The modified gaming computer and the cluster vastly outperformed a standard desktop computer when analyzing more than 1000 raw files. We apply our high performance platform to investigate incremental coverage of the human proteome by high resolution MS data originating from in-depth cell line and cancer tissue proteome measurements.

  20. Data from: Comparative Evaluation of Proteome Discoverer and FragPipe for...

    • acs.figshare.com
    zip
    Updated Jun 3, 2023
    Cite
    Tianen He; Youqi Liu; Yan Zhou; Lu Li; He Wang; Shanjun Chen; Jinlong Gao; Wenhao Jiang; Yi Yu; Weigang Ge; Hui-Yin Chang; Ziquan Fan; Alexey I. Nesvizhskii; Tiannan Guo; Yaoting Sun (2023). Comparative Evaluation of Proteome Discoverer and FragPipe for the TMT-Based Proteome Quantification [Dataset]. http://doi.org/10.1021/acs.jproteome.2c00390.s002
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Tianen He; Youqi Liu; Yan Zhou; Lu Li; He Wang; Shanjun Chen; Jinlong Gao; Wenhao Jiang; Yi Yu; Weigang Ge; Hui-Yin Chang; Ziquan Fan; Alexey I. Nesvizhskii; Tiannan Guo; Yaoting Sun
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Isobaric labeling-based proteomics is widely applied in deep proteome quantification. Among the platforms for isobaric labeled proteomic data analysis, the commercial software Proteome Discoverer (PD) is widely used, incorporating the search engine CHIMERYS, while FragPipe (FP) is relatively new, free for noncommercial purposes, and integrates the engine MSFragger. Here, we compared PD and FP over three public proteomic data sets labeled using 6plex, 10plex, and 16plex tandem mass tags. Our results showed the protein abundances generated by the two software are highly correlated. PD quantified more proteins (10.02%, 15.44%, 8.19%) than FP with comparable NA ratios (0.00% vs. 0.00%, 0.85% vs. 0.38%, and 11.74% vs. 10.52%) in the three data sets. Using the 16plex data set, PD and FP outputs showed high consistency in quantifying technical replicates, batch effects, and functional enrichment in differentially expressed proteins. However, FP saved 93.93%, 96.65%, and 96.41% of processing time compared to PD for analyzing the three data sets, respectively. In conclusion, while PD is a well-maintained commercial software integrating various additional functions and can quantify more proteins, FP is freely available and achieves similar output with a shorter computational time. Our results will guide users in choosing the most suitable quantification software for their needs.

Cite
Jacques Colinge; Keiryn L Bennett (2023). Introduction to Computational Proteomics [Dataset]. http://doi.org/10.1371/journal.pcbi.0030114

Introduction to Computational Proteomics

29 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: pdf
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Jacques Colinge; Keiryn L Bennett
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Introduction to Computational Proteomics
