Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of computational and proteomics datasets for the secretomes of T. gondii and P. falciparum.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Introduction to Computational Proteomics
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Examples of computational methods for spatial proteomics datasets for prediction and novelty detection.
Intrinsically disordered proteins and protein regions lack a stable three-dimensional structure under physiological conditions. Several proteomic investigations of intrinsic disorder have been performed to date and have found disorder to be prevalent in eukaryotic proteomes. Here we present descriptive statistics of intrinsic disorder features for ten model eukaryotic proteomes that have been calculated from computational disorder prediction algorithms. The data descriptor also provides consensus disorder annotations as well as additional physical parameters relevant to protein disorder, and further provides protein existence information for all proteins included in our analysis. The complete datasets can be downloaded freely, and it is envisaged that they will be updated periodically with new proteomes and protein disorder prediction algorithms. These datasets will be especially useful for assessing protein disorder, and conducting novel analyses that advance our understanding of intri...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Data for an exemplary metaproteomics data analysis with the MetaProteomeAnalyzer (MPA) and Prophane software tools. Data is from the PRIDE dataset PXD010550.
Files include:
The FragPipe computational proteomics platform is gaining widespread popularity among the proteomics research community because of its fast processing speed and user-friendly graphical interface. Although FragPipe produces well-formatted output tables that are ready for analysis, there is still a need for an easy-to-use and user-friendly downstream statistical analysis and visualization tool. FragPipe-Analyst addresses this need by providing an R shiny web server to assist FragPipe users in conducting downstream analyses of the resulting quantitative proteomics data. It supports major quantification workflows, including label-free quantification, tandem mass tags, and data-independent acquisition. FragPipe-Analyst offers a range of useful functionalities, such as various missing value imputation options, data quality control, unsupervised clustering, differential expression (DE) analysis using Limma, and gene ontology and pathway enrichment analysis using Enrichr. To support advanced analysis and customized visualizations, we also developed FragPipeAnalystR, an R package encompassing all FragPipe-Analyst functionalities that is extended to support site-specific analysis of post-translational modifications (PTMs). FragPipe-Analyst and FragPipeAnalystR are both open-source and freely available.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
Proteomics data analysis strongly benefits from not studying single proteins in isolation but taking their multivariate interdependence into account. We introduce PerseusNet, the new Perseus network module for the biological analysis of proteomics data. Proteomics is commonly used to generate networks, e.g., with affinity purification experiments, but networks are also used to explore proteomics data. PerseusNet supports the biomedical researcher for both modes of data analysis with a multitude of activities. For affinity purification, a volcano-plot-based statistical analysis method for network generation is featured which is scalable to large numbers of baits. For posttranslational modifications of proteins, such as phosphorylation, a collection of dedicated network analysis tools helps in elucidating cellular signaling events. Co-expression network analysis of proteomics data adopts established tools from transcriptome co-expression analysis. PerseusNet is extensible through a plugin architecture in a multi-lingual way, integrating analyses in C#, Python, and R, and is freely available at http://www.perseus-framework.org.
Protein glycosylation is a complex post-translational modification with crucial cellular functions in all domains of life. Currently, large-scale glycoproteomics approaches rely on glycan-database-dependent algorithms and are thus unsuitable for discovery-driven analyses of glycoproteomes. Therefore, we devised SugarPy, a glycan-database-independent Python module, and validated it on the glycoproteome of human breast milk. We further demonstrated its applicability by analyzing glycoproteomes with uncommon glycans stemming from the green alga Chlamydomonas reinhardtii and the archaeon Haloferax volcanii. Finally, SugarPy facilitated the novel characterization of glycoproteins from Cyanidioschyzon merolae.
Cross-linking combined with mass spectrometry (XL-MS) provides a wealth of information about the 3D structure of proteins and their interactions. We introduce MaxLynx, a novel computational proteomics workflow for XL-MS integrated into the MaxQuant environment. Here, we tested the performance of MaxLynx on data sets generated with a Bruker timsTOF Pro instrument.
This dataset consists of 44 raw MS files: 27 DIA (SWATH) and 15 DDA runs acquired on a TripleTOF 5600, and two raw mass spectrometry files acquired on a Q Exactive. The composition of the dataset is described in the manuscript by Tsou et al., titled "DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics", Nature Methods, in press. Raw files are deposited here in ProteomeXchange and are associated with the DIA-Umpire processed data. All DIA-Umpire processed results for each sample, together with DDA results, are deposited in separate folders. Also see the "DataSampleID.xlsx" file associated with this Readme file. Internal reference from the Gingras lab ProHits implementation: Project 94, Export version VS2 (Tsou_DIA-Umpire).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
An example input file for Geena 2. This example file can be used for testing purposes. It includes 12 MS spectra generated by MALDI/TOF from four biological samples in the context of a real experiment. Three spectra were generated for each sample. The format of the file is described in detail, with examples, in the manuscript and in the information file on Input/Output data formats on the website. (TXT 26 kb)
Background: In recent years, an innovative strategy using laser microdissection and mass spectrometry markedly expanded the landscape of antigens associated with membranous nephropathy (MN). Specific associations with phenotypes, diseases and sometimes reversible triggers led to a novel antigen-based classification of MN, paving the way for precision medicine and stressing the need for more routine use of proteomics in MN.
Methods: To explore the proteomic landscape of human glomeruli and identify podocyte antigens and disease mechanisms in MN, we expanded the original technique to an integrative approach combining laser capture microdissection, next-generation mass spectrometry and computational analysis. Next to conventional data-dependent acquisition (DDA), we used and assessed the diagnostic yield of the more comprehensive data-independent acquisition (DIA) mass spectrometry, which enables the detection and quantification of every peptide in a sample irrespective of its level of abundance or m/z value. Our proteomic pipeline was applied to residual material from kidney biopsies in 64 individuals, including 31 healthy controls; 5 disease controls; 5 PLA2R-associated MN; and 23 PLA2R-negative MN.
Results: Unbiased analyses confirmed the significant enrichment in PLA2R, IgG4 and complement proteins in glomeruli from patients with PLA2R-MN compared with healthy and disease controls, while molecular characterization of complement fragments provided evidence for complement activation in PLA2R-MN. Compared to DDA, DIA mass spectrometry increased the number of glomerular proteins (~3800 vs. ~1200) identified in healthy glomeruli; allowed the detection of all known antigens except NELL1 in normal glomeruli; and increased the detection rate of podocyte antigens from 50% to >80% in PLA2R-negative MN.
Conclusions: This proof-of-concept study suggests that an integrative approach combining laser microdissection, DIA mass spectrometry and computational biology is a powerful tool, with translational potential, to identify podocyte antigens and unravel disease mechanisms in MN.
Label-free quantitative mass spectrometry (MS) based on the Normalized Spectral Abundance Factor (NSAF) has emerged as a simple and reasonably robust method to determine the relative abundance of individual proteins within complex mixtures. Here, we describe Morpheus Spectral Counter (MSpC) as the first computational tool that directly calculates NSAF values from output obtained from Morpheus, a fast, open-source, peptide-MS/MS matching engine compatible with high-resolution mass instruments. NSAF has distinct advantages over other MS-based quantification methods, including a higher dynamic range as compared to isobaric tags, no requirement to align and re-extract MS1 peaks, and increased speed. MSpC features an easy-to-use graphical user interface that additionally calculates both distributed and unique NSAF values to permit analyses of both protein families and isoforms/proteoforms. MSpC determinations of protein concentration were linear over several orders of magnitude based on the analysis of several high-mass-accuracy datasets either obtained from the Proteomics Identifications Repository or generated de novo with total cell extracts spiked with purified Arabidopsis 20S proteasomes. The MSpC software was developed in C# and is open sourced under a permissive license with the code made available at http://dcgemperline.github.io/Morpheus_SpC/.
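For orientation, NSAF as it is commonly defined divides each protein's spectral count by its length and then normalizes by the sum of those ratios over all proteins in the run. The Python sketch below illustrates only that basic definition. It is not MSpC's code, it does not cover the distributed or unique NSAF variants, and the function name, protein names, and numbers are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Return basic NSAF values for one run.

    spectral_counts: dict mapping protein ID -> spectral count (SpC)
    lengths:         dict mapping protein ID -> protein length in residues
    Both inputs are hypothetical; real tools read them from search output.
    """
    # Spectral abundance factor: spectral count scaled by protein length.
    saf = {}
    for protein, count in spectral_counts.items():
        length = lengths[protein]
        if length <= 0:
            raise ValueError(f"non-positive length for {protein}")
        saf[protein] = count / length

    total = sum(saf.values())
    if total == 0:
        raise ValueError("all spectral counts are zero; NSAF is undefined")

    # Normalize so that the values across the run sum to 1.
    return {protein: value / total for protein, value in saf.items()}


# Toy example: two made-up proteins.
example = nsaf({"protA": 10, "protB": 20}, {"protA": 100, "protB": 400})
```

In this toy case protA has fewer spectra than protB but a higher NSAF, because the length correction favors the shorter protein.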
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
We used decoupleR to evaluate the performance of individual methods by recovering perturbed transcription factors (TFs) from a curation of single-gene perturbation experiments (Holland et al., 2020). As a resource we used DoRothEA, a gene regulatory network linking TFs to target genes by their mode of regulation (Garcia-Alonso et al., 2019). Perturbation experiments where the targeted regulator was not in DoRothEA were removed. After filtering, this dataset is composed of gene expression data from 92 knockdown and overexpression experiments of 40 unique TFs in human cells. Additionally, we tested the performance of decoupleR on phospho-proteomic data. For this, we filtered in a similar fashion a curated set of knockdown and overexpression single-kinase perturbation experiments, obtaining 63 experiments including 14 unique kinases, and applied a weighted resource from the same publication that links kinases to their target phosphosites (Hernandez-Armenta et al., 2017). For the transcriptomic dataset, differential expression analysis was performed with limma (Ritchie et al., 2015) and the resulting t-values were used as input. For the phospho-proteomics, the quantile-normalized log2-fold changes from different studies were used to make them comparable.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
In liquid-chromatography–tandem-mass-spectrometry-based proteomics, information about the presence and stoichiometry of protein modifications is not readily available. To overcome this problem, we developed multiFLEX-LF, a computational tool that builds upon FLEXIQuant, which detects modified peptide precursors and quantifies their modification extent by monitoring the differences between observed and expected intensities of the unmodified precursors. multiFLEX-LF relies on robust linear regression to calculate the modification extent of a given precursor relative to a within-study reference. multiFLEX-LF can analyze entire label-free discovery proteomics data sets in a precursor-centric manner without preselecting a protein of interest. To analyze modification dynamics and coregulated modifications, we hierarchically clustered the precursors of all proteins based on their computed relative modification scores. We applied multiFLEX-LF to a data-independent-acquisition-based data set acquired using the anaphase-promoting complex/cyclosome (APC/C) isolated at various time points during mitosis. The clustering of the precursors allows for identifying varying modification dynamics and ordering the modification events. Overall, multiFLEX-LF enables the fast identification of potentially differentially modified peptide precursors and the quantification of their differential modification extent in large data sets using a personal computer. Additionally, multiFLEX-LF can drive the large-scale investigation of the modification dynamics of peptide precursors in time-series and case-control studies. multiFLEX-LF is available at https://gitlab.com/SteenOmicsLab/multiflex-lf.
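The idea of comparing observed with expected intensities of unmodified precursors can be illustrated with a toy calculation. The sketch below is not multiFLEX-LF's implementation: it swaps in a median-of-ratios slope (a simple robust estimator for a line through the origin) in place of the tool's robust linear regression, and every function name and number is hypothetical.

```python
from statistics import median


def robust_slope(reference, sample):
    """Median of sample/reference ratios: a simple robust slope through the origin."""
    if len(reference) != len(sample) or not reference:
        raise ValueError("need two non-empty intensity lists of equal length")
    if any(r <= 0 for r in reference):
        raise ValueError("reference intensities must be positive")
    return median(s / r for r, s in zip(reference, sample))


def relative_scores(reference, sample):
    """Observed intensity divided by the intensity expected from the robust fit.

    A score near 1 means the unmodified precursor behaves as expected; a score
    well below 1 means part of it is missing, which may indicate modification.
    """
    slope = robust_slope(reference, sample)
    if slope <= 0:
        raise ValueError("robust slope must be positive")
    return [s / (slope * r) for r, s in zip(reference, sample)]


# Toy data: five unmodified precursors of one protein (made-up intensities).
reference = [100.0, 200.0, 400.0, 800.0, 1000.0]
sample = [50.0, 100.0, 200.0, 400.0, 250.0]
scores = relative_scores(reference, sample)
```

Here the first four precursors follow the common trend and the last one falls to half its expected intensity. The median keeps that single outlier from distorting the fit, which is the reason for using a robust estimator in this setting.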
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Supplementary File 2 from Identification of Optimal Drug Combinations Targeting Cellular Networks: Integrating Phospho-Proteomics and Computational Network Analysis
https://researchintelo.com/privacy-and-policy
According to our latest research, the global AI in Proteomics market size reached USD 1.42 billion in 2024, reflecting the rapid integration of artificial intelligence into proteomics research and applications. The market is expected to grow at a robust CAGR of 32.1% during the forecast period, reaching approximately USD 16.66 billion by 2033. This remarkable growth is driven by the increasing adoption of AI technologies for analyzing complex proteomic datasets, accelerating drug discovery, and enabling personalized medicine. The proliferation of high-throughput proteomic technologies, coupled with the need for advanced computational tools to interpret vast amounts of data, is propelling the expansion of the AI in Proteomics market globally.
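Readers who want to relate a start value, an end value, and a compound annual growth rate can use the standard relation end = start × (1 + rate)^years. The Python sketch below implements only that relation. The number of compounding years is an assumption the caller must supply, since the excerpt does not state the base year of the forecast period, and the function names are hypothetical.

```python
def project(start_value, rate, years):
    """Value after compounding `rate` (e.g. 0.10 for 10%) for `years` periods."""
    return start_value * (1.0 + rate) ** years


def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value, and span."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Figures quoted in the text (USD billion); years=9 assumes compounding
    # from 2024 to 2033, which the excerpt does not confirm.
    rate = implied_cagr(1.42, 16.66, years=9)
    print(f"implied CAGR under the 9-year assumption: {rate:.1%}")
```

Changing `years` shows how sensitive the implied rate is to the choice of base year.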
One of the primary growth drivers for the AI in Proteomics market is the surging demand for efficient and accurate data analysis solutions in life sciences. Proteomics generates massive datasets that are often too complex for traditional computational methods. AI-based platforms, leveraging machine learning and deep learning algorithms, can identify patterns, predict protein functions, and uncover novel biomarkers with unprecedented speed and accuracy. This capability is particularly valuable in drug discovery and development, where AI significantly reduces the time and cost required to identify viable drug targets and optimize therapeutic candidates. The increasing collaboration between AI technology providers and pharmaceutical companies is further accelerating innovation and adoption in this space.
Another significant factor fueling market growth is the rising focus on personalized medicine and precision healthcare. As healthcare systems worldwide shift toward individualized treatment strategies, there is a heightened need for detailed proteomic profiling to understand patient-specific disease mechanisms. AI-driven proteomics enables clinicians and researchers to analyze protein expression patterns, detect early signs of disease, and tailor therapies to individual patients. This not only improves patient outcomes but also enhances the efficiency of clinical trials and reduces adverse drug reactions. The integration of AI in clinical diagnostics and biomarker identification is expected to transform the landscape of healthcare, making it more predictive, preventive, and personalized.
Additionally, the increasing availability of cloud-based AI platforms and services is making advanced proteomic analysis accessible to a broader range of organizations, including small and medium-sized enterprises (SMEs) and academic research institutes. Cloud deployment eliminates the need for substantial upfront investment in hardware and infrastructure, enabling users to leverage scalable computational resources on demand. This democratization of AI-powered proteomics is fostering innovation across the life sciences ecosystem, facilitating cross-institutional collaborations, and accelerating scientific discoveries. The growing emphasis on open-source AI tools and shared data repositories is also contributing to the rapid evolution of the market.
From a regional perspective, North America continues to dominate the AI in Proteomics market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The United States, in particular, benefits from a strong presence of leading pharmaceutical and biotechnology companies, advanced research infrastructure, and substantial investments in AI and life sciences. Europe is witnessing significant growth, driven by government initiatives to promote precision medicine and the expansion of proteomics research networks. Meanwhile, Asia Pacific is emerging as a high-growth region, supported by increasing R&D expenditure, a growing biotechnology sector, and rising awareness of the benefits of AI in healthcare. Latin America and the Middle East & Africa are gradually adopting AI-driven proteomics, with market expansion expected to accelerate as technological infrastructure improves.
The Component segment of the AI in Proteomics market is categorized into software, hardware, and services, each playing a pivotal role in the advancement and adoption of AI-powered proteomics solutions. Software constitutes the largest share of this segment, driven by the growing need for sophisticated algorithms and analytical platforms capable of processing and interpreting comple
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Summary of the dataset and subsequent analyses.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
We describe and demonstrate the proteomics computational toolkit provided in the open-source msInspect software distribution. The toolkit includes modules written in Java and in the R statistical programming language to aid the rapid development of proteomics software applications. It contains tools for processing and manipulating standard MS data files, including signal processing of LC−MS data and parsing of MS/MS search results, as well as for modeling proteomics data structures, creating charts, and other common tasks. We present this toolkit’s capability to rapidly develop new computational tools by presenting an example application, Qurate, a graphical tool for manually curating isotopically labeled peptide quantitative events.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Unfiltered supplementary tables for "Prosit: Proteome-wide prediction of peptide tandem mass spectra by deep learning"