Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Chemical cross-linking combined with mass spectrometry provides a powerful method for identifying protein−protein interactions and probing the structure of protein complexes. A number of strategies have been reported that take advantage of the high sensitivity and high resolution of modern mass spectrometers. Approaches typically include synthesis of novel cross-linking compounds, and/or isotopic labeling of the cross-linking reagent and/or protein, and label-free methods. We report Xlink-Identifier, a comprehensive data analysis platform that has been developed to support label-free analyses. It can identify interpeptide, intrapeptide, and deadend cross-links as well as underivatized peptides. The software streamlines data preprocessing, peptide scoring, and visualization and provides an overall data analysis strategy for studying protein−protein interactions and protein structure using mass spectrometry. The software has been evaluated using a custom synthesized cross-linking reagent that features an enrichment tag. Xlink-Identifier offers the potential to perform large-scale identifications of protein−protein interactions using tandem mass spectrometry.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The number of methods for pre-processing and analysis of gene expression data continues to increase, often making it difficult to select the most appropriate approach. We present a simple procedure for the comparative evaluation of a variety of methods for microarray data pre-processing and analysis. Our approach is based on the use of real microarray data in which controlled fold changes are introduced into 20% of the data to provide a metric for comparison with the unmodified data. The data modifications can be easily applied to raw data measured with any technological platform and retain all the complex structures and statistical characteristics of the real-world data. The power of the method is illustrated by its application to the quantitative comparison of different methods of normalization and analysis of microarray data. Our results demonstrate that the method of controlled modifications of real experimental data provides a simple tool for assessing the performance of data preprocessing and analysis methods.
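The controlled-modification idea can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say how the fold changes are applied, so multiplying the intensities of one sample group in a random 20% of rows is an assumption, and the function names `spike_fold_changes` and `score_detection` are hypothetical.

```python
import random


def spike_fold_changes(expr, group_cols, fraction=0.2, fold=2.0, seed=0):
    """Return a modified copy of expr and the sorted indices of modified rows.

    expr: list of rows (genes), each a list of non-log intensities per sample.
    group_cols: column indices (samples) that receive the fold change.
    Assumption: the fold change is applied multiplicatively to one sample group.
    """
    rng = random.Random(seed)
    n_rows = len(expr)
    n_mod = round(fraction * n_rows)
    chosen = sorted(rng.sample(range(n_rows), n_mod))
    chosen_set = set(chosen)
    cols = set(group_cols)
    modified = []
    for i, row in enumerate(expr):
        if i in chosen_set:
            # Scale only the samples in the selected group.
            modified.append([v * fold if j in cols else v for j, v in enumerate(row)])
        else:
            modified.append(list(row))
    return modified, chosen


def score_detection(detected, truth, n_rows):
    """Sensitivity and specificity of a method's detected rows against the known spiked rows."""
    detected, truth = set(detected), set(truth)
    tp = len(detected & truth)
    fp = len(detected - truth)
    fn = len(truth - detected)
    tn = n_rows - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

Because the modified rows are known, any normalization or analysis pipeline can be run on the modified data and scored by how many of those rows it reports as differentially expressed.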
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clustering frequency results for each of the pre- and post-processing data types.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This spreadsheet presents the meticulously classified results from the conducting phase of our systematic literature review titled "From Manual to Automated: A State-of-the-Art Review to Examine the Impact of Intelligent Document Processing in Banking Automation". Each entry within this document represents an individual study analyzed during our research, categorized according to a carefully designed classification framework to ensure a comprehensive and clear understanding of the evolving landscape in banking automation through Intelligent Document Processing (IDP) technologies.
Classification Framework Overview
This classification scheme provides a structured, in-depth analysis of the field's current state, trends, and future directions. The framework helps readers navigate the large body of information in the domain, giving researchers, practitioners, and policymakers a clear view of the significant aspects of each study to support informed decisions and further innovation in banking automation through IDP.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Specificity results for each of the pre- and post-processing data types.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Publicly available RNA-sequencing (RNA-seq) data are a rich resource for elucidating the mechanisms of human disease; however, preprocessing these data requires considerable bioinformatic expertise and computational infrastructure. Analyzing multiple datasets with a consistent computational workflow increases the accuracy of downstream meta-analyses. This collection of datasets represents the human intracellular transcriptional response to disorders and diseases such as acute lymphoblastic leukemia (ALL), B-cell lymphomas, chronic obstructive pulmonary disease (COPD), colorectal cancer, and lupus erythematosus, as well as infection with pathogens including Borrelia burgdorferi, hantavirus, influenza A virus, Middle East respiratory syndrome coronavirus (MERS-CoV), Streptococcus pneumoniae, respiratory syncytial virus (RSV), severe acute respiratory syndrome coronavirus (SARS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We calculated the statistically significant differentially expressed genes and Gene Ontology (GO) terms for all datasets. In addition, a subset of the datasets also includes results from splice variant analyses and intracellular signaling pathway enrichments, as well as read mapping and quantification. All analyses were performed using well-established algorithms and are provided to facilitate future data mining activities and wet lab studies and to accelerate collaboration and discovery.
Liquid chromatography-mass spectrometry (LC-MS) based lipidomics generates large datasets, whose interpretation requires high-performance data pre-processing tools such as XCMS, mzMine and Progenesis. These pre-processing tools rely heavily on accurate peak detection, which depends on setting the peak detection mass tolerance (PDMT) properly. The PDMT is usually set to a fixed value in either ppm or Da units. However, a fixed value may result in duplicated or missed peaks. Therefore, we developed the dynamic binning method for accurate peak detection, which takes into account the peak broadening described by well-known physical laws of ion separation and dynamically sets the value of PDMT as a function of m/z. Namely, in our method, the PDMT is proportional to (m/z)^2 for FTICR, to (m/z)^1.5 for Orbitrap, to m/z for Q-TOF, and is constant for a quadrupole mass analyzer. The dynamic binning method was implemented in XCMS. Our further goal was to compare the performance of different lipidomics pre-processing tools in finding differential compounds. We generated a set of samples with 43 lipid internal standards differentially spiked into aliquots of one human plasma lipid sample and analyzed them by Orbitrap LC-MS/MS. The performance of the various pipelines using aligned parameter sets was quantified by a quality score system which reflects the ability of a pre-processing pipeline to detect differential peaks spiked at various concentration levels. The quality score indicates that the dynamic binning method improves the performance of XCMS (maximum p-value 9.8 × 10^-3, two-sample Wilcoxon test). The modified XCMS software was further compared with mzMine and Progenesis. The results showed that the modified XCMS and Progenesis performed similarly well in finding differential compounds. In addition, Progenesis showed lower variability, as indicated by lower CVs, followed by XCMS and mzMine. The lower variability of Progenesis improves quantification but yields an incorrect abundance order for the spiked-in internal standards.
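The m/z-dependent tolerance described above can be sketched in a few lines. This is an illustrative sketch, not the modified XCMS code: the exponents follow the commonly cited peak-width scalings (2 for FTICR, 1.5 for Orbitrap, 1 for Q-TOF, 0 for quadrupole), while the reference point (`ref_mz`, `ref_tol`), the function names, and the simple gap-based binning rule are all assumptions made for the example.

```python
# Exponent of m/z in the tolerance for each analyzer type.
EXPONENTS = {"FTICR": 2.0, "Orbitrap": 1.5, "Q-TOF": 1.0, "Quadrupole": 0.0}


def dynamic_pdmt(mz, analyzer, ref_mz=200.0, ref_tol=0.001):
    """Tolerance in Da at a given m/z, scaled from a hypothetical reference point.

    ref_tol is the tolerance (Da) assumed at ref_mz; both values are placeholders.
    """
    return ref_tol * (mz / ref_mz) ** EXPONENTS[analyzer]


def bin_mz(mz_values, analyzer, ref_mz=200.0, ref_tol=0.001):
    """Group sorted m/z values; a value joins the current bin if its gap to the
    previous value is within the tolerance evaluated at that m/z."""
    bins = []
    for mz in sorted(mz_values):
        if bins and mz - bins[-1][-1] <= dynamic_pdmt(mz, analyzer, ref_mz, ref_tol):
            bins[-1].append(mz)
        else:
            bins.append([mz])
    return bins
```

With these placeholder settings, two signals 0.005 Da apart near m/z 800 fall into one bin for the Orbitrap scaling but are split by the constant tolerance, which is the kind of duplicate detection a fixed value can cause at high m/z.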
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1. Data analyzed in this work after preprocessing but prior to any adjustments.