9 datasets found
  1. Data from: Xlink-Identifier: An Automated Data Analysis Platform for...

    • acs.figshare.com
    zip
    Updated Jun 4, 2023
    Cite
    Xiuxia Du; Saiful M. Chowdhury; Nathan P. Manes; Si Wu; M. Uljana Mayer; Joshua N. Adkins; Gordon A. Anderson; Richard D. Smith (2023). Xlink-Identifier: An Automated Data Analysis Platform for Confident Identifications of Chemically Cross-Linked Peptides Using Tandem Mass Spectrometry [Dataset]. http://doi.org/10.1021/pr100848a.s004
    Available download formats: zip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    ACS Publications
    Authors
    Xiuxia Du; Saiful M. Chowdhury; Nathan P. Manes; Si Wu; M. Uljana Mayer; Joshua N. Adkins; Gordon A. Anderson; Richard D. Smith
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Chemical cross-linking combined with mass spectrometry provides a powerful method for identifying protein−protein interactions and probing the structure of protein complexes. A number of strategies have been reported that take advantage of the high sensitivity and high resolution of modern mass spectrometers. Approaches typically include synthesis of novel cross-linking compounds, and/or isotopic labeling of the cross-linking reagent and/or protein, and label-free methods. We report Xlink-Identifier, a comprehensive data analysis platform that has been developed to support label-free analyses. It can identify interpeptide, intrapeptide, and deadend cross-links as well as underivatized peptides. The software streamlines data preprocessing, peptide scoring, and visualization and provides an overall data analysis strategy for studying protein−protein interactions and protein structure using mass spectrometry. The software has been evaluated using a custom synthesized cross-linking reagent that features an enrichment tag. Xlink-Identifier offers the potential to perform large-scale identifications of protein−protein interactions using tandem mass spectrometry.

  2. A Comprehensive and Universal Method for Assessing the Performance of...

    • plos.figshare.com
    tiff
    Updated May 30, 2023
    Cite
    Mikhail G. Dozmorov; Joel M. Guthridge; Robert E. Hurst; Igor M. Dozmorov (2023). A Comprehensive and Universal Method for Assessing the Performance of Differential Gene Expression Analyses [Dataset]. http://doi.org/10.1371/journal.pone.0012657
    Available download formats: tiff
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Mikhail G. Dozmorov; Joel M. Guthridge; Robert E. Hurst; Igor M. Dozmorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The number of methods for pre-processing and analysis of gene expression data continues to increase, often making it difficult to select the most appropriate approach. We present a simple procedure for comparative estimation of a variety of methods for microarray data pre-processing and analysis. Our approach is based on the use of real microarray data in which controlled fold changes are introduced into 20% of the data to provide a metric for comparison with the unmodified data. The data modifications can be easily applied to raw data measured with any technological platform and retain all the complex structures and statistical characteristics of the real-world data. The power of the method is illustrated by its application to the quantitative comparison of different methods of normalization and analysis of microarray data. Our results demonstrate that the method of controlled modifications of real experimental data provides a simple tool for assessing the performance of data preprocessing and analysis methods.
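The controlled-modification idea described above can be illustrated with a minimal sketch. It assumes the simplest possible variant: a random 20% of rows (genes or probes) are multiplied by one fixed fold change. The function name `spike_fold_changes`, its parameters, and the toy matrix are hypothetical, and the published procedure may differ in how rows and fold changes are chosen.

```python
import random


def spike_fold_changes(matrix, fraction=0.2, fold=2.0, seed=0):
    """Return a modified copy of `matrix` and the set of spiked row indices.

    matrix: list of rows (one per gene/probe), each a list of intensities.
    A random `fraction` of the rows is multiplied by `fold`; the remaining
    rows are copied unchanged, so the data keep their original structure.
    """
    rng = random.Random(seed)  # fixed seed keeps the spiked set reproducible
    n_spiked = round(len(matrix) * fraction)
    spiked = set(rng.sample(range(len(matrix)), n_spiked))
    modified = [
        [value * fold for value in row] if i in spiked else list(row)
        for i, row in enumerate(matrix)
    ]
    return modified, spiked


# Toy stand-in for real expression data: 10 genes x 3 samples (made-up values).
original = [[float(10 * (g + 1) + s) for s in range(3)] for g in range(10)]
modified, spiked = spike_fold_changes(original, fraction=0.2, fold=2.0, seed=42)
```

Because the spiked rows are known, any analysis pipeline can then be scored by how many of them it reports as differentially expressed when comparing `modified` with `original`.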

  3. Clustering frequency results for each of the pre- and post-processing...

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Ben O. L. Mellors; Abigail M. Spear; Christopher R. Howle; Kelly Curtis; Sara Macildowie; Hamid Dehghani (2023). Clustering frequency results for each of the pre- and post-processing data-type. [Dataset]. http://doi.org/10.1371/journal.pone.0238647.t001
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ben O. L. Mellors; Abigail M. Spear; Christopher R. Howle; Kelly Curtis; Sara Macildowie; Hamid Dehghani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Clustering frequency results for each of the pre- and post-processing data-type.

  4. Classification results of the studies analyzed in A State-of-the-Art Review...

    • zenodo.org
    Updated Apr 23, 2025
    + more versions
    Cite
    José Luis Alonso-Rocha; Antonio Martínez-Rojas; José González-Enríquez; Jesús M. Sánchez-Oliva (2025). Classification results of the studies analyzed in A State-of-the-Art Review to Examine the Impact of Intelligent Document Processing in Banking Automations [Dataset]. http://doi.org/10.5281/zenodo.15268178
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    José Luis Alonso-Rocha; Antonio Martínez-Rojas; José González-Enríquez; Jesús M. Sánchez-Oliva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This spreadsheet presents the meticulously classified results from the conducting phase of our systematic literature review titled "From Manual to Automated: A State-of-the-Art Review to Examine the Impact of Intelligent Document Processing in Banking Automation". Each entry within this document represents an individual study analyzed during our research, categorized according to a carefully designed classification framework to ensure a comprehensive and clear understanding of the evolving landscape in banking automation through Intelligent Document Processing (IDP) technologies.

    Classification Framework Overview

    • RQ1. General Study Characterization
      • Date: indicates the year of publication of the study.
      • Contribution Source: refers to the type of publication in which the study appears, such as a journal article or conference paper.
      • Validation: describes the context in which the study’s findings are validated, distinguishing between research environments and industrial or practical applications.
      • Contribution Type: defines the nature of the study’s main contribution, whether it presents an algorithm, a theoretical analysis, a framework, a method, or a model.
      • Public Data Exposure: reflects whether the study generates original datasets and makes them publicly accessible, distinguishing between contributions that provide new open data and those that rely on existing sources or do not disclose their data.
    • RQ2. Machine Learning Approaches and Trends
      • Learning Paradigm: classifies the study’s learning approach as supervised or unsupervised.
      • AI Subfield: identifies the primary Artificial Intelligence (AI) domain of the study, such as data mining, computer vision, or natural language processing (NLP).
      • Model Category: describes the specific type of Machine Learning (ML) model applied in the study, including rule-based models, regression models, clustering, support vector machines, decision trees, or neural networks.
    • RQ3. Business Automation Strategies
      • Automation Compatibility: assesses whether the study’s proposal aligns with Robotic Process Automation (RPA) or fits within a broader, more general automation context.
      • IDP Life Cycle Stage: defines the phase of the IDP life cycle addressed by the study, such as preprocessing, data extraction, or classification.
      • Business Environment Integration: assesses whether the proposed solution is designed for integration within business environments or remains conceptual.
      • Data Preparation Techniques: describes the preprocessing steps the study applies to structure and enhance raw inputs, including cleaning, transformation, vectorization, or token and label manipulation.
    • RQ4. Application Areas
      • Application Domain: identifies the sector or industry targeted by the study, such as banking, finance, fraud detection, accounting, or auditing.
      • Case Study: specifies the particular application context or document type addressed by the study, for example, checks, invoices, signatures, or broader document categories.

    This classification scheme is instrumental in providing a structured, in-depth analysis of the field's current state, trends, and future directions. The framework aids in navigating the vast amount of information in the domain, offering researchers, practitioners, and policymakers a clear vision of the significant aspects of each study to foster informed decisions and further innovation in banking automations through IDP.

  5. Metaverse Gait Authentication Dataset (MGAD)

    • figshare.com
    csv
    Updated Feb 11, 2025
    Cite
    sandeep ravikanti (2025). Metaverse Gait Authentication Dataset (MGAD) [Dataset]. http://doi.org/10.6084/m9.figshare.28387664.v1
    Available download formats: csv
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    figshare
    Authors
    sandeep ravikanti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Dataset Overview

    The Metaverse Gait Authentication Dataset (MGAD) is a large-scale dataset for gait-based biometric authentication in virtual environments. It consists of gait data from 5,000 simulated users, generated using Unity 3D and processed using OpenPose and MediaPipe. This dataset is ideal for researchers working on biometric authentication, gait analysis, and AI-driven identity verification systems.

    2. Data Structure & Format

    • File Format: CSV
    • Number of Samples: 5,000 users
    • Number of Features: 16 gait-based features
    • Columns: Each row represents a user with corresponding gait feature values
    • Size: Approximately (mention size in MB/GB after upload)

    3. Feature Descriptions

    The dataset includes 16 extracted gait features:

    • Stride Length (m): Average distance covered in one gait cycle.
    • Step Frequency (steps/min): Number of steps taken per minute.
    • Stance Phase Duration (s): Stance phase in a gait cycle.
    • Swing Phase Duration (s): Duration of the swing phase in a gait cycle.
    • Double Support Phase Duration (s): Time both feet are in contact with the ground.
    • Step Length (m): Distance between consecutive foot placements.
    • Cadence Variability (%): Variability in step rate.
    • Hip Joint Angle (°): Maximum angle variation in the hip joint.
    • Knee Joint Angle (°): Maximum flexion-extension knee angle.
    • Ankle Joint Angle (°): Angle variation at the ankle joint.
    • Avg. Vertical GRF (N): Average vertical ground reaction force.
    • Avg. Anterior-Posterior GRF (N): Ground reaction force in the forward-backward direction.
    • Avg. Medial-Lateral GRF (N): Ground reaction force in the side-to-side direction.
    • Avg. COP Excursion (mm): Center of pressure movement during stance phase.
    • Foot Clearance during Swing Phase (mm): Minimum height of the foot during the swing phase.
    • Gait Symmetry Index (%): Measure of symmetry between left and right gait cycles.

    4. How to Use the Dataset

    • Load the dataset in Python using Pandas.
    • Use the features for machine learning models in biometric authentication.
    • Apply preprocessing techniques like normalization and feature scaling.
    • Train and evaluate deep learning or ensemble models for gait recognition.

    5. Citation & License

    If you use this dataset, please cite it as follows:

    Sandeep Ravikanti, "Metaverse Gait Authentication Dataset (MGAD)," IEEE DataPort, 2025. DOI: https://dx.doi.org/10.21227/rvh5-8842

    6. Contact Information

    For inquiries or collaborations, please contact: bitsrmit2023@gmail.com

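The MGAD usage notes mention loading the CSV with Pandas and normalizing the features, but the listing gives neither a file name nor exact column headers. The sketch below therefore parses a tiny in-memory stand-in; the column names (`stride_length_m` and so on), the sample values, and the helper names are hypothetical, and min-max scaling is only one possible reading of "normalization and feature scaling".

```python
import io

import pandas as pd


def load_gait_features(source):
    """Read a gait-feature CSV (file path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def min_max_normalize(df):
    """Scale every numeric column to the range [0, 1].

    Constant columns would give a zero range, so their range is replaced
    by 1 and the column becomes all zeros instead of NaN.
    """
    numeric = df.select_dtypes(include="number")
    span = (numeric.max() - numeric.min()).replace(0, 1)
    scaled = df.copy()
    scaled[numeric.columns] = (numeric - numeric.min()) / span
    return scaled


# Hypothetical three-user excerpt; the real file has 5,000 rows and 16 features.
SAMPLE_CSV = """stride_length_m,step_frequency_spm,gait_symmetry_pct
1.00,110,92.5
1.25,118,95.0
1.50,126,97.5
"""

sample = load_gait_features(io.StringIO(SAMPLE_CSV))
normalized = min_max_normalize(sample)
```

With the downloaded file in hand, `load_gait_features` can be given its path instead of the in-memory sample; the normalized frame can then be passed to whichever classifier is being evaluated.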
  6. Specificity results for each of the pre- and post-processing data-type.

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Ben O. L. Mellors; Abigail M. Spear; Christopher R. Howle; Kelly Curtis; Sara Macildowie; Hamid Dehghani (2023). Specificity results for each of the pre- and post-processing data-type. [Dataset]. http://doi.org/10.1371/journal.pone.0238647.t003
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ben O. L. Mellors; Abigail M. Spear; Christopher R. Howle; Kelly Curtis; Sara Macildowie; Hamid Dehghani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Specificity results for each of the pre- and post-processing data-type.

  7. Data from: Preprocessing of Public RNA-sequencing Datasets to Facilitate...

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated May 14, 2021
    Cite
    Naomi Rapier Sharman; John Krapohl; Ethan Beausoleil; Kennedy Gifford; Ben Hinatsu; Curtis Hoffman; Makayla Komer; Tiana M. Scott; Brett E. Pickett (2021). Preprocessing of Public RNA-sequencing Datasets to Facilitate Downstream Analyses of Human Diseases: Dataset [Dataset]. http://doi.org/10.5281/zenodo.4757764
    Available download formats: zip, bin
    Dataset updated
    May 14, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Naomi Rapier Sharman; John Krapohl; Ethan Beausoleil; Kennedy Gifford; Ben Hinatsu; Curtis Hoffman; Makayla Komer; Tiana M. Scott; Brett E. Pickett
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Publicly available RNA-sequencing (RNA-seq) data are a rich resource for elucidating the mechanisms of human disease; however, preprocessing these data requires considerable bioinformatic expertise and computational infrastructure. Analyzing multiple datasets with a consistent computational workflow increases the accuracy of downstream meta-analyses. This collection of datasets represents the human intracellular transcriptional response to disorders and diseases such as acute lymphoblastic leukemia (ALL), B-cell lymphomas, chronic obstructive pulmonary disease (COPD), colorectal cancer, and lupus erythematosus, as well as infection with pathogens including Borrelia burgdorferi, hantavirus, influenza A virus, Middle East respiratory syndrome coronavirus (MERS-CoV), Streptococcus pneumoniae, respiratory syncytial virus (RSV), severe acute respiratory syndrome coronavirus (SARS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We calculated the statistically significant differentially expressed genes and Gene Ontology (GO) terms for all datasets. In addition, a subset of the datasets also includes results from splice variant analyses, intracellular signaling pathway enrichments, and read mapping and quantification. All analyses were performed using well-established algorithms and are provided to facilitate future data mining activities and wet lab studies, and to accelerate collaboration and discovery.

  8. Data from: Dynamic binning peak detection and assessment of various...

    • data.niaid.nih.gov
    • metabolomicsworkbench.org
    xml
    Updated Sep 25, 2020
    Cite
    Horvatovich Péter (2020). Dynamic binning peak detection and assessment of various lipidomics liquid chromatography-mass spectrometry pre-processing platforms [Dataset]. https://data.niaid.nih.gov/resources?id=st001493
    Available download formats: xml
    Dataset updated
    Sep 25, 2020
    Dataset provided by
    University of Groningen
    Authors
    Horvatovich Péter
    Variables measured
    Metabolomics, Treatment:Pure Plasma, Treatment:Plasma + IS 1, Treatment:Plasma + IS 1/2, Treatment:Plasma + IS 1/4, Treatment:Plasma + IS 1/8, Treatment:Plasma + IS 1/16
    Description

    Liquid chromatography-mass spectrometry (LC-MS) based lipidomics generates large datasets, whose interpretation requires high-performance data pre-processing tools such as XCMS, mzMine and Progenesis. These pre-processing tools rely heavily on accurate peak detection, which depends on setting the peak detection mass tolerance (PDMT) properly. The PDMT is usually set to a fixed value in either ppm or Da units. However, a fixed value may result in duplicate or missed peaks. Therefore, we developed the dynamic binning method for accurate peak detection, which takes into account the peak broadening described by well-known physical laws of ion separation and dynamically sets the value of PDMT as a function of m/z. Namely, in our method, the PDMT is proportional to (m/z)^2 for FTICR, to (m/z)^1.5 for Orbitrap, to m/z for Q-TOF, and is constant for quadrupole mass analyzers. The dynamic binning method was implemented in XCMS. Our further goal was to compare the performance of different lipidomics pre-processing tools in finding differential compounds. We generated a set of samples with 43 lipid internal standards differentially spiked into aliquots of one human plasma lipid sample using Orbitrap LC-MS/MS. The performance of the various pipelines using aligned parameter sets was quantified by a quality score system which reflects the ability of a pre-processing pipeline to detect differential peaks spiked at various concentration levels. The quality score indicates that the dynamic binning method improves the performance of XCMS (maximum p-value 9.8·10^-3 of two-sample Wilcoxon test). The modified XCMS software was further compared with mzMine and Progenesis. The results showed that modified XCMS and Progenesis performed similarly well at finding differential compounds. In addition, Progenesis shows lower variability, as indicated by lower CVs, followed by XCMS and mzMine. The lower variability of Progenesis improves quantification; however, it yields an incorrect abundance order for the spiked-in internal standards.
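The dependence of the PDMT on m/z can be sketched as a simple power law. The exponents used here (2 for FTICR, 1.5 for Orbitrap, 1 for Q-TOF, 0 for quadrupole) are an assumption based on the usual peak-width scaling of these analyzers, because the exponents are not legible in every copy of this description. The reference point (`ref_mz`, `ref_tolerance`) and the function name are hypothetical and do not come from the XCMS implementation.

```python
# Assumed exponent of m/z for each analyzer type (see the note above).
MZ_EXPONENT = {
    "FTICR": 2.0,
    "Orbitrap": 1.5,
    "Q-TOF": 1.0,
    "Quadrupole": 0.0,  # constant tolerance, independent of m/z
}


def dynamic_pdmt(mz, analyzer, ref_mz=400.0, ref_tolerance=0.002):
    """Scale a reference tolerance (in Da, valid at ref_mz) to another m/z.

    The tolerance grows as (mz / ref_mz) ** exponent, so it equals
    ref_tolerance exactly at mz == ref_mz for every analyzer type.
    """
    exponent = MZ_EXPONENT[analyzer]
    return ref_tolerance * (mz / ref_mz) ** exponent


# Tolerance at m/z 800 for each analyzer, using the hypothetical reference point.
tolerances = {name: dynamic_pdmt(800.0, name) for name in MZ_EXPONENT}
```

Under these assumptions, doubling m/z quadruples the tolerance for FTICR, doubles it for Q-TOF, and leaves it unchanged for a quadrupole, which is the behavior a single fixed ppm or Da value cannot reproduce.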

  9. Additional file 1 of Multivariate pattern analysis: a method and software to...

    • springernature.figshare.com
    xlsx
    Updated Feb 1, 2024
    Cite
    Tim U. H. Baumeister; Eivind Aadland; Roger G. Linington; Olav M. Kvalheim (2024). Additional file 1 of Multivariate pattern analysis: a method and software to reveal, quantify, and visualize predictive association patterns in multicollinear data [Dataset]. http://doi.org/10.6084/m9.figshare.25123885.v1
    Available download formats: xlsx
    Dataset updated
    Feb 1, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tim U. H. Baumeister; Eivind Aadland; Roger G. Linington; Olav M. Kvalheim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1. Data analyzed in this work after preprocessing but prior to any adjustments.

