9 datasets found

Data from: Xlink-Identifier: An Automated Data Analysis Platform for...
acs.figshare.com
zip
Updated Jun 4, 2023
Xiuxia Du; Saiful M. Chowdhury; Nathan P. Manes; Si Wu; M. Uljana Mayer; Joshua N. Adkins; Gordon A. Anderson; Richard D. Smith (2023). Xlink-Identifier: An Automated Data Analysis Platform for Confident Identifications of Chemically Cross-Linked Peptides Using Tandem Mass Spectrometry [Dataset]. http://doi.org/10.1021/pr100848a.s004
Available download formats: zip
Unique identifier
https://doi.org/10.1021/pr100848a.s004
Dataset updated
Jun 4, 2023
Dataset provided by
ACS Publications
Authors
Xiuxia Du; Saiful M. Chowdhury; Nathan P. Manes; Si Wu; M. Uljana Mayer; Joshua N. Adkins; Gordon A. Anderson; Richard D. Smith
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Chemical cross-linking combined with mass spectrometry provides a powerful method for identifying protein−protein interactions and probing the structure of protein complexes. A number of strategies have been reported that take advantage of the high sensitivity and high resolution of modern mass spectrometers. Approaches typically include synthesis of novel cross-linking compounds, and/or isotopic labeling of the cross-linking reagent and/or protein, and label-free methods. We report Xlink-Identifier, a comprehensive data analysis platform that has been developed to support label-free analyses. It can identify interpeptide, intrapeptide, and deadend cross-links as well as underivatized peptides. The software streamlines data preprocessing, peptide scoring, and visualization and provides an overall data analysis strategy for studying protein−protein interactions and protein structure using mass spectrometry. The software has been evaluated using a custom synthesized cross-linking reagent that features an enrichment tag. Xlink-Identifier offers the potential to perform large-scale identifications of protein−protein interactions using tandem mass spectrometry.
A Comprehensive and Universal Method for Assessing the Performance of...
plos.figshare.com
tiff
Updated May 30, 2023
Mikhail G. Dozmorov; Joel M. Guthridge; Robert E. Hurst; Igor M. Dozmorov (2023). A Comprehensive and Universal Method for Assessing the Performance of Differential Gene Expression Analyses [Dataset]. http://doi.org/10.1371/journal.pone.0012657
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0012657
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Mikhail G. Dozmorov; Joel M. Guthridge; Robert E. Hurst; Igor M. Dozmorov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The number of methods for pre-processing and analysis of gene expression data continues to increase, often making it difficult to select the most appropriate approach. We present a simple procedure for comparative estimation of a variety of methods for microarray data pre-processing and analysis. Our approach is based on the use of real microarray data in which controlled fold changes are introduced into 20% of the data to provide a metric for comparison with the unmodified data. The data modifications can be easily applied to raw data measured with any technological platform and retain all the complex structures and statistical characteristics of the real-world data. The power of the method is illustrated by its application to the quantitative comparison of different methods of normalization and analysis of microarray data. Our results demonstrate that the method of controlled modifications of real experimental data provides a simple tool for assessing the performance of data preprocessing and analysis methods.
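The idea of introducing controlled fold changes into 20% of real data can be illustrated with a minimal sketch. The function names, the fixed fold change of 2.0, and the ratio-threshold detector below are assumptions made for illustration only; they are not the authors' procedure, and the detector is a naive stand-in for whatever analysis method is being assessed.

```python
import random


def spike_fold_changes(values, fraction=0.2, fold=2.0, seed=0):
    """Multiply a random `fraction` of the values by `fold`.

    Returns the modified list and the set of indices that were changed,
    so the known modifications can serve as ground truth.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_spiked = round(len(values) * fraction)
    spiked = set(rng.sample(range(len(values)), n_spiked))
    modified = [v * fold if i in spiked else v for i, v in enumerate(values)]
    return modified, spiked


def recovered_fraction(original, modified, spiked, threshold=1.5):
    """Share of spiked entries that a simple ratio threshold flags.

    Hypothetical stand-in for a real differential-expression method.
    """
    called = {
        i
        for i, (o, m) in enumerate(zip(original, modified))
        if o > 0 and m / o >= threshold
    }
    return len(called & spiked) / len(spiked) if spiked else 0.0


# Made-up expression values, not taken from the dataset.
expression = [float(i + 1) for i in range(10)]
modified, spiked = spike_fold_changes(expression)
score = recovered_fraction(expression, modified, spiked)
```

Because the spiked indices are known, any pre-processing or analysis method can be scored by how many of them it recovers from the modified data.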
Clustering frequency results for each of the pre- and post-processing...
plos.figshare.com
xls
Updated Jun 14, 2023
Ben O. L. Mellors; Abigail M. Spear; Christopher R. Howle; Kelly Curtis; Sara Macildowie; Hamid Dehghani (2023). Clustering frequency results for each of the pre- and post-processing data-type. [Dataset]. http://doi.org/10.1371/journal.pone.0238647.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0238647.t001
Dataset updated
Jun 14, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Ben O. L. Mellors; Abigail M. Spear; Christopher R. Howle; Kelly Curtis; Sara Macildowie; Hamid Dehghani
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clustering frequency results for each of the pre- and post-processing data-type.
Classification results of the studies analyzed in A State-of-the-Art Review...
zenodo.org
Updated Apr 23, 2025
José Luis Alonso-Rocha; Antonio Martínez-Rojas; José González-Enríquez; Jesús M. Sánchez-Oliva (2025). Classification results of the studies analyzed in A State-of-the-Art Review to Examine the Impact of Intelligent Document Processing in Banking Automations [Dataset]. http://doi.org/10.5281/zenodo.15268178
Unique identifier
https://doi.org/10.5281/zenodo.15268178
Dataset updated
Apr 23, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
José Luis Alonso-Rocha; Antonio Martínez-Rojas; José González-Enríquez; Jesús M. Sánchez-Oliva
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This spreadsheet presents the meticulously classified results from the conducting phase of our systematic literature review titled "From Manual to Automated: A State-of-the-Art Review to Examine the Impact of Intelligent Document Processing in Banking Automation". Each entry within this document represents an individual study analyzed during our research, categorized according to a carefully designed classification framework to ensure a comprehensive and clear understanding of the evolving landscape in banking automation through Intelligent Document Processing (IDP) technologies.

Classification Framework Overview

RQ1. General Study Characterization

Date: indicates the year of publication of the study.

Contribution Source: refers to the type of publication in which the study appears, such as a journal article or conference paper.

Validation: describes the context in which the study’s findings are validated, distinguishing between research environments and industrial or practical applications.

Contribution Type: defines the nature of the study’s main contribution, whether it presents an algorithm, a theoretical analysis, a framework, a method, or a model.

Public Data Exposure: reflects whether the study generates original datasets and makes them publicly accessible, distinguishing between contributions that provide new open data and those that rely on existing sources or do not disclose their data.

RQ2. Machine Learning Approaches and Trends

Learning Paradigm: classifies the study’s learning approach as supervised or unsupervised.

AI Subfield: identifies the primary Artificial Intelligence (AI) domain of the study, such as data mining, computer vision, or natural language processing (NLP).

Model Category: describes the specific type of Machine Learning (ML) model applied in the study, including rule-based models, regression models, clustering, support vector machines, decision trees, or neural networks.

RQ3. Business Automation Strategies

Automation Compatibility: assesses whether the study’s proposal aligns with Robotic Process Automation (RPA) or fits within a broader, more general automation context.

IDP Life Cycle Stage: defines the phase of the IDP life cycle addressed by the study, such as preprocessing, data extraction, or classification.

Business Environment Integration: assesses whether the proposed solution is designed for integration within business environments or remains conceptual.

Data Preparation Techniques: describes the preprocessing steps the study applies to structure and enhance raw inputs, including cleaning, transformation, vectorization, or token and label manipulation.

RQ4. Application Areas

Application Domain: identifies the sector or industry targeted by the study, such as banking, finance, fraud detection, accounting, or auditing.

Case Study: specifies the particular application context or document type addressed by the study, for example, checks, invoices, signatures, or broader document categories.

This classification scheme is instrumental in providing a structured, in-depth analysis of the field's current state, trends, and future directions. The framework aids in navigating the vast amount of information in the domain, offering researchers, practitioners, and policymakers a clear vision of the significant aspects of each study to foster informed decisions and further innovation in banking automations through IDP.
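The classification framework above amounts to a fixed record per study, which can be sketched as a small data structure. This is a minimal illustration only: the class name, field names, field types, and the three example records are assumptions and do not reflect the spreadsheet's actual column headers or contents.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StudyClassification:
    # RQ1. General study characterization
    date: int
    contribution_source: str
    validation: str
    contribution_type: str
    public_data_exposure: bool
    # RQ2. Machine learning approaches and trends
    learning_paradigm: str
    ai_subfield: str
    model_category: str
    # RQ3. Business automation strategies
    automation_compatibility: str
    idp_life_cycle_stage: str
    business_environment_integration: bool
    data_preparation_techniques: tuple
    # RQ4. Application areas
    application_domain: str
    case_study: str


def count_by(studies, field_name):
    """Tally how many studies fall into each value of one framework field."""
    return Counter(getattr(study, field_name) for study in studies)


# Hypothetical records for illustration; not taken from the dataset.
base = StudyClassification(
    date=2021,
    contribution_source="journal article",
    validation="research",
    contribution_type="method",
    public_data_exposure=False,
    learning_paradigm="supervised",
    ai_subfield="computer vision",
    model_category="neural network",
    automation_compatibility="RPA",
    idp_life_cycle_stage="data extraction",
    business_environment_integration=True,
    data_preparation_techniques=("cleaning",),
    application_domain="banking",
    case_study="checks",
)
EXAMPLE_STUDIES = [
    base,
    replace(base, date=2023, case_study="invoices"),
    replace(base, learning_paradigm="unsupervised", model_category="clustering"),
]

paradigm_counts = count_by(EXAMPLE_STUDIES, "learning_paradigm")
```

Tallying one field at a time in this way is how trends per research question (for example, the share of supervised approaches) could be read off such a table.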
Metaverse Gait Authentication Dataset (MGAD)
figshare.com
csv
Updated Feb 11, 2025
sandeep ravikanti (2025). Metaverse Gait Authentication Dataset (MGAD) [Dataset]. http://doi.org/10.6084/m9.figshare.28387664.v1
Available download formats: csv
Unique identifier
https://doi.org/10.6084/m9.figshare.28387664.v1
Dataset updated
Feb 11, 2025
Dataset provided by
figshare
Authors
sandeep ravikanti
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
1. Dataset Overview

The Metaverse Gait Authentication Dataset (MGAD) is a large-scale dataset for gait-based biometric authentication in virtual environments. It consists of gait data from 5,000 simulated users, generated using Unity 3D and processed using OpenPose and MediaPipe. This dataset is ideal for researchers working on biometric authentication, gait analysis, and AI-driven identity verification systems.

2. Data Structure & Format

File Format: CSV

Number of Samples: 5,000 users

Number of Features: 16 gait-based features

Columns: Each row represents a user with corresponding gait feature values

Size: Approximately (mention size in MB/GB after upload)

3. Feature Descriptions

The dataset includes 16 extracted gait features:

Stride Length (m): Average distance covered in one gait cycle.

Step Frequency (steps/min): Number of steps taken per minute.

Stance Phase Duration (s): Duration of the stance phase in a gait cycle.

Swing Phase Duration (s): Duration of the swing phase in a gait cycle.

Double Support Phase Duration (s): Time both feet are in contact with the ground.

Step Length (m): Distance between consecutive foot placements.

Cadence Variability (%): Variability in step rate.

Hip Joint Angle (°): Maximum angle variation in the hip joint.

Knee Joint Angle (°): Maximum flexion-extension knee angle.

Ankle Joint Angle (°): Angle variation at the ankle joint.

Avg. Vertical GRF (N): Average vertical ground reaction force.

Avg. Anterior-Posterior GRF (N): Ground reaction force in the forward-backward direction.

Avg. Medial-Lateral GRF (N): Ground reaction force in the side-to-side direction.

Avg. COP Excursion (mm): Center of pressure movement during stance phase.

Foot Clearance during Swing Phase (mm): Minimum height of the foot during the swing phase.

Gait Symmetry Index (%): Measure of symmetry between left and right gait cycles.

4. How to Use the Dataset

Load the dataset in Python using Pandas.

Use the features for machine learning models in biometric authentication.

Apply preprocessing techniques like normalization and feature scaling.

Train and evaluate deep learning or ensemble models for gait recognition.

5. Citation & License

If you use this dataset, please cite it as follows:

Sandeep Ravikanti, "Metaverse Gait Authentication Dataset (MGAD)," IEEE DataPort, 2025. DOI: https://dx.doi.org/10.21227/rvh5-8842

6. Contact Information

For inquiries or collaborations, please contact: bitsrmit2023@gmail.com
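The MGAD usage steps (load the CSV, then normalize the features) can be sketched as follows. The description names Pandas, but this sketch swaps in Python's standard `csv` module so it runs without extra dependencies. The column headers are assumed to match the feature names listed in the description, and the three sample rows are made-up values, not records from the dataset.

```python
import csv
import io

# Made-up rows standing in for the real file; headers are assumed.
SAMPLE_CSV = """Stride Length (m),Step Frequency (steps/min)
1.20,100
1.40,110
1.60,120
"""


def load_features(handle):
    """Read a CSV of numeric gait features into a list of dicts."""
    reader = csv.DictReader(handle)
    return [{name: float(value) for name, value in row.items()} for row in reader]


def min_max_scale(rows):
    """Rescale every feature to the range [0, 1] across all rows."""
    names = list(rows[0].keys())
    lo = {n: min(r[n] for r in rows) for n in names}
    hi = {n: max(r[n] for r in rows) for n in names}
    return [
        {
            # A constant column has no spread, so map it to 0.0.
            n: (r[n] - lo[n]) / (hi[n] - lo[n]) if hi[n] > lo[n] else 0.0
            for n in names
        }
        for r in rows
    ]


rows = load_features(io.StringIO(SAMPLE_CSV))
scaled = min_max_scale(rows)
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle; with Pandas installed, `pandas.read_csv` gives an equivalent starting point.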
Specificity results for each of the pre- and post-processing data-type.
plos.figshare.com
xls
Updated Jun 14, 2023
Ben O. L. Mellors; Abigail M. Spear; Christopher R. Howle; Kelly Curtis; Sara Macildowie; Hamid Dehghani (2023). Specificity results for each of the pre- and post-processing data-type. [Dataset]. http://doi.org/10.1371/journal.pone.0238647.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0238647.t003
Dataset updated
Jun 14, 2023
Dataset provided by
PLOS ONE
Authors
Ben O. L. Mellors; Abigail M. Spear; Christopher R. Howle; Kelly Curtis; Sara Macildowie; Hamid Dehghani
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Specificity results for each of the pre- and post-processing data-type.
Data from: Preprocessing of Public RNA-sequencing Datasets to Facilitate...
zenodo.org
data.niaid.nih.gov
bin, zip
Updated May 14, 2021
Naomi Rapier Sharman; John Krapohl; Ethan Beausoleil; Kennedy Gifford; Ben Hinatsu; Curtis Hoffman; Makayla Komer; Tiana M. Scott; Brett E. Pickett (2021). Preprocessing of Public RNA-sequencing Datasets to Facilitate Downstream Analyses of Human Diseases: Dataset [Dataset]. http://doi.org/10.5281/zenodo.4757764
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.4757764
Dataset updated
May 14, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Naomi Rapier Sharman; John Krapohl; Ethan Beausoleil; Kennedy Gifford; Ben Hinatsu; Curtis Hoffman; Makayla Komer; Tiana M. Scott; Brett E. Pickett
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Publicly available RNA-sequencing (RNA-seq) data are a rich resource for elucidating the mechanisms of human disease; however, preprocessing these data requires considerable bioinformatic expertise and computational infrastructure. Analyzing multiple datasets with a consistent computational workflow increases the accuracy of downstream meta-analyses. This collection of datasets represents the human intracellular transcriptional response to disorders and diseases such as acute lymphoblastic leukemia (ALL), B-cell lymphomas, chronic obstructive pulmonary disease (COPD), colorectal cancer, and lupus erythematosus, as well as infection with pathogens including Borrelia burgdorferi, hantavirus, influenza A virus, Middle East respiratory syndrome coronavirus (MERS-CoV), Streptococcus pneumoniae, respiratory syncytial virus (RSV), severe acute respiratory syndrome coronavirus (SARS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We calculated the statistically significant differentially expressed genes and Gene Ontology (GO) terms for all datasets. In addition, a subset of the datasets also includes results from splice variant analyses, intracellular signaling pathway enrichments, and read mapping and quantification. All analyses were performed using well-established algorithms and are provided to facilitate future data mining activities and wet lab studies and to accelerate collaboration and discovery.
Data from: Dynamic binning peak detection and assessment of various...
data.niaid.nih.gov
metabolomicsworkbench.org
xml
Updated Sep 25, 2020
Horvatovich Péter (2020). Dynamic binning peak detection and assessment of various lipidomics liquid chromatography-mass spectrometry pre-processing platforms [Dataset]. https://data.niaid.nih.gov/resources?id=st001493
Available download formats: xml
Dataset updated
Sep 25, 2020
Dataset provided by
University of Groningen
Authors
Horvatovich Péter
Variables measured
Metabolomics, Treatment:Pure Plasma, Treatment:Plasma + IS 1, Treatment:Plasma + IS 1/2, Treatment:Plasma + IS 1/4, Treatment:Plasma + IS 1/8, Treatment:Plasma + IS 1/16
Description
Liquid chromatography-mass spectrometry (LC-MS) based lipidomics generates large datasets, whose interpretation requires high-performance data pre-processing tools such as XCMS, mzMine and Progenesis. These pre-processing tools rely heavily on accurate peak detection, which depends on setting the peak detection mass tolerance (PDMT) properly. The PDMT is usually set to a fixed value in either ppm or Da units. However, a fixed value may result in duplicate or missed peaks. Therefore, we developed the dynamic binning method for accurate peak detection, which takes into account the peak broadening described by well-known physical laws of ion separation and sets the value of the PDMT dynamically as a function of m/z. Namely, in our method, the PDMT is proportional to (m/z)^2 for FTICR, to (m/z)^1.5 for Orbitrap, to m/z for Q-TOF, and is constant for a quadrupole mass analyzer. The dynamic binning method was implemented in XCMS. Our further goal was to compare the performance of different lipidomics pre-processing tools in finding differential compounds. We generated a set of samples with 43 lipid internal standards differentially spiked into aliquots of one human plasma lipid sample and measured them using Orbitrap LC-MS/MS. The performance of the various pipelines using aligned parameter sets was quantified by a quality score system that reflects the ability of a pre-processing pipeline to detect differential peaks spiked at various concentration levels. The quality score indicates that the dynamic binning method improves the performance of XCMS (maximum p-value 9.8·10⁻³ in a two-sample Wilcoxon test). The modified XCMS software was further compared with mzMine and Progenesis. The results showed that modified XCMS and Progenesis performed similarly well in finding differential compounds. In addition, Progenesis shows lower variability, as indicated by lower CVs, followed by XCMS and mzMine. The lower variability of Progenesis improves quantification; however, it yields an incorrect abundance order for the spiked-in internal standards.
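The dependence of the PDMT on m/z described above can be sketched as a small function. The sketch assumes the usual peak-width scaling laws for each analyzer type, namely (m/z)^2 for FTICR, (m/z)^1.5 for Orbitrap, m/z for Q-TOF, and a constant for a quadrupole. The function name, the reference point (a tolerance of 0.001 Da at m/z 200), and the dictionary are hypothetical and are not taken from the authors' XCMS implementation.

```python
# Assumed exponent of m/z in the peak-width scaling law per analyzer type.
PDMT_EXPONENT = {
    "FTICR": 2.0,
    "Orbitrap": 1.5,
    "Q-TOF": 1.0,
    "Quadrupole": 0.0,
}


def dynamic_pdmt(mz, analyzer, ref_mz=200.0, ref_tol_da=0.001):
    """Return a mass tolerance in Da that scales with m/z.

    The tolerance equals `ref_tol_da` at `ref_mz` (both hypothetical
    defaults) and grows as (mz / ref_mz) ** exponent for the analyzer.
    """
    if mz <= 0 or ref_mz <= 0:
        raise ValueError("m/z values must be positive")
    exponent = PDMT_EXPONENT[analyzer]
    return ref_tol_da * (mz / ref_mz) ** exponent


# Tolerance at m/z 800 for each analyzer, relative to the same reference.
tolerances = {name: dynamic_pdmt(800.0, name) for name in PDMT_EXPONENT}
```

With a fixed tolerance, the same setting would be either too wide at low m/z or too narrow at high m/z for every analyzer except the quadrupole, which is the motivation the description gives for setting the value dynamically.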
Additional file 1 of Multivariate pattern analysis: a method and software to...
springernature.figshare.com
xlsx
Updated Feb 1, 2024
Tim U. H. Baumeister; Eivind Aadland; Roger G. Linington; Olav M. Kvalheim (2024). Additional file 1 of Multivariate pattern analysis: a method and software to reveal, quantify, and visualize predictive association patterns in multicollinear data [Dataset]. http://doi.org/10.6084/m9.figshare.25123885.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.25123885.v1
Dataset updated
Feb 1, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Tim U. H. Baumeister; Eivind Aadland; Roger G. Linington; Olav M. Kvalheim
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 1. Data analyzed in this work after preprocessing but prior to any adjustments.