16 datasets found
  1. Data from: High-Accuracy Peptide Mass Fingerprinting Using Peak Intensity...

    • figshare.com
    • acs.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Dongmei Yang; Kevin Ramkissoon; Eric Hamlett; Morgan C. Giddings (2023). High-Accuracy Peptide Mass Fingerprinting Using Peak Intensity Data with Machine Learning [Dataset]. http://doi.org/10.1021/pr070088g.s005
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    ACS Publications
    Authors
    Dongmei Yang; Kevin Ramkissoon; Eric Hamlett; Morgan C. Giddings
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    For MALDI-TOF mass spectrometry, we show that the intensity of a peptide–ion peak is directly correlated with its sequence, with the residues M, H, P, R, and L having the most substantial effect on ionization. We developed a machine learning approach that exploits this relationship to significantly improve peptide mass fingerprint (PMF) accuracy based on training data sets from both true-positive and false-positive PMF searches. The model’s cross-validated accuracy in distinguishing real versus false-positive database search results is 91%, rivaling the accuracy of MS/MS-based protein identification.

  2. Metabolomic fingerprints of individual algal cells using the Single-Probe...

    • search.dataone.org
    • bco-dmo.org
    • +1 more
    Updated Dec 5, 2021
    Cite
    Boris Wawrik; Zhibo Yang (2021). Metabolomic fingerprints of individual algal cells using the Single-Probe Mass Spectrometry technique from experiments conducted between May and August of 2017 [Dataset]. https://search.dataone.org/view/sha256%3A381fb884dfb472caf19efcb9c40b25790fe18fcd708e35995ae7e48723780fd6
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    Boris Wawrik; Zhibo Yang
    Time period covered
    May 1, 2017 - Aug 1, 2017
    Description

    These data were published in Sun et al., 2018.

    A tabular version of this dataset is accessible by clicking the "Get Data" button on this page. The columns of the tabular dataset are described in the "Parameters" section on this page. This dataset is available as standard MS file format (.mzML) files within the following tar.gz file:
    Algal_Cell_Data_mzML.tar.gz (2.4 GB, contains 20 .mzML files)

    .mzML is a standard MS file format that can be viewed using freeware such as mMass (http://www.mmass.org/) and ProteoWizard (http://proteowizard.sourceforge.net/index.shtml).
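Because mzML is XML-based, a quick sanity check of the files does not strictly need a dedicated viewer. The sketch below counts `<spectrum>` elements using only the Python standard library. It is a minimal illustration, not a replacement for mMass or ProteoWizard: the function names are hypothetical, the sample document is synthetic, and the internal layout of the archive (member names ending in `.mzML`) is an assumption.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def count_spectra(mzml_source):
    """Count <spectrum> elements in an mzML stream, parsing incrementally."""
    count = 0
    for _event, elem in ET.iterparse(mzml_source, events=("end",)):
        # mzML tags are namespaced, e.g. "{http://psi.hupo.org/ms/mzml}spectrum".
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            count += 1
            elem.clear()  # release memory held by the finished element
    return count


def count_spectra_in_archive(path):
    """Return {member name: spectrum count} for .mzML members of a tar.gz file.

    Assumes the members of interest end in ".mzML" (not verified against
    the real archive).
    """
    counts = {}
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".mzML"):
                counts[member.name] = count_spectra(tar.extractfile(member))
    return counts


# Tiny synthetic document standing in for a real file.
SAMPLE = b"""<?xml version="1.0"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="demo">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1"/>
      <spectrum index="1" id="scan=2"/>
    </spectrumList>
  </run>
</mzML>"""

n_spectra = count_spectra(io.BytesIO(SAMPLE))
```

For the real data, `count_spectra_in_archive("Algal_Cell_Data_mzML.tar.gz")` would be the corresponding call, assuming the archive has been downloaded to the working directory.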

  3. SaganMC: A molecular complexity dataset with mass spectra

    • ora.ox.ac.uk
    csv
    Updated Jan 1, 2025
    Cite
    Baydin, AG; Bell, A; Gebhard, T; Gong, J; Hastings, J; Fricke, M; Phillips, M; Warren-Rhodes, K; Cabrol, N; Mascaro, M; Sandford, S (2025). SaganMC: A molecular complexity dataset with mass spectra [Dataset]. http://doi.org/10.5287/ora-vyqqmdonx
    Available download formats: csv(356803750), csv(40237369)
    Dataset updated
    Jan 1, 2025
    Dataset provided by
    University of Oxford
    Authors
    Baydin, AG; Bell, A; Gebhard, T; Gong, J; Hastings, J; Fricke, M; Phillips, M; Warren-Rhodes, K; Cabrol, N; Mascaro, M; Sandford, S
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SaganMC: A Molecular Complexity Dataset with Mass Spectra

    Summary

    SaganMC is a machine learning-ready dataset designed for molecular complexity prediction, spectral analysis, and chemical discovery. Molecular complexity metrics quantify how structurally intricate a molecule is, reflecting how difficult it is to construct or synthesize.

    The dataset includes 406,446 molecules. A subset of 16,653 molecules includes experimental mass spectra. We provide standard representations (SMILES, InChI, SELFIES), RDKit-derived molecular descriptors, Morgan fingerprints, and three complementary complexity scores: Bertz, Böttcher, and the Molecular Assembly Index (MA). MA scores, computed using code from the Cronin Group, are especially relevant to astrobiology research as potential agnostic biosignatures. Assigning MA indices to molecules is compute intensive, and generating this dataset required over 100,000 CPU hours on Google Cloud.

    SaganMC is named in honor of Carl Sagan, the astronomer and science communicator whose work inspired generations to explore life beyond Earth. The initial version of this dataset was produced during a NASA Frontier Development Lab (FDL) astrobiology sprint.

    Intended Uses

    * Train machine learning models to predict molecular complexity directly from molecular structure or mass spectrometry data.
    * Develop surrogate models to approximate Molecular Assembly Index (MA) scores efficiently at large scale.
    * Benchmark complexity metrics (Bertz, Böttcher, MA) across diverse molecular classes.
    * Enable onboard ML pipelines for spacecraft to prioritize high-complexity chemical targets during exploration.
    * Explore correlations between molecular complexity and experimental observables such as mass spectra.
    * Support AI-driven chemical discovery tasks.

    Available Formats

    CSV

    The original dataset is in CSV format.

    * SaganMC-400k (sagan-mc-400k.csv): The full dataset with 406,446 molecules, including structural and complexity features.
    * SaganMC-Spectra-16k (sagan-mc-spectra-16k.csv): A 16,653-molecule subset of the full dataset, with experimental mass spectra from NIST.

    Dataset Structure

    The data includes 36 columns. A split column assigns rows into train, val, or test splits (80/10/10).

    Features

    * inchi: International Chemical Identifier (InChI).
    * inchikey: Hashed version of the InChI string, used for indexing.
    * selfies: SELFIES (SELF-referencIng Embedded Strings) representation of the molecule.
    * smiles: SMILES (Simplified Molecular Input Line Entry System) representation of the molecule.
    * smiles_scaffold: Murcko scaffold representation extracted from the molecule.
    * formula: Molecular formula.
    * fingerprint_morgan: Base64-encoded 2048-bit Morgan fingerprint (ECFP4) with chirality.
    * num_atoms: Number of heavy atoms (excluding hydrogens).
    * num_atoms_all: Total number of atoms (including hydrogens).
    * num_bonds: Number of bonds between heavy atoms.
    * num_bonds_all: Total number of bonds (including to hydrogens).
    * num_rings: Number of rings in the molecule.
    * num_aromatic_rings: Number of aromatic rings in the molecule.
    * physchem_mol_weight: Molecular weight (Daltons).
    * physchem_logp: LogP, a measure of lipophilicity.
    * physchem_tpsa: Topological Polar Surface Area, related to hydrogen bonding.
    * physchem_qed: Quantitative Estimate of Drug-likeness.
    * physchem_h_acceptors: Number of hydrogen bond acceptors.
    * physchem_h_donors: Number of hydrogen bond donors.
    * physchem_rotatable_bonds: Number of rotatable bonds.
    * physchem_fraction_csp3: Fraction of sp3-hybridized carbon atoms.
    * mass_spectrum_nist: Mass spectrum data sourced from the NIST Chemistry WebBook, encoded in JCAMP-DX format as a string. Includes metadata, units, and a peak table.
    * complex_ma_score: Molecular Assembly Index score (pathway complexity).
    * complex_ma_runtime: Wall-clock runtime (in seconds) to compute MA score.
    * complex_bertz_score: Bertz/Hendrickson/Ihlenfeldt (BHI) complexity score.
    * complex_bertz_runtime: Wall-clock runtime (in seconds) to compute BHI score.
    * complex_boettcher_score: Böttcher complexity score, based on atom environments.
    * complex_boettcher_runtime: Wall-clock runtime (in seconds) to compute Böttcher score.
    * synth_sa_score: Synthetic accessibility score (SAScore)
    * meta_cas_number: CAS Registry Number, if available.
    * meta_names: Common names or synonyms for the molecule.
    * meta_iupac_name: IUPAC name for the molecule.
    * meta_comment: Optional comments associated with the molecule.
    * meta_origin: Source or origin information for the molecule.
    * meta_reference: Reference or source citation for the molecule.
    * split: Predefined data split (train, val, test).
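As an illustration of how the columns listed above might be consumed, the following sketch tallies rows per `split` value and decodes a `fingerprint_morgan` entry. It uses only the standard library and synthetic rows. One assumption is labeled loudly: the Base64 payload is taken to be the raw 256 bytes of the 2048-bit vector, which the description does not confirm. The helper names are hypothetical.

```python
import base64
import csv
import io
from collections import Counter


def split_counts(csv_file):
    """Count rows per value of the 'split' column (train, val, test)."""
    reader = csv.DictReader(csv_file)
    return Counter(row["split"] for row in reader)


def fingerprint_popcount(b64_text):
    """Decode a Base64 fingerprint and return the number of set bits.

    ASSUMPTION: the payload is the raw packed bit vector (2048 bits,
    i.e. 256 bytes). A different serialization would need other handling.
    """
    raw = base64.b64decode(b64_text)
    if len(raw) * 8 != 2048:
        raise ValueError(f"expected 2048 bits, got {len(raw) * 8}")
    return sum(bin(byte).count("1") for byte in raw)


# Synthetic fingerprint with exactly two bits set (0b00000101 in the first byte).
demo_fp = base64.b64encode(bytes([0b00000101] + [0] * 255)).decode("ascii")

# Synthetic three-row table using a few of the documented column names.
demo_csv = io.StringIO(
    "smiles,fingerprint_morgan,split\n"
    f"CCO,{demo_fp},train\n"
    f"CC(=O)O,{demo_fp},train\n"
    f"c1ccccc1,{demo_fp},test\n"
)

counts = split_counts(demo_csv)
bits_set = fingerprint_popcount(demo_fp)
```

With the real files, `open("sagan-mc-400k.csv", newline="")` would replace the in-memory table; the 80/10/10 proportions stated above could then be checked against `counts`.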


    Data Generation

    For a detailed description of how an earlier version of this dataset was generated, please consult the following technical report:

    Bell, Aaron C., Timothy D. Gebhard, Jian Gong, Jaden J. A. Hastings, Atılım Güneş Baydin, G. Matthew Fricke, Michael Phillips, Kimberley Warren-Rhodes, Nathalie A. Cabrol, Massimo Mascaro, and Scott Sandford. 2022. Signatures of Life: Learning Features of Prebiotic and Biotic Molecules. NASA Frontier Development Lab Technical Memorandum.
    https://oxai4science.github.io/assets/pdf/bell-2022-molecules.pdf

    Sources and Tools Used

    * ChEMBL: https://www.ebi.ac.uk/chembl/
    * RDKit: https://www.rdkit.org/
    * Chemical Identifier Resolver, NCI/CADD Group: https://cactus.nci.nih.gov/
    * NIST Chemistry WebBook: https://webbook.nist.gov/
    * Go code for MA complexity computation, obtained privately from the Cronin group:
    https://www.chem.gla.ac.uk/cronin/
    * Böttcher complexity score code from Boskovic group:
    https://github.com/boskovicgroup/bottchercomplexity

    Citation

    Please cite the following if you use this dataset:

    @inproceedings{gebhard-2022-molecular,
      title = {Inferring molecular complexity from mass spectrometry data using machine learning},
      author = {Gebhard, Timothy D. and Bell, Aaron and Gong, Jian and Hastings, Jaden J.A. and Fricke, George M. and Cabrol, Nathalie and Sandford, Scott and Phillips, Michael and Warren-Rhodes, Kimberley and Baydin, {Atılım Güneş}},
      booktitle = {Machine Learning and the Physical Sciences workshop, NeurIPS 2022},
      year = {2022}
    }

    Acknowledgments

    This work was enabled by and carried out during an eight-week research sprint as part of the Frontier Development Lab (FDL), a public-private partnership between NASA, the U.S. Department of Energy, the SETI Institute, Trillium Technologies, and leaders in commercial AI, space exploration, and Earth sciences, formed with the purpose of advancing the application of machine learning, data science, and high performance computing to problems of material concern to humankind.

    We thank Google Cloud and the University of New Mexico Center for Advanced Research Computing for providing the compute resources critical to completing this work. GMF was funded by NASA Astrobiology NfoLD grant #80NSSC18K1140. We also thank the Cronin Group at the University of Glasgow for their collaboration, and for providing us with the code for computing MA values.

  4. Dataset Advancement: Glycan Composition and Peptide Mass Fingerprinting of...

    • data.mendeley.com
    Updated Sep 18, 2023
    Cite
    Rafael Maldonado-Hernandez (2023). Dataset Advancement: Glycan Composition and Peptide Mass Fingerprinting of the Native Torpedo californica Nicotinic Acetylcholine Receptor after a Multi-Step Sequential Purification Method [Dataset]. http://doi.org/10.17632/zbfxf8kwvn.1
    Dataset updated
    Sep 18, 2023
    Authors
    Rafael Maldonado-Hernandez
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The study described in this dataset focuses on the characterization of the nicotinic acetylcholine receptor (nAChR), a member of the Cys-loop pentameric ligand-gated ion channels that play critical roles in organismal function. Our research efforts have resulted in the successful production of a highly pure nAChR, which has allowed for more robust studies to be conducted, including high-throughput crystallization experiments. Using advanced technologies such as nano-LC-MS/MS and MALDI-ToF/ToF, we have identified and extensively characterized each nAChR subunit, achieving 100% identity. Furthermore, we have investigated the N-linked glycans in the Torpedo californica-nAChR (Tc-nAChR) subunits. By digesting the Tc-nAChR subunits with PNGase F and analyzing the released glycans with MALDI-ToF, we identified the presence of high-mannose N-glycan in all native Tc-nAChR subunits. Specifically, we observed the oligomannose population Man8-9GlcNAc2 with peaks at m/z 1742 and 1904 ([M + Na]+ ions). Overall, this study emphasizes the importance of continued research efforts aimed at understanding the complex biology of the Cys-loop pentameric ligand-gated ion channels and their involvement in organismal function and disease. The insights gained from such studies could have far-reaching implications for the development of new therapeutic strategies to combat a wide range of health issues.

  5. Data from: Mathematical chromatography deciphers the molecular fingerprints...

    • data.niaid.nih.gov
    zip
    Updated Jan 17, 2020
    Cite
    Urban J. Wünsch; Jeffrey A. Hawkes (2020). Mathematical chromatography deciphers the molecular fingerprints of dissolved organic matter [Dataset]. http://doi.org/10.5061/dryad.nk98sf7pp
    Available download formats: zip
    Dataset updated
    Jan 17, 2020
    Dataset provided by
    Chalmers University of Technology
    Uppsala University
    Authors
    Urban J. Wünsch; Jeffrey A. Hawkes
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    High-resolution mass spectrometry (HRMS) elucidates the molecular composition of dissolved organic matter (DOM) through the unequivocal assignment of molecular formulas. When HRMS is used as a detector coupled to high performance liquid chromatography (HPLC), the molecular fingerprints of DOM are further augmented. However, the identification of eluting compounds remains impossible when DOM chromatograms consist of unresolved humps. Here, we utilized the concept of mathematical chromatography to achieve information reduction and feature extraction. Parallel Factor Analysis (PARAFAC) was applied to a dataset describing the reverse-phase separation of DOM in headwater streams located in southeast Sweden. A dataset consisting of 1355 molecular formulas and 7178 mass spectra was reduced to five components that described 96.89% of the data. Each component summarized the distinct chromatographic elution of molecular formulas with different polarity. Component scores represented the abundance of the identified HPLC features in each sample. Using this chemometric approach allowed the identification of common patterns in HPLC–HRMS datasets by reducing thousands of mass spectra to only a few statistical components. Unlike in principal component analysis (PCA), components closely followed the analytical principles of HPLC–HRMS and therefore represented more realistic pools of DOM. This approach provides a wealth of new opportunities for unravelling the composition of complex mixtures in natural and engineered systems.

    Methods

    Dataset1.zip

    Samples were stored unfiltered in the dark at 4° C for approximately five months after sampling.
    On the day of measurements, specific volumes of samples were transferred to 2 mL Eppendorf vials so that 11.25 µg carbon was present in each sample vial, while 2 mL of blanks were transferred.
    The water in samples and blanks was subsequently removed by vacuum evaporation at 45° C, after which samples were reconstituted in 150 µL 1 % (v/v) formic acid to a final concentration of 75 mg/L carbon.
    
    Reverse-phase chromatography separations were performed on an Agilent 1100 series instrument with an Agilent PLRP‑S series column (150 x 1 mm, 3 µm bed size, 100 Å pore size). Eighty µL sample was loaded at a flow rate of 100 µL min⁻¹ 0.1 % formic acid, 0.05 % ammonia, and 5 % acetonitrile. The elution of DOM was achieved through a step-wise increase in concentration of solvent B (100 % acetonitrile) from zero initially, followed by 20 %, and ending in > 45 % solvent B.
    
    
    Mass spectrometry detection was carried out with an Orbitrap LTQ-Velos-Pro (Thermo Scientific, Germany) with electrospray ionization (ESI, negative mode) as ion source. Transient ions were collected in the range of m/z 150–1000 at an instrumental resolving power set to 10⁵. An external calibration with the manufacturer’s calibration mixture was followed by an internal calibration using six ubiquitous ions in the range of m/z 251–493.
    
    
    Transients were filtered for noise after considering peaks with mass defect 0.6-0.8 as noise and removing all peaks with intensity lower than the mean + 3 standard deviations of these peaks. Molecular formulas were assigned within the range C4-40, H4-80, O1-40, N0-1, S0-1 in the mass range m/z 170 – 700. Additionally, assignments were constricted to O/C 0-1, H/C 0.3 ‑2, a double bond equivalent minus oxygen less than or equal to 10, and a mass defect of ‑0.1 to 0.3 (decimal after the nominal mass).
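The noise rule and the formula constraints in the paragraph above can be sketched as follows. This is an illustrative reading, not the authors' code: the function names are hypothetical, the sample standard deviation is an assumption (the text does not say which estimator was used), and the mass-range and mass-defect constraints on assigned formulas are left out for brevity. The double bond equivalent uses the standard expression DBE = C - H/2 + N/2 + 1.

```python
import math
import statistics


def noise_threshold(peaks):
    """Mean + 3 SD of the intensities of peaks with mass defect 0.6-0.8.

    peaks: list of (m/z, intensity) tuples. Needs at least two noise peaks.
    ASSUMPTION: sample standard deviation (statistics.stdev).
    """
    noise = [inten for mz, inten in peaks if 0.6 <= mz - math.floor(mz) <= 0.8]
    return statistics.mean(noise) + 3 * statistics.stdev(noise)


def filter_peaks(peaks):
    """Drop every peak whose intensity is lower than the noise threshold."""
    threshold = noise_threshold(peaks)
    return [(mz, inten) for mz, inten in peaks if inten >= threshold]


def passes_constraints(c, h, o, n=0, s=0):
    """Check element ranges, O/C, H/C and DBE minus O for a CHONS formula."""
    dbe = c - h / 2 + n / 2 + 1
    return (
        4 <= c <= 40 and 4 <= h <= 80 and 1 <= o <= 40
        and 0 <= n <= 1 and 0 <= s <= 1
        and 0 <= o / c <= 1
        and 0.3 <= h / c <= 2
        and dbe - o <= 10
    )


# Synthetic transient: three "noise" peaks (mass defect 0.6-0.8) and two others.
demo_peaks = [
    (200.70, 10.0), (201.65, 12.0), (202.75, 14.0),
    (203.10, 500.0), (204.05, 15.0),
]
threshold = noise_threshold(demo_peaks)
kept = filter_peaks(demo_peaks)
```

In this synthetic example the noise intensities are 10, 12 and 14, so the threshold is 12 + 3 × 2 = 18 and only the peak at m/z 203.10 survives.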
    
    
    Formulas detected in process blanks were excluded from further analysis. Formulas were also removed from consideration in samples if the intensity did not exceed the noise + 10 standard deviations in at least 10 sequential transients at some point in the elution. This molecular formula assignment and data treatment yielded 2052 unique molecular formulas. Several sequential intensities (typically 3-4) were summed to a chromatographic resolution of 0.1 min to favour analyte signals over instrument noise and to reduce computational requirements.
    
    
    To yield a more quantitative dataset in subsequent analyses, the DOC normalization was reversed by accounting for the sample specific volume that yielded the constant amount of carbon dissolved for chromatographic analysis. For statistical modelling, the retention time window of 5.0 ‑22.9 min was selected, yielding a preliminary dataset size of 74 samples x 2052 molecular formulas x 180 retention times (dataset_1.zip).
    

    Dataset2.zip

    1. Dataset 1 was the source of dataset 2.
    2. All mass spectra were divided by a factor of 4.92 x 10⁷.
    3. Masses that were detected in less than 10 % of measurements (including samples and retention times) were excluded from further analysis (N = 661, Dataset4.zip).
    4. An additional 36 molecular formulas (Dataset3.zip) were removed from the dataset due to noticeably unique chromatograms.
    5. Chromatographic sections with missing observations of at least 2 min (20 observations or more) were set to zero while leaving a gap of missing numbers of 0.7 min to each end of the section.
    6. Every 2nd retention time (after t = 7 min) was excluded.
    7. All data above retention times of 22.2 min was excluded.
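The counts quoted in the Dataset1 and Dataset2 descriptions are mutually consistent, which a short calculation makes explicit. The variable names below are illustrative only; the numbers come from the text above.

```python
# Figures quoted in the description.
formulas_dataset1 = 2052   # unique molecular formulas after assignment
rarely_observed = 661      # detected in < 10 % of measurements (Dataset4)
outliers = 36              # noticeably unique chromatograms (Dataset3)

# Formulas remaining in Dataset2 after both exclusions.
formulas_dataset2 = formulas_dataset1 - rarely_observed - outliers

# Number of cells in the preliminary Dataset1 array:
# 74 samples x 2052 molecular formulas x 180 retention times.
cells_dataset1 = 74 * 2052 * 180

print(formulas_dataset2)  # 1355
print(cells_dataset1)     # 27332640
```

The result of 1355 matches the number of molecular formulas cited in the abstract for the modelled dataset.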
    

    Dataset3.zip

    Dataset3 contains outliers that were removed in step 4 of Dataset2
    

    Dataset4.zip

    Dataset4 contains rarely observed formulas that were removed in step 3 of Dataset2
    

    Dataset5.zip

    To isolate groups of molecular formulas with identical chromatographic elution profiles, parallel factor analysis (PARAFAC) was utilized. All data processing and modelling was carried out using PLS_Toolbox (v8.61, Eigenvector Research Inc.) in MATLAB (v9.7, MathWorks Inc.). PARAFAC models were constrained to nonnegativity in all modes and the convergence criterion was set to a relative change in fitting error between iterations of 10⁻¹². Each model was initialized 50 times with orthogonalized random numbers and only the least squares solution was further inspected. Models with two to nine components were considered. A five-component model was validated. Dataset5 contains its properties and supporting geochemical sample parameters.

    Dataset6.zip

    Dataset6 contains the residual chromatograms (data minus model = dataset2 minus dataset5) for every sample and formula. To create one file, the residuals were unfolded into one large matrix. Each formula in every sample is accompanied by a tag that categorizes the residual chromatogram. The categories were assigned as follows (numbers correspond to the numbering scheme in the csv file):

    (1) Underestimations are chromatograms in which more than 80% of residuals were positive.

    (2) Overestimations are chromatograms in which more than 80% of residuals were negative.

    (3) False positive abundances were identified by counting the cases in which PARAFAC estimated a non-zero chromatogram, but the data only contained zeros or missing observations.

    (5) Residuals were classified as random when they did not fall into any other category, their absolute median was < 0.001, and the number of positive and negative residuals each accounted for between 40 and 60 % of the raw chromatograms (not counting zeros or missing observations).

    (NaN) Residuals that did not fall into any of the above categories were left "uncategorized".

    The number "4" was not used in this listing.
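One possible reading of the tagging rules above, written as a small classifier. Several points are assumptions rather than facts from the description: the order in which the rules are tested, the interpretation of "absolute median" as the absolute value of the median residual, and the choice to compute the percentages over observations that are neither zero nor missing. The function name is hypothetical and missing values are represented as `None`.

```python
import math
import statistics


def classify_residuals(data, model):
    """Return 1, 2, 3, 5 or NaN for one chromatogram (data vs. PARAFAC model).

    data:  observed intensities, with None for missing observations
    model: modelled intensities of the same length
    """
    observed = [(d, m) for d, m in zip(data, model) if d is not None and d != 0]
    if not observed:
        # Data holds only zeros/missing values; a non-zero model is a false positive.
        return 3 if any(m != 0 for m in model) else math.nan

    residuals = [d - m for d, m in observed]  # data minus model
    n = len(residuals)
    frac_pos = sum(r > 0 for r in residuals) / n
    frac_neg = sum(r < 0 for r in residuals) / n

    if frac_pos > 0.8:
        return 1  # underestimation
    if frac_neg > 0.8:
        return 2  # overestimation
    if (abs(statistics.median(residuals)) < 0.001
            and 0.4 <= frac_pos <= 0.6 and 0.4 <= frac_neg <= 0.6):
        return 5  # random
    return math.nan  # uncategorized


under = classify_residuals([1.0, 2.0, 3.0, 2.0, 1.0], [0.5, 1.5, 2.5, 1.5, 0.5])
over = classify_residuals([0.5, 1.5, 2.5, 1.5, 0.5], [1.0, 2.0, 3.0, 2.0, 1.0])
false_pos = classify_residuals([0, None, 0], [0.1, 0.2, 0.1])
random_tag = classify_residuals([1.0, 1.0, 1.0, 1.0], [1.0002, 0.9998, 1.0003, 0.9997])
untagged = classify_residuals([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 1.5])
```

The tags in Dataset6 itself remain authoritative; this sketch only helps when reading them.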

  6. HS-GC-IMS data of fermentations of different organisms

    • data.mendeley.com
    Updated Jun 26, 2023
    Cite
    Philipp Weller (2023). HS-GC-IMS data of fermentations of different organisms [Dataset]. http://doi.org/10.17632/v9gxkpdp3c.1
    Dataset updated
    Jun 26, 2023
    Authors
    Philipp Weller
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains raw headspace GC-IMS data in the .mea file format (Gesellschaft für analytische Sensorsysteme mbH, Dortmund, Germany) of shake flask fermentations of different pure and mixed cultures. The aim was to differentiate the microorganisms and detect mixed cultures compared to pure cultures by non-target profiling of volatile metabolites (mVOC) as a first step towards a new analytical method for early contamination detection. Microorganisms were obtained from the German Collection of Microorganisms and Cell Cultures GmbH (DSMZ, Braunschweig, Germany, https://www.dsmz.de). Levilactobacillus brevis (DSM 2647), Escherichia coli (DSM 498), Saccharomyces cerevisiae (DSM 70449) and Pseudomonas fluorescens (DSM 6147) were used. The culture medium was a modified De Man, Rogosa, and Sharpe (MRS) medium, adapted from https://www.dsmz.de/microorganisms/medium/pdf/DSMZ_Medium11.pdf. The inhibiting compounds ammonium citrate and sodium acetate were omitted. Fermentations were carried out in 300 mL shake flasks at 30°C and 180 rpm on a shaking incubator for 6 hours. The dataset contains four batches of each organism (GCIMS_pure_cultures folder) and two to four batches of each combination of two organisms (GCIMS_mixed_cultures folder). 2 mL of fermentation broth were sampled every hour for headspace GC-IMS analysis. Measurements were performed using an Agilent 6890 GC (Agilent Technologies Deutschland GmbH, Waldbronn, Germany) equipped with an OEM IMS detector (Gesellschaft für analytische Sensorsysteme mbH, Dortmund, Germany). A non-polar ZB-5 GC column (Phenomenex Inc., Torrance, USA) was used.

  7. Data from: Workflow for Evaluating Normalization Tools for Omics Data Using...

    • acs.figshare.com
    txt
    Updated Oct 28, 2023
    Cite
    Aleesa E. Chua; Leah D. Pfeifer; Emily R. Sekera; Amanda B. Hummon; Heather Desaire (2023). Workflow for Evaluating Normalization Tools for Omics Data Using Supervised and Unsupervised Machine Learning [Dataset]. http://doi.org/10.1021/jasms.3c00295.s001
    Available download formats: txt
    Dataset updated
    Oct 28, 2023
    Dataset provided by
    ACS Publications
    Authors
    Aleesa E. Chua; Leah D. Pfeifer; Emily R. Sekera; Amanda B. Hummon; Heather Desaire
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    To achieve high quality omics results, systematic variability in mass spectrometry (MS) data must be adequately addressed. Effective data normalization is essential for minimizing this variability. The abundance of approaches and the data-dependent nature of normalization have led some researchers to develop open-source academic software for choosing the best approach. While these tools are certainly beneficial to the community, none of them meet all of the needs of all users, particularly users who want to test new strategies that are not available in these products. Herein, we present a simple and straightforward workflow that facilitates the identification of optimal normalization strategies using straightforward evaluation metrics, employing both supervised and unsupervised machine learning. The workflow offers a “DIY” aspect, where the performance of any normalization strategy can be evaluated for any type of MS data. As a demonstration of its utility, we apply this workflow on two distinct datasets, an ESI-MS dataset of extracted lipids from latent fingerprints and a cancer spheroid dataset of metabolites ionized by MALDI-MSI, for which we identified the best-performing normalization strategies.

  8. McGill MS 73, Peptide Mass Fingerprinting

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Fiddyment, Sarah (2023). McGill MS 73, Peptide Mass Fingerprinting [Dataset]. http://doi.org/10.5683/SP3/MEDVXE
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Fiddyment, Sarah
    Time period covered
    Jan 1, 1400 - Jan 1, 1500
    Description

    Peptide Mass Fingerprint Spectra MALDI data sets. Samples taken from five locations from McGill MS 73: middle lower pastedown on interior right board, shiny substance on surface of wood upper right board, interior, where there is also a lot of dark residue on leather turn-in of inner right board; hoping for adhesive as well as the hide of the leather pulp loose on surface of wood; probably(?) insect frass. Insect frass in grooves in board of inside lower cover (beneath parchment endpaper). Samples analyzed / data produced by Sarah Fiddyment as a collaboration for The Book and The Silk Roads project. Sample Numbers: BSRP15, BSRP16, BSRP17, BSRP18, BSRP19, BSRP43.

  9. Data from: Mathematical chromatography deciphers the molecular fingerprints...

    • datadryad.org
    zip
    Updated Jan 17, 2020
    Cite
    Urban J. Wünsch; Jeffrey A. Hawkes (2020). Mathematical chromatography deciphers the molecular fingerprints of dissolved organic matter [Dataset]. http://doi.org/10.5061/dryad.nk98sf7pp
    Available download formats: zip
    Dataset updated
    Jan 17, 2020
    Dataset provided by
    Dryad
    Authors
    Urban J. Wünsch; Jeffrey A. Hawkes
    Time period covered
    2020
    Description

    Dataset1.zip

    Samples were stored unfiltered in the dark at 4° C for approximately five months after sampling.
    On the day of measurements, specific volumes of samples were transferred to 2 mL Eppendorf vials so that 11.25 µg carbon was present in each sample vial, while 2 mL of blanks were transferred.
    The water in samples and blanks was subsequently removed by vacuum evaporation at 45° C, after which samples were reconstituted in 150 µL 1 % (v/v) formic acid to a final concentration of 75 mg/L carbon.
    
    Reverse-phase chromatography separations were performed on an Agilent 1100 series instrument with an Agilent PLRP‑S series column (150 x 1 mm, 3 µm bed size, 100 Å pore size). Eighty µL sample was loaded at a flow rate of 100 µL min⁻¹ 0.1 % formic acid, 0.05 % ammonia, and 5 % acetonitrile. The elution of DOM was achieved through a step-wise increase in concentrat...
    
  10. Data from: Assessing Reliability of Non-targeted High-Resolution Mass...

    • acs.figshare.com
    xlsx
    Updated Jun 16, 2023
    Cite
    Katherine T. Peter; Edward P. Kolodziej; John R. Kucklick (2023). Assessing Reliability of Non-targeted High-Resolution Mass Spectrometry Fingerprints for Quantitative Source Apportionment in Complex Matrices [Dataset]. http://doi.org/10.1021/acs.analchem.1c03202.s002
    Available download formats: xlsx
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    ACS Publications
    Authors
    Katherine T. Peter; Edward P. Kolodziej; John R. Kucklick
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Effective management of contaminated sites requires differentiating and deconvoluting contaminant source impacts in complex environmental systems. The existing source apportionment approaches that use targeted analyses of preselected indicator chemicals are limited whenever target analytes are below the detection limits or derived from multiple sources. However, non-targeted analyses that leverage high-resolution mass spectrometry (HRMS) yield rich datasets that deeply characterize sample-specific chemical compositions, providing additional potential end-members for source differentiation and apportionment. Previous work demonstrated that HRMS fingerprints can define sample uniqueness and support accurate, quantitative source concentration estimates. Here, using two aqueous film-forming foams as representative complex sources, we assessed the qualitative fidelity and quantitative accuracy of HRMS source fingerprints in increasingly complex background matrices. Across all matrices, HRMS-derived source concentration estimates were 0.81 ± 0.11-fold and 0.64 ± 0.24-fold of actual in samples impacted solely by analytical matrix effects (MEs) or by sample processing recovery and analytical MEs, respectively. Isotopic internal standards were not easily paired to individual unidentified non-target features, but bulk internal standard-based abundance corrections improved apportionment accuracy in higher matrix samples (to 0.90 ± 0.12-fold of actual) and/or informed concentration estimate relative errors. HRMS fingerprint mining could identify, based on the dilution behavior, effective individual chemical end-members across 16 homologous series. Although method development is needed, the results further demonstrate the potential applications of non-targeted HRMS data for source apportionment and other quantitative outcomes.

  11. Data from: Evaluating species richness using proteomic fingerprinting and...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Sep 27, 2022
    Cite
    Sven Rossel; Katja Uhlenkott; Janna Peters; Annemiek Vink; Pedro Martínez Arbizu (2022). Evaluating species richness using proteomic fingerprinting and DNA-barcoding – a case study on meiobenthic copepods from the Clarion Clipperton Fracture Zone [Dataset]. http://doi.org/10.5061/dryad.qfttdz0m3
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 27, 2022
    Dataset provided by
    Federal Institute For Geosciences and Natural Resources
    Senckenberg am Meer
    Authors
    Sven Rossel; Katja Uhlenkott; Janna Peters; Annemiek Vink; Pedro Martínez Arbizu
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Clipperton Island
    Description

    The Clarion Clipperton Fracture Zone (CCZ) is a vast deep-sea region harboring a highly diverse benthic fauna, which will be affected by potential future deep-sea mining of metal-rich polymetallic nodules. Despite the need for conservation plans and monitoring strategies in this context, the majority of taxonomic groups remains scientifically undescribed. However, molecular rapid assessment methods such as DNA-barcoding and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) provide the potential to accelerate specimen identification and biodiversity assessment significantly in deep-sea areas. In this study, we successfully applied both methods to investigate the diversity of meiobenthic copepods in the eastern CCZ, including the first application of MALDI-TOF MS for the identification of these deep-sea organisms. Comparing several different species delimitation tools for both datasets, we found that biodiversity values were very similar, with Pielou’s Evenness varying between 0.97 and 0.99 in all datasets. Still, direct comparisons of species clusters revealed differences between all techniques and methods, which are likely caused by the high number of rare species being represented by only one specimen, despite our extensive dataset of more than 2000 specimens. Hence, we regard our study as a first approach toward setting up a reference library for mass spectrometry data of the CCZ in combination with DNA-barcodes. We conclude that proteome fingerprinting, as well as the more established DNA-barcoding, can be seen as a valuable tool for rapid biodiversity assessments in the future, even when no reference information is available.

    Methods

    Sediment sampling in the CCZ was conducted using a multicorer during the cruises MANGAN 2018 (SO262, 05/04 to 29/05/2018; Rühlemann et al., 2019) and MiningImpact2 (SO268/2, 30/03 to 22/05/2019; Haeckel and Linke, 2021), both on the German research vessel SONNE.
    The study area is located within the eastern part of the German contract area for the exploration of polymetallic nodules, which has been licensed by the German Federal Institute for Geosciences and Natural Resources (BGR) from the International Seabed Authority (ISA). Meiofauna was sampled using multicores with an inner diameter of 94-96 mm. Bottom water was sieved over a 32 µm sieve and fixed with 99.8% ethanol denatured with methyl ethyl ketone, together with the upper 3 cm (2018) or the upper 5 cm (2019) of sediment, in a Kautex wide-neck bottle (1000 ml). All samples were re-fixed with the same fixative after 24 hours and stored at -20°C. To extract all meiofauna organisms from the sediment, samples were centrifuged according to the differential flotation method (Heip et al., 1985) with the colloidal gel Levasil®. Centrifuged samples were transferred into a Kautex wide-neck bottle (100 ml) and further stored at -20°C in the same fixative. All copepods were sorted out of the supernatant under a dissecting microscope. Prior to molecular processing, all individuals were photographed to document their basic morphology, and the ontogenetic stage was determined. Further processing was conducted according to two different protocols. In the first approach, conducted on 58% of all available specimens, the individual was cut into two pieces. The posterior part was used for DNA-barcoding, while the anterior part was used for investigations with MALDI-TOF MS. In the second, enhanced protocol, conducted on 42% of the specimens, the individuals were first prepared for MALDI-TOF MS and then washed with 10 µl molecular grade water before they were processed for DNA-barcoding, to increase the biomass used for the MALDI measurements. The change of protocol only influenced the success rate of MALDI-TOF MS, but had no influence on the resulting DNA-barcode or the mass spectrum. Furthermore, the exuviae could be retained for potential morphological investigations in the future.
    The tissue was transferred with 5 µl ethanol into a 0.2 ml microcentrifuge tube. After the ethanol had evaporated at room temperature, 2.5 µl α-cyano-4-hydroxycinnamic acid (HCCA) was added and the tissue was incubated for at least 5 min. Thereafter, the extract with the HCCA was transferred to a metallic target plate and measured in a Microflex LT/SH System (Bruker Daltonics) using the method MBTAuto. Peak evaluation was carried out in a mass peak range between 2 and 10 kDa using a centroid peak detection algorithm, a signal-to-noise threshold of 2 and a minimum intensity threshold of 600. To create a sum spectrum, 160 satisfactory shots were summed up. Raw spectra were imported to R and further processed using the R packages MALDIquantForeign (Gibb, 2015) and MALDIquant (Gibb and Strimmer, 2012). Spectra were square-root transformed, smoothed using the Savitzky-Golay method (Savitzky and Golay, 1964), baseline corrected using the Statistics-sensitive Non-linear Iterative Peak-clipping algorithm (SNIP) (Ryan et al., 1988) and normalized using the Total Ion Current (TIC) method. Repeated measurements were averaged by using mean intensities.
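
The evenness values quoted in the description above (0.97 to 0.99) are Pielou's evenness, conventionally defined as the Shannon entropy of the species proportions divided by the natural logarithm of the number of species. A minimal Python sketch of that definition follows. The function name and the specimen counts are hypothetical illustrations and are not taken from the dataset or from the authors' analysis.

```python
import math


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S) for per-species specimen counts.

    H' is the Shannon entropy of the species proportions and S is the
    number of species with at least one specimen.
    """
    counts = [c for c in counts if c > 0]
    species = len(counts)
    if species < 2:
        raise ValueError("evenness is undefined for fewer than two species")
    total = sum(counts)
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    return shannon / math.log(species)


# Hypothetical counts (not from the dataset): four equally common species
# versus one dominant species with three singletons.
even = pielou_evenness([5, 5, 5, 5])
skewed = pielou_evenness([17, 1, 1, 1])
```

Values close to 1, as reported above, indicate that specimens are spread almost uniformly across the delimited species, which fits the description's remark that many species are represented by a single specimen.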

  12. Data from: Potential of MALDI−TOF MS-based proteomic fingerprinting for...

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated Jun 30, 2023
    + more versions
    Cite
    Sven Rossel; Janna Peters; Silke Laakmann; Pedro Martínez Arbizu; Sabine Holst (2023). Potential of MALDI−TOF MS-based proteomic fingerprinting for species identification of Cnidaria across classes, species, regions and developmental stages [Dataset]. http://doi.org/10.5061/dryad.cnp5hqc8q
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 30, 2023
    Dataset provided by
    Senckenberg am Meer
    Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung
    Authors
    Sven Rossel; Janna Peters; Silke Laakmann; Pedro Martínez Arbizu; Sabine Holst
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Morphological identification of cnidarian species can be difficult throughout all life stages due to the lack of distinct morphological characters. Moreover, in some cnidarian taxa genetic markers are not fully informative, and in these cases combinations of different markers or additional morphological verifications may be required. Proteomic fingerprinting based on MALDI-TOF mass spectra was previously shown to provide reliable species identification in different metazoans including some cnidarian taxa. For the first time, we tested the method across four cnidarian classes (Staurozoa, Scyphozoa, Anthozoa, Hydrozoa) and included different scyphozoan life-history stages (polyp, ephyra, medusa) in our dataset. Our results revealed reliable species identification based on MALDI-TOF mass spectra across all taxa, with species-specific clusters for all 23 analyzed species. In addition, proteomic fingerprinting was successful in distinguishing developmental stages while still retaining a species-specific signal. Furthermore, we identified the impact of different salinities in different regions (North Sea and Baltic Sea) on proteomic fingerprints to be negligible. In conclusion, the effects of environmental factors and developmental stages on proteomic fingerprints seem to be low in cnidarians. This would allow using reference libraries built up entirely of adult or cultured cnidarian specimens for the identification of their juvenile stages or specimens from different geographic regions in future biodiversity assessment studies.

    Methods

    In total, 278 specimens of Cnidaria belonging to 23 different species from four classes were analyzed. Field specimens were morphologically identified to species level by taxonomic experts immediately after collection, before complete specimens or subsamples were preserved in undenatured ethanol (80 - 96%). From each specimen, a small tissue fragment (max. 1 mm³) was incubated for 5 minutes with 5 µl of alpha-cyano-4-hydroxycinnamic acid (HCCA) matrix. Of this incubated solution, 1 to 1.5 µl were transferred to a target plate on one to nine spots for co-crystallization of matrix and analytes. Each spot was measured one to three times using a Microflex LT/SH System (Bruker Daltonics). Employing the flexControl 3.4 (Bruker Daltonics) software, molecule masses were measured from 2 to 20 kDa. A centroid peak detection algorithm was carried out for peak evaluation by analyzing the mass peak range from 2 to 20 kDa. Furthermore, peak evaluation used a signal-to-noise threshold of two and a minimum intensity threshold of 600, with a peak resolution higher than 400. To validate fuzzy control, the proteins/oligonucleotide method was employed by maximal resolution of ten times above the threshold. To create a sum spectrum, a total of at least 120 laser shots were applied to a spot. Measurements were carried out using the same instrument on different occasions between 2013 and 2019.

    MALDI-TOF data processing

    MALDI-TOF raw data were imported to R, Version 4.1.0 (R-Core-Team, 2022) and processed using the R packages MALDIquantForeign, Version 0.12 (Gibb, 2015) and MALDIquant, Version 1.20 (Gibb and Strimmer, 2012). Spectra were square-root transformed, smoothed using the Savitzky-Golay method (Savitzky and Golay, 1964), baseline corrected using the SNIP method (Ryan et al., 1988) and normalized using the TIC method. Repeated measurements were averaged by using mean intensities. Peak picking was carried out using a signal-to-noise ratio (SNR) of 12 and a half window size of 13. Mass peaks with an SNR below 12 were nevertheless retained if they occurred in other mass spectra, as long as their SNR exceeded 1.75, which is assumed as a lower detection limit. Repeated peak binning was carried out to align homologous mass peaks. The resulting data were Hellinger transformed (Legendre and Gallagher, 2001) and used for further analyses.
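
The processing chain described above was run in R with MALDIquant. As a rough, dependency-free illustration of what some of those steps compute, the Python sketch below applies a square-root transform, a basic SNIP-style baseline estimate, TIC normalization, replicate averaging and a Hellinger transformation. It is not the MALDIquant implementation: Savitzky-Golay smoothing, peak picking and peak binning are left out, the clipping loop is a simple variant with an increasing window, TIC is approximated by the plain sum of intensities, and every function name and the toy spectrum are hypothetical.

```python
import math


def sqrt_transform(intensities):
    """Variance-stabilising square-root transform of raw intensities."""
    return [math.sqrt(max(v, 0.0)) for v in intensities]


def snip_baseline(intensities, iterations=20):
    """Basic SNIP-style baseline: for each window p, clip every point to
    the mean of its two neighbours p positions away if that mean is lower."""
    base = list(intensities)
    n = len(base)
    for p in range(1, iterations + 1):
        prev = list(base)
        for i in range(p, n - p):
            base[i] = min(prev[i], 0.5 * (prev[i - p] + prev[i + p]))
    return base


def remove_baseline(intensities, iterations=20):
    """Subtract the estimated baseline, never going below zero."""
    base = snip_baseline(intensities, iterations)
    return [max(v - b, 0.0) for v, b in zip(intensities, base)]


def tic_normalize(intensities):
    """Divide by the summed intensity (a stand-in for total ion current)."""
    total = sum(intensities)
    return [v / total for v in intensities] if total > 0 else list(intensities)


def mean_spectrum(replicates):
    """Average repeated measurements point by point."""
    return [sum(col) / len(col) for col in zip(*replicates)]


def hellinger(matrix):
    """Hellinger transformation: square root of row-wise relative values."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([math.sqrt(v / total) if total > 0 else 0.0 for v in row])
    return out


# Toy spectrum (hypothetical): flat background of 10 with one peak of 50.
raw = [10, 10, 10, 10, 10, 50, 10, 10, 10, 10, 10]
processed = tic_normalize(remove_baseline(sqrt_transform(raw), iterations=3))
```

In this toy case the flat background is removed entirely and the single peak carries all of the normalized intensity. Real spectra would need the smoothing and peak-detection steps named in the description before any comparison between specimens.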

  13. Metabolomic data and fingerprints of deep-sea coral species and populations...

    • data.griidc.org
    • search.dataone.org
    Updated Nov 25, 2020
    Cite
    Iliana B. Baums (2020). Metabolomic data and fingerprints of deep-sea coral species and populations obtained aboard the E/V Nautilus cruises NA057 and NA058 in the Gulf of Mexico from 2015-04-22 to 2015-05-11 and 2015-07-15 [Dataset]. http://doi.org/10.7266/n7-n3hr-pz39
    Explore at:
    Dataset updated
    Nov 25, 2020
    Dataset provided by
    GRIIDC
    Authors
    Iliana B. Baums
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Description

    This dataset contains metabolomic data and fingerprints of deep-sea coral species and populations obtained aboard the E/V Nautilus cruises NA057 and NA058 in the northern Gulf of Mexico from 2015-04-22 to 2015-05-11 and on 2015-07-15 with other methods such as SCUBA diving. Untargeted liquid chromatography-mass spectrometry was used to examine the metabolomic diversity of Callogorgia delta across three sites in the northern Gulf of Mexico. These data were contrasted with those of Stichopathes sp., Leiopathes glaberrima, Lophelia pertusa, and a shallow-water species, Acropora palmata. Metabolomic fingerprints were species-specific and differed in metabolic richness, with C. delta being the least diverse and Lophelia pertusa being the most diverse. The dataset also includes the dates, latitudes and longitudes of the sampling locations, and the cruise documentation for the E/V Nautilus cruises NA057 (leg 1) and NA058 (leg 2) led by chief scientists Dr. Chuck Fisher and Dr. Erik Cordes, respectively. This dataset supports the publication: Vohsen, Samuel A., Charles R. Fisher, and Iliana B. Baums. 2019. Metabolomic richness and fingerprints of deep-sea coral species and populations. Metabolomics, 15(34). doi:10.1007/s11306-019-1500-y.

  14. Data from: Mass Spectrometry-Based Metabolic Fingerprinting Contributes to...

    • acs.figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Manuela R. Martinefski; Belén Elguero; María Elena Knott; David Gonilski; Lucas Tedesco; Juan M. Gurevich Messina; Cora Pollak; Eduardo Arzt; María Eugenia Monge (2023). Mass Spectrometry-Based Metabolic Fingerprinting Contributes to Unveil the Role of RSUME in Renal Cell Carcinoma Cell Metabolism [Dataset]. http://doi.org/10.1021/acs.jproteome.0c00655.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    ACS Publications
    Authors
    Manuela R. Martinefski; Belén Elguero; María Elena Knott; David Gonilski; Lucas Tedesco; Juan M. Gurevich Messina; Cora Pollak; Eduardo Arzt; María Eugenia Monge
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Clear cell renal cell carcinoma (ccRCC) is a heterogeneous disease with 50–80% of patients exhibiting mutations in the von Hippel–Lindau (VHL) gene. RSUME (RWD domain (termed after three major RWD-containing proteins: RING finger-containing proteins, WD-repeat-containing proteins, and yeast DEAD (DEXD)-like helicases)-containing protein small ubiquitin-related modifier (SUMO) enhancer) acts as a negative regulator of VHL function in normoxia. A discovery-based metabolomics approach was developed by means of ultraperformance liquid chromatography coupled to quadrupole time-of-flight mass spectrometry (MS) for fingerprinting the endometabolome of a human ccRCC cell line 786-O and three other transformed cell systems (n = 102) with different expressions of RSUME and VHL. Cross-validated orthogonal projection to latent structures discriminant analysis models were built on positive, negative, and a combination of positive- and negative-ion mode MS data sets. Discriminant feature panels selected by an iterative multivariate classification allowed differentiating cells with different expressions of RSUME and VHL. Fifteen discriminant metabolites identified at level 1, including glutathione, butyrylcarnitine, and acetylcarnitine, contributed to understanding the role of RSUME in ccRCC. Altered pathways associated with RSUME expression were validated by biological and bioinformatics analyses. Combined results showed that in the absence of VHL, RSUME is involved in the downregulation of the antioxidant defense system, whereas in the presence of VHL, it acts in rerouting energy-related pathways, negatively modulating lipid utilization and positively modulating fatty acid synthesis, which may promote deposition in droplets.

  15. Reconstructing Asian faunal introductions to eastern Africa from multi-proxy...

    • plos.figshare.com
    • datadryad.org
    docx
    Updated Jun 1, 2023
    Cite
    Mary E. Prendergast; Michael Buckley; Alison Crowther; Laurent Frantz; Heidi Eager; Ophélie Lebrasseur; Rainer Hutterer; Ardern Hulme-Beaman; Wim Van Neer; Katerina Douka; Margaret-Ashley Veall; Eriéndira M. Quintana Morales; Verena J. Schuenemann; Ella Reiter; Richard Allen; Evangelos A. Dimopoulos; Richard M. Helm; Ceri Shipton; Ogeto Mwebi; Christiane Denys; Mark Horton; Stephanie Wynne-Jones; Jeffrey Fleisher; Chantal Radimilahy; Henry Wright; Jeremy B. Searle; Johannes Krause; Greger Larson; Nicole L. Boivin (2023). Reconstructing Asian faunal introductions to eastern Africa from multi-proxy biomolecular and archaeological datasets [Dataset]. http://doi.org/10.1371/journal.pone.0182565
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mary E. Prendergast; Michael Buckley; Alison Crowther; Laurent Frantz; Heidi Eager; Ophélie Lebrasseur; Rainer Hutterer; Ardern Hulme-Beaman; Wim Van Neer; Katerina Douka; Margaret-Ashley Veall; Eriéndira M. Quintana Morales; Verena J. Schuenemann; Ella Reiter; Richard Allen; Evangelos A. Dimopoulos; Richard M. Helm; Ceri Shipton; Ogeto Mwebi; Christiane Denys; Mark Horton; Stephanie Wynne-Jones; Jeffrey Fleisher; Chantal Radimilahy; Henry Wright; Jeremy B. Searle; Johannes Krause; Greger Larson; Nicole L. Boivin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Africa, East Africa
    Description

    Human-mediated biological exchange has had global social and ecological impacts. In sub-Saharan Africa, several domestic and commensal animals were introduced from Asia in the pre-modern period; however, the timing and nature of these introductions remain contentious. One model supports introduction to the eastern African coast after the mid-first millennium CE, while another posits introduction dating back to 3000 BCE. These distinct scenarios have implications for understanding the emergence of long-distance maritime connectivity, and the ecological and economic impacts of introduced species. Resolution of this longstanding debate requires new efforts, given the lack of well-dated fauna from high-precision excavations, and ambiguous osteomorphological identifications. We analysed faunal remains from 22 eastern African sites spanning a wide geographic and chronological range, and applied biomolecular techniques to confirm identifications of two Asian taxa: domestic chicken (Gallus gallus) and black rat (Rattus rattus). Our approach included ancient DNA (aDNA) analysis aided by BLAST-based bioinformatics, Zooarchaeology by Mass Spectrometry (ZooMS) collagen fingerprinting, and direct AMS (accelerator mass spectrometry) radiocarbon dating. Our results support a late, mid-first millennium CE introduction of these species. We discuss the implications of our findings for models of biological exchange, and emphasize the applicability of our approach to tropical areas with poor bone preservation.

  16. Table6_Species identification of modern and archaeological shark and ray...

    • figshare.com
    • frontiersin.figshare.com
    xlsx
    Updated Nov 26, 2024
    Cite
    Michael Buckley; Ellie-May Oldfield; Cristina Oliveira; Clara Boulanger; Andrew C. Kitchener; Nicole R. Fuller; Traci Ardren; Victor D. Thompson; Scott M. Fitzpatrick; Michelle J. LeFebvre (2024). Table6_Species identification of modern and archaeological shark and ray skeletal tissues using collagen peptide mass fingerprinting.xlsx [Dataset]. http://doi.org/10.3389/fmars.2024.1500595.s008
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Frontiers
    Authors
    Michael Buckley; Ellie-May Oldfield; Cristina Oliveira; Clara Boulanger; Andrew C. Kitchener; Nicole R. Fuller; Traci Ardren; Victor D. Thompson; Scott M. Fitzpatrick; Michelle J. LeFebvre
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    Elasmobranchs, such as sharks and rays, are among the world’s most endangered vertebrates, with over 70% loss in abundance over the past 50 years due to human impacts. Zooarchaeological baselines of elasmobranch diversity, distribution, and exploitation hold great promise for contributing essential historical contexts in the assessment of contemporary patterns in their taxonomic diversity and vulnerability to human-caused extinction. Yet, the historical ecology of elasmobranchs receives relatively less archaeological attention compared to that of ray-finned fishes or marine mammals, largely due to issues of taxonomic resolution across zooarchaeological identifications.

    Methods

    We explore the use of Zooarchaeology by Mass Spectrometry (ZooMS) for species identification in this unstudied group, using an archaeological case study from the marine environments of the Florida Keys, a marine biodiversity hotspot that is home to an array of elasmobranch species and conservation efforts. By comparison with 39 modern reference species, we could distinguish 12 taxa within the zooarchaeological assemblage from the Clupper archaeological site (Upper Matecumbe Key) that included nine sharks, two rays and a sawfish.

    Results and discussion

    The results indicate that, through additional complexity of the collagen peptide mass fingerprint, obtained due to the presence of the cartilaginous type II collagen, ZooMS collagen peptide mass fingerprinting provides exceptionally high taxonomic resolution in this group, yielding species-level identifications in all cases where sufficient reference material was used. This case study also highlights the added value of ZooMS for taxa that are more difficult to distinguish in zooarchaeological analyses, such as vertebrae of the Atlantic sharpnose shark (Rhizoprionodon terraenovae) and the hammerhead sharks (Sphyrna spp.) in the Florida Keys. Therefore, the application of collagen peptide mass fingerprinting to elasmobranchs offers great potential to improve our understanding of their archaeological past and historical ecology.

