100+ datasets found

Database Infrastructure for Mass Spectrometry - Per- and Polyfluoroalkyl...
nist.gov
data.nist.gov
Updated Jul 5, 2023
Cite
National Institute of Standards and Technology (2023). Database Infrastructure for Mass Spectrometry - Per- and Polyfluoroalkyl Substances [Dataset]. http://doi.org/10.18434/mds2-2905
Explore at:
Unique identifier
https://doi.org/10.18434/mds2-2905, https://identifiers.org/ark:/88434/mds2-2905
Dataset updated
Jul 5, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
License
https://www.nist.gov/open/license
Description
This record contains and describes an open-source, portable SQLite (structured query language) database containing high-resolution mass spectrometry data (MS1 and MS2) for per- and polyfluorinated alkyl substances (PFAS), together with associated metadata on their measurement techniques, quality assurance metrics, and the samples from which they were produced. The data are stored in a format adhering to the Database Infrastructure for Mass Spectrometry (DIMSpec) project. That project produces and uses databases like this one, providing a complete toolkit for non-targeted analysis. More information about the full DIMSpec code base, which includes these data for demonstration purposes, is available at GitHub (https://github.com/usnistgov/dimspec) and in the full User Guide for DIMSpec (https://pages.nist.gov/dimspec/docs). The files of most interest here are the database file itself (dimspec_nist_pfas.sqlite), an entity relationship diagram (ERD.png), and a data dictionary (DIMSpec for PFAS_1.0.1.20230615_data_dictionary.json), which elucidate the database structure and assist in interpretation and use.
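As a minimal sketch of how a portable SQLite file like this one might be inspected, the Python snippet below lists the tables in a database using only the standard library. The helper name list_tables is an illustrative assumption, not part of DIMSpec; the real table names should be taken from the ERD or the data dictionary rather than from this example.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the names of all user tables in an SQLite database file."""
    # Open read-only through a URI so a mistyped path raises an error
    # instead of silently creating an empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        # sqlite_master is SQLite's built-in schema catalog.
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

With the file downloaded locally, a call such as list_tables("dimspec_nist_pfas.sqlite") would return the table names to compare against the entity relationship diagram.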
Mass Spectral Library
dknet.org
scicrunch.org
Updated May 15, 2024
Cite
(2024). Mass Spectral Library [Dataset]. http://identifiers.org/RRID:SCR_014668
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_014668
Dataset updated
May 15, 2024
Description
A library containing spectra of more than 200,000 chemical compounds. Spectra include metabolites, peptides, contaminants, and lipids. All spectra and chemical structures are examined by professionals.
HREI-MSDB: High-resolution electron ionization mass spectral database for...
figshare.com
xlsx
Updated Aug 1, 2025
Cite
Dmitriy Matyushin; Anastasia Yu. Sholokhova; Timur Baygildiev; Ekaterina Chichkanova; Svetlana Borovikova; Yuriy Ikhalaynen; Igor Rodin (2025). HREI-MSDB: High-resolution electron ionization mass spectral database for diverse volatile compounds [Dataset]. http://doi.org/10.6084/m9.figshare.29713460.v2
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.29713460.v2
Dataset updated
Aug 1, 2025
Dataset provided by
figshare
Authors
Dmitriy Matyushin; Anastasia Yu. Sholokhova; Timur Baygildiev; Ekaterina Chichkanova; Svetlana Borovikova; Yuriy Ikhalaynen; Igor Rodin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains a database of high-resolution electron ionization (EI) mass spectra recorded under gas chromatography - mass spectrometry (GC-MS) conditions. The vast majority of publicly available GC-MS data sets are obtained using low-resolution mass spectrometry; the few exceptions include the works of E.J. Price (2021) and V. Castro (2022). At the same time, gas chromatography - high-resolution mass spectrometry (GC-HRMS) is used quite often in studies.

This database aims to provide a GC-HRMS data set covering diverse classes of volatile compounds (trimethylsilyl derivatives are not included!), using a wide m/z range (starting from m/z = 40). Mass spectra were recorded using an Orbitrap Exploris GC mass detector (Thermo Fisher Scientific, USA). The mass determination error is no more than 0.0006 Da, and the mass spectral resolution value is 30000. All mass spectra were checked manually; the .zip archives contain information on peak annotations. The data.xlsx file contains a list of compounds and spectra IDs. Peaks with intensity less than 1/999 of the most intense were discarded.

The data set includes:

130 mass spectra of pure compounds recorded using GC-MS of 10-molecule batches or GC-MS of individual compound solutions.
61 mass spectra of compounds included in the 8270 MegaMix standard compound mixture.
45 mass spectra of volatile compounds included in lavender essential oil.
38 mass spectra of volatile compounds included in mint essential oil.
33 mass spectra of volatile compounds included in lemon essential oil.
22 mass spectra of volatile compounds included in coffee.

These groups of spectra are designated as Pure samples, 8270 MegaMix Standard, Lavender (essential oil), Mint (essential oil), Lemon (essential oil), and Coffee, respectively, in the data.xlsx file and in the "Comments" tag in the MSP files. Please note which spectrum was obtained in what way.

Identification of compounds in essential oils and coffee is quite reliable, but it was still performed without using standard samples. For convenience, in some cases (for essential oils), SMILES are provided using symbols denoting stereoisomers, but we cannot be sure that we really know which stereoisomer we are considering: often, both the retention indices and mass spectra are very close.

Detailed information on the experimental conditions under which the spectra were obtained, on the equipment, and on data processing is contained in the info.pdf file. The quality_assessment.xlsx file contains data obtained during quality control of the mass spectra (see the info.pdf file for additional information).

Each file named all_spectra contains all spectra (both those obtained using the sample collection and those obtained from essential oil and coffee samples) in different file formats. Most likely, you need the all_spectra.msp file (NIST-compatible); it contains all the data. The plant_volatiles.msp file contains all mass spectra obtained from essential oils and coffee. The names of the remaining files are self-explanatory. If you need annotations of all peaks or more file formats, then look at the .zip archives. JCAMP (.jdx) files are in the .zip archives.

Processing (interpretation) of mass spectra was done using our software, https://github.com/mtshn/gchrmsexplain, versions 0.0.2 and 0.0.3. The settings used are given in the info.pdf file; however, these settings are the default for the corresponding versions.

Levels of explanation of each peak in the mass spectrum:

Level 1 - the molecular formula is selected, but some isotopic peaks are not found at all.
Level 2 - isotopic peaks merge with other peaks. For example, the 13C peak of some ion X is superimposed (taking into account the resolution) on the main peak X + H. At not very high resolutions, such peaks may not be resolved. This also includes cases of "incorrect" isotopic peak intensity, differing from the theoretically calculated one.
Level 3 - all main isotopic peaks are observed correctly, up to the accuracy of mass determination.

The minimum number of bonds that must be broken to obtain such a fragment is indicated without taking into account the loss of hydrogens, as well as without some other "trivial" bond breaks: the loss of a halogen atom, a methyl group, or NO-loss from a nitro-group. Details are given in the documentation of the software used to process the mass spectra: https://github.com/mtshn/gchrmsexplain.

In files containing abbreviated interpretations of mass spectra (e.g., in CSV_annotated folders in .zip archives), notations like 3-1 are used. The first number denotes the interpretation level (see above), and the second denotes the number of (non-trivial) bond breaks required to obtain such a molecular formula.
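The two conventions described above (the 1/999 intensity cutoff and the level-breaks notation such as 3-1) can be illustrated with a short Python sketch. The function and class names are hypothetical and do not come from gchrmsexplain. Treating the cutoff as inclusive (a peak at exactly 1/999 of the base peak is kept) is an assumption drawn only from the wording "less than 1/999 ... were discarded".

```python
from typing import NamedTuple


class PeakAnnotation(NamedTuple):
    level: int        # interpretation level, 1 to 3
    bond_breaks: int  # number of non-trivial bond breaks


def parse_annotation(code):
    """Parse an abbreviated annotation such as '3-1' into its two numbers."""
    level_text, breaks_text = code.strip().split("-", 1)
    level, bond_breaks = int(level_text), int(breaks_text)
    if level not in (1, 2, 3):
        raise ValueError(f"unexpected interpretation level: {level}")
    if bond_breaks < 0:
        raise ValueError(f"negative bond-break count: {bond_breaks}")
    return PeakAnnotation(level, bond_breaks)


def drop_weak_peaks(peaks, divisor=999):
    """Keep (m/z, intensity) pairs at or above base-peak intensity / divisor."""
    if not peaks:
        return []
    threshold = max(intensity for _, intensity in peaks) / divisor
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]
```

For example, parse_annotation("3-1") yields level 3 with one non-trivial bond break, matching the reading of the notation given in the description.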
METLIN
neuinfo.org
dknet.org
Updated Jan 29, 2022
Cite
(2022). METLIN [Dataset]. http://identifiers.org/RRID:SCR_010500
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_010500
Dataset updated
Jan 29, 2022
Description
A public repository of metabolite information as well as tandem mass spectrometry data is provided to facilitate metabolomics experiments. It contains structures and represents a data management system designed to assist in a broad array of metabolite research and metabolite identification. An annotated list of known metabolites and their mass, chemical formula, and structure is available. Each metabolite is linked to outside resources for further reference and inquiry. MS/MS data are also available for many of the metabolites.
NIST DART-MS Forensics Database (is-CID)
nist.gov
data.nist.gov
Updated Nov 5, 2020
Cite
National Institute of Standards and Technology (2020). NIST DART-MS Forensics Database (is-CID) [Dataset]. http://doi.org/10.18434/mds2-2313
Explore at:
Unique identifier
https://doi.org/10.18434/mds2-2313, https://identifiers.org/ark:/88434/mds2-2313
Dataset updated
Nov 5, 2020
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
License
https://www.nist.gov/open/license
Description
The NIST DART-MS Forensics Database is an evaluated collection of in-source collisionally-induced dissociation (is-CID) mass spectra of compounds of interest to the forensics community (e.g., seized drugs and cutting agents). The is-CID mass spectra were collected using Direct Analysis in Real-Time (DART) Mass Spectrometry (MS), either by NIST scientists or by contributing agencies noted per compound. The database is provided as a general-purpose structure data file (.SDF). For users on Windows operating systems, the .SDF format library can be converted to NIST MS Search format using Lib2NIST and then explored using NIST MS Search v2.4 for general mass spectral analysis. These software tools can be downloaded at https://chemdata.nist.gov. As of 09-28-2021, the database is also provided in R data format (.RDS) for use with the R programming language. This database, also commonly referred to as a library, is one in a series of high-quality mass spectral libraries/databases produced by NIST (see NIST SRD 1a, https://dx.doi.org/10.18434/T4H594).
Version 4 (20230306) of the MALDI-ToF Mass Spectrometry Database for...
data.niaid.nih.gov
Updated Dec 27, 2024
Cite
Lasch, Peter; Stämmler, Maren; Schneider, Andy (2024). Version 4 (20230306) of the MALDI-ToF Mass Spectrometry Database for Identification and Classification of Highly Pathogenic Microorganisms from the Robert Koch-Institute (RKI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7702374
Explore at:
Dataset updated
Dec 27, 2024
Dataset provided by
Robert Koch Institute (https://www.rki.de/)
Authors
Lasch, Peter; Stämmler, Maren; Schneider, Andy
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
(Version 20230306)

Version 4 (20230306) of the RKI MALDI-ToF mass spectra database is the third update of the original database (version 20161027, https://doi.org/10.5281/zenodo.163517). The RKI Database v.4 now contains a total of 11055 MALDI-ToF mass spectra from 1599 microbial strains of highly pathogenic (i.e. biosafety level 3, BSL-3) bacteria such as Bacillus anthracis, Brucella melitensis, Yersinia pestis, Burkholderia mallei / pseudomallei and Francisella tularensis as well as a selection of spectra of their close and distant relatives. The database can be used as a reference for the diagnosis of BSL-3 bacteria using proprietary and free software packages for MALDI-ToF MS-based microbial identification. The spectral data are provided as a zip archive (zenodo db 230306.zip) containing the original mass spectra in their native data format (Bruker Daltonics). Please refer to the pdf file (230306-ZENODO-Metadata.pdf) for information on cultivation conditions, sample preparation and details of the spectra acquisition. Please do not try to print this document (>1600 pages!).

Version 20230306 of the RKI database contains for the first time a file in btmsp format (230306_v4_RKI_DB_BSL3.btmsp). This file was generated using the MALDI Biotyper software (Bruker Daltonics) and contains a total of 1599 main spectra from the BSL-3 database in the proprietary data format of the MALDI Biotyper software. *.btmsp files can be imported and used for identification with this software solution. Note that the btmsp file available in database version 4 is broken and cannot be imported. Please refer to updated database versions (4.1, or 4.2) to download valid btmsp files.

The pkf files (230306_ZENODO_30Peaks_0.75.pkf, 230306_ZENODO_45Peaks_0.75.pkf) represent two versions of the MS peak list data in a Matlab compatible format. The latter data can be imported into MicrobeMS, a free Matlab-based software solution developed at the RKI. MicrobeMS can be used for the identification of microorganisms by MALDI-ToF MS and is available at https://wiki-ms.microbe-ms.com.

The RKI mass spectrometry database is updated regularly.

The author would like to thank the following individuals for providing microbial strains and species or mass spectra thereof. Without their help, this work would not have been possible.

Wolfgang Beyer - University of Hohenheim, Faculty of Agricultural Sciences, Stuttgart, Germany

Guido Werner - Robert Koch-Institute, Nosocomial Pathogens and Antibiotic Resistances (FG13), Wernigerode, Germany

Alejandra Bosch - CINDEFI, CONICET-CCT La Plata, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina

Michal Drevinek - National Institute for Nuclear, Biological and Chemical Protection, Milin, Czech Republic

Roland Grunow, Daniela Jacob, Silke Klee, Susann Dupke and Holger Scholz - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany

Jörg Rau - Chemisches und Veterinäruntersuchungsamt Stuttgart, Fellbach, Germany

Jens Jacob - Robert Koch-Institute, Hospital Hygiene, Infection Prevention and Control (FG14), Berlin, Germany

Martin Mielke - Robert Koch-Institute, Department 1 - Infectious Diseases, Berlin, Germany

Monika Ehling-Schulz - Functional Microbiology, Institute of Microbiology, University of Veterinary Medicine, Vienna, Austria

Armand Paauw - Department of Medical Microbiology, CBRN protection, Universitair Medisch Centrum Utrecht, TNO, Rijswijk, The Netherlands

Herbert Tomaso - Friedrich-Löffler-Institut (FLI), Federal Research Institute for Animal Health, Jena, Germany

Gabriel Karner - Karner Düngerproduktion GmbH, Research & Development, Neulengbach, Austria

Rainer Borriss - Institute of Marine Biotechnology e.V. (IMaB), Greifswald, Germany

Le Thi Thanh Tam - Division of Plant Pathology and Phyto-Immunology, Plant Protection Research Institute, Hanoi, Socialist Republic of Vietnam

Xuewen Gao - College of Plant Protection, Nanjing Agricultural University, Key Laboratory of Integrated Management of Crop Diseases and Pests, Nanjing, People’s Republic of China
MassBank of North America
neuinfo.org
scicrunch.org
Updated Mar 9, 2018
Cite
(2018). MassBank of North America [Dataset]. http://identifiers.org/RRID:SCR_015536
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_015536
Dataset updated
Mar 9, 2018
Description
Metadata-centric, auto-curating repository designed for storage and querying of mass spectral records. It contains metabolite mass spectra, metadata and associated compounds.
Data from: MassBank
scicrunch.org
Updated Oct 17, 2019
Cite
(2019). MassBank [Dataset]. http://identifiers.org/RRID:SCR_015535
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_015535
Dataset updated
Oct 17, 2019
Description
Public repository of mass spectral data that allows users to search for similar spectra on a peak-to-peak basis, on a neutral loss-to-neutral loss basis, or by m/z value and molecular formula; to search chemical compounds by substructure; and to search chemical compounds by keyword.
NIST Libraries of Peptide Fragmentation Mass Spectra Database - SRD 1c
catalog.data.gov
gimi9.com
Updated Sep 30, 2025
Cite
National Institute of Standards and Technology (2025). NIST Libraries of Peptide Fragmentation Mass Spectra Database - SRD 1c [Dataset]. https://catalog.data.gov/dataset/nist-libraries-of-peptide-fragmentation-mass-spectra-database-srd-1c
Explore at:
Dataset updated
Sep 30, 2025
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
NIST peptide libraries are comprehensive, annotated mass spectral reference collections from various organisms and proteins useful for the rapid matching and identification of acquired MS/MS spectra. Spectra were produced by tandem mass spectrometers using liquid chromatographic separations followed by electrospray ionization. Unlike the NIST small molecule electron ionization library, which contains one spectrum per molecular structure, there are several different modes of fragmentation (ion trap and "beam-type" collision cells are currently the most commonly used fragmentation devices) that result in spectra with different, energy-dependent patterns. These result in multiple spectral libraries, distinguished by ionization mode, each of which may contain several spectra per peptide. Different libraries have also been assembled for iTRAQ-4 derivatized peptides and for phosphorylated peptides. Separating libraries by animal species reduces search time, although investigators may elect to include several species in their searches.
mini-Ni : A small GC-MS database of nitrogen-containing compounds
figshare.com
xlsx
Updated Sep 23, 2025
Cite
Dmitriy Matyushin; Anastasia Yu. Sholokhova; Svetlana Borovikova (2025). mini-Ni : A small GC-MS database of nitrogen-containing compounds [Dataset]. http://doi.org/10.6084/m9.figshare.30185032.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.30185032.v1
Dataset updated
Sep 23, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Dmitriy Matyushin; Anastasia Yu. Sholokhova; Svetlana Borovikova
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
mini-Ni is a small database of electron ionization mass spectra and retention indices for nitrogen-containing compounds, mainly aromatic heterocycles.

Gas chromatography-mass spectrometry (GC-MS) is a very important method of chemical analysis. Identification is performed using a library search of the mass spectrum in a mass spectral database, and retention index information is also used. Nitrogen-containing small molecules with a high content of nitrogen (Ni means i nitrogen atoms), such as derivatives of triazoles, pyrazoles, imidazoles, diazines, etc., including those containing -CN and -NH2 groups, are an important class of analytes, including priority pollutants. Unfortunately, such compounds are insufficiently represented in the available GC-MS databases, and the databases also contain erroneous entries. This database partially fills this gap.

All data (and further notes) are provided in an XLSX file. The other three files also contain mass spectra and retention indices, in formats suitable for import into NIST MS Search and other software. To import the SDF file into NIST MS Search, use the Lib2NIST utility.

Electron ionization mass spectra and retention indices (5%-phenylpolydimethylsiloxane and polyethylene glycol) for 104 molecules are provided. For 72 molecules, retention indices are given for three different heating rates (temperature programming mode). The following chromatographic conditions were used for these molecules:

Non-polar stationary phase: column: 5%-phenyl-methylpolysiloxane, HP-5MS, 30 m × 0.25 mm × 0.50 µm, Agilent; starting temperature 40 °C; helium flow rate 0.84 mL/min.
Polar stationary phase: column: polyethylene glycol, HP-INNOWax, 30 m × 0.25 mm × 0.25 µm, Agilent; starting temperature 40 °C; helium flow rate 1.01 mL/min.

The analyte solution in methanol (up to 10 analytes in one batch, concentration of each ~0.1 mg/mL) was injected using split injection mode (0.5 μL, 1:20). In some cases, the concentration was increased until high-quality mass spectra of sufficient intensity were obtained.

Some of the retention index data was taken from the following source: Karnaeva A. E., Sholokhova A. Y. Validation of the identification reliability of known and assumed UDMH transformation products using gas chromatographic retention indices and machine learning. Chemosphere, 2024, V. 362, P. 142679. https://doi.org/10.26434/chemrxiv-2024-mfbd6 (CC BY 4.0 license)

All mass spectra were recorded using a Shimadzu GCMS-TQ8040 (quadrupole mass analyzer, 70 eV electron ionization mode). All mass spectra and retention indices were obtained using pure standards.
Metabolite and Tandem Mass Spectrometry Database
bioregistry.io
Updated Nov 16, 2021
Cite
(2021). Metabolite and Tandem Mass Spectrometry Database [Dataset]. http://identifiers.org/re3data:r3d100012311
Explore at:
Unique identifier
https://identifiers.org/re3data:r3d100012311
Dataset updated
Nov 16, 2021
Description
The METLIN (Metabolite and Tandem Mass Spectrometry) Database is a repository of metabolite information as well as tandem mass spectrometry data, providing public access to its comprehensive MS and MS/MS metabolite data. An annotated list of known metabolites and their mass, chemical formula, and structure is available, with each metabolite linked to external resources for further reference and inquiry.
Data from: Combining High-Resolution and Exact Calibration To Boost...
figshare.com
datasetcatalog.nlm.nih.gov
zip
Updated Oct 18, 2018
Cite
Andy Lin; J. Jeffry Howbert; William Stafford Noble (2018). Combining High-Resolution and Exact Calibration To Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data [Dataset]. http://doi.org/10.1021/acs.jproteome.8b00206.s007
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.1021/acs.jproteome.8b00206.s007
Dataset updated
Oct 18, 2018
Dataset provided by
ACS Publications
Authors
Andy Lin; J. Jeffry Howbert; William Stafford Noble
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
To achieve accurate assignment of peptide sequences to observed fragmentation spectra, a shotgun proteomics database search tool must make good use of the very high-resolution information produced by state-of-the-art mass spectrometers. However, making use of this information while also ensuring that the search engine’s scores are well calibrated, that is, that the score assigned to one spectrum can be meaningfully compared to the score assigned to a different spectrum, has proven to be challenging. Here we describe a database search score function, the “residue evidence” (res-ev) score, that achieves both of these goals simultaneously. We also demonstrate how to combine calibrated res-ev scores with calibrated XCorr scores to produce a “combined p value” score function. We provide a benchmark consisting of four mass spectrometry data sets, which we use to compare the combined p value to the score functions used by several existing search engines. Our results suggest that the combined p value achieves state-of-the-art performance, generally outperforming MS Amanda and Morpheus and performing comparably to MS-GF+. The res-ev and combined p-value score functions are freely available as part of the Tide search engine in the Crux mass spectrometry toolkit (http://crux.ms).
Liquid Chromatography - Tandem Mass Spectrometry (LC-MS/MS) and Gas...
zenodo.org
csv
Updated Jul 22, 2024
Cite
Hunter Dlugas; Xiang Zhang; Seongho Kim (2024). Liquid Chromatography - Tandem Mass Spectrometry (LC-MS/MS) and Gas Chromatography - Mass Spectrometry (GC-MS) Reference Libraries from Global Natural Products Social Molecular Networking (GNPS) and National Institute of Standards and Technology (NIST) WebBook Processed for Spectral Library Matching [Dataset]. http://doi.org/10.5281/zenodo.12786324
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.12786324
Dataset updated
Jul 22, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Hunter Dlugas; Xiang Zhang; Seongho Kim
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 20, 2024
Description
To obtain a high-quality LC-MS/MS reference database for spectral library matching, we selected 22 high-quality GNPS tandem mass spectrometry databases generated under the positive ion mode. Further preprocessing similar to that of Huber et al., involving mass-to-charge (m/z) and intensity filtering, yields the database found in the file LCMS_GNPS_reference_library.csv, which contains 14,705 electrospray ionization (ESI) mass spectra, each of which corresponds to a unique compound. The NIST WebBook database was used to construct the GC-MS database contained in the file GCMS_NIST_WebBook.csv. This database contains 23,721 electron ionization (EI) mass spectra, each of which corresponds to a unique non-hyphenated Chemical Abstracts Service (CAS) Registry Number.

Both LC-MS/MS and GC-MS databases are organized into three columns: one for the identifier, one for the m/z values, and one for the intensity values. For example, if spectrum A has 20 ion fragments, then there will be 20 rows corresponding to spectrum A in the corresponding database with the identifier A repeated 20 times with the corresponding m/z and intensity values.
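The long, three-column layout described above can be regrouped into one peak list per spectrum. The Python sketch below assumes that the columns appear in the order identifier, m/z, intensity and that the file has a single header row; neither detail is stated in the description, so both should be checked against the actual CSV files. The function name read_long_format is illustrative.

```python
import csv
import io
from collections import defaultdict


def read_long_format(handle, has_header=True):
    """Group (identifier, m/z, intensity) rows into one peak list per spectrum."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the assumed header row
    spectra = defaultdict(list)
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        identifier, mz, intensity = row[0], float(row[1]), float(row[2])
        spectra[identifier].append((mz, intensity))
    return dict(spectra)


# Tiny in-memory example with made-up values, standing in for a real file.
demo = io.StringIO("id,mz,intensity\nA,41.0,10\nA,43.0,100\nB,77.0,55\n")
demo_spectra = read_long_format(demo)
```

For a real file, the same function can be called on a handle opened with open(path, newline=""), as the csv module documentation recommends.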
Version 2 (20170523) of the MALDI-TOF Mass Spectrometry Database for...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Dec 27, 2024
Cite
Lasch, Peter; Stämmler, Maren; Schneider, Andy (2024). Version 2 (20170523) of the MALDI-TOF Mass Spectrometry Database for Identification and Classification of Highly Pathogenic Microorganisms from the Robert Koch-Institute (RKI) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_582602
Explore at:
Dataset updated
Dec 27, 2024
Dataset provided by
Robert Koch Institute (https://www.rki.de/)
Authors
Lasch, Peter; Stämmler, Maren; Schneider, Andy
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
(Version 20170523)

Edit #1 (Nov 30, 2018): New database version (v.3 - 20181130) - available: 10.5281/zenodo.1880975

Edit #2 (Mar 06, 2023): New database version (v.4.2 - 20230306) - available: 10.5281/zenodo.7702375

Version 2 (20170523) of the RKI’s MALDI-TOF mass spectral database is an update of the original database (version 20161027, https://doi.org/10.5281/zenodo.163517). The RKI database contains mass spectral entries from highly pathogenic (biosafety level 3, BSL-3) bacteria such as Bacillus anthracis, Yersinia pestis, Burkholderia mallei, Burkholderia pseudomallei and Francisella tularensis as well as a selection of spectra from their close and more distant relatives. The database can be used as a reference for the diagnostics of BSL-3 bacteria using proprietary and free software packages for MALDI-TOF MS-based microbial identification. Spectral data are distributed as a 7-zip archive that contains the original mass spectra in their native data format (Bruker Daltonics). Please refer to the pdf file (170523-ZENODO-Metadata.pdf) for the metadata of the spectra. Do not try to print this document (~1100 pages!).

The pkf-file (170523_ZENODO_Peaklist_30Peaks_1.6.pkf) contains the MS peak list data in a Matlab compatible format. The latter data file can be imported into MicrobeMS, a Matlab-based free-of-charge software solution developed at RKI. MicrobeMS is available from http://www.microbe-ms.com.

The RKI mass spectral database will be updated on a regular basis.

The author gratefully thanks the following persons for providing microbial strains and species. Without their help, this work would not have been possible.

Wolfgang Beyer - University of Hohenheim, Faculty of Agricultural Sciences, Stuttgart, Germany

Guido Werner - Robert Koch-Institute, Nosocomial Pathogens and Antibiotic Resistances (FG13), Wernigerode, Germany

Alejandra Bosch - CINDEFI, CONICET-CCT La Plata, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina

Michal Drevinek - National Institute for Nuclear, Biological and Chemical Protection, Milin, Czech Republic

Roland Grunow - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany

Daniela Jacob - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany

Silke Klee - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany

Jörg Rau - Chemisches und Veterinäruntersuchungsamt Stuttgart, Fellbach, Germany

Jens Jacob - Robert Koch-Institute, Hospital Hygiene, Infection Prevention and Control (FG14), Berlin, Germany

Martin Mielke - Robert Koch-Institute, Department 1 - Infectious Diseases, Berlin, Germany

Monika Ehling-Schulz - Functional Microbiology, Institute of Microbiology, University of Veterinary Medicine, Vienna, Austria

Armand Paauw - Department of Medical Microbiology, CBRN protection, Universitair Medisch Centrum Utrecht, TNO, Rijswijk, The Netherlands
Proteome 2D-PAGE Database
dknet.org
scicrunch.org
Updated Jan 29, 2022
Cite
(2022). Proteome 2D-PAGE Database [Dataset]. http://identifiers.org/RRID:SCR_001678/resolver?q=&i=rrid
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_001678, https://identifiers.org/RRID:SCR_001678/resolver?q=&i=rrid
Dataset updated
Jan 29, 2022
Description
The Proteome 2D-PAGE Database system for microbial research is a curated database for storing and investigating proteomics data. Software tools are available; for data submission, please contact the Database Curator. Established at the Max Planck Institute for Infection Biology, this system contains four interconnected databases:

i.) 2D-PAGE Database: Two-dimensional electrophoresis (2-DE) and mass spectrometry of diverse microorganisms and other organisms. This database currently contains 4971 identified spots and 1228 mass peaklists in 44 reference maps representing experiments from 24 different organisms and strains. The data were submitted by 84 submitters from 24 institutes and 12 nations. It also contains various software tools for formatting and analyzing gels and mass peaks, including:
*TopSpot: Scanning the gel, editing the spots and saving the information
*Fragmentation: Fragmentation of the gel image into sections
*MS-Screener: Perl script to compare the similarity of MALDI-PMF peaklists
*MS-Screener update: MS-Screener can be used to compare mass spectra (MALDI-MS(/MS) as well as ESI-MS/MS spectra) on the basis of their peak lists (.dta, .pkm, .pkt, or .txt files), to recalibrate mass spectra, to determine and eliminate exogenous contaminant peaks, and to create matrices for cluster analyses.
*GelCali: Online calibration of the Mr- and pI-axis of 2-DE gels with mathematical regression methods

ii.) Isotope Coded Affinity Tag (ICAT)-LC/MS database: Isotope Coded Affinity Tag (ICAT)-LC/MS data for Mycobacterium tuberculosis strain BCG versus H37Rv.

iii.) FUNC_CLASS database: Functional classification of diverse microorganisms. This database also integrates genomic, proteomic, and metabolic data.

iv.) DIFF database: Presentation of differently regulated proteins obtained by comparative proteomic experiments using computerized gel image analysis.
MSnLib Mass spectral libraries (.mgf and .json)
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2025
Brungs, Corinna; Schmid, Robin; Pluskal, Tomas (2025). MSnLib Mass spectral libraries (.mgf and .json) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11163380
Explore at:
Dataset updated
Jan 24, 2025
Dataset provided by
Czech Academy of Sciences, Institute of Organic Chemistry and Biochemistry
Authors
Brungs, Corinna; Schmid, Robin; Pluskal, Tomas
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data for MSnLib are divided into several Zenodo records due to size constraints.

raw positive: 10966404
raw negative: 10967081
mzml positive and negative: 10966280
spectral libraries: 11163380

This record includes the automatically generated spectral libraries (MSnLib) within mzmine, acquired using a flow injection method on an Orbitrap ID-X instrument, for all compound libraries. There are multiple files for each compound library containing MS2 only or MSn in two data formats (.mgf or .json) for both polarities.

The MS2 files contain, in addition to all MS2 spectra, all pseudo MS2 spectra (a full MSn tree merged into one spectrum per compound ion). The MSn files additionally contain all individual MSn stages. The first number in each file name indicates the library building date.

7 Compound Libraries:

Short Name: Full name, Provider (Catalog number), total compounds (not all detected during library building)

MCEBIO: Bioactive Compound Library, MedChemExpress (HY-L001), 10,315 compounds

MCESAF: 5k Scaffold Library, MedChemExpress (HY-L902), 4998 compounds

NIHNP: NIH NPAC ACONN collection of NP, NIH/NCATS, 3988 compounds

OTAVAPEP: Alpha-helix Peptidomimetic Library, OTAVAchemicals (a-helix-Peptido), 1298 compounds

ENAMDISC: Discovery Diversity Set -10, Enamine (DDS-10), 10,240 compounds

ENAMMOL: Carboxylic Acid Fragment Library + Random, Enamine and Molport, 4378 compounds

MCEDRUG: FDA-Approved Drug Library, MedChemExpress (HY-L022), 2610 compounds

Information regarding the SPECTYPE

no SPECTYPE or SINGLE_BEST_SCAN: Best spectrum for each precursor and energy (highest TIC)

'SAME_ENERGY' = if a spectrum was acquired multiple times for a precursor at the same energy, the scans are additionally merged into one spectrum for that energy (max. signal height used for each fragment signal).

'ALL_ENERGIES' = merged spectrum of all used energies (in our case 3 for each precursor, using the merged (same energy) if available).

'ALL_MSN_TO_PSEUDO_MS2' = mzmine merges all MSn into one pseudo MS2.
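As a rough illustration of how the SPECTYPE field might be used when reading one of the .mgf libraries, the sketch below parses MGF text and groups spectra by SPECTYPE. It assumes the generic MGF layout (BEGIN IONS / END IONS blocks, KEY=VALUE metadata lines, "m/z intensity" peak lines). The sample entries, compound names, and function names are hypothetical and not taken from the MSnLib files.

```python
from collections import defaultdict

# Hypothetical two-spectrum MGF snippet; real MSnLib files carry more fields.
SAMPLE_MGF = """\
BEGIN IONS
NAME=compound A
PEPMASS=195.0877
SPECTYPE=ALL_ENERGIES
138.0662 100.0
110.0713 35.5
END IONS
BEGIN IONS
NAME=compound B
PEPMASS=180.0634
89.0233 12.0
END IONS
"""


def parse_mgf(text):
    """Yield (metadata dict, peak list) for each BEGIN IONS/END IONS block."""
    meta, peaks, inside = {}, [], False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            meta, peaks, inside = {}, [], True
        elif line == "END IONS":
            if inside:
                yield meta, peaks
            inside = False
        elif inside and line:
            if "=" in line and not line[0].isdigit():
                # Metadata line such as SPECTYPE=ALL_ENERGIES
                key, value = line.split("=", 1)
                meta[key.upper()] = value
            else:
                # Peak line: m/z followed by intensity
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))


def group_by_spectype(text):
    """Group (name, peaks) pairs by their SPECTYPE value."""
    groups = defaultdict(list)
    for meta, peaks in parse_mgf(text):
        # Per the description above, a missing SPECTYPE means the single best scan.
        spectype = meta.get("SPECTYPE", "SINGLE_BEST_SCAN")
        groups[spectype].append((meta.get("NAME"), peaks))
    return dict(groups)


groups = group_by_spectype(SAMPLE_MGF)
for spectype, spectra in groups.items():
    print(spectype, [name for name, _ in spectra])
```

Treating a missing SPECTYPE as SINGLE_BEST_SCAN follows the note above; whether every file in the record uses exactly these key names is not verified here.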

V5 fixed USIs
Data from: Database for accurate EI-MS spectra for volatile compounds...
zenodo.org
investigacion.usc.es
xml
Updated Mar 16, 2023
Rodriguez Isaac; Cobo Golpe; Ramil María (2023). Database for accurate EI-MS spectra for volatile compounds identified in e-liquids for e-cigarettes [Dataset]. http://doi.org/10.5281/zenodo.7738079
Explore at:
Available download formats: xml
Unique identifier
https://doi.org/10.5281/zenodo.7738079
Dataset updated
Mar 16, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rodriguez Isaac; Cobo Golpe; Ramil María
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Database of accurate EI-MS spectra, recorded using a GC-EI-TOF-MS instrument, for volatile and semi-volatile compounds tentatively identified in diluted e-liquids for e-smoking. Tentative identifications were based on normalized spectral matches above 75 (0-100 scale) with the low-resolution NIST17 database, linear retention index differences within 50 units of literature values for semipolar (HP5-type) and Carbowax columns, and mass errors lower than 50 ppm for a minimum of two relevant ions in the spectrum of each compound.

The database can be used in combination with the Library Editor software included in MassHunter. Experimental spectra were obtained after spectral deconvolution of GC-MS records for 1:100 diluted, commercially available e-liquids.
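The tentative identification criteria described above can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' workflow: the retention index criterion is read here as a maximum absolute difference of 50 units, the function names are hypothetical, and the m/z values in the example are made up.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def passes_criteria(match_score, ri_differences, ion_pairs,
                    min_score=75.0, max_ri_diff=50.0,
                    max_ppm=50.0, min_ions=2):
    """Check the three criteria named in the description.

    match_score    -- normalized library match on a 0-100 scale
    ri_differences -- measured minus literature retention index, one per column
    ion_pairs      -- (measured m/z, theoretical m/z) for the relevant ions
    """
    # Count ions whose absolute mass error is below the ppm threshold.
    ions_ok = sum(
        1 for measured, theoretical in ion_pairs
        if abs(ppm_error(measured, theoretical)) < max_ppm
    )
    # Assumption: "within 50 units" means |difference| <= 50 on every column.
    ri_ok = all(abs(diff) <= max_ri_diff for diff in ri_differences)
    return match_score > min_score and ri_ok and ions_ok >= min_ions


# Hypothetical candidate: score 82, RI differences on two columns, two ions.
candidate_ok = passes_criteria(
    match_score=82.0,
    ri_differences=[12.0, -30.0],
    ion_pairs=[(162.1150, 162.1157), (84.0808, 84.0813)],
)
print(candidate_ok)
```

Keeping the thresholds as keyword arguments makes it easy to test stricter settings without changing the function body.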
Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics
acs.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Jun 9, 2023
Eric W. Deutsch; Zhi Sun; David S. Campbell; Pierre-Alain Binz; Terry Farrah; David Shteynberg; Luis Mendoza; Gilbert S. Omenn; Robert L. Moritz (2023). Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics [Dataset]. http://doi.org/10.1021/acs.jproteome.6b00445.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jproteome.6b00445.s002
Dataset updated
Jun 9, 2023
Dataset provided by
ACS Publications
Authors
Eric W. Deutsch; Zhi Sun; David S. Campbell; Pierre-Alain Binz; Terry Farrah; David Shteynberg; Luis Mendoza; Gilbert S. Omenn; Robert L. Moritz
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/.
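The two-step approach mentioned above (search against a simpler database, then check peptide uniqueness against a more complex one) can be sketched as a plain substring lookup. This is a minimal sketch: the protein names, sequences, and function names are hypothetical, and a real check would also need to handle details such as isobaric residues, which are ignored here.

```python
def proteins_containing(peptide, database):
    """Return the sorted names of all proteins whose sequence contains the peptide."""
    return sorted(name for name, seq in database.items() if peptide in seq)


def uniqueness_report(peptides, larger_db):
    """Map each peptide found in the simpler search to its matches in the larger database."""
    return {pep: proteins_containing(pep, larger_db) for pep in peptides}


# Hypothetical peptides identified against the smallest tier.
TIER1_HITS = ["PEPTIDEK", "SAMPLER"]

# Hypothetical larger-tier database (name -> sequence).
LARGER_DB = {
    "protA": "MKPEPTIDEKAAA",
    "protA_variant": "MKPEPTIDEKAAV",
    "protB": "GGSAMPLERGG",
}

report = uniqueness_report(TIER1_HITS, LARGER_DB)
# A peptide is unique in the larger tier if exactly one protein contains it.
unique = [pep for pep, hits in report.items() if len(hits) == 1]
print(report)
print(unique)
```

A peptide that maps to several entries in the larger database, such as the first one in this example, cannot be attributed to a single protein without further evidence.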
Gas Chromatography-Mass Spectrometry (GC-MS) Biomarker Database Table
ecat.ga.gov.au
researchdata.edu.au
Updated Aug 12, 2024
Commonwealth of Australia (Geoscience Australia) (2024). Gas Chromatography-Mass Spectrometry (GC-MS) Biomarker Database Table [Dataset]. https://ecat.ga.gov.au/geonetwork/js/api/records/0bef7c86-8724-4bc6-ab1a-283fdf80fc90
Explore at:
Available download formats: www:link-1.0-http--link
Dataset updated
Aug 12, 2024
Dataset provided by
Geoscience Australia (http://ga.gov.au/)
Description
The Gas Chromatography-Mass Spectrometry (GC-MS) biomarker database table contains publicly available results from Geoscience Australia's organic geochemistry (ORGCHEM) schema and supporting Oracle databases for the molecular (biomarker) compositions of source rock extracts and petroleum liquids (e.g., condensate, crude oil, bitumen) sampled from boreholes and field sites. These analyses are undertaken by various laboratories in service and exploration companies, Australian government institutions and universities using either gas chromatography-mass spectrometry (GC-MS) or gas chromatography-mass spectrometry-mass spectrometry (GC-MS-MS). Data include the borehole or field site location, sample depth, shows and tests, stratigraphy, analytical methods, other relevant metadata, and the molecular composition of aliphatic hydrocarbons, aromatic hydrocarbons and heterocyclic compounds, which contain either nitrogen, oxygen or sulfur.

These data provide information about the molecular composition of the source rock and its generated petroleum, enabling the determination of the type of organic matter and depositional environment of the source rock and its thermal maturity. Interpretation of these data enables the determination of oil-source and oil-oil correlations, migration pathways, and any secondary alteration of the generated fluids. This information is useful for mapping total petroleum systems and for assessing sediment-hosted resources. Some data are generated in Geoscience Australia’s laboratory and released in Geoscience Australia records. Data are also collated from destructive analysis reports (DARs), well completion reports (WCRs), and literature. The biomarker data for crude oils and source rocks are delivered in the Petroleum and Rock Composition – Biomarker web services on the Geoscience Australia Data Discovery Portal at https://portal.ga.gov.au, which will be periodically updated.
Plant Protein Phosphorylation Database
dknet.org
scicrunch.org
Updated Jan 29, 2022
(2022). Plant Protein Phosphorylation Database [Dataset]. http://identifiers.org/RRID:SCR_007841
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007841
Dataset updated
Jan 29, 2022
Description
The Plant Protein Phosphorylation Database (P3DB) was established with the overall objective of providing a resource of protein phosphorylation data from multiple plants. P3DB was constructed with a dataset from oilseed rape. The data were obtained using a combination of data-dependent neutral loss and multistage activation mass spectrometry. The dataset includes 14,670 non-redundant phosphorylation sites from 8,894 phospho-peptides in 6,382 substrate proteins.
