NIST peptide libraries are comprehensive, annotated mass spectral reference collections from various organisms and proteins, useful for rapidly matching and identifying acquired MS/MS spectra. The spectra were produced by tandem mass spectrometers using liquid chromatographic separation followed by electrospray ionization. Unlike the NIST small-molecule electron ionization library, which contains one spectrum per molecular structure, peptide spectra depend on the fragmentation mode: ion-trap and "beam-type" collision cells (currently the most commonly used fragmentation devices) produce different, energy-dependent patterns. This results in multiple spectral libraries, distinguished by ionization mode, each of which may contain several spectra per peptide. Separate libraries have also been assembled for iTRAQ-4 derivatized peptides and for phosphorylated peptides. Separating libraries by animal species reduces search time, although investigators may choose to include several species in their searches.
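Because a library can hold several spectra per peptide (different charges and energies), a search tool usually narrows the candidates by precursor m/z and charge before scoring any spectra. The sketch below shows that pre-filter with a sorted index and binary search. It is a minimal illustration, not NIST's search implementation: the `LibraryEntry` and `PrecursorIndex` names and the 10 ppm default tolerance are assumptions.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    """One library spectrum's identity (hypothetical, minimal fields only)."""
    peptide: str
    charge: int
    precursor_mz: float


class PrecursorIndex:
    """Library entries sorted by precursor m/z for fast window lookups (hypothetical helper)."""

    def __init__(self, entries):
        self._entries = sorted(entries, key=lambda e: e.precursor_mz)
        self._mzs = [e.precursor_mz for e in self._entries]

    def candidates(self, query_mz, charge, tol_ppm=10.0):
        """Entries with the given charge whose precursor m/z is within tol_ppm of query_mz."""
        tol = query_mz * tol_ppm * 1e-6
        lo = bisect_left(self._mzs, query_mz - tol)
        hi = bisect_right(self._mzs, query_mz + tol)
        return [e for e in self._entries[lo:hi] if e.charge == charge]
```

The surviving candidates would then be scored against the query spectrum. Restricting the index to one species' library shrinks the sorted list, which is one way the species split can shorten searches.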
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data for MSnLib are divided into several Zenodo records due to size constraints.
raw positive: 10966404
raw negative: 10967081
mzml positive and negative: 10966280
spectral libraries: 11163380
This record contains the spectral libraries (MSnLib) generated automatically in mzmine for all compound libraries, from data acquired with a flow injection method on an Orbitrap ID-X instrument. For each compound library there are multiple files, containing either MS2 only or MSn, in two data formats (.mgf and .json), for both polarities.
The MS2 files contain all MS2 spectra plus all pseudo-MS2 spectra (a full MSn tree merged into one spectrum per compound ion). The MSn files additionally contain all individual MSn stages. The leading number in each file name indicates the library building date.
7 Compound Libraries:
Short Name: Full name, Provider (Catalog number), total compounds (not all detected during library building)
MCEBIO: Bioactive Compound Library, MedChemExpress (HY-L001), 10,315 compounds
MCESAF: 5k Scaffold Library, MedChemExpress (HY-L902), 4998 compounds
NIHNP: NIH NPAC ACONN collection of NP, NIH/NCATS, 3988 compounds
OTAVAPEP: Alpha-helix Peptidomimetic Library, OTAVAchemicals (a-helix-Peptido), 1298 compounds
ENAMDISC: Discovery Diversity Set -10, Enamine (DDS-10), 10,240 compounds
ENAMMOL: Carboxylic Acid Fragment Library + Random, Enamine and Molport, 4378 compounds
MCEDRUG: FDA-Approved Drug Library, MedChemExpress (HY-L022), 2610 compounds
Information regarding the SPECTYPE
no SPECTYPE or SINGLE_BEST_SCAN: Best spectrum for each precursor and energy (highest TIC)
'SAME_ENERGY' = in addition, spectra acquired multiple times for the same precursor at the same energy are merged into a single spectrum (the maximum signal height is used for each fragment signal).
'ALL_ENERGIES' = merged spectrum of all energies used (three per precursor in this dataset), using the SAME_ENERGY merged spectra where available.
'ALL_MSN_TO_PSEUDO_MS2' = mzmine merges all MSn into one pseudo MS2.
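The first two SPECTYPE rules can be sketched in plain Python: pick the highest-TIC scan per precursor and energy, and merge repeated spectra by keeping the maximum height of each fragment signal. This is not mzmine's implementation. The dictionary keys, the 0.005 m/z tolerance, and the greedy grouping of neighbouring fragments are assumptions made for illustration. Applying the same merge across energies would give an ALL_ENERGIES-style spectrum only if mzmine uses the same max-height rule there, which the text does not state.

```python
def tic(peaks):
    """Total ion current of a peak list [(mz, intensity), ...]."""
    return sum(intensity for _, intensity in peaks)


def single_best_scan(scans):
    """SINGLE_BEST_SCAN-style selection: highest-TIC scan per (precursor, energy).

    Each scan is a dict with the hypothetical keys 'precursor_mz', 'energy', 'peaks'.
    """
    best = {}
    for scan in scans:
        key = (scan["precursor_mz"], scan["energy"])
        if key not in best or tic(scan["peaks"]) > tic(best[key]["peaks"]):
            best[key] = scan
    return best


def merge_max_intensity(spectra, mz_tol=0.005):
    """SAME_ENERGY-style merge: pool the peaks of several spectra, group fragments
    lying within mz_tol of the first peak in the group, and keep the most intense
    member of each group (i.e. the maximum signal height)."""
    pooled = sorted(peak for spectrum in spectra for peak in spectrum)
    merged, group = [], []
    for mz, intensity in pooled:
        if group and mz - group[0][0] > mz_tol:
            merged.append(max(group, key=lambda p: p[1]))
            group = []
        group.append((mz, intensity))
    if group:
        merged.append(max(group, key=lambda p: p[1]))
    return merged
```

Grouping relative to the first peak of each group keeps a run of closely spaced peaks from chaining into one wide signal; a production tool may use a different clustering rule.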
V5: fixed USIs (Universal Spectrum Identifiers).
These data comprise an open-source, portable SQLite database containing high-resolution mass spectrometry data (MS1 and MS2) for per- and polyfluorinated alkyl substances (PFAS), together with metadata on the measurement techniques, quality assurance metrics, and the samples from which the data were produced. The data are stored in a format adhering to the Database Infrastructure for Mass Spectrometry (DIMSpec) project, which produces and uses databases like this one and provides a complete toolkit for non-targeted analysis. More information about the full DIMSpec code base (including these data, for demonstration purposes) is available on GitHub (https://github.com/usnistgov/dimspec), and the full DIMSpec User Guide is at https://pages.nist.gov/dimspec/docs. The files of most interest here are the database file itself (dimspec_nist_pfas.sqlite), an entity relationship diagram (ERD.png), and a data dictionary (DIMSpec for PFAS_1.0.1.20230615_data_dictionary.json), which elucidate the database structure and assist in its interpretation and use.
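Because the database is a plain SQLite file, a first look at its structure does not require the DIMSpec toolkit. A minimal sketch using Python's standard `sqlite3` module: it opens the file read-only and counts the rows in every table. No DIMSpec table names are assumed; consult ERD.png and the data dictionary for what each table holds. The helper names `open_readonly` and `table_row_counts` are hypothetical.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path):
    """Open an existing SQLite file read-only (never creates a new, empty file)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def table_row_counts(con):
    """Return {table_name: row_count} for every user table on a connection."""
    names = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Identifiers cannot be bound as ? parameters, so quote them explicitly.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Example (assumes the file has been downloaded to the working directory):
# con = open_readonly("dimspec_nist_pfas.sqlite")
# for table, n in table_row_counts(con).items():
#     print(f"{table}: {n}")
# con.close()
```

Opening with `mode=ro` guards against the default `sqlite3.connect` behaviour of silently creating an empty database when the path is mistyped.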
A library containing spectra of more than 200,000 chemical compounds, including metabolites, peptides, contaminants, and lipids. All spectra and chemical structures are reviewed by experts.
https://academictorrents.com/nolicensespecified
GC-MS Database NIST/EPA/NIH MASS SPECTRAL LIBRARY (NIST 08) + update 2010 2.0f Apr 1 2009 x86 [2008, ENG]
This library package contains the NIST 2008 Mass Spectral Library in the following manufacturer formats:
1. Agilent ChemStation (.L) (with structures)
2. NIST MS Search (compatible with most mass spectrometry software brands): Bruker; JEOL; LECO; PerkinElmer TurboMass; Thermo Electron Xcalibur; Varian MS Workstation; Waters MassLynx; and other brands
3. PerkinElmer TurboMass (IDB) (with structures)
4. Shimadzu GCMS Solution (QP5000) (SPC) (no structures)
5. Waters MassLynx (IDB) (with structures)
6. Finnigan GCQ/Varian ITS-40
7. Thermo Galactic Spectral ID
Includes:
- Over 220,000 spectra
- Over 190,000 chemical structures
- GC Retention Index Library
- MS/MS Library
- License keys
A mass spectral database for organic compounds. The spectra included in the database are electron impact mass spectra (EI-MS), Fourier transform infrared (FT-IR) spectra, 1H nuclear magnetic resonance (NMR) spectra, 13C NMR spectra, laser Raman spectra, and electron spin resonance (ESR) spectra.
Metadata-centric, auto-curating repository designed for storage and querying of mass spectral records. It contains metabolite mass spectra, metadata and associated compounds.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
(Version 20230306, btmsp files modified May 31, 2023, additional taxonomic information added Dec 27, 2024)
Version 4.2 (20230306) of the RKI MALDI-ToF mass spectra database represents the third update of the original database (version 20161027, https://doi.org/10.5281/zenodo.163517). The RKI Database v.4.2 now contains a total of 11055 MALDI-ToF mass spectra from 1601 microbial strains of highly pathogenic (i.e. biosafety level 3, BSL-3) bacteria such as Bacillus anthracis, Brucella melitensis, Yersinia pestis, Burkholderia mallei / pseudomallei and Francisella tularensis as well as a selection of spectra of their close and distant relatives. The database can be used as a reference for the diagnosis of BSL-3 bacteria using proprietary and free software packages for MALDI-ToF MS-based microbial identification. The spectral data are provided as a zip archive (zenodo db 230306.zip) containing the original mass spectra in their native data format (Bruker Daltonics). Please refer to the pdf file (230306-ZENODO-Metadata.pdf) for information on cultivation conditions, sample preparation and details of the spectra acquisition. Please do not try to print this document (>1600 pages!).
Version 20230306 of the RKI database contains for the first time files in the btmsp format (e.g. 2023-May-23-Bacillus-RKI-Database-568.btmsp and others). These files were generated using the MALDI Biotyper software (Bruker Daltonics) and contain a total of 1601 main spectra (msp) from the BSL-3 database in the proprietary data format of the MALDI Biotyper software. *.btmsp files can be imported and used for identification with this software solution. Please refer to the manufacturer's manual for details on importing btmsp files. Note that the btmsp file available in database version 4 is broken and cannot be imported.
The pkf files (230306_ZENODO_30Peaks_0.75.pkf, 230306_ZENODO_45Peaks_0.75.pkf) represent two versions of the MS peak list data in a Matlab compatible format. The latter data can be imported into MicrobeMS, a free Matlab-based software solution developed at the RKI. MicrobeMS can be used for the identification of microorganisms by MALDI-ToF MS and is available at https://wiki-ms.microbe-ms.com.
The Excel file "Taxonomy information - RKI MALDI-ToF MS database of HPB at ZENODO v.4.xlsx" contains additional taxonomic information, such as a detailed list of the bacterial MALDI-ToF mass spectra (sheet #1), overviews of the number of spectra per strain, species, or bacterial genus (sheet #2), the number of strains per species or genus (sheet #3), etc.
The RKI mass spectrometry database is updated regularly.
The author would like to thank the following individuals for providing microbial strains and species or mass spectra thereof. Without their help, this work would not have been possible.
For a detailed description of the database see: Lasch, P., Beyer, W., Bosch, A. et al. A MALDI-ToF mass spectrometry database for identification and classification of highly pathogenic bacteria. Sci Data 12, 187 (2025). https://doi.org/10.1038/s41597-025-04504-z
This database is the product of a multi-year, comprehensive evaluation and expansion of the world's most widely used mass spectral reference library.
The NIST DART-MS Forensics Database is an evaluated collection of in-source collision-induced dissociation (is-CID) mass spectra of compounds of interest to the forensics community (e.g., seized drugs and cutting agents). The is-CID mass spectra were collected using Direct Analysis in Real Time (DART) mass spectrometry (MS), either by NIST scientists or by contributing agencies noted per compound. The database is provided as a general-purpose structure data file (.SDF). On Windows, the .SDF library can be converted to NIST MS Search format using Lib2NIST and then explored with NIST MS Search v2.4 for general mass spectral analysis; both tools can be downloaded at https://chemdata.nist.gov. As of 09-28-2021, the database is also provided in R data format (.RDS) for use with the R programming language. This database, also commonly referred to as a library, is one in a series of high-quality mass spectral libraries/databases produced by NIST (see NIST SRD 1a, https://dx.doi.org/10.18434/T4H594).
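Outside Windows, the .SDF file can also be read directly, because SDF is a plain-text format: each record is a molfile block ending in `M  END`, followed by data items written as `> <NAME>` lines with their values, and records are separated by `$$$$`. The sketch below collects the data items of every record without assuming which field names the NIST file uses; `read_sdf_fields` is a hypothetical helper, and a full cheminformatics toolkit such as RDKit (`Chem.SDMolSupplier`) is the more robust option when the structures themselves are needed.

```python
import re

# A data-item header looks like '>  <FIELD_NAME>' (optionally with extra text).
_ITEM_HEADER = re.compile(r"^>.*<([^>]+)>")
_RECORD_END = re.compile(r"^\$\$\$\$[ \t]*$", re.MULTILINE)


def read_sdf_fields(text):
    """Return one dict of data items per SDF record.

    The molfile part of each record (header, atom and bond blocks) is skipped;
    only the '> <NAME>' data items after 'M  END' are collected. Multi-line
    values are joined with newlines.
    """
    records = []
    for block in _RECORD_END.split(text):
        if not block.strip():
            continue
        fields, name, lines = {}, None, []
        for line in block.splitlines():
            header = _ITEM_HEADER.match(line)
            if header:
                name, lines = header.group(1), []
            elif name is not None:
                if line.strip():
                    lines.append(line)
                else:  # a blank line ends the current data item
                    fields[name] = "\n".join(lines)
                    name = None
        if name is not None:  # last item without a trailing blank line
            fields[name] = "\n".join(lines)
        records.append(fields)
    return records
```

Typical use would be `read_sdf_fields(Path("library.sdf").read_text())`, where the file name is a placeholder for the downloaded database file; inspecting the keys of the first record shows which fields the library provides.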
A mass spectral database that assists in identifying compounds in the life sciences, metabolomics, pharmaceutical research, toxicology, forensic investigations, environmental analysis, food control, and industry.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
An Agilent 1260 Infinity nHPLC stack and a Thermo Orbitrap Velos Pro hybrid mass spectrometer were used to analyze 8-µL samples on a C18 column (75 µm × 15 cm; 300 Å; 5 µm; Phenomenex). All data were acquired in collision-induced dissociation (CID) mode. Mobile phase A was 0.1% formic acid (FA) in ddH2O, and mobile phase B was 0.1% FA in 15% ddH2O/85% acetonitrile. The mobile phase gradient was: 10 min at 2% B, 90 min at 5-40% B, 5 min at 70% B, and 10 min at 0% B. MS detection comprised a full scan (m/z 300-1200) at a resolution of 60k, followed by data-dependent MS2 scans of the 15 most abundant ions. The MS data files were converted to mzXML using ReAdW (v. 3.5.1), and MzXML2Search was used to create a Mascot generic format (MGF) file. Data were analyzed using the SEQUEST engine, with searches performed against the UniRef100 database. The peptide ID lists were then further analyzed in Scaffold viewer, and the peptide identifications were filtered in Scaffold. In short, protein probabilities were set to ≥0.99 with false discovery rate
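The gradient program can be written down as a small timetable, which makes the total run time explicit (10 + 90 + 5 + 10 = 115 min) and lets one look up the composition at any time point. This is a sketch only: the text does not say how %B moves between segments, so linear change within the 5-40% segment and step changes between segments are assumptions, as are the names `GRADIENT` and `percent_b`.

```python
# (duration in min, %B at segment start, %B at segment end), transcribed from the text.
GRADIENT = [
    (10, 2.0, 2.0),   # 10 min at 2% B
    (90, 5.0, 40.0),  # 90 min at 5-40% B (assumed linear ramp)
    (5, 70.0, 70.0),  # 5 min at 70% B
    (10, 0.0, 0.0),   # 10 min at 0% B
]


def total_runtime(program):
    """Sum of all segment durations (minutes)."""
    return sum(duration for duration, _, _ in program)


def percent_b(program, t):
    """%B at time t (min), assuming linear change within a segment and steps between segments."""
    if t < 0:
        raise ValueError("t must be non-negative")
    start = 0.0
    for duration, b_start, b_end in program:
        if t < start + duration:
            return b_start + (b_end - b_start) * (t - start) / duration
        start += duration
    raise ValueError("t is beyond the end of the gradient program")
```

Under these assumptions, the midpoint of the ramp (t = 55 min) sits at 22.5% B.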
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Raw LC-MS (liquid chromatography-mass spectrometry) data for photocatalyst 1 (PC1), acquired on a Waters Acquity UPLC coupled to a Xevo G2-XS (LC-MS/MS). The sample was dissolved in water:acetonitrile (95:5).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Flow injection method for acquiring MSn data on an Orbitrap ID-X instrument for different compound libraries in positive and negative ionization mode. The MS2 files contain all MS2 spectra plus all pseudo-MS2 spectra (a full MSn tree merged into one spectrum per compound ion); the MSn files additionally contain all individual MSn stages. Both .mgf and .json data formats are available as of V3.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
FragHub v1.2.4
Repository v8
INPUT_FILES (downloaded in .mgf, .msp, .json or .csv):
https://systemsomicslab.github.io/compms/msdial/main.html#MS
https://github.com/MassBank/MassBank-data/releases/tag/2024.11
https://mona.fiehnlab.ucdavis.edu/downloads
https://external.gnps2.org/gnpslibrary
https://www.metaboanalyst.ca/docs/Databases.xhtml
https://zenodo.org/records/8287341
https://zenodo.org/records/13911806
OUTPUT_FILES available in .csv, .json or .msp
Integration results:
===================== PARAMETERS =====================
normalize_intensity: ON
remove_peak_above_precursormz: ON
check_minimum_peak_requiered: ON
n_peaks: 3.0
reduce_peak_list: ON
max_peaks: 500.0
remove_spectrum_under_entropy_score: OFF
entropy_score_value: 0.5
keep_mz_in_range: ON
from_mz: 50.0
to_mz: 2000.0
check_minimum_of_high_peaks_requiered: ON
intensity_percent: 5
no_peaks: 2
reset_updates: NO
======================= FILTERED OUT =======================
No peaks list: 0
No smiles, no inchi, no inchikey: 123401
No precursor mz: 9898
No or bad adduct: 4315818
Low entropy score: 0
Minimum peaks not required: 159243
All peaks above precursor mz: 3502
No peaks in mz range: 1006
Minimum high peaks not required: 93163
================== SPECTRUM NUMBER ==================
POS LC Exp: 1056998
NEG LC Exp: 389093
POS LC InSilico: 233623
NEG LC InSilico: 228443
POS GC Exp: 46
NEG GC Exp: 0
POS GC InSilico: 1
NEG GC InSilico: 0
Total: 1908204
================= UNIQUE INCHIKEYS ==================
POS LC Exp: 138374
NEG LC Exp: 80992
POS LC InSilico: 109085
NEG LC InSilico: 105233
POS GC Exp: 43
NEG GC Exp: 0
POS GC InSilico: 1
NEG GC InSilico: 0
TOTAL Unique InChIKeys: 304124
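The PARAMETERS block above lists the filters applied during integration. The sketch below shows one plausible reading of those switches as a single function, using the logged threshold values. It is not FragHub's code: the order of the checks, the inclusive bounds, base-peak normalization to 1.0, and the name `filter_spectrum` are all assumptions. The entropy filter is omitted because it was OFF in this run.

```python
# Thresholds copied from the PARAMETERS block above. Their exact semantics in
# FragHub (inclusive bounds, order of checks) are assumptions of this sketch.
N_PEAKS = 3                    # check_minimum_peak_requiered / n_peaks
MAX_PEAKS = 500                # reduce_peak_list / max_peaks
FROM_MZ, TO_MZ = 50.0, 2000.0  # keep_mz_in_range
INTENSITY_PERCENT = 5.0        # check_minimum_of_high_peaks_requiered / intensity_percent
NO_PEAKS = 2                   # check_minimum_of_high_peaks_requiered / no_peaks


def filter_spectrum(peaks, precursor_mz):
    """Return a cleaned, normalized peak list, or None if the spectrum is rejected.

    peaks: list of (mz, intensity) tuples.
    """
    # keep_mz_in_range (peaks with non-positive intensity are dropped as well)
    peaks = [(mz, i) for mz, i in peaks if FROM_MZ <= mz <= TO_MZ and i > 0]
    if not peaks:
        return None  # cf. "No peaks in mz range"
    # remove_peak_above_precursormz
    peaks = [(mz, i) for mz, i in peaks if mz <= precursor_mz]
    if not peaks:
        return None  # cf. "All peaks above precursor mz"
    # reduce_peak_list: keep the MAX_PEAKS most intense peaks
    peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:MAX_PEAKS]
    # check_minimum_peak_requiered
    if len(peaks) < N_PEAKS:
        return None  # cf. "Minimum peaks not required"
    # normalize_intensity: base peak scaled to 1.0 (assumed convention), then sort by m/z
    base = peaks[0][1]
    peaks = sorted((mz, i / base) for mz, i in peaks)
    # check_minimum_of_high_peaks_requiered: at least NO_PEAKS peaks >= INTENSITY_PERCENT % of base
    if sum(1 for _, i in peaks if i * 100 >= INTENSITY_PERCENT) < NO_PEAKS:
        return None  # cf. "Minimum high peaks not required"
    return peaks
```

On the counts themselves: the per-category spectrum numbers add up exactly to the reported total of 1,908,204, whereas the per-category unique InChIKey numbers add up to 433,728, more than the 304,124 reported overall, so many InChIKeys occur in more than one category.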
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
(Version 20170523)
Edit #1 (Nov 30, 2018): New database version (v.3 - 20181130) - available: 10.5281/zenodo.1880975
Edit #2 (Mar 06, 2023): New database version (v.4.2 - 20230306) - available: 10.5281/zenodo.7702375
Version 2 (20170523) of the RKI’s MALDI-TOF mass spectral database is an update of the original database (version 20161027, https://doi.org/10.5281/zenodo.163517). The RKI database contains mass spectral entries from highly pathogenic (biosafety level 3, BSL-3) bacteria such as Bacillus anthracis, Yersinia pestis, Burkholderia mallei, Burkholderia pseudomallei and Francisella tularensis, as well as a selection of spectra from their close and more distant relatives. The database can be used as a reference for the diagnosis of BSL-3 bacteria using proprietary and free software packages for MALDI-TOF MS-based microbial identification. The spectral data are distributed as a 7-zip archive that contains the original mass spectra in their native data format (Bruker Daltonics). Please refer to the pdf file (170523-ZENODO-Metadata.pdf) for the metadata of the spectra. Please do not try to print this document (~1100 pages!).
The pkf file (170523_ZENODO_Peaklist_30Peaks_1.6.pkf) contains the MS peak list data in a Matlab-compatible format. This file can be imported into MicrobeMS, a free Matlab-based software solution developed at the RKI. MicrobeMS is available from http://www.microbe-ms.com.
The RKI mass spectral database will be updated on a regular basis.
The author is grateful to the following persons for providing microbial strains and species. Without their help this work would not have been possible.
Wolfgang Beyer - University of Hohenheim, Faculty of Agricultural Sciences, Stuttgart, Germany
Guido Werner - Robert Koch-Institute, Nosocomial Pathogens and Antibiotic Resistances (FG13), Wernigerode, Germany
Alejandra Bosch - CINDEFI, CONICET-CCT La Plata, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina
Michal Drevinek - National Institute for Nuclear, Biological and Chemical Protection, Milin, Czech Republic
Roland Grunow - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany
Daniela Jacob - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany
Silke Klee - Robert Koch-Institute, Highly Pathogenic Microorganisms (ZBS2), Berlin, Germany
Jörg Rau - Chemisches und Veterinäruntersuchungsamt Stuttgart, Fellbach, Germany
Jens Jacob - Robert Koch-Institute, Hospital Hygiene, Infection Prevention and Control (FG14), Berlin, Germany
Martin Mielke - Robert Koch-Institute, Department 1 - Infectious Diseases, Berlin, Germany
Monika Ehling-Schulz - Functional Microbiology, Institute of Microbiology, University of Veterinary Medicine, Vienna, Austria
Armand Paauw - Department of Medical Microbiology, CBRN protection, Universitair Medisch Centrum Utrecht, TNO, Rijswijk, The Netherlands
Public repository of mass spectral data that allows users to search for similar spectra on a peak-to-peak basis, on a neutral loss-to-neutral loss basis, or by m/z value and molecular formula; to search chemical compounds by substructure; and to search chemical compounds by keyword.
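Peak-to-peak and neutral loss-to-neutral loss searches differ only in what is compared: fragment m/z values, or the mass lost from the precursor to reach each fragment. The sketch below uses a plain cosine similarity over fixed-width m/z bins to show the difference. The repository's actual scoring function is not described in the text; the binning scheme, the 0.01 bin width, the singly-charged-ion assumption for neutral losses, and the helper names are all assumptions, and the toy spectra are made up.

```python
import math


def bin_peaks(peaks, bin_width=0.01):
    """Sum intensities into fixed-width m/z bins (a simple, assumed matching scheme)."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(peaks_a, peaks_b, bin_width=0.01):
    """Cosine similarity of two peak lists after binning; 0.0 if either is empty."""
    a, b = bin_peaks(peaks_a, bin_width), bin_peaks(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def neutral_losses(peaks, precursor_mz):
    """Re-express fragments as neutral losses (precursor m/z minus fragment m/z).

    Assumes singly charged ions; fragments at or above the precursor are dropped.
    """
    return [(precursor_mz - mz, intensity) for mz, intensity in peaks if mz < precursor_mz]


# Toy spectra (made-up values): same losses, different precursors.
s1, s2 = [(150.0, 10.0), (100.0, 5.0)], [(250.0, 10.0), (200.0, 5.0)]
print(cosine(s1, s2))                                               # 0.0: no shared fragment m/z
print(cosine(neutral_losses(s1, 200.0), neutral_losses(s2, 300.0)))  # ~1.0: identical losses
```

The toy pair illustrates why a neutral-loss search is useful: two compounds with different precursor masses but the same losses (for example, members of a homologous series) share no fragment peaks, yet their loss patterns match.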
Spectral database of the subspecies of the Mycobacterium abscessus complex (MALDI-TOF Mass Spectrometry)
This data set originates from a collection of 41 clinical strains of Mycobacterium abscessus complex corresponding to 1001 mass spectra:
25 strains of Mycobacterium abscessus subsp. abscessus (633 mass spectra)
9 strains of Mycobacterium abscessus subsp. massiliense (204 mass spectra)
7 strains of Mycobacterium abscessus subsp. bolletii (164 mass spectra)
Each strain was identified by a molecular method (DNA/DNA hybridization using GenoType NTM-DR (Hain Lifescience, Nehren, Germany) according to the manufacturer's instructions) and analyzed by MALDI-TOF mass spectrometry according to the MycoEx protocol (Bruker®). The mass spectra were obtained as follows:
Each of the 41 strains was cultured in an aerobic atmosphere at 37 °C for 7 ± 2 days on blood agar (COH, bioMerieux®). One colony was then extracted according to the MycoEx protocol (Bruker®). For each extract, 8 technical replicates were prepared and analyzed by MALDI-TOF MS (Bruker®). Dried spots were overlaid with 1 µL of MALDI matrix (α-HCCA).
Data acquisition was performed using a Microflex LT (Bruker® Daltonics) mass spectrometer equipped with an N2 laser (λ = 337 nm). The instrument parameters were: mass range 200-20,000 Da; ion source 1: 20 kV; ion source 2: 18.5 kV; lens: 8.45 kV; pulsed ion extraction: 330 ns; laser frequency: 20.0 Hz. Each spectrum was acquired with 500 laser shots, and each spot was analyzed three times, giving 24 spectra per extraction (8 replicates × 3 acquisitions).
Spectra acquired for each isolate were visualized and analyzed using flexAnalysis software (Bruker® Daltonics), and spectra with low-quality peaks were removed. A minimum of 15 spectra per extraction was required to validate the extraction.
This database is only intended for medical research. Please contact: medecine-drv@sorbonne-universite.fr for data access.
After an access agreement, the following three files will be available:
The MABSC_spectra.zip file contains the MS peak list data in a Matlab compatible format.
The MABSC_metadata.pdf file contains the molecular identifications of strains.
The MABSC_notes.txt file contains information on the method used to obtain the data.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
At least two independent parameters are necessary for compound identification in metabolomics. We have compiled 2 212 electron impact mass spectra and retention indices for quadrupole and time-of-flight gas chromatography/mass spectrometry (GC/MS) for over 1 000 primary metabolites below 550 Da, covering lipids, amino acids, fatty acids, amines, alcohols, sugars, amino sugars, sugar alcohols, sugar acids, organic phosphates, hydroxy acids, aromatics, purines, and sterols, recorded as methoximated and trimethylsilylated derivatives under electron impact ionization. Compounds were selected from different metabolic pathway databases. Based on chemical fingerprints and calculations in Instant-JChem, the structural diversity of the libraries was found to overlap highly with the metabolites represented in the BioMeta/KEGG pathway database. In total, the FiehnLib libraries comprised 68% more compounds and twice as many spectra, with higher spectral diversity, than the public Golm Metabolite Database. A range of unique compounds in the FiehnLib libraries are not contained in the 4 345 trimethylsilylated spectra of the commercial NIST05 mass spectral database. The libraries can be used with GC/MS software and also support compound identification in the public BinBase metabolomic database, which currently comprises 5 598 unique mass spectra generated from 19 032 samples covering 279 studies of 47 species (plants, animals, and microorganisms).
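The second identification parameter, the retention index, is commonly computed by linear interpolation between reference compounds that bracket the analyte. For an n-alkane series (van den Dool and Kratz), I = 100 × [n + (t_x - t_n) / (t_(n+1) - t_n)], where n is the carbon number of the alkane eluting just before the analyte. The text does not state which marker series FiehnLib uses, so the sketch below takes an arbitrary ladder of (retention time, index value) pairs; the function name `retention_index` is hypothetical.

```python
from bisect import bisect_right


def retention_index(rt, markers):
    """Linear retention index of a peak at retention time rt.

    markers: (retention_time, index_value) pairs for the reference compounds,
    sorted by strictly increasing retention time, e.g. n-alkanes with
    index_value = 100 * carbon number.
    """
    if len(markers) < 2:
        raise ValueError("need at least two markers")
    times = [t for t, _ in markers]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("rt lies outside the marker range; extrapolation is not supported")
    # Index of the first marker eluting after rt (clamped so rt == last marker works).
    i = min(bisect_right(times, rt), len(times) - 1)
    (t0, ri0), (t1, ri1) = markers[i - 1], markers[i]
    return ri0 + (ri1 - ri0) * (rt - t0) / (t1 - t0)
```

With alkane markers, index_value = 100 × n reproduces the formula above exactly; with other marker series the same interpolation applies, only the index values of the markers change.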
The Golm Metabolome Database (GMD) provides public access to custom mass spectral libraries, metabolite profiling experiments, and additional information and tools. Analytes are separated by gas chromatography coupled to mass spectrometry, which records the mass spectrum and the retention time of each analyte. This collection references GC-MS spectra.