This data set contains the spectral data associated with the collection of EC-SERS spectra using mainly a nontargeted drug identification approach, with several samples using a targeted fentanyl identification approach. The data set contains the replicate measurements and averaged Raman spectra used in the characterization of the analytes (drugs of abuse and adulterant compounds) to allow for forensic library formation. The data set also contains spectra of analytes collected at varying concentrations and additional fentanyl analog data collected using a targeted method.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raman spectroscopy is a rapid, non-invasive, and non-destructive method, featuring high chemical specificity for different biological materials and low sensitivity to water. This makes it well suited to natural medicines, as it offers a relatively objective and comprehensive characterization of their complex material basis. Raman spectroscopy therefore plays a crucial role in the identification of medicinal properties, authentication, and quality control of TCMs. At present, TCMRSD stands as the only downloadable, comprehensive Raman spectral database for TCMs, encompassing spectra of 327 Chinese medicines collected through rigorous methodological validation. The TCMs were selected for the database on the basis of diversity of medicines, medicinal importance and variety of medicinal properties, in order to guarantee a comprehensive and representative range of substances used in Chinese medicine.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Establishment of the nanoplastic dataset. The spectra included in the nanoplastic database were obtained directly from the plastic samples. To establish the internal Raman spectral dataset, a total of 1,000 individual nanoparticles were examined, encompassing five common plastic contaminants, namely polyethylene (PE), polytetrafluoroethylene (PTFE), polystyrene (PS), polymethyl methacrylate (PMMA) and polyvinyl chloride (PVC). For each plastic category, 200 nanoparticles were selected for subsequent analysis.
Content: In each txt file corresponding to a Raman spectrum, the first two columns hold the X and Y coordinates, respectively: X is the wavenumber and Y is the Raman signal intensity.
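The two-column layout described above can be read with a few lines of Python. This is only a minimal sketch: the function names parse_spectrum and load_spectrum are hypothetical, and the delimiter (whitespace or comma) and the possible presence of header lines are assumptions, since the description does not state them.

```python
from typing import List, Tuple


def parse_spectrum(text: str) -> Tuple[List[float], List[float]]:
    """Parse two-column text into (wavenumbers, intensities).

    Assumption: columns are separated by whitespace or commas.
    Lines that do not start with two numbers are skipped.
    """
    wavenumbers: List[float] = []
    intensities: List[float] = []
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            continue  # blank or incomplete line
        try:
            x, y = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # header or other non-numeric line
        wavenumbers.append(x)
        intensities.append(y)
    return wavenumbers, intensities


def load_spectrum(path: str) -> Tuple[List[float], List[float]]:
    """Read one spectrum file from disk and parse it."""
    with open(path, "r", encoding="utf-8") as handle:
        return parse_spectrum(handle.read())


if __name__ == "__main__":
    # Synthetic example values, not taken from the dataset.
    sample = "400.0\t12.5\n401.0\t13.1\n402.0\t12.9\n"
    x, y = parse_spectrum(sample)
    print(len(x), "points from", x[0], "to", x[-1])
```

Any columns beyond the first two are ignored here, which matches the description that only the first two columns carry the X and Y values.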
More data are available upon request for research purposes only. Please send an email to zhanglw@fudan.edu.cn with a brief description of the purpose of use and your request for more data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary LF-785 and FT-Raman data of all the excipient samples included in the database. More information available in the following paper: https://doi.org/10.1016/j.vibspec.2020.103021
This portion of the data release presents Raman spectroscopy of rock samples collected from Von Damm vent field, Mid Cayman Rise, in the Caribbean Sea. These data were collected in 2020 (USGS Field Activity 2020-602-FA). Location information for the sample is included in each Attribute Definition of this metadata file.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Please note that there is no peer-reviewed publication associated with this data record.
This fileset consists of 13 data files, 1 code file and 2 ReadMe files.
The dataset data.mat is in .mat file format and is therefore not openly accessible. The following datasets, all in .txt file format, are an openly accessible version of the .mat file: Fig2_1.txt, Fig2_2.txt, Fig2_3.txt, Fig2_4.txt, Fig2_5.txt, Fig2_6.txt, raw_COVID.txt, raw_Helthy.txt, raw_Suspected.txt, raw_Tube.txt, table2_data.txt and wave_number.txt.
The code file is code.m, in .m file format. The two ReadMe files are readme.txt, in .txt file format, and readme.m, in .m file format.
Data in Fig2_1.txt, Fig2_2.txt, Fig2_3.txt, Fig2_4.txt, Fig2_5.txt and Fig2_6.txt were used to plot Figure 2 in the related manuscript.
raw_COVID.txt contains the raw Raman spectroscopy data from the serum samples obtained from the 53 confirmed COVID-19 patients.
raw_Helthy.txt contains the raw Raman spectroscopy data from the serum samples obtained from healthy individuals.
raw_Suspected.txt contains the raw Raman spectroscopy data from the serum samples obtained from suspected cases (individuals suspected of COVID-19 infection).
raw_Tube.txt contains the raw spectral data from cryopreservation tubes with saline solution inside.
wave_number.txt contains the Raman shift values of the spectra.
table2_data.txt was used to generate Table 2 in the related manuscript.
The code code.m was used for data processing.
Software needed to access data: data.mat can only be accessed using the Matlab software. Running the code code.m also requires Matlab.
Study aims and methodology: The recommended diagnosis method for the coronavirus disease (COVID-19) is a qPCR-based technique; however, it is a time-consuming, expensive, and sample-dependent procedure with a relatively high false-negative rate. The aim of this study was to develop a widely available, cheap and quick method to diagnose COVID-19 based on Raman spectroscopy.
A total of 157 serum samples were collected from 53 confirmed patients, 54 suspected cases (fever but not COVID-19) and 50 healthy controls. Raman spectroscopy was used to analyse these samples, and the support vector machine (SVM) machine learning method was applied to the spectral dataset to build a diagnostic algorithm.
The experimental setup consisted of a Volume Phase Holographic (VPH) spectrograph, a deep-cooled CCD camera, and a Raman probe and laser. A total of 2355 spectra from 157 individuals were imported into MATLAB (R2013a) software (MathWorks, Inc.).
For more details on the methodology, please read the related article.
The development of uniform, consistent spectroscopic databases of Raman spectra is important for the community to maximize the value of emerging machine learning techniques. This dataset contains processed and augmented Raman spectra acquired on a variety of common plastics, with variations in manufacturer and in properties such as plastic color. The Raman spectra span the frequency window from 300 to 3900 cm-1, were collected using variations in instrumentation settings, were interpolated to 1 cm-1 wavenumber spacing to ensure compatibility, and were augmented 5X by random scaling and artificial noise introduction. Three different versions of the data are provided, each enabling exploration of a different strategy for training machine learning classification models. These data were used to train microplastic classification models using the k-nearest neighbors algorithm of the sklearn package in Python, as published in the associated manuscript. Python pickle files are included in the dataset; they contain the optimized models and supporting information for the models. The data are being posted in support of this research. The data were created by the authors.
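The interpolation and augmentation steps mentioned above might look roughly like the following sketch. It is not the authors' pipeline: linear interpolation, the scaling range, the noise level, and the reading of "5X" as five augmented copies per spectrum are all assumptions, and the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def interpolate_to_grid(
    x: Sequence[float], y: Sequence[float],
    start: float, stop: float, step: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Linearly interpolate (x, y) onto a uniform grid.

    Assumes x is strictly increasing. Grid points outside the
    measured range take the nearest edge value.
    """
    count = int(round((stop - start) / step)) + 1
    grid: List[float] = []
    values: List[float] = []
    j = 0
    for i in range(count):
        g = start + i * step
        if g <= x[0]:
            v = y[0]
        elif g >= x[-1]:
            v = y[-1]
        else:
            while x[j + 1] < g:  # advance to the bracketing segment
                j += 1
            t = (g - x[j]) / (x[j + 1] - x[j])
            v = y[j] + t * (y[j + 1] - y[j])
        grid.append(g)
        values.append(v)
    return grid, values


def augment(
    y: Sequence[float], copies: int = 5,
    scale_range: Tuple[float, float] = (0.8, 1.2),  # assumed range
    noise_fraction: float = 0.01,                   # assumed noise level
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Return `copies` randomly scaled, noise-perturbed versions of y."""
    rng = random.Random(seed)
    peak = max(abs(v) for v in y) or 1.0
    out: List[List[float]] = []
    for _ in range(copies):
        scale = rng.uniform(*scale_range)
        out.append([scale * v + rng.gauss(0.0, noise_fraction * peak) for v in y])
    return out


if __name__ == "__main__":
    grid, values = interpolate_to_grid([300.0, 302.0, 304.0], [0.0, 2.0, 0.0], 300.0, 304.0)
    print(values)
    print(len(augment(values, seed=0)), "augmented copies")
```

Putting every spectrum on the same 1 cm-1 grid means all spectra share one feature vector layout, which is what a distance-based classifier such as k-nearest neighbors needs.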
Included in this data release are eight files: one metadata file, 6 comma separated value (.csv) datafiles and one data dictionary file defining the entities and attributes that constitute this dataset. The 6 datafiles support the publication, Deep syntectonic burial of the Anthracite belt, Eastern Pennsylvania, by providing in csv format the data from Figure 11 – Representative Raman spectra for selected individual CH4 ± CO2 fluid inclusions, Figure 13 – Representative Raman spectra pairs for selected individual High-ThA fluid inclusions, Figure 15 – Representative Raman spectra pairs for selected individual Low-ThA fluid inclusions, Table 1 – Comparison of composition by Raman and microthermometry for Single-Phase inclusions, Table 2 – Comparison of inclusion density by Raman and microthermometry for single-phase Inclusions, and Table 3 – Two-phase inclusion vapor bubble composition determined by Raman spectroscopy. These files contain Raman data from fluid inclusions contained in rocks from the Anthracite belt region, Pennsylvania.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided dataset includes information on edible oil samples collected from grocery stores in and around Newark, Delaware, between the summer of 2014 and the spring of 2016. A total of 100 oil bottles were obtained, and three different data sets were created.
Data Set 1 contains measurements from all 100 samples and was obtained using NIR, MIR, and Raman spectroscopic techniques. The peroxide values (PVs) of the samples were determined through titration at Lawrence Livermore National Laboratory. Data Set 1 is divided into two subgroups: Data Set 1A, measured in 2016, and Data Set 1B, measured in 2019.
Data Set 2 is a subset of Data Set 1, consisting of 53 oil samples. These samples were measured using Raman spectroscopy and titrated to determine the PV at the University of Delaware.
Data Set 3 is another subset of Data Set 1, comprising 356 IR spectra of 20 varieties of pure edible oils as well as 120 spectra of extra virgin olive oil (EVOO) adulterated with corn oil, canola oil or almond oil. These samples were measured using ATR-FTIR spectroscopy at 4 cm^-1 resolution at Oklahoma State University.
The measurement techniques and parameters varied for each data set. NIR spectra were acquired using FTIR spectrometers with different optical path lengths, MIR spectra were collected using a liquid nitrogen-cooled mercury cadmium telluride (MCT) detector, and Raman spectra were obtained with different Raman probes and lasers. The spectroscopic measurements were complemented with titration measurements to determine the PVs.
The dataset is provided as individual csv files for each type of spectroscopy, with the first two columns capturing class and corresponding peroxide value for the spectrum and the top row capturing the wavelength range of the spectra.
Note: There are a few instances where replicates were not taken or certain samples were replaced with NaN variables to maintain the proper matrix dimensions.
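The csv layout described above (class and peroxide value in the first two columns, spectral axis in the top row, NaN placeholders for missing data) could be parsed along these lines. This is a sketch under assumptions: the header cell names, the exact NaN spelling, and the function names are not given in the description and are hypothetical here.

```python
import csv
import io
import math
from typing import Dict, List, Tuple


def to_float(cell: str) -> float:
    """Convert a csv cell to float, mapping empty or invalid cells to NaN."""
    cell = cell.strip()
    if not cell:
        return math.nan
    try:
        return float(cell)  # float("NaN") also yields NaN
    except ValueError:
        return math.nan


def parse_oil_csv(text: str) -> Tuple[List[float], List[Dict]]:
    """Split csv text into the spectral axis and per-sample records."""
    rows = list(csv.reader(io.StringIO(text)))
    axis = [to_float(c) for c in rows[0][2:]]  # top row, after the two label columns
    records = []
    for row in rows[1:]:
        if not row:
            continue
        records.append({
            "class": row[0],
            "pv": to_float(row[1]),
            "spectrum": [to_float(c) for c in row[2:]],
        })
    return axis, records


def complete_records(records: List[Dict]) -> List[Dict]:
    """Keep only records with no NaN in the peroxide value or spectrum."""
    return [
        r for r in records
        if not math.isnan(r["pv"]) and not any(math.isnan(v) for v in r["spectrum"])
    ]


if __name__ == "__main__":
    # Synthetic example rows, not taken from the dataset.
    demo = "class,PV,1000,1001\nolive,5.2,0.1,0.2\ncorn,NaN,0.3,NaN\n"
    axis, records = parse_oil_csv(demo)
    print(axis, len(records), len(complete_records(records)))
```

Filtering with complete_records before model fitting is one simple way to handle the NaN placeholders; imputing values would be an alternative, depending on the analysis.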
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is presented in the publication "Hyperspectral image analysis for CARS, SRS, and Raman data", J. Raman Spectroscopy (2015): (http://dx.doi.org/10.1002/jrs.4729). It contains coherent anti-Stokes Raman scattering hyperspectral images and their analysis in terms of concentrations of chemical components and their spectra, using the hyperspectral image analysis (HIA) software that we developed. Additional data is shown to exemplify the ability of HIA to filter motion artefacts.
Located on the Tibetan Plateau, “the roof of the world,” the western Qaidam Basin is a cold, dry, and irradiative environment with landforms (e.g., dunes, yardangs, playas, wind streaks, polygonal terrains, and gullies) that are commonly found on Mars. A clastic quartz stone was sampled from a Cenozoic gravel deposit (38°35′44″ N, 90°59′6″ E, 3245.17 m altitude) in the hyperarid Dalangtan Playa, western Qaidam Basin, on 29 July 2021. The Cenozoic gravel deposit was likely derived from the weathering of Mesozoic (Pre-Jurassic and Jurassic) rocks, and quartz stones were common in the deposit. A light greenish color was visible at the bottom of the quartz stone. Multiple spots along four vertical lines (11 spots for line 1 Qz-l1, 6 spots for line 2 Qz-l2, 9 spots for line 3 Qz-l3, and 8 spots for line 4 Qz-l4) of the Qaidam quartz stone were selected to stereoscopically investigate the spatial distribution of Raman spectra-based mineralogical or organic/biotic signals. An a...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data contain Raman spectra, calibration and data analysis results for all experiments conducted in the manuscript.
Figure 3 Raman spectra.xlsx: Provides the source data for Figure 3.
https://www.gnu.org/licenses/gpl-3.0-standalone.html
Spectra of three over-the-counter pharmaceuticals—acetylsalicylic acid, paracetamol, and ibuprofen—were collected at the Faculty of Exact and Natural Sciences of the National University of Asuncion, with the aim of creating a dataset that serves as a reference for Raman responses from different drug manufacturers. This dataset will also provide the scientific community with data that can be used for multivariate analysis and model training.
In the data collection phase, spectra were obtained using a Raman spectroscopy system (iRaman 785s model from BWTEK) equipped with a 785 nm excitation laser. Samples were collected from diverse sales points such as pharmacies, shopping centers, and street vendors. Each spectrum was captured at 50% laser power with a measurement time of 1 second and an accumulation of 10 spectra over a range of 150 to 3200 cm-1. This method preserved the integrity of the raw data, which includes a common column for Raman shifts and additional columns for intensities and labels, detailing the activation modes in the Raman spectrum.
The data is structured into specific xlsx files for each drug, such as "Paracetamol.xlsx", "acetylsalicylic-acid .xlsx", and "Ibuprofen .xlsx", each containing 50 spectra categorized by the type of pharmaceutical but not by brand. Brand-specific categorization is detailed in separate files like "Paracetamol-trademark .xlsx", where samples are classified using codes such as "Par-A" for different brands. This organization aids the scientific community in using clustering methods to analyze the spectral data and differentiate pharmaceutical brands based on their excipients or binders, with consistent codes across different drugs suggesting common manufacturers for various medications.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
X-ray diffraction (XRD) pattern of the iron substrate sample after electrodeposition.
X-ray photoelectron spectroscopy (XPS) of the synthetic mackinawite deposit for (a) Fe 2p spectrum and (b) S 2p spectrum, confirming the deposited sulfide is mackinawite and not stoichiometric pyrite.
Raman spectrum of iron pyrite nano-particles synthesized by the hot injection method at 200 deg C.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The RamanLab database system consists of two primary pickle files that store comprehensive mineral spectroscopic data. The main database file RamanLab_Database_20250602.pkl contains the core spectral library with reference Raman spectra for hundreds of minerals, including their wavenumber positions, relative intensities, and associated metadata such as chemical formulas, crystal systems, and space groups. This database serves as the primary reference for the correlation-based search and match functionality, enabling identification of unknown minerals through spectral comparison algorithms. The database is structured to support both individual mineral identification and complex mixed-mineral analysis workflows.
The complementary mineral_modes.pkl file focuses specifically on vibrational mode assignments and implements the complete Hey-Celestian classification system with all 15 mineral groups, including Sheet Silicates, Simple Oxides, Octahedral Framework minerals, various Silicate chains (Single and Double), Ring Silicates, Complex Oxides, Hydroxides, and Mixed Modes. This database provides detailed vibrational mode information for each mineral, including fundamental frequencies, overtones, combination bands, and their structural origins. The classification system includes chemical constraints and scoring mechanisms that provide 2.0x boosts when sample chemistry matches expected mineral compositions, enabling more accurate phase identification in complex samples. Together, these databases form an integrated system that supports both spectral matching and crystallographic interpretation of Raman spectroscopic data.
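To illustrate the idea of correlation-based matching with a chemistry-dependent boost, here is a minimal sketch. It does not reproduce RamanLab's code or file layout: the library structure, the use of Pearson correlation, the element-subset test, and the names pearson and rank_matches are all assumptions made for illustration. Only the 2.0x factor comes from the description.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally sampled spectra."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("spectra must share the same grid with at least 2 points")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum carries no shape information
    return cov / math.sqrt(var_a * var_b)


def rank_matches(
    query: Sequence[float],
    library: Dict[str, Dict],
    sample_elements: Optional[Set[str]] = None,
    boost: float = 2.0,
) -> List[Tuple[str, float]]:
    """Rank library entries by correlation with the query spectrum.

    Hypothetical rule: a positive score is multiplied by `boost` when
    every element of the reference mineral is present in the sample.
    """
    results = []
    for name, entry in library.items():
        score = pearson(query, entry["intensities"])
        if sample_elements is not None and score > 0 and entry["elements"] <= sample_elements:
            score *= boost
        results.append((name, score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    # Toy four-point spectra, not real reference data.
    library = {
        "quartz": {"intensities": [0.0, 1.0, 0.0, 0.0], "elements": {"Si", "O"}},
        "calcite": {"intensities": [0.0, 0.0, 1.0, 0.0], "elements": {"Ca", "C", "O"}},
    }
    print(rank_matches([0.0, 0.9, 0.1, 0.0], library, {"Si", "O"}))
```

All spectra are assumed to be resampled onto a common wavenumber grid before comparison; otherwise the point-by-point correlation is not meaningful.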
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Raman spectral dataset comprising 3,510 spectra from 32 chemical substances. This dataset includes organic solvents and reagents commonly used in API development, along with information regarding the products in the XLSX, and code to visualise and perform technical validation on the data.
The minerals used in this study were supplied by the Australian Museum (ASM). The minerals have been characterized by both X-ray diffraction (XRD) and by chemical analysis using ICP-AES (inductively coupled plasma atomic emission spectroscopy) techniques.
The following samples were used: (a) sample ASM-D49056 boléite from the Amelia Mine, Santa Rosalia, Baja California, Mexico; (b) sample ASM-D 27575 cumengéite, Boleo, Baja California, Mexico; (c) sample ASM D36845 diaboléite from Mammoth mine, Tiger, Arizona, USA; and (d) sample ASM D191881 phosgenite from Consols mine, Broken Hill, South Australia.
Crystals of the minerals were placed and orientated on a polished metal surface on the stage of an Olympus BHSM microscope, which is equipped with 10 × and 50 × objectives. The microscope is part of a Renishaw 1000 Raman microscope system, which also includes a monochromator, a filter system and a Charge Coupled Device (CCD). Raman spectra were excited by a Spectra-Physics model 127 He-Ne laser (633 nm) at a resolution of 2 cm−1 in the range between 100 and 4000 cm−1. Repeated acquisition using the highest magnification was accumulated to improve the signal to noise ratio in the spectra. Spectra were calibrated using the 520.5 cm−1 line of a silicon wafer.
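The silicon calibration step can be pictured as a simple axis correction. The sketch below assumes a single constant offset, taken as the difference between the 520.5 cm−1 reference and the strongest point measured on the silicon wafer within a search window. The description does not say how the calibration was actually applied, so this procedure, the window limits, and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

SILICON_REFERENCE = 520.5  # cm^-1, silicon line named in the text


def find_peak_position(
    x: Sequence[float], y: Sequence[float], low: float, high: float
) -> float:
    """Return the x value of the most intense point inside [low, high]."""
    best = None
    for xi, yi in zip(x, y):
        if low <= xi <= high and (best is None or yi > best[1]):
            best = (xi, yi)
    if best is None:
        raise ValueError("no data points inside the search window")
    return best[0]


def calibrate_axis(
    x: Sequence[float],
    silicon_x: Sequence[float],
    silicon_y: Sequence[float],
    window: Tuple[float, float] = (500.0, 540.0),  # assumed search window
) -> Tuple[List[float], float]:
    """Shift a wavenumber axis so the silicon peak lands on 520.5 cm^-1."""
    measured = find_peak_position(silicon_x, silicon_y, *window)
    offset = SILICON_REFERENCE - measured
    return [xi + offset for xi in x], offset


if __name__ == "__main__":
    # Synthetic silicon spectrum whose peak sits at 520.0 cm^-1.
    si_x = [518.0, 519.0, 520.0, 521.0, 522.0]
    si_y = [1.0, 3.0, 9.0, 4.0, 1.0]
    corrected, offset = calibrate_axis([100.0, 1000.0], si_x, si_y)
    print(offset, corrected)
```

Taking the maximum point limits the accuracy to the sampling interval; fitting a peak profile to the silicon line would give a sub-interval position if that precision is needed.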
Infrared (IR) spectra were obtained using a Nicolet Nexus 870 FTIR spectrometer with a smart endurance single bounce diamond ATR cell. Spectra over the 4000 to 525 cm−1 range were obtained by the co-addition of 64 scans with a resolution of 4 cm−1 and a mirror velocity of 0.6329 cm/s.
Spectral manipulation such as baseline adjustment, smoothing and normalization was performed using the Spectracalc software package GRAMS (Galactic Industries Corporation, New Hampshire, USA). Band component analysis was undertaken using the Jandel ‘Peakfit’ software package, which enables the type of fitting function to be selected and allows specific parameters to be fixed or varied accordingly. Band fitting was done using a Gauss-Lorentz cross-product function with the minimum number of component bands used for the fitting process. The Gauss-Lorentz ratio was maintained at values >0.7 and fitting was undertaken until reproducible results were obtained with squared correlations of r² >0.995.
Figure 1 shows Raman spectra of the hydroxyl-stretching region of (a) phosgenite, (b) boléite, (c) diaboléite and (d) cumengéite. Figure 2 shows band component analysis of the hydroxyl-stretching region of the Raman spectrum of (a) diaboléite and (b) cumengéite. Figure 3 shows Raman spectra of the 600–1000 cm−1 region of (a) boléite, (b) diaboléite and (c) cumengéite. Figure 4 shows Raman spectra of the carbonate region of phosgenite. Figure 5 shows Raman spectra of the 100–500 cm−1 region of (a) phosgenite, (b) boléite, (c) diaboléite and (d) cumengéite.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Mineral and Organic datasets for replicating the paper "Raman spectrum matching with contrastive representation learning".
For detailed instructions on how to use the datasets, please visit our GitHub repository.
To use these two datasets, please cite:
@inproceedings{Lafuente2016ThePO, title={The power of databases: The RRUFF project}, author={B. Lafuente and R. Downs and Hexiong Yang and N. Stone}, booktitle = {Highlights in Mineralogical Crystallography}, year={2016} }
@article{organic_dataset, author = {Zhang, Rui and Xie, Huimin and Cai, Shuning and Hu, Yong and Liu, Guo-kun and Hong, Wenjing and Tian, Zhong-qun}, title = {Transfer-learning-based Raman spectra identification}, journal = {Journal of Raman Spectroscopy}, volume = {51}, number = {1}, pages = {176-186}, keywords = {deep learning, Raman spectroscopy, transfer learning}, year = {2020} }
@Article{D2AN00403H, author ="Li, Bo and Schmidt, Mikkel N. and Alstrøm, Tommy S.", title ="Raman spectrum matching with contrastive representation learning", journal ="Analyst", year ="2022", volume ="147", issue ="10", pages ="2238-2246", publisher ="The Royal Society of Chemistry", doi ="https://doi.org/10.1039/d2an00403h", url ="http://dx.doi.org/10.1039/D2AN00403H", }