Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
the spectroscopic data obtained from a homemade NIR spectrometer developed for agricultural quality analysis, along with the calibration and validation of a model database for predicting agricultural soil properties. We collected NIR spectral data from 190 soil samples taken at a depth of 0-20 cm from agricultural areas in northern Thailand, including vegetable farms, orchards, and field crops. The acquisition process started by air-drying the soil and sieving it through 2.0 mm and 0.5 mm mesh. Six preprocessing techniques (Savitzky-Golay smoothing, multiplicative scatter correction (MSC), standard normal variate (SNV), first derivative, second derivative, and mean centering) were used with partial least squares (PLS) regression to create the prediction models for soil organic matter and total carbon. Seventy percent of the samples were assigned to the calibration set and the remaining thirty percent to the validation set. Our results demonstrate the effectiveness of these models. The NIR dataset spanning 900-1,700 nm proved to be a well-suited wavelength range for developing a portable/handheld NIR spectrometer, with potential for further accuracy improvements through model refinement.
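As an illustration of two of the steps described above, the following sketch applies standard normal variate (SNV) scaling to one spectrum and performs a 70/30 calibration/validation split of 190 sample IDs. It is a minimal sketch, not the authors' pipeline: the function names, the random seed, the use of a random shuffle for the split, and the example reflectance values are all assumptions made for illustration.

```python
import math
import random


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it by its own standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 in the denominator), a common choice for SNV.
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]


def split_calibration_validation(sample_ids, calibration_percent=70, seed=0):
    """Shuffle sample IDs and split them into calibration and validation lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    # Integer arithmetic avoids floating-point rounding surprises in the cut point.
    cut = len(ids) * calibration_percent // 100
    return ids[:cut], ids[cut:]


# Hypothetical reflectance values for one soil spectrum (not taken from the dataset).
example_spectrum = [0.21, 0.24, 0.30, 0.33, 0.35]
corrected = snv(example_spectrum)

# 190 samples, as in the description above.
calibration, validation = split_calibration_validation(range(190))
print(len(calibration), len(validation))  # 133 57
```

After SNV, each spectrum has zero mean and unit standard deviation, which removes per-sample offset and scaling differences before a regression model such as PLS is fitted.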
The ICRAF-ISRIC Soil MIR Spectral Library contains visible near infrared spectra of 4,438 soils selected from the Soil Information System (ISIS) of the International Soil Reference and Information Centre (ISRIC). The samples consist of all physically archived samples at ISRIC in 2004 for which soil attribute data was available. The spectra were measured at the World Agroforestry Center's (ICRAF) Soil and Plant Spectral Diagnostic Laboratory. The samples are from 58 countries spanning Africa, Asia, Europe, North America, and South America. Associated attribute data, such as geographical coordinates, horizon (depth), and physical and chemical properties, are provided in a single relational database. The purpose of the library is to provide a resource for research and applications for sensing soil quality both in the laboratory and from space.
https://spdx.org/licenses/etalab-2.0.html
The dataset is composed of chemical analysis results and the NIR spectra of 490 solid animal organic waste products that were collected in 2 campaigns conducted in 2018 and 2019. The sampling was designed to capture the large diversity of animal species (mainly cattle, pigs and poultry), types of farming and storage modes. Compositional parameters (dry matter, organic matter, total and ammonium nitrogen, phosphorus, potassium, calcium and magnesium contents) were analyzed according to French AFNOR standards. Samples were scanned using a Q-interline AgriQuant B8 equipped with a patented spiral sampler, which aggregates the heterogeneity of the sample. This dataset covers a wide range of variability in the composition of solid animal manure, and is of great interest to chemometricians and agronomists in search of references on the fertilizing value of these products.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Up-to-date information on soil properties and the ability to track changes in soil properties over time are critical for improving multiple decisions on soil security at various scales, ranging from global climate change modeling and policy to national level environmental and development planning, to farm and field level resource management. Diffuse reflectance infrared spectroscopy has become an indispensable laboratory tool for the rapid estimation of numerous soil properties to support various soil mapping, soil monitoring, and soil testing applications. Recent advances in hardware technology have enabled the development of handheld sensors with performance specifications similar to those of laboratory-grade near-infrared (NIR) spectrometers.
Here, we've compiled a hand-held NIR spectral library (1350-2550 nm) using the NeoSpectra Handheld NIR Analyzer developed by Si-Ware. Each scanner is fitted with Fourier-Transform technology based on the semiconductor Micro Electromechanical Systems (MEMS) manufacturing technique, promising accuracy and consistency between devices.
This library includes 2,106 distinct mineral soil samples scanned across 9 of these portable low-cost NIR spectrometers (indicated by serial number). 2,016 of these soil samples were selected to represent the diversity of mineral soils found in the United States, and 90 samples were selected across Ghana, Kenya, and Nigeria. 519 of the US samples were selected and scanned by Woodwell Climate Research Center. These samples were queried from the USDA NRCS NSSC-KSSL Soil Archives as having a complete set of eight measured properties (TC, OC, TN, CEC, pH, clay, sand, and silt). They were stratified based on the major horizon and taxonomic order, omitting the categories with fewer than 500 samples. Three percent of each stratum (i.e., a combination of major horizon and taxonomic order) was then randomly selected as the final subset retrieved from KSSL's physical soil archive as 2-mm sieved samples. The remaining 1,604 US samples were queried from the USDA NRCS NSSC-KSSL Soil Archives by the University of Nebraska - Lincoln to meet the following criteria: Lower depth <= 30 cm, pH range 4.0 to 9.5, Organic carbon <10%, Greater than lower detection limits, Actual physical samples available in the archive, Samples collected and analyzed from 2001 onwards, Samples having complete analyses for high-priority properties (Sand, Silt, Clay, CEC, Exchangeable Ca, Exchangeable Mg, Exchangeable K, Exchangeable Na, CaCO3, OC, TN), and MIR scanned.
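The stratified selection described above (strata defined by major horizon and taxonomic order, small strata omitted, three percent drawn at random from each remaining stratum) can be sketched as follows. This is a hedged illustration only: the function name, the tuple layout, the rounding rule for the three percent, and the example archive are assumptions, not details taken from the source.

```python
import random
from collections import defaultdict


def stratified_subset(samples, min_stratum_size=500, fraction=0.03, seed=0):
    """Pick a random fraction of each (horizon, order) stratum, skipping small strata.

    `samples` is an iterable of (sample_id, horizon, taxonomic_order) tuples.
    """
    strata = defaultdict(list)
    for sample_id, horizon, order in samples:
        strata[(horizon, order)].append(sample_id)

    rng = random.Random(seed)
    selected = []
    for key in sorted(strata):
        members = strata[key]
        if len(members) < min_stratum_size:
            continue  # strata with fewer than 500 samples are omitted
        # Rounding to the nearest integer is an assumption; the source does not say.
        k = round(len(members) * fraction)
        selected.extend(rng.sample(members, k))
    return selected


# Hypothetical archive: one stratum large enough to sample, one too small.
archive = [(f"A{i}", "A", "Mollisols") for i in range(1000)]
archive += [(f"B{i}", "B", "Oxisols") for i in range(200)]

subset = stratified_subset(archive)
print(len(subset))  # 30
```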
All samples were scanned dry after sieving to 2 mm. About 20 g of sample was added to a plastic weighing boat, where the NeoSpectra scanner was placed down to make direct contact with the soil surface. The scanner was gently moved across the surface of the sample as 6 replicate scans were taken. These replicates were then averaged so that there is one spectrum per sample per scanner in the resulting database.
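The replicate averaging described above reduces 6 scans to one spectrum per sample per scanner. A minimal sketch, assuming the scans are available as (sample ID, scanner serial, spectrum) tuples on a shared wavelength grid; the function name and the example values are hypothetical.

```python
from collections import defaultdict


def average_replicates(rows):
    """Average replicate scans into one spectrum per (sample_id, scanner) pair.

    `rows` is an iterable of (sample_id, scanner_serial, spectrum) tuples, where
    each spectrum is a list of reflectance values on a shared wavelength grid.
    """
    groups = defaultdict(list)
    for sample_id, scanner, spectrum in rows:
        groups[(sample_id, scanner)].append(spectrum)

    averaged = {}
    for key, spectra in groups.items():
        # zip(*spectra) walks the replicates wavelength by wavelength.
        averaged[key] = [sum(values) / len(values) for values in zip(*spectra)]
    return averaged


# Hypothetical data: 6 replicate scans of one sample on one scanner, 3 wavelengths each.
rows = [("S001", "scanner_A", [10.0 + r, 20.0 + r, 30.0 + r]) for r in range(6)]
result = average_replicates(rows)
print(result[("S001", "scanner_A")])  # [12.5, 22.5, 32.5]
```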
A subset of 1,976 US topsoil samples was used to create Cubist models for 8 soil properties including bulk density (BD, <2mm fraction, 1/3 Bar, units in grams per cubic centimeter), calcium carbonate (CaCO3, <2mm fraction, units in weight percent), clay content (percent), buffered ammonium-acetate exchangeable potassium (Ex. K, units in centimoles of charge per kilogram of soil), pH, sand content (percent), silt content (percent), and estimated organic carbon (SOC, estimated after inorganic carbon removal, units in weight percent). Two strategies were evaluated for handling scanner-to-scanner variability: averaging scans per sample (avg) versus retaining replicate scans across all scanners (reps) during model building. Cubist avg and Cubist reps models for the 8 soil properties outlined above are provided here in “.qs” file format and can be opened and worked with in the R programming language. The subset of 1,976 samples has also been provided here for reproducibility (1976_NSlibrary_withmetadata.csv).
The repository contains:
Neospectra_database_column_names.csv: describes the variables (columns) of site and soil data, and the range of near-infrared (NIR, 1350-2550 nm) and mid-infrared (MIR, 600-4000 cm-1) spectra. The CSV is composed of the file name, column name, type, example, and description with measurement unit.
Neospectra_WoodwellKSSL_MIR.csv: the equivalent MIR spectra of neospectra samples fetched from the KSSL database and formatted to the OSSL specifications.
Neospectra_WoodwellKSSL_soil+site+NIR.csv: soil, site, and Neospectra's NIR. Each row contains one replicate spectrum from a given scanner (6 repeats per scanner per soil sample). Soil and site info is repeated for all rows of the same soil sample.
1976_NSlibrary_withmetadata.csv: data matrix for reproducible model calibration.
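Because the NIR range is given in nanometres (1350-2550 nm) and the MIR range in wavenumbers (600-4000 cm-1), it can help to convert one unit into the other when comparing the two. The sketch below uses the standard relation wavenumber (cm-1) = 10^7 / wavelength (nm); the function names are chosen here for illustration.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    return 1e7 / wavelength_nm


def wavenumber_to_nm(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a wavelength in nanometres."""
    return 1e7 / wavenumber_cm


# NIR range of the library expressed as wavenumbers.
print(round(nm_to_wavenumber(1350), 1), round(nm_to_wavenumber(2550), 1))  # 7407.4 3921.6
# MIR range of the library expressed as wavelengths.
print(wavenumber_to_nm(4000), round(wavenumber_to_nm(600), 1))  # 2500.0 16666.7
```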
Models:
log..bd_model_nir.neospectra_cubist_AVG_ossl_na_v1.2.qs: Cubist average NIR model for log(1+BD).
log..caco3_model_nir.neospectra_cubist_AVG_ossl_na_v1.2.qs: Cubist average NIR model for log(1+CaCO3).
clay_model_nir.neospectra_cubist_AVG_ossl_na_v1.2.qs: Cubist average NIR model for clay.
log..k.ex_model_nir.neospectra_cubist_AVG_ossl_na_v1.2.qs: Cubist average NIR model for log(1+Ex. K).
ph.h2o_model_nir.neospectra_cubist_AVG_ossl_na_v1.2.qs: Cubist average NIR model for pH.
sand_model_nir.neospectra_cubist_AVG_ossl_na_v1.2.qs: Cubist average NIR model for sand.
silt_model_nir.neospectra_cubist_AVG_ossl_na_v1.2.qs: Cubist average NIR model for silt.
log..soc_model_nir.neospectra_cubist_AVG_ossl_na_v1.2.qs: Cubist average NIR model for log(1+SOC).
log..bd_model_nir.neospectra_cubist_REPS_ossl_na_v1.2.qs: Cubist replicates NIR model for log(1+BD).
log..caco3_model_nir.neospectra_cubist_REPS_ossl_na_v1.2.qs: Cubist replicates NIR model for log(1+CaCO3).
clay_model_nir.neospectra_cubist_REPS_ossl_na_v1.2.qs: Cubist replicates NIR model for clay.
log..k.ex_model_nir.neospectra_cubist_REPS_ossl_na_v1.2.qs: Cubist replicates NIR model for log(1+Ex. K).
ph.h2o_model_nir.neospectra_cubist_REPS_ossl_na_v1.2.qs: Cubist replicates NIR model for pH.
sand_model_nir.neospectra_cubist_REPS_ossl_na_v1.2.qs: Cubist replicates NIR model for sand.
silt_model_nir.neospectra_cubist_REPS_ossl_na_v1.2.qs: Cubist replicates NIR model for silt.
log..soc_model_nir.neospectra_cubist_REPS_ossl_na_v1.2.qs: Cubist replicates NIR model for log(1+SOC).
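Several of the models listed above predict log(1+x) rather than the property itself, so predictions need to be back-transformed before use. The sketch below reads the target, the strategy (AVG or REPS) and the log flag from a file name and undoes the transform. It is an illustration under assumptions: the parsing rules are inferred from the file names in this listing, the logarithm is assumed to be the natural log, and the helper names are hypothetical. Loading the ".qs" files themselves is done in R and is not shown.

```python
import math


def parse_model_filename(filename):
    """Split a Cubist model file name into target, strategy and log-transform flag."""
    stem = filename[: -len(".qs")] if filename.endswith(".qs") else filename
    target_part, rest = stem.split("_model_", 1)
    log_transformed = target_part.startswith("log..")
    target = target_part[len("log.."):] if log_transformed else target_part
    strategy = "AVG" if "_AVG_" in rest else "REPS" if "_REPS_" in rest else None
    return {"target": target, "strategy": strategy, "log_transformed": log_transformed}


def back_transform(prediction, log_transformed):
    """Undo log(1 + x) when the model was trained on a log-transformed target."""
    # Assumes the natural logarithm; expm1(p) computes exp(p) - 1 accurately.
    return math.expm1(prediction) if log_transformed else prediction


info = parse_model_filename("log..soc_model_nir.neospectra_cubist_AVG_ossl_na_v1.2.qs")
print(info)  # {'target': 'soc', 'strategy': 'AVG', 'log_transformed': True}

# Hypothetical model output on the log(1 + SOC) scale.
soc_percent = back_transform(math.log1p(2.0), info["log_transformed"])
print(round(soc_percent, 6))  # 2.0
```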
The beer dataset contains 60 samples published by Norgaard et al. The spectra were recorded with a 30 mm quartz cell on the undiluted, degassed beer and measured from 1100 to 2250 nm (576 data points) in steps of 2 nm.
A good playing ground for regression methods starting from spectral intensities.
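The wavelength axis stated for the beer dataset can be rebuilt from its start, stop and step, which also confirms the count of 576 data points. A minimal sketch; the function name is hypothetical.

```python
def wavelength_grid(start_nm, stop_nm, step_nm):
    """Return the inclusive list of wavelengths from start to stop at a fixed step."""
    return list(range(start_nm, stop_nm + 1, step_nm))


grid = wavelength_grid(1100, 2250, 2)
print(len(grid), grid[0], grid[-1])  # 576 1100 2250
```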
References Norgaard, L., Saudland, A., Wagner, J., Nielsen, J. P., Munck, L., & Engelsen, S. B. (2000). Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413–419.
Adapted from an R dataset available as part of the OHPL package (https://search.r-project.org/CRAN/refmans/OHPL/html/00Index.html).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets contain Near-infrared (NIR) absorbance spectra of the wavelength range 309-149 nm of mango mesocarp with corresponding Dry Matter Content (DMC) values.
The file "MangoDMC_NIR_Data_v3.csv" contains data as used in the publication "Achieving robustness across season, location and cultivar for a NIRS model for intact mango fruit dry matter content" (Postharvest Biology and Technology, 2020, 168:111202; https://www.sciencedirect.com/science/article/pii/S0925521420301629), with addition of data from an additional harvest season, as used in the publication "Evaluation of 1D Convolutional Neural Network in Estimation of Mango Dry Matter Content" (Spectrochimica Acta Part A 2024 311: 124003; https://www.sciencedirect.com/science/article/pii/S1386142524001690). This file is as presented in version 3 of this data repository.
The current version (4) has an additional file ".csv". This file augments the data of version 3 with data from additional instruments and seasons as used in the submitted thesis of Jeremy Walsh, 2024, Central Queensland University, "Deep Learning in Estimation of Fruit Attributes Using Near Infrared Spectroscopy".
Near-infrared (NIR) calibration models are created by applying multivariate calibration methods to the combination of wet chemistry data and NIR spectra of a given set of biomass samples. Wet chemical compositional data and NIR spectra exist for the following types of biomass samples: corn stover, switchgrass, mixed hardwoods, mixed softwoods, sorghum, and miscanthus. These samples may be feedstock samples, washed and dried solids from one or more pretreatment processes, liquors derived from one or more pretreatment processes, or whole pretreated slurries.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dry matter content of mango fruit is an important metric for determining harvest maturity and ensuring the eating quality of the ripened fruit. Near infrared spectroscopy can be used as a non-invasive method of estimating attributes of individual fruit, including dry matter content. The technique relies on statistical models (‘chemometrics’) to deduce information on sample attributes from spectra collected from the fruit. Barriers to the adoption of this technique for practical use in the fruit industry include the robustness of models across fruit from different growing conditions and spectra collected on different instruments. The proposed research is intended to reduce these barriers for the assessment of mango dry matter content by exploring new techniques for developing robust, global models across season, growing conditions, fruit variety, individual instruments and other variations. This would allow new instruments to be used ‘out of the box’ without the need for local calibration, hence greatly reducing the cost of uptake. Deep learning modelling techniques have been recently applied to spectroscopic applications, with claims of improved performance over the standard chemometric method, Partial Least Squares Regression, although these studies have typically involved relatively small datasets with limited testing on new populations of data. With access to an extended dataset of over 80,000 spectra from over 500 fruit populations, the aim of this proposed study is to validate previous publication claims that the use of a Convolutional Neural Network (CNN) model, a deep learning technique, is superior to existing methods in NIRS-based prediction of mango dry matter content. The study also aims to optimise the operation and the architecture of the CNN model over that employed in previous publications, in the context of the mango dry matter data set and use on a portable instrument.
https://www.nist.gov/open/license
This repository includes example code and data to correlate near-infrared spectroscopy (NIR) measurements with physical measurements for pure polyolefin samples. Specifically, the provided Jupyter Notebook focuses on NIR correlations with density, crystallinity, and short chain branching. For additional information on using the code, please refer to the README.md, and for more information on the provided data, please refer to the README_data.md. Both files are included here and with the GitHub code repository. For further information on methodology and interpretation, please refer to Bradley P. Sutliff, Shailja Goyal, Tyler B. Martin, Peter A. Beaucage, Debra J. Audus, and Sara V. Orski, "Correlations of Near-Infrared Spectra to Bulk Properties in Polyolefins, using Principal Component Analysis" Macromolecules 2024, 57, 5, 2329-2338, DOI: https://doi.org/10.1021/acs.macromol.3c02290
https://spdx.org/licenses/etalab-2.0.html
This dataset presents near infrared spectra of soil samples from the experimental INRAE stations of the CAREX network, including the Auzeville, Epoisses, Crouel, Theix, Lusignan, Lusignan_Oasys and Ploudaniel sites (n=1040). Spectral data were acquired using a BUCHI near infrared spectrometer at the Laboratoire d'Analyses des sols (LAS), Arras. The granulometric fraction and chemical property measurements are available with their uncertainties. The tables of NIR spectra, chemical analysis and granulometry of soils from the Isère (n=28) and Plaine_de_Versailles (n=99) locations were added. The details of the transformed NIR spectra table of Plaine_de_Versailles are available at https://doi.org/10.15454/LXKFAS.
This data record contains a CSV file with spectral reflectance (420-2114 nm) for sediment samples collected from each of four source locations (cropland, stream bank, glaciolacustrine, and street dust) located across sites in the Northwestern US during prior studies. Data were collected in a laboratory setting using Spectrecology USB4000-VIS-NIR and NIRQuest512-2.2 spectrometers. The data contain spectra collected from sieved, non-sieved, dry and suspended samples from each sample source, as well as spectra for calibration standards. The fields contained in the spectral properties data table are described in a data dictionary. Data on the physical properties of samples used in this study have been collected and are summarized in this data release.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains two core asset types: Data Files and Model Files.
1. Data Files
The dataset is provided in two separate .xlsx files:
Raw-nir-spectra-data: This file contains the raw near-infrared spectral dataset. It records the spectral information for all 347 tobacco samples and includes metadata such as each sample's unique ID, cultivation year, and country of origin.
13-Chemical-Components-data: This file contains the reference dataset for the chemical constituents. It includes the quantitative analysis results for the 13 key chemical components for all 347 samples, corresponding one-to-one with the spectral data.
2. Model Files
The database provides 99 pre-trained prediction and classification models in .joblib format. All models were built in a Python 3.9 environment and can be loaded and called directly. To facilitate easy identification and use, the model files adhere to the following naming conventions:
A. Quantitative Models (Chemical Prediction)
This naming format is used for the quantitative prediction models of the 13 chemical constituents.
Format: [Chemical_Component]_[Preprocessing_Method]_[Modeling_Method].joblib
Example: TotalSugars_MSC_PLS.joblib represents a PLS model for predicting Total Sugars using MSC preprocessing.
B. Classification Models (Origin Prediction)
This naming format is used for classification models built with different types of input data.
Format (based on spectral data): [Preprocessing_Method]_[Modeling_Method].joblib
Example: SecondDerivative_RF.joblib represents a Random Forest (RF) classification model built using second-derivative spectral data.
Special Note: The file Thirteen_chemical_components-RF.joblib is a special classification model. It does not use spectral data; instead, it is built using the quantitative results of the 13 chemical components directly as its input features.
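The naming conventions above lend themselves to a small helper that tells the model types apart before loading. The sketch below is an illustration only: the function name is hypothetical, and it assumes that component and method names contain no underscores, as in the two examples given. Only the file names quoted in the description are used.

```python
def parse_tobacco_model_name(filename):
    """Interpret a .joblib model file name according to the naming conventions."""
    stem = filename[: -len(".joblib")] if filename.endswith(".joblib") else filename
    if stem == "Thirteen_chemical_components-RF":
        # Special case: classifier fed with the 13 chemical components, not spectra.
        return {"task": "classification", "input": "chemical components", "model": "RF"}
    parts = stem.split("_")
    if len(parts) == 3:
        component, preprocessing, model = parts
        return {"task": "quantitative", "component": component,
                "preprocessing": preprocessing, "model": model}
    if len(parts) == 2:
        preprocessing, model = parts
        return {"task": "classification", "input": "spectra",
                "preprocessing": preprocessing, "model": model}
    raise ValueError(f"Unrecognised model file name: {filename}")


print(parse_tobacco_model_name("TotalSugars_MSC_PLS.joblib"))
print(parse_tobacco_model_name("SecondDerivative_RF.joblib"))
print(parse_tobacco_model_name("Thirteen_chemical_components-RF.joblib"))
```

Once a file has been identified, `joblib.load(path)` is the usual way to load a .joblib file in Python, although loading may require library versions compatible with those used to build the models.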
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset description
This is the corresponding dataset to the publication "Austrian NIR Soil Spectral Library for Soil Health Assessments" by Fohrafellner et al. (2025). In this publication, we created the first Austrian Near-Infrared (NIR) Soil Spectral Library (680 – 2500 nm) using 2,129 legacy samples from all environmental zones of Austria. Additionally, we utilized partial least squares regression modeling to evaluate the dataset's current effectiveness for soil health assessments. The dataset contains three tabs, "Document meta data", "Legend" and "Dataset". Tab "Document meta data" gives information on the authors, the data collection time frame, terms of use, etc. In "Legend", each column of the "Dataset" is described. The "Dataset" contains information on the legacy soil samples including:
Project description
This Austrian NIR Soil Spectral Library was built within the ProbeField project (November 2021 – January 2025), which was part of the European Joint Program for SOIL "Towards climate-smart sustainable management of agricultural soils" (EJP SOIL) funded by the European Union Horizon 2020 research and innovation programme (Grant Agreement N° 862695). The project aimed to create a protocol detailing procedures and methodologies for accurately estimating fertility-related properties in agricultural soils in the field. Additionally, the potential for extending this data to two- and three-dimensional mapping using co-variates was demonstrated. ProbeField further collected field spectra that closely match laboratory spectra, enabling the prediction of soil properties using models calibrated with soil spectral libraries.
References
Fohrafellner, J., Lippl, M., Bajraktarevic, A., Baumgarten, A., Spiegel, H., Körner, R. and Sandén, T.: Austrian NIR Soil Spectral Library for Soil Health Assessments, 2025, in review.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chemical and NIR spectral measurements were made on 60 sugarcane samples from different plant parts (leaves, stem or whole aerial part). Chemical parameters (total sugar content - TS, crude protein content - CP, acid detergent fiber - ADF, in vitro organic matter digestibility - IVOMD) were determined. In parallel, reflectance spectra were measured using eight spectrometers. To illustrate, PLS-R results are shown for the total sugar and crude protein contents.
The ICRAF-ISRIC Soil VNIR Spectral Library contains visible near infrared spectra of 4,438 soils selected from the Soil Information System (ISIS) of the International Soil Reference and Information Centre (ISRIC). The samples consist of all physically archived samples at ISRIC in 2004 for which soil attribute data was available. The spectra were measured at the World Agroforestry Center's (ICRAF) Soil and Plant Spectral Diagnostic Laboratory. The samples are from 58 countries spanning Africa, Asia, Europe, North America, and South America. Associated attribute data, such as geographical coordinates, horizon (depth), and physical and chemical properties, are provided in a single relational database. The purpose of the library is to provide a resource for research and applications for sensing soil quality both in the laboratory and from space.
Visible/near-infrared (VNIR) and mid-infrared (MIR) spectral library composed of twenty natural tephra samples from ten volcanic sources that span a range of compositions and components. The bulk, glass, and mineral phase compositions of each sample have been measured and VNIR and MIR spectra from four size fractions of each sample were collected. The VNIR spectral library has been expanded (ed. 2) to twenty-two samples.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This dataset includes average near-infrared (NIR) reflectance spectra for 68 main-belt asteroids that were observed at the NASA Infrared Telescope Facility (IRTF), Mauna Kea, Hawaii, from April 2001 to January 2015. Raw NIR spectral data were obtained under mostly uniform instrumental conditions and include observations of the asteroids, extinction stars, and solar analog stars that were necessary for data reduction and production of the final average asteroid NIR reflectance spectra. SpecPR and Spextool were used during data reduction to produce the final spectra; both programs utilize similar functions that include sky background subtraction, telluric corrections, channel shifting, and averaging routines. The observed asteroids span a wide variety of taxonomic types, including V-, S-, M-, and X-types, which correspond to a wide variety of surface mineralogies, rock types, and potential meteorite analogs.
Soil texture, vis-NIR spectra and derived soil chemistry were taken from soils sampled at Chequamegon Heterogeneous Ecosystem Energy-balance Study Enabled by a High-density Extensive Array of Detectors (CHEESEHEAD) ISFS forest sites. All soil samples were taken on September 28, 2019. Samples were taken within 5 m of the tower. Tower center coordinates are used. All samples were of the top 0-15 cm of soil mixed. They were then air-dried, ground, and sieved through a 2-mm sieve for analysis. See the readme document to learn about this dataset and particle size analysis, visible–near infrared (vis–NIR), and portable X-ray fluorescence (PXRF).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains all spectra published in the SIMP survey paper (http://adsabs.harvard.edu/abs/2016arXiv160706117R). Please reference it if you use any of the data.
All spectra are provided in three formats: FITS files, ASCII TXT files, and PNG previews.
These data are part of the Montreal Spectral Library, located at https://jgagneastro.wordpress.com/the-montreal-spectral-library/
https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.15454/LXKFAS
This dataset presents near infrared spectra of different soil samples acquired on two different spectrometers in two different labs. A first set of soil spectra (BIPEA soils from LAS) was used to compute a spectral transfer model using the Piecewise Direct Standardization (PDS) function. A second set of soil spectra (Plaine_de_Versailles soils), independent of the first, was used to validate the transfer model. (2021-04-09).
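The workflow described above (fit a transfer model on one set of spectra measured on both instruments, then check it on an independent set) can be illustrated with a deliberately simplified stand-in. The sketch below is not PDS: it fits a separate slope and offset at each wavelength by least squares, which corresponds to a window of a single wavelength, whereas PDS regresses each wavelength on a window of neighbouring wavelengths. The terms "primary" and "secondary", the function names and all values are assumptions made for illustration.

```python
def fit_slope_bias(secondary, primary):
    """Fit primary = slope * secondary + bias at each wavelength by least squares.

    Both arguments are lists of spectra (one list of values per sample) measured
    on the same samples and the same wavelength grid.
    """
    coefficients = []
    for sec_col, pri_col in zip(zip(*secondary), zip(*primary)):
        n = len(sec_col)
        mean_s = sum(sec_col) / n
        mean_p = sum(pri_col) / n
        sxx = sum((s - mean_s) ** 2 for s in sec_col)
        sxy = sum((s - mean_s) * (p - mean_p) for s, p in zip(sec_col, pri_col))
        slope = sxy / sxx
        coefficients.append((slope, mean_p - slope * mean_s))
    return coefficients


def transfer(spectrum, coefficients):
    """Map one secondary-instrument spectrum onto the primary instrument's scale."""
    return [slope * x + bias for x, (slope, bias) in zip(spectrum, coefficients)]


# Hypothetical transfer set: the secondary instrument reads 2 * primary + 0.1.
primary_set = [[0.10, 0.20], [0.30, 0.50], [0.60, 0.40]]
secondary_set = [[2 * x + 0.1 for x in s] for s in primary_set]
coeffs = fit_slope_bias(secondary_set, primary_set)

# Independent validation spectrum, not used to fit the coefficients.
primary_check = [0.25, 0.35]
secondary_check = [2 * x + 0.1 for x in primary_check]
print([round(v, 6) for v in transfer(secondary_check, coeffs)])  # [0.25, 0.35]
```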