MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains images of various radio frequency (RF) signals captured on waterfall plots using a spectrum analyzer. It is designed to aid in the classification and identification of different types of RF signals commonly encountered in wireless communications and radio technologies.
The dataset comprises waterfall plot images of RF signals across 21 different classes, representing a wide range of communication protocols, technologies, and signal types. Each image in the dataset is a visual representation of a specific RF signal's frequency and time characteristics.
This dataset is primarily suited for the classification and identification of RF signal types. Each example has two fields:
image: A waterfall plot image of the RF signal
class: The label identifying the type of RF signal
The dataset includes the following 21 classes of RF signals:
This dataset contains images of radio frequency (RF) signals captured as waterfall plots using a spectrum analyzer. The dataset is organized as follows:
- datasets
  - signal class
    - image
For example:
- datasets
  - bluetooth
    - c17afe0fe5cc3cc1308605cf390ecbb5.png
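For illustration, indexing such a class-per-folder layout takes only a few lines of Python (a minimal sketch; the `wifi` folder below is a hypothetical second class, since the 21 class names are not listed here):

```python
import tempfile
from pathlib import Path

def index_dataset(root):
    """Map each image file to its class label, inferred from its parent folder name."""
    samples = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.glob("*.png")):
            samples.append((image_path, class_dir.name))
    return samples

# Recreate a tiny synthetic copy of the layout described above.
root = Path(tempfile.mkdtemp()) / "datasets"
(root / "bluetooth").mkdir(parents=True)
(root / "bluetooth" / "c17afe0fe5cc3cc1308605cf390ecbb5.png").touch()
(root / "wifi").mkdir()
(root / "wifi" / "0000.png").touch()

labels = [label for _, label in index_dataset(root)]
```

Frameworks such as torchvision's ImageFolder consume exactly this folder-per-class convention, so the dataset can also be loaded directly with such tools.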
The images in this dataset were captured using a spectrum analyzer, which visualizes RF signals as waterfall plots. These plots show the frequency content of a signal over time, with color representing signal strength.
RF signals were collected across various frequency bands using appropriate antennas and receivers. The spectrum analyzer was used to generate waterfall plots for each captured signal. Care was taken to ensure a diverse representation of signal types and conditions.
[MIT]
We welcome contributions to improve this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
We provide a preprocessed dataset to enable model development for the detection and classification of drone RF signals. It consists of non-overlapping signal vectors of length 1,048,576 samples, which corresponds to approx. 74.9 ms at 14 MHz. We have also added Labnoise (Bluetooth, Wi-Fi, Amplifier) and Gaussian noise to the dataset.
After normalization, the drone signals were mixed with either Labnoise (50%) or Gaussian noise (50%). The noise class was created by mixing Labnoise and Gaussian noise in all possible combinations (i.e., Labnoise + Labnoise, Labnoise + Gaussian noise, Gaussian noise + Labnoise, and Gaussian noise + Gaussian noise). For the drone signal classes, as for the noise class, the samples are distributed evenly over the SNR interval [-20, 30] dB in steps of 2 dB, i.e., 679 to 685 samples per SNR level. The resulting number of samples per class is shown in the table below.
| DJI | FutabaT14 | FutabaT7 | Graupner | Taranis | Turnigy | Noise |
|---|---|---|---|---|---|---|
| 1280 | 3472 | 801 | 801 | 1663 | 855 | 8872 |
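The mixing step described above can be sketched as follows (an illustrative simplification, not the authors' exact preprocessing code: the noise is scaled so that the mix hits a target SNR):

```python
import math
import random

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mix signal/(scaled noise) power ratio equals `snr_db`."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    scale = math.sqrt(target_p_noise / p_noise)
    return [s + scale * n for s, n in zip(signal, noise)]

# Toy example: a sinusoid mixed with Gaussian noise at 0 dB SNR.
random.seed(0)
signal = [math.sin(0.1 * i) for i in range(1000)]
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
mixed = mix_at_snr(signal, noise, snr_db=0.0)
```

At 0 dB the added noise power equals the signal power by construction; changing `snr_db` in 2 dB steps over [-20, 30] reproduces the SNR grid described above.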
See https://github.com/sgluege/Robust-Drone-Detection-and-Classification for a script to load and inspect the dataset. There you will also find code to train and evaluate a model.
Further information about the data, and how to build a classifier, can be found in our related manuscript. Please cite it if you find it useful.
S. Glüge, M. Nyfeler, A. Aghaebrahimian, N. Ramagnano and C. Schüpbach, "Robust Low-Cost Drone Detection and Classification Using Convolutional Neural Networks in Low SNR Environments," in IEEE Journal of Radio Frequency Identification, vol. 8, pp. 821-830, 2024, doi: 10.1109/JRFID.2024.3487303
Bibtex:
@ARTICLE{10737118,
author={Glüge, Stefan and Nyfeler, Matthias and Aghaebrahimian, Ahmad and Ramagnano, Nicola and Schüpbach, Christof},
journal={IEEE Journal of Radio Frequency Identification},
title={Robust Low-Cost Drone Detection and Classification Using Convolutional Neural Networks in Low SNR Environments},
year={2024},
volume={8},
number={},
pages={821-830},
doi={10.1109/JRFID.2024.3487303}
}
This software tool generates simulated radar signals and creates RF datasets. The datasets can be used to develop and test detection algorithms by utilizing machine learning/deep learning techniques for the 3.5 GHz Citizens Broadband Radio Service (CBRS) or similar bands. In these bands, the primary users of the band are federal incumbent radar systems. The software tool generates radar waveforms and randomizes the radar waveform parameters. The pulse modulation types for the radar signals and their parameters are selected based on NTIA testing procedures for ESC certification, available at http://www.its.bldrdoc.gov/publications/3184.aspx. Furthermore, the tool mixes the waveforms with interference and packages them into one RF dataset file. The tool utilizes a graphical user interface (GUI) to simplify the selection of parameters and the mixing process. A reference RF dataset was generated using this software. The RF dataset is published at https://doi.org/10.18434/M32116.
License: https://www.nist.gov/director/licensing
The RF dataset can be used to develop and test detection algorithms for the 3.5 GHz CBRS or similar bands where the primary users of the band are federal incumbent radar systems. The dataset consists of synthetically generated radar waveforms with added white Gaussian noise. The RF dataset is suitable for development and testing of machine/deep learning detection algorithms. A large number of parameters of the waveforms are randomized across the dataset. Due to its large size, the dataset is divided into groups, and each group consists of multiple files. For more information about the dataset, refer to: R. Caromi, M. Souryal, and T. Hall, "RF Dataset of Incumbent Radar Systems in the 3.5 GHz CBRS Band," Journal of Research of the National Institute of Standards and Technology. (in press). In addition, the metadata of the dataset is summarized in "Data Dictionary of 3.5 GHz Radar Waveforms" [pdf] accompanying the data. For more information about the motivation behind this RF dataset, refer to: T. Hall, R. Caromi, M. Souryal, and A. Wunderlich, "Reference Datasets for Training and Evaluating RF Signal Detection and Classification Models," to appear in Proc. IEEE GLOBECOM Workshop on Advancements in Spectrum Sharing, Dec. 2019.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
Deepsig Inc. has created a small corpus of standard datasets which can be used for original and reproducible research, experimentation, measurement and comparison by fellow scientists and engineers. These datasets allow machine learning researchers with new ideas to dive directly into an important technical area without the need to collect or generate new datasets, and allow for direct comparison with the efficacy of prior work.
This dataset includes both synthetic simulated channel effects and over-the-air recordings of 24 digital and analog modulation types, and has been heavily validated.
This dataset was used for "Over-the-Air Deep Learning Based Radio Signal Classification", published in 2018 in the IEEE Journal of Selected Topics in Signal Processing, which provides additional details and a description of the dataset. Data are stored in HDF5 format as complex floating-point values, with roughly 2.5 million examples, each 1024 samples long.
The original content is that provided in the compressed archive accessible on the Deepsig Inc. website.
The dataset is provided in the "GOLD_XYZ_OSC.0001_1024.hdf5" file. The HDF5 format is designed to store and organize large amounts of data. See this list of libraries and interfaces for HDF5 manipulation.
The dataset exhibits the following structure:
- 24 modulations: OOK, ASK4, ASK8, BPSK, QPSK, PSK8, PSK16, PSK32, APSK16, APSK32, APSK64, APSK128, QAM16, QAM32, QAM64, QAM128, QAM256, AM_SSB_WC, AM_SSB_SC, AM_DSB_WC, AM_DSB_SC, FM, GMSK and OQPSK.
- 26 SNRs per modulation (-20 dB to +30 dB in steps of 2 dB).
- 4096 frames per modulation-SNR combination.
- 1024 complex time-series samples per frame.
- Samples as floating point in-phase and quadrature (I/Q) components, resulting in a (1024, 2) frame shape.
- 2,555,904 frames in total.
Each frame can be retrieved by accessing the HDF5 groups:
- X: I/Q components of the frame;
- Y: modulation of the frame (one-hot encoded);
- Z: SNR of the frame.
Data consist of 24 modulations --> 26 SNRs --> 4096 frames --> (1024, 2) I/Q samples. Below is a structural example of the dataset:
```python
Modulation 0: {
    SNR -20: [
        Frame 0: [            # sample 0 (e.g. hdf5_file['X'][0])
            [I0, Q0],         # sample 0.0 (e.g. hdf5_file['X'][0][0])
            [I1, Q1],         # sample 0.1 (e.g. hdf5_file['X'][0][1])
            ...,
            [I1023, Q1023]    # sample 0.1023 (e.g. hdf5_file['X'][0][1023])
        ],
        Frame 1: [ ... ],     # sample 1 (e.g. hdf5_file['X'][1])
        ...,
        Frame 4094: [ ... ],  # sample 4094 (e.g. hdf5_file['X'][4094])
        Frame 4095: [ ... ]   # sample 4095 (e.g. hdf5_file['X'][4095])
    ],
    SNR -18: [
        Frame 0: [ ... ],     # sample 4096 (e.g. hdf5_file['X'][4096])
        ...,
        Frame 4095: [ ... ]   # sample 8191 (e.g. hdf5_file['X'][8191])
    ],
    ...
    SNR 30: [
        Frame 0: [ ... ],     # sample 102400 (e.g. hdf5_file['X'][102400])
        ...,
        Frame 4095: [ ... ]   # sample 106495 (e.g. hdf5_file['X'][106495])
    ]
}
...
Modulation 23: { ... }
```
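Because the ordering is fixed (modulation-major, then SNR, then frame), the flat index of any frame in hdf5_file['X'] can be computed directly. A small sketch (`frame_index` is a hypothetical helper; it assumes the 26 SNR levels run from -20 to 30 dB in steps of 2 dB, as described above):

```python
FRAMES_PER_SNR = 4096
SNRS = list(range(-20, 32, 2))   # the 26 SNR levels: -20, -18, ..., 30 dB
NUM_SNRS = len(SNRS)

def frame_index(modulation, snr_db, frame):
    """Flat index into hdf5_file['X'] for a modulation index, SNR level and frame."""
    snr_idx = SNRS.index(snr_db)
    return (modulation * NUM_SNRS + snr_idx) * FRAMES_PER_SNR + frame

# Examples matching the annotated structure above:
i0 = frame_index(0, -20, 0)      # first frame of the dataset
i1 = frame_index(0, -18, 0)      # first frame at the next SNR level
i2 = frame_index(0, 30, 4095)    # last frame of modulation 0
```

With h5py, `hdf5_file['X'][frame_index(m, snr, f)]` then returns the corresponding (1024, 2) I/Q frame.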
The original file provided with the dataset. The order of the modulation classes in this file is incorrect. This is a known issue; please use the order of the classes provided in the paper "Over-the-Air Deep Learning Based Radio Signal Classification". A text file and a JSON file with the corrected classes are provided alongside this dataset.
The original license provided with the dataset. Dataset provided by Deepsig Inc. and licensed under Creative Commons Attribution - NonCommercial - ShareAlike 4.0 License (CC BY-NC-SA 4.0)
The additional content is that which is not provided in the compressed archive accessible on the Deepsig Inc. website.
A text file with the classes in the correct order.
A JSON file with the classes in the correct order.
A shortcut to Deepsig Inc. datasets.
This dataset is provided by Deepsig Inc.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The dataset contains measurements of radio-frequency electromagnetic emissions from a home-built sender module for BB84 quantum key distribution. The goal of these measurements was to evaluate information leakage through this side channel. This dataset supplements our publication and allows our results to be reproduced together with the source code hosted on GitHub (and also on Zenodo via integration with GitHub). The measurements were performed using a magnetic near-field probe, an amplifier and an oscilloscope. The dataset contains raw measured data in the file format output by the oscilloscope; use our source code to make use of it. Detailed descriptions of the measurement procedure can be found in our paper and in the metadata JSON files found within the dataset.
Commented list of datasets
This file lists the datasets that were analyzed and reported on in the paper. The datasets in the list refer to directories here. Note that most of the datasets contain additional files with metadata, which detail where and how the measurements were performed. The mentioned Jupyter notebooks refer to the source code repository https://github.com/XQP-Munich/EmissionSecurityQKD (not included in this dataset). Most of those notebooks output JSON files storing results. The processed JSON files are also included in the source code repository.
In the naming of datasets:
Antenna refers to the log-periodic dipole antenna. All datasets that do not contain Antenna in their name are recorded with the magnetic near-field probe.
Rev1 refers to the initial electronics design, while Rev2 refers to the revised electronics design, which contains countermeasures aiming to reduce emissions.
Shielding refers to measurements where the device is enclosed in a metallic shielding and the measurement takes place outside the shielding.
Rotation refers to the orientation of the magnetic near-field probe at the same spatial location.
Datasets collected with near-field probe for Rev1 electronics
Rev1Distance: contains measurements at different distances from the Rev1 electronics performed above the FPGA. The deep learning attack is analyzed in TEMPEST_ATTACK.ipynb. The amplitude is analyzed in get_raw_data_RMS_amplitude.ipynb.
Rev12D: different locations on a 2d grid at a constant distance from the electronics. The deep learning attack is analyzed in TEMPEST_ATTACK.ipynb.
Rev130meas2.5cm: 30 measurements above the FPGA at a height of 2.5 cm. Used to evaluate how the amount of training data affects neural network performance. The deep learning attack is analyzed in the notebooks TEMPEST_ATTACK*.ipynb; in particular, TEMPEST_ATTACK_VARY_TRAINING_DATA.ipynb is used on this dataset.
Rev1Rotation10deg contains a measurement for varying orientation of the probe at the same location. This is not mentioned in the paper and is only included for completeness. The deep learning attack is analyzed in notebooks TEMPEST_ATTACK*.ipynb.
Rev1TEMPESTShieldingFPGA: measurements with and without shielding at 4 cm above the FPGA. The deep learning attack is analyzed in the notebooks TEMPEST_ATTACK*.ipynb.
Datasets collected with near-field probe for Rev2 electronics
Rev2Distance contains measurements at different distances from the Rev2 electronics performed above the FPGA.
Rev22D and Rev22Dstart_7_0 contain measurements on a 2d grid performed on the revised electronics. The dataset is split in two directories because the measurement procedure crashed in the middle. This split structure was kept in order to maintain consistency with the automatic metadata.
Rev230meas2.5cm: 30 measurements above the FPGA at a height of 2.5 cm. Used to evaluate how the amount of training data affects neural network performance. The deep learning attack is analyzed in the notebooks TEMPEST_ATTACK*.ipynb; in particular, TEMPEST_ATTACK_VARY_TRAINING_DATA.ipynb is used on this dataset.
Other datasets
BackgroundTuesday: background measurement (QKD device not powered at all) performed with the near-field probe on June 21st, 2022.
BackgroundSaturday: background measurement (QKD device not powered at all) performed with the near-field probe on June 11th, 2022.
AntennaSpectra: dataset of spectra recorded directly by the oscilloscope. Used to demonstrate the ability to distinguish, at a distance, between the device sending a QKD key (standard operation) and the device being turned on but not sending any key. Analyzed in the notebook Comparing_KeyNokey_Measurements.ipynb.
Rev2ShieldingAntenna: raw amplitude measurements with the log-periodic dipole antenna on the Rev2 electronics including the shielding enclosure, collected at various distances. None of our attacks against this scenario were successful. The dataset represents a challenge for testing more advanced attacks using improved data processing.
License: https://www.nist.gov/open/license
This project aims to create a comprehensive framework for generating radio frequency (RF) datasets, designing deep learning (DL) detectors, and evaluating their detection performance using both simulated and experimental test data. The proposed tools and techniques are developed in the context of dynamic spectrum use for the 3.5 GHz Citizens Broadband Radio Service (CBRS), but they can be utilized and expanded for standardization of machine learned spectrum awareness technologies and methods. This dataset consists of pre-trained DL models for radar detection in the CBRS band using simulated waveforms. The code for creating and using these models is available at https://github.com/usnistgov/BaselineDeepLearningRadarDetectors.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The data contains two datasets: one for pKb and the other for LogP machine learning prediction. The datasets contain several descriptors generated using RDKit and density functional theory (DFT).
MIT License: https://opensource.org/licenses/MIT
The dataset consists of labeled RF signal feature vectors, each row representing a single signal sample. Features may include signal strength, bandwidth, center frequency, modulation index, mean amplitude, FFT-based values, and other time/frequency domain attributes. Labels correspond to predefined signal types or modulation classes such as AM, FM, QPSK, etc.
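A minimal baseline for classifying such labeled feature vectors is a nearest-centroid model. The sketch below uses synthetic rows with hypothetical feature columns (signal strength, bandwidth, modulation index) and two of the label classes mentioned above; it is an illustration, not a model tied to this dataset:

```python
import math
import random

def nearest_centroid_fit(X, y):
    """Compute the per-class mean feature vector (a minimal baseline classifier)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lbl: math.dist(centroids[lbl], x))

# Synthetic rows: [signal_strength_dbm, bandwidth_khz, modulation_index]
random.seed(1)
X = [[-60 + random.gauss(0, 1), 10, 0.3] for _ in range(20)] \
  + [[-30 + random.gauss(0, 1), 200, 0.9] for _ in range(20)]
y = ["AM"] * 20 + ["QPSK"] * 20

model = nearest_centroid_fit(X, y)
pred = nearest_centroid_predict(model, [-31, 195, 0.85])
```

In practice one would reach for scikit-learn or a neural network, but this shows the row-per-sample, label-per-row shape the dataset is designed for.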
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The labels 2, 3, 4, 5, 6 and 7 respectively represent the UAVs Parrot, Fimi, Phantom 20, Mavic Air, Mavic Mini and Phantom 10. Outdoors, the UAVs were piloted at 50 m from the USRP. Please also cite the following paper: P. Podder, M. Zawodniok and S. Madria, "Deep Learning for UAV Detection and Classification via Radio Frequency Signal Analysis," 2024 25th IEEE International Conference on Mobile Data Management (MDM), Brussels, Belgium, 2024, pp. 165-174, doi: 10.1109/MDM61037.2024.00040.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The generated dataset contains radio frequency (RF) signal data for a period of approximately one month, from May 5, 2023 to June 11, 2023, collected via SDR hardware interfaced to DragonOS Focal. Each row of the dataset represents a single RF signal observation, with various features that describe the signal and its environment.
The dataset can be used for tasks such as machine learning, statistical analysis, and signal processing.
The generated dataset can be used for various types of analysis and predictive analysis, which can help machine learning scientists in developing and testing models for RF signal processing, interference detection and mitigation, and device performance optimization. Some of the possible analysis and predictive analysis that can be performed using this data are:
Signal Classification: The dataset can be used to classify RF signals based on their modulation type, frequency, bandwidth, and other features. This can help in identifying specific types of signals, such as voice or data transmissions, and can aid in tasks such as signal detection, interception, and decoding.
Interference Detection: The dataset contains information about the type and level of interference present in the environment. This can be used to develop models for detecting and mitigating interference, which can improve the overall quality of the RF signal.
Device Performance Optimization: The dataset includes information about the type of RF device used to generate the signal, as well as its CPU usage, memory usage, and battery level. This can be used to develop models for optimizing the performance of RF devices, such as reducing power consumption or improving signal quality.
Weather Condition Analysis: The dataset provides information about the weather conditions at the time of signal observation, including temperature, humidity, wind speed, precipitation, and weather condition. This ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Dataset for training and evaluating RFI detection schemes representing MeerKAT instrumentation and predominantly satellite-based contamination. These datasets are produced using Tabascal and output in HDF5 format. The choice of format is to allow easy use with machine-learning workflows rather than other astronomy pipelines (for example, measurement sets). These datasets are prepared for immediate loading with TensorFlow. The attached config.json files describe the parameters used to generate these datasets.
Dataset parameters

| Name | Num Satellite Sources | Num Ground RFI Sources |
|---|---|---|
| obs_100AST_0SAT_0GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 0 | 0 |
| obs_100AST_1SAT_0GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 1 | 0 |
| obs_100AST_1SAT_3GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 1 | 3 |
| obs_100AST_2SAT_0GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 2 | 0 |
| obs_100AST_2SAT_3GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 2 | 3 |
Using simulated data allows for access to ground truth for noise contamination. As such, these datasets contain the observation visibility amplitudes (without noise), noise visibilities and boolean pixel-wise masks at several thresholds on the noise visibilities. We outline the dimensions of all datasets below:
Dataset Dimensions

| Field | vis | masks_orig | masks_0 | masks_1 | masks_2 | masks_4 | masks_8 | masks_16 |
|---|---|---|---|---|---|---|---|---|
| Datatype | float32 | float32 | bool | bool | bool | bool | bool | bool |
Of course, one can produce masks at arbitrary thresholds, but for convenience, we include several pre-computed options.
All datasets and all fields have the dimensions (512, 512, 512, 1), corresponding to (baseline, time, frequency, amplitude/mask).
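Deriving a mask at an arbitrary threshold from the noise visibilities amounts to a pixel-wise comparison. A minimal sketch of the idea (plain nested lists stand in for the (baseline, time, frequency) arrays; the actual files are HDF5):

```python
def mask_at_threshold(noise_vis, threshold):
    """Pixel-wise boolean mask: flag pixels whose noise-visibility amplitude exceeds threshold."""
    return [[abs(v) > threshold for v in row] for row in noise_vis]

# Tiny 2x3 example standing in for one baseline-time slice.
noise_vis = [
    [0.1, 2.5, 0.3],
    [4.0, 0.0, 1.1],
]
mask = mask_at_threshold(noise_vis, threshold=1.0)
```

The pre-computed masks_0 through masks_16 fields simply save recomputing this for several commonly used thresholds.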
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
These data were used in the First RF Spectrum-Sharing Challenge (RFSSC), held in 2024-2025 and culminating in the announcement of the winner, AiRANACULUS, on January 30, 2025. The first RFSSC was co-sponsored by the United States Air Force Research Laboratory (AFRL) and the National Security Innovation Network (NSIN). It was a data-centric contest open to US academic and industrial organizations, intended to spur innovation in machine learning and signal processing for automatic radio-frequency scene analysis (RFSA).
The long-term monitoring of gross primary production (GPP) is crucial to the assessment of the carbon cycle of terrestrial ecosystems. In this study, a well-known machine learning model (Random Forest, RF) is established to reconstruct the global GPP dataset named ECGC_GPP. The model distinguished nine functional plant types, including C3 and C4 crops, using eddy fluxes, meteorological variables, and leaf area index as training data for the RF model. Based on ERA5_Land and the corrected GEOV2 data, the global monthly GPP dataset at a 0.05-degree resolution from 1999 to 2019 was estimated. The results showed that the RF model could explain 74.81% of the monthly variation of GPP in the testing dataset, of which the average contribution of leaf area index (LAI) reached 41.73%. The average annual GPP and its standard deviation during 1999-2019 were 117.14 ± 1.51 Pg C yr-1, with an upward trend of 0.21 Pg C yr-2 (p < 0.01). By using the plant functional type classification, the underestimat...

We unified the ERA5_Land and the corrected GEOV2 datasets to 0.05-degree and monthly scales. The meteorological and remote sensing datasets were classified by the eight PFTs to estimate the GPP of each PFT. In particular, we established site-level PFT training models for CRO_C3 and CRO_C4 separately, due to their significant differences. The CRO cells were a mixture of CRO_C3 and CRO_C4; therefore, the trained CRO_C3 and CRO_C4 models were both applied to the CRO cells and multiplied by their respective proportions to generate the final GPP estimation of CRO. This was designed to improve the current situation of GPP underestimation over CRO_C4-dominated regions. In this way, we generated a 0.05-degree, monthly-scale global GPP dataset (ECGC_GPP) from 1999 to 2019.

The ECGC_GPP dataset is stored in .nc file format and can be opened using Matlab or Python.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Given the importance of datasets for sensing-communication integration research, a novel simulation platform for constructing communication and multi-modal sensory datasets is developed. The developed platform integrates three high-precision software tools, i.e., AirSim, WaveFarer, and Wireless InSite, and further achieves their in-depth integration and precise alignment. Based on the developed platform, a new synthetic intelligent multi-modal sensing-communication dataset for Synesthesia of Machines (SoM), named SynthSoM, is proposed. The SynthSoM dataset contains various air-ground multi-link cooperative scenarios with comprehensive conditions, including multiple weather conditions, times of day, intelligent agent densities, frequency bands, and antenna types. The SynthSoM dataset encompasses multiple data modalities, including radio-frequency (RF) channel large-scale and small-scale fading data, RF millimeter-wave (mmWave) radar sensory data, and non-RF sensory data, e.g., RGB images, depth maps, and light detection and ranging (LiDAR) point clouds. The quality of the SynthSoM dataset is validated via statistics-based qualitative inspection and via machine learning (ML) evaluation metrics against real-world measurements. The SynthSoM dataset is open-sourced and provides consistent data for cross-comparing SoM-related algorithms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
These are the datasets used for the study titled "A Comparison Framework for Deep Learning RFI Detection Algorithms in Radio Astronomy". These files are made publicly available as an additional resource to the submission of the author's Master's degree at Stellenbosch University. The detection is done in the field of radio astronomy. Each dataset consists of images/spectrograms/waterfall plots for baselines, and the corresponding binary mask for each image. The datasets can be used to train machine learning models or, as in this study, supervised fully convolutional neural networks.
The LOFAR datasets consist of real observations and were slightly modified from https://zenodo.org/record/6724065. See this resource regarding the observational parameters used to retrieve the data from the LOFAR Long Term Archive. The HERA dataset consists of simulated observations generated with hera_sim (https://readthedocs.org/projects/hera-sim/). The 28 March dataset contains accurate pixel-perfect binary masks for each image. The 20 July dataset is identical to the first, except that the binary masks are generated with AOFlagger. All three datasets have a test set stored with pixel-perfect simulation masks (HERA) or expert hand-labeled masks (LOFAR).
The CSV file contains the results of all trained models and has fields for: model class, #filters, #FLOPS, #weights, preprocessing methods, train, validation and test accuracy scores, as well as lists of (threshold, FPR, TPR) values to generate receiver operating characteristic curves. See https://github.com/CharlDuToit/RFI-NLN to visualize the results or to train new models.
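From such a list of (threshold, FPR, TPR) triples, an ROC curve and its area under the curve can be recovered with a trapezoidal sum. A generic sketch (the triples below are made up for illustration, not taken from the CSV):

```python
def roc_auc(points):
    """Trapezoidal AUC from (threshold, FPR, TPR) triples; curve anchored at (0,0) and (1,1)."""
    pts = sorted((fpr, tpr) for _, fpr, tpr in points)
    pts = [(0.0, 0.0)] + pts + [(1.0, 1.0)]
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc

# Hypothetical operating points at three thresholds.
points = [(0.9, 0.05, 0.60), (0.5, 0.20, 0.85), (0.1, 0.60, 0.98)]
auc = roc_auc(points)
```

Plotting FPR on the x-axis against TPR on the y-axis for the same sorted points gives the ROC curve itself.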
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Machine learning classifiers trained on class-imbalanced data are prone to overpredict the majority class. This leads to a larger misclassification rate for the minority class, which in many real-world applications is the class of interest. For binary data, the classification threshold is set by default to 0.5, which, however, is often not ideal for imbalanced data. Adjusting the decision threshold is a good strategy to deal with the class imbalance problem. In this work, we present two different automated procedures for the selection of the optimal decision threshold for imbalanced classification. A major advantage of our procedures is that they do not require retraining of the machine learning models or resampling of the training data. The first approach is specific to random forest (RF), while the second approach, named GHOST, can potentially be applied to any machine learning classifier. We tested these procedures on 138 public drug discovery data sets containing structure-activity data for a variety of pharmaceutical targets. We show that both thresholding methods significantly improve the performance of RF. We tested the use of GHOST with four different classifiers in combination with two molecular descriptors, and we found that most classifiers benefit from threshold optimization. GHOST also outperformed other strategies, including random undersampling and conformal prediction. Finally, we show that our thresholding procedures can be effectively applied to real-world drug discovery projects, where the imbalance and characteristics of the data vary greatly between the training and test sets.
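The core idea of decision-threshold adjustment can be sketched generically as a scan over candidate thresholds (an illustration using balanced accuracy as the selection metric; this is not the paper's exact RF out-of-bag or GHOST procedure, and the toy scores below are invented):

```python
def best_threshold(probs, labels, thresholds=None):
    """Scan candidate decision thresholds; keep the one maximizing balanced accuracy."""
    if thresholds is None:
        thresholds = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    pos = sum(labels)
    neg = len(labels) - pos
    def balanced_accuracy(t):
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= t)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < t)
        return 0.5 * (tp / pos + tn / neg)
    return max(thresholds, key=balanced_accuracy)

# Imbalanced toy scores: the model is under-confident and assigns
# the minority positives probabilities well below the default 0.5.
probs  = [0.05, 0.10, 0.15, 0.20, 0.25, 0.12, 0.08, 0.18, 0.30, 0.35, 0.40, 0.45]
labels = [0,    0,    0,    0,    0,    0,    0,    0,    1,    1,    1,    1]
t_star = best_threshold(probs, labels)
```

Here the default 0.5 threshold would reject every positive, whereas the scan settles on a lower threshold that separates the classes; no retraining or resampling is involved, which is the property the paper emphasizes.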
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Estimation of fruit quality parameters is usually based on destructive techniques, which are tedious, costly and unreliable when dealing with huge amounts of fruit. Alternatively, non-destructive techniques such as image processing and spectral reflectance would be useful in rapid detection of fruit quality parameters. This research study aimed to assess the potential of image processing, spectral reflectance indices (SRIs), and machine learning models such as decision tree (DT) and random forest (RF) to qualitatively estimate characteristics of mandarin and tomato fruits at different ripening stages. Quality parameters such as chlorophyll a (Chl a), chlorophyll b (Chl b), total soluble solids (TSS), titratable acidity (TA), TSS/TA, carotenoids (car), lycopene and firmness were measured. The results showed that Red-Green-Blue (RGB) indices and newly developed SRIs demonstrated high efficiency for quantifying different fruit properties. For example, the R2 of the relationships between all RGB indices (RGBI) and measured parameters varied between 0.62 and 0.96 for mandarin and between 0.29 and 0.90 for tomato. RGBI such as the visible atmospheric resistant index (VARI) and normalized red (Rn) presented the highest R2 (0.96) with car of mandarin fruits, while the excess red vegetation index (ExR) presented the highest R2 (0.84) with car of tomato fruits. SRIs such as RSI 710,600 and R730,650 showed the greatest R2 values with respect to Chl a (R2 = 0.80) for mandarin fruits, while the GI had the greatest R2 with Chl a (R2 = 0.68) for tomato fruits. Combining RGB indices and SRIs with DT and RF models would be a robust strategy for estimating the eight observed variables with reasonable accuracy. Regarding mandarin fruits, in the task of predicting Chl a, the DT-2HV model delivered exceptional results, registering an R2 of 0.993 with an RMSE of 0.149 for the training set, and an R2 of 0.991 with an RMSE of 0.114 for the validation set.
Likewise, for tomato fruits, the DT-5HV model demonstrated exemplary performance in Chl a prediction, achieving an R2 of 0.905 and an RMSE of 0.077 for the training dataset, and an R2 of 0.785 with an RMSE of 0.077 for the validation dataset. The overall outcomes showed that the RGB indices, the newly developed SRIs, and DT and RF models based on RGBI and SRIs could be used to evaluate the measured parameters of mandarin and tomato fruits.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
These datasets represent predictions and associated probabilities using four machine learning methods, associated with the collection: Mapping Wetlands with High Resolution Planet SuperDove Satellite Imagery: An Assessment of Machine Learning Models Across the Diverse Waterscapes of New Zealand (10.26021/canterburynz.c.7848596). The following datasets are available:
- HGB prediction
- HGB probability
- MLPC prediction
- MLPC probability
- Random forest prediction
- Random forest probability [this dataset]
- XGBoost prediction
- XGBoost probability
For details of the models developed, please see the collection and associated paper. The following files are available in each dataset, each representing an area within New Zealand:
- xxxxx_mmm_prediction.tif: model prediction, encoded as 8-bit integers where 1 is predicted as wetland (>50% probability), and NA (no data) is non-wetland.
- xxxxx_mmm_probability.tif: model wetland probability, encoded as 16-bit integers, with probability values from 0 to 1 rescaled from 0 to 10,000. Divide the values by 10,000 to obtain probabilities to four decimal places.
In the tile filenames, xxxxx refers to the UUID of the grid area, which can be found in the file nzgrid_uuid.gpkg, and mmm is a code which refers to the model used:
- hgb: histogram gradient boost
- mlpc: multi-layer perceptron classification
- rf: random forest
- xgb: extreme gradient boosting
In addition to the tif images, two virtual raster tile files are included to enable mapping at the national scale: _mmm_prediction.vrt and _mmm_probability.vrt. All tif images are saved as cloud optimised GeoTIFF (COG), which makes them fast to display even at a national level, although it increases the data size. Total size is around 700 MB for the prediction datasets, and ~75 GB for the probability datasets. Metadata for the Planet SuperDove imagery used for each pixel of the predictions is available here: https://doi.org/10.26021/canterburynz.29231837.v
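Undoing the 16-bit probability rescaling is a single division. A minimal sketch (plain lists stand in for the raster values; in practice one would read the COG tiles with a library such as rasterio or GDAL):

```python
def to_probability(raster_values, scale=10_000):
    """Convert 16-bit integer probability values (0..10000) back to floats in [0, 1]."""
    return [v / scale for v in raster_values]

# Example integer pixel values as stored in a xxxxx_mmm_probability.tif tile.
probs = to_probability([0, 5000, 9937, 10000])
```

Each recovered value is a wetland probability to four decimal places, as described above.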
The dataset used in this paper for an RF-based UAV detection and identification system using machine learning.