Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Manuscript in review. Preprint: https://arxiv.org/abs/2501.04916
This repository contains the dataset used to train and evaluate the Spectroscopic Transformer model for EMIT cloud screening.
v2 adds validation_scenes.pdf, a PDF showing the 69 validation scenes in RGB and false color, their existing baseline cloud masks, and the cloud masks produced by the ANN and GBT reference models and by the SpecTf model.
221 EMIT scenes were initially selected for labeling with diversity in mind. After sparse segmentation labeling of confident regions in Labelbox, up to 10,000 spectra were selected per class per scene to form the spectf_cloud_labelbox dataset. We deployed a preliminary model trained on these spectra on all EMIT scenes observed in March 2024, then labeled another 313 EMIT scenes using MMGIS's polygonal labeling tool to correct false positive and false negative detections. After similarly sampling spectra from these scenes, a total of 3,575,442 spectra were labeled and sampled.
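The per-class, per-scene cap described above can be illustrated with a minimal sketch. It is not the authors' sampling code: the `(scene_fid, class_label, pixel_id)` tuple layout, the function name `sample_spectra`, and the use of uniform random sampling are all assumptions made for illustration. The description only states that up to 10,000 spectra were kept per class per scene.

```python
import random
from collections import defaultdict

# Cap stated in the dataset description; everything else here is illustrative.
MAX_PER_CLASS_PER_SCENE = 10_000


def sample_spectra(labeled_pixels, cap=MAX_PER_CLASS_PER_SCENE, seed=0):
    """Keep at most `cap` pixels for each (scene, class) pair.

    `labeled_pixels` is a hypothetical iterable of
    (scene_fid, class_label, pixel_id) tuples.
    Uniform random sampling is an assumption, not a documented choice.
    """
    rng = random.Random(seed)

    # Group the labeled pixels by (scene, class).
    groups = defaultdict(list)
    for scene_fid, label, pixel_id in labeled_pixels:
        groups[(scene_fid, label)].append(pixel_id)

    # Subsample only the groups that exceed the cap.
    sampled = {}
    for key, pixels in groups.items():
        if len(pixels) > cap:
            pixels = rng.sample(pixels, cap)
        sampled[key] = pixels
    return sampled
```

Groups smaller than the cap are kept whole, so sparsely labeled classes are not discarded while heavily labeled ones cannot dominate a scene.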
The train/test split was randomly determined by scene FID to prevent the same EMIT scene from contributing spectra to both the training and validation datasets.
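A scene-level split of this kind can be sketched as follows. This is a hedged illustration, not the procedure used to build the released files: the function name `split_by_scene`, the `val_fraction` parameter, and its default value are hypothetical, and the actual split ratio is not stated here.

```python
import random


def split_by_scene(scene_fids, val_fraction=0.2, seed=0):
    """Split spectrum indices so that no scene appears on both sides.

    `scene_fids` holds one scene FID per spectrum (hypothetical layout).
    `val_fraction` is an arbitrary illustrative value, not the dataset's ratio.
    """
    # Shuffle the unique scenes, not the individual spectra.
    unique = sorted(set(scene_fids))
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Hold out a fraction of the scenes (at least one).
    n_val = max(1, round(len(unique) * val_fraction))
    val_scenes = set(unique[:n_val])

    # Assign every spectrum to the side its scene landed on.
    train_idx = [i for i, fid in enumerate(scene_fids) if fid not in val_scenes]
    val_idx = [i for i, fid in enumerate(scene_fids) if fid in val_scenes]
    return train_idx, val_idx
```

Splitting on scenes rather than on individual spectra matters because spectra from one scene share illumination and surface conditions, so a per-spectrum split would leak near-duplicates into the held-out set.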
Please refer to Section 4.2 in the paper for a complete description, and to our code repository for example usage and a PyTorch dataloader.
Each hdf5 file contains the following arrays:
Each hdf5 file contains the following attribute:
The EMIT online mapping tool was developed by the JPL MMGIS team. The High Performance Computing resources used in this investigation were provided by funding from the JPL Information and Technology Solutions Directorate.
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).
© 2024 California Institute of Technology. Government sponsorship acknowledged.