2 datasets found
  1. Dataset for "SpecTf: Transformers Enable Data-Driven Imaging Spectroscopy...

    • zenodo.org
    bin, csv, pdf
    Updated Jan 10, 2025
    Citation: Jake Lee; Michael Kiper; David R. Thompson; Philip Brodrick (2025). Dataset for "SpecTf: Transformers Enable Data-Driven Imaging Spectroscopy Cloud Detection" [Dataset]. http://doi.org/10.5281/zenodo.14614218
    Available download formats: bin, pdf, csv
    Dataset updated: Jan 10, 2025
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Jake Lee; Michael Kiper; David R. Thompson; Philip Brodrick
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    SpecTf: Transformers Enable Data-Driven Imaging Spectroscopy Cloud Detection

    Summary

    Manuscript in review. Preprint: https://arxiv.org/abs/2501.04916

    This repository contains the dataset used to train and evaluate the Spectroscopic Transformer model for EMIT cloud screening.

    • spectf_cloud_labelbox.hdf5
      • 1,841,641 labeled spectra from 221 EMIT scenes.
    • spectf_cloud_mmgis.hdf5
      • 1,733,801 labeled spectra from 313 EMIT scenes.
      • These scenes were specifically labeled to correct false detections made by an earlier version of the model.
    • train_fids.csv
      • 465 EMIT scenes comprising the training set.
    • test_fids.csv
      • 69 EMIT scenes comprising the held-out validation set.
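The counts above are mutually consistent. The two label files cover 221 + 313 = 534 scenes, which matches 465 training plus 69 validation scenes. Their spectra sum to 1,841,641 + 1,733,801 = 3,575,442.

The sketch below checks the split after download. It assumes each CSV holds one FID per row in its first column, with an optional header row; the column layout of `train_fids.csv` and `test_fids.csv` is not documented here. `read_fids` and `check_split` are hypothetical helpers.

```python
import csv

def read_fids(path, has_header=False):
    """Read scene FIDs from the first column of a CSV file.

    Assumption (not documented by the dataset): one FID per row in
    column 0; pass has_header=True if the file starts with a header row.
    """
    with open(path, newline="") as fh:
        rows = csv.reader(fh)
        if has_header:
            next(rows, None)  # skip the header row
        return [row[0].strip() for row in rows if row and row[0].strip()]

def check_split(train_fids, test_fids, expected_train=465, expected_test=69):
    """Confirm the scene-level split is disjoint and has the documented sizes.

    Returns the total number of distinct scenes across both splits.
    """
    train, test = set(train_fids), set(test_fids)
    overlap = train & test
    if overlap:
        raise ValueError(f"{len(overlap)} scene(s) appear in both splits")
    if len(train) != expected_train or len(test) != expected_test:
        raise ValueError(f"unexpected split sizes: {len(train)} train, {len(test)} test")
    return len(train) + len(test)
```

If the files match the description, `check_split(read_fids("train_fids.csv"), read_fids("test_fids.csv"))` should return 534. Set `has_header` to suit the actual files.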

    Version 2 adds validation_scenes.pdf, which shows the 69 validation scenes in RGB and false color, together with their existing baseline cloud masks and the cloud masks produced by the ANN and GBT reference models and by the SpecTf model.

    Data Description

    221 EMIT scenes were initially selected for labeling with diversity in mind. After sparse segmentation labeling of confident regions in Labelbox, up to 10,000 spectra were selected per class per scene to form the spectf_cloud_labelbox dataset. We then applied a preliminary model, trained on these spectra, to all EMIT scenes observed in March 2024, and used MMGIS's polygonal labeling tool to label another 313 EMIT scenes, correcting the model's false positive and false negative detections. After spectra were sampled from these scenes in the same way, a total of 3,575,442 spectra had been labeled and sampled.
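The capped sampling step, up to 10,000 spectra per class per scene, can be sketched as follows. The sketch assumes uniform random sampling without replacement within each (scene, class) group. Section 4.2 of the paper remains the authority on the actual procedure, and `sample_per_class_per_scene` is a hypothetical name.

```python
import random
from collections import defaultdict

def sample_per_class_per_scene(fids, labels, cap=10_000, seed=0):
    """Return sorted indices of at most `cap` spectra per (scene FID, class) pair.

    Assumption: uniform random sampling without replacement within each group.
    """
    groups = defaultdict(list)
    for i, (fid, label) in enumerate(zip(fids, labels)):
        groups[(fid, label)].append(i)
    rng = random.Random(seed)  # seeded so the selection is reproducible
    keep = []
    for members in groups.values():
        if len(members) <= cap:
            keep.extend(members)  # small groups are kept whole
        else:
            keep.extend(rng.sample(members, cap))
    return sorted(keep)
```

Grouping by the (FID, label) pair is what makes the cap apply per class *and* per scene. A scene dominated by one class therefore cannot crowd out the other class.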

    The train/test split was drawn randomly by scene FID, so that no EMIT scene contributes spectra to both the training and validation sets.
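Splitting by scene FID rather than by individual spectrum keeps every spectrum from a given scene on the same side of the split. Below is a minimal sketch of such a grouped random split. The seed and the helper name `split_by_scene` are illustrative assumptions, not the authors' code.

```python
import random

def split_by_scene(fids, n_test, seed=0):
    """Randomly assign whole scenes to train or test.

    Returns (train_idx, test_idx): indices of spectra whose scene
    landed in each split.
    """
    scenes = sorted(set(fids))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(scenes)
    test_scenes = set(scenes[:n_test])
    train_idx = [i for i, f in enumerate(fids) if f not in test_scenes]
    test_idx = [i for i, f in enumerate(fids) if f in test_scenes]
    return train_idx, test_idx
```

For this dataset, `n_test` would be 69. A per-spectrum random split would instead let neighboring, highly correlated pixels from one scene appear in both sets.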

    Please refer to Section 4.2 of the paper for a complete description, and to our code repository for example usage and a PyTorch dataloader.

    Each HDF5 file contains the following arrays:

    • 'spectra'
    • 'fids'
      • The FID from which each spectrum was sampled
      • Binary string of shape (n,)
    • 'indices'
      • The (col, row) index from which each spectrum was sampled
      • Int64 of shape (n, 2)
    • 'labels'
      • Annotation label of each spectrum
        • 0 - "Clear"
        • 1 - "Cloud"
        • 2 - "Cloud Shadow" (present only in the Labelbox dataset; for this work it was merged into the Clear class. See the paper for details.)
          • label[label==2] = 0
      • Int64 of shape (n,2)
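Cloud Shadow (2) was folded into Clear for this work, so labels need remapping before training, as the `label[label==2] = 0` line above indicates. Below is a NumPy sketch of that remapping that also reports the resulting class balance. It copies the array rather than editing it in place, and `binarize_labels` and `class_counts` are hypothetical helper names.

```python
import numpy as np

CLEAR, CLOUD, CLOUD_SHADOW = 0, 1, 2  # label codes listed above

def binarize_labels(labels):
    """Merge Cloud Shadow into Clear, leaving a binary Clear/Cloud labeling."""
    out = np.array(labels, dtype=np.int64, copy=True)  # leave the input untouched
    out[out == CLOUD_SHADOW] = CLEAR  # same rule as label[label==2] = 0
    if not np.isin(out, (CLEAR, CLOUD)).all():
        raise ValueError("unexpected label value")
    return out

def class_counts(labels):
    """Count spectra per class after binarization."""
    values, counts = np.unique(labels, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))
```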

    Each HDF5 file contains the following attribute:

    • 'bands'
      • The band center wavelengths (nm) of the spectrum
      • Float64 of shape (268,)
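Below is an h5py sketch that loads the arrays and the `bands` attribute together. It assumes the dataset names above sit at the root of each file and that `bands` is a file-level attribute (`f.attrs`). `load_spectf_file`, `decode_fids` and `select_scenes` are hypothetical helpers. Prefer the authors' own PyTorch dataloader from their code repository where it fits.

```python
import numpy as np

def decode_fids(raw):
    """Convert byte-string FIDs to Python str (non-bytes values pass through str)."""
    return np.array([x.decode("utf-8") if isinstance(x, bytes) else str(x) for x in raw])

def load_spectf_file(path):
    """Read one spectf_cloud_*.hdf5 file fully into memory.

    Assumption: datasets at the file root and 'bands' stored in f.attrs.
    """
    import h5py  # imported here so the other helpers work without h5py installed
    with h5py.File(path, "r") as f:
        return {
            "spectra": f["spectra"][()],
            "fids": decode_fids(f["fids"][()]),
            "indices": f["indices"][()],  # (col, row) of each spectrum
            "labels": f["labels"][()],
            "bands": np.asarray(f.attrs["bands"]),  # band centers in nm
        }

def select_scenes(data, scene_fids):
    """Keep only spectra whose FID is in scene_fids (e.g. the training list)."""
    mask = np.isin(data["fids"], list(scene_fids))
    # 'bands' describes the spectral axis, not individual spectra, so it is not masked
    return {k: (v if k == "bands" else v[mask]) for k, v in data.items()}
```

Filtering with a boolean mask along the first axis keeps `spectra`, `fids`, `indices` and `labels` aligned row for row.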

    Acknowledgements

    The EMIT online mapping tool was developed by the JPL MMGIS team. The High Performance Computing resources used in this investigation were provided by funding from the JPL Information and Technology Solutions Directorate.

    This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).

    © 2024 California Institute of Technology. Government sponsorship acknowledged.

  2. Dataset for "Spectroscopic Transformer for Improved EMIT Cloud Masks"

    • zenodo.org
    bin, csv
    Updated Jan 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Citation: Jake Lee; Michael Kiper; David R. Thompson; Philip Brodrick (2025). Dataset for "Spectroscopic Transformer for Improved EMIT Cloud Masks" [Dataset]. http://doi.org/10.5281/zenodo.14607938
    Available download formats: csv, bin
    Dataset updated: Jan 7, 2025
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Jake Lee; Michael Kiper; David R. Thompson; Philip Brodrick
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    Spectroscopic Transformer for Improved EMIT Cloud Masks

    Summary

    Manuscript in preparation/submitted.

    This repository contains the dataset used to train and evaluate the Spectroscopic Transformer model for EMIT cloud screening.

    • spectf_cloud_labelbox.hdf5
      • 1,841,641 labeled spectra from 221 EMIT scenes.
    • spectf_cloud_mmgis.hdf5
      • 1,733,801 labeled spectra from 313 EMIT scenes.
      • These scenes were specifically labeled to correct false detections made by an earlier version of the model.
    • train_fids.csv
      • 465 EMIT scenes comprising the training set.
    • test_fids.csv
      • 69 EMIT scenes comprising the held-out validation set.

    Data Description

    221 EMIT scenes were initially selected for labeling with diversity in mind. After sparse segmentation labeling of confident regions in Labelbox, up to 10,000 spectra were selected per class per scene to form the spectf_cloud_labelbox dataset. We then applied a preliminary model, trained on these spectra, to all EMIT scenes observed in March 2024, and used MMGIS's polygonal labeling tool to label another 313 EMIT scenes, correcting the model's false positive and false negative detections. After spectra were sampled from these scenes in the same way, a total of 3,575,442 spectra had been labeled and sampled.

    The train/test split was drawn randomly by scene FID, so that no EMIT scene contributes spectra to both the training and validation sets.

    Please refer to Section 4.2 of the paper for a complete description, and to our code repository for example usage and a PyTorch dataloader.

    Each HDF5 file contains the following arrays:

    • 'spectra'
    • 'fids'
      • The FID from which each spectrum was sampled
      • Binary string of shape (n,)
    • 'indices'
      • The (col, row) index from which each spectrum was sampled
      • Int64 of shape (n, 2)
    • 'labels'
      • Annotation label of each spectrum
        • 0 - "Clear"
        • 1 - "Cloud"
        • 2 - "Cloud Shadow" (present only in the Labelbox dataset; for this work it was merged into the Clear class. See the paper for details.)
          • label[label==2] = 0
      • Int64 of shape (n,2)

    Each HDF5 file contains the following attribute:

    • 'bands'
      • The band center wavelengths (nm) of the spectrum
      • Float64 of shape (268,)

    Acknowledgements

    The EMIT online mapping tool was developed by the JPL MMGIS team. The High Performance Computing resources used in this investigation were provided by funding from the JPL Information and Technology Solutions Directorate.

    This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).

    © 2024 California Institute of Technology. Government sponsorship acknowledged.
