In the sugar industry, it is very useful to determine the sizes and shape types of crystals in A, B, and C massecuite. The major hurdle in this task is that crystals are very hard to detect with image processing techniques alone; a deep learning model can produce more accurate segmentation results. This dataset contains microscopic images of massecuite samples and their corresponding masks.
The dataset is stored in an HDF5 file containing two types of records: the original images and their masks. You need to install h5py before using this dataset; it can be installed by running the following command.
pip install h5py
To access the data, run the following:
import h5py
# Open the HDF5 file in read-only mode, then load all images and masks into memory as NumPy arrays.
f = h5py.File('dataset.h5', 'r')
images, masks = f['images'][:], f['masks'][:]
GLAH06 is used in conjunction with GLAH05 to create the Level-2 altimetry products. Level-2 altimetry data provide surface elevations for ice sheets (GLAH12), sea ice (GLAH13), land (GLAH14), and oceans (GLAH15). Data also include the laser footprint geolocation and reflectance, as well as geodetic, instrument, and atmospheric corrections for range measurements. The Level-2 elevation products are regional products archived at 14 orbits per granule, starting and stopping at the same demarcation (±50° latitude) as GLAH05 and GLAH06. Each regional product is processed with algorithms specific to that surface type. Surface type masks define which data are written to each of the products. If any data within a given record fall within a specific mask, the entire record is written to the product. Masks can overlap: for example, non-land data in the sea ice region may be written to the sea ice and ocean products. This means that an algorithm may write the same data to more than one Level-2 product. In this case, different algorithms calculate the elevations in their respective products. The surface type masks are versioned and archived at NSIDC, so users can tell which data to expect in each product. Each data granule has an associated browse product.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Manuscript in preparation/submitted.
This repository contains the dataset used to train and evaluate the Spectroscopic Transformer model for EMIT cloud screening.
221 EMIT scenes were initially selected for labeling with diversity in mind. After sparse segmentation labeling of confident regions in Labelbox, up to 10,000 spectra were selected per class per scene to form the spectf_cloud_labelbox dataset. We deployed a preliminary model trained on these spectra on all EMIT scenes observed in March 2024, then labeled another 313 EMIT scenes using MMGIS's polygonal labeling tool to correct false positive and false negative detections. After similarly sampling spectra from these scenes, a total of 3,575,442 spectra were labeled and sampled.
The train/test split was randomly determined by scene FID to prevent the same EMIT scene from contributing spectra to both the training and validation datasets.
Please refer to Section 4.2 in the paper for a complete description, and to our code repository for example usage and a PyTorch dataloader.
Each HDF5 file contains the following arrays:
Each HDF5 file contains the following attribute:
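As a starting point, the arrays and attributes in each file can be enumerated with h5py; this is a minimal inspection sketch, and the file name below is a placeholder.
import h5py
# Placeholder file name; substitute one of the spectf_cloud HDF5 files.
with h5py.File('spectf_cloud.hdf5', 'r') as f:
    # Print every stored array with its shape and dtype.
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)
    # Print the file-level attributes.
    for key, value in f.attrs.items():
        print(key, '=', value)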
The EMIT online mapping tool was developed by the JPL MMGIS team. The High Performance Computing resources used in this investigation were provided by funding from the JPL Information and Technology Solutions Directorate.
This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).
© 2024 California Institute of Technology. Government sponsorship acknowledged.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Climatological radar rainfall dataset of 5-minute precipitation depths on a 1-km grid, adjusted using validated and complete rain gauge data from both KNMI rain gauge networks. It is the same dataset as "RAD_NL25_RAC_MFBS_5min", except that an Extended Mask (EM) has been applied. As a result, data are also available up to tens of kilometers beyond the land surface of the Netherlands, i.e., over Belgium, Germany, and open water. This dataset is updated once a month, providing data up to a few months before the present.
The Custom Silicone Mask Attack Dataset (CSMAD) contains presentation attacks made of six custom-made silicone masks. Each mask cost about USD 4000. The dataset is designed for face presentation attack detection experiments.
The Custom Silicone Mask Attack Dataset (CSMAD) has been collected at the Idiap Research Institute. It is intended for face presentation attack detection experiments, where the presentation attacks have been mounted using a custom-made silicone mask of the person (or identity) being attacked.
The dataset contains videos of face presentations, as well as a set of files specifying the experimental protocols corresponding to the experiments presented in the publication below.
Reference
If you publish results using this dataset, please cite the following publication.
Sushil Bhattacharjee, Amir Mohammadi and Sebastien Marcel: "Spoofing Deep Face Recognition With Custom Silicone Masks." in Proceedings of International Conference on Biometrics: Theory, Applications, and Systems (BTAS), 2018.
DOI: 10.1109/BTAS.2018.8698550
http://publications.idiap.ch/index.php/publications/show/3887
Data Collection
Face-biometric data has been collected from 14 subjects to create this dataset. Subjects participating in this data collection have played three roles: targets, attackers, and bona fide clients. The subjects represented in the dataset are referred to here with the letter codes A..N. Subjects A..F have also served as targets; that is, face data for these six subjects has been used to construct their corresponding flexible masks (made of silicone). These masks were made by Nimba Creations Ltd., a special effects company.
Bona fide presentations have been recorded for all subjects A..N. Attack presentations (presentations where a subject wears one of the six masks) have been recorded for all six targets, each time by a different attacker wearing the mask in question. This is one way of increasing the variability in the dataset. Another way we have augmented the variability of the dataset is by capturing presentations under different illumination conditions. Presentations have been captured in four different lighting conditions:
All presentations have been captured with a green uniform background. See the paper mentioned above for more details of the data-collection process.
Dataset Structure
The dataset is organized in three subdirectories: ‘attack’, ‘bonafide’, ‘protocols’. The two directories: ‘attack’ and ‘bonafide’ contain presentation-videos and still images for attacks and bona fide presentations, respectively. The folder ‘protocols’ contains text files specifying the experimental protocol for vulnerability analysis of face-recognition (FR) systems.
The number of data files per category is as follows:
The folder ‘attack/WEAR’ contains videos where the attack has been made by a person (attacker) wearing the mask of the target being attacked. The ‘attack/STAND’ folder contains videos where the attack has been made using the target’s mask mounted on an appropriate stand.
Video File Format
The video files for the face presentations are in HDF5 format (with the file extension ‘.h5’). The folder structure of the HDF5 file is shown in Figure 1. Each file contains data collected using two cameras:
As shown in Figure 1, frames from the different channels (color, infrared, depth, thermal) from the two cameras are stored in separate directory hierarchies in the HDF5 file. Each file represents a video of approximately 10 seconds, or roughly 300 frames.
In the HDF5 file, the directory for the SR300 also contains a subdirectory named ‘aligned_color_to_depth’. This folder contains post-processed data, where the frames of the depth channel have been aligned with those of the color channel based on the time-stamps of the frames.
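As an illustration, the hierarchy of one presentation file can be inspected with h5py; the file name and group paths used below are assumptions based on the description above, not verified dataset paths.
import h5py
# File name and internal group names are assumptions for illustration;
# consult Figure 1 and the dataset README for the exact hierarchy.
with h5py.File('presentation.h5', 'r') as f:
    f.visit(print)  # print the full directory tree of the file
    # Example: count frames in the depth-aligned color stream, if present.
    group = f.get('SR300/aligned_color_to_depth')
    if group is not None:
        print('aligned frames:', len(group))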
Experimental Protocol
The ‘protocols’ folder contains text files that specify the protocols for vulnerability analysis experiments reported in the paper mentioned above. Please see the README file in the protocols folder for details.
https://eoportal.eumetsat.int/userMgmt/terms.faces
Cloud Mask products from the Regional NOAA and Metop services (1-minute duration). The EUMETSAT Advanced Retransmission Service (EARS) provides instrument data from the Metop and NOAA satellites collected via a network of Direct Readout stations. The product is generated with the NWC SAF Polar Platform package. Segments of one-minute duration are disseminated to users via EUMETCast and these segments can be concatenated together by users to construct a regional pass.
# California Burned Areas Dataset

This is the second part of the dataset.

### Dataset Summary

This dataset contains images from Sentinel-2 satellites taken before and after a wildfire. The ground truth masks are provided by the California Department of Forestry and Fire Protection and they are mapped on the images.

### Supported Tasks

The dataset is designed to do binary semantic segmentation of burned vs. unburned areas.

## Dataset Structure

### Dataset opening

The dataset was compressed using `h5py` and BZip2 from `hdf5plugin`. **WARNING: `hdf5plugin` is necessary to extract data.**

### Data Instances

Each matrix has a shape of 5490x5490xC, where C is 12 for pre-fire and post-fire images, while it is 0 for binary masks.

### Data Fields

In each HDF5 file, you can find post-fire and pre-fire images and binary masks. The file is structured in this way:

```bash
├── foldn
│   ├── uid0
│   │   ├── pre_fire
│   │   ├── post_fire
│   │   ├── mask
│   ├── uid1
│   │   ├── post_fire
│   │   ├── mask
├── foldm
│   ├── uid2
│   │   ├── post_fire
│   │   ├── mask
│   ├── uid3
│   │   ├── pre_fire
│   │   ├── post_fire
│   │   ├── mask
...
```

where `foldn` and `foldm` are fold names and `uidn` is a unique identifier for the wildfire.

### Data Splits

There are 5 random splits whose names are: 0, 1, 2, 3 and 4.

## Dataset Creation

### Source Data

#### Initial Data Collection and Normalization

Data are collected directly from the Copernicus Open Access Hub through the API. The band files are aggregated into one single matrix.
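As a minimal sketch of the opening step described above (the file, fold, and UID names are placeholders; importing `hdf5plugin` registers the BZip2 filter that h5py needs to decompress the data):

```python
import hdf5plugin  # noqa: F401 -- registers the BZip2 filter used to compress the data
import h5py

# 'burned_areas.hdf5', '0', and 'uid0' are placeholder names; list f.keys() for real ones.
with h5py.File('burned_areas.hdf5', 'r') as f:
    uid = f['0']['uid0']
    post_fire = uid['post_fire'][...]  # 5490x5490x12 post-fire Sentinel-2 image
    mask = uid['mask'][...]            # binary burned-area mask
    print(post_fire.shape, mask.shape)
```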
https://www.gnu.org/licenses/gpl-3.0-standalone.html
Simulated transmission images of a hopfion for testing MARTApp, an application developed at the ALBA Synchrotron for the analysis and reconstruction of magnetic tomographies performed at the BL09 MISTRAL beamline.
The dataset consists of two orthogonal tilt series (TS1 and TS2) and their magnetic reconstruction using MARTApp.
The transmission images, of size 588x588, are located in the following files:
- Hopfion_1_FFnorm_stack.hdf5: contains the transmission images for the C+ polarization, TS1.
- Hopfion_-1_FFnorm_stack.hdf5: contains the transmission images for the C- polarization, TS1.
- HopfionRot_1_FFnorm_stack.hdf5: contains the transmission images for the C+ polarization, TS2.
- HopfionRot_-1_FFnorm_stack.hdf5: contains the transmission images for the C- polarization, TS2.
Images are saved in the HDF5 dataset /TomoNormalized/TomoNormalized/. The dataset /TomoNormalized/polarization/ indicates the polarization, where '1' corresponds to C+ and '-1' to C-. The tilt angles are located in the dataset /TomoNormalized/rotation_angle/.
The recovered reconstruction using MARTApp is included in the file MagneticReconstruction_Norm.hdf5. This file contains the three components of the magnetization in the datasets /mx/, /my/, and /mz/. The dataset /Mask3D/ includes the mask used for registration between TS1 and TS2.
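For example, the reconstruction and registration mask can be read with h5py using the dataset paths listed above; this is a sketch, not part of MARTApp itself.
import h5py
import numpy as np
# Dataset paths follow the layout described above.
with h5py.File('MagneticReconstruction_Norm.hdf5', 'r') as f:
    mx = f['mx'][...]
    my = f['my'][...]
    mz = f['mz'][...]
    mask3d = f['Mask3D'][...]
# Magnitude of the reconstructed magnetization, zeroed outside the registration mask.
magnitude = np.sqrt(mx**2 + my**2 + mz**2) * (mask3d > 0)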
Overview: This repository stores cloud masks and MDGMs for Martian Years (MYs) 28-33. MDGMs were obtained from Harvard Dataverse (https://doi.org/10.7910/DVN/U3766S), and cloud masks were created using the cloudmask model (https://github.com/03kalven/cloudmask). The cloud masks contained in the binary folders have already been binarized using the threshold of 0.912. This dataset is considerably smaller than the floating-point cloud mask dataset and is suited for researchers who prefer to use the default threshold of 0.912.

Quick breakdown of the files and folders:
- phase folders contain the complete set of MDGMs and cloud masks for Mars Reconnaissance Orbiter mission phases P, B, G, D, F, and J (MYs 28-33)
- phase_binary folders contain the complete set of binary MDGMs and cloud masks for Mars Reconnaissance Orbiter mission phases P, B, G, D, F, and J (MYs 28-33)
- view_masks.ipynb has a few handy methods to plot cloud masks and MDGMs

Phase folders: Each folder is organized based on phase (P, B, G, D, F, J) and subphase (_01 to _23). In each subphase, there are cloudmasks and mdgms folders, as well as a .txt file with MY and solar longitude (Ls) data for each day.

MDGM and cloud mask formats:
- mdgm: JPEG, 3600x1801
- cloudmask: NETCDF4_CLASSIC data model, HDF5 file format; dimensions: x(3600), y(1801); variables: float32 longitude(x), float32 latitude(y), float32/int16 cloudmask(y, x)

The cloud masks' values for any pixel are -999 for NaN, or a float from 0 to 1 reporting the model's confidence in that pixel being a cloud. The cloud masks can be binarized using get_cloudmask() included in view_masks.ipynb. A binarized mask reports -999 for NaN, 0 for no cloud, and 1 for cloud. The default threshold is 0.912, but this value can be adjusted if desired. The cloud masks' (0,0) coordinate is the lower left corner of the map, so the cloudmask may need to be flipped vertically before plotting on a Martian map. The cloud mask NetCDF files are constructed in the same way as Wang and González Abad's (https://doi.org/10.7910/DVN/WU6VZ8).

References:
- Wang, Huiqun (Helen); González Abad, Gonzalo, 2021, "Cloud Masks derived from Mars Daily Global Maps for MRO", https://doi.org/10.7910/DVN/WU6VZ8, Harvard Dataverse, V1.
- Wang, Huiqun (Helen); Battalio, Michael; Huber, Zachary, 2020, "Replication Data for: MRO MARCI Mars Daily Global Maps V2", https://doi.org/10.7910/DVN/U3766S, Harvard Dataverse, V2.
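For illustration, a minimal sketch of loading and binarizing one mask with the netCDF4 package, mirroring the get_cloudmask() behavior described above (the file name is a placeholder):
import netCDF4
import numpy as np
# Placeholder file name; real files live in the per-subphase 'cloudmasks' folders.
ds = netCDF4.Dataset('cloudmask_example.nc')
cm = np.array(ds.variables['cloudmask'][:])  # shape (y, x) = (1801, 3600)
# Binarize at the default threshold while preserving the -999 NaN fill value.
binary = np.where(cm == -999, -999, (cm >= 0.912).astype(np.int16))
# (0,0) is the lower-left corner, so flip vertically before plotting on a map.
binary = np.flipud(binary)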
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This texture dataset is intended for training, testing, and validation of a convolutional neural network used in the task of building cloud and snow cover masks from data received by the multi-zone scanning device for hydrometeorological support (MSU-GS) installed on the Russian satellite Electro-L No. 2. All data are stored in a single HDF5 file. The data are divided into three groups, "train", "test", and "val", for training, testing, and validating the neural network, respectively. Each group contains a set of classes: "Surface", "Snow", and "Cloud". Each class consists of a set of texture blocks and has the dimension NxTxTxM, where N is the number of texture-block samples for the given class and group, TxT is the spatial resolution of one texture in a block in pixels (in our case 11x11), and M is the number of textures in a block (in our case 5: 3 textures with reflectance data from the 0.6, 0.7, and 0.9 μm channels, and 2 textures with brightness temperature data from the 3.8 and 10.7 μm channels of the MSU-GS instrument).
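For example, the texture blocks for one group and class can be read as follows; the group and class names come from the description above, while the file name is a placeholder.
import h5py
# The file name is a placeholder; group ('train'/'test'/'val') and class
# ('Surface'/'Snow'/'Cloud') names are taken from the description above.
with h5py.File('msu_gs_textures.h5', 'r') as f:
    cloud_train = f['train']['Cloud'][...]  # shape: N x 11 x 11 x 5
    print(cloud_train.shape)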
GLAH13 contains sea ice and open ocean elevations corrected for geodetic and atmospheric effects, calculated from algorithms fine-tuned for sea ice returns. Granules contain 14 orbits of data within a sea ice mask. (Suggested usage: GLAH13 contains sea ice, open ocean, and iceberg elevations corrected for geodetic and atmospheric effects, calculated from algorithms fine-tuned for sea ice returns. Granules contain 14 orbits of data within the sea ice mask. Each GLAH13 file was created from an equivalent GLA13 binary file. The data used to create the GLAH13 values are contained in the equivalent GLAHxx files for the GLAxx files. See the provenance metadata for the creation of the GLA13.)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for training and evaluating RFI detection schemes, representing MeerKAT instrumentation and predominantly satellite-based contamination. These datasets are produced using Tabascal and output in HDF5 format. The choice of format is meant to allow easy use with machine-learning workflows rather than other astronomy pipelines (for example, measurement sets). These datasets are prepared for immediate loading with TensorFlow. The attached config.json files describe the parameters used to generate these datasets.
Dataset parameters

| Name | Num Satellite Sources | Num Ground RFI Sources |
| --- | --- | --- |
| obs_100AST_0SAT_0GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 0 | 0 |
| obs_100AST_1SAT_0GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 1 | 0 |
| obs_100AST_1SAT_3GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 1 | 3 |
| obs_100AST_2SAT_0GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 2 | 0 |
| obs_100AST_2SAT_3GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09 | 2 | 3 |
Using simulated data allows access to ground truth for noise contamination. As such, these datasets contain the observation visibility amplitudes (without noise), noise visibilities, and boolean pixel-wise masks at several thresholds on the noise visibilities. We outline the dimensions of all datasets below:
Dataset Dimensions

| Field | Datatype |
| --- | --- |
| vis | float32 |
| masks_orig | float32 |
| masks_0 | bool |
| masks_1 | bool |
| masks_2 | bool |
| masks_4 | bool |
| masks_8 | bool |
| masks_16 | bool |
Of course, one can produce masks at arbitrary thresholds, but for convenience, we include several pre-computed options.
All datasets and all fields have the dimensions 512, 512, 512, 1 (baseline, time, frequency, amplitude/mask).
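Assuming each field above is stored as a same-named HDF5 dataset (an assumption about the file layout; the config.json files describe the actual parameters), a minimal loading sketch looks like this:
import h5py
import numpy as np
# Field names come from the table above; treating them as top-level
# HDF5 datasets is an assumption about the file layout.
name = 'obs_100AST_1SAT_3GRD_512BSL_64A_512T-0440-1462_016I_512F-1.227e+09-1.334e+09'
with h5py.File(name + '.hdf5', 'r') as f:
    vis = f['vis'][...]       # float32 visibility amplitudes (without noise)
    mask = f['masks_0'][...]  # boolean pixel-wise RFI mask at threshold 0
# Fraction of flagged pixels at this threshold.
print('flagged fraction:', np.mean(mask))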
GLAH12 contains the ice sheet elevation and elevation distribution corrected for geodetic and atmospheric effects, calculated from algorithms fine-tuned for ice sheet returns. Data granules contain 14 orbits of data within the ice sheet mask. (Suggested usage: GLAH12 contains ice sheet elevation and elevation distribution calculated from algorithms fine-tuned for ice sheet returns, for use by researchers. Parameters are at the full 40 Hz resolution that fall within the ICESat ice sheet mask. Each GLAH12 file was created from an equivalent GLA12 binary file. The data used to create the GLAH12 values are contained in the equivalent GLAHxx files for the GLAxx files. See the provenance metadata for the creation of the GLA12.)
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Here we provide time-variant training and testing data for the GlottisNetV2 study. In particular, this dataset relies on the Benchmark for Automatic Glottis Segmentation (BAGLS) dataset. The three-dimensional data (time, y, x) are used for experiments involving deep neural networks capable of processing time-variant data.
Each folder contains videos, each with the following data (a short reading sketch follows the list):
Endoscopic video as mp4 (*.mp4)
Glottis segmentation as mask-file (hdf5 container, *.mask)
Glottis segmentation as mp4 file (*_mask.mp4)
Metadata as JSON file (*.meta)
Glottal midline annotation as JSON file (*.points)
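As a sketch, the mask container and metadata of one video can be opened as follows; the internal layout of the .mask HDF5 container is an assumption, so enumerate its contents first.
import json
import h5py
# The internal layout of the '.mask' HDF5 container is an assumption;
# list its groups and datasets before relying on specific paths.
with h5py.File('example.mask', 'r') as f:
    f.visit(print)  # list the stored groups and datasets
# Metadata is a plain JSON file.
with open('example.meta') as fp:
    meta = json.load(fp)
print(sorted(meta.keys()))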
GLAH14 contains the land elevation and elevation distribution corrected for geodetic and atmospheric effects, calculated from algorithms fine-tuned for over-land returns. Data granules contain 14 orbits of data within the land mask.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regular observations at the Kodaikanal Solar Observatory (KoSO) began in 1904 using a white-light telescope with a 10-cm aperture lens and an f/15 light beam. Between 1912 and 1917, the objective lens was changed several times. In 1918, a 15-cm achromatic lens was installed. This new configuration produced a 20.4-cm image of the Sun in the image plane, captured on photographic plates. The same telescope was used from 1918 until 2017 to take regular white-light observations of the Sun. This data set provides the sunspot masks in HDF5 format for all white-light observations acquired at KoSO. Each HDF5 file contains the sunspot masks for all the observations of that year. The sunspot masks are provided in two different coordinate systems: (i) full disk, as observed, and (ii) Carrington heliographic coordinates, transformed from the full disk using nearest-point interpolation. Each data set also contains metadata in the form of HDF5 attributes. The Carrington-coordinate data are full-Sun maps: the near side of the Sun is the region where mask values are non-zero, whereas sunspot regions are filled with the value 2.
A Python package (KoSOpy), available on GitHub, is under development and can be used to navigate these data sets.
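Until KoSOpy is released, a yearly file can be inspected directly with h5py; the file name below is a placeholder, and the internal dataset names should be enumerated first.
import h5py
# Placeholder file name for one year of KoSO sunspot masks; internal dataset
# names are not documented here, so enumerate them before accessing.
with h5py.File('koso_sunspot_masks_1950.h5', 'r') as f:
    f.visit(print)  # list the per-observation masks
    # Metadata is stored as HDF5 attributes.
    for key, value in f.attrs.items():
        print(key, '=', value)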
CoastWatch swath data derived from AVHRR consists of full-pass swath-projection data from different AVHRR receiving stations. The files contain multiple data variables stored using the HDF-4 Scientific Data Sets (SDS) model. The product contents are channel 1 albedo, channel 2 albedo, channel 3a albedo, channel 3 brightness temperature, channel 4 brightness temperature, channel 5 brightness temperature, moisture-corrected sea surface temperature, 8-bit CLAVR ocean cloud mask, 2-bit CLAVR-X cloud mask, satellite zenith angle, solar zenith angle, and relative azimuth angle.