U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as from non-geospatial oblique and nadir imagery. Images cover a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow this naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes us ...
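A minimal sketch of inspecting a single NPZ file from one of these archives with NumPy. The file name and array keys below are hypothetical; list `data.files` to see what each file actually contains.

```python
# Minimal sketch: inspect one Coast Train NPZ file with NumPy.
# The file name and array keys are hypothetical assumptions.
import numpy as np

data = np.load("example_coasttrain_file.npz", allow_pickle=True)
print(data.files)  # names of the stored arrays (image, annotation, label mask, ...)
for key in data.files:
    arr = data[key]
    print(key, arr.shape, arr.dtype)
```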
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is composed of 81 pairs of correlated images. Each pair contains one image of an iron ore sample acquired through reflected light microscopy (RGB, 24-bit), and the corresponding binary reference image (8-bit), in which the pixels are labeled as belonging to one of two classes: ore (0) or embedding resin (255).
The sample came from an itabiritic iron ore concentrate from Quadrilátero Ferrífero (Brazil) mainly composed of hematite and quartz, with little magnetite and goethite. It was classified by size and concentrated with a dense liquid. Then, the fraction -149+105 μm with density greater than 3.2 was cold mounted with epoxy resin and subsequently ground and polished.
Correlative microscopy was employed for image acquisition. Thus, 81 fields were imaged on a reflected light microscope with a 10× (NA 0.20) objective lens and on a scanning electron microscope (SEM). In sequence, they were registered, resulting in images of 999×756 pixels with a resolution of 1.05 µm/pixel. Finally, the images from SEM were thresholded to generate the reference images.
Further description of this sample and its imaging procedure can be found in the work by Gomes and Paciornik (2012).
This dataset was created for developing and testing deep learning models on semantic segmentation tasks. The paper by Filippo et al. (2021) presented a variant of the DeepLabv3+ model that reached mean values of 91.43% overall accuracy and 93.13% F1 score over 5 rounds of experiments (training and testing), each with a different random initialization of network weights.
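A minimal sketch (with hypothetical file names) of how one image/reference pair could be loaded, the reference values {0: ore, 255: resin} mapped to {0, 1}, and a predicted mask scored with the overall accuracy and F1 metrics cited above; scikit-learn is used here for the metrics and is not part of the dataset.

```python
# Minimal sketch: load one RGB/reference pair and score a prediction.
# File names are hypothetical; y_pred stands in for a real model output.
import numpy as np
from PIL import Image
from sklearn.metrics import accuracy_score, f1_score

rgb = np.array(Image.open("field_01_rgb.png"))        # 24-bit reflected-light image
reference = np.array(Image.open("field_01_ref.png"))  # 8-bit binary reference (0 or 255)
y_true = (reference == 255).astype(np.uint8).ravel()  # 1 = embedding resin, 0 = ore

y_pred = y_true.copy()  # placeholder prediction, flattened the same way
print("overall accuracy:", accuracy_score(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
```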
For further questions and suggestions, please do not hesitate to contact us.
Contact email: ogomes@gmail.com
If you use this dataset in your own work, please cite this DOI: 10.5281/zenodo.5014700
Please also cite this paper, which provides additional details about the dataset:
Michel Pedro Filippo, Otávio da Fonseca Martins Gomes, Gilson Alexandre Ostwald Pedro da Costa, Guilherme Lucio Abelha Mota. Deep learning semantic segmentation of opaque and non-opaque minerals from epoxy resin in reflected light microscopy images. Minerals Engineering, Volume 170, 2021, 107007, https://doi.org/10.1016/j.mineng.2021.107007.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Target Image Segmentation Data is a dataset for instance segmentation tasks - it contains Targets annotations for 293 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Dataset Card for "brain-tumor-image-dataset-semantic-segmentation"
Dataset Description
The Brain Tumor Image Dataset (BTID) for Semantic Segmentation contains MRI images and annotations aimed at training and evaluating segmentation models. This dataset was sourced from Kaggle and includes detailed segmentation masks indicating the presence and boundaries of brain tumors. This dataset can be used for developing and benchmarking algorithms for medical image segmentation… See the full description on the dataset page: https://huggingface.co/datasets/dwb2023/brain-tumor-image-dataset-semantic-segmentation.
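Since the card is hosted on the Hugging Face Hub, it can presumably be loaded with the `datasets` library; the split and column names are assumptions, so inspect the returned object before using specific fields.

```python
# Minimal sketch: load the dataset from the Hugging Face Hub and inspect it.
from datasets import load_dataset

ds = load_dataset("dwb2023/brain-tumor-image-dataset-semantic-segmentation")
print(ds)                        # shows the available splits and columns
first_split = next(iter(ds))     # name of the first split (assumed to exist)
print(ds[first_split][0].keys()) # fields of the first record
```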
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Images and 2-class labels for semantic segmentation of Sentinel-2 and Landsat RGB satellite images of coasts (water, other)
Description
4088 images and 4088 associated labels for semantic segmentation of Sentinel-2 and Landsat RGB satellite images of coasts. The 2 classes are 1=water, 0=other. The imagery is a mixture of 10-m Sentinel-2 and 15-m pansharpened Landsat 7, 8, and 9 visible-band imagery of various sizes (red, green, and blue bands only).
These images and labels could be used within numerous Machine Learning frameworks for image segmentation, but have specifically been made for use with the Doodleverse software package, Segmentation Gym**.
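A minimal sanity-check sketch for one image/label pair, assuming the images and 2-class labels are stored as matched raster files (the file names and storage format here are assumptions, not part of the dataset description).

```python
# Minimal sketch: overlay a 2-class label (1=water, 0=other) on its image.
# File names are hypothetical.
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

image = np.array(Image.open("example_image.jpg"))  # RGB satellite tile
label = np.array(Image.open("example_label.png"))  # 1 = water, 0 = other

plt.imshow(image)
plt.imshow(label == 1, alpha=0.4, cmap="Blues")    # highlight the water class
plt.title("water mask overlay")
plt.axis("off")
plt.show()
```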
Two data sources have been combined
Dataset 1
Dataset 2
File descriptions
References
*Doodler: Buscombe, D., Goldstein, E.B., Sherwood, C.R., Bodine, C., Brown, J.A., Favela, J., Fitzpatrick, S., Kranenburg, C.J., Over, J.R., Ritchie, A.C. and Warrick, J.A., 2021. Human‐in‐the‐Loop Segmentation of Earth Surface Imagery. Earth and Space Science, e2021EA002085. https://doi.org/10.1029/2021EA002085. See https://github.com/Doodleverse/dash_doodler.
**Segmentation Gym: Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332 See: https://github.com/Doodleverse/segmentation_gym
***Coast Train data release: Wernette, P.A., Buscombe, D.D., Favela, J., Fitzpatrick, S., and Goldstein E., 2022, Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation: U.S. Geological Survey data release, https://doi.org/10.5066/P91NP87I. See https://coasttrain.github.io/CoastTrain/ for more information
****Buscombe, Daniel, Goldstein, Evan, Bernier, Julie, Bosse, Stephen, Colacicco, Rosa, Corak, Nick, Fitzpatrick, Sharon, del Jesús González Guillén, Anais, Ku, Venus, Paprocki, Julie, Platt, Lindsay, Steele, Bethel, Wright, Kyle, & Yasin, Brandon. (2022). Images and 4-class labels for semantic segmentation of Sentinel-2 and Landsat RGB satellite images of coasts (water, whitewater, sediment, other) (v1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7335647
*****Seale, C., Redfern, T., Chatfield, P. 2022. Sentinel-2 Water Edges Dataset (SWED) https://openmldata.ukho.gov.uk/
******Seale, C., Redfern, T., Chatfield, P., Luo, C. and Dempsey, K., 2022. Coastline detection in satellite imagery: A deep learning approach on new benchmark data. Remote Sensing of Environment, 278, p.113044.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Doodleverse/Segmentation Zoo/Seg2Map Res-UNet models for FloodNet/10-class segmentation of RGB 768x512 UAV images
These Residual-UNet model data are based on [FloodNet](https://github.com/BinaLab/FloodNet-Challenge-EARTHVISION2021) images and associated labels.
Models were created with Segmentation Gym* using the following dataset**: https://github.com/BinaLab/FloodNet-Challenge-EARTHVISION2021
Image size used by model: 768 x 512 x 3 pixels
classes:
1. Background
2. Building-flooded
3. Building-non-flooded
4. Road-flooded
5. Road-non-flooded
6. Water
7. Tree
8. Vehicle
9. Pool
10. Grass
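The class indices above can be kept in a simple mapping for converting model output indices to names. Whether the label rasters on disk use the 1–10 numbering shown here or a 0-based variant is an assumption to verify against the dataset itself.

```python
# Class-index mapping taken from the list above (index base is an assumption).
FLOODNET_CLASSES = {
    1: "Background",
    2: "Building-flooded",
    3: "Building-non-flooded",
    4: "Road-flooded",
    5: "Road-non-flooded",
    6: "Water",
    7: "Tree",
    8: "Vehicle",
    9: "Pool",
    10: "Grass",
}

def class_name(index: int) -> str:
    """Return the human-readable FloodNet class name for a label index."""
    return FLOODNET_CLASSES.get(index, "unknown")
```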
File descriptions
For each model, there are 5 files with the same root name:
1. '.json' config file: this is the file that was used by Segmentation Gym* to create the weights file. It contains instructions for how to make the model and the data it used, as well as instructions for how to use the model for prediction. It is a handy wee thing and mastering it means mastering the entire Doodleverse.
2. '.h5' weights file: this is the file that was created by the Segmentation Gym* function `train_model.py`. It contains the trained model's parameter weights. It can be called by the Segmentation Gym* function `seg_images_in_folder.py`. Models may be ensembled.
3. '_modelcard.json' model card file: this is a json file containing fields that collectively describe the model origins, training choices, and dataset that the model is based upon. There is some redundancy between this file and the `config` file (described above), which contains the instructions for model training and implementation. The model card file is not used by the program, but it is important metadata, so it should be kept with the other files that collectively make up the model; as such, it is considered part of the model.
4. '_model_history.npz' model training history file: this numpy archive file contains numpy arrays describing the training and validation losses and metrics. It is created by the Segmentation Gym function `train_model.py`
5. '.png' model training loss and mean IoU plot: this png file contains plots of training and validation losses and mean IoU scores during model training (a subset of the data inside the .npz file). It is created by the Segmentation Gym function `train_model.py`.
Additionally, BEST_MODEL.txt contains the name of the model with the best validation loss and mean IoU.
images.zip and labels.zip contain the images and labels, respectively, used to train the model.
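A minimal sketch of plotting training curves from a '_model_history.npz' file. The file name and the array key names ('loss', 'val_loss') are assumptions; print `history.files` to see which arrays the archive actually contains.

```python
# Minimal sketch: plot training curves from a '_model_history.npz' file.
# File name and key names are assumptions.
import numpy as np
import matplotlib.pyplot as plt

history = np.load("example_model_history.npz")
print(history.files)

for key in ("loss", "val_loss"):  # assumed keys
    if key in history.files:
        plt.plot(history[key], label=key)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```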
References
*Segmentation Gym: Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332 See: https://github.com/Doodleverse/segmentation_gym
** Rahnemoonfar, M., Chowdhury, T., Sarkar, A., Varshney, D., Yari, M. and Murphy, R.R., 2021. Floodnet: A high resolution aerial imagery dataset for post flood scene understanding. IEEE Access, 9, pp.89644-89654.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The collection includes beach coastlines from Southeastern Australia, specifically Victoria and New South Wales, used to train an image segmentation model based on the U-Net deep learning architecture for mapping sandy beaches. The dataset contains polygons that represent the outline or extent of the raster images, and polygons drawn by citizen scientists. Additionally, we provide the trained model itself, which can be used for further evaluation or refined through fine-tuning. The resulting predictions are also available in Shapefile format and can be loaded into NationalMap.
This collection supplements the publication: Regional-Scale Image Segmentation of Sandy Beaches: Comparison of Training and Prediction Across Two Extensive Coastlines in Southeastern Australia (Yong et al.). Lineage: The training dataset of citizen-science-drawn beach outlines and polygons was sourced from OpenStreetMap (OSM; https://www.openstreetmap.org/). Tiled images along the coast were sourced from Microsoft Bing imagery to process new beach outlines, as it is also one of the main sources of imagery used for drawing features in OSM. Note that the original OSM data is licensed under the ODbL, which should be considered when using the processed dataset; a Creative Commons licence was required to publish it in this portal, and CC BY was identified as the most suitable licence in the portal to align with the ODbL.
The saved deep learning model was trained on the dataset using a U-Net architecture, which is used to generate the predicted maps.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Semantic segmentation results using a training dataset of real underwater sonar images and synthetic underwater sonar images.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Target: Left Atrium
Modality: Mono-modal MRI
Size: 30 3D volumes (20 Training + 10 Testing)
Source: King’s College London
Challenge: Small training dataset with large variability
Powered by the ImageNet dataset, unsupervised learning on large-scale data has made significant advances for classification tasks. There are two major challenges to allowing such an attractive learning modality for segmentation tasks: i) a large-scale benchmark for assessing algorithms is missing; ii) unsupervised shape representation learning is difficult. We propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to track the research progress. Based on the ImageNet dataset, we propose the ImageNet-S dataset with 1.2 million training images and 50k high-quality semantic segmentation annotations for evaluation. Our benchmark has a high data diversity and a clear task objective. We also present a simple yet effective baseline method that works surprisingly well for LUSS. In addition, we benchmark related un/weakly/fully supervised methods accordingly, identifying the challenges and possible directions of LUSS.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Segmentation data of the training images.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains 48 images of grains and the corresponding segmentation masks that form the majority of the images that were used to train the U-Net model that the 'Segmenteverygrain' Python package is based on. The images have filenames that terminate in '_image.png'; the mask filenames terminate in '_mask.png'. The mask rasters only contain three values: 0 for background, 1 for the grain itself, and 2 for the grain boundary.
These files can be used to train a new U-Net model, either using 'Segmenteverygrain' functions, or using any machine learning framework that has functionality for training image segmentation models.
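A minimal sketch of pairing the '_image.png' and '_mask.png' files and confirming that each mask contains only the three documented values: 0 (background), 1 (grain), 2 (grain boundary). The folder name is hypothetical.

```python
# Minimal sketch: pair image/mask files and validate mask values {0, 1, 2}.
from pathlib import Path
import numpy as np
from PIL import Image

folder = Path("grain_training_data")  # hypothetical folder name
for image_path in sorted(folder.glob("*_image.png")):
    mask_path = image_path.with_name(image_path.name.replace("_image.png", "_mask.png"))
    mask = np.array(Image.open(mask_path))
    assert set(np.unique(mask)) <= {0, 1, 2}, mask_path
    print(image_path.name, mask.shape, np.unique(mask))
```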
Some of these images come from the SediNet project (Buscombe, 2019). A few images of fluvial gravel were collected by Mair et al. (2022), using UAVs; see this repository. The remaining images were taken either with a handheld digital camera or using a microscope.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The image dataset was prepared for training deep learning image segmentation models to identify karst sinkholes. Information about the work can be found at (https://github.com/mvrl/sink-seg/). The dataset consists of a DEM image, an aerial image, and a binary sinkhole label image in an area in central Kentucky, USA. It also includes four images derived from the DEM image. The image dataset is sourced from publicly available data from Kentucky's Elevation Data & Aerial Photography Program (https://kyfromabove.ky.gov/) and Kentucky LiDAR-derived sinkholes (https://kgs.uky.edu/geomap).
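One plausible way to assemble the DEM, DEM-derived layers, aerial image, and binary sinkhole label into model-ready arrays is sketched below with rasterio. The file names, formats, and band layout are assumptions; the dataset description does not specify them.

```python
# Minimal sketch, assuming the DEM, aerial, and label rasters are GeoTIFFs
# on a shared grid (file names and band layout are assumptions).
import numpy as np
import rasterio

with rasterio.open("dem.tif") as src:
    dem = src.read(1).astype(np.float32)       # single-band elevation
with rasterio.open("aerial.tif") as src:
    aerial = src.read().astype(np.float32)     # (bands, H, W) aerial image
with rasterio.open("sinkhole_labels.tif") as src:
    labels = src.read(1).astype(np.uint8)      # binary sinkhole mask

# Stack DEM and aerial bands into one (channels, H, W) input for a model.
inputs = np.concatenate([dem[np.newaxis, ...], aerial], axis=0)
print(inputs.shape, labels.shape)
```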
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"Image.zip" contains 955 corrrosion images, 1480 crack images, 1269 free lime images, 873 water leakage images, and 1244 spalling images. These images are labeled with numbers from 0 to 6 including the background. The "Label.zip" file contains the labeled images, and the "Image.json" file contains the label information.
Segment Anything 1 Billion (SA-1B) is a dataset designed for training general-purpose object segmentation models from open world images. The dataset was introduced in the paper "Segment Anything".
The SA-1B dataset consists of 11M diverse, high-resolution, licensed, and privacy-protecting images and 1.1B mask annotations. Masks are given in the COCO run-length encoding (RLE) format, and do not have classes.
The license is custom. Please read the full terms and conditions at https://ai.facebook.com/datasets/segment-anything-downloads.
All the features are in the original dataset except `image.content` (the content of the image).
You can decode segmentation masks with:
import tensorflow_datasets as tfds

pycocotools = tfds.core.lazy_imports.pycocotools

ds = tfds.load('segment_anything', split='train')
for example in tfds.as_numpy(ds):
    segmentation = example['annotations']['segmentation']
    for counts, size in zip(segmentation['counts'], segmentation['size']):
        encoded_mask = {'size': size, 'counts': counts}
        mask = pycocotools.decode(encoded_mask)  # np.array(dtype=uint8) mask
        ...
To use this dataset:
import tensorflow_datasets as tfds

ds = tfds.load('segment_anything', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Road Segmentation Dataset
This dataset comprises a collection of images captured through DVRs (Digital Video Recorders) showcasing roads. Each image is accompanied by segmentation masks demarcating different entities (road surface, cars, road signs, markings, and background) within the scene.
💴 For commercial usage: to discuss your requirements, learn about pricing, and buy the dataset, leave a request on TrainingData.
The dataset can be utilized… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/roads-segmentation-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset visualizes Grad-CAM-based heatmaps applied to membrane segmentation results obtained with U-Net. The training data are in the "train" folder, which contains:
- "checkpoint" folder: stores checkpoint files for 3 epochs: 100, 500, and 5,000
- "image" folder: holds training images
- "label" folder: stores labelled membrane images
The testing results are stored in "test_xxx" folders for 3 epochs: 100, 500, and 5,000.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This image dataset contains synthetic structure images used for training the deep-learning-based nanowire segmentation model presented in our work "A deep learned nanowire segmentation model using synthetic data augmentation", to be published in npj Computational Materials. Detailed information can be found in the corresponding article.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Today, deep neural networks are widely used in many computer vision problems, including for geographic information systems (GIS) data. This type of data is commonly used for urban analyses and spatial planning. We used orthophotographic images of two residential districts from Kielce, Poland for research including automatic urban sprawl analysis with a Transformer-based neural network. Orthophotomaps were obtained from the Kielce GIS portal. The map was then manually masked into building and building-surroundings classes. Finally, the orthophotomap and the corresponding classification mask were simultaneously divided into small tiles. This approach is common in image data preprocessing for the learning phase of machine learning algorithms. The data contain two original orthophotomaps from the Wietrznia and Pod Telegrafem residential districts with corresponding masks, and also their tiled versions, ready to provide as training data for machine learning models.
A Transformer-based neural network was trained on the Wietrznia dataset for semantic segmentation of the tiles into building and surroundings classes. After that, model inference was used to test the model's generalization ability on the Pod Telegrafem dataset. The efficiency of the model was satisfactory, so it can be used for automatic semantic building segmentation. The process of dividing the images can then be reversed and the complete classification mask retrieved. This mask can be used for building-area calculations and urban sprawl monitoring, if the research were repeated for GIS data from a wider time horizon.
Since the dataset was collected from the Kielce GIS portal, as part of the Polish Main Office of Geodesy and Cartography data resource, it may be used only for non-profit and non-commercial purposes, in private or scientific applications, under the law "Ustawa z dnia 4 lutego 1994 r. o prawie autorskim i prawach pokrewnych (Dz.U. z 2006 r. nr 90 poz 631 z późn. zm.)". There are no other legal or ethical considerations in reuse potential.
Data information is presented below.
- wietrznia_2019.jpg - orthophotomap of the Wietrznia district - used for the model's training, as an explanatory image
- wietrznia_2019.png - classification mask of the Wietrznia district - used for the model's training, as a target image
- wietrznia_2019_validation.jpg - one image from the Wietrznia district - used for the model's validation during the training phase
- pod_telegrafem_2019.jpg - orthophotomap of the Pod Telegrafem district - used for the model's evaluation after the training phase
- wietrznia_2019 - folder with the wietrznia_2019.jpg (image) and wietrznia_2019.png (annotation) images, divided into 810 tiles (512 x 512 pixels each); tiles with no information were manually removed, so the training data contain only informative tiles - tiles presented to the model during training (images and annotations for fitting the model to the data)
- wietrznia_2019_validation - folder with the wietrznia_2019_validation.jpg image divided into 16 tiles (256 x 256 pixels each) - tiles presented to the model during training (images for validating the model's efficiency); it was not part of the training data
- pod_telegrafem_2019 - folder with the pod_telegrafem.jpg image divided into 196 tiles (256 x 265 pixels each) - tiles presented to the model during inference (images for evaluating the model's robustness)
The dataset was created as described below. Firstly, the orthophotomaps were collected from the Kielce Geoportal (https://gis.kielce.eu).
Kielce Geoportal offers a recent .pst map from April 2019. It is an orthophotomap with a resolution of 5 x 5 pixels, constructed from a plane flight at 700 meters above ground level, taken with a camera for vertical photos. Downloading was done via WMS in the open-source QGIS software (https://www.qgis.org), as a 1:500 scale map, which was then converted to a 1200 dpi PNG image. Secondly, the map of the Wietrznia residential district was manually labelled, also in QGIS, in the same scope as the orthophotomap. Annotation was based on land cover map information, also obtained from the Kielce Geoportal. There are two classes: residential building and surroundings. The second map, of the Pod Telegrafem district, was not annotated, since it was used in the testing phase and imitates a situation where there is no annotation for new data presented to the model. Next, the images were converted to RGB JPG images, and the annotation map was converted to an 8-bit grayscale PNG image. Finally, the Wietrznia data files were tiled into 512 x 512 pixel tiles using the Python PIL library. Tiles with no information, or a relatively small amount of information (only white background or mostly white background), were manually removed. So, from the 29113 x 15938 pixel orthophotomap, only 810 tiles with corresponding annotations were left, ready to train the machine learning model for the semantic segmentation task. The Pod Telegrafem orthophotomap was tiled without manual removal, so the 7168 x 7168 pixel orthophotomap yielded 197 tiles at 256 x 256 pixel resolution. There was also an image of one residential building, used for the model's validation during the training phase; it was not part of the training data but was part of the Wietrznia residential area. It was a 2048 x 2048 pixel orthophotomap, tiled into 16 tiles of 256 x 265 pixels each.
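A minimal sketch of the tiling step described above, cutting an orthophotomap and its mask into 512 x 512 tiles with PIL. The file names match those listed for the Wietrznia district, but the output layout is hypothetical, and the original removal of uninformative (mostly white) tiles was done manually rather than in code.

```python
# Minimal sketch: tile an orthophotomap and its mask into 512 x 512 tiles with PIL.
from pathlib import Path
from PIL import Image

Image.MAX_IMAGE_PIXELS = None  # large orthophotomaps exceed PIL's default safety limit
TILE = 512
Path("tiles").mkdir(exist_ok=True)

ortho = Image.open("wietrznia_2019.jpg")
mask = Image.open("wietrznia_2019.png")

for top in range(0, ortho.height - TILE + 1, TILE):
    for left in range(0, ortho.width - TILE + 1, TILE):
        box = (left, top, left + TILE, top + TILE)
        ortho.crop(box).save(f"tiles/image_{top}_{left}.jpg")
        mask.crop(box).save(f"tiles/mask_{top}_{left}.png")
```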