https://choosealicense.com/licenses/bsd/
ADE20K Dataset
Description
ADE20K is composed of more than 27K images from the SUN and Places databases. Images are fully annotated with objects, spanning over 3K object categories. Many of the images also contain object parts, and parts of parts. We also provide the original annotated polygons, as well as object instances for amodal segmentation. Images are also anonymized, blurring faces and license plates.
Images
MIT, CSAIL does not own the copyright of the… See the full description on the dataset page: https://huggingface.co/datasets/1aurent/ADE20K.
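For quick inspection, a minimal sketch of pulling this Hugging Face copy of ADE20K with the `datasets` library is shown below; the split name and record fields are assumptions, not taken from the card, and may differ from what the dataset page actually exposes.

```python
# Minimal sketch, assuming the repository id from the card above; the split
# name ("train") and the record fields are assumptions, not from the card.
from datasets import load_dataset

ds = load_dataset("1aurent/ADE20K", split="train")
print(ds)               # feature schema and number of rows
example = ds[0]
print(example.keys())   # inspect the fields actually provided (image, annotations, ...)
```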
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Ade20k Dataset Part 3 is a dataset for instance segmentation tasks - it contains Wall annotations for 7,212 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
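If you use the Roboflow Python client, a download could be sketched as below; the API key, workspace, project slug, version number, and export format are all hypothetical placeholders, not values taken from this listing.

```python
# Hedged sketch using the Roboflow Python client; api key, workspace, project
# slug, version, and export format are hypothetical placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("ade20k-dataset-part-3")
dataset = project.version(1).download("coco")  # downloads images + annotations locally
print(dataset.location)
```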
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
msdkhairi/ade20k dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Ade20k Dataset V5.0.1 is a dataset for instance segmentation tasks - it contains Wall KZSU annotations for 6,442 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example of classes included in the ADE20K dataset [25].
MewtwoX23/ade20k dataset hosted on Hugging Face and contributed by the HF Datasets community
The dataset used in the paper is ADE20K for semantic segmentation.
This dataset was created by Shubhjyot
qubvel-hf/ade20k-nano dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "ade20k-panoptic-demo-imagefolder"
More Information needed
inpaint-context/ade20k-fix dataset hosted on Hugging Face and contributed by the HF Datasets community
Application: Semantic Segmentation
ML Task: DeepLabV3Plus
Framework: TensorFlow 1.15 for training/checkpoint and frozen-graph (pb) evaluation; TensorFlow 2.2 for post-training quantization and TFLite evaluation
Training Information / Quality:
Precision: FP32/INT8
Is Quantized: yes
Is ONNX: no
Dataset: ADE20K
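As a rough illustration of the FP32-to-INT8 post-training quantization workflow mentioned above, the sketch below uses the TensorFlow 2.x TFLite converter; the saved-model path and the calibration inputs are placeholders, not artifacts from this entry.

```python
# Hedged sketch of post-training INT8 quantization with the TF 2.x TFLite
# converter. The saved-model directory and the random calibration inputs are
# placeholders; a real run would feed representative ADE20K images.
import numpy as np
import tensorflow as tf

saved_model_dir = "deeplabv3plus_ade20k_saved_model"  # hypothetical path

def representative_dataset():
    for _ in range(100):
        # one calibration sample shaped like the model input (batch, H, W, 3)
        yield [np.random.rand(1, 512, 512, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

with open("deeplabv3plus_ade20k_int8.tflite", "wb") as f:
    f.write(tflite_model)
```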
ADE20K: A Dataset for Semantic Segmentation
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
ranksu/ADE20K dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There are two datasets in this data release:
1. Model training dataset. A manually (or semi-manually) labeled image dataset that was used to train and evaluate a machine (deep) learning model designed to identify subaerial accumulations of large wood, alluvial sediment, water, and vegetation in orthoimagery of alluvial river corridors in forested catchments.
2. Model output dataset. A labeled image dataset that uses the aforementioned model to estimate subaerial accumulations of large wood, alluvial sediment, water, and vegetation in a larger orthoimagery dataset of alluvial river corridors in forested catchments.
All of these label data are derived from raw gridded data that originate from the U.S. Geological Survey (Ritchie et al., 2018). That dataset consists of 14 orthoimages of the Middle Reach (MR, between the former Aldwell and Mills reservoirs) and 14 corresponding orthoimages of the Lower Reach (LR, downstream of the former Mills reservoir) of the Elwha River, Washington, collected between 2012-04-07 and 2017-09-22. The orthoimagery was generated using SfM photogrammetry (following Over et al., 2021) with a photographic camera mounted to an aircraft wing. The imagery captures channel change as it evolved under a ~20 Mt sediment pulse initiated by the removal of the two dams. The two reaches are the ~8 km long Middle Reach (MR) and the lower-gradient ~7 km long Lower Reach (LR).
The orthoimagery has been labeled (pixelwise, either manually or by an automated process) according to the following classes (integer class code in the label data in parentheses); a short pixel-tally sketch follows the list:
1. vegetation / other (0)
2. water (1)
3. sediment (2)
4. large wood (3)
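A minimal sketch of tallying these integer codes in one of the label GeoTIFFs is below; the filename is a placeholder, and rasterio and numpy are assumed to be available.

```python
# Hedged sketch: count pixels per class in a single label GeoTIFF using the
# integer codes listed above. The filename is a placeholder.
import numpy as np
import rasterio

CLASS_NAMES = {0: "vegetation / other", 1: "water", 2: "sediment", 3: "large wood"}

with rasterio.open("example_label_tile.tif") as src:  # hypothetical filename
    labels = src.read(1)  # single-band integer label grid

for value, count in zip(*np.unique(labels, return_counts=True)):
    print(f"{value} ({CLASS_NAMES.get(int(value), 'unknown / nodata')}): {count} pixels")
```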
Imagery was labeled using a combination of the open-source software Doodler (Buscombe et al., 2021; https://github.com/Doodleverse/dash_doodler) and hand-digitization in QGIS at 1:300 scale; the resulting polygons were rasterized, then gridded and clipped in the same way as all other gridded data. Doodler facilitates relatively labor-free dense multiclass labeling of natural imagery, enabling rapid training dataset creation. The final training dataset consists of 4,382 images and corresponding labels, each 1024 x 1024 pixels, representing just over 5% of the total data set. The training data are sampled approximately equally in time and space across both reaches. All training and validation samples purposefully include all four label classes, to avoid model training and evaluation problems associated with class imbalance (Buscombe and Goldstein, 2022).
Data are provided in GeoTIFF format. The imagery and label grids are reprojected to be co-located in the NAD83(2011) / UTM zone 10N projection and consist of 0.125 x 0.125 m pixels.
Pixel-wise label measurements such as these facilitate the development and evaluation of image segmentation, image classification, object-based image analysis (OBIA), and object-in-image detection models, as well as numerous other potential machine learning models, for the general purposes of river corridor classification, description, enumeration, inventory, and process or state quantification. For example, this dataset may serve in transfer learning contexts for application in different river or coastal environments or for different tasks or class ontologies.
1. Labels_used_for_model_training_Buscombe_Labeled_high_resolution_orthoimagery_time_series_of_an_alluvial_river_corridor_Elwha_River_Washington_USA.zip, 63 MB, label tiffs
2. Model_training_images1of4.zip, 1.5 GB, imagery tiffs
3. Model_training_images2of4.zip, 1.5 GB, imagery tiffs
4. Model_training_images3of4.zip, 1.7 GB, imagery tiffs
5. Model_training_images4of4.zip, 1.6 GB, imagery tiffs
Imagery was labeled using a deep-learning-based semantic segmentation model (Buscombe, 2023) trained specifically for the task with the Segmentation Gym (Buscombe and Goldstein, 2022) modeling suite. Segmentation Gym was used to fine-tune a SegFormer (Xie et al., 2021) deep learning model for semantic image segmentation. We take the instance (i.e., model architecture and trained weights) of the model of Xie et al. (2021), itself fine-tuned on the ADE20K dataset (Zhou et al., 2019) at a resolution of 512 x 512 pixels, and fine-tune it on our 1024 x 1024 pixel training data consisting of 4-class label images.
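The exact Segmentation Gym configuration is not reproduced here, but an analogous fine-tuning setup can be sketched with Hugging Face transformers: start from the publicly released ADE20K-pretrained SegFormer checkpoint and swap in a 4-class head. The checkpoint id and processor settings below are assumptions, not the trained instance archived in Buscombe (2023).

```python
# Hedged sketch (transformers, not Segmentation Gym): re-head an ADE20K-pretrained
# SegFormer for the 4-class Elwha labels. Checkpoint id and sizes are assumptions.
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

id2label = {0: "vegetation/other", 1: "water", 2: "sediment", 3: "large wood"}

model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512",   # public ADE20K-finetuned checkpoint
    num_labels=len(id2label),
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    ignore_mismatched_sizes=True,                  # drop the 150-class ADE20K head
)
processor = SegformerImageProcessor(size={"height": 1024, "width": 1024})
# Fine-tuning would then proceed with a standard training loop over the
# 1024 x 1024 image/label tiles described above.
```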
The spatial extent of the imagery in the MR is 455157.2494695878122002,5316532.9804129302501678 : 457076.1244695878122002,5323771.7304129302501678. Imagery width is 15351 pixels and imagery height is 57910 pixels. The spatial extent of the imagery in the LR is 457704.9227139975992031,5326631.3750646486878395 : 459241.6727139975992031,5333311.0000646486878395. Imagery width is 12294 pixels and imagery height is 53437 pixels. Data are provided in Cloud-Optimized GeoTIFF (COG) format. The imagery and label grids are reprojected to be co-located in the NAD83(2011) / UTM zone 10N projection and consist of 0.125 x 0.125 m pixels. All grids have been clipped to the union of extents of active channel margins during the period of interest.
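Because the grids are distributed as COGs, a small windowed read is enough to check the stated projection and pixel size without loading an entire reach; the filename in the sketch below is a placeholder.

```python
# Hedged sketch: windowed read from one Cloud-Optimized GeoTIFF; only the
# requested block is fetched. The filename is a placeholder.
import rasterio
from rasterio.windows import Window

with rasterio.open("ElwhaMR_example_ortho.tif") as src:  # hypothetical filename
    print(src.crs)   # expected: NAD83(2011) / UTM zone 10N
    print(src.res)   # expected: (0.125, 0.125)
    chip = src.read(window=Window(col_off=0, row_off=0, width=1024, height=1024))
    print(chip.shape)
```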
Reach-wide pixel-wise measurements such as these facilitate comparison of wood and sediment storage at any scale or location. These data may be useful for studying the morphodynamics of wood-sediment interactions in other geomorphically complex channels, wood storage in channels, the role of wood in ecosystems and conservation or restoration efforts.
1. Elwha_MR_labels_Buscombe_Labeled_high_resolution_orthoimagery_time_series_of_an_alluvial_river_corridor_Elwha_River_Washington_USA.zip, 9.67 MB, label COGs from Elwha River Middle Reach (MR)
2. ElwhaMR_imagery_part1_of_2.zip, 566 MB, imagery COGs from Elwha River Middle Reach (MR)
3. ElwhaMR_imagery_part2_of_2.zip, 618 MB, imagery COGs from Elwha River Middle Reach (MR)
4. Elwha_LR_labels_Buscombe_Labeled_high_resolution_orthoimagery_time_series_of_an_alluvial_river_corridor_Elwha_River_Washington_USA.zip, 10.96 MB, label COGs from Elwha River Lower Reach (LR)
5. ElwhaLR_imagery_part1_of_2.zip, 622 MB, imagery COGs from Elwha River Lower Reach (LR)
6. ElwhaLR_imagery_part2_of_2.zip, 617 MB, imagery COGs from Elwha River Lower Reach (LR)
This dataset was created using open-source tools of the Doodleverse, a software ecosystem for geoscientific image segmentation, by Daniel Buscombe (https://github.com/dbuscombe-usgs) and Evan Goldstein (https://github.com/ebgoldstein). Thanks to the contributors of the Doodleverse! Thanks especially to Sharon Fitzpatrick (https://github.com/2320sharon) and Jaycee Favela for contributing labels.
• Buscombe, D. (2023). Doodleverse/Segmentation Gym SegFormer models for 4-class (other, water, sediment, wood) segmentation of RGB aerial orthomosaic imagery (v1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8172858
• Buscombe, D., Goldstein, E. B., Sherwood, C. R., Bodine, C., Brown, J. A., Favela, J., et al. (2021). Human-in-the-loop segmentation of Earth surface imagery. Earth and Space Science, 9, e2021EA002085. https://doi.org/10.1029/2021EA002085
• Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332 See: https://github.com/Doodleverse/segmentation_gym
• Over, J.R., Ritchie, A.C., Kranenburg, C.J., Brown, J.A., Buscombe, D., Noble, T., Sherwood, C.R., Warrick, J.A., and Wernette, P.A., 2021, Processing coastal imagery with Agisoft Metashape Professional Edition, version 1.6—Structure from motion workflow documentation: U.S. Geological Survey Open-File Report 2021–1039, 46 p., https://doi.org/10.3133/ofr20211039.
• Ritchie, A.C., Curran, C.A., Magirl, C.S., Bountry, J.A., Hilldale, R.C., Randle, T.J., and Duda, J.J., 2018, Data in support of 5-year sediment budget and morphodynamic analysis of Elwha River following dam removals: U.S. Geological Survey data release, https://doi.org/10.5066/F7PG1QWC.
• Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M. and Luo, P., 2021. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34, pp.12077-12090.
• Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A. and Torralba, A., 2019. Semantic understanding of scenes through the ade20k dataset. International Journal of Computer Vision, 127, pp.302-321.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here is a set of three sequences recorded at the Mobile Robotic Laboratory (MRL) in ISR. The first bag is set to run up to 130 seconds, because a failure occurred after that point, but everything is correct until then. Along with sequence 1, there is a bag containing the semantic segmentation, obtained using the PSPNet network with the ADE20K dataset, rendered with an opacity of 255. The filter.bag file contains only a subset of the classes, while the all.bag file contains all of the classes. Additionally, there are two more sequences with similar trajectories. The data include recordings from the RGB-D camera (D435i), wheel odometry, the stereo camera (Mynt Eye S1030), and the 2D LiDAR (Hokuyo URG-04L). At this stage, the Mynt Eye did not yet have the IMU correction or the corrected right-camera info values.
The transformations are defined as follows (approximately):
RGB-D camera: 0.18 0.005 0.71 0 0.2007 0 base_link realsense_link
Mynt Eye camera: 0.205 0.0 0.63 -1.57 0 -1.72 base_link mynteye_link
It should be noted that the laser transformations have already been recorded in the bags. The segmented images are in BGR format.
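Assuming the usual x y z yaw pitch roll parent child ordering for the values above, they could be rebroadcast as static transforms with tf2_ros as sketched below (ROS 1, Python); the node name and the field ordering are assumptions, not stated in the original description.

```python
# Hedged sketch (ROS 1): rebroadcast the approximate sensor transforms above,
# assuming the values are ordered x y z yaw pitch roll parent_frame child_frame.
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped
from tf.transformations import quaternion_from_euler

def static_tf(x, y, z, yaw, pitch, roll, parent, child):
    t = TransformStamped()
    t.header.stamp = rospy.Time.now()
    t.header.frame_id = parent
    t.child_frame_id = child
    t.transform.translation.x, t.transform.translation.y, t.transform.translation.z = x, y, z
    qx, qy, qz, qw = quaternion_from_euler(roll, pitch, yaw)
    t.transform.rotation.x, t.transform.rotation.y = qx, qy
    t.transform.rotation.z, t.transform.rotation.w = qz, qw
    return t

rospy.init_node("sensor_static_tf_broadcaster")  # hypothetical node name
broadcaster = tf2_ros.StaticTransformBroadcaster()
broadcaster.sendTransform([
    static_tf(0.18, 0.005, 0.71, 0.0, 0.2007, 0.0, "base_link", "realsense_link"),
    static_tf(0.205, 0.0, 0.63, -1.57, 0.0, -1.72, "base_link", "mynteye_link"),
])
rospy.spin()
```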
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This is the ADE20K dataset after semantic segmentation.
The Cambrian Vision-Centric Benchmark (CV-Bench) is designed to address the limitations of existing vision-centric benchmarks by providing a comprehensive evaluation framework for multimodal large language models (MLLMs). With 2,638 manually-inspected examples, CV-Bench significantly surpasses other vision-centric MLLM benchmarks, offering 3.5 times more examples than RealWorldQA and 8.8 times more than MMVP.
Motivation and Content Summary:
CV-Bench repurposes standard vision benchmarks such as ADE20K, COCO, and Omni3D to assess models on classic vision tasks within a multimodal context. Leveraging the rich ground truth annotations from these benchmarks, natural language questions are formulated to probe the fundamental 2D and 3D understanding of models.
Potential Use Cases:
• Evaluating the spatial relationship and object counting capabilities of models (2D understanding).
• Assessing the depth order and relative distance understanding of models (3D understanding).
• Benchmarking the performance of multimodal models in both vision-specific and cross-modal tasks.
Dataset Characteristics:
2D Understanding Tasks: Spatial Relationship: Determine the relative position of an object with respect to the anchor object, considering left-right or top-bottom relationships.
Object Count: Determine the number of instances present in the image.
3D Understanding Tasks:
Depth Order: Determine which of the two distinct objects is closer to the camera. Relative Distance: Determine which of the two distinct objects is closer to the anchor object.
| Type | Task | Description | Sources | # Samples |
|---|---|---|---|---|
| 2D | Spatial Relationship | Determine the relative position of an object w.r.t. the anchor object. | ADE20K, COCO | 650 |
| 2D | Object Count | Determine the number of instances present in the image. | ADE20K, COCO | 788 |
| 3D | Depth Order | Determine which of the two distinct objects is closer to the camera. | Omni3D | 600 |
| 3D | Relative Distance | Determine which of the two distinct objects is closer to the anchor object. | Omni3D | 600 |
Curation Process:
Questions for each task are programmatically constructed and then manually inspected to ensure clarity and accuracy. Any unclear, ambiguous, or erroneous questions are removed to maintain the benchmark's reliability.
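As a rough idea of how the benchmark can be consumed, the sketch below loads CV-Bench from the Hugging Face Hub and scores multiple-choice answer strings against it; the repository id and field names are assumptions based on the public release and may not match exactly.

```python
# Hedged sketch: load CV-Bench and score answer strings against it. The repo id
# ("nyu-visionx/CV-Bench") and fields ("task", "answer") are assumptions.
from collections import Counter
from datasets import load_dataset

bench = load_dataset("nyu-visionx/CV-Bench", split="test")
print(Counter(row["task"] for row in bench))  # per-task sample counts (cf. table above)

def accuracy(predictions):
    """predictions: list of answer strings aligned with the benchmark rows."""
    correct = sum(pred == row["answer"] for pred, row in zip(predictions, bench))
    return correct / len(bench)
```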
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MSeg is a composite dataset for multi-domain semantic segmentation. It unifies semantic segmentation datasets from different domains: COCO, ADE20K, Mapillary, IDD, BDD, Cityscapes, and SUN RGB-D. By harmonizing, merging, and splitting the class taxonomies of these datasets, a unified taxonomy of 194 categories was obtained. To make the pixel-level annotations conform to the unified taxonomy, the authors conducted a large-scale annotation effort through the Mechanical Turk platform, relabeling object masks to generate compatible annotations across the dataset.