Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO-Stuff augments all 164K images of the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning.
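For readers who want to work with these annotations directly, here is a minimal sketch of building a per-pixel stuff label map with the pycocotools COCO API. The annotation file path and the choice of image are assumptions; adapt them to a local copy of COCO-Stuff.

```python
# Minimal sketch: turn COCO-Stuff region annotations into one per-pixel
# label map. Path and image choice are assumptions for illustration.
import numpy as np
from pycocotools.coco import COCO

stuff = COCO("annotations/stuff_train2017.json")  # assumed local path

img_id = stuff.getImgIds()[0]                     # pick any annotated image
anns = stuff.loadAnns(stuff.getAnnIds(imgIds=img_id))

# Stamp each stuff mask with its category id; 0 stays "unlabeled".
info = stuff.loadImgs(img_id)[0]
label_map = np.zeros((info["height"], info["width"]), dtype=np.int32)
for ann in anns:
    mask = stuff.annToMask(ann).astype(bool)      # binary mask for this region
    label_map[mask] = ann["category_id"]
```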
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
Splits: The first version of the MS COCO dataset was released in 2014. It contains 164K images split into training (83K), validation (41K), and test (41K) sets. In 2015, an additional test set of 81K images was released, comprising all the previous test images plus 40K new images.
Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.
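To make the split history concrete, the small sketch below records the exact per-split image counts behind the rounded figures above. The train/val figures are the official release counts; the test and unlabeled figures are the commonly cited values and should be double-checked against the official download page.

```python
# Exact image counts per split. The 2017 release re-divides the same
# 2014 train+val pool, which the assert below checks.
coco_splits = {
    "2014": {"train": 82_783, "val": 40_504, "test": 40_775},
    "2017": {"train": 118_287, "val": 5_000, "test": 40_670, "unlabeled": 123_403},
}
assert (coco_splits["2014"]["train"] + coco_splits["2014"]["val"]
        == coco_splits["2017"]["train"] + coco_splits["2017"]["val"])
```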
Annotations: The dataset has annotations for:
- object detection: bounding boxes and per-instance segmentation masks with 80 object categories;
- captioning: natural-language descriptions of the images (see MS COCO Captions);
- keypoint detection: more than 200,000 images and 250,000 person instances labeled with keypoints (17 possible keypoints, such as left eye, nose, right hip, right ankle); a short reading sketch follows this list;
- stuff image segmentation: per-pixel segmentation masks with 91 stuff categories, such as grass, wall, and sky (see MS COCO Stuff);
- panoptic: full scene segmentation, with 80 thing categories (such as person, bicycle, elephant) and a subset of the 91 stuff categories (grass, sky, road);
- dense pose: more than 39,000 images and 56,000 person instances labeled with DensePose annotations; each labeled person is annotated with an instance id and a mapping between the image pixels belonging to that person's body and a template 3D model.
The annotations are publicly available only for training and validation images.
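As referenced in the list above, this is a minimal sketch of reading the person-keypoint annotations with pycocotools. The annotation file path is an assumption; the flat (x, y, visibility) triplet layout is the standard COCO keypoint format.

```python
# Minimal sketch: print labeled keypoints for one person annotation.
from pycocotools.coco import COCO

kp = COCO("annotations/person_keypoints_val2017.json")  # assumed local path

ann = kp.loadAnns(kp.getAnnIds(catIds=kp.getCatIds(catNms=["person"])))[0]
names = kp.loadCats(ann["category_id"])[0]["keypoints"]  # 17 names, e.g. "nose"

# Keypoints are stored flat as [x1, y1, v1, x2, y2, v2, ...];
# v=0 means not labeled, v=1 labeled but occluded, v=2 labeled and visible.
for name, (x, y, v) in zip(names, zip(*[iter(ann["keypoints"])] * 3)):
    if v > 0:
        print(f"{name}: ({x}, {y}) visibility={v}")
```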
Dataset Card for "coco-stuff-captioned-depth"
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SketchyCOCO dataset consists of two parts:
Object-level data: 20,198 triplets (18,869 train + 1,329 val) of {foreground sketch, foreground image, foreground edge map} examples covering 14 classes, and 27,683 pairs (22,171 train + 5,512 val) of {background sketch, background image} examples covering 3 classes.
Scene-level data: 14,081 pairs (11,265 train + 2,816 val) of {foreground image & background sketch, scene image} examples, 14,081 pairs of {scene sketch, scene image} examples, and segmentation ground truth for the 14,081 scene sketches.
Some val scene images are drawn from the train images of the COCO-Stuff dataset to increase the number of val images in SketchyCOCO.
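As one way to work with the object-level triplets, the sketch below pairs files that share a stem across three sibling directories. The directory layout (sketch/, image/, edge/ under an object/train root) is purely an assumption for illustration; the official release may organize the files differently.

```python
# Hedged sketch: collect {sketch, image, edge map} triplets by shared stem.
# Directory names and the flat <id>.png layout are assumptions.
from pathlib import Path

root = Path("SketchyCOCO/object/train")  # assumed path

def load_triplets(root: Path):
    """Yield (sketch, image, edge_map) paths that share a file stem."""
    for sketch in sorted((root / "sketch").glob("*.png")):
        image = root / "image" / sketch.name
        edge = root / "edge" / sketch.name
        if image.exists() and edge.exists():
            yield sketch, image, edge

triplets = list(load_triplets(root))
print(f"{len(triplets)} complete triplets found")  # expect 18,869 for train
```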
Panoptic segmentation aims to unify instance and semantic segmentation in the same framework. Existing works propose to merge instance and semantic segmentation using post-processing layers. Recent works unify both segmentation tasks by producing binary masks and class scores for both things and stuff classes.
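For illustration, here is a minimal NumPy sketch of such a post-processing merge: higher-scoring instance masks claim pixels first, heavily occluded instances are dropped, and leftover pixels fall back to the semantic (stuff) prediction. The function name, the overlap threshold, and the segment-id scheme are illustrative choices, not the procedure of any specific paper.

```python
# Heuristic merge of instance + semantic predictions into a panoptic map.
import numpy as np

def merge_panoptic(instances, semantic, overlap_thresh=0.5):
    """instances: iterable of (binary_mask, class_id, score);
    semantic: HxW array of stuff class ids."""
    panoptic = np.zeros(semantic.shape, dtype=np.int32)  # 0 = unassigned
    segments = {}                                        # segment id -> class id
    seg_id = 0
    # Higher-scoring instances claim pixels first (things win over stuff).
    for mask, class_id, score in sorted(instances, key=lambda t: -t[2]):
        free = mask.astype(bool) & (panoptic == 0)
        if free.sum() < overlap_thresh * mask.sum():
            continue                                     # mostly occluded: drop
        seg_id += 1
        panoptic[free] = seg_id
        segments[seg_id] = class_id
    # Remaining pixels take their stuff label from the semantic map.
    for stuff_class in np.unique(semantic[panoptic == 0]):
        seg_id += 1
        panoptic[(panoptic == 0) & (semantic == stuff_class)] = seg_id
        segments[seg_id] = int(stuff_class)
    return panoptic, segments
```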
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO is a large-scale object detection, segmentation, and captioning dataset (http://cocodataset.org). COCO has several features:
- Object segmentation
- Recognition in context
- Superpixel stuff segmentation
- 330K images (>200K labeled)
- 1.5 million object instances
- 80 object categories
- 91 stuff categories
- 5 captions per image
- 250,000 people with keypoints
This is an open-source object detection model by TensorFlow in TensorFlow Lite format. While it is not recommended to use this model in production surveys, it can be useful for demonstration purposes and to get started with smart assistants in ArcGIS Survey123. You are responsible for the use of this model. When using Survey123, it is your responsibility to review and manually correct outputs.
This object detection model was trained using the Common Objects in Context (COCO) dataset. COCO is a large-scale object detection dataset that is available for use under the Creative Commons Attribution 4.0 License. The dataset contains 80 object categories and 1.5 million object instances that include people, animals, food items, vehicles, and household items. For a complete list of common objects this model can detect, see Classes.
The model can be used in ArcGIS Survey123 to detect common objects in photos that are captured with the Survey123 field app.
Using the model: Follow the guide to use the model. You can use this model to detect or redact common objects in images captured with the Survey123 field app. The model must be configured for a survey in Survey123 Connect.
Fine-tuning the model: This model cannot be fine-tuned using ArcGIS tools.
Input: Camera feed (either low-resolution preview or high-resolution capture).
Output: An image with common object detections written to its EXIF metadata, or an image with detected objects redacted.
Model architecture: This is an open-source object detection model by TensorFlow in TensorFlow Lite format with MobileNet architecture. The model is available for use under the Apache License 2.0.
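Outside of Survey123, a TFLite detector of this kind can also be exercised directly with the TensorFlow Lite interpreter. The sketch below shows the generic flow; the model filename is an assumption, and the order of the four SSD output tensors (boxes, classes, scores, count) varies between exports, so inspect get_output_details() for your file.

```python
# Generic TFLite SSD inference sketch; model file and output ordering
# are assumptions to verify against your export.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="detect.tflite")  # assumed file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()

# SSD-MobileNet exports typically expect a uint8 image resized to the
# input shape, e.g. (1, 300, 300, 3); a zero frame stands in here.
_, height, width, _ = inp["shape"]
image = np.zeros((1, height, width, 3), dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()

boxes = interpreter.get_tensor(outs[0]["index"])    # [1, N, 4] normalized
classes = interpreter.get_tensor(outs[1]["index"])  # [1, N] COCO label indices
scores = interpreter.get_tensor(outs[2]["index"])   # [1, N]
for box, cls, score in zip(boxes[0], classes[0], scores[0]):
    if score > 0.5:
        print(f"class {int(cls)} at {box} with score {score:.2f}")
```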
The ADE20K semantic segmentation dataset contains more than 20K scene-centric images exhaustively annotated with pixel-level object and object-part labels. There are 150 semantic categories in total, including stuff classes like sky, road, and grass, and discrete objects like person, car, and bed.
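As a quick way to inspect such annotations, the sketch below reads one mask and lists the class indices present. It assumes the SceneParsing (ADEChallengeData2016) release, where each annotation is a single-channel PNG of class indices 0-150 (0 = unlabeled); the filename is a placeholder.

```python
# Inspect an ADE20K SceneParsing annotation mask (single-channel class ids).
import numpy as np
from PIL import Image

mask = np.array(Image.open("annotations/training/ADE_train_00000001.png"))  # assumed file
print("classes present:", np.unique(mask))  # indices into the 150-class list
```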