Segmentation models perform a pixel-wise classification by classifying the pixels into different classes. The classified pixels correspond to different objects or regions in the image. These models have a wide variety of use cases across multiple domains. When used with satellite and aerial imagery, these models can help to identify features such as building footprints, roads, water bodies, crop fields, etc.

Generally, every segmentation model needs to be trained from scratch using a dataset labeled with the objects of interest. This can be an arduous and time-consuming task. Meta's Segment Anything Model (SAM) is aimed at creating a foundational model that can be used to segment (as the name suggests) anything using zero-shot learning and generalize across domains without additional training. SAM is trained on the Segment Anything 1-Billion mask dataset (SA-1B), which comprises a diverse set of 11 million images and over 1 billion masks. This makes the model highly robust in identifying object boundaries and differentiating between various objects across domains, even though it might have never seen them before. Use this model to extract masks of various objects in any image.

Using the model
Follow the guide to use the model. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

Fine-tuning the model
This model can be fine-tuned using the SamLoRA architecture in ArcGIS. Follow the guide and refer to this sample notebook to fine-tune this model.

Input
8-bit, 3-band imagery.

Output
Feature class containing masks of various objects in the image.

Applicable geographies
The model is expected to work globally.

Model architecture
This model is based on the open-source Segment Anything Model (SAM) by Meta.

Training data
This model has been trained on the Segment Anything 1-Billion mask dataset (SA-1B), which comprises a diverse set of 11 million images and over 1 billion masks.

Sample results
Here are a few results from the model.
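As a quick way to see what the model produces outside the ArcGIS tools, here is a minimal sketch using Meta's open-source segment_anything package for automatic mask generation; the checkpoint file name, model type, and image path are assumptions and are not part of the ArcGIS package.

```python
# A minimal sketch, assuming a locally downloaded SAM checkpoint and a sample image
# (file names below are placeholders, not part of the ArcGIS workflow).
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

image = np.array(Image.open("tile_rgb_8bit.png").convert("RGB"))  # 8-bit, 3-band input

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

masks = mask_generator.generate(image)  # list of dicts: 'segmentation', 'area', 'bbox', ...
print(f"{len(masks)} masks; largest covers {max(m['area'] for m in masks)} pixels")
```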
This deep learning model is used to detect and segment trees in high-resolution drone or aerial imagery. Tree detection can be used for applications such as vegetation management, forestry, urban planning, etc. High-resolution aerial and drone imagery can be used for tree detection due to its high spatio-temporal coverage.

This deep learning model is based on DeepForest and has been trained on data from the National Ecological Observatory Network (NEON). The model also uses the Segment Anything Model (SAM) by Meta.

Using the model
Follow the guide to use the model. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

Fine-tuning the model
This model cannot be fine-tuned using ArcGIS tools.

Input
8-bit, 3-band high-resolution (10-25 cm) imagery.

Output
Feature class containing separate masks for each tree.

Applicable geographies
The model is expected to work well in the United States.

Model architecture
This model is based upon the DeepForest Python package, which uses the RetinaNet model architecture implemented in torchvision, and the open-source Segment Anything Model (SAM) by Meta.

Accuracy metrics
This model has a precision score of 0.66 and a recall of 0.79.

Training data
This model has been trained on the NEON Tree Benchmark dataset, provided by the Weecology Lab at the University of Florida. The model also uses the Segment Anything Model (SAM) by Meta, which is trained on the Segment Anything 1-Billion mask dataset (SA-1B) comprising a diverse set of 11 million images and over 1 billion masks.

Sample results
Here are a few results from the model.

Citations
Weinstein, B.G.; Marconi, S.; Bohlman, S.; Zare, A.; White, E. Individual Tree-Crown Detection in RGB Imagery Using Semi-Supervised Deep Learning Neural Networks. Remote Sens. 2019, 11, 1309.
Weinstein, B.; Marconi, S.; Bohlman, S.; Zare, A.; White, E.P. Geographic Generalization in Airborne RGB Deep Learning Tree Detection. bioRxiv 790071; doi: https://doi.org/10.1101/790071.
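For reference, here is a minimal sketch of tree-crown prediction with the open-source DeepForest package that this model builds on (not the packaged ArcGIS workflow); the image path is an assumption and the API shown is for deepforest ~1.x.

```python
# Minimal sketch using the open-source DeepForest package this model builds on.
# The image path is a placeholder; the API shown is for deepforest ~1.x.
from deepforest import main

model = main.deepforest()
model.use_release()  # download the prebuilt NEON tree-crown release weights

# Returns a pandas DataFrame of predicted boxes: xmin, ymin, xmax, ymax, label, score
boxes = model.predict_image(path="drone_tile_10cm.png")
print(boxes.head())
```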
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
1. Digitized specimens are an indispensable resource for rapidly acquiring big datasets and typically must be preprocessed prior to conducting analyses. One crucial image preprocessing step in any image analysis workflow is image segmentation, or the ability to clearly contrast the foreground target from the background noise in an image. This procedure is typically done manually, creating a potential bottleneck for efforts to quantify biodiversity from image databases. Image segmentation meta-algorithms using deep learning provide an opportunity to relax this bottleneck. However, the most accessible pre-trained convolutional neural networks (CNNs) have been trained on a small fraction of biodiversity, thus limiting their utility.
2. We trained a deep learning model to automatically segment target fish from images with both standardized and complex, noisy backgrounds. We then assessed the performance of our deep learning model using qualitative visual inspection and quantitative image segmentation metrics of pixel overlap between reference segmentation masks generated manually by experts and those automatically predicted by our model.
3. Visual inspection revealed that our model segmented fishes with high precision and relatively few artifacts. These results suggest that the meta-algorithm (Mask R-CNN) on which our current fish segmentation model relies is well-suited for generating high-fidelity segmented specimen images across a variety of background contexts at a rapid pace.
4. We present Sashimi, a user-friendly command line toolkit to facilitate rapid, automated high-throughput image segmentation of digitized organisms. Sashimi is accessible to non-programmers and does not require experience with deep learning to use. The flexibility of Mask R-CNN allows users to generate a segmentation model for use on diverse animal and plant images using transfer learning with training datasets as small as a few hundred images. To help grow the taxonomic scope of images that can be recognized, Sashimi also includes a central database for sharing and distributing custom-trained segmentation models of other unrepresented organisms. Lastly, Sashimi includes auxiliary image preprocessing functions useful for some popular downstream color pattern analysis workflows, as well as a simple script to aid users in qualitatively and quantitatively assessing segmentation model performance for complementary sets of automatically and manually segmented images.
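As an illustration of the Mask R-CNN meta-algorithm that Sashimi builds on (not Sashimi's own command-line interface), here is a minimal torchvision inference sketch; the image path and score threshold are assumptions.

```python
# Illustration of the underlying meta-algorithm (Mask R-CNN) with torchvision,
# not Sashimi's CLI. The image path and threshold are placeholders.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

img = to_tensor(Image.open("specimen.jpg").convert("RGB"))
with torch.no_grad():
    pred = model([img])[0]               # dict with 'boxes', 'labels', 'scores', 'masks'

keep = pred["scores"] > 0.5
masks = pred["masks"][keep, 0] > 0.5     # boolean HxW masks for confident detections
print(masks.shape)
```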
https://maadaa.ai/path/to/license
The "Industrial Metal Smelting Flame Classification Dataset" is designed for the industry sector, featuring a collection of internet-collected images of metal smelting flames, all with a resolution of 350 x 350 pixels. This dataset is dedicated to the classification of flame images into 10 categories, including overexposure, black smoke, fire mass, sparks, and various intensities of slag jumping and spatter, providing crucial data for monitoring and optimizing smelting processes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Brain tumors, characterized by the uncontrolled growth of abnormal cells, pose a significant threat to human health. Early detection is crucial for successful treatment and improved patient outcomes. Magnetic Resonance Imaging (MRI) is the primary diagnostic tool for brain tumors, providing detailed visualizations of the brain’s intricate structures. However, the complexity and variability of tumor shapes and locations often challenge physicians in achieving accurate tumor segmentation on MRI images. Precise tumor segmentation is essential for effective treatment planning and prognosis. To address this challenge, we propose a novel hybrid deep learning technique, Convolutional Neural Network and ResNeXt101 (ConvNet-ResNeXt101), for automated tumor segmentation and classification. Our approach commences with data acquisition from the BRATS 2020 dataset, a benchmark collection of MRI images with corresponding tumor segmentations. Next, we employ batch normalization to smooth and enhance the collected data, followed by feature extraction using the AlexNet model. This involves extracting features based on tumor shape, position, and surface characteristics. To select the most informative features for effective segmentation, we utilize an advanced meta-heuristics algorithm called Advanced Whale Optimization (AWO). AWO mimics the hunting behavior of humpback whales to iteratively search for the optimal feature subset. With the selected features, we perform image segmentation using the ConvNet-ResNeXt101 model. This deep learning architecture combines the strengths of ConvNet and ResNeXt101, a type of ConvNet with aggregated residual connections. Finally, we apply the same ConvNet-ResNeXt101 model for tumor classification, categorizing the segmented tumor into distinct types. Our experiments demonstrate the superior performance of our proposed ConvNet-ResNeXt101 model compared to existing approaches, achieving an accuracy of 99.27% for the tumor core class with a minimum learning elapsed time of 0.53 s.
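The following is not the authors' ConvNet-ResNeXt101 hybrid, only a hedged sketch showing how a stock ResNeXt101 backbone from torchvision could be adapted to classify preprocessed MRI slices; the class count and input shape are assumptions.

```python
# Hedged sketch: a torchvision ResNeXt101 backbone repurposed as a slice classifier.
# Number of classes and input handling are assumptions, not the paper's architecture.
import torch
from torch import nn
from torchvision import models

backbone = models.resnext101_32x8d(weights="DEFAULT")
backbone.fc = nn.Linear(backbone.fc.in_features, 4)  # e.g. 4 tumor classes (assumption)

x = torch.randn(2, 3, 224, 224)   # batch of preprocessed MRI slices replicated to 3 channels
logits = backbone(x)
print(logits.shape)               # torch.Size([2, 4])
```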
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
This repository contains the MetaGraspNet Dataset described in the paper "MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis" (https://arxiv.org/abs/2112.14663 ).
There has been increasing interest in smart factories powered by robotics systems to tackle repetitive, laborious tasks. One particularly impactful yet challenging task in robotics-powered smart factory applications is robotic grasping: using robotic arms to grasp objects autonomously in different settings. Robotic grasping requires a variety of computer vision tasks such as object detection, segmentation, grasp prediction, pick planning, etc. While significant progress has been made in leveraging machine learning for robotic grasping, particularly with deep learning, a major challenge remains: the need for large-scale, high-quality RGBD datasets that cover a wide diversity of scenarios and permutations.
To tackle this big, diverse data problem, we are inspired by the recent rise of the metaverse concept, which has greatly closed the gap between virtual worlds and the physical world. In particular, metaverses allow us to create digital twins of real-world manufacturing scenarios and to virtually create different scenarios from which large volumes of data can be generated for training models. We present MetaGraspNet: a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis. The proposed dataset contains 100,000 images and 25 different object types, and is split into 5 difficulties to evaluate object detection and segmentation model performance in different grasping scenarios. We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance in a manner that is more appropriate for robotic grasp applications than existing general-purpose performance metrics. This repository contains the first phase of the MetaGraspNet benchmark dataset, which includes detailed object detection, segmentation, and layout annotations, and a script for the layout-weighted performance metric (https://github.com/y2863/MetaGraspNet).
(Dataset overview image: https://raw.githubusercontent.com/y2863/MetaGraspNet/main/.github/500.png)
If you use MetaGraspNet dataset or metric in your research, please use the following BibTeX entry.
BibTeX
@article{chen2021metagraspnet,
  author  = {Yuhao Chen and E. Zhixuan Zeng and Maximilian Gilles and Alexander Wong},
  title   = {MetaGraspNet: a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis},
  journal = {arXiv preprint arXiv:2112.14663},
  year    = {2021}
}
This dataset is arranged in the following file structure:
root
|-- meta-grasp
|   |-- scene0
|   |   |-- 0_camera_params.json
|   |   |-- 0_depth.png
|   |   |-- 0_rgb.png
|   |   |-- 0_order.csv
|   |   ...
|   |-- scene1
|   ...
|-- difficulty-n-coco-label.json
Each scene is a unique arrangement of objects, which we then display at various different angles. For each shot of a scene, we provide the camera parameters (x_camera_params.json), a depth image (x_depth.png), an RGB image (x_rgb.png), as well as a matrix representation of the ordering of each object (x_order.csv). The full labels for each image are available in difficulty-n-coco-label.json (where n is the difficulty level of the dataset) in the COCO data format.
The matrix describes a pairwise obstruction relationship between each object within the image. Given a "parent" object covering a "child" object:
relationship_matrix[child_id, parent_id] = -1
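A minimal sketch of reading one scene's ordering matrix and listing the occlusions it encodes, assuming the CSV stores the square matrix with comma delimiters and no header:

```python
# Hedged sketch: list which objects are covered by which in one scene.
# Assumes x_order.csv holds the square relationship matrix, comma-delimited, no header.
import numpy as np

rel = np.loadtxt("meta-grasp/scene0/0_order.csv", delimiter=",")

child_ids, parent_ids = np.where(rel == -1)   # rel[child, parent] == -1 => parent covers child
for child, parent in zip(child_ids, parent_ids):
    print(f"object {parent} covers object {child}")
```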
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Image segmentation is an important process for quantifying characteristics of malignant bone lesions, but this task is challenging and laborious for radiologists. Deep learning has shown promise in automating image segmentation in radiology, including for malignant bone lesions. The purpose of this review is to investigate deep learning-based image segmentation methods for malignant bone lesions on Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and Positron-Emission Tomography/CT (PET/CT).

Method
The literature search of deep learning-based image segmentation of malignant bony lesions on CT and MRI was conducted in PubMed, Embase, Web of Science, and Scopus electronic databases following the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). A total of 41 original articles published between February 2017 and March 2023 were included in the review.

Results
The majority of papers studied MRI, followed by CT, PET/CT, and PET/MRI. There was relatively even distribution of papers studying primary vs. secondary malignancies, as well as utilizing 3-dimensional vs. 2-dimensional data. Many papers utilize custom built models as a modification or variation of U-Net. The most common metric for evaluation was the dice similarity coefficient (DSC). Most models achieved a DSC above 0.6, with medians for all imaging modalities between 0.85–0.9.

Discussion
Deep learning methods show promising ability to segment malignant osseous lesions on CT, MRI, and PET/CT. Some strategies which are commonly applied to help improve performance include data augmentation, utilization of large public datasets, preprocessing including denoising and cropping, and U-Net architecture modification. Future directions include overcoming dataset and annotation homogeneity and generalizing for clinical applicability.
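For reference, a minimal NumPy implementation of the Dice similarity coefficient for a pair of binary lesion masks:

```python
# Dice similarity coefficient (DSC) for two binary masks.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

# Example: two partially overlapping square masks
a = np.zeros((64, 64), bool); a[10:40, 10:40] = True
b = np.zeros((64, 64), bool); b[20:50, 20:50] = True
print(round(dice(a, b), 3))
```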
https://www.emergenresearch.com/purpose-of-privacy-policy
Explore the detailed segmentation analysis of the Metal Foam market. Understand the breakdown of each segment and uncover market opportunities.
Meta Platforms continues to dominate the digital landscape, with its Family of Apps segment generating a remarkable 162.4 billion U.S. dollars in revenue for 2024. This figure underscores the company's ability to monetize its vast user base across platforms like Facebook, Instagram, Messenger, and WhatsApp, despite facing challenges in recent years. Advertising fuels growth amid market fluctuations Despite experiencing its first-ever year-on-year decline in 2022, Meta rebounded strongly in 2024, with total annual revenue reaching 164.5 billion U.S. dollars. This resilience showcases Meta's adaptability in the face of market changes and its continued appeal to advertisers seeking to reach a global audience. Expanding reach and engagement Facebook was the first social network to surpass one billion registered accounts and currently sits at more than three billion monthly active users. Additionally, 2024 saw an astounding 138.9 million Reels played on Facebook and Instagram every 60 seconds.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistical Analysis for semantic segmentation on various classifiers for dataset 2.
https://www.emergenresearch.com/purpose-of-privacy-policy
Analyze the market segmentation of the Metal Bonding Adhesives industry. Gain insights into market share distribution with a detailed breakdown of key segments and their growth.
Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details see our paper and for visual samples our project page.

Relevant computer vision tasks: bounding box detection, classification, instance segmentation, keypoint estimation, 3D bounding box estimation, 3D voxel reconstruction, and 3D reconstruction.

The dataset is for academic research use only, since it uses resources with restrictive licenses. For a detailed description of how the resources are used, we refer to our paper and project page. Licenses of the resources in detail:
- Google Scanned Objects: CC BY 4.0 (for details on which files are used, see the respective meta folder)
- Cardboard Dataset: CC BY 4.0
- Shipping Label Dataset: CC BY-NC 4.0
- Other Labels: See file misc/source_urls.json
- LDR Dataset: License for Non-Commercial Use
- Large Logo Dataset (LLD): Please notice that this dataset is made available for academic research purposes only. All the images are collected from the Internet, and the copyright belongs to the original owners. If any of the images belongs to you and you would like it removed, please kindly inform us; we will remove it from our dataset immediately.

You can use our textureless models (i.e. the obj files) of damaged parcels under CC BY 4.0 (note that this does not apply to the textures). If you use this resource for scientific research, please consider citing:

@inproceedings{naumannParcel3DShapeReconstruction2023,
  author    = {Naumann, Alexander and Hertlein, Felix and D{\"o}rr, Laura and Furmans, Kai},
  title     = {Parcel3D: Shape Reconstruction From Single RGB Images for Applications in Transportation Logistics},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2023},
  pages     = {4402-4412}
}
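A hedged sketch of browsing the COCO-format annotations with pycocotools; the annotation file name is an assumption, so check the project page for the actual layout:

```python
# Hedged sketch: browsing COCO-format instance annotations with pycocotools.
# The annotation file name is a placeholder.
from pycocotools.coco import COCO

coco = COCO("parcel3d_annotations.json")
cat_ids = coco.getCatIds()
print("categories:", [c["name"] for c in coco.loadCats(cat_ids)])

img_ids = coco.getImgIds()
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0]))
print(f"image {img_ids[0]}: {len(anns)} annotated instances")
```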
A comprehensive dataset of 15M+ images sourced globally, featuring full EXIF data, including camera settings and photography details. Enriched with object and scene detection metadata, this dataset is ideal for AI model training in image recognition, classification, and segmentation.
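As a small illustration of working with the EXIF metadata such a dataset ships with, here is a Pillow sketch that prints the named tags of one image (the file name is an assumption):

```python
# Hedged sketch: print the named EXIF tags of one image with Pillow.
# The file name is a placeholder.
from PIL import Image, ExifTags

img = Image.open("sample.jpg")
exif = img.getexif()
for tag_id, value in exif.items():
    print(ExifTags.TAGS.get(tag_id, tag_id), value)   # e.g. Make, Model, DateTime, ...
```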
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an image dataset of Anemone species. The dataset was used to develop and evaluate a tepal arrangement estimation method. The dataset consists of flower images of A. flaccida, A. hepatica, A. x hybrida, A. nikoensis, A. pulsatilla, and A. soyensis. These images were taken at locations in Shiga, Kyoto, Hyogo, Okayama, and Hiroshima prefectures, using an Olympus TG-5 digital camera and a Nikon D5200 SLR camera. All images were recorded in PNG format. The images were normalized to 1,000 × 662 pixels.
India Base Metal Mining Market Segmentation: India's Mineral Conservation and Development Rules (MCDR), revised in 2021, emphasize sustainable mining practices, including stricter requirements for mineral conservation, waste management, and rehabilitation of mined land. In 2023, compliance audits revealed that 95% of mining companies met these requirements, showcasing improvements in environmental stewardship. The Ministry of Mines monitors these regulations, ensuring that mining companies adhere to guidelines aimed at minimizing environmental impact and ensuring long-term mineral sustainability.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Doodleverse/Segmentation Zoo/Seg2Map Res-UNet models for DeepGlobe/7-class segmentation of RGB 512x512 high-res. images
These Residual-UNet model data are based on the [DeepGlobe dataset](https://www.kaggle.com/datasets/balraj98/deepglobe-land-cover-classification-dataset)
Models have been created using Segmentation Gym* using the following dataset**: https://www.kaggle.com/datasets/balraj98/deepglobe-land-cover-classification-dataset
Image size used by model: 512 x 512 x 3 pixels
classes:
1. urban
2. agricultural
3. rangeland
4. forest
5. water
6. bare
7. unknown
File descriptions
For each model, there are 5 files with the same root name:
1. '.json' config file: this is the file that was used by Segmentation Gym* to create the weights file. It contains instructions for how to make the model and the data it used, as well as instructions for how to use the model for prediction. It is a handy wee thing and mastering it means mastering the entire Doodleverse.
2. '.h5' weights file: this is the file that was created by the Segmentation Gym* function `train_model.py`. It contains the trained model's parameter weights. It can be called by the Segmentation Gym* function `seg_images_in_folder.py`. Models may be ensembled.
3. '_modelcard.json' model card file: this is a json file containing fields that collectively describe the model origins, training choices, and dataset that the model is based upon. There is some redundancy between this file and the `config` file (described above) that contains the instructions for the model training and implementation. The model card file is not used by the program, but it is important metadata, so it should be kept with the other files that collectively make up the model and as such is considered part of the model.
4. '_model_history.npz' model training history file: this numpy archive file contains numpy arrays describing the training and validation losses and metrics. It is created by the Segmentation Gym function `train_model.py` (a minimal sketch for loading these files appears below).
5. '.png' model training loss and mean IoU plot: this png file contains plots of training and validation losses and mean IoU scores during model training. It is a subset of the data inside the .npz file. It is created by the Segmentation Gym function `train_model.py`.
Additionally, BEST_MODEL.txt contains the name of the model with the best validation loss and mean IoU.
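A minimal sketch, assuming placeholder file names for one model's root name, of inspecting the config file and training history described above:

```python
# Minimal sketch: inspecting a model's config and training-history files.
# File names are placeholders for one model's root name.
import json
import numpy as np

# The '.json' config file holds the training/prediction settings as a flat dictionary.
with open("deepglobe_resunet.json") as f:
    config = json.load(f)
print(sorted(config)[:10], "...")   # peek at the available settings

# The '_model_history.npz' archive holds the loss/metric curves as named arrays.
history = np.load("deepglobe_resunet_model_history.npz")
print("arrays stored:", history.files)
```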
References
*Segmentation Gym: Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332 See: https://github.com/Doodleverse/segmentation_gym
**Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D. and Raskar, R., 2018. Deepglobe 2018: A challenge to parse the earth through satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 172-181).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data record contains 5 zip files, all used to build and use a semantic segmentation model that operates on beach imagery taken at the Field Research Facility (FRF) in Duck, North Carolina, USA. All data are from 2015-2021.

The training_data.zip file contains all data used to train the ML model. All images come from the north-facing (c1) camera. This zip file includes: a list of classes used to label the imagery, and folders of 107 images, 107 sparse annotations (doodles), 107 labels, and 107 overlays. All labeling was done with the open-source labeling tool 'Doodler' (Buscombe et al., 2021).

The model.zip file contains the ML model and associated metadata. This includes: a JSON model configuration file, a figure showing model training statistics, an .npz file of model training output, a list of training and validation files, and the model as an h5 file and in the TensorFlow 'saved model' format. All modeling was done with Segmentation Gym (Buscombe & Goldstein 2022).

The test_data_c6.zip file contains all data from the south-facing (c6) camera used to test the ML model. This includes: a list of classes used to label the imagery, and folders of 10 images, 10 sparse annotations (doodles), 10 labels, and 10 overlays. All labeling was done with the open-source labeling tool 'Doodler' (Buscombe et al., 2021). Testing the model with this data was done with codes in: https://github.com/ebgoldstein/FRF_GrainSize

The test_data_c1.zip file contains all data from the north-facing (c1) camera used to test the ML model. This includes: a list of classes used to label the imagery, and folders of 10 images, 10 sparse annotations (doodles), 10 labels, and 10 overlays. All labeling was done with the open-source labeling tool 'Doodler' (Buscombe et al., 2021). Testing the model with this data was done with codes in: https://github.com/ebgoldstein/FRF_GrainSize

The predictions.zip file contains 4418 images from the north-facing (c1) camera that were run through the trained segmentation model, as well as the resulting output (presented as side-by-side image and overlays). These images were created using codes in Segmentation Gym (Buscombe & Goldstein 2022).
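A hedged sketch of a per-class IoU check between a ground-truth label image and a model prediction, assuming both are single-band integer label rasters of equal size with matching names (the linked FRF_GrainSize repository contains the evaluation code actually used):

```python
# Hedged sketch: per-class IoU between a ground-truth label raster and a prediction.
# File paths are placeholders; both images are assumed to be integer label rasters.
import numpy as np
from PIL import Image

truth = np.array(Image.open("labels/c6_example_label.png"))
pred = np.array(Image.open("predictions/c6_example_label.png"))

for cls in np.unique(truth):
    t, p = truth == cls, pred == cls
    union = np.logical_or(t, p).sum()
    iou = np.logical_and(t, p).sum() / union if union else float("nan")
    print(f"class {cls}: IoU = {iou:.3f}")
```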
https://www.emergenresearch.com/purpose-of-privacy-policy
Explore the detailed segmentation analysis of the Surface Treatments For Metal market. Understand the breakdown of each segment and uncover market opportunities.
This prostate MRI segmentation dataset is collected from six different data sources.