Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NPM3D is a public benchmark for point cloud semantic segmentation with 10 classes: ground, building, pole (road sign and traffic light), bollard, trash can, barrier, pedestrian, car, natural (vegetation), and unclassified. Results are evaluated only w.r.t. the 9 classes other than "unclassified". The data was captured with a mapping-grade mobile laser scanning system in different cities in France. There are 4 regions designated for training, all captured in Paris and Lille, and 3 regions for testing, captured in Dijon and Ajaccio. The standard 10-class version described above was derived from a more fine-grained version of the dataset by keeping only the most frequent labels: the original annotations feature 50 different semantic classes (most of which are very rare), as well as individual object instance labels for the training regions. For panoptic segmentation, a new version has been generated that still uses the 10 semantic category labels listed above but also includes instance labels. The classes ground, building, and barrier are considered "stuff" and are not separated into instances. As no instance labels are available for the 3 test regions, our version for panoptic (or pure instance) segmentation only contains the 4 regions from Paris and Lille; instead of a fixed training/test split, all experiments therefore use 4-fold cross-validation.
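As an illustration of this evaluation protocol, the sketch below computes per-class IoU over the 9 evaluated classes while ignoring points whose ground truth is "unclassified". The label indices and array names are assumptions for the example, not part of the official benchmark toolkit.

```python
import numpy as np

# Hypothetical label convention: 0 = unclassified, 1..9 = the evaluated classes.
CLASS_NAMES = ["ground", "building", "pole", "bollard", "trash can",
               "barrier", "pedestrian", "car", "natural"]

def per_class_iou(pred, gt, num_classes=9, ignore_label=0):
    """IoU per class, ignoring points whose ground truth is `ignore_label`."""
    valid = gt != ignore_label
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(1, num_classes + 1):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        ious.append(inter / union if union > 0 else np.nan)
    return dict(zip(CLASS_NAMES, ious))

# Example with random dummy labels; real labels would come from the benchmark files.
rng = np.random.default_rng(0)
gt = rng.integers(0, 10, size=100_000)
pred = rng.integers(1, 10, size=100_000)
print(per_class_iou(pred, gt))
```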
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Insect Hotel Dataset is a photorealistic synthetic dataset for pose estimation and panoptic segmentation tasks. It contains 20,000 synthetically generated images of objects used in a human-robot collaborative assembly scenario. The dataset was created using NViSII. It also includes the 3D object meshes and YOLOv8 model weights.
This dataset accompanies the following upcoming publication:
Juan Carlos Saborío, Marc Vinci, Oscar Lima, Sebastian Stock, Lennart Niecksch, Martin Günther, Joachim Hertzberg, and Martin Atzmüller (2025): “Uncertainty-Resilient Active Intention Recognition for Robotic Assistants”. (submitted)
To make downloading easier, the dataset has been split into 10 parts. Each part is further divided into three archives:
RGB images + JSON annotations
Depth images (optional)
Instance segmentation images (optional)
To use the complete dataset, download all 30 archives and extract them into the same root folder, so that the depth and segmentation images are located alongside the corresponding RGB and JSON files.
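A minimal sketch of this extraction step, assuming all archives have already been downloaded into the current directory (the destination folder name is a placeholder):

```python
import glob
import tarfile

# Extract every downloaded archive into one common root folder, so that depth
# and segmentation images end up next to the corresponding RGB/JSON files.
DEST = "insect_hotel_20k"  # placeholder destination folder

for archive in sorted(glob.glob("insect_hotel_20k_*.tgz")):
    print(f"extracting {archive} ...")
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(DEST)
```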
The dataset format (coordinate systems, conventions, and JSON fields) follows the structure documented here.
Contents of the archives:
.
├── insect_hotel_20k_00.tgz            # RGB images + annotation JSON files
│   └── 00                             # archive index (00...09)
│       ├── 0000                       # scene index (0000...0099), each with 20 images in front of the same background
│       │   ├── 00000.jpg              # RGB image
│       │   ├── 00000.json             # pose, bounding boxes, etc.
│       │   ├── [...]
│       │   ├── 00019.jpg
│       │   ├── 00019.json
│       │   ├── _camera_settings.json  # camera intrinsics
│       │   └── _object_settings.json  # object metadata
│       ├── [...]
│       └── 0099
├── insect_hotel_20k_00.depth.tgz      # Depth images (.exr)
│   └── 00
│       └── 0000
│           ├── 00000.depth.exr
│           └── [...]
├── insect_hotel_20k_00.seg.tgz        # Instance segmentation images (.exr)
│   └── 00
│       └── 0000
│           ├── 00000.seg.exr
│           └── [...]
└── insect_hotel_20k_01.tgz
    └── 01
        └── 0000
            ├── 00000.jpg
            ├── 00000.json
            └── [...]
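The sketch below loads one sample from the extracted folder structure. Only the file-level layout shown above is used; the JSON keys themselves follow the linked format documentation and are not assumed here. Reading the .exr files via OpenCV requires an OpenEXR-enabled build, so that part is shown only as an optional, commented-out step.

```python
import json
import os
from PIL import Image

root = "insect_hotel_20k"  # placeholder: folder the archives were extracted into
sample = os.path.join(root, "00", "0000", "00000")

rgb = Image.open(sample + ".jpg")          # RGB image
with open(sample + ".json") as f:
    annotation = json.load(f)              # pose, bounding boxes, etc.
with open(os.path.join(root, "00", "0000", "_camera_settings.json")) as f:
    camera = json.load(f)                  # camera intrinsics

print(rgb.size, list(annotation.keys()), list(camera.keys()))

# Optional: depth / instance segmentation (.exr). OpenCV reads EXR only if
# OpenEXR support is enabled in your build (an assumption about your setup):
# os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"
# import cv2
# depth = cv2.imread(sample + ".depth.exr", cv2.IMREAD_UNCHANGED)
# seg = cv2.imread(sample + ".seg.exr", cv2.IMREAD_UNCHANGED)
```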
The file meshes.tgz contains all object meshes used for training.
bright_green_part
dark_green_part
magenta_part
purple_part
red_part
yellow_part
klt — “Kleinladungsträger” (small load carrier / blue box)
multimeter
power_drill_with_grip
relay
screwdriver
Additionally, the images include various distractor objects from the Google Scanned Objects (GSO) dataset. The corresponding meshes are not included here but can be obtained directly from the GSO dataset.
The file yolov8_weights.tgz contains a YOLOv8 model that was trained on a subset of the object classes. The class index mapping is as follows:
0: bright_green_part
1: dark_green_part
2: magenta_part
3: purple_part
4: red_part
5: yellow_part
6: klt
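A hedged inference sketch using the ultralytics package together with the class mapping above; the weight file name inside yolov8_weights.tgz and the test image path are placeholders, not names taken from the archives.

```python
from ultralytics import YOLO

# Index -> name mapping as listed above.
CLASS_NAMES = {0: "bright_green_part", 1: "dark_green_part", 2: "magenta_part",
               3: "purple_part", 4: "red_part", 5: "yellow_part", 6: "klt"}

model = YOLO("yolov8_weights/best.pt")   # placeholder path to the extracted weights
results = model("00/0000/00000.jpg")     # placeholder test image

for box in results[0].boxes:
    cls_id = int(box.cls)
    print(CLASS_NAMES[cls_id], float(box.conf), box.xyxy.tolist())
```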
Helper utilities for converting the DOPE format to YOLO format, along with scripts for training, inference, and visualization, are available via:
git clone -b insect_hotel https://github.com/DFKI-NI/yolo8_keypoint_utils.git
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The following parameters are static, and their respective columns are hidden: we use our proposed training configuration, the loss function is the binary cross entropy, no augmentation is performed, DEF selection is performed with Joint Optimization (JO), and we use the Meyer Watershed (MWS) for CSE.
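For reference, the binary cross-entropy used as the loss function here is the standard pixel-wise form; the notation below is generic rather than taken from the dataset:

```latex
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log \hat{y}_i + (1 - y_i)\log\big(1 - \hat{y}_i\big) \Big]
```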
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset for DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
Project Page | arXiv
File Structure
File            Optional  Description
pickled_data              Raw data (images, etc.) from InstPIFu
instpifu_mask             Instance masks from InstPIFu
metadata                  JSONL metadata for scenes
panoptic                  Panoptic segmentation maps we rendered
depth           ✅        Estimated depth with Depth Pro
grounded_sam    ✅        Estimated segmentation with Grounded SAM…
See the full description on the dataset page: https://huggingface.co/datasets/zx1239856/DepR-3D-FRONT.
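The JSONL scene metadata can be iterated line by line; the file path below is hypothetical, since the exact file names are only documented on the dataset page.

```python
import json

# Hypothetical path: one JSON object per line, each describing a scene.
with open("metadata/scenes.jsonl") as f:
    for line in f:
        scene = json.loads(line)
        print(scene.keys())
        break  # inspect just the first record
```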
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO semantic segmentation maps
This dataset contains semantic segmentation maps (monochrome images where each pixel corresponds to one of the 133 COCO categories used for panoptic segmentation). It was generated from the 2017 validation annotations using the following process:
git clone https://github.com/cocodataset/panopticapi and install it, then run:
python converters/panoptic2semantic_segmentation.py --input_json_file /data/datasets/coco/2017/annotations/panoptic_val2017.json…
See the full description on the dataset page: https://huggingface.co/datasets/enterprise-explorers/coco-semantic-segmentation.
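To inspect one of the resulting maps, a minimal sketch (the file name is a placeholder; the exact mapping from pixel values to the 133 panoptic categories follows the dataset card):

```python
import numpy as np
from PIL import Image

# Placeholder file name; each map is a monochrome image whose pixel values
# encode COCO panoptic categories.
seg = np.array(Image.open("000000000139.png"))

print("shape:", seg.shape)
print("category values present:", np.unique(seg))
```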
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The following parameters are static, and their respective columns are hidden: model architecture is U-Net (trained from scratch), we use the improved training variant, the loss function is the binary cross entropy, the best DEF is selected using joint optimization, and Meyer Watershed (MWS) is used for CSE.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The following parameters are static, and their respective columns are hidden: we use the Meyer Watershed (MWS) for CSE and Joint Optimization (JO) for DEF selection, we use our proposed training configuration, no augmentation is performed. For the architectures, * indicates pre-trained variants: the network is trained first using binary cross-entropy, then using a custom loss.
https://github.com/DISIC/politique-de-contribution-open-source/blob/master/LICENSE.pdf
To enhance the spatial resolution and utility of the PASTIS-R dataset, we introduce PASTIS-HD, which integrates contemporaneous VHR satellite images (SPOT 6-7), resampled to a 1 m resolution and converted to 8 bits. This enhancement significantly improves the dataset's spatial content, providing more granular information for agricultural parcel segmentation.
This folder can be added to the PASTIS-R dataset to obtain the PASTIS-HD version.
The SPOT images are open data thanks to the Dataterra DINAMIS initiative, under the "Couverture France DINAMIS" program.
If you use PASTIS please cite the related paper:
@article{garnot2021panoptic,
title={Panoptic Segmentation of Satellite Image Time Series
with Convolutional Temporal Attention Networks},
author={Sainte Fare Garnot, Vivien and Landrieu, Loic },
journal={ICCV},
year={2021}
}
For the PASTIS-R optical-radar fusion dataset, please also cite this paper:
@article{garnot2021mmfusion,
  title = {Multi-modal temporal attention models for crop mapping from satellite time series},
  journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
  year = {2022},
  doi = {https://doi.org/10.1016/j.isprsjprs.2022.03.012},
  author = {Vivien {Sainte Fare Garnot} and Loic Landrieu and Nesrine Chehata},
}
For the PASTIS-HD with the 3 modality optical-radar time series plus VHR images dataset, please also cite this paper:
@article{astruc2024omnisat,
title={Omni{S}at: {S}elf-Supervised Modality Fusion for {E}arth Observation},
author={Astruc, Guillaume and Gonthier, Nicolas and Mallet, Clement and Landrieu, Loic},
journal={arXiv preprint arXiv:2404.08351},
year={2024}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The following parameters are static, and their respective columns are hidden: we use the Meyer Watershed (MWS) for CSE and Joint Optimization (JO) for DEF selection, we use our proposed training configuration, the loss function is the binary cross entropy, no augmentation is performed. For the architectures, * indicates pre-trained variants.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
COST Dataset
The COST dataset includes the following components for training and evaluating MLLMs on object-level perception tasks:
RGB Images obtained from the COCO-2017 dataset.
Segmentation Maps for semantic, instance, and panoptic segmentation tasks, obtained using the publicly available DiNAT-L OneFormer model trained on the COCO dataset.
Questions obtained by prompting GPT-4 for object identification and object order perception tasks.
You can find the questions in… See the full description on the dataset page: https://huggingface.co/datasets/shi-labs/COST.
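A hedged loading sketch using the Hugging Face datasets library; the available configurations and splits are not listed here, so none are assumed beyond the repository ID given above.

```python
from datasets import load_dataset

# Load the COST dataset from the Hugging Face Hub; see the dataset page for
# the available configurations and splits (a specific one may need to be named).
ds = load_dataset("shi-labs/COST")
print(ds)
```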
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The training configuration from [5] is indicated as “Original”, while our proposed method is indicated as “Proposed”. The following parameters are static, and their respective columns are hidden: the CSE used is a naive connected component labelling ([5] used a grid search to find the best threshold θ for EPM binarization while we use a fixed value of 0.5), the loss function is the binary cross entropy, the best DEF is selected using the protocol of [5], no augmentation is performed. For the architectures, * indicates pre-trained variants.