12 datasets found
  1. SceneNet RGB-D Dataset

    • paperswithcode.com
    Cite
    SceneNet RGB-D Dataset [Dataset]. https://paperswithcode.com/dataset/scenenet-rgb-d
    311 scholarly articles cite this dataset (View in Google Scholar)
    Description

    SceneNet RGB-D is a synthetic dataset containing large-scale photorealistic renderings of indoor scene trajectories with pixel-level annotations. Random sampling permits virtually unlimited scene configurations, and the dataset creators provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which has previously been limited by the relatively small labelled datasets NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as a proxy for a SLAM system.

  2. Data from: Synthetic Data for Non-rigid 3D Reconstruction using a Moving...

    • data.csiro.au
    • researchdata.edu.au
    Updated Sep 13, 2018
    Cite
    Shafeeq Elanattil; Peyman Moghadam (2018). Synthetic Data for Non-rigid 3D Reconstruction using a Moving RGB-D Camera [Dataset]. http://doi.org/10.25919/5b7b60176d0cd
    Dataset updated
    Sep 13, 2018
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Shafeeq Elanattil; Peyman Moghadam
    License

    CSIRO Data Licence: https://research.csiro.au/dap/licences/csiro-data-licence/

    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Queensland University of Technology
    Description

    We introduce a synthetic dataset for evaluating non-rigid 3D reconstruction using a moving RGB-D camera. The dataset consists of two subjects captured with four different camera trajectories. For each case we provide frame-by-frame ground-truth geometry of the scene, the camera trajectory, and the foreground mask. This synthetic data was part of the paper "Non-rigid reconstruction with a single moving RGB-D camera" published at ICPR 2018. If you use this dataset, please cite the paper and this collection. More information can be found in the supporting documents.

  3. Synthetic RGB-D data for plant segmentation

    • kaggle.com
    Updated Nov 7, 2020
    Cite
    El Baha Farouk (2020). Synthetic RGB-D data for plant segmentation [Dataset]. https://www.kaggle.com/harlequeen/synthetic-rgbd-images-of-plants/discussion
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 7, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    El Baha Farouk
    License

    CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Nowadays, AI has revolutionized almost every field, but the lack of data remains a barrier to this revolution. Fortunately, we can address this problem by generating synthetic data to train our models on, then fine-tuning with the few real samples that we possess. This dataset was built and generated for the purpose of training deep neural networks for semantic and instance segmentation of plants in the agricultural field.

    Content

    A dataset of 224x224 RGB-D images of plants. It contains 5 sub-folders: rgb, depth, semantic, instances masks, and node. Each plant has 5 classes (leaf, petiole, stem, fruit, and background). It could be used in different computer vision and deep learning projects, such as plant detection, measuring objects using depth, robotics, and making agriculture smart.
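    The exact file naming inside the sub-folders is not documented here, so the following is only a hypothetical loading sketch; it assumes PNG files that share the same name across the rgb, depth, and semantic folders:

      import numpy as np
      from pathlib import Path
      from PIL import Image

      root = Path("synthetic_rgbd_plants")          # hypothetical extraction folder

      for rgb_path in sorted((root / "rgb").glob("*.png")):
          rgb = np.asarray(Image.open(rgb_path))                          # 224x224x3 colour image
          depth = np.asarray(Image.open(root / "depth" / rgb_path.name))  # per-pixel depth
          sem = np.asarray(Image.open(root / "semantic" / rgb_path.name)) # class ids (leaf, petiole, stem, fruit, background)
          print(rgb_path.name, rgb.shape, depth.dtype, np.unique(sem))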

    Inspiration

    If this dataset was useful to you, please let me know, and tell me what project you used it in!

  4. GESDPD depth people detection dataset

    • kaggle.com
    Updated Jan 28, 2020
    Cite
    David Fuentes Jimenez (2020). GESDPD depth people detection dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/915718
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 28, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    David Fuentes Jimenez
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    1) Introduction

    GESDPD is a database of depth images containing 22,000 frames that simulate a sensor placed in an elevated frontal position in an indoor environment. It was designed to fulfill the following objectives:
    • Allow the training and evaluation of people-detection algorithms based on depth or RGB-D data without the need for manual labelling.
    • Provide quality synthetic data to the research community for people-detection tasks.
    The people-detection task can also be extended to practical applications such as video surveillance, access control, people flow analysis, behaviour analysis, or event capacity management.

    2) General contents

    GESDPD is composed of 22,000 synthetic depth images that simulate a sensor placed in an elevated frontal position in a rectangular, indoor working environment. These have been generated using the simulation software Blender. The synthetic images show a room with different persons walking in different directions. The camera perspective is not stationary: it moves around the room over the course of the dataset, which avoids a constant background. Some examples of the different views are shown on the dataset page.

    3) Quantitative details on the database content are provided below.

    • Number of frames: 22000

    • Number of different people: 4 (3 men and 1 woman)

    • Number of labeled people: 20800

    • Image resolution: 320 × 240 pixels

    For each image, the depth map and the ground truth, including the position of each person in the scene, are provided.

    To give an idea of what to expect, the dataset page includes some example images, in which depth values are represented in millimeters using a colormap.

    4) Geometry details

    As mentioned before, the dataset simulates a sensor in an elevated frontal position in a rectangular indoor working environment. Specifically, the camera was placed at a height of 3 meters, and it rotates along the sequence. The room's dimensions are 8.56 × 5.02 m, with a height of 3.84 m (its layout is shown on the dataset page).

    5) File Formats

    5.1) Depth data

    The depth information (distance to the camera plane) is stored as a .png image, in which each pixel represents the depth value in millimeters as a two-byte (little-endian) unsigned integer. Its values range from 0 to 15000.
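    As a minimal reading sketch (assuming OpenCV and a hypothetical file name; this is not an official loader), the 16-bit values can be loaded without losing precision as follows:

      import cv2
      import numpy as np

      # IMREAD_UNCHANGED keeps the two-byte unsigned pixel values intact.
      depth_mm = cv2.imread("depth_000001.png", cv2.IMREAD_UNCHANGED)   # hypothetical file name
      assert depth_mm is not None and depth_mm.dtype == np.uint16       # 320 x 240, values 0..15000
      depth_m = depth_mm.astype(np.float32) / 1000.0                    # millimetres -> metres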

    5.2) Position Ground Truth Data

    The ground truth information is also provided as a .png file with the same dimensions as the generated images (320 × 240 pixels). The ground truth files carry the same number in their names as the corresponding depth files. To label people positions, a Gaussian function is placed over the centroid of each person's head in the scene, so that the centroid corresponds to the 2D position of the center of the head and has a normalized value of one. The standard deviation has a value of 15 pixels for all the Gaussians, regardless of the size of each head and the distance from the head to the camera. This value was calculated from an estimate of the average diameter of a person's head, taking anthropometric considerations into account. It is worth highlighting that, when two heads are very close or overlapping, the maximum of the two Gaussian functions is taken rather than their sum. That modification yields a set of Gaussians that are always separated, so that the CNN can learn to generate that separation between Gaussians in its output. An example with two Gaussian functions is shown on the dataset page.
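    The construction can be sketched as follows (an assumed re-implementation of the description above, not the authors' code): one Gaussian per head centroid with sigma = 15 pixels, merged with an element-wise maximum rather than a sum.

      import numpy as np

      def gaussian_gt_map(head_centroids, shape=(240, 320), sigma=15.0):
          """head_centroids: list of (x, y) pixel positions of head centres."""
          ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
          gt = np.zeros(shape, dtype=np.float32)
          for cx, cy in head_centroids:
              g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
              gt = np.maximum(gt, g)        # maximum, not sum, keeps nearby heads separated
          return gt                         # value 1.0 at each centroid

      gt = gaussian_gt_map([(100, 80), (112, 84)])   # two overlapping heads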

    6) Disclaimer, Licensing, Request and Contributions This document and the data provided are work in progress and provided as is. The GEINTRA Synthetic Depth People Detection (GESDPD) Database (and accompanying files and documentation) by David Fuentes-Jiménez, Roberto Martín-López, Cristina Losada-Gutiérrez, Javier Macías-Guarasa and Carlos A. Luna is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

    If you make use of this database and/or its related documentation,...

  5. CHOC: The CORSMAL Hand-Occluded Containers dataset

    • zenodo.org
    • explore.openaire.eu
    • +1more
    bin, zip
    Updated Sep 11, 2023
    + more versions
    Cite
    Xavier Weber; Tommaso Apicella; Alessio Xompero; Andrea Cavallaro (2023). CHOC: The CORSMAL Hand-Occluded Containers dataset [Dataset]. http://doi.org/10.5281/zenodo.8332421
    Available download formats: zip, bin
    Dataset updated
    Sep 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xavier Weber; Tommaso Apicella; Alessio Xompero; Andrea Cavallaro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CORSMAL Hand-Occluded Containers (CHOC) is an image-based dataset for category-level 6D object pose and size estimation, affordance segmentation, object detection, object and arm segmentation, and hand+object reconstruction. The dataset has 138,240 pseudo-realistic composite RGB-D images of hand-held containers on top of 30 real backgrounds (mixed-reality set) and 3,951 RGB-D images selected from the CORSMAL Container Manipulation (CCM) dataset (real set). CHOC-AFF is the subset that focuses on the problem of visual affordance segmentation. CHOC-AFF consists of the RGB images, the object and arm segmentation masks, and the affordance segmentation masks.

    The images of the mixed-reality set are automatically rendered using Blender, and are split into 129,600 images of handheld containers and 8,640 images of objects without hand. Only one synthetic container is rendered for each image. Images are evenly split among 48 unique synthetic objects from three categories, namely 16 boxes, 16 drinking containers without stem (nonstems) and 16 drinking containers with stems (stems), selected from ShapeNetSem. For each object, 6 realistic grasps were manually annotated using GraspIt!: bottom grasp, natural grasp, and top grasp for the left and right hand. The mixed-reality set provides RGB images, depth images, segmentation masks (hand and object), normalised object coordinates images (only object), object meshes, annotated 6D object poses (orientation and translation in 3D with respect to the camera view), and grasp meshes with their MANO parameters. Each image has a resolution of 640x480 pixels. Background images were acquired using an Intel RealSense D435i depth camera, and include 15 indoor and 15 outdoor scenes. All information necessary to re-render the dataset is provided, namely backgrounds, camera intrinsic parameters, lighting, object models, and hand + forearm meshes, and poses; users can complement the existing data with additional annotations. Note: The mixed-reality set was built on top of previous works for the generation of synthetic and mixed-reality datasets, such as OBMan and NOCS-CAMERA.
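    For orientation, the 6D pose annotations (rotation and translation with respect to the camera view) can be combined with the camera intrinsics to project object-model points into the image. The snippet below is a generic pinhole-projection illustration with made-up intrinsics, not the CHOC toolkit API:

      import numpy as np

      def project_points(points_obj, R, t, K):
          """points_obj: (N, 3) object-space points; R: (3, 3); t: (3,); K: (3, 3) intrinsics."""
          pts_cam = points_obj @ R.T + t          # object frame -> camera frame
          uvw = pts_cam @ K.T                     # pinhole projection
          return uvw[:, :2] / uvw[:, 2:3]         # (N, 2) pixel coordinates

      # Hypothetical intrinsics for a 640x480 image; the dataset ships its own parameters.
      K = np.array([[600.0,   0.0, 320.0],
                    [  0.0, 600.0, 240.0],
                    [  0.0,   0.0,   1.0]])
      uv = project_points(np.random.rand(200, 3) * 0.1, np.eye(3), np.array([0.0, 0.0, 0.5]), K)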

    The images of the real set are selected from 180 representative sequences of the CCM dataset. Each image contains a person holding one of the 15 containers during a manipulation occurring in the video prior to a handover (e.g., picking up an empty container, shaking an empty or filled food box, or pouring content into a cup or drinking glass). For each object instance, sequences were chosen under four randomly sampled conditions, including background and lighting conditions, scenarios (person sitting with the object on the table; person sitting and already holding the object; person standing while holding the container and then walking towards the table), and filling amount and type. The same sequence is selected from the three fixed camera views (two side views and one frontal view) of the CCM setup (60 sequences for each view). Fifteen sequences exhibit the empty container case, one for each of the fifteen objects, whereas the other sequences have the person filling the container with either pasta, rice, or water at 50% or 90% of the full container capacity. The real set has RGB images, depth images, and 6D pose annotations. For each sequence, the 6D poses of the containers are manually annotated every 10 frames if the container is visible in at least two views, resulting in a total of 3,951 annotations. Annotations of the 6D poses for the intermediate frames are also provided via interpolation.

    Contacts
    For enquiries, questions, or comments, please contact Alessio Xompero. For enquiries, questions, or comments about CHOC-AFF, please contact Tommaso Apicella.

    References
    If you work on Visual Affordance Segmentation and you use the subset CHOC-AFF, please see the related work on ACANet and also cite:
    Affordance segmentation of hand-occluded containers from exocentric images
    T. Apicella, A. Xompero, E. Ragusa, R. Berta, A. Cavallaro, P. Gastaldo
    IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023

    Additional resources
    Webpage of 6D pose estimation using CHOC
    Toolkit to parse and inspect the dataset, or generate new data

    Release notes
    2023/09/10
    - Added object affordance segmentation masks

    2023/02/08
    - Fixed NOCS maps due to a missing rotation during the generation
    - Fixed annotations to include the missing rotation

    2023/01/09
    - Fixed RGB_070001_80000 (wrong files previously)

    2022/12/14
    - Added a mapping dictionary from grasp-IDs to their corresponding MANO-parameters-IDs to grasp.zip
    - Added object meshes with the NOCS textures/material in object_models.zip
    - Fixed folder name in annotations.zip
    - Updated README file to include these changes and fix a typo in the code block to unzip files

  6. Data from: SynthSoM: A synthetic intelligent multi-modal...

    • springernature.figshare.com
    bin
    Updated May 20, 2025
    Cite
    Xiang Cheng; Ziwei Huang; Yong Yu; Lu Bai; Mingran Sun; Zengrui Han; Ruide Zhang; Sijiang Li (2025). SynthSoM: A synthetic intelligent multi-modal sensing-communication dataset for Synesthesia of Machines (SoM) [Dataset]. http://doi.org/10.6084/m9.figshare.28123646.v1
    Available download formats: bin
    Dataset updated
    May 20, 2025
    Dataset provided by
    figshare
    Authors
    Xiang Cheng; Ziwei Huang; Yong Yu; Lu Bai; Mingran Sun; Zengrui Han; Ruide Zhang; Sijiang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Given the importance of datasets for sensing-communication integration research, a novel simulation platform for constructing communication and multi-modal sensory datasets is developed. The developed platform integrates three high-precision software tools, i.e., AirSim, WaveFarer, and Wireless InSite, and further achieves their in-depth integration and precise alignment. Based on the developed platform, a new synthetic intelligent multi-modal sensing-communication dataset for Synesthesia of Machines (SoM), named SynthSoM, is proposed. The SynthSoM dataset contains various air-ground multi-link cooperative scenarios with comprehensive conditions, including multiple weather conditions, times of day, intelligent agent densities, frequency bands, and antenna types. The SynthSoM dataset encompasses multiple data modalities, including radio-frequency (RF) channel large-scale and small-scale fading data, RF millimeter wave (mmWave) radar sensory data, and non-RF sensory data, e.g., RGB images, depth maps, and light detection and ranging (LiDAR) point clouds. The quality of the SynthSoM dataset is validated via statistics-based qualitative inspection and via machine learning (ML) evaluation metrics computed against real-world measurements. The SynthSoM dataset is open-sourced and provides consistent data for cross-comparing SoM-related algorithms.

  7. PEGASET

    • zenodo.org
    zip
    Updated Jul 19, 2024
    Cite
    Lukas Meyer; Floris Erich (2024). PEGASET [Dataset]. http://doi.org/10.5281/zenodo.12625040
    Available download formats: zip
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lukas Meyer; Floris Erich
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce Physically Enhanced GAussian Splatting SimUlation System (PEGASUS) for 6DOF object pose dataset generation, a versatile dataset generator based on 3D Gaussian Splatting.

    Preparation starts by separate scanning of both environments and objects. PEGASUS allows the composition of new scenes by merging the respective underlying Gaussian Splatting point cloud of an environment with one or multiple objects. Leveraging a physics engine enables the simulation of natural object placement within a scene by interacting with their extracted mesh. Consequently, an extensive amount of new scenes - static or dynamic - can be created by combining different environments and objects. By rendering scenes from various perspectives, diverse data points such as RGB images, depth maps, semantic masks, and 6DoF object poses can be extracted. Our study demonstrates that training on data generated by PEGASUS surpasses the performance of existing 6DoF pose estimation networks such as deep object pose.

    Furthermore, our sim-to-real approach validates the successful transfer of tasks from synthetic data to real-world data. Moreover, we introduce the CupNoodle dataset, comprising 30 Japanese cup noodle items. This dataset includes spherical scans that capture images from both object hemispheres, as well as the Gaussian Splatting reconstructions, making them compatible with PEGASUS.

  8. NYU Hand

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Mar 24, 2023
    Cite
    New York University (2023). NYU Hand [Dataset]. https://opendatalab.com/OpenDataLab/NYU_Hand
    Available download formats: zip (97793852238 bytes)
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    New York University
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The NYU Hand pose dataset contains 8252 test-set and 72757 training-set frames of captured RGBD data with ground-truth hand-pose information. For each frame, the RGBD data from 3 Kinects is provided: a frontal view and 2 side views. The training set contains samples from a single user only (Jonathan Tompson), while the test set contains samples from two users (Murphy Stein and Jonathan Tompson). A synthetic re-creation (rendering) of the hand pose is also provided for each view.

  9. Data from: AcuSim: a synthetic dataset for cervicocranial acupuncture points...

    • search.dataone.org
    • datadryad.org
    Updated Apr 2, 2025
    Cite
    Qilei Sun; Jiatao Ma; Paul Craig; Linjun Dai; EngGee Lim (2025). acusim: a synthetic dataset for cervicocranial acupuncture points localisation [Dataset]. http://doi.org/10.5061/dryad.zs7h44jkz
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Qilei Sun; Jiatao Ma; Paul Craig; Linjun Dai; EngGee Lim
    Description

    The locations of acupuncture points (acupoints) differ among human individuals due to variations in factors such as height, weight, and fat proportions. However, acupoint annotation is expert-dependent, labour-intensive, and highly expensive, which limits the data size and detection accuracy. In this paper, we introduce the "AcuSim" dataset as a new synthetic dataset for the task of localising points on the human cervicocranial area from an input image, using an automatic render and labelling pipeline during acupuncture treatment. It includes the creation of 63,936 RGB-D images and 504 synthetic anatomical models with 174 volumetric acupoints annotated, to capture the variability and diversity of human anatomies. The study validates a convolutional neural network (CNN) on the proposed dataset with an accuracy of 99.73% and shows that 92.86% of predictions in the validation set align within a 5mm margin of error when compared to expert-annotated data. This dataset addresses the ...

    AcuSim: A Synthetic Dataset for Cervicocranial Acupuncture Points Localisation

    Dryad DOI: https://doi.org/10.5061/dryad.zs7h44jkz

    Dataset Overview

    A multi-view acupuncture point dataset containing:

    • 64×64, 128×128, 256×256, 512×512 and 1024×1024 resolution RGB images
    • Corresponding JSON annotations with:
      • 2D/3D keypoint coordinates
      • Visibility weights (0.9-1.0 scale)
      • Meridian category indices
      • Visibility masks
    • 174 standard acupuncture points (map.txt)
    • Occlusion handling implementation

    Key Features

    • Multi-view Rendering: Generated using Blender 3.5 with realistic occlusion simulation
    • Structured Annotation:
      • Default initialization for occluded points ([0.0, 0.0, 0.5])
      • Meridian category preservation for occluded points
      • Weighted visibility scoring
    • ML-Ready Format: Preconfigured PyTorch DataLoader implementation (a hypothetical loading sketch is given after the structure listing below)

    Dataset Structure

    dataset_root/
    ├── map.txt         # Complete list of 174 acupuncture points
    ├── train/
    ...,
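    A hypothetical PyTorch loading sketch is shown below; the JSON field names ("keypoints_2d", "visibility") and the image/annotation pairing are assumptions for illustration, and the dataset's own preconfigured DataLoader should be preferred.

      import json
      from pathlib import Path

      import numpy as np
      from PIL import Image
      from torch.utils.data import Dataset

      class AcuSimSplit(Dataset):
          def __init__(self, root, split="train"):
              self.images = sorted((Path(root) / split).glob("*.png"))        # assumed layout

          def __len__(self):
              return len(self.images)

          def __getitem__(self, idx):
              img_path = self.images[idx]
              ann = json.loads(img_path.with_suffix(".json").read_text())     # assumed pairing
              image = np.asarray(Image.open(img_path), dtype=np.float32) / 255.0
              keypoints = np.asarray(ann["keypoints_2d"], dtype=np.float32)   # assumed key, (174, 2)
              weights = np.asarray(ann["visibility"], dtype=np.float32)       # assumed key
              return image, keypoints, weights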
    
  10. video2articulation

    • huggingface.co
    Updated Jun 21, 2025
    Cite
    3D Language & Generation Research Group (2025). video2articulation [Dataset]. https://huggingface.co/datasets/3dlg-hcvc/video2articulation
    Dataset updated
    Jun 21, 2025
    Dataset authored and provided by
    3D Language & Generation Research Group
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This repository contains the synthetic data used in the paper Generalizable Articulated Object Reconstruction from Casually Captured RGBD Videos.

      Term of Use
    

    Our dataset is derived from the PartNet-Mobility dataset. Users are required to agree on the terms of use of the PartNet-Mobility dataset before using our dataset. Researchers shall use our dataset only for non-commercial research and educational purposes.

      File Structure
    

    Inside the sim_data folder, there are several… See the full description on the dataset page: https://huggingface.co/datasets/3dlg-hcvc/video2articulation.

  11. Mined Object and Relational Data for Sets of Locations

    • data.4tu.nl
    zip
    Updated Feb 13, 2019
    Cite
    J. Timothy Balint (2019). Mined Object and Relational Data for Sets of Locations [Dataset]. http://doi.org/10.4121/uuid:1fbfd4a0-1b7f-4dec-8097-617fea87cde5
    Available download formats: zip
    Dataset updated
    Feb 13, 2019
    Dataset provided by
    4TU.Centre for Research Data
    Authors
    J. Timothy Balint
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mined Location Object and Relation data

    Overview

    This data-set contains the objects and relationships mined from a few different data-sets comprising annotated images (SUNRGBD) and annotated virtual environments (SUNCG). It is split into pairwise distance/angle relationships (PAIRWISE) and higher-level semantic relationships. For distance/angle relationships, the name of the file (Fisher12 or Kermani) indicates the way in which it was parsed: Fisher uses Gaussian Mixture Models, while Kermani uses K-Means clustering to determine the number of distinct relationships.
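    As a toy illustration of the two parsing approaches named above (not the code used to produce these files), pairwise distance/angle features could be clustered as follows:

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(0)
      pairs = rng.random((500, 2)) * [3.0, np.pi]     # synthetic (distance, angle) features

      # Fisher12-style: Gaussian Mixture Models, number of components chosen by BIC.
      candidates = [GaussianMixture(n_components=k, random_state=0).fit(pairs) for k in range(1, 6)]
      best_gmm = min(candidates, key=lambda g: g.bic(pairs))

      # Kermani-style: K-Means clustering with the selected number of relationship types.
      labels = KMeans(n_clusters=best_gmm.n_components, n_init=10, random_state=0).fit_predict(pairs)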

    Note that there are a few changes between Kermani et al.'s original implementation and how this data-set was mined. Specifically:
    1) We change the probabilities for symmetry. For scenes that have only a few examples, 0.005 is too low to be salient for anything.
    2) We require a location type to have more than one example, and to have more than one salient object. This is not explicitly stated in Kermani et al., because most scene generation methods only consider rooms that have many examples (on the order of 100 at least). We make having at least one location a requirement and mine on the examples that have fewer objects. This cuts out a few locations that have very few rooms in general.
    3) We preprocess the nodes for the min-spanning tree to only consider objects whose count is above the threshold. This has the effect of making our connections more salient in general, and cleans up a bit of noise.

    Citations: If you find this data-set useful, please cite the original data-sets that the information came from:
    NYU: N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," Computer Vision–ECCV 2012, pp. 746–760, 2012.
    SUN-RGBD (note: I do not include the datasets in SUN-RGBD, but you should): S. Song, S. P. Lichtenberg, and J. Xiao, "SUN RGB-D: A RGB-D scene understanding benchmark suite," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 567–576.
    SUNCG: S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser, "Semantic Scene Completion from a Single Depth Image," IEEE Conference on Computer Vision and Pattern Recognition, 2017.

    As well as the methods that they were obtained from:
    Fisher12-PairWise: M. Fisher, D. Ritchie, M. Savva, T. Funkhouser, and P. Hanrahan, "Example-based synthesis of 3D object arrangements," ACM Transactions on Graphics (TOG), vol. 31, no. 6, p. 135, 2012.
    Kermani: Z. S. Kermani, Z. Liao, P. Tan, and H. Zhang, "Learning 3D Scene Synthesis from Annotated RGB-D Images," Computer Graphics Forum, vol. 35, no. 5, pp. 197–206, 2016.
    SceneSuggest (this paper contains the equations used in SceneSuggest): M. Savva, A. X. Chang, and P. Hanrahan, "Semantically-enriched 3D models for common-sense knowledge," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 24–31.

  12. CARLA Dataset

    • paperswithcode.com
    Updated Feb 2, 2021
    Cite
    Alexey Dosovitskiy; German Ros; Felipe Codevilla; Antonio Lopez; Vladlen Koltun (2021). CARLA Dataset [Dataset]. https://paperswithcode.com/dataset/carla
    Dataset updated
    Feb 2, 2021
    Authors
    Alexey Dosovitskiy; German Ros; Felipe Codevilla; Antonio Lopez; Vladlen Koltun
    Description

    CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk, and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).
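    As a minimal sketch of collecting similar data (assuming a 0.9.x CARLA Python client and a simulator already running on localhost:2000; this is not an official dataset-generation script):

      import carla

      client = carla.Client("localhost", 2000)
      client.set_timeout(10.0)
      world = client.get_world()
      bp = world.get_blueprint_library()

      vehicle = world.spawn_actor(bp.find("vehicle.tesla.model3"),
                                  world.get_map().get_spawn_points()[0])
      vehicle.set_autopilot(True)

      cam_tf = carla.Transform(carla.Location(x=1.5, z=2.4))   # roof-mounted camera pose
      rgb = world.spawn_actor(bp.find("sensor.camera.rgb"), cam_tf, attach_to=vehicle)
      depth = world.spawn_actor(bp.find("sensor.camera.depth"), cam_tf, attach_to=vehicle)
      seg = world.spawn_actor(bp.find("sensor.camera.semantic_segmentation"), cam_tf, attach_to=vehicle)

      rgb.listen(lambda img: img.save_to_disk("out/rgb/%06d.png" % img.frame))
      depth.listen(lambda img: img.save_to_disk("out/depth/%06d.png" % img.frame,
                                                carla.ColorConverter.LogarithmicDepth))
      seg.listen(lambda img: img.save_to_disk("out/seg/%06d.png" % img.frame,
                                              carla.ColorConverter.CityScapesPalette))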

