12 datasets found
  1. SceneNet RGB-D Dataset

    • paperswithcode.com
    Cite
    SceneNet RGB-D Dataset [Dataset]. https://paperswithcode.com/dataset/scenenet-rgb-d
    311 scholarly articles cite this dataset (View in Google Scholar)
    Description

    SceneNet RGB-D is a synthetic dataset containing large-scale photorealistic renderings of indoor scene trajectories with pixel-level annotations. Random sampling permits virtually unlimited scene configurations, and the dataset creators provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which has previously been limited by the relatively small labelled datasets NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as a proxy for a SLAM system.

  2. Data from: Synthetic Data for Non-rigid 3D Reconstruction using a Moving...

    • data.csiro.au
    • researchdata.edu.au
    Updated Sep 13, 2018
    Cite
    Shafeeq Elanattil; Peyman Moghadam (2018). Synthetic Data for Non-rigid 3D Reconstruction using a Moving RGB-D Camera [Dataset]. http://doi.org/10.25919/5b7b60176d0cd
    Dataset updated
    Sep 13, 2018
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Shafeeq Elanattil; Peyman Moghadam
    License

    CSIRO Data Licence: https://research.csiro.au/dap/licences/csiro-data-licence/

    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Queensland University of Technology
    Description

    We introduce a synthetic dataset for evaluating non-rigid 3D reconstruction using a moving RGB-D camera. The dataset consists of two subjects captured with four different camera trajectories. For each case we provide frame-by-frame ground-truth geometry of the scene, the camera trajectory, and the foreground mask. This synthetic data was part of the paper "Non-rigid reconstruction with a single moving RGB-D camera" published at ICPR 2018. If you use this dataset, please cite the paper and this collection. More information can be found in the supporting documents.

  3. Synthetic RGB-D data for plant segmentation

    • kaggle.com
    Updated Nov 7, 2020
    Cite
    El Baha Farouk (2020). Synthetic RGB-D data for plant segmentation [Dataset]. https://www.kaggle.com/harlequeen/synthetic-rgbd-images-of-plants/discussion
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 7, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    El Baha Farouk
    License

    CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Nowadays, AI has revolutionized almost every field, but the lack of data remains a barrier to this revolution. Fortunately, we can address this problem by generating synthetic data to train our models on, then fine-tuning with the few real samples that we possess. This dataset was built and generated for the purpose of training deep neural networks for semantic and instance segmentation of plants in the agricultural field.

    Content

    A dataset of 224x224 RGB-D images of plants. It contains 5 sub-folders: rgb, depth, semantic, instances masks, and node. Each plant has 5 classes (leaf, petiole, stem, fruit, and background). It could be used in different computer vision and deep learning projects, such as plant detection, measuring objects using depth, robotics, and making agriculture smart.
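    The exact file naming inside the sub-folders is not documented here, so the following is only a hypothetical loading sketch; it assumes PNG files that share the same name across the rgb, depth, and semantic folders:

      import numpy as np
      from pathlib import Path
      from PIL import Image

      root = Path("synthetic_rgbd_plants")          # hypothetical extraction folder

      for rgb_path in sorted((root / "rgb").glob("*.png")):
          rgb = np.asarray(Image.open(rgb_path))                          # 224x224x3 colour image
          depth = np.asarray(Image.open(root / "depth" / rgb_path.name))  # per-pixel depth
          sem = np.asarray(Image.open(root / "semantic" / rgb_path.name)) # class ids (leaf, petiole, stem, fruit, background)
          print(rgb_path.name, rgb.shape, depth.dtype, np.unique(sem))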

    Inspiration

    If this dataset was useful to you, please let me know, and tell me what project you used it in!

  4. GESDPD depth people detection dataset

    • kaggle.com
    Updated Jan 28, 2020
    Cite
    David Fuentes Jimenez (2020). GESDPD depth people detection dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/915718
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 28, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    David Fuentes Jimenez
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    1) Introduction

    GESDPD is a database of depth images containing 22,000 frames that simulate a sensor placed in an elevated frontal position in an indoor environment. It was designed to fulfill the following objectives:
    • Allow the training and evaluation of people-detection algorithms based on depth or RGB-D data without the need for manual labelling.
    • Provide quality synthetic data to the research community for people-detection tasks.
    The people-detection task can also be extended to practical applications such as video surveillance, access control, people flow analysis, behaviour analysis, or event capacity management.

    2) General contents

    GESDPD is composed of 22,000 synthetic depth images that simulate a sensor placed in an elevated frontal position in a rectangular, indoor working environment. These have been generated using the simulation software Blender. The synthetic images show a room with different persons walking in different directions. The camera perspective is not stationary: it moves around the room over the course of the dataset, which avoids a constant background. Some examples of the different views are shown on the dataset page.

    3) Quantitative details on the database content are provided below.

    • Number of frames: 22000

    • Number of different people: 4 (3 men and 1 woman)

    • Number of labeled people: 20800

    • Image resolution: 320 × 240 pixels

    For each image, the depth map and the ground truth, including the position of each person in the scene, are provided.

    To give an idea of what to expect, the dataset page includes some example images, in which depth values are represented in millimeters using a colormap.

    4) Geometry details

    As mentioned before, the dataset simulates a sensor in an elevated frontal position in a rectangular indoor working environment. Specifically, the camera was placed at a height of 3 meters, and it rotates along the sequence. The room's dimensions are 8.56 × 5.02 m, with a height of 3.84 m (its layout is shown on the dataset page).

    5) File Formats

    5.1) Depth data

    The depth information (distance to the camera plane) is stored as a .png image, in which each pixel represents the depth value in millimeters as a two-byte (little-endian) unsigned integer. Its values range from 0 to 15000.
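    As a minimal reading sketch (assuming OpenCV and a hypothetical file name; this is not an official loader), the 16-bit values can be loaded without losing precision as follows:

      import cv2
      import numpy as np

      # IMREAD_UNCHANGED keeps the two-byte unsigned pixel values intact.
      depth_mm = cv2.imread("depth_000001.png", cv2.IMREAD_UNCHANGED)   # hypothetical file name
      assert depth_mm is not None and depth_mm.dtype == np.uint16       # 320 x 240, values 0..15000
      depth_m = depth_mm.astype(np.float32) / 1000.0                    # millimetres -> metres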

    5.2) Position Ground Truth Data

    The ground truth information is also provided as a .png file with the same dimensions as the generated images (320 × 240 pixels). The ground truth files carry the same number in their names as the corresponding depth files. To label people positions, a Gaussian function is placed over the centroid of each person's head in the scene, so that the centroid corresponds to the 2D position of the center of the head and has a normalized value of one. The standard deviation has a value of 15 pixels for all the Gaussians, regardless of the size of each head and the distance from the head to the camera. This value was calculated from an estimate of the average diameter of a person's head, taking anthropometric considerations into account. It is worth highlighting that, when two heads are very close or overlapping, the maximum of the two Gaussian functions is taken rather than their sum. That modification yields a set of Gaussians that are always separated, so that the CNN can learn to generate that separation between Gaussians in its output. An example with two Gaussian functions is shown on the dataset page.
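    The construction can be sketched as follows (an assumed re-implementation of the description above, not the authors' code): one Gaussian per head centroid with sigma = 15 pixels, merged with an element-wise maximum rather than a sum.

      import numpy as np

      def gaussian_gt_map(head_centroids, shape=(240, 320), sigma=15.0):
          """head_centroids: list of (x, y) pixel positions of head centres."""
          ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
          gt = np.zeros(shape, dtype=np.float32)
          for cx, cy in head_centroids:
              g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
              gt = np.maximum(gt, g)        # maximum, not sum, keeps nearby heads separated
          return gt                         # value 1.0 at each centroid

      gt = gaussian_gt_map([(100, 80), (112, 84)])   # two overlapping heads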

    6) Disclaimer, Licensing, Request and Contributions This document and the data provided are work in progress and provided as is. The GEINTRA Synthetic Depth People Detection (GESDPD) Database (and accompanying files and documentation) by David Fuentes-Jiménez, Roberto Martín-López, Cristina Losada-Gutiérrez, Javier Macías-Guarasa and Carlos A. Luna is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

    If you make use of this database and/or its related documentation,...

  5. CHOC: The CORSMAL Hand-Occluded Containers dataset

    • zenodo.org
    • explore.openaire.eu
    • +1more
    bin, zip
    Updated Sep 11, 2023
    + more versions
    Cite
    Xavier Weber; Tommaso Apicella; Alessio Xompero; Andrea Cavallaro (2023). CHOC: The CORSMAL Hand-Occluded Containers dataset [Dataset]. http://doi.org/10.5281/zenodo.8332421
    Available download formats: zip, bin
    Dataset updated
    Sep 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xavier Weber; Tommaso Apicella; Alessio Xompero; Andrea Cavallaro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CORSMAL Hand-Occluded Containers (CHOC) is an image-based dataset for category-level 6D object pose and size estimation, affordance segmentation, object detection, object and arm segmentation, and hand+object reconstruction. The dataset has 138,240 pseudo-realistic composite RGB-D images of hand-held containers on top of 30 real backgrounds (mixed-reality set) and 3,951 RGB-D images selected from the CORSMAL Container Manipulation (CCM) dataset (real set). CHOC-AFF is the subset that focuses on the problem of visual affordance segmentation. CHOC-AFF consists of the RGB images, the object and arm segmentation masks, and the affordance segmentation masks.

    The images of the mixed-reality set are automatically rendered using Blender, and are split into 129,600 images of handheld containers and 8,640 images of objects without hand. Only one synthetic container is rendered for each image. Images are evenly split among 48 unique synthetic objects from three categories, namely 16 boxes, 16 drinking containers without stem (nonstems) and 16 drinking containers with stems (stems), selected from ShapeNetSem. For each object, 6 realistic grasps were manually annotated using GraspIt!: bottom grasp, natural grasp, and top grasp for the left and right hand. The mixed-reality set provides RGB images, depth images, segmentation masks (hand and object), normalised object coordinates images (only object), object meshes, annotated 6D object poses (orientation and translation in 3D with respect to the camera view), and grasp meshes with their MANO parameters. Each image has a resolution of 640x480 pixels. Background images were acquired using an Intel RealSense D435i depth camera, and include 15 indoor and 15 outdoor scenes. All information necessary to re-render the dataset is provided, namely backgrounds, camera intrinsic parameters, lighting, object models, and hand + forearm meshes, and poses; users can complement the existing data with additional annotations. Note: The mixed-reality set was built on top of previous works for the generation of synthetic and mixed-reality datasets, such as OBMan and NOCS-CAMERA.
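    For orientation, the 6D pose annotations (rotation and translation with respect to the camera view) can be combined with the camera intrinsics to project object-model points into the image. The snippet below is a generic pinhole-projection illustration with made-up intrinsics, not the CHOC toolkit API:

      import numpy as np

      def project_points(points_obj, R, t, K):
          """points_obj: (N, 3) object-space points; R: (3, 3); t: (3,); K: (3, 3) intrinsics."""
          pts_cam = points_obj @ R.T + t          # object frame -> camera frame
          uvw = pts_cam @ K.T                     # pinhole projection
          return uvw[:, :2] / uvw[:, 2:3]         # (N, 2) pixel coordinates

      # Hypothetical intrinsics for a 640x480 image; the dataset ships its own parameters.
      K = np.array([[600.0,   0.0, 320.0],
                    [  0.0, 600.0, 240.0],
                    [  0.0,   0.0,   1.0]])
      uv = project_points(np.random.rand(200, 3) * 0.1, np.eye(3), np.array([0.0, 0.0, 0.5]), K)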

    The images of the real set are selected from 180 representative sequences of the CCM dataset. Each image contains a person holding one of the 15 containers during a manipulation occurring in the video prior to a handover (e.g., picking up an empty container, shaking an empty or filled food box, or pouring content into a cup or drinking glass). For each object instance, sequences were chosen under four randomly sampled conditions, including background and lighting conditions, scenarios (person sitting with the object on the table; person sitting and already holding the object; person standing while holding the container and then walking towards the table), and filling amount and type. The same sequence is selected from the three fixed camera views (two side views and one frontal view) of the CCM setup (60 sequences for each view). Fifteen sequences exhibit the empty container case, one for each of the fifteen objects, whereas the other sequences have the person filling the container with either pasta, rice, or water at 50% or 90% of the full container capacity. The real set has RGB images, depth images, and 6D pose annotations. For each sequence, the 6D poses of the containers are manually annotated every 10 frames if the container is visible in at least two views, resulting in a total of 3,951 annotations. Annotations of the 6D poses for the intermediate frames are also provided via interpolation.

    Contacts
    For enquiries, questions, or comments, please contact Alessio Xompero. For enquiries, questions, or comments about CHOC-AFF, please contact Tommaso Apicella.

    References
    If you work on Visual Affordance Segmentation and you use the subset CHOC-AFF, please see the related work on ACANet and also cite:
    Affordance segmentation of hand-occluded containers from exocentric images
    T. Apicella, A. Xompero, E. Ragusa, R. Berta, A. Cavallaro, P. Gastaldo
    IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023

    Additional resources
    Webpage of 6D pose estimation using CHOC
    Toolkit to parse and inspect the dataset, or generate new data

    Release notes
    2023/09/10
    - Added object affordance segmentation masks

    2023/02/08
    - Fixed NOCS maps due to a missing rotation during the generation
    - Fixed annotations to include the missing rotation

    2023/01/09
    - Fixed RGB_070001_80000 (wrong files previously)

    2022/12/14
    - Added a mapping dictionary from grasp-IDs to their corresponding MANO-parameters-IDs to grasp.zip
    - Added object meshes with the NOCS textures/material in object_models.zip
    - Fixed folder name in annotations.zip
    - Updated README file to include these changes and fix a typo in the code block to unzip files

  6. Data from: SynthSoM: A synthetic intelligent multi-modal...

    • springernature.figshare.com
    bin
    Updated May 20, 2025
    Cite
    Xiang Cheng; Ziwei Huang; Yong Yu; Lu Bai; Mingran Sun; Zengrui Han; Ruide Zhang; Sijiang Li (2025). SynthSoM: A synthetic intelligent multi-modal sensing-communication dataset for Synesthesia of Machines (SoM) [Dataset]. http://doi.org/10.6084/m9.figshare.28123646.v1
    Available download formats: bin
    Dataset updated
    May 20, 2025
    Dataset provided by
    figshare
    Authors
    Xiang Cheng; Ziwei Huang; Yong Yu; Lu Bai; Mingran Sun; Zengrui Han; Ruide Zhang; Sijiang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Given the importance of datasets for sensing-communication integration research, a novel simulation platform for constructing communication and multi-modal sensory datasets is developed. The developed platform integrates three high-precision software tools, i.e., AirSim, WaveFarer, and Wireless InSite, and further achieves their in-depth integration and precise alignment. Based on the developed platform, a new synthetic intelligent multi-modal sensing-communication dataset for Synesthesia of Machines (SoM), named SynthSoM, is proposed. The SynthSoM dataset contains various air-ground multi-link cooperative scenarios with comprehensive conditions, including multiple weather conditions, times of day, intelligent agent densities, frequency bands, and antenna types. The SynthSoM dataset encompasses multiple data modalities, including radio-frequency (RF) channel large-scale and small-scale fading data, RF millimeter wave (mmWave) radar sensory data, and non-RF sensory data, e.g., RGB images, depth maps, and light detection and ranging (LiDAR) point clouds. The quality of the SynthSoM dataset is validated via statistics-based qualitative inspection and via machine learning (ML) evaluation metrics computed against real-world measurements. The SynthSoM dataset is open-sourced and provides consistent data for cross-comparing SoM-related algorithms.

  7. PEGASET

    • zenodo.org
    zip
    Updated Jul 19, 2024
    Cite
    Lukas Meyer; Floris Erich (2024). PEGASET [Dataset]. http://doi.org/10.5281/zenodo.12625040
    Available download formats: zip
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lukas Meyer; Floris Erich
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce Physically Enhanced GAussian Splatting SimUlation System (PEGASUS) for 6DOF object pose dataset generation, a versatile dataset generator based on 3D Gaussian Splatting.

    Preparation starts by separate scanning of both environments and objects. PEGASUS allows the composition of new scenes by merging the respective underlying Gaussian Splatting point cloud of an environment with one or multiple objects. Leveraging a physics engine enables the simulation of natural object placement within a scene by interacting with their extracted mesh. Consequently, an extensive amount of new scenes - static or dynamic - can be created by combining different environments and objects. By rendering scenes from various perspectives, diverse data points such as RGB images, depth maps, semantic masks, and 6DoF object poses can be extracted. Our study demonstrates that training on data generated by PEGASUS surpasses the performance of existing 6DoF pose estimation networks such as deep object pose.

    Furthermore, our sim-to-real approach validates the successful transfer of tasks from synthetic data to real-world data. Moreover, we introduce the CupNoodle dataset, comprising 30 Japanese cup noodle items. This dataset includes spherical scans that capture images from both object hemispheres, as well as the Gaussian Splatting reconstructions, making them compatible with PEGASUS.

  8. NYU Hand

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Mar 24, 2023
    Cite
    New York University (2023). NYU Hand [Dataset]. https://opendatalab.com/OpenDataLab/NYU_Hand
    Available download formats: zip (97793852238 bytes)
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    New York University
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The NYU Hand pose dataset contains 8252 test-set and 72757 training-set frames of captured RGBD data with ground-truth hand-pose information. For each frame, the RGBD data from 3 Kinects is provided: a frontal view and 2 side views. The training set contains samples from a single user only (Jonathan Tompson), while the test set contains samples from two users (Murphy Stein and Jonathan Tompson). A synthetic re-creation (rendering) of the hand pose is also provided for each view.

  9. Data from: AcuSim: a synthetic dataset for cervicocranial acupuncture points...

    • search.dataone.org
    • datadryad.org
    Updated Apr 2, 2025
    Cite
    Qilei Sun; Jiatao Ma; Paul Craig; Linjun Dai; EngGee Lim (2025). acusim: a synthetic dataset for cervicocranial acupuncture points localisation [Dataset]. http://doi.org/10.5061/dryad.zs7h44jkz
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Qilei Sun; Jiatao Ma; Paul Craig; Linjun Dai; EngGee Lim
    Description

    The locations of acupuncture points (acupoints) differ among human individuals due to variations in factors such as height, weight, and fat proportions. However, acupoint annotation is expert-dependent, labour-intensive, and highly expensive, which limits the data size and detection accuracy. In this paper, we introduce the "AcuSim" dataset as a new synthetic dataset for the task of localising points on the human cervicocranial area from an input image, using an automatic render and labelling pipeline during acupuncture treatment. It includes the creation of 63,936 RGB-D images and 504 synthetic anatomical models with 174 volumetric acupoints annotated, to capture the variability and diversity of human anatomies. The study validates a convolutional neural network (CNN) on the proposed dataset with an accuracy of 99.73% and shows that 92.86% of predictions in the validation set align within a 5mm margin of error when compared to expert-annotated data. This dataset addresses the ...

    AcuSim: A Synthetic Dataset for Cervicocranial Acupuncture Points Localisation

    Dryad DOI: https://doi.org/10.5061/dryad.zs7h44jkz

    Dataset Overview

    A multi-view acupuncture point dataset containing:

    • 64×64, 128×128, 256×256, 512×512 and 1024×1024 resolution RGB images
    • Corresponding JSON annotations with:
      • 2D/3D keypoint coordinates
      • Visibility weights (0.9-1.0 scale)
      • Meridian category indices
      • Visibility masks
    • 174 standard acupuncture points (map.txt)
    • Occlusion handling implementation

    Key Features

    • Multi-view Rendering: Generated using Blender 3.5 with realistic occlusion simulation
    • Structured Annotation:
      • Default initialization for occluded points ([0.0, 0.0, 0.5])
      • Meridian category preservation for occluded points
      • Weighted visibility scoring
    • ML-Ready Format: Preconfigured PyTorch DataLoader implementation (a hypothetical loading sketch is given after the structure listing below)

    Dataset Structure

    dataset_root/
    ├── map.txt         # Complete list of 174 acupuncture points
    ├── train/
    ...,
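    A hypothetical PyTorch loading sketch is shown below; the JSON field names ("keypoints_2d", "visibility") and the image/annotation pairing are assumptions for illustration, and the dataset's own preconfigured DataLoader should be preferred.

      import json
      from pathlib import Path

      import numpy as np
      from PIL import Image
      from torch.utils.data import Dataset

      class AcuSimSplit(Dataset):
          def __init__(self, root, split="train"):
              self.images = sorted((Path(root) / split).glob("*.png"))        # assumed layout

          def __len__(self):
              return len(self.images)

          def __getitem__(self, idx):
              img_path = self.images[idx]
              ann = json.loads(img_path.with_suffix(".json").read_text())     # assumed pairing
              image = np.asarray(Image.open(img_path), dtype=np.float32) / 255.0
              keypoints = np.asarray(ann["keypoints_2d"], dtype=np.float32)   # assumed key, (174, 2)
              weights = np.asarray(ann["visibility"], dtype=np.float32)       # assumed key
              return image, keypoints, weights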
    
  10. video2articulation

    • huggingface.co
    Updated Jun 21, 2025
    Cite
    3D Language & Generation Research Group (2025). video2articulation [Dataset]. https://huggingface.co/datasets/3dlg-hcvc/video2articulation
    Dataset updated
    Jun 21, 2025
    Dataset authored and provided by
    3D Language & Generation Research Group
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This repository contains the synthetic data used in the paper Generalizable Articulated Object Reconstruction from Casually Captured RGBD Videos.

      Term of Use
    

    Our dataset is derived from the PartNet-Mobility dataset. Users are required to agree on the terms of use of the PartNet-Mobility dataset before using our dataset. Researchers shall use our dataset only for non-commercial research and educational purposes.

      File Structure
    

    Inside the sim_data folder, there are several… See the full description on the dataset page: https://huggingface.co/datasets/3dlg-hcvc/video2articulation.

  11. Mined Object and Relational Data for Sets of Locations

    • data.4tu.nl
    zip
    Updated Feb 13, 2019
    Cite
    J. Timothy Balint (2019). Mined Object and Relational Data for Sets of Locations [Dataset]. http://doi.org/10.4121/uuid:1fbfd4a0-1b7f-4dec-8097-617fea87cde5
    Available download formats: zip
    Dataset updated
    Feb 13, 2019
    Dataset provided by
    4TU.Centre for Research Data
    Authors
    J. Timothy Balint
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mined Location Object and Relation data

    Overview

    This data-set contains the objects and relationships mined from a few different data-sets comprising annotated images (SUNRGBD) and annotated virtual environments (SUNCG). It is split into pairwise distance/angle relationships (PAIRWISE) and higher-level semantic relationships. For distance/angle relationships, the name of the file (Fisher12 or Kermani) indicates the way in which it was parsed: Fisher uses Gaussian Mixture Models, while Kermani uses K-Means clustering to determine the number of distinct relationships.
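    As a toy illustration of the two parsing approaches named above (not the code used to produce these files), pairwise distance/angle features could be clustered as follows:

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(0)
      pairs = rng.random((500, 2)) * [3.0, np.pi]     # synthetic (distance, angle) features

      # Fisher12-style: Gaussian Mixture Models, number of components chosen by BIC.
      candidates = [GaussianMixture(n_components=k, random_state=0).fit(pairs) for k in range(1, 6)]
      best_gmm = min(candidates, key=lambda g: g.bic(pairs))

      # Kermani-style: K-Means clustering with the selected number of relationship types.
      labels = KMeans(n_clusters=best_gmm.n_components, n_init=10, random_state=0).fit_predict(pairs)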

    Note that there are a few changes between Kermani et al.'s original implementation and how this data-set was mined. Specifically:
    1) We change the probabilities for symmetry. For scenes that have only a few examples, 0.005 is too low to be salient for anything.
    2) We require a location type to have more than one example, and to have more than one salient object. This is not explicitly stated in Kermani et al., because most scene generation methods only consider rooms that have many examples (on the order of 100 at least). We make having at least one location a requirement and mine on the examples that have fewer objects. This cuts out a few locations that have very few rooms in general.
    3) We preprocess the nodes for the min-spanning tree to only consider objects whose count is above the threshold. This has the effect of making our connections more salient in general, and cleans up a bit of noise.

    Citations: If you find this data-set useful, please cite the original data-sets that the information came from:
    NYU: N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," Computer Vision–ECCV 2012, pp. 746–760, 2012.
    SUN-RGBD (note: I do not include the datasets in SUN-RGBD, but you should): S. Song, S. P. Lichtenberg, and J. Xiao, "SUN RGB-D: A RGB-D scene understanding benchmark suite," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 567–576.
    SUNCG: S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser, "Semantic Scene Completion from a Single Depth Image," IEEE Conference on Computer Vision and Pattern Recognition, 2017.

    As well as the methods that they were obtained from:
    Fisher12-PairWise: M. Fisher, D. Ritchie, M. Savva, T. Funkhouser, and P. Hanrahan, "Example-based synthesis of 3D object arrangements," ACM Transactions on Graphics (TOG), vol. 31, no. 6, p. 135, 2012.
    Kermani: Z. S. Kermani, Z. Liao, P. Tan, and H. Zhang, "Learning 3D Scene Synthesis from Annotated RGB-D Images," Computer Graphics Forum, vol. 35, no. 5, pp. 197–206, 2016.
    SceneSuggest (this paper contains the equations used in SceneSuggest): M. Savva, A. X. Chang, and P. Hanrahan, "Semantically-enriched 3D models for common-sense knowledge," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 24–31.

  12. CARLA Dataset

    • paperswithcode.com
    Updated Feb 2, 2021
    Cite
    Alexey Dosovitskiy; German Ros; Felipe Codevilla; Antonio Lopez; Vladlen Koltun (2021). CARLA Dataset [Dataset]. https://paperswithcode.com/dataset/carla
    Dataset updated
    Feb 2, 2021
    Authors
    Alexey Dosovitskiy; German Ros; Felipe Codevilla; Antonio Lopez; Vladlen Koltun
    Description

    CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk, and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).
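    As a minimal sketch of collecting similar data (assuming a 0.9.x CARLA Python client and a simulator already running on localhost:2000; this is not an official dataset-generation script):

      import carla

      client = carla.Client("localhost", 2000)
      client.set_timeout(10.0)
      world = client.get_world()
      bp = world.get_blueprint_library()

      vehicle = world.spawn_actor(bp.find("vehicle.tesla.model3"),
                                  world.get_map().get_spawn_points()[0])
      vehicle.set_autopilot(True)

      cam_tf = carla.Transform(carla.Location(x=1.5, z=2.4))   # roof-mounted camera pose
      rgb = world.spawn_actor(bp.find("sensor.camera.rgb"), cam_tf, attach_to=vehicle)
      depth = world.spawn_actor(bp.find("sensor.camera.depth"), cam_tf, attach_to=vehicle)
      seg = world.spawn_actor(bp.find("sensor.camera.semantic_segmentation"), cam_tf, attach_to=vehicle)

      rgb.listen(lambda img: img.save_to_disk("out/rgb/%06d.png" % img.frame))
      depth.listen(lambda img: img.save_to_disk("out/depth/%06d.png" % img.frame,
                                                carla.ColorConverter.LogarithmicDepth))
      seg.listen(lambda img: img.save_to_disk("out/seg/%06d.png" % img.frame,
                                              carla.ColorConverter.CityScapesPalette))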

