zxh4546/ntu-rgbd dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
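As with any Hub-hosted dataset, a minimal loading sketch with the datasets library might look like the following; the split name and feature layout are assumptions, so check the dataset page for the actual configuration.

```python
from datasets import load_dataset

# Split name ("train") and field layout are assumptions -- verify them
# on the dataset's Hub page before use.
ds = load_dataset("zxh4546/ntu-rgbd", split="train")
print(ds)             # features and number of rows
print(ds[0].keys())   # fields of the first sample
```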
## Overview
AppleDetection RGBD is a dataset for object detection; it contains Apple annotations for 967 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
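For the Roboflow route, a hedged sketch using the roboflow Python package is shown below; the workspace/project slugs, version number, and export format are placeholders, not values taken from this dataset card.

```python
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
# Workspace/project slugs and the version number are placeholders --
# substitute the identifiers shown on the dataset's Roboflow page.
project = rf.workspace("your-workspace").project("appledetection-rgbd")
dataset = project.version(1).download("coco")  # export format is a choice
print(dataset.location)  # local folder containing images and annotations
```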
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks
For detailed statistics about our datasets, please refer to the following paper:
Preprint: https://arxiv.org/abs/2501.01685
GitHub pages:
https://github.com/AIM-SKKU/NYUDv2-IS
https://github.com/AIM-SKKU/SUN-RGBD-IS
https://github.com/AIM-SKKU/Box-IS
Xiaowangji/RGBD dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
While a great variety of 3D cameras have been introduced in recent years, most publicly available datasets for object recognition and pose estimation focus on one single camera. This dataset consists of 32 scenes that have been captured by 7 different 3D cameras, totaling 49,294 frames. This allows evaluating the sensitivity of pose estimation algorithms to the specifics of the used camera and the development of more robust algorithms that are more independent of the camera model. Vice versa, our dataset enables researchers to perform a quantitative comparison of the data from several different cameras and depth sensing technologies and evaluate their algorithms before selecting a camera for their specific task. The scenes in our dataset contain 20 different objects from the common benchmark YCB object and model set. We provide full ground truth 6DoF poses for each object, per-pixel segmentation, 2D and 3D bounding boxes and a measure of the amount of occlusion of each object.
If you use this dataset in your research, please cite the following publication:
T. Grenzdörffer, M. Günther, and J. Hertzberg, “YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation,” in 2020 IEEE International Conference on Robotics and Automation, ICRA 2020, Paris, France, May 31-June 4, 2020. IEEE, 2020.
@InProceedings{Grenzdoerffer2020ycbm,
  title     = {{YCB-M}: A Multi-Camera {RGB-D} Dataset for Object Recognition and {6DoF} Pose Estimation},
  author    = {Grenzd{\"{o}}rffer, Till and G{\"{u}}nther, Martin and Hertzberg, Joachim},
  booktitle = {2020 {IEEE} International Conference on Robotics and Automation, {ICRA} 2020, Paris, France, May 31-June 4, 2020},
  year      = {2020},
  publisher = {{IEEE}}
}
This paper is also available on arXiv: https://arxiv.org/abs/2004.11657
To visualize the dataset, follow these instructions (tested on Ubuntu Xenial 16.04):
sudo add-apt-repository -y ppa:deadsnakes/ppa  # to get python3.6 on Ubuntu Xenial
sudo apt-get update
sudo apt-get install -y python3.6 libsm6 libxext6 libxrender1 python-virtualenv python-pip
virtualenv -p python3.6 venv_nvdu
cd venv_nvdu/
source bin/activate
pip install -e 'git+https://github.com/mintar/Dataset_Utilities.git#egg=nvdu'
nvdu_ycb -s
nvdu_viz --name_filters '*.jpg'
For further details, see README.md.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bosch Industrial Depth Completion Dataset (BIDCD) is an RGBD dataset of static tabletop scenes with industrial objects, collected with a depth camera from multiple points of view (POVs), approximately 60 per scene.
We generated depth ground truth with a customized pipeline for removing erroneous depth values, and applied multi-view geometry to fuse the cleaned depth frames and fill in missing information. The fused scene mesh was back-projected to each POV, and finally a bilateral filter was applied to reduce the remaining holes.
For each scene we provide RGB, raw depth, and ground-truth depth. Auxiliary information includes (a) workspace masks, corresponding to the footprint of the workspace volume, and (b) cleaned depth, an intermediate result from the pipeline mentioned above. For more details see our publication "BIDCD - Bosch Industrial Depth Completion Dataset".
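As an illustration of the final hole-reduction step mentioned above, a bilateral filter smooths a depth map while preserving depth discontinuities. This is a generic OpenCV sketch under assumed kernel parameters, not the authors' exact pipeline:

```python
import cv2
import numpy as np

# Load a 16-bit depth image and convert to float32 (bilateralFilter
# accepts 8-bit or 32-bit float inputs).
depth = cv2.imread("depth.png", cv2.IMREAD_UNCHANGED).astype(np.float32)

# d: neighborhood diameter (pixels); sigmaColor: falloff in depth units;
# sigmaSpace: spatial falloff in pixels. All three values are assumptions.
filtered = cv2.bilateralFilter(depth, d=9, sigmaColor=50.0, sigmaSpace=9.0)
cv2.imwrite("depth_filtered.png", filtered.astype(np.uint16))
```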
The NTU-RGBD dataset is a large-scale dataset for 3D human activity analysis, containing 56,000 videos covering 60 actions performed by 40 people and captured from 80 different views.
RGB-D images of faces cropped to the mouth region, with participants producing 1 of 7 mouth/tongue states. The dataset includes images from 17 participants. Multiple locations/lighting environments were used for filming, but each participant was filmed in a single location. The annotation file contains the path to the RGB image, the path to the depth image, and the mouth/tongue state for those images. Mouth/tongue states:
- Mouth open
- Mouth closed
- Tongue up
- Tongue down
- Tongue middle
- Tongue left
- Tongue right
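The annotation format is not specified in detail here; assuming a simple CSV layout with one row per sample (RGB path, depth path, state label), a hypothetical reader could look like:

```python
import csv

# Label strings are assumed spellings of the seven states listed above.
STATES = {"mouth_open", "mouth_closed", "tongue_up", "tongue_down",
          "tongue_middle", "tongue_left", "tongue_right"}

# "annotations.csv" and its column order are assumptions, not the
# dataset's documented schema.
with open("annotations.csv", newline="") as f:
    for rgb_path, depth_path, state in csv.reader(f):
        assert state in STATES, f"unexpected label: {state}"
        # load rgb_path / depth_path with the image library of your choice
```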
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The comparative experimental results (pixel accuracy, mean accuracy, and mean IoU) of the SIEANs against previous state-of-the-art methods on the SUN-RGBD dataset, with and without depth images.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This RGB-D dataset is part of our publication
Heindl, Christoph, et al. "Spatio-thermal depth correction of RGB-D sensors based on Gaussian processes in real-time." Tenth International Conference on Machine Vision (ICMV 2017). Vol. 10696. SPIE, 2018.
Our capture setup consists of an RGB-D sensor looking towards a known planar object. The sensor is coupled with an electronic linear axis to adjust distance. We captured data at distances from 40 cm to 90 cm in 10 cm steps, over the temperature range 25°C to 35°C in 1°C steps. At each temperature/distance tuple we grabbed 50 images from both the RGB and IR (aligned with RGB) sensors. We then created an artificial depth map for all RGB images utilizing the known calibration target in sight.
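A minimal sketch (not the authors' code) of how such an artificial depth map can be rendered: intersect each pixel's viewing ray with the known plane. The intrinsics K and the plane parameters (unit normal n, distance d) would come from the calibration target; the values below are placeholders.

```python
import numpy as np

def plane_depth(K, n, d, width, height):
    """Z-depth map of the plane n.X = d seen through intrinsics K."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pix                # one viewing ray per pixel
    t = d / (n @ rays)                           # ray-plane intersection
    return (t * rays[2]).reshape(height, width)  # z-component = depth

# Placeholder intrinsics and a fronto-parallel plane 0.6 m from the camera.
K = np.array([[570.0, 0.0, 320.0], [0.0, 570.0, 240.0], [0.0, 0.0, 1.0]])
depth = plane_depth(K, n=np.array([0.0, 0.0, 1.0]), d=0.6, width=640, height=480)
```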
For more information visit https://github.com/cheind/rgbd-correction
📦 Spatial Perception And Reasoning Dataset – RGBD (SPAR-7M-RGBD)
A large-scale multimodal dataset for 3D-aware spatial perception and reasoning in vision-language models.
SPAR-7M-RGBD extends the original SPAR-7M with additional depth maps, camera intrinsics, and pose information. It contains over 7 million QA pairs across 33 spatial tasks, built from 4,500+ richly annotated indoor 3D scenes. This version supports single-view, multi-view, and… See the full description on the dataset page: https://huggingface.co/datasets/jasonzhango/SPAR-7M-RGBD.
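A hedged sketch of what the added depth/intrinsics/pose annotations enable: lifting a depth map into a world-frame point cloud. The camera-to-world pose convention and z-depth parameterization here are assumptions, not the dataset's documented schema.

```python
import numpy as np

def depth_to_world(depth, K, T_cw):
    """depth: HxW z-depth; K: 3x3 intrinsics; T_cw: 4x4 camera-to-world."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    cam = np.linalg.inv(K) @ pix * depth.reshape(-1)  # camera-frame points
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    return (T_cw @ cam_h)[:3].T                       # N x 3 world points
```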
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was recorded with an Intel® RealSense™ Depth Camera D435i. The dataset was recorded in the corridors of the Digiteo laboratory, building 660. This dataset has three acquisition modes: IR-D, Passive-Stereo RGB-D, and Stereo.
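For reference, a short pyrealsense2 sketch of grabbing depth and color frames from a D435i is given below; the stream resolutions and frame rates are assumptions, not the settings used to record this dataset.

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# Resolutions/frame rates below are assumptions, not the recording settings.
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)
try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    color = frames.get_color_frame()
    print(depth.get_distance(320, 240))  # depth in metres at image centre
finally:
    pipeline.stop()
```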
https://www.datainsightsmarket.com/privacy-policy
The RGB-D camera market is experiencing robust growth, driven by increasing demand across diverse sectors. The market, estimated at $2.5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated market value exceeding $8 billion by 2033. This expansion is fueled by several key factors. The proliferation of robotics and automation in manufacturing and logistics necessitates precise depth perception, a core capability of RGB-D cameras. Simultaneously, the advancements in augmented reality (AR) and virtual reality (VR) technologies are driving the adoption of these cameras for immersive experiences. Furthermore, the automotive industry's burgeoning interest in advanced driver-assistance systems (ADAS) and autonomous driving is significantly boosting demand. The increasing availability of high-resolution, low-cost RGB-D sensors further accelerates market penetration across various applications.

Several market trends are shaping the future of this technology. Miniaturization and power efficiency are critical considerations, leading to the development of smaller, more energy-efficient cameras suitable for mobile devices and embedded systems. The integration of artificial intelligence (AI) and machine learning (ML) algorithms within RGB-D cameras enables more sophisticated applications, such as real-time object recognition and scene understanding. Competition among established players like Microsoft and Intel, alongside emerging companies such as Ultraleap and Stereolabs, is fostering innovation and driving down costs, making RGB-D technology accessible to a wider range of applications. Despite these positive trends, challenges remain, including the need for improved accuracy and robustness in challenging lighting conditions and the development of standardized interfaces to facilitate seamless integration across different platforms.
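As a quick sanity check of the quoted figures, compounding $2.5 billion at 15% per year over the eight years from 2025 to 2033 gives roughly $7.6 billion, in the ballpark of the cited ~$8 billion estimate:

```python
# Compound growth: value_end = value_start * (1 + CAGR) ** years
value_2025, cagr, years = 2.5, 0.15, 8
value_2033 = value_2025 * (1 + cagr) ** years
print(f"${value_2033:.2f}B")  # ~$7.65B, near the cited ~$8B figure
```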
https://www.verifiedmarketresearch.com/privacy-policy/
RGB-D Camera Market Size And Forecast
RGB-D Camera Market size was valued at USD 8.44 Billion in 2023 and is expected to reach USD 9.58 Billion by 2031, with a CAGR of 13.43% from 2024-2031.
Global RGB-D Camera Market Drivers
The market drivers for the RGB-D Camera Market can be influenced by various factors. These may include:
- Advancements in Imaging Technology: Continuous improvements in imaging sensors, depth sensing technology, and algorithms enhance the performance and capabilities of RGB-D cameras, making them more appealing for various applications.
- Growing Demand in Robotics and Automation: RGB-D cameras are increasingly utilized in robotics for navigation, obstacle detection, and interaction with environments. The automation of industries and the rise of autonomous robots drive market demand.
Global RGB-D Camera Market Restraints
Several factors can act as restraints or challenges for the RGB-D Camera Market. These may include:
- High Cost: RGB-D cameras can be expensive compared to traditional cameras. This cost can be a barrier for small businesses and startups looking to implement RGB-D technology for various applications.
- Technological Complexity: The technology behind RGB-D cameras is complex, which can lead to difficulties in integration with existing systems and workflows. This complexity may deter some businesses from adopting this technology.
https://dataverse.lib.nycu.edu.tw/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.57770/VOW6CI
We propose a novel superpixel-based multi-view convolutional neural network for semantic image segmentation. The proposed network produces a high quality segmentation of a single image by leveraging information from additional views of the same scene. Particularly in indoor videos such as captured by robotic platforms or handheld and bodyworn RGBD cameras, nearby video frames provide diverse viewpoints and additional context of objects and scenes. To leverage such information, we first compute region correspondences by optical flow and image boundary-based superpixels. Given these region correspondences, we propose a novel spatio-temporal pooling layer to aggregate information over space and time. We evaluate our approach on the NYU-Depth-V2 and the SUN3D datasets and compare it to various state-of-the-art single-view and multi-view approaches. Besides a general improvement over the state-of-the-art, we also show the benefits of making use of unlabeled frames during training for multi-view as well as single-view prediction.
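A toy sketch of the spatio-temporal pooling idea (not the paper's exact layer): given per-frame feature maps and superpixel correspondence IDs shared across frames, aggregate each region's features over space and time. The max reduction and the shapes are illustrative assumptions.

```python
import numpy as np

def spatiotemporal_pool(features, region_ids, num_regions):
    """features: T x H x W x C; region_ids: T x H x W, where corresponding
    superpixels share an ID across frames. Returns num_regions x C."""
    feats = features.reshape(-1, features.shape[-1])
    ids = region_ids.reshape(-1)
    pooled = np.full((num_regions, feats.shape[1]), -np.inf)
    np.maximum.at(pooled, ids, feats)  # max over every pixel of a region
    return pooled
```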
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A high-quality dynamic coal flow monocular depth estimation dataset, based on multi-sensor fusion, is presented to provide reliable support for the coal industry's production, transportation, and processing stages. The dataset is meticulously designed to address the specific requirements of coal flow monitoring. It encompasses three typical collection scenarios: coal handling galleries, manual gangue selection areas, and idle conveyor belts. The acquisition of high-precision, low-noise, and spatiotemporally aligned RGBD data was facilitated by the utilization of ToF depth cameras and high-performance industrial cameras, thereby ensuring its suitability for operation in complex industrial environments, such as coal mines.
chouss/rgbd dataset hosted on Hugging Face and contributed by the HF Datasets community
HUMAN4D: A Human-Centric Multimodal Dataset for Motions & Immersive Media (Subject #1)
The dataset was captured with the use of VCL Volumetric Capture free software (https://github.com/VCL3D/VolumetricCapture)
HUMAN4D is a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system.
By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data.
Despite the existence of multi-view color datasets captured with the use of hardware (HW) synchronization, to the best of our knowledge, HUMAN4D is the first and only public resource that provides volumetric depth maps with high synchronization precision due to the use of intra- and inter-sensor HW-SYNC.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The RealCMB dataset comprises 58 sets of images, each containing blurry, sharp, and depth images, as well as synchronized RGB frames, poses, and depth maps. Out of the 58 sets, 48 were recorded using the data collection app from Chugovov et al. 2022, while the remaining 10 sets were already available in Chugovov et al. 2022.
If you use it, please cite:
@inproceedings{torres2023parallaxicb,
  title        = {Depth-Aware Image Compositing Model for Parallax Camera Motion Blur},
  author       = {Torres, German F and K{\"a}m{\"a}r{\"a}inen, Joni},
  booktitle    = {Image Analysis: 23rd Scandinavian Conference, SCIA 2023, Sirkka, Finland, April 18--21, 2023, Proceedings, Part I},
  pages        = {279--296},
  year         = {2023},
  organization = {Springer}
}