The Matterport3D dataset is a large RGB-D dataset for scene understanding in indoor environments. It contains 10,800 panoramic views inside 90 real building-scale scenes, constructed from 194,400 RGB-D images. Each scene is a residential building consisting of multiple rooms and floor levels, and is annotated with surface reconstructions, camera poses, and semantic segmentations.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Matterport3D Dataset: a panoramic dataset for indoor semantic segmentation
📕 Introduction
This dataset originates from Matterport3D, processed by 360BEV, and is used by our ECCV'24 paper OPS. If you're interested in my research topics, feel free to check my homepage for more findings! If you're interested in this dataset, please cite the following paper:
📖 Citation
@article{Matterport3D, title={Matterport3D: Learning from RGB-D Data in Indoor… See the full description on the dataset page: https://huggingface.co/datasets/JunweiZheng/matterport3d.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This dataset is an extension of Matterport3D that contains data to train and validate high-resolution 360° monocular depth estimation models. The data is organised into 90 folders, one per building, storing a total of 9,684 samples. Each sample consists of 4 files: the RGB equirectangular 360° image (.png), its depth ground truth (.dpt), a visualisation of the depth ground truth (.png), and the camera-to-world extrinsic parameters for the image (.txt), saved as 7 values: 3 for the camera center and the last 4 for the rotation quaternion (given in XYWZ order).
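For illustration, here is a minimal sketch of how such a pose file could be parsed into a 4x4 camera-to-world matrix. The whitespace-separated layout and the quaternion component ordering are assumptions (SciPy expects scalar-last XYZW), so verify both against the dataset documentation before use:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def load_cam_to_world(pose_txt: str) -> np.ndarray:
    """Parse a 7-parameter pose file into a 4x4 camera-to-world matrix.

    Assumptions (not confirmed by the dataset description): values are
    whitespace-separated floats, and the quaternion is scalar-last (XYZW),
    which is the ordering SciPy's Rotation.from_quat expects.
    """
    vals = np.loadtxt(pose_txt).ravel()
    assert vals.size == 7, f"expected 7 parameters, got {vals.size}"
    center, quat = vals[:3], vals[3:]

    T = np.eye(4)
    T[:3, :3] = R.from_quat(quat).as_matrix()  # rotation from quaternion
    T[:3, 3] = center                          # camera center (translation)
    return T
```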
Matterport3D Dataset
This dataset contains the Matterport3D dataset split into chunks for easier download.
Files
Original file: matterport3d.zip (~7.2 GB)
Chunks: 8 files (~1 GB each)
Scripts: merge.sh, download.py, unzip.sh
Usage
Download all files:
git clone https://huggingface.co/datasets/Gen3DF/Matterport3d
cd Matterport3d/matterport3d
Reassemble the original file:
chmod +x merge.sh
./merge.sh
Extract contents:
chmod +x unzip.sh
./unzip.sh
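If git is unavailable, the same repository can also be fetched and reassembled programmatically. The sketch below uses the huggingface_hub client; the chunk file-name pattern is a hypothetical placeholder (check the repository contents, or simply run merge.sh instead):

```python
from pathlib import Path
from huggingface_hub import snapshot_download

# Download the full dataset repository (same content as the git clone above).
local_dir = Path(snapshot_download(repo_id="Gen3DF/Matterport3d", repo_type="dataset"))

# Reassemble the ~1 GB chunks into matterport3d.zip by concatenating them in
# sorted order. The "matterport3d.zip.part*" pattern is an assumed naming
# scheme; confirm the actual chunk names (and merge.sh) in the repository.
chunk_dir = local_dir / "matterport3d"
chunks = sorted(chunk_dir.glob("matterport3d.zip.part*"))
with open(chunk_dir / "matterport3d.zip", "wb") as out:
    for chunk in chunks:
        out.write(chunk.read_bytes())
```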
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
The chuanmao/matterport3d dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
Spherical cameras capture scenes in a holistic manner and have been used for room layout estimation. Recently, with the availability of appropriate datasets, there has also been progress in depth estimation from a single omnidirectional image. While these two tasks are complementary, few works have been able to explore them in parallel to advance indoor geometric perception, and those that have done so either relied on synthetic data or used small-scale datasets, as few options are available that include both layout annotations and dense depth maps in real scenes. This is partly due to the necessity of manual annotations for room layouts. In this work, we move beyond this limitation and generate a 360° geometric vision (360V) dataset that includes multiple modalities, multi-view stereo data, and automatically generated weak layout cues. We also explore an explicit coupling between the two tasks to integrate them into a single-shot trained model. We rely on depth-based layout reconstruction and layout-based depth attention, demonstrating increased performance across both tasks. Scanning rooms with a single 360° camera opens up the opportunity for quick and easy building-scale 3D scanning. The project page is available at https://vcl3d.github.io/ExplicitLayoutDepth/.
Habitat-Matterport 3D Semantics Dataset (HM3D-Semantics v0.1) is the largest-ever dataset of semantically-annotated 3D indoor spaces. It contains dense semantic annotations for 120 high-resolution 3D scenes from the Habitat-Matterport 3D dataset. The HM3D scenes are annotated with 1,700+ raw object names, which are mapped to 40 Matterport categories. On average, each scene in HM3D-Semantics v0.1 consists of 646 objects from 114 categories.
It can be used to train embodied agents, such as home robots and AI assistants, at scale for semantic navigation tasks.
Pano3D is a new benchmark for depth estimation from spherical panoramas. Its goal is to drive progress for this task in a consistent and holistic manner. To achieve that we generate a new dataset and integrate evaluation metrics that capture not only depth performance, but also secondary traits like boundary preservation and smoothness. Moreover, Pano3D takes a step beyond typical intra-dataset evaluation schemes to inter-dataset performance assessment. By disentangling generalization to three different axes, Pano3D facilitates proper extrapolation assessment under different out-of-training data conditions. Relying on the Pano3D holistic benchmark for 360 depth estimation we perform an extended analysis and derive a solid baseline for the task.
https://kaldir.vc.in.tum.de/matterport/MP_TOS.pdf
Habitat-Matterport 3D (HM3D) is a large-scale dataset of 1,000 building-scale 3D reconstructions from a diverse set of real-world locations. Each scene in the dataset consists of a textured 3D mesh reconstruction of interiors such as multi-floor residences, stores, and other private indoor spaces. HM3D surpasses existing datasets available for academic research in terms of physical scale, completeness of the reconstruction, and visual fidelity. HM3D contains 112.5k m^2 of navigable space, which is 1.4 - 3.7x larger than other building-scale datasets such as MP3D and Gibson. When compared to existing photorealistic 3D datasets such as Replica, MP3D, Gibson, and ScanNet, images rendered from HM3D have 20 - 85% higher visual fidelity w.r.t. counterpart images captured with real cameras, and HM3D meshes have 34 - 91% fewer artifacts due to incomplete surface reconstruction.
http://kaldir.vc.in.tum.de/matterport/MP_TOS.pdf
MatterportLayout extends the Matterport3D dataset with general Manhattan layout annotations. It has 2,295 RGBD panoramic images from Matterport3D which are extended with ground truth 3D layouts.
The Habitat-Matterport 3D Semantics Dataset (HM3DSem) is the largest-ever dataset of 3D real-world indoor spaces with densely annotated semantics available to the academic community. HM3DSem v0.2 consists of 142,646 object instance annotations across 216 3D spaces from HM3D and 3,100 rooms within those spaces. These object instances are labeled with raw object names, which are mapped to 40 Matterport categories. On average, each scene in HM3DSem v0.2 consists of 661 objects from 106 categories. This dataset is the result of 14,200+ hours of human effort for annotation and verification by 20+ annotators.
HM3DSem v0.2 is free and available here for academic, non-commercial research. Researchers can use it with FAIR’s Habitat simulator to train embodied agents, such as home robots and AI assistants, at scale for semantic navigation tasks. HM3DSem v0.1 was also the basis of the recently concluded Habitat 2022 ObjectNav challenge. Please see our arxiv report for more details.
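As a rough illustration of that Habitat workflow, the sketch below loads an HM3D scene with its semantic annotations in habitat-sim and renders one RGB and one semantic frame. The scene and dataset-config paths are hypothetical placeholders, and the exact file names depend on the local HM3D / HM3DSem download:

```python
import habitat_sim

# Hypothetical local paths; substitute your own HM3D / HM3DSem download.
SCENE = "data/hm3d/val/00800-TEEsavR23oF/TEEsavR23oF.basis.glb"
SCENE_DATASET = "data/hm3d/hm3d_annotated_basis.scene_dataset_config.json"

sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = SCENE
sim_cfg.scene_dataset_config_file = SCENE_DATASET  # links the semantic annotations

# One agent carrying an RGB camera and a semantic camera.
rgb = habitat_sim.CameraSensorSpec()
rgb.uuid = "color"
rgb.sensor_type = habitat_sim.SensorType.COLOR
rgb.resolution = [480, 640]

sem = habitat_sim.CameraSensorSpec()
sem.uuid = "semantic"
sem.sensor_type = habitat_sim.SensorType.SEMANTIC
sem.resolution = [480, 640]

agent_cfg = habitat_sim.agent.AgentConfiguration(sensor_specifications=[rgb, sem])

sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))
obs = sim.get_sensor_observations()  # {"color": HxWx4 uint8, "semantic": HxW instance ids}
sim.close()
```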
3D60 is a collective dataset generated in the context of various 360° vision research works. It comprises multi-modal stereo renders of scenes from realistic and synthetic large-scale 3D datasets (Matterport3D, Stanford2D3D, SunCG).
The UAVA dataset is specifically designed to foster applications that consider UAVs and humans as cooperative agents. We employ a real-world 3D scanned dataset (Matterport3D), physically-based shading, a gamified simulator for realistic drone navigation trajectory collection, and randomized sampling to generate multimodal data from both the user’s exocentric view of the drone and the drone’s egocentric view.
This is a subset of the UAVA dataset consisting of DJI M2ED drone renders from an exocentric user view.
Work on depth estimation has so far focused on projective images, ignoring 360° content, which is now increasingly and more easily produced. However, we show that monocular depth estimation models trained on traditional images produce sub-optimal results on omnidirectional images, showcasing the need for training directly on 360° datasets, which are, however, hard to acquire. In this work, we circumvent the challenges associated with acquiring high-quality 360° datasets with ground-truth depth annotations by re-using recently released large-scale 3D datasets and re-purposing them for 360° via rendering. This dataset, which is considerably larger than similar projective datasets, is publicly offered to the community to enable future research in this direction. We use this dataset to learn the task of depth estimation from 360° images in an end-to-end fashion, and show promising results on our synthesized data as well as on unseen realistic images. The resulting 360° dataset provides omnidirectional (spherical panorama) color images of indoor scenes along with their corresponding ground-truth depth annotations. It is composed of renders of other publicly available textured 3D datasets of indoor scenes: two computer-generated (CG) datasets, SunCG and SceneNet, and two realistic ones acquired by scanning indoor buildings, Stanford2D3D and Matterport3D. The 360° renders are produced by using a path-tracing renderer and placing a spherical camera and a uniform light source at the same position in the scene. More information can be found at http://vcl.iti.gr/360-dataset/
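To make the spherical camera model concrete, the sketch below shows one common way to turn equirectangular pixel coordinates into unit ray directions; the longitude/latitude convention used here is an assumption for illustration and is not taken from the 360D rendering code:

```python
import numpy as np

def equirect_rays(height: int, width: int) -> np.ndarray:
    """Unit ray directions for each pixel of an equirectangular panorama.

    Convention (assumed for illustration, not the 360D renderer's own code):
    longitude spans [-pi, pi) across the width, latitude spans (+pi/2, -pi/2)
    down the height, with +y pointing up.
    """
    u = (np.arange(width) + 0.5) / width     # horizontal pixel centers in [0, 1)
    v = (np.arange(height) + 0.5) / height   # vertical pixel centers in [0, 1)
    lon = (u - 0.5) * 2.0 * np.pi
    lat = (0.5 - v) * np.pi
    lon, lat = np.meshgrid(lon, lat)         # both (H, W)

    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)      # (H, W, 3), all unit length

rays = equirect_rays(512, 1024)              # e.g. a 1024x512 panorama
```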
MAOMaps is a dataset for evaluation of Visual SLAM, RGB-D SLAM and map merging algorithms. It contains 40 samples with RGB and depth images, and ground truth trajectories and maps. These 40 samples are joined into 20 pairs of overlapping maps for the evaluation of map merging methods. The samples were collected using the Matterport3D dataset and the Habitat simulator.
https://www.archivemarketresearch.com/privacy-policy
The 3D scanner market for building applications is experiencing robust growth, projected to reach $1,469.3 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 11.5% from 2025 to 2033. This expansion is fueled by several key drivers. Increasing adoption of Building Information Modeling (BIM) and the need for precise as-built documentation are accelerating the demand for accurate and efficient 3D scanning solutions. Furthermore, the rising complexity of construction projects and the growing need for faster turnaround times are pushing the adoption of these technologies. The ability of 3D scanners to capture intricate details and generate accurate models is also proving invaluable for various stages of the building lifecycle, from initial design and planning to construction monitoring and facility management. Different scanner types, such as handheld, vehicular, and tripod-mounted systems, cater to diverse project needs and budgets, further contributing to market expansion. While initial investment costs can be a restraint, the long-term benefits in terms of cost savings, improved accuracy, and reduced project timelines are driving widespread adoption across residential, office building, and other sectors. The increasing availability of user-friendly software and data processing tools further lowers the barrier to entry for new users.

The geographical distribution of this market is diverse, with North America and Europe currently dominating the market share. However, rapid infrastructure development and increasing construction activity in regions like Asia-Pacific and the Middle East & Africa are expected to drive significant growth in these markets over the forecast period. The competitive landscape is characterized by both established players like Leica Geosystems, Trimble, and Faro, and emerging technology companies offering innovative solutions. The continued advancement of scanning technology, including improvements in accuracy, speed, and ease of use, will further propel market growth. The development of advanced software and integration with other BIM and construction management platforms will also shape the future of the 3D building scanner market, creating opportunities for continued expansion.
https://www.marketreportanalytics.com/privacy-policy
The 3D virtual store software market is experiencing robust growth, driven by the increasing adoption of e-commerce and the need for immersive online shopping experiences. The market, estimated at $2 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching an estimated $10 billion by 2033. This expansion is fueled by several key factors. Firstly, the fast-moving consumer goods (FMCG), automotive, and cosmetic industries are leading adopters, leveraging 3D virtual stores to enhance product visualization and customer engagement. Secondly, the shift towards cloud-based solutions offers scalability and cost-effectiveness, attracting businesses of all sizes. The rising popularity of augmented reality (AR) and virtual reality (VR) technologies further complements this trend, enabling highly interactive and engaging shopping experiences. Furthermore, advancements in 3D modeling and rendering technologies are constantly improving the realism and quality of virtual stores, enhancing user experience and driving market adoption. However, challenges such as high initial investment costs for implementation and the need for skilled personnel to manage and maintain these systems could potentially restrain market growth to some extent.

The market segmentation reveals a strong preference for cloud-based solutions over on-premise deployments, reflecting the growing trend towards flexible and scalable IT infrastructure. Geographically, North America and Europe currently dominate the market, but Asia Pacific is poised for significant growth due to its expanding e-commerce sector and increasing internet penetration. Key players like Tangiblee, Adloid, Treedis, Matterport, Inc., and others are actively shaping the market landscape through continuous innovation and strategic partnerships. The competitive landscape is characterized by both established players and emerging startups, leading to a dynamic and innovative market with continuous advancements in features and functionalities. The forecast period suggests continued strong growth, driven by ongoing technological improvements and increasing consumer demand for immersive online shopping experiences.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
SceneSplat-7K
We propose the SceneSplat-7K dataset, comprising indoor 3D Gaussian Splatting scenes generated from ScanNet, ScanNet++, ScanNet++ v2, Replica, Hypersim, 3RScan, ARKitScenes, and Matterport3D. In total the dataset contains 7,916 scenes and 11.27 billion 3D Gaussians. Constructing it required computational resources equivalent to 150 GPU-days on one NVIDIA L4 GPU. SceneSplat-7K achieves high-fidelity reconstruction quality with an average PSNR of 29.64 dB and depth_l1 loss… See the full description on the dataset page: https://huggingface.co/datasets/GaussianWorld/scene_splat_7k.
R2R is the first benchmark dataset for visually-grounded natural language navigation in real buildings. The dataset requires autonomous agents to follow human-generated navigation instructions in previously unseen buildings. For training, each instruction is associated with a Matterport3D Simulator trajectory. 22k instructions are available, with an average length of 29 words. A test evaluation server for this dataset is available at EvalAI.
Modern 3D vision advancements rely on data driven methods and thus, task specific annotated datasets. Especially for geometric inference tasks like depth and surface estimation, the collection of high quality data is very challenging, expensive and laborious. While considerable efforts have been made for traditional pinhole cameras, the same cannot be said for omnidirectional ones.
3D60 is a collective dataset generated in the context of various 360° vision research works. It comprises multi-modal omnidirectional stereo renders of scenes from realistic and synthetic large-scale 3D datasets (Matterport3D, Stanford2D3D and SunCG).
Our dataset fills a very important gap in data-driven spherical 3D vision and, more specifically, for the monocular and stereo dense depth and surface estimation tasks.
We build on existing synthetic and real scanned 3D datasets of interior spaces, re-using them via ray-tracing to generate high-quality, densely annotated spherical panoramas.