This 3D high-fidelity synthetic dataset simulates real-world Driver Monitoring System (DMS) environments using photorealistic 3D scene modeling. It includes multi-modal sensor outputs such as camera images, videos, and point clouds, all generated through simulation. The dataset is richly annotated with object classification, detection, and segmentation labels, as well as human pose data (head, eye, arm, and leg position/orientation), camera parameters, and temporal metadata such as illumination and weather conditions. Ideal for training and evaluating models in autonomous driving, robotics, driver monitoring, computer vision, and synthetic perception tasks.
With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high-quality training data, especially compared to what is available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks; however, the ability of models to generalise from synthetic data to real-world data has been poorly studied for point clouds. Despite this, a huge wealth of 3D virtual environments exists which, if proved effective, could be exploited. We therefore argue that research in this domain would be hugely useful. In this paper we present SynthCity, an open dataset to aid such research. SynthCity is a 367.9M-point synthetic full-colour Mobile Laser Scanning point cloud. Every point is labelled with one of nine categories. We generate our point cloud in a typical urban/suburban environment using the BlenSor plugin for Blender. See our project website http://www.synthcity.xyz or paper https://arxiv.org/abs/1907.04758 for more information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: We combine forest inventory information, a tree point cloud database, and the open-source laser scanning simulation framework HELIOS++ to generate synthetic laser scanning data of forests. Airborne laser scanning (ALS) data of six 1-ha plots in temperate, central European forests was simulated and compared to real ALS data of these plots. The synthetic 3D representations of the forest stands were composed of real ALS point clouds of single trees and, for comparison, simplified tree models with cylindrical stems and spheroidal crowns, both in the form of penetrable point clouds and with an impenetrable surface. This dataset includes the HELIOS++ data files to reproduce the simulations:
- the height-normalized original ALS point clouds
- the synthetic forest stand point clouds
- soil layers
- scene files
- survey files
Technical remarks: This dataset includes files for HELIOS++ (version 1.0.7).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The synthetic dataset was generated using KITTI-like specifications and annotation format. It comprises the standard KITTI folders: label_2, image_2, and calib. Furthermore, there is a velodyne file for each of the following use cases:
Additionally, the label split used for the testing and training sets can be found in the file Labels_split.
This work was made as part of a master's thesis. For further details, please check the dataset generation source code [1]. For any further questions, please contact Leandro Alexandrino (l.alexandrino@ua.pt).
[1] Fork of DeepGTAV-PreSIL by Leandro Alexandrino, https://github.com/leandroalexandrino1995/DeepGTAVPreSIL.
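Since the data follows KITTI conventions, a minimal loading sketch might look like the following; it assumes the standard float32 (x, y, z, reflectance) Velodyne format and KITTI label_2 text fields, and the sample paths are hypothetical:

```python
import numpy as np

def load_velodyne(bin_path):
    # KITTI Velodyne scans are flat float32 arrays of (x, y, z, reflectance).
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

def load_labels(label_path):
    # Each label_2 line: type, truncated, occluded, alpha, 2D bbox (4),
    # dimensions (3), location (3), rotation_y [, score].
    objects = []
    with open(label_path) as f:
        for line in f:
            v = line.split()
            objects.append({
                "type": v[0],
                "bbox": [float(x) for x in v[4:8]],
                "dimensions": [float(x) for x in v[8:11]],
                "location": [float(x) for x in v[11:14]],
                "rotation_y": float(v[14]),
            })
    return objects

points = load_velodyne("velodyne/000000.bin")   # hypothetical sample id
objects = load_labels("label_2/000000.txt")
```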
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
SynLiDAR is a large-scale synthetic LiDAR sequential point cloud dataset with point-wise annotations. 13 sequences of LiDAR point clouds with around 198k scans (over 19 billion points and 32 semantic classes) are collected from virtual urban cities, suburban towns, neighborhoods, and harbors.
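A rough loading sketch, assuming SynLiDAR is distributed in SemanticKITTI-style sequences (velodyne .bin scans plus per-point .label files); this layout and the paths below are assumptions, so check the official release:

```python
import numpy as np

# Hypothetical paths following a SemanticKITTI-style layout.
points = np.fromfile("SynLiDAR/sequences/00/velodyne/000000.bin",
                     dtype=np.float32).reshape(-1, 4)   # x, y, z, intensity
labels = np.fromfile("SynLiDAR/sequences/00/labels/000000.label",
                     dtype=np.uint32)
semantic_ids = labels & 0xFFFF   # SemanticKITTI convention: semantic id in the lower 16 bits
assert points.shape[0] == labels.shape[0]
```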
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Monitoring and preserving forests is becoming increasingly important due to the escalating effects of climate change and threats of deforestation. In the domain of forest science, three-dimensional data acquired through remote sensing technology has gained prominence for its ability to provide deep insights into the complex nature of forest environments. The process of identifying and segmenting individual trees in three-dimensional point clouds is a crucial yet challenging prerequisite for many forest analyses such as the classification of tree health and species. Tree segmentation is currently dominated by classical approaches that often rely on the forest’s canopy height model to identify tree crowns, but with limited success in complex environments and in particular areas underneath the canopy. Recent deep learning models are adept at performing instance segmentation on point clouds, but the performance of these models relies on the quantity and quality of training data. The difficulty of obtaining forest data owing to the cost of technology and annotation process hinders the development of neural networks for tree segmentation in forest point clouds. In this thesis, a scalable workflow is presented to produce arbitrarily large quantities of synthetic forest point clouds, and its effectiveness in deep learning is demonstrated. It is shown that by applying large amounts of synthetic forest data to pretrain neural networks, the individual tree segmentation performance in synthetic and real forests is significantly improved, outperforming classical segmentation methods. It is concluded that this workflow is effective at producing large quantities of realistic forest data, and its incorporation in deep learning fosters progress in tackling tree segmentation in forest point clouds. Its efficiency and scalability further indicate its potential for the development of frameworks, benchmarking systems, high throughput data analysis, and other analytical tasks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This page only provides the drone-view image dataset.
The dataset contains drone-view RGB images, depth maps, and instance segmentation labels collected from different scenes. Data from each scene is stored in a separate .7z file, along with a color_palette.xlsx file, which contains the RGB_id and corresponding RGB values.
All files follow the naming convention {central_tree_id}_{timestamp}, where {central_tree_id} represents the ID of the tree centered in the image, which is typically in a prominent position, and {timestamp} indicates the time when the data was collected.
Specifically, each 7z file includes the following folders:
rgb: This folder contains the RGB images (PNG) of the scenes and their metadata (TXT). The metadata describes the weather conditions and the world time when the image was captured. An example metadata entry is: Weather:Snow_Blizzard,Hour:10,Minute:56,Second:36.
depth_pfm: This folder contains absolute depth information of the scenes, which can be used to reconstruct the point cloud of the scene through reprojection (see the sketch at the end of this entry).
instance_segmentation: This folder stores instance segmentation labels (PNG) for each tree in the scene, along with metadata (TXT) that maps tree_id to RGB_id. The tree_id can be used to look up detailed information about each tree in obj_info_final.xlsx, while the RGB_id can be matched to the corresponding RGB values in color_palette.xlsx. This mapping allows identifying which tree corresponds to a specific color in the segmentation image.
obj_info_final.xlsx: This file contains detailed information about each tree in the scene, such as position, scale, species, and various parameters, including trunk diameter (in cm), tree height (in cm), and canopy diameter (in cm).
landscape_info.txt: This file contains the ground location information within the scene, sampled every 0.5 meters.
For birch_forest, broadleaf_forest, redwood_forest and rainforest, we also provided COCO-format annotation files (.json). Two such files can be found in these datasets:
⚠️: 7z files that begin with "!" indicate that the RGB values in the images within the instance_segmentation folder cannot be found in color_palette.xlsx. Consequently, this prevents matching the trees in the segmentation images to their corresponding tree information, which may hinder the application of the dataset to certain tasks. This issue is related to a bug in Colosseum/AirSim, which has been reported in link1 and link2.
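As a minimal sketch of the reprojection mentioned for depth_pfm, the snippet below reads a PFM depth map and back-projects it with a pinhole camera model; the file name and the intrinsics (fx, fy, cx, cy) are placeholders, since the real values depend on the camera configuration used to render the scenes:

```python
import numpy as np

def read_pfm(path):
    # Minimal grayscale PFM reader: header, dimensions, scale (sign = endianness),
    # then float32 data stored bottom-to-top.
    with open(path, "rb") as f:
        header = f.readline().decode().strip()          # "Pf" for single-channel
        width, height = map(int, f.readline().split())
        scale = float(f.readline())
        dtype = "<f4" if scale < 0 else ">f4"
        data = np.fromfile(f, dtype).reshape(height, width)
    return np.flipud(data)                              # flip to top-down row order

def depth_to_points(depth, fx, fy, cx, cy):
    # Back-project a pinhole depth map to camera-frame XYZ points.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Hypothetical file name and intrinsics; substitute the dataset's actual values.
depth = read_pfm("depth_pfm/12_1200.pfm")
points = depth_to_points(depth, fx=512.0, fy=512.0,
                         cx=depth.shape[1] / 2, cy=depth.shape[0] / 2)
```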
With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high-quality training data, especially compared to what is available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks; however, the ability of models to generalise from synthetic data to real-world data has been poorly studied for point clouds. Despite this, a huge wealth of 3D virtual environments exists which, if proved effective, could be exploited. We therefore argue that research in this domain would be hugely useful. In this paper we present SynthCity, an open dataset to aid such research. SynthCity is a 367.9M-point synthetic full-colour Mobile Laser Scanning point cloud. Every point is labelled with one of nine categories. We generate our point cloud in a typical urban/suburban environment using the BlenSor plugin for Blender.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The increasing use of deep learning techniques has reduced interpretation time and, ideally, reduced interpreter bias by automatically deriving geological maps from digital outcrop models. However, accurate validation of these automated mapping approaches is a significant challenge due to the subjective nature of geological mapping and the difficulty in collecting quantitative validation data. Additionally, many state-of-the-art deep learning methods are limited to 2D image data, which is insufficient for 3D digital outcrops, such as hyperclouds. To address these challenges, we present Tinto, a multi-sensor benchmark digital outcrop dataset designed to facilitate the development and validation of deep learning approaches for geological mapping, especially for non-structured 3D data like point clouds. Tinto comprises two complementary sets: 1) a real digital outcrop model from Corta Atalaya (Spain), with spectral attributes and ground-truth data, and 2) a synthetic twin that uses latent features in the original datasets to reconstruct realistic spectral data (including sensor noise and processing artifacts) from the ground-truth. The point cloud is dense and contains 3,242,964 labeled points. We used these datasets to explore the abilities of different deep learning approaches for automated geological mapping. By making Tinto publicly available, we hope to foster the development and adaptation of new deep learning tools for 3D applications in Earth sciences.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset contains 3D point cloud data of a synthetic plant across 10 sequences. Each sequence contains data for days 0 to 19, covering every growth stage of that sequence.
The USGS, in cooperation with the U.S. Bureau of Land Management (BLM), created a series of geospatial products of the South Cow Mountain Recreational Area, Lake County, California, using historic aerial imagery and structure-from-motion (SfM) photogrammetry methods. Products were generated from stereo historical aerial imagery acquired by the BLM in May of 1977. The aerial imagery was downloaded from the USGS Earth Resources Observation and Science (EROS) Data Center's USGS Single Aerial Frame Photo archive and the point cloud was created using USGS guidelines. Data were processed using SfM photogrammetry to generate a three-dimensional point cloud (.laz) that identifies pixels of an object from multiple images taken from various angles and calculates the x, y, and z coordinates of that object/pixel. The point cloud was processed to create a DSM (.tif) representing the uppermost reflective surface (57.3 cm resolution). Finally, source images were stitched together based on shared pixels and orthogonally adjusted to the DSM to create a high-resolution (approximately 18.3 cm) orthoimage (.tif) for the study area. This dataset includes a point cloud, digital surface model (DSM), and orthoimagery, as well as synthetic ground-control points (GCPs) and point clusters used to georeference the datasets. Separate metadata for each product are provided on the ScienceBase page for each child item.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
51WORLD Synthetic Dataset Usage Documentation
1 Introduction
The 51WORLD synthetic dataset mainly contains camera sensor data and LiDAR sensor data generated by 51Sim-One. Camera sensor data mainly includes images and the corresponding semantic segmentation, instance segmentation, depth, and object detection annotations; LiDAR sensor data mainly includes laser point clouds and annotations of 3D bounding boxes, semantic segmentation annotation… See the full description on the dataset page: https://huggingface.co/datasets/51WORLD/DataOne-synthetic-nuscenes-v1.1-sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The cost and effort of modelling existing bridges from point clouds currently outweigh the perceived benefits of the resulting model. There is a pressing need to automate this process. Previous research has achieved the automatic generation of surface primitives combined with rule-based classification to create labelled cuboids and cylinders from point clouds. While these methods work well in synthetic datasets or idealized cases, they encounter huge challenges when dealing with real-world bridge point clouds, which are often unevenly distributed and suffer from occlusions. In addition, real bridge geometries are complicated. In this paper, we propose a novel top-down method to tackle these challenges for detecting slab, pier, pier cap, and girder components in reinforced concrete bridges. This method uses a slicing algorithm to separate the deck assembly from pier assemblies. It then detects and segments pier caps using their surface normal, and girders using oriented bounding boxes and density histograms. Finally, our method merges over-segments into individually labelled point clusters. The results of 10 real-world bridge point cloud experiments indicate that our method achieves an average detection precision of 98.8%. This is the first method of its kind to achieve robust detection performance for the four component types in reinforced concrete bridges and to directly produce labelled point clusters. Our work provides a solid foundation for future work in generating rich Industry Foundation Classes models from the labelled point clusters.
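As a purely illustrative sketch, not the authors' method, a height-based slicing step of the kind described could histogram point elevations and treat the densest horizontal band as the deck slab:

```python
import numpy as np

def split_deck_from_piers(points, bin_size=0.2):
    # Toy slicing step: histogram point heights and take the densest horizontal
    # band (and everything above it) as deck candidates, the rest as pier candidates.
    z = points[:, 2]
    edges = np.arange(z.min(), z.max() + bin_size, bin_size)
    counts, edges = np.histogram(z, bins=edges)
    deck_floor = edges[counts.argmax()]           # lower edge of the densest slice
    deck_mask = z >= deck_floor
    return points[deck_mask], points[~deck_mask]  # (deck candidates, pier candidates)
```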
A large-scale synthetic dataset for LiDAR semantic segmentation, consisting of 13 LiDAR point cloud sequences with 198,396 scans in total.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: This dataset includes files to reproduce HELIOS++ airborne laser scanning (ALS) simulations of six 1-ha plots in temperate forests in south-western Germany. In addition, the real ALS data of the same plots is provided. Synthetic 3D representations of the plots were created based on forest inventory information. They were composed of real ULS point clouds of single trees and, for comparison, simplified tree models with cylindrical stems and spheroidal crowns. This dataset includes all files required to run the HELIOS++ simulations, including the synthetic forest stand point clouds, soil layers, scene files, and survey files. The simulation output can then be compared to the real ALS data, which is also provided. Technical remarks: This dataset includes files for HELIOS++ (version 1.1.0).
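With HELIOS++ installed, a simulation is typically started by passing one of the included survey XML files to the helios executable; a minimal sketch (the survey file name below is a placeholder, and the executable is assumed to be on PATH):

```python
import subprocess

# Placeholder survey file name; use one of the survey XMLs shipped with this dataset.
subprocess.run(["helios", "data/surveys/plot1_als_survey.xml"], check=True)
```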
The USGS, in cooperation with the U.S. Bureau of Land Management (BLM), created a series of geospatial products using historic aerial imagery and Structure from Motion (SfM) photogrammetry methods. A point cloud dataset (.laz) of the South Cow Mountain Recreational Area was generated from stereo historical aerial imagery acquired by the BLM in 1977. The aerial imagery was downloaded from the USGS Earth Resources Observation and Science (EROS) Data Center's USGS Single Aerial Frame Photo archive and the point cloud was created using USGS guidelines. Photo alignment, error reduction, and dense point cloud generation followed guidelines documented in Over, J.R., Ritchie, A.C., Kranenburg, C.J., Brown, J.A., Buscombe, D., Noble, T., Sherwood, C.R., Warrick, J.A., and Wernette, P.A., 2021, Processing coastal imagery with Agisoft Metashape Professional Edition, version 1.6—Structure from motion workflow documentation: U.S. Geological Survey Open-File Report 2021–1039, 46 p., https://doi.org/10.3133/ofr20211039. Photo-identifiable points, selected as synthetic ground-control points, followed guidelines documented in Sherwood, C.R., Warrick, J.A., Hill, A.D., Ritchie, A.C., Andrews, B.D., and Plant, N.G., 2018, Rapid, remote assessment of Hurricane Matthew impacts using four-dimensional structure-from-motion photogrammetry, https://doi.org/10.2112/JCOASTRES-D-18-00016.1. Additional post-processing of the 1977 dense point cloud, using Iterative Closest Point (ICP) analysis, was used to improve the alignment with the 2015 LiDAR point cloud. The ICP analysis is explained in Low, K.L., 2004, Linear least-squares optimization for point-to-plane ICP surface registration, Chapel Hill, University of North Carolina, 4(10), pp. 1-3, http://www.comp.nus.edu.sg/~lowkl/publications/lowk_point-to-plane_icp_techrep.pdf. Data were processed using photogrammetry to generate a three-dimensional point cloud that identifies pixels of an object from multiple images taken from various angles and calculates the x, y, and z coordinates of that object/pixel. The point cloud was processed to create a digital surface model of the study area (57.3 cm resolution). Finally, source images were stitched together based on shared pixels and orthogonally adjusted to the digital surface model to create a high-resolution (approximately 18.3 cm) orthoimage for the study area.
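The point-to-plane ICP alignment step described above could be reproduced in spirit with off-the-shelf tools; the sketch below uses laspy (with a LAZ backend) and Open3D, and the file names and distance threshold are placeholders rather than the actual products:

```python
import laspy                      # requires a LAZ backend, e.g. lazrs
import numpy as np
import open3d as o3d

def load_laz(path):
    las = laspy.read(path)
    pts = np.vstack((las.x, las.y, las.z)).T
    return o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))

# Placeholder file names; the real products are listed on the ScienceBase pages.
source = load_laz("1977_sfm_point_cloud.laz")      # SfM-derived cloud to align
target = load_laz("2015_lidar_point_cloud.laz")    # reference LiDAR cloud
target.estimate_normals(
    o3d.geometry.KDTreeSearchParamHybrid(radius=2.0, max_nn=30))

result = o3d.pipelines.registration.registration_icp(
    source, target, max_correspondence_distance=1.0,
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
print(result.transformation)      # 4x4 rigid transform aligning source to target
```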
BSD 3-Clause Clear License https://choosealicense.com/licenses/bsd-3-clause-clear/
LiSu: A Dataset for LiDAR Surface Normal Estimation
LiSu provides synthetic LiDAR point clouds, each annotated with surface normal vectors. The dataset is generated using the CARLA simulator, ensuring diverse environmental conditions for robust training and evaluation. In the example visualizations on the dataset page, surface normals are linearly mapped to the RGB color space for intuitive visualization.
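For reference, such a linear normal-to-RGB mapping is usually just the affine rescaling of unit normal components from [-1, 1] to [0, 1]; a minimal Python sketch (the exact convention used by LiSu is an assumption here):

```python
import numpy as np

def normals_to_rgb(normals):
    # Rescale unit normal components from [-1, 1] to 8-bit RGB values in [0, 255].
    return np.clip((normals + 1.0) * 0.5 * 255.0, 0, 255).astype(np.uint8)

# Example: normals of shape (N, 3), one unit vector per LiDAR point.
colors = normals_to_rgb(np.array([[0.0, 0.0, 1.0], [0.7, -0.7, 0.0]]))
```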
Dataset Details
Dataset Description… See the full description on the dataset page: https://huggingface.co/datasets/dmalic/LiSu.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
🏗️ BridgePoint-Seg Dataset
BridgePoint-Seg is a synthetic 3D point cloud dataset developed for large-scale masonry bridge segmentation. It provides training and test sets of point clouds with detailed semantic labels across straight and curved masonry bridges.
📁 Dataset Structure
BridgePoint-Seg/
├── syn_data/
│   ├── train/
│   │   ├── straight_bridge/   # 2,177 training samples
│   │   └── curved_bridge/     # 1,500 training samples
│   └── test/
│       ├── …
See the full description on the dataset page: https://huggingface.co/datasets/jing222/syn_masonry_bridge.
We combine forest inventory information, a tree point cloud database, and the open-source laser scanning simulation framework HELIOS++ to generate synthetic laser scanning data of forests. Airborne laser scanning (ALS) data of six 1-ha plots in temperate, central European forests was simulated and compared to real ALS data of these plots. The synthetic 3D representations of the forest stands were composed of real ALS point clouds of single trees and, for comparison, simplified tree models with cylindrical stems and spheroidal crowns, both in the form of penetrable point clouds and with an impenetrable surface. This dataset includes the HELIOS++ data files to reproduce the simulations:
- the height-normalized original ALS point clouds
- the synthetic forest stand point clouds
- soil layers
- scene files
- survey files
This dataset includes files for HELIOS++ (version 1.0.7).
https://dataverse.no/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18710/WSU7I6
The dataset comprises three dynamic scenes characterized by both simple and complex lighting conditions. The number of cameras ranges from 4 to 512 (4, 6, 8, 10, 12, 14, 16, 32, 64, 128, 256, and 512). The point clouds are randomly generated.