Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
A labeled point-cloud dataset taken from the Semantic3D project (http://semantic3d.net/view_dbase.php?chl=1). The dataset contains billions of XYZ-RGB points labeled into 8 classes.
The data are raw ASCII files containing 7 columns (X, Y, Z, Intensity, R, G, B), and the labels are
{1: man-made terrain, 2: natural terrain, 3: high vegetation, 4: low vegetation, 5: buildings, 6: hard scape, 7: scanning artefacts, 8: cars}, with an additional label of 0 for unlabeled points.
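A minimal loading sketch with numpy, assuming the benchmark's usual layout of one ASCII point file per scan plus a separate .labels file with one integer per line; the file names below are illustrative:

```python
import numpy as np

# Each scan is a plain ASCII file with 7 columns per line:
# X, Y, Z, Intensity, R, G, B. (File names here are illustrative.)
points = np.loadtxt("scan.txt")               # shape: (N, 7)
xyz = points[:, :3]                           # Cartesian coordinates
intensity = points[:, 3]
rgb = points[:, 4:7].astype(np.uint8)         # colours in [0, 255]

# Labels are distributed as a separate one-column file aligned
# line-by-line with the point file (0 = unlabeled, 1-8 = classes).
labels = np.loadtxt("scan.labels", dtype=np.int64)

# Unlabeled points are commonly dropped before training.
mask = labels > 0
xyz, rgb, labels = xyz[mask], rgb[mask], labels[mask]
```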
The data are taken directly from the Semantic3D competition; users must consult the rules and regulations posted on the original site and cite it accordingly: http://semantic3d.net/view_dbase.php?chl=1
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The BelHouse3D dataset is a synthetic point cloud dataset for 3D indoor scene semantic segmentation. It is constructed using real-world references from 32 houses in Belgium, ensuring that the synthetic data closely aligns with real-world conditions. Additionally, it includes a test set with data occlusion to simulate out-of-distribution (OOD) scenarios, reflecting the occlusions commonly encountered in real-world point clouds. The dataset is used to benchmark both fully supervised and few-shot learning (FSL) segmentation models.
LiDAR point-cloud segmentation is an important problem for many applications. For large-scale point cloud segmentation, the de facto method is to project the 3D point cloud onto a 2D LiDAR (range) image and process it with convolutions.
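As a concrete illustration, a minimal numpy sketch of such a spherical (range-image) projection; the field-of-view parameters are illustrative defaults for a 64-beam spinning sensor, not values taken from any particular dataset:

```python
import numpy as np

def spherical_projection(xyz, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) point cloud onto an H x W range image.

    fov_up / fov_down give the sensor's vertical field of view in
    degrees; the defaults roughly match a 64-beam spinning LiDAR.
    """
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    fov = fov_up - fov_down

    r = np.linalg.norm(xyz, axis=1)                      # range per point
    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])               # horizontal angle
    pitch = np.arcsin(xyz[:, 2] / np.maximum(r, 1e-8))   # vertical angle

    # Map angles to pixel coordinates.
    u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * W), 0, W - 1).astype(int)
    v = np.clip(np.floor((1.0 - (pitch - fov_down) / fov) * H), 0, H - 1).astype(int)

    # Write farthest points first so the closest surface wins collisions.
    image = np.zeros((H, W), dtype=np.float32)
    order = np.argsort(-r)
    image[v[order], u[order]] = r[order]
    return image
```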
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains 2,904 geometries of single-family houses in the form of annotated point clouds, and was developed in order to train 3D Generative Adversarial Networks with architecturally relevant data. More specifically, the geometries are segmented into 3 classes: wall, roof, floor. The points of the point clouds are saved in .pts files while their labels are saved in .seg files.
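A minimal sketch of loading one paired geometry with numpy; the file names are illustrative, and the mapping of label codes to wall/roof/floor follows the dataset's own documentation:

```python
import numpy as np

# One geometry is a pair of files: a .pts file with one point per line
# and a .seg file with one integer label per line.
points = np.loadtxt("house_0001.pts", dtype=np.float32)
labels = np.loadtxt("house_0001.seg", dtype=np.int64)
assert len(points) == len(labels), "every point needs exactly one label"
```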
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The proposed dataset
Accurate identification of ligand binding sites (LBS) on a protein structure is critical for understanding protein function and designing structure-based drugs. Because previous pocket-centric methods are usually based on the investigation of pseudo-surface points outside the protein structure, they cannot fully exploit the local connectivity of atoms within the protein or the global 3D geometric information from all the protein atoms. In this paper, we propose a novel point cloud segmentation method, PointSite, for accurate identification of protein ligand binding atoms, which performs protein LBS identification at the atom level in a protein-centric manner. Specifically, we first transform the original 3D protein structure into point clouds and then conduct segmentation through a Submanifold Sparse Convolution based U-Net. With the fine-grained atom-level representation of binding atoms and enhanced feature learning, PointSite outperforms previous methods in atom Intersection over Union (atom-IoU) by a large margin. Furthermore, our segmented binding atoms, that is, atoms predicted with high probability by our model, can work as a filter on predictions from previous pocket-centric approaches, which significantly decreases the false-positive rate among LBS candidates. We also directly extend PointSite, trained on bound proteins, to LBS identification on unbound proteins, demonstrating its superior generalization capacity. Through cascaded filtering and reranking aided by the segmented atoms, state-of-the-art performance is achieved on various canonical benchmarks, CAMEO hard targets, and unbound proteins in terms of the commonly used DCA criteria.
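For reference, a small sketch of how atom-IoU can be computed, assuming it is the plain intersection-over-union of per-atom binary masks (predicted binding atoms versus ground truth):

```python
import numpy as np

def atom_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union of two per-atom binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(intersection) / max(int(union), 1)

# Toy example: 6 atoms; the model flags atoms 1-3, ground truth is 2-4.
pred = np.array([0, 1, 1, 1, 0, 0])
target = np.array([0, 0, 1, 1, 1, 0])
print(atom_iou(pred, target))  # 2 / 4 = 0.5
```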
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Urban sewer pipelines, as critical guarantors of urban resilience and sustainable development, undertake the tasks of sewage disposal and flood prevention. However, in many countries most municipal sewer systems have been in service for 60 to 100 years, with many rated in poor condition (D+) by ASCE.
As laser scanning is fast becoming the state-of-the-art inspection technique for underground sewers, semantic segmentation of pipeline point clouds is an essential intermediate step for pipeline condition assessment and digital twinning. As with other building structures, however, the scarcity of real-world point clouds has hindered the application of deep learning techniques to automated sewer pipeline semantic segmentation.
We provide a high-quality, realistic, semantically rich public dataset named "**Sewer3D Semantic Segmentation**" (S3DSS), comprising 800 synthetic scans and 500 real-world scans for point cloud semantic segmentation in the sewer pipeline domain, for which no public dataset previously existed. S3DSS contains over 917 million points covering 8 categories of common sewer defects. We hope it can serve as a starting point for benchmarking and promote deep learning research on point clouds of sewer pipeline defects.
The two sub-datasets were obtained in the following way.
The real point cloud data were captured in laboratory scenarios using a FARO Focus S laser scanner. We used two prototype reinforced concrete sewer pipes to create most of the defect scenes; for misalignment and displacement defects, which are difficult to reproduce with concrete pipes, we used two purpose-built steel pipes. A total of 500 real scans were collected.
The synthetic point cloud data were obtained by our automated synthetic data generator in Unity3D; the synthetic point cloud generation methodology is introduced in our paper. We generated 800 scans of sewer defect scenes. If you need more data, please contact Minghao Li (liminghao@dlut.edu.cn). S3DSS uses 8 common defect classes.
This work was supported by the National Key R&D Program of China (Grant No. 2022YFC3801000) and the National Natural Science Foundation of China (Grant No. 52479118). We also thank Haurum et al. for sharing their work "Sewer Defect Classification using Synthetic Point Clouds", which served as a reference for this work.
M. Li, X. Feng, Z. Wu, J. Bai, F. Yang, Game engine-driven synthetic point cloud generation method for LiDAR-based defect detection in sewers, Tunnelling and Underground Space Technology 163 (2025) 106755. https://doi.org/10.1016/j.tust.2025.106755
Z. Wu, M. Li, Y. Han, X. Feng, Semantic segmentation of 3D point cloud for sewer defect detection using an integrated global and local deep learning network, Measurement 253 (2025) 117434. https://doi.org/10.1016/j.measurement.2025.117434
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by yangxin6
Released under MIT
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Ratan Jyoti
Released under MIT
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ElecsDataset is a specialized 3D semantic segmentation dataset designed for substation environments. It addresses the shortage of domain-specific annotated data in the field of substation 3D semantic segmentation. This dataset offers high-resolution
GNU General Public License 3.0 https://www.gnu.org/licenses/gpl-3.0.html
Datasets used in the experiment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These point clouds were used for the ISPRS-funded project "Integrating IndoorGML with outdoors: Automatic routing graph generation for indoor-outdoor transitional space for seamless navigation".
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
UA_L-DoTT (University of Alabama’s Large Dataset of Trains and Trucks) is a collection of camera images and 3D LiDAR point cloud scans from five different data sites. Four of the data sites targeted trains on railways and the last targeted trucks on a four-lane highway. Low-light conditions at one of the data sites showcase unique differences between individual sensor data. The final data site utilized a mobile platform, which created a large variety of viewpoints in images and point clouds. The dataset consists of 93,397 raw images, 11,415 corresponding labeled text files, 354,334 raw point clouds, 77,860 corresponding labeled point clouds, and 33 timestamp files. These timestamps correlate images to point cloud scans via POSIX time. The data was collected with a sensor suite consisting of five different LiDAR sensors and a camera. This provides various viewpoints and features of the same targets due to the variance in operational characteristics of the sensors. The inclusion of both raw and labeled data allows users to get started immediately with the labeled subset, or to label additional raw data as needed. This large dataset is beneficial to any researcher interested in machine learning using cameras, LiDARs, or both.
The full dataset is too large (~1 TB) to be uploaded to Mendeley Data. Please see the attached link for access to the full dataset.
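Since the timestamp files correlate images to point cloud scans via POSIX time, a nearest-timestamp match is the natural way to pair them. A minimal sketch; the timestamp values below are hypothetical, real ones come from the 33 timestamp files:

```python
import bisect

def match_images_to_scans(image_times, scan_times):
    """For each image timestamp, return the index of the nearest scan
    timestamp. Both are POSIX seconds; scan_times must be sorted."""
    matches = []
    for t in image_times:
        i = bisect.bisect_left(scan_times, t)
        # Compare the neighbours on either side of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(scan_times)]
        matches.append(min(candidates, key=lambda j: abs(scan_times[j] - t)))
    return matches

# Hypothetical values; real ones come from the dataset's timestamp files.
images = [100.02, 100.55, 101.10]
scans = [100.00, 100.50, 101.00, 101.50]
print(match_images_to_scans(images, scans))  # [0, 1, 2]
```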
The proposed dataset, termed PC-Urban (Urban Point Cloud), is captured with a 64-channel Ouster LiDAR sensor installed on an SUV driven through the downtown of Perth, Western Australia (WA), Australia. The dataset comprises over 4.3 billion points captured across 66K sensor frames. The labelled data is organized as registered and raw point cloud frames, where the former combine varying numbers of registered consecutive frames. We provide 25 class labels in the dataset covering 23 million points and 5K instances. Labelling is performed with PC-Annotate and can easily be extended by end-users employing the same tool.
The data is organized into unlabelled and labelled 3D point clouds. The unlabelled data is provided in .PCAP file format, which is the direct output format of the Ouster LiDAR sensor. Raw frames are extracted from the recorded .PCAP files as Ply and Excel files using the Ouster Studio software. Labelled 3D point cloud data consists of registered or raw point clouds; a labelled point cloud combines Ply, Excel, Labels and Summary files. The Ply file contains X, Y, Z values along with colour information. The Excel file contains the X, Y, Z values plus the Intensity, Reflectivity, Ring, Noise, and Range of each point; these attributes can be useful for semantic segmentation with deep learning algorithms. The Label and Label Summary files have been explained in the previous section.
One GB of raw data contains nearly 1,300 raw frames, and 66,425 frames are provided in the dataset, each comprising 65,536 points; hence, 4.3 billion points captured with the Ouster LiDAR sensor are provided. Annotation of 25 general outdoor classes is provided: car, building, bridge, tree, road, letterbox, traffic signal, light-pole, rubbish bin, cycles, motorcycle, truck, bus, bushes, road sign board, advertising board, road divider, road lane, pedestrians, side-path, wall, bus stop, water, zebra-crossing, and background. With the released data, a total of 143 scenes are annotated, including both raw and registered frames.
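A minimal sketch of reading the per-point attributes of one labelled frame with pandas; the file name is illustrative and the column names follow the description above:

```python
import pandas as pd

# Read the per-point attributes of one labelled frame.
frame = pd.read_excel("frame_0001.xlsx")
features = frame[["X", "Y", "Z", "Intensity", "Reflectivity",
                  "Ring", "Noise", "Range"]].to_numpy()
print(features.shape)  # (number_of_points, 8)
```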
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NPM3D (https://npm3d.fr/paris-carla-3d) consists of mobile laser scanning (MLS) point clouds collected in four different regions in the French cities of Paris and Lille, where each point has been annotated with two labels: one that assigns it to one out of 10 semantic categories and another one that assigns it to an object instance. When inspecting the data, we found 9 cases where multiple tree instances had not been separated correctly (i.e., they had the same ground truth instance label). These cases were manually corrected using the CloudCompare software (https://www.cloudcompare.org), and 35 individual tree instances were obtained. Our variant of the dataset with 10 semantic categories and enhanced instance labels is publicly available.
https://doi.org/10.17026/fp39-0x58
Supervised training of a deep neural network for semantic segmentation of point clouds requires a large amount of labelled data. With current LiDAR and photogrammetric techniques it is easy to acquire huge numbers of high-density points over large-scale areas; however, manually labelling point clouds for model training is extremely time-consuming. We propose an active and incremental learning strategy that iteratively queries informative point cloud data for manual annotation, continuously training the model to adapt to the newly labelled samples in each iteration. Data informativeness is evaluated step by step, effectively and incrementally enriching the model's knowledge. We use airborne laser scanning point clouds captured over central Rotterdam to evaluate the proposed method. Date Submitted: 2020-12-16
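A minimal, self-contained sketch of such an active-and-incremental loop, using entropy as one common informativeness measure; the random "model" and all names here are illustrative stand-ins, not the paper's actual criterion:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_proba(pool):
    # Stand-in for the segmentation model's per-sample class probabilities.
    logits = rng.normal(size=(len(pool), 10))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def select_informative(pool, budget):
    # Entropy-based query: pick the samples the model is least sure about.
    proba = predict_proba(pool)
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    return np.argsort(-entropy)[:budget]

pool = list(range(1000))      # indices of unlabelled point-cloud tiles
labelled = []
for _ in range(5):            # iterative query-annotate-retrain rounds
    query = set(int(i) for i in select_informative(pool, budget=20))
    labelled += [pool[i] for i in query]
    pool = [s for i, s in enumerate(pool) if i not in query]
    # ... here the queried tiles would be manually annotated and the
    # model retrained on `labelled` before the next round ...
```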
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains high-resolution point cloud data of nine reinforced concrete bridges located in rural areas of Japan, collected in December 2023 using the Matterport Pro3 terrestrial laser scanner. The scanner features a 360° horizontal field of view (FOV) and a 295° vertical FOV, operating with a 904 nm wavelength laser beam. It achieves a measurement accuracy of ±20 mm at a distance of 10 m and captures up to 100,000 points per second.
Key characteristics of the dataset:
Data Format: LAS
Coordinate System: Local, without georeferencing
Resolution: Coordinate scale value of 1 mm
This dataset was created to support research on automated dimension estimation of bridge components using semantic segmentation and geometric analysis. It can be utilized by researchers and practitioners in structural engineering, computer vision, and infrastructure management for tasks such as semantic segmentation, structural analysis, and digital twin development.
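A minimal sketch of reading one scan with the laspy library; the file name is illustrative, and the expected 1 mm coordinate scale can be verified from the header:

```python
import laspy
import numpy as np

# Read one bridge scan. Coordinates are in a local, non-georeferenced
# system; the header scale should correspond to 1 mm.
las = laspy.read("bridge_01.las")
xyz = np.column_stack([las.x, las.y, las.z])   # scaled coordinates
print(las.header.scales)                       # expected: [0.001 0.001 0.001]
```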
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bamboo forests are an important type of forest resource in China. Extracting and analysing their structural parameters supports intelligent, efficient and precise production and management, and provides a scientific basis for bamboo forest health assessment and monitoring. Deep learning methods that extract structural parameters from two-dimensional images are easily affected by environmental conditions and lack three-dimensional spatial information, making it difficult to capture multiple key structural parameters comprehensively. Constructing a three-dimensional point cloud semantic segmentation dataset of bamboo forests is therefore of great significance for research on high-throughput, accurate extraction of structural parameters and on their automatic acquisition in large-scale environments. Through field collection and data processing, such a dataset was constructed, comprising raw three-dimensional point cloud data, point cloud data for three types of bamboo forest, merged point cloud and label data, and label category and file path information, totalling 8.68 GB. This dataset provides valuable basic point cloud resources for research on three-dimensional point cloud processing and semantic segmentation, and for high-throughput, automated acquisition of multiple structural parameters in large-scale bamboo forest environments.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 11 terrestrial laser scanning (TLS) tree point clouds (in .LAZ format v1.4) of 7 different species, which have been manually labeled into leaf and wood points. The labels are contained in the Classification field (0 = wood, 1 = leaf). The point clouds have additional attributes (Deviation, Reflectance, Amplitude, GpsTime, PointSourceId, NumberOfReturns, ReturnNumber). Before labeling, all point clouds were filtered by Deviation, discarding all points with a Deviation greater than 50. An ASCII file with tree species and tree positions (in ETRS89 / UTM zone 32N; EPSG:25832) is provided, which can be used to normalize and center the point clouds. This dataset is intended to be used for training and validation of algorithms for semantic segmentation (leaf-wood separation) of TLS tree point clouds, as done by Esmorís et al. 2023 (Related Publication). The point clouds are a subset of a larger dataset, which is available on PANGAEA (Weiser et al. 2022b, see Related Dataset). More details on data acquisition and processing, file formats, and quality assessments can be found in the corresponding data description paper (Weiser et al. 2022a, see Related Material).
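A minimal laspy sketch for splitting one labeled tree into wood and leaf points via the Classification field; the file name is illustrative, and decompressing .LAZ requires the lazrs or laszip backend alongside laspy:

```python
import laspy
import numpy as np

las = laspy.read("tree_01.laz")
labels = np.asarray(las.classification)        # 0 = wood, 1 = leaf
xyz = np.column_stack([las.x, las.y, las.z])
wood_xyz, leaf_xyz = xyz[labels == 0], xyz[labels == 1]
print(len(wood_xyz), len(leaf_xyz))
```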
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Background
This resource contains the Indoor Lodz University of Technology Point Cloud Dataset (InLUT3D), a point cloud dataset tailored for real object classification and both semantic and instance segmentation tasks. Comprising 321 scans, with some areas covered by multiple scans, it was captured entirely with the Leica BLK360 scanner.
Train/test split
The dataset's authors impose the following train-test split:
Split Setup range
train from setup_0 to setup_300
test from setup_301 to setup_320
The objects from the corresponding setups are recommended as splits for the classification task.
Available categories
The points are divided into 18 distinct categories, outlined in the label.yaml file along with their respective codes and colors. The categories are:
ceiling,
floor,
wall,
stairs,
column,
chair,
sofa,
table,
storage,
door,
window,
plant,
dish,
wallmounted,
device,
radiator,
lighting,
other.
Challenges
Several challenges are intrinsic to the presented dataset:
Extremely non-uniform category distribution across the dataset.
Presence of virtual images, particularly on reflective surfaces, and of data exterior to windows and doors.
Occurrence of missing data due to scanning shadows (certain areas were inaccessible to the scanner's laser beam).
High point density throughout the dataset.
Dataset structure
The structure of the dataset is the following:
inlut3d.tar.gz/
├─ setup_0/
│  ├─ projection.jpg
│  ├─ segmentation.jpg
│  ├─ setup_0.pts
├─ setup_1/
│  ├─ projection.jpg
│  ├─ segmentation.jpg
│  ├─ setup_1.pts
...
projection.jpg A file containing a spherical projection of a corresponding PTS file.
segmentation.jpg A file with objects marked with unique colours.
setup_x.pts A file with the point cloud in the textual PTS format.
Point characteristics
Each PTS file contains 8 columns:
Column ID Description
1 X Cartesian coordinate
2 Y Cartesian coordinate
3 Z Cartesian coordinate
4 Red colour in RGB space in the range [0, 255]
5 Green colour in RGB space in the range [0, 255]
6 Blue colour in RGB space in the range [0, 255]
7 Category code
8 Instance ID
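A minimal numpy sketch for loading one setup according to the column layout above; the file name is illustrative, and the first-line check guards against PTS writers that prepend a point count:

```python
import numpy as np

# Load one setup. Each row holds:
# X Y Z R G B category_code instance_id
# Some PTS writers prepend a line with the point count; skip it if present.
with open("setup_0.pts") as f:
    first_line = f.readline().split()
skiprows = 1 if len(first_line) == 1 else 0
data = np.loadtxt("setup_0.pts", skiprows=skiprows)

xyz = data[:, 0:3]
rgb = data[:, 3:6].astype(np.uint8)
category = data[:, 6].astype(np.int64)   # codes map to names in label.yaml
instance = data[:, 7].astype(np.int64)
```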