Dataset Overview
This simulated dataset contains 1,000 construction cost estimates. It is designed for use in predictive modeling, machine learning, and business analytics, particularly in the construction and project management domains. The dataset includes both numerical and textual data, providing opportunities for hybrid modeling approaches that combine structured data and natural language processing.
The primary objective of this dataset is to facilitate modeling of construction cost estimation while considering policy-driven adjustments (discounts or markups). It can be used to analyze and predict how various factors, such as material costs, labor costs, and policy reasons, affect final project estimates.
Feature Descriptions
1) Material_Cost (numeric):
2) Labor_Cost (numeric):
3) Profit_Rate (numeric):
4) Discount_or_Markup (numeric):
5) Policy_Reason (text):
6) Total_Estimate (numeric): The final estimated project cost, calculated as:
(Material_Cost + Labor_Cost) × (1 + Profit_Rate/100) + Discount_or_Markup
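As a minimal sketch, the Total_Estimate formula can be written as a small Python function. The function name and the sample values are hypothetical and not taken from the dataset. The sketch assumes that Profit_Rate is a percentage (as the division by 100 suggests) and that a negative Discount_or_Markup represents a discount.

```python
def total_estimate(material_cost, labor_cost, profit_rate, discount_or_markup):
    """Apply the Total_Estimate formula; profit_rate is given in percent."""
    base = material_cost + labor_cost
    return base * (1 + profit_rate / 100) + discount_or_markup


# Hypothetical row (not from the dataset): 25% profit rate, 500 discount.
example = total_estimate(
    material_cost=10_000.0,
    labor_cost=5_000.0,
    profit_rate=25.0,
    discount_or_markup=-500.0,  # assumption: negative means discount
)
print(example)
```

Recomputing this column from the other four is also a quick way to check whether the simulated rows follow the stated formula exactly.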
Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.3k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 66k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.
Rico was built by mining Android apps at runtime via human-powered and programmatic exploration. Like its predecessor ERICA, Rico’s app mining infrastructure requires no access to — or modification of — an app’s source code. Apps are downloaded from the Google Play Store and served to crowd workers through a web interface. When crowd workers use an app, the system records a user interaction trace that captures the UIs visited and the interactions performed on them. Then, an automated agent replays the trace to warm up a new copy of the app and continues the exploration programmatically, leveraging a content-agnostic similarity heuristic to efficiently discover new UI states. By combining crowdsourcing and automation, Rico can achieve higher coverage over an app’s UI states than either crawling strategy alone. In total, 13 workers recruited on UpWork spent 2,450 hours using apps on the platform over five months, producing 10,811 user interaction traces. After collecting a user trace for an app, we ran the automated crawler on the app for one hour.
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN https://interactionmining.org/rico
The Rico dataset is large enough to support deep learning applications. We trained an autoencoder to learn an embedding for UI layouts, and used it to annotate each UI with a 64-dimensional vector representation encoding visual layout. This vector representation can be used to compute structurally — and often semantically — similar UIs, supporting example-based search over the dataset. To create training inputs for the autoencoder that embed layout information, we constructed a new image for each UI capturing the bounding box regions of all leaf elements in its view hierarchy, differentiating between text and non-text elements. Rico’s view hierarchies obviate the need for noisy image processing or OCR techniques to create these inputs.
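The query-by-example search described above can be sketched as a nearest-neighbor lookup over the 64-dimensional layout vectors. The sketch below uses random vectors in place of real Rico embeddings and assumes Euclidean distance, since the text does not state which distance metric is used. The function names and UI ids are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_layouts(query, embeddings, k=3):
    """Return the ids of the k UIs whose vectors are closest to the query."""
    ranked = sorted(embeddings, key=lambda ui_id: euclidean(query, embeddings[ui_id]))
    return ranked[:k]


# Stand-in data: 100 random 64-dimensional vectors (not real Rico embeddings).
random.seed(0)
embeddings = {f"ui_{i}": [random.random() for _ in range(64)] for i in range(100)}

# Use an existing UI as the query; it should be its own closest match.
query = embeddings["ui_7"]
results = nearest_layouts(query, embeddings, k=3)
print(results)
```

A linear scan like this is fine for a sketch. At the scale of tens of thousands of screens, an approximate nearest-neighbor index would be a reasonable alternative.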
This dataset features over 80,000 high-quality images of construction sites sourced from photographers worldwide. Built to support AI and machine learning applications, it delivers richly annotated and visually diverse imagery capturing real-world construction environments, machinery, and processes.
Key Features:
1. Comprehensive Metadata: the dataset includes full EXIF data such as aperture, ISO, shutter speed, and focal length. Each image is annotated with construction phase, equipment types, safety indicators, and human activity context, making it ideal for object detection, site monitoring, and workflow analysis. Popularity metrics based on performance on our proprietary platform are also included.
2. Unique Sourcing Capabilities: images are collected through a proprietary gamified platform, with competitions focused on industrial, construction, and labor themes. Custom datasets can be generated within 72 hours to target specific scenarios, such as building types, stages (excavation, framing, finishing), regions, or safety compliance visuals.
3. Global Diversity: sourced from contributors in over 100 countries, the dataset reflects a wide range of construction practices, materials, climates, and regulatory environments. It includes residential, commercial, industrial, and infrastructure projects from both urban and rural areas.
4. High-Quality Imagery: includes a mix of wide-angle site overviews, close-ups of tools and equipment, drone shots, and candid human activity. Resolution varies from standard to ultra-high-definition, supporting both macro and contextual analysis.
5. Popularity Scores: each image is assigned a popularity score based on its performance in GuruShots competitions. These scores provide insight into visual clarity, engagement value, and human interest, which is useful for safety-focused or user-facing AI models.
6. AI-Ready Design: this dataset is structured for training models in real-time object detection (e.g., helmets, machinery), construction progress tracking, material identification, and safety compliance. It’s compatible with standard ML frameworks used in construction tech.
7. Licensing & Compliance: fully compliant with privacy, labor, and workplace imagery regulations. Licensing is transparent and ready for commercial or research deployment.
Use Cases:
1. Training AI for safety compliance monitoring and PPE detection.
2. Powering progress tracking and material usage analysis tools.
3. Supporting site mapping, autonomous machinery, and smart construction platforms.
4. Enhancing augmented reality overlays and digital twin models for construction planning.
This dataset provides a comprehensive, real-world foundation for AI innovation in construction technology, safety, and operational efficiency. Custom datasets are available on request. Contact us to learn more!
This chipped training dataset covers Paris and includes 30 cm high-resolution imagery (.tif format) and corresponding building footprint vector labels (.geojson format) in tile/label pairs of 256 x 256 pixels or smaller. This dataset is a ramp Tier 1 dataset, meaning it has been thoroughly reviewed and improved. It was used in developing the ramp baseline model and contains 1,027 tiles and 3,468 buildings. The original dataset was sourced from the SpaceNet 2 Dataset; the imagery was then tiled down from 650x650 pixel chips, and the labels were revised to be consistent with the ramp datasets' notion of rooftop as the building footprint. Dataset keywords: Urban, Dense.
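The description does not say how the 650x650 chips were cut into tiles. As one possible sketch, a plain grid split with a 256-pixel stride produces tiles of "256 x 256 or smaller", with narrower tiles along the right and bottom edges. The function name is hypothetical, and this is not claimed to be the ramp project's actual procedure.

```python
def tile_windows(width, height, tile=256):
    """Return (x, y, w, h) windows covering a chip; edge tiles may be smaller."""
    windows = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            w = min(tile, width - x)   # clip the last column of tiles
            h = min(tile, height - y)  # clip the last row of tiles
            windows.append((x, y, w, h))
    return windows


# One 650x650 chip splits into a 3x3 grid under this scheme.
windows = tile_windows(650, 650)
for window in windows:
    print(window)
```

Under this scheme each chip yields nine tiles, and the windows cover the chip exactly once with no overlap.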
The lutherwaves/sample-construction-dataset dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
This documentation and dataset can be used to test the performance of automated fault detection and diagnostics algorithms for buildings. The dataset was created by LBNL, PNNL, NREL, ORNL and ASHRAE RP-1312 (Drexel University). It includes data for air-handling units and rooftop units simulated with PNNL's large office building model.
This dataset contains 58,255 images of construction site scenes, including indoor and outdoor scenes. The data includes workers of Asian background and covers multiple devices, lighting conditions, scenes, and collection time periods. Annotations cover rectangular bounding boxes of human bodies, safety helmets, and safety vests. It is suitable for construction site safety monitoring, PPE detection, and worker behavior analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Construction Worker is a dataset for object detection tasks - it contains Worker annotations for 1,782 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://crawlfeeds.com/privacy_policy
Access the Home Depot products dataset, a comprehensive collection of web-scraped data featuring home improvement products. Discover trending tools, hardware, appliances, décor, and gardening essentials to enhance your projects. From power tools and building materials to lighting, furniture, and outdoor living items, this dataset provides insights into top-rated products, best-selling brands, and emerging trends.
Download now to explore detailed product data for smarter decision-making in home improvement, DIY, and construction projects.
For a closer look at the product-level data we’ve extracted from Home Depot, including pricing, stock status, and detailed specifications, visit the Home Depot dataset page. You can explore sample records and submit a request for tailored extracts directly from there.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The partner company’s historical data could be utilized in developing a data-driven prediction model with project division details as its inputs and project division labor-hours as the desired output. The BIM models contain 42 design features and 1559 records, each record denoting a division of fabrication. The BIM design features are listed in Table 1. Labor-hours spent on each division were extracted from job costing databases serving as the output parameter in the regression model. Although the variables in Table 1 are all considered related, there are certain inter-correlations between them and some variables can be explained by others. For instance, material length and weight are highly correlated; by knowing one, the other can be deduced. Therefore, a variable selection technique is instrumental in removing these inter-correlations in an analytical manner. It is noteworthy that the dataset was linearly scaled prior to performing analyses in order not to reveal sensitive information of the partner company without distorting patterns and relationships inherent in the data.
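The text does not name the variable selection technique that was used. As a stand-in illustration only, the sketch below applies a simple greedy correlation filter: a feature is kept only if its absolute Pearson correlation with every feature already kept stays below a threshold. The feature names, values, and the 0.95 threshold are hypothetical, since Table 1 is not reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def drop_correlated(features, threshold=0.95):
    """Greedily keep features whose |r| with every kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical, made-up records: weight tracks length almost linearly.
features = {
    "material_length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "material_weight": [2.1, 3.9, 6.2, 8.0, 9.9],
    "num_welds": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_correlated(features)
print(selected)
```

Pearson correlation is unchanged by positive linear rescaling of a variable, so a filter of this kind gives the same result on linearly scaled data as on the raw values.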
## Overview
Construction Dataset is a dataset for object detection tasks - it contains Construction annotations for 2,010 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset of the images collected by the S.M.A.R.T Construction Research Group at NYUAD from a construction site on campus. The dataset contains the images used for the manuscript titled 'Transfer-learning and texture features for detailed recognition of the conditions of construction materials with small datasets' by Eyob Mengiste, Karunakar Reddy Mannem, Samuel A. Prieto and Borja García de Soto. For any inquiries, contact the corresponding author (eyob.mengiste@nyu.edu).
This database contains a total of 208 images for 7 construction material conditions, broken down as follows: CMU wall - 24 images, Chiseled concrete - 49 images, Concrete - 18 images, Gypsum - 26 images, Mesh - 25 images, First coat plaster - 37 images, Second coat plaster - 29 images.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This open-access dataset provides a detailed time-motion study of construction work, specifically focusing on MEP (Mechanical, Electrical, and Plumbing) activities. The dataset is intended to facilitate research and analysis to improve operational efficiency and safety within the construction industry. It includes anonymized and pseudonymized data, ensuring privacy while still offering valuable insights into worker activities.
Contents: (1) Time-motion study dataset: Captures categorized work activities by MEP workers at a second-to-second level. (2) Description of work activities: Provides detailed classifications of the tasks performed, allowing for in-depth analysis.
This dataset has been made publicly available under the CC-BY-SA license, encouraging reuse and redistribution with proper attribution and share-alike terms. By downloading the dataset, users acknowledge and agree to comply with the terms outlined above.
Funding and Support: This work has been supported by the “Hukka LVI- ja sähkötöissä” (Waste in Plumbing and Electrical Work) project, funded by STUL (Electrical Contractor Association), LVI-TU (HVAC Contractor Association), and STTA (Electrical Employers Union) from Finland.
This comprehensive dataset offers valuable resources for research and analysis purposes. For further information or collaboration inquiries, feel free to reach out to discuss data collection methods and potential research partnerships: olli.seppanen@aalto.fi & christopher.gorsch@vtt.fi.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CWPV (Construction Work Posture Video) dataset encompasses recordings of twenty-one subjects performing eight awkward motions of construction tasks, amounting to a total of 504 video sets (21 subjects * 8 tasks * 3 repetitions) and 1080 IMU data sets (9 subjects * 8 tasks * 3 repetitions * 5 IMUs). Compared to existing datasets, CWPV offers continuous posture images from four different camera positions for the same motion, along with corresponding inertial measurement unit (IMU) data. This facilitates interdisciplinary research. Additionally, we developed a computer vision-based algorithm for evaluating work-related musculoskeletal disorders (WMSDs) using this dataset. The results demonstrate that this dataset is beneficial for conducting vision-based ergonomic risk assessments. For detailed instructions on using the CWPV dataset, please refer to the "README.pdf" file in the attachment.
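The set counts quoted above follow directly from the stated factors. A short check, with hypothetical variable names, might look like this:

```python
from math import prod

# Factors as stated in the dataset description.
video_factors = {"subjects": 21, "tasks": 8, "repetitions": 3}
imu_factors = {"subjects": 9, "tasks": 8, "repetitions": 3, "imus": 5}

video_sets = prod(video_factors.values())
imu_sets = prod(imu_factors.values())
print(video_sets, imu_sets)
```

Note that only 9 of the 21 subjects contribute IMU data, so IMU recordings exist for a subset of the video sets rather than for all of them.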
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is the building dataset with a total rooftop area of 23.6 billion square meters in 3,667 natural cities in China, including the attribute of building rooftop, height, structure, function, age, style and quality, as well as the code files used to calculate these data. The deep learning models used are OCRNet, XGBoost, fine-tuned CLIP and Yolo-v8. Please refer to the paper and README file for details of specific parameters. This building data is the original version, and the processed version can be viewed here: 10.6084/m9.figshare.27992417. Related papers are published in Scientific Data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rapidly acquiring three-dimensional (3D) building data, including geometric attributes like rooftop, height and orientations, as well as indicative attributes like function, quality, and age, is essential for accurate urban analysis, simulations, and policy updates. Current building datasets suffer from incomplete coverage of building multi-attributes. This paper presents the first national-scale Multi-Attribute Building dataset (CMAB) with artificial intelligence, covering 3,667 spatial cities, 31 million buildings, and 23.6 billion m² of rooftops with an F1-Score of 89.93% in OCRNet-based extraction, totaling 363 billion m³ of building stock. We trained bootstrap aggregated XGBoost models with city administrative classifications, incorporating morphology, location, and function features. Using multi-source data, including billions of remote sensing images and 60 million street view images (SVIs), we generated rooftop, height, structure, function, style, age, and quality attributes for each building with machine learning and large multimodal models. Accuracy was validated through model benchmarks, existing similar products, and manual SVI validation, mostly above 80%. Our dataset and results are crucial for global SDGs and urban planning.
Data records: A building dataset with a total rooftop area of 23.6 billion square meters in 3,667 natural cities in China, including the attribute of building rooftop, height, structure, function, age, style and quality, as well as the code files used to calculate these data. The deep learning models used are OCRNet, XGBoost, fine-tuned CLIP and Yolo-v8.
Supplementary note: The architectural structure, style, and quality are affected by the temporal and spatial distribution of street views in China. Regarding the recognition of building colors, we found that existing CLIP-series models cannot accurately judge the composition and proportion of building colors; these will instead be calculated and supplemented using semantic segmentation and image processing. Please contact zhangyec23@mails.tsinghua.edu.cn or ylong@tsinghua.edu.cn if you have any technical problems.
Reference Format: Zhang, Y., Zhao, H. & Long, Y. CMAB: A Multi-Attribute Building Dataset of China. Sci Data 12, 430 (2025). https://doi.org/10.1038/s41597-025-04730-5.
Access 4.7M+ high-precision building footprints across the United Kingdom, enabling advanced mapping, location analysis, and strategic decision-making. With 30+ years of data expertise, we provide clean, validated, and enriched datasets to power businesses worldwide.
Our use cases demonstrate how our data has been beneficial and helped our customers in several key areas:
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This report brings together under one cover a wide range of statistics that are currently available on the construction industry. It gives a broad perspective of statistical trends in the construction industry in Great Britain through the last decade together with some international comparisons and features on leading initiatives that may influence the future.
Source agency: Office for National Statistics
Designation: National Statistics
Language: English
Alternative title: Construction Statistics
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Building Performance Database (BPD) is the largest publicly-available source of measured energy performance data for buildings in the United States. It contains information about the building's energy use, location, and physical and operational characteristics. The BPD can be used by building owners, operators, architects and engineers to compare a building's energy performance against customized peer groups, identify energy performance opportunities, and set energy performance. It can also be used by energy performance program implementers to analyze energy performance features and trends in the building stock. The BPD compiles data from various data sources, converts it into a standard format, cleanses and quality checks the data, and provides users with access to the data in a way that maintains anonymity for data providers.
The BPD consists of the database itself, a graphical user interface allowing exploration of the data, and an application programming interface allowing the development of third-party applications using the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
All Of Construction 1_ is a dataset for object detection tasks - it contains Person annotations for 1,548 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).