U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
The Bing Maps team at Microsoft released a U.S.-wide vector building dataset in 2018, which includes over 125 million building footprints for all 50 states in GeoJSON format. This dataset is extracted from aerial images using deep learning object classification methods. Large-extent modelling (e.g., urban morphological analysis or ecosystem assessment models) or accuracy assessment with vector layers is highly challenging in practice. Although vector layers provide accurate geometries, their use in large-extent geospatial analysis comes at a high computational cost. We used High Performance Computing (HPC) to develop an algorithm that calculates six summary values for each cell in a raster representation of each U.S. state: (1) total footprint coverage, (2) number of unique buildings intersecting each cell, (3) number of building centroids falling inside each cell, and the (4) average, (5) smallest, and (6) largest area of the buildings that intersect each cell. These values a ...
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Microsoft recently released a free set of deep-learning-generated building footprints covering the United States of America. In support of this work, and to make these building footprints available to the ArcGIS community, Esri has consolidated the buildings into a single layer and shared them in ArcGIS Online. The footprints can be used for visualization in vector tile format or, as a hosted feature layer, for analysis. Learn more about the Microsoft project at the Announcement Blog; the raw data is available on GitHub.
Computer-generated building footprints for Maryland. The methodology for the generation of the building footprints can be found at: https://github.com/Microsoft/USBuildingFootprints. These building footprints should be used as a reference only; the geometries are not considered accurate enough to provide detailed estimates related to their location, area, or associated attributes.
Bing Maps is releasing open building footprints around the world. We have detected 1.3B buildings from Bing Maps imagery between 2014 and 2024 including Maxar, Airbus, and IGN France imagery. The data is freely available for download and use under ODbL.
Source: https://github.com/microsoft/GlobalMLBuildingFootprints
File Geodatabase for download
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
From: MS Buildings
Microsoft recently released a free set of deep-learning-generated building footprints covering the United States of America. In support of this work, and to make these building footprints available to the ArcGIS community, Esri has consolidated the buildings into a single layer and shared them in ArcGIS Online. The footprints can be used for visualization in vector tile format or, as a hosted feature layer, for analysis. Learn more about the Microsoft project at the Announcement Blog; the raw data is available on GitHub.
NYS Building Footprints - metadata info:
Building footprints in New York State are from four different sources: Microsoft, Open Data, New York State Energy Research and Development Authority (NYSERDA), and Geospatial Services. The majority of the footprints are from NYSERDA, except in NYC, where the primary source was Open Data. Microsoft footprints were added where the other two sources were missing polygons.
Field Descriptions:
NYSGeo Source: Tells the end user whether the source is NYSERDA, Microsoft, or NYC Open Data; the list could expand in the future.
County Name: County name populated from CIESINs. If not populated from CIESINs, identified by the GS.
Municipality Name: Municipality name populated from CIESINs. If not populated from CIESINs, identified by the GS.
Source: Source where the data came from. If NYSGeo Source = NYSERDA, the data would typically list orthoimagery, LIDAR, county data, etc.
Source ID: If NYSGeo Source = NYSERDA, Source ID would typically list an orthoimage or LIDAR tile.
Source Date: Date the footprint was created. If the source image was from 2016 orthoimagery, 2016 would be the Source Date.
Description of each footprint source:
NYSERDA: Building footprints that were created as part of the New York State Flood Impact Decision Support Systems (https://fidss.ciesin.columbia.edu/home). Footprints vary in age from county to county.
Microsoft: Building footprints released 6/28/2018; vintage unknown/varies. More info on this dataset can be found at https://blogs.bing.com/maps/2018-06/microsoft-releases-125-million-building-footprints-in-the-us-as-open-data.
NYC Open Data: Building footprints of New York City as a polygon feature class (footprint outlines of buildings in New York City). Last updated 7/30/2018, downloaded on 8/6/2018. Please see the following link for additional documentation: https://github.com/CityOfNewYork/nyc-geo-metadata/blob/master/Metadata/Metadata_BuildingFootprints.md
Spatial Reference of Source Data: UTM Zone 18, meters, NAD 83.
Spatial Reference of Web Service: WGS 1984 Web Mercator Auxiliary Sphere.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Microsoft released a U.S.-wide vector building dataset in 2018. Although the vector building layers provide relatively accurate geometries, their use in large-extent geospatial analysis comes at a high computational cost. We used High-Performance Computing (HPC) to develop an algorithm that calculates six summary values for each cell in a raster representation of each U.S. state, excluding Alaska and Hawaii: (1) total footprint coverage, (2) number of unique buildings intersecting each cell, (3) number of building centroids falling inside each cell, and the (4) average, (5) smallest, and (6) largest area of the buildings that intersect each cell. These values are represented as raster layers with 30 m cell size covering the 48 conterminous states. We also identify errors in the original building dataset. We evaluate precision and recall in the data for three large U.S. urban areas. Precision is high and comparable to results reported by Microsoft, while recall is high for buildings with footprints larger than 200 m² but lower for progressively smaller buildings.
Building footprints are a critical environmental descriptor. Microsoft produced a U.S.-wide vector building dataset in 2018 [1] that was generated from aerial images available to Bing Maps using deep learning methods for object classification [2]. The main goal of this product has been to increase the coverage of building footprints available for OpenStreetMap. Microsoft identified building footprints in two phases: first, using semantic segmentation with Deep Neural Networks to identify building pixels in aerial imagery, and second, converting building pixel blobs into polygons. The final dataset includes 125,192,184 building footprint polygon geometries in GeoJSON vector format, covering all 50 U.S. states, with data for each state distributed separately. These data have 99.3% precision and 93.5% pixel recall accuracy [2]. The temporal resolution of the data (i.e., the years of the aerial imagery used to derive the data) is not provided by Microsoft in the metadata.
Using vector layers for large-extent (i.e., national or state-level) spatial analysis and modelling (e.g., mapping the Wildland-Urban Interface, flood and coastal hazards, or large-extent urban typology modelling) is challenging in practice. Although vector data provide accurate geometries, incorporating them in large-extent spatial analysis comes at a high computational cost. We used High Performance Computing (HPC) to develop an algorithm that calculates six summary statistics (described below) for buildings at 30-m cell size in the 48 conterminous U.S. states, to better support national-scale and multi-state modelling that requires building footprint data. To develop these six derived products from the Microsoft buildings dataset, we created an algorithm that took every single building, built a small meshgrid (a 2D array) for the bounding box of the building, and calculated unique values for each cell of the meshgrid. This grid structure is aligned with National Land Cover Database (NLCD) products (projected using the Albers Equal Area Conic system), enabling researchers to combine or compare our products with standard national-scale datasets such as land cover, tree canopy cover, and urban imperviousness [3].
Locations, shapes, and distribution patterns of structures in urban and rural areas are the subject of many studies. Buildings represent the density of built-up areas as an indicator of urban morphology or the spatial structure of cities and metropolitan areas [4,5]. In local studies, the use of vector data types is easier [6,7]. However, in regional and national studies a raster dataset is preferable. For example, in measuring the spatial structure of metropolitan areas, a rasterized building layer would be more useful than the original vector datasets [8].
Our output raster products are: (1) total building footprint coverage per cell (m² of building footprint per 900 m² cell); (2) number of buildings that intersect each cell; (3) number of building centroids falling within each cell; (4) area of the largest building intersecting each cell (m²); (5) area of the smallest building intersecting each cell (m²); and (6) average area of all buildings intersecting each cell (m²). The last three area metrics include building area that falls outside the cell but where part of the building intersects the cell (Fig. 1). These values can be used to describe the intensity and typology of the built environment.
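The six per-cell values listed above can be illustrated with a minimal Python sketch. This is not the authors' HPC code: it assumes footprints are axis-aligned rectangles in projected metres (so overlap areas are exact), and the function name `summarize` and the output keys are hypothetical. It follows the same idea of visiting only the cells under each building's bounding box.

```python
import math
from collections import defaultdict

CELL = 30.0  # cell size in metres, as stated in the text


def summarize(buildings, cell=CELL):
    """Six per-cell summaries for rectangular footprints.

    buildings: list of (xmin, ymin, xmax, ymax) in projected metres.
    ASSUMPTION: axis-aligned rectangles stand in for real polygons.
    Returns {(col, row): {...six values...}}.
    """
    cells = defaultdict(
        lambda: {"coverage": 0.0, "count": 0, "centroids": 0, "areas": []}
    )
    for xmin, ymin, xmax, ymax in buildings:
        area = (xmax - xmin) * (ymax - ymin)  # whole-building area
        # Visit only the cells under this building's bounding box.
        for col in range(math.floor(xmin / cell), math.ceil(xmax / cell)):
            for row in range(math.floor(ymin / cell), math.ceil(ymax / cell)):
                w = min(xmax, (col + 1) * cell) - max(xmin, col * cell)
                h = min(ymax, (row + 1) * cell) - max(ymin, row * cell)
                if w <= 0 or h <= 0:
                    continue  # touches the cell edge only
                c = cells[(col, row)]
                c["coverage"] += w * h      # (1) footprint area inside the cell
                c["count"] += 1             # (2) buildings intersecting the cell
                c["areas"].append(area)     # basis for (4), (5), (6)
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        key = (math.floor(cx / cell), math.floor(cy / cell))
        cells[key]["centroids"] += 1        # (3) centroids inside the cell
    out = {}
    for key, c in cells.items():
        a = c["areas"]
        out[key] = {
            "coverage_m2": c["coverage"],
            "n_buildings": c["count"],
            "n_centroids": c["centroids"],
            "largest_m2": max(a) if a else 0.0,
            "smallest_m2": min(a) if a else 0.0,
            "mean_m2": sum(a) / len(a) if a else 0.0,
        }
    return out


# Two buildings: one inside cell (0, 0), one straddling cells (0, 0) and (1, 0).
demo = summarize([(10, 10, 20, 20), (25, 5, 35, 25)])
```

Note that the largest, smallest, and mean areas use each building's full area, including the part outside the cell, which matches the description of the last three metrics.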
Our software is available through U.S. Geological Survey code r...
Microsoft Building Footprints with Heights, from service: https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/MS_Buildings_Training_Data_with_Heights/FeatureServer (restrictions, do not use)
Source: Approx. 9.8 million building footprints for portions of metro areas in 44 US States in Shapefile format.
Microsoft recently released a free set of deep-learning-generated building footprints covering the United States of America. As part of that project, Microsoft shared 8 million digitized building footprints with height information used for training the deep learning algorithm. This map layer includes all buildings with height information from the original training set, which can be used in Scene Viewer and ArcGIS Pro to create simple 3D representations of buildings. Learn more about the Microsoft project at the Announcement Blog; the raw data is available on GitHub.
Click to see Microsoft Building Layers in ArcGIS Online.
Digitized building footprints by state and city:
Alabama: Greater Phoenix City, Mobile, and Montgomery
Arizona: Tucson
Arkansas: Little Rock, with 5 buildings just across the river from Memphis
California: Bakersfield, Fresno, Modesto, Santa Barbara, Sacramento, Stockton, Calaveras County, San Francisco and the Bay Area south to San Jose and north to Cloverdale
Colorado: Interior of Denver
Connecticut: Enfield and Windsor Locks
Delaware: Dover
Florida: Tampa, Clearwater, St. Petersburg, Orlando, Daytona Beach, Jacksonville, and Gainesville
Georgia: Columbus, Atlanta, and Augusta
Illinois: East St. Louis, downtown area, Springfield, Champaign, and Urbana
Indiana: Indianapolis downtown and Jeffersonville downtown
Iowa: Des Moines
Kansas: Topeka
Kentucky: Louisville downtown, Covington, and Newport
Louisiana: Shreveport, Baton Rouge, and center of New Orleans
Maine: Augusta and Portland
Maryland: Baltimore
Massachusetts: Boston, South Attleboro, commercial area in Seekonk, and Springfield
Michigan: Downtown Detroit
Minnesota: Downtown Minneapolis
Mississippi: Biloxi and Gulfport
Missouri: Downtown St. Louis, Jefferson City, and Springfield
Nebraska: Lincoln
Nevada: Carson City, Reno, and Las Vegas
New Hampshire: Concord
New Jersey: Camden and downtown Jersey City
New Mexico: Albuquerque and Santa Fe
New York: Syracuse and Manhattan
North Carolina: Greensboro, Durham, and Raleigh
North Dakota: Bismarck
Ohio: Downtown Cleveland, downtown Cincinnati, and downtown Columbus
Oklahoma: Downtown Tulsa and downtown Oklahoma City
Oregon: Portland
Pennsylvania: Downtown Pittsburgh, Harrisburg, and Philadelphia
Rhode Island: The greater Providence area
South Carolina: Greenville, downtown Augusta, greater Columbia area, and greater Charleston area
South Dakota: Greater Pierre area
Tennessee: Memphis and Nashville
Texas: Lubbock, Longview, part of Fort Worth, Austin, downtown Houston, and Corpus Christi
Utah: Salt Lake City downtown
Virginia: Richmond
Washington: Greater Seattle area, to Tacoma to the south and Marysville to the north
Wisconsin: Green Bay, downtown Milwaukee, and Madison
Wyoming: Cheyenne
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
125 million deep-learning-generated building footprints for the USA, produced by Microsoft.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The growth of the Wildland-Urban Interface (WUI) underscores the need for accurate mapping to support effective wildfire risk management. One major obstacle is the lack of comprehensive national building footprint databases. Our study addresses that gap by developing a semi-automated, multi-criteria filtering framework aimed at enhancing the quality of global open-source building datasets, with a focus on Microsoft’s Global Building Footprints (MSB), applied to mainland Portugal. The proposed method incorporates regional adaptability and spatial analysis techniques—such as area-based thresholds and proximity criteria—using Portugal’s official Geographic Buildings Location Database (BGE) as a benchmark. To better represent residential structures, the framework systematically removes non-residential anomalies (e.g., industrial complexes, solar farms, transmission lines) through dynamically calibrated thresholds at multiple administrative levels, including municipalities and NUTS-2 regions. As a result, the filtering process reduced the original dataset from approximately 5.6 million to 3.0 million building footprints. The original and filtered datasets are available here.
Polygons of building footprints clipped to Broward County. This is a Microsoft product.
The original dataset contains 125,192,184 computer-generated building footprints in all 50 US states. This data is freely available for download and use.
The data set was clipped to the Broward County developed boundary.
Additional information: https://github.com/microsoft/USBuildingFootprints/blob/master/README.md
Introduction
Bing Maps is releasing country-wide open building footprints datasets in Australia. This dataset contains 11,334,866 computer-generated building footprints derived using Bing Maps algorithms on satellite imagery. Satellite imagery used for extraction is from our imagery partners, Maxar Technologies among others. The data is freely available for download and use under the applicable license.
License
This data is licensed by Microsoft under the Open Data Commons Open Database License (ODbL).
FAQ
What does the data include?
11,334,866 building footprint polygon geometries in Australia in GeoJSON format. You may download the data in GeoJSON format here:
Country: Australia
Number of Buildings: 11,334,866
Zipped MB: 845
Unzipped MB: 6,410
What is the GeoJSON format?
GeoJSON is a format for encoding a variety of geographic data structures. For intensive documentation and tutorials, refer to GeoJson blog.
Why is the data being released?
Microsoft has a continued interest in supporting a thriving OpenStreetMap ecosystem.
Should we import the data into OpenStreetMap?
Maybe. Never overwrite the hard work of other contributors or blindly import data into OSM without first checking the local quality. While our metrics show that this data meets or exceeds the quality of hand-drawn building footprints, the data does vary in quality from place to place, between rural and urban, mountains and plains, and so on. Inspect quality locally and discuss an import plan with the community. Always follow the OSM import community guidelines.
Will the data be used or made available in the larger OpenStreetMap ecosystem?
Yes. Currently, the Microsoft Open Buildings dataset is used in ml-enabler for task creation. You can try it out at AI assisted Tasking Manager. The data will also be made available in Facebook RapiD.
What is the creation process for this data?
The building extraction is done in two stages:
Semantic Segmentation – Recognizing building pixels on the aerial image using DNNs
Polygonization – Converting building pixel blobs into polygons
Stage 1: Semantic Segmentation
Stage 2: Polygonization
Are there any technical improvements in this round compared with previous ones?
To train models for Australia we only had a few thousand building labels, which made it hard to rely only on supervised training. Typically we have used hundreds of thousands or, in the best case, tens of millions of building labels for training. In order to create a good and robust model for Australia, we took advantage of self-supervised training and unsupervised domain adaptation techniques to leverage our training data from other countries and domains. We believe this is a good proof of concept for scaling building extraction to the whole world.
Evaluation set metrics
The Australia evaluation set contains 6,785 buildings from several diverse and representative regions.
Building match metrics on the evaluation set:
Precision: 98.59%
Recall: 64.95%
We track the following metrics to measure the quality of matched buildings in the evaluation set:
Intersection over Union – This is a standard metric measuring the overlap quality against the labels
Shape distance – With this metric we measure the polygon outline similarity
Dominant angle rotation error – This measures the polygon rotation deviation
IoU: 0.79
Shape distance: 0.44
Rotation error [deg]: 4.46
False positive ratio in the corpus
We estimate a ~1% false positive ratio in 1000 randomly sampled buildings from the entire output corpus.
Evaluation recall error space
Correctly detecting connected buildings and small buildings is sometimes difficult, even for a human labeller. There are often ambiguities in whether one is looking at multiple connected buildings or a single fragmented building. Similarly, it is sometimes hard to judge whether a small object should be classified as a building or not.
Output precision and recall metrics are calculated after optimal 1-to-1 matching between output polygons and labels, scored by polygon intersection over union. The labels are usually very granular, whilst it is sometimes very hard for the DNN model to separate connected buildings. This results in a significant ratio of unmatched false negatives, which push the recall down.
Error category (share of the 35.05% recall gap):
Very small buildings: 15.4%
Connected buildings: 14.0%
DNN: 2.8%
Various: 2.1%
Polygonization: 0.7%
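To make the matching step concrete, here is a small Python sketch of how precision and recall fall out of a one-to-one match scored by IoU. Several things are assumptions for illustration only: footprints are axis-aligned boxes rather than polygons, the matching is greedy (a simpler stand-in for the optimal matching mentioned above), and the 0.5 IoU threshold and the function names are hypothetical.

```python
def box_area(b):
    """Area of an axis-aligned box (xmin, ymin, xmax, ymax)."""
    return (b[2] - b[0]) * (b[3] - b[1])


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = min(a[2], b[2]) - max(a[0], b[0])
    iy = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(ix, 0.0) * max(iy, 0.0)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def match_metrics(predictions, labels, threshold=0.5):
    """Greedy one-to-one matching by descending IoU.

    ASSUMPTION: greedy matching and a 0.5 threshold are illustrative
    choices, not the procedure documented for the dataset.
    Returns (precision, recall).
    """
    pairs = sorted(
        ((iou(p, l), i, j)
         for i, p in enumerate(predictions)
         for j, l in enumerate(labels)),
        reverse=True,
    )
    used_pred, used_label = set(), set()
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little to count
        if i in used_pred or j in used_label:
            continue  # each polygon can be matched at most once
        used_pred.add(i)
        used_label.add(j)
    tp = len(used_pred)
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(labels) if labels else 0.0
    return precision, recall


# One prediction merges two connected labelled buildings; a small
# labelled building is missed entirely.
preds = [(0, 0, 20, 10), (40, 0, 50, 10)]
labels = [(0, 0, 12, 10), (12, 0, 20, 10), (30, 0, 33, 3), (40, 0, 50, 10)]
precision, recall = match_metrics(preds, labels)
```

In this toy case every prediction is matched, so precision stays at 1.0, while the merged neighbour and the missed small building leave two labels unmatched and pull recall down to 0.5. That is the same pattern the error table attributes to connected and very small buildings.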
Small building example:
Connected buildings example:
What is the vintage of this data?
The vintage of the extracted building footprints depends on the vintage of the underlying imagery. Bing Imagery is a composite of multiple sources; therefore it is difficult to know the exact dates for individual pieces of data. However, we believe the vintage is anywhere from 2013 to 2018, with the majority being from 2018.
How good is the data?
Our metrics show that in the vast majority of cases the quali...
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Microsoft Maps is releasing country-wide open building footprints datasets in the United States. This dataset contains 129,591,852 computer-generated building footprints derived using our computer vision algorithms on satellite imagery. This data is freely available for download and use.
This data set is a conversion of the California building footprint file from GeoJSON format to shapefile format. The California building footprint file, which contains 10,988,525 computer-generated building footprints in the state of California, is extracted from the US building footprint dataset by Microsoft (2018).
SGID10.LOCATION.Buildings was derived from building footprints generated by Microsoft for all 50 states: https://github.com/Microsoft/USBuildingFootprints. In some cases the pixel prediction algorithm used by Microsoft identified and created building footprints where no buildings existed. To flag potential errors, building footprints within 750 meters of known populated areas (SGID10.DEMOGRAPHIC.PopBlockAreas2010_Approx) and within 500 meters of an address point (SGID10.LOCATION.AddressPoints) were selected and identified as likely structures; footprints falling outside these areas were identified as possible buildings in the 'TYPE' field. In addition, attributes were added for address, city, county, and zip where possible.
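The flagging rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: coordinates are planar metres, populated areas are reduced to representative points (the source layer holds polygons), and the function name and the returned label strings are hypothetical rather than the actual values stored in the 'TYPE' field.

```python
import math


def classify_footprint(xy, populated_pts, address_pts,
                       pop_dist=750.0, addr_dist=500.0):
    """Label a footprint by proximity to populated areas and addresses.

    xy, populated_pts, address_pts: (x, y) tuples in projected metres.
    ASSUMPTION: points stand in for populated-area polygons.
    """
    near_pop = any(math.dist(xy, p) <= pop_dist for p in populated_pts)
    near_addr = any(math.dist(xy, a) <= addr_dist for a in address_pts)
    # Both conditions must hold for a footprint to count as likely.
    if near_pop and near_addr:
        return "likely structure"
    return "possible building"


# A footprint 600 m from a populated point and 400 m from an address.
label = classify_footprint((0, 0), [(600, 0)], [(0, 400)])
```

A real workflow would more likely use a spatial index or buffered polygons than a pairwise distance scan, which becomes slow for millions of footprints.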
Representative, computer-generated building footprints for Rhode Island. Originally developed by Microsoft, these data were released by Microsoft as open source data in June 2018. The source date for these data is unknown; please see the metadata for details.
Original Microsoft announcement regarding availability of these data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 3D Global Building Footprints (3D-GloBFP) dataset is the first global-scale building height dataset at the individual building footprint level for the year 2020, generated through the integration of multisource Earth Observation (EO) data and the extreme gradient boosting (XGBoost) model. The reliability and accuracy of 3D-GloBFP have been validated across 33 subregions, achieving R² values ranging from 0.66 to 0.96 and root-mean-square errors (RMSEs) between 1.9 m and 14.6 m.
This version supplements building footprints and height attributes for some countries in South America, Asia, Africa, and Europe, based on building footprints provided by Microsoft (https://github.com/microsoft/GlobalMLBuildingFootprints), Open Street Map (https://osmbuildings.org/), Google-Microsoft Open Buildings - combined by VIDA (https://source.coop/repositories/vida/google-microsoft-open-buildings), and EUBUCCO (https://eubucco.com/).
The dataset is divided into spatial grid-based tiles, each stored as an individual ShapeFile (.shp) containing estimated building heights (in meters) in attribute tables. See world_grid.shp and readme.txt for details on the spatial grid and file naming.
Data download links are provided in data_links.txt.
This chipped training dataset is over Mzuzu and includes high-resolution imagery (.tif format) and corresponding building footprint vector labels (.geojson format) in 256 x 256 pixel tile/label pairs. This dataset is a ramp Tier 1 dataset, meaning it has been thoroughly reviewed and improved. This dataset was used in developing the ramp baseline model and contains 3,357 tiles and 91,391 individual buildings. The satellite imagery resolution is 45 cm and was sourced from Maxar ODP (10500100195A6700). Dataset keywords: Urban, Peri-Urban, Dense.
Building Outlines provided by Microsoft, clipped to the County. Full dataset available at: https://github.com/Microsoft/USBuildingFootprints
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
In this project, I focus on enhancing global building data by combining multiple open-source geospatial datasets to predict building attributes, specifically the number of levels (floors). The core datasets used are the Microsoft Open Buildings dataset, which provides detailed building footprints across many regions, and Google’s Temporal Buildings Dataset (V1), which includes estimated building heights over time derived from satellite imagery. While Google's dataset includes height information for many buildings, a significant portion contains missing or unreliable values.
To address this, I first performed data preprocessing and merged the two datasets based on geographic coordinates. For buildings with missing height values, I used LightGBM, a gradient boosting framework, to impute missing heights using features like footprint area, geometry, and surrounding context. I then brought in OpenStreetMap (OSM) data to enrich the dataset with additional contextual information, such as building type, land use, and nearby infrastructure.
Using the combined dataset — now with both original and imputed heights — I trained a Random Forest Regressor to predict the number of building levels. Since floor count is not always directly available, especially in developing regions, this approach offers a way to estimate it from height and footprint data with relatively high accuracy.
This kind of modeling has important real-world applications. Predicting building levels can help support urban planning, disaster response, infrastructure development, and climate risk modeling. For example, knowing the number of floors in buildings allows for better estimation of population density, potential occupancy, or structural vulnerability in earthquake-prone or flood-prone regions. It can also help fill gaps in existing GIS data where traditional surveys are too expensive or time-consuming.
In future work, this framework could be extended globally and refined with additional data sources like LIDAR or census information to further improve the accuracy and coverage of building-level models.