U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Extracting useful and accurate information from scanned geologic and other earth science maps is a time-consuming and laborious process involving manual human effort. To address this limitation, the USGS partnered with the Defense Advanced Research Projects Agency (DARPA) to run the AI for Critical Mineral Assessment Competition, soliciting innovative solutions for automatically georeferencing and extracting features from maps. The competition opened for registration in August 2022 and concluded in December 2022. Training and validation data from the map georeferencing challenge are provided here, as well as competition details and a baseline solution. The data were derived from published sources and are provided to the public to support continued development of automated georeferencing and feature extraction tools. References for all maps are included with the data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided here are part of a Galaxy Training Network tutorial that demonstrates mapping-by-sequencing analysis and represent a subsample of the data used in Sun & Schneeberger, 2015 (DOI:10.1007/978-1-4939-2444-8_19).
This training, developed by UNEP, covers the basics of Google Earth Pro, including how to search for locations and create data. Google Earth Pro is a useful tool for participatory mapping processes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LAWRENCE — Fall break is barely behind us, but a group of University of Kansas students has just finished an innovative eight-week course in using drones to develop aerial maps. Over the past two months, they’ve visited sites in KU's West District and at the Baker Wetlands, taking still images and videos over those areas. “The drone mapping course has been excellent in providing a hands-on experience with the drones,” said Siddharth Shankar, graduate student from Lucknow, India. “The course has focused not just on drones and how to fly them but also has made us aware of the FAA rules and regulations about drone flying and safety precautions. “My research has been in glaciology, with the study of icebergs in Greenland. The drone mapping course has provided new insights into incorporating it with my research in the near future.” The course, offered annually during the fall semester, is designed to teach students about the rapidly growing technology of small unmanned aerial systems, referred to as drones, and its wide-ranging applications — which include search-and-rescue, real estate and environmental monitoring. Students in the course come from a variety of disciplines including geography & atmospheric science, geology, ecology & evolutionary biology and civil engineering. Enthusiasm for the course has been very high, and it has filled rapidly each time it has been offered.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
These data were compiled for the use of training natural feature machine learning (GeoAI) detection and delineation. The natural feature classes include the Geographic Names Information System (GNIS) feature types Basins, Bays, Bends, Craters, Gaps, Guts, Islands, Lakes, Ridges and Valleys, and are an areal representation of those GNIS point features. Features were produced using heads-up digitizing from 2018 to 2019 by Dr. Sam Arundel's team at the U.S. Geological Survey, Center of Excellence for Geospatial Information Science, Rolla, Missouri, USA, and Dr. Wenwen Li's team in the School of Geographical Sciences at Arizona State University, Tempe, Arizona, USA.
This dataset presents the data underlying the interactive map of all training courses accessible via Parcoursup in 2020, 2021, 2022 and 2023 (‘https://dossier.parcoursup.fr/Candidat/carte’). The 2024 data will be added progressively until 17 January. This dataset is updated daily.
https://spdx.org/licenses/CC0-1.0.html
For the purposes of training AI-based models to identify (map) road features in rural/remote tropical regions on the basis of true-colour satellite imagery, and subsequently testing the accuracy of these AI-derived road maps, we produced a dataset of 8904 satellite image ‘tiles’ and their corresponding known road features across Equatorial Asia (Indonesia, Malaysia, Papua New Guinea).
METHODS
The main dataset shared here was derived from a set of 200 input satellite images, also provided here. These 200 images are effectively ‘screenshots’ (i.e., reduced-resolution copies) of high-resolution true-colour satellite imagery (~0.5-1m pixel resolution) observed using the Elvis Elevation and Depth spatial data portal (https://elevation.fsdf.org.au/), which here is functionally equivalent to the more familiar Google Earth. Each of these original images was initially acquired at a resolution of 1920x886 pixels. Actual image resolution was coarser than the native high-resolution imagery. Visual inspection of these 200 images suggests a pixel resolution of ~5 meters, given the number of pixels required to span features of familiar scale, such as roads and roofs, as well as the ready discrimination of specific land uses, vegetation types, etc. These 200 images generally spanned either forest-agricultural mosaics or intact forest landscapes with limited human intervention. Sloan et al. (2023) present a map indicating the various areas of Equatorial Asia from which these images were sourced.
IMAGE NAMING CONVENTION
A common naming convention applies to satellite images’ file names:
XX##.png
where:
XX – denotes the geographical region / major island of Equatorial Asia of the image, as follows: ‘bo’ (Borneo), ‘su’ (Sumatra), ‘sl’ (Sulawesi), ‘pn’ (Papua New Guinea), ‘jv’ (Java), ‘ng’ (New Guinea [i.e., Papua and West Papua provinces of Indonesia])
INTERPRETING ROAD FEATURES IN THE IMAGES
Road features in each of the 200 input satellite images were visually interpreted and manually digitized to create a reference image dataset by which to train, validate, and test AI road-mapping models, as detailed in Sloan et al. (2023). The reference dataset of road features was digitized using the ‘pen tool’ in Adobe Photoshop. The pen’s ‘width’ was held constant over varying scales of observation (i.e., image ‘zoom’) during digitization. Consequently, at relatively small scales at least, digitized road features likely incorporate vegetation immediately bordering roads. The resultant binary (Road / Not Road) reference images were saved as PNG images with the same image dimensions as the original 200 images.
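As a rough illustration of how such a reference image can be consumed downstream, the short Python sketch below loads one of these PNG files and recovers the binary Road / Not Road mask; the file name and the greyscale threshold are illustrative assumptions, not part of the dataset documentation.

    # Minimal sketch: load a road-reference PNG and recover the binary Road / Not Road mask.
    # The file name and the 128 threshold are illustrative assumptions.
    import numpy as np
    from PIL import Image

    ref = Image.open("bo12_reference.png").convert("L")   # greyscale copy of the reference image
    mask = np.asarray(ref) > 128                          # True = Road, False = Not Road
    print(mask.shape, mask.mean())                        # image dimensions and fraction of road pixels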
IMAGE TILES AND REFERENCE DATA FOR MODEL DEVELOPMENT
The 200 satellite images and the corresponding 200 road-reference images were both subdivided (aka ‘sliced’) into thousands of smaller image ‘tiles’ of 256x256 pixels each. Subsequent to image subdivision, subdivided images were also rotated by 90, 180, or 270 degrees to create additional, complementary image tiles for model development. In total, 8904 image tiles resulted from image subdivision and rotation. These 8904 image tiles are the main data of interest disseminated here. Each image tile entails the true-colour satellite image (256x256 pixels) and a corresponding binary road reference image (Road / Not Road).
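The subdivision and rotation steps could be sketched roughly as follows; the tile naming, coordinate order, and rotation choices shown here are assumptions for illustration rather than the exact procedure of Sloan et al. (2023).

    # Minimal sketch: slice a full-size satellite image into 256x256 tiles and add rotated copies.
    # The file name, the coordinate order in tile names, and the rotation set are assumptions.
    from PIL import Image

    img = Image.open("bo12.png")
    tiles = []
    for x in range(0, img.width - 255, 256):
        for y in range(0, img.height - 255, 256):
            tile = img.crop((x, y, x + 256, y + 256))
            tiles.append((f"bo12_{x}_{y}_{x + 256}_{y + 256}", tile))
            for deg in (90, 180, 270):                      # complementary rotated tiles
                tiles.append((f"bo12_{x}_{y}_{x + 256}_{y + 256}rot{deg}", tile.rotate(deg)))
    print(len(tiles))                                       # tiles produced from one input image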
Of these 8904 image tiles, Sloan et al. (2023) randomly selected 80% for model training (during which a model ‘learns’ to recognize road features in the input imagery), 10% for model validation (during which model parameters are iteratively refined), and 10% for final model testing (during which the final accuracy of the output road map is assessed). Here we present these data in two folders accordingly:
‘Training’ – contains 7124 image tiles used for model training in Sloan et al. (2023), i.e., 80% of the original pool of 8904 image tiles.
‘Testing’ – contains 1780 image tiles used for model validation and model testing in Sloan et al. (2023), i.e., 20% of the original pool of 8904 image tiles, being the combined set of image tiles for model validation and testing in Sloan et al. (2023).
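A random 80/10/10 split of this kind might look like the following sketch; the placeholder tile identifiers and random seed are assumptions, and the exact counts reported above (7124/1780) depend on how the split was rounded.

    # Minimal sketch: randomly split 8904 tile identifiers into 80% / 10% / 10% subsets.
    # Placeholder identifiers and the seed are illustrative assumptions.
    import random

    tile_ids = [f"tile_{i:04d}" for i in range(8904)]
    random.Random(42).shuffle(tile_ids)

    n_train = int(0.8 * len(tile_ids))
    n_val = int(0.1 * len(tile_ids))
    train = tile_ids[:n_train]
    val = tile_ids[n_train:n_train + n_val]
    test = tile_ids[n_train + n_val:]
    print(len(train), len(val), len(test))   # approximately 7123 / 890 / 891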
IMAGE TILE NAMING CONVENTION
A common naming convention applies to image tiles’ directories and file names, in both the ‘training’ and ‘testing’ folders:
XX##_A_B_C_DrotDDD
where:
XX – denotes the geographical region / major island of Equatorial Asia of the original input 1920x886 pixel image, as follows: ‘bo’ (Borneo), ‘su’ (Sumatra), ‘sl’ (Sulawesi), ‘pn’ (Papua New Guinea), ‘jv’ (Java), ‘ng’ (New Guinea [i.e., Papua and West Papua provinces of Indonesia])
A, B, C and D – can all be ignored. These values, each one of 0, 256, 512, 768, 1024, 1280, 1536, and 1792, are effectively ‘pixel coordinates’ in the corresponding original 1920x886-pixel input image. They were recorded within image tiles’ sub-directory and file names merely to ensure that those names were unique.
rot – implies an image rotation. Not all image tiles are rotated, so ‘rot’ will appear only occasionally.
DDD – denotes the degree of image-tile rotation, e.g., 90, 180, 270. Not all image tiles are rotated, so ‘DDD’ will appear only occasionally.
Note that the designator ‘XX##’ is directly equivalent to the filenames of the corresponding 1920x886-pixel input satellite images, detailed above. Therefore, each image tile can be ‘matched’ with its parent full-scale satellite image. For example, in the ‘training’ folder, the subdirectory ‘Bo12_0_0_256_256’ indicates that the image tile therein (also named ‘Bo12_0_0_256_256’) would have been sourced from the full-scale image ‘Bo12.png’.
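Programmatically, the parent image can be recovered from a tile name with a small parser such as the sketch below; the regular expression is inferred from the convention described above and is an assumption, not part of the dataset.

    # Minimal sketch: parse a tile name such as 'Bo12_0_0_256_256' or 'Bo12_0_0_256_256rot90'
    # and recover the parent image file name plus any rotation. The pattern is inferred
    # from the naming convention above and may not cover every edge case.
    import re

    def parent_image(tile_name):
        m = re.match(r"([A-Za-z]{2}\d+)_\d+_\d+_\d+_\d+(?:rot(\d+))?$", tile_name)
        if m is None:
            raise ValueError(f"unexpected tile name: {tile_name}")
        return m.group(1) + ".png", m.group(2)   # parent file name, rotation in degrees (or None)

    print(parent_image("Bo12_0_0_256_256"))        # ('Bo12.png', None)
    print(parent_image("Bo12_0_0_256_256rot90"))   # ('Bo12.png', '90')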
Quivira National Wildlife Refuge was established in 1955, but a detailed vegetation map was not available for management purposes. With the present development of a biological program and Comprehensive Conservation Plan (CCP), a baseline vegetation map of the refuge was identified as a necessity. Development of the vegetation map and associated report was a multi-step process. Aerial photography (NAIP, 2008) was used with eCognition to create polygons of different plant communities based on the likeness of surrounding pixels in the area. Prior to ground-truthing, the following activities were accomplished: training on vegetation mapping using GIS (previous experience and National Conservation Training Center course), creation of a vegetation association and alliance dichotomous key, development of a refuge plant key and identification skills, and preparation of maps for ground truthing. Once out in the field, dominant plants were identified for appropriate vegetation alliance and association classification, plant specimens were collected for the refuge herbarium as necessary, and additional observations and photos were gathered for the report. Over the course of the project, classification data were entered into a GIS and polygons were appropriately modified to create the final map. At Quivira, the results identified a total of 42 alliances and 43 associations. The most dominant plants throughout the refuge in 2008, based on canopy cover, were saltgrass, plum, little bluestem and cottonwood. The number of alliances and associations found on the refuge indicates high species diversity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This upload contains data required to replicate a tutorial that applies regression-based unmixing of spectral-temporal metrics for sub-pixel land cover mapping with synthetically created training data. The tutorial uses the Framework for Operational Radiometric Correction for Environmental monitoring.
This dataset contains intermediate and final results of the workflow described in that tutorial as well as auxiliary data such as parameter files.
Please refer to the above mentioned tutorial for more information.
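For readers unfamiliar with the general idea of regression-based unmixing with synthetic training data, the sketch below illustrates it in plain Python with scikit-learn; it is an assumed, simplified stand-in and not the FORCE workflow used in the tutorial.

    # Minimal sketch of regression-based unmixing with synthetically generated training data.
    # The spectra, linear mixing model, noise level, and regressor are illustrative assumptions,
    # not the FORCE implementation referenced above.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n_bands, n_samples = 10, 2000
    pure = rng.uniform(0, 1, size=(3, n_bands))              # three made-up class spectra

    fractions = rng.dirichlet(np.ones(3), size=n_samples)    # synthetic sub-pixel fractions
    mixtures = fractions @ pure + rng.normal(0, 0.01, size=(n_samples, n_bands))

    model = RandomForestRegressor(n_estimators=100).fit(mixtures, fractions[:, 0])
    print(model.predict(mixtures[:5]))                        # predicted fraction of class 0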
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sensitivity and specificity of global native T1 value and pattern-derived texture features to identify subjects in the three cohorts (i.e. control, HCM, and DCM) using multiclass linear SVM.
Accurate, high-resolution maps of bedrock outcrops are extremely valuable. The increasing availability of high-resolution imagery can be coupled with machine learning techniques to improve regional bedrock maps. This data release contains training data created for developing a machine learning model capable of identifying exposed bedrock across the entire Sierra Nevada Mountains (California, USA). The training data consist of 20 thematic rasters in GeoTIFF format, where image labels represent three categories: rock, not rock, and no data. These training data labels were created using 0.6-m imagery from the National Agriculture Imagery Program (NAIP) acquired in 2016. Eight existing labeled sites were available from Petliak et al. (2019), an earlier effort. We further revised those labels for improved accuracy and created an additional 12 reference sites following the same protocol of semi-manual mapping as in Petliak et al. (2019). A machine learning model (https://github.com/nasa/delta) was trained and tested based on these image labels as detailed in Shastry et al. (in review). The trained model was then used to map exposed bedrock across the entire Sierra Nevada region using 2016 NAIP imagery, and this data release also includes these model outputs. The model output gives the likelihood (from 0 to 255) that each pixel is bedrock, rather than a direct binary classification. The associated publication used a threshold of 50%, or pixel value 127: all pixels with values of 127 or higher are classified as rock, and all lower values as not rock.
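For example, converting the 0-255 likelihood raster to the binary rock / not-rock classification described above amounts to a simple threshold; the file name below is an assumption, and any GeoTIFF reader would work equally well.

    # Minimal sketch: threshold the 0-255 bedrock likelihood raster at 127 (roughly 50%).
    # The input file name is an assumption for illustration.
    import numpy as np
    import rasterio

    with rasterio.open("bedrock_likelihood.tif") as src:
        likelihood = src.read(1)

    is_rock = likelihood >= 127     # True where the pixel is classified as exposed bedrock
    print(np.mean(is_rock))         # fraction of pixels mapped as rock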
The dataset depicts the authoritative locations of the most commonly known Department of Defense (DoD) sites, installations, ranges, and training areas world-wide. These sites encompass land which is federally owned or otherwise managed. This dataset was created from source data provided by the four Military Service Component headquarters and was compiled by the Defense Installation Spatial Data Infrastructure (DISDI) Program within the Office of the Assistant Secretary of Defense for Energy, Installations, and Environment. Only sites reported in the BSR or released in a map supplementing the Foreign Investment Risk Review Modernization Act of 2018 (FIRRMA) Real Estate Regulation (31 CFR Part 802) were considered for inclusion. This list does not necessarily represent a comprehensive collection of all Department of Defense facilities. For inventory purposes, installations are composed of sites, where a site is defined as a specific geographic location of federally owned or managed land and is assigned to a military installation. DoD installations are commonly referred to as a base, camp, post, station, yard, center, homeport facility for any ship, or other activity under the jurisdiction, custody, or control of the DoD. While every attempt has been made to provide the best available data quality, this data set is intended for use at mapping scales between 1:50,000 and 1:3,000,000. For this reason, boundaries in this data set may not perfectly align with DoD site boundaries depicted in other federal data sources. Maps produced at a scale of 1:50,000 or smaller that otherwise comply with National Map Accuracy Standards will remain compliant when this data is incorporated. Boundary data is most suitable for larger scale maps; point locations are better suited for mapping scales between 1:250,000 and 1:3,000,000. If a site is part of a Joint Base (effective/designated on 1 October 2010) as established under the 2005 Base Realignment and Closure process, it is attributed with the name of the Joint Base. All sites comprising a Joint Base are also attributed to the responsible DoD Component, which is not necessarily the pre-2005 Component responsible for the site.
This dataset offers a comprehensive collection of Telegram users' geolocation data, including IP addresses, with full user consent, covering 50,000 records. This data is specifically tailored for use in AI, ML, DL, and LLM models, as well as applications requiring Geographic Data and Social Media Data. The dataset provides critical geospatial information, making it a valuable resource for developing location-based services, targeted marketing strategies, and more.
What Makes This Data Unique? This dataset is unique due to its focus on geolocation data tied to Telegram users, a platform with a global user base. It includes IP to Geolocation Data, offering precise geospatial insights that are essential for accurate geographic analysis. The inclusion of user consent ensures that the data is ethically sourced and legally compliant. The dataset's broad coverage across various regions makes it particularly valuable for AI and machine learning models that require diverse, real-world data inputs.
Data Sourcing: The data is collected through a network of in-app tasks across different mini-apps within Telegram. Users participate in these tasks voluntarily, providing explicit consent to share their geolocation and IP information. The data is collected in real-time, capturing accurate geospatial details as users interact with various Telegram mini-apps. This method of data collection ensures that the information is both relevant and up-to-date, making it highly valuable for applications that require current location data.
Primary Use-Cases: This dataset is highly versatile and can be applied across multiple categories, including:
IP to Geolocation Data: The dataset provides precise mapping of IP addresses to geographical locations, making it ideal for applications that require accurate geolocation services.
Geographic Data: The geospatial information contained in the dataset supports a wide range of geographic analysis, including regional behavior studies and location-based service optimization.
Social Media Data: The dataset's integration with Telegram users' activities provides insights into social media behaviors across different regions, enhancing social media analytics and targeted marketing.
Large Language Model (LLM) Data: The geolocation data can be used to train LLMs to better understand and generate content that is contextually relevant to specific regions.
Deep Learning (DL) Data: The dataset is ideal for training deep learning models that require accurate and diverse geospatial inputs, such as those used in autonomous systems and advanced geographic analytics.
Integration with Broader Data Offering: This geolocation dataset is a valuable addition to the broader data offerings from FileMarket. It can be combined with other datasets, such as web browsing behavior or social media activity data, to create comprehensive AI models that provide deep insights into user behaviors across different contexts. Whether used independently or as part of a larger data strategy, this dataset offers unique value for developers and data scientists focused on enhancing their models with precise, consented geospatial data.
A very high spatial resolution Land Use and Land Cover map was produced for the greater Marino watershed (Peru) using the MORINGA processing chain. The methods involved multisource satellite imagery and a random forest model, as well as manual post-treatment. The final map provides important information for environmental management and monitoring and contributes to developing standardized methodologies for accurate LULC mapping.
Training Dataset
The Sonoma County fine scale vegetation and habitat map is an 83-class vegetation map of Sonoma County with 212,391 polygons. The fine scale vegetation and habitat map represents the state of the landscape in 2013 and adheres to the National Vegetation Classification System (NVC). The map was designed to be used at scales of 1:5,000 and smaller. This layer file is just to be used for symbology - no spatial data is included. For the spatial data, download the veg map layer package, file geodatabase, or shapefile. The full datasheet for this product is available here: https://sonomaopenspace.egnyte.com/dl/qOm3JEb3tD. Class definitions, as well as a dichotomous key for the map classes, can be found in the Sonoma Vegetation and Habitat Map Key (https://sonomaopenspace.egnyte.com/dl/xObbaG6lF8). The fine scale vegetation and habitat map was created using semi-automated methods that include field work, computer-based machine learning, and manual aerial photo interpretation. The vegetation and habitat map was developed by first creating a lifeform map, an 18-class map that served as a foundation for the fine-scale map. The lifeform map was created using “expert systems” rulesets in Trimble Ecognition. These rulesets combine automated image segmentation (stand delineation) with object based image classification techniques. In contrast with machine learning approaches, expert systems rulesets are developed heuristically based on the knowledge of experienced image analysts. Key data sets used in the expert systems rulesets for lifeform included: orthophotography (’11 and ’13), the LiDAR derived Canopy Height Model (CHM), and other LiDAR derived landscape metrics. After it was produced using Ecognition, the preliminary lifeform map product was manually edited by photo interpreters. Manual editing corrected errors where the automated methods produced incorrect results. Edits were made to correct two types of errors: 1) unsatisfactory polygon (stand) delineations and 2) incorrect polygon labels. The mapping team used the lifeform map as the foundation for the finer scale and more floristically detailed Fine Scale Vegetation and Habitat map. For example, a single polygon mapped in the lifeform map as forest might be divided into four polygons in the fine scale map, including redwood forest, Douglas-fir forest, Oregon white oak forest, and bay forest. The fine scale vegetation and habitat map was developed using a semi-automated approach. The approach combines Ecognition segmentation, extensive field data collection, machine learning, manual editing, and expert review. Ecognition segmentation results in a refinement of the lifeform polygons. Field data collection results in a large number of training polygons labeled with their field-validated map class. Machine learning relies on the field collected data as training data and a stack of GIS datasets as predictor variables. The resulting model is used to create automated fine-scale labels countywide. Machine learning algorithms for this project included both Random Forests and Support Vector Machines (SVMs).
Machine learning is followed by extensive manual editing, which is used to 1) edit segment (polygon) labels when they are incorrect and 2) edit segment (polygon) shape when necessary. The map classes in the fine scale vegetation and habitat map generally correspond to the alliance level of the National Vegetation Classification, but some map classes - especially riparian vegetation and herbaceous types - correspond to higher levels of the hierarchy (such as group or macrogroup).
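The general structure of the machine-learning step (field-validated polygons supplying labels, a stack of GIS layers supplying predictors) can be sketched as below; the arrays, class count, and model settings are placeholders and assumptions, not the Sonoma County data or workflow.

    # Minimal sketch of the machine-learning step: labels from field-validated training
    # polygons, predictors from a stack of GIS layers. All values are placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    n_segments, n_predictors = 5000, 12                   # assumed sizes
    X = np.random.rand(n_segments, n_predictors)          # e.g., CHM, spectral, terrain metrics
    y = np.random.randint(0, 83, size=n_segments)         # one of the (assumed) 83 map classes

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
    print(clf.score(X_test, y_test))                      # accuracy prior to manual editing and review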
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Have you ever wanted to create your own maps, or integrate and visualize spatial datasets to examine changes in trends between locations and over time? Follow along with these training tutorials on QGIS, an open source geographic information system (GIS), and learn key concepts, procedures and skills for performing common GIS tasks – such as creating maps, as well as joining, overlaying and visualizing spatial datasets. These tutorials are geared towards new GIS users. We’ll start with foundational concepts, and build towards more advanced topics throughout – demonstrating how with a few relatively easy steps you can get quite a lot out of GIS. You can then extend these skills to datasets of thematic relevance to you in addressing tasks faced in your day-to-day work.
This is a vector tile service with labels for the fine scale vegetation and habitat map, to be used in web maps and GIS software packages. Labels appear at scales greater than 1:5,000 and show the full Latin name or vegetation group name. At scales smaller than 1:5,000 the abbreviated vegetation class name is displayed. This service is meant to be used in conjunction with the vector tile services of the veg map polygons (either the solid symbology service or the hollow symbology service). The key to map class abbreviations can be found here. The Sonoma County fine scale vegetation and habitat map is an 82-class vegetation map of Sonoma County with 212,391 polygons. The fine scale vegetation and habitat map represents the state of the landscape in 2013 and adheres to the National Vegetation Classification System (NVC). The map was designed to be used at scales of 1:5,000 and smaller. The full datasheet for this product is available here: https://sonomaopenspace.egnyte.com/dl/qOm3JEb3tD. The final report for the fine scale vegetation map, containing methods and an accuracy assessment, is available here: https://sonomaopenspace.egnyte.com/dl/1SWyCSirE9. Class definitions, as well as a dichotomous key for the map classes, can be found in the Sonoma Vegetation and Habitat Map Key (https://sonomaopenspace.egnyte.com/dl/xObbaG6lF8). The fine scale vegetation and habitat map was created using semi-automated methods that include field work, computer-based machine learning, and manual aerial photo interpretation. The vegetation and habitat map was developed by first creating a lifeform map, an 18-class map that served as a foundation for the fine-scale map. The lifeform map was created using “expert systems” rulesets in Trimble Ecognition. These rulesets combine automated image segmentation (stand delineation) with object based image classification techniques. In contrast with machine learning approaches, expert systems rulesets are developed heuristically based on the knowledge of experienced image analysts. Key data sets used in the expert systems rulesets for lifeform included: orthophotography (’11 and ’13), the LiDAR derived Canopy Height Model (CHM), and other LiDAR derived landscape metrics. After it was produced using Ecognition, the preliminary lifeform map product was manually edited by photo interpreters. Manual editing corrected errors where the automated methods produced incorrect results. Edits were made to correct two types of errors: 1) unsatisfactory polygon (stand) delineations and 2) incorrect polygon labels. The mapping team used the lifeform map as the foundation for the finer scale and more floristically detailed Fine Scale Vegetation and Habitat map. For example, a single polygon mapped in the lifeform map as forest might be divided into four polygons in the fine scale map, including redwood forest, Douglas-fir forest, Oregon white oak forest, and bay forest. The fine scale vegetation and habitat map was developed using a semi-automated approach. The approach combines Ecognition segmentation, extensive field data collection, machine learning, manual editing, and expert review. Ecognition segmentation results in a refinement of the lifeform polygons. Field data collection results in a large number of training polygons labeled with their field-validated map class. Machine learning relies on the field collected data as training data and a stack of GIS datasets as predictor variables. The resulting model is used to create automated fine-scale labels countywide.
Machine learning algorithms for this project included both Random Forests and Support Vector Machines (SVMs). Machine learning is followed by extensive manual editing, which is used to 1) edit segment (polygon) labels when they are incorrect and 2) edit segment (polygon) shape when necessary. The map classes in the fine scale vegetation and habitat map generally correspond to the alliance level of the National Vegetation Classification, but some map classes - especially riparian vegetation and herbaceous types - correspond to higher levels of the hierarchy (such as group or macrogroup).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tentative forest plantation mapping training data set collected by the author team, September 2016, using Google Earth / Collect Earth software
Remotely sensed imagery is increasingly used by emergency managers to monitor and map the impact of flood events to support preparedness, response, and critical decision making throughout the flood event lifecycle. To reduce latency in delivery of imagery-derived information, ensure consistent and reliably derived map products, and facilitate processing of an increasing volume of remote sensing data-streams, automated flood mapping workflows are needed. The U.S. Geological Survey is facilitating the development and integration of machine-learning algorithms in collaboration with NASA, National Geospatial Intelligence Agency (NGA), University of Alabama, and University of Illinois to create a workflow for rapidly generating improved flood-map products. A major bottleneck to the training of robust, generalizable machine learning algorithms for pattern recognition is a lack of training data that is representative across the landscape. To overcome this limitation for the training of algorithms capable of detection of surface inundation in diverse contexts, this publication includes the data developed from MAXAR Worldview sensors that is input as training data for machine learning. This data release consists of 100 thematic rasters, in GeoTiff format, with image labels representing five discrete categories: water, not water, maybe water, clouds and background/no data. Specifically, these training data were created by labeling 8-band, multispectral scenes from the MAXAR-Digital Globe, Worldview-2 and 3 satellite-based sensors. Scenes were selected to be spatially and spectrally diverse and geographically representative of different water features within the continental U.S. The labeling procedures used a hybrid approach of unsupervised classification for the initial spectral clustering, followed by expert-level manual interpretation and QA/QC peer review to finalize each labeled image. Updated versions of the data may be issued along with version update documentation. The 100 raster files that make up the training data are available to download here (https://doi.org/10.5066/P9C7HYRV).
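As a hedged illustration of the initial unsupervised-classification step (before the expert manual interpretation described above), the sketch below clusters the pixels of an 8-band scene; the file name, cluster count, and use of k-means are assumptions rather than the published labeling procedure.

    # Minimal sketch: unsupervised clustering of an 8-band scene as a starting point for labeling.
    # The file name, number of clusters, and k-means itself are illustrative assumptions.
    import numpy as np
    import rasterio
    from sklearn.cluster import KMeans

    with rasterio.open("worldview_scene.tif") as src:
        bands = src.read()                                # shape: (8, rows, cols)

    pixels = bands.reshape(bands.shape[0], -1).T          # one row per pixel, eight features
    clusters = KMeans(n_clusters=10, n_init=10).fit_predict(pixels)
    cluster_map = clusters.reshape(bands.shape[1], bands.shape[2])
    print(np.unique(cluster_map))                         # cluster ids awaiting manual class assignment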
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was compiled as part of the TIME4CS project, WP4, and lists identified citizen science training resources, as of July 2022.
The EU-citizen.science platform provided the basis for mapping CS training in Europe, as the team behind the platform has put considerable effort into compiling CS training resources and encouraging the CS community to contribute them. Additionally, training courses were identified based on the case studies in WP1, as most universities do not list their courses on the EU-citizen.science platform.