55 datasets found

D
Data Labeling Market Report
datainsightsmarket.com
doc, pdf, ppt
Updated Mar 8, 2025
Cite
Data Insights Market (2025). Data Labeling Market Report [Dataset]. https://www.datainsightsmarket.com/reports/data-labeling-market-20383
Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
Mar 8, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The data labeling market is experiencing robust growth, projected to reach $3.84 billion in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 28.13% from 2025 to 2033. This expansion is fueled by the increasing demand for high-quality training data across various sectors, including healthcare, automotive, and finance, which heavily rely on machine learning and artificial intelligence (AI). The surge in AI adoption, particularly in areas like autonomous vehicles, medical image analysis, and fraud detection, necessitates vast quantities of accurately labeled data. The market is segmented by sourcing type (in-house vs. outsourced), data type (text, image, audio), labeling method (manual, automatic, semi-supervised), and end-user industry. Outsourcing is expected to dominate the sourcing segment due to cost-effectiveness and access to specialized expertise. Similarly, image data labeling is likely to hold a significant share, given the visual nature of many AI applications. The shift towards automation and semi-supervised techniques aims to improve efficiency and reduce labeling costs, though manual labeling will remain crucial for tasks requiring high accuracy and nuanced understanding. Geographical distribution shows strong potential across North America and Europe, with Asia-Pacific emerging as a key growth region driven by increasing technological advancements and digital transformation. Competition in the data labeling market is intense, with a mix of established players like Amazon Mechanical Turk and Appen, alongside emerging specialized companies. The market's future trajectory will likely be shaped by advancements in automation technologies, the development of more efficient labeling techniques, and the increasing need for specialized data labeling services catering to niche applications. Companies are focusing on improving the accuracy and speed of data labeling through innovations in AI-powered tools and techniques. 
Furthermore, the rise of synthetic data generation offers a promising avenue for supplementing real-world data, potentially addressing data scarcity challenges and reducing labeling costs in certain applications. This will, however, require careful attention to ensure that the synthetic data generated is representative of real-world data to maintain model accuracy.

This comprehensive report provides an in-depth analysis of the global data labeling market, offering insights for businesses, investors, and researchers. The study period covers 2019-2033, with 2025 as the base and estimated year, and a forecast period of 2025-2033. We delve into market size, segmentation, growth drivers, challenges, and emerging trends, examining the impact of technological advancements and regulatory changes on this rapidly evolving sector. The market is projected to reach multi-billion dollar valuations by 2033, fueled by the increasing demand for high-quality data to train sophisticated machine learning models.

Recent developments include:
September 2024: The National Geospatial-Intelligence Agency (NGA) is poised to invest heavily in artificial intelligence, earmarking up to USD 700 million for data labeling services over the next five years. This initiative aims to enhance NGA's machine-learning capabilities, particularly in analyzing satellite imagery and other geospatial data. The agency has opted for a multi-vendor indefinite-delivery/indefinite-quantity (IDIQ) contract, emphasizing the importance of annotating raw data, be it images or videos, to render it understandable for machine learning models. For instance, when dealing with satellite imagery, the focus could be on labeling distinct entities such as buildings, roads, or patches of vegetation.
October 2023: Refuel.ai unveiled a new platform, Refuel Cloud, and a specialized large language model (LLM) for data labeling. Refuel Cloud harnesses advanced LLMs, including its proprietary model, to automate data cleaning, labeling, and enrichment at scale, catering to diverse industry use cases. Recognizing that clean data underpins modern AI and data-centric software, Refuel Cloud addresses the historical challenge of human labor bottlenecks in data production. With Refuel Cloud, enterprises can generate the expansive, precise datasets they require in minutes, a task that traditionally spanned weeks.

Key drivers for this market are: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML.
Potential restraints include: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML.
Notable trends are: Healthcare is Expected to Witness Remarkable Growth.
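The growth figures quoted in this description follow the usual compound-growth relation, value_n = value_0 x (1 + CAGR)^n. The snippet below is a minimal sketch of that arithmetic, assuming the $3.84 billion figure is the 2025 base and that the 28.13% rate applies for all eight years through 2033; the function name `project_value` is illustrative and does not come from the report.

```python
def project_value(base_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward by `years` at annual growth rate `cagr`."""
    return base_value * (1.0 + cagr) ** years


# Figures taken from the description; the 8-year horizon (2025 -> 2033)
# is an assumption about how the rate is meant to be applied.
base_2025_busd = 3.84
cagr = 0.2813
projected_2033 = project_value(base_2025_busd, cagr, 2033 - 2025)
print(f"Implied 2033 market size: ${projected_2033:.1f} billion")
```

The result is only as reliable as the two input figures, and the report itself may use a different base year or rounding.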
R
Geospatial Dataset
universe.roboflow.com
zip
Updated Jun 9, 2025
Cite
Rahman (2025). Geospatial Dataset [Dataset]. https://universe.roboflow.com/rahman-nlvei/geospatial-dataset/dataset/1
Explore at:
Available download formats: zip
Dataset updated
Jun 9, 2025
Dataset authored and provided by
Rahman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Geospatial Polygons
Description
Geospatial Dataset

## Overview
Geospatial Dataset is a dataset for instance segmentation tasks. It contains Geospatial annotations for 1,048 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
S
Two residential districts datasets from Kielce, Poland for building semantic...
scidb.cn
Updated Sep 29, 2022
Cite
Agnieszka Łysak (2022). Two residential districts datasets from Kielce, Poland for building semantic segmentation task [Dataset]. http://doi.org/10.57760/sciencedb.02955
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.02955
Dataset updated
Sep 29, 2022
Dataset provided by
Science Data Bank
Authors
Agnieszka Łysak
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
Poland, Kielce
Description
Today, deep neural networks are widely used in many computer vision problems, including those involving geographic information systems (GIS) data. This type of data is commonly used for urban analyses and spatial planning. We used orthophotographic images of two residential districts in Kielce, Poland for research that included automatic urban sprawl analysis with a Transformer-based neural network.

Orthophotomaps were obtained from the Kielce GIS portal. The map was then manually masked into building and building-surroundings classes. Finally, the orthophotomap and the corresponding classification mask were simultaneously divided into small tiles. This approach is common in image data preprocessing for the learning phase of machine learning algorithms. The data contains two original orthophotomaps from the Wietrznia and Pod Telegrafem residential districts with corresponding masks, as well as their tiled versions, ready to serve as training data for machine learning models.

A Transformer-based neural network was trained on the Wietrznia dataset for semantic segmentation of the tiles into building and surroundings classes. Inference was then used to test the model's generalization ability on the Pod Telegrafem dataset. The efficiency of the model was satisfactory, so it can be used for automatic semantic building segmentation. The tiling process can then be reversed and the complete classification mask retrieved. This mask can be used for building-area calculations and for urban sprawl monitoring, if the research were repeated for GIS data from a wider time horizon.

Since the dataset was collected from the Kielce GIS portal, as part of the Polish Main Office of Geodesy and Cartography data resource, it may be used only for non-profit and non-commercial purposes, in private or scientific applications, under the law "Ustawa z dnia 4 lutego 1994 r. o prawie autorskim i prawach pokrewnych (Dz.U. z 2006 r. nr 90 poz 631 z późn. zm.)". There are no other legal or ethical considerations in reuse potential.

Data information is presented below.
wietrznia_2019.jpg - orthophotomap of Wietrznia district - used for model's training, as an explanatory image
wietrznia_2019.png - classification mask of Wietrznia district - used for model's training, as a target image
wietrznia_2019_validation.jpg - one image from Wietrznia district - used for model's validation during training phase
pod_telegrafem_2019.jpg - orthophotomap of Pod Telegrafem district - used for model's evaluation after training phase
wietrznia_2019 - folder with wietrznia_2019.jpg (image) and wietrznia_2019.png (annotation) images, divided into 810 tiles (512 x 512 pixels each); tiles with no information were manually removed, so the training data would contain only informative tiles - tiles were presented to the model during training (images and annotations for fitting the model to the data)
wietrznia_2019_vaidation - folder with wietrznia_2019_validation.jpg image divided into 16 tiles (256 x 256 pixels each) - tiles were presented to the model during training (images for validating the model's efficiency); it was not part of the training data
pod_telegrafem_2019 - folder with pod_telegrafem.jpg image divided into 196 tiles (256 x 256 pixels each) - tiles were presented to the model during inference (images for evaluating the model's robustness)

The dataset was created as described below.

First, the orthophotomaps were collected from Kielce Geoportal (https://gis.kielce.eu). Kielce Geoportal offers a .pst recent map from April 2019. It is an orthophotomap with a resolution of 5 x 5 pixels, constructed from a plane flight at 700 meters above ground, taken with a camera for vertical photos. Downloading was done via WMS in the open-source QGIS software (https://www.qgis.org), as a 1:500 scale map, which was then converted to a 1200 dpi PNG image.

Second, the map of the Wietrznia residential district was manually labelled, also in QGIS, in the same scope as the orthophotomap. Annotation was based on land cover map information, also obtained from Kielce Geoportal. There are two classes: residential building and surrounding. The second map, from the Pod Telegrafem district, was not annotated, since it was used in the testing phase and imitates a situation where there is no annotation for new data presented to the model.

Next, the images were converted to RGB JPG images, and the annotation map was converted to an 8-bit GRAY PNG image.

Finally, the Wietrznia data files were tiled into 512 x 512 pixel tiles with the Python PIL library. Tiles with no information or a relatively small amount of information (only white background or mostly white background) were manually removed. So, from the 29113 x 15938 pixel orthophotomap, only 810 tiles with corresponding annotations were left, ready to train the machine learning model for the semantic segmentation task. The Pod Telegrafem orthophotomap was tiled with no manual removal, so 197 tiles with 256 x 256 pixel resolution were created from the 7168 x 7168 pixel orthophotomap. There was also an image of one residential building, used for the model's validation during the training phase; it was not part of the training data, but was part of the Wietrznia residential area. It was a 2048 x 2048 pixel orthophotomap, tiled into 16 tiles of 256 x 256 pixels each.
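The tiling step described above can be sketched roughly as follows. This is not the authors' script: the function names `tile_boxes` and `save_tiles`, the output naming pattern, and the choice to skip partial tiles at the image edges are all assumptions made for illustration. Only `Image.open`, `Image.crop`, and `Image.save` from Pillow (the maintained PIL fork) are used.

```python
from typing import Iterator, Tuple

# (left, upper, right, lower), the order Pillow's Image.crop expects
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int) -> Iterator[Box]:
    """Yield crop boxes for full, non-overlapping tile x tile squares, row by row.

    Partial tiles at the right and bottom edges are skipped (an assumption;
    the description does not say how edges were handled).
    """
    for upper in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield (left, upper, left + tile, upper + tile)


def save_tiles(image_path: str, out_prefix: str, tile: int = 512) -> int:
    """Cut an image into tiles with Pillow and save them; returns the tile count."""
    from PIL import Image  # imported lazily so tile_boxes works without Pillow

    count = 0
    with Image.open(image_path) as img:
        for box in tile_boxes(img.width, img.height, tile):
            # Hypothetical naming scheme: prefix_row-offset_column-offset.png
            img.crop(box).save(f"{out_prefix}_{box[1]}_{box[0]}.png")
            count += 1
    return count


# Pure-Python check of the grid for a small hypothetical image size.
boxes = list(tile_boxes(1024, 768, 256))
print(len(boxes), boxes[0], boxes[-1])
```

Applying the same boxes to the image and to its mask keeps each tile aligned with its annotation. For very large orthophotos, Pillow's `Image.MAX_IMAGE_PIXELS` safeguard may need to be adjusted before the file will open.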
m
Data from: Geospatial Dataset on Deforestation and Urban Sprawl in Dhaka,...
data.mendeley.com
Updated May 28, 2025
Cite
Md Fahad Khan (2025). Geospatial Dataset on Deforestation and Urban Sprawl in Dhaka, Bangladesh: A Resource for Environmental Analysis [Dataset]. http://doi.org/10.17632/hst78yczmy.5
Unique identifier
https://doi.org/10.17632/hst78yczmy.5
Dataset updated
May 28, 2025
Authors
Md Fahad Khan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh, Dhaka
Description
Google Earth Pro facilitated the acquisition of satellite imagery to monitor deforestation in Dhaka, Bangladesh. Multiple years of images were systematically captured from specific locations, allowing comprehensive analysis of tree cover reduction. The imagery displays diverse aspect ratios based on satellite perspectives and possesses high resolution, suitable for remote sensing. Each site provided 5 to 35 images annually, accumulating data over a ten-year period. The dataset classifies images into three primary categories: tree cover, deforested regions, and masked images. Organized by year, it comprises both raw and annotated images, each paired with a JSON file containing annotations and segmentation masks. This organization enhances accessibility and temporal analysis. Furthermore, the dataset is conducive to machine learning initiatives, particularly in training models for object detection and segmentation to evaluate environmental alterations.
a
Chatham County - Parcel Annotation
opendata-chathamncgis.opendata.arcgis.com
hub.arcgis.com
Updated Oct 3, 2016
Cite
Chatham County GIS Portal (2016). Chatham County - Parcel Annotation [Dataset]. https://opendata-chathamncgis.opendata.arcgis.com/maps/9fd1ad747e0a45b0bae38566021dad04
Dataset updated
Oct 3, 2016
Dataset authored and provided by
Chatham County GIS Portal
Description
Annotation feature class that provides labels for property boundary lengths and acreage of parcels in Chatham County, NC. This service also provides annotation for easements in the Chatham County parlines feature class.

The annotation feature class is maintained by the Chatham County GIS & Tax departments and is updated on a daily basis. Chatham GIS SOP: "MAPSERV-163"
a
Assessor Base Map Annotation
hub.arcgis.com
Updated Oct 6, 2015
Cite
Clark County GIS Management Office (2015). Assessor Base Map Annotation [Dataset]. https://hub.arcgis.com/maps/36d39996ff15407487b8e63a93e4a51b
Dataset updated
Oct 6, 2015
Dataset authored and provided by
Clark County GIS Management Office
Description
Annotation for the Assessor's GIS data. This service is used in the OpenWeb and Opendoor applications.
U
Coast Train--Labeled imagery for training and evaluation of data-driven...
data.usgs.gov
catalog.data.gov
Updated Aug 31, 2024
Cite
Phillipe Wernette; Daniel Buscombe; Jaycee Favela; Sharon Fitzpatrick; Evan Goldstein; Nicholas Enwright; Erin Dunand (2024). Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation [Dataset]. http://doi.org/10.5066/P91NP87I
Unique identifier
https://doi.org/10.5066/P91NP87I
Dataset updated
Aug 31, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Phillipe Wernette; Daniel Buscombe; Jaycee Favela; Sharon Fitzpatrick; Evan Goldstein; Nicholas Enwright; Erin Dunand
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
Jan 1, 2008 - Dec 31, 2020
Description
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from both geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes us ...
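The naming convention above lends itself to a small parser. The sketch below relies only on the visible part of the convention, `{datasource}_{numberofclasses}_{threedigitdatasetversion}.zip`; the description is truncated, so any further rules are unknown. The class and function names and the sample file names are hypothetical.

```python
from typing import NamedTuple


class CoastTrainName(NamedTuple):
    datasource: str
    number_of_classes: int
    version: str  # kept as text to preserve leading zeros


def parse_name(filename: str) -> CoastTrainName:
    """Split '{datasource}_{numberofclasses}_{threedigitdatasetversion}.zip' into parts."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext.lower() != "zip":
        raise ValueError(f"expected a .zip file name, got {filename!r}")
    # Split from the right so a data source containing spaces or underscores stays whole.
    parts = stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"expected three underscore-separated parts in {filename!r}")
    datasource, classes, version = parts
    if not classes.isdigit() or len(version) != 3 or not version.isdigit():
        raise ValueError(f"unexpected class count or version in {filename!r}")
    return CoastTrainName(datasource, int(classes), version)


# Hypothetical file names, made up to exercise the parser.
example = parse_name("NAIP_11_001.zip")
print(example)
```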
o
MEDDOPLACE Corpus: Gold Standard annotations for Medical Documents...
explore.openaire.eu
data.niaid.nih.gov
Updated Mar 8, 2023
Cite
Salvador Lima López; Eulàlia Farré-Maduell; Vicent Briva-Iglesias; Luis Gasco; Martin Krallinger (2023). MEDDOPLACE Corpus: Gold Standard annotations for Medical Documents Place-related Content Extraction [Dataset]. http://doi.org/10.5281/zenodo.8403498
Unique identifier
https://doi.org/10.5281/zenodo.8403498
Dataset updated
Mar 8, 2023
Authors
Salvador Lima López; Eulàlia Farré-Maduell; Vicent Briva-Iglesias; Luis Gasco; Martin Krallinger
Description
MEDDOPLACE stands for MEDical DOcument PLAce-related Content Extraction. It is a shared task and set of resources focused on the detection, normalization (entity linking/toponym resolution) and classification of different kinds of places, as well as related types of information such as clinical departments, nationalities or patient movements, in medical documents in Spanish. This repository includes the corpus' train and test sets in multiple formats, as well as the SNOMED gazetteer, cross-mapping between SNOMED and MeSH and the multilingual silver standard in 8 languages (Catalan, English, French, Italian, Dutch, Portuguese, Romanian and Swedish). For more information, please check the attached README file. MEDDOPLACE was developed by the Barcelona Supercomputing Center's NLP for Biomedical Information Analysis and used as part of IberLEF 2023. For more information on the corpus, annotation scheme and task in general, please visit: https://temu.bsc.es/meddoplace. Please cite if you use this resource: Salvador Lima-López, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Brivá-Iglesias and Martin Krallinger. NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts. In Procesamiento del Lenguaje Natural, 67. 2021. 
@article{meddoplace,
  title = {MEDDOPLACE Shared Task overview: recognition, normalization and classification of locations and patient movement in clinical texts},
  author = {Lima-López, Salvador and Farré-Maduell, Eulàlia and Brivá-Iglesias, Vicent and Gasco-Sanchez, Luis and Krallinger, Martin},
  journal = {Procesamiento del Lenguaje Natural},
  volume = {71},
  year = {2023},
  issn = {1135-5948},
  DOI = {10.26342/2023-71-23},
  url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6561/3961},
  pages = {301--311}
}

Related Links:
- MEDDOPLACE website: https://temu.bsc.es/meddoplace
- MEDDOPLACE overview paper: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6561
- Annotation Guidelines (Spanish): https://doi.org/10.5281/zenodo.7775234
- Annotation Guidelines (English): https://doi.org/10.5281/zenodo.7928145

License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Contact
If you have any questions or suggestions, please contact us at:
- Salvador Lima-López
- Martin Krallinger
z
PipedWaterAfrica: Geospatial Dataset of Water and Sanitation Access
zenodo.org
csv, zip
Updated May 28, 2025
Cite
Othmane Echchabi; Aya Lahlou; Nizar Talty; Josh Manto; Ka Leung Lam (2025). PipedWaterAfrica: Geospatial Dataset of Water and Sanitation Access [Dataset]. http://doi.org/10.48550/arxiv.2411.19093
Explore at:
Available download formats: csv, zip
Unique identifier
https://doi.org/10.48550/arxiv.2411.19093
Dataset updated
May 28, 2025
Dataset provided by
Duke Kunshan University
Columbia University
Authors
Othmane Echchabi; Aya Lahlou; Nizar Talty; Josh Manto; Ka Leung Lam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 26, 2025
Description
This dataset contains 256x256 pixel satellite images obtained from Sentinel and Landsat of areas across Africa, annotated for the presence or absence of piped water and sewage access. The data was curated using ground-truth survey information from Afrobarometer and cleaned to ensure high-quality annotations. The dataset is intended for research and analysis in remote sensing, water and sanitation infrastructure, and sustainable development applications.
Z
Seatizen Atlas
data.niaid.nih.gov
Updated Apr 11, 2025
Cite
Julien Barde (2025). Seatizen Atlas [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11125847
Dataset updated
Apr 11, 2025
Dataset provided by
Julien Barde
Sylvain Bonhommeau
Victor Illien
Alexis Joly
Matteo Contini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This deposit offers a comprehensive collection of geospatial and metadata files that constitute the Seatizen Atlas dataset, facilitating the management and analysis of spatial information. To navigate through the data, you can use an interface available at seatizenmonitoring.ifremer.re, which provides a condensed CSV file tailored to your choice of metadata and the selected area. To retrieve the associated images, you will need to use a script that extracts the relevant frames. A brief tutorial is available here: Tutorial. All the scripts for processing sessions, creating the geopackage, and generating files can be found here: SeatizenDOI github repository. The repository includes:

seatizen_atlas_db.gpkg: geopackage file that stores extensive geospatial data, allowing for efficient management and analysis of spatial information.
session_doi.csv: a CSV file listing all sessions published on Zenodo. This file contains the following columns:

session_name: identifies the session.
session_doi: indicates the URL of the session.
place: indicates the location of the session.
date: indicates the date of the session.
raw_data: indicates whether the session contains raw data or not.
processed_data: indicates whether the session contains processed data.
metadata_images.csv: a CSV file describing all metadata for each image published in open access. This file contains the following columns:

OriginalFileName: indicates the original name of the photo.
FileName: indicates the name of the photo adapted to the naming convention adopted by the Seatizen team (i.e., YYYYMMDD_COUNTRYCODE-optionalplace_device_session-number_originalimagename).
relative_file_path: indicates the path of the image in the deposit.
frames_doi: indicates the DOI of the version where the image is located.
GPSLatitude: indicates the latitude of the image (if available).
GPSLongitude: indicates the longitude of the image (if available).
GPSAltitude: indicates the depth of the frame (if available).
GPSRoll: indicates the roll of the image (if available).
GPSPitch: indicates the pitch of the image (if available).
GPSTrack: indicates the track of the image (if available).
GPSDatetime: indicates when the frame was taken (if available).
GPSFix: indicates GNSS quality levels (if available).
metadata_multilabel_predictions.csv: a CSV file describing all predictions from the last multilabel model with georeferenced data. This file contains the following columns:

FileName: indicates the name of the photo adapted to the naming convention adopted by the Seatizen team (i.e., YYYYMMDD_COUNTRYCODE-optionalplace_device_session-number_originalimagename).
frames_doi: indicates the DOI of the version where the image is located.
GPSLatitude: indicates the latitude of the image (if available).
GPSLongitude: indicates the longitude of the image (if available).
GPSAltitude: indicates the depth of the frame (if available).
GPSRoll: indicates the roll of the image (if available).
GPSPitch: indicates the pitch of the image (if available).
GPSTrack: indicates the track of the image (if available).
GPSFix: indicates GNSS quality levels (if available).
prediction_doi: refers to a specific AI model prediction on the current image (if available).
A column for each class predicted by the AI model.
metadata_multilabel_annotation.csv: a CSV file listing the subset of all the images that are annotated, along with their annotations. This file contains the following columns:

FileName: indicates the name of the photo.
frame_doi: indicates the DOI of the version where the image is located.
relative_file_path: indicates the path of the image in the deposit.
annotation_date: indicates the date when the image was annotated.
A column for each class with values:

1: if the class is present.
0: if the class is absent.
-1: if the class was not annotated.
seatizen_atlas.qgz: a qgis project which formats and highlights the geopackage file to facilitate data visualization.
darwincore_multilabel_annotations.zip: a Darwin Core Archive (DwC-A) file listing the subset of all the images that are annotated, along with their annotations.
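As an illustration of the 1 / 0 / -1 encoding used in metadata_multilabel_annotation.csv, the sketch below tallies the three states for each class column using only the standard library. The metadata column names come from the listing above; the class names `class_a` and `class_b`, the sample rows, and the helper `class_tallies` are made up for the example, and the real file may differ in delimiter or column order.

```python
import csv
import io
from collections import Counter
from typing import Dict

# Non-class columns, as listed for metadata_multilabel_annotation.csv.
META_COLUMNS = {"FileName", "frame_doi", "relative_file_path", "annotation_date"}


def class_tallies(csv_text: str) -> Dict[str, Counter]:
    """Count present (1), absent (0) and not-annotated (-1) values per class column."""
    tallies: Dict[str, Counter] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column in META_COLUMNS:
                continue
            tallies.setdefault(column, Counter())[int(value)] += 1
    return tallies


# Hypothetical rows; class_a and class_b stand in for real class names.
sample = """FileName,frame_doi,relative_file_path,annotation_date,class_a,class_b
a.jpg,doi-1,s1/a.jpg,2024-01-01,1,0
b.jpg,doi-1,s1/b.jpg,2024-01-01,0,-1
c.jpg,doi-2,s2/c.jpg,2024-01-02,1,1
"""
tallies = class_tallies(sample)
for name, counts in tallies.items():
    print(name, "present:", counts[1], "absent:", counts[0], "not annotated:", counts[-1])
```

Keeping -1 separate from 0 matters when training or evaluating a model: a class that was never annotated for an image is not evidence that the class is absent.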
M
Geodatabase to Shapefile Warning Tool
gisdata.mn.gov
esri_toolbox
Updated Apr 1, 2025
Cite
University of Minnesota (2025). Geodatabase to Shapefile Warning Tool [Dataset]. https://gisdata.mn.gov/dataset/gdb-to-shp-warning-tool
Explore at:
Available download formats: esri_toolbox
Dataset updated
Apr 1, 2025
Dataset provided by
University of Minnesota
Description
The Geodatabase to Shapefile Warning Tool examines feature classes in input file geodatabases for characteristics and data that would be lost or altered if it were transformed into a shapefile. Checks include:
1) large files (feature classes with more than 255 fields or over 2GB), 2) field names longer than 10 characters, 3) string fields longer than 254 characters, 4) date fields with time values, 5) NULL values, 6) BLOB, guid, global id, and raster field types, 7) attribute domains or subtypes, and 8) annotation or topology.
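A few of the checks listed above can be expressed as plain rules over a schema description. The sketch below is not the tool's code and does not use ArcGIS; the `Field` class, the field-type labels, and the sample schema are hypothetical stand-ins, and only the numeric limits (255 fields, 10-character names, 254-character strings) come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    name: str
    type: str                     # illustrative labels such as "String" or "Blob"
    length: Optional[int] = None  # only meaningful for string fields


MAX_FIELDS = 255      # limits quoted in the description
MAX_NAME_LEN = 10
MAX_STRING_LEN = 254
UNSUPPORTED_TYPES = {"Blob", "GUID", "GlobalID", "Raster"}  # labels are assumptions


def shapefile_warnings(fields: List[Field]) -> List[str]:
    """Return human-readable warnings for schema features a shapefile would lose or alter."""
    found: List[str] = []
    if len(fields) > MAX_FIELDS:
        found.append(f"{len(fields)} fields exceeds the {MAX_FIELDS}-field limit")
    for f in fields:
        if len(f.name) > MAX_NAME_LEN:
            found.append(f"field name '{f.name}' is longer than {MAX_NAME_LEN} characters")
        if f.type == "String" and f.length is not None and f.length > MAX_STRING_LEN:
            found.append(f"string field '{f.name}' is longer than {MAX_STRING_LEN} characters")
        if f.type in UNSUPPORTED_TYPES:
            found.append(f"field '{f.name}' has unsupported type {f.type}")
    return found


# Hypothetical schema used only to exercise the checks.
schema = [
    Field("parcel_identifier", "String", 300),
    Field("geom_blob", "Blob"),
    Field("name", "String", 50),
]
report = shapefile_warnings(schema)
for line in report:
    print(line)
```

The remaining checks (date fields with time values, NULL values, domains, subtypes, annotation, topology) depend on the data and geodatabase metadata rather than field definitions alone, so they are left out of this sketch.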

The results of this inspection are written to a text file ("warning_report_[geodatabase_name]") in the directory where the geodatabase is located. A section at the top provides a list of feature classes and information about the geodatabase as a whole. The report has a section for each valid feature class that returned a warning, with a summary of possible warnings and then more details about issues found.

The tool can process multiple file geodatabases at once. A separate text file report will be created for each geodatabase. The toolbox was created using ArcGIS Pro 3.7.11.

For more information about this and other related tools, explore the Geospatial Data Curation toolkit
d
All Cadastral GIS Data (FGDB)
catalog.data.gov
data.seattle.gov
Updated Jan 31, 2025
Cite
data.seattle.gov (2025). All Cadastral GIS Data (FGDB) [Dataset]. https://catalog.data.gov/dataset/all-cadastral-gis-data-fgdb-fd4d8
Dataset updated
Jan 31, 2025
Dataset provided by
data.seattle.gov
Description
This compressed file geodatabase contains the following layers: Legal Subdivisions - Line Legal Subdivisions - Polygon Legal Annotation Cadastral Control Points This dataset is updated on a weekly basis.
d
Data from: Preliminary geologic map of the Chugach National Forest special...
datadiscoverystudio.org
e00
Cite
USGS Information Services, Preliminary geologic map of the Chugach National Forest special study area, Alaska [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/af2cd344a1354b209352fdbd32100bb8/html
Explore at:
Available download formats: e00
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
Link to the ScienceBase Item Summary page for the item described by this metadata record. Service Protocol: Link to the ScienceBase Item Summary page for the item described by this metadata record. Application Profile: Web Browser. Link Function: information
H
Data from: Dataset: Forage grasses in crop fields from ultra-high spatial...
dataverse.harvard.edu
Updated Apr 28, 2025
Cite
Andres Felipe Ruiz-Hurtado; Rodrigo Andres Camelo-Munevar; Darwin Alexis Arrechea-Castillo; Rosa Noemi Jauregui; Juan Andres Cardoso Arango (2025). Dataset: Forage grasses in crop fields from ultra-high spatial resolution UAV-based imagery [Dataset]. http://doi.org/10.7910/DVN/DBGUFW
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/DBGUFW
Dataset updated
Apr 28, 2025
Dataset provided by
Harvard Dataverse
Authors
Andres Felipe Ruiz-Hurtado; Rodrigo Andres Camelo-Munevar; Darwin Alexis Arrechea-Castillo; Rosa Noemi Jauregui; Juan Andres Cardoso Arango
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/DBGUFW
Area covered
Palmira, Florencia, Caquetá, Colombia, Cauca, Colombia, Santander de Quilichao
Dataset funded by
CGIAR Fund
Description
This dataset contains orthomosaics and individual Regions of Interest (ROIs) of forage grasses in crop fields from experimental trials of CIAT’s tropical forages breeding program, together with annotations in Common Objects in Context (COCO) format derived from that data. The ROIs were manually annotated on UAV imagery and exported in COCO format, which is compatible with different machine learning models and architectures. The dataset contains 9,554 ROIs in the geospatial data and 12,365 annotations of forage grasses in COCO format. Methodology: The dataset was generated through a multi-step process beginning with data acquisition over forage crop fields via UAV flights (DJI Phantom 4 Multispectral drone), with RTK determining the geolocation. These images were processed in Agisoft Metashape to generate georeferenced orthomosaics as raster files. Manual annotation of forage grass ROIs was performed in QGIS, and the geospatial data for 8 different orthomosaics was later converted to COCO format using custom Python scripts. To ensure compatibility with COCO standards and optimize training efficiency, the large orthomosaics were clipped to the annotations’ extents with an additional 1% spatial buffer and split into tiles with a maximum dimension close to 1024 pixels for the larger side and 25% overlap.
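To make the COCO conversion step more concrete, the sketch below builds a single COCO-style annotation record from a polygon. It is an illustration only, not the authors' script: it assumes the ROI polygon has already been transformed from map coordinates into the pixel coordinates of its tile, and the function name `polygon_to_coco` and the sample polygon are hypothetical. The record keys (`segmentation`, `bbox` as [x, y, width, height], `area`, `iscrowd`) follow the COCO object-detection annotation layout.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in tile pixel coordinates


def polygon_to_coco(polygon: Sequence[Point], annotation_id: int,
                    image_id: int, category_id: int) -> Dict:
    """Build one COCO-style annotation dict from a simple (non-self-intersecting) polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]

    # Shoelace formula for the polygon area in square pixels.
    twice_area = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1

    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list.
        "segmentation": [[coord for point in polygon for coord in point]],
        # COCO bounding boxes are [x_min, y_min, width, height].
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": abs(twice_area) / 2.0,
        "iscrowd": 0,
    }


# Hypothetical rectangular ROI, 100 x 50 pixels.
roi = [(10.0, 20.0), (110.0, 20.0), (110.0, 70.0), (10.0, 70.0)]
annotation = polygon_to_coco(roi, annotation_id=1, image_id=1, category_id=1)
print(annotation["bbox"], annotation["area"])
```

A complete COCO file would also need the `images` and `categories` lists, and ROIs that straddle overlapping tiles would need clipping to each tile, which this sketch does not attempt.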

Global Image Annotation Tool Market Research Report: By Application (Object...

wiseguyreports.com

Updated Jul 23, 2024

Cite

Wiseguy Research Consultants Pvt Ltd (2024). Global Image Annotation Tool Market Research Report: By Application (Object Detection and Recognition, Image Classification, Image Segmentation, Image Generation, Image Editing and Enhancement), By End User (Automotive, Healthcare, Retail, Media and Entertainment, Education, Manufacturing), By Deployment Mode (Cloud-Based, On-Premise, Hybrid), By Access Type (Licensed Software, Software as a Service (SaaS), Open Source), By Image Type (2D Images, 3D Images, Medical Images) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/cn/reports/image-annotation-tool-market

Explore at:

Dataset updated

Jul 23, 2024

Dataset authored and provided by

Wiseguy Research Consultants Pvt Ltd

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Jan 7, 2024

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2024
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023	4.1 (USD Billion)
MARKET SIZE 2024	4.6 (USD Billion)
MARKET SIZE 2032	11.45 (USD Billion)
SEGMENTS COVERED	Application, End User, Deployment Mode, Access Type, Image Type, Regional
COUNTRIES COVERED	North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS	Growing AI, ML, and DL adoption; increasing demand for image analysis and object recognition; cloud-based deployment and subscription-based pricing models; emergence of semi-automated and automated annotation tools; competitive landscape with established vendors and new entrants
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Tech Mahindra, Capgemini, Whizlabs, Cognizant, Tata Consultancy Services, Larsen & Toubro Infotech, HCL Technologies, IBM, Accenture, Infosys BPM, Genpact, Wipro, Infosys, DXC Technology
MARKET FORECAST PERIOD	2024 - 2032
KEY MARKET OPPORTUNITIES	(1) AI and ML advancements; (2) growing big data analytics; (3) cloud-based image annotation tools; (4) image annotation for medical imaging; (5) geospatial image annotation
COMPOUND ANNUAL GROWTH RATE (CAGR)	12.08% (2024 - 2032)
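
As a rough consistency check on the table above, the compound annual growth rate can be recomputed from the 2024 and 2032 market sizes. The helper name `cagr` is hypothetical, and the calculation assumes the standard definition CAGR = (end / start)^(1 / years) - 1 over the 8 years from 2024 to 2032.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures taken from the table: USD 4.6 billion (2024), USD 11.45 billion (2032).
rate = cagr(4.6, 11.45, 2032 - 2024)
print(f"Implied CAGR 2024-2032: {rate:.2%}")
```

The result lands within about a hundredth of a percentage point of the reported 12.08%; the small gap is plausibly due to rounding in the published market sizes.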

Data from: An empirical study of the semantic similarity of geospatial...
tandf.figshare.com
pdf
Updated May 30, 2023
Cite
Niloofar Aflaki; Kristin Stock; Christopher B. Jones; Hans Guesgen; Jeremy Morley (2023). An empirical study of the semantic similarity of geospatial prepositions and their senses [Dataset]. http://doi.org/10.6084/m9.figshare.20517959.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.20517959.v1
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Niloofar Aflaki; Kristin Stock; Christopher B. Jones; Hans Guesgen; Jeremy Morley
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Spatial prepositions have been studied in some detail from multiple disciplinary perspectives. However, neither the semantic similarity of these prepositions, nor the relationships between the multiple senses of different spatial prepositions, are well understood. In an empirical study of 24 spatial prepositions, we identify the degree and nature of semantic similarity and extract senses for three semantically similar groups of prepositions using t-SNE, DBSCAN clustering, and Venn diagrams. We validate the work by manual annotation with another data set. We find nuances in meaning among proximity and adjacency prepositions, such as the use of "close to" instead of "near" for pairs of lines, and the importance of proximity over contact for the "next to" preposition, in contrast to other adjacency prepositions.
Disaster Preparedness Information_River Warning
data.gov.tw
csv
Updated Jun 1, 2025
Cite
Water Resources Agency, Ministry of Economic Affairs (2025). Disaster Preparedness Information_River Warning [Dataset]. https://data.gov.tw/en/datasets/5983
Explore at:
Available download formats: csv
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Water Resources Agency, Ministry of Economic Affairs
License
https://data.gov.tw/license
Description
The Disaster Emergency Response Team of the Water Resources Agency, Ministry of Economic Affairs, draws on long-term disaster response experience and combines it with real-time data such as rainfall, water levels, and reservoir levels to provide computer-generated water level alerts to the public and relevant units. This helps people understand the risk of home flooding, prepare early, and reduce the occurrence of disasters. This dataset links to a list of Keyhole Markup Language (KML) files. KML is a markup language based on the eXtensible Markup Language (XML) syntax standard for expressing geospatial annotations; it was originally developed by Keyhole, Inc., which Google later acquired. Documents written in KML are referred to as KML files and are used in Google Earth-related software (Google Earth, Google Maps, Google Maps for mobile, etc.) to display geospatial data. Many GIS-related systems now also use this format for geospatial data exchange. The KML files in this dataset use UTF-8 encoding.
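As an illustration of the KML format mentioned above, the following minimal sketch reads placemark names and point coordinates with Python's standard library. The sample document and the function name `read_placemarks` are made up for illustration and are not taken from the dataset; the sketch also assumes the files use the OGC KML 2.2 namespace, whereas older files may declare a different one.

```python
import xml.etree.ElementTree as ET

# Assumption: the files declare the OGC KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical sample, not real data from the River Warning dataset.
SAMPLE_KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example river station</name>
      <Point><coordinates>121.5,25.05,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>
"""


def read_placemarks(kml_bytes):
    """Return (name, longitude, latitude) for each Placemark with coordinates."""
    root = ET.fromstring(kml_bytes)
    results = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            ".//kml:coordinates", default="", namespaces=KML_NS
        )
        if not coords.strip():
            continue  # skip placemarks without geometry
        # KML stores tuples as "lon,lat[,alt]"; take the first tuple only.
        lon, lat = coords.split()[0].split(",")[:2]
        results.append((name, float(lon), float(lat)))
    return results


placemarks = read_placemarks(SAMPLE_KML.encode("utf-8"))
```

Passing bytes rather than a decoded string lets the parser honour the encoding declared in the XML header, which matters for UTF-8 files containing non-ASCII place names.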
GANDR: Georelating-Annotated Natural Disaster Reports
zenodo.org
json
Updated Jun 7, 2025
Cite
Kai Moltzen; Ricardo Usbeck; Junbo Huang (2025). GANDR: Georelating-Annotated Natural Disaster Reports [Dataset]. http://doi.org/10.5281/zenodo.15612556
Explore at:
Available download formats: json
Unique identifier
https://doi.org/10.5281/zenodo.15612556
Dataset updated
Jun 7, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kai Moltzen; Ricardo Usbeck; Junbo Huang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To facilitate research on georelating (inferring unnamed disaster-affected areas from natural language), we construct GANDR, a silver-standard dataset of 2,000 synthetic disaster reports with annotated H3 DGGS cell indices and geospatial relations for the US and EU.
Data Annotation Services Market Size, Share and...
marketresearchintellect.com
Updated May 19, 2025
Cite
Market Research Intellect (2025). Data Annotation Services Market Size, Share and Trends Analysis 2033 [Dataset]. https://www.marketresearchintellect.com/pt/product/global-data-annotation-service-market-size-and-forecast/
Explore at:
Dataset updated
May 19, 2025
Dataset authored and provided by
Market Research Intellect
License
https://www.marketresearchintellect.com/pt/privacy-policy
Area covered
Global
Description
Market size and share are categorized by Image Annotation (Bounding Box Annotation, Polygon Annotation, Semantic Segmentation, 3D Cuboid Annotation, Image Classification), Text Annotation (Named Entity Recognition, Sentiment Analysis, Text Categorization, Part-of-Speech Tagging, Text Summarization), Video Annotation (Object Tracking, Action Recognition, Event Detection, Video Classification, Frame-by-Frame Annotation), Audio Annotation (Speech Recognition, Speaker Identification, Emotion Recognition, Transcription Services, Audio Classification), Sensor Data Annotation (Lidar Data Annotation, Radar Data Annotation, Depth Data Annotation, Time-Series Data Annotation, Geospatial Data Annotation), and geographic regions (North America, Europe, Asia-Pacific, South America, Middle East and Africa).
Data from: Preliminary geologic Map of the Perris 7.5' Quadrangle, Riverside...
dataone.org
Updated Dec 1, 2016
+ more versions
Cite
Douglas M. Morton (2016). Preliminary geologic Map of the Perris 7.5' Quadrangle, Riverside County, California [Dataset]. https://dataone.org/datasets/a6c99038-0cb3-424a-92ce-0ac305f1920d
Explore at:
Dataset updated
Dec 1, 2016
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Douglas M. Morton
Variables measured
DIP, SHD, LABL, NAME, LTYPE, PLABL, L-SYMB, P-SYMB, PTTYPE, SHDFIL, and 1 more
Description
This data set maps and describes the geology of the Perris 7.5' quadrangle, Riverside County, California. Created using Environmental Systems Research Institute's ARC/INFO software, the database consists of the following items: (1) a map coverage containing geologic contacts and units, (2) a coverage containing structural data, (3) a coverage containing geologic unit annotation and leaders, and (4) attribute tables for geologic units (polygons), contacts (arcs), and site-specific data (points). In addition, the data set includes the following graphic and text products: (1) a PostScript graphic plot file containing the geologic map, topography, cultural data, a Correlation of Map Units (CMU) diagram, a Description of Map Units (DMU), and a key for point and line symbols, and (2) PDF files of the Readme (including the metadata file as an appendix) and of the graphic produced by the PostScript plot file. The Perris quadrangle is located in the northern part of the Peninsular Ranges Province within the central part of the Perris block, a relatively stable area, rectangular in plan, located between the Elsinore and San Jacinto fault zones. The quadrangle is underlain by Cretaceous-age and older basement rocks. The Cretaceous plutonic rocks are part of the composite Peninsular Ranges batholith. A wide variety of intermediate-composition granitic rocks are located in the quadrangle. These rocks are mainly of tonalitic composition but range from monzogranite to diorite. Most rock is faintly to intensely foliated. Many are heterogeneous and contain varying amounts of meso- and melanocratic discoidal-shaped inclusions. Some rocks are composed essentially of inclusion material and some are migmatitic. Included within these granitic rocks are a few septa of Paleozoic(?) schist of upper amphibolite metamorphic grade. Metamorphic rocks of probable Mesozoic age occur in the southwest corner of the quadrangle. Most of these rocks are well-foliated phyllite of Mesozoic age.
The metamorphic grade of these rocks is greenschist or sub-greenschist. Rocks of probable Paleozoic age occur as scattered masses within plutonic rocks in the northern part of the quadrangle. These rocks are of amphibolite grade and include cordierite and sillimanite biotite schist. In the center and southeast quarter of the quadrangle, biotite-hornblende tonalite of the Lakeview Mountains pluton is characterized by ubiquitous schlieren and by a lack of potassium feldspar. Masses of leucocratic and melanocratic rock occur scattered throughout the pluton. Mesocratic- to melanocratic discoidal-shaped inclusions are oriented parallel to the schlieren. A small body of comb-layered gabbro is located with the tonalite near the southern margin of the pluton. The tonalite contains rare-earth-bearing, zoned pegmatite dikes. Biotite-hornblende tonalite located in the southwest part of the quadrangle is part of the Val Verde pluton. This tonalite is similar to that of the Lakeview Mountains pluton but lacks the ubiquitous schlieren and contains potassium feldspar. Diagonally crossing the quadrangle is the channel and flood plain of the ephemeral San Jacinto River. Most of the alluviated area west of the San Jacinto River consists of Pleistocene-age fluvial deposits, which have a degraded upper surface that is preserved in some places near the contact with granitic rocks. The upper part of these deposits forms the Paloma surface of Woodford and others (1971). A modern- to Holocene-age drainage channel is within these older Pleistocene deposits. Younger Pleistocene alluvial fans emanate from the Lakeview Mountains east of the San Jacinto River.

Cite

Data Insights Market (2025). Data Labeling Market Report [Dataset]. https://www.datainsightsmarket.com/reports/data-labeling-market-20383

Data Labeling Market Report

Explore at:

Available download formats: doc, ppt, pdf

Dataset updated

Mar 8, 2025

Dataset authored and provided by

Data Insights Market

License

https://www.datainsightsmarket.com/privacy-policy

Time period covered

2025 - 2033

Area covered

Global

Variables measured

Market Size

Description

The data labeling market is experiencing robust growth, projected to reach $3.84 billion in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 28.13% from 2025 to 2033. This expansion is fueled by the increasing demand for high-quality training data across various sectors, including healthcare, automotive, and finance, which heavily rely on machine learning and artificial intelligence (AI). The surge in AI adoption, particularly in areas like autonomous vehicles, medical image analysis, and fraud detection, necessitates vast quantities of accurately labeled data. The market is segmented by sourcing type (in-house vs. outsourced), data type (text, image, audio), labeling method (manual, automatic, semi-supervised), and end-user industry. Outsourcing is expected to dominate the sourcing segment due to cost-effectiveness and access to specialized expertise. Similarly, image data labeling is likely to hold a significant share, given the visual nature of many AI applications. The shift towards automation and semi-supervised techniques aims to improve efficiency and reduce labeling costs, though manual labeling will remain crucial for tasks requiring high accuracy and nuanced understanding. Geographical distribution shows strong potential across North America and Europe, with Asia-Pacific emerging as a key growth region driven by increasing technological advancements and digital transformation. Competition in the data labeling market is intense, with a mix of established players like Amazon Mechanical Turk and Appen, alongside emerging specialized companies. The market's future trajectory will likely be shaped by advancements in automation technologies, the development of more efficient labeling techniques, and the increasing need for specialized data labeling services catering to niche applications. Companies are focusing on improving the accuracy and speed of data labeling through innovations in AI-powered tools and techniques. 
Furthermore, the rise of synthetic data generation offers a promising avenue for supplementing real-world data, potentially addressing data scarcity challenges and reducing labeling costs in certain applications. This will, however, require careful attention to ensure that the synthetic data generated is representative of real-world data to maintain model accuracy. This comprehensive report provides an in-depth analysis of the global data labeling market, offering invaluable insights for businesses, investors, and researchers. The study period covers 2019-2033, with 2025 as the base and estimated year, and a forecast period of 2025-2033. We delve into market size, segmentation, growth drivers, challenges, and emerging trends, examining the impact of technological advancements and regulatory changes on this rapidly evolving sector. The market is projected to reach multi-billion dollar valuations by 2033, fueled by the increasing demand for high-quality data to train sophisticated machine learning models.
Recent developments include:
September 2024: The National Geospatial-Intelligence Agency (NGA) is poised to invest heavily in artificial intelligence, earmarking up to USD 700 million for data labeling services over the next five years. This initiative aims to enhance NGA's machine-learning capabilities, particularly in analyzing satellite imagery and other geospatial data. The agency has opted for a multi-vendor indefinite-delivery/indefinite-quantity (IDIQ) contract, emphasizing the importance of annotating raw data, whether images or videos, to render it understandable for machine learning models. For instance, when dealing with satellite imagery, the focus could be on labeling distinct entities such as buildings, roads, or patches of vegetation.
October 2023: Refuel.ai unveiled a new platform, Refuel Cloud, and a specialized large language model (LLM) for data labeling. Refuel Cloud harnesses advanced LLMs, including its proprietary model, to automate data cleaning, labeling, and enrichment at scale, catering to diverse industry use cases. Recognizing that clean data underpins modern AI and data-centric software, Refuel Cloud addresses the historical challenge of human labor bottlenecks in data production. With Refuel Cloud, enterprises can generate the expansive, precise datasets they require in minutes, a task that traditionally spanned weeks.
Key drivers for this market are: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Potential restraints include: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Notable trends are: Healthcare is Expected to Witness Remarkable Growth.
