55 datasets found
  1. Data Labeling Market Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Mar 8, 2025
    Cite
    Data Insights Market (2025). Data Labeling Market Report [Dataset]. https://www.datainsightsmarket.com/reports/data-labeling-market-20383
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 8, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The data labeling market is experiencing robust growth, projected to reach $3.84 billion in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 28.13% from 2025 to 2033. This expansion is fueled by the increasing demand for high-quality training data across various sectors, including healthcare, automotive, and finance, which heavily rely on machine learning and artificial intelligence (AI). The surge in AI adoption, particularly in areas like autonomous vehicles, medical image analysis, and fraud detection, necessitates vast quantities of accurately labeled data. The market is segmented by sourcing type (in-house vs. outsourced), data type (text, image, audio), labeling method (manual, automatic, semi-supervised), and end-user industry. Outsourcing is expected to dominate the sourcing segment due to cost-effectiveness and access to specialized expertise. Similarly, image data labeling is likely to hold a significant share, given the visual nature of many AI applications. The shift towards automation and semi-supervised techniques aims to improve efficiency and reduce labeling costs, though manual labeling will remain crucial for tasks requiring high accuracy and nuanced understanding. Geographical distribution shows strong potential across North America and Europe, with Asia-Pacific emerging as a key growth region driven by increasing technological advancements and digital transformation. Competition in the data labeling market is intense, with a mix of established players like Amazon Mechanical Turk and Appen, alongside emerging specialized companies. The market's future trajectory will likely be shaped by advancements in automation technologies, the development of more efficient labeling techniques, and the increasing need for specialized data labeling services catering to niche applications. Companies are focusing on improving the accuracy and speed of data labeling through innovations in AI-powered tools and techniques. 
    Furthermore, the rise of synthetic data generation offers a promising avenue for supplementing real-world data, potentially addressing data scarcity challenges and reducing labeling costs in certain applications. This will, however, require careful attention to ensure that the synthetic data generated is representative of real-world data to maintain model accuracy. This comprehensive report provides an in-depth analysis of the global data labeling market, offering invaluable insights for businesses, investors, and researchers. The study period covers 2019-2033, with 2025 as the base and estimated year, and a forecast period of 2025-2033. We delve into market size, segmentation, growth drivers, challenges, and emerging trends, examining the impact of technological advancements and regulatory changes on this rapidly evolving sector. The market is projected to reach multi-billion dollar valuations by 2033, fueled by the increasing demand for high-quality data to train sophisticated machine learning models.

    Recent developments include:
    September 2024: The National Geospatial-Intelligence Agency (NGA) is poised to invest heavily in artificial intelligence, earmarking up to USD 700 million for data labeling services over the next five years. This initiative aims to enhance NGA's machine-learning capabilities, particularly in analyzing satellite imagery and other geospatial data. The agency has opted for a multi-vendor indefinite-delivery/indefinite-quantity (IDIQ) contract, emphasizing the importance of annotating raw data, whether images or videos, to render it understandable for machine learning models. For instance, when dealing with satellite imagery, the focus could be on labeling distinct entities such as buildings, roads, or patches of vegetation.
    October 2023: Refuel.ai unveiled a new platform, Refuel Cloud, and a specialized large language model (LLM) for data labeling. Refuel Cloud harnesses advanced LLMs, including its proprietary model, to automate data cleaning, labeling, and enrichment at scale, catering to diverse industry use cases. Recognizing that clean data underpins modern AI and data-centric software, Refuel Cloud addresses the historical challenge of human labor bottlenecks in data production. With Refuel Cloud, enterprises can swiftly generate the expansive, precise datasets they require in mere minutes, a task that traditionally spanned weeks.

    Key drivers for this market are: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Potential restraints include: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Notable trends are: Healthcare is Expected to Witness Remarkable Growth.
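
The headline figures in this description (USD 3.84 billion in 2025, a 28.13% CAGR through 2033) imply a compound projection. The sketch below works that arithmetic through, assuming the rate compounds once per year over the eight years from 2025 to 2033; the function name `project_value` is illustrative and does not come from the report.

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound `base` at the annual rate `cagr` for `years` years."""
    return base * (1.0 + cagr) ** years


base_2025 = 3.84   # USD billions, as quoted in the description
cagr = 0.2813      # 28.13% per year, as quoted in the description

# Assumption: annual compounding over 2033 - 2025 = 8 periods.
projected_2033 = project_value(base_2025, cagr, 2033 - 2025)
print(f"Implied 2033 market size: USD {projected_2033:.1f} billion")
```

Under these assumptions the implied 2033 figure is on the order of USD 28 billion, which is consistent with the "multi-billion dollar" wording above.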

  2. Geospatial Dataset

    • universe.roboflow.com
    zip
    Updated Jun 9, 2025
    Cite
    Rahman (2025). Geospatial Dataset [Dataset]. https://universe.roboflow.com/rahman-nlvei/geospatial-dataset/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 9, 2025
    Dataset authored and provided by
    Rahman
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Geospatial Polygons
    Description

    Geospatial Dataset

    ## Overview
    
    Geospatial Dataset is a dataset for instance segmentation tasks - it contains Geospatial annotations for 1,048 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Two residential districts datasets from Kielce, Poland for building semantic...

    • scidb.cn
    Updated Sep 29, 2022
    Cite
    Agnieszka Łysak (2022). Two residential districts datasets from Kielce, Poland for building semantic segmentation task [Dataset]. http://doi.org/10.57760/sciencedb.02955
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 29, 2022
    Dataset provided by
    Science Data Bank
    Authors
    Agnieszka Łysak
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Poland, Kielce
    Description

    Today, deep neural networks are widely used in many computer vision problems, including for geographic information systems (GIS) data. This type of data is commonly used for urban analyses and spatial planning. We used orthophotographic images of two residential districts from Kielce, Poland for research including automatic urban sprawl analysis with a Transformer-based neural network.

    Orthophotomaps were obtained from the Kielce GIS portal. Then, the map was manually masked into building and building surroundings classes. Finally, the orthophotomap and the corresponding classification mask were simultaneously divided into small tiles. This approach is common in image data preprocessing for the learning phase of machine learning algorithms. The data contains two original orthophotomaps from the Wietrznia and Pod Telegrafem residential districts with corresponding masks, and also their tiled versions, ready to be provided as training data for machine learning models.

    A Transformer-based neural network was trained on the Wietrznia dataset for semantic segmentation of the tiles into buildings and surroundings classes. After that, model inference was used to test the model's generalization ability on the Pod Telegrafem dataset. The efficiency of the model was satisfactory, so it can be used for automatic semantic building segmentation. Then, the process of dividing the images can be reversed and the complete classification mask retrieved. This mask can be used for building area calculations and urban sprawl monitoring, if the research were repeated for GIS data from a wider time horizon.

    Since the dataset was collected from the Kielce GIS portal, as part of the Polish Main Office of Geodesy and Cartography data resource, it may be used only for non-profit and non-commercial purposes, in private or scientific applications, under the law "Ustawa z dnia 4 lutego 1994 r. o prawie autorskim i prawach pokrewnych (Dz.U. z 2006 r. nr 90 poz 631 z późn. zm.)" (Act of 4 February 1994 on Copyright and Related Rights). There are no other legal or ethical considerations in reuse potential.

    Data information is presented below.
    wietrznia_2019.jpg - orthophotomap of Wietrznia district - used for model's training, as an explanatory image
    wietrznia_2019.png - classification mask of Wietrznia district - used for model's training, as a target image
    wietrznia_2019_validation.jpg - one image from Wietrznia district - used for model's validation during training phase
    pod_telegrafem_2019.jpg - orthophotomap of Pod Telegrafem district - used for model's evaluation after training phase
    wietrznia_2019 - folder with wietrznia_2019.jpg (image) and wietrznia_2019.png (annotation) images, divided into 810 tiles (512 x 512 pixels each); tiles with no information were manually removed, so the training data would contain only informative tiles - tiles were presented to the model during training (images and annotations for fitting the model to the data)
    wietrznia_2019_vaidation - folder with wietrznia_2019_validation.jpg image divided into 16 tiles (256 x 256 pixels each) - tiles were presented to the model during training (images for validating the model's efficiency); it was not part of the training data
    pod_telegrafem_2019 - folder with pod_telegrafem.jpg image divided into 196 tiles (256 x 256 pixels each) - tiles were presented to the model during inference (images for evaluating the model's robustness)

    The dataset was created as described below.

    Firstly, the orthophotomaps were collected from Kielce Geoportal (https://gis.kielce.eu). Kielce Geoportal offers a .pst recent map from April 2019. It is an orthophotomap with a resolution of 5 x 5 pixels, constructed from a plane flight at 700 meters over ground height, taken with a camera for vertical photos. Downloading was done by WMS in open-source QGIS software (https://www.qgis.org), as a 1:500 scale map, then converted to a 1200 dpi PNG image.

    Secondly, the map of the Wietrznia residential district was manually labelled, also in QGIS, in the same scope as the orthophotomap. Annotation based on land cover map information was also obtained from Kielce Geoportal. There are two classes: residential building and surrounding. The second map, from the Pod Telegrafem district, was not annotated, since it was used in the testing phase and imitates a situation where there is no annotation for the new data presented to the model.

    Next, the images were converted to RGB JPG images, and the annotation map was converted to an 8-bit GRAY PNG image.

    Finally, the Wietrznia data files were tiled into 512 x 512 pixel tiles with the Python PIL library. Tiles with no information or a relatively small amount of information (only white background or mostly white background) were manually removed. So, from the 29113 x 15938 pixel orthophotomap, only 810 tiles with corresponding annotations were left, ready to train the machine learning model for the semantic segmentation task. The Pod Telegrafem orthophotomap was tiled with no manual removal, so from the 7168 x 7168 pixel orthophotomap 197 tiles with 256 x 256 pixel resolution were created. There was also an image of one residential building, used for model validation during the training phase; it was not part of the training data, but was part of the Wietrznia residential area. It was a 2048 x 2048 pixel orthophotomap, tiled into 16 tiles of 256 x 256 pixels each.
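
The tiling step described above can be sketched without any imaging library by computing the crop boxes first. This is a minimal illustration, not the authors' script: the function name `tile_boxes` is hypothetical, and dropping partial tiles at the right and bottom edges is an assumption. The boxes use the (left, upper, right, lower) order that PIL's `Image.crop` accepts.

```python
from typing import List, Tuple

# (left, upper, right, lower), the box order accepted by PIL's Image.crop
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int) -> List[Box]:
    """Return crop boxes for every full tile x tile square, row by row.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    boxes: List[Box] = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


# Image size quoted in the description for the Wietrznia orthophotomap.
boxes = tile_boxes(29113, 15938, 512)
print(len(boxes))  # number of full 512 x 512 tiles before any manual removal
```

With these assumptions the 29113 x 15938 image gives 56 x 31 full tiles, from which the description's manual removal of uninformative tiles would then select the final training set.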

  4. Data from: Geospatial Dataset on Deforestation and Urban Sprawl in Dhaka,...

    • data.mendeley.com
    Updated May 28, 2025
    Cite
    Md Fahad Khan (2025). Geospatial Dataset on Deforestation and Urban Sprawl in Dhaka, Bangladesh: A Resource for Environmental Analysis [Dataset]. http://doi.org/10.17632/hst78yczmy.5
    Dataset updated
    May 28, 2025
    Authors
    Md Fahad Khan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, Dhaka
    Description

    Google Earth Pro facilitated the acquisition of satellite imagery to monitor deforestation in Dhaka, Bangladesh. Multiple years of images were systematically captured from specific locations, allowing comprehensive analysis of tree cover reduction. The imagery displays diverse aspect ratios based on satellite perspectives and possesses high resolution, suitable for remote sensing. Each site provided 5 to 35 images annually, accumulating data over a ten-year period. The dataset classifies images into three primary categories: tree cover, deforested regions, and masked images. Organized by year, it comprises both raw and annotated images, each paired with a JSON file containing annotations and segmentation masks. This organization enhances accessibility and temporal analysis. Furthermore, the dataset is conducive to machine learning initiatives, particularly in training models for object detection and segmentation to evaluate environmental alterations.

  5. Chatham County - Parcel Annotation

    • opendata-chathamncgis.opendata.arcgis.com
    • hub.arcgis.com
    Updated Oct 3, 2016
    Cite
    Chatham County GIS Portal (2016). Chatham County - Parcel Annotation [Dataset]. https://opendata-chathamncgis.opendata.arcgis.com/maps/9fd1ad747e0a45b0bae38566021dad04
    Dataset updated
    Oct 3, 2016
    Dataset authored and provided by
    Chatham County GIS Portal
    Description

    Annotation feature class that provides labels for property boundary lengths and acreage of parcels in Chatham County, NC. This service also provides annotation for easements in the Chatham County parlines feature class.

    The annotation feature class is maintained by the Chatham County GIS & Tax departments and is updated on a daily basis. Chatham GIS SOP: "MAPSERV-163"

  6. Assessor Base Map Annotation

    • hub.arcgis.com
    Updated Oct 6, 2015
    Cite
    Clark County GIS Management Office (2015). Assessor Base Map Annotation [Dataset]. https://hub.arcgis.com/maps/36d39996ff15407487b8e63a93e4a51b
    Dataset updated
    Oct 6, 2015
    Dataset authored and provided by
    Clark County GIS Management Office
    Description

    Annotation for the Assessor's GIS data. This service is used in the OpenWeb and Opendoor applications.

  7. Coast Train--Labeled imagery for training and evaluation of data-driven...

    • data.usgs.gov
    • catalog.data.gov
    Updated Aug 31, 2024
    Cite
    Phillipe Wernette; Daniel Buscombe; Jaycee Favela; Sharon Fitzpatrick; Evan Goldstein; Nicholas Enwright; Erin Dunand (2024). Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation [Dataset]. http://doi.org/10.5066/P91NP87I
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    United States Geological Survey http://www.usgs.gov/
    Authors
    Phillipe Wernette; Daniel Buscombe; Jaycee Favela; Sharon Fitzpatrick; Evan Goldstein; Nicholas Enwright; Erin Dunand
    License

    U.S. Government Works https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jan 1, 2008 - Dec 31, 2020
    Description

    Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from both geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes us ...
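
As far as the truncated description shows it, the naming convention `{datasource}_{numberofclasses}_{threedigitdatasetversion}.zip` can be split mechanically. The sketch below is one way to do that; the helper `parse_name` and the file name `NAIP_11_001.zip` are hypothetical examples, not taken from the dataset, and splitting from the right is an assumption made so that a data source containing underscores would still parse.

```python
from typing import NamedTuple


class CoastTrainName(NamedTuple):
    datasource: str
    number_of_classes: int
    version: str  # kept as text to preserve leading zeros, e.g. "001"


def parse_name(filename: str) -> CoastTrainName:
    """Split '{datasource}_{numberofclasses}_{version}.zip' into its parts."""
    stem = filename[: -len(".zip")] if filename.endswith(".zip") else filename
    # Split from the right so only the last two underscores are treated as separators.
    datasource, n_classes, version = stem.rsplit("_", 2)
    return CoastTrainName(datasource, int(n_classes), version)


example = parse_name("NAIP_11_001.zip")  # hypothetical file name
print(example)
```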

  8. MEDDOPLACE Corpus: Gold Standard annotations for Medical Documents...

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1more
    Updated Mar 8, 2023
    Cite
    Salvador Lima López; Eulàlia Farré-Maduell; Vicent Briva-Iglesias; Luis Gasco; Martin Krallinger (2023). MEDDOPLACE Corpus: Gold Standard annotations for Medical Documents Place-related Content Extraction [Dataset]. http://doi.org/10.5281/zenodo.8403498
    Dataset updated
    Mar 8, 2023
    Authors
    Salvador Lima López; Eulàlia Farré-Maduell; Vicent Briva-Iglesias; Luis Gasco; Martin Krallinger
    Description

    MEDDOPLACE stands for MEDical DOcument PLAce-related Content Extraction. It is a shared task and set of resources focused on the detection, normalization (entity linking/toponym resolution) and classification of different kinds of places, as well as related types of information such as clinical departments, nationalities or patient movements, in medical documents in Spanish. This repository includes the corpus' train and test sets in multiple formats, as well as the SNOMED gazetteer, cross-mapping between SNOMED and MeSH and the multilingual silver standard in 8 languages (Catalan, English, French, Italian, Dutch, Portuguese, Romanian and Swedish). For more information, please check the attached README file. MEDDOPLACE was developed by the Barcelona Supercomputing Center's NLP for Biomedical Information Analysis and used as part of IberLEF 2023. For more information on the corpus, annotation scheme and task in general, please visit: https://temu.bsc.es/meddoplace. Please cite if you use this resource: Salvador Lima-López, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Brivá-Iglesias and Martin Krallinger. NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts. In Procesamiento del Lenguaje Natural, 67. 2021. 
    @article{meddoplace,
      title = {MEDDOPLACE Shared Task overview: recognition, normalization and classification of locations and patient movement in clinical texts},
      author = {Lima-López, Salvador and Farré-Maduell, Eulàlia and Brivá-Iglesias, Vicent and Gasco-Sanchez, Luis and Krallinger, Martin},
      journal = {Procesamiento del Lenguaje Natural},
      volume = {71},
      year = {2023},
      issn = {1135-5948},
      DOI = {10.26342/2023-71-23},
      url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6561/3961},
      pages = {301--311}
    }

    Related Links:
    - MEDDOPLACE website: https://temu.bsc.es/meddoplace
    - MEDDOPLACE overview paper: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6561
    - Annotation Guidelines (Spanish): https://doi.org/10.5281/zenodo.7775234
    - Annotation Guidelines (English): https://doi.org/10.5281/zenodo.7928145

    License
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Contact
    If you have any questions or suggestions, please contact us at:
    - Salvador Lima-López
    - Martin Krallinger

  9. PipedWaterAfrica: Geospatial Dataset of Water and Sanitation Access

    • zenodo.org
    csv, zip
    Updated May 28, 2025
    Cite
    Othmane Echchabi; Aya Lahlou; Nizar Talty; Josh Manto; Ka Leung Lam (2025). PipedWaterAfrica: Geospatial Dataset of Water and Sanitation Access [Dataset]. http://doi.org/10.48550/arxiv.2411.19093
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    May 28, 2025
    Dataset provided by
    Duke Kunshan University
    Columbia University
    Authors
    Othmane Echchabi; Aya Lahlou; Nizar Talty; Josh Manto; Ka Leung Lam
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 26, 2025
    Description

    This dataset contains 256x256 pixel satellite images obtained from Sentinel and Landsat of areas across Africa, annotated for the presence or absence of piped water and sewage access. The data was curated using ground-truth survey information from Afrobarometer and cleaned to ensure high-quality annotations. The dataset is intended for research and analysis in remote sensing, water and sanitation infrastructure, and sustainable development applications.

  10. Seatizen Atlas

    • data.niaid.nih.gov
    Updated Apr 11, 2025
    Cite
    Julien Barde (2025). Seatizen Atlas [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11125847
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Julien Barde
    Sylvain Bonhommeau
    Victor Illien
    Alexis Joly
    Matteo Contini
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This deposit offers a comprehensive collection of geospatial and metadata files that constitute the Seatizen Atlas dataset, facilitating the management and analysis of spatial information. To navigate through the data, you can use an interface available at seatizenmonitoring.ifremer.re, which provides a condensed CSV file tailored to your choice of metadata and the selected area. To retrieve the associated images, you will need to use a script that extracts the relevant frames. A brief tutorial is available here: Tutorial. All the scripts for processing sessions, creating the geopackage, and generating files can be found here: SeatizenDOI github repository. The repository includes:

    seatizen_atlas_db.gpkg: geopackage file that stores extensive geospatial data, allowing for efficient management and analysis of spatial information.
    session_doi.csv: a CSV file listing all sessions published on Zenodo. This file contains the following columns:

    session_name: identifies the session.
    session_doi: indicates the URL of the session.
    place: indicates the location of the session.
    date: indicates the date of the session.
    raw_data: indicates whether the session contains raw data or not.
    processed_data: indicates whether the session contains processed data.
    metadata_images.csv: a CSV file describing all metadata for each image published in open access. This file contains the following columns:

    OriginalFileName: indicates the original name of the photo.
    FileName: indicates the name of the photo adapted to the naming convention adopted by the Seatizen team (i.e., YYYYMMDD_COUNTRYCODE-optionalplace_device_session-number_originalimagename).
    relative_file_path: indicates the path of the image in the deposit.
    frames_doi: indicates the DOI of the version where the image is located.
    GPSLatitude: indicates the latitude of the image (if available).
    GPSLongitude: indicates the longitude of the image (if available).
    GPSAltitude: indicates the depth of the frame (if available).
    GPSRoll: indicates the roll of the image (if available).
    GPSPitch: indicates the pitch of the image (if available).
    GPSTrack: indicates the track of the image (if available).
    GPSDatetime: indicates when the frame was taken (if available).
    GPSFix: indicates GNSS quality levels (if available).
    metadata_multilabel_predictions.csv: a CSV file describing all predictions from the latest multilabel model with georeferenced data. This file contains the following columns:

    FileName: indicates the name of the photo adapted to the naming convention adopted by the Seatizen team (i.e., YYYYMMDD_COUNTRYCODE-optionalplace_device_session-number_originalimagename).
    frames_doi: indicates the DOI of the version where the image is located.
    GPSLatitude: indicates the latitude of the image (if available).
    GPSLongitude: indicates the longitude of the image (if available).
    GPSAltitude: indicates the depth of the frame (if available).
    GPSRoll: indicates the roll of the image (if available).
    GPSPitch: indicates the pitch of the image (if available).
    GPSTrack: indicates the track of the image (if available).
    GPSFix: indicates GNSS quality levels (if available).
    prediction_doi: refers to a specific AI model prediction on the current image (if available).
    A column for each class predicted by the AI model.
    metadata_multilabel_annotation.csv: a CSV file listing the subset of all the images that are annotated, along with their annotations. This file contains the following columns:

    FileName: indicates the name of the photo.
    frame_doi: indicates the DOI of the version where the image is located.
    relative_file_path: indicates the path of the image in the deposit.
    annotation_date: indicates the date when the image was annotated.
    A column for each class with values:

    1: if the class is present.
    0: if the class is absent.
    -1: if the class was not annotated.
    seatizen_atlas.qgz: a QGIS project which formats and highlights the geopackage file to facilitate data visualization.
    darwincore_multilabel_annotations.zip: a Darwin Core Archive (DwC-A) file listing the subset of all the images that are annotated, along with their annotations.
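
The 1 / 0 / -1 encoding described for metadata_multilabel_annotation.csv can be read with the standard library alone. The following is a sketch under stated assumptions: the sample rows, the DOI and path values, and the class names `Acropora` and `Sand` are placeholders invented for illustration, and treating every column other than the four listed metadata columns as a class column is an assumption.

```python
import csv
import io

# Hypothetical excerpt in the layout described above; class names are placeholders.
SAMPLE = """FileName,frame_doi,relative_file_path,annotation_date,Acropora,Sand
img_a.jpg,doi-a,path/a.jpg,2024-01-01,1,0
img_b.jpg,doi-b,path/b.jpg,2024-01-02,-1,1
"""

# Columns named in the description that are not class columns.
META_COLUMNS = {"FileName", "frame_doi", "relative_file_path", "annotation_date"}


def classes_with_value(row: dict, wanted: int) -> list:
    """Return the class columns of `row` whose value equals `wanted` (1, 0 or -1)."""
    return [
        name
        for name, value in row.items()
        if name not in META_COLUMNS and int(value) == wanted
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
present = {row["FileName"]: classes_with_value(row, 1) for row in rows}
not_annotated = {row["FileName"]: classes_with_value(row, -1) for row in rows}
print(present)
print(not_annotated)
```

Keeping -1 separate from 0 matters when training: a class that was never annotated for an image is not evidence that the class is absent.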

  11. Geodatabase to Shapefile Warning Tool

    • gisdata.mn.gov
    esri_toolbox
    Updated Apr 1, 2025
    Cite
    University of Minnesota (2025). Geodatabase to Shapefile Warning Tool [Dataset]. https://gisdata.mn.gov/dataset/gdb-to-shp-warning-tool
    Explore at:
    Available download formats: esri_toolbox
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    University of Minnesota
    Description

    The Geodatabase to Shapefile Warning Tool examines feature classes in input file geodatabases for characteristics and data that would be lost or altered if they were transformed into a shapefile. Checks include:
    1) large files (feature classes with more than 255 fields or over 2GB), 2) field names longer than 10 characters, 3) string fields longer than 254 characters, 4) date fields with time values, 5) NULL values, 6) BLOB, guid, global id, and raster field types, 7) attribute domains or subtypes, and 8) annotation or topology

    The results of this inspection are written to a text file ("warning_report_[geodatabase_name]") in the directory where the geodatabase is located. A section at the top provides a list of feature classes and information about the geodatabase as a whole. The report has a section for each valid feature class that returned a warning, with a summary of possible warnings and then more details about issues found.

    The tool can process multiple file geodatabases at once. A separate text file report will be created for each geodatabase. The toolbox was created using ArcGIS Pro 3.7.11.

    For more information about this and other related tools, explore the Geospatial Data Curation toolkit
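
A few of the checks listed above are simple limits that can be illustrated in plain Python. This is not the tool's implementation and does not use ArcGIS or arcpy: the function `shapefile_warnings`, the dictionary layout for field definitions, and the type labels are all hypothetical, chosen only to show the field-count, field-name, string-length, and field-type rules.

```python
from typing import Dict, List

# Hypothetical labels for field types the description lists as problematic.
UNSUPPORTED_TYPES = {"blob", "guid", "global_id", "raster"}


def shapefile_warnings(fields: Dict[str, dict]) -> List[str]:
    """Return warnings for field definitions that would not survive shapefile export.

    `fields` maps a field name to a spec such as {"type": "string", "length": 300}.
    """
    warnings: List[str] = []
    if len(fields) > 255:
        warnings.append("feature class has more than 255 fields")
    for name, spec in fields.items():
        if len(name) > 10:
            warnings.append(f"{name}: field name longer than 10 characters")
        if spec.get("type") == "string" and spec.get("length", 0) > 254:
            warnings.append(f"{name}: string field longer than 254 characters")
        if spec.get("type") in UNSUPPORTED_TYPES:
            warnings.append(f"{name}: field type '{spec['type']}' is not supported")
    return warnings


example_fields = {
    "parcel_identifier": {"type": "string", "length": 50},  # 17-character name
    "notes": {"type": "string", "length": 1000},
    "photo": {"type": "blob"},
    "acres": {"type": "double"},
}
for line in shapefile_warnings(example_fields):
    print(line)
```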

  12. All Cadastral GIS Data (FGDB)

    • catalog.data.gov
    • data.seattle.gov
    • +2more
    Updated Jan 31, 2025
    Cite
    data.seattle.gov (2025). All Cadastral GIS Data (FGDB) [Dataset]. https://catalog.data.gov/dataset/all-cadastral-gis-data-fgdb-fd4d8
    Dataset updated
    Jan 31, 2025
    Dataset provided by
    data.seattle.gov
    Description

    This compressed file geodatabase contains the following layers: Legal Subdivisions - Line; Legal Subdivisions - Polygon; Legal Annotation; Cadastral Control Points. This dataset is updated on a weekly basis.

  13. Data from: Preliminary geologic map of the Chugach National Forest special...

    • datadiscoverystudio.org
    e00
    Cite
    USGS Information Services, Preliminary geologic map of the Chugach National Forest special study area, Alaska [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/af2cd344a1354b209352fdbd32100bb8/html
    Explore at:
    Available download formats: e00
    Dataset provided by
    United States Geological Survey http://www.usgs.gov/
    Description

    Link to the ScienceBase Item Summary page for the item described by this metadata record. Service Protocol: Link to the ScienceBase Item Summary page for the item described by this metadata record. Application Profile: Web Browser. Link Function: information

  14. Data from: Dataset: Forage grasses in crop fields from ultra-high spatial...

    • dataverse.harvard.edu
    Updated Apr 28, 2025
    Cite
    Andres Felipe Ruiz-Hurtado; Rodrigo Andres Camelo-Munevar; Darwin Alexis Arrechea-Castillo; Rosa Noemi Jauregui; Juan Andres Cardoso Arango (2025). Dataset: Forage grasses in crop fields from ultra-high spatial resolution UAV-based imagery [Dataset]. http://doi.org/10.7910/DVN/DBGUFW
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Andres Felipe Ruiz-Hurtado; Rodrigo Andres Camelo-Munevar; Darwin Alexis Arrechea-Castillo; Rosa Noemi Jauregui; Juan Andres Cardoso Arango
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/DBGUFW

    Area covered
    Palmira, Florencia, Caquetá, Colombia, Cauca, Colombia, Santander de Quilichao
    Dataset funded by
    CGIAR Fund
    Description

    This dataset contains orthomosaics and individual Regions of Interest (ROIs) of forage grasses in crop fields from experimental trials of CIAT’s tropical forages breeding program, together with annotations in Common Objects in Context (COCO) format derived from that data. The ROIs were manually annotated on UAV imagery and exported in COCO format, which is compatible with different machine learning models and architectures. The dataset comprises 9,554 ROIs in the geospatial data and 12,365 annotations of forage grasses in COCO format. Methodology: The dataset was generated through a multi-step process beginning with data acquisition over forage crop fields via UAV flights (DJI Phantom 4 Multispectral drone), with RTK determining the geolocation. These images were processed in Agisoft Metashape to generate georeferenced orthomosaics as raster files. Manual annotation of forage grass ROIs was performed in QGIS, and the geospatial data for 8 different orthomosaics was later converted to COCO format using custom Python scripting. To ensure compatibility with COCO standards and optimize training efficiency, the large orthomosaics were clipped to the annotations’ extents with an additional 1% spatial buffer and split into tiles with a maximum dimension close to 1024 pixels for the larger side and 25% overlap.
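
    The tiling step described above can be illustrated with a minimal sketch. The description does not give the authors' actual tiling scheme, so everything here is an assumption: the function names `tile_origins` and `tile_windows` are hypothetical, the stride is taken as 75% of the tile size to give 25% overlap, and the last tile along each axis is shifted back so that it ends at the raster edge.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int = 1024, overlap: float = 0.25) -> List[int]:
    """Start offsets of tiles covering [0, length) along one axis.

    Assumption: consecutive tiles advance by tile * (1 - overlap) pixels,
    and the final tile is shifted back so it ends exactly at the edge.
    """
    if length <= tile:
        return [0]  # the raster is smaller than one tile
    stride = int(tile * (1 - overlap))  # 768 for tile=1024, overlap=0.25
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile aligned with the far edge
    return origins


def tile_windows(
    width: int, height: int, tile: int = 1024, overlap: float = 0.25
) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) pixel windows covering a width x height raster."""
    return [
        (x, y, min(tile, width), min(tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


if __name__ == "__main__":
    # Hypothetical 2048 x 1024 pixel orthomosaic clip.
    for window in tile_windows(2048, 1024):
        print(window)
```

    Shifting the last tile back keeps every tile at full size, at the cost of a larger overlap on the final tile; padding the edge tile instead would be an equally plausible choice.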

  15. w

    Global Image Annotation Tool Market Research Report: By Application (Object...

    • wiseguyreports.com
    Updated Jul 23, 2024
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Image Annotation Tool Market Research Report: By Application (Object Detection and Recognition, Image Classification, Image Segmentation, Image Generation, Image Editing and Enhancement), By End User (Automotive, Healthcare, Retail, Media and Entertainment, Education, Manufacturing), By Deployment Mode (Cloud-Based, On-Premise, Hybrid), By Access Type (Licensed Software, Software as a Service (SaaS), Open Source), By Image Type (2D Images, 3D Images, Medical Images) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/cn/reports/image-annotation-tool-market
    Explore at:
    Dataset updated
    Jul 23, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 7, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 4.1 (USD Billion)
    MARKET SIZE 2024: 4.6 (USD Billion)
    MARKET SIZE 2032: 11.45 (USD Billion)
    SEGMENTS COVERED: Application, End User, Deployment Mode, Access Type, Image Type, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: Growing AI, ML and DL adoption; increasing demand for image analysis and object recognition; cloud-based deployment and subscription-based pricing models; emergence of semi-automated and automated annotation tools; competitive landscape with established vendors and new entrants
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Tech Mahindra, Capgemini, Whizlabs, Cognizant, Tata Consultancy Services, Larsen & Toubro Infotech, HCL Technologies, IBM, Accenture, Infosys BPM, Genpact, Wipro, Infosys, DXC Technology
    MARKET FORECAST PERIOD: 2024 - 2032
    KEY MARKET OPPORTUNITIES: 1. AI and ML Advancements; 2. Growing Big Data Analytics; 3. Cloud-based Image Annotation Tools; 4. Image Annotation for Medical Imaging; 5. Geospatial Image Annotation
    COMPOUND ANNUAL GROWTH RATE (CAGR): 12.08% (2024 - 2032)
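
    As a rough consistency check on the figures above, the growth rate can be recomputed from the listed 2024 and 2032 market sizes using the standard CAGR formula, (end / start) ** (1 / years) - 1. This is only a sketch: the helper name `cagr` is hypothetical, and because the listed market sizes are rounded, the result should agree with the listed 12.08% only approximately.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Values taken from the listing: USD 4.6 billion (2024), USD 11.45 billion (2032).
    rate = cagr(4.6, 11.45, 2032 - 2024)
    print(f"Implied CAGR 2024-2032: {rate:.2%}")
```
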
  16. f

    Data from: An empirical study of the semantic similarity of geospatial...

    • tandf.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Niloofar Aflaki; Kristin Stock; Christopher B. Jones; Hans Guesgen; Jeremy Morley (2023). An empirical study of the semantic similarity of geospatial prepositions and their senses [Dataset]. http://doi.org/10.6084/m9.figshare.20517959.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Niloofar Aflaki; Kristin Stock; Christopher B. Jones; Hans Guesgen; Jeremy Morley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spatial prepositions have been studied in some detail from multiple disciplinary perspectives. However, neither the semantic similarity of these prepositions, nor the relationships between the multiple senses of different spatial prepositions, are well understood. In an empirical study of 24 spatial prepositions, we identify the degree and nature of semantic similarity and extract senses for three semantically similar groups of prepositions using t-SNE, DBSCAN clustering, and Venn diagrams. We validate the work by manual annotation with another data set. We find nuances in meaning among proximity and adjacency prepositions, such as the use of close to instead of near for pairs of lines, and the importance of proximity over contact for the next to preposition, in contrast to other adjacency prepositions.

  17. d

    Disaster Preparedness Information_River Warning

    • data.gov.tw
    csv
    Updated Jun 1, 2025
    Cite
    Water Resources Agency, Ministry of Economic Affairs (2025). Disaster Preparedness Information_River Warning [Dataset]. https://data.gov.tw/en/datasets/5983
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Water Resources Agency, Ministry of Economic Affairs
    License

    https://data.gov.tw/license

    Description

    The Disaster Emergency Response Team of the Water Resources Agency, Ministry of Economic Affairs, draws on long-term disaster response experience and combines real-time data such as rainfall, water levels, and reservoir levels, using computer technology to provide water level alerts to the public and relevant units. This helps people understand the risk of home flooding, prepare early, and reduce the occurrence of disasters. This dataset links to a list of Keyhole Markup Language (KML) files. KML is a markup language based on the eXtensible Markup Language (XML) syntax standard, developed and maintained by Google's Keyhole company for expressing geospatial annotations. Documents written in KML are referred to as KML files and are used in Google Earth-related software (Google Earth, Google Maps, Google Maps for mobile, etc.) to display geospatial data. Many GIS-related systems now also use this format for geospatial data exchange. The KML files in this dataset use UTF-8 encoding.
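
    Because KML is XML-based, files like those described above can in principle be read with a generic XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. It rests on several assumptions: the sample document is invented for illustration and is not taken from this dataset, the KML 2.2 namespace is assumed (actual files may declare a different one), and `read_placemarks` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; real files may declare a different KML version.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Invented sample document, for illustration only.
SAMPLE_KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example station</name>
      <Point><coordinates>121.5,25.0,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def read_placemarks(kml_text: str):
    """Return a list of (name, [(lon, lat), ...]) for each Placemark."""
    # Parse from bytes so the UTF-8 encoding declaration is honoured.
    root = ET.fromstring(kml_text.encode("utf-8"))
    placemarks = []
    for pm in root.iter("{%s}Placemark" % KML_NS["kml"]):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        raw = pm.findtext(".//kml:coordinates", default="", namespaces=KML_NS)
        points = []
        # KML coordinates are whitespace-separated "lon,lat[,alt]" tuples.
        for token in raw.split():
            lon, lat = token.split(",")[:2]
            points.append((float(lon), float(lat)))
        placemarks.append((name.strip(), points))
    return placemarks


if __name__ == "__main__":
    for name, points in read_placemarks(SAMPLE_KML):
        print(name, points)
```
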

  18. GANDR: Georelating-Annotated Natural Disaster Reports

    • zenodo.org
    json
    Updated Jun 7, 2025
    Cite
    Kai Moltzen; Ricardo Usbeck; Junbo Huang (2025). GANDR: Georelating-Annotated Natural Disaster Reports [Dataset]. http://doi.org/10.5281/zenodo.15612556
    Explore at:
    Available download formats: json
    Dataset updated
    Jun 7, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kai Moltzen; Ricardo Usbeck; Junbo Huang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To facilitate research on Georelating (inferring unnamed disaster-affected areas from natural language), we construct GANDR, a silver-standard dataset of 2,000 synthetic disaster reports with annotated H3 DGGS cell indices and geospatial relations for the US and EU.

  19. m

    Data Annotation Services Market: Size, Share and...

    • marketresearchintellect.com
    Updated May 19, 2025
    Cite
    Market Research Intellect (2025). Data Annotation Services Market: Size, Share and Trends Analysis 2033 [Dataset]. https://www.marketresearchintellect.com/pt/product/global-data-annotation-service-market-size-and-forecast/
    Explore at:
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Market Research Intellect
    License

    https://www.marketresearchintellect.com/pt/privacy-policy

    Area covered
    Global
    Description

    Market size and share are categorized by Image Annotation (Bounding Box Annotation, Polygon Annotation, Semantic Segmentation, 3D Cuboid Annotation, Image Classification), Text Annotation (Named Entity Recognition, Sentiment Analysis, Text Categorization, Part-of-Speech Tagging, Text Summarization), Video Annotation (Object Tracking, Action Recognition, Event Detection, Video Classification, Frame-by-Frame Annotation), Audio Annotation (Speech Recognition, Speaker Identification, Emotion Recognition, Transcription Services, Audio Classification), Sensor Data Annotation (Lidar Data Annotation, Radar Data Annotation, Depth Data Annotation, Time-Series Data Annotation, Geospatial Data Annotation), and geographic regions (North America, Europe, Asia-Pacific, South America, Middle East and Africa).

  20. d

    Data from: Preliminary geologic Map of the Perris 7.5' Quadrangle, Riverside...

    • dataone.org
    Updated Dec 1, 2016
    + more versions
    Cite
    Douglas M. Morton (2016). Preliminary geologic Map of the Perris 7.5' Quadrangle, Riverside County, California [Dataset]. https://dataone.org/datasets/a6c99038-0cb3-424a-92ce-0ac305f1920d
    Explore at:
    Dataset updated
    Dec 1, 2016
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Douglas M. Morton
    Area covered
    Variables measured
    DIP, SHD, LABL, NAME, LTYPE, PLABL, L-SYMB, P-SYMB, PTTYPE, SHDFIL, and 1 more
    Description

    This data set maps and describes the geology of the Perris 7.5' quadrangle, Riverside County, California. Created using Environmental Systems Research Institute's ARC/INFO software, the data base consists of the following items: (1) a map coverage containing geologic contacts and units, (2) a coverage containing structural data, (3) a coverage containing geologic unit annotation and leaders, and (4) attribute tables for geologic units (polygons), contacts (arcs), and site-specific data (points). In addition, the data set includes the following graphic and text products: (1) a PostScript graphic plot-file containing the geologic map, topography, cultural data, a Correlation of Map Units (CMU) diagram, a Description of Map Units (DMU), and a key for point and line symbols, and (2) PDF files of the Readme (including the metadata file as an appendix), and the graphic produced by the PostScript plot file. The Perris quadrangle is located in the northern part of the Peninsular Ranges Province within the central part of the Perris block, a relatively stable area, rectangular in plan, located between the Elsinore and San Jacinto fault zones. The quadrangle is underlain by Cretaceous age and older basement rocks. The Cretaceous plutonic rocks are part of the composite Peninsular Ranges batholith. A wide variety of intermediate composition granitic rocks are located in the quadrangle. These rocks are mainly of tonalitic composition but range from monzogranite to diorite. Most rock is faintly to intensely foliated. Many are heterogeneous and contain varying amounts of meso- and melanocratic discoidal-shaped inclusions. Some rocks are composed essentially of inclusion material and some are migmatitic. Included within these granitic rocks are a few septa of Paleozoic(?) schist of upper amphibolite metamorphic grade. Metamorphic rocks of probable Mesozoic age occur in the southwest corner of the quadrangle. Most of these rocks are well-foliated phyllite of Mesozoic age. The metamorphic grade of these rocks is greenschist or sub-greenschist. Rocks of probable Paleozoic age occur as scattered masses within plutonic rocks in the northern part of the quadrangle. These rocks are of amphibolite grade and include cordierite and sillimanite biotite schist. In the center and southeast quarter of the quadrangle, biotite-hornblende tonalite of the Lakeview Mountains pluton is characterized by ubiquitous schlieren and by a lack of potassium feldspar. Masses of leucocratic and melanocratic rock occur scattered throughout the pluton. Mesocratic to melanocratic discoidal-shaped inclusions are oriented parallel to the schlieren. A small body of comb-layered gabbro is located within the tonalite near the southern margin of the pluton. The tonalite contains rare-earth bearing, zoned pegmatite dikes. Biotite-hornblende tonalite located in the southwest part of the quadrangle is part of the Val Verde pluton. This tonalite is similar to that of the Lakeview Mountains pluton but lacks the ubiquitous schlieren and contains potassium feldspar. Diagonally crossing the quadrangle is the channel and flood plain of the ephemeral San Jacinto River. Most of the alluviated area west of the San Jacinto River consists of Pleistocene age fluvial deposits, which have a degraded upper surface that is preserved in some places near the contact with granitic rocks. The upper part of these deposits forms the Paloma surface of Woodford and others (1971). A modern- to Holocene-age drainage channel is within these older Pleistocene deposits. Younger Pleistocene alluvial fans emanate from the Lakeview Mountains east of the San Jacinto River.

Cite
Data Insights Market (2025). Data Labeling Market Report [Dataset]. https://www.datainsightsmarket.com/reports/data-labeling-market-20383

Data Labeling Market Report

Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
Mar 8, 2025
Dataset authored and provided by
Data Insights Market
License

https://www.datainsightsmarket.com/privacy-policy

Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description

The data labeling market is experiencing robust growth, projected to reach $3.84 billion in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 28.13% from 2025 to 2033. This expansion is fueled by the increasing demand for high-quality training data across various sectors, including healthcare, automotive, and finance, which heavily rely on machine learning and artificial intelligence (AI). The surge in AI adoption, particularly in areas like autonomous vehicles, medical image analysis, and fraud detection, necessitates vast quantities of accurately labeled data. The market is segmented by sourcing type (in-house vs. outsourced), data type (text, image, audio), labeling method (manual, automatic, semi-supervised), and end-user industry. Outsourcing is expected to dominate the sourcing segment due to cost-effectiveness and access to specialized expertise. Similarly, image data labeling is likely to hold a significant share, given the visual nature of many AI applications. The shift towards automation and semi-supervised techniques aims to improve efficiency and reduce labeling costs, though manual labeling will remain crucial for tasks requiring high accuracy and nuanced understanding. Geographical distribution shows strong potential across North America and Europe, with Asia-Pacific emerging as a key growth region driven by increasing technological advancements and digital transformation. Competition in the data labeling market is intense, with a mix of established players like Amazon Mechanical Turk and Appen, alongside emerging specialized companies. The market's future trajectory will likely be shaped by advancements in automation technologies, the development of more efficient labeling techniques, and the increasing need for specialized data labeling services catering to niche applications. Companies are focusing on improving the accuracy and speed of data labeling through innovations in AI-powered tools and techniques. 
Furthermore, the rise of synthetic data generation offers a promising avenue for supplementing real-world data, potentially addressing data scarcity challenges and reducing labeling costs in certain applications. This will, however, require careful attention to ensure that the synthetic data generated is representative of real-world data to maintain model accuracy. This comprehensive report provides an in-depth analysis of the global data labeling market, offering invaluable insights for businesses, investors, and researchers. The study period covers 2019-2033, with 2025 as the base and estimated year, and a forecast period of 2025-2033. We delve into market size, segmentation, growth drivers, challenges, and emerging trends, examining the impact of technological advancements and regulatory changes on this rapidly evolving sector. The market is projected to reach multi-billion dollar valuations by 2033, fueled by the increasing demand for high-quality data to train sophisticated machine learning models.
Recent developments include:
September 2024: The National Geospatial-Intelligence Agency (NGA) is poised to invest heavily in artificial intelligence, earmarking up to USD 700 million for data labeling services over the next five years. This initiative aims to enhance NGA's machine-learning capabilities, particularly in analyzing satellite imagery and other geospatial data. The agency has opted for a multi-vendor indefinite-delivery/indefinite-quantity (IDIQ) contract, emphasizing the importance of annotating raw data (be it images or videos) to render it understandable for machine learning models. For instance, when dealing with satellite imagery, the focus could be on labeling distinct entities such as buildings, roads, or patches of vegetation.
October 2023: Refuel.ai unveiled a new platform, Refuel Cloud, and a specialized large language model (LLM) for data labeling. Refuel Cloud harnesses advanced LLMs, including its proprietary model, to automate data cleaning, labeling, and enrichment at scale, catering to diverse industry use cases. Recognizing that clean data underpins modern AI and data-centric software, Refuel Cloud addresses the historical challenge of human labor bottlenecks in data production. With Refuel Cloud, enterprises can generate the expansive, precise datasets they require in minutes, a task that traditionally took weeks.
Key drivers for this market are: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML.
Potential restraints include: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML.
Notable trends are: Healthcare is Expected to Witness Remarkable Growth.
