Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The rise of artificial intelligence (AI) and in particular modern machine learning (ML) algorithms during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML developments, and while promising results on synthetically generated training data have been shown, their generation is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we instead introduce a structured, quantitative approach. Our evaluation shows superior generalization results compared to using non-task-specific real training data, and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
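The iterative refinement the abstract describes can be summarized as a loop that scores candidate generation parameters by real-data performance rather than by human judgment. A minimal sketch follows; every component here is a hypothetical stand-in, not the paper's actual pipeline:

```python
# Sketch of the iterative refinement idea: generation parameters are scored
# by how well a classifier trained on the resulting synthetic images
# performs on a small real validation set.

def refine_generation(param_grid, generate, train, evaluate, real_val):
    """Return the generation parameters with the best real-data score."""
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        synthetic = generate(params)       # e.g. vary lighting, textures, lesions
        model = train(synthetic)           # classifier sees synthetic images only
        score = evaluate(model, real_val)  # quantitative, replacing human judgment
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```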
Leaves from genetically unique Juglans regia plants were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS), Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA. Soil samples were collected in fall 2017 from the riparian oak forest located at the Russell Ranch Sustainable Agricultural Institute at the University of California, Davis. The soil was sieved through a 2 mm mesh and air dried before imaging. A single soil aggregate was scanned at 23 keV using the 10x objective lens with a pixel resolution of 650 nanometers on beamline 8.3.2 at the ALS. Additionally, a drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned using a 4x lens with a pixel resolution of 1.72 µm on beamline 8.3.2 at the ALS. Raw tomographic image data were reconstructed using TomoPy. Reconstructions were converted to 8-bit TIF or PNG format using ImageJ or the PIL package in Python before further processing.

Images were annotated using Intel's Computer Vision Annotation Tool (CVAT) and ImageJ; both are free to use and open source. Leaf images were annotated following Théroux-Rancourt et al. (2020). Specifically, hand labeling was done directly in ImageJ by drawing around each tissue, with 5 images annotated per leaf. Care was taken to cover a range of anatomical variation to help improve the generalizability of the models to other leaves. All slices were labeled by Dr. Mina Momayyezi and Fiona Duong.

To annotate the flower bud and soil aggregate, images were imported into CVAT. The exterior border of the bud (i.e., bud scales) and the flower were annotated in CVAT and exported as masks. Similarly, the exterior of the soil aggregate and particulate organic matter identified by eye were annotated in CVAT and exported as masks. To annotate air spaces in both the bud and the soil aggregate, images were imported into ImageJ. A Gaussian blur was applied to each image to decrease noise, and the air space was then segmented by thresholding. After applying the threshold, the selected air-space region was converted to a binary image, with white representing the air space and black representing everything else. This binary image was overlaid upon the original image, and the air space within the flower bud and aggregate was selected using the "free hand" tool. Air space outside the region of interest was eliminated for both image sets. The quality of the air-space annotation was then visually inspected for accuracy against the underlying original image; incomplete annotations were corrected using the brush or pencil tool to paint missing air space white and incorrectly identified air space black. Once the annotation was satisfactorily corrected, the binary image of the air space was saved. Finally, the annotations of the bud and flower, or of the aggregate and organic matter, were opened in ImageJ, and the associated air-space mask was overlaid on top of them, forming a three-layer mask suitable for training the fully convolutional network. All labeling of the soil aggregate images was done by Dr. Devin Rippner. These images and annotations are for training deep learning models to identify different constituents in leaves, almond buds, and soil aggregates.
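The blur-and-threshold step described above maps to only a few lines of image code. A minimal sketch, assuming an 8-bit grayscale slice in which air space appears dark; the file name, sigma, and threshold are illustrative, since the actual values were chosen per image set:

```python
# Rough re-creation of the described air-space segmentation.
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

slice_8bit = np.array(Image.open("slice_0001.png").convert("L"))

smoothed = gaussian_filter(slice_8bit.astype(float), sigma=2)  # reduce noise

threshold = 60                      # illustrative; picked per dataset in practice
air_space = smoothed < threshold    # assumes air space reconstructs dark

# Binary image: white (255) = air space, black (0) = everything else
binary = air_space.astype(np.uint8) * 255
Image.fromarray(binary).save("air_space_mask.png")
```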
Limitations: For the walnut leaves, some tissues (stomata, etc.) are not labeled, and the annotated slices represent only a small portion of a full leaf. Similarly, the almond bud and the soil aggregate each represent just a single sample. The bud tissues are only divided into bud scales, flower, and air space; many other tissues remain unlabeled. For the soil aggregate, the labels were assigned by eye with no supporting chemical information, so particulate organic matter identification may be incorrect.

Resources in this dataset:

Resource Title: Annotated X-ray CT images and masks of a Forest Soil Aggregate. File Name: forest_soil_images_masks_for_testing_training.zip. Resource Description: This aggregate was collected from the riparian oak forest at the Russell Ranch Sustainable Agricultural Facility. The aggregate was scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS), Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of 0,0,0; pore spaces have a value of 250,250,250; mineral solids have a value of 128,0,0; and particulate organic matter has a value of 0,128,0. These files were used for training a model to segment the forest soil aggregate and for testing the accuracy, precision, recall, and F1 score of the model.

Resource Title: Annotated X-ray CT images and masks of an Almond bud (P. dulcis). File Name: Almond_bud_tube_D_P6_training_testing_images_and_masks.zip. Resource Description: A drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned by X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS), Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 4x lens with a pixel resolution of 1.72 µm. For masks, the background has a value of 0,0,0; air spaces have a value of 255,255,255; bud scales have a value of 128,0,0; and flower tissues have a value of 0,128,0. These files were used for training a model to segment the almond bud and for testing the accuracy, precision, recall, and F1 score of the model. Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads

Resource Title: Annotated X-ray CT images and masks of Walnut leaves (J. regia). File Name: 6_leaf_training_testing_images_and_masks_for_paper.zip. Resource Description: Stems were collected from genetically unique J. regia accessions at the USDA-ARS-NCGR in Wolfskill Experimental Orchard, Winters, California, USA to use as scion, and were grafted by Sierra Gold Nursery onto a commonly used commercial rootstock, RX1 (J. microcarpa × J. regia). We used a common rootstock to eliminate any own-root effects and to simulate conditions for a commercial walnut orchard setting, where rootstocks are commonly used. The grafted saplings were repotted and transferred to the Armstrong lathe house facility at the University of California, Davis in June 2019, and kept under natural light and temperature. Leaves from each accession and treatment were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS), Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers.
For masks, the background has a value of 170,170,170; epidermis a value of 85,85,85; mesophyll a value of 0,0,0; bundle sheath extension a value of 152,152,152; vein a value of 220,220,220; and air a value of 255,255,255. Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads
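Because the masks encode classes as RGB values, a small script can convert a mask into a single-channel class-index map for training. A sketch for the walnut-leaf masks using the values listed above; the class ordering and file name are arbitrary choices:

```python
# Convert a walnut-leaf RGB mask into a class-index map.
import numpy as np
from PIL import Image

CLASS_COLORS = {
    (170, 170, 170): 0,  # background
    (85, 85, 85):    1,  # epidermis
    (0, 0, 0):       2,  # mesophyll
    (152, 152, 152): 3,  # bundle sheath extension
    (220, 220, 220): 4,  # vein
    (255, 255, 255): 5,  # air
}

mask_rgb = np.array(Image.open("leaf_mask.png").convert("RGB"))
class_map = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
for color, idx in CLASS_COLORS.items():
    class_map[np.all(mask_rgb == color, axis=-1)] = idx
```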
https://spdx.org/licenses/etalab-2.0.html
This data article contains annotation data characterizing Multi Criteria Assessment Methods proposed in the scientific literature by INRA researchers belonging to the Social Science, Agriculture and Food, Rural Development and Environment department. This research aims, on the one hand, to understand the functioning and the social and economic development of agriculture, the food-processing industries, agribusinesses, and food, with close links to local and global environmental stakes, and, on the other hand, to shed light on public debates and on public and private decisions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Applications of convolutional neural network (CNN)-based object detectors in agriculture have been a popular research topic in recent years. However, complicated agricultural environments bring many difficulties for ground truth annotation as well as potential uncertainties in image data quality. Using YOLOv4 as a representative state-of-the-art object detector, this study quantified YOLOv4's sensitivity to artificial image distortions, including white noise, motion blur, hue shift, saturation change, and intensity change, and examined the importance of various training dataset attributes based on model classification accuracies, including dataset size, label quality, negative sample presence, image sequence, and image distortion levels. The YOLOv4 model trained and validated on the original datasets fell to 30% mean average precision (mAP) for the four apple flower bud growth stages at 31.91% white noise, 22.05-pixel motion blur, 77.38° clockwise hue shift, 64.81° counterclockwise hue shift, 89.98% saturation decrease, 895.35% saturation increase, 79.80% intensity decrease, and 162.71% intensity increase. The performance of YOLOv4 decreased with both declining training dataset size and declining training image label quality. Negative samples and training image sequence did not make a substantial difference in model performance. Incorporating distorted images during training improved the classification accuracies of YOLOv4 models on noisy test datasets by 13 to 390%. In the context of apple flower bud growth-stage classification, except for motion blur, YOLOv4 is sufficiently robust to the image distortions likely in real life: white noise, hue shift, saturation change, and intensity change. Training image label quality and training instance number are more important factors than training dataset size. Exposing models to training images that resemble the test images is crucial for optimal classification accuracy. The study enhances understanding of implementing object detectors in agricultural research.
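For readers who want to reproduce this kind of sensitivity test, the sketch below applies comparable distortions with OpenCV. The study's exact implementations and magnitudes are not given here, so treat the parameters as illustrative:

```python
# Illustrative image distortions of the kinds described in the study.
import cv2
import numpy as np

def add_white_noise(img, fraction=0.3):
    """Blend the image with uniform white noise (fraction in [0, 1])."""
    noise = np.random.randint(0, 256, img.shape, dtype=np.uint8)
    return cv2.addWeighted(img, 1 - fraction, noise, fraction, 0)

def motion_blur(img, kernel_size=22):
    """Horizontal motion blur via a single-row averaging kernel."""
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size
    return cv2.filter2D(img, -1, kernel)

def shift_hue(img_bgr, degrees=77):
    """Rotate hue; OpenCV stores H in [0, 180), i.e. 2 degrees per unit."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    hsv[..., 0] = (hsv[..., 0].astype(int) + int(degrees / 2)) % 180
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```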
https://www.marketreportanalytics.com/privacy-policy
The AI Data Labeling Services market is experiencing rapid growth, driven by the increasing demand for high-quality training data to fuel advancements in artificial intelligence. The market, estimated at $10 billion in 2025, is projected to witness a robust Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, which implies a market size of roughly $60 billion by 2033. This expansion is fueled by several key factors. The automotive industry leverages AI data labeling for autonomous driving systems, while healthcare utilizes it for medical image analysis and diagnostics. The retail and e-commerce sectors benefit from improved product recommendations and customer service through AI-powered chatbots and image recognition. Agriculture is employing AI data labeling for precision farming and crop monitoring. Furthermore, the increasing adoption of cloud-based solutions offers scalability and cost-effectiveness, bolstering market growth. While data security and privacy concerns present challenges, the ongoing development of innovative techniques and the rising availability of skilled professionals are mitigating these restraints. The market is segmented by application (automotive, healthcare, retail & e-commerce, agriculture, others) and type (cloud-based, on-premises), with cloud-based solutions gaining significant traction due to their flexibility and accessibility. Key players like Scale AI, Labelbox, and Appen are actively shaping market dynamics through technological innovations and strategic partnerships. The North American market currently holds a significant share, but regions like Asia Pacific are poised for substantial growth due to increasing AI adoption and technological advancements. The competitive landscape is dynamic, characterized by both established players and emerging startups. While larger companies possess substantial resources and experience, smaller, agile companies are innovating with specialized solutions and niche applications. Future growth will likely be influenced by advancements in data annotation techniques (e.g., synthetic data generation), increasing demand for specialized labeling services (e.g., 3D point cloud labeling), and the expansion of AI applications across various industries. The continued development of robust data governance frameworks and ethical considerations surrounding data privacy will play a critical role in shaping the market's trajectory in the coming years. Regional growth will be influenced by factors such as government regulations, technological infrastructure, and the availability of skilled labor. Overall, the AI Data Labeling Services market presents a compelling opportunity for growth and investment in the foreseeable future.
https://www.wiseguyreports.com/pages/privacy-policy
BASE YEAR | 2024
HISTORICAL DATA | 2019 - 2024
REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023 | 5.22 (USD Billion)
MARKET SIZE 2024 | 5.9 (USD Billion)
MARKET SIZE 2032 | 15.7 (USD Billion)
SEGMENTS COVERED | Service Type, Application, Technology, End-User Industry, Regional
COUNTRIES COVERED | North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS | AI and ML advancements; self-driving car technology; growing healthcare applications; increasing image content; automation and efficiency
MARKET FORECAST UNITS | USD Billion
KEY COMPANIES PROFILED | Scale AI, Anolytics, Sama, Hive, Keymakr, Mighty AI, Labelbox, SuperAnnotate, TaskUs, Veritone, Cogito Tech, CloudFactory, Appen, Figure Eight, Lionbridge AI
MARKET FORECAST PERIOD | 2024 - 2032
KEY MARKET OPPORTUNITIES | 1. Advancements in AI and ML; 2. Rising demand from e-commerce; 3. Growth in autonomous vehicles; 4. Increasing focus on data privacy; 5. Emergence of cloud-based annotation tools
COMPOUND ANNUAL GROWTH RATE (CAGR) | 13.01% (2024 - 2032)
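The reported CAGR is consistent with the table's market-size figures, which a quick check confirms:

```python
# Verify the table's CAGR: 5.9 USD Billion (2024) -> 15.7 USD Billion (2032).
start, end, years = 5.9, 15.7, 2032 - 2024
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.2%}")  # 13.01%, matching the reported figure
```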
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a collection of raw and annotated multispectral (MS) images acquired in a heterogeneous agricultural environment with a MicaSense RedEdge-M camera. The spectra, namely Green, Blue, Red, Red Edge, and Near Infrared (NIR), were acquired at sub-metre level. The MS images were labelled manually using VIA and automatically using Grounding DINO in combination with the Segment Anything Model. The segmentation masks obtained using these two annotation techniques, as well as the source code to perform the necessary image processing operations, are provided in the repository. The images focus on horseradish (Raphanus raphanistrum) infestations in Triticum aestivum (wheat) crops.
Images and annotations follow the naming scheme IMG_<scene>_<band>, where the band suffix is 1: Blue, 2: Green, 3: Red, 4: Near Infrared, 5: Red Edge. Example: an image named IMG_0200_3 represents scene number 200 in the Red channel.
This dataset, 'RafanoSet', is organized into 6 directories, namely 'Raw Images', 'Manual Annotations', 'Automated Annotations', 'Binary Masks - Manual', 'Binary Masks - Automated', and 'Codes'. The sub-directory 'Raw Images' consists of 85 manually acquired images in .PNG format over 17 different scenes. The sub-directory 'Manual Annotations' consists of the annotation file 'region_data' in COCO segmentation format. The sub-directory 'Automated Annotations' consists of 80 automatically annotated images in .JPG format and 80 .XML files in Pascal VOC annotation format.
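A small helper, assuming only the naming convention stated above, can recover the scene number and band from a file name:

```python
# Parse the IMG_<scene>_<band> naming scheme.
import re

BANDS = {1: "Blue", 2: "Green", 3: "Red", 4: "Near Infrared", 5: "Red Edge"}

def parse_image_name(name):
    """'IMG_0200_3' -> (200, 'Red')."""
    m = re.fullmatch(r"IMG_(\d+)_([1-5])", name)
    if m is None:
        raise ValueError(f"unexpected image name: {name}")
    return int(m.group(1)), BANDS[int(m.group(2))]

print(parse_image_name("IMG_0200_3"))  # (200, 'Red')
```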
The scientific framework of image acquisition and annotation is explained in the associated Data in Brief paper, which is in the course of peer review; this dataset is a prerequisite to that data article. Field experimentation roles:

The image acquisition was performed by Mariano Crimaldi, a researcher, on behalf of the Department of Agriculture of the hosting institution, University of Naples Federico II, Italy.

Shubham Rana has been the curator and analyst for the data, under the supervision of his PhD supervisor, Prof. Salvatore Gerbino. They are affiliated with the Department of Engineering, University of Campania 'Luigi Vanvitelli'.

Domenico Barretta, also of the Department of Engineering, contributed in a consulting and brainstorming role, particularly in data validation, annotation management, and litmus testing of the datasets.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Agriculture Data is a dataset for object detection tasks - it contains Crops Plants Agriculture annotations for 270 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC0 1.0 Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
https://spdx.org/licenses/etalab-2.0.html
This data article contains annotation data characterizing Multi Criteria Assessment Methods proposed in the scientific literature by INRA researchers belonging to the Science for Action and Development department. Its interdisciplinary approach combines agricultural, ecological and human and social sciences.
https://www.datainsightsmarket.com/privacy-policy
The global video annotation service market is experiencing robust growth, driven by the escalating demand for high-quality training data in the artificial intelligence (AI) and machine learning (ML) sectors. The market's expansion is fueled by the proliferation of applications across diverse industries, including medical imaging analysis, autonomous vehicle development (transportation), precision agriculture, and retail analytics. The increasing adoption of computer vision technologies and the need for accurate, labeled video data to train these systems are major catalysts. While precise market sizing requires specific data, a reasonable estimation based on industry reports and the provided information (considering a potential CAGR of 20-25% which is common for rapidly growing tech sectors) would place the 2025 market value at approximately $2.5 Billion, projected to reach $7 Billion by 2033. The market is segmented by application (medical, transportation, agriculture, retail, others) and type of annotation service (video classification, video management, video tagging, video analysis, others). The North American market currently holds a significant share, followed by Europe and Asia Pacific. However, developing economies in Asia Pacific are showing rapid growth potential, driven by increasing digitalization and investments in AI. Key restraints to market growth include the high cost of annotation, the requirement for specialized skills and expertise, and concerns regarding data privacy and security. Nevertheless, the increasing availability of sophisticated annotation tools, the emergence of crowdsourcing platforms, and advancements in automation technologies are progressively mitigating these challenges. The future landscape of the video annotation service market is poised for significant expansion, particularly with the growing adoption of AI in various sectors and continuous innovation in video annotation techniques. This will lead to increased competition amongst the numerous providers mentioned: Acclivis, Ai-workspace, GTS, HabileData, iMerit, Keymakr, LXT, Mindy Support, Sama, Shaip, SunTec, TaskUs, Tasq, and Triyock, driving further market evolution and refinement of services.
Annotation data of genome assemblies of Streptomyces spp. isolated from agricultural soil. Resources in this dataset:
Resource Title: Annotation data for MCL20-2. File Name: MCL20-2_prokka.zip
Resource Title: Annotation data for SCL15-4. File Name: SCL15-4_prokka.zip
Resource Title: Annotation data for SCL15-6. File Name: SCL15-6_prokka.zip
Resource Title: Annotation data for SJL17-1. File Name: SJL17-1_prokka.zip
Resource Title: Annotation data for SJL17-4. File Name: SJL17-4_prokka.zip
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The real-world dataset RumexWeeds targets the detection of the grassland weeds Rumex obtusifolius L. and Rumex crispus L. RumexWeeds includes whole image sequences with a total of 5,510 images at 2.3 MP resolution, 15,519 manual bounding box annotations, and 340 ground-truth pixel-wise annotations, collected at 3 different farms on 4 different days in summer and autumn 2021. Additionally, navigational robot sensor data from GNSS, IMU, and odometry are recorded. In a second iteration, we supplement the dataset with joint-stem annotations: for each bounding box in the dataset, an ellipse annotation has been performed, representing the potential joint-stem position and the uncertainty of the human annotator. For a detailed description, please consult the related publications as well as the dataset's website: https://dtu-pas.github.io/RumexWeeds/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ubaid, M.T.; Javaid, S. Precision Agriculture: Computer Vision-Enabled Sugarcane Plant Counting in the Tillering Phase. Journal of Imaging 2024, 10, 102. https://doi.org/10.3390/jimaging10050102
Description
Plant annotation is the process of identifying and naming certain aspects or characteristics of plant species, usually for research, categorization, or agriculture. This technique is frequently carried out manually by specialists or by automated systems that employ image recognition technologies. Annotations give useful information on plants' morphology, phenology, diseases, and genetic characteristics, and may include labels for anatomical structures. Annotations may also categorize plants based on their development stage, health status, or species identity. In agriculture, plant annotations are used to monitor crop development, detect pests and diseases, optimize cultivation practices, and improve production estimates. Additionally, annotated plant datasets are useful resources for training machine learning models for automated plant recognition and analysis tasks.
The images were labeled using the labeling tool labelImg. The cane under the leaves was labeled. Annotating the images was difficult because the cane section was so small; labeling requires care and accuracy when drawing a bounding box around the cane. Across 175 images, around 18,650 bounding boxes were drawn, all assigned the class name "sugarcane".
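As a worked example, labelImg's default Pascal VOC XML output (one file per image) can be tallied with a short script; the directory path here is illustrative:

```python
# Count 'sugarcane' boxes across labelImg Pascal VOC XML files.
import glob
import xml.etree.ElementTree as ET

total = 0
for xml_path in glob.glob("annotations/*.xml"):
    root = ET.parse(xml_path).getroot()
    total += sum(1 for obj in root.iter("object")
                 if obj.findtext("name") == "sugarcane")
print(total)  # roughly 18,650 across the 175 images, per the description
```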
https://spdx.org/licenses/etalab-2.0.html
This data article contains annotation data characterizing Multi Criteria Assessment Methods proposed in the scientific literature by INRA researchers belonging to the Plant Health and Environment department. Its research aims to contribute to the development of a productive but environmentally safer agriculture by producing both academic and operational knowledge, by providing methods and tools for crop protection, risk and impact assessment, and by contributing to professional and public education.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset is a collection of images along with corresponding bounding box annotations that are specifically curated for detecting cows in images. The dataset covers different cow breeds, sizes, and orientations, providing a comprehensive representation of cow appearances and positions. Additionally, the visibility of each cow is recorded in the .xml file.
The cow detection dataset offers a diverse collection of annotated images, allowing for comprehensive algorithm development, evaluation, and benchmarking, ultimately aiding in the development of accurate and robust models.
Each image in the `images` folder is accompanied by an XML annotation in the `annotations.xml` file indicating the coordinates of the bounding boxes for cow detection. For each point, the x and y coordinates are provided. The visibility of each cow is also given by the label `is_visible` (true, false).
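A hedged sketch for reading `annotations.xml`, assuming a CVAT-style layout (`<image>` elements containing `<box>` elements with an `is_visible` attribute); the exact schema is not specified here, so check the actual file:

```python
# Read boxes and visibility from an assumed CVAT-style annotations.xml.
import xml.etree.ElementTree as ET

root = ET.parse("annotations.xml").getroot()
for image in root.iter("image"):
    for box in image.iter("box"):
        xtl, ytl = float(box.get("xtl")), float(box.get("ytl"))
        xbr, ybr = float(box.get("xbr")), float(box.get("ybr"))
        visible = box.findtext("attribute[@name='is_visible']")
        print(image.get("name"), (xtl, ytl, xbr, ybr), visible)
```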
keywords: farm animal, animal recognition, farm animal detection, image-based recognition, farmers, “on-farm” data, cows detection, cow images dataset, object detection, deep learning, computer vision, animal contacts, images dataset, agriculture, multiple animal pose estimation, cattle detection, identification, posture recognition, cattle images, individual beef cattle, cattle ranch, dairy cattle, farming, bounding boxes
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MegaWeeds dataset consists of seven existing datasets:
- WeedCrop dataset; Sudars, K., Jasko, J., Namatevs, I., Ozola, L., & Badaukis, N. (2020). Dataset of annotated food crops and weed images for robotic computer vision control. Data in Brief, 31, 105833. https://doi.org/10.1016/j.dib.2020.105833
- Chicory dataset; Gallo, I., Rehman, A. U., Dehkord, R. H., Landro, N., La Grassa, R., & Boschetti, M. (2022). Weed detection by UAV 416a Image Dataset. https://universe.roboflow.com/chicory-crop-weeds-5m7vo/weed-detection-by-uav-416a/dataset/1
- Sesame dataset; Utsav, P., Raviraj, P., & Rayja, M. (2020). crop and weed detection data with bounding boxes. https://www.kaggle.com/datasets/ravirajsinh45/crop-and-weed-detection-data-with-bounding-boxes
- Sugar beet dataset; Wangyongkun. (2020). sugarbeetsAndweeds. https://www.kaggle.com/datasets/wangyongkun/sugarbeetsandweeds
- Weed-Detection-v2; Tandon, K. (2021, June). Weed_Detection_v2. https://www.kaggle.com/datasets/kushagratandon12/weed-detection-v2
- Maize dataset; Correa, J. M. L., Andújar, D., Todeschini, M., Karouta, J., Begochea, J. M., & Ribeiro, A. (2021). WeedMaize. Zenodo. https://doi.org/10.5281/ZENODO.5106795
- CottonWeedDet12 dataset; Dang, F., Chen, D., Lu, Y., & Li, Z. (2023). YOLOWeeds: A novel benchmark of YOLO object detectors for multi-class weed detection in cotton production systems. Computers and Electronics in Agriculture, 205, 107655. https://doi.org/10.1016/j.compag.2023.107655
All the datasets contain open-field images of crops and weeds with annotations. The annotation files were converted to text files so they can be used with the YOLO model. All the datasets were combined into one big dataset with a total of 19,317 images, split into a training and a validation set.
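The conversion to YOLO text labels normalizes box coordinates to the image size. A minimal sketch for one box; the class indices used in MegaWeeds are not specified here, so `class_id` is illustrative:

```python
# Convert a VOC-style pixel box to a YOLO label line:
# "<class> <center-x> <center-y> <width> <height>", all normalized to [0, 1].
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id=0):
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

print(voc_to_yolo(100, 50, 300, 250, img_w=640, img_h=480))
# -> "0 0.312500 0.312500 0.312500 0.416667"
```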
https://www.marketreportanalytics.com/privacy-policy
The AI data labeling services market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across various sectors. The market's expansion is fueled by the critical need for high-quality labeled data to train and improve the accuracy of AI algorithms. While precise figures for market size and CAGR are not provided, industry reports suggest a significant market value, potentially exceeding $5 billion by 2025, with a Compound Annual Growth Rate (CAGR) likely in the range of 25-30% from 2025 to 2033. This rapid growth is attributed to several factors, including the proliferation of AI applications in autonomous vehicles, healthcare diagnostics, e-commerce personalization, and precision agriculture. The increasing availability of cloud-based solutions is also contributing to market expansion, offering scalability and cost-effectiveness for businesses of all sizes. However, challenges remain, such as the high cost of data annotation, the need for skilled labor, and concerns around data privacy and security. The market is segmented by application (automotive, healthcare, retail, agriculture, others) and type (cloud-based, on-premises), with the cloud-based segment expected to dominate due to its flexibility and accessibility. Key players like Scale AI, Labelbox, and Appen are driving innovation and market consolidation through technological advancements and strategic acquisitions. Geographic growth is expected across all regions, with North America and Asia-Pacific anticipated to lead in market share due to high AI adoption rates and significant investments in technological infrastructure. The competitive landscape is dynamic, featuring both established players and emerging startups. Strategic partnerships and mergers and acquisitions are common strategies for market expansion and technological enhancement. Future growth hinges on advancements in automation technologies that reduce the cost and time associated with data labeling. Furthermore, the development of more robust and standardized quality control metrics will be crucial for assuring the accuracy and reliability of labeled datasets, which is essential for building trust and furthering adoption of AI-powered applications. The focus on addressing ethical considerations around data bias and privacy will also play a critical role in shaping the market's future trajectory. Continued innovation in both the technology and business models within the AI data labeling services sector will be vital for sustaining the high growth projected for the coming decade.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
AgBase Version 2.0 is a curated, open-source, Web-accessible resource for functional analysis of agricultural plant and animal gene products, including Gene Ontology annotations. Its long-term goal is to serve the needs of the agricultural research communities by facilitating post-genome biology for agriculture researchers and for those researchers primarily using agricultural species as biomedical models. AgBase uses controlled vocabularies developed by the Gene Ontology (GO) Consortium to describe molecular function, biological process, and cellular component for genes and gene products in agricultural species. For more information about the AgBase database visit the Educational Resources page or refer to the AgBase publications. AgBase will also accept annotations from any interested party in the research communities. AgBase develops freely available tools for functional analysis, including tools for using GO. AgBase provides resources to facilitate modeling of functional genomics data and structural and functional annotation of agriculturally important animal, plant, microbe and parasite genomes. The website provides Text, BLAST, Taxonomy, and Gene Ontology search functions, and dedicated pages for Animals (channel catfish, cat, chick, bovine, daphnia, dog, horse, pig, salmon, sheep, trout, turkey), Plants (cotton, maize, Miscanthus, pine, poplar, rice, soybean), Microbes (26 taxa), and Parasites (10 taxa). AgBase currently provides 2,069,320 Gene Ontology (GO) annotations to 394,599 gene products in 534 different taxa, including GO annotations linked to transcripts represented on agricultural microarrays. For many of these arrays, this provides the only functional annotation available. GO annotations are available for download, and AgBase provides comprehensive, species-specific GO annotation files for a variety of animal and plant organisms. AgBase hosts several associated databases and provides genome browsers for agricultural pathogens. Comprehensive training resources (including worked examples and tutorials) are available via links to Educational Resources at the AgBase website.

Resources in this dataset:

Resource Title: Website Pointer to AgBase [Version 2.0]. File Name: Web Page, url: https://agbase.arizona.edu/index.html. Provides agricultural plant and animal Gene Ontology (GO) annotation search options, related tools, and genome browsers.
https://spdx.org/licenses/etalab-2.0.html
This data article contains annotation data characterizing Multi Criteria Assessment Methods proposed in the scientific literature by INRA researchers belonging to the Animal Health department. Its research is dedicated to animal health and veterinary public health.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw transcriptome sequencing data of the study entitled "Study on the mechanism of salt relief and growth promotion of Enterobacter cloacae on cotton"