Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The rise of artificial intelligence (AI) and in particular modern machine learning (ML) algorithms during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML developments, and while promising results on synthetically generated training data have been shown, its generation is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we instead introduce a structured, quantitative approach. Our evaluation shows superior generalization results when compared to using non-task-specific real training data and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
https://www.datainsightsmarket.com/privacy-policy
The booming video annotation service market is projected to reach $7 Billion by 2033, driven by AI and ML advancements. Explore key trends, applications (medical, autonomous vehicles, agriculture), top companies, and regional insights in this comprehensive market analysis.
https://spdx.org/licenses/etalab-2.0.html
This data article contains annotation data characterizing Multi Criteria Assessment Methods proposed in the scientific literature by INRA researchers belonging to the Social Science, Agriculture and Food, Rural Development and Environment department. This research aims, on the one hand, to understand the functioning and the social and economic development of agriculture, food processing industries, agribusinesses and food, in close connection with local and global environmental stakes, and, on the other hand, to shed light on public debates and on public and private decisions.
https://www.marketreportanalytics.com/privacy-policy
The AI Data Labeling Services market is experiencing rapid growth, driven by the increasing demand for high-quality training data to fuel advancements in artificial intelligence. The market, estimated at $10 billion in 2025, is projected to witness a robust Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching a substantial market size. This expansion is fueled by several key factors. The automotive industry leverages AI data labeling for autonomous driving systems, while healthcare utilizes it for medical image analysis and diagnostics. The retail and e-commerce sectors benefit from improved product recommendations and customer service through AI-powered chatbots and image recognition. Agriculture is employing AI data labeling for precision farming and crop monitoring. Furthermore, the increasing adoption of cloud-based solutions offers scalability and cost-effectiveness, bolstering market growth. While data security and privacy concerns present challenges, the ongoing development of innovative techniques and the rising availability of skilled professionals are mitigating these restraints. The market is segmented by application (automotive, healthcare, retail & e-commerce, agriculture, others) and type (cloud-based, on-premises), with cloud-based solutions gaining significant traction due to their flexibility and accessibility. Key players like Scale AI, Labelbox, and Appen are actively shaping market dynamics through technological innovations and strategic partnerships. The North American market currently holds a significant share, but regions like Asia Pacific are poised for substantial growth due to increasing AI adoption and technological advancements. The competitive landscape is dynamic, characterized by both established players and emerging startups. While larger companies possess substantial resources and experience, smaller, agile companies are innovating with specialized solutions and niche applications. 
Future growth will likely be influenced by advancements in data annotation techniques (e.g., synthetic data generation), increasing demand for specialized labeling services (e.g., 3D point cloud labeling), and the expansion of AI applications across various industries. The continued development of robust data governance frameworks and ethical considerations surrounding data privacy will play a critical role in shaping the market's trajectory in the coming years. Regional growth will be influenced by factors such as government regulations, technological infrastructure, and the availability of skilled labor. Overall, the AI Data Labeling Services market presents a compelling opportunity for growth and investment in the foreseeable future.
https://www.wiseguyreports.com/pages/privacy-policy
| REPORT ATTRIBUTE | DETAILS |
| --- | --- |
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 1.57 (USD Billion) |
| MARKET SIZE 2025 | 1.8 (USD Billion) |
| MARKET SIZE 2035 | 7.0 (USD Billion) |
| SEGMENTS COVERED | Application, Type of Annotation, Deployment Model, End Use, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Increased demand for ML models, Growing reliance on automation, Need for high-quality labeled data, Expansion in AI applications, Rising investment in AI technologies |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | DataLabeling, Amazon Mechanical Turk, Mighty AI, Tractable, CloudFactory, Lionbridge AI, Roboflow, Trelent, iMerit, Tagbox, Vannotation, CVEDIA, Scale AI, Samasource, Appen, Turing |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Automated annotation tools demand, Increased AI adoption across industries, Demand for high-quality labeled datasets, Expansion in autonomous vehicle sector, Custom annotation solutions for niche markets |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 14.5% (2025 - 2035) |
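The growth rate in the table can be cross-checked against the two market-size rows. The sketch below is only an illustration of the standard compound-annual-growth formula; the function names `cagr` and `project` are hypothetical helpers, and the only inputs taken from the report are the 2025 and 2035 market sizes.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


# Figures from the table: USD 1.8 billion (2025) and USD 7.0 billion (2035).
implied = cagr(1.8, 7.0, 2035 - 2025)
print(f"Implied CAGR 2025-2035: {implied:.1%}")
```

Under these inputs the implied rate comes out at roughly 14.5%, which agrees with the CAGR row of the table.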
https://spdx.org/licenses/etalab-2.0.html
This data article contains annotation data characterizing Multi Criteria Assessment Methods proposed in the scientific literature by INRA researchers belonging to the Science for Action and Development department. The department's primary mission is to produce generic and finalised information and to develop methods, tools and know-how in its fields of competence, which are mathematics and informatics applied to the sectors of food, agriculture and the environment.
As per our latest research, the global WSI Annotation Services market size stood at USD 1.42 billion in 2024, reflecting robust expansion driven by advancements in artificial intelligence and machine learning applications across diverse sectors. The market is expected to grow at a CAGR of 23.7% from 2025 to 2033, reaching a forecasted value of USD 11.19 billion by 2033. The primary growth factor fueling this remarkable trajectory is the surging demand for high-quality annotated data to train sophisticated AI models, particularly in sectors like autonomous vehicles, healthcare diagnostics, and retail automation.
One of the most significant growth drivers for the WSI Annotation Services market is the escalating adoption of AI-powered solutions across industries. As artificial intelligence becomes increasingly integral to business processes and consumer products, the necessity for accurately annotated data has soared. Companies are leveraging WSI annotation services to enhance the precision of machine learning algorithms, particularly in image, text, video, and audio data domains. This trend is particularly pronounced in sectors such as autonomous vehicles, where annotated data is essential for object detection and navigation, and in healthcare, where annotated medical images underpin diagnostic AI tools. The proliferation of digital transformation initiatives and the need to process large volumes of unstructured data further amplify the market’s expansion.
Another critical growth factor is the rapid evolution of data annotation technologies and methodologies. The market has witnessed substantial investments in automation tools, cloud-based platforms, and AI-assisted annotation frameworks that streamline the annotation process, enhance accuracy, and reduce turnaround times. These advancements are making WSI annotation services more accessible and cost-effective for organizations of all sizes, from startups to large enterprises. Furthermore, the growing emphasis on data privacy and regulatory compliance has spurred the adoption of secure, on-premises, and hybrid deployment models, broadening the market’s appeal across highly regulated industries such as BFSI and healthcare. The integration of advanced quality control mechanisms and scalable annotation workflows has further reinforced market growth.
The increasing focus on industry-specific applications is also propelling the WSI Annotation Services market forward. In retail and e-commerce, for instance, annotated data is pivotal for developing recommendation engines, visual search tools, and customer sentiment analysis. In agriculture, annotation services enable the deployment of precision farming technologies by facilitating crop and livestock monitoring through annotated images and sensor data. The security and surveillance sector is leveraging annotation for facial recognition, anomaly detection, and threat assessment. This diversification of use cases is driving demand for specialized annotation services tailored to the unique requirements of each industry, thereby expanding the market’s scope and value proposition.
From a regional perspective, North America continues to dominate the WSI Annotation Services market, accounting for the largest revenue share in 2024, closely followed by Europe and the Asia Pacific. The presence of leading technology companies, robust digital infrastructure, and a mature AI ecosystem are key factors underpinning North America’s leadership. However, the Asia Pacific region is emerging as the fastest-growing market, fueled by rapid digitization, increasing investments in AI research, and the proliferation of tech startups. Europe’s market growth is supported by strong regulatory frameworks and a focus on ethical AI development. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions increasingly adopt AI-driven solutions.
The WSI Annotation Services market by
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset consists of aerial photography of agricultural plantations with crops such as cabbage and zucchini. The dataset addresses agricultural tasks such as plant detection and counting, health assessment, and irrigation planning.
The dataset includes two types of segmentation:
- Class Segmentation: objects corresponding to one class are identified
- Object Segmentation: all objects are identified separately
Each image in the img folder is accompanied by an XML annotation in the annotations.xml file that gives the coordinates of the polygons. For each point, the x and y coordinates are provided.
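A reader might load such polygon annotations along the following lines. This is a minimal sketch only: the listing does not show the XML schema, so the element and attribute names used here (`image`, `polygon`, `label`, `points` as `x,y` pairs separated by semicolons) and the sample file name are assumptions, not the dataset's confirmed layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real annotations.xml may be structured differently.
SAMPLE_XML = """<annotations>
  <image id="0" name="img/0.png" width="100" height="80">
    <polygon label="cabbage" points="10.0,10.0;30.0,10.0;30.0,25.0;10.0,25.0"/>
  </image>
</annotations>"""


def read_polygons(xml_text):
    """Return {image_name: [(label, [(x, y), ...]), ...]} for the assumed layout."""
    root = ET.fromstring(xml_text)
    result = {}
    for image in root.iter("image"):
        shapes = []
        for poly in image.iter("polygon"):
            # Each point is stored as "x,y"; points are separated by ";".
            points = [tuple(float(v) for v in pair.split(","))
                      for pair in poly.get("points", "").split(";") if pair]
            shapes.append((poly.get("label"), points))
        result[image.get("name")] = shapes
    return result


def polygon_area(points):
    """Shoelace formula: area of a simple polygon in square pixels."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


polygons = read_polygons(SAMPLE_XML)
for name, shapes in polygons.items():
    for label, points in shapes:
        print(name, label, len(points), polygon_area(points))
```

Polygon areas computed this way could feed the plant counting or health assessment tasks mentioned above, once the parsing is adapted to the actual file.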
keywords: agricultural tasks dataset, image segmentation dataset, plantations images dataset, plantations segmentation dataset, land cover dataset, agricultural products dataset, semantic segmentation dataset, agriculture dataset, agricultural data, object detection dataset, plants segmentation dataset, plant detection, plant recognition
https://www.marketreportanalytics.com/privacy-policy
The AI Data Labeling Services market is booming, projected to reach $40B+ by 2033! Learn about market trends, key players (Scale AI, Labelbox, Appen), and growth drivers in this comprehensive analysis. Explore regional insights and understand the impact of cloud-based solutions on this rapidly evolving sector.
https://www.marketreportanalytics.com/privacy-policy
The AI data labeling services market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across various sectors. The market's expansion is fueled by the critical need for high-quality labeled data to train and improve the accuracy of AI algorithms. While precise figures for market size and CAGR are not provided, industry reports suggest a significant market value, potentially exceeding $5 billion by 2025, with a Compound Annual Growth Rate (CAGR) likely in the range of 25-30% from 2025-2033. This rapid growth is attributed to several factors, including the proliferation of AI applications in autonomous vehicles, healthcare diagnostics, e-commerce personalization, and precision agriculture. The increasing availability of cloud-based solutions is also contributing to market expansion, offering scalability and cost-effectiveness for businesses of all sizes. However, challenges remain, such as the high cost of data annotation, the need for skilled labor, and concerns around data privacy and security. The market is segmented by application (automotive, healthcare, retail, agriculture, others) and type (cloud-based, on-premises), with the cloud-based segment expected to dominate due to its flexibility and accessibility. Key players like Scale AI, Labelbox, and Appen are driving innovation and market consolidation through technological advancements and strategic acquisitions. Geographic growth is expected across all regions, with North America and Asia-Pacific anticipated to lead in market share due to high AI adoption rates and significant investments in technological infrastructure. The competitive landscape is dynamic, featuring both established players and emerging startups. Strategic partnerships and mergers and acquisitions are common strategies for market expansion and technological enhancement. Future growth hinges on advancements in automation technologies that reduce the cost and time associated with data labeling. 
Furthermore, the development of more robust and standardized quality control metrics will be crucial for assuring the accuracy and reliability of labeled datasets, which in turn builds trust and furthers the adoption of AI-powered applications. The focus on addressing ethical considerations around data bias and privacy will also play a critical role in shaping the market's future trajectory. Continued innovation in both the technology and business models within the AI data labeling services sector will be vital for sustaining the high growth projected for the coming decade.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a collection of raw and annotated multispectral (MS) images acquired in a heterogeneous agricultural environment with a MicaSense RedEdge-M camera. The spectral bands, namely Green, Blue, Red, Red Edge and Near Infrared (NIR), were acquired at sub-metre level.
The MS images were labelled manually using VIA and automatically using Grounding DINO in combination with the Segment Anything Model. The segmentation masks obtained using these two annotation techniques, as well as the source code to perform the necessary image processing operations, are provided in the repository. The images focus on Horseradish (Raphanus raphanistrum) infestations in Triticum aestivum (wheat) crops.
The convention for sequencing and naming images and annotations follows this format: IMG_
The dataset 'RafanoSet' is organized in 6 directories, namely 'Raw Images', 'Manual Annotations', 'Automated Annotations', 'Binary Masks - Manual', 'Binary Masks - Automated' and 'Codes'. The sub-directory 'Raw Images' consists of 85 manually acquired images in .PNG format over 17 different scenes. The sub-directory 'Manual Annotations' consists of the annotation file 'region_data' in COCO segmentation format. The sub-directory 'Automated Annotations' consists of 80 automatically annotated images in .JPG format and 80 .XML files in Pascal VOC annotation format.
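The Pascal VOC files in 'Automated Annotations' could be read with the standard library, as sketched below. The element names follow the usual Pascal VOC layout (`object`, `name`, `bndbox`, `xmin` and so on); the sample file name, image size, class label and coordinates are hypothetical placeholders, since the listing does not show an actual annotation file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the usual Pascal VOC layout; all values are placeholders.
SAMPLE_VOC = """<annotation>
  <filename>example.jpg</filename>
  <size><width>1280</width><height>960</height><depth>3</depth></size>
  <object>
    <name>weed</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>350</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (filename, [box dicts]) from one Pascal VOC annotation document."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        box = {"label": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # Some tools write coordinates as floats, so parse via float first.
            box[key] = int(float(bndbox.findtext(key)))
        boxes.append(box)
    return root.findtext("filename"), boxes


filename, boxes = read_voc_boxes(SAMPLE_VOC)
print(filename, boxes)
```

For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring` with the rest of the function unchanged.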
The scientific framework of image acquisition and annotation is explained in the Data in Brief paper, which is in the course of peer review. This is just a prerequisite to the data article.
Field experimentation roles:
The image acquisition was performed by Mariano Crimaldi, a researcher, on behalf of the Department of Agriculture and the hosting institution, University of Naples Federico II, Italy.
Shubham Rana has been the curator and analyst for the data under the supervision of his PhD supervisor Prof. Salvatore Gerbino. They are affiliated with the Department of Engineering, University of Campania 'Luigi Vanvitelli'.
Domenico Barretta, Department of Engineering, has served in a consulting and brainstorming role, particularly with data validation, annotation management and litmus testing of the datasets.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains high-resolution synthetic drone imagery of agricultural fields, each image enriched with detailed metadata, geolocation, crop type, and expert-annotated features such as weeds, diseases, and crop health indicators. It supports advanced analytics in precision agriculture, enabling crop health monitoring, automated feature detection, and field management optimization. The dataset is ideal for developing and benchmarking computer vision models and agronomic decision support tools.
https://spdx.org/licenses/etalab-2.0.html
This data article contains annotation data characterizing Multi Criteria Assessment Methods proposed in the scientific literature by INRA researchers belonging to the Animal Health department. Its research is dedicated to animal health and veterinary public health.
Annotation data of genome assemblies of Streptomyces spp. isolated from agricultural soil.
Resources in this dataset:
Resource Title: Annotation data for MCL20-2. File Name: MCL20-2_prokka.zip
Resource Title: Annotation data for SCL15-4. File Name: SCL15-4_prokka.zip
Resource Title: Annotation data for SCL15-6. File Name: SCL15-6_prokka.zip
Resource Title: Annotation data for SJL17-1. File Name: SJL17-1_prokka.zip
Resource Title: Annotation data for SJL17-4. File Name: SJL17-4_prokka.zip
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of five subsets with annotated images in COCO format, designed for object detection and tracking plant growth:
Cucumber_Train Dataset (for Faster R-CNN)
Includes training, validation, and test images of cucumbers from different angles.
Annotations: Bounding boxes in COCO format for object detection tasks.
Annotations: Bounding boxes in COCO format.
Pepper Dataset
Contains images of pepper plants for 24 hours at hourly intervals from a fixed angle.
Annotations: Bounding boxes in COCO format.
Cannabis Dataset
Contains images of cannabis plants for 24 hours at hourly intervals from a fixed angle.
Annotations: Bounding boxes in COCO format.
Cucumber Dataset
Contains images of cucumber plants for 24 hours at hourly intervals from a fixed angle.
Annotations: Bounding boxes in COCO format.
This dataset supports training and evaluation of object detection models across diverse crops.
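As an illustration of working with COCO-format bounding boxes, the sketch below groups boxes by image file. The top-level keys (`images`, `annotations`, `categories`) and the `[x, y, width, height]` box layout follow the COCO convention; the file names, category name and coordinates in the sample are hypothetical and not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical miniature COCO document; a real file would be read with json.load().
SAMPLE_COCO = {
    "images": [
        {"id": 1, "file_name": "pepper_00h.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "pepper_01h.jpg", "width": 640, "height": 480},
    ],
    "categories": [{"id": 1, "name": "pepper"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [50, 60, 100, 120]},
        {"id": 11, "image_id": 2, "category_id": 1, "bbox": [48, 55, 110, 130]},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [300, 200, 40, 40]},
    ],
}


def boxes_by_image(coco):
    """Map each image file name to a list of (category_name, x, y, w, h)."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x_min, y_min, width, height]
        grouped[files[ann["image_id"]]].append((names[ann["category_id"]], x, y, w, h))
    return dict(grouped)


grouped = boxes_by_image(SAMPLE_COCO)
for file_name in sorted(grouped):
    print(file_name, len(grouped[file_name]), "boxes")
```

For the hourly subsets, sorting the grouped result by file name (assuming names encode capture time) would give a per-hour view of the annotated boxes.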
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The real-world dataset RumexWeeds targets the detection of the grassland weeds Rumex obtusifolius L. and Rumex crispus L. RumexWeeds includes whole image sequences with a total of 5,510 images of 2.3 MP resolution and 15,519 manual bounding box annotations, as well as 340 ground-truth pixel-wise annotations, collected at 3 different farms on 4 different days in summer and autumn 2021. Additionally, navigational robot sensor data from GNSS, IMU and odometry are recorded. In a second iteration, we supplement the dataset with joint-stem annotations: for each bounding box in the dataset, an ellipse annotation has been made, representing the potential joint-stem position and the uncertainty of the human annotator. For a detailed description, please consult the related publications as well as the dataset's website: https://dtu-pas.github.io/RumexWeeds/
https://www.zionmarketresearch.com/privacy-policy
The global data annotation tools market size stood at USD 102.38 billion in 2023 and is expected to reach USD 908.57 billion by 2032, at a CAGR of 24.4% from 2024 to 2032.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of drone images obtained for agricultural field monitoring, to detect weeds and crops through computer vision and machine learning approaches. The images were captured with high-resolution UAVs and annotated using the LabelImg and Roboflow tools. Each image has a corresponding YOLO annotation file that contains bounding box information and class IDs for detected objects. The dataset includes:
Original images in .jpg format with a resolution of 585 × 438 pixels.
Annotation files (.txt) corresponding to each image, following the YOLO format: class_id x_center y_center width height.
A classes.txt file listing the object categories used in labeling (e.g., Weed, Crop).
The dataset is intended for use in machine learning model development, particularly for precision agriculture, weed detection, and plant health monitoring. It can be directly used for training YOLOv7 and other object detection models.
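The YOLO annotation lines described above store box centre and size as fractions of the image dimensions, so they need rescaling before drawing or cropping. The following is a minimal sketch of that conversion; the function name `yolo_to_pixels` and the sample line are hypothetical, while the default image size of 585 × 438 pixels is taken from the dataset description.

```python
def yolo_to_pixels(line, img_w=585, img_h=438):
    """Convert 'class_id x_center y_center width height' (normalized to 0..1)
    into (class_id, (xmin, ymin, xmax, ymax)) in pixel units."""
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    xmin = (xc - w / 2) * img_w
    ymin = (yc - h / 2) * img_h
    xmax = (xc + w / 2) * img_w
    ymax = (yc + h / 2) * img_h
    return int(class_id), (xmin, ymin, xmax, ymax)


# Hypothetical annotation line: a box centred in the image,
# covering 20% of its width and 40% of its height.
class_id, box = yolo_to_pixels("0 0.5 0.5 0.2 0.4")
print(class_id, [round(v, 1) for v in box])
```

The class index refers to the line order in classes.txt, so which index means Weed and which means Crop should be read from that file rather than assumed.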
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents a comprehensive collection of annotated images of diseased and healthy leaves across five important agricultural crops: Banana, Chilli, Radish, Groundnut, and Cauliflower. The dataset was created to support research in plant disease detection, precision agriculture, and deep learning-based crop monitoring systems.
Research Hypothesis
Early detection and classification of crop diseases using image-based AI models can significantly reduce yield loss and improve sustainable farming practices. This dataset enables training and evaluation of such AI models across multiple crops and diverse disease types.
What the Data Shows
The dataset contains over 23,000 images captured in real agricultural settings, labeled using bounding box annotations. Each crop includes both healthy and multiple disease-specific categories, with more than 30 total classes (e.g., Sigatoka, Leaf Curl, Anthracnose, Rust, Downy Mildew, Black Rot, etc.).
Notable Features
High-quality images (640×640 resolution), collected using digital cameras and 200MP mobile phone cameras
Annotated with bounding boxes for object detection tasks
Data collected from Chengalpattu, Kanchipuram, and Krishnagiri districts, Tamil Nadu, India
Covers real-world variations in lighting, leaf orientation, and disease stages
How to Interpret and Use the Data
Images are organized by crop name and disease class
Annotations are provided in YOLO format (can be converted to COCO/VOC)
Suitable for training CNN, YOLO, Faster R-CNN, or ViT models for plant disease classification and localization
Ideal for researchers working on edge AI, TinyML, and mobile agriculture apps
Potential Applications
Real-time disease diagnosis in smart farming systems
Academic research in plant pathology and computer vision
Benchmarking object detection models in agricultural settings
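The YOLO-to-COCO conversion mentioned under "How to Interpret and Use the Data" might look like the sketch below. It assumes the 640×640 image size stated above and the usual normalized YOLO line layout; the function name `yolo_to_coco`, the sample line and the choice to reuse the YOLO class index as the COCO `category_id` are illustrative assumptions (COCO category ids are often renumbered from 1 in practice).

```python
def yolo_to_coco(line, img_w=640, img_h=640, image_id=1, ann_id=1):
    """Turn one normalized YOLO line into a COCO-style annotation dict.

    COCO stores boxes as [x_min, y_min, width, height] in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    box_w, box_h = w * img_w, h * img_h
    x_min = xc * img_w - box_w / 2
    y_min = yc * img_h - box_h / 2
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": class_id,  # assumption: YOLO class index reused as-is
        "bbox": [x_min, y_min, box_w, box_h],
        "area": box_w * box_h,
        "iscrowd": 0,
    }


# Hypothetical label line for class index 3.
annotation = yolo_to_coco("3 0.5 0.25 0.25 0.125")
print(annotation["bbox"], annotation["area"])
```

A Pascal VOC box follows from the same numbers as `(x_min, y_min, x_min + box_w, y_min + box_h)`.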
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Agricultural Research Data Network (ARDN) provides dataset annotations which facilitate interoperability. For information on how to use ARDN annotations and other data products, see https://agmip.github.io/ARDN/ARDN_how.html. The ARDN project (https://usda.figshare.com/ARDN) is a network of datasets harmonized and aggregated using a common vocabulary termed ICASA. ICASA is a recommended data dictionary by USDA NAL with full description of all variables here. The Sustainable Corn CAP (Cropping Systems Coordinated Agricultural Project: Climate Change, Mitigation, and Adaptation in Corn-based Cropping Systems) was a multi-state transdisciplinary project supported by the USDA National Institute of Food and Agriculture (Award No. 2011-68002-30190). Research experiments were located through the U.S. Corn Belt and examined farm-level adaptation practices for corn-based cropping systems to current and predicted impacts of climate change. Refer to the parent data set for a complete explanation of sites and practices studied. The data found here are a subset of the parent data specifically developed for the Agricultural Research Data Network with csv and json files for easy ingestion into crop models. No data have been altered from the parent files although they have been reconfigured substantially to enable alignment with the ICASA dictionary. Data have also been subset to include variables of particular interest for the audience with pesticide operations, insect populations, and soil moisture data not included here.
Resources in this dataset:
Resource Title: Site Metadata. File Name: sitemeta.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Plot Identifiers. File Name: plotid.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Weather (On-Site and Network). File Name: weather_onsite.csv. Resource Description: Weather data were not in original published data but included here to better align with ARDN format and tools.
Resource Title: Agronomic. File Name: agro.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Soil. File Name: soil.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Greenhouse Gas. File Name: ghg.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Drain Flow. File Name: flow.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Water Quality. File Name: quality.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Water Table. File Name: watertable.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Planting Operations. File Name: ops_plant.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Harvest Operations. File Name: ops_harv.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Tillage Operations. File Name: ops_till.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Fertilization Operations. File Name: ops_fert.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: Drainage Water Management Operations. File Name: ops_dwm.csv. Resource Description: Reconfigured csv from original published data to better align with ARDN tools.
Resource Title: SC2 for Sustainable Corn CAP Research Data (USDA-NIFA Award No. 2011-68002-30190). File Name: master-sc2.json. Resource Description: ARDN sidecar 2 file which allows dataset to be automatically interpreted and translated to end user formats. Resource Software Recommended: AgMIP QuadUI, url: https://github.com/agmip/quadui/releases