Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MegaWeeds dataset consists of seven existing datasets:
- WeedCrop dataset; Sudars, K., Jasko, J., Namatevs, I., Ozola, L., & Badaukis, N. (2020). Dataset of annotated food crops and weed images for robotic computer vision control. Data in Brief, 31, 105833. https://doi.org/10.1016/j.dib.2020.105833
- Chicory dataset; Gallo, I., Rehman, A. U., Dehkord, R. H., Landro, N., La Grassa, R., & Boschetti, M. (2022). Weed detection by UAV 416a Image Dataset. https://universe.roboflow.com/chicory-crop-weeds-5m7vo/weed-detection-by-uav-416a/dataset/1
- Sesame dataset; Utsav, P., Raviraj, P., & Rayja, M. (2020). crop and weed detection data with bounding boxes. https://www.kaggle.com/datasets/ravirajsinh45/crop-and-weed-detection-data-with-bounding-boxes
- Sugar beet dataset; Wangyongkun. (2020). sugarbeetsAndweeds. https://www.kaggle.com/datasets/wangyongkun/sugarbeetsandweeds
- Weed-Detection-v2; Tandon, K. (2021, June). Weed_Detection_v2. https://www.kaggle.com/datasets/kushagratandon12/weed-detection-v2
- Maize dataset; Correa, J. M. L., Andújar, D., Todeschini, M., Karouta, J., Begochea, J. M., & Ribeiro, A. (2021). WeedMaize. Zenodo. https://doi.org/10.5281/zenodo.5106795
- CottonWeedDet12 dataset; Dang, F., Chen, D., Lu, Y., & Li, Z. (2023). YOLOWeeds: A novel benchmark of YOLO object detectors for multi-class weed detection in cotton production systems. Computers and Electronics in Agriculture, 205, 107655. https://doi.org/10.1016/j.compag.2023.107655
All the datasets contain open-field images of crops and weeds with annotations. The annotation files were converted to text files so that they can be used with the YOLO model. The datasets were combined into one large dataset totaling 19,317 images, which is split into a training and a validation set.
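For reference, a YOLO label file stores one line per object: a class index followed by the box center, width and height, all normalized to the image size. A minimal sketch of such a conversion, assuming corner-style pixel coordinates as input (the function and example values are illustrative, not from the original pipeline):

```python
def voc_box_to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a corner-style pixel box to a normalized YOLO label line."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / float(img_w)
    height = (ymax - ymin) / float(img_h)
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a 100x200 px box with its top-left corner at (50, 80) in a 1280x720 image
print(voc_box_to_yolo_line(0, 50, 80, 150, 280, 1280, 720))
```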
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Agriculture Data is a dataset for object detection tasks - it contains Crops Plants Agriculture annotations for 270 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a collection of raw and annotated multispectral (MS) images acquired in a heterogeneous agricultural environment with a MicaSense RedEdge-M camera. The Blue, Green, Red, Red Edge and Near Infrared (NIR) bands were acquired at sub-metre level. The MS images were labelled manually using VIA and automatically using Grounding DINO in combination with the Segment Anything Model. The segmentation masks obtained with these two annotation techniques, as well as the source code to perform the necessary image processing operations, are provided in the repository. The images focus on horseradish (Raphanus raphanistrum) infestations in Triticum aestivum (wheat) crops.
Images and annotations are named using the format IMG_<scene>_<band>, where the band suffix is 1: Blue, 2: Green, 3: Red, 4: Near Infrared, 5: RedEdge. For example, an image named IMG_0200_3 represents scene number 200 in the Red channel.
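A small sketch of how this naming convention can be parsed programmatically (the helper below is illustrative, not part of the repository's 'Codes' directory):

```python
import re

# Band suffixes per the naming scheme described above
BANDS = {"1": "Blue", "2": "Green", "3": "Red", "4": "Near Infrared", "5": "RedEdge"}

def parse_image_name(name):
    """Split a name like 'IMG_0200_3' into (scene number, band name)."""
    match = re.fullmatch(r"IMG_(\d+)_([1-5])", name)
    if match is None:
        raise ValueError(f"Unexpected image name: {name}")
    return int(match.group(1)), BANDS[match.group(2)]

print(parse_image_name("IMG_0200_3"))  # -> (200, 'Red')
```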
This dataset, 'RafanoSet', is organized into 6 directories: 'Raw Images', 'Manual Annotations', 'Automated Annotations', 'Binary Masks - Manual', 'Binary Masks - Automated' and 'Codes'. The directory 'Raw Images' consists of 85 manually acquired images in .PNG format over 17 different scenes. The directory 'Manual Annotations' contains the annotation file 'region_data' in COCO segmentation format. The directory 'Automated Annotations' contains 80 automatically annotated images in .JPG format and 80 .XML files in Pascal VOC annotation format.
The scientific framework of image acquisition and annotation is explained in the Data in Brief paper, which is in the course of peer review. This repository is a prerequisite to the data article. Field experimentation roles:
The image acquisition was performed by Mariano Crimaldi, a researcher on behalf of the Department of Agriculture and the hosting institution, University of Naples Federico II, Italy.
Shubham Rana has been the curator and analyst for the data under the supervision of his PhD supervisor, Prof. Salvatore Gerbino. They are affiliated with the Department of Engineering, University of Campania 'Luigi Vanvitelli'.
Domenico Barretta, Department of Engineering, has been associated in a consulting and brainstorming role, particularly with data validation, annotation management and litmus testing of the datasets.
Leaves from genetically unique Juglans regia plants were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA. Soil samples were collected in fall 2017 from the riparian oak forest located at the Russell Ranch Sustainable Agricultural Institute at the University of California, Davis. The soil was sieved through a 2 mm mesh and air dried before imaging. A single soil aggregate was scanned at 23 keV using the 10x objective lens with a pixel resolution of 650 nanometers on beamline 8.3.2 at the ALS. Additionally, a drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned using a 4x lens with a pixel resolution of 1.72 µm on beamline 8.3.2 at the ALS.

Raw tomographic image data were reconstructed using TomoPy. Reconstructions were converted to 8-bit tif or png format using ImageJ or the PIL package in Python before further processing. Images were annotated using Intel's Computer Vision Annotation Tool (CVAT) and ImageJ; both are free to use and open source. Leaf images were annotated following Théroux-Rancourt et al. (2020): hand labeling was done directly in ImageJ by drawing around each tissue, with 5 images annotated per leaf. Care was taken to cover a range of anatomical variation to help improve the generalizability of the models to other leaves. All slices were labeled by Dr. Mina Momayyezi and Fiona Duong.

To annotate the flower bud and soil aggregate, images were imported into CVAT. The exterior border of the bud (i.e., bud scales) and flower were annotated in CVAT and exported as masks. Similarly, the exterior of the soil aggregate and particulate organic matter identified by eye were annotated in CVAT and exported as masks. To annotate air spaces in both the bud and soil aggregate, images were imported into ImageJ. A Gaussian blur was applied to each image to decrease noise, and the air space was then segmented using thresholding. After applying the threshold, the selected air-space region was converted to a binary image, with white representing air space and black representing everything else. This binary image was overlaid upon the original image, and the air space within the flower bud and aggregate was selected using the "free hand" tool. Air space outside of the region of interest for both image sets was eliminated. The quality of the air-space annotation was then visually inspected for accuracy against the underlying original image; incomplete annotations were corrected using the brush or pencil tool to paint missing air space white and incorrectly identified air space black. Once the annotation was satisfactorily corrected, the binary image of the air space was saved. Finally, the annotations of the bud and flower, or aggregate and organic matter, were opened in ImageJ and the associated air-space mask was overlaid on top of them, forming a three-layer mask suitable for training the fully convolutional network. All labeling of the soil aggregate images was done by Dr. Devin Rippner.

These images and annotations are for training deep learning models to identify different constituents in leaves, almond buds, and soil aggregates. Limitations: for the walnut leaves, some tissues (stomata, etc.) are not labeled, and the images represent only a small portion of a full leaf. Similarly, both the almond bud and the aggregate represent just one single sample each. The bud tissues are only divided into bud scales, flower, and air space; many other tissues remain unlabeled. For the soil aggregate, labels were assigned by eye with no actual chemical information, so particulate organic matter identification may be incorrect.

Resources in this dataset:

Resource Title: Annotated X-ray CT images and masks of a Forest Soil Aggregate. File Name: forest_soil_images_masks_for_testing_training.zip. Resource Description: This aggregate was collected from the riparian oak forest at the Russell Ranch Sustainable Agricultural Facility. The aggregate was scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of (0,0,0), pore spaces (250,250,250), mineral solids (128,0,0) and particulate organic matter (0,128,0). These files were used for training a model to segment the forest soil aggregate and for testing the accuracy, precision, recall, and F1 score of the model.

Resource Title: Annotated X-ray CT images and masks of an Almond bud (P. dulcis). File Name: Almond_bud_tube_D_P6_training_testing_images_and_masks.zip. Resource Description: A drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned by X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 4x lens with a pixel resolution of 1.72 µm. For masks, the background has a value of (0,0,0), air spaces (255,255,255), bud scales (128,0,0) and flower tissues (0,128,0). These files were used for training a model to segment the almond bud and for testing the accuracy, precision, recall, and F1 score of the model. Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads

Resource Title: Annotated X-ray CT images and masks of Walnut leaves (J. regia). File Name: 6_leaf_training_testing_images_and_masks_for_paper.zip. Resource Description: Stems were collected from genetically unique J. regia accessions at the USDA-ARS-NCGR in Wolfskill Experimental Orchard, Winters, California, USA to use as scion, and were grafted by Sierra Gold Nursery onto a commonly used commercial rootstock, RX1 (J. microcarpa × J. regia). We used a common rootstock to eliminate any own-root effects and to simulate conditions for a commercial walnut orchard setting, where rootstocks are commonly used. The grafted saplings were repotted and transferred to the Armstrong lathe house facility at the University of California, Davis in June 2019 and kept under natural light and temperature. Leaves from each accession and treatment were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of (170,170,170), epidermis (85,85,85), mesophyll (0,0,0), bundle sheath extension (152,152,152), vein (220,220,220) and air (255,255,255). Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads
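Because each mask encodes its classes as fixed gray values, converting a walnut-leaf mask into integer class labels for model training can be sketched as follows (the file name and the particular class-index assignment are hypothetical, chosen here only for illustration):

```python
import numpy as np
from PIL import Image

# Gray values from the walnut-leaf mask description above -> class indices (our choice)
LEAF_VALUE_TO_CLASS = {170: 0,   # background
                       85: 1,    # epidermis
                       0: 2,     # mesophyll
                       152: 3,   # bundle sheath extension
                       220: 4,   # vein
                       255: 5}   # air

def mask_to_class_indices(path):
    """Load a mask image and map its gray values to integer class labels."""
    mask = np.array(Image.open(path).convert("L"))
    labels = np.zeros(mask.shape, dtype=np.uint8)
    for value, class_idx in LEAF_VALUE_TO_CLASS.items():
        labels[mask == value] = class_idx
    return labels

labels = mask_to_class_indices("leaf_mask_0001.png")  # hypothetical file name
```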
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The rise of artificial intelligence (AI), and in particular modern machine learning (ML) algorithms, during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, these algorithms' main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML development, and while promising results on synthetically generated training data have been shown, their generation is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we instead introduce a structured, quantitative approach. Our evaluation shows superior generalization results when compared to using non-task-specific real training data, and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Applications of convolutional neural network (CNN)-based object detectors in agriculture have been a popular research topic in recent years. However, complicated agricultural environments bring many difficulties for ground truth annotation as well as potential uncertainties for image data quality. Using YOLOv4 as a representation of state-of-the-art object detectors, this study quantified YOLOv4's sensitivity to artificial image distortions including white noise, motion blur, hue shift, saturation change, and intensity change, and examined the importance of various training dataset attributes based on model classification accuracies, including dataset size, label quality, negative sample presence, image sequence, and image distortion levels. Judged by a 30% mean average precision (mAP) threshold across four apple flower bud growth stages, the YOLOv4 model trained and validated on the original datasets failed at 31.91% white noise, 22.05-pixel motion blur, 77.38° clockwise hue shift, 64.81° counterclockwise hue shift, 89.98% saturation decrease, 895.35% saturation increase, 79.80% intensity decrease, and 162.71% intensity increase. The performance of YOLOv4 decreased with both declining training dataset size and declining training image label quality. Negative samples and training image sequence did not make a substantial difference in model performance. Incorporating distorted images during training improved the classification accuracies of YOLOv4 models on noisy test datasets by 13 to 390%. In the context of apple flower bud growth-stage classification, except for motion blur, YOLOv4 is sufficiently robust to real-life image distortions from white noise, hue shift, saturation change, and intensity change. Training image label quality and training instance number are more important factors than training dataset size. Exposing models to training images that resemble the test images is crucial for optimal model classification accuracy. The study enhances understanding of implementing object detectors in agricultural research.
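As a rough illustration of two of the distortion types studied (this is not the paper's code; the noise parameterization and hue convention below are assumptions), white noise and hue shift can be applied with OpenCV like this:

```python
import cv2
import numpy as np

def add_white_noise(img, level):
    """Add zero-mean Gaussian noise; level is the standard deviation as a fraction of 255."""
    noise = np.random.normal(0.0, level * 255.0, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def shift_hue(img, degrees):
    """Shift hue by the given angle; OpenCV stores 8-bit hue in [0, 179] (2 degrees per unit)."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.int16)
    hsv[..., 0] = (hsv[..., 0] + int(round(degrees / 2.0))) % 180
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

img = cv2.imread("flower_bud.jpg")      # hypothetical image
noisy = add_white_noise(img, 0.32)      # roughly the white-noise failure level above
shifted = shift_hue(img, 77.0)          # roughly the clockwise hue-shift failure level
```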
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of five subsets with annotated images in COCO format, designed for object detection and tracking plant growth:

1. Cucumber_Train Dataset (for Faster R-CNN)
   - Includes training, validation, and test images of cucumbers from different angles.
   - Annotations: bounding boxes in COCO format for object detection tasks.
2. Pepper Dataset
   - Contains images of pepper plants taken over 24 hours at hourly intervals from a fixed angle.
   - Annotations: bounding boxes in COCO format.
3. Cannabis Dataset
   - Contains images of cannabis plants taken over 24 hours at hourly intervals from a fixed angle.
   - Annotations: bounding boxes in COCO format.
4. Cucumber Dataset
   - Contains images of cucumber plants taken over 24 hours at hourly intervals from a fixed angle.
   - Annotations: bounding boxes in COCO format.
This dataset supports training and evaluation of object detection models across diverse crops.
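Since all subsets use COCO-format bounding boxes, the annotations can be browsed with the standard pycocotools API; a minimal sketch (the annotation file path is hypothetical):

```python
from pycocotools.coco import COCO

coco = COCO("Cucumber_Train/annotations.json")  # hypothetical path
image_ids = coco.getImgIds()
annotations = coco.loadAnns(coco.getAnnIds(imgIds=image_ids[0]))

for ann in annotations:
    x, y, width, height = ann["bbox"]  # COCO boxes are [x, y, width, height]
    print(coco.loadCats(ann["category_id"])[0]["name"], x, y, width, height)
```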
Agriculture-Vision aims to be a publicly available large-scale aerial agricultural image dataset that is high-resolution, multi-band, and with multiple types of patterns annotated by agronomy experts. The original dataset affiliated with the 2020 CVPR paper includes 94,986 512×512 images sampled from 3,432 farmlands with nine types of annotations: double plant, drydown, endrow, nutrient deficiency, planter skip, storm damage, water, waterway and weed cluster. All of these patterns have substantial impacts on field conditions and the final yield. These farmland images were captured between 2017 and 2019 across multiple growing seasons in numerous farming locations in the US. Each field image contains four color channels: Near-infrared (NIR), Red, Green and Blue. We first randomly split the 3,432 farmland images with a 6/2/2 train/val/test ratio. We then assign each sampled image to the split of the farmland image it is cropped from. This guarantees that no cropped images from the same farmland appear in multiple splits in the final dataset. The generated (supervised) Agriculture-Vision dataset thus contains 56,944/18,334/19,708 train/val/test images. Additionally, we continue to grow this dataset. In 2021, as a part of the Prize Challenge at CVPR, we added sequences of full-field imagery across 52 fields to promote the use of weakly supervised methods.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The real-world dataset RumexWeeds targets the detection of the grassland weeds Rumex obtusifolius L. and Rumex crispus L. RumexWeeds includes whole image sequences with a total of 5,510 images at 2.3 MP resolution, 15,519 manual bounding-box annotations, and 340 ground-truth pixel-wise annotations, collected at 3 different farms on 4 different days in summer and autumn 2021. Additionally, navigational robot sensor data from GNSS, IMU and odometry are recorded. In a second iteration, we supplement the dataset with joint-stem annotations: for each bounding box in the dataset, an ellipse annotation has been performed, representing the potential joint-stem position and the uncertainty of the human annotator. For a detailed description, please consider the related publications as well as the dataset's website: https://dtu-pas.github.io/RumexWeeds/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ubaid, M.T.; Javaid, S. Precision Agriculture: Computer Vision-Enabled Sugarcane Plant Counting in the Tillering Phase. Journal of Imaging 2024, 10, 102. https://doi.org/10.3390/jimaging10050102
Description
Plant annotation is the process of identifying and naming certain aspects or characteristics of plant species, usually for research, categorization, or agriculture. This technique is frequently carried out manually by specialists or by automated systems that employ image recognition technologies. Annotations give useful information on plants' morphology, phenology, diseases, and genetic characteristics, and may include labels for anatomical structures. Annotations may also categorize plants based on their development stage, health status, or species identification. In agriculture, plant annotations are used to monitor crop development, detect pests and diseases, optimize cultivation practices, and improve production estimates. Additionally, annotated plant datasets are useful resources for training machine learning models for automated plant recognition and analysis tasks.
The images were labeled using the labeling tool labelImg. The cane under the leaves was labeled. Annotating the images was difficult because the visible cane section is so small; labeling requires care and accuracy while drawing a bounding box around the cane. Across 175 photos, around 18,650 bounding boxes were drawn. The bounding boxes were assigned the class name "sugarcane".
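labelImg writes Pascal VOC XML by default (a YOLO text export is also available). A hedged sketch for reading the "sugarcane" boxes back from one annotation file (the file name is hypothetical):

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Return (class name, [xmin, ymin, xmax, ymax]) pairs from a Pascal VOC file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        coords = [int(float(bndbox.find(tag).text))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, coords))
    return boxes

boxes = read_voc_boxes("sugarcane_0001.xml")  # hypothetical annotation file
print(len(boxes), "boxes, e.g.", boxes[0])
```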
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Smart Farming is a dataset for object detection tasks - it contains Chilly annotations for 722 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
This cherry tree disease detection dataset is a multimodal, multi-angle dataset which was constructed for monitoring the growth of cherry trees, including stress analysis and prediction. An orchard of cherry trees is considered in the area of Western Macedonia, where 577 cherry trees were recorded in a full crop season starting from Jul. 2021 to Jul. 2022. The dataset includes a) aerial / Unmanned Aerial Vehicle (UAV) images, b) ground RGB images/photos, and c) ground multispectral images/photos. Two agronomist experts annotated the dataset by identifying a stress, which in this case is a common disease in cherry trees known as Armillaria [1][2].
2. Citation
Please cite the following papers when using this dataset:
C. Chaschatzis, C. Karaiskou, E. Mouratidis, E. Karagiannis, and P. Sarigiannidis, “Detection and Characterization of Stressed Sweet Cherry Tissues Using Machine Learning”, Drones, vol. 6, no. 1, 2022.
P. Radoglou-Grammatikis, P. Sarigiannidis, T. Lagkas, & I. Moscholios, “A compilation of UAV applications for precision agriculture,” Computer Networks, vol. 172, no. 107148, 2020.
A. Lytos, T. Lagkas, P. Sarigiannidis, M. Zervakis, & G. Livanos, “Towards smart farming: Systems, frameworks and exploitation of multiple sources,” Computer Networks, vol. 172, no. 107147, 2020.
3. Cherry tree mapping
In this dataset, an orchard of cherry trees is considered in the area of Western Macedonia, where 577 cherry trees were recorded in a full crop season starting from Jul. 2021 to Jul. 2022. The tree mapping within the orchard is depicted in Fig. 1 (please refer to the ReadMe file), where each circle represents a cherry tree. Labels on the circles (green, red, etc.) will be elaborated in the following sections. The five time periods in which the orchard was recorded are: 8th of Jul. 2021, 16th of Sep. 2021, 3rd of Nov. 2021, 26th of May 2022, and 13th of Jul. 2022, providing data for a full year of the life cycle.
4. Dataset Modalities
The dataset includes a) aerial / Unmanned Aerial Vehicle (UAV) images, b) ground RGB images/photos, and c) ground multispectral images/photos. Two agronomist experts annotated the dataset by identifying a stress, which in this case is a common disease in cherry trees known as Armillaria [1][2]. In particular, the following modalities are featured in the dataset:
Ground RGB images
Ground multispectral images
UAV/Aerial images (RGB, multispectral, and NDVI).
These modalities represent the cherry tree cultivation at multiple levels; each modality describes the same object (cherry tree) within the dataset. For example, Fig. 2 (please refer to the ReadMe file) shows RGB images, Fig. 3 (please refer to the ReadMe file) illustrates multispectral images, and Fig. 4 (please refer to the ReadMe file) provides UAV images. All images show the same cherry trees under three aspects (RGB, multispectral, and UAV).
5. Dataset Collection & Annotation
This dataset was annotated by two agronomist experts in terms of disease stage (Armillaria). In particular, they annotated each cherry tree, one by one, in four levels of disease stage:
Healthy: the cherry tree is completely healthy;
Stage1: Armillaria is present in light form in the cherry tree;
Stage2: Armillaria is present in advanced form;
Stage3: the cherry tree is killed due to Armillaria.
The annotation process was considered by each one of the underlying modalities (RGB, multispectral and UAV/aerial).
5.1 Image Collection
The image collection is depicted in the following image (please refer to the ReadMe file) in terms of the three modalities (aerial / Unmanned Aerial Vehicle (UAV) images, ground RGB images/photos, and ground multispectral images/photos).
5.2 Dataset Overview
The dataset overview is depicted in Table 1 (please refer to the ReadMe file).
6. Structure and Format
6.1 Dataset Structure
The provided dataset has the following structure (please refer to the ReadMe file).
6.2 Guide to edit the *.tif files
The Aerial/UAV directories contain images obtained from the UAV camera in .tif format. To open these images, you will need QGIS or another relevant program, or you can load them using the corresponding Python libraries. Please follow the steps below:
Open QGIS
Locate the browser window in QGIS
Navigate to the folder that contains the images and select all the images in the layer.
Once you have selected the images, select Add Layer to Project, and the selected image will be added to your map.
For accessing the image data with the OpenCV Python library, a code example is provided (please refer to the ReadMe file).
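Along the same lines, a minimal sketch (not the ReadMe's exact code; the file name is hypothetical) reads a .tif with OpenCV and rescales it for display. Note that OpenCV reads the raster but ignores georeferencing metadata, for which a library such as rasterio would be needed:

```python
import cv2

# IMREAD_UNCHANGED keeps the original bit depth and channel count of the .tif
img = cv2.imread("orchard_multispectral.tif", cv2.IMREAD_UNCHANGED)  # hypothetical file
print(img.dtype, img.shape)

# Rescale to 8-bit for quick visualization
vis = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("orchard_preview.png", vis)
```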
7. Acknowledgment
This work was co‐financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: Τ1EDK-04759).
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements No. 957406 (TERMINET).
References
[1] Devkota, P.; Iezzoni, A.; Gasic, K.; Reighard, G.; Hammerschmidt, R. Evaluation of the susceptibility of Prunus rootstock genotypes to Armillaria and Desarmillaria species. Eur. J. Plant Pathol. 2020, 158, 177–193.
[2] Devkota, P.; Hammerschmidt, R. “The infection process of Armillaria mellea and Armillaria solidipes”. Physiol. Mol. Plant Pathol. 2020, 112, 101543.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Precision Agriculture is a dataset for semantic segmentation tasks - it contains Precise Maps annotations for 4,121 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset focuses on drone-based rice panicle detection in Gazipur, Bangladesh, offering valuable visual data to researchers in agricultural studies. Captured using an advanced drone with a 4K resolution camera, the dataset comprises 2193 high-resolution images of rice fields and 5701 images after augmentation. All the images are annotated with precision to aid in automated rice panicle identification. Its main purpose is to support the development of algorithms and systems for critical agricultural tasks like crop monitoring and yield estimation, as well as disease identification and plant health evaluation. The dataset's creation involved extracting frames from drone-recorded video footage and meticulously annotating them with manual and deep learning algorithms using a semi-automatic approach.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
SemanticSugarBeets is a comprehensive dataset and framework designed for analyzing post-harvest and post-storage sugar beets using monocular RGB images. It supports three key tasks: instance segmentation to identify and delineate individual sugar beets, semantic segmentation to classify specific regions of each beet (e.g., damage, soil adhesion, vegetation, and rot) and oriented object detection to estimate the size and mass of beets using reference objects. The dataset includes 952 annotated images with 2,920 sugar-beet instances, captured both before and after storage. Accompanying the dataset is a demo application and processing code, available on GitHub. For more details, refer to the paper presented at the Agriculture-Vision Workshop at CVPR 2025.
The dataset supports three primary learning tasks, each designed to address specific aspects of sugar-beet analysis:
The dataset is organized into the following directories:
File names of images and annotations follow this format:
ssb-
If you use the SemanticSugarBeets dataset or source code in your research, please cite the following paper to acknowledge the authors' contributions:
Croonen, G., Trondl, A., Simon, J., Steininger, D., 2025. SemanticSugarBeets: A Multi-Task Framework and Dataset for Inspecting Harvest and Storage Characteristics of Sugar Beets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.
This is a detailed description of the dataset: a datasheet for the dataset, as proposed by Gebru et al.
Motivation for Dataset Creation

Why was the dataset created? Embrapa ADD 256 (Apples by Drones Detection Dataset — 256 × 256) was created to provide images and annotation for research on apple detection in orchards for UAV-based monitoring in apple production.
What (other) tasks could the dataset be used for? Apple detection in low-resolution scenarios, similar to the aerial images employed here.
Who funded the creation of the dataset? The building of the ADD256 dataset was supported by the Embrapa SEG Project 01.14.09.001.05.04, Image-based metrology for Precision Agriculture and Phenotyping, and FAPESP under grant (2017/19282-7).
Dataset Composition

What are the instances? Each instance consists of an RGB image and an annotation describing apple locations as circular markers (i.e., presenting center and radius).
How many instances of each type are there? The dataset consists of 1,139 images containing 2,471 apples.
What data does each instance consist of? Each instance contains an 8-bit RGB image. Its corresponding annotation is found in the JSON files: each apple marker is composed of its center (cx, cy) and its radius (in pixels), as seen below:
"gebler-003-06.jpg": [ { "cx": 116, "cy": 117, "r": 10 }, { "cx": 134, "cy": 113, "r": 10 }, { "cx": 221, "cy": 95, "r": 11 }, { "cx": 206, "cy": 61, "r": 11 }, { "cx": 92, "cy": 1, "r": 10 } ],
Dataset.ipynb is a Jupyter Notebook presenting a code example for reading the data as a PyTorch Dataset (it should be straightforward to adapt the code for other frameworks such as Keras/TensorFlow, fastai/PyTorch, Scikit-learn, etc.).
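A condensed sketch of what such a Dataset can look like (an illustration of the annotation format above, not the notebook's exact code; the directory and file names are hypothetical):

```python
import json
from PIL import Image
from torch.utils.data import Dataset

class ADD256(Dataset):
    """Yields (image, apple markers) pairs from the JSON annotation format above."""
    def __init__(self, image_dir, annotation_file):
        with open(annotation_file) as f:
            self.annotations = json.load(f)  # {"name.jpg": [{"cx": ..., "cy": ..., "r": ...}, ...]}
        self.names = sorted(self.annotations)
        self.image_dir = image_dir

    def __len__(self):
        return len(self.names)

    def __getitem__(self, index):
        name = self.names[index]
        image = Image.open(f"{self.image_dir}/{name}").convert("RGB")
        return image, self.annotations[name]

dataset = ADD256("images/train", "annotations/train.json")  # hypothetical paths
```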
Is everything included or does the data rely on external resources? Everything is included in the dataset.
Are there recommended data splits or evaluation measures? The dataset comes with specified train/test splits. The splits are found in lists stored as JSON files.
| | Number of images | Number of annotated apples |
| --- | --- | --- |
| Training | 1,025 | 2,204 |
| Test | 114 | 267 |
| Total | 1,139 | 2,471 |
Dataset recommended split.
Standard measures from the information retrieval and computer vision literature should be employed: precision, recall, F1-score and average precision, as seen in COCO and Pascal VOC.
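For reference, the point metrics reduce to simple ratios of true positives (TP), false positives (FP) and false negatives (FN); a minimal sketch:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from raw match counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

print(precision_recall_f1(tp=90, fp=10, fn=20))  # -> (0.9, 0.818..., 0.857...)
```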
What experiments were initially run on this dataset? The first experiments run on this dataset are described in A methodology for detection and location of fruits in apples orchards from aerial images by Santos & Gebler (2021).
Data Collection Process

How was the data collected? The data employed in the development of the methodology came from two plots located at Embrapa's Temperate Climate Fruit Growing Experimental Station at Vacaria-RS (28°30’58.2”S, 50°52’52.2”W). Plants of the varieties Fuji and Gala are present in the dataset in equal proportions. The images were taken on December 13, 2018, by a UAV (DJI Phantom 4 Pro) that flew over the rows of the field at a height of 12 m. The images mix nadir and non-nadir views, allowing a more extensive view of the canopies. A subset of the images was randomly selected, and 256 × 256 pixel patches were extracted.
Who was involved in the data collection process? T. T. Santos and L. Gebler captured the images in field. T. T. Santos performed the annotation.
How was the data associated with each instance acquired? The circular markers were annotated using the VGG Image Annotator (VIA).
WARNING: Finding non-ripe apples in low-resolution images of orchards is a challenging task even for humans. ADD256 was annotated by a single annotator, so users should consider it a noisy dataset.
Data Preprocessing

What preprocessing/cleaning was done? No preprocessing was applied.
Dataset Distribution

How is the dataset distributed? The dataset is available at GitHub.
When will the dataset be released/first distributed? The dataset was released in October 2021.
What license (if any) is it distributed under? The data is released under Creative Commons BY-NC 4.0 (Attribution-NonCommercial 4.0 International license). There is a request to cite the corresponding paper if the dataset is used. For commercial use, contact Embrapa Agricultural Informatics business office.
Are there any fees or access/export restrictions? There are no fees or restrictions. For commercial use, contact Embrapa Agricultural Informatics business office.
Dataset Maintenance

Who is supporting/hosting/maintaining the dataset? The dataset is hosted at Embrapa Agricultural Informatics and all comments or requests can be sent to Thiago T. Santos (maintainer).
Will the dataset be updated? There are no scheduled updates.
If others want to extend/augment/build on this dataset, is there a mechanism for them to do so? Contributors should contact the maintainer by e-mail.
No warranty

The maintainers and their institutions are exempt from any liability, judicial or extrajudicial, for any losses or damages arising from the use of the data contained in the image database.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The repository provides the following subdirectories:
TimberVision consists of multiple subsets for different application scenarios. To identify them, file names of images and annotations include the following prefixes:
If you use the TimberVision dataset for your research, please cite the original paper:
Steininger, D., Simon, J., Trondl, A., Murschitz, M., 2025. TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
https://doi.org/10.4121/resource:terms_of_use
This dataset consists of per-pixel annotated synthetic (10,500) and empirical (50) images of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models used to generate the synthetic images are included.
The aim of the dataset is to facilitate bootstrapping agricultural semantic segmentation computer vision models with synthetic data, with fine-tuning and testing on empirical images.
ACFR Orchard Fruit Dataset is an agricultural dataset containing images and annotations for different fruits, collected at different farms across Australia. The dataset was gathered by the agriculture team at the Australian Centre for Field Robotics, The University of Sydney, Australia.