Automated leaf segmentation is a challenging area in computer vision. Recent advances in machine learning have achieved better results than traditional image processing techniques; however, training such systems often requires large annotated data sets. To contribute annotated data and help overcome this bottleneck in plant phenotyping research, we provide a novel photometric stereo (PS) data set with annotated leaf masks. This data set forms part of the work done in the BBSRC Tools and Resources Development project BB/N02334X/1.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset comes from the repository contributed by Pratik Kayal and Naman Jain. Note that this dataset focuses on classification and does not include bounding boxes or other object recognition elements. File names have been formatted.
The Cropped-PlantDoc dataset was used for benchmarking classification models in the paper titled "PlantDoc: A Dataset for Visual Plant Disease Detection" which was accepted in the Research Track at ACM India Joint International Conference on Data Science and Management of Data (CoDS-COMAD 2020).
India loses 35% of the annual crop yield due to plant diseases. Early detection of plant diseases remains difficult due to the lack of lab infrastructure and expertise. In this paper, we explore the possibility of computer vision approaches for scalable and early plant disease detection. The lack of availability of sufficiently large-scale non-lab data set remains a major challenge for enabling vision based plant disease detection. Against this background, we present PlantDoc: a dataset for visual plant disease detection. Our dataset contains 2,598 data points in total across 13 plant species and up to 17 classes of diseases, involving approximately 300 human hours of effort in annotating internet scraped images. To show the efficacy of our dataset, we learn 3 models for the task of plant disease classification. Our results show that modelling using our dataset can increase the classification accuracy by up to 31%. We believe that our dataset can help reduce the entry barrier of computer vision techniques in plant disease detection.
For the full paper, refer to arXiv and ACM.
Davinder Singh*, Naman Jain*, Pranjali Jain*, Pratik Kayal*, Sudhakar Kumawat and Nipun Batra
@inproceedings{10.1145/3371158.3371196,
author = {Singh, Davinder and Jain, Naman and Jain, Pranjali and Kayal, Pratik and Kumawat, Sudhakar and Batra, Nipun},
title = {PlantDoc: A Dataset for Visual Plant Disease Detection},
year = {2020},
isbn = {9781450377386},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3371158.3371196},
doi = {10.1145/3371158.3371196},
booktitle = {Proceedings of the 7th ACM IKDD CoDS and 25th COMAD},
pages = {249–253},
numpages = {5},
keywords = {Deep Learning, Object Detection, Image Classification},
location = {Hyderabad, India},
series = {CoDS COMAD 2020}
}
Creative Commons Attribution 4.0 International
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Plant is a dataset for object detection tasks - it contains Plant annotations for 350 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore an extensive dataset of 30,000 plant images, with 1,000 images per class and a diverse collection of 30 plant classes and 7 plant types.
This dataset contains 3 main feature classes. See the detailed description of each feature class in the individual metadata files below:
MNDNR Native Plant Communities
DNR NPC and Land Cover - EWR
DNR NPC and Land Cover - Parks and Trails
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Antarctic Plant Database is a database of the plant collections held in the British Antarctic Survey's herbarium (international code AAS). This contains over 50,000 plant specimens from Antarctica, the sub-Antarctic Islands and surrounding continents (especially Fuegia and Patagonia). Over 2000 species are represented, comprising predominantly mosses, liverworts and lichens with smaller collections of vascular plants, macro-algae and macro-fungi.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two datasets and one table are uploaded to this platform under the title "MED117_Medicinal Plant Leaf Dataset & Name Table". The top-level folder, titled "MED 117 Leaf Species", contains two subfolders: "Raw leaf image set of medicinal plants_v2" and "Segmented leaf set using UNET segmentation". The raw leaf image set consists of leaf images of 117 medicinal plants found in Assam. All samples were collected by visiting government, public and private medicinal gardens in different parts of Assam, along with some other general places where the plants are commonly found. Videos of 10 to 15 seconds were recorded for two to three leaves of every species on a white background using a Canon SLR camera. The individual videos were split into image frames, yielding around 77,700 JPG frames. In the raw leaf image set, each folder is named with the scientific name followed by the common name in brackets. The second folder, "Segmented leaf set using UNET segmentation", contains segmented leaf image samples for 115 medicinal plant species, produced with the UNET segmentation technique. Two species were excluded from the original dataset because of the small, unpredictable size of their samples, which leaves 115 subfolders in the segmented folder. Finally, a table in doc format titled "Medicinal Plant Name Table" lists the scientific name, common name and Assamese name of the plants in the same sequence as the folders. The whole contribution is original and new: the samples were collected from different sources and then processed for segmentation, and the table was prepared in discussion with taxonomy experts from the Botany department of Gauhati University, Guwahati, Assam, India.
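The folder naming described above (scientific name, with the common name in brackets) can be split programmatically. The following is a minimal sketch in Python, assuming folder names of the form `Scientific name (Common name)`; the exact spelling and spacing in the dataset are not confirmed here. The function name `parse_species_folder` and the example name are hypothetical and not taken from the dataset.

```python
import re
from typing import NamedTuple, Optional


class SpeciesFolder(NamedTuple):
    scientific_name: str
    common_name: Optional[str]


# Assumed folder-name shape: "<scientific name> (<common name>)".
# The bracketed part is treated as optional.
_FOLDER_PATTERN = re.compile(
    r"^(?P<scientific>[^()]+?)\s*(?:\((?P<common>[^()]*)\))?\s*$"
)


def parse_species_folder(folder_name: str) -> SpeciesFolder:
    """Split a folder name into scientific and common name (hypothetical helper)."""
    match = _FOLDER_PATTERN.match(folder_name.strip())
    if match is None:
        raise ValueError(f"Unrecognised folder name: {folder_name!r}")
    scientific = match.group("scientific").strip()
    common = (match.group("common") or "").strip()
    # An empty or missing bracket is reported as None.
    return SpeciesFolder(scientific, common or None)


# Illustrative name only; not confirmed to be a folder in the dataset.
example = parse_species_folder("Ocimum tenuiflorum (Tulsi)")
```

Names that start with a bracket or are empty raise `ValueError`, so unexpected folder names surface early instead of being silently mislabelled.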
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Juliekyungyoon/plant-kaggle-seg-data dataset hosted on Hugging Face and contributed by the HF Datasets community
https://www.neonscience.org/data-samples/data-policies-citation
Plant species cover-abundance and presence observed in multi-scale plots. Plant species and associated percent cover in 1 m² subplots and plant species presence in 10 m² and 100 m² subplots are reported from 400 m² plots. Archived plant vouchers and foliar tissue support the data and additional analyses.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Outdoor Plant is a dataset for classification tasks - it contains Plant annotations for 300 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This is a rice plant dataset that contains both healthy and unhealthy images. I collected this dataset for my research work on the diseases in plants and mainly focused on rice plants because rice is one of the economic crops of Pakistan. This dataset was collected from different cities in Pakistan such as Kandhkot, Shikarpur, Sukkur, Moro, and Kashmore.
I used a DSLR camera to capture the images and tried my best to collect the most helpful dataset. I used this dataset for my research on detecting plant diseases such as fungal blast disease, and published a paper based on it entitled "Fungal Blast Disease Detection in Rice Seed Using Machine Learning" in IJACSA (International Journal of Advanced Computer Science and Applications).
This dataset has already been preprocessed with image processing steps. I performed the necessary data augmentation tasks, such as rescaling, cropping, enhancement, contrast adjustment, flipping, and saturation adjustment, to make the dataset usable.
In case of a query or question you can directly contact me regarding this dataset. I am available to help you.
NOTE: PLEASE DON'T FORGET TO CITE THIS DATASET WITH MY REFERENCE PAPER GIVEN BELOW.
Raj Kumar, Gulsher Baloch, Pankaj, Abdul Baseer Buriro and Junaid Bhatti, “Fungal Blast Disease Detection in Rice Seed using Machine Learning” International Journal of Advanced Computer Science and Applications(IJACSA), 12(2), 2021. http://dx.doi.org/10.14569/IJACSA.2021.0120232
DOI Link: https://dx.doi.org/10.14569/IJACSA.2021.0120232
Thanks and regards,
Engr. Raj Kumar | Research Scholar @ Jeju National University, South Korea
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
[NOTE: PLEXdb is no longer available online. Oct 2019.] PLEXdb (Plant Expression Database) is a unified gene expression resource for plants and plant pathogens. PLEXdb is a genotype to phenotype, hypothesis building information warehouse, leveraging highly parallel expression data with seamless portals to related genetic, physical, and pathway data. PLEXdb (http://www.plexdb.org), in partnership with community databases, supports comparisons of gene expression across multiple plant and pathogen species, promoting individuals and/or consortia to upload genome-scale data sets to contrast them to previously archived data. These analyses facilitate the interpretation of structure, function and regulation of genes in economically important plants. A list of Gene Atlas experiments highlights data sets that give responses across different developmental stages, conditions and tissues. Tools at PLEXdb allow users to perform complex analyses quickly and easily. The Model Genome Interrogator (MGI) tool supports mapping gene lists onto corresponding genes from model plant organisms, including rice and Arabidopsis. MGI predicts homologies, displays gene structures and supporting information for annotated genes and full-length cDNAs. The gene list-processing wizard guides users through PLEXdb functions for creating, analyzing, annotating and managing gene lists. Users can upload their own lists or create them from the output of PLEXdb tools, and then apply diverse higher level analyses, such as ANOVA and clustering. PLEXdb also provides methods for users to track how gene expression changes across many different experiments using the Gene OscilloScope. This tool can identify interesting expression patterns, such as up-regulation under diverse conditions or checking any gene’s suitability as a steady-state control.
Resources in this dataset:
Resource Title: Website Pointer for Plant Expression Database, Iowa State University.
File Name: Web Page, url: https://www.bcb.iastate.edu/plant-expression-database [NOTE: PLEXdb is no longer available online. Oct 2019.] Project description for the Plant Expression Database (PLEXdb) and integrated tools.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This unique and large data set contains plant information for the Himalaya Uplands; it consists of 164,360 records. The database is implemented in MS Access following ABCD 1.2. It describes Asian plant species related to the Tibetan Plateau, Central Asia. Data have been collected for over 50 years and in over 11 countries (e.g. Afghanistan, Pakistan, Bhutan, China, India, Kazakhstan, Kyrgyzstan, Myanmar, Nepal, Russia, Tajikistan, Turkmenistan, Uzbekistan), covering over 220 national regions. Taxonomic information for this region is diverse and not well studied. However, the database follows ICBN taxonomy matched with ITIS and consists of over 5,562 unique species entries, of which 996 species are listed in ITIS. Over 2,200 collectors from all over the world contributed to this dataset, which was mostly compiled and maintained by the author for over 20 years. The database covers 21,869 localities. Virtually all sites are georeferenced with latitude and longitude (2 decimals; geographic datum WGS84), and 6,668 such unique locations are found in the HUP database. Altitude information is provided by the fieldworker.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Global Power Plant Database is a comprehensive, open source database of power plants around the world. It centralizes power plant data to make it easier to navigate, compare and draw insights for one’s own analysis. The database covers approximately 35,000 power plants from 167 countries and includes thermal plants (e.g. coal, gas, oil, nuclear, biomass, waste, geothermal) and renewables (e.g. hydro, wind, solar). Each power plant is geolocated and entries contain information on plant capacity, generation, ownership, and fuel type. It will be continuously updated as data becomes available.
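Because each entry carries capacity and fuel type, a typical first analysis is to total capacity per fuel. The sketch below is a minimal Python illustration under stated assumptions: the field names `primary_fuel` and `capacity_mw` are assumed rather than taken from the description above, and the records are made up for the example.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def capacity_by_fuel(plants: Iterable[Mapping[str, Any]]) -> Dict[str, float]:
    """Sum plant capacity per fuel type (field names are assumptions)."""
    totals: Dict[str, float] = defaultdict(float)
    for plant in plants:
        totals[str(plant["primary_fuel"])] += float(plant["capacity_mw"])
    return dict(totals)


# Made-up records for illustration only; not rows from the database.
sample_plants = [
    {"name": "Plant A", "primary_fuel": "Hydro", "capacity_mw": 120.0},
    {"name": "Plant B", "primary_fuel": "Coal", "capacity_mw": 600.0},
    {"name": "Plant C", "primary_fuel": "Hydro", "capacity_mw": 30.5},
]

totals = capacity_by_fuel(sample_plants)
```

The same function works on rows read with `csv.DictReader`, since it converts the capacity value with `float()` before summing.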
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here we present version 2.0 of the China Plant Trait Database, which contains information on morphometric, physical, chemical, photosynthetic and hydraulic traits from 1529 unique species in 140 sites spanning a diversity of vegetation types. Version 2 has five improvements compared to the previous version: (1) new data from a 4-km elevation transect on the edge of the Tibetan Plateau, including alpine vegetation types not sampled previously; (2) inclusion of traits related to hydraulic processes, including specific sapwood conductance, the area ratio of sapwood to leaf, wood density and leaf turgor loss point; (3) inclusion of information on soil properties to complement the existing data on climate and vegetation; (4) assessments of the reliability of individual trait measurements; and (5) inclusion of standardized checklists and templates for systematic field sampling and measurements. See detailed descriptions here: Wang, H., Harrison, S.P., Li, M. et al. The China plant trait database version 2. Sci Data 9, 769 (2022). https://doi.org/10.1038/s41597-022-01884-4
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PlantDoc dataset was originally published by researchers at the Indian Institute of Technology, and described in depth in their paper. One of the paper’s authors, Pratik Kayal, shared the object detection dataset available on GitHub.
PlantDoc is a dataset of 2,569 images across 13 plant species and 30 classes (diseased and healthy) for image classification and object detection. There are 8,851 labels. Read more about how the version available on Roboflow improves on the original version here.
Example image (Tomato Blight): https://i.imgur.com/fGlQ0kG.png
Fork this dataset (upper right-hand corner) to receive the raw images, or (to save space) grab the 416x416 export.
As the researchers from IIT stated in their paper, “plant diseases alone cost the global economy around US$220 billion annually.” Training models to recognize plant diseases earlier dramatically increases yield potential.
The dataset also serves as a useful open dataset for benchmarks. The researchers trained both object detection models like MobileNet and Faster-RCNN and image classification models like VGG16, InceptionV3, and InceptionResnet V2.
The dataset is useful for advancing general agriculture computer vision tasks, whether that be healthy crop classification, plant disease classification, or plant disease object detection.
This dataset follows Creative Commons 4.0 protocol. You may use it commercially without Liability, Trademark use, Patent use, or Warranty.
Provide the following citation for the original authors:
@misc{singh2019plantdoc,
title={PlantDoc: A Dataset for Visual Plant Disease Detection},
author={Davinder Singh and Naman Jain and Pranjali Jain and Pratik Kayal and Sudhakar Kumawat and Nipun Batra},
year={2019},
eprint={1911.10317},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data file contains occurrence data based on historical observations and records between 1651 and 2004. Ten plant species were studied: Alnus incana (L.) Moench, 1794; Buddleja davidii Franch., 1887; Castanea sativa Mill., 1798; Helianthus tuberosus L., 1753; Impatiens glandulifera Royle, 1833; Prunus cerasifera Ehrh., 1784; Prunus laurocerasus L., 1753; Reynoutria japonica Houtt., 1777; Robinia pseudoacacia L., 1753; and Spiraea japonica L.f., 1782. The data file is the result of a five-month geo-historical study of the introduction and distribution of invasive plant species in Occitania (France), carried out within the framework of the EI2P-VALEEBEE project (Invasive species and pollinators, between constraints and potentials). Historical sources were consulted during 2020 in order to find the oldest records of the ten species. Each record corresponds to a historical observation or mention of one of the ten species, mainly on Metropolitan French territory, since their introduction. Without a historical analysis, it is difficult to understand the current local distribution dynamics of invasive plant species, especially when some of them were introduced to Metropolitan French territory several centuries ago. The value of these occurrence data lies in the historical depth they provide, which allows us to understand the local distribution of the ten species over time. This is made possible by recording several elements: their places of introduction, comments from authors and observers on their abundance, and the historical context of introduction. More generally, this historical data file is part of a multidisciplinary approach proposed by the members of the EI2P project, whose objective is to take better account of the ecological, socio-cultural and economic issues raised by invasive alien plants.
This work was endorsed by the CNRS/INEE Zone Atelier Pyrénées Garonne (ZA PYGAR). The Zones Ateliers network (RZA) is recognized by ALLENVI, as an eLTER (European Long-Term Ecological Research).
A data paper describes this dataset in detail: Claudel M, Lerigoleur E, Brun C, Guillerme S (2022) Geohistorical dataset of ten plant species introduced into Occitania (France). Biodiversity Data Journal 10: e76283. https://doi.org/10.3897/BDJ.10.e76283
ipranavks/my-new-plant-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Alex Olariu
Important Plant Areas (IPAs) are a product of The New Mexico Rare Plant Conservation Strategy. The strategy is an integral part of the State of New Mexico’s Energy, Minerals, and Natural Resources Department, Forestry Division’s Forest Action Plan, which identifies needs and opportunities across all land ownerships in the state and guides long-term Division management, planning, and conservation opportunities. Important Plant Areas (IPAs) are places across New Mexico that have been identified (and delineated) as supporting either a high diversity of sensitive species or are the last remaining locations of our most endangered plants. The IPAs were developed using a combination of spatial modeling of rare species observation data in a GIS and expert review followed by the assignment of a Biodiversity Rank (B1-B4) to assist in prioritizing areas for conservation planning.