Problem statement: Tomato is one of the most extensively grown vegetables in any country, and its diseases can significantly affect yield and quality. Accurate and early detection of tomato diseases is crucial for reducing losses and improving crop management. Recent deep learning and CNN research has produced multiple CNN architectures, making automated plant disease identification a viable alternative to traditional visual inspection. When using deep learning methods, the dataset plays one of the most crucial roles in disease prediction. PlantVillage is the most widely used publicly available dataset for tomato disease detection, but it was created in a lab/controlled environment, and models trained on it do not perform well on real-world images. Some natural or real-world datasets exist, but they are private. Also, when attempting to predict tomato diseases in the field in the Jodhpur and Jaipur districts of Rajasthan, India, we found that the majority of diseases are Leaf Miner, spotted wilt virus, and nutrition deficiency diseases, but there are no public datasets containing such categories.
Proposed solution: To overcome these challenges, we propose the creation of a new dataset called "Tomato-Village" with three variants: a) multiclass tomato disease classification, b) multilabel tomato disease classification, and c) object detection based tomato disease detection. To the best of our knowledge, "Tomato-Village" will be the first such dataset to be publicly available. Further, we have applied various CNN architectures/models to this dataset and reported baseline results.
To use the dataset, please cite the following article: Gehlot, M., Saxena, R.K. & Gandhi, G.C. "Tomato-Village": a dataset for end-to-end tomato disease detection in a real-world environment. Multimedia Systems (2023). DOI: https://doi.org/10.1007/s00530-023-01158-y
Article Link : https://link.springer.com/article/10.1007/s00530-023-01158-y
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Big Tomato is a dataset for object detection tasks - it contains Tomato annotations for 442 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
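Because the rows are ordered by class, taking the first and last parts of the file as a split would put a single label in each part. The following is a minimal sketch of a reproducible shuffle using only the Python standard library. The function names and the seed value are illustrative assumptions, not part of the dataset.

```python
import csv
import random


def load_reviews(csv_path):
    """Read the CSV into a list of dicts keyed by the header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of rows; a fixed seed makes it reproducible."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Example (requires the file to be present locally):
# rows = shuffle_rows(load_reviews("data_rt.csv"))
```

The input list is copied rather than shuffled in place, so the original row order stays available for inspection.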
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
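The description does not state which thresholds turn a polarity score into the division label. As a sketch only, the function below assumes the common convention that a TextBlob polarity in [-1, 1] is positive above zero, negative below zero, and neutral otherwise. The label names and the optional `neutral_band` parameter are hypothetical.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    Assumed rule (not taken from the dataset): scores within
    +/- neutral_band of zero count as neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


for score in (0.6, 0.0, -0.25):
    print(score, polarity_to_division(score))
```

Widening `neutral_band` (for example to 0.1) treats weakly polar reviews as neutral, which may be preferable when the scores are noisy.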
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, Unix time), reviewTime (time of the review, raw) and division (manually added categorical label generated from the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Huggingface Hub: link
The Rotten Tomatoes Movie Review Sentiment Analysis Dataset contains a set of 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. Bo Pang and Lillian Lee first used this data in their paper Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, which was published in Proceedings of the ACL in 2005. The data fields are identical in all splits. The text column contains the review itself, and the label column indicates whether the review is positive or negative.
The Performance of Sentiment Analysis
In this post we take a look at the performance of different sentiment analysis systems on a movie review dataset from Rotten Tomatoes. This data was first used in Bo Pang and Lillian Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, Proceedings of the ACL, 2005. The data fields are the same among all splits.
We will be using three different libraries for this post: 1) Scikit-learn, 2) NLTK, and 3) TextBlob. We will also compare the results of these systems with those from human raters. Each library takes different amounts of time and resources to run, so we will also be considering these factors in our comparisons.
NLTK
NLTK is a popular library for working with text data in Python. It includes many useful features for pre-processing text data, including tokenization, lemmatization, and part-of-speech tagging. NLTK also includes a number of helpful classes for building and evaluating predictive models (such as decision trees and maximum entropy classifiers).
TextBlob
TextBlob is a relatively new library that attempts to provide an easy-to-use interface for common text processing tasks (such as part-of-speech tagging, sentence parsing, spelling correction, etc). TextBlob is built on top of NLTK and Pattern, another Python library for web mining (see below).
Scikit-learn
Scikit-learn is a popular machine learning library for Python that provides efficient implementations of common algorithms such as support vector machines, random forests, and k-nearest neighbors classifiers. It also includes helpful utilities for pre-processing data and assessing model performance.
- Identify positive and negative sentiment in movie reviews
- Categorize movie reviews by rating
- Cluster movie reviews to group together similar reviews
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: validation.csv
| Column name | Description |
|:--------------|:----------------------------------|
| text | The text of the review. (String) |
| label | The label of the review. (String) |
File: train.csv
| Column name | Description |
|:--------------|:----------------------------------|
| text | The text of the review. (String) |
| label | The label of the review. (String) |
File: test.csv
| Column name | Description |
|:--------------|:----------------------------------|
| text | The text of the review. (String) |
| label | The label of the review. (String) |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pixar Movies Dataset
A comprehensive dataset of Pixar movies, including details on their release dates, directors, cast, box office performance, and ratings. This dataset is gathered from official sources, including Pixar, Rotten Tomatoes, and IMDb. For more information, visit Pixar.
How the Data is Compiled
All information in this dataset has been collected from public sources, including official information from Pixar, Rotten Tomatoes, and IMDb. Cells are each… See the full description on the dataset page: https://huggingface.co/datasets/RummageLabs/pixar_movies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Saudi Arabia Prices of Goods and Services: Avg: Vegetables: Tomatoes: Imported data was reported at 6.430 SAR/kg in Mar 2025. This records an increase from the previous number of 6.250 SAR/kg for Feb 2025. Saudi Arabia Prices of Goods and Services: Avg: Vegetables: Tomatoes: Imported data is updated monthly, averaging 5.470 SAR/kg from Jan 2009 (Median) to Mar 2025, with 195 observations. The data reached an all-time high of 8.320 SAR/kg in Oct 2013 and a record low of 3.400 SAR/kg in Jun 2009. Saudi Arabia Prices of Goods and Services: Avg: Vegetables: Tomatoes: Imported data remains active status in CEIC and is reported by General Authority for Statistics. The data is categorized under Global Database’s Saudi Arabia – Table SA.P001: Average Prices of Goods and Services. [COVID-19-IMPACT]
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Movie data is stored on several popular websites, but when it comes to critic reviews there is no better place than Rotten Tomatoes. This website makes it possible to compare the ratings given by regular users (audience score) with the ratings and reviews provided by critics (tomatometer), who are certified members of various writing guilds or film critic associations.
In the movies dataset each record represents a movie available on Rotten Tomatoes, with the URL used for the scraping, movie title, description, genres, duration, director, actors, users' ratings, and critics' ratings. In the critics dataset each record represents a critic review published on Rotten Tomatoes, with the URL used for the scraping, critic name, review publication, date, score, and content.
Data has been scraped from the publicly available website https://www.rottentomatoes.com as of 2020-10-31. Since the data takes a few days to be scraped from the website, there is no full consistency between some fields of the movies and critics datasets such as "tomatometer_count", "tomatometer_top_critics_count", "tomatometer_fresh_critics_count", and "tomatometer_rotten_critics_count" of the movies dataset compared to all records included in the critics dataset, which has been scraped first.
The dataset is intended to provide detailed information on movies' critic reviews and on users' versus critics' ratings, and it can be combined with other publicly available movie datasets (FilmTV, etc.).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The modes of interaction between plants and plant-associated microbiota are manifold, and secondary metabolites often play a central role in plant-microbe interactions. Abiotic and biotic (including both plant pathogens and endophytes) stress can affect the composition and concentration of secondary plant metabolites, and thus influence the chemical compounds that make up the taste and aroma of fruit. While the role of microbiota in the growth and health of plants is widely acknowledged, relatively little is known about the possible effect of microorganisms on the quality of the fruit of the plants they colonize. In this work, tomato (Solanum lycopersicum L.) plants of five different cultivars were grown in soil and in hydroponics to investigate the impact of the cultivation method on the flavor of fruit, and to assess whether variations in their chemical composition are attributable to shifts in bacterial microbiota. Ripe fruit were harvested and used for bacterial community analysis and for the analysis of tomato volatiles, sugars and acids, all contributing to flavor. Fruit grown in soil showed significantly higher sugar content, whereas tomatoes from plants under hydroponic conditions had significantly higher levels of organic acids. In contrast, aroma profiles of fruit were shaped by the tomato cultivars rather than the cultivation method. In terms of bacterial communities, the cultivation method significantly defined the community composition in all cultivars, with the bacterial communities in hydroponic tomatoes being more variable than those in tomatoes grown in soil.
Bacterial indicator species in soil-grown tomatoes correlated with higher concentrations of volatiles described to be perceived as “green” or “pungent.” A soil-grown specific reproducibly occurring ASV (amplicon sequence variants) classified as Bacillus detected solely in “Solarino” tomatoes, which were the sweetest among all cultivars, correlated with the amount of aroma-relevant volatiles as well as of fructose and glucose in the fruit. In contrast, indicator bacterial species in hydroponic-derived tomatoes correlated with aroma compounds with “sweet” and “floral” notes and showed negative correlations with glucose concentrations in fruit. Overall, our results point toward a microbiota-related accumulation of flavor and aroma compounds in tomato fruit, which is strongly dependent on the cultivation substrate and approach.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Laboro Tomato is an image dataset of growing tomatoes at different stages of ripening, designed for object detection and instance segmentation tasks. We also provide two subsets of tomatoes separated by size. The dataset was gathered at a local farm with two separate cameras that differ in resolution and image quality.
COCO and YOLO annotations are available
https://github.com/nexuswho/LaboroTomato/blob/master/examples/ann_gif_IMG_1066.gif?raw=true
https://github.com/nexuswho/LaboroTomato/blob/master/examples/ann_gif_IMG_1246.gif?raw=true
Samples of raw/annotated images: IMG_1066, IMG_1246
Each tomato is divided into 2 categories according to size (normal size and cherry tomato) and 3 categories depending on the stage of ripening:
* fully_ripened - completely red and ready to be harvested. Red color covers 90%* or more of the surface
* half_ripened - greenish and needs time to ripen. Red color covers 30-89%* of the surface
* green - completely green/white, sometimes with rare red parts. Red color covers 0-30%* of the surface
*All percentages are approximate and differ from case to case.
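The ripening rule above can be written as a small lookup. This is only an illustration of the stated thresholds, not the annotation procedure used for the dataset, whose labels were assigned with approximate percentages. The function name is hypothetical, and because the listed ranges overlap at exactly 30%, the sketch assumes that value counts as half_ripened.

```python
def ripeness_from_red_percent(red_percent):
    """Map the share of red surface (0-100) to a Laboro Tomato ripeness label.

    Assumption: exactly 30% is treated as half_ripened, since the
    documented ranges (0-30 and 30-89) overlap at that value.
    """
    if not 0 <= red_percent <= 100:
        raise ValueError("red_percent must be between 0 and 100")
    if red_percent >= 90:
        return "fully_ripened"
    if red_percent >= 30:
        return "half_ripened"
    return "green"


for value in (95, 50, 10):
    print(value, ripeness_from_red_percent(value))
```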
https://github.com/nexuswho/LaboroTomato/blob/master/examples/laboro_tomato_exp1.png?raw=true
Dataset includes 804 images with following details:
name: tomato_mixed
images: 643 train, 161 test
cls_num: 6
cls_names: b_fully_ripened, b_half_ripened, b_green, l_fully_ripened, l_half_ripened, l_green
total_bboxes: train[7781], test[1996]
bboxes_per_class:
*Train: b_fully_ripened[348], b_half_ripened[520], b_green[1467],
l_fully_ripened[982], l_half_ripened[797], l_green[3667]
*Test: b_fully_ripened[72], b_half_ripened[116], b_green[387],
l_fully_ripened[269], l_half_ripened[223], l_green[929]
image_resolutions: 3024x4032, 3120x4160
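The per-class counts above can be checked against the stated totals and turned into class shares, which is useful when deciding whether to reweight or resample classes. The sketch below uses only the numbers listed above. The variable and function names are illustrative.

```python
# Bounding-box counts per class, copied from the listing above.
train_boxes = {
    "b_fully_ripened": 348, "b_half_ripened": 520, "b_green": 1467,
    "l_fully_ripened": 982, "l_half_ripened": 797, "l_green": 3667,
}
test_boxes = {
    "b_fully_ripened": 72, "b_half_ripened": 116, "b_green": 387,
    "l_fully_ripened": 269, "l_half_ripened": 223, "l_green": 929,
}


def class_shares(counts):
    """Return the total box count and each class's fraction of it."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


train_total, train_shares = class_shares(train_boxes)
test_total, test_shares = class_shares(test_boxes)
print(train_total, test_total)  # 7781 1996
for name, share in train_shares.items():
    print(f"{name}: {share:.1%}")
```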
https://github.com/nexuswho/LaboroTomato/blob/master/examples/laboro_tomato_exp2.png?raw=true
The Laboro Tomato dataset can be used to solve real-life tasks by combining various technologies:
* Harvest forecasting based on tomato maturity
* Automatic harvesting of only ripened tomatoes
* Identification and automatic thinning of deteriorated and obsolete tomatoes
* Spraying pesticides only on tomatoes at a specific ripening stage
* Temperature control in greenhouses according to ripening stage
* Quality control on the production lines of food manufacturers, etc.
The model was trained with mmdetection V2.0 on 4 Tesla V100 GPUs and is based on Mask R-CNN with an R-50-FPN 1x backbone:
| Dataset | bbox AP | mask AP | Download |
|---|---|---|---|
| Laboro Tomato | 64.3 | 65.7 | model |
We did not tune hyperparameters for the baseline model and used the default values provided by the original mmdetection configs.
Training parameters:
lr = 0.01
step = [32, 44]
total epoch = 48
An image gallery shows pretrained model output examples alongside the raw and annotated images for comparison.
To evaluate the pretrained models, please prepare the mmdetection environment by following the official installation guide.
It is recommended to symlink the dataset root to $MMDETECTION/data. If your folder structure is different, you may need to change the corresponding paths in config files.
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── laboro_tomato
│ │ ├── annotations
│ │ ├── train
│ │ ├── test
To load data we need to create a new config file mmdet/datasets/laboro_tomato.py with corresponding subsets:
from .coco import CocoDataset
from .builder import DATASETS


@DATASETS.register_module()
class LaboroTomato(CocoDataset):
    CLASSES = ('b_fully_ripened', 'b_half_ripened', 'b_green',
               'l_fully_ripened', 'l_half_ripened', 'l_green')
And add the dataset name to mmdet/datasets/__init__.py:
from .laboro_tomato import LaboroTomato
__all__ = [
    ..., 'LaboroTomato'
]
Configuration file setup, using the Tomato Mixed dataset as an example:
Create laboro_tomato_base.py in configs/_base_/datasets/ with the content of the coco_detection configuration file, then change the dataset type, root and path parameters:
dataset_type = 'LaboroTomato'
data_root = 'data/laboro_tomato/'
...
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
tomatOD is a dataset for tomato fruit localization and ripening classification, containing images of tomato fruits in a greenhouse and high-quality expert annotations from agriculturists. It is a task-specific object detection dataset for tomato fruits, suitable for precision agriculture applications that typically require highly-accurate localization.
The tomatOD dataset consists of 277 images with 2418 annotated tomato fruit samples of unripe, semi-ripe and fully-ripe classes.
The images and the annotations are licensed under CC BY-NC-SA 4.0 license. The contents of this repository are released under the license.
Sample images with tomato fruit annotations are shown below.
https://github.com/nexuswho/tomatOD/blob/master/assets/tomatOD_img1.png?raw=true
The dataset was split into train and test sets according to an 80%/20% train-test split ratio. Please note that the selection of the training and test data was conducted in a semi-random manner. The following table shows the number of images and annotated boxes in the train and test sets of the tomatOD dataset.
| | Train | Test |
|---|---|---|
| Images | 222 | 55 |
| Annotated boxes | 1952 | 466 |
The annotations of the tomatOD dataset are provided in a COCO compatible format.
A fix for an error in the category ids of the test annotations was contributed by ARTURO-BANDINI-JR.
The table below shows the number of annotated objects for each class of the tomatOD dataset.
| unripe | semi-ripe | fully-ripe |
|---|---|---|
| 1592 | 395 | 431 |
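Per-class counts like those in the table can be recomputed from any COCO-format annotation file by joining each annotation's `category_id` with the `categories` list. The sketch below runs on a tiny hand-made dictionary. The category ids and names used in it are assumptions for illustration and may differ from those in the actual tomatOD files.

```python
from collections import Counter


def count_boxes_per_category(coco):
    """Count annotations per category name in a COCO-style dictionary."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)


# Hand-made example; a real file would be loaded with json.load().
example = {
    "categories": [
        {"id": 1, "name": "unripe"},
        {"id": 2, "name": "semi-ripe"},
        {"id": 3, "name": "fully-ripe"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [50, 60, 20, 25]},
        {"id": 3, "image_id": 2, "category_id": 3, "bbox": [5, 5, 40, 40]},
    ],
}
print(count_boxes_per_category(example))
```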
Additionally, the following figure illustrates the relative appearance frequencies of those three classes of the dataset. The classes of the tomatOD dataset are clearly not balanced, however their relative proportion is in line with the actual appearance frequency of each class in a realistic scenario.
https://github.com/nexuswho/tomatOD/blob/master/assets/classes_proportions_tomatOD.png?raw=true
The percentile relative size of each bounding box is calculated, which indicates the proportion of the diagonal length of each box over the diagonal length of the image. In the image below, the histogram of the percentile relative size distribution of the tomatOD bounding boxes is presented. Most of the bounding boxes have a size of 3% to 15% relative to the image size.
https://github.com/nexuswho/tomatOD/blob/master/assets/histogramm_boxes.png?raw=true
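The relative size described above is the ratio of the box diagonal to the image diagonal. A minimal sketch follows, assuming box and image dimensions are given in pixels (COCO boxes store width and height as the last two `bbox` entries). The function name is illustrative.

```python
import math


def relative_diagonal(box_width, box_height, image_width, image_height):
    """Return the box diagonal as a fraction of the image diagonal."""
    return math.hypot(box_width, box_height) / math.hypot(image_width, image_height)


# A 300x400 px box in a 3000x4000 px image spans 10% of the image diagonal.
print(f"{relative_diagonal(300, 400, 3000, 4000):.0%}")
```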
Only 1% of images contain a single instance and 11% of images include 8 instances, while the maximum number of instances per image, which is 20, is found in only 0.72% of the images. The tomatOD dataset has an average of 8.7 instances per image. The image displays the histogram of the number of annotation instances per image.
As the next figure shows, more than 50% of the tomatOD images contain objects of all 3 categories, while less than 8% of the images have objects of a single category.
https://github.com/nexuswho/tomatOD/blob/master/assets/categories_in_images.png?raw=true
Six state-of-the-art detectors are evaluated on the proposed tomatOD dataset. In detail, Faster RCNN with Inception v2, SSD with both Inception v2 and Mobilenet v2, PPN with Inception v2, RetinaNet (ResNet 101) and Yolo v3, all pretrained on the COCO dataset, are trained on the tomatOD train set for 450 epochs. Afterwards, they are evaluated on the test set. Hyperparameter fine-tuning was performed for all networks so that each performs as well as possible on the tomatOD dataset.
The figure below illustrates the accuracy over epochs for both the train and the test set for every trained model.
https://github.com/nexuswho/tomatOD/blob/master/assets/accuracy_vs_epochs.png?raw=true
RetinaNet outperformed the other detectors, yielding an accuracy of 79.4%. The average precision of each class, the mAP metric and the precision-recall curves for the classes are listed for RetinaNet.
In the precision-recall curves diagram, the unripe class is indicated by the green line, the semi-ripe class by the orange line, and the fully-ripe class by the red line.
| | unripe AP (%) | semi-ripe AP (%) | fully-ripe AP (%) | mAP (%) |
|---|---|---|---|---|
| RetinaNet | 91.47 | 55.28 | 76.77 | 74.51 |
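The mAP column is consistent with the unweighted mean of the three per-class AP values, which can be checked with a few lines. This is an arithmetic check on the reported numbers only, not the evaluation code used by the authors.

```python
# Per-class average precision (%) for RetinaNet, copied from the table above.
class_ap = {"unripe": 91.47, "semi-ripe": 55.28, "fully-ripe": 76.77}

# mAP is the unweighted mean over classes.
mean_ap = sum(class_ap.values()) / len(class_ap)
print(round(mean_ap, 2))  # 74.51
```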
ODC Database Contents License (DbCL) 1.0: http://opendatacommons.org/licenses/dbcl/1.0/
Dataset Photo by Nicolas J Leclercq on Unsplash
This is a collection of both critic and audience reviews for 685 different online streaming shows off of Rotten Tomatoes. The data was scraped the week of July 10, 2022. Reviews found in the files are previews in most cases, especially when the critic review is long. There are also duplicates of some reviews in the case that a show may be found on more than one online streaming service (either paid or with a subscription).
Data was collected for the top 100 most popular shows from 9 streaming platforms: 'Apple TV+', 'Paramount+', 'HBO MAX', 'Disney+', 'Prime Video', 'Hulu', 'Netflix', 'Peacock', and 'VUDU'. In most cases, there are not 100 shows that show up with reviews for each service. Why is this? Well, a lot of shows just didn't have reviews by either critics or audience members. Some services also don't have 100 shows yet due to being newer. This is the case with Apple TV+ for instance. Additionally, Rotten Tomatoes apparently goes down for maintenance a lot. This affected the collection of reviews greatly, and I am not sure if I got every single one. Oh well.
tv_show_links.csv: This file is essentially a list of all shows whose data was scraped. Other information includes the network (or networks in some cases) a show can be found on, the percentage of critics who had positive reviews, the percentage of audience members who had positive reviews, and the Rotten Tomatoes link the reviews can be found at (mostly was used during scraping, not sure what use it would have in any data analysis).
audience_reviews.csv: This is the file that contains audience reviews. The columns are the show that is reviewed, the rating (on a scale from 0-5), and the review text.
critic_reviews.csv: The file that contains critic reviews. The columns are the reviewed show, the sentiment the critic has (1 for positive, 0 for negative), and the review text.
Rainbow Tomatoes Garden (RTG) is an online farm stand that offers the largest selection of tinned fish in the world. Tinned (canned) fish is popular in many cultures worldwide, and there are many varieties containing different seafoods, different preparations, and even sauces and spices added directly to the can.
RTG created this canned fish dataset in order to present to its customers the variety of options they offer, so customers can more easily find products they’d like. We need to clean the data to use it, but once we do, we’ll use it to study the different types of tinned fish available worldwide.
There are a total of 702 rows in the data, with each row corresponding to a different canned seafood product.
Data from: https://rainbowtomatoesgarden.com/index.php/choosing-a-tin/
Data downloaded August 8, 2023.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed for object detection, focusing on differentiating between green and red tomatoes. The annotation task involves drawing bounding boxes around these classes:
Green tomatoes are generally unripe. They have a circular shape with a smooth surface and lack the vibrant red hue of ripe tomatoes. Their color ranges from light green to a slightly yellowish tint as they begin to ripen.
Red tomatoes are fully ripe, featuring a bright red color that covers the entire surface. They maintain the classic tomato shape and may appear slightly larger and softer compared to their green counterparts.