13 datasets found

Tomato-Village dataset
kaggle.com
zip
Updated Aug 27, 2023
Cite
mvgehlot (2023). Tomato-Village dataset [Dataset]. https://www.kaggle.com/datasets/mamtag/tomato-village
Available download formats: zip (1611294359 bytes)
Dataset updated
Aug 27, 2023
Authors
mvgehlot
Description
Problem statement: Tomato is one of the most widely grown vegetables worldwide, and its diseases can significantly affect yield and quality. Accurate and early detection of tomato diseases is crucial for reducing losses and improving crop management. Recent deep learning research has produced multiple CNN architectures, making automated plant disease identification a viable alternative to traditional detection based on visual inspection. When using deep learning methods, the dataset plays one of the most crucial roles in disease prediction. PlantVillage is the most widely used publicly available dataset for tomato disease detection, but it was created in a lab/controlled environment, and models trained on it do not perform well on real-world images. Some natural or real-world datasets exist, but they are private. Also, when attempting to predict tomato diseases in the field in the Jodhpur and Jaipur districts of Rajasthan, India, we found that the most common diseases were leaf miner, spotted wilt virus, and nutrient deficiency, but no public dataset contains these categories.

Proposed Solution: To overcome these challenges, we propose a new dataset called "Tomato-Village" with three variants: a) multiclass tomato disease classification, b) multilabel tomato disease classification, and c) object-detection-based tomato disease detection. To the best of our knowledge, "Tomato-Village" is the first such dataset to be publicly available. Further, we have applied various CNN architectures to this dataset and report baseline results.
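The first two variants differ mainly in how labels are encoded: a multiclass image carries exactly one label, while a multilabel image can carry several at once (for example, a leaf showing both a disease and a nutrient deficiency). A minimal sketch of the two encodings, assuming a hypothetical class list built from the categories named above; the dataset's actual class names and ordering may differ.

```python
# Hypothetical class list for illustration only; the real Tomato-Village
# label set and its ordering may differ.
CLASSES = ["healthy", "leaf_miner", "spotted_wilt_virus", "nutrient_deficiency"]


def one_hot(label: str) -> list[int]:
    """Multiclass target: exactly one position is set."""
    if label not in CLASSES:
        raise ValueError(f"unknown class: {label}")
    return [1 if c == label else 0 for c in CLASSES]


def multi_hot(labels: set[str]) -> list[int]:
    """Multilabel target: any subset of positions may be set."""
    unknown = labels - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in CLASSES]


print(one_hot("leaf_miner"))  # [0, 1, 0, 0]
print(multi_hot({"leaf_miner", "nutrient_deficiency"}))  # [0, 1, 0, 1]
```

A multiclass model is typically trained with a softmax over these positions, whereas a multilabel model scores each position independently.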

To use the dataset, please cite the following article: Gehlot, M., Saxena, R.K. & Gandhi, G.C. "Tomato-Village": a dataset for end-to-end tomato disease detection in a real-world environment. Multimedia Systems (2023). DOI: https://doi.org/10.1007/s00530-023-01158-y

Article Link : https://link.springer.com/article/10.1007/s00530-023-01158-y
Big Tomato Dataset
universe.roboflow.com
zip
Updated Jul 24, 2024
Cite
(2024). Big Tomato Dataset [Dataset]. https://universe.roboflow.com/project-mobss/big-tomato/model/2
Available download formats: zip
Dataset updated
Jul 24, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Tomato Bounding Boxes
Description
Big Tomato

## Overview
Big Tomato is a dataset for object detection tasks. It contains Tomato annotations for 442 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Julie R. Campos Arias (repository creator)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. Its purpose is to store the datasets used in some of the studies that served as research material for the thesis, as well as the datasets used in its experimental part.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets in 3 columns: id, polarity, and tweet, which hold the unique ID, the polarity index of the text, and the tweet text, respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
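The three price fields are related: the discount percentage can be recomputed from the actual and discounted prices, which makes a quick consistency check. A minimal sketch, assuming the raw values may be formatted strings (for example with a currency symbol and thousands separators); the helper `parse_price` is hypothetical and simply keeps digits and the decimal point.

```python
def parse_price(raw: str) -> float:
    """Hypothetical helper: keep only digits and the decimal point."""
    cleaned = "".join(ch for ch in raw if ch.isdigit() or ch == ".")
    return float(cleaned)


def discount_percentage(actual_price: float, discounted_price: float) -> float:
    """Percentage saved relative to the actual (list) price."""
    if actual_price <= 0:
        raise ValueError("actual_price must be positive")
    return 100.0 * (actual_price - discounted_price) / actual_price


actual = parse_price("₹1,099")
discounted = parse_price("₹399")
print(round(discount_percentage(actual, discounted)))  # 64
```

Rows where the recomputed value differs noticeably from `discount_percentage` are worth inspecting before analysis.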
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contain only negative samples and the last 5331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
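Because the file is sorted by label, any split taken from an unshuffled prefix would contain only negative reviews. A minimal sketch that shuffles the rows with a fixed seed before splitting, assuming the `reviews` and `labels` columns described above; an in-memory CSV string stands in for `data_rt.csv`, and the 80/20 split ratio is an arbitrary choice for illustration.

```python
import csv
import io
import random


def load_and_shuffle(csv_text: str, seed: int = 42) -> list[dict]:
    """Read rows with `reviews`/`labels` columns and shuffle them reproducibly."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    random.Random(seed).shuffle(rows)
    return rows


def train_test_split(rows: list[dict], train_fraction: float = 0.8):
    """Split already-shuffled rows into train and test lists."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy stand-in for data_rt.csv: negatives first, then positives.
sample = (
    "reviews,labels\n"
    + "".join(f"negative review {i},0\n" for i in range(5))
    + "".join(f"positive review {i},1\n" for i in range(5))
)

rows = load_and_shuffle(sample)
train, test = train_test_split(rows)
print(len(train), len(test))  # 8 2
```

For the real file, pass the contents of `open("data_rt.csv", newline="", encoding="utf-8").read()` instead of `sample`.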
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data of the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
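TextBlob's polarity score is a float in [-1.0, 1.0], so deriving a categorical `division` label from it only requires thresholds. The exact cut-offs used to build this file are not stated; the sketch below assumes the common convention of negative below zero, positive above zero, and neutral at exactly zero. It takes precomputed scores, so it runs without TextBlob installed.

```python
def polarity_to_division(polarity: float) -> str:
    """Map a TextBlob-style polarity in [-1, 1] to a label (assumed thresholds)."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


print([polarity_to_division(p) for p in (0.8, 0.0, -0.35)])
# ['positive', 'neutral', 'negative']
```

With TextBlob installed, the input score for a review would come from `TextBlob(text).sentiment.polarity`.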
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, as Unix time), reviewTime (time of the review, raw), and division (manually added - categorical label generated using the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
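The `unixReviewTime` column stores seconds since the Unix epoch, so it can be converted to a calendar date without parsing the raw `reviewTime` string. A minimal sketch using only the standard library, interpreting timestamps as UTC (an assumption; the source does not state a time zone).

```python
from datetime import date, datetime, timezone


def review_date(unix_review_time: int) -> date:
    """Convert a unixReviewTime value (seconds since epoch) to a UTC date."""
    return datetime.fromtimestamp(unix_review_time, tz=timezone.utc).date()


print(review_date(1393545600))  # 2014-02-28
```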
Rotten Tomatoes Movie Reviews
kaggle.com
Updated Nov 20, 2022
Cite
The Devastator (2022). Rotten Tomatoes Movie Reviews [Dataset]. https://www.kaggle.com/datasets/thedevastator/movie-review-data-set-from-rotten-tomatoes
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 20, 2022
Dataset provided by
Kaggle
Authors
The Devastator
License
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Description
Rotten Tomatoes Movie Reviews

Predicting Movie Review Sentiment

Source

Huggingface Hub: link

About this dataset

The Rotten Tomatoes Movie Review Sentiment Analysis Dataset contains 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. Bo Pang and Lillian Lee first used this data in their paper Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, published in the Proceedings of the ACL in 2005. The data fields are the same across all splits. The text column contains the review itself, and the label column indicates whether the review is positive or negative.

How to use the dataset

The Performance of Sentiment Analysis

In this post we take a look at the performance of different sentiment analysis systems on this movie review dataset from Rotten Tomatoes.

We will be using three different libraries for this post: 1) Scikit-learn, 2) NLTK, and 3) TextBlob. We will also compare the results of these systems with those from human raters. Each library takes different amounts of time and resources to run, so we will also be considering these factors in our comparisons.

NLTK

NLTK is a popular library for working with text data in Python. It includes many useful features for pre-processing text data, including tokenization, lemmatization, and part-of-speech tagging. NLTK also includes a number of helpful classes for building and evaluating predictive models (such as decision trees and maximum entropy classifiers).

TextBlob

TextBlob is a relatively new library that provides an easy-to-use interface for common text processing tasks (such as part-of-speech tagging, sentence parsing, and spelling correction). TextBlob is built on top of NLTK and Pattern, another Python library for web mining.

Scikit-learn

Scikit-learn is a popular machine learning library for Python that provides efficient implementations of common algorithms such as support vector machines, random forests, and k-nearest neighbors classifiers. It also includes helpful utilities for pre-processing data and assessing model performance.
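To make the basic idea of a trainable sentiment model concrete, here is a from-scratch multinomial Naive Bayes classifier in plain Python. This is a swapped-in illustration, not code from the post and not scikit-learn's implementation (scikit-learn's counterpart is `MultinomialNB`); the regex tokenizer and the toy training sentences are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of every token.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(
    ["a moving, beautiful film", "brilliant and funny",
     "dull and tedious", "a boring, lifeless mess"],
    ["pos", "pos", "neg", "neg"],
)
print(model.predict("funny and beautiful"))  # pos
print(model.predict("tedious mess"))  # neg
```

Working in log space avoids numeric underflow when a review has many tokens.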

Research Ideas

Identify positive and negative sentiment in movie reviews

Categorize movie reviews by rating

Cluster movie reviews to group together similar reviews

Acknowledgements

Huggingface Hub: link

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: validation.csv

| Column name | Description |
|:--------------|:----------------------------------|
| text | The text of the review. (String) |
| label | The label of the review. (String) |

File: train.csv

| Column name | Description |
|:--------------|:----------------------------------|
| text | The text of the review. (String) |
| label | The label of the review. (String) |

File: test.csv

| Column name | Description |
|:--------------|:----------------------------------|
| text | The text of the review. (String) |
| label | The label of the review. (String) |
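Since all three files share the same two-column schema, one loader covers every split. A minimal sketch with the standard `csv` module that also checks for the expected columns; it reads an in-memory string here, and the label values `"0"`/`"1"` in the sample are assumptions (the tables only say labels are strings), so check the real files before relying on a particular encoding.

```python
import csv
import io
from collections import Counter


def read_split(fileobj) -> list[dict[str, str]]:
    """Read one split file with `text` and `label` columns."""
    reader = csv.DictReader(fileobj)
    missing = {"text", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


sample = io.StringIO('text,label\n"a gripping, well-acted drama",1\n"flat and forgettable",0\n')
rows = read_split(sample)
print(sorted(Counter(row["label"] for row in rows).items()))  # [('0', 1), ('1', 1)]
```

For a real split, open it with `open("train.csv", newline="", encoding="utf-8")` and pass the file object to `read_split`.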
pixar_movies
huggingface.co
Cite
Rummage Labs, pixar_movies [Dataset]. https://huggingface.co/datasets/RummageLabs/pixar_movies
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Authors
Rummage Labs
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pixar Movies Dataset

A comprehensive dataset of Pixar movies, including details on their release dates, directors, cast, box office performance, and ratings. This dataset is gathered from official sources, including Pixar, Rotten Tomatoes, and IMDb. For more information, visit Pixar.

How the Data is Compiled

All information in this dataset has been collected from public sources, including official information from Pixar, Rotten Tomatoes, and IMDb. Cells are each… See the full description on the dataset page: https://huggingface.co/datasets/RummageLabs/pixar_movies.
Saudi Arabia Prices of Goods and Services: Avg: Vegetables: Tomatoes:...
ceicdata.com
Updated Mar 15, 2023
Cite
CEICdata.com (2023). Saudi Arabia Prices of Goods and Services: Avg: Vegetables: Tomatoes: Imported [Dataset]. https://www.ceicdata.com/en/saudi-arabia/average-prices-of-goods-and-services/prices-of-goods-and-services-avg-vegetables-tomatoes-imported
Dataset updated
Mar 15, 2023
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 1, 2024 - Jan 1, 2025
Area covered
Saudi Arabia
Variables measured
Price
Description
Saudi Arabia Prices of Goods and Services: Avg: Vegetables: Tomatoes: Imported data was reported at 6.430 SAR/kg in Mar 2025, an increase from the previous figure of 6.250 SAR/kg for Feb 2025. The series is updated monthly and has a median of 5.470 SAR/kg from Jan 2009 to Mar 2025, with 195 observations. It reached an all-time high of 8.320 SAR/kg in Oct 2013 and a record low of 3.400 SAR/kg in Jun 2009. The series remains active in CEIC and is reported by the General Authority for Statistics. The data is categorized under Global Database’s Saudi Arabia – Table SA.P001: Average Prices of Goods and Services. [COVID-19-IMPACT]
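The reported figures can be checked with simple arithmetic: the Feb to Mar 2025 move is a month-over-month increase of about 2.9%, and a monthly series running from Jan 2009 through Mar 2025 (both included) has exactly 195 observations, matching the stated count. A minimal sketch:

```python
def pct_change(previous: float, current: float) -> float:
    """Month-over-month change in percent."""
    return 100.0 * (current - previous) / previous


def months_inclusive(start_year: int, start_month: int, end_year: int, end_month: int) -> int:
    """Number of monthly observations from start to end, both months included."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


print(round(pct_change(6.250, 6.430), 2))  # 2.88
print(months_inclusive(2009, 1, 2025, 3))  # 195
```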
Rotten Tomatoes movies and critic reviews dataset
kaggle.com
zip
Updated Nov 4, 2020
Cite
Stefano Leone (2020). Rotten Tomatoes movies and critic reviews dataset [Dataset]. https://www.kaggle.com/stefanoleone992/rotten-tomatoes-movies-and-critic-reviews-dataset
Available download formats: zip (80928022 bytes)
Dataset updated
Nov 4, 2020
Authors
Stefano Leone
License
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Movie data is stored on several popular websites, but when it comes to critic reviews there is no better place than Rotten Tomatoes. The website lets you compare the ratings given by regular users (audience score) with the ratings and reviews provided by critics (tomatometer), who are certified members of various writing guilds or film critics' associations.

Content

In the movies dataset each record represents a movie available on Rotten Tomatoes, with the URL used for the scraping, movie title, description, genres, duration, director, actors, users' ratings, and critics' ratings. In the critics dataset each record represents a critic review published on Rotten Tomatoes, with the URL used for the scraping, critic name, review publication, date, score, and content.

Acknowledgements

Data has been scraped from the publicly available website https://www.rottentomatoes.com as of 2020-10-31. Since scraping the website takes a few days, some fields of the movies dataset, such as "tomatometer_count", "tomatometer_top_critics_count", "tomatometer_fresh_critics_count", and "tomatometer_rotten_critics_count", are not fully consistent with the records in the critics dataset, which was scraped first.
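One way to quantify this inconsistency is to count the critic reviews per movie and compare that with the movie's `tomatometer_count`. A minimal sketch on in-memory records; the join key `rotten_tomatoes_link` is an assumption about the column that links the two files, so adjust it to the actual headers.

```python
from collections import Counter


def count_mismatches(movies, reviews, key="rotten_tomatoes_link"):
    """Return {movie_key: (tomatometer_count, reviews_found)} where they differ.

    `key` is an assumed join column present in both tables.
    """
    found = Counter(review[key] for review in reviews)
    mismatches = {}
    for movie in movies:
        # CSV values may be strings such as "3" or "3.0".
        expected = int(float(movie["tomatometer_count"]))
        actual = found.get(movie[key], 0)
        if expected != actual:
            mismatches[movie[key]] = (expected, actual)
    return mismatches


movies = [
    {"rotten_tomatoes_link": "m/a", "tomatometer_count": "3"},
    {"rotten_tomatoes_link": "m/b", "tomatometer_count": "2.0"},
]
reviews = [{"rotten_tomatoes_link": "m/a"}] * 3 + [{"rotten_tomatoes_link": "m/b"}]
print(count_mismatches(movies, reviews))  # {'m/b': (2, 1)}
```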

Inspiration

The dataset provides detailed information on movies' critic reviews and on users' vs. critics' ratings, and it can be combined with other publicly available movie datasets (FilmTV, etc.).
Table_2_The Bacterial Microbiome of the Tomato Fruit Is Highly Dependent on...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated Jun 1, 2023
Cite
Carolina Escobar Rodríguez; Johannes Novak; Franziska Buchholz; Pia Uetz; Laura Bragagna; Marija Gumze; Livio Antonielli; Birgit Mitter (2023). Table_2_The Bacterial Microbiome of the Tomato Fruit Is Highly Dependent on the Cultivation Approach and Correlates With Flavor Chemistry.docx [Dataset]. http://doi.org/10.3389/fpls.2021.775722.s006
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpls.2021.775722.s006
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers Media: http://www.frontiersin.org/
Authors
Carolina Escobar Rodríguez; Johannes Novak; Franziska Buchholz; Pia Uetz; Laura Bragagna; Marija Gumze; Livio Antonielli; Birgit Mitter
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The modes of interaction between plants and plant-associated microbiota are manifold, and secondary metabolites often play a central role in plant-microbe interactions. Abiotic and biotic stress (including both plant pathogens and endophytes) can affect the composition and concentration of secondary plant metabolites, and thus influence the chemical compounds that make up the taste and aroma of fruit. While the role of microbiota in the growth and health of plants is widely acknowledged, relatively little is known about the possible effect of microorganisms on the quality of the fruit of plants they colonize. In this work, tomato (Solanum lycopersicum L.) plants of five different cultivars were grown in soil and in hydroponics to investigate the impact of the cultivation method on the flavor of fruit, and to assess whether variations in their chemical composition are attributable to shifts in bacterial microbiota. Ripe fruit were harvested and used for bacterial community analysis and for the analysis of tomato volatiles, sugars, and acids, all of which contribute to flavor. Fruit grown in soil showed significantly higher sugar content, whereas tomatoes from plants under hydroponic conditions had significantly higher levels of organic acids. In contrast, aroma profiles of fruit were shaped by the tomato cultivars rather than the cultivation method. In terms of bacterial communities, the cultivation method significantly defined the community composition in all cultivars, with the bacterial communities in hydroponic tomatoes being more variable than those in tomatoes grown in soil.
Bacterial indicator species in soil-grown tomatoes correlated with higher concentrations of volatiles described as being perceived as “green” or “pungent.” A reproducibly occurring, soil-specific ASV (amplicon sequence variant) classified as Bacillus, detected solely in “Solarino” tomatoes (the sweetest of all cultivars), correlated with the amount of aroma-relevant volatiles as well as of fructose and glucose in the fruit. In contrast, indicator bacterial species in hydroponically grown tomatoes correlated with aroma compounds with “sweet” and “floral” notes and showed negative correlations with glucose concentrations in fruit. Overall, our results point toward a microbiota-related accumulation of flavor and aroma compounds in tomato fruit, which is strongly dependent on the cultivation substrate and approach.
Laboro Tomato
kaggle.com
zip
Updated Oct 20, 2023
Cite
Karthik Vinayan (2023). Laboro Tomato [Dataset]. https://www.kaggle.com/nexuswho/laboro-tomato
Available download formats: zip (1647136197 bytes)
Dataset updated
Oct 20, 2023
Authors
Karthik Vinayan
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Laboro Tomato is an image dataset of growing tomatoes at different stages of ripening, designed for object detection and instance segmentation tasks. We also provide two subsets of tomatoes separated by size. The dataset was gathered at a local farm using two separate cameras with different resolutions and image quality.

COCO and YOLO annotations are available

Samples of raw/annotated images: IMG_1066 (https://github.com/nexuswho/LaboroTomato/blob/master/examples/ann_gif_IMG_1066.gif?raw=true), IMG_1246 (https://github.com/nexuswho/LaboroTomato/blob/master/examples/ann_gif_IMG_1246.gif?raw=true)

Annotation details

Each tomato is assigned to one of 2 categories according to size (normal size and cherry tomato) and one of 3 categories depending on the stage of ripening:
* fully_ripened - completely red and ready to be harvested. Red covers 90%* or more of the fruit.
* half_ripened - greenish and needs time to ripen. Red covers 30-89%* of the fruit.
* green - completely green/white, sometimes with rare red parts. Red covers 0-30%* of the fruit.

*All percentages are approximate and differ from case to case.
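The ripening categories amount to thresholds on the share of red color, combined with the size category. A sketch that assigns a class name from an estimated red fraction, assuming that the `b_`/`l_` prefixes in the class names listed under Dataset details stand for normal-size (big) and cherry (little) tomatoes, and resolving the overlap at exactly 30% in favor of `half_ripened`. The percentages are approximate in the original, so treat this as illustrative rather than the annotators' actual rule.

```python
def ripeness_class(red_fraction: float, is_cherry: bool) -> str:
    """Map a red-color fraction in [0, 1] to a Laboro Tomato-style class name.

    Assumptions: 'b_' = normal size, 'l_' = cherry; 0.30 counts as half_ripened.
    """
    if not 0.0 <= red_fraction <= 1.0:
        raise ValueError("red_fraction must be in [0, 1]")
    if red_fraction >= 0.90:
        stage = "fully_ripened"
    elif red_fraction >= 0.30:
        stage = "half_ripened"
    else:
        stage = "green"
    prefix = "l_" if is_cherry else "b_"
    return prefix + stage


print(ripeness_class(0.95, is_cherry=False))  # b_fully_ripened
print(ripeness_class(0.50, is_cherry=True))   # l_half_ripened
print(ripeness_class(0.10, is_cherry=True))   # l_green
```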

https://github.com/nexuswho/LaboroTomato/blob/master/examples/laboro_tomato_exp1.png?raw=true

Dataset details

The dataset includes 804 images with the following details:

name: tomato_mixed
images: 643 train, 161 test
cls_num: 6
cls_names: b_fully_ripened, b_half_ripened, b_green, l_fully_ripened, l_half_ripened, l_green
total_bboxes: train[7781], test[1996]
bboxes_per_class:
* Train: b_fully_ripened[348], b_half_ripened[520], b_green[1467], l_fully_ripened[982], l_half_ripened[797], l_green[3667]
* Test: b_fully_ripened[72], b_half_ripened[116], b_green[387], l_fully_ripened[269], l_half_ripened[223], l_green[929]
image_resolutions: 3024x4032, 3120x4160

https://github.com/nexuswho/LaboroTomato/blob/master/examples/laboro_tomato_exp2.png?raw=true

Scope of application

The Laboro Tomato dataset can be used to solve cutting-edge real-life tasks by combining various technologies:
* Harvest forecasting based on tomato maturity
* Automatic harvesting of only ripened tomatoes
* Identification and automatic thinning of deteriorated and obsolete tomatoes
* Spraying pesticides only on tomatoes at a specific ripening stage
* Temperature control in the greenhouse according to ripening stage
* Quality control on food manufacturers' production lines, etc.

Baseline

Pretrained model

The model was trained with mmdetection v2.0 on 4 Tesla V100 GPUs, based on Mask R-CNN with an R-50-FPN 1x backbone:

Dataset bbox AP mask AP Download
Laboro Tomato 64.3 65.7 model

We did not tune hyperparameters for the baseline model and used the default values provided by the original mmdetection configs.
Training parameters:
lr = 0.01
step = [32, 44]
total epochs = 48
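With a step policy, the learning rate stays at its base value and is multiplied by a decay factor at each listed epoch. A sketch of the schedule implied by these parameters, assuming a decay factor of 0.1 (a common default for step schedules, not stated above), zero-based epoch indices, and no warmup:

```python
from bisect import bisect_right


def step_lr(epoch: int, base_lr: float = 0.01, steps=(32, 44), gamma: float = 0.1) -> float:
    """Learning rate for a zero-based epoch under a step decay schedule."""
    # Number of decay milestones already reached at this epoch.
    n_decays = bisect_right(steps, epoch)
    return base_lr * gamma ** n_decays


for epoch in (0, 31, 32, 43, 44, 47):
    print(epoch, step_lr(epoch))
```

Under these assumptions, epochs 0-31 train at 0.01, epochs 32-43 at 0.001, and the last four epochs at 0.0001.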

Output examples

Image gallery with pretrained model output examples and its comparison between raw and annotated images.

Test a dataset

To evaluate the pretrained models, prepare an mmdetection environment following the official installation guide.

Prepare dataset

It is recommended to symlink the dataset root to $MMDETECTION/data. If your folder structure is different, you may need to change the corresponding paths in config files.

mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── laboro_tomato
│   │   ├── annotations
│   │   ├── train
│   │   ├── test
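Creating the recommended symlink can also be scripted. A minimal sketch with `pathlib`; `link_dataset` is a hypothetical helper, both paths in the example call are placeholders for your local dataset location and mmdetection checkout, and on Windows creating symlinks may require extra privileges.

```python
from pathlib import Path


def link_dataset(dataset_root: str, mmdet_root: str, name: str = "laboro_tomato") -> Path:
    """Symlink an existing dataset directory to <mmdet_root>/data/<name>."""
    target = Path(dataset_root).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"dataset root not found: {target}")
    link = Path(mmdet_root) / "data" / name
    link.parent.mkdir(parents=True, exist_ok=True)
    # Leave an existing link or directory untouched.
    if not (link.exists() or link.is_symlink()):
        link.symlink_to(target, target_is_directory=True)
    return link


# Example (placeholder paths):
# link_dataset("/datasets/laboro_tomato", "/opt/mmdetection")
```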

Add datasets to mmdetection

To load the data, create a new file mmdet/datasets/laboro_tomato.py that defines the dataset class with its corresponding classes:

from .coco import CocoDataset
from .builder import DATASETS


@DATASETS.register_module()
class LaboroTomato(CocoDataset):
    CLASSES = ('b_fully_ripened', 'b_half_ripened', 'b_green',
               'l_fully_ripened', 'l_half_ripened', 'l_green')

Then add the dataset name to mmdet/datasets/__init__.py:

from .laboro_tomato import LaboroTomato

__all__ = [
    ..., 'LaboroTomato'
]
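The decorator and the `__init__.py` import work together: importing the module runs `@DATASETS.register_module()`, which records the class under its name so that a config string such as `dataset_type = 'LaboroTomato'` can be resolved to the class later. The toy registry below illustrates that lookup pattern in plain Python; it is a simplified stand-in written for this explanation, not mmcv's actual `Registry`, and its `LaboroTomato` does not subclass `CocoDataset`.

```python
class Registry:
    """Toy name-to-class registry (simplified stand-in, not mmcv's Registry)."""

    def __init__(self, name: str):
        self.name = name
        self._modules: dict[str, type] = {}

    def register_module(self):
        def decorator(cls: type) -> type:
            if cls.__name__ in self._modules:
                raise KeyError(f"{cls.__name__} already registered in {self.name}")
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: dict):
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        args = dict(cfg)
        cls = self._modules[args.pop("type")]
        return cls(**args)


DATASETS = Registry("dataset")


@DATASETS.register_module()
class LaboroTomato:
    CLASSES = ("b_fully_ripened", "b_half_ripened", "b_green",
               "l_fully_ripened", "l_half_ripened", "l_green")

    def __init__(self, data_root: str):
        self.data_root = data_root


ds = DATASETS.build({"type": "LaboroTomato", "data_root": "data/laboro_tomato/"})
print(type(ds).__name__, len(ds.CLASSES))  # LaboroTomato 6
```

If the import in `__init__.py` is missing, the decorator never runs and the name lookup fails, which is why both steps are needed.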

Configuration files

Configuration file setup, using the Tomato Mixed dataset as an example:

Create laboro_tomato_base.py in configs/_base_/datasets/ with the content of the coco_detection configuration file, and change the dataset type, root, and path parameters:

dataset_type = 'LaboroTomato'
data_root = 'data/laboro_tomato/'
...

Create `...
tomatOD
kaggle.com
zip
Updated Oct 20, 2023
Cite
Karthik Vinayan (2023). tomatOD [Dataset]. https://www.kaggle.com/datasets/nexuswho/tomatod
Available download formats: zip (171728770 bytes)
Dataset updated
Oct 20, 2023
Authors
Karthik Vinayan
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
tomatOD

tomatOD is a dataset for tomato fruit localization and ripening classification, containing images of tomato fruits in a greenhouse and high-quality expert annotations from agriculturists. It is a task-specific object detection dataset for tomato fruits, suitable for precision agriculture applications that typically require highly-accurate localization.

The tomatOD dataset consists of 277 images with 2418 annotated tomato fruit samples of unripe, semi-ripe and fully-ripe classes.

The images and the annotations are licensed under CC BY-NC-SA 4.0 license. The contents of this repository are released under the license.

Sample images with tomato fruit annotations are shown below.

https://github.com/nexuswho/tomatOD/blob/master/assets/tomatOD_img1.png?raw=true

Data organization

The dataset was split into train and test sets using an 80%/20% train-test ratio. Please note that the training and test data were selected in a semi-random manner. The following table shows the number of images and annotated boxes in the train and test sets of the tomatOD dataset.

                   Train   Test
Images               222     55
Annotated boxes     1952    466

Data Format

The annotations of the tomatOD dataset are provided in a COCO compatible format.

A fix for an error in the category IDs of the test annotations was contributed by ARTURO-BANDINI-JR.
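Because the annotations follow the COCO layout, per-class box counts and the mean number of instances per image can be computed directly from the `annotations`, `images`, and `categories` lists. A minimal sketch on a tiny in-memory example; the category names in it are assumptions, and with the real data you would `json.load` the annotation file instead. On the full dataset, 2418 boxes over 277 images gives about 8.7 instances per image, in line with the average reported below.

```python
import json
from collections import Counter


def coco_stats(coco: dict) -> tuple[dict[str, int], float]:
    """Return (boxes per category name, mean annotated instances per image)."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    per_class = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    n_images = len(coco["images"])
    mean_per_image = len(coco["annotations"]) / n_images if n_images else 0.0
    return dict(per_class), mean_per_image


coco = json.loads("""{
  "images": [{"id": 1}, {"id": 2}],
  "categories": [{"id": 1, "name": "unripe"}, {"id": 2, "name": "semi-ripe"},
                 {"id": 3, "name": "fully-ripe"}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 10, 40, 40]},
    {"id": 2, "image_id": 1, "category_id": 3, "bbox": [60, 20, 35, 35]},
    {"id": 3, "image_id": 2, "category_id": 1, "bbox": [5, 5, 30, 30]}
  ]
}""")
print(coco_stats(coco))  # ({'unripe': 2, 'fully-ripe': 1}, 1.5)
```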

Statistics and data analysis

tomatOD classes

The table below shows the number of annotated objects for each class of the tomatOD dataset.

unripe semi-ripe fully-ripe
1592 395 431

Additionally, the following figure illustrates the relative frequencies of the three classes in the dataset. The classes of the tomatOD dataset are clearly imbalanced; however, their relative proportions are in line with how often each class actually appears in a realistic scenario.

https://github.com/nexuswho/tomatOD/blob/master/assets/classes_proportions_tomatOD.png?raw=true

Size distribution of bounding boxes

The percentile relative size of each bounding box is calculated, which indicates the proportion of the diagonal length of each box over the diagonal length of the image. In the image below, the histogram of the percentile relative size distribution of the tomatOD bounding boxes is presented. Most of the bounding boxes have a size of 3% to 15% relative to the image size.

https://github.com/nexuswho/tomatOD/blob/master/assets/histogramm_boxes.png?raw=true
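Written out, the relative size of a box is the ratio of its diagonal to the image diagonal, expressed as a percentage. A minimal sketch for COCO-style `[x, y, width, height]` boxes; the example box and image dimensions are made up for illustration.

```python
import math


def relative_size(bbox: list[float], image_width: float, image_height: float) -> float:
    """Box diagonal as a percentage of the image diagonal (COCO [x, y, w, h] box)."""
    _, _, w, h = bbox
    return 100.0 * math.hypot(w, h) / math.hypot(image_width, image_height)


print(round(relative_size([0, 0, 60, 80], 600, 800), 1))  # 10.0
```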

Number of labelled instances per image

Only 1% of images have one category per image, and 11% of images include 8 instances, while the maximum number of instances per image, 20, occurs in only 0.72% of the images. The tomatOD dataset has an average of 8.7 instances per image. The accompanying figure shows the histogram of the number of annotated instances per image.

Number of categories in images

As the next figure shows, more than 50% of the tomatOD images contain objects of all 3 categories, while less than 8% of the images have objects of a single category.

https://github.com/nexuswho/tomatOD/blob/master/assets/categories_in_images.png?raw=true

Experiment

Six state-of-the-art detectors were evaluated on the proposed tomatOD dataset: Faster R-CNN with Inception v2, SSD with both Inception v2 and MobileNet v2, PPN with Inception v2, RetinaNet (ResNet-101), and YOLOv3. All of them were pretrained on the COCO dataset, trained on the tomatOD train set for 450 epochs, and then evaluated on the test set. Hyperparameters were fine-tuned for every network so that each performs optimally on the tomatOD dataset.

The figure below illustrates the accuracy over epochs for both the train and the test set for every trained model.

https://github.com/nexuswho/tomatOD/blob/master/assets/accuracy_vs_epochs.png?raw=true

RetinaNet outperformed the other detectors, yielding an accuracy of 79.4%. RetinaNet's average precision for each class, its mAP, and its per-class precision-recall curves are reported below.

In the precision-recall curves diagram, the unripe class is indicated by the green line, the semi-ripe class by the orange line, while the fully-ripe class by red line.

Model       unripe AP (%)   semi-ripe AP (%)   fully-ripe AP (%)   mAP (%)
RetinaNet   91.47           55.28              76.77               74.51
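The mAP in this table is the unweighted mean of the three per-class APs, which can be checked directly:

```python
def mean_ap(per_class_ap: dict[str, float]) -> float:
    """Unweighted mean of per-class average precision values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


retinanet = {"unripe": 91.47, "semi-ripe": 55.28, "fully-ripe": 76.77}
print(round(mean_ap(retinanet), 2))  # 74.51
```

Because every class counts equally, the weaker semi-ripe AP pulls the mAP well below the unripe AP, even though unripe fruit make up most of the boxes.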

Rotten Tomatoes Reviews for Online Streaming Shows
kaggle.com
zip
Updated Jul 17, 2022
Cite
Colton Barger (2022). Rotten Tomatoes Reviews for Online Streaming Shows [Dataset]. https://www.kaggle.com/datasets/coltonbarger/rotten-tomatoes-reviews-for-online-streaming-shows
Available download formats: zip (8908381 bytes)
Dataset updated
Jul 17, 2022
Authors
Colton Barger
License
Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset Photo by Nicolas J Leclercq on Unsplash

This is a collection of both critic and audience reviews for 685 different online streaming shows from Rotten Tomatoes. The data was scraped the week of July 10, 2022. In most cases, especially when a critic review is long, the files contain only a preview of the review. Some reviews are also duplicated when a show is available on more than one streaming service (either paid or with a subscription).

Data was collected for the top 100 most popular shows from 9 streaming platforms: 'Apple TV+', 'Paramount+', 'HBO MAX', 'Disney+', 'Prime Video', 'Hulu', 'Netflix', 'Peacock', and 'VUDU'. For most services, fewer than 100 shows have reviews. Why is this? Well, a lot of shows just didn't have reviews from either critics or audience members. Some services, such as Apple TV+, are also newer and don't have 100 shows yet. Additionally, Rotten Tomatoes apparently goes down for maintenance a lot. This greatly affected the collection of reviews, and I am not sure if I got every single one. Oh well.

About the Files

tv_show_links.csv: This file is essentially a list of all shows whose data was scraped. It also includes the network (or networks, in some cases) a show can be found on, the percentage of critics who gave positive reviews, the percentage of audience members who gave positive reviews, and the Rotten Tomatoes link where the reviews can be found (used mostly during scraping; I'm not sure what use it would have in data analysis).

audience_reviews.csv: This file contains audience reviews. The columns are the reviewed show, the rating (on a scale from 0 to 5), and the review text.

critic_reviews.csv: The file that contains critic reviews. The columns are the reviewed show, the sentiment the critic has (1 for positive, 0 for negative), and the review text.
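Because some reviews appear more than once (one copy per streaming service), it can help to drop duplicates before aggregating. The following Python sketch computes the share of positive critic reviews per show. The column names `show`, `sentiment`, and `review` are hypothetical placeholders for the file's real header, and treating two rows with the same show and review text as duplicates is an assumption (two critics could, in principle, have identical previews).

```python
import csv
import io
from collections import defaultdict


def positive_share_by_show(rows):
    """Fraction of positive critic reviews per show, ignoring duplicate rows.

    rows: iterable of dicts with hypothetical keys "show", "sentiment"
    ("1" positive, "0" negative) and "review".
    """
    seen = set()
    counts = defaultdict(lambda: [0, 0])  # show -> [positive, total]
    for row in rows:
        key = (row["show"], row["review"])
        if key in seen:  # same review listed under another service
            continue
        seen.add(key)
        counts[row["show"]][0] += int(row["sentiment"])
        counts[row["show"]][1] += 1
    return {show: pos / total for show, (pos, total) in counts.items()}


# Hypothetical in-memory sample standing in for critic_reviews.csv.
sample = io.StringIO(
    "show,sentiment,review\n"
    "Show A,1,Great cast.\n"
    "Show A,1,Great cast.\n"
    "Show A,0,Too slow.\n"
    "Show B,1,Fun.\n"
)
shares = positive_share_by_show(csv.DictReader(sample))
print(shares)
```

To run it on the real file, replace `sample` with `open("critic_reviews.csv", newline="")` and adjust the key names to the actual header.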
🐟 RTG Tinned Fish
kaggle.com
zip
Updated Jul 31, 2024
mexwell (2024). 🐟 RTG Tinned Fish [Dataset]. https://www.kaggle.com/datasets/mexwell/rtg-tinned-fish
Available download formats: zip (23770 bytes)
Dataset updated: Jul 31, 2024
Authors: mexwell
Description
Motivation

Rainbow Tomatoes Garden (RTG) is an online farm stand that offers the largest selection of tinned fish in the world. Tinned (canned) fish is popular in many cultures worldwide, and there are many varieties containing different seafoods, different preparations, and even sauces and spices added directly to the can.

RTG created this canned fish dataset to show its customers the variety of options it offers, so customers can more easily find products they’d like. We need to clean the data to use it, but once we do, we’ll use it to study the different types of tinned fish available worldwide.

Data

There are a total of 702 rows in the data, with each row corresponding to a different canned seafood product.

Variable	Description
Name (caveats to the data)	Name of the product
RTG $	Price of the canned product
Oil/Water Used	Whether the fish is canned with water, some kind of oil, or vinegar
Type of Fish	The type of seafood used in the product (not necessarily a fish)
Latin Name	Latin name (species) of the seafood
Country Origin	The country of origin of the seafood
Brand	The brand name of the canned seafood
Has Salt	Whether the product contains salt
Has Sugar	Whether the product contains sugar
Sauces/Inclusions	The type of sauces or seasonings contained in the product
Boneless	Whether the product is boneless (“NA” means the seafood does not contain bones at all, so there are none to remove; for instance, scallops do not have bones)
Skinless	Whether the product is skinless (“NA” means the seafood does not have any skin/scales to remove, such as for mussels or squid)
Pieces/Tin	The number of pieces per tin
Tin Size (in grams unless otherwise noted)	Weight of the product in grams
Tin Size (in oz unless otherwise noted)	Weight of the product in ounces
Smoked	Whether the product is smoked (by any method of giving it a smoked flavor)
Grilled/Seared	Whether the product is grilled or seared
Citrus	Whether the product contains citrus (anything from slices of fruit to lemon essence)
Garlic	Whether the product contains garlic
Chili Pepper	Whether the product contains chili peppers
Tomato	Whether the product contains tomatoes
Dairy	Whether the product contains dairy
Gluten	Whether the product contains gluten
Organic	Whether the product contains some amount of certified organic agricultural products
Kosher Cert	Whether the product is Kosher certified
Servings/Tin	The number of servings per tin
Sodium/Serv (mg)	The amount of sodium per serving, in mg
% RDA Sodium	Percentage of the recommended daily allowance (RDA) of sodium, per serving

Questions

Because this dataset was entered manually, it needs some data cleaning before any statistical analysis. For example, most sodium amounts are numbers, but a few rows contain the string “< 1g”. In other variables, missing values are encoded in several ways: left blank, “?”, or “NA”. Load the data, carefully check the types of all variables you’ll need for your analysis, and write the code needed to correct or filter unusual values.
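A minimal cleaning sketch in Python, assuming cells arrive as raw strings (for example from `csv.DictReader`). The helper names are hypothetical. It maps the missing-value markers listed above to `None`, except that for Boneless and Skinless the description says “NA” means “not applicable”, so that case can be kept as its own category. Treating entries like “< 1g” as missing is an assumption; they could also be kept as censored values.

```python
# Markers the Questions section lists as ways missing values appear.
MISSING_TOKENS = {"", "?", "NA"}


def clean_cell(value, na_means_not_applicable=False):
    """Strip whitespace and map missing-value markers to None.

    For Boneless and Skinless, "NA" means the seafood has no bones or skin,
    so callers can pass na_means_not_applicable=True to keep that category.
    """
    value = (value or "").strip()
    if na_means_not_applicable and value == "NA":
        return "not applicable"
    return None if value in MISSING_TOKENS else value


def parse_sodium_mg(value):
    """Convert a Sodium/Serv (mg) cell to a float, or None if not numeric.

    Strings such as "< 1g" are not exact mg amounts; here they become None.
    """
    value = clean_cell(value)
    if value is None:
        return None
    try:
        return float(value.replace(",", ""))
    except ValueError:
        return None


print(parse_sodium_mg("340"), parse_sodium_mg("< 1g"), clean_cell("?"))
print(clean_cell("NA", na_means_not_applicable=True))
```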

Conduct an exploratory data analysis. What is the distribution of prices? Prices per unit weight? Which countries of origin are most common?
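For the price-per-unit-weight question, one option is to divide `RTG $` by the tin size in grams and then summarize the result. The sketch below uses Python's standard library and hypothetical, already-cleaned rows (not values from the dataset). Since the gram column is "in grams unless otherwise noted", rows in other units would need converting or excluding first.

```python
from collections import Counter
from statistics import median


def price_per_gram(price_usd, tin_size_g):
    """Price per unit weight in $/g, or None if either input is missing or zero."""
    if price_usd is None or not tin_size_g:
        return None
    return price_usd / tin_size_g


# Hypothetical cleaned rows: (RTG $, tin size in grams, country of origin).
products = [
    (12.0, 120.0, "Portugal"),
    (8.0, 100.0, "Spain"),
    (15.0, 110.0, "Portugal"),
    (9.5, None, "Japan"),  # missing weight: excluded from $/g below
]

unit_prices = [
    ppg
    for ppg in (price_per_gram(price, grams) for price, grams, _ in products)
    if ppg is not None
]
print(f"median price: ${median(unit_prices):.3f}/g")

# Most common countries of origin (ties keep first-seen order).
country_counts = Counter(country for _, _, country in products)
print(country_counts.most_common(2))
```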

On average, how much more sodium does salted tinned fish contain than unsalted tinned fish?
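A minimal way to answer the sodium question is a difference of group means. This Python sketch assumes `Has Salt` has already been mapped to booleans and `Sodium/Serv (mg)` parsed to floats, with `None` for missing amounts; the function name and the example rows are hypothetical.

```python
from statistics import mean


def salted_minus_unsalted(rows):
    """Mean sodium (mg per serving) of salted minus unsalted products.

    rows: iterable of (has_salt, sodium_mg) pairs, where has_salt is a bool
    and sodium_mg is a float or None; rows with missing sodium are skipped.
    """
    rows = list(rows)  # allow generators, which can only be iterated once
    salted = [mg for has_salt, mg in rows if has_salt and mg is not None]
    unsalted = [mg for has_salt, mg in rows if not has_salt and mg is not None]
    return mean(salted) - mean(unsalted)


# Hypothetical cleaned rows, not values from the dataset.
diff = salted_minus_unsalted([(True, 400.0), (True, 300.0), (False, 120.0), (False, None)])
print(f"salted products average {diff:.0f} mg more sodium per serving")
```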

Use ANOVA to test whether the price (per unit weight) of tinned fish varies by country, controlling for the type of fish.
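One way to test country effects while controlling for fish type is a nested-model F test: compare a model with one mean price per fish type against a model with one mean per (fish type, country) cell. Note that this is a swapped-in variant rather than the additive two-way ANOVA the question may intend. The larger model also lets the country effect differ by fish type, which keeps both fits closed-form. The function names and the example rows (fish type, country, price per gram) are hypothetical.

```python
from collections import defaultdict


def _rss_within_groups(rows, key):
    """Residual sum of squares of price around each group's mean, plus group count."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    rss = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        rss += sum((v - group_mean) ** 2 for v in values)
    return rss, len(groups)


def country_given_fish_f_test(rows):
    """Nested-model F test of country effects on price, controlling for fish type.

    rows: list of (fish_type, country, price_per_gram) tuples.
    Reduced model: one mean per fish type.
    Full model: one mean per (fish type, country) cell.
    Returns (F, numerator df, denominator df).
    """
    n = len(rows)
    rss_reduced, k_reduced = _rss_within_groups(rows, key=lambda r: r[0])
    rss_full, k_full = _rss_within_groups(rows, key=lambda r: (r[0], r[1]))
    df_num, df_den = k_full - k_reduced, n - k_full
    if df_num <= 0 or df_den <= 0 or rss_full == 0:
        raise ValueError("not enough (fish type, country) variation to test")
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den


# Hypothetical data: fish types A/B, countries X/Y.
sample = [
    ("A", "X", 1.0), ("A", "X", 2.0), ("A", "Y", 5.0), ("A", "Y", 6.0),
    ("B", "X", 3.0), ("B", "X", 4.0), ("B", "Y", 3.0), ("B", "Y", 4.0),
]
f_stat, df_num, df_den = country_given_fish_f_test(sample)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
```

If SciPy is available, a p-value follows from the F distribution, e.g. `scipy.stats.f.sf(f_stat, df_num, df_den)`. For the additive model, fitting `ols("price ~ C(country) + C(fish)", data)` with statsmodels' formula API and passing the fit to `anova_lm(fit, typ=2)` is a common alternative.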

Use logistic regression to explore the odds that the product contains added sugar. Which factors are most strongly associated with added sugar?
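Before fitting a multi-factor logistic regression, a per-factor odds ratio gives a quick first look. With a single binary predictor and no empty cells, the sample odds ratio from the 2x2 table equals exp(b1) of the fitted logistic regression logit P(sugar) = b0 + b1·factor, so it is a special case of the same model. The sketch assumes the factor and sugar columns are already cleaned to booleans. The example data, which uses the Tomato column as the factor, is hypothetical.

```python
from collections import Counter


def odds_ratio(pairs):
    """Odds of added sugar with the factor present vs. absent.

    pairs: iterable of (has_factor, has_sugar) booleans. If any cell of the
    2x2 table is zero, 0.5 is added to every cell (Haldane-Anscombe
    correction); the result then no longer equals the logistic estimate.
    """
    cells = Counter((bool(f), bool(s)) for f, s in pairs)
    a, b = cells[(True, True)], cells[(True, False)]    # factor present
    c, d = cells[(False, True)], cells[(False, False)]  # factor absent
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


# Hypothetical cleaned data: (contains tomato, has added sugar).
example = [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 3
print(odds_ratio(example))
```

To weigh several factors at once, a multivariable model such as statsmodels' `logit` formula API would be a natural next step; its exponentiated coefficients are then adjusted odds ratios.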

References

Data from: https://rainbowtomatoesgarden.com/index.php/choosing-a-tin/

Data downloaded August 8, 2023.

Acknowledgement

Photo by James Wei on Unsplash
Tomatoes 2 2wvhj Aml8q Mpzp Dataset
universe.roboflow.com
zip
Updated Mar 13, 2025
Roboflow 100-VL (2025). Tomatoes 2 2wvhj Aml8q Mpzp Dataset [Dataset]. https://universe.roboflow.com/rf100-vl/tomatoes-2-2wvhj-aml8q-mpzp
Available download formats: zip
Dataset updated: Mar 13, 2025
Dataset provided by: Roboflow (https://roboflow.com/)
Authors: Roboflow 100-VL
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Variables measured: Tomatoes 2 2wvhj Aml8q Mpzp Bounding Boxes
Description
Overview

Introduction

This dataset is designed for object detection, focusing on differentiating between green and red tomatoes. The annotation task involves drawing bounding boxes around these classes:

Green Tomatoes: Unripe tomatoes, typically firm and less vibrant.

Red Tomatoes: Ripe tomatoes, known for their full, red color and soft texture.

Object Classes

Green Tomatoes

Description

Green tomatoes are generally unripe. They have a circular shape with a smooth surface and lack the vibrant red hue of ripe tomatoes. Their color ranges from light green to a slightly yellowish tint as they begin to ripen.

Instructions

Annotate each visible green tomato by enclosing it in a bounding box.

Ensure the box tightly fits the tomato's circular shape, capturing the entire fruit.

If a green tomato is partially occluded by other objects, extend the box to include the occluded parts, if they can be reasonably inferred.

Do not label tomatoes that are turning yellow or have substantial red areas, as they are not fully unripe.

Red Tomatoes

Description

Red tomatoes are fully ripe, featuring a bright red color that covers the entire surface. They maintain the classic tomato shape and may appear slightly larger and softer compared to their green counterparts.

Instructions

Draw bounding boxes around each red tomato, making sure to encompass their entire rounded shape.

Include tomatoes that are predominantly red even if they have small green or yellow patches.

If a red tomato is partially obscured, include the hidden portions as much as possible in the annotation.

Avoid labeling any fruit that is primarily green or has not reached full redness.

Dataset	bbox AP	mask AP	Download
Laboro Tomato	64.3	65.7	model

	Train	Test
Images	222	55
Annotated boxes	1952	466

	unripe	semi-ripe	fully-ripe
Annotated boxes (train + test)	1592	395	431
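These counts are internally consistent: the per-class totals sum to the same 2,418 boxes as the train and test splits combined. The short Python sketch below, which assumes both tables describe the same set of boxes (their class names match tomatOD's unripe, semi-ripe, and fully-ripe classes), checks that and computes each class's share to make the class imbalance explicit.

```python
# Box counts from the two tables above (assumed to describe the same boxes).
boxes_by_split = {"train": 1952, "test": 466}
boxes_by_class = {"unripe": 1592, "semi-ripe": 395, "fully-ripe": 431}

total = sum(boxes_by_split.values())
# Both breakdowns should add up to the same number of annotated boxes.
assert total == sum(boxes_by_class.values())

for name, count in boxes_by_class.items():
    print(f"{name}: {count} boxes ({count / total:.1%})")
```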


mvgehlot (2023). Tomato-Village dataset [Dataset]. https://www.kaggle.com/datasets/mamtag/tomato-village

Tomato-Village dataset

A dataset for end-to-end tomato disease detection in a real-world environment

Available download formats: zip (1611294359 bytes)
Dataset updated: Aug 27, 2023
Authors: mvgehlot
85 scholarly articles cite this dataset (per Google Scholar).


If you use the dataset, please cite the following article: Gehlot, M., Saxena, R.K. & Gandhi, G.C. “Tomato-Village”: a dataset for end-to-end tomato disease detection in a real-world environment. Multimedia Systems (2023). DOI: https://doi.org/10.1007/s00530-023-01158-y

Article link: https://link.springer.com/article/10.1007/s00530-023-01158-y
