13 datasets found
  1. Tomato-Village dataset

    • kaggle.com
    zip
    Updated Aug 27, 2023
    Cite
    mvgehlot (2023). Tomato-Village dataset [Dataset]. https://www.kaggle.com/datasets/mamtag/tomato-village
    Explore at:
    Available download formats: zip (1611294359 bytes)
    Dataset updated
    Aug 27, 2023
    Authors
    mvgehlot
    Description

    Problem statement: Tomato is one of the most extensively grown vegetables in any country, and its diseases can significantly affect yield and quality. Accurate and early detection of tomato diseases is crucial for reducing losses and improving crop management. Recent deep learning research has produced multiple CNN architectures, making automated plant disease identification a viable alternative to traditional detection by visual inspection. In deep learning methods, the dataset plays one of the most crucial roles in disease prediction. PlantVillage is the most widely used publicly available dataset for tomato disease detection, but it was created in a lab (controlled) environment, and models trained on it do not perform well on real-world images. Some natural or real-world datasets exist, but they are private. Also, when attempting to predict tomato diseases in the field in the Jodhpur and Jaipur districts of Rajasthan, India, we found that the majority of diseases are leaf miner, spotted wilt virus, and nutrition deficiency diseases, but no public dataset contains these categories.

    Proposed solution: To overcome these challenges, we propose the creation of a new dataset called "Tomato-Village" with three variants: a) multiclass tomato disease classification, b) multilabel tomato disease classification, and c) object-detection-based tomato disease detection. To the best of our knowledge, “Tomato-Village” will be the first such dataset to be publicly available. Further, we have applied various CNN architectures to this dataset and report baseline results.

    To use the dataset, please cite the following article: Gehlot, M., Saxena, R.K. & Gandhi, G.C. “Tomato-Village”: a dataset for end-to-end tomato disease detection in a real-world environment. Multimedia Systems (2023). DOI: https://doi.org/10.1007/s00530-023-01158-y

    Article Link : https://link.springer.com/article/10.1007/s00530-023-01158-y

  2. Big Tomato Dataset

    • universe.roboflow.com
    zip
    Updated Jul 24, 2024
    + more versions
    Cite
    (2024). Big Tomato Dataset [Dataset]. https://universe.roboflow.com/project-mobss/big-tomato/model/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 24, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Tomato Bounding Boxes
    Description

    Big Tomato

    ## Overview
    
    Big Tomato is a dataset for object detection tasks - it contains Tomato annotations for 442 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains ratings and reviews for 1K+ Amazon products, with details as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
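
    Because the rows are ordered by label, a shuffle step is needed before any train/test split. The following is a minimal sketch using only the Python standard library; the toy rows stand in for data_rt.csv, and the helper name and seed value are illustrative assumptions rather than part of the dataset.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of rows; a fixed seed keeps the order reproducible."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy stand-in for data_rt.csv: negatives (label 0) first, positives (label 1) last.
rows = [(f"bad film {i}", 0) for i in range(5)] + [(f"good film {i}", 1) for i in range(5)]
mixed = shuffle_rows(rows)
```

    The original list is left untouched, so the ordered and shuffled versions can be compared if needed.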

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
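
    The description does not say which cut-offs turn a polarity score into the division label. As a rough sketch only, assuming a score in the range [-1, 1] and hypothetical label names and thresholds (none of which are confirmed by the dataset), such a mapping could look like this:

```python
def polarity_to_division(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    The label names and the threshold are assumptions for illustration;
    the dataset description does not state the actual rule.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


labels = [polarity_to_division(p) for p in (0.6, 0.0, -0.25)]
```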

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, unix time), reviewTime (time of the review, raw) and division (manually added; categorical label generated using the overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  4. Rotten Tomatoes Movie Reviews

    • kaggle.com
    Updated Nov 20, 2022
    Cite
    The Devastator (2022). Rotten Tomatoes Movie Reviews [Dataset]. https://www.kaggle.com/datasets/thedevastator/movie-review-data-set-from-rotten-tomatoes
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 20, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Rotten Tomatoes Movie Reviews

    Predicting Movie Review Sentiment

    Source

    Huggingface Hub: link

    About this dataset

    The Rotten Tomatoes Movie Review Sentiment Analysis Dataset contains a set of 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. Bo Pang and Lillian Lee first used this data in their paper Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, which was published in Proceedings of the ACL in 2005. The data fields are identical in every split. The text column contains the review itself, and the label column indicates whether the review is positive or negative.

    How to use the dataset

    The Performance of Sentiment Analysis

    In this post we take a look at the performance of different sentiment analysis systems on a movie review dataset from Rotten Tomatoes. This data was first used in Bo Pang and Lillian Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, Proceedings of the ACL, 2005. The data fields are the same among all splits.

    We will be using three different libraries for this post: 1) Scikit-learn, 2) NLTK, and 3) TextBlob. We will also compare the results of these systems with those from human raters. Each library takes different amounts of time and resources to run, so we will also be considering these factors in our comparisons.

    NLTK

    NLTK is a popular library for working with text data in Python. It includes many useful features for pre-processing text data, including tokenization, lemmatization, and part-of-speech tagging. NLTK also includes a number of helpful classes for building and evaluating predictive models (such as decision trees and maximum entropy classifiers).

    TextBlob

    TextBlob is a relatively new library that attempts to provide an easy-to-use interface for common text processing tasks (such as part-of-speech tagging, sentence parsing, spelling correction, etc). TextBlob is built on top of NLTK and Pattern, another Python library for web mining (see below).

    Scikit-learn

    Scikit-learn is a popular machine learning library for Python that provides efficient implementations of common algorithms such as support vector machines, random forests, and k-nearest neighbors classifiers. It also includes helpful utilities for pre-processing data and assessing model performance.

    Research Ideas

    • Identify positive and negative sentiment in movie reviews
    • Categorize movie reviews by rating
    • Cluster movie reviews to group together similar reviews

    Acknowledgements

    Huggingface Hub: link

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: validation.csv

    | Column name | Description                       |
    |:------------|:----------------------------------|
    | text        | The text of the review. (String)  |
    | label       | The label of the review. (String) |

    File: train.csv

    | Column name | Description                       |
    |:------------|:----------------------------------|
    | text        | The text of the review. (String)  |
    | label       | The label of the review. (String) |

    File: test.csv

    | Column name | Description                       |
    |:------------|:----------------------------------|
    | text        | The text of the review. (String)  |
    | label       | The label of the review. (String) |

  5. pixar_movies

    • huggingface.co
    + more versions
    Cite
    Rummage Labs, pixar_movies [Dataset]. https://huggingface.co/datasets/RummageLabs/pixar_movies
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Rummage Labs
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pixar Movies Dataset

    A comprehensive dataset of Pixar movies, including details on their release dates, directors, cast, box office performance, and ratings. This dataset is gathered from official sources, including Pixar, Rotten Tomatoes, and IMDb. For more information, visit Pixar.

      How the Data is Compiled
    

    All information in this dataset has been collected from public sources, including official information from Pixar, Rotten Tomatoes, and IMDb. Cells are each… See the full description on the dataset page: https://huggingface.co/datasets/RummageLabs/pixar_movies.

  6. Saudi Arabia Prices of Goods and Services: Avg: Vegetables: Tomatoes:...

    • ceicdata.com
    Updated Mar 15, 2023
    + more versions
    Cite
    CEICdata.com (2023). Saudi Arabia Prices of Goods and Services: Avg: Vegetables: Tomatoes: Imported [Dataset]. https://www.ceicdata.com/en/saudi-arabia/average-prices-of-goods-and-services/prices-of-goods-and-services-avg-vegetables-tomatoes-imported
    Explore at:
    Dataset updated
    Mar 15, 2023
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 1, 2024 - Jan 1, 2025
    Area covered
    Saudi Arabia
    Variables measured
    Price
    Description

    The average price of imported tomatoes in Saudi Arabia (series: Prices of Goods and Services: Avg: Vegetables: Tomatoes: Imported) was reported at 6.430 SAR/kg in Mar 2025, an increase from 6.250 SAR/kg in Feb 2025. The series is updated monthly, with a median of 5.470 SAR/kg from Jan 2009 to Mar 2025 across 195 observations. It reached an all-time high of 8.320 SAR/kg in Oct 2013 and a record low of 3.400 SAR/kg in Jun 2009. The series remains in active status in CEIC and is reported by the General Authority for Statistics. It is categorized under Global Database’s Saudi Arabia – Table SA.P001: Average Prices of Goods and Services. [COVID-19-IMPACT]
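
    The month-on-month change implied by the Feb 2025 and Mar 2025 figures can be worked out directly. A small sketch (the helper name is illustrative, not part of any CEIC tooling):

```python
def percent_change(previous, current):
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100.0


# Feb 2025 -> Mar 2025 prices in SAR/kg, as quoted in the description.
change = percent_change(6.250, 6.430)
print(f"{change:.2f}%")  # prints 2.88%
```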

  7. Rotten Tomatoes movies and critic reviews dataset

    • kaggle.com
    zip
    Updated Nov 4, 2020
    Cite
    Stefano Leone (2020). Rotten Tomatoes movies and critic reviews dataset [Dataset]. https://www.kaggle.com/stefanoleone992/rotten-tomatoes-movies-and-critic-reviews-dataset
    Explore at:
    Available download formats: zip (80928022 bytes)
    Dataset updated
    Nov 4, 2020
    Authors
    Stefano Leone
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Movie data is stored on several popular websites, but when it comes to critic reviews there is no better place than Rotten Tomatoes. This website allows users to compare the ratings given by regular users (audience score) with the ratings and reviews provided by critics (tomatometer), who are certified members of various writing guilds or film critic associations.

    Content

    In the movies dataset each record represents a movie available on Rotten Tomatoes, with the URL used for the scraping, movie title, description, genres, duration, director, actors, users' ratings, and critics' ratings. In the critics dataset each record represents a critic review published on Rotten Tomatoes, with the URL used for the scraping, critic name, review publication, date, score, and content.

    Acknowledgements

    Data has been scraped from the publicly available website https://www.rottentomatoes.com as of 2020-10-31. Since the data takes a few days to be scraped from the website, there is no full consistency between some fields of the movies and critics datasets such as "tomatometer_count", "tomatometer_top_critics_count", "tomatometer_fresh_critics_count", and "tomatometer_rotten_critics_count" of the movies dataset compared to all records included in the critics dataset, which has been scraped first.

    Inspiration

    The dataset aims to provide detailed information on movies' critic reviews and on users' versus critics' ratings; it can be combined with other publicly available movie datasets (FilmTV, etc.).

  8. Table_2_The Bacterial Microbiome of the Tomato Fruit Is Highly Dependent on...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 1, 2023
    + more versions
    Cite
    Carolina Escobar Rodríguez; Johannes Novak; Franziska Buchholz; Pia Uetz; Laura Bragagna; Marija Gumze; Livio Antonielli; Birgit Mitter (2023). Table_2_The Bacterial Microbiome of the Tomato Fruit Is Highly Dependent on the Cultivation Approach and Correlates With Flavor Chemistry.docx [Dataset]. http://doi.org/10.3389/fpls.2021.775722.s006
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Carolina Escobar Rodríguez; Johannes Novak; Franziska Buchholz; Pia Uetz; Laura Bragagna; Marija Gumze; Livio Antonielli; Birgit Mitter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The modes of interactions between plants and plant-associated microbiota are manifold, and secondary metabolites often play a central role in plant-microbe interactions. Abiotic and biotic (including both plant pathogens and endophytes) stress can affect the composition and concentration of secondary plant metabolites, and thus have an influence on chemical compounds that make up the taste and aroma of fruit. While the role of microbiota in growth and health of plants is widely acknowledged, relatively little is known about the possible effect of microorganisms on the quality of fruit of plants they are colonizing. In this work, tomato (Solanum lycopersicum L.) plants of five different cultivars were grown in soil and in hydroponics to investigate the impact of the cultivation method on the flavor of fruit, and to assess whether variations in their chemical composition are attributable to shifts in bacterial microbiota. Ripe fruit were harvested and used for bacterial community analysis and for the analysis of tomato volatiles, sugars and acids, all contributing to flavor. Fruit grown in soil showed significantly higher sugar content, whereas tomatoes from plants under hydroponic conditions had significantly higher levels of organic acids. In contrast, aroma profiles of fruit were shaped by the tomato cultivars, rather than the cultivation method. In terms of bacterial communities, the cultivation method significantly defined the community composition in all cultivars, with the bacterial communities in hydroponic tomatoes being more variable than those in tomatoes grown in soil. Bacterial indicator species in soil-grown tomatoes correlated with higher concentrations of volatiles described to be perceived as “green” or “pungent.” A soil-grown specific reproducibly occurring ASV (amplicon sequence variants) classified as Bacillus detected solely in “Solarino” tomatoes, which were the sweetest among all cultivars, correlated with the amount of aroma-relevant volatiles as well as of fructose and glucose in the fruit. In contrast, indicator bacterial species in hydroponic-derived tomatoes correlated with aroma compounds with “sweet” and “floral” notes and showed negative correlations with glucose concentrations in fruit. Overall, our results point toward a microbiota-related accumulation of flavor and aroma compounds in tomato fruit, which is strongly dependent on the cultivation substrate and approach.

  9. Laboro Tomato

    • kaggle.com
    zip
    Updated Oct 20, 2023
    + more versions
    Cite
    Karthik Vinayan (2023). Laboro Tomato [Dataset]. https://www.kaggle.com/nexuswho/laboro-tomato
    Explore at:
    Available download formats: zip (1647136197 bytes)
    Dataset updated
    Oct 20, 2023
    Authors
    Karthik Vinayan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Laboro Tomato is an image dataset of growing tomatoes at different stages of ripening, designed for object detection and instance segmentation tasks. We also provide two subsets of tomatoes separated by size. The dataset was gathered at a local farm with two separate cameras that differ in resolution and image quality.

    COCO and YOLO annotations are available

    Samples of raw/annotated images: IMG_1066 (https://github.com/nexuswho/LaboroTomato/blob/master/examples/ann_gif_IMG_1066.gif?raw=true), IMG_1246 (https://github.com/nexuswho/LaboroTomato/blob/master/examples/ann_gif_IMG_1246.gif?raw=true)

    Annotation details

    Each tomato is divided into 2 categories according to size (normal size and cherry tomato) and 3 categories depending on the stage of ripening:
    * fully_ripened - completely red and ready to be harvested. Filled with red color on 90%* or more
    * half_ripened - greenish and needs time to ripen. Filled with red color on 30-89%*
    * green - completely green/white, sometimes with rare red parts. Filled with red color on 0-30%*

    *All percentages are approximate and differ from case to case.

    https://github.com/nexuswho/LaboroTomato/blob/master/examples/laboro_tomato_exp1.png?raw=true

    Dataset details

    Dataset includes 804 images with following details:

    name: tomato_mixed
    images: 643 train, 161 test
    cls_num: 6
    cls_names: b_fully_ripened, b_half_ripened, b_green, l_fully_ripened, l_half_ripened, l_green
    total_bboxes: train[7781], test[1996]
    bboxes_per_class:
      *Train: b_fully_ripened[348], b_half_ripened[520], b_green[1467], 
          l_fully_ripened[982], l_half_ripened[797], l_green[3667]
      *Test: b_fully_ripened[72], b_half_ripened[116], b_green[387], 
          l_fully_ripened[269], l_half_ripened[223], l_green[929]
    image_resolutions: 3024x4032, 3120x4160
    

    https://github.com/nexuswho/LaboroTomato/blob/master/examples/laboro_tomato_exp2.png?raw=true

    Scope of application

    Laboro Tomato dataset can be used to solve cutting edge real-life tasks by fusing various technologies:
    * Harvesting forecast based on tomato maturity
    * Automatic harvest of only ripened tomatoes
    * Identification and automatic thinning of deteriorated and obsolete tomatoes
    * Spraying pesticides only on tomatoes at a specific ripening stage
    * Temperature control in greenhouse according to ripening stage
    * Quality control on production line of food manufacturers, etc.

    Baseline

    Pretrained model

    The model was trained with mmdetection v2.0 on 4 Tesla V100 GPUs and is based on Mask R-CNN with an R-50-FPN 1x backbone:

    | Dataset       | bbox AP | mask AP | Download |
    |:--------------|--------:|--------:|:---------|
    | Laboro Tomato |    64.3 |    65.7 | model    |

    We did not tune hyperparameters for the baseline model and used the default values provided by the original mmdetection configs.
    Training parameters:
    lr = 0.01
    step = [32, 44]
    total epoch = 48
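
    The step setting reads as a step-decay schedule in which the learning rate drops at epochs 32 and 44. The sketch below computes the resulting rate per epoch, assuming a decay factor of 0.1 at each step (a common default for step policies, not stated above) and ignoring any warmup; the function name is illustrative.

```python
def lr_at_epoch(epoch, base_lr=0.01, steps=(32, 44), gamma=0.1):
    """Learning rate for a 0-indexed epoch under step decay.

    gamma=0.1 is an assumption; the source gives only lr, step and total epoch.
    """
    passed = sum(1 for s in steps if epoch >= s)  # number of decay steps already reached
    return base_lr * gamma ** passed


schedule = [lr_at_epoch(e) for e in range(48)]  # 48 epochs in total
```

    Under these assumptions the rate stays at 0.01 for the first 32 epochs, then falls to roughly 0.001, and to roughly 0.0001 for the last four epochs.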

    Output examples

    Image gallery with pretrained model output examples and its comparison between raw and annotated images.

    Test a dataset

    To evaluate the pretrained models, please prepare an mmdetection environment by following the official installation guide.

    Prepare dataset

    It is recommended to symlink the dataset root to $MMDETECTION/data. If your folder structure is different, you may need to change the corresponding paths in config files.

    mmdetection
    ├── mmdet
    ├── tools
    ├── configs
    ├── data
    │  ├── laboro_tomato
    │  │  ├── annotations
    │  │  ├── train
    │  │  ├── test
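
    The symlink for the layout above can also be created from Python. This is a minimal sketch, assuming the dataset has already been extracted somewhere on disk; the function name and the example paths are hypothetical.

```python
import os
from pathlib import Path


def link_dataset(dataset_root, mmdet_root, name="laboro_tomato"):
    """Create <mmdet_root>/data/<name> as a symlink pointing at dataset_root."""
    data_dir = Path(mmdet_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / name
    if not link.exists():
        # target_is_directory only matters on Windows; it is ignored elsewhere.
        os.symlink(Path(dataset_root).resolve(), link, target_is_directory=True)
    return link


# Example call with hypothetical paths:
# link_dataset("/datasets/laboro_tomato", "/workspace/mmdetection")
```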
    

    Add datasets to mmdetection

    To load data we need to create a new config file mmdet/datasets/laboro_tomato.py with corresponding subsets:

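    # mmdet/datasets/laboro_tomato.py
    from .builder import DATASETS
    from .coco import CocoDataset


    # Register the dataset so configs can refer to it by name.
    @DATASETS.register_module()
    class LaboroTomato(CocoDataset):
        """COCO-format dataset with the six Laboro Tomato classes."""

        CLASSES = ('b_fully_ripened', 'b_half_ripened', 'b_green',
                   'l_fully_ripened', 'l_half_ripened', 'l_green')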
    

    And add the dataset name to mmdet/datasets/__init__.py:

    from .laboro_tomato import LaboroTomato

    __all__ = [
        ..., 'LaboroTomato'
    ]
    
    

    Configuration files

    Configuration files setup on Tomato Mixed dataset example:

    1. Create laboro_tomato_base.py in configs/_base_/datasets/ with the content of the coco_detection configuration file, then change the dataset type, root and path parameters:
    dataset_type = 'LaboroTomato'
    data_root = 'data/laboro_tomato/'
    ...
    
    1. Create `...
  10. tomatOD

    • kaggle.com
    zip
    Updated Oct 20, 2023
    Cite
    Karthik Vinayan (2023). tomatOD [Dataset]. https://www.kaggle.com/datasets/nexuswho/tomatod
    Explore at:
    Available download formats: zip (171728770 bytes)
    Dataset updated
    Oct 20, 2023
    Authors
    Karthik Vinayan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    tomatOD

    tomatOD is a dataset for tomato fruit localization and ripening classification, containing images of tomato fruits in a greenhouse and high-quality expert annotations from agriculturists. It is a task-specific object detection dataset for tomato fruits, suitable for precision agriculture applications that typically require highly-accurate localization.

    The tomatOD dataset consists of 277 images with 2418 annotated tomato fruit samples of unripe, semi-ripe and fully-ripe classes.

    The images and the annotations are licensed under CC BY-NC-SA 4.0 license. The contents of this repository are released under the license.

    Sample images with tomato fruit annotations are shown below.

    https://github.com/nexuswho/tomatOD/blob/master/assets/tomatOD_img1.png?raw=true

    Data organization

    The dataset was split into train and test sets according to an 80%/20% train-test split ratio. Please note that the training and test data were selected in a semi-random manner. The following table shows the number of images and annotated boxes in the train and test sets of the tomatOD dataset.

    |                 | Train | Test |
    |:----------------|------:|-----:|
    | Images          |   222 |   55 |
    | Annotated boxes |  1952 |  466 |

    Data Format

    The annotations of the tomatOD dataset are provided in a COCO compatible format.

    A fix for an error in the category ids of the test annotations was contributed by ARTURO-BANDINI-JR.
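
    Since the annotations follow the COCO layout (top-level images, annotations and categories lists), per-class counts can be tallied with the standard library alone. The snippet below is a sketch: the tiny in-memory dictionary is made-up sample data, and the file path in the comment is hypothetical.

```python
import json
from collections import Counter


def load_coco(path):
    """Read a COCO-style annotation file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_boxes_per_category(coco):
    """Count annotations per category name in a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])


# Made-up sample in COCO layout; a real file would be read with load_coco("<path>.json").
sample = {
    "images": [{"id": 1, "file_name": "img1.png", "width": 640, "height": 480}],
    "categories": [
        {"id": 1, "name": "unripe"},
        {"id": 2, "name": "semi-ripe"},
        {"id": 3, "name": "fully-ripe"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [50, 60, 20, 25]},
        {"id": 3, "image_id": 1, "category_id": 3, "bbox": [100, 80, 35, 30]},
    ],
}
counts = count_boxes_per_category(sample)
```

    An annotation whose category_id is missing from the categories list raises a KeyError here, which is one quick way to spot id mismatches of the kind mentioned above.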

    Statistics and data analysis

    tomatOD classes

    The table below shows the number of annotated objects for each class of the tomatOD dataset.

    | unripe | semi-ripe | fully-ripe |
    |-------:|----------:|-----------:|
    |   1592 |       395 |        431 |

    Additionally, the following figure illustrates the relative appearance frequencies of the three classes of the dataset. The classes of the tomatOD dataset are clearly not balanced; however, their relative proportion is in line with the actual appearance frequency of each class in a realistic scenario.

    https://github.com/nexuswho/tomatOD/blob/master/assets/classes_proportions_tomatOD.png?raw=true

    Size distribution of bounding boxes

    The percentile relative size of each bounding box is calculated, which indicates the proportion of the diagonal length of each box over the diagonal length of the image. In the image below, the histogram of the percentile relative size distribution of the tomatOD bounding boxes is presented. Most of the bounding boxes have a size of 3% to 15% relative to the image size.
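
    The relative size described here is the ratio of two diagonals. A short sketch of that calculation follows; the function name and the example dimensions are illustrative and not taken from the dataset.

```python
import math


def relative_diagonal(box_w, box_h, img_w, img_h):
    """Box diagonal as a percentage of the image diagonal."""
    return 100.0 * math.hypot(box_w, box_h) / math.hypot(img_w, img_h)


# A 30x40 box in a 300x400 image: diagonals of 50 and 500, so 10%.
size = relative_diagonal(30, 40, 300, 400)
```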

    https://github.com/nexuswho/tomatOD/blob/master/assets/histogramm_boxes.png?raw=true

    Number of labelled instances per image

    Only 1% of images have one category per image and 11% of images include 8 instances, while the maximum number of instances per image, which is 20, is found only in 0.72% of the images. The tomatOD dataset has an average of 8.7 instances per image. The image displays the histogram of the number of annotation instances per image.

    Number of categories in images

    As the next figure shows, more than 50% of the tomatOD images contain objects of all 3 categories, while less than 8% of the images have objects of a single category.

    https://github.com/nexuswho/tomatOD/blob/master/assets/categories_in_images.png?raw=true

    Experiment

    Six state-of-the-art detectors are evaluated on the proposed tomatOD dataset. In detail, Faster RCNN with Inception v2, SSD with both Inception v2 and Mobilenet v2, PPN with Inception v2, RetinaNet (ResNet 101) and Yolo v3, all pretrained on the COCO dataset, are trained on the tomatOD train set for 450 epochs. Afterwards, they are evaluated on the test set. Hyperparameters were tuned for every network so that each performs as well as possible on the tomatOD dataset.

    The figure below illustrates the accuracy over epochs for both the train and the test set for every trained model.

    https://github.com/nexuswho/tomatOD/blob/master/assets/accuracy_vs_epochs.png?raw=true

    RetinaNet outperformed the other detectors, yielding an accuracy of 79.4%. The average precision of each class, the mAP and the precision-recall curves for each class are listed below for RetinaNet.

    In the precision-recall curves diagram, the unripe class is indicated by the green line, the semi-ripe class by the orange line, and the fully-ripe class by the red line.

                 unripe AP (%)    semi-ripe AP (%)    fully-ripe AP (%)    mAP (%)
    RetinaNet            91.47               55.28                76.77      74.51
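
    The reported mAP matches the unweighted mean of the three per-class AP values. A short check of that arithmetic, assuming mAP here is simply the average over classes:

```python
# Per-class average precision (%) for RetinaNet, as listed in the table.
ap = {"unripe": 91.47, "semi-ripe": 55.28, "fully-ripe": 76.77}

# Unweighted mean over the three classes.
mean_ap = sum(ap.values()) / len(ap)
print(round(mean_ap, 2))
```

    Because the mean is unweighted, the weak semi-ripe class lowers the mAP by as much as the much larger unripe class raises it.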


  11. Rotten Tomatoes Reviews for Online Streaming Shows

    • kaggle.com
    zip
    Updated Jul 17, 2022
    Cite
    Colton Barger (2022). Rotten Tomatoes Reviews for Online Streaming Shows [Dataset]. https://www.kaggle.com/datasets/coltonbarger/rotten-tomatoes-reviews-for-online-streaming-shows
    Explore at:
    zip(8908381 bytes)Available download formats
    Dataset updated
    Jul 17, 2022
    Authors
    Colton Barger
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset Photo by Nicolas J Leclercq on Unsplash

    This is a collection of both critic and audience reviews for 685 different online streaming shows from Rotten Tomatoes. The data was scraped the week of July 10, 2022. In most cases the reviews in the files are previews rather than full text, especially when the critic review is long. Some reviews are duplicated because a show may be available on more than one online streaming service (either paid or with a subscription).

    Data was collected for the top 100 most popular shows from 9 streaming platforms: 'Apple TV+', 'Paramount+', 'HBO MAX', 'Disney+', 'Prime Video', 'Hulu', 'Netflix', 'Peacock', and 'VUDU'. In most cases, there are not 100 shows that show up with reviews for each service. Why is this? Well, a lot of shows just didn't have reviews by either critics or audience members. Some services also don't have 100 shows yet due to being newer. This is the case with Apple TV+ for instance. Additionally, Rotten Tomatoes apparently goes down for maintenance a lot. This affected the collection of reviews greatly, and I am not sure if I got every single one. Oh well.

    About the Files

    tv_show_links.csv: This file is essentially a list of all shows whose data was scraped. Other information includes the network (or networks in some cases) a show can be found on, the percentage of critics who had positive reviews, the percentage of audience members who had positive reviews, and the Rotten Tomatoes link the reviews can be found at (mostly was used during scraping, not sure what use it would have in any data analysis).

    audience_reviews.csv: This is the file that contains audience reviews. The columns are the show that is reviewed, the rating (on a scale from 0-5), and the review text.

    critic_reviews.csv: The file that contains critic reviews. The columns are the reviewed show, the sentiment the critic has (1 for positive, 0 for negative), and the review text.

  12. 🐟 RTG Tinned Fish

    • kaggle.com
    zip
    Updated Jul 31, 2024
    Cite
    mexwell (2024). 🐟 RTG Tinned Fish [Dataset]. https://www.kaggle.com/datasets/mexwell/rtg-tinned-fish
    Explore at:
    zip(23770 bytes)Available download formats
    Dataset updated
    Jul 31, 2024
    Authors
    mexwell
    Description

    Motivation

    Rainbow Tomatoes Garden (RTG) is an online farm stand that offers the largest selection of tinned fish in the world. Tinned (canned) fish is popular in many cultures worldwide, and there are many varieties containing different seafoods, different preparations, and even sauces and spices added directly to the can.

    RTG created this canned fish dataset in order to present to its customers the variety of options they offer, so customers can more easily find products they’d like. We need to clean the data to use it, but once we do, we’ll use it to study the different types of tinned fish available worldwide.

    Data

    There are a total of 702 rows in the data, with each row corresponding to a different canned seafood product.

    Variable Description

    • Name (caveats to the data): Name of the product
    • RTG $: Price of the canned product
    • Oil/Water Used: Whether the fish is canned with water, some kind of oil, or vinegar
    • Type of Fish: The type of seafood used in the product (not necessarily a fish)
    • Latin Name: Latin name (species) of the seafood
    • Country Origin: The country of origin of the seafood
    • Brand: The brand name of the canned seafood
    • Has Salt: Whether the product contains salt
    • Has Sugar: Whether the product contains sugar
    • Sauces/Inclusions: The type of sauces or seasonings contained in the product
    • Boneless: Whether the product is boneless (“NA” means the seafood does not contain bones at all, so there are none to remove; for instance, scallops do not have bones)
    • Skinless: Whether the product is skinless (“NA” means the seafood does not have any skin/scales to remove, such as for mussels or squid)
    • Pieces/Tin: The number of pieces contained per tin
    • Tin Size (in grams unless otherwise noted): Weight of the product in grams
    • Tin Size (in oz unless otherwise noted): Weight of the product in oz
    • Smoked: Whether the product is smoked (including all methods of getting the flavor into the product)
    • Grilled/Seared: Whether the product is grilled or seared
    • Citrus: Whether the product contains citrus (everything from slices of fruit to lemon essence)
    • Garlic: Whether the product contains garlic
    • Chili Pepper: Whether the product contains chili peppers
    • Tomato: Whether the product contains tomatoes
    • Dairy: Whether the product contains dairy
    • Gluten: Whether the product contains gluten
    • Organic: Whether the product contains some amount of certified organic agricultural products
    • Kosher Cert: Whether the product is Kosher certified
    • Servings/Tin: The amount of servings per tin
    • Sodium/Serv (mg): The amount of sodium per serving
    • % RDA Sodium: Percentage of recommended daily use of sodium, per serving

    Questions

    • Since this is a manually entered dataset, to begin statistical analysis on this dataset, one needs to first perform some data cleaning. For example, most sodium amounts are numbers, but a few rows contain the string “< 1g”. In other variables, missing values are given several different ways, such as by being left blank, “?”, or “NA”. Load the data and carefully review the types of all variables you’ll need for your analysis. Write the necessary code to correct or filter unusual values.
    • Conduct an exploratory data analysis. What is the distribution of prices? Prices per unit weight? Which countries of origin are most common?
    • On average, how much more sodium does the salted tinned fish contain than unsalted fish?
    • Use ANOVA to test whether the price (per unit weight) of tinned fish varies by country, controlling for the type of fish.
    • Use logistic regression to explore the odds that the product contains added sugar. Which factors are most strongly associated with added sugar?
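
    For the cleaning step in the first question, one possible starting point is a small parser for the sodium column. The sketch below treats blank, “?” and “NA” entries as missing and also returns no value for below-threshold entries such as “< 1g”, since the exact amount is unknown. That last choice, the function name and the set of missing markers are assumptions for illustration; substituting an upper bound would be an equally defensible choice.

```python
from typing import Optional

# Markers the description says are used for missing values.
MISSING_MARKERS = {"", "?", "NA"}


def parse_sodium_mg(raw: Optional[str]) -> Optional[float]:
    """Convert a raw sodium cell to milligrams, or None if unusable."""
    if raw is None:
        return None
    text = raw.strip()
    if text in MISSING_MARKERS:
        return None
    if text.startswith("<"):
        # Entries like "< 1g" only give an upper bound; treated as unknown here.
        return None
    try:
        return float(text)
    except ValueError:
        # Any other non-numeric text is left for manual review.
        return None


print([parse_sodium_mg(v) for v in ["350", " 120 ", "< 1g", "?", "NA", ""]])
```

    Note that this missing-value rule should not be applied blindly to the Boneless and Skinless columns, where “NA” means “not applicable” rather than “unknown”.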

    References

    Data from: https://rainbowtomatoesgarden.com/index.php/choosing-a-tin/

    Data downloaded August 8, 2023.

    Acknowledgement

    Photo by James Wei on Unsplash

  13. Tomatoes 2 2wvhj Aml8q Mpzp Dataset

    • universe.roboflow.com
    zip
    Updated Mar 13, 2025
    Cite
    Roboflow 100-VL (2025). Tomatoes 2 2wvhj Aml8q Mpzp Dataset [Dataset]. https://universe.roboflow.com/rf100-vl/tomatoes-2-2wvhj-aml8q-mpzp
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    Roboflow (https://roboflow.com/)
    Authors
    Roboflow 100-VL
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Variables measured
    Tomatoes 2 2wvhj Aml8q Mpzp Bounding Boxes
    Description

    Overview

    Introduction

    This dataset is designed for object detection, focusing on differentiating between green and red tomatoes. The annotation task involves drawing bounding boxes around these classes:

    • Green Tomatoes: Unripe tomatoes, typically firm and less vibrant.
    • Red Tomatoes: Ripe tomatoes, known for their full, red color and soft texture.

    Object Classes

    Green Tomatoes

    Description

    Green tomatoes are generally unripe. They have a circular shape with a smooth surface and lack the vibrant red hue of ripe tomatoes. Their color ranges from light green to a slightly yellowish tint as they begin to ripen.

    Instructions

    • Annotate each visible green tomato by enclosing it in a bounding box.
    • Ensure the box tightly fits the tomato's circular shape, capturing the entire fruit.
    • If a green tomato is partially occluded by other objects, extend the box to include the occluded parts, if they can be reasonably inferred.
    • Do not label tomatoes that are turning yellow or have substantial red areas, as they are not fully unripe.

    Red Tomatoes

    Description

    Red tomatoes are fully ripe, featuring a bright red color that covers the entire surface. They maintain the classic tomato shape and may appear slightly larger and softer compared to their green counterparts.

    Instructions

    • Draw bounding boxes around each red tomato, making sure to encompass their entire rounded shape.
    • Include tomatoes that are predominantly red even if they have small green or yellow patches.
    • If a red tomato is partially obscured, include the hidden portions as much as possible in the annotation.
    • Avoid labeling any fruit that is primarily green or has not reached full redness.


Tomato-Village dataset

A dataset for end to end Tomato Disease detection in real-world environment

Explore at:
85 scholarly articles cite this dataset (View in Google Scholar)
zip(1611294359 bytes)Available download formats
Dataset updated
Aug 27, 2023
Authors
mvgehlot
Description

Problem statement: Tomatoes are among the most extensively grown vegetables in any country, and their diseases can significantly affect yield and quality. Accurate and early detection of tomato diseases is crucial for reducing losses and improving crop management. Recent deep learning research has produced multiple CNN designs, making automated plant disease identification a viable alternative to traditional visual inspection. When using deep learning methods, the dataset plays one of the most crucial roles in disease prediction. PlantVillage is the most widely used publicly available dataset for tomato disease detection, but it was created in a lab/controlled environment, and models trained on it do not perform well on real-world images. Some natural or real-world datasets exist, but they are private. Also, when attempting to predict tomato diseases in the field in the Jodhpur and Jaipur districts of Rajasthan, India, we found that the majority of diseases are Leaf Miner, spotted wilt virus, and nutrition deficiency diseases, but there are no public datasets containing such categories.

Proposed solution: To overcome these challenges, we propose the creation of a new dataset called "Tomato-Village" with three variants: a) multiclass tomato disease classification, b) multilabel tomato disease classification, and c) object-detection-based tomato disease detection. To the best of our knowledge, “Tomato-Village” will be the first such dataset to be publicly available. Further, we have applied various CNN architectures/models to this dataset and report baseline results.

To use the dataset, please cite the following article: Gehlot, M., Saxena, R.K. & Gandhi, G.C. “Tomato-Village”: a dataset for end-to-end tomato disease detection in a real-world environment. Multimedia Systems (2023). DOI: https://doi.org/10.1007/s00530-023-01158-y

Article Link : https://link.springer.com/article/10.1007/s00530-023-01158-y
