3 datasets found
  1. plant_village

    • tensorflow.org
    • opendatalab.com
    Updated Jun 1, 2024
    Cite
    (2024). plant_village [Dataset]. http://identifiers.org/arxiv:1511.08060
    Description

    The PlantVillage dataset consists of 54,303 healthy and unhealthy leaf images divided into 38 categories by species and disease.

    NOTE: The original dataset is no longer available from the original source (plantvillage.org), so the unaugmented dataset is taken from a paper that used the dataset and republished it. We also dropped images with the Background_without_leaves label, because these were not present in the original dataset.

    Original paper URL: https://arxiv.org/abs/1511.08060
    Dataset URL: https://data.mendeley.com/datasets/tywbtsjrjv/1

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('plant_village', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/plant_village-1.0.2.png

  2. Van Gogh vs Tree Oil Painting Ai Analysis

    • kaggle.com
    Updated May 6, 2025
    Cite
    HARONTHAI MONGBUNSRI (2025). Van Gogh vs Tree Oil Painting Ai Analysis [Dataset]. https://www.kaggle.com/datasets/haronthaimongbunsri/van-gogh-vs-tree-oil-painting-ai-analysis/code
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors
    HARONTHAI MONGBUNSRI
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Title: AI model for artwork verification using brushstroke structure, rhythm, and X-ray analysis. Developed from 10 years of research and refined through AI Natural Matching.

    Overview

    This dataset presents an AI model developed specifically for artwork authentication. The goal is to determine whether two paintings—one known and one unknown—may have been created by the same artist, based on brushstroke analysis and structural features.

    This model demonstrates how machine learning can assist in verifying stylistic consistency using visual data, supported by structural scans such as X-ray imaging. The method was built from human-led comparisons and later refined into a formal system of AI-driven brushstroke analysis.

    Method: The 18 Supreme Techniques

    The model uses a set of 18 core visual techniques designed to analyze:

    Stroke rhythm and torque

    Brushstroke flow direction

    Angular deviation and energy pattern

    Layer buildup and underpainting patterns (via X-ray)

    These techniques were developed in Google Colab with AI support, based on over 10 years of image-based manual study. The system is optimized for AI Natural Matching, a visual neural comparison approach that mimics human perception but operates at scale.

    Requirements:

    1. High-resolution candidate image

    2. Reference image (known work by artist)

    3. X-ray scan of candidate image

    4. AI Natural Matching only (SSIM is strictly forbidden)

    This model must use AI Natural Matching. Do not use SSIM or any structural similarity index.
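    The matching code itself is not published with this dataset, so as a purely illustrative sketch (not the author's implementation), a neural comparison of this kind is often reduced to comparing feature embeddings of the two images. The hypothetical function below scores two such embeddings with cosine similarity:

```python
import numpy as np

def natural_match_score(feat_a, feat_b):
    """Cosine similarity between two feature vectors (e.g. brushstroke
    embeddings extracted by a pretrained vision model). A hypothetical
    stand-in for the matching step described above, not the author's code."""
    a = np.asarray(feat_a, dtype=float)
    b = np.asarray(feat_b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for embeddings of a candidate and a reference work.
candidate = [0.2, 0.9, 0.4]
reference = [0.25, 0.85, 0.45]
print(round(natural_match_score(candidate, reference), 3))  # → 0.996
```

    In practice the embeddings would come from a vision model applied to the candidate, reference, and X-ray images; the toy vectors here exist only to make the snippet runnable.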

    Dataset Use

    This dataset is intended for researchers, AI developers, and art historians who wish to:

    Validate or disprove potential authorship of a painting

    Test the model using different artist references

    Study visual fingerprinting and stylistic consistency

    Scientific pigment data (XRF, FTIR, SEM) and aging process validation for The Tree Oil Painting are available in a separate dataset. Cross-checking with physical material data is strongly encouraged.

    Licensing and Attribution

    All data is licensed under CC BY 4.0 and freely available for academic, research, and AI development use.

    Model and research developed by Haronthai Mongbunsri (Independent Researcher, 2015–2025). AI structure refined through collaboration with neural tools via Google Colab.

    This dataset is part of an open effort to build transparent, reproducible systems for artwork verification.

    Reference: Scientific Verification Dataset on Hugging Face

    This analysis is built upon scientific pigment data, X-ray, and FTIR results hosted on Hugging Face:

    We strongly recommend reviewing this core dataset to understand the chemical and material basis behind the visual AI analysis.

  3. Clean dirty containers in Montevideo

    • kaggle.com
    Updated Aug 21, 2021
    Cite
    Rodrigo Laguna (2021). Clean dirty containers in Montevideo [Dataset]. https://www.kaggle.com/rodrigolaguna/clean-dirty-containers-in-montevideo/code
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors
    Rodrigo Laguna
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Montevideo
    Description

    Context

    It all started during the #StayAtHome period of the 2020 pandemic: some neighbors were worried about trash around Montevideo's containers.

    The goal is to automatically detect clean from dirty containers to ask for maintenance.

    Want to know more about the entire process? Check out this thread on how it began, and this other one on the version 6 update process.

    Content

    The data is split into independent training and testing sets. However, each split contains several near-duplicate images (typically the same container photographed from different perspectives or on different days). Image sizes vary widely.
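    As a small illustrative sketch (not part of the dataset's own tooling), the splits can be inventoried with a directory walk, assuming the train/clean, train/dirty, test/clean, test/dirty layout implied by the changelog:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images(root):
    """Count image files per split/class in a layout like
    train/clean, train/dirty, test/clean, test/dirty."""
    counts = {}
    for class_dir in sorted(Path(root).glob("*/*")):
        if class_dir.is_dir():
            counts[f"{class_dir.parent.name}/{class_dir.name}"] = sum(
                1 for p in class_dir.iterdir()
                if p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

    For example, count_images('clean-dirty-garbage-containers-V6') returns a dict mapping 'split/class' to image counts; the exact numbers depend on the dataset version.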

    There are four major sources:

    • Images taken from Google Street View, 600x600 pixels, automatically collected through its API.
    • Images contributed by individual people, most of which I took myself.
    • Images taken from social networks (Twitter & Facebook) and news.
    • Images contributed by pormibarrio.uy (17-11-2020).

    Images were taken of green containers, the most popular type in Montevideo and also widely used in some other cities.

    The current version (clean-dirty-garbage-containers-V6) is also available here, or you can download it as follows:

    wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1mdfJoOrO6MeTc3eMEjIDkAKlwK9bUFg6' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1mdfJoOrO6MeTc3eMEjIDkAKlwK9bUFg6" -O clean-dirty-garbage-containers-V6.zip && rm -rf /tmp/cookies.txt

    This is especially useful if you want to download it in Google Colab.

    This repo contains the code used during the dataset's building and documentation process, including the baselines for the proposed tasks.

    Dataset on news

    Since this is a hot topic in Montevideo, especially nowadays with elections next week, it has caught some attention from the local press:

    Acknowledgements

    Thanks to every single person who gave me images of their containers. Special thanks to my friend Diego, whose idea of using Google Street View as a data source really helped grow the dataset. And finally to my wife, who supported me during this project and contributed a lot to this dataset.

    Citation

    If you use these data in a publication, presentation, or other research project or product, please use the following citation:

    Laguna, Rodrigo. 2021. Clean dirty containers in Montevideo - Version 6.1. url: https://www.kaggle.com/rodrigolaguna/clean-dirty-containers-in-montevideo

    @dataset{RLaguna-clean-dirty:2021,
    author = {Rodrigo Laguna},
    title = {Clean dirty containers in Montevideo},
    year = {2021},
    url = {https://www.kaggle.com/rodrigolaguna/clean-dirty-containers-in-montevideo},
    version = {6.1}
    }
    

    Contact

    I'm on Twitter, @ro_laguna_, or write to me at r.laguna.queirolo at outlook.com.

    Future steps:

    • Add images from Mapillary, an open-source project similar to Google Street View.
    • Keep adding manually taken images.
    • Add images from anyone who would like to contribute.
    • Develop & deploy a bot to automatically report container status.
    • Translate docs to Spanish.
    • Crop images so each contains one and only one container, taking up most of the image.

    Changelog

    • 19-05-2020: V1 - Initial version
    • 20-05-2020: V2 - Include more training samples
    • 12-09-2020: V3 - Include more training (+676) & testing (+64) samples:

      • train/clean from 574 to 1005 (+431)
      • train/dirty from 365 to 610 (+245)
      • test/clean from 100 to 128 (+28)
      • test/dirty from 100 to 136 (+36)
    • 21-12-2020: V4 - Include more training (+367) & testing (+794) samples, including ~400...

