The PlantVillage dataset consists of 54303 healthy and unhealthy leaf images divided into 38 categories by species and disease.
NOTE: The original dataset is not available from the original source (plantvillage.org), so we obtained the unaugmented dataset from a paper that used it and republished it. We also dropped images with the Background_without_leaves label, because these were not present in the original dataset.
Original paper URL: https://arxiv.org/abs/1511.08060
Dataset URL: https://data.mendeley.com/datasets/tywbtsjrjv/1
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('plant_village', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
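A minimal sketch of inspecting the dataset's metadata with the standard tfds API (only the dataset name, split, and the 38-category count are taken from this page; everything else is generic tfds usage):

import tensorflow_datasets as tfds

# Load the dataset together with its metadata (DatasetInfo).
ds, info = tfds.load('plant_village', split='train', with_info=True)

# The 38 species/disease categories are exposed as label names.
print(info.features['label'].num_classes)
print(info.features['label'].names[:5])

# Each example is a dict with an 'image' tensor and an integer 'label'.
for ex in ds.take(1):
  print(ex['image'].shape, int(ex['label']))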
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/plant_village-1.0.2.png
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Title: AI model for artwork verification using brushstroke structure, rhythm, and X-ray analysis. Developed from 10 years of research and refined through AI Natural Matching.
Overview
This dataset presents an AI model developed specifically for artwork authentication. The goal is to determine whether two paintings—one known and one unknown—may have been created by the same artist, based on brushstroke analysis and structural features.
This model demonstrates how machine learning can assist in verifying stylistic consistency using visual data, supported by structural scans such as X-ray imaging. The method was built from human-led comparisons and later refined into a formal system of AI-driven brushstroke analysis.
Method: The 18 Supreme Techniques
The model uses a set of 18 core visual techniques designed to analyze:
Stroke rhythm and torque
Brushstroke flow direction
Angular deviation and energy pattern
Layer buildup and underpainting patterns (via X-ray)
These techniques were developed in Google Colab with AI support, based on over 10 years of image-based manual study. The system is optimized for AI Natural Matching, a visual neural comparison approach that mimics human perception but operates at scale.
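For illustration only, a comparison in this general spirit can be sketched by embedding both paintings with a pretrained vision backbone and scoring cosine similarity between the embeddings; this is an assumption-laden stand-in, not the authors' 18-technique pipeline or the actual AI Natural Matching system, and the file names below are hypothetical:

import numpy as np
import tensorflow as tf

# Assumption: a generic ImageNet backbone stands in for the unpublished matching model.
backbone = tf.keras.applications.MobileNetV2(include_top=False, pooling='avg', weights='imagenet')

def embed(path):
    # Load a painting image and map it to a single feature vector.
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
    return backbone.predict(x, verbose=0)[0]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical file names for the known reference work and the candidate work.
similarity = cosine(embed('reference_known_work.jpg'), embed('candidate_unknown_work.jpg'))
print('embedding cosine similarity:', similarity)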
Requirements:
High-resolution candidate image
Reference image (known work by artist)
X-ray scan of candidate image
AI Natural Matching only (SSIM is strictly forbidden)
This model must use AI Natural Matching. Do not use SSIM or any structural similarity index.
Dataset Use
This dataset is intended for researchers, AI developers, and art historians who wish to:
Validate or disprove potential authorship of a painting
Test the model using different artist references
Study visual fingerprinting and stylistic consistency
Scientific pigment data (XRF, FTIR, SEM) and aging process validation for The Tree Oil Painting are available in a separate dataset. Cross-checking with physical material data is strongly encouraged.
Licensing and Attribution
All data is licensed under CC BY 4.0 and freely available for academic, research, and AI development use.
Model and research developed by Haronthai Mongbunsri (Independent Researcher, 2015–2025). AI structure refined through collaboration with neural tools via Google Colab.
This dataset is part of an open effort to build transparent, reproducible systems for artwork verification.
This analysis is built upon scientific pigment data, X-ray, and FTIR results hosted on Hugging Face; we strongly recommend reviewing that core dataset to understand the chemical and material basis behind the visual AI analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It all started during the #StayAtHome period of the 2020 pandemic: some neighbors were worried about trash around Montevideo's garbage containers.
The goal is to automatically tell clean containers from dirty ones in order to request maintenance.
Want to know more about the entire process? Check out this thread on how it began, and this other one on the version 6 update process.
Data is split into independent training and testing sets. However, each split contains several near-duplicate images (typically the same container from different perspectives or on different days). Image sizes vary widely.
There are four major sources:
* Images taken from Google Street View (600x600 pixels), automatically collected through its API.
* Images contributed by individual persons, most of which I took myself.
* Images taken from social networks (Twitter & Facebook) and news.
* Images contributed by pormibarrio.uy - 17-11-2020
Images were taken of green containers, the most popular type in Montevideo and also widely used in some other cities.
Current version (clean-dirty-garbage-containers-V6) is also available here or you can download it as follows:
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1mdfJoOrO6MeTc3eMEjIDkAKlwK9bUFg6' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1mdfJoOrO6MeTc3eMEjIDkAKlwK9bUFg6" -O clean-dirty-garbage-containers-V6.zip && rm -rf /tmp/cookies.txt
This is especially useful if you want to download it in Google Colab.
This repo contains the code used during the dataset's construction and documentation process, including baselines for the proposed tasks.
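A hedged sketch of what a transfer-learning baseline for the clean/dirty task might look like (not the repo's actual code; the directory layout with clean/ and dirty/ subfolders under train/ and test/ is an assumption):

import tensorflow as tf

# Assumption: after unzipping, images are arranged as
#   clean-dirty-garbage-containers-V6/train/{clean,dirty}/... and .../test/{clean,dirty}/...
train_ds = tf.keras.utils.image_dataset_from_directory(
    'clean-dirty-garbage-containers-V6/train', image_size=(224, 224), batch_size=32)
test_ds = tf.keras.utils.image_dataset_from_directory(
    'clean-dirty-garbage-containers-V6/test', image_size=(224, 224), batch_size=32)

# Frozen ImageNet backbone plus a small binary head (clean vs. dirty).
base = tf.keras.applications.MobileNetV2(include_top=False, pooling='avg', weights='imagenet')
base.trainable = False
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects inputs in [-1, 1]
    base,
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_ds, validation_data=test_ds, epochs=3)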
Since this is a hot topic in Montevideo, especially nowadays with elections next week, it caught some attention from the local press:
Thanks to every single person who gave me images of their containers. Special thanks to my friend Diego, whose idea of using Google Street View as a data source really helped grow the dataset. And finally to my wife, who supported me during this project and contributed a lot to this dataset.
If you use these data in a publication, presentation, or other research project or product, please use the following citation:
Laguna, Rodrigo. 2021. Clean dirty containers in Montevideo - Version 6.1. url: https://www.kaggle.com/rodrigolaguna/clean-dirty-containers-in-montevideo
@dataset{RLaguna-clean-dirty:2021,
author = {Rodrigo Laguna},
title = {Clean dirty containers in Montevideo},
year = {2021},
url = {https://www.kaggle.com/rodrigolaguna/clean-dirty-containers-in-montevideo},
version = {6.1}
}
I'm on Twitter, @ro_laguna_, or you can write to me at r.laguna.queirolo at outlook.com
12-09-2020: V3 - Include more training (+676) & testing (+64) samples:
21-12-2020: V4 - Include more training (+367) & testing (+794) samples, including ~400...