STL-10 is an image recognition dataset inspired by the CIFAR-10 dataset, with some improvements. With a corpus of 100,000 unlabeled images and 500 labeled training images per class, it is well suited to developing unsupervised feature learning, deep learning, and self-taught learning algorithms. Unlike CIFAR-10, the dataset has a higher resolution (96x96 pixels), which makes it a challenging benchmark for developing more scalable unsupervised learning methods.
Data overview:
The original data source recommends the following standardized testing protocol for reporting results:
Original data source and banner image: https://cs.stanford.edu/~acoates/stl10/
Please cite the following reference when using this dataset:
Adam Coates, Honglak Lee, Andrew Y. Ng. An Analysis of Single Layer Networks in Unsupervised Feature Learning. AISTATS, 2011.
The STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, and self-taught learning algorithms. It is inspired by the CIFAR-10 dataset but with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image models prior to supervised training. The primary challenge is to make use of the unlabeled data (which comes from a similar but not identical distribution to the labeled data) to build a useful prior. All images were acquired from labeled examples on ImageNet.
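The trade-off between labeled and unlabeled data described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the counts (500 labeled training images per class and 100,000 unlabeled images for STL-10, 5,000 labeled training images per class for CIFAR-10, 10 classes each) are assumptions taken from the public dataset descriptions, and every name in the code is hypothetical.

```python
# Illustrative arithmetic only. The constants are assumptions taken from the
# public dataset descriptions; all names here are hypothetical.
NUM_CLASSES = 10
STL10_LABELED_PER_CLASS = 500
CIFAR10_LABELED_PER_CLASS = 5000
STL10_UNLABELED = 100_000


def labeled_total(per_class, num_classes=NUM_CLASSES):
    """Total number of labeled training images across all classes."""
    return per_class * num_classes


stl10_labeled = labeled_total(STL10_LABELED_PER_CLASS)
cifar10_labeled = labeled_total(CIFAR10_LABELED_PER_CLASS)

# How many unlabeled images are available for every labeled training image.
unlabeled_per_labeled = STL10_UNLABELED / stl10_labeled

print(f"STL-10 labeled training images: {stl10_labeled}")
print(f"CIFAR-10 labeled training images: {cifar10_labeled}")
print(f"Unlabeled images per labeled image: {unlabeled_per_labeled:.0f}")
```

Under these assumptions, STL-10 offers a tenth of CIFAR-10's labeled training data but twenty unlabeled images for each labeled one, which is why the unlabeled set carries most of the learning signal.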
To use this dataset:
import tensorflow_datasets as tfds
# Load the labeled training split and print the first four examples.
ds = tfds.load('stl10', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/stl10-1.0.0.png
Inspired by the CIFAR-10 dataset, STL-10 is an image recognition dataset for developing unsupervised feature learning and deep learning algorithms. Each class has fewer labeled training examples than in CIFAR-10, and a large set of unlabeled samples is provided for learning image models prior to supervised training. The primary challenge is to make use of the unlabeled data. With its higher resolution (96x96), the dataset is expected to be a more challenging benchmark for developing scalable unsupervised learning methods.
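To give a rough sense of what the 96x96 resolution means for scalability, the following sketch compares the number of raw pixel values per image with CIFAR-10. It is a back-of-the-envelope calculation, not part of any official tooling: the 32x32 CIFAR-10 size, the three colour channels, the one-byte-per-value storage, and the 100,000 unlabeled image count are assumptions drawn from the public dataset descriptions, and the function name is hypothetical.

```python
# Illustrative arithmetic; names are hypothetical. The 96x96 size comes from
# the description above; the other constants are stated assumptions.
CHANNELS = 3                # RGB colour images (assumption)
UNLABELED_IMAGES = 100_000  # size of the unlabeled set (assumption)


def values_per_image(side, channels=CHANNELS):
    """Number of raw pixel values in a square image with the given side."""
    return side * side * channels


stl10_values = values_per_image(96)
cifar10_values = values_per_image(32)
ratio = stl10_values // cifar10_values

# Uncompressed size of the unlabeled set at one byte per pixel value.
unlabeled_bytes = UNLABELED_IMAGES * stl10_values

print(f"STL-10 values per image: {stl10_values}")
print(f"CIFAR-10 values per image: {cifar10_values}")
print(f"Ratio: {ratio}x")
print(f"Unlabeled set, uncompressed: {unlabeled_bytes / 1e9:.2f} GB")
```

Each STL-10 image therefore carries nine times as many input values as a CIFAR-10 image, so a method tuned on CIFAR-10 has to cope with a correspondingly larger input dimension.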