The MNIST database of handwritten digits.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png
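A common next step after inspecting examples is to normalize the images before training. A minimal sketch, assuming only the standard TFDS 'mnist' usage shown above (as_supervised=True returns (image, label) pairs with uint8 pixels in [0, 255]):

import tensorflow as tf
import tensorflow_datasets as tfds

# Load (image, label) pairs instead of feature dictionaries.
ds = tfds.load('mnist', split='train', as_supervised=True)
# Scale pixel values from [0, 255] to [0, 1] and batch for training.
ds = ds.map(lambda image, label: (tf.cast(image, tf.float32) / 255.0, label))
ds = ds.shuffle(1024).batch(32)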
Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('fashion_mnist', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/fashion_mnist-3.0.1.png
Moving variant of the MNIST database of handwritten digits. This is the data used by the authors for reporting model performance. See tfds.video.moving_mnist.image_as_moving_sequence for generating training/validation data from the MNIST dataset (a sketch follows the usage example below).
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('moving_mnist', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
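As noted above, training/validation sequences are generated from plain MNIST rather than loaded from this dataset. A minimal sketch of that workflow; the sequence_length argument and the image_sequence field follow the TFDS documentation for image_as_moving_sequence and should be checked against your installed version:

import tensorflow as tf
import tensorflow_datasets as tfds

def to_sequence(image, label):
  # Turn one static 28x28 MNIST digit into a 20-frame moving sequence.
  seq = tfds.video.moving_mnist.image_as_moving_sequence(
      image, sequence_length=20)
  return seq.image_sequence  # shape [20, 64, 64, 1]

mnist = tfds.load('mnist', split='train', as_supervised=True)
pairs = mnist.map(to_sequence).batch(2, drop_remainder=True)
# Overlaying the two sequences in each batch reproduces the classic
# two-digit Moving MNIST frames.
overlaid = pairs.map(lambda s: tf.reduce_max(s, axis=0))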
This dataset was created by 3Jlou 4eJluk
Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset (28x28 grayscale, 70,000 images), provided in the original MNIST format as well as a NumPy format. Since MNIST restricts us to 10 classes, we chose one character to represent each of the 10 rows of Hiragana when creating Kuzushiji-MNIST.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('kmnist', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/kmnist-3.0.1.png
This dataset was created by Murad Al Dahmashi
This dataset was created by Arpan Dhatt
The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset.
Note: Like the original EMNIST data, images provided here are inverted horizontally and rotated 90° anti-clockwise. You can use tf.transpose within ds.map to convert the images to a more human-friendly format (a sketch follows the usage example below).
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('emnist', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/emnist-byclass-3.1.0.png
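Following the note above, here is a minimal sketch of the tf.transpose-within-ds.map trick; it assumes the standard TFDS feature key 'image' shown by the snippet above:

import tensorflow as tf
import tensorflow_datasets as tfds

def fix_orientation(example):
  # Transposing the 28x28x1 image swaps height and width, which undoes the
  # horizontal inversion plus 90-degree rotation described in the note.
  example['image'] = tf.transpose(example['image'], perm=[1, 0, 2])
  return example

ds = tfds.load('emnist', split='train').map(fix_orientation)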
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Restructured Image Files: Each digit image is saved as a .png file in separate directories for training and testing.
CSV Metadata: Includes train_labels.csv and test_labels.csv, mapping image filenames to their respective labels.
Improved Accessibility: Simplified folder structure for easier dataset exploration and model training.
Format: Images are grayscale (28x28 pixels), suitable for most deep learning frameworks (TensorFlow, PyTorch, etc.).
This dataset is ideal for: - Developing and testing classification models for handwritten digit recognition. - Exploring custom preprocessing pipelines for digit datasets. - Comparing model performance on a restructured MNIST dataset.
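A minimal loading sketch for this layout. The directory name 'train/' and the CSV column names 'filename' and 'label' are assumptions (only train_labels.csv and test_labels.csv are named above), so adjust them to the actual files:

import numpy as np
import pandas as pd
from PIL import Image

labels = pd.read_csv('train_labels.csv')        # assumed columns: filename, label
images = np.stack([
    np.array(Image.open(f'train/{name}'))       # each PNG is a 28x28 grayscale image
    for name in labels['filename']
])
y = labels['label'].to_numpy()
print(images.shape, y.shape)                    # e.g. (60000, 28, 28) (60000,)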
This dataset was created by Yaroslav Bulatov by taking some publicly available fonts and extracting glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A-J.
A set of training and test images of letters from A to J on various typefaces. The image size is 28x28 pixels.
The dataset can be found on the TensorFlow GitHub page as well as on Yaroslav's blog.
This is a pretty good dataset to train classifiers! According to Yaroslav:
Judging by the examples, one would expect this to be a harder task than MNIST. This seems to be the case -- logistic regression on top of a stacked auto-encoder with fine-tuning gets about 89% accuracy, whereas the same approach gets about 98% on MNIST. The dataset consists of a small hand-cleaned part, about 19k instances, and a large uncleaned part, 500k instances. The two parts have approximately 0.5% and 6.5% label error rates. I got this by looking through glyphs and counting how often my guess of the letter didn't match its unicode value in the font file.
Enjoy!
A large set of images of cats and dogs. There are 1738 corrupted images that are dropped.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('cats_vs_dogs', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/cats_vs_dogs-4.0.1.png
This dataset was created by DanielTxz
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
There is no story behind this dataset; I just felt that I should also have a dataset 😬.
The dataset contains top views of dice digits, which can be used as an alternative to the MNIST dataset for digit recognition, a benchmark task for classification.
There are currently only 120 images; attempts to augment the data have already been made through the TensorFlow data augmentation pipeline, which increased the dataset to about 1,600 images (with random rotations, crops, and other operations).
For the small dataset that we have here, the images were made from just two dice. The dice images are resized to be similar to the MNIST images so that results can be tested on already-available models.
The images currently in the dataset are named as follows: {number}_{color of the dice**}_{transform angle}_{transformation direction*} (a label-parsing sketch follows the footnotes below).
My aim is for the dataset to be big enough not to cause overfitting, and diverse enough that any model trained on it is accurate.
Although augmentation is a way to increase the dataset size, original images are preferred for their variability across the many variables I might have neglected in my analysis.
*if the direction is necessary, it is mentioned
** Although the images are converted to grayscale, the color of the dice might be a feature that is required for some other analysis.
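A minimal sketch for recovering the digit label from the naming convention above; the example filename is made up, and only the leading {number} token is relied on since the direction field may be absent:

from pathlib import Path

def label_from_filename(path):
    # '3_red_90_cw.png' -> 3 (first underscore-separated token is the digit)
    return int(Path(path).stem.split('_')[0])

print(label_from_filename('3_red_90_cw.png'))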
No one in particular comes to mind, because each and every picture in this small dataset was manually edited by me, although I would like to help.
The question I have is whether this dataset can be used for image classification. My take on this problem: GitHub Implementation.
The notMNIST dataset was created from some publicly available fonts by extracting glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A-J taken from different fonts.
Judging by the examples, one would expect this to be a harder task than MNIST. This seems to be the case -- logistic regression on top of a stacked auto-encoder with fine-tuning gets about 89% accuracy, whereas the same approach gets about 98% on MNIST. The dataset consists of a small hand-cleaned part, about 19k instances, and a large uncleaned part, 500k instances. The two parts have approximately 0.5% and 6.5% label error rates. This was estimated by looking through glyphs and counting how often my guess of the letter didn't match its unicode value in the font file.
This dataset is used extensively in the Udacity Deep Learning course, and is available in the TensorFlow GitHub repo (under Examples). I'm not aware of any license governing the use of this data, so I'm posting it here so that the community can use it with Kaggle kernels.
notMNIST_large is a large but dirty version of the dataset with 529,119 images, and notMNIST_small is a small hand-cleaned version with 18,726 images. The dataset was assembled by Yaroslav Bulatov and can be obtained on his blog.
The two archives each contain 28x28 grayscale images of letters A-J, organized into directories by letter. notMNIST_large contains 529,119 images and notMNIST_small contains 18,726 images.
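Because the images are organized into one directory per letter, they can be loaded with a standard directory-based image loader. A minimal sketch, assuming the archive has been extracted to a local notMNIST_small/ folder:

import tensorflow as tf

ds = tf.keras.utils.image_dataset_from_directory(
    'notMNIST_small',
    color_mode='grayscale',
    image_size=(28, 28),
    batch_size=32)
print(ds.class_names)  # ['A', 'B', ..., 'J']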
Thanks to Yaroslav Bulatov for putting together the dataset. http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains practical lab tasks for the Deep Learning Lab course, focusing on key deep learning concepts using NumPy, TensorFlow, Scikit-learn, and Matplotlib.
Lab1/ - Overview of TensorFlow and Its Operations.
Lab2/ - Implementation of Activation Functions.
Lab3/ - Implementation of Linear Regression Using Scikit-Learn.
Lab4/ - Implementation of Single Layer Perceptron with Optimizer.
Lab5/ - Implementing Multi-Layer Neural Network Using Keras.
Lab6/ - Housing Price Prediction Using Keras.
Lab7/ - Open Ended Lab.
Lab8/ - Classifications Using Neural Networks.
Lab9/ - Introduction to Image Processing.
Lab10/ - MNIST Digit Classification with CNN.
Lab11/ - Understanding Recurrent Neural Networks (RNNs).
Lab12/ - Overfitting vs Underfitting Visualization with Synthesized Dataset.
Lab13/ - Introduction to GANs using CelebA Dataset (PNG Images).
Lab14/ - Open Ended Lab.
The purpose of this dataset is to experiment with deep learning frameworks and improve understanding of computational operations using real-world code examples.
Open the Lab1/ folder to access the Jupyter Notebook.