EMNIST (extended MNIST) has 4 times more data than MNIST. It is a set of handwritten digits with a 28 x 28 format.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
EMNIST Letters Dataset
Authors
Gregory Cohen, Saeed Afshar, Jonathan Tapson, Andre van Schaik
The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia 2751. Email: g.cohen@westernsydney.edu.au
What is it?
The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 and converted to a 28x28 pixel image format and dataset structure that… See the full description on the dataset page: https://huggingface.co/datasets/Royc30ne/emnist-letters.
Attribution-NoDerivs 4.0 (CC BY-ND 4.0) https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 (https://www.nist.gov/srd/nist-special-database-19) and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset (http://yann.lecun.com/exdb/mnist/). Further information on the dataset contents and conversion process can be found in the paper available at https://arxiv.org/abs/1702.05373v2
The MNIST dataset has become a standard benchmark for learning, classification and computer vision systems. Contributing to its widespread adoption are the understandable and intuitive nature of the task, its relatively small size and storage requirements, and the accessibility and ease of use of the database itself. The MNIST database was derived from a larger dataset known as the NIST Special Database 19, which contains digits and uppercase and lowercase handwritten letters. This paper introduces a variant of the full NIST dataset, which we have called Extended MNIST (EMNIST), which follows the same conversion paradigm used to create the MNIST dataset. The result is a set of datasets that constitute more challenging classification tasks involving letters and digits, and that share the same image structure and parameters as the original MNIST task, allowing for direct compatibility with all existing classifiers and systems. Benchmark results are presented along with a validation of the conversion process through the comparison of the classification results on converted NIST digits and the MNIST digits.
The database is made available in original MNIST format and Matlab format.
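As a rough illustration of what "original MNIST format" means in practice, the sketch below parses the IDX layout that MNIST files use: a 4-byte header (two zero bytes, a type code, the number of dimensions), followed by one big-endian 32-bit size per dimension and then the raw values. It handles only unsigned-byte data and runs on a synthetic byte string rather than a real EMNIST file; the function name `parse_idx` is illustrative, and real files may need to be decompressed (for example with `gzip`) before parsing.

```python
import struct


def parse_idx(data: bytes):
    """Parse unsigned-byte IDX data into (dims, flat list of values)."""
    # Header: two zero bytes, one type-code byte, one dimension-count byte.
    zero, type_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("this sketch only handles unsigned-byte IDX data")

    # One big-endian unsigned 32-bit integer per dimension.
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    offset = 4 + 4 * ndim

    # The payload holds the product of all dimension sizes, one byte each.
    count = 1
    for d in dims:
        count *= d
    payload = data[offset:offset + count]
    if len(payload) != count:
        raise ValueError("IDX payload is shorter than the header promises")
    return dims, list(payload)


# Synthetic stand-in for an image file: 2 images of 28 x 28 pixels.
header = struct.pack(">HBBIII", 0, 0x08, 3, 2, 28, 28)
blob = header + bytes([7]) + bytes(2 * 28 * 28 - 1)

dims, pixels = parse_idx(blob)
print(dims, len(pixels), pixels[0])
```

Label files follow the same layout with a single dimension, so the same function can read them.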
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is derived from the Leaf repository (https://github.com/TalwalkarLab/leaf) pre-processing of the Extended MNIST dataset, grouping examples by writer. Details about Leaf were published in "LEAF: A Benchmark for Federated Settings" https://arxiv.org/abs/1812.01097
Note: This dataset does not include some additional preprocessing that MNIST includes, such as size-normalization and centering. In the Federated EMNIST data, the value of 1.0 corresponds to the background, and 0.0 corresponds to the color of the digits themselves; this is the inverse of some MNIST representations, e.g. in tensorflow_datasets, where 0 corresponds to the background color, and 255 represents the color of the digit.
Data set sizes:
only_digits=True: 3,383 users, 10 label classes
  train: 341,873 examples
  test: 40,832 examples
only_digits=False: 3,400 users, 62 label classes
  train: 671,585 examples
  test: 77,483 examples
Rather than holding out specific users, each user's examples are split across train and test so that all users have at least one example in train and one example in test. Writers that had less than 2 examples are excluded from the data set.
The tf.data.Datasets returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys and values, in lexicographic order by key:
'label': a tf.Tensor with dtype=tf.int32 and shape [1], the class label of the corresponding pixels. Labels [0-9] correspond to the digits classes, labels [10-35] correspond to the uppercase classes (e.g., label 11 is 'B'), and labels [36-61] correspond to the lowercase classes (e.g., label 37 is 'b').
'pixels': a tf.Tensor with dtype=tf.float32 and shape [28, 28], containing the pixels of the handwritten digit, with values in the range [0.0, 1.0].
Args
only_digits: (Optional) whether to only include examples that are from the digits [0-9] classes. If False, includes lower and upper case characters, for a total of 62 class labels.
cache_dir: (Optional) directory to cache the downloaded file. If None, caches in Keras' default cache directory.
Returns
Tuple of (train, test) where the tuple elements are tff.simulation.datasets.ClientData objects.
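The label ranges and the inverted pixel convention described above can be sketched in plain Python. This is a minimal illustration, not part of TensorFlow Federated: the helper names `label_to_char` and `to_mnist_style` are hypothetical, the character table assumes the classes within each range are in alphabetical order (consistent with the examples 11 = 'B' and 37 = 'b'), and the pixel input is a nested list of floats rather than a tf.Tensor.

```python
import string

# Assumed class table: labels 0-9 are digits, 10-35 uppercase, 36-61 lowercase.
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def label_to_char(label: int) -> str:
    """Map a Federated EMNIST class label (0-61) to its character."""
    if not 0 <= label < len(CLASSES):
        raise ValueError(f"label must be in [0, {len(CLASSES) - 1}], got {label}")
    return CLASSES[label]


def to_mnist_style(pixels):
    """Convert floats in [0.0, 1.0] with a 1.0 background into
    integers in 0..255 with a 0 background."""
    return [[round((1.0 - value) * 255) for value in row] for row in pixels]


print(label_to_char(11), label_to_char(37))
print(to_mnist_style([[1.0, 0.0]]))
```

Inverting the pixels this way only aligns the intensity convention; it does not add the size-normalization and centering that the note says are missing.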
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
emnist_large.tar.gz contains the EMNIST-DA dataset, consisting of 13 shifted versions of the 47-class extended-MNIST (EMNIST) dataset. As many methods achieve very good performance on MNIST datasets, this dataset was created to be more challenging, with 47 classes and some difficult (measurement) shifts.
Source code to generate the dataset is available at https://github.com/cianeastwood/bufr/blob/main/data/emnist.py.
Famous extended MNIST dataset. The original dataset is available at https://www.nist.gov/itl/iad/image-group/emnist-dataset. This dataset was "re-created" to provide the images in JPEG format for individuals who might want to manipulate the images without having to convert them from their original IDX3 format. Original Authors: Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre van Schaik
Contains 28x28 grayscale images of handwritten characters, as well as a CSV file providing each image's path and respective label. The characters are grouped in directories with the same name as their label.
This dataset was created by Mohammed Sadiq
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Kevin Min Seong Park
Released under MIT
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by syed khadir ahmed
Released under CC0: Public Domain
Dataset Card for "emnist-mnist"
Dataset Information
The emnist-mnist dataset is a set of images of handwritten digits. The dataset is split into a training set and a test set.
Data Fields
image: The image of the handwritten digit. The data type of this field is image. label: The label of the handwritten digit. The data type of this field is class_label, and it can take on the values '0' to '9'.
Data Splits
train: The training set consists of 60000… See the full description on the dataset page: https://huggingface.co/datasets/tanganke/emnist_mnist.
This dataset was created by Om Duggineni
Dataset Card for "emnist-letters"
Dataset Information
The emnist-letters dataset is a set of images of handwritten letters. The dataset is split into a training set and a test set.
Data Fields
image: The image of the handwritten letter. The data type of this field is image. label: The label of the handwritten letter. The data type of this field is class_label, and it can take on the values 'A' to 'Z'.
Data Splits
train: The training set consists… See the full description on the dataset page: https://huggingface.co/datasets/tanganke/emnist_letters.
This dataset was created by Ankur Goswami
This dataset was created by Олексій Чорний
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the different outputs generated for the paper "Blockchain-enabled Server-less Federated Learning", submitted to Computer Networks (COMNET) journal. Two types of outputs are provided:
Blockchain queue simulator (output_queue_simulator): results of the simulations done in the batch-service queue simulator (https://github.com/fwilhelmi/batch_service_queue_simulator) to characterize the queue latency of blockchain applications.
Tensorflow (output_tensorflow): results of the simulations done in Tensorflow Federated (TFF), resulting from the application of different models to the federated EMNIST dataset.
Each folder also includes the scripts used to execute the corresponding simulations. For more details, see the repository at https://github.com/fwilhelmi/blockchain_enabled_federated_learning
This dataset was created by Олексій Чорний
This dataset was created by Dmytro Potapov
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Yuvraj Joshi 1110
Released under MIT
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 101,354 labeled, segmented character images from Telerik RadCaptcha, each with a resolution of 40x50 pixels. The characters were segmented from 3,000 manually labeled Telerik RadCaptcha images featuring 5-character alphanumeric strings and were resized to a fixed resolution of 40x50 pixels. To create a diverse dataset, we applied image augmentation, using various rotations and translations of the resized character images, to improve generalization. This dataset is suited to CAPTCHA text recognition and optical character recognition (OCR). It is designed to support various deep learning and machine learning tasks, including text extraction and model robustness evaluation, and it provides a diverse set of CAPTCHA samples for training, fine-tuning and testing CAPTCHA recognition systems. The dataset is very similar to the MNIST and EMNIST datasets.