100+ datasets found
  1. DIDA: The largest historical handwritten digit dataset with 250k digits

    • ieee-dataport.org
    Updated Jan 6, 2022
    Cite
    Amir Yavariabdi (2022). DIDA: The largest historical handwritten digit dataset with 250k digits [Dataset]. https://ieee-dataport.org/documents/dida-largest-historical-handwritten-digit-dataset-250k-digits
    Explore at:
    Dataset updated
    Jan 6, 2022
    Authors
    Amir Yavariabdi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    250

  2. mnist

    • tensorflow.org
    • universe.roboflow.com
    • +3 more
    Updated Jun 1, 2024
    Cite
    (2024). mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/mnist
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    The MNIST database of handwritten digits.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('mnist', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png

  3. MNIST database of handwritten digits - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    + more versions
    Cite
    (2024). MNIST database of handwritten digits - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/mnist-database-of-handwritten-digits
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The MNIST handwritten digit database is a dataset of 60,000 training and 10,000 test examples of handwritten digit images.

  4. Persian Handwritten Digits Dataset Dataset

    • paperswithcode.com
    Updated Oct 10, 2024
    + more versions
    Cite
    (2024). Persian Handwritten Digits Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/persian-handwritten-digits-dataset
    Explore at:
    Dataset updated
    Oct 10, 2024
    Description

    Discover the rich and intricate patterns of Persian handwritten digits with this extensive dataset, curated as a resource for AI and machine learning applications. The collection comprises 150,000 high-resolution images, each generated to represent the full spectrum of Persian digits from 0 to 9. Leveraging advanced Generative Adversarial Networks (GANs), these images capture the subtle nuances of Persian handwriting, offering both diversity and authenticity in each digit's representation.

    Dataset Highlights

    Total Images: 150,000 high-quality images, offering a substantial dataset for robust model training and testing.

    Class Distribution: 15,000 images per digit class (0-9), ensuring a balanced and representative dataset across all Persian numerals.

    Image Resolution: Each image is rendered at a resolution of 28×28 pixels, ideal for a variety of machine learning tasks.

    Authenticity: Images are created using state-of-the-art GANs, ensuring that each digit closely mimics real Persian handwriting with realistic variations.

    Versatility: This dataset is perfectly suited for a wide range of applications, including but not limited to:

    Digit recognition

    Generative modeling

    Handwriting analysis

    OCR (Optical Character Recognition) systems development

    Cultural studies in Persian script

    Extended Features

    Diverse Handwriting Styles: The dataset captures a variety of handwriting styles, from bold and precise to delicate and ornate, reflecting the cultural and individual diversity in Persian script.

    Generative Learning Applications: The GAN-generated images serve as an excellent resource for those interested in exploring generative learning models, providing a rich dataset for experiments in generating or augmenting handwritten text.

    Pre-Processed & Ready-to-Use: The dataset is pre-processed and ready for immediate use, eliminating the need for additional formatting or resizing, saving valuable time and effort in your AI projects.

    Cultural and Linguistic Value: Beyond its technical applications, the dataset holds cultural significance, offering insights into Persian script, which can be valuable for linguists and cultural historians alike.

    Applications

    This dataset is an essential tool for:

    Researchers: Dive into the complexities of Persian script and explore novel approaches in digit recognition or handwriting analysis.

    Data Scientists: Enhance your machine learning models with a dataset that offers both quality and quantity, ensuring accurate and reliable outcomes.

    AI Developers: Build and refine OCR systems tailored specifically for Persian script, or explore the potential of GANs in generating handwritten text.

    Educators and Students: Utilize this dataset in academic settings to teach and learn about handwriting recognition, machine learning, and cultural studies.

    Conclusion

    Whether you're pioneering new frontiers in digit recognition, developing advanced OCR systems, or simply exploring the intersection of AI and cultural studies, the Persian Handwritten Digits dataset offers a wealth of possibilities. Its balanced, high-quality images and extensive class representation make it a valuable resource for anyone working with Persian script or interested in the broader field of handwritten digit recognition.

    This dataset is sourced from Kaggle.

  5. Image classification in Galaxy with MNIST handwritten digits dataset

    • data.niaid.nih.gov
    Updated Aug 4, 2022
    Cite
    Kaivan Kamali (2022). Image classification in Galaxy with MNIST handwritten digits dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4697905
    Explore at:
    Dataset updated
    Aug 4, 2022
    Dataset authored and provided by
    Kaivan Kamali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Credit: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998

    This is a subset of the MNIST handwritten digits dataset (http://yann.lecun.com/exdb/mnist/). The training data is composed of 12,000 images of digits 0 to 9. The test data is composed of 6,000 images of digits 0 to 9. (The original dataset has 60,000 training and 10,000 testing images. We are using a subset for a Galaxy tutorial, so the training is not too computationally intensive.) Images are grayscale and 28 by 28 pixels. Each pixel has a value between 0 and 255 (0 for black, 255 for white, and all other values for different shades of gray).
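
    The pixel range described above (0 to 255) is commonly rescaled to [0, 1] before training. The following is a minimal sketch, assuming NumPy is available; the function name `normalize_pixels` and the random stand-in batch are illustrative and not part of the dataset or the Galaxy tutorial.

```python
import numpy as np


def normalize_pixels(images: np.ndarray) -> np.ndarray:
    """Map uint8 grayscale values in [0, 255] to float32 values in [0.0, 1.0]."""
    return images.astype(np.float32) / 255.0


# Synthetic stand-in for a batch of 28x28 grayscale digits (not real MNIST data).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

scaled = normalize_pixels(batch)
print(scaled.shape, scaled.dtype)
```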

  6. MNIST dataset for handwritten digits - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    + more versions
    Cite
    (2024). MNIST dataset for handwritten digits - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/mnist-dataset-for-handwritten-digits
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The MNIST dataset is a collection of images of handwritten digits, with size n = 70,000 and D = 784.
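
    The dimension D = 784 is consistent with flattening 28 by 28 pixel images, as the other MNIST listings on this page describe them. A minimal sketch of that reshaping, assuming NumPy and a hypothetical helper `flatten_images` (the zero-filled array is a placeholder, not real data):

```python
import numpy as np


def flatten_images(images: np.ndarray) -> np.ndarray:
    """Reshape (n, height, width) images into an (n, height * width) matrix."""
    n, height, width = images.shape
    return images.reshape(n, height * width)


# Placeholder batch of five blank 28x28 images.
images = np.zeros((5, 28, 28), dtype=np.uint8)
X = flatten_images(images)
print(X.shape)  # (5, 784)
```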

  7. Semeion Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1 more
    Updated Sep 6, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). Semeion Dataset [Dataset]. https://paperswithcode.com/dataset/semeion
    Explore at:
    Dataset updated
    Sep 6, 2022
    Description

    1593 handwritten digits from around 80 persons were scanned, stretched in a rectangular box 16x16 in a gray scale of 256 values.

    The dataset was created by Tactile Srl, Brescia, Italy (http://www.tattile.it) and donated in 1994 to Semeion Research Center of Sciences of Communication, Rome, Italy (http://www.semeion.it), for machine learning research.

    For any questions, e-mail Massimo Buscema (m.buscema '@' semeion.it) or Stefano Terzi (s.terzi '@' semeion.it)

    Data Set Information: 1593 handwritten digits from around 80 persons were scanned, stretched in a rectangular box 16x16 in a gray scale of 256 values. Then each pixel of each image was scaled into a boolean (1/0) value using a fixed threshold.

    Each person wrote on a paper all the digits from 0 to 9, twice. The commitment was to write the digit the first time in the normal way (trying to write each digit accurately) and the second time in a fast way (with no accuracy).

    The best validation protocol for this dataset seems to be a 5x2CV, 50% Tune (Train +Test), and completely blind 50% Validation

    Attribute Information: This dataset consists of 1593 records (rows) and 256 attributes (columns). Each record represents a handwritten digit, originally scanned with a resolution of 256 gray levels (2^8). Each pixel of each original scanned image was first stretched and then scaled to 0 or 1: set to 0 for every pixel whose gray-scale value was 127 or below, and set to 1 for every pixel whose original gray-scale value was over 127.

    Finally, each binary image was scaled again into a 16x16 square box (the final 256 binary attributes).
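
    The fixed-threshold step described above can be sketched as follows. This assumes NumPy and covers only the thresholding and the flattening of a 16x16 image into 256 attributes; the stretching and rescaling steps are not reproduced, and the name `binarize` is illustrative.

```python
import numpy as np

THRESHOLD = 127  # per the description: values <= 127 map to 0, values > 127 map to 1


def binarize(gray: np.ndarray, threshold: int = THRESHOLD) -> np.ndarray:
    """Turn a grayscale image (0-255) into a 0/1 image using a fixed threshold."""
    return (gray > threshold).astype(np.uint8)


# Boundary check on a tiny example: 127 is included in the "0" side.
sample = np.array([[0, 127], [128, 255]], dtype=np.uint8)
binary = binarize(sample)
print(binary.tolist())  # [[0, 0], [1, 1]]

# A 16x16 binary image flattens to the 256 attributes of one record.
record = binarize(np.full((16, 16), 200, dtype=np.uint8)).reshape(-1)
print(record.shape)  # (256,)
```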

  8. 400k Augmented MNIST: Extended Handwritten Digits

    • kaggle.com
    Updated Mar 26, 2025
    Cite
    Alexandre Le Mercier (2025). 400k Augmented MNIST: Extended Handwritten Digits [Dataset]. http://doi.org/10.34740/kaggle/ds/6967763
    Explore at:
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alexandre Le Mercier
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview

    The 400k Augmented MNIST dataset is an extended version of the classic MNIST handwritten digits dataset. By applying a variety of augmentation techniques, I have increased the number of training images to 400,000 - roughly 40,000 per digit label. This large and diverse training set is designed to significantly improve the robustness and generalization of models trained on it, making them less susceptible to overfitting and more resilient against adversarial perturbations.

    Dataset Structure

    The dataset is organized into two main directories:

    • Augmented MNIST Training Set (400k):
      This directory contains 10 subdirectories, one for each digit label ("Label 0" through "Label 9"). Each subdirectory holds the corresponding JPEG images generated via augmentation. These images have been produced using techniques such as random rotation, shear, translation, scaling, reflection, spatial padding, Ben Graham transformation, Gaussian noise, salt-and-pepper noise, and random text overlay.
    • MNIST Validation Set (4k):
      This directory also contains subdirectories "Label 0" to "Label 9". However, the validation set consists solely of the original MNIST images (approximately 400 per label) that were not used for augmentation. This allows you to evaluate model performance on natural, unaltered digit images, providing a clear benchmark for generalization.

    How to Use This Dataset

    1. Training:
      Use the augmented training set to train your deep learning models. The 400k images offer a wide variety of conditions, helping your model learn robust features that generalize well.
    2. Validation:
      Evaluate your models on the validation set, which contains only the original MNIST images. This will help you measure performance on “natural” digits, ensuring that improvements in robustness do not come at the expense of real-world accuracy.
    3. Flexibility:
      You can also experiment with mixed training (combining augmented and original images) to study how different training strategies affect model robustness and accuracy.

    Augmentation Techniques Applied

    The following augmentation functions were used to generate the extended dataset:

    • Random Rotation: Randomly rotates images within a specified angle range.
    • Random Shear: Applies slight shearing transformations.
    • Random Translation: Shifts images horizontally and vertically.
    • Random Scale: Zooms in or out on the images.
    • Ben Graham Transform: Enhances image contrast and clarity using a weighted Gaussian blur.
    • Random Gaussian Noise: Adds Gaussian noise to simulate sensor or environmental disturbances.
    • Random Salt-and-Pepper Noise: Introduces random pixel-level corruption.

    A random number of transformations (between 1 and 6, in a random order) is applied to each image, with the goal of creating a diverse and challenging training set.
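
    The "random number of transformations in a random order" scheme can be sketched as below. This is not the author's code: it assumes NumPy, implements only three simplified stand-ins (translation, Gaussian noise, salt-and-pepper noise), and caps the count at the number of available transforms. All function names and parameter values are hypothetical.

```python
import numpy as np


def random_translation(img, rng, max_shift=2):
    """Shift the image by a few pixels; np.roll wraps around at the borders."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(img, shift=(int(dy), int(dx)), axis=(0, 1))


def gaussian_noise(img, rng, sigma=10.0):
    """Add zero-mean Gaussian noise and clip back to the 0-255 range."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def salt_and_pepper(img, rng, amount=0.02):
    """Set a small random fraction of pixels to black or white."""
    out = img.copy()
    mask = rng.random(img.shape)
    out[mask < amount / 2] = 0
    out[mask > 1 - amount / 2] = 255
    return out


TRANSFORMS = [random_translation, gaussian_noise, salt_and_pepper]


def augment(img, rng, max_transforms=6):
    """Apply between 1 and max_transforms distinct transforms in random order."""
    upper = min(max_transforms, len(TRANSFORMS))
    k = int(rng.integers(1, upper + 1))
    order = rng.permutation(len(TRANSFORMS))[:k]
    out = img
    for idx in order:
        out = TRANSFORMS[idx](out, rng)
    return out


rng = np.random.default_rng(42)
digit = np.zeros((28, 28), dtype=np.uint8)
digit[8:20, 12:16] = 255  # crude vertical stroke as a stand-in for a digit
augmented = augment(digit, rng)
print(augmented.shape, augmented.dtype)
```

    Sampling the order with a permutation keeps each transform from being applied twice to the same image; whether the original pipeline allows repeats is not stated in the description.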

    Citation

    If you use this dataset in your research, please cite it as follows:

    @misc{alexandre_le_mercier_2025,
      title={400k Augmented MNIST: Extended Handwritten Digits},
      url={https://www.kaggle.com/ds/6967763},
      DOI={10.34740/KAGGLE/DS/6967763},
      publisher={Kaggle},
      author={Alexandre Le Mercier},
      year={2025}
    }
    

    License

    This dataset is under the Apache 2.0 license.

    Contact

    For any questions or issues regarding this dataset, please send a message in the "Discussions" or "Suggestions" sections of the Kaggle dataset page.

    Good luck and happy coding! 🚀

  9. Handwritten digits dataset

    • kaggle.com
    Updated May 24, 2024
    Cite
    Esraa Taher (2024). Handwritten digits dataset [Dataset]. https://www.kaggle.com/datasets/esraatahersaad/handwritten-digits-dataset/data
    Explore at:
    Dataset updated
    May 24, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Esraa Taher
    Description

    Dataset

    This dataset was created by Esraa Taher

    Contents

  10. Arabic Handwritten Digits Dataset

    • figshare.com
    bin
    Updated May 31, 2023
    Cite
    Mohamed Loey (2023). Arabic Handwritten Digits Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.12236948.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Mohamed Loey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Arabic Handwritten Digits Dataset

    Abstract

    In recent years, handwritten digit recognition has been an important area due to its applications in several fields. This work focuses on the recognition part of handwritten Arabic digit recognition, which faces several challenges, including the unlimited variation in human handwriting and the large public databases. The paper provides a deep learning technique that can be effectively applied to recognizing Arabic handwritten digits. LeNet-5, a Convolutional Neural Network (CNN), was trained and tested on the MADBase database (Arabic handwritten digit images), which contains 60,000 training and 10,000 testing images. A comparison is made among the results, and it is shown that the use of the CNN led to significant improvements across different machine-learning classification algorithms. Moreover, the CNN gives an average recognition accuracy of 99.15%.

    Context

    The motivation of this study is to use cross knowledge learned from multiple works to enhance the performance of Arabic handwritten digit recognition. In recent years, Arabic handwritten digit recognition has also had to deal with different handwriting styles, making it important to find and work on a new and advanced solution for handwriting recognition. A deep learning system needs a large amount of data (images) to be able to make good decisions.

    Content

    MADBase is a modified Arabic handwritten digits database that contains 60,000 training images and 10,000 test images. MADBase was written by 700 writers. Each writer wrote each digit (from 0 to 9) ten times. To ensure that different writing styles are included, the database was gathered from different institutions: Colleges of Engineering and Law, School of Medicine, the Open University (whose students span a wide range of ages), a high school, and a governmental institution. MADBase is available for free and can be downloaded from http://datacenter.aucegypt.edu/shazeem/.

    Acknowledgements

    CNN for Handwritten Arabic Digits Recognition Based on LeNet-5
    http://link.springer.com/chapter/10.1007/978-3-319-48308-5_54
    Ahmed El-Sawy, Hazem El-Bakry, Mohamed Loey
    Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016
    Volume 533 of the series Advances in Intelligent Systems and Computing, pp 566-575

    Inspiration

    Creating the proposed database presents more challenges because it deals with many issues such as style of writing, thickness, dots number and position. Some characters have different shapes while written in the same position. For example, the teh character has different shapes in isolated position.

    Arabic Handwritten Characters Dataset: https://www.kaggle.com/mloey1/ahcd1
    Benha University: http://bu.edu.eg/staff/mloey
    https://mloey.github.io/

  11. Data from: Written and spoken digits database for multimodal learning

    • zenodo.org
    • explore.openaire.eu
    • +1 more
    bin
    Updated Jan 21, 2021
    + more versions
    Cite
    Lyes Khacef; Laurent Rodriguez; Benoit Miramond (2021). Written and spoken digits database for multimodal learning [Dataset]. http://doi.org/10.5281/zenodo.4452953
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 21, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lyes Khacef; Laurent Rodriguez; Benoit Miramond
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database description:

    The written and spoken digits database is not a new database but a constructed database from existing ones, in order to provide a ready-to-use database for multimodal fusion [1].

    The written digits database is the original MNIST handwritten digits database [2] with no additional processing. It consists of 70000 images (60000 for training and 10000 for test) of 28 x 28 = 784 dimensions.

    The spoken digits database was extracted from Google Speech Commands [3], an audio dataset of spoken words that was proposed to train and evaluate keyword spotting systems. It consists of 105829 utterances of 35 words, amongst which 38908 utterances of the ten digits (34801 for training and 4107 for test). A pre-processing was done via the extraction of the Mel Frequency Cepstral Coefficients (MFCC) with a framing window size of 50 ms and frame shift size of 25 ms. Since the speech samples are approximately 1 s long, we end up with 39 time slots. For each one, we extract 12 MFCC coefficients with an additional energy coefficient. Thus, we have a final vector of 39 x 13 = 507 dimensions. Standardization and normalization were applied on the MFCC features.
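
    The frame arithmetic above can be checked with a short calculation. This sketch assumes a clip of exactly 1000 ms and no padding of partial windows (the description only says the samples are approximately 1 s long); `num_frames` is an illustrative helper, not part of the dataset's tooling.

```python
def num_frames(duration_ms: int, window_ms: int, shift_ms: int) -> int:
    """Count the full analysis windows that fit in a clip of the given duration."""
    if duration_ms < window_ms:
        return 0
    return (duration_ms - window_ms) // shift_ms + 1


frames = num_frames(duration_ms=1000, window_ms=50, shift_ms=25)
coeffs_per_frame = 12 + 1  # 12 MFCCs plus one energy coefficient
feature_dim = frames * coeffs_per_frame
print(frames, feature_dim)  # 39 507
```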

    To construct the multimodal digits dataset, we associated written and spoken digits of the same class respecting the initial partitioning in [2] and [3] for the training and test subsets. Since we have fewer samples for the spoken digits, we duplicated some random samples to match the number of written digits and have a multimodal digits database of 70000 samples (60000 for training and 10000 for test).

    The dataset is provided in six files as described below. Therefore, if a shuffle is performed on the training or test subsets, it must be performed in unison with the same order for the written digits, spoken digits and labels.

    Files:

    • data_wr_train.npy: 60000 samples of 784-dimensional written digits for training;
    • data_sp_train.npy: 60000 samples of 507-dimensional spoken digits for training;
    • labels_train.npy: 60000 labels for the training subset;
    • data_wr_test.npy: 10000 samples of 784-dimensional written digits for test;
    • data_sp_test.npy: 10000 samples of 507-dimensional spoken digits for test;
    • labels_test.npy: 10000 labels for the test subset.
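
    Shuffling the three arrays in unison, as required above, amounts to indexing all of them with one shared permutation. A minimal sketch, assuming NumPy; the small synthetic arrays stand in for the real files (which would be read with `np.load`, e.g. `np.load("data_wr_train.npy")`), and `shuffle_in_unison` is a hypothetical helper.

```python
import numpy as np


def shuffle_in_unison(written, spoken, labels, seed=0):
    """Reorder all three arrays with the same random permutation."""
    if not (len(written) == len(spoken) == len(labels)):
        raise ValueError("written, spoken and labels must have the same length")
    perm = np.random.default_rng(seed).permutation(len(labels))
    return written[perm], spoken[perm], labels[perm]


# Synthetic stand-ins: every row is filled with its own label so alignment is easy to check.
n = 6
labels = np.arange(n) % 10
written = np.repeat(labels[:, None], 784, axis=1)
spoken = np.repeat(labels[:, None], 507, axis=1)

w, s, y = shuffle_in_unison(written, spoken, labels)
print(np.array_equal(w[:, 0], y), np.array_equal(s[:, 0], y))  # True True
```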

    References:

    1. Khacef, L. et al. (2020), "Brain-Inspired Self-Organization with Cellular Neuromorphic Computing for Multimodal Unsupervised Learning".
    2. LeCun, Y. & Cortes, C. (1998), “MNIST handwritten digit database”.
    3. Warden, P. (2018), “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition”.
  12. MNIST-Handwritten Digit Recognition Problem

    • kaggle.com
    Updated Dec 18, 2017
    Cite
    dillsunnyb11 (2017). MNIST-Handwritten Digit Recognition Problem [Dataset]. https://www.kaggle.com/dillsunnyb11/digit-recognizer/tasks
    Explore at:
    Dataset updated
    Dec 18, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    dillsunnyb11
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by dillsunnyb11

    Released under Database: Open Database, Contents: Database Contents

    Contents

  13. Handwritten Digits Dataset

    • universe.roboflow.com
    zip
    Updated Apr 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mathocr (2025). Handwritten Digits Dataset [Dataset]. https://universe.roboflow.com/mathocr-jzmyo/handwritten-digits-h27w9/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 21, 2025
    Dataset authored and provided by
    Mathocr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Digits Bounding Boxes
    Description

    Handwritten Digits

    ## Overview
    
    Handwritten Digits is a dataset for object detection tasks - it contains Digits annotations for 1,560 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  14. Printed Digits Dataset Dataset

    • paperswithcode.com
    Updated Apr 2, 2025
    + more versions
    Cite
    (2025). Printed Digits Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/printed-digits-dataset
    Explore at:
    Dataset updated
    Apr 2, 2025
    Description

    The Printed Digits Dataset is a collection of approximately 3,000 grayscale images, specifically curated for numeric digit classification tasks. Originally created with 177 images, this dataset has undergone extensive augmentation to enhance its diversity and utility, making it a suitable resource for machine learning projects such as Sudoku digit recognition.

    Dataset Composition:

    Image Count: The dataset contains around 3,000 images, each representing a single numeric digit from 0 to 9.

    Image Dimensions: Each image is standardized to a 28×28 pixel resolution, maintaining a consistent grayscale format.

    Purpose: This dataset was developed with a specific focus on Sudoku digit classification. Notably, it includes blank images for the digit '0', reflecting the common occurrence of empty cells in Sudoku puzzles.

    Augmentation Details:

    To expand the original dataset from 177 images to 3,000, a variety of data augmentation techniques were applied. These include:

    Rotation: Images were rotated to simulate different orientations of printed digits.

    Scaling: Variations in the size of digits were introduced to mimic real-world printing inconsistencies.

    Translation: Digits were shifted within the image frame to represent slight misalignments often seen in printed text.

    Noise Addition: Gaussian noise was added to simulate varying print quality and scanner imperfections.

    Applications:

    Sudoku Digit Recognition: Given its design, this dataset is particularly well-suited for training models to recognize and classify digits in Sudoku puzzles.

    Handwritten Digit Classification: Although the dataset contains printed digits, it can be adapted and utilized in combination with handwritten digit datasets for broader numeric classification tasks.

    Optical Character Recognition (OCR): This dataset can also be valuable for training OCR systems, especially those aimed at processing low-resolution or small-scale printed text.

    Dataset Quality:

    Uniformity: All images are uniformly scaled and aligned, providing a clean and consistent dataset for model training.

    Diversity: Augmentation has significantly increased the diversity of digit representation, making the dataset robust for training deep learning models.

    Usage Notes:

    Zero Representation: Users should note that the digit '0' is represented by a blank image. This design choice aligns with the specific application of Sudoku puzzle solving but may require adjustments if the dataset is used for other numeric classification tasks.

    Preprocessing Required: While the dataset is ready for use, additional preprocessing steps, such as normalization or further augmentation, can be applied based on the specific requirements of the intended machine learning model.

    File Format:

    The images are stored in a standardized format compatible with most machine learning frameworks, ensuring ease of integration into existing workflows.

    Conclusion: The Printed Digits Dataset offers a rich resource for those working on digit classification projects, particularly within the context of Sudoku or other numeric-based puzzles. Its extensive augmentation and attention to application-specific details make it a valuable asset for both academic research and practical AI development.

    This dataset is sourced from Kaggle.

  15. not-MNIST

    • datasets.activeloop.ai
    • opendatalab.com
    • +2 more
    deeplake
    Updated Mar 11, 2022
    + more versions
    Cite
    Yaroslav Bulatov (2022). not-MNIST [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/not-mnist-dataset/
    Explore at:
    Available download formats: deeplake
    Dataset updated
    Mar 11, 2022
    Authors
    Yaroslav Bulatov
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The not-MNIST dataset is a dataset of handwritten digits. It is a challenging dataset that can be used for machine learning and artificial intelligence research. The dataset consists of 100,000 images of handwritten digits. The images are divided into a training set of 60,000 images and a test set of 40,000 images. The images are drawn from a variety of fonts and styles, making them more challenging than the MNIST dataset. The images are 28x28 pixels in size and are grayscale. The dataset is available under the Creative Commons Zero Public Domain Dedication license.

  16. ARDIS: A Swedish Historical Handwritten Digit Dataset

    • ieee-dataport.org
    Updated Nov 22, 2022
    + more versions
    Cite
    Abbas Cheddad (2022). ARDIS: A Swedish Historical Handwritten Digit Dataset [Dataset]. https://ieee-dataport.org/documents/ardis-swedish-historical-handwritten-digit-dataset
    Explore at:
    Dataset updated
    Nov 22, 2022
    Authors
    Abbas Cheddad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sweden
    Description

    whereas

  17. ICDAR2013 – Handwritten Digit and Digit String Recognition Competition

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 5, 2022
    Cite
    Markus Diem; Stefan Fiel; Angelika Garz; Manuel Keglevic; Florian Kleber; Robert Sablatnig (2022). ICDAR2013 – Handwritten Digit and Digit String Recognition Competition [Dataset]. http://doi.org/10.5281/zenodo.1492173
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 5, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Markus Diem; Stefan Fiel; Angelika Garz; Manuel Keglevic; Florian Kleber; Robert Sablatnig
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The CVL Single Digit dataset consists of 7000 single digits (700 digits per class) written by approximately 60 different writers. The validation set has the same size but different writers. The validation set may be used for parameter estimation and validation but not for supervised training. The CVL Digit Strings dataset uses 10 different digit strings from a total of about 120 writers resulting in 1262 training images. The digits from the CVL Single Digit dataset were extracted from these strings.

    This database may be used for non-commercial research purposes only. If you publish material based on this database, we request that you include a reference to:

    Markus Diem, Stefan Fiel, Angelika Garz, Manuel Keglevic, Florian Kleber and Robert Sablatnig, ICDAR 2013 Competition on Handwritten Digit Recognition (HDRC 2013), In Proc. of the 12th Int. Conference on Document Analysis and Recognition (ICDAR) 2013, pp. 1454-1459, 2013.

  18. Handwritten Digits with Writer Characteristics

    • kaggle.com
    Updated May 12, 2024
    Cite
    Jordi de la Torre UOC (2024). Handwritten Digits with Writer Characteristics [Dataset]. https://www.kaggle.com/datasets/jordidelatorreuoc/handwritten-digits-with-writer-characteristics
    Dataset updated
    May 12, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jordi de la Torre UOC
    Description

    It contains high-resolution images of handwritten digits together with various writer characteristics which are not available in the well-known MNIST database. The multiple writer characteristics gathered are a novelty of our data set and create new research opportunities. The data set is publicly available online. We also analyse this new data set, beginning with simple supervised tasks: we assess the predictability of the writer characteristics gathered, the effect of using some of those characteristics as predictors in classification tasks, and the effect of higher-resolution images on classification accuracy. We also explore semi-supervised applications; we can leverage the large number of handwritten digit data sets already available online to improve the accuracy of various classification tasks with noticeable success. Finally, we demonstrate the generative perspective offered by this new data set; we are able to generate images that mimic the writing style of specific writers. The data set has unique and distinct features, and our analysis establishes benchmarks and showcases some of the new opportunities it makes possible.

  19. Persian Handwritten Digits and symbols Dataset (PHDD)

    • data.mendeley.com
    Updated Mar 10, 2025
    Cite
    Yousef Asgari (2025). Persian Handwritten Digits and symbols Dataset (PHDD) [Dataset]. http://doi.org/10.17632/8993swvpjx.1
    Dataset updated
    Mar 10, 2025
    Authors
    Yousef Asgari
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The dataset structure is designed to include two main directories, "train" and "test." Each of these directories is divided into subfolders, where data is organized based on the type of character (handwritten digits from 0 to 9 and mathematical symbols such as parentheses, brackets, braces, and arithmetic signs). The images, after digitization, have been resized to a standard dimension of 28x28 pixels, and various filters (including Gaussian Blur, Median Blur, and Canny) were applied for noise reduction and enhancement of clarity. This preprocessing ensures data uniformity, enabling direct use in deep learning and computer vision algorithms.
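
    The directory layout described above (a split folder containing one subfolder per character class) can be indexed with a few lines of standard-library Python. This is only a sketch: the class folder names ("0", "plus") and file names used here are hypothetical placeholders, since the description does not give the actual names, and `index_split` is an illustrative helper rather than part of the dataset.

```python
from pathlib import Path
import tempfile


def index_split(split_dir: Path) -> list[tuple[Path, str]]:
    """Return (image_path, class_name) pairs, taking the class from the subfolder name."""
    samples = []
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file():
                samples.append((image_path, class_dir.name))
    return samples


# Build a tiny placeholder tree; folder and file names are assumptions,
# not the real PHDD names.
demo_root = Path(tempfile.mkdtemp())
for split, cls, name in [
    ("train", "0", "a.png"),
    ("train", "plus", "b.png"),
    ("test", "0", "c.png"),
]:
    folder = demo_root / split / cls
    folder.mkdir(parents=True, exist_ok=True)
    (folder / name).touch()

train_samples = index_split(demo_root / "train")
test_samples = index_split(demo_root / "test")
print([(path.name, label) for path, label in train_samples])
```

    Deriving the label from the subfolder name keeps the index independent of any separate label file, which suits a folder-per-class layout.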

  20. Burmese Handwritten Digit Dataset (BHDD) Dataset

    • paperswithcode.com
    Updated May 28, 2025
    Cite
    (2025). Burmese Handwritten Digit Dataset (BHDD) Dataset [Dataset]. https://paperswithcode.com/dataset/burmese-handwritten-digit-dataset-bhdd
    Dataset updated
    May 28, 2025
    Description

    The Burmese Handwritten Digit Dataset (BHDD) is a dataset project specifically created for recognizing handwritten Burmese digits. It is a Burmese version of the MNIST dataset, with a training set of 60,000 examples and a test set of 27,561 examples.

    Overview

    Dataset Statistics:

    • Training Set: 60,000 samples
    • Testing Set: 27,561 samples
    • Number of Classes: 10 (Burmese digits 0–9)

    Data Format:

    • Train Image Shape: (60000, 784)
    • Train Label Shape: (60000, 10)
    • Test Image Shape: (27561, 784)
    • Test Label Shape: (27561, 10)

    The dataset was collected from over 150 individuals of different ages (ranging from high school students to professionals in their 50s) and diverse occupations (including clerks, programmers, and others) to achieve a wide variety of handwriting styles. We then preprocessed the data to mirror the structure and functionality of MNIST.

    Dataset Content

    The dataset consists of:

    • Train Images: 60,000 grayscale images of handwritten Burmese digits, flattened into a 1D array of size 784 (28x28 pixels).
    • Train Labels: One-hot encoded labels corresponding to the digit class.
    • Test Images: 27,561 grayscale images for testing purposes.
    • Test Labels: One-hot encoded labels for testing data.
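
    Because the images are stored flattened (784 values each) and the labels are one-hot encoded (10 values each), a typical first step is to restore the 28x28 grid and recover integer class indices. The following is a minimal NumPy sketch under those stated shapes; it uses synthetic stand-in arrays rather than real BHDD samples, and the helper names `unflatten_images` and `decode_one_hot` are illustrative, not part of the dataset's tooling.

```python
import numpy as np

IMAGE_SIDE = 28   # 28 x 28 = 784 pixels per flattened image
NUM_CLASSES = 10  # Burmese digits 0-9


def unflatten_images(flat_images: np.ndarray) -> np.ndarray:
    """Reshape (n, 784) flattened images into an (n, 28, 28) array."""
    if flat_images.ndim != 2 or flat_images.shape[1] != IMAGE_SIDE * IMAGE_SIDE:
        raise ValueError(f"expected shape (n, 784), got {flat_images.shape}")
    return flat_images.reshape(-1, IMAGE_SIDE, IMAGE_SIDE)


def decode_one_hot(one_hot_labels: np.ndarray) -> np.ndarray:
    """Convert (n, 10) one-hot rows into integer class indices of shape (n,)."""
    if one_hot_labels.ndim != 2 or one_hot_labels.shape[1] != NUM_CLASSES:
        raise ValueError(f"expected shape (n, 10), got {one_hot_labels.shape}")
    return one_hot_labels.argmax(axis=1)


# Synthetic stand-in data with the documented layout (not real BHDD samples).
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(5, 784), dtype=np.uint8)
demo_classes = np.array([3, 0, 9, 1, 7])
demo_labels = np.eye(NUM_CLASSES, dtype=np.uint8)[demo_classes]

images_2d = unflatten_images(demo_images)
classes = decode_one_hot(demo_labels)
print(images_2d.shape)
print(classes.tolist())
```

    The same two helpers would apply unchanged to arrays of shape (60000, 784) and (60000, 10), or (27561, 784) and (27561, 10), assuming the files are loaded into NumPy arrays first.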

    Contribution

    We encourage the ML/DL community to contribute by:

    • Creating digit recognizers.
    • Benchmarking with different models and algorithms.
    • Writing tutorials and sharing findings.

    Citation

    If you use the BHDD dataset in your work, please cite this repository:

    @dataset{bhdd,
      author = {Expa.AI Research Team},
      title = {Burmese Handwritten Digit Dataset (BHDD)},
      year = {2019},
      url = {https://github.com/baseresearch/BHDD}
    }

    Acknowledgments

    This dataset would not have been possible without:

    • The efforts of the Expa.AI Research Team.
    • Volunteers and interns from Taungoo Computer University who contributed handwriting samples.
    • High school students from St.Augustine / B.E.H.S (2) Kamayut.
    • Friends and family members of the Expa.AI Research Team.
    • The community’s ongoing support and interest in ML/DL for the Burmese language.

    License

    This dataset is released under the LGPL-3.0 license. Please see the LICENSE file for more details.
