6 datasets found
  1. Mnist Dataset

    • universe.roboflow.com
    • tensorflow.org
    • +5 more
    zip
    Updated Aug 8, 2022
    Cite
    Popular Benchmarks (2022). Mnist Dataset [Dataset]. https://universe.roboflow.com/popular-benchmarks/mnist-cjkff/model/2
    Available download formats: zip
    Dataset updated
    Aug 8, 2022
    Dataset authored and provided by
    Popular Benchmarks
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Digits
    Description

    THE MNIST DATABASE of handwritten digits

    Authors:

    • Yann LeCun, Courant Institute, NYU
    • Corinna Cortes, Google Labs, New York
    • Christopher J.C. Burges, Microsoft Research, Redmond

    Dataset Obtained From: http://yann.lecun.com/exdb/mnist/

    All images were sized 28x28 in the original dataset

    The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

    It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.
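
    Since tensorflow.org is listed among the sources above, here is a minimal loading sketch (assuming TensorFlow is installed; the Roboflow export instead ships images in its own folder layout) that checks the 60,000/10,000 split described above:

    # Minimal sketch: load the canonical MNIST splits with the Keras built-in loader.
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
    print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)

    # Scale pixel values to [0, 1] before feeding a classifier.
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0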

    Version 1 (original-images_trainSetSplitBy80_20):

    • Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set (a sketch reproducing this split follows the version list below)
    • Trained from Roboflow Classification Model's ImageNet training checkpoint

    Version 2 (original-images_ModifiedClasses_trainSetSplitBy80_20):

    • Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set
    • Modify Classes, a Roboflow preprocessing feature, was employed to change class names from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to one, two, three, four, five, six, seven, eight, nine
    • Trained from the Roboflow Classification Model's ImageNet training checkpoint

    Version 3 (original-images_Original-MNIST-Splits):

    • Original images, with the original splits for MNIST: train (86% of images - 60,000 images) set and test (14% of images - 10,000 images) set only.
    • This version was not trained
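
    As referenced in Version 1, the 80/20 train/validation split can be reproduced roughly as follows; this is only a sketch assuming scikit-learn, and it will not match Roboflow's exact image assignment:

    # Re-split the original 60,000-image training set 80/20 into train/validation.
    import tensorflow as tf
    from sklearn.model_selection import train_test_split

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_tr, x_val, y_tr, y_val = train_test_split(
        x_train, y_train, test_size=0.2, stratify=y_train, random_state=0
    )
    print(len(x_tr), len(x_val), len(x_test))  # 48000 12000 10000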

    Citation:

    @article{lecun2010mnist,
     title={MNIST handwritten digit database},
     author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
     journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
     volume={2},
     year={2010}
    }
    
  2. Data from: Fashion Mnist Dataset

    • universe.roboflow.com
    • opendatalab.com
    • +4 more
    zip
    Updated Aug 10, 2022
    Cite
    Popular Benchmarks (2022). Fashion Mnist Dataset [Dataset]. https://universe.roboflow.com/popular-benchmarks/fashion-mnist-ztryt/model/3
    Available download formats: zip
    Dataset updated
    Aug 10, 2022
    Dataset authored and provided by
    Popular Benchmarks
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Clothing
    Description

    Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

    Authors:

    • Han Xiao, Zalando Research
    • Kashif Rasul, Zalando Research
    • Roland Vollgraf, Zalando Research

    Dataset Obtained From: https://github.com/zalandoresearch/fashion-mnist

    All images were sized 28x28 in the original dataset

    Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. (Source: https://github.com/zalandoresearch/fashion-mnist)

    Here's an example of how the data looks (each class takes three rows): https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png (visualization of the Fashion-MNIST dataset)
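
    Because Fashion-MNIST is intended as a drop-in replacement for MNIST, a loading sketch (assuming TensorFlow's built-in Keras loader) differs from the MNIST one only in the dataset module; the class-name list below follows the order documented in the Fashion-MNIST repository:

    # Minimal sketch: load Fashion-MNIST and map label indices to clothing classes.
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                   "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

    print(x_train.shape, x_test.shape)   # (60000, 28, 28) (10000, 28, 28)
    print(class_names[int(y_train[0])])  # class of the first training image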

    Version 1 (original-images_Original-FashionMNIST-Splits):

    • Original images, with the original splits for Fashion-MNIST: train (86% of images - 60,000 images) set and test (14% of images - 10,000 images) set only.
    • This version was not trained

    Version 3 (original-images_trainSetSplitBy80_20):

    • Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set
    • Train/validation/test split rebalancing: https://blog.roboflow.com/train-test-split/ (illustration: https://i.imgur.com/angfheJ.png)

    Citation:

    @online{xiao2017/online,
     author    = {Han Xiao and Kashif Rasul and Roland Vollgraf},
     title    = {Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},
     date     = {2017-08-28},
     year     = {2017},
     eprintclass = {cs.LG},
     eprinttype  = {arXiv},
     eprint    = {cs.LG/1708.07747},
    }
    
  3. MedMNIST v2 Dataset

    • paperswithcode.com
    • huggingface.co
    Updated Feb 18, 2025
    Cite
    Jiancheng Yang; Rui Shi; Donglai Wei; Zequan Liu; Lin Zhao; Bilian Ke; Hanspeter Pfister; Bingbing Ni (2025). MedMNIST v2 Dataset [Dataset]. https://paperswithcode.com/dataset/medmnist-v2
    Dataset updated
    Feb 18, 2025
    Authors
    Jiancheng Yang; Rui Shi; Donglai Wei; Zequan Liu; Lin Zhao; Bilian Ke; Hanspeter Pfister; Bingbing Ni
    Description

    MedMNIST v2 is a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28 x 28 (2D) or 28 x 28 x 28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST v2 is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of 708,069 2D images and 10,214 3D images in total, could support numerous research / educational purposes in biomedical image analysis, computer vision and machine learning.

    Description and image from: MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification

    Each subset keeps the same license as that of the source dataset. Please also cite the corresponding paper of source data if you use any subset of MedMNIST.
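
    A hedged loading sketch, assuming the official medmnist pip package and the INFO / python_class lookup pattern it documents ('pathmnist' is just one of the 2D subsets, chosen here for illustration):

    # Download one MedMNIST v2 subset and inspect its task type and size.
    import medmnist
    from medmnist import INFO

    data_flag = "pathmnist"                    # one of the 12 two-dimensional subsets
    info = INFO[data_flag]
    DataClass = getattr(medmnist, info["python_class"])

    train_set = DataClass(split="train", download=True)
    test_set = DataClass(split="test", download=True)

    print(info["task"], info["label"])         # task type and label-index mapping
    img, label = train_set[0]                  # a 28x28 image and its label
    print(len(train_set), len(test_set), label)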

  4. Free Spoken Digits Dataset (FSDD)

    • kaggle.com
    Updated Nov 13, 2020
    Cite
    Jack Vial (2020). Free Spoken Digits Dataset (FSDD) [Dataset]. https://www.kaggle.com/jackvial/freespokendigitsdataset/code
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 13, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jack Vial
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Source: https://github.com/Jakobovski/free-spoken-digit-dataset

    Free Spoken Digit Dataset (FSDD)


    A simple audio/speech dataset consisting of recordings of spoken digits in wav files at 8kHz. The recordings are trimmed so that they have near minimal silence at the beginnings and ends.

    FSDD is an open dataset, which means it will grow over time as data is contributed. In order to enable reproducibility and accurate citation the dataset is versioned using Zenodo DOI as well as git tags.

    Current status

    • 6 speakers
    • 3,000 recordings (50 of each digit per speaker)
    • English pronunciations

    Organization

    Files are named in the following format: {digitLabel}_{speakerName}_{index}.wav Example: 7_jackson_32.wav

    Contributions

    Please contribute your homemade recordings. All recordings should be mono 8kHz wav files and be trimmed to have minimal silence. Don't forget to update metadata.py with the speaker meta-data.

    To add your data, follow the recording instructions in acquire_data/say_numbers_prompt.py and then run split_and_label_numbers.py to make your files.

    Metadata

    metadata.py contains metadata regarding each speaker's gender and accent.

    Included utilities

    trimmer.py: Trims silence at the beginning and end of an audio file. Splits an audio file into multiple audio files by periods of silence.

    fsdd.py: A simple class that provides an easy-to-use API to access the data.

    spectogramer.py: Used for creating spectrograms of the audio data. Spectrograms are often a useful preprocessing step.
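
    Not a stand-in for the repository's spectogramer.py, just a minimal sketch of the same preprocessing idea using scipy; the recordings/ path and the example filename follow the naming scheme described above:

    # Read one 8 kHz recording and compute a log-scaled spectrogram.
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    rate, samples = wavfile.read("recordings/7_jackson_32.wav")  # rate should be 8000
    freqs, times, sxx = spectrogram(samples, fs=rate, nperseg=256)

    log_sxx = 10 * np.log10(sxx + 1e-10)  # log scale is the usual view for speech
    print(rate, log_sxx.shape)            # (frequency bins, time frames)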

    Usage

    The test set officially consists of the first 10% of the recordings. Recordings numbered 0-4 (inclusive) are in the test set and 5-49 are in the training set.
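
    A minimal sketch of this split rule, assuming the wav files sit in the repository's recordings/ directory and follow the {digitLabel}_{speakerName}_{index}.wav naming scheme:

    # Partition the recordings into the official train/test split by index.
    import os

    def parse_name(filename):
        digit, speaker, index = os.path.splitext(filename)[0].split("_")
        return int(digit), speaker, int(index)

    train_files, test_files = [], []
    for name in sorted(os.listdir("recordings")):
        if not name.endswith(".wav"):
            continue
        digit, speaker, index = parse_name(name)
        (test_files if index <= 4 else train_files).append(name)

    print(len(train_files), len(test_files))  # roughly a 90% / 10% split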

    Made with FSDD

    Did you use FSDD in a paper, project or app? Add it here!

    • https://github.com/Jakobovski/decoupled-multimodal-learning
    • https://adhishthite.github.io/sound-mnist/ by Adhish Thite (https://adhishthite.github.io/)

    External tools

    License

    Creative Commons Attribution-ShareAlike 4.0 International

  5. Multi-Domain Outlier Detection Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 31, 2022
    Cite
    Kulshrestha, Sakshum (2022). Multi-Domain Outlier Detection Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5941338
    Dataset updated
    Mar 31, 2022
    Dataset provided by
    Francis, Raymond
    Rebbapragada, Umaa
    Wagstaff, Kiri
    Kulshrestha, Sakshum
    Lee, Jake
    Lu, Steven
    Dubayah, Bryce
    Kerner, Hannah
    Raman, Vinay
    Huff, Eric
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Multi-Domain Outlier Detection Dataset contains datasets for conducting outlier detection experiments for four different application domains:

    • Astrophysics: detecting anomalous observations in the Dark Energy Survey (DES) catalog (data type: feature vectors)
    • Planetary science: selecting novel geologic targets for follow-up observation onboard the Mars Science Laboratory (MSL) rover (data type: grayscale images)
    • Earth science: detecting anomalous samples in satellite time series corresponding to ground-truth observations of maize crops (data type: time series/feature vectors)
    • Fashion-MNIST/MNIST: benchmark task to detect anomalous MNIST images among Fashion-MNIST images (data type: grayscale images)

    Each dataset contains a "fit" dataset (used for fitting or training outlier detection models), a "score" dataset (used for scoring samples to evaluate model performance, analogous to a test set), and a label dataset (indicating whether each sample in the score dataset is considered an outlier in that domain).
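
    A hedged sketch of that fit/score/label protocol for the Fashion-MNIST/MNIST task, using scikit-learn's IsolationForest purely as a stand-in detector; the .npy filenames below are hypothetical, so check the actual archive layout after download:

    # Fit on the "fit" data, rank the "score" data, evaluate against the labels.
    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.metrics import roc_auc_score

    fit_x = np.load("fashion_mnist_fit.npy").reshape(-1, 28 * 28)      # hypothetical filename
    score_x = np.load("fashion_mnist_score.npy").reshape(-1, 28 * 28)  # hypothetical filename
    labels = np.load("fashion_mnist_labels.npy")                       # 1 = outlier, 0 = inlier

    model = IsolationForest(random_state=0).fit(fit_x)
    outlier_score = -model.score_samples(score_x)  # higher = more anomalous after negation

    print("AUROC:", roc_auc_score(labels, outlier_score))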

    To read more about the datasets and how they are used for outlier detection, or to cite this dataset in your own work, please see the following citation:

    Kerner, H. R., Rebbapragada, U., Wagstaff, K. L., Lu, S., Dubayah, B., Huff, E., Lee, J., Raman, V., and Kulshrestha, S. (2022). Domain-agnostic Outlier Ranking Algorithms (DORA)-A Configurable Pipeline for Facilitating Outlier Detection in Scientific Datasets. Under review for Frontiers in Astronomy and Space Sciences.

  6. Comparison of Top1 and AUROC results when using one and multiple reference...

    • plos.figshare.com
    xls
    Updated Sep 20, 2024
    Cite
    Siqi Yin; Lifan Jiang (2024). Comparison of Top1 and AUROC results when using one and multiple reference images. [Dataset]. http://doi.org/10.1371/journal.pone.0310730.t007
    Available download formats: xls
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Siqi Yin; Lifan Jiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In each table cell, the value to the left of the ‘-’ is the Top1 result and the value to the right of the ‘-’ is the AUROC result.
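
    A tiny parsing sketch under that assumption (the example cell value is hypothetical):

    # Split a "Top1-AUROC" cell into its two numeric parts.
    def parse_cell(cell: str):
        top1, auroc = cell.split("-", 1)
        return float(top1), float(auroc)

    print(parse_cell("93.5-0.981"))  # hypothetical cell -> (93.5, 0.981)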

