This dataset was created by satya
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
[!NOTE] This dataset card is based on the README file of the authors' GitHub repository: https://github.com/greydanus/mnist1d
The MNIST-1D Dataset
Most machine learning models get around the same ~99% test accuracy on MNIST. The MNIST-1D dataset is 100x smaller (default sample size: 4000+1000; dimensionality: 40) and does a better job of separating between models with/without nonlinearity and models with/without spatial inductive biases. MNIST-1D is a core teaching dataset in… See the full description on the dataset page: https://huggingface.co/datasets/christopher/mnist1d.
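Since the card points to the Hugging Face Hub, one way to load it is with the datasets library; this is a minimal sketch, and the split and feature names should be checked against the dataset page rather than taken from here:
from datasets import load_dataset

# Load MNIST-1D from the Hugging Face Hub (sketch; verify split and feature
# names on the dataset page, as they are not documented here).
ds = load_dataset("christopher/mnist1d")
print(ds)                        # available splits and features
split = next(iter(ds.values()))
print(split[0])                  # first example of the first split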
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for mnist_augmented
This dataset contains augmented versions of the MNIST dataset, created to benchmark how various augmentation strategies impact digit classification accuracy using deep learning models. The dataset is provided as a .zip file and must be unzipped before use. It follows the ImageFolder structure compatible with PyTorch and other DL frameworks.
📥 Download & Extract
wget… See the full description on the dataset page: https://huggingface.co/datasets/ianisdev/mnist_augmented.
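Once extracted, the ImageFolder layout can be read directly with torchvision; a minimal sketch in which the directory path is an assumption about the archive's structure, not a documented path:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Assumed path to the extracted training split (verify against the actual archive layout).
train_dir = "mnist_augmented/train"

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # MNIST-style single channel
    transforms.ToTensor(),
])

# ImageFolder infers class labels from the sub-directory names.
train_set = datasets.ImageFolder(train_dir, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape, train_set.classes)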
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recognition accuracy (means and standard deviations from 5 trained models, hereafter referred to as model “runs”) for ORA and two CNN baselines, both of which were trained using identical CNN encoders (one a 2-layer CNN and the other a ResNet-18), and for a CapsNet model following the implementation in [51].
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains SOTA models fine-tuned on the MNIST handwritten digit classification task. The inspiration behind this was to implement FID for the evaluation of GANs trained on MNIST data.
Contents: * mnist_net: MobileNetV2 model, 98.7% accurate
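For reference, a minimal sketch of the Fréchet distance underlying FID between two sets of feature vectors; using the penultimate-layer features of one of these fine-tuned classifiers is an assumed workflow, not something specified by the dataset:
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between two feature sets of shape (N, D)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)

    # Matrix square root of the product of the covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu1 - mu2
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# Example with random placeholder features; in practice these would come from
# the penultimate layer of a classifier such as the fine-tuned MobileNetV2.
fid = frechet_distance(np.random.randn(500, 64), np.random.randn(500, 64))
print(fid)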
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MNIST object detection accuracy (%) of different implementations, with input image size.
Sichkar V. N. Effect of various dimension convolutional layer filters on traffic sign classification accuracy. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2019, vol. 19, no. 3, pp. 546-552. DOI: 10.17586/2226-1494-2019-19-3-546-552 (full text available at ResearchGate.net/profile/Valentyn_Sichkar)
Test online with custom Traffic Sign here: https://valentynsichkar.name/mnist.html
Design, Train & Test deep CNN for Image Classification. Join the course & enjoy new opportunities to get deep learning skills: https://www.udemy.com/course/convolutional-neural-networks-for-image-classification/
CNN Course slideshow: https://github.com/sichkar-valentyn/1-million-images-for-Traffic-Signs-Classification-tasks/blob/main/images/slideshow_classification.gif?raw=true
Concept map: https://github.com/sichkar-valentyn/1-million-images-for-Traffic-Signs-Classification-tasks/blob/main/images/concept_map.png?raw=true
This is ready-to-use preprocessed data saved into a pickle file.
Preprocessing stages are as follows:
- Normalizing the whole dataset by dividing by 255.0.
- Dividing the whole dataset into three parts: train, validation and test.
- Normalizing the data by subtracting the mean image and dividing by the standard deviation.
- Transposing every dataset to make channels come first.
The mean image and standard deviation were calculated from the train dataset and applied to all datasets. When using a user's image for classification, it has to be preprocessed first in the same way: normalized, with the mean image subtracted, and divided by the standard deviation.
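A minimal NumPy sketch of the preprocessing pipeline described above; the variable names, split sizes, and placeholder inputs are illustrative, and only the stages themselves come from the description:
import numpy as np

# Placeholders standing in for the raw images (N, 28, 28, 1) and labels (N,).
x = np.random.randint(0, 256, size=(61000, 28, 28, 1)).astype(np.float32)
y = np.random.randint(0, 10, size=61000)

# 1) Scale pixel values into [0, 1].
x /= 255.0

# 2) Split into train / validation / test (59000 / 1000 / 1000, as in the pickle).
x_train, x_validation, x_test = x[:59000], x[59000:60000], x[60000:]
y_train, y_validation, y_test = y[:59000], y[59000:60000], y[60000:]

# 3) Subtract the mean image and divide by the standard deviation,
#    both computed on the training set only, then applied to every split.
mean_image = x_train.mean(axis=0)
std = x_train.std(axis=0) + 1e-7  # small epsilon to avoid division by zero

x_train = (x_train - mean_image) / std
x_validation = (x_validation - mean_image) / std
x_test = (x_test - mean_image) / std

# 4) Transpose every split to channels-first layout: (N, 1, 28, 28).
x_train = x_train.transpose(0, 3, 1, 2)
x_validation = x_validation.transpose(0, 3, 1, 2)
x_test = x_test.transpose(0, 3, 1, 2)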
Data is written as a dictionary with the following keys:
x_train: (59000, 1, 28, 28)
y_train: (59000,)
x_validation: (1000, 1, 28, 28)
y_validation: (1000,)
x_test: (1000, 1, 28, 28)
y_test: (1000,)
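A minimal sketch of reading the dictionary back; the file name used here is a placeholder, since this description does not state the actual pickle file name:
import pickle

# Placeholder file name -- substitute the actual pickle file shipped with the dataset.
with open("data.pickle", "rb") as f:
    data = pickle.load(f)

print(data["x_train"].shape, data["y_train"].shape)  # (59000, 1, 28, 28) and (59000,)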
Contains pretrained weights model_params_ConvNet1.pickle for the model with the following architecture:
Input --> Conv --> ReLU --> Pool --> Affine --> ReLU --> Affine --> Softmax
Parameters: Pool has stride = 2 and height = width = 2.
The architecture can also be understood as follows:
Model architecture diagram: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3400968%2Fc23041248e82134b7d43ed94307b720e%2FModel_1_Architecture_MNIST.png?generation=1563654250901965&alt=media
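A minimal PyTorch sketch of that layer sequence; the number of convolutional filters and hidden units are assumptions for illustration, since only the layer order and the 2x2, stride-2 pooling are given above:
import torch
import torch.nn as nn

# Sketch of Input -> Conv -> ReLU -> Pool -> Affine -> ReLU -> Affine -> Softmax.
# The 32 filters and 128 hidden units are assumed values, not from the description.
class ConvNet1(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),  # Conv
            nn.ReLU(),                                    # ReLU
            nn.MaxPool2d(kernel_size=2, stride=2),        # Pool: 2x2, stride 2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, 128),  # Affine
            nn.ReLU(),                     # ReLU
            nn.Linear(128, num_classes),   # Affine
        )

    def forward(self, x):
        # Softmax is shown for completeness; training would typically apply
        # nn.CrossEntropyLoss to the raw logits instead.
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)

model = ConvNet1()
out = model(torch.randn(4, 1, 28, 28))  # channels-first input, as in the pickle
print(out.shape)  # torch.Size([4, 10])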
The initial data is MNIST, which was collected by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code [GitHub] | Publication [Nature Scientific Data'23 / ISBI'21] | Preprint [arXiv]
Abstract
We introduce MedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning. We benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools. The data and code are publicly available at https://medmnist.com/.
Disclaimer: The only official distribution link for the MedMNIST dataset is Zenodo. We kindly request users to refer to this original dataset link for accurate and up-to-date data.
Update: We are thrilled to release MedMNIST+ with larger sizes: 64x64, 128x128, and 224x224 for 2D, and 64x64x64 for 3D. As a complement to the previous 28-size MedMNIST, the large-size version could serve as a standardized benchmark for medical foundation models. Install the latest API to try it out!
Python Usage
We recommend our official code to download, parse and use the MedMNIST dataset:
% pip install medmnist
% python
To use the standard 28-size (MNIST-like) version utilizing the downloaded files:
from medmnist import PathMNIST
train_dataset = PathMNIST(split="train")
To enable automatic downloading by setting download=True:
from medmnist import NoduleMNIST3D
val_dataset = NoduleMNIST3D(split="val", download=True)
Alternatively, you can access MedMNIST+ with larger image sizes by specifying the size parameter:
from medmnist import ChestMNIST
test_dataset = ChestMNIST(split="test", download=True, size=224)
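The dataset objects behave like standard PyTorch datasets, so they can be wrapped in a DataLoader; a minimal sketch, where the ToTensor transform and batch size are illustrative choices rather than part of the official example:
from torch.utils.data import DataLoader
from torchvision import transforms
from medmnist import PathMNIST

# Convert PIL images to tensors; normalization is omitted for brevity.
train_dataset = PathMNIST(split="train", download=True,
                          transform=transforms.ToTensor())
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)  # e.g. torch.Size([128, 3, 28, 28]), torch.Size([128, 1])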
Citation
If you find this project useful, please cite both the v1 and v2 papers as:
Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni. "MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification." Scientific Data, 2023.
Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis". IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021.
or using bibtex:
@article{medmnistv2,
  title={MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification},
  author={Yang, Jiancheng and Shi, Rui and Wei, Donglai and Liu, Zequan and Zhao, Lin and Ke, Bilian and Pfister, Hanspeter and Ni, Bingbing},
  journal={Scientific Data},
  volume={10},
  number={1},
  pages={41},
  year={2023},
  publisher={Nature Publishing Group UK London}
}
@inproceedings{medmnistv1,
  title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
  author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
  booktitle={IEEE 18th International Symposium on Biomedical Imaging (ISBI)},
  pages={191--195},
  year={2021}
}
Please also cite the corresponding paper(s) of source data if you use any subset of MedMNIST as per the description on the project website.
License
The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0), except DermaMNIST under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
The code is under Apache-2.0 License.
Changelog
v3.0 (this repository): Released MedMNIST+ featuring larger sizes: 64x64, 128x128, and 224x224 for 2D, and 64x64x64 for 3D.
v2.2: Removed a small number of mistakenly included blank samples in OrganAMNIST, OrganCMNIST, OrganSMNIST, OrganMNIST3D, and VesselMNIST3D.
v2.1: Addressed an issue in the NoduleMNIST3D file (i.e., nodulemnist3d.npz). Further details can be found in this issue.
v2.0: Launched the initial repository of MedMNIST v2, adding 6 datasets for 3D and 2 for 2D.
v1.0: Established the initial repository (in a separate repository) of MedMNIST v1, featuring 10 datasets for 2D.
Note: This dataset is NOT intended for clinical use.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison results (mean ± STD%) of different methods on the MNIST database.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This work targets image classification applications. It presents a memory-periphery co-design to perform accurate A/D conversions of analog matrix-vector-multiplication (MVM) outputs. A novel scheme is introduced where select-lines and bit-lines in the memory are virtually fixed to improve conversion accuracy and aid a ring-oscillator-based A/D conversion, equipped with component sharing and inter-matching of the reference blocks. In addition, we deploy a self-timed technique to further ensure high robustness, addressing global design and cycle-to-cycle variations. The concept is demonstrated using a 4Kb CIM chip prototype with resistive bitcells on TSMC 40nm CMOS technology. This dataset includes schematic netlist files, chip photos, raw data in Excel sheets for latency and power estimations/simulation results, and Matlab codes for generating the graphs and figures in the associated publication.
MNIST, CIFAR10, and FEMNIST are used to evaluate the effect on accuracy across a variety of datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Federated clustering is a distributed clustering approach that does not require the transmission of raw data and is widely used. However, it struggles to handle non-independent and identically distributed (Non-IID) data effectively, because it is difficult to obtain accurate global consistency measures under Non-IID conditions. To address this issue, we propose FKmeansCB, a federated k-means clustering algorithm based on a cluster backbone. First, we add Laplace noise to all the local data and run k-means clustering on the client side to obtain cluster centers, which faithfully represent the cluster backbone (i.e., the data structures of the clusters). The cluster backbone represents the client’s features and can approximately capture the features of differently labeled data points in Non-IID situations. We then upload these cluster centers to the server. Subsequently, the server aggregates all cluster centers and runs the k-means clustering algorithm to obtain global cluster centers, which are then sent back to the clients. Finally, each client assigns all of its data points to the nearest global cluster center to produce the final clustering results. We have validated the performance of the proposed algorithm on six datasets, including the large-scale MNIST dataset. Compared with leading non-federated and federated clustering algorithms, FKmeansCB offers significant advantages in both clustering accuracy and running time.
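A minimal sketch of the pipeline described above, using scikit-learn's KMeans for both the client-side and server-side steps; the Laplace noise scale and the number of per-client centers are illustrative assumptions, not values from the paper:
import numpy as np
from sklearn.cluster import KMeans

def fkmeans_cb(client_datasets, k, k_local=10, noise_scale=0.1, seed=0):
    """Federated k-means via cluster backbones (sketch of the described pipeline)."""
    rng = np.random.default_rng(seed)

    # 1) Each client adds Laplace noise locally and extracts its cluster backbone.
    local_centers = []
    for X in client_datasets:
        X_noisy = X + rng.laplace(scale=noise_scale, size=X.shape)
        km = KMeans(n_clusters=k_local, n_init=10, random_state=seed).fit(X_noisy)
        local_centers.append(km.cluster_centers_)

    # 2) The server aggregates all backbones and clusters them into k global centers.
    all_centers = np.vstack(local_centers)
    global_km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(all_centers)
    global_centers = global_km.cluster_centers_

    # 3) Each client assigns its points to the nearest global center.
    assignments = []
    for X in client_datasets:
        d = np.linalg.norm(X[:, None, :] - global_centers[None, :, :], axis=2)
        assignments.append(d.argmin(axis=1))
    return global_centers, assignments

# Example with two synthetic clients.
clients = [np.random.randn(200, 2), np.random.randn(200, 2) + 5.0]
centers, labels = fkmeans_cb(clients, k=2)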
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Top-1 to top-5 accuracy of our naive and our informed CASIMAC on the fashion-mnist data set. In the naive approach we use a purely Euclidean distance metric between the images, whereas the informed approach also takes the structural image similarity into account. The best scores are highlighted in bold.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Galaxy10 SDSS is a dataset containing 21785 69x69-pixel colored galaxy images (g, r and i bands) separated into 10 classes. Galaxy10 SDSS images come from the Sloan Digital Sky Survey and the labels come from Galaxy Zoo.
These classes are mutually exclusive, but Galaxy Zoo relies on human volunteers to classify galaxy images and the volunteers do not agree on all images. For this reason, Galaxy10 only contains images for which more than 55% of the votes agree on the class. That is, more than 55% of the votes among the 10 classes are for a single class for that particular image. If no class gets more than 55%, the image is not included in Galaxy10, as no agreement was reached. As a result, 21785 images remain after the cut.
The justification of 55% as the threshold is based on validation. Galaxy10 is meant to be an alternative to MNIST or CIFAR-10 as a deep learning toy dataset for astronomers, so astroNN.models.Cifar10_CNN is used with CIFAR-10 as a reference, and the validation was done on the same astroNN.models.Cifar10_CNN. A 50% threshold results in poor neural network classification accuracy: although the dataset then contains around 36000 images, many are probably misclassified and the neural network has a difficult time learning. A 60% threshold gives results similar to 55% (in both cases classification accuracy is similar to the CIFAR-10 dataset on the same network), but the 55% threshold includes more images in the dataset. Thus 55% was chosen as the threshold to cut the data.
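A minimal sketch of the vote-threshold cut described above, assuming a hypothetical array of per-class Galaxy Zoo vote fractions (one row per image, ten columns):
import numpy as np

# votes: (n_images, 10) array of per-class vote fractions (placeholder input).
votes = np.random.dirichlet(np.ones(10), size=5)

threshold = 0.55
keep = votes.max(axis=1) > threshold  # images where one class exceeds 55% of votes
labels = votes[keep].argmax(axis=1)   # the agreed-upon class for the kept images
print(f"kept {keep.sum()} of {len(votes)} images")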
The original images are 424x424, but were cropped to 207x207 centered on the images and then downscaled 3 times via bilinear interpolation to 69x69 in order to make them manageable on most computers and graphics card memory.
There is no guarantee on the accuracy of the labels. Moreover, Galaxy10 is not a balanced dataset and should only be used for educational or experimental purposes. If you use Galaxy10 for research purposes, please cite Galaxy Zoo and the Sloan Digital Sky Survey.
For more information on the original classification tree: Galaxy Zoo Decision Tree.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets used in the paper "SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks"
Abstract:
Modern deep learning models often do not generalize well in the presence of a "covariate shift"; that is, in situations where the training and test data distributions differ, but the conditional distribution of classification labels given the data remains unchanged. In such cases, neural network (NN) generalization can be reduced to a problem of learning more robust, domain-invariant features that enable the correct alignment of the two datasets in the network's latent space. Domain adaptation (DA) methods include a broad range of techniques aimed at achieving this, which allows the model to perform well on multiple datasets. However, these methods have struggled with the need for extensive hyperparameter tuning, which then incurs significant computational costs. In this work, we introduce SIDDA, an out-of-the-box DA training algorithm built upon the Sinkhorn divergence, that can achieve effective domain alignment with minimal hyperparameter tuning and computational overhead. We demonstrate the efficacy of our method on multiple simulated and real datasets of varying complexity, including simple shapes, handwritten digits, and real astronomical observational data. These datasets include covariate shifts induced by noise and blurring, as well as more complex differences between real astronomical data observed by different telescopes. SIDDA is compatible with a variety of NN architectures, and it works particularly well in improving classification accuracy and model calibration when paired with equivariant neural networks (ENNs), which respect data symmetries by design. We find that SIDDA consistently improves the generalization capabilities of NNs, enhancing classification accuracy in unlabeled target data by up to 40%. Simultaneously, the inclusion of SIDDA during training can improve performance on the labeled source data, though with a more modest enhancement of approximately 1%. We also study the efficacy of DA on ENNs with respect to the varying group orders of the dihedral group D_N, and find that the model performance improves as the degree of equivariance increases. Finally, we find that SIDDA can also improve the model calibration on both source and target data. The largest improvements are obtained when the model is applied to the unlabeled target domain, reaching more than an order of magnitude improvement in both the expected calibration error and the Brier score. SIDDA's versatility across various NN models and datasets, combined with its automated approach to domain alignment, has the potential to significantly advance multi-dataset studies by enabling the development of highly generalizable models.
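For illustration, a generic sketch of Sinkhorn-divergence-based domain alignment in a network's latent space; this is not the exact SIDDA algorithm, and it assumes the geomloss package plus a hypothetical encoder/classifier pair and an arbitrary loss weight:
import torch
import torch.nn as nn
from geomloss import SamplesLoss  # assumed dependency providing a Sinkhorn divergence

# Hypothetical encoder and classifier; the method itself is architecture-agnostic.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())
classifier = nn.Linear(64, 10)

sinkhorn = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)

def training_step(x_src, y_src, x_tgt, da_weight=1.0):
    """One step: supervised loss on labeled source + Sinkhorn alignment of latent features."""
    z_src, z_tgt = encoder(x_src), encoder(x_tgt)
    loss = ce(classifier(z_src), y_src) + da_weight * sinkhorn(z_src, z_tgt)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example with random placeholder batches (target domain is unlabeled).
loss = training_step(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)),
                     torch.randn(32, 1, 28, 28))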
Datasets:
Dataset directories include train and test subdirectories, which contain the source and target domain data. The simulated datasets of shapes and astronomical objects were generated using DeepBench, with code for noise and PSF blurring found on our Github. The MNIST-M dataset can be found publicly, and the Galaxy Zoo Evo dataset can be accessed following the steps on HuggingFace. Data was split into an 80%/20% train/test split.
Simulated shapes:
train:
source
target (noise)
test:
source
target (noise)
Simulated astronomical objects:
train:
source
target (noise)
test:
source
target (noise)
MNIST-M:
train:
source
target (noise)
target (PSF)
test:
source
target (noise)
target (PSF)
Galaxy Zoo Evo:
train:
source (GZ SDSS)
target (GZ DESI)
test:
source (GZ SDSS)
target (GZ DESI)
Paper Data:
Data for generating Figures 4 and 5 in the paper are included in isomap_plot_data.zip and js_distances_group_order.zip, respectively. The code for generating the figures can be found in the notebooks on our Github. Figures 2 and 3 are visualizations of the datasets included here.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This dataset is an extension of my previous work on creating a dataset for natural language processing tasks. It leverages binary representation to characterise various machine learning models. The attributes in the dataset are derived from a dictionary, which was constructed from a corpus of prompts typically provided to a large language model (LLM). These prompts reference specific machine learning algorithms and their implementations. For instance, consider a user asking an LLM or a generative AI to create a Multi-Layer Perceptron (MLP) model for a particular application. By applying this concept to multiple machine learning models, we constructed our corpus.
This corpus was then transformed into the current dataset using a bag-of-words approach. In this dataset, each attribute corresponds to a word from our dictionary, represented as a binary value: 1 indicates the presence of the word in a given prompt, and 0 indicates its absence. At the end of each entry, there is a label. Each entry in the dataset pertains to a single class, where each class represents a distinct machine learning model or algorithm. This dataset is intended for multi-class classification tasks, not multi-label classification, as each entry is associated with only one label and does not belong to multiple labels simultaneously.
This dataset has been utilised with a Convolutional Neural Network (CNN) using the Keras Automodel API, achieving impressive training and testing accuracy rates exceeding 97%. Post-training, the model's predictive performance was rigorously evaluated in a production environment, where it continued to demonstrate exceptional accuracy. For this evaluation, we employed a series of questions, which are listed below. These questions were intentionally designed to be similar to ensure that the model can effectively distinguish between different machine learning models, even when the prompts are closely related.
KNN How would you create a KNN model to classify emails as spam or not spam based on their content and metadata? How could you implement a KNN model to classify handwritten digits using the MNIST dataset? How would you use a KNN approach to build a recommendation system for suggesting movies to users based on their ratings and preferences? How could you employ a KNN algorithm to predict the price of a house based on features such as its location, size, and number of bedrooms etc? Can you create a KNN model for classifying different species of flowers based on their petal length, petal width, sepal length, and sepal width? How would you utilise a KNN model to predict the sentiment (positive, negative, or neutral) of text reviews or comments? Can you create a KNN model for me that could be used in malware classification? Can you make me a KNN model that can detect a network intrusion when looking at encrypted network traffic? Can you make a KNN model that would predict the stock price of a given stock for the next week? Can you create a KNN model that could be used to detect malware when using a dataset relating to certain permissions a piece of software may have access to?
Decision Tree Can you describe the steps involved in building a decision tree model to classify medical images as malignant or benign for cancer diagnosis and return a model for me? How can you utilise a decision tree approach to develop a model for classifying news articles into different categories (e.g., politics, sports, entertainment) based on their textual content? What approach would you take to create a decision tree model for recommending personalised university courses to students based on their academic strengths and weaknesses? Can you describe how to create a decision tree model for identifying potential fraud in financial transactions based on transaction history, user behaviour, and other relevant data? In what ways might you apply a decision tree model to classify customer complaints into different categories determining the severity of language used? Can you create a decision tree classifier for me? Can you make me a decision tree model that will help me determine the best course of action across a given set of strategies? Can you create a decision tree model for me that can recommend certain cars to customers based on their preferences and budget? How can you make a decision tree model that will predict the movement of star constellations in the sky based on data provided by the NASA website? How do I create a decision tree for time-series forecasting?
Random Forest Can you describe the steps involved in building a random forest model to classify different types of anomalies in network traffic data for cybersecurity purposes and return the code for me? In what ways could you implement a random forest model to predict the severity of traffic congestion in urban areas based on historical traffic patterns, weather...
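A minimal sketch of the binary bag-of-words encoding described above, using scikit-learn; the prompts and labels here are placeholders rather than entries from the actual corpus:
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder prompts and their model/algorithm labels.
prompts = [
    "Can you create a KNN model for classifying handwritten digits?",
    "How do I create a decision tree for time-series forecasting?",
    "Build a random forest model to predict traffic congestion severity.",
]
labels = ["KNN", "Decision Tree", "Random Forest"]

# binary=True gives 1 if a dictionary word occurs in the prompt, 0 otherwise.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(prompts).toarray()

print(vectorizer.get_feature_names_out())
print(X)  # one binary row per prompt; the label is appended to form a dataset entry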
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was developed as part of the NANCY project (https://nancy-project.eu/) to support tasks in the computer vision area. It is specifically designed for sign language recognition, focusing on representing joints and finger positions. The dataset comprises images of hands that represent the alphabet in American Sign Language (ASL), with the exception of the letters "J" and "Z," as these involve motion and the dataset is limited to static images. A significant feature of the dataset is the use of color-coding, where each finger is associated with a distinct color. This approach enhances the ability to extract features and distinguish between different fingers, offering significant advantages over traditional grayscale datasets like MNIST. The dataset consists of RGB images, which enhance the recognition process and support more effective learning, achieving high performance even with a relatively modest amount of training data. This format improves the ability to discriminate and extract features compared to grayscale images. Although the use of RGB images introduces additional complexity, such as increased data representation and storage requirements, the advantages in accuracy and feature extraction make it a valuable choice. The dataset is well-suited for applications involving gesture recognition, sign language interpretation, and other tasks requiring detailed analysis of joint and finger positions.
The NANCY project has received funding from the Smart Networks and Services Joint Undertaking (SNS JU) under the European Union's Horizon Europe research and innovation programme under Grant Agreement No 101096456.
I took some publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A-J taken from different fonts. Here are some examples of the letter "A". Judging by the examples, one would expect this to be a harder task than MNIST. This seems to be the case: logistic regression on top of a stacked auto-encoder with fine-tuning gets about 89% accuracy, whereas the same approach gets about 98% on MNIST. The dataset consists of a small hand-cleaned part of about 19k instances and a large uncleaned part of about 500k instances. The two parts have approximately 0.5% and 6.5% label error rates, respectively. I estimated this by looking through glyphs and counting how often my guess of the letter didn't match its Unicode value in the font file.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The influence of the privacy budget parameter (ε) on clustering results (ARI).
The dataset contains the code and raw data for exploiting the accuracy-robustness trade-off from the uncertainty principle in quantum physics.
The folder contains two sub-folders: "data upload" and "figure&plot".
In "data upload", the three network structures are used for CIFAR-10 and MNIST. Take the sub-sub-folder "cifar conv" as an example. One starts with the two notebooks named "selected_train_netwrok1_test2.ipynb" and "selected_train_netwrok2_test2.ipynb": the former trains the complete convolutional network, while the latter divides the convolutional layers into two parts, a feature extractor and a classifier. After running the two notebooks, the weights of the networks at each training epoch are saved in the folder "model". One then runs the other two notebooks, "scanner-x.ipynb" and "scanner-feature-crt.ipynb": the former performs Monte-Carlo integrations on multiple GPUs with respect to the normalized loss function of the complete convolutional network, while the latter integrates only the classifiers (the second part of the complete convolutional network). Last, one opens the notebook "plotter.ipynb" to see the results.
In "figure&plot" we mainly plot the figures in the paper. The txt files are simply copied from the "data upload" folder. To see the figures, one needs to open the file "plot.nb" with Mathematica.