In this paper, we present a case study on approximate multipliers for MNIST Convolutional Neural Network (CNN). We apply approximate multipliers with different bit-width to the convolution layer in MNIST CNN, evaluate the accuracy of MNIST classification, and analyze the trade-off between approximate multiplier’s area, critical path delay and the accuracy. Based on the results of the evaluation and analysis, we propose a design methodology for approximate multipliers. The approximate multipliers consist of some partial products, which are carefully selected according to the CNN input. With this methodology, we further reduce the area and the delay of the multipliers with keeping high accuracy of the MNIST classification.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The architecture of the LeNet-5 convolutional neural network (CNN) was defined by LeCun in its paper "Gradient-based learning applied to document recognition" (https://ieeexplore.ieee.org/document/726791) to classify images of hand written digits (MNIST dataset).
This architecture has been customized to use Rectified Linear Unit (ReLU) as activation functions instead of Sigmoid, and 8-bit integers for weights and activations instead of floating-point.
It consists of the following layers:
The fault hypotheses for this work include the occurrence of:
In the memory cells containing all the parameters of the CNN:
Images 200 to 249 from the MNIST dataset have been used as workload.
This dataset contains the raw data obtained from running exhaustive fault injection campaigns for all considered fault models, targeting all considered locations and for all the images in the workload.
In addition, the raw data have been lightly processed to obtain global data related to the particular bits and parameters affected by the faults, and the obtained failure modes.
First, the CNN was used to classify all the images of the workload in the absence of faults to get a reference to determine the impact of faults. This is golden_run.csv file.
After that, one fault injection experiment was executed for each bit of each element of each parameter of the CNN.
Each experiment consisted in:
Convolutional neural network (CNN) approaches available in the current literature are designed to work primarily with low-resolution images. When applied on very large images, challenges related to GPU memory, smaller receptive field than needed for semantic correspondence and the need to incorporate multi-scale features arise. The resolution of input images can be reduced, however, with significant loss of critical information. Based on the outlined issues, we introduce a novel research problem of training CNN models for very large images, and present ‘UltraMNIST dataset’, a simple yet representative benchmark dataset for this task. UltraMNIST has been designed using the popular MNIST digits with additional levels of complexity added to replicate well the challenges of real-world problems. We present two variants of the problem: ‘UltraMNIST classification’ and ‘Budget-aware UltraMNIST classification’. The standard UltraMNIST classification benchmark is intended to facilitate the development of novel CNN training methods that make the effective use of the best available GPU resources. The budget-aware variant is intended to promote development of methods that work under constrained GPU memory. For the development of competitive solutions, we present several baseline models for the standard benchmark and its budget-aware variant. We study the effect of reducing resolution on the performance and present results for baseline models involving pretrained backbones from among the popular state-of-the-art models. Finally, with the presented benchmark dataset and the baselines, we hope to pave the ground for a new generation of CNN methods suitable for handling large images in an efficient and resource-light manner. UltraMNIST dataset comprises very large-scale images, each of 4000x4000 pixels with 3-5 digits per image. Each of these digits has been extracted from the original MNIST dataset. Your task is to predict the sum of the digits per image, and this number can be anything from 0 to 27.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recognition accuracy (means and standard deviations from 5 trained models, hereafter referred to as model “runs”) from ORA and two CNN baselines, both of which were trained using identical CNN encoders (one a 2-layer CNN and the other a Resnet-18), and a CapsNet model following the implementation in [51].
The dataset contains the code and raw data for exploiting the accuracy-robustness trade-off from the principle of the uncertainty principle in quantum physics. # The folder contains two sub-folders: "data upload" and "figure&plot". ## In "data upload" the three network structures are used for cifar-10 and mnist. Take the sub-sub-folder "cifar conv" as an example. One starts with the two notebooks named "selected_train_netwrok1_test2.ipynb" and "selected_train_netwrok2_test2.ipynb", where the former performs the training of the complete Convolutional Network while the later divide the convolutional layers into two parts - feature extractor and classifier. After running the the two notebooks, the weights of the networks at each training epoch are saved in the folder "model". Then one runs the other two notebooks named "scanner-x.ipynb" and "scanner-feature-crt.ipynb", where the former performs the Monte-Carlo integrations on multi-GPUs with respect to the normalized loss function of the complete Convolutional Network, while the later only integrates the classifiers (the second part of the complete Convolutional Network). Last, one opens the notebook "plotter.ipynb" to see the results. ## In "figure&plot" we mainly plot the figures in the paper. The txt files are simply copied from the "data upload" folder. To see the figures, one needs to open the file "plot.nb" with Mathematica.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The widespread adoption of cloud computing necessitates privacy-preserving techniques that allow information to be processed without disclosure. This paper proposes a method to increase the accuracy and performance of privacy-preserving Convolutional Neural Networks with Homomorphic Encryption (CNN-HE) by Self-Learning Activation Functions (SLAF). SLAFs are polynomials with trainable coefficients updated during training, together with synaptic weights, for each polynomial independently to learn task-specific and CNN-specific features. We theoretically prove its feasibility to approximate any continuous activation function to the desired error as a function of the SLAF degree. Two CNN-HE models are proposed: CNN-HE-SLAF and CNN-HE-SLAF-R. In the first model, all activation functions are replaced by SLAFs, and CNN is trained to find weights and coefficients. In the second one, CNN is trained with the original activation, then weights are fixed, activation is substituted by SLAF, and CNN is shortly re-trained to adapt SLAF coefficients. We show that such self-learning can achieve the same accuracy 99.38% as a non-polynomial ReLU over non-homomorphic CNNs and lead to an increase in accuracy (99.21%) and higher performance (6.26 times faster) than the state-of-the-art CNN-HE CryptoNets on the MNIST optical character recognition benchmark dataset.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
In this paper, we present a case study on approximate multipliers for MNIST Convolutional Neural Network (CNN). We apply approximate multipliers with different bit-width to the convolution layer in MNIST CNN, evaluate the accuracy of MNIST classification, and analyze the trade-off between approximate multiplier’s area, critical path delay and the accuracy. Based on the results of the evaluation and analysis, we propose a design methodology for approximate multipliers. The approximate multipliers consist of some partial products, which are carefully selected according to the CNN input. With this methodology, we further reduce the area and the delay of the multipliers with keeping high accuracy of the MNIST classification.