ImageNet-v2 is an ImageNet test set (10 images per class) collected by closely following the original labelling protocol. Each image was labelled by at least 10 MTurk workers, and there are three versions of the dataset, depending on the strategy used to select which 10 images to include for each class. Please refer to section four of the paper for more details on how the different variants were compiled.
The label space is the same as that of ImageNet2012. Each example is represented as a dictionary with the following keys:
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_v2', split='test')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet_v2-matched-frequency-3.0.0.png
ImageNet-A is a set of images labelled with ImageNet labels, obtained by collecting new data and keeping only those images that ResNet-50 models fail to classify correctly. For more details please refer to the paper.
The label space is the same as that of ImageNet2012. Each example is represented as a dictionary with the following keys:
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_a', split='test')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet_a-0.1.0.png
This dataset contains ILSVRC-2012 (ImageNet) validation images augmented with a new set of "Re-Assessed" (ReaL) labels from the "Are we done with ImageNet?" paper (https://arxiv.org/abs/2006.07159). These labels were collected using an enhanced protocol, resulting in multi-label and more accurate annotations.
Important note: about 3,500 examples contain no label; these should be excluded when computing accuracy. One possible way of doing this is with the following NumPy code:
import numpy as np

is_correct = [pred in real_labels[i] for i, pred in enumerate(predictions) if real_labels[i]]
real_accuracy = np.mean(is_correct)
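For context, here is a minimal end-to-end sketch of such an evaluation using a stock Keras classifier. It assumes the TFDS example dictionaries expose the re-assessed labels under a key named 'real_label' and that the classifier's class indices follow the dataset's 1000-class label ordering; both are assumptions to verify against the dataset and model documentation.

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# Any ImageNet classifier works here; ResNet50 is used only as an example.
model = tf.keras.applications.ResNet50(weights='imagenet')

ds = tfds.load('imagenet2012_real', split='validation')

predictions, real_labels = [], []
for ex in tfds.as_numpy(ds.take(1000)):  # subsample for a quick check
    img = tf.image.resize(ex['image'], (224, 224))
    img = tf.keras.applications.resnet50.preprocess_input(tf.cast(img, tf.float32))
    pred = int(np.argmax(model(img[tf.newaxis]), axis=-1)[0])
    predictions.append(pred)
    real_labels.append(list(ex['real_label']))  # may be empty for ~3,500 examples

# Examples with an empty label set are excluded from the average, as described above.
is_correct = [p in real_labels[i] for i, p in enumerate(predictions) if real_labels[i]]
print('ReaL accuracy:', np.mean(is_correct))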
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet2012_real', split='validation')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet2012_real-1.0.0.png
ImageNet-R is a set of images labelled with ImageNet labels, obtained by collecting art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renditions of ImageNet classes. ImageNet-R has renditions of 200 ImageNet classes, resulting in 30,000 images. For more details please refer to the paper.
The label space is the same as that of ImageNet2012. Each example is represented as a dictionary with the following keys:
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_r', split='test')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet_r-0.2.0.png
ILSVRC 2012, commonly known as 'ImageNet', is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet; the majority of them are nouns (80,000+). In ImageNet, we aim to provide on average 1,000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. Upon completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.
The test split contains 100K images but no labels because no labels have been publicly released. We provide support for the test split from 2012 with the minor patch released on October 10, 2019. In order to manually download this data, a user must perform the following operations:
The resulting tar-ball may then be processed by TFDS.
To assess the accuracy of a model on the ImageNet test split, one must run inference on all images in the split and export the results to a text file that is then uploaded to the ImageNet evaluation server. The maintainers of the evaluation server permit each user up to 2 submissions per week in order to prevent overfitting.
To evaluate the accuracy on the test split, one must first create an account at image-net.org. This account must be approved by the site administrator. After the account is created, one can submit the results to the test server at https://image-net.org/challenges/LSVRC/eval_server.php. The submission consists of several ASCII text files corresponding to multiple tasks. The task of interest is "Classification submission (top-5 cls error)". A sample of an exported text file looks like the following:
771 778 794 387 650
363 691 764 923 427
737 369 430 531 124
755 930 755 59 168
The export format is described in full in "readme.txt" within the 2013 development kit available here: https://image-net.org/data/ILSVRC/2013/ILSVRC2013_devkit.tgz Please see the section entitled "3.3 CLS-LOC submission format". Briefly, the text file contains 100,000 lines, one for each image in the test split. Each line of integers corresponds to the rank-ordered top-5 predictions for that test image. The integers are 1-indexed, corresponding to the line number in the corresponding labels file. See labels.txt.
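As an illustration, the following is a minimal sketch of how such a submission file could be produced from a matrix of model scores; the array shape, the placeholder scores, the file name submission.txt, and the assumption that rows are ordered exactly as the server expects are all illustrative, not part of the official tooling.

import numpy as np

# scores: assumed array of shape (100000, 1000), one row per test image.
scores = np.random.rand(100000, 1000)  # placeholder predictions

# Rank-ordered top-5 class indices per image, converted to 1-indexed labels.
top5 = np.argsort(-scores, axis=1)[:, :5] + 1

with open('submission.txt', 'w') as f:
    for row in top5:
        f.write(' '.join(str(int(c)) for c in row) + '\n')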
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet2012', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet2012-5.1.0.png
This dataset is used in the PyTorch example "Transfer Learning for Computer Vision Tutorial".
Imagenette is a subset of 10 easily classified classes from the ImageNet dataset. It was originally prepared by Jeremy Howard of FastAI. The main motivation behind putting together a small version of the ImageNet dataset was that running new ideas/algorithms/experiments on the whole of ImageNet takes a lot of time.
This version of the dataset allows researchers/practitioners to quickly try out ideas and share with others. The dataset comes in three variants:
Note: The v2 configs correspond to the new 70/30 train/valid split (released on Dec 6, 2019).
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenette', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenette-full-size-v2-1.0.0.png
Taken from the README of the google-research/big_transfer repo:
by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby
In this repository we release multiple models from the Big Transfer (BiT): General Visual Representation Learning paper that were pre-trained on the ILSVRC-2012 and ImageNet-21k datasets. We provide the code to fine-tune the released models in the major deep learning frameworks: TensorFlow 2, PyTorch and Jax/Flax.
We hope that the computer vision community will benefit by employing more powerful ImageNet-21k pretrained models as opposed to conventional models pre-trained on the ILSVRC-2012 dataset.
We also provide colabs for a more exploratory interactive use: a TensorFlow 2 colab, a PyTorch colab, and a Jax colab.
Make sure you have Python>=3.6 installed on your machine.
To set up TensorFlow 2, PyTorch or Jax, follow the instructions provided in the corresponding repository linked here.
In addition, install Python dependencies by running (please select tf2, pytorch or jax in the command below):
pip install -r bit_{tf2|pytorch|jax}/requirements.txt
First, download the BiT model. We provide models pre-trained on ILSVRC-2012 (BiT-S) or ImageNet-21k (BiT-M) for 5 different architectures: ResNet-50x1, ResNet-101x1, ResNet-50x3, ResNet-101x3, and ResNet-152x4.
For example, if you would like to download the ResNet-50x1 pre-trained on ImageNet-21k, run the following command:
wget https://storage.googleapis.com/bit_models/BiT-M-R50x1.{npz|h5}
Other models can be downloaded accordingly by plugging the name of the model (BiT-S or BiT-M) and architecture in the above command.
Note that we provide models in two formats: npz (for PyTorch and Jax) and h5 (for TF2). By default we expect that model weights are stored in the root folder of this repository.
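As a quick sanity check, a downloaded npz file can be opened directly with NumPy to list the parameter names and shapes it contains; the file name below matches the wget example above and is otherwise an assumption about where the file was saved.

import numpy as np

# Open the downloaded BiT-M ResNet-50x1 weights and list a few entries.
weights = np.load('BiT-M-R50x1.npz')
for name in list(weights.files)[:5]:
    print(name, weights[name].shape)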
Then, you can fine-tune the downloaded model on your dataset of interest in any of the three frameworks. All frameworks share the same command-line interface:
python3 -m bit_{pytorch|jax|tf2}.train --name cifar10_`date +%F_%H%M%S` --model BiT-M-R50x1 --logdir /tmp/bit_logs --dataset cifar10
Currently, all frameworks will automatically download the CIFAR-10 and CIFAR-100 datasets. Other public or custom datasets can be easily integrated: in TF2 and JAX we rely on the extensible TensorFlow Datasets library. In PyTorch, we use torchvision's data input pipeline.
Note that our code uses all available GPUs for fine-tuning.
We also support training in the low-data regime: the `--examples_per_class=<K>` option randomly draws K samples per class for training.
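For instance, a few-shot fine-tuning run might look like the following; the choice of 5 examples per class is only an illustration of the flag mentioned above, modelled on the fine-tuning command shown earlier.

python3 -m bit_pytorch.train --name cifar10_fewshot_`date +%F_%H%M%S` --model BiT-M-R50x1 --logdir /tmp/bit_logs --dataset cifar10 --examples_per_class=5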
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Tiny ImageNet dataset contains 100,000 tiny (64x64) images of objects. It is a popular dataset for image classification and object detection research, and consists of 200 different classes, each with 500 images.
Petals_to_the_Metal.* models come from my popular notebook Computer Vision - Petals to the Metal🌻🌸🌹. In this notebook I take a step-by-step approach to:
- Understand how TPUs work and how to use them ✅
- Explore transfer learning with 10+ models pretrained on either imagenet or noisy-student and evaluate their performance ✅
- Explore training large CNN models from scratch and evaluate their performance ✅
- Explore 10+ hyperparameter tuning methods and evaluate their performance ✅
- Explore 25+ combinations of models and tuning methods above and evaluate their performance ✅
- Ensemble models with loaded weights and evaluate their performance ✅
- Build a great-looking visualization that captures and highlights model + tuning performance ✅
The naming convention I follow in this dataset is: Kaggle Competition-[Tuning]-Model Name.[h5|tflite]
For example:
Petals_to_the_Metal-DenseNet201.h5 is a saved Keras model for the Petals to the Metal Kaggle competition using a pretrained DenseNet201 model on imagenet.
Petals_to_the_Metal-EfficientNetB7.h5 is a saved Keras model for the Petals to the Metal Kaggle competition using a pretrained EfficientNetB7 model on noisy-student.
Petals_to_the_Metal-70K_images-trainable_True-DenseNet201.h5 is a saved Keras model for the Petals to the Metal Kaggle competition starting with a pretrained DenseNet201 model on imagenet but performing end-to-end training (trainable_True). It also uses 70K images (5x more than the standard models) from other flower datasets.
Petals_to_the_Metal-70K_images-trainable_True-MobileNetV2.tflite is a TFLite model for the Petals to the Metal Kaggle competition converted from the Petals_to_the_Metal-70K_images-trainable_True-MobileNetV2.h5 model.
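As a hedged illustration of how the two file types relate, the snippet below loads one of the saved .h5 Keras models and converts it to a TFLite flatbuffer; the file names are taken from the examples above, and compile=False is assumed because only inference is needed.

import tensorflow as tf

# Load a saved Keras model (architecture + weights) for inference only.
model = tf.keras.models.load_model('Petals_to_the_Metal-DenseNet201.h5', compile=False)

# Convert it to a TFLite model, mirroring the .tflite files in this dataset.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('Petals_to_the_Metal-DenseNet201.tflite', 'wb') as f:
    f.write(tflite_model)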
See my notebook above for more information.
Best 😀 George
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The file vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 contains pre-trained weights for the VGG16 convolutional neural network architecture, specifically designed for TensorFlow and Keras frameworks. This file is a crucial resource for researchers and practitioners in the field of deep learning, particularly those working on computer vision tasks.
VGG16 is a convolutional neural network architecture proposed by Karen Simonyan and Andrew Zisserman from the University of Oxford in their 2014 paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". This network achieved top results in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, demonstrating exceptional performance in image classification tasks.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

num_classes = 10  # set this to the number of classes in your task

# Load the VGG16 model without top layers
base_model = VGG16(weights='path/to/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
                   include_top=False,
                   input_shape=(224, 224, 3))

# Add your own top layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)

# Create your new model
model = Model(inputs=base_model.input, outputs=predictions)
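A common next step, sketched below under the assumption that the convolutional base should stay fixed during an initial training phase, is to freeze the pre-trained layers before compiling the new model:

# Freeze the pre-trained convolutional base so only the new head is trained.
base_model.trainable = False

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])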
When using these weights, be aware of potential biases inherent in the ImageNet dataset. Consider the ethical implications and potential biases in your specific application.
By incorporating this weights file into your projects, you're building upon years of research and development in deep learning for computer vision. It's an excellent starting point for many image-related tasks and can significantly boost the performance of your models.
This folder contains the baseline model implementation for the Kaggle universal image embedding challenge based on
Following the above ideas, we also add a 64-dimensional projection layer on top of the Vision Transformer base model as the final embedding, since the competition requires embeddings of at most 64 dimensions. Please find more details in image_classification.py.
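For intuition only, a minimal Keras sketch of such a projection head is shown below; the actual implementation lives in image_classification.py, and the choice of backbone, pooling, and L2 normalization here are assumptions for illustration rather than the challenge baseline itself.

import tensorflow as tf

def add_projection_head(backbone, embedding_dim=64):
    # backbone: any Keras feature extractor returning a 2-D (batch, features) tensor.
    features = backbone.output
    # Project to the 64-dimensional embedding required by the competition,
    # then L2-normalize so retrieval can use cosine / dot-product similarity.
    embedding = tf.keras.layers.Dense(embedding_dim)(features)
    embedding = tf.keras.layers.Lambda(
        lambda x: tf.math.l2_normalize(x, axis=-1))(embedding)
    return tf.keras.Model(inputs=backbone.input, outputs=embedding)

# Illustrative usage with a generic Keras backbone:
# backbone = tf.keras.applications.ResNet50(include_top=False, pooling='avg')
# embed_model = add_projection_head(backbone)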
To use the code, please first install the prerequisites:
pip install -r universal_embedding_challenge/requirements.txt
git clone https://github.com/tensorflow/models.git /tmp/models
export PYTHONPATH=$PYTHONPATH:/tmp/models
pip install --user -r /tmp/models/official/requirements.txt
Secondly, please download the imagenet1k data in TFRecord format from https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0 and https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1, and merge them together under folder imagenet-2012-tfrecord/. As a result, the paths to the training datasets and the validation datasets should be imagenet-2012-tfrecord/train* and imagenet-2012-tfrecord/validation*, respectively.
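As a quick check that the merged folder matches the expected layout, the path patterns from the paragraph above can be listed directly; this is only a convenience snippet, not part of the challenge code.

import tensorflow as tf

# Both patterns should return a non-empty list after merging the two parts.
train_files = tf.io.gfile.glob('imagenet-2012-tfrecord/train*')
val_files = tf.io.gfile.glob('imagenet-2012-tfrecord/validation*')
print(len(train_files), 'training shards,', len(val_files), 'validation shards')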
The trainer for the model is implemented in train.py, and the following example launches the training
python -m universal_embedding_challenge.train \
--experiment=vit_with_bottleneck_imagenet_pretrain \
--mode=train_and_eval \
--model_dir=/tmp/imagenet1k_test
The trained model checkpoints can be further converted to SavedModel format using export_saved_model.py for Kaggle submission.
The code to compute metrics for Universal Embedding Challenge is implemented in metrics.py and the code to read the solution file is implemented in read_retrieval_solution.py.
TensorFlow reimplementation of the Swin Transformer model.
Based on the official PyTorch implementation.
Image: https://user-images.githubusercontent.com/24825165/121768619-038e6d80-cb9a-11eb-8cb7-daa827e7772b.png
tensorflow >= 2.4.1
ImageNet-1K and ImageNet-22K Pretrained Checkpoints
| name | pretrain | resolution |acc@1 | #params | model |
| :---: | :---: | :---: | :---: | :---: | :---: |
|swin_tiny_224 |ImageNet-1K |224x224|81.2|28M|github|
|swin_small_224|ImageNet-1K |224x224|83.2|50M|github|
|swin_base_224 |ImageNet-22K|224x224|85.2|88M|github|
|swin_base_384 |ImageNet-22K|384x384|86.4|88M|github|
|swin_large_224|ImageNet-22K|224x224|86.3|197M|github|
|swin_large_384|ImageNet-22K|384x384|87.3|197M|github|
Initializing the model:

```python
from swintransformer import SwinTransformer

model = SwinTransformer('swin_tiny_224', num_classes=1000, include_top=True, pretrained=False)
```

You can use a pretrained model like this:

```python
import tensorflow as tf
from swintransformer import SwinTransformer

model = tf.keras.Sequential([
  tf.keras.layers.Lambda(lambda data: tf.keras.applications.imagenet_utils.preprocess_input(tf.cast(data, tf.float32), mode="torch"), input_shape=[*IMAGE_SIZE, 3]),
  SwinTransformer('swin_tiny_224', include_top=False, pretrained=True),
  tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])
```

If you use a pretrained model with TPU on Kaggle, specify the `use_tpu` option:

```python
import tensorflow as tf
from swintransformer import SwinTransformer

model = tf.keras.Sequential([
  tf.keras.layers.Lambda(lambda data: tf.keras.applications.imagenet_utils.preprocess_input(tf.cast(data, tf.float32), mode="torch"), input_shape=[*IMAGE_SIZE, 3]),
  SwinTransformer('swin_tiny_224', include_top=False, pretrained=True, use_tpu=True),
  tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])
```

Example: TPU training on Kaggle
@article{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
journal={arXiv preprint arXiv:2103.14030},
year={2021}
}
The STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, and self-taught learning algorithms. It is inspired by the CIFAR-10 dataset but with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image models prior to supervised training. The primary challenge is to make use of the unlabeled data (which comes from a similar but different distribution than the labeled data) to build a useful prior. All images were acquired from labeled examples on ImageNet.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('stl10', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/stl10-1.0.0.png