The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.
Total number of non-empty WordNet synsets: 21841 Total number of images: 14197122 Number of images with bounding box annotations: 1,034,908 Number of synsets with SIFT features: 1000 Number of images with SIFT features: 1.2 million
ILSVRC 2012, commonly known as 'ImageNet' is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.
The test split contains 100K images but no labels because no labels have been publicly released. We provide support for the test split from 2012 with the minor patch released on October 10, 2019. In order to manually download this data, a user must perform the following operations:
The resulting tar-ball may then be processed by TFDS.
To assess the accuracy of a model on the ImageNet test split, one must run inference on all images in the split, export those results to a text file that must be uploaded to the ImageNet evaluation server. The maintainers of the ImageNet evaluation server permits a single user to submit up to 2 submissions per week in order to prevent overfitting.
To evaluate the accuracy on the test split, one must first create an account at image-net.org. This account must be approved by the site administrator. After the account is created, one can submit the results to the test server at https://image-net.org/challenges/LSVRC/eval_server.php The submission consists of several ASCII text files corresponding to multiple tasks. The task of interest is "Classification submission (top-5 cls error)". A sample of an exported text file looks like the following:
771 778 794 387 650
363 691 764 923 427
737 369 430 531 124
755 930 755 59 168
The export format is described in full in "readme.txt" within the 2013 development kit available here: https://image-net.org/data/ILSVRC/2013/ILSVRC2013_devkit.tgz Please see the section entitled "3.3 CLS-LOC submission format". Briefly, the format of the text file is 100,000 lines corresponding to each image in the test split. Each line of integers correspond to the rank-ordered, top 5 predictions for each test image. The integers are 1-indexed corresponding to the line number in the corresponding labels file. See labels.txt.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet2012_subset', split='train')
for ex in ds.take(4):
print(ex)
See the guide for more informations on tensorflow_datasets.
https://storage.googleapis.com/tfds-data/visualization/fig/imagenet2012_subset-1pct-5.0.0.png" alt="Visualization" width="500px">
https://choosealicense.com/licenses/undefined/https://choosealicense.com/licenses/undefined/
Dataset Card for tiny-imagenet
Dataset Summary
Tiny ImageNet contains 100000 images of 200 classes (500 for each class) downsized to 64×64 colored images. Each class has 500 training images, 50 validation images, and 50 test images.
Languages
The class labels in the dataset are in English.
Dataset Structure
Data Instances
{ 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=64x64 at 0x1A800E8E190, 'label': 15 }… See the full description on the dataset page: https://huggingface.co/datasets/zh-plus/tiny-imagenet.
https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
Repack Information
This repository contains a complete repack of ILSVRC/imagenet-1k in Parquet format, with no arbitrary code execution. Images were not resampled.
Dataset Card for ImageNet
Dataset Summary
ILSVRC 2012, commonly known as 'ImageNet' is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than… See the full description on the dataset page: https://huggingface.co/datasets/benjamin-paine/imagenet-1k.
ImageNet-Hard is a new benchmark that comprises 10,980 images collected from various existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). This dataset poses a significant challenge to state-of-the-art vision models as merely zooming in often fails to improve their ability to classify images correctly. As a result, even the most advanced models, such as CLIP-ViT-L/14@336px, struggle to perform well on this dataset, achieving a mere 2.02% accuracy.
Tiny ImageNet contains 100000 images of 200 classes (500 for each class) downsized to 64×64 colored images. Each class has 500 training images, 50 validation images and 50 test images.
This dataset contains ILSVRC-2012 (ImageNet) validation images augmented with a new set of "Re-Assessed" (ReaL) labels from the "Are we done with ImageNet" paper, see https://arxiv.org/abs/2006.07159. These labels are collected using the enhanced protocol, resulting in multi-label and more accurate annotations.
Important note: about 3500 examples contain no label, these should be excluded from the averaging when computing the accuracy. One possible way of doing this is with the following NumPy code:
is_correct = [pred in real_labels[i] for i, pred in enumerate(predictions) if real_labels[i]]
real_accuracy = np.mean(is_correct)
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet2012_real', split='train')
for ex in ds.take(4):
print(ex)
See the guide for more informations on tensorflow_datasets.
https://storage.googleapis.com/tfds-data/visualization/fig/imagenet2012_real-1.0.0.png" alt="Visualization" width="500px">
Object recognition is arguably the most important problem at the heart of computer vision. Recently, Barbu et al. introduced a dataset called ObjectNet which includes objects in daily life situations.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Benchmarking the robustness to distribution shifts traditionally relies on dataset collection which is typically laborious and expensive, in particular for datasets with a large number of classes like ImageNet. An exception to this procedure is ImageNet-C (Hendrycks & Dietterich, 2019), a dataset created by applying common real-world corruptions at different levels of intensity to the (clean) ImageNet images. Inspired by this work, we introduce ImageNet-Cartoon and ImageNet-Drawing, two datasets constructed by converting ImageNet images into cartoons and colored pencil drawings, using a GAN framework (Wang & Yu, 2020) and simple image processing (Lu et al., 2012), respectively.
This repository contains ImageNet-Cartoon and ImageNet-Drawing. Checkout the official GitHub Repo for the code on how to reproduce the datasets.
If you find this useful in your research, please consider citing:
@inproceedings{imagenetshift,
title={ImageNet-Cartoon and ImageNet-Drawing: two domain shift datasets for ImageNet},
author={Tiago Salvador and Adam M. Oberman},
booktitle={ICML Workshop on Shift happens: Crowdsourcing metrics and test datasets beyond ImageNet.},
year={2022}
}
The Stylized-ImageNet dataset is created by removing local texture cues in ImageNet while retaining global shape information on natural images via AdaIN style transfer. This nudges CNNs towards learning more about shapes and less about local textures.
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy. For more information, see
ImageNet-A is a set of images labelled with ImageNet labels that were obtained by collecting new data and keeping only those images that ResNet-50 models fail to correctly classify. For more details please refer to the paper.
The label space is the same as that of ImageNet2012. Each example is represented as a dictionary with the following keys:
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_a', split='train')
for ex in ds.take(4):
print(ex)
See the guide for more informations on tensorflow_datasets.
https://storage.googleapis.com/tfds-data/visualization/fig/imagenet_a-0.1.0.png" alt="Visualization" width="500px">
https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
Repack Information
This repository contains a complete repack of ILSVRC/imagenet-1k in Parquet format with the following data transformations:
Images were center-cropped to square to the minimum height/width dimension. Images were then rescaled to 256x256 using Lanczos resampling. This dataset is available at benjamin-paine/imagenet-1k-256x256 Images were then rescaled to 128x128 using Lanczos resampling. This dataset is available at benjamin-paine/imagenet-1k-128x128. Images were… See the full description on the dataset page: https://huggingface.co/datasets/benjamin-paine/imagenet-1k-64x64.
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks. ImageNet-21K dataset, which contains more pictures and classes, is used less frequently for pretraining, mainly due to its complexity, and underestimation of its added value compared to standard ImageNet-1K pretraining. This paper aims to close this gap, and make high-quality efficient pretraining on ImageNet-21K available for everyone. Via a dedicated preprocessing stage, utilizing WordNet hierarchies, and a novel training scheme called semantic softmax, we show that different models, including small mobile-oriented models, significantly benefit from ImageNet-21K pretraining on numerous datasets and tasks. We also show that we outperform previous ImageNet-21K pretraining schemes for prominent new models like ViT. Our proposed pretraining pipeline is efficient, accessible, and leads to SoTA reproducible results, from a publicly available dataset.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Link to original evaluation code for: https://github.com/hendrycks/natural-adv-examples @article{hendrycks2021nae, title={Natural Adversarial Examples}, author={Dan Hendrycks and Kevin Zhao and Steven Basart and Jacob Steinhardt and Dawn Song}, journal={CVPR}, year={2021} }
ImageNet-LT is a subset of original ImageNet ILSVRC 2012 dataset. The training set is subsampled such that the number of images per class follows a long-tailed distribution. The class with the maximum number of images contains 1,280 examples, whereas the class with the minumum number of images contains only 5 examples. The dataset also has a balanced validation set, which is also a subset of the ImageNet ILSVRC 2012 training set and contains 20 images per class. The test set of this dataset is the same as the validation set of the original ImageNet ILSVRC 2012 dataset.
The original ImageNet ILSVRC 2012 dataset must be downloaded manually, and its path should be set with --manual_dir in order to generate this dataset.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_lt', split='train')
for ex in ds.take(4):
print(ex)
See the guide for more informations on tensorflow_datasets.
https://storage.googleapis.com/tfds-data/visualization/fig/imagenet_lt-1.0.0.png" alt="Visualization" width="500px">
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a subset of ImageNet called "ImageNet16" more suited for cases with limited computational budget and faster experimentation.
Each class has 400 train images and 100 test images.
If used in your work please cite as follows:
C. Kyrkou, "Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns," in IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2024.3380827.
The classes corresponding to imagenet1K:
• n02009912 American_egret
• n02113624 toy_poodle
• n02123597 Siamese_cat
• n02132136 brown_bear
• n02504458 African_elephant
• n02690373 airliner
• n02835271 bicycle-built-for-two
• n02951358 canoe
• n03041632 cleaver
• n03085013 computer_keyboard
• n03196217 digital_clock
• n03977966 police_van
• n04099969 rocking_chair
• n04111531 rotisserie
• n04285008 sports_car
• n04591713 wine_bottle
From original map.txt
knife = n03041632
keyboard = n03085013
elephant = n02504458
bicycle = n02835271
airplane = n02690373
clock = n03196217
oven = n04111531
chair = n04099969
bear = n02132136
boat = n02951358
cat = n02123597
bottle = n04591713
truck = n03977966
car = n04285008
bird = n02009912
dog = n02113624
Folder Structure
-
--
--- .JPEG
--- .JPEG
--- ....
--
--...
-
--
--- .JPEG
--- .JPEG
--- ....
--
--...
Some preliminary results:
Model Name Accuracy (Top-1)
VGG16 85.3
ResNet50 88.2
MobileNetV2 91.0
EfficientNet B0 85.6
Massive Credit to original ImageNet authors[1] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei.ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015
The ImageNet-A dataset consists of real-world, unmodified, and naturally occurring examples that are misclassified by ResNet models.
ImageNet-10 is a dataset of 10,000 224x224 color images in 10 classes, with 1,000 images per class.
This dataset was created by trtroi
The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.
Total number of non-empty WordNet synsets: 21841 Total number of images: 14197122 Number of images with bounding box annotations: 1,034,908 Number of synsets with SIFT features: 1000 Number of images with SIFT features: 1.2 million