ILSVRC 2012, commonly known as 'ImageNet', is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, the majority of them nouns (80,000+). In ImageNet, we aim to provide on average 1,000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. Upon completion, we hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy.
The test split contains 100K images but no labels, because the labels have never been publicly released. We provide support for the 2012 test split with the minor patch released on October 10, 2019. To download this data manually, a user must perform the following operations:
The resulting tar-ball may then be processed by TFDS.
To assess the accuracy of a model on the ImageNet test split, one must run inference on all images in the split, export the results to a text file, and upload that file to the ImageNet evaluation server. The maintainers of the evaluation server permit each user at most two submissions per week in order to prevent overfitting.
To evaluate the accuracy on the test split, one must first create an account at image-net.org. This account must be approved by the site administrator. After the account is created, one can submit the results to the test server at https://image-net.org/challenges/LSVRC/eval_server.php. The submission consists of several ASCII text files corresponding to multiple tasks; the task of interest is "Classification submission (top-5 cls error)". A sample exported text file looks like the following:
771 778 794 387 650
363 691 764 923 427
737 369 430 531 124
755 930 755 59 168
The export format is described in full in "readme.txt" within the 2013 development kit, available here: https://image-net.org/data/ILSVRC/2013/ILSVRC2013_devkit.tgz (see the section entitled "3.3 CLS-LOC submission format"). Briefly, the text file contains 100,000 lines, one per image in the test split. Each line of integers corresponds to the rank-ordered top-5 predictions for that test image. The integers are 1-indexed, corresponding to line numbers in the labels file; see labels.txt.
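For illustration, here is a minimal sketch of producing such a submission file; it assumes a precomputed array of class probabilities (the file name and shape are hypothetical):

import numpy as np

# Hypothetical input: class probabilities for every test image, in
# submission order, with shape (100000, 1000).
probs = np.load('test_probabilities.npy')

with open('submission.txt', 'w') as f:
    for row in probs:
        # Rank-ordered top-5 class ids; +1 converts 0-indexed ids to the
        # 1-indexed line numbers of labels.txt.
        top5 = np.argsort(row)[::-1][:5] + 1
        f.write(' '.join(str(i) for i in top5) + '\n')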
To use this dataset:
import tensorflow_datasets as tfds

ds = tfds.load('imagenet2012_subset', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet2012_subset-1pct-5.0.0.png
The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.
Total number of non-empty WordNet synsets: 21,841
Total number of images: 14,197,122
Number of images with bounding box annotations: 1,034,908
Number of synsets with SIFT features: 1,000
Number of images with SIFT features: 1.2 million
No license specified: https://academictorrents.com/nolicensespecified
ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks. The ImageNet-21K dataset, which contains more images and classes, is used less frequently for pretraining, mainly due to its complexity and an underestimation of its added value compared to standard ImageNet-1K pretraining. This paper aims to close this gap and make high-quality, efficient pretraining on ImageNet-21K available to everyone. Via a dedicated preprocessing stage, use of the WordNet hierarchy, and a novel training scheme called semantic softmax, we show that various models, including small mobile-oriented models, significantly benefit from ImageNet-21K pretraining on numerous datasets and tasks. We also show that we outperform previous ImageNet-21K pretraining schemes for prominent new models such as ViT. Our proposed pretraining pipeline is efficient, accessible, and leads to SoTA reproducible results from a publicly available dataset.
Tiny ImageNet-C is an open-source dataset of algorithmically generated corruptions applied to the Tiny ImageNet (ImageNet-200) test set, which covers 200 classes, following the design of ImageNet-C. It was introduced by Hendrycks et al. ("Benchmarking Neural Network Robustness to Common Corruptions and Perturbations") and comprises 19 corruptions (15 test corruptions and 4 validation corruptions) spanning 5 severity levels. This results in 200,000 images for the validation set and 750,000 images for the test set. For further information, visit the original GitHub repository of ImageNet-C.
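As a hedged sketch of how such a corruption benchmark is typically consumed, the helper below averages a model's top-1 accuracy over corruptions and severities. It assumes the released <corruption>/<severity>/<class>/ folder layout, a partial corruption list, and a hypothetical evaluate(model, path) helper that scores one folder:

import os

CORRUPTIONS = ['gaussian_noise', 'shot_noise', 'impulse_noise']  # partial list
SEVERITIES = [1, 2, 3, 4, 5]

def mean_corruption_accuracy(model, root, evaluate):
    # Average top-1 accuracy over every (corruption, severity) pair.
    accs = []
    for corruption in CORRUPTIONS:
        for severity in SEVERITIES:
            path = os.path.join(root, corruption, str(severity))
            accs.append(evaluate(model, path))
    return sum(accs) / len(accs)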
The Stylized-ImageNet dataset is created by removing local texture cues in ImageNet while retaining global shape information on natural images via AdaIN style transfer. This nudges CNNs towards learning more about shapes and less about local textures.
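The AdaIN operation at the heart of this construction simply re-normalizes content features to the channel-wise statistics of style features. A minimal NumPy sketch for (H, W, C) feature maps (not the authors' exact pipeline):

import numpy as np

def adain(content, style, eps=1e-5):
    # Shift the content features to carry the style features'
    # channel-wise mean and standard deviation.
    c_mean = content.mean(axis=(0, 1), keepdims=True)
    c_std = content.std(axis=(0, 1), keepdims=True)
    s_mean = style.mean(axis=(0, 1), keepdims=True)
    s_std = style.std(axis=(0, 1), keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean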
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a subset of ImageNet, called "ImageNet16", better suited for cases with a limited computational budget and faster experimentation.
Each class has 400 train images and 100 test images.
If used in your work, please cite as follows:
C. Kyrkou, "Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns," in IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2024.3380827.
The classes correspond to the following ImageNet-1K synsets:
• n02009912 American_egret
• n02113624 toy_poodle
• n02123597 Siamese_cat
• n02132136 brown_bear
• n02504458 African_elephant
• n02690373 airliner
• n02835271 bicycle-built-for-two
• n02951358 canoe
• n03041632 cleaver
• n03085013 computer_keyboard
• n03196217 digital_clock
• n03977966 police_van
• n04099969 rocking_chair
• n04111531 rotisserie
• n04285008 sports_car
• n04591713 wine_bottle
From the original map.txt:
knife = n03041632
keyboard = n03085013
elephant = n02504458
bicycle = n02835271
airplane = n02690373
clock = n03196217
oven = n04111531
chair = n04099969
bear = n02132136
boat = n02951358
cat = n02123597
bottle = n04591713
truck = n03977966
car = n04285008
bird = n02009912
dog = n02113624
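For convenience, those name = wnid lines can be parsed into a lookup table; a small sketch assuming a local copy of map.txt in exactly that format:

def load_class_map(path='map.txt'):
    # Parse lines of the form 'knife = n03041632' into a dict.
    mapping = {}
    with open(path) as f:
        for line in f:
            if '=' in line:
                name, wnid = (part.strip() for part in line.split('='))
                mapping[name] = wnid
    return mapping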
Folder Structure
- <split> (e.g., train)
-- <class wnid> (e.g., n02009912)
--- <image>.JPEG
--- <image>.JPEG
--- ...
-- <class wnid>
-- ...
- <split> (e.g., test)
-- <class wnid>
--- <image>.JPEG
--- <image>.JPEG
--- ...
-- <class wnid>
-- ...
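Given that layout, the dataset can be loaded directly with Keras' directory loader. A minimal sketch, assuming the split folders are named train/ and test/ under an ImageNet16/ root (paths are illustrative):

import tensorflow as tf

# Paths are assumptions; point these at the actual split folders.
train_ds = tf.keras.utils.image_dataset_from_directory(
    'ImageNet16/train', image_size=(224, 224), batch_size=32)
test_ds = tf.keras.utils.image_dataset_from_directory(
    'ImageNet16/test', image_size=(224, 224), batch_size=32)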
Some preliminary results:
Model Name       Accuracy (Top-1, %)
VGG16            85.3
ResNet50         88.2
MobileNetV2      91.0
EfficientNet B0  85.6
Massive credit to the original ImageNet authors [1]: Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
No license specified: https://academictorrents.com/nolicensespecified
This page includes downsampled ImageNet images, which can be used for density estimation and generative modeling experiments. Images come in two resolutions, 32x32 and 64x64, and were introduced in Pixel Recurrent Neural Networks. Please refer to the Pixel RNN paper for more details and results.
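Density-estimation results on these downsampled splits are conventionally reported in bits per dimension; a small conversion helper, assuming nll_nats is the average per-image negative log-likelihood in nats:

import math

def bits_per_dim(nll_nats, height=32, width=32, channels=3):
    # bits/dim = NLL (nats) / (number of dimensions * ln 2).
    return nll_nats / (height * width * channels * math.log(2))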
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results on ImageNet-100.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To enable research on automated alignment/interpretability evaluations, we release the experimental results of our paper "Scale Alone Does not Improve Mechanistic Interpretability in Vision Models" as a separate dataset.
Note that this is the first dataset containing interpretability measurements obtained through psychophysical experiments for multiple explanation methods and models. The dataset contains more than 120,000 anonymized human responses, each consisting of a final choice, a confidence score, and a reaction time. Of these, more than 69,000 passed all our quality assertions; this is the main data (see responses_main.csv). The remaining responses failed some quality assertions, might be of lower quality, and should be used with care (see responses_lower_quality.csv). We consider the former the main dataset and provide the latter for development/debugging purposes. Furthermore, the dataset contains the query images used, as well as the generated explanations, for more than 760 units across nine models.
The dataset itself is a collection of labels and meta-information, without fixed features that should be predictive of a unit's interpretability. Indeed, finding and constructing features that are predictive of the recorded labels is one of the open challenges posed by this line of research.
Classification accuracy against PGD-10 attacks on different datasets.
The ImageNet-A dataset consists of real-world, unmodified, and naturally occurring examples that are misclassified by ResNet models.
ImageNet-C is an open-source dataset that consists of algorithmically generated corruptions (e.g., blur, noise) applied to the ImageNet test set.
https://mtl.yyliu.net/download/
The tieredImageNet dataset is a larger subset of ILSVRC-12 with 608 classes (779,165 images) grouped into 34 higher-level nodes in the ImageNet human-curated hierarchy. This set of nodes is partitioned into 20, 6, and 8 disjoint sets of training, validation, and testing nodes, and the corresponding classes form the respective meta-sets. As argued in Ren et al. (2018), this split near the root of the ImageNet hierarchy results in a more challenging, yet realistic regime with test classes that are less similar to training classes.
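tieredImageNet is typically consumed as N-way, K-shot episodes sampled within a single meta-set. A hedged sketch, assuming a hypothetical class_to_images dict mapping each class in the meta-set to its list of image paths:

import random

def sample_episode(class_to_images, n_way=5, k_shot=1, n_query=15):
    # Sample one N-way, K-shot episode (support and query sets).
    classes = random.sample(sorted(class_to_images), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        images = random.sample(class_to_images[cls], k_shot + n_query)
        support += [(img, label) for img in images[:k_shot]]
        query += [(img, label) for img in images[k_shot:]]
    return support, query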
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison of four convolutional neural networks on ImageNet.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A subset of ImageNet containing 20 fun animal classes for educational purposes. The result is a high-dimensional dataset with a minimal storage footprint, allowing deep learning models to be easily trained and evaluated.
The ImageNet-Sketch dataset consists of 50,889 images, approximately 50 for each of the 1,000 ImageNet classes. The dataset was constructed with Google Image queries of the form "sketch of <class name>", where <class name> is the standard class name, searching only within the "black and white" color scheme. 100 images were initially queried for every class, and the pulled images were cleaned by deleting irrelevant images and images of similar but different classes. For classes left with fewer than 50 images after manual cleaning, the dataset was augmented by flipping and rotating the images.
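The flip-and-rotate augmentation described above is straightforward to reproduce; a minimal PIL sketch (not the authors' exact script):

from PIL import Image

def augment(path):
    # Yield the mirrored and rotated variants used to pad out
    # classes with fewer than 50 cleaned images.
    img = Image.open(path)
    yield img.transpose(Image.FLIP_LEFT_RIGHT)
    for angle in (90, 180, 270):
        yield img.rotate(angle, expand=True)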
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Musculoskeletal conditions affect an estimated 1.7 billion people worldwide, causing intense pain and disability. These conditions lead to 30 million emergency room visits yearly, and the numbers are only increasing. However, diagnosing musculoskeletal issues can be challenging, especially in emergencies where quick decisions are necessary. Deep learning (DL) has shown promise in various medical applications, but previous methods for detecting shoulder abnormalities on X-ray images suffered from poor performance and a lack of transparency, owing to limited training data and inadequate feature representations. This often resulted in overfitting, poor generalisation, and potential bias in decision-making. To address these issues, a new trustworthy DL framework has been proposed to detect shoulder abnormalities (such as fractures, deformities, and arthritis) using X-ray images. The framework consists of two parts: same-domain transfer learning (TL) to mitigate the mismatch with ImageNet, and feature fusion to reduce error rates and improve trust in the final result. Same-domain TL involves training pre-trained models on a large number of labelled X-ray images from various body parts and fine-tuning them on the target dataset of shoulder X-ray images. Feature fusion combines the features extracted by seven DL models to train several ML classifiers. The proposed framework achieved an excellent accuracy of 99.2%, an F1-score of 99.2%, and a Cohen's kappa of 98.5%. Furthermore, the results were validated using three visualisation tools: gradient-weighted class activation mapping (Grad-CAM), activation visualisation, and local interpretable model-agnostic explanations (LIME). The proposed framework outperformed previous DL methods as well as three orthopaedic surgeons invited to classify the test set, who obtained an average accuracy of 79.1%. The proposed framework has proven effective and robust, improving generalisation and increasing trust in the final results.
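As a hedged illustration of the feature-fusion idea (not the paper's exact pipeline), features extracted by several backbones can be concatenated and used to train a classical classifier:

import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_and_classify(feature_sets, labels):
    # Hypothetical input: feature_sets is a list of (n_samples, d_i)
    # matrices, each extracted by a different fine-tuned DL backbone.
    fused = np.concatenate(feature_sets, axis=1)  # feature fusion
    clf = LogisticRegression(max_iter=1000)
    clf.fit(fused, labels)
    return clf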
Artificial neural networks (ANNs) are sensitive to perturbations and adversarial attacks. One hypothesized route to adversarial robustness is to align manifolds in the embedded space of neural networks with biologically grounded manifolds. Recent state-of-the-art works that emphasize learning robust neural representations, rather than optimizing for a specific target task like classification, support the idea that researchers should investigate this hypothesis. While works have shown that fine-tuning ANNs to coincide with biological vision does increase robustness to both perturbations and adversarial attacks, these works have relied on proprietary datasets; the lack of publicly available biological benchmarks makes it difficult to evaluate the efficacy of these claims. Here, we deliver a curated dataset consisting of biological representations of images taken from two commonly used computer vision datasets, ImageNet and COCO, that can be easily integrated into model training and evaluation.
BOLD5000 Additional ROIs and RDMs for Neural Network Research
This dataset is made available as part of the publication of the following journal article:
Pickard W, Sikes K, Jamil H, Chaffee N, Blanchard N, Kirby M and Peterson C (2023) Exploring fMRI RDMs: enhancing model robustness through neurobiological data. Front. Comput. Sci. 5:1275026. doi: 10.3389/fcomp.2023.1275026
This dataset is derivative of the BOLD5000 Release 2.0. Additional post-processing steps were performed to make the data more accessible in machine learning (ML) research using representational similarity analysis (RSA).
As a general overview, the following additional post-processing steps were performed with the results made available here:
Freesurfer was used to create the new ROIs. Freesurfer derivatives for...
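For reference, an RDM in RSA is commonly computed as one minus the pairwise Pearson correlation of condition-wise response patterns; a minimal NumPy sketch, assuming responses is an (n_conditions, n_voxels) matrix for one ROI:

import numpy as np

def rdm(responses):
    # Representational dissimilarity matrix: 1 - Pearson correlation
    # between every pair of condition response patterns.
    return 1.0 - np.corrcoef(responses)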
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
ImageNet-O contains anomalies of unforeseen classes which should result in low-confidence predictions.