MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "ImageNet-Hard"
Project Page - ArXiv - Paper - Github - Image Browser
Dataset Summary
ImageNet-Hard is a new benchmark that comprises 10,980 images collected from various existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). This dataset poses a significant challenge to state-of-the-art vision models as merely zooming in often fails to improve their ability to… See the full description on the dataset page: https://huggingface.co/datasets/taesiri/imagenet-hard.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
General Information
Title: ImageNet-AB
Description: ImageNet-AB is an extended version of the ImageNet-1K training set, enriched with annotation byproducts (AB). In addition to the images and corresponding class labels, this dataset provides a rich history of interactions per input signal per front-end component during the annotation process, including mouse traces, click locations, annotation times, and anonymised worker IDs.
Links:
ICCV'23 Paper Main Repository ImageNet… See the full description on the dataset page: https://huggingface.co/datasets/coallaoh/ImageNet-AB.
The datasets used in the paper are CIFAR-100 and ImageNet.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The NINCO (No ImageNet Class Objects) dataset is introduced in the ICML 2023 paper In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation. The images in this dataset are free from objects that belong to any of the 1,000 classes of ImageNet-1K (ILSVRC2012), which makes NINCO suitable for evaluating out-of-distribution (OOD) detection on ImageNet-1K.
The NINCO main dataset consists of 64 OOD classes with a total of 5,879 samples. These OOD classes were selected to have no categorical overlap with any classes of ImageNet-1K. Each sample was individually inspected by the authors to ensure it contains no ID objects.
Besides NINCO, the same .tar.gz file includes truly OOD versions of 11 popular OOD datasets with a total of 2,715 OOD samples.
Also included are 17 OOD unit tests, with 400 samples each.
Code for loading and evaluating on each of the three datasets is provided at https://github.com/j-cb/NINCO.
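The repository above is the authoritative evaluation code. As a rough illustration only (not the NINCO implementation), OOD detection on ImageNet-1K is typically scored with AUROC and FPR at 95% TPR computed from per-sample confidence scores; a minimal sketch, assuming such scores are already available, could look like this:

import numpy as np
from sklearn.metrics import roc_auc_score

def ood_metrics(id_scores, ood_scores):
    # Scores are per-sample confidences (higher = more in-distribution),
    # e.g. maximum softmax probabilities of an ImageNet-1K classifier.
    id_scores, ood_scores = np.asarray(id_scores), np.asarray(ood_scores)
    labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
    scores = np.concatenate([id_scores, ood_scores])
    auroc = roc_auc_score(labels, scores)
    # FPR@95: fraction of OOD samples still accepted at the threshold
    # that keeps 95% of in-distribution samples.
    threshold = np.percentile(id_scores, 5)
    fpr_at_95 = float(np.mean(ood_scores >= threshold))
    return auroc, fpr_at_95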
When using NINCO, please consider citing (besides the bibtex given below) the following data sources that were used to create NINCO:
When using NINCO_popular_datasets_subsamples, additionally to the above, please consider citing:
For citing our paper, we would appreciate using the following bibtex entry (this will be updated once the ICML 2023 proceedings are public):
@inproceedings{bitterwolf2023ninco,
  title={In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation},
  author={Julian Bitterwolf and Maximilian Mueller and Matthias Hein},
  booktitle={ICML},
  year={2023},
  url={https://proceedings.mlr.press/v202/bitterwolf23a.html}
}
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "Imagenet-Hard-4K"
Project Page - Paper - Github
ImageNet-Hard-4K is the 4K version of the original ImageNet-Hard dataset, a new benchmark that comprises 10,980 images collected from various existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). This dataset poses a significant challenge to state-of-the-art vision models as merely zooming in often fails to improve their… See the full description on the dataset page: https://huggingface.co/datasets/taesiri/imagenet-hard-4K.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Horikawa, T. & Kamitani, Y. (2017) Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications 8:15037. https://www.nature.com/articles/ncomms15037
In this study, fMRI data were recorded while subjects were viewing object images (image presentation experiment) or were imagining object images (imagery experiment). The image presentation experiment consisted of two distinct types of sessions: training image sessions and test image sessions. In the training image session, a total of 1,200 images from 150 object categories (8 images from each category) were each presented only once (24 runs). In the test image session, a total of 50 images from 50 object categories (1 image from each category) were presented 35 times each (35 runs). All images were taken from ImageNet (http://www.image-net.org/, Fall 2011 release), a large-scale hierarchical image database. During the image presentation experiment, subjects performed a one-back image repetition task (5 trials in each run). In the imagery experiment, subjects were required to visually imagine images from 1 of the 50 categories (20 runs; 25 categories in each run; 10 samples for each category) that were presented in the test image session of the image presentation experiment. fMRI data in the training image sessions were used to train models (decoders) that predict visual features from fMRI patterns, and those in the test image sessions and the imagery experiment were used to evaluate the model performance. Predicted features for the test image sessions and imagery experiment were then used to identify seen/imagined object categories from a set of computed features for numerous object images.
Analysis demo code is available at GitHub (KamitaniLab/GenericObjectDecoding).
The present dataset contains fMRI data from five subjects ('sub-01', 'sub-02', 'sub-03', 'sub-04', and 'sub-05'). Each subject's data contain three types of MRI data, each of which was collected over multiple scanning sessions.
Each scanning session consisted of functional (EPI) and anatomical (inplane T2) data. The functional EPI images covered the entire brain (TR, 3000 ms; TE, 30 ms; flip angle, 80°; voxel size, 3 × 3 × 3 mm; FOV, 192 × 192 mm; number of slices, 50; slice gap, 0 mm), and inplane T2-weighted anatomical images were acquired with the same slices used for the EPI (TR, 7020 ms; TE, 69 ms; flip angle, 160°; voxel size, 0.75 × 0.75 × 3.0 mm; FOV, 192 × 192 mm). The dataset also includes a T1-weighted anatomical reference image for each subject (TR, 2250 ms; TE, 3.06 ms; TI, 900 ms; flip angle, 9°; voxel size, 1.0 × 1.0 × 1.0 mm; FOV, 256 × 256 mm). The T1-weighted images were scanned only once for each subject in a separate scanning session and are stored in 'ses-anatomy' directories. The T1-weighted images were defaced by pydeface (https://pypi.python.org/pypi/pydeface). All DICOM files were converted to NIfTI-1 files by mri_convert in FreeSurfer. In addition, the dataset contains mask images of manually defined ROIs for each subject in the 'sourcedata' directory (see 'README' in 'sourcedata' for more details).
Preprocessed fMRI data are available in derivatives/preproc-spm. See the original paper (Horikawa & Kamitani, 2017) for the details of preprocessing.
Task event files ('sub-*_ses-*_task-*_run-*_events.tsv') contain recorded events (stimulus presentation, subject responses, etc.) during fMRI runs. In the task event files for the perception tasks ('ses-perceptionTraining' and 'ses-perceptionTest'), each column represents:
In the task event files for the imagery task ('ses-imageryTest'), each column represents:
The stimulus images are named like 'n03626115_19498', where 'n03626115' is the ImageNet/WordNet ID for a synset (category) and '19498' is the image ID. The categories are named by their ImageNet/WordNet synset ID (e.g., 'n03626115'). The stimulus and category names are included in the task event files as 'stimulus_name' and 'category_name', respectively. For use in analysis code, the task event files also contain 'stimulus_id' and 'category_id', which are float numbers generated from the stimulus or category names (e.g., 'n03626115_19498' --> 3626115.019498).
The mapping between stimulus/category names and IDs:
Because of licensing issues, we do not include the stimulus images in the dataset. A script downloading the images from ImageNet is available at https://github.com/KamitaniLab/GenericObjectDecoding. Image features (CNN unit responses, HMAX, GIST, and SIFT) used in the original study are available at https://figshare.com/articles/Generic_Object_Decoding/7387130.
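For illustration, a minimal sketch of the name-to-ID conversion described above (our own helper functions, based on the single example given, not part of the released analysis code) could be:

def name_to_stimulus_id(name):
    # 'n03626115_19498' -> 3626115.019498: the synset number becomes the integer
    # part and the image number is shifted six decimal places to the right.
    synset, image = name.lstrip('n').split('_')
    return int(synset) + int(image) * 1e-6

def name_to_category_id(name):
    # 'n03626115' -> 3626115.0
    return float(int(name.lstrip('n')))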
This dataset contains ILSVRC-2012 (ImageNet) validation images augmented with a new set of "Re-Assessed" (ReaL) labels from the "Are we done with ImageNet?" paper, see https://arxiv.org/abs/2006.07159. These labels are collected using the enhanced protocol, resulting in multi-label and more accurate annotations.
Important note: about 3,500 examples contain no label; these should be excluded from the averaging when computing the accuracy. One possible way of doing this is with the following NumPy code:
import numpy as np
# Exclude examples without any ReaL label, then average correctness of the predictions.
is_correct = [pred in real_labels[i] for i, pred in enumerate(predictions) if real_labels[i]]
real_accuracy = np.mean(is_correct)
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet2012_real', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet2012_real-1.0.0.png
ImageNet-v2 is an ImageNet test set (10 images per class) collected by closely following the original labelling protocol. Each image has been labelled by at least 10 MTurk workers, possibly more, and depending on the strategy used to select which images to include among the 10 chosen for the given class, there are three different versions of the dataset. Please refer to section four of the paper for more details on how the different variants were compiled.
The label space is the same as that of ImageNet2012. Each example is represented as a dictionary with the following keys:
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_v2', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet_v2-matched-frequency-3.0.0.png
ImageNet-A is a set of images labelled with ImageNet labels, obtained by collecting new data and keeping only those images that ResNet-50 models fail to classify correctly. For more details please refer to the paper.
The label space is the same as that of ImageNet2012. Each example is represented as a dictionary with the following keys:
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_a', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet_a-0.1.0.png
This dataset is introduced in the paper [1].
To build datasets with few erroneous cues, the images are gathered using a straightforward adversarial filtration strategy. These real-world, unaltered examples transfer consistently to different unseen models, showing that computer vision models share common weaknesses.
I provide two additional JSON mapping files and one Python file to rename the classes by their real names instead of WordNet IDs, and vice versa.
Please download the original dataset and other useful Python files from the author's source at this link: https://github.com/hendrycks/natural-adv-examples
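As a rough sketch of how such a JSON mapping might be applied to rename class folders (the file name 'id_to_name.json' and the directory layout below are hypothetical, not necessarily the exact files shipped here):

import json
import os

with open('id_to_name.json') as f:
    # Hypothetical mapping file, e.g. {"n01498041": "stingray", ...}
    id_to_name = json.load(f)

root = 'imagenet-a'  # hypothetical dataset root with one folder per WordNet ID
for wnid, name in id_to_name.items():
    src = os.path.join(root, wnid)
    if os.path.isdir(src):
        os.rename(src, os.path.join(root, name))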
[1] Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., & Song, D. (2021). Natural adversarial examples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 15262-15271).
The dataset used in the paper is ILSVRC2012 (ImageNet 1K), a large-scale image classification dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To enable research on automated alignment/interpretability evaluations, we release the experimental results of our paper "Scale Alone Does not Improve Mechanistic Interpretability in Vision Models" as a separate dataset.
Note that this is the first dataset containing interpretability measurements obtained through psychophysical experiments for multiple explanation methods and models. The dataset contains more than 120,000 anonymized human responses, each consisting of the final choice, a confidence score, and a reaction time. Of these, more than 69,000 passed all our quality assertions; this is the main data (see responses_main.csv). The remaining responses failed some quality assertions and might be of lower quality, so they should be used with care (see responses_lower_quality.csv). We consider the former the main dataset and provide the latter for development/debugging purposes. Furthermore, the dataset contains the query images used as well as the generated explanations for more than 760 units across nine models.
The dataset itself is a collection of labels and metainformation without the presence of fixed features that should be predictive of a unit's interpretability. Moreover, finding and constructing features that are predictive of the recorded labels will be one of the open challenges posed by this line of research.
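A minimal sketch of loading the main responses for analysis (the column names shown in the comment are hypothetical; inspect the CSV header for the actual ones):

import pandas as pd

# Main data: responses that passed all quality assertions.
responses = pd.read_csv('responses_main.csv')
print(len(responses))
# e.g. responses[['choice', 'confidence', 'reaction_time']].describe()  # hypothetical column names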
Dataset used for the paper "Rethinking Dataset Compression: Shifting Focus From Labels to Images".
Dataset created according to the paper "ImageNet: A Large-Scale Hierarchical Image Database".
Basic Usage
from datasets import load_dataset
dataset = load_dataset("he-yang/2025-rethinkdc-imagenet-random-ipc-1")
For more information, please refer to the Rethinking-Dataset-Compression
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accuracy on ImageNet-1K of a model trained with Visual Genome.
The dataset used in this paper is Red Mini-ImageNet, a benchmark for evaluating the robustness of image classification models to label noise. It contains 50,000 training images and 5,000 test images of size 224x224 pixels across 100 classes.
The dataset used in the paper is ImageNet-52R, a random subset of ImageNet for object detection.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A comparison of the proposed method with image classification models on the ImageNet-Hard dataset.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Flowchart: https://qiangli.de/imgs/flowchart2%20(1).png
An Explainable Visual Benchmark Dataset for Robustness Evaluation. A Dataset for Image Background Exploration!
Blur Background, Segmented Background, AI-generated Background, Bias of Tools During Annotation, Color in Background, Random Background with Real Environment
⭐ Follow the authors for project updates.
Website: XimageNet-12
Here, we try to understand how the image background affects computer vision models on tasks such as detection and classification. Building on the baseline work of Li et al. (ICLR 2022), Explainable AI: Object Recognition With Help From Background, we are now enlarging the dataset and analysing the following topics: Blur Background / Segmented Background / AI-Generated Background / Bias of Tools During Annotation / Color in Background / Dependent Factor in Background / Latent-Space Distance of Foreground / Random Background with Real Environment! Ultimately, we also define a mathematical equation for the robustness score. If you are interested in how we built this or would like to join this research project, please feel free to collaborate with us!
In this paper, we propose an explainable visual dataset, XIMAGENET-12, to evaluate the robustness of visual models. XIMAGENET-12 consists of over 200K images with 15,410 manual semantic annotations. Specifically, we deliberately selected 12 categories from ImageNet, representing objects commonly encountered in practical life. To simulate real-world situations, we incorporated six diverse scenarios, such as overexposure, blurring, and color changes. We further develop a quantitative criterion for robustness assessment, allowing for a nuanced understanding of how visual models perform under varying conditions, notably in relation to the background.
We employed a combination of tools and methodologies to generate the images in this dataset, ensuring both efficiency and quality in the annotation and synthesis processes.
For a detailed breakdown of our prompt engineering and hyperparameters, we invite you to consult our upcoming paper. This publication will provide comprehensive insights into our methodologies, enabling a deeper understanding of the image generation process.
This dataset has been/could be downloaded via Kaggl...
The dataset used in the paper is not explicitly described, but it is mentioned that the authors used pre-trained DDPMs on ImageNet 64x64, ImageNet 128x128, and LSUN 256x256.