The Oxford-IIIT pet dataset is a 37-category pet image dataset with roughly 200 images per class. The images have large variations in scale, pose, and lighting. Every image has an associated ground-truth annotation of breed and species. Additionally, head bounding boxes are provided for the training split, making the dataset suitable for simple object detection tasks. In the test split, the bounding boxes are empty.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('oxford_iiit_pet', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
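A minimal sketch of reading the per-example features, assuming the dataset's 'image', 'label' (breed), and 'species' feature keys; the head bounding boxes, where present, live in a separate feature that can be checked via tfds.builder('oxford_iiit_pet').info.features:

import tensorflow_datasets as tfds

ds = tfds.load('oxford_iiit_pet', split='train')
for ex in ds.take(1):
  # Breed and species are integer class labels; the image is a uint8 tensor.
  print(ex['image'].shape, ex['label'].numpy(), ex['species'].numpy())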
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('cifar10', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
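As a quick sanity check, a short sketch (using the dataset's 'image' and 'label' feature keys) that prints the tensor shapes:

import tensorflow_datasets as tfds

ds = tfds.load('cifar10', split='train')
for ex in ds.take(1):
  # Each example is a dict: 'image' is a (32, 32, 3) uint8 tensor,
  # 'label' an integer scalar in [0, 10).
  print(ex['image'].shape, ex['label'].numpy())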
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/cifar10-3.0.2.png
Organizing an office desk, utensils, etc.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('uiuc_d3field', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('glue', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
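Each GLUE task is exposed as a separate config of the same dataset; a minimal sketch loading the SST-2 task, assuming its 'sentence' and 'label' feature keys:

import tensorflow_datasets as tfds

# Task configs follow the pattern 'glue/<task>', e.g. 'glue/cola', 'glue/sst2'.
ds = tfds.load('glue/sst2', split='train')
for ex in ds.take(1):
  # SST-2 pairs a sentence with a binary sentiment label.
  print(ex['sentence'].numpy(), ex['label'].numpy())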
Wake Vision is a large, high-quality dataset featuring over 6 million images, roughly 100x the scale and diversity of current tinyML datasets. Each image is annotated with whether it contains a person. The dataset also incorporates a comprehensive fine-grained benchmark to assess fairness and robustness, covering perceived gender, perceived age, subject distance, lighting conditions, and depictions. The Wake Vision labels are derived from Open Images' annotations, which are licensed by Google LLC under the CC BY 4.0 license. The images are listed as having a CC BY 2.0 license. Note from Open Images: "while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself."
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('wake_vision', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
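A hedged sketch of filtering for images that contain a person; the 'person' feature name below is an assumption about how the binary label is exposed and should be verified against tfds.builder('wake_vision').info.features:

import tensorflow_datasets as tfds

ds = tfds.load('wake_vision', split='train')
# NOTE: 'person' is an assumed feature name for the person/no-person label.
person_only = ds.filter(lambda ex: ex['person'] == 1)
for ex in person_only.take(1):
  print(ex['image'].shape)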
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/wake_vision-1.0.0.png
Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('fashion_mnist', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
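A common next step is turning the raw examples into a normalized training pipeline; a minimal sketch:

import tensorflow as tf
import tensorflow_datasets as tfds

# as_supervised=True yields (image, label) tuples instead of feature dicts.
ds = tfds.load('fashion_mnist', split='train', as_supervised=True)
# Scale uint8 pixels to [0, 1] floats, then shuffle and batch for training.
ds = ds.map(lambda image, label: (tf.cast(image, tf.float32) / 255.0, label))
ds = ds.shuffle(10_000).batch(32).prefetch(tf.data.AUTOTUNE)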
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/fashion_mnist-3.0.1.png
Our dataset consists of objects of diverse appearance and geometry. It requires multi-stage and multi-modal fine motor skills to successfully assemble the pegs onto an unfixed board in a randomized scene. We collected a total of 22,550 trajectories across two different tasks on a Franka Panda arm. We recorded the trajectories from 2 global views and 2 wrist views, each providing both RGB and depth.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('fmb', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
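The trajectories are stored as episodes of steps in the RLDS layout used by the robotics datasets in this catalog; a sketch of walking one trajectory (the nested 'steps' dataset is standard RLDS, while the exact observation keys for the four camera views are best read from the feature dict rather than assumed):

import tensorflow_datasets as tfds

ds = tfds.load('fmb', split='train')
for episode in ds.take(1):
  # RLDS: each episode holds a nested tf.data.Dataset of steps.
  for step in episode['steps'].take(1):
    # Print the observation keys to discover the RGB/depth camera views.
    print(list(step['observation'].keys()))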
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imdb_reviews', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
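A short sketch decoding the raw examples via the 'text' and 'label' features; the additional unlabeled data mentioned above lives in the 'unsupervised' split:

import tensorflow_datasets as tfds

train_ds = tfds.load('imdb_reviews', split='train')
for ex in train_ds.take(1):
  # 'text' is a byte string; 'label' is 0 (negative) or 1 (positive).
  print(ex['text'].numpy()[:80], ex['label'].numpy())

# The extra unlabeled reviews:
unlabeled_ds = tfds.load('imdb_reviews', split='unsupervised')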
Imagenette is a subset of 10 easily classified classes from the ImageNet dataset. It was originally prepared by Jeremy Howard of FastAI. The motivation for putting together a small version of ImageNet was that running new ideas/algorithms/experiments on the whole of ImageNet takes a lot of time.
This version of the dataset allows researchers/practitioners to quickly try out ideas and share with others. The dataset comes in three variants: full size, 320 px, and 160 px.
Note: The v2 configs correspond to the new 70/30 train/validation split (released Dec 6, 2019).
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenette', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
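A sketch selecting one of the variants explicitly; the config name below follows the 'full-size-v2' naming visible in the visualization link, but the exact config list should be checked in the catalog:

import tensorflow_datasets as tfds

# Assumed config name, matching the 'full-size-v2' naming below.
ds = tfds.load('imagenette/full-size-v2', split='train')
for ex in ds.take(1):
  print(ex['image'].shape, ex['label'].numpy())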
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenette-full-size-v2-1.0.0.png
ImageNet-A is a set of images labelled with ImageNet labels that were obtained by collecting new data and keeping only those images that ResNet-50 models fail to correctly classify. For more details please refer to the paper.
The label space is the same as that of ImageNet2012. Each example is represented as a dictionary with the following keys: 'image', 'label', and 'file_name'.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_a', split='test')  # ImageNet-A ships only a test split
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
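Since the label space matches ImageNet2012, the ClassLabel metadata can map integer labels back to class names; a minimal sketch:

import tensorflow_datasets as tfds

builder = tfds.builder('imagenet_a')
label_feature = builder.info.features['label']
print(label_feature.num_classes)  # 1000, the ImageNet2012 label space
print(label_feature.int2str(0))   # human-readable name of class 0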
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet_a-0.1.0.png
Open Images is a dataset of ~9M images that have been annotated with image-level labels and object bounding boxes.
The training set of V4 contains 14.6M bounding boxes for 600 object classes on 1.74M images, making it the largest existing dataset with object location annotations. The boxes have been largely manually drawn by professional annotators to ensure accuracy and consistency. The images are very diverse and often contain complex scenes with several objects (8.4 per image on average). Moreover, the dataset is annotated with image-level labels spanning thousands of classes.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('open_images_v4', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
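Because the per-image annotations are nested (bounding boxes plus image-level labels), it helps to inspect the feature dict before writing a pipeline; a minimal sketch:

import tensorflow_datasets as tfds

# Constructing the builder does not download the (very large) data;
# it only exposes the dataset metadata, including the nested features.
builder = tfds.builder('open_images_v4')
print(builder.info.features)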
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/open_images_v4-original-2.0.0.png
Kuka iiwa peg insertion with force feedback
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('stanford_kuka_multimodal_dataset_converted_externally_to_rlds', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
MTNT: Machine Translation of Noisy Text
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('mtnt', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Franka picking objects and insertion tasks
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('iamlab_cmu_pickup_insert_converted_externally_to_rlds', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
A re-labeled version of CIFAR-10's test set with soft labels coming from real human annotators. For every (image, label) pair in the original CIFAR-10 test set, it provides several additional labels given by real human annotators as well as the average soft label. The training set is identical to that of the original dataset.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('cifar10_h', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
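A hedged sketch of reading the averaged human soft labels from the re-labeled test split; the 'soft_label' feature name below is an assumption and should be checked against tfds.builder('cifar10_h').info.features:

import tensorflow_datasets as tfds

# The human annotations apply to the test split; the train split matches CIFAR-10.
ds = tfds.load('cifar10_h', split='test')
for ex in ds.take(1):
  # NOTE: 'soft_label' is an assumed name for the averaged annotator distribution.
  print(ex['image'].shape, ex['soft_label'])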
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/cifar10_h-1.0.0.png
RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed around the following considerations: to facilitate ease of use, we provide the datasets with a unified API which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established.
The datasets follow the RLDS format to represent steps and episodes.
Examples in the dataset represent SAR transitions stored when running a partially online trained agent as described in https://arxiv.org/abs/1904.12901. We follow the RLDS dataset format, as specified in https://github.com/google-research/rlds#dataset-format.
We release 40 datasets across 8 tasks in total, covering the no-combined-challenge and easy-combined-challenge settings on the cartpole, walker, quadruped, and humanoid tasks. Each task comes in 5 dataset sizes: 1%, 5%, 20%, 40%, and 100%. Note that a smaller dataset is not guaranteed to be a subset of the larger ones. For details on how the dataset was generated, please refer to the paper.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('rlu_rwrl', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
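A sketch of reading the SAR transitions out of the RLDS-format episodes:

import tensorflow_datasets as tfds

ds = tfds.load('rlu_rwrl', split='train')
for episode in ds.take(1):
  # RLDS: each episode carries a nested dataset of steps.
  for step in episode['steps'].take(2):
    print(step['observation'], step['action'], step['reward'])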
PR2 opening fridge doors
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('utokyo_pr2_opening_fridge_converted_externally_to_rlds', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('tidybot', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Simulated Franka performing various manipulation tasks
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('maniskill_dataset_converted_externally_to_rlds', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
D4RL is an open-source benchmark for offline reinforcement learning. It provides standardized environments and datasets for training and benchmarking algorithms.
The datasets follow the RLDS format to represent steps and episodes.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('d4rl_mujoco_ant', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
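For offline-RL training it is common to flatten the episode structure into a single stream of steps; a minimal sketch using the standard RLDS layout:

import tensorflow_datasets as tfds

ds = tfds.load('d4rl_mujoco_ant', split='train')
# Flatten episodes into one dataset of steps (SAR transitions).
steps = ds.flat_map(lambda episode: episode['steps'])
for step in steps.take(2):
  print(step['observation'], step['action'], step['reward'])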