The Objectron dataset is a collection of short, object-centric video clips accompanied by AR session metadata, including camera poses, sparse point clouds, and characterizations of the planar surfaces in the surrounding environment. In each video, the camera moves around the object, capturing it from different angles. The data also contain manually annotated 3D bounding boxes for each object, which describe the object's position, orientation, and dimensions. The dataset consists of 15K annotated video clips supplemented with over 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes. To ensure geo-diversity, the dataset was collected in 10 countries across five continents.
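Since each annotation describes a box by its position, orientation, and dimensions, the 8 corner coordinates can be recovered with a little linear algebra. The sketch below uses illustrative parameter names (`center`, `rotation`, `dimensions`), not Objectron's exact annotation schema:

```python
import numpy as np

def box_corners(center, rotation, dimensions):
    """Return the 8 corners of an oriented 3D bounding box.

    center:     (3,) box centroid
    rotation:   (3, 3) rotation matrix giving the box orientation
    dimensions: (3,) full extents along the box's local axes
    (Names are illustrative, not Objectron's exact field names.)
    """
    # Every +/- combination of the half-extents in the box's local frame.
    signs = np.array([[x, y, z]
                      for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)])
    local = signs * (np.asarray(dimensions, dtype=float) / 2.0)
    # Rotate into the world frame, then translate to the box center.
    return local @ np.asarray(rotation).T + np.asarray(center, dtype=float)

# A 2x2x2 axis-aligned box centered at the origin: corners at (+/-1, +/-1, +/-1).
corners = box_corners([0, 0, 0], np.eye(3), [2, 2, 2])
```

The same corner set, projected with the recorded camera pose, is what yields the 2D keypoints typically drawn over Objectron frames.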
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DORI Spatial Reasoning Instruction Dataset
Dataset Description
This dataset contains instruction tuning data for spatial reasoning tasks across multiple question types and visual datasets.
Dataset Structure
Dataset Splits
train: 26,626 samples
test: 6,672 samples
Total: 33,298 samples
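The split sizes above can be sanity-checked directly, since the two splits should sum to the stated total:

```python
# Published DORI split sizes (from the dataset card above).
splits = {"train": 26_626, "test": 6_672}

total = sum(splits.values())
print(total)  # 33298, matching the stated total of 33,298
```

With the Hugging Face `datasets` library, the same counts could presumably be read at load time via `load_dataset("appledora/DORI-instruction-tuning-dataset")` and each split's `num_rows`, though that requires downloading the dataset.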
Question Types
q1, q2, q3, q4, q5, q6, q7
Source Datasets
3d_future, cityscapes, coco, coco_space_sea, get_3d, jta, kitti, nocs_real, objectron, … See the full description on the dataset page: https://huggingface.co/datasets/appledora/DORI-instruction-tuning-dataset.