Objaverse is a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations. Objaverse improves upon present-day 3D repositories in scale, number of categories, and the visual diversity of instances within a category.
BigDetection is a new large-scale benchmark for building more general and powerful object detection systems. It leverages the training data from existing datasets (LVIS, OpenImages and Objects365) with carefully designed principles, and curates a larger dataset for improved detector pre-training. The BigDetection dataset has 600 object categories and contains 3.4M training images with 36M object bounding boxes.
Parts and Attributes of Common Objects (PACO) is a detection dataset that goes beyond traditional object boxes and masks and provides richer annotations such as part masks and attributes. It spans 75 object categories, 456 object-part categories and 55 attributes across image (LVIS) and video (Ego4D) datasets. The dataset contains 641K part masks annotated across 260K object boxes, with half of them exhaustively annotated with attributes as well.
BURST is a benchmark suite built upon TAO that requires tracking and segmenting multiple objects from camera video. The benchmark contains 6 different sub-tasks divided into 2 groups that all share the same data for training/validation/testing.
Class-guided
Common: Track and segment all objects belonging to a set of 78 common classes (based on the COCO class set).
Long-tail: Track and segment all objects belonging to an extended set of 482 object classes (based on the LVIS class set).
Open-world: Methods are only allowed to use the annotations of the 78 common classes during training, but during inference they are expected to track and segment all 482 object classes (class label predictions are not required).
Exemplar-guided
Mask: Track and segment all objects in the video for which the first-frame object masks are given. This task is identical to Video Object Segmentation (VOS).
Box: Track and segment all objects in the video for which the first-frame object bounding boxes are given.
Point: Track and segment all objects in the video for which we are only given the (x, y) point coordinates of the mask centroid in the first frame in which the objects appear.
An illustration of the task hierarchy is given here, and a detailed explanation is given in Sec. 5 of the dataset paper.
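The six BURST sub-tasks can be written down as a small data structure, which makes the difference between the training and inference class sets explicit. This is only an illustrative sketch: the `BurstTask` type, the `TASKS` list, and the `unseen_classes` helper are hypothetical names that are not part of any BURST toolkit, and the training class counts for the Common and Long-tail tasks are assumptions, since the description states the training restriction only for the Open-world task.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BurstTask:
    group: str                      # "class-guided" or "exemplar-guided"
    name: str
    train_classes: Optional[int]    # classes whose annotations may be used in training
    eval_classes: Optional[int]     # classes to track and segment at inference
    first_frame_cue: Optional[str]  # cue supplied in the first frame, if any


TASKS: List[BurstTask] = [
    # Assumption: Common and Long-tail train on the same class set they evaluate on.
    BurstTask("class-guided", "common", 78, 78, None),
    BurstTask("class-guided", "long-tail", 482, 482, None),
    # Stated in the description: train on 78 classes, evaluate on all 482.
    BurstTask("class-guided", "open-world", 78, 482, None),
    # Class counts are not given for the exemplar-guided tasks, so they stay None.
    BurstTask("exemplar-guided", "mask", None, None, "mask"),
    BurstTask("exemplar-guided", "box", None, None, "bounding box"),
    BurstTask("exemplar-guided", "point", None, None, "mask centroid point"),
]


def unseen_classes(task: BurstTask) -> Optional[int]:
    """Number of evaluated classes with no usable training annotations.

    Assumes the training classes are a subset of the evaluation classes.
    """
    if task.train_classes is None or task.eval_classes is None:
        return None
    return task.eval_classes - task.train_classes


if __name__ == "__main__":
    for task in TASKS:
        print(task.group, task.name, unseen_classes(task))
```

Under these assumptions, the Open-world task is the only one in which a method has to handle classes (482 - 78 = 404 of them) for which it has seen no training annotations.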
CARER is an emotion dataset collected through noisy labels, annotated via distant supervision as in (Go et al., 2009).
The subset of data provided here corresponds to the six emotions variant described in the paper. The six emotions are anger, fear, joy, love, sadness, and surprise.
Argoverse-HD is a dataset built for streaming object detection, which encompasses real-time object detection, video object detection, tracking, and short-term forecasting. It contains the video data from Argoverse 1.1 with our own MS COCO-style bounding box annotations with track IDs. The annotations are backward-compatible with COCO as one can directly evaluate COCO pre-trained models on this dataset to estimate the efficiency or the cross-dataset generalization capability of the models. The dataset contains high-quality and temporally-dense annotations for high-resolution videos (1920 x 1200 @ 30 FPS). Overall, there are 70,000 image frames and 1.3 million bounding boxes.
Argoverse-HD is the dataset used in the Streaming Perception Challenge, which includes two tracks:
Detection-only (real-time object detection). In this track, the participants will develop single-frame object detectors as they would for COCO and LVIS challenges. The crucial distinction is that the evaluation will score latency through streaming accuracy.
Full-stack. In this track, the method is unrestricted. However, most likely tracking and forecasting will be used to compensate for the latency of the detectors.
By default, all submissions measure their latency on a V100 GPU with the official toolkit.
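One way to picture how latency can be scored through streaming accuracy is to pair each ground-truth frame with the most recent prediction that had already finished when that frame arrived, so a slow detector is judged on stale outputs. The sketch below assumes that pairing rule. The function name `pair_with_latest_output` and the millisecond timestamps are hypothetical, and this is not the official toolkit.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def pair_with_latest_output(
    frame_times_ms: Sequence[int],
    output_times_ms: Sequence[int],
) -> List[Optional[int]]:
    """For each frame timestamp, return the index of the latest output that
    finished at or before it, or None if no output was ready yet.

    `output_times_ms` must be sorted in ascending order.
    """
    pairs: List[Optional[int]] = []
    for t in frame_times_ms:
        # bisect_right counts the outputs with completion time <= t.
        k = bisect_right(output_times_ms, t) - 1
        pairs.append(k if k >= 0 else None)
    return pairs


if __name__ == "__main__":
    # Frames arrive roughly every 33 ms (30 FPS); the hypothetical detector
    # finishes one output every 50 ms.
    frames = [0, 33, 67, 100, 133]
    outputs = [50, 100]
    print(pair_with_latest_output(frames, outputs))
    # [None, None, 0, 1, 1]
```

In this toy run the first two frames have no prediction available, and the last frame is scored against an output that finished 33 ms earlier, which is the kind of lag that tracking and forecasting in the full-stack track would aim to compensate for.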
OmniObject3D is a large-vocabulary 3D object dataset consisting of a large collection of high-quality real-scanned 3D objects. OmniObject3D has several appealing properties:
1) Large Vocabulary: It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets (e.g., ImageNet and LVIS), benefiting the pursuit of generalizable 3D representations.
2) Rich Annotations: Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
3) Realistic Scans: The professional scanners support high-quality object scans with precise shapes and realistic appearances.