KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB and grayscale stereo cameras and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their needs. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 acquisitions (140 for training and 112 for testing), comprising RGB images and Velodyne scans, from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odometry challenge) with 11 classes: building, tree, sky, car, sign, road, pedestrian, fence, pole, sidewalk, and bicyclist.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
The depth prediction evaluation is related to the work published in Sparsity Invariant CNNs (3DV 2017). It contains over 93 thousand depth maps with corresponding raw LiDAR scans and RGB images, aligned with the "raw data" of the KITTI dataset. Given the large amount of training data, this dataset should allow the training of complex deep learning models for depth completion and single-image depth prediction. In addition, manually selected images with unpublished depth maps are provided here to serve as a benchmark for these two challenging tasks.
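For orientation, the depth maps in this benchmark are distributed as 16-bit PNGs in which a pixel value of 0 means "no measurement" and metric depth is the pixel value divided by 256 (the devkit convention). A minimal loading sketch under that assumption; the function name and path are illustrative:

```python
import numpy as np
from PIL import Image

def load_kitti_depth_png(path):
    """Load a KITTI depth-completion map stored as a 16-bit PNG.

    Assumes the devkit convention: depth_in_meters = pixel_value / 256,
    with 0 marking pixels that carry no LiDAR measurement.
    """
    depth_png = np.array(Image.open(path), dtype=np.uint16)
    depth = depth_png.astype(np.float32) / 256.0
    valid = depth_png > 0  # mask of pixels with a real measurement
    return depth, valid
```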
No license specified (https://academictorrents.com/)
From the file 2011_09_29_drive_0071 (4.1 GB) [synced+rectified data]. This page contains our raw data recordings, sorted by category (see menu above). So far, we have included only sequences for which we either have 3D object labels or which occur in our odometry benchmark training set. The dataset comprises the following information, captured and synchronized at 10 Hz:
Raw (unsynced+unrectified) and processed (synced+rectified) grayscale stereo sequences (0.5 Megapixels, stored in png format)
Raw (unsynced+unrectified) and processed (synced+rectified) color stereo sequences (0.5 Megapixels, stored in png format)
3D Velodyne point clouds (100k points per frame, stored as binary float matrix)
3D GPS/IMU data (location, speed, acceleration, meta information, stored as text file)
Calibration (Camera, Camera-to-GPS/IMU, Camera-to-Velodyne, stored as text file)
3D object tracklet labels (cars, trucks, trams, pedestrians, cyclists, stored as xml file)
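As a small illustration of the "binary float matrix" format mentioned above, one Velodyne scan can be read as a flat array of float32 values and reshaped into rows of (x, y, z, reflectance). A minimal sketch; the helper name and example path are placeholders:

```python
import numpy as np

def load_velodyne_scan(path):
    """Read one raw KITTI Velodyne scan (flat float32 binary, N x 4).

    Each row is (x, y, z, reflectance) in the Velodyne sensor frame.
    """
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# Example (illustrative path):
# points = load_velodyne_scan("velodyne_points/data/0000000000.bin")
# xyz = points[:, :3]
```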
Description:
Mono KITTI is a specialized version of the KITTI dataset that focuses exclusively on monocular images and the corresponding distance measurements. This dataset is designed to facilitate research and development in the field of monocular absolute distance estimation, providing a rich set of data for training and evaluating machine learning models.
Key Features:
Monocular Images: The dataset consists solely of images captured from a single camera, making it ideal for tasks that require monocular vision.
Distance Measurements: Each image is paired with accurate distance measurements, enabling precise absolute distance estimation.
High-Quality Data: Derived from the renowned KITTI dataset, Mono KITTI maintains high standards of image quality and accuracy.
Diverse Environments: The dataset includes a variety of scenes, from urban to rural environments, offering a comprehensive set of conditions for model training.
Dataset Composition:
Image Data: High-resolution monocular images covering diverse driving scenarios.
Distance Annotations: Accurate distance labels provided for each image, essential for absolute distance estimation tasks.
Training, Validation, and Test Sets: The data is split into well-defined training, validation, and test sets to support robust model development and evaluation.
Applications:
Monocular Distance Estimation: Ideal for developing and testing algorithms that estimate distances from monocular images.
Autonomous Driving: Supports research in autonomous driving by providing data for critical perception tasks.
Computer Vision Research: A valuable resource for advancing the state-of-the-art in monocular vision and distance estimation.
Methodology:
The distance annotations in Mono KITTI are derived using advanced techniques to ensure high accuracy. This involves leveraging stereo vision data from the original KITTI dataset to extract precise distance measurements, which are then paired with the corresponding monocular images.
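As a rough sketch of how stereo data translates into metric distance, the standard pinhole relation Z = f * B / d can be applied to the disparity d. The focal length and baseline below are only nominal ballpark values for KITTI's rectified cameras, not the per-drive calibration actually used to build the annotations:

```python
import numpy as np

# Nominal values for illustration only; real pipelines read these from the
# KITTI calibration files (they vary slightly per drive).
FOCAL_PX = 721.5    # focal length of the rectified camera, in pixels
BASELINE_M = 0.54   # stereo baseline, in meters

def disparity_to_depth(disparity_px):
    """Convert stereo disparity (pixels) to metric depth via Z = f * B / d."""
    d = np.asarray(disparity_px, dtype=np.float32)
    # Guard against division by zero; zero disparity means "infinitely far".
    return np.where(d > 0, FOCAL_PX * BASELINE_M / np.maximum(d, 1e-6), np.inf)

# A disparity of 10 px corresponds to roughly 721.5 * 0.54 / 10 ≈ 39 m.
```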
This dataset is sourced from Kaggle.
🤖 Robo3D - The KITTI-C Benchmark KITTI-C is an evaluation benchmark aimed at robust and reliable 3D object detection in autonomous driving. With it, we probe the robustness of 3D detectors under out-of-distribution (OoD) scenarios, against corruptions that occur in real-world environments. Specifically, we consider natural corruptions arising in the following cases:
Adverse weather conditions, such as fog, wet ground, and snow; external disturbances that cause motion blur or missing LiDAR beams; internal sensor failures, including crosstalk, incomplete echo, and cross-sensor scenarios.
KITTI-C is part of the Robo3D benchmark. Visit our homepage to explore more details.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
This dataset consists of four ZIP files containing annotated images used in experiments on formal specification and specification-based testing for image recognition in autonomous driving systems. The dataset has been derived and modified from the KITTI dataset.
image1.zip: Contains 349 images. These images are part of the first subset used in the experiments.
label1.zip: Contains the 2D bounding box annotations for vehicles corresponding to the images in image1.zip. There are 349 annotation files, and in total, 2,736 vehicles are annotated.
image2.zip: Contains 1,300 images. These images are part of the second subset used in the experiments.
label2.zip: Contains the 2D bounding box annotations for vehicles corresponding to the images in image2.zip. There are 1,300 annotation files, and in total, 5,644 vehicles are annotated.
The dataset was utilized in the research project focused on Bounding Box Specification Language (BBSL), a formal specification language designed for image recognition in autonomous driving systems. This research explores specification-based testing methodologies for object detection systems.
The BBSL project and related testing tools can be accessed on GitHub: https://github.com/IOKENTOI/BBSL-test.
The original KITTI dataset used for modification can be found at [KITTI dataset source link].
If you use this dataset, please cite the original KITTI dataset:
@inproceedings{Geiger2012CVPR,
author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2012}
}
This dataset contains the KITTI Object Detection Benchmark, created by Andreas Geiger, Philip Lenz, and Raquel Urtasun and presented in the Proceedings of CVPR 2012, "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite". This kernel contains the object detection part of the different datasets they published for autonomous driving. It contains a set of images with their bounding box labels. For more information, visit the website on which they published the data (linked above) and/or read the README file, as it explains the label format.
Furthermore, this is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license. For clarity, I did not modify any of this and do not want to use it commercially. This is for educational purposes only.
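Since the README's label format is referenced above, here is a minimal sketch of parsing one ground-truth annotation line, following the column layout documented in the official object-detection devkit (the helper name is ours):

```python
def parse_kitti_label_line(line):
    """Parse one line of a KITTI object-detection label file.

    Columns per the devkit README: type, truncated, occluded, alpha,
    2D bbox (left, top, right, bottom), 3D dimensions (height, width, length),
    3D location (x, y, z) in camera coordinates, and rotation_y.
    """
    f = line.split()
    return {
        "type": f[0],
        "truncated": float(f[1]),
        "occluded": int(f[2]),
        "alpha": float(f[3]),
        "bbox": [float(v) for v in f[4:8]],         # image pixels
        "dimensions": [float(v) for v in f[8:11]],  # meters (h, w, l)
        "location": [float(v) for v in f[11:14]],   # meters (x, y, z)
        "rotation_y": float(f[14]),
    }
```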
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Dataset Card for KITTI Flow 2012
Dataset Description
The KITTI Flow 2012 dataset is a real-world benchmark dataset designed to evaluate optical flow estimation algorithms in the context of autonomous driving. Introduced in the seminal paper "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite" by Geiger et al., it provides challenging sequences recorded from a moving platform in urban, residential, and highway scenes. Optical flow refers to the apparent… See the full description on the dataset page: https://huggingface.co/datasets/randall-lab/kitti-flow2012.
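For context, the ground-truth flow maps in this benchmark are stored as 16-bit, 3-channel PNGs; per the devkit, the first two channels encode (value - 2^15) / 64 for the u and v components, and the third channel marks valid pixels. A hedged loading sketch using OpenCV, assuming it preserves the 16-bit depth with the flags below and returns channels in BGR order:

```python
import cv2
import numpy as np

def load_kitti_flow(path):
    """Load a KITTI 2012 ground-truth flow map (16-bit, 3-channel PNG).

    Devkit convention: u = (R - 2**15) / 64, v = (G - 2**15) / 64,
    and the B channel flags valid pixels. OpenCV reads channels as B, G, R.
    """
    raw = cv2.imread(path, cv2.IMREAD_ANYDEPTH | cv2.IMREAD_COLOR).astype(np.float32)
    flow_u = (raw[..., 2] - 2.0 ** 15) / 64.0  # R channel
    flow_v = (raw[..., 1] - 2.0 ** 15) / 64.0  # G channel
    valid = raw[..., 0] > 0                    # B channel
    return flow_u, flow_v, valid
```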
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Traffic object detection results: comparison on the KITTI dataset.
The KITTI object detection benchmark dataset.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Dataset Card for KITTI Stereo 2012
Dataset Description
The KITTI Stereo 2012 dataset is a widely used benchmark dataset for evaluating stereo vision, optical flow, and scene flow algorithms in autonomous driving scenarios. It was introduced in the paper "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite" by Geiger et al. Stereo matching refers to the process of estimating depth from two images captured from slightly different viewpoints—typically a… See the full description on the dataset page: https://huggingface.co/datasets/randall-lab/kitti-stereo2012.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The .hkl weights for training PredNet on the KITTI dataset, created with an updated hickle version.
The KITTI 3D dataset consists of 7,481 images for training and 7,518 images for testing. The labels of the train set are publicly available and the labels of the test set are stored on a test server for evaluation.
The KITTI-Depth dataset includes depth maps from projected LiDAR point clouds that were matched against the depth estimates from the stereo cameras. The depth images are highly sparse, with only about 5% of the pixels carrying valid values and the rest missing. The dataset has 86k training images, 7k validation images, and 1k test images, which are evaluated on the benchmark server without access to the ground truth.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Detection results on the KITTI dataset.
The Virtual KITTI 2 dataset is a synthetic clone of the real KITTI dataset, containing 5 sequence clones of Scene 01, 02, 06, 18 and 20, and nine variants with diverse weather conditions or modified camera configurations.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Virtual KITTI is a photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation. Virtual KITTI contains 21,260 images generated from five different virtual worlds in urban settings under different imaging and weather conditions. These photo-realistic synthetic images are automatically, exactly, and fully annotated for 2D and 3D multi-object tracking and at the pixel level with category, instance, flow, and depth labels.
The KITTI 2012 and 2015 datasets are used for stereo matching experiments.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quantitative evaluation results of the proposed 3D object detection algorithm on the KITTI dataset.
SemanticKITTI is a large-scale outdoor-scene dataset for point cloud semantic segmentation. It is derived from the KITTI Vision Odometry Benchmark, which it extends with dense point-wise annotations covering the full 360° field of view of the employed automotive LiDAR. The dataset consists of 22 sequences. Overall, it provides 23,201 point clouds for training and 20,351 for testing.
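As a small aid for working with these annotations, each point's label in SemanticKITTI is a 32-bit integer whose lower 16 bits hold the semantic class id and whose upper 16 bits hold the instance id. A minimal decoding sketch; the helper name is illustrative:

```python
import numpy as np

def load_semantickitti_labels(path):
    """Read a SemanticKITTI .label file (one uint32 per LiDAR point).

    Lower 16 bits: semantic class id; upper 16 bits: instance id.
    """
    raw = np.fromfile(path, dtype=np.uint32)
    semantic = raw & 0xFFFF
    instance = raw >> 16
    return semantic, instance
```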