KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to suit their needs. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 acquisitions (140 for training and 112 for testing) – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odometry challenge) with 11 classes: building, tree, sky, car, sign, road, pedestrian, fence, pole, sidewalk, and bicyclist.
Database Contents License (DbCL) v1.0 http://opendatacommons.org/licenses/dbcl/1.0/
General Information
The KITTI dataset is highly regarded among researchers for its accuracy and broad scope. It serves as an excellent resource for evaluating and comparing algorithms for object detection, tracking, and scene understanding. The KITTI Vision Benchmark Suite was developed jointly by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago. Link: https://www.cvlibs.net/datasets/kitti/
Description
The dataset covers outdoor driving scenarios and includes varying lighting conditions, weather, and occlusions. The ground-truth depth maps are obtained using a Velodyne LiDAR sensor and are provided at a resolution of 1242 × 375 (Patni et al., 2024). The dataset includes a large number of real-world scenarios, which makes it representative of real-world driving conditions. The KITTI data was captured by driving around the mid-size city of Karlsruhe, in rural areas, and on highways (Geiger et al., 2012).
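As a minimal sketch of working with these depth maps, the snippet below decodes one ground-truth file, assuming the 16-bit PNG convention used by the KITTI depth benchmarks (depth in meters = pixel value / 256, with 0 marking pixels without a measurement); the file path is a placeholder.

```python
# Sketch: decode a KITTI ground-truth depth map (assumed 16-bit PNG convention).
import numpy as np
from PIL import Image

png = np.asarray(Image.open("groundtruth/0000000005.png"), dtype=np.uint16)
depth_m = png.astype(np.float32) / 256.0  # depth in meters, ~375 x 1242 array
depth_m[png == 0] = np.nan                # 0 marks pixels with no measurement
```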
New depth estimation method
The KITTI dataset is a widely used outdoor benchmark for monocular depth estimation, containing over 24k densely labeled pairs of RGB and depth images. Patni, Agarwal, and Arora used this dataset in an innovative method for estimating depth from a single image, “ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation”. Link: https://arxiv.org/abs/2403.18807
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
Virtual KITTI is a photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation. Virtual KITTI contains 50 high-resolution monocular videos (21,260 frames) generated from five different virtual worlds in urban settings under different imaging and weather conditions. These worlds were created using the Unity game engine and a novel real-to-virtual cloning method. The videos are automatically, exactly, and fully annotated for 2D and 3D multi-object tracking and at the pixel level with category, instance, flow, and depth labels.
The Virtual KITTI 2 dataset is a synthetic clone of the real KITTI dataset, containing five sequence clones (Scenes 01, 02, 06, 18, and 20) and nine variants with diverse weather conditions or modified camera configurations.
MIT License https://opensource.org/licenses/MIT
Object Detection Evaluation 2012
The object detection and object orientation estimation benchmark consists of 7,481 training images and 7,518 test images, comprising a total of 80,256 labeled objects. All images are in color and saved as PNG. For evaluation, we compute precision-recall curves for object detection and orientation-similarity-recall curves for joint object detection and orientation estimation. In the latter case, not only must the object's 2D bounding box be located correctly, but the orientation estimate in bird's-eye view is also evaluated. To rank the methods, we compute average precision and average orientation similarity. We require that all methods use the same parameter set for all test pairs.
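For illustration, here is a minimal sketch of the orientation-similarity term that average orientation similarity builds on: each matched detection contributes (1 + cos Δθ)/2, i.e. 1 for a perfect orientation estimate and 0 for one that is off by 180°. The arrays and the matching are placeholder inputs, not the official devkit.

```python
# Sketch: orientation similarity over matched detections (not the official devkit).
import numpy as np

def orientation_similarity(gt_yaw: np.ndarray, pred_yaw: np.ndarray) -> float:
    """Mean of (1 + cos(delta)) / 2 over matched detections."""
    delta = gt_yaw - pred_yaw
    return float(np.mean((1.0 + np.cos(delta)) / 2.0))

# One perfect estimate and one 90 degrees off -> (1.0 + 0.5) / 2 = 0.75
print(orientation_similarity(np.array([0.0, np.pi / 2]), np.array([0.0, 0.0])))
```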
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
## Overview
KITTI is a dataset for object detection tasks - it contains Traffic Participants annotations for 7,464 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Public dataset for KITTI Object Detection: https://github.com/DataWorkshop-Foundation/poznan-project02-car-model
(Image: sensor setup of the recording vehicle, passat_sensors_920.png.)
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License
When using this dataset in your research, we will be happy if you cite us:

@INPROCEEDINGS{Geiger2012CVPR,
  author = {Andreas Geiger and Philip Lenz and Raquel Urtasun},
  title = {Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2012}
}
The KITTI-Depth dataset includes depth maps from projected LiDAR point clouds that were matched against the depth estimates from the stereo cameras. The depth images are highly sparse, with only about 5% of the pixels carrying a value; the rest are missing. The dataset has 86k training images, 7k validation images, and 1k test images on the benchmark server, with no access to the ground truth.
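As a rough check of this sparsity, the snippet below counts the annotated pixels in one depth map, again assuming the benchmark's 16-bit PNG convention where 0 marks pixels without a LiDAR return; the file name is a placeholder.

```python
# Sketch: fraction of annotated pixels in a KITTI-Depth map (placeholder path).
import numpy as np
from PIL import Image

depth_png = np.asarray(Image.open("0000000005.png"), dtype=np.uint16)
valid = depth_png > 0
print(f"annotated pixels: {valid.mean():.1%}")  # roughly 5% for KITTI-Depth
```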
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
## Overview
KITTI Dataset For Training YOLOv7 is a dataset for object detection tasks - it contains Vehicles Pedestrians annotations for 7,481 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
## Overview
Kitti Train is a dataset for object detection tasks - it contains Objects annotations for 7,481 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Explore Mono KITTI, a dataset of high-quality monocular images with accurate distance measurements, derived from the KITTI dataset. Ideal for monocular distance estimation, autonomous driving research, and advancing computer vision techniques.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
## Overview
Yolov7 On KITTI is a dataset for object detection tasks - it contains Kitti Classes annotations for 3,654 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
The depth prediction evaluation is related to the work published in Sparsity Invariant CNNs (3DV 2017). It contains over 93,000 depth maps with corresponding raw LiDAR scans and RGB images, aligned with the "raw data" of the KITTI dataset. This large amount of training data should allow the training of complex deep learning models for depth completion and single-image depth prediction. In addition, manually selected images whose depth maps are withheld are provided to serve as a benchmark for these two challenging tasks.
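As an illustration of how such sparse ground truth is typically used in evaluation, the sketch below computes RMSE in meters over pixels that actually have a ground-truth value; the inputs are placeholder arrays, and the official devkit reports additional metrics (e.g. MAE and inverse-depth errors).

```python
# Sketch: depth-completion RMSE over valid ground-truth pixels (placeholder inputs).
import numpy as np

def depth_rmse(gt_m: np.ndarray, pred_m: np.ndarray) -> float:
    mask = gt_m > 0  # evaluate only pixels with a LiDAR ground-truth return
    return float(np.sqrt(np.mean((gt_m[mask] - pred_m[mask]) ** 2)))
```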
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
## Overview
Cars Kitti is a dataset for object detection tasks - it contains Cars annotations for 383 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The .hkl weights for training PredNet on the KITTI dataset, created with an updated hickle version.
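A minimal sketch of reading one of these files with the hickle library (an HDF5-backed drop-in for pickle); the file name is a placeholder.

```python
# Sketch: load an .hkl file with hickle (placeholder file name).
import hickle as hkl

data = hkl.load("kitti_prednet.hkl")
print(type(data))
```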
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
KITTI Masks Dataset
This dataset consists of 2,120 sequences of binary masks of pedestrians. The sequence length varies between 2 and 710 frames. For details, we refer to our paper. It is based on the original KITTI segmentation challenge, which can be found at https://www.vision.rwth-aachen.de/page/mots
A detailed description can be found at: https://openreview.net/pdf?id=EbIDjBynYJ8
An example dataloader can be found at: https://github.com/bethgelab/slow_disentanglement/blob/26eef4557ad25f1991b6f5dc774e37e192bdcabf/scripts/dataset.py#L875
KITTI is a well-established dataset in the computer vision community. It has often been used for trajectory prediction despite not having a well-defined split, which has led to non-comparable baselines across works. This dataset aims to bridge that gap by proposing a well-defined split of the KITTI data. Samples are collected as 6-second chunks (2 seconds of past and 4 seconds of future) in a sliding-window fashion from all trajectories in the dataset, including the ego-vehicle's. There are a total of 8,613 top-view trajectories for training and 2,907 for testing. Since KITTI does not provide top-view maps, semantic labels of static categories, obtained with DeepLab-v3+ from all frames, are projected into a common top-view map using the Velodyne 3D point cloud and the IMU. The resulting maps have a spatial resolution of 0.5 meters and are provided along with the trajectories.
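A minimal sketch of the described chunking, assuming a 10 Hz sampling rate (KITTI's frame rate) and a one-frame window stride; both are illustrative assumptions, not values taken from the paper.

```python
# Sketch: cut a trajectory into 6 s chunks (2 s past, 4 s future), sliding window.
import numpy as np

def sliding_chunks(track: np.ndarray, fps: int = 10, step: int = 1):
    past, future = 2 * fps, 4 * fps  # 20 past frames, 40 future frames at 10 Hz
    for start in range(0, len(track) - (past + future) + 1, step):
        yield (track[start:start + past],                  # observed positions
               track[start + past:start + past + future])  # positions to predict

traj = np.random.rand(100, 2)     # toy (x, y) trajectory, 10 s at 10 Hz
pairs = list(sliding_chunks(traj))
print(len(pairs))                 # -> 41 overlapping past/future pairs
```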
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
## Overview
KITTI Data Set is a dataset for instance segmentation tasks - it contains Drivable Area annotations for 866 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The KITTI 3D dataset consists of 7,481 images for training and 7,518 images for testing. The labels of the train set are publicly available and the labels of the test set are stored on a test server for evaluation.