100+ datasets found
  1. UDayton24Automotive Datasets

    • kaggle.com
    zip
    Updated Aug 6, 2025
    Cite
    setareh kian (2025). UDayton24Automotive Datasets [Dataset]. https://www.kaggle.com/datasets/setarehkian/udayton24automotive-datasets
    Explore at:
    zip (14313285997 bytes)
    Dataset updated
    Aug 6, 2025
    Authors
    setareh kian
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Datasets for automotive applications require human annotators to label objects such as traffic lights, cars, and pedestrians. Many are available today (e.g. image datasets and infrared images), as well as sensor fusion datasets (e.g. image/RADAR/LiDAR, images with athermalized lenses, and images with event-based sensor data). UDayton24Automotive differs from other datasets in that it is specifically designed for developing, training, and benchmarking object detection algorithms on raw sensor data. Multiple automotive cameras are involved, as described below.

    RGGB Camera Data (Baseline Training Set) We collected a new dataset of raw/demosaicked image pairs using an automotive camera (Sony IMX390 with an RGGB color filter array and a 174-degree fisheye lens), yielding 438 images for training and 88 images for testing. The dataset was annotated by human annotators for cars (3089), pedestrians (687), stop signs (110), and traffic lights (848). This dataset is used to train the raw-sensor-data-based object detection algorithm for the RGGB camera module, which we may regard as the "teacher" algorithm in knowledge distillation.
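
    For illustration, a minimal demosaicking sketch with OpenCV, assuming the raw frames are stored as single-channel Bayer mosaics (the file name, bit depth, and the RGGB-to-OpenCV Bayer-code mapping are assumptions to verify against the data):

    # Minimal sketch: demosaicking a raw RGGB frame with OpenCV (illustrative file names).
    import cv2

    raw = cv2.imread("raw_frame.png", cv2.IMREAD_UNCHANGED)  # HxW single-channel Bayer mosaic

    # OpenCV names Bayer codes by the 2x2 block starting at the second row/column,
    # so an RGGB sensor typically maps to COLOR_BayerBG2BGR; verify against the data.
    rgb = cv2.cvtColor(raw, cv2.COLOR_BayerBG2BGR)

    cv2.imwrite("demosaicked.png", rgb)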

    RCCB Camera Data (Test Set) We collected this dataset using the RCCB camera module with a 169-degree fisheye lens to test and evaluate the performance of the proposed object detection algorithm. There are a total of 474 raw/demosaicked image pairs captured by this automotive camera. The dataset was annotated by human annotators for cars (2506), pedestrians (406), stop signs (109), and traffic lights (784).

    Joint RGGB-RCCB Camera Data (Cross-Camera Training Set) We collected 90 RGGB-RCCB image pairs using the dual-camera configuration shown in Fig. 2, captured by Sony IMX390 cameras with RGGB and RCCB color filter arrays. As this dataset is intended to support unsupervised learning of raw RCCB sensor-data-based object detection, the image pairs in this dataset are not annotated. The two cameras are externally triggered by two separate laptops (again, a limitation of the hardware/software environment we were given). Although not perfectly synchronized, they are manually triggered together so that the images are captured within a fraction of a second of each other. Unlike the RGGB Camera Data (Baseline Training Set) or the RCCB Camera Data (Test Set), the RGGB-RCCB Camera Data does not need to contain moving targets such as pedestrians and cars, so strict synchronization is not necessary.

  2. ObChange Dataset

    • researchdata.tuwien.at
    • researchdata.tuwien.ac.at
    zip
    Updated Jun 25, 2024
    Cite
    Edith Langer (2024). ObChange Dataset [Dataset]. http://doi.org/10.48436/y3ggy-hxp10
    Explore at:
    zip
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    TU Wien
    Authors
    Edith Langer
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset can be used to evaluate methods that detect changed objects when comparing two recordings of the same environment at different time instances. Based on the labeled ground-truth objects, it is possible to differentiate between static, moved, removed and novel objects.

    Dataset Description

    The dataset was recorded with an Asus Xtion PRO Live mounted on the HSR robot. We provide scenes from five different rooms or parts of rooms, namely a big room, a small room, a living area, a kitchen counter and an office desk. Each room is visited by the robot at least five times; between runs, a subset of objects from the YCB Object and Model Set (YCB) [1] is re-arranged in the room. In total we generated 26 recordings. For each recording, between 3 and 17 objects are placed (219 in total). Furthermore, furniture and permanent background objects are slightly rearranged. These changes are not labeled because they are not relevant for most service robot tasks.

    Assuming most objects are placed on horizontal surfaces, we extracted planes in each room in a pre-processing step (excluding the floor). For each surface, all frames from the recording where it is visible are extracted and used as the input for ElasticFusion[2]. This results in a total of 34 reconstructed surfaces.

    We provide pointwise annotation of the YCB objects for each surface reconstruction from each recording.

    Images of exemplary surface reconstructions can be found here: https://www.acin.tuwien.ac.at/vision-for-robotics/software-tools/obchange/

    Dataset Structure

    The file structure of ObChange.zip is the following:

    Room
    - scene2
      - planes
        - 0
          - merged_plane_clouds_ds002.pcd
          - merged_plane_clouds_ds002.anno
          - merged_plane_clouds_ds002_GT.anno
        - 1
          - merged_plane_clouds_ds002.pcd
          - merged_plane_clouds_ds002.anno
          - merged_plane_clouds_ds002_GT.anno
        - ...
      - table.txt
    - scene3

    The pcd-file contains the reconstruction of the surface. The merged_plane_clouds_ds002.anno lists the YCB objects visible in the reconstruction and merged_plane_clouds_ds002_GT.anno contains the point indices of the reconstruction corresponding to the YCB objects together with the corresponding object name. The last element for each object is a bool value indicating if the object is on the floor (and was reconstructed by chance). The table.txt lists for each detected plane the centroid, height, convex hull points and plane coefficients.
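
    For reference, a minimal loading sketch, assuming Open3D for the .pcd file and assuming each line of the _GT.anno file holds an object name followed by its point indices and the trailing 0/1 floor flag (verify the exact layout against the files):

    # Minimal sketch: load one surface reconstruction and its ground-truth annotation.
    import open3d as o3d

    cloud = o3d.io.read_point_cloud("merged_plane_clouds_ds002.pcd")

    objects = []
    with open("merged_plane_clouds_ds002_GT.anno") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            name = parts[0]                          # YCB object name (assumed first token)
            on_floor = bool(int(parts[-1]))          # trailing bool flag described above
            indices = [int(p) for p in parts[1:-1]]  # point indices into the reconstruction
            objects.append((name, indices, on_floor))

    print(len(cloud.points), "points,", len(objects), "annotated objects")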

    We provide the original input data for each room. The zip-files contain the rosbag file for each recording. Each rosbag contains the tf-tree and the RGB and depth streams, as well as the camera intrinsics. Additionally, the semantically annotated Voxblox [3] reconstruction created with SparseConvNet [4] is provided for each recording.

    You may also be interested in Object Change Detection Dataset of Indoor Environments. It uses the same input data, but the ground truth annotation is based on a full room reconstruction instead of individual planes.

    Acknowledgements

    The research leading to these results has received funding from the Austrian Science Fund (FWF) under grant agreement Nos. I3969-N30 (InDex), I3967-N30 (BURG) and from the Austrian Research Promotion Agency (FFG) under grant agreement 879878 (K4R).

    References

    [1] B. Calli, A. Singh, J. Bruce, A. Walsman, K. Konolige, S. Srinivasa, P. Abbeel, A. M. Dollar, Yale-CMU-Berkeley dataset for robotic manipulation research, The International Journal of Robotics Research, vol. 36, Issue 3, pp. 261–268, April 2017.

    [2] T. Whelan, S. Leutenegger, R. Salas-Moreno, B. Glocker, A. Davison, ElasticFusion: Dense SLAM without a pose graph, Proceedings of Robotics: Science and Systems, July 2015.

    [3] H. Oleynikova, Z. Taylor, M. Fehr, R. Siegwart, J. Nieto, Voxblox: Incremental 3D Euclidean Signed Distance Fields for On-Board MAV Planning, in Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), pp. 1366-1373, 2017.

    [4] B. Graham, M. Engelcke, L. van der Maaten, 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9224–9232, 2018.

  3. The ORBIT (Object Recognition for Blind Image Training)-India Dataset

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 24, 2025
    Cite
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones (2025). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. http://doi.org/10.5281/zenodo.12608444
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    India
    Description

    The ORBIT (Object Recognition for Blind Image Training)-India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world's population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

    Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

    The image dataset is stored in the 'Dataset' folder, organized into folders assigned to each data collector (P1, P2, ... P12). Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: 'clean' for images taken on clean surfaces and 'clutter' for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an 'Annotations' folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The 'object_not_present_issue' key is true if the object is not present in the image, and the 'pii_present_issue' key is true if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080×1920; an unscaled version of the dataset will follow soon.
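
    For illustration, a minimal sketch of reading one per-video annotation file and keeping only frames without the two issues described above (the file path is taken from the example name; adjust it to your local copy):

    # Minimal sketch: filter frames of one video using its annotation JSON.
    import json
    from pathlib import Path

    anno_path = Path("Annotations/P1--coffee mug--clean--231220_084852_coffee mug_224.json")
    with anno_path.open() as f:
        frames = json.load(f)

    usable = [
        name for name, flags in frames.items()
        if not flags["object_not_present_issue"] and not flags["pii_present_issue"]
    ]
    print(f"{len(usable)} usable frames out of {len(frames)}")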

    This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

    REFERENCES:

    1. Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

    2. microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

    3. Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1โ€“6. https://doi.org/10.1145/3613905.3648641

  4. Simple download service (Atom) of the dataset: Carentan PPRL Hazard Zone

    • data.europa.eu
    • gimi9.com
    unknown
    Updated Apr 26, 2021
    Cite
    (2021). Simple download service (Atom) of the dataset: Carentan PPRL Hazard Zone [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-246be8b1-9842-4098-b355-d1d8cea627b6
    Explore at:
    unknown
    Dataset updated
    Apr 26, 2021
    Description

    Area exposed to one or more hazards represented on the hazard map used for risk analysis of the RPP. The hazard map is the result of the study of hazards, the objective of which is to assess the intensity of each hazard at any point in the study area. The evaluation method is specific to each hazard type. It leads to the delimitation of a set of areas on the study perimeter constituting a zoning graduated according to the level of the hazard. The allocation of a hazard level at a given point in the territory takes into account the probability of occurrence of the dangerous phenomenon and its degree of intensity. For multi-hazard PPRNs, each zone is usually identified on the hazard map by a code for each hazard to which it is exposed.

    All hazard areas shown on the hazard map are included. Areas protected by protective structures must be represented (possibly in a specific way) as they are always considered subject to hazard (case of breakage or inadequacy of the structure). Hazard zones can be described as developed data to the extent that they result from a synthesis using multiple sources of calculated, modelled or observed hazard data. These source data are not concerned by this class of objects but by another standard dealing with the knowledge of hazards. Some areas within the study area are considered "no or insignificant hazard zones". These are the areas where the hazard has been studied and is nil. These areas are not included in the object class and do not have to be represented as hazard zones. However, in the case of natural RPPs, regulatory zoning may classify certain areas not exposed to hazard as prescribing areas (see definition of the PPR class).

  5. Simple download service (Atom) of the dataset: Hazard zone of the PPRN SAONE AVAL

    • data.europa.eu
    Updated Jan 25, 2022
    Cite
    (2022). Simple download service (Atom) of the dataset: Hazard zone of the PPRN SAONE AVAL [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-d68ed7f8-d177-4e80-b359-f56820231555
    Explore at:
    inspire download service
    Dataset updated
    Jan 25, 2022
    Description

    Area exposed to one or more hazards represented on the hazard map used for risk analysis of the RPP. The hazard map is the result of the study of hazards, the objective of which is to assess the intensity of each hazard at any point in the study area. The evaluation method is specific to each hazard type. It leads to the delimitation of a set of areas on the study perimeter constituting a zoning graduated according to the level of the hazard. The assignment of a hazard level at a given point in the territory takes into account the probability of occurrence of the dangerous phenomenon and its degree of intensity. For multi-hazard PPRNs, each zone is usually identified on the hazard map by a code for each hazard to which it is exposed. All hazard areas shown on the hazard map are included. Areas protected by protective structures must be represented (possibly in a specific way) as they are always considered to be subject to hazard (cases of breakage or inadequacy of the structure). The hazard zones may be classified as compiled data in so far as they result from a synthesis using several sources of calculated, modelled or observed hazard data. These source data are not covered by this class of objects but by another standard dealing with the knowledge of hazards. Some areas of the study perimeter are considered "zero or insignificant hazard zones". These are the areas where the hazard has been studied and is nil. These areas are not included in the object class and do not have to be represented as hazard zones. However, in the case of natural RPPs, regulatory zoning may classify certain areas not exposed to hazard as prescribing areas (see definition of the PPR class).

  6. Cyclist Dataset for Object Detection

    • kaggle.com
    zip
    Updated Mar 15, 2022
    Cite
    SemiEmptyGlass (2022). Cyclist Dataset for Object Detection [Dataset]. https://www.kaggle.com/datasets/semiemptyglass/cyclist-dataset
    Explore at:
    zip (2319730694 bytes)
    Dataset updated
    Mar 15, 2022
    Authors
    SemiEmptyGlass
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) (https://creativecommons.org/licenses/by-nc-sa/3.0/)
    License information was derived automatically

    Description

    Cyclist Dataset

    Tsinghua-Daimler Cyclist Detection Benchmark Dataset in yolo format for Object Detection

    Context

    I'm not the owner of this dataset; all the credit goes to X. Li, F. Flohr, Y. Yang, H. Xiong, M. Braun, S. Pan, K. Li and D. M. Gavrila, the creators of this dataset.

    Content

    • img size - 2048x1024
    • 13.7k labeled images (1000 images have no cyclists)
    • labels in yolo format: id center_x center_y width height (relative to image width and height)

    Example yolo bounding box:

    0 0.41015625 0.44140625 0.0341796875 0.11328125
    
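    For illustration, a minimal sketch converting a label line in this format to pixel coordinates for the 2048x1024 images (using the example line above):

    # Minimal sketch: YOLO-format label (relative centre/size) to pixel corner coordinates.
    IMG_W, IMG_H = 2048, 1024

    line = "0 0.41015625 0.44140625 0.0341796875 0.11328125"
    cls, cx, cy, w, h = line.split()
    cx, cy = float(cx) * IMG_W, float(cy) * IMG_H
    w, h = float(w) * IMG_W, float(h) * IMG_H

    x_min, y_min = cx - w / 2, cy - h / 2
    x_max, y_max = cx + w / 2, cy + h / 2
    print(int(cls), x_min, y_min, x_max, y_max)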

    Acknowledgments

    License Terms

    This dataset is made freely available for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use, copy, and distribute the data provided that you agree:

    • That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, Daimler (or the website host) does not accept any responsibility for errors or omissions.
    • That you include a reference to the above publication in any published work that makes use of the dataset.
    • That if you have altered the content of the dataset or created derivative work, prominent notices are made so that any recipients know that they are not receiving the original data.
    • That you may not use or distribute the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.
    • That this original license notice is retained with all copies or derivatives of the dataset.
    • That all rights not expressly granted to you are reserved by Daimler.

    Cite

    X. Li, F. Flohr, Y. Yang, H. Xiong, M. Braun, S. Pan, K. Li and D. M. Gavrila. A New Benchmark for Vision-Based Cyclist Detection. In Proc. of the IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, pp.1028-1033, 2016.
    
  7. Data from: HL Dataset: Visually-grounded Description of Scenes, Actions and Rationales

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Feb 28, 2024
    Cite
    Michele Cafagna; Kees van Deemter; Albert Gatt (2024). HL Dataset: Visually-grounded Description of Scenes, Actions and Rationales [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10723070
    Explore at:
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    University of Malta
    University of Utrecht
    Authors
    Michele Cafagna; Kees van Deemter; Albert Gatt
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Current captioning datasets focus on object-centric captions, describing the visible objects in the image, often ending up stating the obvious (for humans), e.g. "people eating food in a park". Although these datasets are useful for evaluating the ability of Vision & Language models to recognize and describe visual content, they do not support controlled experiments involving model testing or fine-tuning with more high-level captions, which humans find easy and natural to produce. For example, people often describe images based on the type of scene they depict ("people at a holiday resort") and the actions they perform ("people having a picnic"). Such concepts are based on personal experience and contribute to forming common sense assumptions. We present the High-Level Dataset, a dataset extending 14,997 images from the COCO dataset, aligned with a new set of 134,973 human-annotated (high-level) captions collected along three axes: scenes, actions and rationales. We further extend this dataset with confidence scores collected from an independent set of readers, as well as a set of narrative captions generated synthetically by combining each of the three axes. We describe this dataset and analyse it extensively. We also present baseline results for the High-Level Captioning task.

  8. OCID – Object Clutter Indoor Dataset

    • researchdata.tuwien.at
    application/gzip
    Updated Jul 3, 2025
    Cite
    Jean-Baptiste Nicolas Weibel; Markus Suchi (2025). OCID – Object Clutter Indoor Dataset [Dataset]. http://doi.org/10.48436/pcbjd-4wa12
    Explore at:
    application/gzip
    Dataset updated
    Jul 3, 2025
    Dataset provided by
    TU Wien
    Authors
    Jean-Baptiste Nicolas Weibel; Markus Suchi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    May 20, 2019
    Description

    OCID – Object Clutter Indoor Dataset

    Developing robot perception systems for handling objects in the real-world requires computer vision algorithms to be carefully scrutinized with respect to the expected operating domain. This demands large quantities of ground truth data to rigorously evaluate the performance of algorithms.

    The Object Clutter Indoor Dataset is an RGB-D dataset containing point-wise labeled point clouds for each object. The data was captured using two ASUS-PRO Xtion cameras positioned at different heights. It captures diverse settings of objects, background, context, sensor-to-scene distance, viewpoint angle and lighting conditions. The main purpose of OCID is to allow systematic comparison of existing object segmentation methods in scenes with increasing amounts of clutter. In addition, OCID also provides ground-truth data for other vision tasks like object classification and recognition.

    OCID comprises 96 fully built-up cluttered scenes. Each scene is a sequence of labeled point clouds created by incrementally building an increasingly cluttered scene, adding one object after another. The first item in a sequence contains no objects, the second one object, and so on up to the final count of added objects.

    Dataset

    The dataset uses 89 different objects chosen as representatives from the Autonomous Robot Indoor Dataset (ARID) [1] classes and the YCB Object and Model Set (YCB) [2].

    The ARID20 subset contains scenes including up to 20 objects from ARID. The ARID10 and YCB10 subsets include cluttered scenes with up to 10 objects from ARID and the YCB objects respectively. The scenes in each subset are composed of objects from only one set at a time to maintain separation between datasets. Scene variation includes different floor (plastic, wood, carpet) and table textures (wood, orange striped sheet, green patterned sheet). The complete set of data provides 2346 labeled point-clouds.

    OCID subsets are structured so that specific real-world factors can be individually assessed.

    ARID20-structure

    • location: floor, table
    • view: bottom, top
    • scene: sequence-id
    • free: clearly separated (objects 1-9 in corresponding sequence)
    • touching: physically touching (objects 10-16 in corresponding sequence)
    • stacked: on top of each other (objects 17-20 in corresponding sequence)

    ARID10-structure

    • location: floor, table
    • view: bottom, top
    • box: objects with sharp edges (e.g. cereal-boxes)
    • curved: objects with smooth curved surfaces (e.g. ball)
    • mixed: objects from both the box and curved
    • fruits: fruit and vegetables
    • non-fruits: mixed objects without fruits
    • scene: sequence-id

    YCB10-structure

    • location: floor, table
    • view: bottom, top
    • box: objects with sharp edges (e.g. cereal-boxes)
    • curved: objects with smooth curved surfaces (e.g. ball)
    • mixed: objects from both the box and curved
    • scene: sequence-id

    Structure:

    You can find all labeled pointclouds of the ARID20 dataset for the first sequence on a table recorded with the lower mounted camera in this directory:

    ./ARID20/table/bottom/seq01/pcd/

    In addition to labeled organized point-cloud files, corresponding depth, RGB and 2d-label-masks are available:

    • pcd: 640×480 organized XYZRGBL point-cloud file with ground truth
    • rgb: 640×480 RGB png-image
    • depth: 640×480 16-bit png-image with depth in mm
    • label: 640×480 16-bit png-image with a unique integer label for each object at each pixel (see the loading sketch below)
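
    A minimal loading sketch for the depth and label images, assuming OpenCV and illustrative file names (IMREAD_UNCHANGED preserves the 16-bit values):

    # Minimal sketch: read a 16-bit depth image and its per-pixel label mask.
    import cv2
    import numpy as np

    depth = cv2.imread("frame_depth.png", cv2.IMREAD_UNCHANGED)  # depth in mm, uint16
    label = cv2.imread("frame_label.png", cv2.IMREAD_UNCHANGED)  # integer object id per pixel

    object_ids = np.unique(label)
    print("object ids in frame (0 assumed to be background):", object_ids)
    print("depth range [mm]:", depth.min(), depth.max())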

    Dataset creation using EasyLabel:

    OCID was created using EasyLabel โ€“ a semi-automatic annotation tool for RGBD-data. EasyLabel processes recorded sequences of organized point-cloud files and exploits incrementally built up scenes, where in each take one additional object is placed. The recorded point-cloud data is then accumulated and the depth difference between two consecutive recordings are used to label new objects. The code is available here.

    OCID data for instance recognition/classification

    For ARID10 and ARID20 there is additional data available usable for object recognition and classification tasks. It contains semantically annotated RGB and depth image crops extracted from the OCID dataset.

    The structure is as follows:

    • type: depth, RGB
    • class name: e.g. banana, kleenex, ...
    • class instance: e.g. banana_1, banana_2, kleenex_1, kleenex_2, ...

    The data is provided by Mohammad Reza Loghmani.

    Research paper

    If you found our dataset useful, please cite the following paper:

    @inproceedings{DBLP:conf/icra/SuchiPFV19,
      author    = {Markus Suchi and Timothy Patten and David Fischinger and Markus Vincze},
      title     = {EasyLabel: {A} Semi-Automatic Pixel-wise Object Annotation Tool for Creating Robotic {RGB-D} Datasets},
      booktitle = {International Conference on Robotics and Automation, {ICRA} 2019, Montreal, QC, Canada, May 20-24, 2019},
      pages     = {6678--6684},
      year      = {2019},
      crossref  = {DBLP:conf/icra/2019},
      url       = {https://doi.org/10.1109/ICRA.2019.8793917},
      doi       = {10.1109/ICRA.2019.8793917},
      timestamp = {Tue, 13 Aug 2019 20:25:20 +0200},
      biburl    = {https://dblp.org/rec/bib/conf/icra/SuchiPFV19},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }

    @proceedings{DBLP:conf/icra/2019,
      title     = {International Conference on Robotics and Automation, {ICRA} 2019, Montreal, QC, Canada, May 20-24, 2019},
      publisher = {{IEEE}},
      year      = {2019},
      url       = {http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=8780387},
      isbn      = {978-1-5386-6027-0},
      timestamp = {Tue, 13 Aug 2019 20:23:21 +0200},
      biburl    = {https://dblp.org/rec/bib/conf/icra/2019},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }

    Contact & credits

    For any questions or issues with the OCID-dataset, feel free to contact the author:

    • Markus Suchi โ€“ email: suchi@acin.tuwien.ac.at
    • Tim Patten โ€“ email: patten@acin.tuwien.ac.at

    For specific questions about the OCID-semantic crops data please contact:

    • Mohammad Reza Loghmani โ€“ email: loghmani@acin.tuwien.ac.at

    References

    [1] Loghmani, Mohammad Reza et al. "Recognizing Objects in-the-Wild: Where do we Stand?" 2018 IEEE International Conference on Robotics and Automation (ICRA) (2018): 2170-2177.

    [2] Berk Calli, Arjun Singh, James Bruce, Aaron Walsman, Kurt Konolige, Siddhartha Srinivasa, Pieter Abbeel, Aaron M. Dollar, Yale-CMU-Berkeley dataset for robotic manipulation research, The International Journal of Robotics Research, vol. 36, Issue 3, pp. 261–268, April 2017.

  9. Vehicle Detection Dataset image

    • kaggle.com
    zip
    Updated May 29, 2025
    Cite
    Daud shah (2025). Vehicle Detection Dataset image [Dataset]. https://www.kaggle.com/datasets/daudshah/vehicle-detection-dataset
    Explore at:
    zip (545957939 bytes)
    Dataset updated
    May 29, 2025
    Authors
    Daud shah
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Vehicle Detection Dataset

    This dataset is designed for vehicle detection tasks, featuring a comprehensive collection of images annotated for object detection. This dataset, originally sourced from Roboflow (https://universe.roboflow.com/object-detection-sn8ac/ai-traffic-system), was exported on May 29, 2025, at 4:59 PM GMT and is now publicly available on Kaggle under the CC BY 4.0 license.

    Overview

    • Purpose: The dataset supports the development of computer vision models for detecting various types of vehicles in traffic scenarios.
    • Classes: The dataset includes annotations for 7 vehicle types:
      • Bicycle
      • Bus
      • Car
      • Motorbike
      • Rickshaw
      • Truck
      • Van
    • Number of Images: The dataset contains 9,440 images, split into training, validation, and test sets:
      • Training: Images located in ../train/images
      • Validation: Images located in ../valid/images
      • Test: Images located in ../test/images
    • Annotation Format: Images are annotated in YOLOv11 format, suitable for training state-of-the-art object detection models.
    • Pre-processing: Each image has been resized to 640x640 pixels (stretched). No additional image augmentation techniques were applied. (A training sketch using these splits is shown below.)
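
    A minimal training sketch, assuming the Ultralytics package and a hypothetical data.yaml that points at the train/valid/test folders and lists the seven classes (the dataset page itself only specifies YOLOv11-format labels):

    # Minimal sketch: train and validate a detector on this dataset with Ultralytics.
    from ultralytics import YOLO

    model = YOLO("yolo11n.pt")                           # small pretrained YOLO11 checkpoint
    model.train(data="data.yaml", epochs=50, imgsz=640)  # images are already 640x640
    metrics = model.val()                                # evaluate on the validation split
    print(metrics.box.map50)                             # mAP@0.5 on the validation split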

    Source and Creation

    This dataset was created and exported via Roboflow, an end-to-end computer vision platform that facilitates collaboration, image collection, annotation, dataset creation, model training, and deployment. The dataset is part of the ai-traffic-system project (version 1) under the workspace object-detection-sn8ac. For more details, visit: https://universe.roboflow.com/object-detection-sn8ac/ai-traffic-system/dataset/1.

    Usage

    This dataset is ideal for researchers, data scientists, and developers working on vehicle detection and traffic monitoring systems. It can be used to:

    • Train and evaluate deep learning models for object detection, particularly using the YOLOv11 framework.
    • Develop AI-powered traffic management systems, autonomous driving applications, or urban mobility solutions.
    • Explore computer vision techniques for real-world traffic scenarios.

    For advanced training notebooks compatible with this dataset, check out: https://github.com/roboflow/notebooks. To explore additional datasets and pre-trained models, visit: https://universe.roboflow.com.

    License

    The dataset is licensed under CC BY 4.0, allowing for flexible use, sharing, and adaptation, provided appropriate credit is given to the original source.

    This dataset is a valuable resource for building robust vehicle detection models and advancing computer vision applications in traffic systems.

  10. Medical Image DataSet: Brain Tumor Detection

    • kaggle.com
    zip
    Updated Feb 10, 2025
    Cite
    Parisa Karimi Darabi (2025). Medical Image DataSet: Brain Tumor Detection [Dataset]. https://www.kaggle.com/datasets/pkdarabi/medical-image-dataset-brain-tumor-detection
    Explore at:
    zip (311417066 bytes)
    Dataset updated
    Feb 10, 2025
    Authors
    Parisa Karimi Darabi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Medical Image Dataset: Brain Tumor Detection

    The Brain Tumor MRI dataset, curated by Roboflow Universe, is a comprehensive dataset designed for the detection and classification of brain tumors using advanced computer vision techniques. It comprises 3,903 MRI images categorized into four distinct classes:

    • Glioma: A tumor originating from glial cells in the brain.
    • Meningioma: Tumors arising from the meninges, the protective layers surrounding the brain and spinal cord.
    • Pituitary Tumor: Tumors located in the pituitary gland, affecting hormonal balance.
    • No Tumor: MRI scans that do not exhibit any tumor presence.

    Each image in the dataset is annotated with bounding boxes to indicate tumor locations, facilitating precise object detection. The dataset is structured into training (70%), validation (20%), and test (10%) sets, ensuring a robust framework for model development and evaluation.

    The primary goal of this dataset is to aid in the early detection and diagnosis of brain tumors, contributing to improved treatment planning and patient outcomes. By offering a diverse range of annotated MRI images, this dataset enables researchers and practitioners to develop and fine-tune computer vision models with high accuracy in identifying and localizing brain tumors.

    This dataset supports multiple annotation formats, including YOLOv8, YOLOv9, and YOLOv11, making it versatile and compatible with various machine-learning frameworks. Its integration with these formats ensures real-time and efficient object detection, ideal for applications requiring timely and precise results.

    By leveraging this dataset, researchers and healthcare professionals can make significant strides in developing cutting-edge AI solutions for medical imaging, ultimately supporting more effective and accurate diagnoses in clinical settings.


  11. Jurisdictional Unit (Public) - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    Cite
    (2024). Jurisdictional Unit (Public) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/jurisdictional-unit-public
    Explore at:
    Dataset updated
    Feb 28, 2024
    Description

    Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM. This is a feature service which provides Identify and Copy Feature capabilities. If fast-drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.

    Overview

    The Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:

    • There may be multiple owner names.
    • Jurisdiction may be held jointly by agencies at different levels of government (i.e. State and Local), especially on private lands.
    • Some owner names may be blocked for security reasons.
    • Some jurisdictions may not allow the distribution of owner names.

    Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null, JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, and LandownerKind="Private", LandownerCategory="Private". All land inside the US country boundary is covered by a polygon. Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases. For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (e.g. Northern California District, Boise National Forest). These data are used to automatically populate fields on the WFDSS Incident Information page. This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.

    Relevant NWCG Definitions and Standards

    Unit: A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional. Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc.) can be derived from a unit based on organization hierarchy.

    Unit, Jurisdictional: The governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law. Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander. See also: Unit, Protecting; Landowner.

    Unit Identifier: This data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.

    Landowner Kind & Category: This data standard provides a two-tier classification (kind and category) of landownership.

    Attribute Fields

    • JurisdictionalAgencyKind: Describes the type of unit jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal, and Other. A value may not be populated for all polygons.
    • JurisdictionalAgencyCategory: Describes the type of unit jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.
    • JurisdictionalUnitName: The name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the "Unit Name" or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.
    • JurisdictionalUnitID: Where it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is 'Null'. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.
    • LandownerKind: The landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.
    • LandownerCategory: The landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.
    • DataSource: The database from which the polygon originated. Be as specific as possible; identify the geodatabase name and feature class in which the polygon originated.
    • SecondaryDataSource: If the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, Secondary Source would be "Surface Management Agency (SMA)".
    • SourceUniqueID: Identifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.
    • MapMethod: Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Methods by default for this layer as the data are from mixed sources. Valid values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; Digitized-Topo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; Other.
    • DateCurrent: The last edit or update of this GIS record. Dates should follow the assigned NWCG Date Time data standard, using a 24-hour clock, YYYY-MM-DDhh.mm.ssZ, ISO 8601 Standard.
    • Comments: Additional information describing the feature.
    • GeometryID: Primary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.
    • JurisdictionalUnitID_sansUS: NWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.
    • JoinMethod: Additional information on how the polygon was matched to information in the NWCG Unit ID database.
    • LocalName: Local name for the polygon provided from PAD-US or other source.
    • LegendJurisdictionalAgency: Jurisdictional Agency, but smaller landholding agencies, or agencies of indeterminate status, are grouped for more intuitive use in a map legend or summary table.
    • LegendLandownerAgency: Landowner Agency, but smaller landholding agencies, or agencies of indeterminate status, are grouped for more intuitive use in a map legend or summary table.
    • DataSourceYear: Year that the source data for the polygon were acquired.

    Data Input

    This dataset is based on an aggregation of 4 spatial data sources: Protected Areas Database US (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group Geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.

    PAD-US 2.1: This dataset is based in large part on the USGS Protected Areas Database of the United States (PAD-US 2.1). PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.

    How these data were aggregated: Boundaries, and their descriptors, available in spatial databases (i.e. shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).

    BIA and Tribal Data: BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022. Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: these data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The

  12. SH17 Dataset for PPE Detection

    • kaggle.com
    • data.niaid.nih.gov
    zip
    Updated Jul 3, 2024
    Cite
    mughees (2024). SH17 Dataset for PPE Detection [Dataset]. https://www.kaggle.com/datasets/mugheesahmad/sh17-dataset-for-ppe-detection
    Explore at:
    zip (14096291832 bytes)
    Dataset updated
    Jul 3, 2024
    Authors
    mughees
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    We propose the Safe Human dataset, consisting of 17 different object classes and referred to as the SH17 dataset. We scraped images from the Pexels website, which offers clear usage rights for all its images (https://www.pexels.com/license/), showcasing a range of human activities across diverse industrial operations.

    To extract relevant images, we used multiple queries such as manufacturing worker, industrial worker, human worker, labor, etc. The tags associated with Pexels images proved reasonably accurate. After removing duplicate samples, we obtained a dataset of 8,099 images. The dataset exhibits significant diversity, representing manufacturing environments globally, thus minimizing potential regional or racial biases. Samples of the dataset are shown below.

    Paper available on arXiv: https://arxiv.org/abs/2407.04590

    GitHub link: https://github.com/ahmadmughees/SH17dataset

    Key features

    • Collected from diverse industrial environments globally
    • High quality images (max resolution 8192x5462, min 1920x1002)
    • Average of 9.38 instances per image
    • Includes small objects like ears and earmuffs (39,764 annotations < 1% image area, 59,025 annotations < 5% area)

    Classes

    1. Person
    2. Head
    3. Face
    4. Glasses
    5. Face-mask-medical
    6. Face-guard
    7. Ear
    8. Earmuffs
    9. Hands
    10. Gloves
    11. Foot
    12. Shoes
    13. Safety-vest
    14. Tools
    15. Helmet
    16. Medical-suit
    17. Safety-suit

    The data consists of three folders and two split files (see the loading sketch after this list):

    • images: contains all images
    • labels: contains labels in YOLO format for all images
    • voc_labels: contains labels in VOC format for all images
    • train_files.txt: list of all images we used for training
    • val_files.txt: list of all images we used for validation
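
    A minimal sketch for pairing training images with their YOLO labels, assuming train_files.txt lists one image file name per line and that label files reuse the image base name with a .txt extension:

    # Minimal sketch: build (image, label) pairs from the folder layout described above.
    from pathlib import Path

    root = Path("sh17")  # hypothetical dataset root containing images/, labels/, train_files.txt

    with (root / "train_files.txt").open() as f:
        train_images = [line.strip() for line in f if line.strip()]

    pairs = []
    for name in train_images:
        img = root / "images" / name
        lbl = root / "labels" / (Path(name).stem + ".txt")
        if lbl.exists():
            pairs.append((img, lbl))

    print(f"{len(pairs)} image/label pairs found for training")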

    Disclaimer and Responsible Use:

    This dataset, scraped from the Pexels website, is intended for educational, research, and analysis purposes only. The data may be used only for training machine learning models. Users are urged to use this data responsibly, ethically, and within the bounds of legal stipulations.

    Users should adhere to Copyright Notice of Pexels when utilizing this dataset.

    Legal Simplicity: All photos and videos on Pexels can be downloaded and used for free.

    Allowed

    • All photos and videos on Pexels are free to use.
    • Attribution is not required. Giving credit to the photographer or Pexels is not necessary but always appreciated.
    • You can modify the photos and videos from Pexels. Be creative and edit them as you like.

    Not allowed

    • Identifiable people may not appear in a bad light or in a way that is offensive.
    • Don't sell unaltered copies of a photo or video, e.g. as a poster, print or on a physical product without modifying it first.
    • Don't imply endorsement of your product by people or brands on the imagery.
    • Don't redistribute or sell the photos and videos on other stock photo or wallpaper platforms.
    • Don't use the photos or videos as part of your trade-mark, design-mark, trade-name, business name or service mark.

    No Warranty Disclaimer:

    The dataset is provided "as is," without warranty, and the creator disclaims any legal liability for its use by others.

    Ethical Use:

    Users are encouraged to consider the ethical implications of their analyses and the potential impact on the broader community.

    GitHub Page:

    https://github.com/ahmadmughees/SH17dataset

    Citation:

    @misc{ahmad2024sh17datasethumansafety,
       title={SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry}, 
       author={Hafiz Mughees Ahmad and Afshin Rahimi},
       year={2024},
       eprint={2407.04590},
       archivePrefix={arXiv},
       primaryClass={cs.CV},
       url={https://arxiv.org/abs/2407.04590}, 
    }
    


  13. Dataset Direct Download Service (WFS): PPRMT Pontoise (Area area)

    • gimi9.com
    • data.europa.eu
    Cite
    Dataset Direct Download Service (WFS): PPRMT Pontoise (Area area) [Dataset]. https://gimi9.com/dataset/eu_fr-120066022-srv-6ba595b5-8db8-47cd-b403-b62f3ea383a8
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Pontoise
    Description

    Area exposed to one or more hazards represented on the hazard map used for risk analysis of the RPP. The hazard map is the result of the study of hazards, the objective of which is to assess the intensity of each hazard at any point in the study area. The evaluation method is specific to each hazard type. It leads to the delimitation of a set of areas on the study perimeter constituting a zoning graduated according to the level of the hazard. The assignment of a hazard level at a given point in the territory takes into account the probability of occurrence of the dangerous phenomenon and its degree of intensity. For multi-hazard PPRNs, each zone is usually identified on the hazard map by a code for each hazard to which it is exposed. All hazard areas shown on the hazard map are included. Areas protected by protective structures must be represented (possibly in a specific way) as they are always considered to be subject to hazard (cases of breakage or inadequacy of the structure). The hazard zones may be classified as compiled data in so far as they result from a synthesis using several sources of calculated, modelled or observed hazard data. These source data are not covered by this class of objects but by another standard dealing with the knowledge of hazards. Some areas of the study perimeter are considered "zero or insignificant hazard zones". These are the areas where the hazard has been studied and is nil. These areas are not included in the object class and do not have to be represented as hazard zones. However, in the case of natural RPPs, regulatory zoning may classify certain areas not exposed to hazard as prescribing areas (see definition of the PPR class).

  14. Retail Market Basket Transactions Dataset

    • kaggle.com
    Updated Aug 25, 2025
    Cite
    Wasiq Ali (2025). Retail Market Basket Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/wasiqaliyasir/retail-market-basket-transactions-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 25, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Wasiq Ali
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Overview

    The Market_Basket_Optimisation dataset is a classic transactional dataset often used in association rule mining and market basket analysis.
    It consists of multiple transactions where each transaction represents the collection of items purchased together by a customer in a single shopping trip.

    • File Name: Market_Basket_Optimisation.csv
    • Format: CSV (Comma-Separated Values)
    • Structure: Each row corresponds to one shopping basket. Each column in that row contains an item purchased in that basket.
    • Nature of Data: Transactional, categorical, sparse.
    • Primary Use Case: Discovering frequent itemsets and association rules to understand shopping patterns, product affinities, and to build recommender systems.

    Detailed Information

    Dataset Composition

    • Transactions: 7,501 (each row = one basket).
    • Items (unique): Around 120 distinct products (e.g., bread, mineral water, chocolate, etc.).
    • Columns per row: Up to 20 items (row length is not fixed; most baskets contain fewer).
    • Data Type: Purely categorical (no numerical or continuous features).
    • Missing Values: Present in the form of empty cells (since not every basket has all 20 columns).
    • Duplicates: Some baskets may appear more than once; this is acceptable in transactional data, as multiple customers can buy the same set of items.

    ๐Ÿ›’ Nature of Transactions

    • Basket Definition: Each row captures items bought together during a single visit to the store.
    • Variability: Basket size varies from 1 to 20 items. Some customers buy only one product, while others purchase a full set of groceries.
    • Sparsity: Since there are ~120 unique items but only a handful appear in each basket, the dataset is sparse. Most entries in the one-hot encoded representation are zeros.

    ๐Ÿ”Ž Examples of Data

    Example transaction rows (simplified):

    Item 1          Item 2          Item 3      Item 4    ...
    Bread           Butter          Jam
    Mineral water   Chocolate       Eggs        Milk
    Spaghetti       Tomato sauce    Parmesan

    Here, empty cells mean no item was purchased in that slot.
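
    To make this layout concrete, here is a minimal Python sketch, assuming Market_Basket_Optimisation.csv sits in the working directory, that reads the variable-width rows into per-basket item lists and builds the sparse one-hot table mentioned under Sparsity above; pandas is used only for convenience.

    import csv
    import pandas as pd

    # Read the variable-width CSV: each row is one basket; empty trailing cells are dropped.
    with open("Market_Basket_Optimisation.csv", newline="") as f:
        transactions = [[item for item in row if item.strip()] for row in csv.reader(f)]

    # One-hot encode: one row per basket, one boolean column per distinct item.
    all_items = sorted({item for basket in transactions for item in basket})
    onehot = pd.DataFrame(
        [{item: (item in basket) for item in all_items} for basket in transactions]
    )

    print(onehot.shape)              # roughly (7501, ~120), per the figures above
    print(onehot.sum().nlargest(5))  # most frequently purchased items

    The same transactions list is reused in the association-rule sketch further below.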

    ๐Ÿ“ˆ Applications of This Dataset

    This dataset is frequently used in data mining, analytics, and recommendation systems. Common applications include:

    1. Association Rule Mining (Apriori, FP-Growth):

      • Discover rules like {Bread, Butter} โ‡’ {Jam} with high support and confidence (a minimal support/confidence sketch follows this list).
      • Identify cross-selling opportunities.
    2. Product Affinity Analysis:

      • Understand which items tend to be purchased together.
      • Helps with store layout decisions (placing related items near each other).
    3. Recommendation Engines:

      • Build systems that suggest "You may also like" products.
      • Example: If a customer buys pasta and tomato sauce, recommend cheese.
    4. Marketing Campaigns:

      • Bundle promotions and discounts on frequently co-purchased products.
      • Personalized offers based on buying history.
    5. Inventory Management:

      • Anticipate demand for certain product combinations.
      • Prevent stockouts of items that drive the purchase of others.
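
    As referenced in application 1 above, the following minimal, library-free sketch computes support and confidence for one candidate rule. It reuses the transactions list from the loading sketch earlier in this description; the item spellings are illustrative and must match the strings actually present in the CSV. Libraries such as mlxtend (apriori, association_rules) automate the same computation over all frequent itemsets.

    def support(itemset, transactions):
        """Fraction of baskets that contain every item in `itemset`."""
        itemset = set(itemset)
        hits = sum(1 for basket in transactions if itemset.issubset(basket))
        return hits / len(transactions)

    def confidence(antecedent, consequent, transactions):
        """Estimated P(consequent | antecedent) over the baskets."""
        base = support(antecedent, transactions)
        if base == 0:
            return 0.0  # antecedent never occurs, so the rule is undefined
        return support(set(antecedent) | set(consequent), transactions) / base

    # Candidate rule {Bread, Butter} => {Jam}; illustrative item names.
    rule_from, rule_to = {"Bread", "Butter"}, {"Jam"}
    print("support:", support(rule_from | rule_to, transactions))
    print("confidence:", confidence(rule_from, rule_to, transactions))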

    ๐Ÿ“Œ Key Insights Potentially Hidden in the Dataset

    • Popular Items: Some items (like mineral water, eggs, spaghetti) occur far more frequently than others.
    • Product Pairs: Frequent pairs and triplets (e.g., pasta + sauce + cheese) reflect natural meal-prep combinations.
    • Basket Size Distribution: Most customers buy fewer than 5 items, but a small fraction buy 10+ items, showing long-tail behavior.
    • Seasonality (if extended with timestamps): Certain items might show peaks in demand during weekends or holidays (though timestamps are not included in this dataset).

    ๐Ÿ“‚ Dataset Limitations

    1. No Customer Identifiers:

      • We cannot track repeated purchases by the same customer.
      • Analysis is limited to basket-level insights.
    2. No Timestamps:

      • No temporal analysis (trends over time, seasonality) is possible.
    3. No Quantities or Prices:

      • We only know whether an item was purchased, not how many units or its cost.
    4. Sparse & Noisy:

      • Many baskets are small (1โ€“2 items), which may produce weak or trivial rules.

    ๐Ÿ”ฎ Potential Extensions

    • Synthetic Timestamps: Assign simulated timestamps to study temporal buying patterns.
    • Add Customer IDs: If merged with external data, one can perform personalized recommendations.
    • Price Data: Adding cost allows for profit-driven association rules (not just frequency-based).
    • Deep Learning Models: Sequence models (RNNs, Transformers) could be applied if temporal ordering of items is introduced.

    ...

  15. Cartographic Sign Detection Dataset (CaSiDD)

    • zenodo.org
    bin, txt, zip
    Updated Aug 27, 2025
    Cite
    Remi Petitpierre; Remi Petitpierre; Jiaming Jiang; Jiaming Jiang (2025). Cartographic Sign Detection Dataset (CaSiDD) [Dataset]. http://doi.org/10.5281/zenodo.16925731
    Explore at:
    Available download formats: txt, zip, bin
    Dataset updated
    Aug 27, 2025
    Dataset provided by
    EPFL
    Authors
    Remi Petitpierre; Remi Petitpierre; Jiaming Jiang; Jiaming Jiang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Aug 27, 2025
    Description

    The Cartographic Sign Detection Dataset (CaSiDD) comprises 796 manually annotated historical map samples, corresponding to 18,750 cartographic signs, such as icons and symbols. Moreover, the signs are categorized into 24 distinct classes, such as tree, mill, hill, religious edifice, or grave. The original images are part of the Semap dataset [1].

    The dataset is published in the context of R. Petitpierre's PhD thesis: Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration [2]. Details on the annotation process and statistics on the annotated cartographic signs are provided in the manuscript.

    Organization of the data

    The data is organized following the YOLO dataset layout (one plain-text label file per image, with normalized bounding-box coordinates), as described below.

    project_root/
    ├── classes.txt
    ├── images/
    │   ├── train/
    │   │   ├── image1.png
    │   │   └── image2.png
    │   └── val/
    │       ├── image3.png
    │       └── image4.png
    └── labels/
        ├── train/
        │   ├── image1.txt
        │   └── image2.txt
        └── val/
            ├── image3.txt
            └── image4.txt

    Label syntax

    The labels are stored in separate text files, one for each image. In the text files, object classes and coordinates are stored line by line, using the following syntax:

    class_id x_center y_center width height

    Where x is the horizontal axis; coordinates and dimensions are expressed relative to the width and height of the labeled image. Example:

    13 0.095339 0.271003 0.061719 0.027161
    1 0.154258 0.490052 0.017370 0.019010
    8 0.317982 0.556484 0.017370 0.014063
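
    A minimal parsing sketch for these label files, assuming Pillow is available to read the image dimensions; the file paths follow the example layout above and are illustrative. Class ids can be mapped to names using the table in the Classes section (or classes.txt).

    from pathlib import Path
    from PIL import Image

    def load_boxes(label_path, image_path):
        """Convert normalized (class_id, x_c, y_c, w, h) lines into pixel corner boxes."""
        width, height = Image.open(image_path).size  # Pillow returns (width, height)
        boxes = []
        for line in Path(label_path).read_text().splitlines():
            class_id, x_c, y_c, w, h = line.split()
            x_c, y_c = float(x_c) * width, float(y_c) * height
            w, h = float(w) * width, float(h) * height
            # (class_id, x_min, y_min, x_max, y_max) in pixels
            boxes.append((int(class_id), x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2))
        return boxes

    print(load_boxes("labels/train/image1.txt", "images/train/image1.png"))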

    Classes

    0 battlefield
    1 tree
    2 train (e.g. wagon)
    3 mill (watermill or windmill)
    4 bridge
    5 settlement or building
    6 army
    7 grave
    8 bush
    9 marsh
    10 grass
    11 vine
    12 religious monument
    13 hill/mountain
    14 cannon
    15 rock
    16 tower
    17 signal or survey point
    18 gate (e.g. city gate)
    19 ship/boat/shipwreck
    20 station (e.g. metro/tram/train station)
    21 dam/lock
    22 harbor
    23 well/basin/reservoir
    24 miscellaneous (e.g. post office, spring, hospital, school, etc.)

    Model weights

    A YOLOv10 model yolov10_single_class_model.pt, trained as described in [2], is provided for convenience and reproducibility. The model does not support multi-class object detection. The YOLOv10 implementation used is distributed by Ultralytics [3].
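
    A minimal inference sketch, assuming the Ultralytics package cited in [3] is installed; the weight file name comes from this description, while the sample image path and confidence threshold are illustrative. Since the provided model is single-class, every detection simply marks a cartographic sign.

    from ultralytics import YOLO

    # Load the provided single-class YOLOv10 weights.
    model = YOLO("yolov10_single_class_model.pt")

    # Run detection on one map sample; conf=0.25 is an illustrative threshold.
    results = model.predict("images/val/image3.png", conf=0.25)

    # Normalized (x_center, y_center, width, height) boxes, matching the label syntax above.
    print(results[0].boxes.xywhn)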

    Descriptive statistics

    Number of distinct classes: 24 + misc
    Number of image samples: 796
    Number of annotations: 18,750
    Study period: 1492โ€“1948.

    Use and Citation

    For any mention of this dataset, please cite:

    @misc{casidd_petitpierre_2025,
      author    = {Petitpierre, R{\'{e}}mi and Jiang, Jiaming},
      title     = {{Cartographic Sign Detection Dataset (CaSiDD)}},
      year      = {2025},
      publisher = {EPFL},
      url       = {https://doi.org/10.5281/zenodo.16278380}
    }

    @phdthesis{studying_maps_petitpierre_2025,
      author = {Petitpierre, R{\'{e}}mi},
      title  = {{Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration}},
      year   = {2025},
      school = {EPFL}
    }

    Corresponding author

    Rรฉmi PETITPIERRE - remi.petitpierre@epfl.ch - ORCID - Github - Scholar - ResearchGate

    Work ethics

    85% of the data were annotated by RP. The remainder was annotated by JJ, a master's student from EPFL, Switzerland.

    License

    This project is licensed under the CC BY 4.0 License. See the license_images file for details about the respective reuse policy of digitized map images.

    Liability

    We do not assume any liability for the use of this dataset.

    References

    1. Petitpierre R., Gomez Donoso D., Kriesel B. (2025) Semantic Segmentation Map Dataset (Semap). EPFL. https://doi.org/10.5281/zenodo.16164781
    2. Petitpierre R. (2025) Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration. PhD thesis. ร‰cole Polytechnique Fรฉdรฉrale de Lausanne.
    3. Jocher G. et al. (2024) Ultralytics YOLO. v8.3.39. https://github.com/ultralytics/ultralytics
  16. InternData-A1

    • huggingface.co
    Updated Sep 29, 2025
    Cite
    Intern Robotics (2025). InternData-A1 [Dataset]. https://huggingface.co/datasets/InternRobotics/InternData-A1
    Explore at:
    Dataset updated
    Sep 29, 2025
    Dataset authored and provided by
    Intern Robotics
    Description

    InternData-A1

    InternData-A1 is a hybrid synthetic-real manipulation dataset containing over 630k trajectories and 7,433 hours across 4 embodiments, 18 skills, 70 tasks, and 227 scenes, covering rigid, articulated, deformable, and fluid-object manipulation.

    … See the full description on the dataset page: https://huggingface.co/datasets/InternRobotics/InternData-A1.
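
    A minimal download sketch, assuming the huggingface_hub package; the repository id is taken from the URL above, and the on-disk layout of the trajectories is documented on the dataset page rather than here.

    from huggingface_hub import snapshot_download

    # Fetch (or resume) a local copy of the dataset repository.
    local_dir = snapshot_download(
        repo_id="InternRobotics/InternData-A1",
        repo_type="dataset",
    )
    print("Dataset files downloaded to:", local_dir)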
    
  17. Ufba 425 Wniel Izom Dataset

    • universe.roboflow.com
    zip
    Updated Jul 9, 2025
    Cite
    Roboflow100VL Full (2025). Ufba 425 Wniel Izom Dataset [Dataset]. https://universe.roboflow.com/roboflow100vl-full/ufba-425-wniel-izom
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Roboflow100VL Full
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Variables measured
    Ufba 425 Wniel Izom Bounding Boxes
    Description

    Overview

    Introduction

    The UFBA-425 dataset is designed to support object detection tasks with a variety of unique classes. The dataset contains 15 images and 32 distinct classes identified by numeric codes. Each class corresponds to specific objects that are visually identifiable. The goal is to annotate each object class based on its visual characteristics and ensure precision in object detection.

    Object Classes

    11

    Description

    Class 11 represents objects characterized by their upright, elongated shape, often found in specific environments such as industrial or outdoor landscapes.

    Instructions

    Annotate the entire elongated structure, ensuring to include any visible base or fixture connecting it to the ground. Do not annotate partial or obscured sections unless identifiable.

    12

    Description

    Class 12 objects are distinguished by their flat, rectangular surfaces and sharp, distinct edges. Often used in man-made structures.

    Instructions

    Outline the boundaries of the flat surfaces, paying attention to capture all four corners precisely. Avoid annotating if these objects are stacked, unless clear separation is visible.

    13

    Description

    Objects in class 13 include spheres or rounded shapes that maintain symmetry from multiple perspectives.

    Instructions

    Focus on capturing the outer contour of the spherical shape. Ensure to capture the entirety of the outline even if it extends partially behind another object.

    14

    Description

    Class 14 consists of objects with complex, irregular outlines, often with a textured surface.

    Instructions

    Detail the contour of these complex objects accurately, including any protrusions. Avoid over-simplifying the shape and ensure internal segments remain unannotated unless distinct.

    15

    Description

    Class 15 covers objects with multiple geometric components arranged in a symmetrical pattern.

    Instructions

    Annotate each geometric component, ensuring alignment is consistent with the overall pattern. Do not separate annotations unless components differ from the pattern.

    16

    Description

    Class 16 objects feature prominently in vertical settings with a consistent width throughout.

    Instructions

    Capture the full height of the object, including its base connection. Avoid annotating if the object is severely obstructed or if identification is uncertain.

    17

    Description

    This class includes objects that are commonly found in pairs or groups, exhibiting symmetry.

    Instructions

    Annotate each individual component in the pair or group, ensuring each is distinctly identified. Do not join annotations unless the components are physically connected.

    18

    Description

    Objects with class 18 are identified by their bright surfaces and reflective properties.

    Instructions

    Highlight the reflective surfaces, ensuring boundaries are clearly defined. Exclude reflections not originating from the object itself.

    21

    Description

    Class 21 is dedicated to static objects that have a fixed presence in their environment.

    Instructions

    Identify the static object's position, from ground level to visible extent. Do not include dynamic objects in close proximity unless physically connected.

    22

    Description

    These objects are characterized by dynamic shapes, often fluctuating in form while maintaining a recognizable profile.

    Instructions

    Document the entirety of the object in its current shape, focusing on its most defined features. Avoid annotating incomplete forms or shapes without definitive boundaries.

    23

    Description

    Class 23 involves horizontally extended objects with a shallow vertical profile.

    Instructions

    Delineate the horizontal length meticulously, ensuring the full span is captured. Ignore vertical deviations that do not contribute to the primary horizontal feature.

    24

    Description

    Objects defined by a central core with surrounding features that taper or extend outward.

    Instructions

    The annotation should include the core and tapering features while ensuring the central portion maintains prominence. Avoid isolating peripheral elements unless completely detached.

    25

    Description

    Objects in class 25 consist of layered elements, oriented either vertically or horizontally.

    Instructions

    Each layer should be defined distinctly, with annotat

  18. Dataset Direct Download Service (WFS): Saint-Fromond PPRT hazard zone

    • gimi9.com
    • data.europa.eu
    + more versions
    Cite
    Dataset Direct Download Service (WFS): Saint-Fromond PPRT hazard zone [Dataset]. https://gimi9.com/dataset/eu_fr-120066022-srv-0248fce4-f8f3-4057-889e-c56060ce1b63/
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Saint-Fromond
    Description

    Area exposed to one or more hazards represented on the hazard map used for the risk analysis of the PPR. The hazard map is the result of the hazard study, the objective of which is to assess the intensity of each hazard at any point in the study area. The evaluation method is specific to each hazard type. It leads to the delimitation of a set of areas on the study perimeter constituting a zoning graduated according to the level of hazard. The assignment of a hazard level to a given point in the territory takes into account the probability of occurrence of the dangerous phenomenon and its degree of intensity. For PPRTs, the hazard levels are determined effect by effect, on maps per type of effect, and overall, at an aggregated level, on a synthesis map. All hazard areas shown on the hazard map are included. Areas protected by protective structures must be represented (possibly in a specific way), as they are always considered subject to hazard (in case of breakage or inadequacy of the structure). Hazard zones may be classified as compiled data insofar as they result from a synthesis using multiple sources of calculated, modelled or observed hazard data. These source data are not covered by this object class but by another standard dealing with hazard knowledge. Some areas within the study perimeter are considered โ€œno or insignificant hazard zonesโ€. These are areas where the hazard has been studied and found to be nil. They are not included in the object class and do not have to be represented as hazard zones.
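
    Because this dataset is exposed through a WFS direct-download service, a hedged sketch of a standard OGC WFS 2.0 GetFeature request is shown below. The endpoint URL and layer (type) name are placeholders, since the catalog entry above does not list them; only the query parameters follow the WFS 2.0 standard.

    import requests

    # Placeholder endpoint and layer name: take the real values from the
    # GetCapabilities document advertised for this download service.
    WFS_ENDPOINT = "https://example.org/geoserver/wfs"   # hypothetical
    TYPE_NAME = "pprt:zone_alea_saint_fromond"           # hypothetical

    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": TYPE_NAME,
        "outputFormat": "application/json",  # GeoJSON, if the server supports it
        "count": 100,
    }
    response = requests.get(WFS_ENDPOINT, params=params, timeout=30)
    response.raise_for_status()
    features = response.json()["features"]
    print(len(features), "hazard-zone features returned")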

  19. Data from: Incorporating travel time reliability into the Highway Capacity...

    • catalog.data.gov
    • data.virginia.gov
    • +3more
    Updated Dec 7, 2023
    + more versions
    Cite
    Federal Highway Administration (2023). Incorporating travel time reliability into the Highway Capacity Manual [supporting datasets] [Dataset]. https://catalog.data.gov/dataset/incorporating-travel-time-reliability-into-the-highway-capacity-manual-supporting-datasets
    Explore at:
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    Federal Highway Administration (https://highways.dot.gov/)
    Description

    The Highway Capacity Manual (HCM) historically has been among the most important reference guides used by transportation professionals seeking a systematic basis for evaluating the capacity, level of service, and performance measures for elements of the surface transportation system, particularly highways but also other modes. The objective of this project was to determine how data and information on the impacts of differing causes of nonrecurrent congestion (incidents, weather, work zones, special events, etc.) in the context of highway capacity can be incorporated into the performance measure estimation procedures contained in the HCM. The methodologies contained in the HCM for predicting delay, speed, queuing, and other performance measures for alternative highway designs are not currently sensitive to traffic management techniques and other operation/design measures for reducing nonrecurrent congestion. A further objective was to develop methodologies to predict travel time reliability on selected types of facilities and within corridors. This project developed new analytical procedures and prepared chapters about freeway facilities and urban streets for potential incorporation of travel-time reliability into the HCM. The methods are embodied in two computational engines, and a final report documents the research. This zip file contains comma separated value (.csv) files of data to support SHRP 2 report S2-L08-RW-1, Incorporating travel time reliability into the Highway Capacity Manual. Zip size is 1.83 MB. Files were accessed in Microsoft Excel 2016. Data will be preserved as is. To view publication see: https://rosap.ntl.bts.gov/view/dot/3606

  20. Shoreline Data Rescue Project of Klamath River, CA, CA1874A

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Oct 31, 2024
    + more versions
    Cite
    NGS Communications and Outreach Branch (Point of Contact, Custodian) (2024). Shoreline Data Rescue Project of Klamath River, CA, CA1874A [Dataset]. https://catalog.data.gov/dataset/shoreline-data-rescue-project-of-klamath-river-ca-ca1874a1
    Explore at:
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    NGS Communications and Outreach Branch (Point of Contact, Custodian)
    Area covered
    Klamath River, California
    Description

    These data were automated to provide an accurate, high-resolution historical shoreline of the Klamath River, CA, suitable for use as a geographic information system (GIS) data layer. They are derived from shoreline maps produced by the NOAA National Ocean Service, including its predecessor agencies, based on office interpretation of imagery and/or field survey. The NGS attribution scheme 'Coastal Cartographic Object Attribute Source Table (C-COAST)' was developed to harmonize the attribution of shoreline data from various sources into one attribution catalog. C-COAST is not a recognized standard, but it was influenced by the International Hydrographic Organization's S-57 Object-Attribute standard so that the data could be translated more accurately into S-57. This resource is a member of https://inport.nmfs.noaa.gov/inport/item/39808
