Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
The Pascal VOC 2012 dataset is a dataset of images with object annotations. The dataset consists of 20 object classes, and each image is labeled with a bounding box for each object. The Pascal VOC 2012 dataset has been used to train and evaluate a variety of object detection algorithms.
This dataset contains the data from the PASCAL Visual Object Classes Challenge, corresponding to the Classification and Detection competitions.
In the Classification competition, the goal is to predict the set of labels contained in the image, while in the Detection competition the goal is to predict the bounding box and label of each individual object. WARNING: As per the official dataset, the test set of VOC2012 does not contain annotations.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('voc', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/voc-2007-5.0.0.png
This dataset is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL semantic segmentation task by providing annotations for the whole scene. The statistics section has a full list of 400+ labels. Since the dataset is an annotation of PASCAL VOC 2010, it has the same statistics as those of the original dataset. Training and validation together contain 10,103 images, while testing contains 9,637 images.
The Pascal VOC 2012 dataset is the standard dataset for image segmentation, detection, localization, and related tasks. In image segmentation, the goal is a per-pixel class prediction. In object detection, the goal is to specify which classes are present in the given image; we can also localize them using bounding boxes.
It contains two directories: one contains the validation and training set, and the other contains the test set. Inside the train_val directory there is an ImageSets directory with text files that list the training and validation instances. For every image, the dataset provides class labels and object labels along with annotations. Labeled images contain the class label per pixel.
The same goes for the test set. The predicted labels of the test set are also present inside SegmentationClass or SegmentationObject, depending on which application you are working on.
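As a minimal sketch of consuming this layout, the snippet below reads the segmentation training split and loads an image together with its per-pixel label mask. It assumes the standard VOCdevkit directory names (ImageSets/Segmentation, JPEGImages, SegmentationClass); adjust VOC_ROOT and the paths to match this particular upload.

from pathlib import Path
from PIL import Image

# Assumed standard VOCdevkit layout; adjust VOC_ROOT to your local copy.
VOC_ROOT = Path('VOCdevkit/VOC2012')

# Image IDs listed in the segmentation training split.
train_ids = (VOC_ROOT / 'ImageSets' / 'Segmentation' / 'train.txt').read_text().split()

for image_id in train_ids[:3]:
    image = Image.open(VOC_ROOT / 'JPEGImages' / f'{image_id}.jpg')
    mask = Image.open(VOC_ROOT / 'SegmentationClass' / f'{image_id}.png')  # per-pixel class labels
    print(image_id, image.size, mask.size)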
I downloaded the dataset from the standard PASCAL VOC site.
I made this dataset available so that everyone can use it to train their models for various applications and deepen their knowledge in the respective field.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance on the PASCAL VOC 2012 test set, compared to weakly supervised approaches based only on image-level labels.
This repository contains the mapping from integer IDs to actual label names (in HuggingFace Transformers typically called id2label) for several datasets. Current datasets include:
- ImageNet-1k
- ImageNet-22k (also called ImageNet-21k as there are 21,843 classes)
- COCO detection 2017
- COCO panoptic 2017
- ADE20k (actually, the MIT Scene Parsing benchmark, which is a subset of ADE20k)
- Cityscapes
- VQAv2
- Kinetics-700
- RVL-CDIP
- PASCAL VOC
- Kinetics-400
- ...
You can read in a label file as follows (using… See the full description on the dataset page: https://huggingface.co/datasets/huggingface/label-files.
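The snippet in the description is truncated above; one hedged way such a label file could be read is shown below, assuming huggingface_hub is installed. The filename follows the repo's "<dataset>-id2label.json" naming pattern and is an assumption, not taken from the truncated text.

import json
from huggingface_hub import hf_hub_download

# Filename is an assumed example following the repo's naming pattern.
path = hf_hub_download(repo_id="huggingface/label-files",
                       filename="imagenet-1k-id2label.json",
                       repo_type="dataset")
with open(path) as f:
    id2label = {int(k): v for k, v in json.load(f).items()}  # JSON keys are strings; cast to int
print(id2label[0])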
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
PASCAL VOC 2007 is a dataset for object detection tasks; it contains object annotations for 9,960 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The original dataset was used in the paper by S. Liu, J. Feng, C. Domokos, H. Xu, J. Huang, Z. Hu, and S. Yan (2014), CFPD: "Fashion Parsing with Weak Color-Category Labels", for object detection and segmentation tasks (https://sites.google.com/site/fashionparsing).
This dataset is customized for the object detection task: skin, face and background information have been removed, and the annotations follow the PASCAL VOC format. The classes of this dataset are: sunglass, hat, jacket, shirt, pants, shorts, skirt, dress, bag, shoe.
Note: If you want .txt files in YOLO format, you can use the Annotations_txt directory.
WIDER FACE dataset is a face detection benchmark dataset, of which images are selected from the publicly available WIDER dataset. We choose 32,203 images and label 393,703 faces with a high degree of variability in scale, pose and occlusion as depicted in the sample images. WIDER FACE dataset is organized based on 61 event classes. For each event class, we randomly select 40%/10%/50% data as training, validation and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset. Similar to MALF and Caltech datasets, we do not release bounding box ground truth for the test images. Users are required to submit final prediction files, which we shall proceed to evaluate.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('wider_face', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/wider_face-0.1.0.png
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset supports the training and evaluation of the label detection module of the ELIE pipeline, designed for object detection in multi-label image datasets. The dataset consists of annotated images in JPEG and Pascal VOC XML format, split into training (2224 images), validation (556 images), and test (278 images) subsets. The data were derived from four collections, LEPPHIL, Bees & Bytes, AntWeb, and Picturae_MfN, each annotated with the “label” class. This dataset facilitates reproducibility and benchmarking of label detection performance across diverse digitized natural history image sources.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Pascal Panoptic Parts dataset consists of annotations for the part-aware panoptic segmentation task on the PASCAL VOC 2010 dataset. It is created by merging scene-level labels from PASCAL-Context with part-level labels from PASCAL-Part.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The water bottle detection dataset and measurement model dataset for the paper titled "The Semantic PHD Filter for Multi-class Target Tracking: From Theory to Practice" by Jun Chen, Zhanteng Xie and Philip Dames, and the paper titled "Experimental Datasets and Processing Codes for the Semantic PHD Filter" by Zhanteng Xie, Jun Chen and Philip Dames
1. Detection dataset:
Size:
Total: 4870 images
Training: 4000 images
Validation: 870 images
Bottle Classes: Aquafina, Deer, Kirkland, Nestle
Format: PASCAL VOC, Darknet
Folder Structure:
- Annotations: containing the xml label files in PASCAL VOC format
- ImageSets: containing the training index files
- JPEGImages: containing the image data in jpg format
- Labels: containing the txt label files in Darknet format
2. Measurement model dataset:
Format: ROSBAG
Duration: 19 min 59 s (1199 s)
Topics:
/darknet_ros/detection_image 3543 msgs : sensor_msgs/Image
/map 1 msg : nav_msgs/OccupancyGrid
/sphd_measurements 3585 msgs : sphd_msgs/SPHDMeasurements
/tf 142727 msgs : tf2_msgs/TFMessage
/tf_static 1 msg : tf2_msgs/TFMessage
Message Types:
nav_msgs/OccupancyGrid
sensor_msgs/Image
sphd_msgs/SPHDMeasurements
tf2_msgs/TFMessage
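As a rough illustration of how the bag above could be inspected with the ROS 1 Python API, a minimal sketch is shown below. The bag file name is a placeholder, and only message counts are read, since the custom sphd_msgs fields are not documented here.

import rosbag  # ROS 1 Python API

# Placeholder bag file name for the measurement model recording.
with rosbag.Bag('sphd_measurement_model.bag') as bag:
    counts = {}
    for topic, msg, t in bag.read_messages(topics=['/sphd_measurements', '/darknet_ros/detection_image']):
        counts[topic] = counts.get(topic, 0) + 1
    print(counts)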
3. Processing codes:
Detection processing:
Zenodo: https://doi.org/10.5281/zenodo.7066045
GitHub: https://github.com/TempleRAIL/yolov3_bottle_detector
Measurement model processing:
Zenodo: https://doi.org/10.5281/zenodo.7066050
GitHub: https://github.com/TempleRAIL/sphd_sensor_models
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The object detection dataset was collected with a Himax HM01B0 greyscale camera. The dataset contains QVGA images of Bottles and Tin-Cans and their respective labels. The labels follow the PascalVOC format specification. The dataset also includes TFRecord files for ease of use with TensorFlow. This dataset was used in our paper "Bio-inspired Autonomous Exploration Policies with CNN-based Object Detection on Nano-drones".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ODDS Smart Building Depth Dataset
The goal of this dataset is to facilitate research focusing on recognizing objects in smart buildings using the depth sensor mounted at the ceiling. This dataset contains annotations of depth images for eight frequently seen object classes. The classes are: person, backpack, laptop, gun, phone, umbrella, cup, and box.
We collected data from two settings. We mounted a Kinect on a 9.3-foot ceiling near a 6-foot-wide door, and we also used a tripod with a horizontal extender holding the Kinect at a similar height looking downwards. We asked about 20 volunteers to enter and exit a number of times each in different directions (3 times walking straight, 3 times walking towards the left side, 3 times walking towards the right side), holding objects in many different ways and poses underneath the Kinect. Each subject used his/her own backpack, purse, laptop, etc. As a result, we captured variety within the same object class, e.g., for laptops we considered Macbooks and HP and Lenovo laptops of different years and models, and for backpacks we considered backpacks, side bags, and women's purses. We asked the subjects to carry each object in many ways, e.g., the laptop was fully open, partially closed, or fully closed while carried, and people held laptops in front of and beside their bodies and underneath their elbows. The subjects carried their backpacks on their backs and at their sides at different levels from foot to shoulder. We wanted to collect data with real guns; however, bringing real guns to the office is prohibited, so we obtained a few Nerf guns and the subjects carried these guns pointing to the front, side, up, and down while walking.
The annotated dataset is created following the structure of the Pascal VOC devkit, so that data preparation is simple and the dataset can be used quickly with object detection libraries that are friendly to Pascal VOC style annotations (e.g., Faster R-CNN, YOLO, SSD). The annotated data consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object from the eight classes present in the image. Multiple objects from multiple classes may be present in the same image. The dataset has 3 main directories:
1) DepthImages: Contains all the images of the training set and validation set.
2) Annotations: Contains one XML file per image file (e.g., 1.xml for image file 1.png). The XML file includes the bounding box annotations for all objects in the corresponding image.
3) ImagesSets: Contains two text files, training_samples.txt and testing_samples.txt. The training_samples.txt file has the names of the images used for training and testing_samples.txt has the names of the images used for testing. (We randomly chose an 80%/20% split.)
The un-annotated data consists of several sets of depth images. No ground-truth annotation is available for these images yet. These un-annotated sets contain several challenging scenarios, and no data was collected from this office during annotated dataset construction. Hence, they provide a way to test the generalization performance of the algorithm.
If you use the ODDS Smart Building dataset in your work, please cite the following reference in any publications:
@inproceedings{mithun2018odds,
  title={ODDS: Real-Time Object Detection using Depth Sensors on Embedded GPUs},
  author={Niluthpol Chowdhury Mithun and Sirajum Munir and Karen Guo and Charles Shelton},
  booktitle={ACM/IEEE Conference on Information Processing in Sensor Networks (IPSN)},
  year={2018},
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Personal Protective Equipment Dataset (PPED)
This dataset serves as a benchmark for PPE detection in chemical plants. We provide the dataset and experimental results.
We produced a dataset based on the actual needs and relevant regulations in chemical plants. The standard GB 39800.1-2020, formulated by the Ministry of Emergency Management of the People's Republic of China, defines the protective requirements for plants and chemical laboratories. The complete dataset is contained in the folder PPED/data.
1.1. Image collection
We took more than 3300 pictures, varying the following characteristics: environment, distance, lighting conditions, angle, and the number of people photographed.
Backgrounds: There are four backgrounds: office, near machines, factory, and regular outdoor scenes.
Scale: By taking pictures from different distances, the captured PPEs are classified into small, medium and large scales.
Light: Good lighting conditions and poor lighting conditions were studied.
Diversity: Some images contain a single person, and some contain multiple people.
Angle: The pictures we took can be divided into front and side views.
In total, more than 3300 raw photos were taken under all of these conditions. All images are located in the folder "PPED/data/JPEGImages".
1.2. Label
We used LabelImg as the labeling tool and the PASCAL VOC annotation format. YOLO uses the txt format; trans_voc2yolo.py can be used to convert the XML files in PASCAL VOC format to txt files (a sketch of this conversion is given below). Annotations are stored in the folder PPED/data/Annotations.
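The sketch below illustrates the kind of VOC-to-YOLO conversion that trans_voc2yolo.py performs; it is not the authors' script, and class_to_id is a hypothetical class-name-to-index mapping.

import xml.etree.ElementTree as ET

def voc_to_yolo_lines(xml_path, class_to_id):
    # Parse one PASCAL VOC XML file and return YOLO-format label lines.
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find('size/width').text)
    img_h = float(root.find('size/height').text)
    lines = []
    for obj in root.iter('object'):
        cls = class_to_id[obj.find('name').text]
        box = obj.find('bndbox')
        xmin, ymin = float(box.find('xmin').text), float(box.find('ymin').text)
        xmax, ymax = float(box.find('xmax').text), float(box.find('ymax').text)
        # YOLO stores normalised centre coordinates plus width/height.
        x_c, y_c = (xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h
        bw, bh = (xmax - xmin) / img_w, (ymax - ymin) / img_h
        lines.append(f"{cls} {x_c:.6f} {y_c:.6f} {bw:.6f} {bh:.6f}")
    return lines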
1.3. Dataset Features
The pictures were made by us according to the different conditions mentioned above. The file PPED/data/feature.csv is a CSV file that records the attributes of every image, including lighting conditions, angle, background, number of people and scale.
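A small sketch for browsing these per-image attributes with pandas is given below; the column name and value used in the filter are hypothetical and should be checked against the actual CSV header.

import pandas as pd

features = pd.read_csv('PPED/data/feature.csv')
print(features.columns.tolist())           # inspect the real column names first
poor_light = features[features['light'] == 'poor']  # hypothetical column/value
print(len(poor_light), 'images with poor lighting')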
1.4. Dataset Division
The dataset is divided into training and test sets at a 9:1 ratio.
We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in the folder PPED/experiment.
2.1. Environment and Configuration:
Intel Core i7-8700 CPU
NVIDIA GTX1060 GPU
16 GB of RAM
Python: 3.8.10
pytorch: 1.9.0
pycocotools: pycocotools-win
Windows 10
2.2. Applied Models
The source codes and results of the applied models are given in the folder PPED/experiment, with sub-folders corresponding to the model names.
2.2.1. Faster R-CNN
Faster R-CNN
backbone: resnet50+fpn
We downloaded the pre-training weights from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth.
We modified the dataset path, training classes and training parameters including batch size.
We run train_res50_fpn.py to start training.
Then, the weights are trained by the training set.
Finally, we validate the results on the test set.
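For readers without access to train_res50_fpn.py, the sketch below shows the class-head modification that such fine-tuning scripts typically perform with the torchvision detection API; it is a hedged illustration, not the authors' code, and the class count is a placeholder.

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a COCO pre-trained Faster R-CNN with a ResNet-50 FPN backbone,
# analogous to the pre-training weights referenced above.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Replace the box predictor head with one sized for the PPED classes;
# num_classes is a placeholder (number of PPE classes + 1 for background).
num_classes = 1 + 4
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)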
backbone: mobilenetv2
The same training method as for resnet50+fpn was applied, but the results were not as good as with resnet50+fpn, so this backbone was discarded.
The Faster R-CNN source code used in our experiment is given in the folder PPED/experiment/Faster R-CNN. The weights of the fully-trained Faster R-CNN (R) and Faster R-CNN (M) models are stored in the files PPED/experiment/trained_models/resNetFpn-model-19.pth and mobile-model.pth. The performance measurements of Faster R-CNN (R) and Faster R-CNN (M) are stored in the folders PPED/experiment/results/Faster RCNN(R) and Faster RCNN(M).
2.2.2. SSD
backbone: resnet50
We downloaded pre-training weights from https://download.pytorch.org/models/resnet50-19c8e357.pth.
The same training method as Faster R-CNN is applied.
The SSD source code used in our experiment is given in folder PPED/experiment/ssd. The weights of the fully-trained SSD model are stored in file PPED/experiment/trained_models/SSD_19.pth. The performance measurements of SSD are stored in folder PPED/experiment/results/SSD.
2.2.3. YOLOv3-spp
backbone: DarkNet53
We modified the type information of the XML file to match our application.
We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
The weights used are: yolov3-spp-ultralytics-608.pt.
The YOLOv3-spp source code used in our experiment is given in folder PPED/experiment/YOLOv3-spp. The weights of the fully-trained YOLOv3-spp model are stored in file PPED/experiment/trained_models/YOLOvspp-19.pt. The performance measurements of YOLOv3-spp are stored in folder PPED/experiment/results/YOLOv3-spp.
2.2.4. YOLOv5
backbone: CSP_DarkNet
We modified the type information of the XML file to match our application.
We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
The weights used are: yolov5s.
The YOLOv5 source code used in our experiment is given in folder PPED/experiment/yolov5. The weights of the fully-trained YOLOv5 model are stored in file PPED/experiment/trained_models/YOLOv5.pt. The performance measurements of YOLOv5 are stored in folder PPED/experiment/results/YOLOv5.
2.3. Evaluation
The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder PPED/experiment/eval.
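As one hedged illustration of COCO-style mAP evaluation with pycocotools (which is listed in the environment above), the sketch below assumes the ground truth and detections have been exported to COCO JSON; the file names are placeholders, not files shipped with PPED.

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('ppe_test_ground_truth.json')      # placeholder ground-truth file
coco_dt = coco_gt.loadRes('ppe_detections.json')  # placeholder detection results

evaluator = COCOeval(coco_gt, coco_dt, iouType='bbox')
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP/AR at the standard IoU thresholds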
Faster R-CNN (R and M)
official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py
SSD
official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py
YOLOv3-spp
YOLOv5
Sourced from: https://www.tensorflow.org/datasets/catalog/wider_face
WIDER FACE dataset is a face detection benchmark dataset, of which images are selected from the publicly available WIDER dataset. We choose 32,203 images and label 393,703 faces with a high degree of variability in scale, pose and occlusion as depicted in the sample images. WIDER FACE dataset is organized based on 61 event classes. For each event class, we randomly select 40%/10%/50% data as training, validation and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset. Similar to MALF and Caltech datasets, we do not release bounding box ground truth for the test images. Users are required to submit final prediction files, which we shall proceed to evaluate.
Homepage: http://shuoyang1213.me/WIDERFACE/
Source code: tfds.object_detection.WiderFace
Versions:
0.1.0 (default): No release notes.
Download size: 3.42 GiB
Dataset size: 3.45 GiB
Auto-cached (documentation): No
Splits:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets and directories are structured similarly to the PASCAL VOC dataset, avoiding the need to change scripts already available for detection frameworks that are ready to parse PASCAL VOC annotations into their own format.
The sub-directory JPEGImages consists of 1730 images (612x512 pixels) used for training, testing and validation. Each image has at least one annotated fruit. The sub-directory Annotations consists of all the annotation files (records of bounding box coordinates for each image) in XML format, each with the same name as its image. The sub-directory Main consists of the text files that contain the image names (without extension) used for training, testing and validation: the training set (train.txt) lists 1300 train images, the validation set (val.txt) lists 130 validation images, and the test set (test.txt) lists 300 test images.
Each image has an XML annotation file (filename = image name) and each image set (training, validation and test set) has an associated text file (train.txt, val.txt and test.txt) containing the list of image names to be used for training and testing. The XML annotation file contains the image attributes (name, width, height) and the object attributes (class name and bounding box coordinates xmin, ymin, xmax, ymax). (xmin, ymin) and (xmax, ymax) are the pixel coordinates of the bounding box's top-left and bottom-right corners, respectively.
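A minimal sketch of consuming this layout is shown below: it reads the first name from the training list, parses the matching XML annotation, and overlays the boxes on the image. The relative paths are placeholders for the JPEGImages/Annotations/Main directories described above.

import xml.etree.ElementTree as ET
from PIL import Image, ImageDraw

# Placeholder relative paths; adjust to where the dataset is unpacked.
name = open('Main/train.txt').read().split()[0]
image = Image.open(f'JPEGImages/{name}.jpg').convert('RGB')
draw = ImageDraw.Draw(image)
for obj in ET.parse(f'Annotations/{name}.xml').iter('object'):
    label = obj.find('name').text
    box = obj.find('bndbox')
    xmin, ymin, xmax, ymax = (int(float(box.find(tag).text)) for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
    draw.rectangle([xmin, ymin, xmax, ymax], outline='red', width=2)
    draw.text((xmin, max(ymin - 10, 0)), label, fill='red')
image.save(f'{name}_annotated.jpg')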
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mechanical Parts Dataset
The dataset consists of a total of 2250 images obtained by downloading from various internet platforms. Among the images in the dataset, there are 714 images with bearings, 632 images with bolts, 616 images with gears and 586 images with nuts. A total of 10597 manual labeling processes were carried out in the dataset, including 2099 labels belonging to the bearing class, 2734 labels belonging to the bolt class, 2662 labels belonging to the gear class and 3102 labels belonging to the nut class.
Folder Content
The dataset is divided into three splits: 80% train, 10% validation and 10% test. In the "Mechanical Parts Dataset" folder, there are three separate folders named "train", "test" and "val". In each of these three folders there are folders named "images" and "labels". Images are kept in the "images" folder and label information is kept in the "labels" folder.
Finally, inside the folder there is a YAML file named "mech_parts_data" for the YOLO algorithm. This file contains the number of classes and the class names.
Images and Labels
The dataset was prepared in accordance with the YOLOv5 algorithm.
For example, the label information of the image named "2a0xhkr_jpg.rf.45a11bf63c40ad6e47da384fdf6bb7a1.jpg" is stored in the txt file with the same name. The label information (coordinates) in the txt file is given as "class x_center y_center width height".
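To make the label format concrete, the sketch below converts the normalised YOLO label lines of one file back to pixel-space boxes; the file path and image size are placeholders and should be read from the matching image in practice.

# Placeholder image size; read it from the corresponding image file in practice.
img_w, img_h = 640, 480

with open('labels/2a0xhkr_jpg.rf.45a11bf63c40ad6e47da384fdf6bb7a1.txt') as f:
    for line in f:
        cls, x_c, y_c, w, h = line.split()
        x_c, w = float(x_c) * img_w, float(w) * img_w
        y_c, h = float(y_c) * img_h, float(h) * img_h
        xmin, ymin = x_c - w / 2, y_c - h / 2
        print(cls, round(xmin), round(ymin), round(xmin + w), round(ymin + h))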
Update 05.01.2023
***Pascal voc and coco json formats have been added.***
Related paper: doi.org/10.5281/zenodo.7496767
Explore a comprehensive dataset featuring over 14,000 labelled urban tree canopies in images from across the globe, specifically curated for advancing tree detection methodologies. The dataset comprises image tiles in .tif, .jpg and .png formats, following the Pascal VOC and YOLO standards. Additionally, we included a .csv summary file with all annotations. To enhance usability, labels are provided in .xml and .txt formats in addition to the .csv summary. The RGB tiles utilized in this dataset are openly accessible.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains images of yellow sticky traps and bounding box annotations for three classes of flying insects found in greenhouses. The annotated classes are Macrolophus pygmaeus, Nesidiocoris tenuis and Trialeurodes vaporariorum (Whitefly).
The dataset is based on the original version "Raw data from Yellow Sticky Traps with insects for training of deep learning Convolutional Neural Network for object detection" by A.T. Nieuwenhuizen et al. (see source). This version contains corrected annotations and uniform image orientations.
The yellow sticky dataset consists of:
- 284 landscape JPEG images of 5184 x 3456 px
- 8114 bounding box annotations:
  - 1619 Macrolophus pygmaeus
  - 688 Nesidiocoris tenuis
  - 5807 Trialeurodes vaporariorum (Whitefly)
Compared to the original dataset by A.T. Nieuwenhuizen et al., the Exif image rotation information was fixed to match the landscape-oriented images.
Additionally, the annotation quality was improved by labeling previously unlabeled objects, fixing wrongly labeled classes, and resizing and repositioning bounding boxes.
The annotations were improved with LabelImg, created by Tzutalin. Ground truth information is stored in XML files in PASCAL VOC format.
The labeling process was carried out to the best of our knowledge.