This dataset is formatted in the PASCAL VOC format, a widely-used structure for object detection tasks. It includes high-quality images, their corresponding bounding box annotations, and predefined splits for training, validation, and testing.
Structure:
VOC/
├── Annotations # XML files with bounding box and class labels
├── JPEGImages # All images in .jpg format
├── ImageSets
│ └── Main # Contains train.txt, val.txt, and test.txt
Content:
* .jpg files stored in JPEGImages.
* .xml files in Annotations containing bounding box coordinates and object class names for each image.
* train.txt, val.txt, and test.txt in ImageSets/Main specify which images belong to each split for training, validation, and testing.
Organized Dataset:
* Standardized XML Files: the <path> tag in each XML file was updated to reflect the correct relative path (e.g., JPEGImages/image1.jpg) or removed if unnecessary; the <folder> tag is standardized (e.g., set to VOC) or removed for compatibility.
* Created Train-Val-Test Splits: train.txt, val.txt, and test.txt files in the ImageSets/Main directory.
* Validated Class Distribution:
This dataset contains the following object classes:
| Class Name (English) | Class Name (Chinese) | Abbreviation |
|---|---|---|
| Manhole Cover | 井盖 (jg) | jg |
| Crossing Light | 人行灯 (rxd) | rxd |
| Pipeline Indicating Pile | 地下管线桩 (dxgx) | dxgx |
| Traffic Signs | 指示牌 (zsp) | zsp |
| Hydrant | 消防栓 (xfs) | xfs |
| Camera | 电子眼 (dzy) | dzy |
| Traffic Light | 红绿灯 (lhd) | lhd |
| Guidepost | 街道路名牌 (jdp) | jdp |
| Traffic Warning Sign | 警示牌 (jsp) | jsp |
| Streetlamp | 路灯 (ld) | ld |
| Communication Box | 通讯箱 (txx) | txx |
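As a convenience, the following minimal sketch (not shipped with the dataset) shows how the splits and annotations described above can be read with Python's standard library; the paths follow the layout shown earlier, and the image IDs come from the split files themselves.

```python
# Minimal sketch: parse one PASCAL VOC annotation and list the images of a split.
import xml.etree.ElementTree as ET
from pathlib import Path

VOC_ROOT = Path("VOC")

def parse_annotation(xml_path):
    """Return (filename, [(class_name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")  # e.g. "jg", "rxd", ...
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(float(bb.findtext("xmin"))),
                      int(float(bb.findtext("ymin"))),
                      int(float(bb.findtext("xmax"))),
                      int(float(bb.findtext("ymax")))))
    return filename, boxes

def read_split(split):
    """Read ImageSets/Main/{train,val,test}.txt into a list of image IDs."""
    with open(VOC_ROOT / "ImageSets" / "Main" / f"{split}.txt") as f:
        return [line.strip() for line in f if line.strip()]

if __name__ == "__main__":
    for image_id in read_split("train")[:5]:
        fname, boxes = parse_annotation(VOC_ROOT / "Annotations" / f"{image_id}.xml")
        print(fname, boxes)
```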
This dataset contains different animal categories such as: buffalo, capybara, cat, cow, deer, dog, elephant, flamingo, giraffe, jaguar, kangaroo, lion, parrot, penguin, rhino, sheep, tiger, turtle and zebra.
Most of the images can be found in existing datasets: https://github.com/freds0/capybara_dataset https://universe.roboflow.com/miguel-narbot-usp-br/capybara-and-animals/dataset/1 https://www.kaggle.com/datasets/hugozanini1/kangaroodataset?resource=download https://github.com/experiencor/kangaroo https://universe.roboflow.com/z-jeans-pig/kangaroo-epscj/dataset/1 https://cvwc2019.github.io/challenge.html# https://www.kaggle.com/datasets/biancaferreira/african-wildlife https://universe.roboflow.com/new-workspace-5kofa/elephant-dataset/dataset/6 https://universe.roboflow.com/nathanael-hutama-harsono/large-cat/dataset/1/images/?split=train https://universe.roboflow.com/giraffs-and-cows/giraffes-and-cows/dataset/1 https://universe.roboflow.com/turtledetector/turtledetector/dataset/2 https://www.kaggle.com/datasets/smaranjitghose/sea-turtle-face-detection https://universe.roboflow.com/fadilyounes-me-gmail-com/zebra---savanna/dataset/1 https://universe.roboflow.com/test-qeryf/yolov5-9snhq https://universe.roboflow.com/or-the-king/two-zebras https://universe.roboflow.com/wild-animals-datasets/zebra-images/dataset/2 https://universe.roboflow.com/zebras/zebras/dataset/2 https://universe.roboflow.com/v2-rabotaem-xkxra/zebras_v2/dataset/5 https://universe.roboflow.com/vijay-vikas-mangena/animal_od_test1/dataset/1 https://universe.roboflow.com/bdoma13-gmail-com/rhino_horn/dataset/7 https://universe.roboflow.com/rudtkd134-naver-com/finalproject2/dataset/2 https://universe.roboflow.com/the-super-nekita/cats-brofl/dataset/2 https://universe.roboflow.com/lihi-gur-arie/pinguin-object-detection/dataset/2 https://universe.roboflow.com/utas-377cc/penguindataset-4dujc/dataset/10 https://universe.roboflow.com/new-workspace-tdyir/penguin-clfnj/dataset/1 https://universe.roboflow.com/utas-wd4sd/kit315_assignment/dataset/7 https://universe.roboflow.com/jeonjuuniv/deer-hqp4i/dataset/1 https://universe.roboflow.com/new-workspace-hqowp/sheeps/dataset/1 https://universe.roboflow.com/ali-eren-altindag/sheepstest2/dataset/1 https://universe.roboflow.com/yaser/sheep-0gudu/dataset/3 https://universe.roboflow.com/ali-eren-altindag/mixed_sheep/dataset/1 https://universe.roboflow.com/pkm-kc-2022/sapi-birahi/dataset/2 https://universe.roboflow.com/ghostikgh/team1_cows/dataset/5 https://universe.roboflow.com/ml-dlq4x/liontrain/dataset/2 https://universe.roboflow.com/animals/lionnew/dataset/2 https://universe.roboflow.com/parrottrening/parrot_trening/dataset/1 https://universe.roboflow.com/uet-hi8bg/parrots-r4tfl/dataset/1 https://universe.roboflow.com/superweight/parrot_poop/dataset/5 https://www.kaggle.com/datasets/tarunbisht11/intruder-detection
From those datasets, the images have been filtered (objects smaller than 32 px were deleted, images with a dimension smaller than 320 px were deleted, and images and labeled objects were renamed). The remaining images have been labeled by me.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you use this dataset, please cite this paper: Puertas, E.; De-Las-Heras, G.; Sánchez-Soriano, J.; Fernández-Andrés, J. Dataset: Variable Message Signal Annotated Images for Object Detection. Data 2022, 7, 41. https://doi.org/10.3390/data7040041
This dataset consists of Spanish road images taken from inside a vehicle, as well as annotations in XML files in PASCAL VOC format that indicate the location of Variable Message Signals within them. Also, a CSV file is attached with information regarding the geographic position, the folder where the image is located, and the text in Spanish. This can be used to train supervised learning computer vision algorithms, such as convolutional neural networks. Throughout this work, the process followed to obtain the dataset, image acquisition, and labeling, and its specifications are detailed. The dataset is constituted of 1216 instances, 888 positives, and 328 negatives, in 1152 jpg images with a resolution of 1280x720 pixels. These are divided into 576 real images and 576 images created from the data-augmentation technique. The purpose of this dataset is to help in road computer vision research since there is not one specifically for VMSs.
The folder structure of the dataset is as follows:
In which:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset consists of CT and PET-CT DICOM images of lung cancer subjects with XML Annotation files that indicate tumor location with bounding boxes. The images were retrospectively acquired from patients with suspicion of lung cancer, and who underwent standard-of-care lung biopsy and PET/CT. Subjects were grouped according to a tissue histopathological diagnosis. Patients with Names/IDs containing the letter 'A' were diagnosed with Adenocarcinoma, 'B' with Small Cell Carcinoma, 'E' with Large Cell Carcinoma, and 'G' with Squamous Cell Carcinoma.
The images were analyzed on the mediastinum (window width, 350 HU; level, 40 HU) and lung (window width, 1,400 HU; level, –700 HU) settings. The reconstructions were made at a 2 mm slice thickness in lung settings. The CT slice interval varies from 0.625 mm to 5 mm. Scanning mode includes plain, contrast and 3D reconstruction.
Before the examination, the patient underwent fasting for at least 6 hours, and the blood glucose of each patient was less than 11 mmol/L. Whole-body emission scans were acquired 60 minutes after the intravenous injection of 18F-FDG (4.44MBq/kg, 0.12mCi/kg), with patients in the supine position in the PET scanner. FDG doses and uptake times were 168.72-468.79MBq (295.8±64.8MBq) and 27-171min (70.4±24.9 minutes), respectively. 18F-FDG with a radiochemical purity of 95% was provided. Patients were allowed to breathe normally during PET and CT acquisitions. Attenuation correction of PET images was performed using CT data with the hybrid segmentation method. Attenuation corrections were performed using a CT protocol (180mAs,120kV,1.0pitch). Each study comprised one CT volume, one PET volume and fused PET and CT images: the CT resolution was 512 × 512 pixels at 1mm × 1mm, the PET resolution was 200 × 200 pixels at 4.07mm × 4.07mm, with a slice thickness and an interslice distance of 1mm. Both volumes were reconstructed with the same number of slices. Three-dimensional (3D) emission and transmission scanning were acquired from the base of the skull to mid femur. The PET images were reconstructed via the TrueX TOF method with a slice thickness of 1mm.
The location of each tumor was annotated by five academic thoracic radiologists with expertise in lung cancer to make this dataset a useful tool and resource for developing algorithms for medical diagnosis. Two of the radiologists had more than 15 years of experience and the others had more than 5 years of experience. After one of the radiologists labeled each subject, the other four radiologists performed a verification, resulting in all five radiologists reviewing each annotation file in the dataset. Annotations were captured using LabelImg. The image annotations are saved as XML files in PASCAL VOC format, which can be parsed using the PASCAL Development Toolkit: https://pypi.org/project/pascal-voc-tools/. Python code to visualize the annotation boxes on top of the DICOM images can be downloaded here.
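The official visualization code is the one linked above; purely as an illustration, a minimal overlay along those lines could look like the sketch below, assuming pydicom and matplotlib and using placeholder file names.

```python
# Hedged sketch (not the official visualization code referenced above): overlay
# PASCAL VOC boxes from an annotation XML onto a DICOM slice using pydicom.
import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import pydicom

dicom_path = "slice_001.dcm"  # placeholder file name
xml_path = "slice_001.xml"    # matching VOC annotation (placeholder)

slice_img = pydicom.dcmread(dicom_path).pixel_array
fig, ax = plt.subplots()
ax.imshow(slice_img, cmap="gray")

for obj in ET.parse(xml_path).getroot().iter("object"):
    bb = obj.find("bndbox")
    xmin, ymin = float(bb.findtext("xmin")), float(bb.findtext("ymin"))
    xmax, ymax = float(bb.findtext("xmax")), float(bb.findtext("ymax"))
    ax.add_patch(patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                   fill=False, edgecolor="red", linewidth=1))
    ax.text(xmin, ymin - 2, obj.findtext("name"), color="red", fontsize=8)

plt.show()
```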
Two deep learning researchers used the images and the corresponding annotation files to train several well-known detection models, which resulted in a mean average precision (mAP) of around 0.87 on the validation set.
Dataset link: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70224216
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you use this dataset, please cite this paper: Puertas, E.; De-Las-Heras, G.; Fernández-Andrés, J.; Sánchez-Soriano, J. Dataset: Roundabout Aerial Images for Vehicle Detection. Data 2022, 7, 47. https://doi.org/10.3390/data7040047
This publication presents a dataset of aerial images of Spanish roundabouts taken from a UAV, along with annotations in PASCAL VOC XML files that indicate the position of vehicles within them. Additionally, a CSV file is attached containing information related to the location and characteristics of the captured roundabouts. This work details the process followed to obtain them: image capture, processing, and labeling. The dataset consists of 985,260 total instances: 947,400 cars, 19,596 cycles, 9,048 trucks, 7,008 buses and 2,208 empty roundabouts, in 61,896 1920x1080 px JPG images. These are divided into 15,474 images extracted from 8 roundabouts with different traffic flows and 46,422 images created using data augmentation techniques. The purpose of this dataset is to help research on computer vision on the road, as such labeled images are not abundant. It can be used to train supervised learning models, such as convolutional neural networks, which are very popular in object detection.
| Roundabout (scenes) | Frames | Car | Truck | Cycle | Bus | Empty |
|---|---|---|---|---|---|---|
| 1 (00001) | 1,996 | 34,558 | 0 | 4,229 | 0 | 0 |
| 2 (00002) | 514 | 743 | 0 | 0 | 0 | 157 |
| 3 (00003-00017) | 1,795 | 4,822 | 58 | 0 | 0 | 0 |
| 4 (00018-00033) | 1,027 | 6,615 | 0 | 0 | 0 | 0 |
| 5 (00034-00049) | 1,261 | 2,248 | 0 | 550 | 0 | 81 |
| 6 (00050-00052) | 5,501 | 180,342 | 1,420 | 120 | 1,376 | 0 |
| 7 (00053) | 2,036 | 5,789 | 562 | 0 | 226 | 92 |
| 8 (00054) | 1,344 | 1,733 | 222 | 0 | 150 | 222 |
| Total | 15,474 | 236,850 | 2,262 | 4,899 | 1,752 | 552 |
| Data augmentation | x4 | x4 | x4 | x4 | x4 | x4 |
| Total | 61,896 | 947,400 | 9,048 | 19,596 | 7,008 | 2,208 |
Just a simple dataset to demonstrate object detection and classification in the retail environment, preferably using computer vision.
This dataset contains resized images which have been annotated using LabelImg. These resized images are found in the directory 'ResizedImages', with corresponding XML annotations in the Pascal VOC format. I used a YOLOv3 model with this data. As of November 13, 2020, only three categories of products exist: 'can', 'shampoo', and 'spice'. Images vary in the number of objects, with some images sporting only one object of one class, others sporting multiple objects of the same class, and lastly, some sporting multiple objects of different classes.
The inspiration for this dataset was the need for a submission to the FLAIRS conference.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A large dataset containing aerial images from fields in Juchowo, Poland, and Wageningen, the Netherlands, with cows in the images annotated using the Pascal VOC XML annotation format. This dataset has been used to train various deep learning models (Nanonets, YOLOv3 and the like) as part of the GenTORE project (https://www.gentore.eu). Please download all the files, then use 7-Zip to unzip the multi-part archive.
Data abstract: This Zenodo upload contains the Turkish Pedestrian Dataset (TURPED) for benchmarking and developing pedestrian detection methods for autonomous driving assistance systems. There are three folders named "Annotations", "Image Sets" and "JPEGImages". The Annotations folder includes the pedestrian labels for each image in XML format. The standard Pascal VOC XML annotation format is chosen for ease of use. The TXT files in the Image Sets folder describe which images are in the training/validation/test sets. Finally, the images can be found in the JPEGImages folder in JPEG format.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
The dataset comprises edited recordings of Mexican Sign Language dactylology (29 signs) and the first ten numbers (1 to 10), including static and continuous signs respectively, from person 1 to person 5. The edited recordings are organized for easy access and management. Edited videos and screenshots of static signs are labeled in their file names with the corresponding sign language representations and stored in a consistent order per person, recording cycle, and hand. Static sign images can also be exported in PASCAL VOC format with XML annotations. The dataset is designed to facilitate feature extraction and further analysis in Mexican sign language recognition research.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 313 images of Iranian vehicle plates at 224x224 resolution. The annotations are for the 224x224 images and are in PASCAL VOC XML format.
The original images are 1280x1280 and do not have annotations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Personal Protective Equipment Dataset (PPED)
This dataset serves as a benchmark for PPE detection in chemical plants. We provide both the dataset and experimental results.
We produced a data set based on the actual needs and relevant regulations in chemical plants. The standard GB 39800.1-2020 formulated by the Ministry of Emergency Management of the People’s Republic of China defines the protective requirements for plants and chemical laboratories. The complete dataset is contained in the folder PPED/data.
1.1. Image collection
We took more than 3,300 pictures, varying the following characteristics: environment, distance, lighting conditions, angle, and the number of people photographed.
Backgrounds: There are 4 backgrounds, including office, near machines, factory and regular outdoor scenes.
Scale: By taking pictures from different distances, the captured PPEs are classified in small, medium and large scales.
Light: Good lighting conditions and poor lighting conditions were studied.
Diversity: Some images contain a single person, and some contain multiple people.
Angle: The pictures we took can be divided into front and side.
In total, more than 3,300 raw photos were taken under all these conditions. All images are located in the folder “PPED/data/JPEGImages”.
1.2. Label
We use LabelImg as the labeling tool, with the PASCAL-VOC labeling format. YOLO uses the TXT format; trans_voc2yolo.py can be used to convert the XML files in PASCAL-VOC format to TXT files. Annotations are stored in the folder PPED/data/Annotations.
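For context, the conversion such a script performs boils down to rewriting each box from corner coordinates into a class index plus normalized center/size values. The sketch below illustrates that core step; it is not the bundled trans_voc2yolo.py, and the class list is a placeholder.

```python
# Sketch of the coordinate conversion a VOC-to-YOLO script performs
# (not the bundled trans_voc2yolo.py).
import xml.etree.ElementTree as ET

CLASSES = ["helmet", "vest"]  # placeholder class list; substitute the dataset's classes

def voc_xml_to_yolo_lines(xml_path):
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    w, h = float(size.findtext("width")), float(size.findtext("height"))
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.findtext("name"))
        bb = obj.find("bndbox")
        xmin, ymin = float(bb.findtext("xmin")), float(bb.findtext("ymin"))
        xmax, ymax = float(bb.findtext("xmax")), float(bb.findtext("ymax"))
        # YOLO TXT format: class x_center y_center width height, normalized to [0, 1]
        xc, yc = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    return lines
```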
1.3. Dataset Features
The pictures were taken by us under the different conditions mentioned above. The file PPED/data/feature.csv is a CSV file which notes the features of every image, including lighting conditions, angle, background, number of people, and scale.
1.4. Dataset Division
The dataset is divided into training and test sets with a 9:1 ratio.
We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in folder PPED/experiment.
2.1. Environment and Configuration:
Intel Core i7-8700 CPU
NVIDIA GTX1060 GPU
16 GB of RAM
Python: 3.8.10
pytorch: 1.9.0
pycocotools: pycocotools-win
Windows 10
2.2. Applied Models
The source codes and results of the applied models are given in folder PPED/experiment with sub-folders corresponding to the model names.
2.2.1. Faster R-CNN
Faster R-CNN
backbone: resnet50+fpn
We downloaded the pre-training weights from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth.
We modified the dataset path, training classes, and training parameters, including the batch size (a minimal sketch of the class-count change is shown after these steps).
We run train_res50_fpn.py to start training.
Then, the weights are trained on the training set.
Finally, we validate the results on the test set.
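To make the class-count modification concrete, here is a hedged sketch (standard torchvision usage, not code taken from the PPED repository) of loading the downloaded COCO weights and replacing the box predictor; NUM_CLASSES is a placeholder.

```python
# Hedged sketch: torchvision Faster R-CNN with ResNet-50 FPN backbone, with the
# box predictor replaced to match a custom class count. NUM_CLASSES is a
# placeholder (dataset classes + 1 for background), not taken from the repo.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 5 + 1  # placeholder: object classes + background

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)
# Load the COCO pre-training weights downloaded from the URL above.
state_dict = torch.load("fasterrcnn_resnet50_fpn_coco-258fb6c6.pth", map_location="cpu")
model.load_state_dict(state_dict)

# Swap the classification/regression head for the custom number of classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
```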
backbone: mobilenetv2
The same training method as for resnet50+fpn was applied, but the results were not as good, so this backbone was discarded.
The Faster R-CNN source code used in our experiment is given in folder PPED/experiment/Faster R-CNN. The weights of the fully-trained Faster R-CNN (R) and Faster R-CNN (M) models are stored in files PPED/experiment/trained_models/resNetFpn-model-19.pth and mobile-model.pth. The performance measurements of Faster R-CNN (R) and Faster R-CNN (M) are stored in folders PPED/experiment/results/Faster RCNN(R) and Faster RCNN(M).
2.2.2. SSD
backbone: resnet50
We downloaded pre-training weights from https://download.pytorch.org/models/resnet50-19c8e357.pth.
The same training method as Faster R-CNN is applied.
The SSD source code used in our experiment is given in folder PPED/experiment/ssd. The weights of the fully-trained SSD model are stored in file PPED/experiment/trained_models/SSD_19.pth. The performance measurements of SSD are stored in folder PPED/experiment/results/SSD.
2.2.3. YOLOv3-spp
backbone: DarkNet53
We modified the type information of the XML file to match our application.
We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
The weights used are: yolov3-spp-ultralytics-608.pt.
The YOLOv3-spp source code used in our experiment is given in folder PPED/experiment/YOLOv3-spp. The weights of the fully-trained YOLOv3-spp model are stored in file PPED/experiment/trained_models/YOLOvspp-19.pt. The performance measurements of YOLOv3-spp are stored in folder PPED/experiment/results/YOLOv3-spp.
2.2.4. YOLOv5
backbone: CSP_DarkNet
We modified the type information of the XML file to match our application.
We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.
The weights used are: yolov5s.
The YOLOv5 source code used in our experiment is given in folder PPED/experiment/yolov5. The weights of the fully-trained YOLOv5 model are stored in file PPED/experiment/trained_models/YOLOv5.pt. The performance measurements of YOLOv5 are stored in folder PPED/experiment/results/YOLOv5.
2.3. Evaluation
The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder PPED/experiment/eval.
Faster R-CNN (R and M)
official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py
SSD
official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py
YOLOv3-spp
YOLOv5
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ODDS Smart Building Depth Dataset
#Introduction:
The goal of this dataset is to facilitate research focusing on recognizing objects in smart buildings using the depth sensor mounted at the ceiling. This dataset contains annotations of depth images for eight frequently seen object classes. The classes are: person, backpack, laptop, gun, phone, umbrella, cup, and box.
#Data Collection:
We collected data from two settings. We had a Kinect mounted on a 9.3-foot ceiling near a 6-foot-wide door. We also used a tripod with a horizontal extender holding the Kinect at a similar height looking downwards. We asked about 20 volunteers to enter and exit a number of times each in different directions (3 times walking straight, 3 times walking towards the left side, 3 times walking towards the right side), holding objects in many different ways and poses underneath the Kinect. Each subject used his/her own backpack, purse, laptop, etc. As a result, we captured variety within the same object class, e.g., for laptops we considered MacBooks, HP laptops, and Lenovo laptops of different years and models, and for backpacks we considered backpacks, side bags, and women's purses. We asked the subjects to walk while holding the objects in many ways, e.g., the laptop was fully open, partially closed, and fully closed while carried. Also, people held laptops in front of and beside their bodies, and underneath their elbows. The subjects carried their backpacks on their backs and at their sides at different levels from foot to shoulder. We wanted to collect data with real guns; however, bringing real guns to the office is prohibited, so we obtained a few Nerf guns and the subjects carried these guns pointing them to the front, side, up, and down while walking.
#Annotated Data Description:
The annotated dataset is created following the structure of the Pascal VOC devkit, so that data preparation becomes simple and it can be used quickly with object detection libraries that are friendly to Pascal VOC style annotations (e.g., Faster-RCNN, YOLO, SSD). The annotated data consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the eight classes present in the image. Multiple objects from multiple classes may be present in the same image. The dataset has 3 main directories:
1) DepthImages: Contains all the images of the training set and validation set.
2) Annotations: Contains one XML file per image file (e.g., 1.xml for image file 1.png). The XML file includes the bounding box annotations for all objects in the corresponding image.
3) ImagesSets: Contains two text files, training_samples.txt and testing_samples.txt. The training_samples.txt file has the names of images used for training and testing_samples.txt has the names of images used for testing. (We randomly chose an 80%/20% split; a minimal sketch of generating such a split is shown below.)
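For illustration, a random 80%/20% split of the kind described in item 3 could be generated roughly as follows; this sketch is not part of the dataset, and the seed is arbitrary.

```python
# Minimal sketch: build an 80/20 random split from the DepthImages folder,
# writing the split files described above. Directory and file names follow
# the dataset description; the seed is arbitrary.
import random
from pathlib import Path

images = sorted(p.stem for p in Path("DepthImages").glob("*.png"))
random.seed(0)
random.shuffle(images)

cut = int(0.8 * len(images))
Path("ImagesSets").mkdir(exist_ok=True)
Path("ImagesSets/training_samples.txt").write_text("\n".join(images[:cut]) + "\n")
Path("ImagesSets/testing_samples.txt").write_text("\n".join(images[cut:]) + "\n")
```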
#UnAnnotated Data Description:
The un-annotated data consists of several sets of depth images. No ground-truth annotation is available for these images yet. These un-annotated sets contain several challenging scenarios, and no data was collected from this office during annotated dataset construction. Hence, they provide a way to test the generalization performance of an algorithm.
#Citation:
If you use ODDS Smart Building dataset in your work, please cite the following reference in any publications:
@inproceedings{mithun2018odds,
title={ODDS: Real-Time Object Detection using Depth Sensors on Embedded GPUs},
author={Niluthpol Chowdhury Mithun and Sirajum Munir and Karen Guo and Charles Shelton},
booktitle={ ACM/IEEE Conference on Information Processing in Sensor Networks (IPSN)},
year={2018},
}
The PESMOD (PExels Small Moving Object Detection) dataset consists of high-resolution aerial images in which moving objects are labelled manually. It was created from videos selected from the Pexels website. The aim of this dataset is to provide a different and challenging dataset for the evaluation of moving object detection methods. Each moving object is labelled for each frame in PASCAL VOC format in an XML file. The dataset consists of 8 different video sequences.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The larch casebearer, Coleophora laricella, is a moth that mainly attacks larch trees and has caused significant damage in larch stands in Västergötland, Sweden.
The original dataset of aerial drone images of larch forests was modified to remove duplicates and badly annotated images. Only images taken in May (20190527) are present here, as they contain damage level classes. Also, the annotation files in Pascal VOC XML format were converted to YOLO PyTorch TXT format.
Images were taken in 5 areas around Västergötland, Sweden and the names of the files correspond to different areas:
There are 4 classes in total: * H - healthy larch trees * LD - light damage to larch trees * HD - high damage to larch trees * other - non-larch trees
Training, validation and testing images were selected from different acquisition sites to try creating a better generalizing detection model.
If you use these data in a publication or report, please use the following citation:
Swedish Forest Agency (2021): Forest Damages – Larch Casebearer 1.0. National Forest Data Lab. Dataset.
For questions about this data set, contact Halil Radogoshi (halil.radogoshi@skogsstyrelsen.se) at the Swedish Forest Agency.
More information about the dataset can be found on LILA BC page.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Name: ZeroCostDL4Mic - YoloV2 example training and test dataset
(see our Wiki for details)
Data type: 2D grayscale .png images with corresponding bounding box annotations in .xml PASCAL VOC format.
Microscopy data type: Phase contrast microscopy data (brightfield)
Microscope: Inverted Zeiss Axio zoom widefield microscope equipped with an AxioCam MRm camera, an EL Plan-Neofluar 20 × /0.5 NA objective (Carl Zeiss), with a heated chamber (37 °C) and a CO2 controller (5%).
Cell type: MDA-MB-231 cells migrating on cell-derived matrices generated by fibroblasts.
File format: .png (8-bit)
Image size: 1388 x 1040 px (323 nm)
Author(s): Guillaume Jacquemet1,2,3, Lucas von Chamier4,5
Contact email: lucas.chamier.13@ucl.ac.uk and guillaume.jacquemet@abo.fi
Affiliation(s):
1) Faculty of Science and Engineering, Cell Biology, Åbo Akademi University, 20520 Turku, Finland
2) Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520 Turku
3) ORCID: 0000-0002-9286-920X
4) MRC-Laboratory for Molecular Cell Biology. University College London, London, UK
5) ORCID: 0000-0002-9243-912X
Associated publications: Jacquemet et al 2016. DOI: 10.1038/ncomms13297
Funding bodies: G.J. was supported by grants awarded by the Academy of Finland, the Sigrid Juselius Foundation and Åbo Akademi University Research Foundation (CoE CellMech) and by Drug Discovery and Diagnostics strategic funding to Åbo Akademi University.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Notice: We currently have a paper under double-blind review that introduces this dataset. Therefore, we have anonymized the dataset authorship. Once the review process has concluded, we will update the authorship information of this dataset.
Chinese Chemical Safety Signs (CCSS)
This dataset is compiled as a benchmark for recognizing chemical safety signs from images. We provide both the dataset and the experimental results at doi:10.5281/zenodo.5482334.
The complete dataset is contained in the folder ccss/data in archive css_data.zip. The images include signs based on the Chinese standard "Safety Signs and their Application Guidelines" (GB 2894-2008) for safety signs in chemical environments. This standard, in turn, refers to the standards ISO 7010 (Graphical symbols – Safety Colours and Safety Signs – Safety signs used in workplaces and public areas), GB/T 10001 (Public Information Graphic Symbols for Signs), and GB 13495 (Fire Safety Signs)
1.1. Image Collection
We collected photos of commonly used chemical safety signs in chemical laboratories and chemical teaching buildings. For a discussion of the standards on which we base our collection, refer to the book "Talking about Hazardous Chemicals and Safety Signs" for common signs, and refer to the safety signs guidelines (GB 2894-2008).
The shooting was mainly carried out in 6 locations, namely on the road, in a parking lot, at construction walls, in a chemical laboratory, outside near big machines, and inside a factory and corridor.
Shooting scale: Images in which the signs appear in small, medium and large scales were taken for each location by shooting photos from different distances.
Shooting light: good lighting conditions and poor lighting conditions were investigated.
Part of the images contain multiple targets and the other part contains only single signs.
Under all conditions, a total of 4,650 photos were taken as the original data. These were expanded to 27,900 photos via data augmentation. All images are located in folder ccss/data/JPEGImages.
The file ccss/data/features/enhanced_data_to_original_data.csv provides a mapping between the enhanced image name and the corresponding original image.
1.2. Annotation and Labelling
The labelling tool is Labelimg, which uses the PASCAL-VOC labelling format. The annotation is stored in the folder ccss/data/Annotations.
Faster R-CNN and SSD are two algorithms that use this format. When training YOLOv5, you can run trans_voc2yolo.py to convert the XML file in PASCAL-VOC format to a txt file.
We provide further meta-information about the dataset in form of a CSV file features.csv which notes, for each image, which other features it has (lighting conditions, scale, multiplicity, etc.).
1.3. Dataset Features
As stated above, the images have been shot under different conditions. We provide all the feature information in folder ccss/data/features. For each feature, there is a separate list of file names in that folder. The file ccss/data/features/features_on_original_data.csv is a CSV file which notes all the features of each original image.
1.4. Dataset Division
The dataset is divided into fixed training and test sets with a 7:3 ratio. You can find the corresponding image names in the files ccss/data/training_data_file_names.txt and ccss/data/test_data_file_names.txt.
We provide baseline results with the three models Faster R-CNN, SSD, and YOLOv5. All code and results are given in folder ccss/experiment in archive ccss_experiment.
2.2. Environment and Configuration
Single Intel Core i7-8700 CPU
NVIDIA GTX1060 GPU
16 GB of RAM
Python: 3.8.10
pytorch: 1.9.0
pycocotools: pycocotools-win
Visual Studio 2017
Windows 10
2.3. Applied Models
The source codes and results of the applied models are given in folder ccss/experiment with sub-folders corresponding to the model names.
2.3.1. Faster R-CNN
backbone: resnet50+fpn.
We downloaded the pre-training weights from
We modified the type information of the JSON file to match our application.
We run train_res50_fpn.py to start training.
Finally, the weights are trained on the training set.
backbone: mobilenetv2
The same training method as for resnet50+fpn was applied, but the results were not as good, so this backbone was discarded.
The Faster R-CNN source code used in our experiment is given in folder ccss/experiment/sources/faster_rcnn. The weights of the fully-trained Faster R-CNN model are stored in file ccss/experiment/trained_models/faster_rcnn.pth. The performance measurements of Faster R-CNN are stored in folder ccss/experiment/performance_indicators/faster_rcnn.
2.3.2. SSD
backbone: resnet50
We downloaded pre-training weights from
The same training method as for Faster R-CNN is applied.
The SSD source code used in our experiment is given in folder ccss/experiment/sources/ssd. The weights of the fully-trained SSD model are stored in file ccss/experiment/trained_models/ssd.pth. The performance measurements of SSD are stored in folder ccss/experiment/performance_indicators/ssd.
2.3.4. YOLOv5
backbone: CSP_DarkNet
We modified the type information of the YML file to match our application.
We run trans_voc2yolo.py to convert the XML files in VOC format to txt files.
The weights used are: yolov5s.
The YOLOv5 source code used in our experiment is given in folder ccss/experiment/sources/yolov5. The weights of the fully-trained YOLOv5 model are stored in file ccss/experiment/trained_models/yolov5.pt. The performance measurements of YOLOv5 are stored in folder ccss/experiment/performance_indicators/yolov5.
2.4. Evaluation
The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder ccss/experiment/performance_indicators. They are provided over the complete test set as well as separately for each image feature (over the test set).
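For readers reproducing these indicators, recall that detection metrics such as mAP are built on matching predictions to ground truth by intersection over union; the helper below is a generic sketch of that computation, not the evaluation code shipped in this folder.

```python
# Generic sketch: intersection over union (IoU) between two axis-aligned boxes.
def iou(box_a, box_b):
    """Boxes are (xmin, ymin, xmax, ymax) in pixel coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: two partially overlapping boxes give IoU = 400 / 2800 ~ 0.143
print(iou((10, 10, 50, 50), (30, 30, 70, 70)))
```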
Faster R-CNN
official code:
SSD
official code:
YOLOv5
We are particularly thankful to the author of the GitHub repository WZMIAOMIAO/deep-learning-for-image-processing (with whom we are not affiliated). Their instructive videos and codes were most helpful during our work. In particular, we based our own experimental codes on his work (and obtained permission to include it in this archive).
While our dataset and results are published under the Creative Commons Attribution 4.0 License, this does not hold for the included code sources. These sources are under the particular license of the repository where they have been obtained from (see Section 3 above).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A subset of 124 images has been selected from the DocExplore dataset in order to create a detection dataset of drawings in medieval manuscripts. Since the original dataset has been published without providing any annotations, we selected and annotated 8 different patterns in the subset, which resulted in a total of 268 annotated instances.
The dataset has been split into three sub-sets (train, validation, test) in order to follow the standard procedure followed by modern deep-learning models. Furthermore, each image has one corresponding ".xml" file containing the annotation information of drawings in that image using the Pascal Voc format.
The research for this work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2176 ‘Understanding Written Artefacts: Material, Interaction and Transmission in Manuscript Cultures', project no. 390893796. The research was conducted within the scope of the Centre for the Study of Manuscript Cultures (CSMC) at Universität Hamburg.
In addition, we thank Aneta Yotova for annotating the samples in this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets and directories are structured similarly to the PASCAL VOC dataset, avoiding the need to change scripts already available with detection frameworks that are ready to parse PASCAL VOC annotations into their format.
The sub-directory JPEGImages consists of 1,730 images (612x512 pixels) used for training, testing and validation. Each image has at least one annotated fruit. The sub-directory Annotations consists of all the annotation files (records of bounding box coordinates for each image) in XML format, with the same name as the corresponding image. The sub-directory Main consists of the text files that contain the image names (without extension) used for training, testing and validation: the training set (train.txt) lists 1,300 training images, the validation set (val.txt) lists 130 validation images, and the test set (test.txt) lists 300 test images.
Each image has an XML annotation file (filename = image name), and each image set (training, validation and test) has an associated text file (train.txt, val.txt and test.txt) containing the list of image names to be used for training and testing. The XML annotation file contains the image attributes (name, width, height) and the object attributes (class name, object bounding box co-ordinates (xmin, ymin, xmax, ymax)). (xmin, ymin) and (xmax, ymax) are the pixel co-ordinates of the bounding box's top-left and bottom-right corners, respectively.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Tank Detection and Count Dataset contains 760 satellite image tiles of size 512x512 pixels, with one pixel covering 30 cm x 30 cm at ground level. Each tile is associated with an .xml and a .txt file. Both files contain the same oil/gas tank annotations in different formats: the .xml file uses the Pascal VOC format, while in the .txt file every line contains the class of the tank and the four coordinates of the bounding box: xmin, ymin, xmax, ymax.