Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Tomato (Minimize False Positive) is a dataset for object detection tasks - it contains Tomato PjmX annotations for 206 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by Feres Kordani.
Detection results with different object detection networks.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This collection contains the object detection results of 9 architectures from the Detectron2 library on the MS COCO val2017 dataset, under different compression levels Q = 1, 2, …, 100. The stored results include all detections above a 0.5 confidence score threshold and allow re-calculation of the performance metrics (a loading sketch follows the file list below). There are 9 per-model archive files, and each file contains 100 subfolders named evaluator_dump_, with results for a particular compression quality for that model. Each folder contains the following files:
- results.json.gz – summarized performance metrics, overall and per-class.
- coco_instances_results.json.gz – detailed results for each image, with object classes and bounding boxes.
The last file, baseline_05.tar.gz, contains 9 folders, one per model, with the same structure as above, only obtained using the original image quality.
Supplementary data:
- counts_vs_Tc_by_Q.pdf – a PDF with multiple plots of object counts (TP, FP, EX) for every compression quality Q.
- PRF1_vs_Tc_by_Q.pdf – a PDF with multiple plots of Precision, Recall and F1-score (PPV, TPR, F1) for every compression quality Q.
- rate_ssim_byQ.tar.gz – an archive with JSON files containing image information (quality metrics) for every quality, for every image in COCO val2017.
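The per-image detections in coco_instances_results.json.gz can be fed back into the COCO API to reproduce the summarized metrics. Below is a minimal sketch under the assumption that pycocotools and the val2017 ground-truth annotations are available locally; the evaluator_dump folder name and all paths are placeholders:

```python
# Hypothetical sketch: re-computing COCO detection metrics from one
# evaluator_dump_* folder; folder and file paths are placeholders.
import gzip
import json

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations for COCO val2017 (downloaded separately).
coco_gt = COCO("annotations/instances_val2017.json")

# Detections for one model at one compression quality.
with gzip.open("evaluator_dump_example/coco_instances_results.json.gz", "rt") as f:
    detections = json.load(f)

coco_dt = coco_gt.loadRes(detections)

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP/AR metrics
```

The printed AP/AR summary should correspond to the values stored in results.json.gz for that quality level.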
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This collection contains the trained models and object detection results of 2 architectures from the Detectron2 library, on the MS COCO val2017 dataset, under different JPEG compression levels Q = {5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96} (14 levels per trained model).
Architectures:
- F50 – Faster R-CNN on ResNet-50 with FPN
- R50 – RetinaNet on ResNet-50 with FPN
Training types:
- D2 – Detectron2 Model ZOO pre-trained 1x model (90,000 iterations, batch 16)
- STD – standard 1x training (90,000 iterations) on the original train2017 dataset
- Q20 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=20
- Q40 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=40
- T20 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=20
- T40 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=40
Model and metrics files:
- models_FasterRCNN.tar.gz (F50-STD, F50-Q20, …)
- models_RetinaNet.tar.gz (R50-STD, R50-Q20, …)
For every model there are 3 files (a loading sketch follows this description):
- config.yaml – the Detectron2 config of the model.
- model_final.pth – the weights (training snapshot) in PyTorch format.
- metrics.json – training metrics (e.g. time, total loss) recorded every 20 iterations.
The D2 models are not included, because they are available from the Detectron2 Model ZOO as faster_rcnn_R_50_FPN_1x (F50-D2) and retinanet_R_50_FPN_1x (R50-D2).
Result files:
- F50-results.tar.gz – results for the Faster R-CNN models (including D2).
- R50-results.tar.gz – results for the RetinaNet models (including D2).
For every model there are 14 subdirectories, e.g. evaluator_dump_R50x1_005 through evaluator_dump_R50x1_096, one for each of the JPEG Q values. Each such folder contains:
- coco_instances_results.json – all detected objects (image id, bounding box, class index and confidence).
- results.json – AP metrics as computed by the COCO API.
Source code for processing the data: the data can be processed using our code, published at https://github.com/tgandor/urban_oculus. Additional dependencies for the source code: COCO API, Detectron2.
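For orientation, a minimal sketch of loading one of the released snapshots for inference with Detectron2; the config and weight paths are placeholders for files extracted from one of the model archives, and the score threshold is an example value:

```python
# Hypothetical sketch: running one of the released Detectron2 snapshots on an image.
# Config/weight paths are placeholders for files extracted from models_*.tar.gz.
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("F50-STD/config.yaml")      # released training config
cfg.MODEL.WEIGHTS = "F50-STD/model_final.pth"   # released weights snapshot
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5     # example confidence threshold

predictor = DefaultPredictor(cfg)
image = cv2.imread("val2017/000000000139.jpg")  # any COCO val2017 image
outputs = predictor(image)

# Predicted boxes, classes and scores for the image.
instances = outputs["instances"].to("cpu")
print(instances.pred_boxes, instances.pred_classes, instances.scores)
```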
Diverse License Plate Detection - A combination of images from multiple LPD datasets along with negative samples to make the dataset training-ready. The images are from a large number of datasets from Kaggle as well as general internet sources, as will be listed below.
Purpose: Training a YOLOv8 object detection model for license-plate detection. See https://github.com/ultralytics/ultralytics for pre-trained models.
Data: There are about 3100 images of cars with tagged license plates, on the condition that the plate has at least one recognizable character and is not seen through the glass (or reflection) of another car. There are also about 1500 images which either do not contain cars at all, or contain cars at an angle or resolution where the license plate is not detectable. These negative samples address false positives I encountered when using some of the open-source models found on Hugging Face (ones that were trained only on images containing at least one license plate). Update 24/03/2024: the dataset now contains 5500 images; see Updates for specifications.
Updates:
Update 24/03/2024:
- After using a different open-source model from Hugging Face, trained with YOLOv5 on a Roboflow dataset (listed in Sources), we noted two things:
  - A. The system yielded much better results when a YOLOv8 model was first used to identify vehicles, and plates were then detected on them.
  - B. The dataset the YOLOv5 model was trained on contained almost no negative examples and contained a lot of duplicates.
- Added some images from a potholes dataset (listed in Sources) to help dissociate roads from plates, and also to get a few more car images.
- Added images from the Roboflow dataset which included images from a parking garage, providing a different angle.
- Added images from a parking sign dataset, with few car images but mostly to dissociate street and road signs from plates found on actual cars.
- Trained a YOLOv8 model on the dataset for testing only (not run in production). Even by epoch 10, the class loss decreased much faster than when training on a much larger dataset of very similar (quality, size, locality) positive images.
- Current plan moving forward: run transfer learning on YOLOv9 with this dataset for somewhere between 100 and 200 epochs (yay for Programmable Gradient Information) and post some of the model results here.
Update 25/03/2024 (training update):
- Split the dataset into Train, Validation, and Test at a ratio of 80/10/10.
- Trained using the ultralytics library with imgsz=640, batch=16, dropout=0.2 for 150 epochs; validation metrics plateaued after about 110 epochs (a minimal training sketch is shown below).
- Out of 100 test photos from real data there were about 5 false positives and 5 false negatives when using this model alone, and 2 false positives + 1 false negative when first processing the image through YOLOv8x and filtering for [car, truck, motorcycle, bus].
- Currently runs in production on TorchServe.
- False positives: rectangles with lines (like grates). False negatives: very low-quality images, or images in which the plate was already very small and disappeared when the image was resized to 640.
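A minimal sketch of the training run and the vehicle-first inference filter described above, using the ultralytics API; the dataset YAML, image path, and base weights are placeholders, and the crop-based filtering is a simplified illustration rather than the exact production pipeline:

```python
# Hypothetical sketch of the training run and vehicle-first inference filter;
# file paths and the base weights are placeholders.
from ultralytics import YOLO

# Training: parameters mirror the update (imgsz=640, batch=16, dropout=0.2, 150 epochs).
plate_model = YOLO("yolov8n.pt")
plate_model.train(data="diverse_lpd.yaml", imgsz=640, batch=16, dropout=0.2, epochs=150)

# Two-stage inference: detect vehicles first with YOLOv8x, then look for plates
# only inside vehicle crops (COCO classes: 2=car, 3=motorcycle, 5=bus, 7=truck).
vehicle_model = YOLO("yolov8x.pt")
vehicle_results = vehicle_model("street_scene.jpg", classes=[2, 3, 5, 7])[0]

for box in vehicle_results.boxes.xyxy.tolist():
    x1, y1, x2, y2 = map(int, box)
    crop = vehicle_results.orig_img[y1:y2, x1:x2]
    plate_results = plate_model(crop)[0]
    print(plate_results.boxes.xyxy)
```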
Diversity: The datasets sampled for this "diverse" one contain plates from a variety of countries, mainly focusing on the ones from this site, which are:
- European (+Russian)
- American
- Middle Eastern
- Indian
Sources (in no particular order):
- https://www.kaggle.com/datasets/amriteshtiwari20/truck-licenseplate-dataset
- https://www.kaggle.com/datasets/gaelcohen/license-plate-israel
- https://www.kaggle.com/datasets/kedarsai/indian-license-plates-with-labels
- https://www.kaggle.com/datasets/mohamedalitrabelsi/tunisiania-license-plate-detection
- https://www.kaggle.com/datasets/mrabduqayum/license-plate-detection-yolov5
- https://www.kaggle.com/datasets/akhiljethwa/forest-vs-desert
- https://www.kaggle.com/datasets/balraj98/stanford-background-dataset
- https://www.kaggle.com/datasets/mikhailma/house-rooms-streets-image-dataset
- https://www.kaggle.com/datasets/paulchambaz/google-street-view
- https://www.kaggle.com/datasets/pcmill/license-plates-on-vehicles
- https://www.kaggle.com/datasets/psvishnu/pennfudan-database-for-pedestrian-detection-zip
- https://www.kaggle.com/datasets/rayanechibani/dataset
- https://www.kaggle.com/datasets/yellowj4acket/car-license-plate-detection-yolo-orig-by-larxel
- https://www.kaggle.com/datasets/amirhoseinahmadnejad/car-license-plate-detection-iran
- https://www.kaggle.com/datasets/aritrag/license
- https://www.kaggle.com/datasets/marshah/license-plate-persian-coco
- https://www.kaggle.com/datasets/narimanjabbar/dataset-iraq-license-plate
- https://www.kaggle.com/datasets/nuralitileuov/car-license-plate
- https://www.kaggle.com/datasets/saisirishan/indian-vehicle-dataset
- https://www.kaggle.com/datasets/sushankg...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was developed to support research on object detection and recognition, focusing on items forgotten inside vehicles. It captures a diverse range of real-world scenarios under different lighting conditions, both indoors and outdoors, to ensure robustness and applicability in various analytical tasks.
The collection contains 971 high-quality images featuring everyday objects such as: 0 - smartphone, 1 - laptop, 2 - card, 3 - suitcase, 4 - wallet, 5 - backpack, 6 - clothing, 7 - keys, 8 - glasses, 9 - handbag.
The dataset is organized into two main directories (inside leftincar-data.zip):
➔ images/ – contains all visual samples.
➔ labels/ – includes YOLO-format annotation files (.txt), one per image. Images without annotations correspond to negative samples (no objects present).
An additional Python script, yolo_dataset_splitter.py, is provided to automate the division of the dataset into training, validation, and testing subsets. The script ensures that all images are included in the output, creating empty label files where necessary for full YOLO compatibility.
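For illustration, a minimal sketch of the splitting idea (this is not the provided yolo_dataset_splitter.py): images are distributed into train/val/test folders and an empty label file is created for every image without annotations, so negative samples remain YOLO-compatible. The 80/10/10 ratio and directory layout are assumptions:

```python
# Hypothetical illustration of the splitting idea; not the dataset's own script.
import random
import shutil
from pathlib import Path

random.seed(0)
root = Path("leftincar-data")          # extracted dataset root (assumed layout)
images = sorted((root / "images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.8 * n)],
    "val": images[int(0.8 * n): int(0.9 * n)],
    "test": images[int(0.9 * n):],
}

for split, split_images in splits.items():
    img_dir = root / split / "images"
    lbl_dir = root / split / "labels"
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    for img in split_images:
        shutil.copy(img, img_dir / img.name)
        src_label = root / "labels" / f"{img.stem}.txt"
        dst_label = lbl_dir / f"{img.stem}.txt"
        if src_label.exists():
            shutil.copy(src_label, dst_label)
        else:
            dst_label.touch()  # empty label file = negative sample
```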
Experimental results of the object detection task on the COCO dataset.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This collection contains two subfolders: faster_rcnn_r50_fpn_adhd/ and faster_rcnn_r50_gfap/. Both contain the training configuration, loss curves, metrics, and evaluations on each individual test set against the GT annotations. We share these results for reproducibility.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Significant advancements in object detection have transformed our understanding of everyday applications. These developments have been successfully deployed in real-world scenarios, such as vision surveillance systems and autonomous vehicles. Object recognition technologies have evolved from traditional methods to sophisticated, modern approaches. Contemporary object detection systems achieve high accuracy and promising results in identifying objects of interest in images and videos. The ability of Convolutional Neural Networks (CNNs) to emulate human-like vision has garnered considerable attention. This study provides a comprehensive review and evaluation of CNN-based object detection techniques, emphasizing the advancements in deep learning that have significantly improved model performance. It analyzes 1-stage, 2-stage, and hybrid approaches for object recognition, localization, classification, and identification, focusing on CNN architecture, backbone design, and loss functions. The findings reveal that while 2-stage and hybrid methods achieve superior accuracy and detection precision, 1-stage methods offer faster processing and lower computational complexity, making them advantageous in specific real-time applications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of small-scale objects on the DUO dataset.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Wildfire image and video dataset designed for image classification and object detection tasks, aiding in wildfire prevention through advanced AI models. The dataset includes both processed and unprocessed images and videos, capturing diverse wildfire scenarios. It is a work in progress and will be continuously monitored and updated to ensure its relevance and usefulness for research and practical applications.
Usage: Training and Testing Neural Network models.
Structure:
1- Images:
   a. Flame
   b. Smoke
   c. Negative Samples
   d. Waterdogs
   e. NIR Fire
   f. NIR no Fire
   g. Unlabeled
2- Videos:
   a. Flame/Smoke
   b. Negative Samples
   c. Waterdogs
   d. Ember
   e. Unlabeled
3- Training:
   a. Training
      i. Flame
      ii. Smoke
      iii. Negative Samples
   b. Validation
      i. Flame
      ii. Smoke
      iii. Negative Samples
How it was collected and processed: GWFP was manually collected from publicly available sources. It will be published upon paper approval. The dataset has processed and unprocessed files as described in the structure. Image processing includes augmentation and/or resizing to 224x224 (a minimal preprocessing sketch follows). All data is continuously maintained and more data will be added.
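As an illustration of the kind of processing mentioned above (resizing to 224x224 with simple augmentation), a minimal Pillow sketch; the folder names are placeholders and this is not the dataset's actual pipeline:

```python
# Hypothetical sketch of resize-to-224x224 preprocessing with a simple
# horizontal-flip augmentation; not the dataset's actual processing pipeline.
from pathlib import Path
from PIL import Image, ImageOps

src_dir = Path("Images/Flame")          # placeholder input folder
dst_dir = Path("Processed/Flame")
dst_dir.mkdir(parents=True, exist_ok=True)

for path in src_dir.glob("*.jpg"):
    img = Image.open(path).convert("RGB").resize((224, 224))
    img.save(dst_dir / path.name)
    # Simple augmentation: mirrored copy of each resized image.
    ImageOps.mirror(img).save(dst_dir / f"{path.stem}_flip.jpg")
```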
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of annotations for PACO images containing free-form fine-grained textual captions of objects, their parts, and their attributes. It also comprises several sets of negative captions that can be used to test and evaluate the fine-grained recognition ability of open-vocabulary models.
https://www.technavio.com/content/privacy-notice
Image Recognition Market Size 2024-2028
The image recognition market size is forecast to increase by USD 111.45 billion at a CAGR of 25.49% from 2023 to 2028. Increasing instances of identity threats will drive the image recognition market.
Major Market Trends & Insights
North America dominated the market and accounted for 36% of the growth during the forecast period.
By End-user - Media and entertainment segment was valued at USD 9.10 billion in 2022
By Deployment - Cloud-based segment accounted for the largest market revenue share in 2022
Market Size & Forecast
Market Opportunities: USD 471.72 billion
Market Future Opportunities: USD 111.45 billion
CAGR : 25.49%
North America: Largest market in 2022
Market Summary
The market is a dynamic and continually evolving industry, driven by advancements in core technologies such as deep learning and neural networks. Applications of image recognition span various sectors, including healthcare, retail, and security, with the growing threat of identity theft fueling demand. According to recent studies, the identity theft market is projected to reach a staggering USD 20.31 billion by 2023, creating a significant need for robust image recognition solutions. Cloud-based image analysis solutions are gaining popularity due to their cost-effective deployment and scalability. However, the high cost of initial deployment and integration remains a challenge for smaller businesses.
Furthermore, regulatory compliance, particularly in the healthcare sector, adds complexity to the market landscape. Despite these challenges, opportunities abound, with the potential for expansion into new industries and applications. In conclusion, the market is a rapidly evolving industry, shaped by technological advancements, market demands, and regulatory requirements. With the increasing instances of identity threats and the growing popularity of cloud-based image analysis solutions, the market is poised for continued growth and innovation.
What will be the Size of the Image Recognition Market during the forecast period?
How is the Image Recognition Market Segmented and what are the key trends of market segmentation?
The image recognition industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
End-user
Media and entertainment
Retail and e-commerce
BFSI
IT and telecom
Others
Deployment
Cloud-based
On-premise
Geography
North America
US
Europe
Germany
APAC
China
India
Japan
Rest of World (ROW)
By End-user Insights
The media and entertainment segment is estimated to witness significant growth during the forecast period.
The market is experiencing significant growth, with the media and entertainment sector leading the charge. This sector's dominance is attributed to the increasing adoption of facial recognition technology in Video Surveillance systems. By analyzing audience engagement at multiplexes, businesses can deliver targeted promotions, enhance visitor experiences, and create an improved ambiance. Facial recognition technology's demand continues to expand globally, fueling the growth of the media and entertainment segment and, in turn, the market as a whole. Moreover, advancements in computer vision techniques, such as pose estimation algorithms, visual search engines, and image segmentation approaches, are driving innovation in image recognition.
GPU accelerated computing, Machine learning libraries, and deep learning frameworks are enabling more sophisticated image processing capabilities. These advancements are also leading to improvements in accuracy metrics evaluation, pattern recognition systems, and object detection algorithms. Additionally, synthetic image generation, false positive reduction, and multimodal image analysis are gaining traction in the market. These technologies enhance image quality assessment, transfer learning models, and data augmentation strategies. Furthermore, the integration of Video Analytics platforms, model optimization techniques, and real-time image processing is enabling more efficient and effective image recognition applications. The market's dynamism is also evident in the emergence of Edge Computing solutions, object tracking systems, semantic image understanding, and image retrieval systems.
Cloud-based image processing, low-light image enhancement, and false negative minimization are other areas of ongoing development. Overall, the market is continuously evolving, with numerous applications across various sectors, including healthcare, retail, and security. According to recent studies, the market is expected to grow by 25.3% in the next y
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Smoke and fire detection technology is a key technology for automated forest monitoring and forest fire warning. One of the most popular algorithms for object detection tasks is YOLOv5. However, it suffers from some challenges, such as high computational load and limited detection performance. This paper proposes a high-performance lightweight network model for detecting forest smoke and fire based on YOLOv5 to overcome these problems. C3Ghost and Ghost modules are introduced into the Backbone and Neck networks to reduce network parameters and improve feature expressiveness. A Coordinate Attention (CA) module is introduced into the Backbone network to highlight important smoke and fire information and suppress irrelevant background information. In the Neck network, to distinguish the importance of different features during feature fusion, a feature-fusion weight parameter is added to the PAN (path aggregation network) structure; this variant is named PAN-weight. Multiple sets of controlled experiments were conducted to confirm the proposed method’s performance. Compared with YOLOv5s, the proposed method reduced the model size and FLOPs by 44.75% and 47.46% respectively, while increasing precision and mAP@0.5 (mean average precision) by 2.53% and 1.16% respectively. The experimental results demonstrated the usefulness and superiority of the proposed method. The core code and dataset required for the experiment are available at https://github.com/vinchole/zzzccc.git.
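For orientation, a minimal PyTorch sketch of the Ghost convolution idea referenced above: a small primary convolution produces part of the output channels and a cheap depthwise convolution generates the remaining "ghost" feature maps. This follows the general GhostNet formulation and is not the paper's exact C3Ghost/Ghost implementation:

```python
# Minimal Ghost convolution sketch (GhostNet-style), for illustration only;
# the paper's C3Ghost/Ghost modules in YOLOv5 differ in details.
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, c_in, c_out, kernel_size=1, ratio=2, dw_size=3):
        super().__init__()
        primary_channels = c_out // ratio            # "real" feature maps
        cheap_channels = c_out - primary_channels    # "ghost" feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, primary_channels, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(primary_channels),
            nn.SiLU(),
        )
        # Cheap operation: depthwise conv generates ghosts from the primary maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_channels, cheap_channels, dw_size,
                      padding=dw_size // 2, groups=primary_channels, bias=False),
            nn.BatchNorm2d(cheap_channels),
            nn.SiLU(),
        )

    def forward(self, x):
        primary = self.primary(x)
        ghosts = self.cheap(primary)
        return torch.cat([primary, ghosts], dim=1)

# Quick shape check.
y = GhostConv(64, 128)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 32, 32])
```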
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data was constructed for detecting window and blind states. All images were annotated in XML format using LabelImg for object detection tasks. The results of applying the Faster R-CNN based model include detected images and loss graphs for both training and validation in this dataset. Additionally, the raw data with other annotations can be used for applications such as semantic segmentation and image captioning.
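Since the annotations are LabelImg-style XML (Pascal VOC format), here is a minimal sketch of reading one annotation file into class names and bounding boxes; the file name is a placeholder:

```python
# Hypothetical sketch: parsing a LabelImg (Pascal VOC style) XML annotation
# into (class name, bounding box) pairs; the file name is a placeholder.
import xml.etree.ElementTree as ET

root = ET.parse("annotations/window_0001.xml").getroot()

for obj in root.findall("object"):
    name = obj.findtext("name")
    bbox = obj.find("bndbox")
    box = tuple(int(float(bbox.findtext(tag)))
                for tag in ("xmin", "ymin", "xmax", "ymax"))
    print(name, box)
```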
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The VHR-10 dataset is a collection of Very High Resolution remote sensing images of 10 classes, provided by Northwestern Polytechnical University (NWPU) in China.
The dataset is divided into two sets: a positive image set, in which every image contains at least one target object, and a negative image set with no targets of the given classes.
The positive image set includes objects from the following ten classes: airplane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, and vehicle.
It includes object detection bounding boxes and instance segmentation masks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the rapid development of ocean observation technology, underwater object detection has begun to occupy an essential position in the fields of aquaculture, environmental monitoring, marine science, etc. However, due to the problems unique to underwater images such as severe noise, blurred objects, and multi-scale, deep learning-based target detection algorithms lack sufficient capabilities to cope with these challenges. To address these issues, we improve DETR to make it well suited for underwater scenarios. First, a simple and effective learnable query recall mechanism is proposed to mitigate the effect of noise and can significantly improve the detection performance of the object. Second, for underwater small and irregular object detection, a lightweight adapter is designed to provide multi-scale features for the encoding and decoding stages. Third, the regression mechanism of the bounding box is optimized using the combination loss of smooth L1 and CIoU. Finally, we validate the designed network against other state-of-the-art methods on the RUOD dataset. The experimental results show that the proposed method is effective.
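As an illustration of the combined box-regression loss mentioned above (smooth L1 plus CIoU), a minimal PyTorch sketch using torchvision's complete_box_iou_loss; the loss weights are placeholders, not the paper's settings:

```python
# Hypothetical sketch of a combined smooth L1 + CIoU box regression loss;
# the weights w_l1 and w_ciou are placeholders, not the paper's values.
import torch
import torch.nn.functional as F
from torchvision.ops import complete_box_iou_loss

def box_regression_loss(pred_boxes, target_boxes, w_l1=1.0, w_ciou=1.0):
    """pred_boxes, target_boxes: (N, 4) tensors in (x1, y1, x2, y2) format."""
    l1 = F.smooth_l1_loss(pred_boxes, target_boxes, reduction="mean")
    ciou = complete_box_iou_loss(pred_boxes, target_boxes, reduction="mean")
    return w_l1 * l1 + w_ciou * ciou

# Quick check with dummy boxes.
pred = torch.tensor([[10.0, 10.0, 50.0, 60.0]], requires_grad=True)
target = torch.tensor([[12.0, 8.0, 48.0, 62.0]])
loss = box_regression_loss(pred, target)
loss.backward()
print(loss.item())
```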
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset holds the trained deep learning models for our paper "Object detection neural network improves Fourier ptychography reconstruction". The results produced in the paper can be replicated through the use of these models in conjunction with the inference scripts provided in our GitHub repository: External Link. Abstract High resolution microscopy is heavily dependent on superb optical elements and superresolution microscopy even more so. Correcting unavoidable optical aberrations during post-processing is an elegant method to reduce the optical system’s complexity. A prime method that promises superresolution, aberration correction, and quantitative phase imaging is Fourier ptychography. This microscopy technique combines many images of the sample, recorded at differing illumination angles akin to computed tomography and uses error minimisation between the recorded images with those generated by a forward model. The more precise knowledge of those illumination angles is available for the image formation forward model, the better the result. Therefore, illumination estimation from the raw data is an important step and supports correct phase recovery and aberration correction. Here, we derive how illumination estimation can be cast as an object detection problem that permits the use of a fast convolutional neural network (CNN) for this task. We find that faster-RCNN delivers highly robust results and outperforms classical approaches by far with an up to 3-fold reduction in estimation errors. Intriguingly, we find that conventionally beneficial smoothing and filtering of raw data is counterproductive in this type of application. We present a detailed analysis of the network’s performance and provide all our developed software openly.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Most object detection models deployed on unmanned aerial systems (UAS) focus on identifying objects in the visible spectrum, also known as Red-Green-Blue (RGB) imagery, due to availability and cost. Fusing long wave infrared (LWIR) images with RGB imagery can increase the resiliency and overall performance of a machine learning (ML) object detection model in changing environments. Because LWIR based ML models are not commonly utilized or studied there exists a gap in the literature that discusses performance metrics between LWIR, RGB and LWIR-RGB fused ML object detection models from various ground and air collection platforms. Understanding multispectral ML object detection performance from UAS is highly meaningful because of the increasing convergence of ML and UAS technologies. Therefore, the need to establish baseline performance metrics for how certain ML models perform on various ground and air platforms is necessary for effective implementation and deployment of these two technologies. Using object detection results from both ground and air platforms, this research provided a baseline RGB ML model mean average precision (mAP) of 95.1%, a LWIR ML model mAP of 94.5%, and a blended RGB-LWIR ML model mAP of 92.6%. This research contributes to the literature a labelled training dataset of 12,600 images for ground-based and air-based RGB, LWIR, and RGB-LWIR fused imagery, to encourage further multispectral machine-driven object detection research from both air and ground platforms.
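As a simple illustration of one way aligned RGB and LWIR frames can be blended into a single fused detector input (the study's exact fusion procedure is not described here), a short OpenCV sketch with placeholder file names and an arbitrary 50/50 weighting:

```python
# Hypothetical sketch: blending aligned RGB and LWIR frames into a single
# fused image; file names and the 50/50 weighting are placeholders, not the
# study's actual fusion procedure.
import cv2

rgb = cv2.imread("frames/rgb_0001.png")                      # 3-channel visible image
lwir = cv2.imread("frames/lwir_0001.png", cv2.IMREAD_GRAYSCALE)

# Resize the thermal frame to match the RGB resolution (assumes co-registered views).
lwir = cv2.resize(lwir, (rgb.shape[1], rgb.shape[0]))
lwir_bgr = cv2.cvtColor(lwir, cv2.COLOR_GRAY2BGR)

# Simple weighted blend; a trained fusion network or channel stacking could be used instead.
fused = cv2.addWeighted(rgb, 0.5, lwir_bgr, 0.5, 0)
cv2.imwrite("frames/fused_0001.png", fused)
```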