Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The IMPTOX project has received funding from the EU's H2020 framework programme for research and innovation under grant agreement No. 965173. IMPTOX is part of the European MNP cluster on human health.
More information about the project here.
Description: This repository includes the trained weights and a custom COCO-formatted dataset used for developing and testing a Faster R-CNN R_50_FPN_3x object detector, specifically designed to identify particles in micro-FTIR filter images.
Contents:
Weights File (neuralNetWeights_V3.pth):
Format: .pth
Description: This file contains the trained weights for a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN), trained with the 3x schedule. These weights are specifically tuned for detecting particles in micro-FTIR filter images.
Custom COCO Dataset (uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip):
Format: .zip
Description: This zip archive contains a custom COCO-formatted dataset, including JPEG images and their corresponding annotation file. The dataset consists of images of micro-FTIR filters with annotated particles.
Contents:
Images: JPEG format images of micro-FTIR filters.
Annotations: A JSON file in COCO format providing detailed annotations of the particles in the images.
Management: The dataset can be managed and manipulated using the Pycocotools library, facilitating easy integration with existing COCO tools and workflows.
Applications: The provided weights and dataset are intended for researchers and practitioners in the field of microscopy and particle detection. The dataset and model can be used for further training, validation, and fine-tuning of object detection models in similar domains.
Usage Notes:
The neuralNetWeights_V3.pth file should be loaded into a PyTorch model compatible with the Faster R-CNN architecture, for example via Detectron2.
The contents of uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip should be extracted and can be used with any COCO-compatible object detection framework for training and evaluation purposes.
Code can be found on the related GitHub repository.
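As a rough illustration of the usage notes above, the following Python sketch registers the extracted COCO dataset and loads the weights with Detectron2. All paths, the annotation file name, the single-class assumption and the score threshold are placeholders rather than values documented in this record.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultPredictor

# dataset extracted from the zip archive (paths and file names are placeholders)
register_coco_instances("uftir_particles", {},
                        "uftir/train/_annotations.coco.json", "uftir/train")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1          # assumption: a single "particle" class
cfg.MODEL.WEIGHTS = "neuralNetWeights_V3.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # arbitrary threshold
cfg.MODEL.DEVICE = "cpu"                     # use "cuda" if a GPU is available

predictor = DefaultPredictor(cfg)
image = cv2.imread("uftir/train/example_filter.jpg")   # placeholder image path
print(predictor(image)["instances"].pred_boxes)
```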
Here We Grow is a community learning project for the SingularityNET blockchain.
Our group set out to synthesize labeled data for computer vision tasks. Adam Kelly, from Immersive Limit, found a great way of doing this: Microsoft's Common Objects in Context (COCO) annotations, Matterport's Mask R-CNN for image segmentation, and the GIMP photo editor. This dataset is for a Kaggle implementation of the Udemy course materials from "Complete Guide to Creating COCO Datasets" by Adam Kelly.
Course can be found here: https://www.immersivelimit.com/courses
The Matterport Mask R-CNN GitHub repository: https://github.com/matterport/Mask_RCNN
CocoSynth data by Adam Kelly: https://www.kaggle.com/stargarden/cocosynth-for-here-we-grow
These notebooks are from the course:
Training and inference notebook: https://www.kaggle.com/stargarden/m-r-cnn-matterport-1
COCO image viewer notebook: https://www.kaggle.com/stargarden/coco-image-viewer
Almost all of this code is borrowed from the Udemy course "Complete Guide to Creating COCO Datasets" by Immersive Limit and from Matterport's Mask R-CNN.
https://github.com/matterport/Mask_RCNN/blob/master/LICENSE
Hopefully this can supercharge some deep learning projects and contests. The ability to bootstrap extra images from the original dataset is pretty powerful.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Most semantic segmentation works have obtained accurate segmentation results by exploring contextual dependencies. However, several major limitations need further investigation. For example, most approaches rarely distinguish different types of contextual dependencies, which may pollute scene understanding. Moreover, local convolutions are commonly used in deep learning models to learn attention and capture local patterns in the data. These convolutions operate on a small neighborhood of the input, focusing on nearby information and disregarding global structural patterns. To address these concerns, we propose a Global Domain Adaptation Attention with Data-Dependent Regulator (GDAAR) method to explore contextual dependencies. Specifically, to effectively capture both global distribution information and local appearance details, we suggest a stacked relation approach. This involves incorporating the feature node itself and its pairwise affinities with all other feature nodes within the network, arranged in raster scan order. By doing so, we can learn a global domain adaptation attention mechanism. Meanwhile, to improve the similarity of features belonging to the same segment region while keeping the discriminative power of features belonging to different segments, we design a data-dependent regulator to adjust the global domain adaptation attention on the feature map during inference. Extensive ablation studies demonstrate that our GDAAR better captures global distribution information for contextual dependencies and achieves state-of-the-art performance on several popular benchmarks.
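As a very rough, hypothetical illustration of the stacked-relation idea (each feature node concatenated with its pairwise affinities to all other nodes in raster-scan order), here is a toy PyTorch sketch; the dot-product affinity and the softmax normalisation are assumptions, and this is not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def stacked_relation(feat):
    """feat: (B, C, H, W) feature map -> (B, C + H*W, H, W) stacked relation."""
    b, c, h, w = feat.shape
    nodes = feat.flatten(2)                                # (B, C, N), N = H*W, raster-scan order
    affinity = torch.einsum("bcn,bcm->bnm", nodes, nodes)  # pairwise affinities (B, N, N)
    affinity = F.softmax(affinity, dim=-1)                 # normalise each node's affinity row
    stacked = torch.cat([nodes, affinity.transpose(1, 2)], dim=1)  # (B, C + N, N)
    return stacked.view(b, c + h * w, h, w)

x = torch.randn(2, 64, 16, 16)
print(stacked_relation(x).shape)   # torch.Size([2, 320, 16, 16])
```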
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Motivation: I have to count all the chickens every day to check if they are all in the hencoop. As it's hard to count them in the nest, I decided to count them using CV. I will be placing the camera above the door, thus most of the photos are taken from an overhead angle.
You can find the annotation file, named _annotations.coco.json, in the train folder.
I did not split the dataset here and the annotation format is COCO, but you can find the other versions and formats here: https://universe.roboflow.com/training-kuvo9/chicken-counter
Dataset:
- Size: 375 images
- Image shape: 640x640x3
- Annotation format: COCO (JSON)
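A minimal sketch of reading the annotations with pycocotools (assuming the dataset has been downloaded so that train/_annotations.coco.json exists as described above) and counting the annotated chickens per image:

```python
from pycocotools.coco import COCO

coco = COCO("train/_annotations.coco.json")   # annotation file shipped in the train folder

for img_id in coco.getImgIds():
    info = coco.loadImgs(img_id)[0]
    n_chickens = len(coco.getAnnIds(imgIds=img_id))
    print(f"{info['file_name']}: {n_chickens} chickens")
```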
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This synthetic Siberian Larch tree crown dataset was created for upscaling and machine learning purposes as part of the SiDroForest (Siberia Drone Forest Inventory) project. The SiDroForest data collection (https://www.pangaea.de/?q=keyword%3A%22SiDroForest%22) consists of vegetation plots covered in Siberia during a two-month fieldwork expedition in 2018 by the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research in Germany. During fieldwork, fifty-six 50 x 50 m vegetation plots were covered by Unmanned Aerial Vehicle (UAV) flights, and Red Green Blue (RGB) and Red Green Near Infrared (RGNIR) photographs were taken with a consumer-grade DJI Phantom 4 quadcopter.

The synthetic dataset provided here contains Larch (Larix gmelinii (Rupr.) Rupr. and Larix cajanderi Mayr.) tree crowns extracted from the onboard-camera RGB UAV images of five selected vegetation plots from this expedition, placed on top of full-sized images from the same RGB flights. The extracted tree crowns have been rotated, rescaled and repositioned across the images, resulting in a diverse synthetic dataset that contains 10,000 images for training purposes and 2,000 images for validation purposes for complex machine learning neural networks. In addition, the data is saved in Microsoft's Common Objects in Context (COCO) format (Lin et al., 2013) and can easily be loaded as a dataset for networks such as Mask R-CNN, U-Nets or Faster R-CNN. These are neural networks for instance segmentation tasks that have become more frequently used over the years for forest monitoring purposes.

The images included in this dataset are from the field plots EN18062 (62.17° N 127.81° E), EN18068 (63.07° N 117.98° E), EN18074 (62.22° N 117.02° E), EN18078 (61.57° N 114.29° E) and EN18083 (59.97° N 113° E), located in Central Yakutia, Siberia. These sites were selected based on their vegetation content, their spectral differences in color, the UAV flight angles and the clarity of the UAV images, which were taken with automatic shutter and white balancing (Brieger et al. 2019). From each site, 35 images were selected in order of acquisition, starting at the fifteenth image in the flight, to make up the backgrounds for the dataset; the first fifteen images were excluded because they often contain a visual representation of the research team. The 117 tree crowns were manually cut out in the GIMP software to ensure that they were all Larix trees. Of the tree crowns, 15% are at the margin of the image, to make sure that the algorithm does not rely on a full tree crown in order to detect a tree.

As background images for the extracted tree crowns, 35 raw UAV images from each of the five sites were included. The images were selected based on their content: in some of the UAV images the research teams are visible, and those have been excluded from this dataset. The raw UAV images were cropped to 640 by 480 pixels at a resolution of 72 dpi and are later rescaled to 448 by 448 pixels in the process of the dataset creation. In total there were 175 cropped backgrounds. The synthetic images and their corresponding annotations and masks were created using the cocosynth Python software provided by Adam Kelly (2019). The software is open source and available on GitHub: https://github.com/akTwelve/cocosynth.
The software takes the tree crowns and rescales and transforms them before placing up to three tree crowns on the backgrounds that were provided. The software also creates matching masks that are used by instance segmentation and object detection algorithms to learn the shapes and locations of the synthetic crowns. COCO annotation files with information about each crown's name and label are also generated. This format can be loaded into a variety of neural networks for training purposes.
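As a small, hedged example of how such a COCO-formatted synthetic dataset can be consumed, the sketch below loads the annotations with pycocotools and rebuilds the instance masks for one image; the file and folder names are placeholders, not the names used inside this archive.

```python
import numpy as np
from pycocotools.coco import COCO

coco = COCO("train/coco_instances.json")      # placeholder annotation file name
img_info = coco.loadImgs(coco.getImgIds()[0])[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_info["id"]))

# union of all synthetic crown masks in this 448 x 448 image
mask = np.zeros((img_info["height"], img_info["width"]), dtype=np.uint8)
for ann in anns:
    mask |= coco.annToMask(ann)
print(img_info["file_name"], "crowns:", len(anns), "mask pixels:", int(mask.sum()))
```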
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
New in Version 2: to our knowledge (01/2023), it is the largest high-quality dataset of its kind, with a minimum image size of 400x400.
The dataset, called RIWA, provides pixel-wise binary river water segmentation. It consists of manually labelled smartphone, drone and DSLR images of rivers, as well as suitable images from the Water Segmentation Dataset and high-quality ADE20K images. The COCO images were withdrawn since their segmentation quality is extremely poor.
Version 2 (declared as Version 4 by Kaggle):
- Contains 1142 training, 167 validation and 323 test images.
- Min size: 400 x 400 (h x w)
- High-quality segmentations. If you find an error, please message us.
Version 1:
- Contains 789 training, 228 validation and 111 test images.
- Min size: 174 x 200 (h x w)
- Some segmentations are not perfect.
If you use this dataset, please cite as:
@misc{RIWA_Dataset,
title={River Water Segmentation Dataset (RIWA)},
url={https://www.kaggle.com/dsv/4901781},
DOI={10.34740/KAGGLE/DSV/4901781},
publisher={Kaggle},
author={Xabier Blanch and Franz Wagner and Anette Eltner},
year={2023}
}
Contact: Xabier Blanch (TU Dresden), Franz Wagner (TU Dresden), Anette Eltner (TU Dresden)
In 2023, we carried out a comparison to find the best CNN on this domain. If you are interested, please see our paper: River water segmentation in surveillance camera images: A comparative study of offline and online augmentation using 32 CNNs.
We conducted the tests using the AiSeg GitLab repository. It is capable of interactively training 2D and 3D CNNs, augmenting data with offline and online augmentation, analyzing single networks, comparing multiple networks, and applying trained CNNs to new data. The RIWA dataset can be used directly.
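Outside of AiSeg, the images and binary masks can also be consumed directly, for example as a PyTorch Dataset. The sketch below assumes a hypothetical layout of matching images/ and masks/ folders per split; the real archive structure may differ.

```python
from pathlib import Path
from PIL import Image
import numpy as np
import torch
from torch.utils.data import Dataset

class RIWADataset(Dataset):
    """Image/mask pairs for binary river water segmentation (layout assumed)."""
    def __init__(self, root):
        self.images = sorted(Path(root, "images").glob("*"))
        self.masks = sorted(Path(root, "masks").glob("*"))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = np.array(Image.open(self.images[idx]).convert("RGB"))
        mask = np.array(Image.open(self.masks[idx]).convert("L"))
        img = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
        mask = torch.from_numpy(mask > 0).long()   # water = 1, background = 0
        return img, mask
```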
The handling of natural disasters, especially heavy rainfall and corresponding floods, requires special demands on emergency services. The need to obtain a quick, efficient and real-time estimation of the water level is critical for monitoring a flood event. This is a challenging task and usually requires specially prepared river sections. In addition, in heavy flood events, some classical observation methods may be compromised.
With the technological advances derived from image-based observation methods and segmentation algorithms based on neural networks (NN), it is possible to generate real-time, low-cost monitoring systems. This new approach makes it possible to densify the observation network, improving flood warning and management. In addition, images can be obtained by remotely positioned cameras, preventing data loss during a major event.
The workflow we have developed for real-time monitoring consists of the integration of 3 different techniques. The first step consists of a topographic survey using Structure from Motion (SfM) strategies. In this stage, images of the area of interest are obtained from both terrestrial cameras and UAVs. The survey is completed by obtaining ground control point coordinates with multi-band GNSS equipment. The result is a 3D SfM model georeferenced to centimetre accuracy that allows us to reconstruct not only the river environment but also the riverbed.
The second step consists of segmenting the images obtained with a surveillance camera installed ad hoc to monitor the river. This segmentation is achieved with the use of convolutional neural networks (CNNs). The aim is to automatically segment the time-lapse images obtained every 15 minutes. We have carried out this research by testing different CNNs to choose the most suitable structure for river segmentation, adapted to each study area and at each time of the day (day and night).
The third step is based on the integration between the automatically segmented images and the 3D model acquired. The CNN-segmented river boundary is projected into the 3D SfM model to obtain a metric result of the water level based on the point of the 3D model closest to the image ray.
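The following numpy sketch illustrates the geometric idea of this third step for a single water-boundary pixel: cast the camera ray into the georeferenced SfM point cloud and read the elevation of the closest point. The function name, the pinhole model and the world-to-camera pose convention are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def water_level_from_pixel(pixel, K, R, t, cloud):
    """pixel: (u, v); K: 3x3 intrinsics; R, t: world-to-camera pose;
    cloud: (N, 3) SfM points in world coordinates. Returns (point, z)."""
    cam_center = -R.T @ t                                   # camera position in world frame
    ray_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray_world = R.T @ ray_cam
    ray_world /= np.linalg.norm(ray_world)
    # perpendicular distance of every cloud point to the camera ray
    rel = cloud - cam_center
    along = rel @ ray_world
    perp = rel - np.outer(along, ray_world)
    dist = np.linalg.norm(perp, axis=1)
    dist[along < 0] = np.inf                                # ignore points behind the camera
    nearest = cloud[np.argmin(dist)]
    return nearest, nearest[2]                              # z as the approximate water level
```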
The possibility of automating the segmentation and reprojection in the 3D model will allow the generation of a robust centimetre-accurate workflow, capable of estimating the water level in near real time both day and night. This strategy represents the basis for a better understanding of river flo...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Existing Earth Vision datasets are either suitable for semantic segmentation or object detection. iSAID is the first benchmark dataset for instance segmentation in aerial images. This large-scale and densely annotated dataset contains 655,451 object instances for 15 categories across 2,806 high-resolution images. The distinctive characteristics of iSAID are the following: (a) large number of images with high spatial resolution, (b) fifteen important and commonly occurring categories, (c) large number of instances per category, (d) large count of labelled instances per image, which might help in learning contextual information, (e) huge object scale variation, containing small, medium and large objects, often within the same image, (f) Imbalanced and uneven distribution of objects with varying orientation within images, depicting real-life aerial conditions, (g) several small size objects, with ambiguous appearance, can only be resolved with contextual reasoning, (h) precise instance-level annotations carried out by professional annotators, cross-checked and validated by expert annotators complying with well-defined guidelines.
The images of iSAID are the same as those of the DOTA-v1.0 dataset. They are mainly collected from Google Earth; some are taken by the JL-1 satellite, and others by the GF-2 satellite of the China Centre for Resources Satellite Data and Application.
Use of the images from Google Earth must respect the corresponding terms of use: "Google Earth" terms of use.
All images and their associated annotations in iSAID can be used for academic purposes only, but any commercial use is prohibited.
Object categories: The object categories in iSAID include: plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field and swimming pool.
Annotation format: iSAID uses pixel-level annotations; each pixel represents a particular class. The annotations follow the format of MS COCO.
@inproceedings{waqas2019isaid,
title={iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images},
author={Waqas Zamir, Syed and Arora, Aditya and Gupta, Akshita and Khan, Salman and Sun, Guolei and Shahbaz Khan, Fahad and Zhu, Fan and Shao, Ling and Xia, Gui-Song and Bai, Xiang},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops},
pages={28--37},
year={2019}
}
@InProceedings{Xia_2018_CVPR,
author = {Xia, Gui-Song and Bai, Xiang and Ding, Jian and Zhu, Zhen and Belongie, Serge and Luo, Jiebo and Datcu, Mihai and Pelillo, Marcello and Zhang, Liangpei},
title = {DOTA: A Large-Scale Dataset for Object Detection in Aerial Images},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
This paper presents SubPipe, an underwater dataset for SLAM, object detection, and image segmentation. SubPipe has been recorded using a lightweight autonomous underwater vehicle (LAUV), operated by OceanScan MST, and carrying a sensor suite including two cameras, a side-scan sonar, and an inertial navigation system, among other sensors. The AUV has been deployed in a pipeline inspection environment with a submarine pipe partially covered by sand. The AUV's pose ground truth is estimated from the navigation sensors. The side-scan sonar and RGB images include object detection and segmentation annotations, respectively. State-of-the-art segmentation, object detection, and SLAM methods are benchmarked on SubPipe to demonstrate the dataset's challenges and opportunities for leveraging computer vision algorithms. To the authors' knowledge, this is the first annotated underwater dataset providing a real pipeline inspection scenario. The dataset and experiments are publicly available online.
On Zenodo we provide three versions of SubPipe: the full version (SubPipe.zip, ~80 GB unzipped) and two subsamples (SubPipeMini.zip, ~12 GB unzipped, and SubPipeMini2.zip, ~16 GB unzipped). Both subsamples contain only parts of the entire dataset (SubPipe.zip). SubPipeMini is a subset containing the semantic segmentation data and interesting camera data of the underwater pipeline. SubPipeMini2, on the other hand, focuses mainly on underwater side-scan sonar images of the seabed, including ground truth object detection bounding boxes of the pipeline.
For (re-)using/publishing SubPipe, please include the following copyright text:
SubPipe is a public dataset of a submarine outfall pipeline, property of Oceanscan-MST. This dataset was acquired with a Light Autonomous Underwater Vehicle by Oceanscan-MST, within the scope of Challenge Camp 1 of the H2020 REMARO project.
More information about OceanScan-MST can be found at this link.
Cam0 — GoPro Hero 10
Camera parameters:
Resolution: 1520×2704
fx = 1612.36
fy = 1622.56
cx = 1365.43
cy = 741.27
k1, k2, p1, p2 = [-0.247, 0.0869, -0.006, 0.001]
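For convenience, the intrinsics above can be assembled into an OpenCV camera matrix and distortion vector (Brown-Conrady k1, k2, p1, p2, with k3 assumed to be 0) to undistort Cam0 frames; the file names below are placeholders.

```python
import cv2
import numpy as np

K = np.array([[1612.36, 0.0, 1365.43],
              [0.0, 1622.56, 741.27],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.247, 0.0869, -0.006, 0.001, 0.0])   # k1, k2, p1, p2, k3 (k3 assumed 0)

img = cv2.imread("cam0_frame.jpg")                      # placeholder file name
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite("cam0_frame_undistorted.jpg", undistorted)
```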
Side-scan Sonars
Each sonar image was created after 20 pings (i.e., after every 20 new lines), which corresponds to approximately 1 image per second.
Regarding the object detection annotations, we provide both COCO and YOLO formats for each annotation. A single COCO annotation file is provided per chunk and per frequency (low frequency vs. high frequency), whereas the YOLO annotations are provided for each SSS image file.
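The two bounding-box conventions relate as follows: COCO stores [x_min, y_min, width, height] in pixels, while YOLO stores normalised [x_center, y_center, width, height]. The sketch below is a generic conversion, not the script used to generate the provided annotations.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO [x, y, w, h] pixel box to a YOLO-normalised box."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

# e.g. a hypothetical box on a low-frequency 2500 x 500 side-scan image
print(coco_to_yolo([1200, 220, 300, 60], img_w=2500, img_h=500))
```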
Metadata about the side-scan sonar images contained in this dataset:
Images for object detection:
- Low frequency (LF): 5000 images, image size 2500 × 500
- High frequency (HF): 5030 images, image size 5000 × 500
- Total number of images: 10030
Annotations:
- Low frequency (LF): 3163
- High frequency (HF): 3172
- Total number of annotations: 6335
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objectives: The objective of the research was to use hyperspectral imaging (HSI) to detect thermal damage induced in vital organs (such as the liver, pancreas, and stomach) during laser thermal therapy. The experimental study was conducted during thermal ablation procedures on live pigs.
Ethical Approval: The experiments were performed at the Institute for Image Guided Surgery in Strasbourg, France. This experimental study was approved by the local Ethical Committee on Animal Experimentation (ICOMETH No. 38.2015.01.069) and by the French Ministry of Higher Education and Research (protocol №APAFiS-19543-2019030112087889, approved on March 14, 2019). All animals were treated in accordance with the ARRIVE guidelines, the French legislation on the use and care of animals, and the guidelines of the Council of the European Union (2010/63/EU).
Description: During our experimental study, we used a TIVITA hyperspectral camera to acquire hypercubes of size 640x480x100 voxels, indicating 640x480 pixels for 100 bands, and regular RGB images at each acquisition step. These bands were acquired directly from the hyperspectral camera without additional pre-processing. The hypercube was acquired in approximately 6 seconds and synchronized with the absence of breathing motion using a protocol implemented for animal anesthesia. Polyurethane markers were placed around the target area to serve as references for superimposing the hyperspectral images, which were acquired using target areas selected according to the hyperspectral camera manufacturer's guidelines.
As part of our investigation, we included hyperspectral cubes from 20 experiments conducted under identical conditions in our study. The hyperspectral cubes were collected in three distinct stages. In the first stage, the cubes were gathered before laparotomy at a temperature of 37°C. In the second stage, we obtained the cubes as the temperature gradually increased from 60°C to 110°C at 10°C intervals. Finally, in the last stage, the cubes were collected after turning off the laser during the post-ablation phase. Thus, we obtained a total of 233 hyperspectral cubes, each consisting of 100 wavelengths, resulting in a dataset of 23,300 two-dimensional images. The temperature changes were recorded, and the “Temperature profile during laser ablation” image illustrates the corresponding profile, highlighting the specific time intervals during which the hyperspectral camera and laser were activated and deactivated. To provide a visual representation of the collected data, we have included several examples of images captured from different organs in the “Examples of ablation areas” figure.
The raw dataset, comprising 233 hyperspectral cubes of 100 wavelengths each, was transformed into 699 single-channel images using PCA and t-SNE decompositions. These images were then divided into training and test subsets and prepared in the COCO object detection format. This COCO dataset can be used for training and testing different neural networks.
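As a hedged sketch of the band-reduction step, the code below flattens one 640x480x100 hypercube to a (pixels x bands) matrix and keeps the leading principal components as single-channel images; the number of components and the absence of any scaling are assumptions, and the published pipeline additionally uses t-SNE.

```python
import numpy as np
from sklearn.decomposition import PCA

cube = np.random.rand(640, 480, 100)               # placeholder for a real hypercube
pixels = cube.reshape(-1, cube.shape[-1])          # (640*480, 100) spectra

pca = PCA(n_components=3)                          # number of components is an assumption
components = pca.fit_transform(pixels)             # (640*480, 3)
images = components.reshape(640, 480, 3).transpose(2, 0, 1)   # 3 single-channel images
print(images.shape, pca.explained_variance_ratio_)
```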
Access to the Study: Further information about this study, including curated source code, dataset details, and trained models, can be accessed through the following repositories:
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity.
An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.
The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
https://arxiv.org/abs/1512.03385
Architecture visualization: http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006
https://imgur.com/nyYh5xH.jpg" alt="Resnet">
A pre-trained model has been previously trained on a dataset and contains the weights and biases that represent the features of whichever dataset it was trained on. Learned features are often transferable to different data. For example, a model trained on a large dataset of bird images will contain learned features, like edges or horizontal lines, that would be transferable to your dataset.
Pre-trained models are beneficial to us for many reasons. By using a pre-trained model you are saving time. Someone else has already spent the time and compute resources to learn a lot of features and your model will likely benefit from it.
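A minimal transfer-learning sketch along these lines with torchvision (assuming torchvision >= 0.13 for the weights API; the 10-class head is an arbitrary example):

```python
import torch.nn as nn
from torchvision import models

# load ImageNet-pretrained ResNet-50 weights and freeze the backbone
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in model.parameters():
    param.requires_grad = False

# replace the classification head; only this layer will be trained
model.fc = nn.Linear(model.fc.in_features, 10)
```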
This dataset supports Ye et al. 2024, Nature Communications. Please cite this dataset and the paper if you use this resource. Please also see Ye et al. 2024 for the full DataSheet that accompanies this download, including the metadata for how to use this data if you want to compare model results on benchmark tasks. Below is just a summary. Also see the dataset licensing below.
It consists of the following datasets, which were used together for training:
Here is an image with a keypoint guide: https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1690988780004-AG00N6OU1R21MZ0AU9RE/modelcard-SAQ.png?format=1500w
• No experimental data was collected for this model; all datasets used are cited above.
• Please note that each dataset was labeled by separate labs and separate individuals; therefore, while we map names to a unified pose vocabulary, there will be annotator bias in keypoint placement (see Ye et al. 2024 for our Supplementary Note on annotator bias). You will also note the dataset is highly diverse across species, but collectively has more representation of domesticated animals like dogs, cats, horses, and cattle. If the performance of a model trained on this data is not as good as you need it to be, we recommend first trying video adaptation (see Ye et al. 2024) or fine-tuning the weights with your own labeling.
Modified MIT.
Copyright 2023-present by Mackenzie Mathis, Shaokai Ye, and contributors.
Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive,
and non-transferable license for academic, non-commercial purposes only (hereafter “LICENSE”)
to use the "DATASET" subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software:
This data or resulting software may not be used to harm any animal deliberately.
LICENSEE acknowledges that the DATASET is a research tool.
THE DATASET IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATASET OR THE USE OR OTHER DEALINGS IN THE DATASET.
If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis
(mackenzie@post.harvard.edu) for a commercial use license.
Please cite Ye et al if you use this DATASET in your work.
Versioning Note:
- V2 includes fixes to Stanford Dog data; it affected less than 1% of the data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We evaluated our AHOD model using two well-known datasets in the field of object detection:
COCO (Common Objects in Context)
- One of the most widely used benchmarks for object detection.
- Contains over 200,000 images and more than 80 object categories.
- Includes objects in varied and sometimes cluttered contexts, allowing the robustness of detectors to be evaluated.
Pascal VOC
- Another reference dataset, often used for classification, detection and segmentation tasks.
- Includes 20 object categories, with precise bounding box annotations.
- Less complex than COCO, but useful for comparing performance on more conventional objects.
Tools, techniques and innovations used
The AHOD architecture is based on three main modules:
Feature Pyramid Enhancement (FPE)
- Multi-scale feature processing tool.
- Improves the representation of objects of various sizes in the same image.
- Inspired by architectures such as FPN (Feature Pyramid Networks), but optimised for better performance.
Dynamic Context Module (DCM)
- Intelligent contextual module.
- Capable of dynamically adjusting the extracted features according to the context (e.g. by adapting the features according to urban or rural areas in a road image).
- Enhances the model's ability to understand the overall context of the scene.
Fast and Accurate Detection Head (FADH)
- Optimised detection head.
- Seeks a compromise between the speed of YOLO and the accuracy of Faster R-CNN.
- Probably uses lightweight convolution layers or optimisations such as MobileNet/Depthwise Convolutions.
Probable technologies used
Although the summary does not specify this, we can reasonably assume that the following tools are used:
- Deep learning frameworks: PyTorch or TensorFlow, which are standard in object detection research.
- GPUs for training and inference, particularly for measuring inference times (essential in real-time applications).
- Standard evaluation techniques: mAP (mean Average Precision), a measure of average precision, and FPS (Frames Per Second) or inference time for real-time performance.
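The FPS measurement mentioned above can be reproduced generically as in the sketch below; since the AHOD code is not part of this record, a torchvision Faster R-CNN stands in for the detector, and the warm-up and run counts are arbitrary.

```python
import time
import torch
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
dummy = [torch.rand(3, 640, 640)]            # one synthetic 640 x 640 RGB image

with torch.no_grad():
    for _ in range(5):                       # warm-up iterations
        detector(dummy)
    runs = 20
    start = time.perf_counter()
    for _ in range(runs):
        detector(dummy)
fps = runs / (time.perf_counter() - start)
print(f"{fps:.1f} FPS")
```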