License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This is a Lego brick image dataset annotated in PASCAL VOC format, ready for an ML object detection pipeline. Additionally, I made tutorials on how to: - Generate synthetic images and create bounding box annotations in PASCAL VOC format using Blender. - Train ML models (YOLOv5 and SSD) for detecting multiple objects in an image. The tutorial, with Blender scripts for rendering the dataset and Jupyter notebooks for training ML models, can be found here: https://github.com/mantyni/Multi-object-detection-lego
The dataset contains Lego brick images in JPG format at 300x300 resolution, with annotations in PASCAL VOC format. There are 6 Lego brick types in this dataset, each appearing approximately 600 times across the dataset: brick_2x2, brick_2x4, brick_1x6, plate_1x2, plate_2x2, plate_2x4.
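For reference, a minimal Python sketch for reading one of the PASCAL VOC annotation files into (label, box) tuples could look like the following; it assumes the standard <object>/<bndbox> XML layout, and the function name is illustrative only.

import xml.etree.ElementTree as ET

def read_voc_annotation(xml_path):
    # Parse one PASCAL VOC XML file and return a list of (label, (xmin, ymin, xmax, ymax)).
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name")  # e.g. "brick_2x4"
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (int(float(bb.findtext(tag)))
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, (xmin, ymin, xmax, ymax)))
    return boxes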
Lego brick 3D models obtained from: Mecabricks - https://www.mecabricks.com/
The first 500 images show individual Lego bricks rendered from different angles and against different backgrounds. The remaining images contain multiple bricks. Each image is rendered with different backgrounds, brick colours and shadow variations to enable Sim2Real transfer. After training ML models (YOLOv5 and SSD) on the synthetic dataset, I tested them on real images, achieving ~70% detection accuracy.
The main purpose of this project is to show how to create your own realistic synthetic image datasets for training computer vision models without needing real world data.
License: CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Accurate and robust 6DOF (Six Degrees of Freedom) pose estimation is a critical task in various fields, including computer vision, robotics, and augmented reality. This research paper presents a novel approach to enhance the accuracy and reliability of 6DOF pose estimation by introducing a robust method for generating synthetic data and leveraging the ease of multi-class training using the generated dataset. The proposed method tackles the challenge of insufficient real-world annotated data by creating a large and diverse synthetic dataset that accurately mimics real-world scenarios. The method only requires a CAD model of the object, and there is no limit to the amount of unique data that can be generated. Furthermore, a multi-class training strategy that harnesses the synthetic dataset's diversity is proposed and presented. This approach mitigates class imbalance issues and significantly boosts accuracy across varied object classes and poses. Experimental results underscore the method's effectiveness in challenging conditions, highlighting its potential for advancing 6DOF pose estimation across diverse applications. Our approach only uses a single RGB frame and runs in real time. Methods: This dataset has been synthetically generated using 3D software such as Blender and APIs such as BlenderProc.
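As a rough illustration of this kind of synthetic data generation (not the authors' actual pipeline), the following Blender Python sketch renders a CAD model from randomized camera positions; it must be run inside Blender, and the object/camera names and output paths are assumptions.

import math
import random
import bpy
from mathutils import Vector

scene = bpy.context.scene
cam = bpy.data.objects["Camera"]   # assumed camera name
scene.camera = cam

for i in range(10):                # number of views is illustrative
    # Sample a random camera position on a sphere around the object at the origin.
    radius = random.uniform(2.0, 5.0)
    theta = random.uniform(0.0, 2.0 * math.pi)
    phi = random.uniform(0.2, math.pi - 0.2)
    cam.location = (radius * math.sin(phi) * math.cos(theta),
                    radius * math.sin(phi) * math.sin(theta),
                    radius * math.cos(phi))
    # Aim the camera at the origin, where the CAD model is assumed to sit.
    direction = Vector((0.0, 0.0, 0.0)) - cam.location
    cam.rotation_euler = direction.to_track_quat('-Z', 'Y').to_euler()
    # Render this view to disk.
    scene.render.filepath = f"//renders/view_{i:04d}.png"
    bpy.ops.render.render(write_still=True)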
License: MIT (https://opensource.org/licenses/MIT)
The MatSim Dataset and benchmark
Synthetic dataset and real images benchmark for visual similarity recognition of materials and textures.
MatSim: a synthetic dataset, a benchmark, and a method for computer vision-based recognition of similarities and transitions between materials and textures focusing on identifying any material under any conditions using one or a few examples (one-shot learning).
Based on the paper: One-shot recognition of any material anywhere using contrastive learning with physics-based rendering
Benchmark_MATSIM.zip: contains the benchmark made of real-world images, as described in the paper.
MatSim_object_train_split_1,2,3.zip: contain a subset of the synthetic dataset, with CGI images of materials applied to random objects, as described in the paper.
MatSim_Vessels_Train_1,2,3.zip: contain a subset of the synthetic dataset, with CGI images of materials inside transparent containers, as described in the paper.
*Note: these are subsets of the dataset; the full dataset can be found at:
https://e1.pcloud.link/publink/show?code=kZIiSQZCYU5M4HOvnQykql9jxF4h0KiC5MX
or
https://icedrive.net/s/A13FWzZ8V2aP9T4ufGQ1N3fBZxDF
Code:
Up-to-date code for generating the dataset, reading and evaluating it, and trained nets can be found at this URL: https://github.com/sagieppel/MatSim-Dataset-Generator-Scripts-And-Neural-net
Dataset Generation Scripts.zip: contains the Blender (3.1) Python scripts used for generating the dataset; this code might be old, and up-to-date code can be found at the URL above.
Net_Code_And_Trained_Model.zip: contains reference neural net code, including loaders, trained models, and evaluator scripts that can be used to read and train with the synthetic dataset or to test the model on the benchmark. Note that the code in the ZIP file is not up to date and contains some bugs; for the latest version of this code, see the URL above.
Further documentation can be found inside the zip files or in the paper.
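To illustrate the one-shot recognition setting (this is a generic sketch, not the reference code shipped in the ZIP files), material descriptors from any embedding network can be compared with cosine similarity; `encoder` below is a placeholder for such a network.

import torch
import torch.nn.functional as F

def best_match(query_img, example_imgs, encoder):
    # Return the index of the example whose material/texture is most similar to the query.
    with torch.no_grad():
        q = F.normalize(encoder(query_img.unsqueeze(0)), dim=1)     # (1, D) descriptor
        e = F.normalize(encoder(torch.stack(example_imgs)), dim=1)  # (N, D) descriptors
    similarity = (q @ e.T).squeeze(0)                               # cosine similarities
    return int(similarity.argmax())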
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
SynthAer is a dataset consisting of synthetic aerial images with pixel-level semantic annotations from a suburban scene generated using the 3D modelling tool Blender. SynthAer contains three time-of-day variations for each image - one for lighting conditions at dawn, one for midday, and one for dusk.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
This data-set is supplementary material related to the generation of synthetic images of a corridor in the University of Melbourne, Australia, from a building information model (BIM). This data-set was generated to check the ability of deep learning algorithms to learn the task of indoor localisation from synthetic images, when being tested on real images.

The following is the naming convention used for the data-sets. The brackets show the number of images in each data-set.

REAL DATA
Real ---------------------> Real images (949 images)
Gradmag-Real -------> Gradmag of real data (949 images)

SYNTHETIC DATA
Syn-Car ----------------> Cartoonish images (2500 images)
Syn-pho-real ----------> Synthetic photo-realistic images (2500 images)
Syn-pho-real-tex -----> Synthetic photo-realistic textured images (2500 images)
Syn-Edge --------------> Edge render images (2500 images)
Gradmag-Syn-Car ---> Gradmag of Cartoonish images (2500 images)

Each folder contains the images and their respective groundtruth poses in the following format: [ImageName X Y Z w p q r].

To generate the synthetic data-set, we define a trajectory in the 3D indoor model. The points in the trajectory serve as the ground truth poses of the synthetic images. The height of the trajectory was kept in the range of 1.5–1.8 m from the floor, which is the usual height at which a camera is held in the hand. Artificial point light sources were placed to illuminate the corridor (except for the edge render images). The length of the trajectory was approximately 30 m. A virtual camera was moved along the trajectory to render four different sets of synthetic images in Blender*. The intrinsic parameters of the virtual camera were kept identical to the real camera (VGA resolution, focal length of 3.5 mm, no distortion modelled). We rendered images along the trajectory at 0.05 m intervals and ±10° tilt.

The main difference between the cartoonish (Syn-Car) and photo-realistic images (Syn-pho-real) is the rendering model. Photo-realistic rendering is a physics-based model that traces the path of light rays in the scene, similar to the real world, whereas the cartoonish rendering only roughly traces the path of light rays. The photo-realistic textured images (Syn-pho-real-tex) were rendered by adding repeating synthetic textures to the 3D indoor model, such as textures of brick, carpet and wooden ceiling. The realism of photo-realistic rendering comes at the cost of rendering time; however, the rendering times of the photo-realistic data-sets were considerably reduced with the help of a GPU. Note that the naming convention used for the data-sets (e.g. Cartoonish) follows Blender terminology.

An additional data-set (Gradmag-Syn-Car) was derived from the cartoonish images by taking the edge gradient magnitude of the images and suppressing weak edges below a threshold. The edge rendered images (Syn-Edge) were generated by rendering only the edges of the 3D indoor model, without taking into account the lighting conditions. This data-set is similar to the Gradmag-Syn-Car data-set; however, it does not contain the effects of scene illumination, such as reflections and shadows.

*Blender is an open-source 3D computer graphics software and finds its applications in video games, animated films, simulation and visual art. For more information please visit: http://www.blender.org

Please cite the papers if you use the data-set:
1) Acharya, D., Khoshelham, K., and Winter, S., 2019. BIM-PoseNet: Indoor camera localisation using a 3D indoor model and deep learning from synthetic images. ISPRS Journal of Photogrammetry and Remote Sensing, 150: 245-258.
2) Acharya, D., Singha Roy, S., Khoshelham, K. and Winter, S., 2019. Modelling uncertainty of single image indoor localisation using a 3D model and deep learning. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, IV-2/W5, pages 247-254.
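As a convenience, a minimal Python sketch for reading the groundtruth pose files described above (one [ImageName X Y Z w p q r] entry per line) could look like this; the file name is illustrative, and w p q r is assumed to be the orientation quaternion.

def load_poses(path):
    poses = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 8:
                continue                              # skip empty or malformed lines
            name = parts[0]
            x, y, z = map(float, parts[1:4])          # position
            w, p, q, r = map(float, parts[4:8])       # orientation (assumed quaternion)
            poses[name] = ((x, y, z), (w, p, q, r))
    return poses

poses = load_poses("groundtruth.txt")                 # illustrative file name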
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Data collection is perhaps the most crucial part of any machine learning model: without it being done properly, not enough information is present for the model to learn from the patterns leading to one output or another. Data collection is however a very complex endeavor, time-consuming due to the volume of data that needs to be acquired and annotated. Annotation is an especially problematic step, due to its difficulty, length, and vulnerability to human error and inaccuracies when annotating complex data.
With high processing power becoming ever more accessible, synthetic dataset generation is becoming a viable option when looking to generate large volumes of accurately annotated data. With the help of photorealistic renderers, it is for example possible now to generate immense amounts of data, annotated with pixel-perfect precision and whose content is virtually indistinguishable from real-world pictures.
As an exercise in synthetic dataset generation, the data offered here was generated using the Python API of Blender, with the images rendered through the Cycles render engine. It consists of plausible images representing pictures of a chessboard and its pieces. The goal is, from those pictures and their annotations, to build a model capable of recognizing the pieces, as well as their positions on the board.
The dataset contains a large number of synthetic, randomly generated images representing pictures of a chessboard, taken at an angle overlooking the board and its pieces. Each image is associated with a .json file containing its annotations. The naming convention is that each render is associated with a number X, and the image and annotations associated with that render are named X.jpg and X.json, respectively.
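Given that naming convention, a minimal Python sketch for iterating over render/annotation pairs could look like the following; the annotation fields are not specified here, so they are left generic.

import json
from pathlib import Path
from PIL import Image

def iter_samples(root):
    # Yield (image, annotation) pairs for every X.jpg that has a matching X.json.
    for img_path in sorted(Path(root).glob("*.jpg")):
        ann_path = img_path.with_suffix(".json")
        if not ann_path.exists():
            continue
        image = Image.open(img_path)
        with open(ann_path) as f:
            annotation = json.load(f)   # piece labels and board positions, as provided
        yield image, annotation

for image, annotation in iter_samples("renders/"):   # illustrative folder name
    pass  # feed into a training pipeline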
The data has been generated using the Python scripts and .blend file present in this repository. The chess board and pieces models that have been used for those renders are not provided with the code.
Data characteristics:
No distinction has been hard-built between training, validation, and testing data; the split is left completely up to the users. A pipeline for the extraction, recognition, and placement of chess pieces is proposed in a notebook provided with this dataset.
I would like to express my gratitude for the efforts of the Blender Foundation and all its participants, for their incredible open-source tool which once again has allowed me to conduct interesting projects with great ease.
Two interesting papers on the generation and use of synthetic data, which have inspired me to conduct this project:
Erroll Wood, Tadas Baltrušaitis, Charlie Hewitt (2021) Fake It Till You Make It: Face analysis in the wild using synthetic data alone https://arxiv.org/abs/2109.15102 Salehe Erfanian Ebadi, You-Cyuan Jhang, Alex Zook (2021) PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision https://arxiv.org/abs/2112.09290
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset especially designed for object classification on mobile robots operating in a changing environment (like a household), where it is important to learn new, never-seen objects on the fly. This dataset can also be used for other learning use cases, like instance segmentation or depth estimation, or wherever household objects or continual learning are of interest.
Our dataset contains 150,795 unique synthetic images using 25 different household categories with 925 3D models in total. For each of those categories, we generated about 6000 RGB images. In addition, we also provide a corresponding depth, segmentation, and normal image.
The dataset was created with BlenderProc [Denninger et al. (2019)], a procedural pipeline to generate images for deep learning. This tool created a virtual room with randomly textured floors, walls, and a light source with randomly chosen light intensity and color. After that, a 3D model is placed in the resulting room. This object gets customized by randomly assigning materials, including different textures, to achieve a diverse dataset. Moreover, each object might be deformed with a random displacement texture. We use 774 3D models from the ShapeNet dataset [A. X. Chang et al. (2015)] and the other models from various internet sites. Please note that we had to manually fix and filter most of the models with Blender before using them in the pipeline!
For continual learning (CL), we provide two different loading schemes: - Five sequences with five categories each - Twelve sequences with three categories in the first and two in the other sequences.
In addition to the RGB, depth, segmentation, and normal images, we also provide the calculated features of the RGB images (by ResNet50) as used in our RECALL paper. In those two loading schemes, ten percent of the images are used for validation, where we ensure that an object instance is either in the training or the validation set, not in both. This avoids learning to recognize certain instances by heart.
We recommend using those loading schemes to compare your approach with others.
Here we provide three files for download: - HOWS_CL_25.zip [124GB]: This is the original dataset with the RGB, depth, segmentation, and normal images, as well as the loading schemes. It is divided into three archive parts. To open the dataset, please ensure to download all three parts. - HOWS_CL_25_hdf5_features.zip [2.5GB]: This only contains the calculated features from the RGB input by a ResNet50 in a .hdf5 file. Download this if you want to use the dataset for learning and/or want to compare your approach to our RECALL approach (where we used the same features). - README.md: Some additional explanation.
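As a quick way to see what the feature archive provides (the internal group and dataset names are not documented here, so this sketch only lists the contents), the extracted .hdf5 file can be inspected with h5py; the file name below is an assumption.

import h5py

with h5py.File("hows_cl_25_features.hdf5", "r") as f:   # assumed file name after unzipping
    def show(name, obj):
        # Print every dataset with its shape and dtype.
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)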
For further information and code examples, please have a look at our website: https://github.com/DLR-RM/RECALL.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition
This repository contains the data synthesis pipeline and synthetic product recognition datasets proposed in [1].
Data Synthesis Pipeline:
We provide the Blender 3.1 project files and Python source code of our data synthesis pipeline (pipeline.zip), accompanied by the FastCUT models used for synthetic-to-real domain translation (models.zip). For the synthesis of new shelf images, a product assortment list and product images must be provided in the corresponding directories products/assortment/ and products/img/. The pipeline expects product images to follow the naming convention c.png, with c corresponding to a GTIN or generic class label (e.g., 9120050882171.png). The assortment list, assortment.csv, is expected to use the sample format [c, w, d, h], with c being the class label and w, d, and h being the packaging dimensions of the given product in mm (e.g., [4004218143128, 140, 70, 160]). The assortment list to use and the number of images to generate can be specified in generateImages.py (see comments). The rendering process is initiated either by executing load.py from within Blender or by running it from a command-line terminal as a background process.
Datasets:
Table 1: Dataset characteristics.
| Dataset | #images | #products | #instances | labels | translation |
| --- | --- | --- | --- | --- | --- |
| SG3k | 10,000 | 3,234 | 851,801 | bounding box & generic class¹ | none |
| SG3kt | 10,000 | 3,234 | 851,801 | bounding box & generic class¹ | GroZi-3.2k |
| SGI3k | 10,000 | 1,063 | 838,696 | bounding box & generic class² | none |
| SGI3kt | 10,000 | 1,063 | 838,696 | bounding box & generic class² | GroZi-3.2k |
| SPS8k | 16,224 | 8,112 | 1,981,967 | bounding box & GTIN | none |
| SPS8kt | 16,224 | 8,112 | 1,981,967 | bounding box & GTIN | SKU110k |
Sample Format
A sample consists of an RGB image (i.png) and an accompanying label file (i.txt), which contains the labels for all product instances present in the image. Labels use the YOLO format [c, x, y, w, h].
¹SG3k and SG3kt use generic pseudo-GTIN class labels, created by combining the GroZi-3.2k food product category number i (1-27) with the product image index j (j.jpg), following the convention i0000j (e.g., 13000097).
²SGI3k and SGI3kt use the generic GroZi-3.2k class labels from https://arxiv.org/abs/2003.06800.
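Since the labels use the normalized YOLO format [c, x, y, w, h], a minimal sketch for converting one i.txt file into pixel-space boxes might look like this (the function name is illustrative).

def yolo_to_pixel_boxes(label_path, img_w, img_h):
    # Convert normalized YOLO rows "c x y w h" into (class, x0, y0, x1, y1) pixel boxes.
    boxes = []
    with open(label_path) as f:
        for line in f:
            c, x, y, w, h = line.split()
            x, y = float(x) * img_w, float(y) * img_h      # box centre in pixels
            w, h = float(w) * img_w, float(h) * img_h      # box size in pixels
            boxes.append((c, x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return boxes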
Download and Use
This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to our paper [1].
[1] Strohmayer, Julian, and Martin Kampel. "Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition." International Conference on Computer Analysis of Images and Patterns. Cham: Springer Nature Switzerland, 2023.
BibTeX citation:
@inproceedings{strohmayer2023domain,
title={Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition},
author={Strohmayer, Julian and Kampel, Martin},
booktitle={International Conference on Computer Analysis of Images and Patterns},
pages={239--250},
year={2023},
organization={Springer}
}
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Industry 4.0 advancements offer promising solutions to the challenges faced by high-mix, low-volume (HMLV) foundries, particularly in quality assessment and process automation. This comprehensive dataset was developed to train an image segmentation neural network, aimed at automating the post-processing task of removing sprues and risers from cast parts. By enabling the analysis of diverse part geometries, the approach is designed to address the variability inherent in HMLV foundries, where standardization is difficult due to complex part configurations.
Data for this project consists of three types of images: camera images, synthetic images, and augmented images, all stored in JPEG format. Each sample contains 36 camera images, 36 synthetic images, and 36 augmented images. Camera images were captured using an Arduino Nicla Vision camera with a default resolution of 240 x 320 pixels; they are labeled ‘Sample## Natural up/down ##’, with ## denoting the sample and image numbers. Synthetic images were created from raw 3D scan data and rendered in JPEG format using Blender at a default resolution of 1920 x 1080 pixels; they are labeled ‘Sample ## synthetic up/down ##’. Augmented images are synthetic images that have been modified by replacing the original part geometry with a CAD model; they are labeled ‘Sample ##A up/down ##’. The dataset includes both labeled and unlabeled images. The unlabeled set only includes JPEG images, while the labeled set includes JPEG images and their labels in .txt format, as well as the .yaml file. A detailed description of the dataset creation is outlined below:
1) Real image creation: To create the real image dataset, each sample was placed on top of a turntable and a photograph of the sample was taken. Then, the sample was rotated 20 degrees, and a subsequent photograph was taken. This process was repeated until a full rotation of the sample was complete, providing a total of 18 images. This process was then repeated with the object flipped upside down, providing 36 images per sample and 1080 images for the real dataset.
2) Synthetic image creation: The synthetic image dataset was created by using an Einscan Pro HD 3D scanner to collect 3D scans of the cast parts. The scans were imported into Blender and wrapped in an aluminum texture resembling the appearance of the real part. Then, the texture-wrapped part was placed either in a blank scene with a black or gray background, or on top of a turntable resembling the real turntable in front of a white background. Finally, the same image capture procedure performed on the real dataset was repeated in Blender to produce a total of 1080 synthetic images (a minimal Blender sketch of this turntable procedure is shown after this list). All camera angles and lighting were modelled to resemble the real images as closely as possible.
3) Augmented image creation: For the augmented image dataset, the same procedure as for the synthetic images was followed, with the exception of using Creo Parametric to replace the original part geometry with a CAD model of the part prior to importing into Blender. Similarly, 1080 augmented images were created.
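The turntable sketch referenced in step 2 could, under the assumption that the texture-wrapped scan is available as an object in the Blender scene, look roughly like this (object name, rotation axis, and output paths are illustrative); it must be run inside Blender.

import math
import bpy

part = bpy.data.objects["ScannedPart"]   # assumed name of the texture-wrapped 3D scan
scene = bpy.context.scene

for i in range(18):                      # 18 views x 20 degrees = one full rotation
    part.rotation_euler[2] = math.radians(i * 20)   # rotate about the vertical axis
    scene.render.filepath = f"//synthetic/Sample01_synthetic_up_{i + 1:02d}.jpg"
    bpy.ops.render.render(write_still=True)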
Practitioners using this dataset have several options to tailor it to their needs. The dataset includes labeled and unlabeled images, with the labeled images organized into a single folder, enabling users to create custom train-validation-test splits. The unlabeled images can be categorized into different classes beyond the three provided in the labeled set (part, sprue, and riser). Additionally, the unlabeled set supports further 2D spatial augmentations, as class locations are specified in .txt files. The labeled images are named in the format ‘Sample XX-jpg.rf.RandomCharacterString,’ reflecting the dataset’s export from the data management platform Roboflow. Labeling can be performed in various platforms, such as Roboflow, CVAT, or Labelbox, providing flexibility in data management.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
A Perfectly Accurate, Synthetic dataset featuring a virtual railway EnVironment for Multi-View Stereopsis (RailEnV-PASMVS) is presented, consisting of 40 scenes and 79,800 renderings together with ground truth depth maps, extrinsic and intrinsic camera parameters and binary segmentation masks of all the track components and surrounding environment. Every scene is rendered from a set of 3 cameras, each positioned relative to the track for optimal 3D reconstruction of the rail profile. The set of cameras is translated across the 100-meter length of tangent (straight) track to yield a total of 1,995 camera views. Photorealistic lighting of each of the 40 scenes is achieved with the implementation of high-definition, high dynamic range (HDR) environmental textures. Additional variation is introduced in the form of camera focal lengths, random noise for the camera location and rotation parameters and shader modifications of the rail profile. Representative track geometry data is used to generate random and unique vertical alignment data for the rail profile for every scene. This primary, synthetic dataset is augmented by a smaller image collection consisting of 320 manually annotated photographs for improved segmentation performance. The specular rail profile represents the most challenging component for MVS reconstruction algorithms, pipelines and neural network architectures, increasing the ambiguity and complexity of the data distribution. RailEnV-PASMVS represents an application specific dataset for railway engineering, against the backdrop of existing datasets available in the field of computer vision, providing the precision required for novel research applications in the field of transportation engineering.
File descriptions
Steps to reproduce
The open-source Blender software suite (https://www.blender.org/) was used to generate the dataset, with the entire pipeline developed using the exposed Python API. The camera trajectory is kept fixed for all 40 scenes, except for small perturbations introduced in the form of random noise to increase the camera variation. The camera intrinsic information was initially exported as a single CSV file (scene.csv) for every scene, from which the camera information files were generated; this includes the focal length (focalLengthmm), image sensor dimensions (pixelDimensionX, pixelDimensionY), position coordinate vector (vectC) and rotation vector (vectR). The STL model files, as provided in this data repository, were exported directly from Blender, so that the geometry/scenes can be reproduced. The data processing below is written for a Python implementation, transforming the information from Blender's coordinate system into universal rotation (R_world2cv) and translation (T_world2cv) matrices.
import numpy as np
from scipy.spatial.transform import Rotation as R

# focalLengthmm, sensorWidthmm, pixelDimensionX, pixelDimensionY, vectC and vectR
# are read from scene.csv for each camera view.

# The intrinsic matrix K is constructed using the following formulation:
focalLengthPixel = focalLengthmm * pixelDimensionX / sensorWidthmm
K = np.array([[focalLengthPixel, 0, pixelDimensionX / 2],
              [0, focalLengthPixel, pixelDimensionY / 2],
              [0, 0, 1]])

# The rotation vector as provided by Blender is first transformed to a rotation matrix:
r = R.from_euler('xyz', vectR, degrees=True)
matR = r.as_matrix()

# Transpose the rotation matrix to obtain the WORLD to BLENDER-camera transformation:
R_world2bcam = np.transpose(matR)

# The matrix describing the transformation from BLENDER to CV/STANDARD coordinates is:
R_bcam2cv = np.array([[1, 0, 0],
                      [0, -1, 0],
                      [0, 0, -1]])

# Thus the rotation from WORLD to CV/STANDARD coordinates is:
R_world2cv = R_bcam2cv.dot(R_world2bcam)

# The camera coordinate vector requires a similar transformation from BLENDER to WORLD coordinates:
T_world2bcam = -1 * R_world2bcam.dot(vectC)
T_world2cv = R_bcam2cv.dot(T_world2bcam)
The resulting R_world2cv and T_world2cv matrices are written to the camera information file using exactly the same format as that of BlendedMVS developed by Dr. Yao. The original rotation and translation information can be found by following the process in reverse. Note that additional steps were required to convert from Blender's unique coordinate system to that of OpenCV; this ensures universal compatibility in the way that the camera intrinsic and extrinsic information is provided.
Equivalent GPS information is provided (gps.csv), whereby the local coordinate frame is transformed into equivalent GPS information, centered around the Engineering 4.0 campus, University of Pretoria, South Africa. This information is embedded within the JPG files as EXIF data.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
I was looking to do some LEGO sorting using object detection (my friend is building the actual sorter, I'm writing the software)
I looked around for labeled datasets, but couldn't find any good ones. The ones I did find were fairly limited (basic parts, not enough variation, black background, no bounding boxes, etc.) (example: https://www.kaggle.com/marwin1665/synthetic-lego-images-images22) (all: https://www.kaggle.com/datasets?search=lego)
So I scripted Blender to generate a synthetic dataset for 600 unique lego parts with multiple parts per image.
I'd love to know if people find this useful or interesting, I can also release the trained PyTorch model as well 😇
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
This dataset contains all recorded and hand-annotated data, all synthetically generated data, and representative trained networks used for detection and tracking experiments in the manuscript "replicAnt - generating annotated images of animals in complex environments using Unreal Engine". Unless stated otherwise, all 3D animal models used in the synthetically generated data were generated with the open-source photogrammetry platform scAnt (peerj.com/articles/11155/). All synthetic data was generated with the associated replicAnt project, available from https://github.com/evo-biomech/replicAnt.
Abstract:
Deep learning-based computer vision methods are transforming animal behavioural research. Transfer learning has enabled work in non-model species, but still requires hand-annotation of example footage, and is only performant in well-defined conditions. To overcome these limitations, we created replicAnt, a configurable pipeline implemented in Unreal Engine 5 and Python, designed to generate large and variable training datasets on consumer-grade hardware instead. replicAnt places 3D animal models into complex, procedurally generated environments, from which automatically annotated images can be exported. We demonstrate that synthetic data generated with replicAnt can significantly reduce the hand-annotation required to achieve benchmark performance in common applications such as animal detection, tracking, pose-estimation, and semantic segmentation; and that it increases the subject-specificity and domain-invariance of the trained networks, so conferring robustness. In some applications, replicAnt may even remove the need for hand-annotation altogether. It thus represents a significant step towards porting deep learning-based computer vision tools to the field.
Benchmark data
Two video datasets were curated to quantify detection performance; one in laboratory and one in field conditions. The laboratory dataset consists of top-down recordings of foraging trails of Atta vollenweideri (Forel 1893) leaf-cutter ants. The colony was collected in Uruguay in 2014, and housed in a climate chamber at 25°C and 60% humidity. A recording box was built from clear acrylic, and placed between the colony nest and a box external to the climate chamber, which functioned as feeding site. Bramble leaves were placed in the feeding area prior to each recording session, and ants had access to the recording area at will. The recorded area was 104 mm wide and 200 mm long. An OAK-D camera (OpenCV AI Kit: OAK-D, Luxonis Holding Corporation) was positioned centrally 195 mm above the ground. While keeping the camera position constant, lighting, exposure, and background conditions were varied to create recordings with variable appearance: The “base” case is an evenly lit and well exposed scene with scattered leaf fragments on an otherwise plain white backdrop. A “bright” and “dark” case are characterised by systematic over- or underexposure, respectively, which introduces motion blur, colour-clipped appendages, and extensive flickering and compression artefacts. In a separate well exposed recording, the clear acrylic backdrop was substituted with a printout of a highly textured forest ground to create a “noisy” case. Last, we decreased the camera distance to 100 mm at constant focal distance, effectively doubling the magnification, and yielding a “close” case, distinguished by out-of-focus workers. All recordings were captured at 25 frames per second (fps).
The field datasets consists of video recordings of Gnathamitermes sp. desert termites, filmed close to the nest entrance in the desert of Maricopa County, Arizona, using a Nikon D850 and a Nikkor 18-105 mm lens on a tripod at camera distances between 20 cm to 40 cm. All video recordings were well exposed, and captured at 23.976 fps.
Each video was trimmed to the first 1000 frames, and contains between 36 and 103 individuals. In total, 5000 and 1000 frames were hand-annotated for the laboratory- and field-dataset, respectively: each visible individual was assigned a constant size bounding box, with a centre coinciding approximately with the geometric centre of the thorax in top-down view. The size of the bounding boxes was chosen such that they were large enough to completely enclose the largest individuals, and was automatically adjusted near the image borders. A custom-written Blender Add-on aided hand-annotation: the Add-on is a semi-automated multi animal tracker, which leverages blender’s internal contrast-based motion tracker, but also include track refinement options, and CSV export functionality. Comprehensive documentation of this tool and Jupyter notebooks for track visualisation and benchmarking is provided on the replicAnt and BlenderMotionExport GitHub repositories.
Synthetic data generation
Two synthetic datasets, each with a population size of 100, were generated from 3D models of Atta vollenweideri leaf-cutter ants. All 3D models were created with the scAnt photogrammetry workflow. A “group” population was based on three distinct 3D models of an ant minor (1.1 mg), a media (9.8 mg), and a major (50.1 mg) (see 10.5281/zenodo.7849059). To approximately simulate the size distribution of A. vollenweideri colonies, these models make up 20%, 60%, and 20% of the simulated population, respectively. A 33% within-class scale variation, with default hue, contrast, and brightness subject material variation, was used. A “single” population was generated using the major model only, with 90% scale variation, but equal material variation settings.
A Gnathamitermes sp. synthetic dataset was generated from two hand-sculpted models; a worker and a soldier made up 80% and 20% of the simulated population of 100 individuals, respectively with default hue, contrast, and brightness subject material variation. Both 3D models were created in Blender v3.1, using reference photographs.
Each of the three synthetic datasets contains 10,000 images, rendered at a resolution of 1024 by 1024 px, using the default generator settings as documented in the Generator_example level file (see documentation on GitHub). To assess how the training dataset size affects performance, we trained networks on 100 (“small”), 1,000 (“medium”), and 10,000 (“large”) subsets of the “group” dataset. Generating 10,000 samples at the specified resolution took approximately 10 hours per dataset on a consumer-grade laptop (6 Core 4 GHz CPU, 16 GB RAM, RTX 2070 Super).
Additionally, five datasets which contain both real and synthetic images were curated. These “mixed” datasets combine image samples from the synthetic “group” dataset with image samples from the real “base” case. The ratio between real and synthetic images across the five datasets varied between 10/1 to 1/100.
Funding
This study received funding from Imperial College’s President’s PhD Scholarship (to Fabian Plum), and is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant agreement No. 851705, to David Labonte). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
This dataset contains a set of synthetic images used to simulate a photogrammetric survey of everyday and cultural heritage objects. The dataset has been used to study the influence of specular reflections on the photogrammetric reconstruction process, and to train a neural model that removes specular highlights from photos in order to improve the photogrammetric reconstruction. The 3D assets have been prepared to have coherent position, orientation, scale, materials and naming, and are stored in individual Blender files. For each asset, another Blender file was created with a configurable camera and lighting rig, to simulate the camera placement and lighting used in a photogrammetric survey. Rendering was configured to simultaneously produce 3 images: a complete rendering, a diffuse-only rendering, and a specular-only rendering. In this way, for each asset, it was possible to generate renderings compatible, in terms of framing, resolution, lighting and appearance, with real-world photogrammetric surveys. The image triples were used to train a neural network to remove specular highlights from photos. The dataset comprises 24 assets, with 144 rendered images per asset (3456 in total). The dataset has been prepared by the Visual Computing Lab, CNR-ISTI (https://vcg.isti.cnr.it). Authors: Marco Callieri, Daniela Giorgi, Massimiliano Corsini, Marco Sorrenti. For more info: callieri@isti.cnr.it
----------------------------------------------------------------
DATASET CONTENT: The dataset contains two main folders, plus a readme info file.
--IMAGES
The rendered images, simulating the photogrammetric survey. One folder per asset; the folder is named as the asset. For each asset, 48 views, 3 renders per view:
ASSETNAME_CXXX combined rendering image
ASSETNAME_DXXX diffuse-only image
ASSETNAME_SXXX specular-only image
In total, 144 images per folder, 2000x2000 resolution, PNG format, with the transparency channel as mask.
--SCENES_ASSETS
The 3D assets and Blender scenes. 00_BASE.blend is the basic empty scene file, used for the initial setup of the rendering process and the lighting/camera rigs. For each asset there is a Blender file, named as the asset, containing the camera/lighting setup and rendering configuration to produce the combined/diffuse/specular images. Each of these Blender files references its 3D asset from another Blender file in one of the subfolders. The prepared 3D assets are stored in subfolders named as the assets. Each asset subfolder contains the original 3D model and texture, plus a Blender scene containing the ready-to-use asset with a standardized position, orientation, scale, naming, material and settings.
----------------------------------------------------------------
The 3D models included in this dataset have been sourced from Sketchfab (https://sketchfab.com/) and a private repository (https://vcg.isti.cnr.it). All objects had an open license and were marked as "usable for AI applications".
adidas "Scanned Adidas Sports Shoe" (https://skfb.ly/QZo9) by 3Digify is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
baby "Baby Waiting For Birth" (https://skfb.ly/6WBwn) by Tore Lysebo is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
bfvase "Black-Figure Neck Amphora, c. 540 BCE" (https://skfb.ly/oOysJ) by Minneapolis Institute of Art is licensed under Creative Commons Attribution-ShareAlike (http://creativecommons.org/licenses/by-sa/4.0/)
birdvase "Vessel in the Form of a Bird, 100 BCE - 600 CE" (https://skfb.ly/oOzUH) by Minneapolis Institute of Art is licensed under Creative Commons Attribution-ShareAlike (http://creativecommons.org/licenses/by-sa/4.0/)
boot1 "Caterpillar Work Boot" (https://skfb.ly/o8KnO) by inciprocal is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
brezel "Laugenbrezel" (https://skfb.ly/o99yz) by svnfbgr is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
cabbage "Cabbage" (https://skfb.ly/6Z6EK) by Meerschaum Digital is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
conga "African Drum (raw scan)" (https://skfb.ly/6VwHK) by Piotr Lezanski is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
dwarf Visual Computing Lab, ISTI-CNR (https://vcg.isti.cnr.it), under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
elephant "Wooden Elephant Scan | Game-ready asset" (https://skfb.ly/6XvpQ) by Photogrammetry Guy is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
gator "Crocodile dog toy 3D scan" (https://skfb.ly/otSAq) by LukaszRyz is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
grinder "Angle_grinder DeWalt D28136" (https://skfb.ly/oOFvv) by Den.Prodan is licensed under Cre
License: GPL-3.0 (https://choosealicense.com/licenses/gpl-3.0/)
The SynWBM (Synthetic White Button Mushrooms) Dataset!
Synthetic dataset of white button mushrooms (Agaricus bisporus) with instance segmentation masks and depth maps.
Dataset Summary
The SynWBM Dataset is a collection of synthetic images of white button mushrooms. The dataset incorporates rendered (using Blender) and generated (using Stable Diffusion XL) synthetic images for training mushroom segmentation models. Each image is annotated with instance segmentation masks… See the full description on the dataset page: https://huggingface.co/datasets/ABC-iRobotics/SynWBM.
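Assuming the dataset loads through the Hugging Face `datasets` library under the ID shown in the URL above (split and column names may differ; check the dataset page), a minimal loading sketch is:

from datasets import load_dataset

ds = load_dataset("ABC-iRobotics/SynWBM")
print(ds)                                # available splits and columns
first_split = list(ds.keys())[0]
sample = ds[first_split][0]
print(sample.keys())                     # e.g. image / mask / depth fields, if provided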
This is a collection of synthetically generated LEGO images. The images were generated with Blender. Each image has a resolution of 800x600 px. The dataset contains approximately 9 LEGO bricks per image. This results in 11,520 individual LEGO bricks for training and 2,295 for validation. None of the LEGO bricks are adjacent, meaning they do not overlap. There are a total of 14 different LEGO types in this collection.
This dataset was generated in connection with the following project:
Convolutional Neural Network to detect LEGO Bricks. (https://github.com/deeepwin/lego-cnn)
The notebook runs on Colab, but can be easily adjusted to run on Kaggle.
Annotation is compatible with VGG Image Annotator (https://gitlab.com/vgg/via).
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
We used a subset of synthetic images that only contained forward-looking images from the original Unimelb corridor dataset, and removed the additional images that were generated by rotating the camera along the X and Y axes. To compensate for the low number of synthetic images, we generated 900 more images along the original trajectory by reducing the spacing between the consecutive images, which finally resulted in 1400 images for the synthetic dataset. The dataset also contains 950 real images and their corresponding groundtruth camera poses in the BIM coordinate system. We removed some of the redundant images (100) at the end of the trajectory and added another 500 new real images, which resulted in 1350 real images. The synthetic and real cameras have identical intrinsic camera parameters, with an image resolution of 640 x 480 pixels.
Additionally, the provided Blender files can be used to render the images. Please note that the SynCar dataset should be rendered with Blender 2.78 only, whereas the SynPhoReal and SynPhoRealTex images can be generated using the latest Blender 3.4.
[1] Acharya, D., Khoshelham, K. and Winter, S., 2019. BIM-PoseNet: Indoor camera localisation using a 3D indoor model and deep learning from synthetic images. ISPRS Journal of Photogrammetry and Remote Sensing, 150, pp.245-258.
[2] Acharya, D., Singha Roy, S., Khoshelham, K. and Winter, S., 2020. A recurrent deep network for estimating the pose of real indoor images from synthetic image sequences. Sensors, 20(19), p.5492.
[3] Acharya, D., Tennakoon, R., Muthu, S., Khoshelham, K., Hoseinnezhad, R. and Bab-Hadiashar, A., 2022. Single-image localisation using 3D models: Combining hierarchical edge maps and semantic segmentation for domain adaptation. Automation in Construction, 136, p.104152.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
The aim of this study was to train a Vision Transformer (ViT) model for semantic segmentation to differentiate between ripe and unripe strawberries using synthetic data to avoid challenges with conventional data collection methods. The solution used Blender to generate synthetic strawberry images along with their corresponding masks for precise segmentation. Subsequently, the synthetic images were used to train and evaluate the SwinUNet as a segmentation method, and Deep Domain Confusion was utilized for domain adaptation. The trained model was then tested on real images from the Strawberry Digital Images dataset. The performance on the real data achieved a Dice Similarity Coefficient of 94.8% for ripe strawberries and 94% for unripe strawberries, highlighting its effectiveness for applications such as fruit ripeness detection. Additionally, the results show that increasing the volume and diversity of the training data can significantly enhance the segmentation accuracy of each class. This approach demonstrates how synthetic datasets can be employed as a cost-effective and efficient solution for overcoming data scarcity in agricultural applications.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
LAS&T is the largest and most diverse dataset for shape, texture and material recognition and retrieval in 2D and 3D scenes, with 650,000 images, based on real world shapes and textures.
The LAS&T Dataset aims to test the most basic aspect of vision in the most general way: the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D with every element of the shape changed between images, including the shape's material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when it appears on different objects, in different environments, and under different light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of real-world natural patterns. Each section of the dataset (shapes and textures) contains 3D parts, which rely on physics-based scenes with realistic lighting, materials and object simulation, as well as abstract 2D parts. In addition, a real-world benchmark for 3D shapes is included.
The dataset is composed of 4 parts:
3D shape recognition and retrieval. 2D shape recognition and retrieval. 3D Materials recognition and retrieval. 2D Texture recognition and retrieval.
Additional assets include a set of 350,000 natural 2D shapes extracted from real-world images (SHAPES_COLLECTION_350k.zip).
Each can be trained and tested independently.
For shape recognition, the goal is to identify the same shape in different images, where the material/texture/color of the shape is changed, the shape is rotated, and the background is replaced; hence, only the shape remains the same in both images. Note that this means the model can't use any contextual cues and must rely on the shape information alone.
All jpg images that are in the exact same subfolder contain the exact same shape (but with different texture/color/background/orientation).
For texture and materials, the goal is to identify and match images containing the same material or textures, however the shape/object on which the material texture is applied is different, and so is the background and light. Removing contextual clues and forcing the model to use only the texture/material for the recognition process.
All jpg images that are in the exact same subfolder contain the exact same texture/material (but overlay on different objects with different background/and illumination/orientation).
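Because images that share a subfolder share the same shape (or texture/material), positive pairs for retrieval training can be built directly from the directory layout; a minimal sketch, with an illustrative folder name, is:

import itertools
from pathlib import Path

def positive_pairs(root):
    # Yield every pair of images that depict the same shape (or texture/material).
    for subfolder in sorted(Path(root).iterdir()):
        if not subfolder.is_dir():
            continue
        images = sorted(subfolder.glob("*.jpg"))
        yield from itertools.combinations(images, 2)

pairs = list(positive_pairs("LAST_shapes_synthetic/"))   # illustrative folder name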
The images in the synthetic part of the dataset were created by automatically extracting shapes and textures from natural images and combining them in synthetic images. This creates synthetic images that completely rely on real-world patterns, yielding extremely diverse and complex shapes and textures. As far as we know, this is the largest and most diverse shape and texture recognition/retrieval dataset. The 3D data was generated using physics-based materials and rendering (Blender), making the images physically grounded and enabling the data to be used for training towards real-world examples.
For 3D shape recognition and retrieval, we also supply a real-world natural image benchmark. With a variety of natural images containing the exact same 3D shape but made/coated with different materials and in different environments and orientations. The goal is again to identify the same shape in different images.
Files containing the word 'synthetic' contain synthetic images that can be used for training or testing; the type of data (2D shapes, 3D shapes, 2D textures, 3D materials) appears in the file name, as well as the number of images. Files containing "MULTI TESTS" in their name contain various small tests (500 images each) that can be used to test how a single variation (orientation/background) affects recognition, and are less suitable for general training or testing.
Files starting with "Scripts" contain the scripts used to generate the dataset and the scripts used to evaluate various LVLMs on this dataset.
The file SHAPES_COLLECTION_350k.zip contains 350,000 2D shapes extracted from natural images and used for the dataset generation.
For evaluating and testing see SCRIPTS_Testing_LVLM_ON_LAST_VQA.zip. This can be used to test leading LVLMs via API, create human tests, and in general turn the dataset into multiple-choice question images similar to the ones in the paper.
License: CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)
ENVISET is a dataset of synthetic images of the satellite ENVISAT, generated in Blender, for CNN-based satellite pose estimation tasks. The dataset includes images for training and testing pose estimation algorithms employing CNNs; for each image, relative position and attitude labels are provided.
Training
- A set of 20000 images of the satellite, fully visible or truncated in the Field of View, with randomized relative position, relative attitude and illumination. Earth is present in half of the images.

Testing
- Dataset A - A set of 4000 images of the satellite, fully visible or truncated in the Field of View, with randomized relative position, relative attitude and illumination. Earth is present in half of the images. Relative position and attitude labels are provided.
- Dataset B - A set of 3928 images acquired along a monitoring trajectory around the satellite, with an acquisition rate of 1 Hz, at distances between 25 m and 57 m; ENVISAT is spinning at 0.4°/s.
- Dataset C.1 - A set of 3928 images acquired along a monitoring trajectory around the satellite, with an acquisition rate of 1 Hz, at distances between 25 m and 57 m; ENVISAT is tumbling with an initial angular velocity of 0.3°/s around each axis.
- Dataset C.2 - A set of 3928 images acquired along a monitoring trajectory around the satellite, with an acquisition rate of 1 Hz, at distances between 25 m and 57 m; ENVISAT is tumbling with an initial angular velocity of 0.58°/s around each axis.
- Dataset D - A set of 799 images acquired along an approach trajectory towards the satellite, with an acquisition rate of 1 Hz, from 70 m to 25 m; ENVISAT is spinning at 0.4°/s.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
A dataset containing 3,346 synthetically generated RGB images of road segments with cracks. Road segments and crack formations created in Blender, data collected in Microsoft AirSim. Data is split into train (~70%), test (~15%), and validation (~15%) folders. Contains ground truth bounding boxes labelling cracks in both YOLO and COCO JSON format.