30 datasets found
  1. Synthetic Lego brick dataset for object detection

    • kaggle.com
    zip
    Updated Nov 15, 2021
    Cite
    Mantas Gribulis (2021). Synthetic Lego brick dataset for object detection [Dataset]. https://www.kaggle.com/datasets/mantasgr/synthetic-lego-brick-dataset-for-object-detection/code
    Available download formats: zip (54438517 bytes)
    Dataset updated
    Nov 15, 2021
    Authors
    Mantas Gribulis
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This is a Lego brick image dataset annotated in PASCAL VOC format, ready for an ML object detection pipeline. Additionally, I made tutorials on how to:
    • Generate synthetic images and create bounding box annotations in PASCAL VOC format using Blender.
    • Train ML models (YoloV5 and SSD) for detecting multiple objects in an image.
    The tutorial with Blender scripts for rendering the dataset and Jupyter notebooks for training the ML models can be found here: https://github.com/mantyni/Multi-object-detection-lego

    Content

    The dataset contains:
    • Lego brick images in JPG format, 300x300 resolution
    • Annotations in PASCAL VOC format
    There are 6 Lego brick types in this dataset, each appearing approximately 600 times across the dataset: brick_2x2, brick_2x4, brick_1x6, plate_1x2, plate_2x2, plate_2x4.

    Lego brick 3D models obtained from: Mecabricks - https://www.mecabricks.com/

    The first 500 images show individual Lego bricks rendered from different angles and against different backgrounds; the images after that contain multiple bricks. Each image is rendered with varying backgrounds, brick colours and shadows to enable Sim2Real transfer. After training the ML models (YoloV5 and SSD) on the synthetic dataset, I tested them on real images, achieving ~70% detection accuracy.
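
    Since the annotations follow the standard PASCAL VOC XML layout, a minimal parsing sketch could look like the following; the file path and the assumption that each annotation uses the usual VOC tags (object/name/bndbox) are illustrative rather than specific to this dataset:

    import xml.etree.ElementTree as ET

    def read_voc_boxes(xml_path):
        """Parse a PASCAL VOC annotation file into (label, xmin, ymin, xmax, ymax) tuples."""
        root = ET.parse(xml_path).getroot()
        boxes = []
        for obj in root.findall("object"):
            label = obj.findtext("name")
            bb = obj.find("bndbox")
            boxes.append((label,
                          int(float(bb.findtext("xmin"))), int(float(bb.findtext("ymin"))),
                          int(float(bb.findtext("xmax"))), int(float(bb.findtext("ymax")))))
        return boxes

    # Example (hypothetical file name):
    # print(read_voc_boxes("annotations/0001.xml"))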

    Inspiration

    The main purpose of this project is to show how to create your own realistic synthetic image datasets for training computer vision models without needing real world data.

  2. 6DOF pose estimation - synthetically generated dataset using BlenderProc

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Nov 26, 2023
    Cite
    Divyam Sheth (2023). 6DOF pose estimation - synthetically generated dataset using BlenderProc [Dataset]. http://doi.org/10.5061/dryad.rbnzs7hj5
    Available download formats: zip
    Dataset updated
    Nov 26, 2023
    Dataset provided by
    Dwarkadas J. Sanghvi College of Engineering
    Authors
    Divyam Sheth
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Accurate and robust 6DOF (Six Degrees of Freedom) pose estimation is a critical task in various fields, including computer vision, robotics, and augmented reality. This research paper presents a novel approach to enhance the accuracy and reliability of 6DOF pose estimation by introducing a robust method for generating synthetic data and leveraging the ease of multi-class training using the generated dataset. The proposed method tackles the challenge of insufficient real-world annotated data by creating a large and diverse synthetic dataset that accurately mimics real-world scenarios. The proposed method only requires a CAD model of the object, and there is no limit to the number of unique data samples that can be generated. Furthermore, a multi-class training strategy that harnesses the synthetic dataset's diversity is proposed and presented. This approach mitigates class imbalance issues and significantly boosts accuracy across varied object classes and poses. Experimental results underscore the method's effectiveness in challenging conditions, highlighting its potential for advancing 6DOF pose estimation across diverse applications. Our approach only uses a single RGB frame and runs in real time.

    Methods: This dataset has been synthetically generated using 3D software such as Blender and APIs such as BlenderProc.
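
    For orientation, a minimal BlenderProc rendering sketch in the spirit of this pipeline is shown below; it follows the BlenderProc 2.x quickstart API, and the CAD file name, light placement, and camera pose are illustrative assumptions rather than the authors' actual configuration:

    import blenderproc as bproc  # run via: blenderproc run render_object.py
    import numpy as np

    bproc.init()

    # Load a CAD model of the target object (hypothetical file name)
    objs = bproc.loader.load_obj("cad_model.obj")

    # Add a simple point light
    light = bproc.types.Light()
    light.set_location([2, -2, 2])
    light.set_energy(300)

    # One example camera pose looking at the object from the -Y direction
    cam_pose = bproc.math.build_transformation_mat([0, -3, 0], [np.pi / 2, 0, 0])
    bproc.camera.add_camera_pose(cam_pose)

    # Render and write the result to HDF5 containers
    data = bproc.renderer.render()
    bproc.writer.write_hdf5("output/", data)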

  3. MatSim Dataset and benchmark for one-shot visual materials and textures...

    • zenodo.org
    • data.niaid.nih.gov
    pdf, zip
    Updated Jun 25, 2025
    Cite
    Manuel S. Drehwald; Sagi Eppel; Jolina Li; Han Hao; Alan Aspuru-Guzik (2025). MatSim Dataset and benchmark for one-shot visual materials and textures recognition [Dataset]. http://doi.org/10.5281/zenodo.7390166
    Available download formats: zip, pdf
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Manuel S. Drehwald; Sagi Eppel; Jolina Li; Han Hao; Alan Aspuru-Guzik
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The MatSim Dataset and benchmark

    Latest version

    Synthetic dataset and real images benchmark for visual similarity recognition of materials and textures.

    MatSim: a synthetic dataset, a benchmark, and a method for computer vision-based recognition of similarities and transitions between materials and textures, focusing on identifying any material under any conditions using one or a few examples (one-shot learning).

    Based on the paper: One-shot recognition of any material anywhere using contrastive learning with physics-based rendering

    Benchmark_MATSIM.zip: contains the benchmark of real-world images described in the paper.



    MatSim_object_train_split_1,2,3.zip: contains a subset of the synthetic dataset, with CGI images of materials applied to random objects, as described in the paper.

    MatSim_Vessels_Train_1,2,3.zip: contains a subset of the synthetic dataset, with CGI images of materials inside transparent containers, as described in the paper.

    *Note: these are subsets of the dataset; the full dataset can be found at:
    https://e1.pcloud.link/publink/show?code=kZIiSQZCYU5M4HOvnQykql9jxF4h0KiC5MX

    or
    https://icedrive.net/s/A13FWzZ8V2aP9T4ufGQ1N3fBZxDF

    Code:

    Up-to-date code for generating the dataset, reading and evaluating it, and trained nets can be found at: https://github.com/sagieppel/MatSim-Dataset-Generator-Scripts-And-Neural-net

    Dataset Generation Scripts.zip: contains the Blender (3.1) Python scripts used to generate the dataset; this code may be outdated, and up-to-date code can be found at the GitHub URL above.
    Net_Code_And_Trained_Model.zip: contains reference neural net code, including loaders, trained models, and evaluator scripts that can be used to read and train with the synthetic dataset or test the model on the benchmark. Note that the code in the ZIP file is not up to date and contains some bugs; for the latest version of this code, see the GitHub URL above.

    Further documentation can be found inside the zip files or in the paper.

  4. SynthAer - a synthetic dataset of semantically annotated aerial images

    • figshare.com
    zip
    Updated Sep 13, 2018
    Cite
    Maria Scanlon (2018). SynthAer - a synthetic dataset of semantically annotated aerial images [Dataset]. http://doi.org/10.6084/m9.figshare.7083242.v1
    Available download formats: zip
    Dataset updated
    Sep 13, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Maria Scanlon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SynthAer is a dataset consisting of synthetic aerial images with pixel-level semantic annotations from a suburban scene generated using the 3D modelling tool Blender. SynthAer contains three time-of-day variations for each image - one for lighting conditions at dawn, one for midday, and one for dusk.

  5. Unimelb Corridor Synthetic dataset

    • figshare.unimelb.edu.au
    png
    Updated May 30, 2023
    Cite
    Debaditya Acharya; KOUROSH KHOSHELHAM; STEPHAN WINTER (2023). Unimelb Corridor Synthetic dataset [Dataset]. http://doi.org/10.26188/5dd8b8085b191
    Available download formats: png
    Dataset updated
    May 30, 2023
    Dataset provided by
    The University of Melbourne
    Authors
    Debaditya Acharya; KOUROSH KHOSHELHAM; STEPHAN WINTER
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data-set is supplementary material related to the generation of synthetic images of a corridor at the University of Melbourne, Australia, from a building information model (BIM). The data-set was generated to check the ability of deep learning algorithms to learn the task of indoor localisation from synthetic images when tested on real images.

    The following naming convention is used for the data-sets; the brackets show the number of images in each data-set.

    REAL DATA
    • Real ---------------------> Real images (949 images)
    • Gradmag-Real -------> Gradmag of real data (949 images)

    SYNTHETIC DATA
    • Syn-Car ----------------> Cartoonish images (2500 images)
    • Syn-pho-real ----------> Synthetic photo-realistic images (2500 images)
    • Syn-pho-real-tex -----> Synthetic photo-realistic textured images (2500 images)
    • Syn-Edge --------------> Edge render images (2500 images)
    • Gradmag-Syn-Car ---> Gradmag of Cartoonish images (2500 images)

    Each folder contains the images and their respective groundtruth poses in the following format: [ImageName X Y Z w p q r].

    To generate the synthetic data-set, we define a trajectory in the 3D indoor model. The points in the trajectory serve as the ground truth poses of the synthetic images. The height of the trajectory was kept in the range of 1.5–1.8 m from the floor, which is the usual height of holding a camera in hand. Artificial point light sources were placed to illuminate the corridor (except for the Edge render images). The length of the trajectory was approximately 30 m. A virtual camera was moved along the trajectory to render four different sets of synthetic images in Blender*. The intrinsic parameters of the virtual camera were kept identical to the real camera (VGA resolution, focal length of 3.5 mm, no distortion modelled). We rendered images along the trajectory at 0.05 m intervals and ±10° tilt.

    The main difference between the cartoonish (Syn-Car) and photo-realistic images (Syn-pho-real) is the rendering model. Photo-realistic rendering is a physics-based model that traces the path of light rays in the scene, similar to the real world, whereas the cartoonish rendering only roughly traces the path of light rays. The photo-realistic textured images (Syn-pho-real-tex) were rendered by adding repeating synthetic textures to the 3D indoor model, such as textures of brick, carpet and wooden ceiling. The realism of the photo-realistic rendering comes at the cost of rendering time; however, the rendering times of the photo-realistic data-sets were considerably reduced with the help of a GPU. Note that the naming convention used for the data-sets (e.g. Cartoonish) follows Blender terminology.

    An additional data-set (Gradmag-Syn-Car) was derived from the cartoonish images by taking the edge gradient magnitude of the images and suppressing weak edges below a threshold. The edge rendered images (Syn-Edge) were generated by rendering only the edges of the 3D indoor model, without taking the lighting conditions into account. This data-set is similar to the Gradmag-Syn-Car data-set but does not contain the effects of scene illumination, such as reflections and shadows.

    *Blender is an open-source 3D computer graphics software suite with applications in video games, animated films, simulation and visual art. For more information please visit: http://www.blender.org

    Please cite the following papers if you use the data-set:
    1) Acharya, D., Khoshelham, K., and Winter, S., 2019. BIM-PoseNet: Indoor camera localisation using a 3D indoor model and deep learning from synthetic images. ISPRS Journal of Photogrammetry and Remote Sensing, 150: 245-258.
    2) Acharya, D., Singha Roy, S., Khoshelham, K. and Winter, S., 2019. Modelling uncertainty of single image indoor localisation using a 3D model and deep learning. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, IV-2/W5, pages 247-254.
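
    Since each data-set folder pairs images with groundtruth poses in the format [ImageName X Y Z w p q r], a minimal sketch for reading such a pose file could look like this (the file name and the whitespace-separated layout are assumptions, not a documented specification):

    import numpy as np

    def load_poses(pose_file):
        """Read lines of 'ImageName X Y Z w p q r' into a dict: name -> (position, quaternion)."""
        poses = {}
        with open(pose_file) as f:
            for line in f:
                parts = line.split()
                if len(parts) != 8:
                    continue  # skip blank or malformed lines
                name = parts[0]
                x, y, z, w, p, q, r = map(float, parts[1:])
                poses[name] = (np.array([x, y, z]), np.array([w, p, q, r]))
        return poses

    # Example (hypothetical path): poses = load_poses("Syn-pho-real/poses.txt")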

  6. Synthetic Chess Board Images

    • kaggle.com
    zip
    Updated Feb 13, 2022
    Cite
    TheFamousRat (2022). Synthetic Chess Board Images [Dataset]. https://www.kaggle.com/datasets/thefamousrat/synthetic-chess-board-images
    Available download formats: zip (457498797 bytes)
    Dataset updated
    Feb 13, 2022
    Authors
    TheFamousRat
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Data collection is perhaps the most crucial part of any machine learning project: if it is not done properly, the model does not have enough information to learn the patterns that lead to one output or another. Data collection is, however, a complex and time-consuming endeavor because of the volume of data that needs to be acquired and annotated. Annotation is an especially problematic step, due to its difficulty, length, and vulnerability to human error and inaccuracy when annotating complex data.

    With high processing power becoming ever more accessible, synthetic dataset generation is becoming a viable option when looking to generate large volumes of accurately annotated data. With the help of photorealistic renderers, it is for example possible now to generate immense amounts of data, annotated with pixel-perfect precision and whose content is virtually indistinguishable from real-world pictures.

    As an exercise in synthetic dataset generation, the data offered here was generated using Blender's Python API, with the images rendered through the Cycles ray-tracing engine. It consists of plausible pictures of a chess board and pieces. The goal is to build, from those pictures and their annotations, a model capable of recognizing the pieces as well as their positions on the board.

    Content

    The dataset contains a large number of synthetic, randomly generated images representing pictures of chess boards, taken at an angle overlooking the board and its pieces. Each image is associated with a .json file containing its annotations. The naming convention is that each render is associated with a number X, and the image and annotations for that render are named X.jpg and X.json, respectively.

    The data has been generated using the Python scripts and .blend file present in this repository. The chess board and pieces models that have been used for those renders are not provided with the code.

    Data characteristics :

    • Images: 1280x1280 JPEG images representing pictures of chess game boards.
    • Annotations: JSON files containing two variables:
      • "config", a dictionary associating a cell with the type of piece it contains. If a cell is not present among the keys, it is empty.
      • "corners", a 4x2 list which contains the coordinates, in the image, of the board corners. Those corner coordinates are normalized to the [0;1] range.
    • config.json: A JSON file generated before rendering, which contains variables relating to the constant properties of the boards in the renders:
      • "cellsCoordinates", a dictionary associating a cell name with its coordinates on the board.
      • "piecesTypes", a list of strings containing the types of pieces present in the renders.

    No distinction has been hard-coded between training, validation, and testing data; the split is left entirely up to the users. A pipeline for the extraction, recognition, and placement of chess pieces is proposed in a notebook published alongside this dataset.
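
    Based on the annotation format described above, a minimal loading sketch for one render might look like this; the render number, the example piece name in the comment, and the pixel conversion are illustrative:

    import json

    def load_render(stem):
        """Load annotations for render <stem> (i.e., <stem>.jpg / <stem>.json)."""
        with open(f"{stem}.json") as f:
            ann = json.load(f)
        pieces = ann["config"]    # e.g. {"e4": "pawn", ...}; cells absent from the keys are empty
        corners = ann["corners"]  # 4x2 list of board-corner coordinates, normalized to [0;1]
        # Convert normalized corners to pixel coordinates for the 1280x1280 renders
        corners_px = [(x * 1280, y * 1280) for x, y in corners]
        return pieces, corners_px

    # Example (hypothetical render number): pieces, corners = load_render("42")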

    Acknowledgements

    I would like to express my gratitude for the efforts of the Blender Foundation and all its participants, for their incredible open-source tool which once again has allowed me to conduct interesting projects with great ease.

    Inspiration

    Two interesting papers on the generation and use of synthetic data, which have inspired me to conduct this project :

    Erroll Wood, Tadas Baltrušaitis, Charlie Hewitt (2021) Fake It Till You Make It: Face analysis in the wild using synthetic data alone https://arxiv.org/abs/2109.15102 Salehe Erfanian Ebadi, You-Cyuan Jhang, Alex Zook (2021) PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision https://arxiv.org/abs/2112.09290

  7. Data from: HOWS-CL-25: Household Objects Within Simulation Dataset for...

    • data.niaid.nih.gov
    • opendatalab.com
    • +1more
    Updated Oct 23, 2022
    Cite
    Knauer, Markus; Denninger, Maximilian; Triebel, Rudolph (2022). HOWS-CL-25: Household Objects Within Simulation Dataset for Continual Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7054170
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    German Aerospace Center (DLR), Technical University of Munich (TUM)
    Authors
    Knauer, Markus; Denninger, Maximilian; Triebel, Rudolph
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset designed especially for object classification on mobile robots operating in a changing environment (like a household), where it is important to learn new, never-seen objects on the fly. This dataset can also be used for other learning use cases, like instance segmentation or depth estimation, or wherever household objects or continual learning are of interest.

    Our dataset contains 150,795 unique synthetic images using 25 different household categories with 925 3D models in total. For each of those categories, we generated about 6000 RGB images. In addition, we also provide a corresponding depth, segmentation, and normal image.

    The dataset was created with BlenderProc [Denninger et al. (2019)], a procedural pipeline to generate images for deep learning. This tool created a virtual room with randomly textured floors, walls, and a light source with randomly chosen light intensity and color. After that, a 3D model is placed in the resulting room. This object gets customized by randomly assigning materials, including different textures, to achieve a diverse dataset. Moreover, each object might be deformed with a random displacement texture. We use 774 3D models from the ShapeNet dataset [A. X. Chang et al. (2015)] and the other models from various internet sites. Please note that we had to manually fix and filter most of the models with Blender before using them in the pipeline!

    For continual learning (CL), we provide two different loading schemes:
    • Five sequences with five categories each
    • Twelve sequences with three categories in the first and two in the other sequences

    In addition to the RGB, depth, segmentation, and normal images, we also provide the calculated features of the RGB images (by ResNet50) as used in our RECALL paper. In those two loading schemes, ten percent of the images are used for validation, where we ensure that an object instance is either in the training or the validation set, not in both. This avoids learning to recognize certain instances by heart.

    We recommend using those loading schemes to compare your approach with others.

    Here we provide three files for download:
    • HOWS_CL_25.zip [124GB]: This is the original dataset with the RGB, depth, segmentation, and normal images, as well as the loading schemes. It is divided into three archive parts; to open the dataset, please make sure to download all three parts.
    • HOWS_CL_25_hdf5_features.zip [2.5GB]: This only contains the features calculated from the RGB input by a ResNet50, stored in a .hdf5 file. Download this if you want to use the dataset for learning and/or want to compare your approach to our RECALL approach (which used the same features).
    • README.md: Some additional explanation.
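
    The internal layout of the feature file is not spelled out here, so a cautious first step is simply to inspect it; the sketch below assumes only a standard HDF5 container (the file name inside the archive is a guess), not any particular key names:

    import h5py

    # Open the extracted feature file and list the datasets it contains
    with h5py.File("HOWS_CL_25_hdf5_features.hdf5", "r") as f:  # file name is assumed
        def describe(name, obj):
            if isinstance(obj, h5py.Dataset):
                print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
        f.visititems(describe)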

    For further information and code examples, please have a look at our website: https://github.com/DLR-RM/RECALL.

  8. Data from: Domain-adaptive Data Synthesis for Large-scale Supermarket...

    • zenodo.org
    zip
    Updated Apr 5, 2024
    Cite
    Julian Strohmayer; Martin Kampel (2024). Domain-adaptive Data Synthesis for Large-scale Supermarket Product Recognition [Dataset]. http://doi.org/10.5281/zenodo.7750242
    Available download formats: zip
    Dataset updated
    Apr 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julian Strohmayer; Martin Kampel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition

    This repository contains the data synthesis pipeline and synthetic product recognition datasets proposed in [1].

    Data Synthesis Pipeline:

    We provide the Blender 3.1 project files and Python source code of our data synthesis pipeline (pipeline.zip), accompanied by the FastCUT models used for synthetic-to-real domain translation (models.zip). For the synthesis of new shelf images, a product assortment list and product images must be provided in the corresponding directories products/assortment/ and products/img/. The pipeline expects product images to follow the naming convention c.png, with c corresponding to a GTIN or generic class label (e.g., 9120050882171.png). The assortment list, assortment.csv, is expected to use the sample format [c, w, d, h], with c being the class label and w, d, and h being the packaging dimensions of the given product in mm (e.g., [4004218143128, 140, 70, 160]); a small illustrative example is given below. The assortment list to use and the number of images to generate can be specified in generateImages.py (see comments). The rendering process is initiated by either executing load.py from within Blender or running it in a command-line terminal as a background process.
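
    As a concrete illustration of the expected inputs (the directory layout follows the description above; the packaging dimensions of the first product are invented for illustration):

    products/
    ├── assortment/
    │   └── assortment.csv       # one [c, w, d, h] row per product, dimensions in mm
    └── img/
        ├── 9120050882171.png    # product image named after its GTIN or generic class label
        └── 4004218143128.png

    Example assortment.csv rows:
    9120050882171, 55, 55, 160
    4004218143128, 140, 70, 160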

    Datasets:

    • SG3k - Synthetic GroZi-3.2k (SG3k) dataset, consisting of 10,000 synthetic shelf images with 851,801 instances of 3,234 GroZi-3.2k products. Instance-level bounding boxes and generic class labels are provided for all product instances.
    • SG3kt - Domain-translated version of SG3k, utilizing GroZi-3.2k as the target domain. Instance-level bounding boxes and generic class labels are provided for all product instances.
    • SGI3k - Synthetic GroZi-3.2k (SGI3k) dataset, consisting of 10,000 synthetic shelf images with 838,696 instances of 1,063 GroZi-3.2k products. Instance-level bounding boxes and generic class labels are provided for all product instances.
    • SGI3kt - Domain-translated version of SGI3k, utilizing GroZi-3.2k as the target domain. Instance-level bounding boxes and generic class labels are provided for all product instances.
    • SPS8k - Synthetic Product Shelves 8k (SPS8k) dataset, comprised of 16,224 synthetic shelf images with 1,981,967 instances of 8,112 supermarket products. Instance-level bounding boxes and GTIN class labels are provided for all product instances.
    • SPS8kt - Domain-translated version of SPS8k, utilizing SKU110k as the target domain. Instance-level bounding boxes and GTIN class labels are provided for all product instances.

    Table 1: Dataset characteristics.

    Dataset | #images | #products | #instances | Labels | Translation
    SG3k | 10,000 | 3,234 | 851,801 | bounding box & generic class¹ | none
    SG3kt | 10,000 | 3,234 | 851,801 | bounding box & generic class¹ | GroZi-3.2k
    SGI3k | 10,000 | 1,063 | 838,696 | bounding box & generic class² | none
    SGI3kt | 10,000 | 1,063 | 838,696 | bounding box & generic class² | GroZi-3.2k
    SPS8k | 16,224 | 8,112 | 1,981,967 | bounding box & GTIN | none
    SPS8kt | 16,224 | 8,112 | 1,981,967 | bounding box & GTIN | SKU110k

    Sample Format

    A sample consists of an RGB image (i.png) and an accompanying label file (i.txt), which contains the labels for all product instances present in the image. Labels use the YOLO format [c, x, y, w, h].

    ¹SG3k and SG3kt use generic pseudo-GTIN class labels, created by combining the GroZi-3.2k food product category number i (1-27) with the product image index j (j.jpg), following the convention i0000j (e.g., 13000097).

    ²SGI3k and SGI3kt use the generic GroZi-3.2k class labels from https://arxiv.org/abs/2003.06800.
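
    Given the YOLO-style label format [c, x, y, w, h] described above, a minimal sketch for turning one label file into pixel-space boxes could look like this; it assumes the usual YOLO convention of a class label followed by normalized centre coordinates and box size, which should be verified against the actual files:

    def read_yolo_labels(label_path, img_w, img_h):
        """Convert YOLO lines 'c x y w h' (normalized) to (class, xmin, ymin, xmax, ymax) in pixels."""
        boxes = []
        with open(label_path) as f:
            for line in f:
                c, x, y, w, h = line.split()
                x, y, w, h = (float(x) * img_w, float(y) * img_h,
                              float(w) * img_w, float(h) * img_h)
                boxes.append((c, x - w / 2, y - h / 2, x + w / 2, y + h / 2))
        return boxes

    # Example (hypothetical sample and image size): boxes = read_yolo_labels("0001.txt", img_w=1024, img_h=1024)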

    Download and Use
    This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to our paper [1].

    [1] Strohmayer, Julian, and Martin Kampel. "Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition." International Conference on Computer Analysis of Images and Patterns. Cham: Springer Nature Switzerland, 2023.

    BibTeX citation:

    @inproceedings{strohmayer2023domain,
     title={Domain-Adaptive Data Synthesis for Large-Scale Supermarket Product Recognition},
     author={Strohmayer, Julian and Kampel, Martin},
     booktitle={International Conference on Computer Analysis of Images and Patterns},
     pages={239--250},
     year={2023},
     organization={Springer}
    }
  9. Sandcast Images for Foundry Automation (SIFA)

    • kaggle.com
    Updated Oct 15, 2024
    Cite
    Martell Bell17 (2024). Sandcast Images for Foundry Automation (SIFA) [Dataset]. http://doi.org/10.34740/kaggle/dsv/9634322
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Martell Bell17
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Industry 4.0 advancements offer promising solutions to the challenges faced by high-mix, low-volume (HMLV) foundries, particularly in quality assessment and process automation. This comprehensive dataset was developed to train an image segmentation neural network, aimed at automating the post-processing task of removing sprues and risers from cast parts. By enabling the analysis of diverse part geometries, the approach is designed to address the variability inherent in HMLV foundries, where standardization is difficult due to complex part configurations.

    Data for this project consists of three types of images: camera images, synthetic images, and augmented images, all stored in JPEG format. Each sample contains 36 camera, synthetic, and augmented images. Camera images were captured using an Arduino Nicla Vision camera with a default resolution of 240 x 320 pixels. Images are labeled 'Sample## Natural up/down ##', with ## denoting the sample and image numbers. Synthetic images were created from raw 3D scan data and rendered in JPEG format using Blender at a default resolution of 1920 x 1080 pixels. They are labeled 'Sample ## synthetic up/down ##'. Augmented images are synthetic images that have been modified by replacing the original part geometry with a CAD model. They are labeled 'Sample ##A up/down ##'. The dataset includes both labeled and unlabeled images. The unlabeled set only includes JPEG images, while the labeled set includes JPEG images and their labels in .txt format, as well as the .yaml file. A detailed description of the dataset creation is outlined below:

    1) Real image creation: To create the real image dataset, each sample was placed on top of a turntable and a photograph of the sample was taken. The sample was then rotated 20 degrees, and a subsequent photograph was taken. This process was repeated until a full rotation of the sample was complete, providing a total of 18 images. The whole procedure was then repeated with the object flipped upside down, providing 36 images per sample and 1080 images for the real dataset.

    2) Synthetic image creation: The synthetic image dataset was created by using an Einscan Pro HD 3D scanner to collect 3D scans of the cast parts. The scans were imported into Blender and wrapped in an aluminum texture resembling the appearance of the real part. The texture-wrapped part was then placed either in a blank scene with a black or gray background, or on top of a turntable resembling the real turntable in front of a white background. Finally, the same image capture procedure performed on the real dataset was repeated in Blender to produce a total of 1080 synthetic images. All camera angles and lighting were modelled to resemble the real images as closely as possible.

    3) Augmented image creation: For the augmented image dataset, the same procedure as for the synthetic images was followed, with the exception of using Creo Parametric to replace the original part geometry with a CAD model of the part prior to importing it into Blender. Similarly, 1080 augmented images were created.

    Practitioners using this dataset have several options to tailor it to their needs. The dataset includes labeled and unlabeled images, with the labeled images organized into a single folder, enabling users to create custom train-validation-test splits. The unlabeled images can be categorized into different classes beyond the three provided in the labeled set (part, sprue, and riser). Additionally, the unlabeled set supports further 2D spatial augmentations, as class locations are specified in .txt files. The labeled images are named in the format ‘Sample XX-jpg.rf.RandomCharacterString,’ reflecting the dataset’s export from the data management platform Roboflow. Labeling can be performed in various platforms, such as Roboflow, CVAT, or Labelbox, providing flexibility in data management.

  10. RailEnV-PASMVS: a dataset for multi-view stereopsis training and...

    • zenodo.org
    • resodate.org
    • +2more
    bin, csv, png, txt +1
    Updated Jul 18, 2024
    + more versions
    Cite
    André Broekman; Petrus Johannes Gräbe (2024). RailEnV-PASMVS: a dataset for multi-view stereopsis training and reconstruction applications [Dataset]. http://doi.org/10.5281/zenodo.5233840
    Available download formats: bin, csv, txt, zip, png
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    André Broekman; Petrus Johannes Gräbe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Perfectly Accurate, Synthetic dataset featuring a virtual railway EnVironment for Multi-View Stereopsis (RailEnV-PASMVS) is presented, consisting of 40 scenes and 79,800 renderings together with ground truth depth maps, extrinsic and intrinsic camera parameters and binary segmentation masks of all the track components and surrounding environment. Every scene is rendered from a set of 3 cameras, each positioned relative to the track for optimal 3D reconstruction of the rail profile. The set of cameras is translated across the 100-meter length of tangent (straight) track to yield a total of 1,995 camera views. Photorealistic lighting of each of the 40 scenes is achieved with the implementation of high-definition, high dynamic range (HDR) environmental textures. Additional variation is introduced in the form of camera focal lengths, random noise for the camera location and rotation parameters and shader modifications of the rail profile. Representative track geometry data is used to generate random and unique vertical alignment data for the rail profile for every scene. This primary, synthetic dataset is augmented by a smaller image collection consisting of 320 manually annotated photographs for improved segmentation performance. The specular rail profile represents the most challenging component for MVS reconstruction algorithms, pipelines and neural network architectures, increasing the ambiguity and complexity of the data distribution. RailEnV-PASMVS represents an application specific dataset for railway engineering, against the backdrop of existing datasets available in the field of computer vision, providing the precision required for novel research applications in the field of transportation engineering.

    File descriptions

    • RailEnV-PASMVS.blend (227 Mb) - Blender file (developed using Blender version 2.8.1) used to generate the dataset. The Blender file packs only one of the HDR environmental textures to use as an example, along with all the other asset textures.
    • RailEnV-PASMVS_sample.png (28 Mb) - A visual collage of 30 scenes, illustrating the variability introduced by using different models, illumination, material properties and camera focal lengths.
    • geometry.zip (2 Mb) - Geometry CSV files used for scenes 01 to 20. The Bezier curve defines the geometry of the rail profile (10 mm intervals).
    • PhysicalDataset.7z (2.0 Gb) - A smaller, secondary dataset of 320 manually annotated photographs of railway environments; only the railway profiles are annotated.
    • 01.7z-40.7z (2.0 Gb each) - Archive of every scene (01 through 40).
    • all_list.txt, training_list.txt, validation_list.txt - Text files containing all the scene names, together with those used for validation (validation_list.txt) and training (training_list.txt), as used by MVSNet.
    • index.csv - CSV file providing a convenient reference for all the sample files, linking each sample to its file and relative data path.

    Steps to reproduce

    The open source Blender software suite (https://www.blender.org/) was used to generate the dataset, with the entire pipeline developed using the exposed Python API interface. The camera trajectory is kept fixed for all 40 scenes, except for small perturbations introduced in the form of random noise to increase the camera variation. The camera intrinsic information was initially exported as a single CSV file (scene.csv) for every scene, from which the camera information files were generated; this includes the focal length (focalLengthmm), image sensor dimensions (pixelDimensionX, pixelDimensionY), position, coordinate vector (vectC) and rotation vector (vectR). The STL model files, as provided in this data repository, were exported directly from Blender, such that the geometry/scenes can be reproduced. The data processing below is written for a Python implementation, transforming the information from Blender's coordinate system into universal rotation (R_world2cv) and translation (T_world2cv) matrices.

    import numpy as np
    from scipy.spatial.transform import Rotation as R

    # Example per-camera values as read from scene.csv (placeholder numbers for illustration):
    focalLengthmm, sensorWidthmm = 35.0, 32.0
    pixelDimensionX, pixelDimensionY = 1920, 1080
    vectR = [60.0, 0.0, 45.0]  # camera rotation (Euler angles, degrees) from Blender
    vectC = [2.0, -3.0, 1.5]   # camera location in world coordinates

    # The intrinsic matrix K is constructed using the following formulation:
    focalLengthPixel = focalLengthmm * pixelDimensionX / sensorWidthmm
    K = np.array([[focalLengthPixel, 0, pixelDimensionX / 2],
                  [0, focalLengthPixel, pixelDimensionY / 2],
                  [0, 0, 1]])

    # The rotation vector as provided by Blender is first transformed to a rotation matrix:
    r = R.from_euler('xyz', vectR, degrees=True)
    matR = r.as_matrix()

    # Transpose the rotation matrix to obtain the transform from WORLD to BLENDER camera coordinates:
    R_world2bcam = np.transpose(matR)

    # The matrix describing the transformation from BLENDER to CV/STANDARD coordinates is:
    R_bcam2cv = np.array([[1, 0, 0],
                          [0, -1, 0],
                          [0, 0, -1]])

    # Thus the rotation from WORLD to CV/STANDARD coordinates is:
    R_world2cv = R_bcam2cv.dot(R_world2bcam)

    # The camera translation requires a similar transformation, moving from BLENDER to WORLD coordinates:
    T_world2bcam = -1 * R_world2bcam.dot(vectC)
    T_world2cv = R_bcam2cv.dot(T_world2bcam)

    The resulting R_world2cv and T_world2cv matrices are written to the camera information file using exactly the same format as that of BlendedMVS developed by Dr. Yao. The original rotation and translation information can be found by following the process in reverse. Note that additional steps were required to convert from Blender's unique coordinate system to that of OpenCV; this ensures universal compatibility in the way that the camera intrinsic and extrinsic information is provided.
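
    For completeness, the reverse of that process (recovering Blender's rotation and location from a camera information file) can be sketched as follows; this simply inverts the steps above and is not an official utility of the dataset:

    # R_bcam2cv is diagonal and its own inverse, so the chain can be unwound directly:
    R_world2bcam = R_bcam2cv.dot(R_world2cv)
    T_world2bcam = R_bcam2cv.dot(T_world2cv)
    matR = np.transpose(R_world2bcam)               # Blender rotation matrix
    vectC_recovered = -matR.dot(T_world2bcam)       # Blender camera location
    vectR_recovered = R.from_matrix(matR).as_euler('xyz', degrees=True)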

    Equivalent GPS information is provided (gps.csv), whereby the local coordinate frame is transformed into equivalent GPS information, centered around the Engineering 4.0 campus, University of Pretoria, South Africa. This information is embedded within the JPG files as EXIF data.

  11. Largest LEGO Dataset (600 parts)

    • kaggle.com
    zip
    Updated Jul 24, 2021
    Cite
    dreamfactor (2021). Largest LEGO Dataset (600 parts) [Dataset]. https://www.kaggle.com/dreamfactor/biggest-lego-dataset-600-parts
    Available download formats: zip (8248541886 bytes)
    Dataset updated
    Jul 24, 2021
    Authors
    dreamfactor
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I was looking to do some LEGO sorting using object detection (my friend is building the actual sorter; I'm writing the software).

    I looked around for labeled datasets, but couldn't find any good ones. The ones I did find were fairly limited (basic parts, not enough variation, black background, no bounding boxes, etc.) (example: https://www.kaggle.com/marwin1665/synthetic-lego-images-images22) (all: https://www.kaggle.com/datasets?search=lego)

    So I scripted Blender to generate a synthetic dataset for 600 unique lego parts with multiple parts per image.

    What's cool about this dataset

    • It's the largest publicly available LEGO dataset for object detection
    • Uses SoTA domain randomization techniques to bridge the sim-to-real gap
    • Cheap to generate more data with DreamFactor

    I'd love to know if people find this useful or interesting; I can also release the trained PyTorch model 😇

    Acknowledgements

    • LDraw parts library
    • Blender
  12. replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Apr 21, 2023
    + more versions
    Cite
    Plum, Fabian; Bulla, René; Beck, Hendrik; Imirzian, Natalie; Labonte, David (2023). replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7849416
    Dataset updated
    Apr 21, 2023
    Dataset provided by
    Imperial College London
    The Pocket Dimension, Munich
    Authors
    Plum, Fabian; Bulla, René; Beck, Hendrik; Imirzian, Natalie; Labonte, David
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all recorded and hand-annotated data, all synthetically generated data, and representative trained networks used for the detection and tracking experiments in the manuscript "replicAnt - generating annotated images of animals in complex environments using Unreal Engine". Unless stated otherwise, all 3D animal models used in the synthetically generated data have been generated with the open-source photogrammetry platform scAnt (peerj.com/articles/11155/). All synthetic data has been generated with the associated replicAnt project available from https://github.com/evo-biomech/replicAnt.

    Abstract:

    Deep learning-based computer vision methods are transforming animal behavioural research. Transfer learning has enabled work in non-model species, but still requires hand-annotation of example footage, and is only performant in well-defined conditions. To overcome these limitations, we created replicAnt, a configurable pipeline implemented in Unreal Engine 5 and Python, designed to generate large and variable training datasets on consumer-grade hardware instead. replicAnt places 3D animal models into complex, procedurally generated environments, from which automatically annotated images can be exported. We demonstrate that synthetic data generated with replicAnt can significantly reduce the hand-annotation required to achieve benchmark performance in common applications such as animal detection, tracking, pose-estimation, and semantic segmentation; and that it increases the subject-specificity and domain-invariance of the trained networks, so conferring robustness. In some applications, replicAnt may even remove the need for hand-annotation altogether. It thus represents a significant step towards porting deep learning-based computer vision tools to the field.

    Benchmark data

    Two video datasets were curated to quantify detection performance; one in laboratory and one in field conditions. The laboratory dataset consists of top-down recordings of foraging trails of Atta vollenweideri (Forel 1893) leaf-cutter ants. The colony was collected in Uruguay in 2014, and housed in a climate chamber at 25°C and 60% humidity. A recording box was built from clear acrylic, and placed between the colony nest and a box external to the climate chamber, which functioned as feeding site. Bramble leaves were placed in the feeding area prior to each recording session, and ants had access to the recording area at will. The recorded area was 104 mm wide and 200 mm long. An OAK-D camera (OpenCV AI Kit: OAK-D, Luxonis Holding Corporation) was positioned centrally 195 mm above the ground. While keeping the camera position constant, lighting, exposure, and background conditions were varied to create recordings with variable appearance: The “base” case is an evenly lit and well exposed scene with scattered leaf fragments on an otherwise plain white backdrop. A “bright” and “dark” case are characterised by systematic over- or underexposure, respectively, which introduces motion blur, colour-clipped appendages, and extensive flickering and compression artefacts. In a separate well exposed recording, the clear acrylic backdrop was substituted with a printout of a highly textured forest ground to create a “noisy” case. Last, we decreased the camera distance to 100 mm at constant focal distance, effectively doubling the magnification, and yielding a “close” case, distinguished by out-of-focus workers. All recordings were captured at 25 frames per second (fps).

    The field dataset consists of video recordings of Gnathamitermes sp. desert termites, filmed close to the nest entrance in the desert of Maricopa County, Arizona, using a Nikon D850 and a Nikkor 18-105 mm lens on a tripod at camera distances between 20 cm and 40 cm. All video recordings were well exposed and captured at 23.976 fps.

    Each video was trimmed to the first 1000 frames, and contains between 36 and 103 individuals. In total, 5000 and 1000 frames were hand-annotated for the laboratory- and field-dataset, respectively: each visible individual was assigned a constant size bounding box, with a centre coinciding approximately with the geometric centre of the thorax in top-down view. The size of the bounding boxes was chosen such that they were large enough to completely enclose the largest individuals, and was automatically adjusted near the image borders. A custom-written Blender Add-on aided hand-annotation: the Add-on is a semi-automated multi-animal tracker, which leverages Blender's internal contrast-based motion tracker, but also includes track refinement options and CSV export functionality. Comprehensive documentation of this tool and Jupyter notebooks for track visualisation and benchmarking are provided on the replicAnt and BlenderMotionExport GitHub repositories.

    Synthetic data generation

    Two synthetic datasets, each with a population size of 100, were generated from 3D models of Atta vollenweideri leaf-cutter ants. All 3D models were created with the scAnt photogrammetry workflow. A "group" population was based on three distinct 3D models of an ant minor (1.1 mg), a media (9.8 mg), and a major (50.1 mg) (see 10.5281/zenodo.7849059). To approximately simulate the size distribution of A. vollenweideri colonies, these models make up 20%, 60%, and 20% of the simulated population, respectively. A 33% within-class scale variation, with default hue, contrast, and brightness subject material variation, was used. A "single" population was generated using the major model only, with 90% scale variation but equal material variation settings.

    A Gnathamitermes sp. synthetic dataset was generated from two hand-sculpted models; a worker and a soldier made up 80% and 20% of the simulated population of 100 individuals, respectively with default hue, contrast, and brightness subject material variation. Both 3D models were created in Blender v3.1, using reference photographs.

    Each of the three synthetic datasets contains 10,000 images, rendered at a resolution of 1024 by 1024 px, using the default generator settings as documented in the Generator_example level file (see documentation on GitHub). To assess how the training dataset size affects performance, we trained networks on 100 (“small”), 1,000 (“medium”), and 10,000 (“large”) subsets of the “group” dataset. Generating 10,000 samples at the specified resolution took approximately 10 hours per dataset on a consumer-grade laptop (6 Core 4 GHz CPU, 16 GB RAM, RTX 2070 Super).

    Additionally, five datasets which contain both real and synthetic images were curated. These “mixed” datasets combine image samples from the synthetic “group” dataset with image samples from the real “base” case. The ratio between real and synthetic images across the five datasets varied between 10/1 to 1/100.

    Funding

    This study received funding from Imperial College’s President’s PhD Scholarship (to Fabian Plum), and is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant agreement No. 851705, to David Labonte). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

  13. Synthetic Photogrammetric Survey Dateset

    • data.europa.eu
    unknown
    Updated May 13, 2025
    Cite
    Zenodo (2025). Synthetic Photogrammetric Survey Dateset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-15402118?locale=fr
    Available download formats: unknown (7999)
    Dataset updated
    May 13, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a set of synthetic images used to simulate a photogrammetric survey of everyday and cultural heritage objects. The dataset has been used to study the influence of specular reflections on the photogrammetric reconstruction process, and to train a neural model to remove specular highlights from photos in order to improve the photogrammetric reconstruction.

    The 3D assets have been prepared to have a coherent position, orientation, scale, materials and naming, and are stored in individual Blender files. For each asset, another Blender file was created, with a configurable camera and lighting rig, to simulate the camera placement and lighting used in a photogrammetric survey. Rendering was configured to produce 3 images simultaneously: a complete rendering, a diffuse-only rendering, and a specular-only rendering. In this way, for each asset, it was possible to generate renderings compatible, in terms of framing, resolution, lighting and appearance, with real-world photogrammetric surveys. The image triples were used to train a neural network to remove specular highlights from photos.

    The dataset comprises 24 assets, with 144 rendered images per asset (3456 in total). It has been prepared by the Visual Computing Lab, CNR-ISTI (https://vcg.isti.cnr.it). Authors: Marco Callieri, Daniela Giorgi, Massimiliano Corsini, Marco Sorrenti. For more info: callieri@isti.cnr.it

    DATASET CONTENT: The dataset contains two main folders, plus a readme info file.

    IMAGES: The rendered images, simulating the photogrammetric survey. There is one folder per asset, named as the asset. For each asset there are 48 views, with 3 renders per view:
    • ASSETNAME_CXXX: combined rendering image
    • ASSETNAME_DXXX: diffuse-only image
    • ASSETNAME_SXXX: specular-only image
    In total, 144 images per folder, at 2000x2000 resolution, in PNG format with the transparency channel used as a mask.

    SCENES_ASSETS: The 3D assets and Blender scenes. 00_BASE.blend is the basic empty scene file, used for the initial setup of the rendering process and the lighting/camera rigs. For each asset there is a Blender file, named as the asset, containing the camera/lighting setup and rendering configuration to produce the combined/diffuse/specular images. Each of these Blender files references its relative 3D asset from another Blender file in one of the subfolders. The prepared 3D assets are stored in the subfolders named as the assets. Each asset subfolder contains the original 3D model and texture, plus a Blender scene containing the ready-to-use asset with a standardized position, orientation, scale, naming, material and settings.

    The 3D models included in this dataset have been sourced from Sketchfab (https://sketchfab.com/) and a private repository (https://vcg.isti.cnr.it). All objects had an open license and were marked as "usable for AI applications".
    • adidas: "Scanned Adidas Sports Shoe" (https://skfb.ly/QZo9) by 3Digify is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
    • baby: "Baby Waiting For Birth" (https://skfb.ly/6WBwn) by Tore Lysebo is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
    • bfvase: "Black-Figure Neck Amphora, c. 540 BCE" (https://skfb.ly/oOysJ) by Minneapolis Institute of Art is licensed under Creative Commons Attribution-ShareAlike (http://creativecommons.org/licenses/by-sa/4.0/)
    • birdvase: "Vessel in the Form of a Bird, 100 BCE - 600 CE" (https://skfb.ly/oOzUH) by Minneapolis Institute of Art is licensed under Creative Commons Attribution-ShareAlike (http://creativecommons.org/licenses/by-sa/4.0/)
    • boot1: "Caterpillar Work Boot" (https://skfb.ly/o8KnO) by inciprocal is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
    • brezel: "Laugenbrezel" (https://skfb.ly/o99yz) by svnfbgr is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
    • cabbage: "Cabbage" (https://skfb.ly/6Z6EK) by Meerschaum Digital is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
    • conga: "African Drum (raw scan)" (https://skfb.ly/6VwHK) by Piotr Lezanski is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
    • dwarf: Visual Computing Lab, ISTI-CNR (https://vcg.isti.cnr.it), under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
    • elephant: "Wooden Elephant Scan | Game-ready asset" (https://skfb.ly/6XvpQ) by Photogrammetry Guy is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
    • gator: "Crocodile dog toy 3D scan" (https://skfb.ly/otSAq) by LukaszRyz is licensed under Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/)
    • grinder: "Angle_grinder DeWalt D28136" (https://skfb.ly/oOFvv) by Den.Prodan is licensed under Cre
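
    Given the per-view naming convention above (ASSETNAME_CXXX / ASSETNAME_DXXX / ASSETNAME_SXXX), a minimal sketch for pairing the combined/diffuse/specular triples of one asset could look like this; the folder path and the zero-padded view index are assumptions:

    import os

    def list_triples(asset_dir, asset_name, n_views=48):
        """Return (combined, diffuse, specular) PNG paths for each view of an asset."""
        triples = []
        for i in range(n_views):
            idx = f"{i:03d}"  # zero-padded view index is an assumption
            triples.append(tuple(os.path.join(asset_dir, f"{asset_name}_{kind}{idx}.png")
                                 for kind in ("C", "D", "S")))
        return triples

    # Example (using one of the assets listed above): list_triples("IMAGES/adidas", "adidas")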

  14. SynWBM

    • huggingface.co
    Updated Sep 23, 2025
    Cite
    Antal Bejczy Center for Intelligent Robotics (2025). SynWBM [Dataset]. http://doi.org/10.57967/hf/6084
    Dataset updated
    Sep 23, 2025
    Dataset authored and provided by
    Antal Bejczy Center for Intelligent Robotics
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    The SynWBM (Synthetic White Button Mushrooms) Dataset!

    Synthetic dataset of white button mushrooms (Agaricus bisporus) with instance segmentation masks and depth maps.

      Dataset Summary
    

    The SynWBM Dataset is a collection of synthetic images of white button mushroom. The dataset incorporates rendered (using Blender) and generated (using Stable Diffusion XL) synthetic images for training mushroom segmentation models. Each image is annotated with instance segmentation masks… See the full description on the dataset page: https://huggingface.co/datasets/ABC-iRobotics/SynWBM.
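
    Since the dataset is hosted on the Hugging Face Hub, it can be pulled with the datasets library; the available configurations, splits, and column names are not documented here, so inspect the returned object rather than relying on assumed names:

    from datasets import load_dataset

    # Download the dataset from the Hugging Face Hub (configuration/split names may vary)
    ds = load_dataset("ABC-iRobotics/SynWBM")
    print(ds)  # inspect available splits and features before use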

  15. Synthetic LEGO Images (Images6)

    • kaggle.com
    zip
    Updated May 7, 2020
    + more versions
    Cite
    MARWIN (2020). Synthetic LEGO Images (Images6) [Dataset]. https://www.kaggle.com/marwin1665/images6
    Available download formats: zip (1221999956 bytes)
    Dataset updated
    May 7, 2020
    Authors
    MARWIN
    Description


    This is a collection of synthetically generated LEGO images. The images were generated with Blender, and each image has a resolution of 800x600 px. The dataset contains approximately 9 LEGOs per image, which results in 11520 individual LEGOs for training and 2295 LEGOs for validation. No LEGOs are adjacent, meaning they do not overlap. There are a total of 14 different LEGO types in this collection.

    Notebook

    This dataset was generated in connection with the following project:

    Convolutional Neural Network to detect LEGO Bricks. (https://github.com/deeepwin/lego-cnn)

    The notebook runs on Colab, but can be easily adjusted to run on Kaggle.

    Annotation

    Annotation is compatible with VGG Image Annotator (https://gitlab.com/vgg/via).
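
    Because the annotations are compatible with the VGG Image Annotator (VIA), whose project exports are JSON, a minimal reading sketch might look like this; the annotation file name and the polygon region shape reflect the usual VIA export rather than this dataset's documentation:

    import json

    # Load a VIA-style annotation export (file name is hypothetical)
    with open("via_region_data.json") as f:
        annotations = json.load(f)

    for key, entry in annotations.items():
        regions = entry["regions"]
        # VIA stores regions either as a list or as a dict keyed by index
        regions = regions.values() if isinstance(regions, dict) else regions
        for region in regions:
            shape = region["shape_attributes"]  # e.g. a polygon with all_points_x / all_points_y
            print(entry["filename"], shape.get("name"))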

  16. Extended Unimelb Corridor

    • research-repository.rmit.edu.au
    • researchdata.edu.au
    zip
    Updated May 30, 2023
    Cite
    Debaditya Acharya; KOUROSH KHOSHELHAM (2023). Extended Unimelb Corridor [Dataset]. http://doi.org/10.25439/rmt.21901305.v1
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    RMIT University
    Authors
    Debaditya Acharya; KOUROSH KHOSHELHAM
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We used a subset of synthetic images that only contained forward-looking images from the original Unimelb corridor dataset, and removed the additional images that were generated by rotating the camera along the X and Y axes. To compensate for the low number of synthetic images, we generated 900 more images along the original trajectory by reducing the spacing between consecutive images, which resulted in 1400 images for the synthetic dataset. The dataset also contains 950 real images and their corresponding ground-truth camera poses in the BIM coordinate system. We removed some of the redundant images (100) at the end of the trajectory and added another 500 new real images, which resulted in 1350 real images. The synthetic and real cameras have identical intrinsic camera parameters, with an image resolution of 640 x 480 pixels.

    Additionally, the provided Blender files can be used to render the images. Please note that the SynCar dataset should be rendered with Blender 2.78 only, whereas the SynPhoReal and SynPhoRealTex images can be generated using the latest Blender 3.4.
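
    As a rough illustration, the .blend files can be rendered headlessly by invoking the matching Blender binary from a script. The executable paths and .blend file names below are assumptions; only the version pairing (2.78 for SynCar, 3.4 for SynPhoReal/SynPhoRealTex) comes from the description above.

    import subprocess

    renders = [
        ("/opt/blender-2.78/blender", "SynCar.blend"),       # hypothetical paths and names
        ("/opt/blender-3.4/blender", "SynPhoReal.blend"),
        ("/opt/blender-3.4/blender", "SynPhoRealTex.blend"),
    ]

    for blender_bin, blend_file in renders:
        subprocess.run(
            [blender_bin, "-b", blend_file,   # -b: run Blender without the GUI
             "-o", "//renders/frame_####",    # output pattern, relative to the .blend file
             "-a"],                           # render the whole animation/trajectory
            check=True,
        )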

    [1] Acharya, D., Khoshelham, K. and Winter, S., 2019. BIM-PoseNet: Indoor camera localisation using a 3D indoor model and deep learning from synthetic images. ISPRS Journal of Photogrammetry and Remote Sensing, 150, pp.245-258.

    [2] Acharya, D., Singha Roy, S., Khoshelham, K. and Winter, S., 2020. A recurrent deep network for estimating the pose of real indoor images from synthetic image sequences. Sensors, 20(19), p.5492.

    [3] Acharya, D., Tennakoon, R., Muthu, S., Khoshelham, K., Hoseinnezhad, R. and Bab-Hadiashar, A., 2022. Single-image localisation using 3D models: Combining hierarchical edge maps and semantic segmentation for domain adaptation. Automation in Construction, 136, p.104152.

  17. Model parameters and specifications.

    • plos.figshare.com
    xls
    Updated Apr 24, 2025
    Kimia Aghamohammadesmaeilketabforoosh; Joshua Parfitt; Soodeh Nikan; Joshua M. Pearce (2025). Model parameters and specifications. [Dataset]. http://doi.org/10.1371/journal.pone.0322189.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kimia Aghamohammadesmaeilketabforoosh; Joshua Parfitt; Soodeh Nikan; Joshua M. Pearce
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The aim of this study was to train a Vision Transformer (ViT) model for semantic segmentation to differentiate between ripe and unripe strawberries using synthetic data to avoid challenges with conventional data collection methods. The solution used Blender to generate synthetic strawberry images along with their corresponding masks for precise segmentation. Subsequently, the synthetic images were used to train and evaluate the SwinUNet as a segmentation method, and Deep Domain Confusion was utilized for domain adaptation. The trained model was then tested on real images from the Strawberry Digital Images dataset. The performance on the real data achieved a Dice Similarity Coefficient of 94.8% for ripe strawberries and 94% for unripe strawberries, highlighting its effectiveness for applications such as fruit ripeness detection. Additionally, the results show that increasing the volume and diversity of the training data can significantly enhance the segmentation accuracy of each class. This approach demonstrates how synthetic datasets can be employed as a cost-effective and efficient solution for overcoming data scarcity in agricultural applications.
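
    For reference, a minimal sketch of the Dice Similarity Coefficient reported above (e.g. ~94.8% for ripe strawberries) is given below; it is a generic formulation over binary masks, not the paper's exact evaluation code.

    import numpy as np

    def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
        """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks."""
        pred = pred.astype(bool)
        target = target.astype(bool)
        intersection = np.logical_and(pred, target).sum()
        return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

    # dice_coefficient(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]]))  # ~0.667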

  18. LAS&T: Large Shape And Texture Dataset

    • kaggle.com
    zip
    Updated Aug 2, 2025
    + more versions
    Sagi Eppel (2025). LAS&T: Large Shape And Texture Dataset [Dataset]. https://www.kaggle.com/datasets/sagieppel/las-and-t-large-shape-and-texture-dataset
    Explore at:
    zip(50958840696 bytes)Available download formats
    Dataset updated
    Aug 2, 2025
    Authors
    Sagi Eppel
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Large Shape And Texture Dataset (LAS&T)

    LAS&T is the largest and most diverse dataset for shape, texture and material recognition and retrieval in 2D and 3D scenes, with 650,000 images, based on real world shapes and textures.

    Overview

    The LAS&T Dataset aims to test the most basic aspect of vision in the most general way. Mainly the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D with every element of the shape changed between images, including the shape material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when appearing on different objects, environments, and light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of real-world natural patterns. Each section of the dataset (shapes, and textures), contains 3D parts that rely on physics-based scenes with realistic light materials and object simulation and abstract 2D parts. In addition, the real-world benchmark for 3D shapes. Main Project Page

    The dataset is composed of 4 parts:

    3D shape recognition and retrieval.
    2D shape recognition and retrieval.
    3D material recognition and retrieval.
    2D texture recognition and retrieval.

    An additional asset is a set of 350,000 natural 2D shapes extracted from real-world images (SHAPES_COLLECTION_350k.zip).

    Each part can be trained and tested independently.

    Shapes Recognition and Retrieval:

    For shape recognition, the goal is to identify the same shape in different images, where the material/texture/color of the shape is changed, the shape is rotated, and the background is replaced. Hence, only the shape remains the same in both images. Note that this means the model can't use any contextual cues and must rely on the shape information alone.

    File structure:

    All jpg images that are in the exact same subfolder contain the exact same shape (but with different texture/color/background/orientation).
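
    Because the folder layout encodes shape identity (same subfolder = same shape), retrieval triplets can be sampled directly from the directory tree. The sketch below assumes only that convention; the root folder name is hypothetical.

    import random
    from pathlib import Path

    def sample_triplets(root: str, num_triplets: int = 1000):
        """Return (anchor, positive, negative) image paths for retrieval training/evaluation."""
        groups = [sorted(d.glob("*.jpg")) for d in Path(root).iterdir() if d.is_dir()]
        groups = [g for g in groups if len(g) >= 2]
        triplets = []
        for _ in range(num_triplets):
            same = random.choice(groups)
            anchor, positive = random.sample(same, 2)   # same shape, different rendering
            other = random.choice([g for g in groups if g is not same])
            triplets.append((anchor, positive, random.choice(other)))
        return triplets

    # triplets = sample_triplets("LAST_3D_shapes_train")  # hypothetical folder name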

    Textures and Materials Recognition and Retrieval

    For textures and materials, the goal is to identify and match images containing the same material or texture; however, the shape or object on which the texture is applied differs between images, as do the background and lighting. This removes contextual clues and forces the model to use only the texture or material for recognition.

    File structure:

    All jpg images that are in the exact same subfolder contain the exact same texture/material (but overlay on different objects with different background/and illumination/orientation).

    Data Generation:

    The images in the synthetic part of the dataset were created by automatically extracting shapes and textures from natural images and combining them into synthetic scenes. The resulting images rely entirely on real-world patterns, producing extremely diverse and complex shapes and textures. As far as we know, this is the largest and most diverse shape and texture recognition/retrieval dataset. The 3D data was generated using physics-based materials and rendering (Blender), making the images physically grounded and suitable for training models intended for real-world examples.

    Real-world images data:

    For 3D shape recognition and retrieval, we also supply a real-world natural-image benchmark, with a variety of natural images containing the exact same 3D shape but made of or coated with different materials, and shown in different environments and orientations. The goal is again to identify the same shape across different images.

    File structure:

    Files containing the word 'synthetic' contain synthetic images that can be used for training or testing; the type of data (2D shapes, 3D shapes, 2D textures, 3D materials) appears in the file name, as does the number of images. Files containing "MULTI TESTS" in their name contain various small tests (500 images each) that can be used to measure how a single variation (orientation/background) affects recognition, and are less suitable for general training or testing.

    Supporting Scripts

    Files starting with "Scripts" contain the scripts used to generate the dataset and the scripts used to evaluate various LVLMs on this dataset.

    Shapes Collections

    The file SHAPES_COLLECTION_350k.zip contains 350,000 2D shapes extracted from natural images and used for the dataset generation.

    Evaluating and Testing

    For evaluation and testing, see SCRIPTS_Testing_LVLM_ON_LAST_VQA.zip. These scripts can be used to test leading LVLMs via API, create human tests, and in general turn the dataset into multiple-choice question images similar to the ones in the paper.

  19. ENVISET

    • kaggle.com
    Updated Jul 17, 2025
    Giuseppe Napolano (2025). ENVISET [Dataset]. https://www.kaggle.com/datasets/giuseppenapolano/enviset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 17, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Giuseppe Napolano
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    ENVISET is a dataset of synthetic images of the ENVISAT satellite, generated in Blender, for CNN-based satellite pose estimation tasks. The dataset includes images for training and testing pose estimation algorithms employing CNNs; for each image, relative position and attitude labels are provided.

    Training
    - A set of 20000 images of the satellite, fully visible or truncated in the Field of View, with randomized relative position, relative attitude and illumination. Earth is present in half of the images.

    Testing
    - Dataset A: A set of 4000 images of the satellite, fully visible or truncated in the Field of View, with randomized relative position, relative attitude and illumination. Earth is present in half of the images. Relative position and attitude labels are provided.
    - Dataset B: A set of 3928 images acquired along a monitoring trajectory around the satellite, with an acquisition rate of 1 Hz, at distances between 25 m and 57 m; ENVISAT is spinning at 0.4°/s.
    - Dataset C.1: A set of 3928 images acquired along a monitoring trajectory around the satellite, with an acquisition rate of 1 Hz, at distances between 25 m and 57 m; ENVISAT is tumbling with an initial angular velocity of 0.3°/s around each axis.
    - Dataset C.2: A set of 3928 images acquired along a monitoring trajectory around the satellite, with an acquisition rate of 1 Hz, at distances between 25 m and 57 m; ENVISAT is tumbling with an initial angular velocity of 0.58°/s around each axis.
    - Dataset D: A set of 799 images acquired along an approach trajectory towards the satellite, with an acquisition rate of 1 Hz, from 70 m to 25 m; ENVISAT is spinning at 0.4°/s.
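
    Since each image comes with relative position and attitude labels, a minimal sketch of the usual pose-error metrics (translation error plus quaternion-based rotation error) is given below. The label file format is not specified here, so loading the ground truth is left out and unit quaternions are assumed.

    import numpy as np

    def translation_error(t_est: np.ndarray, t_gt: np.ndarray) -> float:
        """Euclidean distance between estimated and ground-truth relative positions."""
        return float(np.linalg.norm(t_est - t_gt))

    def rotation_error_deg(q_est: np.ndarray, q_gt: np.ndarray) -> float:
        """Angle (degrees) between two attitude quaternions, handling the q/-q ambiguity."""
        q_est = q_est / np.linalg.norm(q_est)
        q_gt = q_gt / np.linalg.norm(q_gt)
        dot = np.clip(abs(np.dot(q_est, q_gt)), 0.0, 1.0)
        return float(np.degrees(2.0 * np.arccos(dot)))

    # rotation_error_deg(np.array([1, 0, 0, 0]), np.array([1, 0, 0, 0]))  # -> 0.0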

  20. Synthetic Pavement Crack Dataset for Object Detection

    • figshare.com
    zip
    Updated Apr 12, 2025
    Oliver Macnaughton (2025). Synthetic Pavement Crack Dataset for Object Detection [Dataset]. http://doi.org/10.6084/m9.figshare.28781687.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 12, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Oliver Macnaughton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A dataset containing 3,346 synthetically generated RGB images of road segments with cracks. Road segments and crack formations created in Blender, data collected in Microsoft AirSim. Data is split into train (~70%), test (~15%), and validation (~15%) folders. Contains ground truth bounding boxes labelling cracks in both YOLO and COCO JSON format.
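
    Since the boxes are provided in both YOLO and COCO JSON conventions, a minimal conversion sketch between the two is shown below (YOLO: class, x_center, y_center, width, height normalised to [0, 1]; COCO: [x_min, y_min, width, height] in pixels). The image size in the example is illustrative, not taken from the dataset.

    def yolo_to_coco(box, img_w: int, img_h: int):
        """Convert one normalised YOLO box to a COCO-style pixel box."""
        _cls, xc, yc, w, h = box
        x_min = (xc - w / 2.0) * img_w
        y_min = (yc - h / 2.0) * img_h
        return [x_min, y_min, w * img_w, h * img_h]

    # yolo_to_coco((0, 0.5, 0.5, 0.2, 0.1), img_w=640, img_h=480)
    # -> [256.0, 216.0, 128.0, 48.0]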
