License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The Roboflow Packages dataset is a collection of packages located at the doors of various apartments and homes. Packages are flat envelopes, small boxes, and large boxes. Some images contain multiple annotated packages.
This dataset may be used as a good starter dataset to track and identify when a package has been delivered to a home. Perhaps you want to know when a package arrives to claim it quickly or prevent package theft.
If you plan to use this dataset and adapt it to your own front door, it is recommended that you capture and add images from the context of your specific camera position. You can easily add images to this dataset via the web UI or via the Roboflow Upload API.
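As a rough illustration of the programmatic route, here is a minimal upload sketch using the roboflow Python package; the API key, workspace, and project identifiers are placeholders you would replace with your own.

```python
# Minimal sketch of programmatic upload with the roboflow package.
# The key, workspace, and project identifiers below are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-packages-project")
project.upload("front_door_package.jpg")
```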
Roboflow enables teams to build better computer vision models faster. We provide tools for image collection, organization, labeling, preprocessing, augmentation, training, and deployment. Developers reduce boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.

This dataset was created by Lyndia Lu.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
## Overview
Image Augmentation is a dataset for object detection tasks - it contains Fractured annotations for 702 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
If you use this dataset, please cite this paper: Puertas, E.; De-Las-Heras, G.; Sánchez-Soriano, J.; Fernández-Andrés, J. Dataset: Variable Message Signal Annotated Images for Object Detection. Data 2022, 7, 41. https://doi.org/10.3390/data7040041
This dataset consists of Spanish road images taken from inside a vehicle, together with annotations in XML files in PASCAL VOC format that indicate the locations of Variable Message Signals within them. A CSV file is also attached with information regarding the geographic position, the folder where each image is located, and the sign text in Spanish. The dataset can be used to train supervised computer vision algorithms, such as convolutional neural networks. The accompanying work details the process followed to obtain the dataset (image acquisition and labeling) and its specifications. The dataset comprises 1,216 instances (888 positive and 328 negative) in 1,152 JPG images with a resolution of 1280x720 pixels, divided into 576 real images and 576 images created with data augmentation. The purpose of this dataset is to support road computer vision research, since no dataset specifically for VMSs previously existed.
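As a quick orientation to the annotation format, the sketch below reads one PASCAL VOC XML file with Python's standard library; the file path is a hypothetical example, and the element names follow the standard VOC schema.

```python
# Minimal PASCAL VOC reader (standard library only); the path
# "annotations/image_0001.xml" is a hypothetical example.
import xml.etree.ElementTree as ET

root = ET.parse("annotations/image_0001.xml").getroot()
for obj in root.iter("object"):
    name = obj.find("name").text          # e.g. a Variable Message Signal
    box = obj.find("bndbox")
    coords = [int(float(box.find(tag).text))
              for tag in ("xmin", "ymin", "xmax", "ymax")]
    print(name, coords)
```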
The folder structure of the dataset is as follows:
In which:
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
This dataset is structured for underwater object detection tasks, following the COCO annotation format. It contains both real and augmented images of various underwater objects (e.g., fish, coral, ROVs). Images are grouped into classes, and all annotations are stored in a single JSON file for ease of access and compatibility with most object detection frameworks.
The dataset folder structure is as follows:
Underwater_Object_Detection_Dataset/
├── combined_images/
│ ├── animal_fish/
│ │ ├── real_and_augmented_image1.jpg
│ │ ├── real_and_augmented_image2.jpg
│ │ └── ...
│ ├── plant/
│ │ ├── real_and_augmented_image1.jpg
│ │ └── ...
│ ├── rov/
│ │ ├── real_and_augmented_image1.jpg
│ │ └── ...
│ ├── test/
│ │ ├── test_image1.jpg
│ │ ├── test_image2.jpg
│ │ └── ...
│ ├── mixed_categories/
│ │ ├── mixed_image1.jpg
│ │ ├── mixed_image2.jpg
│ │ └── ...
│ └── ...
├── combined_annotations.json
- combined_images/: Contains subfolders for each class, with each folder containing both real and augmented images for that class.
- test/: Contains images specifically for testing the model, kept separate from the main classes.
- mixed_categories/: Contains images with multiple object classes in a single image, allowing for multi-object detection tasks.
- combined_annotations.json: A single JSON file with all image and annotation information, formatted in COCO style for seamless integration with object detection models.

The combined_annotations.json file follows the COCO format, structured into three main sections: images, annotations, and categories.
{
"images": [
{
"id": 1,
"file_name": "vid_000159_frame0000008.jpg",
"width": 480,
"height": 270
},
{
"id": 2,
"file_name": "vid_000339_frame0000012.jpg",
"width": 480,
"height": 270
}
// Additional images
],
"annotations": [
{
"segmentation": [],
"area": 343.875,
"iscrowd": 0,
"image_id": 1,
"bbox": [238.0, 165.0, 18.0, 23.0],
"category_id": 1,
"id": 221
},
{
"segmentation": [],
"area": 500.25,
"iscrowd": 0,
"image_id": 2,
"bbox": [120.0, 140.0, 25.0, 20.0],
"category_id": 2,
"id": 222
}
// Additional annotations
],
"categories": [
{
"supercategory": "marine_life",
"id": 1,
"name": "fish"
},
{
"supercategory": "marine_life",
"id": 2,
"name": "coral"
},
{
"supercategory": "vehicle",
"id": 3,
"name": "rov"
}
// Additional categories
]
}
images: Contains metadata about each image:
"id": Unique identifier for the image."file_name": File name within its respective class folder."width" and "height": Dimensions of the image in pixels.annotations: Lists each object annotation with the following details:
"segmentation": For polygonal segmentation (empty here as we use bounding boxes only)."area": Area of the bounding box."iscrowd": Set to 0 for individual objects, 1 if dense clustering."image_id": Corresponds to the id in images, linking the annotation to its image."bbox": Bounding box in [x_min, y_min, width, height] format."category_id": Refers to the object’s class in categories."id": Unique ID for each annotation.categories: Lists unique object classes in the dataset:
"supercategory": High-level grouping for the class."id": Unique ID for each class."name": Name of the object class.This dataset is suitable for: - Training and validation for underwater object detection models. - Benchmarking and testing on object detection algorithms. - Exploring domain adaptation using real and augmented underwater images.
The test/ folder is intended exclusively for testing the model, helping to evaluate its performance on unseen data. The mixed_categories/ folder contains images with multiple object types, making it suitable for multi-object detection challenges, where models need to detect several classes in the same image.
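Because combined_annotations.json follows the COCO schema shown above, it can be consumed with nothing more than the standard json module; a minimal sketch:

```python
import json

# Load the single COCO-style annotation file described above.
with open("combined_annotations.json") as f:
    coco = json.load(f)

images = {img["id"]: img["file_name"] for img in coco["images"]}
names = {cat["id"]: cat["name"] for cat in coco["categories"]}

for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]  # [x_min, y_min, width, height]
    print(f"{images[ann['image_id']]}: {names[ann['category_id']]} "
          f"at ({x}, {y}), {w}x{h}px")
```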
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is an extremely challenging set of over 3,000 images of excavator vehicles from multiple construction sites. The images were captured and crowdsourced from over 2,000 different locations, and each image was manually reviewed and verified by computer vision professionals at Datacluster Labs. It contains a wide variety of excavator images captured in diverse settings and can be used for scene classification and object detection.
Optimized for Generative AI, Visual Question Answering, Image Classification, and LMM development, this dataset provides a strong basis for achieving robust model performance.
Available annotation formats: COCO, YOLO, PASCAL VOC, TFRecord.
The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai. Visit www.datacluster.ai to learn more.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The Hard Hat dataset is an object detection dataset of workers in workplace settings that require a hard hat. Annotations also include examples of just "person" and "head," for when an individual may be present without a hard hat.
The original dataset has a 75/25 train-test split.
Example Image:
![Example Image](https://i.imgur.com/7spoIJT.png)
One could use this dataset to, for example, build a classifier that distinguishes workers abiding by the safety code within a workplace from those who may not be. It is also a good general dataset for practice.
Use the Fork or Download this Dataset button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or with additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.
Image Preprocessing | Image Augmentation | Modify Classes
* v1 (resize-416x416-reflect): generated with the original 75/25 train-test split | No augmentations | The reflect-resize preprocessing is sketched after this list
* v2 (raw_75-25_trainTestSplit): generated with the original 75/25 train-test split | These are the raw, original images
* v3 (v3): generated with the original 75/25 train-test split | Modify Classes used to drop person class | Preprocessing and Augmentation applied
* v5 (raw_HeadHelmetClasses): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class
* v8 (raw_HelmetClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and person classes
* v9 (raw_PersonClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and helmet classes
* v10 (raw_AllClasses): generated with a 70/20/10 train/valid/test split | These are the raw, original images
* v11 (augmented3x-AllClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied | 3x image generation | Trained with Roboflow's Fast Model
* v12 (augmented3x-HeadHelmetClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Fast Model
* v13 (augmented3x-HeadHelmetClasses-AccurateModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Accurate Model
* v14 (raw_HeadClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class, and remap/relabel helmet class to head
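For readers reproducing the v1 preprocessing outside Roboflow, the sketch below approximates a 416x416 resize with reflected-edge padding using OpenCV; it is an approximation under stated assumptions, not Roboflow's exact implementation, and the file name is a placeholder.

```python
import cv2

def resize_reflect(img, size=416):
    # Fit the longer side to `size`, then pad the shorter side by
    # reflecting edge pixels so the output is exactly size x size.
    h, w = img.shape[:2]
    scale = size / max(h, w)
    img = cv2.resize(img, (round(w * scale), round(h * scale)))
    pad_h, pad_w = size - img.shape[0], size - img.shape[1]
    return cv2.copyMakeBorder(img,
                              pad_h // 2, pad_h - pad_h // 2,
                              pad_w // 2, pad_w - pad_w // 2,
                              cv2.BORDER_REFLECT)

square = resize_reflect(cv2.imread("worker.jpg"))  # placeholder file name
```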
Choosing Between Computer Vision Model Sizes | Roboflow Train
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce their code by 50% when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Accurate identification of small tea buds is a key technology for tea harvesting robots, which directly affects tea quality and yield. However, due to the complexity of the tea plantation environment and the diversity of tea buds, accurate identification remains an enormous challenge. Current methods based on traditional image processing and machine learning fail to effectively extract subtle features and morphology of small tea buds, resulting in low accuracy and robustness. To achieve accurate identification, this paper proposes a small object detection algorithm called STF-YOLO (Small Target Detection with Swin Transformer and Focused YOLO), which integrates the Swin Transformer module and the YOLOv8 network to improve the detection ability of small objects. The Swin Transformer module extracts visual features based on a self-attention mechanism, which captures global and local context information of small objects to enhance feature representation. The YOLOv8 network is an object detector based on deep convolutional neural networks, offering high speed and precision. Based on the YOLOv8 network, modules including Focus and Depthwise Convolution are introduced to reduce computation and parameters, increase the receptive field and feature channels, and improve feature fusion and transmission. Additionally, the Wise Intersection over Union loss is utilized to optimize the network. Experiments conducted on a self-created dataset of tea buds demonstrate that the STF-YOLO model achieves outstanding results, with an accuracy of 91.5% and a mean Average Precision of 89.4%. These results are significantly better than those of other detectors. Results show that, compared to mainstream algorithms (YOLOv8, YOLOv7, YOLOv5, and YOLOx), the model improves accuracy and F1 score by 5-20.22 percentage points and 0.03-0.13, respectively, proving its effectiveness in enhancing small object detection performance. This research provides technical means for the accurate identification of small tea buds in complex environments and offers insights into small object detection. Future research can further optimize model structures and parameters for more scenarios and tasks, as well as explore data augmentation and model fusion methods to improve generalization ability and robustness.
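The abstract credits depthwise convolution with cutting computation and parameters; the generic PyTorch sketch below illustrates that saving with a standard depthwise-separable convolution. It is not the paper's STF-YOLO module, only a common construction shown for comparison.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Generic depthwise-separable conv; not the paper's exact module."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # groups=in_ch: each filter sees a single input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)  # 1x1 channel mixing

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(nn.Conv2d(64, 128, 3, padding=1)))   # 73,856 parameters
print(count(DepthwiseSeparableConv(64, 128)))    # 8,960 parameters
```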
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
## Overview
Image Augmentation And Annotation is a dataset for object detection tasks - it contains Objects annotations for 431 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
This dataset is specifically curated for object detection tasks aimed at identifying and classifying road damage and potholes. The original dataset on which this augmented dataset is based included images labeled with four distinct classes:
- Pothole
- Alligator Crack
- Long Crack
- Lat Crack

For training a road-damage detector, however, all four classes have been merged into a single class, "Pothole", which now also includes the alligator, longitudinal, and lateral cracks.
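A minimal sketch of that merge step for YOLO label files follows; the assumption that the original classes used IDs 0 through 3 is mine, not stated by the dataset.

```python
from pathlib import Path

# Rewrite every label so all four damage classes share ID 0 ("Pothole").
# Assumes YOLO .txt labels in labels/ with original class IDs 0-3.
for label_file in Path("labels").glob("*.txt"):
    merged = []
    for line in label_file.read_text().splitlines():
        parts = line.split()
        if parts:
            parts[0] = "0"  # any damage type -> class 0: Pothole
            merged.append(" ".join(parts))
    label_file.write_text("\n".join(merged) + "\n")
```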
To enhance the robustness and generalization capability of models trained on this dataset, extensive data augmentation techniques have been applied. The augmentation pipeline includes:
These augmentations ensure that models can learn to recognize road damages under various conditions and viewpoints, improving their detection performance.
Bounding boxes are provided in the YOLO format, ensuring easy integration with popular object detection frameworks. The bounding boxes are adjusted to correspond with the augmented images to maintain annotation accuracy.
The dataset includes the following class:
| Class ID | Class Name |
| -------- | ---------- |
| 0        | Pothole    |
The dataset is divided into training, validation, and testing sets with the following proportions:
This split ensures a sufficient amount of data for training the model while maintaining enough data for validation and testing to assess model performance accurately.
This dataset aims to aid researchers and developers in building and fine-tuning models for road damage detection, contributing to safer and more efficient road maintenance systems.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Performance comparison of the OFIDA and several SOTA data augmentation methods for image classification.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
The data was constructed for detecting window and blind states. All images were annotated in XML format using LabelImg for object detection tasks. The dataset also includes the results of applying a Faster R-CNN based model: detected images and loss graphs for both training and validation. Additionally, the raw data with other annotations can be used for applications such as semantic segmentation and image captioning.
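For context, a generic torchvision Faster R-CNN can be run as below; this is not the authors' trained model, only a sketch of the model family they report results for, and the image file name is hypothetical.

```python
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Pretrained COCO weights stand in for the authors' trained weights.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = convert_image_dtype(read_image("window.jpg"), torch.float)  # placeholder
with torch.no_grad():
    pred = model([img])[0]
print(pred["boxes"], pred["labels"], pred["scores"])
```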
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Vehicle Detection Dataset
This dataset is designed for vehicle detection tasks, featuring a comprehensive collection of images annotated for object detection. This dataset, originally sourced from Roboflow (https://universe.roboflow.com/object-detection-sn8ac/ai-traffic-system), was exported on May 29, 2025, at 4:59 PM GMT and is now publicly available on Kaggle under the CC BY 4.0 license.
- ../train/images
- ../valid/images
- ../test/images

This dataset was created and exported via Roboflow, an end-to-end computer vision platform that facilitates collaboration, image collection, annotation, dataset creation, model training, and deployment. The dataset is part of the ai-traffic-system project (version 1) under the workspace object-detection-sn8ac. For more details, visit: https://universe.roboflow.com/object-detection-sn8ac/ai-traffic-system/dataset/1.
This dataset is ideal for researchers, data scientists, and developers working on vehicle detection and traffic monitoring systems. It can be used to: - Train and evaluate deep learning models for object detection, particularly using the YOLOv11 framework. - Develop AI-powered traffic management systems, autonomous driving applications, or urban mobility solutions. - Explore computer vision techniques for real-world traffic scenarios.
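A minimal training sketch with the Ultralytics API is shown below; it assumes the ultralytics package and a hand-written data.yaml pointing at the train/valid/test folders above (both are assumptions, not files shipped with the dataset).

```python
from ultralytics import YOLO

# Assumes `pip install ultralytics` and a data.yaml that lists the
# train/valid/test image folders and the class names.
model = YOLO("yolo11n.pt")                 # small pretrained YOLO11 checkpoint
model.train(data="data.yaml", epochs=100, imgsz=640)
metrics = model.val()                      # evaluate on the validation split
```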
For advanced training notebooks compatible with this dataset, check out: https://github.com/roboflow/notebooks. To explore additional datasets and pre-trained models, visit: https://universe.roboflow.com.
The dataset is licensed under CC BY 4.0, allowing for flexible use, sharing, and adaptation, provided appropriate credit is given to the original source.
This dataset is a valuable resource for building robust vehicle detection models and advancing computer vision applications in traffic systems.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Image data augmentation plays a crucial role in data augmentation (DA) by increasing the quantity and diversity of labeled training data. However, existing methods have limitations. Notably, techniques like image manipulation, erasing, and mixing can distort images, compromising data quality. Accurate representation of objects without confusion is a challenge in methods like auto augment and feature augmentation. Preserving fine details and spatial relationships also proves difficult in certain techniques, as seen in deep generative models. To address these limitations, we propose OFIDA, an object-focused image data augmentation algorithm. OFIDA implements one-to-many enhancements that not only preserve essential target regions but also elevate the authenticity of simulating real-world settings and data distributions. Specifically, OFIDA utilizes a graph-based structure and object detection to streamline augmentation. By leveraging graph properties like connectivity and hierarchy, it captures object essence and context for improved comprehension in real-world scenarios. Then, we introduce DynamicFocusNet, a novel object detection algorithm built on the graph framework. DynamicFocusNet merges dynamic graph convolutions and attention mechanisms to flexibly adjust receptive fields. Finally, the detected target images are extracted to facilitate one-to-many data augmentation. Experimental results validate the superiority of our OFIDA method over state-of-the-art methods across six benchmark datasets.
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
This dataset contains images of lions and tigers sourced from the Open Images Dataset V6 and labeled specifically for object detection using the YOLO format. The dataset focuses on two classes: lion and tiger, with annotations provided for each image in a YOLO-compatible .txt file format. This dataset is ideal for training machine learning models for wildlife detection and classification tasks, particularly in distinguishing between these two majestic big cats.

Key Features:
Classes: Lion and Tiger
Annotations: YOLO format, with bounding box coordinates and class labels provided in separate .txt files for each image.
Source: Images sourced from Open Images Dataset V6, which is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Application: Suitable for object detection models like YOLO, SSD, or Faster R-CNN.
Usage:
The dataset can be used for training, validating, or testing object detection models. Each image is accompanied by a corresponding YOLO annotation file, making it easy to integrate into any YOLO-based pipeline.

Attribution:
This dataset is derived from the Open Images Dataset V6, and proper attribution must be given. Please credit the Open Images Dataset when using or sharing this dataset in any format.
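As a quick format illustration, the sketch below parses one YOLO label file and converts its normalized boxes to pixel corners; the file name and the class-ID order (0 = lion, 1 = tiger) are assumptions to verify against the dataset's own class list.

```python
from pathlib import Path

names = {0: "lion", 1: "tiger"}  # assumed order; check the dataset's class list

def yolo_to_corners(xc, yc, w, h, img_w, img_h):
    # YOLO stores normalized center/size; convert to pixel corners.
    x1, y1 = (xc - w / 2) * img_w, (yc - h / 2) * img_h
    return x1, y1, x1 + w * img_w, y1 + h * img_h

for line in Path("labels/lion_0001.txt").read_text().splitlines():  # example path
    cls, xc, yc, w, h = (float(v) for v in line.split())
    print(names[int(cls)], yolo_to_corners(xc, yc, w, h, 1024, 768))
```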
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
## Overview
Data Augmentation Data Adjust 5k is a dataset for object detection tasks - it contains Leaves annotations for 4,994 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
The data was used for "Impact of Traditional Augmentation Methods on Window States Detection", a conference paper at CLIMA 2022. The main purpose of this data is the reproducibility of the proposed methods. All images are annotated in XML format using LabelImg. Additionally, this dataset may be used for other object detection and segmentation tasks.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
BRAGAN is a new dataset of Brazilian wildlife developed for object detection tasks, combining real images with synthetic samples generated by Generative Adversarial Networks (GANs). It focuses on five medium and large-sized mammal species frequently involved in roadkill incidents on Brazilian highways: lowland tapir (Tapirus terrestris), jaguarundi (Herpailurus yagouaroundi), maned wolf (Chrysocyon brachyurus), puma (Puma concolor), and giant anteater (Myrmecophaga tridactyla). Its primary goal is to provide a standardized and expanded resource for biodiversity conservation research, wildlife monitoring technologies, and computer vision applications, with an emphasis on automated wildlife detection.
The dataset builds upon the original BRA-Dataset by Ferrante et al. (2022), which was constructed from structured internet searches and manually curated with bounding box annotations. While the BRA-Dataset faced limitations in size and variability, BRAGAN introduces a new stage of dataset expansion through GAN-based synthetic image generation, substantially improving both the quantity and diversity of samples. In its final version, BRAGAN comprises 9,238 images, divided into three main groups:
Real images — original photographs from the BRA-Dataset. Total: 1,823.
Classically augmented images — transformations applied to real samples, including rotations (RT), horizontal flips (HF), vertical flips (VF), and horizontal (HS) and vertical shifts (VS). Total: 7,300.
GAN-generated images — synthetic samples created using WGAN-GP models trained separately for each species on preprocessed subsets of the original data. All generated images underwent visual inspection to ensure morphological fidelity and proper framing before inclusion. Total: 115.
The dataset follows an organized directory structure with images/ and labels/ folders, each divided into train/ and val/ subsets, following an 80–20 split. Images are provided in .jpg format, while annotations follow the YOLO standard in .txt files (class_id x_center y_center width height, with normalized coordinates). The file naming convention explicitly encodes the species and the augmentation type for reproducibility.
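To make the label bookkeeping concrete, here is a sketch of the horizontal-flip (HF) augmentation with the matching YOLO label update (mirroring only changes x_center); it is an illustration, not the authors' pipeline, and the file names are examples of the naming convention described above.

```python
from pathlib import Path
from PIL import Image

def hflip_label(line):
    # Mirroring the image horizontally maps x_center -> 1 - x_center.
    cls, xc, yc, w, h = line.split()
    return f"{cls} {1.0 - float(xc):.6f} {yc} {w} {h}"

src = Path("images/train/tapir_0001.jpg")          # example file names
Image.open(src).transpose(Image.FLIP_LEFT_RIGHT) \
     .save(src.with_name(src.stem + "_HF.jpg"))

label = Path("labels/train/tapir_0001.txt")
flipped = [hflip_label(l) for l in label.read_text().splitlines() if l.strip()]
label.with_name(label.stem + "_HF.txt").write_text("\n".join(flipped) + "\n")
```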
Designed to be compatible with multiple object detection architectures, BRAGAN has been evaluated on YOLOv5, YOLOv8, and YOLOv11 (variants n, s, and m), enabling the assessment of dataset expansion across different computational settings and performance requirements.
By combining real data, classical augmentations, and high-quality synthetic samples, BRAGAN provides a valuable resource for wildlife detection, environmental monitoring, and conservation research, especially in contexts where image availability for rare or threatened species is limited.
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
The CADOT dataset is introduced as part of the Grand Challenge at IEEE ICIP 2025, aiming to push forward the development of advanced object detection techniques in remote sensing imagery, particularly focused on dense urban environments. The competition is organized by LabCom IRISER, in collaboration with IGN (Institut national de l'information géographique et forestière), and encourages the use of AI-based data augmentation to enhance model robustness.
The challenge calls for the detection of small objects in high-resolution optical satellite imagery, which is inherently complex due to occlusions, diverse object types, and varied urban layouts. Participants are expected to develop detection pipelines that are not only accurate but also robust under real-world remote sensing constraints.
The CADOT dataset comprises high-resolution aerial images captured over a dense urban area in the Île-de-France region, France. Each image is carefully annotated with 14 object categories including buildings, roads, vehicles, trees, and various other urban components. The imagery comes from IGN and reflects a realistic and challenging setting for object detection models due to factors like shadows, perspective distortion, and dense object arrangements.
To facilitate easier use of the dataset in machine learning workflows, I have reformatted the original data into the following versions:
- .jpg and .png format (cropped and full-frame)

For full licensing terms and official documentation, please refer to the official challenge page: 🔗 https://cadot.onrender.com/
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Performance evaluation of semantic segmentation on the CITYSCAPES validation set using mIoU.