Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Virtual KITTI is a photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation. Virtual KITTI contains 50 high-resolution monocular videos (21,260 frames) generated from five different virtual worlds in urban settings under different imaging and weather conditions. These worlds were created using the Unity game engine and a novel real-to-virtual cloning method. These photo-realistic synthetic videos are automatically, exactly, and fully annotated for 2D and 3D multi-object tracking and at the pixel level with category, instance, flow, and depth labels (cf. below for download links).
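For working with the per-pixel labels, a minimal loading sketch in Python follows. It assumes the common Virtual KITTI convention of depth ground truth stored as 16-bit PNGs encoding centimeters; the paths below are illustrative, not verbatim, so check the actual archive layout after download.

import cv2
import numpy as np

# Illustrative paths; the real archive layout may differ.
rgb_path = "vkitti_rgb/0001/clone/00000.png"
depth_path = "vkitti_depthgt/0001/clone/00000.png"

# RGB frame.
rgb = cv2.imread(rgb_path, cv2.IMREAD_COLOR)

# Depth ground truth: assumed 16-bit PNG storing depth in centimeters;
# convert to meters for metric evaluation.
depth_cm = cv2.imread(depth_path, cv2.IMREAD_ANYDEPTH)
depth_m = depth_cm.astype(np.float32) / 100.0

print(rgb.shape, float(depth_m.min()), float(depth_m.max()))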
BSD 3-Clause License: https://opensource.org/license/bsd-3-clause/
Total-Text is a dataset tailored for instance segmentation, semantic segmentation, and object detection tasks, containing 1,555 images with 11,165 labeled objects, all belonging to a single class (text). Its primary aim is to open new research avenues in the scene text domain. Unlike traditional text datasets, Total-Text uniquely includes curved text in addition to horizontal and multi-oriented text, offering diverse text orientations in more than half of its images. This variety makes it a crucial resource for advancing text-related studies in computer vision and natural language processing.
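Because each text instance is annotated as a polygon, a pixel-level mask can be rasterized from the annotations. Below is a minimal sketch; `polygons` is a placeholder for vertex arrays parsed from Total-Text's ground-truth files, whose exact format should be checked against the dataset's documentation.

import numpy as np
import cv2

def polygons_to_instance_mask(polygons, height, width):
    # Rasterize per-instance text polygons into an instance-id mask.
    # polygons: list of (N, 2) arrays of (x, y) vertices, one per
    # text instance (uint8 limits this sketch to 255 instances).
    mask = np.zeros((height, width), dtype=np.uint8)
    for instance_id, poly in enumerate(polygons, start=1):
        cv2.fillPoly(mask, [np.asarray(poly, dtype=np.int32)], instance_id)
    return mask

# Toy example with two made-up instances on a 100x200 canvas.
demo = polygons_to_instance_mask(
    [np.array([[10, 10], [60, 12], [55, 40], [12, 35]]),
     np.array([[80, 50], [150, 55], [140, 90], [85, 80]])],
    height=100, width=200)
print(np.unique(demo))  # [0 1 2]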
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Laboro Tomato dataset comprises images capturing tomatoes at various stages of ripening, tailored for object detection and instance segmentation tasks. Additionally, the dataset offers two distinct subsets categorized by tomato size. The images were acquired at a local farm using two separate cameras, which differ in resolution and image quality.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Concrete Crack Segmentation Dataset comprises 458 high-resolution images accompanied by corresponding black-and-white alpha maps that signify the presence of cracks. The semantic segmentation ground truth is a binary pixel-wise classification with two distinct classes: crack and background. The images were captured in diverse buildings at the Middle East Technical University.
EMDS-6 contains 21 classes of environmental microorganisms (EMs). Each class contains 40 original EM images and their corresponding binary ground-truth images, in which the foreground is white and the background is black.
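Since these ground-truth images follow a simple white-foreground / black-background convention (as does the crack dataset above), converting them to 0/1 label maps is straightforward; a minimal sketch with an illustrative path:

from PIL import Image
import numpy as np

def load_binary_mask(path, threshold=128):
    # Load a white-foreground / black-background ground-truth image
    # and convert it to a 0/1 label map (1 = foreground).
    gray = np.asarray(Image.open(path).convert("L"))
    return (gray >= threshold).astype(np.uint8)

# mask = load_binary_mask("EMDS6/ground_truth/example.png")  # illustrative path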
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The authors of the UAVid: A Semantic Segmentation Dataset for UAV Imagery dataset discussed the significance of semantic segmentation, a crucial aspect of visual scene understanding, with applications in fields such as robotics and autonomous driving. They noted that the success of semantic segmentation owes much to large-scale datasets, particularly for deep learning methods. While several datasets existed for semantic segmentation in complex urban scenes, capturing side views of objects from mounted cameras on driving cars, there was a dearth of datasets capturing urban scenes from an oblique Unmanned Aerial Vehicle (UAV) perspective. Such oblique views provide both top and side views of objects, offering richer information for object recognition. To address this gap, the authors introduced the UAVid dataset, which presented new challenges, including variations in scale, moving object recognition, and maintaining temporal consistency.
P. Dansena, S. Bag, and R. Pal, “Differentiating Pen Inks in Handwritten Bank Cheques Using Multi-Layer Perceptron”, Proc. of the 7th International Conference on Pattern Recognition and Machine Intelligence, Kolkata, India, December 2017. https://www.idrbt.ac.in//icid.html
This beans dataset was created to provide an open and accessible, well-labeled, sufficiently curated image dataset. It is intended to enable researchers to run various machine learning experiments and aid innovations such as bean crop disease diagnosis and spatial analysis. The beans image dataset was collected across three different classes: Healthy, Angular Leaf Spot (ALS), and Bean Rust.
CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations, including 10,177 identities, 202,599 face images, and 5 landmark locations plus 40 binary attribute annotations per image.
The dataset can be employed as the training and test sets for the following computer vision tasks: face attribute recognition, face detection, and landmark (or facial part) localization.
Note: CelebA dataset may contain potential bias. The fairness indicators example goes into detail about several considerations to keep in mind while using the CelebA dataset.
To use this dataset:
import tensorflow_datasets as tfds

ds = tfds.load('celeb_a', split='train')
# Take a few examples and inspect them.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/celeb_a-2.1.0.png
https://spdx.org/licenses/
TICaM Real Images: A Time-of-Flight In-Car Cabin Monitoring Dataset is a time-of-flight dataset of in-car cabin images that provides a means to test extensive car cabin monitoring systems based on deep learning methods. The authors provide depth, RGB, and infrared images of the front car cabin, recorded using a driving simulator capturing various dynamic scenarios that usually occur while driving. For the dataset, they provide ground-truth annotations for 2D and 3D object detection, as well as for instance segmentation.
METU-ALET is an image dataset for the detection of tools in the wild. The dataset has annotations for tools belonging to categories such as farming, gardening, office, stonemasonry, vehicle, woodworking, and workshop. The images contain a total of 22,841 bounding boxes across 49 different tool categories.
Etalab Open License 2.0: https://spdx.org/licenses/etalab-2.0.html
We introduce a new dataset for goat detection that contains 6,160 annotated images captured under varying environmental conditions. The dataset is intended for developing machine learning algorithms for goat detection, with applications in precision agriculture, animal welfare, behaviour analysis, and animal husbandry. The annotations were performed by experts in this field, ensuring high accuracy and consistency. The dataset is publicly available and can be used as a benchmark for evaluating existing algorithms. This dataset advances research in computer vision for agriculture.
The STARE (STructured Analysis of the Retina) Project was conceived and initiated in 1975 by Michael Goldbaum, M.D., at the University of California, San Diego. It was funded by the U.S. National Institutes of Health. During its history, over thirty people contributed to the project, with backgrounds ranging from medicine to science to engineering. Images and clinical data were provided by the Shiley Eye Center at the University of California, San Diego, and by the Veterans Administration Medical Center in San Diego. The contents of this web page reflect Dr. Adam Hoover's contributions.
Please find the diagnosis codes, annotations of the manifestations, mappings, and related work by experts at the following link hosted by Clemson University: https://cecas.clemson.edu/~ahoover/stare/
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset was created to provide an open-source and well-curated image dataset showing diseased and healthy cassava leaf images from Uganda. This will be used by data scientists, researchers, the wider machine learning community, and experts from other domains to conduct research into automating the identification and diagnosis of cassava crop diseases. The image dataset was collected across three different classes: Healthy, Cassava Brown Streak Disease (CBSD), and Cassava Mosaic Disease (CMD).
The dataset was created to provide an open and accessible Cocoa dataset with well-labeled, sufficiently curated, and prepared Cocoa crop imagery for use by data scientists, researchers, the wider machine learning community, and social entrepreneurs within Sub-Saharan Africa and worldwide in various machine learning experiments, to build solutions for in-field Cocoa crop disease diagnosis and spatial analysis. The Cocoa dataset was collected across three classes: Healthy, Cocoa Swollen Shoot Virus Disease (CSSVD), and Anthracnose.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset was created to provide an open and accessible maize dataset with well-labeled, sufficiently curated, and prepared maize crop imagery for use by data scientists, researchers, the wider machine learning community, and social entrepreneurs within Sub-Saharan Africa and worldwide in various machine learning experiments, to build solutions for in-field maize crop disease diagnosis and spatial analysis. The image dataset was collected across three different classes: Healthy, Maize Streak Virus (MSV), and Maize Leaf Blight (MLB).
https://choosealicense.com/licenses/bsd/
ADE20K Dataset
Description
ADE20K is composed of more than 27K images from the SUN and Places databases. Images are fully annotated with objects, spanning over 3K object categories. Many of the images also contain object parts, and parts of parts. We also provide the original annotated polygons, as well as object instances for amodal segmentation. Images are also anonymized, blurring faces and license plates.
Images
MIT, CSAIL does not own the… See the full description on the dataset page: https://huggingface.co/datasets/1aurent/ADE20K.
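Given the Hugging Face dataset page above, the data can presumably be loaded with the datasets library; the split name and feature keys below are assumptions, so inspect the loaded object first.

from datasets import load_dataset

# Repository id taken from the dataset page above.
ds = load_dataset("1aurent/ADE20K", split="train")  # split name assumed
print(ds)             # reports features and number of examples
example = ds[0]
print(example.keys()) # inspect available fields (image, annotations, ...)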
https://polyp.grand-challenge.org/CVCClinicDB/
CVC-ClinicDB is a database of frames extracted from colonoscopy videos. These frames contain several examples of polyps. In addition to the frames, we provide the ground truth for the polyps. CVC-ClinicDB is the official database to be used in the training stages of the MICCAI 2015 Sub-Challenge on Automatic Polyp Detection in Colonoscopy Videos.
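Because the database pairs each frame with a polyp ground-truth mask, segmentation outputs are commonly scored against it with overlap metrics such as the Dice coefficient. The following numpy sketch is a generic implementation, not the challenge's official evaluation protocol.

import numpy as np

def dice_coefficient(pred, gt, eps=1e-7):
    # Dice overlap between two binary masks (arrays of 0/1).
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

# Toy check: identical masks score 1.0.
m = np.zeros((288, 384), dtype=np.uint8)
m[100:150, 120:200] = 1
print(dice_coefficient(m, m))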
GNU Lesser General Public License v3.0: https://www.gnu.org/licenses/lgpl-3.0.html
The authors introduce the tree component for the classification task within the Tree Dataset of Urban Street, encompassing 4,804 high-resolution images distributed across 23 classes. With these comprehensive resources at their disposal, this subset empowers researchers and practitioners to delve into the detailed analysis of urban street greenery, offering a valuable resource for comprehensive classification studies. Automatic tree species identification can be used to realize autonomous street tree inventories and help people without botanical knowledge and experience to better understand the diversity and regionalization of different urban landscapes.
Database Contents License (DbCL) v1.0: https://opendatacommons.org/licenses/dbcl/1-0/
The author of the dataset was engaged in a project on weapon detection in CCTV footage and encountered difficulties in finding a suitable pre-existing dataset for their research. Consequently, they decided to create a new dataset. It primarily consists of segmented videos (sourced mainly from YouTube) and images (from other sources).