Population distribution : the race distribution covers Asians, Caucasians and black people; the gender distribution covers male and female; the age distribution ranges from children to the elderly
Collecting environment : indoor and outdoor scenes (such as supermarkets, malls and residential areas)
Data diversity : different ages, time periods, cameras, human body orientations and postures, and collecting environments
Device : surveillance cameras; the image resolution is not less than 1,920 x 1,080
Data format : the image data format is .jpg, the annotation file format is .json
Annotation content : human body rectangular bounding boxes, 15 human body attributes (see the parsing sketch after this list)
Quality Requirements : a human body bounding box is qualified when the deviation is not more than 3 pixels, and the qualified rate of bounding boxes shall not be lower than 97%; annotation accuracy of attributes is over 97%
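Since the annotation schema for the .json files is not spelled out beyond "bounding boxes plus 15 attributes", the sketch below shows one plausible way to read a per-image annotation file; the field names ("boxes", "bbox", "attributes") are hypothetical placeholders, not the dataset's documented schema.

```python
import json

def load_annotations(path):
    """Yield (bbox, attributes) pairs from one annotation file.

    The JSON layout below is an assumption for illustration only.
    """
    with open(path, "r", encoding="utf-8") as f:
        record = json.load(f)
    for box in record.get("boxes", []):
        x, y, w, h = box["bbox"]                 # assumed [x, y, width, height] in pixels
        attributes = box.get("attributes", {})   # assumed dict of the 15 attributes
        yield (x, y, w, h), attributes

for bbox, attrs in load_annotations("sample.json"):  # placeholder file name
    print(bbox, attrs)
```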
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore the Synthetic Rock Paper Scissors Dataset featuring a diverse collection of augmented images for training and testing machine learning models.
Race distribution : Asians, Caucasians, black people
Gender distribution : gender balance
Age distribution : ranging from teenagers to the elderly, with middle-aged and young people in the majority
Collecting environment : including indoor and outdoor scenes
Data diversity : different shooting heights, ages, light conditions, and collecting environments, clothes across seasons, and multiple human poses
Device : cameras
Data format : the data format is .jpg/.mp4, the annotation file format is .json, the camera parameter file format is .json, the point cloud file format is .pcd (see the loading sketch after this list)
Accuracy : pose annotation accuracy exceeds 97%; the accuracy of labels for gender, race, age, collecting environment and clothes is more than 97%
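A minimal sketch of loading one sample's point cloud and camera parameters, using Open3D for the .pcd file; the file names and the layout of the camera parameter JSON are assumptions, not documented facts about this dataset.

```python
import json
import open3d as o3d  # third-party: pip install open3d

# Hypothetical file names; the actual naming scheme may differ.
cloud = o3d.io.read_point_cloud("frame_0001.pcd")
print(f"Loaded {len(cloud.points)} points")

with open("camera_0001.json", "r", encoding="utf-8") as f:
    camera_params = json.load(f)  # assumed to hold intrinsics/extrinsics
print(camera_params.keys())
```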
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets captured with the use of hardware (HW) synchronization, to the best of our knowledge, HUMAN4D is the first and only public resource that provides volumetric depth maps with high synchronization precision due to the use of intra- and inter-sensor HW-SYNC. Moreover, a spatio-temporally aligned scanned and rigged 3D character complements HUMAN4D to enable joint research on time-varying and high-quality dynamic meshes. We provide evaluation baselines by benchmarking HUMAN4D with state-of-the-art human pose estimation and 3D compression methods. For the former, we apply 2D and 3D pose estimation algorithms both on single- and multi-view data cues. For the latter, we benchmark open-source 3D codecs on volumetric data respecting online volumetric video encoding and steady bit-rates. Furthermore, qualitative and quantitative visual comparison between mesh-based volumetric data reconstructed in different qualities showcases the available options with respect to 4D representations. HUMAN4D is introduced to the computer vision and graphics research communities to enable joint research on spatio-temporally aligned pose, volumetric, mRGBD and audio data cues. The dataset and its code are available online.
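Working with volumetric depth maps such as those in HUMAN4D typically starts by back-projecting each depth frame into a point cloud via the camera intrinsics. A minimal sketch under a pinhole camera model; the intrinsic values below are hypothetical stand-ins for the dataset's actual calibration files:

```python
import numpy as np

def depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project a depth map (meters) to an (N, 3) point cloud.

    fx, fy, cx, cy are placeholder intrinsics; use the dataset's calibration.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

cloud = depth_to_points(np.random.rand(480, 640).astype(np.float32))
print(cloud.shape)
```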
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Part 1 of the synthetic facial data rendered from female fbx models. The total female dataset contains around 13k facial images generated from 12 identities, together with the corresponding raw facial depth and head pose data.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
## Overview
Computer Vision Web V2 is a dataset for object detection tasks - it contains Web V2 annotations for 1,478 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
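A minimal sketch of the download flow using the official `roboflow` Python package; the API key, workspace, project, and version identifiers are placeholders to be copied from the dataset page:

```python
from roboflow import Roboflow  # pip install roboflow

# Placeholders: copy the real API key, workspace, project slug, and
# version number from the dataset page on Roboflow.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("computer-vision-web-v2")
dataset = project.version(1).download("coco")  # export format, e.g. "coco" or "yolov8"
print(dataset.location)  # local folder containing images and annotations
```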
## License
This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore the expanded Linear Equation Image Dataset with over 30,000 images and CSV data for solving high school algebraic problems using machine learning.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Drivable Area Segmentation Dataset is meticulously crafted to enhance the capabilities of AI in navigating autonomous vehicles through diverse driving environments. It features a wide array of high-resolution images, with resolutions ranging from 1600 x 1200 to 2592 x 1944 pixels, capturing various pavement types such as bitumen, concrete, gravel, earth, snow, and ice. This dataset is vital for training AI models to differentiate between drivable and non-drivable areas, a fundamental aspect of autonomous driving. By providing detailed semantic and binary segmentation, it aims to improve the safety and efficiency of autonomous vehicles, ensuring they can adapt to different road conditions and environments encountered in real-world scenarios.
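Because the dataset ships both semantic and binary segmentation, a common preprocessing step is collapsing semantic classes into a binary drivable/non-drivable mask. A minimal sketch, assuming hypothetical integer class IDs (the real mapping comes from the dataset's label specification):

```python
import numpy as np

# Hypothetical class IDs; the real mapping comes from the dataset's label spec.
DRIVABLE_IDS = {1, 2}  # e.g., 1 = own lane, 2 = alternative drivable lane

def semantic_to_binary(mask: np.ndarray) -> np.ndarray:
    """Collapse a semantic segmentation mask into a binary drivable mask."""
    return np.isin(mask, list(DRIVABLE_IDS)).astype(np.uint8)

semantic = np.random.randint(0, 5, size=(1200, 1600), dtype=np.uint8)
binary = semantic_to_binary(semantic)
print(binary.mean())  # fraction of drivable pixels
```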
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
X2I Dataset
Project Page: https://vectorspacelab.github.io/OmniGen/
Github: https://github.com/VectorSpaceLab/OmniGen
Paper: https://arxiv.org/abs/2409.11340
Model: https://huggingface.co/Shitao/OmniGen-v1
To achieve robust multi-task processing capabilities, it is essential to train OmniGen on large-scale and diverse datasets. However, in the field of unified image generation, a readily available dataset has yet to emerge. For this reason, we have curated a large-scale… See the full description on the dataset page: https://huggingface.co/datasets/yzwang/X2I-computer-vision.
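A minimal sketch of pulling the data from the Hugging Face Hub with the `datasets` library; the split name and the use of the default configuration are assumptions, so check the dataset page for the actual layout:

```python
from datasets import load_dataset  # pip install datasets

# Streaming avoids downloading the whole large-scale dataset up front.
# "train" split and default configuration are assumptions.
ds = load_dataset("yzwang/X2I-computer-vision", streaming=True, split="train")
for example in ds.take(3):
    print(example.keys())
```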
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Discover our curated Animal Image Dataset featuring bears, crows, elephants, and rats. Ideal for training Convolutional Neural Networks (CNNs) with annotated images.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In practical media distribution systems, visual content usually undergoes multiple stages of quality degradation along the delivery chain, but the pristine source content is rarely available at most quality monitoring points along the chain to serve as a reference for quality assessment. As a result, full-reference (FR) and reduced-reference (RR) image quality assessment (IQA) methods are generally infeasible. Although no-reference (NR) methods are readily applicable, their performance is often not reliable. On the other hand, intermediate references of degraded quality are often available, e.g., at the input of video transcoders, but how to make the best use of them in proper ways has not been deeply investigated.

This database is associated with a research project whose main goal is to make one of the first attempts to establish a new IQA paradigm named degraded-reference IQA (DR IQA). We initiate work on DR IQA by restricting ourselves to a two-stage distortion pipeline. Most IQA research projects rely on the availability of appropriate quality-annotated datasets. However, we find that only a few small-scale subject-rated datasets of multiply distorted images exist at the moment. These datasets contain a few hundred images each and include the LIVE Multiply Distorted (LIVE MD), Multiply Distorted IVL (MD IVL), and LIVE Wild Compressed (LIVE WCmp) databases. Such small-scale data is not only insufficient to develop robust machine learning based IQA models, it is also not enough to perform multiple distortion behavior analysis, i.e., to study how multiple distortions behave in conjunction with each other when impacting visual content simultaneously. Surprisingly, such detailed analysis is lacking even for the case of two simultaneous distortions.

We address the above-mentioned and other issues in our research project titled Degraded Reference Image Quality Assessment. As part of this project, we address the scarcity of data by constructing two large-scale datasets called DR IQA database Version 1 (V1) and DR IQA database Version 2 (V2). Each of these datasets contains 34 pristine reference (PR) images, 1,122 singly distorted degraded reference (DR) images, and 31,790 multiply distorted final distorted (FD) images, making them the largest datasets constructed in this particular area of IQA to date. These datasets formed the basis of the multiple distortion behavior analysis and DR IQA model development conducted in the above-mentioned project. We hope that the IQA research community will find them useful. Here we are releasing DR IQA database V1, while DR IQA database V2 has been released separately, also on IEEE DataPort. If you use this database in your research, please cite the following paper (details about the DR IQA project can also be found in this paper): S. Athar and Z. Wang, "Degraded Reference Image Quality Assessment," accepted for publication in IEEE Transactions on Image Processing, 2022.
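To make the DR IQA setup concrete: the final distorted (FD) image is scored against the singly distorted degraded reference (DR) rather than the unavailable pristine source (PR). The sketch below illustrates that paradigm with PSNR as a naive quality proxy; it is an illustration of the setup, not the method proposed in the paper:

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio of `test` against `reference`."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val**2 / mse)

# In DR IQA the degraded reference (DR) stands in for the pristine source (PR).
dr = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # stage-1 distorted image
fd = np.clip(dr + np.random.normal(0, 5, dr.shape), 0, 255).astype(np.uint8)  # stage-2
print(f"Naive DR-referenced PSNR: {psnr(dr, fd):.2f} dB")
```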
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CAPA Apple Quality Grading Multi-Spectral Image Database consists of multispectral (450 nm, 500 nm, 750 nm, and 800 nm) images of healthy and defective bi-color apples, manual segmentations of the defective regions, and expert gradings of the apples into 4 quality categories. The defect types include bruise, rot, flesh damage, frost damage, russet, etc. The database can be used for academic or research purposes aimed at computer vision based apple quality inspection.
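A minimal sketch of assembling one multispectral sample by stacking the four band images into a single array; the per-band file naming convention and image format are assumptions, not the database's documented layout:

```python
import numpy as np
from PIL import Image  # pip install pillow

# Hypothetical file naming: one grayscale image per spectral band.
BANDS_NM = (450, 500, 750, 800)

def load_multispectral(sample_id: str) -> np.ndarray:
    """Stack the four band images into an (H, W, 4) array."""
    bands = [np.array(Image.open(f"{sample_id}_{nm}nm.png").convert("L"))
             for nm in BANDS_NM]
    return np.stack(bands, axis=-1)

sample = load_multispectral("apple_001")  # hypothetical sample id
print(sample.shape)
```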
The CAPA Apple Quality Grading Multi-Spectral Image Database is the property of ULG (Gembloux Agro-Bio Tech), Belgium, and cannot be used without the consent of ULG (Gembloux Agro-Bio Tech), Belgium.
For consent, contact
Devrim Unay, İzmir University of Economics, Turkey: unaydevrim@gmail.com
OR
Marie-France Destain, Gembloux Agro-Bio Tech, Belgium: mfdestain@ulg.ac.be
In disseminating results using this database,
1. the author should indicate in the manuscript that the database was acquired by ULG (Gembloux Agro-Bio Tech), Belgium;
2. the author should cite the following article: Kleynen, O., Leemans, V., & Destain, M.-F. (2005). Development of a multi-spectral vision system for the detection of defects on apples. Journal of Food Engineering, 69(1), 41-49.
Relevant publications:
Kleynen et al., 2003: O. Kleynen, V. Leemans and M.F. Destain, Selection of the most efficient wavelength bands for 'Jonagold' apple sorting. Postharv. Biol. Technol., 30 (2003), pp. 221-232.
Leemans and Destain, 2004: V. Leemans and M.F. Destain, A real-time grading method of apples based on features extracted from defects. J. Food Eng., 61 (2004), pp. 83-89.
Leemans et al., 2002: V. Leemans, H. Magein and M.F. Destain, On-line fruit grading according to their external quality using machine vision. Biosyst. Eng., 83 (2002), pp. 397-404.
Unay and Gosselin, 2006: D. Unay and B. Gosselin, Automatic defect detection of 'Jonagold' apples on multi-spectral images: A comparative study. Postharv. Biol. Technol., 42 (2006), pp. 271-279.
Unay and Gosselin, 2007: D. Unay and B. Gosselin, Stem and calyx recognition on 'Jonagold' apples by pattern recognition. J. Food Eng., 78 (2007), pp. 597-605.
Unay et al., 2011: Unay, D., Gosselin, B., Kleynen, O., Leemans, V., Destain, M.-F., Debeir, O., "Automatic Grading of Bi-Colored Apples by Multispectral Machine Vision", Computers and Electronics in Agriculture, 75(1), 204-212, 2011.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a corpus of about 500 computer vision datasets, from which we sampled 114 dataset publications across different vision tasks and coded them for themes through both structured and qualitative content analysis. This work most closely pairs with research question 1 in the genealogies of data project (https://arxiv.org/abs/2007.07399): How do dataset developers in CV and NLP research describe and motivate the decisions that go into their creation?
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
RHGC Computer Vision is a dataset for object detection tasks - it contains Bus Train Car People Rail Road Crossing annotations for 1,119 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As one of the research directions at OLIVES Lab @ Georgia Tech, we focus on recognizing textures and materials in real-world images, which plays an important role in object recognition and scene understanding. Aiming at describing objects or scenes with more detailed information, we explore how to computationally characterize apparent or latent properties (e.g. surface smoothness) of materials, i.e., computational material characterization, which moves a step further beyond material recognition. For this purpose, we introduce a large, publicly available dataset named challenging microscopic material surface dataset (CoMMonS). We utilize a powerful microscope to capture high-resolution images with fine details of fabric surfaces. The CoMMonS dataset consists of 6,912 images covering 24 fabric samples in a controlled environment under varying imaging conditions such as lighting, zoom levels, geometric variations, and touching directions. This dataset can be used to assess the performance of existing deep learning-based algorithms and to develop our own method for material characterization in terms of fabric properties such as fiber length, surface smoothness, and toweling effect. Please refer to our GitHub page for code, papers, and more information.
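A minimal sketch of a standard loading pipeline for such a texture-classification dataset using torchvision's `ImageFolder`; the directory layout (one folder per fabric sample) is an assumption about how one might organize CoMMonS locally, not its distributed structure:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms  # pip install torchvision

# Assumed local layout: CoMMonS/train/<fabric_sample>/<image>.png
tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
ds = datasets.ImageFolder("CoMMonS/train", transform=tfm)
loader = DataLoader(ds, batch_size=32, shuffle=True)
images, labels = next(iter(loader))
print(images.shape, labels[:8])
```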
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore the UCSD Pedestrian Database, a comprehensive grayscale video dataset for pedestrian detection, crowd analysis, and scene segmentation.
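A minimal sketch of iterating over the frames of one of the grayscale clips with OpenCV; the file name is a placeholder:

```python
import cv2  # pip install opencv-python

# "clip.avi" is a placeholder for one of the dataset's video files.
cap = cv2.VideoCapture("clip.avi")
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Convert to single-channel grayscale in case the codec returns 3 channels.
    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
cap.release()
print(f"Read {len(frames)} frames")
```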
Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata: This data product is a unique offering in the realm of AI/ML training data. What sets it apart is the sheer volume and diversity of the dataset, which includes 4.5 million files spanning across 20 different categories. These categories range from Animals/Wildlife and The Arts to Technology and Transportation, providing a rich and varied dataset for AI/ML applications.
The data is sourced from Wirestock's platform, where creators upload and sell their photos, videos, and AI art online. This means that the data is not only vast but also constantly updated, ensuring a fresh and relevant dataset for your AI/ML needs. The data is collected in a GDPR-compliant manner, ensuring the privacy and rights of the creators are respected.
The primary use-cases for this data product are numerous. It is ideal for training machine learning models for image recognition, improving computer vision algorithms, and enhancing AI applications in various industries such as retail, healthcare, and transportation. The diversity of the dataset also means it can be used for more niche applications, such as training AI to recognize specific objects or scenes.
This data product fits into Wirestock's broader data offering as a key resource for AI/ML training. Wirestock is a platform for creators to sell their work, and this dataset is a collection of that work. It represents the breadth and depth of content available on Wirestock, making it a valuable resource for any company working with AI/ML.
The core benefits of this dataset are its volume, diversity, and quality. With 4.5 million files, it provides a vast resource for AI training. The diversity of the dataset, spanning 20 categories, ensures a wide range of images for training purposes. The quality of the images is also high, as they are sourced from creators selling their work on Wirestock.
In terms of how the data is collected, creators upload their work to Wirestock, where it is then sold on various marketplaces. This means the data is sourced directly from creators, ensuring a diverse and unique dataset. The data includes both the images themselves and associated metadata, providing additional context for each image.
The different image categories included in this dataset are Animals/Wildlife, The Arts, Backgrounds/Textures, Beauty/Fashion, Buildings/Landmarks, Business/Finance, Celebrities, Education, Emotions, Food Drinks, Holidays, Industrial, Interiors, Nature Parks/Outdoor, People, Religion, Science, Signs/Symbols, Sports/Recreation, Technology, Transportation, Vintage, Healthcare/Medical, Objects, and Miscellaneous. This wide range of categories ensures a diverse dataset that can cater to a variety of AI/ML applications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Computer Vision Industries is a dataset for object detection tasks - it contains Safe Walkway NwO8 annotations for 4,803 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore our dataset of 8,125 high-resolution micro-PCB images, featuring 13 distinct micro-PCBs in 125 unique orientations under ideal lighting.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pressing workload demands, along with social media interaction, lead to diminished alertness during work hours. Researchers have attempted to measure alertness levels from various cues such as EEG, EOG, and video-based eye movement analysis. Among these, video-based eyelid and iris motion tracking has gained much attention in recent years. However, most of these implementations are tested on video data of subjects without spectacles, and such videos do not pose a challenge for eye detection and tracking. In this work, we have designed an experiment to yield a video database of 58 human subjects who wear spectacles and are at different levels of alertness. Along with spectacles, we introduced variation in session, recording frame rate (fps), illumination, and time of the experiment. We carried out an analysis of the reliability of facial and ocular features such as yawning and eye blinks in the context of alertness level detection. We also observe the influence of spectacles on ocular feature detection performance and propose a simple preprocessing step to alleviate the specular reflection problem. Extensive experiments on real-world images demonstrate that our approach achieves desirable reflection suppression results within minimal execution time compared to the state of the art.
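The simple preprocessing idea can be sketched as thresholding the brightest pixels (the specular highlights on the spectacle lenses) and inpainting them; this illustrates the general reflection-suppression approach and is not necessarily the exact step proposed in the paper:

```python
import cv2  # pip install opencv-python
import numpy as np

def suppress_specular(gray: np.ndarray, thresh: int = 230) -> np.ndarray:
    """Mask near-saturated pixels and fill them in by inpainting."""
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=1)
    return cv2.inpaint(gray, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
clean = suppress_specular(frame)
```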