- Secure Implementation: An NDA is signed to guarantee secure implementation, and Annotated Imagery Data is destroyed upon delivery.
- Quality: Multiple rounds of quality inspections ensure high-quality data output, certified to ISO 9001.
https://www.wiseguyreports.com/pages/privacy-policy
| Attribute | Details |
|---|---|
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 3.75 (USD Billion) |
| MARKET SIZE 2025 | 4.25 (USD Billion) |
| MARKET SIZE 2035 | 15.0 (USD Billion) |
| SEGMENTS COVERED | Application, Deployment Type, End Use Industry, Type of Annotation, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Growing AI adoption, increasing data volume, demand for automation, enhanced accuracy requirements, need for regulatory compliance |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | Cognizant, Health Catalyst, Microsoft Azure, Slydian, Scale AI, Lionbridge AI, Samarthanam Trust, DataRobot, Clarifai, SuperAnnotate, Amazon Web Services, Appen, Google Cloud, iMerit, TAGSYS, Labelbox |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Increased AI adoption, Demand for automated solutions, Advancements in machine learning, Expanding IoT data sources, Need for regulatory compliance |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 13.4% (2025 - 2035) |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of drone images collected for agricultural field monitoring to detect weeds and crops through computer vision and machine learning approaches. The images were captured by high-resolution UAVs and annotated using the LabelImg and Roboflow tools. Each image has a corresponding YOLO annotation file that contains bounding box information and class IDs for the detected objects. The dataset includes:
Original images in .jpg format with a resolution of 585 × 438 pixels.
Annotation files (.txt) corresponding to each image, following the YOLO format: class_id x_center y_center width height.
A classes.txt file listing the object categories used in labeling (e.g., Weed, Crop).
The dataset is intended for use in machine learning model development, particularly for precision agriculture, weed detection, and plant health monitoring. It can be directly used for training YOLOv7 and other object detection models.
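As a quick illustration of how these files fit together, here is a minimal Python sketch that reads a classes.txt file and one YOLO .txt label file of the kind described above; the file names are hypothetical.

```python
from pathlib import Path

# Minimal sketch: read classes.txt and the matching YOLO label file for
# one image. The file names used below are illustrative assumptions.
def load_yolo_labels(label_path, classes_path):
    class_names = Path(classes_path).read_text().splitlines()
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        if not line.strip():
            continue
        class_id, x_center, y_center, width, height = line.split()
        boxes.append({
            "class": class_names[int(class_id)],
            "x_center": float(x_center),  # normalized 0-1
            "y_center": float(y_center),
            "width": float(width),
            "height": float(height),
        })
    return boxes

# Example usage (hypothetical file names):
# boxes = load_yolo_labels("images/field_001.txt", "classes.txt")
```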
https://www.wiseguyreports.com/pages/privacy-policy
| Attribute | Details |
|---|---|
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 2.72 (USD Billion) |
| MARKET SIZE 2025 | 3.06 (USD Billion) |
| MARKET SIZE 2035 | 10.0 (USD Billion) |
| SEGMENTS COVERED | Application, Deployment Type, End Use, Annotation Type, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Rising demand for AI models, Increasing data volume, Need for cost-effective solutions, Advancements in machine learning, Growing partnerships and collaborations |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | IBM, Edgecase, AWS, Mighty AI, CrowdFlower, NVIDIA, Clarifai, Gigantum, Microsoft, Labelbox, Zegami, Scale AI, Google, SuperAnnotate, DataRobot |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Rising demand for AI training data, Increasing adoption of machine learning, Expansion in autonomous vehicle technology, Growth in healthcare automation, Surge in data-driven decision making |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 12.6% (2025 - 2035) |
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The SkySeaLand Dataset is a high-resolution satellite imagery collection developed for object detection, classification, and aerial analysis tasks. It focuses on transportation-related objects observed from diverse geospatial contexts, offering precise YOLO-formatted annotations for four categories: airplane, boat, car, and ship.
This dataset bridges terrestrial, maritime, and aerial domains, providing a unified resource for developing and benchmarking computer vision models in complex real-world environments.
Annotations are provided in YOLO format (one .txt file per image). The SkySeaLand Dataset is divided into the following subsets for training, validation, and testing:
This split ensures a balanced distribution for training, validating, and testing models, facilitating robust model evaluation and performance analysis.
| Class Name | Object Count |
|---|---|
| Airplane | 4,847 |
| Boat | 3,697 |
| Car | 6,932 |
| Ship | 3,627 |
The dataset maintains a moderately balanced distribution among categories, ensuring stable model performance during multi-class training and evaluation.
Each label file contains normalized bounding box annotations in YOLO format.
The format for each line is: class_id x_center y_center width height
Where:
- class_id: the class of the object (refer to the table below).
- x_center, y_center: the center coordinates of the bounding box, normalized between 0 and 1 relative to the image width and height.
- width, height: the width and height of the bounding box, also normalized between 0 and 1.
| Class ID | Category |
|---|---|
| 0 | Airplane |
| 1 | Boat |
| 2 | Car |
| 3 | Ship |
All coordinates are normalized between 0 and 1 relative to the image width and height.
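Because every coordinate is normalized, a small conversion step is needed before drawing or cropping boxes. The sketch below shows that conversion, assuming the image width and height are known; the example values are illustrative.

```python
# Minimal sketch: convert one normalized YOLO box to pixel coordinates
# (x_min, y_min, x_max, y_max). Image size values are illustrative.
def yolo_to_pixels(x_center, y_center, width, height, img_w, img_h):
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return x_min, y_min, x_max, y_max

# Example: a box near the middle of a 1024x1024 satellite tile
print(yolo_to_pixels(0.5, 0.5, 0.2, 0.1, 1024, 1024))  # (409.6, 460.8, 614.4, 563.2)
```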
Data Source:
- Satellite imagery was obtained from Google Earth Pro under fair-use and research guidelines.
- The dataset was prepared solely for academic and educational computer vision research.
Annotation Tools:
- Manual annotations were performed and verified using:
- CVAT (Computer Vision Annotation Tool)
- Roboflow
These tools were used to ensure consistent annotation quality and accurate bounding box placement across all object classes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this project, we aim to annotate car images captured on highways. The annotated data will be used to train machine learning models for various computer vision tasks, such as object detection and classification.
For this project, we will be using Roboflow, a powerful platform for data annotation and preprocessing. Roboflow simplifies the annotation process and provides tools for data augmentation and transformation.
Roboflow offers data augmentation capabilities, such as rotation, flipping, and resizing. These augmentations can help improve the model's robustness.
Once the data is annotated and augmented, Roboflow allows us to export the dataset in various formats suitable for training machine learning models, such as YOLO, COCO, or TensorFlow Record.
By completing this project, we will have a well-annotated dataset ready for training machine learning models. This dataset can be used for a wide range of applications in computer vision, including car detection and tracking on highways.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset comprises annotated video frames from a camera positioned in a public space. Each individual in the camera's view has been tracked using the rectangle tool in the Computer Vision Annotation Tool (CVAT).
The images directory houses the original video frames, serving as the primary source of raw data. The annotations.xml file provides the detailed annotation data for the images. The boxes directory contains frames that visually represent the bounding box annotations, showing the locations of the tracked individuals within each frame; these images can be used to understand how the tracking has been implemented and to visualize the marked areas for each individual. The annotations are represented as rectangular bounding boxes placed around each individual. Each bounding box annotation contains the position (xtl, ytl, xbr, ybr coordinates) of the respective box within the frame.
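For readers who want to work with annotations.xml programmatically, the following sketch parses CVAT track annotations with the standard library. It assumes the usual CVAT video-export layout of <track> elements containing <box> elements with xtl/ytl/xbr/ybr attributes; adjust it if the export differs.

```python
import xml.etree.ElementTree as ET

# Minimal sketch: read CVAT track annotations from annotations.xml.
# Assumes the CVAT "for video" XML layout; adapt names if needed.
def load_cvat_tracks(xml_path):
    tracks = {}
    root = ET.parse(xml_path).getroot()
    for track in root.iter("track"):
        track_id = int(track.get("id"))
        boxes = []
        for box in track.iter("box"):
            boxes.append({
                "frame": int(box.get("frame")),
                "xtl": float(box.get("xtl")),
                "ytl": float(box.get("ytl")),
                "xbr": float(box.get("xbr")),
                "ybr": float(box.get("ybr")),
            })
        tracks[track_id] = boxes
    return tracks

# tracks = load_cvat_tracks("annotations.xml")
```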
keywords: multiple people tracking, human detection dataset, object detection dataset, people tracking dataset, tracking human object interactions, human Identification tracking dataset, people detection annotations, detecting human in a crowd, human trafficking dataset, deep learning object tracking, multi-object tracking dataset, labeled web tracking dataset, large-scale object tracking dataset
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project involves annotating players on a rugby league field in a set of video frames. The goal is to label each player with a bounding box in each frame.
We have extracted around 1500 frames from rugby league videos, and we need to annotate the players in each frame. The labels should be accurate and consistent across all frames.
I've uploaded the dataset so you can use the built-in annotation tool to label each player with a bounding box. To get started, follow these steps:
1. Open the annotation tool and select the first frame in the dataset.
2. Use the rectangle tool to draw a bounding box around each player in the frame.
3. Add the label 'Player' to each bounding box.
4. Move to the next frame in the dataset and repeat steps 2-3.
5. Continue annotating all frames in the dataset until all players are labeled.
We recommend exporting the labels in the YOLO format.
If you have any questions or concerns about the annotation process, please don't hesitate to reach out to us.
Thank you for your help with this project!
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset was created for bee object detection based on images. Videos were taken at the entrance of 25 beehives in three apiaries in San Jose, Cupertino, and Gilroy in CA, USA. The videos were taken above the landing pad of different beehives. The camera was placed at a distinct angle to provide a clear view of the hive entrance.
The images were saved at one frame per second from the videos. The annotation platform Label Studio was selected to annotate bees in each image due to its user-friendly interface and high annotation quality. The following criteria were followed in the labeling process. First, at least 50% of the bee's body must be visible. Second, the image cannot be too blurry. After tagging each bee with a rectangle box in the annotation tool, output label files in YOLO labeling format were generated for each image. The output label files contain one set of bounding-box (BBox) coordinates for each bee in the image. If there are multiple objects in the image, there is one line per object in the label file. Each line records the object ID, X-axis center, Y-axis center, BBox width, and BBox height, normalized to the image size from 0 to 1.
Please cite the paper if you used the data in your research: Liang, A. (2024). Developing a multimodal system for bee object detection and health assessment. IEEE Access, 12, 158703 - 15871. https://doi.org/10.1109/ACCESS.2024.3464559.
Large Public Aquaria are complex ecosystems that require constant monitoring to detect and correct anomalies that may affect the habitat and their species. Many of those anomalies can be directly or indirectly spotted by monitoring the behavior of fish. This can be a quite laborious task to be done by biologists alone. Automated fish tracking methods, especially of the non-intrusive type, can help biologists in the timely detection of such events. These systems require annotated data of fish to be trained. We used footage collected from the main aquarium of Oceanário de Lisboa to create a novel dataset with fish annotations from the shark and ray species. The dataset has the following characteristics:
- 66 shark training tracks with a total of 15812 bounding boxes
- 88 shark testing tracks with a total of 15978 bounding boxes
- 133 ray training tracks with a total of 28168 bounding boxes
- 192 ray testing tracks with a total of 31529 bounding boxes
The training set corresponds to a calm enviro...
The dataset was collected using a stationary camera positioned outside the main tank of Oceanário de Lisboa aiming at the fish. Additionally, this data was processed using the CVAT annotation tool to create the sharks and rays annotations.
# Sharks and rays swimming in a large public aquarium
Each set has 2 folders: gt and img1. The gt folder contains 3 txt files: gt, gt_out and labels. The gt and gt_out files contain the bounding box annotations sorted in two distinct ways. The former has the annotations sorted by frame number, while the latter is sorted by the track ID. Each line of the ground truth files represents one bounding box of a fish trajectory. The bounding boxes are represented with the following format: frame id, track id, x, y, w, h, not ignored, class id, visibility. The folder img1 contains all the annotated frames.
frame id points to the frame where the bounding box was obtained;
track id identifies the track of a fish with which the bounding box is associated;
x and y are the pixel coordinates of the top left corner of the bounding box;
w and h are the width and height of the bounding box respectively. These variables are measured in terms of pixels o...
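A minimal sketch for reading a ground-truth file in the comma-separated layout described above might look like this; the file path is an assumption.

```python
import csv

# Minimal sketch: parse a ground-truth file whose lines follow the layout
# frame id, track id, x, y, w, h, not ignored, class id, visibility.
# The file path below is an assumption.
def load_tracks(gt_path):
    rows = []
    with open(gt_path, newline="") as f:
        for frame, track, x, y, w, h, keep, cls, vis in csv.reader(f):
            rows.append({
                "frame": int(frame), "track": int(track),
                "x": float(x), "y": float(y), "w": float(w), "h": float(h),
                "not_ignored": int(keep), "class": int(cls),
                "visibility": float(vis),
            })
    return rows

# rows = load_tracks("train/gt/gt.txt")
```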
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains high-quality MRI images of brain tumors with detailed annotations. The dataset is meticulously curated, cleaned, and annotated to aid in the development and evaluation of machine learning models for brain tumor detection and classification.
The dataset includes a total of 5,249 MRI images divided into training and validation sets. Each image is annotated with bounding boxes in YOLO format, and labels corresponding to one of the four classes of brain tumors.
The images in the dataset are from different angles of MRI scans including sagittal, axial, and coronal views. This variety ensures comprehensive coverage of brain anatomy, enhancing the robustness of models trained on this dataset.
The bounding boxes were manually annotated using the LabelImg tool by a dedicated team. This rigorous process ensures high accuracy and reliability of the annotations.
This dataset was inspired by two existing datasets:
1. https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset
2. https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri
A thorough cleaning process was performed to remove noisy, mislabeled, and poor-quality images, resulting in a high-quality and well-labeled dataset.
This dataset is suitable for training and validating deep learning models for the detection and classification of brain tumors. The variety in MRI scan angles and the precision of annotations provide an excellent foundation for developing robust computer vision applications in medical imaging.
If you use this dataset in your research or project, please consider citing it appropriately to acknowledge the effort put into its creation and annotation.
https://creativecommons.org/publicdomain/zero/1.0/
This project demonstrates the process of creating a labeled dataset for computer vision tasks using web scraping and the CVAT annotation tool. Web scraping was employed to gather images from the web, and CVAT was utilized to annotate these images with bounding boxes around objects of interest. This dataset can then be used to train object detection models.
The requests and Beautiful Soup libraries were likely used for this task. This dataset can be used to train object detection models for bird species identification. It can also be used to evaluate the performance of existing object detection models on a specific dataset.
The code used for this project is available in the attached notebook. It demonstrates how to perform the following tasks:
This project provides a comprehensive guide to data annotation for computer vision tasks. By combining web scraping and CVAT, we were able to create a high-quality labeled dataset for training object detection models.
Sources: github.com/cvat-ai/cvat, opencv.org/blog/data-annotation/
{"version":"1.1"}
{"type":"images"}
{"name":"Spot-billed_Pelican_-_Pelecanus_philippensis_-_Media_Search_-_Macaulay_Library_and_eBirdMacaulay_Library_logoMacaulay_Library_lo/10001","extension":".jpg","width":480,"height":360,"meta":{"related_images":[]}}
{"name":"Spot-billed_Pelican_-_Pelecanus_philippensis_-_Media_Search_-_Macaulay_Library_and_eBirdMacaulay_Library_logoMacaulay_Library_lo/10002","extension":".jpg","width":480,"height":320,"meta":{"related_images":[]}}
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The QT-MSTR dataset is a text detection and recognition dataset focused on multi-lingual scenes in the Qinghai-Tibet Plateau region of China. It aims to provide high-quality benchmark data for research areas such as Tibetan OCR, multi-lingual scene text recognition, and low-resource language processing through real-world street-view images. Data were collected between 2020 and 2023, covering key urban areas in the Qinghai-Tibet region, including Xining and Haidong in Qinghai Province, Gannan Tibetan Autonomous Prefecture and Tianzhu Tibetan Autonomous County in Gansu Province, as well as Lhasa in the Tibet Autonomous Region. The collection focused on public spaces where multi-lingual text commonly appears, such as commercial streets, tourist service points, transportation hubs, and areas around public facilities, to accurately reflect the "Tibetan-Chinese-English" multilingual environment of the region. Data were captured using mainstream smartphone rear cameras and portable digital cameras under natural lighting conditions, with all images saved at their original resolution (primarily 4032×3024 pixels). In terms of data processing, we established a standardized annotation pipeline. First, all images underwent strict privacy protection processing, with faces and license plates that could involve personal identity information being blurred. Subsequently, annotators proficient in Tibetan, Chinese, and English performed initial annotations using the LabelMe tool. The annotation content includes not only precise bounding boxes (quadrilateral annotations) for text lines but also language information (Tibetan, Chinese, English, numeric, or mixed text) and the corresponding transcribed text. To strictly control data quality, we implemented a dual process of automated script validation and expert review, focusing on checking the structural integrity of JSON files, the validity of bounding boxes, and the accuracy of language tags, with manual emphasis on reviewing ambiguous samples identified by the automated process. The final dataset consists of 1,000 original images and exactly 1,000 paired annotation files in JSON format. Each data file is named according to the "QT[category]_[sequence number]" rule (e.g., QTdor_001.jpg and QTdor_001.json), ensuring a one-to-one correspondence between images and annotations. The annotation files adopt a standard structure that clearly defines the geometric location, language attribute, and text content of each text instance in the image. The dataset is complete, with no missing values or invalid samples. Potential errors introduced during the annotation process mainly stem from text blurring under extreme lighting or partial occlusion in complex backgrounds; the bounding box annotations for such samples have all been reviewed by experts to ensure overall annotation accuracy. The dataset uses common .jpg (image) and .json (annotation) formats and can be read and processed using any deep learning framework (such as PyTorch or TensorFlow) and common annotation tools that support these formats, with no need for specific niche software.
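The JSON annotations can be read with any standard JSON library. The sketch below assumes the conventional LabelMe layout (a "shapes" list whose entries hold a label and the four corner points of each quadrilateral); the exact attribute names used for language tags and transcriptions in QT-MSTR are assumptions and should be adapted to the released schema.

```python
import json

# Minimal sketch: read one QT-MSTR style annotation file, assuming the
# standard LabelMe JSON layout. The attribute names for language and
# transcription are assumptions; adapt them to the released schema.
def load_text_regions(json_path):
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    regions = []
    for shape in data.get("shapes", []):
        regions.append({
            "label": shape.get("label"),    # e.g. language tag or transcription
            "points": shape.get("points"),  # four [x, y] corners of the quadrilateral
        })
    return regions

# regions = load_text_regions("QTdor_001.json")
```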
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Semantic PASCAL-Part dataset
The Semantic PASCAL-Part dataset is the RDF version of the famous PASCAL-Part dataset used for object detection in Computer Vision. Each image is annotated with bounding boxes containing a single object. Couples of bounding boxes are annotated with the part-whole relationship. For example, the bounding box of a car has the part-whole annotation with the bounding boxes of its wheels.
This original release joins Computer Vision with Semantic Web as the objects in the dataset are aligned with concepts from:
the provided supporting ontology;
the WordNet database through its synsets;
the Yago ontology.
The provided Python 3 code (see the GitHub repo) can browse the dataset and convert it into RDF knowledge graph format. This new format fosters research in both the Semantic Web and Machine Learning fields.
Structure of the semantic PASCAL-Part Dataset
This is the folder structure of the dataset:
semanticPascalPart: it contains the refined images and annotations (e.g., small specific parts are merged into bigger parts) of the PASCAL-Part dataset in Pascal-voc style.
Annotations_set: the test set annotations in .xml format. For further information, see the PASCAL VOC format here.
Annotations_trainval: the train and validation set annotations in .xml format. For further information, see the PASCAL VOC format here.
JPEGImages_test: the test set images in .jpg format.
JPEGImages_trainval: the train and validation set images in .jpg format.
test.txt: the 2416 image filenames in the test set.
trainval.txt: the 7687 image filenames in the train and validation set.
The PASCAL-Part Ontology
The PASCAL-Part OWL ontology formalizes, through logical axioms, the part-of relationship between whole objects (22 classes) and their parts (39 classes). The ontology contains 85 logical axioms in Description Logic of (for example) the following form:
Every potted_plant has exactly 1 plant AND has exactly 1 pot
We provide two versions of the ontology: with and without cardinality constraints in order to allow users to experiment with or without them. The WordNet alignment is encoded in the ontology as annotations. We further provide the WordNet_Yago_alignment.csv file with both WordNet and Yago alignments.
The ontology can be browsed with many Semantic Web tools such as:
Protégé: a graphical tool for ontology modelling;
OWLAPI: Java API for manipulating OWL ontologies;
rdflib: Python API for working with the RDF format.
RDF stores: databases for storing and semantically retrieving RDF triples. See here for some examples.
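As a small illustration of the rdflib route, the sketch below loads the ontology and lists its OWL classes; the file name pascal_part.owl and the RDF/XML serialization are assumptions, so substitute the ontology file shipped with the dataset.

```python
from rdflib import Graph, RDF, OWL

# Minimal sketch: load the PASCAL-Part ontology with rdflib and list the
# declared OWL classes. File name and serialization are assumptions.
g = Graph()
g.parse("pascal_part.owl", format="xml")

for cls in g.subjects(RDF.type, OWL.Class):
    print(cls)
```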
Citing semantic PASCAL-Part
If you use semantic PASCAL-Part in your research, please use the following BibTeX entry:
@article{DBLP:journals/ia/DonadelloS16,
  author  = {Ivan Donadello and Luciano Serafini},
  title   = {Integration of numeric and symbolic information for semantic image interpretation},
  journal = {Intelligenza Artificiale},
  volume  = {10},
  number  = {1},
  pages   = {33--47},
  year    = {2016}
}
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Tasmanian Orange Roughy Stereo Image Machine Learning Dataset is a collection of annotated stereo image pairs collected by a net-attached Acoustic and Optical System (AOS) during orange roughy (Hoplostethus atlanticus) biomass surveys off the northeast coast of Tasmania, Australia in July 2019. The dataset consists of expertly annotated imagery from six AOS deployments (OP12, OP16, OP20, OP23, OP24, and OP32), representing a variety of conditions including different fish densities, benthic substrates, and altitudes above the seafloor. Each image was manually annotated with bounding boxes identifying orange roughy and other marine species. For all annotated images, paired stereo images from the opposite camera have been included where available to enable stereo vision analysis. This dataset was specifically developed to investigate the effectiveness of machine learning-based object detection techniques for automating fish detection under variable real-world conditions, providing valuable resources for advancing automated image processing in fisheries science.
Lineage: Data were obtained onboard the 32 m Fishing Vessel Saxon Onward during an orange roughy acoustic biomass survey off the northeast coast of Tasmania in July 2019. Stereo image pairs were collected using a net-attached Acoustic and Optical System (AOS), which is a self-contained autonomous system with multi-frequency and optical capabilities mounted on the headline of a standard commercial orange roughy demersal trawl. Images were acquired by a pair of Prosilica GX3300 Gigabit Ethernet cameras with Zeiss F2.8 lenses (25 mm focal length), separated by 90 cm and angled inward at 7° to provide 100% overlap at a 5 m range. Illumination was provided by two synchronised quantum trio strobes. Stereo pairs were recorded at 1 Hz in JPG format with a resolution of 3296 x 2472 pixels and a 24-bit depth.
Human experts manually annotated images from the six deployments using both the CVAT annotation tool (producing COCO format annotations) and LabelImg tool (producing XML format annotations). Only port camera views were annotated for all deployments. Annotations included bounding boxes for "orange roughy" and "orange roughy edge" (for partially visible fish), as well as other marine species (brittle star, coral, eel, miscellaneous fish, etc.). Prior to annotation, under-exposed images were enhanced based on altitude above the seafloor using a Dark Channel Prior (DCP) approach, and images taken above 10 m altitude were discarded due to poor visibility.
For all annotated images, the paired stereo images (from the opposite camera) have been included where available to enable stereo vision applications. The dataset represents varying conditions of fish density (1-59 fish per image), substrate types (light vs. dark), and altitudes (2.0-10.0 m above seafloor), making it particularly valuable for training and evaluating object detection models under variable real-world conditions.
The final standardised COCO dataset contains 1051 annotated port-side images, 849 paired images (without annotations), and 14414 total annotations across 17 categories. The dataset's category distribution includes orange roughy (9887), orange roughy edge (2928), mollusc (453), cnidaria (359), misc fish (337), sea anemone (136), sea star (105), sea feather (100), sea urchin (45), coral (22), eel (15), oreo (10), brittle star (8), whiptail (4), chimera (2), siphonophore (2), and shark (1).
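To get a quick overview of such a COCO-format file, a short sketch like the following can tally annotations per category; the file name annotations.json is an assumption.

```python
import json
from collections import Counter

# Minimal sketch: tally annotations per category in a COCO-format file.
# The file name "annotations.json" is an assumption.
with open("annotations.json") as f:
    coco = json.load(f)

names = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(names[a["category_id"]] for a in coco["annotations"])
for name, n in counts.most_common():
    print(f"{name}: {n}")
```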
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
I3D Tools Dataset
This is the official dataset for the "I3D Tools Dataset" paper. The dataset contains a diverse collection of 16 hand tool categories, curated for applications in object detection, segmentation, and synthetic data generation. Codebase:
📊 Dataset Statistics
Number of Tool Classes: 16
Total Images: ~35,000
Image Resolution: 1024x1024
Annotations per Image:
YOLOv8 bounding box format
Pixel-level segmentation mask
Natural language caption… See the full description on the dataset page: https://huggingface.co/datasets/i3dlabiisc/I3D-Tools-Dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study presents a comprehensive dataset comprising 2,517 images of apple fruitlets, each uniformly rescaled to 500×500 pixels to maintain dimensional consistency. Manual annotation was performed using LabelImg software, with bounding box coordinates stored in XML format. To enhance usability across different platforms, we additionally provide annotations in TXT format. Figure 3 illustrates a representative annotation example. During the annotation process, special attention was given to: (1) precise localization of small fruitlets, and (2) accurate annotation of partially occluded targets, where only visible portions were labeled to minimize false positives in subsequent analyses.
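The XML annotations produced by LabelImg follow the Pascal VOC layout, so they can be read with the standard library as in the sketch below; the file name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Minimal sketch: read LabelImg-style Pascal VOC XML and return the boxes
# as (name, xmin, ymin, xmax, ymax). The file name is an assumption.
def load_voc_boxes(xml_path):
    boxes = []
    root = ET.parse(xml_path).getroot()
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            name,
            int(float(bb.findtext("xmin"))), int(float(bb.findtext("ymin"))),
            int(float(bb.findtext("xmax"))), int(float(bb.findtext("ymax"))),
        ))
    return boxes

# boxes = load_voc_boxes("fruitlet_0001.xml")
```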
https://spdx.org/licenses/CC0-1.0.html
Our dataset, Nest Monitoring of the Kagu, consists of around ten days (253 hours) of continuous monitoring sampled at 25 frames per second. Our proposed dataset aims to facilitate computer vision research that relates to event detection and localization. We fully annotated the entire dataset (23M frames) with spatial localization labels in the form of a tight bounding box. Additionally, we provide temporal event segmentation labels of five unique bird activities: Feeding, Pushing leaves, Throwing leaves, Walk-In, and Walk-Out. The feeding event represents the period of time when the birds feed the chick. The nest-building events (pushing/throwing leaves) occur when the birds work on the nest during incubation. Pushing leaves is a nest-building behavior during which the birds form a crater by pushing leaves with their legs toward the edges of the nest while sitting on the nest. Throwing leaves is another nest-building behavior during which the birds throw leaves with the bill towards the nest while being, most of the time, outside the nest. Walk-in and walk-out events represent the transitioning events from an empty nest to incubation or brooding, and vice versa. We also provide five additional labels that are based on time-of-day and lighting conditions: Day, Night, Sunrise, Sunset, and Shadows. In our manuscript, we provide a baseline approach that detects events and spatially localizes the bird in each frame using an attention mechanism. Our approach does not require any labels and uses a predictive deep learning architecture that is inspired by cognitive psychology studies, specifically, Event Segmentation Theory (EST). We split the dataset such that the first two days are used for validation, and performance evaluation is done on the last eight days.
Methods
The video monitoring system consisted of a commercial infrared illuminator surveillance camera (Sony 1/3′′ CCD image sensor), and an Electret mini microphone with built-in SMD amplifier (Henri Electronic, Germany), connected to a recording device via a 6.4-mm multicore cable. The transmission cable consisted of a 3-mm coaxial cable for the video signal, a 2.2-mm coaxial cable for the audio signal and two 2-mm (0.75 mm2) cables to power the camera and microphone. We powered the systems with 25-kg deep cycle, lead-acid batteries with a storage capacity of 100 Ah. We used both Archos™ 504 DVRs (with 80 GB hard drives) and Archos 700 DVRs (with 100 GB hard drives). All cameras were equipped with 12 infrared light emitting diodes (LEDs) for night vision. We have manually annotated the dataset with temporal events, time-of-day/lighting conditions, and spatial bounding boxes without relying on any object detection/tracking algorithms. The temporal annotations were initially created by experts who study the behavior of the Kagu bird and later refined to improve the precision of the temporal boundaries. Additional labels, such as lighting conditions, were added during the refinement process. The spatial bounding box annotations of 23M frames were created manually using professional video editing software (Davinci Resolve). We attempted to use available data annotation software tools, but they did not work for the scale of our video (10 days of continuous monitoring). We resorted to video editing software, which helped us annotate and export bounding box masks as videos. The masks were then post-processed to convert annotations from binary mask frames to bounding box coordinates for storage.
It is worth noting that the video editing software allowed us to linearly interpolate between keyframes of the bounding box annotations, which helped save time and effort when the bird's motion was linear. Both temporal and spatial annotations were verified by two volunteer graduate students. The process of creating spatial and temporal annotations took approximately two months.
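A minimal sketch of that mask-to-box post-processing step, assuming each exported mask frame is available as a binary NumPy array (an assumption about the intermediate representation), could look like this:

```python
import numpy as np

# Minimal sketch: derive a tight bounding box (x_min, y_min, x_max, y_max)
# from one binary mask frame. Returns None if the mask is empty.
def mask_to_bbox(mask: np.ndarray):
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Example with a toy 100x100 mask containing a filled rectangle
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:40, 30:70] = 1
print(mask_to_bbox(mask))  # (30, 20, 69, 39)
```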
Our aim is to identify hand gestures from a given image and display the result in text format or audio, which will be useful for hearing-impaired people. To train the CNN model, we have prepared our own dataset. The dataset details are as follows:
Image Resolution: 12 megapixels; Image Size: 1920 × 1080
The dataset has 9 hand gestures. The following are the hand gestures:
| Class ID | Class Name |
|---|---|
| 1 | Have |
| 2 | Nice |
| 3 | Day |
| 4 | Early |
| 5 | Morning |
| 6 | Wakeup |
| 7 | Love |
| 8 | Funny |
| 9 | You |
Train dataset has 232 images and Validation Dataset has 55 images.
All the images are annotated using the VGG Annotator tool.
Annotation Details:
- Each hand gesture is annotated with polygon coordinates.
- Only the hand region (palm and fingers) is annotated.
- Annotation information is stored in a JSON file (via_region_annotation.json).
Each hand gesture has 20 images in the train dataset and 5 images in the validation dataset.
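The via_region_annotation.json file mentioned above can be read with a short script such as the sketch below, which assumes the VIA 2.x export layout where each image entry carries a "regions" list with polygon shape_attributes; adapt it if the export uses an older layout.

```python
import json

# Minimal sketch: read polygon regions from a VGG Image Annotator (VIA)
# export, assuming the VIA 2.x JSON layout.
def load_via_polygons(json_path):
    with open(json_path) as f:
        data = json.load(f)
    polygons = {}
    for entry in data.values():
        points = []
        for region in entry.get("regions", []):
            sa = region["shape_attributes"]
            points.append(list(zip(sa["all_points_x"], sa["all_points_y"])))
        polygons[entry["filename"]] = points
    return polygons

# polygons = load_via_polygons("via_region_annotation.json")
```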
In our project, we have used Mask R-CNN to detect the hand gestures. It gives three results: the class name, the bounding box regression output, and the segmentation mask.
Accuracy Score: Intersection over Union (IoU) of 0.875 and mAP (mean Average Precision) of 0.95.
If you have any queries, please reach out to us via email (HSL.Queries@gmail.com).
VOC2012 Image and Annotation Visualization Notebook
GitHub: https://github.com/ikaankeskin/MLXdatasets/tree/main/ObjectDetection/PASCAL
HuggingFace: https://huggingface.co/datasets/ikaankeskin/PASCAL_MLX
This repository contains a tool that facilitates the download, extraction, and visualization of the VOC2012 dataset, complete with bounding box annotations extracted from associated XML files.
Features
Automated Dataset Download: Fetches the VOC2012 dataset from… See the full description on the dataset page: https://huggingface.co/datasets/ikaankeskin/PASCAL_MLX.
- Secure Implementation: An NDA is signed to guarantee secure implementation, and Annotated Imagery Data is destroyed upon delivery.
- Quality: Multiple rounds of quality inspections ensure high-quality data output, certified to ISO 9001.