Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Manual annotation for human action recognition with content semantics using 3D Point Cloud (3D-PC) data in industrial environments consumes a lot of time and resources. This work aims to recognize, analyze, and model human actions to develop a framework for automatically extracting content semantics. Main contributions of this work: 1. design of a multi-layer structure of various DNN classifiers to detect and extract humans and dynamic objects from 3D-PC precisely, 2. empirical experiments with over 10 subjects for collecting datasets of human actions and activities in an industrial setting, 3. development of an intuitive GUI to verify human actions and their interaction activities with the environment, 4. design and implementation of a methodology for automatic sequence matching of human actions in 3D-PC. All these procedures are merged in the proposed framework and evaluated in one industrial use case with flexible patch sizes. Comparing the new approach with standard methods has shown that the annotation process can be accelerated by a factor of 5.2 through automation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Robotic manipulation remains a core challenge in robotics, particularly for contact-rich tasks such as industrial assembly and disassembly. Existing datasets have significantly advanced learning in manipulation but are primarily focused on simpler tasks like object rearrangement, falling short of capturing the complexity and physical dynamics involved in assembly and disassembly. To bridge this gap, we present REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt), a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. Our dataset features multi-modal sensor data including event cameras, force-torque sensors, microphones, and multi-view RGB cameras. This diverse dataset supports research in areas such as learning contact-rich manipulation, task condition identification, action segmentation, and more. We believe REASSEMBLE will be a valuable resource for advancing robotic manipulation in complex, real-world scenarios.
Each demonstration starts by randomizing the board and object poses, after which an operator teleoperates the robot to assemble and disassemble the board while narrating their actions and marking task segment boundaries with key presses. The narrated descriptions are transcribed using Whisper [1], and the board and camera poses are measured at the beginning using a motion capture system, though continuous tracking is avoided due to interference with the event camera. Sensory data is recorded with rosbag and later post-processed into HDF5 files without downsampling or synchronization, preserving raw data and timestamps for future flexibility. To reduce memory usage, video and audio are stored as encoded MP4 and MP3 files, respectively. Transcription errors are corrected automatically or manually, and a custom visualization tool is used to validate the synchronization and correctness of all data and annotations. Missing or incorrect entries are identified and corrected, ensuring the dataset’s completeness. Low-level Skill annotations were added manually after data collection, and all labels were carefully reviewed to ensure accuracy.
The dataset consists of several HDF5 (.h5) and JSON (.json) files, organized into two directories. The poses directory contains the JSON files, which store the poses of the cameras and the board in the world coordinate frame. The data directory contains the HDF5 files, which store the sensory readings and annotations collected as part of the REASSEMBLE dataset. Each JSON file can be matched with its corresponding HDF5 file based on their filenames, which include the timestamp when the data was recorded. For example, 2025-01-09-13-59-54_poses.json corresponds to 2025-01-09-13-59-54.h5.
The structure of the JSON files is as follows:
{"Hama1": [
[x ,y, z],
[qx, qy, qz, qw]
],
"Hama2": [
[x ,y, z],
[qx, qy, qz, qw]
],
"DAVIS346": [
[x ,y, z],
[qx, qy, qz, qw]
],
"NIST_Board1": [
[x ,y, z],
[qx, qy, qz, qw]
]
}
[x, y, z]
represent the position of the object, and [qx, qy, qz, qw]
represent its orientation as a quaternion.
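For illustration, a minimal Python sketch (standard library only; the helper name is ours) for loading such a pose file and looking up an object's pose:

import json

def load_poses(path):
    # Returns {object_name: (position [x, y, z], quaternion [qx, qy, qz, qw])}.
    with open(path, "r") as f:
        poses = json.load(f)
    return {name: (pos, quat) for name, (pos, quat) in poses.items()}

# Example, using the file name shown above:
# poses = load_poses("poses/2025-01-09-13-59-54_poses.json")
# position, quaternion = poses["NIST_Board1"]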
The HDF5 (.h5) format organizes data into two main types of structures: datasets, which hold the actual data, and groups, which act like folders that can contain datasets or other groups. In the diagram below, groups are shown as folder icons, and datasets as file icons. The main group of the file directly contains the video, audio, and event data. To save memory, video and audio are stored as encoded byte strings, while event data is stored as arrays. The robot’s proprioceptive information is kept in the robot_state group as arrays. Because different sensors record data at different rates, the arrays vary in length (signified by the N_xxx variable in the data shapes). To align the sensory data, each sensor’s timestamps are stored separately in the timestamps group. Information about action segments is stored in the segments_info group. Each segment is saved as a subgroup, named according to its order in the demonstration, and includes a start timestamp, end timestamp, a success indicator, and a natural language description of the action. Within each segment, low-level skills are organized under a low_level subgroup, following the same structure as the high-level annotations.
[Diagram: structure of a REASSEMBLE HDF5 file, with groups shown as folder icons and datasets as file icons.]
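For orientation, a minimal sketch of inspecting one of these files with h5py, based on the group names described above (robot_state, timestamps, segments_info, low_level). Whether the per-segment fields are stored as attributes or small datasets is not specified here, so the sketch prints both:

import h5py

# File name follows the timestamp naming convention described above.
with h5py.File("data/2025-01-09-13-59-54.h5", "r") as f:
    print(list(f.keys()))            # video, audio, event data, and the groups below

    robot_state = f["robot_state"]   # proprioceptive arrays of differing lengths
    timestamps = f["timestamps"]     # per-sensor timestamps used for alignment

    # Annotated action segments, named by their order in the demonstration.
    for name, segment in f["segments_info"].items():
        print(name, dict(segment.attrs), list(segment.keys()))
        if "low_level" in segment:
            # Low-level skill annotations mirror the high-level structure.
            print("  low-level:", list(segment["low_level"].keys()))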
The splits folder contains two text files that list the .h5 files used for the training and validation splits.
The project website contains more details about the REASSEMBLE dataset. The code for loading and visualizing the data is available in our GitHub repository.
📄 Project website: https://tuwien-asl.github.io/REASSEMBLE_page/
💻 Code: https://github.com/TUWIEN-ASL/REASSEMBLE
Recording | Issue |
2025-01-10-15-28-50.h5 | hand cam missing at beginning |
2025-01-10-16-17-40.h5 | missing hand cam |
2025-01-10-17-10-38.h5 | hand cam missing at beginning |
2025-01-10-17-54-09.h5 | no empty action at |
https://www.htfmarketinsights.com/privacy-policy
Global Generative AI in Data Labeling Solution and Services is segmented by Application (Autonomous driving, NLP, Medical imaging, Retail AI, Robotics), Type (Text Annotation, Image/Video Tagging, Audio Labeling, 3D Point Cloud Labeling, Synthetic Data Generation) and Geography (North America, LATAM, West Europe, Central & Eastern Europe, Northern Europe, Southern Europe, East Asia, Southeast Asia, South Asia, Central Asia, Oceania, MEA).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Introduction
As mobile service robots increasingly operate in human-centered environments, they must learn to use elevators without modifying elevator hardware. This task traditionally involves processing an image of an elevator control panel using instance segmentation of the buttons and labels, reading the text on the labels, and associating buttons with their corresponding labels. In addition to the standard approach, our project also implements an additional segmentation step in which missing buttons and labels are recovered after the first feature-detection pass. In a robust system, training data for both the first segmentation pass and the recovery model requires pixel-level annotations of buttons and labels, while the label-reading step needs annotations of the text on the labels. Current elevator panel feature datasets, however, either do not provide segmentation annotations or do not draw distinctions between buttons and labels. The "Living With Robots Elevator Button Dataset" was assembled for the purpose of training segmentation and scene text recognition models on realistic scenarios involving varying conditions such as lighting, blur, and position of the camera relative to the elevator control panel. Buttons are labeled with the same action as their respective labels for the purpose of training a button-label association model. A pipeline including all of the task steps mentioned above was trained and evaluated, producing state-of-the-art accuracy and precision results using the high-quality elevator button dataset.
Dataset Contents
400 JPEG images of elevator panels: 292 taken of 25 different elevators across 24 buildings on the University of Texas at Austin campus, and 108 sourced from the internet, with varying lighting, quality, and perspective conditions.
JSON files containing border annotations, button and label distinctions, and text on labels for the Campus and Internet sub-datasets.
PyTorch files containing state dictionaries with network weights for:
The first-pass segmentation model, a transformer-based model trained to segment buttons and labels in a full-color image: "segmentation_vit_model.pth".
The feature-recovery segmentation model, a transformer-based model trained to segment masks of missed buttons and labels from the class-map output of the first pass: "recovery_vit_model.pth".
The scene text recognition model, trained from PARSeq to read the special characters present on elevator panel labels: "parseq_str.ckpt".
Links to the data loader, training, and evaluation scripts for the segmentation models, hosted on GitHub.
The data subsets are all JPGs collected through two different means. The campus subset images were taken in buildings on and around the University of Texas at Austin campus. All pictures were taken facing the elevator panel's wall roughly straight-on, while the camera itself was positioned in each of nine locations in a 3x3 grid layout relative to the panel: top left, top middle, top right, middle left, center, middle right, bottom left, bottom middle, and bottom right. A subset of these also includes versions of each image with the elevator door closed or open, varying the lighting and background conditions. All of these images are 3024 × 4032 pixels and were taken with either an iPhone 12 or 12 Pro Max. The Internet subset deliberately features user-shared photos with irregular or uncommon panel characteristics. Images in this subset vary widely in terms of resolution, clarity, button/label shape, and angle of the image, adding variety to the dataset and robustness to any models trained with it.
Data Segmentation
The segmentation annotations served two training purposes. First, they were used to identify the pixels that comprise the elevator buttons and labels in the images; a segmentation model was then trained to accurately recognize buttons and labels in an image at the pixel level. The second use, and the one that most distinguishes our approach, was training a separate model to recover missed button and label detections. The annotations were used to generate class maps of each image, which were then procedurally masked to provide a ground truth (the remaining masks) and a target (the hidden masks) for the recovery model.
Data Annotation Method
All annotations were done with the VGG Image Annotator published by the University of Oxford. All images were given their own set of annotations, identified by their file naming convention. Regarding the segmentation annotations, any button that was largely in view of the image was segmented as one of several shapes that most closely fit the feature: rectangle, ellipse, or polygon. In the annotation JSONs, these appear either as the coordinates of each point of a polygon or as the dimensions of an ellipse (center coordinates, radius dimensions, and angle of rotation). Additionally, each feature was designated as a "button" or "label". For retraining the model that reads text on labels, each label and its...
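As an illustration of reading such annotations, a minimal sketch assuming the common VGG Image Annotator (VIA) 2.x JSON export layout (per-image entries holding a regions list with shape_attributes and region_attributes); the exact keys in this dataset's JSON files may differ:

import json

def load_via_annotations(path):
    # Parse a VIA-style JSON export into (filename, shape, attrs) tuples.
    with open(path, "r") as f:
        via = json.load(f)
    annotations = []
    for entry in via.values():
        filename = entry["filename"]
        for region in entry.get("regions", []):
            shape = region["shape_attributes"]    # polygon / ellipse / rect geometry
            attrs = region["region_attributes"]   # e.g. "button" vs "label", text
            annotations.append((filename, shape, attrs))
    return annotations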
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is Part 2/2 of the ActiveHuman dataset! Part 1 can be found here.
Dataset Description
ActiveHuman was generated using Unity's Perception package.
It consists of 175,428 RGB images and their semantic segmentation counterparts, taken in different environments, lighting conditions, camera distances, and camera angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1 m–4 m), and 36 camera angles (0–360 degrees at 10-degree intervals).
The dataset does not include images at every single combination of available camera distances and angles, since for some values the camera would collide with another object or go outside the confines of an environment. As a result, some combinations of camera distances and angles do not exist in the dataset.
Alongside each image, 2D Bounding Box, 3D Bounding Box and Keypoint ground truth annotations are also generated via the use of Labelers and are stored as a JSON-based dataset. These Labelers are scripts that are responsible for capturing ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the perception package.
Folder configuration
The dataset consists of 3 folders:
Essential Terminology
Dataset Data
The dataset includes 4 types of JSON annotation files:
Most Labelers generate different annotation specifications in the spec key-value pair:
Each Labeler generates different annotation specifications in the values key-value pair:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset focuses on enabling Tuta absoluta detection, which necessitated annotated images. It was created as part of the H2020 PestNu project (No. 101037128) using the SpyFly AI-robotic trap from Agrorobotica. The SpyFly trap features a color camera (Svpro 13MP, sensor: Sony 1/3" IMX214) with a resolution of 3840 × 2880 for high-quality image capture. The camera was positioned 15 cm from the glue paper to capture the entire adhesive board. In total, 217 images were captured.
Expert agronomists annotated the images using Roboflow, labeling a total of 6787 T. absoluta insects, averaging 62.26 annotations per image. Images without insects were excluded, resulting in 109 annotated images, one per day.
The dataset was split into training and validation subsets with an 80–20% ratio, leading to 87 images for training and 22 for validation. The dataset is organized into two main folders: "0_captured_dataset" contains the original 217 .jpg images. "1_annotated_dataset" includes the images and the annotated data, split into separate subfolders for training and validation. The Tuta absoluta count in each subset can be seen in the following table:
Set | Images | Tuta absoluta instances |
Training | 87 | 5344 |
Validation | 22 | 1443 |
Total | 109 | 6787 |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
The HA4M dataset is a collection of multi-modal data relative to actions performed by different subjects in an assembly scenario for manufacturing. It has been collected to provide a good test-bed for developing, validating and testing techniques and methodologies for the recognition of assembly actions. To the best of the authors' knowledge, few vision-based datasets exist in the context of object assembly. The HA4M dataset provides a considerable variety of multi-modal data compared to existing datasets. Six types of simultaneous data are supplied: RGB frames, Depth maps, IR frames, RGB-Depth-Aligned frames, Point Clouds and Skeleton data. These data allow the scientific community to make consistent comparisons among processing approaches or machine learning approaches by using one or more data modalities. Researchers in computer vision, pattern recognition and machine learning can use/reuse the data for different investigations in different application domains such as motion analysis, human-robot cooperation, action recognition, and so on.
Dataset details
The dataset includes 12 assembly actions performed by 41 subjects for building an Epicyclic Gear Train (EGT). The assembly task involves three phases: first, the assembly of Block 1 and Block 2 separately, and then the final setting up of both Blocks to build the EGT. The EGT is made up of a total of 12 components divided into two sets: the first eight components for building Block 1 and the remaining four components for Block 2. Finally, two screws are fixed with an Allen key to assemble the two blocks and thus obtain the EGT.
Acquisition setup
The acquisition experiment took place in two laboratories (one in Italy and one in Spain), where an acquisition area was reserved for the experimental setup. A Microsoft Azure Kinect camera acquires videos during the execution of the assembly task. It is placed in front of the operator and the table where the components are spread over. The camera is placed on a tripod at a height of 1.54 m and a distance of 1.78 m, and is down-tilted by an angle of 17 degrees.
Technical information
The HA4M dataset contains 217 videos of the assembly task performed by 41 subjects (15 females and 26 males). Their ages ranged from 23 to 60. All the subjects participated voluntarily and were provided with a written description of the experiment. Each subject was asked to execute the task several times and to perform the actions at their own convenience (e.g. with both hands), independently of their dominant hand. The HA4M project is a growing project, so new acquisitions, planned for the near future, will expand the current dataset.
Actions
Twelve actions are considered in HA4M. Actions 1 to 4 are needed to build Block 1, actions 5 to 8 build Block 2, and actions 9 to 12 complete the EGT. The actions are listed below:
1. Pick up/Place Carrier
2. Pick up/Place Gear Bearings (x3)
3. Pick up/Place Planet Gears (x3)
4. Pick up/Place Carrier Shaft
5. Pick up/Place Sun Shaft
6. Pick up/Place Sun Gear
7. Pick up/Place Sun Gear Bearing
8. Pick up/Place Ring Bear
9. Pick up Block 2 and place it on Block 1
10. Pick up/Place Cover
11. Pick up/Place Screws (x2)
12. Pick up/Place Allen Key, Turn Screws, Return Allen Key and EGT
Annotation
Data annotation concerns the labeling of the different actions in the video sequences. The annotation of the actions has been done manually by observing the RGB videos frame by frame. The start frame of each action is identified as the subject starts to move the arm toward the component to be grasped. The end frame, instead, is recorded when the subject releases the component, so the next frame becomes the start frame of the subsequent action. The total number of actions annotated in this study is 4123, including the "don't care" action (ID=0) and the action repetitions in the case of actions 2, 3 and 11.
Available code
The dataset has been acquired using the Multiple Azure Kinect GUI software, available at https://gitlab.com/roberto.marani/multiple-azure-kinect-gui, based on the Azure Kinect Sensor SDK v1.4.1 and the Azure Kinect Body Tracking SDK v1.1.2. The software records device data to a Matroska (.mkv) file containing video tracks, IMU samples, and device calibration. In this work, IMU samples are not considered. The same Multiple Azure Kinect GUI software processes the Matroska file and returns the different types of data provided with our dataset: RGB images, RGB-Depth-Aligned (RGB-A) images, Depth images, IR images, Point Cloud and Skeleton data.
The following files are available with the dataset:
rocog_s00.zip, ..., rocog_s12.zip (26.2 GB): Raw videos for the human subjects performing the gestures and annotations
rocog_human_frames.zip, ..., rocog_human_frames.z02 (18.7 GB): Frames for human data used for training and testing. Each folder also has annotations for gesture (label.bin), orientation (orientation.bin), and the number of times the gesture is repeated (repetitions.bin)
rocog_synth_frames.zip, ..., rocog_synth_frames.z09 (~85.0 GB): Frames for synthetic data used for training and testing. Each folder also has annotations for gesture (label.bin), orientation (orientation.bin), and the number of times the gesture is repeated (repetitions.bin)
The labels are saved into Python binary struct arrays. Each file contains one entry per frame in the corresponding directory. Here's Python sample code to open these files:
import glob
import os
import struct
frames_dir = 'FemaleCivilian\10_Advance_11_1_2019_1...
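The sample code above is truncated. As an illustrative continuation (not the authors' original script), the binary label files could be read along the following lines, assuming one little-endian 32-bit integer entry per frame; the actual struct layout may differ:

import glob
import os
import struct

def read_bin_labels(path, fmt="<i"):
    # Assumes one fixed-size record per frame; change `fmt` to match the
    # actual layout of label.bin, orientation.bin, and repetitions.bin.
    size = struct.calcsize(fmt)
    with open(path, "rb") as f:
        data = f.read()
    return [struct.unpack_from(fmt, data, i * size)[0]
            for i in range(len(data) // size)]

# Hypothetical usage: one entry per frame image in the directory.
# labels = read_bin_labels(os.path.join(frames_dir, "label.bin"))
# n_frames = len(glob.glob(os.path.join(frames_dir, "*.jpg")))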
Data Description
The DIPSER dataset is designed to assess student attention and emotion in in-person classroom settings, consisting of RGB camera data, smartwatch sensor data, and labeled attention and emotion metrics. It includes multiple camera angles per student to capture posture and facial expressions, complemented by smartwatch data for inertial and biometric metrics. Attention and emotion labels are derived from self-reports and expert evaluations. The dataset includes diverse demographic groups, with data collected in real-world classroom environments, facilitating the training of machine learning models for predicting attention and correlating it with emotional states.
Data Collection and Generation Procedures
The dataset was collected in a natural classroom environment at the University of Alicante, Spain. The recording setup consisted of six general cameras positioned to capture the overall classroom context and individual cameras placed at each student's desk. Additionally, smartwatches were used to collect biometric data, such as heart rate, accelerometer, and gyroscope readings.
Experimental Sessions
Nine distinct educational activities were designed to ensure a comprehensive range of engagement scenarios:
1. News Reading – students read projected or device-displayed news.
2. Brainstorming Session – idea generation for problem-solving.
3. Lecture – passive listening to an instructor-led session.
4. Information Organization – synthesizing information from different sources.
5. Lecture Test – assessment of lecture content via mobile devices.
6. Individual Presentations – students present their projects.
7. Knowledge Test – conducted using Kahoot.
8. Robotics Experimentation – hands-on session with robotics.
9. MTINY Activity Design – development of educational activities with computational thinking.
Technical Specifications
RGB Cameras: individual cameras recorded at 640×480 pixels, while context cameras captured at 1280×720 pixels.
Frame Rate: 9-10 FPS depending on the setup.
Smartwatch Sensors: collected heart rate, accelerometer, gyroscope, rotation vector, and light sensor data at a frequency of 1–100 Hz.
Data Organization and Formats
The dataset follows a structured directory format: /groupX/experimentY/subjectZ.zip
Each subject-specific folder contains:
images/ (individual facial images)
watch_sensors/ (sensor readings in JSON format)
labels/ (engagement & emotion annotations)
metadata/ (subject demographics & session details)
Annotations and Labeling
Each data entry includes engagement levels (1-5) and emotional states (9 categories) based on both self-reported labels and evaluations by four independent experts. A custom annotation tool was developed to ensure consistency across evaluations.
Missing Data and Data Quality
Synchronization: a centralized server ensured time alignment across devices. Brightness changes were used to verify synchronization.
Completeness: no major missing data, except for occasional random frame drops due to embedded device performance.
Data Consistency: uniform collection methodology across sessions, ensuring high reliability.
Data Processing Methods
To enhance usability, the dataset includes preprocessed bounding boxes for face, body, and hands, along with gaze estimation and head pose annotations. These were generated using YOLO, MediaPipe, and DeepFace.
File Formats and Accessibility
Images: stored in standard JPEG format.
Sensor Data: provided as structured JSON files.
Labels: available as CSV files with timestamps.
The dataset is publicly available under the CC-BY license and can be accessed along with the necessary processing scripts via the DIPSER GitHub repository.
Potential Errors and Limitations
Due to camera angles, some student movements may be out of frame in collaborative sessions.
Lighting conditions vary slightly across experiments.
Sensor latency variations are minimal but exist due to embedded device constraints.
Citation
If you find this project helpful for your research, please cite our work using the following bibtex entry:
@misc{marquezcarpintero2025dipserdatasetinpersonstudent1,
  title={DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild},
  author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Carolina Lorenzo Álvarez and Jorge Fernandez-Herrero and Diego Viejo and Rosabel Roig-Vila and Miguel Cazorla},
  year={2025},
  eprint={2502.20209},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.20209},
}
Usage and Reproducibility
Researchers can utilize standard tools like OpenCV, TensorFlow, and PyTorch for analysis. The dataset supports research in machine learning, affective computing, and education analytics, offering a unique resource for engagement and attention studies in real-world classroom environments.
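As an illustration of the directory layout above, a minimal Python sketch (standard library only; the file names inside each subject archive and the exact archive path are assumptions) for reading smartwatch sensor JSON files from one subject zip:

import json
import zipfile

def load_watch_sensors(subject_zip_path):
    # Collect all parsed JSON files under watch_sensors/ in a subject zip.
    # The internal folder name follows the dataset description; adjust the
    # prefix if the archives nest files under an extra subject directory.
    readings = {}
    with zipfile.ZipFile(subject_zip_path) as zf:
        for name in zf.namelist():
            if "watch_sensors/" in name and name.endswith(".json"):
                with zf.open(name) as f:
                    readings[name] = json.load(f)
    return readings

# Hypothetical path following /groupX/experimentY/subjectZ.zip:
# sensors = load_watch_sensors("group1/experiment1/subject1.zip")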
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents a set of acoustic signals captured during a single-bead wall experiment in robotic Laser Directed Energy Deposition (LDED) using Maraging Steel C300. The acoustic data was recorded with a high-fidelity prepolarized microphone (Xiris WeldMIC), capturing the intricate sound profiles associated with the LDED process at a sampling rate of 44,100 Hz. The dataset was generated with a robotic LDED process that consists of a six-axis industrial robot (KUKA KR90) coupled with a two-axis positioner, a laser head, and a coaxial powder-feeding nozzle.
Folder Structure:
/sample-1: The main folder for the experiment sample.
/audio_files: Contains 4624 .wav audio files, each representing a 40 ms chunk of the LDED process sound.
/annotations_1.csv: A CSV file providing annotations for the audio files, labeling each as "Defect-free", "Defective", or "Laser-off".
audio_features.h5: Extracted acoustic features in time-domain, frequency-domain, and time-frequency representations (MFCC features). Feature extraction was conducted using the Python Essentia library.
File Naming Convention:
Audio files within the audio_files folder are named following the pattern sample_ExperimentID_SampleID.wav. Given that there is only one experiment and one sample, the naming is consistent, for example, sample_1_1.wav for the first file.
Annotation Details:
The annotations_1.csv file contains detailed labels for each audio file, correlating to the conditions observed during the experiment, aiding in quick identification and analysis.
Experimental Parameters:
The dataset reflects a controlled experiment setup with the following specifications:
Geometry: single-bead wall structure
Dimensions: 90 mm × 42.5 mm
Number of layers: 50
Laser beam diameter: 2 mm
Layer thickness: 0.85 mm
Stand-off distance: 12 mm
Laser profile: Gaussian
Laser wavelength: 1064 nm
Process Parameters:
Laser power: 2.3 kW
Speed: 25 mm/s
Dwell time: 0 s
Powder flow rate: 12 g/min
This dataset aims to facilitate the development and testing of acoustic-based defect detection models for real-time quality monitoring in LDED processes. It can also serve as a reference point for further research on sensor fusion, machine learning, and real-time monitoring of manufacturing processes.
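A minimal sketch (standard library only) for pairing audio chunks with their annotations; the CSV column names used here are assumptions, since only the label values are specified above:

import csv
import wave

# Read the annotation CSV; check the header of annotations_1.csv for the actual column names.
with open("sample-1/annotations_1.csv", newline="") as f:
    annotations = list(csv.DictReader(f))
print(annotations[0])  # a row mapping an audio file to "Defect-free", "Defective", or "Laser-off"

# Open one 40 ms audio chunk (44,100 Hz sampling rate, per the description).
with wave.open("sample-1/audio_files/sample_1_1.wav", "rb") as w:
    print(w.getframerate(), w.getnframes())  # expect 44100 Hz and roughly 1764 frames per 40 ms chunk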
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ground-level Blueberry Orchard Dataset v1 consists of 2000 RGB images of blueberry orchard scenes captured in the village of Babe, Serbia, on three occasions in March, May, and August of 2022. Images were captured using the RGB module of a Luxonis OAK-D device, with a resolution of 1920×1080 pixels, and are stored in the lossless PNG format.
The dataset was created for the purpose of training deep learning models for blueberry bush detection, for the task of autonomous UGV guidance. It contains sequences of images captured from a UGV moving and rotating in blueberry orchard rows. Images were captured from a height of approximately 0.5 meters, with the camera angled towards the base of a blueberry plant and the surrounding bank on which it grows. The dataset was captured in real-life outdoor conditions and contains multiple sources of variability (bush shape and size, lighting conditions, shadows, saturation, etc.) and artifacts (occlusions by weeds, branches, presence of irregular objects, etc.).
There are two classes of annotated objects of interest:
Bush, corresponding to the base of the blueberry bush.
Pole, corresponding to hail netting poles and similar obstructing objects such as lamp posts or wooden legs of bumblebee hives (distinguishing poles is important to prevent equipment damage in operations such as soil sampling and pruning).
Objects of interest are annotated with bounding boxes. Labels are saved in two formats:
LabelMe JSON format (x1, y1, x2, y2; in pixels)
YOLO TXT format (x_center, y_center, width, height; as a ratio of total image size, with numerical labels 0 and 1 corresponding to Bush and Pole)
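For illustration, a minimal sketch (hypothetical helper, standard library only) that converts one YOLO-format line into pixel-coordinate corners, given the image size:

def yolo_to_pixels(line, img_w, img_h):
    # Convert a YOLO bbox line "class x_center y_center width height" (ratios)
    # into (class_id, x1, y1, x2, y2) in pixels.
    class_id, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

# Example with the 1920×1080 images of this dataset (label values are made up):
# yolo_to_pixels("0 0.5 0.5 0.25 0.4", 1920, 1080) -> (0, 720.0, 324.0, 1200.0, 756.0)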
There are 61 images with no annotated objects, and there are no corresponding label files for these images.
The dataset is split into train, validation and test sets with 75%, 10%, and 15% split (1490, 200, and 310 images, respectively). As the data contains sequences of images, the split is made based on sequences rather than individual images to prevent data leakage.
Detailed description and statistics are available in:
V. Filipović, D. Stefanović, N. Pajević, Ž. Grbović, N. Đurić and M. Panić, "Bush Detection for Vision-based UGV Guidance in Blueberry Orchards: Data Set and Methods," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Vancouver, Canada, 2023. (Accepted)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example annotations: https://www.robots.ox.ac.uk/%7Evgg/data/pets/pet_annotations.jpg
The Oxford Pets dataset (also known as the "dogs vs cats" dataset) is a collection of images and annotations labeling various breeds of dogs and cats. There are approximately 100 examples of each of the 37 breeds. This dataset contains the object detection portion of the original dataset with bounding boxes around the animals' heads.
This dataset was collected by the Visual Geometry Group (VGG) at the University of Oxford.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CHIRLA dataset (Comprehensive High-resolution Identification and Re-identification for Large-scale Analysis) is designed for long-term person re-identification (Re-ID) in real-world scenarios. The dataset consists of multi-camera video recordings captured over seven months in an indoor office environment. It aims to facilitate the development and evaluation of Re-ID algorithms capable of handling significant variations in individuals' appearances, including changes in clothing and physical characteristics. The dataset includes 22 individuals with 963,554 bounding box annotations across 596,345 frames.
Data Generation Procedures
The dataset was recorded at the Robotics, Vision, and Intelligent Systems Research Group headquarters at the University of Alicante, Spain. Seven strategically placed Reolink RLC-410W cameras were used to capture videos in a typical office setting, covering areas such as laboratories, hallways, and shared workspaces. Each camera features a 1/2.7" CMOS image sensor with a 5.0-megapixel resolution and an 80° horizontal field of view. The cameras were connected via Ethernet and WiFi to ensure stable streaming and synchronization. A ROS-based interconnection framework was used to synchronize and retrieve images from all cameras. The dataset includes video recordings at a resolution of 1080×720 pixels, with a consistent frame rate of 30 fps, stored in AVI format with DivX MPEG-4 encoding.
Data Processing Methods and Steps
Data processing involved a semi-automatic labeling procedure:
Detection: YOLOv8x was used to detect individuals in video frames and extract bounding boxes.
Tracking: the Deep SORT algorithm was employed to generate tracklets and assign unique IDs to detected individuals.
Manual Verification: a custom graphical user interface (GUI) was developed to facilitate manual verification and correction of the automatically generated labels.
Bounding boxes and IDs were assigned consistently across different cameras and sequences to maintain identity coherence.
Data Structure and Format
The dataset comprises:
Video Files: 70 videos, each corresponding to a specific camera view in a sequence, stored in AVI format.
Annotation Files: JSON files containing frame-wise annotations, including bounding box coordinates and identity labels.
The dataset is structured as follows:
videos/seq_XXX/camera_Y.avi: video files for each camera view.
annotations/seq_XXX/camera_Y.json: annotation files providing labeled bounding boxes and IDs.
Use Cases and Reusability
The CHIRLA dataset is suitable for:
Long-term person re-identification
Multi-camera tracking and re-identification
Single-camera tracking and re-identification
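A minimal sketch for iterating over one sequence's video together with its annotations; OpenCV and the directory layout above are assumed, and the annotation lookup key is hypothetical since the JSON schema is not detailed in this description:

import json
import cv2  # OpenCV, assumed available

seq, cam = "seq_001", "camera_1"   # hypothetical sequence/camera names

# Load the frame-wise annotations for this camera view.
with open(f"annotations/{seq}/{cam}.json") as f:
    annotations = json.load(f)

cap = cv2.VideoCapture(f"videos/{seq}/{cam}.avi")
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Look up this frame's bounding boxes and identity labels;
    # adapt the key and field names to the actual annotation schema.
    boxes = annotations.get(str(frame_idx), [])
    frame_idx += 1
cap.release()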
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Oxford-IIIT Pet Dataset
Description
A 37-category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose, and lighting. This instance of the dataset uses standard label ordering and includes the standard train/test splits. Trimaps and bounding boxes are not included, but there is an image_id field that can be used to reference those annotations from the official metadata. Website: https://www.robots.ox.ac.uk/~vgg/data/pets/… See the full description on the dataset page: https://huggingface.co/datasets/timm/oxford-iiit-pet.
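For example, the dataset can presumably be loaded with the Hugging Face datasets library (a sketch; split names follow the standard train/test splits mentioned above):

from datasets import load_dataset

# Loads the Oxford-IIIT Pet images with the standard splits.
ds = load_dataset("timm/oxford-iiit-pet")
print(ds)                 # DatasetDict with the available splits
example = ds["train"][0]  # fields include the image, the class label, and image_id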
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains annotated underwater images of pipeline components, designed for robotics applications such as subsea inspection, maintenance, and navigation. The dataset was obtained from Roboflow Universe - Yellow Pipes v4.
The dataset includes the following object classes, each represented with pixel-accurate segmentation masks:
tpipe: T-junctions in pipelines (where three pipes connect in a "T" shape).
lpipe: Pipe elbows or bends (usually at 90° or 45° angles).
coupler: Pipe couplers or connectors joining two straight pipe segments.
pipe: Straight pipe sections without visible joints or bends.
YOLO Segmentation:
Each image has an associated .txt
file with segmentation label data in YOLO format.
Each line represents one object instance.
The first value is the class ID (0
= tpipe, 1
= lpipe, 2
= coupler, 3
= pipe).
The remaining values are normalized segmentation points describing the object’s outline as polygons.
Images:
Supplied in standard image formats (e.g., .jpg, .png).
Example segmentation label line (YOLO format):
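The values below are illustrative only (class 0 = tpipe, followed by normalized x y polygon points); actual label files will contain the real polygon coordinates:
0 0.512 0.334 0.540 0.352 0.561 0.398 0.533 0.421 0.505 0.389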