Data DescriptionThe CADDI dataset is designed to support research in in-class activity recognition using IMU data from low-cost sensors. It provides multimodal data capturing 19 different activities performed by 12 participants in a classroom environment, utilizing both IMU sensors from a Samsung Galaxy Watch 5 and synchronized stereo camera images. This dataset enables the development and validation of activity recognition models using sensor fusion techniques.Data Generation ProceduresThe data collection process involved recording both continuous and instantaneous activities that typically occur in a classroom setting. The activities were captured using a custom setup, which included:A Samsung Galaxy Watch 5 to collect accelerometer, gyroscope, and rotation vector data at 100Hz.A ZED stereo camera capturing 1080p images at 25-30 fps.A synchronized computer acting as a data hub, receiving IMU data and storing images in real-time.A D-Link DSR-1000AC router for wireless communication between the smartwatch and the computer.Participants were instructed to arrange their workspace as they would in a real classroom, including a laptop, notebook, pens, and a backpack. Data collection was performed under realistic conditions, ensuring that activities were captured naturally.Temporal and Spatial ScopeThe dataset contains a total of 472.03 minutes of recorded data.The IMU sensors operate at 100Hz, while the stereo camera captures images at 25-30Hz.Data was collected from 12 participants, each performing all 19 activities multiple times.The geographical scope of data collection was Alicante, Spain, under controlled indoor conditions.Dataset ComponentsThe dataset is organized into JSON and PNG files, structured hierarchically:IMU Data: Stored in JSON files, containing:Samsung Linear Acceleration Sensor (X, Y, Z values, 100Hz)LSM6DSO Gyroscope (X, Y, Z values, 100Hz)Samsung Rotation Vector (X, Y, Z, W quaternion values, 100Hz)Samsung HR Sensor (heart rate, 1Hz)OPT3007 Light Sensor (ambient light levels, 5Hz)Stereo Camera Images: High-resolution 1920×1080 PNG files from left and right cameras.Synchronization: Each IMU data record and image is timestamped for precise alignment.Data StructureThe dataset is divided into continuous and instantaneous activities:Continuous Activities (e.g., typing, writing, drawing) were recorded for 210 seconds, with the central 200 seconds retained.Instantaneous Activities (e.g., raising a hand, drinking) were repeated 20 times per participant, with data captured only during execution.The dataset is structured as:/continuous/subject_id/activity_name/ /camera_a/ → Left camera images /camera_b/ → Right camera images /sensors/ → JSON files with IMU data
/instantaneous/subject_id/activity_name/repetition_id/ /camera_a/ /camera_b/ /sensors/ Data Quality & Missing DataThe smartwatch buffers 100 readings per second before sending them, ensuring minimal data loss.Synchronization latency between the smartwatch and the computer is negligible.Not all IMU samples have corresponding images due to different recording rates.Outliers and anomalies were handled by discarding incomplete sequences at the start and end of continuous activities.Error Ranges & LimitationsSensor data may contain noise due to minor hand movements.The heart rate sensor operates at 1Hz, limiting its temporal resolution.Camera exposure settings were automatically adjusted, which may introduce slight variations in lighting.File Formats & Software CompatibilityIMU data is stored in JSON format, readable with Python’s json library.Images are in PNG format, compatible with all standard image processing tools.Recommended libraries for data analysis:Python: numpy, pandas, scikit-learn, tensorflow, pytorchVisualization: matplotlib, seabornDeep Learning: Keras, PyTorchPotential ApplicationsDevelopment of activity recognition models in educational settings.Study of student engagement based on movement patterns.Investigation of sensor fusion techniques combining visual and IMU data.This dataset represents a unique contribution to activity recognition research, providing rich multimodal data for developing robust models in real-world educational environments.CitationIf you find this project helpful for your research, please cite our work using the following bibtex entry:@misc{marquezcarpintero2025caddiinclassactivitydetection, title={CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors}, author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Monica Pina-Navarro and Miguel Cazorla and Francisco Gomez-Donoso}, year={2025}, eprint={2503.02853}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2503.02853}, }
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset offers a detailed inventory of road intersections and their corresponding suburbs within Cape Town, meticulously curated to highlight instances of high pedestrian crash counts observed in "high-high" cluster and "high-low" outlier fishnet grid cells across the years 2017, 2018, 2019, and 2021. To enhance its utility, the dataset meticulously colour-codes each month associated with elevated crash occurrences, providing a nuanced perspective. Furthermore, the dataset categorises road intersections based on their placement within "high-high" clusters (marked with pink tabs) or "high-low" outlier cells (indicated by red tabs). For ease of navigation, the intersections are further organised alphabetically by suburb name, ensuring accessibility and clarity.Data SpecificsData Type: Geospatial-temporal categorical data with numeric attributesFile Format: Word document (.docx)Size: 255 KBNumber of Files: The dataset contains a total of 264 road intersection records (68 "high-high" clusters and 196 "high-low" outliers)Date Created: 21st May 2024MethodologyData Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network InformationSoftware: ArcGIS Pro, Open Refine, Python, SQLProcessing Steps: The raw road traffic crash data underwent a comprehensive refining process using Python software to ensure its accuracy and consistency. Following this, duplicates were eliminated to retain only one entry per crash incident. Subsequently, the data underwent further refinement with Open Refine software, focusing specifically on isolating unique crash descriptions for subsequent geocoding in ArcGIS Pro. Notably, during this process, only the road intersection crashes were retained, as they were the only incidents with spatial definitions.Once geocoded, road intersection crashes that involved a pedestrian were extracted so that subsequent spatio-temporal analyses would focus on these crashes only. The spatio-temporal analysis methods by which the pedestrian crashes were analysed included spatial autocorrelation, hotspot analysis, and cluster and outlier analysis. Leveraging these methods, road intersections involving pedestrian crashes identified as either "high-high" clusters or "high-low" outliers were extracted for inclusion in the dataset.Geospatial InformationSpatial Coverage:West Bounding Coordinate: 18°20'EEast Bounding Coordinate: 19°05'ENorth Bounding Coordinate: 33°25'SSouth Bounding Coordinate: 34°25'SCoordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projectionTemporal InformationTemporal Coverage:Start Date: 01/01/2017End Date: 31/12/2021 (2020 data omitted)
Note: All supplementary files are provided as a single compressed archive named dataset.zip. Users should extract this file to access the individual Excel and Python files listed below.
This supplementary dataset supports the manuscript titled “Mahalanobis-Based Multivariate Financial Statement Analysis: Outlier Detection and Typological Clustering in U.S. Tech Firms.” It contains both data files and Python scripts used in the financial ratio analysis, Mahalanobis distance computation, and hierarchical clustering stages of the study. The files are organized as follows:
ESM_1.xlsx – Raw financial ratios of 18 U.S. technology firms (2020–2024)
ESM_2.py – Python script to calculate Z-scores from raw financial ratios
ESM_3.xlsx – Dataset containing Z-scores for the selected financial ratios
ESM_4.py – Python script for generating the correlation heatmap of the Z-scores
ESM_5.xlsx – Mahalanobis distance values for each firm
ESM_6.py – Python script to compute Mahalanobis distances
ESM_7.py – Python script to visualize Mahalanobis distances
ESM_8.xlsx – Mean Z-scores per firm (used for cluster analysis)
ESM_9.py – Python script to compute mean Z-scores
ESM_10.xlsx – Re-standardized Z-scores based on firm-level means
ESM_11.py – Python script to re-standardize mean Z-scores
ESM_12.py – Python script to generate the hierarchical clustering dendrogram
All files are provided to ensure transparency and reproducibility of the computational procedures in the manuscript. Each script is commented and formatted for clarity. The dataset is intended for educational and academic reuse under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides comprehensive information on road intersection crashes recognised as "high-low" outliers within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in at least 5% of the total "high-low" outlier road intersection crashes for the years 2017, 2018, 2019, and 2021. The dataset is meticulously organised according to support metric values, ranging from 0,05 to 0,0278, with entries presented in descending order.Data SpecificsData Type: Geospatial-temporal categorical dataFile Format: Excel document (.xlsx)Size: 0,99 MBNumber of Files: The dataset contains a total of 10212 association rulesDate Created: 23rd May 2024MethodologyData Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network InformationSoftware: ArcGIS Pro, PythonProcessing Steps: Following the spatio-temporal analyses and the derivation of "high-low" outlier fishnet grid cells from a cluster and outlier analysis, all the road intersection crashes that occurred within the "high-low" outlier fishnet grid cells were extracted to be processed by association analysis. The association analysis of the "high-low" outlier road intersection crashes was processed using Python software and involved the use of a 0,05 support metric value. Consequently, commonly occurring crash attributes among at least 5% of the "high-low" outlier road intersection crashes were extracted for inclusion in this dataset.Geospatial InformationSpatial Coverage:West Bounding Coordinate: 18°20'EEast Bounding Coordinate: 19°05'ENorth Bounding Coordinate: 33°25'SSouth Bounding Coordinate: 34°25'SCoordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projectionTemporal InformationTemporal Coverage:Start Date: 01/01/2017End Date: 31/12/2021 (2020 data omitted)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset offers a detailed inventory of road intersections and their corresponding suburbs within Cape Town, meticulously curated to highlight instances of high motorcycle (Motorcycle: Above 125cc, Motorcycle: 125cc and under, Quadru-cycle, Motor Tricycle) crash counts that resulted in injuries (slight, serious, fatalities) observed in "high-high" cluster and "high-low" outlier fishnet grid cells across the years 2017, 2018 and 2019. To enhance its utility, the dataset meticulously colour-codes each month associated with elevated crash occurrences, providing a nuanced perspective. Furthermore, the dataset categorises road intersections based on their placement within "high-high" clusters (marked with pink tabs) or "high-low" outlier cells (indicated by red tabs). For ease of navigation, the intersections are further organised alphabetically by suburb name, ensuring accessibility and clarity.Data SpecificsData Type: Geospatial-temporal categorical data with numeric attributesFile Format: Word document (.docx)Size: 157 KBNumber of Files: The dataset contains a total of 158 road intersection records (11 "high-high" clusters and 147 "high-low" outliers)Date Created: 22nd May 2024MethodologyData Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network InformationSoftware: ArcGIS Pro, Open Refine, Python, SQLProcessing Steps: The raw road traffic crash data underwent a comprehensive refining process using Python software to ensure its accuracy and consistency. Following this, duplicates were eliminated to retain only one entry per crash incident. Subsequently, the data underwent further refinement with Open Refine software, focusing specifically on isolating unique crash descriptions for subsequent geocoding in ArcGIS Pro. Notably, during this process, only the road intersection crashes were retained, as they were the only incidents with spatial definitions.Once geocoded, road intersection crashes that involved either a motor tricycle, motorcycle above 125cc, motorcycle below 125cc and quadru-cycles and that were additionally associated with a slight, severe or fatal injury type were extracted so that subsequent spatio-temporal analyses would focus on these crashes only. The spatio-temporal analysis methods by which these motorcycle crashes were analysed included spatial autocorrelation, hotspot analysis, and cluster and outlier analysis. Leveraging these methods, road intersections with motorcycle crashes identified as either "high-high" clusters or "high-low" outliers were extracted for inclusion in the dataset.Geospatial InformationSpatial Coverage:West Bounding Coordinate: 18°20'EEast Bounding Coordinate: 19°05'ENorth Bounding Coordinate: 33°25'SSouth Bounding Coordinate: 34°25'SCoordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projectionTemporal InformationTemporal Coverage:Start Date: 01/01/2017End Date: 31/12/2019
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides comprehensive information on road intersection pedestrian crashes resulting in serious injuries and/or fatalities recognised as "high-low" outliers within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in at least 5% of the total "high-low" outlier pedestrian road intersection crashes resulting in serious injuries and/or fatalities for the years 2017, 2018 and 2019. The dataset is meticulously organised according to support metric values, ranging from 0,05 to 0,099, with entries presented in descending order.Data SpecificsData Type: Geospatial-temporal categorical dataFile Format: Excel document (.xlsx)Size: 477 KBNumber of Files: The dataset contains a total of 10260 association rulesDate Created: 24th May 2024MethodologyData Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network InformationSoftware: ArcGIS Pro, PythonProcessing Steps: Following the spatio-temporal analyses and the derivation of "high-low" outlier fishnet grid cells from a cluster and outlier analysis, all the road intersection pedestrian crashes resulting in serious injuries and/or fatalities that occurred within the "high-low" outlier fishnet grid cells were extracted to be processed by association analysis. The association analysis of these crashes was processed using Python software and involved the use of a 0,05 support metric value. Consequently, commonly occurring crash attributes among at least 5% of the "high-low" outlier road intersection pedestrian crashes resulting in serious injuries and/or fatalities were extracted for inclusion in this dataset.Geospatial InformationSpatial Coverage:West Bounding Coordinate: 18°20'EEast Bounding Coordinate: 19°05'ENorth Bounding Coordinate: 33°25'SSouth Bounding Coordinate: 34°25'SCoordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projectionTemporal InformationTemporal Coverage:Start Date: 01/01/2017End Date: 31/12/2019
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Data DescriptionThe CADDI dataset is designed to support research in in-class activity recognition using IMU data from low-cost sensors. It provides multimodal data capturing 19 different activities performed by 12 participants in a classroom environment, utilizing both IMU sensors from a Samsung Galaxy Watch 5 and synchronized stereo camera images. This dataset enables the development and validation of activity recognition models using sensor fusion techniques.Data Generation ProceduresThe data collection process involved recording both continuous and instantaneous activities that typically occur in a classroom setting. The activities were captured using a custom setup, which included:A Samsung Galaxy Watch 5 to collect accelerometer, gyroscope, and rotation vector data at 100Hz.A ZED stereo camera capturing 1080p images at 25-30 fps.A synchronized computer acting as a data hub, receiving IMU data and storing images in real-time.A D-Link DSR-1000AC router for wireless communication between the smartwatch and the computer.Participants were instructed to arrange their workspace as they would in a real classroom, including a laptop, notebook, pens, and a backpack. Data collection was performed under realistic conditions, ensuring that activities were captured naturally.Temporal and Spatial ScopeThe dataset contains a total of 472.03 minutes of recorded data.The IMU sensors operate at 100Hz, while the stereo camera captures images at 25-30Hz.Data was collected from 12 participants, each performing all 19 activities multiple times.The geographical scope of data collection was Alicante, Spain, under controlled indoor conditions.Dataset ComponentsThe dataset is organized into JSON and PNG files, structured hierarchically:IMU Data: Stored in JSON files, containing:Samsung Linear Acceleration Sensor (X, Y, Z values, 100Hz)LSM6DSO Gyroscope (X, Y, Z values, 100Hz)Samsung Rotation Vector (X, Y, Z, W quaternion values, 100Hz)Samsung HR Sensor (heart rate, 1Hz)OPT3007 Light Sensor (ambient light levels, 5Hz)Stereo Camera Images: High-resolution 1920×1080 PNG files from left and right cameras.Synchronization: Each IMU data record and image is timestamped for precise alignment.Data StructureThe dataset is divided into continuous and instantaneous activities:Continuous Activities (e.g., typing, writing, drawing) were recorded for 210 seconds, with the central 200 seconds retained.Instantaneous Activities (e.g., raising a hand, drinking) were repeated 20 times per participant, with data captured only during execution.The dataset is structured as:/continuous/subject_id/activity_name/ /camera_a/ → Left camera images /camera_b/ → Right camera images /sensors/ → JSON files with IMU data
/instantaneous/subject_id/activity_name/repetition_id/ /camera_a/ /camera_b/ /sensors/ Data Quality & Missing DataThe smartwatch buffers 100 readings per second before sending them, ensuring minimal data loss.Synchronization latency between the smartwatch and the computer is negligible.Not all IMU samples have corresponding images due to different recording rates.Outliers and anomalies were handled by discarding incomplete sequences at the start and end of continuous activities.Error Ranges & LimitationsSensor data may contain noise due to minor hand movements.The heart rate sensor operates at 1Hz, limiting its temporal resolution.Camera exposure settings were automatically adjusted, which may introduce slight variations in lighting.File Formats & Software CompatibilityIMU data is stored in JSON format, readable with Python’s json library.Images are in PNG format, compatible with all standard image processing tools.Recommended libraries for data analysis:Python: numpy, pandas, scikit-learn, tensorflow, pytorchVisualization: matplotlib, seabornDeep Learning: Keras, PyTorchPotential ApplicationsDevelopment of activity recognition models in educational settings.Study of student engagement based on movement patterns.Investigation of sensor fusion techniques combining visual and IMU data.This dataset represents a unique contribution to activity recognition research, providing rich multimodal data for developing robust models in real-world educational environments.CitationIf you find this project helpful for your research, please cite our work using the following bibtex entry:@misc{marquezcarpintero2025caddiinclassactivitydetection, title={CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors}, author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Monica Pina-Navarro and Miguel Cazorla and Francisco Gomez-Donoso}, year={2025}, eprint={2503.02853}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2503.02853}, }