The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset.
All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible to participate, children had to present with low mathematics achievement, performing at or below the 25th percentile on the Maths Problem Solving and/or Numerical Operations subtests of the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Children were excluded if, as reported by their parents, they had any other diagnosed disorder, such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or uncorrected sensory disorders.
The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had one post-treatment measurement point.
The number of measurement points was distributed across participants as follows:
Participant 1 – 3 baseline, 6 treatment, 1 post-treatment
Participant 3 – 2 baseline, 7 treatment, 1 post-treatment
Participant 5 – 2 baseline, 5 treatment, 1 post-treatment
Participant 6 – 3 baseline, 4 treatment, 1 post-treatment
Participant 7 – 2 baseline, 5 treatment, 1 post-treatment
In each session across all three phases, children's performance was assessed on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. During the treatment phase, all children additionally completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.
Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line was presented in the middle of the screen, and the target number was presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed in all phases of the study. Trained items were assessed independently of the intervention during the baseline and post-treatment phases, and performance on the intervention was used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy was indexed by percent absolute error (PAE): [|estimated number − target number| / scale of the number line] × 100.
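For concreteness, the PAE computation can be sketched in a few lines of Python (illustrative only; the function and variable names below are not part of the study materials):

```python
# Minimal sketch of the percent absolute error (PAE) calculation described above.
def percent_absolute_error(estimate, target, scale=100):
    """PAE = |estimated number - target number| / scale of the number line * 100."""
    return abs(estimate - target) / scale * 100

# Example: a child places the target number 42 at position 36 on a 0-100 line.
print(percent_absolute_error(36, 42))  # 6.0
```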
Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions presented the lowest addend first (e.g., 3 + 5) and half presented the highest addend first (e.g., 6 + 3). The task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen, accompanied by a sound, and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represented the correct result; participants were asked to select the option that was closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below it, and remained on the screen until participants responded by clicking on one of the options. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. The two quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison), the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant, following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.
During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As a form of feedback, within each item, the participant's estimate remained visible and the correct position of the target number appeared on the number line. When the estimate's PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5, the message read “Well done, so close!”; and when PAE was higher than 5, the message read “Good try!” Numbers were presented in random order.
Age = age in ‘years, months’ at the start of the study
Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)
Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed of three parts. The first part refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.
The second part of the variable name refers to the task, as follows:
DC = dot comparison
SDC = single-digit computation
NLE_UT = number line estimation (untrained set)
NLE_T= number line estimation (trained set)
CE = multidigit computational estimation
NC = number comparison
The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).
Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
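As an illustration of this naming scheme (a hypothetical helper, not part of the dataset or its accompanying materials), a variable name can be split into its phase, session, task, and measure components in Python:

```python
import re

# Hypothetical helper illustrating the variable naming scheme described above.
TASKS = {
    "DC": "dot comparison",
    "SDC": "single-digit computation",
    "NLE_UT": "number line estimation (untrained set)",
    "NLE_T": "number line estimation (trained set)",
    "CE": "multidigit computational estimation",
    "NC": "number comparison",
}

def parse_variable(name):
    """Split e.g. 'Treat3_NLE_UT_pae' into phase, session, task, and measure."""
    match = re.match(r"(Base|Treat|Post)(\d+)_(.+)_(acc|pae)$", name, re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognised variable name: {name}")
    phase, session, task, measure = match.groups()
    return {
        "phase": phase.capitalize(),
        "session": int(session),
        "task": TASKS.get(task.upper(), task),
        "measure": "total correct responses" if measure.lower() == "acc" else "percent absolute error",
    }

print(parse_variable("Base2_NC_acc"))
print(parse_variable("Treat3_NLE_UT_pae"))
```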
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.
The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., a spatial co-registration feature). The filtered range data were generated by applying frequency-domain filtering to eliminate image disturbances (e.g., surface variations and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact of different types of image data on deep convolutional neural network (DCNN) performance.
If you share or use this dataset, please cite [4] and [5] in any relevant documentation.
In addition, an image dataset for crack classification has also been published at [6].
References:
[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873
[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605
[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434
[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678
[5] Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044
[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered to the book "A gyrokinetic calculation of transmission & reflection of the fast wave in the ion cyclotron range of frequencies". It features 7 columns including author, publication date, language, and book publisher.
The EPA Control Measure Dataset is a collection of documents describing air pollution control measures available to regulated facilities for the control and abatement of air pollution emissions from a range of regulated source types, whether directly through the use of technical measures, or indirectly through economic or other measures.
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.
This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.
It can be accessed through the following means. File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract: We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description:
• “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
• “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Required R packages:
• For running “CWVS_LMC.txt”: msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), BayesLogit (sampling from the Polya-Gamma distribution)
• For running “Results_Summary.txt”: plotrix (plotting the posterior means and credible intervals)
Instructions for Use / Reproducibility. What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”
Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set.
Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.
Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.
Permissions: These are simulated data without any identifying information or informative birth-level covariates; the provided weekly average pregnancy exposures have already been standardized as described above, while the medians and IQRs are not given, which further protects identifiability of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analysing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions for how they expect the program to behave while handling the data, which can also be stored in the simple object system. For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, targeted to help inform and guide the work of R users and statisticians. It provides information about the different types of statistical data analysis and methods, and the best scenarios for using each of them in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for the reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples, ranging from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, it brings statistics and computer programming together for research.
The U.S. Geological Survey has been characterizing the regional variation in shear stress on the sea floor and sediment mobility through statistical descriptors. The purpose of this project is to identify patterns in stress in order to inform habitat delineation or decisions for anthropogenic use of the continental shelf. The statistical characterization spans the continental shelf from the coast to approximately 120 m water depth, at approximately 5 km resolution. Time-series of wave and circulation are created using numerical models, and near-bottom output of steady and oscillatory velocities and an estimate of bottom roughness are used to calculate a time-series of bottom shear stress at 1-hour intervals. Statistical descriptions such as the median and 95th percentile, which are the output included with this database, are then calculated to create a two-dimensional picture of the regional patterns in shear stress. In addition, time-series of stress are compared to critical stress values at select points calculated from observed surface sediment texture data to determine estimates of sea floor mobility.
https://creativecommons.org/publicdomain/zero/1.0/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was acquired during Cyber Czech – a hands-on cyber defense exercise (Red Team/Blue Team) held in March 2019 at Masaryk University, Brno, Czech Republic. Network traffic flows and a high variety of event logs were captured in an exercise network deployed in the KYPO Cyber Range Platform.
Contents
The dataset covers two distinct time intervals, which correspond to the official schedule of the exercise. The timestamps provided below are in the ISO 8601 date format.
Day 1, March 19, 2019
Start: 2019-03-19T11:00:00.000000+01:00
End: 2019-03-19T18:00:00.000000+01:00
Day 2, March 20, 2019
Start: 2019-03-20T08:00:00.000000+01:00
End: 2019-03-20T15:30:00.000000+01:00
The captured and collected data were normalized into three distinct event types and they are stored as structured JSON. The data are sorted by a timestamp, which represents the time they were observed. Each event type includes a raw payload ready for further processing and analysis. The description of the respective event types and the corresponding data files follows.
cz.muni.csirt.IpfixEntry.tgz – an archive of IPFIX traffic flows enriched with an additional payload of parsed application protocols in raw JSON.
cz.muni.csirt.SyslogEntry.tgz – an archive of Linux Syslog entries with the payload of corresponding text-based log messages.
cz.muni.csirt.WinlogEntry.tgz – an archive of Windows Event Log entries with the payload of original events in raw XML.
Each archive listed above includes a directory of the same name with the following four files, ready to be processed.
data.json.gz – the actual data entries in a single gzipped JSON file.
dictionary.yml – data dictionary for the entries.
schema.ddl – data schema for Apache Spark analytics engine.
schema.jsch – JSON schema for the entries.
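As an illustration only (not part of the dataset documentation), the entries in data.json.gz can be read in Python roughly as follows, assuming one JSON object per line; inspect dictionary.yml and schema.jsch for the actual field names and structure:

```python
import gzip
import json

# Illustrative sketch: stream entries from one of the extracted archives.
# Assumes newline-delimited JSON; adjust if the file instead holds a single JSON array.
path = "cz.muni.csirt.SyslogEntry/data.json.gz"  # example path after extracting the .tgz archive

with gzip.open(path, mode="rt", encoding="utf-8") as handle:
    for line in handle:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Field names are assumptions; see dictionary.yml for the actual data dictionary.
        print(sorted(entry.keys())[:5])
```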
Finally, the exercise network topology is described in the machine-readable NetJSON format and is part of the auxiliary files archive – auxiliary-material.tgz – which includes the following.
global-gateway-config.json – the network configuration of the global gateway in the NetJSON format.
global-gateway-routing.json – the routing configuration of the global gateway in the NetJSON format.
redteam-attack-schedule.{csv,odt} – the schedule of the Red Team attacks in CSV and ODT format. Source for Table 2.
redteam-reserved-ip-ranges.{csv,odt} – the list of IP segments reserved for the Red Team in CSV and ODT format. Source for Table 1.
topology.{json,pdf,png} – the topology of the complete Cyber Czech exercise network in the NetJSON, PDF and PNG format.
topology-small.{pdf,png} – simplified topology in the PDF and PNG format. Source for Figure 1.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project provides a national unified database of residential building retrofit measures and the associated retail prices an end-user might experience. These data are accessible to software programs that evaluate the most cost-effective retrofit measures to improve the energy efficiency of residential buildings, and are used in the consumer-facing website https://remdb.nrel.gov/
This publicly accessible, centralized database of retrofit measures offers the following benefits:
This database provides full price estimates for many different retrofit measures. For each measure, the database provides a range of prices, as the data for a measure can vary widely across regions, houses, and contractors. Climate, construction, home features, local economy, maturity of a market, and geographic location are some of the factors that may affect the actual price of these measures.
This database is not intended to provide specific cost estimates for a specific project; rather, it is meant to help determine which measures may be more cost-effective. The cost estimates do not include any rebates or tax incentives that may be available for the measures. The National Renewable Energy Laboratory (NREL) makes every effort to ensure accuracy of the data; however, NREL does not assume any legal liability or responsibility for the accuracy or completeness of the information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and codes for "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023 "
The MATLAB codes and related datasets are used for generating the figures for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023".
Files and variables
File 1: Data_and_Code.zip
Directory: Main_function
Description: Includes MATLAB scripts and functions. Each script includes a description that guides the user on how to use it and how to find the dataset used for processing.
MATLAB Main Scripts: Include all the steps to process the data and to output figures and videos.
Script_1_Ice_velocity_process_flow.m
Script_2_strain_rate_process_flow.m
Script_3_DROT_grounding_line_extraction.m
Script_4_Read_ICESat2_h5_files.m
Script_5_Extraction_results.m
MATLAB functions: files containing the MATLAB functions that support the main scripts:
1_Ice_velocity_code: Includes MATLAB functions related to ice velocity post-processing, including outlier removal, filtering, correction for atmospheric and tidal effects, inverse-weighted averaging, and error estimation.
2_strain_rate: Includes MATLAB functions related to strain rate calculation.
3_DROT_extract_grounding_line_code: Includes MATLAB functions to convert the range offset results output from GAMMA to differential vertical displacement and to use the result to extract the grounding line.
4_Extract_data_from_2D_result: Includes MATLAB functions used to extract profiles from 2D data.
5_NeRD_Damage_detection: Modified code from Izeboud et al. 2023. When applying this code, please also cite Izeboud et al. 2023 (https://www.sciencedirect.com/science/article/pii/S0034425722004655).
6_Figure_plotting_code: Includes MATLAB functions related to the figures in the paper and the supporting information.
Directory: data_and_result
Description: Includes directories that store the results output from MATLAB. Users only need to modify the paths in the MATLAB scripts to their own paths.
1_origin: Sample data ("PS-20180323-20180329", "PS-20180329-20180404", "PS-20180404-20180410") output from the GAMMA software in GeoTIFF format that can be used to calculate DROT and velocity. Includes displacement, theta, phi, and ccp.
2_maskccpN: Removes outliers with ccp < 0.05 and converts displacement to velocity (m/day).
3_rockpoint: Extracts velocities at non-moving regions.
4_constant_detrend: Removes orbit error.
5_Tidal_correction: Removes atmospheric and tidal induced error.
6_rockpoint: Extracts non-aggregated velocities at non-moving regions.
6_vx_vy_v: Transforms velocities from va/vr to vx/vy.
7_rockpoint: Extracts aggregated velocities at non-moving regions.
7_vx_vy_v_aggregate_and_error_estimate: Inverse-weighted average of three ice velocity maps and calculation of the error maps.
8_strain_rate: Calculates strain rate from the aggregated ice velocity.
9_compare: Stores the results before and after tidal correction and aggregation.
10_Block_result: Time series results extracted from 2D data.
11_MALAB_output_png_result: Stores .png files and time series results.
12_DROT: Differential Range Offset Tracking results.
13_ICESat_2: ICESat-2 .h5 and .mat files can be put here (this directory only includes the samples from tracks 0965 and 1094).
14_MODIS_images: MODIS images can be stored here.
shp: grounding line, rock region, ice front, and other shape files.
File 2 : PIG_front_1947_2023.zip
Includes ice front position shapefiles from 1947 to 2023, which are used for plotting Figure 1 in the paper.
File 3 : PIG_DROT_GL_2016_2021.zip
Includes grounding line position shapefiles from 2016 to 2021, which are used for plotting Figure 1 in the paper.
Data was derived from the following sources:
These links can be found in the MATLAB scripts or in the "Open Research" section of the paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a point feature representing range points within the City and County of Denver. Range points are termini for range lines, which serve as offsets to right-of-way lines and block lines. Range points are typically located below surface streets.
ACCESS CONSTRAINTS:
None.
USE CONSTRAINTS: The City and County of Denver is not responsible and shall not be liable to the user for damages of any kind arising out of the use of data or information provided by the City and County of Denver, including the installation of the data or information, its use, or the results obtained from its use.
ANY DATA OR INFORMATION PROVIDED BY THE City and County of Denver IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Data or information provided by the City and County of Denver shall be used and relied upon only at the user's sole risk, and the user agrees to indemnify and hold harmless the City and County of Denver, its officials, officers and employees from any liability arising out of the use of the data/information provided.
NOT FOR ENGINEERING PURPOSES
XYZ Pvt Ltd is an e-commerce company dealing in a wide range of healthy products combined with the power of Artificial Intelligence. Recently, however, it has started facing an issue of HIGH return rates throughout India. (A return order is when the order is in transit but the customer refuses to accept it, citing various reasons.)
The dataset has 1,600 orders with details ranging from city and state (for geographical analysis) to dates (for time-series analysis); each product's category, name, cost and ID are also given for more detailed analysis.
If there are columns you would like me to add please let me know in the comments.
The latest data has been cleaned.
Study the dataset to figure out the Return Rate Patterns amongst the customers. Every column has been carefully added for you to analyze which may/may not directly influence the return rates.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Abstract: This data shows the model nodes, indicating water level only and/or flow and water levels along the centre-line of rivers that have been modelled to generate the CFRAM flood maps. The nodes estimate maximum design event flood flows and maximum flood levels. Flood event probabilities are referred to in terms of a percentage Annual Exceedance Probability, or ‘AEP’. This represents the probability of an event of this, or greater, severity occurring in any given year. These probabilities may also be expressed as odds (e.g. 100 to 1) of the event occurring in any given year. They are also commonly referred to in terms of a return period (e.g. the 100-year flood), although this period is not the length of time that will elapse between two such events occurring, as, although unlikely, two very severe events may occur within a short space of time.
The following sets out a range of flood event probabilities for which fluvial and coastal flood maps are typically developed, expressed in terms of Annual Exceedance Probability (AEP), and identifies their parallels under other forms of expression:
10% (High Probability) Annual Exceedance Probability, which can also be expressed as the 10 Year Return Period and as 10:1 odds of occurrence in any given year.
1% (Medium Probability - Fluvial/River Flood Maps) Annual Exceedance Probability, which can also be expressed as the 100 Year Return Period and as 100:1 odds of occurrence in any given year.
0.5% (Medium Probability - Coastal Flood Maps) Annual Exceedance Probability, which can also be expressed as the 200 Year Return Period and as 200:1 odds of occurrence in any given year.
0.1% (Low Probability) Annual Exceedance Probability, which can also be expressed as the 1000 Year Return Period and as 1000:1 odds of occurrence in any given year.
The Mid-Range Future Scenario extents were generated taking into account the potential effects of climate change, using an increase in rainfall of 20% and a sea level rise of 500mm (20 inches). Data has been produced for the 'Areas of Further Assessment' (AFAs), as required by the EU 'Floods' Directive [2007/60/EC] and designated under the Preliminary Flood Risk Assessment, and also for other reaches between the AFAs and down to the sea that are referred to as 'Medium Priority Watercourses' (MPWs). River reaches that have been modelled are indicated by the CFRAM Modelled River Centrelines dataset. Flooding from other reaches of river may occur, but has not been mapped, and so areas that are not shown as being within a flood extent may therefore be at risk of flooding from unmodelled rivers (as well as from other sources). The purpose of the Flood Maps is not to designate individual properties at risk of flooding. They are community-based maps.
Lineage: Fluvial and coastal flood map data is developed using hydrodynamic modelling, based on calculated design river flows and extreme sea levels, surveyed channel cross-sections, in-bank / bank-side / coastal structures, Digital Terrain Models, and other relevant datasets (e.g. land use, data on past floods for model calibration, etc.). The process may vary for particular areas or maps. Technical Hydrology and Hydraulics Reports set out full technical details on the derivation of the flood maps. For fluvial flood levels, calibration and verification of the models make use of the best available data, including hydrometric records, photographs, videos, press articles and anecdotal information.
Subject to the availability of suitable calibration data, models are verified in so far as possible to target vertical water level accuracies of approximately +/-0.2m for areas within the AFAs, and approximately +/-0.4m along the MPWs. For coastal flood levels, the accuracy of the predicted annual exceedance probability (AEP) of combined tide and surge levels depends on the accuracy of the various components used in deriving these levels, i.e. the accuracy of the tidal and surge model, the accuracy of the statistical data and the accuracy of the conversion from marine datum to land levelling datum. The output of the water level modelling, combined with the extreme value analysis undertaken as detailed above, is generally within +/-0.2m for confidence limits of 95% at the 0.1% AEP. Higher probability (lower return period) events are expected to have tighter confidence limits.
v101 (March 2025): The section of the map near Oranmore, Galway, was updated following a map review process; see https://www.floodinfo.ie/map-review/ for further information. Map Review Code: MR019.
v102 (July 2025): The section of the map near Claregalway was updated following a map review process; see https://www.floodinfo.ie/map-review/ for further information. Map Review Code: MR057.
Purpose: The data has been developed to comply with the requirements of the European Communities (Assessment and Management of Flood Risks) Regulations 2010 to 2015 (the “Regulations”) (implementing Directive 2007/60/EC) for the purposes of establishing a framework for the assessment and management of flood risks, aiming at the reduction of adverse consequences for human health, the environment, cultural heritage and economic activity associated with floods.
https://creativecommons.org/publicdomain/zero/1.0/
Overview
The Human Vital Signs Dataset is a comprehensive collection of key physiological parameters recorded from patients. This dataset is designed to support research in medical diagnostics, patient monitoring, and predictive analytics. It includes both original attributes and derived features to provide a holistic view of patient health.
Attributes
Patient ID. Description: A unique identifier assigned to each patient. Type: Integer. Example: 1, 2, 3, ...
Heart Rate. Description: The number of heartbeats per minute. Type: Integer. Range: 60-100 bpm (for this dataset). Example: 72, 85, 90
Respiratory Rate. Description: The number of breaths taken per minute. Type: Integer. Range: 12-20 breaths per minute (for this dataset). Example: 16, 18, 15
Timestamp. Description: The exact time at which the vital signs were recorded. Type: Datetime. Format: YYYY-MM-DD HH:MM. Example: 2023-07-19 10:15:30
Body Temperature. Description: The body temperature measured in degrees Celsius. Type: Float. Range: 36.0-37.5°C (for this dataset). Example: 36.7, 37.0, 36.5
Oxygen Saturation. Description: The percentage of oxygen-bound hemoglobin in the blood. Type: Float. Range: 95-100% (for this dataset). Example: 98.5, 97.2, 99.1
Systolic Blood Pressure. Description: The pressure in the arteries when the heart beats (systolic pressure). Type: Integer. Range: 110-140 mmHg (for this dataset). Example: 120, 130, 115
Diastolic Blood Pressure. Description: The pressure in the arteries when the heart rests between beats (diastolic pressure). Type: Integer. Range: 70-90 mmHg (for this dataset). Example: 80, 75, 85
Age. Description: The age of the patient. Type: Integer. Range: 18-90 years (for this dataset). Example: 25, 45, 60
Gender. Description: The gender of the patient. Type: Categorical. Categories: Male, Female. Example: Male, Female
Weight (kg). Description: The weight of the patient in kilograms. Type: Float. Range: 50-100 kg (for this dataset). Example: 70.5, 80.3, 65.2
Height (m). Description: The height of the patient in meters. Type: Float. Range: 1.5-2.0 m (for this dataset). Example: 1.75, 1.68, 1.82
Derived Features
Derived_HRV (Heart Rate Variability). Description: A measure of the variation in time between heartbeats. Type: Float. Formula: HRV = (standard deviation of heart rate over a period) / (mean heart rate over the same period). Example: 0.10, 0.12, 0.08
Derived_Pulse_Pressure (Pulse Pressure). Description: The difference between systolic and diastolic blood pressure. Type: Integer. Formula: PP = Systolic Blood Pressure − Diastolic Blood Pressure. Example: 40, 45, 30
Derived_BMI (Body Mass Index). Description: A measure of body fat based on weight and height. Type: Float. Formula: BMI = Weight (kg) / (Height (m))². Example: 22.8, 25.4, 20.3
Derived_MAP (Mean Arterial Pressure). Description: An average blood pressure in an individual during a single cardiac cycle. Type: Float. Formula: MAP = Diastolic Blood Pressure + (1/3) × (Systolic Blood Pressure − Diastolic Blood Pressure). Example: 93.3, 100.0, 88.7
Target Feature
Risk Category. Description: Classification of patients into "High Risk" or "Low Risk" based on their vital signs. Type: Categorical. Categories: High Risk, Low Risk. Example: High Risk, Low Risk
Criteria for High Risk (any of the following conditions):
Heart Rate > 90 bpm or < 60 bpm
Respiratory Rate > 20 breaths per minute or < 12 breaths per minute
Body Temperature > 37.5°C or < 36.0°C
Oxygen Saturation < 95%
Systolic Blood Pressure > 140 mmHg or < 110 mmHg
Diastolic Blood Pressure > 90 mmHg or < 70 mmHg
BMI > 30 or < 18.5
Low Risk: none of the above conditions.
This dataset, with a total of 200,000 samples, provides a robust foundation for various machine learning and statistical analysis tasks aimed at understanding and predicting patient health outcomes based on vital signs. The inclusion of both original attributes and derived features enhances the richness and utility of the dataset.
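To make the derived-feature formulas and the risk rule concrete, here is a minimal Python sketch; the field names are assumptions based on the attribute list above and should be adapted to the actual column names (Derived_HRV is omitted because it requires a series of heart-rate readings, std/mean over a period):

```python
# Illustrative sketch of the derived features and risk criteria described above.
# Field names are assumptions, not the dataset's actual column names.
def derive_features(rec):
    pp = rec["systolic_bp"] - rec["diastolic_bp"]                                  # Derived_Pulse_Pressure
    bmi = rec["weight_kg"] / rec["height_m"] ** 2                                  # Derived_BMI
    map_ = rec["diastolic_bp"] + (rec["systolic_bp"] - rec["diastolic_bp"]) / 3.0  # Derived_MAP
    return pp, bmi, map_

def risk_category(rec, bmi):
    high = (
        rec["heart_rate"] > 90 or rec["heart_rate"] < 60
        or rec["respiratory_rate"] > 20 or rec["respiratory_rate"] < 12
        or rec["body_temperature"] > 37.5 or rec["body_temperature"] < 36.0
        or rec["oxygen_saturation"] < 95
        or rec["systolic_bp"] > 140 or rec["systolic_bp"] < 110
        or rec["diastolic_bp"] > 90 or rec["diastolic_bp"] < 70
        or bmi > 30 or bmi < 18.5
    )
    return "High Risk" if high else "Low Risk"

patient = {
    "heart_rate": 85, "respiratory_rate": 16, "body_temperature": 36.7,
    "oxygen_saturation": 98.5, "systolic_bp": 120, "diastolic_bp": 80,
    "weight_kg": 70.5, "height_m": 1.75,
}
pp, bmi, map_ = derive_features(patient)
print(pp, round(bmi, 1), round(map_, 1), risk_category(patient, bmi))  # 40 23.0 93.3 Low Risk
```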
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Section 1: Introduction
Brief overview of dataset contents:
The current database contains anonymised data collected during exercise testing services (cycling, rowing, kayaking and running) performed on male and female participants, provided by the Human Performance Laboratory, School of Medicine, Trinity College Dublin, Dublin 2, Ireland.
835 graded incremental exercise test files (285 cycling, 266 rowing / kayaking, 284 running)
Description file with each row representing a test file - COLUMNS: file name (AXXX), sport (cycling, running, rowing or kayaking)
Anthropometric data of participants by sport (age, gender, height, body mass, BMI, skinfold thickness, % body fat, lean body mass) and haematological data (haemoglobin concentration (Hb), haematocrit (Hct), red blood cell (RBC) count and white blood cell (WBC) count)
Test data (HR, VO2 and lactate data) at rest and across a range of exercise intensities
Derived physiological indices quantifying each individual’s endurance profile
Following a request from athletes seeking assessment by phone or e-mail, the test protocol, risks, benefits, and test and medical requirements were explained verbally or by return e-mail. Subsequently, an appointment for an exercise assessment was arranged following the regulatory reflection period (7 days). Following this regulatory period, each participant's verbal consent was obtained pre-test; for participants under 18 years of age, parent/guardian consent was obtained in writing. Ethics approval was obtained from the Faculty of Health Sciences ethics committee and all testing procedures were performed in compliance with Declaration of Helsinki guidelines.
All consenting participants were required to attend the laboratory on one occasion in a rested, carbohydrate-loaded and well-hydrated state, and, for male participants, clean shaven in the facial region. All participants underwent a pre-test medical examination, including assessment of resting blood pressure, pulmonary function testing and a haematological (Coulter Counter Act Diff, Beckmann Coulter, CA, US) review performed by a qualified medical doctor prior to exercise testing. Any person presenting with cardiac abnormalities, respiratory difficulties, symptoms of cold or influenza, musculoskeletal injury that could impair performance, diabetes, hypertension, metabolic disorders, or any other contra-indicatory symptoms was excluded. In addition, participants completed a medical questionnaire detailing training history, previous personal and family health abnormalities, recent illness or injury, menstrual status for female participants, as well as details of recent travel, current vaccination status, and current medications, supplements and allergies. Barefoot height in metres (Holtain, Crymych, UK), body mass in kilograms (counter-balanced scales; Seca, Hamburg, Germany) and skinfold thickness in millimetres using a Harpenden skinfold caliper (Bath International, West Sussex, UK) were recorded pre-exercise.
Section 2: Testing protocols
2.1: Cycling
A continuous graded incremental exercise test (GxT) to volitional exhaustion was performed on an electromagnetically braked cycle ergometer (Lode Excalibur Sport, Groningen, The Netherlands). Participants initially identified a cycling position in which they were most comfortable by adjusting saddle height, saddle fore-aft position relative to the crank axis, saddle-to-handlebar distance and handlebar height. Participants' feet were secured to the ergometer using their own cycling shoes with cleats and accompanying pedals. The protocol commenced with a 15-min warm-up at a workload of 120 Watt (W), followed by a 10-min rest. The GxT began with a 3-min stationary phase for resting data collection, followed by an active phase commencing at a workload of 100 or 120 W for female and male participants, respectively, and subsequently increasing by 20, 30 or 40 W every 3 min depending on gender and current competition category. During the assessment participants maintained a constant self-selected cadence chosen during their warm-up (permitted window of 5 rev.min−1 within a permitted absolute range of 75 to 95 rev.min−1), and the test was terminated when a participant was no longer able to maintain a constant cadence.
Heart rate (HR) data were recorded continuously by radio-telemetry using a Cosmed HR monitor (Cosmed, Rome, Italy). During the test, blood samples were collected from the middle finger of the right hand at the end of the second minute of each 3-min interval. The fingertip was cleaned to remove any sweat or blood and lanced using a long point sterile lancet (Braun, Melsungen, Germany). The blood sample was collected into a heparinised capillary tube (Brand, Wertheim, Germany) by holding the tube horizontal to the droplet and allowing transfer by capillary action. Subsequently, a 25μL aliquot of whole blood was drawn from the capillary tube using a YSI syringepet (YSI, OH, USA) and added into the chamber of a YSI 1500 Sport lactate analyser (YSI, OH, USA) for determination of non-lysed [Lac] in mmol.L−1. The lactate analyser was calibrated to the manufacturer’s requirements (± 0.05 mmol.L−1) before each test using a standard solution (YSI, OH, USA) of known concentration (5 mmol.L−1) and analyser linearity was confirmed using either a 15 or 30 mmol.L-1 standard solution (YSI, OH, USA).
Gas exchange variables including respiration rate (Rf in breaths.min-1), minute ventilation (VE in L.min-1), oxygen consumption (VO2 in L.min-1 and in mL.kg-1.min-1) and carbon dioxide production (VCO2 in L.min-1), were measured on a breath-by-breath basis throughout the test, using a cardiopulmonary exercise testing unit (CPET) and an associated software package (Cosmed, Rome, Italy). Participants wore a face mask (Hans Rudolf, KA, USA) which was connected to the CPET unit. The metabolic unit was calibrated prior to each test using ambient air and an alpha certified gas mixture containing 16% O2, 5% CO2 and 79% N2 (Cosmed, Rome, Italy). Volume calibration was performed using a 3L gas calibration syringe (Cosmed, Rome, Italy). Barometric pressure recorded by the CPET was confirmed by recording barometric pressure using a laboratory grade barometer.
Following testing mean HR and mean VO2 data at rest and during each exercise increment were computed and tabulated over the final minute of each 3-min interval. A graphical plot of [Lac], mean VO2 and mean HR versus cycling workload was constructed and analysed to quantify physiological endurance indices, see Data Analysis section. Data for VO2 peak in L.min-1 (absolute) and in mL.kg-1.min-1 (relative) and VE peak in L.min-1 were reported as the peak data recorded over any 10 consecutive breaths recorded during the last minute of the final exercise increment.
2.2: Running protocol
A continuous graded incremental exercise test (GxT) to volitional exhaustion was performed on a motorised treadmill (Powerjog, Birmingham, UK). The running protocol, performed at a gradient of 0%, commenced with a 15-min warm-up at a velocity (km.h-1) which was lower than the participant's reported typical weekly long run (>60 min) on-road training velocity. The warm-up was followed by a 10-min rest / dynamic stretching phase. From a safety perspective, during all running GxTs participants wore a suspended lightweight safety harness to minimise any potential falls risk. The GxT began with a 3-min stationary phase for resting data collection, followed by an active phase commencing at a sub-maximal running velocity which was lower than the participant's reported typical weekly long run (>60 min) on-road training velocity, and subsequently increased by ≥ 1 km.h-1 every 3 min depending on gender and current competition category. The test was terminated when a participant was no longer able to maintain the imposed treadmill velocity.
Measurement variables, equipment and pre-test calibration procedures, timing and procedure for measurement of selected variables and subsequent data analysis were as outlined in Section 2.1.
2.3: Rowing / kayaking protocol
A discontinuous graded incremental exercise test (GxT) to volitional exhaustion was performed on a Concept 2C rowing ergometer (Concept, VA, US) for rowers or a Dansprint kayak ergometer (Dansprint, Hvidovre, Denmark) for flat-water kayakers. The protocol commenced with a 15-min low-intensity warm-up at a workload (W) dependent on gender, sport and competition category, followed by a 10-min rest. For rowing, the flywheel damping (120, 125 or 130 W) was set dependent on gender and competition category. For kayaking, the bungee cord tension was adjusted by individual participants to suit their requirements. A discontinuous protocol of 3-min exercise at a targeted load followed by a 1-min rest phase, to facilitate stationary earlobe capillary blood sample collection and resetting of the ergometer display (Dansprint ergometer), was used. The GxT began with a 3-min stationary phase for resting data collection, followed by an active phase commencing at a sub-maximal load of 80 to 120 W for rowing or 50 to 90 W for kayaking, and subsequently increased by 20, 30 or 40 W every 3 min depending on gender, sport and current competition category. The test was terminated when a participant was no longer able to maintain the targeted workload.
Measurement variables, equipment and pre-test calibration procedures, timing and procedure for measurement of selected variables and subsequent data analysis were as outlined in Section 2.1.
3.1: Data analysis
Constructed graphical plots (HR, VO2 and [Lac] versus load / velocity) were analysed to quantify the following: load / velocity at TLac, HR at TLac, [Lac] at TLac, % of VO2 peak at TLac, % of HRmax at TLac, load / velocity and HR at a nominal [Lac] of 2 mmol.L-1, load / velocity, VO2 and [Lac] at a nominal HR of
Countix is a real-world dataset of repetition videos collected in the wild (i.e., YouTube), covering a wide range of semantic settings with significant challenges such as camera and object motion, diverse sets of periods and counts, and changes in the speed of repeated actions. Countix includes repeated videos of workout activities (squats, pull ups, battle rope training, exercising arm), dance moves (pirouetting, pumping fist), playing instruments (playing ukulele), using tools repeatedly (hammer hitting objects, chainsaw cutting wood, slicing onion), artistic performances (hula hooping, juggling soccer ball), sports (playing ping pong and tennis) and many others. Figure 6 illustrates some examples from the dataset as well as the distribution of repetition counts and period lengths.
Raw data to calculate rate of adaptation: raw dataset for rate of adaptation calculations (Figure 1) and related statistics. File: dataall.csv
R code to analyze raw data for rate of adaptation. File: Competition Analysis.R
Raw data to calculate effective population sizes. File: datacount.csv
R code to analyze effective population sizes: R code used to analyze effective population sizes; Figure 2. File: Cell Count Ne.R
R code to determine our best estimate of the dominance coefficient in each environment: R code to produce Figures 3, S4, S5 (what is the best estimate of dominance?). Note: the competition and effective population size R code must be run first in the same session. File: what is h.R
Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA, including Point of Interest (POI) data and foot traffic. The dataset is divided into 150 x 150 m areas (geohash 7) and has over 50 variables.
- Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven to be useful for real estate investment.
- Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.
- Ideal for training ML models: The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguish this dataset is the use of anonymized and user-compliant mobile device GPS location, enriched with other alternative and public data.
- Easy to use: Our dataset is user-friendly and can be easily integrated into your current models. Also, we can deliver your data in different formats, like .csv, according to your analysis requirements.
- Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation. Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).
Answer questions like:
- What places does my target user visit in a particular area? Which are the best areas to place a new POS?
- What is the average yearly income of users in a particular area?
- What is the influx of visits that my competition receives?
- What is the volume of traffic surrounding my current POS?
This dataset is useful for getting insights from industries like:
- Retail & FMCG
- Banking, Finance, and Investment
- Car Dealerships
- Real Estate
- Convenience Stores
- Pharma and medical laboratories
- Restaurant chains and franchises
- Clothing chains and franchises
Our dataset includes more than 50 variables, such as:
- Number of pedestrians seen in the area.
- Number of vehicles seen in the area.
- Average speed of movement of the vehicles seen in the area.
- Points of Interest (POIs) (in number and type) seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others).
- Average yearly income range (anonymized and aggregated) of the devices seen in the area.
Notes to better understand this dataset:
- POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library.
- Category confidences, for example "food_drinks_tobacco_retail_confidence", indicate how confident we are in the existence of food/drink/tobacco retail locations in the area.
- We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model that was trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.
How efficient is a geohash?
Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use. A geohash ranges from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash7 (150m x 150m).
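For readers unfamiliar with geohashes, the sketch below implements the standard geohash encoding in plain Python (illustrative only; the coordinates are made up, and the dataset itself is already delivered aggregated at geohash 7):

```python
# Self-contained sketch of standard geohash encoding (base-32 alphabet, interleaved bits).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, use_lon = [], True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for bit in bits[i:i + 5]:
            idx = (idx << 1) | bit
        chars.append(BASE32[idx])
    return "".join(chars)

# Example (made-up coordinates): a 7-character geohash covers roughly a 150 m x 150 m cell.
print(geohash_encode(40.7580, -73.9855))
```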
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
To enable the evaluation of HPE models and the development of exercise feedback systems, we produced a new rehabilitation dataset (REHAB24-6). The main focus is on a diverse range of exercises, views, body heights, lighting conditions, and exercise mistakes. With the publicly available RGB videos, skeleton sequences, repetition segmentation, and exercise correctness labels, this dataset offers the most comprehensive testbed for exercise-correctness-related tasks.
Contents
65 recordings (184,825 frames, 30 FPS):
RGB videos from two cameras (videos.zip, horizontal = Camera17, vertical = Camera18);
3D and 2D projected positions of 41 motion capture markers (<2/3>d_markers.zip, marker labels in marker_names.txt);
3D and 2D projected positions of 26 skeleton joints (<2/3>d_joints.zip, joint labels in joint_names.txt);
Annotation of 1,072 exercise repetitions (Segmentation.csv, with frame indices referring only to the 30 FPS data, described in Segmentation.txt; see the loading sketch after this list):
Temporal segmentation (start/end frame, most between 2–5 seconds);
Binary correctness label (around 90 from each category in each exercise, except Ex3 with around 50);
Exercise direction (around 90 from each direction in each exercise);
Lighting conditions label.
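For orientation, the following is a minimal sketch of how the segmentation file might be combined with a joint sequence to extract per-repetition clips. The column names and file paths below are assumptions for illustration only; the actual schema is documented in Segmentation.txt.

```python
# Minimal sketch, with assumed column names and a placeholder file path:
# load the repetition annotations and slice one recording's joint sequence
# (frames x 26 joints x 3 coords) into per-repetition clips using the
# 30 FPS start/end frame indices.
import numpy as np
import pandas as pd

segments = pd.read_csv("Segmentation.csv")          # one row per annotated repetition
joints = np.load("some_recording_3d_joints.npy")    # placeholder path for one recording

repetitions = []
for _, row in segments.iterrows():
    # "start_frame", "end_frame", "correct" are assumed column names
    clip = joints[int(row["start_frame"]): int(row["end_frame"]) + 1]
    repetitions.append((clip, row["correct"]))

print(len(repetitions), "repetitions; first clip shape:", repetitions[0][0].shape)
```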
Recording Conditions
Our laboratory setup included 18 synchronized sensors (2 RGB video cameras and 16 ultra-wide motion capture cameras) spread around an 8.2 × 7 m room. The RGB cameras were located in the corners of the room, one in a horizontal position (hor.), providing a larger field of view (FoV), and one in a vertical position (ver.), resulting in a narrower FoV. Both types of cameras were synchronized at a sampling frequency of 30 frames per second (FPS).
The subjects wore motion capture body suits with 41 markers attached to them, which were detected by optical cameras. The OptiTrack Motive 2.3.0 software inferred the 3D positions of the markers in virtual centimeters and converted them into a skeleton with 26 joints, forming our human pose 3D ground truth (GT).
To acquire a 2D version of the ground truth in pixel coordinates, we projected the virtual coordinates into each camera using a simplified pinhole model. We estimated the parameters for this projection as follows. First, the virtual positions of the cameras were estimated using a measuring tape and knowledge of the virtual origin. Then, the orientation of the cameras was optimized by matching the virtual marker positions with their positions in the videos.
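As a rough illustration of that projection step (not the authors' calibration code; all parameter values below are placeholders), a simplified pinhole projection could look like this:

```python
# Generic pinhole-projection sketch: rotate/translate 3D marker positions into
# the camera frame, then project to pixel coordinates using focal lengths and
# the principal point. R, t, fx, fy, cx, cy are placeholder values.
import numpy as np

def project_points(points_3d, R, t, fx, fy, cx, cy):
    """points_3d: (N, 3) world coordinates; R: (3, 3) rotation; t: (3,) translation."""
    cam = points_3d @ R.T + t          # world -> camera coordinates
    x = cam[:, 0] / cam[:, 2]          # perspective division
    y = cam[:, 1] / cam[:, 2]
    u = fx * x + cx                    # pixel coordinates
    v = fy * y + cy
    return np.stack([u, v], axis=1)

# Example with placeholder extrinsics/intrinsics
R = np.eye(3)                          # camera orientation (the part optimized against the videos)
t = np.array([0.0, 0.0, 5.0])          # camera position offset
uv = project_points(np.array([[0.5, 1.2, 0.0]]), R, t, fx=1400, fy=1400, cx=960, cy=540)
print(uv)
```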
We also simulated changes in lighting conditions: a few videos were shot in the natural evening light, which resulted in worse visibility, while the rest were under artificial lighting.
Exercises
10 subjects participated in our recording and consented to release the data publicly: 6 males and 4 females of different ages (from 25 to 50) and fitness levels. A physiotherapist instructed the subjects on how to perform the exercises so that at least five repetitions were done in what he deemed the correct way and five more incorrectly. The participants had a certain degree of freedom, e.g., which leg they used in Ex4 and Ex5. Similarly, the physiotherapist suggested different exercise mistakes to each subject.
Ex1 = Arm abduction: sideways raising of the straightened right arm;
Ex2 = Arm VW: fluent transition of arms between V (arms straight up) and W (elbows down, hands up) shape;
Ex3 = Push-ups: push-ups with hands on a table;
Ex4 = Leg abduction: sideways raising of the straightened leg;
Ex5 = Leg lunge: pushing the knee of the back leg down while keeping a right angle at the front knee;
Ex6 = Squats.
Every exercise was also executed in two directions, resulting in different views of the subject depending on the camera. Facing the horizontal camera resulted in a front view for that camera and a profile from the other. Facing the wall between the cameras shows the subject from half-profile in both cameras. A rare direction, only used for push-ups due to the use of the table, was facing the vertical camera, with the views being reversed compared to the first orientation.
Citation
Cite the related conference paper:
Černek, A., Sedmidubsky, J., Budikova, P.: REHAB24-6: Physical Therapy Dataset for Analyzing Pose Estimation Methods. 17th International Conference on Similarity Search and Applications (SISAP), Springer, 14 pages, 2024.
License
This dataset is for academic or non-profit-organization noncommercial research use only. By using it, you agree to appropriately reference the paper above in any publication that makes use of it. For commercial purposes, contact us at info@visioncraft.ai.
The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset.
All participants were currently attending Year 1 of primary school at an independent school in New South Wales, Australia. For children to be able to eligible to participate they had to present with low mathematics achievement by performing at or below the 25th percentile in the Maths Problem Solving and/or Numerical Operations subtests from the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Participants were excluded from participating if, as reported by their parents, they have any other diagnosed disorders such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or uncorrected sensory disorders.
The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had 1 post-treatment measurement point.
The number of measurement points were distributed across participants as follows:
Participant 1 – 3 baseline, 6 treatment, 1 post-treatment
Participant 3 – 2 baseline, 7 treatment, 1 post-treatment
Participant 5 – 2 baseline, 5 treatment, 1 post-treatment
Participant 6 – 3 baseline, 4 treatment, 1 post-treatment
Participant 7 – 2 baseline, 5 treatment, 1 post-treatment
In each session across all three phases children were assessed in their performance on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. Furthermore, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.
Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed on all phases of the study. Trained items were assessed independent of the intervention during baseline and post-treatment phases, and performance on the intervention is used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error (PAE) [(number estimated - target number)/ scale of number line] x100.
Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the lowest addend first (e.g., 3 + 5) and half of the additions present the highest addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen accompanied by a sound and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represented the correct result; participants were asked to select the option closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below it, and it remained on the screen until participants responded by clicking on one of the options. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison), the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant, following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.
During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As a form of feedback, within each item, the participant’s estimate remained visible and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5, the message read “Well done, so close!”; and when PAE was higher than 5, the message read “Good try!” Numbers were presented in random order.
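For illustration only (this is not code from the study), the PAE formula and the feedback thresholds described above amount to the following:

```python
# Minimal illustrative sketch: compute percent absolute error (PAE) for a
# 0-100 bounded number line and pick the feedback message using the
# thresholds described in the text.
def percent_absolute_error(estimate: float, target: float, scale: float = 100.0) -> float:
    return abs(estimate - target) / scale * 100.0

def feedback_message(pae: float) -> str:
    if pae < 2.5:
        return "Excellent job"
    elif pae <= 5:
        return "Well done, so close!"
    else:
        return "Good try!"

pae = percent_absolute_error(estimate=48, target=52)   # 4.0
print(pae, feedback_message(pae))                       # 4.0 Well done, so close!
```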
Age = age in ‘years, months’ at the start of the study
Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)
Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed of three parts. The first refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.
The second part of the variable name refers to the task, as follows:
DC = dot comparison
SDC = single-digit computation
NLE_UT = number line estimation (untrained set)
NLE_T = number line estimation (trained set)
CE = multidigit computational estimation
NC = number comparison
The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).
Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
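As a convenience, here is a small sketch of how this naming convention could be parsed programmatically. It is illustrative only and simply follows the scheme described above; it is not part of the dataset.

```python
# Split a column name of the form <phase><session>_<task>_<measure>
# (e.g. "Treat3_NLE_UT_pae") into its three parts.
import re

TASKS = {"DC", "SDC", "NLE_UT", "NLE_T", "CE", "NC"}

def parse_variable(name: str):
    phase_session, rest = name.split("_", 1)
    phase = re.match(r"[A-Za-z]+", phase_session).group()      # e.g. 'Treat'
    session = int(re.search(r"\d+", phase_session).group())    # e.g. 3
    task, measure = rest.rsplit("_", 1)                        # e.g. 'NLE_UT', 'pae'
    assert task in TASKS and measure in {"acc", "pae"}
    return phase, session, task, measure

print(parse_variable("Base2_NC_acc"))        # ('Base', 2, 'NC', 'acc')
print(parse_variable("Treat3_NLE_UT_pae"))   # ('Treat', 3, 'NLE_UT', 'pae')
```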