Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the survey of consumer finances (scf) with r

the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, but only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit (peu) level - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pays the bills and therefore calls the shots. if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website, referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell.
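The combining step the codebook describes is language-agnostic, so here is a hedged sketch of the standard Rubin's rules for pooling one statistic estimated on each of the five implicates (the scf scripts do this in R via survey-design objects; consult the codebook's imputation section for the exact variance conventions the Fed uses). The numbers below are made up for illustration.

```python
def combine_implicates(estimates, variances):
    """Pool m implicate-level estimates via standard Rubin's rules."""
    m = len(estimates)
    qbar = sum(estimates) / m                                # combined point estimate
    ubar = sum(variances) / m                                # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)    # between-imputation variance
    total_var = ubar + (1 + 1 / m) * b                       # total variance
    return qbar, total_var

# five implicate-level statistics (fictitious) and their sampling variances
est = [77.1, 76.8, 77.4, 77.0, 76.9]
var = [4.0, 4.2, 3.9, 4.1, 4.0]
point, tv = combine_implicates(est, var)
```

Note that the total variance exceeds the average within-implicate variance whenever the implicates disagree, which is exactly why a naive single-file analysis yields confidence intervals that are too tight.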
the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.

this new github repository contains three scripts:

1989-2010 download all microdata.R
- initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
- loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
- break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
- save the five implicates and replicate weights to an r data file (.rda) for rapid future loading

2010 analysis examples.R
- prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
- load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
- demonstrate how to access the properties of a multiply-imputed survey design object
- cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
- run a quick t-test and regression, but only because you asked nicely

replicate FRB SAS output.R
- reproduce each and every statistic provided by the friendly folks at the federal reserve
- create a multiply-imputed, replicate-weighted survey design object
- re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences

click here to view these three scripts

for more detail about the survey of consumer finances (scf), visit:
- the federal reserve board of governors' survey of consumer finances homepage
- the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
- the survey of consumer finances wikipedia entry
- the official frequently asked questions

notes: nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there's also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
If you use this dataset for your work, please cite the related papers: A. Vysocky, S. Grushko, T. Spurny, R. Pastor and T. Kot, Generating Synthetic Depth Image Dataset for Industrial Applications of Hand Localisation, in IEEE Access, 2022, doi: 10.1109/ACCESS.2022.3206948.
S. Grushko, A. Vysocký, J. Chlebek, P. Prokop, HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments. preprint in arXiv, 2023, https://doi.org/10.48550/arXiv.2304.05826
The HaDR dataset is a multimodal dataset designed for human-robot gesture-based interaction research, consisting of RGB and depth frames with binary masks for each hand instance (i1, i2; single-class data). The dataset is entirely synthetic, generated using the Domain Randomization technique in CoppeliaSim. It can be used to train deep learning models to recognize hands using either a single modality (RGB or depth) or both simultaneously. The training and validation splits comprise 95K and 22K samples, respectively, with annotations provided in COCO format. The instances are uniformly distributed across the image boundaries. The vision sensor captures depth and color images of the scene, with the depth pixel values scaled into a single-channel 8-bit grayscale image over the range [0.2, 1.0] m. The following aspects of the scene were randomly varied during generation of the dataset:
• Number, colors, textures, scales, and types of distractor objects, selected from a set of 3D models of general tools and geometric primitives. A special type of distractor is an articulated dummy without hands (for instance-free samples).
• Hand gestures (9 options).
• Hand models' positions and orientations.
• Texture and surface properties (diffuse, specular, and emissive) and number of textures (from none to 2) of the object of interest, as well as of its background.
• Number and locations of directional light sources (from 1 to 4), in addition to a planar light for ambient illumination.
The sample resolution is set to 320×256, encoded in lossless PNG format, and contains only right-hand meshes (we suggest using flip augmentations during training), with a maximum of two instances per sample.
Test dataset (real camera images): A test dataset containing 706 images was captured using a real RGB-D camera (RealSense L515) in a cluttered and unstructured industrial environment. The dataset comprises various scenarios with diverse lighting conditions, backgrounds, obstacles, numbers of hands, and different types of work gloves (red, green, white, yellow, no gloves) with varying sleeve lengths. The dataset assumes a single user, and the maximum number of hand instances per sample was limited to two. The dataset was manually labelled, and we provide hand instance segmentation COCO annotations in instances_hands_full.json (separately for train and val) and full arm instance annotations in instances_arms_full.json. The sample resolution was set to 640×480, and depth images were encoded in the same way as those of the synthetic dataset.
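The dataset states only that depth is scaled into a single-channel 8-bit image over the range [0.2, 1.0] m. A minimal sketch of one plausible decoding, assuming linear scaling (our assumption, not confirmed by the authors):

```python
# Assumed linear mapping between metric depth and the 8-bit encoding.
D_MIN, D_MAX = 0.2, 1.0  # metres, per the dataset description

def depth_to_u8(depth_m):
    """Clamp a depth value (in metres) into [D_MIN, D_MAX] and scale to 0..255."""
    d = min(max(depth_m, D_MIN), D_MAX)
    return round((d - D_MIN) / (D_MAX - D_MIN) * 255)

def u8_to_depth(value):
    """Invert the (assumed) encoding back to metres, up to quantization."""
    return D_MIN + value / 255 * (D_MAX - D_MIN)
```

Anything closer than 0.2 m or farther than 1.0 m saturates under this scheme, so recovered depths are only meaningful inside that range.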
Channel-wise normalization and standardization parameters for datasets
| Dataset | Mean (R, G, B, D) | STD (R, G, B, D) |
|---|---|---|
| Train | 98.173, 95.456, 93.858, 55.872 | 67.539, 67.194, 67.796, 47.284 |
| Validation | 99.321, 97.284, 96.318, 58.189 | 67.814, 67.518, 67.576, 47.186 |
| Test | 123.675, 116.28, 103.53, 35.3792 | 58.395, 57.12, 57.375, 45.978 |
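The statistics in the table above are applied in the usual channel-wise way, z = (x - mean) / std per channel. A minimal sketch using the Train-split values (how this slots into a particular training pipeline is up to the user):

```python
# Train-split channel statistics from the table above (R, G, B, D order).
TRAIN_MEAN = (98.173, 95.456, 93.858, 55.872)
TRAIN_STD  = (67.539, 67.194, 67.796, 47.284)

def standardize_pixel(rgbd, mean=TRAIN_MEAN, std=TRAIN_STD):
    """Standardize a 4-channel (R, G, B, D) pixel value channel-wise."""
    return tuple((x - m) / s for x, m, s in zip(rgbd, mean, std))

# a pixel equal to the channel means standardizes to all zeros
z = standardize_pixel((98.173, 95.456, 93.858, 55.872))
```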
Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0
This replication package contains all necessary scripts and data to replicate the main figures and tables presented in the paper.
1_scripts: This folder contains all scripts required to replicate the main figures and tables of the paper. The scripts are numbered with a prefix (e.g. "1_") in the order they should be run. Output will also be produced in this folder.
0_init.Rmd: An R Markdown file that installs and loads all packages necessary for the subsequent scripts.
1_fig_1.Rmd: Primarily produces Figure 1 (Zipf's plots) and conducts statistical tests to support underlying statistical claims made through the figure.
2_fig_2_to_4.Rmd: Primarily produces Figures 2 to 4 (average levels of expression) and conducts statistical tests to support underlying statistical claims made through the figures. This includes conducting t-tests to establish subgroup differences.
The script also produces the file table_controlling_how.csv, which contains the full set of regression results for the analysis of subgroup differences in political stances, controlling for emotionality, egocentrism, and toxicity. This file includes effect sizes, standard errors, confidence intervals, and p-values for each stance, group variable, and confounder.
3_fig_5_to_6.Rmd: Primarily produces Figures 5 to 6 (trends in expression) and conducts statistical tests to support underlying statistical claims made through the figures. This includes conducting t-tests to establish subgroup differences.
4_tab_1_to_2.Rmd: Produces Tables 1 to 2, and shows code for Table A5 (descriptive tables).
Expected run time for each script is under 3 minutes, requiring around 4GB of RAM. Script 3_fig_5_to_6.Rmd can take up to 3-4 minutes and requires up to 6GB of RAM. Installing each package for the first time may take around 2 minutes, except 'tidyverse', which may take around 4 minutes.
We have not provided a demo since the actual dataset used for analysis is small enough and computations are efficient enough to be run in most systems.
Each script starts with a layperson explanation giving an overview of the code's functionality, and pseudocode for the detailed procedure, followed by the actual code.
2_data: This folder contains all data used to replicate the main results. The data are loaded by the respective scripts automatically using relative paths.
data_dictionary.txt: Provides a description of all variables as they are coded in the various datasets, especially the main author-by-time level dataset called repl_df.csv.
Processed data aggregated at the individual author by time (year by month) level are provided, as raw data containing raw tweets cannot be shared.
This project uses R and RStudio. Make sure you have the following installed:
Once R and RStudio are installed, use the R Markdown script '0_init.Rmd' to ensure the correct versions of the required packages are installed. This script will install the remotes package (if not already installed) and then install the specified versions of the required packages.
This project is licensed under the Apache License 2.0 - see the license.txt file for details.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: There is currently inconclusive evidence regarding the relationship between recidivism and mental illness. This retrospective study aimed to use rigorous machine learning methods to understand the unique predictive utility of mental illness for recidivism in a general-population (i.e., not only those with mental illness) prison sample in the United States.
Method: Participants were adult men (n = 322) and women (n = 72) who were recruited from three prisons in the Midwest region of the United States. Three model comparisons using Bayesian correlated t-tests were conducted to understand the incremental predictive utility of mental illness, substance use, and crime and demographic variables for recidivism prediction. Three classification algorithms were considered while evaluating model configurations for the t-tests: elastic net logistic regression (GLMnet), k-nearest neighbors (KNN), and random forests (RF).
Results: Rates of substance use disorders were particularly high in our sample (86.29%). Mental illness variables and substance use variables did not add predictive utility for recidivism prediction over and above crime and demographic variables. Exploratory analyses comparing the crime and demographic, substance use, and mental illness feature sets to null models found that only the crime and demographics model had an increased likelihood of improving recidivism prediction accuracy.
Conclusions: Despite not finding a direct relationship between mental illness and recidivism, treatment of mental illness in incarcerated populations is still essential due to the high rates of mental illnesses, the legal imperative, the possibility of decreasing institutional disciplinary burden, the opportunity to increase the effectiveness of rehabilitation programs in prison, and the potential to improve meaningful outcomes beyond recidivism following release.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains wireless link quality estimation data for the FlockLab testbed [1,2]. The rationale and description of this dataset are given in the following abstract (the pdf is included in this repository -- see below).
Dataset: Wireless Link Quality Estimation on FlockLab – and Beyond. Romain Jacob, Reto Da Forno, Roman Trüb, Andreas Biri, Lothar Thiele. DATA '19: Proceedings of the 2nd Workshop on Data Acquisition To Analysis, 2019.
Data collection scenario
The data collection scenario is simple. Each FlockLab node is assigned one dedicated time slot. In this slot, the node sends 100 packets, called strobes. All strobes have the same payload size and use a given radio frequency channel and transmit power. All other nodes listen for the strobes and log packet reception events (i.e., success or failure).
The test scenario is run every two hours on two different platforms: TelosB [3] and DPP-cc430 [4]. We used all nodes available at test time (between 27 and 29).
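The per-link statistic this scenario yields is the packet reception ratio (PRR): with 100 strobes per slot, the PRR of a link (tx -> rx) is simply the fraction of strobes the receiver logged as successful. A hedged sketch (the event representation below is invented for illustration; the real parsing is done in the included notebook):

```python
N_STROBES = 100  # strobes sent per node per slot, per the scenario above

def prr(received_events):
    """Packet reception ratio for one link.

    received_events: list of booleans, one per strobe (True = received).
    """
    return sum(received_events) / N_STROBES

# example link: 87 of 100 strobes received
events = [True] * 87 + [False] * 13
link_quality = prr(events)
```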
Final dataset status
3 months of data with about 12 tests per day per platform
5 months of data with about 4 tests per day per platform
Data collection firmware
We are happy to share the link quality data we collected for the FlockLab testbed, but we also wanted to make it easier for others to collect similar datasets for other wireless networks. To this end, we include in this repository the data collection firmware we designed. All data collection scheduling and control is done in software, in order to make the firmware usable in a large variety of wireless networks. We implemented our data collection software using Baloo [5], a flexible network stack design framework based on Synchronous Transmission. Baloo efficiently handles network time synchronization and offers a flexible interface to schedule communication rounds. The firmware source code is available in the Baloo repository [6].
A set of experiment parameters can be patched directly in the firmware, which lets the user tune the data collection without having to recompile the source code. This improves usability and facilitates automation. An example patching script is included in this repository. Currently, the following parameters can be patched:
rf_channel,
payload,
host_id, and
rand_seed
Current supported platforms
TelosB [3]
DPP-cc430 [4]
Repository versions
v1.4.1 Updated visualizations in the notebook
v1.4.0 Addition of data from November 2019 to March 2020. Data collection is discontinued (the new FlockLab testbed is being set up).
v1.3.1 Update abstract and notebook
v1.3.0 Addition of October 2019 data. The frequency of tests has been reduced to 4 per day, executing at (approximately) 1:00, 7:00, 13:00, and 19:00. From October 28 onward, time shifted by one hour (2:00, 8:00, 14:00, 20:00).
v1.2.0 Addition of September 2019 data. Many missing tests on the 12th, 13th, 19th, and 20th of September (due to construction works in the building).
v1.1.4 Update of the abstract to have hyperlinks to the plots. Corrected typos.
v1.1.0 Addition of the data collected in August 2019. Data collection was disturbed at the beginning of the month and resumed normally on August 13. Data from previous days are incomplete.
v1.0.0 Initial version. Contains the data collected in July 2019, from the 10th to the 30th. No data were collected on the 31st of July (technical issue).
List of files
yyyy-mm_raw_platform.zip Archive containing all FlockLab test result files (one .zip file per month and per platform).
yyyy-mm_preprocessed_all.zip Archive containing preprocessed csv files, one per month and per platform.
firmware.zip Archive containing the firmware for all supported platforms.
firmware_patch.sh Example bash script illustrating the firmware patching.
parse_flocklab_results.ipynb [open in nbviewer] Jupyter notebook used to create the preprocessed data files. Also includes some examples of data visualization.
parse_flocklab_results.html HTML rendering of the notebook (static).
plots.zip Archive containing high resolution visualization of the dataset, generated by the parse_flocklab_results notebook, and presented in the abstract.
abstract.pdf A 3-page abstract presenting the dataset.
CRediT.pdf The list of contributions from the authors.
References
[1] R. Lim, F. Ferrari, M. Zimmerling, C. Walser, P. Sommer, and J. Beutel, “FlockLab: A Testbed for Distributed, Synchronized Tracing and Profiling of Wireless Embedded Systems,” in Proceedings of the 12th International Conference on Information Processing in Sensor Networks, New York, NY, USA, 2013, pp. 153–166.
[2] “FlockLab,” GitLab. [Online]. Available: https://gitlab.ethz.ch/tec/public/flocklab/wikis/home. [Accessed: 24-Jul-2019].
[3] Advanticsys, “MTM-CM5000-MSP 802.15.4 TelosB mote Module.” [Online]. Available: https://www.advanticsys.com/shop/mtmcm5000msp-p-14.html. [Accessed: 21-Sep-2018].
[4] Texas Instruments, “CC430F6137 16-Bit Ultra-Low-Power MCU.” [Online]. Available: http://www.ti.com/product/CC430F6137. [Accessed: 21-Sep-2018].
[5] R. Jacob, J. Bächli, R. Da Forno, and L. Thiele, “Synchronous Transmissions Made Easy: Design Your Network Stack with Baloo,” in Proceedings of the 2019 International Conference on Embedded Wireless Systems and Networks, 2019.
[6] “Baloo,” Dec-2018. [Online]. Available: http://www.romainjacob.net/research/baloo/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
The repository includes the data and R script for performing an analysis of among- and within-individual differences in the timing of first nesting attempts of the year in relation to natal and pre-breeding environmental conditions (see reference). The data come from a long-term study of the demography of Savannah sparrows (Passerculus sandwichensis) breeding on Kent Island, New Brunswick, Canada (44.58°N, 66.76°W). Climate data were taken from an Environment and Climate Change Canada weather station at the airport in Saint John, NB (45.32°N, 65.89°W; https://www.climate.weather.gc.ca).

Datasets
(1) SAVS_all_nests_samp.csv: contains summary information for all nest attempts observed for all females included in the analysis (i.e., including both first-of-year and subsequent lay dates).
(2) SAVS_first_nest_per_year_samp.csv: contains detailed information on the first nesting attempt by each female Savannah sparrow monitored in the population over the course of the study (1987-2019, excluding the years 2005-2007; see Methods: Study site and field sampling in reference).
(3) mean_daily_temperature.csv: contains mean daily temperature records from the ECCC weather station at Saint John, NB (see above). These mean daily temperatures were used in a climate sensitivity analysis to determine the optimum pre-breeding window on Kent Island.
(4) SAVS_annual_summary.csv: contains annual summaries of average lay dates, breeding density, reproductive output, etc.

Variables
- female.id = factor; unique aluminum band number (USGS or Canadian Wildlife Service) assigned to each female
- rain.categorical = binary (0 = low rainfall; 1 = high rainfall); groups females into low (81-171 mm) and high (172-378 mm) natal rainfall groups, based on the natal environmental conditions observed in each year (see Methods: Statistical analysis in reference)
- year = integer (1987-2019); study year. The population of Savannah sparrows on Kent Island has been monitored since 1987 (excluding three years, 2005-2007)
- nest.id = factor; an alpha-numeric code assigned to each nest; unique within years (the combination of year and nest.id would create a unique identifier for each nest)
- fledglings = integer; number of offspring fledged from a nest
- total.fledglings = integer; the total number of fledglings reared by a given female over the course of her lifetime
- nest.attempts = integer; the total number of nest attempts per female (the number of nests over which the total number of fledglings is divided; includes both successful and unsuccessful clutches)
- hatch.yday = integer; day of the year on which the first egg hatched in a given nest
- lay.ydate = integer; day of the year on which the first egg was laid in a given nest
- lay.caldate = date (dd/mm/yyyy); calendar date on which the first egg in a given nest was laid
- nestling.year = integer; the year in which the female/mother of a given nest was born
- nestling.density = integer; the density of adult breeders in the year in which a given female (associated with a particular nest) was born
- total.nestling.rain = numeric; cumulative rainfall (in mm) experienced by a female during the nestling period in her natal year of life (01 June to 31 July; see Methods: Temperature and precipitation data in reference)
- years.experience = integer; number of previous breeding years per female in a particular year
- density.total = integer; total number of adult breeders in the study site in a particular year
- MCfden = numeric; mean-centred female density
- MCbfden = numeric; mean-centred between-female density
- MCwfden = numeric; mean-centred within-female density
- mean.t.window = numeric; mean temperature during the identified pre-breeding window (03 May to 26 May; see Methods: Climate sensitivity analysis in reference)
- MCtemp = numeric; mean-centred temperature during the optimal pre-breeding window
- MCbtemp = numeric; mean-centred between-female temperature during the optimal pre-breeding window
- MCwtemp = numeric; mean-centred within-female temperature during the optimal pre-breeding window
- female.age = integer; age (in years) of a given female in a given year
- MCage = numeric; mean-centred female age
- MCbage = numeric; mean-centred between-female age
- MCwage = numeric; mean-centred within-female age
- mean_temp_c = numeric; mean daily temperature in °C
- meanLD = numeric; mean lay date (in days of the year) across all first nest attempts in a given year
- sdLD = numeric; standard deviation in lay date (in days of the year) across all first nest attempts in a given year
- seLD = numeric; standard error in lay date (in days of the year) across all first nest attempts in a given year
- meanTEMP = numeric; mean temperature (in °C) during the breeding period in a given year
- records = integer; number of first nest attempts from each year included in the analysis
- total.nestling.precip = numeric; total rainfall (in mm) during the nestling period (01 June to 31 July) in a given year
- total.breeding.precip = numeric; total rainfall (in mm) during the breeding period (15 April to 31 July) in a given year
- density.total = integer; total density of adult breeders on the study site in a given year
- total.fledglings = integer; total number of offspring fledged by all breeders in the study site in a given year
- cohort.fecundity = numeric; average number of offspring per breeder in a given year

Code
code for Burant et al. - SAVS lay date plasticity analysis.R: The R script provided includes all the code required to import the data and perform the statistical analyses presented in the manuscript. These include:
- t-tests investigating the effects of natal conditions (rain.categorical) on female age, nest attempts, and reproductive success
- linear models of changes in temperature, precipitation, reproductive success, and population density over time, and lay dates in response to female age, density, etc.
- a climate sensitivity analysis to identify the optimal pre-breeding window on Kent Island
- mixed effects models investigating how lay dates respond to changes in within- and between-female age, density, and temperature
See readme.rtf for a list of datasets and variables.
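The within/between-female mean-centring behind variables like MCbage and MCwage follows a standard decomposition: a female's between-subject value is her personal mean minus the grand mean, and her within-subject value is each observation minus her personal mean. A hedged sketch (in Python rather than the R used by the study; the data below are invented):

```python
def within_between_centre(values_by_id):
    """Decompose repeated measures into between- and within-subject parts.

    values_by_id: dict mapping individual id -> list of observations.
    Returns (between, within): between[id] is that individual's mean minus
    the grand mean (cf. MCb*); within[id] is each observation minus that
    individual's own mean (cf. MCw*).
    """
    all_vals = [v for vals in values_by_id.values() for v in vals]
    grand_mean = sum(all_vals) / len(all_vals)
    between, within = {}, {}
    for ind, vals in values_by_id.items():
        own_mean = sum(vals) / len(vals)
        between[ind] = own_mean - grand_mean
        within[ind] = [v - own_mean for v in vals]
    return between, within

# two females observed in different numbers of years (fictitious ages)
ages = {"F1": [1, 2, 3], "F2": [2, 4]}
b, w = within_between_centre(ages)
```

Entering both components in a mixed model lets among-individual (between) and within-individual effects be estimated separately, which is the point of the plasticity analysis described above.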
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT Regression analysis is highly relevant to agricultural sciences, since many of the factors studied are quantitative. Researchers have generally used polynomial models to explain their experimental results, mainly because much of the existing software performs this analysis and because of a lack of knowledge of other models. On the other hand, many natural phenomena do not present such behavior; nevertheless, the use of non-linear models is costly and requires advanced knowledge of programming languages such as R. Thus, this work presents several regression models found in scientific studies, implementing them in the form of an R package called AgroReg. The package comprises 44 analysis functions with 66 regression models, such as polynomial, non-parametric (loess), segmented, logistic, exponential, and logarithmic, among others. The functions provide the coefficient of determination (R2), model coefficients and the respective p-values from the t-test, root mean square error (RMSE), Akaike's information criterion (AIC), Bayesian information criterion (BIC), maximum and minimum predicted values, and the regression plot. Furthermore, other measures of model quality and graphical analysis of residuals are also included. The package can be downloaded from the CRAN repository using the command: install.packages("AgroReg"). AgroReg is a promising analysis tool in agricultural research on account of its user-friendly and straightforward functions that allow for fast and efficient data processing with greater reliability and relevant information.
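The goodness-of-fit measures listed in the abstract are standard quantities computed from a model's residuals. A hedged sketch of RMSE and one common least-squares form of AIC, AIC = n·ln(SSE/n) + 2k (shown in Python; AgroReg itself is an R package, and exact conventions, such as whether the error variance is counted in k, vary between implementations):

```python
import math

def rmse(observed, predicted):
    """Root mean square error of a fitted model."""
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / n)

def aic_ls(observed, predicted, k):
    """AIC in its common least-squares form; k = number of estimated parameters."""
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(sse / n) + 2 * k

# fictitious observations and fitted values from some regression model
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
```

Lower RMSE and lower AIC both indicate a better fit, with AIC additionally penalizing the number of parameters, which is how such packages rank competing polynomial and non-linear models.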
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Activities of Daily Living Object Dataset

Overview
The ADL (Activities of Daily Living) Object Dataset is a curated collection of images and annotations specifically focusing on objects commonly interacted with during daily living activities. This dataset is designed to facilitate research and development in assistive robotics in home environments.

Data Sources and Licensing
The dataset comprises images and annotations sourced from four publicly available datasets:

COCO Dataset
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (ECCV), 740–755.

Open Images Dataset
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., & Ferrari, V. (2020). The Open Images Dataset V6: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale. International Journal of Computer Vision, 128(7), 1956–1981.

LVIS Dataset
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A Dataset for Large Vocabulary Instance Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5356–5364.

Roboflow Universe
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: The following repositories from Roboflow Universe were used in compiling this dataset:
Work, U. AI Based Automatic Stationery Billing System Data Dataset. 2022. Accessible at: https://universe.roboflow.com/university-work/ai-based-automatic-stationery-billing-system-data (accessed on 11 October 2024).
Destruction, P.M. Pencilcase Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/pencilcase-se7nb (accessed on 11 October 2024).
Destruction, P.M. Final Project Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/final-project-wsuvj (accessed on 11 October 2024).
Personal. CSST106 Dataset. 2024. Accessible at: https://universe.roboflow.com/personal-pgkq6/csst106 (accessed on 11 October 2024).
New-Workspace-kubz3. Pencilcase Dataset. 2022. Accessible at: https://universe.roboflow.com/new-workspace-kubz3/pencilcase-s9ag9 (accessed on 11 October 2024).
Finespiralnotebook. Spiral Notebook Dataset. 2024. Accessible at: https://universe.roboflow.com/finespiralnotebook/spiral_notebook (accessed on 11 October 2024).
Dairymilk. Classmate Dataset. 2024. Accessible at: https://universe.roboflow.com/dairymilk/classmate (accessed on 11 October 2024).
Dziubatyi, M. Domace Zadanie Notebook Dataset. 2023. Accessible at: https://universe.roboflow.com/maksym-dziubatyi/domace-zadanie-notebook (accessed on 11 October 2024).
One. Stationery Dataset. 2024. Accessible at: https://universe.roboflow.com/one-vrmjr/stationery-mxtt2 (accessed on 11 October 2024).
jk001226. Liplip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/liplip (accessed on 11 October 2024).
jk001226. Lip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/lip-uteep (accessed on 11 October 2024).
Upwork5. Socks3 Dataset. 2022. Accessible at: https://universe.roboflow.com/upwork5/socks3 (accessed on 11 October 2024).
Book. DeskTableLamps Material Dataset. 2024. Accessible at: https://universe.roboflow.com/book-mxasl/desktablelamps-material-rjbgd (accessed on 11 October 2024).
Gary. Medicine Jar Dataset. 2024. Accessible at: https://universe.roboflow.com/gary-ofgwc/medicine-jar (accessed on 11 October 2024).
TEST. Kolmarbnh Dataset. 2023. Accessible at: https://universe.roboflow.com/test-wj4qi/kolmarbnh (accessed on 11 October 2024).
Tube. Tube Dataset. 2024. Accessible at: https://universe.roboflow.com/tube-nv2vt/tube-9ah9t (accessed on 11 October 2024).
Staj. Canned Goods Dataset. 2024. Accessible at: https://universe.roboflow.com/staj-2ipmz/canned-goods-isxbi (accessed on 11 October 2024).
Hussam, M. Wallet Dataset. 2024. Accessible at: https://universe.roboflow.com/mohamed-hussam-cq81o/wallet-sn9n2 (accessed on 14 October 2024).
Training, K. Perfume Dataset. 2022. Accessible at: https://universe.roboflow.com/kdigital-training/perfume (accessed on 14 October 2024).
Keyboards. Shoe-Walking Dataset. 2024. Accessible at: https://universe.roboflow.com/keyboards-tjtri/shoe-walking (accessed on 14 October 2024).
MOMO. Toilet Paper Dataset. 2024. Accessible at: https://universe.roboflow.com/momo-nutwk/toilet-paper-wehrw (accessed on 14 October 2024).
Project-zlrja. Toilet Paper Detection Dataset. 2024. Accessible at: https://universe.roboflow.com/project-zlrja/toilet-paper-detection (accessed on 14 October 2024).
Govorkov, Y. Highlighter Detection Dataset. 2023. Accessible at: https://universe.roboflow.com/yuriy-govorkov-j9qrv/highlighter_detection (accessed on 14 October 2024).
Stock. Plum Dataset. 2024. Accessible at: https://universe.roboflow.com/stock-qxdzf/plum-kdznw (accessed on 14 October 2024).
Ibnu. Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/ibnu-h3cda/avocado-g9fsl (accessed on 14 October 2024).
Molina, N. Detection Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/norberto-molina-zakki/detection-avocado (accessed on 14 October 2024).
in Lab, V.F. Peach Dataset. 2023. Accessible at: https://universe.roboflow.com/vietnam-fruit-in-lab/peach-ejdry (accessed on 14 October 2024).
Group, K. Tomato Detection 4 Dataset. 2023. Accessible at: https://universe.roboflow.com/kkabs-group-dkcni/tomato-detection-4 (accessed on 14 October 2024).
Detection, M. Tomato Checker Dataset. 2024. Accessible at: https://universe.roboflow.com/money-detection-xez0r/tomato-checker (accessed on 14 October 2024).
University, A.S. Smart Cam V1 Dataset. 2023. Accessible at: https://universe.roboflow.com/ain-shams-university-byja6/smart_cam_v1 (accessed on 14 October 2024).
EMAD, S. Keysdetection Dataset. 2023. Accessible at: https://universe.roboflow.com/shehab-emad-n2q9i/keysdetection (accessed on 14 October 2024).
Roads. Chips Dataset. 2024. Accessible at: https://universe.roboflow.com/roads-rvmaq/chips-a0us5 (accessed on 14 October 2024).
workspace bgkzo, N. Object Dataset. 2021. Accessible at: https://universe.roboflow.com/new-workspace-bgkzo/object-eidim (accessed on 14 October 2024).
Watch, W. Wrist Watch Dataset. 2024. Accessible at: https://universe.roboflow.com/wrist-watch/wrist-watch-0l25c (accessed on 14 October 2024).
WYZUP. Milk Dataset. 2024. Accessible at: https://universe.roboflow.com/wyzup/milk-onbxt (accessed on 14 October 2024).
AussieStuff. Food Dataset. 2024. Accessible at: https://universe.roboflow.com/aussiestuff/food-al9wr (accessed on 14 October 2024).
Almukhametov, A. Pencils Color Dataset. 2023. Accessible at: https://universe.roboflow.com/almas-almukhametov-hs5jk/pencils-color (accessed on 14 October 2024).

All images and annotations obtained from these datasets are released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits sharing and adaptation of the material in any medium or format, for any purpose, even commercially, provided that appropriate credit is given, a link to the license is provided, and any changes made are indicated.

Redistribution Permission: As all images and annotations are under the CC BY 4.0 license, we are legally permitted to redistribute this data within our dataset.
We have complied with the license terms by:Providing appropriate attribution to the original creators.Including links to the CC BY 4.0 license.Indicating any changes made to the original material.Dataset StructureThe dataset includes:Images: High-quality images featuring ADL objects suitable for robotic manipulation.Annotations: Bounding boxes and class labels formatted in the YOLO (You Only Look Once) Darknet format.ClassesThe dataset focuses on objects commonly involved in daily living activities. A full list of object classes is provided in the classes.txt file.FormatImages: JPEG format.Annotations: Text files corresponding to each image, containing bounding box coordinates and class labels in YOLO Darknet format.How to Use the DatasetDownload the DatasetUnpack the Datasetunzip ADL_Object_Dataset.zipHow to Cite This DatasetIf you use this dataset in your research, please cite our paper:@article{shahria2024activities, title={Activities of Daily Living Object Dataset: Advancing Assistive Robotic Manipulation with a Tailored Dataset}, author={Shahria, Md Tanzil and Rahman, Mohammad H.}, journal={Sensors}, volume={24}, number={23}, pages={7566}, year={2024}, publisher={MDPI}}LicenseThis dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).License Link: https://creativecommons.org/licenses/by/4.0/By using this dataset, you agree to provide appropriate credit, indicate if changes were made, and not impose additional restrictions beyond those of the original licenses.AcknowledgmentsWe gratefully acknowledge the use of data from the following open-source datasets, which were instrumental in the creation of our specialized ADL object dataset:COCO Dataset: We thank the creators and contributors of the COCO dataset for making their images and annotations publicly available under the CC BY 4.0 license.Open Images Dataset: We express our gratitude to the Open Images team for providing a comprehensive dataset of annotated images 
under the CC BY 4.0 license.LVIS Dataset: We appreciate the efforts of the LVIS dataset creators for releasing their extensive dataset under the CC BY 4.0 license.Roboflow Universe:
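Annotations in the YOLO Darknet format used by this dataset store one object per line as a class id followed by a normalized centre/size box. A minimal parsing sketch (the function name and example values are ours, not part of the dataset's tooling):

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert one YOLO Darknet annotation line
    ('class cx cy w h', coordinates normalised to [0, 1])
    into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return int(class_id), x_min, y_min, x_max, y_max

# Hypothetical annotation line for a 640x480 image:
print(yolo_to_pixel_box("3 0.5 0.5 0.25 0.5", 640, 480))
# -> (3, 240.0, 120.0, 400.0, 360.0)
```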
License: Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
The diagnosis of psychotic disorders such as schizophrenia and bipolar disorder, and in particular the objective assessment of symptom severity, remains a problem requiring the attention of researchers. Two measures that can be helpful in patient diagnosis are heart rate variability, calculated from the electrocardiographic signal, and accelerometer mobility data. The following dataset contains data from 30 psychiatric ward patients with schizophrenia or bipolar disorder and 30 healthy persons. The duration of the measurements for individuals was usually between 1.5 and 2 hours. R-R intervals necessary for heart rate variability calculation were collected simultaneously with accelerometer data using a wearable Polar H10 device. The Positive and Negative Syndrome Scale (PANSS) test was performed for each patient participating in the experiment, and its results are attached to the dataset. Furthermore, the code for loading and preprocessing the data, as well as for statistical analysis, is included in the corresponding GitHub repository.
BACKGROUND
Heart rate variability (HRV), calculated from electrocardiographic (ECG) recordings of R-R intervals stemming from the heart's electrical activity, may be used as a biomarker of mental illnesses, including schizophrenia and bipolar disorder (BD) [Benjamin et al]. Variations in R-R interval values correspond to changes in the heart's autonomic regulation [Berntson et al, Stogios et al]. Moreover, HRV reflects the activity of the sympathetic and parasympathetic branches of the autonomic nervous system (ANS) [Task Force of the European Society of Cardiology the North American Society of Pacing Electrophysiology, Matusik et al]. Patients with psychotic mental disorders tend to show a shift in the centrally regulated ANS balance towards less dynamic changes in ANS activity in response to different environmental conditions [Stogios et al]. Greater sympathetic activity relative to parasympathetic activity leads to lower HRV, while higher parasympathetic activity translates to higher HRV. This loss of dynamic response may thus be an indicator of mental illness. Additional benefits may come from measuring the daily activity of patients using accelerometry, which can be used to register periods of physical activity and of inactivity or withdrawal for further correlation with HRV values recorded at the same time.
EXPERIMENTS
In our experiment, the participants were 30 psychiatric ward patients with schizophrenia or BD and 30 healthy people. All measurements were performed using a Polar H10 wearable device. The sensor collects ECG recordings and accelerometer data and, additionally, performs detection of R wave peaks. Participants had to wear the sensor for a given time, typically between 1.5 and 2 hours; the shortest recording was 70 minutes. Starting a few minutes after the beginning of the measurement, participants could perform any activity. They were encouraged to undertake physical activity and, more specifically, to take a walk. Because the patients were in the medical ward, they were instructed at the beginning of the experiment to take a walk in the corridors, and to repeat the walk 30 minutes and 1 hour after the first walk. The subsequent walks were to be slightly longer (about 3, 5 and 7 minutes, respectively). We did not remind participants of this instruction or supervise its execution during the experiment, in either the treatment or the control group. Seven persons from the control group did not receive this instruction; their measurements correspond to freely selected activities with rest periods, although at least three of them performed physical activities during this time. Nevertheless, at the start of the experiment, all participants were requested to rest in a sitting position for 5 minutes. Moreover, for each patient, the disease severity was assessed using the PANSS test, and its scores are attached to the dataset.
The data from the sensors were collected using the Polar Sensor Logger application [Happonen]. The extracted measurements were then preprocessed and analyzed using code prepared by the authors of the experiment, which is publicly available in the GitHub repository [Książek et al].
First, we performed manual artifact detection to remove abnormal heartbeats caused by non-sinus beats and technical issues of the device (e.g. temporary disconnections and inappropriate electrode readings). We also performed anomaly detection using the Daubechies wavelet transform. Nevertheless, the dataset includes the raw data, while the full code necessary to reproduce our anomaly detection approach is available in the repository. Optionally, it is also possible to perform cubic spline data interpolation. After that step, rolling windows of a particular size, with given time intervals between them, are created. Then, a statistical analysis is performed, e.g. mean HRV calculation using the RMSSD (Root Mean Square of Successive Differences) approach, measuring the relationship between mean HRV and PANSS scores, mobility coefficient calculation based on accelerometer data, and verification of dependencies between HRV and mobility scores.
DATA DESCRIPTION
The structure of the dataset is as follows. One folder, called HRV_anonymized_data, contains values of R-R intervals together with timestamps for each experiment participant. The data were anonymized, i.e. the day of the measurement was removed to prevent identification of participants. Files concerning patients are named treatment_X.csv, where X is the number of the person, while files related to the healthy controls are named control_Y.csv, where Y is the identification number of the person. Furthermore, for visualization purposes, an image of the raw R-R intervals for each participant is provided, named raw_RR_{control,treatment}_N.png, where N is the number of the person from the control/treatment group. The collected data are raw, i.e. before the anomaly removal. The code enabling reproduction of the anomaly detection stage and removal of suspicious heartbeats is publicly available in the repository [Książek et al]. The structure of the files collecting R-R intervals is as follows:
Phone timestamp    RR-interval [ms]
12:43:26.538000    651
12:43:27.189000    632
12:43:27.821000    618
12:43:28.439000    621
12:43:29.060000    661
...                ...
The first column contains the timestamp for which the distance between two consecutive R peaks was registered. The corresponding R-R interval is presented in the second column of the file and is expressed in milliseconds.
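The RMSSD statistic used throughout the analysis can be computed directly from this R-R column. A minimal Python sketch (the function name is ours and the sample values are taken from the excerpt above; this is an illustration, not the authors' code):

```python
import math

def rmssd(rr_intervals):
    """Root Mean Square of Successive Differences of R-R intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_intervals, rr_intervals[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# The first five R-R intervals from the sample file above (milliseconds):
sample = [651, 632, 618, 621, 661]
print(round(rmssd(sample), 2))
# -> 23.27
```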
The second folder, called accelerometer_anonymized_data, contains values of accelerometer data collected at the same time as the R-R intervals. The naming convention is the same as for the R-R interval data: treatment_X.csv and control_X.csv represent the data coming from persons in the treatment and control group, respectively, where X is the identification number of the selected participant. The numbers are exactly the same as for the R-R intervals. The structure of the files with accelerometer recordings is as follows:
Phone timestamp    X [mg]    Y [mg]    Z [mg]
13:00:17.196000    -961      -23       182
13:00:17.205000    -965      -21       181
13:00:17.215000    -966      -22       187
13:00:17.225000    -967      -26       193
13:00:17.235000    -965      -27       191
...                ...       ...       ...
The first column contains a timestamp, while the next three columns correspond to the currently registered acceleration in the three axes, X, Y and Z, expressed in milli-g.
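The three axis columns are typically combined into a single movement signal. Since the exact definition of the authors' mobility coefficient is given only in their repository, the sketch below uses the standard deviation of the acceleration magnitude as a hypothetical stand-in (function names and sample values are ours):

```python
import math
import statistics

def acc_magnitude(x, y, z):
    """Magnitude of a three-axis acceleration sample (in mg)."""
    return math.sqrt(x * x + y * y + z * z)

def mobility_coefficient(samples):
    """Stand-in mobility measure: population standard deviation of the
    magnitude. The actual coefficient is defined in the authors' code."""
    mags = [acc_magnitude(*s) for s in samples]
    return statistics.pstdev(mags)

# The five samples from the table above:
samples = [(-961, -23, 182), (-965, -21, 181), (-966, -22, 187),
           (-967, -26, 193), (-965, -27, 191)]
print(round(mobility_coefficient(samples), 2))
```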
We also attached a file with the PANSS test scores (PANSS.csv) for all patients participating in the measurement. The structure of this file is as follows:
no_of_person    PANSS_P    PANSS_N    PANSS_G    PANSS_total
1               8          13         22         43
2               11         7          18         36
3               14         30         44         88
4               18         13         27         58
...             ...        ...        ...        ...
The first column contains the identification number of the patient, the next three columns contain the PANSS scores related to positive, negative and general symptoms, respectively, and the last column contains the total score.
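The relationship between PANSS scores and mean HRV described in the usage notes is measured with Pearson's correlation coefficient. A minimal sketch (the example values are made up for illustration and are not results from the dataset):

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example: PANSS_total scores vs mean RMSSD values (ms)
panss_total = [43, 36, 88, 58]
mean_rmssd = [28.1, 35.4, 12.9, 20.5]
print(round(pearson_r(panss_total, mean_rmssd), 3))
```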
USAGE NOTES
All the files necessary to run the HRV and/or accelerometer data analysis are available in the GitHub repository [Książek et al]. HRV data loading, preprocessing (i.e. anomaly detection and removal) and the calculation of mean HRV values in terms of the RMSSD are performed in the main.py file. Pearson's correlation coefficients between HRV values and PANSS scores, as well as the statistical tests (Levene's and Mann-Whitney U tests) comparing the treatment and control groups, are also computed there. By default, a sensitivity analysis is performed, i.e. the full pipeline is run for different settings of the window size for which the HRV is calculated and for various time intervals between consecutive windows. Heatmaps of correlation coefficients and corresponding p-values can be prepared by running the utils_advanced_plots.py file after performing the sensitivity analysis. Furthermore, a detailed analysis for one selected set of hyperparameters may be prepared (by setting sensitivity_analysis = False), i.e. for 15-minute window sizes, 1-minute time intervals between consecutive windows and without data interpolation. Also, patients taking quetiapine may be excluded from further calculations by setting exclude_quetiapine = True, because this medicine can have a strong impact on HRV [Hattori et al].
The accelerometer data processing may be performed using the utils_accelerometer.py file. In this case, accelerometer recordings are downsampled to ensure the same timestamps as for the R-R intervals and, for each participant, the mobility coefficient is calculated. Then, a correlation between the mobility coefficient and the HRV values is computed.
License: CC0 1.0 Universal, https://spdx.org/licenses/CC0-1.0.html
In many animal species, males compete for access to fertile females. The resulting sexual selection leads to sex differences in morphology and behaviour, but may also have consequences for physiology. Pectoral sandpipers are arctic-breeding polygynous shorebirds in which males perform elaborate displays around the clock and move over long distances to sample potential breeding sites. We examined the oxygen carrying capacity of pectoral sandpipers, measured as the volume percentage of red blood cells in blood (haematocrit, Hct). We found a remarkable sex difference in Hct levels, with males having much higher values (58.9 ± 3.8 SD) than females (49.8 ± 5.3 SD). While Hct values of male pectoral sandpipers are notable for being among the highest recorded in birds, the sex difference we report is unprecedented and more than double that of any previously described. We also show that Hct values declined after arrival to the breeding grounds in females, but not in males, suggesting that males maintain an aerobic capacity during the mating period equivalent to that during trans-hemispheric migration. We conclude that sexual selection for extreme physical performance in male pectoral sandpipers has led to exceptional sex differences in oxygen carrying capacity.

Methods

Study site and general procedures
We studied pectoral sandpipers at a ~2 km2 site at the northern tip of the Alaskan Arctic Coastal Plain near Utqiagvik, Alaska (71°18′ N, 156°44′ W) in the years 2004-2009, 2012, 2014, and 2018. We caught pectoral sandpipers during the breeding season using hand-held mist nets (males and females) or nest traps (females only). We assigned each bird a metal leg band and a unique combination of colour leg bands, weighed them (to the nearest 0.1 g), measured their tarsus (to the nearest 0.1 mm), and sampled 200–300 μl of blood using brachial venepuncture.
We collected blood in 70 μl heparinized microhematocrit capillary tubes and centrifuged the samples at 5,000 rpm for 10 min on the day of collection, thus separating plasma from cellular blood. Haematocrit levels were measured for each full capillary as the percentage of packed red blood cells over the total blood sample. For statistical analyses, we used the mean value from all capillary tubes obtained from an individual during a given capture. Red blood cells were kept and stored in Queen's lysis buffer for subsequent molecular sexing. In total, we obtained 778 blood samples from males and 262 from females. For 38 males we obtained multiple samples within a breeding season (i.e., two (N = 34), three (N = 3), or four (N = 1)). Additionally, for eight males we obtained samples across multiple breeding seasons (i.e., two (N = 6) or three (N = 3)). We attempted to find all nests on the study site by 1) observing foraging females until they went to their nest to incubate or 2) flushing females off their nest by systematically searching or rope dragging the area.

Comparative data
We obtained estimates of haematocrit values from other bird species from Minias (2020). Minias (2020) compiled a dataset of 611 Hct estimates from 279 species based on data available in the published literature. This dataset includes only non-experimental studies, and only studies on wild birds or birds kept in outdoor aviaries (i.e. studies on birds kept indoors were not included). From this dataset, we extracted all studies that reported a separate estimate for males and females from the same age class (juvenile or adult). We only included studies when the Hct estimate for each sex was based on at least ten individuals, resulting in a dataset consisting of 63 estimates of male and female Hct values from 35 different species. We additionally obtained sex-specific Hct estimates from five species at our study site which we had collected as part of parallel research projects.
These samples were obtained and processed in the same way as those of pectoral sandpipers. Thus, we included data from American golden plover Pluvialis dominica (n = 10 males, 12 females), dunlin Calidris alpina (n = 10 males, 10 females), long-billed dowitcher Limnodromus scolopaceus (n = 26 males, 21 females), red phalarope Phalaropus fulicarius (n = 257 males, 312 females), and semipalmated sandpiper Calidris pusilla (n = 71 males, 67 females). The combined set of data extracted from Minias (2020) and collected by ourselves (excluding pectoral sandpipers) thus consisted of 68 estimates of male and female mean Hct values from 40 different species.

Statistical analyses
All statistical analyses were performed with R (version 4.2.2, www.r-project.org). We performed (generalised) linear mixed models using the lme4 package (Bates et al. 2014). We used the R package multcomp (Hothorn et al. 2008) to obtain P-values corrected for multiple testing. To test for a difference in Hct levels between male and female pectoral sandpipers, we performed a t-test. To test for a difference in average Hct levels between males and females across different bird species (68 male and female estimates from 40 species), we performed a paired t-test. To examine factors associated with variation in Hct values in pectoral sandpipers, we performed linear mixed-effect models with Hct value as the response variable. Because factors affecting Hct levels may differ between the sexes, and to avoid having to fit 3-way interactions, we ran a model for males and females separately. We included date, body mass (g), and tarsus length (mm) as fixed effects. For females, we also included breeding status, i.e., whether they nested locally at our study site during a given season. We also tested for interaction effects of breeding status with date, body mass, and tarsus length, respectively, but only retained these interaction effects in the model if they explained significant variation (p<0.01).
Breeding status was not included in the model for males, because males mate opportunistically at different breeding sites and can all be considered potential breeders. Previous work showed that males who sired offspring locally did not differ in Hct values from males that did not (Kempenaers & Valcu 2017). Numeric variables were mean-centred, so that the model intercept was estimated for the mean value of the explanatory variables. Body mass and tarsus length correlated positively, but weakly (Pearson’s correlation coefficients; males: r=0.24, females: r=0.13) and both were thus included as explanatory variables in the models. Year was included as a random effect in both models. Because we had more than one sample for some males, we also included individual identity as a random intercept in the model for males.
License: Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the anonymous reviewing version; the source code repository will be added after the review.
This dataset provides the results of measuring Jetty with the 1,000 artificial regressions with Peass and with JMH. The creation of the artificial regressions and the measurement process is defined here: https://anonymous.4open.science/r/jetty-evaluation-6F58/ (repository named jetty-evaluation; the GitHub link will be provided after review). An example regression is contained in https://anonymous.4open.science/r/jetty-experiments-202D. We obtained these data from measurements on an Intel Xeon CPU E5-2620 v3 @ 2.40GHz.
The dataset contains the following data:
regression-results-peass-0.tar.xz (Results of the measurement with Peass, part 0)
regression-results-peass-1.tar.xz (Results of the measurement with Peass, part 1)
regression-results-peass-2.tar.xz (Results of the measurement with Peass, part 2)
regression-results-peass-3.tar.xz (Results of the measurement with Peass, part 3)
regression-results-jmh.tar.xz (Results of the measurement with JMH)
tree-results.tar.xz (Metadata of the trees)
To get the data into a usable format, extract the Peass data to one folder (the folder will be named $PEASS_RESULTS_FOLDER):
mkdir peass
for file in regression-results-peass-*.tar.xz; do echo $file; tar -xf $file; done
for i in {0..3}; do mv $i/* .; done
This will yield a folder containing 1000 folders named regression-$i, where each consists of
deps.tar.xz: The regression test selection results
logs.tar.xz: The logs of the test executions
results: The traces of the regression test selection and a file named changes_*testcase.json, which contains statistical details of the measured performance change (if present)
jetty.project_peass: Detailed measurement data and logs of individual JVM starts
To analyse the Peass results, run
cd scripts/peass
./analyzeChangeIdentification.sh $PEASS_RESULTS_FOLDER
./analyzeFrequency.sh $PEASS_RESULTS_FOLDER
This will take some time, since partial results need to be unpacked for analysis. The first script will create the following results:
and the second will yield the following results:
Correct Measurement: 587
Not selected changes: 146
Wrong measurement result: 267
Wrong analysis (should be 0): 0
Overall: 1000
Share of changed method on correct measurements: 0.109571 0.11238 32
Method call count on correct measurement: 15638.4 32853.7 32
Average tree depth on correct measurements: 1.1022 2.6875 32
Share of changed method on wrong measurements: 0.17692 0.180365 968
Method call count on wrong measurement: 711415 180438 968
Average tree depth on wrong measurements: 1.23239 2.42252 968
To analyze the JMH data, first extract the metadata (the folder will be named $TREEFOLDER):
tar -xf tree-results.tar.xz
Afterwards extract the JMH results (the folder will be named $JMH_RESULTS_FOLDER):
tar -xvf regression-results-jmh.tar.xz
This will yield a folder containing a measurement for each regression with two files:
basic.json: The performance measurement result of the basic version
regression-$i.json: The performance measurement result of the version containing the regression
Afterwards, run the analysis in the jetty-evaluation repository:
cd scripts/jmh
./analyzeFrequency.sh $JMH_RESULTS_FOLDER $TREEFOLDER
Since the regressions are injected into the call tree of the benchmark, there are no unselected changes. The analysis will yield the following results:
Share of changed method on correct measurements: 0.184631 0.271968 587
Method call count on correct measurement: 14628.2 4979.6 587
Share of changed method on wrong measurements: 0.180981 0.235614 267
Method call count on wrong measurement: 14902.2 4333.58 267
License: CC0 1.0 Universal, https://spdx.org/licenses/CC0-1.0.html
These data and computer code (written in R, https://www.r-project.org) were created to statistically evaluate a suite of intrinsic and extrinsic risk factors related to calf elk and their mothers' body condition and age. Specifically, known-fate data were collected from 94 elk calves monitored from 2013-2016 in a partially migratory elk (Cervus canadensis) population in Alberta, Canada. Along with adult female data on pregnancy status, age, and body condition, we created a time-to-event dataset that allowed us to analyze calf mortality risk in a time-to-event approach. We also estimated pooled survivorship and cause-specific mortality, as well as stratifying these metrics by migration tactic (resident vs. eastern migrant). Cox proportional hazards models were used to evaluate calf mortality risk in terms of forage biomass (kg/ha), bear predation risk (from an RSF), and other factors that varied between migration tactics. We tested for differences in a number of maternal reproductive parameters (e.g., pregnancy status) and for calf explanatory variables between migrant and resident elk segments. We also use cumulative incidence functions to estimate cause-specific mortality in this multiple carnivore system. Ultimately, we hope that this work helps wildlife managers anticipate how elk calf survival and partial migration dynamics are affected by grizzly bear predation, and our study builds on a long-term partial migration study at the Ya Ha Tinda Ranch in Alberta, Canada.
Methods
We used a continuous time-to-event approach on a daily mortality timescale to assess factors influencing mortality risk for elk calves over a 90-day period with birth as the origin. We right-censored calves with failed tags (n = 1) and calves that survived past the 90-day period in their year of birth. We used a generalized Kaplan-Meier (KM) estimator to provide estimates of overall calf survival during the 90-day post-calving period. We stratified estimates by sex, year, and migration tactic, then used log-rank tests to test for significant differences in survival among these groups. We report 95% confidence intervals (CI) for KM survival estimates on the complementary log-log scale, which provides better estimates of uncertainty near the boundaries of 0 or 1. We used cumulative incidence functions to estimate cause-specific pooled mortality rates as well as stratified by migration tactic. We assessed factors influencing mortality risk using the Andersen-Gill formulation of the Cox proportional hazards (PH) model because it accommodated time-dependent risk covariates. We evaluated a suite of nested candidate models that included both intrinsic and extrinsic variables. Further, because of the importance of understanding differences in migratory tactics, we conducted Welch’s two-sample t-tests to evaluate differences in terms of adult female elk age and body condition, and calf sex and birth mass; we used a nonparametric Kolmogorov-Smirnov test to investigate phenological differences in calf birth dates. We also tested for differences between migratory tactics across extrinsic variables using a Welch’s two-sample t-test for human disturbance metrics and linear mixed-effects (LME) models with random intercepts to account for correlated observations within calves for forage biomass and predation risk. 
In summary, grizzly bear predation risk was considered a time-dependent covariate in our mortality risk analysis similar to forage biomass that changed with variation in plant phenology, while other covariates were static through time.
We standardized (by subtracting the mean and dividing by 2 SDs) all continuous predictor variables before analysis to allow relative coefficient comparisons among categorical and continuous predictor variables. We also screened variables for collinearity and avoided including variables in the same model that had a correlation coefficient of > |0.5|. We further screened out clearly unimportant variables (i.e., P > 0.05) before conducting model selection. As a conservative measure, we used Bayesian Information Criterion (BIC) for model selection. We also used model averaging, given the complexity of the full model relative to our sample size, and reported unconditional standard errors and naturally averaged model coefficients (i.e., non-shrinkage estimates). We present only models in 95% of the cumulative weight. We used Monte Carlo sampling to predict survivorship and the 95% uncertainty interval from model-averaged coefficient estimates and the Breslow estimate of the cumulative baseline hazard function.
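The standardization step described above (subtracting the mean and dividing by two standard deviations, so that coefficients of continuous predictors are comparable with those of binary predictors) can be sketched as follows (the function name is ours; the original analysis is in R, this is an illustrative Python translation):

```python
import statistics

def standardize_2sd(values):
    """Centre on the mean and divide by two sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / (2 * sd) for v in values]

print(standardize_2sd([1.0, 2.0, 3.0]))
# -> [-0.5, 0.0, 0.5]
```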
For brevity, we provide the time-to-event dataset and analysis code rather than all of the code, GIS layers, etc. used to estimate calf-rearing areas and extract risk covariates for each individual; some data processing is, however, included in the analysis code. The data and code provided are sufficient to reproduce all results presented in the manuscript. The code is organized as a series of 4 pdf documents that were created in Rstudio using Rmarkdown. The analysis code is broken into the following sections that mirror the results in the manuscript: I) summary of adult female reproductive parameters and other summary statistics and tests, II) survivorship estimates from Kaplan-Meier along with log-rank tests, III) calf cause-specific mortality estimation, and IV) Cox proportional hazards for calf mortality risk analysis.
License: Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
#https://www.kaggle.com/c/facial-keypoints-detection/details/getting-started-with-r
#################################
### Variables for the downloaded files
data.dir   <- ' '
train.file <- paste0(data.dir, 'training.csv')
test.file  <- paste0(data.dir, 'test.csv')
#################################
### Load the csv files -- read.csv creates a data.frame, where each column can have a different type.
d.train <- read.csv(train.file, stringsAsFactors = FALSE)
d.test  <- read.csv(test.file,  stringsAsFactors = FALSE)
### training.csv has 7049 rows, each with 31 columns.
### The first 30 columns are keypoint coordinates, which R correctly reads as numeric.
### The last column is a string representation of the image.
### To look at a sample of the data, uncomment this line:
# head(d.train)
### Save the Image column as a separate variable and remove it from each data frame.
### Assigning NULL to a column removes it from a data.frame.
im.train      <- d.train$Image
d.train$Image <- NULL   # removes 'Image' from the data frame
im.test       <- d.test$Image
d.test$Image  <- NULL   # removes 'Image' from the data frame
#################################
# Each image is stored as a single space-separated string of pixel values.
# Convert the string to a vector of integers:
#   strsplit   splits the string on spaces
#   unlist     simplifies the result to a character vector
#   as.integer converts it to an integer vector
as.integer(unlist(strsplit(im.train[1], " ")))
as.integer(unlist(strsplit(im.test[1], " ")))
### Install and load the required libraries.
### The original tutorial targets Linux and OS X with a registered parallel backend;
### on Windows without one, replace all instances of %dopar% with %do%.
library("foreach", lib.loc="~/R/win-library/3.3")  # lib.loc is machine-specific; plain library(foreach) usually works

### Convert every image string into a row of integers
im.train <- foreach(im = im.train, .combine=rbind) %do% {
  as.integer(unlist(strsplit(im, " ")))
}
im.test <- foreach(im = im.test, .combine=rbind) %do% {
  as.integer(unlist(strsplit(im, " ")))
}
# The foreach loop evaluates the inner expression for each element of im.train
# and combines the results with rbind (combine by rows).
# Note: %do% runs the iterations sequentially; %dopar% is the parallel operator
# and needs a registered backend (e.g. via doParallel).
# im.train is now a matrix with 7049 rows (one per image) and 9216 columns (one per pixel).
### Save all four variables so they can be reloaded at any time with load('data.Rd'):
save(d.train, im.train, d.test, im.test, file = 'data.Rd')
# Each image is a vector of 96*96 = 9216 pixels.
# Convert these 9216 integers into a 96x96 matrix:
im <- matrix(data = rev(im.train[1,]), nrow = 96, ncol = 96)
# im.train[1,] returns the first row of im.train, i.e. the first training image.
# rev reverses the vector to match the interpretation of R's image function,
# which expects the origin to be in the lower-left corner.

# To visualize the image we use R's image function:
image(1:96, 1:96, im, col = gray((0:255)/255))
# Let's color the coordinates of the eyes and the nose:
points(96 - d.train$nose_tip_x[1],         96 - d.train$nose_tip_y[1],         col = "red")
points(96 - d.train$left_eye_center_x[1],  96 - d.train$left_eye_center_y[1],  col = "blue")
points(96 - d.train$right_eye_center_x[1], 96 - d.train$right_eye_center_y[1], col = "green")
# Another good check is to see how variable the data are.
# For example, where are the nose centers in the 7049 images? (this takes a while to run)
for(i in 1:nrow(d.train)) {
  points(96 - d.train$nose_tip_x[i], 96 - d.train$nose_tip_y[i], col = "red")
}
# There are quite a few outliers -- they could be labeling errors.
# Looking at one extreme example, there is no labeling error in this case,
# but it shows that not all faces are centered:
idx <- which.max(d.train$nose_tip_x)
im  <- matrix(data = rev(im.train[idx,]), nrow = 96, ncol = 96)
image(1:96, 1:96, im, col = gray((0:255)/255))
points(96 - d.train$nose_tip_x[idx], 96 - d.train$nose_tip_y[idx], col = "red")
# One of the simplest baselines: compute the mean of each keypoint's coordinates
# over the training set and use it as the prediction for every image.
colMeans(d.train, na.rm = TRUE)

# To build a submission file, apply these computed coordinates to every test instance:
p <- matrix(data = colMeans(d.train, na.rm = TRUE),
            nrow = nrow(d.test), ncol = ncol(d.train), byrow = TRUE)
colnames(p) <- names(d.train)
predictions <- data.frame(ImageId = 1:nrow(d.test), p)
head(predictions)
# The expected submission format has one keypoint per row, which is easy to get
# with the help of the reshape2 library:
library(reshape2)
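Assuming the wide `predictions` frame built above, the reshape to one keypoint per row could look like the following sketch; the two-row toy frame and the column names `FeatureName`/`Location` are illustrative stand-ins:

```r
library(reshape2)

# Toy two-row stand-in for the wide `predictions` data frame built above
p <- data.frame(ImageId = 1:2,
                left_eye_center_x = c(66.4, 66.4),
                left_eye_center_y = c(37.7, 37.7))

# melt turns one row per image into one row per (image, keypoint) pair
submission <- melt(p, id.vars = "ImageId",
                   variable.name = "FeatureName", value.name = "Location")
head(submission)
```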