Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source, object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and producing graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code through command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions for how they expect the program to behave while handling the data, which can also be stored in the simple object system. For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, targeted to help inform and guide the work of R users and statisticians. It provides information about the different types of statistical data analysis and methods, and the best scenarios for using each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the conditions or assumptions necessary for performing the various statistical methods or tests, and how to understand their results. The book also covers the different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book. It is the first book to provide a comprehensive description and a step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples ranging from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, the book represents a congruence of statistics and computer programming for research.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
ABSTRACT Meta-analysis is a statistical technique for combining results from different studies, and its use has been growing in the medical field. Thus, knowing not only how to interpret a meta-analysis but also how to perform one is fundamental today. Therefore, the objective of this article is to present the basic concepts and serve as a guide for conducting a meta-analysis using the R and RStudio software. To that end, the reader is given the basic commands in R and RStudio necessary for conducting a meta-analysis. An advantage of R is that it is free software. For a better understanding of the commands, two examples are presented in a practical way, in addition to a review of some basic concepts of this statistical technique. It is assumed that the data necessary for the meta-analysis have already been collected; that is, methodologies for the systematic review itself are not discussed. Finally, it is worth remembering that there are many other techniques used in meta-analyses that were not addressed in this work. However, with the two examples used, the article already enables the reader to proceed with good and robust meta-analyses. Level of Evidence V, Expert Opinion.
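As a minimal illustration of the kind of commands involved (a sketch using the metafor package with made-up study counts; the article's own example data and commands are not reproduced here):

```r
library(metafor)

# Hypothetical per-study counts for a binary outcome (illustrative only)
dat <- data.frame(
  study = c("A", "B", "C"),
  ai = c(12, 8, 20),  bi = c(88, 92, 180),   # events / non-events, treatment arm
  ci = c(20, 15, 35), di = c(80, 85, 165)    # events / non-events, control arm
)

# Log odds ratios and their sampling variances
dat <- escalc(measure = "OR", ai = ai, bi = bi, ci = ci, di = di, data = dat)

# Random-effects model (REML) and forest plot
res <- rma(yi, vi, data = dat, method = "REML")
summary(res)
forest(res, atransf = exp)   # pooled odds ratio displayed on the original scale
```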
This resource collects teaching materials that are originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and then refined/revised by Tao Wen to be used in the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.
This resource includes both R Notebooks and Python Jupyter Notebooks to teach the basics of R and Python coding, data analysis and data visualization, as well as building machine learning models in both programming languages by using authentic research data and questions. All of these R/Python scripts can be executed either on the CUAHSI JupyterHub or on your local machine.
This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the last decade, a plethora of algorithms have been developed for spatial ecology studies. In our case, we use some of these codes for underwater research work in applied ecology analysis of threatened endemic fishes and their natural habitat. For this, we developed scripts in the RStudio® environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The R packages employed are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), ModelMetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbuettel & Balamura, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).
It is important to follow all the codes in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance similarity metric because of their adequacy and robustness for studies with endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization for the code underlying the GLM and DOMAIN runs:
In the first instance, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend using 10,000 background points with regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we considered factors such as the extent of the study area and the type of study species to be important for the correct selection of the number of points (Pers. Obs.). We then extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) at the presence and background points (e.g., Hijmans and Elith, 2017).
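A minimal sketch of this step with dismo and raster; the object and file names (predictors, presence_points.csv) are illustrative assumptions, not taken from Code2_Extract_values_DWp_SC.R:

```r
library(raster)
library(dismo)

# Stack of predictor rasters and table of presence coordinates (illustrative paths)
predictors <- stack(list.files("predictors", pattern = "\\.tif$", full.names = TRUE))
presence   <- read.csv("presence_points.csv")   # columns: lon, lat

# 10,000 background points (Barbet-Massin et al., 2012)
set.seed(1)
bg <- randomPoints(predictors[[1]], n = 10000)

# Extract predictor values at presence and background locations
pres_vals <- extract(predictors, presence[, c("lon", "lat")])
bg_vals   <- extract(predictors, bg)

# Combine into one table with a presence/background indicator
env <- rbind(data.frame(pa = 1, pres_vals),
             data.frame(pa = 0, bg_vals))
```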
Subsequently, we subdivided both the presence and background point groups, each into 75% training data and 25% test data, following the method of Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, the 10-fold cross-validation method was selected, with the response variable presence assigned as a factor. If another variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
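A short sketch of the 75/25 split and the 10-fold cross-validation control, assuming the caret package and the combined presence/background table (env) from the previous sketch:

```r
library(caret)

# Presence/background indicator as a factor (the response variable)
env$pa <- factor(env$pa, levels = c(0, 1), labels = c("background", "presence"))

# 75% training / 25% test split, stratified by the response
set.seed(1)
in_train  <- createDataPartition(env$pa, p = 0.75, list = FALSE)
train_set <- env[in_train, ]
test_set  <- env[-in_train, ]

# 10-fold cross-validation as the training control
ctrl <- trainControl(method = "cv", number = 10)
```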
After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), from which we obtained the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and 5,000 iterations (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we considered selecting a validation interval of 4 random training points (personal test). The resulting plots show partial dependence as a function of each predictor variable.
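A hedged sketch of a GBM fit matching the description above (Gaussian distribution, 5,000 iterations); settings such as interaction.depth and cv.folds are illustrative assumptions, not the authors' values:

```r
library(gbm)

# Numeric 0/1 response for the Gaussian GBM described above
train_gbm <- transform(train_set, pa = as.numeric(pa == "presence"))

set.seed(1)
gbm_fit <- gbm(pa ~ .,
               data = train_gbm,
               distribution = "gaussian",
               n.trees = 5000,          # 5,000 iterations as described
               interaction.depth = 4,   # illustrative assumption
               cv.folds = 10)           # illustrative assumption

summary(gbm_fit)          # relative contribution of each predictor
plot(gbm_fit, i.var = 1)  # partial dependence for the first predictor
```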
Subsequently, the correlation between variables was computed with Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity between variables (Guisan & Hofer, 2003). A bivariate correlation threshold of ±0.70 is recommended for discarding highly correlated variables (e.g., Awan et al., 2021).
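A minimal sketch of the Pearson correlation screen with corrplot, using the |r| ≥ 0.70 threshold noted above (column handling reuses the train_set table assumed in the earlier sketches):

```r
library(corrplot)

pred_cols <- setdiff(names(train_set), "pa")
cor_mat   <- cor(train_set[, pred_cols], method = "pearson",
                 use = "pairwise.complete.obs")

corrplot(cor_mat, method = "number", type = "upper")

# Pairs exceeding the |r| >= 0.70 threshold are candidates for removal
which(abs(cor_mat) >= 0.70 & upper.tri(cor_mat), arr.ind = TRUE)
```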
Once the above codes were run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM method code (Code7_GLM_model.R). Here, we first ran a GLM per variable to obtain each variable's p-value (alpha ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The generated models are of polynomial degree, yielding linear and quadratic responses (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, where the resulting plots include the probability of occurrence and the values for continuous variables or categories for discrete variables. The points of the presence and background training groups are also included.
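A hedged sketch of the per-variable polynomial GLMs (presence coded as 1, background as 0, binomial likelihood assumed; Code7_GLM_model.R may differ in details). It reuses pred_cols and train_set from the sketches above:

```r
# One polynomial (degree-2) binomial GLM per predictor
train_glm <- transform(train_set, pa = as.numeric(pa == "presence"))

glm_single <- lapply(pred_cols, function(v) {
  glm(reformulate(sprintf("poly(%s, 2)", v), response = "pa"),
      data = train_glm, family = binomial)
})
names(glm_single) <- pred_cols

# p-values per variable (retain those with alpha <= 0.05)
lapply(glm_single, function(m) coef(summary(m))[, "Pr(>|z|)"])

# Response curve (probability of occurrence) for one continuous variable
v <- pred_cols[1]
newdat <- setNames(data.frame(seq(min(train_glm[[v]]), max(train_glm[[v]]),
                                  length.out = 100)), v)
plot(newdat[[v]], predict(glm_single[[v]], newdata = newdat, type = "response"),
     type = "l", xlab = v, ylab = "Probability of occurrence")
```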
On the other hand, a global GLM was also run, from which the generalized model is evaluated by means of a 2 x 2 contingency matrix that includes both observed and predicted records. A representation of this is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better modeling performance and avoid a high percentage of bias from type I (omission) or type II (commission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).
Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).
Model | Validation set: True | Validation set: False
Presence | A | B
Background | C | D
We then calculated the Overall accuracy and the True Skill Statistic (TSS). The first is used to assess the proportion of correctly predicted cases, while the second assesses the prevalence of correctly predicted cases (Olden and Jackson, 2002). The TSS also gives equal importance to presence prediction and to the correction for random performance (Fielding and Bell, 1997; Allouche et al., 2006).
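A minimal sketch of these metrics computed from the Table 1 cell counts (labels follow the table: A = true presences, B = false presences, C = true backgrounds, D = false backgrounds; the example counts are arbitrary):

```r
# Overall accuracy and TSS from a 2 x 2 contingency table
performance_metrics <- function(A, B, C, D) {
  sensitivity <- A / (A + D)              # presences correctly predicted
  specificity <- C / (B + C)              # backgrounds correctly predicted
  c(overall = (A + C) / (A + B + C + D),
    TSS     = sensitivity + specificity - 1)
}

# Example with arbitrary counts
performance_metrics(A = 40, B = 10, C = 35, D = 15)
```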
The last code (i.e., Code8_DOMAIN_SuitHab_model.R) is for species distribution modelling using the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background group subdivided into 75% training and 25% test, each. We only included the presence training subset and the predictor variables stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.
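A hedged sketch of the DOMAIN step with dismo, assuming the predictor stack and the presence training coordinates from the previous steps (object names such as presence_train_xy are illustrative):

```r
library(dismo)

# Fit DOMAIN (Gower's metric) with the presence training coordinates only
dom_fit <- domain(predictors, presence_train_xy)

# Predict habitat suitability over the predictor stack
dom_pred <- predict(predictors, dom_fit)
plot(dom_pred, main = "DOMAIN suitability")
```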
Regarding the model evaluation and estimation, we selected the following estimators:
1) Partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model's predictive performance for the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).
2) ROC/AUC curve for model validation, where an optimal performance threshold is estimated with an expected confidence of 75% to 99% probability (DeLong et al., 1988).
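A minimal sketch of the ROC/AUC validation with pROC (Robin et al., 2011), assuming suitability values extracted at the 25% test presences and test backgrounds (object names such as presence_test_xy are illustrative):

```r
library(pROC)

# Suitability values at the 25% test presences and test backgrounds
test_scores <- c(extract(dom_pred, presence_test_xy),
                 extract(dom_pred, background_test_xy))
test_labels <- c(rep(1, nrow(presence_test_xy)),
                 rep(0, nrow(background_test_xy)))

roc_obj <- roc(response = test_labels, predictor = test_scores)
auc(roc_obj)
ci.auc(roc_obj, method = "delong")  # DeLong et al. (1988) confidence interval
plot(roc_obj)
```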
Students learn about the importance of good data management and begin to explore QGIS and RStudio for spatial analysis purposes. Students explore National Land Cover Database raster data and synthetic vector point data on both platforms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The scripts in this folder were used to combine all call-statistic files per day into one file, resulting in nine files containing all call statistics per day. The script ‘merging_dataset.R’ was used to combine all days’ worth of call statistics and create subsets of two frequency ranges (18-32 and 32-96). The script ‘camera_data’ was used to combine all camera and observation data.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is a repository of code and datasets for the open-access paper in Linguistik Indonesia, the flagship journal of the Linguistic Society of Indonesia (Masyarakat Linguistik Indonesia [MLI]) (cf. the link in the references below).
To cite the paper (in APA 6th style): Rajeg, G. P. W., Denistia, K., & Rajeg, I. M. (2018). Working with a linguistic corpus using R: An introductory note with Indonesian negating construction. Linguistik Indonesia, 36(1), 1–36. doi: 10.26499/li.v36i1.71
To cite this repository: click Cite (the dark-pink button on the top left) and select a citation style from the dropdown button (the default style is the DataCite option on the right-hand side).
This repository consists of the following files:
1. The source R Markdown Notebook (.Rmd file) used to write the paper, containing the R code that generates the analyses in the paper.
2. A tutorial for downloading the Leipzig Corpus file used in the paper; it is freely available on the Leipzig Corpora Collection Download page.
3. Accompanying datasets as images and in .rds format, so that all code chunks in the R Markdown file can be run.
4. BibLaTeX and .csl files for the referencing and bibliography (APA 6th style).
5. A snippet of the R session info after running all code in the R Markdown file.
6. The RStudio project file (.Rproj). Double-click this file to open an RStudio session associated with the contents of this repository. See here and here for details on the project-based workflow in RStudio.
7. A .docx template file following the basic stylesheet for Linguistik Indonesia.
Put all these files in the same folder (including the downloaded Leipzig corpus file)! To render the R Markdown into an MS Word document, we use the bookdown R package (Xie, 2018); make sure this package is installed in R. Yihui Xie (2018). bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.6.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This figshare item provides data and R code to reproduce the analysis in the following paper: Weller, DE, ME Baker, and RS King. 2023. New methods for quantifying the effects of catchment spatial patterns on aquatic responses. Landscape Ecology. https://doi.org/10.1007/s10980-023-01706-x
This figshare item provides 14 files: five data files (.csv files), a list of models to be fitted by the R code (Modlist.csv), and seven files of R code (.R files). The file 0SpatialAnalysis.txt provides more information on the spatial analysis we used to generate distance distributions.
Data files: The five data files are subestPCB.csv, cdist.csv, hdist.csv, ldist.csv, and tdist.csv. The file subestPCB.csv provides catchment id numbers, names, and average measured PCB concentrations from fish tissues for 14 study subestuaries. The remaining four files provide the distance distributions for commercial land, high-density residential land, low-density residential land, and all land. Each distance file has four columns: junk, count, catchment id, and distance. Information in the junk column is not used. Count provides land area as the number of 30 by 30 meter (0.09 hectare) pixels. The variable called distance provides the distance to the subestuary shoreline in decameters.
R code: The R code reproduces the statistical analysis and most of the tables and figures from the published paper. We ran the code using RStudio. We invoked RStudio's New Project … > Existing Directory option to establish the directory containing the data files and R code files as an RStudio project. Then we ran five R scripts in sequence according to the initial numbers in the file names (1ReadData.R, 2FitModels.R, 3Tables.R, 4Figures.R, and 5FigureS3.R). Each program adds to the objects saved in the R workspace within the RStudio project. Figures and tables are saved in the subdirectory FiguresTables. The five numbered R files also use functions from two other files: DistWeightFunctionsV01.R and AuxillaryFunctionsV01.R. The first R program expects the five data files (subestPCB.csv, cdist.csv, hdist.csv, ldist.csv, and tdist.csv) to reside in the same directory as the program and the RStudio project. Comments in the R files provide additional information on how each one works.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used to collect and analyze data for the MPhil thesis, "Lithic Technological Change and Behavioral Responses to the Last Glacial Maximum Across Southwestern Europe." It contains the raw data collected from published literature and the R code used to run correspondence analysis on the data and create graphical representations of the results. It also contains notes to aid in interpreting the dataset, and a list detailing how variables in the dataset were grouped for analysis. The file "Diss Data.xlsx" contains the raw data collected from publications on Upper Paleolithic archaeological sites in France, Spain, and Italy; this data is the basis for all other files included in the repository. The document "Diss Data Notes.docx" contains detailed information about the raw data and is useful for understanding its context. "Revised Variable Groups.docx" lists all of the variables from the raw data considered "tool types" and the major categories into which they were sorted for analysis. "Group Definitions.docx" provides the criteria used to define the groups listed in the "Revised Variable Groups" document. "r_diss_data.xlsx" contains only the variables from the raw data that were considered for the correspondence analysis carried out in RStudio. The document "ca_barplot.R" contains the R code written to perform correspondence analysis and percent-composition analysis on the data from "R_Diss_Data.xlsx", as well as code for creating scatter plots and bar graphs displaying the results of the CA and percent-composition tests. The R packages used to carry out the analysis and to create graphical representations of the results are listed under "Software/Usage Instructions." "climate_curve.R" contains the R code used to create climate curves from NGRIP and GRIP data available open access from the Niels Bohr Institute Centre for Ice and Climate; the link to access this data is provided in "Related Resources" below.
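As a hedged illustration of the kind of correspondence analysis described for "ca_barplot.R" (the packages and column layout below are assumptions, not the thesis code itself):

```r
library(readxl)
library(FactoMineR)
library(factoextra)

# Sites (rows) by tool-type counts (columns); first column assumed to hold site labels
tool_counts <- read_excel("r_diss_data.xlsx")
ca_res <- CA(as.data.frame(tool_counts[, -1]), graph = FALSE)

fviz_ca_biplot(ca_res)   # scatter plot of row and column scores

# Percent composition per site, suitable for a stacked bar graph
pct <- prop.table(as.matrix(tool_counts[, -1]), margin = 1) * 100
barplot(t(pct), legend.text = TRUE, las = 2)
```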
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets (.Rmd, .Rproj, .rds) are ready to use within the R software for statistical programming with the RStudio graphical user interface (https://posit.co/download/rstudio-desktop/). Please copy the folder structure into one single directory and follow the instructions given in the .Rmd file. Files and data are listed and described as follows:
Main directory files: results_fpath
Population estimation files: wpop_files
Steepness and elevation analysis derived from SRTM and processed in Google Earth Engine for landslides, mountain regions and urban centers in cities: gee_files
Standard deviation analysis derived from SRTM and processed in Google Earth Engine for mean slope in mountain regions and urban centers in cities: gee_sd
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file compiles the different datasets used and analyses made in the paper "Visual Continuous Time Preferences". Both RStudio and Stata were used for the analysis: the first for descriptive statistics and graphs, the second for regressions. We join the datasets for both analyses.
"Analysis VCTP - RStudio.R" is the RStudio analysis. "Analysis VCTP - Stata.do" is the Stata analysis.
The RStudio datasets are: "data_Seville.xlsx" is the dataset of observations. "FormularioEng.xlsx" is the dataset of control variables.
The Stata datasets are: "data_Seville_Stata.dta" is the dataset of observations. "FormularioEng.dta" is the dataset of control variables.
Additionally, the experimental instructions for the six experimental conditions are also available: "Hypothetical MPL-VCTP.pdf" is the instructions and task for hypothetical payment and MPL answered before VCTP. "Hypothetical VCTP-MPL.pdf" is the instructions and task for hypothetical payment and VCTP answered before MPL. "OneTenth MPL-VCTP.pdf" is the instructions and task for BRIS payment and MPL answered before VCTP. "OneTenth VCTP-MPL.pdf" is the instructions and task for BRIS payment and VCTP answered before MPL. "Real MPL-VCTP.pdf" is the instructions and task for real payment and MPL answered before VCTP. "Real VCTP-MPL.pdf" is the instructions and task for real payment and VCTP answered before MPL.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Currently, there are many application programs for performing statistical analysis: SPSS, EViews, and Minitab are commercial software, while PSPP, JASP, and PAST are free software. STATCAL is a user-friendly statistical application developed in the R programming language, in RStudio, using various R packages. STATCAL is designed to be as simple as possible, so that only a few steps are needed to obtain results. Various statistical tests are available in STATCAL, such as normality, homogeneity, comparison of two or more means, correlation, association between categorical variables, reliability, linear regression, panel data regression, covariance-based structural equation modeling, and partial least squares path modeling. STATCAL also provides tutorial videos and a guidance menu to make it easy for users.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Data points in this dataset were obtained through the following steps: To assess the secretion efficiency of the constructs, 96 colonies from the selection plates were evaluated using the workflow presented in Figure Workflow. We picked transformed colonies and cultured them in 400 μL TAP medium for 7 days in deep-well plates (Corning Axygen®, No.: PDW500CS, Thermo Fisher Scientific Inc., Waltham, MA), covered with Breathe-Easy® (Sigma-Aldrich®). Cultivation was performed on a rotary shaker, set to 150 rpm, under constant illumination (50 μmol photons/m2s). Then a 100 μL sample was transferred to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA) and fluorescence was measured using an Infinite® M200 PRO plate reader (Tecan, Männedorf, Switzerland). Fluorescence was measured at excitation 575/9 nm and emission 608/20 nm. Supernatant samples were obtained by spinning the deep-well plates at 3000 × g for 10 min and transferring 100 μL from each well to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA), followed by fluorescence measurement. To compare the constructs, R Statistic version 3.3.3 was used to perform one-way ANOVA (with Tukey's test); to test statistical hypotheses, the significance level was set at 0.05. Graphs were generated in RStudio v1.0.136. The code is deposited herein.
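A minimal sketch of the one-way ANOVA with Tukey's test described above (column names are illustrative assumptions; the deposited scripts define the actual ones):

```r
# One-way ANOVA with Tukey's HSD (alpha = 0.05)
dat <- read.csv("sup_raw.csv")   # e.g. columns: construct, fluorescence

fit <- aov(fluorescence ~ construct, data = dat)
summary(fit)       # one-way ANOVA table
TukeyHSD(fit)      # pairwise comparisons between constructs
```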
Info
ANOVA_Turkey_Sub.R -> code for ANOVA analysis in R statistic 3.3.3
barplot_R.R -> code to generate bar plot in R statistic 3.3.3
boxplotv2.R -> code to generate boxplot in R statistic 3.3.3
pRFU_+_bk.csv -> relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
sup_+_bl.csv -> supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
sup_raw.csv -> supernatant mCherry fluorescence dataset of 96 colonies for each construct.
who_+_bl2.csv -> whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
who_raw.csv -> whole culture mCherry fluorescence dataset of 96 colonies for each construct.
who_+_Chlo.csv -> whole culture chlorophyll fluorescence dataset of 96 colonies for each construct.
Anova_Output_Summary_Guide.pdf -> Explains the content of the ANOVA files
ANOVA_pRFU_+_bk.doc -> ANOVA of relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
ANOVA_sup_+_bk.doc -> ANOVA of supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
ANOVA_who_+_bk.doc -> ANOVA of whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii
ANOVA_Chlo.doc -> ANOVA of whole culture chlorophyll fluorescence of all constructs, plus average and standard deviation values.
Consider citing our work.
Molino JVD, de Carvalho JCM, Mayfield SP (2018) Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii. PLoS ONE 13(2): e0192433. https://doi.org/10.1371/journal.pone.0192433
This dataset contains original quantitative datafiles, analysis data, a codebook, R scripts, syntax for replication, the original output from RStudio, and figures from a statistical program. The analyses can be found in Chapter 5 of my PhD dissertation, i.e., ‘Political Factors Affecting the EU Legislative Decision-Making Speed’. The data supporting the findings of this study are accessible and replicable. Restrictions apply to the availability of these data, which were used under license for this study. The datafiles include:
File name of R script: Chapter 5 script.R
File name of syntax: Syntax for replication 5.0.docx
File name of the original output from RStudio: The original output 5.0.pdf
File name of code book: Codebook 5.0.txt
File name of the analysis data: data5.0.xlsx
File name of the dataset: Original quantitative data for Chapter 5.xlsx
File name of the dataset: Codebook of policy responsiveness.pdf
File name of figures: Chapter 5 Figures.zip
Data analysis software: RStudio, R version 4.1.0 (2021-05-18) -- "Camp Pontanezen", Copyright (C) 2021 The R Foundation for Statistical Computing, Platform: x86_64-apple-darwin17.0 (64-bit)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the dataset and R script used for the statistical analysis in the study investigating the association between physical activity and musculoskeletal pain in university staff during the COVID-19 pandemic.
The files include:
banco_dor_covid_atividade_fisica.xlsx: contains sociodemographic variables and musculoskeletal pain reports from study participants.
Script_dor_covid19_atividade_fisica.R: performs descriptive statistics, logistic regression, and Cronbach’s Alpha calculation with confidence intervals using the bootstrap method.
This study aims to evaluate the impact of physical activity on musculoskeletal pain incidence using robust statistical methods.
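A hedged sketch of the analyses described above (variable and item-column names are illustrative assumptions; the deposited script defines the actual ones):

```r
library(readxl)
library(boot)

dat <- read_excel("banco_dor_covid_atividade_fisica.xlsx")

# Logistic regression of musculoskeletal pain (0/1) on physical activity and covariates
fit <- glm(pain ~ activity_level + age + sex, data = dat, family = binomial)
summary(fit)
exp(cbind(OR = coef(fit), confint.default(fit)))   # odds ratios with Wald 95% CIs

# Cronbach's alpha with a bootstrap (percentile) confidence interval
cronbach_alpha <- function(items) {
  k <- ncol(items)
  k / (k - 1) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
}
items <- dat[, grep("^item", names(dat))]          # assumed scale-item columns
b <- boot(items, function(d, i) cronbach_alpha(d[i, ]), R = 2000)
boot.ci(b, type = "perc")
```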
The associated scientific article is currently under peer review and will be added to this repository once published.
Authors:
Affiliation: Federal University of Alagoas (UFAL)
Keywords: physical activity, musculoskeletal pain, COVID-19, statistical analysis, logistic regression, Cronbach’s Alpha, RStudio.
License: Creative Commons Attribution 4.0 International (CC-BY 4.0)
https://doi.org/10.5061/dryad.mpg4f4r89
This repository contains a .csv file and an R script with the full raw metadata and code used to analyze and plot the data shown in Figure 6 and Figure S4. Within the .csv file, columns indicate metadata that may be sources of variability in a comprehensive analysis of AAV transduction efficiency. The code can be used to perform FAMD (factor analysis of mixed data) on the existing dataset and can be adapted to create similar plots for additional datasets with the same structure. Both analysis and plotting parameters are included in the script.
The headers in the .csv files with their units (if applicable) are described as follows: age (days) weight (grams) single_dual: injection site contained two viruse...
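A hedged sketch of a FAMD on a metadata table like the one described, using FactoMineR; the file name and column handling are assumptions, not the repository's own script:

```r
library(FactoMineR)
library(factoextra)

# Metadata table with mixed numeric and categorical columns (illustrative file name)
meta <- read.csv("transduction_metadata.csv", stringsAsFactors = TRUE)

famd_res <- FAMD(meta, ncp = 5, graph = FALSE)
fviz_famd_ind(famd_res)                           # samples on the first two dimensions
fviz_contrib(famd_res, choice = "var", axes = 1)  # variable contributions to dimension 1
```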
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Multi-approach study of the use and abuse of alcohol, licit and illicit drugs, and similar substances (Legal opinion no. 3.556.646 - CEP/UNIR 2019). 1st year (August 2019 - July 2020), excessive alcohol consumption in Porto Velho-RO: categorization by CAGE. [Multiabordagem do uso e abuso de álcool, drogas lícitas e ilícitas e afins (Parecer n°. 3.556.646 - CEP/UNIR 2019). Ano 1 (agosto de 2019 - julho de 2020), abuso do consumo de álcool em Porto Velho - RO: categorização pelo CAGE.]
Files / [arquivos]:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Analysis files and data for study of effect of police-involved civilian deaths on homicide rates. Includes csv of aggregated homicide and aggravated assault data for 44 US cities; an R project file; an R script for reproducible cleaning process; an interrupted time series analytical file, which also produces plots; a meta-analysis file, which also produces forest plots; records of police involved shootings with links to news reports.
To use: Download everything into one folder and open the R project file with RStudio. I have tried to make these fully functional on their own to maximise reproducibility. You will likely need to download packages (but RStudio should prompt you to install the ones that are missing). If you want to re-run the cleaning file, you will have to download the UCR and city crime data; I have provided links to these sources. Otherwise, everything should run out of the box!
Disclaimer: I do not own the original data files from cities and UCR. While I have not included these case-level data, they are all publicly available and I have provided links, aside from Tampa which I acquired through a data request. I am happy to assist any interested researchers with getting the source data.
Update 17 June 2022: Previous versions did not include 'final protested list.rds', which is essential to run analyses. This is now added.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study investigated predictive processing during both silent and oral reading, revealing a more pronounced predictability effect in the context of oral reading.
GNU General Public License 2.0 http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Hello. As a big ski jumping fan, I would like to invite everybody to a project called "Ski Jumping Data Center". The primary goal is as below:
Collect as much data about ski jumping as possible and create as many useful insights from it as possible
In mid-September last year (12.09.20) I thought, "Hmm, I don't know of any statistical analyses of ski jumping." In fact, the only easily found public data analysis about SJ I know of is https://rstudio-pubs-static.s3.amazonaws.com/153728_02db88490f314b8db409a2ce25551b82.html
The question is: why? This discipline is in fact overloaded with data, but almost nobody has taken the topic seriously. Therefore I decided to start collecting and analyzing data. However, the amount of work needed to capture the various data (i.e., jumps and results of competitions) was so big, and there are so many ways to use this information, that making it public was obvious. In fact, I plan to expand my database to be as big as possible, but it requires more time and (I hope) more help.
The data below were created (in a broad sense) by merging a large number (>6000) of PDFs with the results of almost 4000 ski jumping competitions organized between (roughly) 2009 and 2021. Creating this dataset cost me about 150 hours of coding and parsing data and over 4 months of hard work. My current algorithm can parse the results of consecutive events almost instantly, so this dataset can be easily extended. For details see the GitHub page: https://github.com/wrotki8778/Ski_jumping_data_center The observations contain standard information about every jump - style points, distance, take-off speed, wind etc. The main advantage of this dataset is the number of jumps - it is quite high (at the time of uploading, almost 250,000 rows), so this data can be analyzed in various ways, although the number of columns is not as extensive.
A big "thank you" should go to the creators of the tika package, because without their contribution I probably wouldn't have created this dataset at all.
I plan to draw at least a few insights from this data: 1) Are the wind/gate factors well adjusted? 2) How strong is the correlation between distance and style marks? Is the judging always fair? 3) (advanced) Can we create a model that predicts the performance/distance of an athlete in a given competition? Maybe some deep learning model? 4) Which characteristics of athletes are important for achieving the best jumps - height/weight etc.?