61 datasets found
  1. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Available download formats: txt
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    figshare
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions for how the program should behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both textbook and manual for R statistics, particularly in academic research, data analytics, and computer programming, targeted to help inform and guide the work of R users and statisticians. It provides information about the different types of statistical data analysis and methods, and the best scenarios for using each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures, including a description of the conditions or assumptions necessary for performing the various statistical methods or tests, and how to understand their results. The book also covers the different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book.

    It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples: from importing and storing datasets in R as objects, coding and calling the methods or functions for manipulating those datasets or objects, factorization, and vectorization, to reasoning about, interpreting, and storing the results for future use, and producing graphical visualizations and representations. Thus, a congruence of statistics and computer programming for research.
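
    The description above highlights two of R's distinctive features: vectorized arithmetic and user-defined functions. A minimal illustrative sketch (the function and data here are invented, not taken from the book):

```r
# Hypothetical example: a user-defined function applied to a whole
# numeric vector at once, showing R's vectorized arithmetic.
z_score <- function(x) {
  # centre and scale a numeric vector
  (x - mean(x)) / sd(x)
}

heights <- c(150, 160, 170, 180, 190)
z <- z_score(heights)   # operates on the entire vector, no loop needed
round(z, 2)
```

The result can be stored as an object in the workspace and reused, as the description notes.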

  2. Data from: HOW TO PERFORM A META-ANALYSIS: A PRACTICAL STEP-BY-STEP GUIDE...

    • scielo.figshare.com
    tiff
    Updated Jun 4, 2023
    Cite
    Diego Ariel de Lima; Camilo Partezani Helito; Lana Lacerda de Lima; Renata Clazzer; Romeu Krause Gonçalves; Olavo Pires de Camargo (2023). HOW TO PERFORM A META-ANALYSIS: A PRACTICAL STEP-BY-STEP GUIDE USING R SOFTWARE AND RSTUDIO [Dataset]. http://doi.org/10.6084/m9.figshare.19899537.v1
    Available download formats: tiff
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELO journals
    Authors
    Diego Ariel de Lima; Camilo Partezani Helito; Lana Lacerda de Lima; Renata Clazzer; Romeu Krause Gonçalves; Olavo Pires de Camargo
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    ABSTRACT Meta-analysis is an adequate statistical technique to combine results from different studies, and its use has been growing in the medical field. Thus, not only knowing how to interpret a meta-analysis, but also knowing how to perform one, is fundamental today. The objective of this article is therefore to present the basic concepts and serve as a guide for conducting a meta-analysis using the R and RStudio software. To this end, the reader is given the basic commands in R and RStudio necessary for conducting a meta-analysis. The advantage of R is that it is free software. For a better understanding of the commands, two examples are presented in a practical way, in addition to a review of some basic concepts of this statistical technique. It is assumed that the data necessary for the meta-analysis have already been collected; that is, methodologies for systematic review are not discussed here. Finally, it is worth remembering that there are many other techniques used in meta-analyses that were not addressed in this work. Nevertheless, with the two examples used, the article enables the reader to proceed with good and robust meta-analyses. Level of Evidence V, Expert Opinion.
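
    The core of a fixed-effect meta-analysis can be sketched in base R with inverse-variance weighting (the effect sizes and variances below are invented; the article itself uses dedicated R meta-analysis commands not reproduced here):

```r
# Minimal fixed-effect meta-analysis by inverse-variance weighting.
# yi = per-study effect sizes, vi = their variances (invented values).
yi <- c(0.30, 0.10, 0.25)
vi <- c(0.04, 0.02, 0.05)

w      <- 1 / vi                      # inverse-variance weights
pooled <- sum(w * yi) / sum(w)        # pooled effect estimate
se     <- sqrt(1 / sum(w))            # standard error of the pooled effect
ci     <- pooled + c(-1.96, 1.96) * se
round(c(estimate = pooled, lower = ci[1], upper = ci[2]), 3)
```

In practice, packages with ready-made forest plots and heterogeneity statistics are normally used instead of hand-rolled weighting.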

  3. Python and R Basics for Environmental Data Sciences

    • search.dataone.org
    • hydroshare.org
    Updated Dec 5, 2021
    Cite
    Tao Wen (2021). Python and R Basics for Environmental Data Sciences [Dataset]. https://search.dataone.org/view/sha256%3Aa4a66e6665773400ae76151d376607edf33cfead15ffad958fe5795436ff48ff
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Tao Wen
    Area covered
    Description

    This resource collects teaching materials originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and then refined/revised by Tao Wen for use in the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.

    This resource includes both R Notebooks and Python Jupyter Notebooks to teach the basics of R and Python coding, data analysis and data visualization, as well as building machine learning models in both programming languages by using authentic research data and questions. All of these R/Python scripts can be executed either on the CUAHSI JupyterHub or on your local machine.

    This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.

  4. Codes in R for spatial statistics analysis, ecological response models and...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Apr 24, 2025
    Cite
    D. W. Rössel-Ramírez; D. W. Rössel-Ramírez; J. Palacio-Núñez; J. Palacio-Núñez; S. Espinosa; S. Espinosa; J. F. Martínez-Montoya; J. F. Martínez-Montoya (2025). Codes in R for spatial statistics analysis, ecological response models and spatial distribution models [Dataset]. http://doi.org/10.5281/zenodo.7603557
    Available download formats: bin
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    D. W. Rössel-Ramírez; D. W. Rössel-Ramírez; J. Palacio-Núñez; J. Palacio-Núñez; S. Espinosa; S. Espinosa; J. F. Martínez-Montoya; J. F. Martínez-Montoya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the last decade, a plethora of algorithms have been developed for spatial ecology studies. In our case, we use some of these codes for underwater research work in applied ecology analysis of threatened endemic fishes and their natural habitat. For this, we developed codes in the RStudio® scripting environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The R packages employed are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), modelmetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbeuttel & Balamura, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).

    It is important to follow all the codes in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance similarity metric because of their adequacy and robustness for studies with endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization of the code used in the GLM and DOMAIN runs:

    In the first instance, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend the use of 10,000 background points when using regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we considered factors such as the extent of the area and the type of study species important for the correct selection of the number of points (pers. obs.). We then extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) as a function of the presence and background points (e.g., Hijmans and Elith, 2017).

    Subsequently, we subdivided both the presence and background point groups into 75% training data and 25% test data, following the method of Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, we selected the 10-fold cross-validation method, with the response variable presence assigned as a factor. If some other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
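
    The 75%/25% split described above can be sketched in base R as follows (the data frame and column names are invented for illustration; the actual split is in the numbered code files):

```r
# Hypothetical sketch of a 75% training / 25% test split.
set.seed(42)
pts <- data.frame(presence = factor(rep(c(1, 0), c(200, 800))))  # presence + background points

idx   <- sample(seq_len(nrow(pts)), size = 0.75 * nrow(pts))
train <- pts[idx, , drop = FALSE]   # 75% training data
test  <- pts[-idx, , drop = FALSE]  # 25% test data

# The 10-fold cross-validation control would then be set up,
# e.g. with caret::trainControl(method = "cv", number = 10).
```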

    After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), which gave us the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and 5,000 iterations (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we selected a validation interval of 4 random training points (personal test). The resulting plots were the partial dependence plots, as a function of each predictor variable.

    Subsequently, we ran the correlation of the variables with Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity between variables (Guisan & Hofer, 2003). A bivariate correlation of ±0.70 is the recommended threshold for discarding highly correlated variables (e.g., Awan et al., 2021).
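
    The Pearson screen with the ±0.70 cut-off can be sketched in base R (the toy predictor matrix below is invented; the real predictors are the bioclimatic/topographic layers):

```r
# Hypothetical multicollinearity screen: flag variable pairs with |r| > 0.70.
set.seed(1)
X <- data.frame(bio1 = rnorm(100))
X$bio2  <- X$bio1 + rnorm(100, sd = 0.1)  # deliberately near-collinear with bio1
X$slope <- rnorm(100)

r <- cor(X, method = "pearson")
high <- which(abs(r) > 0.70 & upper.tri(r), arr.ind = TRUE)

# pairs of highly correlated variables to consider discarding:
data.frame(var1 = rownames(r)[high[, 1]], var2 = colnames(r)[high[, 2]])
```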

    Once the above codes were run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM method code (Code7_GLM_model.R). Here, we first ran the GLM models per variable to obtain the p-significance value of each variable (alpha ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The generated models are of polynomial degree, to obtain both linear and quadratic responses (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, whose plots show the probability of occurrence against values of continuous variables or categories of discrete variables. The points of the presence and background training groups are also included.
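
    A per-variable GLM with linear and quadratic terms, as described, might look like the following (toy data and hypothetical variable names, not the project's actual code):

```r
# Hypothetical per-variable binomial GLM with linear + quadratic terms.
set.seed(7)
d <- data.frame(presence = rbinom(300, 1, 0.4), bio1 = rnorm(300))

m <- glm(presence ~ poly(bio1, 2), data = d, family = binomial)
summary(m)$coefficients                  # p-values per term (alpha <= 0.05 screen)
prob <- predict(m, type = "response")    # probability of occurrence, for response curves
```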

    On the other hand, a global GLM was also run, and the generalized model was evaluated by means of a 2 x 2 contingency matrix including both observed and predicted records. A representation of this is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better modeling performance and avoid a high percentage of type I (omission) or type II (commission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).

    Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).

                       Validation set
    Model              True        False
    Presence           A           B
    Background         C           D

    We then calculated the Overall accuracy and True Skill Statistic (TSS) metrics. The first assesses the proportion of correctly predicted cases, while the second assesses the prevalence of correctly predicted cases (Olden and Jackson, 2002). The TSS also gives equal importance to the prevalence of presence prediction and to the correction for random performance (Fielding and Bell, 1997; Allouche et al., 2006).
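
    Using the Table 1 cells (A = true presences, B = false presences, C = true backgrounds, D = false backgrounds), one common formulation of these metrics can be sketched as follows (the counts are invented for illustration):

```r
# Hypothetical counts from a 2 x 2 contingency matrix as in Table 1.
A <- 40; B <- 10; C <- 35; D <- 15

overall     <- (A + C) / (A + B + C + D)  # proportion correctly predicted
sensitivity <- A / (A + D)                # correctly predicted presences
specificity <- C / (B + C)                # correctly predicted backgrounds
TSS         <- sensitivity + specificity - 1
round(c(overall = overall, TSS = TSS), 3)
```

TSS ranges from -1 to +1, with +1 indicating perfect agreement and values near zero indicating performance no better than random.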

    The last code (i.e., Code8_DOMAIN_SuitHab_model.R) is for species distribution modelling using the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background group subdivided into 75% training and 25% test, each. We only included the presence training subset and the predictor variables stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.

    Regarding the model evaluation and estimation, we selected the following estimators:

    1) partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model's prediction performance for the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).

    2) the ROC/AUC curve for model validation, where an optimal performance threshold is estimated to obtain an expected confidence of 75% to 99% probability (DeLong et al., 1988).
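
    The AUC itself has a simple rank-based interpretation — the probability that a randomly chosen presence scores higher than a randomly chosen background — which can be sketched in base R (toy scores, invented for illustration; packages such as pROC are normally used):

```r
# Hypothetical predicted scores at presence and background sites.
pos <- c(0.9, 0.8, 0.7, 0.6)   # presences
neg <- c(0.5, 0.4, 0.3, 0.65)  # backgrounds

# Mann-Whitney formulation of AUC: wins count 1, ties count 0.5.
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
auc
```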

  5. Data management and introduction to QGIS and RStudio for spatial analysis

    • qubeshub.org
    Updated May 22, 2020
    Cite
    Meghan MacLean (2020). Data management and introduction to QGIS and RStudio for spatial analysis [Dataset]. http://doi.org/10.25334/48G8-6Y44
    Dataset updated
    May 22, 2020
    Dataset provided by
    QUBES
    Authors
    Meghan MacLean
    Description

    Students learn about the importance of good data management and begin to explore QGIS and RStudio for spatial analysis purposes. They will explore National Land Cover Database raster data and made-up vector point data on both platforms.

  6. R scripts used to analyze rodent call statistics generated by 'DeepSqueak'

    • figshare.com
    zip
    Updated May 28, 2021
    Cite
    Mathijs Blom (2021). R scripts used to analyze rodent call statistics generated by 'DeepSqueak' [Dataset]. http://doi.org/10.6084/m9.figshare.14696304.v1
    Available download formats: zip
    Dataset updated
    May 28, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mathijs Blom
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The scripts in this folder were used to combine all call statistic files per day into one file, resulting in nine files containing all call statistics per day. The script ‘merging_dataset.R’ was used to combine all days’ worth of call statistics and create subsets of two frequency ranges (18-32 and 32-96). The script ‘camera_data’ was used to combine all camera and observation data.
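
    The merge-and-subset step might look like the following sketch (the column name `PrincipalFrequency` and the toy data are assumptions, not taken from the actual scripts):

```r
# Hypothetical sketch of combining per-day call statistics and
# subsetting by frequency range. In practice each day's file would
# be read, e.g.:
#   files   <- list.files("calls", pattern = "\\.csv$", full.names = TRUE)
#   per_day <- lapply(files, read.csv)
per_day <- list(
  data.frame(day = 1, PrincipalFrequency = c(20, 45, 90)),
  data.frame(day = 2, PrincipalFrequency = c(25, 30, 60))
)
all_calls <- do.call(rbind, per_day)  # one data frame for all days

# subsets for the two frequency ranges mentioned (18-32 and 32-96)
low  <- subset(all_calls, PrincipalFrequency >= 18 & PrincipalFrequency < 32)
high <- subset(all_calls, PrincipalFrequency >= 32 & PrincipalFrequency <= 96)
```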

  7. Data from: Working with a linguistic corpus using R: An introductory note...

    • bridges.monash.edu
    • researchdata.edu.au
    txt
    Updated May 30, 2023
    Cite
    Gede Primahadi Wijaya Rajeg; Karlina Denistia; I Made Rajeg (2023). Working with a linguistic corpus using R: An introductory note with Indonesian Negating Construction [Dataset]. http://doi.org/10.4225/03/5a7ee2ac84303
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg; Karlina Denistia; I Made Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is a repository of code and datasets for the open-access paper in Linguistik Indonesia, the flagship journal of the Linguistic Society of Indonesia (Masyarakat Linguistik Indonesia [MLI]) (cf. the link in the references below).

    To cite the paper (in APA 6th style): Rajeg, G. P. W., Denistia, K., & Rajeg, I. M. (2018). Working with a linguistic corpus using R: An introductory note with Indonesian negating construction. Linguistik Indonesia, 36(1), 1–36. doi: 10.26499/li.v36i1.71

    To cite this repository: click Cite (the dark-pink button on the top-left) and select the citation style through the dropdown button (the default style is the Datacite option on the right-hand side).

    This repository consists of the following files:
    1. Source R Markdown Notebook (.Rmd file) used to write the paper, containing the R codes that generate the analyses in the paper.
    2. Tutorial to download the Leipzig Corpus file used in the paper. It is freely available on the Leipzig Corpora Collection Download page.
    3. Accompanying datasets as images and in .rds format, so that all code chunks in the R Markdown file can be run.
    4. BibLaTeX and .csl files for the referencing and bibliography (in APA 6th style).
    5. A snippet of the R session info after running all codes in the R Markdown file.
    6. RStudio project file (.Rproj). Double-click this file to open an RStudio session associated with the content of this repository. See here and here for details on project-based workflow in RStudio.
    7. A .docx template file following the basic stylesheet for Linguistik Indonesia.

    Put all these files in the same folder (including the downloaded Leipzig corpus file)! To render the R Markdown into an MS Word document, we use the bookdown R package (Xie, 2018). Make sure this package is installed in R.

    Yihui Xie (2018). bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.6.

  8. Data and R code for "New methods for quantifying the effects of catchment...

    • smithsonian.figshare.com
    txt
    Updated Jul 13, 2024
    Cite
    Donald Weller; Matthew Baker; Ryan King (2024). Data and R code for "New methods for quantifying the effects of catchment spatial patterns on aquatic responses" [Dataset]. http://doi.org/10.25573/serc.23557056.v2
    Available download formats: txt
    Dataset updated
    Jul 13, 2024
    Dataset provided by
    Smithsonian Environmental Research Center
    Authors
    Donald Weller; Matthew Baker; Ryan King
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This figshare item provides data and R code to reproduce the analysis in the following paper: Weller, DE, ME Baker, and RS King. 2023. New methods for quantifying the effects of catchment spatial patterns on aquatic responses. Landscape Ecology. https://doi.org/10.1007/s10980-023-01706-x

    This figshare item provides 14 files: five data files (.csv), a list of models to be fitted by the R code (Modlist.csv), and seven files of R code (.R). The file 0SpatialAnalysis.txt provides more information on the spatial analysis we used to generate distance distributions.

    Data files
    The five data files are:
    • subestPCB.csv
    • cdist.csv
    • hdist.csv
    • ldist.csv
    • tdist.csv
    The file subestPCB.csv provides catchment id numbers, names, and average measured PCB concentrations from fish tissues for 14 study subestuaries. The remaining four files provide the distance distributions for commercial land, high-density residential land, low-density residential land, and all land. Each distance file has four columns: junk, count, catchment id, and distance. Information in the junk column is not used. Count provides land area as the number of 30-by-30-meter (0.09 hectare) pixels. The distance variable provides the distance to the subestuary shoreline in decameters.

    R code
    The R code reproduces the statistical analysis and most of the tables and figures from the published paper. We ran the code in RStudio. We invoked RStudio’s New Project … > Existing Directory option to establish the directory containing the data files and R code files as an RStudio project. We then ran five R programs in sequence according to the initial numbers in the file names (1ReadData.R, 2FitModels.R, 3Tables.R, 4Figures.R, and 5FigureS3.R). Each program adds to the objects saved in the R workspace within the RStudio project. Figures and tables are saved in the subdirectory FiguresTables. The five numbered R files also use functions from two other files: DistWeightFunctionsV01.R and AuxillaryFunctionsV01.R. The first program expects the five data files (subestPCB.csv, cdist.csv, hdist.csv, ldist.csv, and tdist.csv) to reside in the same directory as the program and the RStudio project. Comments in the R files provide additional information on how each one works.
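
    The run order described above can be sketched as a simple loop (a sketch only; in practice each file is run interactively from the RStudio project):

```r
# Source the numbered scripts in sequence so later ones can use
# objects the earlier ones left in the workspace.
scripts <- c("1ReadData.R", "2FitModels.R", "3Tables.R",
             "4Figures.R", "5FigureS3.R")
for (f in scripts) {
  if (file.exists(f)) source(f)  # run from the project directory
}
```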

  9. Research data supporting 'Lithic Technological Change and Behavioral...

    • repository.cam.ac.uk
    bin, docx, xlsx
    Updated Sep 8, 2020
    Cite
    Carroll, Peyton (2020). Research data supporting 'Lithic Technological Change and Behavioral Responses to the Last Glacial Maximum Across Southwestern Europe' [Dataset]. http://doi.org/10.17863/CAM.56697
    Available download formats: xlsx (56230 bytes), bin (6066 bytes), bin (46471 bytes), xlsx (542779 bytes), docx (347181 bytes)
    Dataset updated
    Sep 8, 2020
    Dataset provided by
    University of Cambridge
    Apollo
    Authors
    Carroll, Peyton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Europe
    Description

    This dataset was used to collect and analyze data for the MPhil thesis, "Lithic Technological Change and Behavioral Responses to the Last Glacial Maximum Across Southwestern Europe." It contains the raw data collected from the published literature, the R code used to run correspondence analysis on the data and create graphical representations of the results, notes to aid in interpreting the dataset, and a list detailing how variables in the dataset were grouped for analysis.

    The file "Diss Data.xlsx" contains the raw data collected from publications on Upper Paleolithic archaeological sites in France, Spain, and Italy. This data is the basis for all other files included in the repository. The document "Diss Data Notes.docx" contains detailed information about the raw data and is useful for understanding its context. "Revised Variable Groups.docx" lists all of the variables from the raw data considered "tool types" and the major categories into which they were sorted for analysis. "Group Definitions.docx" provides the criteria used to make the groups listed in the "Revised Variable Groups" document. "r_diss_data.xlsx" contains only the variables from the raw data that were considered for the correspondence analysis carried out in RStudio. The document "ca_barplot.R" contains the RStudio code written to perform correspondence analysis and percent composition analysis on the data from "r_diss_data.xlsx"; it also contains code for creating scatter plots and bar graphs displaying the results of the CA and percent composition tests. The RStudio packages used to carry out the analysis and to create graphical representations of the results are listed under "Software/Usage Instructions." "climate_curve.R" contains the RStudio code used to create climate curves from NGRIP and GRIP data available open-access from the Niels Bohr Institute Centre for Ice and Climate. The link to access this data is provided in "Related Resources" below.
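
    For orientation, correspondence analysis can be computed from first principles in base R (dedicated packages such as FactoMineR or ca are normally used, as in this repository; the site-by-tool-type counts below are invented):

```r
# Hypothetical correspondence analysis via SVD of standardized residuals.
N <- matrix(c(10, 5, 2,
               4, 8, 6,
               1, 3, 9), nrow = 3, byrow = TRUE)  # toy contingency table
P  <- N / sum(N)                      # correspondence matrix
r  <- rowSums(P)                      # row masses
cm <- colSums(P)                      # column masses

S   <- diag(1/sqrt(r)) %*% (P - r %o% cm) %*% diag(1/sqrt(cm))
dec <- svd(S)

row_coords <- diag(1/sqrt(r)) %*% dec$u %*% diag(dec$d)  # principal coordinates
inertia    <- dec$d^2                 # variance explained per dimension
```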

  10. Large Landslide Exposure in Metropolitan Cities

    • zenodo.org
    bin, csv
    Updated Sep 27, 2024
    Cite
    Joaquin V. Ferrer; Joaquin V. Ferrer (2024). Large Landslide Exposure in Metropolitan Cities [Dataset]. http://doi.org/10.5281/zenodo.13842843
    Available download formats: bin, csv
    Dataset updated
    Sep 27, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joaquin V. Ferrer; Joaquin V. Ferrer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 27, 2024
    Description

    These datasets (.Rmd, .Rproj, .rds) are ready to use with the R software for statistical programming and the RStudio graphical user interface (https://posit.co/download/rstudio-desktop/). Please copy the folder structure into one single directory and follow the instructions given in the .Rmd file. Files and data are listed and described as follows:

    Main directory files: results_fpath

    • Code containing the statistical analysis and plotting: 20240927_code.Rmd
    • 1_melted_lan_df.rds: Landslide time series database covering 1,085 landslides intersected with settlement footprints from 1985-2015.
    • 4_cities_lan.df.rds: City and landslide data for these 1,085 landslides intersected with settlement footprints from 1985-2015.
    • 7_zoib_nested_pop_pressure_model: brms statistical model file.
    • ghs_stat_fua_comb.gpkg: Urban center data from the GHSL - Global Human Settlement Layer.

    Population estimation files: wpop_files

    • 2015_ls_pop.csv: Estimates of population on landslides using the 100x100 population density grid from the WorldPop dataset.

    Steepness and elevation analysis derived from SRTM and processed in Google Earth Engine for landslides, mountain regions and urban centers in cities: gee_files

    • 1_mr_met.csv: Elevation and mean slope for mountain region areas in cities
    • 2_uc_met.csv: Elevation and mean slope for urban centers (defined in the GHSL data) in cities

    Standard deviation analysis derived from SRTM and processed in Google Earth Engine for mean slope in mountain regions and urban centers in cities: gee_sd

    • gee_mr.csv: Mean slope and standard deviation for mountain region
    • gee_uc.csv: Mean slope and standard deviation for urban centers (defined in the GHSL data)

  11. Data from: Visual Continuous Time Preferences

    • data.mendeley.com
    Updated Jun 12, 2023
    Cite
    Benjamin Prisse (2023). Visual Continuous Time Preferences [Dataset]. http://doi.org/10.17632/ms63y77fcf.5
    Dataset updated
    Jun 12, 2023
    Authors
    Benjamin Prisse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file compiles the different datasets used and analyses made in the paper "Visual Continuous Time Preferences". Both RStudio and Stata were used for the analysis: the first for descriptive statistics and graphs, the second for regressions. We include the datasets for both analyses.

    "Analysis VCTP - RStudio.R" is the RStudio analysis. "Analysis VCTP - Stata.do" is the Stata analysis.

    The RStudio datasets are: "data_Seville.xlsx" is the dataset of observations. "FormularioEng.xlsx" is the dataset of control variables.

    The Stata datasets are: "data_Seville_Stata.dta" is the dataset of observations. "FormularioEng.dta" is the dataset of control variables

    Additionally, the experimental instructions for the six experimental conditions are also available: "Hypothetical MPL-VCTP.pdf" is the instructions and task for hypothetical payment and MPL answered before VCTP. "Hypothetical VCTP-MPL.pdf" is the instructions and task for hypothetical payment and VCTP answered before MPL. "OneTenth MPL-VCTP.pdf" is the instructions and task for BRIS payment and MPL answered before VCTP. "OneTenth VCTP-MPL.pdf" is the instructions and task for BRIS payment and VCTP answered before MPL. "Real MPL-VCTP.pdf" is the instructions and task for real payment and MPL answered before VCTP. "Real VCTP-MPL.pdf" is the instructions and task for real payment and VCTP answered before MPL.

  12. Data from: Brief Overview of STATCAL Statistical Application Program

    • osf.io
    Updated Apr 8, 2018
    Cite
    Prana Gio; Dina Nazriani; Rezzy Caraka; RIZKI SYAHPUTRA; Meigia Sari; Anil Syofra (2018). Brief Overview of STATCAL Statistical Application Program [Dataset]. http://doi.org/10.17605/OSF.IO/BNCE8
    Dataset updated
    Apr 8, 2018
    Dataset provided by
    Center For Open Science
    Authors
    Prana Gio; Dina Nazriani; Rezzy Caraka; RIZKI SYAHPUTRA; Meigia Sari; Anil Syofra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Currently, there are many application programs for performing statistical analysis: SPSS, EViews, and Minitab are commercial software, while PSPP, JASP, and PAST are free software. STATCAL is a user-friendly statistical application program developed in the R programming language, in RStudio, using various R packages. STATCAL is designed to be as simple as possible, so that only a few steps are needed to obtain a result. Various statistical tests are available in STATCAL, such as normality, homogeneity, comparison of two or more means, correlation, association between categorical variables, reliability, linear regression, panel data regression, covariance-based structural equation modeling, and partial least squares path modeling. STATCAL also provides tutorial videos and a guidance menu to make it easy for users.

  13. 96 wells fluorescence reading and R code statistic for analysis

    • zenodo.org
    bin, csv, doc, pdf
    Updated Aug 2, 2024
    + more versions
    JVD Molino; JVD Molino (2024). 96 wells fluorescence reading and R code statistic for analysis [Dataset]. http://doi.org/10.5281/zenodo.1119285
    Explore at:
    Available download formats: doc, csv, pdf, bin
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    JVD Molino; JVD Molino
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    Data points in this dataset were obtained as follows. To assess the secretion efficiency of the constructs, 96 colonies from the selection plates were evaluated using the workflow presented in Figure Workflow. Transformed colonies were picked and cultured in 400 μL TAP medium for 7 days in deep-well plates (Corning Axygen®, No. PDW500CS, Thermo Fisher Scientific Inc., Waltham, MA) covered with Breathe-Easy® membrane (Sigma-Aldrich®). Cultivation was performed on a rotary shaker set to 150 rpm under constant illumination (50 μmol photons/m2s). Then 100 μL of sample was transferred to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA), and fluorescence was measured using an Infinite® M200 PRO plate reader (Tecan, Männedorf, Switzerland) at excitation 575/9 nm and emission 608/20 nm. Supernatant samples were obtained by spinning the deep-well plates at 3000 × g for 10 min and transferring 100 μL from each well to a clear-bottom 96-well plate, followed by fluorescence measurement. To compare the constructs, R version 3.3.3 was used to perform one-way ANOVA (with Tukey's test); the significance level was set at 0.05. Graphs were generated in RStudio v1.0.136. The codes are deposited herein.
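    In R, the described ANOVA-with-Tukey step could look roughly like the following sketch (simulated data and illustrative column names, not the deposited script):

    ```r
    # Minimal sketch: one-way ANOVA with Tukey's HSD at alpha = 0.05.
    # `construct` and `rfu` are illustrative names, not the study's columns.
    set.seed(1)
    dat <- data.frame(
      construct = rep(c("A", "B", "C"), each = 8),
      rfu       = c(rnorm(8, 100), rnorm(8, 120), rnorm(8, 150))
    )

    fit <- aov(rfu ~ construct, data = dat)  # one-way ANOVA
    summary(fit)                             # overall F-test
    TukeyHSD(fit)                            # pairwise comparisons (Tukey's test)
    ```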

    Info

    ANOVA_Turkey_Sub.R -> code for the ANOVA analysis in R 3.3.3

    barplot_R.R -> code to generate the bar plot in R 3.3.3

    boxplotv2.R -> code to generate the boxplot in R 3.3.3

    pRFU_+_bk.csv -> relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    sup_+_bl.csv -> supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    sup_raw.csv -> supernatant mCherry fluorescence dataset of 96 colonies for each construct.

    who_+_bl2.csv -> whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    who_raw.csv -> whole culture mCherry fluorescence dataset of 96 colonies for each construct.

    who_+_Chlo.csv -> whole culture chlorophyll fluorescence dataset of 96 colonies for each construct.

    Anova_Output_Summary_Guide.pdf -> explains the content of the ANOVA files

    ANOVA_pRFU_+_bk.doc -> ANOVA of relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_sup_+_bk.doc -> ANOVA of supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_who_+_bk.doc -> ANOVA of whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_Chlo.doc -> ANOVA of whole culture chlorophyll fluorescence of all constructs, plus average and standard deviation values.

    Consider citing our work.

    Molino JVD, de Carvalho JCM, Mayfield SP (2018) Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii. PLoS ONE 13(2): e0192433. https://doi.org/10.1371/journal.pone.0192433

  14. Replication Data for: Responsiveness of decision-makers to stakeholder...

    • dataone.org
    Updated Nov 8, 2023
    Lei, Yuxuan (2023). Replication Data for: Responsiveness of decision-makers to stakeholder preferences in the European Union legislative process [Dataset]. http://doi.org/10.7910/DVN/RH5H3H
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Lei, Yuxuan
    Area covered
    European Union
    Description

    This dataset contains original quantitative datafiles, analysis data, a codebook, R scripts, syntax for replication, the original output from RStudio, and figures from a statistical program. The analyses can be found in Chapter 5 of my PhD dissertation, i.e., ‘Political Factors Affecting the EU Legislative Decision-Making Speed’. The data supporting the findings of this study are accessible and replicable. Restrictions apply to the availability of these data, which were used under license for this study. The datafiles include:

    • R script: Chapter 5 script.R
    • Syntax for replication: Syntax for replication 5.0.docx
    • Original output from RStudio: The original output 5.0.pdf
    • Codebook: Codebook 5.0.txt
    • Analysis data: data5.0.xlsx
    • Dataset: Original quantitative data for Chapter 5.xlsx
    • Dataset: Codebook of policy responsiveness.pdf
    • Figures: Chapter 5 Figures.zip

    Data analysis software: RStudio with R version 4.1.0 (2021-05-18, "Camp Pontanezen"), Copyright (C) 2021 The R Foundation for Statistical Computing; platform: x86_64-apple-darwin17.0 (64-bit).

  15. Data and Code for the Analysis of the Relationship Between Physical Activity...

    • zenodo.org
    Updated Feb 6, 2025
    Thayse Justino Montenegro Falcão; Cyro Rego Cabral Junior; Cyro Rego Cabral Junior; Marilande Vitória Dias Rapôso; Marilande Vitória Dias Rapôso; Maria do Socorro Meneses Dantas; Maria do Socorro Meneses Dantas; Thayse Justino Montenegro Falcão (2025). Data and Code for the Analysis of the Relationship Between Physical Activity and Musculoskeletal Pain in University Staff [Dataset]. http://doi.org/10.5281/zenodo.14826721
    Explore at:
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Thayse Justino Montenegro Falcão; Cyro Rego Cabral Junior; Cyro Rego Cabral Junior; Marilande Vitória Dias Rapôso; Marilande Vitória Dias Rapôso; Maria do Socorro Meneses Dantas; Maria do Socorro Meneses Dantas; Thayse Justino Montenegro Falcão
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the dataset and R script used for the statistical analysis in the study investigating the association between physical activity and musculoskeletal pain in university staff during the COVID-19 pandemic.

    The files include:

    • Dataset (banco_dor_covid_atividade_fisica.xlsx): Contains sociodemographic variables and musculoskeletal pain reports from study participants.
    • R Script (Script_dor_covid19_atividade_fisica.R): Performs descriptive statistics, logistic regression, and Cronbach’s Alpha calculation with confidence intervals using the bootstrap method.

    This study aims to evaluate the impact of physical activity on musculoskeletal pain incidence using robust statistical methods.
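    The methods listed above can be sketched in R as follows; the data, variable names, and item scale here are simulated stand-ins, not the study's:

    ```r
    # Minimal sketch: logistic regression plus Cronbach's alpha with a
    # percentile-bootstrap confidence interval (all data simulated).
    set.seed(42)
    n  <- 200
    df <- data.frame(
      pain   = rbinom(n, 1, 0.4),   # musculoskeletal pain (0/1)
      active = rbinom(n, 1, 0.5),   # physically active (0/1)
      age    = round(runif(n, 20, 65))
    )

    # Logistic regression: pain ~ activity + age; exponentiate for odds ratios
    m <- glm(pain ~ active + age, data = df, family = binomial)
    exp(coef(m))

    # Cronbach's alpha for a k-item scale (simulated 5-point Likert items)
    items <- replicate(5, sample(1:5, n, replace = TRUE))
    cronbach <- function(x) {
      k <- ncol(x)
      k / (k - 1) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
    }

    # Percentile bootstrap CI for alpha
    boot_alpha <- replicate(1000, cronbach(items[sample(n, replace = TRUE), ]))
    quantile(boot_alpha, c(0.025, 0.975))
    ```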

    The associated scientific article is currently under peer review and will be added to this repository once published.

    Authors:

    • Thayse Justino Montenegro Falcão
    • Cyro Rego Cabral Junior (corresponding author - cyrorcjr@fanut.ufal.br)
    • Marilande Vitória Dias Rapôso
    • Maria do Socorro Meneses Dantas

    Affiliation: Federal University of Alagoas (UFAL)

    Keywords: physical activity, musculoskeletal pain, COVID-19, statistical analysis, logistic regression, Cronbach’s Alpha, RStudio.

    License: Creative Commons Attribution 4.0 International (CC-BY 4.0)

  16. Multivariate analysis of Pleurodeles waltl injection outcomes using FAMD...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Oct 10, 2024
    Eliza Jaeger (2024). Multivariate analysis of Pleurodeles waltl injection outcomes using FAMD FactoMineR (RStudio 4.3.2) used to probe sources of variability during an Adeno-associated viral (AAV) screen [Dataset]. http://doi.org/10.5061/dryad.mpg4f4r89
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 10, 2024
    Dataset provided by
    Dryad
    Authors
    Eliza Jaeger
    Description

    Multivariate analysis of Pleurodeles waltl injection outcomes using FAMD FactoMineR (RStudio 4.3.2) used to probe sources of variability during an Adeno-associated viral (AAV) screen

    https://doi.org/10.5061/dryad.mpg4f4r89

    Description of the data and file structure

    This repository contains a .csv file and an R script with the full raw metadata and the code used to analyze and plot the data shown in Figure 6 and Figure S4. Within the .csv file, columns indicate metadata that may be sources of variability in a comprehensive analysis of AAV transduction efficiency. The code can be used to perform FAMD on the existing dataset and can be adapted to create similar plots for additional datasets with the same structure. Both analysis and plotting parameters are included in the script.

    The headers in the .csv files with their units (if applicable) are described as follows: age (days) weight (grams) single_dual: injection site contained two viruse...

  17. Data_PIBIC_3.556.646_UNIR_2019

    • data.mendeley.com
    • narcis.nl
    Updated Sep 1, 2020
    + more versions
    Rosely Valéria Rodrigues (2020). Data_PIBIC_3.556.646_UNIR_2019 [Dataset]. http://doi.org/10.17632/b7z524pyhv.1
    Explore at:
    Dataset updated
    Sep 1, 2020
    Authors
    Rosely Valéria Rodrigues
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multiapproach of use and abuse of alcohol, licit and illicit drugs and similars (Legal opinion n. 3.556.646 - CEP/UNIR 2019). 1st Year (august 2019 - july 2020), excessive alcohol consumption at Porto Velho-RO: categorization by CAGE. [Multiabordagem do uso e abuso de álcool, drogas lícitas e ilícitas e afins (Parecer n°. 3.556.646 - CEP/UNIR 2019). Ano 1 (agosto de 2019 - julho de 2020), abuso do consumo de álcool em Porto Velho - RO: categorização pelo CAGE.]

    Files / [arquivos]:

    1. Analysis1_R_PIBIC_3.556.646_UNIR_2019: codes for statistical analysis on R (RStudio) / [códigos para análises estatísticas no R (version 3.6.1)/Rstudio (version 1.2.1335)];
    2. Analysis1_STATA_PIBIC_3.556.646_UNIR_2019: do-file of the statistical analysis codes on STATA IC/16.1 / [do-file com códigos das análises estatísticas no STATA IC/16.1];
    3. Data_PIBIC_3.556.646_UNIR_2019.dta: edited data base on STATA / [base de dados editada no STATA];
    4. Results_Analysis1_R_PIBIC_3.556.646_UNIR_2019: "Results_Analysis1..." script analysis results / [Resultados das análises do script "Analysis1_R_PIBIC_3.556.646_UNIR_2019"]
    5. Results_Analysis1_STATA_PIBIC_3.556.646_UNIR_2019: "Analysis1_STATA_..." do-file analysis results / [Resultados das análises do do-file "Analysis1_STATA_PIBIC_3.556.646_UNIR_2019"]
    6. Results_Sample, goodness-of-fit-test: "Sample, goodness-of-fit test" script analysis results / [Resultados das análises do script "Sample, goodness-of-fit test"]
    7. Sample, goodness-of-fit test: codes of sample adequacy analysis on R (RStudio) / [códigos com análise de adequação da amostra no R/Rstudio];
    8. Variables encoding: data base variables codification / [Codificação das variáveis da base de dados]

  18. Police-involved deaths and homicides - data and analysis files

    • bridges.monash.edu
    • researchdata.edu.au
    • +1more
    bin
    Updated Jun 9, 2023
    + more versions
    Tyler Lane (2023). Police-involved deaths and homicides - data and analysis files [Dataset]. http://doi.org/10.26180/10033046.v10
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Monash University
    Authors
    Tyler Lane
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Analysis files and data for a study of the effect of police-involved civilian deaths on homicide rates. Includes: a CSV of aggregated homicide and aggravated assault data for 44 US cities; an R project file; an R script for the reproducible cleaning process; an interrupted time series analysis file, which also produces plots; a meta-analysis file, which also produces forest plots; and records of police-involved shootings with links to news reports.

    To use: download everything into one folder and open the R project file with RStudio. I have tried to make these fully functional on their own to maximise reproducibility. You will likely need to install some packages (RStudio should prompt you to install the missing ones). If you want to re-run the cleaning file, you will have to download the UCR and city crime data; I have provided links to these sources. Otherwise, everything should run out of the box!
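    The package step can be sketched in R like this ("stats" and "utils" stand in for the project's real dependencies):

    ```r
    # Minimal sketch: install any packages a downloaded R project's scripts
    # need before running them ("stats"/"utils" are illustrative placeholders).
    pkgs    <- c("stats", "utils")
    missing <- setdiff(pkgs, rownames(installed.packages()))
    if (length(missing) > 0) install.packages(missing)
    invisible(lapply(pkgs, library, character.only = TRUE))
    ```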

    Disclaimer: I do not own the original data files from cities and UCR. While I have not included these case-level data, they are all publicly available and I have provided links, aside from Tampa which I acquired through a data request. I am happy to assist any interested researchers with getting the source data.

    Update 17 June 2022: Previous versions did not include 'final protested list.rds', which is essential to run analyses. This is now added.

  19. Raw data and data for analysis in Rstudio.

    • figshare.com
    application/x-rar
    Updated Aug 18, 2024
    Min Chang (2024). Raw data and data for analysis in Rstudio. [Dataset]. http://doi.org/10.6084/m9.figshare.25054937.v6
    Explore at:
    Available download formats: application/x-rar
    Dataset updated
    Aug 18, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Min Chang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigated predictive processing during both silent and oral reading, revealing a more pronounced predictability effect in the context of oral reading.

  20. Ski jumping results database

    • kaggle.com
    Updated Jan 9, 2022
    Wiktor Florek (2022). Ski jumping results database [Dataset]. https://www.kaggle.com/wrotki8778/ski-jumping-results-database-2009now/code
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 9, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Wiktor Florek
    License

    GNU GPL 2.0: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    Hello. As a big ski jumping fan, I would like to invite everybody to join a project called the "Ski Jumping Data Center". The primary goal is:

    Collect as much data about ski jumping as possible, and create as many useful insights from it as possible.

    In mid-September last year (12 Sep 2020) I thought, "Hmm, I don't know of any statistical analyses of ski jumping." In fact, the only easily found public data analysis about SJ that I know of is https://rstudio-pubs-static.s3.amazonaws.com/153728_02db88490f314b8db409a2ce25551b82.html

    The question is: why? This discipline is in fact overloaded with data, but almost nobody has taken the topic seriously. So I decided to start collecting and analyzing the data myself. However, the amount of work needed to capture the various data (i.e., jumps and competition results) was so large, and there are so many ways to use this information, that making it public was the obvious choice. I plan to expand the database as much as possible, but that requires more time and (I hope) more help.

    Content

    The data below were (broadly speaking) created by merging a large number (>6000) of PDFs containing the results of almost 4000 ski jumping competitions held between (roughly) 2009 and 2021. Creating this dataset cost me about 150 hours of coding and parsing and over 4 months of hard work. My current algorithm can parse the results of new events almost instantly, so the dataset can easily be extended; for details see the GitHub page: https://github.com/wrotki8778/Ski_jumping_data_center. The observations contain standard information about every jump: style points, distance, take-off speed, wind, etc. The main advantage of this dataset is the number of jumps, which is quite high (almost 250,000 rows at the time of uploading), so the data can be analyzed in various ways, although the number of columns is more modest.

    Acknowledgements

    A big "thank you" goes to the creators of the tika package; without their contribution I probably would not have created this dataset at all.

    Inspiration

    I plan to draw at least a few insights from this data:

    1) Are the wind/gate factors well adjusted?
    2) How strong is the correlation between distance and style marks? Is the judging always fair?
    3) (advanced) Can we build a model that predicts the performance/distance of an athlete in a given competition? Maybe a deep learning model?
    4) Which characteristics of athletes (height, weight, etc.) matter most for achieving the best jumps?
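    The correlation question, for instance, could be probed roughly like this in R (the column names `dist` and `style` are hypothetical; the real CSV schema may differ):

    ```r
    # Minimal sketch: correlation between jump distance and style marks,
    # on simulated data standing in for the real competition records.
    set.seed(7)
    jumps <- data.frame(dist = rnorm(500, mean = 120, sd = 15))
    jumps$style <- 16 + 0.02 * jumps$dist + rnorm(500, sd = 0.8)  # fake marks

    cor.test(jumps$dist, jumps$style, method = "pearson")
    ```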

Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research
It is the first book to provide a comprehensive description and a step-by-step, hands-on practical guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples: from importing and storing datasets in R as objects, coding and calling the methods or functions for manipulating datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualization and representation. In short, the congruence of statistics and computer programming for research.
