Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R script, used with the accompanying data frame 'plot_character' included in the project, to calculate summary statistics and fit structural equation models.
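For orientation, a minimal sketch of what a structural-equation-modelling workflow in R can look like, using the lavaan package as one common choice; the model and variable names below are hypothetical and are not taken from the 'plot_character' data frame:

# Hypothetical SEM sketch in R using the lavaan package.
library(lavaan)

# Invented model: a response predicted by two plot-level variables.
model <- '
  response ~ predictor1 + predictor2
'
fit <- sem(model, data = plot_character)   # data frame name as in the project
summary(fit, standardized = TRUE)          # coefficients and fit measures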
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
INTRODUCTION
As part of its responsibilities, the BC Ministry of Environment monitors water quality in the province's streams, rivers, and lakes. Often it is necessary to compile statistics involving concentrations of contaminants or other compounds, yet quite often the instruments used cannot measure concentrations below certain values. These observations are called non-detects or less-thans. Non-detects pose a difficulty when statistical measures such as the mean, the median, and the standard deviation must be computed for a data set, and the way non-detects are handled can affect the quality of any statistics generated.
Non-detects, or censored data, are found in many fields such as medicine, engineering, biology, and environmetrics, where the measurements of interest often fall below some threshold. Dealing with non-detects is a significant issue, and statistical tools using survival or reliability methods have been developed. Basically, there are three approaches for treating data containing censored values: 1. substitution, which gives poor results and is therefore not recommended in the literature; 2. maximum likelihood estimation, which requires an assumption of some distributional form; and 3. nonparametric methods, which assess the shape of the data based on observed percentiles rather than a strict distributional form.
This document provides guidance on how to record censored data, and on when and how to use certain analysis methods when the percentage of censored observations is less than 50%. The methods presented in this document are: 1. substitution; 2. Kaplan-Meier, as part of nonparametric methods; 3. the lognormal model based on maximum likelihood estimation; and 4. robust regression on order statistics, which is a semiparametric method.
Statistical software suitable for survival or reliability analysis is available for dealing with censored data and has been widely used in medical and engineering environments. In this document, methods are illustrated with both the R and JMP software packages, where possible. JMP often requires some intermediate steps to obtain summary statistics with most of the methods described here; R, with the NADA package, is usually straightforward. The NADA package was developed specifically for computing statistics with non-detects in environmental data, based on Helsel (2005b). The data used to illustrate the methods described for computing summary statistics for non-detects are either simulated or based on information acquired from the B.C. Ministry of Environment. This document is based largely on the book Nondetects And Data Analysis, written by Dennis R. Helsel in 2005 (Helsel, 2005b).
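A minimal sketch of how the last three methods can be computed with the NADA package in R, assuming the package is installed; the concentrations below are invented for illustration, and TRUE in the censoring vector flags a non-detect recorded at its detection limit:

# Summary statistics for censored data with the NADA package.
library(NADA)

conc   <- c(0.5, 0.5, 1.2, 2.3, 0.5, 3.1, 1.8, 0.9, 0.5, 2.7)  # illustrative values
nondet <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)

km  <- cenfit(conc, nondet)   # Kaplan-Meier (nonparametric)
mean(km); median(km)

mle <- cenmle(conc, nondet)   # lognormal maximum likelihood
mean(mle); sd(mle)

ros <- cenros(conc, nondet)   # robust regression on order statistics
mean(ros); sd(ros)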
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The storyline introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab's Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R, which includes sepal and petal lengths of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data, or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R rather than creation of original code. Pilot testing showed the case study was well received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
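A short sketch of the kind of Iris-module exercise described above, using only the built-in iris dataset and base R functions:

# Summary statistics, correlation, and basic plots with the built-in iris data.
data(iris)
summary(iris$Sepal.Length)                       # summary statistics
cor(iris$Sepal.Length, iris$Petal.Length)        # correlation
hist(iris$Sepal.Length, main = "Sepal length")   # histogram
plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species, pch = 19)               # scatter plot, colored by species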
This dataset was created by Rajdeep Kaur Bajwa.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Learn 6 essential cohort analysis reports to track SaaS growth, revenue trends, and customer churn. Data-driven insights with code examples for startup founders.
Summary statistics of temporal trend analysis (coefficient and R-squared) for socio-demographic and ecological variables (p < 0.05).
ABOUT DATASET
This is an R Markdown notebook containing a step-by-step guide for working on data analysis with R. It walks you through installing the relevant packages and loading them, and it also provides a detailed summary of the "dplyr" commands that you can use to manipulate your data in the R environment; a small sketch of this style of command follows below.
Anyone new to R who wishes to carry out some data analysis in R can check it out!
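A minimal sketch of typical dplyr commands of the kind the notebook covers; the data frame and its columns here are invented for illustration:

# Common dplyr verbs for manipulating a data frame.
library(dplyr)

df <- data.frame(group = c("a", "a", "b", "b"),
                 value = c(1, 2, 3, 4))

df %>%
  filter(value > 1) %>%                  # keep rows matching a condition
  mutate(doubled = value * 2) %>%        # add a derived column
  group_by(group) %>%                    # split by group
  summarise(mean_value = mean(value))    # one summary row per group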
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Particulate organic carbon (POC) flux was determined by measuring thorium-234 (234Th), reported in dpm/kg.
AMLR (Antarctic Marine Living Resources) R/V Yuzhmorgeologiya, January 2006:
The research program was concentrated in the southern Drake Passage along the Shackleton Shelf, near the Bransfield Strait. Samples were obtained from the R/V Yuzhmorgeologiya and from inflatables that were taken to island locations.
Lat/Lon bounding box:
-62.2538 Lat, -62.9966 Lon
-63.2335 Lat, -59.0332 Lon
-59.9964 Lat, -55.7612 Lon
-61.4995 Lat, -53.9996 Lon
NBP (Nathaniel B. Palmer) R/V Nathaniel B. Palmer, July 2006:
The research was conducted in the same region of the Drake Passage as the AMLR cruise. Samples were obtained aboard the R/V Nathaniel B. Palmer.
Lat/Lon bounding box:
-60.4991 Lat, -58.5613 Lon
-62.3599 Lat, -58.0392 Lon
-60.2783 Lat, -57.4509 Lon
-61.2683 Lat, -54.2852 Lon
Brzezinski, M.A., Nelson, D.M., Franck, V.M., and Sigmon, D.E., 2001. "Silicon dynamics within an intense open-ocean diatom bloom in the Pacific sector of the Southern Ocean." Deep-Sea Research Part II, 48: 3997-4018.
Rutgers van der Loeff, M., Sarin, M.M., Baskaran, M., Benitez-Nelson, C., Buesseler, K.O., Charette, M., Dai, M., Gustafsson, Ö., Masque, P., Morris, P.J., Orlandini, K., Rodriguez y Baena, A., Savoye, N., Schmidt, S., Turnewitsch, R., Vöge, I., and Waples, J.T., 2006. "A review of present techniques and methodological advances in analyzing 234Th in aquatic systems." Marine Chemistry, 100(3-4): 190-212.
Pike, S.M., Buesseler, K.O., Andrews, J., and Savoye, N., 2005. "Quantification of 234Th recovery in small volume sea water samples by inductively coupled plasma mass spectrometry." Journal of Radioanalytical and Nuclear Chemistry, 263(2): 355-360.
Moore, W.S., and Arnold, R., 1996. "Measurement of 223Ra and 224Ra in coastal waters using a delayed coincidence counter." Journal of Geophysical Research, 101(C1): 1321-1329.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source, object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and producing graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code using command-line scripting and vectors. It has several built-in functions and libraries that are extensible, and it allows users to define their own (customized) functions specifying how the program should behave while handling the data; these can also be stored in the simple object system.
For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, intended to inform and guide the work of R users and statisticians. It describes different types of statistical data analysis and methods, and the best scenarios for using each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures, including a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained throughout. It is the first book to provide a comprehensive description and step-by-step, hands-on practical guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples: from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, a congruence of statistics and computer programming for research.
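A small sketch of the vectorized, function-based style described above; the function and data here are illustrative, not taken from the book:

# A user-defined function applied to a whole vector at once.
scores <- c(67, 72, 80, 91, 58)                   # a numeric vector (illustrative)

standardise <- function(x) (x - mean(x)) / sd(x)  # custom function: z-scores

round(standardise(scores), 2)                     # vectorised: no explicit loop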
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Each R script replicates all of the example code from one chapter of the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
These tabular data are the summarization of natural-environment-related variables within catchments of the Chesapeake Bay watershed at the 1:24,000 scale using the xstrm methodology. Variables counted as natural-environment-related include soils/geology, lithology, elevation, slope, stream gradient, landform (geomorphon), and others. Outputs include tabular comma-separated values files (CSVs) and Parquet files for the local catchment and network summaries linked to the National Hydrography Dataset Plus High-Resolution (NHDPlus HR) catchments by NHDPlus ID. Local catchments are defined as the single catchment within which the data are summarized. Network summaries are summaries for each of the local catchments and their respective network-connected upstream catchments for select variables. The summarized data tables are structured as a single column representing the catchment ID values (i.e., NHDPlus ID), with the remaining columns consisting of the summarized variables. Xstrm downstream network summaries are not present within this dataset, as no summaries were conducted using that network summary method. For a full description of the variables included within these summaries, see xstrm_nhdhr_natural_chesapeake_baywide_datadictionary.csv in the attached files.
The xstrm local summary methodology takes either raster or point data as input and then summarizes those data by "zones", in this case the NHDPlus HR catchments. The network summaries then take the results from the local summaries and calculate the desired network summary statistic for the local catchment and its respective upstream or downstream catchments. As a note concerning use of these data, any rasters summarized within this process only had their cells included within a catchment if the center of the raster cell fell within the catchment boundary. However, the resolution of the input raster data for these summaries was considered to provide completely adequate coverage of the summary catchments using this option. If confirmed complete coverage of a catchment is desired (even if a raster cell is only minimally included within the catchment), it is recommended to rerun the xstrm summary process with the "All Touched" option set to "True".
These data were updated in September 2024, March 2025, and August 2025. Changes during these revisions include the removal of several variables unnecessary to the use of the data summaries, correction of incorrectly calculated area variables and all dependent variables, and the addition of several new variables. For a full list of changes, see xstrm_nhdhr_natural_chesapeake_baywide_versionhistory.txt.
Also note: if using the R readr package's read_csv function, set the guess_max parameter to 20,000 or higher, as shown in the sketch below. The nhdhr_underground_conduit_percent_length and nhdhr_drainageway_percent_length variables have a high number of NA values, which results in a blank column due to an error in the function's column-type guess.
Further information on the xstrm summary process can be found at the xstrm software release pages:
Xstrm: Wieferich, D.J., Williams, B., Falgout, J.T., and Foks, N.L., 2021. xstrm. U.S. Geological Survey software release. https://doi.org/10.5066/P9P8P7Z0.
Xstrm Local: Wieferich, D.J., Gressler, B., Krause, K., Wieczorek, M., and McDonald, S., 2022. xstrm_local Version 1.1.0. U.S. Geological Survey software release. https://doi.org/10.5066/P98BOGI9.
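A minimal sketch of the recommended readr call; the CSV file name below is hypothetical, and guess_max follows the note above so that columns with many leading NA values are typed correctly:

# Load a summary table with a larger type-guessing window.
library(readr)

catchments <- read_csv("xstrm_nhdhr_natural_chesapeake_baywide_local.csv",  # illustrative name
                       guess_max = 20000)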
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
R code for running GLMM (generalized linear mixed model) and BRT (boosted regression tree) analyses.
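For orientation, a minimal sketch of what GLMM and BRT calls typically look like in R, using the widely used lme4 and gbm packages as one possible choice; the formula, data frame, and columns are hypothetical and are not taken from the deposited code:

# Hypothetical GLMM and BRT sketch; 'dat' and its columns are invented.
library(lme4)
library(gbm)

# GLMM: binary response with a random intercept for site
m_glmm <- glmer(presence ~ habitat + (1 | site),
                data = dat, family = binomial)
summary(m_glmm)

# BRT: boosted regression trees for the same response
m_brt <- gbm(presence ~ habitat + elevation,
             data = dat, distribution = "bernoulli",
             n.trees = 1000, interaction.depth = 3, shrinkage = 0.01)
summary(m_brt)   # relative influence of predictors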
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time series analysis of climate data using R
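As a pointer to what such an analysis can look like, a minimal sketch using base R's built-in co2 series (monthly Mauna Loa CO2 concentrations, a standard climate example); the deposited code itself is not shown here:

# Seasonal-trend decomposition of a built-in climate time series.
data(co2)                          # monthly Mauna Loa CO2, 1959-1997
fit <- stl(co2, s.window = "periodic")
plot(fit)                          # trend, seasonal, and remainder components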
A major objective of biometrical genetics is to explore the nature of gene action in determining quantitative traits, including determination of the number of major genetic factors or genes responsible for the traits. Diallel mating designs have been developed for the type of genetic experiments that help assess variability in observed quantitative traits arising from genetic factors, environmental factors, and their interactions. Some diallel mating designs are North Carolina designs, line-by-tester designs, and diallel designs. AGD-R is a set of R programs that performs the statistical analyses for diallel, line-by-tester, and North Carolina designs. AGD-R includes a graphical Java interface that helps the user easily choose input files, which analysis to implement, and which variables to analyze.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Naturally occurring radium isotopes (224Ra, 226Ra, 228Ra) were used in determining lateral mixing processes; values are reported in dpm/m3.
AMLR (Antarctic Marine Living Resources) R/V Yuzhmorgeologiya, January 2006:
The research program was concentrated in the southern Drake Passage along the Shackleton Shelf, near the Bransfield Strait. Samples were obtained from the R/V Yuzhmorgeologiya and from inflatables that were taken to island locations.
Lat/Lon bounding box:
-62.2538 Lat, -62.9966 Lon
-63.2335 Lat, -59.0332 Lon
-59.9964 Lat, -55.7612 Lon
-61.4995 Lat, -53.9996 Lon
NBP (Nathaniel B. Palmer) R/V Nathaniel B. Palmer, July 2006:
The research was conducted in the same region of the Drake Passage as the AMLR cruise. Samples were obtained aboard the R/V Nathaniel B. Palmer.
Lat/Lon bounding box:
-60.4991 Lat, -58.5613 Lon
-62.3599 Lat, -58.0392 Lon
-60.2783 Lat, -57.4509 Lon
-61.2683 Lat, -54.2852 Lon
Brzezinski, M.A., Nelson, D.M., Franck, V.M., and Sigmon, D.E., 2001. "Silicon dynamics within an intense open-ocean diatom bloom in the Pacific sector of the Southern Ocean." Deep-Sea Research Part II, 48: 3997-4018.
Rutgers van der Loeff, M., Sarin, M.M., Baskaran, M., Benitez-Nelson, C., Buesseler, K.O., Charette, M., Dai, M., Gustafsson, Ö., Masque, P., Morris, P.J., Orlandini, K., Rodriguez y Baena, A., Savoye, N., Schmidt, S., Turnewitsch, R., Vöge, I., and Waples, J.T., 2006. "A review of present techniques and methodological advances in analyzing 234Th in aquatic systems." Marine Chemistry, 100(3-4): 190-212.
Pike, S.M., Buesseler, K.O., Andrews, J., and Savoye, N., 2005. "Quantification of 234Th recovery in small volume sea water samples by inductively coupled plasma mass spectrometry." Journal of Radioanalytical and Nuclear Chemistry, 263(2): 355-360.
Moore, W.S., and Arnold, R., 1996. "Measurement of 223Ra and 224Ra in coastal waters using a delayed coincidence counter." Journal of Geophysical Research, 101(C1): 1321-1329.
This data release contains the U.S. salient statistics and world production data extracted from the QUARTZ (HIGH-PURITY AND INDUSTRIAL CULTURED CRYSTAL) data sheet of the USGS Mineral Commodity Summaries 2025.
This is provided by ESF 9 (sar@vdem.virginia.gov) for use on the Virginia SAR Hub Ground Ops page [https://virginia-sarhub-vdemgis.hub.arcgis.com/pages/ground-ops].
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
ERRA is a data-driven, nonparametric, model-independent method for quantifying rainfall-runoff relationships across a spectrum of time lags, in systems that may be nonlinear, nonstationary, and spatially heterogeneous. Researchers using ERRA in published work should cite J.W. Kirchner, "Characterizing nonlinear, nonstationary, and heterogeneous hydrologic behavior using Ensemble Rainfall-Runoff Analysis (ERRA): proof of concept", Hydrology and Earth System Sciences, 2024 (for ERRA itself) and J.W. Kirchner, "Impulse response functions for nonlinear, nonstationary, and heterogeneous systems, estimated by deconvolution and de-mixing of noisy time series", Sensors, 22(9), 3291, https://doi.org/10.3390/s22093291, 2022 (for the underlying mathematics). This data set includes two versions of the ERRA script written in the open-source programming language R, a detailed user's guide, and sample scripts and source data for all of the results in Kirchner (2024). These scripts are made publicly available under GNU General Public License 3; for details see https://www.gnu.org/licenses/. The data and documentation are made available under Creative Commons Attribution Share-Alike CC-BY-SA. ETH Zurich, WSL, and James Kirchner make ABSOLUTELY NO WARRANTIES OF ANY KIND, including NO WARRANTIES, expressed or implied, that this software is free of errors or is suitable for any particular purpose. Users are solely responsible for determining the suitability and reliability of this software for their own purposes.
Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
There are 4 CSV files here:
BAWAP_P_annual_BA_SYB_GLO.csv
Description: Time series of mean annual BAWAP rainfall, 1900-2012.
Source data: annual BILO rainfall on \\wron\Project\BA\BA_N_Sydney\Working\li036_Lingtao_LI\Grids\BILO_Rain_Ann\
P_PET_monthly_BA_SYB_GLO.csv
Long-term average BAWAP rainfall and Penman PET for each month, 198101-201212 (YYYYMM).
Climatology_Trend_BA_SYB_GLO.csv
Values calculated over the years 1981-2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons, and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) wind speed. For each of the 17 time periods, for each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.
Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv
Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957 and 2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). All data used in this analysis came directly from James Risbey, CMAR, Hobart. As described in the Risbey et al. (2009) paper, the rainfall was from the 0.05 degree gridded data described in Jeffrey et al. (2001), known as the SILO datasets; sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK = Blocking; DMI = Dipole Mode Index; SAM = Southern Annular Mode; SOI = Southern Oscillation Index; DJF = December, January, February; MAM = March, April, May; JJA = June, July, August; SON = September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
The dataset was created from various BILO source data, including monthly BILO rainfall, Tmax, Tmin, VPD, etc., and other source data including monthly Penman PET (calculated by Randall Donohue) and correlation coefficient data from James Risbey.
Bioregional Assessment Programme (XXXX) SYD ALL climate data statistics summary. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/b0a6ccf1-395d-430e-adf1-5068f8371dea.
* Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GWAS summary statistics for multivariate GWAS model extension of cognitive and noncognitive skills. From: 'Malanchini, M., Allegrini, A. G., Nivard, M. G., Biroli, P., Rimfeld, K., Cheesman, R., ... & Plomin, R. (2023). Genetic contributions of noncognitive skills to academic development. Research Square.' Columns: SNP = rsID, CHR = chromosome, BP = position, MAF = minor allele frequency (1000 Genomes Phase 3), A1 = effect allele, A2 = other allele, BETA = estimate of the SNP effect, SE = standard error of BETA, Z = Z-statistic, PVAL = p-value.
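A minimal sketch of loading these summary statistics in R and keeping genome-wide significant SNPs; the file name is hypothetical, and 5e-8 is the conventional genome-wide significance threshold rather than a value specified in the dataset description:

# Load summary statistics and filter to genome-wide significant hits.
gwas <- read.table("noncog_sumstats.txt", header = TRUE)  # illustrative file name

sig <- subset(gwas, PVAL < 5e-8)
head(sig[, c("SNP", "CHR", "BP", "A1", "A2", "BETA", "SE", "PVAL")])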