100+ datasets found
  1. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    figshare
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers, and shows how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source, object-oriented programming language, with a development environment (IDE) called RStudio, for computing statistics and producing graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and for analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code through command-line scripting and vectors. It has several built-in functions and libraries that are extensible, and it allows users to define their own (customized) functions specifying how the program should behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, intended to inform and guide the work of R users and statisticians. It describes the different types of statistical data analysis and methods, and the best scenarios for using each in R. It gives a hands-on, step-by-step practical guide to identifying and conducting the different parametric and non-parametric procedures, including a description of the conditions or assumptions necessary for performing the various statistical methods or tests and guidance on understanding their results. The book also covers the different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R for research purposes, with examples: from importing and storing datasets in R as objects, coding and calling the methods or functions for manipulating datasets or objects, factorization, and vectorization, to reasoning about, interpreting, and storing results for future use, and producing graphical visualizations and representations. Thus, it brings statistics and computer programming together for research.
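
    As a flavor of the vectorized, user-defined-function style the description mentions, here is a minimal base-R sketch (illustrative only; z_score() is a made-up example, not taken from the book):

      # Hypothetical user-defined (customized) function, applied to a whole vector
      z_score <- function(x) {
        (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
      }
      scores <- c(12, 15, 9, 21, 18)   # an example vector
      round(z_score(scores), 2)        # vectorized: no explicit loop needed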

  2. Data_Sheet_1_“R” U ready?: a case study using R to analyze changes in gene...

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    + more versions
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_1_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
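
    For flavor, the Iris-module tasks listed above (summary statistics, correlations, histograms, scatter plots) map onto a few lines of base R with the built-in iris data; this is an illustrative sketch, not the case study's own code:

      data(iris)                                      # ships with base R
      summary(iris)                                   # summary statistics
      cor(iris$Sepal.Length, iris$Petal.Length)       # a correlation
      hist(iris$Petal.Length, main = "Petal length")  # histogram
      plot(iris$Sepal.Length, iris$Petal.Length,
           col = iris$Species, pch = 19)              # scatter plot by species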

  3. Basic R for Data Analysis

    • kaggle.com
    Updated Dec 8, 2024
    Cite
    Kebba Ndure (2024). Basic R for Data Analysis [Dataset]. https://www.kaggle.com/datasets/kebbandure/basic-r-for-data-analysis/data
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 8, 2024
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Kebba Ndure
    Description

    ABOUT DATASET

    This is an R Markdown notebook. It contains a step-by-step guide to working on data analysis with R. It helps you install the relevant packages and shows how to load them. It also provides a detailed summary of the "dplyr" commands that you can use to manipulate your data in the R environment.

    Anyone new to R who wishes to carry out some data analysis in R can check it out!
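
    As a taste of the "dplyr" commands the notebook summarizes, here is a minimal hedged sketch using the built-in mtcars data (not the notebook's own code):

      # install.packages("dplyr")           # once per machine
      library(dplyr)

      mtcars %>%
        filter(cyl %in% c(4, 6)) %>%        # keep a subset of rows
        mutate(hp_per_wt = hp / wt) %>%     # derive a new column
        group_by(cyl) %>%                   # group
        summarise(mean_mpg = mean(mpg),     # aggregate per group
                  n = n())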

  4. Dataset of books called An introduction to data analysis in R : hands-on...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called An introduction to data analysis in R : hands-on coding, data mining, visualization and statistics from scratch [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=An+introduction+to+data+analysis+in+R+%3A+hands-on+coding%2C+data+mining%2C+visualization+and+statistics+from+scratch
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row, filtered to the book An introduction to data analysis in R : hands-on coding, data mining, visualization and statistics from scratch, and features 7 columns including author, publication date, language, and book publisher.

  5. Collision Analysis with R

    • hub.arcgis.com
    Updated Oct 22, 2016
    Cite
    Civic Analytics Network (2016). Collision Analysis with R [Dataset]. https://hub.arcgis.com/documents/1e1b49837b4d454e8b218697fc4fee40
    Explore at:
    Dataset updated
    Oct 22, 2016
    Dataset authored and provided by
    Civic Analytics Network
    Description

    The Vision Zero Innovation Lab, taking place at the Leeds Institute for Data Analytics on April 27th as part of the Leeds Digital Festival, aims to explore ways to reduce the number of road casualties in Leeds to zero. If you would like to get involved or find out more, check out the event on Eventbrite. Student Data Labs runs data-driven Innovation Labs for university students to learn practical data skills whilst working on civic problems. In the past, we have held Labs that tackle Type 2 Diabetes and health inequalities in Leeds. Student Data Labs works with an interdisciplinary team of students, data scientists, designers, researchers and software developers. We also aim to connect our Data Lab Volunteers with local employers who may be interested in employing them upon graduation. Visit our website, Twitter or Facebook for more info. The Vision Zero Innovation Lab is split into two sections: a Learning Lab and an Innovation Lab. The Learning Lab helps students learn real-world data skills, getting them up and running with tools like R as well as common data science problems as part of a team. The Innovation Lab is more experimental, where the aim is to develop ideas and data-driven tools to take on wicked problems.

  6. Climate Time Series Analysis using R

    • purr.purdue.edu
    Updated Jan 1, 2019
    Cite
    Sushant Mehan; Margaret Gitau (2019). Climate Time Series Analysis using R [Dataset]. http://doi.org/10.4231/R77H1GTX
    Explore at:
    Dataset updated
    Jan 1, 2019
    Dataset provided by
    PURR
    Authors
    Sushant Mehan; Margaret Gitau
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Time series analysis of climate data using R

  7. Data Set for "Analyzing Microbial Growth with R"

    • zenodo.org
    csv
    Updated Jan 24, 2020
    Cite
    Brian D. Connelly; Brian D. Connelly (2020). Data Set for "Analyzing Microbial Growth with R" [Dataset]. http://doi.org/10.5281/zenodo.1171129
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Brian D. Connelly; Brian D. Connelly
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample data set used in "Analyzing Microbial Growth with R"

  8. Political Analysis Using R: Example Code and Data, Plus Data for Practice...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Monogan, Jamie (2023). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Monogan, Jamie
    Description

    Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.

  9. Data from: AGD-R (Analysis of Genetic Designs with R for Windows) Version...

    • data.moa.gov.et
    html
    Updated Jan 20, 2025
    Cite
    CIMMYT Ethiopia (2025). AGD-R (Analysis of Genetic Designs with R for Windows) Version 5.0 [Dataset]. https://data.moa.gov.et/dataset/hdl-11529-10202
    Explore at:
    Available download formats: html
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    CIMMYT Ethiopia
    Description

    A major objective of biometrical genetics is to explore the nature of gene action in determining quantitative traits. This also includes determining the number of major genetic factors or genes responsible for the traits. Mating designs have been developed for the type of genetic experiments that help assess the variability in observed quantitative traits arising from genetic factors, environmental factors, and their interactions; examples include North Carolina designs, Line by Tester designs, and Diallel designs. AGD-R is a set of R programs that performs the statistical analyses for Diallel, Line by Tester, and North Carolina designs. AGD-R contains a graphical Java interface that helps the user easily choose input files, select which analysis to implement, and choose which variables to analyze.

  10. Bayesian data analysis in the phonetic sciences: A tutorial introduction

    • osf.io
    Updated Apr 18, 2022
    Cite
    Shravan Vasishth; Bruno Nicenboim; Mary Beckman; Iona Gessinger; Soham Mukherjee (2022). Bayesian data analysis in the phonetic sciences: A tutorial introduction [Dataset]. http://doi.org/10.17605/osf.io/g4zpv
    Explore at:
    Dataset updated
    Apr 18, 2022
    Dataset provided by
    Center for Open Science, https://cos.io/
    Authors
    Shravan Vasishth; Bruno Nicenboim; Mary Beckman; Iona Gessinger; Soham Mukherjee
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This tutorial analyzes voice onset time (VOT) data from Dongbei (Northeastern) Mandarin Chinese and North American English to demonstrate how Bayesian linear mixed models can be fit using the programming language Stan via the R package brms. Through this case study, we demonstrate some of the advantages of the Bayesian framework: researchers can (i) flexibly define the underlying process that they believe to have generated the data; (ii) obtain direct information regarding the uncertainty about the parameter that relates the data to the theoretical question being studied; and (iii) incorporate prior knowledge into the analysis. Getting started with Bayesian modeling can be challenging, especially when one is trying to model one’s own (often unique) data. It is difficult to see how one can apply general principles described in textbooks to one’s own specific research problem. We address this barrier to using Bayesian methods by providing three detailed examples, with source code to allow easy reproducibility. The examples presented are intended to give the reader a flavor of the process of model-fitting; suggestions for further study are also provided. All data and code are available from this website.
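
    A minimal sketch of the kind of brms model the tutorial demonstrates; the data frame vot_data and its columns are hypothetical placeholders, and the tutorial's own models and priors will differ:

      library(brms)

      # vot_data: hypothetical data frame with columns vot, language, subject
      fit <- brm(
        vot ~ language + (1 | subject),   # fixed effect plus by-subject intercepts
        data   = vot_data,
        family = gaussian(),
        prior  = set_prior("normal(0, 50)", class = "b"),  # weakly informative
        chains = 4, iter = 2000
      )
      summary(fit)   # posterior estimates with uncertainty intervals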

  11. R programming code for analyzing output from the Stochastic Empirical...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). R programming code for analyzing output from the Stochastic Empirical Loading Dilution Model created for U.S. Geological Survey Scientific Investigations Report 2019-5053, 116 p., https://doi.org/10.3133/sir20195053 [Dataset]. https://catalog.data.gov/dataset/r-programming-code-for-analyzing-output-from-the-stochastic-empirical-loading-dilution-mod
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey, http://www.usgs.gov/
    Description

    This R script can be used to analyze SELDM results. The script is specifically tailored for the SELDM simulations used in the publication: Stonewall, A.J., and Granato, G.E., 2018, Assessing potential effects of highway and urban runoff on receiving streams in total maximum daily load watersheds in Oregon using the Stochastic Empirical Loading and Dilution Model: U.S. Geological Survey Scientific Investigations Report 2019-5053, 116 p., https://doi.org/10.3133/sir20195053

  12. Data from: Introduction to R Programming

    • search.dataone.org
    Updated Dec 28, 2023
    Cite
    Kristi Thompson; Lucia Costanzo (2023). Introduction to R Programming [Dataset]. http://doi.org/10.5683/SP3/GBUD61
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Kristi Thompson; Lucia Costanzo
    Description

    R is an open-source software environment for data manipulation and statistical analysis. Used in a variety of disciplines, R has become a popular tool because of its power, flexibility, and active community. Join us as we teach R language fundamentals and basic syntax, the major R data structures, and how to generate basic descriptive statistics.
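
    A minimal sketch of the topics listed (basic syntax, core data structures, basic descriptive statistics), using base R only; illustrative, not the workshop's materials:

      v  <- c(2, 4, 6, 8)                          # vector
      l  <- list(name = "site A", count = 3)       # list
      df <- data.frame(group = c("a", "a", "b"),   # data frame
                       value = c(1.2, 3.4, 2.1))

      mean(df$value); sd(df$value)                 # descriptive statistics
      table(df$group)                              # frequency table
      tapply(df$value, df$group, mean)             # group means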

  13. Data for: Running a Confirmatory Factor Analysis in R: a step-by-step...

    • data.mendeley.com
    Updated Mar 31, 2022
    Cite
    Gregor Stiglic (2022). Data for: Running a Confirmatory Factor Analysis in R: a step-by-step tutorial [Dataset]. http://doi.org/10.17632/bkh8wtgmkg.1
    Explore at:
    Dataset updated
    Mar 31, 2022
    Authors
    Gregor Stiglic
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary files for "Running a Confirmatory Factor Analysis in R: a step-by-step tutorial" consist of an R script and the data needed to run the analysis.
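
    The tutorial's own script is included in the dataset; for orientation, a common way to run a CFA in R is with the lavaan package (an assumption here, shown with lavaan's bundled HolzingerSwineford1939 data rather than the tutorial's data):

      library(lavaan)

      model <- '
        visual  =~ x1 + x2 + x3
        textual =~ x4 + x5 + x6
      '
      fit <- cfa(model, data = HolzingerSwineford1939)
      summary(fit, fit.measures = TRUE, standardized = TRUE)  # loadings, CFI, RMSEA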

  14. R-code, Dataset, Analysis and output (2012-2020): Occupancy and Probability...

    • catalog.data.gov
    • datasets.ai
    Updated Feb 22, 2025
    Cite
    U.S. Fish and Wildlife Service (2025). R-code, Dataset, Analysis and output (2012-2020): Occupancy and Probability of Detection for Bachman's Sparrow (Aimophila aestivalis), Northern Bobwhite (Collinus virginianus), and Brown-headed Nuthatch (Sitta pusilla) to Habitat Management Practices on Carolina Sandhills NWR [Dataset]. https://catalog.data.gov/dataset/r-code-dataset-analysis-and-output-2012-2020-occupancy-and-probability-of-detection-for-ba
    Explore at:
    Dataset updated
    Feb 22, 2025
    Dataset provided by
    U.S. Fish and Wildlife Service, http://www.fws.gov/
    Description

    This reference contains the R code for the analysis and summary of detections of Bachman's sparrow, northern bobwhite, and brown-headed nuthatch through 2020. Specifically, it generates probability of detection and occupancy for the species based on call counts and calls elicited with playback. The code loads raw point count data (CSV files) and fire history data (CSV) and cleans/transforms them into a tidy format for occupancy analysis. It then creates the necessary data structure for occupancy analysis, performs the analysis for the three focal species, and provides functionality for generating tables and figures summarizing the key findings of the occupancy analysis. The raw data, point count locations, and other spatial data (shapefiles) are contained in the dataset.
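
    The reference's own R code is in the archive; as generic orientation, single-season occupancy models of this kind are commonly fit with the unmarked package (an assumption; all names below are placeholders, with simulated data standing in for the point counts):

      library(unmarked)

      set.seed(1)
      y <- matrix(rbinom(60, 1, 0.4), nrow = 20, ncol = 3)  # 20 sites x 3 visits
      site_covs <- data.frame(burned = factor(sample(c("yes", "no"), 20, TRUE)))

      umf <- unmarkedFrameOccu(y = y, siteCovs = site_covs)
      fm  <- occu(~ 1 ~ burned, data = umf)  # detection ~ 1, occupancy ~ fire history
      summary(fm)                            # probability of detection and occupancy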

  15. R scripts

    • figshare.com
    txt
    Updated May 10, 2018
    Cite
    Xueying Han (2018). R scripts [Dataset]. http://doi.org/10.6084/m9.figshare.5513170.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    May 10, 2018
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Xueying Han
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts in this fileset are those used in the PLOS ONE publication "A snapshot of translational research funded by the National Institutes of Health (NIH): A case study using behavioral and social science research awards and Clinical and Translational Science Awards funded publications." The article can be accessed here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196545

    This consists of all R scripts used for data cleaning, data manipulation, and statistical analysis in the publication. There are eleven files in total:

    1. "Step1a.bBSSR.format.grants.and.publications.data.R" combines all bBSSR 2008-2014 grant award data and associated publications downloaded from NIH Reporter.
    2. "Step1b.BSSR.format.grants.and.publications.data.R" combines all BSSR-only 2008-2014 grant award data and associated publications downloaded from NIH Reporter.
    3. "Step2a.bBSSR.get.pubdates.transl.and.all.grants.R" queries PubMed and downloads associated bBSSR publication data.
    4. "Step2b.BSSR.get.pubdates.transl.and.all.grants.R" queries PubMed and downloads associated BSSR-only publication data.
    5. "Step3.summary.stats.R" performs summary statistics.
    6. "Step4.time.to.first.publication.R" performs the time-to-first-publication analysis.
    7. "Step5.time.to.citation.analysis.R" performs the time-to-first-citation and time-to-overall-citation analyses.
    8. "Step6.combine.NIH.iCite.data.R" combines NIH iCite citation data.
    9. "Step7.iCite.data.analysis.R" performs citation analysis on the combined iCite data.
    10. "Step8.MeSH.descriptors.R" queries PubMed and pulls down all MeSH descriptors for all publications.
    11. "Step9.CTSA.publications.R" compares the percent of translational publications among bBSSR, BSSR-only, and CTSA publications.

  16. Data and Code for "Climate impacts and adaptation in US dairy systems...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 22, 2021
    + more versions
    Cite
    Maria Gisbert-Queral; Maria Gisbert-Queral; Arne Henningsen; Arne Henningsen; Bo Markussen; Bo Markussen; Meredith T. Niles; Ermias Kebreab; Ermias Kebreab; Angela J. Rigden; Angela J. Rigden; Nathaniel D. Mueller; Nathaniel D. Mueller; Meredith T. Niles (2021). Data and Code for "Climate impacts and adaptation in US dairy systems 1981-2018" [Dataset]. http://doi.org/10.5281/zenodo.4818011
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 22, 2021
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Maria Gisbert-Queral; Maria Gisbert-Queral; Arne Henningsen; Arne Henningsen; Bo Markussen; Bo Markussen; Meredith T. Niles; Ermias Kebreab; Ermias Kebreab; Angela J. Rigden; Angela J. Rigden; Nathaniel D. Mueller; Nathaniel D. Mueller; Meredith T. Niles
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    This data and code archive provides all the files that are necessary to replicate the empirical analyses that are presented in the paper "Climate impacts and adaptation in US dairy systems 1981-2018" authored by Maria Gisbert-Queral, Arne Henningsen, Bo Markussen, Meredith T. Niles, Ermias Kebreab, Angela J. Rigden, and Nathaniel D. Mueller and published in 'Nature Food' (2021, DOI: 10.1038/s43016-021-00372-z). The empirical analyses are entirely conducted with the "R" statistical software using the add-on packages "car", "data.table", "dplyr", "ggplot2", "grid", "gridExtra", "lmtest", "lubridate", "magrittr", "nlme", "OneR", "plyr", "pracma", "quadprog", "readxl", "sandwich", "tidyr", "usfertilizer", and "usmap". The R code was written by Maria Gisbert-Queral and Arne Henningsen with assistance from Bo Markussen. Some parts of the data preparation and the analyses require substantial amounts of memory (RAM) and computational power (CPU). Running the entire analysis (all R scripts consecutively) on a laptop computer with 32 GB physical memory (RAM), 16 GB swap memory, an 8-core Intel Xeon CPU E3-1505M @ 3.00 GHz, and a GNU/Linux/Ubuntu operating system takes around 11 hours. Running some parts in parallel can speed up the computations but bears the risk that the computations terminate when two or more memory-demanding computations are executed at the same time.

    This data and code archive contains the following files and folders:

    * README
    Description: text file with this description

    * flowchart.pdf
    Description: a PDF file with a flow chart that illustrates how R scripts transform the raw data files to files that contain generated data sets and intermediate results and, finally, to the tables and figures that are presented in the paper.

    * runAll.sh
    Description: a (bash) shell script that runs all R scripts in this data and code archive sequentially and in a suitable order (on computers with a "bash" shell such as most computers with MacOS, GNU/Linux, or Unix operating systems); an R-only alternative is sketched after this file list

    * Folder "DataRaw"
    Description: folder for raw data files
    This folder contains the following files:

    - DataRaw/COWS.xlsx
    Description: MS-Excel file with the number of cows per county
    Source: USDA NASS Quickstats
    Observations: All available counties and years from 2002 to 2012

    - DataRaw/milk_state.xlsx
    Description: MS-Excel file with average monthly milk yields per cow
    Source: USDA NASS Quickstats
    Observations: All available states from 1981 to 2018

    - DataRaw/TMAX.csv
    Description: CSV file with daily maximum temperatures
    Source: PRISM Climate Group (spatially averaged)
    Observations: All counties from 1981 to 2018

    - DataRaw/VPD.csv
    Description: CSV file with daily maximum vapor pressure deficits
    Source: PRISM Climate Group (spatially averaged)
    Observations: All counties from 1981 to 2018

    - DataRaw/countynamesandID.csv
    Description: CSV file with county names, state FIPS codes, and county FIPS codes
    Source: US Census Bureau
    Observations: All counties

    - DataRaw/statecentroids.csv
    Description: CSV file with latitudes and longitudes of state centroids
    Source: Generated by Nathan Mueller from Matlab state shapefiles using the Matlab "centroid" function
    Observations: All states

    * Folder "DataGenerated"
    Description: folder for data sets that are generated by the R scripts in this data and code archive. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these generated data files so that parts of the analysis can be replicated (e.g., on computers with insufficient memory to run all parts of the analysis).

    * Folder "Results"
    Description: folder for intermediate results that are generated by the R scripts in this data and code archive. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these intermediate results so that parts of the analysis can be replicated (e.g., on computers with insufficient memory to run all parts of the analysis).

    * Folder "Figures"
    Description: folder for the figures that are generated by the R scripts in this data and code archive and that are presented in our paper. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these figures so that people who replicate our analysis can more easily compare the figures that they get with the figures that are presented in our paper. Additionally, this folder contains CSV files with the data that are required to reproduce the figures.

    * Folder "Tables"
    Description: folder for the tables that are generated by the R scripts in this data and code archive and that are presented in our paper. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these tables so that people who replicate our analysis can more easily compare the tables that they get with the tables that are presented in our paper.

    * Folder "logFiles"
    Description: the shell script runAll.sh writes the output of each R script that it runs into this folder. We provide these log files so that people who replicate our analysis can more easily compare the R output that they get with the R output that we got.

    * PrepareCowsData.R
    Description: R script that imports the raw data set COWS.xlsx and prepares it for the further analyses

    * PrepareWeatherData.R
    Description: R script that imports the raw data sets TMAX.csv, VPD.csv, and countynamesandID.csv, merges these three data sets, and prepares the data for the further analyses

    * PrepareMilkData.R
    Description: R script that imports the raw data set milk_state.xlsx and prepares it for the further analyses

    * CalcFrequenciesTHI_Temp.R
    Description: R script that calculates the frequencies of days with the different THI bins and the different temperature bins in each month for each state

    * CalcAvgTHI.R
    Description: R script that calculates the average THI in each state

    * PreparePanelTHI.R
    Description: R script that creates a state-month panel/longitudinal data set with exposure to the different THI bins

    * PreparePanelTemp.R
    Description: R script that creates a state-month panel/longitudinal data set with exposure to the different temperature bins

    * PreparePanelFinal.R
    Description: R script that creates the state-month panel/longitudinal data set with all variables (e.g., THI bins, temperature bins, milk yield) that are used in our statistical analyses

    * EstimateTrendsTHI.R
    Description: R script that estimates the trends of the frequencies of the different THI bins within our sampling period for each state in our data set

    * EstimateModels.R
    Description: R script that estimates all model specifications that are used for generating results that are presented in the paper or for comparing or testing different model specifications

    * CalcCoefStateYear.R
    Description: R script that calculates the effects of each THI bin on the milk yield for all combinations of states and years based on our 'final' model specification

    * SearchWeightMonths.R
    Description: R script that estimates our 'final' model specification with different values of the weight of the temporal component relative to the weight of the spatial component in the temporally and spatially correlated error term

    * TestModelSpec.R
    Description: R script that applies Wald tests and Likelihood-Ratio tests to compare different model specifications and creates Table S10

    * CreateFigure1a.R
    Description: R script that creates subfigure a of Figure 1

    * CreateFigure1b.R
    Description: R script that creates subfigure b of Figure 1

    * CreateFigure2a.R
    Description: R script that creates subfigure a of Figure 2

    * CreateFigure2b.R
    Description: R script that creates subfigure b of Figure 2

    * CreateFigure2c.R
    Description: R script that creates subfigure c of Figure 2

    * CreateFigure3.R
    Description: R script that creates the subfigures of Figure 3

    * CreateFigure4.R
    Description: R script that creates the subfigures of Figure 4

    * CreateFigure5_TableS6.R
    Description: R script that creates the subfigures of Figure 5 and Table S6

    * CreateFigureS1.R
    Description: R script that creates Figure S1

    * CreateFigureS2.R
    Description: R script that creates Figure S2

    * CreateTableS2_S3_S7.R
    Description: R script that creates Tables S2, S3, and S7

    * CreateTableS4_S5.R
    Description: R script that creates Tables S4 and S5

    * CreateTableS8.R
    Description: R script that creates Table S8

    * CreateTableS9.R
    Description: R script that creates Table S9
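
    For computers without a bash shell, a minimal R driver can source the scripts in sequence. The order below follows the file listing above and is an assumption (flowchart.pdf documents the actual dependencies); the figure and table scripts can be appended analogously:

      scripts <- c(
        "PrepareCowsData.R", "PrepareWeatherData.R", "PrepareMilkData.R",
        "CalcFrequenciesTHI_Temp.R", "CalcAvgTHI.R",
        "PreparePanelTHI.R", "PreparePanelTemp.R", "PreparePanelFinal.R",
        "EstimateTrendsTHI.R", "EstimateModels.R", "CalcCoefStateYear.R",
        "SearchWeightMonths.R", "TestModelSpec.R"
      )
      for (s in scripts) {
        message("Running ", s)
        source(s, echo = TRUE)   # run each script in the current session
      }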

  17. Codes in R for spatial statistics analysis, ecological response models and...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Apr 24, 2025
    Cite
    D. W. Rössel-Ramírez; D. W. Rössel-Ramírez; J. Palacio-Núñez; J. Palacio-Núñez; S. Espinosa; S. Espinosa; J. F. Martínez-Montoya; J. F. Martínez-Montoya (2025). Codes in R for spatial statistics analysis, ecological response models and spatial distribution models [Dataset]. http://doi.org/10.5281/zenodo.7603557
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    D. W. Rössel-Ramírez; D. W. Rössel-Ramírez; J. Palacio-Núñez; J. Palacio-Núñez; S. Espinosa; S. Espinosa; J. F. Martínez-Montoya; J. F. Martínez-Montoya
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the last decade, a plethora of algorithms has been developed for spatial ecology studies. In our case, we use some of this code for underwater research work in applied ecology analysis of threatened endemic fishes and their natural habitat. For this, we developed code in the RStudio® script environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The employed R packages are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), modelmetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbeuttel & Balamura, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).

    It is important to follow all the code in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance-similarity metric because of their adequacy and robustness for studies with endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization of the code involved in running the GLM and DOMAIN:

    In the first instance, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend the use of 10,000 background points when using regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we considered factors such as the extent of the area and the type of study species to be important for the correct selection of the number of points (pers. obs.). Then, we extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) as a function of the presence and background points (e.g., Hijmans and Elith, 2017).

    Subsequently, we subdivided both the presence and background point groups into 75% training data and 25% test data each, following the method of Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, the 10-fold (cross-validation) method is selected, where the response variable presence is assigned as a factor. If some other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
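
    A sketch of this split and training control with the caret package (which the authors list among their packages); pts is a placeholder data frame whose factor column presence marks presence vs. background:

      library(caret)

      set.seed(42)
      idx   <- createDataPartition(pts$presence, p = 0.75, list = FALSE)
      train <- pts[idx, ]                               # 75% training data
      test  <- pts[-idx, ]                              # 25% test data

      ctrl <- trainControl(method = "cv", number = 10)  # 10-fold cross-validation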

    After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), from which we obtained the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and a cross-validated iteration of 5,000 repetitions (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we selected a validation interval of 4 random training points (personal testing). The obtained plots were the partial dependence blocks for each predictor variable.
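
    A hedged sketch of this step with the gbm package; the predictor names are placeholders, and presence is coded as numeric 0/1 to match the Gaussian parameterization described:

      library(gbm)

      fit_gbm <- gbm(
        presence ~ bio1 + bio12 + slope,   # hypothetical predictor variables
        data         = train,              # presence coded 0/1 (numeric) here
        distribution = "gaussian",         # as parameterized in the text
        n.trees      = 5000,               # 5,000 iterations
        cv.folds     = 10
      )
      summary(fit_gbm)                     # relative contribution of each variable
      plot(fit_gbm, i.var = "bio1")        # partial dependence for one predictor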

    Subsequently, the correlation of the variables is computed by Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity between variables (Guisan & Hofer, 2003). It is recommended to use a bivariate correlation threshold of ±0.70 to discard highly correlated variables (e.g., Awan et al., 2021).
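
    A sketch of this screen with cor(), corrplot, and caret::findCorrelation (all among the listed packages); predictors is a placeholder numeric data frame:

      library(corrplot)
      library(caret)

      cm <- cor(predictors, method = "pearson")       # bivariate Pearson correlations
      corrplot(cm, method = "number")                 # visualize the matrix
      drop_idx <- findCorrelation(cm, cutoff = 0.70)  # columns above the +/-0.70 rule
      predictors_reduced <-
        if (length(drop_idx)) predictors[, -drop_idx, drop = FALSE] else predictors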

    Once the above codes were run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM method code (Code7_GLM_model.R). Here, we first ran the GLM models per variable to obtain the significance (p-value) of each variable (alpha ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The generated models are of polynomial degree to obtain linear and quadratic responses (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, where the resulting plots included the probability of occurrence and values for continuous variables or categories for discrete variables. The points of the presence and background training groups are also included.
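
    A sketch of the per-variable polynomial GLM described here, in base R with placeholder names (continuing the caret split above):

      fit_glm <- glm(presence ~ poly(bio1, 2),   # linear + quadratic response
                     data = train, family = binomial)
      summary(fit_glm)                           # term p-values (alpha <= 0.05)

      # Response curve: probability of occurrence across the predictor's range
      nd <- data.frame(bio1 = seq(min(train$bio1), max(train$bio1), length.out = 100))
      plot(nd$bio1, predict(fit_glm, newdata = nd, type = "response"),
           type = "l", xlab = "bio1", ylab = "Probability of occurrence")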

    On the other hand, a global GLM was also run, from which the generalized model is evaluated by means of a 2 x 2 contingency matrix including both observed and predicted records. A representation of this is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary boundary of 0.5 to obtain better modeling performance and to avoid a high percentage of bias from type I (omission) or type II (commission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).

    Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).

                      Validation set
    Model             True            False
    Presence          A               B
    Background        C               D

    We then calculated the Overall and True Skill Statistic (TSS) metrics. The first assesses the proportion of correctly predicted cases, while the second assesses the prevalence of correctly predicted cases (Olden and Jackson, 2002). The TSS also gives equal importance to the prevalence of presence prediction and to the correction for random performance (Fielding and Bell, 1997; Allouche et al., 2006).
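
    In code, with the arbitrary 0.5 boundary and the Table 1 cells, continuing the GLM sketch above (presence assumed coded 0/1 in the test set):

      p_hat <- predict(fit_glm, newdata = test, type = "response")
      pred  <- as.integer(p_hat >= 0.5)         # arbitrary 0.5 boundary
      obs   <- test$presence                    # assumed 0/1

      A <- sum(pred == 1 & obs == 1)            # true presences
      B <- sum(pred == 1 & obs == 0)            # commission errors
      C <- sum(pred == 0 & obs == 0)            # true backgrounds
      D <- sum(pred == 0 & obs == 1)            # omission errors

      overall <- (A + C) / (A + B + C + D)      # proportion correctly predicted
      tss     <- A / (A + D) + C / (B + C) - 1  # sensitivity + specificity - 1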

    The last code (i.e., Code8_DOMAIN_SuitHab_model.R) is for species distribution modelling using the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background groups, each subdivided into 75% training and 25% test data. We included only the presence training subset and the predictor variable stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.
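
    A sketch of this step with dismo::domain (dismo is among the listed packages); preds is a placeholder RasterStack of predictors and train_xy a two-column matrix of presence training coordinates:

      library(dismo)
      library(raster)

      dm      <- domain(preds, train_xy)   # Gower-distance (DOMAIN) model
      suitmap <- predict(preds, dm)        # suitability surface over the study area
      plot(suitmap)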

    Regarding the model evaluation and estimation, we selected the following estimators:

    1) partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model's prediction of the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).

    2) ROC/AUC curve for model validation, where an optimal performance threshold is estimated to have an expected confidence of 75% to 99% probability (De Long et al., 1988).

  18. ‘2018 - 2020 CLASS and ECERS-R Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 26, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘2018 - 2020 CLASS and ECERS-R Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-2018-2020-class-and-ecers-r-data-dca3/latest
    Explore at:
    Dataset updated
    Jan 26, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘2018 - 2020 CLASS and ECERS-R Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/f7c75606-4b80-4d35-bd02-c7b9423a1253 on 26 January 2022.

    --- Dataset description provided by original source is as follows ---

    2018 - 2020 CLASS and ECERS-R Data

    --- Original source retains full ownership of the source dataset ---

  19. R Code for Systematic Review and Meta Analysis

    • data.mendeley.com
    Updated May 22, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Carmen Isensee (2020). R Code for Systematic Review and Meta Analysis [Dataset]. http://doi.org/10.17632/hympskpm3x.1
    Explore at:
    Dataset updated
    May 22, 2020
    Authors
    Carmen Isensee
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project presents all code related to the review paper "The relationship between organizational culture, sustainability, and digitalization in SMEs: A systematic review."

  20. R-scripts for uncertainty analysis v01

    • data.gov.au
    • researchdata.edu.au
    • +2 more
    zip
    Updated Apr 13, 2022
    Cite
    Bioregional Assessment Program (2022). R-scripts for uncertainty analysis v01 [Dataset]. https://data.gov.au/data/dataset/322c38ef-272f-4e77-964c-a14259abe9cf
    Explore at:
    Available download formats: zip (9161)
    Dataset updated
    Apr 13, 2022
    Dataset authored and provided by
    Bioregional Assessment Program
    License

    Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Abstract

    This dataset was created within the Bioregional Assessment Programme. Data has not been derived from any source datasets. Metadata has been compiled by the Bioregional Assessment Programme.

    This dataset contains a set of generic R scripts that are used in the propagation of uncertainty through numerical models.

    Dataset History

    The dataset contains a set of R scripts that are loaded as a library. The R scripts are used to carry out the propagation of uncertainty through numerical models. The scripts contain the functions to create the statistical emulators and do the necessary data transformations and backtransformations. The scripts are self-documenting and created by Dan Pagendam (CSIRO) and Warren Jin (CSIRO).

    Dataset Citation

    Bioregional Assessment Programme (2016) R-scripts for uncertainty analysis v01. Bioregional Assessment Source Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/322c38ef-272f-4e77-964c-a14259abe9cf.
