96 datasets found
  1. Data from: Importing General-Purpose Graphics in R

    • figshare.com
    • auckland.figshare.com
    application/gzip
    Updated Sep 19, 2018
    Cite
    Paul Murrell (2018). Importing General-Purpose Graphics in R [Dataset]. http://doi.org/10.17608/k6.auckland.7108736.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    The University of Auckland
    Authors
    Paul Murrell
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This report discusses some problems that can arise when attempting to import PostScript images into R, when the PostScript image contains coordinate transformations that skew the image. There is a description of some new features in the ‘grImport’ package for R that allow these sorts of images to be imported into R successfully.
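
    For reference, the basic grImport import workflow is sketched below (the file name is hypothetical, and PostScriptTrace() requires a Ghostscript installation):

      library(grImport)

      # Trace the PostScript file into an RGML (XML) description
      PostScriptTrace("figure.ps", "figure.xml")

      # Read the traced picture back into R and draw it with grid
      pic <- readPicture("figure.xml")
      grid.picture(pic)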

  2. Data from: Working with a linguistic corpus using R: An introductory note with Indonesian Negating Construction

    • researchdata.edu.au
    • bridges.monash.edu
    Updated May 5, 2022
    Cite
    Gede Primahadi Wijaya Rajeg; I Made Rajeg; Karlina Denistia (2022). Working with a linguistic corpus using R: An introductory note with Indonesian Negating Construction [Dataset]. http://doi.org/10.4225/03/5a7ee2ac84303
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg; I Made Rajeg; Karlina Denistia
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is a repository for codes and datasets for the open-access paper in Linguistik Indonesia, the flagship journal for the Linguistic Society of Indonesia (Masyarakat Linguistik Indonesia [MLI]) (cf. the link in the references below).


    To cite the paper (in APA 6th style):

    Rajeg, G. P. W., Denistia, K., & Rajeg, I. M. (2018). Working with a linguistic corpus using R: An introductory note with Indonesian negating construction. Linguistik Indonesia, 36(1), 1–36. doi: 10.26499/li.v36i1.71


    To cite this repository:
    Click the Cite button (dark pink, top left) and select a citation style from the dropdown (the default style is the DataCite option on the right-hand side).

    This repository consists of the following files:
    1. Source R Markdown Notebook (.Rmd file) used to write the paper and containing the R codes to generate the analyses in the paper.
    2. Tutorial to download the Leipzig Corpus file used in the paper. It is freely available on the Leipzig Corpora Collection Download page.
    3. Accompanying datasets as images and .rds format so that all code-chunks in the R Markdown file can be run.
    4. BibLaTeX and .csl files for the referencing and bibliography (with APA 6th style).
    5. A snippet of the R session info after running all codes in the R Markdown file.
    6. RStudio project file (.Rproj). Double click on this file to open an RStudio session associated with the content of this repository. See here and here for details on Project-based workflow in RStudio.
    7. A .docx template file following the basic stylesheet for Linguistik Indonesia

    Put all these files in the same folder (including the downloaded Leipzig corpus file)!

    To render the R Markdown into an MS Word document, we use the bookdown R package (Xie, 2018). Make sure this package is installed in R.

    Yihui Xie (2018). bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.6.
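
    As a minimal sketch (with a hypothetical file name), rendering the R Markdown source to MS Word with bookdown looks like:

      # Install bookdown once, then render the .Rmd to a Word document
      install.packages("bookdown")
      rmarkdown::render("paper.Rmd", output_format = bookdown::word_document2())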


  3. Collision Analysis with R

    • hub.arcgis.com
    Updated Oct 22, 2016
    Cite
    Civic Analytics Network (2016). Collision Analysis with R [Dataset]. https://hub.arcgis.com/documents/1e1b49837b4d454e8b218697fc4fee40
    Explore at:
    Dataset updated
    Oct 22, 2016
    Dataset authored and provided by
    Civic Analytics Network
    Description

    Taking place at the Leeds Institute for Data Analytics on April 27th as part of the Leeds Digital Festival, the aim of the Vision Zero Innovation Lab is to explore ways to reduce the number of road casualties to zero in Leeds. If you would like to get involved or find out more, check out the event on Eventbrite.

    Student Data Labs runs data-driven Innovation Labs for university students to learn practical data skills whilst working on civic problems. In the past, we have held Labs that tackle Type 2 Diabetes and health inequalities in Leeds. Student Data Labs works with an interdisciplinary team of students, data scientists, designers, researchers and software developers. We also aim to connect our Data Lab Volunteers with local employers who may be interested in employing them upon graduation. Visit our website, Twitter or Facebook for more info.

    The Vision Zero Innovation Lab is split into two sections: a Learning Lab and an Innovation Lab. The Learning Lab helps students learn real-world data skills, getting them up and running with tools like R as well as common data science problems as part of a team. The Innovation Lab is more experimental, where the aim is to develop ideas and data-driven tools to take on wicked problems.

  4. Activity In R

    • kaggle.com
    zip
    Updated Aug 30, 2019
    Cite
    Manohar Reddy (2019). Activity In R [Dataset]. https://www.kaggle.com/datasets/manohar676/activity-in-r
    Explore at:
    Available download formats: zip (368 bytes)
    Dataset updated
    Aug 30, 2019
    Authors
    Manohar Reddy
    Description

    Dataset

    This dataset was created by Manohar Reddy

    Contents

  5. datasets

    • figshare.com
    txt
    Updated Sep 27, 2017
    Cite
    Carlos Rodriguez-Contreras (2017). datasets [Dataset]. http://doi.org/10.6084/m9.figshare.5447167.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 27, 2017
    Dataset provided by
    figshare
    Authors
    Carlos Rodriguez-Contreras
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains datasets to be downloaded by students for their practice with R and Python.

  6. Replication data and code for analyses in R presented in: Volcanic climate forcing, extreme cold and the Neolithic Transition in the northern US Southwest

    • dataverse.ucla.edu
    bin, html, tsv, txt
    Updated Feb 8, 2022
    Cite
    R.J. Sinensky; R.J. Sinensky (2022). Replication data and code for analyses in R presented in: Volcanic climate forcing, extreme cold and the Neolithic Transition in the northern US Southwest [Dataset]. http://doi.org/10.25346/S6/N3RVLC
    Explore at:
    Available download formats: tsv (92491), html (6992077), txt (42582), tsv (25713), tsv (44603), bin (28673), tsv (77600), tsv (675537), txt (3689), tsv (431249)
    Dataset updated
    Feb 8, 2022
    Dataset provided by
    UCLA Dataverse
    Authors
    R.J. Sinensky; R.J. Sinensky
    License

    https://dataverse.ucla.edu/api/datasets/:persistentId/versions/4.3/customlicense?persistentId=doi:10.25346/S6/N3RVLC

    Area covered
    Southwestern United States, United States
    Description

    Online Supplemental Material 2 (OSM 2) contains the data and code necessary to generate Figures 3-6, 8-9, S1 and S5-S6 presented in Sinensky et al. (2022). The R Markdown document (OSM 2.0) will render these figures using the data provided in OSM 2.1-2.6.

  7. Data_Sheet_5_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    + more versions
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_5_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s005
    Explore at:
    Available download formats: docx
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
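
    The Iris warm-up relies only on base R; a small sketch of the kinds of commands involved (not the case study's own code):

      # Summary statistics and a correlation for the built-in iris dataset
      summary(iris)
      cor(iris$Sepal.Length, iris$Petal.Length)

      # Histogram and scatter plot, coloured by species
      hist(iris$Petal.Length, main = "Petal length", xlab = "cm")
      plot(iris$Sepal.Length, iris$Petal.Length,
           col = iris$Species, pch = 19,
           xlab = "Sepal length (cm)", ylab = "Petal length (cm)")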

  8. Scripts to run R-QWTREND models and produce results.

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2024). Scripts to run R-QWTREND models and produce results. [Dataset]. https://catalog.data.gov/dataset/scripts-to-run-r-qwtrend-models-and-produce-results
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This child page contains a zipped folder which contains all files necessary to run trend models and produce results published in U.S. Geological Survey Scientific Investigations Report 2020–5079 [Nustad, R.A., and Vecchia, A.V., 2020, Water-quality trends for selected sites and constituents in the international Red River of the North Basin, Minnesota and North Dakota, United States, and Manitoba, Canada, 1970–2017: U.S. Geological Survey Scientific Investigations Report 2020–5079, 75 p., https://doi.org/10.3133/sir20205079]. The folder contains: six files required to run the R–QWTREND trend analysis tool; a readme.txt file; an alldata.RData file; a siteinfo_appendix.txt file; and a folder called "scripts".

    R–QWTREND is a software package for analyzing trends in stream-water quality. The package is a collection of functions written in R (R Development Core Team, 2019), an open-source language and general environment for statistical computing and graphics. The following system requirements are necessary for using R–QWTREND:
    • Windows 10 operating system
    • R (version 3.4 or later; 64 bit recommended)
    • RStudio (version 1.1.456 or later)

    An accompanying report (Vecchia and Nustad, 2020) serves as the formal documentation for R–QWTREND: Vecchia, A.V., and Nustad, R.A., 2020, Time-series model, statistical methods, and software documentation for R–QWTREND—An R package for analyzing trends in stream-water quality: U.S. Geological Survey Open-File Report 2020–1014, 51 p., https://doi.org/10.3133/ofr20201014. R Development Core Team, 2019, R—A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed June 12, 2019, at https://www.r-project.org.
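
    As a rough sketch of getting started once the folder is unzipped (the path is a placeholder; see the included readme.txt for the authoritative instructions):

      # Sketch: load the packaged data and source the R-QWTREND tool files
      setwd("path/to/unzipped/folder")   # placeholder for the unzipped location
      load("alldata.RData")              # the data file named in the folder contents above
      invisible(lapply(list.files("scripts", full.names = TRUE), source))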

  9. Replication Data for: Reining in the Rascals: Challenger Parties' Path to Power

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Mar 6, 2024
    Cite
    Hjorth, Frederik; Jacob Nyrup; Martin Vinæs Larsen (2024). Replication Data for: Reining in the Rascals: Challenger Parties' Path to Power [Dataset]. http://doi.org/10.7910/DVN/FLGPW8
    Explore at:
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Hjorth, Frederik; Jacob Nyrup; Martin Vinæs Larsen
    Description
    Information for replicating the analysis for "Reining in the Rascals: Challenger Parties' Path to Power", The Journal of Politics, by Frederik Hjorth, Jacob Nyrup & Martin Vinæs Larsen.

    All code to replicate the analysis is written in R. 14 files in total are used to replicate the analysis in the article: 5 R scripts and 9 data files. The scripts use the R package "pacman" to install and load relevant packages, which is handled by the function pacman::p_load(). To make sure the function runs, the replicator should have "pacman" installed. The scripts use the R package "here" to automatically set the working directory to the replication folder. If "here" fails to locate the appropriate folder, simply set the working directory to the folder containing scripts and data using setwd(). When running the analysis it is important that 00-helperfunctions.R is loaded into R. This file contains a list of extra functions used throughout the analysis.

    List of R scripts: 00-helperfunctions.R, 01-comparativeanalysis.R, 02-mainanalysis.R, 03-mechanismanalysis.R, 04-appendix.R

    List of datasets: df_comparative.xlsx, df_main.rds, df_mainretroactive.rds, dkvaa13txtdf.rds, dkvaa17txtdf.rds, dkvaa2013.xlsx, dkvaa2017.xlsx, irtposbyparty.rds, municodelist.txt
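
    As a minimal sketch of the setup these scripts assume (the packages passed to p_load() here are illustrative, not the article's actual dependency list):

      install.packages("pacman")            # needed once before running the scripts
      pacman::p_load(data.table, ggplot2)   # installs missing packages, then loads them

      # "here" anchors file paths to the replication folder; setwd() is the fallback
      install.packages("here")
      source(here::here("00-helperfunctions.R"))
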
  10. Factors Affecting United States Geological Survey Irrigation Freshwater Withdrawal Estimates In Utah: PRISM Analysis Results and R Codes

    • search.dataone.org
    • beta.hydroshare.org
    • +1more
    Updated Dec 30, 2023
    Cite
    J. Levi Manley (2023). Factors Affecting United States Geological Survey Irrigation Freshwater Withdrawal Estimates In Utah: PRISM Analysis Results and R Codes [Dataset]. https://search.dataone.org/view/sha256%3A4a8b3f77b51143a5d1f90ddaca426072477db8937941265e67db7bce8f083e08
    Explore at:
    Dataset updated
    Dec 30, 2023
    Dataset provided by
    Hydroshare
    Authors
    J. Levi Manley
    Time period covered
    Jan 1, 1895 - Sep 30, 2020
    Area covered
    Description

    This resource explains and contains the methodology, R codes, and results of the PRISM freshwater supply key indicator analysis for my thesis. For more information, see my thesis at the USU Digital Commons.

    Freshwater availability in the state can be summarized using streamflow, reservoir level, precipitation, and temperature data. Climate data for this study have a period of record greater than 30 years, preferably extending beyond 1950, and are representative of natural conditions at the county-level.

    Oregon State University, Northwest Alliance for Computational Science and Engineering PRISM precipitation and temperature gridded data are representative of statewide to county-level conditions from 1895-2015. These data are available online from the PRISM Climate Group. Using the R ‘prism’ package, monthly PRISM 4km raster grids were downloaded. Boundary shapefiles of Utah state, and each county, were obtained online from the Utah Geospatial Resource Center webpage. Using the R ‘rgdal’ and ‘sp’ packages, these shapefiles were transformed from their native World Geodetic System 1984 coordinate system to match the PRISM BIL raster’s native North American Datum 1983 coordinate system. Using the R ‘raster’ package, medians of PRISM precipitation grids at each spatial area of interest were calculated and summed for water years and seasons. Medians were also calculated for PRISM temperature grids and averaged over water years and seasons. For analysis of single months, the median results were used for all PRISM indicators. Seasons were analyzed for the calendar year which they are in, Winter being the first season of each year. Freshwater availability key indicators were non-parametrically separated per temporal/spatial delineation into quintiles representing Very Wet/Very High/Hot (top 20% of values), Wet/High/Hot (60-80%), Moderate/Mid-level (40-60%), Dry/Low/Cool (20-40%), to Very Dry/Very Low/Cool (bottom 20%). Each quintile bin was assigned a rank value 1-5, with ‘5’ being the value of the top quintile, in preparation for the Kendall Tau-b correlation analysis. These results, along with USGS irrigation withdrawal and acreage data, were loaded into R. State-level quintile results were matched according to USGS report year. County quintile results were matched with corresponding USGS irrigation withdrawal and acreage county-level data per report year for all other areas of interest. Using the base R function cor(), with the “kendall” method selected (which is, by default, the Kendall Tau-b calculation), relationship correlation matrices were produced for all areas of interest. The USGS irrigation withdrawal and acreage data correlation analysis matrices were created using the R ‘corrplot’ package for all areas of interest.
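
    A toy sketch of the quintile-ranking and Kendall tau-b step described above (the vectors are invented stand-ins, not the thesis data):

      # Hypothetical water-year medians and USGS withdrawal estimates
      set.seed(7)
      precip_median <- runif(40, 5, 30)
      withdrawals   <- runif(40, 100, 500)

      # Bin the climate indicator into quintiles and rank them 1-5
      breaks <- quantile(precip_median, probs = seq(0, 1, 0.2))
      rank5  <- as.integer(cut(precip_median, breaks, include.lowest = TRUE))

      # Kendall correlation, as with cor() in the workflow above
      cor(rank5, withdrawals, method = "kendall")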

    See Word file for an Example PRISM Analysis, made by Alan Butler at the United States Bureau of Reclamation, which was used as a guide for this analysis.

  11. Data and R Markdown Notebook for Pemahaman kuantitatif dasar dan penerapannya dalam mengkaji keterkaitan antara bentuk dan makna

    • bridges.monash.edu
    • researchdata.edu.au
    • +1more
    zip
    Updated May 31, 2023
    Cite
    Gede Primahadi Wijaya Rajeg; I Made Rajeg (2023). Data and R Markdown Notebook for Pemahaman kuantitatif dasar dan penerapannya dalam mengkaji keterkaitan antara bentuk dan makna [Dataset]. http://doi.org/10.26180/5c6e1160b8d8a
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg; I Made Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Here you can find the R Markdown Notebook, dataset, and other materials for an open-access paper (in Indonesian) in Linguistik Indonesia, the journal of the Linguistic Society of Indonesia (Masyarakat Linguistik Indonesia [MLI]) (cf. further below for how to run them in R).

    Paper citation: Rajeg, G. P. W., & Rajeg, I. M. (2019). Pemahaman kuantitatif dasar dan penerapannya dalam mengkaji keterkaitan antara bentuk dan makna. Linguistik Indonesia, 37(1), 13–31. http://ojs.linguistik-indonesia.org/index.php/linguistik_indonesia/article/view/87/83

    The post-print version after peer review (without the journal's layout and pagination) is available at INA-Rxiv. This figshare repository is imported from its GitHub repo (see the Release page for versioning of the repo). If you use data and codes from this repository, please cite this repository via the dark pink Cite button. The default citation style is "DataCite".

    The paper introduces the basics of the chi-square test as a significance test of independence, with application to the study of form-meaning relationships in the lexical field for the word "hot" (i.e., panas) in Indonesian.

    To run the codes in the R Notebook, you need to have the latest versions of R and RStudio installed. The codes in the Notebook also use the tidyverse and vcd R packages. To render the notebook into an MS Word document, we use the bookdown package. So make sure these packages are installed in R.

    How to run the codes in the R Notebook:
    1. Download this repository by clicking the Download button next to the Cite button.
    2. Unzip the file if it is not automatically unzipped. For macOS, the file is automatically unzipped into a folder that begins with gederajeg-pemahaman_kuantitatif_...
    3. Go to this folder and double-click the file with the .Rproj extension (i.e. 2018 Oct - PANAS.Rproj). This will open up an RStudio session whose working directory is associated with the contents of this folder.
    4. Then, double-click on panas_paper.Rmd, which will open in the RStudio session. This .Rmd file contains the main text and R codes of the published paper.
    5. Next, you can run all the codes in the file using the shortcut Alt+Ctrl/Cmd+R.
    6. After running all the codes, you can preview the rendered file in RStudio as an R Notebook, which is an HTML document. To do this, click the dropdown arrow within the Knit button in the open .Rmd file, select Preview Notebook, then click the Preview button.
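
    A minimal sketch of the chi-square test of independence that the paper introduces (the counts, and the second word form, are invented for illustration):

      library(vcd)

      # Hypothetical form-by-meaning contingency table for "hot" words
      tab <- matrix(c(30, 10, 15, 45), nrow = 2,
                    dimnames = list(form    = c("panas", "kepanasan"),
                                    meaning = c("literal", "metaphorical")))

      chisq.test(tab)        # significance test of independence
      vcd::assocstats(tab)   # association measures (phi, Cramer's V)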

  12. Data_Sheet_6_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    + more versions
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_6_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s006
    Explore at:
    Available download formats: docx
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.

  13. Atlas of white matter function O to R terms

    • neurovault.org
    zip
    Updated May 23, 2020
    + more versions
    Cite
    (2020). Atlas of white matter function O to R terms [Dataset]. http://identifiers.org/neurovault.collection:7759
    Explore at:
    Available download formats: zip
    Dataset updated
    May 23, 2020
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A collection of 204 brain maps. Each brain map is a 3D array of values representing properties of the brain at different locations.

    Collection description

    This collection corresponds to the two sets of the atlas of white matter function (original A and replication B) derived from the brain disconnection of 1333 stroke participants (O to R terms).

  14. Introduction to Machine Learning using R: SVM & Unsupervised Learning

    • explore.openaire.eu
    Updated Jan 1, 2021
    + more versions
    Cite
    Khuong Tran; Dr Anastasios Papaioannou (2021). Introduction to Machine Learning using R: SVM & Unsupervised Learning [Dataset]. http://doi.org/10.5281/zenodo.6423747
    Explore at:
    Dataset updated
    Jan 1, 2021
    Authors
    Khuong Tran; Dr Anastasios Papaioannou
    Description

    About this course
    Machine Learning (ML) is a new way to program computers to solve real-world problems. It has gained popularity over the last few years by achieving tremendous success in tasks that we believed only humans could solve, from recognising images to self-driving cars. In this course, we will explore the fundamentals of Machine Learning from a practical perspective with the help of the R programming language and its scientific computing packages.

    Learning Outcomes
    A comprehensive introduction to Machine Learning models and techniques such as Support Vector Machines, K-Nearest Neighbors and Dimensionality Reduction. Know the differences between various core Machine Learning models. Understand Machine Learning modelling workflows. Use R and its relevant packages to process real datasets, and to train and apply Machine Learning models.

    Prerequisites
    Either "Learn to Program: R and Data Manipulation in R" or "Learn to Program: R and Data Manipulation and Visualisation in R" is needed to attend this course. If you already have experience with programming, please check the topics covered in the "Learn to Program: R", "Data Manipulation in R", "Data Manipulation and Visualisation in R" and "Introduction to ML using R: Introduction & Linear Regression" courses to ensure that you are familiar with the knowledge needed for this course, such as a good understanding of R syntax and basic programming concepts, familiarity with the dplyr, tidyr and ggplot2 packages, and a basic understanding of Machine Learning and model training. Maths knowledge is not required. There are only a few maths formulas in this course; references to the mathematics required for learning about Machine Learning will be provided. Understanding the mathematics behind each Machine Learning algorithm will help you appreciate the behaviour of a model and know its pros and cons when using it.

    Why do this course?
    Useful for anyone who wants to learn about Machine Learning but is overwhelmed by the tremendous amount of resources. It does not go in depth into mathematical concepts and formulas, but formal intuitions and references are provided to guide participants in further learning. We do have applications on real datasets! Machine Learning models are introduced in this course together with important feature-engineering techniques that are guaranteed to be useful in your own projects. It gives you enough background to kickstart your own Machine Learning journey, or to transition into Deep Learning.

    For a better and more complete understanding of the most popular Machine Learning models and techniques, please consider attending all three Introduction to Machine Learning using R workshops: Introduction to Machine Learning using R: Introduction & Linear Regression; Introduction to Machine Learning using R: Classification; Introduction to Machine Learning using R: SVM & Unsupervised Learning.

    Licence
    Copyright © 2021 Intersect Australia Ltd. All rights reserved.
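
    The course exercises are not reproduced in this record, but a representative SVM example in R (using the e1071 package, one common choice) looks like:

      library(e1071)

      # Train an SVM classifier on the built-in iris dataset
      set.seed(1)
      idx   <- sample(nrow(iris), 100)
      model <- svm(Species ~ ., data = iris[idx, ], kernel = "radial")

      # Evaluate on the held-out rows
      pred <- predict(model, iris[-idx, ])
      table(predicted = pred, actual = iris$Species[-idx])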

  15. Diversification and change in the R programming language

    • zenodo.org
    • search.dataone.org
    • +1more
    bin, csv
    Updated Mar 28, 2023
    Cite
    Timothy Staples; Timothy Staples (2023). Diversification and change in the R programming language [Dataset]. http://doi.org/10.5061/dryad.h18931zrg
    Explore at:
    Available download formats: bin, csv
    Dataset updated
    Mar 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Timothy Staples; Timothy Staples
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Languages change over time, driven by creation of new words and cultural pressure to optimise communication. Programming languages resemble written language but communicate primarily with computer hardware rather than a human audience. I tested for changes over time in use of R, a mature, open-source programming language used for scientific computing. Across 393,142 GitHub repositories published between 2014 and 2021, I extracted 143,409,288 R functions, programming "verbs", and paired linguistic and ecological approaches to estimate change in the diversity and composition of function use over time. I found that the number of R functions in use increased and underwent substantial change, driven primarily by the popularity of the "tidyverse" collection of community-written extensions. I provide evidence that users can directly change the nature of programming languages, with patterns that match known processes from natural languages and genetic evolution. In the case of R, patterns suggested there are selective pressures for increased analytic complexity and R functions in decline but not extinct ("extinction debts"). R's evolution towards the tidyverse may also represent the start of a division into two distinct dialects, which may impact the readability and continuity of analytic and scientific inquiries codified in R, as well as the language's future.
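
    The extraction pipeline itself is not part of this record; as a sketch of the general idea, base R's parser can list the function "verbs" used in a script (the file name is hypothetical):

      # Extract the names of all functions called in an R source file
      extract_calls <- function(path) {
        tokens <- getParseData(parse(path, keep.source = TRUE))
        tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
      }

      # table(extract_calls("analysis.R")) then tallies function use,
      # the unit of diversity analysed across repositories above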

  16. qfasar: Quantitative Fatty Acid Signature Analysis in R

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). qfasar: Quantitative Fatty Acid Signature Analysis in R [Dataset]. https://catalog.data.gov/dataset/qfasar-quantitative-fatty-acid-signature-analysis-in-r
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    An implementation of Quantitative Fatty Acid Signature Analysis (QFASA) in R. QFASA is a method of estimating the diet composition of predators. The fundamental unit of information in QFASA is a fatty acid signature (signature), which is a vector of proportions describing the fatty acid composition of adipose tissue. Signature data from at least one predator and from samples of all potential prey types are required. Calibration coefficients, which adjust for the differential metabolism of individual fatty acids by predators, are also required. Given those data inputs, a predator signature is modeled as a mixture of potential prey signatures and its diet estimate is obtained as the mixture that minimizes a measure of distance between the observed and modeled signatures. A variety of estimation options, goodness-of-fit diagnostic procedures to assess the suitability of estimates, and simulation capabilities are implemented. Please refer to the package vignette and the documentation files for individual functions for details and references.
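
    Purely to illustrate the mixture estimation described above, here is a toy sketch (invented numbers; calibration coefficients omitted; not the qfasar package's actual API) of estimating diet proportions by minimising the distance between observed and modelled signatures:

      # Toy signatures: rows are fatty acids, columns are prey types
      prey <- cbind(fish = c(0.5, 0.3, 0.2), squid = c(0.2, 0.2, 0.6))
      pred <- c(0.35, 0.25, 0.40)   # observed predator signature

      # Squared-distance objective over softmax-transformed weights,
      # which keeps the diet proportions on the simplex
      obj <- function(w) {
        p <- exp(w) / sum(exp(w))
        sum((pred - prey %*% p)^2)
      }
      fit  <- optim(c(0, 0), obj)
      diet <- exp(fit$par) / sum(exp(fit$par))
      round(diet, 2)   # estimated diet proportions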

  17. Health and Retirement Study (HRS)

    • search.dataone.org
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the health and retirement study (hrs) with r. the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research. if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

    this new github repository contains five scripts:

    1992 - 2010 download HRS microdata.R: loop through every year and every file, download, then unzip everything in one big party

    import longitudinal RAND contributed files.R: create a SQLite database (.db) on the local disk; load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

    longitudinal RAND - analysis examples.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program; create two database-backed complex sample survey objects, using a taylor-series linearization design; perform a mountain of analysis examples with wave weights from two different points in the panel

    import example HRS file.R: load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html); parse through the IF block at the bottom of the sas importation script; blank out a number of variables; save the file as an R data file (.rda) for fast loading later

    replicate 2002 regression.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program; create a database-backed complex sample survey object, using a taylor-series linearization design; exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document

    click here to view these five scripts. for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage, rand's hrs homepage, the hrs wikipedia page, and a running list of publications using hrs.

    notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
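
    a rough sketch of the database-backed survey-design step described above, using the survey package (the database, table, and variable names are placeholders, not the repository's actual names):

      library(survey)

      # complex sample survey object backed by a SQLite table,
      # taylor-series linearization design
      des <- svydesign(
        ids = ~psu, strata = ~stratum, weights = ~wave_weight,
        nest = TRUE,
        data = "rand_hrs",          # table inside the database
        dbtype = "SQLite", dbname = "hrs.db"
      )
      svymean(~bmi, des, na.rm = TRUE)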

  18. Replication Data for: Revisiting 'The Rise and Decline' in a Population of Peer Production Projects

    • dataverse.harvard.edu
    • search.dataone.org
    Updated May 5, 2020
    Cite
    Nathan TeBlunthuis; Aaron Shaw; Benjamin Mako Hill (2020). Replication Data for: Revisiting 'The Rise and Decline' in a Population of Peer Production Projects [Dataset]. http://doi.org/10.7910/DVN/SG3LP1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 5, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Nathan TeBlunthuis; Aaron Shaw; Benjamin Mako Hill
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.2/customlicense?persistentId=doi:10.7910/DVN/SG3LP1

    Description

    This archive contains code and data for reproducing the analysis for "Replication Data for Revisiting 'The Rise and Decline' in a Population of Peer Production Projects". Depending on what you hope to do with the data you probably do not want to download all of the files. Depending on your computation resources you may not be able to run all stages of the analysis.

    The code for all stages of the analysis, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with datasets used in the analysis of the paper, you want intermediate_data.7z or the uncompressed tab and csv files.

    The data files are created in a four-stage process. The first stage uses the program "wikiq" to parse mediawiki xml dumps and create tsv files that have edit data for each wiki. The second stage generates the all.edits.RDS file, which combines these tsvs into a dataset of edits from all the wikis. This file is expensive to generate and at 1.5GB is pretty big. The third stage builds smaller intermediate files that contain the analytical variables from these tsv files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and latex typeset the manuscript.

    A stage will only run if the outputs from the previous stages do not exist. So if the intermediate files exist they will not be regenerated; only the final analysis will run. The exception is that stage 4, fitting models and generating plots, always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001, wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards from building the manuscript using knitr, loading the datasets, running the analysis, to building the intermediate datasets.

    Building the manuscript using knitr. This requires working latex, latexmk, and knitr installations. Depending on your operating system you might install these packages in different ways. On Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar; this has everything you need to typeset the manuscript. Unpack the tar archive (on a unix system, tar xf code.tar) and navigate to code/paper_source. Install R dependencies: in R, run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")). On a unix system you should be able to run make to build the manuscript generalizable_wiki.pdf. Otherwise you should try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com.

    Loading intermediate datasets. The intermediate datasets are found in the intermediate_data.7z archive. They can be extracted on a unix system using the command 7z x intermediate_data.7z. The files are 95MB uncompressed. These are RDS (R data set) files and can be loaded in R using readRDS, for example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer to work with the .tab files.

    Running the analysis. Fitting the models may not work on machines with less than 32GB of RAM. If you have trouble, you may find the functions in lib-01-sample-datasets.R useful to create stratified samples of data for fitting models; see line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives (on a unix system, tar xf code.tar && 7z x intermediate_data.7z). Install R dependencies: install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a unix system you can simply run regen.all.sh to fit the models, build the plots and create the RDS files.

    Generating datasets: building the intermediate files. The intermediate files are generated from all.edits.RDS. This process requires about 20GB of memory. Download all.edits.RDS, userroles_data.7z, selected.wikis.csv, and code.tar. Unpack code.tar and userroles_data.7z (on a unix system, tar xf code.tar && 7z x userroles_data.7z). Install R dependencies: in R run install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). Run 01_build_datasets.R.

    Building all.edits.RDS. The intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the intermediate RDS files and all.edits.RDS files do not exist in the working directory. all.edits.RDS is generated from the tsv files generated by wikiq. This may take several hours. By default building the dataset will...

  19. Data from: INDILACT – Extended voluntary waiting period in primiparous dairy cows. Part 2: Customized VWP – Metadata and R–scripts with statistical calculations

    • researchdata.se
    Updated Mar 13, 2025
    + more versions
    Cite
    Anna Edvardsson Rasmussen (2025). INDILACT – Extended voluntary waiting period in primiparous dairy cows. Part 2: Customized VWP – Metadata and R–scripts with statistical calculations [Dataset]. https://researchdata.se/en/catalogue/dataset/2024-424
    Explore at:
    Available download formats: (108812), (428272), (22393), (5799), (3243), (23818), (5034), (747913), (7170)
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    Swedish University of Agricultural Sciences (SLU)
    Authors
    Anna Edvardsson Rasmussen
    Time period covered
    Jan 1, 2019 - Oct 27, 2022
    Area covered
    Sweden
    Description

    This is part 2 of INDILACT; part 1 is published separately.

    The objective of this study is to investigate how a customized voluntary waiting period before first insemination in primiparous dairy cows would affect milk production, fertility and health during their first calving interval.

    The data were registered between January 2019 and October 2022.

    This data is archived as:
    - Metadata (publicly available)
    - Raw data (.txt files) from the Swedish national herd recording scheme (SNDRS), operated by Växa Sverige: access restricted due to agreements with the principal owners of the data, Växa Sverige and the farms. Code lists are available in INDILACT part 1.
    - Aggregated data (Excel files): access restricted due to agreements with the principal owners of the data, Växa Sverige and the farms.
    - R scripts with statistical calculations (openly available)

    Metadata (3 files):
    - Metadata genotyping: the only new file type compared to INDILACT part 1; describes how this data category has been handled. The other file types have been handled in the same way as in INDILACT part 1.
    - Metadata - del 2: general summary of the initial data handling used to aggregate files of the same type (dates etc.) into the Excel files used in the R scripts.
    - DisCodes: divisions of the diagnoses into categories.

    Raw data:
    - 59 .txt files containing data retrieved from SNDRS on 8 separate occasions.
    - Data from 18 Swedish farms from Jan 2019 to Oct 2022.

    Aggregated data:
    - 29 Excel files. The text files have been transformed to Excel format and all data from the same file type are aggregated into one file.
    - Data collected from the farms by email and phone contact, about individual cows enrolled in the trial, from Oct 2020 to Oct 2022.
    - One merged script derived from initial data handling in R where relevant variables were calculated and aggregated to be used for statistical calculations.

    R scripts with data handling and statistical calculations:
    - "Data analysis part 2 - final": data handling to create the file used in the statistical calculations.
    - "Part 2 - Binomial models - Fertility": statistical calculations of variables using binomial models.
    - "Part 2 - glmmTMB models - Fertility": statistical calculations of variables using glmmTMB models.
    - "Part 2 - linear models - Fertility": statistical calculations of fertility variables using linear models.
    - "Part 2 - linear models": statistical calculations of milk variables using linear models.

    Running the R scripts requires access to the restricted files. The files should be unpacked in a subdirectory "data" relative to the working directory for the scripts. See also the file "sessionInfo.txt" for information on the R packages used.
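
    For orientation, a minimal sketch of the kind of binomial glmmTMB model such scripts fit (the data and variable names are invented; the real scripts and data are in the archive):

      library(glmmTMB)

      # Invented example data: pregnancy at first insemination by VWP group and herd
      set.seed(1)
      cows <- data.frame(
        pregnant  = rbinom(200, 1, 0.45),
        vwp_group = sample(c("conventional", "extended"), 200, replace = TRUE),
        herd      = factor(sample(1:18, 200, replace = TRUE))
      )

      # Binomial GLMM with a random herd intercept
      m <- glmmTMB(pregnant ~ vwp_group + (1 | herd), family = binomial, data = cows)
      summary(m)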

  20. GAL Predictions of receptor impact variables v01

    • cloud.csiss.gmu.edu
    • researchdata.edu.au
    • +1more
    zip
    Updated Dec 13, 2019
    Cite
    Australia (2019). GAL Predictions of receptor impact variables v01 [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/67e0aec1-be25-46f5-badc-b4d895a934aa
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 13, 2019
    Dataset provided by
    Australia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    Receptor impact models (RIMs) are developed for specific landscape classes. The prediction of receptor impact variables is a multi-stage process. It relies on the runs from surface water and groundwater models at nodes within the analysis extent. These outputs derive directly from the hydrological model. For a given node, there is a value for each combination of hydrological response variable, future, and replicate or run number. Not all variables may be available or appropriate at every node. This differs from the quantile summary information that is otherwise used to summarise the HRV output and is also registered.

    Dataset History

    There is a key look up table (Excel file) that lists the assessment units (AUIDs) by landscape class (or landscape group if appropriate) and notes the groundwater modelling node and runs, and the surface water modelling node and runs, that should be used for that AUID. In some cases the AUID is only mapped to one set of hydrological modelling output. This look up table represents the AUIDs that require RIV predictions. For NAM and GAL there is a single look up table. For GLO and HUN, surface water and groundwater are provided separately.

    Receptor impact models (RIMs) are developed for specific landscape classes. The hydrological response variables that a RIM within a landscape class requires are organised by the R script RIM_Prediction_CreateArray.R into an array. The formatted data is available as an R data file format called RDS and can be read directly into R.

    The R script IMIA_XXX_RIM_predictions.R applies the receptor model functions (RDS object as part of Data set 1: Ecological expert elicitation and receptor impact models for the XXX subregion) to the HRV array for each landscape class (or landscape group) to make predictions of receptor impact variables (RIVs). Predictions of a receptor impact from a RIM for a landscape class are summarised at relevant AUIDs by the 5th through to the 95th percentiles (in 5% increments) for baseline and CRDP futures. These are available in the XXX_RIV_quantiles_IMIA.csv data set. RIV predictions are further summarised and compared as boxplots (using the R script boxplotsbyfutureperiod.R) and as (aggregated) spatial risk maps using GIS.
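
    A small sketch of the percentile summary step (5th through 95th percentiles in 5% increments), applied to a hypothetical vector of RIV predictions across replicates at one AUID:

      # Hypothetical RIV predictions across model replicates
      set.seed(42)
      riv_reps <- rnorm(500, mean = 0.6, sd = 0.1)

      # 5th through 95th percentiles in 5% increments, as summarised in the CSV output
      quantile(riv_reps, probs = seq(0.05, 0.95, by = 0.05))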

    Dataset Citation

    Bioregional Assessment Programme (2018) GAL Predictions of receptor impact variables v01. Bioregional Assessment Derived Dataset. Viewed 10 December 2018, http://data.bioregionalassessments.gov.au/dataset/67e0aec1-be25-46f5-badc-b4d895a934aa.

    Dataset Ancestors
