40 datasets found
  1. Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    figshare
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software environment and object-oriented programming language, with a development environment (IDE) called RStudio, for computing statistics and producing graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike much existing statistical software, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible, and it allows users to define their own (customized) functions for how the program should behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, aimed at informing and guiding the work of R users and statisticians. It describes the different types of statistical data analysis and methods, and the best scenarios for using each in R. It gives a hands-on, step-by-step practical guide to identifying and conducting the different parametric and non-parametric procedures, including the conditions or assumptions necessary for performing the various statistical methods or tests and how to interpret their results. The book also covers the different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book.

    It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples: from importing and storing datasets in R as objects, coding and calling the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to reasoning about, interpreting, and storing the results for future use, and producing graphical visualizations and representations. Thus, a congruence of statistics and computer programming for research.
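    The efficiency point about vectors (whole-array operations instead of element-by-element loops) carries over to any vectorized language. The book's own examples are in R; a minimal Python/NumPy sketch of the same idea, for illustration only:

    ```python
    import numpy as np

    values = np.arange(1000, dtype=np.float64)

    def loop_sum_of_squares(xs):
        """Element-by-element loop: one interpreted operation per value."""
        total = 0.0
        for x in xs:
            total += x * x
        return total

    # Vectorized equivalent: a single array expression, evaluated in compiled
    # code - the efficiency gain the book attributes to working with vectors.
    vectorized = float(np.sum(values * values))

    print(loop_sum_of_squares(values) == vectorized)  # True
    ```

    In R the same contrast is between a `for` loop over the elements and the single vectorized expression `sum(values^2)`.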

  2. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    + more versions
    Cite
    Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Explore at:
    bin, application/gzip, zip, text/x-python (available download formats)
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marat Valiev; Bogdan Vasilescu; James Herbsleb
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34 GB unpacked. This dataset still doesn't include PyPI packages
     themselves, which take around 2 TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semi-structured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)

    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on the level of detail (see Step 2 for more details):
    - up to 2 TB of disk space
    - at least 16 GB of RAM (64 GB preferable)
    - a few hours to a few months of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt`
    - disable all APIs except GitHub (Bitbucket and GitLab support were
     not yet implemented when this study was in progress): edit
     `scraper/__init__.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speed up
    the process:
    
    #### Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
    
    #### Option 2.b, difficulty level: easy
    
    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15-30 minutes.
    
    - create a folder `
  3. J & r designs studio inc USA Import & Buyer Data

    • seair.co.in
    Updated Oct 4, 2018
    + more versions
    Cite
    Seair Exim (2018). J & r designs studio inc USA Import & Buyer Data [Dataset]. https://www.seair.co.in
    Explore at:
    .bin, .xml, .csv, .xls (available download formats)
    Dataset updated
    Oct 4, 2018
    Dataset provided by
    Seair Info Solutions
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  4. ShinyFMBN, a Shiny app to access FoodMicrobionet

    • data.mendeley.com
    Updated Sep 2, 2022
    Cite
    Eugenio Parente (2022). ShinyFMBN, a Shiny app to access FoodMicrobionet [Dataset]. http://doi.org/10.17632/8fwwjpm79y.7
    Explore at:
    Dataset updated
    Sep 2, 2022
    Authors
    Eugenio Parente
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This data set contains the ShinyFMBN app and the FoodMicrobionet database, which holds metataxonomic data for bacterial communities of foods and food environments. Learn more at https://www.sciencedirect.com/science/article/pii/S0168160522001684. The ShinyFMBN app allows you to access FoodMicrobionet 4.2, a repository of data on food microbiome studies. To run the app you need to install R and RStudio. Data are available in both R (.rds) and .xlsx format (see below).

    This compressed folder contains:

    a. folder R_lists: two .rds files containing all data in FoodMicrobionet 4.1.2. FMBN.rds is in a format usable with ShinyFMBN 2.4 (see below), while FMBN_plus.rds contains all tables and fields and is best accessed using custom R scripts (see https://github.com/ep142/ for examples).
    b. folder xlsx_files: all FoodMicrobionet tables in MS Excel format. These files may be useful because the locale of a given system may affect how fields containing accented letters are handled when importing text files.
    c. folder shiny_FMBN_2_4_3: the app folder, the runShinyFMBN_2_4_3.R script (an R script to install all needed packages and run the app), and the app manual in .html format.
    d. FMBNtablespecs_4_2.html: describes the table specifications.

  5. Tidewater goby and estuarine fish records from seining, qPCR and metabarcoding data for Southern California Estuaries in 2023

    • data.usgs.gov
    • data.niaid.nih.gov
    • +4more
    Updated Jul 29, 2024
    + more versions
    Cite
    K.D. Lafferty (2024). Tidewater goby and estuarine fish records from seining, qPCR and metabarcoding data for Southern California Estuaries in 2023 [Dataset]. http://doi.org/10.25349/D9P60T
    Explore at:
    Dataset updated
    Jul 29, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    K.D. Lafferty
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Aug 16, 2022
    Area covered
    Southern California, California
    Description

    This data archive includes R code and data for reproducing the analyses and figures in Lafferty, Metabarcoding is (usually) more cost effective than seining or qPCR for detecting tidewater gobies and other estuarine fishes.

    To view the supplementary tables, open the Fig&TableSuppl.docx file. This file also includes the manuscript figures and tables and some explanatory text about how to generate them. To reproduce the figures, open Fig&TableCode.Rmd in RStudio and make sure the needed CSV files included in the Dryad repository are in the working directory. The data files include more information than used in the analyses and can be used for other purposes. The code is not software, nor is it intended as an R package, but it is annotated so others can understand and manipulate it. For each CSV file there is an associated metadata file that defines entries and columns and an information file that contains an abstract and ownership information. One of the data file ...

  6. Data from: Compositional Data Analysis (CoDA) of Clinopyroxene from Abyssal Peridotites

    • data.niaid.nih.gov
    Updated Jul 16, 2024
    Cite
    Szilas, K. (2024). Compositional Data Analysis (CoDA) of Clinopyroxene from Abyssal Peridotites [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6791965
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Morishita, T.
    Szilas, K.
    Nishio, I.
    Itano, K.
    Tamura, A.
    Waterton, P.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional supporting information includes data, R script, and QGIS file supporting the main text:

    CSV (Data Set)

    residual_abyssal_peridotites.csv: Compilations of residual abyssal peridotites (n = 1162) and depleted MORB-mantle (n = 1)

    residual_abyssal_peridotites_coda_results.csv: Filtered data and results of PCA and k-means clustering (n = 267)

    model_cpx.csv: Clinopyroxene compositions obtained by open-system melting model

    test.csv: CSV file for testing new data

      R
    

    abyssal_cpx_pca.Rproj

    coda.R: R script implemented in this study

    test_your_data.R: R script to compare new data to abyssal and modeled clinopyroxenes

    QGIS

    residual_abyssal_peridotites.qgz: QGIS using residual_abyssal_peridotites.csv and residual_abyssal_peridotites_coda_results.csv for Figure 1 and Figure S7

    color_etopo1_ice_low_modified.tiff: ETOPO1 is a 1 arc-minute global relief model of Earth's surface that integrates land topography and ocean bathymetry from NOAA

    Instructions

    We prepared an R script to compare new (your) clinopyroxene data with clinopyroxene from abyssal peridotites. New data will be plotted using the principal components derived from the natural clinopyroxene database presented in this paper.

    The procedure is as follows:

    1. Add new data below the second row in test.csv.
    2. Do not change the file name.
    3. Do not change the first row.
    4. Add clinopyroxene data (10 elements) and its label under the second row (the label can be a sample name, lithology, locality, etc.).
    5. Open abyssal_cpx_pca.Rproj in RStudio (double click), then open test_your_data.R (double click).
    6. Run test_your_data.R: select all (cmd+A / ctrl+A), then press Run (cmd+enter / ctrl+enter).

    Result files:

    abyssalcpx_vs_test.csv: PC1 and PC2 values using abyssal clinopyroxene PC coordinates
    plot1.pdf: abyssal clinopyroxene (cluster) vs. test data plot
    plot2.pdf: modeled clinopyroxene vs. test data plot
    spider_cl1.pdf: PM-normalized trace element patterns of cluster 1 from abyssal peridotites
    spider_cl2.pdf: PM-normalized trace element patterns of cluster 2 from abyssal peridotites
    spider_cl3.pdf: PM-normalized trace element patterns of cluster 3 from abyssal peridotites
    spider_cl4.pdf: PM-normalized trace element patterns of cluster 4 from abyssal peridotites
    spider_test.pdf: PM-normalized trace element patterns of the new (your) data
    plot3.pdf: discrimination diagram for clinopyroxene trace element compositions (PM-normalized Sr/Nd vs. Ce/Yb of clinopyroxenes from abyssal peridotites vs. test data)
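    The test_your_data.R workflow plots new analyses in the coordinates of principal components precomputed from the abyssal clinopyroxene database. A minimal Python/NumPy sketch of that projection step, with made-up means and loadings standing in for the values the repository's R scripts (coda.R, test_your_data.R) actually load:

    ```python
    import numpy as np

    # Placeholder values: in the real workflow the means and PC loadings come
    # from the PCA fitted to the abyssal clinopyroxene database in R.
    feature_means = np.array([50.0, 5.0, 2.0])
    components = np.array([[0.8, 0.5, 0.3],     # PC1 loadings (made up)
                           [-0.4, 0.7, -0.6]])  # PC2 loadings (made up)

    def project(new_data):
        """Center new analyses with the stored means, then project onto PC1/PC2."""
        centered = np.asarray(new_data, dtype=float) - feature_means
        return centered @ components.T

    # One hypothetical test analysis (three elements instead of the real ten).
    scores = project([[52.0, 4.5, 2.5]])
    print(scores.shape)  # (1, 2): one row of PC1/PC2 scores
    ```

    This is why steps 2-3 above insist on keeping test.csv's name and header row intact: the script must match your columns to the element order the stored loadings expect.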
  7. ANN dual-fluid PV/T data and code in R programming - Architecture of the Artificial Neural Network

    • data.mendeley.com
    Updated Mar 8, 2021
    Cite
    Hasila Jarimi (2021). ANN dual-fluid PV/T data and code in R programming - Architecture of the Artificial Neural Network [Dataset]. http://doi.org/10.17632/gxxszgy85t.1
    Explore at:
    Dataset updated
    Mar 8, 2021
    Authors
    Hasila Jarimi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides the artificial neural network architecture for a dual-fluid photovoltaic thermal (PV/T) collector which was experimentally tested in the outdoor environment of Malaysia. The system was set up and tested in three modes: in mode (i) air flows through the cooling channels, in mode (ii) water flows through the cooling channels, and in mode (iii) both air and water flow together.

    To create this dataset, the following steps were carried out:

    1. Select input variables: 5 inputs were selected: ambient temperature, wind speed, solar irradiance, inlet air temperature, and inlet water temperature.
    2. Select algorithm: the backpropagation neural network (BPNN) was used for training.
    3. Select output variables: 6 outputs were selected: PV surface temperature, PV temperature, back-plate temperature, outlet air temperature, outlet water temperature, and electrical efficiency.

    Step 1: Import the data.
    Step 2: Normalize the data.
    Step 3: Split the dataset into training and testing data.
    Step 4: Create the NN model in RStudio.

    The 'neuralnet' package in the R programming language was used. The R code is provided in the attached file.
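    Steps 2 and 3 above (normalization and the train/test split) are generic preprocessing. A minimal Python sketch of both, as an illustration only; the dataset's own code uses R's neuralnet, and the numbers below are made-up readings, not the PV/T measurements:

    ```python
    import random

    def min_max_normalize(rows):
        """Scale every column to [0, 1], a common step before neural-network training."""
        cols = list(zip(*rows))
        lo = [min(c) for c in cols]
        hi = [max(c) for c in cols]
        return [[(v - l) / (h - l) if h != l else 0.0
                 for v, l, h in zip(row, lo, hi)] for row in rows]

    def train_test_split(rows, test_fraction=0.3, seed=42):
        """Shuffle reproducibly, then split into training and testing subsets."""
        shuffled = rows[:]
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * (1 - test_fraction))
        return shuffled[:cut], shuffled[cut:]

    # Made-up rows: ambient temperature, wind speed, solar irradiance.
    data = [[25.0, 1.2, 800.0], [30.0, 0.8, 950.0],
            [28.0, 2.0, 600.0], [26.0, 1.5, 700.0]]
    train, test = train_test_split(min_max_normalize(data))
    print(len(train), len(test))  # 2 2
    ```

    Min-max scaling matters here because the inputs live on very different ranges (wind speed in m/s vs. irradiance in W/m²), and unscaled inputs slow or destabilize backpropagation.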

  8. Research data supporting "Considerations for Implementing Electronic Laboratory Notebooks in an Academic Research Environment"

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jul 19, 2024
    + more versions
    Cite
    Stevens, MM (2024). Research data supporting "Considerations for Implementing Electronic Laboratory Notebooks in an Academic Research Environment" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5012728
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Nogiwa-Valdez, AA
    Higgins, SG
    Stevens, MM
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Research data supporting the publication:

    Higgins SG, Nogiwa-Valdez AA, Stevens MM, Considerations for Implementing Electronic Laboratory Notebooks in an Academic Research Environment, Nature Protocols, 2021.

    This repository contains the raw survey data of 172 current and historic electronic laboratory notebook (ELN) software packages.

    Main files:

    "ELN_Review_Higgins_2021_Survey.csv" = raw survey data in 'tidy' data format

    "ELN_Review_Higgins_2021.Rmd" = an R Markdown file (R Notebook) that takes the survey data as input and produces summary statistics and plots. This file was written using RStudio as the IDE.

    Derived files, generated from those above:

    "ELN_Review_Higgins_2021.nb.html" = a self-contained HTML file that is automatically generated by RStudio, based on the markdown file. This can be opened in any web browser to allow manual inspection of the code and comments without the need for specialist software. Embedded within this file is also the original markdown script (i.e. a copy of the code in "ELN_Review_Higgins_2021.Rmd").

    "ELN_Review_Higgins_2021_Lifetimes_Interactive_Figure1.html" = an HTML file generated by the script above via the plotly package. It contains an interactive version of the ELN survey data, allowing the user to hover over the timeline and explore the data.

    "ELN_Review_Higgins_2021_Timeline.pdf" = static version of ELN timeline, used to generate figure in main manuscript.

    "ELN_Review_Higgins_2021_Releases-Per-Year.pdf" = static version of number of new ELNs per year, used to generate figure in main manuscript.

    This survey was generated from a mixture of primary and secondary sources (see references for secondary sources).

  9. Tidewater goby and estuarine fish records from seining, qPCR and metabarcoding data for Southern California Estuaries in 2023

    • gimi9.com
    Updated Feb 2, 2024
    + more versions
    Cite
    (2024). Tidewater goby and estuarine fish records from seining, qPCR and metabarcoding data for Southern California Estuaries in 2023 | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_tidewater-goby-and-estuarine-fish-records-from-seining-qpcr-and-metabarcoding-data-for-sou/
    Explore at:
    Dataset updated
    Feb 2, 2024
    Description

    This data archive includes R code and data for reproducing the analyses and figures in Lafferty, Metabarcoding is (usually) more cost effective than seining or qPCR for detecting tidewater gobies and other estuarine fishes. To view the supplementary tables, open the Fig&TableSuppl.docx file. This file also includes the manuscript figures and tables and some explanatory text about how to generate them. To reproduce the figures, open Fig&TableCode.Rmd in RStudio and make sure the needed CSV files included in the Dryad repository are in the working directory. The data files include more information than used in the analyses and can be used for other purposes. The code is not software, nor is it intended as an R package, but it is annotated so others can understand and manipulate it. For each CSV file there is an associated metadata file that defines entries and columns and an information file that contains an abstract and ownership information. One of the data files required to reproduce the analyses (Schmelzle&Kinziger_occupancy.csv) was created from previously published data and was not produced by the author. Please cite it as: Schmelzle, Molly C., Kinziger, Andrew P. 2015. Data from: Using occupancy modeling to compare environmental DNA to traditional field methods for regional-scale monitoring of an endangered aquatic species. Dryad. 6rs23

  10. Hemoglobin input dataset for analysis in R Studio.

    • plos.figshare.com
    xlsx
    Updated Jul 28, 2025
    Cite
    Anastasia Meckler; Sebastian Künert; Leonardo Poggi; Julia Jeske; Lukas Schipper; Thanusiah Selvamoorthy; Felix Nensa; Bernadette Hosters; Michael Fabian Berger; Ramsi Siaj; Mario Vincent Roser (2025). Hemoglobin input dataset for analysis in R Studio. [Dataset]. http://doi.org/10.1371/journal.pone.0325072.s004
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Jul 28, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Anastasia Meckler; Sebastian Künert; Leonardo Poggi; Julia Jeske; Lukas Schipper; Thanusiah Selvamoorthy; Felix Nensa; Bernadette Hosters; Michael Fabian Berger; Ramsi Siaj; Mario Vincent Roser
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Column A: Binary classification of data based on laboratory values from Column B (cut-off value = 0: 0 = 0, > 0 = 1); Column B: Laboratory values; Column C: Randomized patient numbers; Columns E–H: Wavelengths with corresponding spectral data. (XLSX)
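    Column A above is derived from Column B by a simple threshold: a laboratory value of 0 maps to class 0, anything greater than 0 to class 1. A minimal Python sketch of the rule, with made-up values rather than the actual XLSX contents:

    ```python
    # Made-up laboratory values standing in for Column B of the XLSX.
    lab_values = [0.0, 0.3, 0.0, 1.7, 0.05]

    # Column A rule from the dataset description: cut-off value = 0
    # (0 -> class 0, > 0 -> class 1).
    binary_class = [1 if v > 0 else 0 for v in lab_values]
    print(binary_class)  # [0, 1, 0, 1, 1]
    ```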

  11. Replication Data for: Responsiveness of decision-makers to stakeholder preferences in the European Union legislative process

    • dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Lei, Yuxuan (2023). Replication Data for: Responsiveness of decision-makers to stakeholder preferences in the European Union legislative process [Dataset]. http://doi.org/10.7910/DVN/RH5H3H
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Lei, Yuxuan
    Area covered
    European Union
    Description

    This dataset contains original quantitative datafiles, analysis data, a codebook, R scripts, syntax for replication, the original output from RStudio, and figures from a statistical program. The analyses can be found in Chapter 5 of my PhD dissertation, i.e., ‘Political Factors Affecting the EU Legislative Decision-Making Speed’. The data supporting the findings of this study are accessible and replicable. Restrictions apply to the availability of these data, which were used under license for this study. The datafiles include:

    R script: Chapter 5 script.R
    Syntax: Syntax for replication 5.0.docx
    Original output from RStudio: The original output 5.0.pdf
    Codebook: Codebook 5.0.txt
    Analysis data: data5.0.xlsx
    Dataset: Original quantitative data for Chapter 5.xlsx
    Dataset: Codebook of policy responsiveness.pdf
    Figures: Chapter 5 Figures.zip

    Data analysis software: RStudio; R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"; Copyright (C) 2021 The R Foundation for Statistical Computing; Platform: x86_64-apple-darwin17.0 (64-bit)

  12. Thinking deeply about quantitative analysis: Building a Biologist's Toolkit

    • qubeshub.org
    Updated Aug 26, 2021
    Cite
    Sarah Bray; Paul Duffin; James Wagner (2021). Thinking deeply about quantitative analysis: Building a Biologist's Toolkit [Dataset]. http://doi.org/10.24918/cs.2016.4
    Explore at:
    Dataset updated
    Aug 26, 2021
    Dataset provided by
    QUBES
    Authors
    Sarah Bray; Paul Duffin; James Wagner
    Description

    Vision and Change in Undergraduate Biology Education encouraged faculty to focus on core concepts and competencies in the undergraduate curriculum. We created a sophomore-level course, Biologists' Toolkit, to focus on the competencies of quantitative reasoning and scientific communication. We introduce students to the statistical analysis of data using the open-source statistical language and environment R, with RStudio, in the first two-thirds of the course. During this time the students learn to write basic commands to input data and conduct common statistical analyses. The students also learn to graphically represent their data using R. In a final project, we assign students unique data sets that require them to develop a hypothesis that can be explored with the data, analyze and graph the data, search literature related to their data set, and write a report that emulates a scientific paper. The final report includes publication-quality graphs and proper reporting of data and statistical results. At the end of the course students reported greater confidence in their ability to read and make graphs, analyze data, and develop hypotheses. Although programming in R has a steep learning curve, we found that students who learned programming in R developed a robust strategy for data analyses, and they retained and successfully applied those skills in other courses during their junior and senior years.

  13. Replication Data for: Influence of different stakeholders on the duration of legislative decision-making in the European Union

    • search.dataone.org
    Updated Nov 8, 2023
    + more versions
    Cite
    Lei, Yuxuan (2023). Replication Data for:Influence of different stakeholders on the duration of legislative decision-making in the European Union [Dataset]. http://doi.org/10.7910/DVN/VGAQIO
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Lei, Yuxuan
    Area covered
    European Union
    Description

    This dataset contains original quantitative datafiles, analysis data, a codebook, R scripts, syntax for replication, the original output from RStudio, and figures from a statistical program. The analyses can be found in Chapter 2 of my PhD dissertation, i.e., ‘Political Factors Affecting the EU Legislative Decision-Making Speed’. The data supporting the findings of this study are accessible and replicable. Restrictions apply to the availability of these data, which were used under license for this study. The datafiles include:

    R script: Chapter 2 script.R
    Syntax: Syntax for replication 2.0.docx
    Original output from RStudio: The original output 2.0.pdf
    Codebook: Codebook 2.0.txt
    Analysis data: data2.1.xlsx
    Dataset: Original quantitative data for Chapter 2.xlsx
    Figures: Chapter 2 Figures.zip

  14. China - Overseas Finance Inventory Database - Dataset - ENERGYDATA.INFO

    • energydata.info
    Updated Mar 22, 2022
    Cite
    (2022). China - Overseas Finance Inventory Database - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/china-overseas-finance-inventory-database
    Explore at:
    Dataset updated
    Mar 22, 2022
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    The COFI database includes power-generation projects in Belt and Road Initiative (BRI) countries financed by Chinese corporations and banks that reached financial closure from 2000 to 2020. Types of financing include debt and equity investment, with the latter including greenfield foreign direct investments (FDI) and cross-border mergers and acquisitions (M&As). COFI was consolidated from nine source databases using both an automated join method in RStudio and manual joining by analysts. The database includes power-plant characteristics and investment details. It captures 430 power plants in 76 BRI countries, including 220 equity investment transactions and 253 debt investment transactions made by Chinese investors. Key data points for financial transactions in COFI include the financial instrument (equity or debt), investor name, amount, and financial close year. Key technical characteristics tracked for projects in COFI include name, installed capacity, commissioning year, country, and primary fuel type. This project is a collaboration among the Boston University Global Development Policy Center, the Inter-American Dialogue, the China-Africa Research Initiative at Johns Hopkins University (CARI), and the World Resources Institute (WRI). The detailed methodology is given in the World Resources Institute publication “China Overseas Finance Inventory”.
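    The consolidation step described above joins records from several source databases on a shared project key. The COFI team did this in R plus manual review; a minimal Python sketch of the automated part, with hypothetical field names rather than the actual COFI schema:

    ```python
    # Hypothetical source tables keyed by project name (illustrative fields only).
    plants = {
        "Plant A": {"capacity_mw": 300, "fuel": "coal", "commissioned": 2015},
        "Plant B": {"capacity_mw": 50, "fuel": "solar", "commissioned": 2019},
    }
    deals = [
        {"project": "Plant A", "instrument": "debt", "amount_musd": 250},
        {"project": "Plant B", "instrument": "equity", "amount_musd": 40},
        {"project": "Plant C", "instrument": "debt", "amount_musd": 10},  # no match
    ]

    # Inner join: keep only transactions whose project has characteristics data,
    # merging the two records into one row per transaction.
    joined = [{**deal, **plants[deal["project"]]}
              for deal in deals if deal["project"] in plants]
    print(len(joined))  # 2
    ```

    Records that fail to match on the key (Plant C here) are exactly the cases that fall through to manual joining by analysts.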

  15. Data on how honeybee host brood traits influence Varroa destructor reproduction

    • researchdata.se
    Updated Mar 26, 2024
    Cite
    Nicholas Scaramella; Ashley Burke; Melissa Oddie; Barbara Locke (2024). Data on how honeybee host brood traits influence Varroa destructor reproduction [Dataset]. http://doi.org/10.5878/znc2-9b12
    Explore at:
    Available download formats
    Dataset updated
    Mar 26, 2024
    Dataset provided by
    Swedish University of Agricultural Sciences
    Authors
    Nicholas Scaramella; Ashley Burke; Melissa Oddie; Barbara Locke
    Time period covered
    Jun 2019 - Sep 2021
    Description

    The data set was collected in Uppsala, Sweden between 2019 and 2021. Hives were established using varroa-resistant queens from Oslo, Norway (n = 3), Gotland, Sweden (n = 5), and Avignon, France (n = 4), with a varroa-susceptible population from Uppsala, Sweden (n = 5) as control. All hives were located at the SLU Lövsta research station (GPS coordinates: 59° 50’ 2.544”N, 17° 48’ 47.447”E). Varroa destructor mite reproductive success was measured on frames with adult honeybee workers exposed to, and excluded from, access to honeybee larvae. Excluders were added directly after brood capping, and frames were dissected nine days later. Cell caps were removed using a scalpel, with the pupae and mite families carefully removed from the cell using forceps and a fine paint brush. Mite reproductive success was calculated by counting successful reproduction attempts, where a successful attempt was defined as a mite that produced one male and at least one female offspring. If a mite did not meet this requirement, it was considered a failed reproduction attempt and the reason for failure was documented. All data were analyzed in R version 4.0.1 using RStudio 1.3.959. A linear mixed-effects model was used with mite reproductive success as the response variable, population origin and excluder treatment as independent variables, and colony and year as random-effect variables, to compare treatments within each population as well as fecundity. Least-squares means of the model were used to compare treatments between individual populations.
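The success criterion described above (one male plus at least one female offspring per attempt) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code, and the record fields are hypothetical:

```python
# Illustrative sketch of the reproductive-success criterion: an attempt
# succeeds only if the mite produced a male and at least one female offspring.
def successful(attempt):
    """True if the attempt produced a male and at least one female."""
    return attempt["males"] >= 1 and attempt["females"] >= 1

def success_rate(attempts):
    """Fraction of attempts meeting the success criterion."""
    if not attempts:
        return 0.0
    return sum(successful(a) for a in attempts) / len(attempts)

attempts = [
    {"males": 1, "females": 3},   # success
    {"males": 0, "females": 2},   # failed: no male offspring
    {"males": 1, "females": 0},   # failed: no female offspring
]
rate = success_rate(attempts)  # 1 success out of 3 attempts
```

In the study itself these per-attempt outcomes feed a linear mixed-effects model rather than a raw proportion, with colony and year as random effects.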

    Scaramella_et_al_2023_Data.tsv - Data set consisting of 34 rows and 21 columns. Colony demographics and designated treatment are listed. All data collected are count data and are explained in more detail in the README file. The R script used in the analysis is attached. It is split into two sections: the first is used for the statistical analysis, and the second for creating the plots used in the paper. The sections are delimited by the titles SECTION 1 - ANALYSIS and SECTION 2 - PLOTS.

    Provided that the script is in the same directory as the data files and the needed R packages are installed (see sessionInfo.txt), the output file Scaramella_et_al_2023_Analysis_Code_log.txt and the plot file Rplots.pdf can be reproduced by running: Rscript Scaramella_et_al_2023_Analysis_Code.R > Scaramella_et_al_2023_Analysis_Code_log.txt

    Scaramella_et_al_2023_Bar_Graph_Data.tsv - Data set consisting of 8 rows and 5 columns. Colony demographics and designated treatment are listed. All data are generated from the count data in Scaramella_et_al_2023_Data.tsv and are explained in more detail in the README file.

    Scaramella_et_al_2023_Stacked_Bar_Graph_Data.tsv - Data set consisting of 102 rows and 8 columns. Colony demographics and designated treatment are listed. The data are Scaramella_et_al_2023_Data.tsv restructured to include the reason for failure as a column, and are explained in more detail in the README file.

  16. Data for Meta Analysis - Ebook Language Learning

    • data.mendeley.com
    Updated Sep 16, 2024
    Virgiawan Listanto (2024). Data for Meta Analysis - Ebook Language Learning [Dataset]. http://doi.org/10.17632/8757gxmwzx.1
    Explore at:
    Dataset updated
    Sep 16, 2024
    Authors
    Virgiawan Listanto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This spreadsheet contains data from studies on e-books and English language learning, drawn from reliable sources indexed in the Scopus database. It includes details such as sample sizes, means, and standard deviations for both control and experimental groups. The script implementing the analysis algorithm, run in RStudio, is also included.
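From group summaries like these (sample size, mean, and standard deviation per group), a meta-analysis typically computes a standardized mean difference per study. A minimal Python sketch with hypothetical values (the original analysis was run in RStudio):

```python
# Illustrative sketch: Cohen's d with a pooled standard deviation,
# the standardized mean difference commonly used in meta-analysis.
# All input values below are hypothetical.
import math

def cohens_d(n1, mean1, sd1, n2, mean2, sd2):
    """Standardized mean difference between two groups (pooled SD)."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Hypothetical study: experimental (e-book) vs. control group test scores.
d = cohens_d(30, 78.0, 10.0, 30, 72.0, 10.0)
```

Each study's effect size would then be pooled (e.g., in a random-effects model) across the studies collected in the spreadsheet.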

  17. data -- Fruiting phenology patterns, Nyungwe National Park, Rwanda

    • figshare.com
    txt
    Updated Jul 14, 2025
    Phillip Dugger; Beth A. Kaplin; Norbert J. Cordeiro; Mediatrice Bana (2025). data -- Fruiting phenology patterns, Nyungwe National Park, Rwanda [Dataset]. http://doi.org/10.6084/m9.figshare.24020898.v3
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Phillip Dugger; Beth A. Kaplin; Norbert J. Cordeiro; Mediatrice Bana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Rwanda
    Description

    Data files (.csv) used in a study of fruiting phenology patterns in Nyungwe National Park, Rwanda from 1996-2019. Datasets include climate variables (rain, irradiance, minimum and maximum temperatures, and ENSO index), fruiting phenology data, and GIS locations of study sites. Data are organized for use in statistical analyses using the R programming language.

    Instructions for use in an R Project:

    • We strongly suggest creating an R Project file in RStudio to use the scripts and data contained in this repository.
    • Data files should be stored in a folder named "data" in the same directory as the R Project file. This ensures that the R scripts for loading data are accessing the correct directory.
    • Script files should be stored in another folder in the same directory as the R Project file (suggested folder name: "scripts").

  18. Climate Hazards Data Integration and Visualization for the Climate Adaptations Solutions Accelerator through School-Community Hubs

    • search.dataone.org
    • datadryad.org
    Updated Jun 22, 2024
    Charles Curtin; Liane Chen; Hazel Vaquero; Kristina Glass (2024). Climate Hazards Data Integration and Visualization for the Climate Adaptations Solutions Accelerator through School-Community Hubs [Dataset]. http://doi.org/10.5061/dryad.1jwstqk3g
    Explore at:
    Dataset updated
    Jun 22, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Charles Curtin; Liane Chen; Hazel Vaquero; Kristina Glass
    Time period covered
    Jun 4, 2024
    Description

    Community engagement in planning is essential for effective and just climate adaptation. However, historically underserved communities are often difficult to reach through traditional means of soliciting public input. The Climate Adaptation Solutions Accelerator (CASA) through School-Community Hubs project identifies public schools as promising sites for building both community engagement and community capacity for climate adaptation. To serve in this role, schools need information about the intersecting threats climate change poses to the communities they serve. The Climate Hazard Dashboard for California Schools is a platform that maps the current and future risks associated with five climate hazards: wildfire, extreme heat, extreme precipitation, flooding, and sea level rise, for the nearly 10,000 public schools serving Kindergarten through Grade 12 students in California. Each hazard is mapped and visualized at the school level, providing an accessible way fo..., Data for extreme heat and extreme precipitation were retrieved using API requests from the caladaptr package. The data retrieved to calculate extreme heat days were historical observed daily maximum temperatures for 1961-2005 and projected daily maximum temperatures for 2006-2064. The data retrieved to calculate extreme precipitation days were historical observed daily precipitation totals for 1961-2005 and projected daily precipitation totals for 2006-2064. Data for wildfire, flooding, and sea level rise were downloaded directly from their sources and stored on a remote server for use. All data were processed in RStudio using Quarto docs. Tabular data for extreme heat and precipitation first used the retrieved historical data to calculate a threshold value for classifying an extreme event. The threshold was determined to be the 98th percentile value of observed historical data for California. For extreme heat, this is 98°F. For extreme precipitation, this is 0.73 inches.
    Then, projected dai...
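The thresholding step described above can be sketched as follows. This is an illustrative Python sketch with toy values (the original processing used R with the caladaptr package), using a simple nearest-rank percentile:

```python
# Illustrative sketch: derive an extreme-event threshold as the 98th
# percentile of historical observations, then count projected days that
# exceed it. All values below are toy data, not the dashboard's inputs.
def percentile_98(values):
    """Nearest-rank 98th percentile of a sample."""
    ordered = sorted(values)
    rank = max(0, int(round(0.98 * len(ordered))) - 1)
    return ordered[rank]

def count_extreme_days(daily_values, threshold):
    """Number of days strictly exceeding the threshold."""
    return sum(1 for v in daily_values if v > threshold)

historical = list(range(60, 110))          # toy daily max temperatures, degrees F
threshold = percentile_98(historical)      # 98th percentile of 60..109 -> 108
projected = [95, 99, 101, 109, 80]         # toy projected daily maxima
extreme = count_extreme_days(projected, threshold)
```

The same two-step pattern (fit a threshold on 1961-2005 observations, then count exceedances in the 2006-2064 projections) applies to both heat and precipitation.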


    This README.txt file was generated on 2024-05-23 by Liane Chen, Charlie Curtin, Kristina Glass, and Hazel Vaquero. It is associated with the data archival on this project through Dryad. To view the data archival and download datasets, please visit https://doi.org/10.5061/dryad.1jwstqk3g.

    Recommended citation:

    Curtin, Charles; Glass, Kristina; Chen, Liane; Vaquero, Hazel (Forthcoming 2024). Climate Hazards Data Integration and Visualization for the Climate Adaptations Solutions Accelerator through School-Community Hubs [Dataset]. Dryad. https://doi.org/10.5061/dryad.1jwstqk3g

    GENERAL INFORMATION

    1. Title of the Project: Climate Adaptation Solutions Accelerator through School Community Hubs (alias CASAschools)

    2. Author Information

    A. Principal Investigator Contact Information

    Name: Liane Chen, Charlie Curtin, Kristina Glass, and Hazel Vaque...

  19. Data from: Biogeographical variation in diurnal behaviour of Acanthaster planci versus Acanthaster cf. solaris

    • researchdata.edu.au
    Updated Jun 17, 2020
    Burn Deborah; Deborah Anne Burn (2020). Biogeographical variation in diurnal behaviour of Acanthaster planci versus Acanthaster cf. solaris [Dataset]. http://doi.org/10.25903/5E37C3142EF11
    Explore at:
    Dataset updated
    Jun 17, 2020
    Dataset provided by
    James Cook University
    Authors
    Burn Deborah; Deborah Anne Burn
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Feb 25, 2017 - Mar 29, 2017
    Description

    This data set contains Crown of Thorns Starfish (Acanthaster planci and Acanthaster cf. solaris) behavioural data collected at Lankanfushi Island in the Maldives, and at Rib Reef on the Great Barrier Reef, Australia. The data is deposited here to accompany the Open Access publication from the Related Publications link below. Here, we include information on all individual starfish counted during surveys at different times of day (including at night) at both locations. Information provided includes location, date, time and depth at which each individual was found, as well as the maximum diameter of each individual and the behaviour each individual was exhibiting. More specifically, whether the starfish was hidden or exposed, if the individual was exhibiting resting, moving or feeding behaviour, and the prey items of those feeding is noted within this data set. Also included are point intercept coral cover data for each transect at each location as well as the R script used to analyse the data within the aforementioned publication.

    The dataset consists of the following files:

    • Burnetal.R and Burnetal.txt - R script used for analysis, in R (open in R or RStudio) and plain text (.txt) formats
    • COTSmovement.csv – All data collected for individual Crown of Thorns starfish at both locations. Includes diameter and behavioural information.
    • COTS_PITRIB.csv - point intercept coral cover (Rib Reef)
    • Maldives_PIT.csv - point intercept coral cover (Maldives)

    The full methodology will be available in the Open Access publication from the Related Publications link below.

  20. Groundwater arsenic data and ASCII grids for predicting elevated arsenic in northwestern and central Minnesota using boosted regression tree methods

    • data.usgs.gov
    Updated Feb 24, 2024
    Sarah Elliott; Catherine Christenson (2024). Groundwater arsenic data and ASCII grids for predicting elevated arsenic in northwestern and central Minnesota using boosted regression tree methods [Dataset]. http://doi.org/10.5066/F77H1HH8
    Explore at:
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Sarah Elliott; Catherine Christenson
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    1980 - 2016
    Area covered
    Minnesota
    Description

    This data release contains: (1) ASCII grids of predicted probability of elevated arsenic in groundwater for the Northwest and Central Minnesota regions, (2) input arsenic and predictive variable data used in model development and calculation of predictions, and (3) ASCII files used to predict the probability of elevated arsenic across the two study regions. The probability of elevated arsenic was predicted using Boosted Regression Tree (BRT) modeling methods with the gbm package in R version 3.4.2. The response variable was the presence or absence of arsenic >10 µg/L, the U.S. Environmental Protection Agency’s maximum contaminant level for arsenic, in 3,283 wells located throughout both study regions (1,363 in the Northwest region and 1,920 in the Central). The original database used to develop the BRT model consisted of 127 predictor variables, which included well characteristics, land use, soil properties, aquifer properties, depth to water table, and predicted nitrate ...
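Deriving the binary response variable described above (presence or absence of arsenic above the 10 µg/L MCL) can be sketched as follows. This is an illustrative Python sketch with hypothetical concentrations, not the authors' R/gbm code:

```python
# Illustrative sketch: classify wells by whether measured arsenic exceeds
# the U.S. EPA maximum contaminant level (MCL) of 10 micrograms per liter,
# producing the presence/absence response used by the BRT model.
MCL_UG_L = 10.0

def elevated(arsenic_ug_l):
    """True if the concentration exceeds the 10 ug/L MCL."""
    return arsenic_ug_l > MCL_UG_L

# Hypothetical well concentrations in ug/L.
wells = [2.5, 14.0, 9.9, 33.1]
response = [int(elevated(c)) for c in wells]  # 1 = elevated, 0 = not
```

The BRT model then relates this 0/1 response to the 127 candidate predictor variables across the 3,283 wells.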

