100+ datasets found
  1. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions for how they expect the program to behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, targeted to help inform and guide the work of R users and statisticians. It provides information about the different types of statistical data analysis and methods, and the best scenarios for using each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples: from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, a congruence of statistics and computer programming for research.

  2. Replication Data for: Revisiting 'The Rise and Decline' in a Population of...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill (2023). Replication Data for: Revisiting 'The Rise and Decline' in a Population of Peer Production Projects [Dataset]. http://doi.org/10.7910/DVN/SG3LP1
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill
    Description

    This archive contains code and data for reproducing the analysis for “Replication Data for Revisiting ‘The Rise and Decline’ in a Population of Peer Production Projects”. Depending on what you hope to do with the data you probably do not want to download all of the files. Depending on your computation resources you may not be able to run all stages of the analysis. The code for all stages of the analysis, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with datasets used in the analysis of the paper, you want intermediate_data.7z or the uncompressed tab and csv files.

    The data files are created in a four-stage process. The first stage uses the program “wikiq” to parse mediawiki xml dumps and create tsv files that have edit data for each wiki. The second stage generates the all.edits.RDS file, which combines these tsvs into a dataset of edits from all the wikis. This file is expensive to generate and, at 1.5GB, is pretty big. The third stage builds smaller intermediate files that contain the analytical variables from these tsv files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and latex typeset the manuscript. A stage will only run if the outputs from the previous stages do not exist, so if the intermediate files exist they will not be regenerated and only the final analysis will run. The exception is that stage 4, fitting models and generating plots, always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001, wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards from building the manuscript using knitr, loading the datasets, running the analysis, to building the intermediate datasets.

    Building the manuscript using knitr: This requires working latex, latexmk, and knitr installations. Depending on your operating system you might install these packages in different ways. On Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar; this has everything you need to typeset the manuscript. Unpack the tar archive (on a unix system: tar xf code.tar) and navigate to code/paper_source. Install R dependencies: in R, run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")). On a unix system you should be able to run make to build the manuscript generalizable_wiki.pdf. Otherwise you should try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com.

    Loading intermediate datasets: The intermediate datasets are found in the intermediate_data.7z archive. They can be extracted on a unix system using the command 7z x intermediate_data.7z. The files are 95MB uncompressed. These are RDS (R data set) files and can be loaded in R using the readRDS function, for example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer to work with the .tab files.

    Running the analysis: Fitting the models may not work on machines with less than 32GB of RAM. If you have trouble, you may find the functions in lib-01-sample-datasets.R useful to create stratified samples of data for fitting models (a loading-and-sampling sketch follows this description); see line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives (on a unix system: tar xf code.tar && 7z x intermediate_data.7z). Install R dependencies: install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a unix system you can simply run regen.all.sh to fit the models, build the plots and create the RDS files.

    Generating datasets, building the intermediate files: The intermediate files are generated from all.edits.RDS. This process requires about 20GB of memory. Download all.edits.RDS, userroles_data.7z, selected.wikis.csv, and code.tar. Unpack code.tar and userroles_data.7z (on a unix system: tar xf code.tar && 7z x userroles_data.7z). Install the R dependencies listed above, then run 01_build_datasets.R.

    Building all.edits.RDS: The intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the int... Visit https://dataone.org/datasets/sha256%3Acfa4980c107154267d8eb6dc0753ed0fde655a73a062c0c2f5af33f237da3437 for complete metadata about this dataset.
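
    A minimal sketch, not taken from the archive, of loading one intermediate dataset and drawing a smaller random sample for fitting on a low-memory machine (the authors' own stratified-sampling helpers live in lib-01-sample-datasets.R):

      # Load one of the intermediate RDS files (assumed here to be a data frame).
      newcomers <- readRDS("newcomers.RDS")

      # Draw a random subsample so the models fit in limited RAM; the sample size
      # is illustrative only.
      set.seed(42)
      newcomers_sample <- newcomers[sample(nrow(newcomers), 10000), ]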

  3. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    + more versions
    Cite
    Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Explore at:
    bin, application/gzip, zip, text/x-python (available download formats)
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
     themselves, which take around 2TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)
    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on the level of detail (see Step 2 for more details):
    - up to 2 TB of disk space (see the Step 2 detail levels)
    - at least 16 GB of RAM (64 GB preferable)
    - a few hours to a few months of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt` . 
    - disable all APIs except GitHub (Bitbucket and Gitlab support were
     not yet implemented when this study was in progress): edit
     `scraper/init.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speed up
    the process:
    
    #### Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
    
    #### Option 2.b, difficulty level: easy
    
    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15..30 minutes.
    
    - create a folder `
  4. all-recipes-xs

    • huggingface.co
    Updated Apr 9, 2024
    + more versions
    Cite
    AWeirdDev (2024). all-recipes-xs [Dataset]. https://huggingface.co/datasets/AWeirdDev/all-recipes-xs
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 9, 2024
    Authors
    AWeirdDev
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    all-recipes-xs (500)

    All Recipes dataset (extra small).

    Load the dataset:

      from datasets import load_dataset

      dataset = load_dataset("AWeirdDev/all-recipes-xs")

    Alternatively, load with pickle from _frozen.pkl:

      import pickle
      import requests

      r = requests.get("https://huggingface.co/datasets/AWeirdDev/all-recipes-xs/resolve/main/_frozen.pkl")
      dataset = pickle.loads(r.content)

      Features

    Note: Empty values are presented as "unknown" instead of None (normally, unless… See the full description on the dataset page: https://huggingface.co/datasets/AWeirdDev/all-recipes-xs.

  5. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Oct 29, 2024
    + more versions
    Cite
    Andrew Felton; Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14009758
    Explore at:
    zip (available download formats)
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Felton; Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 10/29/2024

    This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also please note that this R project has been updated multiple times as the analysis has been updated.

    Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study, including global arrays summarizing five-year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and the number of months of data, available as both an array (.nc) and a data table (.csv). These data were produced in Python using the Python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folders primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but they have been extensively processed and filtered here. The "supporting_data" folder also contains the annual (2016-2020) MODIS land cover data used in the analysis, with separate files containing the original data (.hdf) and the final processed (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in Python.

    Code information:

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the
    `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhorse for figure/table production and
    supporting analyses. This script generates the key figures and summary statistics
    used in the study, which then get saved in the manuscript_figures folder. Note that all
    maps were produced using Python code found in the "supporting_code" folder.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

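    A minimal sketch, assuming the script layout described above, of how 01_start.R chains the other scripts; this is an illustration, not the repository's exact code:

      # 01_start.R: set the working directory, attach the tidyverse, and source
      # the helper and data-import scripts described above.
      setwd("path/to/project")           # placeholder path
      library(tidyverse)
      source("02_functions.R")           # custom functions
      source("03_import_data.R")         # builds annual_turnover_2
      source("04_figures_tables.R")      # figures and summary statistics
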
  6. Large Datasets in R - Plant Phenology & Temperature Data from NEON

    • qubeshub.org
    Updated May 10, 2018
    Cite
    Megan Jones Patterson; Lee Stanish; Natalie Robinson; Katherine Jones; Cody Flagg (2018). Large Datasets in R - Plant Phenology & Temperature Data from NEON [Dataset]. http://doi.org/10.25334/Q4DQ3F
    Explore at:
    Dataset updated
    May 10, 2018
    Dataset provided by
    QUBES
    Authors
    Megan Jones Patterson; Lee Stanish; Natalie Robinson; Katherine Jones; Cody Flagg
    Description

    This module series covers how to import, manipulate, format and plot time series data stored in .csv format in R. Originally designed to teach researchers to use NEON plant phenology and air temperature data, it has also been used in undergraduate classrooms.
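
    A minimal sketch of the kind of workflow the module teaches, with hypothetical file and column names (the real NEON files differ):

      library(ggplot2)

      pheno <- read.csv("NEON_phenology_temperature.csv")    # hypothetical file name
      pheno$date <- as.Date(pheno$date)                      # hypothetical column
      ggplot(pheno, aes(x = date, y = air_temperature)) +    # hypothetical column
        geom_line()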

  7. Dataset

    • data.amerigeoss.org
    • s.cnmilf.com
    • +1more
    pdf
    Updated Jul 27, 2019
    Cite
    United States (2019). Dataset [Dataset]. http://doi.org/10.23719/1504002
    Explore at:
    pdf (available download formats)
    Dataset updated
    Jul 27, 2019
    Dataset provided by
    United States
    License

    https://pasteur.epa.gov/license/sciencehub-license.html

    Description

    Publicly available weblinks used to develop the research project.

    This dataset is associated with the following publication: Hall, E., R. Hall, J. Aron, S. Swanson, M. Philbin, R. Schaefer, T. Jones-Lepp, D. Heggem, J. Lin, E. Wilson, and H. Kahan. An Ecological Function Approach to Managing Harmful Cyanobacteria in Three Oregon Lakes: Beyond Water Quality Advisories and Total Maximum Daily Loads (TMDLs). WATER. MDPI AG, Basel, SWITZERLAND, 11(6): 1125, (2019).

  8. shakespeare-dialogue

    • huggingface.co
    Updated Nov 17, 2025
    + more versions
    Cite
    r-three (2025). shakespeare-dialogue [Dataset]. https://huggingface.co/datasets/r-three/shakespeare-dialogue
    Explore at:
    Dataset updated
    Nov 17, 2025
    Dataset authored and provided by
    r-three
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Taken from Andrej Karpathy and repurposed for UofT's CSC 413 Deep Learning: 40,000 lines of Shakespeare from a variety of Shakespeare's plays. Featured in Andrej Karpathy's blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks': http://karpathy.github.io/2015/05/21/rnn-effectiveness/.

      Usage

    To use for e.g. character modelling:

      from datasets import load_dataset

      # Load the dataset
      dataset = load_dataset("r-three/shakespeare-dialogue")

      # Access examples
      for example…

    See the full description on the dataset page: https://huggingface.co/datasets/r-three/shakespeare-dialogue.

  9. Basic R for Data Analysis

    • kaggle.com
    zip
    Updated Dec 8, 2024
    Cite
    Kebba Ndure (2024). Basic R for Data Analysis [Dataset]. https://www.kaggle.com/datasets/kebbandure/basic-r-for-data-analysis/code
    Explore at:
    zip, 279031 bytes (available download formats)
    Dataset updated
    Dec 8, 2024
    Authors
    Kebba Ndure
    Description

    ABOUT DATASET

    This is an R Markdown notebook. It contains a step-by-step guide for working on data analysis with R. It helps you install the relevant packages and load them. It also provides a detailed summary of the "dplyr" commands that you can use to manipulate your data in the R environment.

    Anyone new to R who wishes to carry out some data analysis in R can check it out!
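
    A small illustration, not taken from the notebook, of the kind of dplyr verbs it summarizes, run on a built-in dataset:

      library(dplyr)

      mtcars %>%
        filter(cyl == 4) %>%                  # keep a subset of rows
        mutate(kpl = mpg * 0.425) %>%         # derive a new column (km per litre)
        group_by(gear) %>%
        summarise(mean_kpl = mean(kpl))       # summarise by group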

  10. Open data: Frequency mismatch negativity and visual load

    • su.figshare.com
    • researchdata.se
    • +1more
    pdf
    Updated Feb 23, 2021
    Cite
    Stefan Wiens; Erik van Berlekom; Malina Szychowska; Rasmus Eklund (2021). Open data: Frequency mismatch negativity and visual load [Dataset]. http://doi.org/10.17045/sthlmuni.7016369.v2
    Explore at:
    pdf (available download formats)
    Dataset updated
    Feb 23, 2021
    Dataset provided by
    Stockholm University
    Authors
    Stefan Wiens; Erik van Berlekom; Malina Szychowska; Rasmus Eklund
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Wiens, S., van Berlekom, E., Szychowska, M., & Eklund, R. (2019). Visual Perceptual Load Does Not Affect the Frequency Mismatch Negativity. Frontiers in Psychology, 10(1970). doi:10.3389/fpsyg.2019.01970

    We manipulated visual perceptual load (high and low load) while we recorded electroencephalography. Event-related potentials (ERPs) were computed from these data.

    OSF_*.pdf contains the preregistration at open science framework (osf): https://doi.org/10.17605/OSF.IO/EWG9X

    ERP_2019_rawdata_bdf.zip contains the raw eeg data files that were recorded with a biosemi system (www.biosemi.com). The files can be opened in matlab with the fieldtrip toolbox (https://www.mathworks.com/products/matlab.html, http://www.fieldtriptoolbox.org/).

    ERP_2019_visual_load_fieldtrip_scripts.zip contains all the matlab scripts that were used to process the ERP data with the toolbox fieldtrip (http://www.fieldtriptoolbox.org/).

    ERP_2019_fieldtrip_mat_*.zip contain the final, preprocessed individual data files. They can be opened with matlab.

    ERP_2019_visual_load_python_scripts.zip contains the python scripts for the main task. They need python (https://www.python.org/) and psychopy (http://www.psychopy.org/).

    ERP_2019_visual_load_wmc_R_scripts.zip contains the R scripts to process the working memory capacity (wmc) data (https://www.r-project.org/).

    ERP_2019_visual_load_R_scripts.zip contains the R scripts to analyze the data and the output files with figures (eg scatterplots) (https://www.r-project.org/).

  11. Datasets for manuscript: The impact of legacy nutrient loading from lake...

    • catalog.data.gov
    Updated Sep 21, 2025
    Cite
    U.S. EPA Office of Research and Development (ORD) (2025). Datasets for manuscript: The impact of legacy nutrient loading from lake sediments on cyanobacteria bloom severity [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-the-impact-of-legacy-nutrient-loading-from-lake-sediments-on-cyano
    Explore at:
    Dataset updated
    Sep 21, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The GitHub site contains the code and data to run two methods as shown in the manuscript. The first quantifies external load, internal load, and source-attributed impact on cyanoHAB severity; this method was written using the R language (version 4.2.2). The second method maps potential legacy P stores in upstream watersheds of Lake Mendota, WI (USA) and estimates the total P load per contributing area across watersheds; this method was written using the Python language (version 3.8.5) and GeoPandas (version 0.8.2).

    1. Impact_model_method. The impact model method is a series of sequential programs that take original source data, perform quality control checks, and re-format the data as new data files for input into the impact model. The Impact_model_method folder has all programs used in this method. All code is written in R Markdown format (.Rmd). Here is the order in which to run the R programs to achieve the same results as shown in the manuscript (see the sketch after this description):

    cyanoHAB_severity.Rmd - calculates cyanoHAB severity (as Chl-a and cyanobacteria density)
    TotalP_external_atmdep.Rmd - calculates external total P loads from atmospheric deposition
    TotalP_external_inflows.Rmd - calculates external total P loads from inflows (streams)
    TotalP_external_all.Rmd - calculates the sum of external total P loads (streams + atmospheric deposition)
    EpiVolume.Rmd - calculates the change in volume of the epilimnion using temperature profiles
    TotalP_water.Rmd - calculates the total P concentration in the epilimnion, the total P concentration in the hypolimnion, and the ratio of total P in the hypolimnion to the epilimnion (as an alternative indicator of internal P load)
    TotalP_internal.Rmd - calculates internal total P loads (or alternatively the ratio of totalP_hypo:totalP_epi)
    Model_inputs.Rmd - prepares the data as input for impact modeling (defines predictor and response variables)
    Impact_model.Rmd - performs the statistical modeling for impact analysis on cyanoHAB severity by source (external vs internal) and outputs the partitioned sum of squares for each predictor term

    2. Spatial_model_method. The spatial model method takes original source data, performs quality control checks, and re-formats the data as new data files for input into the spatial model. The model maps potential legacy P stores in upstream sub-watersheds and quantifies the total P load per contributing area across sub-watersheds. All programs used in this method are provided in the Spatial_model_method folder. All code is written in the Python language (.py or .ipynb). The order in which the programs should be run to achieve the same results as shown in the manuscript is as follows:

    HydroGraph_functios.py - performs the network mapping of sub-watersheds upstream of the inflows to the lake
    watershed_analysis.ipynb - calculates the P export per contributing area for each of the upstream sub-watersheds
    OHSA.py - performs the optimized hot spot analysis for determining statistically clustered stream monitoring sites with consistently high P concentrations

    This dataset is associated with the following publication: Knose, L.A., D.L. Cole, E. Martin-Hernandez, V.M. Zavala, M.A. Gonzalez, C. Vaneeckhaute, and G.J. Ruiz-Mercado. The Impact of Legacy Nutrient Loading from Lake Sediments on Cyanobacteria Bloom Severity. ACS ES&T Water. American Chemical Society, Washington, DC, USA, 5(9): 4997-5010, (2025).
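
    A minimal sketch, not part of the GitHub repository, of rendering the impact-model programs in the order listed above; it assumes the .Rmd files sit in an Impact_model_method folder:

      library(rmarkdown)

      rmd_order <- c(
        "cyanoHAB_severity.Rmd",
        "TotalP_external_atmdep.Rmd",
        "TotalP_external_inflows.Rmd",
        "TotalP_external_all.Rmd",
        "EpiVolume.Rmd",
        "TotalP_water.Rmd",
        "TotalP_internal.Rmd",
        "Model_inputs.Rmd",
        "Impact_model.Rmd"
      )

      # Render each program in sequence; later programs rely on outputs of earlier ones.
      for (f in rmd_order) render(file.path("Impact_model_method", f))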

  12. Dawnn benchmarking dataset: Mouse embryo cells processing and label...

    • rdr.ucl.ac.uk
    txt
    Updated May 4, 2023
    + more versions
    Cite
    George Hall; Sergi Castellano Hereza (2023). Dawnn benchmarking dataset: Mouse embryo cells processing and label simulation [Dataset]. http://doi.org/10.5522/04/22614004.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    May 4, 2023
    Dataset provided by
    University College London
    Authors
    George Hall; Sergi Castellano Hereza
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets and is available as an R package. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on single-cell RNAseq of mouse embryo cells.

    FILES:

    Input data: Dataset from "A single-cell molecular map of mouse gastrulation and early organogenesis", Nature 566, pp490–495 (2019). The input data is loaded from the MouseGastrulationData R package. We upload here the RDS file generated by loading the dataset in process_mouse_cells.R, in case the R package becomes unavailable.

    MouseGastrulationData_loaded_dataset.RDS - Dataset loaded from the MouseGastrulationData R package in process_mouse_cells.R (in the call to the EmbryoAtlasData function).

    Data processing code

    process_mouse_cells.R - Generates the benchmarking dataset from the input data (loads the input data; runs the standard single-cell RNAseq pipeline). Follows Dann et al. Resulting dataset saved as mouse_gastrulation_data_regen.RDS.
    simulate_mouse_pc1_Rscript.R - R code to simulate P(Condition_1)s for benchmarking.
    simulate_mouse_pc1_bash.sh - Bash script to execute simulate_mouse_pc1_Rscript.R. Outputs stored in benchmark_dataset_mouse_pc1s_regen.csv.
    simulate_mouse_labels_Rscript.R - R code to simulate labels for benchmarking.
    simulate_mouse_labels_bash.sh - Bash script to execute simulate_mouse_labels_Rscript.R. Outputs stored in benchmark_dataset_mouse.csv.

    Resulting datasets

    mouse_gastrulation_data_regen.RDS - Seurat dataset generated by process_mouse_cells.R.
    benchmark_dataset_mouse.csv - Cell labels generated by simulate_mouse_labels_bash.sh.
    benchmark_dataset_mouse_pc1s_regen.csv - P(Condition_1)s generated by simulate_mouse_pc1_bash.sh.
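
    A minimal sketch, not one of the collection's own scripts, of loading the resulting benchmark files in R:

      mouse  <- readRDS("mouse_gastrulation_data_regen.RDS")        # Seurat dataset
      labels <- read.csv("benchmark_dataset_mouse.csv")             # simulated cell labels
      pc1s   <- read.csv("benchmark_dataset_mouse_pc1s_regen.csv")  # simulated P(Condition_1)s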

  13. Posted and R-Posted Bridges in New York State

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated May 25, 2022
    Cite
    data.ny.gov (2022). Posted and R-Posted Bridges in New York State [Dataset]. https://catalog.data.gov/dataset/posted-and-r-posted-bridges-in-new-york-state
    Explore at:
    Dataset updated
    May 25, 2022
    Dataset provided by
    data.ny.gov
    Area covered
    New York
    Description

    This data set features a hyperlink to the New York State Department of Transportation’s Posted Bridges web page, which includes a link to the Posted Bridges interactive map. The interface provides up-to-date information for locating Load-Posted, R-Posted and Other-Posted bridges throughout New York. This site has been developed as a reference tool for operators of Commercial Vehicles to identify and locate structures in New York State where access and use are limited based on the vehicle's weight. Legal weights are established in Section 385 of the Vehicle and Traffic Law. All trucks operating over legal weight pursuant to a Divisible Load Overweight Permit are prohibited from crossing all R-Posted Bridges.

  14. non-linear-classification

    • huggingface.co
    Updated Jul 23, 2023
    Cite
    NLP Guild (2023). non-linear-classification [Dataset]. https://huggingface.co/datasets/nlp-guild/non-linear-classification
    Explore at:
    Dataset updated
    Jul 23, 2023
    Dataset authored and provided by
    NLP Guild
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Please use the following code to load the data:

      # start data loading
      !git lfs install
      !git clone https://huggingface.co/datasets/nlp-guild/non-linear-classification

      import numpy as np

      def load_dataset(path='dataset.npy'):
          """
          :return:
              f_and_xs: numpy array of size [sample_number, channels, sample_length]
              label_0, label_1, label_2: one-hot encodes of size [sample_number, number_bins]
          """
          r = np.load(path, allow_pickle=True).item()
          f_and_xs = r['f_and_xs']
          label_0 = r['l_0']…

    See the full description on the dataset page: https://huggingface.co/datasets/nlp-guild/non-linear-classification.
    
  15. bkdckedsvfukv

    • huggingface.co
    Updated Nov 5, 2025
    Cite
    R Karthiyayani Menon (2025). bkdckedsvfukv [Dataset]. https://huggingface.co/datasets/karthiyayani/bkdckedsvfukv
    Explore at:
    Dataset updated
    Nov 5, 2025
    Authors
    R Karthiyayani Menon
    Description

    Bkdckedsvfukv

      Dataset Description
    

    Dataset for classification tasks

      Dataset Statistics

    Total Images: 2
    Total Categories: 1
    Total Labels: 0
    Dataset Size: 1 MB
    Train Images: 1 (50.0%)
    Test Images: 1 (50.0%)

      Usage Examples

      Loading the Dataset

      from datasets import load_dataset

      # Load the dataset
      dataset = load_dataset("karthiyayani/bkdckedsvfukv")

      # Access the training split
      train_data = dataset['train']

      # Iterate through examples
      for…

    See the full description on the dataset page: https://huggingface.co/datasets/karthiyayani/bkdckedsvfukv.

  16. Data from: A dataset to model Levantine landcover and land-use change...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Dec 16, 2023
    Cite
    Michael Kempf; Michael Kempf (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.10396148
    Explore at:
    zip (available download formats)
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Kempf; Michael Kempf
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 16, 2023
    Area covered
    Levant
    Description

    Overview

    This dataset is the repository for the following paper submitted to Data in Brief:

    Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

    The Data in Brief article contains the supplement information and is the related data paper to:

    Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

    Description/abstract

    The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which strained neighbouring countries like Jordan due to the influx of Syrian refugees and increases population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

    Folder structure

    The main folder after download contains all data; the following subfolders are stored as zipped files:

    “code” stores the above-described 9 code chunks to read, extract, process, analyse, and visualize the data.

    “MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

    “mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

    “yield_productivity” contains .csv files of yield information for all countries listed above.

    “population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

    “GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

    “built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.

    Code structure

    1_MODIS_NDVI_hdf_file_extraction.R


    This is the first code chunk that refers to the extraction of MODIS data from .hdf file format. The following packages must be installed and the raw data must be downloaded using a simple mass downloader, e.g., from google chrome. Packages: terra. Download MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 09th of October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif-file with the indication “NDVI”. Because the study area is quite large, we have to load three different (spatially) time series and merge them later. Note that the time series are temporally consistent.


    2_MERGE_MODIS_tiles.R


    In this code, we load and merge the three different stacks to produce large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").


    3_CROP_MODIS_merged_tiles.R


    Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif. We now produced single cropped NDVI time series data from MODIS.
    The repository provides the already clipped and merged NDVI datasets.


    4_TREND_analysis_NDVI.R


    Now, we want to perform trend analysis from the derived data. The data we load are tricky, as they contain a 16-day return period across a year for a period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and characterize all values with a high confidence level (0.05). Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3.
    To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.


    5_BUILT_UP_change_raster.R


    Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 03. March 2023, 100 m resolution, global coverage). Here, one can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. Here, I summed up different rasters to characterize the built-up change in continuous values between 1975 and 2022.


    6_POPULATION_numbers_plot.R


    For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.


    7_YIELD_plot.R


    In this section, we are using the country productivity data from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single-country yield datasets is plotted in a ggplot and combined using the patchwork package in R.


    8_GLDAS_read_extract_trend


    The last code provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data come in .nc file format and various variables can be extracted using the [“^a variable name”] command from the spatraster collection. Each time you run the code, this variable name must be adjusted to meet the requirements for the variables (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 09th of October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names() on the spatraster collection.
    Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
    From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for the growing seasons, e.g. March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year).
    From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and 95 % confidence level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe due to the availability of the GLDAS variables.
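
    A minimal sketch, not the repository code, of the cropping step in 3_CROP_MODIS_merged_tiles.R, using the terra package and the provided mask (file paths are illustrative):

      library(terra)

      # Load the merged NDVI tiles and the study-area mask shapefile.
      ndvi   <- rast(list.files("MODIS_merged", pattern = "\\.tif$", full.names = TRUE))
      levant <- vect("mask/MERGED_LEVANT.shp")

      # Crop and mask the stack to the study area, then save it. The repository
      # writes one file per time step instead of a single stack.
      ndvi_clip <- mask(crop(ndvi, levant), levant)
      writeRaster(ndvi_clip, "NDVI_merged_clip.tif", overwrite = TRUE)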

  17. Load profile data of 50 industrial plants in Germany for one year

    • zenodo.org
    csv
    Updated Jun 18, 2020
    Cite
    Fritz Braeuer; Fritz Braeuer (2020). Load profile data of 50 industrial plants in Germany for one year [Dataset]. http://doi.org/10.5281/zenodo.3899018
    Explore at:
    csv (available download formats)
    Dataset updated
    Jun 18, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fritz Braeuer; Fritz Braeuer
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Germany
    Description

    This dataset holds the electric load profiles of 50 small and mid-size enterprises in Germany. The load profiles are in 15-minute time resolution for one year. The load is shown in kW as an average over 15 minutes.

    The dataset is divided into two:

    • LoadProfile_20IPs_2016 shows load profiles of 20 industrial plants (IP) for the year 2016.
    • LoadProfile_30IPs_2017 shows load profiles of 30 industrial plants (IP) for the year 2017.

    The IPs from the dataset for 2016 do not reappear in the dataset for 2017.
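
    A minimal sketch, with hypothetical file and column layout (check the actual CSV structure, separator, and headers after download), of turning the 15-minute average power values into annual energy in R:

      # Assumes one timestamp column followed by one column per industrial plant,
      # each value being average power in kW over a 15-minute interval.
      lp <- read.csv("LoadProfile_20IPs_2016.csv")

      # Energy per interval is kW * 0.25 h; summing gives annual energy in kWh.
      annual_energy_kwh <- colSums(lp[, -1], na.rm = TRUE) * 0.25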

    The dataset LoadProfile_20IPs_2016 is evaluated in the following publication:

    • Covic, N., Braeuer, F., McKenna, R., Pandzic, H., Optimizing Industrial Facilities’ Active Participation
      in Electricity Markets under Uncertainty, 2020.

    Both datasets together are evaluated in multiple publications:

    • Braeuer, F., Finck, R., McKenna, R., Comparing empirical and model-based approaches for calculating dynamic grid emission factors: An application to CO2-minimizing storage dispatch in Germany, Journal of Cleaner Production, Volume 266, 2020, 121588, ISSN 0959-6526, https://doi.org/10.1016/j.jclepro.2020.121588.
    • Braeuer, F., Rominger, J., McKenna, R.,Fichtner, W., Battery storage systems: An economic model-based analysis of parallel revenue streams and general implications for industry, Applied Energy, Volume 239, 2019, Pages 1424-1440, ISSN 0306-2619, https://doi.org/10.1016/j.apenergy.2019.01.050.

    Enjoy.

  18. Helios-R-6M

    • huggingface.co
    Updated Jul 23, 2025
    Cite
    Prithiv Sakthi (2025). Helios-R-6M [Dataset]. https://huggingface.co/datasets/prithivMLmods/Helios-R-6M
    Explore at:
    Dataset updated
    Jul 23, 2025
    Authors
    Prithiv Sakthi
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Helios-R-6M

    Helios-R-6M is a high-quality, compact reasoning dataset designed to strengthen multi-step problem solving across mathematics, computer science, and scientific inquiry. While the dataset covers a range of disciplines, math constitutes the largest share of examples and drives the reasoning complexity.

      Quick Start with Hugging Face Datasets 🤗

      pip install -U datasets

      from datasets import load_dataset

      dataset = load_dataset("prithivMLmods/Helios-R-6M"…

    See the full description on the dataset page: https://huggingface.co/datasets/prithivMLmods/Helios-R-6M.

  19. Datasets for assessment of nutrient load estimation approaches for small...

    • data.usgs.gov
    • datasets.ai
    • +1more
    Updated Aug 7, 2024
    + more versions
    Cite
    Stephen Harden; Jessica Diaz; William Hamilton (2024). Datasets for assessment of nutrient load estimation approaches for small urban streams in Durham, North Carolina, 2009-2020 [Dataset]. http://doi.org/10.5066/P9F0Q501
    Explore at:
    Dataset updated
    Aug 7, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Stephen Harden; Jessica Diaz; William Hamilton
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jan 1, 2009 - Dec 31, 2020
    Area covered
    Durham, North Carolina
    Description

    In cooperation with the City of Durham Public Works Department Stormwater Division, the U.S. Geological Survey (USGS) conducted a study to evaluate whether alternate monitoring strategies that incorporated samples collected across an increased range of streamflows would improve nutrient load estimates for Ellerbe and Sandy Creeks, two small, highly urbanized streams in the City of Durham, North Carolina. This data release provides the associated datasets described in the Scientific Investigations Report, "Assessment of Nutrient Load Estimation Approaches for Small Urban Streams in Durham, North Carolina". Water-quality and streamflow data collected between January 2009 and December 2020 were used to develop instream nutrient-load models using the U.S. Geological Survey R-LOADEST program (Runkel and others, 2004; Runkel, 2013; Lorenz and others, 2017). The datasets contain water-quality data, streamflow data, input files for model calibration and prediction, and output files for mo ...

  20. Artificial single-cell datasets with cell clusters used for...

    • figshare.com
    application/x-gzip
    Updated Jun 11, 2023
    + more versions
    Cite
    Alexis Vandenbon (2023). Artificial single-cell datasets with cell clusters used for singleCellHaystack manuscript [Dataset]. http://doi.org/10.6084/m9.figshare.12319787.v1
    Explore at:
    application/x-gzip (available download formats)
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Alexis Vandenbon
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An archive containing 100 artificial single-cell datasets. The data of each dataset is an R data file (.rda). The file names have the following format: splatter_<thousands_of_cells>kCells_groups_set_<set_id>.rda. For example, "splatter_1kCells_groups_set_9.rda" represents the 9th set, containing 1000 cells made using the "groups" option of splatSimulate. You can import the data into R using the load() function. Each dataset includes the following R objects: 1) counts: the number of reads or UMIs for each gene in each cell; 2) gene.data: a summary of the data generated by Splatter; 3) params: the parameters used to generate the dataset.
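
    A minimal sketch, not part of the archive, of loading one simulated dataset and inspecting the three objects it provides:

      # load() restores the objects counts, gene.data, and params into the workspace.
      load("splatter_1kCells_groups_set_9.rda")

      dim(counts)       # reads/UMIs per gene per cell
      head(gene.data)   # Splatter's per-gene summary
      params            # parameters used by splatSimulate to generate the dataset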
