40 datasets found
  1. Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    figshare
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software environment and object-oriented programming language, with a development environment (IDE) called RStudio, for computing statistics and producing graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike much existing statistical software, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible, and it allows users to define their own (customized) functions for how the program should behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, aimed at informing and guiding the work of R users and statisticians. It describes the different types of statistical data analysis and methods, and the best scenarios for using each in R. It gives a hands-on, step-by-step practical guide to identifying and conducting the different parametric and non-parametric procedures, including the conditions or assumptions necessary for performing the various statistical methods or tests and how to interpret their results. The book also covers the different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book.

    It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples: from importing and storing datasets in R as objects, coding and calling the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to reasoning about, interpreting, and storing the results for future use, and producing graphical visualizations and representations. Thus, a congruence of statistics and computer programming for research.
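    The efficiency point about vectors (whole-array operations instead of element-by-element loops) carries over to any vectorized language. The book's own examples are in R; a minimal Python/NumPy sketch of the same idea, for illustration only:

    ```python
    import numpy as np

    values = np.arange(1000, dtype=np.float64)

    def loop_sum_of_squares(xs):
        """Element-by-element loop: one interpreted operation per value."""
        total = 0.0
        for x in xs:
            total += x * x
        return total

    # Vectorized equivalent: a single array expression, evaluated in compiled
    # code - the efficiency gain the book attributes to working with vectors.
    vectorized = float(np.sum(values * values))

    print(loop_sum_of_squares(values) == vectorized)  # True
    ```

    In R the same contrast is between a `for` loop over the elements and the single vectorized expression `sum(values^2)`.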

  2. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    + more versions
    Cite
    Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Explore at:
    bin, application/gzip, zip, text/x-python (available download formats)
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marat Valiev; Bogdan Vasilescu; James Herbsleb
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34 GB unpacked. This dataset still doesn't include PyPI packages
     themselves, which take around 2 TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semi-structured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)

    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on the level of detail (see Step 2 for more details):
    - up to 2 TB of disk space
    - at least 16 GB of RAM (64 GB preferable)
    - a few hours to a few months of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt`
    - disable all APIs except GitHub (Bitbucket and GitLab support were
     not yet implemented when this study was in progress): edit
     `scraper/__init__.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speed up
    the process:
    
    #### Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
    
    #### Option 2.b, difficulty level: easy
    
    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15-30 minutes.
    
    - create a folder `
  3. J & r designs studio inc USA Import & Buyer Data

    • seair.co.in
    Updated Oct 4, 2018
    + more versions
    Cite
    Seair Exim (2018). J & r designs studio inc USA Import & Buyer Data [Dataset]. https://www.seair.co.in
    Explore at:
    .bin, .xml, .csv, .xls (available download formats)
    Dataset updated
    Oct 4, 2018
    Dataset provided by
    Seair Info Solutions
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  4. ShinyFMBN, a Shiny app to access FoodMicrobionet

    • data.mendeley.com
    Updated Sep 2, 2022
    Cite
    Eugenio Parente (2022). ShinyFMBN, a Shiny app to access FoodMicrobionet [Dataset]. http://doi.org/10.17632/8fwwjpm79y.7
    Explore at:
    Dataset updated
    Sep 2, 2022
    Authors
    Eugenio Parente
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This data set contains the ShinyFMBN app and the FoodMicrobionet database, which holds metataxonomic data for bacterial communities of foods and food environments. Learn more at https://www.sciencedirect.com/science/article/pii/S0168160522001684. The ShinyFMBN app allows you to access FoodMicrobionet 4.2, a repository of data on food microbiome studies. To run the app you need to install R and RStudio. Data are available in both R (.rds) and .xlsx format (see below).

    This compressed folder contains:

    a. folder R_lists: two .rds files containing all data in FoodMicrobionet 4.1.2. FMBN.rds is in a format usable with ShinyFMBN 2.4 (see below), while FMBN_plus.rds contains all tables and fields and is best accessed using custom R scripts (see https://github.com/ep142/ for examples).
    b. folder xlsx_files: all FoodMicrobionet tables in MS Excel format. These files may be useful because the locale of a given system may affect how fields containing accented letters are handled when importing text files.
    c. folder shiny_FMBN_2_4_3: the app folder, the runShinyFMBN_2_4_3.R script (an R script to install all needed packages and run the app), and the app manual in .html format.
    d. FMBNtablespecs_4_2.html: describes the table specifications.

  5. Tidewater goby and estuarine fish records from seining, qPCR and metabarcoding data for Southern California Estuaries in 2023

    • data.usgs.gov
    • data.niaid.nih.gov
    • +4more
    Updated Jul 29, 2024
    + more versions
    Cite
    K.D. Lafferty (2024). Tidewater goby and estuarine fish records from seining, qPCR and metabarcoding data for Southern California Estuaries in 2023 [Dataset]. http://doi.org/10.25349/D9P60T
    Explore at:
    Dataset updated
    Jul 29, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    K.D. Lafferty
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Aug 16, 2022
    Area covered
    Southern California, California
    Description

    This data archive includes R code and data for reproducing the analyses and figures in Lafferty, Metabarcoding is (usually) more cost effective than seining or qPCR for detecting tidewater gobies and other estuarine fishes.

    To view the supplementary tables, open the Fig&TableSuppl.docx file. This file also includes the manuscript figures and tables and some explanatory text about how to generate them. To reproduce the figures, open Fig&TableCode.Rmd in RStudio and make sure the needed CSV files included in the Dryad repository are in the working directory. The data files include more information than used in the analyses and can be used for other purposes. The code is not software, nor is it intended as an R package, but it is annotated so others can understand and manipulate it. For each CSV file there is an associated metadata file that defines entries and columns and an information file that contains an abstract and ownership information. One of the data file ...

  6. Data from: Compositional Data Analysis (CoDA) of Clinopyroxene from Abyssal Peridotites

    • data.niaid.nih.gov
    Updated Jul 16, 2024
    Cite
    Szilas, K. (2024). Compositional Data Analysis (CoDA) of Clinopyroxene from Abyssal Peridotites [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6791965
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Morishita, T.
    Szilas, K.
    Nishio, I.
    Itano, K.
    Tamura, A.
    Waterton, P.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional supporting information includes data, R script, and QGIS file supporting the main text:

    CSV (Data Set)

    residual_abyssal_peridotites.csv: Compilations of residual abyssal peridotites (n = 1162) and depleted MORB-mantle (n = 1)

    residual_abyssal_peridotites_coda_results.csv: Filtered data and results of PCA and k-means clustering (n = 267)

    model_cpx.csv: Clinopyroxene compositions obtained by open-system melting model

    test.csv: CSV file for testing new data

      R
    

    abyssal_cpx_pca.Rproj

    coda.R: R script implemented in this study

    test_your_data.R: R script to compare new data to abyssal and modeled clinopyroxenes

    QGIS

    residual_abyssal_peridotites.qgz: QGIS using residual_abyssal_peridotites.csv and residual_abyssal_peridotites_coda_results.csv for Figure 1 and Figure S7

    color_etopo1_ice_low_modified.tiff: ETOPO1 is a 1 arc-minute global relief model of Earth's surface that integrates land topography and ocean bathymetry from NOAA

    Instructions

    We prepared an R script to compare new (your) clinopyroxene data with clinopyroxene from abyssal peridotites. New data will be plotted using the principal components derived from the natural clinopyroxene database presented in this paper.

    The procedure is as follows:

    1. Add new data below the second row in test.csv.
    2. Do not change the file name.
    3. Do not change the first row.
    4. Add clinopyroxene data (10 elements) and its label under the second row (the label can be a sample name, lithology, locality, etc.).
    5. Open abyssal_cpx_pca.Rproj in RStudio (double click), then open test_your_data.R (double click).
    6. Run test_your_data.R: select all (cmd+A / ctrl+A), then press Run (cmd+enter / ctrl+enter).

    Result files:

    abyssalcpx_vs_test.csv: PC1 and PC2 values using abyssal clinopyroxene PC coordinates
    plot1.pdf: abyssal clinopyroxene (cluster) vs. test data plot
    plot2.pdf: modeled clinopyroxene vs. test data plot
    spider_cl1.pdf: PM-normalized trace element patterns of cluster 1 from abyssal peridotites
    spider_cl2.pdf: PM-normalized trace element patterns of cluster 2 from abyssal peridotites
    spider_cl3.pdf: PM-normalized trace element patterns of cluster 3 from abyssal peridotites
    spider_cl4.pdf: PM-normalized trace element patterns of cluster 4 from abyssal peridotites
    spider_test.pdf: PM-normalized trace element patterns of the new (your) data
    plot3.pdf: discrimination diagram for clinopyroxene trace element compositions (PM-normalized Sr/Nd vs. Ce/Yb of clinopyroxenes from abyssal peridotites vs. test data)
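    The test_your_data.R workflow plots new analyses in the coordinates of principal components precomputed from the abyssal clinopyroxene database. A minimal Python/NumPy sketch of that projection step, with made-up means and loadings standing in for the values the repository's R scripts (coda.R, test_your_data.R) actually load:

    ```python
    import numpy as np

    # Placeholder values: in the real workflow the means and PC loadings come
    # from the PCA fitted to the abyssal clinopyroxene database in R.
    feature_means = np.array([50.0, 5.0, 2.0])
    components = np.array([[0.8, 0.5, 0.3],     # PC1 loadings (made up)
                           [-0.4, 0.7, -0.6]])  # PC2 loadings (made up)

    def project(new_data):
        """Center new analyses with the stored means, then project onto PC1/PC2."""
        centered = np.asarray(new_data, dtype=float) - feature_means
        return centered @ components.T

    # One hypothetical test analysis (three elements instead of the real ten).
    scores = project([[52.0, 4.5, 2.5]])
    print(scores.shape)  # (1, 2): one row of PC1/PC2 scores
    ```

    This is why steps 2-3 above insist on keeping test.csv's name and header row intact: the script must match your columns to the element order the stored loadings expect.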
  7. ANN dual-fluid PV/T data and code in R programming - Architecture of the Artificial Neural Network

    • data.mendeley.com
    Updated Mar 8, 2021
    Cite
    Hasila Jarimi (2021). ANN dual-fluid PV/T data and code in R programming - Architecture of the Artificial Neural Network [Dataset]. http://doi.org/10.17632/gxxszgy85t.1
    Explore at:
    Dataset updated
    Mar 8, 2021
    Authors
    Hasila Jarimi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides the artificial neural network architecture for a dual-fluid photovoltaic thermal (PV/T) collector which was experimentally tested in the outdoor environment of Malaysia. The system was set up and tested in three modes: in mode (i) air flows through the cooling channels, in mode (ii) water flows through the cooling channels, and in mode (iii) both air and water flow together.

    To create this dataset, the following steps were carried out:

    1. Select input variables: 5 inputs were selected: ambient temperature, wind speed, solar irradiance, inlet air temperature, and inlet water temperature.
    2. Select algorithm: the backpropagation neural network (BPNN) was used for training.
    3. Select output variables: 6 outputs were selected: PV surface temperature, PV temperature, back-plate temperature, outlet air temperature, outlet water temperature, and electrical efficiency.

    Step 1: Import the data.
    Step 2: Normalize the data.
    Step 3: Split the dataset into training and testing data.
    Step 4: Create the NN model in RStudio.

    The 'neuralnet' package in the R programming language was used. The R code is provided in the attached file.
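    Steps 2 and 3 above (normalization and the train/test split) are generic preprocessing. A minimal Python sketch of both, as an illustration only; the dataset's own code uses R's neuralnet, and the numbers below are made-up readings, not the PV/T measurements:

    ```python
    import random

    def min_max_normalize(rows):
        """Scale every column to [0, 1], a common step before neural-network training."""
        cols = list(zip(*rows))
        lo = [min(c) for c in cols]
        hi = [max(c) for c in cols]
        return [[(v - l) / (h - l) if h != l else 0.0
                 for v, l, h in zip(row, lo, hi)] for row in rows]

    def train_test_split(rows, test_fraction=0.3, seed=42):
        """Shuffle reproducibly, then split into training and testing subsets."""
        shuffled = rows[:]
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * (1 - test_fraction))
        return shuffled[:cut], shuffled[cut:]

    # Made-up rows: ambient temperature, wind speed, solar irradiance.
    data = [[25.0, 1.2, 800.0], [30.0, 0.8, 950.0],
            [28.0, 2.0, 600.0], [26.0, 1.5, 700.0]]
    train, test = train_test_split(min_max_normalize(data))
    print(len(train), len(test))  # 2 2
    ```

    Min-max scaling matters here because the inputs live on very different ranges (wind speed in m/s vs. irradiance in W/m²), and unscaled inputs slow or destabilize backpropagation.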

  8. Research data supporting "Considerations for Implementing Electronic Laboratory Notebooks in an Academic Research Environment"

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jul 19, 2024
    + more versions
    Cite
    Stevens, MM (2024). Research data supporting "Considerations for Implementing Electronic Laboratory Notebooks in an Academic Research Environment" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5012728
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Nogiwa-Valdez, AA
    Higgins, SG
    Stevens, MM
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Research data supporting the publication:

    Higgins SG, Nogiwa-Valdez AA, Stevens MM, Considerations for Implementing Electronic Laboratory Notebooks in an Academic Research Environment, Nature Protocols, 2021.

    This repository contains the raw survey data of 172 current and historic electronic laboratory notebook (ELN) software packages.

    Main files:

    "ELN_Review_Higgins_2021_Survey.csv" = raw survey data in 'tidy' data format

    "ELN_Review_Higgins_2021.Rmd" = an R Markdown file (R Notebook) that takes the survey data as input and produces summary statistics and plots. This file was written using RStudio as the IDE.

    Derived files, generated from those above:

    "ELN_Review_Higgins_2021.nb.html" = a self-contained HTML file that is automatically generated by RStudio, based on the markdown file. This can be opened in any web browser to allow manual inspection of the code and comments without the need for specialist software. Embedded within this file is also the original markdown script (i.e. a copy of the code in "ELN_Review_Higgins_2021.Rmd").

    "ELN_Review_Higgins_2021_Lifetimes_Interactive_Figure1.html" = an HTML file generated by the script above via the plotly package. It contains an interactive version of the ELN survey data, allowing the user to hover over the timeline and explore the data.

    "ELN_Review_Higgins_2021_Timeline.pdf" = static version of ELN timeline, used to generate figure in main manuscript.

    "ELN_Review_Higgins_2021_Releases-Per-Year.pdf" = static version of number of new ELNs per year, used to generate figure in main manuscript.

    This survey was generated from a mixture of primary and secondary sources (see references for secondary sources).

  9. Tidewater goby and estuarine fish records from seining, qPCR and metabarcoding data for Southern California Estuaries in 2023

    • gimi9.com
    Updated Feb 2, 2024
    + more versions
    Cite
    (2024). Tidewater goby and estuarine fish records from seining, qPCR and metabarcoding data for Southern California Estuaries in 2023 | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_tidewater-goby-and-estuarine-fish-records-from-seining-qpcr-and-metabarcoding-data-for-sou/
    Explore at:
    Dataset updated
    Feb 2, 2024
    Description

    This data archive includes R code and data for reproducing the analyses and figures in Lafferty, Metabarcoding is (usually) more cost effective than seining or qPCR for detecting tidewater gobies and other estuarine fishes. To view the supplementary tables, open the Fig&TableSuppl.docx file. This file also includes the manuscript figures and tables and some explanatory text about how to generate them. To reproduce the figures, open Fig&TableCode.Rmd in RStudio and make sure the needed CSV files included in the Dryad repository are in the working directory. The data files include more information than used in the analyses and can be used for other purposes. The code is not software, nor is it intended as an R package, but it is annotated so others can understand and manipulate it. For each CSV file there is an associated metadata file that defines entries and columns and an information file that contains an abstract and ownership information. One of the data files required to reproduce the analyses (Schmelzle&Kinziger_occupancy.csv) was created from previously published data and was not produced by the author. Please cite it as: Schmelzle, Molly C., Kinziger, Andrew P. 2015. Data from: Using occupancy modeling to compare environmental DNA to traditional field methods for regional-scale monitoring of an endangered aquatic species. Dryad. 6rs23

  10. Hemoglobin input dataset for analysis in R Studio.

    • plos.figshare.com
    xlsx
    Updated Jul 28, 2025
    Cite
    Anastasia Meckler; Sebastian Künert; Leonardo Poggi; Julia Jeske; Lukas Schipper; Thanusiah Selvamoorthy; Felix Nensa; Bernadette Hosters; Michael Fabian Berger; Ramsi Siaj; Mario Vincent Roser (2025). Hemoglobin input dataset for analysis in R Studio. [Dataset]. http://doi.org/10.1371/journal.pone.0325072.s004
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Jul 28, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Anastasia Meckler; Sebastian Künert; Leonardo Poggi; Julia Jeske; Lukas Schipper; Thanusiah Selvamoorthy; Felix Nensa; Bernadette Hosters; Michael Fabian Berger; Ramsi Siaj; Mario Vincent Roser
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Column A: Binary classification of data based on laboratory values from Column B (cut-off value = 0: 0 = 0, > 0 = 1); Column B: Laboratory values; Column C: Randomized patient numbers; Columns E–H: Wavelengths with corresponding spectral data. (XLSX)
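    Column A above is derived from Column B by a simple threshold: a laboratory value of 0 maps to class 0, anything greater than 0 to class 1. A minimal Python sketch of the rule, with made-up values rather than the actual XLSX contents:

    ```python
    # Made-up laboratory values standing in for Column B of the XLSX.
    lab_values = [0.0, 0.3, 0.0, 1.7, 0.05]

    # Column A rule from the dataset description: cut-off value = 0
    # (0 -> class 0, > 0 -> class 1).
    binary_class = [1 if v > 0 else 0 for v in lab_values]
    print(binary_class)  # [0, 1, 0, 1, 1]
    ```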

  11. Replication Data for: Responsiveness of decision-makers to stakeholder preferences in the European Union legislative process

    • dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Lei, Yuxuan (2023). Replication Data for: Responsiveness of decision-makers to stakeholder preferences in the European Union legislative process [Dataset]. http://doi.org/10.7910/DVN/RH5H3H
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Lei, Yuxuan
    Area covered
    European Union
    Description

    This dataset contains original quantitative datafiles, analysis data, a codebook, R scripts, syntax for replication, the original output from RStudio, and figures from a statistical program. The analyses can be found in Chapter 5 of my PhD dissertation, i.e., ‘Political Factors Affecting the EU Legislative Decision-Making Speed’. The data supporting the findings of this study are accessible and replicable. Restrictions apply to the availability of these data, which were used under license for this study. The datafiles include:

    R script: Chapter 5 script.R
    Syntax: Syntax for replication 5.0.docx
    Original output from RStudio: The original output 5.0.pdf
    Codebook: Codebook 5.0.txt
    Analysis data: data5.0.xlsx
    Dataset: Original quantitative data for Chapter 5.xlsx
    Dataset: Codebook of policy responsiveness.pdf
    Figures: Chapter 5 Figures.zip

    Data analysis software: RStudio; R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"; Copyright (C) 2021 The R Foundation for Statistical Computing; Platform: x86_64-apple-darwin17.0 (64-bit)

  12. Thinking deeply about quantitative analysis: Building a Biologist's Toolkit

    • qubeshub.org
    Updated Aug 26, 2021
    Cite
    Sarah Bray; Paul Duffin; James Wagner (2021). Thinking deeply about quantitative analysis: Building a Biologist's Toolkit [Dataset]. http://doi.org/10.24918/cs.2016.4
    Explore at:
    Dataset updated
    Aug 26, 2021
    Dataset provided by
    QUBES
    Authors
    Sarah Bray; Paul Duffin; James Wagner
    Description

    Vision and Change in Undergraduate Biology Education encouraged faculty to focus on core concepts and competencies in the undergraduate curriculum. We created a sophomore-level course, Biologists' Toolkit, to focus on the competencies of quantitative reasoning and scientific communication. We introduce students to the statistical analysis of data using the open-source statistical language and environment R, with RStudio, in the first two-thirds of the course. During this time the students learn to write basic commands to input data and conduct common statistical analyses. The students also learn to graphically represent their data using R. In a final project, we assign students unique data sets that require them to develop a hypothesis that can be explored with the data, analyze and graph the data, search literature related to their data set, and write a report that emulates a scientific paper. The final report includes publication-quality graphs and proper reporting of data and statistical results. At the end of the course students reported greater confidence in their ability to read and make graphs, analyze data, and develop hypotheses. Although programming in R has a steep learning curve, we found that students who learned programming in R developed a robust strategy for data analyses, and they retained and successfully applied those skills in other courses during their junior and senior years.

  13. Replication Data for: Influence of different stakeholders on the duration of legislative decision-making in the European Union

    • search.dataone.org
    Updated Nov 8, 2023
    + more versions
    Cite
    Lei, Yuxuan (2023). Replication Data for:Influence of different stakeholders on the duration of legislative decision-making in the European Union [Dataset]. http://doi.org/10.7910/DVN/VGAQIO
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Lei, Yuxuan
    Area covered
    European Union
    Description

    This dataset contains original quantitative datafiles, analysis data, a codebook, R scripts, syntax for replication, the original output from RStudio, and figures from a statistical program. The analyses can be found in Chapter 2 of my PhD dissertation, i.e., ‘Political Factors Affecting the EU Legislative Decision-Making Speed’. The data supporting the findings of this study are accessible and replicable. Restrictions apply to the availability of these data, which were used under license for this study. The datafiles include:

    R script: Chapter 2 script.R
    Syntax: Syntax for replication 2.0.docx
    Original output from RStudio: The original output 2.0.pdf
    Codebook: Codebook 2.0.txt
    Analysis data: data2.1.xlsx
    Dataset: Original quantitative data for Chapter 2.xlsx
    Figures: Chapter 2 Figures.zip

  14. China - Overseas Finance Inventory Database - Dataset - ENERGYDATA.INFO

    • energydata.info
    Updated Mar 22, 2022
    Cite
    (2022). China - Overseas Finance Inventory Database - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/china-overseas-finance-inventory-database
    Explore at:
    Dataset updated
    Mar 22, 2022
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    The COFI database includes power-generation projects in Belt and Road Initiative (BRI) countries financed by Chinese corporations and banks that reached financial closure from 2000 to 2020. Types of financing include debt and equity investment, with the latter including greenfield foreign direct investments (FDI) and cross-border mergers and acquisitions (M&As). COFI was consolidated from nine source databases using both an automated join method in RStudio and manual joining by analysts. The database includes power-plant characteristics and investment details. It captures 430 power plants in 76 BRI countries, including 220 equity investment transactions and 253 debt investment transactions made by Chinese investors. Key data points for financial transactions in COFI include the financial instrument (equity or debt), investor name, amount, and financial close year. Key technical characteristics tracked for projects in COFI include name, installed capacity, commissioning year, country, and primary fuel type. This project is a collaboration among the Boston University Global Development Policy Center, the Inter-American Dialogue, the China-Africa Research Initiative at Johns Hopkins University (CARI), and the World Resources Institute (WRI). The detailed methodology is given in the World Resources Institute publication “China Overseas Finance Inventory”.
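    The consolidation step described above joins records from several source databases on a shared project key. The COFI team did this in R plus manual review; a minimal Python sketch of the automated part, with hypothetical field names rather than the actual COFI schema:

    ```python
    # Hypothetical source tables keyed by project name (illustrative fields only).
    plants = {
        "Plant A": {"capacity_mw": 300, "fuel": "coal", "commissioned": 2015},
        "Plant B": {"capacity_mw": 50, "fuel": "solar", "commissioned": 2019},
    }
    deals = [
        {"project": "Plant A", "instrument": "debt", "amount_musd": 250},
        {"project": "Plant B", "instrument": "equity", "amount_musd": 40},
        {"project": "Plant C", "instrument": "debt", "amount_musd": 10},  # no match
    ]

    # Inner join: keep only transactions whose project has characteristics data,
    # merging the two records into one row per transaction.
    joined = [{**deal, **plants[deal["project"]]}
              for deal in deals if deal["project"] in plants]
    print(len(joined))  # 2
    ```

    Records that fail to match on the key (Plant C here) are exactly the cases that fall through to manual joining by analysts.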

  15. Data on how honeybee host brood traits influence Varroa destructor reproduction

    • researchdata.se
    Updated Mar 26, 2024
    Cite
    Nicholas Scaramella; Ashley Burke; Melissa Oddie; Barbara Locke (2024). Data on how honeybee host brood traits influence Varroa destructor reproduction [Dataset]. http://doi.org/10.5878/znc2-9b12
    Explore at:
    Available download formats
    Dataset updated
    Mar 26, 2024
    Dataset provided by
    Swedish University of Agricultural Sciences
    Authors
    Nicholas Scaramella; Ashley Burke; Melissa Oddie; Barbara Locke
    Time period covered
    Jun 2019 - Sep 2021
    Description

    The data set was collected in Uppsala, Sweden between 2019 and 2021. Hives were established using varroa-resistant queens from Oslo, Norway (n = 3), Gotland, Sweden (n = 5), and Avignon, France (n = 4), with a varroa-susceptible population from Uppsala, Sweden (n = 5) as control. All hives were located at the SLU Lövsta research station (GPS coordinates: 59° 50’ 2.544”N, 17° 48’ 47.447”E). Varroa destructor mite reproductive success was measured on frames with adult honeybee workers exposed to, and excluded from, access to honeybee larvae. Excluders were added directly after brood capping, and frames were dissected nine days later. Cell caps were removed using a scalpel, with the pupae and mite families carefully removed from the cell using forceps and a fine paint brush. Mite reproductive success was calculated by counting successful reproduction attempts, where a successful attempt was defined as a mite that produced one male and at least one female offspring. If a mite did not meet this requirement, it was considered a failed reproduction attempt and the reason for failure was documented. All data were analyzed in R version 4.0.1 using RStudio 1.3.959. A linear mixed-effects model was used with mite reproductive success as the response variable, population origin and excluder treatment as independent variables, and colony and year as random-effect variables, to compare treatments within each population as well as fecundity. Least-squares means of the model were used to compare treatments between individual populations.
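The success criterion described above (one male plus at least one female offspring per attempt) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code, and the record fields are hypothetical:

```python
# Illustrative sketch of the reproductive-success criterion: an attempt
# succeeds only if the mite produced a male and at least one female offspring.
def successful(attempt):
    """True if the attempt produced a male and at least one female."""
    return attempt["males"] >= 1 and attempt["females"] >= 1

def success_rate(attempts):
    """Fraction of attempts meeting the success criterion."""
    if not attempts:
        return 0.0
    return sum(successful(a) for a in attempts) / len(attempts)

attempts = [
    {"males": 1, "females": 3},   # success
    {"males": 0, "females": 2},   # failed: no male offspring
    {"males": 1, "females": 0},   # failed: no female offspring
]
rate = success_rate(attempts)  # 1 success out of 3 attempts
```

In the study itself these per-attempt outcomes feed a linear mixed-effects model rather than a raw proportion, with colony and year as random effects.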

    Scaramella_et_al_2023_Data.tsv - Data set consisting of 34 rows and 21 columns. Colony demographics and designated treatment are listed. All data collected are count data and are explained in more detail in the README file. The R script used in the analysis is attached. It is split into two sections: the first is used for the statistical analysis, and the second for creating the plots used in the paper. The sections are delimited by the titles SECTION 1 - ANALYSIS and SECTION 2 - PLOTS.

    Provided that the script is in the same directory as the data files and the needed R packages are installed (see sessionInfo.txt), the output file Scaramella_et_al_2023_Analysis_Code_log.txt and the plot file Rplots.pdf can be reproduced by running: Rscript Scaramella_et_al_2023_Analysis_Code.R > Scaramella_et_al_2023_Analysis_Code_log.txt

    Scaramella_et_al_2023_Bar_Graph_Data.tsv - Data set consisting of 8 rows and 5 columns. Colony demographics and designated treatment are listed. All data are generated from the count data in Scaramella_et_al_2023_Data.tsv and are explained in more detail in the README file.

    Scaramella_et_al_2023_Stacked_Bar_Graph_Data.tsv - Data set consisting of 102 rows and 8 columns. Colony demographics and designated treatment are listed. The data are Scaramella_et_al_2023_Data.tsv restructured to include the reason for failure as a column, and are explained in more detail in the README file.

  16. Data for Meta Analysis - Ebook Language Learning

    • data.mendeley.com
    Updated Sep 16, 2024
    Virgiawan Listanto (2024). Data for Meta Analysis - Ebook Language Learning [Dataset]. http://doi.org/10.17632/8757gxmwzx.1
    Explore at:
    Dataset updated
    Sep 16, 2024
    Authors
    Virgiawan Listanto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This spreadsheet contains data from studies on e-books and English language learning, drawn from reliable sources indexed in the Scopus database. It includes details such as sample sizes, means, and standard deviations for both control and experimental groups. The script implementing the analysis algorithm, run in RStudio, is also included.
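From group summaries like these (sample size, mean, and standard deviation per group), a meta-analysis typically computes a standardized mean difference per study. A minimal Python sketch with hypothetical values (the original analysis was run in RStudio):

```python
# Illustrative sketch: Cohen's d with a pooled standard deviation,
# the standardized mean difference commonly used in meta-analysis.
# All input values below are hypothetical.
import math

def cohens_d(n1, mean1, sd1, n2, mean2, sd2):
    """Standardized mean difference between two groups (pooled SD)."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Hypothetical study: experimental (e-book) vs. control group test scores.
d = cohens_d(30, 78.0, 10.0, 30, 72.0, 10.0)
```

Each study's effect size would then be pooled (e.g., in a random-effects model) across the studies collected in the spreadsheet.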

  17. data -- Fruiting phenology patterns, Nyungwe National Park, Rwanda

    • figshare.com
    txt
    Updated Jul 14, 2025
    Phillip Dugger; Beth A. Kaplin; Norbert J. Cordeiro; Mediatrice Bana (2025). data -- Fruiting phenology patterns, Nyungwe National Park, Rwanda [Dataset]. http://doi.org/10.6084/m9.figshare.24020898.v3
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Phillip Dugger; Beth A. Kaplin; Norbert J. Cordeiro; Mediatrice Bana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Rwanda
    Description

    Data files (.csv) used in a study of fruiting phenology patterns in Nyungwe National Park, Rwanda from 1996-2019. Datasets include climate variables (rain, irradiance, minimum and maximum temperatures, and ENSO index), fruiting phenology data, and GIS locations of study sites. Data are organized for use in statistical analyses using the R programming language.

    Instructions for use in an R Project:

    • We strongly suggest creating an R Project file in RStudio to use the scripts and data contained in this repository.
    • Data files should be stored in a folder named "data" in the same directory as the R Project file. This ensures that the R scripts for loading data are accessing the correct directory.
    • Script files should be stored in another folder in the same directory as the R Project file (suggested folder name: "scripts").

  18. Climate Hazards Data Integration and Visualization for the Climate Adaptations Solutions Accelerator through School-Community Hubs

    • search.dataone.org
    • datadryad.org
    Updated Jun 22, 2024
    Charles Curtin; Liane Chen; Hazel Vaquero; Kristina Glass (2024). Climate Hazards Data Integration and Visualization for the Climate Adaptations Solutions Accelerator through School-Community Hubs [Dataset]. http://doi.org/10.5061/dryad.1jwstqk3g
    Explore at:
    Dataset updated
    Jun 22, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Charles Curtin; Liane Chen; Hazel Vaquero; Kristina Glass
    Time period covered
    Jun 4, 2024
    Description

    Community engagement in planning is essential for effective and just climate adaptation. However, historically underserved communities are often difficult to reach through traditional means of soliciting public input. The Climate Adaptation Solutions Accelerator (CASA) through School-Community Hubs project identifies public schools as promising sites for building both community engagement and community capacity for climate adaptation. To serve in this role, schools need information about the intersecting threats climate change poses to the communities they serve. The Climate Hazard Dashboard for California Schools is a platform that maps the current and future risks associated with five climate hazards: wildfire, extreme heat, extreme precipitation, flooding, and sea level rise, for the nearly 10,000 public schools serving Kindergarten through Grade 12 students in California. Each hazard is mapped and visualized at the school level, providing an accessible way fo..., Data for extreme heat and extreme precipitation were retrieved using API requests from the caladaptr package. The data retrieved to calculate extreme heat days were historical observed daily maximum temperatures for 1961-2005 and projected daily maximum temperatures for 2006-2064. The data retrieved to calculate extreme precipitation days were historical observed daily precipitation totals for 1961-2005 and projected daily precipitation totals for 2006-2064. Data for wildfire, flooding, and sea level rise were downloaded directly from their sources and stored on a remote server for use. All data were processed in RStudio using Quarto docs. Tabular data for extreme heat and precipitation first used the retrieved historical data to calculate a threshold value for classifying an extreme event. The threshold was determined to be the 98th percentile value of observed historical data for California. For extreme heat, this is 98°F. For extreme precipitation, this is 0.73 inches.
    Then, projected dai...
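The thresholding step described above can be sketched as follows. This is an illustrative Python sketch with toy values (the original processing used R with the caladaptr package), using a simple nearest-rank percentile:

```python
# Illustrative sketch: derive an extreme-event threshold as the 98th
# percentile of historical observations, then count projected days that
# exceed it. All values below are toy data, not the dashboard's inputs.
def percentile_98(values):
    """Nearest-rank 98th percentile of a sample."""
    ordered = sorted(values)
    rank = max(0, int(round(0.98 * len(ordered))) - 1)
    return ordered[rank]

def count_extreme_days(daily_values, threshold):
    """Number of days strictly exceeding the threshold."""
    return sum(1 for v in daily_values if v > threshold)

historical = list(range(60, 110))          # toy daily max temperatures, degrees F
threshold = percentile_98(historical)      # 98th percentile of 60..109 -> 108
projected = [95, 99, 101, 109, 80]         # toy projected daily maxima
extreme = count_extreme_days(projected, threshold)
```

The same two-step pattern (fit a threshold on 1961-2005 observations, then count exceedances in the 2006-2064 projections) applies to both heat and precipitation.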


    This README.txt file was generated on 2024-05-23 by Liane Chen, Charlie Curtin, Kristina Glass, and Hazel Vaquero. It is associated with the data archival on this project through Dryad. To view the data archival and download datasets, please visit https://doi.org/10.5061/dryad.1jwstqk3g.

    Recommended citation:

    Curtin, Charles; Glass, Kristina; Chen, Liane; Vaquero, Hazel (Forthcoming 2024). Climate Hazards Data Integration and Visualization for the Climate Adaptations Solutions Accelerator through School-Community Hubs [Dataset]. Dryad. https://doi.org/10.5061/dryad.1jwstqk3g

    GENERAL INFORMATION

    1. Title of the Project: Climate Adaptation Solutions Accelerator through School Community Hubs (alias CASAschools)

    2. Author Information

    A. Principal Investigator Contact Information

    Name: Liane Chen, Charlie Curtin, Kristina Glass, and Hazel Vaque...

  19. Data from: Biogeographical variation in diurnal behaviour of Acanthaster planci versus Acanthaster cf. solaris

    • researchdata.edu.au
    Updated Jun 17, 2020
    Burn Deborah; Deborah Anne Burn (2020). Biogeographical variation in diurnal behaviour of Acanthaster planci versus Acanthaster cf. solaris [Dataset]. http://doi.org/10.25903/5E37C3142EF11
    Explore at:
    Dataset updated
    Jun 17, 2020
    Dataset provided by
    James Cook University
    Authors
    Burn Deborah; Deborah Anne Burn
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Feb 25, 2017 - Mar 29, 2017
    Description

    This data set contains Crown of Thorns Starfish (Acanthaster planci and Acanthaster cf. solaris) behavioural data collected at Lankanfushi Island in the Maldives, and at Rib Reef on the Great Barrier Reef, Australia. The data is deposited here to accompany the Open Access publication from the Related Publications link below. Here, we include information on all individual starfish counted during surveys at different times of day (including at night) at both locations. Information provided includes location, date, time and depth at which each individual was found, as well as the maximum diameter of each individual and the behaviour each individual was exhibiting. More specifically, whether the starfish was hidden or exposed, if the individual was exhibiting resting, moving or feeding behaviour, and the prey items of those feeding is noted within this data set. Also included are point intercept coral cover data for each transect at each location as well as the R script used to analyse the data within the aforementioned publication.

    The dataset consists of the following files:

    • Burnetal.R and Burnetal.txt - R script used for analysis, in R (open in R or RStudio) and plain text (.txt) formats
    • COTSmovement.csv – All data collected for individual Crown of Thorns starfish at both locations. Includes diameter and behavioural information.
    • COTS_PITRIB.csv - point intercept coral cover (Rib Reef)
    • Maldives_PIT.csv - point intercept coral cover (Maldives)

    The full methodology will be available in the Open Access publication from the Related Publications link below.

  20. Groundwater arsenic data and ASCII grids for predicting elevated arsenic in northwestern and central Minnesota using boosted regression tree methods

    • data.usgs.gov
    Updated Feb 24, 2024
    Sarah Elliott; Catherine Christenson (2024). Groundwater arsenic data and ASCII grids for predicting elevated arsenic in northwestern and central Minnesota using boosted regression tree methods [Dataset]. http://doi.org/10.5066/F77H1HH8
    Explore at:
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Sarah Elliott; Catherine Christenson
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    1980 - 2016
    Area covered
    Minnesota
    Description

    This data release contains: (1) ASCII grids of predicted probability of elevated arsenic in groundwater for the Northwest and Central Minnesota regions, (2) input arsenic and predictive variable data used in model development and calculation of predictions, and (3) ASCII files used to predict the probability of elevated arsenic across the two study regions. The probability of elevated arsenic was predicted using Boosted Regression Tree (BRT) modeling methods with the gbm package in R version 3.4.2. The response variable was the presence or absence of arsenic >10 µg/L, the U.S. Environmental Protection Agency’s maximum contaminant level for arsenic, in 3,283 wells located throughout both study regions (1,363 in the Northwest region and 1,920 in the Central). The original database used to develop the BRT model consisted of 127 predictor variables, which included well characteristics, land use, soil properties, aquifer properties, depth to water table, and predicted nitrate ...
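Deriving the binary response variable described above (presence or absence of arsenic above the 10 µg/L MCL) can be sketched as follows. This is an illustrative Python sketch with hypothetical concentrations, not the authors' R/gbm code:

```python
# Illustrative sketch: classify wells by whether measured arsenic exceeds
# the U.S. EPA maximum contaminant level (MCL) of 10 micrograms per liter,
# producing the presence/absence response used by the BRT model.
MCL_UG_L = 10.0

def elevated(arsenic_ug_l):
    """True if the concentration exceeds the 10 ug/L MCL."""
    return arsenic_ug_l > MCL_UG_L

# Hypothetical well concentrations in ug/L.
wells = [2.5, 14.0, 9.9, 33.1]
response = [int(elevated(c)) for c in wells]  # 1 = elevated, 0 = not
```

The BRT model then relates this 0/1 response to the 127 candidate predictor variables across the 3,283 wells.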

