26 datasets found
  1. f

    Data_Sheet_3_“R” U ready?: a case study using R to analyze changes in gene...

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_3_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s003
    Explore at:
    docxAvailable download formats
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.

  2. f

    Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    figshare
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provides a wide range of functions for programming and analyzing of data. Unlike many of the existing statistical softwares, R has the added benefit of allowing the users to write more efficient codes by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allows the users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.For all intents and purposes, this book serves as both textbook and manual for R statistics particularly in academic research, data analytics, and computer programming targeted to help inform and guide the work of the R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. 
It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R particularly for research purposes with examples. Ranging from how to import and store datasets in R as Objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, congruence of Statistics and Computer programming for Research.

  3. d

    funspace: an R package to build, analyze and plot functional trait spaces

    • datadryad.org
    • data.niaid.nih.gov
    • +2more
    zip
    Updated Feb 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Carlos Perez Carmona; Nicola Pavanetto; Giacomo Puglielli (2024). funspace: an R package to build, analyze and plot functional trait spaces [Dataset]. http://doi.org/10.5061/dryad.4tmpg4fg6
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    Dryad
    Authors
    Carlos Perez Carmona; Nicola Pavanetto; Giacomo Puglielli
    Time period covered
    2023
    Description

    funspace - Creating and representing functional trait spaces

    Estimation of functional spaces based on traits of organisms. The package includes functions to impute missing trait values (with or without considering phylogenetic information), and to create, represent and analyse two dimensional functional spaces based on principal components analysis, other ordination methods, or raw traits. It also allows for mapping a third variable onto the functional space.

    Description of the Data and file structure

    We provide the package as a .tar file (filename: funspace_0.1.1.tar). Once the package has been downloaded, it can be directly uploaded in R from Packages >> Install >> Install from >> Package Archive File (.zip, .tar.gz). All the functions and example datasets included in funspace and that are necessary to reproduce the worked example in the paper will be automatically uploaded. Functions and example datasets can be then accessed using the standard syntax funspace:

    Detailed ...

  4. o

    Lost in the Code?

    • explore.openaire.eu
    Updated Mar 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Luis Eduardo Muñoz (2023). Lost in the Code? [Dataset]. http://doi.org/10.5281/zenodo.7589898
    Explore at:
    Dataset updated
    Mar 13, 2023
    Authors
    Luis Eduardo Muñoz
    Description

    The community behind R is built by inspired scientists that share their tools and knowledge freely to encourage equal access for all aspiring researchers and championing academic integrity. The tools available through R aid in every step of data analysis; including creating experiments, cataloging and organizing data, analyzing the results, and visualizing our findings all in one software environment. The power of programming also increases the flexibility and automation of these tasks saving an abundance of time and ensuring each step can be accurately reproduced. Often, courses that use the R software to demonstrate statistical concepts face the dual challenge of introducing two distinct and equally intricate topics at once; programming and statistics. In most cases, the focus must be shifted away from programming due to constraints on time and breadth to the potential confusion and dismay (repeated appearance of error messages) of novice learners in statistics. This workshop aims to provide a solid foundation of programming concepts such that attendees can confidently approach more advanced statistical courses or independently improve their statistical skills. Many of the ideas that will be covered can apply to many different programming languages, despite R being the main tool. Online recordings. Part 1: https://youtu.be/3zUkPvYTePo Part 2: https://youtu.be/Knjbu6JwNI0 When reading through the word documents with exercises, please use the keyboard shortcut "Ctrl + *" ("Command + *" for Mac) to show the hidden text that provides hints and advice for solving the exercises.

  5. f

    Data_Sheet_6_“R” U ready?: a case study using R to analyze changes in gene...

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_6_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s006
    Explore at:
    docxAvailable download formats
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.

  6. Data from: Bike Sharing Dataset

    • kaggle.com
    Updated Sep 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ram Vishnu R (2024). Bike Sharing Dataset [Dataset]. https://www.kaggle.com/datasets/ramvishnur/bike-sharing-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 10, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Ram Vishnu R
    Description

    Problem Statement:

    A bike-sharing system is a service in which bikes are made available for shared use to individuals on a short term basis for a price or free. Many bike share systems allow people to borrow a bike from a "dock" which is usually computer-controlled wherein the user enters the payment information, and the system unlocks it. This bike can then be returned to another dock belonging to the same system.

    A US bike-sharing provider BoomBikes has recently suffered considerable dip in their revenue due to the Corona pandemic. The company is finding it very difficult to sustain in the current market scenario. So, it has decided to come up with a mindful business plan to be able to accelerate its revenue.

    In such an attempt, BoomBikes aspires to understand the demand for shared bikes among the people. They have planned this to prepare themselves to cater to the people's needs once the situation gets better all around and stand out from other service providers and make huge profits.

    They have contracted a consulting company to understand the factors on which the demand for these shared bikes depends. Specifically, they want to understand the factors affecting the demand for these shared bikes in the American market. The company wants to know:

    • Which variables are significant in predicting the demand for shared bikes.
    • How well those variables describe the bike demands

    Based on various meteorological surveys and people's styles, the service provider firm has gathered a large dataset on daily bike demands across the American market based on some factors.

    Business Goal:

    You are required to model the demand for shared bikes with the available independent variables. It will be used by the management to understand how exactly the demands vary with different features. They can accordingly manipulate the business strategy to meet the demand levels and meet the customer's expectations. Further, the model will be a good way for management to understand the demand dynamics of a new market.

    Data Preparation:

    1. You can observe in the dataset that some of the variables like 'weathersit' and 'season' have values as 1, 2, 3, 4 which have specific labels associated with them (as can be seen in the data dictionary). These numeric values associated with the labels may indicate that there is some order to them - which is actually not the case (Check the data dictionary and think why). So, it is advisable to convert such feature values into categorical string values before proceeding with model building. Please refer the data dictionary to get a better understanding of all the independent variables.
    2. You might notice the column 'yr' with two values 0 and 1 indicating the years 2018 and 2019 respectively. At the first instinct, you might think it is a good idea to drop this column as it only has two values so it might not be a value-add to the model. But in reality, since these bike-sharing systems are slowly gaining popularity, the demand for these bikes is increasing every year proving that the column 'yr' might be a good variable for prediction. So think twice before dropping it.

    Model Building:

    In the dataset provided, you will notice that there are three columns named 'casual', 'registered', and 'cnt'. The variable 'casual' indicates the number casual users who have made a rental. The variable 'registered' on the other hand shows the total number of registered users who have made a booking on a given day. Finally, the 'cnt' variable indicates the total number of bike rentals, including both casual and registered. The model should be built taking this 'cnt' as the target variable.

    Model Evaluation:

    When you're done with model building and residual analysis and have made predictions on the test set, just make sure you use the following two lines of code to calculate the R-squared score on the test set. python from sklearn.metrics import r2_score r2_score(y_test, y_pred) - where y_test is the test data set for the target variable, and y_pred is the variable containing the predicted values of the target variable on the test set. - Please perform this step as the R-squared score on the test set holds as a benchmark for your model.

  7. h

    BR-TaxQA-R

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    unicamp-dl, BR-TaxQA-R [Dataset]. https://huggingface.co/datasets/unicamp-dl/BR-TaxQA-R
    Explore at:
    Dataset authored and provided by
    unicamp-dl
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Retrieval Augmented Generation (RAG) dataset for Brazilian Federal Revenue Service (Receita Federal do Brasil ― RFB)

    This dataset aims to explore the capabilities and performance of RAG-like systems, focused in the Brazilian Legal Domain, more specifically the Tax Law. This dataset is initially built upon a Question & Answers document the RFB releases every year since 2016, where common questions regarding Personal Income Tax are answered with explicit references to official legal… See the full description on the dataset page: https://huggingface.co/datasets/unicamp-dl/BR-TaxQA-R.

  8. n

    Data from: TipDatingBeast: an R package to assist the implementation of...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated Sep 22, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adrien Rieux; Camilo E. Khatchikian (2016). TipDatingBeast: an R package to assist the implementation of phylogenetic tip-dating tests using BEAST [Dataset]. http://doi.org/10.5061/dryad.43q71
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 22, 2016
    Dataset provided by
    Centre de Coopération Internationale en Recherche Agronomique pour le Développement
    University of Pennsylvania
    Authors
    Adrien Rieux; Camilo E. Khatchikian
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Molecular tip-dating of phylogenetic trees is a growing discipline that uses DNA sequences sampled at different points in time to co-estimate the timing of evolutionary events with rates of molecular evolution. In this context, BEAST, a program for Bayesian analysis of molecular sequences, is the most widely used phylogenetic tool. Here, we introduce TipDatingBeast, an R package built to assist the implementation of various phylogenetic tip-dating tests using BEAST. TipDatingBeast currently contains two main functions. The first one allows preparing date-randomization analyses, which assess the temporal signal of a dataset. The second function allows performing leave-one-out analyses, which test for the consistency between independent calibration sequences and allow pinpointing those leading to potential bias. We apply those functions to an empirical dataset and supply practical guidance for results interpretation.

  9. a

    GBM models, constructed from historical data

    • arcticdata.io
    • search.dataone.org
    • +1more
    Updated May 24, 2018
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adam M. Young (2018). GBM models, constructed from historical data [Dataset]. http://doi.org/10.18739/A22Z3K
    Explore at:
    Dataset updated
    May 24, 2018
    Dataset provided by
    Arctic Data Center
    Authors
    Adam M. Young
    Time period covered
    Jan 1, 1950 - Dec 31, 2009
    Area covered
    Description

    Datasets used in: Young, A.M., Higuera, P.E., Duffy, P.A., and F.S. Hu. Climatic thresholds shape northern high-latitude fire regimes and imply vulnerability to future climate change. In Review at Ecography as of 10/2015. ---------------------------------------------------------------------- ----------------------- Description ---------------------------------- ---------------------------------------------------------------------- These data are the raw results/output from the boosted regression tree modeling conducted in Young et al. (In Review). Results for each of the three models (AK, BOREAL, and TUNDRA) are located separate folders. Each folder contains 100 RData files which contain the output/results from running the 'gbm()' function in R v3.2.0 100 times. Each of the 100 gbms was built using a different subsample of available data. The 'gbm()' function is available in the 'gbm' package in R (https://cran.r-project.org/). Details regarding meta-parameter selection and model building can be found in Young et al. (In Review). ---------------------------------------------------------------------- ------------------------ File Naming --------------------------------- ---------------------------------------------------------------------- 'MODEL_gbm_xx.RData' 'MODEL_' - Three different sets of GBM models: 'AK', 'BOREAL', and 'TUNDRA' 'gbm' - Generalized boosting model '_xx' - Model number (1-100) '.RData' - File Extension

  10. r

    2016 SoE Built environment Public transport by capital city 1990 to 2014

    • researchdata.edu.au
    Updated Jul 21, 2016
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    State of the Environment (2016). 2016 SoE Built environment Public transport by capital city 1990 to 2014 [Dataset]. https://researchdata.edu.au/2016-soe-built-1990-2014/2980609
    Explore at:
    Dataset updated
    Jul 21, 2016
    Dataset provided by
    data.gov.au
    Authors
    State of the Environment
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Billions of passenger kms. This data was sourced from the Bureau of Infrastructure, Transport and Regional Economics displaying public transport in billions of kilometres by year.\r \r For more information see http://bitre.gov.au/publications/2014/is_059.aspx.\r \r Figure BLT33 in Built environment. See; https://soe.environment.gov.au/theme/built-environment/topic/2016/livability-transport#built-environment-figure-BLT33\r

  11. r

    2016 SoE Built environment Water efficiency selected industries 2008-09 to...

    • researchdata.edu.au
    Updated Jul 6, 2016
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    State of the Environment (2016). 2016 SoE Built environment Water efficiency selected industries 2008-09 to 2014-2015 [Dataset]. https://researchdata.edu.au/2016-soe-built-2014-2015/2987371
    Explore at:
    Dataset updated
    Jul 6, 2016
    Dataset provided by
    data.gov.au
    Authors
    State of the Environment
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Water efficiency ($m IGVA per gigalitre), selected industries - Includes Manufacturing, and Commercial and services (including Construction and Transport), 2008-09 to 2014-15\r \r Data provided by ABS from: http://www.abs.gov.au/AUSSTATS/abs@.nsf/allprimarymainfeatures/49F854E3831E4294CA2580580015E2A6?opendocument\r \r Figure BLT46 in Built environment theme.\r https://soe.environment.gov.au/theme/built-environment/topic/2016/urban-environmental-efficiency-water-efficiency#built-environment-figure-BLT46\r

  12. [Superseded] Intellectual Property Government Open Data 2019

    • researchdata.edu.au
    • data.gov.au
    Updated Jun 6, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    IP Australia (2019). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://researchdata.edu.au/superseded-intellectual-property-data-2019/2994670
    Explore at:
    Dataset updated
    Jun 6, 2019
    Dataset provided by
    Data.govhttps://data.gov/
    Authors
    IP Australia
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is IPGOD?\r

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.\r \r \r

    How do I use IPGOD?\r

    IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scalar.\r \r \r

    IP Data Platform\r

    IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software. IP Data Platform\r \r

    References\r

    \r The following pages can help you gain the understanding of the intellectual property administration and processes in Australia to help your analysis on the dataset.\r \r * Patents\r * Trade Marks\r * Designs\r * Plant Breeder’s Rights\r \r \r

    Updates\r

    \r

    Tables and columns\r

    \r Due to the changes in our systems, some tables have been affected.\r \r * We have added IPGOD 225 and IPGOD 325 to the dataset!\r * The IPGOD 206 table is not available this year.\r * Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.\r \r

    Data quality improvements\r

    \r Data quality has been improved across all tables.\r \r * Null values are simply empty rather than '31/12/9999'.\r * All date columns are now in ISO format 'yyyy-mm-dd'.\r * All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.\r * All tables are encoded in UTF-8.\r * All tables use the backslash \ as the escape character.\r * The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.

  13. n

    Data and Rscripts from: An integrated experimental and mathematical approach...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Apr 10, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Barbara Joncour; William Nelson; Damie Pak; Ottar Bjornstad (2022). Data and Rscripts from: An integrated experimental and mathematical approach to inferring the role of food exploitation and interference interactions in shaping life history [Dataset]. http://doi.org/10.5061/dryad.1g1jwstzd
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 10, 2022
    Dataset provided by
    Queen's University
    Pennsylvania State University
    Authors
    Barbara Joncour; William Nelson; Damie Pak; Ottar Bjornstad
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Intraspecific interactions can occur in many ways, but the mechanisms can be broadly categorized as food exploitation and interference interactions. Identifying how intraspecific interactions impact life history is crucial for accurately predicting how population density and structure influence dynamics. However, disentangling the effects of interference interactions from those of exploitation using experiments is challenging for most biological systems. Here we propose an approach that combines experiments with modeling to infer the pathways of intraspecific interactions in a system. First, a consumer-resource model is built without intraspecific interactions. Then, the model is parameterized by fitting it to life-history data from a first experiment in which food abundance was varied. Next, hypothesized scenarios of intraspecific interactions are incorporated into the model, which is then used to predict life histories at increasing competitor density. Lastly, model predictions are compared against data from a second experiment that raised groups of competitors at different densities. This comparison allows us to infer the roles of interference and exploitation in shaping life history. We demonstrated the approach using the smaller tea tortrix Adoxophyes honmai across a range of temperatures. We investigated five scenarios of interactions that combined exploitation with three pathways for interference: effects on energetics (representing changes in ingestion or activity), effects on mortality (modeling deadly interactions), or effects on both mortality and ingestion (modeling cannibalism). Overall, intraspecific interactions in the tea tortrix are best explained by a high level of deadly interactions along with some level of interference that acts on energy, such as escaping and blocking access to food. Deadly interactions increase with temperature, while interference that acts on energy is strongest close to the optimal temperature for reproduction. Interestingly, exploitation is more important than interference at low competitor density. The combination of mathematical modeling and experimentation allowed us to mechanistically characterize the intraspecific interactions in the tea tortrix in a way that is readily incorporated into population-level mathematical models. The primary value of this approach, however, is that it can be applied to a much wider range of taxa than is possible with purely experimental approaches.

    Methods. We designed an approach to infer the most likely pathways of intraspecific interactions that shape life histories in a studied system. The approach proceeds in four steps that weave together theory and experiments. We demonstrated the approach with the smaller tea tortrix moth (Adoxophyes honmai).

    Step 1. Build base model. We first built the base model, which is the baseline for the theoretical framework used later to predict how different pathways of intraspecific interactions influence life histories. The base model is a consumer-resource cohort model that assumes no intraspecific interactions: no food exploitation and no interference interactions. As such, the base model describes solely how vital rates are impacted by changes in food abundance.

    Step 2. Parameterize base model (provided R script: Step2.r). Most model parameters can be directly estimated from independent data, but a few remained unknown. Unknown parameters were estimated by fitting the base model to the observed life-history traits in the food experiment (FoodExperiment.csv). The food experiment raised individuals in the absence of intraspecific interactions and exposed them to a wide range of food abundances.

    Step 3. Incorporate intraspecific interactions into the base model to predict their effects on life histories (provided R script: Step3.r). In this step, the parameterized base model was modified to incorporate several hypothesized scenarios of intraspecific interactions. For each scenario, we predicted how intraspecific interactions impact life-history traits and stage-structure distributions for groups of competitors.

    Step 4. Test model predictions using experiment. To evaluate the support for each hypothesis, we compared model predictions with data from the competition experiment (CompetitionExperiment.csv). The competition experiment measured the impact of intraspecific interactions (i.e., competitor density) on life-history traits and on the stage structure of groups of competitors. Comparing the experimental life-history data with model predictions allowed us to infer the respective roles of interference interactions and food exploitation in shaping life histories, as well as the functional dependencies for interference interactions in the studied system.
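    The four-step logic above can be sketched with a deliberately simplified toy model. Everything below is an illustrative assumption: the functional forms, the parameters, and the data are invented for the sketch and are not the authors' consumer-resource cohort model (which lives in the provided R scripts).

    ```python
    # Toy sketch of the four-step approach, assuming a simple "base model":
    # growth = a * food / (k + food). Forms, parameters, and data are illustrative.

    def fit_base(food_levels, observed_growth):
        """Step 2: grid-search least-squares fit of the base model to
        food-experiment data (no intraspecific interactions)."""
        best = None
        for a in [x / 10 for x in range(1, 31)]:
            for k in [x / 10 for x in range(1, 31)]:
                sse = sum((a * f / (k + f) - y) ** 2
                          for f, y in zip(food_levels, observed_growth))
                if best is None or sse < best[0]:
                    best = (sse, a, k)
        return best[1], best[2]

    def scenarios(a, k, food, density):
        """Step 3: hypothesized interaction pathways layered onto the fitted base model."""
        return {
            # exploitation: competitors split the shared food
            "exploitation": a * (food / density) / (k + food / density),
            # interference on energetics: activity costs rise with density
            "interference_energy": (a * food / (k + food)) / (1 + 0.5 * (density - 1)),
        }

    def best_scenario(a, k, food, densities, observed_growth):
        """Step 4: keep the scenario whose predictions best match the
        competition-experiment data."""
        sse = dict.fromkeys(scenarios(a, k, food, 1), 0.0)
        for d, y in zip(densities, observed_growth):
            for name, pred in scenarios(a, k, food, d).items():
                sse[name] += (pred - y) ** 2
        return min(sse, key=sse.get)
    ```

    With noiseless toy data generated at a = 2, k = 1, the fit recovers those values, and a density series generated under pure food splitting selects the "exploitation" scenario.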

  14. n

    Data from: Tree functional traits across Caribbean island dry forests are...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Dec 25, 2023
    Cite
    Catherine Hulshof; Pablo Lopez; Alanis Rosa-Santiago; Janet Franklin; Jonathan Walter (2023). Tree functional traits across Caribbean island dry forests are remarkably similar [Dataset]. http://doi.org/10.5061/dryad.z08kprrj5
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 25, 2023
    Dataset provided by
    Virginia Commonwealth University
    University of Puerto Rico at Río Piedras
    San Diego State University
    Athenys Research
    Authors
    Catherine Hulshof; Pablo Lopez; Alanis Rosa-Santiago; Janet Franklin; Jonathan Walter
    License

    CC0 1.0 Universal: https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Caribbean
    Description

    Delineation of potential dry forest and estimated actual dry forest on Caribbean islands. Potential dry forest is delineated based on CHELSA climate data (www.chelsa-climate.org) and the FAO definition of dry forest. Estimated actual dry forest is corrected for land cover using data from Hansen et al. (2022) https://doi.org/10.1088/1748-9326/ac46ec. Areas of potential dry forest, estimated actual dry forest, and built-up land covers are summarized by island and joined to CHELSA bioclimatic variables for selected islands where data on functional traits are available. Trait values by site are also included. The package consists of data outputs and R scripts to reproduce the data outputs from identified publicly available data sources. Methods. Areas potentially supporting tropical dry forest in the Caribbean were delineated based on precipitation normals data from CHELSA and a climatically based definition of tropical dry forest established by the Food and Agriculture Organization of the United Nations (FAO): total annual precipitation of 500 to 1500 mm, with 5 to 8 months receiving < 100 mm of precipitation. Land cover data (2019 conditions) from Hansen et al. were used to constrain estimated actual dry forest based on the intersection of climate-based potential dry forest with forested land cover. To facilitate analysis with functional trait data obtained from forests on a subset of islands, data on the area of potential and estimated actual dry forest and forest and built land covers were summarized by island and associated with a standard suite of 19 bioclimate variables. R scripts showing and reproducing our detailed methods are provided with the repository.
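    The FAO climatic screen described above can be sketched as a simple predicate over a cell's monthly precipitation normals. The function name and the example values below are illustrative and are not taken from the authors' R scripts.

    ```python
    def is_potential_dry_forest(monthly_precip_mm):
        """Apply the FAO climatic definition of tropical dry forest to a cell's
        12 monthly precipitation normals (mm): 500-1500 mm total annual
        precipitation and 5-8 months with < 100 mm. Illustrative sketch only."""
        annual = sum(monthly_precip_mm)
        dry_months = sum(1 for p in monthly_precip_mm if p < 100)
        return 500 <= annual <= 1500 and 5 <= dry_months <= 8

    # A cell with 1075 mm/yr and 5 dry months qualifies; a uniformly wet cell does not.
    print(is_potential_dry_forest([150, 120, 110, 105, 60, 40, 30, 50, 80, 100, 110, 120]))  # True
    print(is_potential_dry_forest([200] * 12))  # False
    ```

    In the actual workflow this predicate would be evaluated per raster cell against the CHELSA normals before intersecting with the Hansen et al. forest cover layer.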

  15. f

    Comparison of the Predictive Performance and Interpretability of Random...

    • acs.figshare.com
    • figshare.com
    zip
    Updated Jun 5, 2023
    Cite
    Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley (2023). Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets [Dataset]. http://doi.org/10.1021/acs.jcim.6b00753.s006
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    ACS Publications
    Authors
    Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The ability to interpret the predictions made by quantitative structure–activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package (https://r-forge.r-project.org/R/?group_id=1725) for the R statistical programming language and the Python program HeatMapWrapper [https://doi.org/10.5281/zenodo.495163] for heat map generation.

  16. Data from: Dataset for Vector space model and the usage patterns of...

    • figshare.com
    bin
    Updated May 30, 2023
    Cite
    Gede Primahadi Wijaya Rajeg; Karlina Denistia; Simon Musgrave (2023). Dataset for Vector space model and the usage patterns of Indonesian denominal verbs [Dataset]. http://doi.org/10.6084/m9.figshare.8187155.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Gede Primahadi Wijaya Rajeg; Karlina Denistia; Simon Musgrave
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preface
    This is the data repository for the paper accepted for publication in NUSA's special issue on Linguistic studies using large annotated corpora (co-edited by Hiroki Nomoto and David Moeljadi).

    How to cite the dataset
    If you use, adapt, and/or modify any of the datasets in this repository for your research or teaching purposes (except for malindo_dbase; see below), please cite: Rajeg, Gede Primahadi Wijaya; Denistia, Karlina; Musgrave, Simon (2019): Dataset for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. Fileset. https://doi.org/10.6084/m9.figshare.8187155. Alternatively, click on the dark pink Cite button to browse different citation styles (the default is DataCite). The malindo_dbase data in this repository is from Nomoto et al. (2018) (cf. the GitHub repository), so please also cite their work if you use it for your research: Nomoto, Hiroki, Hannah Choi, David Moeljadi and Francis Bond. 2018. MALINDO Morph: Morphological dictionary and analyser for Malay/Indonesian. In Kiyoaki Shirai (ed.), Proceedings of the LREC 2018 Workshop "The 13th Workshop on Asian Language Resources", 36-43. A tutorial on how to use the data, together with the R Markdown Notebook for the analyses, is available on GitHub and figshare: Rajeg, Gede Primahadi Wijaya; Denistia, Karlina; Musgrave, Simon (2019): R Markdown Notebook for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. Software. doi: https://doi.org/10.6084/m9.figshare.9970205

    Dataset description
    1. Leipzig_w2v_vector_full.bin is the vector space model used in the paper. We built it using the wordVectors package (Schmidt & Li 2017) via the MonARCH High Performance Computing Cluster (we thank Philip Chan for his help with access to MonARCH).
    2. Files beginning with ngramexmpl_... are data for the n-grams (i.e., word sequences) of the verbs discussed in the paper. The files are in tab-separated format.
    3. Files beginning with sentence_... are full sentences for the verbs discussed in the paper (in plain text format and R dataset format [.rds]). Information on the corpus file and the sentence number in which each verb is found is included.
    4. me_parsed_nountaggedbase (in three different file formats) contains a database of the me- words with noun-tagged roots that MorphInd identified as occurring in the three morphological schemas we focus on (me-, me-/-kan, and me-/-i). The database has columns for the verbs' token frequency in the corpus, root forms, and MorphInd parsing output, among others.
    5. wordcount_leipzig_allcorpus (in three different file formats) contains information on the size of each corpus file used in the paper and from which the vector space model was built.
    6. wordlist_leipzig_ME_DI_TER_percorpus.tsv is a tab-separated frequency list of words prefixed with me-, di-, and ter- in all thirteen corpus files used. The wordlist was built by first tokenising each corpus file, lowercasing the tokens, and then extracting the words with the corresponding three prefixes using the following regular expressions:
    - For me-: ^(?i)(me)([a-z-]{3,})$
    - For di-: ^(?i)(di)([a-z-]{3,})$
    - For ter-: ^(?i)(ter)([a-z-]{3,})$
    7. malindo_dbase is the MALINDO Morphological Dictionary (see above).

    References
    Schmidt, Ben & Jian Li. 2017. wordVectors: Tools for creating and analyzing vector-space models of texts. R package. http://github.com/bmschmidt/wordVectors.
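    The prefix wordlist (item 6) can be reproduced in outline with the three regular expressions quoted above. The sketch below is in Python rather than the authors' pipeline; the inline (?i) flag is expressed as re.IGNORECASE, and the token list is an invented illustration.

    ```python
    import re
    from collections import Counter

    # The three regular expressions from the description; (?i) becomes re.IGNORECASE.
    patterns = {
        "me-":  re.compile(r"^(me)([a-z-]{3,})$", re.IGNORECASE),
        "di-":  re.compile(r"^(di)([a-z-]{3,})$", re.IGNORECASE),
        "ter-": re.compile(r"^(ter)([a-z-]{3,})$", re.IGNORECASE),
    }

    def prefix_wordlist(tokens):
        """Lowercase tokens and count those matching each prefix pattern."""
        counts = {prefix: Counter() for prefix in patterns}
        for token in tokens:
            token = token.lower()
            for prefix, pat in patterns.items():
                if pat.match(token):
                    counts[prefix][token] += 1
        return counts

    # Illustrative token list, not from the Leipzig corpus files.
    tokens = ["Membeli", "dibeli", "terbeli", "membeli", "buku", "di"]
    counts = prefix_wordlist(tokens)
    print(counts["me-"]["membeli"])   # 2
    print(counts["di-"]["dibeli"])    # 1
    ```

    Note that the bare preposition "di" is excluded because the patterns require at least three characters after the prefix.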

  17. f

    Decision tree inversion model results.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Mar 24, 2025
    Cite
    He Jing; Wang Bin; He Jiachen (2025). Decision tree inversion model results. [Dataset]. http://doi.org/10.1371/journal.pone.0319657.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    PLOS ONE
    Authors
    He Jing; Wang Bin; He Jiachen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As a key substance for crop photosynthesis, chlorophyll content is closely related to crop growth and health. Inversion of chlorophyll content from unmanned aerial vehicle (UAV) visible-light images can provide a theoretical basis for crop growth monitoring and health diagnosis. We used rice at the tasseling stage as the research object and obtained UAV visible orthophotos of two experimental fields, one planted manually (experimental area A) and one mechanically (experimental area B). We constructed 14 vegetation indices and 15 texture features and used the correlation coefficient method to analyze them comprehensively. Four vegetation indices and four texture features were then selected as feature variables for three models, K-nearest neighbors (KNN), decision tree (DT), and AdaBoost, used to invert chlorophyll content in experimental areas A and B. In the KNN model, the inversion model built with BGRI as the independent variable in area A has the highest accuracy, with R2 of 0.666 and RMSE of 0.79; the model built with RGRI in area B has the highest accuracy, with R2 of 0.729 and RMSE of 0.626. In the DT model, the model built with B-variance in area A has the highest accuracy, with R2 of 0.840 and RMSE of 0.464; the model built with G-mean in area B has the highest accuracy, with R2 of 0.845 and RMSE of 0.530. In the AdaBoost model, the model built with R-skewness in area A has the highest accuracy, with R2 of 0.826 and RMSE of 0.642; the model built with g in area B has the highest accuracy, with R2 of 0.879 and RMSE of 0.599. Overall, the best inversion models for experimental areas A and B were B-variance decision tree and g-AdaBoost, respectively. These models can quickly and accurately invert the chlorophyll content of rice and provide a theoretical basis for monitoring crop growth and health under different cultivation methods.

  18. R code, data, and analysis documentation for Colour biases in learned...

    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    Wyatt Toure; Simon M. Reader (2023). R code, data, and analysis documentation for Colour biases in learned foraging preferences in Trinidadian guppies [Dataset]. http://doi.org/10.6084/m9.figshare.14404868.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wyatt Toure; Simon M. Reader
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary
    This is the repository containing the R code and data to produce the analyses and figures in the manuscript ‘Colour biases in learned foraging preferences in Trinidadian guppies’. R version 3.6.2 was used for this project. Here, we explain how to reproduce the results, provide the location of the metadata for the data sheets, and describe the root directory and folder contents. This material is adapted from the project's README file, README.md, which is located in the root directory.

    How to reproduce the results
    This project uses the renv package from RStudio to manage package dependencies and ensure reproducibility through time. To reproduce results with the package versions used when this project was created, you will need to install renv using install.packages("renv") in R. If you want to reproduce the results, it is best to download the entire repository onto your system. This can be done by clicking the Download button on the FigShare repository (DOI: 10.6084/m9.figshare.14404868), which downloads a zip file of the entire repository; unzip it to get access to the project files. Once the repository is downloaded, navigate to the root directory and open guppy-colour-learning-project.Rproj. It is important to open the project using the .Rproj file to ensure the working directory is set correctly. Then install the package dependencies using renv::restore(), which installs the correct versions of all the packages needed to reproduce our results. Packages are installed in a stand-alone library for this project and will not affect your installed R packages anywhere else. If you want to reproduce specific results, open either analysis-experiment-1.Rmd for results from experiment 1 or analysis-experiment-2.Rmd for results from experiment 2; both are located in the root directory. You can select the Run All option under the Code menu in the RStudio navbar to execute all the code chunks. You can also run chunks independently, though we advise doing so sequentially, since variables necessary for the analysis are created as the script progresses.

    Metadata
    Data are available in the data/ directory.
    - colour-learning-experiment-1-data.csv are the data for experiment 1
    - colour-learning-experiment-2-full-data.csv are the data for experiment 2
    We provide the variable descriptions for the data sets in the file metadata.md, located in the data/ directory. The packages required to conduct the analyses and construct the website, along with their versions and citations, are listed in the file required-r-packages.md.

    Directory structure
    - data/ contains the raw data used to conduct the analyses
    - docs/ contains the reader-friendly html write-up of the analyses; the GitHub Pages site is built from this folder
    - R/ contains custom R functions used in the analysis
    - references/ contains reference information and formatting for citations used in the project
    - renv/ contains an activation script and configuration files for the renv package manager
    - figs/ contains the individual files for the figures and residual diagnostic plots produced by the analysis scripts. This directory is created and populated by running analysis-experiment-1.Rmd, analysis-experiment-2.Rmd, and combined-figures.Rmd

    Root directory contents
    The root directory contains the Rmd scripts used to conduct the analyses, create figures, and render the website pages. Below we describe these files as well as the additional files in the root directory.
    - analysis-experiment-1.Rmd is the R code and documentation for the experiment 1 data preparation and analysis. This script generates the Analysis 1 page of the website.
    - analysis-experiment-2.Rmd is the R code and documentation for the experiment 2 data preparation and analysis. This script generates the Analysis 2 page of the website.
    - protocols.Rmd contains the protocols used to conduct the experiments and generate the data. This script generates the Protocols page of the website.
    - index.Rmd creates the Homepage of the project site.
    - combined-figures.Rmd is the R code used to create figures that combine data from experiments 1 and 2. Not used in the project site.
    - treatment-object-side-assignment.Rmd is the R code used to assign treatments and object sides during trials for experiment 2. Not used in the project site.
    - renv.lock is a JSON-formatted plain text file that contains package information for the project; renv installs the packages listed in this file upon executing renv::restore()
    - required-r-packages.md is a plain text file containing the versions and sources of the packages required for the project.
    - styles.css contains the CSS formatting for the rendered html pages
    - LICENSE.md contains the license indicating the conditions under which the code can be reused
    - guppy-colour-learning-project.Rproj is the R project file, which sets the working directory of the R instance to the root directory of this repository. If trying to run the code in this repository to reproduce results, it is important to open R by clicking on this .Rproj file.

  19. f

    Top 15 predictors for machine learning algorithms with a built-in importance...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Rebecca Craig-Schapiro; Max Kuhn; Chengjie Xiong; Eve H. Pickering; Jingxia Liu; Thomas P. Misko; Richard J. Perrin; Kelly R. Bales; Holly Soares; Anne M. Fagan; David M. Holtzman (2023). Top 15 predictors for machine learning algorithms with a built-in importance measure. [Dataset]. http://doi.org/10.1371/journal.pone.0018850.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Rebecca Craig-Schapiro; Max Kuhn; Chengjie Xiong; Eve H. Pickering; Jingxia Liu; Thomas P. Misko; Richard J. Perrin; Kelly R. Bales; Holly Soares; Anne M. Fagan; David M. Holtzman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ranking of the top 15 predictors for the four models with a built-in importance statistic demonstrates considerable overlap in the top predictors for each model. Furthermore, nearly all of the markers found to best discriminate CDR 0 from CDR>0 participants in the more targeted ROC analyses (Table 5) were also identified as the top predictors in the machine learning models, reconfirming their biomarker potential.

  20. f

    Vegetation index.

    • figshare.com
    xls
    Updated Mar 24, 2025
    Cite
    He Jing; Wang Bin; He Jiachen (2025). Vegetation index. [Dataset]. http://doi.org/10.1371/journal.pone.0319657.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    PLOS ONE
    Authors
    He Jing; Wang Bin; He Jiachen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As a key substance for crop photosynthesis, chlorophyll content is closely related to crop growth and health. Inversion of chlorophyll content from unmanned aerial vehicle (UAV) visible-light images can provide a theoretical basis for crop growth monitoring and health diagnosis. We used rice at the tasseling stage as the research object and obtained UAV visible orthophotos of two experimental fields, one planted manually (experimental area A) and one mechanically (experimental area B). We constructed 14 vegetation indices and 15 texture features and used the correlation coefficient method to analyze them comprehensively. Four vegetation indices and four texture features were then selected as feature variables for three models, K-nearest neighbors (KNN), decision tree (DT), and AdaBoost, used to invert chlorophyll content in experimental areas A and B. In the KNN model, the inversion model built with BGRI as the independent variable in area A has the highest accuracy, with R2 of 0.666 and RMSE of 0.79; the model built with RGRI in area B has the highest accuracy, with R2 of 0.729 and RMSE of 0.626. In the DT model, the model built with B-variance in area A has the highest accuracy, with R2 of 0.840 and RMSE of 0.464; the model built with G-mean in area B has the highest accuracy, with R2 of 0.845 and RMSE of 0.530. In the AdaBoost model, the model built with R-skewness in area A has the highest accuracy, with R2 of 0.826 and RMSE of 0.642; the model built with g in area B has the highest accuracy, with R2 of 0.879 and RMSE of 0.599. Overall, the best inversion models for experimental areas A and B were B-variance decision tree and g-AdaBoost, respectively. These models can quickly and accurately invert the chlorophyll content of rice and provide a theoretical basis for monitoring crop growth and health under different cultivation methods.
