Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains a fiber R object, stored as an RDS file for use in an R package for the analysis of long-read sequencing.
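A minimal sketch of loading such an object in R, assuming an illustrative file name fiber.rds:

# The file name is illustrative; readRDS restores the serialized fiber object.
fiber <- readRDS("fiber.rds")

# Inspect the object's class and structure before passing it to analysis functions.
class(fiber)
str(fiber, max.level = 1)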
The data represent web-scraping of hyperlinks from a selection of environmental stewardship organizations that were identified in the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017). There are two data sets: 1) the original scrape containing all hyperlinks within the websites and associated attribute values (see "README" file); 2) a cleaned and reduced dataset formatted for network analysis. For dataset 1: Organizations were selected from the 2017 NYC Stewardship Mapping and Assessment Project (STEW-MAP) (USDA 2017), a publicly available, spatial data set about environmental stewardship organizations working in New York City, USA (N = 719). To create a smaller and more manageable sample to analyze, all organizations that intersected (i.e., worked entirely within or overlapped) the NYC borough of Staten Island were selected for a geographically bounded sample. Only organizations with working websites that the web scraper could access were retained for the study (n = 78). The websites were scraped between 09 and 17 June 2020 to a maximum search depth of ten using the snaWeb package (version 1.0.1, Stockton 2020) in the R computational language environment (R Core Team 2020). For dataset 2: The complete scrape results were cleaned, reduced, and formatted as a standard edge-array (node1, node2, edge attribute) for network analysis. See the "README" file for further details. References: R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. Version 4.0.3. Stockton, T. (2020). snaWeb Package: An R package for finding and building social networks for a website, version 1.0.1. USDA Forest Service. (2017). Stewardship Mapping and Assessment Project (STEW-MAP). New York City Data Set. Available online at https://www.nrs.fs.fed.us/STEW-MAP/data/. This dataset is associated with the following publication: Sayles, J., R. Furey, and M. Ten Brink. How deep to dig: effects of web-scraping search depth on hyperlink network analysis of environmental stewardship organizations. Applied Network Science. Springer Nature, New York, NY, 7: 36, (2022).
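A minimal sketch of reading a cleaned edge array like dataset 2 into a network object with igraph; the file name edges.csv and its column layout are assumptions based on the description above:

library(igraph)

# Edge array laid out as (node1, node2, edge attribute), per the description above.
edges <- read.csv("edges.csv")

# The first two columns become the edge list; any remaining columns become edge attributes.
g <- graph_from_data_frame(edges, directed = TRUE)
summary(g)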
https://spdx.org/licenses/CC0-1.0.html
Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimization to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR, and the use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing, producing a highly accurate consensus sequence from each template. Handling of the large datasets produced by SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.
Methods
This serves as an overview of the analysis performed on the PacBio sequence data that is summarized in Analysis Flowchart.pdf and that served as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies".
Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005.
For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub.
The demultiplexed read collections from the chunked_demux pipeline, or the CCS read files from the datasets which were not indexed (M1567, M004, M005), were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer); the pipeline uses these to further demultiplex the reads and to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz), as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub.
The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Identifying_Recombinant_Reads, and Figures. Each contains an .Rmd file of the same name, which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results.
Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample passing various criteria. These are used to help create Table 2 and as input for Identifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, two fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py, which identifies any matched sequences that differ between the sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, the different sequence collections were saved and viewed in the Geneious program.
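A minimal sketch of the matched-sequence comparison step, shown here in R rather than the compare_seqs.py script actually used; the file names are illustrative and the Bioconductor Biostrings package is assumed:

library(Biostrings)

# Read the rank 1 dUMI consensus sequences and the matching sUMI sequences.
dumi <- readDNAStringSet("rank1_dUMI.fasta")
sumi <- readDNAStringSet("matching_sUMI.fasta")

# Pair records by name and report templates whose sUMI and dUMI consensus differ.
shared <- intersect(names(dumi), names(sumi))
discordant <- shared[as.character(dumi[shared]) != as.character(sumi[shared])]
length(discordant)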
To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed, and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment, and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family, as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper.
Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file for each sample was extracted from all of the tagged.tar.gz files, then combined to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd.
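A minimal sketch of that combining step, assuming the per-sample dUMI_ranked.csv files have already been extracted into an illustrative tagged/ directory:

# Locate every per-sample ranking file (directory layout is illustrative).
files <- list.files("tagged", pattern = "dUMI_ranked\\.csv$",
                    recursive = TRUE, full.names = TRUE)

# Read each file, tag rows with their sample of origin, and stack into one table.
all_umis <- do.call(rbind, lapply(files, function(f) {
  df <- read.csv(f)
  df$sample <- basename(dirname(f))
  df
}))

write.csv(all_umis, "dUMI_df.csv", row.names = FALSE)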
Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures, along with individual datasets for each figure. These were copied into the Prism software to create the final figures for the paper.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1
This dataverse contains the data referenced in Rieth et al. (2017). Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems. To be presented at Applied Human Factors and Ergonomics 2017.
Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files.
Each dataframe contains 55 columns:
Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions).
Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping).
Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for a total duration of 25 hours and 48 hours respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively.
Columns 4 to 55 contain the process variables; the column names retain the original variable names.
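A minimal sketch of loading one of these files and extracting a single simulation run; the file name is illustrative:

# 'load' restores the dataframe under its saved name, e.g. 'faulty_training'.
load("faulty_training.RData")

# Fault type 1, simulation run 1: 500 samples of 52 process variables plus 3 index columns.
run1 <- subset(faulty_training, faultNumber == 1 & simulationRun == 1)
dim(run1)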
This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.
By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms.
The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission.
In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights.
Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law.
When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work.
This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A csv file containing wetland plant flowering phenology data, pooled from observations in 2019 and 2020 at wetlands in Central New York, arranged in a numeric week format that can be read by R to produce a visual graphic using the 'geom_line' function in the ggplot2 package.
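A minimal sketch of such a graphic, assuming hypothetical column names week, species, and n_flowering in an illustrative phenology.csv:

library(ggplot2)

# Column and file names are assumptions; the actual csv layout may differ.
pheno <- read.csv("phenology.csv")

ggplot(pheno, aes(x = week, y = n_flowering, colour = species)) +
  geom_line() +
  labs(x = "Week (numeric)", y = "Flowering observations")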
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the survey of income and program participation (sipp) with r

if the census bureau's budget was gutted and only one complex sample survey survived, pray it's the survey of income and program participation (sipp). it's giant. it's rich with variables. it's monthly. it follows households over three, four, now five year panels. the congressional budget office uses it for their health insurance simulation. analysts read that sipp has person-month files, get scurred, and retreat to inferior options. the american community survey may be the mount everest of survey data, but sipp is most certainly the amazon. questions swing wild and free through the jungle canopy i mean core data dictionary. legend has it that there are still species of topical module variables that scientists like you have yet to analyze. ponce de león would've loved it here. ponce. what a name. what a guy.

the sipp 2008 panel data started from a sample of 105,663 individuals in 42,030 households. once the sample gets drawn, the census bureau surveys one-fourth of the respondents every four months, over four or five years (panel durations vary). you absolutely must read and understand pdf pages 3, 4, and 5 of this document before starting any analysis (start at the header 'waves and rotation groups'). if you don't comprehend what's going on, try their survey design tutorial. since sipp collects information from respondents regarding every month over the duration of the panel, you'll need to be hyper-aware of whether you want your results to be point-in-time, annualized, or specific to some other period. the analysis scripts below provide examples of each. at every four-month interview point, every respondent answers every core question for the previous four months. after that, wave-specific addenda (called topical modules) get asked, but generally only regarding a single prior month. to repeat: core wave files contain four records per person, topical modules contain one. if you stacked every core wave, you would have one record per person per month for the duration of the panel. mmmassive. ~100,000 respondents x 12 months x ~4 years. have an analysis plan before you start writing code so you extract exactly what you need, nothing more. better yet, modify something of mine. cool? this new github repository contains eight, you read me, eight scripts:

1996 panel - download and create database.R
2001 panel - download and create database.R
2004 panel - download and create database.R
2008 panel - download and create database.R
- since some variables are character strings in one file and integers in another, initiate an r function to harmonize variable class inconsistencies in the sas importation scripts
- properly handle the parentheses seen in a few of the sas importation scripts
- because the SAScii package currently does not create an rsqlite database, initiate a variant of the read.SAScii function that imports ascii data directly into a sql database (.db)
- download each microdata file - weights, topical modules, everything - then read 'em into sql

2008 panel - full year analysis examples.R
- define which waves and specific variables to pull into ram, based on the year chosen
- loop through each of twelve months, constructing a single-year temporary table inside the database
- read that twelve-month file into working memory, then save it for faster loading later if you like
- read the main and replicate weights columns into working memory too, merge everything
- construct a few annualized and demographic columns using all twelve months' worth of information
- construct a replicate-weighted complex sample design with a fay's adjustment factor of one-half, again save it for faster loading later, only if you're so inclined
- reproduce census-published statistics, not precisely (due to topcoding described here on pdf page 19)

2008 panel - point-in-time analysis examples.R
- define which wave(s) and specific variables to pull into ram, based on the calendar month chosen
- read that interview point (srefmon)- or calendar month (rhcalmn)-based file into working memory
- read the topical module and replicate weights files into working memory too, merge it like you mean it
- construct a few new, exciting variables using both core and topical module questions
- construct a replicate-weighted complex sample design with a fay's adjustment factor of one-half
- reproduce census-published statistics, not exactly cuz the authors of this brief used the generalized variance formula (gvf) to calculate the margin of error - see pdf page 4 for more detail - the friendly statisticians at census recommend using the replicate weights whenever possible. oh hayy, now it is.

2008 panel - median value of household assets.R
- define which wave(s) and specific variables to pull into ram, based on the topical module chosen
- read the topical module and replicate weights files into working memory too, merge once again
- construct a replicate-weighted complex sample design with a...
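a rough sketch of the database-loading idea behind those download scripts - pulling a fixed-width sipp ascii file straight into an rsqlite database with the SAScii package. the file names here are illustrative stand-ins, not the repository's actual code:

library(SAScii)   # parses sas input statements to read fixed-width ascii files
library(DBI)
library(RSQLite)

# illustrative names - swap in the real sipp ascii file and its sas read-in script
db <- dbConnect(SQLite(), "sipp08.db")
w1 <- read.SAScii("l08puw1.dat", sas_ri = "p08putm1.sas")
dbWriteTable(db, "w1", w1, overwrite = TRUE)
dbDisconnect(db)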
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Brazil Imports: NCM: fob: Discs F/Laser Read.Syst.Which May Be Record.Once(Cd-R) data was reported at 1.129 USD mn in Jan 2007. This records a decrease from the previous number of 7.819 USD mn for Dec 2006. Brazil Imports: NCM: fob: Discs F/Laser Read.Syst.Which May Be Record.Once(Cd-R) data is updated monthly, averaging 1.129 USD mn from Jan 2004 (Median) to Jan 2007, with 37 observations. The data reached an all-time high of 8.012 USD mn in Oct 2006 and a record low of 0.102 USD mn in Jan 2004. Brazil Imports: NCM: fob: Discs F/Laser Read.Syst.Which May Be Record.Once(Cd-R) data remains active status in CEIC and is reported by Special Secretariat for Foreign Trade and International Affairs. The data is categorized under Brazil Premium Database’s Foreign Trade – Table BR.NCM: HS85: Electrical Machinery and Equipment and Parts Thereof; Others: Imports: Value.
This dataset was created by SuperAI_2021
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
R script for reading hdf5 files
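A minimal sketch of one common approach, using the Bioconductor rhdf5 package; the file and dataset names are illustrative:

library(rhdf5)

# List the groups and datasets in the file, then read one dataset by its path.
h5ls("data.h5")
x <- h5read("data.h5", "group1/dataset1")
str(x)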
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
R scripts to accompany the publication listed below. The methodology and results demonstrate for the first time the feasibility of coalescence and near-perfect simulation for fairly general continuous distributions. This constitutes a quantum leap in MCMC convergence testing by decoupling the convergence error from experimental error and providing, for a given set of starting points, an exact number of iterations in which coalescence occurs.
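A toy sketch of the coalescence idea behind such convergence testing: coupled chains started from different points share randomness until they merge. This two-state example is illustrative only, not the paper's method for continuous distributions:

# Two-state chain: run coupled copies from every starting point with shared
# randomness, and count the exact number of iterations until they coalesce.
set.seed(1)
P <- matrix(c(0.7, 0.3, 0.4, 0.6), nrow = 2, byrow = TRUE)
step <- function(s, u) if (u < P[s, 1]) 1L else 2L

states <- c(1L, 2L)   # the set of starting points
iters <- 0L
while (length(unique(states)) > 1) {
  u <- runif(1)                      # shared random number couples the chains
  states <- vapply(states, step, integer(1), u = u)
  iters <- iters + 1L
}
c(coalesced_state = states[1], iterations = iters)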
This is Census 2020 block data specifically formatted for use by the Environmental Protection Agency (EPA) in-development Environmental Justice Analysis Multisite (EJAM) tool, which uses R code to find which block centroids are within X miles of each specified point (e.g., regulated facility), and to find those distances. The datasets have the latitude and longitude of each block's internal point, as provided by the Census Bureau, and the FIPS code of the block and its parent block group. The datasets also include a weight for each block, representing the block's Census 2020 population count as a fraction of the count for the parent block group overall, for use in estimating how much of a given block group is within X miles of a specified point or inside a polygon of interest. The datasets also have an effective radius for each block: the radius in miles the block would have if it covered the same area but were circular. Finally, the datasets have coordinates in units that facilitate building a quadtree index of locations. They are in R data.table format, saved as .rda or .arrow files to be read by R code.
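A minimal sketch of the centroid-distance test described above, assuming a hypothetical blocks.rda that restores a data.table named blocks with lat and lon columns:

library(data.table)

load("blocks.rda")  # illustrative; restores a data.table named 'blocks'

# Great-circle distance in miles from one point to every block internal point.
haversine_mi <- function(lat1, lon1, lat2, lon2, r = 3959) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Blocks whose internal points fall within 3 miles of a facility at (40.7, -74.0).
near <- blocks[haversine_mi(40.7, -74.0, lat, lon) <= 3]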
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files are the migrate input, created using the R files to read in vcf data and write it out in a migrate-n useable format.
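A minimal sketch of the reading half of that conversion, using the vcfR package; the file name is illustrative and the migrate-n writer is only stubbed:

library(vcfR)

# Read the variant calls and extract the genotype matrix (variants x samples).
vcf <- read.vcfR("input.vcf")
gt <- extract.gt(vcf)

# From here, genotypes would be recoded per population and written out in
# migrate-n's expected text layout (writer omitted).
dim(gt)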
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of tree (.tre) files and R code for running generalized Robinson-Foulds distance (Smith, 2020a,b) analysis.
The .tre files can be read into R (R Core Team, 2023) using the ape::read.tree function (Paradis et al., 2004); full details are in the R code file.
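A minimal sketch of that workflow, with illustrative file names:

library(ape)       # read.tree (Paradis et al., 2004)
library(TreeDist)  # generalized Robinson-Foulds distances (Smith, 2020b)

# Read two trees and compute an information-theoretic generalized RF distance.
tree1 <- read.tree("tree1.tre")
tree2 <- read.tree("tree2.tre")
TreeDistance(tree1, tree2)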
Paradis, E., Claude, J., & Strimmer, K. (2004). APE: analyses of phylogenetics and evolution in R language. Bioinformatics, 20(2), 289-290.
R Core Team. (2023). R: A Language and Environment for Statistical Computing. (Version 4.2.2). R Foundation for Statistical Computing, Vienna, Austria: https://www.R-project.org/.
Smith, M. R. (2020a). Information theoretic generalized Robinson–Foulds metrics for comparing phylogenetic trees. Bioinformatics, 36(20), 5007-5013. https://doi.org/10.1093/bioinformatics/btaa614
Smith, M. R. (2020b). TreeDist: distances between phylogenetic trees. R package version 2.7.0. https://doi.org/10.5281/zenodo.3528124
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description file for the provided code to re-run the statistical analyses
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the consumer expenditure survey (ce) with r

the consumer expenditure survey (ce) is the primo data source to understand how americans spend money. participating households keep a running diary about every little purchase over the year. those diaries are then summed up into precise expenditure categories. how else are you gonna know that the average american household spent $34 (±2) on bacon, $826 (±17) on cellular phones, and $13 (±2) on digital e-readers in 2011? an integral component of the market basket calculation in the consumer price index, this survey recently became available as public-use microdata and they're slowly releasing historical files back to 1996. hooray! for a taste of what's possible with ce data, look at the quick tables listed on their main page - these tables contain approximately a bazillion different expenditure categories broken down by demographic groups. guess what? i just learned that americans living in households with $5,000 to $9,999 of annual income spent an average of $283 (±90) on pets, toys, hobbies, and playground equipment (pdf page 3). you can often get close to your statistic of interest from these web tables. but say you wanted to look at domestic pet expenditure among only households with children between 12 and 17 years old. another one of the thirteen web tables - the consumer unit composition table - shows a few different breakouts of households with kids, but none matching that exact population of interest. the bureau of labor statistics (bls) (the survey's designers) and the census bureau (the survey's administrators) have provided plenty of the major statistics and breakouts for you, but they're not psychic. if you want to comb through this data for specific expenditure categories broken out by a you-defined segment of the united states' population, then let a little r into your life. fun starts now. fair warning: only analyze the consumer expenditure survey if you are nerd to the core. the microdata ship with two different survey types (interview and diary), each containing five or six quarterly table formats that need to be stacked, merged, and manipulated prior to a methodologically-correct analysis. the scripts in this repository contain examples to prepare 'em all, just be advised that magnificent data like this will never be no-assembly-required. the folks at bls have posted an excellent summary of what's available - read it before anything else. after that, read the getting started guide. don't skim. a few of the descriptions below refer to sas programs provided by the bureau of labor statistics. you'll find these in the C:\My Directory\CES\2011\docs directory after you run the download program.
this new github repository contains three scripts:

2010-2011 - download all microdata.R
- loop through every year and download every file hosted on the bls's ce ftp site
- import each of the comma-separated value files into r with read.csv
- depending on user-settings, save each table as an r data file (.rda) or stata-readable file (.dta)

2011 fmly intrvw - analysis examples.R
- load the r data files (.rda) necessary to create the 'fmly' table shown in the ce macros program documentation.doc file
- construct that 'fmly' table, using five quarters of interviews (q1 2011 thru q1 2012)
- initiate a replicate-weighted survey design object
- perform some lovely li'l analysis examples
- replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using unimputed variables
- replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using unimputed variables
- create an rsqlite database (to minimize ram usage) containing the five imputed variable files, after identifying which variables were imputed based on pdf page 3 of the user's guide to income imputation
- initiate a replicate-weighted, database-backed, multiply-imputed survey design object
- perform a few additional analyses that highlight the modified syntax required for multiply-imputed survey designs
- replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using imputed variables
- replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using imputed variables
- replicate the %proc_reg() and %proc_logistic() macros found in "ce macros.sas" and provide some examples of regressions and logistic regressions using both unimputed and imputed variables

replicate integrated mean and se.R
- match each step in the bls-provided sas program "integrated mean and se.sas" but with r instead of sas
- create an rsqlite database when the expenditure table gets too large for older computers to handle in ram
- export a table "2011 integrated mean and se.csv" that exactly matches the contents of the sas-produced "2011 integrated mean and se.lst" text file

click here to view these three scripts for...
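a rough sketch of the replicate-weighted design step those analysis scripts initiate, using the survey package. the weight names (finlwt21, wtrep01-wtrep44) follow ce's public-use layout, but treat this as an illustration rather than the repository's actual code:

library(survey)

# assume 'fmly' is the stacked data.frame of quarterly interview files
ce.design <-
  svrepdesign(
    weights = ~finlwt21,            # main weight
    repweights = "wtrep[0-9]+",     # matches the 44 replicate weight columns
    data = fmly,
    type = "BRR",
    combined.weights = TRUE
  )

# weighted mean of a (hypothetical) total expenditure column
svymean(~totexp, ce.design)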
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive contains the source code and data required for reproducing the analyses and figures from the manuscript "A lifetime of experiences: Modeling habitat quality through cumulative effects on individual survival", accepted for publication by the journal Methods in Ecology and Evolution on 2025-04-07. Code and data are included to:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An Open Context "predicates" dataset item. Open Context publishes structured data as granular, URL identified Web resources. This "Variables" record is part of the "Pyla-Koutsopetria Archaeological Project I: Pedestrian Survey" data publication.
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
Receptor impact models (RIMs) use inputs from surface water and groundwater models. For a given node, there is a value for each combination of hydrological response variable, future, and replicate or run number. RIMs are developed for specific landscape classes. The hydrological response variables that a RIM within a landscape class requires are organised by the R script RIM_Prediction_CreateArray.R into an array. The formatted data are available in RDS, an R data file format that can be read directly into R. The R script IMIA_XXX_RIM_predictions.R applies the receptor model functions (RDS object, part of Data set 1: Ecological expert elicitation and receptor impact models for the XXX subregion) to the HRV array for each landscape class (or landscape group) to make predictions of receptor impact variables (RIVs). Predictions of a receptor impact from a RIM for a landscape class are summarised at relevant AUIDs by the 5th through to the 95th percentiles (in 5% increments) for baseline and CRDP futures. These are available in the XXX_RIV_quantiles_IMIA.csv data set. RIV predictions are further summarised and compared as boxplots (using the R script boxplotsbyfutureperiod.R) and as (aggregated) spatial risk maps using GIS.
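A minimal sketch of the RDS-based predict-then-summarise pattern described above; the object and file names are illustrative stand-ins for the actual scripts:

# Illustrative stand-ins for the RDS objects the scripts read and produce.
rim_fun <- readRDS("receptor_model_functions.rds")  # maps an HRV value to a RIV prediction
hrv <- readRDS("hrv_array.rds")                     # matrix: replicates (rows) x AUIDs (columns)

# Predict a RIV for every replicate at every AUID, then summarise each AUID
# by the 5th through 95th percentiles in 5% increments.
riv <- apply(hrv, c(1, 2), rim_fun)
riv_quantiles <- apply(riv, 2, quantile, probs = seq(0.05, 0.95, by = 0.05))
write.csv(riv_quantiles, "RIV_quantiles_illustrative.csv")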
Bioregional Assessment Programme (XXXX) GLO Ecological expert elicitation and receptor impact models v01. Bioregional Assessment Derived Dataset. Viewed 12 July 2018, http://data.bioregionalassessments.gov.au/dataset/76fb9d24-b8db-4251-b944-f69f983507ff.
Derived From Groundwater Dependent Ecosystems supplied by the NSW Office of Water on 13/05/2014
Derived From Greater Hunter Native Vegetation Mapping with Classification for Mapping
Derived From BA ALL mean annual flow for NSW - Choudhury implementation of Budyko runoff v01
Derived From Bioregional Assessment areas v06
Derived From Bioregional Assessment areas v04
Derived From Bioregional Assessment areas v02
Derived From Gippsland Project boundary
Derived From Natural Resource Management (NRM) Regions 2010
Derived From GLO subregion boundaries for Impact and Risk Analysis 20160712 v01
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Derived From Bioregional_Assessment_Programme_Catchment Scale Land Use of Australia - 2014
Derived From GEODATA TOPO 250K Series 3
Derived From Australian Geological Provinces, v02
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From GLO Preliminary Assessment Extent
Derived From Australian Coal Basins
Derived From Gloucester River Types v01
Derived From Bioregional Assessment areas v03
Derived From Bioregional Assessment areas v05
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From GLO Landscape Classification 20161219 v05
Derived From Gloucester river types V02
Derived From Gloucester Coal Basin
Derived From Greater Hunter Native Vegetation Mapping
Derived From Mean Annual Climate Data of Australia 1981 to 2012
Derived From Subcatchment boundaries within and nearby the Gloucester subregion
Derived From Bioregional Assessment areas v01
Derived From Geofabric Hydrology Reporting Catchments - V2.1
Derived From Victoria - Seamless Geology 2014