Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset generated by our Microwell-seq 3.0 technique.
Files: To save space, the data are packaged as tar.gz archives. Please extract the archives after downloading.
RNA_WT_RData.tar.gz: Seurat object with metadata (cell barcodes, tissue source, and cell type annotation); it can be loaded directly into an R environment.
RNA_Tumor_RData.tar.gz: Seurat object with metadata (cell barcodes, tissue source, cell type annotation, and predicted potential cell state: neoplastic, intermediate, or non-neoplastic); it can be loaded directly into an R environment.
RNA_WT_Dge.tar.gz: Digital expression data (.csv format) generated by Drop-seq tools, with batch effects removed by custom scripts.
RNA_Tumor_Dge.tar.gz: Digital expression data (.csv format) generated by Drop-seq tools, with batch effects removed by custom scripts.
ATAC_WT_SparseMatrix.tar.gz: scATAC-seq data in 10X-like format (matrix.mtx, barcodes.csv, features.csv), with metadata (cell barcodes, tissue source, and cell type annotation).
ATAC_Tumor_SparseMatrix.tar.gz: scATAC-seq data in 10X-like format (matrix.mtx, barcodes.csv, features.csv), with metadata (cell barcodes, tissue source, cell type annotation, and predicted potential cell state: neoplastic, intermediate, or non-neoplastic).
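As a minimal sketch of the extraction step, the following Python snippet unpacks one of the archives using only the standard library. The function name `extract_archive` and the destination directory are illustrative assumptions, not part of the dataset; the internal layout of each archive is not documented here, so the function simply returns whatever files it finds.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Skip members that would be written outside dest (path traversal guard).
            if target != dest and dest not in target.parents:
                continue
            tar.extract(member, path=dest)
            if member.isfile():
                extracted.append(target)
    return sorted(extracted)
```

For example, `extract_archive("ATAC_WT_SparseMatrix.tar.gz", "ATAC_WT")` would be expected to yield the matrix.mtx, barcodes.csv, and features.csv files described above, plus the metadata file.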
Easements resulting from the application of Articles L. 515-8 to L. 515-12 of the Environment Code
1/ Easements that may be established under Article L. 515-8 within a perimeter defined around an installation classified for the protection of the environment (ICPE) that is liable to create, through the danger of explosion or the release of harmful products, very significant risks to the health or safety of neighbouring populations and to the environment (installations subject to authorisation with easements, referenced AS in the ICPE nomenclature annexed to Article R. 511-9 of the Environment Code).
These easements may include:
- prohibition or limitation of the right to erect buildings or structures, and of the right to develop campgrounds or caravan parks;
- making building permits subject to technical requirements aimed at limiting the danger from exposure to explosions or concerning the insulation of buildings against toxic fumes;
- limitation of the number of employees in industrial and commercial facilities created subsequently.
2/ Easements that may be established under Article L. 515-12:
- on land polluted by the operation of an installation;
- on the right-of-way of waste disposal sites or in a 200-metre strip around the area of operation;
- on the right-of-way of former quarries, or around them, on surfaces whose integrity is necessary for public safety and public health.
In addition to the prohibitions and requirements listed under item 1/, these easements may include:
- prohibition or limitation of changes to the condition of the soil or subsoil;
- limitation of the use of soil, subsoil and groundwater;
- making these uses subject to the implementation of specific requirements;
- implementation of site monitoring requirements.
The generator of a public easement is a geographical entity whose nature or function imposes, by virtue of regulations, constraints on how the surrounding land may be occupied. The disappearance or destruction of the generator on the site does not remove the easement(s) associated with it. Only a new act of annulment or repeal by the competent authority can legally end the effects of the easement(s) in question.
Simplify your research data collection with the help of the research data repository managed by the Terrestrial Ecosystem Research Network. Our collection of ecosystem data includes ecoacoustics, bioacoustics, leaf area index information and much more.
The TERN research data collection provides analysis-ready environment data that facilitates a wide range of ecological research projects undertaken by established and emerging scientists from Australia and around the world. The resources which we provide support scientific investigation in a wide array of environment and climate research fields along with decision-making initiatives.
Open access ecosystem data collections via the TERN Data Discovery Portal and sub-portals:
Access all TERN Environment Data
Discover datasets published by TERN’s observing platforms and collaborators. Search geographically, then browse, query and extract the data via the TERN Data Discovery Portal.
Search EcoPlots data
Search, integrate and access Australia’s plot-based ecology survey data.
Download ausplotsR
Extract, prepare, visualise and analyse TERN Ecosystem Surveillance monitoring data in R.
Search EcoImages
Search and download Leaf Area Index (LAI), Phenocam and Photopoint images.
Tools that support the discovery, analysis and re-use of data:
Visualise the data
We’ve teamed up with ANU to provide 50 landscape and ecosystem datasets presented graphically.
Access CoESRA Virtual Desktop
A virtual desktop environment that enables users to create, execute and share environmental data simulations.
Submit data with SHaRED
Our user-friendly tool for uploading your data securely to our environment database so you can contribute to Australia’s ecological research.
The Soil and Landscape Grid of Australia provides relevant, consistent, comprehensive, nation-wide data in an easily-accessible format. It provides detailed digital maps of the country’s soil and landscape attributes at a finer resolution than ever before in Australia.
The annual Australia’s Environment products summarise a large amount of observations on the trajectory of our natural resources and ecosystems. Use the data explorer to view and download maps, accounts or charts by region and land use type. The website also has national summary reports and report cards for different types of administrative and geographical regions.
TERN’s ausplotsR is an R package for extracting, preparing, visualising and analysing TERN’s Ecosystem Surveillance monitoring data. Users can access plot-based data on vegetation and soils across Australia directly, with simple function calls to extract the data and merge them into species occurrence matrices for analysis, or to calculate metrics such as basal area and fractional cover.
The Australian Cosmic-Ray Neutron Soil Moisture Monitoring Network (CosmOz) delivers soil moisture data for 16 sites over an area of about 30 hectares to depths in the soil of between 10 and 50 cm. In 2020, the CosmOz soil moisture network, which is led by CSIRO, is set to be expanded to 23 sites.
The TERN Mangrove Data Portal provides a diverse range of historical and contemporary remotely-sensed datasets on extent and change of mangrove ecosystems across Australia. It includes multi-scale field measurements of mangrove floristics, structure and biomass, a diverse range of airborne imagery collected since the 1950s, and multispectral and hyperspectral imagery captured by drones, aircraft and satellites.
The TERN Wetlands and Riparian Zones Data Portal provides access to relevant national to local remotely-sensed datasets and also facilitates the collation and collection of on-ground data that support validation.
ecocloud provides easy access to large volumes of curated ecosystem science data and tools, a computing platform, and resources and tools for innovative research. ecocloud gives you 10GB of persistent storage to keep your code and notebooks so they are ready to go when you start up a server (R or Python environment). It uses the JupyterLab interface, which includes connections to GitHub, Google Drive and Dropbox.
Our research data collection makes it easier for scientists and researchers to investigate and answer their questions by providing them with open data, research and management tools, infrastructure, and site-based research tools.
The TERN data portal provides open access ecosystem data. Our tools support data discovery, analysis, and re-use. The services we provide facilitate research, education, and management. We maintain a network of monitoring sites and sensor data streams for long-term research as part of our research data repository.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
Species characterization by environmental DNA (eDNA) is a method that uses DNA released into the environment by organisms from various sources (secretions, faeces, gametes, tissues, etc.). It is a complementary tool to standard sampling methods for the identification of biodiversity. This project provides a list of fish and marine mammal species whose DNA was detected in water samples collected between 2019 and 2021 using the mitochondrial marker MiFish (12S). The surveys were carried out in the summer of 2019 (July 14-18 and July 30 - August 5), in the fall of 2020 (October 27-28), and in the summer-fall of 2021 (May 31 - June 3 and August 24-25) between Forestville and Godbout (Haute-Côte-Nord). Sampling was carried out at depths of 1 to 50 meters at 91 stations, with 1 to 3 replicates per station. Two liters of water were filtered through a 1.2 µm fiberglass filter. DNA extractions were performed with the DNeasy Blood and Tissues or PowerWater extraction kit (Qiagen). Negative field, extraction, and PCR controls were added at the different stages of the protocol. The libraries were prepared either by Génome Québec (2019, 2020) or by the Genomics Laboratory of the Maurice-Lamontagne Institute (2021), then sequenced on a NovaSeq 4000 PE250 system by Génome Québec. The bioinformatic analysis of the sequences was carried out using a pipeline developed in the genomics laboratory. A first step produced a table of molecular operational taxonomic units (MOTUs) using the cutadapt software to remove adapters and the R package DADA2 for filtering, merging of reads, chimera removal, and compilation of the data. The MOTU table was then corrected using the R package metabaR to eliminate tag-jumping and account for contaminants. Samples showing a strong presence of contaminating MOTUs were removed from the dataset.
The MOTUs were also filtered to remove any remaining adapter sequences and to retain only those of the expected size (around 170 bp). Finally, taxonomic assignments were made on the MOTUs using the BLAST+ program and the NCBI-nt database. Taxonomic levels (species, genus or family) were assigned using a best-match method (top hit), with a threshold of 95%. Only assignments to fish and marine mammals were considered, and the taxa detected were compared to a list of regional species and corrected if necessary. The species detections from the different replicates were combined. The file provided includes generic activity information, including site, station name, date, marker type, assignment types used for taxa identification, and a list of taxa or species. The list of taxa has been verified by a biodiversity expert from the Maurice-Lamontagne Institute. This project was funded by Fisheries and Oceans Canada's Coastal Environmental Baseline Data Program under the Oceans Protection Plan. This initiative aims to acquire baseline environmental data that contributes to the characterization of significant coastal areas and supports evidence-based assessments and management decisions to preserve marine ecosystems. Data were also published on the SLGO platform: https://doi.org/10.26071/ogsl-2239bca5-c24a
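The top-hit assignment with a 95% threshold can be sketched in Python as follows. This is an illustration under stated assumptions, not the laboratory's pipeline: the hit records, field names (`motu`, `taxon`, `pident`, `bitscore`), and the use of bit score to rank hits are all hypothetical, and the choice between species, genus, or family level is not modelled.

```python
def assign_top_hits(hits, min_identity=95.0):
    """Keep the best-scoring hit per MOTU; assign its taxon only if identity passes.

    hits: list of dicts with keys 'motu', 'taxon', 'pident', 'bitscore'
    (hypothetical field names). Returns {motu: taxon or None}.
    """
    best = {}
    for hit in hits:
        current = best.get(hit["motu"])
        # Assumption: "top hit" means the hit with the highest bit score.
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["motu"]] = hit
    return {
        motu: (hit["taxon"] if hit["pident"] >= min_identity else None)
        for motu, hit in best.items()
    }


# Hypothetical BLAST results for two MOTUs.
example_hits = [
    {"motu": "MOTU_1", "taxon": "Gadus morhua", "pident": 99.4, "bitscore": 310},
    {"motu": "MOTU_1", "taxon": "Gadus macrocephalus", "pident": 97.0, "bitscore": 290},
    {"motu": "MOTU_2", "taxon": "Mallotus villosus", "pident": 91.2, "bitscore": 250},
]
assignments = assign_top_hits(example_hits)
```

Here `MOTU_2` is left unassigned because its best hit falls below the 95% identity threshold.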
https://spdx.org/licenses/CC0-1.0.html
Standardized data on large-scale and long-term patterns of species richness are critical for understanding the consequences of natural and anthropogenic changes in the environment. The North American Breeding Bird Survey (BBS) is one of the largest and most widely used sources of such data, but so far, little is known about the degree to which BBS data provide accurate estimates of regional richness. Here we test this question by comparing estimates of regional richness based on BBS data with spatially and temporally matched estimates based on state Breeding Bird Atlases (BBA). We expected that estimates based on BBA data would provide a more complete (and therefore, more accurate) representation of regional richness due to their larger number of observation units and higher sampling effort within the observation units. Our results were only partially consistent with these predictions: while estimates of regional richness based on BBA data were higher than those based on BBS data, estimates of local richness (number of species per observation unit) were higher in BBS data. The latter result is attributed to higher land-cover heterogeneity in BBS units and higher effectiveness of bird detection (more species are detected per unit time). Interestingly, estimates of regional richness based on BBA blocks were higher than those based on BBS data even when differences in the number of observation units were controlled for. Our analysis indicates that this difference was due to higher compositional turnover between BBA units, probably due to larger differences in habitat conditions between BBA units and a larger number of geographically restricted species. Our overall results indicate that estimates of regional richness based on BBS data suffer from incomplete detection of a large number of rare species, and that corrections of these estimates based on standard extrapolation techniques are not sufficient to remove this bias. 
Future applications of BBS data in ecology and conservation, and in particular, applications in which the representation of rare species is important (e.g., those focusing on biodiversity conservation), should be aware of this bias, and should integrate BBA data whenever possible.
Methods Overview
This is a compilation of second-generation breeding bird atlas data and corresponding breeding bird survey data. It contains presence-absence breeding bird observations in 5 U.S. states (MA, MI, NY, PA, VT), sampling effort per sampling unit, geographic location of sampling units, and environmental variables per sampling unit: elevation and elevation range (from SRTM), mean annual precipitation and mean summer temperature (from PRISM), and NLCD 2006 land-use data.
Each row contains all observations per sampling unit, with additional tables containing information on sampling effort impact on richness, a rareness table of species per dataset, and two summary tables for both bird diversity and environmental variables.
The methods for compilation are contained in the supplementary information of the manuscript but also here:
Bird data
For BBA data, shapefiles for blocks and the data on species presences and sampling effort in blocks were received from the atlas coordinators. For BBS data, shapefiles for routes and raw species data were obtained from the Patuxent Wildlife Research Center (https://databasin.org/datasets/02fe0ebbb1b04111b0ba1579b89b7420 and https://www.pwrc.usgs.gov/BBS/RawData).
Using ArcGIS Pro© 10.0, species observations were joined to respective BBS and BBA observation units shapefiles using the Join Table tool. For both BBA and BBS, a species was coded as either present (1) or absent (0). Presence in a sampling unit was based on codes 2, 3, or 4 in the original volunteer birding checklist codes (possible breeder, probable breeder, and confirmed breeder, respectively), and absence was based on codes 0 or 1 (not observed and observed but not likely breeding). Spelling inconsistencies of species names between BBA and BBS datasets were fixed. Species that needed spelling fixes included Brewer’s Blackbird, Cooper’s Hawk, Henslow’s Sparrow, Kirtland’s Warbler, LeConte’s Sparrow, Lincoln’s Sparrow, Swainson’s Thrush, Wilson’s Snipe, and Wilson’s Warbler. In addition, naming conventions were matched between BBS and BBA data. The Alder and Willow Flycatchers were lumped into Traill’s Flycatcher and regional races were lumped into a single species column: Dark-eyed Junco regional types were lumped together into one Dark-eyed Junco, Yellow-shafted Flicker was lumped into Northern Flicker, Saltmarsh Sparrow and the Saltmarsh Sharp-tailed Sparrow were lumped into Saltmarsh Sparrow, and the Yellow-rumped Myrtle Warbler was lumped into Myrtle Warbler (currently named Yellow-rumped Warbler). Three hybrid species were removed: Brewster's and Lawrence's Warblers and the Mallard x Black Duck hybrid. Established “exotic” species were included in the analysis since we were concerned only with detection of richness and not of specific species.
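The presence coding and name lumping described above can be sketched in Python. This is an illustrative reconstruction, not the authors' workflow (which used ArcGIS and R); the function names are hypothetical, and the lookup table lists only the lumpings that the text names explicitly.

```python
# Breeding codes from the original checklists, as described in the text.
PRESENT_CODES = {2, 3, 4}  # possible, probable, confirmed breeder
ABSENT_CODES = {0, 1}      # not observed, observed but not likely breeding

# Name lumpings stated in the text (Dark-eyed Junco races omitted: not itemised).
LUMPED_NAMES = {
    "Alder Flycatcher": "Traill's Flycatcher",
    "Willow Flycatcher": "Traill's Flycatcher",
    "Yellow-shafted Flicker": "Northern Flicker",
    "Saltmarsh Sharp-tailed Sparrow": "Saltmarsh Sparrow",
    "Yellow-rumped Myrtle Warbler": "Myrtle Warbler",
}


def code_to_presence(code):
    """Map a breeding code to 1 (present) or 0 (absent)."""
    if code in PRESENT_CODES:
        return 1
    if code in ABSENT_CODES:
        return 0
    raise ValueError(f"unknown breeding code: {code!r}")


def harmonize_name(name):
    """Return the lumped species name, or the name unchanged if no lumping applies."""
    return LUMPED_NAMES.get(name, name)
```

Raising an error on unrecognised codes, rather than silently treating them as absences, is a design choice made for this sketch.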
The resultant species tables with sampling effort were pivoted horizontally so that every row was a sampling unit and each species observation was a column. This was done for each state using R version 3.6.2 (R© 2019, The R Foundation for Statistical Computing Platform) and all state tables were merged to yield one BBA and one BBS dataset. Following the joining of environmental variables to these datasets (see below), BBS and BBA data were joined using rbind.data.frame in R© to yield a final dataset with all species observations and environmental variables for each observation unit.
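The pivot-and-stack step above was done in R; the following Python sketch shows the same idea with plain dictionaries. The function names and the example records are hypothetical, and filling species missing from one source with 0 is an assumption of this sketch rather than something stated in the methods.

```python
def pivot_wide(records):
    """Pivot (unit, species, presence) records so each unit is one row of species columns."""
    records = list(records)
    species = sorted({sp for _, sp, _ in records})
    table = {}
    for unit, sp, presence in records:
        row = table.setdefault(unit, dict.fromkeys(species, 0))
        row[sp] = max(row[sp], presence)
    return table


def stack_sources(bba, bbs):
    """Row-bind the BBA and BBS tables, labelling each row with its source."""
    species = sorted({sp for t in (bba, bbs) for row in t.values() for sp in row})
    stacked = []
    for source, table in (("BBA", bba), ("BBS", bbs)):
        for unit, row in table.items():
            # Assumption: a species never recorded in a source counts as absent (0).
            values = {sp: row.get(sp, 0) for sp in species}
            stacked.append({"source": source, "unit": unit, **values})
    return stacked


# Hypothetical observations.
bba = pivot_wide([("block1", "Wood Thrush", 1), ("block1", "Ovenbird", 0),
                  ("block2", "Ovenbird", 1)])
bbs = pivot_wide([("route1", "Ovenbird", 1)])
rows = stack_sources(bba, bbs)
```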
Environmental data
Using ArcGIS Pro© 10.0, all environmental raster layers, BBA and BBS shapefiles, and the species observations were integrated in a common coordinate system (North_America Equidistant_Conic) using the Project tool. For BBS routes, 400m buffers were drawn around each route using the Buffer tool. The observation unit shapefiles for all states were merged (separately for BBA blocks and BBS routes and 400m buffers) using the Merge tool to create a study-wide shapefile for each data source. Whether or not a BBA block was adjacent to a BBS route was determined using the Intersect tool based on a radius of 30m around the route buffer (to fit the NLCD map resolution). Area and length of the BBS route inside the proximate BBA block were also calculated. Mean values for annual precipitation and summer temperature, and mean and range for elevation, were extracted for every BBA block and 400m buffer BBS route using Zonal Statistics as Table tool. The area of each land-cover type in each observation unit (BBA block and BBS buffer) was calculated from the NLCD layer using the Zonal Histogram tool.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The NSW Native Vegetation Area Clearing Estimate (NVACE) is a statewide spatial vector layer estimating the presence and absence of native vegetation at 2020. The NVACE does not discriminate different types of native vegetation.
The NVACE dataset has been developed by DPE to provide guidance on whether a development exceeds the Area Clearing Threshold for entry into the Biodiversity Offsets Scheme (BOS), as guided by Biodiversity Conservation Regulation 2017 s7.2.
The NVACE uses best available primary and supplementary datasets to identify areas where native woody and non-woody vegetation occurs. Due to datasets being of different ages and resolution, errors of commission and omission may be included. The dataset is subject to limitations of scale and accuracy which need to be considered when applying the dataset. The limitations are more fully described in the method (see web link below). The NVACE data is intended to provide guidance on the presence or absence of native vegetation only.
The dataset was first published by NSW Department Planning and Environment (DPE) in 2023 and will be updated intermittently.
Primary published datasets used in the creation of NVACE Version 1 include:
1. NSW Native Vegetation Extent 5m raster
2. NSW Landuse 2017
The NVACE is refined using a combination of datasets to remove known areas of clearing, for example, Statewide Landcover and Tree Survey (SLATS) data for woody vegetation and non-woody vegetation change clearing events from the Non-Woody Landcover Disturbance Program (NWD).
The Geoscape Surface Cover raster is used to refine native vegetation in urban areas. Components are used to remove roads and swimming pools and add increased resolution tree canopies.
Land identified as Category 1 exempt under the amended Local Land Services Act 2013 has been removed from NVACE as per the Biodiversity Conservation Act 2016.
Small polygons resulting from editing the NVACE are removed as artefacts.
A more detailed description of the methodology is published and provided on the DPE website.
Together with the Biodiversity Values Map, the NVACE forms the basis for determining whether a local development (Part 4 NSW EP&A Act) should be assessed for inclusion in the Biodiversity Offsets Scheme. A development which is required to be assessed for clearing of native vegetation in the Biodiversity Offsets Scheme may then potentially require biodiversity offsets against any losses undertaken as part of the development.
The dataset is primarily available to be displayed at a property scale when preparing a Biodiversity Map and Threshold (BMAT) report but may be provided on application to users in a spatial data format.
More information on the Biodiversity Offsets Scheme can be viewed here:
About the Biodiversity Offsets Scheme | NSW Environment and Heritage
The Biodiversity Values Map homepage, containing links to the BMAT tool and other related BOS information can be viewed here:
Biodiversity Values Map | NSW Environment and Heritage
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
Receptor impact models (RIMs) are developed for specific landscape classes. The prediction of receptor impact variables (RIVs) is a multi-stage process. It relies on the runs from surface water and groundwater models at nodes within the analysis extent. These outputs derive directly from the hydrological model. For a given node, there is a value for each combination of hydrological response variable (HRV), future, and replicate or run number. Not all variables may be available or appropriate at every node. This differs from the quantile summary information that is otherwise used to summarise the HRV output and is also registered.
There is a key lookup table (Excel file) that lists the assessment units (AUIDs) by landscape class (or landscape group if appropriate) and notes the groundwater modelling node and runs, and the surface water modelling node and runs, that should be used for each AUID. In some cases the AUID is only mapped to one set of hydrological modelling output. This lookup table represents the AUIDs that require RIV predictions. For NAM and GAL there is a single lookup table. For GLO and HUN, surface water and groundwater are provided separately.
Receptor impact models (RIMs) are developed for specific landscape classes. The hydrological response variables that a RIM within a landscape class requires are organised by the R script RIM_Prediction_CreateArray.R into an array. The formatted data is available as an R data file format called RDS and can be read directly into R.
The R script IMIA_HUN_RIM_predictions.R applies the receptor model functions (RDS object as part of Data set 1: Ecological expert elicitation and receptor impact models for the HUN subregion) to the HRV array for each landscape class (or landscape group) to make predictions of receptor impact variables (RIVs). Predictions of a receptor impact from a RIM for a landscape class are summarised at relevant AUIDs by the 5th through to the 95th percentiles (in 5% increments) for baseline and CRDP futures. These are available in the HUN_RIV_quantiles_IMIA.csv data set. RIV predictions are further summarised and compared as boxplots (using the R script boxplotsbyfutureperiod.R) and as (aggregated) spatial risk maps using GIS.
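The percentile summary described above (5th to 95th in 5% increments) can be illustrated with a short Python sketch. It is not the Programme's R code: the function names, the `(AUID, future)` keys, and the interpolation method are assumptions made for illustration only.

```python
from statistics import quantiles


def riv_percentiles(predictions):
    """Return {percentile: value} for the 5th, 10th, ..., 95th percentiles."""
    # n=20 yields 19 cut points at 5% steps; the "inclusive" method is an assumption.
    cuts = quantiles(predictions, n=20, method="inclusive")
    return {5 * (i + 1): value for i, value in enumerate(cuts)}


def summarise_runs(runs):
    """Summarise replicate predictions keyed by (AUID, future)."""
    return {key: riv_percentiles(values) for key, values in runs.items()}


# Hypothetical replicate predictions for one assessment unit under two futures.
example_runs = {
    ("AUID_1", "baseline"): list(range(101)),
    ("AUID_1", "CRDP"): [v * 2 for v in range(101)],
}
summary = summarise_runs(example_runs)
```

Each entry of `summary` then holds 19 percentile values, corresponding to one row of a quantile table per assessment unit and future.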
Bioregional Assessment Programme (2018) HUN Predictions of receptor impact variables v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/fbc11409-5fef-4d05-a566-cebdadff319d.
Derived From Bioregional_Assessment_Programme_Catchment Scale Land Use of Australia - 2014
Derived From NSW Wetlands
Derived From NSW Office of Water Surface Water Entitlements Locations v1_Oct2013
Derived From HUN Landscape Classification v02
Derived From Travelling Stock Route Conservation Values
Derived From Darling River Hardyhead Predicted Distribution in Hunter River Catchment NSW 2015
Derived From Climate Change Corridors Coastal North East NSW
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From Climate Change Corridors for Nandewar and New England Tablelands
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From Fauna Corridors for North East NSW
Derived From Asset database for the Hunter subregion on 27 August 2015
Derived From Hunter CMA GDEs (DRAFT DPI pre-release)
Derived From Estuarine Macrophytes of Hunter Subregion NSW DPI Hunter 2004
Derived From Geofabric Surface Network - V2.1.1
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Camerons Gorge Grassy White Box Endangered Ecological Community (EEC) 2008
Derived From NSW Office of Water Surface Water Licences Processed for Hunter v1 20140516
Derived From Asset database for the Hunter subregion on 24 February 2016
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Gosford Council Endangered Ecological Communities (Umina woodlands) EEC3906
Derived From NSW Office of Water Surface Water Offtakes - Hunter v1 24102013
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From Asset list for Hunter - CURRENT
Derived From Northern Rivers CMA GDEs (DRAFT DPI pre-release)
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Derived From Ramsar Wetlands of Australia
Derived From Native Vegetation Management (NVM) - Manage Benefits
Derived From GEODATA TOPO 250K Series 3
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From Hunter subregion boundary
Derived From Commonwealth Heritage List Spatial Database (CHL)
Derived From Groundwater Economic Elements Hunter NSW 20150520 PersRem v02
Derived From Greater Hunter Native Vegetation Mapping with Classification for Mapping
Derived From Atlas of Living Australia NSW ALA Portal 20140613
Derived From Bioregional Assessment areas v03
Derived From Spatial Threatened Species and Communities (TESC) NSW 20131129
Derived From HUN Landscape Classification v03
Derived From National Heritage List Spatial Database (NHL) (v2.1)
Derived From GW Element Bores with Unknown FTYPE Hunter NSW Office of Water 20150514
Derived From Climate Change Corridors (Dry Habitat) for North East NSW
Derived From Groundwater Entitlement Hunter NSW Office of Water 20150324
Derived From Asset database for the Hunter subregion on 20 July 2015
Derived From NSW Office of Water combined geodatabase of regulated rivers and water sharing plan regions
Derived From NSW Office of Water GW licence extract linked to spatial locations for NorthandSouthSydney v3 13032014
Derived From [Asset database for the Hunter subregion on 16
Red sensitivity is the exception rather than the norm in most animal groups. Among species with a long wavelength sensitive (LWS) photoreceptor, peak wavelength sensitivity (λmax) varies substantially and it is unclear whether this variation can be explained by visual tuning to the light environment or to visual tasks such as signalling or foraging. Here, we examine long wavelength sensitivity across a broad range of taxa showing diversity in LWS photoreceptor λmax: insects, crustaceans, arachnids, amphibians, reptiles, fish, sharks and rays. We identified 161 species with a LWS photoreceptor (λmax ≥ 550 nm). We found evidence supporting visual tuning to the light environment: terrestrial species had longer λmax than aquatic species, and of these, species from turbid shallow waters had longer λmax than those from clear or deep waters. Of the terrestrial species, diurnal species had longer λmax than nocturnal species, but we did not detect any differences across terrestrial habitats (clo...
Full details of data collection and processing are in the manuscript.
R/RStudio
Red sensitivity is associated with lighting environment but not visual tasks
Authors: Bryony M. Margetts, Devi Stuart-Fox and Amanda M. Franklin
The dataset and R code in this Dryad database will run all analyses in the manuscript, and produce the figures and supplementary table.
R Script: RedVision_Analysis.rmd
This is an RMarkdown file which will run the analyses for the manuscript.
Dataset: Dataset_DRYAD.csv
Info: This dataset includes all the species identified as having a long wavelength sensitive photoreceptor. For each species, we have documented environmental, behavioural and morphological variables as well as the sensitivity of all photoreceptors. References for photoreceptor sensitivity data are provided.
Class: Biological classification
Order: Biological classification
Family: Biological classification
Genus: Biological classification
Species: Biological classification
Population: where applicable, population location if different populations of the same species were measured.
...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This is the final occurrence record dataset produced for the manuscript "Depth Matters for Marine Biodiversity". Detailed methods for the creation of the dataset, below, have been excerpted from Appendix I: Extended Methods. Detailed citations for the occurrence datasets from which these data were derived can also be found in Appendix I of the manuscript.
We assembled a list of all recognized species of fishes from the orders Scombriformes (sensu Betancur-R et al., 2017), Gadiformes, and Beloniformes by accessing FishBase (Boettiger et al., 2012; Froese & Pauly, 2017) and the Ocean Biodiversity Information System (OBIS; OBIS, 2022; Provoost & Bosch, 2019) through queries in R (R Core Team, 2021). Species were considered Atlantic if their FishBase distribution or occurrence records on OBIS included any area within the Atlantic or Mediterranean major fishing regions as defined by the Food and Agriculture Organization of the United Nations (FAO Regions 21, 27, 31, 34, 37, 41, 47, and 48; FAO, 2020). The database query script can be found on the project code repository (https://github.com/hannahlowens/3DFishRichness/blob/main/1_OccurrenceSearch.R). We then curated the list of names to resolve discrepancies in taxonomy and known distributions through comparison with the Eschmeyer Catalog of Fishes (Eschmeyer & Fricke, 2015), accessed in September of 2020, as our ultimate taxonomic authority. The resulting list of species was then mapped onto the Global Biodiversity Information Facility’s backbone taxonomy (Chamberlain et al., 2021; GBIF, 2020a) to ensure taxonomic concurrence across databases (Appendix I Table 1). The final taxonomic list was used to download occurrence records from OBIS (OBIS, 2022) and GBIF (GBIF, 2020b) in R through robis and occCite (Chamberlain et al., 2020; Provoost & Bosch, 2019; Owens et al., 2021).
Once the resulting data were mapped and curated to remove records with putatively spurious coordinates, under-sampled regions and species were augmented with data from publicly available digital museum collection databases not served through OBIS or GBIF, as well as a literature search. For each species, duplicate points were removed from two- and three-dimensional species occurrence datasets separately, and inaccurate depth records were removed from 3D datasets. Inaccuracy was determined based on extreme statistical outliers (values greater than 2 or less than -2 when occurrence depths were centered and scaled), depth ranges that exceeded bathymetry at occurrence coordinates, and occurrence far outside known depth ranges compared to information from FishBase, Eschmeyer’s Catalog of Fishes, and congeneric depth ranges in the dataset. Finally, for datasets with more than 20 points remaining after cleaning, occurrence data were downsampled to the resolution of the environmental data; that is, to one point per 1 degree grid cell in the 2D dataset, and to one point per depth slice per 1 degree grid cell in the 3D dataset. Counts of raw and cleaned records for each species can be found in Appendix I Table 1.
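Two of the cleaning steps above, the statistical depth outlier filter and the grid-cell downsampling, can be sketched in Python. This is an illustration, not the project's R code: the function names are hypothetical, the sample standard deviation is assumed for scaling, the depth slice boundaries are placeholders (the actual slices are not given here), and keeping the first point encountered in each cell is an arbitrary choice.

```python
import math
from bisect import bisect_right
from statistics import mean, stdev


def drop_depth_outliers(depths, limit=2.0):
    """Remove depths whose centered and scaled value lies outside [-limit, limit]."""
    mu, sd = mean(depths), stdev(depths)  # sample standard deviation (assumption)
    if sd == 0:
        return list(depths)
    return [d for d in depths if abs((d - mu) / sd) <= limit]


def downsample(points, depth_breaks=None, min_points=20):
    """Keep one point per 1-degree cell (2D) or per cell and depth slice (3D).

    points: (lon, lat) or (lon, lat, depth) tuples.
    depth_breaks: sorted slice boundaries (hypothetical); None means 2D.
    Datasets with min_points or fewer points are returned unchanged.
    """
    points = list(points)
    if len(points) <= min_points:
        return points
    kept = {}
    for p in points:
        key = (math.floor(p[0]), math.floor(p[1]))
        if depth_breaks is not None:
            key += (bisect_right(depth_breaks, p[2]),)
        kept.setdefault(key, p)  # the first point seen in a cell is retained
    return list(kept.values())
```

The other two inaccuracy checks (bathymetry and known depth ranges) need external reference data and are therefore left out of this sketch.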
References:
Betancur-R, R., Wiley, E. O., Arratia, G., Acero, A., Bailly, N., Miya, M., Lecointre, G., & Ortí, G. (2017). Phylogenetic classification of bony fishes. BMC Evolutionary Biology, 17(1), 162. https://doi.org/10.1186/s12862-017-0958-3
Boettiger, C., Lang, D. T., & Wainwright, P. C. (2012). rfishbase: exploring, manipulating and visualizing FishBase data from R. Journal of Fish Biology, 81(6), 2030–2039. https://doi.org/10.1111/j.1095-8649.2012.03464.x
Chamberlain, S., Barve, V., McGlinn, D., Oldoni, D., Desmet, P., Geffert, L., & Ram, K. (2021). rgbif: Interface to the Global Biodiversity Information Facility API. https://CRAN.R-project.org/package=rgbif
Eschmeyer, W. N., & Fricke, R. (2015). Taxonomic checklist of fish species listed in the CITES Appendices and EC Regulation 338/97 (Elasmobranchii, Actinopteri, Coelacanthi, and Dipneusti, except the genus Hippocampus). Catalog of Fishes, Electronic Version. Accessed September, 2020. https://www.calacademy.org/scientists/projects/eschmeyers-catalog-of-fishes
FAO. (2020). FAO Major Fishing Areas. United Nations Fisheries and Aquaculture Division. https://www.fao.org/fishery/en/collection/area
Froese, R., & Pauly, D. (2017). FishBase. Accessed September, 2022. www.fishbase.org
GBIF.org. (2020a). GBIF Backbone Taxonomy. Accessed September, 2020. GBIF.org
GBIF.org. (2020b). GBIF Occurrence Download. Accessed November, 2020. https://doi.org/10.15468
OBIS. (2020). Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. Accessed November, 2020. www.obis.org
Owens, H. L., Merow, C., Maitner, B. S., Kass, J. M., Barve, V., & Guralnick, R. P. (2021). occCite: Tools for querying and managing large biodiversity occurrence datasets. Ecography, 44(8), 1228–1235. https://doi.org/10.1111/ecog.05618
Provoost, P., & Bosch, S. (2019). robis: R Client to access data from the OBIS API. https://cran.r-project.org/package=robis
R Core Team. (2021). R: A Language and Environment for Statistical Computing. https://www.R-project.org/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the age, height, and weight data extracted from the NHANES 2017-2018 survey dataset. The original data were BMX_J.xpt (see https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Examination&CycleBeginYear=2017) and DEMO_J.xpt (see https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Demographics&CycleBeginYear=2017). I used Linux Mint 20 to get the CSV files from the above XPT files. First, I installed the R foreign package with the following command:
$ sudo apt install r-cran-foreign
Then, I developed two R scripts to extract the CSV data. The scripts are attached to this dataset. For analysis of the CSV file, I used the following commands within the R environment.
data h =20 & data$age w =20 & data$age wt ht model summary(model)
Call: lm(formula = wt ~ ht)
Residuals:
     Min       1Q   Median       3Q      Max
-0.29406 -0.07182 -0.00558  0.06514  0.47048
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.46404    0.01423  102.90
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A repository of data files, code and python/R environments for the manuscript "Rock organic carbon oxidation CO2 release offsets silicate weathering sink" by Jesse R. Zondervan, Robert G. Hilton, Mathieu Dellinger, Fiona J. Clubb, Tobias Roylands, Mateja Ogrič. This repository contains an Excel file and several zip files.

River rhenium (Re) and OCpetro oxidation data: Supplementary Tables.xlsx

Zip files containing code, data and python environment to run a simulation of the Global OCpetro Oxidation model:
- Code only: Global_OCpetro_Oxidation-v1.1.0.zip (uploaded Github repository)
- Geospatial data files only: input_global_(data files only).zip
- Python environment only: ocpetro_oxidation_env.zip
- Code, data and environment: ocpetro_oxidation_code_data_env.zip

Outputs of the model presented in the manuscript (geospatial raster files of Fig 1):
- Re sample locations shapefile (panel A): Re_sample_locations.zip
- Re sample catchment shapefile (panel A): Re_sample_catchments.zip
- Median OCpetro stocks model (panel B): Output_OCpetro_stock_median.zip
- Median denudation model (panel C): Output_denudation_median.zip
- Best-fit OCpetro oxidation extrapolation (panel D): Output_OC_petro_oxidation.zip

Denudation and OCpetro stock subroutines (for transparency only, not needed to run OCpetro oxidation simulation):
- Code, data and R environment (python environment from ocpetro_oxidation_env.zip, see above): Submodels_code_data_Renv.zip

The easiest way to run the code is to download the zip file containing code, data and environment, and then unpack it using the tools provided by the OS or by running the 'anaconda-project unarchive' command. Instructions for this can be found by searching for anaconda-project online, or directly via https://anaconda-project.readthedocs.io/en/latest/user-guide/tasks/create-project-archive.html?highlight=unarchive#extracting-the-archive-file [last accessed 26/01/2023]

The code was developed in a python environment detailed in anaconda-project.yml, with every recursive dependency down to the individual build in anaconda-project-locked.yml. The code file is "Glob_newmethod_parr_globalresidual.py" in this repository. This code should be reproducible indefinitely, without depending on online package repositories. Both the commands and the environment have been captured into a fully locked anaconda project with all conda packages unpacked and included using anaconda-project --pack-envs. Note that only packages for running the code on Linux can be unpacked in this way; building on other platforms (Windows, Mac) will still require access to repositories.

Notes: This code was run on an HPC environment with a job submitter called SLURM. As such, the code runs as a SLURM job array. The command to run the Monte Carlo simulation as 10,000 separate jobs is: "sbatch --array=1-10000:1 job_script_file_name.sh". Note that the version of the code uploaded here is set to run 100 simulations, with job array numbers from 1 to 100 ("sbatch --array=1-100:1 job_script_file_name.sh"). When running 10,000 simulations, please change line 46 to "quantile = float(os.getenv('SLURM_ARRAY_TASK_ID'))/10000."

Whilst it is possible to run this code on a single machine such as a personal computer, the user is warned that it takes 24 core hours per simulation to run. For example, a typical 4-core laptop would need 6 hours to run one simulation. At that rate, 10,000 simulations would take 240,000 core hours, or 60,000 hours on the same laptop. To run one simulation, line 46 ("quantile = float(os.getenv('SLURM_ARRAY_TASK_ID'))/100.") can be replaced with ("quantile = float(number between 0 and 1)").

Outputs will be saved for each simulation, which can take up a lot of disk space unless you add lines to delete these files from the disk or, as in the example job script for HPC usage, exclude the files when moving data from the node that ran the job. Example: sbatch --array=1-100:1 run_Glob_OCpetro_model.sh (note that this runs 100 simulations). An example job script file has been appended. Please note that the details of this job script depend on your machine or HPC system. Please consult your HPC support or your platform's (Linux, Mac, Windows) command prompt instructions.
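The mapping from SLURM array index to Monte Carlo quantile quoted above (line 46 of the code file) can be sketched as a small helper. This is an illustrative sketch only: the function names and the `env` argument are hypothetical and not part of the repository; only the `SLURM_ARRAY_TASK_ID` variable and the division by the number of array tasks come from the description.

```python
import os


def quantile_from_task_id(n_tasks=100, env=None):
    """Convert the SLURM array index (1..n_tasks) into a quantile in (0, 1]."""
    env = os.environ if env is None else env
    task_id = env.get("SLURM_ARRAY_TASK_ID")
    if task_id is None:
        # Outside a job array (e.g. on a laptop) the quantile must be set by hand.
        raise RuntimeError("SLURM_ARRAY_TASK_ID is not set")
    quantile = float(task_id) / n_tasks
    if not 0.0 < quantile <= 1.0:
        raise ValueError("task id does not match the size of the job array")
    return quantile


def total_core_hours(n_simulations, core_hours_per_simulation=24.0):
    """Rough compute budget, using the 24 core hours per simulation quoted above."""
    return n_simulations * core_hours_per_simulation
```

With `n_tasks=100` the helper reproduces the uploaded setting; passing `n_tasks=10000` corresponds to the change described for the 10,000-simulation run.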
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Soil surface salinity is one of 18 attributes of soils chosen to underpin the land suitability assessment of the Victoria River Water Resource Assessment (VIWRA) through the digital soil mapping (DSM) process. Soil salinity represents the salt content of the soil. This raster data represents a modelled dataset of salinity at the soil surface and is derived from field-measured and laboratory-analysed site data and environmental covariates. Data values are: 1 = surface salinity absent, 2 = surface salinity present. Soil surface salinity is a parameter used in land suitability assessments because it hinders seed establishment and retards plant growth. This raster data provides improved soil information used to underpin and identify opportunities and promote detailed investigation for a range of sustainable regional development options and was created within the ‘Land Suitability’ activity of the CSIRO VIWRA. A companion dataset and statistics reflecting the reliability of this data are also provided and are described in the lineage section of this metadata record. Processing information is supplied in ranger R scripts, and attributes were modelled using a Random Forest approach. The DSM process is described in the CSIRO VIWRA published report ‘Soils and land suitability for the Victoria catchment, Northern Territory’, a technical report from the CSIRO Victoria River Water Resource Assessment to the Government of Australia. The Victoria River Water Resource Assessment provides a comprehensive overview and integrated evaluation of the feasibility of aquaculture and agriculture development in the Victoria catchment NT as well as the ecological, social and cultural (indigenous water values, rights and aspirations) impacts of development.

Lineage: The soil surface salinity dataset has been generated from a range of inputs and processing steps. Following is an overview. For more information refer to the CSIRO VIWRA published reports and in particular ‘Soils and land suitability for the Victoria catchment, Northern Territory’, a technical report from the CSIRO Victoria River Water Resource Assessment to the Government of Australia.
1. Collated existing data (relating to: soils, climate, topography, natural resources, remotely sensed, of various formats: reports, spatial vector, spatial raster etc).
2. Selection of additional soil and land attribute site data locations by a conditioned Latin hypercube statistical sampling method applied across the covariate data space.
3. Fieldwork was carried out to collect new attribute data and soil samples for analysis, and to build an understanding of geomorphology and landscape processes.
4. Database analysis was performed to extract the data according to the specific selection criteria required for the attribute to be modelled.
5. The R statistical programming environment was used for the attribute computing. Models were built from selected input data and covariate data using predictive learning from a Random Forest approach implemented in the ranger R package.
6. Create soil surface salinity Digital Soil Mapping (DSM) attribute raster dataset. DSM data is a geo-referenced dataset, generated from field observations and laboratory data, coupled with environmental covariate data through quantitative relationships. It applies pedometrics: the use of mathematical and statistical models that combine information from soil observations with information contained in correlated environmental variables, remote sensing images and some geophysical measurements.
7. Companion predicted reliability data was produced from the 500 individual Random Forest attribute models created.
8. Quality assessment (QA) of this DSM attribute data was conducted by three methods.
Method 1: Statistical (quantitative) method of the model and input data. Testing the quality of the DSM models was carried out using data withheld from model computations and expressed as out-of-bag (OOB) and confusion matrix results, giving an estimate of the reliability of the model predictions. These results are supplied.
Method 2: Statistical (quantitative) assessment of the spatial attribute output data presented as a raster of the attribute's “reliability”. This used the 500 individual trees of the attribute's RF models to generate 500 datasets of the attribute to estimate model reliability for each attribute. For categorical attributes the method for estimating reliability is the Confusion Index. This data is supplied.
Method 3: Collecting independent external validation site data combined with on-ground expert (qualitative) examination of outputs during validation field trips. Across each of the study areas a two-week validation field trip was conducted using a new validation site set which was produced by a random sampling design based on conditioned Latin Hypercube sampling using the reliability data of the attribute. The modelled DSM attribute value was assessed against the actual on-ground value. These results are published in the report cited in this metadata record.
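The Confusion Index used in Method 2 can be illustrated for a single raster cell. The record does not give the exact formula, so this sketch assumes one commonly used definition, 1 minus the difference between the two largest class proportions among the tree predictions; the function name and the vote counts in the usage note are hypothetical.

```python
from collections import Counter


def confusion_index(tree_predictions):
    """Confusion Index for one cell from the class predicted by each tree.

    Assumed definition: CI = 1 - (p_max - p_second), where p_max and p_second
    are the two largest class proportions. Values near 0 mean the trees agree;
    values near 1 mean the two leading classes are nearly tied.
    """
    n = len(tree_predictions)
    if n == 0:
        raise ValueError("at least one prediction is required")
    proportions = sorted(
        (count / n for count in Counter(tree_predictions).values()),
        reverse=True,
    )
    second = proportions[1] if len(proportions) > 1 else 0.0
    return 1.0 - (proportions[0] - second)
```

For instance, if 400 of 500 trees predict class 1 (absent) and 100 predict class 2 (present), the index is 1 - (0.8 - 0.2) = 0.4 under this definition.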
TDS, TSS, and Flow data used for developing and testing relationships in the Clear Creek Watershed, Colorado. Data for development of relationships was collected by Barbara Butler while at the Colorado School of Mines. Data for testing relationships was obtained from Tim Steele of TDS Consulting in Colorado as provided by the Upper Clear Creek Watershed Association. This dataset is associated with the following publication: Butler, B., and R. Ford. Evaluating Relationships Between Total Dissolved Solids (TDS) and Total Suspended Solids (TSS) in a Mining-Influenced Watershed. Bob Kleinmann Mine Water and the Environment. Springer-Verlag, BERLIN-HEIDELBERG, GERMANY, 37(1): 18-30, (2018).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background and methods: Systematic reviews, i.e., research summaries that address focused questions in a structured and reproducible manner, are a cornerstone of evidence-based medicine and research. However, certain steps in systematic reviews, such as data extraction, are labour-intensive, which hampers their feasibility, especially with the rapidly expanding body of biomedical literature. To bridge this gap, we aimed to develop a data mining tool in the R programming environment to automate data extraction from neuroscience in vivo publications. The function was trained on a literature corpus (n = 45 publications) of animal motor neuron disease studies and tested in two validation corpora (motor neuron diseases, n = 31 publications; multiple sclerosis, n = 244 publications).
Results: Our data mining tool, STEED (STructured Extraction of Experimental Data), successfully extracted key experimental parameters such as animal models and species, as well as risk of bias items like randomization or blinding, from in vivo studies. Sensitivity and specificity were over 85% and 80%, respectively, for most items in both validation corpora. Accuracy and F1-score were above 90% and 0.9 for most items in the validation corpora, respectively. Time savings were above 99%.
Conclusions: Our text mining tool, STEED, can extract key experimental parameters and risk of bias items from the neuroscience in vivo literature. This enables the tool’s deployment for probing a field in a research improvement context or replacing one human reader during data extraction, resulting in substantial time savings and contributing towards the automation of systematic reviews.
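The four performance measures reported above follow from the counts of a two-by-two comparison between the tool and a human reader. The sketch below uses the standard definitions; the function name is hypothetical and the counts in the usage note are invented for illustration, not taken from the STEED validation.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and F1 from confusion-matrix counts.

    tp/fn: items reported in the paper that the tool found / missed.
    tn/fp: items not reported that the tool correctly left out / wrongly flagged.
    """
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("metrics are undefined for these counts")
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }
```

With hypothetical counts tp=45, fp=10, tn=40, fn=5, this gives a sensitivity of 0.90, a specificity of 0.80, an accuracy of 0.85 and an F1-score of about 0.857.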
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Soil surface pH is one of 18 attributes of soils chosen to underpin the land suitability assessment of the Roper River Water Resource Assessment (ROWRA) through the digital soil mapping (DSM) process. Soil surface pH is used as a general indicator or proxy of conditions that affect the availability of plant nutrients and potential nutrient toxicities and/or deficiencies. This soil surface pH raster data represents a modelled dataset of pH of the soil surface (<0.10 m) measured in standard pH units and is derived from field measurements, analysed site data and environmental covariates. The soil surface pH is a parameter used in land suitability assessments for indicating availability of nutrients for plant use or nutrient deficiencies and/or toxicities, e.g., strong acidity or alkalinity may lead to reduced plant growth. This raster data provides improved soil information used to underpin and identify opportunities and promote detailed investigation for a range of sustainable regional development options and was created within the ‘Land Suitability’ activity of the CSIRO ROWRA. A companion dataset and statistics reflecting the reliability of this data are also provided and are described in the lineage section of this metadata record. Processing information is supplied in ranger R scripts, and attributes were modelled using a Random Forest approach. The DSM process is described in the CSIRO ROWRA published report ‘Soils and land suitability for the Roper catchment, Northern Territory’, a technical report from the CSIRO Roper River Water Resource Assessment to the Government of Australia. The Roper River Water Resource Assessment provides a comprehensive overview and integrated evaluation of the feasibility of aquaculture and agriculture development in the Roper catchment NT as well as the ecological, social and cultural (indigenous water values, rights and aspirations) impacts of development.

Lineage: This soil surface pH dataset has been generated from a range of inputs and processing steps. Following is an overview. For more information refer to the CSIRO ROWRA published reports and in particular ‘Soils and land suitability for the Roper catchment, Northern Territory’, a technical report from the CSIRO Roper River Water Resource Assessment to the Government of Australia.
1. Collated existing data (relating to: soils, climate, topography, natural resources, remotely sensed, of various formats: reports, spatial vector, spatial raster etc).
2. Selection of additional soil and land attribute site data locations by a conditioned Latin hypercube statistical sampling method applied across the covariate data space.
3. Fieldwork was carried out to collect new attribute data and soil samples for analysis, and to build an understanding of geomorphology and landscape processes.
4. Database analysis was performed to extract the data according to the specific selection criteria required for the attribute to be modelled.
5. The R statistical programming environment was used for the attribute computing. Models were built from selected input data and covariate data using predictive learning from a Random Forest approach implemented in the ranger R package.
6. Create soil surface pH Digital Soil Mapping (DSM) attribute raster dataset. DSM data is a geo-referenced dataset, generated from field observations and laboratory data, coupled with environmental covariate data through quantitative relationships. It applies pedometrics: the use of mathematical and statistical models that combine information from soil observations with information contained in correlated environmental variables, remote sensing images and some geophysical measurements.
7. Companion predicted reliability data was produced from the 500 individual Random Forest attribute models created.
8. Quality assessment (QA) of this DSM attribute data was conducted by three methods.
Method 1: Statistical (quantitative) method of the model and input data. Testing the quality of the DSM models was carried out using data withheld from model computations and expressed as out-of-bag (OOB) and R squared results, giving an estimate of the reliability of the model predictions. These results are supplied.
Method 2: Statistical (quantitative) assessment of the spatial attribute output data presented as a raster of the attribute's “reliability”. This used the 500 individual trees of the attribute's RF models to generate 500 datasets of the attribute to estimate model reliability for each attribute. For continuous attributes the method for estimating reliability is the Coefficient of Variation. This data is supplied.
Method 3: Collecting independent external validation site data combined with on-ground expert (qualitative) examination of outputs during validation field trips. Across each of the study areas a two-week validation field trip was conducted using a new validation site set which was produced by a random sampling design based on conditioned Latin Hypercube sampling using the reliability data of the attribute. The modelled DSM attribute value was assessed against the actual on-ground value. These results are published in the report cited in this metadata record.
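The Coefficient of Variation used in Method 2 for continuous attributes can be sketched for a single raster cell as the spread of the per-tree predictions relative to their mean. This is an illustration only: the function name is hypothetical, and the use of the sample standard deviation and of a plain ratio (not a percentage) are assumptions, since the record does not state them.

```python
from statistics import mean, stdev


def coefficient_of_variation(tree_predictions):
    """CV of the per-tree predictions for one cell (e.g. 500 pH values).

    Assumption: sample standard deviation divided by the absolute mean.
    Larger values indicate less agreement between trees, i.e. lower reliability.
    """
    if len(tree_predictions) < 2:
        raise ValueError("at least two predictions are required")
    mu = mean(tree_predictions)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(tree_predictions) / abs(mu)
```

Applying the function to every cell of the stacked tree predictions would give a reliability raster of the kind described; two hypothetical predictions of pH 5.0 and 7.0, for example, give a CV of about 0.236.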
Human activities frequently alter environmental conditions and affect the use of sexually selected traits like color in animals. However, the effects of environmental stressors are unlikely to be uniform across populations that experience different environments or between sexes. We aimed to understand the underlying genetic, environmental, and gene-by-environment contributions to color expression in males and females of a sexually dimorphic fish. Pseudocrenilabrus multicolor is a haplochromine cichlid found in environments that vary dramatically, particularly with respect to oxygen and turbidity levels. We reared fish from one swamp (hypoxic, clear) and one river (normoxic, turbid) population in a split-brood design (hypoxic/normoxic x clear/turbid) then quantified color and carotenoid concentrations. As expected in this sexually dimorphic species, females were far less colorful than males. In males, hypoxia and turbidity were drivers of traits associated with color, suggesting that co...
These data were collected using photographs and a spectrophotometer. Data were analyzed and processed in R.
Data can be viewed using Excel and analyzed in R.
# Intraspecific Variation in Color and Carotenoids Across Environmental Extremes
This dataset includes color and carotenoid data of the cichlid Pseudocrenilabrus multicolor in response to rearing under hypoxic and turbid conditions. This dataset includes eight Excel files and two R scripts, Spec_final.R and Carotenoids_rscript_final.R. Both scripts were run using R version 4.3.0. Cells were left blank if a particular value could not be calculated for that individual.
Spec_final.R uses the input files: Rearing_Renamed.zip, Carotenoid_Fish_info.xlsx, Broodinfo.xlsx, and creates the file Rearingspecfullspectrum.xlsx, which is used in the next script.
Carotenoids_rscript_final.R uses the input files: Carotenoids_all.xlsx, Carotenoids_protein.xlsx, Broodinfo.xlsx, Carotenoids_Plasmavolume.xlsx, Rearingspecfullspectrum.xlsx, Carotenoid_Fish_info.xlsx, Rearing_Processing_Color.csv, Carotenoids_RearingInfo.xlsx
...,
Understanding whether and why microevolutionary patterns of trait covariation match macroevolutionary divergence is essential for linking evolution at different timescales. However, recent work has focused on developmental constraints for alignment between intraspecific variation and divergence, neglecting a potential role of natural selection on function to connect these scales. Here, we compare the support for the selection and constraint hypotheses to explain both phenotypic trait covariation and species divergence. To test these hypotheses, we collected data on hindlimb and jumping performance traits within and across species of two frog genera. We compared patterns of within-species phenotypic variation (the P-matrix) with divergence and selective covariance matrices, from which we could extract the major axes of the realized adaptive landscape (AL), the directions in which adaptive peaks shifted the most over evolutionary time. We also tested whether the major axes of the AL were ...
The dataset was collected using a force plate and the software BioWare to measure jumping performance on live frogs collected in French Guiana, and a caliper to measure hindlimb dimensions and a scale to measure body mass, both in live frogs and museum specimens. The force traces were processed using a MATLAB script to extract acceleration and velocity for each jumping peak analyzed. Then, peak jumping performance values were extracted for each specimen using the R programming environment. Morphological and jumping performance data were subsequently analyzed to construct phenotypic covariance matrices and divergence matrices (rate matrices), and to test whether variation within species matched divergence across species. We also tested the role of selection on hindlimb morphology associated with jumping performance and on promoting divergence across species. All analyses were performed in the R programming environment.
# Data from: Macroevolutionary divergence along allometric lines of least resistance in frog hindlimb traits and its effect on locomotor evolution
https://doi.org/10.5061/dryad.rn8pk0pnp
Jumping performance measured using a force plate and BioWare software.
Hindlimb morphology measured using a caliper and body mass measured with a scale.
Description: Maximum acceleration and velocity values for all jumping peaks analyzed with MATLAB
Description: ...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
An Ecological Continuity Classification corresponds to all or part of a watercourse or channel identified in an order issued by the Basin Coordinating Prefect pursuant to Article L. 214-17 of the Environmental Code. Classification in list 1 (1° of § I of Article L. 214-17 of the Environmental Code) is intended to protect certain rivers from degradation and sets out a long-term preservation objective. It cancels, replaces, and supplements the classification as reserved rivers under the 1919 Act. List 2 (2° of § I of Article L. 214-17 of the Environmental Code), which cancels, replaces and supplements the concept of ‘classified rivers’ under Article L. 432-6 of the Environmental Code, must make it possible to bring existing structures quickly into line with the objectives of ecological continuity. The founding regulatory text of an Ecological Continuity Classification is the classification order signed by the Basin Coordinating Prefect in accordance with the procedure laid down in Article R. 214-10 of the Environmental Code, which provides for departmental consultation on draft classifications before validation by the Basin Coordinating Prefect. The classification of rivers with respect to the passage of migratory fish is carried out under 2° (list 2) of Article L. 214-17-I of the Environmental Code for the criterion “watercourses for which it is necessary to ensure the circulation of migratory fish (amphihaline or not)”.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package for the paper "Technical Debt in the Peer-Review Documentation of R Packages: a rOpenSci Case Study" (MSR '21).
# Scripts: Data Collection and Processing
These are the scripts used to extract the data from _rOpenSci_. The following steps indicate how to use them.
1. Add all attached R files into an R project.
2. Install the following R packages. The process also requires a working GitHub account in order to obtain the corresponding token.
```{r}
library(dplyr)
library(stringr)
library(stringi)
library(jsonlite)
library(httpuv)
library(httr)
library(ggplot2)
library(tidyr)
```
3. All the individual functions in the following files should be sourced into the R environment: `getToken.R`, `comments.R`, `issues.R`, and `tagging.R`.
4. Run the script in the file `process.R`. This will run all the previous functions in the corresponding order.
# Datasets
The following files are included:
- Dataset_1-100_Author1.xlsx contains the randomly selected 100 comments that were classified according to TD types by Author 1.
- Dataset_1-100_Author2.xlsx contains the randomly selected 100 comments that were classified according to TD types by Author 2 and the combined classification (in blue) after discussion.
- Dataset_Phrases_Both.xlsx contains the randomly selected 358 comments (resulting in 602 phrases) that were classified according to TD types by both authors 1 and 2. Their classification was incorporated into a single spreadsheet side by side for easy comparison. Disagreement was discussed and final classification is in the “Agreement” field.
- UserRoles.csv contains the user roles associated with the 600 phrases. The “comment_id” is the unique identifier for the comment from which the phrase is extracted. The phrase is represented in the “statement” field. The “agreement” field shows the final technical debt label after the analysis by two of the authors. The user roles are shown in the “user_role” column.