The Clear Sky Mask product contains an image in the form of a binary cloud mask that identifies pixels within a coverage region as clear or cloudy. Production of the clear sky mask is an important step in the processing of many other Advanced Baseline Imager (ABI) Level 2+ products, which use the information generated during its production to determine the presence of cloud. The product includes data quality information for the binary cloud mask values for on-earth pixels. The binary cloud mask value is a dimensionless quantity. The Clear Sky Mask product image is provided at 2 km resolution on the ABI fixed grid for the Full Disk, CONUS, and Mesoscale coverage regions from GOES East and West. Product data are produced for geolocated source data to local zenith angles of 90 degrees under both daytime and nighttime conditions.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ordinary least squares (OLS) estimation of a linear regression model is well known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. This standard “detect-and-forget” approach has been shown to be problematic, and in this paper we highlight the fact that it can lead to invalid inference and show how recently developed tools in selective inference can be used to properly account for outlier detection and removal. Our inferential procedures apply to a general class of outlier removal procedures that includes several of the most commonly used approaches. We conduct simulations to corroborate the theoretical results, and we apply our method to three real data sets to illustrate how our inferential results can differ from the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R.
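For illustration, a minimal R sketch of the naive detect-and-forget workflow described above, using Cook's distance as the detection rule (the rule, cutoff, and example data are illustrative assumptions, not the paper's); outference is designed to replace the final naive interval step with selection-adjusted inference.

```r
# Naive "detect-and-forget" analysis in base R (illustrative only).
fit_full <- lm(mpg ~ wt + hp, data = mtcars)

# Detect: flag high-influence points by Cook's distance (common heuristic cutoff).
keep <- cooks.distance(fit_full) <= 4 / nrow(mtcars)

# Forget: refit on the remaining data and report intervals as if no selection occurred.
fit_trim <- lm(mpg ~ wt + hp, data = mtcars[keep, ])
confint(fit_trim)   # the naive intervals that selective inference corrects
```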
https://spdx.org/licenses/CC0-1.0.html
These data and computer code (written in R, https://www.r-project.org) were created to statistically evaluate a suite of spatiotemporal covariates that could potentially explain pronghorn (Antilocapra americana) mortality risk in the Northern Sagebrush Steppe (NSS) ecosystem (50.0757° N, 108.7526° W). Known-fate data were collected from 170 adult female pronghorn monitored with GPS collars from 2003-2011, which were used to construct a time-to-event (TTE) dataset with a daily timescale and an annual recurrent origin of 11 November. Seasonal risk periods (winter, spring, summer, autumn) were defined by median migration dates of collared pronghorn. We linked this TTE dataset with spatiotemporal covariates that were extracted and collated from pronghorn seasonal activity areas (estimated using 95% minimum convex polygons) to form a final dataset. Specifically, average fence and road densities (km/km2), average snow water equivalent (SWE; kg/m2), and maximum decadal normalized difference vegetation index (NDVI) were considered as predictors. We tested for these main effects of spatiotemporal risk covariates as well as the hypotheses that pronghorn mortality risk from roads or fences could be intensified during severe winter weather (i.e., interactions: SWE*road density and SWE*fence density). We also compared an analogous frequentist implementation to estimate model-averaged risk coefficients. Ultimately, the study aimed to develop the first broad-scale, spatially explicit map of predicted annual pronghorn survivorship based on anthropogenic features and environmental gradients to identify areas for conservation and habitat restoration efforts.
Methods: We combined relocations from GPS-collared adult female pronghorn (n = 170) with raster data that described potentially important spatiotemporal risk covariates. We first collated relocation and time-to-event data to remove individual pronghorn from the analysis that had no spatial data available. We then constructed seasonal risk periods based on the median migration dates determined from a previous analysis; thus, we defined 4 seasonal periods as winter (11 November–21 March), spring (22 March–10 April), summer (11 April–30 October), and autumn (31 October–10 November). We used the package 'amt' in Program R to rarefy relocation data to a common 4-hr interval using a 30-min tolerance. We used the package 'adehabitatHR' in Program R to estimate seasonal activity areas using 95% minimum convex polygons. We constructed annual- and seasonal-specific risk covariates by averaging values within individual activity areas. We specifically extracted values for linear features (road and fence densities), a proxy for snow depth (SWE), and a measure of forage productivity (NDVI). We resampled all raster data to a common resolution of 1 km2. Given that fence density models characterized regional-scale variation in fence density (i.e., 1.5 km2), this resolution seemed appropriate for our risk analysis. We fit Bayesian proportional hazards (PH) models using a time-to-event approach to model the effects of spatiotemporal covariates on pronghorn mortality risk. We aimed to develop a model to understand the relative effects of risk covariates for pronghorn in the NSS. The effect of fence or road densities may depend on SWE such that the variables interact in affecting mortality risk; thus, our full candidate model included four main effects and two interaction terms. We used reversible-jump Markov chain Monte Carlo (RJMCMC) to determine relative support for a nested set of Bayesian PH models. This allowed us to conduct Bayesian model selection and averaging in one step using two custom samplers provided for the R package 'nimble'. For brevity, we provide the final time-to-event dataset and analysis code rather than all of the code, GIS layers, etc. used to estimate seasonal activity areas and to extract and collate spatial risk covariates for each individual; the data and code provided are sufficient to reproduce the risk regression results presented in the manuscript.
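A minimal sketch of the rarefaction and activity-area steps described above; the file, column names, and coordinate reference system are illustrative assumptions, not the authors' exact code.

```r
library(amt)
library(lubridate)
library(adehabitatHR)
library(sp)

gps <- read.csv("pronghorn_relocations.csv")            # hypothetical input file
gps$timestamp <- as.POSIXct(gps$timestamp, tz = "UTC")   # column names assumed

# Rarefy relocations to a common 4-hr interval with a 30-min tolerance
trk <- make_track(gps, x, y, timestamp, id = animal_id, crs = 32613) |>  # UTM zone assumed
  track_resample(rate = hours(4), tolerance = minutes(30))

# 95% minimum convex polygon activity areas, one per animal
pts <- SpatialPointsDataFrame(coords = as.data.frame(trk[, c("x_", "y_")]),
                              data   = data.frame(id = as.factor(trk$id)))
mcp95 <- mcp(pts, percent = 95)
```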
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An Open Context "types" dataset item. Open Context publishes structured data as granular, URL identified Web resources. This record is part of the "Madaba Plains Project-`Umayri" data publication.
This data release contains lake and reservoir water surface temperature summary statistics calculated from Landsat 8 Analysis Ready Dataset (ARD) images available within the Conterminous United States (CONUS) from 2013-2023. All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package (Richardson and others, 2024) to open and query the nested .parquet files.
Limitations with this dataset include:
- All biases inherent to the Landsat Surface Temperature product are retained in this dataset, which can produce unrealistically high or low estimates of water temperature. This is observed to happen, for example, in cases with partial cloud coverage over a waterbody.
- Some waterbodies are split between multiple Landsat Analysis Ready Data tiles or orbit footprints. In these cases, multiple waterbody-wide statistics may be reported, one for each data tile. The deepest point values are extracted and reported for the tile covering the deepest point. A total of 947 waterbodies are split between multiple tiles (see the multiple_tiles = “yes” column of site_id_tile_hv_crosswalk.csv).
- Temperature data were not extracted from satellite images with more than 90% cloud cover.
- Temperature data represent skin temperature at the water surface and may differ from temperature observations from below the water surface.
Potential methods for addressing limitations with this dataset:
- Identifying and removing unrealistic temperature estimates:
  - Calculate the total percentage of cloud pixels over a given waterbody as percent_cloud_pixels = wb_dswe9_pixels/(wb_dswe9_pixels + wb_dswe1_pixels), and filter percent_cloud_pixels by a desired percentage of cloud coverage (see the R sketch following the file descriptions below).
  - Remove lakes with a limited number of water pixel values available (wb_dswe1_pixels < 10).
  - Filter waterbodies where the deepest point is identified as water (dp_dswe = 1).
- Handling waterbodies split between multiple tiles:
  - These waterbodies can be identified using the "site_id_tile_hv_crosswalk.csv" file (column multiple_tiles = “yes”). A user could combine sections of the same waterbody by spatially weighting the values using the number of water pixels available within each section (wb_dswe1_pixels). This should be done with caution, as some sections of the waterbody may have data available on different dates.
All zip files within this data release contain nested directories using .parquet files to store the data. The example_script_for_using_parquet.R file contains example code for using the R arrow package to open and query the nested .parquet files.
- "year_byscene=XXXX.zip" – includes temperature summary statistics for individual waterbodies and the deepest points (the furthest point from land within a waterbody) within each waterbody by the scene_date (when the satellite passed over). Individual waterbodies are identified by the National Hydrography Dataset (NHD) permanent_identifier included within the site_id column. Some of the .parquet files within the byscene datasets may include only one dummy row of data (identified by tile_hv="000-000"). This happens when no tabular data are extracted from the raster images because of clouds obscuring the image, a tile that covers mostly ocean with a very small amount of land, or other possible causes.
An example file path for this dataset follows: year_byscene=2023/tile_hv=002-001/part-0.parquet
- "year=XXXX.zip" – includes the summary statistics for individual waterbodies and the deepest points within each waterbody by the year (dataset=annual), month (year=0, dataset=monthly), and year-month (dataset=yrmon). The year_byscene=XXXX datasets are used as input for generating these summary tables, which aggregate temperature data by year, month, and year-month. Aggregated data are not available for the following tiles: 001-004, 001-010, 002-012, 028-013, and 029-012, because these tiles primarily cover ocean with limited land, and no output data were generated. An example file path for this dataset follows: year=2023/dataset=lakes_annual/tile_hv=002-001/part-0.parquet
- "example_script_for_using_parquet.R" – This script includes code to download zip files directly from ScienceBase, identify HUC04 basins within a desired Landsat ARD grid tile, download NHDPlus High Resolution data for visualizing, use the R arrow package to compile .parquet files in nested directories, and create example static and interactive maps.
- "nhd_HUC04s_ingrid.csv" – This cross-walk file identifies the HUC04 watersheds within each Landsat ARD Tile grid.
- "site_id_tile_hv_crosswalk.csv" – This cross-walk file identifies the site_id (nhdhr{permanent_identifier}) within each Landsat ARD Tile grid. This file also includes a column (multiple_tiles) to identify site_ids that fall within multiple Landsat ARD Tile grids.
- "lst_grid.png" – a map of the Landsat grid tiles labelled by the horizontal-vertical ID.
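As a complement to example_script_for_using_parquet.R, the following is a minimal sketch (assuming an unzipped year_byscene=2023 directory in the working directory) of opening the nested .parquet files with the R arrow package and applying the filters suggested above; the 10% cloud threshold is an illustrative choice.

```r
library(arrow)
library(dplyr)

byscene <- open_dataset("year_byscene=2023")   # reads the nested tile_hv=XXX-XXX/part-0.parquet files

clean <- byscene |>
  mutate(percent_cloud_pixels = wb_dswe9_pixels / (wb_dswe9_pixels + wb_dswe1_pixels)) |>
  filter(percent_cloud_pixels < 0.10,   # drop mostly cloud-covered waterbodies (illustrative threshold)
         wb_dswe1_pixels >= 10,         # drop waterbodies with few water pixels
         dp_dswe == 1) |>               # keep records where the deepest point is water
  collect()
```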
The attached file details the workflow for the processing and analysis of active acoustic data (Simrad EK60; 12, 38, 120 and 200 kHz) collected from RSV Aurora Australis during the 2006 BROKE-West voyage. The attached file is in Echoview (https://www.echoview.com/) version 8 format.
The Echoview file is suitable for working with fisheries acoustics (water column backscatter) data collected using a Simrad EK60, and it is set up to read 38, 120 and 200 kHz split-beam data. The file has operators to remove acoustic noise, e.g. spikes and dropped pings, and operators for removing surface noise and seabed echoes. Echoes arising from krill are isolated using the ‘dB-difference’ method recommended by CCAMLR. The Echoview file is set up to export the results of krill echo integration as both intervals and swarms. Full details of the method are available in Jarvis et al. (2010) and the krill swarm methods are described in Bestley et al. (2017).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data contain bathymetric data from the Namibian continental slope. The data were acquired on R/V Meteor research expedition M76/1 in 2008 and R/V Maria S. Merian expedition MSM19/1c in 2011. The purpose of the data was the exploration of the Namibian continental slope and especially the investigation of large seafloor depressions. The bathymetric data were acquired with the 191-beam, 12 kHz Kongsberg EM120 system. The data were processed using the open-source software package MB-System. The loaded data were cleaned semi-automatically and manually, removing outliers and other erroneous soundings. Initial sound velocity fields were adjusted to remove artifacts from the data. Gridding was done in 10x10 m grid cells for the MSM19-1c dataset and 50x50 m for the M76 dataset using the Gaussian Weighted Mean algorithm.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
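For readers unfamiliar with the Iris module, the exercises it describes amount to a few lines of base R, shown here purely for illustration (this is not the case study's own code):

```r
# Summary statistics, a correlation, a histogram, and a scatter plot with the
# built-in iris dataset.
data(iris)
summary(iris$Sepal.Length)
cor(iris$Sepal.Length, iris$Petal.Length)
hist(iris$Sepal.Length, main = "Sepal length", xlab = "Length (cm)")
plot(iris$Petal.Length, iris$Petal.Width, col = iris$Species,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")
```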
This dataset contains cleaned GBIF (www.gbif.org) occurrence records and associated climate and environmental data for all arthropod prey of listed species in California drylands, as identified in Lortie et al. (2023): https://besjournals.onlinelibrary.wiley.com/doi/full/10.1002/2688-8319.12251. All arthropod records were downloaded from GBIF (https://doi.org/10.15468/dl.ngym3r) on 14 November 2022. Records were imported into R using the rgbif package and cleaned with the CoordinateCleaner package to remove occurrence data with likely errors. Environmental data include bioclimatic variables from WorldClim (www.worldclim.org), landcover and NDVI data from MODIS and the LP DAAC (https://lpdaac.usgs.gov/), elevation data from the USGS (https://www.sciencebase.gov/catalog/item/542aebf9e4b057766eed286a), and distance to the nearest road from the Census Bureau's TIGER/Line road shapefile (https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html). All environmental data were combined into a stacked raster, and we extracted the environmental variables for each occurrence record from this raster to make the final dataset.
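A minimal sketch of the import, cleaning, and extraction workflow described above; the download key, file names, and cleaning tests are illustrative assumptions, not the authors' exact code.

```r
library(rgbif)
library(CoordinateCleaner)
library(terra)

# Import the archived GBIF download (the key shown is a placeholder, not the real one)
occ <- occ_download_import(occ_download_get("0000000-000000000000000"))

# Flag and drop records with likely coordinate errors
occ_clean <- clean_coordinates(occ,
                               lon   = "decimalLongitude",
                               lat   = "decimalLatitude",
                               tests = c("capitals", "centroids", "equal", "zeros"),
                               value = "clean")

# Stack environmental layers and extract values at each occurrence record
env   <- rast(c("worldclim_bio.tif", "modis_ndvi.tif", "elevation.tif", "dist_to_road.tif"))  # hypothetical files
pts   <- vect(occ_clean, geom = c("decimalLongitude", "decimalLatitude"), crs = "EPSG:4326")
final <- cbind(occ_clean, extract(env, pts))
```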
This data set contains QA/QC-ed (Quality Assurance and Quality Control) water level data for the PLM1 and PLM6 wells. PLM1 and PLM6 are location identifiers used by the Watershed Function SFA project for two groundwater monitoring wells along an elevation gradient located along the lower montane life zone of a hillslope near the Pumphouse location at the East River Watershed, Colorado, USA. These wells are used to monitor subsurface water and carbon inventories and fluxes, and to determine the seasonally dependent flow of groundwater under the PLM hillslope. The downslope flow of groundwater in combination with data on groundwater chemistry (see related references) can be used to estimate rates of solute export from the hillslope to the floodplain and river. QA/QC analysis of measured groundwater levels in monitoring wells PLM-1 and PLM-6 included identification and flagging of duplicated timestamps, gap filling of missing timestamps and water levels, and removal of abnormal/bad values and outliers in measured water levels. The QA/QC analysis also tested the application of different QA/QC methods and the development of regular (5-minute, 1-hour, and 1-day) time series datasets, which can serve as a benchmark for testing other QA/QC techniques and will be applicable for ecohydrological modeling. The package includes a Readme file, one R code file used to perform QA/QC, a series of 8 data csv files (six QA/QC-ed regular time series datasets of varying intervals (5-min, 1-hr, 1-day) and two files with QA/QC flagging of original data), and three files for the reporting format adoption of this dataset (InstallationMethods, file-level metadata (flmd), and data dictionary (dd) files). QA/QC-ed data herein were derived from the original/raw data publication available at Williams et al., 2020 (DOI: 10.15485/1818367). For more information about running the R code file (10.15485_1866836_QAQC_PLM1_PLM6.R) to reproduce the QA/QC output files, see the README (QAQC_PLM_readme.docx). This dataset replaces the previously published raw data time series and is the final groundwater data product for the PLM wells in the East River. Complete metadata information on the PLM1 and PLM6 wells is available in a related dataset on ESS-DIVE: Varadharajan C, et al. (2022), https://doi.org/10.15485/1660962. These data products are part of the Watershed Function Scientific Focus Area collection effort to further scientific understanding of biogeochemical dynamics from genome to watershed scales.
2022/09/09 Update: Converted data files using ESS-DIVE’s Hydrological Monitoring Reporting Format. With the adoption of this reporting format, three new files (v1_20220909_flmd.csv, v1_20220909_dd.csv, and InstallationMethods.csv) were added. The file-level metadata file (v1_20220909_flmd.csv) contains information specific to the files contained within the dataset. The data dictionary file (v1_20220909_dd.csv) contains definitions of column headers and other terms across the dataset. The installation methods file (InstallationMethods.csv) contains a description of methods associated with installation and deployment at the PLM1 and PLM6 wells. Additionally, eight data files were re-formatted to follow the reporting format guidance (er_plm1_waterlevel_2016-2020.csv, er_plm1_waterlevel_1-hour_2016-2020.csv, er_plm1_waterlevel_daily_2016-2020.csv, QA_PLM1_Flagging.csv, er_plm6_waterlevel_2016-2020.csv, er_plm6_waterlevel_1-hour_2016-2020.csv, er_plm6_waterlevel_daily_2016-2020.csv, QA_PLM6_Flagging.csv).
The major changes to the data files include the addition of header_rows above the data containing metadata about the particular well, units, and sensor description.
2023/01/18 Update: Dataset updated to include additional QA/QC-ed water level data up until 2022-10-12 for ER-PLM1 and 2022-10-13 for ER-PLM6. Reporting-format-specific files (v2_20230118_flmd.csv, v2_20230118_dd.csv, v2_20230118_InstallationMethods.csv) were updated to reflect the additional data. The R code file (QAQC_PLM1_PLM6.R) was added to replace the previously uploaded HTML files and enable execution of the associated code. The R code file (QAQC_PLM1_PLM6.R) and ReadMe file (QAQC_PLM_readme.docx) were revised to clarify where the original data were retrieved from and to remove local file paths.
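A minimal sketch of the kinds of QA/QC steps described above (duplicate-timestamp flagging, regularizing to a 5-minute time step, and recoding bad values to NA); this is not the published QAQC_PLM1_PLM6.R script, and the column names and threshold are assumptions.

```r
library(dplyr)

raw <- read.csv("er_plm1_waterlevel_2016-2020.csv")      # header_rows may need to be skipped
raw$datetime <- as.POSIXct(raw$datetime, tz = "UTC")     # column names assumed

# Flag and drop duplicated timestamps
qc <- raw |> mutate(dup_flag = duplicated(datetime)) |> filter(!dup_flag)

# Build a regular 5-minute index; timestamps missing from the record become NA water levels
grid <- data.frame(datetime = seq(min(qc$datetime), max(qc$datetime), by = "5 min"))
qc5  <- left_join(grid, qc, by = "datetime")

# Recode physically implausible water levels to NA (threshold is illustrative)
qc5$water_level[qc5$water_level < 0] <- NA
```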
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The digital Suburb/Locality Boundaries and their legal identifiers have been derived from the cadastre data from each state and territory jurisdiction and are available below.

Suburb/Locality Boundaries are part of Geoscape Administrative Boundaries, which is built and maintained by Geoscape Australia using authoritative government data. Further information about contributors to Administrative Boundaries is available here.

The full Administrative Boundaries dataset comprises seven Geoscape products:

* Localities
* Local Government Areas (LGAs)
* Wards
* Australian Bureau of Statistics (ABS) Boundaries
* Electoral Boundaries
* State Boundaries
* Town Points

Updated versions of Administrative Boundaries are published on a quarterly basis. Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.

** There were no updates in the November 2024 release **

** Notable changes in the August 2021 release: **

* The Localities, Local Government Areas and Wards products have been redesigned to provide a new flattened data model, offering a simpler, easier-to-use structure. This will also:
  - change the composition of identifiers in these products.
  - provide state identifiers as an abbreviation (eg. NSW) rather than a code.
  - remove the static SA “Hundreds” data from Localities.
* More information on the changes to Localities, Local Government Areas and Wards is available here.
* The Australian Bureau of Statistics (ABS) Boundaries will be updated to include the 2021 Australian Statistical Geography Standard (ASGS).
* Further information on the August changes to Geoscape datasets is available here.

Further information on Administrative Boundaries, including FAQs on the data, is available here through Geoscape Australia’s network of partners. They provide a range of commercial products based on Administrative Boundaries, including software solutions, consultancy and support.

Note: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.
The Australian Government has negotiated the release of Administrative Boundaries to the whole economy under an open CC BY 4.0 licence.

Users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).

Users must also note the following attribution requirements:

Preferred attribution for the Licensed Material:
Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).

Preferred attribution for Adapted Material:
Incorporates or developed using Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These contiguous suitable area data are based on the land suitability data from the Roper River Water Resource Assessment in the NT (ROWRA). To address operational farming constraints imposed by parcels of suitable land being too small or oddly shaped according to natural variability of land, or physical limits on suitable farming land parcel sizes, contiguous suitable area data were generated. These contiguous suitable area data are based on crop-specific minimum areas and minimum length/width of contiguous suitable land and are produced as standalone data products for all crop groups. The rules are provided for download. The data were generated to remove the component of landscape complexity that natural distributions of soil and land variability and specific crop requirements produce. These data provide improved land evaluation information to identify opportunities and promote detailed investigation for a range of sustainable development options. The land suitability evaluation methods used to produce the underlying data are a modification of the Food and Agriculture Organisation (FAO) land evaluation approach. The land suitability analysis is described in full in the CSIRO ROWRA published report ‘Soils and land suitability for the Roper catchment, Northern Territory’, a technical report from the CSIRO Roper River Water Resource Assessment to the Government of Australia. The naming convention for these data is ‘crop group’ underscore ‘major crop’ underscore ‘season code’ underscore ‘irrigation type code’ underscore ‘catchment code’ underscore ‘data type’, e.g. ‘CG7_CottonGrains_D_Fw_R_ContigArea’ is cotton and grain crops grown in the dry season with furrow irrigation in the Roper catchment, contiguous suitable area data. The codes for season are: W – wet season; D – dry season; P – perennial. The codes for irrigation type are: S – overhead spray irrigation; T – trickle irrigation; Fd – flood irrigation; Fw – furrow irrigation; R – rainfed. It is important to emphasize that this is a regional-scale assessment: further data collection and detailed soil physical, chemical and nutrient analyses would be required to plan development at a scheme, enterprise or property scale. Several limitations that may have a bearing on land suitability were out of scope and not assessed as part of this activity (refer to the report); these limitations include biophysical and socio-cultural factors. For example, these land suitability raster datasets do not include consideration of the licensing of water, flood risk, contiguous land, risk of irrigation-induced secondary salinity, or land tenure and other legislative controls. Some of these may be addressed elsewhere in ROWRA, e.g. flooding was investigated by the Earth observation remote sensing group in the surface water activity. The Roper River Water Resource Assessment provides a comprehensive overview and integrated evaluation of the feasibility of aquaculture and agriculture development in the Roper catchment, NT, as well as the ecological, social and cultural (Indigenous water values, rights and aspirations) impacts of development. Lineage: These contiguous suitable area raster datasets have been generated from a range of inputs and processing steps; an overview follows. For more information refer to the CSIRO ROWRA published report ‘Soils and land suitability for the Roper catchment, Northern Territory’, a technical report from the CSIRO Roper River Water Resource Assessment to the Government of Australia.
1. Collated existing data (relating to soils, climate, topography, natural resources and remote sensing, in various formats: reports, spatial vector, spatial raster, etc.).
2. Selection of additional soil and land attribute site data locations by a conditioned Latin hypercube statistical sampling method applied across the covariate data space.
3. Fieldwork was carried out to collect new attribute data and soil samples for analysis, and to build an understanding of geomorphology and landscape processes.
4. Database analysis was performed to extract the data to the specific selection criteria required for the attribute to be modelled.
5. The R statistical programming environment was used for the attribute computing. Models were built from selected input data and covariate data using predictive learning from a Random Forest approach implemented in the ranger R package (a minimal sketch follows this list).
6. Creation of Digital Soil Mapping (DSM) attribute raster datasets. DSM data are geo-referenced datasets, generated from field observations and laboratory data coupled with environmental covariate data through quantitative relationships. It applies pedometrics: the use of mathematical and statistical models that combine information from soil observations with information contained in correlated environmental variables, remote sensing images and some geophysical measurements.
7. Land management options were chosen and suitability rules created for DSM attributes.
8. Suitability rules were run to produce limitation subclass datasets using a modification of the FAO methods.
9. Final suitability data were created for all land management options.
10. Companion predicted reliability data were produced from the 500 individual Random Forest attribute models created.
11. Quality assessment (QA) of these land suitability data was conducted by two methods. Method 1: statistical (quantitative) assessment of the "reliability" of the spatial output data, presented as a raster of the Confusion Index. Method 2: collecting independent external validation site data combined with on-ground expert (qualitative) examination of outputs during validation field trips. A two-week validation field trip was conducted using a new validation site set, produced by a random sampling design based on conditioned Latin hypercube sampling. The modelled land suitability value was assessed against the actual on-ground value. These results are published in the report referenced above.
12. A two-step process was developed to simplify the data and was applied across the suitability data of the catchment. First, the five suitability classes were aggregated to two: ‘suitable’ for suitability classes 1, 2 and 3, or ‘not suitable’ for classes 4 and 5. Second, to further simplify the data, and to reflect the on-ground spatial constraints of farming practices, isolated one or two pixels of ‘not suitable’ contained in larger ‘suitable’ areas were reclassified as ‘suitable’.
13. For each crop group, a minimum area and width were defined based on knowledge of farming practices. Depending on the possible land use, minimum areas were deemed to be 2.5 ha, 5 ha, 10 ha or 25 ha and minimum widths 80 m or 120 m (rules are provided for download).
14. For each crop rule, the minimum width was imposed by removing those parts of the suitable area that are narrower (in any direction) than the required minimum width. The remaining groups of connected cells were then tested to see if they met the required minimum area and removed if they did not.
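A minimal sketch of the Random Forest attribute modelling in step 5 above; the file, object and column names are illustrative assumptions, not the assessment's own code.

```r
library(ranger)

# One row per field site: the observed soil/land attribute plus covariate values
# extracted at the site location (file and column names are hypothetical).
site_data <- read.csv("site_attribute_covariates.csv")

rf <- ranger(attribute_value ~ ., data = site_data, num.trees = 500,
             importance = "impurity")

# Predict the attribute for every grid cell from its covariate values,
# then write the predictions back out as a DSM raster.
grid_covariates <- read.csv("grid_covariates.csv")
dsm_pred <- predict(rf, data = grid_covariates)$predictions
```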
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Existing panel data methods remove unobserved individual effects before change point estimation through data transformations such as first-differencing. In this paper, we show that multiple change points can be consistently estimated in short panels via ordinary least squares. Since no data variation is removed before change point estimation, our method has better small-sample properties compared to first-differencing methods. We also propose two tests that identify whether the change points found by our method originate in the slope parameters or in the covariance of the regressors with individual effects. We illustrate our method via modeling the environmental Kuznets curve and the US house price expectations after the financial crisis.
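As a toy illustration of the idea (not the paper's estimator or its multiple-change-point procedure), a single break date in a pooled panel regression can be located by minimizing the OLS sum of squared residuals over candidate break dates:

```r
set.seed(1)
N <- 50; T_ <- 10
d <- expand.grid(id = 1:N, t = 1:T_)
d$x <- rnorm(nrow(d))
d$y <- ifelse(d$t <= 6, 1.0, 2.0) * d$x + rnorm(nrow(d))   # slope changes after t = 6

# OLS sum of squared residuals for each candidate break date
ssr <- sapply(2:(T_ - 1), function(tau) {
  sum(resid(lm(y ~ x * I(t > tau), data = d))^2)
})
tau_hat <- (2:(T_ - 1))[which.min(ssr)]   # estimated break date
```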
https://www.bco-dmo.org/dataset/2317/license
Meteorology and sea surface temperature (MET) 1 minute data from eight R/V Oceanus cruises in the Gulf of Maine and Georges Bank area during 1998 access_formats=.htmlTable,.csv,.json,.mat,.nc,.tsv,.esriCsv,.geoJson acquisition_description=The sea surface temperature as measured by the hull sensor is not shown since the sea surface temperature as measured via the engine inlet (field name is temp_ss1) is more accurate. awards_0_award_nid=54610 awards_0_award_number=unknown GB NSF awards_0_funder_name=National Science Foundation awards_0_funding_acronym=NSF awards_0_funding_source_nid=350 awards_0_program_manager=David L. Garrison awards_0_program_manager_nid=50534 awards_1_award_nid=54626 awards_1_award_number=unknown GB NOAA awards_1_funder_name=National Oceanic and Atmospheric Administration awards_1_funding_acronym=NOAA awards_1_funding_source_nid=352 cdm_data_type=Other comment=Emet 1 minute data starting 1996. rcg 2/10/1998 Remove SST_engine_intake parameter from display 7/8/1998 rcg File: OC317W.DAT Output of OCMETA.FOR(4/14/99) Change SSTEMP to SSTEMP3. 5/24/1999 rcg Conventions=COARDS, CF-1.6, ACDD-1.3 data_source=extract_data_as_tsv version 2.3 19 Dec 2019 defaultDataQuery=&time<now doi=10.1575/1912/bco-dmo.2317.1 Easternmost_Easting=-65.2285 geospatial_lat_max=43.8375 geospatial_lat_min=39.6182 geospatial_lat_units=degrees_north geospatial_lon_max=-65.2285 geospatial_lon_min=-71.0428 geospatial_lon_units=degrees_east infoUrl=https://www.bco-dmo.org/dataset/2317 institution=BCO-DMO instruments_0_acronym=TSG instruments_0_dataset_instrument_description=Thermosalinograph used to obtain a continuous record of sea surface temperature and salinity. instruments_0_dataset_instrument_nid=4226 instruments_0_description=A thermosalinograph (TSG) is used to obtain a continuous record of sea surface temperature and salinity. On many research vessels the TSG is integrated into the ship's underway seawater sampling system and reported with the underway or alongtrack data. instruments_0_instrument_external_identifier=https://vocab.nerc.ac.uk/collection/L05/current/133/ instruments_0_instrument_name=Thermosalinograph instruments_0_instrument_nid=470 instruments_0_supplied_name=Thermosalinograph keywords_vocabulary=GCMD Science Keywords metadata_source=https://www.bco-dmo.org/api/dataset/2317 Northernmost_Northing=43.8375 param_mapping={'2317': {'lat': 'master - latitude', 'lon': 'master - longitude', 'press_bar': 'flag - depth'}} parameter_source=https://www.bco-dmo.org/mapserver/dataset/2317/parameters people_0_affiliation=Woods Hole Oceanographic Institution people_0_affiliation_acronym=WHOI people_0_person_name=Dr Richard Payne people_0_person_nid=50490 people_0_role=Principal Investigator people_0_role_type=originator people_1_affiliation=Woods Hole Oceanographic Institution people_1_affiliation_acronym=WHOI BCO-DMO people_1_person_name=Robert C. Groman people_1_person_nid=50380 people_1_role=BCO-DMO Data Manager people_1_role_type=related project=GB projects_0_acronym=GB projects_0_description=The U.S. GLOBEC Georges Bank Program is a large multi- disciplinary multi-year oceanographic effort. The proximate goal is to understand the population dynamics of key species on the Bank - Cod, Haddock, and two species of zooplankton (Calanus finmarchicus and Pseudocalanus) - in terms of their coupling to the physical environment and in terms of their predators and prey. 
The ultimate goal is to be able to predict changes in the distribution and abundance of these species as a result of changes in their physical and biotic environment as well as to anticipate how their populations might respond to climate change. The effort is substantial, requiring broad-scale surveys of the entire Bank, and process studies which focus both on the links between the target species and their physical environment, and the determination of fundamental aspects of these species' life history (birth rates, growth rates, death rates, etc). Equally important are the modelling efforts that are ongoing which seek to provide realistic predictions of the flow field and which utilize the life history information to produce an integrated view of the dynamics of the populations. The U.S. GLOBEC Georges Bank Executive Committee (EXCO) provides program leadership and effective communication with the funding agencies. projects_0_geolocation=Georges Bank, Gulf of Maine, Northwest Atlantic Ocean projects_0_name=U.S. GLOBEC Georges Bank projects_0_project_nid=2037 projects_0_project_website=http://globec.whoi.edu/globec_program.html projects_0_start_date=1991-01 sourceUrl=(local files) Southernmost_Northing=39.6182 standard_name_vocabulary=CF Standard Name Table v55 subsetVariables=year,depth_w,depth_cs,ed_lw,temp_ss1,temp_ss5,numb_records version=1 Westernmost_Easting=-71.0428 xml_source=osprey2erddap.update_xml() v1.3
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Annual hourly air quality and meteorological data by monitoring site for the 2014 calendar year. For more information on air quality, including live air data, please visit environment.des.qld.gov.au/air.

Data resolution: One-hour average values (one-hour sum for rainfall)
Data row timestamp: Start of averaging period
Missing data/not monitored: Blank cell
Calm conditions: No hourly average wind direction is reported when the hourly average wind speed is zero
Barometric pressure: Values are at monitoring station elevation, not corrected to mean sea level
Daily zero/span response check: Automated instrument zero/span response checks are conducted daily between midnight and 1am at Queensland Government sites (can differ at industry sites). Where this takes place an ambient hourly value cannot be reported.
Sampling height: Four metres above ground (unless otherwise indicated)

PLEASE NOTE:

* The Townsville Coast Guard 2014 air quality monitoring site data was updated on 26/10/2015 due to the wind direction sensor being misaligned; the reported wind direction values have now been corrected.
* The Auckland Point 2014 air quality monitoring site data was updated on 24/04/2018 to remove invalid wind data due to a sensor fault.
This dataset contains polylines depicting non-woodland linear tree and shrub features in Cornwall and much of Devon, derived from lidar data collected by the Tellus South West project. Data from a lidar (light detection and ranging) survey of South West England were used with existing open-source GIS datasets to map non-woodland linear features consisting of woody vegetation. The output dataset is the product of several steps of filtering and masking the lidar data using GIS landscape feature datasets available from the Tellus South West project (digital terrain model (DTM) and digital surface model (DSM)), the Ordnance Survey (OS VectorMap District and OpenMap Local, used to remove buildings) and the Forestry Commission (National Forest Inventory Great Britain 2015, used to remove woodland parcels). The dataset was tiled as 20 x 20 km shapefiles, coded by the bottom-left 10 km hectad name. Ground-truthing suggests an accuracy of 73.2% for hedgerow height classes.
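A minimal sketch of the kind of filtering and masking described above: derive a canopy height model from the DSM and DTM, then mask out buildings and woodland parcels before extracting woody features. The file names and the height threshold are illustrative assumptions, not the project's own workflow.

```r
library(terra)

dsm <- rast("tellus_dsm.tif")
dtm <- rast("tellus_dtm.tif")
chm <- dsm - dtm                                  # canopy height model

buildings <- vect("os_openmap_buildings.shp")     # OS buildings to remove
woodland  <- vect("fc_nfi_woodland_2015.shp")     # FC woodland parcels to remove

chm_masked <- mask(chm, buildings, inverse = TRUE)
chm_masked <- mask(chm_masked, woodland, inverse = TRUE)

# Keep only cells tall enough to represent trees/shrubs (threshold is illustrative)
woody <- classify(chm_masked, rbind(c(-Inf, 1, NA), c(1, Inf, 1)))
```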
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The raw data file is available online for public access (https://data.ontario.ca/dataset/lake-simcoe-monitoring). Download the 1980-2019 csv files and open the file named "Simcoe_Zooplankton&Bythotrephes.csv". Copy and paste the zooplankton sheet into a new excel file called "Simcoe_Zooplankton.csv". The column ZDATE in the excel file needs to be switched from GENERAL to SHORT DATE so that the dates in the ZDATE column read "YYYY/MM/DD". Save as .csv in the appropriate R folder. The data file "simcoe_manual_subset_weeks_5" is the raw data that has been subset for the main analysis of the article using the .R file "Simcoe MS - 5 Station Subset Data". The .csv file produced from this must then be manually edited to remove data points that do not have 5 stations per sampling period, as well as by combining data points that should fall into a single week. The "simcoe_manual_subset_weeks_5.csv" file is then used for the calculation of variability, stabilization, asynchrony, and Shannon Diversity for each year in the .R file "Simcoe MS - 5 Station Calculations". The final .R file "Simcoe MS - 5 Station Analysis" contains the final statistical analyses as well as code to reproduce the original figures. Data and code for main and supplementary analyses are also available on GitHub (https://github.com/reillyoc/ZPseasonalPEs).
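A minimal sketch (not the article's own scripts) of reading the prepared file into R and parsing the ZDATE column before running "Simcoe MS - 5 Station Subset Data"; the derived year/week columns are assumptions for illustration.

```r
zoop <- read.csv("Simcoe_Zooplankton.csv", stringsAsFactors = FALSE)
zoop$ZDATE <- as.Date(zoop$ZDATE, format = "%Y/%m/%d")    # dates formatted as YYYY/MM/DD
zoop$year  <- as.integer(format(zoop$ZDATE, "%Y"))
zoop$week  <- as.integer(format(zoop$ZDATE, "%U"))        # week index used when grouping sampling periods
```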
Seven in-stream HOBO pressure transducers and one reference HOBO pressure transducer have been deployed within the Lake Sunapee, NH, USA watershed. Six of the transducers have been in operation since 2010 and an additional transducer was added in 2016. The transducers record data every 15 minutes, and data are downloaded approximately three times per year (early spring, mid summer and late fall). Shortly after download, the data are processed to estimate stream depth using HOBOware's Barometric Compensation Assistant and converted to a .csv file in HOBOware. The data have been QA/QC'd to recode obviously errant data to NA using the R programming language. No data transformation has occurred beyond basic QA/QC of the data to remove known data issues and obviously errant data. The barometric pressure data from the reference transducer located on land are also included in this data package.
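A minimal sketch of the basic QA/QC described above (recoding obviously errant values to NA); the file name, column names, and thresholds are assumptions, not the project's own script.

```r
depth <- read.csv("hoboware_stream_depth.csv", stringsAsFactors = FALSE)
depth$datetime <- as.POSIXct(depth$datetime, tz = "EST")

# Recode impossible values (negative depths) to NA
depth$depth_m[depth$depth_m < 0] <- NA

# Recode isolated spikes to NA (jump threshold in metres is illustrative)
jump <- c(NA, abs(diff(depth$depth_m)))
depth$depth_m[which(jump > 0.5)] <- NA
```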
These are easements concerning properties adjoining railways, established in areas defined by the Act of 15 July 1845 on the Policing of Railways and by Article 6 of the Decree of 30 October 1935, as amended, creating visibility easements on public roads, namely:
— prohibition on erecting any construction, other than a boundary wall, within a distance of two metres from a railway (art. 5 of the Law of 15 July 1845);
— prohibition, without prior authorisation, of excavations within a zone equal to the vertical height of a railway embankment of more than three metres, measured from the foot of the slope (art. 6 of the Law of 15 July 1845);
— prohibition on placing thatch coverings, stacks of straw and hay, and any other deposit of flammable materials, at a distance of less than 20 metres from a railway worked by steam engines, measured from the foot of the slope (art. 7 of the Law of 15 July 1845);
— prohibition on depositing stones or non-flammable objects, without prior prefectural authorisation, less than five metres from a railway (art. 8 of the Law of 15 July 1845);
— visibility easements at the crossing of a public road and a railway (art. 6 of the Decree-Law of 30 October 1935 and art. R. 114-6 of the Highway Code), easements defined by a clearance plan drawn up by the authority managing the highway and which may include, as the case may be, in accordance with Article 2 of the decree:
• the obligation to remove boundary walls or replace them with railings, to remove obstructing plantings, and to lower and maintain the ground and any superstructures at a level at most equal to the level determined by the above-mentioned clearance plan;
• the absolute prohibition on building, erecting fences, filling, planting or making any installations above the level set by the clearance plan.
Texts in force: Law of 15 July 1845 on the Railway Police — Title I: measures relating to the conservation of the railway (Articles 1 to 11); Highway Code (created by Act No. 89-413 and Decree No. 89-631), in particular the following articles:
— L. 123-6 and R. 123-3 relating to alignment on national roads;
— L. 114-1 to L. 114-6 relating to visibility easements at grade crossings;
— R. 131-1 et seq. and R. 141-1 et seq. for the implementation of clearance plans on departmental or municipal roads.
The linear entities of these data relate to the use of certain resources and facilities and affect land use. As the easements were collected from third parties, DDT-77 cannot guarantee the completeness and accuracy of the plotting of these easements on a large-scale map.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.
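For context, the matrix representations that beachmat abstracts over can be created from user-level R as follows (an illustrative sketch of the object types, not the C++ API itself):

```r
library(Matrix)
library(HDF5Array)

counts_dense  <- matrix(rpois(1e4, lambda = 2), nrow = 100)          # ordinary dense matrix
counts_sparse <- Matrix(counts_dense, sparse = TRUE)                 # sparse (dgCMatrix) representation
counts_hdf5   <- writeHDF5Array(counts_dense,
                                filepath = tempfile(fileext = ".h5"),
                                name = "counts")                     # file-backed representation

# Bioconductor functions whose C++ code is written against beachmat can accept
# any of these representations without first copying the data into a dense matrix.
```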