https://spdx.org/licenses/CC0-1.0.html
Animal ecologists often collect hierarchically structured data and analyze these with linear mixed-effects models. Specific complications arise when the effect sizes of covariates vary on multiple levels (e.g., within vs. among subjects). Mean-centering of covariates within subjects offers a useful approach in such situations, but it is not without problems. A statistical model represents a hypothesis about the underlying biological process. Mean-centering within clusters assumes that the lower-level responses (e.g., within subjects) depend on the deviation from the subject mean (relative values) rather than on absolute values of the covariate. This may or may not be biologically realistic. We show that a mismatch between the nature of the generating (i.e., biological) process and the form of the statistical analysis produces major conceptual and operational challenges for empiricists. We explored the consequences of such mismatches by simulating data with three response-generating processes that differ in the source of correlation between a covariate and the response. These data were then analyzed with three different analysis equations. We asked how robustly different analysis equations estimate key parameters of interest and under which circumstances biases arise. Mismatches between generating and analytical equations created several intractable problems for estimating key parameters. The most widely misestimated parameter was the among-subject variance in response. No single analysis equation was robust in estimating all parameters generated by all equations. Importantly, even when the response-generating and analysis equations matched mathematically, bias in some parameters arose when sampling across the range of the covariate was limited. Our results have general implications for how we collect and analyze data.
They also remind us more generally that conclusions from statistical analyses of data are conditional on a hypothesis, sometimes implicit, about the process(es) that generated the attributes we measure. We discuss strategies for analyzing real data in the face of uncertainty about the underlying biological process. Methods: All data were generated through simulations. Included with this submission are a ReadMe file with general descriptions of the data files, a code file (in R Markdown) containing the R code for the simulations and analyses (which will generate new datasets with the same parameters), and the analyzed results in the data files archived here. These data files form the basis for all results presented in the published paper. The code file has more detailed descriptions of each file of analyzed results.
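The within-subject mean-centering the abstract describes can be sketched in a few lines. This is a minimal illustration only: the subject counts, sample sizes, and covariate values below are invented for the example, and the authors' actual implementation is the archived R code, not this sketch.

```python
import numpy as np

# Illustrative only: subjects, sample sizes, and values are invented.
rng = np.random.default_rng(1)
n_subjects, n_obs = 5, 10
subject = np.repeat(np.arange(n_subjects), n_obs)
x = rng.normal(loc=subject * 2.0, scale=1.0)  # covariate differs among subjects

# Within-subject mean-centering splits x into a between part (subject means)
# and a within part (deviations from each subject's own mean).
subject_means = np.array([x[subject == j].mean() for j in range(n_subjects)])
x_between = subject_means[subject]  # constant within each subject
x_within = x - x_between            # deviation from own subject's mean

# The two components reconstruct x, and the within part sums to zero
# inside every subject -- the "relative" covariate the abstract refers to.
assert np.allclose(x_between + x_within, x)
assert np.allclose([x_within[subject == j].sum() for j in range(n_subjects)], 0.0)
```

Fitting separate slopes to `x_within` and `x_between` is what lets a mixed model distinguish within-subject from among-subject effects; the abstract's point is that this decomposition is itself a biological hypothesis.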
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As explained in the main text, y_ij is an observed outcome of individual i in group j, and x_ij is an individual-level social capital score of individual i in group j. Furthermore, x̄_j is a self-included measure that denotes the mean of the social capital scores of all individuals in group j. It is calculated as x̄_j = (1/n_j) Σ_i x_ij, where n_j is the size of group j. Similarly, x̄_(-i)j is a self-excluded measure denoting the mean of the social capital scores of all individuals except individual i in group j, calculated as x̄_(-i)j = (Σ_k x_kj − x_ij) / (n_j − 1).
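A minimal numerical illustration of the two group-mean measures (the scores below are made up):

```python
import numpy as np

# Hypothetical social-capital scores for one group j.
x = np.array([2.0, 4.0, 6.0, 8.0])
n_j = len(x)

# Self-included mean: average over all n_j members of the group.
mean_incl = x.sum() / n_j

# Self-excluded (leave-one-out) mean for each individual i:
# average over the other n_j - 1 members.
mean_excl = (x.sum() - x) / (n_j - 1)

print(mean_incl)   # 5.0
print(mean_excl)   # [6.0, 5.33..., 4.66..., 4.0]
```

Note that the self-excluded mean varies across individuals within the group, while the self-included mean does not; this is why the two measures behave differently as covariates.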
The coastal dunes, beaches, and inner neritic zone of the Merrimack Embayment constitute a petrologic province. In addition to heavy mineral analyses, grain size statistics were generated on most of the samples. Neritic and beach sediments can be differentiated using scatter plots of statistics, but statistical parameters are ineffective in differentiating between river and neritic sediments.
https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
ERA-Interim is the latest European Centre for Medium-Range Weather Forecasts (ECMWF) global atmospheric reanalysis of the period 1979 to August 2019. This follows on from the ERA-15 and ERA-40 re-analysis projects.
The dataset includes monthly mean of daily mean vertical integral level data on a reduced N256 Gaussian grid.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the Baltic and North Sea Climatology (BNSC) for the Baltic Sea and the North Sea in the range 47°N to 66°N and 15°W to 30°E. It is the follow-up project to the KNSC climatology. The climatology was first made available to the public in March 2018 by ICDC and is published here in a slightly revised version 2. It contains the monthly averages of mean air pressure at sea level, air temperature, and dew point temperature at 2 meter height. It is available on a 1° x 1° grid for the period from 1950 to 2015. For the calculation of the mean values, all available quality-controlled ship observations and buoy measurements of the DWD (German Meteorological Service) during this period were taken into account. Additional dew point values were calculated from relative humidity and air temperature where available. Climatologies were calculated for the WMO standard periods 1951-1980, 1961-1990, 1971-2000 and 1981-2010 (monthly mean values). As a prerequisite for the calculation of a 30-year climatology, at least 25 out of 30 (five-sixths) valid monthly means had to be present in the respective grid box. For the long-term climatology from 1950 to 2015, at least four-fifths of the monthly means had to be valid. Two methods were used (in combination) to calculate the monthly averages, to account for the small number of measurements per grid box and their uneven spatial and temporal distribution: 1. For parameters with a detectable annual cycle in the data (air temperature, dew point temperature), a 2nd-order polynomial was fitted to the data to reduce the variation within a month and thus the uncertainty of the calculated averages. In addition, for the mean value of air temperature, the daily temperature cycle was removed from the data. In the case of air pressure, which has no annual cycle, version 2 allows no data gaps longer than 14 days per month and grid box for the calculation of a monthly mean and standard deviation.
This method differs from KNSC and BNSC version 1, where mean and standard deviation were calculated from 6-day window means. 2. If the number of observations fell below a certain threshold (20 observations per grid box and month for air temperature and dew point temperature, 500 per box and month for air pressure), data from the adjacent boxes were used for the calculation. The neighbouring boxes were used in two steps (the nearest 8 boxes, and if the number was still below the threshold, the next surrounding 16 boxes) to calculate the mean value of the center box. Thus, the spatial resolution of the parameters is reduced at certain points: instead of 1° x 1°, when neighbouring values are taken into account, data from an area of up to 5° x 5° can be averaged into a grid box value. This was used especially for air pressure, where the 24 values of the neighbouring boxes were included in the averaging for most grid boxes. The mean value, the number of measurements, the standard deviation, and the number of grid boxes used to calculate the mean values are available as parameters in the products. The calculated monthly and annual means were allocated to the centers of the grid boxes: latitudes 47.5, 48.5, ...; longitudes -14.5, -13.5, ... In order to remove any existing values over land, a land-sea mask was used, which is also provided at 1° x 1° resolution. In this version 2 of the BNSC, a slightly different database was used than for the KNSC, which resulted in small changes (less than 1 K) in the means and standard deviations of the 2-meter air temperature and dew point temperature. The changes in mean sea level pressure values and the associated standard deviations are in the range of a few hPa compared to the KNSC. The parameter names and units have been adjusted to meet the CF 1.6 standard.
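The two-step widening of the neighbourhood (the nearest 8 boxes, then the surrounding 16) can be sketched as follows. The grid values, observation counts, and threshold below are illustrative only, not the BNSC production code:

```python
import numpy as np

# Toy per-box observation counts and summed observations (values invented).
counts = np.array([[30,  5, 40],
                   [10,  8, 25],
                   [50, 12, 33]])
sums = counts * 10.0   # pretend every observation equals 10.0
threshold = 20         # illustrative minimum observations per box and month

def box_mean(i, j, radius=0):
    """Mean of box (i, j), pooling a (2*radius+1)^2 neighbourhood."""
    lo_i, hi_i = max(i - radius, 0), min(i + radius + 1, counts.shape[0])
    lo_j, hi_j = max(j - radius, 0), min(j + radius + 1, counts.shape[1])
    n = counts[lo_i:hi_i, lo_j:hi_j].sum()
    return sums[lo_i:hi_i, lo_j:hi_j].sum() / n, n

def robust_mean(i, j):
    """Widen the neighbourhood stepwise (8 boxes, then 24) until the
    pooled observation count reaches the threshold."""
    for radius in (0, 1, 2):
        mean, n = box_mean(i, j, radius)
        if n >= threshold:
            return mean, radius
    return mean, radius

print(robust_mean(0, 1))   # centre box has only 5 obs, so the radius widens
```

The returned radius records how far the neighbourhood had to grow, mirroring the "number of grid boxes used" parameter stored in the BNSC products.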
https://www.icpsr.umich.edu/web/ICPSR/studies/2824/terms
CrimeStat III is a spatial statistics program for the analysis of crime incident locations, developed by Ned Levine and Associates under the direction of Ned Levine, PhD, and funded by grants from the National Institute of Justice (grants 1997-IJ-CX-0040, 1999-IJ-CX-0044, 2002-IJ-CX-0007, and 2005-IJ-CX-K037). The program is Windows-based and interfaces with most desktop GIS programs. Its purpose is to provide supplemental statistical tools to aid law enforcement agencies and criminal justice researchers in their crime mapping efforts. CrimeStat is used by many police departments around the country as well as by criminal justice and other researchers. The program inputs incident locations (e.g., robbery locations) in 'dbf', 'shp', ASCII, or ODBC-compliant formats using either spherical or projected coordinates. It calculates various spatial statistics and writes graphical objects to ArcGIS, MapInfo, Surfer for Windows, and other GIS packages. CrimeStat is organized into five sections:
Data Setup
- Primary file - a file of incident or point locations with X and Y coordinates. The coordinate system can be either spherical (lat/lon) or projected. Intensity and weight values are allowed. Each incident can have an associated time value.
- Secondary file - an associated file of incident or point locations with X and Y coordinates. The coordinate system has to be the same as the primary file. Intensity and weight values are allowed. The secondary file is used for comparison with the primary file in the risk-adjusted nearest neighbor clustering routine and the dual kernel interpolation.
- Reference file - a grid file that overlays the study area. Normally it is a regular grid, though irregular ones can be imported. CrimeStat can generate the grid if given the X and Y coordinates for the lower-left and upper-right corners.
- Measurement parameters - identifies the type of distance measurement (direct, indirect, or network) to be used and specifies parameters for the area of the study region and the length of the street network. CrimeStat III can utilize a network for linking points. Each segment can be weighted by travel time, travel speed, travel cost, or simple distance, which allows the interaction between points to be estimated more realistically.
Spatial Description
- Spatial distribution - statistics for describing the spatial distribution of incidents, such as the mean center, center of minimum distance, standard deviational ellipse, convex hull, or directional mean.
- Spatial autocorrelation - statistics for describing the amount of spatial autocorrelation between zones, including general spatial autocorrelation indices (Moran's I, Geary's C, and the Getis-Ord General G) and correlograms that calculate spatial autocorrelation at different distance separations (the Moran, Geary, and Getis-Ord correlograms). Several of these routines can simulate confidence intervals with a Monte Carlo simulation.
- Distance analysis I - statistics for describing properties of distances between incidents, including nearest neighbor analysis, linear nearest neighbor analysis, and Ripley's K statistic. There is also a routine that assigns the primary points to the secondary points, either on the basis of nearest neighbor or point-in-polygon, and then sums the results by the secondary point values.
- Distance analysis II - calculates matrices representing the distances between points in the primary file, between the primary and secondary points, and between either the primary or secondary file and the grid.
- 'Hot spot' analysis I - routines for conducting 'hot spot' analysis, including the mode, the fuzzy mode, hierarchical nearest neighbor clustering, and risk-adjusted nearest neighbor hierarchical clustering. The hierarchical nearest neighbor hot spots can be output as ellipses or convex hulls.
- 'Hot spot' analysis II - more routines for conducting hot spot analysis, including the Spatial and Temporal Analysis of Crime (STAC), K-means clustering, Anselin's local Moran, and the Getis-Ord local G statistics. The STAC and K-means hot spots can be output as ellipses or convex hulls. All of these routines can simulate confidence intervals with a Monte Carlo simulation.
Spatial Modeling
- Interpolation I - a single-variable kernel density estimation routine for producing a density surface from the incident locations.
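As a rough illustration of one of the distance statistics listed above, the nearest neighbor index compares the observed mean nearest-neighbor distance against the value expected under complete spatial randomness. This is a minimal sketch with made-up coordinates, not CrimeStat's implementation:

```python
import numpy as np

# Invented incident coordinates: 50 points in a 100 x 100 study area.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 100.0, size=(50, 2))
area = 100.0 * 100.0

# Observed mean distance from each point to its nearest neighbor.
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)       # a point is not its own neighbor
d_obs = d.min(axis=1).mean()

# Expected mean nearest-neighbor distance under complete spatial
# randomness: 0.5 / sqrt(density).
d_exp = 0.5 / np.sqrt(len(pts) / area)

nni = d_obs / d_exp   # < 1 suggests clustering, > 1 suggests dispersion
print(round(nni, 3))
```

CrimeStat additionally reports significance tests and edge corrections for this statistic; the sketch shows only the core ratio.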
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets are continuous parameter grids (CPG) of annual mean daily maximum air temperature data for the years 2000 through 2016 in the Pacific Northwest. Source temperature data was produced by the PRISM Climate Group at Oregon State University.
These datasets are continuous parameter grids (CPG) of monthly mean evapotranspiration data for March through September, years 2000 through 2015, in the Pacific Northwest. Source evapotranspiration data was produced using the operational Simplified Surface Energy Balance (SSEBop) model.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets are continuous parameter grids (CPG) of normal (average) first-of-month snow water equivalent data for March through August, averaged across all years, 2004 through 2016, in the Pacific Northwest. Source snow water equivalent data was produced by the Snow Data Assimilation System (SNODAS) at the National Snow and Ice Data Center.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Estimated mean intercepts when models are fitted to the reduced data set ED1.
https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
ERA-Interim is the latest European Centre for Medium-Range Weather Forecasts (ECMWF) global atmospheric reanalysis of the period 1979 to August 2019. This follows on from the ERA-15 and ERA-40 re-analysis projects.
The dataset includes monthly mean of daily mean potential temperature level data on a reduced N256 Gaussian grid.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The acoustic parameters of the acoustic calls were estimated automatically on each reference acoustic file using Kaleidoscope 5.6.2 (Wildlife Acoustics, Maynard, MA, USA):
- Fpmean: mean frequency of the spectrum within the selection (multiple acoustic call);
- Fppeak: frequency with the highest (peak) energy within the selection;
- Fmax: average maximum frequency of call pulses;
- Fc: frequency of the point at the end of the body of the call pulse, defined as the flattest part (lowest absolute slope) of the call;
- Fk: average knee frequency of calls;
- Dur (ms): average duration of call pulses within the selection;
- Fmin: average minimum frequency of call pulses;
- Fmean: average frequency of echolocation pulses.
If an individual had several acoustic reference files, the acoustic parameters were estimated on the average values.
https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
ERA-Interim is the latest European Centre for Medium-Range Weather Forecasts (ECMWF) global atmospheric reanalysis of the period 1979 to August 2019. This follows on from the ERA-15 and ERA-40 re-analysis projects.
The dataset includes synoptic monthly mean analysed surface level data on a reduced N256 Gaussian grid. Data are available at the 00, 06, 12 and 18 UTC analysis times.
Point Layers: For the Hawaii offshore region, modeled mean wind speed data on an approximately 2-km grid were provided by Vaisala/3TIER, a renewable energy consulting firm. Each 1.2-km BOEM aliquot grid cell was assigned the mean wind speed of the nearest 2-km Vaisala grid cell representing the majority of its area. The time-varying component of wind speed was calculated by analyzing the nearest MERRA (http://gmao.gsfc.nasa.gov/merra/) 17-year time-series record. The Weibull parameters were estimated from the MERRA wind speeds by computing the parameters of a Weibull distribution that has the same mean speed and wind energy as the observed MERRA data. These parameters were then scaled to match the Vaisala wind speeds assigned to each 1.2-km aliquot. This process created a long-term, monthly, and hourly (by month and for the whole 17-year period) Weibull representation of the 17-year wind speed for each aliquot. The resulting dataset is intended to provide broad estimates of wind speed variation for the purpose of identifying possible good wind energy sites. It is not intended to provide estimates of possible energy production for making offshore wind project investment or financing decisions in specific locations.
Explanation of Attributes: Results in the geodatabase are reported on the existing 1.2 km x 1.2 km aliquot grid defined by BOEM for the Hawaii offshore region. Wind speed statistics are reported at the center point of each aliquot grid cell but represent the mean values over the entire area of the cell. The hybrid MERRA/AWST data set delivered to BOEM is a geodatabase consisting of one layer for the long-term statistics, one layer for each month, and one polygon layer of aliquots covered by the data. The long-term shapefile includes mean wind speed and Weibull parameters that capture the long-term wind speed distribution of the entire 17-year MERRA time series. Each monthly shapefile contains mean wind speed and Weibull parameters for that month overall and for each hour of the day within that month. All times are in HST (UTC-10).
Polygon Layers: Polygons were created by building a raster grid from the point files using the closest approximate x,y distance for a BOEM aliquot block of 0.0175 degrees, reclassifying the raster into wind classes, and generating a polygon file from the reclassified raster.
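The moment-matching step described above (finding a Weibull distribution with the same mean speed and wind energy as the observed record) can be sketched by noting that the mean is c·Γ(1+1/k) and the mean cubed speed, proportional to wind energy, is c³·Γ(1+3/k); their ratio depends only on the shape k. This is a hedged sketch under those standard Weibull moment formulas, not the Vaisala/AWST production method:

```python
import math

def weibull_from_mean_and_energy(mean_v, mean_v3):
    """Find Weibull (k, c) matching a target mean speed and mean cubed
    speed (proportional to wind energy). Bisection on the shape k."""
    target = mean_v3 / mean_v**3                 # depends only on k
    ratio = lambda k: math.gamma(1 + 3/k) / math.gamma(1 + 1/k)**3
    lo, hi = 0.5, 20.0                           # ratio decreases as k grows
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ratio(mid) > target:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    c = mean_v / math.gamma(1 + 1/k)             # recover scale from the mean
    return k, c

# Round-trip check with a known distribution (k=2, c=8).
k0, c0 = 2.0, 8.0
m = c0 * math.gamma(1 + 1/k0)
e = c0**3 * math.gamma(1 + 3/k0)
k, c = weibull_from_mean_and_energy(m, e)
print(round(k, 3), round(c, 3))   # 2.0 8.0
```

Matching energy rather than variance is the natural choice for wind resource work, since available wind power scales with the cube of speed.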
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset represents the average time between events in which the Shields parameter (Shields, 1936) exceeds 0.25, based on a Peaks-Over-Threshold (POT) analysis. The Shields parameter (non-dimensional bed shear stress) value of 0.25 is assumed to be the threshold for creating disturbed patches. This value is several times larger than that required to initiate traction bedload transport (~0.05) and falls in the middle of the ripple and dune bedform stability field. It represents conditions when the seabed is highly mobile and patches of disturbed habitat are likely to be created. The unit for the dataset is days. Shields, A. 1936. Application of similarity principles and turbulence research to bed-load movement. Mitteilungen der Preussischen Versuchsanstalt für Wasserbau und Schiffbau 26: 5-24.
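The average time between exceedance events can be illustrated with a toy series: an event begins whenever the Shields parameter crosses the 0.25 threshold from below (so consecutive exceedance days count as one disturbance), and the statistic is the mean spacing of those event starts. The daily values below are invented; the actual dataset derives from modelled bed shear stress:

```python
import numpy as np

# Invented daily Shields-parameter series for illustration.
theta = np.array([0.1, 0.3, 0.4, 0.2, 0.1, 0.3, 0.1, 0.1, 0.5, 0.1])
threshold = 0.25

# An event starts on a day the threshold is crossed from below, so a
# multi-day exceedance counts as a single disturbance event.
above = theta > threshold
starts = np.flatnonzero(above & ~np.r_[False, above[:-1]])

# Mean time between events (in days) from successive event start days.
mean_interval = np.diff(starts).mean()
print(starts.tolist(), mean_interval)   # [1, 5, 8] 3.5
```

A full POT analysis would also fit a distribution to the exceedance magnitudes; the sketch shows only the inter-event timing that this dataset reports.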
You can also purchase hard copies of Geoscience Australia data and other products at http://www.ga.gov.au/products-services/how-to-order-products/sales-centre.html
This dataset is a continuous parameter grid (CPG) of mean basin elevation data (30 meter pixels) in the Pacific Northwest. Source data come from the U.S. Geological Survey National Elevation Dataset, via the National Hydrography Dataset Plus V2.
This dataset is a continuous parameter grid (CPG) of mean basin slope in the Pacific Northwest. Source data come from the U.S. Geological Survey National Elevation Dataset and NHDPlus Version 2.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
A-CURE was a NERC-funded project that tackled one of the most challenging and persistent problems in atmospheric science: understanding and quantifying how changes in aerosol particles caused by anthropogenic activities affect climate. The data here are monthly mean variable data from a large perturbed parameter ensemble of UKESM1 simulations, nudged to horizontal winds above around 2 km. Each variable has 220 or 221 members, as indicated in the file names. Some months have one fewer member because one model variant repeatedly failed to run to completion for its combination of model parameter values. The 221 members are model variants that combine the effects of 54 aerosol and physical atmosphere parameters. Variable data in this ensemble span the uncertainty in UKESM1 from these parametric sources.