Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
a Mixed between-within subjects analysis of variance – reported: interaction effect time × group (Wilks' lambda).
b Cohen's d calculated as the mean difference between groups divided by the pooled standard deviation at baseline.
*p < .05, **p < .01.
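The footnote's definition of Cohen's d (mean difference divided by the pooled standard deviation) can be sketched directly; the two groups below are invented for illustration:

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: mean difference between groups divided by the pooled
    standard deviation, using the sample (n-1) variances."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

# hypothetical baseline measurements for two groups
print(round(cohens_d([5, 6, 7, 8], [3, 4, 5, 6]), 3))
```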
These four data files contain datasets from an interlaboratory comparison that characterized a polydisperse five-population bead dispersion in water. A more detailed version of this description is available in the ReadMe file (PdP-ILC_datasets_ReadMe_v1.txt), which also includes definitions of abbreviations used in the data files. Paired samples were evaluated, so the datasets are organized as pairs associated with a randomly assigned laboratory number. The datasets are organized in the files by instrument type: PTA (particle tracking analysis), RMM (resonant mass measurement), ESZ (electrical sensing zone), and OTH (other techniques not covered in the three largest groups, including holographic particle characterization, laser diffraction, flow imaging, and flow cytometry). In the OTH group, the specific instrument type for each dataset is noted. Each instrument type (PTA, RMM, ESZ, OTH) has a dedicated file. Included in the data files for each dataset are: (1) the cumulative particle number concentration (PNC, (1/mL)); (2) the concentration distribution density (CDD, (1/mL·nm)) based upon five bins centered at each particle population peak diameter; (3) the CDD in higher resolution, varied-width bins. The lower-diameter bin edge (µm) is given for (2) and (3). Additionally, the PTA, RMM, and ESZ files each contain unweighted mean cumulative particle number concentrations and concentration distribution densities calculated from all datasets reporting values. The associated standard deviations and standard errors of the mean are also given. In the OTH file, the means and standard deviations were calculated using only data from one of the sub-groups (holographic particle characterization) that had n = 3 paired datasets. Where necessary, datasets not using the common bin resolutions are noted (PTA, OTH groups).
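The relation between the binned number concentrations and the concentration distribution density follows from the units alone: a CDD in 1/(mL·nm) is a per-bin concentration in 1/mL divided by the bin width in nm. A dimensional sketch, with invented bin edges and concentrations (the files report these quantities directly):

```python
def concentration_distribution_density(bin_edges_nm, bin_conc_per_ml):
    """CDD (1/(mL·nm)): each bin's particle number concentration (1/mL)
    divided by that bin's width (nm). Illustrative only."""
    widths = [hi - lo for lo, hi in zip(bin_edges_nm, bin_edges_nm[1:])]
    return [c / w for c, w in zip(bin_conc_per_ml, widths)]

# hypothetical bins around two population peak diameters
edges = [150.0, 250.0, 350.0]   # nm
conc = [2.0e7, 5.0e6]           # 1/mL per bin
print(concentration_distribution_density(edges, conc))
```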
The data contained here are presented and discussed in a manuscript to be submitted to the Journal of Pharmaceutical Sciences and presented as part of that scientific record.
Means with standard deviation (SD) and within-group comparisons for state of mindfulness and emotional state.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Outliers correspond to fixes with location error (LE) > 3 standard deviations from the mean location error of all fixes in the same habitat (i.e., without regard to the visibility category). The last two columns report the mean number of outliers ± standard deviation for each visibility category, and the LERMS values calculated from all fixes in the same habitat after removal of the outlier values.
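The 3-standard-deviation outlier rule described above can be sketched as follows; the location-error values (in metres) are hypothetical:

```python
import statistics

def remove_outliers(location_errors, k=3.0):
    """Drop fixes whose location error (LE) deviates from the habitat mean
    LE by more than k standard deviations (k = 3, as in the text).
    Returns (kept, removed)."""
    mean = statistics.mean(location_errors)
    sd = statistics.stdev(location_errors)
    kept = [le for le in location_errors if abs(le - mean) <= k * sd]
    removed = [le for le in location_errors if abs(le - mean) > k * sd]
    return kept, removed

# hypothetical LE values (m) for one habitat: 20 typical fixes plus one bad fix
errors = [1.0 + 0.05 * i for i in range(20)] + [60.0]
kept, removed = remove_outliers(errors)
print(len(kept), removed)
```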
These data are the standard error calculated from the AVISO Level 4 Absolute Dynamic Topography for Climate Model Comparison Number of Observations data set (in PO.DAAC Drive at https://podaac-tools.jpl.nasa.gov/drive/files/allData/aviso/L4/abs_dynamic_topo). This data set is not meant to be used alone, but with the absolute dynamic topography data. These data were generated to help support the CMIP5 (Coupled Model Intercomparison Project Phase 5) portion of PCMDI (Program for Climate Model Diagnosis and Intercomparison). The dynamic topography is derived from sea surface height measured by several satellites (Envisat, TOPEX/Poseidon, Jason-1, and OSTM/Jason-2) and referenced to the geoid. These data were provided by AVISO (the French space agency data provider) and are based on a similar dynamic topography data set they already produce (http://www.aviso.oceanobs.com/index.php?id=1271).
Standardized canonical discriminant function coefficients comparing variables on different scales (each variable was adjusted by subtracting its mean and dividing by its standard deviation).
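The standardization described in the parenthesis (z-scoring, so that variables on different scales become comparable) can be sketched as:

```python
import statistics

def standardize(values):
    """Center a variable on its mean and scale by its standard deviation,
    yielding z-scores with mean 0 and (sample) SD 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

z = standardize([10.0, 20.0, 30.0, 40.0])
print([round(v, 3) for v in z])
```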
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data arising from longitudinal studies are commonly analyzed with generalized estimating equations (GEE). Previous literature has shown that liberal inference may result from the use of the empirical sandwich covariance matrix estimator when the number of subjects is small. Therefore, two different approaches have been used to improve the validity of inference. First, many different small-sample corrections to the empirical estimator have been offered in order to reduce bias in resulting standard error estimates. Second, critical values can be obtained from a t-distribution or F-distribution with approximated degrees of freedom. Although limited studies on the comparison of these small-sample corrections and degrees of freedom have been published, there is need for a comprehensive study of currently existing methods in a wider range of scenarios. Therefore, in this manuscript we conduct such a simulation study, finding two methods to attain nominal type I error rates more consistently than other methods in a variety of settings: First, a recently proposed method by Westgate and Burchett (2016, Statistics in Medicine 35, 3733-3744) that specifies both a covariance estimator and degrees of freedom, and second, an average of two popular corrections developed by Mancl and DeRouen (2001, Biometrics 57, 126-134) and Kauermann and Carroll (2001, Journal of the American Statistical Association 96, 1387-1396) with degrees of freedom equaling the number of subjects minus the number of parameters in the marginal model.
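As a minimal illustration of the second recommended method above, the sketch below averages hypothetical Mancl-DeRouen (MD) and Kauermann-Carroll (KC) bias-corrected variance estimates for one coefficient and pairs the resulting Wald statistic with degrees of freedom equal to the number of subjects minus the number of marginal-model parameters; all numbers are invented, and the critical value would come from the corresponding t distribution:

```python
import math

def wald_t(beta_hat, var_md, var_kc, n_subjects, n_params):
    """Wald statistic using the average of the MD and KC corrected
    variance estimates, with df = n_subjects - n_params as in the text.
    Inputs are hypothetical placeholders, not a full GEE fit."""
    se = math.sqrt((var_md + var_kc) / 2.0)
    df = n_subjects - n_params
    return beta_hat / se, df

t_stat, df = wald_t(beta_hat=0.8, var_md=0.09, var_kc=0.07,
                    n_subjects=20, n_params=3)
print(round(t_stat, 3), df)
```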
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The prefix 'S' denotes the use of a single-kernel SVR (CC: correlation coefficient; RMSE: root mean square error). Performance (mean ± standard deviation) comparison among all competing methods.
Data for Figure 3.39 from Chapter 3 of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Figure 3.39 shows the observed and simulated Pacific Decadal Variability (PDV).

---------------------------------------------------
How to cite this dataset
---------------------------------------------------
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: Eyring, V., N.P. Gillett, K.M. Achuta Rao, R. Barimalala, M. Barreiro Parrillo, N. Bellouin, C. Cassou, P.J. Durack, Y. Kosaka, S. McGregor, S. Min, O. Morgenstern, and Y. Sun, 2021: Human Influence on the Climate System. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 423–552, doi:10.1017/9781009157896.005.

---------------------------------------------------
Figure subpanels
---------------------------------------------------
The figure has six panels. Files are not separated according to the panels.

---------------------------------------------------
List of data provided
---------------------------------------------------
pdv.obs.nc contains:
- Observed SST anomalies associated with the PDV pattern
- Observed PDV index time series (unfiltered)
- Observed PDV index time series (low-pass filtered)
- Taylor statistics of the observed PDV patterns
- Statistical significance of the observed SST anomalies associated with the PDV pattern

pdv.hist.cmip6.nc contains:
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns, based on CMIP6 historical simulations

pdv.hist.cmip5.nc contains:
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns, based on CMIP5 historical simulations

pdv.piControl.cmip6.nc contains:
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns, based on CMIP6 piControl simulations

pdv.piControl.cmip5.nc contains:
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns, based on CMIP5 piControl simulations

---------------------------------------------------
Data provided in relation to figure
---------------------------------------------------
Panel a:
- ipo_pattern_obs_ref in pdv.obs.nc: shading
- ipo_pattern_obs_signif (dataset = 1) in pdv.obs.nc: cross markers

Panel b:
- Multimodel ensemble mean of ipo_model_pattern in pdv.hist.cmip6.nc: shading, with their sign agreement for hatching

Panel c:
- tay_stats (stat = 0, 1) in pdv.obs.nc: black dots
- tay_stats (stat = 0, 1) in pdv.hist.cmip6.nc: red crosses, and their multimodel ensemble mean for the red dot
- tay_stats (stat = 0, 1) in pdv.hist.cmip5.nc: blue crosses, and their multimodel ensemble mean for the blue dot

Panel d:
- Lag-1 autocorrelation of tpi in pdv.obs.nc: black horizontal lines in the left
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.piControl.cmip5.nc: blue open box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.piControl.cmip6.nc: red open box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.hist.cmip5.nc: blue filled box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.hist.cmip6.nc: red filled box-whisker in the left
- Lag-10 autocorrelation of tpi_lp in pdv.obs.nc: black horizontal lines in the right
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.piControl.cmip5.nc: blue open box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.piControl.cmip6.nc: red open box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.hist.cmip5.nc: blue filled box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.hist.cmip6.nc: red filled box-whisker in the right

Panel e:
- Standard deviation of tpi in pdv.obs.nc: black horizontal lines in the left
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.piControl.cmip5.nc: blue open box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.piControl.cmip6.nc: red open box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.hist.cmip5.nc: blue filled box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.hist.cmip6.nc: red filled box-whisker in the left
- Standard deviation of tpi_lp in pdv.obs.nc: black horizontal lines in the right
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.piControl.cmip5.nc: blue open box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.piControl.cmip6.nc: red open box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.hist.cmip5.nc: blue filled box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.hist.cmip6.nc: red filled box-whisker in the right

Panel f:
- tpi_lp in pdv.obs.nc: black curves
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- tpi_lp in pdv.hist.cmip6.nc: 5th-95th percentiles in red shading, multimodel ensemble mean and its 5-95% confidence interval for red curves
- tpi_lp in pdv.hist.cmip5.nc: 5th-95th percentiles in blue shading, multimodel ensemble mean for blue curve

CMIP5 is the fifth phase of the Coupled Model Intercomparison Project. CMIP6 is the sixth phase of the Coupled Model Intercomparison Project. SST stands for Sea Surface Temperature.

---------------------------------------------------
Notes on reproducing the figure from the provided data
---------------------------------------------------
Multimodel ensemble means and percentiles of historical simulations of CMIP5 and CMIP6 are calculated after weighting individual members with the inverse of the ensemble size of the same model. ensemble_assign in each file provides the model number to which each ensemble member belongs. This weighting does not apply to the sign agreement calculation. piControl simulations from CMIP5 and CMIP6 consist of a single member from each model, so the weighting is not applied. Multimodel ensemble means of the pattern correlation in Taylor statistics in (c) and the autocorrelation of the index in (d) are calculated via Fisher z-transformation and back transformation.

---------------------------------------------------
Sources of additional information
---------------------------------------------------
The following weblinks are provided in the Related Documents section of this catalogue record:
- Link to the report component containing the figure (Chapter 3)
- Link to the Supplementary Material for Chapter 3, which contains details on the input data used in Table 3.SM.1
- Link to the code for the figure, archived on Zenodo
- Link to the figure on the IPCC AR6 website
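The Fisher z-transform averaging mentioned in the reproduction notes (transform each correlation, average, back-transform) can be sketched as follows; the correlation values are invented:

```python
import math

def fisher_mean(correlations):
    """Average correlations via Fisher z-transform (atanh), arithmetic
    mean in z-space, then back-transform (tanh). Equal weights are used
    here for simplicity; the dataset weights members by the inverse of
    each model's ensemble size."""
    z = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z) / len(z))

print(round(fisher_mean([0.6, 0.8, 0.9]), 3))
```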
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Monthly air temperature at 1153 stations and precipitation at 1202 stations in China and neighboring countries were collected to construct a monthly climate dataset for China at 0.025° resolution (approximately 2.5 km), named LZU0025 and designed by Lanzhou University (LZU), using a partial thin plate smoothing method embedded in the ANUSPLIN software. The accuracy of LZU0025 was evaluated in three ways: 1) Diagnostic statistics from the surface fitting model over 1951-2011 show a low mean square root of generalized cross validation (RTGCV) for the monthly air temperature surface (1.1 °C) and the monthly precipitation surface (2 mm1/2, interpolated on the square-root scale), indicating accurate surface fitting models. 2) Error statistics based on data withheld from 265 stations over 1951-2011 show that predicted values closely tracked true values, with a mean absolute error (MAE) of 0.6 °C and 4 mm and a standard deviation of the mean error (STD) of 1.3 °C and 5 mm, and the monthly STDs varied consistently with RTGCV. 3) Comparisons to other datasets were made in two ways: first, three indices (standard deviation, mean, and time trend) derived from all datasets were compared in Taylor diagrams against a reference dataset released by the China Meteorological Administration (CMA); second, LZU0025 was compared to the CAMP Tibet dataset for a mountainous remote area. The Taylor diagrams showed that the standard deviation derived from LZU had a higher correlation with that derived from CMA (Pearson correlation R = 0.76 for air temperature and R = 0.96 for precipitation), that the standard deviation of this index derived from LZU was closer to that derived from CMA, and that the centered normalized root-mean-square difference between the indices derived from LZU and CMA was lower.
The same superior performance of LZU was found when comparing the mean and time-trend indices derived from LZU with those derived from other datasets. LZU0025 had a high correlation with the CAMP dataset for air temperature, despite non-significant correlations for precipitation at a few stations. Based on the above comprehensive analyses, LZU0025 is concluded to be a reliable dataset.
BTM Standard deviation – this mosaic dataset is part of a series of seafloor terrain datasets aimed at providing a consistent baseline to assist users in characterizing Aotearoa New Zealand seafloor habitats. The series has been developed using the tools provided within the Benthic Terrain Modeler (BTM, v3.0) across different multibeam echo-sounder (MBES) datasets. It includes derived outputs from 50 MBES survey sets conducted between 1999 and 2020 throughout the New Zealand marine environment (where available), covering an area of approximately 52,000 km2. Consistency and compatibility of the benthic terrain datasets have been achieved by utilising a common projected coordinate system (WGS84 Web Mercator), a common resolution (10 m), and a standard classification dictionary (also utilised by previous BTM studies in NZ). However, we advise caution when comparing the classification between different survey areas. Derived BTM outputs include the Bathymetric Position Index (BPI), Surface Derivative, Rugosity, Depth Statistics, and Terrain Classification. A standardised digital surface model, and derived hillshade and aspect datasets, have also been made available. The index of the original MBES survey surface models used in this analysis can be accessed from https://data.linz.govt.nz/layer/95574-nz-bathymetric-surface-model-index/ The full report and description of available output datasets are available at: https://www.doc.govt.nz/globalassets/documents/science-and-technical/drds367entire.pdf
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Standard deviation of responses for 'Life Satisfaction' in the First ONS Annual Experimental Subjective Wellbeing survey. The Office for National Statistics has included the four subjective well-being questions below on the Annual Population Survey (APS), the largest of their household surveys. Overall, how satisfied are you with your life nowadays?
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the "critical pair," which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
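To give a feel for an entropy-style summary statistic of the kind the abstract describes, the sketch below normalizes a spectrum and takes its Shannon entropy. This is a generic illustration in the spirit of PRE, not necessarily the authors' exact formula; the spectra are invented:

```python
import math

def entropy_summary(signal):
    """Entropy-style summary statistic: normalize non-negative intensities
    to sum to one, then compute the Shannon entropy (bits). A featureless
    spectrum gives high entropy; a strongly peaked one gives low entropy."""
    total = sum(signal)
    probs = [s / total for s in signal if s > 0]
    return -sum(p * math.log2(p) for p in probs)

flat = [1.0, 1.0, 1.0, 1.0]        # featureless spectrum
peaked = [0.01, 0.01, 3.0, 0.01]   # one dominant peak
print(round(entropy_summary(flat), 3), round(entropy_summary(peaked), 3))
```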
This data set contains the standard deviation of SeaWiFS k490 generated from the climatology monthly means; the monthly climatologies represent the mean values for each month across the whole dataset time series. K490 indicates the turbidity of the water column: how visible light in the blue-green region of the spectrum penetrates the water column. It is directly related to the presence of scattering particles in the water column. The data are received as monthly composites, with a 4 km resolution, and are constrained to the region between 90E and 180E, and 10N to 60S. The data were sourced from http://oceancolor.gsfc.nasa.gov/SeaWiFS/. This dataset is a contribution to the CERF Marine Biodiversity Hub.
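One reading of "standard deviation generated from the climatology monthly means" is: average each calendar month across all years, then take the spread of those climatological means. A sketch under that assumption, with invented k490 values for two months (the actual product is computed per pixel on the 4 km grid):

```python
import statistics

def monthly_climatology(values_by_month):
    """Per-month climatological means: for each calendar month, average
    that month's values across all years in the series."""
    return {m: statistics.mean(v) for m, v in values_by_month.items()}

# hypothetical k490 values (1/m) for two months over three years
series = {"Jan": [0.10, 0.12, 0.11], "Jul": [0.20, 0.22, 0.21]}
clim = monthly_climatology(series)
sd_across_months = statistics.stdev(clim.values())
print(round(clim["Jan"], 3), round(sd_across_months, 4))
```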
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains customer satisfaction scores collected from a survey, alongside key demographic and behavioral data. It includes variables such as customer age, gender, location, purchase history, support contact status, loyalty level, and satisfaction factors. The dataset is designed to help analyze customer satisfaction, identify trends, and develop insights that can drive business decisions.
File Information: File Name: customer_satisfaction_data.csv
File Type: CSV
Number of Rows: 120
Number of Columns: 10
Column Names:
Customer_ID – Unique identifier for each customer (e.g., 81-237-4704)
Group – The group to which the customer belongs (A or B)
Satisfaction_Score – Customer's satisfaction score on a scale of 1-10
Age – Age of the customer
Gender – Gender of the customer (Male, Female)
Location – Customer's location (e.g., Phoenix.AZ, Los Angeles.CA)
Purchase_History – Whether the customer has made a purchase (Yes or No)
Support_Contacted – Whether the customer has contacted support (Yes or No)
Loyalty_Level – Customer's loyalty level (Low, Medium, High)
Satisfaction_Factor – Primary factor contributing to customer satisfaction (e.g., Price, Product Quality)
Statistical Analyses:
Descriptive Statistics:
Calculate mean, median, mode, standard deviation, and range for key numerical variables (e.g., Satisfaction Score, Age).
Summarize categorical variables (e.g., Gender, Loyalty Level, Purchase History) with frequency distributions and percentages.
Two-Sample t-Test (Independent t-test):
Compare the mean satisfaction scores between two independent groups (e.g., Group A vs. Group B) to determine if there is a significant difference in their average satisfaction scores.
Paired t-Test:
If there are two related measurements (e.g., satisfaction scores before and after a certain event), you can compare the means using a paired t-test.
One-Way ANOVA (Analysis of Variance):
Test if there are significant differences in mean satisfaction scores across more than two groups (e.g., comparing the mean satisfaction score across different Loyalty Levels).
Chi-Square Test for Independence:
Examine the relationship between two categorical variables (e.g., Gender vs. Purchase History or Loyalty Level vs. Support Contacted) to determine if there’s a significant association.
Mann-Whitney U Test:
For non-normally distributed data, use this test to compare satisfaction scores between two independent groups (e.g., Group A vs. Group B) to see if their distributions differ significantly.
Kruskal-Wallis Test:
Similar to ANOVA, but used for non-normally distributed data. This test can compare the median satisfaction scores across multiple groups (e.g., comparing satisfaction scores across Loyalty Levels or Satisfaction Factors).
Spearman’s Rank Correlation:
Test for a monotonic relationship between two ordinal or continuous variables (e.g., Age vs. Satisfaction Score or Satisfaction Score vs. Loyalty Level).
Regression Analysis:
Linear Regression: Model the relationship between a continuous dependent variable (e.g., Satisfaction Score) and independent variables (e.g., Age, Gender, Loyalty Level).
Logistic Regression: If analyzing binary outcomes (e.g., Purchase History or Support Contacted), you could model the probability of an outcome based on predictors.
Factor Analysis:
To identify underlying patterns or groups in customer behavior or satisfaction factors, you can apply Factor Analysis to reduce the dimensionality of the dataset and group similar variables.
Cluster Analysis:
Use K-Means Clustering or Hierarchical Clustering to group customers based on similarity in their satisfaction scores and other features (e.g., Loyalty Level, Purchase History).
Confidence Intervals:
Calculate confidence intervals for the mean of satisfaction scores or any other metric to estimate the range in which the true population mean might lie.
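As a minimal sketch of the first inferential analysis in the list, the independent two-sample t-test comparing mean satisfaction scores of Group A and Group B is shown below with invented scores; the p-value would come from a t distribution with the returned degrees of freedom (e.g. via scipy.stats):

```python
import math
import statistics

def two_sample_t(a, b):
    """Independent two-sample t-test with pooled variance.
    Returns the t statistic and its degrees of freedom (na + nb - 2)."""
    na, nb = len(a), len(b)
    ma, mb = statistics.mean(a), statistics.mean(b)
    # pooled variance from the two sample variances
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

group_a = [7, 8, 6, 9, 7, 8]   # hypothetical satisfaction scores (1-10)
group_b = [5, 6, 5, 7, 6, 5]
t, df = two_sample_t(group_a, group_b)
print(round(t, 3), df)
```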
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the methods' standard deviations. Across all datasets, MPC has a smaller standard deviation than Oasis but a larger one than STM.
Gridded climatic datasets with fine spatial resolution can potentially be used to depict the climatic characteristics across the complex topography of China. In this study we collected records of monthly temperature at 1153 stations and precipitation at 1202 stations in China and neighboring countries to construct a monthly climate dataset in China with a 0.025° resolution (~2.5 km). The dataset, named LZU0025, was designed by Lanzhou University and used a partial thin plate smoothing method embedded in the ANUSPLIN software. The accuracy of LZU0025 was evaluated based on three aspects: (1) Diagnostic statistics from the surface fitting model during 1951–2011. The results indicate a low mean square root of generalized cross validation (RTGCV) for the monthly air temperature surface (1.06 °C) and monthly precipitation surface (1.97 mm1/2). (2) Error statistics of comparisons between interpolated monthly LZU0025 with the withholding of climatic data from 265 stations during 1951–2011. The results show that the predicted values closely tracked the true values, with a mean absolute error (MAE) of 0.59 °C and 70.5 mm, and a standard deviation of the mean error (STD) of 1.27 °C and 122.6 mm. In addition, the monthly STDs exhibited a consistent pattern of variation with RTGCV. (3) Comparison with other datasets. This was done in two ways. The first was via comparison of the standard deviation, mean and time trend derived from all datasets to a reference dataset released by the China Meteorological Administration (CMA), using Taylor diagrams. The second was to compare LZU0025 with the station dataset in the Tibetan Plateau. Taylor diagrams show that the standard deviation, mean and time trend derived from LZU had a higher correlation with those produced by the CMA, and the centered normalized root-mean-square difference for these indices derived from LZU and CMA was lower.
LZU0025 had a high correlation with the Coordinated Energy and Water Cycle Observation Project (CEOP) Asian Monsoon Project (CAMP) Tibet surface meteorology station dataset for air temperature, despite a non-significant correlation for precipitation at a few stations. Based on this comprehensive analysis, we conclude that LZU0025 is a reliable dataset. With its fine resolution, LZU0025 can be used to identify a greater number of climate types, such as tundra and subpolar continental, along the Himalayan Mountains. We anticipate that LZU0025 can be used for the monitoring of regional climate change and for precision agriculture modulation under global climate change.
Overview: 142: Areas used for sports, leisure and recreation purposes. Traceability (lineage): This dataset was produced with a machine learning framework with several input datasets, specified in detail in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3 ). Scientific methodology: The single-class probability layers were generated with a spatiotemporal ensemble machine learning framework detailed in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3 ). The single-class uncertainty layers were calculated by taking the standard deviation of the three single-class probabilities predicted by the three components of the ensemble. The HCL (hard class) layers represent the class with the highest probability as predicted by the ensemble. Usability: The HCL layers have a decreasing average accuracy (weighted F1-score) at each subsequent level in the CLC hierarchy. These metrics are 0.83 at level 1 (5 classes), 0.63 at level 2 (14 classes), and 0.49 at level 3 (43 classes). This means that the hard-class maps are more reliable when aggregating classes to a higher level in the hierarchy (e.g. 'Discontinuous Urban Fabric' and 'Continuous Urban Fabric' to 'Urban Fabric'). Some single-class probabilities may more closely represent actual patterns for some classes that were overshadowed by unequal sample point distributions. Users are encouraged to set their own thresholds when postprocessing these datasets to optimize the accuracy for their specific use case. Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model. Data validation approaches: The LULC classification was validated through spatial 5-fold cross-validation as detailed in the accompanying publication.
Completeness: The dataset has chunks of empty predictions in regions with complex coastlines (e.g. the Zeeland province in the Netherlands and the Mar da Palha bay area in Portugal). These are artifacts that will be avoided in subsequent versions of the LULC product. Consistency: The accuracy of the predictions was compared per year and per 30 km × 30 km tile across Europe to derive temporal and spatial consistency by calculating the standard deviation. The standard deviation of the annual weighted F1-score was 0.135, while the standard deviation of the weighted F1-score per tile was 0.150. This means the dataset is more consistent through time than through space: predictions are notably less accurate along the Mediterranean coast. The accompanying publication contains additional information and visualisations. Positional accuracy: The raster layers have a resolution of 30 m, identical to that of the Landsat data cube used as input features for the machine learning framework that predicted it. Temporal accuracy: The dataset contains predictions and uncertainty layers for each year between 2000 and 2019. Thematic accuracy: The maps reproduce the Corine Land Cover classification system, a hierarchical legend that consists of 5 classes at the highest level, 14 classes at the second level, and 44 classes at the third level. Class 523: Oceans was omitted due to computational constraints.
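The uncertainty quantification and hard-class rule described above (per-class standard deviation across the three ensemble members, with the hard class taken as the most probable class) can be sketched for a single pixel. Class names and probabilities are invented, and the population standard deviation is assumed here since the text does not specify which estimator is used:

```python
import statistics

def hard_class_and_uncertainty(member_probs):
    """member_probs maps each class to the probabilities predicted by the
    three ensemble components for one pixel. Uncertainty per class is the
    standard deviation across the members; the hard class (HCL) is the
    class with the highest mean probability."""
    mean_p = {c: statistics.mean(p) for c, p in member_probs.items()}
    sd_p = {c: statistics.pstdev(p) for c, p in member_probs.items()}
    hcl = max(mean_p, key=mean_p.get)
    return hcl, sd_p

# hypothetical three-member predictions for one pixel
probs = {"urban_fabric": [0.7, 0.6, 0.8], "sports_leisure": [0.2, 0.3, 0.1]}
hcl, sd = hard_class_and_uncertainty(probs)
print(hcl, round(sd["urban_fabric"], 4))
```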