(a) Mixed between-within subjects analysis of variance; reported: interaction effect time × group (Wilks' Lambda). (b) Cohen's d calculated as the mean difference between groups divided by the pooled standard deviation at baseline. *p < .05, **p < .01.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
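A minimal sketch of the two quantities participants had to weigh, the mean difference and the pooled standard deviation (their ratio is Cohen's d). The 'Brain Juice' and water scores below are made up for illustration:

```python
from statistics import mean, stdev

def pooled_sd(a, b):
    """Pooled standard deviation of two samples (classic two-group formula)."""
    na, nb = len(a), len(b)
    return (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
            / (na + nb - 2)) ** 0.5

def cohens_d(a, b):
    """Standardised mean difference: mean difference over pooled SD."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)

# Hypothetical scores for the two fictional groups in Experiment 1.
juice = [102, 108, 111, 97, 105]
water = [95, 101, 99, 92, 103]
d = cohens_d(juice, water)
```

Holding the mean difference fixed while varying the pooled SD across items (as in Experiment 2) changes d, which is exactly the manipulation that made variability salient.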
These four data files contain datasets from an interlaboratory comparison that characterized a polydisperse five-population bead dispersion in water. A more detailed version of this description is available in the ReadMe file (PdP-ILC_datasets_ReadMe_v1.txt), which also includes definitions of abbreviations used in the data files. Paired samples were evaluated, so the datasets are organized as pairs associated with a randomly assigned laboratory number. The datasets are organized in the files by instrument type: PTA (particle tracking analysis), RMM (resonant mass measurement), ESZ (electrical sensing zone), and OTH (other techniques not covered in the three largest groups, including holographic particle characterization, laser diffraction, flow imaging, and flow cytometry). In the OTH group, the specific instrument type for each dataset is noted. Each instrument type (PTA, RMM, ESZ, OTH) has a dedicated file. Included in the data files for each dataset are: (1) the cumulative particle number concentration (PNC, 1/mL); (2) the concentration distribution density (CDD, 1/(mL·nm)) based upon five bins centered at each particle population peak diameter; (3) the CDD in higher-resolution, varied-width bins. The lower-diameter bin edge (µm) is given for (2) and (3). Additionally, the PTA, RMM, and ESZ files each contain unweighted mean cumulative particle number concentrations and concentration distribution densities calculated from all datasets reporting values. The associated standard deviations and standard errors of the mean are also given. In the OTH file, the means and standard deviations were calculated using only data from one of the sub-groups (holographic particle characterization), which had n = 3 paired datasets. Where necessary, datasets not using the common bin resolutions are noted (PTA and OTH groups).
The data contained here are presented and discussed in a manuscript to be submitted to the Journal of Pharmaceutical Sciences and presented as part of that scientific record.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Outliers correspond to fixes with location error (LE) > 3 standard deviations from the mean location error of all fixes in the same habitat (i.e., without regard to the visibility category). The last two columns report the mean number of outliers ± standard deviation for each visibility category, and the LERMS values calculated from all fixes in the same habitat after removal of outlier values.
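The outlier rule described here can be sketched as below; the (fix_id, LE) pair layout is a hypothetical structure for illustration, not the study's actual data format:

```python
from statistics import mean, pstdev

def remove_outliers(fixes):
    """Drop fixes whose location error (LE) lies more than 3 standard
    deviations from the mean LE of all fixes in the same habitat.
    `fixes` is a list of (fix_id, le) pairs from one habitat."""
    les = [le for _, le in fixes]
    mu, sd = mean(les), pstdev(les)
    return [(fid, le) for fid, le in fixes if abs(le - mu) <= 3 * sd]

def le_rms(fixes):
    """Root-mean-square location error after outlier removal."""
    les = [le for _, le in remove_outliers(fixes)]
    return (sum(le * le for le in les) / len(les)) ** 0.5
```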
Means with standard deviation (SD) and within-group comparisons for state of mindfulness and emotional state.
These data are the standard error calculated from the AVISO Level 4 Absolute Dynamic Topography for Climate Model Comparison Number of Observations dataset (in PO.DAAC Drive at https://podaac-tools.jpl.nasa.gov/drive/files/allData/aviso/L4/abs_dynamic_topo). This dataset is not meant to be used alone, but together with the absolute dynamic topography data. These data were generated to help support the CMIP5 (Coupled Model Intercomparison Project Phase 5) portion of PCMDI (Program for Climate Model Diagnosis and Intercomparison). The dynamic topography is derived from sea surface height measured by several satellites (Envisat, TOPEX/Poseidon, Jason-1, and OSTM/Jason-2) and referenced to the geoid. These data were provided by AVISO (the French space agency data provider) and are based on a similar dynamic topography dataset they already produce (http://www.aviso.oceanobs.com/index.php?id=1271).
Sea surface temperature (SST) plays an important role in a number of ecological processes and can vary over a wide range of time scales, from daily to decadal changes. SST influences primary production, species migration patterns, and coral health. If temperatures are anomalously warm for extended periods of time, drastic changes in the surrounding ecosystem can result, including harmful effects such as coral bleaching. This layer represents the standard deviation of SST (degrees Celsius) of the weekly time series from 2000-2013. Three SST datasets were combined to provide continuous coverage from 1985-2013. The concatenation applies bias adjustment derived from linear regression to the overlap periods of datasets, with the final representation matching the 0.05-degree (~5-km) near real-time SST product. First, a weekly composite, gap-filled SST dataset from the NOAA Pathfinder v5.2 SST 1/24-degree (~4-km), daily dataset (a NOAA Climate Data Record) for each location was produced following Heron et al. (2010) for January 1985 to December 2012. Next, weekly composite SST data from the NOAA/NESDIS/STAR Blended SST 0.1-degree (~11-km), daily dataset was produced for February 2009 to October 2013. Finally, a weekly composite SST dataset from the NOAA/NESDIS/STAR Blended SST 0.05-degree (~5-km), daily dataset was produced for March 2012 to December 2013. The standard deviation of the long-term mean SST was calculated by taking the standard deviation over all weekly data from 2000-2013 for each pixel.
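The final per-pixel computation, the standard deviation over all weekly values at each grid cell, can be sketched as follows; this toy version works on nested lists rather than the gridded satellite files actually used:

```python
from statistics import pstdev

def per_pixel_std(weekly_stack):
    """weekly_stack: list of 2-D grids (week -> rows -> cols) of SST in deg C.
    Returns a grid holding the standard deviation of the weekly time series
    at each pixel. Illustrative sketch only."""
    n_rows, n_cols = len(weekly_stack[0]), len(weekly_stack[0][0])
    return [[pstdev([week[r][c] for week in weekly_stack])
             for c in range(n_cols)] for r in range(n_rows)]
```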
The mean and standard deviation (shown in brackets) of the behavioral measures, and group-comparison results.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Dr. Hany Salem
Released under CC0: Public Domain
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data arising from longitudinal studies are commonly analyzed with generalized estimating equations (GEE). Previous literature has shown that liberal inference may result from the use of the empirical sandwich covariance matrix estimator when the number of subjects is small. Therefore, two different approaches have been used to improve the validity of inference. First, many different small-sample corrections to the empirical estimator have been offered in order to reduce bias in the resulting standard error estimates. Second, critical values can be obtained from a t-distribution or F-distribution with approximated degrees of freedom. Although a limited number of studies comparing these small-sample corrections and degrees-of-freedom approximations have been published, there is a need for a comprehensive study of currently existing methods in a wider range of scenarios. Therefore, in this manuscript we conduct such a simulation study, finding two methods to attain nominal type I error rates more consistently than other methods in a variety of settings: first, a recently proposed method by Westgate and Burchett (2016, Statistics in Medicine 35, 3733-3744) that specifies both a covariance estimator and degrees of freedom, and second, an average of two popular corrections developed by Mancl and DeRouen (2001, Biometrics 57, 126-134) and Kauermann and Carroll (2001, Journal of the American Statistical Association 96, 1387-1396) with degrees of freedom equaling the number of subjects minus the number of parameters in the marginal model.
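The second recommended method can be sketched as below, assuming the Mancl-DeRouen and Kauermann-Carroll bias-corrected variance estimates are already available from an existing GEE fit (the numbers here are hypothetical). The sketch averages the two corrected variances and takes critical values from a t distribution with N - p degrees of freedom:

```python
from scipy import stats

def corrected_wald(beta_hat, var_md, var_kc, n_subjects, n_params):
    """Wald test for one regression coefficient from a GEE fit.
    var_md / var_kc: Mancl-DeRouen and Kauermann-Carroll corrected
    variance estimates for beta_hat (assumed supplied by a GEE package).
    Averages the two variances and uses a t reference distribution with
    df = number of subjects minus number of marginal-model parameters."""
    var_avg = (var_md + var_kc) / 2.0
    se = var_avg ** 0.5
    df = n_subjects - n_params
    t_stat = beta_hat / se
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value

# Hypothetical numbers: 20 subjects, 3 marginal-model parameters.
t_stat, p = corrected_wald(beta_hat=0.8, var_md=0.09, var_kc=0.07,
                           n_subjects=20, n_params=3)
```

Using t rather than normal critical values widens the rejection threshold for small N, which is what counters the liberal inference described above.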
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance (mean ± standard deviation) comparison among all competing methods. The prefix 'S' denotes the use of a single-kernel SVR. (CC: Correlation Coefficient; RMSE: Root Mean Square Error.)
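The two reported metrics can be computed as follows (a plain-Python sketch):

```python
from math import sqrt

def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient (the table's CC metric)."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    vt = sqrt(sum((a - mt) ** 2 for a in y_true))
    vp = sqrt(sum((b - mp) ** 2 for b in y_pred))
    return cov / (vt * vp)

def rmse(y_true, y_pred):
    """Root mean square error (the table's RMSE metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))
```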
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 2 rows and is filtered to the book Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
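The summary statistics themselves are quick to compute. The sketch below implements the non-entropy statistics directly and, as an assumption on our part, reads PRE as the Shannon entropy of the signal normalised to unit sum; see the PRE literature for the authors' exact definition:

```python
from math import log2
from statistics import mean, pstdev

def summary_stats(x):
    """Simple per-signal summary statistics compared in the paper."""
    return {
        "mean": mean(x),
        "std": pstdev(x),
        "1-norm": sum(abs(v) for v in x),   # sum of absolute values
        "range": max(x) - min(x),
        "ssq": sum(v * v for v in x),       # sum of squares
    }

def pre(x):
    """Pattern recognition entropy, sketched here (an assumption) as the
    Shannon entropy of the absolute signal normalised to unit sum."""
    total = sum(abs(v) for v in x)
    p = [abs(v) / total for v in x if v != 0]
    return -sum(q * log2(q) for q in p)
```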
Chlorophyll-a is a widely used proxy for phytoplankton biomass and an indicator for changes in phytoplankton production. As an essential source of energy in the marine environment, the extent and availability of phytoplankton biomass can be highly influential for fisheries production and dictate trophic structure in marine ecosystems. Changes in phytoplankton biomass are predominantly driven by changes in nutrient availability, through either natural (e.g., turbulent ocean mixing) or anthropogenic (e.g., agricultural runoff) processes. This layer represents the standard deviation of the 8-day time series of chlorophyll-a (mg/m3) from 2002-2013. Monthly and 8-day 4-km (0.0417-degree) spatial resolution data were obtained from the MODIS (Moderate-resolution Imaging Spectroradiometer) Aqua satellite instrument from the NASA OceanColor website (http://oceancolor.gsfc.nasa.gov). The standard deviation was calculated over all 8-day chlorophyll-a data from 2002-2013 for each pixel. A quality control mask was applied to remove spurious data associated with shallow water, following Gove et al., 2013. Nearshore map pixels with no data were filled with values from the nearest neighboring valid offshore pixel, using a grid of points and the Near Analysis tool in ArcGIS and then converting the points to raster.
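The nearest-neighbour gap fill described in the last step can be sketched as below; this brute-force version stands in for the grid-of-points/Near Analysis workflow done in ArcGIS, with None marking no-data pixels:

```python
def fill_nearest(grid):
    """Fill None (no-data) cells with the value of the nearest valid cell,
    using Euclidean distance on pixel indices. Brute-force illustration of
    a nearest-neighbour fill; real workflows use GIS tooling."""
    valid = [(r, c, v) for r, row in enumerate(grid)
             for c, v in enumerate(row) if v is not None]
    out = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v is None:
                _, _, nearest = min(
                    valid, key=lambda t: (t[0] - r) ** 2 + (t[1] - c) ** 2)
                out[r][c] = nearest
    return out
```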
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: Empty cells mean no such genotypes were found in our sample. Maj: major allele; Het: heterozygote; Min: minor allele. (a) Results (p values) of post hoc comparisons: mh = Maj versus Het, mm = Maj versus Min, hm = Het versus Min. (b) Post hoc comparison was not run because there were only 2 groups for this locus.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The subset of data corresponding to the low reaction level was divided into two datasets by randomly sampling 3 of the 6 biological replicates from each brain region. Each probe set was analyzed individually and with its redundant probe-set partner for differential expression and for fold-change correlations across and within sub-datasets, as described in the Methods. The mean and standard deviation were calculated for each summary statistic across twenty re-samplings of the data.
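The resampling scheme can be sketched as follows; the statistic passed in is a placeholder for whichever summary statistic is being tracked:

```python
import random
from statistics import mean, pstdev

def resampled_stat(replicates, stat, n_rounds=20, k=3, seed=0):
    """Randomly split the replicates into one set of `k` and its complement,
    compute `stat` on each half, and summarise across `n_rounds` rounds
    with a mean and standard deviation. Sketch of the 20x resampling."""
    rng = random.Random(seed)
    idx = range(len(replicates))
    values = []
    for _ in range(n_rounds):
        half_idx = set(rng.sample(idx, k))
        half = [replicates[i] for i in half_idx]
        other = [replicates[i] for i in idx if i not in half_idx]
        values.append(stat(half))
        values.append(stat(other))
    return mean(values), pstdev(values)

# Toy example: six replicate measurements, statistic = within-set mean.
m, sd = resampled_stat([1.0, 1.2, 0.9, 1.1, 1.0, 1.3], stat=mean)
```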
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Monthly air temperature from 1,153 stations and precipitation from 1,202 stations in China and neighboring countries were collected to construct a monthly climate dataset for China at 0.025° resolution (approximately 2.5 km), named LZU0025 and produced by Lanzhou University (LZU) using the partial thin plate smoothing method implemented in the ANUSPLIN software. The accuracy of LZU0025 was evaluated in three ways. 1) Diagnostic statistics from the surface-fitting models for 1951-2011 show a low root mean square of the generalized cross validation (RTGCV) for the monthly air temperature surfaces (1.1 °C) and the monthly precipitation surfaces (2 mm^1/2; precipitation surfaces were fitted to the square root of precipitation), indicating accurate surface-fitting models. 2) Error statistics based on data from 265 withheld stations for 1951-2011 show that predicted values closely tracked true values, with mean absolute errors (MAE) of 0.6 °C and 4 mm and standard deviations of the mean error (STD) of 1.3 °C and 5 mm; the monthly STDs varied consistently with the RTGCV. 3) Comparisons to other datasets were made in two ways: first, three indices (standard deviation, mean, and time trend) derived from all datasets were compared in Taylor diagrams against the reference dataset released by the China Meteorological Administration (CMA); second, LZU0025 was compared to the Camp Tibet dataset over a mountainous remote area. The Taylor diagrams showed that the standard deviation index derived from LZU correlated strongly with that derived from CMA (Pearson correlation R = 0.76 for air temperature and R = 0.96 for precipitation), was closer to the CMA value than the other datasets, and had a lower centered normalized root-mean-square difference.
The same superior performance of LZU was found when comparing the mean and time-trend indices derived from LZU with those derived from the other datasets. LZU0025 also correlated highly with the Camp dataset for air temperature, despite insignificant correlations for precipitation at a few stations. Based on the above comprehensive analyses, we conclude that LZU0025 is a reliable dataset.
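The three quantities a Taylor diagram summarises for a test dataset against a reference (standard deviation, correlation, and centred root-mean-square difference) can be computed as:

```python
from math import sqrt

def taylor_stats(ref, test):
    """Taylor-diagram statistics for a test series against a reference:
    the two standard deviations, the Pearson correlation, and the
    centred (mean-removed) RMS difference."""
    n = len(ref)
    mr, mt = sum(ref) / n, sum(test) / n
    sr = sqrt(sum((r - mr) ** 2 for r in ref) / n)
    st = sqrt(sum((t - mt) ** 2 for t in test) / n)
    corr = sum((r - mr) * (t - mt)
               for r, t in zip(ref, test)) / (n * sr * st)
    crmsd = sqrt(sum(((t - mt) - (r - mr)) ** 2
                     for r, t in zip(ref, test)) / n)
    return sr, st, corr, crmsd
```

These satisfy the Taylor identity crmsd² = sr² + st² - 2·sr·st·corr, which is why a single diagram can display all three at once.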
Background: Microarray experiments offer a potent solution to the problem of making and comparing large numbers of gene expression measurements, either in different cell types or in the same cell type under different conditions. Inferences about the biological relevance of observed changes in expression depend on the statistical significance of the changes. In lieu of many replicates with which to determine accurate intensity means and variances, reliable estimates of statistical significance remain problematic. Without such estimates, overly conservative choices for significance must be enforced.
Results: A simple statistical method for estimating variances from microarray control data which does not require multiple replicates is presented. Comparison of datasets from two commercial entities using this difference-averaging method demonstrates that the standard deviation of the signal scales at a level intermediate between the signal intensity and its square root. Application of the method to a dataset related to the β-catenin pathway yields a larger number of biologically reasonable genes whose expression is altered than the ratio method.
Conclusions: The difference-averaging method enables determination of variances as a function of signal intensities by averaging over the entire dataset. The method also provides a platform-independent view of important statistical properties of microarray data.
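The scaling claim in the Results can be checked with a simple power-law fit of standard deviation against intensity on log-log axes: an exponent between 0.5 and 1 corresponds to scaling intermediate between the square root of the signal and the signal itself. This is a sketch of that check on synthetic data, not the paper's difference-averaging implementation:

```python
from math import exp, log

def fit_power_law(intensities, sds):
    """Least-squares fit of sd = a * I**b on log-log axes; returns (a, b)."""
    xs = [log(i) for i in intensities]
    ys = [log(s) for s in sds]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = exp(my - b * mx)
    return a, b

# Synthetic check: SDs generated with exponent 0.7, i.e. between
# square-root (0.5) and linear (1.0) scaling.
I = [10.0, 100.0, 1000.0, 10000.0]
sd = [2.0 * i ** 0.7 for i in I]
a, b = fit_power_law(I, sd)
```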
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics of the dataset with mean, standard deviation (SD), median, and the lower (quantile 5%) and upper (quantile 95%) boundary of the 90% confidence interval.
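These descriptive statistics can be reproduced with the standard library; note that statistics.quantiles with n=20 returns 19 cut points, of which the first and last are the 5% and 95% quantiles:

```python
from statistics import mean, median, pstdev, quantiles

def describe(x):
    """Mean, SD, median, and the 5%/95% quantiles, i.e. the lower and
    upper bounds of the 90% interval reported in the table."""
    q = quantiles(x, n=20)  # 19 cut points: q[0] = 5%, q[18] = 95%
    return {"mean": mean(x), "sd": pstdev(x), "median": median(x),
            "q05": q[0], "q95": q[18]}
```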
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mean, standard deviation, and standard error of the absolute and signed errors (cm).