Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
a) Mixed between-within subjects analysis of variance; reported: interaction effect time × group (Wilks' lambda).
b) Cohen's d calculated as the mean difference between groups divided by the pooled standard deviation at baseline.
*p < .05, **p < .01.
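For reference, footnote b describes the standard two-group effect size. A worked form of that calculation, with the pooled standard deviation written out for groups of sizes n_1 and n_2, is sketched below in LaTeX; the authors' exact pooling convention at baseline may differ slightly.

d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}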
These four data files contain datasets from an interlaboratory comparison that characterized a polydisperse five-population bead dispersion in water. A more detailed version of this description is available in the ReadMe file (PdP-ILC_datasets_ReadMe_v1.txt), which also includes definitions of abbreviations used in the data files. Paired samples were evaluated, so the datasets are organized as pairs associated with a randomly assigned laboratory number. The datasets are organized in the files by instrument type: PTA (particle tracking analysis), RMM (resonant mass measurement), ESZ (electrical sensing zone), and OTH (other techniques not covered in the three largest groups, including holographic particle characterization, laser diffraction, flow imaging, and flow cytometry). In the OTH group, the specific instrument type for each dataset is noted. Each instrument type (PTA, RMM, ESZ, OTH) has a dedicated file. Included in the data files for each dataset are: (1) the cumulative particle number concentration (PNC, 1/mL); (2) the concentration distribution density (CDD, 1/(mL·nm)) based upon five bins centered at each particle population peak diameter; and (3) the CDD in higher-resolution, varied-width bins. The lower-diameter bin edge (µm) is given for (2) and (3). Additionally, the PTA, RMM, and ESZ files each contain unweighted mean cumulative particle number concentrations and concentration distribution densities calculated from all datasets reporting values. The associated standard deviations and standard errors of the mean are also given. In the OTH file, the means and standard deviations were calculated using only data from one of the sub-groups (holographic particle characterization), which had n = 3 paired datasets. Where necessary, datasets not using the common bin resolutions are noted (PTA, OTH groups). The data contained here are presented and discussed in a manuscript to be submitted to the Journal of Pharmaceutical Sciences as part of that scientific record.
Means with standard deviation (SD) and within-group comparisons for state of mindfulness and emotional state.
These data are the standard error calculated from the AVISO Level 4 Absolute Dynamic Topography for Climate Model Comparison Number of Observations data set (in PO.DAAC Drive at https://podaac-tools.jpl.nasa.gov/drive/files/allData/aviso/L4/abs_dynamic_topo). This data set is not meant to be used alone, but together with the absolute dynamic topography data. These data were generated to help support the CMIP5 (Coupled Model Intercomparison Project Phase 5) portion of PCMDI (Program for Climate Model Diagnosis and Intercomparison). The dynamic topography is derived from sea surface height measured by several satellites (Envisat, TOPEX/Poseidon, Jason-1, and OSTM/Jason-2) and referenced to the geoid. These data were provided by AVISO (the French space agency's data provider) and are based on a similar dynamic topography data set that AVISO already produces (http://www.aviso.oceanobs.com/index.php?id=1271).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 2 rows and is filtered where the books is Standard deviations : flawed assumptions, tortured data, and other ways to lie with statistics. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Outliers correspond to fixes with location error (LE)>3 standard deviations from the mean location error of all fixes in the same habitat (i.e., without regard to the visibility category). The last two columns report on the mean number of outliers ± standard deviation across each visibility, and LERMS values calculated from all fixes in the same habitat after removal of outlier values.
Sea surface temperature (SST) plays an important role in a number of ecological processes and can vary over a wide range of time scales, from daily to decadal changes. SST influences primary production, species migration patterns, and coral health. If temperatures are anomalously warm for extended periods of time, drastic changes in the surrounding ecosystem can result, including harmful effects such as coral bleaching. This layer represents the standard deviation of SST (degrees Celsius) of the weekly time series from 2000-2013. Three SST datasets were combined to provide continuous coverage from 1985-2013. The concatenation applies bias adjustment derived from linear regression to the overlap periods of datasets, with the final representation matching the 0.05-degree (~5-km) near real-time SST product. First, a weekly composite, gap-filled SST dataset from the NOAA Pathfinder v5.2 SST 1/24-degree (~4-km), daily dataset (a NOAA Climate Data Record) for each location was produced following Heron et al. (2010) for January 1985 to December 2012. Next, weekly composite SST data from the NOAA/NESDIS/STAR Blended SST 0.1-degree (~11-km), daily dataset was produced for February 2009 to October 2013. Finally, a weekly composite SST dataset from the NOAA/NESDIS/STAR Blended SST 0.05-degree (~5-km), daily dataset was produced for March 2012 to December 2013. The standard deviation of the long-term mean SST was calculated by taking the standard deviation over all weekly data from 2000-2013 for each pixel.
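A minimal sketch in Python of the per-pixel calculation described above, assuming the weekly composites have already been concatenated into a single array with dimensions (time, lat, lon); the file and variable names here are hypothetical, and the bias adjustment between source datasets is assumed to have been applied beforehand.

import xarray as xr

# Hypothetical file holding the concatenated weekly SST composites.
ds = xr.open_dataset("sst_weekly_1985_2013.nc")

# Restrict to the 2000-2013 window used for this layer, then take the
# standard deviation over time independently for each pixel.
sst = ds["sst"].sel(time=slice("2000-01-01", "2013-12-31"))
sst_std = sst.std(dim="time", skipna=True)

sst_std.to_netcdf("sst_weekly_std_2000_2013.nc")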
Comparison of exact expectations and standard deviations of times to coalescence Tk to their asymptotic approximations proposed by Chen and Chen (2013) [16], for n = 800.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: Empty cells mean no such genotypes were found in our sample. Maj: Major allele; Het: Heterozygote; Min: Minor allele.aResults (p values) of post hoc comparisons. mh = Maj versus Het, mm = Maj versus Min, hm = Het versus Min.bPost hoc comparison was not run because there were only 2 groups for this locus.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data arising from longitudinal studies are commonly analyzed with generalized estimating equations (GEE). Previous literature has shown that liberal inference may result from the use of the empirical sandwich covariance matrix estimator when the number of subjects is small. Therefore, two different approaches have been used to improve the validity of inference. First, many different small-sample corrections to the empirical estimator have been offered in order to reduce bias in resulting standard error estimates. Second, critical values can be obtained from a t-distribution or F-distribution with approximated degrees of freedom. Although limited studies on the comparison of these small-sample corrections and degrees of freedom have been published, there is need for a comprehensive study of currently existing methods in a wider range of scenarios. Therefore, in this manuscript we conduct such a simulation study, finding two methods to attain nominal type I error rates more consistently than other methods in a variety of settings: First, a recently proposed method by Westgate and Burchett (2016, Statistics in Medicine 35, 3733-3744) that specifies both a covariance estimator and degrees of freedom, and second, an average of two popular corrections developed by Mancl and DeRouen (2001, Biometrics 57, 126-134) and Kauermann and Carroll (2001, Journal of the American Statistical Association 96, 1387-1396) with degrees of freedom equaling the number of subjects minus the number of parameters in the marginal model.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing datathey are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
Chlorophyll-a is a widely used proxy for phytoplankton biomass and an indicator of changes in phytoplankton production. As an essential source of energy in the marine environment, the extent and availability of phytoplankton biomass can be highly influential for fisheries production and dictate trophic structure in marine ecosystems. Changes in phytoplankton biomass are predominantly driven by changes in nutrient availability, through either natural (e.g., turbulent ocean mixing) or anthropogenic (e.g., agricultural runoff) processes. This layer represents the standard deviation of the 8-day time series of chlorophyll-a (mg/m3) from 2002-2013. Monthly and 8-day 4-km (0.0417-degree) spatial resolution data were obtained from the MODIS (Moderate-resolution Imaging Spectroradiometer) Aqua satellite instrument from the NASA OceanColor website (http://oceancolor.gsfc.nasa.gov). The standard deviation was calculated over all 8-day chlorophyll-a data from 2002-2013 for each pixel. A quality control mask was applied to remove spurious data associated with shallow water, following Gove et al., 2013. Nearshore map pixels with no data were filled with values from the nearest neighboring valid offshore pixel, using a grid of points and the Near Analysis tool in ArcGIS and then converting the points to a raster.
This part of the data release contains a grid of standard deviations of bathymetric soundings within each 0.5 m x 0.5 m grid cell. The bathymetry was collected on February 1, 2011, in the Sacramento River from the confluence of the Feather River to Knights Landing. The standard deviations represent one component of bathymetric uncertainty in the final digital elevation model (DEM), which is also available in this data release. The bathymetry data were collected by the USGS Pacific Coastal and Marine Science Center (PCMSC) team with collaboration and funding from the U.S. Army Corps of Engineers. This project used interferometric sidescan sonar to characterize the riverbed and channel banks along a 12-mile reach of the Sacramento River near the town of Knights Landing, California (River Mile 79 through River Mile 91), to aid in understanding fish response to the creation of safe habitat associated with levee restoration efforts in two 1.5-mile reaches of the Sacramento River between River Mile 80 and 86.
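A minimal sketch in Python of gridding soundings into 0.5 m cells and taking the standard deviation of the soundings in each cell; the synthetic easting, northing, and depth arrays stand in for the survey data, which are assumed to be in a projected (metric) coordinate system.

import numpy as np
from scipy.stats import binned_statistic_2d

rng = np.random.default_rng(0)
easting = rng.uniform(0, 100, 10_000)
northing = rng.uniform(0, 100, 10_000)
depth = -5.0 + 0.5 * rng.standard_normal(10_000)

# 0.5 m x 0.5 m cells covering the extent of the soundings.
x_edges = np.arange(easting.min(), easting.max() + 0.5, 0.5)
y_edges = np.arange(northing.min(), northing.max() + 0.5, 0.5)

# Standard deviation of the soundings falling in each cell.
result = binned_statistic_2d(easting, northing, depth, statistic="std",
                             bins=[x_edges, y_edges])
std_grid = result.statistic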
Version 1 is the current version of the dataset. This collection, MODFDS_SDV_GLB_L3, provides the Level 3 standard deviation of the climatological monthly frequency of dust storms (FDS) over land from 175°W to 175°E and 80°S to 80°N at a spatial resolution of 0.1° x 0.1°. It is derived from the Level 2 Moderate Resolution Imaging Spectroradiometer (MODIS) Deep Blue aerosol products, Collection 6.1, from Terra (MOD04_L2). The dataset is the standard deviation of the climatological monthly mean for each month over 2000 to 2022. The FDS is calculated as the number of days per month when the daily dust optical depth is greater than a threshold optical depth (e.g., 0.025), with two quality flags: the lowest (1) and the highest (3). It is advised to use flag 1, which is of lower quality, over dust source regions, and flag 3 over remote areas or polluted regions. Eight thresholds (0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 2) are saved separately in eight files. If you have any questions, please read the README document first and post your question to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov).
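A minimal sketch in Python (xarray) of the frequency-of-dust-storms calculation described above, for a single threshold; the input file and variable names are hypothetical, and the actual product applies additional quality screening and flagging.

import xarray as xr

threshold = 0.025  # one of the eight thresholds listed above

ds = xr.open_dataset("daily_dust_optical_depth.nc")  # hypothetical daily DOD grid
dusty_day = ds["dod"] > threshold

# FDS: number of days per calendar month exceeding the threshold, per pixel.
fds_monthly = dusty_day.resample(time="1MS").sum()

# Standard deviation of each calendar month's FDS across the 2000-2022 record.
fds_sdv = fds_monthly.groupby("time.month").std(dim="time")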
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The monthly air temperature in 1153 stations and precipitation in 1202 stations in China and neighboring countries were collected to construct a monthly climate dataset in China on 0.025 ° resolution (approximately 2.5 km) named LZU0025 dataset designed by Lanzhou University (LZU), using a partial thin plate smoothing method embedded in the ANUSPLIN software. The accuracy of the LZU0025 was evaluated from analyzing three aspects: 1) Diagnostic statistics from surface fitting model in the period of 1951-2011, and results show low mean square root of generalized cross validation (RTGCV) for monthly air temperature surface (1.1 °C) and monthly precipitation surface (2 mm1/2) which interpolated the square root of itself. This indicate exact surface fitting models. 2) Error statistics based on 265 withheld stations data in the period of 1951-2011, and results show that predicted values closely tracked true values with mean absolute error (MAE) of 0.6 °C and 4 mm and standard deviation of mean error (STD) of 1.3 °C and 5 mm, and monthly STDs presented consistent change with RTGCV varying. 3) Comparisons to other datasets through two ways, one was to compare three indices namely the standard deviation, mean and time trend derived from all datasets to referenced dataset released by the China Meteorological Administration (CMA) in the Taylor diagrams, the other was to compare LZU0025 to the Camp Tibet dataset on mountainous remote area. Taylor diagrams displayed the standard deviation derived from LZU had higher correlation with that induced from CMA (Pearson correlation R=0.76 for air temperature case and R=0.96 for precipitation case). The standard deviation for this index derived from LZU was more close to that induced from CMA, and the centered normalized root-mean-square difference for this index derived from LZU and CMA was lower. The same superior performance of LZU were found in comparing indices of the mean and time trend derived from LZU and those induced from other datasets. LZU0025 had high correlation with the Camp dataset for air temperature despite of insignificant correlation for precipitation in few stations. Based on above comprehensive analyses, LZU0025 was concluded as the reliable dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains customer satisfaction scores collected from a survey, alongside key demographic and behavioral data. It includes variables such as customer age, gender, location, purchase history, support contact status, loyalty level, and satisfaction factors. The dataset is designed to help analyze customer satisfaction, identify trends, and develop insights that can drive business decisions.
File Information:
File Name: customer_satisfaction_data.csv
File Type: CSV
Number of Rows: 120
Number of Columns: 10
Column Names:
Customer_ID – Unique identifier for each customer (e.g., 81-237-4704)
Group – The group to which the customer belongs (A or B)
Satisfaction_Score – Customer's satisfaction score on a scale of 1-10
Age – Age of the customer
Gender – Gender of the customer (Male, Female)
Location – Customer's location (e.g., Phoenix.AZ, Los Angeles.CA)
Purchase_History – Whether the customer has made a purchase (Yes or No)
Support_Contacted – Whether the customer has contacted support (Yes or No)
Loyalty_Level – Customer's loyalty level (Low, Medium, High)
Satisfaction_Factor – Primary factor contributing to customer satisfaction (e.g., Price, Product Quality)
Statistical Analyses (a code sketch illustrating several of these follows the list):
Descriptive Statistics:
Calculate mean, median, mode, standard deviation, and range for key numerical variables (e.g., Satisfaction Score, Age).
Summarize categorical variables (e.g., Gender, Loyalty Level, Purchase History) with frequency distributions and percentages.
Two-Sample t-Test (Independent t-test):
Compare the mean satisfaction scores between two independent groups (e.g., Group A vs. Group B) to determine if there is a significant difference in their average satisfaction scores.
Paired t-Test:
If there are two related measurements (e.g., satisfaction scores before and after a certain event), you can compare the means using a paired t-test.
One-Way ANOVA (Analysis of Variance):
Test if there are significant differences in mean satisfaction scores across more than two groups (e.g., comparing the mean satisfaction score across different Loyalty Levels).
Chi-Square Test for Independence:
Examine the relationship between two categorical variables (e.g., Gender vs. Purchase History or Loyalty Level vs. Support Contacted) to determine if there’s a significant association.
Mann-Whitney U Test:
For non-normally distributed data, use this test to compare satisfaction scores between two independent groups (e.g., Group A vs. Group B) to see if their distributions differ significantly.
Kruskal-Wallis Test:
Similar to ANOVA, but used for non-normally distributed data. This test can compare the median satisfaction scores across multiple groups (e.g., comparing satisfaction scores across Loyalty Levels or Satisfaction Factors).
Spearman’s Rank Correlation:
Test for a monotonic relationship between two ordinal or continuous variables (e.g., Age vs. Satisfaction Score or Satisfaction Score vs. Loyalty Level).
Regression Analysis:
Linear Regression: Model the relationship between a continuous dependent variable (e.g., Satisfaction Score) and independent variables (e.g., Age, Gender, Loyalty Level).
Logistic Regression: If analyzing binary outcomes (e.g., Purchase History or Support Contacted), you could model the probability of an outcome based on predictors.
Factor Analysis:
To identify underlying patterns or groups in customer behavior or satisfaction factors, you can apply Factor Analysis to reduce the dimensionality of the dataset and group similar variables.
Cluster Analysis:
Use K-Means Clustering or Hierarchical Clustering to group customers based on similarity in their satisfaction scores and other features (e.g., Loyalty Level, Purchase History).
Confidence Intervals:
Calculate confidence intervals for the mean of satisfaction scores or any other metric to estimate the range in which the true population mean might lie.
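A short sketch in Python of several of the analyses listed above, using the column names documented for this file (pandas and SciPy); it is illustrative only and assumes the CSV is in the working directory.

import pandas as pd
from scipy import stats

data = pd.read_csv("customer_satisfaction_data.csv")

# Descriptive statistics for the key numerical variables.
print(data[["Satisfaction_Score", "Age"]].describe())

# Two-sample (independent) t-test: Group A vs. Group B satisfaction scores.
a = data.loc[data["Group"] == "A", "Satisfaction_Score"]
b = data.loc[data["Group"] == "B", "Satisfaction_Score"]
t_stat, t_p = stats.ttest_ind(a, b, equal_var=False)

# Non-parametric alternative if normality is doubtful.
u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")

# Chi-square test of independence: Gender vs. Purchase_History.
contingency = pd.crosstab(data["Gender"], data["Purchase_History"])
chi2, chi_p, dof, expected = stats.chi2_contingency(contingency)

# 95% confidence interval for the mean satisfaction score.
mean = data["Satisfaction_Score"].mean()
sem = stats.sem(data["Satisfaction_Score"])
ci_low, ci_high = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=sem)

print(f"t-test p={t_p:.3f}, Mann-Whitney p={u_p:.3f}, chi-square p={chi_p:.3f}")
print(f"95% CI for mean satisfaction: ({ci_low:.2f}, {ci_high:.2f})")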
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The prefix ’S’ denotes the use of a single-kernel SVR. (CC: Correlation Coefficient; RMSE: Root Mean Square Error)Performance (mean ± standard deviation) comparison among all competing methods.
Comparison of the mean and standard error values in each condition for both the token and bar-pull paradigms.