Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
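The two quantities varied in Experiment 2, the mean difference and the pooled standard deviation, can be illustrated with a minimal sketch. The scores below are hypothetical, and the function names are chosen here for illustration; the study's own materials and analysis code are not reproduced.

```python
import math
import statistics


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    return math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))


def standardized_difference(a, b):
    """Mean difference expressed in units of the pooled standard deviation."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd(a, b)


# Hypothetical test scores for the two fictional groups.
juice = [12, 14, 15, 13, 16]
water = [10, 11, 13, 12, 9]

mean_difference = statistics.mean(juice) - statistics.mean(water)
spread = pooled_sd(juice, water)
effect = standardized_difference(juice, water)
```

The same mean difference looks more or less convincing depending on the spread, which is why a judgement that ignores variability can mislead.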
Means, standard deviations, and correlations for all variables in Study 2.
Data for Figure 3.39 from Chapter 3 of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Figure 3.39 shows the observed and simulated Pacific Decadal Variability (PDV).

---------------------------------------------------
How to cite this dataset
---------------------------------------------------
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: Eyring, V., N.P. Gillett, K.M. Achuta Rao, R. Barimalala, M. Barreiro Parrillo, N. Bellouin, C. Cassou, P.J. Durack, Y. Kosaka, S. McGregor, S. Min, O. Morgenstern, and Y. Sun, 2021: Human Influence on the Climate System. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 423–552, doi:10.1017/9781009157896.005.

---------------------------------------------------
Figure subpanels
---------------------------------------------------
The figure has six panels. Files are not separated according to the panels.

---------------------------------------------------
List of data provided
---------------------------------------------------
pdv.obs.nc contains
- Observed SST anomalies associated with the PDV pattern
- Observed PDV index time series (unfiltered)
- Observed PDV index time series (low-pass filtered)
- Taylor statistics of the observed PDV patterns
- Statistical significance of the observed SST anomalies associated with the PDV pattern

pdv.hist.cmip6.nc contains
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns
based on CMIP6 historical simulations.

pdv.hist.cmip5.nc contains
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns
based on CMIP5 historical simulations.

pdv.piControl.cmip6.nc contains
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns
based on CMIP6 piControl simulations.

pdv.piControl.cmip5.nc contains
- Simulated SST anomalies associated with the PDV pattern
- Simulated PDV index time series (unfiltered)
- Simulated PDV index time series (low-pass filtered)
- Taylor statistics of the simulated PDV patterns
based on CMIP5 piControl simulations.

---------------------------------------------------
Data provided in relation to figure
---------------------------------------------------
Panel a:
- ipo_pattern_obs_ref in pdv.obs.nc: shading
- ipo_pattern_obs_signif (dataset = 1) in pdv.obs.nc: cross markers

Panel b:
- Multimodel ensemble mean of ipo_model_pattern in pdv.hist.cmip6.nc: shading, with their sign agreement for hatching

Panel c:
- tay_stats (stat = 0, 1) in pdv.obs.nc: black dots
- tay_stats (stat = 0, 1) in pdv.hist.cmip6.nc: red crosses, and their multimodel ensemble mean for the red dot
- tay_stats (stat = 0, 1) in pdv.hist.cmip5.nc: blue crosses, and their multimodel ensemble mean for the blue dot

Panel d:
- Lag-1 autocorrelation of tpi in pdv.obs.nc: black horizontal lines in left
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.piControl.cmip5.nc: blue open box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.piControl.cmip6.nc: red open box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.hist.cmip5.nc: blue filled box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.hist.cmip6.nc: red filled box-whisker in the left
- Lag-10 autocorrelation of tpi_lp in pdv.obs.nc: black horizontal lines in right
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.piControl.cmip5.nc: blue open box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.piControl.cmip6.nc: red open box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.hist.cmip5.nc: blue filled box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.hist.cmip6.nc: red filled box-whisker in the right

Panel e:
- Standard deviation of tpi in pdv.obs.nc: black horizontal lines in left
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.piControl.cmip5.nc: blue open box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.piControl.cmip6.nc: red open box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.hist.cmip5.nc: blue filled box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.hist.cmip6.nc: red filled box-whisker in the left
- Standard deviation of tpi_lp in pdv.obs.nc: black horizontal lines in right
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.piControl.cmip5.nc: blue open box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.piControl.cmip6.nc: red open box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.hist.cmip5.nc: blue filled box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.hist.cmip6.nc: red filled box-whisker in the right

Panel f:
- tpi_lp in pdv.obs.nc: black curves
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- tpi_lp in pdv.hist.cmip6.nc: 5th-95th percentiles in red shading, multimodel ensemble mean and its 5-95% confidence interval for red curves
- tpi_lp in pdv.hist.cmip5.nc: 5th-95th percentiles in blue shading, multimodel ensemble mean for blue curve

CMIP5 is the fifth phase of the Coupled Model Intercomparison Project. CMIP6 is the sixth phase of the Coupled Model Intercomparison Project. SST stands for Sea Surface Temperature.

---------------------------------------------------
Notes on reproducing the figure from the provided data
---------------------------------------------------
Multimodel ensemble means and percentiles of historical simulations of CMIP5 and CMIP6 are calculated after weighting individual members with the inverse of the ensemble size of the same model. ensemble_assign in each file provides the model number to which each ensemble member belongs. This weighting does not apply to the sign agreement calculation. piControl simulations from CMIP5 and CMIP6 consist of a single member from each model, so the weighting is not applied. Multimodel ensemble means of the pattern correlation in Taylor statistics in (c) and the autocorrelation of the index in (d) are calculated via Fisher z-transformation and back transformation.

---------------------------------------------------
Sources of additional information
---------------------------------------------------
The following weblinks are provided in the Related Documents section of this catalogue record:
- Link to the report component containing the figure (Chapter 3)
- Link to the Supplementary Material for Chapter 3, which contains details on the input data used in Table 3.SM.1
- Link to the code for the figure, archived on Zenodo
- Link to the figure on the IPCC AR6 website
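The member weighting and the Fisher z averaging described in the reproduction notes can be sketched as follows. This is a hedged illustration, not the archived figure code: the values and function names are hypothetical, and it is an assumption here that the same member weights enter the Fisher z average. Only ensemble_assign is a name taken from the files.

```python
import math
from collections import Counter


def member_weights(ensemble_assign):
    """Weight each member by 1 / (number of members from the same model)."""
    sizes = Counter(ensemble_assign)
    return [1.0 / sizes[model] for model in ensemble_assign]


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def fisher_mean_correlation(correlations, weights):
    """Average correlations in Fisher z space, then transform back."""
    z = [math.atanh(r) for r in correlations]
    return math.tanh(weighted_mean(z, weights))


# Hypothetical example: model 1 contributes three members, model 2 one member.
ensemble_assign = [1, 1, 1, 2]
weights = member_weights(ensemble_assign)

index_sd = [0.2, 0.4, 0.6, 1.0]           # hypothetical per-member statistic
pattern_corr = [0.5, 0.5, 0.5, 0.9]       # hypothetical pattern correlations

mme_sd = weighted_mean(index_sd, weights)
mme_corr = fisher_mean_correlation(pattern_corr, weights)
```

With these weights each model counts once in total, so a model with many members does not dominate the multimodel mean.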
https://www.nist.gov/open/license
These four data files contain datasets from an interlaboratory comparison that characterized a polydisperse five-population bead dispersion in water. A more detailed version of this description is available in the ReadMe file (PdP-ILC_datasets_ReadMe_v1.txt), which also includes definitions of abbreviations used in the data files. Paired samples were evaluated, so the datasets are organized as pairs associated with a randomly assigned laboratory number. The datasets are organized in the files by instrument type: PTA (particle tracking analysis), RMM (resonant mass measurement), ESZ (electrical sensing zone), and OTH (other techniques not covered in the three largest groups, including holographic particle characterization, laser diffraction, flow imaging, and flow cytometry). In the OTH group, the specific instrument type for each dataset is noted. Each instrument type (PTA, RMM, ESZ, OTH) has a dedicated file. Included in the data files for each dataset are: (1) the cumulative particle number concentration (PNC, (1/mL)); (2) the concentration distribution density (CDD, (1/mL·nm)) based upon five bins centered at each particle population peak diameter; (3) the CDD in higher resolution, varied-width bins. The lower-diameter bin edge (µm) is given for (2) and (3). Additionally, the PTA, RMM, and ESZ files each contain unweighted mean cumulative particle number concentrations and concentration distribution densities calculated from all datasets reporting values. The associated standard deviations and standard errors of the mean are also given. In the OTH file, the means and standard deviations were calculated using only data from one of the sub-groups (holographic particle characterization) that had n = 3 paired datasets. Where necessary, datasets not using the common bin resolutions are noted (PTA, OTH groups). 
The data contained here are presented and discussed in a manuscript to be submitted to the Journal of Pharmaceutical Sciences and presented as part of that scientific record.
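As a rough illustration of the summary values described above (unweighted mean, standard deviation, and standard error of the mean across datasets that report a value), here is a minimal sketch. The concentrations are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption; the ReadMe file is the authority on how the published values were computed.

```python
import math
import statistics


def summarize(values):
    """Unweighted mean, sample SD, and SEM over reported values (None = not reported)."""
    reported = [v for v in values if v is not None]
    n = len(reported)
    mean = statistics.mean(reported)
    sd = statistics.stdev(reported) if n > 1 else float("nan")  # assumed n - 1 form
    sem = sd / math.sqrt(n)
    return {"n": n, "mean": mean, "sd": sd, "sem": sem}


# Hypothetical cumulative PNC values (1/mL) from five datasets, one not reporting.
pnc = [1.0e6, 1.2e6, None, 0.8e6, 1.0e6]
summary = summarize(pnc)
```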
Means and standard deviations for SWB in Studies 1 and 2.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains customer satisfaction scores collected from a survey, alongside key demographic and behavioral data. It includes variables such as customer age, gender, location, purchase history, support contact status, loyalty level, and satisfaction factors. The dataset is designed to help analyze customer satisfaction, identify trends, and develop insights that can drive business decisions.
File Information: File Name: customer_satisfaction_data.csv (or your specific file name)
File Type: CSV (or the actual file format you are using)
Number of Rows: 120
Number of Columns: 10
Column Names:
Customer_ID – Unique identifier for each customer (e.g., 81-237-4704)
Group – The group to which the customer belongs (A or B)
Satisfaction_Score – Customer's satisfaction score on a scale of 1-10
Age – Age of the customer
Gender – Gender of the customer (Male, Female)
Location – Customer's location (e.g., Phoenix.AZ, Los Angeles.CA)
Purchase_History – Whether the customer has made a purchase (Yes or No)
Support_Contacted – Whether the customer has contacted support (Yes or No)
Loyalty_Level – Customer's loyalty level (Low, Medium, High)
Satisfaction_Factor – Primary factor contributing to customer satisfaction (e.g., Price, Product Quality)
Statistical Analyses:
Descriptive Statistics:
Calculate mean, median, mode, standard deviation, and range for key numerical variables (e.g., Satisfaction Score, Age).
Summarize categorical variables (e.g., Gender, Loyalty Level, Purchase History) with frequency distributions and percentages.
Two-Sample t-Test (Independent t-test):
Compare the mean satisfaction scores between two independent groups (e.g., Group A vs. Group B) to determine if there is a significant difference in their average satisfaction scores.
Paired t-Test:
If there are two related measurements (e.g., satisfaction scores before and after a certain event), you can compare the means using a paired t-test.
One-Way ANOVA (Analysis of Variance):
Test if there are significant differences in mean satisfaction scores across more than two groups (e.g., comparing the mean satisfaction score across different Loyalty Levels).
Chi-Square Test for Independence:
Examine the relationship between two categorical variables (e.g., Gender vs. Purchase History or Loyalty Level vs. Support Contacted) to determine if there’s a significant association.
Mann-Whitney U Test:
For non-normally distributed data, use this test to compare satisfaction scores between two independent groups (e.g., Group A vs. Group B) to see if their distributions differ significantly.
Kruskal-Wallis Test:
Similar to ANOVA, but used for non-normally distributed data. This test can compare the median satisfaction scores across multiple groups (e.g., comparing satisfaction scores across Loyalty Levels or Satisfaction Factors).
Spearman’s Rank Correlation:
Test for a monotonic relationship between two ordinal or continuous variables (e.g., Age vs. Satisfaction Score or Satisfaction Score vs. Loyalty Level).
Regression Analysis:
Linear Regression: Model the relationship between a continuous dependent variable (e.g., Satisfaction Score) and independent variables (e.g., Age, Gender, Loyalty Level).
Logistic Regression: If analyzing binary outcomes (e.g., Purchase History or Support Contacted), you could model the probability of an outcome based on predictors.
Factor Analysis:
To identify underlying patterns or groups in customer behavior or satisfaction factors, you can apply Factor Analysis to reduce the dimensionality of the dataset and group similar variables.
Cluster Analysis:
Use K-Means Clustering or Hierarchical Clustering to group customers based on similarity in their satisfaction scores and other features (e.g., Loyalty Level, Purchase History).
Confidence Intervals:
Calculate confidence intervals for the mean of satisfaction scores or any other metric to estimate the range in which the true population mean might lie.
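Two of the analyses listed above, the two-sample comparison of group means and a confidence interval for a mean, can be sketched with the Python standard library alone. The scores are hypothetical. The sketch uses Welch's unequal-variance form of the t statistic and a normal approximation for the interval; both are choices made here for illustration, and a p-value would additionally need a t distribution from a statistics package.

```python
import math
import statistics
from statistics import NormalDist


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def mean_ci_normal(x, level=0.95):
    """Confidence interval for the mean using a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = statistics.mean(x)
    half_width = z * statistics.stdev(x) / math.sqrt(len(x))
    return centre - half_width, centre + half_width


# Hypothetical Satisfaction_Score values for Group A and Group B.
group_a = [7, 8, 6, 9, 7, 8]
group_b = [5, 6, 7, 5, 6, 4]

t_stat, dof = welch_t(group_a, group_b)
ci_low, ci_high = mean_ci_normal(group_a)
```

For small samples like these, a t-based interval would be wider than the normal approximation shown here.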
This is the Baltic and North Sea Climatology (BNSC) for the Baltic Sea and the North Sea in the range 47° N to 66° N and 15° W to 30° E. It is the follow-up project to the knsc climatology. The climatology was first made available to the public in March 2018 by ICDC and is published here in a slightly revised version 2. It contains the monthly averages of mean air pressure at sea level, air temperature, and dew point temperature at 2 meter height. It is available on a 1° x 1° grid for the period from 1950 to 2015. For the calculation of the mean values, all available quality-controlled data of the DWD (German Meteorological Service) of ship observations and buoy measurements were taken into account during this period. Additional dew point values were calculated from relative humidity and air temperature if available. Climatologies were calculated for the WMO standard periods 1951-1980, 1961-1990, 1971-2000 and 1981-2010 (monthly mean values). As a prerequisite for the calculation of the 30-year climatology, at least 25 out of 30 (five-sixths) valid monthly means had to be present in the respective grid box. For the long-term climatology from 1950 to 2015, at least four-fifths valid monthly means had to be available. Two methods were used (in combination) to calculate the monthly averages, to account for the small number of measurements per grid box and their uneven spatial and temporal distribution:
1. For parameters with a detectable annual cycle in the data (air temperature, dew point temperature), a 2nd order polynomial was fitted to the data to reduce the variation within a month and reduce the uncertainty of the calculated averages. In addition, for the mean value of air temperature, the daily temperature cycle was removed from the data. In the case of air pressure, which has no annual cycle, version 2 allowed no data gaps longer than 14 days per month and grid box for the calculation of a monthly mean and standard deviation. This method differs from knsc and BNSC version 1, where mean and standard deviation were calculated from 6-day window means.
2. If the number of observations fell below a certain threshold, which was 20 observations per grid box and month for the air temperature as well as for the dew point temperature, and 500 per box and month for the air pressure, data from the adjacent boxes was used for the calculation. The neighbouring boxes were used in two steps (the nearest 8 boxes, and if the number was still below the threshold, the next surrounding 16 boxes) to calculate the mean value of the center box. Thus, the spatial resolution of the parameters is reduced at certain points and, instead of 1° x 1°, if neighbouring values are taken into account, data from an area of 5° x 5° can also be considered, which are then averaged into a grid box value. This was especially used for air pressure, where the 24 values of the neighbouring boxes were included in the averaging for most grid boxes.
The mean value, the number of measurements, the standard deviation and the number of grid boxes used to calculate the mean values are available as parameters in the products. The calculated monthly and annual means were allocated to the centers of the grid boxes: Latitudes: 47.5, 48.5,... Longitudes: -14.5, -13.5,... In order to remove any existing values over land, a land-sea mask was used, which is also provided in 1° x 1° resolution. In this version 2 of the BNSC, a slightly different database was used than for the knsc, which resulted in small changes (less than 1 K) in the means and standard deviations of the 2-meter air temperature and dew point temperature. The changes in mean sea level pressure values and the associated standard deviations are in the range of a few hPa, compared to the knsc. The parameter names and units have been adjusted to meet the CF 1.6 standard.
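The two-step use of neighbouring boxes can be sketched as below. This is an illustration under stated assumptions, not the BNSC processing code: it pools the raw observations of all boxes in the window (the product may instead combine box values differently), it returns whatever is available if the threshold is still not met in the 5 x 5 window, and the function name and data layout are hypothetical.

```python
import statistics


def box_mean(obs_by_box, center, threshold):
    """Mean for `center`, widening to 3x3 and then 5x5 boxes if observations are too few.

    obs_by_box maps (row, col) grid indices to lists of observations.
    Returns (mean, number_of_observations, number_of_boxes_used).
    """
    ci, cj = center
    values, boxes_used = [], 0
    # radius 0: the box itself; 1: plus nearest 8 boxes; 2: plus next 16 boxes
    for radius in (0, 1, 2):
        values, boxes_used = [], 0
        for di in range(-radius, radius + 1):
            for dj in range(-radius, radius + 1):
                obs = obs_by_box.get((ci + di, cj + dj), [])
                if obs:
                    values.extend(obs)
                    boxes_used += 1
        if len(values) >= threshold:
            break
    if not values:
        return None, 0, 0
    return statistics.mean(values), len(values), boxes_used


# Hypothetical air temperature observations in a few grid boxes.
obs = {(0, 0): [10.0], (0, 1): [12.0, 14.0], (2, 2): [20.0]}

own_box = box_mean(obs, (0, 0), threshold=1)
with_8_neighbours = box_mean(obs, (0, 0), threshold=3)
with_24_neighbours = box_mean(obs, (0, 0), threshold=4)
```

Returning the number of boxes used alongside the mean mirrors the product, which provides that count as a parameter so users can see where the effective resolution was reduced.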
Religions, as cultural systems, influence how people view and attune to their body. This research explores whether individual differences in various dimensions of religiosity are associated with interoceptive sensibility (IS), i.e., one’s perceived ability to detect and interpret bodily signals. In Study 1, Christians, Muslims, and Hindus (N = 1570) reported their religiosity and completed the Multidimensional Assessment of Interoceptive Awareness, a well-validated measure of IS. Results show that religious identity moderates the relationship between the centrality of religion in one’s life and IS such that the association is positive and medium for Christians, and large for Muslims and Hindus. In addition, the medium positive correlation between frequency of religious practice and IS was similar across religious groups. Study 2 (N = 450) extended these results by measuring additional dimensions of religiosity and spirituality as well as investigating religious-related beliefs about the body, both positive (e.g., My body is holy) and negative (e.g., My body is sinful). Associations between religiosity and IS are replicated and found for spirituality as well. Interestingly, mediation analyses reveal that belief in the body as holy partially explains the association between religiosity and IS, but belief in the body as sinful suppresses such association. We discuss how religion, as a cultural factor, may influence beliefs about the body and bodily awareness, with implications for emotion regulation and mental health.
Overview: 142: Areas used for sports, leisure and recreation purposes.
Traceability (lineage): This dataset was produced with a machine learning framework with several input datasets, specified in detail in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3 )
Scientific methodology: The single-class probability layers were generated with a spatiotemporal ensemble machine learning framework detailed in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3 ). The single-class uncertainty layers were calculated by taking the standard deviation of the three single-class probabilities predicted by the three components of the ensemble. The HCL (hard class) layers represent the class with the highest probability as predicted by the ensemble.
Usability: The HCL layers have a decreasing average accuracy (weighted F1-score) at each subsequent level in the CLC hierarchy. These metrics are 0.83 at level 1 (5 classes), 0.63 at level 2 (14 classes), and 0.49 at level 3 (43 classes). This means that the hard-class maps are more reliable when aggregating classes to a higher level in the hierarchy (e.g. 'Discontinuous Urban Fabric' and 'Continuous Urban Fabric' to 'Urban Fabric'). Some single-class probabilities may more closely represent actual patterns for some classes that were overshadowed by unequal sample point distributions. Users are encouraged to set their own thresholds when postprocessing these datasets to optimize the accuracy for their specific use case.
Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model.
Data validation approaches: The LULC classification was validated through spatial 5-fold cross-validation as detailed in the accompanying publication.
Completeness: The dataset has chunks of empty predictions in regions with complex coast lines (e.g. the Zeeland province in the Netherlands and the Mar da Palha bay area in Portugal). These are artifacts that will be avoided in subsequent versions of the LULC product.
Consistency: The accuracy of the predictions was compared per year and per 30km*30km tile across Europe to derive temporal and spatial consistency by calculating the standard deviation. The standard deviation of annual weighted F1-score was 0.135, while the standard deviation of weighted F1-score per tile was 0.150. This means the dataset is more consistent through time than through space: predictions are notably less accurate along the Mediterranean coast. The accompanying publication contains additional information and visualisations.
Positional accuracy: The raster layers have a resolution of 30m, identical to that of the Landsat data cube used as input features for the machine learning framework that predicted it.
Temporal accuracy: The dataset contains predictions and uncertainty layers for each year between 2000 and 2019.
Thematic accuracy: The maps reproduce the Corine Land Cover classification system, a hierarchical legend that consists of 5 classes at the highest level, 14 classes at the second level, and 44 classes at the third level. Class 523: Oceans was omitted due to computational constraints.
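For a single pixel, the uncertainty and hard-class definitions above can be sketched as follows. The probabilities are hypothetical, the function names are illustrative, and the choice of the population form of the standard deviation is an assumption, since the description does not say which form was used.

```python
import statistics


def class_uncertainty(component_probs):
    """Standard deviation of one class's probability across the ensemble components."""
    return statistics.pstdev(component_probs)  # population form: an assumption


def hard_class(ensemble_probs):
    """Class code with the highest ensemble probability."""
    return max(ensemble_probs, key=ensemble_probs.get)


# Hypothetical probabilities for class 142 from the three ensemble components.
components_142 = [0.6, 0.7, 0.8]
uncertainty_142 = class_uncertainty(components_142)

# Hypothetical ensemble probabilities for three classes at the same pixel.
pixel_probs = {"142": 0.7, "141": 0.2, "112": 0.1}
pixel_hcl = hard_class(pixel_probs)
```

A user-defined threshold, as encouraged above, would replace the plain maximum with a rule such as accepting a class only when its probability exceeds a chosen value.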
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The first step to develop a quantitative structure–activity relationship (QSAR) model is to identify a set of chemicals with known activities/properties, which can be either collected from the published studies or measured experimentally. A key challenge in this process is how to determine which chemicals are used to train a QSAR model, and, of those chemicals, which should be prioritized in experimental trials to ensure that the obtained models have large applicability domains (ADs). In this study, we employ uncertainty-based active learning (AC) to address this challenge. We use the Gaussian process (GP) to develop QSAR models for three public datasets, Koc, solubility, and k•OH, each with a number of chemicals represented by molecular descriptors, in which the GP can offer prediction uncertainty (by means of standard deviation) for the model’s prediction. The training chemicals of each dataset are selected in two different ways: (1) random splitting (RS) and (2) uncertainty-based AC. Uncertainty-based AC iteratively identifies chemicals with the highest uncertainty and selects them for model training. We demonstrate that the chemicals selected by AC are more diverse than those selected by RS and that AC-based QSAR models have better generalizability than those derived from RS. We then use these two types of models to predict the properties of chemicals in the REACH dataset (>300,000 chemicals) and assess their ADs using five different AD determination methods. We demonstrate that the AD of AC-based QSAR models for all AD methods is significantly larger than those of RS-based models (up to 24 times larger). This study provides a novel method to enlarge the AD of QSAR models, which can guide model development and improve the property prediction reliability for more REACH dataset chemicals while minimizing the development cost and time.
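The selection loop of uncertainty-based AC can be sketched in a few lines. Note the substitution: instead of a Gaussian process standard deviation, this sketch uses the distance to the nearest already-selected chemical as a stand-in uncertainty score, purely to keep the example self-contained. The descriptor values and function names are hypothetical, and the study's actual models are not reproduced.

```python
import math


def uncertainty(x, train):
    """Stand-in uncertainty: distance to the nearest training point (not a GP)."""
    return min(math.dist(x, t) for t in train)


def active_learning_select(pool, seed_index, n_select):
    """Greedily add the pool item with the highest uncertainty at each step."""
    selected = [seed_index]
    while len(selected) < n_select:
        train = [pool[i] for i in selected]
        candidates = [i for i in range(len(pool)) if i not in selected]
        best = max(candidates, key=lambda i: uncertainty(pool[i], train))
        selected.append(best)
    return selected


# Hypothetical two-dimensional molecular descriptors for six chemicals.
pool = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (0.0, 4.0)]
picked = active_learning_select(pool, seed_index=0, n_select=3)
```

In this toy pool the loop picks chemicals from three different regions of descriptor space rather than three near-duplicates, which is the diversity effect the abstract describes. With a real GP, the model would be refitted after each selection and its predictive standard deviation would replace the stand-in score.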
The means and standard deviations of the 50 MAP estimates based upon data with 400 infected households for each parameter are shown in the form mean(standard deviation) for the BPA and DA-MCMC methods. The last row shows the difference in the mean and standard deviation between the two methods.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Untargeted liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) is a powerful tool for comprehensive chemical analysis. Such techniques allow the detection and quantification of thousands of compounds in a sample. However, the complexity and variability in the data can introduce significant errors, impacting the reliability of the results. This study investigates ensemble averaging to mitigate these errors and improve signal-to-noise (S/N) ratios, feature detection, and data quality. In this work, 256 LC-qTOF/MS1 data sets from the analysis of Morning Glory seeds were averaged to generate merged data sets. The numbers of the pooled data sets in the merged files were varied, and the number of features, the S/N ratio, the accuracy and precision of the accurate masses, relative intensities, and migration time were examined. It was proved that ensemble averaging allows an increase in the S/N up to a factor of 10, and the relative standard deviation of the accurate masses and retention time decreased by a factor of 10. Moreover, the average number of features mined per data set increased from 1192 ± 129 with the original data set to 4408 when all data sets were averaged into one. Using known target compounds, ensemble averaging benefits on quantitative analysis were investigated. The measured and theoretical relative intensities between the [M+1]+H+, [M+2]+H+, and [M+3]+H+ and [M]+H+ isotopes of known alkaloids were used. The standard deviation decreased by up to a factor of 10, and the absolute error between theoretical and experimental relative intensities was below 3%, making the theoretical isotopic pattern a valid criterion for confirming a putative molecular formula. Using a targeted approach to recover quantitative data from the original data sets from information in the merged data sets provides an accurate quantitative means. 
Peak lists from the merged data sets and quantitative information from the original data sets were fused to obtain a robust clustering approach that allows recognizing features (adducts, isotopes, and fragments) generated by a common chemical in the ionization chamber. Two hundred and four clusters were obtained, characterized by two or more features with migration times that differ by less than 0.05 min and with similar response patterns.
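The principle behind ensemble averaging can be illustrated with simulated noise. For uncorrelated noise, averaging N traces reduces the noise standard deviation by about the square root of N, so 256 traces give an idealized factor of about 16. The sketch below is a simulation under that idealized assumption, with hypothetical parameters; it does not use the study's data, where the reported gain was up to a factor of 10.

```python
import random
import statistics


def ensemble_average(runs):
    """Point-by-point mean of equally long traces."""
    n = len(runs)
    return [sum(column) / n for column in zip(*runs)]


def simulate_noise_reduction(n_runs, n_points=2000, noise_sd=1.0, seed=0):
    """Ratio of single-trace noise SD to the noise SD after averaging n_runs traces."""
    rng = random.Random(seed)
    runs = [[rng.gauss(0.0, noise_sd) for _ in range(n_points)]
            for _ in range(n_runs)]
    single_sd = statistics.pstdev(runs[0])
    averaged_sd = statistics.pstdev(ensemble_average(runs))
    return single_sd / averaged_sd


ratio = simulate_noise_reduction(256)  # expected to be near sqrt(256) = 16
```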
This is a new (GPM-formatted) TRMM product. There is no equivalent in the old TRMM suite of products.
Version 07 is the current version of the data set. Older versions will no longer be available and have been superseded by Version 07.
This is the GPM-like formatted TRMM Precipitation Radar (PR) daily gridded data, first released with the "V8" TRMM reprocessing. The daily radar grid data is new for TRMM nomenclature and is introduced for consistency with the GPM Dual-frequency Precipitation Radar (DPR). The closest ancestor was 3A25, which was a monthly radar statistics product.
This product consists of daily statistics of the PR measurements at (0.25x0.25) degrees horizontal resolution.
The objective of the algorithm is to calculate various daily statistics from the level 2 PR output products. Four types of statistics are calculated:
1. Probabilities of occurrence (count values)
2. Means and standard deviations
In all cases, the statistics are conditioned on the presence of rain or some other quantity such as the presence of stratiform rain or the presence of a bright-band. For example, to compute the unconditioned mean rain rate, the conditional mean must be multiplied by the probability of rain which, in turn, is calculated from the ratio of rain counts to the total number of observations in the box of interest.
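The conversion from a conditional to an unconditioned mean described above amounts to one multiplication. A minimal sketch, with a hypothetical function name and hypothetical values for a single grid box:

```python
def unconditional_mean_rain_rate(conditional_mean, rain_count, total_count):
    """Unconditioned mean = conditional mean x (rain counts / total observations)."""
    if total_count == 0:
        return None  # no observations in the box, so the mean is undefined
    probability_of_rain = rain_count / total_count
    return conditional_mean * probability_of_rain


# Hypothetical box: rain seen in 5 of 20 observations, conditional mean 4.0 mm/h.
box_mean_rate = unconditional_mean_rain_rate(4.0, rain_count=5, total_count=20)
```

Here the probability of rain is 0.25, so the unconditioned mean is one quarter of the conditional mean.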
The grids are in the Planetary Grid 2 structure matching the Dual-frequency PR on the core GPM observatory that covers 67S to 67N degrees of latitudes. Areas beyond the ±40 degrees of latitudes are padded with empty grid cells.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 Mean values (SD); normal scores > 9 (max score = 16).
2 Max score = 60.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
- Data from: Low-frequency oscillations in the magnetic nozzle of a Helicon Plasma Thruster
- Authors: Davide Maddaloni, Borja Bayón-Buján, Jaume Navarro-Cavallé, Mario Merino, Filippo Terragni
- Contact email: dmaddalo@ing.uc3m.es
- Date: 2024-09-13
- Version: 1.0.0
- License: This dataset is made available under the Creative Commons Attribution 4.0 International license.
This dataset contains the postprocessed experimental data used in:
Davide Maddaloni, Borja Bayón-Buján, Jaume Navarro-Cavallé, Mario Merino, Filippo Terragni, "Low-frequency oscillations in the magnetic nozzle of a Helicon Plasma Thruster", Plasma Sources Science and Technology, which is currently submitted.
The experimental data is gathered by means of three distinct floating Langmuir Probes (LPs).
For the time-resolved analysis, the original floating potential data is postprocessed according to the description provided in the corresponding journal article (Section 3). For the time-averaged analysis, sweeping of the LPs is performed and the I-V curves are postprocessed according to the routine illustrated in Lobbia et al.
Please refer to the related article for further details regarding any of the parameters and/or configurations.
The data files are in standard Matlab .mat format. A recent version of Matlab is recommended.
For the time-resolved results, data is subdivided according to the spatial position inspected and the xenon injected mass flow rate. The nomenclature of the files is the following: "Output_[injected mass flow rate]_[axial position]_[angular position]". Currently, all the arrays collect frequencies up to 200 kHz. In a future update, frequencies up to 1 MHz will be included.
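As a rough illustration of the naming scheme, the Python sketch below splits a file name into its three labels. The helper name and the example file name are hypothetical, and the exact form of each label (units, separators) is an assumption, since it is not spelled out here.

```python
from pathlib import Path


def parse_output_name(path):
    """Split 'Output_<flow>_<axial>_<angular>.mat' into its three labels."""
    stem = Path(path).stem  # drop the directory and the .mat extension
    prefix, flow, axial, angular = stem.split("_", 3)
    if prefix != "Output":
        raise ValueError(f"unexpected file name: {path}")
    return {
        "mass_flow_rate": flow,
        "axial_position": axial,
        "angular_position": angular,
    }


# Hypothetical file name; the real label formats may differ.
print(parse_output_name("data/Output_A_B_C.mat"))
```

Outside Matlab, files saved in the older .mat formats can usually be read with `scipy.io.loadmat`, whereas v7.3 files are HDF5 containers and need an HDF5 reader instead; which format these files use is not stated here.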
Each file consists of several subfields, as follows:
The results of the time-resolved analysis, including plasma potential and plasma density for the two mass flow rates inspected, will be added in a future update.
Works using this dataset or any part of it in any form shall cite it as follows.
The preferred means of citation is to reference the publication associated with this dataset, as soon as it is available.
Optionally, the dataset may be cited directly by referencing the corresponding DOI: 10.5281/zenodo.13758358.
This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (project ERC-STG ZARATHUSTRA, grant agreement No 950466). Additionally, F. Terragni was also supported by the FEDER / Ministerio de Ciencia, Innovación y Universidades - Agencia Estatal de Investigación (grant agreement No PID2020-112796RB-C22), while B. Bayón-Buján enjoyed a grant from the Consejería de Educación, Universidades, Ciencia y Portavocía of the Community of Madrid (grant PEJ-2021-AI/TIC-23158).
Data for Figure 3.40 from Chapter 3 of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Figure 3.40 shows the observed and simulated Atlantic Multidecadal Variability (AMV).

---------------------------------------------------
How to cite this dataset
---------------------------------------------------
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: Eyring, V., N.P. Gillett, K.M. Achuta Rao, R. Barimalala, M. Barreiro Parrillo, N. Bellouin, C. Cassou, P.J. Durack, Y. Kosaka, S. McGregor, S. Min, O. Morgenstern, and Y. Sun, 2021: Human Influence on the Climate System. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 423–552, doi:10.1017/9781009157896.005.

---------------------------------------------------
Figure subpanels
---------------------------------------------------
The figure has six panels. Files are not separated according to the panels.

---------------------------------------------------
List of data provided
---------------------------------------------------
amv.obs.nc contains
- Observed SST anomalies associated with the AMV pattern
- Observed AMV index time series (unfiltered)
- Observed AMV index time series (low-pass filtered)
- Taylor statistics of the observed AMV patterns

amv.hist.cmip6.nc contains
- Statistical significance of the observed SST anomalies associated with the AMV pattern
- Simulated SST anomalies associated with the AMV pattern
- Simulated AMV index time series (unfiltered)
- Simulated AMV index time series (low-pass filtered)
- Taylor statistics of the simulated AMV patterns based on CMIP6 historical simulations.

amv.hist.cmip5.nc contains
- Simulated SST anomalies associated with the AMV pattern
- Simulated AMV index time series (unfiltered)
- Simulated AMV index time series (low-pass filtered)
- Taylor statistics of the simulated AMV patterns based on CMIP5 historical simulations.

amv.piControl.cmip6.nc contains
- Simulated SST anomalies associated with the AMV pattern
- Simulated AMV index time series (unfiltered)
- Simulated AMV index time series (low-pass filtered)
- Taylor statistics of the simulated AMV patterns based on CMIP6 piControl simulations.

amv.piControl.cmip5.nc contains
- Simulated SST anomalies associated with the AMV pattern
- Simulated AMV index time series (unfiltered)
- Simulated AMV index time series (low-pass filtered)
- Taylor statistics of the simulated AMV patterns based on CMIP5 piControl simulations.

---------------------------------------------------
Data provided in relation to figure
---------------------------------------------------
Panel a:
- amv_pattern_obs_ref in amv.obs.nc: shading
- amv_pattern_obs_signif (dataset = 1) in amv.obs.nc: cross markers

Panel b:
- Multimodel ensemble mean of amv_pattern in amv.hist.cmip6.nc: shading, with their sign agreement for hatching

Panel c:
- tay_stats (stat = 0, 1) in amv.obs.nc: black dots
- tay_stats (stat = 0, 1) in amv.hist.cmip6.nc: red crosses, and their multimodel ensemble mean for the red dot
- tay_stats (stat = 0, 1) in amv.hist.cmip5.nc: blue crosses, and their multimodel ensemble mean for the blue dot

Panel d:
- Lag-1 autocorrelation of amv_timeseries_raw in amv.obs.nc: black horizontal lines in left
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of amv_timeseries_raw in amv.piControl.cmip5.nc: blue open box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of amv_timeseries_raw in amv.piControl.cmip6.nc: red open box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of amv_timeseries_raw in amv.hist.cmip5.nc: blue filled box-whisker in the left
- Multimodel ensemble mean and percentiles of lag-1 autocorrelation of amv_timeseries_raw in amv.hist.cmip6.nc: red filled box-whisker in the left
- Lag-10 autocorrelation of amv_timeseries in amv.obs.nc: black horizontal lines in right
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of amv_timeseries in amv.piControl.cmip5.nc: blue open box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of amv_timeseries in amv.piControl.cmip6.nc: red open box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of amv_timeseries in amv.hist.cmip5.nc: blue filled box-whisker in the right
- Multimodel ensemble mean and percentiles of lag-10 autocorrelation of amv_timeseries in amv.hist.cmip6.nc: red filled box-whisker in the right

Panel e:
- Standard deviation of amv_timeseries_raw in amv.obs.nc: black horizontal lines in left
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries_raw in amv.piControl.cmip5.nc: blue open box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries_raw in amv.piControl.cmip6.nc: red open box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries_raw in amv.hist.cmip5.nc: blue filled box-whisker in the left
- Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries_raw in amv.hist.cmip6.nc: red filled box-whisker in the left
- Standard deviation of amv_timeseries in amv.obs.nc: black horizontal lines in right
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries in amv.piControl.cmip5.nc: blue open box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries in amv.piControl.cmip6.nc: red open box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries in amv.hist.cmip5.nc: blue filled box-whisker in the right
- Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries in amv.hist.cmip6.nc: red filled box-whisker in the right

Panel f:
- amv_timeseries in amv.obs.nc: black curves
  . ERSSTv5: dataset = 1
  . HadISST: dataset = 2
  . COBE-SST2: dataset = 3
- amv_timeseries in amv.hist.cmip6.nc: 5th-95th percentiles in red shading, multimodel ensemble mean and its 5-95% confidence interval for red curves
- amv_timeseries in amv.hist.cmip5.nc: 5th-95th percentiles in blue shading, multimodel ensemble mean for blue curve

CMIP5 is the fifth phase of the Coupled Model Intercomparison Project.
CMIP6 is the sixth phase of the Coupled Model Intercomparison Project.
SST stands for Sea Surface Temperature.

---------------------------------------------------
Notes on reproducing the figure from the provided data
---------------------------------------------------
Multimodel ensemble means and percentiles of historical simulations of CMIP5 and CMIP6 are calculated after weighting individual members with the inverse of the ensemble size of the same model. ensemble_assign in each file provides the model number to which each ensemble member belongs. This weighting does not apply to the sign agreement calculation. piControl simulations from CMIP5 and CMIP6 consist of a single member from each model, so the weighting is not applied. Multimodel ensemble means of the pattern correlation in Taylor statistics in (c) and the autocorrelation of the index in (d) are calculated via Fisher z-transformation and back transformation.
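
The two averaging rules in the notes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the archived figure code: the function names and the toy numbers are hypothetical, and only ensemble_assign is a name taken from the files. Weighted percentiles are not shown.

```python
import math
from collections import Counter


def model_weighted_mean(values, ensemble_assign):
    """Mean over members, each weighted by 1 / (ensemble size of its model)."""
    sizes = Counter(ensemble_assign)  # number of members per model number
    weights = [1.0 / sizes[model] for model in ensemble_assign]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def fisher_mean(correlations, weights=None):
    """Average correlations in Fisher z space, then transform back."""
    if weights is None:
        weights = [1.0] * len(correlations)
    z_mean = sum(w * math.atanh(r) for w, r in zip(weights, correlations)) / sum(weights)
    return math.tanh(z_mean)


# Toy case: model 1 has two members, model 2 has one.
values = [1.0, 3.0, 5.0]
ensemble_assign = [1, 1, 2]
print(model_weighted_mean(values, ensemble_assign))
print(fisher_mean([0.2, 0.8]))
```

With these weights every model contributes equally to the mean regardless of how many members it submitted. The Fisher transform (atanh) is undefined for correlations of exactly ±1, so such values would need separate handling.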

---------------------------------------------------
Sources of additional information
---------------------------------------------------
The following weblinks are provided in the Related Documents section of this catalogue record:
- Link to the report component containing the figure (Chapter 3)
- Link to the Supplementary Material for Chapter 3, which contains details on the input data used in Table 3.SM.1
- Link to the code for the figure, archived on Zenodo
- Link to the figure on the IPCC AR6 website
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This model learning dataset is created from the Raw Synthetic RWD raw dataset and includes some of the original attributes. It is distributed in JOBLIB files, where .joblib files contain the vectors and _ids.joblib files contain the ID of the person from which each vector is extracted. This is useful when the vectors need to be mapped to metadata about the people found in the original raw dataset. Note that corresponds to , or , depending on the dataset. The split is roughly 60% of the people in the training dataset and 20% in each of the validation and testing datasets. The input attributes are the age, the short-term averages and the trends of the current week’s BMI, steps walked, calories burned, sleep quality, mood and water consumption, as well as the previous week’s short-term average and trend of the answer to the health self-assessment question. The outcome to be predicted is a tristate quantized version of the health self-assessment answer to be given in the current week. The dataset is normalized based on the training set. The means and standard deviations used can be found in the train_statistics.joblib file. Finally, the output_descriptions.joblib file contains descriptions of the outcomes to be predicted (not strictly needed, since they are included here).
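Normalizing with training-set statistics can be sketched as below. This is an assumed illustration, not the dataset's own code: the function names and toy rows are hypothetical, and the use of the population standard deviation is an assumption, since the description does not say which estimator was used.

```python
import statistics


def fit_stats(train_rows):
    """Per-attribute mean and standard deviation, computed on training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]  # assumption: population SD
    return means, stds


def normalize(rows, means, stds):
    """Apply the training statistics to any split (train, validation or test)."""
    return [
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy data with two attributes per vector.
train = [[1.0, 10.0], [3.0, 30.0]]
means, stds = fit_stats(train)
print(normalize([[2.0, 40.0]], means, stds))
```

Reusing the training means and standard deviations for the validation and testing splits keeps information from those splits out of the preprocessing step.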
Means and standard deviations for session 1 and session 2, mean differences between time-moments including 95% confidence intervals, paired t-test and effect size (n = 22).
Our target was to predict gender, age and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted, and after the labels were added these datasets were formed. Audio files were collected from "Mozilla Common Voice" and the "Ryerson AudioVisual Database of Emotional Speech and Song (RAVDESS)".
Each dataset contains 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset’s dispersion relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle value of the sorted (ascending or descending) list of values.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the data set values are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the central point between the median and the highest value of the distribution.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It is effectively a measure of the outliers present in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal’s spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and quantifies how noise-like, as opposed to tone-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum’s center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
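To make a few of these definitions concrete, the Python sketch below computes several of them from a power spectrum treated as a distribution over frequency. It is a simplified illustration and not the R routine used to build the datasets: the function name, the input format and the quantile rule are assumptions, and all power values are assumed to be strictly positive.

```python
import math


def spectral_summary(freqs_khz, power):
    """Summary statistics of a power spectrum (frequencies sorted ascending)."""
    total = sum(power)
    p = [x / total for x in power]  # normalized spectrum, sums to 1
    meanfreq = sum(f * w for f, w in zip(freqs_khz, p))
    sd = math.sqrt(sum(w * (f - meanfreq) ** 2 for f, w in zip(freqs_khz, p)))

    def quantile(q):
        # First frequency at which the cumulative normalized power reaches q.
        cumulative = 0.0
        for f, w in zip(freqs_khz, p):
            cumulative += w
            if cumulative >= q:
                return f
        return freqs_khz[-1]

    q25, median, q75 = quantile(0.25), quantile(0.5), quantile(0.75)
    # Spectral entropy, scaled to [0, 1] by the log of the number of bins.
    sp_ent = -sum(w * math.log(w) for w in p if w > 0) / math.log(len(p))
    # Spectral flatness: geometric mean of the power over its arithmetic mean.
    geometric = math.exp(sum(math.log(x) for x in power) / len(power))
    sfm = geometric / (total / len(power))
    return {
        "meanfreq": meanfreq, "sd": sd, "median": median,
        "Q25": q25, "Q75": q75, "IQR": q75 - q25,
        "sp.ent": sp_ent, "sfm": sfm,
    }


# Toy spectrum with equal power in four bins.
print(spectral_summary([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
```

A perfectly flat spectrum, as in the toy example, gives entropy and flatness values at the top of their ranges, while a pure tone concentrated in one bin drives both towards zero.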
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
NOTE: Within rows, values with different superscripts (a, b) are significantly (p<.05) different from each other controlling for familywise error (Scheffé’s test).