License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
About Datasets:
Domain: Finance
Project: Variance Analysis
Datasets: Budget vs Actuals
Dataset Type: Excel Data
Dataset Size: 482 records
KPIs: 1. Total Income 2. Total Expenses 3. Total Savings 4. Budget vs Actual Income 5. Actual Expenses Breakdown
Process: 1. Understanding the problem 2. Data Collection 3. Exploring and analyzing the data 4. Interpreting the results
The workbook includes a dynamic dashboard, data validation, INDEX-MATCH, SUMIFS, conditional formatting, IF conditions, a column chart, and a pie chart.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains the features and probabilities of ten different functions. Each data set is saved using NumPy arrays.

\item The data set \textit{Arc} corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function given by $$f(x_1,x_2)=\mathcal{N}(x_2|0,4)\,\mathcal{N}(x_1|0.25x_2^2,1)$$ where $$\mathcal{N}(u|\mu,\sigma^2)$$ denotes the density function of a normal distribution with mean $$\mu$$ and variance $$\sigma^2$$. \cite{Papamakarios2017} used this data set to evaluate their neural density estimation methods.

\item The data set \textit{Potential 1} corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function proportional to $$\exp\{-U(x_1,x_2)\}$$, where the potential is $$U(x_1,x_2)=\frac{1}{2}\left(\frac{\lVert x\rVert-2}{0.4}\right)^2 - \ln\left(\exp\left\{-\frac{1}{2}\left[\frac{x_1-2}{0.6}\right]^2\right\}+\exp\left\{-\frac{1}{2}\left[\frac{x_1+2}{0.6}\right]^2\right\}\right)$$ with a normalizing constant of approximately 6.52 calculated by Monte Carlo integration.

\item The data set \textit{Potential 2} corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function proportional to $$\exp\{-U(x_1,x_2)\}$$, where $$U(x_1,x_2)=\frac{1}{2}\left[\frac{x_2-w_1(x)}{0.4}\right]^2$$ and $$w_1(x)=\sin\left(\frac{2\pi x_1}{4}\right)$$, with a normalizing constant of approximately 8 calculated by Monte Carlo integration.

\item The data set \textit{Potential 3} corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function proportional to $$\exp\{-U(x_1,x_2)\}$$, where $$U(x_1,x_2)= -\ln\left(\exp\left\{-\frac{1}{2}\left[\frac{x_2-w_1(x)}{0.35}\right]^2\right\}+\exp\left\{-\frac{1}{2}\left[\frac{x_2-w_1(x)+w_2(x)}{0.35}\right]^2\right\}\right)$$ where $$w_1(x)=\sin\left(\frac{2\pi x_1}{4}\right)$$ and $$w_2(x)=3 \exp\left\{-\frac{1}{2}\left[\frac{x_1-1}{0.6}\right]^2\right\}$$, with a normalizing constant of approximately 13.9 calculated by Monte Carlo integration.

\item The data set \textit{Potential 4} corresponds to a two-dimensional random sample drawn from a random vector $$X=(X_1,X_2)$$ with probability density function proportional to $$\exp\{-U(x_1,x_2)\}$$, where $$U(x_1,x_2)= -\ln\left(\exp\left\{-\frac{1}{2}\left[\frac{x_2-w_1(x)}{0.4}\right]^2\right\}+\exp\left\{-\frac{1}{2}\left[\frac{x_2-w_1(x)+w_3(x)}{0.35}\right]^2\right\}\right)$$ where $$w_1(x)=\sin\left(\frac{2\pi x_1}{4}\right)$$, $$w_3(x)=3\,\sigma\left(\left[\frac{x_1-1}{0.3}\right]^2\right)$$, and $$\sigma(x)=\frac{1}{1+\exp(-x)}$$ is the logistic sigmoid, with a normalizing constant of approximately 13.9 calculated by Monte Carlo integration.

\item The data set \textit{2D mixture} corresponds to a two-dimensional random sample drawn from the random vector $$X=(X_1, X_2)$$ with probability density function given by $$f(x) = \frac{1}{2}\mathcal{N}(x|\mu_1,\Sigma_1) + \frac{1}{2}\mathcal{N}(x|\mu_2,\Sigma_2)$$ with means and covariance matrices $$\mu_1 = [1, -1]^T$$, $$\mu_2 = [-2, 2]^T$$, $$\Sigma_1=\left[\begin{array}{cc} 1 & 0 \\ 0 & 2 \end{array}\right]$$, and $$\Sigma_2=\left[\begin{array}{cc} 2 & 0 \\ 0 & 1 \end{array}\right]$$.

\item The data set \textit{10D-mixture} corresponds to a 10-dimensional random sample drawn from the random vector $$X=(X_1,\cdots,X_{10})$$ with a mixture of four diagonal normal probability density functions $$\mathcal{N}(X_i|\mu_i, \sigma_i)$$, where each $$\mu_i$$ is drawn uniformly in the interval $$[-0.5,0.5]$$ and each $$\sigma_i$$ is drawn uniformly in the interval $$[0.01, 0.5]$$. Each of the four components is drawn with equal probability $$1/4$$.
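As a concrete illustration, the conditional factorisation of the \textit{Arc} density (sample $$X_2$$ first, then $$X_1$$ given $$X_2$$) can be reproduced in a few lines of NumPy. This is a sketch; the function name sample_arc is illustrative, not part of the dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_arc(n):
    """Draw n samples from the Arc density via its conditional factorisation."""
    x2 = rng.normal(0.0, 2.0, size=n)        # X2 ~ N(0, 4): std dev is 2
    x1 = rng.normal(0.25 * x2**2, 1.0)       # X1 | X2 ~ N(0.25 * X2^2, 1)
    return np.column_stack([x1, x2])

samples = sample_arc(10_000)                 # array of shape (10000, 2)
```

Because $$E[X_1] = 0.25\,E[X_2^2] = 1$$, the first coordinate of a large sample should average close to 1.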
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
To evaluate variance homogeneity among groups for clustered data, Iachine et al. (Robust tests for the equality of variances for clustered data. J Stat Comput Simul 2010;80(4):365–377) introduced an extension of the well-known Levene test. However, this method does not account for informative cluster size (ICS) or informative within-cluster group size (IWCGS), which can occur in clustered data when cluster and group sizes are random variables. This article introduces two tests of variance homogeneity that are appropriate for data with ICS and IWCGS, one extending the Levene-style transformation method and one based on a direct comparison of estimates of variance. We demonstrate the properties of our tests in a detailed simulation study and show that they are resistant to the potentially biasing effects of ICS and IWCGS. We illustrate the use of these tests by applying them to a data set of x-ray diffraction measurements collected from a specimen of duplex steel.
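For reference, the classic Levene-style test that these methods extend is available in SciPy. The sketch below uses synthetic, non-clustered data and shows only that baseline; it does not implement the ICS/IWCGS corrections described in the article:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)
g1 = rng.normal(0.0, 1.0, size=200)   # group 1: variance 1
g2 = rng.normal(0.0, 2.0, size=200)   # group 2: variance 4

# Levene's test on absolute deviations from the group median
# (the Brown-Forsythe variant, robust to non-normality)
stat, p = levene(g1, g2, center="median")
```

With a fourfold difference in variance and 200 observations per group, the test rejects homogeneity decisively.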
License: CC0 1.0 Universal, https://spdx.org/licenses/CC0-1.0.html
Public goods games are models of social dilemmas where cooperators pay a cost for the production of a public good while defectors free ride on the contributions of cooperators. In the traditional framework of evolutionary game theory, the payoffs of cooperators and defectors result from interactions in groups formed by binomial sampling from an infinite population. Despite empirical evidence showing that group-size distributions in nature are highly heterogeneous, most models of social evolution assume that the group size is constant. In this paper, I remove this assumption and explore the effects of having random group sizes on the evolutionary dynamics of public goods games. By a straightforward application of Jensen's inequality, I show that the outcome of general nonlinear public goods games depends not only on the average group size but also on the variance of the group-size distribution. This general result is illustrated with two nonlinear public goods games (the public goods game with discounted or synergistic benefits and the volunteer's dilemma) and three different group-size distributions (Poisson, geometric, and Waring). The results suggest that failing to acknowledge the natural variation of group sizes can lead to an underestimation of the actual level of cooperation exhibited in evolving populations.
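The Jensen's-inequality argument is easy to check numerically. A sketch, assuming a discounted benefit $$b(k)=(1-w^k)/(1-w)$$, which is concave in group size $$k$$ (the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def benefit(k, w=0.9):
    # Discounted public-good benefit, concave in group size k
    return (1 - w**k) / (1 - w)

mean_size = 5
sizes = rng.poisson(mean_size, size=100_000)   # random group sizes

fixed = benefit(mean_size)             # benefit at the average group size
random_mean = benefit(sizes).mean()    # average benefit under random sizes

# Jensen's inequality for a concave function: E[b(N)] <= b(E[N]), so
# ignoring group-size variance overstates the expected benefit
```

For a Poisson-distributed size the gap is exact in closed form, since $$E[w^N]=e^{-\lambda(1-w)}$$.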
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Theories demand much of data, often more than a single data collection can provide. For example, many important research questions are set in the past and must rely on data collected at that time and for other purposes. As a result, we often find that the data lack crucial variables. Another common problem arises when we wish to estimate the relationship between variables that are measured in different data sets. A variation of this occurs with a split-half sample design in which one or more important variables appear on the "wrong" half. Finally, we may need panel data but have only cross sections available. In each of these cases our ability to estimate the theoretically determined equation is limited by the data that are available. In many cases there is simply no solution, and theory must await new opportunities for testing. Under certain circumstances, however, we may still be able to estimate relationships between variables even though they are not measured on the same set of observations. This technique, which I call two-stage auxiliary instrumental variables (2SAIV), provides some new leverage on such problems and offers the opportunity to test hypotheses that were previously out of reach. This article develops the 2SAIV estimator, proves its consistency and derives its asymptotic variance. A set of simulations illustrates the performance of the estimator in finite samples, and several applications are sketched out.
The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently, datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method. Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking, the level of difficulty of a dataset depends on the algorithm; these levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets project is also supported by the Standard Reference Data Program.
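The kind of challenge posed by generated datasets can be sketched with a Wampler-style design: a degree-5 polynomial whose coefficients are all exactly 1. This is an illustrative reconstruction, not one of the certified NIST files; it shows why an orthogonalisation-based solver passes while the normal equations lose accuracy:

```python
import numpy as np

# Wampler-style generated dataset: y = 1 + x + x^2 + ... + x^5 exactly,
# so the reference coefficients are all equal to 1
x = np.arange(21, dtype=float)
V = np.vander(x, 6, increasing=True)   # ill-conditioned monomial design
y = V.sum(axis=1)

cond = np.linalg.cond(V)               # large: this is what stresses software

# SVD-based least squares: recovers the coefficients accurately
beta, *_ = np.linalg.lstsq(V, y, rcond=None)
err_lstsq = np.abs(beta - 1.0).max()

# Solving the normal equations squares the condition number and loses accuracy
beta_ne = np.linalg.solve(V.T @ V, V.T @ y)
err_ne = np.abs(beta_ne - 1.0).max()
```

The design choice here mirrors the collection's point: the same dataset is "easy" or "hard" depending on the algorithm used.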
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
See the file 'Plaid_Data-description.docx' for a detailed description of the dataset. Plaid designs are characterised by having one set of treatments applied to rows and another set of treatments applied to columns. In a 2003 publication, Farewell and Herzberg presented an analysis of variance structure for such designs. They presented an example of a study in which medical practitioners, trained in different ways, evaluated a series of videos of patients obtained under a variety of conditions. However, their analysis did not take full account of all error terms. In this paper, a more comprehensive analysis of this study is presented, informed by the recognition that the study can also be regarded as a two-phase design. The development of random effects models is outlined and the potential importance of block-treatment interactions is highlighted. The use of a variety of techniques is shown to lead to a better understanding of the study. Examination of the variance components involved in the expected mean squares is demonstrated to have particular value in identifying appropriate error terms for F-tests derived from an analysis of variance table. A package such as ASReml can also be used provided an appropriate error structure is specified. The methods presented can be applied to the design and analysis of other complex studies in which participants supply multiple measurements under a variety of conditions.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The analysis of variance used "measurement" (either all 20 variables, or the reduced set of 15 variables obtained by removing variables 11,…,15) as the within-subjects factor and "group" as the between-subjects factor.
New dataset replacing https://citydata.mesaaz.gov/Information-Technology/Information-Technology-Project-Schedule-and-Budget/spka-r4fd.
This data set lists projects currently in process and managed by the Department of Information Technology. Projects are flagged when they are in an implementation phase, which makes their schedule and budget adherence applicable. Budget status (within, at, or under budget to date) is listed as determined by the project manager. Schedule status is determined from the project start date, the project manager's original go-live estimate, the current go-live estimate, and/or the actual go-live date.
Presented are point estimates of the components-of-variance analysis on the full data set for all 10 selected outcome measures. For the random factors in the LMM, confidence intervals (CI95) of the point estimates are also presented in square brackets. Please note: confidence intervals were limited to the maximum possible range (i.e., [0,1]). LMM, linear mixed model. (XLSX)
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The data set was produced by the MetaVision team using original content.
Several segmentation techniques were used to generate accurate masks from the original video streams, preserving natural variance and original shapes.
This set is just a "building block". Feel free to augment it or combine it with other sets.
Image (accurate drone shapes/segmentation): https://i.postimg.cc/wvXkn680/04-drones.jpg
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Abstract
1. Age-specific variances and covariances in reproductive success shape the total variance in lifetime reproductive success (LRS), age-specific opportunities for selection, and population demographic variance and effective size. Age-specific (co)variances in reproductive success achieved through different reproductive routes must therefore be quantified to predict population, phenotypic and evolutionary dynamics in age-structured populations.
2. While numerous studies have quantified age-specific variation in mean reproductive success, age-specific variances and covariances in reproductive success, and the contributions of different reproductive routes to these (co)variances, have not been comprehensively quantified in natural populations.
3. We applied ‘additive’ and ‘independent’ methods of variance decomposition to complete data describing apparent (social) and realised (genetic) age-specific reproductive success across 11 cohorts of socially monogamous but genetically polygynandrous song sparrows (Melospiza melodia). We thereby quantified age-specific (co)variances in male within-pair and extra-pair reproductive success (WPRS and EPRS) and the contributions of these (co)variances to the total variances in age-specific reproductive success and LRS.
4. ‘Additive’ decomposition showed that within-age and among-age (co)variances in WPRS across males aged 2–4 years contributed most to the total variance in LRS. Age-specific (co)variances in EPRS contributed relatively little. However, extra-pair reproduction altered age-specific variances in reproductive success relative to the social mating system, and hence altered the relative contributions of age-specific reproductive success to the total variance in LRS.
5. ‘Independent’ decomposition showed that the (co)variances in age-specific WPRS, EPRS and total reproductive success, and the resulting opportunities for selection, varied substantially across males that survived to each age. Furthermore, extra-pair reproduction increased the variance in age-specific reproductive success relative to the social mating system to a degree that increased across successive age classes.
6. This comprehensive decomposition of the total variances in age-specific reproductive success and LRS into age-specific (co)variances attributable to two reproductive routes showed that within-age and among-age covariances contributed substantially to the total variance and that extra-pair reproduction can alter the (co)variance structure of age-specific reproductive success. Such covariances and impacts should consequently be integrated into theoretical assessments of demographic and evolutionary processes in age-structured populations.
Usage notes: Lebigre_et_al_JAE_Data_AdditiveMethod, Lebigre_et_al_JAE_Data_IndependentMethod
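The 'additive' identity underlying such a decomposition, $$\mathrm{Var}(W+E)=\mathrm{Var}(W)+\mathrm{Var}(E)+2\,\mathrm{Cov}(W,E)$$, can be verified on synthetic counts. A sketch with made-up offspring counts, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-male offspring counts (illustrative only):
# within-pair (w) and extra-pair (e) reproductive success
w = rng.poisson(3.0, size=500)
e = rng.poisson(1.0, size=500) + (w > 3).astype(int)   # induces Cov(w, e) > 0
t = w + e                                              # total reproductive success

# Additive decomposition: Var(T) = Var(W) + Var(E) + 2 Cov(W, E)
var_t = t.var()                                        # population variance (ddof=0)
decomposed = w.var() + e.var() + 2.0 * np.cov(w, e, bias=True)[0, 1]
```

The identity holds exactly (up to floating-point rounding), which is what makes the covariance terms' contributions directly comparable to the variance terms'.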
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This data set contains the daily variance of flow rate and various chemical/physical properties of water in surface runoff and subsurface lateral flow, plus soil moisture, soil temperature and precipitation, for the fifteen catchments on the North Wyke Farm Platform (NWFP), for the period Jan 2012 – Dec 2023. The data set was calculated from 15-minute time step values that are available from the NWFP data portal. Prior to calculation of the daily variance, each variable was first screened for potentially poor quality according to the flag assigned to each time step value during the quality control (QC) process. Where data did not carry a QC flag of ‘Good’, ‘Acceptable’ or ‘Outlier’, values were set to missing (NA).
Other data sets that complement this one are ‘NWFP flow rate and chemical/physical parameters 2012-2023 Variances', the daily variance derived from non-adjusted data (https://doi.org/10.23637/ry81ao5d), and ‘NWFP flow rate and chemical/physical parameters 2012-2023 Variances QC and missing values adjusted', the daily variance of data adjusted according to both the QC flag and an acceptable threshold for the number of daily missing values (https://doi.org/10.23637/dfzekfy1).
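A minimal sketch of the screening-then-aggregation step described above, using synthetic 15-minute values and QC flags (the flag probabilities and flow values are illustrative, not NWFP data):

```python
import numpy as np

rng = np.random.default_rng(4)
n_days, per_day = 3, 96            # 96 fifteen-minute steps per day

# Hypothetical 15-minute flow-rate series and QC flags
flow = rng.gamma(2.0, 1.0, size=n_days * per_day)
qc = rng.choice(["Good", "Acceptable", "Outlier", "Bad", "Missing"],
                size=flow.size, p=[0.8, 0.1, 0.04, 0.03, 0.03])

# Values not flagged 'Good', 'Acceptable' or 'Outlier' become missing (NaN)
flow = np.where(np.isin(qc, ["Good", "Acceptable", "Outlier"]), flow, np.nan)

# Daily variance over each day's 96 time steps, ignoring missing values
daily_var = np.nanvar(flow.reshape(n_days, per_day), axis=1)
```

The real data set additionally applies a threshold on the number of missing values per day, which this sketch omits.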
In socially monogamous species, male reproductive success consists of ‘within-pair’ offspring produced with their socially-paired mate(s), and ‘extra-pair’ offspring produced with additional females throughout the population. Both reproductive pathways offer distinct opportunities for selection in wild populations, as each is composed of separate components of mate attraction, female fecundity, and paternity allocation. Identifying key sources of variance and covariance among these components is a crucial step towards understanding the reproductive strategies that males use to maximize fitness both annually and over their lifetimes. We use 16 years of complete reproductive data from a population of black-throated blue warblers (Setophaga caerulescens) to partition variance in male annual and lifetime reproductive success, and thereby identify if the opportunity for selection varies over the lifetimes of individual males and what reproductive strategies likely favor maximum lifetime fitn...
IMP-8 PLS propagated solar wind data, linearly interpolated to measurements on the minute (60 s resolution), in GSE coordinates. This data set consists of propagated solar wind data that has first been propagated to a position just outside of the nominal bow shock (about (17, 0, 0) Re) and then linearly interpolated to 1 min resolution using the interp1.m function in MATLAB. The input data for this data set is a 1 min resolution processed solar wind data set constructed by Dr. J.M. Weygand. The method of propagation is similar to the minimum variance technique and is outlined in Weimer et al. [2003; 2004]. The basic method is to find the minimum variance direction of the magnetic field in the plane orthogonal to the mean magnetic field direction. This minimum variance direction is then dotted with the difference between the final position vector and the original position vector, and that quantity is divided by the minimum variance direction dotted with the solar wind velocity vector, which gives the propagation time. This method does not work well for shocks or for minimum variance directions tilted more than 70 degrees from the sun-earth line. This data set was originally constructed by Dr. J.M. Weygand for Prof. R.L. McPherron, the principal investigator of two National Science Foundation studies: GEM Grant ATM 02-1798 and Space Weather Grant ATM 02-08501. These data were primarily used in superposed epoch studies.
References: Weimer, D. R. (2004), Correction to "Predicting interplanetary magnetic field (IMF) propagation delay times using the minimum variance technique," J. Geophys. Res., 109, A12104, doi:10.1029/2004JA010691. Weimer, D.R., D.M. Ober, N.C. Maynard, M.R. Collier, D.J. McComas, N.F. Ness, C.W. Smith, and J. Watermann (2003), Predicting interplanetary magnetic field (IMF) propagation delay times using the minimum variance technique, J. Geophys. Res., 108, 1026, doi:10.1029/2002JA009405.
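The delay computation described above (the minimum-variance direction dotted with the displacement, divided by the same direction dotted with the solar wind velocity) can be sketched as follows; all vector values here are made-up illustrative inputs, not taken from the data set:

```python
import numpy as np

re_km = 6371.0                              # Earth radius in km

# Illustrative inputs (GSE coordinates)
n = np.array([0.9, 0.3, 0.1])               # minimum-variance direction of B
n = n / np.linalg.norm(n)                   # normalise to a unit vector
r0 = np.array([220.0, 10.0, -5.0])          # spacecraft position, in Re
r_target = np.array([17.0, 0.0, 0.0])       # just outside the nominal bow shock
v_sw = np.array([-400.0, 0.0, 0.0])         # solar wind velocity, km/s

# Propagation delay: n . (r_target - r0) / (n . v_sw), in seconds
dt = (n @ ((r_target - r0) * re_km)) / (n @ v_sw)
```

For typical upstream positions and solar wind speeds this yields delays on the order of an hour, consistent with propagation from a monitor near L1 to the bow shock.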
Replication code for "Bias and Variance in Multiparty Election Polls" includes code and data for scraping and cleaning pre-election polls conducted for German national (Bundestag) and regional (Landtag) elections from 1994 to 2021. The data set allows for the analysis of polling errors and the visualization of the results.
This data set provides an estimate of the vapor pressure deficit. The Spatial Statistical Data Fusion (SSDF) surface continental United States (CONUS) products fuse data from the Atmospheric InfraRed Sounder (AIRS) instrument on the EOS-Aqua spacecraft with data from the Cross-track Infrared and Microwave Sounding Suite (CrIMSS) instruments on the Suomi-NPP spacecraft. The CrIMSS instrument suite consists of the Cross-track Infrared Sounder (CrIS) infrared sounder and the Advanced Technology Microwave Sounder (ATMS) microwave sounder. These are all daily products on a ¼ x ¼ degree latitude/longitude grid covering the continental United States (CONUS). The SSDF algorithm infers a value for each grid point based on nearby and distant values of the input Level-2 datasets and estimates of the variance of those values, with lower variances given higher weight. Performing the data fusion of two (or more) remote sensing datasets that estimate the same physical state involves four major steps: (1) filtering input data; (2) matching the remote sensing datasets to an in situ dataset, taken as a truth estimate; (3) using these matchups to characterize the input datasets via estimation of their bias and variance relative to the truth estimate; (4) performing the spatial statistical data fusion. We note that SSDF can also be performed on a single remote sensing input dataset. The SSDF algorithm only ingests the bias-corrected estimates, their latitudes and longitudes, and their estimated variances; the algorithm is agnostic as to which dataset or datasets those estimates, latitudes, longitudes, and variances originated from.
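The core weighting idea (estimates with lower variance receive higher weight) reduces, for a single grid point, to inverse-variance weighting. A simplified sketch with illustrative numbers; the full SSDF algorithm is spatial, and this shows only the single-point principle:

```python
import numpy as np

# Two bias-corrected estimates of the same grid point and their variances
est = np.array([2.1, 1.8])    # e.g. AIRS-derived and CrIMSS-derived values
var = np.array([0.4, 0.1])    # their estimated error variances

w = (1.0 / var) / np.sum(1.0 / var)   # lower variance -> higher weight
fused = np.sum(w * est)               # fused estimate
fused_var = 1.0 / np.sum(1.0 / var)   # smaller than either input variance
```

Here the second, lower-variance estimate gets weight 0.8, pulling the fused value toward it, and the fused variance is smaller than either input's.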
Geotail Weimer-propagated solar wind data (CPI), linearly interpolated to measurements on the minute (60 s resolution), in GSE coordinates. This data set consists of propagated solar wind data that has first been propagated to a position just outside of the nominal bow shock (about (17, 0, 0) Re) and then linearly interpolated to 1 min resolution using the interp1.m function in MATLAB. The input data for this data set is a 1 min resolution processed solar wind data set constructed by Dr. J.M. Weygand. The method of propagation is similar to the minimum variance technique and is outlined in Weimer et al. [2003; 2004]. The basic method is to find the minimum variance direction of the magnetic field in the plane orthogonal to the mean magnetic field direction. This minimum variance direction is then dotted with the difference between the final position vector and the original position vector, and that quantity is divided by the minimum variance direction dotted with the solar wind velocity vector, which gives the propagation time. This method does not work well for shocks or for minimum variance directions tilted more than 70 degrees from the sun-earth line. This data set was originally constructed by Dr. J.M. Weygand for Prof. R.L. McPherron, the principal investigator of two National Science Foundation studies: GEM Grant ATM 02-1798 and Space Weather Grant ATM 02-08501. These data were primarily used in superposed epoch studies.
References: Weimer, D. R. (2004), Correction to "Predicting interplanetary magnetic field (IMF) propagation delay times using the minimum variance technique," J. Geophys. Res., 109, A12104, doi:10.1029/2004JA010691. Weimer, D.R., D.M. Ober, N.C. Maynard, M.R. Collier, D.J. McComas, N.F. Ness, C.W. Smith, and J. Watermann (2003), Predicting interplanetary magnetic field (IMF) propagation delay times using the minimum variance technique, J. Geophys. Res., 108, 1026, doi:10.1029/2002JA009405.
ACE Weimer-propagated solar wind data (SWEPAM), linearly interpolated to measurements on the minute (60 s resolution), in GSE coordinates. This data set consists of propagated solar wind data that has first been propagated to a position just outside of the nominal bow shock (about (17, 0, 0) Re) and then linearly interpolated to 1 min resolution using the interp1.m function in MATLAB. The input data for this data set is a 1 min resolution processed solar wind data set constructed by Dr. J.M. Weygand. The method of propagation is similar to the minimum variance technique and is outlined in Weimer et al. [2003; 2004]. The basic method is to find the minimum variance direction of the magnetic field in the plane orthogonal to the mean magnetic field direction. This minimum variance direction is then dotted with the difference between the final position vector and the original position vector, and that quantity is divided by the minimum variance direction dotted with the solar wind velocity vector, which gives the propagation time. This method does not work well for shocks or for minimum variance directions tilted more than 70 degrees from the sun-earth line. This data set was originally constructed by Dr. J.M. Weygand for Prof. R.L. McPherron, the principal investigator of two National Science Foundation studies: GEM Grant ATM 02-1798 and Space Weather Grant ATM 02-08501. These data were primarily used in superposed epoch studies.
References: Weimer, D. R. (2004), Correction to "Predicting interplanetary magnetic field (IMF) propagation delay times using the minimum variance technique," J. Geophys. Res., 109, A12104, doi:10.1029/2004JA010691. Weimer, D.R., D.M. Ober, N.C. Maynard, M.R. Collier, D.J. McComas, N.F. Ness, C.W. Smith, and J. Watermann (2003), Predicting interplanetary magnetic field (IMF) propagation delay times using the minimum variance technique, J. Geophys. Res., 108, 1026, doi:10.1029/2002JA009405.