This dataset was created by Johar M. Ashfaque
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bivariate correlations for students of variables n ≥ 250.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT Multiple longitudinal outcomes are common in public health research, and adequate methods are required when there is interest in the joint evolution of response variables over time. However, the main drawbacks of joint modeling procedures are the requirement to specify the joint density of all outcomes and their correlation structure, and the numerical difficulties in statistical inference that arise as the dimension of these outcomes increases. To overcome such difficulties, we present two procedures to deal with multivariate longitudinal data. We first present a univariate approach, in which linear mixed-effects models are considered for each response variable separately. Then, a novel copula-based model is presented in order to characterize relationships among the response variables. Both methodologies are applied to a real Brazilian data set on child growth.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background: Clean water is an essential part of healthy human life and wellbeing. More recently, rapid population growth, high illiteracy rates, lack of sustainable development, and climate change have posed a global challenge in developing countries. The discontinuity of drinking water supply forces households either to use unsafe water storage materials or to use water from unsafe sources. The present study aimed to identify the determinants of water source types, use, quality of water, and sanitation perception of physical parameters among urban households in North-West Ethiopia.
Methods: A community-based cross-sectional study was conducted among households from February to March 2019. A pretested, structured, interview-based questionnaire was used to collect the data. Households were sampled randomly, in proportion to the number of households in each kebele. MS Excel and R version 3.6.2 were used to enter and analyze the data, respectively. Descriptive statistics using frequencies and percentages were used to explain the sample data concerning the predictor variables. Both bivariate and multivariate logistic regressions were used to assess the association between independent and response variables.
Results: Four hundred eighteen (418) households participated. Of these, 78.95% used improved and 21.05% used unimproved drinking water sources. Household drinking water sources were significantly associated with the age of the participant (χ² = 20.392, df = 3), educational status (χ² = 19.358, df = 4), source of income (χ² = 21.777, df = 3), monthly income (χ² = 13.322, df = 3), availability of additional facilities (χ² = 98.144, df = 7), cleanness status (χ² = 42.979, df = 4), scarcity of water (χ² = 5.1388, df = 1), and family size (χ² = 9.934, df = 2). The logistic regression analysis also indicated that these factors significantly determine the water source types used by the households. Factors such as availability of a toilet facility, household member type, and sex of the head of the household were not significantly associated with drinking water sources.
Conclusion: The use of drinking water from improved sources was determined by different demographic, socio-economic, sanitation, and hygiene-related factors. Therefore, the local, regional, and national governments and other supporting organizations should improve the accessibility and adequacy of drinking water from improved sources in the area.
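The χ² values reported in the Results compare observed cell counts with the counts expected if water source type and a given factor were independent. The following is a minimal Python sketch of that statistic, not the authors' R analysis. The function name and the 2×2 table are hypothetical; only the row totals (330 and 88 households) follow from the reported 78.95% and 21.05% of 418.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical cross-tabulation (columns: water scarcity yes / no).
# Cell values are invented for illustration only.
hypothetical_table = [
    [150, 180],  # improved source (row total 330)
    [55, 33],    # unimproved source (row total 88)
]

stat, dof = pearson_chi_square(hypothetical_table)
print(f"chi-square = {stat:.3f}, df = {dof}")
```

A p-value would then be read from the χ² distribution with the returned degrees of freedom, for example with a statistics library.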
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset for: Leipold, B. & Loepthien, T. (2021). Attentive and emotional listening to music: The role of positive and negative affect. Jahrbuch Musikpsychologie, 30. https://doi.org/10.5964/jbdgm.78 In a cross-sectional study, associations of global affect with two ways of listening to music, attentive–analytical listening (AL) and emotional listening (EL), were examined. More specifically, the degrees to which AL and EL are differentially correlated with positive and negative affect were examined. In Study 1, a sample of 1,291 individuals responded to questionnaires on listening to music, positive affect (PA), and negative affect (NA). We used the PANAS, which measures PA and NA as high arousal dimensions. AL was positively correlated with PA, EL with NA. Moderation analyses showed stronger associations between PA and AL when NA was low. Study 2 (499 participants) differentiated between three facets of affect and focused, in addition to PA and NA, on the role of relaxation. Similar to the findings of Study 1, AL was correlated with PA, EL with NA and PA. Moderation analyses indicated that the degree to which PA is associated with an individual's tendency to listen to music attentively depends on their degree of relaxation. In addition, the correlation between pleasant activation and EL was stronger for individuals who were more relaxed; for individuals who were less relaxed the correlation between unpleasant activation and EL was stronger. In sum, the results demonstrate not only simple bivariate correlations, but also that the expected associations vary, depending on the different affective states. We argue that the results reflect a dual function of listening to music, which includes emotional regulation and information processing. Dataset: Study 1
This paper develops parameter estimators for the Morgenstern type bivariate distribution by using a bivariate ranked set sampling procedure as an alternative method to simple random sampling. The proposed procedure gives an opportunity to estimate all of the distribution's parameters simultaneously, which has not yet been investigated in previous studies. In the last part of this paper, simulation studies show properties of the new estimators and compare them with some other existing estimators.
Objective(s): The 2024 Pediatric Sepsis Data Challenge provides an opportunity to address the lack of appropriate mortality prediction models for LMICs. For this challenge, we are asking participants to develop a working, open-source algorithm to predict in-hospital mortality and length of stay using only the provided synthetic dataset. The original data used to generate the real-world data (RWD) informed synthetic training set available to participants was obtained from a prospective, multisite, observational cohort study of children with suspected sepsis aged 6 months to 60 months at the time of admission to hospitals in Uganda. For this challenge, we have created a RWD-informed synthetically generated training data set to reduce the risk of re-identification in this highly vulnerable population. The synthetic training set was generated from a random subset of the original data (full dataset A) of 2686 records (70% of the total dataset - training dataset B). All challenge solutions will be evaluated against the remaining 1235 records (30% of the total dataset - test dataset C). Data Description: Report describing the comparison of univariate and bivariate distributions between the Synthetic Dataset and Test Dataset C. Additionally, a report showing the maximum mean discrepancy (MMD) and Kullback–Leibler (KL) divergence statistics. Data dictionary for the synthetic training dataset containing 148 variables. NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days. Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator at sepsiscolab@bcchr.ca or visit our website.
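The report mentioned above compares the synthetic and test data using, among other statistics, the Kullback–Leibler (KL) divergence. As a rough illustration of what that statistic measures for a single categorical variable, here is a minimal Python sketch. The function name and the example proportions are hypothetical, and this is not the challenge organizers' code.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same categories."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of categories")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical category proportions for one variable in two data sets.
synthetic_proportions = [0.50, 0.30, 0.20]
test_proportions = [0.45, 0.35, 0.20]

print(kl_divergence(synthetic_proportions, test_proportions))
```

A value near zero indicates that the two distributions are close; the statistic is not symmetric, so KL(P || Q) and KL(Q || P) generally differ.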
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The correlation coefficient is a commonly used criterion to measure the strength of a linear relationship between the two quantitative variables. For a bivariate normal distribution, numerous procedures have been proposed for testing a precise null hypothesis of the correlation coefficient, whereas the construction of flexible procedures for testing a set of (multiple) precise and/or interval hypotheses has received less attention. This paper fills the gap by proposing an objective Bayesian testing procedure using the divergence-based priors. The proposed Bayes factors can be used for testing any combination of precise and interval hypotheses and also allow a researcher to quantify evidence in the data in favor of the null or any other hypothesis under consideration. An extensive simulation study is conducted to compare the performances between the proposed Bayesian methods and some existing ones in the literature. Finally, a real-data example is provided for illustrative purposes.
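For reference, the quantity being tested is usually estimated by the sample Pearson correlation coefficient. The sketch below computes only that estimate in Python; it does not implement the Bayesian testing procedure described above, and the function name and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired measurements, for illustration only.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson_r(x_values, y_values))
```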
This paper investigates estimating the association parameter of the Morgenstern type bivariate distribution using a modified maximum likelihood method in cases where the regular maximum likelihood methods fail to achieve estimation. Simple random sampling, concomitants of order statistics, and bivariate ranked set sampling methods are used and compared. Efficiency and bias of the resulting estimators are compared for two specific examples, the Morgenstern type bivariate uniform and exponential distributions.
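For readers unfamiliar with the family, the Morgenstern (Farlie–Gumbel–Morgenstern) type distribution has joint CDF F(x, y) = F_X(x) F_Y(y) [1 + α (1 − F_X(x)) (1 − F_Y(y))], with association parameter −1 ≤ α ≤ 1. With uniform marginals on [0, 1] this reduces to the form in the Python sketch below. The function names are hypothetical, and the sketch only evaluates the distribution; it is not the estimation method of the paper.

```python
def _check(u, v, alpha):
    """Validate arguments for the bivariate uniform Morgenstern distribution."""
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("u and v must lie in [0, 1]")
    if not (-1.0 <= alpha <= 1.0):
        raise ValueError("alpha must lie in [-1, 1]")


def morgenstern_uniform_cdf(u, v, alpha):
    """Joint CDF with uniform marginals: u*v*(1 + alpha*(1-u)*(1-v))."""
    _check(u, v, alpha)
    return u * v * (1.0 + alpha * (1.0 - u) * (1.0 - v))


def morgenstern_uniform_density(u, v, alpha):
    """Joint density with uniform marginals: 1 + alpha*(1-2u)*(1-2v)."""
    _check(u, v, alpha)
    return 1.0 + alpha * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


print(morgenstern_uniform_cdf(0.5, 0.5, 1.0))  # 0.25 * (1 + 0.25) = 0.3125
```

Setting α = 0 gives independence, and the marginal of either variable is uniform regardless of α.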
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Retaining current customers is very important because acquiring new customers is far more expensive. Churn is calculated to understand the rate at which customers are leaving. The dataset records customer churn, measured as the number of customers who leave the company during a given period. The target variable in the dataset is 'Churn'. There may be many reasons for customer churn, such as bad onboarding, poor customer service, and low engagement.
Target 1. Total charges 2. Monthly charges
Columns: CustomerID, Gender, Senior Citizen, Partner, Dependents, Tenure, Phone Service, Multiple Lines, Internet Service, Online Security, Online Backup, Device Protection, Tech Support, Streaming TV, Streaming Movies, Contract, Paperless Billing, Payment Method, Monthly Charges, Total Charges, Churn
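As a simple illustration of the churn measure described above, the following Python sketch computes the share of customers marked as churned. The 'Churn' column name comes from the dataset description; the "Yes"/"No" coding, the function name, and the four example records are assumptions made for illustration.

```python
def churn_rate(records, target="Churn", positive="Yes"):
    """Fraction of records whose target column equals the positive label."""
    if not records:
        raise ValueError("records must not be empty")
    churned = sum(1 for record in records if record[target] == positive)
    return churned / len(records)


# Hypothetical rows; real data would be read from the dataset file.
hypothetical_records = [
    {"CustomerID": "A1", "Tenure": 2, "Churn": "Yes"},
    {"CustomerID": "A2", "Tenure": 34, "Churn": "No"},
    {"CustomerID": "A3", "Tenure": 12, "Churn": "No"},
    {"CustomerID": "A4", "Tenure": 45, "Churn": "No"},
]

rate = churn_rate(hypothetical_records)
print(f"Churn rate: {rate:.0%}")
```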
Acknowledgment
The dataset was provided by Squark
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The paper considers some of the issues emerging from the discrete wavelet analysis of popular bivariate spectral quantities such as the coherence and phase spectra and the frequency-dependent time delay. The approach utilised here is based on the maximal overlap discrete Hilbert wavelet transform (MODHWT). Firstly, via a broad set of simulation experiments, we examine the small and large sample properties of two wavelet estimators of the scale-dependent time delay. The estimators are the wavelet cross-correlator and the wavelet phase angle-based estimator. Our results provide some practical guidelines for the empirical examination of short- and medium-term lead-lag relations for octave frequency bands. Further, we point out a deficiency in the implementation of the MODHWT and suggest using a modified implementation scheme, which was proposed earlier in the context of the dual-tree complex wavelet transform. In addition, we show how MODHWT-based wavelet quantities can serve to approximate the Fourier bivariate spectra and discuss issues connected with building confidence intervals for them. The discrete wavelet analysis of coherence and phase angle is illustrated with a scale-dependent examination of business cycle synchronisation between 11 euro zone countries. The study is supplemented by a wavelet analysis of the variance and covariance of the euro zone business cycles. The empirical examination underlines the good localisation properties and high computational efficiency of the wavelet transformations applied and provides new arguments in favour of the endogeneity hypothesis of the optimum currency area criteria as well as the wavelet evidence on dating the Great Moderation in the euro zone.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Expected bivariate distribution of the number of times bacon and eggs were purchased on four consecutive shopping trips (see [23, 28]).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the context of multivariate multilevel data analysis, this paper focuses on the multivariate linear mixed-effects model, including all the correlations between the random effects when the dimensional residual terms are assumed uncorrelated. Using the EM algorithm, we suggest more general expressions for the estimators of the model's parameters. These estimators can be used in the framework of multivariate longitudinal data analysis as well as in the more general context of the analysis of multivariate multilevel data. By using a likelihood ratio test, we test the significance of the correlations between the random effects of two dependent variables of the model, in order to investigate whether or not it is useful to model these dependent variables jointly. Simulation studies are done to assess both the parameter recovery performance of the EM estimators and the power of the test. Using two empirical data sets, which are of longitudinal multivariate type and multivariate multilevel type, respectively, the usefulness of the test is illustrated.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Discrete character-taxon matrices are increasingly being used in an attempt to understand the pattern and tempo of morphological evolution; however, methodological sophistication and bespoke software implementations have lagged behind. In the present study, an attempt is made to provide a state-of-the-art description of methodologies and introduce a new R package (Claddis) for performing foundational disparity (morphologic diversity) and rate calculations. Simulations using its core functions show that: (1) of the two most commonly used distance metrics (Generalized Euclidean Distance and Gower's Coefficient), the latter tends to carry forward more of the true signal; (2) a novel distance metric may improve signal retention further; (3) this signal retention may come at the cost of pruning incomplete taxa from the data set; and (4) the utility of bivariate plots of ordination spaces is undermined by their frequently extremely low variances. By contrast, challenges to estimating morphologic tempo are presented qualitatively, such as how trees are time-scaled and changes are counted. Both disparity and rates deserve better time series approaches that could unlock new macroevolutionary analyses. However, these challenges need not be fatal, and several potential future solutions and directions are suggested.
Usage Notes: Matrix used for the tutorial: tutorial_matrix.nex. Ages file for the tutorial data set: tutorial_ages.txt. R code for the tutorial: tutorial_code.r. R code used for the simulations: simulation_code.r.
This study was conducted to address the dropping rates in residential placements of adjudicated youth after the 1990s. Policymakers, advocates, and researchers began to attribute the decline to reform measures and proposed that this was the cause of the drop seen in historic national crime. In response, researchers set out to use state-level data on economic factors, crime rates, political ideology scores, and youth justice policies and practices to test the association between the youth justice policy environment and recent reductions in out-of-home placements for adjudicated youth. This data collection contains two files, one for multivariate and one for bivariate analyses. In the multivariate file, the aim was to assess the impact of the progressive policy characteristics on the dependent variable, youth confinement. In the bivariate analyses file (Wave 1-Wave 10), the aim was to assess the states as divided into 2 groups across all 16 dichotomized variables that comprised the progressive policy scale: those with more progressive youth justice environments and those with less progressive or punitive environments. Some examples of these dichotomized variables include purpose clause, courtroom shackling, and competency standard.
Pooling individual samples prior to DNA extraction can mitigate the cost of DNA extraction and genotyping; however, these methods need to accurately generate equal representation of individuals within pools. This data set was generated to determine accuracy of pool construction based on white blood cell counts compared to two common DNA quantification methods. Fifty individual bovine blood samples were collected, and then pooled with all individuals represented in each pool. Pools were constructed with the target of equal representation of each individual animal based on number of white blood cells, spectrophotometric readings, spectrofluorometric readings and whole blood volume with 9 pools per method and a total of 36 pools. Pools and individual samples that comprised the pools were genotyped using a commercially available genotyping array. ASReml was used to estimate variance components for individual animal contribution to pools. The correlation between animal contributions between two pools was estimated using bivariate analysis with starting values set to the result of a univariate analysis. The dataset includes: 1) pooling allele frequencies (PAF) for all pools and individual animals computed from normalized intensities for red (X) and green (Y); PAF = X/(X+Y). 2) Genotypes or number of copies of B (green) allele (0,1,2). 3) Definitions for each sample.
Resources in this dataset:
Resource Title: Pooling Allele Frequencies (paf) for all pools and individual animals. File Name: pafAnimal.csv.gz. Resource Description: Pooling Allele Frequencies (paf) for all pools and individual animals computed from normalized intensities for red (X) and green (Y); paf = X / (X + Y).
Resource Title: Genotypes for individuals within pools. File Name: g.csv.gz. Resource Description: Genotypes (number of copies of the B (green) allele (0,1,2)) for individual bovine animals within pools.
Resource Title: Sample Definitions. File Name: XY Data Key.xlsx. Resource Description: Definitions for each sample (both pools and individual animals).
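The PAF formula stated above, PAF = X / (X + Y), can be sketched in Python as follows. The function name and the example intensities are hypothetical; the layout of pafAnimal.csv.gz is not described here, so no file parsing is attempted.

```python
def pooling_allele_frequency(x, y):
    """PAF = X / (X + Y) from normalized red (X) and green (Y) intensities."""
    total = x + y
    if total == 0:
        raise ValueError("X + Y must be non-zero")
    return x / total


# Hypothetical normalized intensities for one marker in one pool.
print(pooling_allele_frequency(0.25, 0.75))  # 0.25
```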
The Wages of War is part of the Correlates of War (COW) Project at the University of Michigan under the guidance of J. David Singer and Melvin Small. The data are meant to be a statistical handbook of war from 1816-1965. It includes mainly bivariate data on every war fought in the time period. War data described cover all international wars that began and ended between 1816 and 1965, and those that satisfied the theoretical criteria: political status of the belligerents and the severity of the armed conflict in terms of battle-connected fatalities. These criteria are applied to a chronological list of all deadly quarrels between 1816 and 1965. Note: It is believed these data contain serious errors and should not be used. However, since the replacement data for International Disputes was never provided to ICPSR by the principal investigator, these files may be the only source for parts of those data.
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR -- https://doi.org/10.3886/ICPSR09905.v1. We highly recommend using the ICPSR version, as they made this dataset available in multiple data formats and for additional years of data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The field of community ecology is evolving rapidly as researchers are able to tie functions of systems to variation in taxa. In inferring processes, functions, and causal taxa, common practice is to assume a ‘core’ community can be defined. The core refers to a group of taxa found across samples, and statistically, is the discretization or categorization of continuous data. Assuming thresholds in abundance exist, and that a core microbiome exists, has the potential to be misleading. Rather, the existence of a core set of taxa should be treated as a hypothesis with support from empirical observations. An additional challenge is that there is no standard set of criteria for core membership. Consequently, comparison across studies is often impossible. We considered four common methods for defining a core and applied them to 25 simulations that cover a range of plausible communities and two published microbial data sets. Next, we used hierarchical clustering and bivariate plots of mean taxon abundance and variance to evaluate each method. Assignment of taxa to the core varied substantially among methods. Across simulations and published data sets, hierarchical clustering of taxa based on their abundance and prevalence (variation) offered no support for a core set of taxa. The categorization of taxa into sets corresponding to a core community and other taxa has the potential to be misleading. Given that the concept of core communities received poor support from data, the concept is questionable and should not be used without testing its validity in any particular context.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 4: Results Linear Mixed-Effects Models with interaction terms of Threshold × Group × POD.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a diverse collection of pre-processed flow cytometry data assembled
to support the training and evaluation of machine learning (ML) models for the gating of
cell populations. The data was curated through a citizen science initiative embedded in
the EVE Online video game, known as Project Discovery. Participants contributed to
scientific research by gating bivariate plots generated from flow cytometry data, creating
a crowdsourced reference set. The original flow cytometry datasets were sourced from
publicly available COVID-19 and immunology-related studies on FlowRepository.org and
PubMed. Data were compensated, transformed, and split into bivariate plots for analysis.
This dataset includes: 1) CSV files containing two-channel marker combinations per plot, 2)
A SQL database capturing player-generated gating polygons in normalized coordinates, 3)
Scripts and containerized environments (Singularity and Docker) for reproducible
evaluation of gating accuracy and consensus scoring using the flowMagic pipeline, 4)
Code for filtering bot inputs, evaluating user submissions, calculating F1 scores, and
generating consensus gating regions. This data is especially valuable for training and
benchmarking models that aim to automate the labor-intensive gating process in
immunological and clinical cytometry applications.
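To illustrate the kind of F1 score mentioned above, the following Python sketch compares per-event gate membership from a reference gate with membership from a submitted gate. It is a generic F1 computation under assumed inputs (one boolean per event), not the flowMagic pipeline's code, and all names and data are hypothetical.

```python
def f1_score(reference, predicted):
    """F1 score for boolean in-gate labels, one entry per event."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for ref, pred in pairs if ref and pred)
    false_pos = sum(1 for ref, pred in pairs if not ref and pred)
    false_neg = sum(1 for ref, pred in pairs if ref and not pred)
    if true_pos == 0:
        return 0.0  # no overlap between reference and predicted gates
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical membership of five events in a reference and a submitted gate.
reference_labels = [True, True, False, False, True]
submitted_labels = [True, False, False, True, True]
print(f1_score(reference_labels, submitted_labels))
```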
This dataset was created by Johar M. Ashfaque