99 datasets found
  1. Bivariate Data Set with 3 Clusters

    • kaggle.com
    Updated Mar 30, 2020
    Cite
    Johar M. Ashfaque (2020). Bivariate Data Set with 3 Clusters [Dataset]. https://www.kaggle.com/ukveteran/bivariate-data-set-with-3-clusters/code
    Dataset updated
    Mar 30, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Johar M. Ashfaque
    Description

    Dataset

    This dataset was created by Johar M. Ashfaque

    Contents

  2. Bivariate correlations for students of variables n ≥ 250.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Christopher D. Lynn; Michaela E. Howells; Max J. Stein (2023). Bivariate correlations for students of variables n ≥ 250. [Dataset]. http://doi.org/10.1371/journal.pone.0203500.t009
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Christopher D. Lynn; Michaela E. Howells; Max J. Stein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bivariate correlations for students of variables n ≥ 250.

  3. Data from: Bivariate Copula-based Linear Mixed-effects Models: An...

    • scielo.figshare.com
    jpeg
    Updated May 30, 2023
    Cite
    P.H. FERREIRA; R.L. FIACCONE; J.S. LORDELO; S.O.L. SENA; V.R. DURAN (2023). Bivariate Copula-based Linear Mixed-effects Models: An Application to Longitudinal Child Growth Data [Dataset]. http://doi.org/10.6084/m9.figshare.8227328.v1
    Available download formats: jpeg
    Dataset updated
    May 30, 2023
    Dataset provided by
    SciELO journals
    Authors
    P.H. FERREIRA; R.L. FIACCONE; J.S. LORDELO; S.O.L. SENA; V.R. DURAN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT Multiple longitudinal outcomes are common in public health research and adequate methods are required when there is interest in the joint evolution of response variables over time. However, the main drawback of joint modeling procedures is the requirement to specify the joint density of all outcomes and their correlation structure, as well as numerical difficulties in statistical inference, when the dimension of these outcomes increases. To overcome such difficulty, we present two procedures to deal with multivariate longitudinal data. We first present a univariate approach, for which linear mixed-effects models are considered for each response variable separately. Then, a novel copula-based modeling is presented, in order to characterize relationships among the response variables. Both methodologies are applied to a real Brazilian data set on child growth.

  4. Survey Data of the socio-demographic, economic and water source types that...

    • zenodo.org
    • datadryad.org
    bin, csv
    Updated Jun 4, 2022
    Cite
    Shewayiref Geremew Gebremichael; Shewayiref Geremew Gebremichael (2022). Survey Data of the socio-demographic, economic and water source types that influences HHs drinking water supply [Dataset]. http://doi.org/10.5061/dryad.mw6m905w8
    Available download formats: bin, csv
    Dataset updated
    Jun 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shewayiref Geremew Gebremichael; Shewayiref Geremew Gebremichael
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Background: Clean water is an essential part of a healthy human life and wellbeing. More recently, rapid population growth, high illiteracy rates, lack of sustainable development, and climate change have posed a global challenge in developing countries. The discontinuity of drinking water supply forces households either to use unsafe water storage materials or to use water from unsafe sources. The present study aimed to identify the determinants of water source types, use, quality of water, and sanitation perception of physical parameters among urban households in North-West Ethiopia.

    Methods: A community-based cross-sectional study was conducted among households from February to March 2019. A pretested, structured, interview-based questionnaire was used to collect the data. Data collection samples were selected randomly and proportional to each of the kebeles' households. MS Excel and R Version 3.6.2 were used to enter and analyze the data, respectively. Descriptive statistics using frequencies and percentages were used to explain the sample data concerning the predictor variable. Both bivariate and multivariate logistic regressions were used to assess the association between independent and response variables.

    Results: Four hundred eighteen (418) households participated. Based on the study undertaken, 78.95% of households used improved and 21.05% of households used unimproved drinking water sources. Households' drinking water sources were significantly associated with the age of the participant (χ² = 20.392, df = 3), educational status (χ² = 19.358, df = 4), source of income (χ² = 21.777, df = 3), monthly income (χ² = 13.322, df = 3), availability of additional facilities (χ² = 98.144, df = 7), cleanness status (χ² = 42.979, df = 4), scarcity of water (χ² = 5.1388, df = 1) and family size (χ² = 9.934, df = 2). The logistic regression analysis also indicated that those factors significantly determine the water source types used by the households. Factors such as availability of toilet facility, household member type, and sex of the head of the household were not significantly associated with drinking water sources.

    Conclusion: The use of drinking water from improved sources was determined by different demographic, socio-economic, sanitation, and hygiene-related factors. Therefore, the local, regional, and national governments and other supporting organizations should improve the accessibility and adequacy of drinking water from improved sources in the area.
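
    The chi-square statistics and degrees of freedom quoted in the Results can be turned into approximate p-values. The following is a minimal sketch rather than the authors' R analysis: the helper name chi2_sf and the dictionary labels are illustrative assumptions, and only the Python standard library is used.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form base cases: df = 1 (odd chain) or df = 2 (even chain).
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# (statistic, df) pairs as quoted in the Results paragraph; labels are shorthand.
reported = {
    "age of participant": (20.392, 3),
    "educational status": (19.358, 4),
    "source of income": (21.777, 3),
    "monthly income": (13.322, 3),
    "additional facilities": (98.144, 7),
    "cleanness status": (42.979, 4),
    "scarcity of water": (5.1388, 1),
    "family size": (9.934, 2),
}

p_values = {name: chi2_sf(stat, df) for name, (stat, df) in reported.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.4g}")
```

    Under this sketch, each quoted statistic falls below the conventional 0.05 threshold, which is consistent with the description's statement that these factors were significantly associated with water source type.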

  5. Music & Affect 2020 Dataset Study 1.csv

    • psycharchives.org
    Updated Sep 17, 2020
    + more versions
    Cite
    (2020). Music & Affect 2020 Dataset Study 1.csv [Dataset]. https://www.psycharchives.org/handle/20.500.12034/3089
    Dataset updated
    Sep 17, 2020
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset for: Leipold, B. & Loepthien, T. (2021). Attentive and emotional listening to music: The role of positive and negative affect. Jahrbuch Musikpsychologie, 30. https://doi.org/10.5964/jbdgm.78 In a cross-sectional study associations of global affect with two ways of listening to music – attentive–analytical listening (AL) and emotional listening (EL) – were examined. More specifically, the degrees to which AL and EL are differentially correlated with positive and negative affect were examined. In Study 1, a sample of 1,291 individuals responded to questionnaires on listening to music, positive affect (PA), and negative affect (NA). We used the PANAS that measures PA and NA as high arousal dimensions. AL was positively correlated with PA, EL with NA. Moderation analyses showed stronger associations between PA and AL when NA was low. Study 2 (499 participants) differentiated between three facets of affect and focused, in addition to PA and NA, on the role of relaxation. Similar to the findings of Study 1, AL was correlated with PA, EL with NA and PA. Moderation analyses indicated that the degree to which PA is associated with an individual's tendency to listen to music attentively depends on their degree of relaxation. In addition, the correlation between pleasant activation and EL was stronger for individuals who were more relaxed; for individuals who were less relaxed the correlation between unpleasant activation and EL was stronger. In sum, the results demonstrate not only simple bivariate correlations, but also that the expected associations vary, depending on the different affective states. We argue that the results reflect a dual function of listening to music, which includes emotional regulation and information processing.: Dataset Study 1

  6. Data from: Estimating parameters of Morgenstern type bivariate distribution...

    • sindex.sdl.edu.sa
    Updated Apr 1, 2019
    Cite
    Mohammad Al Kadiri; Mohammad Migdadi (2019). Estimating parameters of Morgenstern type bivariate distribution using bivariate ranked set sampling [Dataset]. https://sindex.sdl.edu.sa/esploro/outputs/dataset/Estimating-parameters-of-Morgenstern-type-bivariate/9948648408331
    Dataset updated
    Apr 1, 2019
    Dataset provided by
    University of Salento
    Authors
    Mohammad Al Kadiri; Mohammad Migdadi
    Time period covered
    Apr 1, 2019
    Description

    This paper develops parameter estimators for the Morgenstern type bivariate distribution using a bivariate ranked set sampling procedure as an alternative to simple random sampling. The proposed procedure makes it possible to estimate all of the distribution's parameters simultaneously, which has not yet been investigated in previous studies. In the last part of the paper, simulation studies show properties of the new estimators and compare them with some other existing estimators.

  7. Data Challenges: 2024 Pediatric Sepsis Challenge

    • search.dataone.org
    • borealisdata.ca
    Updated Aug 28, 2024
    Cite
    Nguyen, Vuong; Huxford, Charly; Rafiei, Alireza; Wiens, Matthew; Ansermino, J Mark; Kissoon, Niranjan; Kamaleswaran, Rishikesan (2024). Data Challenges: 2024 Pediatric Sepsis Challenge [Dataset]. http://doi.org/10.5683/SP3/TFAV36
    Dataset updated
    Aug 28, 2024
    Dataset provided by
    Borealis
    Authors
    Nguyen, Vuong; Huxford, Charly; Rafiei, Alireza; Wiens, Matthew; Ansermino, J Mark; Kissoon, Niranjan; Kamaleswaran, Rishikesan
    Description

    Objective(s): The 2024 Pediatric Sepsis Data Challenge provides an opportunity to address the lack of appropriate mortality prediction models for LMICs. For this challenge, we are asking participants to develop a working, open-source algorithm to predict in-hospital mortality and length of stay using only the provided synthetic dataset. The original data used to generate the real-world data (RWD) informed synthetic training set available to participants was obtained from a prospective, multisite, observational cohort study of children with suspected sepsis aged 6 months to 60 months at the time of admission to hospitals in Uganda. For this challenge, we have created a RWD-informed synthetically generated training data set to reduce the risk of re-identification in this highly vulnerable population. The synthetic training set was generated from a random subset of the original data (full dataset A) of 2686 records (70% of the total dataset - training dataset B). All challenge solutions will be evaluated against the remaining 1235 records (30% of the total dataset - test dataset C). Data Description: Report describing the comparison of univariate and bivariate distributions between the Synthetic Dataset and Test Dataset C. Additionally, a report showing the maximum mean discrepancy (MMD) and Kullback–Leibler (KL) divergence statistics. Data dictionary for the synthetic training dataset containing 148 variables. NOTE for restricted files: If you are not yet a CoLab member, please complete our membership application survey to gain access to restricted files within 2 business days. Some files may remain restricted to CoLab members. These files are deemed more sensitive by the file owner and are meant to be shared on a case-by-case basis. Please contact the CoLab coordinator at sepsiscolab@bcchr.ca or visit our website.
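
    The description mentions Kullback–Leibler (KL) divergence statistics for comparing the synthetic and test datasets. As a rough illustration only, the sketch below computes KL divergence between the empirical distributions of one categorical variable in two samples. The function name, the smoothing constant eps, and the toy "yes"/"no" values are all assumptions made for this example; they are not taken from the challenge data or its report.

```python
import math
from collections import Counter


def kl_divergence(p_samples, q_samples, eps=1e-9):
    """KL(P || Q) in nats between empirical distributions of two categorical samples."""
    categories = set(p_samples) | set(q_samples)
    p_counts, q_counts = Counter(p_samples), Counter(q_samples)
    n_p, n_q = len(p_samples), len(q_samples)
    total = 0.0
    for c in categories:
        # eps-smoothing keeps the ratio finite when a category is missing from one sample.
        p = (p_counts[c] + eps) / (n_p + eps * len(categories))
        q = (q_counts[c] + eps) / (n_q + eps * len(categories))
        total += p * math.log(p / q)
    return total


# Toy, made-up values for a single binary variable (not real challenge data).
synthetic = ["yes"] * 30 + ["no"] * 70
held_out = ["yes"] * 25 + ["no"] * 75

kl_same = kl_divergence(synthetic, synthetic)
kl_diff = kl_divergence(synthetic, held_out)
print(f"KL(synthetic || synthetic) = {kl_same:.6f}")
print(f"KL(synthetic || held_out)  = {kl_diff:.6f}")
```

    A value near zero suggests the two samples have similar marginal distributions for that variable; continuous variables would first need binning or a density estimate, which this sketch does not cover.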

  8. Data from: Objective Bayesian testing for the correlation coefficient under...

    • tandf.figshare.com
    txt
    Updated Feb 16, 2024
    Cite
    Bo Peng; Min Wang (2024). Objective Bayesian testing for the correlation coefficient under divergence-based priors [Dataset]. http://doi.org/10.6084/m9.figshare.10260752.v1
    Available download formats: txt
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Bo Peng; Min Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The correlation coefficient is a commonly used criterion to measure the strength of a linear relationship between the two quantitative variables. For a bivariate normal distribution, numerous procedures have been proposed for testing a precise null hypothesis of the correlation coefficient, whereas the construction of flexible procedures for testing a set of (multiple) precise and/or interval hypotheses has received less attention. This paper fills the gap by proposing an objective Bayesian testing procedure using the divergence-based priors. The proposed Bayes factors can be used for testing any combination of precise and interval hypotheses and also allow a researcher to quantify evidence in the data in favor of the null or any other hypothesis under consideration. An extensive simulation study is conducted to compare the performances between the proposed Bayesian methods and some existing ones in the literature. Finally, a real-data example is provided for illustrative purposes.

  9. Data from: Estimating Morgenstern Type Bivariate Association Parameter Using...

    • sindex.sdl.edu.sa
    Updated Apr 1, 2019
    Cite
    Mohammad Al Kadiri; Mohammad Migdadi; M. K. Migdadi (2019). Estimating Morgenstern Type Bivariate Association Parameter Using a Modified Maximum Likelihood Method [Dataset]. https://sindex.sdl.edu.sa/esploro/outputs/dataset/Estimating-Morgenstern-Type-Bivariate-Association-Parameter/9949745008331
    Dataset updated
    Apr 1, 2019
    Dataset provided by
    University of Salento
    Authors
    Mohammad Al Kadiri; Mohammad Migdadi; M. K. Migdadi
    Time period covered
    Apr 1, 2019
    Description

    This paper investigates estimating the association parameter of Morgenstern type bivariate distribution using a modified maximum likelihood method where the regular maximum likelihood methods failed to achieve estimation. The simple random sampling, concomitant of ordered statistics and bivariate ranked set sampling methods are used and compared. Efficiency and bias of the produced estimators are compared for two specific examples, Morgenstern type bivariate uniform and exponential distributions.

  10. Churn Data set

    • kaggle.com
    Updated Nov 6, 2020
    Cite
    Abhishek Maheshwarappa (2020). Churn Data set [Dataset]. https://www.kaggle.com/abhigm/churn-data-set/tasks
    Dataset updated
    Nov 6, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abhishek Maheshwarappa
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Customers

    Retaining current customers is very important because acquiring new customers is very expensive by comparison. Churn is calculated to understand the rate at which customers are leaving. The dataset contains customer churn, calculated as the number of customers who leave the company during a given period. The target variable in the dataset is 'Churn'. Customers may churn for many reasons, such as bad onboarding, poor customer service, and low engagement.

    Data Set Characteristics:

    Classification

    1. Bivariate - Target Churn
    2. Multivariate - Target Contract

    Regression

    Target 1. Total charges 2. Monthly charges

    Number of Instances: 6499

    Features

    CustomerID Gender Senior Citizen Partner Dependents Tenure Phone Service Multiple Lines Internet Service Online Security Online Backup Device Protection Tech Support Streaming TV Streaming Movies Contract Paperless Billing Payment Method Monthly Charges Total Charges Churn

    Acknowledgment

    The dataset was provided by Squark

  11. Some Aspects of the Discrete Wavelet Analysis of Bivariate Spectra for...

    • data.niaid.nih.gov
    xls
    Updated Oct 4, 2011
    Cite
    Joanna Bruzda (2011). Some Aspects of the Discrete Wavelet Analysis of Bivariate Spectra for Business Cycle Synchronisation [Dataset]. http://doi.org/10.7910/DVN/JP6YQZ
    Available download formats: xls
    Dataset updated
    Oct 4, 2011
    Authors
    Joanna Bruzda
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Euro Area
    Description

    The paper considers some of the issues emerging from the discrete wavelet analysis of popular bivariate spectral quantities such as the coherence and phase spectra and the frequency-dependent time delay. The approach utilised here is based on the maximal overlap discrete Hilbert wavelet transform (MODHWT). Firstly, via a broad set of simulation experiments, we examine the small and large sample properties of two wavelet estimators of the scale-dependent time delay. The estimators are the wavelet cross-correlator and the wavelet phase angle-based estimator. Our results provide some practical guidelines for the empirical examination of short- and medium-term lead-lag relations for octave frequency bands. Further, we point out a deficiency in the implementation of the MODHWT and suggest using a modified implementation scheme, which was proposed earlier in the context of the dual-tree complex wavelet transform. In addition, we show how MODHWT-based wavelet quantities can serve to approximate the Fourier bivariate spectra and discuss issues connected with building confidence intervals for them. The discrete wavelet analysis of coherence and phase angle is illustrated with a scale-dependent examination of business cycle synchronisation between 11 euro zone countries. The study is supplemented by a wavelet analysis of the variance and covariance of the euro zone business cycles. The empirical examination underlines the good localisation properties and high computational efficiency of the wavelet transformations applied and provides new arguments in favour of the endogeneity hypothesis of the optimum currency area criteria as well as the wavelet evidence on dating the Great Moderation in the euro zone.

  12. Expected bivariate distribution of the number of times bacon and eggs were...

    • figshare.com
    xls
    Updated Jun 12, 2023
    Cite
    Tiziano Squartini; Enrico Ser-Giacomi; Diego Garlaschelli; George Judge (2023). Expected bivariate distribution of the number of times bacon and eggs were purchased on four consecutive shopping trips (see [23, 28]). [Dataset]. http://doi.org/10.1371/journal.pone.0125077.t002
    Available download formats: xls
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Tiziano Squartini; Enrico Ser-Giacomi; Diego Garlaschelli; George Judge
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Expected bivariate distribution of the number of times bacon and eggs were purchased on four consecutive shopping trips (see [23, 28]).

  13. Multivariate Longitudinal Analysis with Bivariate Correlation Test

    • plos.figshare.com
    application/gzip
    Updated May 30, 2023
    Cite
    Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel (2023). Multivariate Longitudinal Analysis with Bivariate Correlation Test [Dataset]. http://doi.org/10.1371/journal.pone.0159649
    Available download formats: application/gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the context of multivariate multilevel data analysis, this paper focuses on the multivariate linear mixed-effects model, including all the correlations between the random effects when the dimensional residual terms are assumed uncorrelated. Using the EM algorithm, we suggest more general expressions of the model’s parameters estimators. These estimators can be used in the framework of the multivariate longitudinal data analysis as well as in the more general context of the analysis of multivariate multilevel data. By using a likelihood ratio test, we test the significance of the correlations between the random effects of two dependent variables of the model, in order to investigate whether or not it is useful to model these dependent variables jointly. Simulation studies are done to assess both the parameter recovery performance of the EM estimators and the power of the test. Using two empirical data sets which are of longitudinal multivariate type and multivariate multilevel type, respectively, the usefulness of the test is illustrated.

  14. Data from: Estimating morphological diversity and tempo with discrete...

    • figshare.mq.edu.au
    • researchdata.edu.au
    • +1 more
    bin
    Updated Jun 14, 2023
    Cite
    Graeme T. Lloyd (2023). Data from: Estimating morphological diversity and tempo with discrete character-taxon matrices: implementation, challenges, progress, and future directions [Dataset]. http://doi.org/10.5061/dryad.gp16s
    Available download formats: bin
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    Macquarie University
    Authors
    Graeme T. Lloyd
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Discrete character-taxon matrices are increasingly being used in an attempt to understand the pattern and tempo of morphological evolution; however, methodological sophistication and bespoke software implementations have lagged behind. In the present study, an attempt is made to provide a state-of-the-art description of methodologies and introduce a new R package (Claddis) for performing foundational disparity (morphologic diversity) and rate calculations. Simulations using its core functions show that: (1) of the two most commonly used distance metrics (Generalized Euclidean Distance and Gower's Coefficient), the latter tends to carry forward more of the true signal; (2) a novel distance metric may improve signal retention further; (3) this signal retention may come at the cost of pruning incomplete taxa from the data set; and (4) the utility of bivariate plots of ordination spaces are undermined by their frequently extremely low variances. By contrast, challenges to estimating morphologic tempo are presented qualitatively, such as how trees are time-scaled and changes are counted. Both disparity and rates deserve better time series approaches that could unlock new macroevolutionary analyses. However, these challenges need not be fatal, and several potential future solutions and directions are suggested.

    Usage Notes

    Matrix used for the tutorial: tutorial_matrix.nex
    Ages file for the tutorial data set: tutorial_ages.txt
    R code for the tutorial: tutorial_code.r
    R code used for the simulations: simulation_code.r

  15. Data from: Youth Justice Policy Environments and Their Effects on Youth...

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Mar 12, 2025
    + more versions
    Cite
    Office of Juvenile Justice and Delinquency Prevention (2025). Youth Justice Policy Environments and Their Effects on Youth Confinement Rates, United States, 1996-2016 [Dataset]. https://catalog.data.gov/dataset/youth-justice-policy-environments-and-their-effects-on-youth-confinement-rates-united-1996-2a380
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    Office of Juvenile Justice and Delinquency Prevention (http://ojjdp.gov/)
    Area covered
    United States
    Description

    This study was conducted to address the dropping rates in residential placements of adjudicated youth after the 1990s. Policymakers, advocates, and researchers began to attribute the decline to reform measures and proposed that this was the cause of the drop seen in historic national crime. In response, researchers set out to use state-level data on economic factors, crime rates, political ideology scores, and youth justice policies and practices to test the association between the youth justice policy environment and recent reductions in out-of-home placements for adjudicated youth. This data collection contains two files: a multivariate analysis file and a bivariate analyses file. In the multivariate file the aim was to assess the impact of the progressive policy characteristics on the dependent variable, which is known as youth confinement. In the bivariate analyses file (Wave 1-Wave 10) the aim was to assess the states as they are divided into 2 groups across all 16 dichotomized variables that comprised the progressive policy scale: those with more progressive youth justice environments and those with less progressive or punitive environments. Some examples of these dichotomized variables include purpose clause, courtroom shackling, and competency standard.

  16. Data from: Evaluating accuracy of DNA pool construction based on white blood...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Jun 5, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Evaluating accuracy of DNA pool construction based on white blood cell counts [Dataset]. https://catalog.data.gov/dataset/evaluating-accuracy-of-dna-pool-construction-based-on-white-blood-cell-counts-0130b
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Pooling individual samples prior to DNA extraction can mitigate the cost of DNA extraction and genotyping; however, these methods need to accurately generate equal representation of individuals within pools. This data set was generated to determine accuracy of pool construction based on white blood cell counts compared to two common DNA quantification methods. Fifty individual bovine blood samples were collected, and then pooled with all individuals represented in each pool. Pools were constructed with the target of equal representation of each individual animal based on number of white blood cells, spectrophotometric readings, spectrofluorometric readings and whole blood volume with 9 pools per method and a total of 36 pools. Pools and individual samples that comprised the pools were genotyped using a commercially available genotyping array. ASReml was used to estimate variance components for individual animal contribution to pools. The correlation between animal contributions between two pools was estimated using bivariate analysis with starting values set to the result of a univariate analysis. The dataset includes: 1) pooling allele frequencies (PAF) for all pools and individual animals computed from normalized intensities for red (X) and green (Y); PAF = X/(X+Y). 2) Genotypes or number of copies of B(green) allele (0,1,2). 3) Definitions for each sample.

    Resources in this dataset:

    Resource Title: Pooling Allele Frequencies (paf) for all pools and individual animals.
    File Name: pafAnimal.csv.gz
    Resource Description: Pooling Allele Frequencies (paf) for all pools and individual animals computed from normalized intensities for red (X) and green (Y); paf = X / (X + Y)

    Resource Title: Genotypes for individuals within pools.
    File Name: g.csv.gz
    Resource Description: Genotypes (number of copies of the B (green) allele (0,1,2)) for individual bovine animals within pools.

    Resource Title: Sample Definitions.
    File Name: XY Data Key.xlsx
    Resource Description: Definitions for each sample (both pools and individual animals).
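
    The pooling allele frequency formula quoted in the description, PAF = X / (X + Y), can be written out as a short sketch. The function name and the intensity values below are made up for illustration; they are not read from pafAnimal.csv.gz, whose column layout is not described here.

```python
def pooling_allele_frequency(x: float, y: float) -> float:
    """PAF = X / (X + Y), with X and Y the normalized red and green intensities."""
    total = x + y
    if total == 0:
        # The ratio is undefined when both intensities are zero.
        raise ValueError("X + Y must be non-zero")
    return x / total


# Hypothetical (X, Y) intensity pairs for three markers.
intensities = [(3.0, 1.0), (1.0, 1.0), (0.5, 1.5)]
pafs = [pooling_allele_frequency(x, y) for x, y in intensities]
print(pafs)
```

    A PAF near 1 means the red (X) signal dominates and a PAF near 0 means the green (Y) signal dominates.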

  17. Wages of War, 1816-1965

    • archive.ciser.cornell.edu
    Updated Dec 27, 2019
    Cite
    J. Singer; Melvin Small (2019). Wages of War, 1816-1965 [Dataset]. http://doi.org/10.6077/j5/ai6soy
    Dataset updated
    Dec 27, 2019
    Authors
    J. Singer; Melvin Small
    Variables measured
    EventOrProcess
    Description

    The Wages of War is part of the Correlates of War (COW) Project at the University of Michigan under the guidance of J. David Singer and Melvin Small. The data are meant to be a statistical handbook of war from 1816-1965. It includes mainly bivariate data on every war fought in the time period. War data described cover all international wars that began and ended between 1816 and 1965, and those that satisfied the theoretical criteria: political status of the belligerents and the severity of the armed conflict in terms of battle-connected fatalities. These criteria are applied to a chronological list of all deadly quarrels between 1816 and 1965. Note: It is believed these data contain serious errors and should not be used. However, since the replacement data for International Disputes was never provided to ICPSR by the principal investigator, these files may be the only source for parts of those data.

    Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR -- https://doi.org/10.3886/ICPSR09905.v1. We highly recommend using the ICPSR version as they made this dataset available in multiple data formats and for additional years of data.

  18. Data from: The hypothesis of a 'core' community receives poor support when...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 7, 2021
    Cite
    Gordon Custer (2021). The hypothesis of a 'core' community receives poor support when confronted with simulated and empirical data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2231200
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    Gordon Custer
    Linda TA van Diepen
    Alex Buerkle
    Maya Gans
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The field of community ecology is evolving rapidly as researchers are able to tie functions of systems to variation in taxa. In inferring processes, functions, and causal taxa, common practice is to assume a ‘core’ community can be defined. The core refers to a group of taxa found across samples, and statistically, is the discretization or categorization of continuous data. Assuming thresholds in abundance exist, and that a core microbiome exists, has the potential to be misleading. Rather, the existence of a core set of taxa should be treated as a hypothesis with support from empirical observations. An additional challenge is that there is no standard set of criteria for core membership. Consequently, comparison across studies is often impossible. We considered four common methods for defining a core and applied them to 25 simulations that cover a range of plausible communities and two published microbial data sets. Next, we used hierarchical clustering and bivariate plots of mean taxon abundance and variance to evaluate each method. Assignment of taxa to the core varied substantially among methods. Across simulations and published data sets, hierarchical clustering of taxa based on their abundance and prevalence (variation) offered no support for a core set of taxa. The categorization of taxa into sets corresponding to a core community and other taxa has the potential to be misleading. Given that the concept of core communities received poor support from data, the concept is questionable and should not be used without testing its validity in any particular context.

  19. Additional file 4 of Procalcitonin for the diagnosis of postoperative...

    • springernature.figshare.com
    xlsx
    Updated Aug 14, 2024
    Cite
    Davide Nicolotti; Silvia Grossi; Valeria Palermo; Federico Pontone; Giuseppe Maglietta; Francesca Diodati; Matteo Puntoni; Sandra Rossi; Caterina Caminiti (2024). Additional file 4 of Procalcitonin for the diagnosis of postoperative bacterial infection after adult cardiac surgery: a systematic review and meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.26678171.v1
    Available download formats: xlsx
    Dataset updated
    Aug 14, 2024
    Dataset provided by
    figshare
    Authors
    Davide Nicolotti; Silvia Grossi; Valeria Palermo; Federico Pontone; Giuseppe Maglietta; Francesca Diodati; Matteo Puntoni; Sandra Rossi; Caterina Caminiti
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4: Results Linear Mixed-Effects Models with interaction terms of Threshold × Group × POD.

  20. Crowdsourced Flow Cytometry Dataset from EVE Online’s Project Discovery for...

    • frdr-dfdr.ca
    Updated Apr 24, 2025
    Cite
    Brinkman, Ryan; Yokosawa, Daniel Y. O. (2025). Crowdsourced Flow Cytometry Dataset from EVE Online’s Project Discovery for Machine Learning Applications [Dataset]. http://doi.org/10.20383/103.01043
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Federated Research Data Repository / dépôt fédéré de données de recherche
    Authors
    Brinkman, Ryan; Yokosawa, Daniel Y. O.
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a diverse collection of pre-processed flow cytometry data assembled to support the training and evaluation of machine learning (ML) models for the gating of cell populations. The data were curated through a citizen science initiative embedded in the EVE Online video game, known as Project Discovery. Participants contributed to scientific research by gating bivariate plots generated from flow cytometry data, creating a crowdsourced reference set. The original flow cytometry datasets were sourced from publicly available COVID-19 and immunology-related studies on FlowRepository.org and PubMed. Data were compensated, transformed, and split into bivariate plots for analysis. This dataset includes: 1) CSV files containing two-channel marker combinations per plot; 2) a SQL database capturing player-generated gating polygons in normalized coordinates; 3) scripts and containerized environments (Singularity and Docker) for reproducible evaluation of gating accuracy and consensus scoring using the flowMagic pipeline; 4) code for filtering bot inputs, evaluating user submissions, calculating F1 scores, and generating consensus gating regions. These data are especially valuable for training and benchmarking models that aim to automate the labor-intensive gating process in immunological and clinical cytometry applications.
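    The description mentions F1 scores for evaluating gating submissions. As a rough illustration of that metric only, the Python sketch below scores one gate against a reference, assuming each is represented as a list of per-event booleans (True when the event falls inside the gate). That representation and the function name are assumptions; this is not the flowMagic pipeline or the scripts shipped with the dataset.

```python
def gating_f1(reference, predicted):
    """F1 score between two boolean gate-membership lists of equal length."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for r, p in pairs if r and p)
    false_pos = sum(1 for r, p in pairs if not r and p)
    false_neg = sum(1 for r, p in pairs if r and not p)
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        # Assumed convention: score 0.0 when neither gate contains any event.
        return 0.0
    return 2 * true_pos / denominator


# Hypothetical membership flags for four events.
reference_gate = [True, True, False, False]
player_gate = [True, False, True, False]
score = gating_f1(reference_gate, player_gate)
```

    With one true positive, one false positive, and one false negative, the example evaluates to 2 / (2 + 1 + 1) = 0.5.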

Johar M. Ashfaque (2020). Bivariate Data Set with 3 Clusters [Dataset]. https://www.kaggle.com/ukveteran/bivariate-data-set-with-3-clusters/code
Bivariate Data Set with 3 Clusters

Explore at:
4 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Mar 30, 2020
Dataset provided by
Kaggle http://kaggle.com/
Authors
Johar M. Ashfaque
Description

Dataset

This dataset was created by Johar M. Ashfaque

Contents
