100+ datasets found
  1. Power of Bivariate vs. Univariate Analyses for the Combined Data of...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Lei Zhang; Yu-Fang Pei; Jian Li; Christopher J. Papasian; Hong-Wen Deng (2023). Power of Bivariate vs. Univariate Analyses for the Combined Data of Unrelated Samples and Nuclear Families (One Binary Trait and One Continuous Trait). [Dataset]. http://doi.org/10.1371/journal.pone.0006502.t007
    Available download formats
    xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lei Zhang; Yu-Fang Pei; Jian Li; Christopher J. Papasian; Hong-Wen Deng
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Notes: Three population structures are considered. For the binary trait, the odds ratio (OR) ranges from 1.2 to 1.5. For the continuous trait, the contribution of the causal site ranges from 0.0025 to 0.01. Power was estimated from 1,000 replicates. See the notes to Table 1 for sample sizes. Abbreviations: T12, the proposed test for bivariate analysis; T1, the proposed test for the first trait only; T2, the proposed test for the second trait only.
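    Power here is estimated by simulation: the proportion of replicates in which a test rejects the null hypothesis. The sketch below illustrates only that replicate-counting idea. It uses a simple two-sided one-sample z-test with known unit variance as a hypothetical stand-in; the function name, effect size, and sample size are assumptions and this is not the T12, T1, or T2 test from the paper.

```python
import random
from statistics import NormalDist, fmean


def estimate_power(effect, n, replicates=1000, alpha=0.05, seed=1):
    """Monte Carlo power of a two-sided one-sample z-test (known sd = 1).

    Hypothetical stand-in test; only the "count rejections over
    replicates" logic mirrors the power estimation described above.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(replicates):
        # Simulate one replicate under the assumed effect size.
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # mean divided by its SE, 1 / sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    # Power = proportion of replicates in which the null is rejected.
    return rejections / replicates


print(estimate_power(effect=0.0, n=50))  # should be near alpha under the null
print(estimate_power(effect=0.5, n=50))
```

    With 1,000 replicates the Monte Carlo standard error of a power near 0.5 is roughly 0.016, which is one reason studies of this kind report power to two decimal places at most.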

  2. Survey Data of the socio-demographic, economic and water source types that...

    • zenodo.org
    • datadryad.org
    bin, csv
    Updated Jun 4, 2022
    Shewayiref Geremew Gebremichael (2022). Survey Data of the socio-demographic, economic and water source types that influences HHs drinking water supply [Dataset]. http://doi.org/10.5061/dryad.mw6m905w8
    Available download formats
    bin, csv
    Dataset updated
    Jun 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shewayiref Geremew Gebremichael
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Background: Clean water is essential to human health and well-being. Rapid population growth, high illiteracy rates, a lack of sustainable development, and climate change pose a global challenge, particularly for developing countries. Discontinuous drinking water supply forces households either to store water in unsafe containers or to use water from unsafe sources. The present study aimed to identify the determinants of water source type, water use, water quality, and perceptions of sanitation and of physical water parameters among urban households in North-West Ethiopia.

    Methods: A community-based cross-sectional study was conducted among households from February to March 2019. Data were collected with a pretested, structured, interview-based questionnaire. Households were selected randomly, in proportion to the number of households in each kebele. MS Excel and R version 3.6.2 were used to enter and analyze the data, respectively. Descriptive statistics (frequencies and percentages) were used to describe the sample with respect to the predictor variables. Both bivariate and multivariate logistic regressions were used to assess the association between the independent and response variables.

    Results: Four hundred eighteen (418) households participated. Of these, 78.95% used improved and 21.05% used unimproved drinking water sources. Household drinking water sources were significantly associated with the age of the participant (χ² = 20.392, df = 3), educational status (χ² = 19.358, df = 4), source of income (χ² = 21.777, df = 3), monthly income (χ² = 13.322, df = 3), availability of additional facilities (χ² = 98.144, df = 7), cleanliness status (χ² = 42.979, df = 4), scarcity of water (χ² = 5.1388, df = 1), and family size (χ² = 9.934, df = 2). The logistic regression analysis also indicated that these factors significantly determine the type of water source used by households. Availability of a toilet facility, household member type, and sex of the household head were not significantly associated with drinking water source.

    Conclusion: The use of drinking water from improved sources was determined by demographic, socio-economic, sanitation, and hygiene-related factors. Therefore, local, regional, and national governments and other supporting organizations should improve the accessibility and adequacy of drinking water from improved sources in the area.
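    The bivariate associations above are reported as Pearson chi-square statistics with degrees of freedom (r − 1)(c − 1) for an r × c contingency table; for example, a four-level predictor crossed with the two-level source type (improved vs. unimproved) gives df = 3. The sketch below computes that statistic. The source does not publish its contingency tables, so the counts are invented for illustration and the function name is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column factors were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows = source type (improved, unimproved),
# columns = two levels of a predictor.
stat, df = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 4), df)  # 6.6667 1
```

    The p-value would then come from the upper tail of the chi-square distribution with df degrees of freedom, for example via a statistics library.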

  3. An example of combining ANOVA terms for bivariate principle component data...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 24, 2018
    Skalski, John R.; Townsend, Richard L.; Richins, Shelby M. (2018). An example of combining ANOVA terms for bivariate principle component data to create the ANODIS F-statistic where N is the total number of samples drawn and K, the number of assemblages compared. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000666955
    Dataset updated
    Oct 24, 2018
    Authors
    Skalski, John R.; Townsend, Richard L.; Richins, Shelby M.
    Description

    An example of combining ANOVA terms for bivariate principal component data to create the ANODIS F-statistic, where N is the total number of samples drawn and K is the number of assemblages compared.
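    Because a squared Euclidean distance in PC space is the sum of squared differences on each axis, a between/within decomposition of distances equals the sum of one-way ANOVA sums of squares over the PC dimensions. The sketch below combines the per-dimension terms into a single F-ratio. It assumes d·(K − 1) and d·(N − K) degrees of freedom for d dimensions; that choice, the function name, and the data are assumptions for illustration, not the published ANODIS code.

```python
from statistics import fmean


def pooled_dimension_f(groups):
    """F-ratio pooling one-way ANOVA sums of squares across PC dimensions.

    groups: list of K assemblages, each a list of points (tuples of PC scores).
    Assumes df of d*(K-1) and d*(N-K); a sketch, not the published ANODIS code.
    """
    points = [p for group in groups for p in group]
    n, k, d = len(points), len(groups), len(points[0])
    grand = [fmean(p[j] for p in points) for j in range(d)]
    ss_between = ss_within = 0.0
    for group in groups:
        centroid = [fmean(p[j] for p in group) for j in range(d)]
        # Squared distance of the group centroid from the grand centroid.
        ss_between += len(group) * sum((centroid[j] - grand[j]) ** 2 for j in range(d))
        # Squared distances of each sample from its own group centroid.
        ss_within += sum((p[j] - centroid[j]) ** 2 for p in group for j in range(d))
    df_between, df_within = d * (k - 1), d * (n - k)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical PC1/PC2 scores for two assemblages (N = 4, K = 2).
groups = [[(0.0, 0.0), (2.0, 0.0)], [(4.0, 0.0), (6.0, 0.0)]]
print(pooled_dimension_f(groups))  # (8.0, 2, 4)
```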

  4. Sample points for constructing Bivariate Cut-HDMR and CCD metamodels

    • ieee-dataport.org
    Updated May 18, 2021
    Yu-Hsiang Yang (2021). Sample points for constructing Bivariate Cut-HDMR and CCD metamodels [Dataset]. https://ieee-dataport.org/documents/sample-points-constructing-bivariate-cut-hdmr-and-ccd-metamodels
    Dataset updated
    May 18, 2021
    Authors
    Yu-Hsiang Yang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    045 sample points that were used to construct the CCD metamodel; and (3) the Monte Carlo simulation sample points that were used for validation.

  5. Data from: Bivariate Analysis of Distribution Functions Under Biased...

    • tandf.figshare.com
    • figshare.com
    txt
    Updated Apr 17, 2024
    Hsin-wen Chang; Shu-Hsiang Wang (2024). Bivariate Analysis of Distribution Functions Under Biased Sampling [Dataset]. http://doi.org/10.6084/m9.figshare.23998414.v1
    Available download formats
    txt
    Dataset updated
    Apr 17, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Hsin-wen Chang; Shu-Hsiang Wang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article compares distribution functions among pairs of locations in their domains, in contrast to the typical approach of univariate comparison across individual locations. This bivariate approach is studied in the presence of sampling bias, which has been gaining attention in COVID-19 studies that over-represent more symptomatic people. In cases with either known or unknown sampling bias, we introduce Anderson–Darling-type tests based on both the univariate and bivariate formulation. A simulation study shows the superior performance of the bivariate approach over the univariate one. We illustrate the proposed methods using real data on the distribution of the number of symptoms suggestive of COVID-19.

  6. Music & Affect 2020 Dataset Study 2.csv

    • psycharchives.org
    Updated Sep 17, 2020
    (2020). Music & Affect 2020 Dataset Study 2.csv [Dataset]. https://www.psycharchives.org/handle/20.500.12034/3089
    Dataset updated
    Sep 17, 2020
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset for: Leipold, B. & Loepthien, T. (2021). Attentive and emotional listening to music: The role of positive and negative affect. Jahrbuch Musikpsychologie, 30. https://doi.org/10.5964/jbdgm.78

    In a cross-sectional study, associations of global affect with two ways of listening to music, attentive–analytical listening (AL) and emotional listening (EL), were examined. More specifically, the study examined the degree to which AL and EL are differentially correlated with positive and negative affect. In Study 1, a sample of 1,291 individuals responded to questionnaires on listening to music, positive affect (PA), and negative affect (NA). We used the PANAS, which measures PA and NA as high-arousal dimensions. AL was positively correlated with PA, and EL with NA. Moderation analyses showed stronger associations between PA and AL when NA was low. Study 2 (499 participants) differentiated between three facets of affect and focused, in addition to PA and NA, on the role of relaxation. Similar to the findings of Study 1, AL was correlated with PA, and EL with NA and PA. Moderation analyses indicated that the degree to which PA is associated with an individual's tendency to listen to music attentively depends on their degree of relaxation. In addition, the correlation between pleasant activation and EL was stronger for individuals who were more relaxed; for individuals who were less relaxed, the correlation between unpleasant activation and EL was stronger. In sum, the results demonstrate not only simple bivariate correlations but also that the expected associations vary depending on the affective state. We argue that the results reflect a dual function of listening to music, which includes emotion regulation and information processing.

    Dataset: Study 2

  7. MASEM Dataset on Educational AI Technology Adoption among Students(from 2020...

    • data.mendeley.com
    Updated Oct 15, 2025
    Researcher 1 (2025). MASEM Dataset on Educational AI Technology Adoption among Students(from 2020 to June 2025). [Dataset]. http://doi.org/10.17632/t8ns6fdky2.5
    Dataset updated
    Oct 15, 2025
    Authors
    Researcher 1
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports a meta-analytic structural equation modelling (MASEM) study investigating the factors influencing students’ behavioural intention to use educational AI (EAI) technologies. The research integrates constructs from the Technology Acceptance Model (TAM), Theory of Planned Behaviour (TPB), and Artificial Intelligence Literacy (AIL), aiming to resolve inconsistencies in previous studies and improve theoretical understanding of EAI technology adoption.

    Research Hypotheses

    The study hypothesized that students' behavioural intention (INT) to use EAI technologies is influenced by perceived usefulness (PU), perceived ease of use (PEU), attitude (ATT), subjective norm (SN), and perceived behavioural control (PBC), as described in TAM and TPB; that AI literacy (AIL) directly and indirectly predicts PU, PEU, ATT, and INT; and that these relationships are moderated by contextual factors such as academic level (K–12 vs. higher education) and regional economic development (developed vs. developing countries).

    What the Data Shows

    The meta-analytic dataset comprises 166 empirical studies involving over 69,000 participants. It includes pairwise Pearson correlations among seven constructs (PU, PEU, ATT, SN, PBC, INT, AIL), which are used to compute a pooled correlation matrix. This matrix was then used to test three models via MASEM: (1) a baseline TAM–TPB model, (2) an internal-extended model with additional TPB internal paths, and (3) an AIL-integrated extended model. The AIL-integrated model achieved the best fit (CFI = 0.997, RMSEA = 0.053) and explained 62.3% of the variance in behavioural intention.

    Notable Findings

    AI literacy (AIL) is the strongest predictor of intention to use EAI technologies (total effect = 0.408). PU, ATT, and SN also significantly influence intention. The effect of PEU on intention is fully mediated by PU and ATT. Moderation analysis showed that the relationships differ between developed and developing countries and between K–12 and higher-education populations.

    How the Data Can Be Interpreted and Used

    The dataset includes bivariate correlations between variables, publication metadata, sample sizes, coding information, and reliability values (e.g., CR scores). It is suitable for replicating MASEM procedures, moderation analysis, and meta-regression. Researchers may use it to test additional theoretical models or to assess the influence of new moderators (e.g., AI tool type). Educators and policymakers can draw on the meta-analytic results to inform AI literacy training and technology adoption strategies.
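    Each cell of a pooled correlation matrix combines the bivariate correlations that different studies report for one pair of constructs. The sketch below shows one common, simple way to do that: a fixed-effect average on Fisher's z scale with weights n − 3, then a back-transform. This is an assumption for illustration; MASEM studies often use other estimators (for example, two-stage structural equation modelling with random effects), and the source does not say which one was used. The function name and study values are hypothetical.

```python
import math


def pool_correlations(studies):
    """Sample-size-weighted pooled correlation via Fisher's z transform.

    studies: list of (r, n) pairs for one construct pair (e.g. PU-INT).
    Fixed-effect weights of n - 3 (the inverse variance of z) are an assumption.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        weight = n - 3  # Var(z) is approximately 1 / (n - 3)
        weighted_sum += weight * math.atanh(r)
        total_weight += weight
    # Average on the z scale, then back-transform to a correlation.
    return math.tanh(weighted_sum / total_weight)


# Hypothetical studies reporting a PU-INT correlation.
print(round(pool_correlations([(0.45, 203), (0.60, 103)]), 3))
```

    Repeating this for every construct pair fills one pooled matrix, which in practice must also be checked for positive definiteness before it is fitted with a structural model.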

  8. Bivariate analysis of IL28B polymorphisms and HTLV-1-associated myelopathy.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Sep 18, 2014
    Casseb, Jorge; de Oliveira, Augusto Cesar Penalva; Fonseca, Luiz Augusto Marcondes; Assone, Tatiane; do Carmo Luiz, Olinda; Gaester, Karen Oliveira; da Silva Duarte, Alberto Jose; de Toledo Gonçalves, Fernanda; Pinho, João Renato Rebello; Malta, Fernanda; de Souza, Fernando Vieira (2014). Bivariate analysis of IL28B polymorphisms and HTLV-1-associated myelopathy. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001249057
    Dataset updated
    Sep 18, 2014
    Authors
    Casseb, Jorge; de Oliveira, Augusto Cesar Penalva; Fonseca, Luiz Augusto Marcondes; Assone, Tatiane; do Carmo Luiz, Olinda; Gaester, Karen Oliveira; da Silva Duarte, Alberto Jose; de Toledo Gonçalves, Fernanda; Pinho, João Renato Rebello; Malta, Fernanda; de Souza, Fernando Vieira
    Description

    SNP: single nucleotide polymorphism. *Four patients did not have enough sample for the IL28B rs8099917 assay. **Six patients did not have enough sample for the IL28B rs12979860 assay. Bivariate analysis of IL28B polymorphisms and HTLV-1-associated myelopathy.

  9. Examples of applying a multivariate Wilson prior to comparative...

    • zenodo.org
    bin, zip
    Updated Sep 10, 2025
    Doeke Hekstra; Harrison K. Wang; Kevin M. Dalton (2025). Examples of applying a multivariate Wilson prior to comparative crystallography data [Dataset]. http://doi.org/10.5281/zenodo.17082201
    Available download formats
    zip, bin
    Dataset updated
    Sep 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Doeke Hekstra; Harrison K. Wang; Kevin M. Dalton
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains four examples of merging crystallographic intensities with a bivariate prior:

    • time-resolved Laue crystallography of the photoactive yellow protein
    • anomalous diffraction from serial XFEL crystallography of thermolysin
    • anomalous diffraction from Laue crystallography of NaI-soaked lysozyme
    • fragment screening monochromatic data of Nsp3 Mac1

    Additionally, we provide several auxiliary examples:

    • For PYP, an example where we set aside a test fraction to semi-independently optimize the double-Wilson r
    • For lysozyme, two examples: one where we use Laue-DIALS instead of precognition, and another where we set aside the first 90 images to semi-independently optimize the double-Wilson r
    • For thermolysin, an example comparing a bivariate with a univariate prior as the number of scaled images grows, and another where we set aside the first 395 images to semi-independently optimize the double-Wilson r

    Every example includes scripts to run Careless as well as to analyze the outputs in order to reproduce the figures in the double-Wilson manuscript. For every example, there is a `README.md` that describes the contents of each example folder.

  10. Data from: Evaluating accuracy of DNA pool construction based on white blood...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jun 5, 2025
    Agricultural Research Service (2025). Evaluating accuracy of DNA pool construction based on white blood cell counts [Dataset]. https://catalog.data.gov/dataset/evaluating-accuracy-of-dna-pool-construction-based-on-white-blood-cell-counts-0130b
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Pooling individual samples prior to DNA extraction can mitigate the cost of DNA extraction and genotyping; however, these methods need to generate accurate, equal representation of individuals within pools. This data set was generated to determine the accuracy of pool construction based on white blood cell counts compared with two common DNA quantification methods. Fifty individual bovine blood samples were collected and then pooled, with all individuals represented in each pool. Pools were constructed with the target of equal representation of each individual animal based on number of white blood cells, spectrophotometric readings, spectrofluorometric readings, and whole blood volume, with 9 pools per method and a total of 36 pools. Pools and the individual samples that comprised them were genotyped using a commercially available genotyping array. ASReml was used to estimate variance components for individual animal contribution to pools. The correlation between animal contributions in two pools was estimated using a bivariate analysis, with starting values set to the result of a univariate analysis.

    The dataset includes: 1) pooling allele frequencies (PAF) for all pools and individual animals, computed from normalized intensities for red (X) and green (Y), PAF = X/(X+Y); 2) genotypes, i.e., the number of copies of the B (green) allele (0, 1, 2); and 3) definitions for each sample.

    Resources in this dataset:

    • Resource Title: Pooling Allele Frequencies (paf) for all pools and individual animals. File Name: pafAnimal.csv.gz. Resource Description: Pooling Allele Frequencies (paf) for all pools and individual animals computed from normalized intensities for red (X) and green (Y); paf = X / (X + Y).
    • Resource Title: Genotypes for individuals within pools. File Name: g.csv.gz. Resource Description: Genotypes (number of copies of the B (green) allele (0, 1, 2)) for individual bovine animals within pools.
    • Resource Title: Sample Definitions. File Name: XY Data Key.xlsx. Resource Description: Definitions for each sample (both pools and individual animals).
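    The pooling allele frequency is defined directly from the two normalized intensity channels, PAF = X / (X + Y). The sketch below applies that formula per SNP. The intensity values and the function name are hypothetical; the guard against a zero denominator is an added assumption, since the description does not say how such cases are handled.

```python
def pooling_allele_frequency(x, y):
    """PAF = X / (X + Y) from normalized red (X) and green (Y) intensities."""
    total = x + y
    if total <= 0:
        # Assumed handling: no usable signal for this SNP.
        raise ValueError("intensities must sum to a positive value")
    return x / total


# Hypothetical normalized intensities (X, Y) for three SNPs in one pool.
intensities = [(0.30, 0.70), (1.20, 0.40), (0.55, 0.55)]
print([round(pooling_allele_frequency(x, y), 3) for x, y in intensities])
# [0.3, 0.75, 0.5]
```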

  11. Additional file 2 of Meta-analysis of diagnostic cell-free circulating...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Jun 10, 2022
    Sehovic, Emir; Chiorino, Giovanna; Urru, Sara; Doebler, Philipp (2022). Additional file 2 of Meta-analysis of diagnostic cell-free circulating microRNAs for breast cancer detection [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000262918
    Dataset updated
    Jun 10, 2022
    Authors
    Sehovic, Emir; Chiorino, Giovanna; Urru, Sara; Doebler, Philipp
    Description

    Additional file 2:

    • Table S1. Complete list of exclusion reasons and their frequencies.
    • Table S2. Bivariate generalized linear mixed effect model on all reported models, adjusted for covariates.
    • Table S3. Bivariate generalized linear mixed effect model on the most important model of each study, adjusted for covariates.
    • Table S4. Summary of the bivariate analysis on all the reported models and the corresponding subgroup analyses. Subgroups marked with an asterisk (*) do not have a large enough model sample size for the result to be reliable.
    • Table S5. Summary of the bivariate analysis on the most important model of each study and the corresponding subgroup analyses. Subgroups marked with an asterisk (*) do not have a large enough model sample size for the result to be reliable.
    • Table S6. Summary of the univariate (log-DOR) analysis on all the reported models and the corresponding subgroup analysis.
    • Table S7. Summary of the univariate (log-DOR) analysis on the most important model of each study and the corresponding subgroup analysis.

  12. A statistical test and sample size recommendations for comparing community...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated May 30, 2023
    John R. Skalski; Shelby M. Richins; Richard L. Townsend (2023). A statistical test and sample size recommendations for comparing community composition following PCA [Dataset]. http://doi.org/10.1371/journal.pone.0206033
    Available download formats
    docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    John R. Skalski; Shelby M. Richins; Richard L. Townsend
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Many investigations of anthropogenic and natural impacts in ecological systems attempt to detect differences in ecological variables or community composition. Frequently, ordination procedures such as principal components analysis (PCA) or canonical correspondence analysis (CCA) are used to simplify such complex data sets into a set of primary factors that express the variation across the original variables. Scatterplots of the first and second principal components are then used to visually inspect for differences in community composition between treatment groups. We present a multidimensional extension of analysis of variance based on an analysis of distance (ANODIS) that can be used to formally test for differences in community composition using 1, 2, or more dimensions of a PCA or CCA of the original sample observations. The statistical tests of significance are based on F-statistics adapted for the analysis of this multidimensional data. Because the analysis is parametric, power and sample size calculations useful in the design of field studies can be readily computed. The use of ANODIS is illustrated using bivariate PCA scatterplots from three published studies. Statistical power calculations using the noncentral F-distribution are illustrated.

  13. Data from: Applying univariate vs. multivariate statistics to investigate...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 26, 2020
    Gerber, Susanne; Searle-White, Emily; Todorov, Hristo (2020). Applying univariate vs. multivariate statistics to investigate therapeutic efficacy in (pre)clinical trials: A Monte Carlo simulation study on the example of a controlled preclinical neurotrauma trial [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000493539
    Dataset updated
    Mar 26, 2020
    Authors
    Gerber, Susanne; Searle-White, Emily; Todorov, Hristo
    Description

    Background: Small sample sizes combined with multiple correlated endpoints pose a major challenge in the statistical analysis of preclinical neurotrauma studies. The standard approach of applying univariate tests on individual response variables has the advantage of simplicity of interpretation, but it fails to account for the covariance/correlation in the data. In contrast, multivariate statistical techniques might more adequately capture the multi-dimensional pathophysiological pattern of neurotrauma and therefore provide increased sensitivity to detect treatment effects.

    Results: We systematically evaluated the performance of univariate ANOVA, Welch’s ANOVA and linear mixed effects models versus the multivariate techniques, ANOVA on principal component scores and MANOVA tests, by manipulating factors such as sample and effect size, normality and homogeneity of variance in computer simulations. Linear mixed effects models demonstrated the highest power when variance between groups was equal or the variance ratio was 1:2. In contrast, Welch’s ANOVA outperformed the remaining methods under extreme variance heterogeneity. However, power only reached acceptable levels of 80% in the case of large simulated effect sizes and at least 20 measurements per group or moderate effects with at least 40 replicates per group. In addition, we evaluated the capacity of the ordination techniques, principal component analysis (PCA), redundancy analysis (RDA), linear discriminant analysis (LDA), and partial least squares discriminant analysis (PLS-DA), to capture patterns of treatment effects without formal hypothesis testing. While LDA suffered from a high false positive rate due to multicollinearity, PCA, RDA, and PLS-DA were robust, and PLS-DA outperformed PCA and RDA in capturing a true treatment effect pattern.

    Conclusions: Multivariate tests do not provide an appreciable increase in power compared to univariate techniques to detect group differences in preclinical studies. However, PLS-DA seems to be a useful ordination technique to explore treatment effect patterns without formal hypothesis testing.

  14. Air Pollution Forecasting - LSTM Multivariate

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Rupak Roy/ Bob (2022). Air Pollution Forecasting - LSTM Multivariate [Dataset]. https://www.kaggle.com/datasets/rupakroy/lstm-datasets-multivariate-univariate
    Available download formats
    zip (454,764 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Rupak Roy/ Bob
    License

    Open Data Commons Database Contents License (DbCL) v1.0 http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    THE MISSION

    The dataset illustrates how to apply an LSTM architecture that uses multiple variables together to improve forecasting accuracy.

    THE CONTENT

    Air Pollution Forecasting: the Air Quality dataset.

    This is a dataset that reports on the weather and the level of pollution each hour for five years at the US embassy in Beijing, China.

    The data include the date-time, the PM2.5 pollution concentration, and weather information including dew point, temperature, pressure, wind direction, wind speed, and the cumulative number of hours of snow and rain. The complete feature list in the raw data is as follows:

    • No: row number
    • year: year of data in this row
    • month: month of data in this row
    • day: day of data in this row
    • hour: hour of data in this row
    • pm2.5: PM2.5 concentration
    • DEWP: Dew Point
    • TEMP: Temperature
    • PRES: Pressure
    • cbwd: Combined wind direction
    • Iws: Cumulated wind speed
    • Is: Cumulated hours of snow
    • Ir: Cumulated hours of rain

    We can use this data to frame a forecasting problem where, given the weather conditions and pollution for prior hours, we forecast the pollution at the next hour.
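    Framing "given prior hours, forecast the next hour" turns the hourly table into supervised (input, target) pairs with a sliding window. The sketch below does that in plain Python. The function name, the two-column example, and the values are hypothetical; a real pipeline would also encode the categorical cbwd column and scale features before feeding an LSTM.

```python
def frame_supervised(rows, n_lags, target_index):
    """Turn an hourly multivariate series into (inputs, target) pairs.

    rows: list of per-hour feature lists, e.g. [pm2.5, DEWP, TEMP, ...].
    Each input is the flattened previous n_lags hours; the target is the
    column at target_index in the following hour.
    """
    inputs, targets = [], []
    for t in range(n_lags, len(rows)):
        window = [value for row in rows[t - n_lags:t] for value in row]
        inputs.append(window)
        targets.append(rows[t][target_index])
    return inputs, targets


# Hypothetical hours with columns [pm2.5, TEMP] (invented values).
hours = [[100.0, -4.0], [110.0, -4.0], [120.0, -5.0], [130.0, -5.0]]
X, y = frame_supervised(hours, n_lags=2, target_index=0)
print(X)  # [[100.0, -4.0, 110.0, -4.0], [110.0, -4.0, 120.0, -5.0]]
print(y)  # [120.0, 130.0]
```

    For an LSTM, each flattened window would typically be reshaped back to (n_lags, n_features) so the network sees the hours as a sequence.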

  15. Data from: Statistical Power to Detect Genetic (Co)Variance of Complex...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 10, 2014
    Lee, Sang Hong; Vinkhuyzen, Anna A. E.; Hemani, Gibran; Yang, Jian; Wray, Naomi R.; Visscher, Peter M.; Goddard, Michael E.; Chen, Guo-Bo (2014). Statistical Power to Detect Genetic (Co)Variance of Complex Traits Using SNP Data in Unrelated Samples [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001170155
    Dataset updated
    Apr 10, 2014
    Authors
    Lee, Sang Hong; Vinkhuyzen, Anna A. E.; Hemani, Gibran; Yang, Jian; Wray, Naomi R.; Visscher, Peter M.; Goddard, Michael E.; Chen, Guo-Bo
    Description

    We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful to plan experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases) in particular when the traits (diseases) are not measured on the same samples.
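    The statement that the sampling variance is inversely proportional to the number of pairwise contrasts and to the variance of SNP-derived relationships can be turned into a quick planning calculation. The sketch below assumes a proportionality constant of 1 per contrast, i.e. var(ĥ²) ≈ 1 / (n_pairs · var(A_ij)) with n_pairs = N(N − 1)/2, which for large N is 2 / (N² · var(A_ij)). Treat that constant, the function name, and the example inputs as assumptions; the paper's online tool is the authoritative calculator.

```python
import math


def approx_se_snp_h2(n, var_relationship):
    """Approximate standard error of SNP-based heritability (quantitative trait).

    Assumes var(h2_hat) ~= 1 / (n_pairs * var(A_ij)), where n_pairs is the
    number of pairwise contrasts and var(A_ij) the variance of off-diagonal
    SNP-derived relationships; about 2 / (N^2 var(A_ij)) for large N.
    """
    n_pairs = n * (n - 1) / 2
    return math.sqrt(1.0 / (n_pairs * var_relationship))


# Hypothetical design: 10,000 unrelated people, var(A_ij) = 2e-5.
print(round(approx_se_snp_h2(10_000, 2e-5), 4))  # 0.0316
```

    Under this approximation the standard error shrinks in proportion to 1/N, so doubling the sample roughly halves it.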

  16. Some Aspects of the Discrete Wavelet Analysis of Bivariate Spectra for...

    • data.niaid.nih.gov
    • dataverse.harvard.edu
    xls
    Updated Oct 4, 2011
    Joanna Bruzda (2011). Some Aspects of the Discrete Wavelet Analysis of Bivariate Spectra for Business Cycle Synchronisation [Dataset]. http://doi.org/10.7910/DVN/JP6YQZ
    Available download formats
    xls
    Dataset updated
    Oct 4, 2011
    Authors
    Joanna Bruzda
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Euro Area
    Description

    The paper considers some of the issues emerging from the discrete wavelet analysis of popular bivariate spectral quantities such as the coherence and phase spectra and the frequency-dependent time delay. The approach utilised here is based on the maximal overlap discrete Hilbert wavelet transform (MODHWT). Firstly, via a broad set of simulation experiments, we examine the small and large sample properties of two wavelet estimators of the scale-dependent time delay. The estimators are the wavelet cross-correlator and the wavelet phase angle-based estimator. Our results provide some practical guidelines for the empirical examination of short- and medium-term lead-lag relations for octave frequency bands. Further, we point out a deficiency in the implementation of the MODHWT and suggest using a modified implementation scheme, which was proposed earlier in the context of the dual-tree complex wavelet transform. In addition, we show how MODHWT-based wavelet quantities can serve to approximate the Fourier bivariate spectra and discuss issues connected with building confidence intervals for them. The discrete wavelet analysis of coherence and phase angle is illustrated with a scale-dependent examination of business cycle synchronisation between 11 euro zone countries. The study is supplemented by a wavelet analysis of the variance and covariance of the euro zone business cycles. The empirical examination underlines the good localisation properties and high computational efficiency of the wavelet transformations applied and provides new arguments in favour of the endogeneity hypothesis of the optimum currency area criteria as well as the wavelet evidence on dating the Great Moderation in the euro zone.

  17. BLUPF90 Scripts for Genetic Analysis of Goose Steatosis Traits

    • entrepot.recherche.data.gouv.fr
    bin, sh, txt +1
    Updated Jun 4, 2025
    Herve Chapuis; Herve Chapuis (2025). BLUPF90 Scripts for Genetic Analysis of Goose Steatosis Traits [Dataset]. http://doi.org/10.57745/ZCJIUH
    Available download formats: sh(124), bin(11), sh(144), bin(14), bin(1025), sh(118), bin(1478), type/x-r-syntax(1086), bin(441), txt(14)
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    Recherche Data Gouv
    Authors
    Herve Chapuis; Herve Chapuis
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    This repository contains example scripts for estimating genetic parameters with the BLUPF90 software suite. The scripts handle up to four traits simultaneously (out of the 15 available in the dataset data.txt, found at https://doi.org/10.57745/4MI9JN). script1.sh runs renumf90 using the parameter file renum_ex1.par, which processes the traits LW, AFW, CW, and BMW. The model includes the effects of animal, sex, and slaughter date. Optional instructions allow blupf90+ to compute variance ratios and their standard errors. script2.sh follows a similar structure but analyzes the traits LW, BW14r, and BW26, with a different set of fixed effects. script3.sh runs a bivariate analysis that includes a categorical trait (LCAT) describing the liver; it therefore calls gibbsf90+ instead of blupf90+. Note that the missing value code must be 0. gibbs_samples.R is an R program that reads the output of gibbsf90+: given the number of estimated components (here NCOMP = 6), it computes the variance ratios and their posterior distributions.
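    gibbs_samples.R itself is not reproduced here. As a hedged illustration of what "variance ratios and their posterior distributions" means for a bivariate model, the Python sketch below turns posterior draws of (co)variance components into draws of heritabilities and a genetic correlation. It assumes, hypothetically, that the six components (NCOMP = 6) are the three unique entries of a 2x2 additive-genetic covariance matrix followed by those of a 2x2 residual matrix; the actual layout of the gibbsf90+ output may differ, and the numbers in the usage lines are made up.

```python
import numpy as np


def variance_ratios(samples: np.ndarray) -> dict[str, np.ndarray]:
    """Posterior draws of heritabilities and the genetic correlation.

    Hypothetical column layout, one row per retained Gibbs sample:
    [a11, a12, a22, e11, e12, e22], i.e. the upper triangles of the
    2x2 additive-genetic (G) and residual (R) covariance matrices.
    """
    a11, a12, a22, e11, e12, e22 = samples.T
    return {
        "h2_trait1": a11 / (a11 + e11),
        "h2_trait2": a22 / (a22 + e22),
        "r_g": a12 / np.sqrt(a11 * a22),
    }


def summarize(draws: np.ndarray) -> tuple[float, float, float, float]:
    """Posterior mean, SD, and 95% credible-interval bounds."""
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return float(draws.mean()), float(draws.std(ddof=1)), float(lo), float(hi)


# Usage with two made-up draws (illustration only, not real estimates):
draws = np.array([
    [0.4, 0.1, 0.3, 0.6, 0.05, 0.7],
    [0.5, 0.2, 0.2, 0.5, 0.00, 0.8],
])
ratios = variance_ratios(draws)
print(summarize(ratios["h2_trait1"]))
```

    Computing the ratio draw by draw, rather than dividing posterior means, is what yields a full posterior distribution for each ratio.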

  18. Data from: SGS-LTER Paleopedology Study - soil particle size and grain size...

    • catalog.data.gov
    • search.dataone.org
    • +3 more
    Updated Apr 21, 2025
    + more versions
    Agricultural Research Service (2025). SGS-LTER Paleopedology Study - soil particle size and grain size on the Central Plains Experimental Range, Nunn, Colorado, USA 1992- 1993 [Dataset]. https://catalog.data.gov/dataset/sgs-lter-paleopedology-study-soil-particle-size-and-grain-size-on-the-central-plains-1992--12164
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Area covered
    Colorado, Nunn, United States
    Description

    This data package was produced by researchers working on the Shortgrass Steppe Long Term Ecological Research (SGS-LTER) Project, administered at Colorado State University. Long-term datasets and background information (proposals, reports, photographs, etc.) on the SGS-LTER project are contained in a comprehensive project collection within the Digital Collections of Colorado (http://digitool.library.colostate.edu/R/?func=collections&collection_id=3429). The data table and associated metadata document, which is generated in Ecological Metadata Language, may be available through other repositories serving the ecological research community and represent components of the larger SGS-LTER project collection. CPER Paleopedology Study – Particle and Grain Size: Grain size data from 39 pedons were compared with modal fluvial (7) and eolian (3) samples in order to characterize the origin of CPER parent materials and distinguish the origin of CPER geomorphic features. The seven fluvial sites were located along Owl and Eastman Creeks. The three eolian sites were located on the nearest undisputed dune fields, approximately 5 km north of Roggen, CO (Muhs, 1985). For statistical analysis, the sand and coarse silt fractions were shaken for 3 minutes in a nest of half-phi (φ) interval sieves ranging from -1.0 φ (10 mesh) to 4.5 φ (325 mesh). Phi intervals (φ = -log2 of the particle diameter in mm) were used to normalize the particle size data for use in conventional statistics (Krumbein, 1934). The silt and clay fractions were separated by sedimentation using the pipette method. Statistical methods adopted from Folk and Ward (1957) were applied to the -1.0 φ to 7.0 φ fractions using the Sedimentary Petrology Computer Program SEDPET (Warner, 1970) to determine mean grain size (Mz), sorting (Iz), skewness (Skz), and kurtosis (Kz). These parameters were then subjected to univariate and bivariate analysis. The clay fraction was not included in the statistical computations, to avoid skewing the sample toward excessively fine sizes. Additional information and referenced materials can be found at http://hdl.handle.net/10217/85625.

    Resources in this dataset: Resource Title: Website Pointer to html file. File Name: Web Page, url: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-sgs&identifier=168 (webpage with information and links to data files for download).
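    SEDPET itself is not shown here. The Python sketch below covers the two computations the description names: the phi transform φ = -log2(d), with d in millimetres (Krumbein, 1934), and the Folk and Ward (1957) inclusive graphic measures computed from the φ5, φ16, φ25, φ50, φ75, φ84, and φ95 percentiles of the cumulative curve. Reading those percentiles off by linear interpolation is an assumption, "sorting" here is the inclusive graphic standard deviation, and the input curve in the usage lines is invented for illustration.

```python
import numpy as np


def phi(diameter_mm):
    """Krumbein phi scale: phi = -log2(d / 1 mm)."""
    return -np.log2(np.asarray(diameter_mm, dtype=float))


def folk_ward(phi_sizes, cum_percent):
    """Folk and Ward (1957) graphic mean, sorting, skewness, and kurtosis.

    phi_sizes:   phi values in increasing order (coarse to fine)
    cum_percent: cumulative weight percent coarser than each phi value (increasing)
    """
    def pct(q):
        # Phi value at the q-th percentile, by linear interpolation (an assumption).
        return float(np.interp(q, cum_percent, phi_sizes))

    p5, p16, p25, p50, p75, p84, p95 = (pct(q) for q in (5, 16, 25, 50, 75, 84, 95))
    mz = (p16 + p50 + p84) / 3
    sorting = (p84 - p16) / 4 + (p95 - p5) / 6.6
    skew = ((p16 + p84 - 2 * p50) / (2 * (p84 - p16))
            + (p5 + p95 - 2 * p50) / (2 * (p95 - p5)))
    kurt = (p95 - p5) / (2.44 * (p75 - p25))
    return mz, sorting, skew, kurt


# A 10-mesh sieve opening of 2.0 mm corresponds to -1.0 phi:
print(phi(2.0))
# Usage with a made-up, perfectly symmetric cumulative curve:
mz, sorting, skew, kurt = folk_ward([0, 1, 2, 3, 4], [0, 25, 50, 75, 100])
print(mz, sorting, skew, kurt)
```

    A symmetric cumulative curve gives zero graphic skewness, which is a quick sanity check on the formulas.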

  19. Data from: An analysis of a survey for child sexual abuse material seekers...

    • data.europa.eu
    • data.niaid.nih.gov
    • +1 more
    unknown
    Updated Jan 10, 2024
    Zenodo (2024). An analysis of a survey for child sexual abuse material seekers on Tor (N = 11,470) [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-10457588?locale=pl
    Available download formats: unknown (108651)
    Dataset updated
    Jan 10, 2024
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We analyse the responses of users who searched for child sexual abuse material (CSAM) on Tor web search engines. We include responses from all participants who answered our 'Help us to help you' survey between 5 May 2021 and 28 February 2023 (N = 11,470) and compare the tendencies and habits of people who searched for CSAM. The 'Help us to help you' survey consists of 32 questions, takes about 15 to 20 minutes to complete, and offers participants no compensation. We ask CSAM users about their thoughts, feelings, and actions related to their use of CSAM so that we can build an anonymous rehabilitation programme for CSAM users based on cognitive behavioural therapy. For this study, we analysed responses to 12 survey questions. All 12 questions are single-answer questions, i.e., the respondent is asked to pick one option from a predetermined list of answer options. Our sample may represent a specific population, because the demographics of Tor users are probably not representative of all internet users. The participants in the sample are Tor users who (i) conducted a search for CSAM and (ii) opted to complete the survey; they therefore constitute a convenience sample. We analysed the data with both univariate and bivariate methods. The analyses in the main part of the text mainly describe the population seeking CSAM on the Tor network, and these results provide a point of comparison for our other results. The bivariate analyses, on the other hand, deepen the picture of the factors associated with help-seeking for CSAM use. In these analyses, the outcome variable is based on help-seeking, whereas the independent variables were selected to measure both the intensity of CSAM use and the effects of CSAM use on the users themselves.

  20. Fama–French Factors and Portfolios

    • kaggle.com
    zip
    Updated Oct 30, 2025
    Nikita Manaenkov (2025). Fama–French Factors and Portfolios [Dataset]. https://www.kaggle.com/datasets/nikitamanaenkov/famafrench-factors-and-portfolios
    Available download formats: zip (177539895 bytes)
    Dataset updated
    Oct 30, 2025
    Authors
    Nikita Manaenkov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides foundational factor and portfolio return data used in empirical finance and asset pricing research. It contains:

    - Fama–French 3-Factor and 5-Factor models
    - Size (ME), Book-to-Market (B/M), Operating Profitability (OP), and Investment (Inv) portfolios
    - Bivariate portfolios (e.g., 2x3 Size-B/M sorts)
    - Industry portfolio returns

    All data originate from the Kenneth R. French Data Library and are based on the CRSP and Compustat databases. Data are value-weighted and expressed in percentages.
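    As an illustration of how the 2x3 Size-B/M sorts relate to the factors, the sketch below applies the three-factor construction described in the Kenneth R. French Data Library: SMB is the average return of the three small portfolios minus that of the three big ones, and HML is the average of the two value portfolios minus that of the two growth portfolios. The class and function names and the input returns are hypothetical; the five-factor SMB is built differently (it averages over several sorts).

```python
from dataclasses import dataclass


@dataclass
class SixPortfolios:
    """Returns (%) of the six Size x B/M portfolios from a 2x3 sort (hypothetical container)."""
    small_growth: float
    small_neutral: float
    small_value: float
    big_growth: float
    big_neutral: float
    big_value: float


def smb_hml(p: SixPortfolios) -> tuple[float, float]:
    """Return (SMB, HML) using the three-factor construction."""
    smb = ((p.small_value + p.small_neutral + p.small_growth) / 3
           - (p.big_value + p.big_neutral + p.big_growth) / 3)
    hml = (p.small_value + p.big_value) / 2 - (p.small_growth + p.big_growth) / 2
    return smb, hml


# Usage with made-up monthly returns (not real data):
month = SixPortfolios(small_growth=1.0, small_neutral=2.0, small_value=3.0,
                      big_growth=0.5, big_neutral=1.0, big_value=1.5)
smb, hml = smb_hml(month)
print(smb, hml)
```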

    Some files in this dataset contain header comments describing data sources and methodology (as shown below):

    This file was created using the 202508 CRSP database.
    The 1-month TBill rate data until 202405 are from Ibbotson Associates. 
    Starting from 202406, the 1-month TBill rate is from ICE BofA US 1-Month Treasury Bill Index.
    

    These note lines carry no comment prefix, so the comment parameter of pandas.read_csv (which ignores the rest of a line after a given character) cannot remove them here. Skip them instead, either by locating the header line or by counting the note lines:

    Example 1 — Automatically detect header rows:

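    import pandas as pd

    # The CSV files are comma-separated and start with a few free-text note
    # lines; locate the header line that holds the column names.
    file_path = "F-F_Research_Data_5_Factors_2x3.csv"

    with open(file_path) as f:
        lines = f.readlines()

    skip_rows = None
    for i, line in enumerate(lines):
        if "Mkt-RF" in line:
            skip_rows = i
            break
    if skip_rows is None:
        raise ValueError(f"No header line containing 'Mkt-RF' in {file_path}")

    # The first (unnamed) column holds the YYYYMM date; use it as the index.
    # Any block appended below the monthly rows is read as well.
    df = pd.read_csv(file_path, skiprows=skip_rows, sep=",",
                     skipinitialspace=True, index_col=0)
    print(df.head())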
    

    Example 2 — Skip a known number of comment lines manually:

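    # Assumes exactly three note lines precede the column-name header.
    df = pd.read_csv("F-F_Research_Data_5_Factors_2x3.csv", skiprows=3, sep=",", skipinitialspace=True, index_col=0)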
    

    Example 3 — If comments are prefixed (e.g., with #):

    df = pd.read_csv("F-F_Research_Data_5_Factors_2x3.csv", comment="#", sep=",")
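    The examples above read everything below the header line. Kenneth French's CSV files can also contain further blocks after the monthly rows (for instance annual factors, introduced by their own title and header line), which would then end up in the same DataFrame. A minimal sketch that keeps only the monthly block follows; it assumes a header line containing "Mkt-RF", comma-separated fields, and six-digit YYYYMM dates, and the sample text uses placeholder numbers rather than real returns. The conversion from percentages to decimals is optional.

```python
import pandas as pd


def read_monthly_block(text: str, as_decimal: bool = True) -> pd.DataFrame:
    """Parse only the monthly block of a Ken French-style CSV held in a string."""
    lines = text.splitlines()
    start = next((i for i, line in enumerate(lines) if "Mkt-RF" in line), None)
    if start is None:
        raise ValueError("no header line containing 'Mkt-RF'")
    # The header starts with an empty field (the date column has no name).
    columns = [c.strip() for c in lines[start].split(",")[1:]]
    dates, rows = [], []
    for line in lines[start + 1:]:
        fields = [v.strip() for v in line.split(",")]
        if not (len(fields[0]) == 6 and fields[0].isdigit()):
            break  # blank line or a new section ends the monthly block
        dates.append(fields[0])
        rows.append([float(v) for v in fields[1:]])
    index = pd.to_datetime(dates, format="%Y%m").to_period("M")
    df = pd.DataFrame(rows, index=index, columns=columns)
    return df / 100.0 if as_decimal else df


# Placeholder text mimicking the layout (values are not real returns):
SAMPLE = """This file was created using the 202508 CRSP database.
Placeholder note line.

,Mkt-RF,SMB,HML,RF
196307,  1.00,  2.00,  3.00,  0.50
196308, -1.00,  0.00,  1.00,  0.50

 Annual Factors: January-December
,Mkt-RF,SMB,HML,RF
1964,  10.00,  1.00,  2.00,  3.00
"""
df = read_monthly_block(SAMPLE)
print(df)
```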
    

    File Structure Example

    Column    Description
    Mkt-RF    Market excess return
    SMB       Small minus Big (size factor)
    HML       High minus Low (book-to-market factor)
    RMW       Robust minus Weak (profitability factor)
    CMA       Conservative minus Aggressive (investment factor)
    RF        Risk-free rate (1-month Treasury Bill)
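    These columns are typically used as regressors in a time-series factor regression, R_i - RF = α + b·(Mkt-RF) + s·SMB + h·HML + ε, adding RMW and CMA for the five-factor model. The sketch below fits it by ordinary least squares with NumPy; the function name and the synthetic, noise-free test data are illustrative assumptions, and no standard errors are reported.

```python
import numpy as np


def factor_loadings(excess_ret: np.ndarray, factors: np.ndarray) -> tuple[float, np.ndarray]:
    """OLS of portfolio excess returns on factor returns, with an intercept.

    excess_ret: shape (T,), portfolio return minus RF
    factors:    shape (T, K), e.g. columns Mkt-RF, SMB, HML
    Returns the intercept (alpha) and the K slope coefficients.
    """
    X = np.column_stack([np.ones(len(excess_ret)), factors])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    return float(coef[0]), coef[1:]


# Synthetic check: build returns from known loadings and recover them.
rng = np.random.default_rng(1)
f = rng.normal(size=(120, 3))
y = 0.1 + f @ np.array([1.0, 0.5, -0.3])
alpha, betas = factor_loadings(y, f)
print(alpha, betas)
```

    With real data, the portfolio and factor series must be in the same units (both in percent, or both converted to decimals).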
