34 datasets found
  1. An example of combining ANOVA terms for bivariate principle component data...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 24, 2018
    Cite
    Skalski, John R.; Townsend, Richard L.; Richins, Shelby M. (2018). An example of combining ANOVA terms for bivariate principle component data to create the ANODIS F-statistic where N is the total number of samples drawn and K, the number of assemblages compared. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000666955
    Dataset updated
    Oct 24, 2018
    Authors
    Skalski, John R.; Townsend, Richard L.; Richins, Shelby M.
    Description

    An example of combining ANOVA terms for bivariate principle component data to create the ANODIS F-statistic where N is the total number of samples drawn and K, the number of assemblages compared.

  2. Data from: Distance Covariance, Independence, and Pairwise Differences

    • tandf.figshare.com
    txt
    Updated Jan 24, 2025
    Cite
    Jakob Raymaekers; Peter J. Rousseeuw (2025). Distance Covariance, Independence, and Pairwise Differences [Dataset]. http://doi.org/10.6084/m9.figshare.26169340.v1
    Available download formats
    txt
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Taylor & Francis: https://taylorandfrancis.com/
    Authors
    Jakob Raymaekers; Peter J. Rousseeuw
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Distance covariance (Székely, Rizzo, and Bakirov) is a fascinating recent notion, which is popular as a test for dependence of any type between random variables X and Y. This approach deserves to be touched upon in modern courses on mathematical statistics. It makes use of distances of the type |X−X′| and |Y−Y′|, where (X′,Y′) is an independent copy of (X, Y). This raises natural questions about independence of variables like X−X′ and Y−Y′, about the connection between cov(|X−X′|,|Y−Y′|) and the covariance between doubly centered distances, and about necessary and sufficient conditions for independence. We show some basic results and present a new and nontechnical counterexample to a common fallacy, which provides more insight. We also show some motivating examples involving bivariate distributions and contingency tables, which can be used as didactic material for introducing distance correlation.
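
    The doubly centered distances mentioned above can be made concrete with a minimal sketch of the squared sample distance covariance in its V-statistic form. This is an illustration and not code from the dataset; the function names and the toy data (y = x² on a symmetric grid) are assumptions chosen so that the ordinary covariance is zero while the distance covariance is not.

```python
def centered_distances(values):
    """Return the doubly centered matrix of pairwise distances |v_i - v_j|."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    n = len(x)
    a, b = centered_distances(x), centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


def pearson_covariance(x, y):
    """Ordinary sample covariance (divisor n), for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n


# Hypothetical toy data: y is a deterministic function of x, yet uncorrelated with it.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi**2 for xi in x]

print("Pearson covariance:", pearson_covariance(x, y))
print("Squared distance covariance:", distance_covariance_sq(x, y))
```

    In this toy case the ordinary covariance vanishes while the distance covariance stays positive, which is the kind of dependence the description says the measure is designed to detect.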

  3. Survey Data of the socio-demographic, economic and water source types that...

    • zenodo.org
    • datadryad.org
    bin, csv
    Updated Jun 4, 2022
    Cite
    Shewayiref Geremew Gebremichael (2022). Survey Data of the socio-demographic, economic and water source types that influences HHs drinking water supply [Dataset]. http://doi.org/10.5061/dryad.mw6m905w8
    Available download formats
    bin, csv
    Dataset updated
    Jun 4, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Shewayiref Geremew Gebremichael
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Background: Clean water is essential to healthy human life and wellbeing. More recently, rapid population growth, high illiteracy rates, lack of sustainable development, and climate change have posed a global challenge in developing countries. The discontinuity of drinking water supply forces households either to use unsafe water storage materials or to use water from unsafe sources. The present study aimed to identify the determinants of water source types, use, quality of water, and sanitation perception of physical parameters among urban households in North-West Ethiopia.

    Methods: A community-based cross-sectional study was conducted among households from February to March 2019. A pretested, structured, interview-based questionnaire was used to collect the data. Households were sampled randomly, in proportion to the number of households in each kebele. MS Excel and R version 3.6.2 were used to enter and analyze the data, respectively. Descriptive statistics (frequencies and percentages) were used to summarize the sample data for each predictor variable. Both bivariate and multivariate logistic regressions were used to assess the association between independent and response variables.

    Results: Four hundred eighteen (418) households participated. Of these, 78.95% used improved and 21.05% used unimproved drinking water sources. Households' drinking water sources were significantly associated with the age of the participant (χ² = 20.392, df = 3), educational status (χ² = 19.358, df = 4), source of income (χ² = 21.777, df = 3), monthly income (χ² = 13.322, df = 3), availability of additional facilities (χ² = 98.144, df = 7), cleanness status (χ² = 42.979, df = 4), scarcity of water (χ² = 5.1388, df = 1) and family size (χ² = 9.934, df = 2). The logistic regression analysis also indicated that these factors significantly determine the water source types used by the households. Factors such as availability of toilet facility, household member type, and sex of the head of the household were not significantly associated with drinking water sources.

    Conclusion: The use of drinking water from improved sources was determined by different demographic, socio-economic, sanitation, and hygiene-related factors. Therefore, the local, regional, and national governments and other supporting organizations should improve the accessibility and adequacy of drinking water from improved sources in the area.
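
    As a reading aid for the chi-square statistics reported in the Results, the sketch below compares each reported value with the upper 5% critical value of the chi-square distribution for the stated degrees of freedom. The 5% level is an assumption (the description does not state the significance level used), the critical values are standard table values, and the dictionary and function names are illustrative.

```python
# Upper 5% critical values of the chi-square distribution (standard table values).
# The 5% level is an assumption; the description does not state the level used.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 7: 14.067}

# (chi-square statistic, degrees of freedom) as reported in the Results paragraph.
REPORTED = {
    "age of the participant": (20.392, 3),
    "educational status": (19.358, 4),
    "source of income": (21.777, 3),
    "monthly income": (13.322, 3),
    "availability of additional facilities": (98.144, 7),
    "cleanness status": (42.979, 4),
    "scarcity of water": (5.1388, 1),
    "family size": (9.934, 2),
}


def exceeds_critical(statistic, df, table=CRITICAL_5PCT):
    """True if the statistic is larger than the tabulated critical value."""
    return statistic > table[df]


for factor, (stat, df) in REPORTED.items():
    verdict = "exceeds" if exceeds_critical(stat, df) else "does not exceed"
    print(f"{factor}: chi2 = {stat}, df = {df} {verdict} {CRITICAL_5PCT[df]}")
```

    Under that assumed level, every reported statistic lies above its critical value, which is consistent with the associations being described as significant.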

  4. Example of data.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 2, 2023
    + more versions
    Cite
    Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel (2023). Example of data. [Dataset]. http://doi.org/10.1371/journal.pone.0159649.t001
    Available download formats
    xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example of data.

  5. Examples of applying a multivariate Wilson prior to comparative...

    • zenodo.org
    bin, zip
    Updated Sep 10, 2025
    Cite
    Doeke Hekstra; Harrison K. Wang; Kevin M. Dalton (2025). Examples of applying a multivariate Wilson prior to comparative crystallography data [Dataset]. http://doi.org/10.5281/zenodo.17082201
    Available download formats
    zip, bin
    Dataset updated
    Sep 10, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Doeke Hekstra; Harrison K. Wang; Kevin M. Dalton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains four examples of merging crystallographic intensities with a bivariate prior:

    • time-resolved Laue crystallography of the photoactive yellow protein
    • anomalous diffraction from serial XFEL crystallography of thermolysin
    • anomalous diffraction from Laue crystallography of NaI-soaked lysozyme
    • fragment screening monochromatic data of Nsp3 Mac1

    Additionally, we provide several auxiliary examples:

    • For PYP, an example where we set aside a test fraction to semi-independently optimize the double-Wilson r
    • For lysozyme, two examples: one where we use Laue-DIALS instead of precognition, and another where we set aside the first 90 images to semi-independently optimize the double-Wilson r
    • For thermolysin, an example where we use a bivariate versus a univariate prior as the number of scaled images grows, and another where we set aside the first 395 images to semi-independently optimize the double-Wilson r

    Every example includes scripts to run Careless as well as to analyze the outputs in order to reproduce the figures in the double-Wilson manuscript. For every example, there is a `README.md` that describes the contents of each example folder.

  6. Data from: A Graphical Goodness-of-Fit Test for Dependence Models in Higher...

    • tandf.figshare.com
    application/gzip
    Updated May 30, 2023
    Cite
    Marius Hofert; Martin Mächler (2023). A Graphical Goodness-of-Fit Test for Dependence Models in Higher Dimensions [Dataset]. http://doi.org/10.6084/m9.figshare.1067049.v2
    Available download formats
    application/gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Marius Hofert; Martin Mächler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article introduces a graphical goodness-of-fit test for copulas in more than two dimensions. The test is based on pairs of variables and can thus be interpreted as a first-order approximation of the underlying dependence structure. The idea is to first transform pairs of data columns with the Rosenblatt transform to bivariate standard uniform distributions under the null hypothesis. This hypothesis can be graphically tested with a matrix of bivariate scatterplots, Q-Q plots, or other transformations. Furthermore, additional information can be encoded as background color, such as measures of association or (approximate) p-values of tests of independence. The proposed goodness-of-fit test is designed as a basic graphical tool for detecting deviations from a postulated, possibly high-dimensional, dependence model. Various examples are given and the methodology is applied to a financial dataset. An implementation is provided by the R package copula. Supplementary material for this article is available online, which provides the R package copula and reproduces all the graphical results of this article.
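
    The Rosenblatt step described above can be illustrated for a single pair of columns. The sketch below is not the R package copula and not the authors' code: it assumes a bivariate Clayton copula (chosen only because its conditional distribution has a closed form), a hypothetical parameter theta = 2, and illustrative function names. Data are simulated from the model by conditional inversion, so the transform should return the independent uniforms that generated them.

```python
import random


def clayton_conditional_cdf(u, v, theta):
    """C(v | u) for the Clayton copula: the second Rosenblatt coordinate."""
    return u ** (-theta - 1) * (u ** -theta + v ** -theta - 1) ** (-1 / theta - 1)


def rosenblatt_pair(u, v, theta):
    """Map (u, v) to a pair that is independent uniform if the Clayton model holds."""
    return u, clayton_conditional_cdf(u, v, theta)


def sample_clayton(n, theta, rng):
    """Draw n pairs by conditional inversion; also return the uniforms used."""
    pairs, innovations = [], []
    for _ in range(n):
        u = 1.0 - rng.random()  # lies in (0, 1], which avoids division by zero
        w = 1.0 - rng.random()
        v = ((w ** (-theta / (1 + theta)) - 1) * u ** -theta + 1) ** (-1 / theta)
        pairs.append((u, v))
        innovations.append(w)
    return pairs, innovations


rng = random.Random(0)
theta = 2.0  # hypothetical dependence parameter
pairs, innovations = sample_clayton(1000, theta, rng)
transformed = [rosenblatt_pair(u, v, theta) for u, v in pairs]

# Under the correct model the second coordinate reproduces the generating uniform.
max_error = max(abs(t[1] - w) for t, w in zip(transformed, innovations))
print("largest deviation from the generating uniforms:", max_error)
```

    With real data the generating uniforms are unknown, so the transformed pairs would instead be inspected for departures from bivariate uniformity, for example in a scatterplot as the description suggests.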

  7. Music & Affect 2020 Dataset Study 2.csv

    • psycharchives.org
    Updated Sep 17, 2020
    + more versions
    Cite
    (2020). Music & Affect 2020 Dataset Study 2.csv [Dataset]. https://www.psycharchives.org/handle/20.500.12034/3089
    Dataset updated
    Sep 17, 2020
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset for: Leipold, B. & Loepthien, T. (2021). Attentive and emotional listening to music: The role of positive and negative affect. Jahrbuch Musikpsychologie, 30. https://doi.org/10.5964/jbdgm.78

    In a cross-sectional study, associations of global affect with two ways of listening to music – attentive–analytical listening (AL) and emotional listening (EL) – were examined. More specifically, the degrees to which AL and EL are differentially correlated with positive and negative affect were examined. In Study 1, a sample of 1,291 individuals responded to questionnaires on listening to music, positive affect (PA), and negative affect (NA). We used the PANAS, which measures PA and NA as high arousal dimensions. AL was positively correlated with PA, EL with NA. Moderation analyses showed stronger associations between PA and AL when NA was low. Study 2 (499 participants) differentiated between three facets of affect and focused, in addition to PA and NA, on the role of relaxation. Similar to the findings of Study 1, AL was correlated with PA, EL with NA and PA. Moderation analyses indicated that the degree to which PA is associated with an individual's tendency to listen to music attentively depends on their degree of relaxation. In addition, the correlation between pleasant activation and EL was stronger for individuals who were more relaxed; for individuals who were less relaxed the correlation between unpleasant activation and EL was stronger. In sum, the results demonstrate not only simple bivariate correlations, but also that the expected associations vary, depending on the different affective states. We argue that the results reflect a dual function of listening to music, which includes emotional regulation and information processing.

    Dataset Study 2

  8. Data from: Bivariate Residual Plots With Simulation Polygons

    • tandf.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Rafael A. Moral; John Hinde; Clarice G. B. Demétrio (2023). Bivariate Residual Plots With Simulation Polygons [Dataset]. http://doi.org/10.6084/m9.figshare.9116864
    Available download formats
    zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis: https://taylorandfrancis.com/
    Authors
    Rafael A. Moral; John Hinde; Clarice G. B. Demétrio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    When using univariate models, goodness of fit can be assessed through many different methods, including graphical tools such as half-normal plots with a simulation envelope. This is straightforward due to the notion of ordering of a univariate sample, which can readily reveal possible outliers. In the bivariate case, however, it is often difficult to detect extreme points and verify whether a sample of residuals is a reasonable realization from a fitted model. We propose a new framework, implemented as the bivrp R package, available on CRAN. Our framework uses the same principles of the simulation envelope in a half-normal plot, but as a simulation polygon for each point in a bivariate sample. By using algorithms of convex hull construction and polygon area reduction, we describe how our method works and illustrate its functionality with examples using simulated bivariate normal data and real bivariate count data. We show how different model diagnostics can produce different results and pinpoint potential drawbacks of our approach, such as the limitations in terms of computational burden. Supplementary materials for this article are available online.
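
    The convex hull construction mentioned above can be sketched in a few lines. This is not the bivrp implementation and it omits the polygon area reduction step entirely; it only shows, under stated assumptions, how a hull built from simulated bivariate points could be used to check whether an observed point falls inside. The hull routine is Andrew's monotone chain, and the simulated points, the observed point, and all function names are hypothetical.

```python
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(sequence):
        chain = []
        for p in sequence:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # the last point reappears as the start of the other half

    return half_hull(pts) + half_hull(reversed(pts))


def inside_hull(hull, point):
    """True if point lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


# Hypothetical stand-in for residual pairs simulated from a fitted model.
rng = random.Random(1)
simulated = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
polygon = convex_hull(simulated)

observed = (0.1, -0.2)  # hypothetical observed residual pair
print("hull vertices:", len(polygon))
print("observed point inside polygon:", inside_hull(polygon, observed))
```

    A point that falls outside its polygon would be the bivariate counterpart of a point leaving the simulation envelope in a half-normal plot.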

  9. MASEM Dataset on Educational AI Technology Adoption among Students(from 2020...

    • data.mendeley.com
    Updated Oct 15, 2025
    Cite
    Researcher 1 (2025). MASEM Dataset on Educational AI Technology Adoption among Students(from 2020 to June 2025). [Dataset]. http://doi.org/10.17632/t8ns6fdky2.5
    Dataset updated
    Oct 15, 2025
    Authors
    Researcher 1
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports a meta-analytic structural equation modelling (MASEM) study investigating the factors influencing students’ behavioural intention to use educational AI (EAI) technologies. The research integrates constructs from the Technology Acceptance Model (TAM), Theory of Planned Behaviour (TPB), and Artificial Intelligence Literacy (AIL), aiming to resolve inconsistencies in previous studies and improve theoretical understanding of EAI technology adoption.

    Research Hypotheses

    The study hypothesized that:

    • Students’ behavioural intention (INT) to use EAI technologies is influenced by perceived usefulness (PU), perceived ease of use (PEU), attitude (ATT), subjective norm (SN), and perceived behavioural control (PBC), as described in TAM and TPB.
    • AI literacy (AIL) directly and indirectly predicts PU, PEU, ATT, and INT.
    • These relationships are moderated by contextual factors such as academic level (K–12 vs. higher education) and regional economic development (developed vs. developing countries).

    What the Data Shows

    The meta-analytic dataset comprises 166 empirical studies involving over 69,000 participants. It includes pairwise Pearson correlations among seven constructs (PU, PEU, ATT, SN, PBC, INT, AIL) and is used to compute a pooled correlation matrix. This matrix was then used to test three models via MASEM:

    • A baseline TAM-TPB model
    • An internal-extended model with additional TPB internal paths
    • An AIL-integrated extended model

    The AIL-integrated model achieved the best fit (CFI = 0.997, RMSEA = 0.053) and explained 62.3% of the variance in behavioural intention.

    Notable Findings

    • AI literacy (AIL) is the strongest predictor of intention to use EAI technologies (Total Effect = 0.408).
    • PU, ATT, and SN also significantly influence intention.
    • The effect of PEU on intention is fully mediated by PU and ATT.
    • Moderation analysis showed that the relationships differ between developed and developing countries and between K–12 and higher education populations.

    How the Data Can Be Interpreted and Used

    • The dataset includes bivariate correlations between variables, publication metadata, sample sizes, coding information, and reliability values (e.g., CR scores).
    • Suitable for replication of MASEM procedures, moderation analysis, and meta-regression.
    • Researchers may use it to test additional theoretical models or assess the influence of new moderators (e.g., AI tool type).
    • Educators and policymakers can leverage insights from the meta-analytic results to inform AI literacy training and technology adoption strategies.

  10. Fama–French Factors and Portfolios

    • kaggle.com
    zip
    Updated Oct 30, 2025
    Cite
    Nikita Manaenkov (2025). Fama–French Factors and Portfolios [Dataset]. https://www.kaggle.com/datasets/nikitamanaenkov/famafrench-factors-and-portfolios
    Available download formats
    zip (177539895 bytes)
    Dataset updated
    Oct 30, 2025
    Authors
    Nikita Manaenkov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides foundational factor and portfolio return data used in empirical finance and asset pricing research. It contains:

    • Fama–French 3-Factor and 5-Factor models
    • Size (ME), Book-to-Market (B/M), Operating Profitability (OP), and Investment (Inv) portfolios
    • Bivariate portfolios (e.g., 2x3 Size-B/M sorts)
    • Industry portfolio returns

    All data originate from the Kenneth R. French Data Library and are based on CRSP and Compustat databases. Data are value-weighted and expressed in percentages.

    Some files in this dataset contain header comments describing data sources and methodology (as shown below):

    This file was created using the 202508 CRSP database.
    The 1-month TBill rate data until 202405 are from Ibbotson Associates. 
    Starting from 202406, the 1-month TBill rate is from ICE BofA US 1-Month Treasury Bill Index.
    

    To read such files correctly in Python (pandas), skip the header comments before parsing. The comment parameter of read_csv ignores lines that start with a given character, but the comment lines here carry no such prefix, so they have to be skipped by row count instead:

    Example 1 — Automatically detect header rows:

    import pandas as pd
    
    file_path = "F-F_Research_Data_5_Factors_2x3.csv"
    
    with open(file_path) as f:
      lines = f.readlines()
    
    # Find the header line (column names); every line above it is commentary
    skip_rows = None
    for i, line in enumerate(lines):
      if "Mkt-RF" in line:
        skip_rows = i
        break
    if skip_rows is None:
      raise ValueError("no header line containing 'Mkt-RF' found")
    
    # sep=r"\s+" assumes whitespace-delimited columns; use sep="," if the file is comma-delimited
    df = pd.read_csv(file_path, skiprows=skip_rows, sep=r"\s+")
    print(df.head())
    

    Example 2 — Skip a known number of comment lines manually:

    df = pd.read_csv("F-F_Research_Data_5_Factors_2x3.csv", skiprows=3, sep=r"\s+")
    

    Example 3 — If comments are prefixed (e.g., with #):

    df = pd.read_csv("F-F_Research_Data_5_Factors_2x3.csv", comment="#", sep=",")
    

    File Structure Example

    Column    Description
    Mkt-RF    Market excess return
    SMB       Small minus Big (size factor)
    HML       High minus Low (book-to-market factor)
    RMW       Robust minus Weak (profitability factor)
    CMA       Conservative minus Aggressive (investment factor)
    RF        Risk-free rate (1-month Treasury Bill)
  11. Data from: Applying univariate vs. multivariate statistics to investigate...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 26, 2020
    Cite
    Gerber, Susanne; Searle-White, Emily; Todorov, Hristo (2020). Applying univariate vs. multivariate statistics to investigate therapeutic efficacy in (pre)clinical trials: A Monte Carlo simulation study on the example of a controlled preclinical neurotrauma trial [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000493539
    Dataset updated
    Mar 26, 2020
    Authors
    Gerber, Susanne; Searle-White, Emily; Todorov, Hristo
    Description

    Background: Small sample sizes combined with multiple correlated endpoints pose a major challenge in the statistical analysis of preclinical neurotrauma studies. The standard approach of applying univariate tests on individual response variables has the advantage of simplicity of interpretation, but it fails to account for the covariance/correlation in the data. In contrast, multivariate statistical techniques might more adequately capture the multi-dimensional pathophysiological pattern of neurotrauma and therefore provide increased sensitivity to detect treatment effects.

    Results: We systematically evaluated the performance of univariate ANOVA, Welch’s ANOVA and linear mixed effects models versus the multivariate techniques, ANOVA on principal component scores and MANOVA tests by manipulating factors such as sample and effect size, normality and homogeneity of variance in computer simulations. Linear mixed effects models demonstrated the highest power when variance between groups was equal or variance ratio was 1:2. In contrast, Welch’s ANOVA outperformed the remaining methods with extreme variance heterogeneity. However, power only reached acceptable levels of 80% in the case of large simulated effect sizes and at least 20 measurements per group or moderate effects with at least 40 replicates per group. In addition, we evaluated the capacity of the ordination techniques, principal component analysis (PCA), redundancy analysis (RDA), linear discriminant analysis (LDA), and partial least squares discriminant analysis (PLS-DA) to capture patterns of treatment effects without formal hypothesis testing. While LDA suffered from a high false positive rate due to multicollinearity, PCA, RDA, and PLS-DA were robust and PLS-DA outperformed PCA and RDA in capturing a true treatment effect pattern.

    Conclusions: Multivariate tests do not provide an appreciable increase in power compared to univariate techniques to detect group differences in preclinical studies. However, PLS-DA seems to be a useful ordination technique to explore treatment effect patterns without formal hypothesis testing.

  12. Data from: Identifying stationary phases in multivariate time series for...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Dec 2, 2019
    Cite
    Rémi Patin; Marie-Pierre Etienne; Emilie Lebarbier; Simon Benhamou; Simon Chamaillé‐Jammes (2019). Identifying stationary phases in multivariate time series for highlighting behavioural modes and home range settlements [Dataset]. http://doi.org/10.5061/dryad.2j63369
    Available download formats
    zip
    Dataset updated
    Dec 2, 2019
    Dataset provided by
    Institut de recherche mathématique de Rennes
    Centre d'Écologie Fonctionnelle et Évolutive
    Authors
    Rémi Patin; Marie-Pierre Etienne; Emilie Lebarbier; Simon Benhamou; Simon Chamaillé‐Jammes
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Hwange National Park
    Description
    1. Recent advances in bio-logging open promising perspectives in the study of animal movements at numerous scales. It is now possible to record time-series of animal locations and ancillary data (e.g. activity level derived from on-board accelerometers) over extended areas and long durations with a high spatial and temporal resolution. Such time-series are often piecewise stationary, as the animal may alternate between different stationary phases (i.e. characterised by a specific mean and variance of some key parameter for limited periods). Identifying when these phases start and end is a critical first step to understand the dynamics of the underlying movement processes.
    2. We introduce a new segmentation-clustering method we called segclust2d (available as an R package at cran.r-project.org/package=segclust2d). It can segment bi- (or more generally multi-) variate time-series and possibly cluster the various segments obtained, corresponding to different phases assumed to be stationary. This method is easy to use, as it only requires specifying a minimum segment length (to prevent over-segmentation), based on biological rather than statistical considerations.
    3. This method can be applied to bivariate piecewise time-series of any nature. We focus here on two types of time-series related to animal movement, corresponding to (i) at large scale, series of bivariate coordinates of relocations, to highlight temporary home ranges, and (ii) at smaller scale, bivariate series derived from relocations data, such as speed and turning angle, to highlight different behavioural modes such as transit, feeding and resting.
    4. Using computer simulations, we show that segclust2d can rival and even outperform previous, more complex methods, which were specifically developed to highlight changes in movement modes or home range shifts (based on Hidden Markov and Ornstein-Uhlenbeck modelling), which, contrary to our method, usually require the user to provide relevant initial guesses to be efficient. Furthermore we demonstrate it on actual examples involving a zebra's small scale movements and an elephant's large scale movements, to illustrate how various movement modes and home range shifts, respectively, can be identified. 15-Aug-2019
  13. BLUPF90 Scripts for Genetic Analysis of Goose Steatosis Traits

    • entrepot.recherche.data.gouv.fr
    bin, sh, txt +1
    Updated Jun 4, 2025
    Cite
    Herve Chapuis (2025). BLUPF90 Scripts for Genetic Analysis of Goose Steatosis Traits [Dataset]. http://doi.org/10.57745/ZCJIUH
    Available download formats
    sh(124), bin(11), sh(144), bin(14), bin(1025), sh(118), bin(1478), type/x-r-syntax(1086), bin(441), txt(14)
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    Recherche Data Gouv
    Authors
    Herve Chapuis
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    This repository contains example scripts for estimating genetic parameters using the BLUPF90 software suite. The scripts handle up to four traits simultaneously (from the 15 available in the dataset data.txt found at https://doi.org/10.57745/4MI9JN ). script1.sh runs renumf90 using the parameter file renum_ex1.par. This file processes the traits LW, AFW, CW, and BMW. The model includes the effects of animal, sex, and slaughter date. Optional instructions allow blupf90+ to compute variance ratios and their standard errors. script2.sh follows a similar structure but analyzes the traits LW, BW14r, and BW26. In this case, the fixed effects used in the model are different. script3.sh runs a bivariate analysis that uses a categorical variable (LCAT) to describe the liver. Hence, it calls gibbsf90+ instead of blupf90+. Pay attention to the missing value code, which must be 0. gibbs_samples.R is an R program that reads the output from gibbsf90+. One must provide the number of estimated components (here NCOMP = 6), and the program computes the variance ratios and their posterior distributions.

  14. Roadside Noise Level Dataset with Labels

    • kaggle.com
    zip
    Updated Feb 5, 2025
    Cite
    荒川由人 (2025). Roadside Noise Level Dataset with Labels [Dataset]. https://www.kaggle.com/datasets/arakawayuito/roadside-noise-level-dataset-with-labels
    Available download formats
    zip (665030 bytes)
    Dataset updated
    Feb 5, 2025
    Authors
    荒川由人
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset is a univariate time-series dataset that records environmental noise levels along roads. It can be used for anomaly detection and forecasting tasks. The dataset includes numerical noise level data along with corresponding anomaly labels.

    📉**Data Details**

    • Number of Samples: 417,000
    • Sampling Period: 200ms (Noise levels were recorded at 200-millisecond intervals.)
    • Column Information:
      • noise level (float64): Noise level (dB)
      • label (int64): Anomaly label
        • 0: Road traffic noise (normal traffic noise, not an anomaly)
        • 1: Non-road traffic noise (classified as anomalous noise, e.g., bird chirping, construction noise, sirens, etc.)

    In this dataset, normal road traffic noise is assigned the label 0, while other anomalous sounds (non-road traffic noise) are assigned the label 1. This dataset can be used for noise analysis and anomaly detection in accordance with environmental standards.

    Note: The ground-truth (teacher) labels of the noise level data may not fully reflect fine variations in sound and may therefore contain some error. For example, even within a segment labeled as an anomaly, the anomalous sound may be present in some periods and absent in others.

    💻**Usage**

    This dataset can be utilized in the following research and experimental applications:

    Time-Series Forecasting

    • Predicting future noise levels using past noise data
    • Can be used as training data for time-series models such as LSTM, Transformer, ARIMA, etc.

    Anomaly Detection

    • Classifying normal road traffic noise and anomalous sounds
    • Enables the construction of anomaly detection models using label 0 (normal noise) and label 1 (anomalous noise)

    Environmental Noise Analysis

    • Analyzing variation patterns of urban road noise
    • Data analysis for noise regulations and environmental standards

    📋**Data Format**

    • Filename: noise_level_data.csv
    • Format: CSV
    • Features per sample:
      • Each row contains one noise level value and its corresponding label
      • Consecutive data points can be treated as a time series

    💡**Usage Example**

    import pandas as pd
    # Load the data
    df = pd.read_csv("noise_level_data.csv")
    # Check the first few rows
    print(df.head())
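
Because samples are 200 ms apart, a row index maps directly to elapsed time, and runs of identical labels can be collapsed into segments. The following is a minimal sketch, assuming the rows are stored in recording order. The helper names `elapsed_seconds` and `label_segments` are illustrative, and `toy_labels` is an invented sequence standing in for the real `label` column.

```python
from itertools import groupby

SAMPLING_PERIOD_MS = 200   # from the dataset description
N_SAMPLES = 417_000        # from the dataset description


def elapsed_seconds(row_index):
    """Seconds since the first sample, assuming rows are in recording order."""
    return row_index * SAMPLING_PERIOD_MS / 1000


def label_segments(labels):
    """Collapse consecutive equal labels into (label, start_row, length) runs."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, length))
        start += length
    return segments


# Total recording length implied by the sample count and sampling period.
total_hours = elapsed_seconds(N_SAMPLES) / 3600
print(f"total duration: {total_hours:.2f} h")

# Invented label sequence (not taken from the real file).
toy_labels = [0, 0, 0, 1, 1, 0, 0, 1]
for label, start, length in label_segments(toy_labels):
    duration_s = length * SAMPLING_PERIOD_MS / 1000
    print(f"label={label} starts at {elapsed_seconds(start)} s, lasts {duration_s} s")
```

With the real file, `df["label"].tolist()` could be passed to `label_segments` in place of `toy_labels`, assuming the column is named `label` as listed above.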
    
  15. Data from: Testing the association of phenotypes with polyploidy: An example...

    • datadryad.org
    • search.dataone.org
    • +1more
    zip
    Updated Feb 24, 2017
    Rosana Zenil-Ferguson; José M. Ponciano; J. Gordon Burleigh (2017). Testing the association of phenotypes with polyploidy: An example using herbaceous and woody eudicots [Dataset]. http://doi.org/10.5061/dryad.6g2c7
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 24, 2017
    Dataset provided by
    Dryad
    Authors
    Rosana Zenil-Ferguson; José M. Ponciano; J. Gordon Burleigh
    Time period covered
    Feb 24, 2017
    Description

    BiChroM Raw R Files
    1. Dataset and tree
    2. Raw R files for optimizations
    3. Full model optimizations
    4. Reduced model optimizations
    5. Profile rhoH and Profile rhoW
    6. Bivariate profile rhoqH and Bivariate profile rhoqW
    7. Raw R files for simulations
    8. Simulations number of taxa
    9. Simulations for tree height
    BiChroMRawRfiles.zip

  16. Data from: Robust Maximum Association Estimators

    • search.datacite.org
    • tandf.figshare.com
    Updated Sep 29, 2021
    Taylor & Francis (2021). Robust Maximum Association Estimators [Dataset]. http://doi.org/10.6084/m9.figshare.2082718.v1
    Explore at:
    Dataset updated
    Sep 29, 2021
    Dataset provided by
    DataCitehttps://www.datacite.org/
    Taylor & Francis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The maximum association between two multivariate variables X and Y is defined as the maximal value that a bivariate association measure between one-dimensional projections α^t X and β^t Y can attain. Taking the Pearson correlation as projection index results in the first canonical correlation coefficient. We propose to use more robust association measures, such as Spearman’s or Kendall’s rank correlation, or association measures derived from bivariate scatter matrices. We study the robustness of the proposed maximum association measures and the corresponding estimators of the coefficients yielding the maximum association. In the important special case of Y being univariate, maximum rank correlation estimators yield regression estimators that are invariant against monotonic transformations of the response. We obtain asymptotic variances for this special case. It turns out that maximum rank correlation estimators combine good efficiency and robustness properties. Simulations and a real data example illustrate the robustness and the power for handling nonlinear relationships of these estimators. Supplementary materials for this article are available online.
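
To make the idea concrete for the special case of a univariate Y, the sketch below searches over unit directions in the plane for the projection of a two-dimensional X whose Spearman rank correlation with Y is largest. It is a crude grid search written for illustration, not the estimator or algorithm from the article. The function names, the simulated data, and the 1-degree grid are all assumptions.

```python
import math
import random


def ranks(values):
    """Ranks 1..n of the values (assumes no ties, as with continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def spearman(u, v):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(u), ranks(v))


def max_rank_association(xs, y, n_angles=180):
    """Grid search over directions (cos t, sin t); returns (rho, angle in degrees)."""
    best_rho, best_angle = -1.0, None
    for k in range(n_angles):
        t = math.pi * k / n_angles
        proj = [math.cos(t) * x1 + math.sin(t) * x2 for x1, x2 in xs]
        rho = abs(spearman(proj, y))
        if rho > best_rho:
            best_rho, best_angle = rho, math.degrees(t)
    return best_rho, best_angle


# Simulated data: Y is a monotone but nonlinear function of x1 + x2.
random.seed(0)
xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
y = [math.exp(x1 + x2) for x1, x2 in xs]

best_rho, best_angle = max_rank_association(xs, y)
print(f"max |Spearman| = {best_rho:.3f} at {best_angle:.0f} degrees")
```

Because ranks are unchanged by monotone transformations of the response, the search recovers the direction of x1 + x2 even though Y depends on it exponentially, which illustrates the invariance property mentioned above.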

  17. Citation Trends for "Interpreting Measured Serial Correlation In Univariate...

    • shibatadb.com
    Updated Oct 22, 2025
    Yubetsu (2025). Citation Trends for "Interpreting Measured Serial Correlation In Univariate Time Series Analysis, With An Example From The New York Stock Exchange" [Dataset]. https://www.shibatadb.com/article/5sdxUfa9
    Explore at:
    Dataset updated
    Oct 22, 2025
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    1995 - 1996
    Area covered
    New York
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Interpreting Measured Serial Correlation In Univariate Time Series Analysis, With An Example From The New York Stock Exchange".

  18. Data from: Evaluating Window Size Effects on Univariate Time Series...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jul 5, 2024
    Freitas, João David; Ponte, Caio; Bomfim, Rafael; Caminha, Carlos (2024). Data from: Evaluating Window Size Effects on Univariate Time Series Forecasting with Machine Learning [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_12665354
    Explore at:
    Dataset updated
    Jul 5, 2024
    Dataset provided by
    Universidade de Fortaleza
    Universidade Federal do Ceará
    Authors
    Freitas, João David; Ponte, Caio; Bomfim, Rafael; Caminha, Carlos
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the realm of time series prediction modeling, the window size (w) is a critical hyperparameter that determines the number of time units included in each example provided to a learning model. This hyperparameter is crucial because it allows the learning model to recognize both long-term and short-term trends, as well as seasonal patterns, while reducing sensitivity to random noise. This study aims to elucidate the impact of window size on the performance of machine learning algorithms in univariate time series forecasting tasks. To achieve this, we employed 40 time series from two different domains, conducting experiments with varying window sizes using four types of machine learning algorithms: Bagging, Boosting, Stacking, and a Recurrent Neural Network (RNN) architecture. The results reveal that increasing the window size generally enhances the evaluation metric values up to a stabilization point, beyond which further increases do not significantly improve predictive accuracy. This stabilization effect was observed in both domains when w values exceeded 100 time steps. Moreover, the study found that RNN architectures do not consistently outperform ensemble models in various univariate time series forecasting scenarios.
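
The role of the window size w can be illustrated with a short sketch that turns a univariate series into training examples, each holding w consecutive values and the value that follows them. The function name `make_windows` and the toy series are assumptions for illustration and are not taken from the study.

```python
def make_windows(series, w):
    """Return (inputs, targets): each input holds w consecutive values,
    and the target is the value that immediately follows them."""
    if w < 1 or w >= len(series):
        raise ValueError("w must satisfy 1 <= w < len(series)")
    n_examples = len(series) - w
    inputs = [series[i:i + w] for i in range(n_examples)]
    targets = [series[i + w] for i in range(n_examples)]
    return inputs, targets


# Toy series (invented values).
series = [10, 11, 13, 12, 15, 16, 18]
X, y = make_windows(series, w=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

A larger w gives each example more history but leaves fewer examples, since a series of length n yields n - w windows.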

  19. Air Pollution Forecasting - LSTM Multivariate

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Rupak Roy/ Bob (2022). Air Pollution Forecasting - LSTM Multivariate [Dataset]. https://www.kaggle.com/datasets/rupakroy/lstm-datasets-multivariate-univariate
    Explore at:
    zip(454764 bytes)Available download formats
    Dataset updated
    Jan 20, 2022
    Authors
    Rupak Roy/ Bob
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    THE MISSION

    This dataset is meant to show how an LSTM architecture can use multiple variables together to improve forecasting accuracy.

    THE CONTENT

    Air Pollution Forecasting: the Air Quality dataset.

    This is a dataset that reports on the weather and the level of pollution each hour for five years at the US embassy in Beijing, China.

    The data includes the date-time, the pollution level (PM2.5 concentration), and weather information including dew point, temperature, pressure, wind direction, wind speed, and the cumulative number of hours of snow and rain. The complete feature list in the raw data is as follows:

    • No: row number
    • year: year of data in this row
    • month: month of data in this row
    • day: day of data in this row
    • hour: hour of data in this row
    • pm2.5: PM2.5 concentration
    • DEWP: Dew Point
    • TEMP: Temperature
    • PRES: Pressure
    • cbwd: Combined wind direction
    • Iws: Cumulated wind speed
    • Is: Cumulated hours of snow
    • Ir: Cumulated hours of rain

    We can use this data and frame a forecasting problem where, given the weather conditions and pollution for prior hours, we forecast the pollution at the next hour.
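
The next-hour framing can be sketched as pairing each row's features with the PM2.5 value of the following hour. The example below is a minimal illustration using only a subset of the listed columns. The row values are invented, and the helper names `timestamp` and `next_hour_pairs` are assumptions. Pairs that span a gap in the hourly record are skipped.

```python
from datetime import datetime, timedelta

# Invented rows using a subset of the listed columns (not real measurements).
rows = [
    {"year": 2010, "month": 1, "day": 2, "hour": 0, "pm2.5": 50.0, "TEMP": -4.0},
    {"year": 2010, "month": 1, "day": 2, "hour": 1, "pm2.5": 60.0, "TEMP": -4.0},
    {"year": 2010, "month": 1, "day": 2, "hour": 2, "pm2.5": 55.0, "TEMP": -5.0},
    {"year": 2010, "month": 1, "day": 2, "hour": 4, "pm2.5": 70.0, "TEMP": -5.0},
]


def timestamp(row):
    """Combine the year, month, day and hour columns into one datetime."""
    return datetime(row["year"], row["month"], row["day"], row["hour"])


def next_hour_pairs(rows, feature_names, target="pm2.5"):
    """Pair features at hour t with the target at hour t + 1.

    Consecutive rows that are not exactly one hour apart are skipped.
    """
    pairs = []
    for prev, nxt in zip(rows, rows[1:]):
        if timestamp(nxt) - timestamp(prev) == timedelta(hours=1):
            features = [prev[name] for name in feature_names]
            pairs.append((features, nxt[target]))
    return pairs


pairs = next_hour_pairs(rows, ["pm2.5", "TEMP"])
for features, target in pairs:
    print(features, "->", target)
```

An LSTM would typically take several prior hours per example rather than one, so this single-step pairing is only the simplest version of the framing.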

  20. Youth Justice Policy Environments and Their Effects on Youth Confinement...

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Mar 12, 2025
    Office of Juvenile Justice and Delinquency Prevention (2025). Youth Justice Policy Environments and Their Effects on Youth Confinement Rates, United States, 1996-2016 [Dataset]. https://catalog.data.gov/dataset/youth-justice-policy-environments-and-their-effects-on-youth-confinement-rates-united-1996-2a380
    Explore at:
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    Office of Juvenile Justice and Delinquency Preventionhttp://ojjdp.gov/
    Area covered
    United States
    Description

    This study was conducted to address the dropping rates in residential placements of adjudicated youth after the 1990s. Policymakers, advocates, and researchers began to attribute the decline to reform measures and proposed that this was the cause of the drop seen in historic national crime. In response, researchers set out to use state-level data on economic factors, crime rates, political ideology scores, and youth justice policies and practices to test the association between the youth justice policy environment and recent reductions in out-of-home placements for adjudicated youth. This data collection contains two files: a multivariate analysis file and a bivariate analyses file. In the multivariate file, the aim was to assess the impact of the progressive policy characteristics on the dependent variable, youth confinement. In the bivariate analyses file (Wave 1-Wave 10), the aim was to assess the states as divided into 2 groups across all 16 dichotomized variables that comprised the progressive policy scale: those with more progressive youth justice environments and those with less progressive or punitive environments. Some examples of these dichotomized variables include purpose clause, courtroom shackling, and competency standard.
