100+ datasets found
  1. Multivariate analysis for entire sample using logistic regression.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1 more
    Updated Oct 16, 2017
    Cite
    Ramay, Brooke M.; Cerón, Alejandro; Méndez-Alburez, Luis Pablo; Lou-Meda, Randall (2017). Multivariate analysis for entire sample using logistic regression. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001806787
    Explore at:
    Dataset updated
    Oct 16, 2017
    Authors
    Ramay, Brooke M.; Cerón, Alejandro; Méndez-Alburez, Luis Pablo; Lou-Meda, Randall
    Description

    Multivariate analysis for entire sample using logistic regression.

  2. Component loadings for a previously reported real-life example of a...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Alfred Ultsch; Jörn Lötsch (2023). Component loadings for a previously reported real-life example of a principal component analysis performed on the intercorrelation matrix among eight pain threshold measurements ([3]; for comparison, see Table 2 in that publication). [Dataset]. http://doi.org/10.1371/journal.pone.0129767.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alfred Ultsch; Jörn Lötsch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The relevant four principal components (PCs) are given in bold font. Without the present method, only PCs #1 - #3 with eigenvalues > 1 [11,12] could be validly retained. This set of three principal components showed that all different pain measures shared an important common source of variance (PC1); that pain evoked by cold stimuli, with or without sensitization by topical menthol application, by blunt pressure, or by electrical stimuli (5 Hz sine waves) shared a common source of variance (PC2); and that a further common source of variance was shared by pain evoked by heat stimuli, with or without sensitization by topical capsaicin application, or by punctate mechanical pressure. However, by applying the method reported here, PC4 can now also be retained, which singles out heat pain, corresponding to the different pathophysiology underlying heat perception.
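
    The eigenvalue > 1 rule mentioned in this description can be sketched as follows. This is only an illustration of that baseline retention rule on synthetic data, not the authors' new method and not their pain threshold data; the function name `kaiser_retained` and the simulated eight-variable matrix are assumptions made for the example.

```python
import numpy as np

def kaiser_retained(data):
    """Return the correlation-matrix eigenvalues (descending) and how many exceed 1."""
    corr = np.corrcoef(data, rowvar=False)    # variables are in columns
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigvals, int(np.sum(eigvals > 1.0))

# Synthetic stand-in: 200 observations of 8 measures that share one common factor.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
data = common + rng.normal(size=(200, 8))

eigvals, n_keep = kaiser_retained(data)
print(eigvals.round(2), n_keep)
```

    Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 marks a component that explains more variance than a single standardized variable would.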

  3. Additional file 1 of The multivariate analysis of variance as a powerful...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Apr 28, 2022
    Cite
    Ruxton, Graeme D.; Malkemper, E. Pascal; Landler, Lukas (2022). Additional file 1 of The multivariate analysis of variance as a powerful approach for circular data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000445189
    Explore at:
    Dataset updated
    Apr 28, 2022
    Authors
    Ruxton, Graeme D.; Malkemper, E. Pascal; Landler, Lukas
    Description

    Additional file 1. R-code and example data to perform the statistical tests described in the manuscript.

  4. Small-sample confidence intervals for multivariate impulse response...

    • resodate.org
    Updated Oct 6, 2025
    Cite
    Elena Pesavento (2025). Small-sample confidence intervals for multivariate impulse response functions at long horizons (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9zbWFsbHNhbXBsZS1jb25maWRlbmNlLWludGVydmFscy1mb3ItbXVsdGl2YXJpYXRlLWltcHVsc2UtcmVzcG9uc2UtZnVuY3Rpb25zLWF0LWxvbmctaG9yaXpvbnM=
    Explore at:
    Dataset updated
    Oct 6, 2025
    Dataset provided by
    Journal of Applied Econometrics
    ZBW
    ZBW Journal Data Archive
    Authors
    Elena Pesavento
    Description

    Existing methods for constructing confidence bands for multivariate impulse response functions may have poor coverage at long lead times when variables are highly persistent. The goal of this paper is to propose a simple method that is not pointwise and that is robust to the presence of highly persistent processes. We use approximations based on local-to-unity asymptotic theory, and allow the horizon to be a fixed fraction of the sample size. We show that our method has better coverage properties at long horizons than existing methods, and may provide different economic conclusions in empirical applications. We also propose a modification of this method which has good coverage properties at both short and long horizons.

  5. Data from: Applying univariate vs. multivariate statistics to investigate...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 26, 2020
    Cite
    Gerber, Susanne; Searle-White, Emily; Todorov, Hristo (2020). Applying univariate vs. multivariate statistics to investigate therapeutic efficacy in (pre)clinical trials: A Monte Carlo simulation study on the example of a controlled preclinical neurotrauma trial [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000493539
    Explore at:
    Dataset updated
    Mar 26, 2020
    Authors
    Gerber, Susanne; Searle-White, Emily; Todorov, Hristo
    Description

    Background: Small sample sizes combined with multiple correlated endpoints pose a major challenge in the statistical analysis of preclinical neurotrauma studies. The standard approach of applying univariate tests on individual response variables has the advantage of simplicity of interpretation, but it fails to account for the covariance/correlation in the data. In contrast, multivariate statistical techniques might more adequately capture the multi-dimensional pathophysiological pattern of neurotrauma and therefore provide increased sensitivity to detect treatment effects.

    Results: We systematically evaluated the performance of univariate ANOVA, Welch’s ANOVA and linear mixed effects models versus the multivariate techniques, ANOVA on principal component scores and MANOVA tests by manipulating factors such as sample and effect size, normality and homogeneity of variance in computer simulations. Linear mixed effects models demonstrated the highest power when variance between groups was equal or variance ratio was 1:2. In contrast, Welch’s ANOVA outperformed the remaining methods with extreme variance heterogeneity. However, power only reached acceptable levels of 80% in the case of large simulated effect sizes and at least 20 measurements per group or moderate effects with at least 40 replicates per group. In addition, we evaluated the capacity of the ordination techniques, principal component analysis (PCA), redundancy analysis (RDA), linear discriminant analysis (LDA), and partial least squares discriminant analysis (PLS-DA) to capture patterns of treatment effects without formal hypothesis testing. While LDA suffered from a high false positive rate due to multicollinearity, PCA, RDA, and PLS-DA were robust and PLS-DA outperformed PCA and RDA in capturing a true treatment effect pattern.

    Conclusions: Multivariate tests do not provide an appreciable increase in power compared to univariate techniques to detect group differences in preclinical studies. However, PLS-DA seems to be a useful ordination technique to explore treatment effect patterns without formal hypothesis testing.
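
    The kind of Monte Carlo power estimate this abstract describes can be sketched in a few lines. The example below is a simplified, two-group illustration using Welch's t statistic with a critical value taken from the simulated null distribution; it is not the authors' simulation code, and the function names, the normal data model, and the default settings are all assumptions.

```python
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

def simulate_abs_t(rng, n, effect, sd_ratio):
    """One simulated trial: control ~ N(0, 1), treated ~ N(effect, sd_ratio)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(effect, sd_ratio) for _ in range(n)]
    return abs(welch_t(treated, control))

def estimate_power(n, effect, sd_ratio=1.0, n_sim=1000, alpha=0.05, seed=1):
    """Estimate power as the share of simulated trials that exceed the null critical value."""
    rng = random.Random(seed)
    # Critical value from the simulated null distribution (effect = 0).
    null = sorted(simulate_abs_t(rng, n, 0.0, sd_ratio) for _ in range(n_sim))
    critical = null[round((1 - alpha) * n_sim) - 1]
    hits = sum(simulate_abs_t(rng, n, effect, sd_ratio) > critical for _ in range(n_sim))
    return hits / n_sim

power_large = estimate_power(n=20, effect=1.5)
power_null = estimate_power(n=20, effect=0.0)
print(power_large, power_null)
```

    Varying `n`, `effect`, and `sd_ratio` over a grid mirrors, in miniature, the manipulation of sample size, effect size, and variance heterogeneity described above.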

  6. Air Pollution Forecasting - LSTM Multivariate

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Cite
    Rupak Roy/ Bob (2022). Air Pollution Forecasting - LSTM Multivariate [Dataset]. https://www.kaggle.com/datasets/rupakroy/lstm-datasets-multivariate-univariate
    Explore at:
    Available download formats: zip (454764 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Rupak Roy/ Bob
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    THE MISSION

    The story behind this dataset is how to apply an LSTM architecture that uses multiple variables together to improve forecasting accuracy.

    THE CONTENT

    Air Pollution Forecasting: the Air Quality dataset.

    This is a dataset that reports on the weather and the level of pollution each hour for five years at the US embassy in Beijing, China.

    The data includes the date-time, the pollution level (PM2.5 concentration), and the weather information including dew point, temperature, pressure, wind direction, wind speed and the cumulative number of hours of snow and rain. The complete feature list in the raw data is as follows:

    • No: row number
    • year: year of data in this row
    • month: month of data in this row
    • day: day of data in this row
    • hour: hour of data in this row
    • pm2.5: PM2.5 concentration
    • DEWP: Dew Point
    • TEMP: Temperature
    • PRES: Pressure
    • cbwd: Combined wind direction
    • Iws: Cumulated wind speed
    • Is: Cumulated hours of snow
    • Ir: Cumulated hours of rain

    We can use this data to frame a forecasting problem where, given the weather conditions and pollution for prior hours, we forecast the pollution at the next hour.
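
    The forecasting framing described here (prior hours in, next hour's pollution out) can be sketched without any modelling library. The helper name `make_supervised` is hypothetical and the four hourly readings are made-up values that only show the shapes involved; the column names follow the feature list above.

```python
def make_supervised(rows, feature_names, target, n_lags=1):
    """Pair the n_lags hours before time t (features) with the target value at time t."""
    samples = []
    for t in range(n_lags, len(rows)):
        window = rows[t - n_lags:t]  # the n_lags hours immediately before t
        features = [row[name] for row in window for name in feature_names]
        samples.append((features, rows[t][target]))
    return samples

# Made-up hourly readings, not taken from the dataset.
hours = [
    {"pm2.5": 129.0, "DEWP": -16, "TEMP": -4.0, "PRES": 1020.0},
    {"pm2.5": 148.0, "DEWP": -15, "TEMP": -4.0, "PRES": 1020.0},
    {"pm2.5": 159.0, "DEWP": -11, "TEMP": -5.0, "PRES": 1021.0},
    {"pm2.5": 181.0, "DEWP": -7, "TEMP": -5.0, "PRES": 1022.0},
]

samples = make_supervised(hours, ["pm2.5", "DEWP", "TEMP", "PRES"], "pm2.5", n_lags=2)
for features, next_pm25 in samples:
    print(features, "->", next_pm25)
```

    Each sample flattens the lagged hours into one feature vector; a sequence model such as an LSTM would typically keep the window as a (time steps, features) array instead.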

  7. Data from: "Size" and "shape" in the measurement of multivariate...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated Jul 4, 2025
    Cite
    Michael Greenacre (2025). "Size" and "shape" in the measurement of multivariate proximity [Dataset]. http://doi.org/10.5061/dryad.6r5j8
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Michael Greenacre
    Time period covered
    Mar 16, 2018
    Description
    1. Ordination and clustering methods are widely applied to ecological data that are nonnegative, for example species abundances or biomasses. These methods rely on a measure of multivariate proximity that quantifies differences between the sampling units (e.g. individuals, stations, time points), leading to results such as: (i) ordinations of the units, where interpoint distances optimally display the measured differences; (ii) clustering the units into homogeneous clusters; or (iii) assessing differences between pre-specified groups of units (e.g., regions, periods, treatment-control groups). 2. These methods all conceal a fundamental question: To what extent are the differences between the sampling units, computed according to the chosen proximity function, capturing the "size" in the multivariate observations, or their "shape"? "Size" means the overall level of the measurements: for example, some samples contain higher total abundances or more biomass, others less. "Shape" mea...
  8. Accompanying simulated data for "Go multivariate: a Monte Carlo study of a...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    • +1 more
    Updated Mar 25, 2022
    Cite
    Aarts, Emmeke (2022). Accompanying simulated data for "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity" [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_6384006
    Explore at:
    Dataset updated
    Mar 25, 2022
    Dataset provided by
    Mildiner Moraga, Sebastian
    Aarts, Emmeke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

    This repository contains data generated for the manuscript: "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript is not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.

  9. MATLAB Scripts to Partition Multivariate Sedimentary Geochemical Data Sets

    • get.iedadata.org
    xml
    Updated 2012
    Cite
    Murray, Richard; Pisias, Nicklas (2012). MATLAB Scripts to Partition Multivariate Sedimentary Geochemical Data Sets [Dataset]. http://doi.org/10.1594/IEDA/100047
    Explore at:
    Available download formats: xml
    Dataset updated
    2012
    Authors
    Murray, Richard; Pisias, Nicklas
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    Abstract: This contribution provides MATLAB scripts to assist users in factor analysis, constrained least squares regression, and total inversion techniques. These scripts respond to the increased availability of large datasets generated by modern instrumentation, for example, the SedDB database. The download (.zip) includes one descriptive paper (.pdf) and one file of the scripts and example output (.doc). Other Description: Pisias, N. G., R. W. Murray, and R. P. Scudder (2013), Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts, Geochem. Geophys. Geosyst., 14, 4015–4020, doi:10.1002/ggge.20247.

  10. Accompanying simulated data for "Go multivariate: recommendations on...

    • zenodo.org
    zip
    Updated Mar 26, 2022
    + more versions
    Cite
    Sebastian Mildiner Moraga; Emmeke Aarts (2022). Accompanying simulated data for "Go multivariate: recommendations on multilevel hidden Markov models with categorical data of varying complexity" [Dataset]. http://doi.org/10.5281/zenodo.6385197
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 26, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sebastian Mildiner Moraga; Emmeke Aarts
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

    This repository contains data generated for the manuscript: "Go multivariate: recommendations on multilevel hidden Markov models with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript is not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.

  11. Multivariate Time Series Search - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). Multivariate Time Series Search - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/multivariate-time-series-search
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns from these MTS databases which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases, that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem — (1) an R-tree Based Search (RBS) which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several millions of observations. Both these tests show that our algorithms have very high prune rates (>95%) thus needing actual disk access for only less than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never been attempted on datasets of the size we have used in this paper.
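
    To make the query type concrete (a pattern on a subset of variables, with a time delay between them), here is a brute-force baseline. It is not the RBS or LBS algorithm from the paper, which index subsequences to prune most comparisons; the function name `search_mts`, the variable names, and the toy series are assumptions made for the example.

```python
def search_mts(series, query, threshold):
    """Return (start, distance) for every start time whose windows lie within threshold.

    series: dict mapping variable name -> list of floats (equal lengths)
    query:  dict mapping variable name -> (delay, pattern); only listed variables are compared
    """
    length = min(len(values) for values in series.values())
    # Latest start for which every delayed window still fits inside the series.
    last_start = min(length - delay - len(pattern) for delay, pattern in query.values())
    matches = []
    for start in range(last_start + 1):
        total = 0.0
        for name, (delay, pattern) in query.items():
            begin = start + delay
            window = series[name][begin:begin + len(pattern)]
            total += sum((w - p) ** 2 for w, p in zip(window, pattern))
        distance = total ** 0.5  # Euclidean distance over all queried variables
        if distance <= threshold:
            matches.append((start, distance))
    return matches

# Toy three-variable series; the query uses only two of the variables.
series = {
    "altitude": [0, 1, 2, 3, 2, 1, 0, 1, 2, 3],
    "speed": [5, 5, 6, 7, 8, 7, 6, 5, 6, 7],
    "temperature": [20, 20, 21, 21, 22, 22, 21, 21, 20, 20],
}
# A climb in altitude, with a speed rise beginning two steps after the climb starts.
query = {"altitude": (0, [1, 2, 3]), "speed": (2, [7, 8])}

matches = search_mts(series, query, threshold=0.5)
print(matches)
```

    This scan touches every candidate start time, which is exactly the cost that index-based pruning of the kind reported above (>95% prune rates) is meant to avoid.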

  12. Expanded HR Analytics Data Lab

    • kaggle.com
    zip
    Updated Aug 18, 2017
    Cite
    KrisMurphy (2017). Expanded HR Analytics Data Lab [Dataset]. https://www.kaggle.com/krismurphy01/data-lab
    Explore at:
    Available download formats: zip (541119 bytes)
    Dataset updated
    Aug 18, 2017
    Authors
    KrisMurphy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    We are building an HR analytics data set that can be used for building useful reports, understanding the difference between data and information, and multivariate analysis. The data set we are building is similar to that used in several academic reports and what may be found in ERP HR subsystems.

    We will update the sample data set as we gain a better understanding of the data elements using the calculations that exist in scholarly journals. Specifically, we will use the correlation tables to rebuild the data sets.

    Content

    The fields represent a fictitious data set where a survey was taken and actual employee metrics exist for a particular organization. None of this data is real.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Prabhjot Singh contributed a portion of the data (the columns on the right before the survey data was added). https://www.kaggle.com/prabhjotindia https://www.kaggle.com/prabhjotindia/visualizing-employee-data/data

    About this Dataset Why are our best and most experienced employees leaving prematurely? Have fun with this database and try to predict which valuable employees will leave next. Fields in the dataset include:

    Satisfaction Level Last evaluation Number of projects Average monthly hours Time spent at the company Whether they have had a work accident Whether they have had a promotion in the last 5 years Departments Salary

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  13. Examples of applying a multivariate Wilson prior to comparative...

    • zenodo.org
    bin, zip
    Updated Sep 10, 2025
    Cite
    Doeke Hekstra; Harrison K. Wang; Kevin M. Dalton (2025). Examples of applying a multivariate Wilson prior to comparative crystallography data [Dataset]. http://doi.org/10.5281/zenodo.17082201
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Sep 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Doeke Hekstra; Harrison K. Wang; Kevin M. Dalton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains four examples of merging crystallographic intensities with a bivariate prior:

    • time-resolved Laue crystallography of the photoactive yellow protein
    • anomalous diffraction from serial XFEL crystallography of thermolysin
    • anomalous diffraction from Laue crystallography of NaI-soaked lysozyme
    • fragment screening monochromatic data of Nsp3 Mac1

    Additionally, we provide several auxiliary examples:

    • For PYP, an example where we set aside a test fraction to semi-independently optimize the double-Wilson r
    • For lysozyme, two examples, one where we use Laue-DIALS instead of precognition, and another where we set aside the first 90 images to semi-independently optimize the double-Wilson r
    • For thermolysin, an example where we use a bivariate versus a univariate prior as the number of scaled images grows, and another where we set aside the first 395 images to semi-independently optimize the double-Wilson r

    Every example includes scripts to run Careless as well as to analyze the outputs in order to reproduce the figures in the double-Wilson manuscript. For every example, there is a `README.md` that describes the contents of each example folder.

  14. Supplement to Multivariate statistical analysis and partitioning of...

    • get.iedadata.org
    • search.dataone.org
    • +1 more
    xml
    Updated 2014
    Cite
    Murray, Richard; Scudder, Rachel; Pisias, Nicklas (2014). Supplement to Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts [Dataset]. http://doi.org/10.1594/IEDA/100422
    Explore at:
    Available download formats: xml
    Dataset updated
    2014
    Authors
    Murray, Richard; Scudder, Rachel; Pisias, Nicklas
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    Abstract: We present here annotated MATLAB scripts (and specific guidelines for their use) for Q-mode factor analysis, a constrained least squares multiple linear regression technique, and a total inversion protocol, that are based on the well-known approaches taken by Dymond (1981), Leinen and Pisias (1984), Kyte et al. (1993), and their predecessors. Although these techniques have been used by investigators for the past decades, their application has been neither consistent nor transparent, as their code has remained in-house or in formats not commonly used by many of today's researchers (e.g., FORTRAN). In addition to providing the annotated scripts and instructions for use, we include a sample data set for the user to test their own manipulation of the scripts. Other Description: Pisias, N. G., R. W. Murray, and R. P. Scudder (2013), Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts, Geochem. Geophys. Geosyst., 14, 4015–4020, doi:10.1002/ggge.20247.

  15. Example of data.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 2, 2023
    + more versions
    Cite
    Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel (2023). Example of data. [Dataset]. http://doi.org/10.1371/journal.pone.0159649.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example of data.

  16. CLM - Groundwater Chemistry outputs from multivariate statistics

    • data.gov.au
    • researchdata.edu.au
    zip
    Updated Apr 13, 2022
    Cite
    Bioregional Assessment Program (2022). CLM - Groundwater Chemistry outputs from multivariate statistics [Dataset]. https://data.gov.au/data/dataset/4c128b86-1089-4ba9-85f8-76bbd65db396
    Explore at:
    Available download formats: zip (1324446)
    Dataset updated
    Apr 13, 2022
    Dataset authored and provided by
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from "QLD DNRM Hydrochemistry with QA/QC" and "NSW Office of Water Groundwater Quality extract 28_nov_2013" data provided by the Qld DNRM and NSW Office of Water. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    The dataset contains the outputs of a multivariate statistical analysis conducted on groundwater chemistry data for different river basins or sub-regions within the CLM bioregion. The analysis was conducted using Statgraphics software. Only samples that passed the QA/QC checks (e.g. charge balances within ±5%) were included in the analysis.

    Dataset History

    The original datasets were clipped to the CLM bioregion. After an initial data quality check, only those samples that met the criteria (e.g. charge balance within ±5%) were included in the multivariate statistical analysis. Multivariate statistical analysis was conducted on the remaining dataset (i.e. with the samples that did not meet the QA/QC criteria removed), resulting in different groundwater chemistry groups.
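
    A charge balance screen of the kind mentioned here can be sketched as follows. The source does not state the formula that was used, so the example assumes the common definition, 100 × (Σcations − Σanions) / (Σcations + Σanions) with concentrations in meq/L; the function names and the two sample records are made up for illustration.

```python
def charge_balance_error(cations_meq, anions_meq):
    """Charge balance error in percent, assuming concentrations are in meq/L."""
    total_cations = sum(cations_meq.values())
    total_anions = sum(anions_meq.values())
    return 100.0 * (total_cations - total_anions) / (total_cations + total_anions)

def passes_qaqc(sample, limit_percent=5.0):
    """True if the absolute charge balance error is within the given limit."""
    error = charge_balance_error(sample["cations"], sample["anions"])
    return abs(error) <= limit_percent

# Made-up samples in meq/L, not taken from the dataset.
samples = [
    {"id": "A", "cations": {"Na": 4.0, "Ca": 2.0, "Mg": 1.0}, "anions": {"Cl": 4.0, "HCO3": 2.5, "SO4": 0.5}},
    {"id": "B", "cations": {"Na": 5.0, "Ca": 3.0}, "anions": {"Cl": 4.0, "HCO3": 2.0}},
]

kept = [sample["id"] for sample in samples if passes_qaqc(sample)]
print(kept)
```

    Only the samples that pass this screen would go on to the multivariate analysis.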

    The methodology is described in more detail by Raiber et al., (2012).

    M Raiber, PA White, CJ Daughney, C Tschritter, P Davidson (2012). Three-dimensional geological modelling and multivariate statistical analysis of water chemistry data to analyse and visualise aquifer structure and groundwater composition in the Wairau Plain, Marlborough District, New Zealand, Journal of Hydrology 436, 13-34

    Dataset Citation

    Bioregional Assessment Programme (2014) CLM - Groundwater Chemistry outputs from multivariate statistics. Bioregional Assessment Derived Dataset. Viewed 28 September 2017, http://data.bioregionalassessments.gov.au/dataset/4c128b86-1089-4ba9-85f8-76bbd65db396.

    Dataset Ancestors

  17. Accompanying simulated data for "Go multivariate: a Monte Carlo study of a...

    • data-staging.niaid.nih.gov
    Updated Mar 25, 2022
    Cite
    Mildiner Moraga, Sebastian; Aarts, Emmeke (2022). Accompanying simulated data for "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_6384006
    Explore at:
    Dataset updated
    Mar 25, 2022
    Dataset provided by
    Utrecht University
    Authors
    Mildiner Moraga, Sebastian; Aarts, Emmeke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

    This repository contains data generated for the manuscript: "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript are not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.

  18. m

    SPHERE: Students' performance dataset of conceptual understanding,...

    • data.mendeley.com
    Updated Jan 15, 2025
    Cite
    Purwoko Haryadi Santoso (2025). SPHERE: Students' performance dataset of conceptual understanding, scientific ability, and learning attitude in physics education research (PER) [Dataset]. http://doi.org/10.17632/88d7m2fv7p.2
    Explore at:
    Dataset updated
    Jan 15, 2025
    Authors
    Purwoko Haryadi Santoso
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPHERE is a dataset of students' performance in physics education research. It is presented as a multi-domain learning dataset of students’ performance in physics, collected through several research-based assessments (RBAs) established by the physics education research (PER) community. A total of 497 eleventh-grade students took part, from three large public high schools and one small public high school located in a suburban district of a highly populated province in Indonesia. Some variables related to demographics, accessibility to literature resources, and students’ physics identity are also investigated. The RBAs used in this dataset were selected based on concepts learned by the students in the Indonesian physics curriculum. We began by surveying students’ understanding of Newtonian mechanics at the end of the first semester using the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students’ scientific abilities and learning attitude through the Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS), respectively. The conceptual assessments continued in the second semester with the Rotational and Rolling Motion Conceptual Survey (RRMCS), Fluid Mechanics Concept Inventory (FMCI), Mechanical Waves Conceptual Survey (MWCS), Thermal Concept Evaluation (TCE), and Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE to be a valuable dataset for supporting the advancement of the PER field, particularly in quantitative studies. For example, research on machine learning and data mining techniques in PER may be held back by the lack of datasets built specifically for PER studies. SPHERE can be reused as a students’ performance dataset on physics dedicated to PER scholars who wish to apply machine learning techniques in physics education.

  19. Time Series Multivariate Educational Data Analysis

    • kaggle.com
    zip
    Updated Nov 27, 2024
    Cite
    Wayne_127 (2024). Time Series Multivariate Educational Data Analysis [Dataset]. https://www.kaggle.com/wayne127/time-series-multivariate-educational-data-analysis
    Explore at:
    zip (3543111 bytes)
    Dataset updated
    Nov 27, 2024
    Authors
    Wayne_127
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Data Description

    This dataset provides simulated learning behavior data for 15 classes over 148 days, from August 31, 2023, to January 25, 2024. The data includes basic learner information (a total of 1,364 learners), basic information about the exercises (a total of 44 items), and learner submission behavior logs (a total of 232,818 records). All data is provided in CSV format. The dataset contains noise such as missing values, outliers, or data inconsistencies (e.g., invalid classes, missing log entries, etc.), which participants need to identify and handle. The specific fields of the three data tables are described as follows:

    Learner Basic Information Table

    Data_StudentInfo.csv

    Field Name  | Description   | Remarks
    index       | Learner index |
    student_ID  | Learner ID    | Unique identifier
    sex         | Gender        |
    age         | Age           |
    major       | Major         |

    Exercise Basic Information Table

    Data_TitleInfo.csv

    Field Name     | Description          | Remarks
    index          | Exercise index       |
    title_ID       | Exercise ID          | Unique identifier
    score          | Exercise score       |
    knowledge      | Knowledge points     | Each exercise may test multiple knowledge points
    sub_knowledge  | Sub-knowledge points | Knowledge points may have multiple sub-knowledge points

    Learner Submission Behavior Log Information

    The Data_SubmitRecord folder contains the learner submission behavior log data for 15 classes (Class1~Class15). For example, the file SubmitRecord-Class1.csv contains the submission logs for Class 1.

    SubmitRecord-Class1.csv

    Field Name   | Description         | Remarks
    index        | Record index        |
    class        | Class               |
    time         | Log generation time | Timestamp, accurate to the second
    state        | Submission state    | Examples include fully correct, partially correct, etc., with a total of 12 statuses
    score        | Submission score    | Score obtained from test cases
    title_ID     | Exercise ID         | References title_ID in the exercise basic information table
    method       | Language            | Programming language used by the learner
    memory       | Memory              | Unit: KB
    timeconsume  | Time consumed       | Unit: milliseconds
    student_ID   | Learner ID          | References student_ID in the learner basic information table
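
    The three tables above link through student_ID and title_ID, and the description notes that the logs contain noise. The following is a minimal Python sketch of one way to read a submission log and drop obviously unusable rows, using only the standard library. Several things in it are assumptions rather than facts about the dataset: that the class column holds labels of the form "Class1" to "Class15", that the rows in SAMPLE_LOG (including the state, method, and time values) resemble the real files, and that summing each learner's best score per exercise is a useful summary. None of this is the dataset author's own processing.

```python
import csv
import io
from collections import defaultdict

# File names follow the pattern given in the description:
# SubmitRecord-Class1.csv ... SubmitRecord-Class15.csv inside Data_SubmitRecord.
LOG_FILES = [f"Data_SubmitRecord/SubmitRecord-Class{i}.csv" for i in range(1, 16)]

# Assumption: the class column uses labels "Class1" ... "Class15".
VALID_CLASSES = {f"Class{i}" for i in range(1, 16)}

# Columns a row must have filled in to be usable in this sketch.
REQUIRED = ("class", "time", "score", "title_ID", "student_ID")


def read_rows(handle):
    """Parse one CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def clean_submissions(rows, student_ids, title_ids):
    """Keep rows with all required fields, a valid class, a numeric score,
    and IDs that exist in the learner and exercise tables."""
    kept = []
    for row in rows:
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            continue  # missing value
        if row["class"] not in VALID_CLASSES:
            continue  # invalid class label
        try:
            float(row["score"])
        except ValueError:
            continue  # non-numeric score
        if row["student_ID"] not in student_ids or row["title_ID"] not in title_ids:
            continue  # dangling reference
        kept.append(row)
    return kept


def best_score_per_student(rows):
    """For each learner, sum the best score reached on each exercise."""
    best = defaultdict(dict)
    for row in rows:
        student, title = row["student_ID"], row["title_ID"]
        score = float(row["score"])
        best[student][title] = max(best[student].get(title, 0.0), score)
    return {s: sum(per_title.values()) for s, per_title in best.items()}


# Hypothetical rows for illustration only; real values will differ.
SAMPLE_LOG = """index,class,time,state,score,title_ID,method,memory,timeconsume,student_ID
0,Class1,1693440000,s1,1,q1,m1,320,4,a
1,Class1,1693440060,s2,3,q1,m1,324,3,a
2,class99,1693440100,s2,3,q1,m1,324,3,a
3,Class1,1693440200,s1,,q2,m1,300,5,b
4,Class1,1693440300,s2,2,q2,m2,310,5,b
"""

rows = read_rows(io.StringIO(SAMPLE_LOG))
clean = clean_submissions(rows, student_ids={"a", "b"}, title_ids={"q1", "q2"})
totals = best_score_per_student(clean)
print(len(rows), len(clean), totals)
```

    With the real files, read_rows would be called on each path in LOG_FILES opened with open(path, newline=""), and student_ids and title_ids would be taken from Data_StudentInfo.csv and Data_TitleInfo.csv. Outliers in memory or timeconsume are not handled here and would need their own rule.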

    "Analysis for Insight: Visual Analysis Challenge of Multivariate Time-series Educational Data"

    NorthClass is a renowned higher education training institution offering over 100 courses across a wide range of disciplines, including literature, science, engineering, medicine, economics, and management. With approximately 300,000 registered learners, the institution has created a flexible and convenient learning environment by providing high-quality educational services.

    To keep up with the trends of the digital age and enhance its market competitiveness in the technology sector, NorthClass has developed a programming course. Learners are required to complete designated programming tasks during the course, with the opportunity for multiple attempts and submissions to ensure mastery and application of the learned knowledge. At the end of the course, the institution collected learners' time-series learning data to evaluate whether the teaching outcomes met predefined standards and requirements.

    To optimize teaching resources and improve the quality of instruction, the institution plans to establish a specialized "Innovative Learning Development Group." This group will explore how to leverage next-generation AI technologies to empower education and better cultivate innovative talent capable of meeting the demands of the modern era.

    Visualization and visual analysis utilize the high-bandwidth capabilities of human visual perception to transform complex time-series learning behavior data into graphical representations. These techniques enable the diagnosis and analysis of learners' knowledge mastery levels, dynamic tracking of the evolution of learning behaviors, and identification and analysis of potential factors causing learning difficulties.

    As a member of the Innovative Learning Development Group, your task is to design and implement a ...

  20. f

    Multivariate analysis on attitudes binary variable (positive versus negative...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Sep 25, 2019
    Cite
    Tinland, Aurlie; Manning, Rachel; Vargas-Moniz, Maria; Ornelas, Jose; Bernad, Roberto; Wolf, Judith; Kallmen, Hakan; Auquier, Pascal; Bokszczanin, Anna; Petit, Junie; Spinnewijn, Freek; Loubiere, Sandrine; Santinello, Massimo (2019). Multivariate analysis on attitudes binary variable (positive versus negative attitudes) (N = 4,670, weighted sample). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000069656
    Explore at:
    Dataset updated
    Sep 25, 2019
    Authors
    Tinland, Aurlie; Manning, Rachel; Vargas-Moniz, Maria; Ornelas, Jose; Bernad, Roberto; Wolf, Judith; Kallmen, Hakan; Auquier, Pascal; Bokszczanin, Anna; Petit, Junie; Spinnewijn, Freek; Loubiere, Sandrine; Santinello, Massimo
    Description

    Multivariate analysis on attitudes binary variable (positive versus negative attitudes) (N = 4,670, weighted sample).
