100+ datasets found

Multivariate analysis for entire sample using logistic regression.
datasetcatalog.nlm.nih.gov
figshare.com
+1more
Updated Oct 16, 2017
Cite
Ramay, Brooke M.; Cerón, Alejandro; Méndez-Alburez, Luis Pablo; Lou-Meda, Randall (2017). Multivariate analysis for entire sample using logistic regression. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001806787
Explore at:
Dataset updated
Oct 16, 2017
Authors
Ramay, Brooke M.; Cerón, Alejandro; Méndez-Alburez, Luis Pablo; Lou-Meda, Randall
Description
Multivariate analysis for entire sample using logistic regression.
Component loadings for a previously reported real-life example of a...
plos.figshare.com
xls
Updated May 31, 2023
Cite
Alfred Ultsch; Jörn Lötsch (2023). Component loadings for a previously reported real-life example of a principal component analysis performed on the intercorrelation matrix among eight pain threshold measurements ([3]; for comparison, see Table 2 in that publication). [Dataset]. http://doi.org/10.1371/journal.pone.0129767.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0129767.t002
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Alfred Ultsch; Jörn Lötsch
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The relevant four principal components (PCs) are given in bold font. Without the present method, only PCs #1 - #3 with eigenvalues > 1 [11,12] could be validly retained. This set of three principal components showed that all the different pain measures shared an important common source of variance (PC1); that pain evoked by cold stimuli, with or without sensitization by topical menthol application, by blunt pressure, or by electrical stimuli (5 Hz sine waves) shared a common source of variance (PC2); and that a further common source of variance was shared by pain evoked by heat stimuli, with or without sensitization by topical capsaicin application, or by punctate mechanical pressure. However, by applying the method reported here, PC4 can now also be retained, which singles out heat pain, corresponding to the different pathophysiology underlying heat perception.
Component loadings for a previously reported real-life example of a principal component analysis performed on the intercorrelation matrix among eight pain threshold measurements ([3]; for comparison, see Table 2 in that publication).
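The eigenvalue > 1 retention rule mentioned in this description can be sketched as follows. This is only an illustration on synthetic data, not the authors' method or their pain-threshold data: the function name `kaiser_pca`, the two-factor data generator, and every numeric setting are assumptions made for the example, and it requires NumPy.

```python
import numpy as np


def kaiser_pca(data):
    """PCA on the correlation matrix with the eigenvalue > 1 retention rule."""
    corr = np.corrcoef(data, rowvar=False)      # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    n_retained = int(np.sum(eigvals > 1.0))
    return eigvals, loadings, n_retained


# Synthetic stand-in data (assumption): 200 subjects, 8 measures, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = rng.normal(size=(2, 8))
data = latent @ weights + 0.5 * rng.normal(size=(200, 8))

eigvals, loadings, n_retained = kaiser_pca(data)
print("eigenvalues:", np.round(eigvals, 3))
print("components with eigenvalue > 1:", n_retained)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, a threshold of 1 keeps only components that explain more variance than a single standardized variable.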
Additional file 1 of The multivariate analysis of variance as a powerful...
datasetcatalog.nlm.nih.gov
springernature.figshare.com
Updated Apr 28, 2022
Cite
Ruxton, Graeme D.; Malkemper, E. Pascal; Landler, Lukas (2022). Additional file 1 of The multivariate analysis of variance as a powerful approach for circular data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000445189
Explore at:
Dataset updated
Apr 28, 2022
Authors
Ruxton, Graeme D.; Malkemper, E. Pascal; Landler, Lukas
Description
Additional file 1. R-code and example data to perform the statistical tests described in the manuscript.
Small-sample confidence intervals for multivariate impulse response...
resodate.org
Updated Oct 6, 2025
Cite
Elena Pesavento (2025). Small-sample confidence intervals for multivariate impulse response functions at long horizons (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9zbWFsbHNhbXBsZS1jb25maWRlbmNlLWludGVydmFscy1mb3ItbXVsdGl2YXJpYXRlLWltcHVsc2UtcmVzcG9uc2UtZnVuY3Rpb25zLWF0LWxvbmctaG9yaXpvbnM=
Explore at:
Dataset updated
Oct 6, 2025
Dataset provided by
Journal of Applied Econometrics
ZBW
ZBW Journal Data Archive
Authors
Elena Pesavento
Description
Existing methods for constructing confidence bands for multivariate impulse response functions may have poor coverage at long lead times when variables are highly persistent. The goal of this paper is to propose a simple method that is not pointwise and that is robust to the presence of highly persistent processes. We use approximations based on local-to-unity asymptotic theory, and allow the horizon to be a fixed fraction of the sample size. We show that our method has better coverage properties at long horizons than existing methods, and may provide different economic conclusions in empirical applications. We also propose a modification of this method which has good coverage properties at both short and long horizons.
Data from: Applying univariate vs. multivariate statistics to investigate...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Mar 26, 2020
Cite
Gerber, Susanne; Searle-White, Emily; Todorov, Hristo (2020). Applying univariate vs. multivariate statistics to investigate therapeutic efficacy in (pre)clinical trials: A Monte Carlo simulation study on the example of a controlled preclinical neurotrauma trial [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000493539
Explore at:
Dataset updated
Mar 26, 2020
Authors
Gerber, Susanne; Searle-White, Emily; Todorov, Hristo
Description
Background

Small sample sizes combined with multiple correlated endpoints pose a major challenge in the statistical analysis of preclinical neurotrauma studies. The standard approach of applying univariate tests on individual response variables has the advantage of simplicity of interpretation, but it fails to account for the covariance/correlation in the data. In contrast, multivariate statistical techniques might more adequately capture the multi-dimensional pathophysiological pattern of neurotrauma and therefore provide increased sensitivity to detect treatment effects.

Results

We systematically evaluated the performance of univariate ANOVA, Welch’s ANOVA and linear mixed effects models versus the multivariate techniques, ANOVA on principal component scores and MANOVA tests by manipulating factors such as sample and effect size, normality and homogeneity of variance in computer simulations. Linear mixed effects models demonstrated the highest power when variance between groups was equal or variance ratio was 1:2. In contrast, Welch’s ANOVA outperformed the remaining methods with extreme variance heterogeneity. However, power only reached acceptable levels of 80% in the case of large simulated effect sizes and at least 20 measurements per group or moderate effects with at least 40 replicates per group. In addition, we evaluated the capacity of the ordination techniques, principal component analysis (PCA), redundancy analysis (RDA), linear discriminant analysis (LDA), and partial least squares discriminant analysis (PLS-DA) to capture patterns of treatment effects without formal hypothesis testing. While LDA suffered from a high false positive rate due to multicollinearity, PCA, RDA, and PLS-DA were robust and PLS-DA outperformed PCA and RDA in capturing a true treatment effect pattern.

Conclusions

Multivariate tests do not provide an appreciable increase in power compared to univariate techniques to detect group differences in preclinical studies. However, PLS-DA seems to be a useful ordination technique to explore treatment effect patterns without formal hypothesis testing.
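To illustrate how a Monte Carlo power estimate of the kind described here works, the sketch below simulates a simplified two-group design, where Welch's ANOVA reduces to Welch's t-test. It is not the authors' simulation code: the function name `welch_power`, the normal data generator, the number of simulations, and the chosen effect sizes are all assumptions for the example, which relies on NumPy and SciPy.

```python
import numpy as np
from scipy import stats


def welch_power(n_per_group, effect_size, sd_ratio=1.0,
                n_sims=2000, alpha=0.05, seed=0):
    """Estimate the power of Welch's t-test by simulation.

    effect_size is the mean shift of the treated group in units of the
    control group's standard deviation; sd_ratio is treated SD / control SD.
    """
    rng = np.random.default_rng(seed)
    control = rng.normal(0.0, 1.0, size=(n_sims, n_per_group))
    treated = rng.normal(effect_size, sd_ratio, size=(n_sims, n_per_group))
    # equal_var=False selects Welch's unequal-variance test; one test per row.
    _, p_values = stats.ttest_ind(control, treated, axis=1, equal_var=False)
    return float(np.mean(p_values < alpha))


# Power as a function of group size for a hypothetical large effect
# with unequal variances between groups.
for n in (10, 20, 40):
    print(n, welch_power(n, effect_size=0.8, sd_ratio=2.0))
```

The estimated power is simply the fraction of simulated experiments in which the test rejects at the chosen alpha, so with an effect size of zero the same function estimates the false positive rate.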
Air Pollution Forecasting - LSTM Multivariate
kaggle.com
zip
Updated Jan 20, 2022
Cite
Rupak Roy/ Bob (2022). Air Pollution Forecasting - LSTM Multivariate [Dataset]. https://www.kaggle.com/datasets/rupakroy/lstm-datasets-multivariate-univariate
Explore at:
Available download formats: zip (454764 bytes)
Dataset updated
Jan 20, 2022
Authors
Rupak Roy/ Bob
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
THE MISSION

The story behind the dataset is how to apply an LSTM architecture that uses multiple variables together to improve forecasting accuracy.

THE CONTENT

Air Pollution Forecasting: the Air Quality dataset.

This dataset reports the weather and the level of pollution each hour for five years at the US embassy in Beijing, China.

The data includes the date-time, the pollution level (PM2.5 concentration), and weather information including dew point, temperature, pressure, wind direction, wind speed and the cumulative number of hours of snow and rain. The complete feature list in the raw data is as follows:

No: row number
year: year of data in this row
month: month of data in this row
day: day of data in this row
hour: hour of data in this row
pm2.5: PM2.5 concentration
DEWP: Dew Point
TEMP: Temperature
PRES: Pressure
cbwd: Combined wind direction
Iws: Cumulated wind speed
Is: Cumulated hours of snow
Ir: Cumulated hours of rain

We can use this data to frame a forecasting problem where, given the weather conditions and pollution for prior hours, we forecast the pollution at the next hour.
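The forecasting framing described above (prior hours as inputs, next-hour pollution as the target) can be sketched as a sliding-window transformation. This is a minimal illustration, not code from the dataset author: the helper name `make_supervised`, the choice of four columns, and the sample values are made up for the example.

```python
def make_supervised(rows, target_index, n_lags=1):
    """Turn an hourly multivariate series into (features, target) pairs.

    Each sample uses the previous n_lags rows, flattened in time order,
    to predict the target column of the current row.
    """
    samples = []
    for t in range(n_lags, len(rows)):
        features = [value for past in rows[t - n_lags:t] for value in past]
        target = rows[t][target_index]
        samples.append((features, target))
    return samples


# Hypothetical hourly rows: (pm2.5, DEWP, TEMP, PRES). Values are invented.
hours = [
    (129.0, -16, -4.0, 1020.0),
    (148.0, -15, -4.0, 1020.0),
    (159.0, -11, -5.0, 1021.0),
    (181.0, -7, -5.0, 1022.0),
]

pairs = make_supervised(hours, target_index=0, n_lags=2)
for features, target in pairs:
    print(features, "->", target)
```

With `n_lags=2`, each sample holds eight input values (two hours of four variables), which is the kind of multivariate window an LSTM would typically receive after reshaping to (time steps, features).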
Data from: "Size" and "shape" in the measurement of multivariate...
search.dataone.org
data.niaid.nih.gov
+1more
Updated Jul 4, 2025
Cite
Michael Greenacre (2025). "Size" and "shape" in the measurement of multivariate proximity [Dataset]. http://doi.org/10.5061/dryad.6r5j8
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.6r5j8
Dataset updated
Jul 4, 2025
Dataset provided by
Dryad Digital Repository
Authors
Michael Greenacre
Time period covered
Mar 16, 2018
Description
Ordination and clustering methods are widely applied to ecological data that are nonnegative, for example species abundances or biomasses. These methods rely on a measure of multivariate proximity that quantifies differences between the sampling units (e.g. individuals, stations, time points), leading to results such as: (i) ordinations of the units, where interpoint distances optimally display the measured differences; (ii) clustering the units into homogeneous clusters; or (iii) assessing differences between pre-specified groups of units (e.g., regions, periods, treatment-control groups). These methods all conceal a fundamental question: To what extent are the differences between the sampling units, computed according to the chosen proximity function, capturing the "size" in the multivariate observations, or their "shape"? "Size" means the overall level of the measurements: for example, some samples contain higher total abundances or more biomass, others less. "Shape" mea...
Accompanying simulated data for "Go multivariate: a Monte Carlo study of a...
nde-dev.biothings.io
data.niaid.nih.gov
+1more
Updated Mar 25, 2022
Cite
Aarts, Emmeke (2022). Accompanying simulated data for "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity" [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_6384006
Explore at:
Dataset updated
Mar 25, 2022
Dataset provided by
Mildiner Moraga, Sebastian
Aarts, Emmeke
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

This repository contains data generated for the manuscript: "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript is not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.
MATLAB Scripts to Partition Multivariate Sedimentary Geochemical Data Sets
get.iedadata.org
xml
Updated 2012
Cite
Murray, Richard; Pisias, Nicklas (2012). MATLAB Scripts to Partition Multivariate Sedimentary Geochemical Data Sets [Dataset]. http://doi.org/10.1594/IEDA/100047
Explore at:
Available download formats: xml
Unique identifier
https://doi.org/10.1594/IEDA/100047
Dataset updated
2012
Authors
Murray, Richard; Pisias, Nicklas
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
Abstract: This contribution provides MATLAB scripts to assist users in factor analysis, constrained least squares regression, and total inversion techniques. These scripts respond to the increased availability of large datasets generated by modern instrumentation, for example, the SedDB database. The download (.zip) includes one descriptive paper (.pdf) and one file of the scripts and example output (.doc). Other Description: Pisias, N. G., R. W. Murray, and R. P. Scudder (2013), Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts, Geochem. Geophys. Geosyst., 14, 4015–4020, doi:10.1002/ggge.20247.
Accompanying simulated data for "Go multivariate: recommendations on...
zenodo.org
zip
Updated Mar 26, 2022
+ more versions
Cite
Sebastian Mildiner Moraga; Emmeke Aarts (2022). Accompanying simulated data for "Go multivariate: recommendations on multilevel hidden Markov models with categorical data of varying complexity" [Dataset]. http://doi.org/10.5281/zenodo.6385197
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6385197
Dataset updated
Mar 26, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sebastian Mildiner Moraga; Emmeke Aarts
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

This repository contains data generated for the manuscript: "Go multivariate: recommendations on multilevel hidden Markov models with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript is not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.
Multivariate Time Series Search - Dataset - NASA Open Data Portal
data.nasa.gov
Updated Mar 31, 2025
+ more versions
Cite
nasa.gov (2025). Multivariate Time Series Search - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/multivariate-time-series-search
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns from these MTS databases which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases, that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem — (1) an R-tree Based Search (RBS) which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several millions of observations. Both these tests show that our algorithms have very high prune rates (>95%) thus needing actual disk access for only less than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never been attempted on datasets of the size we have used in this paper.
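To make the query model in this description concrete (a pattern on any subset of variables, with time delays between them), here is a naive brute-force baseline. It is not the RBS or LBS algorithm from the paper and uses no index or pruning: the function name `brute_force_search`, the query layout `{variable: (delay, pattern)}`, the squared-distance score, and the toy series are all assumptions for the example.

```python
def brute_force_search(series, query):
    """Scan every start position and return (best_start, best_distance).

    series: {variable_name: [values]} for the full database series.
    query:  {variable_name: (delay, pattern)}; only the listed variables
            are compared, each shifted by its own delay from the start.
    """
    n = min(len(series[name]) for name in query)
    span = max(delay + len(pattern) for delay, pattern in query.values())
    best_start, best_dist = None, float("inf")
    for start in range(n - span + 1):
        dist = 0.0
        for name, (delay, pattern) in query.items():
            begin = start + delay
            window = series[name][begin:begin + len(pattern)]
            # Sum of squared differences between window and query pattern.
            dist += sum((w - p) ** 2 for w, p in zip(window, pattern))
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist


# Toy three-variable series; the query ignores "flaps" entirely.
series = {
    "altitude": [0, 1, 2, 5, 9, 9, 8, 3, 1, 0],
    "speed":    [0, 0, 3, 3, 4, 7, 7, 2, 0, 0],
    "flaps":    [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
}
query = {"altitude": (0, [5, 9, 9]), "speed": (2, [7, 7])}

result = brute_force_search(series, query)
print(result)
```

A scan like this touches every observation, which is the cost that the indexed approaches described in the abstract aim to avoid through pruning.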
Expanded HR Analytics Data Lab
kaggle.com
zip
Updated Aug 18, 2017
Cite
KrisMurphy (2017). Expanded HR Analytics Data Lab [Dataset]. https://www.kaggle.com/krismurphy01/data-lab
Explore at:
Available download formats: zip (541119 bytes)
Dataset updated
Aug 18, 2017
Authors
KrisMurphy
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

We are building an HR analytics data set that can be used for building useful reports, understanding the difference between data and information, and multivariate analysis. The data set we are building is similar to that used in several academic reports and what may be found in ERP HR subsystems.

We will update the sample data set as we gain a better understanding of the data elements using the calculations that exist in scholarly journals. Specifically, we will use the correlation tables to rebuild the data sets.

Content

The fields represent a fictitious data set where a survey was taken and actual employee metrics exist for a particular organization. None of this data is real.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Prabhjot Singh contributed a portion of the data (the columns on the right before the survey data was added). https://www.kaggle.com/prabhjotindia https://www.kaggle.com/prabhjotindia/visualizing-employee-data/data

About this Dataset Why are our best and most experienced employees leaving prematurely? Have fun with this database and try to predict which valuable employees will leave next. Fields in the dataset include:

Satisfaction Level
Last evaluation
Number of projects
Average monthly hours
Time spent at the company
Whether they have had a work accident
Whether they have had a promotion in the last 5 years
Departments
Salary

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
Examples of applying a multivariate Wilson prior to comparative...
zenodo.org
bin, zip
Updated Sep 10, 2025
Cite
Doeke Hekstra; Harrison K. Wang; Kevin M. Dalton (2025). Examples of applying a multivariate Wilson prior to comparative crystallography data [Dataset]. http://doi.org/10.5281/zenodo.17082201
Explore at:
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.17082201
Dataset updated
Sep 10, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Doeke Hekstra; Harrison K. Wang; Kevin M. Dalton
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This folder contains four examples of merging crystallographic intensities with a bivariate prior:

time-resolved Laue crystallography of the photoactive yellow protein

anomalous diffraction from serial XFEL crystallography of thermolysin

anomalous diffraction from Laue crystallography of NaI-soaked lysozyme

fragment screening monochromatic data of Nsp3 Mac1

Additionally, we provide several auxiliary examples:

For PYP, an example where we set aside a test fraction to semi-independently optimize the double-Wilson r

For lysozyme, two examples, one where we use Laue-DIALS instead of precognition, and another where we set aside the first 90 images to semi-independently optimize the double-Wilson r

For thermolysin, an example where we use a bivariate versus a univariate prior as the number of scaled images grows, and another where we set aside the first 395 images to semi-independently optimize the double-Wilson r

Every example includes scripts to run Careless as well as to analyze the outputs in order to reproduce the figures in the double-Wilson manuscript. For every example, there is a `README.md` that describes the contents of each example folder.
Supplement to Multivariate statistical analysis and partitioning of...
get.iedadata.org
search.dataone.org
+1more
xml
Updated 2014
Cite
Murray, Richard; Scudder, Rachel; Pisias, Nicklas (2014). Supplement to Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts [Dataset]. http://doi.org/10.1594/IEDA/100422
Explore at:
Available download formats: xml
Unique identifier
https://doi.org/10.1594/IEDA/100422
Dataset updated
2014
Authors
Murray, Richard; Scudder, Rachel; Pisias, Nicklas
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
Abstract: We present here annotated MATLAB scripts (and specific guidelines for their use) for Q-mode factor analysis, a constrained least squares multiple linear regression technique, and a total inversion protocol, that are based on the well-known approaches taken by Dymond (1981), Leinen and Pisias (1984), Kyte et al. (1993), and their predecessors. Although these techniques have been used by investigators for the past decades, their application has been neither consistent nor transparent, as their code has remained in-house or in formats not commonly used by many of today's researchers (e.g., FORTRAN). In addition to providing the annotated scripts and instructions for use, we include a sample data set for the user to test their own manipulation of the scripts. Other Description: Pisias, N. G., R. W. Murray, and R. P. Scudder (2013), Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts, Geochem. Geophys. Geosyst., 14, 4015–4020, doi:10.1002/ggge.20247.
Example of data.
plos.figshare.com
figshare.com
xls
Updated Jun 2, 2023
+ more versions
Cite
Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel (2023). Example of data. [Dataset]. http://doi.org/10.1371/journal.pone.0159649.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0159649.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example of data.
CLM - Groundwater Chemistry outputs from multivariate statistics
data.gov.au
researchdata.edu.au
zip
Updated Apr 13, 2022
Cite
Bioregional Assessment Program (2022). CLM - Groundwater Chemistry outputs from multivariate statistics [Dataset]. https://data.gov.au/data/dataset/4c128b86-1089-4ba9-85f8-76bbd65db396
Explore at:
Available download formats: zip (1324446)
Dataset updated
Apr 13, 2022
Dataset authored and provided by
Bioregional Assessment Program
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

The dataset was derived by the Bioregional Assessment Programme from "QLD DNRM Hydrochemistry with QA/QC" and "NSW Office of Water Groundwater Quality extract 28_nov_2013" data provided by the Qld DNRM and NSW Office of Water. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

The dataset contains the outputs of a multivariate statistical analysis conducted on groundwater chemistry data for different river basins or sub-regions within the CLM bioregion. The analysis was conducted using Statgraphics software. Only samples that passed the QA/QC checks (e.g. charge balances within ±5%) were included in the analysis.

Dataset History

The original datasets were clipped to the CLM bioregion. After an initial data quality check, only those samples that met the criteria (e.g. charge balance within ±5%) were included in the multivariate statistical analysis. Multivariate statistical analysis was conducted on the remaining dataset (i.e. with the samples that did not meet the QA/QC criteria removed), resulting in different groundwater chemistry groups.
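The charge-balance screening step described above can be sketched as follows. The dataset description does not give the formula used, so this assumes the conventional charge balance error, 100 × (Σcations − Σanions) / (Σcations + Σanions), with concentrations already expressed in meq/L. The function names, the sample layout, and the sample values are hypothetical.

```python
def charge_balance_error(cations_meq, anions_meq):
    """Charge balance error in percent, from concentrations in meq/L."""
    total_cations = sum(cations_meq.values())
    total_anions = sum(anions_meq.values())
    return 100.0 * (total_cations - total_anions) / (total_cations + total_anions)


def passes_qaqc(sample, tolerance_pct=5.0):
    """True when the absolute charge balance error is within the tolerance."""
    error = charge_balance_error(sample["cations"], sample["anions"])
    return abs(error) <= tolerance_pct


# Invented example samples (meq/L); not taken from the dataset.
sample_ok = {
    "cations": {"Ca": 2.0, "Mg": 1.0, "Na": 3.0, "K": 0.1},
    "anions": {"Cl": 3.0, "SO4": 1.0, "HCO3": 2.0},
}
sample_bad = {
    "cations": {"Ca": 4.0, "Na": 4.0},
    "anions": {"Cl": 3.0, "HCO3": 3.0},
}

kept = [s for s in (sample_ok, sample_bad) if passes_qaqc(s)]
print(round(charge_balance_error(sample_ok["cations"], sample_ok["anions"]), 2))
print(len(kept), "sample(s) kept")
```

Only the samples that pass this filter would go on to the multivariate analysis; the tolerance is a parameter so that a stricter or looser criterion can be tried.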

The methodology is described in more detail by Raiber et al. (2012).

M Raiber, PA White, CJ Daughney, C Tschritter, P Davidson (2012). Three-dimensional geological modelling and multivariate statistical analysis of water chemistry data to analyse and visualise aquifer structure and groundwater composition in the Wairau Plain, Marlborough District, New Zealand, Journal of Hydrology 436, 13-34

Dataset Citation

Bioregional Assessment Programme (2014) CLM - Groundwater Chemistry outputs from multivariate statistics. Bioregional Assessment Derived Dataset. Viewed 28 September 2017, http://data.bioregionalassessments.gov.au/dataset/4c128b86-1089-4ba9-85f8-76bbd65db396.

Dataset Ancestors

Derived From NSW Office of Water - Groundwater quality extract

Derived From QLD DNRM Hydrochemistry with QA/QC

Derived From QLD Department of Natural Resources and Mining Groundwater Database Extract 20131111
Accompanying simulated data for "Go multivariate: a Monte Carlo study of a...
data-staging.niaid.nih.gov
Updated Mar 25, 2022
Cite
Mildiner Moraga, Sebastian; Aarts, Emmeke (2022). Accompanying simulated data for "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_6384006
Explore at:
Dataset updated
Mar 25, 2022
Dataset provided by
Utrecht University
Authors
Mildiner Moraga, Sebastian; Aarts, Emmeke
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

This repository contains data generated for the manuscript: "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript is not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.
SPHERE: Students' performance dataset of conceptual understanding,...
data.mendeley.com
Updated Jan 15, 2025
Cite
Purwoko Haryadi Santoso (2025). SPHERE: Students' performance dataset of conceptual understanding, scientific ability, and learning attitude in physics education research (PER) [Dataset]. http://doi.org/10.17632/88d7m2fv7p.2
Unique identifier
https://doi.org/10.17632/88d7m2fv7p.2
Dataset updated
Jan 15, 2025
Authors
Purwoko Haryadi Santoso
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SPHERE is a dataset of students' performance in physics education research (PER). It is presented as a multi-domain learning dataset of students’ performance in physics, collected through several research-based assessments (RBAs) established by the PER community. A total of 497 eleventh-grade students participated, drawn from three large public high schools and one small public high school in a suburban district of a densely populated province in Indonesia. Variables related to demographics, access to literature resources, and students’ physics identity were also collected. The RBAs used in this dataset were selected based on the concepts the students learn in the Indonesian physics curriculum. We began by surveying students’ understanding of Newtonian mechanics at the end of the first semester using the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students’ scientific abilities and learning attitudes through the Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS), respectively. The conceptual assessments continued in the second semester with the Rotational and Rolling Motion Conceptual Survey (RRMCS), the Fluid Mechanics Concept Inventory (FMCI), the Mechanical Waves Conceptual Survey (MWCS), the Thermal Concept Evaluation (TCE), and the Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE to be a valuable dataset for advancing the PER field, particularly in quantitative studies. For example, research on machine learning and data mining techniques in PER may be held back by the lack of datasets built specifically for PER studies.
SPHERE can be reused as a dataset of students’ performance in physics, dedicated to PER scholars who wish to implement machine learning techniques in physics education.

Time Series Multivariate Educational Data Analysis

kaggle.com

zip

Updated Nov 27, 2024

Cite

Wayne_127 (2024). Time Series Multivariate Educational Data Analysis [Dataset]. https://www.kaggle.com/wayne127/time-series-multivariate-educational-data-analysis

Available download formats: zip (3543111 bytes)

Dataset updated

Nov 27, 2024

Authors

Wayne_127

License

MIT License, https://opensource.org/licenses/MIT
License information was derived automatically

Description

Data Description

This dataset provides simulated learning behavior data for 15 classes over 148 days, from August 31, 2023, to January 25, 2024. The data includes basic learner information (a total of 1,364 learners), basic information about the exercises (a total of 44 items), and learner submission behavior logs (a total of 232,818 records). All data is provided in CSV format. The dataset contains noise such as missing values, outliers, or data inconsistencies (e.g., invalid classes, missing log entries, etc.), which participants need to identify and handle. The specific fields of the three data tables are described as follows:

Learner Basic Information Table

Data_StudentInfo.csv

Field Name	Description	Remarks
index	Learner index
student_ID	Learner ID	Unique identifier
sex	Gender
age	Age
major	Major

Exercise Basic Information Table

Data_TitleInfo.csv

Field Name	Description	Remarks
index	Exercise index
title_ID	Exercise ID	Unique identifier
score	Exercise score
knowledge	Knowledge points	Each exercise may test multiple knowledge points
sub_knowledge	Sub-knowledge points	Knowledge points may have multiple sub-knowledge points

Learner Submission Behavior Log Information

The Data_SubmitRecord folder contains the learner submission behavior log data for 15 classes (Class1~Class15). For example, the file SubmitRecord-Class1.csv contains the submission logs for Class 1.

SubmitRecord-Class1.csv

Field Name	Description	Remarks
index	Record index
class	Class
time	Log generation time	Timestamp, accurate to the second
state	Submission state	Examples include fully correct, partially correct, etc., with a total of 12 statuses
score	Submission score	Score obtained from test cases
title_ID	Exercise ID	References `title_ID` in the exercise basic information table
method	Language	Programming language used by the learner
memory	Memory	Unit: KB
timeconsume	Time consumed	Unit: milliseconds
student_ID	Learner ID	References `student_ID` in the learner basic information table
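
The three tables above link through `student_ID` and `title_ID`. The following is a minimal sketch of joining submission logs to learner information using only Python's standard library. The sample rows, the state labels, the epoch-second timestamps, and the helper names `read_rows` and `join_submissions` are illustrative assumptions and are not taken from the dataset. Because the description warns of missing values and inconsistencies, rows with an unknown learner or an empty `memory` field are set aside rather than silently kept.

```python
import csv
import io

# Hypothetical sample rows: column names follow the field tables above,
# but every value is invented for illustration.
STUDENTS_CSV = """index,student_ID,sex,age,major
0,s001,female,20,J23517
1,s002,male,21,J87654
"""

SUBMISSIONS_CSV = """index,class,time,state,score,title_ID,method,memory,timeconsume,student_ID
0,Class1,1693440000,fully_correct,3,q01,Method_A,320,4,s001
1,Class1,1693440060,partially_correct,1,q01,Method_A,,5,s002
2,Class1,1693440120,fully_correct,3,q02,Method_B,180,3,s999
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_submissions(submissions, students):
    """Attach each learner's major to their submissions.

    Returns (joined, rejected): rejected holds rows whose student_ID is
    not in the learner table or whose memory field is empty.
    """
    by_id = {s["student_ID"]: s for s in students}
    joined, rejected = [], []
    for row in submissions:
        student = by_id.get(row["student_ID"])
        if student is None or row["memory"] == "":
            rejected.append(row)
            continue
        merged = dict(row)
        merged["memory"] = int(row["memory"])  # unit: KB
        merged["major"] = student["major"]
        joined.append(merged)
    return joined, rejected


joined, rejected = join_submissions(
    read_rows(SUBMISSIONS_CSV), read_rows(STUDENTS_CSV)
)
print(len(joined), len(rejected))
```

To try this on the real files, the in-memory strings could be replaced with the contents of Data_StudentInfo.csv and one of the SubmitRecord-Class files. Which rows count as invalid is a judgment call that the dataset leaves to the analyst.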

"Analysis for Insight: Visual Analysis Challenge of Multivariate Time-series Educational Data"

NorthClass is a renowned higher education training institution offering over 100 courses across a wide range of disciplines, including literature, science, engineering, medicine, economics, and management. With approximately 300,000 registered learners, the institution has created a flexible and convenient learning environment by providing high-quality educational services.

To keep up with the trends of the digital age and enhance its market competitiveness in the technology sector, NorthClass has developed a programming course. Learners are required to complete designated programming tasks during the course, with the opportunity for multiple attempts and submissions to ensure mastery and application of the learned knowledge. At the end of the course, the institution collected learners' time-series learning data to evaluate whether the teaching outcomes met predefined standards and requirements.

To optimize teaching resources and improve the quality of instruction, the institution plans to establish a specialized "Innovative Learning Development Group." This group will explore how to leverage next-generation AI technologies to empower education and better cultivate innovative talent capable of meeting the demands of the modern era.

Visualization and visual analysis utilize the high-bandwidth capabilities of human visual perception to transform complex time-series learning behavior data into graphical representations. These techniques enable the diagnosis and analysis of learners' knowledge mastery levels, dynamic tracking of the evolution of learning behaviors, and identification and analysis of potential factors causing learning difficulties.

As a member of the Innovative Learning Development Group, your task is to design and implement a ...

Multivariate analysis on attitudes binary variable (positive versus negative...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Sep 25, 2019
Cite
Tinland, Aurlie; Manning, Rachel; Vargas-Moniz, Maria; Ornelas, Jose; Bernad, Roberto; Wolf, Judith; Kallmen, Hakan; Auquier, Pascal; Bokszczanin, Anna; Petit, Junie; Spinnewijn, Freek; Loubiere, Sandrine; Santinello, Massimo (2019). Multivariate analysis on attitudes binary variable (positive versus negative attitudes) (N = 4,670, weighted sample). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000069656
Dataset updated
Sep 25, 2019
Authors
Tinland, Aurlie; Manning, Rachel; Vargas-Moniz, Maria; Ornelas, Jose; Bernad, Roberto; Wolf, Judith; Kallmen, Hakan; Auquier, Pascal; Bokszczanin, Anna; Petit, Junie; Spinnewijn, Freek; Loubiere, Sandrine; Santinello, Massimo
Description
Multivariate analysis on attitudes binary variable (positive versus negative attitudes) (N = 4,670, weighted sample).
