56 datasets found
  1. Bivariate Gaussian likelihood example - Dataset - LDM

    • service.tib.eu
    Updated Dec 3, 2024
    + more versions
    Cite
    (2024). Bivariate Gaussian likelihood example - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/bivariate-gaussian-likelihood-example
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    The dataset used in the paper is a bivariate Gaussian likelihood example with uncorrelated priors.
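
    As a sketch of what such an example involves (not the paper's actual code; all names and values below are illustrative), the bivariate Gaussian log-likelihood can be evaluated directly, and with uncorrelated priors the log-posterior simply adds two independent prior terms:

```python
import math

def bivariate_gaussian_loglik(x, y, mu_x, mu_y, sx, sy, rho):
    """Log-density of a bivariate normal at (x, y)."""
    zx, zy = (x - mu_x) / sx, (y - mu_y) / sy
    quad = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    norm = math.log(2 * math.pi * sx * sy * math.sqrt(1 - rho * rho))
    return -norm - quad / 2

# With uncorrelated (independent) priors on the two means, the
# log-posterior factorises into likelihood + prior_x + prior_y.
ll = bivariate_gaussian_loglik(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0)
```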

  2. Dataset for: Quantifying The Regression to The Mean Effect in Poisson...

    • wiley.figshare.com
    txt
    Updated Jun 4, 2023
    Cite
    Manzoor Khan; Jake Olivier (2023). Dataset for: Quantifying The Regression to The Mean Effect in Poisson Processes [Dataset]. http://doi.org/10.6084/m9.figshare.6394475.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Wiley: https://www.wiley.com/
    Authors
    Manzoor Khan; Jake Olivier
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Regression to the mean (RTM) can occur whenever an extreme observation is selected from a population and a later observation is closer to the population mean. A consequence of this phenomenon is that natural variability can be mistaken for real change. Simple expressions are available to quantify RTM when the underlying distribution is bivariate normal. However, there are many real-world situations which are better approximated as a Poisson process. Examples include the number of hard disk failures during a year, the number of cargo ships damaged by waves, daily homicide counts in California, and the number of deaths per quarter attributable to AIDS in Australia. In this paper, we derive expressions for quantifying RTM effects for the bivariate Poisson distribution for both the homogeneous and inhomogeneous cases. Statistical properties of our derivations have been evaluated through a simulation study. The asymptotic distributions of RTM estimators have been derived. The RTM effect for the number of people killed in road accidents in different regions of New South Wales (Australia) is estimated using maximum likelihood.
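
    A minimal simulation sketch (not the authors' derivation; the parameter values are illustrative) of the RTM effect under the standard bivariate Poisson construction, where a shared component induces the correlation between baseline and follow-up counts:

```python
import numpy as np

def rtm_bivariate_poisson(lam=10.0, shared=4.0, cutoff=15, n=200_000, seed=0):
    """Simulate the standard bivariate Poisson construction
    X1 = Y0 + Y1, X2 = Y0 + Y2 (Y0 induces the correlation), then
    estimate the RTM effect after selecting on an extreme baseline."""
    rng = np.random.default_rng(seed)
    y0 = rng.poisson(shared, n)               # common component
    x1 = y0 + rng.poisson(lam - shared, n)    # baseline count
    x2 = y0 + rng.poisson(lam - shared, n)    # follow-up count
    sel = x1 >= cutoff                        # select extreme baselines
    # Follow-ups of the selected units fall back toward lam; the gap
    # between the two conditional means is the RTM effect.
    return x1[sel].mean() - x2[sel].mean()

effect = rtm_bivariate_poisson()
```

    Selecting on a high baseline makes the follow-up mean revert toward the population rate, even though nothing about the process changed: that gap is the effect the paper quantifies analytically.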

  3. Music & Affect 2020 Dataset Study 2.csv

    • psycharchives.org
    Updated Sep 17, 2020
    + more versions
    Cite
    (2020). Music & Affect 2020 Dataset Study 2.csv [Dataset]. https://www.psycharchives.org/handle/20.500.12034/3089
    Explore at:
    Dataset updated
    Sep 17, 2020
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset for: Leipold, B. & Loepthien, T. (2021). Attentive and emotional listening to music: The role of positive and negative affect. Jahrbuch Musikpsychologie, 30. https://doi.org/10.5964/jbdgm.78 In a cross-sectional study, associations of global affect with two ways of listening to music, attentive-analytical listening (AL) and emotional listening (EL), were examined. More specifically, the degrees to which AL and EL are differentially correlated with positive and negative affect were examined. In Study 1, a sample of 1,291 individuals responded to questionnaires on listening to music, positive affect (PA), and negative affect (NA). We used the PANAS, which measures PA and NA as high-arousal dimensions. AL was positively correlated with PA, EL with NA. Moderation analyses showed stronger associations between PA and AL when NA was low. Study 2 (499 participants) differentiated between three facets of affect and focused, in addition to PA and NA, on the role of relaxation. Similar to the findings of Study 1, AL was correlated with PA, EL with NA and PA. Moderation analyses indicated that the degree to which PA is associated with an individual's tendency to listen to music attentively depends on their degree of relaxation. In addition, the correlation between pleasant activation and EL was stronger for individuals who were more relaxed; for individuals who were less relaxed, the correlation between unpleasant activation and EL was stronger. In sum, the results demonstrate not only simple bivariate correlations, but also that the expected associations vary depending on the different affective states. We argue that the results reflect a dual function of listening to music, which includes emotional regulation and information processing. (Dataset for Study 2.)

  4. An example of combining ANOVA terms for bivariate principle component data...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 24, 2018
    Cite
    Skalski, John R.; Townsend, Richard L.; Richins, Shelby M. (2018). An example of combining ANOVA terms for bivariate principle component data to create the ANODIS F-statistic where N is the total number of samples drawn and K, the number of assemblages compared. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000666955
    Explore at:
    Dataset updated
    Oct 24, 2018
    Authors
    Skalski, John R.; Townsend, Richard L.; Richins, Shelby M.
    Description

    An example of combining ANOVA terms for bivariate principle component data to create the ANODIS F-statistic where N is the total number of samples drawn and K, the number of assemblages compared.

  5. Air Pollution Forecasting - LSTM Multivariate

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Cite
    Rupak Roy/ Bob (2022). Air Pollution Forecasting - LSTM Multivariate [Dataset]. https://www.kaggle.com/datasets/rupakroy/lstm-datasets-multivariate-univariate
    Explore at:
    zip (454764 bytes; available download formats)
    Dataset updated
    Jan 20, 2022
    Authors
    Rupak Roy/ Bob
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    THE MISSION

    This dataset illustrates how an LSTM architecture can use multiple variables together to improve forecasting accuracy.

    THE CONTENT

    Air Pollution Forecasting: the Air Quality dataset.

    This is a dataset that reports on the weather and the level of pollution each hour for five years at the US embassy in Beijing, China.

    The data includes the date-time, the pollution level (PM2.5 concentration), and weather information including dew point, temperature, pressure, wind direction, wind speed, and the cumulative number of hours of snow and rain. The complete feature list in the raw data is as follows:

    • No: row number
    • year: year of data in this row
    • month: month of data in this row
    • day: day of data in this row
    • hour: hour of data in this row
    • pm2.5: PM2.5 concentration
    • DEWP: dew point
    • TEMP: temperature
    • PRES: pressure
    • cbwd: combined wind direction
    • Iws: cumulated wind speed
    • Is: cumulated hours of snow
    • Ir: cumulated hours of rain

    We can use this data to frame a forecasting problem where, given the weather conditions and pollution for prior hours, we forecast the pollution at the next hour.
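
    A minimal sketch of that supervised framing (the column layout and helper name are assumptions, not the dataset's actual loader): build (samples, n_lags, features) windows for an LSTM, with pollution assumed to be column 0:

```python
import numpy as np

def make_windows(data, n_lags=3):
    """Frame a multivariate series for next-step forecasting:
    X holds the previous n_lags rows of all features, y the next
    pollution value (assumed here to be column 0)."""
    X, y = [], []
    for t in range(n_lags, len(data)):
        X.append(data[t - n_lags:t])   # prior hours, all features
        y.append(data[t, 0])           # next hour's pollution
    return np.array(X), np.array(y)

# Toy stand-in for the hourly table: 10 hours x 4 features.
series = np.arange(40, dtype=float).reshape(10, 4)
X, y = make_windows(series, n_lags=3)
```

    The resulting X array feeds directly into a recurrent model expecting (samples, timesteps, features) input.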

  6. Multivariate Time Series Search - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). Multivariate Time Series Search - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/multivariate-time-series-search
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA: http://nasa.gov/
    Description

    Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns in these MTS databases, which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R-tree Based Search (RBS), which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm, which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several million observations. Both tests show that our algorithms have very high prune rates (>95%), thus needing actual disk access for less than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never been attempted on datasets of the size we have used in this paper.
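
    As a point of reference (a naive baseline sketch, not the paper's RBS/LBS algorithms), subsequence search restricted to an arbitrary subset of variables can be written as a brute-force scan; the indexed methods exist precisely to prune most of these comparisons:

```python
import numpy as np

def naive_mts_search(db, query, var_idx):
    """Slide the query over the database, comparing only the requested
    subset of variables, and return the best-matching start offset."""
    qlen = len(query)
    best, best_dist = -1, np.inf
    for start in range(len(db) - qlen + 1):
        window = db[start:start + qlen][:, var_idx]
        dist = float(np.sum((window - query[:, var_idx]) ** 2))
        if dist < best_dist:
            best, best_dist = start, dist
    return best, best_dist

db = np.random.default_rng(1).normal(size=(500, 6))   # 500 steps, 6 variables
query = db[120:140].copy()                            # planted subsequence
offset, dist = naive_mts_search(db, query, var_idx=[0, 2, 5])
```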

  7. Controlled Anomalies Time Series (CATS) Dataset

    • zenodo.org
    bin
    Updated Jul 12, 2024
    + more versions
    Cite
    Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. http://doi.org/10.5281/zenodo.7646897
    Explore at:
    bin (available download formats)
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Solenix Engineering GmbH
    Authors
    Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

    The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

    • Multivariate (17 variables) including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system including:
      • 4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.
      • 3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.
      • 10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.
    • 5 million timestamps. Sensor readings are at a 1 Hz sampling frequency.
      • 1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour.
      • 4 million observations that include both nominal and anomalous segments. This is suitable to evaluate both semi-supervised approaches (novelty detection) as well as unsupervised approaches (outlier detection).
    • 200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.
    • Different types of anomalies to understand what anomaly types can be detected by different approaches.
    • Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end time of the anomalous behaviour is known very precisely. In contrast to real world datasets, there is no risk that the ground truth contains mislabelled segments which is often the case for real data.
    • Obvious anomalies. The simulated anomalies have been designed to be "easy" to detect for human eyes (i.e., there are very large spikes or oscillations), hence also detectable by most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting those obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable also for regular benchmark studies.
    • Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example consists of a light and switch pair. The light being either on or off is nominal, the same goes for the switch, but having the switch on and the light off shall be considered anomalous. In the CATS dataset, users can choose (or not) to use the available context, and external stimuli, to test the usefulness of the context for detecting anomalies in this simulation.
    • Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage since users of the dataset can decide to add on top of the provided series any type of noise and choose an amplitude. This makes it well suited to test how sensitive and robust detection algorithms are against various levels of noise.
    • No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
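
    A small sketch of how the layout above could be used (array shapes and names are placeholders, not the dataset's real files): take the nominal-only prefix for training and, since the signals ship noise-free, inject noise at a chosen amplitude for robustness tests:

```python
import numpy as np

def split_cats(data, n_nominal=1_000_000):
    """First n_nominal rows are anomaly-free (training prefix); the
    rest mix nominal and anomalous segments (evaluation)."""
    return data[:n_nominal], data[n_nominal:]

# Tiny stand-in with the same 17-column layout (real data: 5M rows).
toy = np.zeros((1000, 17), dtype=np.float32)
train, evaluate = split_cats(toy, n_nominal=200)

# The signals are provided without noise, so robustness studies can
# add their own at any amplitude:
rng = np.random.default_rng(0)
noisy_train = train + rng.normal(0.0, 0.1, size=train.shape)
```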

    [1] Example benchmark of anomaly detection in time series: Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779-1797, 2022. doi:10.14778/3538598.3538602

    About Solenix

    Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.

  8. Sport Activity Dataset - MTS-5

    • kaggle.com
    zip
    Updated Jul 13, 2023
    Cite
    Jarno Matarmaa (2023). Sport Activity Dataset - MTS-5 [Dataset]. https://www.kaggle.com/datasets/jarnomatarmaa/sportdata-mts-5
    Explore at:
    zip (498699 bytes; available download formats)
    Dataset updated
    Jul 13, 2023
    Authors
    Jarno Matarmaa
    License

    https://ec.europa.eu/info/legal-notice_en

    Description


    Dataset consists of data in five categories: walking, running, biking, skiing, and roller skiing. Sport activities have been recorded by an individual active (non-competitive) athlete. Data is pre-processed, standardized, and split into four parts (each dimension in its own file):
    • HR-DATA_std_1140x69 (heart rate signals)
    • SPD-DATA_std_1140x69 (speed signals)
    • ALT-DATA_std_1140x69 (altitude signals)
    • META-DATA_1140x4 (labels and details)

    NOTE: Signal order between the separate files must not be confused when processing the data. Signal order is critical: the first row in each file comes from the same activity, whose label is the first row of the metadata file, and so on. Data should therefore be combined into the same table while reading the files, ideally using a nested data structure, something like in the picture below:

    You may check the related TSC projects on GitHub: Sport Activity Classification Using Classical Machine Learning and Time Series Methods (https://github.com/JABE22/MasterProject), and Symbolic Representation of Multivariate Time Series Signals in Sport Activity Classification (Kaggle project).

    [Image: nested data structure for multivariate time series classifiers]
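
    The alignment requirement above can be sketched as follows (file loading is elided; the arrays stand in for the three per-dimension files, which are assumed to share row order):

```python
import numpy as np

def stack_dimensions(hr, spd, alt):
    """Combine the three per-dimension tables (each 1140 x 69, same
    row order) into one (1140, 69, 3) array so that row i always
    refers to the same activity across dimensions."""
    assert hr.shape == spd.shape == alt.shape
    return np.stack([hr, spd, alt], axis=-1)

# Stand-ins for HR-DATA_std_1140x69, SPD-DATA_std_1140x69 and
# ALT-DATA_std_1140x69 (loaded e.g. with np.loadtxt in practice).
hr = np.zeros((1140, 69))
spd = np.ones((1140, 69))
alt = np.full((1140, 69), 2.0)
mts = stack_dimensions(hr, spd, alt)
```

    Row i of `mts` then pairs naturally with row i of the metadata file, so labels cannot drift out of sync with the signals.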

    In the following picture one can see five signal samples for each dimension (heart rate, speed, altitude) in standardized feature-value format. Each figure contains signals from five random activities (from the same or different categories); for example, the signals with index 1 in each of the three figures come from the same activity. The figures only illustrate what kinds of signals the dataset consists of and have no particular meaning.

    [Image: signal samples from sport activities (heart rate, speed, and altitude)]

    Dataset size and construction procedure

    The original number of sport activities is 228. From each of them, starting at index 100 (seconds), 5 consecutive 69-second segments have been taken, as expressed by the formula below:

    [Image: data segmentation and augmentation formula]

    where ๐ท = ๐‘œ๐‘Ÿ๐‘–๐‘”๐‘–๐‘›๐‘Ž๐‘™ ๐‘“๐‘–๐‘™๐‘ก๐‘’๐‘Ÿ๐‘’๐‘‘ ๐‘‘๐‘Ž๐‘ก๐‘Ž ,๐‘ = ๐‘›๐‘ข๐‘š๐‘๐‘’๐‘Ÿ ๐‘œ๐‘“ ๐‘Ž๐‘๐‘ก๐‘–๐‘ฃ๐‘–๐‘ก๐‘–๐‘’๐‘  , ๐‘  = ๐‘ ๐‘’๐‘”๐‘š๐‘’๐‘›๐‘ก ๐‘ ๐‘ก๐‘Ž๐‘Ÿ๐‘ก ๐‘–๐‘›๐‘‘๐‘’๐‘ฅ , ๐‘™ = ๐‘ ๐‘’๐‘”๐‘š๐‘’๐‘›๐‘ก ๐‘™๐‘’๐‘›๐‘”๐‘กโ„Ž, and ๐‘› = ๐‘กโ„Ž๐‘’ ๐‘›๐‘ข๐‘š๐‘๐‘’๐‘Ÿ ๐‘œ๐‘“ ๐‘ ๐‘’๐‘”๐‘š๐‘’๐‘›๐‘ก๐‘  from a single original sequence ๐ท๐‘– , resulting the new set of equal length segments ๐ท๐‘ ๐‘’๐‘”. And in this certain case the equation takes the form of:

    [Image: data segmentation and augmentation formula with values]

    Thus, the dataset has dimensions of 1140 x 69 x 3.
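
    The segmentation step described above (228 activities x 5 segments of 69 seconds, starting at index 100) can be sketched as follows; the function name is illustrative, not from the dataset's own tooling:

```python
import numpy as np

def segment_activity(signal, start=100, seg_len=69, n_segments=5):
    """Cut n_segments consecutive seg_len-second windows from one
    activity recording, beginning at `start` (1 Hz sampling), as in
    the construction above (228 activities x 5 -> 1140 rows)."""
    return [signal[start + j * seg_len : start + (j + 1) * seg_len]
            for j in range(n_segments)]

signal = np.arange(1000.0)            # one fake activity recording (1 Hz)
segments = segment_activity(signal)
```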

    Additional information

    The data was recorded without knowing it would be used in research; it therefore represents a real-world data source well and provides an excellent way to test algorithms on real data.

    Recording devices

    Data has been recorded using two types of Garmin devices: a Forerunner 920XT and a vivosport. The vivosport is an activity tracker and measures heart rate from the wrist using an optical sensor, whereas the 920XT requires an external sensor belt (heart rate + inertial) worn on the chest during exercise. Otherwise the devices are not essentially different: both use GPS location to measure speed and a barometer to measure elevation changes.

    Device manuals - Garmin FR-920XT - Garmin Vivosport

    Person profile

    Age: 30-31, Weight: 82 kg, Height: 181 cm, active athlete (non-competitive)

  9. Dataset for: Spatio-temporal multivariate mixture models for Bayesian model...

    • wiley.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Andrew Lawson; Rachel Carroll; Christel Faes; Russell S Kirby; Mehreteab Aregay; Kevin Watjou (2023). Dataset for: Spatio-temporal multivariate mixture models for Bayesian model selection in disease mapping [Dataset]. http://doi.org/10.6084/m9.figshare.5284423.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    Wiley: https://www.wiley.com/
    Authors
    Andrew Lawson; Rachel Carroll; Christel Faes; Russell S Kirby; Mehreteab Aregay; Kevin Watjou
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    It is often the case that researchers wish to simultaneously explore the behavior of multiple diseases while accounting for potential spatial and/or temporal correlation. In this paper, we propose a flexible class of multivariate spatio-temporal mixture models to fill this role. Further, these models offer flexibility with the potential for model selection as well as the ability to accommodate lifestyle, socio-economic, and physical environmental variables with spatial, temporal, or both structures. Here, we explore the capability of this approach via a large-scale simulation study and examine a real data example. The results, which focus on four model variants, suggest that all models can recover the simulation ground truth and display improved model fit over two baseline Knorr-Held spatio-temporal interaction model variants in a real data application.

  10. MASEM Dataset on Educational AI Technology Adoption among Students (from 2020...

    • data.mendeley.com
    Updated Oct 15, 2025
    Cite
    Researcher 1 (2025). MASEM Dataset on Educational AI Technology Adoption among Students (from 2020 to June 2025). [Dataset]. http://doi.org/10.17632/t8ns6fdky2.5
    Explore at:
    Dataset updated
    Oct 15, 2025
    Authors
    Researcher 1
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports a meta-analytic structural equation modelling (MASEM) study investigating the factors influencing students' behavioural intention to use educational AI (EAI) technologies. The research integrates constructs from the Technology Acceptance Model (TAM), Theory of Planned Behaviour (TPB), and Artificial Intelligence Literacy (AIL), aiming to resolve inconsistencies in previous studies and improve theoretical understanding of EAI technology adoption.

    Research Hypotheses. The study hypothesized that: students' behavioural intention (INT) to use EAI technologies is influenced by perceived usefulness (PU), perceived ease of use (PEU), attitude (ATT), subjective norm (SN), and perceived behavioural control (PBC), as described in TAM and TPB; AI literacy (AIL) directly and indirectly predicts PU, PEU, ATT, and INT; and these relationships are moderated by contextual factors such as academic level (K-12 vs. higher education) and regional economic development (developed vs. developing countries).

    What the Data Shows. The meta-analytic dataset comprises 166 empirical studies involving over 69,000 participants. It includes pairwise Pearson correlations among seven constructs (PU, PEU, ATT, SN, PBC, INT, AIL) and is used to compute a pooled correlation matrix. This matrix was then used to test three models via MASEM: a baseline TAM-TPB model; an internal-extended model with additional TPB internal paths; and an AIL-integrated extended model. The AIL-integrated model achieved the best fit (CFI = 0.997, RMSEA = 0.053) and explained 62.3% of the variance in behavioural intention.
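
    A sketch of the correlation-pooling step (a simple fixed-effect Fisher-z average; the study's actual MASEM pipeline may use a random-effects variant, and the correlations below are made up for illustration):

```python
import math

def pool_correlation(rs, ns):
    """Fixed-effect pooled correlation via Fisher's z transform,
    weighting each study's z by n - 3 (its inverse variance)."""
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return math.tanh(z_bar)   # back-transform to the r scale

# Hypothetical PU -> INT correlations from three primary studies.
r_pooled = pool_correlation([0.45, 0.52, 0.38], [200, 350, 150])
```

    Repeating this for each construct pair fills one cell of the pooled correlation matrix that the structural models are then fitted to.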

    Notable Findings. AI literacy (AIL) is the strongest predictor of intention to use EAI technologies (Total Effect = 0.408). PU, ATT, and SN also significantly influence intention. The effect of PEU on intention is fully mediated by PU and ATT. Moderation analysis showed that the relationships differ between developed and developing countries and between K-12 and higher education populations.

    How the Data Can Be Interpreted and Used. The dataset includes bivariate correlations between variables, publication metadata, sample sizes, coding information, and reliability values (e.g., CR scores). It is suitable for replication of MASEM procedures, moderation analysis, and meta-regression. Researchers may use it to test additional theoretical models or assess the influence of new moderators (e.g., AI tool type). Educators and policymakers can leverage insights from the meta-analytic results to inform AI literacy training and technology adoption strategies.

  11. SPHERE: Students' performance dataset of conceptual understanding,...

    • data.mendeley.com
    Updated Jan 15, 2025
    + more versions
    Cite
    Purwoko Haryadi Santoso (2025). SPHERE: Students' performance dataset of conceptual understanding, scientific ability, and learning attitude in physics education research (PER) [Dataset]. http://doi.org/10.17632/88d7m2fv7p.2
    Explore at:
    Dataset updated
    Jan 15, 2025
    Authors
    Purwoko Haryadi Santoso
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPHERE is a students' performance in physics education research dataset. It is presented as a multi-domain learning dataset of students' performance on physics that has been collected through several research-based assessments (RBAs) established by the physics education research (PER) community. A total of 497 eleventh-grade students were involved from three large and one small public high schools located in a suburban district of a highly populated province in Indonesia. Some variables related to demographics, accessibility to literature resources, and students' physics identity are also investigated. The RBAs utilized in this data were selected based on concepts learned by the students in the Indonesian physics curriculum. We commenced the survey of students' understanding of Newtonian mechanics at the end of the first semester using the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students' scientific abilities and learning attitude through the Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS), respectively. The conceptual assessments continued in the second semester, measured through the Rotational and Rolling Motion Conceptual Survey (RRMCS), Fluid Mechanics Concept Inventory (FMCI), Mechanical Waves Conceptual Survey (MWCS), Thermal Concept Evaluation (TCE), and Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE to be a valuable dataset for supporting the advancement of the PER field, particularly in quantitative studies; for example, research on using machine learning and data mining techniques in PER often faces challenges due to the lack of datasets dedicated to that purpose. SPHERE can be reused as a students' performance dataset on physics specifically dedicated to PER scholars willing to implement machine learning techniques in physics education.

  12. MATLAB Scripts to Partition Multivariate Sedimentary Geochemical Data Sets

    • get.iedadata.org
    xml
    Updated 2012
    Cite
    Murray, Richard; Pisias, Nicklas (2012). MATLAB Scripts to Partition Multivariate Sedimentary Geochemical Data Sets [Dataset]. http://doi.org/10.1594/IEDA/100047
    Explore at:
    xml (available download formats)
    Dataset updated
    2012
    Authors
    Murray, Richard; Pisias, Nicklas
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    Abstract: This contribution provides MATLAB scripts to assist users in factor analysis, constrained least squares regression, and total inversion techniques. These scripts respond to the increased availability of large datasets generated by modern instrumentation, for example, the SedDB database. The download (.zip) includes one descriptive paper (.pdf) and one file of the scripts and example output (.doc). Other Description: Pisias, N. G., R. W. Murray, and R. P. Scudder (2013), Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts, Geochem. Geophys. Geosyst., 14, 4015-4020, doi:10.1002/ggge.20247.

  13. Example of data.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 2, 2023
    + more versions
    Cite
    Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel (2023). Example of data. [Dataset]. http://doi.org/10.1371/journal.pone.0159649.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Eric Houngla Adjakossa; Ibrahim Sadissou; Mahouton Norbert Hounkonnou; Gregory Nuel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example of data.

  14. Data from: A 1 km monthly dataset of historical and future climate changes...

    • scidb.cn
    Updated Sep 20, 2024
    + more versions
    Cite
    Xiaofei Hu; Shaolin Shi; Borui Zhou; Jian Ni (2024). A 1 km monthly dataset of historical and future climate changes over China [Dataset]. http://doi.org/10.57760/sciencedb.13546
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Xiaofei Hu; Shaolin Shi; Borui Zhou; Jian Ni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    This dataset provides 30-year averaged climate data for both historical and future periods at a spatial resolution of 0.01° × 0.01°. Historical data (1991–2020) are based on the China Surface Climate Standard Dataset and were interpolated using ANUSPLIN software. Future climate data are derived from CMIP6 simulations and bias-corrected using the Delta downscaling method. The dataset includes 10 models (9 Global Climate Models, or GCMs, and 1 ensemble model), 3 scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5), and 3 future periods (2021–2040, 2041–2070, and 2071–2100). For each period (or scenario), 28 climate variables are provided: 5 monthly basic climate variables (mean temperature, maximum temperature, minimum temperature, precipitation, and percentage of sunshine) and 23 bioclimatic variables derived from the basic variables (for details, see the dataset documentation file). Data quality was strictly evaluated: the ANUSPLIN-interpolated historical data showed a strong correlation with observations (all correlation coefficients above 0.91), and the bias correction improved the accuracy of most original GCM simulations, reducing bias by 0.69%–58.63%. This dataset aims to provide high-resolution, bias-corrected long-term historical and future climate data for climate and ecological research. All computations were performed using R, and the corresponding code can be found in the dataset folder "Code". All data are provided in GeoTIFF (.tif) format; each file for the basic climate variables contains 12 bands, representing monthly data in ascending order (e.g., Band 1 corresponds to January).

    To facilitate data storage, all files are provided in compressed archives, following a consistent naming convention:

    (1) Historical data: China_Variable_1km_1991–2020.tif, where Variable is the abbreviation of one of the 28 climate variables. Example: China_pr_1km_1991–2020.tif.

    (2) Future data: China_Variable_Model_VariantLabel_1km_StartYear-EndYear_Scenario.tif, where Variable is one of the 28 climate variables; Model is the GCM name; VariantLabel is r1i1p1f1 in this study; StartYear-EndYear is the future period; and Scenario is the SSP climate scenario. Example: China_tasmin_MRI-ESM2-0_r1i1p1f1_1km_2071–2100_SSP585.tif.
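    The naming convention for the future-data files can be unpacked programmatically. The following is a minimal sketch; the helper name and regular expression are our own assumptions, not part of the dataset's R code, and the year separator is matched as either a hyphen or an en dash since the exact character used in the archives is uncertain:

```python
import re

# Hypothetical parser for the future-data naming convention:
# China_<Variable>_<Model>_<VariantLabel>_1km_<StartYear>-<EndYear>_<Scenario>.tif
FUTURE_PATTERN = re.compile(
    r"^China_(?P<variable>[^_]+)_(?P<model>.+)_(?P<variant>r\d+i\d+p\d+f\d+)"
    r"_1km_(?P<start>\d{4})[-\u2013](?P<end>\d{4})_(?P<scenario>SSP\d+)\.tif$"
)

def parse_future_filename(name: str) -> dict:
    """Split a future-data GeoTIFF filename into its metadata fields."""
    match = FUTURE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name}")
    return match.groupdict()

info = parse_future_filename("China_tasmin_MRI-ESM2-0_r1i1p1f1_1km_2071-2100_SSP585.tif")
print(info["model"], info["scenario"])  # MRI-ESM2-0 SSP585
```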

  15. Power, voltage, frequency and temperature dataset from Mesa Del Sol...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 12, 2025
    + more versions
    Cite
    Adnan Bashir; Christopher Leap; Ansel Blumenthal; Trilce Estrada; Ali Bidram; Manel Martinez-Ramon; Mueen Abdullah (2025). Power, voltage, frequency and temperature dataset from Mesa Del Sol microgrid [Dataset]. http://doi.org/10.5061/dryad.fqz612jzb
    Explore at:
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Adnan Bashir; Christopher Leap; Ansel Blumenthal; Trilce Estrada; Ali Bidram; Manel Martinez-Ramon; Mueen Abdullah
    Time period covered
    Jan 1, 2023
    Description

    Microgrids are small, self-contained power grids that can operate independently of the main grid. They are becoming increasingly popular as a way to improve the reliability and resilience of the power grid. This paper presents a dataset of power data collected from the Mesa Del Sol microgrid located in Albuquerque, New Mexico. The dataset includes measurements of voltage, current, power, and energy for the microgrid's components. It contains 18 features and was collected over the past 13 months. The dataset is valuable for machine learning applications that can improve the operation and management of microgrids; for example, the data could be used to train machine learning models to predict power outages or to optimise the microgrid's energy consumption.

    MDS, a multivariate microgrid dataset

    https://doi.org/10.5061/dryad.fqz612jzb

    This dataset contains power data collected from Mesa Del Sol microgrid located in Albuquerque, New Mexico. The dataset includes measurements of voltage, power, frequency and temperature of different sensors and devices installed at the microgrid. This dataset contains 17 features and was collected over the past 13 months. The dataset is valuable for machine learning applications that can be used to improve the operation and management of microgrids. For example, the data could be used to train machine learning models to predict power outages or to optimise the microgrid's energy consumption.

    Description of the data and file structure

    The dataset is divided into monthly CSV files. Total number of CSV files: 15 (1 file per month). Data resolution: 10 seconds. Total number of features: 17. Start date of data collection: 1 May 2022. End date: 31 July 2023.
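    At a fixed 10-second resolution, the expected number of rows in each monthly file can be sanity-checked with a few lines of arithmetic (the helper name below is illustrative, not part of the dataset's tooling):

```python
import calendar

def expected_rows(year: int, month: int, resolution_s: int = 10) -> int:
    """Expected sample count for one monthly file at a fixed sampling interval."""
    days = calendar.monthrange(year, month)[1]
    return days * 24 * 3600 // resolution_s

print(expected_rows(2022, 5))  # May 2022: 31 days -> 267840 samples
```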

    Each ...

  16. Accompanying simulated data for "Go multivariate: a Monte Carlo study of a...

    • data-staging.niaid.nih.gov
    Updated Mar 25, 2022
    Cite
    Mildiner Moraga, Sebastian; Aarts, Emmeke (2022). Accompanying simulated data for "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_6384006
    Explore at:
    Dataset updated
    Mar 25, 2022
    Dataset provided by
    Utrecht University
    Authors
    Mildiner Moraga, Sebastian; Aarts, Emmeke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored, and there are currently no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

    This repository contains data generated for the manuscript: "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript is not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.
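    The kind of categorical time series simulated in this study can be sketched with a toy generator. The 2-state, 3-category transition and emission probabilities below are invented for illustration; they are not the group-level parameters from the manuscript, and this sketch does not use the mHMMbayes package:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = np.array([[0.9, 0.1],        # hypothetical state-transition matrix (2 states)
                  [0.2, 0.8]])
emiss = np.array([[0.7, 0.2, 0.1],   # hypothetical categorical emission matrix
                  [0.1, 0.3, 0.6]])  # (3 outcome categories)

def simulate_sequence(n_obs: int) -> np.ndarray:
    """Simulate one individual's categorical observation sequence from the HMM."""
    state = 0
    obs = np.empty(n_obs, dtype=int)
    for t in range(n_obs):
        obs[t] = rng.choice(3, p=emiss[state])   # emit a category from the current state
        state = rng.choice(2, p=gamma[state])    # then transition
    return obs

seq = simulate_sequence(400)
```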

  17. IRIS-FLOWER-DATASETS

    • kaggle.com
    zip
    Updated Dec 26, 2022
    + more versions
    Cite
    Simran Sharma (2022). IRIS-FLOWER-DATASETS [Dataset]. https://www.kaggle.com/datasets/sims22/irisflowerdatasets/discussion?sort=undefined
    Explore at:
    zip(2025 bytes)Available download formats
    Dataset updated
    Dec 26, 2022
    Authors
    Simran Sharma
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Abstract

    (from wikipedia)

    The Iris flower data set or Fisher's Iris data set is a multivariate data set used and made famous by the British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. Two of the three species were collected in the Gaspé Peninsula "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus".

    The Datasets

    The dataset IRIS.CSV consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

    The dataset IRIS1.CSV is a modified version of IRIS.CSV, containing missing values.
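    Working with a file like IRIS1.CSV typically starts by filtering out incomplete records. A minimal stdlib sketch, using toy rows rather than the actual file contents:

```python
import csv
import io

# Toy stand-in for IRIS1.CSV (not the real data): one row has a missing field.
raw = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,,1.4,0.2,setosa\n"          # missing sepal_width
    "6.3,3.3,6.0,2.5,virginica\n"
)
rows = list(csv.DictReader(raw))
# Keep only records in which every field is present and non-empty.
complete = [r for r in rows if all(v not in ("", None) for v in r.values())]
print(len(rows), len(complete))  # 3 2
```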

    Acknowledgements

    The dataset, IRIS.CSV, is free and is publicly available at the UCI Machine Learning Repository.

  18. Accompanying simulated data for "Go multivariate: a Monte Carlo study of a...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    • +1more
    Updated Mar 25, 2022
    Cite
    Aarts, Emmeke (2022). Accompanying simulated data for "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity" [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_6384006
    Explore at:
    Dataset updated
    Mar 25, 2022
    Dataset provided by
    Mildiner Moraga, Sebastian
    Aarts, Emmeke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored, and there are currently no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

    This repository contains data generated for the manuscript: "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript is not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.

  19. Component loadings for a previously reported real-life example of a...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Alfred Ultsch; Jörn Lötsch (2023). Component loadings for a previously reported real-life example of a principal component analysis performed on the intercorrelation matrix among eight pain threshold measurements ([3]; for comparison, see Table 2 in that publication). [Dataset]. http://doi.org/10.1371/journal.pone.0129767.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Alfred Ultsch; Jörn Lötsch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The relevant four principal components (PCs) are given in bold font. Without the present method, only PCs #1–#3, with eigenvalues > 1 [11,12], could be validly retained. This set of three principal components showed that all pain measures shared an important common source of variance (PC1); that pain evoked by cold stimuli (with or without sensitization by topical menthol application), by blunt pressure, or by electrical stimuli (5 Hz sine waves) shared a common source of variance (PC2); and that a further common source of variance was shared by pain evoked by heat stimuli (with or without sensitization by topical capsaicin application) or by punctate mechanical pressure. With the method reported here, however, PC4 can now also be retained, which singles out heat pain, corresponding to the different pathophysiology underlying heat perception. Component loadings for a previously reported real-life example of a principal component analysis performed on the intercorrelation matrix among eight pain threshold measurements ([3]; for comparison, see Table 2 in that publication).
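    The eigenvalues-greater-than-1 retention rule mentioned above (the Kaiser criterion) can be sketched on synthetic stand-in data for the eight measurements; the data below are simulated for illustration only, not the pain-threshold measurements themselves:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for 8 measurements across 200 subjects.
x = rng.standard_normal((200, 8))
x[:, 1] += 0.8 * x[:, 0]                           # induce some shared variance

corr = np.corrcoef(x, rowvar=False)                # 8 x 8 intercorrelation matrix
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # eigenvalues, descending
retained = int((eigvals > 1.0).sum())              # Kaiser rule: keep eigenvalues > 1
```

For a correlation matrix, the eigenvalues sum to the number of variables, so any component with an eigenvalue above 1 explains more variance than a single standardized variable.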

  20. Accompanying simulated data for "Go multivariate: recommendations on...

    • zenodo.org
    zip
    Updated Mar 26, 2022
    + more versions
    Cite
    Sebastian Mildiner Moraga; Sebastian Mildiner Moraga; Emmeke Aarts; Emmeke Aarts (2022). Accompanying simulated data for "Go multivariate: recommendations on multilevel hidden Markov models with categorical data of varying complexity" [Dataset]. http://doi.org/10.5281/zenodo.6385197
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 26, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Sebastian Mildiner Moraga; Sebastian Mildiner Moraga; Emmeke Aarts; Emmeke Aarts
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored, and there are currently no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

    This repository contains data generated for the manuscript: "Go multivariate: recommendations on multilevel hidden Markov models with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript is not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.
