Study data of a comparative study of visualizations for multiple time series, including anonymized participant data from Prolific, data set generation scripts, source code for the study framework, and analysis scripts. This repository also serves as supplemental material for the publication "A Comparative Study of Visualizations for Multiple Time Series", presented at IVAPP 2022. The goal of the study was to gain insight into how well three visualization techniques for multiple time series (line charts, stream graphs, and aligned area charts) support three basic tasks: deciding which time series has the highest value at a given time, deciding which time series has the highest value over all time steps (area under the graph), and deciding at which of two time points the sum of all time series is larger. The study was performed online on the Prolific platform with 51 participants. Each participant was shown at least 108 stimuli. The measured data for each participant consist mainly of which stimuli they answered correctly and how long they took. For more information about the data, please consult the paper and the README.txt.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends. TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations within the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds to the temporal range 2010–2019, and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014. This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will be needed to accommodate year ranges other than the one provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624
TSMx sample chart from the supplied Excel template.
Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).

TSMx R script:

# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)
options(warn = -1) # disable warnings

# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")

# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]

# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}

# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
  batch <- data[keep,]
  return(batch)
}

# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}

# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}

# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[['Value']])
    pval <- mk[['sl']]
  } else {
    pval <- NA
  }
  return(pval)
}

# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}

# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}

# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}

# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)

# fill in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <-...
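For reference, the matrix construction can also be sketched outside R. The short Python sketch below is illustrative only (not part of the published TSMx code; the example years and values are made up): it fits a linear regression to every start/end year combination and stores the slope, mirroring the layout described above (start year on the y-axis, end year on the x-axis).

```python
import numpy as np

def tsm_matrices(years, values):
    """Slope sign and magnitude for every start/end year combination.

    Row i corresponds to start year years[i]; column j to end year
    years[j + 1]. Cells below the diagonal stay NaN, as in TSMx.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(years) - 1
    slope = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i, n):
            y = years[i:j + 2]            # start year .. end year, inclusive
            v = values[i:j + 2]
            slope[i, j] = np.polyfit(y, v, 1)[0]  # slope of the linear fit
    return np.sign(slope), slope

# made-up example: four years of values
signs, slopes = tsm_matrices([2001, 2002, 2003, 2004], [1.0, 2.0, 1.5, 3.0])
```

The Mann-Kendall significance matrix from the R script has no direct NumPy equivalent and is omitted here for brevity.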
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TimeSpec4LULC is an open-source global dataset of multi-spectral time series for 29 Land Use and Land Cover (LULC) classes, ready to train machine learning models. It was built from the seven spectral bands of the MODIS sensors at 500 m resolution, from 2000 to 2021 (262 observations in each time series), and was then annotated using the spatial-temporal agreement across the 15 global LULC products available in Google Earth Engine (GEE).
TimeSpec4LULC contains two datasets: the original dataset, distributed over 6,076,531 pixels, and a balanced subset of the original dataset, distributed over 29,000 pixels.
The original dataset contains 30 folders: a "Metadata" folder and 29 folders corresponding to the 29 LULC classes. The "Metadata" folder holds 29 CSV files describing the metadata of the 29 LULC classes. The remaining 29 folders contain the time series data for the 29 LULC classes; each folder holds 262 CSV files corresponding to the 262 months. Inside each CSV file, we provide the seven values of the spectral bands as well as the coordinates for all the pixels of that LULC class.
The balanced subset of the original dataset contains the metadata and the time series data for 1,000 pixels per class, representative of the globe. It holds 29 JSON files following the names of the 29 LULC classes.
The features of the dataset are:
".geo": the geometry and coordinates (longitude and latitude) of the pixel center.
"ADM0_Code": the GAUL country code.
"ADM1_Code": the GAUL first-level administrative unit code.
"GHM_Index": the average of the global human modification index.
"Products_Agreement_Percentage": the agreement percentage over the 15 global LULC products available in GEE.
"Temporal_Availability_Percentage": the percentage of non-missing values in each band.
"Pixel_TS": the time series values of the seven spectral bands.
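To illustrate how per-pixel time series can be assembled from the monthly tables, here is a minimal pandas sketch. The column names ('lon', 'lat') and the in-memory frames are illustrative assumptions; consult the metadata CSVs for the actual schema.

```python
import pandas as pd

def stack_monthly_tables(monthly_frames, band_cols):
    """Stack per-month pixel tables into one long table.

    monthly_frames: list of DataFrames, one per month, each with
    'lon'/'lat' pixel coordinates and the spectral band columns.
    Returns a long DataFrame with an added 'month' column, so the rows
    of each pixel form its multi-spectral time series.
    """
    frames = []
    for month, df in enumerate(monthly_frames):
        part = df[["lon", "lat"] + band_cols].copy()
        part["month"] = month
        frames.append(part)
    long = pd.concat(frames, ignore_index=True)
    return long.sort_values(["lon", "lat", "month"]).reset_index(drop=True)
```

With all 262 monthly CSVs of a class read into frames, the result gives one 262-step series per band per pixel.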
Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns in these MTS databases, which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R-tree Based Search (RBS), which uses Minimum Bounding Rectangles (MBRs) to organize the subsequences, and (2) a List Based Search (LBS) algorithm, which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several million observations. Both tests show that our algorithms have very high prune rates (>95%), requiring disk access for less than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never before been attempted on datasets of the size used in this paper.
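The MBR-based pruning idea behind the R-tree Based Search can be illustrated in one dimension: the gap between two bounding intervals lower-bounds the distance between any pair of points inside them, so candidate subsequences whose bound already exceeds the query threshold can be discarded without reading the raw data. The following is a toy sketch of that principle (our illustration, not the authors' implementation):

```python
def mbr(seq):
    """Minimum bounding rectangle of a 1-D subsequence: (min, max)."""
    return (min(seq), max(seq))

def lower_bound(a, b):
    """Smallest possible distance between any point in interval a and any in b."""
    (alo, ahi), (blo, bhi) = a, b
    return max(0.0, alo - bhi, blo - ahi)

def candidates(query, subsequences, threshold):
    """Indices of subsequences that survive MBR pruning for the query.

    Only these survivors would require access to the underlying data;
    the rest are provably farther than the threshold.
    """
    q = mbr(query)
    return [i for i, s in enumerate(subsequences)
            if lower_bound(q, mbr(s)) <= threshold]
```

In the actual RBS algorithm the MBRs are organized in an R-tree so whole subtrees can be pruned at once, rather than scanning a flat list as here.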
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistical comparison of multiple time series in their underlying frequency patterns has many real applications. However, existing methods are only applicable to a small number of mutually independent time series, and empirical results for dependent time series are only limited to comparing two time series. We propose scalable methods based on a new algorithm that enables us to compare the spectral density of a large number of time series. The new algorithm helps us efficiently obtain all pairwise feature differences in frequency patterns between M time series, which plays an essential role in our methods. When all M time series are independent of each other, we derive the joint asymptotic distribution of their pairwise feature differences. The asymptotic dependence structure between the feature differences motivates our proposed test for multiple mutually independent time series. We then adapt this test to the case of multiple dependent time series by partially accounting for the underlying dependence structure. Additionally, we introduce a global test to further enhance the approach. To examine the finite sample performance of our proposed methods, we conduct simulation studies. The new approaches demonstrate the ability to compare a large number of time series, whether independent or dependent, while exhibiting competitive power. Finally, we apply our methods to compare multiple mechanical vibrational time series.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All results of the primary interrupted time-series analyses evaluating targeted and total border closures that met the following criteria: 1) at least seven days of data are available before and after the intervention point, 2) for multiple-intervention time series, at least seven days have passed since the last intervention point, and 3) for multiple sequential targeted border closures, the second (or third) intervention is observed to indicate an increase of at least 20% of the world's population being targeted by the new border closures.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset serves as supplementary material to the fully reproducible paper entitled "Comparison of stochastic and machine learning methods for multi-step ahead forecasting of hydrological processes". We provide the R codes and their outcomes. We also provide the reports entitled "Definitions of the stochastic processes", "Definitions of the forecast quality metrics" and "Selected figures for the qualitative comparison of the forecasting methods". The former version of this dataset is available at the provided link.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AcTBeCalf Dataset Description
The AcTBeCalf dataset is a comprehensive dataset designed to support the classification of pre-weaned calf behaviors from accelerometer data. It contains detailed accelerometer readings aligned with annotated behaviors, providing a valuable resource for research in multivariate time-series classification and animal behavior analysis. The dataset includes accelerometer data collected from 30 pre-weaned Holstein Friesian and Jersey calves, housed in group pens at the Teagasc Moorepark Research Farm, Ireland. Each calf was equipped with a 3D accelerometer sensor (AX3, Axivity Ltd, Newcastle, UK) sampling at 25 Hz, attached to a neck collar from one week of age for 13 weeks.
This dataset encompasses 27.4 hours of accelerometer data aligned with calf behaviors, including prominent behaviors such as lying, standing, and running, as well as less frequent behaviors such as grooming, social interaction, and abnormal behaviors.
The dataset consists of a single CSV file with the following columns:
dateTime: Timestamp of the accelerometer reading, sampled at 25 Hz.
calfid: Identification number of the calf (1-30).
accX: Accelerometer reading for the X axis (top-bottom direction)*.
accY: Accelerometer reading for the Y axis (backward-forward direction)*.
accZ: Accelerometer reading for the Z axis (left-right direction)*.
behavior: Annotated behavior based on an ethogram of 23 behaviors.
segId: Segment identification number associated with each accelerometer reading/row, representing all readings of the same behavior segment.
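Given the columns above, behavior segments can be recovered by grouping rows on segId. A minimal pandas sketch (assuming only the column names listed above):

```python
import pandas as pd

def behavior_segments(df):
    """Split the AcTBeCalf table into per-segment accelerometer windows.

    Returns {segId: (behavior, DataFrame of accX/accY/accZ rows)}, so each
    window can be fed to a time-series classifier.
    """
    segments = {}
    for seg_id, g in df.groupby("segId"):
        behavior = g["behavior"].iloc[0]  # one behavior label per segment
        window = g[["accX", "accY", "accZ"]].reset_index(drop=True)
        segments[seg_id] = (behavior, window)
    return segments
```

Because segId already groups all readings of the same behavior bout, no timestamp logic is needed for this step.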
Code Files Description
The dataset is accompanied by several code files to facilitate the preprocessing and analysis of the accelerometer data and to support the development and evaluation of machine learning models. The main code files included in the dataset repository are:
accelerometer_time_correction.ipynb: This script corrects the accelerometer time drift, ensuring the alignment of the accelerometer data with the reference time.
shake_pattern_detector.py: This script includes an algorithm to detect shake patterns in the accelerometer signal for aligning the accelerometer time series with reference times.
aligning_accelerometer_data_with_annotations.ipynb: This notebook aligns the accelerometer time series with the annotated behaviors based on timestamps.
manual_inspection_ts_validation.ipynb: This notebook provides a manual inspection process for ensuring the accurate alignment of the accelerometer data with the annotated behaviors.
additional_ts_generation.ipynb: This notebook generates additional time-series data from the original X, Y, and Z accelerometer readings, including Magnitude, ODBA (Overall Dynamic Body Acceleration), VeDBA (Vectorial Dynamic Body Acceleration), pitch, and roll.
genSplit.py: This script provides the logic used for the generalized subject separation for machine learning model training, validation and testing.
active_inactive_classification.ipynb: This notebook details the process of classifying behaviors into active and inactive categories using a RandomForest model, achieving a balanced accuracy of 92%.
four_behv_classification.ipynb: This notebook employs the mini-ROCKET feature derivation mechanism and a RidgeClassifierCV to classify behaviors into four categories: drinking milk, lying, running, and other, achieving a balanced accuracy of 84%.
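The derived series mentioned in additional_ts_generation.ipynb follow standard accelerometry formulas. The sketch below uses the common definitions (magnitude from the raw axes; ODBA and VeDBA from the dynamic components after removing a moving-average static component; pitch and roll from axis ratios). The smoothing window and the axis conventions are assumptions; the notebook's exact parameters may differ.

```python
import numpy as np

def derived_signals(x, y, z, window=25):
    """Derive Magnitude, ODBA, VeDBA, pitch, and roll from raw X/Y/Z.

    window: length of the centered moving average used to estimate the
    static (gravity) component; assumed here, not taken from the notebook.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    magnitude = np.sqrt(x**2 + y**2 + z**2)
    # static component via a centered moving average; dynamic = raw - static
    kernel = np.ones(window) / window
    static = [np.convolve(a, kernel, mode="same") for a in (x, y, z)]
    dx, dy, dz = (a - s for a, s in zip((x, y, z), static))
    odba = np.abs(dx) + np.abs(dy) + np.abs(dz)    # overall dynamic body acc.
    vedba = np.sqrt(dx**2 + dy**2 + dz**2)         # vectorial dynamic body acc.
    pitch = np.degrees(np.arctan2(x, np.sqrt(y**2 + z**2)))
    roll = np.degrees(np.arctan2(y, np.sqrt(x**2 + z**2)))
    return magnitude, odba, vedba, pitch, roll
```

Note that pitch/roll signs depend on how the sensor is mounted on the collar; the orientation mapping in the dataset's column descriptions (top-bottom, backward-forward, left-right) should be checked before interpreting them.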
Kindly cite one of the following papers when using this data:
Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Evaluating ROCKET and Catch22 features for calf behaviour classification from accelerometer data using Machine Learning models. arXiv preprint arXiv:2404.18159.
Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Development of a digital tool for monitoring the behaviour of pre-weaned calves using accelerometer neck-collars. arXiv preprint arXiv:2406.17352.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network traffic datasets created by Single Flow Time Series Analysis
Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:
J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.
This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains 69 features created by time series analysis of single flow time series. A detailed description of the features is provided in the file feature_description.pdf.
The following table describes each dataset file:
File name Detection problem Citation of original raw dataset
botnet_binary.csv Binary detection of botnet S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
botnet_multiclass.csv Multi-class classification of botnet S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
cryptomining_design.csv Binary detection of cryptomining; the design part Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
cryptomining_evaluation.csv Binary detection of cryptomining; the evaluation part Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
dns_malware.csv Binary detection of malware DNS Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
doh_cic.csv Binary detection of DoH Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
doh_real_world.csv Binary detection of DoH Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
dos.csv Binary detection of DoS Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
edge_iiot_binary.csv Binary detection of IoT malware Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
edge_iiot_multiclass.csv Multi-class classification of IoT malware Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
https_brute_force.csv Binary detection of HTTPS Brute Force Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
ids_cic_binary.csv Binary detection of intrusion in IDS Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
ids_cic_multiclass.csv Multi-class classification of intrusion in IDS Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
ids_unsw_nb_15_binary.csv Binary detection of intrusion in IDS Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
ids_unsw_nb_15_multiclass.csv Multi-class classification of intrusion in IDS Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
iot_23.csv Binary detection of IoT malware Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23
ton_iot_binary.csv Binary detection of IoT malware Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
ton_iot_multiclass.csv Multi-class classification of IoT malware Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
tor_binary.csv Binary detection of TOR Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
tor_multiclass.csv Multi-class classification of TOR Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
vpn_iscx_binary.csv Binary detection of VPN Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
vpn_iscx_multiclass.csv Multi-class classification of VPN Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
vpn_vnat_binary.csv Binary detection of VPN Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
vpn_vnat_multiclass.csv Multi-class classification of VPN Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
The Global Monthly and Seasonal Urban and Land Backscatter Time Series, 1993-2020, is a multi-sensor, multi-decadal data set of global microwave backscatter for 1993 to 2020. It assembles data from C-band sensors onboard the European Remote Sensing Satellites (ERS-1 and ERS-2) covering 1993-2000, the Advanced Scatterometer (ASCAT) onboard EUMETSAT satellites for 2007-2020, and the Ku-band sensor onboard the QuikSCAT satellite for 1999-2009, onto a common spatial grid (0.05 degree latitude/longitude resolution) and time step (both monthly and seasonal). Data are provided for all land (except high latitudes and islands), and for urban grid cells, based on a specific masking that removes grid cells with > 50% open water or < 20% built land. The all-land data allow users to choose and evaluate other urban masks. There is an offset between C-band and Ku-band backscatter from both vegetated and urban surfaces that is not spatially constant. There is a strong linear correlation (overall R-squared value = 0.69) between 2015 ASCAT urban backscatter and a continental-scale gridded product of building volume, across 8,450 urban grid cells (0.05 degree resolution) from large cities in Europe, China, and the United States.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains along-track geo-referenced Sea Surface Height Anomalies (SSHA) from TOPEX/Poseidon, Jason-1, OSTM/Jason-2 and Jason-3 (depending on time period) merged onto a single mean reference orbit. All biases and cross-calibrations have been applied to the data so the SSHA are consistent between satellites and form a single climate data record. Altimeter data from the multi-mission Geophysical Data Records (GDRs) are interpolated to a common reference orbit, facilitating direct time series analysis of the geo-referenced SSH. However, this product does not use the TOPEX internal calibration-mode range correction; this is the main difference between version 4 and version 4.2. More information on this calibration can be found in Beckley et al. 2017, DOI 10.1002/2017JC013090. The data are in netCDF format. The data record starts in September 1992. The newest data are appended to the file quarterly.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Index Time Series for Capital Group U.S. Multi-Sector Income ETF. The frequency of the observation is daily. Moving average series are also typically included. The fund normally invests at least 80% of its assets in the securities of issuers domiciled within the United States. The fund invests primarily in bonds and other debt instruments, which may be represented by derivatives. In seeking to achieve a high level of current income, the fund invests in a broad range of debt securities across the credit spectrum. The fund may invest in debt securities of any maturity or duration. The fund is non-diversified.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The electric grid is a key enabling infrastructure for the ambitious transition towards carbon neutrality as we grapple with climate change. With deepening penetration of renewable energy resources and electrified transportation, the reliable and secure operation of the electric grid becomes increasingly challenging. In this paper, we present PSML, a first-of-its-kind open-access multi-scale time-series dataset, to aid in the development of data-driven machine learning (ML) based approaches towards reliable operation of future electric grids. The dataset is generated through a novel transmission + distribution (T+D) co-simulation designed to capture the increasingly important interactions and uncertainties of the grid dynamics, containing electric load, renewable generation, weather, voltage and current measurements at multiple spatio-temporal scales. Using PSML, we provide state-of-the-art ML baselines on three challenging use cases of critical importance to achieve: (i) early detection, accurate classification and localization of dynamic disturbance events; (ii) robust hierarchical forecasting of load and renewable energy with the presence of uncertainties and extreme events; and (iii) realistic synthetic generation of physical-law-constrained measurement time series. We envision that this dataset will enable advances for ML in dynamic systems, while simultaneously allowing ML researchers to contribute towards carbon-neutral electricity and mobility.
Data Navigation
Please download and unzip the archive, and place it where convenient; it is needed later for reproducing the benchmark results, for data loading, and for performance evaluation of the proposed methods.
wget https://zenodo.org/record/5130612/files/PSML.zip?download=1
7z x 'PSML.zip?download=1' -o./
Minute-level Load and Renewable
File Name
ISO_zone_#.csv: CAISO_zone_1.csv contains minute-level load, renewable and weather data from 2018 to 2020 in zone 1 of CAISO.
Field time: Time of minute resolution.
Field load_power: Normalized load power.
Field wind_power: Normalized wind turbine power.
Field solar_power: Normalized solar PV power.
Field DHI: Diffuse horizontal irradiance.
Field DNI: Direct normal irradiance.
Field GHI: Global horizontal irradiance.
Field Dew Point: Dew point in degrees Celsius.
Field Solar Zeinth Angle: The angle between the sun's rays and the vertical direction, in degrees.
Field Wind Speed: Wind speed (m/s).
Field Relative Humidity: Relative humidity (%).
Field Temperature: Temperature in degrees Celsius.
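As an example of working with this schema, the minute-level table can be loaded with pandas and aggregated to a coarser resolution. A small sketch (the helper name and the hourly aggregation are our illustration, not part of PSML):

```python
import pandas as pd

def hourly_means(csv_frame):
    """Average the minute-level series to hourly resolution.

    csv_frame: DataFrame shaped like ISO_zone_#.csv, with a 'time'
    column and numeric fields such as load_power/wind_power/solar_power.
    """
    df = csv_frame.copy()
    df["time"] = pd.to_datetime(df["time"])
    return df.set_index("time").resample("1h").mean()
```

In practice the frame would come from pd.read_csv("CAISO_zone_1.csv"); the same pattern applies to any of the ISO zone files.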
Minute-level PMU Measurements
File Name
case #: The case 0 folder contains all data of scenario setting #0.
pf_input_#.txt: Selected load, renewable and solar generation for the simulation.
pf_result_#.csv: Voltage at nodes and power on branches in the transmission system via T+D simulation.
Field Description
Field time: Time of minute resolution.
Field Vm_###: Voltage magnitude (p.u.) at bus ### in the simulated model.
Field Va_###: Voltage angle (rad) at bus ### in the simulated model.
Field P_#_#_#: e.g., P_3_4_1 means the active power transferred in branch #1 from bus 3 to bus 4.
Field Q_#_#_#: e.g., Q_5_20_1 means the reactive power transferred in branch #1 from bus 5 to bus 20.
Millisecond-level PMU Measurements
File Name
Forced Oscillation: The folder contains all forced oscillation cases.
row_#: The folder contains all data of disturbance scenario #.
dist.csv: Three-phase voltage at nodes in the distribution system via T+D simulation.
info.csv: This file contains the start time, end time, location and type of the disturbance.
trans.csv: Voltage at nodes and power on branches in the transmission system via T+D simulation.
Natural Oscillation: The folder contains all natural oscillation cases.
row_#: The folder contains all data of disturbance scenario #.
dist.csv: Three-phase voltage at nodes in the distribution system via T+D simulation.
info.csv: This file contains the start time, end time, location and type of the disturbance.
trans.csv: Voltage at nodes and power on branches in the transmission system via T+D simulation.
Field Description
trans.csv
Field Time(s): Time of millisecond resolution.
Field VOLT ###: Voltage magnitude (p.u.) at bus ### in the transmission model.
Field POWR ### TO ### CKT #: e.g., POWR 151 TO 152 CKT '1 ' means the active power transferred in branch #1 from bus 151 to bus 152.
Field VARS ### TO ### CKT #: e.g., VARS 151 TO 152 CKT '1 ' means the reactive power transferred in branch #1 from bus 151 to bus 152.
dist.csv
Field Time(s): Time of millisecond resolution.
Field ####.###.#: e.g., 3005.633.1 means the per-unit voltage magnitude of phase A at bus 633 of the distribution grid, the one connecting to bus 3005 in the transmission system.
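The dist.csv column naming can be unpacked programmatically. A small sketch, assuming the trailing digit maps 1/2/3 to phases A/B/C (the field description above only confirms .1 as phase A; the rest of the mapping is an assumption):

```python
def parse_dist_column(name):
    """Split a dist.csv column like '3005.633.1' into its parts.

    Returns (transmission_bus, distribution_bus, phase), mapping the
    trailing digit 1->'A', 2->'B', 3->'C' (assumed from the phase-A
    example in the field description).
    """
    trans_bus, dist_bus, phase_digit = name.split(".")
    phase = {"1": "A", "2": "B", "3": "C"}[phase_digit]
    return int(trans_bus), int(dist_bus), phase
```

This makes it easy to select, say, all phase-A columns of one transmission bus when loading dist.csv with pandas.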
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Index Time Series for Multi Units Luxembourg - Lyxor Daily LevDAX UCITS ETF. The frequency of the observation is daily. Moving average series are also typically included.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a multi-temporal, multi-modal remote-sensing dataset for predicting how active wildfires will spread at a resolution of 24 hours. The dataset consists of 13,607 images across 607 fire events in the United States from January 2018 to October 2021. For each fire event, the dataset contains a full time series of daily observations, containing detected active fires and variables related to fuel, topography and weather conditions. Documentation: WildfireSpreadTS_Documentation.pdf includes further details about the dataset, following Gebru et al.'s "Datasheets for Datasets" framework. This documentation is similar to the supplementary material of the associated NeurIPS paper, excluding only information about the experimental setup and results. For full details, please refer to the associated paper. Code: Get started working with the dataset at https://github.com/SebastianGer/WildfireSpreadTS. The code includes a PyTorch Dataset and a Lightning DataModule to allow for easy access. We recommend converting the GeoTIFF files provided here to HDF5 files (bigger files, but much faster to load). The necessary code is also available in the repository.
This work is funded by Digital Futures in the project EO-AI4GlobalChange. The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at C3SE partially funded by the Swedish Research Council through grant agreement no. 2022-06725.
The Integrated Multi-Mission Ocean Altimeter Sea Surface Height (SSH) Version 5.2 dataset provides level 2 along track sea surface height anomalies (SSHA) from the TOPEX/Poseidon, Jason-1, OSTM/Jason-2, Jason-3, and Sentinel-6A missions geo-referenced to a mean reference orbit. It is produced by NASA Sea Surface Height (NASA-SSH) project investigators at Goddard Space Flight Center and Jet Propulsion Laboratory with support from NASA's Physical Oceanography program, and was developed originally as an Earth System Data Record (ESDR) under the Making Earth System Data Records for Use in Research Environments (MEaSUREs) program, which supported forward processing and incremental refinements through version 5.1 (released in April 2022). Geophysical Data Records (GDRs) from each altimetry mission were interpolated to a common reference orbit with biases and cross-calibrations applied so that the derived SSHA are consistent between satellites to form a single homogeneous climate data record. The entire multi-mission data record covers the period from September 1992 to present; it is extended to include new observations approximately once each quarter. The previous release (version 5.1) integrated Jason-3 data and applied revised internal tides and pole tide across missions (GDR_F standard). The current release (version 5.2) includes the following revisions: a) GSFC std2006_cs21 orbit for all missions, b) GOT5.1 ocean tide model, c) TOPEX/Poseidon GDR_F data, d) Sentinel-6 LR version F08 data, e) Jason-3 re-calibrated radiometer wet troposphere correction. More information about the data content and derivation can be found in the v5.2 User's Handbook (https://doi.org/10.5067/ALTUG-TJ152). Please note that this collection is the same data as https://doi.org/10.5067/ALTCY-TJA52 but with all cycles included in one netCDF file.
The dataset contains hourly Anthropogenic heat (AH) from buildings in Los Angeles County, based on weather data from 2018. The hourly AH is aggregated at three spatial resolutions: 450m x 450m grid, 12km x 12km grid, and census tract. The AH is broken down into three components: building envelope surface convection, heating, ventilation, and air conditioning (HVAC) system heat release, and zone exfiltration and exhaust air heat loss. The dataset is created with the physics-based EnergyPlus building energy models to calculate individual buildings' AH considering WRF-UCM simulated microclimate conditions. Please refer to the paper "A multi-scale time-series dataset of anthropogenic heat from buildings in Los Angeles County" for more information about the data generation workflow and the data validation procedure. The data set contains two folders: the "output_data" folder holds the simulation results (EP_output and EP_output_csv), building metadata (building_metadata.geojson and building_metadata.csv), aggregated heat emission and energy consumption time-series data (hourly_heat_energy), and geographical data (geo_data) associated with the GEOID referenced in heat and energy consumption data. The "input_data" folder contains the raw data used to generate files in the "output_data" folder as well as data sets used in the validation. The code repository (https://github.com/IMMM-SFA/xu_etal_2022_sdata) holds the processing scripts for data curation, validation, and visualization.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Preprocessed AL data used in the paper "Multi variables time series information bottleneck", together with the GitHub code.
This dataset is created from a publicly available dataset of solar power data collected in Alabama by NREL.
The .npz file is a NumPy compressed archive and can be loaded using np.load with allow_pickle=True.
The loaded data is a Python dict, described below.
Each sample 'data' is an np.ndarray with two dimensions: time (varying length) and wavelength (length 137, representing 137 solar plants ordered as in NREL).
Each sample is given a 'position', which is a list of length 4:
position[1] is a string giving the name of the event
position[4] is a boolean vector giving the time positions of the corresponding sample in the original sequence of public IRIS level2 data
Data file info :
Type: .npz
Size: 34.48MB
*** Key: 'data_TR_AL'
ndarray data of length 161
containing np.ndarray of shapes ['various', 137]
*** Key: 'data_VAL_AL'
ndarray data of length 11
containing np.ndarray of shapes ['various', 137]
*** Key: 'data_TE_AL'
ndarray data of length 57
containing np.ndarray of shapes ['various', 137]
*** Key: 'data_TR'
ndarray data of length 161
containing np.ndarray of shapes ['various', 137]
*** Key: 'data_VAL'
ndarray data of length 11
containing np.ndarray of shapes ['various', 137]
*** Key: 'data_TE'
ndarray data of length 57
containing np.ndarray of shapes ['various', 137]
*** Key: 'position_TR_AL'
ndarray data of length 161
containing ndarray data of length 4
containing mix of types {'str', 'ndarray', 'int'}
*** Key: 'position_VAL_AL'
ndarray data of length 11
containing ndarray data of length 4
containing mix of types {'str', 'ndarray', 'int'}
*** Key: 'position_TE_AL'
ndarray data of length 57
containing ndarray data of length 4
containing mix of types {'str', 'ndarray', 'int'}
*** Key: 'position_TR'
ndarray data of length 161
containing ndarray data of length 4
containing mix of types {'str', 'ndarray', 'int'}
*** Key: 'position_VAL'
ndarray data of length 11
containing ndarray data of length 4
containing mix of types {'str', 'ndarray', 'int'}
*** Key: 'position_TE'
ndarray data of length 57
containing ndarray data of length 4
containing mix of types {'str', 'ndarray', 'int'}
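The load pattern described above can be sketched as follows. To keep the snippet self-contained, it first builds a tiny stand-in .npz archive with the same layout (object array of variable-length samples under a key from the listing above); with the real file, only the np.load call is needed.

```python
import numpy as np

# Build a small stand-in archive mimicking the described layout:
# an object array whose entries are (time, 137) arrays of varying length.
samples = np.empty(2, dtype=object)
samples[0] = np.zeros((50, 137))
samples[1] = np.zeros((80, 137))
np.savez_compressed("demo.npz", data_TR_AL=samples)

# Object arrays are pickled, so loading requires allow_pickle=True.
archive = np.load("demo.npz", allow_pickle=True)
data = dict(archive)  # the loaded NpzFile behaves like a dict

train = data["data_TR_AL"]
print(len(train), train[0].shape)  # 2 (50, 137)
```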
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The application of machine learning has become commonplace for problems in modern data science. The democratization of the decision process when choosing a machine learning algorithm has also received considerable attention through the use of meta features and automated machine learning for both classification and regression problems. However, this is not the case for multistep-ahead time series problems. Time series models generally rely upon the series itself to make future predictions, as opposed to the independent features used in regression and classification problems. The structure of a time series is generally described by features such as trend, seasonality, cyclicality, and irregularity. In this research, we demonstrate how time series metrics for these features, in conjunction with an ensemble-based regression learner, were used to predict the standardized mean square error of candidate time series prediction models. These experiments used datasets that cover a wide feature space and enable researchers to select the single best performing model or the top N performing models. A robust evaluation was carried out to test the learner's performance on both synthetic and real time series.
Proposed Dataset
The dataset proposed here gives the results of 20-step-ahead predictions for eight machine learning/multistep-ahead prediction strategies applied to the 5,842 time series datasets outlined here. It was used as the training data for the meta learners in this research. The meta features used are in columns C to AE. Column AH gives the method/strategy used, and columns AI to BB (the error) are the outcome variables for each prediction step. The methods/strategies are described as follows:
Machine Learning methods:
Multistep ahead prediction strategy:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data acquisition missions were designed and executed using DJI Pilot 2's flight route planning feature. The missions encompassed five distinct geometric patterns: 1. triangular, 2. circular, 3. rectangular, 4. linear, and 5. multi-dimensional. Each mission was configured as a waypoint flight path, allowing precise customization of parameters such as altitude, speed, and turning angle for each waypoint. The dataset consists of 3D flight data, including take-off, landing, and varying altitude to introduce z-axis changes. Data was logged at a frequency of 10 Hz.
To ensure consistency within the data, identical parameters were maintained across all data acquisition missions. The dataset comprises 20 distinct flights, with each flight path repeated multiple times, resulting in approximately 30 minutes of flight time per mission. The dataset is structured as time-series data, with each flight uniquely identified by a flight number and corresponding timestamp. The drone's spatial position is represented by the variables position_x, position_y, position_z while its orientation is captured by the variables orientation_x, orientation_y, orientation_z, orientation_w. Additionally, the drone's velocity and angular velocity are represented by the variables velocity_x, velocity_y, velocity_z, angular_x, angular_y, angular_z respectively. The linear acceleration is described by the variables linear_acceleration_x, linear_acceleration_y, linear_acceleration_z. The dataset also includes environmental data such as wind_speed, wind_angle using the TriSonica Mini Wind and Weather Sensor as well as information regarding the drone's battery status, including battery_voltage, battery_current.
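As a small illustration of working with the kinematic columns described above, ground speed and total speed can be derived from the velocity components. The column names come from the description; the sample values and the dict-per-row layout are invented for the sketch.

```python
import math

# One hypothetical logged row using the velocity_* fields described
# above (values are invented for illustration).
row = {"velocity_x": 3.0, "velocity_y": 4.0, "velocity_z": 0.0}

# Horizontal (ground) speed from the x/y components.
ground_speed = math.hypot(row["velocity_x"], row["velocity_y"])

# Full 3D speed magnitude including the z component.
total_speed = math.sqrt(sum(row[f"velocity_{a}"] ** 2 for a in "xyz"))

print(ground_speed, total_speed)  # 5.0 5.0
```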
Data acquisition paths
The dataset includes labels for various operational states of the drone, such as IDLE_HOVER, ASCEND, TURN, HMSL and DESCEND. These labels can be utilized to classify the drone's current activity. Moreover, the annotated dataset can be applied in multi-task learning to predict the drone's trajectory.
The DJI Matrice 300 RTK is utilized as the primary platform for data acquisition, leveraging its compatibility with onboard development kits to facilitate the extraction of data from its integrated sensors and flight controller. To execute the developed software, the NVIDIA Jetson Xavier NX serves as the embedded computing device. Using the onboard software development kit, the Jetson Xavier NX enables real-time access to and processing of data from the drone's sensors and flight controller.