100+ datasets found

UCR Time Series Classification Archive Dataset
paperswithcode.com
opendatalab.com
Updated May 17, 2023
Hoang Anh Dau; Anthony Bagnall; Kaveh Kamgar; Chin-Chia Michael Yeh; Yan Zhu; Shaghayegh Gharghabi; Chotirat Ann Ratanamahatana; Eamonn Keogh (2023). UCR Time Series Classification Archive Dataset [Dataset]. https://paperswithcode.com/dataset/ucr-time-series-classification-archive
Dataset updated
May 17, 2023
Authors
Hoang Anh Dau; Anthony Bagnall; Kaveh Kamgar; Chin-Chia Michael Yeh; Yan Zhu; Shaghayegh Gharghabi; Chotirat Ann Ratanamahatana; Eamonn Keogh
Description
The UCR Time Series Archive, introduced in 2002, has become an important resource in the time series data mining community, with at least one thousand published papers using at least one data set from the archive. The original incarnation of the archive had sixteen data sets, but it has since gone through periodic expansions. The last expansion took place in the summer of 2015, when the archive grew from 45 to 85 data sets. This paper introduces and focuses on the new expansion from 85 to 128 data sets. Beyond expanding this valuable resource, this paper offers pragmatic advice to anyone who may wish to evaluate a new algorithm on the archive. Finally, this paper makes a novel yet actionable claim: of the hundreds of papers that show an improvement over the standard baseline (1-nearest neighbor classification), a large fraction may be misattributing the reasons for their improvement. Moreover, they may have been able to achieve the same improvement with a much simpler modification, requiring just a single line of code.
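The standard baseline named above, 1-nearest neighbor classification, can be sketched in a few lines. This is a minimal illustration, not the archive's reference code. It assumes equal-length series, per-series z-normalization and Euclidean distance, and the toy data is made up.

```python
import numpy as np

def z_normalize(X):
    """Z-normalize each series (row) to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    # Guard against constant series, whose standard deviation is zero.
    sigma[sigma == 0] = 1.0
    return (X - mu) / sigma

def one_nn_predict(X_train, y_train, X_test):
    """Give each test series the label of its nearest training series (Euclidean)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(y_train)[d2.argmin(axis=1)]

# Toy example with two classes, rising and falling series (made-up data).
X_train = z_normalize([[0, 1, 2, 3], [3, 2, 1, 0]])
y_train = np.array(["up", "down"])
X_test = z_normalize([[10, 11, 13, 14], [5, 4, 2, 1]])
pred = one_nn_predict(X_train, y_train, X_test)
print(pred)  # ['up' 'down']
```

Z-normalizing first makes the comparison depend on shape rather than offset or scale. In the example, the test series `[10, 11, 13, 14]` is matched to the rising class despite its much larger values.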
UEA time-series datasets Dataset
paperswithcode.com
Updated May 25, 2023
(2023). UEA time-series datasets Dataset [Dataset]. https://paperswithcode.com/dataset/uea-time-series-datasets
Dataset updated
May 25, 2023
Description
Five datasets used in the NeurTraL-AD paper:
RacketSports (RS). Accelerometer and gyroscope recordings of players playing four different racket sports. Each sport is a different class.
Epilepsy (EPSY). Accelerometer recordings of healthy actors simulating four different activity classes, one of which is an epileptic seizure.
Naval air training and operating procedures standardization (NAT). Positions of sensors mounted on different body parts of a person performing activities. There are six different activity classes in the dataset.
Character trajectories (CT). Velocity trajectories of a pen on a WACOM tablet. There are 20 different characters in this dataset.
Spoken Arabic Digits (SAD). MFCC features of the ten Arabic digits spoken by 88 different speakers.
Classification of Types of Changes in Gully Environments Using Time Series...
heidata.uni-heidelberg.de
csv, text/x-python +2
Updated Jan 16, 2024
Miguel Vallejo Orti; Carlos Castillo; Vivien Zahs; Olaf Bubenzer; Bernhard Höfle (2024). Classification of Types of Changes in Gully Environments Using Time Series Forest Algorithm [data] [Dataset]. http://doi.org/10.11588/DATA/NSMM6P
Available download formats: csv (98093), csv (1833843), csv (8041823), txt (4164), text/x-python (6667), txt (3340), tsv (7978335), csv (3585970)
Unique identifier
https://doi.org/10.11588/DATA/NSMM6P
Dataset updated
Jan 16, 2024
Dataset provided by
heiDATA
Authors
Miguel Vallejo Orti; Carlos Castillo; Vivien Zahs; Olaf Bubenzer; Bernhard Höfle
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This code implements the TimeSeriesForest algorithm to classify four types of changes in gully environments: (i) gully topographical change, (ii) no change outside the gully, (iii) no change inside the gully, and (iv) non-topographical change. The algorithm is designed for time series classification tasks, where the input data represent the characteristics of gullies over time. The code prepares the data, trains the classifier, calculates performance metrics, and generates predictions.

In the data preparation phase, training and testing data are imported from CSV files. The training data are divided into classes based on their labels, and the top rows of each class are selected to create a balanced training dataset. Time series data and the corresponding labels are extracted from the training data, while only the time series data are extracted from the testing data.

Next, the code evaluates the classifier. It splits the training data into training and testing sets, initializes the TimeSeriesForest classifier, and trains it on the training set. The classifier's accuracy is calculated on the testing set, and feature importances are determined. Predictions are generated for both the testing set and new data using the trained classifier.

The code then computes a confusion matrix to analyze the classification results and visualizes it with Seaborn and Matplotlib. Performance metrics such as True Accuracy, Kappa, Producer's Accuracy, and User's Accuracy are calculated and printed to assess how well the classifier identifies gully changes. Lastly, the code performs ensemble predictions by combining the testing data with the generated predictions. The results, including predictions and associated probabilities, are saved to an output file.
Overall, this code provides a practical implementation of the TimeSeriesForest algorithm for classifying types of changes in gully environments, demonstrating its potential for environmental monitoring and management.
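All four accuracy metrics named above can be derived from a confusion matrix. The sketch below is a minimal illustration, not the dataset's script. It assumes rows are reference (true) classes and columns are predicted classes, the layout used by scikit-learn's `confusion_matrix`. It also reads "True Accuracy" as overall accuracy, which is an assumption. The matrix values are made up.

```python
import numpy as np

def accuracy_metrics(cm):
    """Overall accuracy, Cohen's kappa, producer's and user's accuracy.

    Assumes rows are reference classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / n
    # Agreement expected by chance, from the row and column totals.
    expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    kappa = (overall - expected) / (1 - expected)
    producers = diag / cm.sum(axis=1)  # per reference class (1 - omission error)
    users = diag / cm.sum(axis=0)      # per predicted class (1 - commission error)
    return overall, kappa, producers, users

# Made-up 2-class confusion matrix, for illustration only.
cm = [[40, 10],
      [5, 45]]
overall, kappa, pa, ua = accuracy_metrics(cm)
print(overall, round(kappa, 3), pa, ua)
```

Producer's accuracy answers "how much of each true class was found", while user's accuracy answers "how reliable is each predicted label". This is why both are reported alongside a single overall figure.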
Computed HCTSA matrices for the UEA/UCR 2018 time-series classification...
figshare.com
bin
Updated May 30, 2023
Carl H Lubba; Ben Fulcher (2023). Computed HCTSA matrices for the UEA/UCR 2018 time-series classification tasks [Dataset]. http://doi.org/10.6084/m9.figshare.6865163.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.6865163.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Carl H Lubba; Ben Fulcher
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Using the hctsa toolbox v0.97 (link in References below), we computed 7,500+ time-series features on each of the time-series classification tasks contained in the UEA/UCR Time Series Classification Repository. This repository provides the computed hctsa output files (.mat files) for each classification task. We used the computed feature matrices to select a small subset of 22 hctsa estimators (termed catch22) that were the most useful for the UEA/UCR datasets: C.H. Lubba, S.S. Sethi, P. Knaute, S.R. Schultz, B.D. Fulcher, N.S. Jones. catch22: CAnonical Time-series CHaracteristics. arXiv (2019). https://arxiv.org/abs/1901.10200
The matrices can also be read in Python using the Matlab_IO interface. Examples can be found in our selection pipeline for catch22 ("op_importance" in References) and in the "hctsaAnalysisPython" GitHub repository.
Index1NN: Time Series Indexing (TSI)
researchdata.edu.au
bridges.monash.edu
Updated May 5, 2022
Chang Wei Tan (2022). Index1NN: Time Series Indexing (TSI) [Dataset]. http://doi.org/10.4225/03/587db15ba0852
Unique identifier
https://doi.org/10.4225/03/587db15ba0852
Dataset updated
May 5, 2022
Dataset provided by
Monash University
Authors
Chang Wei Tan
Description
These are the files required to run the experiments published in the paper "Indexing and classifying gigabytes of time series under time warping". They contain the nearest-neighbour indices for each query in each dataset.
Data from: Accelerometer-Based Multivariate Time-Series Dataset for Calf...
data.niaid.nih.gov
Updated Aug 13, 2024
Dissanayake, Oshana (2024). Accelerometer-Based Multivariate Time-Series Dataset for Calf Behavior Classification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13259481
Dataset updated
Aug 13, 2024
Dataset provided by
Cunningham, Padraig
Dissanayake, Oshana
McPherson, Sarah E.
Kennedy, Emer
Allyndrée, Joseph
Riaboff, Lucile
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
AcTBeCalf Dataset Description

The AcTBeCalf dataset is a comprehensive dataset designed to support the classification of pre-weaned calf behaviors from accelerometer data. It contains detailed accelerometer readings aligned with annotated behaviors, providing a valuable resource for research in multivariate time-series classification and animal behavior analysis. The dataset includes accelerometer data collected from 30 pre-weaned Holstein Friesian and Jersey calves housed in group pens at the Teagasc Moorepark Research Farm, Ireland. Each calf was fitted with a 3D accelerometer sensor (AX3, Axivity Ltd, Newcastle, UK) sampling at 25 Hz and attached to a neck collar, worn from one week after birth for 13 weeks.

This dataset encompasses 27.4 hours of accelerometer data aligned with calf behaviors, including both prominent behaviors like lying, standing, and running, as well as less frequent behaviors such as grooming, social interaction, and abnormal behaviors.

The dataset consists of a single CSV file with the following columns:

dateTime: Timestamp of the accelerometer reading, sampled at 25 Hz.

calfid: Identification number of the calf (1-30).

accX: Accelerometer reading for the X axis (top-bottom direction)*.

accY: Accelerometer reading for the Y axis (backward-forward direction)*.

accZ: Accelerometer reading for the Z axis (left-right direction)*.

behavior: Annotated behavior based on an ethogram of 23 behaviors.

segId: Segment identification number associated with each accelerometer reading/row, representing all readings of the same behavior segment.

* The directions are given relative to the position of the accelerometer sensor on the calf.
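Because every row carries `calfid` and `segId`, the readings can be regrouped into one multivariate series per behavior segment. They can also be split so that each calf appears on only one side of a train/test split. The sketch below uses a few made-up rows in the column layout listed above. The calf-wise split is an assumed reading of "subject separation" and is not the logic of the dataset's genSplit.py. The held-out calf ID is a hypothetical choice.

```python
import io

import pandas as pd

# Made-up rows in the documented column layout (not real data).
csv_text = """dateTime,calfid,accX,accY,accZ,behavior,segId
2022-01-01 10:00:00.00,1,0.98,0.05,0.10,lying,1
2022-01-01 10:00:00.04,1,0.97,0.06,0.11,lying,1
2022-01-01 10:05:00.00,1,0.20,0.90,0.15,running,2
2022-01-01 10:05:00.04,1,0.25,0.85,0.12,running,2
2022-01-02 09:00:00.00,2,0.95,0.04,0.09,lying,3
2022-01-02 09:00:00.04,2,0.96,0.05,0.08,lying,3
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["dateTime"])

# One (n_readings, 3) array plus its behavior label per segment, in time order.
segments = {
    int(seg_id): (g[["accX", "accY", "accZ"]].to_numpy(), g["behavior"].iloc[0])
    for seg_id, g in df.sort_values("dateTime").groupby("segId")
}

# Calf-wise split: every segment of a held-out calf goes to the test side only.
test_calves = {1}  # hypothetical choice of held-out calf IDs
seg_calf = df.groupby("segId")["calfid"].first()
train_ids = [int(s) for s, c in seg_calf.items() if c not in test_calves]
test_ids = [int(s) for s, c in seg_calf.items() if c in test_calves]
print(train_ids, test_ids)  # [3] [1, 2]
```

Splitting by calf rather than by row keeps near-identical neighbouring readings from leaking between training and test data.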

Code Files Description

The dataset is accompanied by several code files to facilitate the preprocessing and analysis of the accelerometer data and to support the development and evaluation of machine learning models. The main code files included in the dataset repository are:

accelerometer_time_correction.ipynb: This notebook corrects the accelerometer time drift, ensuring that the accelerometer data are aligned with the reference time.

shake_pattern_detector.py: This script includes an algorithm to detect shake patterns in the accelerometer signal for aligning the accelerometer time series with reference times.

aligning_accelerometer_data_with_annotations.ipynb: This notebook aligns the accelerometer time series with the annotated behaviors based on timestamps.

manual_inspection_ts_validation.ipynb: This notebook provides a manual inspection process for ensuring the accurate alignment of the accelerometer data with the annotated behaviors.

additional_ts_generation.ipynb: This notebook generates additional time-series data from the original X, Y, and Z accelerometer readings, including Magnitude, ODBA (Overall Dynamic Body Acceleration), VeDBA (Vectorial Dynamic Body Acceleration), pitch, and roll.

genSplit.py: This script provides the logic used for the generalized subject separation for machine learning model training, validation and testing.

active_inactive_classification.ipynb: This notebook details the process of classifying behaviors into active and inactive categories using a RandomForest model, achieving a balanced accuracy of 92%.

four_behv_classification.ipynb: This notebook employs the mini-ROCKET feature derivation mechanism and a RidgeClassifierCV to classify behaviors into four categories: drinking milk, lying, running, and other, achieving a balanced accuracy of 84%.
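Magnitude, ODBA and VeDBA, three of the additional series listed for additional_ts_generation.ipynb, follow common accelerometry definitions. This minimal sketch is not the notebook's code. It assumes the static (gravitational) component is estimated with a centred running mean over 2 seconds, a window length we chose for illustration. Pitch and roll are omitted because their formulas depend on how the sensor axes are oriented on the collar.

```python
import numpy as np
import pandas as pd

def derived_signals(acc, fs=25, window_s=2.0):
    """Magnitude, ODBA and VeDBA from an (n, 3) array of X, Y, Z accelerations.

    The static component is a centred running mean over window_s seconds
    (an assumed setting); the dynamic component is raw minus static.
    """
    acc = np.asarray(acc, dtype=float)
    magnitude = np.sqrt((acc ** 2).sum(axis=1))
    win = max(1, int(round(window_s * fs)))
    static = pd.DataFrame(acc).rolling(win, center=True, min_periods=1).mean().to_numpy()
    dynamic = acc - static
    odba = np.abs(dynamic).sum(axis=1)           # sum of absolute dynamic components
    vedba = np.sqrt((dynamic ** 2).sum(axis=1))  # Euclidean norm of dynamic components
    return magnitude, odba, vedba

# Made-up 10 s signal at 25 Hz: constant 1 g on X plus a small 2 Hz oscillation on Y.
t = np.arange(250) / 25
acc = np.column_stack([np.ones_like(t), 0.1 * np.sin(2 * np.pi * 2 * t), np.zeros_like(t)])
mag, odba, vedba = derived_signals(acc)
```

VeDBA is always at most ODBA, since a vector's Euclidean norm never exceeds the sum of its absolute components. In this example, only the Y axis has a dynamic part, so the two coincide.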

Kindly cite one of the following papers when using this data:

Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Evaluating ROCKET and Catch22 features for calf behaviour classification from accelerometer data using Machine Learning models. arXiv preprint arXiv:2404.18159.

Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Development of a digital tool for monitoring the behaviour of pre-weaned calves using accelerometer neck-collars. arXiv preprint arXiv:2406.17352
PSML: A Multi-scale Time-series Dataset for Machine Learning in Decarbonized...
zenodo.org
data.niaid.nih.gov
zip
Updated Nov 10, 2021
Xiangtian Zheng; Nan Xu; Dongqi Wu; Loc Trinh; Tong Huang; S Sivaranjani; Yan Liu; Le Xie (2021). PSML: A Multi-scale Time-series Dataset for Machine Learning in Decarbonized Energy Grids (Dataset) [Dataset]. http://doi.org/10.5281/zenodo.5130612
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5130612
Dataset updated
Nov 10, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Xiangtian Zheng; Nan Xu; Dongqi Wu; Loc Trinh; Tong Huang; S Sivaranjani; Yan Liu; Le Xie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

The electric grid is a key enabling infrastructure for the ambitious transition towards carbon neutrality as we grapple with climate change. With deepening penetration of renewable energy resources and electrified transportation, the reliable and secure operation of the electric grid becomes increasingly challenging. In this paper, we present PSML, a first-of-its-kind open-access multi-scale time-series dataset, to aid in the development of data-driven machine learning (ML) based approaches towards reliable operation of future electric grids. The dataset is generated through a novel transmission + distribution (T+D) co-simulation designed to capture the increasingly important interactions and uncertainties of the grid dynamics, containing electric load, renewable generation, weather, voltage and current measurements at multiple spatio-temporal scales. Using PSML, we provide state-of-the-art ML baselines on three challenging use cases of critical importance to achieve: (i) early detection, accurate classification and localization of dynamic disturbance events; (ii) robust hierarchical forecasting of load and renewable energy with the presence of uncertainties and extreme events; and (iii) realistic synthetic generation of physical-law-constrained measurement time series. We envision that this dataset will enable advances for ML in dynamic systems, while simultaneously allowing ML researchers to contribute towards carbon-neutral electricity and mobility.

Data Navigation

Please download and unzip the dataset and keep it in a location of your choice for data loading, reproduction of the benchmark results, and performance evaluation of proposed methods.

wget 'https://zenodo.org/record/5130612/files/PSML.zip?download=1'
7z x 'PSML.zip?download=1' -o./

Minute-level Load and Renewable

File Name

ISO_zone_#.csv: for example, `CAISO_zone_1.csv` contains minute-level load, renewable and weather data from 2018 to 2020 in zone 1 of CAISO.

Field Description

Field `time`: Time of minute resolution.

Field `load_power`: Normalized load power.

Field `wind_power`: Normalized wind turbine power.

Field `solar_power`: Normalized solar PV power.

Field `DHI`: Diffuse horizontal irradiance.

Field `DNI`: Direct normal irradiance.

Field `GHI`: Global horizontal irradiance.

Field `Dew Point`: Dew point in degrees Celsius.

Field `Solar Zeinth Angle`: The angle between the sun's rays and the vertical direction, in degrees.

Field `Wind Speed`: Wind speed (m/s).

Field `Relative Humidity`: Relative humidity (%).

Field `Temperature`: Temperature in degrees Celsius.
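Because each zone file is a minute-resolution table, a typical first step is to parse `time` as a timestamp and aggregate to a coarser resolution. The sketch below builds a small made-up frame with a few of the fields above in place of a real file. A real file would instead be read with `pd.read_csv("CAISO_zone_1.csv", parse_dates=["time"])`, assuming the timestamps parse cleanly. The hourly mean is one illustrative choice of aggregation.

```python
import numpy as np
import pandas as pd

# Made-up two hours of minute-level data using a few of the documented fields.
idx = pd.date_range("2019-01-01 00:00", periods=120, freq="min")
df = pd.DataFrame({
    "time": idx,
    "load_power": np.linspace(0.4, 0.6, 120),
    "wind_power": np.full(120, 0.3),
    "solar_power": np.zeros(120),
})

# Aggregate the normalized minute-level powers to hourly means.
hourly = (
    df.set_index("time")[["load_power", "wind_power", "solar_power"]]
    .resample("60min")
    .mean()
)
print(hourly)
```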

Minute-level PMU Measurements

File Name

case #: The `case 0` folder contains all data of scenario setting #0.

pf_input_#.txt: Selected load, renewable and solar generation for the simulation.

pf_result_#.csv: Voltage at nodes and power on branches in the transmission system from the T+D simulation.

Field Description

Field `time`: Time of minute resolution.

Field `Vm_###`: Voltage magnitude (p.u.) at the bus ### in the simulated model.

Field `Va_###`: Voltage angle (rad) at the bus ### in the simulated model.

Field `P_#_#_#`: `P_3_4_1` is the active power flowing on branch #1 from bus 3 to bus 4.

Field `Q_#_#_#`: `Q_5_20_1` is the reactive power flowing on branch #1 from bus 5 to bus 20.
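The `pf_result_#.csv` column names encode the measured quantity and its location, so they can be decoded programmatically. The regular expressions below are inferred only from the examples above. They assume bus numbers and branch IDs are plain integers, and real headers may contain other variants.

```python
import re

# Patterns inferred from the documented examples, e.g. "P_3_4_1" and "Vm_12".
BRANCH = re.compile(r"^([PQ])_(\d+)_(\d+)_(\d+)$")
BUS = re.compile(r"^(Vm|Va)_(\d+)$")

def parse_column(name):
    """Describe a pf_result column, or return None for names such as 'time'."""
    m = BRANCH.match(name)
    if m:
        kind, from_bus, to_bus, branch = m.groups()
        return {"quantity": "active power" if kind == "P" else "reactive power",
                "from_bus": int(from_bus), "to_bus": int(to_bus), "branch": int(branch)}
    m = BUS.match(name)
    if m:
        kind, bus = m.groups()
        return {"quantity": "voltage magnitude" if kind == "Vm" else "voltage angle",
                "bus": int(bus)}
    return None

print(parse_column("P_3_4_1"))
print(parse_column("Va_12"))
```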

Millisecond-level PMU Measurements

File Name

Forced Oscillation: The folder contains all forced oscillation cases.

row_#: The folder contains all data of the disturbance scenario #.

dist.csv: Three-phase voltage at nodes in the distribution system from the T+D simulation.

info.csv: This file contains the start time, end time, location and type of the disturbance.

trans.csv: Voltage at nodes and power on branches in the transmission system from the T+D simulation.

Natural Oscillation: The folder contains all natural oscillation cases.

row_#: The folder contains all data of the disturbance scenario #.

dist.csv: Three-phase voltage at nodes in the distribution system from the T+D simulation.

info.csv: This file contains the start time, end time, location and type of the disturbance.

trans.csv: Voltage at nodes and power on branches in the transmission system from the T+D simulation.

Field Description

trans.csv

- Field `Time(s)`: Time of millisecond resolution.

- Field `VOLT ###`: Voltage magnitude (p.u.) at the bus ### in the transmission model.

- Field `POWR ### TO ### CKT #`: `POWR 151 TO 152 CKT '1 '` is the active power flowing on branch #1 from bus 151 to bus 152.

- Field `VARS ### TO ### CKT #`: `VARS 151 TO 152 CKT '1 '` is the reactive power flowing on branch #1 from bus 151 to bus 152.

dist.csv

Field `Time(s)`: Time of millisecond resolution.

Field `####.###.#`: `3005.633.1` is the per-unit voltage magnitude of phase A at bus 633 of the distribution grid that connects to bus 3005 of the transmission system.
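The millisecond-level column names can be decoded the same way. The patterns below are inferred from the documented examples only. The phase mapping is partly an assumption: only `.1` → phase A is stated, and `.2`/`.3` → B/C is our guess.

```python
import re

# Patterns inferred from the documented examples; real headers may differ slightly.
FLOW = re.compile(r"^(POWR|VARS) (\d+) TO (\d+) CKT '\s*(\w+)\s*'$")
VOLT = re.compile(r"^VOLT (\d+)$")
DIST = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

# Assumption: only ".1" -> phase A is documented; B and C are guesses.
PHASES = {"1": "A", "2": "B", "3": "C"}

def parse_trans_column(name):
    """Describe a trans.csv column, or return None for names such as 'Time(s)'."""
    m = FLOW.match(name)
    if m:
        kind, from_bus, to_bus, ckt = m.groups()
        return {"quantity": "active power" if kind == "POWR" else "reactive power",
                "from_bus": int(from_bus), "to_bus": int(to_bus), "circuit": ckt}
    m = VOLT.match(name)
    if m:
        return {"quantity": "voltage magnitude (p.u.)", "bus": int(m.group(1))}
    return None

def parse_dist_column(name):
    """Split a dist.csv column such as '3005.633.1' into its buses and phase."""
    m = DIST.match(name)
    if not m:
        return None
    trans_bus, dist_bus, phase = m.groups()
    return {"transmission_bus": int(trans_bus), "distribution_bus": int(dist_bus),
            "phase": PHASES.get(phase, phase)}

print(parse_trans_column("POWR 151 TO 152 CKT '1 '"))
print(parse_dist_column("3005.633.1"))
```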

Network traffic datasets created by Single Flow Time Series Analysis

zenodo.org
explore.openaire.eu

csv, pdf

Updated Jul 11, 2024

Josef Koumar; Karel Hynek; Tomáš Čejka (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. http://doi.org/10.5281/zenodo.8035724

Available download formats: csv, pdf

Unique identifier

https://doi.org/10.5281/zenodo.8035724

Dataset updated

Jul 11, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Josef Koumar; Karel Hynek; Tomáš Čejka

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Network traffic datasets created by Single Flow Time Series Analysis

The datasets were created for the paper "Network Traffic Classification based on Single Flow Time Series Analysis" by Josef Koumar, Karel Hynek and Tomáš Čejka, published at the 19th International Conference on Network and Service Management (CNSM) 2023. Please cite the datasets as:

J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains 69 features created by time series analysis of single flow time series. A detailed description of the features is given in the file feature_description.pdf.

The following table describes each dataset file:

File name	Detection problem	Citation of original raw dataset
botnet_binary.csv	Binary detection of botnet	S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
botnet_multiclass.csv	Multi-class classification of botnet	S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
cryptomining_design.csv	Binary detection of cryptomining; the design part	Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
cryptomining_evaluation.csv	Binary detection of cryptomining; the evaluation part	Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
dns_malware.csv	Binary detection of malware DNS	Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
doh_cic.csv	Binary detection of DoH	Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
doh_real_world.csv	Binary detection of DoH	Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
dos.csv	Binary detection of DoS	Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
edge_iiot_binary.csv	Binary detection of IoT malware	Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
edge_iiot_multiclass.csv	Multi-class classification of IoT malware	Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
https_brute_force.csv	Binary detection of HTTPS Brute Force	Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
ids_cic_binary.csv	Binary detection of intrusion in IDS	Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
ids_cic_multiclass.csv	Multi-class classification of intrusion in IDS	Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
ids_unsw_nb_15_binary.csv	Binary detection of intrusion in IDS	Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
ids_unsw_nb_15_multiclass.csv	Multi-class classification of intrusion in IDS	Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
iot_23.csv	Binary detection of IoT malware	Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details: https://www.stratosphereips.org/datasets-iot23
ton_iot_binary.csv	Binary detection of IoT malware	Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
ton_iot_multiclass.csv	Multi-class classification of IoT malware	Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
tor_binary.csv	Binary detection of TOR	Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
tor_multiclass.csv	Multi-class classification of TOR	Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
vpn_iscx_binary.csv	Binary detection of VPN	Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related Features. In ICISSP, pages 407–414, 2016.
vpn_iscx_multiclass.csv	Multi-class classification of VPN	Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related Features. In ICISSP, pages 407–414, 2016.
vpn_vnat_binary.csv	Binary detection of VPN	Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
vpn_vnat_multiclass.csv	Multi-class classification of VPN	Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022

Data from: A Meta-Learner Approach to Multistep-Ahead Time Series Prediction...
data.niaid.nih.gov
zenodo.org
Updated May 9, 2023
Fouad Bahrpeyma (2023). A Meta-Learner Approach to Multistep-Ahead Time Series Prediction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7907676
Dataset updated
May 9, 2023
Dataset provided by
andrew.mccarren@dcu.ie
Fouad Bahrpeyma
Mark
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

The application of machine learning has become commonplace for problems in modern data science. The democratization of the decision process when choosing a machine learning algorithm has also received considerable attention through the use of meta features and automated machine learning for both classification and regression type problems. However, this is not the case for multistep-ahead time series problems. Time series models generally rely upon the series itself to make future predictions, as opposed to independent features used in regression and classification problems. The structure of a time series is generally described by features such as trend, seasonality, cyclicality, and irregularity. In this research, we demonstrate how time series metrics for these features, in conjunction with an ensemble based regression learner, were used to predict the standardized mean square error of candidate time series prediction models. These experiments used datasets that cover a wide feature space and enable researchers to select the single best performing model or the top N performing models. A robust evaluation was carried out to test the learner's performance on both synthetic and real time series.

Proposed Dataset

The dataset proposed here gives the results of 20-step-ahead predictions for eight machine learning/multi-step-ahead prediction strategies on the 5,842 time series datasets outlined here. It was used as the training data for the meta-learners in this research. The meta features are in columns C to AE. Column AH gives the method/strategy used, and columns AI to BB (the error) hold the outcome variable for each prediction step. The methods and strategies are described as follows:

Machine Learning methods:

NN: Neural Network

ARIMA: Autoregressive Integrated Moving Average

SVR: Support Vector Regression

LSTM: Long Short-Term Memory

RNN: Recurrent Neural Network

Multistep ahead prediction strategy:

OSAP: One Step ahead strategy

MRFA: Multi Resolution Forecast Aggregation
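As a rough illustration of how a one-step-ahead model can produce a 20-step forecast, the sketch below iterates the model and feeds each prediction back in as an input (the recursive strategy). We assume, without confirmation from the source, that this is roughly what OSAP denotes. A least-squares AR(p) model stands in for the NN/ARIMA/SVR/LSTM/RNN learners. The standardized MSE is taken as MSE divided by the variance of the true values, which is one common definition and also an assumption.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares AR(p) coefficients [intercept, lag1, ..., lagp]."""
    y = np.asarray(series, dtype=float)
    # Row t of the design matrix is [1, y[t-1], ..., y[t-p]] for target y[t].
    X = np.column_stack([np.ones(len(y) - p)] + [y[p - k:len(y) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def recursive_forecast(series, coef, horizon=20):
    """Iterate a one-step-ahead model, feeding each prediction back as an input."""
    p = len(coef) - 1
    history = list(np.asarray(series, dtype=float)[-p:])
    preds = []
    for _ in range(horizon):
        lags = history[::-1][:p]  # most recent first: y[t-1], ..., y[t-p]
        yhat = coef[0] + np.dot(coef[1:], lags)
        preds.append(yhat)
        history.append(yhat)
    return np.array(preds)

def smse(y_true, y_pred):
    """Mean squared error divided by the variance of the true values (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_true - np.asarray(y_pred)) ** 2) / np.var(y_true)

# A pure sinusoid obeys an exact AR(2) recursion, so the recursive forecast should track it.
t = np.arange(120)
y = np.sin(2 * np.pi * t / 12)
coef = fit_ar(y[:100], p=2)
preds = recursive_forecast(y[:100], coef, horizon=20)
print(smse(y[100:], preds))
```

On real, noisy series the recursive strategy accumulates error over the horizon. Comparing that error against other strategies is the kind of per-step outcome the dataset's columns AI to BB record.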
Source Code
bridges.monash.edu
researchdata.edu.au
zip
Updated Oct 15, 2017
Chang Wei Tan (2017). Source Code [Dataset]. http://doi.org/10.4225/03/59e33dfb920f1
Available download formats: zip
Unique identifier
https://doi.org/10.4225/03/59e33dfb920f1
Dataset updated
Oct 15, 2017
Dataset provided by
Monash University
Authors
Chang Wei Tan
License
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html
Description
This is the source code for the paper "Efficient search of the best warping window for Dynamic Time Warping". This work focused on fast learning/searching for the best warping window for Dynamic Time Warping and time series classification. For more information, visit https://github.com/ChangWeiTan/FastWWSearch
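The quantity being searched for is the warping window w of DTW, a Sakoe-Chiba band limiting how far the alignment may stray from the diagonal. The sketch below is not FastWWSearch. It shows the brute-force baseline that such work aims to accelerate: compute windowed DTW, then pick the window with the lowest leave-one-out 1-NN error on the training set. It sums squared pointwise differences without a final square root, and the toy series are made up.

```python
import numpy as np

def dtw(a, b, w):
    """DTW distance (sum of squared differences) within a Sakoe-Chiba band of half-width w."""
    n, m = len(a), len(b)
    w = max(w, abs(n - m))  # the band must be wide enough to reach the final cell
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def loocv_error(X, y, w):
    """Leave-one-out 1-NN error rate on the training set for window w."""
    errors = 0
    for i in range(len(X)):
        dists = [dtw(X[i], X[j], w) if j != i else np.inf for j in range(len(X))]
        errors += y[int(np.argmin(dists))] != y[i]
    return errors / len(X)

def best_window(X, y, max_w):
    """Brute-force search: the smallest w in 0..max_w with the lowest LOOCV error."""
    errs = [loocv_error(X, y, w) for w in range(max_w + 1)]
    return int(np.argmin(errs)), errs

# Made-up training set: two rising and two falling series.
X = [[0, 1, 2, 3], [0, 0, 1, 2], [3, 2, 1, 0], [3, 3, 2, 1]]
y = [0, 0, 1, 1]
w, errs = best_window(X, y, max_w=2)
```

With w = 0, DTW reduces to squared Euclidean distance. Widening the band can only lower the distance, which is why the search trades accuracy against cost over w.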
S2Agri-17
huggingface.co
Updated Feb 25, 2025
Monash Scalable Time Series Evaluation Repository (2025). S2Agri-17 [Dataset]. https://huggingface.co/datasets/monster-monash/S2Agri-17
Dataset updated
Feb 25, 2025
Dataset authored and provided by
Monash Scalable Time Series Evaluation Repository
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Part of MONSTER: https://arxiv.org/abs/2502.15122.

S2Agri-17

Category Satellite

Num. Examples 59,268,823

Num. Channels 10

Length 24

Sampling Freq. 10 days

Num. Classes 17

License CC BY 4.0

Citations [1] [2]

S2Agri is a land cover classification dataset and contains a single tile of Sentinel-2 data (T31TFM), which covers a 12,100 km² area in France: see Figure [1, 2]. Ten spectral bands covering the visible and infrared frequencies are used, and these are provided… See the full description on the dataset page: https://huggingface.co/datasets/monster-monash/S2Agri-17.
Multivariate-Mobility-Paris Dataset
paperswithcode.com
Updated Apr 30, 2022
Héber H. Arcolezi; Jean-François Couchot; Denis Renaud; Bechara Al Bouna; Xiaokui Xiao (2022). Multivariate-Mobility-Paris Dataset [Dataset]. https://paperswithcode.com/dataset/multivariate-mobility-paris
Dataset updated
Apr 30, 2022
Authors
Héber H. Arcolezi; Jean-François Couchot; Denis Renaud; Bechara Al Bouna; Xiaokui Xiao
Description
The original dataset was provided by Orange telecom in France, which contains anonymized and aggregated human mobility data. The Multivariate-Mobility-Paris dataset comprises information from 2020-08-24 to 2020-11-04 (72 days during the COVID-19 pandemic), with time granularity of 30 minutes and spatial granularity of 6 coarse regions in Paris, France. In other words, it represents a multivariate time series dataset.

This dataset can be used for several time-series tasks such as univariate/multivariate forecasting/classification with classic, machine learning, and privacy-preserving machine learning techniques.
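For forecasting, a multivariate series like this one (one column per region, one row per 30-minute step) is usually cut into sliding windows of inputs and targets. The sketch below assumes a NumPy array of shape (T, 6) and uses random numbers as a stand-in for the real data, with T assuming 72 full days of 48 steps each. The one-day lookback (48 steps) is an arbitrary illustrative choice.

```python
import numpy as np

def make_windows(series, lookback, horizon=1):
    """Turn a (T, n_regions) array into (inputs, targets) for forecasting.

    inputs[k]  = series[k : k + lookback]                       (lookback, n_regions)
    targets[k] = series[k + lookback : k + lookback + horizon]  (horizon, n_regions)
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this lookback/horizon")
    X = np.stack([series[k:k + lookback] for k in range(n)])
    Y = np.stack([series[k + lookback:k + lookback + horizon] for k in range(n)])
    return X, Y

# Random stand-in with the stated shape: 30-minute steps over 72 days, 6 regions.
T = 72 * 48
rng = np.random.default_rng(0)
series = rng.random((T, 6))
X, Y = make_windows(series, lookback=48, horizon=1)  # one day of history -> next step
print(X.shape, Y.shape)  # (3408, 48, 6) (3408, 1, 6)
```

For evaluation, the windows should be split by time (earlier windows for training, later ones for testing) rather than shuffled, so that no future values leak into training.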
Time Series International Trade: Monthly U.S. Exports by North American...
datasets.ai
catalog.data.gov
Updated Aug 8, 2024
Department of Commerce (2024). Time Series International Trade: Monthly U.S. Exports by North American Industry Classification System (NAICS) Code [Dataset]. https://datasets.ai/datasets/time-series-international-trade-monthly-u-s-exports-by-north-american-industry-classificat
2 available download formats
Dataset updated
Aug 8, 2024
Dataset provided by
United States Department of Commerce (http://www.commerce.gov/)
Authors
Department of Commerce
Area covered
United States
Description
The Census data API provides access to the most comprehensive set of data on current month and cumulative year-to-date exports using the North American Industry Classification System (NAICS). The NAICS endpoint in the Census data API also provides value, shipping weight, and method of transportation totals at the district level for all U.S. trading partners. The Census data API will help users research new markets for their products, establish pricing structures for potential export markets, and conduct economic planning. If you have any questions regarding U.S. international trade data, please call us at 1(800)549-0595 option #4 or email us at eid.international.trade.data@census.gov.
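A minimal sketch of building a query for the NAICS export data with only the Python standard library. The endpoint path, the `time=YYYY-MM` predicate and the variable names (`NAICS`, `ALL_VAL_MO`) are assumptions based on the Census API's international trade datasets and should be checked against the API's published variable list; `build_query` and `rows_to_records` are hypothetical helper names. The parser relies on the Census API convention of returning a JSON array whose first row holds the column names.

```python
import json
from typing import Dict, List, Sequence
from urllib.parse import urlencode

# Assumed endpoint; verify against the Census API documentation.
BASE_URL = "https://api.census.gov/data/timeseries/intltrade/exports/naics"


def build_query(variables: Sequence[str], month: str, **filters: str) -> str:
    """Build a request URL for one month (formatted YYYY-MM)."""
    params = {"get": ",".join(variables), "time": month, **filters}
    # safe="," keeps the comma-separated variable list readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe=",")


def rows_to_records(payload: str) -> List[Dict[str, str]]:
    """Turn a header-first JSON array into one dict per data row."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]


url = build_query(["NAICS", "ALL_VAL_MO"], "2023-01")
# A made-up payload with the same shape as a response (values are not real data).
sample = '[["NAICS", "ALL_VAL_MO", "time"], ["111", "123", "2023-01"]]'
records = rows_to_records(sample)
print(url)
print(records[0]["NAICS"])  # 111
```

To fetch live data, pass the URL to `urllib.request.urlopen` and decode the response body before calling `rows_to_records`. Values typically arrive as strings, so numeric fields need explicit conversion. The API also accepts a `key` parameter, which can be passed through `**filters`.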
FiftyWords UCR Archive Dataset
zenodo.org
data.niaid.nih.gov
bin
Updated May 15, 2024
Zenodo (2024). FiftyWords UCR Archive Dataset [Dataset]. http://doi.org/10.5281/zenodo.11191097
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.11191097
Dataset updated
May 15, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is part of the UCR Archive maintained by researchers at the University of Southampton. If you use the datasets, please cite a relevant archive release or the latest full archive release. See http://www.timeseriesclassification.com/.

FiftyWords is a data set of word outlines taken from the George Washington library by T. Rath and used in the paper "Word image matching using dynamic time warping", CVPR 2003. Each case is a word. A series is formed by taking the height profile of the word.
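The description does not define the height profile precisely. One plausible reading, sketched below as an assumption, is an upper word profile: for each column of a binarized word image, the height of the topmost ink pixel above the bottom edge. `upper_profile` is a hypothetical name, and the original feature extraction by Rath and Manmatha may differ.

```python
from typing import List, Sequence


def upper_profile(image: Sequence[Sequence[int]]) -> List[int]:
    """Per-column height of the topmost ink pixel, measured from the bottom.

    `image` is a binary word image given as rows (top row first), with 1 for
    ink and 0 for background. Columns without any ink get height 0.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    profile = []
    for c in range(n_cols):
        height = 0
        for r in range(n_rows):
            if image[r][c]:
                height = n_rows - r  # topmost ink pixel in this column
                break
        profile.append(height)
    return profile


word = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
]
print(upper_profile(word))  # [1, 3, 2, 0]
```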

Donator: T. Rath, R. Manmatha
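Since the source paper matched word images with dynamic time warping, and 1-nearest-neighbour classification is the standard baseline on the UCR archive, a 1-NN classifier with DTW is a natural starting point for FiftyWords. The sketch below is a textbook, unconstrained DTW with squared point-wise differences; the paper's exact variant, and the warping-window constraint used in published UCR baselines, may differ. The function names and toy series are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW distance with squared point-wise cost."""
    n, m = len(a), len(b)
    # cost[i][j]: best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def nn1_predict(train: List[Tuple[Sequence[float], str]], query: Sequence[float]) -> str:
    """Return the label of the training series closest to `query` under DTW."""
    return min(train, key=lambda item: dtw_distance(item[0], query))[1]


train = [([0, 1, 2, 3, 2, 1, 0], "hill"), ([0, 0, 0, 0, 0, 0, 0], "flat")]
print(nn1_predict(train, [0, 0, 1, 2, 3, 2, 1, 0]))  # hill
```

Warping lets the shifted query align exactly with the "hill" example (distance 0), which a point-by-point Euclidean comparison could not do for series of different lengths.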
Accuracy of all methods on different datasets.
figshare.com
xls
Updated Jun 3, 2023
Nafis Irtiza Tripto; Mohimenul Kabir; Md. Shamsuzzoha Bayzid; Atif Rahman (2023). Accuracy of all methods on different datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0241686.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0241686.t001
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Nafis Irtiza Tripto; Mohimenul Kabir; Md. Shamsuzzoha Bayzid; Atif Rahman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All accuracy values are given as percentages (%).
Final LCZ maps with post-classification processing
springernature.figshare.com
zip
Updated Feb 12, 2024
Steve Hankey; Meng Qi; Chunxue Xu; Wenwen Zhang; Matthias Demuzere; Perry Hystad; Tianjun Lu; Peter James; Benjamin Bechtel (2024). Final LCZ maps with post-classification processing [Dataset]. http://doi.org/10.6084/m9.figshare.24964275.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.24964275.v1
Dataset updated
Feb 12, 2024
Dataset provided by
figshare
Authors
Steve Hankey; Meng Qi; Chunxue Xu; Wenwen Zhang; Matthias Demuzere; Perry Hystad; Tianjun Lu; Peter James; Benjamin Bechtel
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This compressed folder contains annual CONUS-wide LCZ maps from 1986 to 2020, which are the main and final LCZ product of this dataset. The maps are derived from a lightweight contextual Random Forest model with spatial and temporal post-classification processing. Each map is provided as a GeoTIFF file whose name indicates the year it represents; for example, the file "TP_2020.tif" is the LCZ map for 2020. All LCZ maps have a spatial resolution of 100 m and use the USA Contiguous Albers Equal Area Conic projection (EPSG:5070). The LCZ classes are indicated by the numbers 1-17. Note that LCZ class 7 (Lightweight low-rise) is not present in this dataset. Pixels with value 0 represent NoData.
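Two small helpers follow from the file conventions described above: reading the year from a file name such as "TP_2020.tif", and converting pixel counts to areas, since each 100 m × 100 m pixel covers 0.01 km². The helper names are hypothetical, and the raster is represented here as nested lists of class values; reading an actual GeoTIFF requires a raster library.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Sequence

NODATA = 0                     # pixel value for NoData, per the description
PIXEL_AREA_KM2 = 0.01          # one 100 m x 100 m pixel
VALID_CLASSES = range(1, 18)   # LCZ classes are numbered 1-17


def year_from_filename(name: str) -> int:
    """Extract the year from a file name such as 'TP_2020.tif'."""
    match = re.search(r"(\d{4})\.tif$", name)
    if match is None:
        raise ValueError(f"no year found in {name!r}")
    return int(match.group(1))


def class_areas_km2(rows: Iterable[Sequence[int]]) -> Dict[int, float]:
    """Area covered by each LCZ class in km^2, skipping NoData pixels."""
    counts = Counter(v for row in rows for v in row if v != NODATA)
    unexpected = set(counts) - set(VALID_CLASSES)
    if unexpected:
        raise ValueError(f"unexpected class values: {sorted(unexpected)}")
    return {cls: n * PIXEL_AREA_KM2 for cls, n in sorted(counts.items())}


print(year_from_filename("TP_2020.tif"))    # 2020
print(class_areas_km2([[0, 1, 1], [6, 1, 0]]))
```

For a full CONUS map, counting pixels in pure Python is slow. With rasterio, `src.read(1)` returns the band as a NumPy array, and `numpy.unique(band, return_counts=True)` performs the same count much faster.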
DodgerLoopGame UCR Archive Dataset
zenodo.org
data.niaid.nih.gov
bin
Updated May 14, 2024
Zenodo (2024). DodgerLoopGame UCR Archive Dataset [Dataset]. http://doi.org/10.5281/zenodo.11186628
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.11186628
Dataset updated
May 14, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is part of the UCR Archive maintained by researchers at the University of Southampton. If you use the datasets, please cite a relevant archive release or the latest full archive release. See http://www.timeseriesclassification.com/.

The traffic data were collected with a loop sensor installed on a ramp of the 101 North freeway in Los Angeles. The location is close to Dodger Stadium, so traffic is affected by the volume of visitors to the stadium. The classes are:

- Class 1: Normal Day
- Class 2: Game Day

There is nothing to infer from the order of examples in the train and test sets. Missing values are represented with NaN in the text file. Data created by Ihler, Alexander, Jon Hutchins, and Padhraic Smyth (see [1][2][3]). Data edited by Chin-Chia Michael Yeh.

[1] Ihler, Alexander, Jon Hutchins, and Padhraic Smyth. "Adaptive event detection with time-varying Poisson processes." Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006.

[2] "UCI Machine Learning Repository: Dodgers Loop Sensor Data Set." UCI Machine Learning Repository, archive.ics.uci.edu/ml/datasets/dodgers+loop+sensor.

[3] "Caltrans PeMS." Caltrans, pems.dot.ca.gov/.

Donator: C. Yeh
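A sketch of reading one example, assuming the files follow the UCR 2018 text layout (class label first, then the series values, separated by tabs; older releases used commas). Missing values are the literal `NaN` mentioned above. Forward filling is only one simple way to handle the gaps and is shown for illustration, not as the archive's recommendation; `parse_ucr_line` and `forward_fill` are hypothetical names.

```python
import math
import re
from typing import List, Tuple

CLASS_NAMES = {1: "Normal Day", 2: "Game Day"}  # from the description above


def parse_ucr_line(line: str) -> Tuple[int, List[float]]:
    """Split one line into (class label, series values)."""
    fields = re.split(r"[\t,]", line.strip())
    label = int(float(fields[0]))              # accepts '2' or '2.0'
    values = [float(x) for x in fields[1:]]    # float("NaN") parses to nan
    return label, values


def forward_fill(values: List[float], start: float = 0.0) -> List[float]:
    """Replace each NaN with the last observed value (`start` before any)."""
    filled, last = [], start
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


label, values = parse_ucr_line("2\t10.0\tNaN\t12.0\n")
print(CLASS_NAMES[label], sum(math.isnan(v) for v in values), forward_fill(values))
# Game Day 1 [10.0, 10.0, 12.0]
```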
RMSE value of all methods on different datasets.
plos.figshare.com
xls
Updated Jun 13, 2023
Nafis Irtiza Tripto; Mohimenul Kabir; Md. Shamsuzzoha Bayzid; Atif Rahman (2023). RMSE value of all methods on different datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0241686.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0241686.t004
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS ONE
Authors
Nafis Irtiza Tripto; Mohimenul Kabir; Md. Shamsuzzoha Bayzid; Atif Rahman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RMSE values of the different methods are grouped by test percentage, and the best RMSE values are highlighted.
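For reference, RMSE is the square root of the mean squared difference between observed and predicted values, and lower is better, so the highlighted best entry in each group is the smallest one. A minimal sketch; the method names in the example are placeholders, not values from the table.

```python
import math
from typing import Dict, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error: sqrt(mean((actual - predicted) ** 2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def best_method(scores: Dict[str, float]) -> str:
    """Lower RMSE is better, so pick the method with the smallest value."""
    return min(scores, key=scores.get)


print(rmse([0.0, 0.0], [3.0, 4.0]))                      # sqrt(12.5)
print(best_method({"method_a": 0.8, "method_b": 0.5}))   # method_b
```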
Landsat time series classification training data
zenodo.org
csv, text/x-python
Updated Jun 30, 2023
Zhang Hankui (2023). Landsat time series classification training data [Dataset]. http://doi.org/10.5281/zenodo.8097697
Available download formats: text/x-python, csv
Unique identifier
https://doi.org/10.5281/zenodo.8097697
Dataset updated
Jun 30, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zhang Hankui
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data for the paper

Hankui K. Zhang, Dong Luo, Zhongbin Li, Classifying raw irregular Landsat time series (CRIT) for large area land cover mapping by adapting Transformer model.

It stores annual time series of daily, raw, good-quality Landsat ARD surface reflectance for CONUS for 1985, 2006 and 2018, labelled with 7 land cover classes. Details are in the paper.
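Raw, irregular series have a different number of observations per pixel, so a Transformer-style model usually needs them padded to a fixed length, with a mask marking the real observations. The sketch below shows that generic padding-and-masking step, keeping the day of year for a temporal encoding. It is not the CRIT preprocessing from the paper, and the (day of year, band values) input layout is an assumption, since the CSV columns are not described here; `pad_irregular` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Observation = Tuple[int, Sequence[float]]  # (day of year, band reflectances)


def pad_irregular(
    obs: Sequence[Observation], max_len: int, n_bands: int
) -> Tuple[List[List[float]], List[int], List[bool]]:
    """Turn a variable-length, irregularly dated series into fixed-size inputs.

    Returns padded reflectances, the matching day-of-year values (0 for
    padding) and a mask that is True for real observations.
    """
    if max_len < 1:
        raise ValueError("max_len must be positive")
    for _, bands in obs:
        if len(bands) != n_bands:
            raise ValueError("every observation needs n_bands values")
    kept = sorted(obs, key=lambda o: o[0])[:max_len]  # chronological, truncated
    values = [list(bands) for _, bands in kept]
    days = [day for day, _ in kept]
    mask = [True] * len(kept)
    n_pad = max_len - len(kept)
    values += [[0.0] * n_bands for _ in range(n_pad)]
    days += [0] * n_pad
    mask += [False] * n_pad
    return values, days, mask


values, days, mask = pad_irregular([(200, [0.1, 0.2]), (15, [0.3, 0.4])], max_len=4, n_bands=2)
print(days, mask)  # [15, 200, 0, 0] [True, True, False, False]
```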
Multivariate time series for testing -- RacketSports dataset
zenodo.org
explore.openaire.eu
zip
Updated Apr 7, 2020
Florian Huber (2020). Multivariate time series for testing -- RacketSports dataset [Dataset]. http://doi.org/10.5281/zenodo.3742271
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3742271
Dataset updated
Apr 7, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Florian Huber
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The original data was retrieved from http://www.timeseriesclassification.com/description.php?Dataset=RacketSports

Original data description:
The data was created by university students playing badminton or squash whilst wearing a smart watch (Sony SmartWatch 3). The watch relayed the x, y and z coordinates for both the gyroscope and the accelerometer to an Android phone (OnePlus 5). The phone wrote these values to an Attribute-Relation File Format (ARFF) file using an app developed by a UEA computer science master's student. The problem is to identify which sport and which stroke the players are making. The data was collected at a rate of 10 Hz over 3 seconds whilst the player played either a forehand/backhand in squash or a clear/smash in badminton.
The data was collected as part of an undergraduate project by Phillip Perks in 2017/18.
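The stated sampling rate and duration fix the series length: 10 Hz over 3 seconds gives 30 samples per channel, and three axes for each of two sensors give six channels. The snippet below just spells out that arithmetic; the channel and class names are hypothetical labels, not the identifiers used in the files.

```python
SAMPLING_RATE_HZ = 10   # from the description
DURATION_S = 3          # from the description
SENSORS = ("accelerometer", "gyroscope")
AXES = ("x", "y", "z")
# Hypothetical labels for the four sport/stroke combinations described above.
CLASSES = ("badminton_clear", "badminton_smash", "squash_forehand", "squash_backhand")

series_length = SAMPLING_RATE_HZ * DURATION_S              # samples per channel
channels = [f"{s}_{a}" for s in SENSORS for a in AXES]    # one channel per sensor axis

print(series_length, len(channels), len(CLASSES))  # 30 6 4
```

Each example can therefore be held as a 6 × 30 array (channels × time steps), or 30 × 6 depending on the convention a library expects.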

Pre-processing
Data processing was done as described in: https://github.com/NLeSC/mcfly-tutorial/blob/master/utils/tutorial_racketsports.py
The original data was already split into train and test sets. Here the data was loaded and further divided into train, test and validation sets: to keep it simple, the original test part was divided into test and validation sets.
The resulting data was stored as NumPy .npy files.
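The description does not say how the original test part was divided. The sketch below assumes a seeded random split with a configurable `valid_fraction` (an even split by default); `split_test_valid` is a hypothetical helper, and the linked mcfly tutorial script remains the reference for what was actually done.

```python
import numpy as np


def split_test_valid(X, y, valid_fraction=0.5, seed=0):
    """Randomly divide a test set into test and validation parts (assumed scheme)."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of examples")
    if not 0.0 <= valid_fraction <= 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # shuffled example indices
    n_valid = round(len(X) * valid_fraction)
    valid_idx, test_idx = order[:n_valid], order[n_valid:]
    # Index X and y with the same indices so series and labels stay paired.
    return X[test_idx], y[test_idx], X[valid_idx], y[valid_idx]


X = np.arange(10 * 30 * 6).reshape(10, 30, 6)  # synthetic: 10 series, 30 steps, 6 channels
y = np.arange(10)
X_test, y_test, X_valid, y_valid = split_test_valid(X, y)
print(X_test.shape, X_valid.shape)  # (5, 30, 6) (5, 30, 6)
```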

The zip file contains three sets of time series data (X_train, X_test, X_valid) and the respective labels (y_train, y_test, y_valid).
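A minimal loader, assuming the extracted zip holds one file per array named after the variables above (`X_train.npy`, `y_train.npy`, and so on); those file names and `load_racketsports` are assumptions.

```python
from pathlib import Path

import numpy as np

SPLITS = ("train", "test", "valid")


def load_racketsports(folder):
    """Load (X, y) pairs for each split from an extracted folder of .npy files."""
    folder = Path(folder)
    data = {}
    for split in SPLITS:
        X = np.load(folder / f"X_{split}.npy")
        y = np.load(folder / f"y_{split}.npy")
        if len(X) != len(y):
            raise ValueError(f"{split}: {len(X)} series but {len(y)} labels")
        data[split] = (X, y)
    return data
```

If the labels were saved as an object array, `np.load` needs `allow_pickle=True`; enable it only for files from a trusted source.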

Reference:
http://www.timeseriesclassification.com/description.php?Dataset=RacketSports

