100+ datasets found
  1. UCR Time Series Classification Archive Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated May 17, 2023
    Cite
    Hoang Anh Dau; Anthony Bagnall; Kaveh Kamgar; Chin-Chia Michael Yeh; Yan Zhu; Shaghayegh Gharghabi; Chotirat Ann Ratanamahatana; Eamonn Keogh (2023). UCR Time Series Classification Archive Dataset [Dataset]. https://paperswithcode.com/dataset/ucr-time-series-classification-archive
    Explore at:
    Dataset updated
    May 17, 2023
    Authors
    Hoang Anh Dau; Anthony Bagnall; Kaveh Kamgar; Chin-Chia Michael Yeh; Yan Zhu; Shaghayegh Gharghabi; Chotirat Ann Ratanamahatana; Eamonn Keogh
    Description

    The UCR Time Series Archive, introduced in 2002, has become an important resource in the time series data mining community, with at least one thousand published papers making use of at least one data set from the archive. The original incarnation of the archive had sixteen data sets, but since that time it has gone through periodic expansions. The last expansion took place in the summer of 2015, when the archive grew from 45 to 85 data sets. This paper introduces and focuses on the new expansion from 85 to 128 data sets. Beyond expanding this valuable resource, this paper offers pragmatic advice to anyone who may wish to evaluate a new algorithm on the archive. Finally, this paper makes a novel and yet actionable claim: of the hundreds of papers that show an improvement over the standard baseline (1-nearest neighbor classification), a large fraction may be misattributing the reasons for their improvement. Moreover, they may have been able to achieve the same improvement with a much simpler modification, requiring just a single line of code.
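
    The 1-nearest neighbor baseline mentioned in the description can be sketched as follows. This is a minimal illustration only: the choice of Euclidean distance on z-normalized series, the function names, and the toy data are assumptions made for this example, not the archive's reference implementation and not the single-line modification the paper refers to.

```python
import math

def z_normalize(series):
    """Rescale a series to zero mean and unit variance (constant series map to zeros)."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_1nn(train_series, train_labels, query):
    """Return the label of the training series closest to the query."""
    q = z_normalize(query)
    best_label, best_dist = None, float("inf")
    for series, label in zip(train_series, train_labels):
        dist = euclidean(z_normalize(series), q)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Toy data invented for illustration (not taken from the archive).
train_series = [[0, 1, 2, 3], [3, 2, 1, 0], [0, 2, 4, 6]]
train_labels = ["up", "down", "up"]
print(predict_1nn(train_series, train_labels, [10, 20, 30, 40]))
```

    Because each series is z-normalized before comparison, the query matches the rising training series even though its raw values are on a different scale.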

  2. UEA time-series datasets Dataset

    • paperswithcode.com
    Updated May 25, 2023
    Cite
    (2023). UEA time-series datasets Dataset [Dataset]. https://paperswithcode.com/dataset/uea-time-series-datasets
    Explore at:
    Dataset updated
    May 25, 2023
    Description

    Five datasets used in the NeurTraL-AD paper:

    • RacketSports (RS): Accelerometer and gyroscope recordings of players playing four different racket sports. Each sport is designated as a different class.
    • Epilepsy (EPSY): Accelerometer recordings of healthy actors simulating four different activity classes, one of them being an epileptic shock.
    • Naval air training and operating procedures standardization (NAT): Positions of sensors mounted on different body parts of a person performing activities. There are six different activity classes in the dataset.
    • Character trajectories (CT): Velocity trajectories of a pen on a WACOM tablet. There are 20 different characters in this dataset.
    • Spoken Arabic Digits (SAD): MFCC features of ten Arabic digits spoken by 88 different speakers.

  3. Classification of Types of Changes in Gully Environments Using Time Series...

    • heidata.uni-heidelberg.de
    csv, text/x-python +2
    Updated Jan 16, 2024
    Cite
    Miguel Vallejo Orti; Carlos Castillo; Vivien Zahs; Olaf Bubenzer; Bernhard Höfle (2024). Classification of Types of Changes in Gully Environments Using Time Series Forest Algorithm [data] [Dataset]. http://doi.org/10.11588/DATA/NSMM6P
    Explore at:
    Available download formats: csv(98093), csv(1833843), csv(8041823), txt(4164), text/x-python(6667), txt(3340), tsv(7978335), csv(3585970)
    Dataset updated
    Jan 16, 2024
    Dataset provided by
    heiDATA
    Authors
    Miguel Vallejo Orti; Carlos Castillo; Vivien Zahs; Olaf Bubenzer; Bernhard Höfle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This code implements the TimeSeriesForest algorithm to classify four types of change in gully environments: (i) gully topographical change, (ii) no change outside gully, (iii) no change inside gully, and (iv) non-topographical change. The algorithm is specifically designed for time series classification tasks, where the input data represents the characteristics of gullies over time. The code follows a series of steps to prepare the data, train the classifier, calculate performance metrics, and generate predictions. The data preparation phase involves importing training and testing data from CSV files. The training data is then divided into classes based on their labels, and a subset of the top rows is selected for each class to create a balanced training dataset. Time series data and corresponding labels are extracted from the training data, while only the time series data is extracted from the testing data. Next, the code calculates various performance metrics to evaluate the trained classifier. It splits the training data into training and testing sets, initializes the TimeSeriesForest classifier, and trains it using the training set. The accuracy of the classifier is calculated on the testing set, and feature importances are determined. Predictions are generated for both the testing set and new data using the trained classifier. The code then computes a confusion matrix to analyze the classification results, visualizing it using Seaborn and Matplotlib. Performance metrics such as True Accuracy, Kappa, Producer's Accuracy, and User's Accuracy are calculated and printed to assess the classifier's effectiveness in classifying gully changes. Lastly, the code performs ensemble predictions by combining the testing data with the generated predictions. The results, including predictions and associated probabilities, are saved to an output file.
    Overall, this code provides a practical implementation of the TimeSeriesForest algorithm for classifying types of changes in gully environments, demonstrating its potential for environmental monitoring and management.
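
    The accuracy metrics named in the description can be derived from a confusion matrix roughly as sketched below. This is not the dataset's own script: the function names, the two-class toy labels, and the reading of "True Accuracy" as overall accuracy are assumptions made for illustration.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are reference classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy_metrics(matrix):
    """Return overall accuracy, Cohen's kappa, producer's and user's accuracy."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_sums = [sum(matrix[i]) for i in range(n)]                       # reference totals
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # predicted totals
    overall = correct / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1 - expected)  # undefined if expected == 1
    producers = [matrix[i][i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    users = [matrix[i][i] / col_sums[i] if col_sums[i] else 0.0 for i in range(n)]
    return overall, kappa, producers, users

# Toy labels invented for illustration; the real task has four classes.
labels = ["change", "no change"]
y_true = ["change", "change", "no change", "no change"]
y_pred = ["change", "no change", "no change", "no change"]
overall, kappa, producers, users = accuracy_metrics(confusion_matrix(y_true, y_pred, labels))
print(overall, kappa, producers, users)
```

    Producer's accuracy divides each diagonal entry by its row total (how much of a reference class was found), while user's accuracy divides by the column total (how reliable a predicted class is).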

  4. Computed HCTSA matrices for the UEA/UCR 2018 time-series classification...

    • figshare.com
    bin
    Updated May 30, 2023
    Cite
    Carl H Lubba; Ben Fulcher (2023). Computed HCTSA matrices for the UEA/UCR 2018 time-series classification tasks [Dataset]. http://doi.org/10.6084/m9.figshare.6865163.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Carl H Lubba; Ben Fulcher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using the hctsa toolbox v0.97 (link in References below), we computed 7,500+ time-series features on each of the time-series classification tasks contained in the UEA/UCR Time Series Classification Repository. This repository provides the computed hctsa output files (.mat-files) for each classification task. We used the computed feature matrices to select a small subset of 22 hctsa estimators (termed catch22) that were the most useful for the UEA/UCR datasets: C.H. Lubba, S.S. Sethi, P. Knaute, S.R. Schultz, B.D. Fulcher, N.S. Jones. catch22: CAnonical Time-series CHaracteristics. arXiv (2019). https://arxiv.org/abs/1901.10200 The matrices can also be read in from Python using the Matlab_IO interface, for which examples can be found in our selection pipeline for catch22 ("op_importance" in References) and in the "hctsaAnalysisPython" GitHub repository.

  5. Index1NN: Time Series Indexing (TSI)

    • researchdata.edu.au
    • bridges.monash.edu
    Updated May 5, 2022
    Cite
    Chang Wei Tan (2022). Index1NN: Time Series Indexing (TSI) [Dataset]. http://doi.org/10.4225/03/587db15ba0852
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Chang Wei Tan
    Description

    These are the files required to run the experiment published in the paper "Indexing and classifying gigabytes of time series under time warping". They contain the nearest neighbour indices for each query in each dataset.

  6. Data from: Accelerometer-Based Multivariate Time-Series Dataset for Calf...

    • data.niaid.nih.gov
    Updated Aug 13, 2024
    Cite
    Dissanayake, Oshana (2024). Accelerometer-Based Multivariate Time-Series Dataset for Calf Behavior Classification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13259481
    Explore at:
    Dataset updated
    Aug 13, 2024
    Dataset provided by
    Cunningham, Padraig
    Dissanayake, Oshana
    McPherson, Sarah E.
    Kennedy, Emer
    Allyndrée, Joseph
    Riaboff, Lucile
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AcTBeCalf Dataset Description

    The AcTBeCalf dataset is a comprehensive dataset designed to support the classification of pre-weaned calf behaviors from accelerometer data. It contains detailed accelerometer readings aligned with annotated behaviors, providing a valuable resource for research in multivariate time-series classification and animal behavior analysis. The dataset includes accelerometer data collected from 30 pre-weaned Holstein Friesian and Jersey calves, housed in group pens at the Teagasc Moorepark Research Farm, Ireland. Each calf was equipped with a 3D accelerometer sensor (AX3, Axivity Ltd, Newcastle, UK) sampling at 25 Hz and attached to a neck collar from one week of birth over 13 weeks.

    This dataset encompasses 27.4 hours of accelerometer data aligned with calf behaviors, including both prominent behaviors like lying, standing, and running, as well as less frequent behaviors such as grooming, social interaction, and abnormal behaviors.

    The dataset consists of a single CSV file with the following columns:

    dateTime: Timestamp of the accelerometer reading, sampled at 25 Hz.

    calfid: Identification number of the calf (1-30).

    accX: Accelerometer reading for the X axis (top-bottom direction)*.

    accY: Accelerometer reading for the Y axis (backward-forward direction)*.

    accZ: Accelerometer reading for the Z axis (left-right direction)*.

    behavior: Annotated behavior based on an ethogram of 23 behaviors.

    segId: Segment identification number associated with each accelerometer reading/row, representing all readings of the same behavior segment.

    * The directions are given relative to the position of the accelerometer sensor on the calf.
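
    Given the columns listed above, one plausible way to turn the row-per-reading CSV into one time series per behavior segment is sketched below. The sample rows are hypothetical values made up for this example, and the sketch assumes that `segId` values are unique across calves, which the description does not state.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt using the documented column names; values are invented.
SAMPLE = """dateTime,calfid,accX,accY,accZ,behavior,segId
2022-01-01 10:00:00.00,1,0.10,-0.95,0.20,lying,7
2022-01-01 10:00:00.04,1,0.12,-0.94,0.21,lying,7
2022-01-01 10:05:00.00,2,0.80,-0.30,0.40,running,8
"""

def load_segments(handle):
    """Group accelerometer rows into segments keyed by segId."""
    segments = defaultdict(lambda: {"calfid": None, "behavior": None, "xyz": []})
    for row in csv.DictReader(handle):
        seg = segments[row["segId"]]
        seg["calfid"] = row["calfid"]
        seg["behavior"] = row["behavior"]
        seg["xyz"].append((float(row["accX"]), float(row["accY"]), float(row["accZ"])))
    return dict(segments)

segments = load_segments(io.StringIO(SAMPLE))
for seg_id, seg in segments.items():
    print(seg_id, seg["calfid"], seg["behavior"], len(seg["xyz"]))
```

    For the real file, the `io.StringIO(SAMPLE)` handle would be replaced by an opened CSV file; keeping `calfid` on each segment makes it possible to split training and test data by animal rather than by row.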

    Code Files Description

    The dataset is accompanied by several code files to facilitate the preprocessing and analysis of the accelerometer data and to support the development and evaluation of machine learning models. The main code files included in the dataset repository are:

    accelerometer_time_correction.ipynb: This script corrects the accelerometer time drift, ensuring the alignment of the accelerometer data with the reference time.

    shake_pattern_detector.py: This script includes an algorithm to detect shake patterns in the accelerometer signal for aligning the accelerometer time series with reference times.

    aligning_accelerometer_data_with_annotations.ipynb: This notebook aligns the accelerometer time series with the annotated behaviors based on timestamps.

    manual_inspection_ts_validation.ipynb: This notebook provides a manual inspection process for ensuring the accurate alignment of the accelerometer data with the annotated behaviors.

    additional_ts_generation.ipynb: This notebook generates additional time-series data from the original X, Y, and Z accelerometer readings, including Magnitude, ODBA (Overall Dynamic Body Acceleration), VeDBA (Vectorial Dynamic Body Acceleration), pitch, and roll.

    genSplit.py: This script provides the logic used for the generalized subject separation for machine learning model training, validation and testing.

    active_inactive_classification.ipynb: This notebook details the process of classifying behaviors into active and inactive categories using a RandomForest model, achieving a balanced accuracy of 92%.

    four_behv_classification.ipynb: This notebook employs the mini-ROCKET feature derivation mechanism and a RidgeClassifierCV to classify behaviors into four categories: drinking milk, lying, running, and other, achieving a balanced accuracy of 84%.
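
    The derived series mentioned for additional_ts_generation.ipynb can be approximated as in the sketch below, which covers magnitude, ODBA, and VeDBA. It is not the notebook's code: the running-mean estimate of the static (gravity) component and the 25-sample window (one second at 25 Hz) are assumptions, and pitch and roll are left out because they depend on the sensor's axis convention.

```python
import math

def magnitude(x, y, z):
    """Euclidean norm of one accelerometer reading."""
    return math.sqrt(x * x + y * y + z * z)

def running_mean(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def odba_vedba(xs, ys, zs, window=25):
    """Return (ODBA, VeDBA) series from raw X, Y, Z readings."""
    static = [running_mean(axis, window) for axis in (xs, ys, zs)]
    odba, vedba = [], []
    for i in range(len(xs)):
        # Dynamic component = raw reading minus the smoothed (static) component.
        dx = xs[i] - static[0][i]
        dy = ys[i] - static[1][i]
        dz = zs[i] - static[2][i]
        odba.append(abs(dx) + abs(dy) + abs(dz))
        vedba.append(math.sqrt(dx * dx + dy * dy + dz * dz))
    return odba, vedba

# Toy readings invented for illustration.
xs = [0.0, 0.5, 0.0, -0.5, 0.0]
ys = [1.0, 1.0, 1.0, 1.0, 1.0]
zs = [0.0, 0.0, 0.2, 0.0, 0.0]
odba, vedba = odba_vedba(xs, ys, zs, window=3)
print(odba, vedba)
```

    ODBA sums the absolute dynamic components per axis, whereas VeDBA takes their vector length, so VeDBA never exceeds ODBA for the same reading.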

    Kindly cite one of the following papers when using this data:

    Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Evaluating ROCKET and Catch22 features for calf behaviour classification from accelerometer data using Machine Learning models. arXiv preprint arXiv:2404.18159.

    Dissanayake, O., McPherson, S. E., Allyndrée, J., Kennedy, E., Cunningham, P., & Riaboff, L. (2024). Development of a digital tool for monitoring the behaviour of pre-weaned calves using accelerometer neck-collars. arXiv preprint arXiv:2406.17352

  7. PSML: A Multi-scale Time-series Dataset for Machine Learning in Decarbonized...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 10, 2021
    Cite
    Xiangtian Zheng; Nan Xu; Dongqi Wu; Loc Trinh; Tong Huang; S Sivaranjani; Yan Liu; Le Xie (2021). PSML: A Multi-scale Time-series Dataset for Machine Learning in Decarbonized Energy Grids (Dataset) [Dataset]. http://doi.org/10.5281/zenodo.5130612
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 10, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xiangtian Zheng; Nan Xu; Dongqi Wu; Loc Trinh; Tong Huang; S Sivaranjani; Yan Liu; Le Xie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The electric grid is a key enabling infrastructure for the ambitious transition towards carbon neutrality as we grapple with climate change. With deepening penetration of renewable energy resources and electrified transportation, the reliable and secure operation of the electric grid becomes increasingly challenging. In this paper, we present PSML, a first-of-its-kind open-access multi-scale time-series dataset, to aid in the development of data-driven machine learning (ML) based approaches towards reliable operation of future electric grids. The dataset is generated through a novel transmission + distribution (T+D) co-simulation designed to capture the increasingly important interactions and uncertainties of the grid dynamics, containing electric load, renewable generation, weather, voltage and current measurements at multiple spatio-temporal scales. Using PSML, we provide state-of-the-art ML baselines on three challenging use cases of critical importance to achieve: (i) early detection, accurate classification and localization of dynamic disturbance events; (ii) robust hierarchical forecasting of load and renewable energy with the presence of uncertainties and extreme events; and (iii) realistic synthetic generation of physical-law-constrained measurement time series. We envision that this dataset will enable advances for ML in dynamic systems, while simultaneously allowing ML researchers to contribute towards carbon-neutral electricity and mobility.

    Data Navigation

    Download and unzip the archive to a location of your choice. It is needed later to reproduce the benchmark results, load the data, and evaluate the performance of proposed methods.

    wget 'https://zenodo.org/record/5130612/files/PSML.zip?download=1'
    7z x 'PSML.zip?download=1' -o./
    

    Minute-level Load and Renewable

    • File Name
      • ISO_zone_#.csv: `CAISO_zone_1.csv` contains minute-level load, renewable and weather data from 2018 to 2020 in the zone 1 of CAISO.
    • Field Description
      • Field `time`: Time of minute resolution.
      • Field `load_power`: Normalized load power.
      • Field `wind_power`: Normalized wind turbine power.
      • Field `solar_power`: Normalized solar PV power.
      • Field `DHI`: Diffuse horizontal irradiance.
      • Field `DNI`: Direct normal irradiance.
      • Field `GHI`: Global horizontal irradiance.
      • Field `Dew Point`: Dew point in degrees Celsius.
      • Field `Solar Zeinth Angle`: The angle between the sun's rays and the vertical direction, in degrees.
      • Field `Wind Speed`: Wind speed (m/s).
      • Field `Relative Humidity`: Relative humidity (%).
      • Field `Temperature`: Temperature in degrees Celsius.

    Minute-level PMU Measurements

    • File Name
      • case #: The `case 0` folder contains all data of scenario setting #0.
        • pf_input_#.txt: Selected load, renewable and solar generation for the simulation.
        • pf_result_#.csv: Voltage at nodes and power on branches in the transmission system via T+D simulation.
    • Field Description
      • Field `time`: Time of minute resolution.
      • Field `Vm_###`: Voltage magnitude (p.u.) at the bus ### in the simulated model.
      • Field `Va_###`: Voltage angle (rad) at the bus ### in the simulated model.
      • Field `P_#_#_#`: `P_3_4_1` is the active power transferred on branch #1 from bus 3 to bus 4.
      • Field `Q_#_#_#`: `Q_5_20_1` is the reactive power transferred on branch #1 from bus 5 to bus 20.
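
    The naming scheme of the minute-level PMU fields above lends itself to a small parser. The sketch below is an illustration only; the function name `parse_field` and the keys of the returned dictionaries are invented for this example and are not part of the PSML tooling.

```python
import re

# Matches either a per-bus voltage field or a per-branch power field.
FIELD_RE = re.compile(r"^(Vm|Va)_(\d+)$|^(P|Q)_(\d+)_(\d+)_(\d+)$")

def parse_field(name):
    """Decode a pf_result column name; return None for other columns such as `time`."""
    m = FIELD_RE.match(name)
    if m is None:
        return None
    if m.group(1):
        kind = "voltage_magnitude_pu" if m.group(1) == "Vm" else "voltage_angle_rad"
        return {"kind": kind, "bus": int(m.group(2))}
    kind = "active_power" if m.group(3) == "P" else "reactive_power"
    return {
        "kind": kind,
        "from_bus": int(m.group(4)),
        "to_bus": int(m.group(5)),
        "branch": int(m.group(6)),
    }

for column in ["time", "Vm_101", "P_3_4_1", "Q_5_20_1"]:
    print(column, parse_field(column))
```

    Decoding the header once makes it straightforward to select, for example, all active-power columns leaving a given bus.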

    Millisecond-level PMU Measurements

    • File Name
      • Forced Oscillation: The folder contains all forced oscillation cases.
        • row_#: The folder contains all data of the disturbance scenario #.
          • dist.csv: Three-phase voltage at nodes in the distribution system via T+D simulation.
          • info.csv: This file contains the start time, end time, location and type of the disturbance.
          • trans.csv: Voltage at nodes and power on branches in the transmission system via T+D simulation.
      • Natural Oscillation: The folder contains all natural oscillation cases.
        • row_#: The folder contains all data of the disturbance scenario #.
          • dist.csv: Three-phase voltage at nodes in the distribution system via T+D simulation.
          • info.csv: This file contains the start time, end time, location and type of the disturbance.
          • trans.csv: Voltage at nodes and power on branches in the transmission system via T+D simulation.
    • Field Description
      • trans.csv
        • Field `Time(s)`: Time of millisecond resolution.
        • Field `VOLT ###`: Voltage magnitude (p.u.) at the bus ### in the transmission model.
        • Field `POWR ### TO ### CKT #`: `POWR 151 TO 152 CKT '1 '` is the active power transferred on branch #1 from bus 151 to bus 152.
        • Field `VARS ### TO ### CKT #`: `VARS 151 TO 152 CKT '1 '` is the reactive power transferred on branch #1 from bus 151 to bus 152.
      • dist.csv
        • Field `Time(s)`: Time of millisecond resolution.
        • Field `####.###.#`: `3005.633.1` means per-unit voltage magnitude of the phase A at the bus 633 of the distribution grid, the one connecting to the bus 3005 in the transmission system.
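
    The millisecond-level column names in trans.csv and dist.csv can be decoded in a similar way, as in the sketch below. The helper names and dictionary keys are invented for illustration. Only the mapping of phase code 1 to phase A comes from the description; mapping 2 and 3 to phases B and C is an assumption.

```python
import re

# Only "1" -> "A" is stated in the description; "2" and "3" are assumed.
PHASES = {"1": "A", "2": "B", "3": "C"}

def parse_trans_column(name):
    """Decode a trans.csv column name; return None if it is not recognized."""
    m = re.match(r"^VOLT\s+(\d+)$", name)
    if m:
        return {"kind": "voltage_magnitude_pu", "bus": int(m.group(1))}
    m = re.match(r"^(POWR|VARS)\s+(\d+)\s+TO\s+(\d+)\s+CKT\s+'?\s*(\w+)\s*'?$", name)
    if m:
        kind = "active_power" if m.group(1) == "POWR" else "reactive_power"
        return {
            "kind": kind,
            "from_bus": int(m.group(2)),
            "to_bus": int(m.group(3)),
            "branch": m.group(4),
        }
    return None

def parse_dist_column(name):
    """Decode a dist.csv column such as `3005.633.1`; return None otherwise."""
    m = re.match(r"^(\d+)\.(\w+)\.(\d)$", name)
    if m is None:
        return None
    return {
        "transmission_bus": int(m.group(1)),
        "distribution_bus": m.group(2),
        "phase": PHASES.get(m.group(3), m.group(3)),
    }

print(parse_trans_column("POWR 151 TO 152 CKT '1 '"))
print(parse_dist_column("3005.633.1"))
```
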
  8. Network traffic datasets created by Single Flow Time Series Analysis

    • zenodo.org
    • explore.openaire.eu
    • +1more
    csv, pdf
    Updated Jul 11, 2024
    Cite
    Josef Koumar; Karel Hynek; Tomáš Čejka (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. http://doi.org/10.5281/zenodo.8035724
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Josef Koumar; Karel Hynek; Tomáš Čejka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network traffic datasets created by Single Flow Time Series Analysis

    Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:

    J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

    This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf

    The following table describes each dataset file:

    File name | Detection problem | Citation of original raw dataset
    botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022.
    cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022.
    dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
    doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020.
    doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022.
    dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
    edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020.
    ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here: https://www.stratosphereips.org/datasets-iot23
    ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021.
    ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021.
    tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022.
    vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022.

  9. Data from: A Meta-Learner Approach to Multistep-Ahead Time Series Prediction...

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 9, 2023
    + more versions
    Cite
    Fouad Bahrpeyma (2023). A Meta-Learner Approach to Multistep-Ahead Time Series Prediction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7907676
    Explore at:
    Dataset updated
    May 9, 2023
    Dataset provided by
    andrew.mccarren@dcu.ie
    Fouad Bahrpeyma
    Mark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The application of machine learning has become commonplace for problems in modern data science. The democratization of the decision process when choosing a machine learning algorithm has also received considerable attention through the use of meta features and automated machine learning for both classification and regression type problems. However, this is not the case for multistep-ahead time series problems. Time series models generally rely upon the series itself to make future predictions, as opposed to independent features used in regression and classification problems. The structure of a time series is generally described by features such as trend, seasonality, cyclicality, and irregularity. In this research, we demonstrate how time series metrics for these features, in conjunction with an ensemble based regression learner, were used to predict the standardized mean square error of candidate time series prediction models. These experiments used datasets that cover a wide feature space and enable researchers to select the single best performing model or the top N performing models. A robust evaluation was carried out to test the learner's performance on both synthetic and real time series.

    Proposed Dataset

    The dataset proposed here gives the results of 20-step-ahead predictions for eight machine learning/multi-step-ahead prediction strategies on the 5,842 time series datasets outlined here. It was used as the training data for the Meta Learners in this research. The meta features used are in columns C to AE. Column AH gives the method/strategy used, and columns AI to BB (the error) are the outcome variables, one for each prediction step. The methods and strategies are as follows:

    Machine Learning methods:

    NN: Neural Network

    ARIMA: Autoregressive Integrated Moving Average

    SVR: Support Vector Regression

    LSTM: Long Short Term Memory

    RNN: Recurrent Neural Network

    Multistep ahead prediction strategy:

    OSAP: One Step ahead strategy

    MRFA: Multi Resolution Forecast Aggregation
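
    The spreadsheet-style column ranges given under "Proposed Dataset" (C to AE, AH, and AI to BB) can be translated into positional indices as sketched below. The helper names are invented for illustration, and the sketch assumes the letters refer to column positions in the distributed file.

```python
def column_index(letters):
    """Convert a spreadsheet column label (A, B, ..., Z, AA, ...) to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1

def column_range(first, last):
    """Zero-based indices of all columns from `first` to `last`, inclusive."""
    return list(range(column_index(first), column_index(last) + 1))

meta_feature_cols = column_range("C", "AE")   # meta features
method_col = column_index("AH")               # method/strategy label
error_cols = column_range("AI", "BB")         # one error value per prediction step

print(len(meta_feature_cols), method_col, len(error_cols))
```

    The range AI to BB spans 20 columns, which is consistent with one error column per step of the 20-step-ahead horizon.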

  10. Source Code

    • bridges.monash.edu
    • researchdata.edu.au
    zip
    Updated Oct 15, 2017
    Cite
    Chang Wei Tan (2017). Source Code [Dataset]. http://doi.org/10.4225/03/59e33dfb920f1
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 15, 2017
    Dataset provided by
    Monash University
    Authors
    Chang Wei Tan
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This is the source code for the paper "Efficient search of the best warping window for Dynamic Time Warping". This work focused on fast learning/searching for the best warping window for Dynamic Time Warping and Time Series Classification. For more info, visit https://github.com/ChangWeiTan/FastWWSearch
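
    For readers unfamiliar with the warping window, the sketch below shows a plain Dynamic Time Warping distance constrained to a band of a given width. It is a textbook-style illustration with an invented function name and toy series, not the fast search method from the paper or the code in the linked repository.

```python
def dtw_distance(a, b, window):
    """Sum of squared differences along the best warping path with |i - j| <= window."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must at least cover the length difference
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Two toy series with the same shape, shifted by one step.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
for window in (0, 1, 2):
    print(window, dtw_distance(a, b, window))
```

    With a window of 0 the measure reduces to the squared Euclidean distance, while a window of 1 is already enough to align the shifted peak. Choosing the window that gives the best classification accuracy normally means repeating such computations for many candidate widths, which is the cost the paper sets out to reduce.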

  11. S2Agri-17

    • huggingface.co
    Updated Feb 25, 2025
    + more versions
    Cite
    Monash Scalable Time Series Evaluation Repository (2025). S2Agri-17 [Dataset]. https://huggingface.co/datasets/monster-monash/S2Agri-17
    Explore at:
    Dataset updated
    Feb 25, 2025
    Dataset authored and provided by
    Monash Scalable Time Series Evaluation Repository
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Part of MONSTER: https://arxiv.org/abs/2502.15122.

    S2Agri-17

    Category Satellite

    Num. Examples 59,268,823

    Num. Channels 10

    Length 24

    Sampling Freq. 10 days

    Num. Classes 17

    License CC BY 4.0

    Citations [1] [2]

    S2Agri is a land cover classification dataset and contains a single tile of Sentinel-2 data (T31TFM), which covers a 12,100 km² area in France: see Figure [1, 2]. Ten spectral bands covering the visible and infrared frequencies are used, and these are provided… See the full description on the dataset page: https://huggingface.co/datasets/monster-monash/S2Agri-17.

  12. P

    Multivariate-Mobility-Paris Dataset

    • paperswithcode.com
    Updated Apr 30, 2022
    Cite
    Héber H. Arcolezi; Jean-François Couchot; Denis Renaud; Bechara Al Bouna; Xiaokui Xiao (2022). Multivariate-Mobility-Paris Dataset [Dataset]. https://paperswithcode.com/dataset/multivariate-mobility-paris
    Explore at:
    Dataset updated
    Apr 30, 2022
    Authors
    Héber H. Arcolezi; Jean-François Couchot; Denis Renaud; Bechara Al Bouna; Xiaokui Xiao
    Description

    The original dataset, provided by Orange telecom in France, contains anonymized and aggregated human mobility data. The Multivariate-Mobility-Paris dataset comprises information from 2020-08-24 to 2020-11-04 (72 days during the COVID-19 pandemic), with a time granularity of 30 minutes and a spatial granularity of 6 coarse regions in Paris, France. In other words, it is a multivariate time series dataset.

    This dataset can be used for several time-series tasks such as univariate/multivariate forecasting/classification with classic, machine learning, and privacy-preserving machine learning techniques.

  13. Time Series International Trade: Monthly U.S. Exports by North American...

    • datasets.ai
    • catalog.data.gov
    2
    Updated Aug 8, 2024
    Cite
    Department of Commerce (2024). Time Series International Trade: Monthly U.S. Exports by North American Industry Classification System (NAICS) Code [Dataset]. https://datasets.ai/datasets/time-series-international-trade-monthly-u-s-exports-by-north-american-industry-classificat
    Explore at:
    2 available download formats
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    United States Department of Commerce (http://www.commerce.gov/)
    Authors
    Department of Commerce
    Area covered
    United States
    Description

    The Census data API provides access to the most comprehensive set of data on current month and cumulative year-to-date exports using the North American Industry Classification System (NAICS). The NAICS endpoint in the Census data API also provides value, shipping weight, and method of transportation totals at the district level for all U.S. trading partners. The Census data API will help users research new markets for their products, establish pricing structures for potential export markets, and conduct economic planning. If you have any questions regarding U.S. international trade data, please call us at 1(800)549-0595 option #4 or email us at eid.international.trade.data@census.gov.

  14. FiftyWords UCR Archive Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated May 15, 2024
    Cite
    Zenodo (2024). FiftyWords UCR Archive Dataset [Dataset]. http://doi.org/10.5281/zenodo.11191097
    Explore at:
    Available download formats: bin
    Dataset updated
    May 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the UCR Archive maintained by University of Southampton researchers. Please cite a relevant or the latest full archive release if you use the datasets. See http://www.timeseriesclassification.com/.

    FiftyWords is a data set of word outlines taken from the George Washington library by T. Rath and used in the paper "Word image matching using dynamic time warping", CVPR 2003. Each case is a word. A series is formed by taking the height profile of the word.

    Donator: T. Rath, R. Manmatha
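
    As an illustration of turning a word image into a series, the sketch below computes one possible height profile: for each pixel column, the height of the topmost ink pixel above the bottom of the image. The exact profile used for FiftyWords is not specified in this description, so the definition here (an "upper profile" on a binary image) is an assumption, and `upper_profile` is a hypothetical name.

```python
import numpy as np


def upper_profile(ink):
    """Column-wise height of the topmost ink pixel in a binary word image.

    `ink` is a 2D array where nonzero marks ink; row 0 is the top of the image.
    Columns without ink get height 0.
    """
    ink = np.asarray(ink, dtype=bool)
    rows = ink.shape[0]
    has_ink = ink.any(axis=0)
    # argmax returns the first True from the top (0 for empty columns,
    # which is why has_ink is checked separately).
    top_row = ink.argmax(axis=0)
    return np.where(has_ink, rows - top_row, 0)


word = [
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 1],
]
print(upper_profile(word).tolist())  # -> [0, 3, 2]
```

    Reading the image left to right gives one value per column, so each word becomes a one-dimensional series whose length equals the image width.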

  15. f

    Accuracy of all methods on different datasets.

    • figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Nafis Irtiza Tripto; Mohimenul Kabir; Md. Shamsuzzoha Bayzid; Atif Rahman (2023). Accuracy of all methods on different datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0241686.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Nafis Irtiza Tripto; Mohimenul Kabir; Md. Shamsuzzoha Bayzid; Atif Rahman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All accuracy values are given as percentages (%).

  16. f

    Final LCZ maps with post-classification processing

    • springernature.figshare.com
    zip
    Updated Feb 12, 2024
    Cite
    Steve Hankey; Meng Qi; Chunxue Xu; Wenwen Zhang; Matthias Demuzere; Perry Hystad; Tianjun Lu; Peter James; Benjamin Bechtel (2024). Final LCZ maps with post-classification processing [Dataset]. http://doi.org/10.6084/m9.figshare.24964275.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    figshare
    Authors
    Steve Hankey; Meng Qi; Chunxue Xu; Wenwen Zhang; Matthias Demuzere; Perry Hystad; Tianjun Lu; Peter James; Benjamin Bechtel
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This compressed folder contains annual CONUS-wide LCZ maps from 1986 to 2020, which are the main and final LCZ product of this dataset. The maps are derived from a lightweight contextual Random Forest model with spatial and temporal post-classification processing. Each map is provided in GeoTIFF format, with the year it represents indicated in the file name. For example, the file "TP_2020.tif" is the LCZ map for 2020. All LCZ maps have a spatial resolution of 100 m and use the USA Contiguous Albers Equal Area Conic projection (EPSG:5070). The LCZ classes are indicated by the numbers 1-17. Note that LCZ class 7 (Lightweight low-rise) is not present in this dataset. Pixels with value 0 represent NoData.
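
    The conventions described above (year in the file name, classes 1-17, value 0 as NoData) can be sketched as follows. The sketch assumes every file follows the "TP_<year>.tif" pattern of the single example given, and it works on an in-memory NumPy array standing in for a map; actually reading a GeoTIFF would need a raster library, which is not shown. The names `year_from_filename` and `class_counts` are hypothetical.

```python
import re

import numpy as np

NODATA = 0  # per the description, pixels of value 0 are NoData


def year_from_filename(name):
    """Extract the year from a name like 'TP_2020.tif' (assumed pattern)."""
    match = re.fullmatch(r"TP_(\d{4})\.tif", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1))


def class_counts(lcz):
    """Count pixels per LCZ class (1-17), ignoring NoData pixels."""
    lcz = np.asarray(lcz)
    valid = lcz[lcz != NODATA]
    counts = np.bincount(valid, minlength=18)
    # Only report classes that actually occur in the map.
    return {c: int(counts[c]) for c in range(1, 18) if counts[c] > 0}


toy_map = np.array([[0, 1, 1],
                    [6, 0, 17]], dtype=np.uint8)
print(year_from_filename("TP_2020.tif"))  # -> 2020
print(class_counts(toy_map))              # -> {1: 2, 6: 1, 17: 1}
```

    Dropping the NoData pixels before counting keeps them from being mistaken for a land cover class.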

  17. DodgerLoopGame UCR Archive Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated May 14, 2024
    Cite
    Zenodo (2024). DodgerLoopGame UCR Archive Dataset [Dataset]. http://doi.org/10.5281/zenodo.11186628
    Explore at:
    Available download formats: bin
    Dataset updated
    May 14, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the UCR Archive maintained by University of Southampton researchers. Please cite a relevant or the latest full archive release if you use the datasets. See http://www.timeseriesclassification.com/.

    The traffic data are collected with a loop sensor installed on a ramp for the 101 North freeway in Los Angeles. This location is close to Dodgers Stadium; therefore the traffic is affected by the volume of visitors to the stadium.

    - Class 1: Normal Day
    - Class 2: Game Day

    There is nothing to infer from the order of examples in the train and test set. Missing values are represented with NaN in the text file. Data created by Ihler, Alexander, Jon Hutchins, and Padhraic Smyth (see [1][2][3]). Data edited by Chin-Chia Michael Yeh.
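
    Because missing values appear as NaN, many classifiers cannot consume these series directly. The sketch below shows one common option, linear interpolation across the gaps with NumPy. It is an assumption about how a user might preprocess the data, not a procedure prescribed by the archive, and `fill_missing` is a hypothetical name.

```python
import numpy as np


def fill_missing(series):
    """Linearly interpolate NaN gaps in a 1D series.

    Gaps at the start or end are filled with the nearest observed value,
    which is how numpy.interp treats points outside the known range.
    """
    x = np.asarray(series, dtype=float)
    missing = np.isnan(x)
    if missing.all():
        raise ValueError("series contains no observed values")
    idx = np.arange(x.size)
    filled = x.copy()
    filled[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return filled


print(fill_missing([1.0, np.nan, 3.0, np.nan]).tolist())  # -> [1.0, 2.0, 3.0, 3.0]
```

    Whether interpolation is appropriate depends on the method being evaluated; long gaps in traffic counts may be better handled by a model that accepts missing values.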

    [1] Ihler, Alexander, Jon Hutchins, and Padhraic Smyth. "Adaptive event detection with time-varying poisson processes." Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006.

    [2] “UCI Machine Learning Repository: Dodgers Loop Sensor Data Set.” UCI Machine Learning Repository, archive.ics.uci.edu/ml/datasets/dodgers+loop+sensor.

    [3] “Caltrans PeMS.” Caltrans, pems.dot.ca.gov/.

    Donator: C. Yeh

  18. f

    RMSE value of all methods on different datasets.

    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Nafis Irtiza Tripto; Mohimenul Kabir; Md. Shamsuzzoha Bayzid; Atif Rahman (2023). RMSE value of all methods on different datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0241686.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Nafis Irtiza Tripto; Mohimenul Kabir; Md. Shamsuzzoha Bayzid; Atif Rahman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RMSE values of the different methods are grouped by test percentage, and the best RMSE values are highlighted.

  19. Landsat time series classification training data

    • zenodo.org
    csv, text/x-python
    Updated Jun 30, 2023
    Cite
    Zhang Hankui (2023). Landsat time series classification training data [Dataset]. http://doi.org/10.5281/zenodo.8097697
    Explore at:
    Available download formats: text/x-python, csv
    Dataset updated
    Jun 30, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zhang Hankui
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for the paper

    Hankui K. Zhang, Dong Luo, Zhongbin Li, Classifying raw irregular Landsat time series (CRIT) for large area land cover mapping by adapting Transformer model.

    It stores the daily raw Landsat ARD annual good quality surface reflectance time series for 1985, 2006 and 2018 for CONUS with 7 land cover classes. Details are in the paper.

  20. Multivariate time series for testing -- RacketSports dataset

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Apr 7, 2020
    Cite
    Florian Huber (2020). Multivariate time series for testing -- RacketSports dataset [Dataset]. http://doi.org/10.5281/zenodo.3742271
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 7, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Florian Huber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The original data was retrieved from http://www.timeseriesclassification.com/description.php?Dataset=RacketSports

    Original data description:
    The data was created by university students playing badminton or squash whilst wearing a smart watch (Sony Smart watch 35). The watch relayed the x-y-z coordinates for both the gyroscope and accelerometer to an Android phone (One Plus 56). The phone wrote these values to an Attribute-Relation File Format (arff) file using an app developed by a UEA computer science masters student. The problem is to identify which sport and which stroke the players are making. The data was collected at a rate of 10 Hz over 3 seconds whilst the player played either a forehand/backhand in squash or a clear/smash in badminton.
    The data was collected as part of an undergraduate project by Phillip Perks in 2017/18.

    Pre-processing
    Data processing was done as described in: https://github.com/NLeSC/mcfly-tutorial/blob/master/utils/tutorial_racketsports.py
    The original data was already split into a train set and a test set. Here the data was loaded and further divided into train, test, and validation sets.
    To keep it simple, the original test part was divided into test and validation.
    The resulting data was stored as numpy .npy files.

    The zip file contains three sets of time series data (X_train, X_test, X_valid) and the respective labels (y_train, y_test, y_valid).
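
    The splitting step described above can be sketched as follows. This is not the code from the linked tutorial script; the 50/50 ratio, the shuffling, the seed, and the function name `split_test_validation` are all assumptions made for illustration. The toy array uses 30 time steps (10 Hz over 3 seconds) and 6 channels (x-y-z for gyroscope and accelerometer); the axis order is also an assumption.

```python
import numpy as np


def split_test_validation(X_test, y_test, valid_fraction=0.5, seed=0):
    """Divide an existing test set into new test and validation parts.

    Returns (X_test_new, y_test_new, X_valid, y_valid). Samples are shuffled
    with a fixed seed so the split is reproducible.
    """
    X_test = np.asarray(X_test)
    y_test = np.asarray(y_test)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_test))
    n_valid = int(round(len(X_test) * valid_fraction))
    valid_idx, test_idx = order[:n_valid], order[n_valid:]
    return X_test[test_idx], y_test[test_idx], X_test[valid_idx], y_test[valid_idx]


# Toy stand-in for the original test part: 10 samples, 30 time steps, 6 channels.
X = np.zeros((10, 30, 6))
y = np.arange(10)
X_te, y_te, X_va, y_va = split_test_validation(X, y)
print(X_te.shape, X_va.shape)  # -> (5, 30, 6) (5, 30, 6)
```

    Each resulting array could then be written with `np.save` to produce .npy files like those in the zip archive.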

    Reference:
    http://www.timeseriesclassification.com/description.php?Dataset=RacketSports
    (The data was collected as part of an undergraduate project by Phillip Perks in 2017/18.)
