100+ datasets found
  1. 1000 Empirical Time series

    • researchdata.edu.au
    • bridges.monash.edu
    • +1 more
    Updated May 5, 2022
    Cite
    Ben Fulcher (2022). 1000 Empirical Time series [Dataset]. http://doi.org/10.6084/m9.figshare.5436136.v10
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Ben Fulcher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A diverse selection of 1000 empirical time series, along with results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.


    The results of the computation are in the hctsa file, HCTSA_Empirical1000.mat for use in Matlab using v1.06 of hctsa.

    The same data is also provided in .csv format: hctsa_datamatrix.csv holds the results of feature computation, hctsa_timeseries-info.csv describes the rows (time series), hctsa_features.csv describes the columns (features), hctsa_masterfeatures.csv gives the corresponding hctsa code used to compute each feature, and hctsa_timeseries-data.csv contains the data of the individual time series (one time series per line, for the time series described in hctsa_timeseries-info.csv).

    These .csv files were produced by running >> OutputToCSV('HCTSA_Empirical1000.mat',true,true); in hctsa.
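    For readers working outside Matlab, the .csv exports can be inspected with a few lines of Python. The sketch below is only illustrative: it assumes the data matrix is plain comma-separated numbers with one row per time series, and it does not know whether the real export has a header row (hence the `has_header` switch). `load_matrix` and `matrix_shape` are hypothetical helper names, not part of hctsa.

```python
import csv
import io
import math


def load_matrix(handle, has_header=False):
    """Read a comma-separated numeric matrix; unparseable cells become NaN."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip a header row if the export has one (assumption)
    matrix = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = []
        for cell in row:
            try:
                values.append(float(cell))
            except ValueError:
                values.append(math.nan)  # empty or non-numeric cell
        matrix.append(values)
    return matrix


def matrix_shape(matrix):
    """Return (rows, columns), raising if the rows have unequal lengths."""
    widths = {len(row) for row in matrix}
    if len(widths) > 1:
        raise ValueError("rows have inconsistent lengths")
    return len(matrix), (widths.pop() if widths else 0)


# Tiny in-memory stand-in for hctsa_datamatrix.csv (two series, three features).
demo = load_matrix(io.StringIO("1.0,2.5,NaN\n3,,4\n"))
print(matrix_shape(demo))
```

    With the real files, one would pass `open("hctsa_datamatrix.csv", newline="")` instead of the in-memory demo and compare the resulting shape against the number of records in hctsa_timeseries-info.csv (rows) and hctsa_features.csv (columns).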

    The input file, INP_Empirical1000.mat, is for use with hctsa, and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as
    >> TS_Init('INP_Empirical1000.mat');

    Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed by the user using TS_PlotTimeSeries from the hctsa package.

    See links in references for more comprehensive documentation for performing methodological comparison using this dataset, and on how to download and use v1.06 of hctsa.

  2. Time Series Data

    • kaggle.com
    zip
    Updated Oct 21, 2020
    Cite
    Saurav Anand (2020). Time Series Data [Dataset]. https://www.kaggle.com/datasets/saurav9786/time-series-data
    Available download formats: zip (643937 bytes)
    Dataset updated
    Oct 21, 2020
    Authors
    Saurav Anand
    Description

    Dataset

    This dataset was created by Saurav Anand


  3. Sample Time Series Data

    • kaggle.com
    zip
    Updated Feb 12, 2022
    Cite
    Tan Phan (2022). Sample Time Series Data [Dataset]. https://www.kaggle.com/datasets/phanttan/sample-time-series-data
    Available download formats: zip (15113 bytes)
    Dataset updated
    Feb 12, 2022
    Authors
    Tan Phan
    Description

    Dataset

    This dataset was created by Tan Phan


  4. Multivariate Time Series Search - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Multivariate Time Series Search - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/multivariate-time-series-search
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA: http://nasa.gov/
    Description

    Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns from these MTS databases which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases, that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R-tree Based Search (RBS) which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several millions of observations. Both these tests show that our algorithms have very high prune rates (>95%) thus needing actual disk access for only less than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never been attempted on datasets of the size we have used in this paper.

  5. Rainfall Dataset for Simple Time Series Analysis

    • kaggle.com
    zip
    Updated Apr 20, 2024
    Cite
    Sujith K Mandala (2024). Rainfall Dataset for Simple Time Series Analysis [Dataset]. https://www.kaggle.com/datasets/sujithmandala/rainfall-dataset-for-simple-time-series-analysis
    Available download formats: zip (684 bytes)
    Dataset updated
    Apr 20, 2024
    Authors
    Sujith K Mandala
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains daily rainfall measurements (in millimeters) for the year 2022. The data spans from January 1, 2022, to July 3, 2022, covering a total of 184 days. The dataset can be used for various machine learning tasks, such as time series forecasting, pattern recognition, or anomaly detection related to rainfall patterns.

    Column Descriptors:

    date (date): The date of the rainfall measurement in the format YYYY-MM-DD. Example: 2022-01-01

    rainfall (float): The amount of rainfall recorded on the corresponding date, measured in millimeters (mm). Example: 12.5. Range: 0.0 mm (no rainfall) to 22.4 mm (the maximum recorded value in the dataset). Missing values: none in this column.
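    The stated span and the two columns are easy to sanity-check in Python. This is a minimal sketch, assuming a CSV with a header row naming the `date` and `rainfall` columns described above; the three demo rows are invented for illustration, and `parse_rainfall`, `inclusive_days` and `moving_average` are hypothetical helpers.

```python
import csv
import io
from datetime import date


def parse_rainfall(handle):
    """Return (date, rainfall_mm) pairs from a CSV with 'date' and 'rainfall' columns."""
    return [
        (date.fromisoformat(record["date"]), float(record["rainfall"]))
        for record in csv.DictReader(handle)
    ]


def inclusive_days(start, end):
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


def moving_average(values, window):
    """Trailing moving average; the first window-1 points are dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Invented sample rows in the documented column layout.
demo = parse_rainfall(io.StringIO(
    "date,rainfall\n2022-01-01,12.5\n2022-01-02,0.0\n2022-01-03,3.5\n"
))
print(inclusive_days(date(2022, 1, 1), date(2022, 7, 3)))  # 184
print(moving_average([mm for _, mm in demo], 2))  # [6.25, 1.75]
```

    The first printed value confirms that January 1 to July 3, 2022, counted inclusively, is 184 days, matching the description.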

  6. Time series

    • data.open-power-system-data.org
    csv, sqlite, xlsx
    Updated Oct 6, 2020
    Cite
    Jonathan Muehlenpfordt (2020). Time series [Dataset]. http://doi.org/10.25832/time_series/2020-10-06
    Available download formats: csv, sqlite, xlsx
    Dataset updated
    Oct 6, 2020
    Dataset provided by
    Open Power System Data
    Authors
    Jonathan Muehlenpfordt
    Time period covered
    Jan 1, 2015 - Oct 1, 2020
    Variables measured
    utc_timestamp, DE_wind_profile, DE_solar_profile, DE_wind_capacity, DK_wind_capacity, SE_wind_capacity, CH_solar_capacity, DE_solar_capacity, DK_solar_capacity, AT_price_day_ahead, and 290 more
    Description

    Load, wind and solar, prices in hourly resolution. This data package contains different kinds of timeseries data relevant for power system modelling, namely electricity prices, electricity consumption (load) as well as wind and solar power generation and capacities. The data is aggregated either by country, control area or bidding zone. Geographical coverage includes the EU and some neighbouring countries. All variables are provided in hourly resolution. Where original data is available in higher resolution (half-hourly or quarter-hourly), it is provided in separate files. This package version only contains data provided by TSOs and power exchanges via ENTSO-E Transparency, covering the period 2015-mid 2020. See previous versions for historical data from a broader range of sources. All data processing is conducted in Python/pandas and has been documented in the Jupyter notebooks linked below.
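    Hourly series like these are often aggregated to daily values before modelling. The following is a minimal sketch using only the Python standard library, not the package's own pandas notebooks. It assumes a CSV with a header row, a `utc_timestamp` column in ISO 8601 form with a trailing `Z`, and empty cells for missing values; `daily_mean` is a hypothetical helper and the demo rows are invented.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def daily_mean(handle, column, time_column="utc_timestamp"):
    """Average one hourly column per UTC calendar day, skipping empty cells."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in csv.DictReader(handle):
        cell = record[column]
        if cell == "":
            continue  # missing value for this hour
        # Older Python versions do not parse a trailing "Z", so normalise it.
        stamp = datetime.fromisoformat(record[time_column].replace("Z", "+00:00"))
        day = stamp.date().isoformat()
        sums[day] += float(cell)
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}


# Invented rows using two of the listed variable names.
sample = (
    "utc_timestamp,DE_wind_profile\n"
    "2015-01-01T00:00:00Z,0.2\n"
    "2015-01-01T01:00:00Z,0.4\n"
    "2015-01-02T00:00:00Z,\n"
    "2015-01-02T01:00:00Z,0.5\n"
)
result = daily_mean(io.StringIO(sample), "DE_wind_profile")
print(result)
```

    Days are keyed by ISO date string so that plain sorting is also chronological order.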

  7. Santa Fe Time Series Competition Data Set B

    • physionet.org
    • search.datacite.org
    Updated Jan 6, 2000
    Cite
    (2000). Santa Fe Time Series Competition Data Set B [Dataset]. http://doi.org/10.13026/C20W2T
    Dataset updated
    Jan 6, 2000
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This is a multivariate data set recorded from a patient in the sleep laboratory of the Beth Israel Hospital (now the Beth Israel Deaconess Medical Center) in Boston, Massachusetts. This data set was extracted from record slp60 of the MIT-BIH Polysomnographic Database, and it was submitted to the Santa Fe Time Series Competition in 1991 by our group. The data are presented in text form and have been split into two sequential parts. Each line contains simultaneous samples of three parameters. The first column is the heart rate, the second is the chest volume (respiration force), and the third is the blood oxygen concentration (measured by ear oximetry). The sampling frequency for each measurement is 2 Hz (i.e., the time interval between measurements in successive rows is 0.5 seconds).
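    Because the files carry no explicit time column, the time axis has to be rebuilt from the row index and the 0.5-second interval. A minimal sketch, assuming whitespace-separated columns in the order given above; the sample values are invented, and `Sample` and `parse_samples` are hypothetical names.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    time_s: float        # seconds since the first row of this part
    heart_rate: float
    chest_volume: float
    blood_oxygen: float


def parse_samples(lines, interval_s=0.5, start_s=0.0):
    """Parse three-column rows and attach a time axis at the given interval."""
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines without advancing time
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}")
        heart_rate, chest_volume, blood_oxygen = (float(f) for f in fields)
        time_s = start_s + len(samples) * interval_s
        samples.append(Sample(time_s, heart_rate, chest_volume, blood_oxygen))
    return samples


# Invented rows in the documented column order.
demo = parse_samples(["76.5 8320 7771", "", "76.0 8117 7774", "75.5 7620 7788"])
print(demo[-1].time_s)  # 1.0
```

    Since the data are split into two sequential parts, `start_s` can be set to the end time of the first part when parsing the second.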

  8. COVID-19 Time Series Data

    • data.world
    • kaggle.com
    csv, zip
    Updated Mar 18, 2025
    Cite
    Shad Reynolds (2025). COVID-19 Time Series Data [Dataset]. https://data.world/shad/covid-19-time-series-data
    Available download formats: csv, zip
    Dataset updated
    Mar 18, 2025
    Authors
    Shad Reynolds
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Description

    This data is synced hourly from https://github.com/CSSEGISandData/COVID-19. All credit is to them.

    Latest Confirmed Cases

    https://data.world/shad/covid-analysis/workspace/query?datasetid=covid-19-time-series-data&queryid=e066701e-fa8d-4c9f-97f8-aab3a6f219a8

    I have also added confirmed_pivot.csv, which gives a slightly more workable view of the data. Having an extra column per day makes things difficult.

    https://data.world/shad/covid-analysis/workspace/file?datasetid=covid-19-time-series-data&filename=confirmed_pivot
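    The complaint about one column per day points to a wide-to-long reshape. The sketch below shows the general idea in Python, assuming a layout in which every column that is not an identifier holds one day's count. The column names in the demo, the `confirmed` output field and the `wide_to_long` helper are illustrative assumptions; the actual layout of confirmed_pivot.csv is not shown on this page.

```python
import csv
import io


def wide_to_long(handle, id_columns):
    """Turn one-column-per-day rows into one record per (identifier, date)."""
    reader = csv.DictReader(handle)
    date_columns = [name for name in reader.fieldnames if name not in id_columns]
    long_rows = []
    for record in reader:
        ids = {name: record[name] for name in id_columns}
        for day in date_columns:
            if record[day] == "":
                continue  # no count reported for this day
            long_rows.append({**ids, "date": day, "confirmed": int(record[day])})
    return long_rows


# Invented two-region sample in a wide layout.
sample = "Country/Region,1/22/20,1/23/20\nA,0,2\nB,1,\n"
rows = wide_to_long(io.StringIO(sample), ["Country/Region"])
for row in rows:
    print(row)
```

    In the long form, adding a new day appends rows rather than columns, so downstream queries do not need to change as the series grows.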

  9. TimeSeries Weather Dataset

    • kaggle.com
    zip
    Updated Jun 8, 2024
    Cite
    Parth (2024). TimeSeries Weather Dataset [Dataset]. https://www.kaggle.com/datasets/parthdande/timeseries-weather-dataset
    Available download formats: zip (11919419 bytes)
    Dataset updated
    Jun 8, 2024
    Authors
    Parth
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains historical weather data for 2 different places. It features parameters such as temperature, humidity, dew point, precipitation, pressure, cloud cover, vapor pressure deficit, wind speed, and wind direction.

  10. Timeseries-PILE

    • huggingface.co
    Updated May 11, 2024
    Cite
    Auton Lab (2024). Timeseries-PILE [Dataset]. https://huggingface.co/datasets/AutonLab/Timeseries-PILE
    Dataset updated
    May 11, 2024
    Dataset authored and provided by
    Auton Lab
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Time Series PILE

    The Time-series Pile is a large collection of publicly available data from diverse domains, ranging from healthcare to engineering and finance. It comprises over 5 public time-series databases from several diverse domains, for time series foundation model pre-training and evaluation.

      Time Series PILE Description
    

    We compiled a large collection of publicly available datasets from diverse domains into the Time Series Pile. It has 13 unique domains of data… See the full description on the dataset page: https://huggingface.co/datasets/AutonLab/Timeseries-PILE.

  11. Controlled Anomalies Time Series (CATS) Dataset

    • data.niaid.nih.gov
    Updated Jul 11, 2024
    Cite
    Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7646896
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Solenix Engineering GmbH
    Authors
    Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

    The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

    Multivariate (17 variables), including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system including:

    4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.

    3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.

    10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.

    5 million timestamps. Sensor readings are sampled at 1 Hz.

    1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour.

    4 million observations that include both nominal and anomalous segments. This is suitable to evaluate both semi-supervised approaches (novelty detection) as well as unsupervised approaches (outlier detection).

    200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.

    Different types of anomalies to understand what anomaly types can be detected by different approaches. The categories are available in the dataset and in the metadata.

    Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end time of the anomalous behaviour is known very precisely. In contrast to real world datasets, there is no risk that the ground truth contains mislabelled segments which is often the case for real data.

    Suitable for root cause analysis. In addition to the anomaly category, the time series channel in which the anomaly first developed is recorded and made available as part of the metadata. This can be useful to evaluate the performance of algorithms in tracing anomalies back to the right root cause channel.

    Affected channels. In addition to the knowledge of the root cause channel in which the anomaly first developed itself, we provide information of channels possibly affected by the anomaly. This can also be useful to evaluate the explainability of anomaly detection systems which may point out to the anomalous channels (root cause and affected).

    Obvious anomalies. The simulated anomalies have been designed to be "easy" for the human eye to detect (i.e., there are very large spikes or oscillations), and hence also detectable by most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting those obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable also for regular benchmark studies.

    Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example consists of a light and switch pair. The light being either on or off is nominal, the same goes for the switch, but having the switch on and the light off shall be considered anomalous. In the CATS dataset, users can choose (or not) to use the available context, and external stimuli, to test the usefulness of the context for detecting anomalies in this simulation.

    Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage since users of the dataset can decide to add on top of the provided series any type of noise and choose an amplitude. This makes it well suited to test how sensitive and robust detection algorithms are against various levels of noise.

    No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
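    The properties above suggest a simple evaluation workflow: fit on the nominal prefix, evaluate on the remainder, and optionally degrade the clean signals with noise or dropped values. The sketch below illustrates this with the Python standard library. The function names are hypothetical, Gaussian noise is just one possible choice, and the default split index of 1,000,000 follows the description rather than any official loader.

```python
import random


def split_nominal(rows, n_nominal=1_000_000):
    """Split into the nominal-only prefix and the mixed evaluation remainder."""
    return rows[:n_nominal], rows[n_nominal:]


def add_gaussian_noise(values, sigma, seed=0):
    """Add zero-mean Gaussian noise of standard deviation sigma to a clean signal."""
    rng = random.Random(seed)
    return [value + rng.gauss(0.0, sigma) for value in values]


def drop_at_random(values, fraction, seed=0):
    """Replace roughly the given fraction of values with None to mimic missing data."""
    rng = random.Random(seed)
    return [None if rng.random() < fraction else value for value in values]


# Toy stand-in for one telemetry channel (ten samples, first four nominal).
channel = [float(i) for i in range(10)]
train, evaluation = split_nominal(channel, n_nominal=4)
noisy = add_gaussian_noise(evaluation, sigma=0.1, seed=1)
sparse = drop_at_random(evaluation, fraction=0.2, seed=1)
print(len(train), len(evaluation))  # 4 6
```

    Fixing the seed keeps a noise or dropout experiment repeatable, so detectors can be compared against the same degraded series and against the clean baseline.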

    Change Log

    Version 2

    Metadata: we include a metadata.csv with information about:

    Anomaly categories

    Root cause channel (signal in which the anomaly is first visible)

    Affected channel (signal to which the anomaly might propagate through coupled system dynamics)

    Removal of anomaly overlaps: version 1 contained anomalies which overlapped with each other resulting in only 190 distinct anomalous segments. Now, there are no more anomaly overlaps.

    Two data files: CSV and parquet for convenience.

    [1] Example Benchmark of Anomaly Detection in Time Series: "Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602"

    About Solenix

    Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.

  12. Data from: Predicting spatial-temporal patterns of diet quality and large...

    • agdatacommons.nal.usda.gov
    docx
    Updated Nov 21, 2025
    Cite
    Sean Kearney; Lauren M. Porensky; David J. Augustine; Justin D. Derner; Feng Gao (2025). Data from: Predicting spatial-temporal patterns of diet quality and large herbivore performance using satellite time series [Dataset]. http://doi.org/10.15482/USDA.ADC/1522609
    Available download formats: docx
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Sean Kearney; Lauren M. Porensky; David J. Augustine; Justin D. Derner; Feng Gao
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Analysis-ready tabular data from "Predicting spatial-temporal patterns of diet quality and large herbivore performance using satellite time series" in Ecological Applications, Kearney et al., 2021. The data are tabular only, summarized to the pasture scale. Weight gain data for individual cattle and the STARFM-derived Landsat-MODIS fusion imagery can be made available upon request.

    Resources in this dataset:

    Resource Title: Metadata - CSV column names, units and descriptions.
    File Name: Kearney_et_al_ECOLAPPL_Patterns of herbivore - metada.docx
    Resource Description: Column names, units and descriptions for all CSV files in this dataset.

    Resource Title: Fecal quality data.
    File Name: Kearney_etal2021_Patterns_of_herbivore_Data_FQ_cln.csv
    Resource Description: Field-sampled fecal quality (CP = crude protein; DOM = digestible organic matter) data and phenology-related APAR metrics derived from 30 m daily Landsat-MODIS fusion satellite imagery. All data are paddock-scale averages; the paddock is the spatial scale of replication and the week is the temporal scale of replication. Fecal samples were collected by USDA-ARS staff from 3-5 animals per paddock (10% - 25% of animals in each herd) weekly during each grazing season from 2014 to 2019 across 10 different paddocks at the Central Plains Experimental Range (CPER) near Nunn, CO. Samples were analyzed at the Grazingland Animal Nutrition Lab (GANlab, https://cnrit.tamu.edu/index.php/ganlab/) using near infrared spectroscopy (see Lyons & Stuth, 1992; Lyons, Stuth, & Angerer, 1995). Not every herd was sampled every week or every year, resulting in a total of 199 samples. Samples represent all available data at the CPER during the study period and were collected for different research and adaptive management objectives, but following the basic protocol described above. APAR metrics were derived from the paddock-scale APAR daily time series (all paddock pixels averaged daily to create a single paddock-scale time series). All APAR metrics are calculated for the week that corresponds to the week that fecal quality samples were collected in the field. See Section 2.2.4 of the corresponding manuscript for a complete description of the APAR metrics.

    Resource Title: Monthly ADG.
    File Name: Kearney_etal2021_Patterns_of_herbivore_Data_ADG_monthly_cln.csv
    Resource Description: Monthly average daily gain (ADG) of cattle weights at the paddock scale and the three satellite-derived metrics used to build the regression model to predict ADG: crude protein (CP), digestible organic matter (DOM) and aboveground net herbaceous production (ANHP). The data table also includes stocking rate (animal units per hectare), used as an interaction term in the ADG regression model, and all associated data to derive each of these variables (e.g., sampling start and end dates, 30 m daily Landsat-MODIS fusion satellite imagery-derived APAR metrics, cattle weights, etc.). We calculated paddock-scale average daily gain (ADG, kg hd-1 day-1) from 2000-2019 for yearlings weighed approximately every 28 days during the grazing season across 6 different paddocks with stocking densities of 0.08-0.27 animal units (AU) ha-1, where one AU is equivalent to a 454 kg animal. It is worth noting that AUs change as a function of both the number of cattle within a paddock and the size of individual animals, the latter of which changes within a single grazing season. This becomes important to consider when using sub-seasonal weight data for fast-growing yearlings. For paddock-scale ADG, we first calculated ADG for each individual yearling as the difference between the weights obtained at the end and beginning of each period, divided by the number of days in each period, and then averaged for all individuals in the paddock. We excluded data from 2013 due to data collection inconsistencies. We note that most of the monthly weight data (97%) is from 3 paddocks where cattle were weighed every year, whereas in the other 3 paddocks, monthly weights were only measured during 2017-2019. Apart from the 2013 data, which were not comparable to data from other years, the data represents all available weight gain data for CPER to maximize spatial-temporal coverage and avoid potential bias from subjective decisions to subset the data. Data may have been collected for different projects at different times, but was collected in a consistent way. This resulted in 269 paddock-scale estimates of monthly ADG, with robust temporal, but limited spatial, coverage. CP and DOM were estimated from a random forest model trained from the five APAR metrics: rAPAR, dAPAR, tPeak, iAPAR and iAPAR-dry (see manuscript Section 2.3 for description). APAR metrics were derived from the paddock-scale APAR daily time series (all paddock pixels averaged daily to create a single paddock-scale time series). All APAR metrics are calculated as the average of the approximately 28-day period that corresponds to the ADG calculation. See Section 2.2.4 of the manuscript for a complete description of the APAR metrics. ANHP was estimated from a linear regression model developed by Gaffney et al. (2018) to calculate net aboveground herbaceous productivity (ANHP; kg ha-1) from iAPAR. We averaged the coefficients of 4 spatial models (2013-2016) developed by Gaffney et al. (2018), resulting in the following equation: ANHP = -26.47 + 2.07(iAPAR). We first calculated ANHP for each day of the grazing season at the paddock scale, and then took the average ANHP for the 28-day period.
    REFERENCES: Gaffney, R., Porensky, L. M., Gao, F., Irisarri, J. G., Durante, M., Derner, J. D., & Augustine, D. J. (2018). Using APAR to predict aboveground plant productivity in semi-arid rangelands: Spatial and temporal relationships differ. Remote Sensing, 10(9). doi: 10.3390/rs10091474

    Resource Title: Season-long ADG.
    File Name: Kearney_etal2021_Patterns_of_herbivore_Data_ADG_seasonal_cln.csv
    Resource Description: Season-long observed and model-predicted average daily gain (ADG) of cattle weights at the paddock scale. Also includes two variables used to analyze patterns in model residuals: percent sand content and season-long aboveground net herbaceous production (ANHP). We calculated observed paddock-scale ADG for the entire grazing season from 2010-2019 (excluding 2013 due to data collection inconsistencies) by averaging seasonal ADG of each yearling, determined as the difference between the end and starting weights divided by the number of days in the grazing season. This dataset was available for 40 paddocks spanning a range of soil types, plant communities, and topographic positions. Data may have been collected for different projects at different times, but was collected in a consistent way. We note that there was spatial overlap among a small number of paddock boundaries across different years since some fence lines were moved in 2012 and 2014. Model-predicted paddock-scale ADG was derived using the monthly ADG regression model described in Sections 2.3.3 and 2.3.4 of the associated manuscript. In short, we predicted season-long cattle weight gains by first predicting daily weight gain for each day of the grazing season from the monthly regression model using a 28-day moving average of model inputs (CP, DOM and ANHP). We calculated the final ADG for the entire grazing season as the average predicted ADG, starting 28 days into the growing season. Percent sand content was obtained as the paddock-scale average of POLARIS sand content in the upper 0-30 cm. ANHP was calculated on the last day of the grazing season using a linear regression model developed by Gaffney et al. (2018) to calculate net aboveground herbaceous productivity (ANHP; kg ha-1) from satellite-derived integrated absorbed photosynthetically active radiation (iAPAR) (see Section 3.1.2 of the associated manuscript). We averaged the coefficients of 4 spatial models (2013-2016) developed by Gaffney et al. (2018), resulting in the following equation: ANHP = -26.47 + 2.07(iAPAR).
    REFERENCES: Gaffney, R., Porensky, L. M., Gao, F., Irisarri, J. G., Durante, M., Derner, J. D., & Augustine, D. J. (2018). Using APAR to predict aboveground plant productivity in semi-arid rangelands: Spatial and temporal relationships differ. Remote Sensing, 10(9). doi: 10.3390/rs10091474
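    The two calculations described above, the ANHP equation and paddock-scale ADG, are simple enough to write out. The Python sketch below is a direct transcription of the stated formulas, not the authors' code; the function names and the example weights are invented for illustration.

```python
def anhp_from_iapar(iapar):
    """ANHP (kg ha-1) from iAPAR, using the averaged coefficients quoted above."""
    return -26.47 + 2.07 * iapar


def average_daily_gain(start_kg, end_kg, days):
    """ADG for one animal: weight change over the period divided by its length in days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (end_kg - start_kg) / days


def paddock_adg(animals):
    """Paddock-scale ADG: the mean of the individual ADG values.

    `animals` is an iterable of (start_kg, end_kg, days) tuples.
    """
    gains = [average_daily_gain(start, end, days) for start, end, days in animals]
    if not gains:
        raise ValueError("at least one animal is required")
    return sum(gains) / len(gains)


# Invented weights for two yearlings over a 28-day period.
herd = [(300.0, 328.0, 28), (310.0, 324.0, 28)]
print(paddock_adg(herd))  # 0.75
print(anhp_from_iapar(100.0))
```

    Note the order of operations in the description: ADG is computed per animal first and then averaged, rather than being computed from the paddock's mean start and end weights.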

  13. Network traffic datasets created by Single Flow Time Series Analysis

    • zenodo.org
    • data.niaid.nih.gov
    csv, pdf
    Updated Jul 11, 2024
    Cite
    Josef Koumar; Karel Hynek; Tomáš Čejka (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. http://doi.org/10.5281/zenodo.8035724
    Available download formats: csv, pdf
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Josef Koumar; Karel Hynek; Tomáš Čejka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network traffic datasets created by Single Flow Time Series Analysis

    Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:

    J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

    This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf

    The following table describes each dataset file:

    File name | Detection problem | Citation of original raw dataset
    botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
    doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
    doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
    dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
    edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
    ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23
    ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
    vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022

  14. Example Timeseries -- Drosophila midgut, minimal data

    • figshare.com
    zip
    Updated Jun 4, 2023
    Cite
    Noah Mitchell; Dillon Cislo (2023). Example Timeseries -- Drosophila midgut, minimal data [Dataset]. http://doi.org/10.6084/m9.figshare.20733091.v2
    Available download formats: zip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Noah Mitchell; Dillon Cislo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Minimal ingredients for running example TubULAR pipeline for timeseries data, from surface extraction to kinematic analysis. See also Example Timeseries -- Drosophila midgut, analyzed data for an expanded package with example analysis.

    This dataset is a downsampled and clipped version of data investigated in: N. P. Mitchell, D. J. Cislo, S. Shankar, Y. Lin, B. I. Shraiman, S. J. Streichan, “Visceral organ morphogenesis via calcium-patterned muscle contractions.” eLife 11:e77355 (2022).

    Support for the development of the TubULAR analysis codebase was provided by NSF Grant No. PHY-2047140.

  15. Population Estimates Time Series Data

    • dtechtive.com
    • find.data.gov.scot
    Updated Mar 27, 2011
    Cite
    National Records of Scotland (2011). Population Estimates Time Series Data [Dataset]. https://dtechtive.com/datasets/3616
    Dataset updated
    Mar 27, 2011
    Dataset provided by
    National Records of Scotland
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Scotland
    Description

    Over time, statistical outputs (and time series data) may be subject to revisions or corrections. Revisions are generally planned and result from either improvements in statistical methods or the availability of additional data. For example, the annual mid-year population estimates are revised after a census to take account of the additional information gained from the census results. Details of planned revisions are held within the Metadata alongside each publication. Corrections are unplanned and occur when errors in either the statistical data or methodology are found after release of the data. The latest correction to these datasets was in September 2018; for more information, please see the revisions and corrections page. This time series section provides access to the latest time series data, taking into account any revisions or corrections over the years. Note: Tables are mainly offered for the purposes of extracting figures. Due to the size of some of the sheets, they are not recommended for printing.

  16. Example timeseries -- zebrafish heart, analyzed data

    • figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Noah Mitchell; Dillon Cislo (2023). Example timeseries -- zebrafish heart, analyzed data [Dataset]. http://doi.org/10.6084/m9.figshare.20670105.v2
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Noah Mitchell; Dillon Cislo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of a beating zebrafish heart dataset using TubULAR. The data was acquired by Sebastian Streichan and Michael Liebling at UC Santa Barbara.

    Financial support provided via Streichan Lab under NSF Grant No. PHY-2047140

    This data was acquired using the techniques described in: K. G. Chan, S. J. Streichan, L. A. Trinh and M. Liebling, "Simultaneous Temporal Superresolution and Denoising for Cardiac Fluorescence Microscopy," in IEEE Transactions on Computational Imaging, vol. 2, no. 3, pp. 348-358, Sept. 2016, doi: 10.1109/TCI.2016.2579606.

  17. Example Stata syntax and data construction for negative binomial time series...

    • data.mendeley.com
    Updated Nov 2, 2022
    Cite
    Sarah Price (2022). Example Stata syntax and data construction for negative binomial time series regression [Dataset]. http://doi.org/10.17632/3mj526hgzx.2
    Dataset updated
    Nov 2, 2022
    Authors
    Sarah Price
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).

    The variables contained therein are defined as follows:

    case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).

    patid: a unique patient identifier.

    time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.

    ncons: number of consultations per month.

    period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.

    burden: binary variable denoting membership of one of two multimorbidity burden groups.

    We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).

    Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
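
    The panel layout described by the variable list above can be sketched as follows. This is a minimal Python illustration, not the authors' Stata code in dummy_dataset_create.do: the function name `make_dummy_panel`, the number of patients, and the random values are assumptions made for demonstration, and the period0 to period9 variables are omitted because their construction is not described here.

```python
import random


def make_dummy_panel(n_patients=4, n_periods=10, seed=0):
    """Build a long-format panel: one row per patient per monthly time period."""
    rng = random.Random(seed)
    rows = []
    for patid in range(1, n_patients + 1):
        case = patid % 2            # alternate controls (0) and cases (1); assumption
        burden = rng.randint(0, 1)  # one of two multimorbidity burden groups
        # time_period 0 is the earliest month, n_periods - 1 the month of diagnosis
        for time_period in range(n_periods):
            rows.append({
                "patid": patid,
                "case": case,
                "burden": burden,
                "time_period": time_period,
                "ncons": rng.randint(0, 5),  # synthetic monthly consultation count
            })
    return rows


panel = make_dummy_panel()
print(len(panel), panel[0])
```

    Each patient contributes exactly one row per month, which is the long format that count regression on repeated measures usually expects.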

  18. Bayesian Modeling of Time Series Data (BayModTS)

    • darus.uni-stuttgart.de
    Updated Jun 4, 2024
    Cite
    Sebastian Höpfl (2024). Bayesian Modeling of Time Series Data (BayModTS) [Dataset]. http://doi.org/10.18419/DARUS-3876
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    DaRUS
    Authors
    Sebastian Höpfl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    BayModTS is a FAIR workflow for processing highly variable and sparse data. The code and results of the examples in the BayModTS paper are stored in this repository. A maintained version of BayModTS that can be applied to your own applications can be found on GitHub.

  19. Sport Activity Dataset - MTS-5

    • kaggle.com
    zip
    Updated Jul 13, 2023
    Cite
    Jarno Matarmaa (2023). Sport Activity Dataset - MTS-5 [Dataset]. https://www.kaggle.com/datasets/jarnomatarmaa/sportdata-mts-5
    Available download formats: zip (498699 bytes)
    Dataset updated
    Jul 13, 2023
    Authors
    Jarno Matarmaa
    License

    https://ec.europa.eu/info/legal-notice_en

    Description

    The dataset contains recordings in five categories: walking, running, biking, skiing, and roller skiing. The sport activities were recorded by one active (non-competitive) athlete. The data is pre-processed, standardized, and split into four parts (each dimension in its own file):

    * HR-DATA_std_1140x69 (heart rate signals)
    * SPD-DATA_std_1140x69 (speed signals)
    * ALT-DATA_std_1140x69 (altitude signals)
    * META-DATA_1140x4 (labels and details)

    NOTE: The signal order must stay consistent across the separate files when processing the data. The first row in each file comes from the same activity, whose label is the first row of the target data file, and so on. The files should therefore be combined into one table as they are read, ideally using a nested data structure like the one in the picture below.

    You may check the related TSC projects on GitHub:
    - Sport Activity Classification Using Classical Machine Learning and Time Series Methods (https://github.com/JABE22/MasterProject)
    - Symbolic Representation of Multivariate Time Series Signals in Sport Activity Classification
    - Kaggle Project

    [Figure: Nested data structure for multivariate time series classifiers] (https://mediauploads.data.world/e1ccd4d36522e04c0061d12d05a87407bec80716f6fe7301991eaaccd577baa8_mts_data.png)
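
    The row alignment described in the note above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `combine_signals` and the record layout are hypothetical, the inputs are assumed to be already loaded as lists of rows, and the tiny values stand in for the real 1140 x 69 tables.

```python
def combine_signals(hr, spd, alt, meta):
    """Zip row-aligned signal tables into one nested record per activity segment."""
    if not (len(hr) == len(spd) == len(alt) == len(meta)):
        raise ValueError("all four tables must have the same number of rows")
    # Row i of every table belongs to the same segment, so zip keeps them together.
    return [
        {"meta": m, "signals": {"hr": h, "spd": s, "alt": a}}
        for h, s, a, m in zip(hr, spd, alt, meta)
    ]


# Synthetic stand-in data: 2 segments with 3 samples each.
hr = [[0.1, 0.2, 0.3], [1.1, 1.2, 1.3]]
spd = [[2.1, 2.2, 2.3], [3.1, 3.2, 3.3]]
alt = [[4.1, 4.2, 4.3], [5.1, 5.2, 5.3]]
meta = [["walking"], ["running"]]

records = combine_signals(hr, spd, alt, meta)
print(records[1]["meta"], records[1]["signals"]["spd"])
```

    Building the nested records at read time means any later shuffling or train/test split moves the three signals and their label together.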

    The following picture shows five signal samples for each dimension (heart rate, speed, altitude) in standardized feature-value format. Each figure contains signals from five random activities, which may belong to the same or different categories. Signal index 1 in each of the three figures, for example, comes from the same activity. The figures only illustrate the kind of signals in the dataset and have no further meaning.

    [Figure: Signals from sport activities (Heart Rate, Speed, and Altitude)] (https://mediauploads.data.world/162b7086448d8dbd202d282014bcf12bd95bd3174b41c770aa1044bab22ad655_signal_samples.png)

    Dataset size and construction procedure

    The original number of sport activities is 228. From each of them, starting at index 100 (seconds), 5 consecutive segments of 69 seconds each were picked, as expressed in the formula below:

    [Figure: Data segmentation and augmentation formula] (https://mediauploads.data.world/68ce83092ec65f6fbaee90e5de6e12df40498e08fa6725c111f1205835c1a842_segment_equation.png)

    where D = original filtered data, N = number of activities, s = segment start index, l = segment length, and n = the number of segments from a single original sequence D_i, resulting in the new set of equal-length segments D_seg. In this particular case, the equation takes the form of:

    [Figure: Data segmentation and augmentation formula with values] (https://mediauploads.data.world/63dd87bf3d0010923ad05a8286224526e241b17bbbce790133030d8e73f3d3a7_data_segmentation_formula.png)

    Thus, the dataset has dimensions of 1140 x 69 x 3.
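
    The segmentation step can be sketched in Python as a check on the arithmetic (228 activities x 5 segments = 1140). This is an illustration, not the author's code: the function name `segment` is hypothetical, indexing is assumed to be 0-based, and raising an error for sequences that are too short is an assumption, since the description does not say how such sequences were handled.

```python
def segment(sequences, start=100, length=69, count=5):
    """Cut `count` consecutive windows of `length` samples from each sequence."""
    segments = []
    for seq in sequences:
        if len(seq) < start + count * length:
            raise ValueError("sequence too short for the requested segments")
        for k in range(count):
            lo = start + k * length  # window k begins right after window k - 1
            segments.append(seq[lo:lo + length])
    return segments


# Synthetic stand-in: 228 activities of 500 one-second samples each.
activities = [list(range(500)) for _ in range(228)]
segs = segment(activities)
print(len(segs), len(segs[0]))
```

    Applying the same function to the heart rate, speed, and altitude recordings would give the three 1140 x 69 tables.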

    Additional information

    The data was recorded without knowing that it would be used in research. It therefore represents a real-world data source well and can serve as an excellent tool for testing algorithms on real data.

    Recording devices

    Data was recorded using two types of Garmin devices: the Forerunner 920XT and the vivosport. The vivosport is an activity tracker that measures heart rate from the wrist using an optical sensor, whereas the 920XT requires an external sensor belt (heart rate + inertial) worn under the chest during exercise. Otherwise the devices are not essentially different: they use GPS location to measure speed and an inertial barometer to measure elevation changes.

    Device manuals - Garmin FR-920XT - Garmin Vivosport

    Person profile

    Age: 30-31, Weight: 82, Length: 181, Active athlete (non-competitive)

  20. 18S Monterey Bay Time Series: an eDNA data set from Monterey Bay,...

    • gbif.org
    • demo.gbif.org
    • +2more
    Updated Sep 24, 2025
    Cite
    Francisco Chavez; Kathleen Pitz; Francisco Chavez; Kathleen Pitz (2025). 18S Monterey Bay Time Series: an eDNA data set from Monterey Bay, California, including years 2006, 2013 - 2016 [Dataset]. http://doi.org/10.15468/84ntea
    Dataset updated
    Sep 24, 2025
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    NOAA Integrated Ocean Observing System
    Authors
    Francisco Chavez; Kathleen Pitz; Francisco Chavez; Kathleen Pitz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Monterey Bay
    Description

    These data are from marine filtered seawater samples collected at a nearshore station in Monterey Bay, CA. They have undergone metabarcoding for the 18S V9 region. A selection of samples from this plate were included in the publication "Environmental DNA reveals seasonal shifts and potential interactions in a marine community" (Djurhuus et al., 2020). Samples were collected by CTD rosette and filtered by a peristaltic pump system.

    Illumina MiSeq metabarcoding data was processed in the following steps: (1) primer sequences were removed through atropos (Didion et al., 2017), (2) reads were denoised, ASV sequences inferred, paired reads merged and chimeras removed through Dada2 (Callahan et al., 2016), (3) taxonomic ranks were assigned through blastn searches to NCBI GenBank's non-redundant nucleotide database (nt) with hits filtered by lowest common ancestor algorithm within MEGAN6 (Huson et al., 2016). Furthermore, post-MEGAN6 filtering was performed to ensure only contigs with a hit of ≥97% sequence identity were annotated to the species level and only contigs with a hit of ≥95% sequence identity were annotated to the genus level. Annotations were elevated to the next highest taxonomic level for contigs that failed these conditions.
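
    The post-MEGAN6 identity thresholds can be illustrated with a small Python sketch. It is not the pipeline's actual code: the function name `lowest_allowed_rank` is hypothetical, and "above genus" is a placeholder for whichever higher rank an annotation is elevated to.

```python
def lowest_allowed_rank(identity_pct):
    """Return the most specific rank permitted for a hit's percent identity."""
    if identity_pct >= 97.0:
        return "species"
    if identity_pct >= 95.0:
        return "genus"
    # Below both thresholds the annotation is elevated past the genus level.
    return "above genus"


for pct in (99.2, 96.0, 90.5):
    print(pct, lowest_allowed_rank(pct))
```

    A hit at 96% identity, for instance, could keep a genus-level annotation, but a species-level one would be elevated to genus.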

    Data are presented in two comma-separated values files: occurrence.csv, and DNADerivedData.csv. The former contains the taxonomic identification of each ASV observed and its number of reads, in addition to relevant metadata including the location the water sample was taken, references for the identification procedure, and links to archived sequences. The latter contains the DNA sequence of each ASV observed, in addition to relevant metadata including primer information and links to detailed field and laboratory methods. This data set was transformed from its native format into a table structure using Darwin Core and DNA Derived Data Extension term names as column names.

    References:

    Djurhuus, A, Closek, CJ, Kelly, RP et al. (2020). Environmental DNA reveals seasonal shifts and potential interactions in a marine community. Nat Commun 11, 254. https://doi.org/10.1038/s41467-019-14105-1

    Didion JP, Martin M, Collins FS. (2017) Atropos: specific, sensitive, and speedy trimming of sequencing reads. PeerJ 5:e3720 https://doi.org/10.7717/peerj.3720

    Callahan, B., McMurdie, P., Rosen, M. et al. (2016) DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods 13, 581–583. https://doi.org/10.1038/nmeth.3869

    Huson DH, Beier S, Flade I, Górska A, El-Hadidi M, Mitra S, Ruscheweyh HJ, Tappu R. (2016) MEGAN community edition-interactive exploration and analysis of large-scale microbiome sequencing data. PLoS computational biology. Jun 21;12(6):e1004957.
