100+ datasets found
  1. 1000 Empirical Time series

    • figshare.com
    • bridges.monash.edu
    • +1more
    png
    Updated May 30, 2023
    + more versions
    Cite
    Ben Fulcher (2023). 1000 Empirical Time series [Dataset]. http://doi.org/10.6084/m9.figshare.5436136.v10
    Explore at:
    Available download formats: png
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ben Fulcher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A diverse selection of 1000 empirical time series, along with the results of an hctsa feature extraction using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.

    The results of the computation are in the hctsa file HCTSA_Empirical1000.mat, for use in Matlab with v1.06 of hctsa. The same data are also provided in .csv format: hctsa_datamatrix.csv (results of feature computation), with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and the corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv); the data of the individual time series (one time series per line, as described in hctsa_timeseries-info.csv) are in hctsa_timeseries-data.csv. These .csv files were produced by running >> OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa.

    The input file, INP_Empirical1000.mat, is for use with hctsa and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine can proceed as >> TS_Init('INP_Empirical1000.mat');

    Some visualizations of the dataset are in CarpetPlot.png (the first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be produced with TS_PlotTimeSeries from the hctsa package. See the links in the references for more comprehensive documentation on performing methodological comparison using this dataset and on how to download and use v1.06 of hctsa.
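    For users working outside Matlab, a minimal pandas sketch for loading the .csv exports could look like the following (it assumes the files above sit in the working directory; whether the data matrix carries a header row should be verified against the export):

```python
# Sketch: loading the hctsa .csv exports with pandas (file names from the
# listing above; header=None for the data matrix is an assumption).
import pandas as pd

X = pd.read_csv("hctsa_datamatrix.csv", header=None)   # rows: time series, cols: features
ts_info = pd.read_csv("hctsa_timeseries-info.csv")     # metadata for the rows
features = pd.read_csv("hctsa_features.csv")           # metadata for the columns

print(X.shape)        # expected: 1000 rows, one per time series
print(ts_info.head())
print(features.head())
```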

  2. Sample Time Series Data

    • kaggle.com
    zip
    Updated Feb 12, 2022
    Cite
    Tan Phan (2022). Sample Time Series Data [Dataset]. https://www.kaggle.com/datasets/phanttan/sample-time-series-data
    Explore at:
    Available download formats: zip (15113 bytes)
    Dataset updated
    Feb 12, 2022
    Authors
    Tan Phan
    Description

    Dataset

    This dataset was created by Tan Phan

  3. Rainfall Dataset for Simple Time Series Analysis

    • kaggle.com
    zip
    Updated Apr 20, 2024
    Cite
    Sujith K Mandala (2024). Rainfall Dataset for Simple Time Series Analysis [Dataset]. https://www.kaggle.com/datasets/sujithmandala/rainfall-dataset-for-simple-time-series-analysis
    Explore at:
    Available download formats: zip (684 bytes)
    Dataset updated
    Apr 20, 2024
    Authors
    Sujith K Mandala
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains daily rainfall measurements (in millimeters) for part of 2022. The data span January 1, 2022, to July 3, 2022, covering a total of 184 days. The dataset can be used for various machine learning tasks, such as time series forecasting, pattern recognition, or anomaly detection related to rainfall patterns.

    Column Descriptors:

    date (date): The date of the rainfall measurement, in the format YYYY-MM-DD. Example: 2022-01-01.
    rainfall (float): The amount of rainfall recorded on the corresponding date, measured in millimeters (mm). Example: 12.5. Range: 0.0 mm (no rainfall) to 22.4 mm (the maximum recorded value in the dataset). Missing values: none.
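    A minimal loading-and-aggregation sketch in Python (the file name "rainfall.csv" is an assumption; the column names follow the descriptors above):

```python
# Sketch: loading and aggregating the daily rainfall series.
import pandas as pd

df = pd.read_csv("rainfall.csv", parse_dates=["date"]).set_index("date").sort_index()

# Sanity checks against the stated properties: 184 daily rows, 0.0-22.4 mm.
print(len(df), df["rainfall"].min(), df["rainfall"].max())

# Example aggregation: weekly rainfall totals, a common forecasting target.
weekly = df["rainfall"].resample("W").sum()
print(weekly.head())
```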

  4. Time series

    • data.open-power-system-data.org
    csv, sqlite, xlsx
    Updated Oct 6, 2020
    + more versions
    Cite
    Jonathan Muehlenpfordt (2020). Time series [Dataset]. http://doi.org/10.25832/time_series/2020-10-06
    Explore at:
    Available download formats: csv, sqlite, xlsx
    Dataset updated
    Oct 6, 2020
    Dataset provided by
    Open Power System Data
    Authors
    Jonathan Muehlenpfordt
    Time period covered
    Jan 1, 2015 - Oct 1, 2020
    Variables measured
    utc_timestamp, DE_wind_profile, DE_solar_profile, DE_wind_capacity, DK_wind_capacity, SE_wind_capacity, CH_solar_capacity, DE_solar_capacity, DK_solar_capacity, AT_price_day_ahead, and 290 more
    Description

    Load, wind and solar, prices in hourly resolution. This data package contains different kinds of time series data relevant for power system modelling, namely electricity prices, electricity consumption (load), and wind and solar power generation and capacities. The data are aggregated either by country, control area, or bidding zone. Geographical coverage includes the EU and some neighbouring countries. All variables are provided in hourly resolution. Where original data are available in higher resolution (half-hourly or quarter-hourly), they are provided in separate files. This package version only contains data provided by TSOs and power exchanges via ENTSO-E Transparency, covering the period 2015 to mid-2020. See previous versions for historical data from a broader range of sources. All data processing is conducted in Python/pandas and is documented in the Jupyter notebooks linked below.
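    A minimal sketch for pulling a few of the documented variables into pandas (the file name follows the package's "singleindex" CSV convention but is an assumption; check the download page; column names are taken from the variable list above):

```python
# Sketch: loading the hourly Open Power System Data package into pandas.
import pandas as pd

ts = pd.read_csv(
    "time_series_60min_singleindex.csv",     # assumed file name
    index_col="utc_timestamp",
    parse_dates=["utc_timestamp"],
)

cols = ["DE_wind_profile", "DE_solar_profile", "AT_price_day_ahead"]
print(ts[cols].loc["2019-01"].describe())    # one month of three variables
```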

  5. Multivariate Time Series Search - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). Multivariate Time Series Search - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/multivariate-time-series-search
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns in these MTS databases, which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R-tree Based Search (RBS), which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm, which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several million observations. Both tests show that our algorithms have very high prune rates (>95%), requiring actual disk access for less than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never before been attempted on datasets of the size used in this paper.
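    To make the search problem concrete, here is a brute-force baseline sketch: it scans every window exhaustively, which is exactly the work the paper's RBS/LBS index structures avoid through pruning. This is illustrative only, not the paper's algorithm:

```python
# Sketch: naive multivariate subsequence search over a toy MTS database.
import numpy as np

def naive_mts_search(db, query, var_idx):
    """Best-matching start offset of `query` in `db`, restricted to the
    variable subset `var_idx`, under Euclidean distance."""
    w = len(query)
    best, best_pos = np.inf, -1
    for s in range(db.shape[0] - w + 1):      # O(n) windows, no pruning
        d = np.linalg.norm(db[s:s + w, var_idx] - query)
        if d < best:
            best, best_pos = d, s
    return best_pos

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 5))    # toy MTS database: 10k steps, 5 variables
query = db[1234:1284, [0, 2]]        # query on a subset of the variables
print(naive_mts_search(db, query, [0, 2]))   # -> 1234
```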

  6. Santa Fe Time Series Competition Data Set B

    • physionet.org
    • search.datacite.org
    Updated Jan 6, 2000
    Cite
    (2000). Santa Fe Time Series Competition Data Set B [Dataset]. http://doi.org/10.13026/C20W2T
    Explore at:
    Dataset updated
    Jan 6, 2000
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This is a multivariate data set recorded from a patient in the sleep laboratory of the Beth Israel Hospital (now the Beth Israel Deaconess Medical Center) in Boston, Massachusetts. It was extracted from record slp60 of the MIT-BIH Polysomnographic Database and was submitted to the Santa Fe Time Series Competition in 1991 by our group. The data are presented in text form and have been split into two sequential parts. Each line contains simultaneous samples of three parameters; the interval between samples in successive lines is 0.5 seconds (a sampling frequency of 2 Hz). The first column is the heart rate, the second is the chest volume (respiration force), and the third is the blood oxygen concentration (measured by ear oximetry).
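    A minimal sketch for loading the two sequential text files into one frame (the file names b1.txt/b2.txt are an assumption; check the PhysioNet record for the exact names):

```python
# Sketch: reading the three-column, 2 Hz Santa Fe B records with pandas.
import pandas as pd

cols = ["heart_rate", "chest_volume", "blood_oxygen"]
df = pd.concat(
    [pd.read_csv(f, sep=r"\s+", header=None, names=cols) for f in ("b1.txt", "b2.txt")],
    ignore_index=True,
)
df.index = df.index * 0.5   # samples are 0.5 s apart: index = elapsed seconds
print(df.head())
```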

  7. COVID-19 Time Series Data

    • data.world
    • kaggle.com
    csv, zip
    Updated Mar 18, 2025
    Cite
    Shad Reynolds (2025). COVID-19 Time Series Data [Dataset]. https://data.world/shad/covid-19-time-series-data
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Mar 18, 2025
    Authors
    Shad Reynolds
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Area covered
    Description

    This data is synced hourly from https://github.com/CSSEGISandData/COVID-19. All credit is to them.

    Latest Confirmed Cases

    https://data.world/shad/covid-analysis/workspace/query?datasetid=covid-19-time-series-data&queryid=e066701e-fa8d-4c9f-97f8-aab3a6f219a8

    I have also added confirmed_pivot.csv, which gives a slightly more workable view of the data; the raw layout's extra column per day makes things difficult.

    https://data.world/shad/covid-analysis/workspace/file?datasetid=covid-19-time-series-data&filename=confirmed_pivot
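    A common first step with the raw layout is to melt the per-day columns into long format, which is essentially the view confirmed_pivot.csv provides. A minimal sketch (file and column names follow the upstream JHU CSSE repository):

```python
# Sketch: wide (one column per day) -> long format for the JHU confirmed file.
import pandas as pd

wide = pd.read_csv("time_series_covid19_confirmed_global.csv")
long = wide.melt(
    id_vars=["Province/State", "Country/Region", "Lat", "Long"],
    var_name="date",
    value_name="confirmed",
)
long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")
print(long.groupby("date")["confirmed"].sum().tail())   # global totals by day
```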

  8. Controlled Anomalies Time Series (CATS) Dataset

    • data.niaid.nih.gov
    Updated Jul 11, 2024
    + more versions
    Cite
    Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7646896
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Solenix Engineering GmbH
    Authors
    Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

    The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

    Multivariate (17 variables), including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system, including:

    4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.

    3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.

    10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.

    5 million timestamps. Sensor readings are at a 1 Hz sampling frequency.

    1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour.

    4 million observations that include both nominal and anomalous segments. This is suitable for evaluating both semi-supervised approaches (novelty detection) and unsupervised approaches (outlier detection).

    200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.

    Different types of anomalies to understand what anomaly types can be detected by different approaches. The categories are available in the dataset and in the metadata.

    Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end time of the anomalous behaviour is known very precisely. In contrast to real world datasets, there is no risk that the ground truth contains mislabelled segments which is often the case for real data.

    Suitable for root cause analysis. In addition to the anomaly category, the time series channel in which the anomaly first developed is recorded and made available as part of the metadata. This can be useful for evaluating the performance of algorithms in tracing anomalies back to the right root-cause channel.

    Affected channels. In addition to the knowledge of the root-cause channel in which the anomaly first developed, we provide information on channels possibly affected by the anomaly. This can also be useful for evaluating the explainability of anomaly detection systems, which may point to the anomalous channels (root cause and affected).

    Obvious anomalies. The simulated anomalies have been designed to be "easy" for human eyes to detect (i.e., there are very large spikes or oscillations), and hence detectable by most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting those obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable also for regular benchmark studies.

    Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example consists of a light and switch pair. The light being either on or off is nominal, the same goes for the switch, but having the switch on and the light off shall be considered anomalous. In the CATS dataset, users can choose (or not) to use the available context, and external stimuli, to test the usefulness of the context for detecting anomalies in this simulation.

    Pure signal, ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage, since users of the dataset can add any type and amplitude of noise on top of the provided series. This makes it well suited to testing how sensitive and robust detection algorithms are against various levels of noise.

    No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
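    As a concrete illustration of the last two properties, a hedged sketch that injects controlled noise and synthetic gaps on top of a clean channel (the channel here is a synthetic stand-in; real channel names come from the dataset and its metadata):

```python
# Sketch: noise-injection and missing-data experiments on a clean signal.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({"channel": np.sin(np.linspace(0, 60, 5_000))})  # stand-in channel

# Robustness-to-noise study: additive Gaussian noise at a chosen amplitude.
df["channel_noisy"] = df["channel"] + rng.normal(0, 0.1, len(df))

# Missing-data study: mask a random 5% of samples against the clean baseline.
df.loc[rng.random(len(df)) < 0.05, "channel_noisy"] = np.nan
print(df["channel_noisy"].isna().mean())
```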

    Change Log

    Version 2

    Metadata: we include a metadata.csv with information about:

    Anomaly categories

    Root cause channel (signal in which the anomaly is first visible)

    Affected channel (signal into which the anomaly might propagate through coupled system dynamics)

    Removal of anomaly overlaps: version 1 contained anomalies which overlapped with each other resulting in only 190 distinct anomalous segments. Now, there are no more anomaly overlaps.

    Two data files: CSV and parquet for convenience.

    [1] Example Benchmark of Anomaly Detection in Time Series: “Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602”

    About Solenix

    Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.

  9. Air Pollution Forecasting - LSTM Multivariate

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Cite
    Rupak Roy/ Bob (2022). Air Pollution Forecasting - LSTM Multivariate [Dataset]. https://www.kaggle.com/datasets/rupakroy/lstm-datasets-multivariate-univariate
    Explore at:
    Available download formats: zip (454764 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Rupak Roy/ Bob
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    THE MISSION

    The idea behind this dataset is to show how an LSTM architecture can use multiple variables together to improve forecasting accuracy.

    THE CONTENT

    Air Pollution Forecasting: the Air Quality dataset.

    This is a dataset that reports on the weather and the level of pollution each hour for five years at the US embassy in Beijing, China.

    The data includes the date-time, the pollution level (PM2.5 concentration), and weather information including dew point, temperature, pressure, wind direction, wind speed, and the cumulative number of hours of snow and rain. The complete feature list in the raw data is as follows:

    No: row number
    year: year of data in this row
    month: month of data in this row
    day: day of data in this row
    hour: hour of data in this row
    pm2.5: PM2.5 concentration
    DEWP: dew point
    TEMP: temperature
    PRES: pressure
    cbwd: combined wind direction
    Iws: cumulated wind speed
    Is: cumulated hours of snow
    Ir: cumulated hours of rain

    We can use this data to frame a forecasting problem where, given the weather conditions and pollution for prior hours, we forecast the pollution at the next hour.
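    A minimal sketch of that framing (the file name "pollution.csv" is an assumption; column names follow the raw feature list above):

```python
# Sketch: supervised framing -- predict next-hour pm2.5 from current conditions.
import pandas as pd

df = pd.read_csv("pollution.csv")
features = ["pm2.5", "DEWP", "TEMP", "PRES", "Iws", "Is", "Ir"]

X = df[features].iloc[:-1]            # conditions at hour t
y = df["pm2.5"].shift(-1).iloc[:-1]   # pollution at hour t+1 (the target)
print(X.shape, y.shape)
```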

  10. Data from: Predicting spatial-temporal patterns of diet quality and large herbivore performance using satellite time series

    • agdatacommons.nal.usda.gov
    docx
    Updated Nov 21, 2025
    Cite
    Sean Kearney; Lauren M. Porensky; David J. Augustine; Justin D. Derner; Feng Gao (2025). Data from: Predicting spatial-temporal patterns of diet quality and large herbivore performance using satellite time series [Dataset]. http://doi.org/10.15482/USDA.ADC/1522609
    Explore at:
    Available download formats: docx
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Sean Kearney; Lauren M. Porensky; David J. Augustine; Justin D. Derner; Feng Gao
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Analysis-ready tabular data from "Predicting spatial-temporal patterns of diet quality and large herbivore performance using satellite time series" in Ecological Applications, Kearney et al., 2021. Data are tabular only, summarized to the pasture scale. Weight-gain data for individual cattle and the STARFM-derived Landsat-MODIS fusion imagery can be made available upon request.

    Resources in this dataset:

    Resource Title: Metadata - CSV column names, units and descriptions. File Name: Kearney_et_al_ECOLAPPL_Patterns of herbivore - metada.docx. Resource Description: Column names, units and descriptions for all CSV files in this dataset.

    Resource Title: Fecal quality data. File Name: Kearney_etal2021_Patterns_of_herbivore_Data_FQ_cln.csv. Resource Description: Field-sampled fecal quality (CP = crude protein; DOM = digestible organic matter) data and phenology-related APAR metrics derived from 30 m daily Landsat-MODIS fusion satellite imagery. All data are paddock-scale averages; the paddock is the spatial scale of replication and the week is the temporal scale of replication. Fecal samples were collected by USDA-ARS staff from 3-5 animals per paddock (10%-25% of animals in each herd) weekly during each grazing season from 2014 to 2019 across 10 different paddocks at the Central Plains Experimental Range (CPER) near Nunn, CO. Samples were analyzed at the Grazingland Animal Nutrition Lab (GANlab, https://cnrit.tamu.edu/index.php/ganlab/) using near-infrared spectroscopy (see Lyons & Stuth, 1992; Lyons, Stuth, & Angerer, 1995). Not every herd was sampled every week or every year, resulting in a total of 199 samples. Samples represent all available data at the CPER during the study period and were collected for different research and adaptive-management objectives, but following the basic protocol described above. APAR metrics were derived from the paddock-scale APAR daily time series (all paddock pixels averaged daily to create a single paddock-scale time series) and are calculated for the week in which the corresponding fecal quality samples were collected in the field. See Section 2.2.4 of the corresponding manuscript for a complete description of the APAR metrics.

    Resource Title: Monthly ADG. File Name: Kearney_etal2021_Patterns_of_herbivore_Data_ADG_monthly_cln.csv. Resource Description: Monthly average daily gain (ADG) of cattle weights at the paddock scale and the three satellite-derived metrics used to build the regression model to predict ADG: crude protein (CP), digestible organic matter (DOM) and aboveground net herbaceous production (ANHP). The table also includes stocking rate (animal units per hectare), used as an interaction term in the ADG regression model, and all associated data needed to derive each of these variables (e.g., sampling start and end dates, 30 m daily Landsat-MODIS fusion satellite imagery-derived APAR metrics, cattle weights, etc.). We calculated paddock-scale average daily gain (ADG, kg hd-1 day-1) from 2000-2019 for yearlings weighed approximately every 28 days during the grazing season across 6 different paddocks with stocking densities of 0.08-0.27 animal units (AU) ha-1, where one AU is equivalent to a 454 kg animal. It is worth noting that AUs change as a function of both the number of cattle within a paddock and the size of individual animals, the latter of which changes within a single grazing season; this becomes important to consider when using sub-seasonal weight data for fast-growing yearlings.

    For paddock-scale ADG, we first calculated ADG for each individual yearling as the difference between the weights obtained at the end and beginning of each period, divided by the number of days in the period, and then averaged over all individuals in the paddock. We excluded data from 2013 due to data collection inconsistencies. Most of the monthly weight data (97%) come from 3 paddocks where cattle were weighed every year; in the other 3 paddocks, monthly weights were only measured during 2017-2019. Apart from the 2013 data, which were not comparable to data from other years, the data represent all available weight-gain data for CPER, maximizing spatial-temporal coverage and avoiding potential bias from subjective decisions to subset the data. Data may have been collected for different projects at different times, but were collected in a consistent way. This resulted in 269 paddock-scale estimates of monthly ADG, with robust temporal, but limited spatial, coverage. CP and DOM were estimated from a random forest model trained on five APAR metrics: rAPAR, dAPAR, tPeak, iAPAR and iAPAR-dry (see manuscript Section 2.3 for descriptions). APAR metrics were derived from the paddock-scale APAR daily time series (all paddock pixels averaged daily to create a single paddock-scale time series) and are calculated as the average over the approximately 28-day period that corresponds to the ADG calculation. See Section 2.2.4 of the manuscript for a complete description of the APAR metrics. ANHP was estimated from a linear regression model developed by Gaffney et al. (2018) to calculate aboveground net herbaceous productivity (ANHP; kg ha-1) from iAPAR. We averaged the coefficients of 4 spatial models (2013-2016) developed by Gaffney et al. (2018), resulting in the following equation: ANHP = -26.47 + 2.07(iAPAR). We first calculated ANHP for each day of the grazing season at the paddock scale, and then took the average ANHP over the 28-day period.

    Resource Title: Season-long ADG. File Name: Kearney_etal2021_Patterns_of_herbivore_Data_ADG_seasonal_cln.csv. Resource Description: Season-long observed and model-predicted average daily gain (ADG) of cattle weights at the paddock scale, plus two variables used to analyze patterns in model residuals: percent sand content and season-long aboveground net herbaceous production (ANHP). We calculated observed paddock-scale ADG for the entire grazing season from 2010-2019 (excluding 2013 due to data collection inconsistencies) by averaging the seasonal ADG of each yearling, determined as the difference between the end and starting weights divided by the number of days in the grazing season. This dataset was available for 40 paddocks spanning a range of soil types, plant communities, and topographic positions. Data may have been collected for different projects at different times, but were collected in a consistent way. We note that there was spatial overlap among a small number of paddock boundaries across different years, since some fence lines were moved in 2012 and 2014. Model-predicted paddock-scale ADG was derived using the monthly ADG regression model described in Sections 2.3.3 and 2.3.4 of the associated manuscript.

    In short, we predicted season-long cattle weight gains by first predicting daily weight gain for each day of the grazing season from the monthly regression model, using a 28-day moving average of model inputs (CP, DOM and ANHP). We calculated the final ADG for the entire grazing season as the average predicted ADG, starting 28 days into the growing season. Percent sand content was obtained as the paddock-scale average of POLARIS sand content in the upper 0-30 cm. ANHP was calculated on the last day of the grazing season using the Gaffney et al. (2018) linear regression model relating ANHP (kg ha-1) to satellite-derived integrated absorbed photosynthetically active radiation (iAPAR) (see Section 3.1.2 of the associated manuscript), with the coefficients of the 4 spatial models (2013-2016) averaged as above: ANHP = -26.47 + 2.07(iAPAR).

    REFERENCES: Gaffney, R., Porensky, L. M., Gao, F., Irisarri, J. G., Durante, M., Derner, J. D., & Augustine, D. J. (2018). Using APAR to predict aboveground plant productivity in semi-arid rangelands: Spatial and temporal relationships differ. Remote Sensing, 10(9). doi:10.3390/rs10091474
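    Since the quoted regression is simple, a small sketch of the ANHP calculation with a 28-day averaging window (the daily iAPAR values here are illustrative only):

```python
# Sketch: ANHP = -26.47 + 2.07 * iAPAR (averaged Gaffney et al. 2018 model),
# applied daily and then averaged over a 28-day window.
import pandas as pd

def anhp(iapar):
    """Aboveground net herbaceous production (kg/ha) from iAPAR."""
    return -26.47 + 2.07 * iapar

daily_iapar = pd.Series(range(100, 128), dtype=float)   # 28 toy daily values
print(anhp(daily_iapar).rolling(28).mean().iloc[-1])    # 28-day average ANHP
```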

  11. Population Estimates Time Series Data

    • dtechtive.com
    • find.data.gov.scot
    Updated Mar 27, 2011
    Cite
    National Records of Scotland (2011). Population Estimates Time Series Data [Dataset]. https://dtechtive.com/datasets/3616
    Explore at:
    Dataset updated
    Mar 27, 2011
    Dataset provided by
    National Records of Scotland
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Scotland
    Description

    Over time, statistical outputs (and time series data) may be subject to revisions or corrections. Revisions are generally planned and are the result of either improvements in statistical methods or the availability of additional data. For example, the annual mid-year population estimates are revised after a census to take account of the additional information gained from the census results. Details of planned revisions are held within the metadata alongside each publication. Corrections are unplanned and occur when errors in either the statistical data or the methodology are found after release of the data. The latest correction to these datasets was in September 2018; for more information, please see the revisions and corrections page. This time series section provides access to the latest time series data, taking into account any revisions or corrections over the years. Note: tables are mainly offered for the purpose of extracting figures; due to the size of some of the sheets, they are not recommended for printing.

  12. Sport Activity Dataset - MTS-5

    • kaggle.com
    zip
    Updated Jul 13, 2023
    Cite
    Jarno Matarmaa (2023). Sport Activity Dataset - MTS-5 [Dataset]. https://www.kaggle.com/datasets/jarnomatarmaa/sportdata-mts-5
    Explore at:
    Available download formats: zip (498699 bytes)
    Dataset updated
    Jul 13, 2023
    Authors
    Jarno Matarmaa
    License

    https://ec.europa.eu/info/legal-notice_en

    Description

    The dataset consists of data in five categories: walking, running, biking, skiing, and roller skiing. Sport activities were recorded by an individual active (non-competitive) athlete. The data are pre-processed, standardized, and split into four parts (each dimension in its own file):
    - HR-DATA_std_1140x69 (heart rate signals)
    - SPD-DATA_std_1140x69 (speed signals)
    - ALT-DATA_std_1140x69 (altitude signals)
    - META-DATA_1140x4 (labels and details)

    NOTE: Do not confuse the signal order between the separate files when processing the data. Signal order is critical: the first row of each data file comes from the same activity, whose label is the first row of the target (META) file, and so on. Data should therefore be combined into a single table while reading the files, ideally using a nested data structure, something like in the picture below (a code sketch follows the dataset construction details further down):

    You may check the related TSC projects on GitHub:
    - Sport Activity Classification Using Classical Machine Learning and Time Series Methods: https://github.com/JABE22/MasterProject
    - Symbolic Representation of Multivariate Time Series Signals in Sport Activity Classification (Kaggle project)

    [Image: nested data structure for multivariate time series classifiers (https://mediauploads.data.world/e1ccd4d36522e04c0061d12d05a87407bec80716f6fe7301991eaaccd577baa8_mts_data.png)]

    The following picture shows five signal samples for each dimension (heart rate, speed, altitude) in standardized feature-value format. Each figure contains signals from five random activities (of the same or different categories); for example, signal index 1 in each of the three figures comes from the same activity. The figures merely illustrate what kinds of signals the dataset consists of; they have no particular meaning.

    [Image: signals from sport activities: heart rate, speed, and altitude (https://mediauploads.data.world/162b7086448d8dbd202d282014bcf12bd95bd3174b41c770aa1044bab22ad655_signal_samples.png)]

    Dataset size and construction procedure

    The original number of sport activities is 228. From each of them, starting at index 100 (seconds), 5 consecutive 69-second segments were extracted, as expressed by the formula below:

    [Image: data segmentation and augmentation formula (https://mediauploads.data.world/68ce83092ec65f6fbaee90e5de6e12df40498e08fa6725c111f1205835c1a842_segment_equation.png)]

    where D = the original filtered data, N = the number of activities, s = the segment start index, l = the segment length, and n = the number of segments taken from a single original sequence D_i, resulting in the new set of equal-length segments D_seg. In this particular case the equation takes the form:

    [Image: data segmentation and augmentation formula with values (https://mediauploads.data.world/63dd87bf3d0010923ad05a8286224526e241b17bbbce790133030d8e73f3d3a7_data_segmentation_formula.png)]

    Thus, the dataset has dimensions of 1140 x 69 x 3.
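    A hedged sketch of the order-preserving combination described in the note above (the .csv extension and the label column position in META are assumptions; check the actual files):

```python
# Sketch: stack the three signal files into a 1140 x 69 x 3 array,
# keeping row order so labels stay aligned with activities.
import numpy as np
import pandas as pd

hr = pd.read_csv("HR-DATA_std_1140x69.csv", header=None)
spd = pd.read_csv("SPD-DATA_std_1140x69.csv", header=None)
alt = pd.read_csv("ALT-DATA_std_1140x69.csv", header=None)
meta = pd.read_csv("META-DATA_1140x4.csv")

X = np.stack([hr.values, spd.values, alt.values], axis=-1)
y = meta.iloc[:, 0]                  # labels, aligned purely by row order
assert X.shape == (1140, 69, 3)
```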

    Additional information

    The data were recorded without any intention of using them in research; they therefore represent a realistic real-world data source and provide an excellent testbed for algorithms on real data.

    Recording devices

    Data were recorded using two types of Garmin devices: a Forerunner 920XT and a vivosport. The vivosport is an activity tracker that measures heart rate at the wrist with an optical sensor, whereas the 920XT requires an external chest strap (heart rate + inertial) worn during exercise. Otherwise the devices are essentially similar: both use GPS location to measure speed and a barometer to measure elevation changes.

    Device manuals - Garmin FR-920XT - Garmin Vivosport

    Person profile

    Age: 30-31; weight: 82 kg; height: 181 cm; active (non-competitive) athlete.

  13. Data from: Web Traffic Dataset

    • kaggle.com
    zip
    Updated May 19, 2024
    Cite
    Ramin Huseyn (2024). Web Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/raminhuseyn/web-traffic-time-series-dataset
    Explore at:
    Available download formats: zip (14740 bytes)
    Dataset updated
    May 19, 2024
    Authors
    Ramin Huseyn
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains information about web requests to a single website. It's a time series dataset, which means it tracks data over time, making it great for machine learning analysis.

  14. A Sample Time Series Data

    • kaggle.com
    zip
    Updated Dec 15, 2018
    Cite
    Haydar Ozler (2018). A Sample Time Series Data [Dataset]. https://www.kaggle.com/hozler/a-sample-time-series-data
    Explore at:
    Available download formats: zip (5767 bytes)
    Dataset updated
    Dec 15, 2018
    Authors
    Haydar Ozler
    Description

    Dataset

    This dataset was created by Haydar Ozler

  15. GOCE Satellite Telemetry

    • kaggle.com
    Updated Jul 15, 2024
    + more versions
    Cite
    astro_pat (2024). GOCE Satellite Telemetry [Dataset]. https://www.kaggle.com/datasets/patrickfleith/goce-satellite-telemetry
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Kaggle
    Authors
    astro_pat
    Description

    Utilisation of this data is subject to the European Space Agency's Earth Observation Terms and Conditions.

    This is Dataset Version 3 - Updates may be done following feedback from the machine learning community.

    Dataset Description

    This dataset contains 327 time series corresponding to the temporal values of 327 telemetry parameters over the life of the real GOCE satellite (from March 2009 to October 2013). It contains both the raw data and machine-learning-ready resampled data:
    - The raw values (calibrated values of each parameter) as {param}_raw.parquet files (irregularly sampled).
    - Popular statistics resampled over 10-minute windows for each parameter, as {param}_stats_10min.parquet files.
    - Popular statistics resampled over 6-hour windows for each parameter, as {param}_stats_6h.parquet files.
    - metadata.csv: a list of all parameters with description, subsystem, first and last timestamp at which a value is recorded, the fraction of NaN in the calculated statistics, and the longest data gap.
    - mass_properties.csv: information on the satellite mass (for example, the remaining fuel on board).
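    A minimal access sketch (assumes the files above are local and that the first metadata column holds the parameter name; reading parquet requires pyarrow or fastparquet):

```python
# Sketch: reading the metadata index and one parameter's resampled statistics.
import pandas as pd

meta = pd.read_csv("metadata.csv")
print(meta.head())                    # description, subsystem, gaps, ...

param = meta.iloc[0, 0]               # assumed: first column = parameter name
stats = pd.read_parquet(f"{param}_stats_10min.parquet")
print(stats.head())
```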

    Why is it a good dataset for time series forecasting?

    • Real-world: the data originates from a real-world complex engineering system
    • Many variables: 327, allowing for multivariate time series forecasting.
    • Variables with engineering values and units (Volt, Ampere, bar, m, m/s, etc.). See the metadata.
    • Different and irregular sampling rates: some parameters have a value recorded every second, others at a lower rate such as every 16 or 32 s. This is a challenge often encountered in real-world systems, where sensor records complicate the data pipelines and the input data fed into your models. If you want to start easy, work with the 10-min or 6-h resampled files.
    • Missing data and large gaps: you'll have to drop many parameters that have too much missing data, and carefully design and test your data processing, model training, and model evaluation strategy.
    • Suggested task 1: forecast 24 hrs ahead the 10-min last value given historical data
    • Suggested task 2: forecast 7 days ahead the 6-hour last value given historical data
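    For either task, a persistence baseline is a natural first yardstick; a hedged sketch for task 1 (the parameter file name and the "last" statistic column are assumptions):

```python
# Sketch: persistence baseline -- carry the current 10-min value forward 24 h.
import pandas as pd

s = pd.read_parquet("some_param_stats_10min.parquet")["last"]  # assumed column
horizon = 24 * 6                      # 24 h at 10-min resolution
pred = s.shift(horizon)               # forecast at t = value at t - 24 h
print("persistence MAE:", (s - pred).abs().mean())
```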

    About the GOCE Satellite

    The Gravity Field and Steady-State Ocean Circulation Explorer (GOCE; pronounced ‘go-chay’) was a scientific mission satellite from the European Space Agency (ESA).

    Objectives

    GOCE's primary mission objective was to provide an accurate and detailed global model of Earth's gravity field and geoid. For this purpose, it was equipped with a state-of-the-art gravity gradiometer and a precise tracking system.

    Payloads

    The satellite's main payload was the Electrostatic Gravity Gradiometer (EGG), used to measure Earth's gravity field. Other payloads were an onboard GPS receiver used as a Satellite-to-Satellite Tracking Instrument (SSTI) and a compensation system for all non-gravitational forces acting on the spacecraft. The satellite was also equipped with a laser retroreflector to enable tracking by ground-based satellite laser ranging stations.

    The satellite's unique arrow shape and fins helped keep GOCE stable as it flew through the thermosphere at a comparatively low altitude of 255 kilometres (158 mi). Additionally, an ion propulsion system continuously compensated for the variable deceleration due to air drag without the vibration of a conventional chemically powered rocket engine, thus limiting the errors in gravity gradient measurements caused by non-gravitational forces and restoring the path of the craft as closely as possible to a purely inertial trajectory.

    Thermal considerations

    Due to the orbit and satellite configuration, the solar panels experienced extreme temperature variations. The design therefore had to include materials that could tolerate temperatures as high as 160 degC and as low as -170 degC.

    Due to its stringent temperature stability requirements (for the gradiometer sensor heads, in the range of milli-Kelvin) the gradiometer was thermally decoupled from the satellite and had its own dedicated thermal-control system.

    Mission Operations

    Flight operations were conducted from the European Space Operations Centre, based in Darmstadt, Germany.

    It was launched on 17 March 2009 and reached its end of mission on 21 October 2013 when it ran out of propellant. As planned, the satellite then began dropping out of orbit and made an uncontrolled re-entry on 11 November 2013.

    Orbit

    GOCE used a Sun-synchronous orbit with an inclination of 96.7 degrees, a mean altitude of approximately 263 km, an orbital period of 90 minutes, and a mean local solar time at ascending node of 18:00.

    Resources

    • [Data Source](https://earth.esa....
  16. Population estimates time series dataset

    • ons.gov.uk
    • cy.ons.gov.uk
    csv, xlsx
    Updated Nov 27, 2025
    Cite
    Office for National Statistics (2025). Population estimates time series dataset [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatestimeseriesdataset
    Explore at:
    Available download formats: csv, xlsx
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    The mid-year estimates refer to the population on 30 June of the reference year and are produced in line with the standard United Nations (UN) definition for population estimates. They are the official set of population estimates for the UK and its constituent countries, the regions and counties of England, and local authorities and their equivalents.

  17. Time-series coral-cover data from Hawaii, Florida, Mo'orea, and the Virgin Islands

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Time-series coral-cover data from Hawaii, Florida, Mo'orea, and the Virgin Islands [Dataset]. https://catalog.data.gov/dataset/time-series-coral-cover-data-from-hawaii-florida-moorea-and-the-virgin-islands
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Hawaii, U.S. Virgin Islands, Mo'orea, Florida
    Description

    Coral reefs around the world have degraded over the last half-century as evidenced by loss of live coral cover. This ubiquitous observation led to the establishment of long-term, ecological monitoring programs in several regions with sizable coral-reef resources. As part of the U.S. Geological Survey (USGS) John Wesley Powell Center for Analysis and Synthesis working group "Local-scale ecosystem resilience amid global-scale ocean change: the coral reef example," scientists gathered resultant data from four of these programs in the main Hawaiian Islands, the Florida Keys, Mo'orea in French Polynesia, and St. John in the U.S. Virgin Islands to examine among-site, within-region spatial and temporal variation in coral cover. Data from the four focal regions represent spatial scales ranging from ~80 to 17,000 km2. The surveys chosen for the analysis were carried out at fixed sites between 1992 and 2015. Survey durations differed among focal regions and extended from 11 years at Mo'orea to 24 years at some of the sites in St. John. One hundred and twenty-three fixed sites (defined here as distinct surveys carried out within a defined reef habitat, depth range, or area of shoreline) were surveyed repeatedly (annually or every few years) in each focal region. Only sites with surveys extending over a decade or more and with at least 3 years of surveys were used so as to capture a variety of disturbance events (for example, El Niño events, major storms, etc.). Each focal region has experienced disturbances such as overfishing, disease pandemics, thermal stress, pollution, invasive species, predator outbreaks, and major storms. The data gathered for analysis are provided in this data release and are interpreted in Guest and others (2018).

  18. Coalinga Canal Permethrin ug/L Time Series Data

    • data.usbr.gov
    Updated Jun 30, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    United States Bureau of Reclamation (2021). Coalinga Canal Permethrin ug/L Time Series Data [Dataset]. https://data.usbr.gov/catalog/4079/item/8718
    Explore at:
    Dataset updated
    Jun 30, 2021
    Dataset authored and provided by
    United States Bureau of Reclamation (http://www.usbr.gov/)
    Time period covered
    Mar 19, 2013
    Area covered
    Variables measured
    Permethrin
    Description

    Measurements of Permethrin collected at Coalinga Canal. Currently collected twice a year, previously collected quarterly. Access further information for this data set by contacting Bureau of Reclamation, California-Great Basin Region, Environmental Affairs Division (CGB-157). See ResultAttributes for STAFF_GAUGE, SMPL_DEPTH, SMPL_CATEGORY_NAME, METHOD_CODE, RESULT_RL, RESULT_RL-UNIT_STD_NAME, RESULT_MDL, RESULT_MDL-UNIT_STD_NAME, USBR_QA_SUBTYPE_NAME, USBR_QULFR_DESCRIPTION. STAFF_GAUGE is the water height in decimal feet measured by gauge (e.g., 15.2). SMPL_DEPTH is the vertical depth at which sample is collected (e.g., 0 - 15 cm). For water samples: depth below water/air interface. For sediment and soil samples: depth below water/solid or air/solid interface. SMPL_CATEGORY_NAME is the category type of sample (e.g., Composite). METHOD_CODE is the name of method used to obtain result (e.g., EPA 200.8). RESULT_RL is the result reporting limit (accounting for dilution) (e.g., 0.02). RESULT_RL-UNIT_STD_NAME is the unit associated with RESULT_RL (e.g., mg/L). RESULT_MDL is the result method detection limit (e.g., 0.007). RESULT_MDL-UNIT_STD_NAME is the unit associated with RESULT_MDL (e.g., mg/L). USBR_QA_SUBTYPE_NAME is the quality control type of the sample (e.g., USBR_BLANK_SPIKE). USBR_QULFR_DESCRIPTION is the quality assurance description (if any) (e.g., Result may have a high bias.).

  19. Samples collected from the Monterey Bay Times Series from May 2014 to February 2016

    • bco-dmo.org
    • search.dataone.org
    csv
    Updated Aug 7, 2019
    Cite
    Christopher Francis; Francisco Chavez (2019). Samples collected from the Monterey Bay Times Series from May 2014 to February 2016. These data include CTD, nutrient, chlorophyll a and phaeopigment concentration data. [Dataset]. http://doi.org/10.1575/1912/bco-dmo.774848.1
    Explore at:
    Available download formats: csv (63.09 KB)
    Dataset updated
    Aug 7, 2019
    Dataset provided by
    Biological and Chemical Data Management Office
    Authors
    Christopher Francis; Francisco Chavez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 5, 2014 - Feb 3, 2016
    Area covered
    Variables measured
    NH4, NO2, NO3, PO4, Date, SiO4, Year, Depth, Cruise, Oxygen, and 22 more
    Measurement technique
    CTD Sea-Bird
    Description

    Samples collected from the Monterey Bay Time Series from May 2014 to February 2016. These data include CTD, nutrient, chlorophyll a, and phaeopigment concentration data.

    These data were published in Tolar et al., submitted (Table S1).

  20. Example Timeseries -- Drosophila midgut, minimal data

    • figshare.com
    zip
    Updated Jun 4, 2023
    + more versions
    Cite
    Noah Mitchell; Dillon Cislo (2023). Example Timeseries -- Drosophila midgut, minimal data [Dataset]. http://doi.org/10.6084/m9.figshare.20733091.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Noah Mitchell; Dillon Cislo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Minimal ingredients for running the example TubULAR pipeline for timeseries data, from surface extraction to kinematic analysis. See also "Example Timeseries -- Drosophila midgut, analyzed data" for an expanded package with example analysis.

    This dataset is a downsampled and clipped version of data investigated in: N. P. Mitchell, D. J. Cislo, S. Shankar, Y. Lin, B. I. Shraiman, S. J. Streichan, “Visceral organ morphogenesis via calcium-patterned muscle contractions.” eLife 11:e77355 (2022).

    Support for the development of the TubULAR analysis codebase was provided by NSF Grant No. PHY-2047140.
