The oceanographic time series data collected by U.S. Geological Survey scientists and collaborators are served in an online database at http://stellwagen.er.usgs.gov/index.html. These data were collected as part of research experiments investigating circulation and sediment transport in the coastal ocean. The experiments (projects, research programs) are typically one month to several years long and have been carried out since 1975. New experiments will be conducted, and the data from them will be added to the collection. As of 2016, all but one of the experiments were conducted in waters abutting the U.S. coast; the exception was conducted in the Adriatic Sea. Measurements acquired vary by site and experiment; they usually include current velocity, wave statistics, water temperature, salinity, pressure, turbidity, and light transmission from one or more depths over a time period. The measurements are concentrated near the sea floor but may also include data from the water column. The user interface provides an interactive map, a tabular summary of the experiments, and a separate page for each experiment. Each experiment page has documentation and maps that provide details of what data were collected at each site. Links to related publications with additional information about the research are also provided. The data are stored in Network Common Data Format (netCDF) files using the Equatorial Pacific Information Collection (EPIC) conventions defined by the National Oceanic and Atmospheric Administration (NOAA) Pacific Marine Environmental Laboratory. NetCDF is a general, self-documenting, machine-independent, open source data format created and supported by the University Corporation for Atmospheric Research (UCAR). EPIC is an early set of standards designed to allow researchers from different organizations to share oceanographic data. The files may be downloaded or accessed online using the Open-source Project for a Network Data Access Protocol (OPeNDAP). The OPeNDAP framework allows users to access data from anywhere on the Internet using a variety of Web services including Thematic Realtime Environmental Distributed Data Services (THREDDS). A subset of the data compliant with the Climate and Forecast convention (CF, currently version 1.6) is also available.
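As an illustration, a minimal sketch of OPeNDAP access with the Python xarray library; the URL and the EPIC variable name below are placeholders, not a real endpoint, so substitute the OPeNDAP link shown on the experiment page of interest.

import xarray as xr

# Hypothetical OPeNDAP URL -- replace with the link from an experiment page.
url = "http://stellwagen.er.usgs.gov/opendap/EXPERIMENT/FILENAME.nc"
ds = xr.open_dataset(url)   # lazily opens the remote netCDF dataset
print(ds)                   # lists variables, depths, and the time coordinate
temp = ds["T_20"]           # EPIC code for temperature; variable names vary by file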
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was created by Vikram Baliga
Released under CC BY-SA 4.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A diverse selection of 1000 empirical time series, along with the results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney. The results of the computation are in the hctsa file HCTSA_Empirical1000.mat, for use in Matlab with v1.06 of hctsa. The same data are also provided in .csv format: hctsa_datamatrix.csv (results of the feature computation), with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and the corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv), and the data of the individual time series (one time series per line, as described in hctsa_timeseries-info.csv) in hctsa_timeseries-data.csv. These .csv files were produced by running >> OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa. The input file, INP_Empirical1000.mat, is for use with hctsa and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as >> TS_Init('INP_Empirical1000.mat'); Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed by the user with TS_PlotTimeSeries from the hctsa package. See the links in the references for more comprehensive documentation on performing methodological comparison using this dataset and on how to download and use v1.06 of hctsa.
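For users working outside Matlab, a minimal pandas sketch for loading the .csv exports listed above; the assumption that the data matrix has no header row is ours, so check the files.

import pandas as pd

X = pd.read_csv("hctsa_datamatrix.csv", header=None)  # time series x features (header layout assumed)
ts_info = pd.read_csv("hctsa_timeseries-info.csv")    # row (time series) metadata
features = pd.read_csv("hctsa_features.csv")          # column (feature) metadata
print(X.shape)   # expected: 1000 rows, one per time series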
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A table of dates for a period of interest, usually a month, expressed in two different formats: mm/dd/yyyy and mm-dd-yyyy. Start date: 12/01/2014. End date: 12/31/2014.
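A minimal pandas sketch that reproduces such a table for the stated period:

import pandas as pd

dates = pd.date_range("2014-12-01", "2014-12-31", freq="D")
table = pd.DataFrame({"mm/dd/yyyy": dates.strftime("%m/%d/%Y"),
                      "mm-dd-yyyy": dates.strftime("%m-%d-%Y")})
print(table.head())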
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains daily rainfall measurements (in millimeters) for the year 2022. The data spans from January 1, 2022, to July 3, 2022, covering a total of 184 days. The dataset can be used for various machine learning tasks, such as time series forecasting, pattern recognition, or anomaly detection related to rainfall patterns.
Column Descriptors:
date (date): The date of the rainfall measurement, in the format YYYY-MM-DD. Example: 2022-01-01.
rainfall (float): The amount of rainfall recorded on the corresponding date, measured in millimeters (mm). Example: 12.5. Range: 0.0 mm (no rainfall) to 22.4 mm (the maximum recorded value in the dataset). Missing values: there are no missing values in this column.
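A minimal pandas sketch for loading the table for time-series work; the filename rainfall_2022.csv is an assumption, the column names follow the descriptors above.

import pandas as pd

df = pd.read_csv("rainfall_2022.csv", parse_dates=["date"]).set_index("date")
print(df["rainfall"].describe())            # range should span 0.0 to 22.4 mm
weekly = df["rainfall"].resample("W").sum() # e.g. weekly totals for forecasting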
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wikipedia temporal graph.
The dataset is based on two Wikipedia SQL dumps: (1) English language articles and (2) user visit counts per page per hour (aka pagecounts). The original datasets are publicly available on the Wikimedia website.
The static graph structure is extracted from English-language Wikipedia articles. Redirects are removed. Before building the Wikipedia graph, we introduce thresholds on the minimum number of visits per hour and the maximum in-degree. We remove pages that have fewer than 500 visits per hour at least once during the specified period. In addition, we remove nodes (pages) with in-degree higher than 8,000 to build a more meaningful initial graph. After cleaning, the graph contains 116,016 nodes (out of 4,856,639 total pages) and 6,573,475 edges. The graph can be imported in two ways: (1) using edges.csv and vertices.csv or (2) using the enwiki-20150403-graph.gt file, which can be opened with the open source Python library Graph-Tool.
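A minimal sketch of the two import routes in Python; the CSV column names depend on the file headers, so inspect them first.

import pandas as pd
from graph_tool.all import load_graph

# (1) CSV route
edges = pd.read_csv("edges.csv")
vertices = pd.read_csv("vertices.csv")

# (2) graph-tool binary route
g = load_graph("enwiki-20150403-graph.gt")
print(g.num_vertices(), g.num_edges())   # 116016 nodes, 6573475 edges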
The time-series data contain users' visit counts from 02:00, 23 September 2014 until 23:00, 30 April 2015. The total number of hours is 5278. The data are stored in two formats: CSV and H5. The CSV file contains data in the format [page_id :: count_views :: layer], where layer represents an hour. In the H5 file, each layer corresponds to an hour as well.
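A minimal pandas sketch for reading the visit counts; the filename and the treatment of "::" as the field separator are assumptions based on the format description above.

import pandas as pd

counts = pd.read_csv("pagecounts.csv", sep="::", engine="python",
                     names=["page_id", "count_views", "layer"])
hour0 = counts[counts["layer"] == 0]   # layer indexes the hour (5278 in total)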
S-Pol radar full time series data in IWRF format collected continuously during the LATTE (Lower Atmospheric Thermodynamics & Turbulence Experiment) project. Each file covers about 15 minutes of S-Pol operation. See the FRONT S-Pol Data Availability 2014-2015 document linked below to check on data availability.
S-Pol Radar full time series data collected during the Plains Elevated Convection at Night (PECAN) campaign from 9 March 2015 to 16 July 2015. This is a "realtime" PECAN data set. The files are a mix of hourly files ("SPOL_scan") and episodic files (SPOL_vert and SPOL_sunscan). The files are in Integrated Weather Radar Facility (IWRF) format and are available as tar archives.
The data has been extracted from the Alpha Vantage API through RapidAPI and stored in a DataFrame with pandas, then saved in CSV format.
You can extract the data using the code below.
===============================
import requests
import pandas as pd

url = "https://alpha-vantage.p.rapidapi.com/query"
querystring = {"interval": "5min", "function": "TIME_SERIES_INTRADAY",
               "symbol": "MSFT", "datatype": "json", "output_size": "compact"}
headers = {
    "X-RapidAPI-Key": "yourrapidapikey",
    "X-RapidAPI-Host": "alpha-vantage.p.rapidapi.com"
}

def get_data():
    # Request the intraday series and transpose so each row is a timestamp
    response = requests.request("GET", url, headers=headers, params=querystring)
    df = pd.DataFrame(response.json()['Time Series (5min)'])
    return df.T

get_data().to_csv('TimeSeries(5mins).csv')
Please refer to this link for more information: https://rapidapi.com/alphavantage/api/alpha-vantage.
The data is a synthetic univariate time series.
This data set is designed for testing indexing schemes in time series databases. The data appears highly periodic, but never exactly repeats itself. This feature is designed to challenge indexing structures.
This data set is designed for testing indexing schemes in time series databases. It is a much larger dataset than has been used in any published study (that we are currently aware of). It contains one million data points. The data have been split into 10 sections to facilitate testing (see below). We recommend building the index with 9 of the 100,000-datapoint sections and randomly extracting a query shape from the 10th section. (Some previously published work seems to have used queries that were also used to build the indexing structure; this will produce optimistic results.) The data are interesting because they have structure at different resolutions. Each of the 10 sections was generated by an independent invocation of the function:

y(t) = sum_{i=3}^{7} (1/2^i) * sin(2*pi * (2^(2+i) + rand(2^i)) * t),  0 <= t <= 1

where rand(x) produces a random integer between zero and x. The data appears highly periodic, but never exactly repeats itself. This feature is designed to challenge the indexing structure.
The data is stored in one ASCII file. There are 10 columns, 100,000 rows. All data points are in the range -0.5 to +0.5. Rows are separated by carriage returns, columns by spaces.
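A minimal NumPy sketch following the recommendation above (build with nine sections, query from the tenth); the filename and query length are placeholders.

import numpy as np

data = np.loadtxt("synthetic.dat")   # placeholder filename; 100,000 rows x 10 columns
index_sections = data[:, :9]         # build the index from nine sections
held_out = data[:, 9]                # extract queries from the tenth section

rng = np.random.default_rng(0)
qlen = 256                           # query length: an arbitrary choice
start = rng.integers(0, held_out.size - qlen)
query = held_out[start:start + qlen]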
Acknowledgements, Copyright Information, and Availability: Freely available for research use.
Terms of use: https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
A dataset of broadband subscriptions and GDP per capita statistics, for 217 countries, between the years 2000-2020. The data is in long format, suitable for time series analysis.
The variables in the dataset are:
- year: Year of the observation, between 2000-2020.
- country: Country of the observation, 217 in total.
- broadband_subs: Number of broadband subscriptions per 100 people.
- GDPPC: GDP per capita, in 2022 US$.
There are some missing values (NAs) for broadband_subs and GDPPC, especially in the earlier years.
The data source is World Bank Open Data. The original data was retrieved as two separate datasets in wide format and converted into long format in May 2022.
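A minimal pandas sketch of the wide-to-long conversion described above, assuming the original World Bank layout had one row per country and one column per year (filename hypothetical).

import pandas as pd

wide = pd.read_csv("broadband_wide.csv")   # hypothetical filename, one column per year
long = wide.melt(id_vars="country", var_name="year", value_name="broadband_subs")
long["year"] = long["year"].astype(int)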
This Data Release serves as a repository for a set of time-series data used in Scientific Investigations Report 2018-5040. The data represent continuous measurements of specific conductance, water temperature, and/or water level (stage), recorded by a variety of types of data loggers during three multi-day interference tests conducted on the Virgin River at Pah Tempe Springs during November 2013, February 2014, and November 2014. The data presented are the raw data downloaded from the data loggers and are organized according to the date of the test and the type and name of the observation site. The Data Release contains 3 items:
1. An explanatory table, "PahTempe_table1.xlsx", which indicates which parameters were collected and on what instrument at each site during a given test.
2. The data, "PahTempe_data.zip"; this zipped file contains the raw data logger files in comma-separated values (CSV) format, organized into folders according to the date of the interference pumping test.
3. The metadata document, "PahTempe_metadata.xml".
Because these data were collected during multi-day interference pumping tests, they do not represent natural hydrologic conditions in the river, springs, or shallow groundwater system. Users of these data are advised to refer to the larger work citation for proper use and interpretation of the data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This directory includes discharge time series data (q) for 14 headwater stream networks, produced in standard format and common units of mm/day for straightforward hydrograph inter-comparison.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to the Fifth Generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis (ERA5). Produced by replaying only the land component of the ECMWF ERA5 climate reanalysis, it benefits from the same physical data-assimilation framework but runs offline at higher spatial detail (9 km grid) to deliver richer land-surface information. Reanalysis merges numerical model output with global observations into a globally complete, physically consistent climate record; this "data assimilation" approach mirrors operational weather forecasting but is optimised for historical completeness rather than forecast timeliness. Reanalysis datasets extend back several decades by sacrificing forecast deadlines, allowing additional time to gather observations and retrospectively ingest improved data, thereby enhancing data quality in earlier periods. ERA5-Land uses atmospheric fields from ERA5 (air temperature, humidity, pressure) as "forcing" inputs to drive its land-surface model, preventing the rapid drift from reality that unconstrained simulations would suffer. Although observations do not enter the land model directly, they shape the atmospheric forcing through assimilation, giving ERA5-Land an indirect observational anchor. To reconcile ERA5's coarser grid with ERA5-Land's finer 9 km grid, a lapse-rate correction adjusts input temperatures, humidity, and pressures for altitude differences. Like all numerical simulations, ERA5-Land carries uncertainty that generally grows backward in time as fewer observations were available to constrain the forcing. Users can combine ERA5-Land fields with the uncertainty estimates from equivalent ERA5 variables to assess confidence bounds. The temporal resolution (hourly) and spatial detail (9 km) of ERA5-Land make it invaluable for land-surface applications such as flood and drought forecasting, agricultural monitoring, and hydrological studies. The dataset presented here is a regridded subset of the full ERA5-Land archive, stored in an Analysis-Ready, Cloud-Optimised (ARCO) format specifically designed for retrieving long time series for individual points. When a user's requested location does not exactly match a grid point, the nearest grid point is automatically selected. This optimised data source ensures rapid response times.
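A minimal sketch of this nearest-grid-point lookup with xarray, assuming the subset is exposed as a Zarr store with latitude/longitude coordinates; the store path and the variable name are hypothetical.

import xarray as xr

ds = xr.open_zarr("era5_land_subset.zarr")   # hypothetical ARCO store location
point = ds.sel(latitude=52.52, longitude=13.405, method="nearest")
series = point["t2m"]   # hourly 2 m temperature; variable name assumed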
This study is a longitudinal national data series for 167 nations. The present dataset represents an expansion both of temporal coverage and of substantive variable categories from the earlier CROSS POLITY TIME SERIES (ICPSR 5002) by the Center for Comparative Political Research, State University of New York (Binghamton). General areas included among the variables now available are demographic, social, political, and economic topics. Cases in the data collection represent nation-year observations. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR07412.v1. We highly recommend using the ICPSR version as they have made this dataset available in multiple data formats.
Overview
This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in Python. The data covers three synchronous areas of the European power grid:
- Continental Europe
- Great Britain
- Nordic
This work is part of the paper "Predictability of Power Grid Frequency" [1]. Please cite this paper when using the data and the code. For a detailed documentation of the pre-processing procedure, we refer to the supplementary material of the paper.
Data sources
We downloaded the frequency recordings from publicly available repositories of three different Transmission System Operators (TSOs).
Content of the repository
A) Scripts
The Python scripts run with Python 3.7 and with the packages found in "requirements.txt".
B) Data_converted and Data_cleansed
The folder "Data_converted" contains the output of "convert_data_format.py" and "Data_cleansed" contains the output of "clean_corrupted_data.py".
Use cases
We point out that this repository can be used in two different ways:
from helper_functions import *
import numpy as np
import pandas as pd

# Load the cleansed frequency time series (first column: timestamps)
cleansed_data = pd.read_csv('/Path_to_cleansed_data/data.zip',
                            index_col=0, header=None, squeeze=True,
                            parse_dates=[0])

# Find the contiguous intervals without NaNs and keep the longest one
valid_bounds, valid_sizes = true_intervals(~cleansed_data.isnull())
start, end = valid_bounds[np.argmax(valid_sizes)]
data_without_nan = cleansed_data.iloc[start:end]
License
We release the code in the folder "Scripts" under the MIT license [8]. In the case of Nationalgrid and Fingrid, we further release the pre-processed data in the folders "Data_converted" and "Data_cleansed" under the CC-BY 4.0 license [7]. TransnetBW originally did not publish their data under an open license; we have explicitly received permission to publish the pre-processed version from TransnetBW. However, we cannot publish our pre-processed version under an open license due to the missing license of the original TransnetBW data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations and, when going further back in time, to allow for the ingestion of improved versions of the original observations, all of which benefits the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave, and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later; in such cases users are notified. The dataset presented here is a regridded subset of the full ERA5 dataset on native resolution, stored in a format designed for retrieving long time series for a single point. When the requested location does not match the exact location of a grid point, the nearest grid point is used instead. It is this source of ERA5 data that is used by the ERA-Explorer to ensure the response times required for the interactive web application. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Raw ASCII time-series data (cleaned of corrupted records). This data set contains continuous time-series temperature measurements from several hydrothermal vents (Biovent, Mvent, Bio9 vents, Pvent, Lvent) located along the East Pacific Rise (EPR) near 9°50'N. The compilation contains legacy data along with data from cruises AT42-06, AT42-21, RR2102, AT50-07, AT50-21, AT50-33, and AT50-36. The data files are in ASCII format and were collected with temperature probes and autonomous temperature loggers. The data compilation was funded through awards OCE-1834797, OCE-1949485, OCE-1949938, OCE-1948936, ANR-24-CE56-6841 (Project OMENS), ERC-10117070619 (Project SeaSALT).
This dataset contains various tropospheric analysis time series created from the grids of other DSS datasets and converted to GRIB (for grids that were not already in this format).
Currently, the only available time series are of 500 mb geopotential height.
Legal notice: https://ec.europa.eu/info/legal-notice_en
The dataset consists of data in five categories: walking, running, biking, skiing, and roller skiing. The sport activities were recorded by an individual active (non-competitive) athlete. The data are pre-processed, standardized, and split into four parts (each dimension in its own file):
- HR-DATA_std_1140x69 (heart rate signals)
- SPD-DATA_std_1140x69 (speed signals)
- ALT-DATA_std_1140x69 (altitude signals)
- META-DATA_1140x4 (labels and details)
NOTE: Signal order across the separate files must not be confused when processing the data. Signal order is critical: the first index in each file comes from the same activity, whose label is the first index in the target data file, and so on. The data should therefore be combined into a single table while reading the files, ideally using a nested data structure, something like in the picture below:
You may check the related TSC projects on GitHub:
- Sport Activity Classification Using Classical Machine Learning and Time Series Methods: https://github.com/JABE22/MasterProject
- Symbolic Representation of Multivariate Time Series Signals in Sport Activity Classification - Kaggle Project
Figure: Nested data structure for multivariate time series classifiers (https://mediauploads.data.world/e1ccd4d36522e04c0061d12d05a87407bec80716f6fe7301991eaaccd577baa8_mts_data.png)
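A minimal Python sketch of combining the files while preserving signal order; the .csv extensions, the absence of header rows, and the label column position are assumptions.

import numpy as np
import pandas as pd

hr  = pd.read_csv("HR-DATA_std_1140x69.csv",  header=None).to_numpy()
spd = pd.read_csv("SPD-DATA_std_1140x69.csv", header=None).to_numpy()
alt = pd.read_csv("ALT-DATA_std_1140x69.csv", header=None).to_numpy()
meta = pd.read_csv("META-DATA_1140x4.csv",    header=None)

# Row i of every file belongs to the same activity, whose label sits in
# row i of the metadata file, so the rows are stacked in place.
X = np.stack([hr, spd, alt], axis=-1)   # shape (1140, 69, 3)
y = meta.iloc[:, 0]                     # label column position is an assumption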
The following picture shows five signal samples for each dimension (Heart Rate, Speed, Altitude) in standardized feature-value format. Each figure contains signals from five random activities (which can be from the same or different categories); however, signal index 1 in each of the three figures comes from the same activity. The figures merely illustrate what kinds of signals the dataset consists of; they do not have any particular meaning.
Figure: Signals from sport activities (Heart Rate, Speed, and Altitude) (https://mediauploads.data.world/162b7086448d8dbd202d282014bcf12bd95bd3174b41c770aa1044bab22ad655_signal_samples.png)
The original number of sport activities is 228. From each of them, starting at index 100 (seconds), five consecutive 69-second segments were extracted, as expressed by the formula below:
D_seg = { D_i[s + j*l : s + (j+1)*l] | i = 1, ..., N; j = 0, ..., n-1 }

where D = the original sequence data, N = the number of activities, s = the segment start index, l = the segment length, and n = the number of segments taken from a single original sequence D_i, resulting in the new set of equal-length segments D_seg. In this particular case the equation takes the form:

D_seg = { D_i[100 + j*69 : 100 + (j+1)*69] | i = 1, ..., 228; j = 0, ..., 4 }
Thus, the dataset has dimensions of 1140 x 69 x 3.
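A minimal Python sketch of this segmentation rule, using the parameter values above:

def segment(sequence, s=100, l=69, n=5):
    # n consecutive windows of length l, starting at index s
    return [sequence[s + j * l : s + (j + 1) * l] for j in range(n)]

# 228 activities x 5 segments each = 1140 segments of length 69 per dimension.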
The data were recorded without any intention of being used in research; they therefore represent a realistic, real-world data source and provide an excellent testbed for algorithms on real data.
Recording devices
The data were recorded using two types of Garmin devices: a Forerunner 920XT and a vivosport. The vivosport is an activity tracker that measures heart rate at the wrist with an optical sensor, whereas the 920XT requires an external sensor belt (heart rate + inertial) worn on the chest during exercise. Otherwise the devices are not essentially different: both use GPS to measure speed and a barometer to measure elevation changes.
Device manuals:
- Garmin FR-920XT
- Garmin Vivosport
Person profile
Age: 30-31, Weight: 82 kg, Height: 181 cm, Active athlete (non-competitive)