The oceanographic time series data collected by U.S. Geological Survey scientists and collaborators are served in an online database at http://stellwagen.er.usgs.gov/index.html. These data were collected as part of research experiments investigating circulation and sediment transport in the coastal ocean. The experiments (projects, research programs) are typically one month to several years long and have been carried out since 1975. New experiments will be conducted, and the data from them will be added to the collection. As of 2016, all but one of the experiments were conducted in waters abutting the U.S. coast; the exception was conducted in the Adriatic Sea. Measurements acquired vary by site and experiment; they usually include current velocity, wave statistics, water temperature, salinity, pressure, turbidity, and light transmission from one or more depths over a time period. The measurements are concentrated near the sea floor but may also include data from the water column. The user interface provides an interactive map, a tabular summary of the experiments, and a separate page for each experiment. Each experiment page has documentation and maps that provide details of what data were collected at each site. Links to related publications with additional information about the research are also provided. The data are stored in Network Common Data Format (netCDF) files using the Equatorial Pacific Information Collection (EPIC) conventions defined by the National Oceanic and Atmospheric Administration (NOAA) Pacific Marine Environmental Laboratory. NetCDF is a general, self-documenting, machine-independent, open source data format created and supported by the University Corporation for Atmospheric Research (UCAR). EPIC is an early set of standards designed to allow researchers from different organizations to share oceanographic data. The files may be downloaded or accessed online using the Open-source Project for a Network Data Access Protocol (OPeNDAP). The OPeNDAP framework allows users to access data from anywhere on the Internet using a variety of Web services including Thematic Realtime Environmental Distributed Data Services (THREDDS). A subset of the data compliant with the Climate and Forecast convention (CF, currently version 1.6) is also available.
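As an illustration, a minimal sketch of OPeNDAP access with the Python xarray library; the URL and the EPIC variable name below are placeholders, not a real endpoint, so substitute the OPeNDAP link shown on the experiment page of interest.

import xarray as xr

# Hypothetical OPeNDAP URL -- replace with the link from an experiment page.
url = "http://stellwagen.er.usgs.gov/opendap/EXPERIMENT/FILENAME.nc"
ds = xr.open_dataset(url)   # lazily opens the remote netCDF dataset
print(ds)                   # lists variables, depths, and the time coordinate
temp = ds["T_20"]           # EPIC code for temperature; variable names vary by file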
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was created by Vikram Baliga
Released under CC BY-SA 4.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A diverse selection of 1000 empirical time series, along with the results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney. The results of the computation are in the hctsa file HCTSA_Empirical1000.mat, for use in Matlab with v1.06 of hctsa. The same data are also provided in .csv format: hctsa_datamatrix.csv (results of the feature computation), with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and the corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv), and the data of the individual time series (one time series per line, as described in hctsa_timeseries-info.csv) in hctsa_timeseries-data.csv. These .csv files were produced by running >> OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa. The input file, INP_Empirical1000.mat, is for use with hctsa and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as >> TS_Init('INP_Empirical1000.mat'); Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed by the user with TS_PlotTimeSeries from the hctsa package. See the links in the references for more comprehensive documentation on performing methodological comparison using this dataset and on how to download and use v1.06 of hctsa.
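For users working outside Matlab, a minimal pandas sketch for loading the .csv exports listed above; the assumption that the data matrix has no header row is ours, so check the files.

import pandas as pd

X = pd.read_csv("hctsa_datamatrix.csv", header=None)  # time series x features (header layout assumed)
ts_info = pd.read_csv("hctsa_timeseries-info.csv")    # row (time series) metadata
features = pd.read_csv("hctsa_features.csv")          # column (feature) metadata
print(X.shape)   # expected: 1000 rows, one per time series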
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A table of dates for a period of interest, usually a month, expressed in two different formats: mm/dd/yyyy and mm-dd-yyyy. Start date: 12/01/2014. End date: 12/31/2014.
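A minimal pandas sketch that reproduces such a table for the stated period:

import pandas as pd

dates = pd.date_range("2014-12-01", "2014-12-31", freq="D")
table = pd.DataFrame({"mm/dd/yyyy": dates.strftime("%m/%d/%Y"),
                      "mm-dd-yyyy": dates.strftime("%m-%d-%Y")})
print(table.head())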
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains daily rainfall measurements (in millimeters) for the year 2022. The data spans from January 1, 2022, to July 3, 2022, covering a total of 184 days. The dataset can be used for various machine learning tasks, such as time series forecasting, pattern recognition, or anomaly detection related to rainfall patterns.
Column Descriptors:
date (date): The date of the rainfall measurement, in the format YYYY-MM-DD. Example: 2022-01-01.
rainfall (float): The amount of rainfall recorded on the corresponding date, measured in millimeters (mm). Example: 12.5. Range: 0.0 mm (no rainfall) to 22.4 mm (the maximum recorded value in the dataset). Missing values: there are no missing values in this column.
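A minimal pandas sketch for loading the table for time-series work; the filename rainfall_2022.csv is an assumption, the column names follow the descriptors above.

import pandas as pd

df = pd.read_csv("rainfall_2022.csv", parse_dates=["date"]).set_index("date")
print(df["rainfall"].describe())            # range should span 0.0 to 22.4 mm
weekly = df["rainfall"].resample("W").sum() # e.g. weekly totals for forecasting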
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wikipedia temporal graph.
The dataset is based on two Wikipedia SQL dumps: (1) English language articles and (2) user visit counts per page per hour (aka pagecounts). The original datasets are publicly available on the Wikimedia website.
The static graph structure is extracted from English-language Wikipedia articles. Redirects are removed. Before building the Wikipedia graph, we introduce thresholds on the minimum number of visits per hour and the maximum in-degree. We remove pages that have fewer than 500 visits per hour at least once during the specified period. In addition, we remove nodes (pages) with in-degree higher than 8,000 to build a more meaningful initial graph. After cleaning, the graph contains 116,016 nodes (out of 4,856,639 total pages) and 6,573,475 edges. The graph can be imported in two ways: (1) using edges.csv and vertices.csv or (2) using the enwiki-20150403-graph.gt file, which can be opened with the open source Python library Graph-Tool.
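A minimal sketch of the two import routes in Python; the CSV column names depend on the file headers, so inspect them first.

import pandas as pd
from graph_tool.all import load_graph

# (1) CSV route
edges = pd.read_csv("edges.csv")
vertices = pd.read_csv("vertices.csv")

# (2) graph-tool binary route
g = load_graph("enwiki-20150403-graph.gt")
print(g.num_vertices(), g.num_edges())   # 116016 nodes, 6573475 edges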
The time-series data contain users' visit counts from 02:00, 23 September 2014 until 23:00, 30 April 2015. The total number of hours is 5278. The data are stored in two formats: CSV and H5. The CSV file contains data in the format [page_id :: count_views :: layer], where layer represents an hour. In the H5 file, each layer corresponds to an hour as well.
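A minimal pandas sketch for reading the visit counts; the filename and the treatment of "::" as the field separator are assumptions based on the format description above.

import pandas as pd

counts = pd.read_csv("pagecounts.csv", sep="::", engine="python",
                     names=["page_id", "count_views", "layer"])
hour0 = counts[counts["layer"] == 0]   # layer indexes the hour (5278 in total)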
S-Pol radar full time series data in IWRF format collected continuously during the LATTE (Lower Atmospheric Thermodynamics & Turbulence Experiment) project. Each file covers about 15 minutes of S-Pol operation. See the FRONT S-Pol Data Availability 2014-2015 document linked below to check on data availability.
S-Pol Radar full time series data collected during the Plains Elevated Convection at Night (PECAN) campaign from 9 March 2015 to 16 July 2015. This is a "realtime" PECAN data set. The files are a mix of hourly files ("SPOL_scan") and episodic files (SPOL_vert and SPOL_sunscan). The files are in Integrated Weather Radar Facility (IWRF) format and are available as tar archives.
The data has been extracted from the Alpha Vantage API through RapidAPI and stored in a DataFrame with pandas, then saved in CSV format.
You can extract the data using the code below.
===============================
import requests
import pandas as pd

url = "https://alpha-vantage.p.rapidapi.com/query"
querystring = {"interval": "5min", "function": "TIME_SERIES_INTRADAY",
               "symbol": "MSFT", "datatype": "json", "output_size": "compact"}
headers = {
    "X-RapidAPI-Key": "yourrapidapikey",
    "X-RapidAPI-Host": "alpha-vantage.p.rapidapi.com"
}

def get_data():
    # Request the intraday series and transpose so each row is a timestamp
    response = requests.request("GET", url, headers=headers, params=querystring)
    df = pd.DataFrame(response.json()['Time Series (5min)'])
    return df.T

get_data().to_csv('TimeSeries(5mins).csv')
Please refer to this link for more information: https://rapidapi.com/alphavantage/api/alpha-vantage.
The data is a synthetic univariate time series.
This data set is designed for testing indexing schemes in time series databases. The data appears highly periodic, but never exactly repeats itself. This feature is designed to challenge indexing structures.
This data set is designed for testing indexing schemes in time series databases. It is a much larger dataset than has been used in any published study (that we are currently aware of). It contains one million data points. The data have been split into 10 sections to facilitate testing (see below). We recommend building the index with 9 of the 100,000-datapoint sections and randomly extracting a query shape from the 10th section. (Some previously published work seems to have used queries that were also used to build the indexing structure; this will produce optimistic results.) The data are interesting because they have structure at different resolutions. Each of the 10 sections was generated by an independent invocation of the function:

y(t) = sum_{i=3}^{7} (1/2^i) * sin(2*pi * (2^(2+i) + rand(2^i)) * t),  0 <= t <= 1

where rand(x) produces a random integer between zero and x. The data appears highly periodic, but never exactly repeats itself. This feature is designed to challenge the indexing structure.
The data is stored in one ASCII file. There are 10 columns, 100,000 rows. All data points are in the range -0.5 to +0.5. Rows are separated by carriage returns, columns by spaces.
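A minimal NumPy sketch following the recommendation above (build with nine sections, query from the tenth); the filename and query length are placeholders.

import numpy as np

data = np.loadtxt("synthetic.dat")   # placeholder filename; 100,000 rows x 10 columns
index_sections = data[:, :9]         # build the index from nine sections
held_out = data[:, 9]                # extract queries from the tenth section

rng = np.random.default_rng(0)
qlen = 256                           # query length: an arbitrary choice
start = rng.integers(0, held_out.size - qlen)
query = held_out[start:start + qlen]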
Acknowledgements, Copyright Information, and Availability: Freely available for research use.
Terms of use: https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
A dataset of broadband subscriptions and GDP per capita statistics, for 217 countries, between the years 2000-2020. The data is in long format, suitable for time series analysis.
The variables in the dataset are:
- year: Year of the observation, between 2000-2020.
- country: Country of the observation, 217 in total.
- broadband_subs: Number of broadband subscriptions per 100 people.
- GDPPC: GDP per capita, in 2022 US$.
There are some missing values (NAs) for broadband_subs and GDPPC, especially in the earlier years.
The data source is World Bank Open Data. The original data was retrieved as two separate datasets in wide format and converted into long format in May 2022.
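A minimal pandas sketch of the wide-to-long conversion described above, assuming the original World Bank layout had one row per country and one column per year (filename hypothetical).

import pandas as pd

wide = pd.read_csv("broadband_wide.csv")   # hypothetical filename, one column per year
long = wide.melt(id_vars="country", var_name="year", value_name="broadband_subs")
long["year"] = long["year"].astype(int)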
This Data Release serves as a repository for a set of time-series data used in Scientific Investigations Report 2018-5040. The data represent continuous measurements of specific conductance, water temperature, and/or water level (stage), recorded by a variety of types of data loggers during three multi-day interference tests conducted on the Virgin River at Pah Tempe Springs during November 2013, February 2014, and November 2014. The data presented are the raw data downloaded from the data loggers and are organized according to the date of the test and the type and name of the observation site. The Data Release contains 3 items:
1. An explanatory table, "PahTempe_table1.xlsx", which indicates which parameters were collected and on what instrument at each site during a given test.
2. The data, "PahTempe_data.zip"; this zipped file contains the raw data logger files in comma-separated values (CSV) format, organized into folders according to the date of the interference pumping test.
3. The metadata document, "PahTempe_metadata.xml".
Because these data were collected during multi-day interference pumping tests, they do not represent natural hydrologic conditions in the river, springs, or shallow groundwater system. Users of these data are advised to refer to the larger work citation for proper use and interpretation of the data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This directory includes discharge time series data (q) for 14 headwater stream networks, produced in standard format and common units of mm/day for straightforward hydrograph inter-comparison.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to the Fifth Generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis (ERA5). Produced by replaying only the land component of the ECMWF ERA5 climate reanalysis, it benefits from the same physical data-assimilation framework but runs offline at higher spatial detail (9 km grid) to deliver richer land-surface information. Reanalysis merges numerical model output with global observations into a globally complete, physically consistent climate record; this "data assimilation" approach mirrors operational weather forecasting but is optimised for historical completeness rather than forecast timeliness. Reanalysis datasets extend back several decades by sacrificing forecast deadlines, allowing additional time to gather observations and retrospectively ingest improved data, thereby enhancing data quality in earlier periods. ERA5-Land uses atmospheric fields from ERA5 (air temperature, humidity, pressure) as "forcing" inputs to drive its land-surface model, preventing the rapid drift from reality that unconstrained simulations would suffer. Although observations do not enter the land model directly, they shape the atmospheric forcing through assimilation, giving ERA5-Land an indirect observational anchor. To reconcile ERA5's coarser grid with ERA5-Land's finer 9 km grid, a lapse-rate correction adjusts input temperatures, humidity, and pressures for altitude differences. Like all numerical simulations, ERA5-Land carries uncertainty that generally grows backward in time as fewer observations were available to constrain the forcing. Users can combine ERA5-Land fields with the uncertainty estimates from equivalent ERA5 variables to assess confidence bounds. The temporal resolution (hourly) and spatial detail (9 km) of ERA5-Land make it invaluable for land-surface applications such as flood and drought forecasting, agricultural monitoring, and hydrological studies. The dataset presented here is a regridded subset of the full ERA5-Land archive, stored in an Analysis-Ready, Cloud-Optimised (ARCO) format specifically designed for retrieving long time series for individual points. When a user's requested location does not exactly match a grid point, the nearest grid point is automatically selected. This optimised data source ensures rapid response times.
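A minimal sketch of this nearest-grid-point lookup with xarray, assuming the subset is exposed as a Zarr store with latitude/longitude coordinates; the store path and the variable name are hypothetical.

import xarray as xr

ds = xr.open_zarr("era5_land_subset.zarr")   # hypothetical ARCO store location
point = ds.sel(latitude=52.52, longitude=13.405, method="nearest")
series = point["t2m"]   # hourly 2 m temperature; variable name assumed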
This study is a longitudinal national data series for 167 nations. The present dataset represents an expansion both of temporal coverage and of substantive variable categories from the earlier CROSS POLITY TIME SERIES (ICPSR 5002) by the Center for Comparative Political Research, State University of New York (Binghamton). General areas included among the variables now available are demographic, social, political, and economic topics. Cases in the data collection represent nation-year observations. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR07412.v1. We highly recommend using the ICPSR version as they have made this dataset available in multiple data formats.
Overview
This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in Python. The data covers three synchronous areas of the European power grid:
- Continental Europe
- Great Britain
- Nordic
This work is part of the paper "Predictability of Power Grid Frequency" [1]. Please cite this paper when using the data and the code. For a detailed documentation of the pre-processing procedure, we refer to the supplementary material of the paper.
Data sources
We downloaded the frequency recordings from publicly available repositories of three different Transmission System Operators (TSOs).
Content of the repository
A) Scripts
The Python scripts run with Python 3.7 and with the packages found in "requirements.txt".
B) Data_converted and Data_cleansed
The folder "Data_converted" contains the output of "convert_data_format.py" and "Data_cleansed" contains the output of "clean_corrupted_data.py".
Use cases
We point out that this repository can be used in two different ways:
from helper_functions import *
import numpy as np
import pandas as pd

# Load the cleansed frequency time series (first column: timestamps)
cleansed_data = pd.read_csv('/Path_to_cleansed_data/data.zip',
                            index_col=0, header=None, squeeze=True,
                            parse_dates=[0])

# Find the contiguous intervals without NaNs and keep the longest one
valid_bounds, valid_sizes = true_intervals(~cleansed_data.isnull())
start, end = valid_bounds[np.argmax(valid_sizes)]
data_without_nan = cleansed_data.iloc[start:end]
License
We release the code in the folder "Scripts" under the MIT license [8]. In the case of Nationalgrid and Fingrid, we further release the pre-processed data in the folders "Data_converted" and "Data_cleansed" under the CC-BY 4.0 license [7]. TransnetBW originally did not publish their data under an open license; we have explicitly received permission to publish the pre-processed version from TransnetBW. However, we cannot publish our pre-processed version under an open license due to the missing license of the original TransnetBW data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations and, when going further back in time, to allow for the ingestion of improved versions of the original observations, all of which benefits the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave, and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later; in such cases users are notified. The dataset presented here is a regridded subset of the full ERA5 dataset on native resolution, stored in a format designed for retrieving long time series for a single point. When the requested location does not match the exact location of a grid point, the nearest grid point is used instead. It is this source of ERA5 data that is used by the ERA-Explorer to ensure the response times required for the interactive web application. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Raw ASCII time-series data (cleaned of corrupted records). This data set contains continuous time-series temperature measurements from several hydrothermal vents (Biovent, Mvent, Bio9 vents, Pvent, Lvent) located along the East Pacific Rise (EPR) near 9°50'N. The compilation contains legacy data along with data from cruises AT42-06, AT42-21, RR2102, AT50-07, AT50-21, AT50-33, and AT50-36. The data files are in ASCII format and were collected with temperature probes and autonomous temperature loggers. The data compilation was funded through awards OCE-1834797, OCE-1949485, OCE-1949938, OCE-1948936, ANR-24-CE56-6841 (Project OMENS), ERC-10117070619 (Project SeaSALT).
This dataset contains various tropospheric analysis time series created from the grids of other DSS datasets and converted to GRIB (for grids that were not already in this format).
Currently, the only available time series are of 500 mb geopotential height.
Legal notice: https://ec.europa.eu/info/legal-notice_en
The dataset consists of data in five categories: walking, running, biking, skiing, and roller skiing. The sport activities were recorded by an individual active (non-competitive) athlete. The data are pre-processed, standardized, and split into four parts (each dimension in its own file):
- HR-DATA_std_1140x69 (heart rate signals)
- SPD-DATA_std_1140x69 (speed signals)
- ALT-DATA_std_1140x69 (altitude signals)
- META-DATA_1140x4 (labels and details)
NOTE: Signal order across the separate files must not be confused when processing the data. Signal order is critical: the first index in each file comes from the same activity, whose label is the first index in the target data file, and so on. The data should therefore be combined into a single table while reading the files, ideally using a nested data structure, something like in the picture below:
You may check the related TSC projects on GitHub:
- Sport Activity Classification Using Classical Machine Learning and Time Series Methods: https://github.com/JABE22/MasterProject
- Symbolic Representation of Multivariate Time Series Signals in Sport Activity Classification - Kaggle Project
Figure: Nested data structure for multivariate time series classifiers (https://mediauploads.data.world/e1ccd4d36522e04c0061d12d05a87407bec80716f6fe7301991eaaccd577baa8_mts_data.png)
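A minimal Python sketch of combining the files while preserving signal order; the .csv extensions, the absence of header rows, and the label column position are assumptions.

import numpy as np
import pandas as pd

hr  = pd.read_csv("HR-DATA_std_1140x69.csv",  header=None).to_numpy()
spd = pd.read_csv("SPD-DATA_std_1140x69.csv", header=None).to_numpy()
alt = pd.read_csv("ALT-DATA_std_1140x69.csv", header=None).to_numpy()
meta = pd.read_csv("META-DATA_1140x4.csv",    header=None)

# Row i of every file belongs to the same activity, whose label sits in
# row i of the metadata file, so the rows are stacked in place.
X = np.stack([hr, spd, alt], axis=-1)   # shape (1140, 69, 3)
y = meta.iloc[:, 0]                     # label column position is an assumption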
The following picture shows five signal samples for each dimension (Heart Rate, Speed, Altitude) in standardized feature-value format. Each figure contains signals from five random activities (which can be from the same or different categories); however, signal index 1 in each of the three figures comes from the same activity. The figures merely illustrate what kinds of signals the dataset consists of; they do not have any particular meaning.
Figure: Signals from sport activities (Heart Rate, Speed, and Altitude) (https://mediauploads.data.world/162b7086448d8dbd202d282014bcf12bd95bd3174b41c770aa1044bab22ad655_signal_samples.png)
The original number of sport activities is 228. From each of them, starting at index 100 (seconds), five consecutive 69-second segments were extracted, as expressed by the formula below:
D_seg = { D_i[s + j*l : s + (j+1)*l] | i = 1, ..., N; j = 0, ..., n-1 }

where D = the original sequence data, N = the number of activities, s = the segment start index, l = the segment length, and n = the number of segments taken from a single original sequence D_i, resulting in the new set of equal-length segments D_seg. In this particular case the equation takes the form:

D_seg = { D_i[100 + j*69 : 100 + (j+1)*69] | i = 1, ..., 228; j = 0, ..., 4 }
Thus, the dataset has dimensions of 1140 x 69 x 3.
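A minimal Python sketch of this segmentation rule, using the parameter values above:

def segment(sequence, s=100, l=69, n=5):
    # n consecutive windows of length l, starting at index s
    return [sequence[s + j * l : s + (j + 1) * l] for j in range(n)]

# 228 activities x 5 segments each = 1140 segments of length 69 per dimension.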
The data were recorded without any intention of being used in research; they therefore represent a realistic, real-world data source and provide an excellent testbed for algorithms on real data.
Recording devices
The data were recorded using two types of Garmin devices: a Forerunner 920XT and a vivosport. The vivosport is an activity tracker that measures heart rate at the wrist with an optical sensor, whereas the 920XT requires an external sensor belt (heart rate + inertial) worn on the chest during exercise. Otherwise the devices are not essentially different: both use GPS to measure speed and a barometer to measure elevation changes.
Device manuals:
- Garmin FR-920XT
- Garmin Vivosport
Person profile
Age: 30-31, Weight: 82 kg, Height: 181 cm, Active athlete (non-competitive)