Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A diverse selection of 1000 empirical time series, along with the results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.

The results of the computation are in the hctsa file HCTSA_Empirical1000.mat, for use in Matlab with v1.06 of hctsa.

The same data are also provided in .csv format: hctsa_datamatrix.csv (results of feature computation), with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and the corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv); the data of the individual time series (one time series per line, as described in hctsa_timeseries-info.csv) are in hctsa_timeseries-data.csv. These .csv files were produced by running >> OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa.

The input file, INP_Empirical1000.mat, is for use with hctsa and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as >> TS_Init('INP_Empirical1000.mat');

Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be produced using TS_PlotTimeSeries from the hctsa package.

See the links in the references for more comprehensive documentation on performing methodological comparison using this dataset, and on how to download and use v1.06 of hctsa.
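For users working outside Matlab, the exported .csv files can be read with standard tools; a minimal Python/pandas sketch (whether the exported files carry header rows is an assumption to check):

    import pandas as pd

    # Feature matrix: one row per time series, one column per hctsa feature.
    # header=None assumes the exported file has no header row; adjust if it does.
    X = pd.read_csv("hctsa_datamatrix.csv", header=None)

    # Metadata for rows (time series) and columns (features)
    ts_info = pd.read_csv("hctsa_timeseries-info.csv")
    features = pd.read_csv("hctsa_features.csv")

    print(X.shape)         # (number of time series, number of features)
    print(ts_info.head())  # first few time series and their metadata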
S-Pol radar full time series data in IWRF format collected continuously during the LATTE (Lower Atmospheric Thermodynamics & Turbulence Experiment) project. Each file covers about 15 minutes of S-Pol operation. See the FRONT S-Pol Data Availability 2014-2015 document linked below to check on data availability.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MAPS Model Location Time Series (MOLTS) is one of the model output datasets provided in the Southern Great Plains - 1997 (SGP97). The full MAPS MOLTS dataset covers most of North America east of the Rocky Mountains (283 locations). MOLTS are hourly time series output at selected locations that contain values for various surface parameters and 'sounding' profiles at MAPS model levels, derived from the MAPS model output. The MOLTS output files were converted into Joint Office for Science Support (JOSS) Quality Control Format (QCF), the same format used for atmospheric rawinsonde soundings processed by JOSS. The MOLTS output provided by JOSS online includes only the initial analysis output (i.e., no forecast MOLTS) and only state parameters (pressure, altitude, temperature, humidity, and wind). The full output, including the forecast MOLTS and all output parameters, in its original format (Binary Universal Form for the Representation of meteorological data, or BUFR), is available from the National Center for Atmospheric Research (NCAR)/Scientific Computing Division.

The Forecast Systems Laboratory (FSL) operates the MAPS model with a resolution of 40 km and 40 vertical levels. The MAPS analysis and forecast fields are generated every 3 hours at 0000, 0300, 0600, 0900, 1200, 1500, 1800, and 2100 UTC daily. MOLTS are hourly vertical profile and surface time series derived from the MAPS model output. The complete MOLTS output includes six informational items, 16 parameters for each level, and 27 parameters at the surface. Output is available each hour, beginning at the initial analysis (the only output available from JOSS) and ending at the 48 hour forecast.

JOSS converts the raw format files into JOSS QCF format, which is the same format used for atmospheric sounding data such as National Weather Service (NWS) soundings. JOSS calculated the total wind speed and direction from the u and v wind components. JOSS calculated the mixing ratio from the specific humidity (Pruppacher and Klett 1980) and the dew point from the mixing ratio (Wallace and Hobbs 1977). Then the relative humidity was calculated from the dew point (Bolton 1980). JOSS did not conduct any quality control on this output.

The header records (15 total records) contain output type, project ID, the location of the nearest station to the MOLTS location (this can be a rawinsonde station, an Atmospheric Radiation Measurement (ARM)/Cloud and Radiation Testbed (CART) station, a wind profiler station, a surface station, or just the nearest town), the location of the MOLTS output, and the valid time for the MOLTS output. The five header lines contain information identifying the sounding, and have a rigidly defined form. The following 6 header lines are used for auxiliary information and comments about the sounding, and they vary significantly from dataset to dataset. The last 3 header records contain header information for the data columns: line 13 holds the field names, line 14 the field units, and line 15 contains dashes ('-' characters) delineating the extent of the field.

Resources in this dataset:
Resource Title: GeoData catalog record. File Name: Web Page, url: https://geodata.nal.usda.gov/geonetwork/srv/eng/catalog.search#/metadata/2ad09880-6439-440c-9829-c4653ec12a4f
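The wind computation mentioned above is standard meteorology; as an illustration only (not JOSS's original code), total wind speed and direction can be derived from the u and v components as follows:

    import numpy as np

    def wind_speed_direction(u, v):
        """u: eastward component, v: northward component (m/s).
        Returns speed and meteorological direction (degrees FROM which
        the wind blows)."""
        speed = np.hypot(u, v)
        direction = (270.0 - np.degrees(np.arctan2(v, u))) % 360.0
        return speed, direction

    # A pure westerly (u > 0, v = 0) blows from 270 degrees:
    print(wind_speed_direction(5.0, 0.0))  # (5.0, 270.0)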
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A table of dates for a period of interest, usually a month, expressed in two different formats: mm/dd/yyyy and mm-dd-yyyy. Start date: 12/01/2014. End date: 12/31/2014.
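Such a table is easy to regenerate for any period; a minimal pandas sketch (the column names here are illustrative, not taken from the dataset):

    import pandas as pd

    dates = pd.date_range("2014-12-01", "2014-12-31", freq="D")
    table = pd.DataFrame({
        "date_slash": dates.strftime("%m/%d/%Y"),  # mm/dd/yyyy
        "date_dash": dates.strftime("%m-%d-%Y"),   # mm-dd-yyyy
    })
    print(table.head())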
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides monthly stock price data for the MAG7 over the past 20 years (2004–2024). The data includes key financial metrics such as opening price, closing price, highest and lowest prices, trading volume, and percentage change. The dataset is valuable for financial analysis, stock trend forecasting, and portfolio optimization.
MAG7 refers to the seven largest and most influential technology companies in the U.S. stock market:
- Microsoft (MSFT)
- Apple (AAPL)
- Google (Alphabet, GOOGL)
- Amazon (AMZN)
- Nvidia (NVDA)
- Meta (META)
- Tesla (TSLA)
These companies are known for their market dominance, technological innovation, and significant impact on global stock indices such as the S&P 500 and Nasdaq-100.
The dataset consists of historical monthly stock prices of MAG7, retrieved from Investing.com. It provides an overview of how these stocks have performed over two decades, reflecting market trends, economic cycles, and technological shifts.
- Date: The recorded month and year (DD-MM-YYYY)
- Price: The closing price of the stock at the end of the month
- Open: The price at which the stock opened at the beginning of the month
- High: The highest stock price recorded in the month
- Low: The lowest stock price recorded in the month
- Vol.: The total trading volume for the month
- Change %: The percentage change in stock price compared to the previous month
# 5. Data Source & Format
The dataset was obtained from Investing.com and downloaded in CSV format.
The data has been processed to ensure consistency and accuracy, with date formats standardized for time-series analysis.
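As a starting point, the CSV can be loaded and its columns normalized with pandas; a minimal sketch (the file name is hypothetical, and the string formats for volume and change are assumptions about the raw export):

    import pandas as pd

    # Hypothetical file name; the actual CSV name depends on the download.
    df = pd.read_csv("MAG7_monthly.csv")

    # Standardize the date column (DD-MM-YYYY) for time-series analysis
    df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

    # Investing.com-style exports often store volume as strings such as "12.3M"
    # and change as "4.56%"; rough conversions (assumed raw format):
    suffix = {"K": 1e3, "M": 1e6, "B": 1e9}
    df["Vol."] = [float(str(v)[:-1]) * suffix[str(v)[-1]]
                  if str(v)[-1] in suffix else float(v)
                  for v in df["Vol."]]
    df["Change %"] = df["Change %"].astype(str).str.rstrip("%").astype(float)

    df = df.sort_values("Date").set_index("Date")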
# 6. Potential Use Cases
This dataset can be used for financial analysis, stock trend forecasting, and portfolio optimization.

This file type contains time series measurements of wind and other surface meteorological parameters taken at fixed locations. The instrument arrays may be deployed on automated buoys, ships, or towers. The data record includes values of east-west (u) and north-south (v) wind components at specified date and time. Wind values may have been averaged or filtered and are typically reported at time intervals of 10-15 minutes. Air temperature, atmospheric pressure, and dew point temperatures may also be reported. Data were primarily collected in coastal Alaska and Puget Sound, but measurements from a few specific equatorial Pacific Ocean and Atlantic Ocean sites are also available.
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/INSPIRE_Directive_Article13_1a
The European Space Agency, in collaboration with BlackBridge, collected two time series datasets with a five-day revisit at high resolution: February to June 2013 over 14 selected sites around the world, and April to September 2015 over 10 selected sites around the world. The RapidEye Earth Imaging System provides data at 5 m spatial resolution (multispectral L3A orthorectified). The products are radiometrically and sensor corrected, similar to the 1B Basic product, but have geometric corrections applied to the data during orthorectification using DEMs and GCPs. The product accuracy depends on the quality of the ground control and DEMs used. The imagery is delivered in GeoTIFF format with a pixel spacing of 5 metres. The dataset is composed of data over: 14 selected sites in 2013 (Argentina, Belgium, Chesapeake Bay, China, Congo, Egypt, Ethiopia, Gabon, Jordan, Korea, Morocco, Paraguay, South Africa and Ukraine) and 10 selected sites in 2015 (Limburgerhof, Railroad Valley, Libya4, Algeria4, Figueres, Libya1, Mauritania1, Barrax, Esrin, Uyuni Salt Lake). Spatial coverage: check the spatial coverage of the collection on a map available on the Third Party Missions Dissemination Service.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Weighted estimates from the voluntary fortnightly business survey (BICS) about financial performance, workforce, prices, trade, and business resilience, in time-series format.
This Data Release serves as a repository for a set of time-series data used in Scientific Investigations Report 2018-5040. The data represent continuous measurements of specific conductance, water temperature, and/or water level (stage), recorded by a variety of types of data loggers during three multi-day interference tests conducted on the Virgin River at Pah Tempe Springs during November 2013, February 2014, and November 2014. The data presented are the raw data downloaded from the data loggers and are organized according to the date of the test and the type and name of the observation site. The Data Release contains 3 items:
1. An explanatory table, "PahTempe_table1.xlsx", which indicates which parameters were collected, and on what instrument, at each site during a given test.
2. The data, "PahTempe_data.zip"; this zipped file contains the raw data logger files in comma-separated values (CSV) format, organized into folders according to the date of the interference pumping test.
3. The metadata document, "PahTempe_metadata.xml".
Because these data were collected during multi-day interference pumping tests, they do not represent natural hydrologic conditions in the river, springs, or shallow groundwater system. Users of these data are advised to refer to the larger work citation for proper use and interpretation of the data.
This dataset depicts locations maintained in the Idaho Department of Water Resources database. Database records include water flow measurements and estimates in time-series intervals ranging from 15 minutes to daily. Data include time-series records collected by IDWR as well as data collected in conjunction with, and by, third parties. Points in this dataset correspond to locations found in IDWR’s Aqua Info application (see URL), which gives users access to flow estimates that can be viewed in charts and downloaded in tabulated format.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis.

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations and, when going further back in time, to allow for the ingestion of improved versions of the original observations, all of which benefits the quality of the reanalysis product.

ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread.

ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later. Users are notified if this occurs.

The dataset presented here is a regridded subset of the full ERA5 dataset on native resolution that is stored in a format designed for retrieving long time series for a single point. When the requested location does not match the exact location of a grid point, the nearest grid point is used instead. This source of ERA5 data is used by the ERA-Explorer to achieve the response times required for the interactive web application. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines.
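The nearest-grid-point rule described above can be illustrated with a small sketch (a simplified regular latitude-longitude grid is assumed here; the actual ERA5 native grid and retrieval code differ):

    import numpy as np

    # Hypothetical regular 0.25-degree grid
    lats = np.arange(90, -90.25, -0.25)
    lons = np.arange(0, 360, 0.25)

    def nearest_grid_point(lat, lon):
        """Return the grid indices closest to the requested location."""
        i = np.argmin(np.abs(lats - lat))
        j = np.argmin(np.abs(lons - lon % 360))
        return i, j

    i, j = nearest_grid_point(52.37, 4.89)  # e.g. a point near Amsterdam
    print(lats[i], lons[j])                 # 52.25 5.0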
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an updated version of Gütschow et al. (2018, http://doi.org/10.5880/pik.2018.003). Please use this version, which incorporates updates to input data as well as corrections of errors in the original dataset and its previous updates. For a detailed description of the changes, please consult the CHANGELOG included in the data description document.

The PRIMAP-hist dataset combines several published datasets to create a comprehensive set of greenhouse gas emission pathways for every country and Kyoto gas, covering the years 1850 to 2016 and all UNFCCC (United Nations Framework Convention on Climate Change) member states, as well as most non-UNFCCC territories. The data resolve the main IPCC (Intergovernmental Panel on Climate Change) 2006 categories. For CO2, CH4, and N2O, subsector data for Energy, Industrial Processes, and Agriculture are available. Version 2.0 of the PRIMAP-hist dataset does not include emissions from Land use, land use change and forestry (LULUCF).

List of datasets included in this data publication:
(1) PRIMAP-hist_v2.0_11-Dec-2018.csv: with numerical extrapolation of all time series to 2016 (only in .zip folder)
(2) PRIMAP-hist_no_extrapolation_v2.0_11-Dec-2018.csv: without numerical extrapolation of missing values (only in .zip folder)
(3) PRIMAP-hist_v2.0_data-format-description: including CHANGELOG
(4) PRIMAP-hist_v2.0_updated_figures: updated figures of those published in Gütschow et al. (2016)
(All files are also included in the .zip folder.)

When using this dataset or one of its updates, please also cite the data description article (Gütschow et al., 2016, http://doi.org/10.5194/essd-8-571-2016), to which these data are a supplement. Please consider also citing the relevant original sources.

SOURCES:
- Global CO2 emissions from cement production v2: Andrew (2018)
- BP Statistical Review of World Energy: BP (2018)
- CDIAC: Boden et al. (2017)
- EDGAR version 4.3.2: JRC and PBL (2017), Janssens-Maenhout et al. (2017)
- EDGAR versions 4.2 and 4.2 FT2010: JRC and PBL (2011), Olivier and Janssens-Maenhout (2012)
- EDGAR-HYDE 1.4: Van Aardenne et al. (2001), Olivier and Berdowski (2001)
- FAOSTAT database: Food and Agriculture Organization of the United Nations (2018)
- RCP historical data: Meinshausen et al. (2011)
- UNFCCC National Communications and National Inventory Reports for developing countries: UNFCCC (2018)
- UNFCCC Biennial Update Reports: UNFCCC (2018)
- UNFCCC Common Reporting Format (CRF): UNFCCC (2017), UNFCCC (2018), Jeffery et al. (2018)

Full references are available in the data description document.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
README
1.1 This folder contains time series for the elevation angle obtained from simulations for the following orbit altitudes:
1. 1000 km
2. 1200 km
3. 1400 km
4. 1600 km
5. 1800 km
6. 2000 km
1.2 For each altitude, the following orbit inclinations were considered:
20º, 25º, 30º, ..., 85º
1.3 The elevation angle time series were observed from 18 Earth Stations (ESs), numbered and located* as follows:
1. ES1 = ( 0º, 276.7121º)
2. ES2 = ( 5º, 276.7121º)
3. ES3 = (10º, 276.7121º)
4. ES4 = (15º, 276.7121º)
5. ES5 = (20º, 276.7121º)
6. ES6 = (25º, 261.7121º)
7. ES7 = (30º, 259.7121º)
8. ES8 = (35º, 259.7121º)
9. ES9 = (40º, 259.7121º)
10. ES10 = (45º, 259.7121º)
11. ES11 = (50º, 259.7121º)
12. ES12 = (55º, 259.7121º)
13. ES13 = (60º, 259.7121º)
14. ES14 = (65º, 259.7121º)
15. ES15 = (70º, 259.7121º)
16. ES16 = (75º, 259.7121º)
17. ES17 = (80º, 259.7121º)
18. ES18 = (85º, 259.7121º)
*Locations of the ESs are given as geographic coordinates:
(latitude in deg, longitude in deg)
1.5 The eccentricity for all simulations corresponds to that of a circular orbit (e = 0).
1.6 For convenience of organization, all time series files are grouped into six folders according to their orbit altitudes.
1.7 Files in this folder are provided to facilitate reproduction of our work and to encourage future work based on our proposed methodology and results.
2.1 Time series filenames
Each time series is identified with the following filename format:
"hhhh-ii-ES#.csv"
where:
hhhh indicates the orbit altitude (in km)
ii indicates the orbit inclination (in degrees)
ES# indicates the ES from which the elevation angle is being observed
2.2 Time series length
Elevation angle time series are provided for a simulation period of 180 days.
3.1 Time series are provided using the .csv (comma separated values) file format to facilitate reading through diverse programming tools.
3.2 Each time series is provided as an individual two-column .csv file. The first column contains the time vector and the second contains the corresponding elevation angle vector.
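For example, a single series can be loaded and inspected along these lines (a minimal Python sketch; whether the files carry a header row is an assumption to check against the actual files):

    import pandas as pd

    # Altitude 1000 km, inclination 45 degrees, observed from ES7
    ts = pd.read_csv("1000-45-ES7.csv", header=None,
                     names=["time", "elevation_deg"])

    print(ts["elevation_deg"].max())  # peak elevation over the 180-day period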
The global precipitation time series provides time series charts showing observations of daily precipitation as well as accumulated precipitation compared to normal accumulated amounts for various stations around the world. These charts are created for different scales of time (30, 90, 365 days). Each station has a graphic that contains two charts. The first chart in the graphic is a time series in the format of a line graph, representing accumulated precipitation for each day in the time series compared to the accumulated normal amount of precipitation. The second chart is a bar graph displaying actual daily precipitation. The total accumulation and surplus or deficit amounts are displayed as text on the charts for the entire time scale, in both inches and millimeters. The graphics are updated daily to reflect the latest observations and accumulated precipitation amounts, including the most recent daily data available. The available graphics are rotated, meaning that only the most recently created graphics are available; previously made graphics are not archived.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set includes the materials required to reproduce the figures and tables presented in the study: "A 21st Century Warming Threshold for Sustained Greenland Ice Sheet Mass Loss". The data consist of:
NB: the NetCDF files above use a Polar Stereographic North (EPSG:3413) projection with a horizontal resolution of 1 km x 1 km. The reference point is located at 45ºW longitude and 70ºN latitude.
NB: the file Crossref_sim_names.txt cross-references the simulation abbreviations in the above .dat files to official simulation names from the National Center for Atmospheric Research (NCAR).
The daily downscaled SMB data set from the CESM2-forced RACMO2.3p2 historical simulation and SSP5-8.5 projection is freely available from the authors upon request and without conditions (contact: b.p.y.noel@uu.nl). Besides SMB, the data set includes daily total precipitation (snow and rain), snowfall, total melt (snow and ice), meltwater runoff, retention and refreezing, total sublimation (surface and drifting snow), snow drift erosion, as well as 2 m air temperature at 1 km horizontal resolution.
Abstract: "Under anticipated future warming, the Greenland ice sheet (GrIS) will pass a threshold when meltwater runoff exceeds the accumulation of snow, resulting in a negative surface mass balance (SMB < 0) and sustained mass loss. In spite of several recent warm summers with high melt rates, SMB < 0 has not been reached since at least the year 1958. Here we dynamically and statistically downscale the outputs of an Earth system model to 1 km resolution to infer that a Greenland near-surface atmospheric warming of 4.5 ± 0.3 °C—relative to pre-industrial—is required for GrIS SMB to become persistently negative. Climate models from CMIP5 and CMIP6 translate this regional temperature change to a global warming threshold of 2.7 ± 0.2 °C. Under a high-end warming scenario, this threshold may be reached around 2055, while for a strong mitigation scenario it will likely not be passed."
S-Pol Radar full time series data collected during the Plains Elevated Convection at Night (PECAN) campaign from 9 March 2015 to 16 July 2015. This is a "realtime" PECAN data set. The files are a mix of hourly files ("SPOL_scan") and episodic files ("SPOL_vert" and "SPOL_sunscan"). The files are in Integrated Weather Radar Facility (IWRF) format and are available as tar archives.
Load, wind and solar, prices in hourly resolution. This data package contains different kinds of timeseries data relevant for power system modelling, namely electricity prices, electricity consumption (load) as well as wind and solar power generation and capacities. The data is aggregated either by country, control area or bidding zone. Geographical coverage includes the EU and some neighbouring countries. All variables are provided in hourly resolution. Where original data is available in higher resolution (half-hourly or quarter-hourly), it is provided in separate files. This package version only contains data provided by TSOs and power exchanges via ENTSO-E Transparency, covering the period 2015-mid 2020. See previous versions for historical data from a broader range of sources. All data processing is conducted in Python/pandas and has been documented in the Jupyter notebooks linked below.
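Where the higher-resolution files are used, aggregation back to hourly values is straightforward in pandas; a minimal sketch (the file name and column layout are assumed to follow the package's singleindex CSV convention):

    import pandas as pd

    # Quarter-hourly package file (name assumed from the singleindex convention)
    df = pd.read_csv("time_series_15min_singleindex.csv",
                     index_col=0, parse_dates=True)

    # Mean power per hour; use .sum() instead when aggregating energy
    hourly = df.resample("1h").mean()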
This data set contains QA/QC-ed (Quality Assurance and Quality Control) water level data for the PLM1 and PLM6 wells. PLM1 and PLM6 are location identifiers used by the Watershed Function SFA project for two groundwater monitoring wells along an elevation gradient located along the lower montane life zone of a hillslope near the Pumphouse location at the East River Watershed, Colorado, USA. These wells are used to monitor subsurface water and carbon inventories and fluxes, and to determine the seasonally dependent flow of groundwater under the PLM hillslope. The downslope flow of groundwater, in combination with data on groundwater chemistry (see related references), can be used to estimate rates of solute export from the hillslope to the floodplain and river.

QA/QC analysis of measured groundwater levels in monitoring wells PLM-1 and PLM-6 included identification and flagging of duplicated timestamps, gap filling of missing timestamps and water levels, and removal of abnormal/bad values and outliers in the measured water levels. The QA/QC analysis also tested the application of different QA/QC methods and the development of regular (5-minute, 1-hour, and 1-day) time series datasets, which can serve as a benchmark for testing other QA/QC techniques and will be applicable for ecohydrological modeling.

The package includes a Readme file, one R code file used to perform QA/QC, a series of 8 data csv files (six QA/QC-ed regular time series datasets of varying intervals (5-min, 1-hr, 1-day) and two files with QA/QC flagging of original data), and three files for the reporting format adoption of this dataset (InstallationMethods, file-level metadata (flmd), and data dictionary (dd) files).

QA/QC-ed data herein were derived from the original/raw data publication available at Williams et al., 2020 (DOI: 10.15485/1818367). For more information about running the R code file (10.15485_1866836_QAQC_PLM1_PLM6.R) to reproduce the QA/QC output files, see the README (QAQC_PLM_readme.docx). This dataset replaces the previously published raw data time series and is the final groundwater data product for the PLM wells in the East River. Complete metadata information on the PLM1 and PLM6 wells is available in a related dataset on ESS-DIVE: Varadharajan C, et al. (2022), https://doi.org/10.15485/1660962. These data products are part of the Watershed Function Scientific Focus Area collection effort to further scientific understanding of biogeochemical dynamics from genome to watershed scales.

2022/09/09 Update: Converted data files using ESS-DIVE's Hydrological Monitoring Reporting Format. With the adoption of this reporting format, three new files (v1_20220909_flmd.csv, v1_20220909_dd.csv, and InstallationMethods.csv) were added. The file-level metadata file (v1_20220909_flmd.csv) contains information specific to the files contained within the dataset. The data dictionary file (v1_20220909_dd.csv) contains definitions of column headers and other terms across the dataset. The installation methods file (InstallationMethods.csv) contains a description of methods associated with installation and deployment at the PLM1 and PLM6 wells. Additionally, eight data files were re-formatted to follow the reporting format guidance (er_plm1_waterlevel_2016-2020.csv, er_plm1_waterlevel_1-hour_2016-2020.csv, er_plm1_waterlevel_daily_2016-2020.csv, QA_PLM1_Flagging.csv, er_plm6_waterlevel_2016-2020.csv, er_plm6_waterlevel_1-hour_2016-2020.csv, er_plm6_waterlevel_daily_2016-2020.csv, QA_PLM6_Flagging.csv). The major changes to the data files include the addition of header_rows above the data containing metadata about the particular well, units, and sensor description.

2023/01/18 Update: Dataset updated to include additional QA/QC-ed water level data up until 2022-10-12 for ER-PLM1 and 2022-10-13 for ER-PLM6. Reporting-format-specific files (v2_20230118_flmd.csv, v2_20230118_dd.csv, v2_20230118_InstallationMethods.csv) were updated to reflect the additional data. The R code file (QAQC_PLM1_PLM6.R) was added to replace the previously uploaded HTML files and enable execution of the associated code. The R code file (QAQC_PLM1_PLM6.R) and ReadMe file (QAQC_PLM_readme.docx) were revised to clarify where original data were retrieved from and to remove local file paths.
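The overall shape of such a QA/QC pass (flag duplicate timestamps, regularize the time grid, screen outliers) can be sketched with pandas; this illustrates the approach only, not the project's R code, and the column names, file layout, and threshold are assumptions:

    import pandas as pd

    wl = pd.read_csv("er_plm1_waterlevel_2016-2020.csv",
                     parse_dates=["DateTime"])      # column names assumed

    # Flag, then drop, duplicated timestamps
    wl["dup_flag"] = wl["DateTime"].duplicated()
    wl = wl.loc[~wl["dup_flag"]].set_index("DateTime")

    # Regularize to a 5-minute grid; missing timestamps become NaN
    wl5 = wl[["WaterLevel"]].resample("5min").mean()

    # Screen gross outliers against a rolling median (threshold is illustrative)
    med = wl5["WaterLevel"].rolling(25, center=True, min_periods=1).median()
    wl5.loc[(wl5["WaterLevel"] - med).abs() > 0.5, "WaterLevel"] = float("nan")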
Overview
This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in Python. The data covers three synchronous areas of the European power grid:
Continental Europe
Great Britain
Nordic
This work is part of the paper "Predictability of Power Grid Frequency" [1]. Please cite this paper when using the data and the code. For a detailed documentation of the pre-processing procedure, we refer to the supplementary material of the paper.
Data sources
We downloaded the frequency recordings from publicly available repositories of three different Transmission System Operators (TSOs).
Continental Europe [2]: We downloaded the data from the German TSO TransnetBW GmbH, which retains the copyright on the data but allows re-publication upon request [3].
Great Britain [4]: The download was supported by National Grid ESO Open Data, which belongs to the British TSO National Grid. They publish the frequency recordings under the NGESO Open License [5].
Nordic [6]: We obtained the data from the Finnish TSO Fingrid, which provides the data under the open license CC-BY 4.0 [7].
Content of the repository
A) Scripts
In the "Download_scripts" folder you will find three scripts to automatically download frequency data from the TSO's websites.
In "convert_data_format.py" we save the data with corrected timestamp formats. Missing data is marked as NaN (processing step (1) in the supplementary material of [1]).
In "clean_corrupted_data.py" we load the converted data and identify corrupted recordings. We mark them as NaN and clean some of the resulting data holes (processing step (2) in the supplementary material of [1]).
The python scripts run with Python 3.7 and with the packages found in "requirements.txt".
B) Yearly converted and cleansed data
The folders "_converted" contain the output of "convert_data_format.py" and the folders "_cleansed" contain the output of "clean_corrupted_data.py".
File type: The files are zipped csv-files, where each file comprises one year.
Data format: The files contain two columns. The second column contains the frequency values in Hz. The first one represents the time stamps in the format Year-Month-Day Hour-Minute-Second, which is given as naive local time. The local time refers to the following time zones and includes Daylight Saving Times (python time zone in brackets):
TransnetBW: Continental European Time (CET)
Nationalgrid: Great Britain (GB)
Fingrid: Finland (Europe/Helsinki)
NaN representation: We mark corrupted and missing data as "NaN" in the csv-files.
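Reading one of the yearly files with pandas is then straightforward; a minimal sketch (the file name is illustrative and the files are assumed headerless):

    import pandas as pd

    # One year per zipped csv-file; column 1: naive local timestamp,
    # column 2: frequency in Hz; corrupted/missing data appear as NaN.
    freq = pd.read_csv("2019.zip", header=None,
                       names=["time", "frequency_hz"],
                       index_col="time", parse_dates=["time"])

    print(freq["frequency_hz"].isna().mean())  # fraction of missing data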
Use cases
We point out that this repository can be used in two different ways:
Use pre-processed data: You can directly use the converted or the cleansed data. Note, however, that both data sets include segments of NaN-values due to missing and corrupted recordings. Only a very small part of the NaN-values was eliminated in the cleansed data, so as not to manipulate the data too much.
Produce your own cleansed data: Depending on your application, you might want to cleanse the data in a custom way. You can easily add your custom cleansing procedure in "clean_corrupted_data.py" and then produce cleansed data from the raw data in "_converted".
License
This work is licensed under multiple licenses, which are located in the "LICENSES" folder.
We release the code in the folder "Scripts" under the MIT license.
The pre-processed data in the subfolders "**/Fingrid" and "**/Nationalgrid" are licensed under CC-BY 4.0.
TransnetBW originally did not publish their data under an open license. We have explicitly received the permission to publish the pre-processed version from TransnetBW. However, we cannot publish our pre-processed version under an open license due to the missing license of the original TransnetBW data.
Changelog Version 2:
Add time zone information to description
Include new frequency data
Update references
Change folder structure to yearly folders
Version 3:
Correct TransnetBW files for missing data in May 2016
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wikipedia temporal graph.
The dataset is based on two Wikipedia SQL dumps: (1) English language articles and (2) user visit counts per page per hour (aka pagecounts). The original datasets are publicly available on the Wikimedia website.
Static graph structure is extracted from English-language Wikipedia articles. Redirects are removed. Before building the Wikipedia graph, we introduce thresholds on the minimum number of visits per hour and the maximum in-degree. We remove the pages that have fewer than 500 visits per hour at least once during the specified period. Besides, we remove the nodes (pages) with in-degree higher than 8 000 to build a more meaningful initial graph. After cleaning, the graph contains 116 016 nodes (out of 4 856 639 total pages) and 6 573 475 edges. The graph can be imported in two ways: (1) using edges.csv and vertices.csv, or (2) using the enwiki-20150403-graph.gt file, which can be opened with the open-source Python library Graph-Tool.
Time-series data contain users' visit counts from 02:00, 23 September 2014 until 23:00, 30 April 2015. The total number of hours is 5278. The data are stored in two formats: CSV and H5. The CSV file contains data in the format [page_id :: count_views :: layer], where layer represents an hour. In the H5 file, each layer corresponds to an hour as well.
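A minimal loading sketch in Python (the visit-count file name is hypothetical, and the '::' separator and column order are read from the format description above, so they may need adjusting):

    import pandas as pd

    # Static graph structure
    edges = pd.read_csv("edges.csv")
    vertices = pd.read_csv("vertices.csv")

    # Hourly visit counts in the format [page_id :: count_views :: layer]
    counts = pd.read_csv("pagecounts.csv", sep="::", engine="python",
                         names=["page_id", "count_views", "layer"])

    # Alternatively, open the full graph with the graph-tool library:
    # import graph_tool.all as gt
    # g = gt.load_graph("enwiki-20150403-graph.gt")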