MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Time Series PILE
The Time-series Pile is a large collection of publicly available data from diverse domains, ranging from healthcare to engineering and finance. It comprises more than 5 public time-series databases from several diverse domains, intended for time series foundation model pre-training and evaluation.
Time Series PILE Description
We compiled a large collection of publicly available datasets from diverse domains into the Time Series Pile. It has 13 unique domains of data… See the full description on the dataset page: https://huggingface.co/datasets/AutonLab/Timeseries-PILE.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset package includes four datasets
Modal Service data and Safety & Security (S&S) public transit time series data delineated by transit/agency/mode/year/month. Includes all Full Reporters--transit agencies operating modes with more than 30 vehicles in maximum service--to the National Transit Database (NTD). This dataset will be updated monthly. The monthly ridership data is released one month after the month in which the service is provided. Records with null monthly service data reflect late reporting. The S&S statistics provided include both Major and Non-Major Events where applicable. Events occurring in the past three months are excluded from the corresponding monthly ridership rows in this dataset while they undergo validation. This dataset is the only NTD publication in which all Major and Non-Major S&S data are presented without any adjustment for historical continuity.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Weather is recorded every 10 minutes throughout the entire year of 2020, comprising 20 meteorological indicators measured at a Max Planck Institute weather station. The dataset provides comprehensive atmospheric measurements including air temperature, humidity, wind patterns, radiation, and precipitation. With over 52,560 data points per variable (365 days × 24 hours × 6 measurements per hour), this high-frequency sampling offers detailed insights into weather patterns and atmospheric conditions. The measurements include both basic weather parameters and derived quantities such as vapor pressure deficit and potential temperature, making it suitable for both meteorological research and practical applications. You can find some initial analysis using this dataset here: "Weather Long-term Time Series Forecasting Analysis".
The dataset is provided in a CSV format with the following columns:
Column Name | Description |
---|---|
date | Date and time of the observation. |
p | Atmospheric pressure in millibars (mbar). |
T | Air temperature in degrees Celsius (°C). |
Tpot | Potential temperature in Kelvin (K), representing the temperature an air parcel would have if moved to a standard pressure level. |
Tdew | Dew point temperature in degrees Celsius (°C), indicating the temperature at which air becomes saturated with moisture. |
rh | Relative humidity as a percentage (%), showing the amount of moisture in the air relative to the maximum it can hold at that temperature. |
VPmax | Maximum vapor pressure in millibars (mbar), representing the maximum pressure exerted by water vapor at the given temperature. |
VPact | Actual vapor pressure in millibars (mbar), indicating the current water vapor pressure in the air. |
VPdef | Vapor pressure deficit in millibars (mbar), measuring the difference between maximum and actual vapor pressure, used to gauge drying potential. |
sh | Specific humidity in grams per kilogram (g/kg), showing the mass of water vapor per kilogram of air. |
H2OC | Concentration of water vapor in millimoles per mole (mmol/mol) of dry air. |
rho | Air density in grams per cubic meter (g/m³), reflecting the mass of air per unit volume. |
wv | Wind speed in meters per second (m/s), measuring the horizontal motion of air. |
max. wv | Maximum wind speed in meters per second (m/s), indicating the highest recorded wind speed over the period. |
wd | Wind direction in degrees (°), representing the direction from which the wind is blowing. |
rain | Total rainfall in millimeters (mm), showing the amount of precipitation over the observation period. |
raining | Duration of rainfall in seconds (s), recording the time for which rain occurred during the observation period. |
SWDR | Short-wave downward radiation in watts per square meter (W/m²), measuring incoming solar radiation. |
PAR | Photosynthetically active radiation in micromoles per square meter per second (µmol/m²/s), indicating the amount of light available for photosynthesis. |
max. PAR | Maximum photosynthetically active radiation recorded in the observation period in µmol/m²/s. |
Tlog | Temperature logged in degrees Celsius (°C), potentially from a secondary sensor or logger. |
OT | Likely refers to an "operational timestamp" or an offset in time, but may need clarification depending on the dataset's context. |
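The column definitions above imply a simple consistency check: VPdef should equal VPmax minus VPact, up to rounding. The following is a minimal sketch of that check, not part of the dataset's own tooling. The inline sample rows, the date format, the function name `find_vpdef_mismatches`, and the 0.02 mbar tolerance are all assumptions made for illustration; the values are made up and not taken from the dataset.

```python
import csv
import io

# Made-up rows for illustration only; these values are NOT from the dataset,
# and the date format is an assumption.
SAMPLE_CSV = """date,VPmax,VPact,VPdef
2020-01-01 00:10:00,6.54,5.63,0.91
2020-01-01 00:20:00,6.50,5.62,0.88
2020-01-01 00:30:00,6.48,5.60,0.50
"""


def find_vpdef_mismatches(csv_text, tolerance=0.02):
    """Return the dates of rows where VPdef differs from VPmax - VPact
    by more than `tolerance` (in mbar)."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expected = float(row["VPmax"]) - float(row["VPact"])
        if abs(expected - float(row["VPdef"])) > tolerance:
            mismatches.append(row["date"])
    return mismatches


mismatches = find_vpdef_mismatches(SAMPLE_CSV)
print(mismatches)
```

On the sample above, only the third row is flagged, because its VPdef value was deliberately set to disagree with the other two columns.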
This high-resolution meteorological dataset enables applications across multiple domains. For weather forecasting, the frequent measurements support development of prediction models, while climate researchers can study microclimate variations and seasonal patterns. In agriculture, temperature and vapor pressure deficit data aid crop modeling and irrigation planning. The wind and radiation measurements benefit renewable energy planning, while the comprehensive atmospheric data supports environmental monitoring. The dataset's detailed nature makes it particularly suitable for machine learning applications and educational purposes in meteorology and data science.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains hourly sensor data collected over a period of time. The primary objective is to forecast future sensor values using various time series forecasting methods, such as SARIMA, Prophet, and machine learning models. The dataset includes an ID column, a Datetime column and a Count column, where the Count represents the sensor reading at each timestamp.
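Before fitting SARIMA, Prophet, or a machine learning model to hourly data like this, it can help to have a trivial baseline to compare against. The sketch below uses a seasonal naive forecast, which is a swapped-in baseline and not one of the methods named in the description: it repeats the last observed day. The function names, the 24-hour season length, and the synthetic counts are assumptions for illustration only.

```python
def seasonal_naive_forecast(counts, horizon, season_length=24):
    """Forecast `horizon` steps ahead by repeating the last full season."""
    if len(counts) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = counts[-season_length:]
    return [last_season[step % season_length] for step in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic hourly counts (hypothetical): three days of one repeating daily pattern.
daily_pattern = [10, 8, 6, 5, 5, 7, 12, 20, 30, 28, 25, 24,
                 26, 25, 24, 26, 30, 35, 32, 25, 20, 16, 13, 11]
history = daily_pattern * 3

forecast = seasonal_naive_forecast(history, horizon=6)
print(forecast)
```

Any candidate model should achieve a lower `mean_absolute_error` on held-out data than this baseline to justify its added complexity.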
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LOTSA Data
The Large-scale Open Time Series Archive (LOTSA) is a collection of open time series datasets for time series forecasting. It was collected for the purpose of pre-training Large Time Series Models. See the paper and codebase for more information.
Citation
If you're using LOTSA data in your research or applications, please cite it using this BibTeX: @article{woo2024unified, title={Unified Training of Universal Time Series Forecasting Transformers}… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/lotsa_data.
This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on time series analysis.
Introduction
Time series are a special class of dataset, where a response variable is tracked over time. The frequency of measurement and the timespan of the dataset can vary widely. At its most simple, a time series model includes an explanatory time component and a response variable. Mixed models can include additional explanatory variables (check out the nlme and lme4 R packages). We will be covering a few simple applications of time series analysis in these lessons.
Opportunities
Analysis of time series presents several opportunities. In aquatic sciences, some of the most common questions we can answer with time series modeling are:
Can we forecast conditions in the future?
Challenges
Time series datasets come with several caveats, which need to be addressed in order to effectively model the system. A few common challenges that arise (and can occur together within a single dataset) are:
Autocorrelation: Data points are not independent from one another (i.e., the measurement at a given time point is dependent on previous time point(s)).
Data gaps: Data are not collected at regular intervals, necessitating interpolation between measurements. There are often gaps between monitoring periods. For many time series analyses, we need equally spaced points.
Seasonality: Cyclic patterns in variables occur at regular intervals, impeding clear interpretation of a monotonic (unidirectional) trend. For example, we can assume that summer temperatures are higher.
Heteroscedasticity: The variance of the time series is not constant over time.
Covariance: The covariance of the time series is not constant over time. Many time series models assume that the variance and covariance are similar over time; a variance that changes over time is the heteroscedasticity described above.
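Two of the challenges above, data gaps and autocorrelation, can be illustrated in a few lines. The lesson itself is R-based; the following is a minimal Python sketch under stated assumptions and does not reproduce the course's own code. It assumes the series is already on an equally spaced time grid with missing observations marked as `None`, and the function names and sample values are hypothetical.

```python
def interpolate_gaps(values):
    """Linearly interpolate interior None gaps in an equally spaced series.
    Leading and trailing gaps are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def autocorrelation(values, lag=1):
    """Sample autocorrelation at the given lag for a gap-free, non-constant series."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum((values[i] - mean) * (values[i + lag] - mean)
                for i in range(n - lag))
    return numer / denom


# Hypothetical measurements with two gaps.
series = [1.0, None, 3.0, 4.0, None, None, 7.0]
filled = interpolate_gaps(series)
r1 = autocorrelation(filled, lag=1)
print(filled, round(r1, 3))
```

A lag-1 autocorrelation well above zero, as in this steadily increasing example, indicates that consecutive points are not independent. Note that interpolation itself tends to inflate autocorrelation, so long gaps filled this way should be treated with caution.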
Learning Objectives
After successfully completing this notebook, you will be able to:
Choose appropriate time series analyses for trend detection and forecasting
Discuss the influence of seasonality on time series analysis
Interpret and communicate results of time series analyses
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sales dataset
All datasets contain univariate time series and are available in a new format that we call .tsf, modeled on the sktime .ts format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset serves as supplementary material to the fully reproducible paper entitled "Comparison of stochastic and machine learning methods for multi-step ahead forecasting of hydrological processes". We provide the R codes and their outcomes. We also provide the reports entitled "Definitions of the stochastic processes", "Definitions of the forecast quality metrics" and "Selected figures for the qualitative comparison of the forecasting methods". The former version of this dataset is available in the provided link.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Monthly movements in output for the services industries: distribution, hotels and restaurants; transport, storage and communication; business services and finance; and government and other services.
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/INSPIRE_Directive_Article13_1a
The European Space Agency, in collaboration with BlackBridge, collected two time series datasets with a five-day revisit at high resolution: one from February to June 2013 over 14 selected sites around the world, and one from April to September 2015 over 10 selected sites around the world. The RapidEye Earth Imaging System provides data at 5 m spatial resolution (multispectral L3A orthorectified). The products are radiometrically and sensor corrected similar to the 1B Basic product, but have geometric corrections applied to the data during orthorectification using DEMs and GCPs. The product accuracy depends on the quality of the ground control and DEMs used. The imagery is delivered in GeoTIFF format with a pixel spacing of 5 metres. The dataset is composed of data over 14 selected sites in 2013 (Argentina, Belgium, Chesapeake Bay, China, Congo, Egypt, Ethiopia, Gabon, Jordan, Korea, Morocco, Paraguay, South Africa and Ukraine) and 10 selected sites in 2015 (Limburgerhof, Railroad Valley, Libya4, Algeria4, Figueres, Libya1, Mauritania1, Barrax, Esrin and Uyuni Salt Lake). Spatial coverage: Check the spatial coverage of the collection on a map available on the Third Party Missions Dissemination Service.
https://www.datainsightsmarket.com/privacy-policy
The market for Time Series Analysis Software is projected to reach $X million by 2033, growing at a CAGR of XX% from 2025 to 2033. Key drivers of this growth include the increasing adoption of IoT devices, the need for real-time data analysis, and the growing complexity of time series data. Additionally, the market is expected to benefit from advancements in artificial intelligence (AI) and machine learning (ML), which can be used to automate time series analysis tasks and improve the accuracy of predictions. The market for Time Series Analysis Software is segmented by application, type, and region. By application, the market is divided into large enterprises and SMEs. By type, the market is divided into cloud-based and on-premises solutions. By region, the market is divided into North America, South America, Europe, the Middle East & Africa, and Asia Pacific. North America is expected to be the largest market for Time Series Analysis Software throughout the forecast period, followed by Europe and Asia Pacific. The growing adoption of IoT devices and the need for real-time data analysis are expected to be the key drivers of growth in these regions.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Movements in the volume of production for the UK production industries: manufacturing, mining and quarrying, energy supply, and water and waste management. Figures are seasonally adjusted.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to the Fifth Generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis (ERA5). Produced by replaying only the land component of the ECMWF ERA5 climate reanalysis, it benefits from the same physical data-assimilation framework but runs offline at higher spatial detail (9 km grid) to deliver richer land-surface information. Reanalysis merges numerical model output with global observations into a globally complete, physically consistent climate record; this “data assimilation” approach mirrors operational weather forecasting but is optimised for historical completeness rather than forecast timeliness. Reanalysis datasets extend back several decades by sacrificing forecast deadlines, allowing additional time to gather observations and retrospectively ingest improved data, thereby enhancing data quality in earlier periods. ERA5-Land uses atmospheric fields from ERA5—air temperature, humidity, pressure—as “forcing” inputs to drive its land-surface model, preventing rapid drift from reality that unconstrained simulations would suffer. Although observations do not enter the land model directly, they shape the atmospheric forcing through assimilation, giving ERA5-Land an indirect observational anchor. To reconcile ERA5’s coarser grid with ERA5-Land’s finer 9 km grid, a lapse-rate correction adjusts input temperatures, humidity, and pressures for altitude differences. Like all numerical simulations, ERA5-Land carries uncertainty that generally grows backward in time as fewer observations were available to constrain the forcing. Users can combine ERA5-Land fields with the uncertainty estimates from equivalent ERA5 variables to assess confidence bounds. 
The temporal resolution (hourly) and spatial detail (9 km) of ERA5-Land make it invaluable for land-surface applications such as flood and drought forecasting, agricultural monitoring, and hydrological studies. The dataset presented here is a regridded subset of the full ERA5-Land archive, stored in an Analysis-Ready, Cloud-Optimised (ARCO) format specifically designed for retrieving long time-series for individual points. When a user’s requested location does not exactly match a grid point, the nearest grid point is automatically selected. This optimised data source ensures rapid response times.
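The nearest-grid-point selection described above can be sketched as follows. This is an illustration only, not the service's actual implementation: it assumes a regular latitude/longitude grid with 0.1 degree spacing anchored at (0, 0), which the description does not state for this regridded subset, and the function name `nearest_grid_point` is hypothetical.

```python
def nearest_grid_point(lat, lon, resolution=0.1):
    """Snap (lat, lon) to the nearest node of a regular grid.

    Assumes grid nodes sit at integer multiples of `resolution` degrees
    (an assumption, not taken from the dataset documentation).
    """
    def snap(value):
        # Round to the nearest grid index, then back to degrees;
        # the outer round() removes floating-point noise.
        return round(round(value / resolution) * resolution, 6)

    return snap(lat), snap(lon)


# Hypothetical request location.
point = nearest_grid_point(51.4567, -0.9731)
print(point)
```

Requests that fall exactly halfway between two nodes are resolved by Python's round-half-to-even rule in this sketch; a real service may break ties differently.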
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The mid-year estimates refer to the population on 30 June of the reference year and are produced in line with the standard United Nations (UN) definition for population estimates. They are the official set of population estimates for the UK and its constituent countries, the regions and counties of England, and local authorities and their equivalents.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Seasonally adjusted and non-seasonally adjusted quarterly time series of UK public sector employment, containing the latest estimates.
https://www.datainsightsmarket.com/privacy-policy
The global cloud-based time series database market is expected to reach USD 9.3 billion by 2033, growing at a CAGR of 12.8% during the forecast period. The market growth is attributed to increasing demand for real-time data analytics, growing adoption of IoT devices, and rising need for efficient and scalable storage solutions for large time-series datasets. However, high implementation cost and data security concerns may restrain market growth. The cloud-based time series database market is segmented by application into BFSI, retail, mining, chemical, automotive, manufacturing, scientific research, telecommunication, aerospace and defense, and others. The BFSI segment is expected to hold the largest market share due to increasing adoption of cloud-based solutions by financial institutions for real-time data analysis, fraud detection, and risk management. The retail segment is also anticipated to witness significant growth, as retailers are investing in cloud-based time series databases for inventory management, demand forecasting, and customer behavior analysis. Cloud-based time series databases (TSDBs) are designed to handle large volumes of timestamped data, enabling businesses to analyze and visualize data over time.
The U.S. Census Bureau's economic indicator surveys provide monthly and quarterly data that are timely, reliable, and offer comprehensive measures of the U.S. economy. These surveys produce a variety of statistics covering construction, housing, international trade, retail trade, wholesale trade, services and manufacturing. The survey data provide measures of economic activity that allow analysis of economic performance and inform business investment and policy decisions. Other data included, which are not considered principal economic indicators, are the Quarterly Summary of State & Local Taxes, Quarterly Survey of Public Pensions, and the Manufactured Homes Survey. For information on the reliability and use of the data, including important notes on estimation and sampling variance, seasonal adjustment, measures of sampling variability, and other information pertinent to the economic indicators, visit the individual programs' webpages - http://www.census.gov/cgi-bin/briefroom/BriefRm.