taken from this Kaggle competition:
Dataset Description
In this competition, you will predict sales for the thousands of product families sold at Favorita stores located in Ecuador. The training data includes dates, store and product information, whether that item was being promoted, as well as the sales numbers. Additional files include supplementary information that may be useful in building your models.
File Descriptions and Data Field Information
train.csv… See the full description on the dataset page: https://huggingface.co/datasets/t4tiana/store-sales-time-series-forecasting.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains hourly sensor data collected over a period of time. The primary objective is to forecast future sensor values using various time series forecasting methods, such as SARIMA, Prophet, and machine learning models. The dataset includes an ID column, a Datetime column and a Count column, where the Count represents the sensor reading at each timestamp.
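Before fitting SARIMA or Prophet, a seasonal-naive baseline is a useful yardstick for such a series. A minimal sketch in pandas, assuming the Datetime and Count columns from the description (the synthetic series below stands in for the real file):

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(counts: pd.Series, season: int = 24,
                            horizon: int = 24) -> pd.Series:
    """Forecast the next `horizon` points by repeating the last full season."""
    last_season = counts.iloc[-season:].to_numpy()
    reps = np.tile(last_season, int(np.ceil(horizon / season)))[:horizon]
    idx = pd.date_range(counts.index[-1], periods=horizon + 1, freq="60min")[1:]
    return pd.Series(reps, index=idx, name="Count")

if __name__ == "__main__":
    # Synthetic stand-in for the dataset's (ID, Datetime, Count) rows
    rng = pd.date_range("2014-01-01", periods=24 * 7, freq="60min")
    counts = pd.Series(np.arange(len(rng)) % 24, index=rng, name="Count")
    print(seasonal_naive_forecast(counts, horizon=48).head(3))
```

Any SARIMA, Prophet, or machine learning model fitted to the series should beat this baseline before its forecasts are trusted.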
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Time Series PILE
The Time-series Pile is a large collection of publicly available data from diverse domains, ranging from healthcare to engineering and finance. It comprises more than 5 public time-series databases, drawn from several diverse domains, for time series foundation model pre-training and evaluation.
Time Series PILE Description
We compiled a large collection of publicly available datasets from diverse domains into the Time Series Pile. It has 13 unique domains of data… See the full description on the dataset page: https://huggingface.co/datasets/AutonLab/Timeseries-PILE.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Weather is recorded every 10 minutes throughout the entire year of 2020, comprising 20 meteorological indicators measured at a Max Planck Institute weather station. The dataset provides comprehensive atmospheric measurements including air temperature, humidity, wind patterns, radiation, and precipitation. With over 52,560 data points per variable (365 days × 24 hours × 6 measurements per hour), this high-frequency sampling offers detailed insights into weather patterns and atmospheric conditions. The measurements include both basic weather parameters and derived quantities such as vapor pressure deficit and potential temperature, making it suitable for both meteorological research and practical applications. You can find some initial analysis using this dataset here: "Weather Long-term Time Series Forecasting Analysis".
The dataset is provided in a CSV format with the following columns:
Column Name | Description |
---|---|
date | Date and time of the observation. |
p | Atmospheric pressure in millibars (mbar). |
T | Air temperature in degrees Celsius (°C). |
Tpot | Potential temperature in Kelvin (K), representing the temperature an air parcel would have if moved to a standard pressure level. |
Tdew | Dew point temperature in degrees Celsius (°C), indicating the temperature at which air becomes saturated with moisture. |
rh | Relative humidity as a percentage (%), showing the amount of moisture in the air relative to the maximum it can hold at that temperature. |
VPmax | Maximum vapor pressure in millibars (mbar), representing the maximum pressure exerted by water vapor at the given temperature. |
VPact | Actual vapor pressure in millibars (mbar), indicating the current water vapor pressure in the air. |
VPdef | Vapor pressure deficit in millibars (mbar), measuring the difference between maximum and actual vapor pressure, used to gauge drying potential. |
sh | Specific humidity in grams per kilogram (g/kg), showing the mass of water vapor per kilogram of air. |
H2OC | Concentration of water vapor in millimoles per mole (mmol/mol) of dry air. |
rho | Air density in grams per cubic meter (g/m³), reflecting the mass of air per unit volume. |
wv | Wind speed in meters per second (m/s), measuring the horizontal motion of air. |
max. wv | Maximum wind speed in meters per second (m/s), indicating the highest recorded wind speed over the period. |
wd | Wind direction in degrees (°), representing the direction from which the wind is blowing. |
rain | Total rainfall in millimeters (mm), showing the amount of precipitation over the observation period. |
raining | Duration of rainfall in seconds (s), recording the time for which rain occurred during the observation period. |
SWDR | Short-wave downward radiation in watts per square meter (W/m²), measuring incoming solar radiation. |
PAR | Photosynthetically active radiation in micromoles per square meter per second (µmol/m²/s), indicating the amount of light available for photosynthesis. |
max. PAR | Maximum photosynthetically active radiation recorded in the observation period in µmol/m²/s. |
Tlog | Temperature logged in degrees Celsius (°C), potentially from a secondary sensor or logger. |
OT | Likely refers to an "operational timestamp" or an offset in time, but may need clarification depending on the dataset's context. |
This high-resolution meteorological dataset enables applications across multiple domains. For weather forecasting, the frequent measurements support development of prediction models, while climate researchers can study microclimate variations and seasonal patterns. In agriculture, temperature and vapor pressure deficit data aids crop modeling and irrigation planning. The wind and radiation measurements benefit renewable energy planning, while the comprehensive atmospheric data supports environmental monitoring. The dataset's detailed nature makes it particularly suitable for machine learning applications and educational purposes in meteorology and data science.
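For many of these applications, the first preprocessing step is downsampling the 10-minute records to a coarser grid. A minimal pandas sketch, assuming the `date`, `T`, and `rh` column names from the table above (the synthetic frame stands in for the real CSV):

```python
import numpy as np
import pandas as pd

def to_hourly(df: pd.DataFrame) -> pd.DataFrame:
    """Average the 10-minute records into hourly means on a datetime index."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    return out.set_index("date").resample("60min").mean()

if __name__ == "__main__":
    # Synthetic stand-in for one day of 10-minute weather records
    idx = pd.date_range("2020-01-01", periods=6 * 24, freq="10min")
    df = pd.DataFrame({"date": idx,
                       "T": np.linspace(-5.0, 5.0, len(idx)),
                       "rh": np.full(len(idx), 80.0)})
    print(to_hourly(df).shape)  # 144 ten-minute rows become 24 hourly rows
```

Mean aggregation suits state variables such as temperature and humidity; accumulating variables such as `rain` would instead be summed.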
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset called CESNET-TimeSeries24 was collected by long-term monitoring of selected statistical metrics for 40 weeks for each IP address on the ISP network CESNET3 (Czech Education and Science Network). The dataset encompasses network traffic from more than 275,000 active IP addresses, assigned to a wide variety of devices, including office computers, NATs, servers, WiFi routers, honeypots, and video-game consoles found in dormitories. Moreover, the dataset is rich in network anomalies, containing all common anomaly types, which ensures a comprehensive evaluation of anomaly detection methods.
Last but not least, the CESNET-TimeSeries24 dataset provides traffic time series at the institutional and IP-subnet levels to cover all possible anomaly detection or forecasting scopes. Overall, the time series dataset was created from 66 billion IP flows containing 4 trillion packets that carry approximately 3.7 petabytes of data. CESNET-TimeSeries24 is a complex real-world dataset that brings much-needed insight into the evaluation of forecasting models in real-world environments.
Please cite the usage of our dataset as:
Koumar, J., Hynek, K., Čejka, T. et al. CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting. Sci Data 12, 338 (2025). https://doi.org/10.1038/s41597-025-04603-x
@Article{cesnettimeseries24,
author={Koumar, Josef and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and {\v{S}}i{\v{s}}ka, Pavel},
title={CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting},
journal={Scientific Data},
year={2025},
month={Feb},
day={26},
volume={12},
number={1},
pages={338},
issn={2052-4463},
doi={10.1038/s41597-025-04603-x},
url={https://doi.org/10.1038/s41597-025-04603-x}
}
We create evenly spaced time series for each IP address by aggregating IP flow records into time series datapoints. The created datapoints represent the behavior of IP addresses within a defined time window of 10 minutes. The vector of time-series metrics v_{ip, i} describes the IP address ip in the i-th time window. Thus, IP flows for vector v_{ip, i} are captured in time windows starting at t_i and ending at t_{i+1}. The time series are built from these datapoints.
Datapoints created by the aggregation of IP flows contain the following time-series metrics:
Multiple time aggregation: The original datapoints in the dataset are aggregated over 10 minutes of network traffic. The size of the aggregation interval influences anomaly detection procedures, mainly the training speed of the detection model. However, the 10-minute intervals can be too short for longitudinal anomaly detection methods. Therefore, we added two more aggregation intervals to the datasets: 1 hour and 1 day.
Time series of institutions: We identify 283 institutions inside the CESNET3 network. These time series aggregated per each institution ID provide a view of the institution's data.
Time series of institutional subnets: We identify 548 institution subnets inside the CESNET3 network. These time series, aggregated per subnet, provide a view of each institution subnet's data.
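The re-aggregation into 1-hour and 1-day intervals can be sketched with pandas: additive metrics are simply summed into coarser windows. The metric names below (`n_flows`, `n_bytes`) are illustrative assumptions; note that unique-count metrics such as n_dest_ip cannot be re-aggregated by summation, which is presumably why the coarser files replace them (see below).

```python
import pandas as pd

def reaggregate(ts: pd.DataFrame, interval: str) -> pd.DataFrame:
    """Sum additive traffic metrics into a coarser interval ("60min" or "1D")."""
    return ts.resample(interval).sum()

if __name__ == "__main__":
    # One day of synthetic 10-minute datapoints for a single IP address
    idx = pd.date_range("2023-10-09", periods=144, freq="10min")
    ts = pd.DataFrame({"n_flows": [1] * 144, "n_bytes": [100] * 144}, index=idx)
    print(reaggregate(ts, "60min")["n_flows"].iloc[0],
          reaggregate(ts, "1D")["n_bytes"].iloc[0])  # 6 flows/hour, 14400 bytes/day
```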
The file hierarchy is described below:
cesnet-timeseries24/
|- institution_subnets/
| |- agg_10_minutes/
| |- agg_1_hour/
| |- agg_1_day/
| |- identifiers.csv
|- institutions/
| |- agg_10_minutes/
| |- agg_1_hour/
| |- agg_1_day/
| |- identifiers.csv
|- ip_addresses_full/
| |- agg_10_minutes/
| |- agg_1_hour/
| |- agg_1_day/
| |- identifiers.csv
|- ip_addresses_sample/
| |- agg_10_minutes/
| |- agg_1_hour/
| |- agg_1_day/
| |- identifiers.csv
|- times/
| |- times_10_minutes.csv
| |- times_1_hour.csv
| |- times_1_day.csv
|- ids_relationship.csv
|- weekends_and_holidays.csv
The following list describes time series data fields in CSV files:
Moreover, the time series created by re-aggregation contains following time series metrics instead of n_dest_ip, n_dest_asn, and n_dest_port:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Monash Time Series Forecasting Repository which contains 30+ datasets of related time series for global forecasting research. This repository includes both real-world and competition time series datasets covering varied domains.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of COVID-19 time series data for India since 24 March 2020. The data set covers all the States and Union Territories of India and is divided into five parts: i) confirmed cases; ii) death count; iii) recovered cases; iv) temperature of the region; and v) percentage humidity in the region. The data set also provides basic details of confirmed cases and death counts for all countries of the world, updated daily since 30 January 2020. The end user can contact the corresponding author (Rohit Salgotra: nicresearchgroup@gmail.com) for more details. [Dataset is updated twice a week.]
The Authors can Refer to and CITE our latest Papers on COVID: 1. Rohit Salgotra, Mostafa Gandomi, Amir H Gandomi. "Evolutionary Modelling of the COVID-19 Pandemic in Fifteen Most Affected Countries" Chaos, Solitons & Fractals: (2020). https://doi.org/10.1016/j.chaos.2020.110118 2. Rohit Salgotra, Mostafa Gandomi, Amir H Gandomi. "Time Series Analysis and Forecast of the COVID-19 Pandemic in India using Genetic Programming" Chaos, Solitons & Fractals: (2020). https://doi.org/10.1016/j.chaos.2020.109945
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The described database was created using data obtained from the California Independent System Operator (CAISO) and the National Renewable Energy Laboratory (NREL). All data was collected at five-minute intervals, and subsequently cleaned and modified to create a database comprising three time series: solar energy production, wind energy production, and electricity demand. The database contains 12 columns: date, station (most likely "season": 1: Winter, 2: Spring, 3: Summer, 4: Autumn), day of the week (0: Monday, ..., 6: Sunday), DHI (W/m²), DNI (W/m²), GHI (W/m²), wind speed (m/s), humidity (%), temperature (degrees), solar energy production (MW), wind energy production (MW), and electricity demand (MW).
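A typical first look at this database groups demand by the seasonal code. A minimal pandas sketch; the exact column headers (`station`, `demand_mw`) are assumptions paraphrased from the description:

```python
import pandas as pd

def mean_demand_by_station(df: pd.DataFrame) -> pd.Series:
    """Average electricity demand per seasonal code (1: Winter ... 4: Autumn)."""
    return df.groupby("station")["demand_mw"].mean()

if __name__ == "__main__":
    # Tiny synthetic stand-in for the five-minute records
    df = pd.DataFrame({"station": [1, 1, 3, 3],
                       "demand_mw": [22000.0, 24000.0, 30000.0, 32000.0]})
    print(mean_demand_by_station(df))  # Winter mean 23000, Summer mean 31000
```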
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Dynamical System Multivariate Time Series (DSMTS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system under fully nominal conditions (no outliers or anomalies).
The DSMTS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Multivariate Time Series Forecasting especially for industrial processes of complex systems:
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LOTSA Data
The Large-scale Open Time Series Archive (LOTSA) is a collection of open time series datasets for time series forecasting. It was collected for the purpose of pre-training Large Time Series Models. See the paper and codebase for more information.
Citation
If you're using LOTSA data in your research or applications, please cite it using this BibTeX: BibTeX: @article{woo2024unified, title={Unified Training of Universal Time Series Forecasting Transformers}… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/lotsa_data.
All datasets contain univariate time series, and they are available in a new format that we name .tsf, inspired by the sktime .ts format.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
📶 Beam-Level (5G) Time-Series Dataset
This dataset introduces a novel multivariate time series specifically curated to support research in enabling accurate prediction of KPIs across communication networks, as illustrated below:
Precise forecasting of network traffic is critical for optimizing network management and enhancing resource allocation efficiency. This task is of both practical and theoretical importance to researchers in networking and machine learning, offering a… See the full description on the dataset page: https://huggingface.co/datasets/netop/Beam-Level-Traffic-Timeseries-Dataset.
This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on time series analysis.
Introduction
Time series are a special class of dataset, where a response variable is tracked over time. The frequency of measurement and the timespan of the dataset can vary widely. At its most simple, a time series model includes an explanatory time component and a response variable. Mixed models can include additional explanatory variables (check out the nlme and lme4 R packages). We will be covering a few simple applications of time series analysis in these lessons.
Opportunities
Analysis of time series presents several opportunities. In aquatic sciences, some of the most common questions we can answer with time series modeling are:
Can we forecast conditions in the future?
Challenges
Time series datasets come with several caveats, which need to be addressed in order to effectively model the system. A few common challenges that arise (and can occur together within a single dataset) are:
Autocorrelation: Data points are not independent from one another (i.e., the measurement at a given time point is dependent on previous time point(s)).
Data gaps: Data are not collected at regular intervals, necessitating interpolation between measurements. There are often gaps between monitoring periods. For many time series analyses, we need equally spaced points.
Seasonality: Cyclic patterns in variables occur at regular intervals, impeding clear interpretation of a monotonic (unidirectional) trend. For example, summer temperatures can be assumed to be higher than winter temperatures.
Heteroscedasticity: The variance of the time series is not constant over time.
Covariance: The covariance of the time series is not constant over time. Many time series models assume that both the variance and the covariance remain constant over time (see heteroscedasticity above).
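Two of these challenges, data gaps and autocorrelation, are easy to probe programmatically. A sketch in Python on synthetic data (the lesson itself uses R, so this is only an illustration of the concepts): interpolation restores the equally spaced points many analyses require, and the lag-1 correlation quantifies dependence on the previous time point.

```python
import numpy as np
import pandas as pd

def lag1_autocorr(x: pd.Series) -> float:
    """Lag-1 autocorrelation: how strongly each point depends on its predecessor."""
    return x.autocorr(lag=1)

if __name__ == "__main__":
    # Synthetic daily series with a seasonal-looking cycle
    idx = pd.date_range("2019-01-01", periods=100, freq="1D")
    y = pd.Series(np.sin(np.arange(100) / 5.0), index=idx)
    y.iloc[10:15] = np.nan                    # a gap between monitoring periods
    y_filled = y.interpolate(method="time")   # equally spaced points restored
    print(y_filled.isna().sum(), lag1_autocorr(y_filled) > 0.9)
```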
Learning Objectives
After successfully completing this notebook, you will be able to:
Choose appropriate time series analyses for trend detection and forecasting
Discuss the influence of seasonality on time series analysis
Interpret and communicate results of time series analyses
https://www.gnu.org/licenses/gpl-3.0.html
Supplementary material for the paper entitled "One-step ahead forecasting of geophysical processes within a purely statistical framework". Abstract: The simplest way to forecast geophysical processes, an engineering problem with a widely recognised challenging character, is the so-called "univariate time series forecasting" that can be implemented using stochastic or machine learning regression models within a purely statistical framework. Regression models are in general fast-implemented, in contrast to the computationally intensive Global Circulation Models, which constitute the most frequently used alternative for precipitation and temperature forecasting. For their simplicity and easy applicability, the former have been proposed as benchmarks for the latter by forecasting scientists. Herein, we assess the one-step ahead forecasting performance of 20 univariate time series forecasting methods, when applied to a large number of geophysical and simulated time series of 91 values. We use two real-world annual datasets, a dataset composed of 112 time series of precipitation and another composed of 185 time series of temperature, as well as their respective standardized datasets, to conduct several real-world experiments. We further conduct large-scale experiments using 12 simulated datasets. These datasets contain 24 000 time series in total, which are simulated using stochastic models from the families of Autoregressive Moving Average and Autoregressive Fractionally Integrated Moving Average. We use the first 50, 60, 70, 80 and 90 data points for model-fitting and model-validation, and make predictions corresponding to the 51st, 61st, 71st, 81st and 91st respectively. The total number of forecasts produced herein is 2 177 520, among which 47 520 are obtained using the real-world datasets. The assessment is based on eight error metrics and accuracy statistics.
The simulation experiments reveal the most and least accurate methods for long-term forecasting applications, also suggesting that the simple methods may be competitive in specific cases. Regarding the results of the real-world experiments using the original (standardized) time series, the minimum and maximum medians of the absolute errors are found to be 68 mm (0.55) and 189 mm (1.42) respectively for precipitation, and 0.23 °C (0.33) and 1.10 °C (1.46) respectively for temperature. Since there is an absence of relevant information in the literature, the numerical results obtained using the standardized real-world datasets could be used as rough benchmarks for the one-step ahead predictability of annual precipitation and temperature.
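The evaluation scheme (fit on the first n points, predict point n+1, for n in {50, 60, 70, 80, 90}) can be sketched with the naive last-value forecast, one of the simple benchmark methods of the kind the abstract says can be competitive. This is an illustration of the protocol, not the paper's code:

```python
import numpy as np

def one_step_naive(series: np.ndarray, fit_lengths=(50, 60, 70, 80, 90)):
    """For each fit length n, forecast point n+1 with the last observed value."""
    # series is 0-indexed: the (n+1)-th value sits at index n
    return {n: (series[n - 1], series[n]) for n in fit_lengths}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = rng.normal(size=91)          # a simulated series of 91 values
    errors = [abs(f - a) for f, a in one_step_naive(series).values()]
    print(float(np.median(errors)))       # median absolute error of the naive method
```

The paper's assessment replaces the naive forecaster with 20 stochastic and machine learning methods and repeats this over thousands of series.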
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The sp500stock_data_description.csv file provides detailed information on the existence of four modalities (text, image, time series, and table) for 4,213 S&P 500 stocks. The hs300stock_data_description.csv file provides detailed information on the existence of four modalities (text, image, time series, and table) for 858 HS 300 stocks.
If you find our research helpful, please cite our paper:
@article{xu2025finmultitime, title={FinMultiTime: A Four-Modal Bilingual Dataset for… See the full description on the dataset page: https://huggingface.co/datasets/Wenyan0110/Multimodal-Dataset-Image_Text_Table_TimeSeries-for-Financial-Time-Series-Forecasting.
https://creativecommons.org/publicdomain/zero/1.0/
Alcohol_Sales.csv: This dataset was taken from https://fred.stlouisfed.org/series/S4248SM144NCEN (old URL: https://fred.stlouisfed.org/series/.)
energydata_complete.csv: Experimental data used to create regression models of appliances energy use in a low-energy building. Data Set Information: The data set is sampled at 10-minute intervals for about 4.5 months. The house temperature and humidity conditions were monitored with a ZigBee wireless sensor network. Each wireless node transmitted the temperature and humidity conditions roughly every 3.3 minutes; the wireless data was then averaged over 10-minute periods. The energy data was logged every 10 minutes with m-bus energy meters. Weather from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data sets using the date and time column. Two random variables have been included in the data set for testing the regression models and to filter out non-predictive attributes (parameters). The original source of the dataset: http://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction
Because of the sheer number of products available, the German book market is one of the largest businesses trading today. In order to present a highly individual profile to customers while keeping the effort involved in selecting and ordering as low as possible, the key to success for a bookshop lies in effective purchasing from a choice of roughly 96,000 new titles each year. The challenge for the bookseller is to buy the right amount of the right books at the right time.
It is with this in mind that this year's DATA MINING CUP Competition will be held in cooperation with Libri, Germany's leading book wholesaler. Among Libri's many successful support measures for booksellers, purchase recommendations give the bookshop a competitive advantage. Accordingly, the DATA MINING CUP 2009 challenge will be to forecast purchase quantities of a clearly defined title portfolio per location, using simulated data.
The task of the DATA MINING CUP Competition 2009 is to forecast purchase quantities for 8 titles for 2,418 different locations. In order to create the model, simulated purchase data from an additional 2,394 locations will be supplied. All data refers to a fixed period of time. The object is to forecast the purchase quantities of these 8 different titles for the 2,418 locations as exactly as possible.
There are two text files available to assist in solving the problem: dmc2009_train.txt (train data file) and dmc2009_forecast.txt (data of 2,418 locations for whom a prediction is to be made).
This data is publicly available on the data mining competition website.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The research conducted using this univariate data set is on time series decomposition and a review of how to implement four decomposition methods, namely: classical decomposition, X11, Signal Extraction in ARIMA Time Series (SEATS), and Seasonal-Trend decomposition based on Loess (STL). Following decomposition, forecasting with decomposition is implemented on the monthly electricity available for distribution to South Africa by Eskom time series data set. RStudio was used for the research. Other data sets, including some built into R, were used in the second section of the work to illustrate the components of a time series and moving averages. The monthly Eskom electricity series was then used for the third and fourth sections of the research: to implement the time series decomposition methods, analyze the random component of each method, forecast with decomposition, and compute the forecast accuracy of four different forecasting methods.
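Of the four methods, classical decomposition is simple enough to sketch directly: a 2×12 centred moving average estimates the trend of a monthly series, monthly means of the detrended series estimate the seasonal component, and the rest is the remainder. A simplified sketch on synthetic data, not the Eskom series (X11, SEATS, and STL need dedicated packages):

```python
import numpy as np
import pandas as pd

def classical_decompose(y: pd.Series, period: int = 12):
    """Additive classical decomposition with a 2xMA centred moving average."""
    w = np.r_[0.5, np.ones(period - 1), 0.5] / period          # 2x12 MA weights
    trend = y.rolling(period + 1, center=True).apply(
        lambda v: float(np.dot(v, w)), raw=True)
    detrended = y - trend
    seasonal = detrended.groupby(y.index.month).transform("mean")
    remainder = y - trend - seasonal
    return trend, seasonal, remainder

if __name__ == "__main__":
    # Synthetic monthly series: pure linear trend, so seasonal/remainder ~ 0
    idx = pd.to_datetime([f"{2010 + i // 12}-{i % 12 + 1:02d}-01" for i in range(48)])
    y = pd.Series(np.arange(48, dtype=float), index=idx)
    trend, seasonal, remainder = classical_decompose(y)
    print(float(np.nanmax(np.abs(remainder.to_numpy()))))
```

Forecasting with decomposition then forecasts the seasonally adjusted series (trend + remainder) and re-adds the seasonal component.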
https://creativecommons.org/publicdomain/zero/1.0/
🗂 Dataset Description Title: Custom Sales Forecasting Dataset
This dataset contains a synthetic yet realistic representation of product sales across multiple stores and time periods. It is designed for use in time series forecasting, retail analytics, or machine learning experiments focusing on demand prediction and inventory planning. Each row corresponds to daily sales data for a given product at a particular store, enriched with contextual information like promotions and holidays.
This dataset is ideal for:
Building and testing time series models (ARIMA, Prophet, LSTM, etc.)
Forecasting product demand
Evaluating store-level sales trends
Training machine learning models with tabular time series data
Column Name | Description |
---|---|
order_id | Unique identifier for the order placed by a customer. |
customer_id | Unique identifier for the customer making the purchase. |
order_date | Date on which the order was placed (YYYY-MM-DD). |
product_category | Category of the product purchased (e.g., Sports, Home, Beauty). |
product_price | Original price of a single unit of the product (before discount). |
quantity | Number of units of the product ordered. |
payment_method | Method used for payment (e.g., PayPal, Cash on Delivery). |
delivery_status | Current delivery status of the order (e.g., Delivered, Pending). |
city | City to which the order was delivered. |
state | U.S. state where the customer is located. |
zipcode | Postal code of the delivery location. |
product_id | Unique identifier for the purchased product. |
discount_applied | Fractional discount applied to the order (e.g., 0.20 for 20% off). |
order_value | Total value of the order after discount (product_price * quantity * (1 - discount_applied)). |
review_rating | Customer’s review rating of the order on a 1–5 scale. |
return_requested | Boolean value indicating if the customer requested a return (True/False). |
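For the forecasting use cases listed above, the order-level rows are usually collapsed into a daily per-product series first. A minimal pandas sketch using the column names from the table; the choice to sum `quantity` and `order_value` per day is an assumption about the desired targets:

```python
import pandas as pd

def daily_product_sales(orders: pd.DataFrame) -> pd.DataFrame:
    """Collapse order-level rows into one row per product per day."""
    out = orders.copy()
    out["order_date"] = pd.to_datetime(out["order_date"])
    return (out.groupby(["product_id", "order_date"], as_index=False)
               .agg(units=("quantity", "sum"), revenue=("order_value", "sum")))

if __name__ == "__main__":
    # Tiny synthetic stand-in for the order table
    orders = pd.DataFrame({
        "order_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "product_id": ["P1", "P1", "P1"],
        "quantity": [2, 3, 1],
        "order_value": [20.0, 30.0, 9.0],
    })
    print(daily_product_sales(orders))
```

The resulting frame can be reindexed onto a complete calendar (filling missing days with zero sales) before feeding it to ARIMA, Prophet, or an LSTM.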
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Improving the accuracy of predictions of future values based on past and current observations has been pursued by enhancing prediction methods, combining those methods, or performing data pre-processing. In this paper, another approach is taken, namely increasing the number of inputs in the dataset. This approach is useful especially for shorter time series data. By filling in the in-between values in the time series, the size of the training set can be increased, thus improving the generalization capability of the predictor. The algorithm used to make predictions is a neural network, as it is widely used in the literature for time series tasks. For comparison, Support Vector Regression is also employed. The dataset used in the experiment is the frequency of USPTO patents and PubMed scientific publications in the field of health, namely on apnea, arrhythmia, and sleep stages. Another time series dataset, designated for the NN3 Competition in the field of transportation, is also used for benchmarking. The experimental results show that prediction performance can be significantly increased by filling in in-between data in the time series. Furthermore, detrending and deseasonalization, which separate the data into trend, seasonal, and stationary components, also improve prediction performance on both the original and the filled datasets. The optimal amount of enlargement in this experiment is about five times the length of the original dataset.
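The filling step can be sketched with linear interpolation (the paper's exact filling scheme may differ): with k = 4 inserted points per gap, the series grows to roughly five times its original length, matching the optimum reported above.

```python
import numpy as np

def fill_in_between(y: np.ndarray, k: int = 4) -> np.ndarray:
    """Insert k linearly interpolated values between each pair of observations."""
    x_old = np.arange(len(y))
    x_new = np.linspace(0, len(y) - 1, (len(y) - 1) * (k + 1) + 1)
    return np.interp(x_new, x_old, y)

if __name__ == "__main__":
    y = np.array([1.0, 3.0, 2.0])
    filled = fill_in_between(y, k=4)
    print(len(filled))  # original values survive at every 5th position
```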